From ndbecker2 at gmail.com Fri Jul 1 07:01:32 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 01 Jul 2011 07:01:32 -0400 Subject: [Numpy-discussion] review request: introductory datetime documentation References: Message-ID: Just trying it out with 1.6: np.datetime64('now') Out[6]: 2011-07-01 00:00:00 Well the time now is 07:01am. Is this expected behaviour? From matthew.brett at gmail.com Fri Jul 1 07:58:07 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 1 Jul 2011 12:58:07 +0100 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: Hi, On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman wrote: > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith wrote: >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett wrote: >>> In the interest of making the discussion as concrete as possible, here >>> is my draft of an alternative proposal for NAs and masking, based on >>> Nathaniel's comments. ?Writing it, it seemed to me that Nathaniel is >>> right, that the ideas become much clearer when the NA idea and the >>> MASK idea are separate. ? Please do pitch in for things I may have >>> missed or misunderstood: >> [...] >> >> Thanks for writing this up! I stuck it up as a gist so we can edit it >> more easily: >> ?https://gist.github.com/1056379/ >> This is your initial version: >> ?https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191 >> And I made a few changes: >> ?https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583 >> Specifically, I added a rationale section, changed np.MASKED to >> np.IGNORE (as per comments in this thread), and added a vowel to >> "propmsk". > > It might be helpful to make a small toy class in python so that people > can play around with NA and IGNORE from the alterNEP. Thanks for doing this. I don't know about you, but I don't know where to work on the discussion or draft implementation, because I am not sure where the disagreement is. Lluis has helpfully pointed out a specific case of interest. Pierre has fed back with some points of clarification. However, other than that, I'm not sure what we should be discussing. @Mark @Chuck @anyone Do you see problems with the alterNEP proposal? If so, what are they? Do you agree that the alterNEP proposal is easier to understand? If not, can you explain why? What do you see as the important points of difference between the NEP and the alterNEP? @Pierre - what do you think? Best, Matthew From tkgamble at windstream.net Fri Jul 1 08:41:57 2011 From: tkgamble at windstream.net (Thomas K Gamble) Date: Fri, 1 Jul 2011 06:41:57 -0600 Subject: [Numpy-discussion] broacasting question In-Reply-To: <83163AE5-E45B-4369-A848-F4F62329E98E@astro.physik.uni-goettingen.de> References: <201106301132.22357.tkgamble@windstream.net> <201106301557.58611.tkgamble@windstream.net> <83163AE5-E45B-4369-A848-F4F62329E98E@astro.physik.uni-goettingen.de> Message-ID: <201107010641.57643.tkgamble@windstream.net> > > Right, I forgot to point out that there are at least 2 ways to bring the > arrays into compatible shapes (that's the reason broadcasting does not > work here, because numpy only does automatic broadcasting if there is an > unambiguous way to do so). So the IDL arrays being Fortran-ordered is the > essential bit of information here. Just two remarks: > I. 
Assigning a = reshape(b.flatten('F')[:size(c)]/c.flatten('F'), c.shape, > order='F') as above will create a new array of shape c.shape - if you > wanted to put your results into an existing array of shape(2048,3577), > you'd still have to explicitly say a[:,:3136] = ... II. The flatten() That was the error in my example I refered to. > operations and the assignment above all create full copies of the arrays, > thus the np.add ufunc above together with simple reshape operations might > improve performance somewhat - however keeping the Fortran order also > requires some costly transpositions, as for your last example > > a = np.divide(b.T[:3136].reshape(c.T.shape).T, c, out=a) > > so YMMV... > > Cheers, > Derek > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Thomas K. Gamble tkgamble at windstream.net The fruit of the righteous is a tree of life; and he who wins souls is wise. (Proverbs 11:30) From tkgamble at windstream.net Fri Jul 1 08:59:27 2011 From: tkgamble at windstream.net (Thomas K Gamble) Date: Fri, 1 Jul 2011 06:59:27 -0600 Subject: [Numpy-discussion] broacasting question In-Reply-To: <83163AE5-E45B-4369-A848-F4F62329E98E@astro.physik.uni-goettingen.de> References: <201106301132.22357.tkgamble@windstream.net> <201106301557.58611.tkgamble@windstream.net> <83163AE5-E45B-4369-A848-F4F62329E98E@astro.physik.uni-goettingen.de> Message-ID: <201107010659.27611.tkgamble@windstream.net> > On 30.06.2011, at 11:57PM, Thomas K Gamble wrote: > >> np.add(b.reshape(2048,3136) * c, d, out=a[:,:3136]) > >> > >> But to say whether this is really the equivalent result to what IDL > >> does, one would have to study the IDL manual in detail or directly > >> compare the output (e.g. check what happens to the values in > >> a[:,3136:]...) > >> > >> Cheers, > >> > >> Derek > > > > Your post gave me the cluse I needed. > > > > I had my shapes slightly off in the example I gave, but if I try: > > > > a = reshape(b.flatten('F') * c.flatten('F') + d.flatten('F'), b.shape, > > order='F') > > > > I get a result in line with the IDL result. > > > > Another example with different total size arrays: > > > > b = np.ndarray((2048,3577), dtype=float) > > c = np.ndarray((256,25088), dtype=float) > > > > a= reshape(b.flatten('F')[:size(c)]/c.flatten('F'), c.shape, order='F') > > > > This also gives a result like that of IDL. > > Right, I forgot to point out that there are at least 2 ways to bring the > arrays into compatible shapes (that's the reason broadcasting does not > work here, because numpy only does automatic broadcasting if there is an > unambiguous way to do so). So the IDL arrays being Fortran-ordered is the > essential bit of information here. Just two remarks: > I. Assigning a = reshape(b.flatten('F')[:size(c)]/c.flatten('F'), c.shape, > order='F') as above will create a new array of shape c.shape - if you > wanted to put your results into an existing array of shape(2048,3577), > you'd still have to explicitly say a[:,:3136] = ... II. The flatten() That was the error in my first example I refered to. I confused it with the second 'divide' example and probably should have used different variable names to avoid confusing things further. Sorry. 
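(A minimal, runnable sketch of the Fortran-order pairing described above; the small arrays here are only stand-ins for the (2048, 3577) and (256, 25088) data in the thread, chosen so the example is quick to try:)

import numpy as np

# Small stand-ins for the large IDL (column-major) arrays discussed above.
b = np.arange(1.0, 13.0).reshape(3, 4)   # 12 elements
c = np.arange(1.0, 7.0).reshape(2, 3)    #  6 elements

# As discussed above, IDL appears to pair elements by their column-major
# (Fortran) position and stop at the shorter operand, so the numpy
# equivalent flattens both operands in 'F' order first:
a = np.reshape(b.flatten('F')[:c.size] / c.flatten('F'), c.shape, order='F')

# The same pairing written with ravel instead of flatten:
a2 = (b.ravel('F')[:c.size] / c.ravel('F')).reshape(c.shape, order='F')
assert np.array_equal(a, a2)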
> operations and the assignment above all create full copies of the arrays, > thus the np.add ufunc above together with simple reshape operations might > improve performance somewhat - however keeping the Fortran order also > requires some costly transpositions, as for your last example Right now, I'm just interested in getting the right answer. Once I have that, I'll work on performance. Unfortunately the order does seem to make a difference. > > a = np.divide(b.T[:3136].reshape(c.T.shape).T, c, out=a) Interesting. > > so YMMV... > > Cheers, > Derek Thanks for your help. -- Thomas K. Gamble tkgamble at windstream.net LANL employee waiting out the Las Conchas fire. From charlesr.harris at gmail.com Fri Jul 1 09:20:22 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 Jul 2011 07:20:22 -0600 Subject: [Numpy-discussion] review request: introductory datetime documentation In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 5:01 AM, Neal Becker wrote: > Just trying it out with 1.6: > > np.datetime64('now') > Out[6]: 2011-07-01 00:00:00 > > I get In [1]: datetime64('now') Out[1]: numpy.datetime64('2011-07-01T07:18:35-0600') You need the development branch for trials, Mark has made a lot of fixingup/changes to datetime that aren't in 1.6. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Jul 1 09:34:46 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 1 Jul 2011 08:34:46 -0500 Subject: [Numpy-discussion] review request: introductory datetime documentation In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 6:01 AM, Neal Becker wrote: > Just trying it out with 1.6: > > np.datetime64('now') > Out[6]: 2011-07-01 00:00:00 > > Well the time now is 07:01am. Is this expected behaviour? > The version of datetime in 1.6 is quite broken. When 1.6 was released, I thought it was probably ok because it had already passed whatever review processes in 1.4, but this turned out not to be a good assumption. In the documentation, I've labeled it as "new in 1.7" (which would change to the actual release version later). -Mark > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 1 09:53:09 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 Jul 2011 07:53:09 -0600 Subject: [Numpy-discussion] broacasting question In-Reply-To: <201106301132.22357.tkgamble@windstream.net> References: <201106301132.22357.tkgamble@windstream.net> Message-ID: On Thu, Jun 30, 2011 at 11:32 AM, Thomas K Gamble wrote: > I'm trying to convert some IDL code to python/numpy and i'm having some > trouble understanding the rules for boradcasting during some operations. > example: > > given the following arrays: > a = array((2048,3577), dtype=float) > b = array((256,25088), dtype=float) > c = array((2048,3136), dtype=float) > d = array((2048,3136), dtype=float) > > do: > a = b * c + d > > In IDL, the computation is done without complaint and all array sizes are > preserved. In ptyhon I get a value error concerning broadcasting. I can > force it to work by taking slices, but the resulting size would be a = > (256x3136) rather than (2048x3577). 
I admit that I don't understand IDL > (or > python to be honest) well enough to know how it handles this to be able to > replicate the result properly. Does it only operate on the smallest > dimensions ignoring the larger indices leaving their values unchanged? Can > someone explain this to me? > > I don't see a problem In [1]: datetime64('now') Out[1]: numpy.datetime64('2011-07-01T07:18:35-0600') In [2]: a = array((2048, 3577), float) In [3]: b = array((256, 25088), float) In [4]: c = array((2048, 3136), float) In [5]: d = array((2048, 3136), float) In [6]: a = b*c + d In [7]: a Out[7]: array([ 526336., 78679104.]) What is the '*' in your expression supposed to mean? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Jul 1 10:09:38 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 1 Jul 2011 09:09:38 -0500 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett wrote: > Hi, > > On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman wrote: > > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith wrote: > >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett > wrote: > >>> In the interest of making the discussion as concrete as possible, here > >>> is my draft of an alternative proposal for NAs and masking, based on > >>> Nathaniel's comments. Writing it, it seemed to me that Nathaniel is > >>> right, that the ideas become much clearer when the NA idea and the > >>> MASK idea are separate. Please do pitch in for things I may have > >>> missed or misunderstood: > >> [...] > >> > >> Thanks for writing this up! I stuck it up as a gist so we can edit it > >> more easily: > >> https://gist.github.com/1056379/ > >> This is your initial version: > >> > https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191 > >> And I made a few changes: > >> > https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583 > >> Specifically, I added a rationale section, changed np.MASKED to > >> np.IGNORE (as per comments in this thread), and added a vowel to > >> "propmsk". > > > > It might be helpful to make a small toy class in python so that people > > can play around with NA and IGNORE from the alterNEP. > > Thanks for doing this. > > I don't know about you, but I don't know where to work on the > discussion or draft implementation, because I am not sure where the > disagreement is. Lluis has helpfully pointed out a specific case of > interest. Pierre has fed back with some points of clarification. > However, other than that, I'm not sure what we should be discussing. > > @Mark > @Chuck > @anyone > > Do you see problems with the alterNEP proposal? Yes, I really like my design as it stands now, and the alterNEP removes a lot of the abstraction and interoperability that are in my opinion the best parts. I've made more updates to the NEP based on continuing feedback, which are part of the pull request I want reviews for. > If so, what are they? > Mainly: Reduced interoperability, more complex implementation (leading to more bugs), and an unclear theoretical model for the masked part of it. > Do you agree that the alterNEP proposal is easier to understand? No. If not, can you explain why? > My answers to that are already scattered in the emails in various places, and in the various rationales and justifications provided in the NEP. > What do you see as the important points of difference between the NEP > and the alterNEP? 
> The biggest thing is the NEP supports more use cases in a clean way by composition of different simpler components. It defines one clear missing data abstraction, and proposes two implementations that are interchangeable and can interoperate. The alterNEP proposes two independent APIs, reducing interoperability and so significantly increasing the amount of learning required to work with both of them. This also precludes switching between the two approaches without a lot of work. The current pull request that's sitting there waiting for review does not have an impact on which approach goes ahead, but the code I'm doing now does. This is a fairly large project, and I don't have a great length of time to do it in, so I'm not going to participate extensively in the alterNEP discussion. If you want to help me, please review my code and provide specific feedback on my NEP (the code review system in github is great for this too, I've received some excellent feedback on the NEP that way). If you want to change my mind about things, please address the specific design decisions you think are problematic by specifically responding to lines in the NEP, as part of code-reviewing my pull request in github. Thanks, -Mark @Pierre - what do you think? > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From marc.labadens at gmail.com Fri Jul 1 10:45:47 2011 From: marc.labadens at gmail.com (Marc Labadens) Date: Fri, 1 Jul 2011 16:45:47 +0200 Subject: [Numpy-discussion] Using numpy array in C code ? Message-ID: Hello ! I am trying to interface some python code using numpy array with some C code. I have tried out this: - - - - - - - - Python code - - - - - - - - import numpy a = numpy.array([1.4, 2.4, 3.6], dtype=float) my_c_method(a) # call to the C code - - - - - - - - C code - - - - - - - - - - - - #include static PyObject * my_c_method(PyObject *self, PyObject *args) { double * points; points = malloc(sizeof(double)*3); //points = PyTuple_GET_ITEM(args, 3); // doesn't work PyArg_ParseTuple(args, "O&", points); // doesn't work either... printf("points[0] = %f \n",points[0]); // I want this to print points[0] = 1.4 } - - - - - - - - - - - - - - - - - - - - - - - - - - but it doesn't work... Does anyone knows how to use numpy array in some C code? Thanks a lot, Marc -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Jul 1 10:50:03 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 1 Jul 2011 15:50:03 +0100 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: Hi, On Fri, Jul 1, 2011 at 3:09 PM, Mark Wiebe wrote: > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett > wrote: >> >> Hi, >> >> On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman wrote: >> > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith wrote: >> >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett >> >> wrote: >> >>> In the interest of making the discussion as concrete as possible, here >> >>> is my draft of an alternative proposal for NAs and masking, based on >> >>> Nathaniel's comments. ?Writing it, it seemed to me that Nathaniel is >> >>> right, that the ideas become much clearer when the NA idea and the >> >>> MASK idea are separate. ? 
Please do pitch in for things I may have >> >>> missed or misunderstood: >> >> [...] >> >> >> >> Thanks for writing this up! I stuck it up as a gist so we can edit it >> >> more easily: >> >> ?https://gist.github.com/1056379/ >> >> This is your initial version: >> >> >> >> ?https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191 >> >> And I made a few changes: >> >> >> >> ?https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583 >> >> Specifically, I added a rationale section, changed np.MASKED to >> >> np.IGNORE (as per comments in this thread), and added a vowel to >> >> "propmsk". >> > >> > It might be helpful to make a small toy class in python so that people >> > can play around with NA and IGNORE from the alterNEP. >> >> Thanks for doing this. >> >> I don't know about you, but I don't know where to work on the >> discussion or draft implementation, because I am not sure where the >> disagreement is. ?Lluis has helpfully pointed out a specific case of >> interest. ? Pierre has fed back with some points of clarification. >> However, other than that, I'm not sure what we should be discussing. >> >> @Mark >> @Chuck >> @anyone >> >> Do you see problems with the alterNEP proposal? > > Yes, I really like my design as it stands now, and the alterNEP removes a > lot of the abstraction and interoperability that are in my opinion the best > parts. I've made more updates to the NEP based on continuing feedback, which > are part of the pull request I want reviews for. Ah - I think what you are saying is - too late I've started writing it. > Mainly: Reduced interoperability Meaning? > more complex implementation (leading to > more bugs), OK - but the discussion did not seem to be about the complexity of the implementation, but about the API. > and an unclear theoretical model for the masked part of i What's unclear? Or even different? >> Do you agree that the alterNEP proposal is easier to understand? > > > No. Do you agree that there are several people on the list who do thing that the alterNEP proposal is easier to understand? >> If?not, can you explain why? > > My answers to that are already scattered in the emails in various places, > and in the various rationales and justifications provided in the NEP. I can't see any reference to the alterNEP or the idea of the separate API in the NEP. Can you point me to it? >> What do you see as the important points of difference between the NEP >> and the alterNEP? > > The biggest thing is the NEP supports more use cases in a clean way by > composition of different simpler components. It defines one clear missing > data abstraction, and proposes two implementations that are interchangeable > and can interoperate. The alterNEP proposes two independent APIs, reducing > interoperability and so significantly increasing the amount of learning > required to work with both of them. This also precludes switching between > the two approaches without a lot of work. Lluis gave a particular somewhat obscure case where it is convenient that the NA and IGNORE are the same. Are there any others? It seems to me the API you propose is a classic example of implicit rather than explicit, and that it would be very easy, at this stage, to fix that. > The current pull request that's sitting there waiting for review does not > have an impact on which approach goes ahead, but the code I'm doing now > does. 
This is a fairly large project, and I don't have a great length of > time to do it in, so I'm not going to participate extensively in the > alterNEP discussion. If you want to help me, please review my code and > provide specific feedback on my NEP (the code review system in github is > great for this too, I've received some excellent feedback on the NEP that > way). If you want to change my mind about things, please address the > specific design decisions you think are problematic by specifically > responding to lines in the NEP, as part of code-reviewing my pull request in > github. OK - unless you tell me differently I'l take that as 'the discussion of the separate API for NA and IGNORE is over as far as I am concerned'. I would say, for future reference, that if there is a substantial and reasonable discussion of the API, that is not well resolved, then it does harm to go ahead and implement regardless. Specifically, it demoralizes those of us who put energy into trying to have a substantial reasoned discussion. I think that's bad for the list and bad for the community. See you, Matthew From pav at iki.fi Fri Jul 1 10:56:22 2011 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 1 Jul 2011 14:56:22 +0000 (UTC) Subject: [Numpy-discussion] Using numpy array in C code ? References: Message-ID: Hi, Fri, 01 Jul 2011 16:45:47 +0200, Marc Labadens wrote: > I am trying to interface some python code using numpy array with some C > code. You can read: http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html#writing-an-extension-module However, using Cython often saves you from writing boilerplate: http://cython.org/ From njs at pobox.com Fri Jul 1 11:15:50 2011 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 1 Jul 2011 08:15:50 -0700 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 7:09 AM, Mark Wiebe wrote: > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett > wrote: >> Do you see problems with the alterNEP proposal? > > Yes, I really like my design as it stands now, and the alterNEP removes a > lot of the abstraction and interoperability that are in my opinion the best > parts. I've made more updates to the NEP based on continuing feedback, which > are part of the pull request I want reviews for. > >> >> If so, what are they? > > Mainly: Reduced interoperability, more complex implementation (leading to > more bugs), and an unclear theoretical model for the masked part of it. Can you give any examples of situations where one would run into this "reduced interoperability"? I'm not sure what it means. The only person who has so far spoken up as needing both masking semantics and NA semantics -- Gary Strangman -- has said that he strongly prefers the alterNEP semantics *exactly because* it makes it clear *how these functions will interoperate.* Can you give any examples of how the implementation would be more complicated? As far as I can tell there are no elements in the alterNEP that are not in your NEP, they mostly just expose the functionality differently at the top level. Do you have a clearer theoretical model for the masked part of your proposal? The best I've been able to extract from any of your messages is when you wrote "it seems to me that people wanting masked arrays want missing data without touching their data". But as a matter of English grammar, I have no idea what this means -- if you have data, it's not missing! 
It seems to me that people wanting masked data want to *hide* parts of their data, which seems much clearer to me and is the theoretical model used in the alterNEP. Note that this model actually predicts several of the differences between how people want masks to work and how people want NAs to work (e.g., their behavior during reduction); I >> Do you agree that the alterNEP proposal is easier to understand? > > No. >> >> If?not, can you explain why? > > My answers to that are already scattered in the emails in various places, > and in the various rationales and justifications provided in the NEP. I understand the desire not to get caught up in spending all your time writing emails explaining things that you feel like you've already explained. Maybe there's an email I missed somewhere where you explain the conceptual model behind your NEP's semantics in a short, easy-to-understand way (comparable to, say, the Rationale section of the alterNEP). But I haven't seen it and I can't reconstruct a rationale for it myself (the alterNEP comes out of my attempts to do so!). >> What do you see as the important points of difference between the NEP >> and the alterNEP? > > The biggest thing is the NEP supports more use cases in a clean way by > composition of different simpler components. It defines one clear missing > data abstraction, and proposes two implementations that are interchangeable > and can interoperate. But the two implementations in your proposal are not interchangeable! The whole justification for starting with a masked-based implementation in your proposal is that it supports unmasking via views; if that requirement were removed, then there would be no reason to bother with the masking-based implementation at all. Well, that's not true. There are some marginal advantages in the special case of working with integers+NAs. But I don't think anyone's making that argument. > The alterNEP proposes two independent APIs, reducing > interoperability and so significantly increasing the amount of learning > required to work with both of them. This also precludes switching between > the two approaches without a lot of work. You can't switch between Python and C without a lot of work too, but that doesn't mean that they should be merged into one design... but they do complement each other beautifully. Just like missing data and masked arrays :-). > The current pull request that's sitting there waiting for review does not > have an impact on which approach goes ahead, but the code I'm doing now > does. This is a fairly large project, and I don't have a great length of > time to do it in, so I'm not going to participate extensively in the > alterNEP discussion. If you want to help me, please review my code and > provide specific feedback on my NEP (the code review system in github is > great for this too, I've received some excellent feedback on the NEP that > way). If you want to change my mind about things, please address the > specific design decisions you think are problematic by specifically > responding to lines in the NEP, as part of code-reviewing my pull request in > github. I know I'm being grumpy in this email, and I apologize for that. But, no. I've given extensive feedback, read the list carefully, and thought hard about these issues, and so far you've basically just dismissed my concerns. 
(See, e.g., [1], where your response to "we have to choose whether it's possible to recover data after it has been masked/NAed/whatever" is "no we don't, it should be both possible and impossible", which, I mean, what?) I've done my best to express them clearly, in the best way I know how -- and that way is *not* line by line comments on your NEP, because my concerns are more fundamental than that. I am of course happy to answer questions and such if there are places where I've been unclear. And of course it's your prerogative to decide how you want to spend your time (well, yours and your employer's, I guess), which forums you want to participate in, what code you want to write, etc. If you have decided that you are tired to talking about this and want to just go off and implement something, then good luck (and I do mean that, it isn't sarcasm). But as far as I can tell right now, every single person who has experience with handling missing data for statistical purposes (esp. in R) has real concerns about your proposal, and AFAICT the community has very much *not* reached consensus on how these features should look. So I guess my question is, once you've spent your limited time on writing this code -- how confident are you that it will be merged? This isn't a threat or anything, I have no power over what gets merged, but -- it seems to me that there's a real chance that you'll do this work and then it will go down in flames, or that it will be merged and then the people you're trying to target will ignore it anyway. This is why we try to build consensus first, right? I would love to find some way to make everyone happy (and have been doing what I can on that front), but right now I am not happy, other people are not happy, and you're communicating that you don't think that matters. I'd love for that to change. -- Nathaniel [1] http://mail.scipy.org/pipermail/numpy-discussion/2011-June/057274.html From 1989lzhh at gmail.com Fri Jul 1 11:22:13 2011 From: 1989lzhh at gmail.com (=?GB2312?B?wfXV8bqj?=) Date: Fri, 1 Jul 2011 23:22:13 +0800 Subject: [Numpy-discussion] Using numpy array in C code ? In-Reply-To: References: Message-ID: static PyObject * my_c_method(PyObject *self, PyObject *args) { PyArrayObject *array; double *points; PyArg_ParseTuple(args, "O", &array); points=(double*)array->data printf("points[0] = %f \n",points[0]); } It is should be like that. 2011/7/1 Pauli Virtanen > Hi, > > Fri, 01 Jul 2011 16:45:47 +0200, Marc Labadens wrote: > > I am trying to interface some python code using numpy array with some C > > code. > > You can read: > > > http://docs.scipy.org/doc/numpy/user/c-info.how-to-extend.html#writing-an-extension-module > > However, using Cython often saves you from writing boilerplate: > > http://cython.org/ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Fri Jul 1 11:29:06 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 01 Jul 2011 08:29:06 -0700 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: <4E0DE7C2.9020405@noaa.gov> Matthew Brett wrote: > should raise an error. 
On the other hand, if I make a normal array: > > arr = np.array([1.0, 2.0, 7.0]) > > and then do this: > > arr.visible[2] = False > > then either I should raise an error (it's not a masked array), or, > more magically, construct a mask on the fly. maybe it's too much Magic, but it seems reasonable to me that for an array without a mask, arr.visible[i] is simply True for all values of i -- no need to create a mask to determine that. does arr[i] = np.IGNORE auto-create a mask if there is not one there already? I think it should. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From mwwiebe at gmail.com Fri Jul 1 11:34:52 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 1 Jul 2011 10:34:52 -0500 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 9:50 AM, Matthew Brett wrote: > Hi, > > On Fri, Jul 1, 2011 at 3:09 PM, Mark Wiebe wrote: > > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman > wrote: > >> > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith > wrote: > >> >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett > >> >> wrote: > >> >>> In the interest of making the discussion as concrete as possible, > here > >> >>> is my draft of an alternative proposal for NAs and masking, based on > >> >>> Nathaniel's comments. Writing it, it seemed to me that Nathaniel is > >> >>> right, that the ideas become much clearer when the NA idea and the > >> >>> MASK idea are separate. Please do pitch in for things I may have > >> >>> missed or misunderstood: > >> >> [...] > >> >> > >> >> Thanks for writing this up! I stuck it up as a gist so we can edit it > >> >> more easily: > >> >> https://gist.github.com/1056379/ > >> >> This is your initial version: > >> >> > >> >> > https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191 > >> >> And I made a few changes: > >> >> > >> >> > https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583 > >> >> Specifically, I added a rationale section, changed np.MASKED to > >> >> np.IGNORE (as per comments in this thread), and added a vowel to > >> >> "propmsk". > >> > > >> > It might be helpful to make a small toy class in python so that people > >> > can play around with NA and IGNORE from the alterNEP. > >> > >> Thanks for doing this. > >> > >> I don't know about you, but I don't know where to work on the > >> discussion or draft implementation, because I am not sure where the > >> disagreement is. Lluis has helpfully pointed out a specific case of > >> interest. Pierre has fed back with some points of clarification. > >> However, other than that, I'm not sure what we should be discussing. > >> > >> @Mark > >> @Chuck > >> @anyone > >> > >> Do you see problems with the alterNEP proposal? > > > > Yes, I really like my design as it stands now, and the alterNEP removes a > > lot of the abstraction and interoperability that are in my opinion the > best > > parts. I've made more updates to the NEP based on continuing feedback, > which > > are part of the pull request I want reviews for. > > Ah - I think what you are saying is - too late I've started writing it. > Do you want me to spend my whole summer designing something before starting the implementation? 
I made a pull request implementing a non-controversial part of the NEP to get started, and I've not seen any feedback on except from Chuck and Derek. (Many thanks to Chuck and Derek!) Implementation and design are tied together in a feedback loop, and separate designs that aren't informed by the implementation details, for example information gained by going through the proposed code changes and reviewing them, are counterproductive. I appreciate the effort you're putting in, and I've been trying to guide you towards a more holistic path of contribution by pointing out the pull request. > Mainly: Reduced interoperability > > Meaning? > You can't switch between the two approaches without big changes in your code. > > > more complex implementation (leading to > > more bugs), > > OK - but the discussion did not seem to be about the complexity of the > implementation, but about the API. > The implementation always plays a role in the design of anything. Making an API design abstractly, then testing it against implementation constraints is good, making an API completely divorced from considerations of implementation is really really bad. > > > and an unclear theoretical model for the masked part of i > > What's unclear? Or even different? > After thinking about the missing data model some more, I've come up with more rationale for why the R approach is good, and adopting both the R default and skipna option is appropriate. It's in the pull request up for code review. > >> Do you agree that the alterNEP proposal is easier to understand? > > > > > > No. > > Do you agree that there are several people on the list who do thing > that the alterNEP proposal is easier to understand? > Feedback on the clarity of my writing in the NEP is welcome, if something is unclear to someone, please point out the specific part so I can continue to improve it. I don't think the clarity of the writing is a good reason for choosing one design or another, the quality of the design is what should decide that. > >> If not, can you explain why? > > > > My answers to that are already scattered in the emails in various places, > > and in the various rationales and justifications provided in the NEP. > > I can't see any reference to the alterNEP or the idea of the separate > API in the NEP. Can you point me to it? > I'm referring to positive arguments for why the design decisions are as they are. I don't see the alterNEP referencing specific things that are wrong with the NEP either, it just assumes sharing the API is a bad idea without making clearly stated arguments for or against it. >> What do you see as the important points of difference between the NEP > >> and the alterNEP? > > > > The biggest thing is the NEP supports more use cases in a clean way by > > composition of different simpler components. It defines one clear missing > > data abstraction, and proposes two implementations that are > interchangeable > > and can interoperate. The alterNEP proposes two independent APIs, > reducing > > interoperability and so significantly increasing the amount of learning > > required to work with both of them. This also precludes switching between > > the two approaches without a lot of work. > > Lluis gave a particular somewhat obscure case where it is convenient > that the NA and IGNORE are the same. Are there any others? It seems > to me the API you propose is a classic example of implicit rather than > explicit, and that it would be very easy, at this stage, to fix that. 
> And I came up with a nice way to deal with this situation through a subclass of ndarray changing the default 'skipna=' parameter value. The "implicit vs explicit" quote is overused, but even so I've applied the idea very carefully. In the NEP, you never get missing value support unless you explicitly request it. > The current pull request that's sitting there waiting for review does not > > have an impact on which approach goes ahead, but the code I'm doing now > > does. This is a fairly large project, and I don't have a great length of > > time to do it in, so I'm not going to participate extensively in the > > alterNEP discussion. If you want to help me, please review my code and > > provide specific feedback on my NEP (the code review system in github is > > great for this too, I've received some excellent feedback on the NEP that > > way). If you want to change my mind about things, please address the > > specific design decisions you think are problematic by specifically > > responding to lines in the NEP, as part of code-reviewing my pull request > in > > github. > > OK - unless you tell me differently I'l take that as 'the discussion > of the separate API for NA and IGNORE is over as far as I am > concerned'. > Yes, because I'm not seeing arguments responding with specific examples or use cases showing why a separate API is better, in particular which deal with the arguments I've given indicating why sharing the API is useful. I would say, for future reference, that if there is a substantial and > reasonable discussion of the API, that is not well resolved, then it > does harm to go ahead and implement regardless. Specifically, it > demoralizes those of us who put energy into trying to have a > substantial reasoned discussion. I think that's bad for the list and > bad for the community. > You might have consideration for morale of those who are putting substantial effort into designing and implementing it as well. The ecosystem is not just this mailing list, it also is the code and documentation review process on github, and when people who only participate on the mailing list are tearing apart carefully constructed designs based in part on some mischaracterizations of those designs, then expecting to be corrected each time instead of studying the proposed design to understand and compare it to their competing ideas, it's harder and harder to keep responding with corrections. I appreciate your feedback, the design for the NA bit pattern approach that is in the NEP is inspired by your feedback for wanting that style of NA functionality. Thanks, Mark > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
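(For readers trying to follow the skipna discussion: a rough illustration of the R-style reduction behaviour referred to above, using np.nan as a stand-in for NA since the proposed machinery does not exist yet. The function and its skipna keyword are invented for this example only; they are not part of either proposal's code.)

import numpy as np

def toy_sum(arr, skipna=False):
    """Toy reduction with R-like semantics; np.nan stands in for NA."""
    arr = np.asarray(arr, dtype=float)
    if skipna:
        return np.nansum(arr)   # missing values are skipped on request
    return arr.sum()            # default: the stand-in NA propagates

x = np.array([1.0, np.nan, 3.0])
print(toy_sum(x))               # nan -- R-style default, NA propagates
print(toy_sum(x, skipna=True))  # 4.0 -- explicit request to skip NAs

A subclass that wanted skipping by default could, as suggested above, simply change the default value of the skipna parameter.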
URL: From charlesr.harris at gmail.com Fri Jul 1 11:48:15 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 Jul 2011 09:48:15 -0600 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 9:34 AM, Mark Wiebe wrote: > On Fri, Jul 1, 2011 at 9:50 AM, Matthew Brett wrote: > >> Hi, >> >> On Fri, Jul 1, 2011 at 3:09 PM, Mark Wiebe wrote: >> > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett >> > wrote: >> >> >> >> Hi, >> >> >> >> On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman >> wrote: >> >> > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith >> wrote: >> >> >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett >> >> >> wrote: >> >> >>> In the interest of making the discussion as concrete as possible, >> here >> >> >>> is my draft of an alternative proposal for NAs and masking, based >> on >> >> >>> Nathaniel's comments. Writing it, it seemed to me that Nathaniel >> is >> >> >>> right, that the ideas become much clearer when the NA idea and the >> >> >>> MASK idea are separate. Please do pitch in for things I may have >> >> >>> missed or misunderstood: >> >> >> [...] >> >> >> >> >> >> Thanks for writing this up! I stuck it up as a gist so we can edit >> it >> >> >> more easily: >> >> >> https://gist.github.com/1056379/ >> >> >> This is your initial version: >> >> >> >> >> >> >> https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191 >> >> >> And I made a few changes: >> >> >> >> >> >> >> https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583 >> >> >> Specifically, I added a rationale section, changed np.MASKED to >> >> >> np.IGNORE (as per comments in this thread), and added a vowel to >> >> >> "propmsk". >> >> > >> >> > It might be helpful to make a small toy class in python so that >> people >> >> > can play around with NA and IGNORE from the alterNEP. >> >> >> >> Thanks for doing this. >> >> >> >> I don't know about you, but I don't know where to work on the >> >> discussion or draft implementation, because I am not sure where the >> >> disagreement is. Lluis has helpfully pointed out a specific case of >> >> interest. Pierre has fed back with some points of clarification. >> >> However, other than that, I'm not sure what we should be discussing. >> >> >> >> @Mark >> >> @Chuck >> >> @anyone >> >> >> >> Do you see problems with the alterNEP proposal? >> > >> > Yes, I really like my design as it stands now, and the alterNEP removes >> a >> > lot of the abstraction and interoperability that are in my opinion the >> best >> > parts. I've made more updates to the NEP based on continuing feedback, >> which >> > are part of the pull request I want reviews for. >> >> Ah - I think what you are saying is - too late I've started writing it. >> > > Do you want me to spend my whole summer designing something before starting > the implementation? I made a pull request implementing a > non-controversial part of the NEP to get started, and I've not seen any > feedback on except from Chuck and Derek. (Many thanks to Chuck and Derek!) > Implementation and design are tied together in a feedback loop, and separate > designs that aren't informed by the implementation details, for example > information gained by going through the proposed code changes and reviewing > them, are counterproductive. I appreciate the effort you're putting in, and > I've been trying to guide you towards a more holistic path of contribution > by pointing out the pull request. 
> > > Mainly: Reduced interoperability >> >> Meaning? >> > > You can't switch between the two approaches without big changes in your > code. > > >> >> > more complex implementation (leading to >> > more bugs), >> >> OK - but the discussion did not seem to be about the complexity of the >> implementation, but about the API. >> > > The implementation always plays a role in the design of anything. Making an > API design abstractly, then testing it against implementation constraints is > good, making an API completely divorced from considerations of > implementation is really really bad. > > >> >> > and an unclear theoretical model for the masked part of i >> >> What's unclear? Or even different? >> > > After thinking about the missing data model some more, I've come up with > more rationale for why the R approach is good, and adopting both the R > default and skipna option is appropriate. It's in the pull request up for > code review. > > >> >> Do you agree that the alterNEP proposal is easier to understand? >> > >> > >> > No. >> >> Do you agree that there are several people on the list who do thing >> that the alterNEP proposal is easier to understand? >> > > Feedback on the clarity of my writing in the NEP is welcome, if something > is unclear to someone, please point out the specific part so I can continue > to improve it. I don't think the clarity of the writing is a good reason for > choosing one design or another, the quality of the design is what should > decide that. > > >> >> If not, can you explain why? >> > >> > My answers to that are already scattered in the emails in various >> places, >> > and in the various rationales and justifications provided in the NEP. >> >> I can't see any reference to the alterNEP or the idea of the separate >> API in the NEP. Can you point me to it? >> > > I'm referring to positive arguments for why the design decisions are as > they are. I don't see the alterNEP referencing specific things that are > wrong with the NEP either, it just assumes sharing the API is a bad idea > without making clearly stated arguments for or against it. > > >> What do you see as the important points of difference between the NEP >> >> and the alterNEP? >> > >> > The biggest thing is the NEP supports more use cases in a clean way by >> > composition of different simpler components. It defines one clear >> missing >> > data abstraction, and proposes two implementations that are >> interchangeable >> > and can interoperate. The alterNEP proposes two independent APIs, >> reducing >> > interoperability and so significantly increasing the amount of learning >> > required to work with both of them. This also precludes switching >> between >> > the two approaches without a lot of work. >> >> Lluis gave a particular somewhat obscure case where it is convenient >> that the NA and IGNORE are the same. Are there any others? It seems >> to me the API you propose is a classic example of implicit rather than >> explicit, and that it would be very easy, at this stage, to fix that. >> > > And I came up with a nice way to deal with this situation through a > subclass of ndarray changing the default 'skipna=' parameter value. The > "implicit vs explicit" quote is overused, but even so I've applied the idea > very carefully. In the NEP, you never get missing value support unless you > explicitly request it. > > > The current pull request that's sitting there waiting for review does not >> > have an impact on which approach goes ahead, but the code I'm doing now >> > does. 
This is a fairly large project, and I don't have a great length of >> > time to do it in, so I'm not going to participate extensively in the >> > alterNEP discussion. If you want to help me, please review my code and >> > provide specific feedback on my NEP (the code review system in github is >> > great for this too, I've received some excellent feedback on the NEP >> that >> > way). If you want to change my mind about things, please address the >> > specific design decisions you think are problematic by specifically >> > responding to lines in the NEP, as part of code-reviewing my pull >> request in >> > github. >> >> OK - unless you tell me differently I'l take that as 'the discussion >> of the separate API for NA and IGNORE is over as far as I am >> concerned'. >> > > Yes, because I'm not seeing arguments responding with specific examples or > use cases showing why a separate API is better, in particular which deal > with the arguments I've given indicating why sharing the API is useful. > > I would say, for future reference, that if there is a substantial and >> reasonable discussion of the API, that is not well resolved, then it >> does harm to go ahead and implement regardless. Specifically, it >> demoralizes those of us who put energy into trying to have a >> substantial reasoned discussion. I think that's bad for the list and >> bad for the community. >> > > You might have consideration for morale of those who are putting > substantial effort into designing and implementing it as well. The ecosystem > is not just this mailing list, it also is the code and documentation review > process on github, and when people who only participate on the mailing list > are tearing apart carefully constructed designs based in part on some > mischaracterizations of those designs, then expecting to be corrected each > time instead of studying the proposed design to understand and compare it to > their competing ideas, it's harder and harder to keep responding with > corrections. > > I appreciate your feedback, the design for the NA bit pattern approach that > is in the NEP is inspired by your feedback for wanting that style of NA > functionality. > > Speaking for myself, at this point I'd rather have Mark writing code than getting sucked into a long thread about alternative designs. I think the point about getting more involved with the implementation review process is a good one. When we have a prototype to play with, then we can see if it is adequate to the needs of the various users and at that point feedback is essential. I expect Mark will be begging for people to try out the code at that point, both to find bugs and to improve the API. I hope you all rise to the occasion. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
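(Picking up Keith's earlier suggestion of a toy class to play with, here is a very rough, pure-Python sketch of the distinction the alterNEP draws between destructive NA assignment and non-destructive IGNORE masking. The sentinels and the ToyArray class are invented for illustration only -- this is not either proposal's implementation, just something to poke at while the real prototype is being written:)

class _Sentinel:
    def __init__(self, name):
        self.name = name
    def __repr__(self):
        return self.name

NA = _Sentinel("NA")          # stand-in for the proposed np.NA
IGNORE = _Sentinel("IGNORE")  # stand-in for the proposed np.IGNORE

class ToyArray:
    """Toy 1-D container for experimenting with NA vs IGNORE semantics."""
    def __init__(self, data):
        self.data = [float(x) for x in data]
        self.na = [False] * len(self.data)       # destructive: value is unknown
        self.ignored = [False] * len(self.data)  # non-destructive: value kept but hidden

    def __setitem__(self, i, value):
        if value is NA:
            self.na[i] = True
        elif value is IGNORE:
            self.ignored[i] = True
        else:
            self.data[i] = float(value)
            self.na[i] = self.ignored[i] = False

    def sum(self):
        # IGNORE'd elements are skipped; any remaining NA makes the result NA.
        kept = [(v, na) for v, na, ig in
                zip(self.data, self.na, self.ignored) if not ig]
        if any(na for _, na in kept):
            return NA
        return sum(v for v, _ in kept)

a = ToyArray([1.0, 2.0, 7.0])
a[1] = IGNORE
print(a.sum())   # 8.0 -- masked value skipped, original 2.0 still recoverable
a[2] = NA
print(a.sum())   # NA  -- missing value propagates through the reduction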
URL: From matthew.brett at gmail.com Fri Jul 1 12:00:00 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 1 Jul 2011 17:00:00 +0100 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: Hi, On Fri, Jul 1, 2011 at 4:34 PM, Mark Wiebe wrote: > On Fri, Jul 1, 2011 at 9:50 AM, Matthew Brett > wrote: >> >> Hi, >> >> On Fri, Jul 1, 2011 at 3:09 PM, Mark Wiebe wrote: >> > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett >> > wrote: >> >> >> >> Hi, >> >> >> >> On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman >> >> wrote: >> >> > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith >> >> > wrote: >> >> >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett >> >> >> wrote: >> >> >>> In the interest of making the discussion as concrete as possible, >> >> >>> here >> >> >>> is my draft of an alternative proposal for NAs and masking, based >> >> >>> on >> >> >>> Nathaniel's comments. ?Writing it, it seemed to me that Nathaniel >> >> >>> is >> >> >>> right, that the ideas become much clearer when the NA idea and the >> >> >>> MASK idea are separate. ? Please do pitch in for things I may have >> >> >>> missed or misunderstood: >> >> >> [...] >> >> >> >> >> >> Thanks for writing this up! I stuck it up as a gist so we can edit >> >> >> it >> >> >> more easily: >> >> >> ?https://gist.github.com/1056379/ >> >> >> This is your initial version: >> >> >> >> >> >> >> >> >> ?https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191 >> >> >> And I made a few changes: >> >> >> >> >> >> >> >> >> ?https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583 >> >> >> Specifically, I added a rationale section, changed np.MASKED to >> >> >> np.IGNORE (as per comments in this thread), and added a vowel to >> >> >> "propmsk". >> >> > >> >> > It might be helpful to make a small toy class in python so that >> >> > people >> >> > can play around with NA and IGNORE from the alterNEP. >> >> >> >> Thanks for doing this. >> >> >> >> I don't know about you, but I don't know where to work on the >> >> discussion or draft implementation, because I am not sure where the >> >> disagreement is. ?Lluis has helpfully pointed out a specific case of >> >> interest. ? Pierre has fed back with some points of clarification. >> >> However, other than that, I'm not sure what we should be discussing. >> >> >> >> @Mark >> >> @Chuck >> >> @anyone >> >> >> >> Do you see problems with the alterNEP proposal? >> > >> > Yes, I really like my design as it stands now, and the alterNEP removes >> > a >> > lot of the abstraction and interoperability that are in my opinion the >> > best >> > parts. I've made more updates to the NEP based on continuing feedback, >> > which >> > are part of the pull request I want reviews for. >> >> Ah - I think what you are saying is - too late I've started writing it. > > Do you want me to spend my whole summer designing something before starting > the implementation? No, but, this is an open source project. Hence it matters not only what gets written but how the decisions are made and quality of the discussion. Here what I see is that you lost interest in the discussion some time ago and stopped responding in any specific way. This unfortunately conveys a lack of interest in our views. That might not be true, in which case I'm sure you can convey the opposite with some substantial discsussion now. Or it might be for good reason, heaven knows I've been wrong enough times. 
But the community cost is high for the sake of an extra few days implementation time. Frankly I think the API will also suffer, but I'm less certain about that. > I made a pull request implementing a > non-controversial?part of the NEP to get started, and I've not seen any > feedback on except from Chuck and Derek. (Many thanks to Chuck and Derek!) > Implementation and design are tied together in a feedback loop, and separate > designs that aren't informed by the implementation details, for example > information gained by going through the proposed code changes and reviewing > them, are counterproductive. I appreciate the effort you're putting in, and > I've been trying to guide you towards a more holistic path of contribution > by pointing out the pull request. Holistic? You surely accept that code review is not the mechanism for high-level API decisions? >> > Mainly: Reduced interoperability >> >> Meaning? > > You can't switch between the two approaches without big changes in your > code. Lluis provided a case, and it was obscure. That switch seems like a rare or non-existent use-case that should not guide the API. >> >> > more complex implementation (leading to >> > more bugs), >> >> OK - but the discussion did not seem to be about the complexity of the >> implementation, but about the API. > > The implementation always plays a role in the design of anything. Making an > API design abstractly, then testing it against implementation constraints is > good, making an API completely divorced from considerations of > implementation is really really bad. Making major API decisions on the basis of implementation ease is also bad because it leads to a bad API and a bad API leads to confusion, and makes people use the feature less. You spent considerable energy trying to persuade us that we should not worry about the implementation, and that it was a detail. Now you are telling us that your chose the API for the implementation. All that is fine, but it is not fine to imply that the discussion of the API is a waste of your time. >> > and an unclear theoretical model for the masked part of i >> >> What's unclear? ?Or even different? > > After thinking about the missing data model some more, I've come up with > more rationale for why the R approach is good, and adopting both the R > default and skipna option is appropriate. It's in the pull request up for > code review. >> >> >> Do you agree that the alterNEP proposal is easier to understand? >> > >> > >> > No. >> >> Do you agree that there are several people on the list who do thing >> that the alterNEP proposal is easier to understand? > > Feedback on the clarity of my writing in the NEP is welcome, if something is > unclear to someone, please point out the specific part so I can continue to > improve it. I don't think the clarity of the writing is a good reason for > choosing one design or another, the quality of the design is what should > decide that. It's difficult for me not to feel you are deliberately misunderstanding me here. I don't mean the writing, I mean the API. >> >> If?not, can you explain why? >> > >> > My answers to that are already scattered in the emails in various >> > places, >> > and in the various rationales and justifications provided in the NEP. >> >> I can't see any reference to the alterNEP or the idea of the separate >> API in the NEP. ?Can you point me to it? > > I'm referring to positive arguments for why the design decisions are as they > are. 
I don't see the alterNEP referencing specific things that are wrong > with the NEP either, it just assumes sharing the API is a bad idea without > making clearly stated arguments for or against it. We've made that argument many times - that the masking use-case and the missing data use-case are separate, and imply different ufunc semantics, and different assignment semantics. You'll see the two ideas set out at the top of the aNEP, and Nathaniel has stated them clearly in his emails. >> >> What do you see as the important points of difference between the NEP >> >> and the alterNEP? >> > >> > The biggest thing is the NEP supports more use cases in a clean way by >> > composition of different simpler components. It defines one clear >> > missing >> > data abstraction, and proposes two implementations that are >> > interchangeable >> > and can interoperate. The alterNEP proposes two independent APIs, >> > reducing >> > interoperability and so significantly increasing the amount of learning >> > required to work with both of them. This also precludes switching >> > between >> > the two approaches without a lot of work. >> >> Lluis gave a particular somewhat obscure case where it is convenient >> that the NA and IGNORE are the same. Are there any others? It seems >> to me the API you propose is a classic example of implicit rather than >> explicit, and that it would be very easy, at this stage, to fix that. > And I came up with a nice way to deal with this situation through a subclass > of ndarray changing the default 'skipna=' parameter value. The "implicit vs > explicit" quote is overused, but even so I've applied the idea very > carefully. In the NEP, you never get missing value support unless you > explicitly request it. Explicit about NA rather than IGNORE >> > The current pull request that's sitting there waiting for review does >> > not >> > have an impact on which approach goes ahead, but the code I'm doing now >> > does. This is a fairly large project, and I don't have a great length of >> > time to do it in, so I'm not going to participate extensively in the >> > alterNEP discussion. If you want to help me, please review my code and >> > provide specific feedback on my NEP (the code review system in github is >> > great for this too, I've received some excellent feedback on the NEP >> > that >> > way). If you want to change my mind about things, please address the >> > specific design decisions you think are problematic by specifically >> > responding to lines in the NEP, as part of code-reviewing my pull >> > request in >> > github. >> >> OK - unless you tell me differently I'll take that as 'the discussion >> of the separate API for NA and IGNORE is over as far as I am >> concerned'. > Yes, because I'm not seeing arguments responding with specific examples or > use cases showing why a separate API is better, in particular which deal > with the arguments I've given indicating why sharing the API is useful. What are those arguments? Are they really restricted to Lluis' case? >> I would say, for future reference, that if there is a substantial and >> reasonable discussion of the API, that is not well resolved, then it >> does harm to go ahead and implement regardless. Specifically, it >> demoralizes those of us who put energy into trying to have a >> substantial reasoned discussion. I think that's bad for the list and >> bad for the community. > You might have consideration for morale of those who are putting substantial > effort into designing and implementing it as well.
The ecosystem is not just > this mailing list, it also is the code and documentation review process on > github, and when people who only participate on the mailing list are tearing > apart carefully constructed designs based in part on some > mischaracterizations of those designs, What are the mischaracterizations? > then expecting to be corrected each > time instead of studying the proposed design to understand and compare it to > their competing ideas, it's harder and harder to keep responding with > corrections. In what sense have we failed to compare your design to ours? Are you really saying that our proposal was a poorly done piece of work and hence not worth delaying for? > I appreciate your feedback, the design for the NA bit pattern approach that > is in the NEP is inspired by your feedback for wanting that style of NA > functionality. I'm glad it was useful, and sorry it was not more useful. Best, Matthew From matthew.brett at gmail.com Fri Jul 1 12:08:30 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 1 Jul 2011 17:08:30 +0100 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: Hi, On Fri, Jul 1, 2011 at 4:48 PM, Charles R Harris wrote: > > > On Fri, Jul 1, 2011 at 9:34 AM, Mark Wiebe wrote: >> >> On Fri, Jul 1, 2011 at 9:50 AM, Matthew Brett >> wrote: >>> >>> Hi, >>> >>> On Fri, Jul 1, 2011 at 3:09 PM, Mark Wiebe wrote: >>> > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett >>> > wrote: >>> >> >>> >> Hi, >>> >> >>> >> On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman >>> >> wrote: >>> >> > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith >>> >> > wrote: >>> >> >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett >>> >> >> wrote: >>> >> >>> In the interest of making the discussion as concrete as possible, >>> >> >>> here >>> >> >>> is my draft of an alternative proposal for NAs and masking, based >>> >> >>> on >>> >> >>> Nathaniel's comments. ?Writing it, it seemed to me that Nathaniel >>> >> >>> is >>> >> >>> right, that the ideas become much clearer when the NA idea and the >>> >> >>> MASK idea are separate. ? Please do pitch in for things I may have >>> >> >>> missed or misunderstood: >>> >> >> [...] >>> >> >> >>> >> >> Thanks for writing this up! I stuck it up as a gist so we can edit >>> >> >> it >>> >> >> more easily: >>> >> >> ?https://gist.github.com/1056379/ >>> >> >> This is your initial version: >>> >> >> >>> >> >> >>> >> >> ?https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191 >>> >> >> And I made a few changes: >>> >> >> >>> >> >> >>> >> >> ?https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583 >>> >> >> Specifically, I added a rationale section, changed np.MASKED to >>> >> >> np.IGNORE (as per comments in this thread), and added a vowel to >>> >> >> "propmsk". >>> >> > >>> >> > It might be helpful to make a small toy class in python so that >>> >> > people >>> >> > can play around with NA and IGNORE from the alterNEP. >>> >> >>> >> Thanks for doing this. >>> >> >>> >> I don't know about you, but I don't know where to work on the >>> >> discussion or draft implementation, because I am not sure where the >>> >> disagreement is. ?Lluis has helpfully pointed out a specific case of >>> >> interest. ? Pierre has fed back with some points of clarification. >>> >> However, other than that, I'm not sure what we should be discussing. >>> >> >>> >> @Mark >>> >> @Chuck >>> >> @anyone >>> >> >>> >> Do you see problems with the alterNEP proposal? 
>>> > >>> > Yes, I really like my design as it stands now, and the alterNEP removes >>> > a >>> > lot of the abstraction and interoperability that are in my opinion the >>> > best >>> > parts. I've made more updates to the NEP based on continuing feedback, >>> > which >>> > are part of the pull request I want reviews for. >>> >>> Ah - I think what you are saying is - too late I've started writing it. >> >> Do you want me to spend my whole summer designing something before >> starting the implementation? I made a pull request implementing a >> non-controversial?part of the NEP to get started, and I've not seen any >> feedback on except from Chuck and Derek. (Many thanks to Chuck and Derek!) >> Implementation and design are tied together in a feedback loop, and separate >> designs that aren't informed by the implementation details, for example >> information gained by going through the proposed code changes and reviewing >> them, are counterproductive. I appreciate the effort you're putting in, and >> I've been trying to guide you towards a more holistic path of contribution >> by pointing out the pull request. >>> >>> > Mainly: Reduced interoperability >>> >>> Meaning? >> >> You can't switch between the two approaches without big changes in your >> code. >> >>> >>> > more complex implementation (leading to >>> > more bugs), >>> >>> OK - but the discussion did not seem to be about the complexity of the >>> implementation, but about the API. >> >> The implementation always plays a role in the design of anything. Making >> an API design abstractly, then testing it against implementation constraints >> is good, making an API completely divorced from considerations of >> implementation is really really bad. >> >>> >>> > and an unclear theoretical model for the masked part of i >>> >>> What's unclear? ?Or even different? >> >> After thinking about the missing data model some more, I've come up with >> more rationale for why the R approach is good, and adopting both the R >> default and skipna option is appropriate. It's in the pull request up for >> code review. >> >>> >>> >> Do you agree that the alterNEP proposal is easier to understand? >>> > >>> > >>> > No. >>> >>> Do you agree that there are several people on the list who do thing >>> that the alterNEP proposal is easier to understand? >> >> Feedback on the clarity of my writing in the NEP is welcome, if something >> is unclear to someone, please point out the specific part so I can continue >> to improve it. I don't think the clarity of the writing is a good reason for >> choosing one design or another, the quality of the design is what should >> decide that. >> >>> >>> >> If?not, can you explain why? >>> > >>> > My answers to that are already scattered in the emails in various >>> > places, >>> > and in the various rationales and justifications provided in the NEP. >>> >>> I can't see any reference to the alterNEP or the idea of the separate >>> API in the NEP. ?Can you point me to it? >> >> I'm referring to positive arguments for why the design decisions are as >> they are. I don't see the alterNEP referencing specific things that are >> wrong with the NEP either, it just assumes sharing the API is a bad idea >> without making clearly stated arguments for or against it. >>> >>> >> What do you see as the important points of difference between the NEP >>> >> and the alterNEP? >>> > >>> > The biggest thing is the NEP supports more use cases in a clean way by >>> > composition of different simpler components. 
It defines one clear >>> > missing >>> > data abstraction, and proposes two implementations that are >>> > interchangeable >>> > and can interoperate. The alterNEP proposes two independent APIs, >>> > reducing >>> > interoperability and so significantly increasing the amount of learning >>> > required to work with both of them. This also precludes switching >>> > between >>> > the two approaches without a lot of work. >>> >>> Lluis gave a particular somewhat obscure case where it is convenient >>> that the NA and IGNORE are the same. ? Are there any others? ?It seems >>> to me the API you propose is a classic example of implicit rather than >>> explicit, and that it would be very easy, at this stage, to fix that. >> >> And I came up with a nice way to deal with this situation through a >> subclass of ndarray changing the default 'skipna=' parameter value. The >> "implicit vs explicit" quote is overused, but even so I've applied the idea >> very carefully. In the NEP, you never get missing value support unless you >> explicitly request it. >>> >>> > The current pull request that's sitting there waiting for review does >>> > not >>> > have an impact on which approach goes ahead, but the code I'm doing now >>> > does. This is a fairly large project, and I don't have a great length >>> > of >>> > time to do it in, so I'm not going to participate extensively in the >>> > alterNEP discussion. If you want to help me, please review my code and >>> > provide specific feedback on my NEP (the code review system in github >>> > is >>> > great for this too, I've received some excellent feedback on the NEP >>> > that >>> > way). If you want to change my mind about things, please address the >>> > specific design decisions you think are problematic by specifically >>> > responding to lines in the NEP, as part of code-reviewing my pull >>> > request in >>> > github. >>> >>> OK - unless you tell me differently I'l take that as 'the discussion >>> of the separate API for NA and IGNORE is over as far as I am >>> concerned'. >> >> Yes, because I'm not seeing arguments responding with specific examples or >> use cases showing why a separate API is better, in particular which deal >> with the arguments I've given indicating why sharing the API is useful. >>> >>> I would say, for future reference, that if there is a substantial and >>> reasonable discussion of the API, that is not well resolved, then it >>> does harm to go ahead and implement regardless. ?Specifically, it >>> demoralizes those of us who put energy into trying to have a >>> substantial reasoned discussion. ? I think that's bad for the list and >>> bad for the community. >> >> You might have consideration for morale of those who are putting >> substantial effort into designing and implementing it as well. The ecosystem >> is not just this mailing list, it also is the code and documentation review >> process on github, and when people who only participate on the mailing list >> are tearing apart carefully constructed designs based in part on some >> mischaracterizations of those designs, then expecting to be corrected each >> time instead of studying the proposed design to understand and compare it to >> their competing ideas, it's harder and harder to keep responding with >> corrections. >> I appreciate your feedback, the design for the NA bit pattern approach >> that is in the NEP is inspired by your feedback for wanting that style of NA >> functionality. 
> > Speaking for myself, at this point I'd rather have Mark writing code than > getting sucked into a long thread about alternative designs. I think the > point about getting more involved with the implementation review process is > a good one. When we have a prototype to play with, then we can see if it is > adequate to the needs of the various users and at that point feedback is > essential. I expect Mark will be begging for people to try out the code at > that point, both to find bugs and to improve the API. I hope you all rise to > the occasion. Continuing a discussion that started off-list - I would humbly ask that we avoid the more corporate 'get behind the team' mentality here. It's not the open-source way, and no-one enjoys it. I don't think anyone is in any doubt that Mark's work has the potential to be extremely useful and important here. That makes it all the more important that we discuss it fully without recourse to 'too much talking not enough typing'. Discussion is the root of good decisions [1]. We should value discussion and place it highly on our priorities. We all of us write real code here, and know when a draft is the next best step. Some of us believe we did not get there in this case. Best, Matthew [1] http://en.wikipedia.org/wiki/Alan_Brooke,_1st_Viscount_Alanbrooke#Relationship_with_Churchill From charlesr.harris at gmail.com Fri Jul 1 12:15:39 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 Jul 2011 10:15:39 -0600 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 10:00 AM, Matthew Brett wrote: > Hi, > > On Fri, Jul 1, 2011 at 4:34 PM, Mark Wiebe wrote: > > On Fri, Jul 1, 2011 at 9:50 AM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> On Fri, Jul 1, 2011 at 3:09 PM, Mark Wiebe wrote: > >> > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett < > matthew.brett at gmail.com> > >> > wrote: > >> >> > >> >> Hi, > >> >> > >> >> On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman > >> >> wrote: > >> >> > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith > >> >> > wrote: > >> >> >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett > >> >> >> wrote: > >> >> >>> In the interest of making the discussion as concrete as possible, > >> >> >>> here > >> >> >>> is my draft of an alternative proposal for NAs and masking, based > >> >> >>> on > >> >> >>> Nathaniel's comments. Writing it, it seemed to me that Nathaniel > >> >> >>> is > >> >> >>> right, that the ideas become much clearer when the NA idea and > the > >> >> >>> MASK idea are separate. Please do pitch in for things I may > have > >> >> >>> missed or misunderstood: > >> >> >> [...] > >> >> >> > >> >> >> Thanks for writing this up! I stuck it up as a gist so we can edit > >> >> >> it > >> >> >> more easily: > >> >> >> https://gist.github.com/1056379/ > >> >> >> This is your initial version: > >> >> >> > >> >> >> > >> >> >> > https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191 > >> >> >> And I made a few changes: > >> >> >> > >> >> >> > >> >> >> > https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583 > >> >> >> Specifically, I added a rationale section, changed np.MASKED to > >> >> >> np.IGNORE (as per comments in this thread), and added a vowel to > >> >> >> "propmsk". > >> >> > > >> >> > It might be helpful to make a small toy class in python so that > >> >> > people > >> >> > can play around with NA and IGNORE from the alterNEP. > >> >> > >> >> Thanks for doing this. 
> >> >> > >> >> I don't know about you, but I don't know where to work on the > >> >> discussion or draft implementation, because I am not sure where the > >> >> disagreement is. Lluis has helpfully pointed out a specific case of > >> >> interest. Pierre has fed back with some points of clarification. > >> >> However, other than that, I'm not sure what we should be discussing. > >> >> > >> >> @Mark > >> >> @Chuck > >> >> @anyone > >> >> > >> >> Do you see problems with the alterNEP proposal? > >> > > >> > Yes, I really like my design as it stands now, and the alterNEP > removes > >> > a > >> > lot of the abstraction and interoperability that are in my opinion the > >> > best > >> > parts. I've made more updates to the NEP based on continuing feedback, > >> > which > >> > are part of the pull request I want reviews for. > >> > >> Ah - I think what you are saying is - too late I've started writing it. > > > > Do you want me to spend my whole summer designing something before > starting > > the implementation? > > No, but, this is an open source project. Hence it matters not only > what gets written but how the decisions are made and quality of the > discussion. Here what I see is that you lost interest in the > discussion some time ago and stopped responding in any specific way. > This unfortunately conveys a lack of interest in our views. That > might not be true, in which case I'm sure you can convey the opposite > with some substantial discsussion now. Or it might be for good > reason, heaven knows I've been wrong enough times. But the community > cost is high for the sake of an extra few days implementation time. > Frankly I think the API will also suffer, but I'm less certain about > that. > What open source has trouble with isn't discussion, it's attracting active and competent developers. You should treat them as gifts from the $deity when they show up. If they are open and responsive to discussion, and I think Mark is, so much the better. Mind, you don't need to bow down and kiss their feet, but you should at least take the time to understand what they are doing so your criticisms and feedback are informed. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Fri Jul 1 12:17:51 2011 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 1 Jul 2011 11:17:51 -0500 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 11:00 AM, Matthew Brett wrote: > > > You can't switch between the two approaches without big changes in your > > code. > > > > Lluis provided a case, and it was obscure. That switch seems like a > rare or non-existent use-case that should not guide the API. > > Just to respond to this specific issue. In matplotlib, there are often constructs like the following: plot_something(X, Y, V) >From a module perspective, we have no clue about the nature of the input data. We often have to do things like np.asanyarray, np.atleast_2d and such to establish some base-level assumptions about the input data. Numpy currently makes this fairly cheap by not performing a copy if it is not needed. So far, so good. Next, some plotting functions needs to broadcast the arrays together (again, numpy makes that fairly cheap). Then, we need to figure out the common elements to plot. With something simple like plot(), this is straight-forward or-ing of any masks. 
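For concreteness, a rough sketch of that or-ing step with today's numpy.ma (X, Y and the sample values here are made up; the real matplotlib code is more involved):

import numpy as np
import numpy.ma as ma

# Made-up inputs: each array may carry its own mask and/or NaNs.
X = ma.masked_invalid([0.0, 1.0, np.nan, 3.0])
Y = ma.masked_array([10.0, 20.0, 30.0, 40.0], mask=[False, True, False, False])

# Or the masks together so a point hidden in either array is skipped in both.
common = ma.getmaskarray(X) | ma.getmaskarray(Y)

# Re-wrap the same data buffers with the shared mask; the caller's data is untouched.
Xp = ma.masked_array(ma.getdata(X), mask=common)
Yp = ma.masked_array(ma.getdata(Y), mask=common)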
Of course, right now, this is not cheap because we can't assume that the array supports masking semantics. This is where we either cast the arrays as masked arrays, or perform our own masking semantics. But, essentially, a point that was masked in X, may not be masked in Y and/or V, and we can not change the original data (or else we would be a bad tool). For more complicated functions like pcolor() and contour(), the arrays needs to know what the status of the neighboring points in itself, and for the other arrays. Again, either we use numpy.ma to share a common mask across the data arrays, or we implement our own semantics to deal with this. And again, we can not change any of the original data. This is not an obscure case. This is existing code in matplotlib. I will be evaluating the current missingdata branch later today to assess its suitability for use in matplotlib. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Jul 1 12:18:59 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 1 Jul 2011 17:18:59 +0100 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: Hi, On Fri, Jul 1, 2011 at 5:15 PM, Charles R Harris wrote: > > > On Fri, Jul 1, 2011 at 10:00 AM, Matthew Brett > wrote: >> >> Hi, >> >> On Fri, Jul 1, 2011 at 4:34 PM, Mark Wiebe wrote: >> > On Fri, Jul 1, 2011 at 9:50 AM, Matthew Brett >> > wrote: >> >> >> >> Hi, >> >> >> >> On Fri, Jul 1, 2011 at 3:09 PM, Mark Wiebe wrote: >> >> > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett >> >> > >> >> > wrote: >> >> >> >> >> >> Hi, >> >> >> >> >> >> On Fri, Jul 1, 2011 at 2:36 AM, Keith Goodman >> >> >> wrote: >> >> >> > On Thu, Jun 30, 2011 at 10:51 AM, Nathaniel Smith >> >> >> > wrote: >> >> >> >> On Thu, Jun 30, 2011 at 6:31 AM, Matthew Brett >> >> >> >> wrote: >> >> >> >>> In the interest of making the discussion as concrete as >> >> >> >>> possible, >> >> >> >>> here >> >> >> >>> is my draft of an alternative proposal for NAs and masking, >> >> >> >>> based >> >> >> >>> on >> >> >> >>> Nathaniel's comments. ?Writing it, it seemed to me that >> >> >> >>> Nathaniel >> >> >> >>> is >> >> >> >>> right, that the ideas become much clearer when the NA idea and >> >> >> >>> the >> >> >> >>> MASK idea are separate. ? Please do pitch in for things I may >> >> >> >>> have >> >> >> >>> missed or misunderstood: >> >> >> >> [...] >> >> >> >> >> >> >> >> Thanks for writing this up! I stuck it up as a gist so we can >> >> >> >> edit >> >> >> >> it >> >> >> >> more easily: >> >> >> >> ?https://gist.github.com/1056379/ >> >> >> >> This is your initial version: >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> ?https://gist.github.com/1056379/c809715f4e9765db72908c605468304ea1eb2191 >> >> >> >> And I made a few changes: >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> ?https://gist.github.com/1056379/33ba20300e1b72156c8fb655bd1ceef03f8a6583 >> >> >> >> Specifically, I added a rationale section, changed np.MASKED to >> >> >> >> np.IGNORE (as per comments in this thread), and added a vowel to >> >> >> >> "propmsk". >> >> >> > >> >> >> > It might be helpful to make a small toy class in python so that >> >> >> > people >> >> >> > can play around with NA and IGNORE from the alterNEP. >> >> >> >> >> >> Thanks for doing this. >> >> >> >> >> >> I don't know about you, but I don't know where to work on the >> >> >> discussion or draft implementation, because I am not sure where the >> >> >> disagreement is. 
?Lluis has helpfully pointed out a specific case of >> >> >> interest. ? Pierre has fed back with some points of clarification. >> >> >> However, other than that, I'm not sure what we should be discussing. >> >> >> >> >> >> @Mark >> >> >> @Chuck >> >> >> @anyone >> >> >> >> >> >> Do you see problems with the alterNEP proposal? >> >> > >> >> > Yes, I really like my design as it stands now, and the alterNEP >> >> > removes >> >> > a >> >> > lot of the abstraction and interoperability that are in my opinion >> >> > the >> >> > best >> >> > parts. I've made more updates to the NEP based on continuing >> >> > feedback, >> >> > which >> >> > are part of the pull request I want reviews for. >> >> >> >> Ah - I think what you are saying is - too late I've started writing it. >> > >> > Do you want me to spend my whole summer designing something before >> > starting >> > the implementation? >> >> No, but, this is an open source project. ?Hence it matters not only >> what gets written but how the decisions are made and quality of the >> discussion. ? Here what I see is that you lost interest in the >> discussion some time ago and stopped responding in any specific way. >> This unfortunately conveys a lack of interest in our views. ? That >> might not be true, in which case I'm sure you can convey the opposite >> with some substantial discsussion now. ?Or it might be for good >> reason, heaven knows I've been wrong enough times. ?But the community >> cost is high for the sake of an extra few days implementation time. >> Frankly I think the API will also suffer, but I'm less certain about >> that. > > What open source has trouble with isn't discussion, it's attracting active > and competent developers. You should treat them as gifts from the $deity > when they show up. If they are open and responsive to discussion, and I > think Mark is, so much the better. Mind, you don't need to bow down and kiss > their feet, but you should at least take the time to understand what they > are doing so your criticisms and feedback are informed. Are you now going to explain why you believe our criticisms and feedback are not well informed? See you, Matthew From bsouthey at gmail.com Fri Jul 1 12:18:40 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 01 Jul 2011 11:18:40 -0500 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: <4E0DF360.3090906@gmail.com> On 07/01/2011 10:15 AM, Nathaniel Smith wrote: > On Fri, Jul 1, 2011 at 7:09 AM, Mark Wiebe wrote: >> On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett >> wrote: >>> Do you see problems with the alterNEP proposal? >> Yes, I really like my design as it stands now, and the alterNEP removes a >> lot of the abstraction and interoperability that are in my opinion the best >> parts. I've made more updates to the NEP based on continuing feedback, which >> are part of the pull request I want reviews for. >> >>> If so, what are they? >> Mainly: Reduced interoperability, more complex implementation (leading to >> more bugs), and an unclear theoretical model for the masked part of it. > Can you give any examples of situations where one would run into this > "reduced interoperability"? I'm not sure what it means. 
The only > person who has so far spoken up as needing both masking semantics and > NA semantics -- Gary Strangman -- has said that he strongly prefers > the alterNEP semantics *exactly because* it makes it clear *how these > functions will interoperate.* > > Can you give any examples of how the implementation would be more > complicated? As far as I can tell there are no elements in the > alterNEP that are not in your NEP, they mostly just expose the > functionality differently at the top level. > > Do you have a clearer theoretical model for the masked part of your > proposal? The best I've been able to extract from any of your messages > is when you wrote "it seems to me that people wanting masked arrays > want missing data without touching their data". But as a matter of > English grammar, I have no idea what this means -- if you have data, > it's not missing! It seems to me that people wanting masked data want > to *hide* parts of their data, which seems much clearer to me and is > the theoretical model used in the alterNEP. Note that this model > actually predicts several of the differences between how people want > masks to work and how people want NAs to work (e.g., their behavior > during reduction); I > >>> Do you agree that the alterNEP proposal is easier to understand? >> No. >>> If not, can you explain why? >> My answers to that are already scattered in the emails in various places, >> and in the various rationales and justifications provided in the NEP. > I understand the desire not to get caught up in spending all your time > writing emails explaining things that you feel like you've already > explained. > > Maybe there's an email I missed somewhere where you explain the > conceptual model behind your NEP's semantics in a short, > easy-to-understand way (comparable to, say, the Rationale section of > the alterNEP). But I haven't seen it and I can't reconstruct a > rationale for it myself (the alterNEP comes out of my attempts to do > so!). > >>> What do you see as the important points of difference between the NEP >>> and the alterNEP? >> The biggest thing is the NEP supports more use cases in a clean way by >> composition of different simpler components. It defines one clear missing >> data abstraction, and proposes two implementations that are interchangeable >> and can interoperate. > But the two implementations in your proposal are not interchangeable! > The whole justification for starting with a masked-based > implementation in your proposal is that it supports unmasking via > views; if that requirement were removed, then there would be no reason > to bother with the masking-based implementation at all. > > Well, that's not true. There are some marginal advantages in the > special case of working with integers+NAs. But I don't think anyone's > making that argument. > >> The alterNEP proposes two independent APIs, reducing >> interoperability and so significantly increasing the amount of learning >> required to work with both of them. This also precludes switching between >> the two approaches without a lot of work. > You can't switch between Python and C without a lot of work too, but > that doesn't mean that they should be merged into one design... but > they do complement each other beautifully. Just like missing data and > masked arrays :-). > >> The current pull request that's sitting there waiting for review does not >> have an impact on which approach goes ahead, but the code I'm doing now >> does. 
This is a fairly large project, and I don't have a great length of >> time to do it in, so I'm not going to participate extensively in the >> alterNEP discussion. If you want to help me, please review my code and >> provide specific feedback on my NEP (the code review system in github is >> great for this too, I've received some excellent feedback on the NEP that >> way). If you want to change my mind about things, please address the >> specific design decisions you think are problematic by specifically >> responding to lines in the NEP, as part of code-reviewing my pull request in >> github. > I know I'm being grumpy in this email, and I apologize for that. But, > no. I've given extensive feedback, read the list carefully, and > thought hard about these issues, and so far you've basically just > dismissed my concerns. (See, e.g., [1], where your response to "we > have to choose whether it's possible to recover data after it has been > masked/NAed/whatever" is "no we don't, it should be both possible and > impossible", which, I mean, what?) I've done my best to express them > clearly, in the best way I know how -- and that way is *not* line by > line comments on your NEP, because my concerns are more fundamental > than that. > > I am of course happy to answer questions and such if there are places > where I've been unclear. > > And of course it's your prerogative to decide how you want to spend > your time (well, yours and your employer's, I guess), which forums you > want to participate in, what code you want to write, etc. If you have > decided that you are tired to talking about this and want to just go > off and implement something, then good luck (and I do mean that, it > isn't sarcasm). > > But as far as I can tell right now, every single person who has > experience with handling missing data for statistical purposes (esp. > in R) has real concerns about your proposal, and AFAICT the community > has very much *not* reached consensus on how these features should > look. So I guess my question is, once you've spent your limited time > on writing this code -- how confident are you that it will be merged? > This isn't a threat or anything, I have no power over what gets > merged, but -- it seems to me that there's a real chance that you'll > do this work and then it will go down in flames, or that it will be > merged and then the people you're trying to target will ignore it > anyway. This is why we try to build consensus first, right? I would > love to find some way to make everyone happy (and have been doing what > I can on that front), but right now I am not happy, other people are > not happy, and you're communicating that you don't think that matters. > I'd love for that to change. > > -- Nathaniel > > [1] http://mail.scipy.org/pipermail/numpy-discussion/2011-June/057274.html > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion I am sorry that that is NOT true - DON'T just lump every one into this when they have clearly stated the opposite! Missing values are nothing special to me, just reality. There are many statistical applications where masking is extremely common like outlier detection and flagging unusual observations (missing values is also masking). Just that you as a user have to do that yourself by creating and maintaining working variables. 
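A minimal sketch of that working-variable style, with made-up numbers - just a boolean flag array kept next to the data:

import numpy as np

x = np.array([9.8, 10.1, 55.0, 9.9, 10.3])   # made-up measurements
keep = np.abs(x - np.median(x)) < 5.0        # working variable flagging the outlier

print(x[keep].mean())                        # analysis uses only the retained values
# x itself is never modified; the flag array carries the masking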
I really find that you are 'splitting hairs' in your arguments as it really has to be up to the application on how missing values and NaN have to be handled. I see no difference between a missing value and a NaN because in virtually all statistical applications, both of these are dropped. This is what SAS typically does although certain procedure like FREQ allow you to treat missing values as 'valid'. R has slightly more flexibility since it differentiates missing valves and NaN. R allows you to decide how missing values are handled using arguments like na.rm or using na.fail, na.omit, na.exclude, na.pass functions. But I think for the majority of cases (I'm not an R guru), R acts the same way as, by default (which is how most people use R) R excludes missing values and NaN's. One of the problems I see here is that numpy has to work with a wide range of situations that neither R nor SAS or any other statistical-based language/application have to deal with. So you have suggest has to work for string, integer and data/time arrays. I generally agree with what Chuck has said. But I know that while we have little say in some of numpy, we can file tickets that actually get some action. It is also how times change as this missing value topic has way more interest than previous times it has been raised. So I think we are gradually getting some positive awareness. Bruce From matthew.brett at gmail.com Fri Jul 1 12:20:45 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 1 Jul 2011 17:20:45 +0100 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: Hi, On Fri, Jul 1, 2011 at 5:17 PM, Benjamin Root wrote: > > > On Fri, Jul 1, 2011 at 11:00 AM, Matthew Brett > wrote: >> >> > You can't switch between the two approaches without big changes in your >> > code. >> >> > >> Lluis provided a case, and it was obscure. ?That switch seems like a >> rare or non-existent use-case that should not guide the API. >> > > Just to respond to this specific issue. > > In matplotlib, there are often constructs like the following: > > plot_something(X, Y, V) > > From a module perspective, we have no clue about the nature of the input > data.? We often have to do things like np.asanyarray, np.atleast_2d and such > to establish some base-level assumptions about the input data.? Numpy > currently makes this fairly cheap by not performing a copy if it is not > needed.? So far, so good. > > Next, some plotting functions needs to broadcast the arrays together (again, > numpy makes that fairly cheap). > > Then, we need to figure out the common elements to plot.? With something > simple like plot(), this is straight-forward or-ing of any masks.? Of > course, right now, this is not cheap because we can't assume that the array > supports masking semantics.? This is where we either cast the arrays as > masked arrays, or perform our own masking semantics.? But, essentially, a > point that was masked in X, may not be masked in Y and/or V, and we can not > change the original data (or else we would be a bad tool). > > For more complicated functions like pcolor() and contour(), the arrays needs > to know what the status of the neighboring points in itself, and for the > other arrays.? Again, either we use numpy.ma to share a common mask across > the data arrays, or we implement our own semantics to deal with this.? And > again, we can not change any of the original data. > > This is not an obscure case.? This is existing code in matplotlib.? 
I will > be evaluating the current missingdata branch later today to assess its > suitability for use in matplotlib. I think I missed why your case needs NA and IGNORE to use the same API. Why can't you just use masks and IGNORE here? Best, Matthew From matthew.brett at gmail.com Fri Jul 1 12:24:48 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 1 Jul 2011 17:24:48 +0100 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: <4E0DF360.3090906@gmail.com> References: <4E0DF360.3090906@gmail.com> Message-ID: Hi, On Fri, Jul 1, 2011 at 5:18 PM, Bruce Southey wrote: > On 07/01/2011 10:15 AM, Nathaniel Smith wrote: > I really find that you are 'splitting hairs' in your arguments as it > really has to be up to the application on how missing values and NaN > have to be handled. I see no difference between a missing value and a > NaN because in virtually all statistical applications, both of these are > dropped. The argument is that NA and IGNORE are conceptually different and should have a separate API. That if you don't, it will be confusing. By default, in alterNEP, NAs propagate and masked values are ignored. If you want to treat them just the same, then that's an argument to your ufunc. Or use an 'isvalid' utility function. Do you have a concrete case where making NA and IGNORE the same thing in the API, gives some benefit? Best, Matthew From ben.root at ou.edu Fri Jul 1 12:29:11 2011 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 1 Jul 2011 11:29:11 -0500 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 11:20 AM, Matthew Brett wrote: > Hi, > > On Fri, Jul 1, 2011 at 5:17 PM, Benjamin Root wrote: > > > > > > On Fri, Jul 1, 2011 at 11:00 AM, Matthew Brett > > wrote: > >> > >> > You can't switch between the two approaches without big changes in > your > >> > code. > >> > >> > > >> Lluis provided a case, and it was obscure. That switch seems like a > >> rare or non-existent use-case that should not guide the API. > >> > > > > Just to respond to this specific issue. > > > > In matplotlib, there are often constructs like the following: > > > > plot_something(X, Y, V) > > > > From a module perspective, we have no clue about the nature of the input > > data. We often have to do things like np.asanyarray, np.atleast_2d and > such > > to establish some base-level assumptions about the input data. Numpy > > currently makes this fairly cheap by not performing a copy if it is not > > needed. So far, so good. > > > > Next, some plotting functions needs to broadcast the arrays together > (again, > > numpy makes that fairly cheap). > > > > Then, we need to figure out the common elements to plot. With something > > simple like plot(), this is straight-forward or-ing of any masks. Of > > course, right now, this is not cheap because we can't assume that the > array > > supports masking semantics. This is where we either cast the arrays as > > masked arrays, or perform our own masking semantics. But, essentially, a > > point that was masked in X, may not be masked in Y and/or V, and we can > not > > change the original data (or else we would be a bad tool). > > > > For more complicated functions like pcolor() and contour(), the arrays > needs > > to know what the status of the neighboring points in itself, and for the > > other arrays. Again, either we use numpy.ma to share a common mask > across > > the data arrays, or we implement our own semantics to deal with this. 
> And > > again, we can not change any of the original data. > > > > This is not an obscure case. This is existing code in matplotlib. I > will > > be evaluating the current missingdata branch later today to assess its > > suitability for use in matplotlib. > > I think I missed why your case needs NA and IGNORE to use the same > API. Why can't you just use masks and IGNORE here? > > Best, > > Matthew > The point is that matplotlib can not make assumptions about the nature of the input data. From matplotlib's perspective, NA's and IGNORE's are the same thing and should be treated the same way (i.e. - skipped). Right now, matplotlib's code is messy and inconsistent with its treatment of masked arrays and NaNs (some functions treat them the same, some only apply to NaNs and vice versa). This is because of code cruft over the years. If we had one interface to rule them all, we can bring *all* plotting functions to have similar handling code and be more consistent across the board. However, I think Mark's NEP provides a good way to distinguish between the cases when needed (but I have not examined it from that perspective yet). Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjordan1 at uw.edu Fri Jul 1 12:29:50 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Fri, 1 Jul 2011 11:29:50 -0500 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: This is kind of late to be jumping into the 'long thread of doom', but I've been following most of the posts, so I'd figured I'd throw in my 2 cents. I'm Mark's officemate over the summer, and we've been talking daily about his design. I was skeptical of various details at first, but by now Mark's largely sold me on his design. Though, FWIW, my background is largely statistical uses of arrays rather than scientific uses, so I grok missing data usage more naturally than masking. On Fri, Jul 1, 2011 at 10:15 AM, Nathaniel Smith wrote: > On Fri, Jul 1, 2011 at 7:09 AM, Mark Wiebe wrote: > > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett > > wrote: > >> Do you see problems with the alterNEP proposal? > > > > Yes, I really like my design as it stands now, and the alterNEP removes a > > lot of the abstraction and interoperability that are in my opinion the > best > > parts. I've made more updates to the NEP based on continuing feedback, > which > > are part of the pull request I want reviews for. > > > >> > >> If so, what are they? > > > > Mainly: Reduced interoperability, more complex implementation (leading to > > more bugs), and an unclear theoretical model for the masked part of it. > > Can you give any examples of situations where one would run into this > "reduced interoperability"? I'm not sure what it means. The only > person who has so far spoken up as needing both masking semantics and > NA semantics -- Gary Strangman -- has said that he strongly prefers > the alterNEP semantics *exactly because* it makes it clear *how these > functions will interoperate.* > > Can you give any examples of how the implementation would be more > complicated? As far as I can tell there are no elements in the > alterNEP that are not in your NEP, they mostly just expose the > functionality differently at the top level. > > Do you have a clearer theoretical model for the masked part of your > proposal? 
The best I've been able to extract from any of your messages > is when you wrote "it seems to me that people wanting masked arrays > want missing data without touching their data". But as a matter of > English grammar, I have no idea what this means -- if you have data, > it's not missing! It seems to me that people wanting masked data want > to *hide* parts of their data, which seems much clearer to me and is > the theoretical model used in the alterNEP. Note that this model > actually predicts several of the differences between how people want > masks to work and how people want NAs to work (e.g., their behavior > during reduction); I > > I looked over the theoretical mode in the aNEP, and I disagree with it. I think a masked array is just that: an array with a mask. Do whatever with the mask, but it's up to the user to decide how they want to use it. It doesn't seem like it has to come with a theoretical model. (Unlike missing data, which comes which does have a nice theoretical model.) The theoretical model in the aNEP seems to assume too much. I'm thinking in particular of this idea: "a length-4 array in which the last value has been masked out behaves just like an ordinary length-3 array, so long as you don't change the mask." That's forcing a notion of column/position independence on the masked array, in that any function operating on the rows must treat each column the same. And I'm don't think that's part of the contract that should come from creating a masked array. >> Do you agree that the alterNEP proposal is easier to understand? > > > > No. > >> > >> If not, can you explain why? > > > > My answers to that are already scattered in the emails in various places, > > and in the various rationales and justifications provided in the NEP. > > I understand the desire not to get caught up in spending all your time > writing emails explaining things that you feel like you've already > explained. > > Maybe there's an email I missed somewhere where you explain the > conceptual model behind your NEP's semantics in a short, > easy-to-understand way (comparable to, say, the Rationale section of > the alterNEP). But I haven't seen it and I can't reconstruct a > rationale for it myself (the alterNEP comes out of my attempts to do > so!). > > >> What do you see as the important points of difference between the NEP > >> and the alterNEP? > > > > The biggest thing is the NEP supports more use cases in a clean way by > > composition of different simpler components. It defines one clear missing > > data abstraction, and proposes two implementations that are > interchangeable > > and can interoperate. > > But the two implementations in your proposal are not interchangeable! > The whole justification for starting with a masked-based > implementation in your proposal is that it supports unmasking via > views; if that requirement were removed, then there would be no reason > to bother with the masking-based implementation at all. > > Well, that's not true. There are some marginal advantages in the > special case of working with integers+NAs. But I don't think anyone's > making that argument. > > > The alterNEP proposes two independent APIs, reducing > > interoperability and so significantly increasing the amount of learning > > required to work with both of them. This also precludes switching between > > the two approaches without a lot of work. > > You can't switch between Python and C without a lot of work too, but > that doesn't mean that they should be merged into one design... 
but > they do complement each other beautifully. Just like missing data and > masked arrays :-). > > > The current pull request that's sitting there waiting for review does not > > have an impact on which approach goes ahead, but the code I'm doing now > > does. This is a fairly large project, and I don't have a great length of > > time to do it in, so I'm not going to participate extensively in the > > alterNEP discussion. If you want to help me, please review my code and > > provide specific feedback on my NEP (the code review system in github is > > great for this too, I've received some excellent feedback on the NEP that > > way). If you want to change my mind about things, please address the > > specific design decisions you think are problematic by specifically > > responding to lines in the NEP, as part of code-reviewing my pull request > in > > github. > > I know I'm being grumpy in this email, and I apologize for that. But, > no. I've given extensive feedback, read the list carefully, and > thought hard about these issues, and so far you've basically just > dismissed my concerns. (See, e.g., [1], where your response to "we > have to choose whether it's possible to recover data after it has been > masked/NAed/whatever" is "no we don't, it should be both possible and > impossible", which, I mean, what?) I've done my best to express them > clearly, in the best way I know how -- and that way is *not* line by > line comments on your NEP, because my concerns are more fundamental > than that. > > I am of course happy to answer questions and such if there are places > where I've been unclear. > > And of course it's your prerogative to decide how you want to spend > your time (well, yours and your employer's, I guess), which forums you > want to participate in, what code you want to write, etc. If you have > decided that you are tired to talking about this and want to just go > off and implement something, then good luck (and I do mean that, it > isn't sarcasm). > > But as far as I can tell right now, every single person who has > experience with handling missing data for statistical purposes (esp. > in R) has real concerns about your proposal, and AFAICT the community > has very much *not* reached consensus on how these features should > look. So I guess my question is, once you've spent your limited time > on writing this code -- how confident are you that it will be merged? > This isn't a threat or anything, I have no power over what gets > merged, but -- it seems to me that there's a real chance that you'll > do this work and then it will go down in flames, or that it will be > merged and then the people you're trying to target will ignore it > anyway. This is why we try to build consensus first, right? I would > love to find some way to make everyone happy (and have been doing what > I can on that front), but right now I am not happy, other people are > not happy, and you're communicating that you don't think that matters. > I'd love for that to change. > I'm a statistics grad students and an R user, and I'm mostly ok with what Mark is doing. Currently, as I understand it, Mark is working on a structure that will make missing data into a first class citizen in the numpy world. This is great! Before it had been more of a 2nd class-citizen. And Mark is even trying to copy R semantics as much as possible. It's true that Mark's making it so the masked part of these new arrays won't be as front and center. The functionality will be there and it will be easy to used. 
But it will be based more on an explicit contract that the data memory contents of a masked array will not be overwritten when the data is masked. So I don't think Mark is making anything implicit--he's making a very explicit contract about how the data memory is handled when the mask is changed. If I understand correctly, it seems like the main objection to Mark's current API is that the explicit contract about data memory isn't somehow immediately visible in the API. It's true this is a trade-off, but it leads to a simpler API with easier ability to use all features at once at the pretty small cost of the user just having to read enough to realize that there's an explicit contract about what happens to the memory of a masked value, and they can access it by taking a view. That's easy enough to add at the very beginning of the documentation. -Chris JS > > -- Nathaniel > > [1] http://mail.scipy.org/pipermail/numpy-discussion/2011-June/057274.html > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Jul 1 13:14:27 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 1 Jul 2011 12:14:27 -0500 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 10:15 AM, Nathaniel Smith wrote: > On Fri, Jul 1, 2011 at 7:09 AM, Mark Wiebe wrote: > > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett > > wrote: > >> Do you see problems with the alterNEP proposal? > > > > Yes, I really like my design as it stands now, and the alterNEP removes a > > lot of the abstraction and interoperability that are in my opinion the > best > > parts. I've made more updates to the NEP based on continuing feedback, > which > > are part of the pull request I want reviews for. > > > >> > >> If so, what are they? > > > > Mainly: Reduced interoperability, more complex implementation (leading to > > more bugs), and an unclear theoretical model for the masked part of it. > > Can you give any examples of situations where one would run into this > "reduced interoperability"? I'm not sure what it means. The only > person who has so far spoken up as needing both masking semantics and > NA semantics -- Gary Strangman -- has said that he strongly prefers > the alterNEP semantics *exactly because* it makes it clear *how these > functions will interoperate.* > I've given examples before, but here are a few: 1) You're using NA dtypes. You realize you want multiple views of the same data with different choices of NA. You switch to masked arrays with a few lines of code changes. 2) You're using masks. You realize that you will save memory/disk space if you switch to NA dtypes, and it's possible because it turned out that while you thought you would need masking, you came up with a new algorithm that didn't require it. 3) You're writing matplotlib, and you want to support all forms of NA-style data. You write it once instead of twice. Repeat for all other open source libraries that want to do this. > Can you give any examples of how the implementation would be more > complicated? As far as I can tell there are no elements in the > alterNEP that are not in your NEP, they mostly just expose the > functionality differently at the top level. 
> If that is the case, then it should be easy to change to your model after the implementation is complete. I'm happy with that, these style of design choices are easier to make when you're comparing actual usage than hypotheticals. Do you have a clearer theoretical model for the masked part of your > proposal? Yes, exactly the same model used for NA dtypes. > The best I've been able to extract from any of your messages > is when you wrote "it seems to me that people wanting masked arrays > want missing data without touching their data". But as a matter of > English grammar, I have no idea what this means -- if you have data, > it's not missing! Ok, missing data-like functionality, which is provided by the solid theory behind the missing data. > It seems to me that people wanting masked data want > to *hide* parts of their data, which seems much clearer to me and is > the theoretical model used in the alterNEP. Once you've hidden it, isn't it now missing? > Note that this model > actually predicts several of the differences between how people want > masks to work and how people want NAs to work (e.g., their behavior > during reduction); I > > >> Do you agree that the alterNEP proposal is easier to understand? > > > > No. > >> > >> If not, can you explain why? > > > > My answers to that are already scattered in the emails in various places, > > and in the various rationales and justifications provided in the NEP. > > I understand the desire not to get caught up in spending all your time > writing emails explaining things that you feel like you've already > explained. > > Maybe there's an email I missed somewhere where you explain the > conceptual model behind your NEP's semantics in a short, > easy-to-understand way (comparable to, say, the Rationale section of > the alterNEP). But I haven't seen it and I can't reconstruct a > rationale for it myself (the alterNEP comes out of my attempts to do > so!). > I've been repeatedly updating the NEP. In particular this "round 2" email was an attempt to clarify between the two missing data models (what's being called NA and IGNORE), and the two implementation techniques (NA bit patterns and masks). I've argued that these are completely independent from each other. > >> What do you see as the important points of difference between the NEP > >> and the alterNEP? > > > > The biggest thing is the NEP supports more use cases in a clean way by > > composition of different simpler components. It defines one clear missing > > data abstraction, and proposes two implementations that are > interchangeable > > and can interoperate. > > But the two implementations in your proposal are not interchangeable! > The whole justification for starting with a masked-based > implementation in your proposal is that it supports unmasking via > views; if that requirement were removed, then there would be no reason > to bother with the masking-based implementation at all. > They are interchangeable 100% with regard to the missing data semantics. Views are an orthogonal feature, and it is through composition of these two features that the masks gain this power. > Well, that's not true. There are some marginal advantages in the > special case of working with integers+NAs. But I don't think anyone's > making that argument. > > > The alterNEP proposes two independent APIs, reducing > > interoperability and so significantly increasing the amount of learning > > required to work with both of them. This also precludes switching between > > the two approaches without a lot of work. 
> > You can't switch between Python and C without a lot of work too, but > that doesn't mean that they should be merged into one design... but > they do complement each other beautifully. Just like missing data and > masked arrays :-). > This last statement is why I feel like you haven't been reading my emails. I've clearly positioned masks as an implementation technique, not implying any specific semantics. > > > The current pull request that's sitting there waiting for review does not > > have an impact on which approach goes ahead, but the code I'm doing now > > does. This is a fairly large project, and I don't have a great length of > > time to do it in, so I'm not going to participate extensively in the > > alterNEP discussion. If you want to help me, please review my code and > > provide specific feedback on my NEP (the code review system in github is > > great for this too, I've received some excellent feedback on the NEP that > > way). If you want to change my mind about things, please address the > > specific design decisions you think are problematic by specifically > > responding to lines in the NEP, as part of code-reviewing my pull request > in > > github. > > I know I'm being grumpy in this email, and I apologize for that. But, > no. I've given extensive feedback, read the list carefully, and > thought hard about these issues, and so far you've basically just > dismissed my concerns. (See, e.g., [1], where your response to "we > have to choose whether it's possible to recover data after it has been > masked/NAed/whatever" is "no we don't, it should be both possible and > impossible", which, I mean, what?) I've done my best to express them > clearly, in the best way I know how -- and that way is *not* line by > line comments on your NEP, because my concerns are more fundamental > than that. > I've likewise read your emails carefully, and really appreciated that you jumped in right at the beginning with a good explanation of R's missing value semantics. I think line by line comments on the NEP expressing where the fundamental problems would help us communicate better. I've tried to tease apart the distinction between the missing value abstractions and the implementation techniques, and I haven't seen the fact that you read that reflected in your emails. If you have a good reason why implementing something with masks implies certain semantics, please explain, dealing with the points that I've laid out arguing for this design choice in the latest NEP, accessible via the pull request. I am of course happy to answer questions and such if there are places > where I've been unclear. > > And of course it's your prerogative to decide how you want to spend > your time (well, yours and your employer's, I guess), which forums you > want to participate in, what code you want to write, etc. If you have > decided that you are tired to talking about this and want to just go > off and implement something, then good luck (and I do mean that, it > isn't sarcasm). > I do want to constructively engage the community at the same time as I do the implementation, and I have a track record of producing good interfaces even when the underlying functionality is complex. I've had very positive feedback about einsum from people who deal with multiple arrays of multidimensional data and were missing an easy way to do that kind of operation. But as far as I can tell right now, every single person who has > experience with handling missing data for statistical purposes (esp. 
> in R) has real concerns about your proposal, and AFAICT the community > has very much *not* reached consensus on how these features should > look. So I guess my question is, once you've spent your limited time > on writing this code -- how confident are you that it will be merged? > This isn't a threat or anything, I have no power over what gets > merged, but -- it seems to me that there's a real chance that you'll > do this work and then it will go down in flames, or that it will be > merged and then the people you're trying to target will ignore it > anyway. This is why we try to build consensus first, right? I would > love to find some way to make everyone happy (and have been doing what > I can on that front), but right now I am not happy, other people are > not happy, and you're communicating that you don't think that matters. > I'd love for that to change. > Building consensus is general virtually impossible, I'm for example very impressed with the C++ standards committee's success in achieving it where they have. My development process is different from what you're describing, Like with datetime, I am merging periodically, not doing one big merge at the end. There's a reason why design by committee is frowned upon. The feedback is great, but still needs to go through a very strict software design quality filter. -Mark > > -- Nathaniel > > [1] http://mail.scipy.org/pipermail/numpy-discussion/2011-June/057274.html > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 1 13:22:30 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 Jul 2011 11:22:30 -0600 Subject: [Numpy-discussion] Current status of 64 bit windows support. Message-ID: Just curious as to what folks know about the current status of the free windows 64 bit compilers. I know things were dicey with gcc and gfortran some two years ago, but... well, two years have passed. This Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From xscript at gmx.net Fri Jul 1 13:39:01 2011 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Fri, 01 Jul 2011 19:39:01 +0200 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: (Matthew Brett's message of "Fri, 1 Jul 2011 17:00:00 +0100") References: Message-ID: <87liwhew96.fsf@ginnungagap.bsc.es> Matthew Brett writes: >>> > Mainly: Reduced interoperability >>> >>> Meaning? >> >> You can't switch between the two approaches without big changes in your >> code. > Lluis provided a case, and it was obscure. That switch seems like a > rare or non-existent use-case that should not guide the API. The example was for an outlier detection *in-place*. I see the merged API as beneficial in cases where: * There are arguments used both as input *and* output (w.r.t. missing data information), and it is up to the *caller* to decide whether to also maintain the original data. That is, with a merged API, the caller can retain a "copy" - a view in fact - of its original data more efficiently. In the matplotlib case, the outlier detection caller might decide to pass a brand new array copy, so then the outlier detection is then implemented using np.NA (as they are both developed inside the same framework). 
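To make the in-place case concrete, here is a minimal sketch of such an
outlier detector under the NEP's proposed semantics. (np.isna and
assignment of np.NA are taken from the NEP draft, not from any released
numpy; the function name and the 3-sigma rule are only for illustration.)

    import numpy as np

    def flag_outliers(data, nsigma=3.0):
        # `data` is 1-d and may be an NA-dtype array or an array with its
        # mask activated -- the detector does not need to know which.
        good = ~np.isna(data)            # np.isna is the NEP's proposed query
        vals = data[good]
        mu, sigma = vals.mean(), vals.std()
        idx = np.flatnonzero(good)
        # Assign np.NA in place; whether the original values survive depends
        # only on what the *caller* passed in (a masked view keeps them, an
        # NA-dtype array does not).
        data[idx[np.abs(vals - mu) > nsigma * sigma]] = np.NA
        return data
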
But it may also be the case that later on, the developer decides to rewrite the caller function (for whatever reason, like avoiding a full copy of the array) as passing an array with masking activated. With the merged API the outlier detection will still work perfectly. With np.IGNORE the outlier detection code should also be changed. This is what Mark talks about when saying "interoperability", and it is a good choice from the point of view of code maintenance. * Propagation of np.NA and np.IGNORE are controlled with a single argument (thus simpler and less error-prone code), as opposed to two separate arguments and two possible outcomes (np.NA and np.IGNORE) with aNEP. I have been repeating these 2 points again and again, and I still feel they have not yet been addressed by the aNEP. Still, the only clear statement I've seen in favour of the aNEP is minimizing "surprises". And I will repeat it again. You have to *explicitly* "activate" masks, just as well as you *explicitly* use np.IGNORE, so it should not surprise you when you see a mask-like behaviour, precisely because you have asked for it. If you don't want that behaviour, you simply don't activate masks. Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From xscript at gmx.net Fri Jul 1 13:47:16 2011 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Fri, 01 Jul 2011 19:47:16 +0200 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: (Nathaniel Smith's message of "Fri, 1 Jul 2011 08:15:50 -0700") References: Message-ID: <87zkkxdhaz.fsf@ginnungagap.bsc.es> Nathaniel Smith writes: > On Fri, Jul 1, 2011 at 7:09 AM, Mark Wiebe wrote: >> On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett >> wrote: >>> Do you see problems with the alterNEP proposal? >> >> Yes, I really like my design as it stands now, and the alterNEP removes a >> lot of the abstraction and interoperability that are in my opinion the best >> parts. I've made more updates to the NEP based on continuing feedback, which >> are part of the pull request I want reviews for. >> >>> >>> If so, what are they? >> >> Mainly: Reduced interoperability, more complex implementation (leading to >> more bugs), and an unclear theoretical model for the masked part of it. > Can you give any examples of situations where one would run into this > "reduced interoperability"? I'm not sure what it means. The only > person who has so far spoken up as needing both masking semantics and > NA semantics -- Gary Strangman -- has said that he strongly prefers > the alterNEP semantics *exactly because* it makes it clear *how these > functions will interoperate.* Interoperability improves code maintenance, see my other mail. [...] > Do you have a clearer theoretical model for the masked part of your > proposal? The best I've been able to extract from any of your messages > is when you wrote "it seems to me that people wanting masked arrays > want missing data without touching their data". But as a matter of > English grammar, I have no idea what this means -- if you have data, > it's not missing! It seems to me that people wanting masked data want > to *hide* parts of their data, which seems much clearer to me and is > the theoretical model used in the alterNEP. 
Note that this model > actually predicts several of the differences between how people want > masks to work and how people want NAs to work (e.g., their behavior > during reduction); I Come on, let's not jump into each other's throats, I think we've long ago arrived at a point where we all know what masked means. If you agree on the interoperability point, then I don't see how the aNEP improves on that, having in mind that masks must be *explicitly* activated (again, see the other mail). [...] > Well, that's not true. There are some marginal advantages in the > special case of working with integers+NAs. But I don't think anyone's > making that argument. I for one would love that, instead of having to explicitly set dtypes when using genfromtxt. [...] > But as far as I can tell right now, every single person who has > experience with handling missing data for statistical purposes (esp. > in R) has real concerns about your proposal, and AFAICT the community > has very much *not* reached consensus on how these features should > look. What I have seen is that people used to R see the mask concept as an alien, and said "I don't want to use it, so please make it more explicit so that I will know what to avoid". What I say is that you simply don't have to make np.IGNORE explicit to avoid masks. Simply do not create arrays with masks. Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From jsseabold at gmail.com Fri Jul 1 13:59:43 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 1 Jul 2011 13:59:43 -0400 Subject: [Numpy-discussion] Moving lib.recfunctions? Message-ID: lib.recfunctions has never been fully advertised. The two bugs I just discovered lead me to believe that it's not that well vetted, but it is useful. I can't be the only one using these? What do people think of either deprecating lib.recfunctions or at least importing them into the numpy.rec namespace? I'm sure this has come up before, but gmane search isn't working for me. Skipper From pgmdevlist at gmail.com Fri Jul 1 14:17:00 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 1 Jul 2011 20:17:00 +0200 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 Message-ID: On Jul 1, 2011 7:14 PM, "Mark Wiebe" wrote: > > On Fri, Jul 1, 2011 at 10:15 AM, Nathaniel Smith wrote: >> >> On Fri, Jul 1, 2011 at 7:09 AM, Mark Wiebe wrote: >> > On Fri, Jul 1, 2011 at 6:58 AM, Matthew Brett >> > wrote: >> >> Do you see problems with the alterNEP proposal? >> > >> > Yes, I really like my design as it stands now, and the alterNEP removes a >> > lot of the abstraction and interoperability that are in my opinion the best >> > parts. I've made more updates to the NEP based on continuing feedback, which >> > are part of the pull request I want reviews for. >> > >> >> >> >> If so, what are they? >> > >> > Mainly: Reduced interoperability, more complex implementation (leading to >> > more bugs), and an unclear theoretical model for the masked part of it. >> >> Can you give any examples of situations where one would run into this >> "reduced interoperability"? I'm not sure what it means. 
The only >> person who has so far spoken up as needing both masking semantics and >> NA semantics -- Gary Strangman -- has said that he strongly prefers >> the alterNEP semantics *exactly because* it makes it clear *how these >> functions will interoperate.* > > > I've given examples before, but here are a few: > > 1) You're using NA dtypes. You realize you want multiple views of the same data with different choices of NA. You switch to masked arrays with a few lines of code changes. Multiple NAs? AFAIU, there's only one NA (per type) but several choices to allocate a IGNORE depending on the situation. > 2) You're using masks. You realize that you will save memory/disk space if you switch to NA dtypes, and it's possible because it turned out that while you thought you would need masking, you came up with a new algorithm that didn't require it. Ok, your IGNOREs become N'as because you want to... > 3) You're writing matplotlib, and you want to support all forms of NA-style data. You write it once instead of twice. Repeat for all other open source libraries that want to do this. You switch your NAs to IGNOREs, ok, your call again. > >> >> Can you give any examples of how the implementation would be more >> complicated? As far as I can tell there are no elements in the >> alterNEP that are not in your NEP, they mostly just expose the >> functionality differently at the top level. > > > If that is the case, then it should be easy to change to your model after the implementation is complete. I'm happy with that, these style of design choices are easier to make when you're comparing actual usage than hypotheticals. > >> Do you have a clearer theoretical model for the masked part of your >> proposal? > > > Yes, exactly the same model used for NA dtypes. > >> >> The best I've been able to extract from any of your messages >> is when you wrote "it seems to me that people wanting masked arrays >> want missing data without touching their data". But as a matter of >> English grammar, I have no idea what this means -- if you have data, >> it's not missing! > > > Ok, missing data-like functionality, which is provided by the solid theory behind the missing data. Which is a subset of 'masked/to ignore' data... >> >> It seems to me that people wanting masked data want >> to *hide* parts of their data, which seems much clearer to me and is >> the theoretical model used in the alterNEP. > > > Once you've hidden it, isn't it now missing? Only temporarily, you can revert to not hidden when needed. If a data is flagged as NA, it should never be accessible again. >> >> Note that this model >> actually predicts several of the differences between how people want >> masks to work and how people want NAs to work (e.g., their behavior >> during reduction); I > > >> >> >> Do you agree that the alterNEP proposal is easier to understand? >> > >> > No. >> >> >> >> If not, can you explain why? >> > >> > My answers to that are already scattered in the emails in various places, >> > and in the various rationales and justifications provided in the NEP. >> >> I understand the desire not to get caught up in spending all your time >> writing emails explaining things that you feel like you've already >> explained. >> >> Maybe there's an email I missed somewhere where you explain the >> conceptual model behind your NEP's semantics in a short, >> easy-to-understand way (comparable to, say, the Rationale section of >> the alterNEP). 
But I haven't seen it and I can't reconstruct a >> rationale for it myself (the alterNEP comes out of my attempts to do >> so!). > > > I've been repeatedly updating the NEP. In particular this "round 2" email was an attempt to clarify between the two missing data models (what's being called NA and IGNORE), and the two implementation techniques (NA bit patterns and masks). I've argued that these are completely independent from each other. > >> >> >> What do you see as the important points of difference between the NEP >> >> and the alterNEP? >> > >> > The biggest thing is the NEP supports more use cases in a clean way by >> > composition of different simpler components. It defines one clear missing >> > data abstraction, and proposes two implementations that are interchangeable >> > and can interoperate. >> >> But the two implementations in your proposal are not interchangeable! >> The whole justification for starting with a masked-based >> implementation in your proposal is that it supports unmasking via >> views; if that requirement were removed, then there would be no reason >> to bother with the masking-based implementation at all. > > > They are interchangeable 100% with regard to the missing data semantics. Views are an orthogonal feature, and it is through composition of these two features that the masks gain this power. I'll check your code, but conceptually, NAs and IGNOREs are NOT interchangeable. >> >> Well, that's not true. There are some marginal advantages in the >> special case of working with integers+NAs. But I don't think anyone's >> making that argument. >> >> > The alterNEP proposes two independent APIs, reducing >> > interoperability and so significantly increasing the amount of learning >> > required to work with both of them. This also precludes switching between >> > the two approaches without a lot of work. >> >> You can't switch between Python and C without a lot of work too, but >> that doesn't mean that they should be merged into one design... but >> they do complement each other beautifully. Just like missing data and >> masked arrays :-). > > > This last statement is why I feel like you haven't been reading my emails. I've clearly positioned masks as an implementation technique, not implying any specific semantics. > >> >> >> > The current pull request that's sitting there waiting for review does not >> > have an impact on which approach goes ahead, but the code I'm doing now >> > does. This is a fairly large project, and I don't have a great length of >> > time to do it in, so I'm not going to participate extensively in the >> > alterNEP discussion. If you want to help me, please review my code and >> > provide specific feedback on my NEP (the code review system in github is >> > great for this too, I've received some excellent feedback on the NEP that >> > way). If you want to change my mind about things, please address the >> > specific design decisions you think are problematic by specifically >> > responding to lines in the NEP, as part of code-reviewing my pull request in >> > github. >> >> I know I'm being grumpy in this email, and I apologize for that. But, >> no. I've given extensive feedback, read the list carefully, and >> thought hard about these issues, and so far you've basically just >> dismissed my concerns. (See, e.g., [1], where your response to "we >> have to choose whether it's possible to recover data after it has been >> masked/NAed/whatever" is "no we don't, it should be both possible and >> impossible", which, I mean, what?) 
I've done my best to express them >> clearly, in the best way I know how -- and that way is *not* line by >> line comments on your NEP, because my concerns are more fundamental >> than that. > > > I've likewise read your emails carefully, and really appreciated that you jumped in right at the beginning with a good explanation of R's missing value semantics. I think line by line comments on the NEP expressing where the fundamental problems would help us communicate better. I've tried to tease apart the distinction between the missing value abstractions and the implementation techniques, and I haven't seen the fact that you read that reflected in your emails. If you have a good reason why implementing something with masks implies certain semantics, please explain, dealing with the points that I've laid out arguing for this design choice in the latest NEP, accessible via the pull request. > >> I am of course happy to answer questions and such if there are places >> where I've been unclear. >> >> And of course it's your prerogative to decide how you want to spend >> your time (well, yours and your employer's, I guess), which forums you >> want to participate in, what code you want to write, etc. If you have >> decided that you are tired to talking about this and want to just go >> off and implement something, then good luck (and I do mean that, it >> isn't sarcasm). > > > I do want to constructively engage the community at the same time as I do the implementation, and I have a track record of producing good interfaces even when the underlying functionality is complex. I've had very positive feedback about einsum from people who deal with multiple arrays of multidimensional data and were missing an easy way to do that kind of operation. > >> But as far as I can tell right now, every single person who has >> experience with handling missing data for statistical purposes (esp. >> in R) has real concerns about your proposal, and AFAICT the community >> has very much *not* reached consensus on how these features should >> look. So I guess my question is, once you've spent your limited time >> on writing this code -- how confident are you that it will be merged? >> This isn't a threat or anything, I have no power over what gets >> merged, but -- it seems to me that there's a real chance that you'll >> do this work and then it will go down in flames, or that it will be >> merged and then the people you're trying to target will ignore it >> anyway. This is why we try to build consensus first, right? I would >> love to find some way to make everyone happy (and have been doing what >> I can on that front), but right now I am not happy, other people are >> not happy, and you're communicating that you don't think that matters. >> I'd love for that to change. > > > Building consensus is general virtually impossible, I'm for example very impressed with the C++ standards committee's success in achieving it where they have. My development process is different from what you're describing, Like with datetime, I am merging periodically, not doing one big merge at the end. There's a reason why design by committee is frowned upon. The feedback is great, but still needs to go through a very strict software design quality filter. 
> > -Mark > >> >> >> -- Nathaniel >> >> [1] http://mail.scipy.org/pipermail/numpy-discussion/2011-June/057274.html >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Jul 1 14:22:20 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 1 Jul 2011 14:22:20 -0400 Subject: [Numpy-discussion] Moving lib.recfunctions? In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 1:59 PM, Skipper Seabold wrote: > lib.recfunctions has never been fully advertised. The two bugs I just > discovered lead me to believe that it's not that well vetted, but it > is useful. I can't be the only one using these? > > What do people think of either deprecating lib.recfunctions or at > least importing them into the numpy.rec namespace? > > I'm sure this has come up before, but gmane search isn't working for me. about once a year http://old.nabble.com/Emulate-left-outer-join--td27522655.html#a27522655 my guess is not much has changed since then Josef > > Skipper > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sturla at molden.no Fri Jul 1 14:23:55 2011 From: sturla at molden.no (Sturla Molden) Date: Fri, 01 Jul 2011 20:23:55 +0200 Subject: [Numpy-discussion] Current status of 64 bit windows support. In-Reply-To: References: Message-ID: <4E0E10BB.6040000@molden.no> Den 01.07.2011 19:22, skrev Charles R Harris: > Just curious as to what folks know about the current status of the > free windows 64 bit compilers. I know things were dicey with gcc and > gfortran some two years ago, but... well, two years have passed. This Windows 7 SDK is free (as in beer). It is the C compiler used to build Python on Windows 64. Here is the download: http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=3138 A newer version of the Windows SDK will use a C compiler that links with a different CRT than Python uses. Use version 3.5. When using this compiler, remember to set the environment variable DISTUTILS_USE_SDK. This should be sufficient to build NumPy. AFAIK only SciPy requires a Fortran compiler. Mingw is still not stabile on Windows 64. There are supposedly compatibility issues between the MinGW runtime used by libgfortran and Python's CRT. While there are experimental MinGW builds for Windows 64 (e.g. TDM-GCC), we will probably need to build libgfortran against another C runtime for SciPy. A commercial Fortran compiler compatible with MSVC is recommended for SciPy, e.g. Intel, Absoft or Portland. Sturla From jsseabold at gmail.com Fri Jul 1 14:32:54 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 1 Jul 2011 14:32:54 -0400 Subject: [Numpy-discussion] Moving lib.recfunctions? In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 2:22 PM, wrote: > On Fri, Jul 1, 2011 at 1:59 PM, Skipper Seabold wrote: >> lib.recfunctions has never been fully advertised. The two bugs I just >> discovered lead me to believe that it's not that well vetted, but it >> is useful. I can't be the only one using these? 
>> >> What do people think of either deprecating lib.recfunctions or at >> least importing them into the numpy.rec namespace? >> >> I'm sure this has come up before, but gmane search isn't working for me. > > about once a year > > http://old.nabble.com/Emulate-left-outer-join--td27522655.html#a27522655 > > my guess is not much has changed since then > Ah, yes. I recall now. I agree that they're more general than rec, but also don't have a first best solution for this. So I think we should move them (in a correct way) to numpy.rec and add (some of?) them as methods to recarrays. The best we can do beyond that is put some docs on the structured array page and notes in the docstrings that they also work for ndarrays with structured dtype. I'll submit a pull request soon and maybe that'll generate some interest. Skipper From ben.root at ou.edu Fri Jul 1 15:05:45 2011 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 1 Jul 2011 14:05:45 -0500 Subject: [Numpy-discussion] Moving lib.recfunctions? In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 12:59 PM, Skipper Seabold wrote: > lib.recfunctions has never been fully advertised. The two bugs I just > discovered lead me to believe that it's not that well vetted, but it > is useful. I can't be the only one using these? > > Nope, you aren't the only one. I use them in my code. > What do people think of either deprecating lib.recfunctions or at > least importing them into the numpy.rec namespace? > > I wouldn't mind moving them around, but I certainly would not want them deprecated. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Fri Jul 1 15:12:28 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 1 Jul 2011 15:12:28 -0400 Subject: [Numpy-discussion] Moving lib.recfunctions? In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 3:05 PM, Benjamin Root wrote: > > > On Fri, Jul 1, 2011 at 12:59 PM, Skipper Seabold > wrote: >> >> lib.recfunctions has never been fully advertised. The two bugs I just >> discovered lead me to believe that it's not that well vetted, but it >> is useful. I can't be the only one using these? >> > > Nope, you aren't the only one.? I use them in my code. Ah, good. I was recently just surprised by some results of pretty basic use of join_by. > >> >> What do people think of either deprecating lib.recfunctions or at >> least importing them into the numpy.rec namespace? >> > > I wouldn't mind moving them around, but I certainly would not want them > deprecated. > Just meant deprecated in terms of the namespace. Skipper From mwwiebe at gmail.com Fri Jul 1 15:22:33 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 1 Jul 2011 14:22:33 -0500 Subject: [Numpy-discussion] conference call / gotomeeting to discuss the missing data design Message-ID: The missing data thread has gotten a bit heated, and after sitting down with Travis to discuss the issues a bit, we've concluded that it would be nice to do a call with everyone who's interested in the discussion with better communication bandwidth. There are lots of good ideas out there, and it is very easy for things to get lost when we're just emailing. Getting on the phone should provide a more effective way to ensure everyone is properly being heard. We're proposing to set up a GotoMeeting call at 4pm CST today. Please respond if you can make it and your level of interest. 
I've created a Doodle where you can indicate your availability if 4pm today is too short notice, and we should schedule for a different time: http://www.doodle.com/eu9k3xip47a6gnue Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From jh at physics.ucf.edu Fri Jul 1 15:29:32 2011 From: jh at physics.ucf.edu (Joe Harrington) Date: Fri, 01 Jul 2011 15:29:32 -0400 Subject: [Numpy-discussion] Missing/accumulating data In-Reply-To: (numpy-discussion-request@scipy.org) References: Message-ID: Mark Wiebe : > With a non-boolean alpha mask, there's an implication of a > multiplication operator in there somewhere, but with a boolean mask, > the data can be any data whatsoever that doesn't necessarily support > any kind of blending operations. My goal in raising the point is to find a common core that supports everything. The benefit of the np.ma module is that you have traditional numerical routines like median() and mean() that now sensibly handle missing data, plus a data structure (the paired array and mask) that you can use for other things of your own devising. All that has to happen is to allow the sense of the mask to be FALSE = the data are bad, TRUE = the data are good, and allow (not require) the mask to be of any numerical type, or at least of integer type as well as boolean. I believe that with these two basic requirements, everyone's needs can be met. Note that you could still have boolean masks, and could still have the bad=TRUE, good=FALSE of the current np.ma module, if you had a flag to set in the dtype for what sense of the mask you wanted. It could default to the current behavior if that makes people happy/breaks the least code. > For the image accumulation you're describing, I would use either a > structured array with 'color' and 'weight' fields, or have the last > element of the color channel be the weight (like an RGBA image) so > adding multiple weighted images together would add both the colors > and the weights simultaneously, without requiring a ufunc extension > supporting struct dtypes. Well, yes, we can always design a new data structure that meets our needs, and write all the routines that will ever operate on them. But we don't want that. We want to add a feature to the *old* data structure (i.e., a numerical array of the basic data) that makes the standard routines handle missing data sensibly so we don't have to rewrite them to do so. --jh-- From Chris.Barker at noaa.gov Fri Jul 1 15:39:18 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 01 Jul 2011 12:39:18 -0700 Subject: [Numpy-discussion] Missing/accumulating data In-Reply-To: References: Message-ID: <4E0E2266.8090000@noaa.gov> Joe Harrington wrote: > All > that has to happen is to allow the sense of the mask to be FALSE = the > data are bad, TRUE = the data are good, and allow (not require) the > mask to be of any numerical type, or at least of integer type as well > as boolean. quick note on this: I like the "FALSE == good" way, because: instead of good and bad we think "masked" and "unmasked", then we have: False = "unmasked" = "regular old data" True = "masked" = "something special about the data The default for "something special" is "bad" (or "missing" , or "ignore"), but the cool thing is that if you use an int: 0 = "unmasked" 1 = "masked because of one thing" 2 = "masked because of another" etc., etc. This could be pretty powerful -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From d.s.seljebotn at astro.uio.no Fri Jul 1 15:46:48 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Fri, 01 Jul 2011 21:46:48 +0200 Subject: [Numpy-discussion] An NA compromise idea -- many-NA Message-ID: <4E0E2428.1010407@astro.uio.no> I propose a simple idea *for the long term* for generalizing Mark's proposal, that I hope may perhaps put some people behind Mark's concrete proposal in the short term. If key feature missing in Mark's proposal is the ability to distinguish between different reason for NA-ness; IGNORE vs. NA. However, one could conceive wanting to track a whole host of reasons: homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, TOO_LAZY]) Wouldn't it be a shame to put a lot of work into NA, but then have users to still keep a seperate "shadow-array" for stuff like this? a) In this case the generality of Mark's proposal seems justified and less confusing to teach newcomers (?) b) Since Mark's proposal seems to generalize well to many NAs (there's 8 bits in the mask, and millions of available NaN-s in floating point), if people agreed to this one could leave it for later and just go on with the proposed idea. I don't think we should scetch out the above in more detail now, I don't want to distract, I just thought it something to consider to resolve the current situation... FWIW, Dag Sverre From ben.root at ou.edu Fri Jul 1 16:01:28 2011 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 1 Jul 2011 15:01:28 -0500 Subject: [Numpy-discussion] conference call / gotomeeting to discuss the missing data design In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 2:22 PM, Mark Wiebe wrote: > The missing data thread has gotten a bit heated, and after sitting down > with Travis to discuss the issues a bit, we've concluded that it would be > nice to do a call with everyone who's interested in the discussion with > better communication bandwidth. There are lots of good ideas out there, and > it is very easy for things to get lost when we're just emailing. Getting on > the phone should provide a more effective way to ensure everyone is properly > being heard. > > We're proposing to set up a GotoMeeting call at 4pm CST today. Please > respond if you can make it and your level of interest. I've created a Doodle > where you can indicate your availability if 4pm today is too short notice, > and we should schedule for a different time: > > http://www.doodle.com/eu9k3xip47a6gnue > > Thanks, > Mark > > Being Linux-only at work, I don't think I am able to use GoToMeeting (unless there is a mobile app for it?). Would Skype be more preferred? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Fri Jul 1 16:01:51 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 1 Jul 2011 16:01:51 -0400 Subject: [Numpy-discussion] An NA compromise idea -- many-NA In-Reply-To: <4E0E2428.1010407@astro.uio.no> References: <4E0E2428.1010407@astro.uio.no> Message-ID: On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn wrote: > I propose a simple idea *for the long term* for generalizing Mark's > proposal, that I hope may perhaps put some people behind Mark's concrete > proposal in the short term. 
> > If key feature missing in Mark's proposal is the ability to distinguish > between different reason for NA-ness; IGNORE vs. NA. However, one could > conceive wanting to track a whole host of reasons: > > homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, TOO_LAZY]) > > Wouldn't it be a shame to put a lot of work into NA, but then have users > to still keep a seperate "shadow-array" for stuff like this? > > a) In this case the generality of Mark's proposal seems justified and > less confusing to teach newcomers (?) > > b) Since Mark's proposal seems to generalize well to many NAs (there's 8 > bits in the mask, and millions of available NaN-s in floating point), if > people agreed to this one could leave it for later and just go on with > the proposed idea. > I have not been following the discussion in much detail, so forgive me if this has come up. But I think this approach is also similar to thinking behind missing values in SAS and "extended" missing values in Stata. They are missing but preserve an order. This way you can pull out values that are missing because they were eaten by a dog and see if these missing ones are systematically different than the ones that are missing because they're too lazy. Use case that pops to mind, seeing if the various ways of attrition in surveys or experiments varies in a non-random way. http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm http://www.stata.com/help.cgi?missing Maybe this is neither here nor there, I just don't want to end up with the R way is the only way. That's why I prefer Python :) Skipper From mwwiebe at gmail.com Fri Jul 1 16:11:14 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 1 Jul 2011 15:11:14 -0500 Subject: [Numpy-discussion] conference call / gotomeeting to discuss the missing data design In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 3:01 PM, Benjamin Root wrote: > On Fri, Jul 1, 2011 at 2:22 PM, Mark Wiebe wrote: > >> The missing data thread has gotten a bit heated, and after sitting down >> with Travis to discuss the issues a bit, we've concluded that it would be >> nice to do a call with everyone who's interested in the discussion with >> better communication bandwidth. There are lots of good ideas out there, and >> it is very easy for things to get lost when we're just emailing. Getting on >> the phone should provide a more effective way to ensure everyone is properly >> being heard. >> >> We're proposing to set up a GotoMeeting call at 4pm CST today. Please >> respond if you can make it and your level of interest. I've created a Doodle >> where you can indicate your availability if 4pm today is too short notice, >> and we should schedule for a different time: >> >> http://www.doodle.com/eu9k3xip47a6gnue >> >> Thanks, >> Mark >> >> > Being Linux-only at work, I don't think I am able to use GoToMeeting > (unless there is a mobile app for it?). Would Skype be more preferred? > It's possible to call by phone with gotomeeting, but that doesn't allow for the nice screen sharing. I think it depends on how many people want to join in on the call. It looks like 4pm was overly optimistic for a call, Travis can't do Monday so hopefully Tuesday can work. Please fill in the doodle! 
Thanks, Mark > > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Jul 1 16:20:20 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 1 Jul 2011 15:20:20 -0500 Subject: [Numpy-discussion] An NA compromise idea -- many-NA In-Reply-To: References: <4E0E2428.1010407@astro.uio.no> Message-ID: On Fri, Jul 1, 2011 at 3:01 PM, Skipper Seabold wrote: > On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn > wrote: > > I propose a simple idea *for the long term* for generalizing Mark's > > proposal, that I hope may perhaps put some people behind Mark's concrete > > proposal in the short term. > > > > If key feature missing in Mark's proposal is the ability to distinguish > > between different reason for NA-ness; IGNORE vs. NA. However, one could > > conceive wanting to track a whole host of reasons: > > > > homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, > TOO_LAZY]) > > > > Wouldn't it be a shame to put a lot of work into NA, but then have users > > to still keep a seperate "shadow-array" for stuff like this? > > > > a) In this case the generality of Mark's proposal seems justified and > > less confusing to teach newcomers (?) > > > > b) Since Mark's proposal seems to generalize well to many NAs (there's 8 > > bits in the mask, and millions of available NaN-s in floating point), if > > people agreed to this one could leave it for later and just go on with > > the proposed idea. > > > > I have not been following the discussion in much detail, so forgive me > if this has come up. But I think this approach is also similar to > thinking behind missing values in SAS and "extended" missing values in > Stata. They are missing but preserve an order. This way you can pull > out values that are missing because they were eaten by a dog and see > if these missing ones are systematically different than the ones that > are missing because they're too lazy. Use case that pops to mind, > seeing if the various ways of attrition in surveys or experiments > varies in a non-random way. > > > http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm > http://www.stata.com/help.cgi?missing That's interesting, and I see that they use a numerical ordering for the different NA values. I think if instead of using the AND operator to combine masks, we use MINIMUM, this behavior would happen naturally with almost no additional work. Then, in addition to np.NA and np.NA(dtype), it could allow np.NA(dtype, ID) to assign an ID between 1 and 255, where 1 is the default. -Mark > > > Maybe this is neither here nor there, I just don't want to end up with > the R way is the only way. That's why I prefer Python :) > > Skipper > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
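Written out, the np.NA(dtype, ID) suggestion above would let Dag's example
carry its reason codes directly. (Nothing below exists today: np.NA with a
dtype and an ID is only the spelling proposed in this message, the
maskna/skipna spellings follow the NEP draft, and the reason constants are
made up.)

    import numpy as np

    # Hypothetical reason codes; ID 1 stays the plain, default NA.
    EATEN_BY_DOG = np.NA('f8', 2)
    SICK         = np.NA('f8', 3)
    TOO_LAZY     = np.NA('f8', 4)

    grades = np.array([2., 3., 1., EATEN_BY_DOG, 5., SICK, 2., TOO_LAZY],
                      maskna=True)

    grades.mean()             # -> NA, since NAs propagate by default
    grades.mean(skipna=True)  # -> 2.6, the mean of the five known grades

How the reason byte combines when two NAs meet is exactly the
minimum-versus-maximum detail discussed in the follow-ups below.
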
URL: From mwwiebe at gmail.com Fri Jul 1 16:26:30 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 1 Jul 2011 15:26:30 -0500 Subject: [Numpy-discussion] An NA compromise idea -- many-NA In-Reply-To: References: <4E0E2428.1010407@astro.uio.no> Message-ID: On Fri, Jul 1, 2011 at 3:20 PM, Mark Wiebe wrote: > On Fri, Jul 1, 2011 at 3:01 PM, Skipper Seabold wrote: > >> On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn >> wrote: >> > I propose a simple idea *for the long term* for generalizing Mark's >> > proposal, that I hope may perhaps put some people behind Mark's concrete >> > proposal in the short term. >> > >> > If key feature missing in Mark's proposal is the ability to distinguish >> > between different reason for NA-ness; IGNORE vs. NA. However, one could >> > conceive wanting to track a whole host of reasons: >> > >> > homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, >> TOO_LAZY]) >> > >> > Wouldn't it be a shame to put a lot of work into NA, but then have users >> > to still keep a seperate "shadow-array" for stuff like this? >> > >> > a) In this case the generality of Mark's proposal seems justified and >> > less confusing to teach newcomers (?) >> > >> > b) Since Mark's proposal seems to generalize well to many NAs (there's 8 >> > bits in the mask, and millions of available NaN-s in floating point), if >> > people agreed to this one could leave it for later and just go on with >> > the proposed idea. >> > >> >> I have not been following the discussion in much detail, so forgive me >> if this has come up. But I think this approach is also similar to >> thinking behind missing values in SAS and "extended" missing values in >> Stata. They are missing but preserve an order. This way you can pull >> out values that are missing because they were eaten by a dog and see >> if these missing ones are systematically different than the ones that >> are missing because they're too lazy. Use case that pops to mind, >> seeing if the various ways of attrition in surveys or experiments >> varies in a non-random way. >> >> >> http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm >> http://www.stata.com/help.cgi?missing > > > That's interesting, and I see that they use a numerical ordering for the > different NA values. I think if instead of using the AND operator to combine > masks, we use MINIMUM, this behavior would happen naturally with almost no > additional work. Then, in addition to np.NA and np.NA(dtype), it could allow > np.NA(dtype, ID) to assign an ID between 1 and 255, where 1 is the default. > Sorry, my brain is a bit addled by all these comments. This idea would also require flipping the mask so 0 is unmasked. and 1 to 255 is masked as Christopher pointed out in a different thread. -Mark > > -Mark > > >> >> >> Maybe this is neither here nor there, I just don't want to end up with >> the R way is the only way. That's why I prefer Python :) >> >> Skipper >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 
From charlesr.harris at gmail.com  Fri Jul  1 16:27:39 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 1 Jul 2011 14:27:39 -0600
Subject: [Numpy-discussion] Missing/accumulating data
In-Reply-To: <4E0E2266.8090000@noaa.gov>
References: <4E0E2266.8090000@noaa.gov>
Message-ID: 

On Fri, Jul 1, 2011 at 1:39 PM, Christopher Barker wrote:

> Joe Harrington wrote:
> > All
> > that has to happen is to allow the sense of the mask to be FALSE = the
> > data are bad, TRUE = the data are good, and allow (not require) the
> > mask to be of any numerical type, or at least of integer type as well
> > as boolean.
>
> quick note on this: I like the "FALSE == good" way, because:
>
> instead of good and bad we think "masked" and "unmasked", then we have:
>
> False = "unmasked" = "regular old data"
> True = "masked" = "something special about the data
>
> The default for "something special" is "bad" (or "missing" , or
> "ignore"), but the cool thing is that if you use an int:
>
> 0 = "unmasked"
> 1 = "masked because of one thing"
> 2 = "masked because of another"
> etc., etc.
>
> This could be pretty powerful
>
>
I don't think the false/true dichotomy is something to worry about, it is
an implementation detail that is hidden from the user...

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From charlesr.harris at gmail.com  Fri Jul  1 16:29:40 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 1 Jul 2011 14:29:40 -0600
Subject: [Numpy-discussion] An NA compromise idea -- many-NA
In-Reply-To: 
References: <4E0E2428.1010407@astro.uio.no>
Message-ID: 

On Fri, Jul 1, 2011 at 2:26 PM, Mark Wiebe wrote:

> On Fri, Jul 1, 2011 at 3:20 PM, Mark Wiebe wrote:
>
>> On Fri, Jul 1, 2011 at 3:01 PM, Skipper Seabold wrote:
>>
>>> On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn
>>> wrote:
>>> > I propose a simple idea *for the long term* for generalizing Mark's
>>> > proposal, that I hope may perhaps put some people behind Mark's
>>> concrete
>>> > proposal in the short term.
>>> >
>>> > If key feature missing in Mark's proposal is the ability to distinguish
>>> > between different reason for NA-ness; IGNORE vs. NA. However, one could
>>> > conceive wanting to track a whole host of reasons:
>>> >
>>> > homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2,
>>> TOO_LAZY])
>>> >
>>> > Wouldn't it be a shame to put a lot of work into NA, but then have users
>>> > to still keep a seperate "shadow-array" for stuff like this?
>>> >
>>> > a) In this case the generality of Mark's proposal seems justified and
>>> > less confusing to teach newcomers (?)
>>> >
>>> > b) Since Mark's proposal seems to generalize well to many NAs (there's 8
>>> > bits in the mask, and millions of available NaN-s in floating point), if
>>> > people agreed to this one could leave it for later and just go on with
>>> > the proposed idea.
>>> >
>>>
>>> I have not been following the discussion in much detail, so forgive me
>>> if this has come up. But I think this approach is also similar to
>>> thinking behind missing values in SAS and "extended" missing values in
>>> Stata. They are missing but preserve an order. This way you can pull
>>> out values that are missing because they were eaten by a dog and see
>>> if these missing ones are systematically different than the ones that
>>> are missing because they're too lazy. Use case that pops to mind,
>>> seeing if the various ways of attrition in surveys or experiments
>>> varies in a non-random way.
>>> >>> >>> http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm >>> http://www.stata.com/help.cgi?missing >> >> >> That's interesting, and I see that they use a numerical ordering for the >> different NA values. I think if instead of using the AND operator to combine >> masks, we use MINIMUM, this behavior would happen naturally with almost no >> additional work. Then, in addition to np.NA and np.NA(dtype), it could allow >> np.NA(dtype, ID) to assign an ID between 1 and 255, where 1 is the default. >> > > Sorry, my brain is a bit addled by all these comments. This idea would also > require flipping the mask so 0 is unmasked. and 1 to 255 is masked as > Christopher pointed out in a different thread. > Or you could subtract instead of add and use maximum instead of minimum. I thought those details would be hidden. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 1 16:32:18 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 Jul 2011 14:32:18 -0600 Subject: [Numpy-discussion] An NA compromise idea -- many-NA In-Reply-To: <4E0E2428.1010407@astro.uio.no> References: <4E0E2428.1010407@astro.uio.no> Message-ID: On Fri, Jul 1, 2011 at 1:46 PM, Dag Sverre Seljebotn < d.s.seljebotn at astro.uio.no> wrote: > I propose a simple idea *for the long term* for generalizing Mark's > proposal, that I hope may perhaps put some people behind Mark's concrete > proposal in the short term. > > If key feature missing in Mark's proposal is the ability to distinguish > between different reason for NA-ness; IGNORE vs. NA. However, one could > conceive wanting to track a whole host of reasons: > > homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, TOO_LAZY]) > > Wouldn't it be a shame to put a lot of work into NA, but then have users > to still keep a seperate "shadow-array" for stuff like this? > > a) In this case the generality of Mark's proposal seems justified and > less confusing to teach newcomers (?) > > b) Since Mark's proposal seems to generalize well to many NAs (there's 8 > bits in the mask, and millions of available NaN-s in floating point), if > people agreed to this one could leave it for later and just go on with > the proposed idea. > > Exactly so. > I don't think we should scetch out the above in more detail now, I don't > want to distract, I just thought it something to consider to resolve the > current situation... > > The important thing is to have a working version to play with, and then see how it would be useful to extend it. I think Mark's framework wouldn't require a massive rewrite to add this sort of functionality, most of the infrastructure would probably remain the same. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
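For anyone who wants to see what bookkeeping the two conventions imply,
here is a tiny sketch that emulates the reason-byte idea with plain uint8
arrays and nothing else (no NEP machinery; the names are made up). With 0
meaning unmasked and larger codes meaning "masked for reason N", combining
with maximum is the mirror image of the minimum rule above:

    import numpy as np

    UNMASKED, EATEN_BY_DOG, SICK = 0, 1, 2

    reasons_a = np.array([0, 1, 0, 2], dtype=np.uint8)  # per-element codes
    reasons_b = np.array([0, 0, 2, 1], dtype=np.uint8)

    # An element of a binary result is masked if either operand is masked;
    # the elementwise maximum keeps a single reason code per element.
    combined = np.maximum(reasons_a, reasons_b)
    # -> array([0, 1, 2, 2], dtype=uint8)

    visible = (combined == 0)   # the boolean view ordinary ufuncs would see

With the original convention (unmasked = largest value) the same line uses
np.minimum instead, which is the detail Chuck points out would stay hidden
from the user either way.
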
URL: From mwwiebe at gmail.com Fri Jul 1 16:33:34 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 1 Jul 2011 15:33:34 -0500 Subject: [Numpy-discussion] An NA compromise idea -- many-NA In-Reply-To: References: <4E0E2428.1010407@astro.uio.no> Message-ID: On Fri, Jul 1, 2011 at 3:29 PM, Charles R Harris wrote: > > > On Fri, Jul 1, 2011 at 2:26 PM, Mark Wiebe wrote: > >> On Fri, Jul 1, 2011 at 3:20 PM, Mark Wiebe wrote: >> >>> On Fri, Jul 1, 2011 at 3:01 PM, Skipper Seabold wrote: >>> >>>> On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn >>>> wrote: >>>> > I propose a simple idea *for the long term* for generalizing Mark's >>>> > proposal, that I hope may perhaps put some people behind Mark's >>>> concrete >>>> > proposal in the short term. >>>> > >>>> > If key feature missing in Mark's proposal is the ability to >>>> distinguish >>>> > between different reason for NA-ness; IGNORE vs. NA. However, one >>>> could >>>> > conceive wanting to track a whole host of reasons: >>>> > >>>> > homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, >>>> TOO_LAZY]) >>>> > >>>> > Wouldn't it be a shame to put a lot of work into NA, but then have >>>> users >>>> > to still keep a seperate "shadow-array" for stuff like this? >>>> > >>>> > a) In this case the generality of Mark's proposal seems justified and >>>> > less confusing to teach newcomers (?) >>>> > >>>> > b) Since Mark's proposal seems to generalize well to many NAs (there's >>>> 8 >>>> > bits in the mask, and millions of available NaN-s in floating point), >>>> if >>>> > people agreed to this one could leave it for later and just go on with >>>> > the proposed idea. >>>> > >>>> >>>> I have not been following the discussion in much detail, so forgive me >>>> if this has come up. But I think this approach is also similar to >>>> thinking behind missing values in SAS and "extended" missing values in >>>> Stata. They are missing but preserve an order. This way you can pull >>>> out values that are missing because they were eaten by a dog and see >>>> if these missing ones are systematically different than the ones that >>>> are missing because they're too lazy. Use case that pops to mind, >>>> seeing if the various ways of attrition in surveys or experiments >>>> varies in a non-random way. >>>> >>>> >>>> http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm >>>> http://www.stata.com/help.cgi?missing >>> >>> >>> That's interesting, and I see that they use a numerical ordering for the >>> different NA values. I think if instead of using the AND operator to combine >>> masks, we use MINIMUM, this behavior would happen naturally with almost no >>> additional work. Then, in addition to np.NA and np.NA(dtype), it could allow >>> np.NA(dtype, ID) to assign an ID between 1 and 255, where 1 is the default. >>> >> >> Sorry, my brain is a bit addled by all these comments. This idea would >> also require flipping the mask so 0 is unmasked. and 1 to 255 is masked as >> Christopher pointed out in a different thread. >> > > Or you could subtract instead of add and use maximum instead of minimum. I > thought those details would be hidden. > Definitely, but the most natural distinction thinking numerically is between zero and non-zero, and there's only one zero, so giving it the 'unmasked' value is natural for this way of extending it. If you follow Joe's idea where you're basically introducing it as an image alpha mask, you would have 0 be fully masked, 128 be 50% masked, and 255 be fully unmasked. 
-Mark > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Jul 1 16:35:10 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 1 Jul 2011 15:35:10 -0500 Subject: [Numpy-discussion] An NA compromise idea -- many-NA In-Reply-To: References: <4E0E2428.1010407@astro.uio.no> Message-ID: On Fri, Jul 1, 2011 at 3:32 PM, Charles R Harris wrote: > On Fri, Jul 1, 2011 at 1:46 PM, Dag Sverre Seljebotn < > d.s.seljebotn at astro.uio.no> wrote: > >> I propose a simple idea *for the long term* for generalizing Mark's >> proposal, that I hope may perhaps put some people behind Mark's concrete >> proposal in the short term. >> >> If key feature missing in Mark's proposal is the ability to distinguish >> between different reason for NA-ness; IGNORE vs. NA. However, one could >> conceive wanting to track a whole host of reasons: >> >> homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, >> TOO_LAZY]) >> >> Wouldn't it be a shame to put a lot of work into NA, but then have users >> to still keep a seperate "shadow-array" for stuff like this? >> >> a) In this case the generality of Mark's proposal seems justified and >> less confusing to teach newcomers (?) >> >> b) Since Mark's proposal seems to generalize well to many NAs (there's 8 >> bits in the mask, and millions of available NaN-s in floating point), if >> people agreed to this one could leave it for later and just go on with >> the proposed idea. >> >> > Exactly so. > > >> I don't think we should scetch out the above in more detail now, I don't >> want to distract, I just thought it something to consider to resolve the >> current situation... >> >> > The important thing is to have a working version to play with, and then see > how it would be useful to extend it. I think Mark's framework wouldn't > require a massive rewrite to add this sort of functionality, most of the > infrastructure would probably remain the same. > At the same time, I think it's great to get these ideas out there, because often it's possible to predict potential future interactions and make minor course corrections that could save a lot of work later. -Mark > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 1 16:36:54 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 Jul 2011 14:36:54 -0600 Subject: [Numpy-discussion] An NA compromise idea -- many-NA In-Reply-To: References: <4E0E2428.1010407@astro.uio.no> Message-ID: On Fri, Jul 1, 2011 at 2:33 PM, Mark Wiebe wrote: > On Fri, Jul 1, 2011 at 3:29 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Fri, Jul 1, 2011 at 2:26 PM, Mark Wiebe wrote: >> >>> On Fri, Jul 1, 2011 at 3:20 PM, Mark Wiebe wrote: >>> >>>> On Fri, Jul 1, 2011 at 3:01 PM, Skipper Seabold wrote: >>>> >>>>> On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn >>>>> wrote: >>>>> > I propose a simple idea *for the long term* for generalizing Mark's >>>>> > proposal, that I hope may perhaps put some people behind Mark's >>>>> concrete >>>>> > proposal in the short term. 
>>>>> > >>>>> > If key feature missing in Mark's proposal is the ability to >>>>> distinguish >>>>> > between different reason for NA-ness; IGNORE vs. NA. However, one >>>>> could >>>>> > conceive wanting to track a whole host of reasons: >>>>> > >>>>> > homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, >>>>> TOO_LAZY]) >>>>> > >>>>> > Wouldn't it be a shame to put a lot of work into NA, but then have >>>>> users >>>>> > to still keep a seperate "shadow-array" for stuff like this? >>>>> > >>>>> > a) In this case the generality of Mark's proposal seems justified and >>>>> > less confusing to teach newcomers (?) >>>>> > >>>>> > b) Since Mark's proposal seems to generalize well to many NAs >>>>> (there's 8 >>>>> > bits in the mask, and millions of available NaN-s in floating point), >>>>> if >>>>> > people agreed to this one could leave it for later and just go on >>>>> with >>>>> > the proposed idea. >>>>> > >>>>> >>>>> I have not been following the discussion in much detail, so forgive me >>>>> if this has come up. But I think this approach is also similar to >>>>> thinking behind missing values in SAS and "extended" missing values in >>>>> Stata. They are missing but preserve an order. This way you can pull >>>>> out values that are missing because they were eaten by a dog and see >>>>> if these missing ones are systematically different than the ones that >>>>> are missing because they're too lazy. Use case that pops to mind, >>>>> seeing if the various ways of attrition in surveys or experiments >>>>> varies in a non-random way. >>>>> >>>>> >>>>> http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm >>>>> http://www.stata.com/help.cgi?missing >>>> >>>> >>>> That's interesting, and I see that they use a numerical ordering for the >>>> different NA values. I think if instead of using the AND operator to combine >>>> masks, we use MINIMUM, this behavior would happen naturally with almost no >>>> additional work. Then, in addition to np.NA and np.NA(dtype), it could allow >>>> np.NA(dtype, ID) to assign an ID between 1 and 255, where 1 is the default. >>>> >>> >>> Sorry, my brain is a bit addled by all these comments. This idea would >>> also require flipping the mask so 0 is unmasked. and 1 to 255 is masked as >>> Christopher pointed out in a different thread. >>> >> >> Or you could subtract instead of add and use maximum instead of minimum. I >> thought those details would be hidden. >> > > Definitely, but the most natural distinction thinking numerically is > between zero and non-zero, and there's only one zero, so giving it the > 'unmasked' value is natural for this way of extending it. If you follow > Joe's idea where you're basically introducing it as an image alpha mask, you > would have 0 be fully masked, 128 be 50% masked, and 255 be fully unmasked. > > I'm not complaining ;) I thought these ideas were out there from the beginning, but maybe that was just me... Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mwwiebe at gmail.com Fri Jul 1 16:42:37 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 1 Jul 2011 15:42:37 -0500 Subject: [Numpy-discussion] An NA compromise idea -- many-NA In-Reply-To: References: <4E0E2428.1010407@astro.uio.no> Message-ID: On Fri, Jul 1, 2011 at 3:36 PM, Charles R Harris wrote: > > > On Fri, Jul 1, 2011 at 2:33 PM, Mark Wiebe wrote: > >> On Fri, Jul 1, 2011 at 3:29 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Fri, Jul 1, 2011 at 2:26 PM, Mark Wiebe wrote: >>> >>>> On Fri, Jul 1, 2011 at 3:20 PM, Mark Wiebe wrote: >>>> >>>>> On Fri, Jul 1, 2011 at 3:01 PM, Skipper Seabold wrote: >>>>> >>>>>> On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn >>>>>> wrote: >>>>>> > I propose a simple idea *for the long term* for generalizing Mark's >>>>>> > proposal, that I hope may perhaps put some people behind Mark's >>>>>> concrete >>>>>> > proposal in the short term. >>>>>> > >>>>>> > If key feature missing in Mark's proposal is the ability to >>>>>> distinguish >>>>>> > between different reason for NA-ness; IGNORE vs. NA. However, one >>>>>> could >>>>>> > conceive wanting to track a whole host of reasons: >>>>>> > >>>>>> > homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, >>>>>> TOO_LAZY]) >>>>>> > >>>>>> > Wouldn't it be a shame to put a lot of work into NA, but then have >>>>>> users >>>>>> > to still keep a seperate "shadow-array" for stuff like this? >>>>>> > >>>>>> > a) In this case the generality of Mark's proposal seems justified >>>>>> and >>>>>> > less confusing to teach newcomers (?) >>>>>> > >>>>>> > b) Since Mark's proposal seems to generalize well to many NAs >>>>>> (there's 8 >>>>>> > bits in the mask, and millions of available NaN-s in floating >>>>>> point), if >>>>>> > people agreed to this one could leave it for later and just go on >>>>>> with >>>>>> > the proposed idea. >>>>>> > >>>>>> >>>>>> I have not been following the discussion in much detail, so forgive me >>>>>> if this has come up. But I think this approach is also similar to >>>>>> thinking behind missing values in SAS and "extended" missing values in >>>>>> Stata. They are missing but preserve an order. This way you can pull >>>>>> out values that are missing because they were eaten by a dog and see >>>>>> if these missing ones are systematically different than the ones that >>>>>> are missing because they're too lazy. Use case that pops to mind, >>>>>> seeing if the various ways of attrition in surveys or experiments >>>>>> varies in a non-random way. >>>>>> >>>>>> >>>>>> http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm >>>>>> http://www.stata.com/help.cgi?missing >>>>> >>>>> >>>>> That's interesting, and I see that they use a numerical ordering for >>>>> the different NA values. I think if instead of using the AND operator to >>>>> combine masks, we use MINIMUM, this behavior would happen naturally with >>>>> almost no additional work. Then, in addition to np.NA and np.NA(dtype), it >>>>> could allow np.NA(dtype, ID) to assign an ID between 1 and 255, where 1 is >>>>> the default. >>>>> >>>> >>>> Sorry, my brain is a bit addled by all these comments. This idea would >>>> also require flipping the mask so 0 is unmasked. and 1 to 255 is masked as >>>> Christopher pointed out in a different thread. >>>> >>> >>> Or you could subtract instead of add and use maximum instead of minimum. >>> I thought those details would be hidden. 
>>> >> >> Definitely, but the most natural distinction thinking numerically is >> between zero and non-zero, and there's only one zero, so giving it the >> 'unmasked' value is natural for this way of extending it. If you follow >> Joe's idea where you're basically introducing it as an image alpha mask, you >> would have 0 be fully masked, 128 be 50% masked, and 255 be fully unmasked. >> >> > I'm not complaining ;) I thought these ideas were out there from the > beginning, but maybe that was just me... > You're right, but it feels like it's been 10 years in internet time by now. :) The design has evolved a lot from all the feedback too, so revisiting some of these things that initially may have felt less like they fit before doesn't hurt. I'm not so keen on rereading 250+ email messages though... -Mark > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 1 16:49:39 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 Jul 2011 14:49:39 -0600 Subject: [Numpy-discussion] An NA compromise idea -- many-NA In-Reply-To: References: <4E0E2428.1010407@astro.uio.no> Message-ID: On Fri, Jul 1, 2011 at 2:42 PM, Mark Wiebe wrote: > On Fri, Jul 1, 2011 at 3:36 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Fri, Jul 1, 2011 at 2:33 PM, Mark Wiebe wrote: >> >>> On Fri, Jul 1, 2011 at 3:29 PM, Charles R Harris < >>> charlesr.harris at gmail.com> wrote: >>> >>>> >>>> >>>> On Fri, Jul 1, 2011 at 2:26 PM, Mark Wiebe wrote: >>>> >>>>> On Fri, Jul 1, 2011 at 3:20 PM, Mark Wiebe wrote: >>>>> >>>>>> On Fri, Jul 1, 2011 at 3:01 PM, Skipper Seabold wrote: >>>>>> >>>>>>> On Fri, Jul 1, 2011 at 3:46 PM, Dag Sverre Seljebotn >>>>>>> wrote: >>>>>>> > I propose a simple idea *for the long term* for generalizing Mark's >>>>>>> > proposal, that I hope may perhaps put some people behind Mark's >>>>>>> concrete >>>>>>> > proposal in the short term. >>>>>>> > >>>>>>> > If key feature missing in Mark's proposal is the ability to >>>>>>> distinguish >>>>>>> > between different reason for NA-ness; IGNORE vs. NA. However, one >>>>>>> could >>>>>>> > conceive wanting to track a whole host of reasons: >>>>>>> > >>>>>>> > homework_grades = np.asarray([2, 3, 1, EATEN_BY_DOG, 5, SICK, 2, >>>>>>> TOO_LAZY]) >>>>>>> > >>>>>>> > Wouldn't it be a shame to put a lot of work into NA, but then have >>>>>>> users >>>>>>> > to still keep a seperate "shadow-array" for stuff like this? >>>>>>> > >>>>>>> > a) In this case the generality of Mark's proposal seems justified >>>>>>> and >>>>>>> > less confusing to teach newcomers (?) >>>>>>> > >>>>>>> > b) Since Mark's proposal seems to generalize well to many NAs >>>>>>> (there's 8 >>>>>>> > bits in the mask, and millions of available NaN-s in floating >>>>>>> point), if >>>>>>> > people agreed to this one could leave it for later and just go on >>>>>>> with >>>>>>> > the proposed idea. >>>>>>> > >>>>>>> >>>>>>> I have not been following the discussion in much detail, so forgive >>>>>>> me >>>>>>> if this has come up. But I think this approach is also similar to >>>>>>> thinking behind missing values in SAS and "extended" missing values >>>>>>> in >>>>>>> Stata. They are missing but preserve an order. 
This way you can pull >>>>>>> out values that are missing because they were eaten by a dog and see >>>>>>> if these missing ones are systematically different than the ones that >>>>>>> are missing because they're too lazy. Use case that pops to mind, >>>>>>> seeing if the various ways of attrition in surveys or experiments >>>>>>> varies in a non-random way. >>>>>>> >>>>>>> >>>>>>> http://support.sas.com/documentation/cdl/en/lrcon/62955/HTML/default/viewer.htm#a000989180.htm >>>>>>> http://www.stata.com/help.cgi?missing >>>>>> >>>>>> >>>>>> That's interesting, and I see that they use a numerical ordering for >>>>>> the different NA values. I think if instead of using the AND operator to >>>>>> combine masks, we use MINIMUM, this behavior would happen naturally with >>>>>> almost no additional work. Then, in addition to np.NA and np.NA(dtype), it >>>>>> could allow np.NA(dtype, ID) to assign an ID between 1 and 255, where 1 is >>>>>> the default. >>>>>> >>>>> >>>>> Sorry, my brain is a bit addled by all these comments. This idea would >>>>> also require flipping the mask so 0 is unmasked. and 1 to 255 is masked as >>>>> Christopher pointed out in a different thread. >>>>> >>>> >>>> Or you could subtract instead of add and use maximum instead of minimum. >>>> I thought those details would be hidden. >>>> >>> >>> Definitely, but the most natural distinction thinking numerically is >>> between zero and non-zero, and there's only one zero, so giving it the >>> 'unmasked' value is natural for this way of extending it. If you follow >>> Joe's idea where you're basically introducing it as an image alpha mask, you >>> would have 0 be fully masked, 128 be 50% masked, and 255 be fully unmasked. >>> >>> >> I'm not complaining ;) I thought these ideas were out there from the >> beginning, but maybe that was just me... >> > > You're right, but it feels like it's been 10 years in internet time by now. > :) > > The design has evolved a lot from all the feedback too, so revisiting some > of these things that initially may have felt less like they fit before > doesn't hurt. I'm not so keen on rereading 250+ email messages though... > > I wouldn't worry about it too much. You chose masks as one of the fundamental options because of their generality and this is one of the consequences of that generality. I was also thinking about this in terms of Pierre's soft/hard mask distinction, I don't know about the shared mask thing. Several questions that have also been floating about in my mind are these. Can you mask an array with NA values? can you mask a masked array with a view? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Fri Jul 1 17:04:16 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 1 Jul 2011 23:04:16 +0200 Subject: [Numpy-discussion] An NA compromise idea -- many-NA Message-ID: Mask an array with NAs? You should be able to, as IGNORE<>NA. Mask an array with a view? That's sharing the data with a different mask, you should be able to, too (np.ma works like that). Sharing mask? That'd be great if we could... That way, there'd be almost nothing left to do to adapt np.ma... On Jul 1, 2011 10:49 PM, "Charles R Harris" wrote: -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From efiring at hawaii.edu Fri Jul 1 17:07:05 2011 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 01 Jul 2011 11:07:05 -1000 Subject: [Numpy-discussion] Missing/accumulating data In-Reply-To: References: <4E0E2266.8090000@noaa.gov> Message-ID: <4E0E36F9.3080106@hawaii.edu> On 07/01/2011 10:27 AM, Charles R Harris wrote: > > > On Fri, Jul 1, 2011 at 1:39 PM, Christopher Barker > > wrote: > > Joe Harrington wrote: > > All > > that has to happen is to allow the sense of the mask to be FALSE > = the > > data are bad, TRUE = the data are good, and allow (not require) the > > mask to be of any numerical type, or at least of integer type as well > > as boolean. > > quick note on this: I like the "FALSE == good" way, because: > > instead of good and bad we think "masked" and "unmasked", then we have: > > False = "unmasked" = "regular old data" > True = "masked" = "something special about the data > > The default for "something special" is "bad" (or "missing" , or > "ignore"), but the cool thing is that if you use an int: > > 0 = "unmasked" > 1 = "masked because of one thing" > 2 = "masked because of another" > etc., etc. > > This could be pretty powerful > > > I don't think the false/true dichotomy isn't something to worry about, > it is an implementation detail that is hidden from the user... But Joe's point and Chris's seemingly opposite (in terms of the Boolean value of the mask) point are that if it is not completely hidden, and if it is not restricted to be Boolean but is merely treated as Boolean with True meaning NA or Ignore, then it can be more powerful because it can carry additional information without affecting its Boolean functionality as a mask in ufuncs. Although I might use such a capability if it existed, to reduce the need to have a separate flags array corresponding to a given data array, I think that for my own purposes this is very low priority, and chances are I would often use a separate flags array even if the underlying mask were not restricted to Boolean. Eric > > Chuck From charlesr.harris at gmail.com Fri Jul 1 17:14:34 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 Jul 2011 15:14:34 -0600 Subject: [Numpy-discussion] An NA compromise idea -- many-NA In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 3:04 PM, Pierre GM wrote: > Mask an array with NAs? You should be able to, as IGNORE<>NA. Mask an array > with a view? That's sharing the data with a different mask, > I was thinking about a mask on top of a mask, i.e., start with a bare array, take a masked view of that, then a masked view of the view. > you should be able to, too (np.ma works like that). > Sharing mask? That'd be great if we could... That way, there'd be almost > nothing left to do to adapt np.ma... > Could you spend a bit of time explicating the nature of hard, soft, and shared masks? I'm guessing that a shared mask is using one mask on several data arrays, but I'm guessing. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Fri Jul 1 17:18:22 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 Jul 2011 15:18:22 -0600 Subject: [Numpy-discussion] Missing/accumulating data In-Reply-To: <4E0E36F9.3080106@hawaii.edu> References: <4E0E2266.8090000@noaa.gov> <4E0E36F9.3080106@hawaii.edu> Message-ID: On Fri, Jul 1, 2011 at 3:07 PM, Eric Firing wrote: > On 07/01/2011 10:27 AM, Charles R Harris wrote: > > > > > > On Fri, Jul 1, 2011 at 1:39 PM, Christopher Barker > > > wrote: > > > > Joe Harrington wrote: > > > All > > > that has to happen is to allow the sense of the mask to be FALSE > > = the > > > data are bad, TRUE = the data are good, and allow (not require) > the > > > mask to be of any numerical type, or at least of integer type as > well > > > as boolean. > > > > quick note on this: I like the "FALSE == good" way, because: > > > > instead of good and bad we think "masked" and "unmasked", then we > have: > > > > False = "unmasked" = "regular old data" > > True = "masked" = "something special about the data > > > > The default for "something special" is "bad" (or "missing" , or > > "ignore"), but the cool thing is that if you use an int: > > > > 0 = "unmasked" > > 1 = "masked because of one thing" > > 2 = "masked because of another" > > etc., etc. > > > > This could be pretty powerful > > > > > > I don't think the false/true dichotomy isn't something to worry about, > > it is an implementation detail that is hidden from the user... > > But Joe's point and Chris's seemingly opposite (in terms of the Boolean > value of the mask) point are that if it is not completely hidden, and if > it is not restricted to be Boolean but is merely treated as Boolean with > True meaning NA or Ignore, then it can be more powerful because it can > carry additional information without affecting its Boolean functionality > as a mask in ufuncs. > > Although I might use such a capability if it existed, to reduce the need > to have a separate flags array corresponding to a given data array, I > think that for my own purposes this is very low priority, and chances > are I would often use a separate flags array even if the underlying mask > were not restricted to Boolean. > > Array access needs to be distinguished from array exposure. If the access goes through getter/setter functions than the underlying representation can change. Whether or not that degree of abstraction is needed is another question, but it does make things more flexible. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Jul 1 17:23:34 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 1 Jul 2011 16:23:34 -0500 Subject: [Numpy-discussion] Missing/accumulating data In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 2:29 PM, Joe Harrington wrote: > Mark Wiebe : > > > With a non-boolean alpha mask, there's an implication of a > > multiplication operator in there somewhere, but with a boolean mask, > > the data can be any data whatsoever that doesn't necessarily support > > any kind of blending operations. > > My goal in raising the point is to find a common core that supports > everything. The benefit of the np.ma module is that you have > traditional numerical routines like median() and mean() that now > sensibly handle missing data, plus a data structure (the paired array > and mask) that you can use for other things of your own devising. 
All > that has to happen is to allow the sense of the mask to be FALSE = the > data are bad, TRUE = the data are good, and allow (not require) the > mask to be of any numerical type, or at least of integer type as well > as boolean. I believe that with these two basic requirements, > everyone's needs can be met. Note that you could still have boolean > masks, and could still have the bad=TRUE, good=FALSE of the current > np.ma module, if you had a flag to set in the dtype for what sense of > the mask you wanted. It could default to the current behavior if that > makes people happy/breaks the least code. > > > For the image accumulation you're describing, I would use either a > > structured array with 'color' and 'weight' fields, or have the last > > element of the color channel be the weight (like an RGBA image) so > > adding multiple weighted images together would add both the colors > > and the weights simultaneously, without requiring a ufunc extension > > supporting struct dtypes. > > Well, yes, we can always design a new data structure that meets our > needs, and write all the routines that will ever operate on them. But > we don't want that. We want to add a feature to the *old* data > structure (i.e., a numerical array of the basic data) that makes the > standard routines handle missing data sensibly so we don't have to > rewrite them to do so. > I've used this style of weighted image masking quite a bit, but I think it doesn't quite fit with the discrete nature of the NA missing value concepts. The NA idea works with any dtype, like datetime, but 50% of a datetime isn't a reasonable concept, hurting the idea of general dtypes + alpha masking. It's also incompatible with the SAS or Stata-style multiple NA values idea. -Mark > > --jh-- > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Fri Jul 1 17:46:53 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 1 Jul 2011 23:46:53 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 Message-ID: Hi, I am pleased to announce the availability (only a little later than planned) of the second release candidate of NumPy 1.6.1. This is a bugfix release, list of fixed bugs: #1834 einsum fails for specific shapes #1837 einsum throws nan or freezes python for specific array shapes #1838 object <-> structured type arrays regression #1851 regression for SWIG based code in 1.6.0 #1863 Buggy results when operating on array copied with astype() #1870 Fix corner case of object array assignment #1843 Py3k: fix error with recarray #1885 nditer: Error in detecting double reduction loop #1874 f2py: fix --include_paths bug #1749 Fix ctypes.load_library() If no new problems are reported, the final release will be in one week. Sources and binaries can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ Enjoy, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Jul 1 19:17:29 2011 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 1 Jul 2011 16:17:29 -0700 Subject: [Numpy-discussion] An NA compromise idea -- many-NA In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 2:04 PM, Pierre GM wrote: > Mask an array with NAs? You should be able to, as IGNORE<>NA. Mask an array > with a view? 
That's sharing the data with a different mask, you should be > able to, too (np.ma works like that). I think you might be getting the proposals mixed up... Charles is talking about the NEP design, which has no distinction between IGNORE and NA; there's just NA-because-of-mask and NA-because-of-bit-pattern, which behave the same way except that under certain special circumstances you can trick the NA-because-of-mask one into acting more like the masked arrays you're thinking of. (For instance, you can "unmask" an NA-because-of-mask by using the following algorithm: save a view of the original array before you ever add a mask to it. Then when you want to unmask a value in place, you make a copy of the current mask, flip the appropriate bit in the copy, and then make a new masked array by combining a new view of the original array with your new copy of the mask. Now you have a new array object that shares memory with the old array and has that value unmasked. IIUC.) > Sharing mask? That'd be great if we could... That way, there'd be almost > nothing left to do to adapt np.ma... I'm not sure if the NEP design supports sharing masks or not -- maybe you could just assign the same object to two different array's .validitymask properties, but that property has a lot of magic in it. I don't know if that would work like a normal 'a = b' assignment, or would actually be more like 'a[:] = b[:]'. In at least some versions of the NEP design, it was an explicit goal that it not be possible to access the mask's memory directly under any circumstances, because they wanted to keep the API agnostic between using a one-byte-per-boolean mask, versus a one-bit-per-boolean mask. If that's still true (the current text doesn't seem to say either way), then there can't be any API that lets you get any kind of numpy array view of the mask, and .validitymask might actually be a snapshot generated from scratch on each access, in which case the obvious 'a.validitymask = b.validitymask' definitely wouldn't work. I guess you could support sharing by defining an opaque 'mask' object that you can't peek inside, but can only take from one array and attach to another? In the alterNEP design, the .visible field is just an ordinary numpy array with some extra checking applied (to ensure that its shape matches, etc.), so sharing masks would just be a matter of assigning the same object to two different arrays. -- Nathaniel From tkgamble at windstream.net Fri Jul 1 20:38:03 2011 From: tkgamble at windstream.net (Thomas K Gamble) Date: Fri, 1 Jul 2011 18:38:03 -0600 Subject: [Numpy-discussion] broacasting question In-Reply-To: References: <201106301132.22357.tkgamble@windstream.net> Message-ID: <201107011838.03089.tkgamble@windstream.net> > On Thu, Jun 30, 2011 at 11:32 AM, Thomas K Gamble > > wrote: > > I'm trying to convert some IDL code to python/numpy and i'm having some > > trouble understanding the rules for boradcasting during some operations. > > example: > > > > given the following arrays: > > a = array((2048,3577), dtype=float) > > b = array((256,25088), dtype=float) > > c = array((2048,3136), dtype=float) > > d = array((2048,3136), dtype=float) > > > > do: > > a = b * c + d > > > > In IDL, the computation is done without complaint and all array sizes are > > preserved. In ptyhon I get a value error concerning broadcasting. I can > > force it to work by taking slices, but the resulting size would be a = > > (256x3136) rather than (2048x3577). 
I admit that I don't understand IDL > > (or > > python to be honest) well enough to know how it handles this to be able > > to replicate the result properly. Does it only operate on the smallest > > dimensions ignoring the larger indices leaving their values unchanged? > > Can someone explain this to me? > > I don't see a problem > > In [1]: datetime64('now') > Out[1]: numpy.datetime64('2011-07-01T07:18:35-0600') > > In [2]: a = array((2048, 3577), float) > > In [3]: b = array((256, 25088), float) > > In [4]: c = array((2048, 3136), float) > > In [5]: d = array((2048, 3136), float) > > In [6]: a = b*c + d > > In [7]: a > Out[7]: array([ 526336., 78679104.]) > > What is the '*' in your expression supposed to mean? My apologies for the errors in my example. It should have been: a = numpy.ndarray((2048,3577), dtype=float) b = numpy.ndarray((256,25088), dtype=float) c = numpy.ndarray((2048,3136), dtype=float) d = numpy.ndarray((2048,3136), dtype=float) The numbers are the array dimensions. Data values are not provided in the example. e = b * c + d f = a / b Both of these expressions result in value errors in python but IDL handles them without complaint. The * is a multiplication operator. IDL also stores its data in Fortran/column-major order, which causes some other issues. > > Chuck -- Thomas K. Gamble tkgamble at windstream.net From charlesr.harris at gmail.com Fri Jul 1 21:00:22 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 Jul 2011 19:00:22 -0600 Subject: [Numpy-discussion] broacasting question In-Reply-To: <201107011838.03089.tkgamble@windstream.net> References: <201106301132.22357.tkgamble@windstream.net> <201107011838.03089.tkgamble@windstream.net> Message-ID: On Fri, Jul 1, 2011 at 6:38 PM, Thomas K Gamble wrote: > > On Thu, Jun 30, 2011 at 11:32 AM, Thomas K Gamble > > > > wrote: > > > I'm trying to convert some IDL code to python/numpy and i'm having some > > > trouble understanding the rules for boradcasting during some > operations. > > > example: > > > > > > given the following arrays: > > > a = array((2048,3577), dtype=float) > > > b = array((256,25088), dtype=float) > > > c = array((2048,3136), dtype=float) > > > d = array((2048,3136), dtype=float) > > > > > > do: > > > a = b * c + d > > > > > > In IDL, the computation is done without complaint and all array sizes > are > > > preserved. In ptyhon I get a value error concerning broadcasting. I > can > > > force it to work by taking slices, but the resulting size would be a = > > > (256x3136) rather than (2048x3577). I admit that I don't understand > IDL > > > (or > > > python to be honest) well enough to know how it handles this to be able > > > to replicate the result properly. Does it only operate on the smallest > > > dimensions ignoring the larger indices leaving their values unchanged? > > > Can someone explain this to me? > > > > I don't see a problem > > > > In [1]: datetime64('now') > > Out[1]: numpy.datetime64('2011-07-01T07:18:35-0600') > > > > In [2]: a = array((2048, 3577), float) > > > > In [3]: b = array((256, 25088), float) > > > > In [4]: c = array((2048, 3136), float) > > > > In [5]: d = array((2048, 3136), float) > > > > In [6]: a = b*c + d > > > > In [7]: a > > Out[7]: array([ 526336., 78679104.]) > > > > What is the '*' in your expression supposed to mean? > > My apologies for the errors in my example. 
It should have been: > > a = numpy.ndarray((2048,3577), dtype=float) > b = numpy.ndarray((256,25088), dtype=float) > c = numpy.ndarray((2048,3136), dtype=float) > d = numpy.ndarray((2048,3136), dtype=float) > > The numbers are the array dimensions. Data values are not provided in the > example. > > Ah, what that does in make (2,) arrays with the given elements and '*' and '+' are element-wise multiplication and addition. To get arrays with the dimensions you need something like In [19]: a = numpy.zeros((2048,3577), dtype=float) In [20]: b = numpy.zeros((256,25088), dtype=float) In [21]: c = numpy.zeros((2048,3136), dtype=float) In [22]: d = numpy.zeros((2048,3136), dtype=float) However, broadcasting b and c won't work, it isn't enough that 256 divides 2048, 256 must actually equal 1, which you can omit since it is a leading index. Same with 3136 and 25088 except the one won't automagically be added. So you can do things like In [24]: numpy.zeros((5,7))*numpy.zeros((7,)) Out[24]: array([[ 0., 0., 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0., 0., 0.], [ 0., 0., 0., 0., 0., 0., 0.]]) second array will be broadcast over the first. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Jul 1 23:47:01 2011 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 1 Jul 2011 20:47:01 -0700 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: <4E0DF360.3090906@gmail.com> References: <4E0DF360.3090906@gmail.com> Message-ID: On Fri, Jul 1, 2011 at 9:18 AM, Bruce Southey wrote: > I am sorry that that is NOT true - DON'T just lump every one into this > when they have clearly stated the opposite! Missing values are nothing > special to me, just reality. There are many statistical applications > where masking is extremely common like outlier detection and flagging > unusual observations (missing values is also masking). Just that you as > a user have to do that yourself by creating and maintaining working > variables. Thanks for speaking up -- we all definitely want something that will work as well as possible for everyone! I'm a little confused about what you're saying, though -- I assume that you mean that you're happy with the NEP proposal for handling NA values[1], and so I misrepresented you when I said that everyone doing statistics with missing values had concerns about the NEP? If so, then my apologies. [1] https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e41aa/doc/neps/c-masked-array.rst > I really find that you are 'splitting hairs' in your arguments as it > really has to be up to the application on how missing values and NaN > have to be handled. I see no difference between a missing value and a > NaN because in virtually all statistical applications, both of these are > dropped. This is what SAS typically does although certain procedure like > FREQ allow you to treat missing values as 'valid'. R has slightly more > flexibility since it differentiates missing valves and NaN. R allows you > to decide how missing values are handled using arguments like na.rm or > using na.fail, na.omit, na.exclude, na.pass functions. ?But I think for > the majority of cases (I'm not an R guru), R acts the same way as, by > default (which is how most people use R) R excludes missing values and > NaN's. Is your point here that NA and NaN are pretty similar, so it's splitting hairs to differentiate them? 
They are pretty similar, but this is the justification I wrote for having both in the alterNEP (https://gist.github.com/1056379): "For floating point computations, NAs and NaNs have (almost?) identical behavior. But they represent different things -- NaN an invalid computation like 0/0, NA a value that is not available -- and distinguishing between these things is useful because in some situations they should be treated differently. (For example, an imputation procedure should replace NAs with imputed values, but probably should leave NaNs alone.) And anyway, we can't use NaNs for integers, or strings, or booleans, so we need NA anyway, and once we have NA support for all these types, we might as well support it for floating point too for consistency." Does that seem reasonable? In any case, my arguments haven't really been about NA versus NaN -- everyone seems to agree that we want something like NA. In the NEP proposal, there are two different versions of NAs, one that's implemented using special values (e.g., a special NaN that means NA) and one that's implemented by using a secondary mask array. My argument has been that for people who just want NAs, this secondary mask version is redundant and confusing; but the mask version doesn't really help the people who want "masked arrays" either, because it's working too hard to be compatible with NAs, and the masked array people want different behavior (unmasking, automatic skipping of NAs, etc.). So it doesn't really work well for anybody. -- Nathaniel From njs at pobox.com Sat Jul 2 00:03:48 2011 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 1 Jul 2011 21:03:48 -0700 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 9:29 AM, Benjamin Root wrote: > On Fri, Jul 1, 2011 at 11:20 AM, Matthew Brett > wrote: >> On Fri, Jul 1, 2011 at 5:17 PM, Benjamin Root wrote: >> > For more complicated functions like pcolor() and contour(), the arrays >> > needs >> > to know what the status of the neighboring points in itself, and for the >> > other arrays.? Again, either we use numpy.ma to share a common mask >> > across >> > the data arrays, or we implement our own semantics to deal with this. >> > And >> > again, we can not change any of the original data. >> > >> > This is not an obscure case.? This is existing code in matplotlib.? I >> > will >> > be evaluating the current missingdata branch later today to assess its >> > suitability for use in matplotlib. >> >> I think I missed why your case needs NA and IGNORE to use the same >> API. ?Why can't you just use masks and IGNORE here? > > The point is that matplotlib can not make assumptions about the nature of > the input data.? From matplotlib's perspective, NA's and IGNORE's are the > same thing and should be treated the same way (i.e. - skipped).? Right now, > matplotlib's code is messy and inconsistent with its treatment of masked > arrays and NaNs (some functions treat them the same, some only apply to NaNs > and vice versa).? This is because of code cruft over the years.? If we had > one interface to rule them all, we can bring *all* plotting functions to > have similar handling code and be more consistent across the board. 
Maybe I'm missing something, but it seems like no matter how the NA handling thing plays out, what you need is something like # For current numpy: def usable_points(a): a = np.asanyarray(a) usable = ~np.isnan(a) usable &= ~np.isinf(a) if isinstance(a, np.ma.masked_array): usable &= ~a.mask return usable def all_usable(a, *rest): usable = usable_points(a) for other in rest: usable &= usable_points(other) return usable And then you need to call all_usable from each of your plotting functions and away you go, yes? AFAICT, under the NEP proposal, in usable_points() you need to add a line like: usable &= ~np.isna(a) # NEP Under the alterNEP proposal, you need to add two lines, like usable &= ~np.isna(a) # alterNEP usable &= a.visible # alterNEP And either way, once you get your mask, you pretty much do the same thing: either use it directly, or use it to set up a masked array (of whatever flavor, and they all seem to work the same as far as this is concerned). You seem to see some way in which the alterNEP's separation of masks and NA handling makes a big difference to your architecture, but I'm not getting it :-(. -- Nathaniel From njs at pobox.com Sat Jul 2 00:40:51 2011 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 1 Jul 2011 21:40:51 -0700 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 9:29 AM, Christopher Jordan-Squire wrote: > This is kind of late to be jumping into the 'long thread of doom', but I've > been following most of the posts, so I'd figured I'd throw in my 2 cents. > I'm Mark's officemate over the summer, and we've been talking daily about > his design. I was skeptical of various details at first, but by now Mark's > largely sold me on his design. Though, FWIW, my background is largely > statistical uses of arrays rather than scientific uses, so I grok missing > data usage more naturally than masking. Always good to hear more perspectives! Thanks for speaking up. > I looked over the theoretical mode in the aNEP, and I disagree with it. I > think a masked array is just that: an array with a mask. Do whatever with > the mask, but it's up to the user to decide how they want to use it. It > doesn't seem like it has to come with a theoretical model. (Unlike missing > data, which comes which does have a nice theoretical model.) I'm not sure what you mean here. If we have masked array support at all (and some people seem to want it), then we have to say more than "it's an array with a mask". Indexing such a beast has to do *something*, so we need some kind of theory to say what, ufuncs have to do *something*, ditto. I mean, I guess we could just say that a masked array is literally an np.ndarray where you have attached a field named "mask" that doesn't do anything, but I don't think that would really satisfy most users :-). > The theoretical model in the aNEP seems to assume too much. I'm thinking in > particular of this idea: "a length-4 array in which the last value has been > masked out behaves just like an ordinary length-3 array, so long as you > don't change the mask." That's forcing a notion of column/position > independence on the masked array, in that any function operating on the rows > must treat each column the same. And I'm don't think that's part of the > contract that should come from creating a masked array. I'm really lost on what you mean by columns versus rows here. 
In that sentence I'm literally saying that these two 1-d arrays should behave the same: [1, 2, 3] [1, 2, 3, --] For example, we have to decide what np.sum should do on the second array. Well, this says that it should work like this: >>> np.sum(np.array([1, 2, 3, np.IGNORE])) 6 Why? Because that's what happens when we do this: >>> np.sum(np.array([1, 2, 3])) 6 There are other ways to think about how masked arrays should act, but this seemed like one plausible heuristic to put out there as a starting point. ...If you still have an objection, could you rephrase it? And any thoughts on how I could phrase that better? > I'm a statistics grad students and an R user, and I'm mostly ok with what > Mark is doing. > Currently, as I understand it, Mark is working on a structure that will make > missing data into a first class citizen in the numpy world. This is great! > Before it had been more of a 2nd class-citizen. And Mark is even trying to > copy R semantics as much as possible. Yes, It's wonderful! > It's true that Mark's making it so the masked part of these new arrays won't > be as front and center. The functionality will be there and it will be easy > to used. But it will be based more on an explicit contract that the data > memory contents of a masked array will not be overwritten when the data is > masked.?So I don't think Mark is making anything implicit--he's making a > very explicit contract about how the data memory is handled when the mask is > changed. > If I understand correctly, it seems like the main objection to Mark's > current API is that the explicit contract about data memory isn't somehow > immediately visible in the API. It's true this is a trade-off, but it leads > to a simpler API with easier ability to use all features at once at the > pretty small cost of the user just having to read enough to realize that > there's an explicit contract about what happens to the memory of a masked > value, and they can access it by taking a view. That's easy enough to add at > the very beginning of the documentation. I don't know about others, but my main objection is this: He's proposing two different implementations for NA. I only need one, so having two is redundant and confusing. Of these two, the bit-pattern one has lower memory overhead (which many people have spoken up to say matters to them), and really obvious semantics (assignment is implemented as assignment, etc.). So why force people to make this confusing choice? What does the mask implementation add? AFAICT, its only purpose is to satisfy a rather different set of use cases. (See Gary Strangman's email here for a good description of these use cases: http://www.mail-archive.com/numpy-discussion at scipy.org/msg32385.html) But AFAICT again, it's been crippled for those use cases in order to give it the NA semantics. So I just don't see who the masking part is supposed to help. BTW, you can't access the memory of a masked value by taking a view, at least if I'm reading this version of the NEP correctly, and it seems to be the latest: https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e41aa/doc/neps/c-masked-array.rst The only way to access the memory of a masked value is take a view *before* you mask it. And if the array has a mask at all when you take the view, you also have to set a.flags.ownmask = True, before you mask the value. 
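For contrast, a quick sketch of how today's numpy.ma behaves (current behaviour, not the NEP), where the underlying buffer stays reachable after masking without any view-before-masking dance:

import numpy as np

# Current numpy.ma: masking an element only touches the mask; the data
# buffer is left alone and remains visible through .data.
a = np.ma.masked_array([1.0, 2.0, 3.0])
a[1] = np.ma.masked
print(a)       # [1.0 -- 3.0]
print(a.data)  # [ 1.  2.  3.]  -- the masked value is still there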
-- Nathaniel From efiring at hawaii.edu Sat Jul 2 01:07:30 2011 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 01 Jul 2011 19:07:30 -1000 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: <4E0EA792.3040908@hawaii.edu> On 07/01/2011 06:40 PM, Nathaniel Smith wrote: > On Fri, Jul 1, 2011 at 9:29 AM, Christopher Jordan-Squire > BTW, you can't access the memory of a masked value by taking a view, > at least if I'm reading this version of the NEP correctly, and it > seems to be the latest: > https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e41aa/doc/neps/c-masked-array.rst No, to see the latest you need to go to pull request #99, I believe: https://github.com/numpy/numpy/pull/99 From there click the diff button, then select doc/neps/missing-data.rst, then "view file" to get to a formatted view of the whole file in its most recent form. You can also look at the history of the file there. c-masked-array.rst was renamed to missing-data.rst and editing continued. Eric From matthew.brett at gmail.com Sat Jul 2 07:54:57 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 2 Jul 2011 12:54:57 +0100 Subject: [Numpy-discussion] NEPaNEP lessons - was: alterNEP Message-ID: Hi, These are some notes about the masking NEP discussion, in the hope that they will be useful for thinking about the NEP and other discussions in the future. This is not a discussion about the masking API. I'm trying not to mix the mask discussion with the discussion-about-the-mask-discussion. Maybe some Points of agreement =============== 1) Blame the process not the people 2) It is useful to go back over controversial incidents to see how the current process works and how it can be improved. I believe that we as a community should aim for the discussion matching the following distinctive features of successful organizations [1] A) Calm, patient, ego-free leadership styles B) Vigorous discussion followed by group agreement C) Brutal realism There was considerable disagreement and some bad feeling during the discussion. The main axis of disagreement was between the alterNEP (Nathaniel, Matthew) and NEP camps (Mark). This was and is Mark's NEP, Mark is implementing it, and so Mark is and was in charge. Mark, as the person implementing, and in charge, has full authority on the final features. Now for areas of: Fruitful disagreement =============== Here I'm going to express my opinions in the hope that it is helpful. The main area for reflection should be the discussion of the alterNEP / NEP discussion. In the spirit of the Toyota 5 whys? [2] - let's ask why? Statement: the alterNEP / NEP discussion failed and led to bad feeling. Why1: Because we the aNEP party believed were were not being heard Why2: Because, when we asked for specific feedback, we did not get it Why3: Because the NEP party had in fact already decided to go for the NEP implementation Why4: Because Mark believed that he would lose implementation time by delaying for further discussion Why5: Because there was a belief that implementation was more important than discussion I think why5 are the most important of these. My personal belief here is that we should: a) Strengthen our commitment to full and open discussion before substantial API change. b) Be careful to state the timetable for future NEP discussion. If it is short for some reason, then we should be specific about that. 
I believe that will have the effect of i) Strengthening our community, so that it will be clear that each person making substantial comment will be fully heard. ii) Improving our code by improving the level of discussion. I think we must be particularly careful to avoid denial-by-delay [3] or the similar denial-by-no-reply, because these are both very toxic to openness and trust in discussion. Best, Matthew [1] http://en.wikipedia.org/wiki/Good_to_Great [2] http://en.wikipedia.org/wiki/5_Whys [3] http://thinkexist.com/quotation/delay_is_the_deadliest_form_of_denial/253524.html From mmueller at python-academy.de Sat Jul 2 08:29:01 2011 From: mmueller at python-academy.de (=?ISO-8859-15?Q?Mike_M=FCller?=) Date: Sat, 02 Jul 2011 14:29:01 +0200 Subject: [Numpy-discussion] PyCon DE 2011 - Call for Proposals extended to July 15, 2011 Message-ID: <4E0F0F0D.2000808@python-academy.de> PyCon DE 2011 - Deadline for Proposals extended to July 15, 2011 ================================================================ The deadline for talk proposals is extended to July 15, 2011. You would like to talk about your Python project to the German-speaking Python community? Just submit your proposal within the next two weeks: http://de.pycon.org/2011/speaker/ About PyCon DE 2011 ------------------- The first PyCon DE will be held October 4-9, 2011 in Leipzig, Germany. The conference language will be German. Talks in English are possible. Please contact us for details. The call for proposals is now open. Please submit your talk by June 30, 2011 online. There are two types of talks: standard talks (20 minutes + 5 minutes Q&A) and long talks (45 minutes + 10 minutes Q&A). More details about the call can be found on the PyCon DE website: http://de.pycon.org/2011/Call_for_Papers/ Since the conference language will be German, the call is in German too. PyCon DE 2011 - Neuer Einsendeschluss f?r Vortragsvorschl?ge 15.07.2011 ======================================================================= Noch bis zum 15.7.2011 kann jeder, der sich f?r Python interessiert, einen Vortragsvorschlag f?r die PyCon DE 2011 einreichen. Es gibt nur zwei Bedingungen: das Thema sollte interessant sein und etwas mit Python zu tun haben. F?r die erste deutsche Python-Konferenz sind wir an einer breiten Themenpalette interessiert, die das ganze Spektrum der Entwicklung, Nutzung und Wirkung von Python zeigt. M?gliche Themen sind zum Beispiel: * Webanwendungen mit Python * Contentmanagement mit Python * Datenbankanwendungen mit Python * Testen mit Python * Systemintegration mit Python * Python f?r gro?e Systeme * Python im Unternehmensumfeld * Pythonimplementierungen (Jython, IronPython, PyPy, Unladen Swallow und andere) * Python als erste Programmiersprache * Grafische Nutzerschnittstellen (GUIs) * Parallele Programmierung mit Python * Python im wissenschaftlichen Bereich (Bioinformatik, Numerik, Visualisierung und anderes) * Embedded Python * Marketing f?r Python * Python, Open Source und Entwickler-Gemeinschaft * Zuk?nftige Entwicklungen * mehr ... Ihr Themenbereich ist nicht aufgelistet, w?re aber aus Ihrer Sicht f?r die PyCon DE interessant? Kein Problem. Reichen Sie Ihren Vortragsvorschlag einfach ein. Auch wir k?nnen nicht alle Anwendungsbereiche von Python ?berschauen. Vortragstage sind vom 5. bis 7. Oktober 2011. Es gibt zwei Vortragsformate: * Standard-Vortrag -- 20 Minuten Vortrag + 5 Minuten Diskussion * Lang-Vortrag -- 45 Minuten Vortrag + 10 Minuten Diskussion Die Vortragszeit wird strikt eingehalten. 
Bitte testen Sie die L?nge Ihres Vortrags. Lassen Sie gegebenenfalls ein paar Folien weg. Die Vortragsprache ist Deutsch. In begr?ndeten Ausnahmef?llen k?nnen Vortr?ge auch auf Englisch gehalten werden. Bitte fragen Sie uns dazu. Bitte reichen Sie Ihren Vortrag auf der Konferenz-Webseite http://de.pycon.org bis zum 15.07.2011 ein. Wir entscheiden bis zum 31. Juli 2011 ?ber die Annahme des Vortrags. From jason-sage at creativetrax.com Sat Jul 2 08:29:17 2011 From: jason-sage at creativetrax.com (Jason Grout) Date: Sat, 02 Jul 2011 07:29:17 -0500 Subject: [Numpy-discussion] NEPaNEP lessons - was: alterNEP In-Reply-To: References: Message-ID: <4E0F0F1D.40400@creativetrax.com> On 7/2/11 6:54 AM, Matthew Brett wrote: > Why5: Because there was a belief that implementation was more > important than discussion I hesitate to jump into the discussion here, but it seems to me that Mark and others were making the point that beginning implementation *informs* the discussion in a very valuable way. In a case like this where it seems like the differences are fundamentally based on untested assumptions (e.g., "this would be confusing" or "the consistency would provide greater benefits than any confusion"), it seems that having an implementation to play around with is a very valuable thing. Release early and often, etc. Of course, it should also be pointed out that Mark and others are trying to have a conference call, where (as they said it) the communication bandwidth is greater, which hopefully would lead to more effective and clear communication. I see that as a very responsible thing to do, given the intensity of some of the feelings in this discussion. Thanks, Jason -- Jason Grout From matthew.brett at gmail.com Sat Jul 2 09:28:41 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 2 Jul 2011 14:28:41 +0100 Subject: [Numpy-discussion] NEPaNEP lessons - was: alterNEP In-Reply-To: <4E0F0F1D.40400@creativetrax.com> References: <4E0F0F1D.40400@creativetrax.com> Message-ID: Hi, On Sat, Jul 2, 2011 at 1:29 PM, Jason Grout wrote: > On 7/2/11 6:54 AM, Matthew Brett wrote: >> Why5: Because there was a belief that implementation was more >> important than discussion > > I hesitate to jump into the discussion here, but it seems to me that > Mark and others were making the point that beginning implementation > *informs* the discussion in a very valuable way. ?In a case like this > where it seems like the differences are fundamentally based on untested > assumptions (e.g., "this would be confusing" or "the consistency would > provide greater benefits than any confusion"), it seems that having an > implementation to play around with is a very valuable thing. ?Release > early and often, etc. There is of course a time to make a draft implementation, and there's a time to discuss the API in the abstract. Here the primary discussion I was trying to start was about why the discussion failed and led to bad feeling. > Of course, it should also be pointed out that Mark and others are trying > to have a conference call, where (as they said it) the communication > bandwidth is greater, which hopefully would lead to more effective and > clear communication. ?I see that as a very responsible thing to do, > given the intensity of some of the feelings in this discussion. While phone-calls are often good, I think it would be a mistake to diagnose this problem as primarily one of hurt feelings. This is what I meant about 'Blame the process not the people'. 
I also feel strongly that it is important to have substantial discussions on-list in order to strengthen community involvement and ownership [1]. I am hoping that, in discussing the process, it will become clear how we can improve the way we work in order to make discussion richer, calmer, and more effective. Best, Matthew [1] http://producingoss.com/en/setting-tone.html From njs at pobox.com Sat Jul 2 10:34:08 2011 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 2 Jul 2011 07:34:08 -0700 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: <4E0EA792.3040908@hawaii.edu> References: <4E0EA792.3040908@hawaii.edu> Message-ID: On Fri, Jul 1, 2011 at 10:07 PM, Eric Firing wrote: > On 07/01/2011 06:40 PM, Nathaniel Smith wrote: >> On Fri, Jul 1, 2011 at 9:29 AM, Christopher Jordan-Squire > >> BTW, you can't access the memory of a masked value by taking a view, >> at least if I'm reading this version of the NEP correctly, and it >> seems to be the latest: >> ? ?https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e41aa/doc/neps/c-masked-array.rst > > No, to see the latest you need to go to pull request #99, I believe: > https://github.com/numpy/numpy/pull/99 > ?From there click the diff button, then select > doc/neps/missing-data.rst, then "view file" to get to a formatted view > of the whole file in its most recent form. You can also look at the > history of the file there. ?c-masked-array.rst was renamed to > missing-data.rst and editing continued. Oh. Thanks for the link! Fortunately, I'm not seeing any changes that invalidate anything I've said here. The disappearance of .validitymask changes the details of my response earlier to Pierre, but not the content, I think. But sorry for the confusion. -- Nathaniel From ben.root at ou.edu Sat Jul 2 16:10:31 2011 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 2 Jul 2011 15:10:31 -0500 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 11:40 PM, Nathaniel Smith wrote: > > I'm not sure what you mean here. If we have masked array support at > all (and some people seem to want it), then we have to say more than > "it's an array with a mask". Indexing such a beast has to do > *something*, so we need some kind of theory to say what, ufuncs have > to do *something*, ditto. I mean, I guess we could just say that a > masked array is literally an np.ndarray where you have attached a > field named "mask" that doesn't do anything, but I don't think that > would really satisfy most users :-). > > Indexing a masked array just returns an array with np.NA in the appropriate elements. This is no different than with regular ndarray objects or in numpy.ma. As for ufuncs, the NEP already addresses this in multiple ways. For element-wise ufuncs, a "where" parameter is available for indicating which elements to skip. For reduction ufuncs, a "skipna" parameter will indicate whether or not to skip the values. On top of that, subclassed ndarrays (such as numpy.ma, I guess) can create a __ufunc_wrap__ function that can set a default value for those parameters to make things easier for masked array users. I don't know about others, but my main objection is this: He's > proposing two different implementations for NA. I only need one, so > having two is redundant and confusing. 
Of these two, the bit-pattern > one has lower memory overhead (which many people have spoken up to say > matters to them), and really obvious semantics (assignment is > implemented as assignment, etc.). So why force people to make this > confusing choice? What does the mask implementation add? AFAICT, its > only purpose is to satisfy a rather different set of use cases. (See > Gary Strangman's email here for a good description of these use cases: > http://www.mail-archive.com/numpy-discussion at scipy.org/msg32385.html) > But AFAICT again, it's been crippled for those use cases in order to > give it the NA semantics. So I just don't see who the masking part is > supposed to help. > > As a user of numpy.ma, masked arrays have always been a second-class citizen to me. Developing new code with it always brought about new surprises and discoveries of strange behavior from various functions. In this sense, numpy.ma has always been crippled. By sacrificing *some* of the existing semantics (which would likely be taken care of by a re-implemented numpy.mato preserve backwards-compatibility), the masked array community gains a first-class citizen in numpy, and numpy developers will have the masked/missing data issue in the forefront whenever developing new functions and libraries. I am more than happy with that trade-off. I am willing to learn to semantics so long as I have a guarantee that the functions I use behaves the way I expect them to. > BTW, you can't access the memory of a masked value by taking a view, > at least if I'm reading this version of the NEP correctly, and it > seems to be the latest: > > https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e41aa/doc/neps/c-masked-array.rst > The only way to access the memory of a masked value is take a view > *before* you mask it. And if the array has a mask at all when you take > the view, you also have to set a.flags.ownmask = True, before you mask > the value. > This isn't actually as bad as it sounds. From a function's perspective, it should only know the values that it has been given access to. If I -- as a user of said function -- decide that certain values should be unknown to the function, I wouldn't want the function to be able to override that decision. Remember, it is possible that the masked element never was initialized. Therefore, we wouldn't want the function to use that element. (Note, this is one of those "fun" surprises that a numpy.ma user sometimes encounters when a function uses np.asarray instead of np.asanyarray). Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sat Jul 2 22:35:09 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 2 Jul 2011 22:35:09 -0400 Subject: [Numpy-discussion] alterNEP - was: missing data discussion round 2 In-Reply-To: References: Message-ID: On Sat, Jul 2, 2011 at 4:10 PM, Benjamin Root wrote: > On Fri, Jul 1, 2011 at 11:40 PM, Nathaniel Smith wrote: >> >> I'm not sure what you mean here. If we have masked array support at >> all (and some people seem to want it), then we have to say more than >> "it's an array with a mask". Indexing such a beast has to do >> *something*, so we need some kind of theory to say what, ufuncs have >> to do *something*, ditto. I mean, I guess we could just say that a >> masked array is literally an np.ndarray where you have attached a >> field named "mask" that doesn't do anything, but I don't think that >> would really satisfy most users :-). 
>> > > Indexing a masked array just returns an array with np.NA in the appropriate > elements.? This is no different than with regular ndarray objects or in > numpy.ma.? As for ufuncs, the NEP already addresses this in multiple ways. > For element-wise ufuncs, a "where" parameter is available for indicating > which elements to skip.? For reduction ufuncs, a "skipna" parameter will > indicate whether or not to skip the values.? On top of that, subclassed > ndarrays (such as numpy.ma, I guess) can create a __ufunc_wrap__ function > that can set a default value for those parameters to make things easier for > masked array users. > >> I don't know about others, but my main objection is this: He's >> proposing two different implementations for NA. I only need one, so >> having two is redundant and confusing. Of these two, the bit-pattern >> one has lower memory overhead (which many people have spoken up to say >> matters to them), and really obvious semantics (assignment is >> implemented as assignment, etc.). So why force people to make this >> confusing choice? What does the mask implementation add? AFAICT, its >> only purpose is to satisfy a rather different set of use cases. (See >> Gary Strangman's email here for a good description of these use cases: >> http://www.mail-archive.com/numpy-discussion at scipy.org/msg32385.html) >> But AFAICT again, it's been crippled for those use cases in order to >> give it the NA semantics. So I just don't see who the masking part is >> supposed to help. >> > > As a user of numpy.ma, masked arrays have always been a second-class citizen > to me. Developing new code with it always brought about new surprises and > discoveries of strange behavior from various functions. In this sense, > numpy.ma has always been crippled.? By sacrificing *some* of the existing > semantics (which would likely be taken care of by a re-implemented numpy.ma > to preserve backwards-compatibility), the masked array community gains a > first-class citizen in numpy, and numpy developers will have the > masked/missing data issue in the forefront whenever developing new functions > and libraries.? I am more than happy with that trade-off.? I am willing to > learn to semantics so long as I have a guarantee that the functions I use > behaves the way I expect them to. > >> >> BTW, you can't access the memory of a masked value by taking a view, >> at least if I'm reading this version of the NEP correctly, and it >> seems to be the latest: >> >> ?https://github.com/m-paradox/numpy/blob/4afdb2768c4bb8cfe47c21154c4c8ca5f85e41aa/doc/neps/c-masked-array.rst >> The only way to access the memory of a masked value is take a view >> *before* you mask it. And if the array has a mask at all when you take >> the view, you also have to set a.flags.ownmask = True, before you mask >> the value. > > This isn't actually as bad as it sounds.? From a function's perspective, it > should only know the values that it has been given access to.? If I -- as a > user of said function -- decide that certain values should be unknown to the > function, I wouldn't want the function to be able to override that > decision.? Remember, it is possible that the masked element never was > initialized.? Therefore, we wouldn't want the function to use that element. > (Note, this is one of those "fun" surprises that a numpy.ma user sometimes > encounters when a function uses np.asarray instead of np.asanyarray). 
But as far as I understand this takes away the ability to temporarily fill in the masked values with values that are neutral for a calculation, e.g. zero when taking a sum or dot product. Instead it looks like a copy of the array has to be made in the new version. (I'm thinking more correlate, convolution, linalg, scipy.signal, not simple ufuncs. In many cases new arrays might be created anyway so the loss from getting a copy of the non-NA data might not be so severe.) I guess the "fun" surprises will remain fun since most function in scipy or other libraries won't suddenly learn how to handle masked arrays or NAs. What happens if you feed the new animals to linalg.svd, or linalg.inv or fft ... that are all designed for asarray and not for asanyarray? Josef > > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From jh at physics.ucf.edu Mon Jul 4 00:03:29 2011 From: jh at physics.ucf.edu (Joe Harrington) Date: Mon, 04 Jul 2011 00:03:29 -0400 Subject: [Numpy-discussion] Missing/accumulating data In-Reply-To: (numpy-discussion-request@scipy.org) References: Message-ID: Christopher Barker, Ph.D. wrote > quick note on this: I like the "FALSE == good" way, because: So, you like to have multiple different kinds of masked, but I need multiple good values for counts. We could do it with negative masks and positive counts, but that doesn't reduce to a boolean for whoever has the negatives. We could have separate arrays, one for masks and one for counts, with both being optional. That's harder to implement and may be slower, but there's precedent: Spacecraft data are given to the investigator with several images per "data collection event". One is the actual image, another is the uncertainties per pixel, a third and often fourth are 32-bit bitmasks for error codes. There can be a dozen of these (raw data, permanently bad pixel mask, etc.). Chuck Harris wrote: > Array access needs to be distinguished from array exposure. If the access > goes through getter/setter functions than the underlying representation can > change. Whether or not that degree of abstraction is needed is another > question, but it does make things more flexible. Well, I've never been excited about data structures so complicated you can't manipulate them directly. In teaching about data analysis, we work hard to teach students *not* to stuff things into black boxes and ignore what's really going on. Too much abstraction is hard to think about, if you're used to dealing with data directly yourself. Mark Weibe wrote: > The NA idea works with any dtype, like datetime, but 50% of a datetime isn't > a reasonable concept, hurting the idea of general dtypes + alpha masking. Yes, that is correct. You shouldn't use an integer mask array with a struct, you should use a boolean. If you do use an int, you should get an error. I think the error would not cause too much confusion, since it's obvious you shouldn't do that. What I'm getting from all this discussion is that there's not much consensus on an ancillary or masked datatype. Someone could select one of these options by fiat and make a small subset of the community happy, but if it isn't solving a big problem for a lot of people, it probably shouldn't be in the core, especially if a general solution might be possible in the future with a little more thought. 
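(To make the spacecraft-style layout above concrete, here is a rough
sketch of the separate-arrays idea; the array shapes and flag names are
invented for illustration:

import numpy as np

ny, nx = 1024, 1024
image  = np.zeros((ny, nx), dtype=np.float32)   # the actual data
uncert = np.zeros((ny, nx), dtype=np.float32)   # per-pixel uncertainties
flags  = np.zeros((ny, nx), dtype=np.uint32)    # 32-bit error-code bitmask

BADPIX, SATURATED = 0x1, 0x2                    # made-up flag bits
flags[100, 200] |= SATURATED
good = (flags == 0)                             # True where no error bits are set

Nothing fancy, but it is the sort of thing I mean by keeping the mask,
count and flag information in ordinary arrays sitting next to the data.)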
--jh-- From davide.lasagna at polito.it Mon Jul 4 09:38:58 2011 From: davide.lasagna at polito.it (Davide) Date: Mon, 04 Jul 2011 15:38:58 +0200 Subject: [Numpy-discussion] Broadcasting shape mismatch exception Message-ID: <4E11C272.4070907@polito.it> Hi, The exception which is currently, (v1.6), raised when two non broadcastable arrays are summed is a ValueError exception. Wouldn't it be better to create a specific exception class, e.g. BroadcastError, to be more specific and give better control in exception catching? Just a suggestion, Davide From charlesr.harris at gmail.com Mon Jul 4 09:43:48 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 4 Jul 2011 07:43:48 -0600 Subject: [Numpy-discussion] Broadcasting shape mismatch exception In-Reply-To: <4E11C272.4070907@polito.it> References: <4E11C272.4070907@polito.it> Message-ID: On Mon, Jul 4, 2011 at 7:38 AM, Davide wrote: > Hi, > > The exception which is currently, (v1.6), raised when two non > broadcastable arrays are summed is a ValueError exception. Wouldn't it > be better to create a specific exception class, e.g. BroadcastError, to > be more specific and give better control in exception catching? > > Sounds like a good idea. Open a ticket. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From rainexpected at theo.to Mon Jul 4 22:13:07 2011 From: rainexpected at theo.to (Ted To) Date: Mon, 04 Jul 2011 22:13:07 -0400 Subject: [Numpy-discussion] Conditional random variables Message-ID: <4E127333.4010903@theo.to> Hi, Is there an easy way to make random draws from a conditional random variable? E.g., draw a random variable, x conditional on x>=\bar x. Thank you, Ted To From mwwiebe at gmail.com Tue Jul 5 09:12:58 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 5 Jul 2011 08:12:58 -0500 Subject: [Numpy-discussion] conference call / gotomeeting to discuss the missing data design In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 2:22 PM, Mark Wiebe wrote: > The missing data thread has gotten a bit heated, and after sitting down > with Travis to discuss the issues a bit, we've concluded that it would be > nice to do a call with everyone who's interested in the discussion with > better communication bandwidth. There are lots of good ideas out there, and > it is very easy for things to get lost when we're just emailing. Getting on > the phone should provide a more effective way to ensure everyone is properly > being heard. > > We're proposing to set up a GotoMeeting call at 4pm CST today. Please > respond if you can make it and your level of interest. I've created a Doodle > where you can indicate your availability if 4pm today is too short notice, > and we should schedule for a different time: > > http://www.doodle.com/eu9k3xip47a6gnue > I hope everyone had a great weekend. Thanks to all who filled in the doodle, we have a unanimous winning time of 2PM central time today. I'll post with details about how to connect to the call when that has been prepared. Cheers, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Tue Jul 5 09:45:31 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 05 Jul 2011 09:45:31 -0400 Subject: [Numpy-discussion] custom atlas Message-ID: I thought I'd try to speed up numpy on my fedora system by rebuilding the atlas package so it would be tuned for my machine. 
But when I do: rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec it fails with: res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER REPS. A bit of googling has not revealed a solution. Any hints? From charlesr.harris at gmail.com Tue Jul 5 09:55:23 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Jul 2011 07:55:23 -0600 Subject: [Numpy-discussion] custom atlas In-Reply-To: References: Message-ID: On Tue, Jul 5, 2011 at 7:45 AM, Neal Becker wrote: > I thought I'd try to speed up numpy on my fedora system by rebuilding the > atlas > package so it would be tuned for my machine. But when I do: > > rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec > > it fails with: > > res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER REPS. > > A bit of googling has not revealed a solution. Any hints? > > > I've never seen that, OTOH, I haven't built ATLAS in the last few years. Do you have all the power saving/frequency changing options turned off? What version of ATLAS are you using? What CPU? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Tue Jul 5 10:13:54 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 05 Jul 2011 10:13:54 -0400 Subject: [Numpy-discussion] custom atlas References: Message-ID: Charles R Harris wrote: > On Tue, Jul 5, 2011 at 7:45 AM, Neal Becker wrote: > >> I thought I'd try to speed up numpy on my fedora system by rebuilding the >> atlas >> package so it would be tuned for my machine. But when I do: >> >> rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec >> >> it fails with: >> >> res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER REPS. >> >> A bit of googling has not revealed a solution. Any hints? >> >> >> > I've never seen that, OTOH, I haven't built ATLAS in the last few years. Do > you have all the power saving/frequency changing options turned off? What > version of ATLAS are you using? What CPU? > > Chuck Ah, hadn't tried turing off cpuspeed. Try again... nope same error. 2 cpus, each: model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz stepping : 11 cpu MHz : 800.000 << that's what it says @idle cache size : 4096 KB From josef.pktd at gmail.com Tue Jul 5 10:17:01 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 5 Jul 2011 10:17:01 -0400 Subject: [Numpy-discussion] Conditional random variables In-Reply-To: <4E127333.4010903@theo.to> References: <4E127333.4010903@theo.to> Message-ID: On Mon, Jul 4, 2011 at 10:13 PM, Ted To wrote: > Hi, > > Is there an easy way to make random draws from a conditional random > variable? ?E.g., draw a random variable, x conditional on x>=\bar x. If you mean here truncated distribution, then I asked a similar question on the scipy user list a month ago for the normal distribution. The answer was use rejection sampling, Gibbs or MCMC. I just sample from the original distribution and throw away those values that are not in the desired range. This works fine if there is only a small truncation, but not so well for distribution with support only in the tails. It's reasonably fast for distributions that numpy.random produces relatively fast. (Having a bi- or multi-variate distribution and sampling y conditional on given x sounds more "fun".) 
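Roughly, that rejection step is just (minimal sketch, standard normal
with a made-up cutoff xbar; only worth doing when the truncation is mild):

import numpy as np

def truncated_normal(xbar, size):
    # draw standard normals and throw away everything below the cutoff
    kept = np.empty(0)
    while kept.size < size:
        draws = np.random.standard_normal(size)
        kept = np.concatenate([kept, draws[draws >= xbar]])
    return kept[:size]

samples = truncated_normal(1.5, 1000)   # 1000 draws of x | x >= 1.5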
Josef > > Thank you, > Ted To > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From rainexpected at theo.to Tue Jul 5 10:33:29 2011 From: rainexpected at theo.to (Ted To) Date: Tue, 05 Jul 2011 10:33:29 -0400 Subject: [Numpy-discussion] Conditional random variables In-Reply-To: References: <4E127333.4010903@theo.to> Message-ID: <4E1320B9.3020007@theo.to> On 07/05/2011 10:17 AM, josef.pktd at gmail.com wrote: > On Mon, Jul 4, 2011 at 10:13 PM, Ted To wrote: >> Hi, >> >> Is there an easy way to make random draws from a conditional random >> variable? E.g., draw a random variable, x conditional on x>=\bar x. > > If you mean here truncated distribution, then I asked a similar > question on the scipy user list a month ago for the normal > distribution. > > The answer was use rejection sampling, Gibbs or MCMC. > > I just sample from the original distribution and throw away those > values that are not in the desired range. This works fine if there is > only a small truncation, but not so well for distribution with support > only in the tails. It's reasonably fast for distributions that > numpy.random produces relatively fast. > > (Having a bi- or multi-variate distribution and sampling y conditional > on given x sounds more "fun".) Yes, that is what I had been doing but in some cases my truncations moves into the upper tail and it takes an extraordinary amount of time. I found that I could use scipy.stats.truncnorm but I haven't yet figured out how to use it for a joint distribution. E.g., I have 2 normal rv's X and Y from which I would like to draw X and Y where X+Y>= U. Any suggestions? Cheers, Ted To From charlesr.harris at gmail.com Tue Jul 5 10:37:01 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Jul 2011 08:37:01 -0600 Subject: [Numpy-discussion] custom atlas In-Reply-To: References: Message-ID: On Tue, Jul 5, 2011 at 8:13 AM, Neal Becker wrote: > Charles R Harris wrote: > > > On Tue, Jul 5, 2011 at 7:45 AM, Neal Becker wrote: > > > >> I thought I'd try to speed up numpy on my fedora system by rebuilding > the > >> atlas > >> package so it would be tuned for my machine. But when I do: > >> > >> rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec > >> > >> it fails with: > >> > >> res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER > REPS. > >> > >> A bit of googling has not revealed a solution. Any hints? > >> > >> > >> > > I've never seen that, OTOH, I haven't built ATLAS in the last few years. > Do > > you have all the power saving/frequency changing options turned off? What > > version of ATLAS are you using? What CPU? > > > > Chuck > > Ah, hadn't tried turing off cpuspeed. Try again... nope same error. > > 2 cpus, each: > model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz > stepping : 11 > cpu MHz : 800.000 << that's what it says @idle > You haven't got cpu frequency scaling under control. Linux? Depending on the distro you can write to a file in /sys (for each cpu) or run a program to make the setting, or click on a panel applet. Sometimes the scaling is set in the bios also. Google is your friend here. I have $charris at f13 ~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor ondemand And what you want to see is performance instead of ondemand. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Tue Jul 5 11:06:36 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 Jul 2011 09:06:36 -0600 Subject: [Numpy-discussion] custom atlas In-Reply-To: References: Message-ID: On Tue, Jul 5, 2011 at 8:37 AM, Charles R Harris wrote: > > > On Tue, Jul 5, 2011 at 8:13 AM, Neal Becker wrote: > >> Charles R Harris wrote: >> >> > On Tue, Jul 5, 2011 at 7:45 AM, Neal Becker >> wrote: >> > >> >> I thought I'd try to speed up numpy on my fedora system by rebuilding >> the >> >> atlas >> >> package so it would be tuned for my machine. But when I do: >> >> >> >> rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec >> >> >> >> it fails with: >> >> >> >> res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER >> REPS. >> >> >> >> A bit of googling has not revealed a solution. Any hints? >> >> >> >> >> >> >> > I've never seen that, OTOH, I haven't built ATLAS in the last few years. >> Do >> > you have all the power saving/frequency changing options turned off? >> What >> > version of ATLAS are you using? What CPU? >> > >> > Chuck >> >> Ah, hadn't tried turing off cpuspeed. Try again... nope same error. >> >> 2 cpus, each: >> model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz >> stepping : 11 >> cpu MHz : 800.000 << that's what it says @idle >> > > You haven't got cpu frequency scaling under control. Linux? Depending on > the distro you can write to a file in /sys (for each cpu) or run a program > to make the setting, or click on a panel applet. Sometimes the scaling is > set in the bios also. Google is your friend here. I have > > $charris at f13 ~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor > ondemand > > And what you want to see is performance instead of ondemand. > > Here's some good info . Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Jul 5 11:07:21 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 5 Jul 2011 11:07:21 -0400 Subject: [Numpy-discussion] Conditional random variables In-Reply-To: <4E1320B9.3020007@theo.to> References: <4E127333.4010903@theo.to> <4E1320B9.3020007@theo.to> Message-ID: On Tue, Jul 5, 2011 at 10:33 AM, Ted To wrote: > On 07/05/2011 10:17 AM, josef.pktd at gmail.com wrote: >> On Mon, Jul 4, 2011 at 10:13 PM, Ted To wrote: >>> Hi, >>> >>> Is there an easy way to make random draws from a conditional random >>> variable? ?E.g., draw a random variable, x conditional on x>=\bar x. >> >> If you mean here truncated distribution, then I asked a similar >> question on the scipy user list a month ago for the normal >> distribution. >> >> The answer was use rejection sampling, Gibbs or MCMC. >> >> I just sample from the original distribution and throw away those >> values that are not in the desired range. This works fine if there is >> only a small truncation, but not so well for distribution with support >> only in the tails. It's reasonably fast for distributions that >> numpy.random produces relatively fast. >> >> (Having a bi- or multi-variate distribution and sampling y conditional >> on given x sounds more "fun".) > > Yes, that is what I had been doing but in some cases my truncations > moves into the upper tail and it takes an extraordinary amount of time. > ?I found that I could use scipy.stats.truncnorm but I haven't yet > figured out how to use it for a joint distribution. ?E.g., I have 2 > normal rv's X and Y from which I would like to draw X and Y where X+Y>= U. > > Any suggestions? 
If you only need to sample the sum Z=X+Y, then it would be just a univariate normal again (in Z). For the general case, I'm at least a month away from being able to sample from a generic multivariate distribution. There is an integral transform that does recursive conditioning y|x. (like F^{-1} transform for multivariate distributions, used for example for copulas) For example sample x>=U and then sample y>=u-x. That's two univariate normal samples. Another trick I used for the tail is to take the absolute value around the mean, because of symmetry you get twice as many valid samples. I also never tried importance sampling and the other biased sampling procedures. If you find something, then I'm also interested in a solution. Cheers, Josef > > Cheers, > Ted To > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From rainexpected at theo.to Tue Jul 5 12:26:17 2011 From: rainexpected at theo.to (Ted To) Date: Tue, 05 Jul 2011 12:26:17 -0400 Subject: [Numpy-discussion] Conditional random variables In-Reply-To: References: <4E127333.4010903@theo.to> <4E1320B9.3020007@theo.to> Message-ID: <4E133B29.1020103@theo.to> On 07/05/2011 11:07 AM, josef.pktd at gmail.com wrote: > For example sample x>=U and then sample y>=u-x. That's two univariate > normal samples. Ah, that's what I was looking for! Many thanks! From Chris.Barker at noaa.gov Tue Jul 5 12:34:41 2011 From: Chris.Barker at noaa.gov (Chris Barker) Date: Tue, 05 Jul 2011 09:34:41 -0700 Subject: [Numpy-discussion] Missing/accumulating data In-Reply-To: References: Message-ID: <4E133D21.8090401@noaa.gov> On 7/3/11 9:03 PM, Joe Harrington wrote: > Christopher Barker, Ph.D. wrote >> quick note on this: I like the "FALSE == good" way, because: > > So, you like to have multiple different kinds of masked, but I need > multiple good values for counts. fair enough, maybe there isn't a consensus about what is best, or most common, interpretation. However, I was thinking less "different kinds of masks" than, "something special" -- so if there is ANY additional information about a given element, it has a non-zero value. so less "FALSE == good", then "FALSE == raw_value" seems like the cleanest way to do it. That having been said, I generally DON'T like the "zero is false" convention -- I wish that Python actually required a Boolean where one was called, for, rather that being able to pass in zero or any-other-value. Speaking of which, would we make the NA value be false? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From mwwiebe at gmail.com Tue Jul 5 12:43:09 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 5 Jul 2011 11:43:09 -0500 Subject: [Numpy-discussion] Missing/accumulating data In-Reply-To: <4E133D21.8090401@noaa.gov> References: <4E133D21.8090401@noaa.gov> Message-ID: On Tue, Jul 5, 2011 at 11:34 AM, Chris Barker wrote: > On 7/3/11 9:03 PM, Joe Harrington wrote: > > Christopher Barker, Ph.D. wrote > >> quick note on this: I like the "FALSE == good" way, because: > > > > So, you like to have multiple different kinds of masked, but I need > > multiple good values for counts. > > fair enough, maybe there isn't a consensus about what is best, or most > common, interpretation. 
> > However, I was thinking less "different kinds of masks" than, "something > special" -- so if there is ANY additional information about a given > element, it has a non-zero value. > > so less "FALSE == good", then "FALSE == raw_value" > > seems like the cleanest way to do it. > > That having been said, I generally DON'T like the "zero is false" > convention -- I wish that Python actually required a Boolean where one > was called, for, rather that being able to pass in zero or any-other-value. > > Speaking of which, would we make the NA value be false? > For booleans, it works out like this: http://en.wikipedia.org/wiki/Ternary_logic#Kleene_logic In R, trying to test the truth value of NA ("if (NA) ...") raises an exception. Adopting this behavior seems reasonable to me. -Mark > -Chris > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Tue Jul 5 12:45:30 2011 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 5 Jul 2011 11:45:30 -0500 Subject: [Numpy-discussion] Missing/accumulating data In-Reply-To: <4E133D21.8090401@noaa.gov> References: <4E133D21.8090401@noaa.gov> Message-ID: On Tue, Jul 5, 2011 at 11:34 AM, Chris Barker wrote: > > Speaking of which, would we make the NA value be false? > > The NEP currently states that accessing np.NA as a boolean will act as an error. However, logical_and([NA, False]) == False and logical_or([NA, True]) will be special-cased. This does raise the question... how should np.any() and np.all() behave? Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Tue Jul 5 13:07:52 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 5 Jul 2011 12:07:52 -0500 Subject: [Numpy-discussion] Missing/accumulating data In-Reply-To: References: <4E133D21.8090401@noaa.gov> Message-ID: On Tue, Jul 5, 2011 at 11:45 AM, Benjamin Root wrote: > > On Tue, Jul 5, 2011 at 11:34 AM, Chris Barker wrote: > >> >> Speaking of which, would we make the NA value be false? >> >> > The NEP currently states that accessing np.NA as a boolean will act as an > error. However, logical_and([NA, False]) == False and logical_or([NA, > True]) will be special-cased. > > This does raise the question... how should np.any() and np.all() behave? > I've added a paragraph/examples for this case to the NEP in pull request 99. -Mark > > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Tue Jul 5 13:36:35 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 5 Jul 2011 13:36:35 -0400 Subject: [Numpy-discussion] Conditional random variables In-Reply-To: <4E133B29.1020103@theo.to> References: <4E127333.4010903@theo.to> <4E1320B9.3020007@theo.to> <4E133B29.1020103@theo.to> Message-ID: On Tue, Jul 5, 2011 at 12:26 PM, Ted To wrote: > On 07/05/2011 11:07 AM, josef.pktd at gmail.com wrote: >> For example sample x>=U and then sample y>=u-x. That's two univariate >> normal samples. > > Ah, that's what I was looking for! ?Many thanks! just in case I wasn't clear, if x and y are correlated, then y: y>u-x needs to be sampled from the conditional distribution y|x http://en.wikipedia.org/wiki/Multivariate_normal_distribution#Conditional_distributions Josef > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ndbecker2 at gmail.com Tue Jul 5 13:39:35 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 05 Jul 2011 13:39:35 -0400 Subject: [Numpy-discussion] custom atlas References: Message-ID: Charles R Harris wrote: > On Tue, Jul 5, 2011 at 8:37 AM, Charles R Harris > wrote: > >> >> >> On Tue, Jul 5, 2011 at 8:13 AM, Neal Becker wrote: >> >>> Charles R Harris wrote: >>> >>> > On Tue, Jul 5, 2011 at 7:45 AM, Neal Becker >>> wrote: >>> > >>> >> I thought I'd try to speed up numpy on my fedora system by rebuilding >>> the >>> >> atlas >>> >> package so it would be tuned for my machine. But when I do: >>> >> >>> >> rpmbuild -ba -D 'enable_native_atlas 1' atlas.spec >>> >> >>> >> it fails with: >>> >> >>> >> res/zgemvN_5000_100 : VARIATION EXCEEDS TOLERENCE, RERUN WITH HIGHER >>> REPS. >>> >> >>> >> A bit of googling has not revealed a solution. Any hints? >>> >> >>> >> >>> >> >>> > I've never seen that, OTOH, I haven't built ATLAS in the last few years. >>> Do >>> > you have all the power saving/frequency changing options turned off? >>> What >>> > version of ATLAS are you using? What CPU? >>> > >>> > Chuck >>> >>> Ah, hadn't tried turing off cpuspeed. Try again... nope same error. >>> >>> 2 cpus, each: >>> model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz >>> stepping : 11 >>> cpu MHz : 800.000 << that's what it says @idle >>> >> >> You haven't got cpu frequency scaling under control. Linux? Depending on >> the distro you can write to a file in /sys (for each cpu) or run a program >> to make the setting, or click on a panel applet. Sometimes the scaling is >> set in the bios also. Google is your friend here. I have >> >> $charris at f13 ~$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor >> ondemand >> >> And what you want to see is performance instead of ondemand. >> >> > Here's some good info . > > Chuck Thanks! Good info. But same result. # service cpuspeed stop # echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor # echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor build stopped exactly same as before. From njs at pobox.com Tue Jul 5 13:52:28 2011 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 5 Jul 2011 10:52:28 -0700 Subject: [Numpy-discussion] NEPaNEP lessons - was: alterNEP In-Reply-To: References: <4E0F0F1D.40400@creativetrax.com> Message-ID: On Sat, Jul 2, 2011 at 6:28 AM, Matthew Brett wrote: > Here the primary discussion I was trying to start was about why the > discussion failed and led to bad feeling. 
Well, I have a hypothesis, don't know if it's true. It goes like this: Most of the time, when one of us decides to take the trouble to try and implement some change to the numpy core, it's because we really want to be able to take advantage of that change in our own work. This has two consequences: (a) it's only worth bothering if we can make sure that the resulting code is really useful to us. So we're really motivated to make sure we nail at least one use case. (b) we don't get any benefit from our work unless the code actually gets merged. So we're really motivated to build consensus and convince other people they really want our code too, because otherwise it'll probably get dropped. In this case, though, Mark got asked to write some code as part of his job. Making commercial development and FOSS mix has this notorious habit of going off the rails despite everyone having the best of intentions, and I wonder if that was part of the problem here. If Travis hired me to implement some feature demanded by the community, then I wouldn't feel the same urgency to really make sure that everyone was on board before investing my time. And I wouldn't have the same urgency to make sure that it really nailed my use cases, because that wouldn't be so central to my motivation for doing the work. And on a limited-length contract, I'd have more urgency to get something done quick. As it is, I don't want to waste this opportunity enabled by Mark's time and Enthought's money, but I do care a lot more about getting a good result than I do about making something happen this month -- because I'll have to work with, support, and teach people about whatever we come up with for the next however many years, and that weighs a lot more heavily in my calculations. Hopefully i tgoes without saying, but to be clear -- I'm sure Mark *is* worrying about all the things I mentioned, and doing his best to make something awesome that works for people. (And, Mark, sorry for talking about you in the third person... not sure how to talk about this better.) But sometimes that's not enough when the incentives are weird. It also doesn't help that apparently there have been multiple discussions going on in different venues (on the mailing list, in github, and presumably some face-to-face at Enthought's offices too), which makes it very hard to keep everyone in the loop. I'm a big fan of Karl's book too -- here are some sections I think might be particularly relevant: http://producingoss.com/en/contracting.html http://producingoss.com/en/setting-tone.html#avoid-private-discussions http://producingoss.com/en/bug-tracker-usage.html -- Nathaniel From njs at pobox.com Tue Jul 5 13:54:17 2011 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 5 Jul 2011 10:54:17 -0700 Subject: [Numpy-discussion] conference call / gotomeeting to discuss the missing data design In-Reply-To: References: Message-ID: Also, I can see the motivation for wanting a voice meeting, but on the subject of keeping people in the loop, could we make sure that someone is taking notes on what happens, and that they get posted to the list? -- Nathaniel On Tue, Jul 5, 2011 at 6:12 AM, Mark Wiebe wrote: > On Fri, Jul 1, 2011 at 2:22 PM, Mark Wiebe wrote: >> >> The missing data thread has gotten a bit heated, and after sitting?down >> with Travis to discuss the issues a bit, we've concluded that it would be >> nice to do a call with everyone who's interested in the discussion with >> better communication bandwidth. 
There are lots of good ideas out there, and >> it is very easy for things to get lost when we're just emailing. Getting on >> the phone should provide a more effective way to ensure everyone is properly >> being heard. >> We're proposing to set up a GotoMeeting call at 4pm CST today. Please >> respond if you can make it and your level of interest. I've created a Doodle >> where you can indicate your availability if 4pm today is too short notice, >> and we should schedule for a different time: >> http://www.doodle.com/eu9k3xip47a6gnue > > I hope everyone had a great weekend. Thanks to all who filled in the doodle, > we have a?unanimous?winning time of 2PM central time today. I'll post with > details about how to connect to the call when that has been prepared. > Cheers, > Mark > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From mwwiebe at gmail.com Tue Jul 5 14:07:25 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 5 Jul 2011 13:07:25 -0500 Subject: [Numpy-discussion] conference call / gotomeeting to discuss the missing data design In-Reply-To: References: Message-ID: Here are the details for the call: 1. Please join my meeting, Jul 5, 2011 at 2:00 PM Central time. https://www1.gotomeeting.com/join/972295593 2. Use your microphone and speakers (VoIP) - a headset is recommended. Or, call in using your telephone. Dial +1 (312) 878-3070 Access Code: 972-295-593 Audio PIN: Shown after joining the meeting Meeting ID: 972-295-593 GoToMeeting? Online Meetings Made Easy? We'll have someone taking notes to create a summary as Nathaniel suggested. The NEP and other reference material will be visible with the screen sharing of gotomeeting, but those running Linux can follow along by viewing the document we're browsing here: https://github.com/m-paradox/numpy/blob/7b10c9ab1616b9100e98dd2ab80cef639d5b5735/doc/neps/missing-data.rst Thanks, Mark On Tue, Jul 5, 2011 at 12:54 PM, Nathaniel Smith wrote: > Also, I can see the motivation for wanting a voice meeting, but on the > subject of keeping people in the loop, could we make sure that someone > is taking notes on what happens, and that they get posted to the list? > -- Nathaniel > > On Tue, Jul 5, 2011 at 6:12 AM, Mark Wiebe wrote: > > On Fri, Jul 1, 2011 at 2:22 PM, Mark Wiebe wrote: > >> > >> The missing data thread has gotten a bit heated, and after sitting down > >> with Travis to discuss the issues a bit, we've concluded that it would > be > >> nice to do a call with everyone who's interested in the discussion with > >> better communication bandwidth. There are lots of good ideas out there, > and > >> it is very easy for things to get lost when we're just emailing. Getting > on > >> the phone should provide a more effective way to ensure everyone is > properly > >> being heard. > >> We're proposing to set up a GotoMeeting call at 4pm CST today. Please > >> respond if you can make it and your level of interest. I've created a > Doodle > >> where you can indicate your availability if 4pm today is too short > notice, > >> and we should schedule for a different time: > >> http://www.doodle.com/eu9k3xip47a6gnue > > > > I hope everyone had a great weekend. Thanks to all who filled in the > doodle, > > we have a unanimous winning time of 2PM central time today. I'll post > with > > details about how to connect to the call when that has been prepared. 
> > Cheers, > > Mark > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Tue Jul 5 14:33:32 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 5 Jul 2011 14:33:32 -0400 Subject: [Numpy-discussion] Moving lib.recfunctions? In-Reply-To: References: Message-ID: On Fri, Jul 1, 2011 at 2:32 PM, Skipper Seabold wrote: > On Fri, Jul 1, 2011 at 2:22 PM, ? wrote: >> On Fri, Jul 1, 2011 at 1:59 PM, Skipper Seabold wrote: >>> lib.recfunctions has never been fully advertised. The two bugs I just >>> discovered lead me to believe that it's not that well vetted, but it >>> is useful. I can't be the only one using these? >>> >>> What do people think of either deprecating lib.recfunctions or at >>> least importing them into the numpy.rec namespace? >>> >>> I'm sure this has come up before, but gmane search isn't working for me. >> >> about once a year >> >> http://old.nabble.com/Emulate-left-outer-join--td27522655.html#a27522655 >> >> my guess is not much has changed since then >> > > Ah, yes. I recall now. > > I agree that they're more general than rec, but also don't have a > first best solution for this. So I think we should move them (in a > correct way) to numpy.rec and add (some of?) them as methods to > recarrays. The best we can do beyond that is put some docs on the > structured array page and notes in the docstrings that they also work > for ndarrays with structured dtype. > > I'll submit a pull request soon and maybe that'll generate some interest. > Had a brief look at what getting lib.recfunctions into rec/core.rec namespace would entail. It's not as simple as it could be, because there are circular imports between core.records and recfunctions (and its imports). It seems that it is possible to work around the circular imports in some of the code except for the degree to which recfunctions is wrapped up with the masked array code. The path of least resistance is to just import lib.recfunctions.* into the (already crowded) main numpy namespace and be done with it. Another option, though it's more work, is to remove all the internal masked array support and let the user decide what do with the record/structured arrays after they're returned (I invariably have to set usemask=False anyway). The functions can then be wrapped by higher-level ones in np.ma if the old usemask behavior is still desirable for people. This should probably wait until the new masked array changes are in and settled a bit though. Skipper From pgmdevlist at gmail.com Tue Jul 5 14:46:26 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 5 Jul 2011 20:46:26 +0200 Subject: [Numpy-discussion] Moving lib.recfunctions? In-Reply-To: References: Message-ID: <0949C672-C283-4176-A165-420E4917BA1A@gmail.com> On Jul 5, 2011, at 8:33 PM, Skipper Seabold wrote: > On Fri, Jul 1, 2011 at 2:32 PM, Skipper Seabold wrote: >> On Fri, Jul 1, 2011 at 2:22 PM, wrote: >>> On Fri, Jul 1, 2011 at 1:59 PM, Skipper Seabold wrote: >>>> lib.recfunctions has never been fully advertised. The two bugs I just >>>> discovered lead me to believe that it's not that well vetted, but it >>>> is useful. 
I can't be the only one using these? >>>> >>>> What do people think of either deprecating lib.recfunctions or at >>>> least importing them into the numpy.rec namespace? >>>> >>>> I'm sure this has come up before, but gmane search isn't working for me. >>> >>> about once a year >>> >>> http://old.nabble.com/Emulate-left-outer-join--td27522655.html#a27522655 >>> >>> my guess is not much has changed since then >>> >> >> Ah, yes. I recall now. >> >> I agree that they're more general than rec, but also don't have a >> first best solution for this. So I think we should move them (in a >> correct way) to numpy.rec and add (some of?) them as methods to >> recarrays. The best we can do beyond that is put some docs on the >> structured array page and notes in the docstrings that they also work >> for ndarrays with structured dtype. >> >> I'll submit a pull request soon and maybe that'll generate some interest. >> > > Had a brief look at what getting lib.recfunctions into rec/core.rec > namespace would entail. It's not as simple as it could be, because > there are circular imports between core.records and recfunctions (and > its imports). It seems that it is possible to work around the circular > imports in some of the code except for the degree to which > recfunctions is wrapped up with the masked array code. Hello, The idea behin having a lib.recfunctions and not a rec.recfunctions or whatever was to illustrate that the functions of this package are more generic than they appear. They work with regular structured ndarrays and don't need recarrays. Methinks we gonna lose this aspect if you try to rename it, but hey, your call. As to as why they were never really advertised ? Because I never received any feedback when I started developing them (developing is a big word here, I just took a lot of code that John D Hunter had developed in matplotlib and make it more consistent). I advertised them once or twice on the list, wrote the basic docstrings, but waited for other people to start using them. Anyhow. So, yes, there might be some weird import to polish. Note that if you decided to just rename the package and leave it where it was, it would probably be easier. > The path of least resistance is to just import lib.recfunctions.* into > the (already crowded) main numpy namespace and be done with it. Why ? Why can't you leave it available through numpy.lib ? Once again, if it's only a matter of PRing, you could start writing an entry page in the doc describing the functions, that would improve the visibility. > Another option, though it's more work, is to remove all the internal > masked array support and let the user decide what do with the > record/structured arrays after they're returned (I invariably have to > set usemask=False anyway). Or you just port the functions in numpy.ma (making a numpy.ma.recfunctions, for example). > The functions can then be wrapped by > higher-level ones in np.ma if the old usemask behavior is still > desirable for people. This should probably wait until the new masked > array changes are in and settled a bit though. Oh yes... I agree with that P. From jsseabold at gmail.com Tue Jul 5 15:23:10 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 5 Jul 2011 15:23:10 -0400 Subject: [Numpy-discussion] Moving lib.recfunctions? 
In-Reply-To: <0949C672-C283-4176-A165-420E4917BA1A@gmail.com> References: <0949C672-C283-4176-A165-420E4917BA1A@gmail.com> Message-ID: On Tue, Jul 5, 2011 at 2:46 PM, Pierre GM wrote: > > On Jul 5, 2011, at 8:33 PM, Skipper Seabold wrote: > >> On Fri, Jul 1, 2011 at 2:32 PM, Skipper Seabold wrote: >>> On Fri, Jul 1, 2011 at 2:22 PM, ? wrote: >>>> On Fri, Jul 1, 2011 at 1:59 PM, Skipper Seabold wrote: >>>>> lib.recfunctions has never been fully advertised. The two bugs I just >>>>> discovered lead me to believe that it's not that well vetted, but it >>>>> is useful. I can't be the only one using these? >>>>> >>>>> What do people think of either deprecating lib.recfunctions or at >>>>> least importing them into the numpy.rec namespace? >>>>> >>>>> I'm sure this has come up before, but gmane search isn't working for me. >>>> >>>> about once a year >>>> >>>> http://old.nabble.com/Emulate-left-outer-join--td27522655.html#a27522655 >>>> >>>> my guess is not much has changed since then >>>> >>> >>> Ah, yes. I recall now. >>> >>> I agree that they're more general than rec, but also don't have a >>> first best solution for this. So I think we should move them (in a >>> correct way) to numpy.rec and add (some of?) them as methods to >>> recarrays. The best we can do beyond that is put some docs on the >>> structured array page and notes in the docstrings that they also work >>> for ndarrays with structured dtype. >>> >>> I'll submit a pull request soon and maybe that'll generate some interest. >>> >> >> Had a brief look at what getting lib.recfunctions into rec/core.rec >> namespace would entail. It's not as simple as it could be, because >> there are circular imports between core.records and recfunctions (and >> its imports). It seems that it is possible to work around the circular >> imports in some of the code except for the degree to which >> recfunctions is wrapped up with the masked array code. > > Hello, > The idea behin having a lib.recfunctions and not a rec.recfunctions or whatever was to illustrate that the functions of this package are more generic than they appear. They work with regular structured ndarrays and don't need recarrays. Methinks we gonna lose this aspect if you try to rename it, but hey, your call. I agree (even though 'rec' is already in the name). My goal was to just have numpy.rec.join_by, numpy.rec.stack_arrays, etc, so they're right there (rec seems more intuitive than lib to me). Do you think that they may be better off in the main numpy namespace? This is far from my call, just trying to reach some consensus and make an effort to move the status quo. > As to as why they were never really advertised ? Because I never received any feedback when I started developing them (developing is a big word here, I just took a lot of code that John D Hunter had developed in matplotlib and make it more consistent). I advertised them once or twice on the list, wrote the basic docstrings, but waited for other people to start using them. As Josef pointed out before, it's a chicken and egg thing re: advertisement and feedback. I think the best advertisement is by namespace. I use them frequently, and I haven't offered any feedback because I've never been left wanting (recent pull request is the only exception). For the most part they do what I want and the docs are good. > Anyhow. > So, yes, there might be some weird import to polish. Note that if you decided to just rename the package and leave it where it was, it would probably be easier. 
> Imports are fine as long as they stay where they are and aren't imported into numpy.core. > >> The path of least resistance is to just import lib.recfunctions.* into >> the (already crowded) main numpy namespace and be done with it. > > Why ? Why can't you leave it available through numpy.lib ? Once again, if it's only a matter of PRing, you could start writing an entry page in the doc describing the functions, that would improve the visibility. I'm fine with leaving the code where it is, but typing numpy.lib.recfunctions. is painful (ditto `import numpy.lib.recfunctions as nprf`). Every time. And I have to do this often. Even if they are imported into the lib namespace (they aren't), it would be an improvement, but I still don't think it occurs to people to hunt through lib to try and join two structured arrays. It looks like everything in the lib namespace is imported into the main numpy namespace anyway. And 2) I found a little buglet recently that made me think this code should be banged on more. The best way to do this is to get it out there. If other users are anything like me, I rely on tab-completion and docstrings not online docs for working with projects that I don't need to be intimately familiar with, the implication being that lib is intimate, I guess. Skipper (standing astride this molehill) > > >> Another option, though it's more work, is to remove all the internal >> masked array support and let the user decide what do with the >> record/structured arrays after they're returned (I invariably have to >> set usemask=False anyway). > > Or you just port the functions in numpy.ma (making a numpy.ma.recfunctions, for example). > > >> The functions can then be wrapped by >> higher-level ones in np.ma if the old usemask behavior is still >> desirable for people. This should probably wait until the new masked >> array changes are in and settled a bit though. > > Oh yes... I agree with that > P. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From xscript at gmx.net Tue Jul 5 15:33:31 2011 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Tue, 05 Jul 2011 21:33:31 +0200 Subject: [Numpy-discussion] conference call / gotomeeting to discuss the missing data design In-Reply-To: (Mark Wiebe's message of "Tue, 5 Jul 2011 13:07:25 -0500") References: Message-ID: <87d3hoiktw.fsf@ginnungagap.bsc.es> Mark Wiebe writes: > We'll have someone taking notes to create a summary as Nathaniel suggested. Thanks. -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From Chris.Barker at noaa.gov Tue Jul 5 16:02:36 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 05 Jul 2011 13:02:36 -0700 Subject: [Numpy-discussion] Missing/accumulating data In-Reply-To: References: <4E133D21.8090401@noaa.gov> Message-ID: <4E136DDC.7030500@noaa.gov> Mark Wiebe wrote: > Speaking of which, would we make the NA value be false? > > For booleans, it works out like this: > > http://en.wikipedia.org/wiki/Ternary_logic#Kleene_logic That's pretty cool! > In R, trying to test the truth value of NA ("if (NA) ...") raises an > exception. Adopting this behavior seems reasonable to me. I'm not so sure. the other president is Python, where None is interpreted as False. 
In general, in non-numpy code, I use None to mean "not set yet" or "I'm not sure", or, whatever. It's pretty useful to have it be false. However, I also do: if x is not None: rather than- if x: so as to be unambiguous about what I'm testing for (and because if x == 0, I don't want the test to fail), so I guess: if arr[i] is np.NA: would be perfectly analogous. -Chris > -Mark > > > > -Chris > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 > voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 > main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > ------------------------------------------------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From qubax at gmx.at Tue Jul 5 17:12:04 2011 From: qubax at gmx.at (qubax at gmx.at) Date: Tue, 5 Jul 2011 23:12:04 +0200 Subject: [Numpy-discussion] suggestions on optimising a special matrix reshape Message-ID: <20110705211204.GA4604@tux.hotze.com> i have to reshape a matrix beta of the form (4**N, 4**N, 4**N, 4**N) into betam like (16**N, 16**N) following: betam = np.zeros((16**N,16**N), dtype = complex) for k in xrange(16**N): ind1 = np.mod(k,4**N) ind2 = k/4**N for l in xrange(16**N): betam[k,l] = beta[np.mod(l,4**N), l/4**N, ind1 , ind2] is there a smarter/faster way of getting the above done? for N=2, that already takes 0.5 seconds but i intend to use it for N=3 and N=4 ... thanks for your input, q -- History consists of nothing more than the lies we tell ourselves to justify the present. The king who needs to remind his people of his rank, is no king. A beggar's mistake harms no one but the beggar. A king's mistake, however, harms everyone but the king. Too often, the measure of power lies not in the number who obey your will, but in the number who suffer your stupidity. From Chris.Barker at noaa.gov Tue Jul 5 16:39:34 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 05 Jul 2011 13:39:34 -0700 Subject: [Numpy-discussion] suggestions on optimising a special matrix reshape In-Reply-To: <20110705211204.GA4604@tux.hotze.com> References: <20110705211204.GA4604@tux.hotze.com> Message-ID: <4E137686.5030003@noaa.gov> qubax at gmx.at wrote: > i have to reshape a matrix beta of the form (4**N, 4**N, 4**N, 4**N) > into betam like (16**N, 16**N) following: > > betam = np.zeros((16**N,16**N), dtype = complex) > for k in xrange(16**N): > ind1 = np.mod(k,4**N) > ind2 = k/4**N > for l in xrange(16**N): > betam[k,l] = beta[np.mod(l,4**N), l/4**N, ind1 , ind2] > > is there a smarter/faster way of getting the above done? no time to check if this is what you want, but is this it? a = np.arange((4**(4*N))).reshape(4**N,4**N,4**N,4**N) b = a.reshape((16**N, 16**N)) If that doesn't do it right, you may be able to mess with the strides, etc. do some googling, and check out: numpy.lib.stride_tricks -Chris > for N=2, that already takes 0.5 seconds but i intend to use it > for N=3 and N=4 ... 
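Following up with a concrete guess: the loop just permutes which pair of axes ends up as rows and which as columns, so a transpose plus reshape may do the whole thing in one go. This is only a sketch, so please check it against the loop for small N before trusting it:

import numpy as np

N = 2
M = 4**N
beta = np.arange(float(M**4)).reshape(M, M, M, M)   # stand-in data

# the loop version from above
betam_loop = np.zeros((M*M, M*M))
for k in xrange(M*M):
    ind1 = k % M
    ind2 = k // M
    for l in xrange(M*M):
        betam_loop[k, l] = beta[l % M, l // M, ind1, ind2]

# candidate vectorized version: bring the two "k" axes to the front,
# then collapse each pair of axes into a single axis
betam_fast = beta.transpose(3, 2, 1, 0).reshape(M*M, M*M)

print np.allclose(betam_loop, betam_fast)    # True if the guess is right
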
> > thanks for your input, > q > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From rowen at uw.edu Tue Jul 5 17:41:32 2011 From: rowen at uw.edu (Russell E. Owen) Date: Tue, 05 Jul 2011 14:41:32 -0700 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 References: Message-ID: In article , Ralf Gommers wrote: > https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ Will there be a Mac binary for 32-bit pythons (one that is compatible with older versions of MacOS X)? At present I only see a 64-bit 10.6-only version. -- Russell From pgmdevlist at gmail.com Tue Jul 5 18:39:55 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 6 Jul 2011 00:39:55 +0200 Subject: [Numpy-discussion] Moving lib.recfunctions? In-Reply-To: References: <0949C672-C283-4176-A165-420E4917BA1A@gmail.com> Message-ID: On Jul 5, 2011, at 9:23 PM, Skipper Seabold wrote: > On Tue, Jul 5, 2011 at 2:46 PM, Pierre GM wrote: >>> >>> <...> >> >> Hello, >> The idea behin having a lib.recfunctions and not a rec.recfunctions or whatever was to illustrate that the functions of this package are more generic than they appear. They work with regular structured ndarrays and don't need recarrays. Methinks we gonna lose this aspect if you try to rename it, but hey, your call. > > I agree (even though 'rec' is already in the name). My goal was to > just have numpy.rec.join_by, numpy.rec.stack_arrays, etc, so they're > right there (rec seems more intuitive than lib to me). Do you think > that they may be better off in the main numpy namespace? This is far > from my call, just trying to reach some consensus and make an effort > to move the status quo. Sure, a np.join_by or np.stack_array is easy and non-ambiguous enough... >> As to as why they were never really advertised ? Because I never received any feedback when I started developing them (developing is a big word here, I just took a lot of code that John D Hunter had developed in matplotlib and make it more consistent). I advertised them once or twice on the list, wrote the basic docstrings, but waited for other people to start using them. > > As Josef pointed out before, it's a chicken and egg thing re: > advertisement and feedback. I think the best advertisement is by > namespace. I use them frequently, and I haven't offered any feedback > because I've never been left wanting (recent pull request is the only > exception). For the most part they do what I want and the docs are > good. Cool ! > >> >>> The path of least resistance is to just import lib.recfunctions.* into >>> the (already crowded) main numpy namespace and be done with it. >> >> Why ? Why can't you leave it available through numpy.lib ? Once again, if it's only a matter of PRing, you could start writing an entry page in the doc describing the functions, that would improve the visibility. > > I'm fine with leaving the code where it is, but typing > numpy.lib.recfunctions. is painful (ditto `import > numpy.lib.recfunctions as nprf`). Every time. And I have to do this > often. You have a point. As long as nobody minds and you don't lose too much time trying to tweak the imports, I'm quite OK with it. > Even if they are imported into the lib namespace (they aren't), > it would be an improvement, but I still don't think it occurs to > people to hunt through lib to try and join two structured arrays. 
It > looks like everything in the lib namespace is imported into the main > numpy namespace anyway. And 2) I found a little buglet recently that > made me think this code should be banged on more. The best way to do > this is to get it out there. If other users are anything like me, I > rely on tab-completion and docstrings not online docs for working with > projects that I don't need to be intimately familiar with, the > implication being that lib is intimate, I guess. > > Skipper > (standing astride this molehill) Trample it! From cjordan1 at uw.edu Tue Jul 5 19:46:27 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Tue, 5 Jul 2011 18:46:27 -0500 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary Message-ID: Here's a short-ish summary of the topics discussed in the conference call this afternoon. WARNING: I try to give examples for everything discussed to make it as concrete as possible. However, most of the examples were not explicitly discussed during the conference. I apologize in advance if I mischaracterize anyone's arguments, and please jump in to correct me if I did. Participants: Travis Oliphant, Mark Wiebe, Matthew Brett, Nathaniel Smith, Pierre GM, Ben Root, Chuck Harris, Wes McKinney, Chris Jordan-Squire First, areas of broad agreement: *There should be more functionality for missing data *There should be dtypes which support missing data ('parameterized dtypes' in the current NEP) *Adding a 'where' semantic to ufuncs *Have the same data with different sets of missing elements in different views *Easy for non-expert numpy users Since we only have Mark is only around Austin until early August, there's also broad agreement that we need to get something done quickly. However, the numpy community (and Travis in particular) are balancing this against the possibility of a sub-optimal solution which can't be taken back. BIT PATTERN & MASK IMPLEMENTATIONS FOR NA ------------------------------------------------------------------------------------------ The current NEP proposes both mask and bit pattern implementations for missing data. I use the terms bit pattern and parameterized dtype interchangeably, since the parameterized dtype will use a bit pattern for its implementation. The two implementations will support the same functionality with respect to NA, and the implementation details will be largely invisible to the user. Their differences are in the 'extra' features each supports. Two common questions were: 1. Why make two implementations of missing data: one with masks and the other with parameterized dtypes? 2. Why does the implementation using masks have higher priority? The answers are: 1. The mask implementation is more general and easier to implement and maintain. The bit pattern implementation saves memory, makes interoperability easier, and makes ABI (Application Binary Interface) compatibility easier. Since each has different strengths, the argument is both should be implemented. 2. The implementation for the parameterized dtypes will rely on the implementation using a mask. NA VS. IGNORE --------------------------------- A lot of discussion centered on IGNORE vs. NA types. We take IGNORE in aNEP sense and NA in NEP sense. With NA, there is a clear notion of how NA propagates through all basic numpy operations. (e.g., 3+NA=NA and log(NA) = NA, while NA | True = True.) IGNORE is separate from NA, with different interpretations depending on the use case. IGNORE could mean: 1. Data that is being temporarily ignored. 
e.g., a possible outlier that is temporarily being removed from consideration. 2. Data that cannot exist. e.g., a matrix representing a grid of water depths for a lake. Since the lake isn't square, some entries will represent land, and so depth will be a meaningless concept for those entries. 3. Using IGNORE to signal a jagged array. e.g., [ [1, 2, IGNORE], [IGNORE, 3, 4] ] should behave exactly the same as [ [1 , 2] , [3 , 4] ]. Though this leaves open how [1, 2, IGNORE] + [3 , 4] should behave. Because of these different uses of IGNORE, it doesn't have as clear a theoretical interpretation as NA. (For instance, what is IGNORE+3, IGNORE*3, or IGNORE | True?) But several of the discussants thought the use cases for IGNORE were very compelling. Specifically, they wanted to be able to use IGNORE's and NA's simultaneously while still being able to differentiate between them. So, for example, being able to designate some data as IGNORE while still able to determine which data was NA but not IGNORE. The current NEP does not allow for this directly. Although in some cases it can be indirectly done via views. (By taking a view of the original data, expanding the values which are considered NA in the view, and then comparing with the original data to see if the NA is in the original or not.) Since both are possible in this sense, Mark's NEP makes it so IGNORE is allowed but isn't the default. Another important point from the current NEP is that not being able to access values considered missing, even if the implementation of missingness is via a mask, is a feature and not a bug. It is a feature because if the data is missing then, conceptually, neither the user nor any function the user calls should be able to obtain that data. This is precisely why the indirect route, via views of the original data, is required to access data that a different view says is missing. The current NEP treats all NA's the same. The reasoning is that, regardless of where the NA originated, the functions the numpy array is fed in to will either ignore all NA's or propagate them (i.e. not ignore them). These two different behaviors are chosen when passed into a ufunc by setting the skipna ufunc parameter to True or False. Since the NA's are treated the same, their source is irrelevant. Though this could be argued against if there are compelling cases where the IGNORE and NA are treated differently. A possible solution to the above desires for an IGNORE notion of missingness is to allow for multiple types of missing values. For example, the mask underlying the missing data could have int types, and different ints mean different missing. E.g. 0 is present, 1 is NA, 2 is IGNORE. However, this was only discussed briefly at the end of the conference call, and should be discussed further. HOW DOES THIS RELATE TO THE CURRENT MASKED ARRAY? ---------------------------------------------------------------------------------------------------- Everyone seems to agree they'd love it if this could encompass all current use cases of the numpy.ma arrays, so numpy.ma arrays could be deprecated. (However they wouldn't be eliminated for several years, even in the most optimistic scenarios.) IMPLEMENTATION DETAILS ----------------------------------------------------- *Under the hood, the parameterized dtypes will use buffered masks when performing operations. This can be a source of confusion when discussing their behavior, since there is no true mask, hence no extra memory, but a mask is created on the fly. 
*The iterator will be given a new 'masked' mode, triggered by a flag, which will use or ignore data based on a boolean array. *Currently won't allow shared masks. But Pierre GM suggests that's just as well since they easily lead to buggy code. I hope this summary roughly captures what was said. Please chime in with additional comments/corrections. -Chris Jordan-Squire -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Tue Jul 5 22:53:16 2011 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 5 Jul 2011 21:53:16 -0500 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: Message-ID: Thanks for these notes. Just a couple of thoughts as I looked over these notes. On Tue, Jul 5, 2011 at 6:46 PM, Christopher Jordan-Squire wrote: > 3. Using IGNORE to signal a jagged array. e.g., [ [1, 2, IGNORE], [IGNORE, > 3, 4] ] should behave exactly the same as [ [1 , 2] , [3 , 4] ]. Though this > leaves open how [1, 2, IGNORE] + [3 , 4] should behave. > > I don't think there is any confusion about that particular case. Even when using the IGNORE semantics, numpy broadcasting rules are still in play. This particular case should throw an exception. > Because of these different uses of IGNORE, it doesn't have as clear a > theoretical interpretation as NA. (For instance, what is IGNORE+3, IGNORE*3, > or IGNORE | True?) > > I think we were more referring to matrix operations like dot products. Element-by-element operations should still behave the same as NA. Scalar operations should return IGNORE. HOW DOES THIS RELATE TO THE CURRENT MASKED ARRAY? > > ---------------------------------------------------------------------------------------------------- > > Everyone seems to agree they'd love it if this could encompass all current > use cases of the numpy.ma arrays, so numpy.ma arrays could be deprecated. > (However they wouldn't be eliminated for several years, even in the most > optimistic scenarios.) > > This is going to be a very tricky thing to handle and it is going to require coordination and agreements among many of the third-party toolkits like scipy and matplotlib. In addition to these notes (unless I missed it), Nathaniel pointed out that with the ufunc where= parameter feature and the ufunc wrapper, we have the potential to greatly improve the codebase of numpy.ma as it stands. Potentially mitigating the need for moving more of numpy.ma into the core, and to focus more on NA. While I am not 100% on board with this idea, I can definitely see the potential for this path. Thanks everybody for the productive chat! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Wed Jul 6 01:30:36 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 6 Jul 2011 07:30:36 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 In-Reply-To: References: Message-ID: On Tue, Jul 5, 2011 at 11:41 PM, Russell E. Owen wrote: > In article , > Ralf Gommers wrote: > > > https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ > > Will there be a Mac binary for 32-bit pythons (one that is compatible > with older versions of MacOS X)? At present I only see a 64-bit > 10.6-only version. > > > Yes there will be for the final release (10.4-10.6 compatible). I can't create those on my own computer, so sometimes I don't make them for RCs. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dave.hirschfeld at gmail.com Wed Jul 6 04:09:55 2011 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Wed, 6 Jul 2011 08:09:55 +0000 (UTC) Subject: [Numpy-discussion] Moving lib.recfunctions? References: <0949C672-C283-4176-A165-420E4917BA1A@gmail.com> Message-ID: Pierre GM gmail.com> writes: > > > Hello, > The idea behin having a lib.recfunctions and not a rec.recfunctions or whatever was to illustrate that the > functions of this package are more generic than they appear. They work with regular structured ndarrays > and don't need recarrays. Methinks we gonna lose this aspect if you try to rename it, but hey, your call. I've never really thought there's much distinction between the two - AFAICT a recarray is just a structured array with attribute access? If a function only accepts a recarray (are there any?) isn't it just a simple call to .view(np.recarray) to get it to work with structured arrays? Because of this view I've always thought functions which worked on either should be grouped together. > As to as why they were never really advertised ? Because I never received any feedback when I started > developing them (developing is a big word here, I just took a lot of code that John D Hunter had developed in > matplotlib and make it more consistent). I advertised them once or twice on the list, wrote the basic > docstrings, but waited for other people to start using them. > Anyhow. > So, yes, there might be some weird import to polish. Note that if you decided to just rename the package and > leave it where it was, it would probably be easier. > > > The path of least resistance is to just import lib.recfunctions.* into > > the (already crowded) main numpy namespace and be done with it. > > Why ? Why can't you leave it available through numpy.lib ? Once again, if it's only a matter of PRing, you > could start writing an entry page in the doc describing the functions, that would improve the visibility. > I do recall them being advertised a while ago, but when I came to look for them I couldn't find them - IMHO np.rec is a much more intuitive (and nicer/shorter) namespace than np.lib.recfunctions. I think having similar functionality in two completely different namespaces is confusing & hard to remember. It also doesn't help that np.lib.recfunctions isn't discoverable by t ab-completion: In [2]: np.lib.rec np.lib.recfromcsv np.lib.recfromtxt ...of course you could probably find it with np.lookfor but it's one more barrier to their use. FWIW I'd be happy if the np.lib.recfunctions fuctions were made available in the np.rec namespace (and possibly deprecate np.lib.recfunctions to avoid confusion?) I'm conscious that as a user (not a developer) talk is cheap and I'm happy with whatever the consensus is. I just thought I'd pipe up since it was only through this thread that I re-discovered np.lib.recfunctions! HTH, Dave From numpy-discussion at maubp.freeserve.co.uk Wed Jul 6 05:13:49 2011 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Wed, 6 Jul 2011 10:13:49 +0100 Subject: [Numpy-discussion] Current status of 64 bit windows support. In-Reply-To: <4E0E10BB.6040000@molden.no> References: <4E0E10BB.6040000@molden.no> Message-ID: On Fri, Jul 1, 2011 at 7:23 PM, Sturla Molden wrote: > > Den 01.07.2011 19:22, skrev Charles R Harris: >> Just curious as to what folks know about the current status of the >> free windows 64 bit compilers. I know things were dicey with gcc and >> gfortran some two years ago, but... well, two years have passed. 
This > > Windows 7 SDK is free (as in beer). It is the C compiler used to build > Python on Windows 64. Here is the download: > > http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=3138 > > A newer version of the Windows SDK will use a C compiler that links with > a different CRT than Python uses. Use version 3.5. When using this > compiler, remember to set the environment variable DISTUTILS_USE_SDK. > > This should be sufficient to build NumPy. AFAIK only SciPy requires a > Fortran compiler. > > Mingw is still not stabile on Windows 64. There are supposedly > compatibility issues between the MinGW runtime used by libgfortran and > Python's CRT. ?While there are experimental MinGW builds for Windows 64 > (e.g. TDM-GCC), we will probably need to build libgfortran against > another C runtime for SciPy. A commercial Fortran compiler compatible > with MSVC is recommended for SciPy, e.g. Intel, Absoft or Portland. > > > Sturla So it sounds like we're getting closer to having official NumPy 1.6.x binaries for 64 bit Windows (using the Windows 7 SDK), but not quite there yet? What is the roadblock? I would guess from the comments on Christoph Gohlke's page the issue is having something that will work with SciPy... see http://www.lfd.uci.edu/~gohlke/pythonlibs/ I'm interested from the point of view of third party libraries using NumPy, where we have had users asking for 64bit installers. We need an official NumPy installer to build against. Regards, Peter From matthew.brett at gmail.com Wed Jul 6 08:05:03 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 6 Jul 2011 13:05:03 +0100 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: Message-ID: Hi, Just for reference, I am using this as the latest version of the NEP - I hope it's current: https://github.com/m-paradox/numpy/blob/7b10c9ab1616b9100e98dd2ab80cef639d5b5735/doc/neps/missing-data.rst I'm mostly relaying stuff I said, although generally (please do correct me if I am wrong) I am just re-expressing points that Nathaniel has already made in the alterNEP text and the emails. On Wed, Jul 6, 2011 at 12:46 AM, Christopher Jordan-Squire wrote: ... > Since we only have Mark is only around Austin until early August, there's > also broad agreement that we need to get something done quickly. I think I might have missed that part of the discussion :) I feel the need to emphasize the centrality of the assertion by Nathaniel, and agreement by (at least) me, that the NA case (there really is no data) and the IGNORE case (there is data but I'm concealing it from you) are conceptually different, and come from different use-cases. The underlying disagreement returned many times to this fundamental difference between the NEP and alterNEP: In the NEP - by design - it is impossible to distinguish between na.NA and na.IGNORE The alterNEP insists you should be able to distinguish. Mark says something like "it's all missing data, there's no reason you should want to distinguish". Nathaniel and I were saying "the two types of missing do have different use-cases, and it should be possible to distinguish. You might want to chose to treat them the same, but you should be able to see what they are.". I returned several times to this (original point by Nathaniel): a[3] = np.NA (what does this mean? I am altering the underlying array, or a mask? How would I explain this to someone?) 
We confirmed that, in order to make it difficult to know what your NA is (masked or bit-pattern), Mark has to a) hinder access to the data below the mask and b) prevent direct API access to the masking array. I described this as 'hobbling the API' and Mark thought of it as 'generic programming' (missing is always missing). I asserted that explaining NA to people would be easier if ``a[3] = np.NA`` was direct assignment and altered the array. > BIT PATTERN & MASK IMPLEMENTATIONS FOR NA > ------------------------------------------------------------------------------------------ > The current NEP proposes both mask and bit pattern implementations for > missing data. I use the terms bit pattern and parameterized dtype > interchangeably, since the parameterized dtype will use a bit pattern for > its implementation. The two implementations will support the same > functionality with respect to NA, and the implementation details will be > largely invisible to the user. Their differences are in the 'extra' features > each supports. > > Two common questions were: > 1. Why make two implementations of missing data: one with masks and the > other with parameterized dtypes? > 2. Why does the implementation using masks have higher priority? > The answers are: > 1.??The mask implementation is more general and easier to implement and > maintain. ?The bit pattern implementation saves memory, makes > interoperability easier, and makes ABI (Application Binary Interface) > compatibility easier. Since each has different strengths, the argument is > both should be implemented. > 2. The implementation for the parameterized dtypes will rely on the > implementation using a mask. > > NA VS. IGNORE > --------------------------------- > A lot of discussion centered on IGNORE vs. NA types. We take IGNORE in aNEP > sense and NA in ?NEP sense. With NA, there is a clear notion of how NA > propagates through all basic numpy operations. ?(e.g., 3+NA=NA and log(NA) = > NA, while NA | True = True.) IGNORE is separate from NA, with different > interpretations depending on the use case. > IGNORE could mean: > 1. Data that is being temporarily ignored. e.g., a possible outlier that is > temporarily being removed from consideration. > 2. Data that cannot exist. e.g., a matrix representing a grid of water > depths for a lake. Since the lake isn't square, some entries will represent > land, and so depth will be a meaningless concept for those entries. > 3. Using IGNORE to signal a jagged array. e.g., [ [1, 2, IGNORE], [IGNORE, > 3, 4] ] should behave exactly the same as [ [1 , 2] , [3 , 4] ]. Though this > leaves open how [1, 2, IGNORE] + [3 , 4] should behave. > Because of these different uses of IGNORE, it doesn't have as clear a > theoretical interpretation as NA. (For instance, what is IGNORE+3, IGNORE*3, > or IGNORE | True?) I don't remember this bit of the discussion, but I see from current masked arrays that IGNORE is treated as the identity, so: IGNORE + 3 = 3 IGNORE * 3 = 3 > But several of the discussants thought the use cases for IGNORE were very > compelling. Specifically, they wanted to be able to use IGNORE's and NA's > simultaneously while still being able to differentiate between them. So, for > example, being able to designate some data as IGNORE while still able to > determine which data was NA but not IGNORE. The current NEP does not allow > for this directly. I think we discovered that the current NEP is designed to prevent us distinguishing between these cases. 
I was asking what it was about the implementation (as opposed to the API) that influenced the decision to make masked and bit-pattern missing data appear to be identical. I left the conversation before the end, but up until that point, had failed to understand. See you, Matthew From d.s.seljebotn at astro.uio.no Wed Jul 6 08:27:53 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 06 Jul 2011 14:27:53 +0200 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: Message-ID: <4E1454C9.5010209@astro.uio.no> On 07/06/2011 02:05 PM, Matthew Brett wrote: > Hi, > > Just for reference, I am using this as the latest version of the NEP - > I hope it's current: > > https://github.com/m-paradox/numpy/blob/7b10c9ab1616b9100e98dd2ab80cef639d5b5735/doc/neps/missing-data.rst > > I'm mostly relaying stuff I said, although generally (please do > correct me if I am wrong) I am just re-expressing points that > Nathaniel has already made in the alterNEP text and the emails. > > On Wed, Jul 6, 2011 at 12:46 AM, Christopher Jordan-Squire > wrote: > ... >> Since we only have Mark is only around Austin until early August, there's >> also broad agreement that we need to get something done quickly. > > I think I might have missed that part of the discussion :) > > I feel the need to emphasize the centrality of the assertion by > Nathaniel, and agreement by (at least) me, that the NA case (there > really is no data) and the IGNORE case (there is data but I'm > concealing it from you) are conceptually different, and come from > different use-cases. > > The underlying disagreement returned many times to this fundamental > difference between the NEP and alterNEP: > > In the NEP - by design - it is impossible to distinguish between na.NA > and na.IGNORE > The alterNEP insists you should be able to distinguish. > > Mark says something like "it's all missing data, there's no reason you > should want to distinguish". Nathaniel and I were saying "the two > types of missing do have different use-cases, and it should be > possible to distinguish. You might want to chose to treat them the > same, but you should be able to see what they are.". > > I returned several times to this (original point by Nathaniel): > > a[3] = np.NA > > (what does this mean? I am altering the underlying array, or a mask? > How would I explain this to someone?) > > We confirmed that, in order to make it difficult to know what your NA > is (masked or bit-pattern), Mark has to a) hinder access to the data > below the mask and b) prevent direct API access to the masking array. > I described this as 'hobbling the API' and Mark thought of it as > 'generic programming' (missing is always missing). Here's an HPC perspective...: If you, say, want to off-load array processing with a mask to some code running on a GPU, you really can't have the GPU go through some NumPy API. Or if you want to implement a masked array on a cluster with MPI, you similarly really, really want raw access. At least I feel that the transparency of NumPy is a huge part of its current success. Many more than me spend half their time in C/Fortran and half their time in Python. I tend to look at NumPy this way: Assuming you have some data in memory (possibly loaded by a C or Fortran library). (Almost) no matter how it is allocated, ordered, packed, aligned -- there's a way to find strides and dtypes to put a nice NumPy wrapper around it and use the memory from Python. 
So, my view on Mark's NEP was: With a reasonably amount of flexibility in how you decided to implement masking for your data, you can create a NumPy wrapper that will understand that. Whether your Fortran library exposes NAs in its 40GB buffer as bit patterns, or using a seperate mask, both will work. And IMO Mark's NEP comes rather close to this, you just need an additional NEP later to give raw details to the implementation details, once those are settled :-) Dag Sverre From d.s.seljebotn at astro.uio.no Wed Jul 6 08:31:44 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 06 Jul 2011 14:31:44 +0200 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: <4E1454C9.5010209@astro.uio.no> References: <4E1454C9.5010209@astro.uio.no> Message-ID: <4E1455B0.5010101@astro.uio.no> On 07/06/2011 02:27 PM, Dag Sverre Seljebotn wrote: > On 07/06/2011 02:05 PM, Matthew Brett wrote: >> Hi, >> >> Just for reference, I am using this as the latest version of the NEP - >> I hope it's current: >> >> https://github.com/m-paradox/numpy/blob/7b10c9ab1616b9100e98dd2ab80cef639d5b5735/doc/neps/missing-data.rst >> >> I'm mostly relaying stuff I said, although generally (please do >> correct me if I am wrong) I am just re-expressing points that >> Nathaniel has already made in the alterNEP text and the emails. >> >> On Wed, Jul 6, 2011 at 12:46 AM, Christopher Jordan-Squire >> wrote: >> ... >>> Since we only have Mark is only around Austin until early August, there's >>> also broad agreement that we need to get something done quickly. >> >> I think I might have missed that part of the discussion :) >> >> I feel the need to emphasize the centrality of the assertion by >> Nathaniel, and agreement by (at least) me, that the NA case (there >> really is no data) and the IGNORE case (there is data but I'm >> concealing it from you) are conceptually different, and come from >> different use-cases. >> >> The underlying disagreement returned many times to this fundamental >> difference between the NEP and alterNEP: >> >> In the NEP - by design - it is impossible to distinguish between na.NA >> and na.IGNORE >> The alterNEP insists you should be able to distinguish. >> >> Mark says something like "it's all missing data, there's no reason you >> should want to distinguish". Nathaniel and I were saying "the two >> types of missing do have different use-cases, and it should be >> possible to distinguish. You might want to chose to treat them the >> same, but you should be able to see what they are.". >> >> I returned several times to this (original point by Nathaniel): >> >> a[3] = np.NA >> >> (what does this mean? I am altering the underlying array, or a mask? >> How would I explain this to someone?) >> >> We confirmed that, in order to make it difficult to know what your NA >> is (masked or bit-pattern), Mark has to a) hinder access to the data >> below the mask and b) prevent direct API access to the masking array. >> I described this as 'hobbling the API' and Mark thought of it as >> 'generic programming' (missing is always missing). > > Here's an HPC perspective...: > > If you, say, want to off-load array processing with a mask to some code > running on a GPU, you really can't have the GPU go through some NumPy > API. Or if you want to implement a masked array on a cluster with MPI, > you similarly really, really want raw access. > > At least I feel that the transparency of NumPy is a huge part of its > current success. 
Many more than me spend half their time in C/Fortran > and half their time in Python. > > I tend to look at NumPy this way: Assuming you have some data in memory > (possibly loaded by a C or Fortran library). (Almost) no matter how it > is allocated, ordered, packed, aligned -- there's a way to find strides > and dtypes to put a nice NumPy wrapper around it and use the memory from > Python. > > So, my view on Mark's NEP was: With a reasonably amount of flexibility > in how you decided to implement masking for your data, you can create a > NumPy wrapper that will understand that. Whether your Fortran library > exposes NAs in its 40GB buffer as bit patterns, or using a seperate > mask, both will work. > > And IMO Mark's NEP comes rather close to this, you just need an > additional NEP later to give raw details to the implementation details, > once those are settled :-) To be concrete, I'm thinking something like a custom extension to PEP 3118, which could also allow efficient access from Cython without hard-coding Cython for NumPy (a GSoC project this summer will continue to move us away from the "np.ndarray[int]" syntax to a more generic "int[:]" that's less tied to NumPy). But first things first! Dag Sverre From matthew.brett at gmail.com Wed Jul 6 08:46:06 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 6 Jul 2011 13:46:06 +0100 Subject: [Numpy-discussion] HPC missing data - was: NA/Missing Data Conference Call Summary Message-ID: Hi, Sorry, I hope you don't mind, I moved this to it's own thread, trying to separate comments on the NA debate from the discussion yesterday. On Wed, Jul 6, 2011 at 1:27 PM, Dag Sverre Seljebotn wrote: > On 07/06/2011 02:05 PM, Matthew Brett wrote: >> Hi, >> >> Just for reference, I am using this as the latest version of the NEP - >> I hope it's current: >> >> https://github.com/m-paradox/numpy/blob/7b10c9ab1616b9100e98dd2ab80cef639d5b5735/doc/neps/missing-data.rst >> >> I'm mostly relaying stuff I said, although generally (please do >> correct me if I am wrong) I am just re-expressing points that >> Nathaniel has already made in the alterNEP text and the emails. >> >> On Wed, Jul 6, 2011 at 12:46 AM, Christopher Jordan-Squire >> ?wrote: >> ... >>> Since we only have Mark is only around Austin until early August, there's >>> also broad agreement that we need to get something done quickly. >> >> I think I might have missed that part of the discussion :) >> >> I feel the need to emphasize the centrality of the assertion by >> Nathaniel, and agreement by (at least) me, that the NA case (there >> really is no data) and the IGNORE case (there is data but I'm >> concealing it from you) are conceptually different, and come from >> different use-cases. >> >> The underlying disagreement returned many times to this fundamental >> difference between the NEP and alterNEP: >> >> In the NEP - by design - it is impossible to distinguish between na.NA >> and na.IGNORE >> The alterNEP insists you should be able to distinguish. >> >> Mark says something like "it's all missing data, there's no reason you >> should want to distinguish". ?Nathaniel and I were saying "the two >> types of missing do have different use-cases, and it should be >> possible to distinguish. ?You might want to chose to treat them the >> same, but you should be able to see what they are.". >> >> I returned several times to this (original point by Nathaniel): >> >> a[3] = np.NA >> >> (what does this mean? ? I am altering the underlying array, or a mask? >> ? 
?How would I explain this to someone?) >> >> We confirmed that, in order to make it difficult to know what your NA >> is (masked or bit-pattern), Mark has to a) hinder access to the data >> below the mask and b) prevent direct API access to the masking array. >> I described this as 'hobbling the API' and Mark thought of it as >> 'generic programming' (missing is always missing). > > Here's an HPC perspective...: > > If you, say, want to off-load array processing with a mask to some code > running on a GPU, you really can't have the GPU go through some NumPy > API. Or if you want to implement a masked array on a cluster with MPI, > you similarly really, really want raw access. > > At least I feel that the transparency of NumPy is a huge part of its > current success. Many more than me spend half their time in C/Fortran > and half their time in Python. > > I tend to look at NumPy this way: Assuming you have some data in memory > (possibly loaded by a C or Fortran library). (Almost) no matter how it > is allocated, ordered, packed, aligned -- there's a way to find strides > and dtypes to put a nice NumPy wrapper around it and use the memory from > Python. > > So, my view on Mark's NEP was: With a reasonably amount of flexibility > in how you decided to implement masking for your data, you can create a > NumPy wrapper that will understand that. Whether your Fortran library > exposes NAs in its 40GB buffer as bit patterns, or using a seperate > mask, both will work. > > And IMO Mark's NEP comes rather close to this, you just need an > additional NEP later to give raw details to the implementation details, > once those are settled :-) I was a little puzzled as to what you were trying to say, but I suspect that's my ignorance about Numpy internals. Superficially, I would have assumed that, making masked and bit-pattern NAs behave the same in numpy, would take you away from the raw data, in the sense that you not only need the dtype, you also need the mask machinery, in order to know if you have an NA. Later I realized that you probably weren't saying that. So, just for my unhappy ignorance - how does the HPC perspective relate to debate about "can / can't distinguish NA from ignore"? Sorry, thanks, Matthew From d.s.seljebotn at astro.uio.no Wed Jul 6 09:12:34 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 06 Jul 2011 15:12:34 +0200 Subject: [Numpy-discussion] HPC missing data - was: NA/Missing Data Conference Call Summary In-Reply-To: References: Message-ID: <4E145F42.9060007@astro.uio.no> On 07/06/2011 02:46 PM, Matthew Brett wrote: > Hi, > > Sorry, I hope you don't mind, I moved this to it's own thread, trying > to separate comments on the NA debate from the discussion yesterday. I'm sorry. > On Wed, Jul 6, 2011 at 1:27 PM, Dag Sverre Seljebotn > wrote: >> On 07/06/2011 02:05 PM, Matthew Brett wrote: >>> Hi, >>> >>> Just for reference, I am using this as the latest version of the NEP - >>> I hope it's current: >>> >>> https://github.com/m-paradox/numpy/blob/7b10c9ab1616b9100e98dd2ab80cef639d5b5735/doc/neps/missing-data.rst >>> >>> I'm mostly relaying stuff I said, although generally (please do >>> correct me if I am wrong) I am just re-expressing points that >>> Nathaniel has already made in the alterNEP text and the emails. >>> >>> On Wed, Jul 6, 2011 at 12:46 AM, Christopher Jordan-Squire >>> wrote: >>> ... >>>> Since we only have Mark is only around Austin until early August, there's >>>> also broad agreement that we need to get something done quickly. 
>>> >>> I think I might have missed that part of the discussion :) >>> >>> I feel the need to emphasize the centrality of the assertion by >>> Nathaniel, and agreement by (at least) me, that the NA case (there >>> really is no data) and the IGNORE case (there is data but I'm >>> concealing it from you) are conceptually different, and come from >>> different use-cases. >>> >>> The underlying disagreement returned many times to this fundamental >>> difference between the NEP and alterNEP: >>> >>> In the NEP - by design - it is impossible to distinguish between na.NA >>> and na.IGNORE >>> The alterNEP insists you should be able to distinguish. >>> >>> Mark says something like "it's all missing data, there's no reason you >>> should want to distinguish". Nathaniel and I were saying "the two >>> types of missing do have different use-cases, and it should be >>> possible to distinguish. You might want to chose to treat them the >>> same, but you should be able to see what they are.". >>> >>> I returned several times to this (original point by Nathaniel): >>> >>> a[3] = np.NA >>> >>> (what does this mean? I am altering the underlying array, or a mask? >>> How would I explain this to someone?) >>> >>> We confirmed that, in order to make it difficult to know what your NA >>> is (masked or bit-pattern), Mark has to a) hinder access to the data >>> below the mask and b) prevent direct API access to the masking array. >>> I described this as 'hobbling the API' and Mark thought of it as >>> 'generic programming' (missing is always missing). >> >> Here's an HPC perspective...: >> >> If you, say, want to off-load array processing with a mask to some code >> running on a GPU, you really can't have the GPU go through some NumPy >> API. Or if you want to implement a masked array on a cluster with MPI, >> you similarly really, really want raw access. >> >> At least I feel that the transparency of NumPy is a huge part of its >> current success. Many more than me spend half their time in C/Fortran >> and half their time in Python. >> >> I tend to look at NumPy this way: Assuming you have some data in memory >> (possibly loaded by a C or Fortran library). (Almost) no matter how it >> is allocated, ordered, packed, aligned -- there's a way to find strides >> and dtypes to put a nice NumPy wrapper around it and use the memory from >> Python. >> >> So, my view on Mark's NEP was: With a reasonably amount of flexibility >> in how you decided to implement masking for your data, you can create a >> NumPy wrapper that will understand that. Whether your Fortran library >> exposes NAs in its 40GB buffer as bit patterns, or using a seperate >> mask, both will work. >> >> And IMO Mark's NEP comes rather close to this, you just need an >> additional NEP later to give raw details to the implementation details, >> once those are settled :-) > > I was a little puzzled as to what you were trying to say, but I > suspect that's my ignorance about Numpy internals. > > Superficially, I would have assumed that, making masked and > bit-pattern NAs behave the same in numpy, would take you away from the > raw data, in the sense that you not only need the dtype, you also need > the mask machinery, in order to know if you have an NA. Later I > realized that you probably weren't saying that. So, just for my > unhappy ignorance - how does the HPC perspective relate to debate > about "can / can't distinguish NA from ignore"? 
I just commented on the "prevent direct API access to the masking array" part -- I'm hoping direct access by external code to the underlying implementation details will be allowed, at some point. What I'm saying is that Mark's proposal is more flexible. Say for the sake of the argument that I have two codes I need to interface with: - Library A is written in Fortran and uses a seperate (explicit) mask array for NA - Library B runs on a GPU and uses a bit pattern for NA Mark's proposal then comes closer to allowing me to wrap both codes using NumPy, since it supports both implementation mechanisms. Sure, it would need a seperate NEP down the road to extend it, but it goes in the right direction for this to happen. As for NA vs. IGNORE I still think 2 types is too little. One should allow for 255 different NA-values, each with user-defined behaviour. Again, Mark's proposal then makes a good start on that, even if more work would be needed to make it happen. I.e., in my perfect world I'd do this to wrap library A (Cythonish psuedo-code: def call_lib_A(): ... lib_A_function(arraybuf, maskbuf, ...) DOG_ATE_IT = np.NA("DOG_ATE_IT", value=42, behaviour="raise") # behaviour could also be "zero", "invalid" missing_value_map = {0xAF: np.NA, 0x43: np.IGNORE, 0xF0: DOG_ATE_IT} result = np.PyArray_CreateArrayFromBufferWithMaskBuffer( arraybuf, maskbuf, missing_value_map, ...) return result def call_lib_B(): lib_B_function(arraybuf, ...) missing_value_patterns = {0xFFFFCACA : np.NA} result = np.PyArray_CreateArrayFromBufferWithBitPattern( arraybuf, maskbuf, missing_value_patterns, ...) return result Hope that is clearer. Again, my intention is not to suggest even more work at the present stage, just to state some advantages with the general direction of Mark's proposal. Dag Sverre From matthew.brett at gmail.com Wed Jul 6 10:47:15 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 6 Jul 2011 15:47:15 +0100 Subject: [Numpy-discussion] HPC missing data - was: NA/Missing Data Conference Call Summary In-Reply-To: <4E145F42.9060007@astro.uio.no> References: <4E145F42.9060007@astro.uio.no> Message-ID: Hi, On Wed, Jul 6, 2011 at 2:12 PM, Dag Sverre Seljebotn wrote: > On 07/06/2011 02:46 PM, Matthew Brett wrote: >> Hi, >> >> Sorry, I hope you don't mind, I moved this to it's own thread, trying >> to separate comments on the NA debate from the discussion yesterday. > > I'm sorry. > >> On Wed, Jul 6, 2011 at 1:27 PM, Dag Sverre Seljebotn >> ?wrote: >>> On 07/06/2011 02:05 PM, Matthew Brett wrote: >>>> Hi, >>>> >>>> Just for reference, I am using this as the latest version of the NEP - >>>> I hope it's current: >>>> >>>> https://github.com/m-paradox/numpy/blob/7b10c9ab1616b9100e98dd2ab80cef639d5b5735/doc/neps/missing-data.rst >>>> >>>> I'm mostly relaying stuff I said, although generally (please do >>>> correct me if I am wrong) I am just re-expressing points that >>>> Nathaniel has already made in the alterNEP text and the emails. >>>> >>>> On Wed, Jul 6, 2011 at 12:46 AM, Christopher Jordan-Squire >>>> ? ?wrote: >>>> ... >>>>> Since we only have Mark is only around Austin until early August, there's >>>>> also broad agreement that we need to get something done quickly. 
>>>> >>>> I think I might have missed that part of the discussion :) >>>> >>>> I feel the need to emphasize the centrality of the assertion by >>>> Nathaniel, and agreement by (at least) me, that the NA case (there >>>> really is no data) and the IGNORE case (there is data but I'm >>>> concealing it from you) are conceptually different, and come from >>>> different use-cases. >>>> >>>> The underlying disagreement returned many times to this fundamental >>>> difference between the NEP and alterNEP: >>>> >>>> In the NEP - by design - it is impossible to distinguish between na.NA >>>> and na.IGNORE >>>> The alterNEP insists you should be able to distinguish. >>>> >>>> Mark says something like "it's all missing data, there's no reason you >>>> should want to distinguish". ?Nathaniel and I were saying "the two >>>> types of missing do have different use-cases, and it should be >>>> possible to distinguish. ?You might want to chose to treat them the >>>> same, but you should be able to see what they are.". >>>> >>>> I returned several times to this (original point by Nathaniel): >>>> >>>> a[3] = np.NA >>>> >>>> (what does this mean? ? I am altering the underlying array, or a mask? >>>> ? ? How would I explain this to someone?) >>>> >>>> We confirmed that, in order to make it difficult to know what your NA >>>> is (masked or bit-pattern), Mark has to a) hinder access to the data >>>> below the mask and b) prevent direct API access to the masking array. >>>> I described this as 'hobbling the API' and Mark thought of it as >>>> 'generic programming' (missing is always missing). >>> >>> Here's an HPC perspective...: >>> >>> If you, say, want to off-load array processing with a mask to some code >>> running on a GPU, you really can't have the GPU go through some NumPy >>> API. Or if you want to implement a masked array on a cluster with MPI, >>> you similarly really, really want raw access. >>> >>> At least I feel that the transparency of NumPy is a huge part of its >>> current success. Many more than me spend half their time in C/Fortran >>> and half their time in Python. >>> >>> I tend to look at NumPy this way: Assuming you have some data in memory >>> (possibly loaded by a C or Fortran library). (Almost) no matter how it >>> is allocated, ordered, packed, aligned -- there's a way to find strides >>> and dtypes to put a nice NumPy wrapper around it and use the memory from >>> Python. >>> >>> So, my view on Mark's NEP was: With a reasonably amount of flexibility >>> in how you decided to implement masking for your data, you can create a >>> NumPy wrapper that will understand that. Whether your Fortran library >>> exposes NAs in its 40GB buffer as bit patterns, or using a seperate >>> mask, both will work. >>> >>> And IMO Mark's NEP comes rather close to this, you just need an >>> additional NEP later to give raw details to the implementation details, >>> once those are settled :-) >> >> I was a little puzzled as to what you were trying to say, but I >> suspect that's my ignorance about Numpy internals. >> >> Superficially, I would have assumed that, making masked and >> bit-pattern NAs behave the same in numpy, would take you away from the >> raw data, in the sense that you not only need the dtype, you also need >> the mask machinery, in order to know if you have an NA. ? Later I >> realized that you probably weren't saying that. ?So, just for my >> unhappy ignorance - how does the HPC perspective relate to debate >> about "can / can't distinguish NA from ignore"? 
> > I just commented on the "prevent direct API access to the masking array" > part -- I'm hoping direct access by external code to the underlying > implementation details will be allowed, at some point. > > What I'm saying is that Mark's proposal is more flexible. Say for the > sake of the argument that I have two codes I need to interface with: > > ?- Library A is written in Fortran and uses a seperate (explicit) mask > array for NA > > ?- Library B runs on a GPU and uses a bit pattern for NA > > Mark's proposal then comes closer to allowing me to wrap both codes > using NumPy, since it supports both implementation mechanisms. Sure, it > would need a seperate NEP down the road to extend it, but it goes in the > right direction for this to happen. I'm sorry - honestly - maybe it's because I've just had lunch, but I think I am not understanding something. When you say "Mark's proposal is more flexible" - more flexible than what? I think we agree that: * NA bitpatterns are good to have * masks are good to have and the discussion is about: * should it be possible to distinguish between bitpatterns (NAs) and masks (IGNORE). Are you saying that making it not-possible to distinguish - at the numpy level, is more flexible? Cheers, Matthew From mwwiebe at gmail.com Wed Jul 6 11:40:11 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 6 Jul 2011 10:40:11 -0500 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas Message-ID: It appears to me that one of the biggest reason some of us have been talking past each other in the discussions is that different people have different definitions for the terms being used. Until this is thoroughly cleared up, I feel the design process is tilting at windmills. In the interests of clarity in our discussions, here is a starting point which is consistent with the NEP. These definitions have been added in a glossary within the NEP. If there are any ideas for amendments to these definitions that we can agree on, I will update the NEP with those amendments. Also, if I missed any important terms which need to be added, please propose definitions for them. NA (Not Available) A placeholder for a value which is unknown to computations. That value may be temporarily hidden with a mask, may have been lost due to hard drive corruption, or gone for any number of reasons. This is the same as NA in the R project. IGNORE (Skip/Ignore) A placeholder which should be treated by computations as if no value does or could exist there. For sums, this means act as if the value were zero, and for products, this means act as if the value were one. It's as if the array were compressed in some fashion to not include that element. bitpattern A technique for implementing either NA or IGNORE, where a particular set of bit patterns are chosen from all the possible bit patterns of the value's data type to signal that the element is NA or IGNORE. mask A technique for implementing either NA or IGNORE, where a boolean or enum array parallel to the data array is used to signal which elements are NA or IGNORE. numpy.ma The existing implementation of a particular form of masked arrays, which is part of the NumPy codebase. The most important distinctions I'm trying to draw are: 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and IGNORE as mask are reasonable. 2) The idea of masking and the numpy.ma implementation are different. 
The numpy.ma object makes particular choices about how to interpret the mask, but while backwards compatibility is important, a fresh evaluation of all the design choices going into a mask implementation is worthwhile. Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From numpy-discussion at maubp.freeserve.co.uk Wed Jul 6 12:33:27 2011 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Wed, 6 Jul 2011 17:33:27 +0100 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: Message-ID: On Wed, Jul 6, 2011 at 4:40 PM, Mark Wiebe wrote: > It appears to me that one of the biggest reason some of us have been talking > past each other in the discussions is that different people have different > definitions for the terms being used. Until this is thoroughly cleared up, I > feel the design process is tilting at windmills. > In the interests of clarity in our discussions, here is a starting point > which is consistent with the NEP. These definitions have been added in a > glossary within the NEP. If there are any ideas for amendments to these > definitions that we can agree on, I will update the NEP with those > amendments. Also, if I missed any important terms which need to be added, > please propose definitions for them. That sounds good - I've only been scanning these discussions and it is confusing. > NA (Not Available) > ? ? A placeholder for a value which is unknown to computations. That > ? ? value may be temporarily hidden with a mask, may have been lost > ? ? due to hard drive corruption, or gone for any number of reasons. > ? ? This is the same as NA in the R project. Could you expand that to say how sums and products act with NA (since you do so for the IGNORE case). Thanks, Peter From matthew.brett at gmail.com Wed Jul 6 12:38:48 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 6 Jul 2011 17:38:48 +0100 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: Message-ID: Hi, On Wed, Jul 6, 2011 at 4:40 PM, Mark Wiebe wrote: > It appears to me that one of the biggest reason some of us have been talking > past each other in the discussions is that different people have different > definitions for the terms being used. Until this is thoroughly cleared up, I > feel the design process is tilting at windmills. > In the interests of clarity in our discussions, here is a starting point > which is consistent with the NEP. These definitions have been added in a > glossary within the NEP. If there are any ideas for amendments to these > definitions that we can agree on, I will update the NEP with those > amendments. Also, if I missed any important terms which need to be added, > please propose definitions for them. > NA (Not Available) > ? ? A placeholder for a value which is unknown to computations. That > ? ? value may be temporarily hidden with a mask, may have been lost > ? ? due to hard drive corruption, or gone for any number of reasons. > ? ? This is the same as NA in the R project. Really? Can one implement NA with a mask in R? I thought an NA was always bitpattern in R? > IGNORE (Skip/Ignore) > ? ? A placeholder which should be treated by computations as if no value > does > ? ? or could exist there. For sums, this means act as if the value > ? ? were zero, and for products, this means act as if the value were one. > ? ? It's as if the array were compressed in some fashion to not include > ? ? that element. > bitpattern > ? ? 
A technique for implementing either NA or IGNORE, where a particular > ? ? set of bit patterns are chosen from all the possible bit patterns of the > ? ? value's data type to signal that the element is NA or IGNORE. > mask > ? ? A technique for implementing either NA or IGNORE, where a > ? ? boolean or enum array parallel to the data array is used to signal > ? ? which elements are NA or IGNORE. > numpy.ma > ? ? The existing implementation of a particular form of masked arrays, > ? ? which is part of the NumPy codebase. > > The most important distinctions I'm trying to draw are: > 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any > combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and > IGNORE as mask are reasonable. > 2) The idea of masking and the numpy.ma implementation are different. The > numpy.ma object makes particular choices about how to interpret the mask, > but while backwards compatibility is important, a fresh evaluation of all > the design choices going into a mask implementation is worthwhile. I agree that there has been some confusion due to the terms. However, I continue to believe that the discussion is substantial and not due to confusion. Let us then characterize the substantial discussion as this: NEP: bitpattern and masked out values should be made nearly impossible to distinguish in the API alterNEP: bitpattern and masked out values should be distinct in the API so that it can be made clear which is meant (and therefore, implicitly, how they are implemented). Do you agree that this is the discussion? See you, Matthew From numpy-discussion at maubp.freeserve.co.uk Wed Jul 6 12:48:53 2011 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Wed, 6 Jul 2011 17:48:53 +0100 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: Message-ID: On Wed, Jul 6, 2011 at 5:38 PM, Matthew Brett wrote: > > Hi, > > On Wed, Jul 6, 2011 at 4:40 PM, Mark Wiebe wrote: >> It appears to me that one of the biggest reason some of us have been talking >> past each other in the discussions is that different people have different >> definitions for the terms being used. Until this is thoroughly cleared up, I >> feel the design process is tilting at windmills. >> In the interests of clarity in our discussions, here is a starting point >> which is consistent with the NEP. These definitions have been added in a >> glossary within the NEP. If there are any ideas for amendments to these >> definitions that we can agree on, I will update the NEP with those >> amendments. Also, if I missed any important terms which need to be added, >> please propose definitions for them. >> NA (Not Available) >> ? ? A placeholder for a value which is unknown to computations. That >> ? ? value may be temporarily hidden with a mask, may have been lost >> ? ? due to hard drive corruption, or gone for any number of reasons. >> ? ? This is the same as NA in the R project. > > Really? ?Can one implement NA with a mask in R? ?I thought an NA was > always bitpattern in R? I don't think that was what Mark was saying, see this bit later in this email: >> The most important distinctions I'm trying to draw are: >> 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any >> combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and >> IGNORE as mask are reasonable. 
This point as I understood it is there is the semantics of the special values (not available vs ignore), and there is the implementation (bitpattern vs mask), and they are independent. Peter From matthew.brett at gmail.com Wed Jul 6 13:01:11 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 6 Jul 2011 18:01:11 +0100 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: Message-ID: Hi, On Wed, Jul 6, 2011 at 5:48 PM, Peter wrote: > On Wed, Jul 6, 2011 at 5:38 PM, Matthew Brett wrote: >> >> Hi, >> >> On Wed, Jul 6, 2011 at 4:40 PM, Mark Wiebe wrote: >>> It appears to me that one of the biggest reason some of us have been talking >>> past each other in the discussions is that different people have different >>> definitions for the terms being used. Until this is thoroughly cleared up, I >>> feel the design process is tilting at windmills. >>> In the interests of clarity in our discussions, here is a starting point >>> which is consistent with the NEP. These definitions have been added in a >>> glossary within the NEP. If there are any ideas for amendments to these >>> definitions that we can agree on, I will update the NEP with those >>> amendments. Also, if I missed any important terms which need to be added, >>> please propose definitions for them. >>> NA (Not Available) >>> ? ? A placeholder for a value which is unknown to computations. That >>> ? ? value may be temporarily hidden with a mask, may have been lost >>> ? ? due to hard drive corruption, or gone for any number of reasons. >>> ? ? This is the same as NA in the R project. >> >> Really? ?Can one implement NA with a mask in R? ?I thought an NA was >> always bitpattern in R? > > I don't think that was what Mark was saying, see this bit later in this email: I think it would make a difference if there was an implementation that had conflated masking with bitpatterns in terms of API. I don't think R is an example. >>> The most important distinctions I'm trying to draw are: >>> 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any >>> combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and >>> IGNORE as mask are reasonable. > > This point as I understood it is there is the semantics of the special values > (not available vs ignore), and there is the implementation (bitpattern vs > mask), and they are independent. Yes. Although, we can see from the implementations that we have to hand that a) bitpatterns -> propagation (NaN-like) semantics by default (R) b) masks -> ignore semantics by default (masked arrays) I don't think Mark accepts that there is any reason for this tendency of implementations to semantics, but Nathaniel was arguing otherwise in the alterNEP. I think we all accept that it's possible to imagine masking have propagation semantics and bitpatterns having ignore semantics. 
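For concreteness, here is a minimal toy sketch of the two behaviours being defined in this thread -- NA propagating through sums and products, IGNORE acting as the identity element -- written without committing to any implementation (bitpattern or mask). The names NA, IGNORE, toy_sum and toy_prod are illustrative only, not existing or proposed NumPy API:

    class _Special(object):
        # Tiny placeholder singletons for playing with the semantics.
        def __init__(self, name):
            self.name = name
        def __repr__(self):
            return self.name

    NA = _Special('NA')        # propagates: any NA input -> NA output
    IGNORE = _Special('IGNORE')  # skipped: acts as the identity element

    def toy_sum(values):
        total = 0
        for v in values:
            if v is NA:
                return NA      # NA propagates through the reduction
            if v is IGNORE:
                continue       # IGNORE acts as if the value were zero
            total += v
        return total

    def toy_prod(values):
        prod = 1
        for v in values:
            if v is NA:
                return NA
            if v is IGNORE:
                continue       # IGNORE acts as if the value were one
            prod *= v
        return prod

    print(toy_sum([1, 2, NA, 4]))      # -> NA
    print(toy_sum([1, 2, IGNORE, 4]))  # -> 7
    print(toy_prod([2, IGNORE, 3]))    # -> 6

Nothing in this sketch depends on how the special values are stored, which is the point being made about the independence of semantics and implementation.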
Cheers, Matthew From ben.root at ou.edu Wed Jul 6 13:11:04 2011 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 6 Jul 2011 12:11:04 -0500 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: Message-ID: On Wed, Jul 6, 2011 at 12:01 PM, Matthew Brett wrote: > Hi, > > On Wed, Jul 6, 2011 at 5:48 PM, Peter > wrote: > > On Wed, Jul 6, 2011 at 5:38 PM, Matthew Brett > wrote: > >> > >> Hi, > >> > >> On Wed, Jul 6, 2011 at 4:40 PM, Mark Wiebe wrote: > >>> It appears to me that one of the biggest reason some of us have been > talking > >>> past each other in the discussions is that different people have > different > >>> definitions for the terms being used. Until this is thoroughly cleared > up, I > >>> feel the design process is tilting at windmills. > >>> In the interests of clarity in our discussions, here is a starting > point > >>> which is consistent with the NEP. These definitions have been added in > a > >>> glossary within the NEP. If there are any ideas for amendments to these > >>> definitions that we can agree on, I will update the NEP with those > >>> amendments. Also, if I missed any important terms which need to be > added, > >>> please propose definitions for them. > >>> NA (Not Available) > >>> A placeholder for a value which is unknown to computations. That > >>> value may be temporarily hidden with a mask, may have been lost > >>> due to hard drive corruption, or gone for any number of reasons. > >>> This is the same as NA in the R project. > >> > >> Really? Can one implement NA with a mask in R? I thought an NA was > >> always bitpattern in R? > > > > I don't think that was what Mark was saying, see this bit later in this > email: > > I think it would make a difference if there was an implementation that > had conflated masking with bitpatterns in terms of API. I don't think > R is an example. > > Of course R is not an example of that. Nothing is. This is merely conceptual. Separate NA from np.NA in Mark's NEP, and you will see his point. Consider it the logical intersection of NA in Mark's NEP and the aNEP. > >>> The most important distinctions I'm trying to draw are: > >>> 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any > >>> combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and > >>> IGNORE as mask are reasonable. > > > > This point as I understood it is there is the semantics of the special > values > > (not available vs ignore), and there is the implementation (bitpattern vs > > mask), and they are independent. > > Yes. Good, that's all Mark's definition guide is trying to do. > Although, we can see from the implementations that we have to hand that > > a) bitpatterns -> propagation (NaN-like) semantics by default (R) > b) masks -> ignore semantics by default (masked arrays) > The above is extraneous and out of the scope of Mark's definitions. We are taking this little-by-little. > I don't think Mark accepts that there is any reason for this tendency > of implementations to semantics, but Nathaniel was arguing otherwise > in the alterNEP. > > Then that is what we will debate *later*, once we establish definitions. > I think we all accept that it's possible to imagine masking have > propagation semantics and bitpatterns having ignore semantics. > Good! I think that is what Mark wanted to get across in this set of definitions. 
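The same independence can already be seen in NumPy as it stands today, if one accepts NaN as a rough stand-in for a float NA bitpattern: the same bitpattern supports either propagating or skipping behaviour depending on which function is used, and numpy.ma's mask gives skipping behaviour. A quick illustration using only existing functionality:

    import numpy as np

    a = np.array([1.0, np.nan, 3.0])

    print(np.sum(a))     # nan -> bitpattern with propagate semantics
    print(np.nansum(a))  # 4.0 -> same bitpattern with skip/ignore semantics

    m = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
    print(m.sum())       # 4.0 -> mask with skip/ignore semantics

NaN only exists for floating-point dtypes, so it is an analogue of a true NA bitpattern rather than the real thing, but it shows that storage and semantics can be chosen separately.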
It kinda seems like you are champing at the bit here to continue the debate, but I agree with Mark that after yesterday's discussion, we need to make sure that we have a solid foundation for understanding each other. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Wed Jul 6 13:41:24 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 6 Jul 2011 19:41:24 +0200 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: Message-ID: Ah, semantics... On Jul 6, 2011, at 5:40 PM, Mark Wiebe wrote: > > NA (Not Available) > A placeholder for a value which is unknown to computations. That > value may be temporarily hidden with a mask, may have been lost > due to hard drive corruption, or gone for any number of reasons. > This is the same as NA in the R project. I have a problem with 'temporarily hidden with a mask'. In my mind, the concept of NA carries a notion of perennation. The data is just not available, just as a NaN is just not a number. > IGNORE (Skip/Ignore) > A placeholder which should be treated by computations as if no value does > or could exist there. For sums, this means act as if the value > were zero, and for products, this means act as if the value were one. > It's as if the array were compressed in some fashion to not include > that element. A data temporarily hidden by a mask becomes np.IGNORE. > bitpattern > A technique for implementing either NA or IGNORE, where a particular > set of bit patterns are chosen from all the possible bit patterns of the > value's data type to signal that the element is NA or IGNORE. > > mask > A technique for implementing either NA or IGNORE, where a > boolean or enum array parallel to the data array is used to signal > which elements are NA or IGNORE. > > numpy.ma > The existing implementation of a particular form of masked arrays, > which is part of the NumPy codebase. OK with that. > > The most important distinctions I'm trying to draw are: > > 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and IGNORE as mask are reasonable. OK with that. > 2) The idea of masking and the numpy.ma implementation are different. The numpy.ma object makes particular choices about how to interpret the mask, but while backwards compatibility is important, a fresh evaluation of all the design choices going into a mask implementation is worthwhile. Indeed. From matthew.brett at gmail.com Wed Jul 6 13:44:38 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 6 Jul 2011 18:44:38 +0100 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: Message-ID: Hi, On Wed, Jul 6, 2011 at 6:11 PM, Benjamin Root wrote: > > > On Wed, Jul 6, 2011 at 12:01 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Wed, Jul 6, 2011 at 5:48 PM, Peter >> wrote: >> > On Wed, Jul 6, 2011 at 5:38 PM, Matthew Brett >> > wrote: >> >> >> >> Hi, >> >> >> >> On Wed, Jul 6, 2011 at 4:40 PM, Mark Wiebe wrote: >> >>> It appears to me that one of the biggest reason some of us have been >> >>> talking >> >>> past each other in the discussions is that different people have >> >>> different >> >>> definitions for the terms being used. Until this is thoroughly cleared >> >>> up, I >> >>> feel the design process is tilting at windmills. 
>> >>> In the interests of clarity in our discussions, here is a starting >> >>> point >> >>> which is consistent with the NEP. These definitions have been added in >> >>> a >> >>> glossary within the NEP. If there are any ideas for amendments to >> >>> these >> >>> definitions that we can agree on, I will update the NEP with those >> >>> amendments. Also, if I missed any important terms which need to be >> >>> added, >> >>> please propose definitions for them. >> >>> NA (Not Available) >> >>> ? ? A placeholder for a value which is unknown to computations. That >> >>> ? ? value may be temporarily hidden with a mask, may have been lost >> >>> ? ? due to hard drive corruption, or gone for any number of reasons. >> >>> ? ? This is the same as NA in the R project. >> >> >> >> Really? ?Can one implement NA with a mask in R? ?I thought an NA was >> >> always bitpattern in R? >> > >> > I don't think that was what Mark was saying, see this bit later in this >> > email: >> >> I think it would make a difference if there was an implementation that >> had conflated masking with bitpatterns in terms of API. ?I don't think >> R is an example. >> > > Of course R is not an example of that.? Nothing is.? This is merely > conceptual.? Separate NA from np.NA in Mark's NEP, and you will see his > point.? Consider it the logical intersection of NA in Mark's NEP and the > aNEP. I am trying to work out what you feel you feel the points of discussion are. There's surely no point in continuing to debate things we agree on. I don't think anyone disputes (or has ever disputed) that: There can be missing data implemented with bitpatterns There can be missing data implemented with masks Missing data can have propagate semantics Missing data can have ignore semantics. The implementation does not in itself constrain the semantics. Let's not discuss that any more; we all agree. So what do you think is the source of the disagreement? Or are you saying that there should be no disagreement at this stage? Cheers, Matthew From cjordan1 at uw.edu Wed Jul 6 13:54:15 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 6 Jul 2011 10:54:15 -0700 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: Message-ID: On Wed, Jul 6, 2011 at 5:05 AM, Matthew Brett wrote: > Hi, > > Just for reference, I am using this as the latest version of the NEP - > I hope it's current: > > > https://github.com/m-paradox/numpy/blob/7b10c9ab1616b9100e98dd2ab80cef639d5b5735/doc/neps/missing-data.rst > > I'm mostly relaying stuff I said, although generally (please do > correct me if I am wrong) I am just re-expressing points that > Nathaniel has already made in the alterNEP text and the emails. > > On Wed, Jul 6, 2011 at 12:46 AM, Christopher Jordan-Squire > wrote: > ... > > Since we only have Mark is only around Austin until early August, there's > > also broad agreement that we need to get something done quickly. > > I think I might have missed that part of the discussion :) > > I think that might have been mentioned by Travis right before he had to leave for another meeting, which might have been after you'd disconnected. Travis' concern as a member of a numpy community is the desire for something that is broadly applicable and adopted. But as Mark's employer, his concern is to get a more complete and coherent missing data functionality implemented in numpy while Mark is still at Enthought, for use in the problems Enthought and statisticians commonly encounter if nothing else. 
> I feel the need to emphasize the centrality of the assertion by > Nathaniel, and agreement by (at least) me, that the NA case (there > really is no data) and the IGNORE case (there is data but I'm > concealing it from you) are conceptually different, and come from > different use-cases. > > The underlying disagreement returned many times to this fundamental > difference between the NEP and alterNEP: > > In the NEP - by design - it is impossible to distinguish between na.NA > and na.IGNORE > The alterNEP insists you should be able to distinguish. > > Mark says something like "it's all missing data, there's no reason you > should want to distinguish". Nathaniel and I were saying "the two > types of missing do have different use-cases, and it should be > possible to distinguish. You might want to chose to treat them the > same, but you should be able to see what they are.". > > I returned several times to this (original point by Nathaniel): > > a[3] = np.NA > > (what does this mean? I am altering the underlying array, or a mask? > How would I explain this to someone?) > > We confirmed that, in order to make it difficult to know what your NA > is (masked or bit-pattern), Mark has to a) hinder access to the data > below the mask and b) prevent direct API access to the masking array. > I described this as 'hobbling the API' and Mark thought of it as > 'generic programming' (missing is always missing). > > I asserted that explaining NA to people would be easier if ``a[3] = > np.NA`` was direct assignment and altered the array. > > > BIT PATTERN & MASK IMPLEMENTATIONS FOR NA > > > ------------------------------------------------------------------------------------------ > > The current NEP proposes both mask and bit pattern implementations for > > missing data. I use the terms bit pattern and parameterized dtype > > interchangeably, since the parameterized dtype will use a bit pattern for > > its implementation. The two implementations will support the same > > functionality with respect to NA, and the implementation details will be > > largely invisible to the user. Their differences are in the 'extra' > features > > each supports. > > > > Two common questions were: > > 1. Why make two implementations of missing data: one with masks and the > > other with parameterized dtypes? > > 2. Why does the implementation using masks have higher priority? > > The answers are: > > 1. The mask implementation is more general and easier to implement and > > maintain. The bit pattern implementation saves memory, makes > > interoperability easier, and makes ABI (Application Binary Interface) > > compatibility easier. Since each has different strengths, the argument is > > both should be implemented. > > 2. The implementation for the parameterized dtypes will rely on the > > implementation using a mask. > > > > NA VS. IGNORE > > --------------------------------- > > A lot of discussion centered on IGNORE vs. NA types. We take IGNORE in > aNEP > > sense and NA in NEP sense. With NA, there is a clear notion of how NA > > propagates through all basic numpy operations. (e.g., 3+NA=NA and > log(NA) = > > NA, while NA | True = True.) IGNORE is separate from NA, with different > > interpretations depending on the use case. > > IGNORE could mean: > > 1. Data that is being temporarily ignored. e.g., a possible outlier that > is > > temporarily being removed from consideration. > > 2. Data that cannot exist. e.g., a matrix representing a grid of water > > depths for a lake. 
Since the lake isn't square, some entries will > represent > > land, and so depth will be a meaningless concept for those entries. > > 3. Using IGNORE to signal a jagged array. e.g., [ [1, 2, IGNORE], > [IGNORE, > > 3, 4] ] should behave exactly the same as [ [1 , 2] , [3 , 4] ]. Though > this > > leaves open how [1, 2, IGNORE] + [3 , 4] should behave. > > Because of these different uses of IGNORE, it doesn't have as clear a > > theoretical interpretation as NA. (For instance, what is IGNORE+3, > IGNORE*3, > > or IGNORE | True?) > > I don't remember this bit of the discussion, but I see from current > masked arrays that IGNORE is treated as the identity, so: > > IGNORE + 3 = 3 > IGNORE * 3 = 3 > > I'd mentioned at the top of my summary that some of the concrete examples weren't talked about, even though the ideas were. And the fact that IGNORE doesn't have a computational model behind it was mentioned briefly, though it wasn't expanded on. If we follow those rules for IGNORE for all computations, we sometimes get some weird output. For example: [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix multiply and not * with broadcasting.) Or should that sort of operation through an error? > But several of the discussants thought the use cases for IGNORE were very > > compelling. Specifically, they wanted to be able to use IGNORE's and NA's > > simultaneously while still being able to differentiate between them. So, > for > > example, being able to designate some data as IGNORE while still able to > > determine which data was NA but not IGNORE. The current NEP does not > allow > > for this directly. > > I think we discovered that the current NEP is designed to prevent us > distinguishing between these cases. > > I was asking what it was about the implementation (as opposed to the > API) that influenced the decision to make masked and bit-pattern > missing data appear to be identical. I left the conversation before > the end, but up until that point, had failed to understand. > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rowen at uw.edu Wed Jul 6 13:57:45 2011 From: rowen at uw.edu (Russell E. Owen) Date: Wed, 06 Jul 2011 10:57:45 -0700 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 References: Message-ID: In article , Ralf Gommers wrote: > On Tue, Jul 5, 2011 at 11:41 PM, Russell E. Owen wrote: > > > In article , > > Ralf Gommers wrote: > > > > > https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ > > > > Will there be a Mac binary for 32-bit pythons (one that is compatible > > with older versions of MacOS X)? At present I only see a 64-bit > > 10.6-only version. > > > > > > Yes there will be for the final release (10.4-10.6 compatible). I can't > create those on my own computer, so sometimes I don't make them for RCs. I'm glad they will be present for the final release. FYI: I built my own 1.6.1rc2 against Python 2.7.2 (the 32-bit Mac version from python.org). I reproduced a memory error that I've been trying to narrow down. This is ticket 1896: and the problem is also in 1.6.0. 
-- Russell From Chris.Barker at noaa.gov Wed Jul 6 14:01:03 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 06 Jul 2011 11:01:03 -0700 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: Message-ID: <4E14A2DF.2030303@noaa.gov> Christopher Jordan-Squire wrote: > Here's a short-ish summary of the topics discussed in the conference > call this afternoon. Thanks, this is great! And thanks to all who participated in the call. > 3. Using IGNORE to signal a jagged array. e.g., [ [1, 2, IGNORE], > [IGNORE, 3, 4] ] should behave exactly the same as [ [1 , 2] , [3 , 4] > ]. whoooa! I actually have been looking for, and thinking about "jagged arrays" a fair bit lately, so this is kind of exciting, but this looks like a "bad idea" to me. The above indicates that: a = np.array( [ [1, 2, np.IGNORE], [np.IGNORE, 3, 4] ] a[:,1] would yield: array([2, 4]) which seems really wrong -- you've tossed out the location information altogether. ( think it should be: array([2, 3]) I could see a jagged array being represented by IGNOREs all at the END of each row, but putting items in the middle, and shifting things to the left strikes me as a plain old bad idea (and a pain to implement) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From matthew.brett at gmail.com Wed Jul 6 14:03:17 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 6 Jul 2011 19:03:17 +0100 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: Message-ID: Hi, On Wed, Jul 6, 2011 at 6:54 PM, Christopher Jordan-Squire wrote: > > > On Wed, Jul 6, 2011 at 5:05 AM, Matthew Brett > wrote: >> >> Hi, >> >> Just for reference, I am using this as the latest version of the NEP - >> I hope it's current: >> >> >> https://github.com/m-paradox/numpy/blob/7b10c9ab1616b9100e98dd2ab80cef639d5b5735/doc/neps/missing-data.rst >> >> I'm mostly relaying stuff I said, although generally (please do >> correct me if I am wrong) I am just re-expressing points that >> Nathaniel has already made in the alterNEP text and the emails. >> >> On Wed, Jul 6, 2011 at 12:46 AM, Christopher Jordan-Squire >> wrote: >> ... >> > Since we only have Mark is only around Austin until early August, >> > there's >> > also broad agreement that we need to get something done quickly. >> >> I think I might have missed that part of the discussion :) >> > > I think that might have been mentioned by Travis right before he had to > leave for another meeting, which might have been after you'd disconnected. > Travis' concern as a member of a numpy community is the desire for something > that is broadly applicable and adopted. But as Mark's employer, his concern > is to get a more complete and coherent missing data functionality > implemented in numpy while Mark is still at Enthought, for use in the > problems Enthought and statisticians commonly encounter if nothing else. Sorry - yes - I wasn't there for all the conversation. Of course (not disagreeing), we must take care to get the API right because it's unlikely to change and will be explaining and supporting it for a long time to come. 
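For reference, the existing numpy.ma behaviour on Chris Barker's jagged-array example above is positional: masked elements keep their place, so column indexing does not shift values to the left. A quick check, where the 999 entries are arbitrary placeholder values sitting under the mask:

    import numpy as np

    a = np.ma.masked_array([[1, 2, 999],
                            [999, 3, 4]],
                           mask=[[False, False, True],
                                 [True, False, False]])

    print(a[:, 1])        # [2 3]  -- positions preserved, nothing masked here
    print(a[:, 0])        # [1 --] -- the masked slot stays masked, not dropped
    print(a.sum(axis=1))  # [3 7]  -- masked entries skipped in the reduction

So with the current mask-based implementation, indexing behaves the way Chris expects, and the compress-to-the-left reading of IGNORE would be a departure from it.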
>> I feel the need to emphasize the centrality of the assertion by >> Nathaniel, and agreement by (at least) me, that the NA case (there >> really is no data) and the IGNORE case (there is data but I'm >> concealing it from you) are conceptually different, and come from >> different use-cases. >> >> The underlying disagreement returned many times to this fundamental >> difference between the NEP and alterNEP: >> >> In the NEP - by design - it is impossible to distinguish between na.NA >> and na.IGNORE >> The alterNEP insists you should be able to distinguish. >> >> Mark says something like "it's all missing data, there's no reason you >> should want to distinguish". ?Nathaniel and I were saying "the two >> types of missing do have different use-cases, and it should be >> possible to distinguish. ?You might want to chose to treat them the >> same, but you should be able to see what they are.". >> >> I returned several times to this (original point by Nathaniel): >> >> a[3] = np.NA >> >> (what does this mean? ? I am altering the underlying array, or a mask? >> ?How would I explain this to someone?) >> >> We confirmed that, in order to make it difficult to know what your NA >> is (masked or bit-pattern), Mark has to a) hinder access to the data >> below the mask and b) prevent direct API access to the masking array. >> I described this as 'hobbling the API' and Mark thought of it as >> 'generic programming' (missing is always missing). >> >> I asserted that explaining NA to people would be easier if ``a[3] = >> np.NA`` was direct assignment and altered the array. >> >> > BIT PATTERN & MASK IMPLEMENTATIONS FOR NA >> > >> > ------------------------------------------------------------------------------------------ >> > The current NEP proposes both mask and bit pattern implementations for >> > missing data. I use the terms bit pattern and parameterized dtype >> > interchangeably, since the parameterized dtype will use a bit pattern >> > for >> > its implementation. The two implementations will support the same >> > functionality with respect to NA, and the implementation details will be >> > largely invisible to the user. Their differences are in the 'extra' >> > features >> > each supports. >> > >> > Two common questions were: >> > 1. Why make two implementations of missing data: one with masks and the >> > other with parameterized dtypes? >> > 2. Why does the implementation using masks have higher priority? >> > The answers are: >> > 1.??The mask implementation is more general and easier to implement and >> > maintain. ?The bit pattern implementation saves memory, makes >> > interoperability easier, and makes ABI (Application Binary Interface) >> > compatibility easier. Since each has different strengths, the argument >> > is >> > both should be implemented. >> > 2. The implementation for the parameterized dtypes will rely on the >> > implementation using a mask. >> > >> > NA VS. IGNORE >> > --------------------------------- >> > A lot of discussion centered on IGNORE vs. NA types. We take IGNORE in >> > aNEP >> > sense and NA in ?NEP sense. With NA, there is a clear notion of how NA >> > propagates through all basic numpy operations. ?(e.g., 3+NA=NA and >> > log(NA) = >> > NA, while NA | True = True.) IGNORE is separate from NA, with different >> > interpretations depending on the use case. >> > IGNORE could mean: >> > 1. Data that is being temporarily ignored. e.g., a possible outlier that >> > is >> > temporarily being removed from consideration. >> > 2. Data that cannot exist. 
e.g., a matrix representing a grid of water >> > depths for a lake. Since the lake isn't square, some entries will >> > represent >> > land, and so depth will be a meaningless concept for those entries. >> > 3. Using IGNORE to signal a jagged array. e.g., [ [1, 2, IGNORE], >> > [IGNORE, >> > 3, 4] ] should behave exactly the same as [ [1 , 2] , [3 , 4] ]. Though >> > this >> > leaves open how [1, 2, IGNORE] + [3 , 4] should behave. >> > Because of these different uses of IGNORE, it doesn't have as clear a >> > theoretical interpretation as NA. (For instance, what is IGNORE+3, >> > IGNORE*3, >> > or IGNORE | True?) >> >> I don't remember this bit of the discussion, but I see from current >> masked arrays that IGNORE is treated as the identity, so: >> >> IGNORE + 3 = 3 >> IGNORE * 3 = 3 >> > > I'd mentioned at the top of my summary that some of the concrete examples > weren't talked about, even though the ideas were. And the fact that IGNORE > doesn't have a computational model behind it was mentioned briefly, though > it wasn't expanded on. > If we follow those rules for IGNORE for all computations, we sometimes get > some weird output. For example: > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix multiply > and not * with broadcasting.) Or should that sort of operation through an > error? I'm sorry to say that I haven't thought about ignore semantics very much! What does masked array do? See you, Matthew From njs at pobox.com Wed Jul 6 14:10:34 2011 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Jul 2011 11:10:34 -0700 Subject: [Numpy-discussion] HPC missing data - was: NA/Missing Data Conference Call Summary In-Reply-To: <4E145F42.9060007@astro.uio.no> References: <4E145F42.9060007@astro.uio.no> Message-ID: On Wed, Jul 6, 2011 at 6:12 AM, Dag Sverre Seljebotn wrote: > What I'm saying is that Mark's proposal is more flexible. Say for the > sake of the argument that I have two codes I need to interface with: > > ?- Library A is written in Fortran and uses a seperate (explicit) mask > array for NA > > ?- Library B runs on a GPU and uses a bit pattern for NA Have you ever encountered any such codes? I'm not aware of any code outside of R that implements the proposed NA semantics -- esp. in high-performance code, people generally want to avoid lots of conditionals, and the proposed NA semantics require a branch around every operation inside your inner loops. Certainly there is code out there that uses NaNs, and code that uses masks (in various ways that might or might not match the way the NEP uses them). And it's easy to work with both from numpy right now. The question is whether and how the core should add some tricky and subtle semantics for a few very specific ways of handling NaN-like objects and masking. Upthread you also wrote: > At least I feel that the transparency of NumPy is a huge part of its > current success. Many more than me spend half their time in C/Fortran > and half their time in Python. It's exactly this transparency that worries Matthew and me -- we feel that the alterNEP preserves it, and the NEP attempts to erase it. In the NEP, there are two totally different underlying data structures, but this difference is blurred at the Python level. The idea is that you shouldn't have to think about which you have, but if you work with C/Fortran, then of course you do have to be constantly aware of the underlying implementation anyway. 
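As a partial answer to Matthew's question above about what masked arrays currently do: in numpy.ma, element-wise arithmetic keeps a masked value masked rather than treating it as an identity; masked entries are only skipped in reductions such as sum and prod. A quick check with existing numpy.ma behaviour:

    import numpy as np

    x = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])

    print(x + 3)             # [4.0 -- 6.0]  element-wise: masked stays masked
    print(x.sum())           # 4.0           reductions: masked entries skipped
    print(x.prod())          # 3.0
    print(np.ma.masked + 3)  # masked        the masked constant also propagates

So the "IGNORE + 3 = 3" identity rule only describes the reduction case, which is part of why the element-wise model for IGNORE is still unclear.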
And operations which would obviously make sense for the some of the objects that you know you're working with (e.g., unmasking elements from a masked array, or even accessing the mask directly using numpy slicing) are disallowed, specifically in order to make this distinction harder to make. According to the NEP, C code that takes a masked array should never ever unmask any element; unmasking should only be done by making a full copy of the mask, and attaching it to a new view taken from the original array. Would you honestly feel obliged to follow this requirement in your C code? Or would you just unmask elements in place when it made sense, in order to save memory? -- Nathaniel From cjordan1 at uw.edu Wed Jul 6 14:10:44 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 6 Jul 2011 11:10:44 -0700 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: Message-ID: On Wed, Jul 6, 2011 at 10:44 AM, Matthew Brett wrote: > Hi, > > On Wed, Jul 6, 2011 at 6:11 PM, Benjamin Root wrote: > > > > > > On Wed, Jul 6, 2011 at 12:01 PM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> On Wed, Jul 6, 2011 at 5:48 PM, Peter > >> wrote: > >> > On Wed, Jul 6, 2011 at 5:38 PM, Matthew Brett < > matthew.brett at gmail.com> > >> > wrote: > >> >> > >> >> Hi, > >> >> > >> >> On Wed, Jul 6, 2011 at 4:40 PM, Mark Wiebe > wrote: > >> >>> It appears to me that one of the biggest reason some of us have been > >> >>> talking > >> >>> past each other in the discussions is that different people have > >> >>> different > >> >>> definitions for the terms being used. Until this is thoroughly > cleared > >> >>> up, I > >> >>> feel the design process is tilting at windmills. > >> >>> In the interests of clarity in our discussions, here is a starting > >> >>> point > >> >>> which is consistent with the NEP. These definitions have been added > in > >> >>> a > >> >>> glossary within the NEP. If there are any ideas for amendments to > >> >>> these > >> >>> definitions that we can agree on, I will update the NEP with those > >> >>> amendments. Also, if I missed any important terms which need to be > >> >>> added, > >> >>> please propose definitions for them. > >> >>> NA (Not Available) > >> >>> A placeholder for a value which is unknown to computations. That > >> >>> value may be temporarily hidden with a mask, may have been lost > >> >>> due to hard drive corruption, or gone for any number of reasons. > >> >>> This is the same as NA in the R project. > >> >> > >> >> Really? Can one implement NA with a mask in R? I thought an NA was > >> >> always bitpattern in R? > >> > > >> > I don't think that was what Mark was saying, see this bit later in > this > >> > email: > >> > >> I think it would make a difference if there was an implementation that > >> had conflated masking with bitpatterns in terms of API. I don't think > >> R is an example. > >> > > > > Of course R is not an example of that. Nothing is. This is merely > > conceptual. Separate NA from np.NA in Mark's NEP, and you will see his > > point. Consider it the logical intersection of NA in Mark's NEP and the > > aNEP. > > I am trying to work out what you feel you feel the points of > discussion are. There's surely no point in continuing to debate > things we agree on. 
> > I don't think anyone disputes (or has ever disputed) that: > > There can be missing data implemented with bitpatterns > There can be missing data implemented with masks > Missing data can have propagate semantics > Missing data can have ignore semantics. > The implementation does not in itself constrain the semantics. > > So, to be clear, is your concern is that you want to be able to tell difference between whether an np.NA comes from the bit pattern or the mask in its implementation? But why would you have both the parameterized dtype and the mask implementation at the same time? They implement the same abstraction. Is your desire that the np.NA's are implemented solely through bit patterns and np.IGNORE is implemented solely through masks? So that you can think of the masks as being IGNORE flags? What if you want multiple types of IGNORE? (To ignore certain values because they're outliers, others because the data wouldn't make sense, and others because you're just focusing on a particular subgroup, for instance.) A related question is if the IGNORE values could just be another NA value? I don't understand what the specific problem would be with having several NA values, say NA(1), NA(2), ..., and then letting the user decide that NA(1) means NA in the sense discussed above and NA(2) means IGNORE. Then the ufuncs could be told whether to ignore or propagate each type of NA value. Could you explain to me if this would resolve your concerns about NA/IGNORE, or possibly give a few examples if it doesn't? Because I am still rather confused. Let's not discuss that any more; we all agree. So what do you think > is the source of the disagreement? > > Or are you saying that there should be no disagreement at this stage? > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Wed Jul 6 14:22:32 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 06 Jul 2011 11:22:32 -0700 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: <4E1454C9.5010209@astro.uio.no> References: <4E1454C9.5010209@astro.uio.no> Message-ID: <4E14A7E8.8040800@noaa.gov> Dag Sverre Seljebotn wrote: > Here's an HPC perspective...: > At least I feel that the transparency of NumPy is a huge part of its > current success. Many more than me spend half their time in C/Fortran > and half their time in Python. Absolutely -- and this point has been raised a couple times in the discussion, so I hope it is not forgotten. > I tend to look at NumPy this way: Assuming you have some data in memory > (possibly loaded by a C or Fortran library). (Almost) no matter how it > is allocated, ordered, packed, aligned -- there's a way to find strides > and dtypes to put a nice NumPy wrapper around it and use the memory from > Python. and vice-versa -- Assuming you have some data in numpy arrays, there's a way to process it with a C or Fortran library without copying the data. And this is where I am skeptical of the bit-pattern idea -- while one can expect C and fortran and GPU, and ??? to understand NaNs for floating point data, is there any support in compilers or hardware for special bit patterns for NA values to integers? I've never seen in my (very limited experience). 
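On Chris's integer question: there is no compiler or hardware support involved; an integer NA bitpattern is simply a reserved sentinel value checked by an ordinary comparison (R, for example, reserves the minimum 32-bit integer as its integer NA). A hypothetical sketch, where INT_NA and isna are made-up names for illustration rather than existing or proposed NumPy API:

    import numpy as np

    # Reserve the most negative 32-bit integer as the NA sentinel,
    # following R's convention for its integer NA.
    INT_NA = np.iinfo(np.int32).min

    def isna(arr):
        # An ordinary vectorized comparison -- no special hardware needed.
        return arr == INT_NA

    data = np.array([3, INT_NA, 7], dtype=np.int32)
    print(isna(data))               # [False  True False]
    print(data[~isna(data)].sum())  # 10 -- skip semantics by explicit filtering

The cost is that one value of the dtype's range is given up and every consumer of the data, in Python or in C/Fortran, has to agree on the convention.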
Maybe having the mask option, too, will make that irrelevant, but I want to be clear about that kind of use case. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Wed Jul 6 14:25:23 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 06 Jul 2011 11:25:23 -0700 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: Message-ID: <4E14A893.7040407@noaa.gov> Mark Wiebe wrote: > 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any > combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and > IGNORE as mask are reasonable. Is this really true? if you use a bitpattern for IGNORE, haven't you just lost the ability to get the original value back if you want to stop ignoring it? Maybe that's not inherent to what an IGNORE means, but it seems pretty key to me. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From mwwiebe at gmail.com Wed Jul 6 14:36:44 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 6 Jul 2011 13:36:44 -0500 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: Message-ID: On Wed, Jul 6, 2011 at 12:01 PM, Matthew Brett wrote: > Hi, > > On Wed, Jul 6, 2011 at 5:48 PM, Peter > wrote: > > On Wed, Jul 6, 2011 at 5:38 PM, Matthew Brett > wrote: > >> > >> Hi, > >> > >> On Wed, Jul 6, 2011 at 4:40 PM, Mark Wiebe wrote: > >>> It appears to me that one of the biggest reason some of us have been > talking > >>> past each other in the discussions is that different people have > different > >>> definitions for the terms being used. Until this is thoroughly cleared > up, I > >>> feel the design process is tilting at windmills. > >>> In the interests of clarity in our discussions, here is a starting > point > >>> which is consistent with the NEP. These definitions have been added in > a > >>> glossary within the NEP. If there are any ideas for amendments to these > >>> definitions that we can agree on, I will update the NEP with those > >>> amendments. Also, if I missed any important terms which need to be > added, > >>> please propose definitions for them. > >>> NA (Not Available) > >>> A placeholder for a value which is unknown to computations. That > >>> value may be temporarily hidden with a mask, may have been lost > >>> due to hard drive corruption, or gone for any number of reasons. > >>> This is the same as NA in the R project. > >> > >> Really? Can one implement NA with a mask in R? I thought an NA was > >> always bitpattern in R? > > > > I don't think that was what Mark was saying, see this bit later in this > email: > > I think it would make a difference if there was an implementation that > had conflated masking with bitpatterns in terms of API. I don't think > R is an example. > This reminds me of another confusion I've seen in the list. I'd like to suggest that we ban the word API by itself from the present discussion, and always specify Python API or C API for clarity's sake. Here are my suggested definitions for these two terms: Python API All the interface mechanisms that are exposed to Python code for using missing values in NumPy. 
This API is designed to be Pythonic and fit into the way NumPy works as much as possible. C API All the implementation mechanisms exposed for CPython extensions written in C that want to support NumPy missing value support. This API is designed to be as natural as possible in C, and is usually prioritizes flexibility and high performance. Before we proceed to any discussion of what are good/bad choices, I really want to nail this down from just the definition perspective. I don't want arbitrary choices baked into the terms we use, because that implies already having made a design decision. -Mark > > >>> The most important distinctions I'm trying to draw are: > >>> 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any > >>> combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and > >>> IGNORE as mask are reasonable. > > > > This point as I understood it is there is the semantics of the special > values > > (not available vs ignore), and there is the implementation (bitpattern vs > > mask), and they are independent. > > Yes. Although, we can see from the implementations that we have to hand > that > > a) bitpatterns -> propagation (NaN-like) semantics by default (R) > b) masks -> ignore semantics by default (masked arrays) > > I don't think Mark accepts that there is any reason for this tendency > of implementations to semantics, but Nathaniel was arguing otherwise > in the alterNEP. > > I think we all accept that it's possible to imagine masking have > propagation semantics and bitpatterns having ignore semantics. > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Wed Jul 6 14:38:02 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 06 Jul 2011 11:38:02 -0700 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: Message-ID: <4E14AB8A.90707@noaa.gov> Christopher Jordan-Squire wrote: > If we follow those rules for IGNORE for all computations, we sometimes > get some weird output. For example: > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix > multiply and not * with broadcasting.) Or should that sort of operation > through an error? That should throw an error -- matrix computation is heavily influenced by the shape and size of matrices, so I think IGNORES really don't make sense there. Nathaniel Smith wrote: > It's exactly this transparency that worries Matthew and me -- we feel > that the alterNEP preserves it, and the NEP attempts to erase it. In > the NEP, there are two totally different underlying data structures, > but this difference is blurred at the Python level. The idea is that > you shouldn't have to think about which you have, but if you work with > C/Fortran, then of course you do have to be constantly aware of the > underlying implementation anyway. I don't think this bothers me -- I think it's analogous to things in numpy like Fortran order and non-contiguous arrays -- you can ignore all that when working in pure python when performance isn't critical, but you need a deeper understanding if you want to work with the data in C or Fortran or to tune performance in python. So as long as there is an API to query and control how things work, I like that it's hidden from simple python code. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From d.s.seljebotn at astro.uio.no Wed Jul 6 14:39:37 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 06 Jul 2011 20:39:37 +0200 Subject: [Numpy-discussion] HPC missing data - was: NA/Missing Data Conference Call Summary In-Reply-To: References: <4E145F42.9060007@astro.uio.no> Message-ID: <4E14ABE9.8030903@astro.uio.no> On 07/06/2011 08:10 PM, Nathaniel Smith wrote: > On Wed, Jul 6, 2011 at 6:12 AM, Dag Sverre Seljebotn > wrote: >> What I'm saying is that Mark's proposal is more flexible. Say for the >> sake of the argument that I have two codes I need to interface with: >> >> - Library A is written in Fortran and uses a seperate (explicit) mask >> array for NA >> >> - Library B runs on a GPU and uses a bit pattern for NA > > Have you ever encountered any such codes? I'm not aware of any code > outside of R that implements the proposed NA semantics -- esp. in > high-performance code, people generally want to avoid lots of > conditionals, and the proposed NA semantics require a branch around > every operation inside your inner loops. I'll admit that this whole thing was an hypothetical exercise. I've interfaced with Fortran code with NA values -- not a high performance case, but not all you interface with is high performance. > Certainly there is code out there that uses NaNs, and code that uses > masks (in various ways that might or might not match the way the NEP > uses them). And it's easy to work with both from numpy right now. The > question is whether and how the core should add some tricky and subtle > semantics for a few very specific ways of handling NaN-like objects > and masking. I don't disagree with this. > It's exactly this transparency that worries Matthew and me -- we feel > that the alterNEP preserves it, and the NEP attempts to erase it. In > the NEP, there are two totally different underlying data structures, > but this difference is blurred at the Python level. The idea is that > you shouldn't have to think about which you have, but if you work with > C/Fortran, then of course you do have to be constantly aware of the > underlying implementation anyway. And operations which would obviously > make sense for the some of the objects that you know you're working > with (e.g., unmasking elements from a masked array, or even accessing > the mask directly using numpy slicing) are disallowed, specifically in > order to make this distinction harder to make. This worries me too. What I was thinking is that it could be sort of like indexing -- it works OK to have indexing be transparent in Python-land with respect to striding, and have a contiguous array be just a special case marked by an attribute. If you want, you can still check the strides or flags attributes. > According to the NEP, C code that takes a masked array should never > ever unmask any element; unmasking should only be done by making a > full copy of the mask, and attaching it to a new view taken from the > original array. Would you honestly feel obliged to follow this > requirement in your C code? Or would you just unmask elements in place > when it made sense, in order to save memory? 
I'm with you on this one: I wouldn't adopt any NumPy feature widely unless I had totally transparent access to the underlying implementation details from C -- without relying on any NumPy headers (except in my Cython wrappers)! I don't believe in APIs, I believe in standardized binary data. But I always assumed that could be done down the road, once the internal details had stabilized. As for myself, I'll admit that I'll almost certainly continue with explicit masking without using any of the proposed NEPs -- I have to be extremely aware of the masks in the statistical methods I use. Perhaps that's a sign I should withdraw from the discussion. Dag Sverre From mwwiebe at gmail.com Wed Jul 6 14:42:23 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 6 Jul 2011 13:42:23 -0500 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: Message-ID: On Wed, Jul 6, 2011 at 11:33 AM, Peter < numpy-discussion at maubp.freeserve.co.uk> wrote: > On Wed, Jul 6, 2011 at 4:40 PM, Mark Wiebe wrote: > > It appears to me that one of the biggest reason some of us have been > talking > > past each other in the discussions is that different people have > different > > definitions for the terms being used. Until this is thoroughly cleared > up, I > > feel the design process is tilting at windmills. > > In the interests of clarity in our discussions, here is a starting point > > which is consistent with the NEP. These definitions have been added in a > > glossary within the NEP. If there are any ideas for amendments to these > > definitions that we can agree on, I will update the NEP with those > > amendments. Also, if I missed any important terms which need to be added, > > please propose definitions for them. > > That sounds good - I've only been scanning these discussions and it > is confusing. > > > NA (Not Available) > > A placeholder for a value which is unknown to computations. That > > value may be temporarily hidden with a mask, may have been lost > > due to hard drive corruption, or gone for any number of reasons. > > This is the same as NA in the R project. > > Could you expand that to say how sums and products act with NA > (since you do so for the IGNORE case). > I've added that, here's the new version: NA (Not Available) A placeholder for a value which is unknown to computations. That value may be temporarily hidden with a mask, may have been lost due to hard drive corruption, or gone for any number of reasons. For sums and products this means to produce NA if any of the inputs are NA. This is the same as NA in the R project. Thanks, -Mark > > Thanks, > > Peter > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Jul 6 14:48:55 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 6 Jul 2011 13:48:55 -0500 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: Message-ID: On Wed, Jul 6, 2011 at 11:38 AM, Matthew Brett wrote: > Hi, > > On Wed, Jul 6, 2011 at 4:40 PM, Mark Wiebe wrote: > > It appears to me that one of the biggest reason some of us have been > talking > > past each other in the discussions is that different people have > different > > definitions for the terms being used. 
Until this is thoroughly cleared > up, I > > feel the design process is tilting at windmills. > > In the interests of clarity in our discussions, here is a starting point > > which is consistent with the NEP. These definitions have been added in a > > glossary within the NEP. If there are any ideas for amendments to these > > definitions that we can agree on, I will update the NEP with those > > amendments. Also, if I missed any important terms which need to be added, > > please propose definitions for them. > > NA (Not Available) > > A placeholder for a value which is unknown to computations. That > > value may be temporarily hidden with a mask, may have been lost > > due to hard drive corruption, or gone for any number of reasons. > > This is the same as NA in the R project. > > Really? Can one implement NA with a mask in R? I thought an NA was > always bitpattern in R? > > > IGNORE (Skip/Ignore) > > A placeholder which should be treated by computations as if no value > > does > > or could exist there. For sums, this means act as if the value > > were zero, and for products, this means act as if the value were one. > > It's as if the array were compressed in some fashion to not include > > that element. > > bitpattern > > A technique for implementing either NA or IGNORE, where a particular > > set of bit patterns are chosen from all the possible bit patterns of > the > > value's data type to signal that the element is NA or IGNORE. > > mask > > A technique for implementing either NA or IGNORE, where a > > boolean or enum array parallel to the data array is used to signal > > which elements are NA or IGNORE. > > numpy.ma > > The existing implementation of a particular form of masked arrays, > > which is part of the NumPy codebase. > > > > The most important distinctions I'm trying to draw are: > > 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any > > combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and > > IGNORE as mask are reasonable. > > 2) The idea of masking and the numpy.ma implementation are different. > The > > numpy.ma object makes particular choices about how to interpret the > mask, > > but while backwards compatibility is important, a fresh evaluation of all > > the design choices going into a mask implementation is worthwhile. > > I agree that there has been some confusion due to the terms. > > However, I continue to believe that the discussion is substantial and > not due to confusion. > I believe this is true as well, but the confusion due to the terms appears to be one of the root causes preventing the ideas from getting across. Without first clearing up this aspect of the discussion, things will stay confusing. > Let us then characterize the substantial discussion as this: > > NEP: bitpattern and masked out values should be made nearly impossible > to distinguish in the API > alterNEP: bitpattern and masked out values should be distinct in the > API so that it can be made clear which is meant (and therefore, > implicitly, how they are implemented). > > Do you agree that this is the discussion? > I'd like to get agreement on the definitions before moving to any of the points of contention that are being raised. Thanks, -Mark > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From d.s.seljebotn at astro.uio.no Wed Jul 6 14:49:48 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 06 Jul 2011 20:49:48 +0200 Subject: [Numpy-discussion] HPC missing data - was: NA/Missing Data Conference Call Summary In-Reply-To: References: <4E145F42.9060007@astro.uio.no> Message-ID: <4E14AE4C.2030207@astro.uio.no> On 07/06/2011 04:47 PM, Matthew Brett wrote: > Hi, > > On Wed, Jul 6, 2011 at 2:12 PM, Dag Sverre Seljebotn > wrote: >> I just commented on the "prevent direct API access to the masking array" >> part -- I'm hoping direct access by external code to the underlying >> implementation details will be allowed, at some point. >> >> What I'm saying is that Mark's proposal is more flexible. Say for the >> sake of the argument that I have two codes I need to interface with: >> >> - Library A is written in Fortran and uses a seperate (explicit) mask >> array for NA >> >> - Library B runs on a GPU and uses a bit pattern for NA >> >> Mark's proposal then comes closer to allowing me to wrap both codes >> using NumPy, since it supports both implementation mechanisms. Sure, it >> would need a seperate NEP down the road to extend it, but it goes in the >> right direction for this to happen. > > I'm sorry - honestly - maybe it's because I've just had lunch, but I > think I am not understanding something. When you say "Mark's > proposal is more flexible" - more flexible than what? I think we > agree that: > > * NA bitpatterns are good to have > * masks are good to have > > and the discussion is about: > > * should it be possible to distinguish between bitpatterns (NAs) and > masks (IGNORE). I guess I just don't agree with these definitions. There's (NA, IGNORE), and there's (bitpatterns, masks); these are in principle orthogonal. It is possible (and perhaps reasonable) to hard-wire them they way you say -- that may be more obvious, user-friendly, etc., but it is not more flexible. Both Mark and Chuck have explicitly supported having many different NA types down the road (thread: "An NA compromise idea -- many-NA"). So the main difference to me seems to be that you want to hard-wire the NA type and the representation in a specific configuration. I may be missing something though. > Are you saying that making it not-possible to distinguish - at the > numpy level, is more flexible? I'm OK with the "common" ways of accessing data to not distinguish, as long as there's some poweruser way around it. Just like strides -- you index a strided array just like a contiguous array, but you can peek inside into the implementation if you want. Dag Sverre From mwwiebe at gmail.com Wed Jul 6 14:56:01 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 6 Jul 2011 13:56:01 -0500 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: Message-ID: On Wed, Jul 6, 2011 at 12:41 PM, Pierre GM wrote: > Ah, semantics... > > On Jul 6, 2011, at 5:40 PM, Mark Wiebe wrote: > > > > NA (Not Available) > > A placeholder for a value which is unknown to computations. That > > value may be temporarily hidden with a mask, may have been lost > > due to hard drive corruption, or gone for any number of reasons. > > This is the same as NA in the R project. > > I have a problem with 'temporarily hidden with a mask'. In my mind, the > concept of NA carries a notion of perennation. The data is just not > available, just as a NaN is just not a number. 
> Yes, this gets directly to what I've been meaning when I say NA vs IGNORE is independent of mask vs bitpattern. The way I'm trying to structure things, NA vs IGNORE only affects the semantic meaning, i.e. the outputs produced by computations. This is precisely why I put 'temporarily hidden with a mask' first, to make that more clear. > > IGNORE (Skip/Ignore) > > A placeholder which should be treated by computations as if no value > does > > or could exist there. For sums, this means act as if the value > > were zero, and for products, this means act as if the value were one. > > It's as if the array were compressed in some fashion to not include > > that element. > > A data temporarily hidden by a mask becomes np.IGNORE. > Are you willing to suspend the idea of that implication for the purposes of the present discussion? If not, do you see a way to amend things so that masked NAs and bitpattern-based IGNOREs make sense? Would renaming IGNORE to SKIP be more clear, perhaps? Thanks, Mark > > > > bitpattern > > A technique for implementing either NA or IGNORE, where a particular > > set of bit patterns are chosen from all the possible bit patterns of > the > > value's data type to signal that the element is NA or IGNORE. > > > > mask > > A technique for implementing either NA or IGNORE, where a > > boolean or enum array parallel to the data array is used to signal > > which elements are NA or IGNORE. > > > > numpy.ma > > The existing implementation of a particular form of masked arrays, > > which is part of the NumPy codebase. > > OK with that. > > > > > > > The most important distinctions I'm trying to draw are: > > > > 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any > combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and > IGNORE as mask are reasonable. > > OK with that. > > > > > 2) The idea of masking and the numpy.ma implementation are different. > The numpy.ma object makes particular choices about how to interpret the > mask, but while backwards compatibility is important, a fresh evaluation of > all the design choices going into a mask implementation is worthwhile. > > Indeed. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgohlke at uci.edu Wed Jul 6 14:56:02 2011 From: cgohlke at uci.edu (Christoph Gohlke) Date: Wed, 06 Jul 2011 11:56:02 -0700 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 In-Reply-To: References: Message-ID: <4E14AFC2.6000704@uci.edu> On 7/6/2011 10:57 AM, Russell E. Owen wrote: > In article > , > Ralf Gommers wrote: > >> On Tue, Jul 5, 2011 at 11:41 PM, Russell E. Owen wrote: >> >>> In article, >>> Ralf Gommers wrote: >>> >>>> https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ >>> >>> Will there be a Mac binary for 32-bit pythons (one that is compatible >>> with older versions of MacOS X)? At present I only see a 64-bit >>> 10.6-only version. >>> >>> >>> Yes there will be for the final release (10.4-10.6 compatible). I can't >> create those on my own computer, so sometimes I don't make them for RCs. > > I'm glad they will be present for the final release. > > FYI: I built my own 1.6.1rc2 against Python 2.7.2 (the 32-bit Mac > version from python.org). I reproduced a memory error that I've been > trying to narrow down. This is ticket 1896: > > and the problem is also in 1.6.0. 
> > -- Russell > I can reproduce this error on Windows. It looks like a serious regression. Christoph From mwwiebe at gmail.com Wed Jul 6 14:57:50 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 6 Jul 2011 13:57:50 -0500 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: <4E14A893.7040407@noaa.gov> References: <4E14A893.7040407@noaa.gov> Message-ID: On Wed, Jul 6, 2011 at 1:25 PM, Christopher Barker wrote: > Mark Wiebe wrote: > > 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any > > combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and > > IGNORE as mask are reasonable. > > Is this really true? if you use a bitpattern for IGNORE, haven't you > just lost the ability to get the original value back if you want to stop > ignoring it? Maybe that's not inherent to what an IGNORE means, but it > seems pretty key to me. > What do you think of renaming IGNORE to SKIP? -Mark > > -Chris > > > > > > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Wed Jul 6 15:09:47 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 06 Jul 2011 21:09:47 +0200 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: <4E14A893.7040407@noaa.gov> References: <4E14A893.7040407@noaa.gov> Message-ID: <4E14B2FB.7090904@astro.uio.no> On 07/06/2011 08:25 PM, Christopher Barker wrote: > Mark Wiebe wrote: >> 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any >> combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and >> IGNORE as mask are reasonable. > > Is this really true? if you use a bitpattern for IGNORE, haven't you > just lost the ability to get the original value back if you want to stop > ignoring it? Maybe that's not inherent to what an IGNORE means, but it > seems pretty key to me. There's the question of how reductions treats the value. IIUC, IGNORE as bitpattern would imply that reductions treat the value as 0, which is a question orthogonal to whether the value can possibly be unmasked or not. Dag Sverre From njs at pobox.com Wed Jul 6 15:20:09 2011 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Jul 2011 12:20:09 -0700 Subject: [Numpy-discussion] towards a more productive missing values/masked arrays discussion... Message-ID: So one thing that came up on the call yesterday is that there actually is a significant chunk of functionality that everyone seems to agree is useful, needed, and basically how it should work. 
This includes: -- the basic existence and semantics for NA values (however this is implemented) -- that there should exist a dtype/bit-pattern implementation for NAs (whatever other implementations there might also be) -- that ufunc's should take a where= argument -- that there should be a better way for ndarray subclasses like numpy.ma to override the arguments to ufuncs involving them -- maybe some other things I'm not thinking of The real controversy is around what role masking should play, both at the API and implementation level; there are lots of different arguments for different approaches, and it's not at all clear any current proposal will actually solve the problems are facing (or even what those problems are). So rather than continue to go around in circles indefinitely on that, I'm going to write up some "miniNEPs" just focusing on the details of how the features we do agree on should work, so we can hopefully have a more technical discussion of *that*. Cheers, -- Nathaniel From mwwiebe at gmail.com Wed Jul 6 15:24:05 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 6 Jul 2011 14:24:05 -0500 Subject: [Numpy-discussion] towards a more productive missing values/masked arrays discussion... In-Reply-To: References: Message-ID: On Wed, Jul 6, 2011 at 2:20 PM, Nathaniel Smith wrote: > So one thing that came up on the call yesterday is that there actually > is a significant chunk of functionality that everyone seems to agree > is useful, needed, and basically how it should work. > > This includes: > -- the basic existence and semantics for NA values (however this is > implemented) > -- that there should exist a dtype/bit-pattern implementation for > NAs (whatever other implementations there might also be) > -- that ufunc's should take a where= argument > -- that there should be a better way for ndarray subclasses like > numpy.ma to override the arguments to ufuncs involving them > -- maybe some other things I'm not thinking of > > The real controversy is around what role masking should play, both at > the API and implementation level; there are lots of different > arguments for different approaches, and it's not at all clear any > current proposal will actually solve the problems are facing (or even > what those problems are). > > So rather than continue to go around in circles indefinitely on that, > I'm going to write up some "miniNEPs" just focusing on the details of > how the features we do agree on should work, so we can hopefully have > a more technical discussion of *that*. > That sounds alright to me. One thing I would like to ask is to please adopt the vocabulary we are discussing, using it exactly as defined so that people reading all the various ideas don't have to readjust when switching between documents. Thanks, Mark > > Cheers, > -- Nathaniel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
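As a present-day point of reference for the two behaviours being discussed (an analogy only -- neither of these is the proposed implementation), floating-point NaN already gives roughly the propagating NA behaviour, while numpy.ma gives roughly the skipping IGNORE behaviour::

    import numpy as np

    x = np.array([1.0, np.nan, 3.0])
    print(x.sum())    # nan -- the NaN propagates, NA-like

    m = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
    print(m.sum())    # 4.0 -- the masked element is skipped, IGNORE-like
    print(m.prod())   # 3.0 -- for products the skipped element acts like 1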
URL:

From njs at pobox.com  Wed Jul 6 15:26:24 2011
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 6 Jul 2011 12:26:24 -0700
Subject: [Numpy-discussion] miniNEP1: where= argument for ufuncs
Message-ID:

Here's the master copy: https://gist.github.com/1068056

But for your commenting convenience, I'll include the current text here:

############################################
A mini-NEP for the where= argument to ufuncs
############################################

To try and make more progress on the whole missing values/masked
arrays/... debate, it seems useful to have a more technical discussion of
the pieces which we *can* agree on. This is the first, which attempts to
nail down the details of the new ``where=`` argument to ufuncs.

*********
Rationale
*********

It is often useful to apply operations to a subset of your data, and numpy
provides a rich interface for accomplishing this, by combining indexing
operations with ufunc operations, e.g.::

  a[10, mymask] += b
  np.sum(a[which_indices], axis=0)

But any kind of complex indexing necessarily requires making a temporary
copy of (parts of) the underlying array, which can be quite expensive, and
this copying could be avoided by teaching the ufunc loop to `index as it
goes'.

There are strong arguments against doing this. There are tons of cases
like this where one can save some memory by avoiding temporaries, and we
can't build them all into the core -- especially since we also have more
general solutions like numexpr or writing optimized routines in
C/Fortran/Cython. Furthermore, this case is a clear violation of
orthogonality -- we already have indexing and ufuncs as separate things,
so adding a second, somewhat crippled implementation of indexing to ufuncs
themselves is a bit ugly. (It would be better if we could make sure that
anything that could be passed to ndarray.__getitem__ could also be passed
to ufuncs with the same semantics, but this would require substantial
refactoring and seems unlikely to be implemented any time soon.) However,
the ``where=`` argument keeps coming up in the missing values/masked
arrays discussion, so it seems worth pinning down exactly how it would
work.

***
API
***

A new optional keyword argument named ``where=`` will be added to all
ufuncs.

--------------
Error checking
--------------

If given, this argument must be a boolean array. If ``f`` is a ufunc, then
given a function call like::

  f(a, b, where=mymask)

the following occurs. First, ``mymask`` is coerced to an array if
necessary, but no type conversion is performed. (I.e., we do
``np.asarray(mymask)``.) Next, we check whether ``mymask`` is a boolean
array. If it is not, then we raise an exception. (In the future it would
be nice to support other forms of indexing as well, such as lists of
slices or arrays of integer indices. In order to preserve this option, we
do not want to coerce integers into booleans.) Next, ``a`` and ``b`` are
broadcast against each other, just as now; this determines the shape of
the output array. Then ``mymask`` is broadcast to match this output array
shape. (The shape of the output array cannot be changed by this process --
for example, having ``a.shape == (10, 1, 1)``, ``b.shape == (1, 10, 1)``,
``mymask.shape == (1, 1, 10)`` will raise an error rather than returning a
new array with shape ``(10, 10, 10)``.)
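The error-checking rules above can be sketched with existing NumPy calls; this is only an illustration of the rules as written (the helper name is invented), not part of the proposal::

    import numpy as np

    def _validate_where(where, out_shape):
        # "coerced to an array ... but no type conversion is performed"
        mask = np.asarray(where)
        if mask.dtype != np.bool_:
            raise TypeError("where= must be a boolean array")
        # broadcasting the mask against the output must not enlarge the output
        shape = np.broadcast(np.empty(out_shape, dtype=bool), mask).shape
        if shape != tuple(out_shape):
            raise ValueError("where= mask would broadcast the output "
                             "to a larger shape")
        return mask

    a = np.zeros((10, 1, 1))
    b = np.zeros((1, 10, 1))
    out_shape = np.broadcast(a, b).shape           # (10, 10, 1)
    try:
        _validate_where(np.ones((1, 1, 10), dtype=bool), out_shape)
    except ValueError as err:
        print(err)   # the (10, 10, 10) example from the text is rejected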
-----------------------------
Semantics: ufunc ``__call__``
-----------------------------

When simply calling a ufunc with an output argument, e.g.::

  f(a, b, out=c, where=mymask)

then the result is equivalent to::

  c[mymask] = f(a[mymask], b[mymask])

On the other hand, if no output argument is given::

  f(a, b, where=mymask)

then an output array is instantiated as if by calling ``np.empty(shape,
dtype=dtype)``, and then treated as above::

  c = np.empty(shape_for(a, b), dtype=dtype_for(f, a, b))
  f(a, b, out=c, where=mymask)
  return c

Note that this means that the output will, in general, contain
uninitialized values.

----------------------------
Semantics: ufunc ``.reduce``
----------------------------

Take an expression like::

  f.reduce(a, axis=0, where=mymask)

This performs the given reduction operation along each column of ``a``,
but simply skips any elements where the corresponding entry in ``mymask``
is false. (For ufuncs which have an identity, this is equivalent to
treating the given elements as if they were the identity.) For example, if
``a`` is a 2-dimensional array and skipping over the details of
broadcasting, dtype selection, etc., the above operation produces the same
result as::

  out = np.empty(a.shape[1])
  for i in xrange(a.shape[1]):
      out[i] = f.reduce(a[mymask[:, i], i])
  return out

--------------------------------
Semantics: ufunc ``.accumulate``
--------------------------------

Accumulation is similar to reduction, except that ``.accumulate`` saves
the intermediate values generated during the reduction loop. Therefore we
use the same semantics as for ``.reduce`` above. If ``a`` is 2-d etc.,
then this expression::

  f.accumulate(a, axis=0, where=mymask)

is equivalent to::

  out = np.empty(a.shape)
  for i in xrange(a.shape[1]):
      out[mymask[:, i], i] = f.accumulate(a[mymask[:, i], i])
  return out

Notice that once again, elements of ``out`` which correspond to False
entries in the mask are left uninitialized.

------------------------------
Semantics: ufunc ``.reduceat``
------------------------------

I've never used ``.reduceat``, and 30 seconds staring at the documentation
was not sufficient to wrap my head around it. It's not obvious to me that
a ``where=`` argument even makes sense for this? If not we could just say
that it's not supported.

---------------------------
Semantics: ufunc ``.outer``
---------------------------

This is more complicated -- ``.outer`` takes two arrays, which do not
necessarily have the same shape. Therefore we need to also accept two
``where=`` arguments, which I tentatively and uncreatively propose be
called ``where1=`` and ``where2=``. So we have::

  f.outer(a, b, where1=mymask_for_a, where2=mymask_for_b)

This produces an output array with shape ``a.shape + b.shape``, in which
those pairs of elements in ``a`` for which the corresponding entry in
``mymask_for_a`` is True, and in ``b`` for which the corresponding entry
in ``mymask_for_b`` is True, are combined by f and placed into the
appropriate spot. All other entries are left uninitialized.

********************
Unresolved questions
********************

Does it make sense to support ``where=`` in ``.reduceat`` operations?

Is there any less stupid-looking name than ``where1=`` and ``where2=`` for
the ``.outer`` operation? (For that matter, can ``.outer`` be applied to
more than 2 arrays? The docs say it can't, but it's perfectly well-defined
for an arbitrary number of arrays too, so maybe we want an interface that
allows for 3-way, 4-way etc. ``.outer`` operations in the future?)
How does this interact with iterator support (e.g., the new ``nditer``)? A section should be added, but I'm not the one to write it. -- Nathaniel From cjordan1 at uw.edu Wed Jul 6 15:38:40 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 6 Jul 2011 12:38:40 -0700 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: <4E14AB8A.90707@noaa.gov> References: <4E14AB8A.90707@noaa.gov> Message-ID: On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker wrote: > Christopher Jordan-Squire wrote: > > If we follow those rules for IGNORE for all computations, we sometimes > > get some weird output. For example: > > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix > > multiply and not * with broadcasting.) Or should that sort of operation > > through an error? > > That should throw an error -- matrix computation is heavily influenced > by the shape and size of matrices, so I think IGNORES really don't make > sense there. > > > If the IGNORES don't make sense in basic numpy computations then I'm kinda confused why they'd be included at the numpy core level. > Nathaniel Smith wrote: > > It's exactly this transparency that worries Matthew and me -- we feel > > that the alterNEP preserves it, and the NEP attempts to erase it. In > > the NEP, there are two totally different underlying data structures, > > but this difference is blurred at the Python level. The idea is that > > you shouldn't have to think about which you have, but if you work with > > C/Fortran, then of course you do have to be constantly aware of the > > underlying implementation anyway. > > I don't think this bothers me -- I think it's analogous to things in > numpy like Fortran order and non-contiguous arrays -- you can ignore all > that when working in pure python when performance isn't critical, but > you need a deeper understanding if you want to work with the data in C > or Fortran or to tune performance in python. > > So as long as there is an API to query and control how things work, I > like that it's hidden from simple python code. > > -Chris > > > I'm similarly not too concerned about it. Performance seems finicky when you're dealing with missing data, since a lot of arrays will likely have to be copied over to other arrays containing only complete data before being handed over to BLAS. My primary concern is that the np.NA stuff 'just works'. Especially since I've never run into use cases in statistics where the difference between IGNORE and NA mattered. > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjordan1 at uw.edu Wed Jul 6 15:41:22 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 6 Jul 2011 12:41:22 -0700 Subject: [Numpy-discussion] towards a more productive missing values/masked arrays discussion... In-Reply-To: References: Message-ID: It'd be easier to follow if you just made changes/suggestions on github to Mark's NEP directly. (You can checkout Mark's missing data branch to get the NEP.) Then I'll be able to focus on the ways the suggestions differ or compliment the current NEP. 
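To make the matrix-multiplication example quoted above concrete: the odd [15, 31] answer is what you get if IGNORE is taken to act as the multiplicative identity inside each dot product, while compressing the ignored element away gives a different answer again -- two defensible readings, two different results, which is the argument for raising an error. A quick check with plain NumPy, using 1 and element-dropping as stand-ins for the two readings of IGNORE::

    import numpy as np

    A = np.array([[1, 2], [3, 4]])

    # Reading 1: IGNORE acts like the multiplicative identity, i.e. a 1.
    print(A.dot(np.array([1, 7])))       # [15 31] -- the "weird output"

    # Reading 2: the ignored element is dropped before multiplying.
    print(A[:, 1:].dot(np.array([7])))   # [14 28] -- a different answer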
-Chris Jordan-Squire On Wed, Jul 6, 2011 at 12:24 PM, Mark Wiebe wrote: > On Wed, Jul 6, 2011 at 2:20 PM, Nathaniel Smith wrote: > >> So one thing that came up on the call yesterday is that there actually >> is a significant chunk of functionality that everyone seems to agree >> is useful, needed, and basically how it should work. >> >> This includes: >> -- the basic existence and semantics for NA values (however this is >> implemented) >> -- that there should exist a dtype/bit-pattern implementation for >> NAs (whatever other implementations there might also be) >> -- that ufunc's should take a where= argument >> -- that there should be a better way for ndarray subclasses like >> numpy.ma to override the arguments to ufuncs involving them >> -- maybe some other things I'm not thinking of >> >> The real controversy is around what role masking should play, both at >> the API and implementation level; there are lots of different >> arguments for different approaches, and it's not at all clear any >> current proposal will actually solve the problems are facing (or even >> what those problems are). >> >> So rather than continue to go around in circles indefinitely on that, >> I'm going to write up some "miniNEPs" just focusing on the details of >> how the features we do agree on should work, so we can hopefully have >> a more technical discussion of *that*. >> > > That sounds alright to me. One thing I would like to ask is to please adopt > the vocabulary we are discussing, using it exactly as defined so that people > reading all the various ideas don't have to readjust when switching between > documents. > > Thanks, > Mark > > >> >> Cheers, >> -- Nathaniel >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Jul 6 15:43:27 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 6 Jul 2011 14:43:27 -0500 Subject: [Numpy-discussion] HPC missing data - was: NA/Missing Data Conference Call Summary In-Reply-To: <4E145F42.9060007@astro.uio.no> References: <4E145F42.9060007@astro.uio.no> Message-ID: On Wed, Jul 6, 2011 at 8:12 AM, Dag Sverre Seljebotn < d.s.seljebotn at astro.uio.no> wrote: > > I just commented on the "prevent direct API access to the masking array" > part -- I'm hoping direct access by external code to the underlying > implementation details will be allowed, at some point. > I think direct or nearly direct access needs to be in right away, unless we're fairly sure that we will change low level implementation details in the near future. I've added "Python API" and "C API" definitions for us to use to try and clear up this kind of potential confusion. -Mark > What I'm saying is that Mark's proposal is more flexible. Say for the > sake of the argument that I have two codes I need to interface with: > > - Library A is written in Fortran and uses a seperate (explicit) mask > array for NA > > - Library B runs on a GPU and uses a bit pattern for NA > > Mark's proposal then comes closer to allowing me to wrap both codes > using NumPy, since it supports both implementation mechanisms. 
Sure, it > would need a seperate NEP down the road to extend it, but it goes in the > right direction for this to happen. > > As for NA vs. IGNORE I still think 2 types is too little. One should > allow for 255 different NA-values, each with user-defined behaviour. > Again, Mark's proposal then makes a good start on that, even if more > work would be needed to make it happen. > > I.e., in my perfect world I'd do this to wrap library A (Cythonish > psuedo-code: > > def call_lib_A(): > ... > lib_A_function(arraybuf, maskbuf, ...) > DOG_ATE_IT = np.NA("DOG_ATE_IT", value=42, behaviour="raise") > # behaviour could also be "zero", "invalid" > missing_value_map = {0xAF: np.NA, 0x43: np.IGNORE, 0xF0: DOG_ATE_IT} > result = np.PyArray_CreateArrayFromBufferWithMaskBuffer( > arraybuf, maskbuf, missing_value_map, ...) > return result > > def call_lib_B(): > lib_B_function(arraybuf, ...) > missing_value_patterns = {0xFFFFCACA : np.NA} > result = np.PyArray_CreateArrayFromBufferWithBitPattern( > arraybuf, maskbuf, missing_value_patterns, ...) > return result > > Hope that is clearer. Again, my intention is not to suggest even more > work at the present stage, just to state some advantages with the > general direction of Mark's proposal. > > Dag Sverre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Jul 6 16:08:34 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 6 Jul 2011 16:08:34 -0400 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: <4E14AB8A.90707@noaa.gov> Message-ID: On Wed, Jul 6, 2011 at 3:38 PM, Christopher Jordan-Squire wrote: > > > On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker > wrote: >> >> Christopher Jordan-Squire wrote: >> > If we follow those rules for IGNORE for all computations, we sometimes >> > get some weird output. For example: >> > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix >> > multiply and not * with broadcasting.) Or should that sort of operation >> > through an error? >> >> That should throw an error -- matrix computation is heavily influenced >> by the shape and size of matrices, so I think IGNORES really don't make >> sense there. >> >> > > If the IGNORES don't make sense in basic numpy computations then I'm kinda > confused why they'd be included at the numpy core level. > >> >> Nathaniel Smith wrote: >> > It's exactly this transparency that worries Matthew and me -- we feel >> > that the alterNEP preserves it, and the NEP attempts to erase it. In >> > the NEP, there are two totally different underlying data structures, >> > but this difference is blurred at the Python level. The idea is that >> > you shouldn't have to think about which you have, but if you work with >> > C/Fortran, then of course you do have to be constantly aware of the >> > underlying implementation anyway. >> >> I don't think this bothers me -- I think it's analogous to things in >> numpy like Fortran order and non-contiguous arrays -- you can ignore all >> that when working in pure python when performance isn't critical, but >> you need a deeper understanding if you want to work with the data in C >> or Fortran or to tune performance in python. >> >> So as long as there is an API to query and control how things work, I >> like that it's hidden from simple python code. 
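For the "library A" style of case quoted above (a data buffer plus a separate mask buffer handed back by external code), a rough approximation is already possible today by wrapping both buffers with numpy.ma; this is only a sketch of the idea, not the richer per-value API being asked for, and whether numpy.ma shares or copies the mask buffer depends on its constructor arguments and version::

    import numpy as np

    # Stand-ins for buffers that a Fortran/C routine filled in.
    data = np.array([1.0, 42.0, 3.0, 42.0])
    missing = np.array([False, True, False, True])   # True = not available

    wrapped = np.ma.MaskedArray(data, mask=missing, copy=False)
    print(wrapped.sum())         # 4.0 -- masked entries are skipped
    print(wrapped.filled(0.0))   # [ 1.  0.  3.  0.] -- export with a fill value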
>> >> -Chris >> >> > > I'm similarly not too concerned about it. Performance seems finicky when > you're dealing with missing data, since a lot of arrays will likely have to > be copied over to other arrays containing only complete data before being > handed over to BLAS. Unless you know the neutral value for the computation or you just want to do a forward_fill in time series, and you have to ask the user not to give you an unmutable array with NAs if they don't want extra copies. Josef > My primary concern is that the np.NA stuff 'just > works'. Especially since I've never run into use cases in statistics where > the difference between IGNORE and NA mattered. > > >> >> >> -- >> Christopher Barker, Ph.D. >> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice >> 7600 Sand Point Way NE ? (206) 526-6329 ? fax >> Seattle, WA ?98115 ? ? ? (206) 526-6317 ? main reception >> >> Chris.Barker at noaa.gov >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From bsouthey at gmail.com Wed Jul 6 16:11:37 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 06 Jul 2011 15:11:37 -0500 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: <4E14AB8A.90707@noaa.gov> Message-ID: <4E14C179.4060502@gmail.com> On 07/06/2011 02:38 PM, Christopher Jordan-Squire wrote: > > > On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker > > wrote: > > Christopher Jordan-Squire wrote: > > If we follow those rules for IGNORE for all computations, we > sometimes > > get some weird output. For example: > > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix > > multiply and not * with broadcasting.) Or should that sort of > operation > > through an error? > > That should throw an error -- matrix computation is heavily influenced > by the shape and size of matrices, so I think IGNORES really don't > make > sense there. > > > > If the IGNORES don't make sense in basic numpy computations then I'm > kinda confused why they'd be included at the numpy core level. > > Nathaniel Smith wrote: > > It's exactly this transparency that worries Matthew and me -- we > feel > > that the alterNEP preserves it, and the NEP attempts to erase it. In > > the NEP, there are two totally different underlying data structures, > > but this difference is blurred at the Python level. The idea is that > > you shouldn't have to think about which you have, but if you > work with > > C/Fortran, then of course you do have to be constantly aware of the > > underlying implementation anyway. > > I don't think this bothers me -- I think it's analogous to things in > numpy like Fortran order and non-contiguous arrays -- you can > ignore all > that when working in pure python when performance isn't critical, but > you need a deeper understanding if you want to work with the data in C > or Fortran or to tune performance in python. > > So as long as there is an API to query and control how things work, I > like that it's hidden from simple python code. > > -Chris > > > > I'm similarly not too concerned about it. 
Performance seems finicky > when you're dealing with missing data, since a lot of arrays will > likely have to be copied over to other arrays containing only complete > data before being handed over to BLAS. My primary concern is that the > np.NA stuff 'just works'. Especially since I've never run into use > cases in statistics where the difference between IGNORE and NA mattered. > > Exactly! I have not been able to think of an real example where that difference matters as the calculations are only on the 'valid' (ie non-missing and non-masked) values. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjordan1 at uw.edu Wed Jul 6 16:22:26 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 6 Jul 2011 13:22:26 -0700 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: <4E14AB8A.90707@noaa.gov> Message-ID: On Wed, Jul 6, 2011 at 1:08 PM, wrote: > On Wed, Jul 6, 2011 at 3:38 PM, Christopher Jordan-Squire > wrote: > > > > > > On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker < > Chris.Barker at noaa.gov> > > wrote: > >> > >> Christopher Jordan-Squire wrote: > >> > If we follow those rules for IGNORE for all computations, we sometimes > >> > get some weird output. For example: > >> > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix > >> > multiply and not * with broadcasting.) Or should that sort of > operation > >> > through an error? > >> > >> That should throw an error -- matrix computation is heavily influenced > >> by the shape and size of matrices, so I think IGNORES really don't make > >> sense there. > >> > >> > > > > If the IGNORES don't make sense in basic numpy computations then I'm > kinda > > confused why they'd be included at the numpy core level. > > > >> > >> Nathaniel Smith wrote: > >> > It's exactly this transparency that worries Matthew and me -- we feel > >> > that the alterNEP preserves it, and the NEP attempts to erase it. In > >> > the NEP, there are two totally different underlying data structures, > >> > but this difference is blurred at the Python level. The idea is that > >> > you shouldn't have to think about which you have, but if you work with > >> > C/Fortran, then of course you do have to be constantly aware of the > >> > underlying implementation anyway. > >> > >> I don't think this bothers me -- I think it's analogous to things in > >> numpy like Fortran order and non-contiguous arrays -- you can ignore all > >> that when working in pure python when performance isn't critical, but > >> you need a deeper understanding if you want to work with the data in C > >> or Fortran or to tune performance in python. > >> > >> So as long as there is an API to query and control how things work, I > >> like that it's hidden from simple python code. > >> > >> -Chris > >> > >> > > > > I'm similarly not too concerned about it. Performance seems finicky when > > you're dealing with missing data, since a lot of arrays will likely have > to > > be copied over to other arrays containing only complete data before being > > handed over to BLAS. > > Unless you know the neutral value for the computation or you just want > to do a forward_fill in time series, and you have to ask the user not > to give you an unmutable array with NAs if they don't want extra > copies. > > Josef > > Mean value replacement, or more generally single scalar value replacement, is generally not a good idea. 
It biases downward your standard error estimates if you use mean replacement, and it will bias both if you use anything other than mean replacement. The bias is gets worse with more missing data. So it's worst in the precisely the cases where you'd want to fill in the data the most. (Though I admit I'm not too familiar with time series, so maybe this doesn't apply. But it's true as a general principle in statistics.) I'm not sure why we'd want to make this use case easier. -Chris Jordan-Squire > > My primary concern is that the np.NA stuff 'just > > works'. Especially since I've never run into use cases in statistics > where > > the difference between IGNORE and NA mattered. > > > > > >> > >> > >> -- > >> Christopher Barker, Ph.D. > >> Oceanographer > >> > >> Emergency Response Division > >> NOAA/NOS/OR&R (206) 526-6959 voice > >> 7600 Sand Point Way NE (206) 526-6329 fax > >> Seattle, WA 98115 (206) 526-6317 main reception > >> > >> Chris.Barker at noaa.gov > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Wed Jul 6 16:37:17 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 6 Jul 2011 22:37:17 +0200 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: <4E14C179.4060502@gmail.com> References: <4E14AB8A.90707@noaa.gov> <4E14C179.4060502@gmail.com> Message-ID: On Jul 6, 2011, at 10:11 PM, Bruce Southey wrote: > On 07/06/2011 02:38 PM, Christopher Jordan-Squire wrote: >> >> >> On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker wrote: >> Christopher Jordan-Squire wrote: >> > If we follow those rules for IGNORE for all computations, we sometimes >> > get some weird output. For example: >> > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix >> > multiply and not * with broadcasting.) Or should that sort of operation >> > through an error? >> >> That should throw an error -- matrix computation is heavily influenced >> by the shape and size of matrices, so I think IGNORES really don't make >> sense there. >> >> >> >> If the IGNORES don't make sense in basic numpy computations then I'm kinda confused why they'd be included at the numpy core level. >> >> >> Nathaniel Smith wrote: >> > It's exactly this transparency that worries Matthew and me -- we feel >> > that the alterNEP preserves it, and the NEP attempts to erase it. In >> > the NEP, there are two totally different underlying data structures, >> > but this difference is blurred at the Python level. The idea is that >> > you shouldn't have to think about which you have, but if you work with >> > C/Fortran, then of course you do have to be constantly aware of the >> > underlying implementation anyway. 
>> >> I don't think this bothers me -- I think it's analogous to things in >> numpy like Fortran order and non-contiguous arrays -- you can ignore all >> that when working in pure python when performance isn't critical, but >> you need a deeper understanding if you want to work with the data in C >> or Fortran or to tune performance in python. >> >> So as long as there is an API to query and control how things work, I >> like that it's hidden from simple python code. >> >> -Chris >> >> >> >> I'm similarly not too concerned about it. Performance seems finicky when you're dealing with missing data, since a lot of arrays will likely have to be copied over to other arrays containing only complete data before being handed over to BLAS. My primary concern is that the np.NA stuff 'just works'. Especially since I've never run into use cases in statistics where the difference between IGNORE and NA mattered. >> >> > Exactly! > I have not been able to think of an real example where that difference matters as the calculations are only on the 'valid' (ie non-missing and non-masked) values. In practice, they could be treated the same way (ie, skipped). However, they are conceptually different and one may wish to keep this difference of information around (between NAs you didn't have and IGNOREs you just dropped temporarily. From josef.pktd at gmail.com Wed Jul 6 16:38:51 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 6 Jul 2011 16:38:51 -0400 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: <4E14AB8A.90707@noaa.gov> Message-ID: On Wed, Jul 6, 2011 at 4:22 PM, Christopher Jordan-Squire wrote: > > > On Wed, Jul 6, 2011 at 1:08 PM, wrote: >> >> On Wed, Jul 6, 2011 at 3:38 PM, Christopher Jordan-Squire >> wrote: >> > >> > >> > On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker >> > >> > wrote: >> >> >> >> Christopher Jordan-Squire wrote: >> >> > If we follow those rules for IGNORE for all computations, we >> >> > sometimes >> >> > get some weird output. For example: >> >> > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix >> >> > multiply and not * with broadcasting.) Or should that sort of >> >> > operation >> >> > through an error? >> >> >> >> That should throw an error -- matrix computation is heavily influenced >> >> by the shape and size of matrices, so I think IGNORES really don't make >> >> sense there. >> >> >> >> >> > >> > If the IGNORES don't make sense in basic numpy computations then I'm >> > kinda >> > confused why they'd be included at the numpy core level. >> > >> >> >> >> Nathaniel Smith wrote: >> >> > It's exactly this transparency that worries Matthew and me -- we feel >> >> > that the alterNEP preserves it, and the NEP attempts to erase it. In >> >> > the NEP, there are two totally different underlying data structures, >> >> > but this difference is blurred at the Python level. The idea is that >> >> > you shouldn't have to think about which you have, but if you work >> >> > with >> >> > C/Fortran, then of course you do have to be constantly aware of the >> >> > underlying implementation anyway. >> >> >> >> I don't think this bothers me -- I think it's analogous to things in >> >> numpy like Fortran order and non-contiguous arrays -- you can ignore >> >> all >> >> that when working in pure python when performance isn't critical, but >> >> you need a deeper understanding if you want to work with the data in C >> >> or Fortran or to tune performance in python. 
>> >> >> >> So as long as there is an API to query and control how things work, I >> >> like that it's hidden from simple python code. >> >> >> >> -Chris >> >> >> >> >> > >> > I'm similarly not too concerned about it. Performance seems finicky when >> > you're dealing with missing data, since a lot of arrays will likely have >> > to >> > be copied over to other arrays containing only complete data before >> > being >> > handed over to BLAS. >> >> Unless you know the neutral value for the computation or you just want >> to do a forward_fill in time series, and you have to ask the user not >> to give you an unmutable array with NAs if they don't want extra >> copies. >> >> Josef >> > > Mean value replacement, or more generally single scalar value replacement, > is generally not a good idea. It biases downward your standard error > estimates if you use mean replacement, and it will bias both if you use > anything other than mean replacement. The bias is gets worse with more > missing data. So it's worst in the precisely the cases where you'd want to > fill in the data the most. (Though I admit I'm not too familiar with time > series, so maybe this doesn't apply. But it's true as a general principle in > statistics.) I'm not sure why we'd want to make this use case easier. We just discussed a use case for pandas on the statsmodels mailing list, minute data of stock quotes (prices), if the quote is NA then fill it with the last price quote. If it would be necessary for memory usage and performance, this can be handled efficiently and with minimal copying. If you want to fill in a missing value without messing up any result statistics, then there is a large literature in statistics on imputations, repeatedly assigning values to a NA from an underlying distribution. scipy/statsmodels doesn't have anything like this (yet) but R and the others have it available, and it looks more popular in bio-statistics. (But similar to what Dag said, for statistical analysis it will be necessary to keep case specific masks and data arrays around. I haven't actually written any missing values algorithm yet, so I'm quite again.) Josef > -Chris Jordan-Squire > >> >> > My primary concern is that the np.NA stuff 'just >> > works'. Especially since I've never run into use cases in statistics >> > where >> > the difference between IGNORE and NA mattered. >> > >> > >> >> >> >> >> >> -- >> >> Christopher Barker, Ph.D. >> >> Oceanographer >> >> >> >> Emergency Response Division >> >> NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice >> >> 7600 Sand Point Way NE ? (206) 526-6329 ? fax >> >> Seattle, WA ?98115 ? ? ? (206) 526-6317 ? 
main reception >> >> >> >> Chris.Barker at noaa.gov >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From josef.pktd at gmail.com Wed Jul 6 16:47:36 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 6 Jul 2011 16:47:36 -0400 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: <4E14AB8A.90707@noaa.gov> Message-ID: On Wed, Jul 6, 2011 at 4:38 PM, wrote: > On Wed, Jul 6, 2011 at 4:22 PM, Christopher Jordan-Squire > wrote: >> >> >> On Wed, Jul 6, 2011 at 1:08 PM, wrote: >>> >>> On Wed, Jul 6, 2011 at 3:38 PM, Christopher Jordan-Squire >>> wrote: >>> > >>> > >>> > On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker >>> > >>> > wrote: >>> >> >>> >> Christopher Jordan-Squire wrote: >>> >> > If we follow those rules for IGNORE for all computations, we >>> >> > sometimes >>> >> > get some weird output. For example: >>> >> > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix >>> >> > multiply and not * with broadcasting.) Or should that sort of >>> >> > operation >>> >> > through an error? >>> >> >>> >> That should throw an error -- matrix computation is heavily influenced >>> >> by the shape and size of matrices, so I think IGNORES really don't make >>> >> sense there. >>> >> >>> >> >>> > >>> > If the IGNORES don't make sense in basic numpy computations then I'm >>> > kinda >>> > confused why they'd be included at the numpy core level. >>> > >>> >> >>> >> Nathaniel Smith wrote: >>> >> > It's exactly this transparency that worries Matthew and me -- we feel >>> >> > that the alterNEP preserves it, and the NEP attempts to erase it. In >>> >> > the NEP, there are two totally different underlying data structures, >>> >> > but this difference is blurred at the Python level. The idea is that >>> >> > you shouldn't have to think about which you have, but if you work >>> >> > with >>> >> > C/Fortran, then of course you do have to be constantly aware of the >>> >> > underlying implementation anyway. >>> >> >>> >> I don't think this bothers me -- I think it's analogous to things in >>> >> numpy like Fortran order and non-contiguous arrays -- you can ignore >>> >> all >>> >> that when working in pure python when performance isn't critical, but >>> >> you need a deeper understanding if you want to work with the data in C >>> >> or Fortran or to tune performance in python. >>> >> >>> >> So as long as there is an API to query and control how things work, I >>> >> like that it's hidden from simple python code. >>> >> >>> >> -Chris >>> >> >>> >> >>> > >>> > I'm similarly not too concerned about it. Performance seems finicky when >>> > you're dealing with missing data, since a lot of arrays will likely have >>> > to >>> > be copied over to other arrays containing only complete data before >>> > being >>> > handed over to BLAS. 
>>> >>> Unless you know the neutral value for the computation or you just want >>> to do a forward_fill in time series, and you have to ask the user not >>> to give you an unmutable array with NAs if they don't want extra >>> copies. >>> >>> Josef >>> >> >> Mean value replacement, or more generally single scalar value replacement, >> is generally not a good idea. It biases downward your standard error >> estimates if you use mean replacement, and it will bias both if you use >> anything other than mean replacement. The bias is gets worse with more >> missing data. So it's worst in the precisely the cases where you'd want to >> fill in the data the most. (Though I admit I'm not too familiar with time >> series, so maybe this doesn't apply. But it's true as a general principle in >> statistics.) I'm not sure why we'd want to make this use case easier. Another qualification on this (I cannot help it). I think this only applies if you use a prefabricated no-missing-values algorithm. If I write it myself, I can do the proper correction for the reduced number of observations. (similar to the case when we ignore correlated information and use statistics based on uncorrelated observations which also overestimate the amount of information we have available.) Josef > > We just discussed a use case for pandas on the statsmodels mailing > list, minute data of stock quotes (prices), if the quote is NA then > fill it with the last price quote. If it would be necessary for memory > usage and performance, this can be handled efficiently and with > minimal copying. > > If you want to fill in a missing value without messing up any result > statistics, then there is a large literature in statistics on > imputations, repeatedly assigning values to a NA from an underlying > distribution. scipy/statsmodels doesn't have anything like this (yet) > but R and the others have it available, and it looks more popular in > bio-statistics. > > (But similar to what Dag said, for statistical analysis it will be > necessary to keep case specific masks and data arrays around. I > haven't actually written any missing values algorithm yet, so I'm > quite again.) > > Josef > >> -Chris Jordan-Squire >> >>> >>> > My primary concern is that the np.NA stuff 'just >>> > works'. Especially since I've never run into use cases in statistics >>> > where >>> > the difference between IGNORE and NA mattered. >>> > >>> > >>> >> >>> >> >>> >> -- >>> >> Christopher Barker, Ph.D. >>> >> Oceanographer >>> >> >>> >> Emergency Response Division >>> >> NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice >>> >> 7600 Sand Point Way NE ? (206) 526-6329 ? fax >>> >> Seattle, WA ?98115 ? ? ? (206) 526-6317 ? 
main reception >>> >> >>> >> Chris.Barker at noaa.gov >>> >> _______________________________________________ >>> >> NumPy-Discussion mailing list >>> >> NumPy-Discussion at scipy.org >>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> > >>> > >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> > >>> > >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > From ndbecker2 at gmail.com Wed Jul 6 16:53:04 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 06 Jul 2011 16:53:04 -0400 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary References: <4E1454C9.5010209@astro.uio.no> <4E14A7E8.8040800@noaa.gov> Message-ID: Christopher Barker wrote: > Dag Sverre Seljebotn wrote: >> Here's an HPC perspective...: > >> At least I feel that the transparency of NumPy is a huge part of its >> current success. Many more than me spend half their time in C/Fortran >> and half their time in Python. > > Absolutely -- and this point has been raised a couple times in the > discussion, so I hope it is not forgotten. > > > I tend to look at NumPy this way: Assuming you have some data in memory >> (possibly loaded by a C or Fortran library). (Almost) no matter how it >> is allocated, ordered, packed, aligned -- there's a way to find strides >> and dtypes to put a nice NumPy wrapper around it and use the memory from >> Python. > > and vice-versa -- Assuming you have some data in numpy arrays, there's a > way to process it with a C or Fortran library without copying the data. > > And this is where I am skeptical of the bit-pattern idea -- while one > can expect C and fortran and GPU, and ??? to understand NaNs for > floating point data, is there any support in compilers or hardware for > special bit patterns for NA values to integers? I've never seen in my > (very limited experience). > > Maybe having the mask option, too, will make that irrelevant, but I want > to be clear about that kind of use case. > > -Chris Am I the only one that finds the idea of special values of things like int[1] to have special meanings to be really ugly? [1] which already have defined behavior over their entire domain of bit patterns From gael.varoquaux at normalesup.org Wed Jul 6 16:58:38 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 6 Jul 2011 22:58:38 +0200 Subject: [Numpy-discussion] miniNEP1: where= argument for ufuncs In-Reply-To: References: Message-ID: <20110706205838.GK24514@phare.normalesup.org> On Wed, Jul 06, 2011 at 12:26:24PM -0700, Nathaniel Smith wrote: > A mini-NEP for the where= argument to ufuncs I _love_ this proposal and it would probably be much more useful to me than the different masked array proposal that are too focused on a specific usage pattern to answer all my needs. So a strong +1 on the miniNEP. 
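On the integer bit-pattern question above: there is no hardware equivalent of NaN for integers, so systems that offer an integer NA simply reserve one ordinary bit pattern as a sentinel (R, for instance, uses the most negative 32-bit integer for its integer NA). A minimal sketch of that convention, purely as an illustration of the trade-off being discussed::

    import numpy as np

    INT32_NA = np.int32(-2**31)   # one ordinary bit pattern, reserved by convention

    def is_na_int32(arr):
        # Nothing special happens in hardware; callers must check explicitly.
        return arr == INT32_NA

    x = np.array([1, INT32_NA, 3], dtype=np.int32)
    print(is_na_int32(x))               # [False  True False]
    print(x[~is_na_int32(x)].sum())     # 4 -- only after filtering the sentinel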
G From ralf.gommers at googlemail.com Wed Jul 6 16:59:18 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 6 Jul 2011 22:59:18 +0200 Subject: [Numpy-discussion] histogram2d error with empty inputs In-Reply-To: References: Message-ID: On Mon, Jun 27, 2011 at 9:38 PM, Benjamin Root wrote: > I found another empty input edge case. Somewhat recently, we fixed an > issue with np.histogram() and empty inputs (so long as the bins are somehow > known). > > >>> np.histogram([], bins=4) > (array([0, 0, 0, 0]), array([ 0. , 0.25, 0.5 , 0.75, 1. ])) > > However, histogram2d needs the same treatment. > > >>> np.histogram([], [], bins=4) > (array([ 0., 0.]), array([ 0. , 0.25, 0.5 , 0.75, 1. ]), array([ 0. > , 0.25, 0.5 , 0.75, 1. ])) > > The first element in the return tuple needs to be 4x4 (in this case). > > Could you open a ticket for this? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Wed Jul 6 16:59:59 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 6 Jul 2011 22:59:59 +0200 Subject: [Numpy-discussion] HPC missing data - was: NA/Missing Data Conference Call Summary In-Reply-To: <4E14ABE9.8030903@astro.uio.no> References: <4E145F42.9060007@astro.uio.no> <4E14ABE9.8030903@astro.uio.no> Message-ID: <20110706205959.GL24514@phare.normalesup.org> On Wed, Jul 06, 2011 at 08:39:37PM +0200, Dag Sverre Seljebotn wrote: > As for myself, I'll admit that I'll almost certainly continue with > explicit masking without using any of the proposed NEPs -- I have to be > extremely aware of the masks in the statistical methods I use. My gut feeling is that I am in the same case. G From charlesr.harris at gmail.com Wed Jul 6 17:10:55 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 6 Jul 2011 15:10:55 -0600 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: <4E1454C9.5010209@astro.uio.no> <4E14A7E8.8040800@noaa.gov> Message-ID: On Wed, Jul 6, 2011 at 2:53 PM, Neal Becker wrote: > Christopher Barker wrote: > > > Dag Sverre Seljebotn wrote: > >> Here's an HPC perspective...: > > > >> At least I feel that the transparency of NumPy is a huge part of its > >> current success. Many more than me spend half their time in C/Fortran > >> and half their time in Python. > > > > Absolutely -- and this point has been raised a couple times in the > > discussion, so I hope it is not forgotten. > > > > > I tend to look at NumPy this way: Assuming you have some data in > memory > >> (possibly loaded by a C or Fortran library). (Almost) no matter how it > >> is allocated, ordered, packed, aligned -- there's a way to find strides > >> and dtypes to put a nice NumPy wrapper around it and use the memory from > >> Python. > > > > and vice-versa -- Assuming you have some data in numpy arrays, there's a > > way to process it with a C or Fortran library without copying the data. > > > > And this is where I am skeptical of the bit-pattern idea -- while one > > can expect C and fortran and GPU, and ??? to understand NaNs for > > floating point data, is there any support in compilers or hardware for > > special bit patterns for NA values to integers? I've never seen in my > > (very limited experience). > > > > Maybe having the mask option, too, will make that irrelevant, but I want > > to be clear about that kind of use case. > > > > -Chris > > Am I the only one that finds the idea of special values of things like > int[1] to > have special meanings to be really ugly? 
> > [1] which already have defined behavior over their entire domain of bit > patterns > > Umm, no, I find it ugly also. On the other hand, it is an useful artifact left to us by the ancients and solves a lot of problems. So in the absence of anything more standardized... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Wed Jul 6 17:29:00 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 06 Jul 2011 16:29:00 -0500 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: <4E14AB8A.90707@noaa.gov> <4E14C179.4060502@gmail.com> Message-ID: <4E14D39C.3010701@gmail.com> On 07/06/2011 03:37 PM, Pierre GM wrote: > On Jul 6, 2011, at 10:11 PM, Bruce Southey wrote: > >> On 07/06/2011 02:38 PM, Christopher Jordan-Squire wrote: >>> >>> On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker wrote: >>> Christopher Jordan-Squire wrote: >>>> If we follow those rules for IGNORE for all computations, we sometimes >>>> get some weird output. For example: >>>> [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix >>>> multiply and not * with broadcasting.) Or should that sort of operation >>>> through an error? >>> That should throw an error -- matrix computation is heavily influenced >>> by the shape and size of matrices, so I think IGNORES really don't make >>> sense there. >>> >>> >>> >>> If the IGNORES don't make sense in basic numpy computations then I'm kinda confused why they'd be included at the numpy core level. >>> >>> >>> Nathaniel Smith wrote: >>>> It's exactly this transparency that worries Matthew and me -- we feel >>>> that the alterNEP preserves it, and the NEP attempts to erase it. In >>>> the NEP, there are two totally different underlying data structures, >>>> but this difference is blurred at the Python level. The idea is that >>>> you shouldn't have to think about which you have, but if you work with >>>> C/Fortran, then of course you do have to be constantly aware of the >>>> underlying implementation anyway. >>> I don't think this bothers me -- I think it's analogous to things in >>> numpy like Fortran order and non-contiguous arrays -- you can ignore all >>> that when working in pure python when performance isn't critical, but >>> you need a deeper understanding if you want to work with the data in C >>> or Fortran or to tune performance in python. >>> >>> So as long as there is an API to query and control how things work, I >>> like that it's hidden from simple python code. >>> >>> -Chris >>> >>> >>> >>> I'm similarly not too concerned about it. Performance seems finicky when you're dealing with missing data, since a lot of arrays will likely have to be copied over to other arrays containing only complete data before being handed over to BLAS. My primary concern is that the np.NA stuff 'just works'. Especially since I've never run into use cases in statistics where the difference between IGNORE and NA mattered. >>> >>> >> Exactly! >> I have not been able to think of an real example where that difference matters as the calculations are only on the 'valid' (ie non-missing and non-masked) values. > In practice, they could be treated the same way (ie, skipped). However, they are conceptually different and one may wish to keep this difference of information around (between NAs you didn't have and IGNOREs you just dropped temporarily. > > > _______________________________________________ I have yet to see these as *conceptually different* in any of the arguments given. 
Separate NAs or IGNORES or any number of missing value codes just requires use to avoid 'unmasking' those missing value codes in your array as, I presume like masked arrays, you need some placeholder values. Bruce From ben.root at ou.edu Wed Jul 6 17:53:23 2011 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 6 Jul 2011 16:53:23 -0500 Subject: [Numpy-discussion] histogram2d error with empty inputs In-Reply-To: References: Message-ID: On Wednesday, July 6, 2011, Ralf Gommers wrote: > > > On Mon, Jun 27, 2011 at 9:38 PM, Benjamin Root wrote: > > I found another empty input edge case.? Somewhat recently, we fixed an issue with np.histogram() and empty inputs (so long as the bins are somehow known). > >>>> np.histogram([], bins=4) > (array([0, 0, 0, 0]), array([ 0.? ,? 0.25,? 0.5 ,? 0.75,? 1.? ])) > > However, histogram2d needs the same treatment. > >>>> np.histogram([], [], bins=4) > (array([ 0.,? 0.]), array([ 0.? ,? 0.25,? 0.5 ,? 0.75,? 1.? ]), array([ 0.? ,? 0.25,? 0.5 ,? 0.75,? 1.? ])) > > The first element in the return tuple needs to be 4x4 (in this case). > > Could you open a ticket for this? > > Ralf > > > Not a problem. I managed to partly trace the problem down into histogramdd, but the function is a little confusing. Ben Root From ben.root at ou.edu Wed Jul 6 18:03:56 2011 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 6 Jul 2011 17:03:56 -0500 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: <4E14B2FB.7090904@astro.uio.no> References: <4E14A893.7040407@noaa.gov> <4E14B2FB.7090904@astro.uio.no> Message-ID: On Wednesday, July 6, 2011, Dag Sverre Seljebotn wrote: > On 07/06/2011 08:25 PM, Christopher Barker wrote: >> Mark Wiebe wrote: >>> 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any >>> combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and >>> IGNORE as mask are reasonable. >> >> Is this really true? if you use a bitpattern for IGNORE, haven't you >> just lost the ability to get the original value back if you want to stop >> ignoring it? Maybe that's not inherent to what an IGNORE means, but it >> seems pretty key to me. > > There's the question of how reductions treats the value. IIUC, IGNORE as > bitpattern would imply that reductions treat the value as 0, which is a > question orthogonal to whether the value can possibly be unmasked or not. > > Dag Sverre > Just because we are trying to be exact here, the reductions would treat IGNORE as the operation's identity. Therefore, for addition, it would be treated like 0, but for multiplication, it is treated like a 1. Ben Root From dbrown at ucar.edu Wed Jul 6 18:16:50 2011 From: dbrown at ucar.edu (David Brown) Date: Wed, 6 Jul 2011 16:16:50 -0600 Subject: [Numpy-discussion] Call for papers: AMS Jan 22-26, 2012 Message-ID: <07E4F021-E9E4-4210-962F-A168E5AE8931@ucar.edu> I would like to call to the attention of the NumPy community the following call for papers: Second Symposium on Advances in Modeling and Analysis Using Python, 22?26 January 2012, New Orleans, Louisiana The Second Symposium on Advances in Modeling and Analysis Using Python, sponsored by the American Meteorological Society, will be held 22?26 January 2012, as part of the 92nd AMS Annual Meeting in New Orleans, Louisiana. Preliminary programs, registration, hotel, and general information will be posted on the AMS Web site (http://www.ametsoc.org/meet/annual/) in late-September 2011. 
The application of object-oriented programming and other advances in computer science to the atmospheric and oceanic sciences has in turn led to advances in modeling and analysis tools and methods. This symposium focuses on applications of the open-source language Python and seeks to disseminate advances using Python in the atmospheric and oceanic sciences, as well as grow the earth sciences Python community. Papers describing Python work in applications, methodologies, and package development in all areas of meteorology, climatology, oceanography, and space sciences are welcome, including (but not limited to): modeling, time series analysis, air quality, satellite data processing, in-situ data analysis, GIS, Python as a software integration platform, visualization, gridding, model intercomparison, and very large (petabyte) dataset manipulation and access. The $95 abstract fee includes the submission of your abstract, the posting of your extended abstract, and the uploading and recording of your presentation which will be archived on the AMS Web site. Please submit your abstract electronically via the Web by 1 August 2011 (refer to the AMS Web page athttp://www.ametsoc.org/meet/online_submit.html.) An abstract fee of $95 (payable by credit card or purchase order) is charged at the time of submission (refundable only if abstract is not accepted). Authors of accepted presentations will be notified via e-mail by late-September 2011. All extended abstracts are to be submitted electronically and will be available on-line via the Web, Instructions for formatting extended abstracts will be posted on the AMS Web site. Manuscripts (up to 3MB) must be submitted electronically by 22 February 2012. All abstracts, extended abstracts and presentations will be available on the AMS Web site at no cost. For additional information, please contact the program chairperson, Johnny Lin, Physics Department, North Park University (jlin at northpark.edu). (5/11) -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjordan1 at uw.edu Wed Jul 6 18:21:42 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 6 Jul 2011 17:21:42 -0500 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: <4E14A893.7040407@noaa.gov> <4E14B2FB.7090904@astro.uio.no> Message-ID: On Wed, Jul 6, 2011 at 5:03 PM, Benjamin Root wrote: > On Wednesday, July 6, 2011, Dag Sverre Seljebotn > wrote: > > On 07/06/2011 08:25 PM, Christopher Barker wrote: > >> Mark Wiebe wrote: > >>> 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any > >>> combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and > >>> IGNORE as mask are reasonable. > >> > >> Is this really true? if you use a bitpattern for IGNORE, haven't you > >> just lost the ability to get the original value back if you want to stop > >> ignoring it? Maybe that's not inherent to what an IGNORE means, but it > >> seems pretty key to me. > > > > There's the question of how reductions treats the value. IIUC, IGNORE as > > bitpattern would imply that reductions treat the value as 0, which is a > > question orthogonal to whether the value can possibly be unmasked or not. > > > > Dag Sverre > > > > Just because we are trying to be exact here, the reductions would > treat IGNORE as the operation's identity. Therefore, for addition, it > would be treated like 0, but for multiplication, it is treated like a > 1. > > Ben Root > Yes. 
But, as discussed on another thread, that can lead to unexpected results when it's propagated through several operations. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Jul 6 18:38:16 2011 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 6 Jul 2011 17:38:16 -0500 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: <4E14A893.7040407@noaa.gov> <4E14B2FB.7090904@astro.uio.no> Message-ID: On Wednesday, July 6, 2011, Christopher Jordan-Squire wrote: > > > On Wed, Jul 6, 2011 at 5:03 PM, Benjamin Root wrote: > > On Wednesday, July 6, 2011, Dag Sverre Seljebotn > wrote: >> On 07/06/2011 08:25 PM, Christopher Barker wrote: >>> Mark Wiebe wrote: >>>> 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any >>>> combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and >>>> IGNORE as mask are reasonable. >>> >>> Is this really true? if you use a bitpattern for IGNORE, haven't you >>> just lost the ability to get the original value back if you want to stop >>> ignoring it? Maybe that's not inherent to what an IGNORE means, but it >>> seems pretty key to me. >> >> There's the question of how reductions treats the value. IIUC, IGNORE as >> bitpattern would imply that reductions treat the value as 0, which is a >> question orthogonal to whether the value can possibly be unmasked or not. >> >> Dag Sverre >> > > Just because we are trying to be exact here, the reductions would > treat IGNORE as the operation's identity. ?Therefore, for addition, it > would be treated like 0, but for multiplication, it is treated like a > 1. > > Ben Root > > Yes. But, as discussed on another thread, that can lead to unexpected results when it's propagated through several operations. > > If you are talking about means, for example, then the count is adjusted before dividing. ?It is like they never existed. Same with standard deviation. Of course, there are issues with having fewer samples, but that isn't a problem caused by the underlying concept of skipping elements. As long as the underlying mathematical support for array math is still valid, I am not certain what the issue is. ?Matrix math on the other hand... Ben Root From cjordan1 at uw.edu Wed Jul 6 19:08:30 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 6 Jul 2011 18:08:30 -0500 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: <4E14A893.7040407@noaa.gov> <4E14B2FB.7090904@astro.uio.no> Message-ID: On Wed, Jul 6, 2011 at 5:38 PM, Benjamin Root wrote: > On Wednesday, July 6, 2011, Christopher Jordan-Squire > wrote: > > > > > > On Wed, Jul 6, 2011 at 5:03 PM, Benjamin Root wrote: > > > > On Wednesday, July 6, 2011, Dag Sverre Seljebotn > > wrote: > >> On 07/06/2011 08:25 PM, Christopher Barker wrote: > >>> Mark Wiebe wrote: > >>>> 1) NA vs IGNORE and bitpattern vs mask are completely independent. Any > >>>> combination of NA as bitpattern, NA as mask, IGNORE as bitpattern, and > >>>> IGNORE as mask are reasonable. > >>> > >>> Is this really true? if you use a bitpattern for IGNORE, haven't you > >>> just lost the ability to get the original value back if you want to > stop > >>> ignoring it? Maybe that's not inherent to what an IGNORE means, but it > >>> seems pretty key to me. 
> >> > >> There's the question of how reductions treats the value. IIUC, IGNORE as > >> bitpattern would imply that reductions treat the value as 0, which is a > >> question orthogonal to whether the value can possibly be unmasked or > not. > >> > >> Dag Sverre > >> > > > > Just because we are trying to be exact here, the reductions would > > treat IGNORE as the operation's identity. Therefore, for addition, it > > would be treated like 0, but for multiplication, it is treated like a > > 1. > > > > Ben Root > > > > Yes. But, as discussed on another thread, that can lead to unexpected > results when it's propagated through several operations. > > > > > > If you are talking about means, for example, then the count is > adjusted before dividing. It is like they never existed. Same with > standard deviation. Of course, there are issues with having fewer > samples, but that isn't a problem caused by the underlying concept of > skipping elements. > > As long as the underlying mathematical support for array math is still > valid, I am not certain what the issue is. Matrix math on the other > hand... > > Ah, I see. I misunderstood the class of operations you were discussing. -Chris Jordan-Squire > Ben Root > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjordan1 at uw.edu Wed Jul 6 19:14:21 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 6 Jul 2011 18:14:21 -0500 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: <4E14AB8A.90707@noaa.gov> Message-ID: On Wed, Jul 6, 2011 at 3:47 PM, wrote: > On Wed, Jul 6, 2011 at 4:38 PM, wrote: > > On Wed, Jul 6, 2011 at 4:22 PM, Christopher Jordan-Squire > > wrote: > >> > >> > >> On Wed, Jul 6, 2011 at 1:08 PM, wrote: > >>> > >>> On Wed, Jul 6, 2011 at 3:38 PM, Christopher Jordan-Squire > >>> wrote: > >>> > > >>> > > >>> > On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker > >>> > > >>> > wrote: > >>> >> > >>> >> Christopher Jordan-Squire wrote: > >>> >> > If we follow those rules for IGNORE for all computations, we > >>> >> > sometimes > >>> >> > get some weird output. For example: > >>> >> > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is matrix > >>> >> > multiply and not * with broadcasting.) Or should that sort of > >>> >> > operation > >>> >> > through an error? > >>> >> > >>> >> That should throw an error -- matrix computation is heavily > influenced > >>> >> by the shape and size of matrices, so I think IGNORES really don't > make > >>> >> sense there. > >>> >> > >>> >> > >>> > > >>> > If the IGNORES don't make sense in basic numpy computations then I'm > >>> > kinda > >>> > confused why they'd be included at the numpy core level. > >>> > > >>> >> > >>> >> Nathaniel Smith wrote: > >>> >> > It's exactly this transparency that worries Matthew and me -- we > feel > >>> >> > that the alterNEP preserves it, and the NEP attempts to erase it. > In > >>> >> > the NEP, there are two totally different underlying data > structures, > >>> >> > but this difference is blurred at the Python level. The idea is > that > >>> >> > you shouldn't have to think about which you have, but if you work > >>> >> > with > >>> >> > C/Fortran, then of course you do have to be constantly aware of > the > >>> >> > underlying implementation anyway. 
> >>> >> > >>> >> I don't think this bothers me -- I think it's analogous to things in > >>> >> numpy like Fortran order and non-contiguous arrays -- you can ignore > >>> >> all > >>> >> that when working in pure python when performance isn't critical, > but > >>> >> you need a deeper understanding if you want to work with the data in > C > >>> >> or Fortran or to tune performance in python. > >>> >> > >>> >> So as long as there is an API to query and control how things work, > I > >>> >> like that it's hidden from simple python code. > >>> >> > >>> >> -Chris > >>> >> > >>> >> > >>> > > >>> > I'm similarly not too concerned about it. Performance seems finicky > when > >>> > you're dealing with missing data, since a lot of arrays will likely > have > >>> > to > >>> > be copied over to other arrays containing only complete data before > >>> > being > >>> > handed over to BLAS. > >>> > >>> Unless you know the neutral value for the computation or you just want > >>> to do a forward_fill in time series, and you have to ask the user not > >>> to give you an unmutable array with NAs if they don't want extra > >>> copies. > >>> > >>> Josef > >>> > >> > >> Mean value replacement, or more generally single scalar value > replacement, > >> is generally not a good idea. It biases downward your standard error > >> estimates if you use mean replacement, and it will bias both if you use > >> anything other than mean replacement. The bias is gets worse with more > >> missing data. So it's worst in the precisely the cases where you'd want > to > >> fill in the data the most. (Though I admit I'm not too familiar with > time > >> series, so maybe this doesn't apply. But it's true as a general > principle in > >> statistics.) I'm not sure why we'd want to make this use case easier. > > Another qualification on this (I cannot help it). > I think this only applies if you use a prefabricated no-missing-values > algorithm. If I write it myself, I can do the proper correction for > the reduced number of observations. (similar to the case when we > ignore correlated information and use statistics based on uncorrelated > observations which also overestimate the amount of information we have > available.) > > Can you do that sort of technique with longitudinal (panel) data? I'm honestly curious because I haven't looked into such corrections before. I haven't been able to find a reference after a few quick google searches. I don't suppose you know one off the top of your head? And you're right about the last measurement carried forward. I was just thinking about filling in all missing values with the same value. -Chris Jordan-Squire PS--Thanks for mentioning the statsmodels discussion. I'd been keeping track of that on a different email account, and I haven't realized it wasn't forwarding those messages correctly. > Josef > > > > > We just discussed a use case for pandas on the statsmodels mailing > > list, minute data of stock quotes (prices), if the quote is NA then > > fill it with the last price quote. If it would be necessary for memory > > usage and performance, this can be handled efficiently and with > > minimal copying. > > > > If you want to fill in a missing value without messing up any result > > statistics, then there is a large literature in statistics on > > imputations, repeatedly assigning values to a NA from an underlying > > distribution. scipy/statsmodels doesn't have anything like this (yet) > > but R and the others have it available, and it looks more popular in > > bio-statistics. 
> > > > (But similar to what Dag said, for statistical analysis it will be > > necessary to keep case specific masks and data arrays around. I > > haven't actually written any missing values algorithm yet, so I'm > > quite again.) > > > > Josef > > > >> -Chris Jordan-Squire > >> > >>> > >>> > My primary concern is that the np.NA stuff 'just > >>> > works'. Especially since I've never run into use cases in statistics > >>> > where > >>> > the difference between IGNORE and NA mattered. > >>> > > >>> > > >>> >> > >>> >> > >>> >> -- > >>> >> Christopher Barker, Ph.D. > >>> >> Oceanographer > >>> >> > >>> >> Emergency Response Division > >>> >> NOAA/NOS/OR&R (206) 526-6959 voice > >>> >> 7600 Sand Point Way NE (206) 526-6329 fax > >>> >> Seattle, WA 98115 (206) 526-6317 main reception > >>> >> > >>> >> Chris.Barker at noaa.gov > >>> >> _______________________________________________ > >>> >> NumPy-Discussion mailing list > >>> >> NumPy-Discussion at scipy.org > >>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >>> > > >>> > > >>> > _______________________________________________ > >>> > NumPy-Discussion mailing list > >>> > NumPy-Discussion at scipy.org > >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >>> > > >>> > > >>> _______________________________________________ > >>> NumPy-Discussion mailing list > >>> NumPy-Discussion at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jul 6 20:01:52 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 7 Jul 2011 01:01:52 +0100 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: Message-ID: Hi, On Wed, Jul 6, 2011 at 7:10 PM, Christopher Jordan-Squire wrote: > > > On Wed, Jul 6, 2011 at 10:44 AM, Matthew Brett > wrote: >> >> Hi, >> >> On Wed, Jul 6, 2011 at 6:11 PM, Benjamin Root wrote: >> > >> > >> > On Wed, Jul 6, 2011 at 12:01 PM, Matthew Brett >> > wrote: >> >> >> >> Hi, >> >> >> >> On Wed, Jul 6, 2011 at 5:48 PM, Peter >> >> wrote: >> >> > On Wed, Jul 6, 2011 at 5:38 PM, Matthew Brett >> >> > >> >> > wrote: >> >> >> >> >> >> Hi, >> >> >> >> >> >> On Wed, Jul 6, 2011 at 4:40 PM, Mark Wiebe >> >> >> wrote: >> >> >>> It appears to me that one of the biggest reason some of us have >> >> >>> been >> >> >>> talking >> >> >>> past each other in the discussions is that different people have >> >> >>> different >> >> >>> definitions for the terms being used. Until this is thoroughly >> >> >>> cleared >> >> >>> up, I >> >> >>> feel the design process is tilting at windmills. >> >> >>> In the interests of clarity in our discussions, here is a starting >> >> >>> point >> >> >>> which is consistent with the NEP. These definitions have been added >> >> >>> in >> >> >>> a >> >> >>> glossary within the NEP. If there are any ideas for amendments to >> >> >>> these >> >> >>> definitions that we can agree on, I will update the NEP with those >> >> >>> amendments. 
Also, if I missed any important terms which need to be >> >> >>> added, >> >> >>> please propose definitions for them. >> >> >>> NA (Not Available) >> >> >>> ? ? A placeholder for a value which is unknown to computations. >> >> >>> That >> >> >>> ? ? value may be temporarily hidden with a mask, may have been lost >> >> >>> ? ? due to hard drive corruption, or gone for any number of >> >> >>> reasons. >> >> >>> ? ? This is the same as NA in the R project. >> >> >> >> >> >> Really? ?Can one implement NA with a mask in R? ?I thought an NA was >> >> >> always bitpattern in R? >> >> > >> >> > I don't think that was what Mark was saying, see this bit later in >> >> > this >> >> > email: >> >> >> >> I think it would make a difference if there was an implementation that >> >> had conflated masking with bitpatterns in terms of API. ?I don't think >> >> R is an example. >> >> >> > >> > Of course R is not an example of that.? Nothing is.? This is merely >> > conceptual.? Separate NA from np.NA in Mark's NEP, and you will see his >> > point.? Consider it the logical intersection of NA in Mark's NEP and the >> > aNEP. >> >> I am trying to work out what you feel you feel the points of >> discussion are. ?There's surely no point in continuing to debate >> things we agree on. >> >> I don't think anyone disputes (or has ever disputed) that: >> >> There can be missing data implemented with bitpatterns >> There can be missing data implemented with masks >> Missing data can have propagate semantics >> Missing data can have ignore semantics. >> The implementation does not in itself constrain the semantics. >> > So, to be clear, is your concern is that you want to be able to tell > difference between whether an np.NA comes from the bit pattern or the mask > in its implementation? But why would you have both the parameterized dtype > and the mask?implementation?at the same time? They implement the same > abstraction. In Mark's mind they implement the same abstraction. In my mind, and Nathaniels, and I think, Pierre's, and others, they are not the same abstraction. You can treat them the same if you want, even by default, but they are two different ideas, with two different implementations. A bitmask NA value is absolutely completely missing. It's a value that says 'missing' A masked-out value is temporarily or provisionally missing. When you take away the mask, the previous value is there. These are two different things. They are each very easy to explain. > Is your desire that the np.NA's are implemented solely through bit patterns > and np.IGNORE is implemented solely through masks? So that you can think of > the masks as being IGNORE flags? What if you want multiple types of IGNORE? > (To ignore certain values because they're outliers, others because the data > wouldn't make sense, and others because you're just focusing on a particular > subgroup, for instance.) Forgive me, I have been at dinner and had several glasses of wine. So, what I'm about to say might be dumber than usual. With that rider: I agree with Mark, we should avoid np.IGNORE because it conflates ignore semantics with the masking implementation. The idea of several different missings seems to me orthogonal. There can be different missings with bitmasks and different missings with masks. My fundamental point, that I accept I am not getting across with much success, is the following: In general, as Dag has pointed out elsewhere, numpy is close the metal - you can almost feel the C array underneath the python numpy object. This is its strength. 
It doesn't try and hide the C array from you, it gives you the whole machinery, open kimono. I can see an open kimono way of dealing with missing values. There's the bitpattern way. If I do a[3] = np.NA, what I mean is 'store an NA in the array memory'. Exactly the same as when I do a[3] = 2, I mean 'store a 2 in the array memory'. It's obvious and transparent, easy to explain. I can see an open kimono way of doing masking. I make a masked array. The masked array has a 'mask'. I can set the mask values to "True" or "False". I can get the array from underneath the mask. It's obvious and transparent, easy to explain. I can see that you might want, for practical purposes, to treat these two 'missing' signals as being equivalalent. I can even see that you might not expose machinery to distinguish between them. But, it seems ugly and confusing to me, and to others, to try and make the bitpattern and the masked missing value appear to be exactly the same. If I do this: a[3] = np.NA I want an NA in a[3]. I don't want you to make it look as if there's an NA in a[3], I want there to be an NA in a[3]. I want to know what I did. So, maybe I want to 'mask' a[3]. Well then I make a masked array, and then I do a.mask[3] = False # or True. It's obvious. It's explicit. It does what I want. I can feel the C array and the mask array underneath. I know what I did. On the other hand, to try and conceal these implementation differences, seems to me to break my feeling for numpy arrays, and make me feel I have an object that is rather magic, that I don't fully understand, and for which clever stuff is going on, under the hood, that I worry about but have to trust. I think this is not the numpy way. I think I fully understand why it's attractive, but I continue to think that it's a mistake, and one that may take some time to become clear. It will become clear only after a few years of trying to teach people, and noticing that when they get to this stuff, they start switching off, and getting a bit confused, and concluding it's all too hard for them. I can see that we're starting to go round in circles again, and that writing when drunk is unlikely to help that, so at this point, I will drop out of the conversation and let y'all get on with it. Thanks for the substantial question by the way, it was helpful, Cheers, Matthew From xscript at gmx.net Wed Jul 6 20:41:28 2011 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Thu, 07 Jul 2011 02:41:28 +0200 Subject: [Numpy-discussion] miniNEP1: where= argument for ufuncs In-Reply-To: (Nathaniel Smith's message of "Wed, 6 Jul 2011 12:26:24 -0700") References: Message-ID: <8762ne6hxj.fsf@fulla.xlab.taz> Sorry, but I didn't find a way of inserting inline comments in the gist. Nathaniel Smith writes: [...] > Is there any less stupid-looking name than ``where1=`` and ``where2=`` > for the ``.outer`` operation? (For that matter, can ``.outer`` be > applied to more than 2 arrays? The docs say it can't, but it's > perfectly well-defined for arbitrary number of arrays too, so maybe we > want an interface that allows for 3-way, 4-way etc. ``.outer`` > operations in the future?) Well, if outer can indeed be defined for an arbitrary number of arrays (and if it's going to be sometime in the future), I'd say the simplest is to use an array: .outer(a, b, ..., where = [my_where1, my_where2, ...]) Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." 
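A rough sketch of how a list of per-operand where= selections could combine for .outer, written with today's ufunc methods (where_a and where_b are invented names, not part of any proposal):

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([10.0, 20.0])
    where_a = np.array([True, True, False])
    where_b = np.array([True, False])

    full = np.multiply.outer(a, b)                  # the ordinary outer product
    valid = np.logical_and.outer(where_a, where_b)  # a cell counts only if both of its
                                                    # operands were selected
    # full[valid] holds the entries the proposed where=[...] form would compute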
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From josef.pktd at gmail.com Wed Jul 6 20:47:19 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 6 Jul 2011 20:47:19 -0400 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: <4E14AB8A.90707@noaa.gov> Message-ID: On Wed, Jul 6, 2011 at 7:14 PM, Christopher Jordan-Squire wrote: > > > On Wed, Jul 6, 2011 at 3:47 PM, wrote: >> >> On Wed, Jul 6, 2011 at 4:38 PM, ? wrote: >> > On Wed, Jul 6, 2011 at 4:22 PM, Christopher Jordan-Squire >> > wrote: >> >> >> >> >> >> On Wed, Jul 6, 2011 at 1:08 PM, wrote: >> >>> >> >>> On Wed, Jul 6, 2011 at 3:38 PM, Christopher Jordan-Squire >> >>> wrote: >> >>> > >> >>> > >> >>> > On Wed, Jul 6, 2011 at 11:38 AM, Christopher Barker >> >>> > >> >>> > wrote: >> >>> >> >> >>> >> Christopher Jordan-Squire wrote: >> >>> >> > If we follow those rules for IGNORE for all computations, we >> >>> >> > sometimes >> >>> >> > get some weird output. For example: >> >>> >> > [ [1, 2], [3, 4] ] * [ IGNORE, 7] = [ 15, 31 ]. (Where * is >> >>> >> > matrix >> >>> >> > multiply and not * with broadcasting.) Or should that sort of >> >>> >> > operation >> >>> >> > through an error? >> >>> >> >> >>> >> That should throw an error -- matrix computation is heavily >> >>> >> influenced >> >>> >> by the shape and size of matrices, so I think IGNORES really don't >> >>> >> make >> >>> >> sense there. >> >>> >> >> >>> >> >> >>> > >> >>> > If the IGNORES don't make sense in basic numpy computations then I'm >> >>> > kinda >> >>> > confused why they'd be included at the numpy core level. >> >>> > >> >>> >> >> >>> >> Nathaniel Smith wrote: >> >>> >> > It's exactly this transparency that worries Matthew and me -- we >> >>> >> > feel >> >>> >> > that the alterNEP preserves it, and the NEP attempts to erase it. >> >>> >> > In >> >>> >> > the NEP, there are two totally different underlying data >> >>> >> > structures, >> >>> >> > but this difference is blurred at the Python level. The idea is >> >>> >> > that >> >>> >> > you shouldn't have to think about which you have, but if you work >> >>> >> > with >> >>> >> > C/Fortran, then of course you do have to be constantly aware of >> >>> >> > the >> >>> >> > underlying implementation anyway. >> >>> >> >> >>> >> I don't think this bothers me -- I think it's analogous to things >> >>> >> in >> >>> >> numpy like Fortran order and non-contiguous arrays -- you can >> >>> >> ignore >> >>> >> all >> >>> >> that when working in pure python when performance isn't critical, >> >>> >> but >> >>> >> you need a deeper understanding if you want to work with the data >> >>> >> in C >> >>> >> or Fortran or to tune performance in python. >> >>> >> >> >>> >> So as long as there is an API to query and control how things work, >> >>> >> I >> >>> >> like that it's hidden from simple python code. >> >>> >> >> >>> >> -Chris >> >>> >> >> >>> >> >> >>> > >> >>> > I'm similarly not too concerned about it. Performance seems finicky >> >>> > when >> >>> > you're dealing with missing data, since a lot of arrays will likely >> >>> > have >> >>> > to >> >>> > be copied over to other arrays containing only complete data before >> >>> > being >> >>> > handed over to BLAS. >> >>> >> >>> Unless you know the neutral value for the computation or you just want >> >>> to do a forward_fill in time series, and you have to ask the user not >> >>> to give you an unmutable array with NAs if they don't want extra >> >>> copies. 
>> >>> >> >>> Josef >> >>> >> >> >> >> Mean value replacement, or more generally single scalar value >> >> replacement, >> >> is generally not a good idea. It biases downward your standard error >> >> estimates if you use mean replacement, and it will bias both if you use >> >> anything other than mean replacement. The bias is gets worse with more >> >> missing data. So it's worst in the precisely the cases where you'd want >> >> to >> >> fill in the data the most. (Though I admit I'm not too familiar with >> >> time >> >> series, so maybe this doesn't apply. But it's true as a general >> >> principle in >> >> statistics.) I'm not sure why we'd want to make this use case easier. >> >> Another qualification on this (I cannot help it). >> I think this only applies if you use a prefabricated no-missing-values >> algorithm. If I write it myself, I can do the proper correction for >> the reduced number of observations. (similar to the case when we >> ignore correlated information and use statistics based on uncorrelated >> observations which also overestimate the amount of information we have >> available.) >> > > Can you do that sort of technique with longitudinal (panel) data? I'm > honestly curious because I haven't looked into such corrections before. I > haven't been able to find a reference after a few quick google searches. I > don't suppose you know one off the top of your head? I was thinking mainly of simple cases where the correction only requires to correctly count the number of observations in order to adjust the degrees of freedom. For example, statistical tests that are based on relatively simple statistics or ANOVA which just needs a correct counting of the number of observations by groups. (This might be partially covered by any NA ufunc implementation, that does mean, var and cov correctly and maybe sorting like the current NaN sort.) In the panel data case it might be possible to do this, if it can just be treated like an unbalanced panel. I guess it depends on the details of the model. For regression, one way to remove an observation is to include a dummy variable for that observation, or use X'X with rows zeroed out. R has a package for multivariate normal with missing values that allows calculation of expected values for the missing ones. But in many of these cases, getting a clean (no-NA) copy of the data will be simpler to implement. (Leave-one-out cross validation as an IGNORE problem, instead of slicing?) Then there are cases where the missingness contains information. If observations are not randomly missing, then dropping the missing information will bias the estimation results, and proper treatment would require to model the fact that data is missing separately, e.g. with a first step binomial model. Censored observations, e.g. no measurements below a machine threshold are observed (maybe a Tobit model), ... filling forward might create mass points in the distribution, which (to be "clean") would also have to be taken into account if they are a sizable fraction of the data. However, dropping observations (or outliers) might not be possible with time series data (or time series panel data) if it screws up the interpretation of equal spaced time periods. (Electricity or weather forecasts if your hours or seasons get shifted all the time.) Then it's starting to get messy, and I haven't looked at any details. I'm just making up stories at this point. But, I guess, in these cases it will often end up working with a data array and a mask array. 
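For the simple counting cases, the correction amounts to something like the following with a plain data/mask pair (only a sketch, not code from statsmodels or anywhere else):

    import numpy as np

    data = np.array([2.0, 5.0, 3.0, 7.0, 4.0])
    missing = np.array([False, True, False, False, True])    # True marks an NA

    n = np.sum(~missing)                                     # count only observed values
    mean = data[~missing].sum() / n
    var = ((data[~missing] - mean) ** 2).sum() / (n - 1)     # dof adjusted to the reduced
                                                             # number of observations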
> And you're right about the last measurement carried forward. I was just > thinking about filling in all missing values with the same value. If I remember correctly, forward filling also showed up several times on the mailing lists from scikits.timeseries users. Josef > -Chris Jordan-Squire > PS--Thanks for mentioning the statsmodels discussion. I'd been keeping track > of that on a different email account, and I haven't realized it wasn't > forwarding those messages correctly. > > >> >> Josef >> >> >> > >> > We just discussed a use case for pandas on the statsmodels mailing >> > list, minute data of stock quotes (prices), if the quote is NA then >> > fill it with the last price quote. If it would be necessary for memory >> > usage and performance, this can be handled efficiently and with >> > minimal copying. >> > >> > If you want to fill in a missing value without messing up any result >> > statistics, then there is a large literature in statistics on >> > imputations, repeatedly assigning values to a NA from an underlying >> > distribution. scipy/statsmodels doesn't have anything like this (yet) >> > but R and the others have it available, and it looks more popular in >> > bio-statistics. >> > >> > (But similar to what Dag said, for statistical analysis it will be >> > necessary to keep case specific masks and data arrays around. I >> > haven't actually written any missing values algorithm yet, so I'm >> > quite again.) >> > >> > Josef >> > >> >> -Chris Jordan-Squire >> >> >> >>> >> >>> > My primary concern is that the np.NA stuff 'just >> >>> > works'. Especially since I've never run into use cases in statistics >> >>> > where >> >>> > the difference between IGNORE and NA mattered. >> >>> > >> >>> > >> >>> >> >> >>> >> >> >>> >> -- >> >>> >> Christopher Barker, Ph.D. >> >>> >> Oceanographer >> >>> >> >> >>> >> Emergency Response Division >> >>> >> NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice >> >>> >> 7600 Sand Point Way NE ? (206) 526-6329 ? fax >> >>> >> Seattle, WA ?98115 ? ? ? (206) 526-6317 ? 
main reception >> >>> >> >> >>> >> Chris.Barker at noaa.gov >> >>> >> _______________________________________________ >> >>> >> NumPy-Discussion mailing list >> >>> >> NumPy-Discussion at scipy.org >> >>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >>> > >> >>> > >> >>> > _______________________________________________ >> >>> > NumPy-Discussion mailing list >> >>> > NumPy-Discussion at scipy.org >> >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >>> > >> >>> > >> >>> _______________________________________________ >> >>> NumPy-Discussion mailing list >> >>> NumPy-Discussion at scipy.org >> >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> >> > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From strang at nmr.mgh.harvard.edu Wed Jul 6 21:09:03 2011 From: strang at nmr.mgh.harvard.edu (Gary Strangman) Date: Wed, 6 Jul 2011 21:09:03 -0400 (EDT) Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: Message-ID: (snip discussion of open kimono) > On the other hand, to try and conceal these implementation > differences, seems to me to break my feeling for numpy arrays, and > make me feel I have an object that is rather magic, that I don't fully > understand, and for which clever stuff is going on, under the hood, > that I worry about but have to trust. To weigh-in as someone less tipsy, I totally agree with this concern. In fact, in trying to understand the proposal myself--and I use numpy R NAs all the time--it was difficult to understand, and I don't think I have fully gotten it yet. That makes it seem like magic, and magic makes me seriously nervous ... specifically, that I won't get what I intended, which will lead to nearly-impossible-to-find bugs. > I think this is not the numpy way. I think I fully understand why > it's attractive, but I continue to think that it's a mistake, and one > that may take some time to become clear. It will become clear only > after a few years of trying to teach people, and noticing that when > they get to this stuff, they start switching off, and getting a bit > confused, and concluding it's all too hard for them. Agreed. For ultra simplicity, I'd be perfectly happy with a np.NA element (bitpattern?) that I could use to represent points that will forevermore be missing, as well as a masking capability that allows multiple masking values (not just true/false) such as: a.mask[3] = 0 # unmasked a.mask[3] = 1 # masked "type 1" (eg, missing?) a.mask[3] = 2 # masked "type 2" (eg, data from different source) a.mask[3] = 3 # masked "type 3" (eg, ignore in complete-case analysis) etc. Regardless of whether a mask is boolean or more, though, the simplicity of explaining masking separate from NA cases is, I think, a huge win. -best Gary The information in this e-mail is intended only for the person to whom it is addressed. 
If you believe this e-mail was sent to you in error and the e-mail contains patient information, please contact the Partners Compliance HelpLine at http://www.partners.org/complianceline . If the e-mail was sent to you in error but does not contain patient information, please contact the sender and properly dispose of the e-mail. From xscript at gmx.net Wed Jul 6 21:09:56 2011 From: xscript at gmx.net (=?utf-8?Q?Llu=C3=ADs?=) Date: Thu, 07 Jul 2011 03:09:56 +0200 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: (Mark Wiebe's message of "Wed, 6 Jul 2011 13:56:01 -0500") References: Message-ID: <87oc16521n.fsf@fulla.xlab.taz> Mark Wiebe writes: > On Wed, Jul 6, 2011 at 12:41 PM, Pierre GM wrote: > ?Ah, semantics... > On Jul 6, 2011, at 5:40 PM, Mark Wiebe wrote: >> >> NA (Not Available) >> ? ? A placeholder for a value which is unknown to computations. That >> ? ? value may be temporarily hidden with a mask, may have been lost >> ? ? due to hard drive corruption, or gone for any number of reasons. >> ? ? This is the same as NA in the R project. > I have a problem with 'temporarily hidden with a mask'. In my mind, the > concept of NA carries a notion of perennation. The data is just not > available, just as a NaN is just not a number. > Yes, this gets directly to what I've been meaning when I say NA vs IGNORE is > independent of mask vs bitpattern. The way I'm trying to structure things, NA > vs IGNORE only affects the semantic meaning, i.e. the outputs produced by > computations. This is precisely why I put?'temporarily hidden with a mask' > first, to make that more clear. > ? >> IGNORE (Skip/Ignore) >> ? ? A placeholder which should be treated by computations as if no value > does >> ? ? or could exist there. For sums, this means act as if the value >> ? ? were zero, and for products, this means act as if the value were one. >> ? ? It's as if the array were compressed in some fashion to not include >> ? ? that element. > A data temporarily hidden by a mask becomes np.IGNORE. > Are you willing to suspend the idea of that implication for the purposes of the > present discussion? If not, do you see a way to amend things so that masked NAs > and bitpattern-based IGNOREs make sense? Would renaming IGNORE to SKIP be more > clear, perhaps? Yes, I was going to propose something similar. The NA/IGNORE is about the propagation mechanism, and this is not as explicit in NA as it is in IGNORE. So maybe, and avoiding too much concept renaming: NA (Propagate) ... IGNORE (Skip) ... Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From njs at pobox.com Wed Jul 6 21:27:13 2011 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Jul 2011 18:27:13 -0700 Subject: [Numpy-discussion] miniNEP1: where= argument for ufuncs In-Reply-To: <8762ne6hxj.fsf@fulla.xlab.taz> References: <8762ne6hxj.fsf@fulla.xlab.taz> Message-ID: On Wed, Jul 6, 2011 at 5:41 PM, Llu?s wrote: > Sorry, but I didn't find a way of inserting inline comments in the gist. I'm a little confused about how gists work, actually. For actual discussion, it's probably just as well, since this way everyone sees the comment on the list and has a chance to join the conversation... but I'd be just as happy if other people could just go in and edit it, and I'm not sure how that works. 
I'm happy to move to somewhere else if people have suggestions, this was just easiest. > Nathaniel Smith writes: > [...] >> Is there any less stupid-looking name than ``where1=`` and ``where2=`` >> for the ``.outer`` operation? (For that matter, can ``.outer`` be >> applied to more than 2 arrays? The docs say it can't, but it's >> perfectly well-defined for arbitrary number of arrays too, so maybe we >> want an interface that allows for 3-way, 4-way etc. ``.outer`` >> operations in the future?) > > Well, if outer can indeed be defined for an arbitrary number of arrays > (and if it's going to be sometime in the future), I'd say the simplest > is to use an array: > > ? ?.outer(a, b, ..., where = [my_where1, my_where2, ...]) Yeah, that's a much better idea... I've edited it to match. -- Nathaniel From njs at pobox.com Wed Jul 6 21:34:53 2011 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Jul 2011 18:34:53 -0700 Subject: [Numpy-discussion] miniNEP 2: NA support via special dtypes Message-ID: Well, everyone seems to like my first attempt at this so far, so I guess I'll really stick my foot in it now... here's my second miniNEP, which lays out a plan for handling dtype/bit-pattern-style NAs. I've stolen bits of text from both the NEP and the alterNEP for this, but since the focus is on nailing down the details, most of the content is new. There are many FIXME's noted, where some decisions or more work is needed... the idea here is to lay out some specifics, so we can figure out if the idea will work and get the details right. So feedback is *very* welcome! Master version: https://gist.github.com/1068264 Current version for commenting: ####################################### miniNEP 2: NA support via special dtypes ####################################### To try and make more progress on the whole missing values/masked arrays/... debate, it seems useful to have a more technical discussion of the pieces which we *can* agree on. This is the second, which attempts to nail down the details of how NAs can be implemented using special dtype's. ***************** Table of contents ***************** .. contents:: ********* Rationale ********* An ordinary value is something like an integer or a floating point number. A missing value is a placeholder for an ordinary value that is for some reason unavailable. For example, in working with statistical data, we often build tables in which each row represents one item, and each column represents properties of that item. For instance, we might take a group of people and for each one record height, age, education level, and income, and then stick these values into a table. But then we discover that our research assistant screwed up and forgot to record the age of one of our individuals. We could throw out the rest of their data as well, but this would be wasteful; even such an incomplete row is still perfectly usable for some analyses (e.g., we can compute the correlation of height and income). The traditional way to handle this would be to stick some particular meaningless value in for the missing data, e.g., recording this person's age as 0. But this is very error prone; we may later forget about these special values while running other analyses, and discover to our surprise that babies have higher incomes than teenagers. (In this case, the solution would be to just leave out all the items where we have no age recorded, but this isn't a general solution; many analyses require something more clever to handle missing values.) 
So instead of using an ordinary value like 0, we define a special "missing" value, written "NA" for "not available". There are several possible ways to represent such a value in memory. For instance, we could reserve a specific value (like 0, or a particular NaN, or the smallest negative integer) and then ensure that this value is treated specially by all arithmetic and other operations on our array. Another option would be to add an additional mask array next to our main array, use this to indicate which values should be treated as NA, and then extend our array operations to check this mask array whenever performing computations. Each implementation approach has various strengths and weaknesses, but here we focus on the former (value-based) approach exclusively and leave the possible addition of the latter to future discussion. The core advantages of this approach are (1) it adds no additional memory overhead, (2) it is straightforward to store and retrieve such arrays to disk using existing file storage formats, (3) it allows binary compatibility with R arrays including NA values, (4) it is compatible with the common practice of using NaN to indicate missingness when working with floating point numbers, (5) the dtype is already a place where `weird things can happen' -- there are a wide variety of dtypes that don't act like ordinary numbers (including structs, Python objects, fixed-length strings, ...), so code that accepts arbitrary numpy arrays already has to be prepared to handle these (even if only by checking for them and raising an error). Therefore adding yet more new dtypes has less impact on extension authors than if we change the ndarray object itself. The basic semantics of NA values are as follows. Like any other value, they must be supported by your array's dtype -- you can't store a floating point number in an array with dtype=int32, and you can't store an NA in it either. You need an array with dtype=NAint32 or something (exact syntax to be determined). Otherwise, NA values act exactly like any other values. In particular, you can apply arithmetic functions and so forth to them. By default, any function which takes an NA as an argument always returns an NA as well, regardless of the values of the other arguments. This ensures that if we try to compute the correlation of income with age, we will get "NA", meaning "given that some of the entries could be anything, the answer could be anything as well". This reminds us to spend a moment thinking about how we should rephrase our question to be more meaningful. And as a convenience for those times when you do decide that you just want the correlation between the known ages and income, then you can enable this behavior by adding a single argument to your function call. For floating point computations, NAs and NaNs have (almost?) identical behavior. But they represent different things -- NaN an invalid computation like 0/0, NA a value that is not available -- and distinguishing between these things is useful because in some situations they should be treated differently. (For example, an imputation procedure should replace NAs with imputed values, but probably should leave NaNs alone.) And anyway, we can't use NaNs for integers, or strings, or booleans, so we need NA anyway, and once we have NA support for all these types, we might as well support it for floating point too for consistency. 
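Since these NA dtypes do not exist yet, the intended semantics can only be mimicked today with NaN for floating point data; the analogy is rough, but it shows the propagate-by-default versus skip-on-request split::

    >>> import numpy as np
    >>> a = np.array([1.0, np.nan, 3.0])    # np.nan standing in for the proposed NA
    >>> a.sum()                             # propagates, like the default NA behaviour
    nan
    >>> np.nansum(a)                        # roughly what a skipna-style argument would give
    4.0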
**************** General strategy **************** Numpy already has a general mechanism for defining new dtypes and slotting them in so that they're supported by ndarrays, by the casting machinery, by ufuncs, and so on. In principle, we could implement NA-dtypes just using these existing interfaces. But we don't want to do that, because defining all those new ufunc loops etc. from scratch would be a huge hassle, especially since the basic functionality needed is the same in all cases. So we need some generic functionality for NAs -- but it would be better not to bake this in as a single set of special "NA types", since users may well want to define new custom dtypes that have their own NA values, and have them integrate well the rest of the NA machinery. Our strategy, therefore, is to avoid the `mid-layer mistake`_ by exposing some code for generic NA handling in different situations, which dtypes can selectively use or not as they choose. .. _mid-layer mistake: https://lwn.net/Articles/336262/ Some example use cases: 1. We want to define a dtype that acts exactly like an int32, except that the most negative value is treated as NA. 2. We want to define a parametrized dtype to represent `categorical data`_, and the bit-pattern to be used for NA depends on the number of categories defined, so our code needs to play an active role handling it rather than simply deferring to the standard machinery. 3. We want to define a dtype that acts like an length-10 string and supports NAs. Since our string may hold arbitrary binary values, we want to actually allocate 11 bytes for it, with the first byte a flag indicating whether this string is NA and the rest containing the string content. 4. We want to define a dtype that allows multiple different types of NA data, which print differently and can be distinguished by the new ufunc that we define called ``is_na_of_type(...)``, but otherwise takes advantage of the generic NA machinery for most operations. .. _categorical data: http://mail.scipy.org/pipermail/numpy-discussion/2010-August/052401.html **************************** dtype C-level API extensions **************************** The `PyArray_Descr`_ struct gains the following new fields:: void * NA_value; PyArray_Descr * NA_extends; int NA_extends_offset; .. _PyArray_Descr: http://docs.scipy.org/doc/numpy/reference/c-api.types-and-structures.html#PyArray_Descr The following new flag values are defined:: NPY_NA_SUPPORTED NPY_NA_AUTO_ARRFUNCS NPY_NA_AUTO_CAST NPY_NA_AUTO_UFUNC NPY_NA_AUTO_ALL /* the above flags OR'ed together */ The `PyArray_ArrFuncs`_ struct gains the following new fields:: void (*isna)(void * src, void * dst, npy_intp n, void * arr); void (*clearna)(void * data, npy_intp n, void * arr); .. _PyArray_ArrFuncs: http://docs.scipy.org/doc/numpy/reference/c-api.types-and-structures.html#PyArray_ArrFuncs The general idea is that anywhere where we used to call a dtype-specific function pointer, the code will be modified to instead: 1. Check for whether the relevant NPY_NA_AUTO_... bit is enabled, the NA_extends field is non-NULL, and the function pointer we wanted to call is NULL. 2. If these conditions are met, then use ``isna`` to identify which entries in the array are NA, and handle them appropriately. Then look up whatever function we were *going* to call using this dtype on the ``NA_extends`` dtype instead, and use that to handle the non-NA elements. For more specifics, see following sections. 
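The same dispatch order, transplanted into a runnable Python toy just to make the control flow concrete (every name below is invented for the illustration; the real code is C and uses the struct fields above)::

    import numpy as np

    NPY_NA_AUTO_ARRFUNCS = 0x1                       # toy flag value

    float_dtype = {"funcs": {"mymax": np.max}}       # plays the role of NA_extends
    na_float_dtype = {
        "funcs": {},                                 # no specialised loop provided...
        "flags": NPY_NA_AUTO_ARRFUNCS,               # ...but the AUTO bit is set
        "NA_extends": float_dtype,
        "isna": np.isnan,                            # NaN plays the NA bit pattern here
    }

    def call(dtype, name, data):
        func = dtype["funcs"].get(name)
        if func is None and dtype.get("flags", 0) & NPY_NA_AUTO_ARRFUNCS \
                and dtype.get("NA_extends") is not None:
            na = dtype["isna"](data)                 # step 1: locate the NA entries
            inner = dtype["NA_extends"]["funcs"][name]
            return inner(data[~na])                  # step 2: delegate the rest
        return func(data)

    print(call(na_float_dtype, "mymax", np.array([1.0, np.nan, 5.0])))   # prints 5.0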
Note that if ``NA_extends`` points to a parametrized dtype, then the dtype object it points to must be fully specified. For example, if it is a string dtype, it must have a non-zero ``elsize`` field. In order to handle the case where the NA information is stored in a field next to the `real' data, the ``NA_extends_offset`` field is set to a non-zero value; it must point to the location within each element of this dtype where some data of the ``NA_extends`` dtype is found. For example, if we are storing 10-byte strings with an NA indicator byte at the beginning, then we have:: elsize == 11 NA_extends_offset == 1 NA_extends->elsize == 10 When delegating to the ``NA_extends`` dtype, we offset our data pointer by ``NA_extends_offset`` (while keeping our strides the same) so that it sees an array of data of the expected type (plus some superfluous padding). This is basically the same mechanism that record dtypes use, IIUC, so it should be pretty well-tested. When delegating to a function that cannot handle `misbehaved' source data (see the ``PyArray_ArrFuncs`` documentation for details), then we need to check for alignment issues before delegating (especially with a non-zero ``NA_extends_offset``). If there's a problem, then we need to `clean up' the source data first, using the usual mechanisms for handling misaligned data. (Of course, we should usually set up our dtypes so that there aren't any alignment issues, but if someone screws that up, or decides that reduced memory usage is more important to them than fast inner loops, then we should still handle that gracefully, as we do now.) The ``NA_value`` and ``clearna`` fields are used for various sorts of casting. ``NA_value`` is a bit-pattern to be used when, for example, assigning from np.NA. ``clearna`` can be a no-op if ``elsize`` and ``NA_extends->elsize`` are the same, but if they aren't then it should clear whatever auxiliary NA storage this dtype uses, so that none of the specified array elements are NA. -------------------- Core dtype functions -------------------- The following functions are defined in ``PyArray_ArrFuncs``. The special behavior described here is enabled by the NPY_NA_AUTO_ARRFUNCS bit in the dtype flags, and only enabled if the given function field is *not* filled in. ``getitem``: Calls ``isna``. If ``isna`` returns true, returns np.NA. Otherwise, delegates to the ``NA_extends`` dtype. ``setitem``: If the input object is ``np.NA``, then runs ``memcpy(data, self->NA_value, arr->dtype->elsize);''. Otherwise, calls ``clearna``, and then delegates to the ``NA_extends`` dtype. ``copyswapn``, ``copyswap``: FIXME: Not sure whether there's any special handling to use for these? ``compare``: FIXME: how should this handle NAs? R's sort function *discards* NAs, which doesn't seem like a good option. ``argmax``: FIXME: what is this used for? If it's the underlying implementation for np.max, then it really needs some way to get a skipna argument. If not, then the appropriate semantics depends on what it's supposed to accomplish... ``dotfunc``: QUESTION: is it actually guaranteed that everything has the same dtype? FIXME: same issues as for ``argmax``. ``scanfunc``: This one's ugly. We may have to explicitly override it in all of our special dtypes, because assuming that we want the option of, say, having the token "NA" represent an NA value in a text file, we need some way to check whether that's there before delegating.
But ``ungetc`` is only guaranteed to let us put back 1 character, and we need 2 (or maybe 3 if we actually check for "NA "). The other option would be to read to the next delimiter, check whether we have an NA, and if not then delegate to ``fromstr`` instead of ``scanfunc``, but according to the current API, each dtype might in principle use a totally different rule for defining `the next delimiter'. So... any ideas? (FIXME) ``fromstr``: Easy -- check for "NA ", if present then assign ``NA_value``, otherwise call ``clearna`` and delegate. ``nonzero``: FIXME: again, what is this used for? (It seems redundant with using the casting machinery to cast to bool.) Probably it needs to be modified so that it can return NA, though... ``fill``: Use ``isna`` to check if either of the first two values is NA. If so, then fill the rest of the array with ``NA_value``. Otherwise, call ``clearna`` and then delegate. ``fillwithvalue``: Guess this can just delegate? ``sort``, ``argsort``: These should probably arrange to sort NAs to a particular place in the array (either the front or the back -- any opinions?) ``scalarkind``: FIXME: I have no idea what this does. ``castdict``, ``cancastscalarkindto``, ``cancastto``: See section on casting below. ------- Casting ------- FIXME: this really needs attention from an expert on numpy's casting rules. But I can't seem to find the docs that explain how casting loops are looked up and decided between (e.g., if you're casting from dtype A to dtype B, which dtype's loops are used?), so I can't go into details. But those details are tricky and they matter... But the general idea is, if you have a dtype with ``NPY_NA_AUTO_CAST`` set, then the following conversions are automatically allowed: * Casting from the underlying type to the NA-type: this is performed by the usual ``clearna`` + potentially-strided copy dance. Also, ``isna`` is called to check that none of the regular values have been accidentally converted into NA; if so, then an error is raised. * Casting from the NA-type to the underlying type: allowed in principle, but if ``isna`` returns true for any of the values that are to be converted, then again, an error is raised. (If you want to get around this, use ``np.view(array_with_NAs, dtype=float)``.) * Casting between the NA-type and other types that do not support NA: this is allowed if the underlying type is allowed to cast to the other type, and is performed by combining a cast to or from the underlying type (using the above rules) with a cast to or from the other type (using the underlying type's rules). * Casting between the NA-type and other types that do support NA: if the other type has NPY_NA_AUTO_CAST set, then we use the above rules plus the usual dance with ``isna`` on one array being converted to ``NA_value`` elements in the other. If only one of the arrays has NPY_NA_AUTO_CAST set, then it's assumed that that dtype knows what it's doing, and we don't do any magic. (But this is one of the things that I'm not sure makes sense, as per my caveat above.) ------ Ufuncs ------ All ufuncs gain an additional optional keyword argument, ``skipNA=``, which defaults to False. If ``skipNA == True``, then the ufunc machinery *unconditionally* calls ``isna``, and then acts as if any values for which isna returns True were masked out in the ``where=`` argument (see miniNEP 1 for the behavior of ``where=``). 
If a ``where=`` argument is also given, then it acts as if the ``isna`` values had been ANDed out of the ``where=`` mask, though it does not actually modify the mask. Unlike the other changes below, this is performed *unconditionally* for any dtype which has an ``isna`` function defined; the NPY_NA_AUTO_UFUNC flag is *not* checked. If NPY_NA_AUTO_UFUNC is set, then ufunc loop lookup is modified so that whenever it checks for the existence of a loop on the current dtype, and does not find one, then it also checks for a loop on the ``NA_extends`` dtype. If that loop is found, then it uses it in the normal way, with the exceptions that (1) it is only called for values which are not NA according to ``isna``, (2) if the output array has NPY_NA_AUTO_UFUNC set, then ``clearna`` is called on it before calling the ufunc loop, (3) pointer offsets are adjusted by ``NA_extends_offset`` before calling the ufunc loop. FIXME: We should go into more detail here about how NPY_NA_AUTO_UFUNC works when there are multiple input arrays, of which potentially some have the flag set and some do not. -------- Printing -------- FIXME: There should be some sort of mechanism by which values which are NA are automatically repr'ed as NA, but I don't really understand how numpy printing works, so I'll let someone else fill in this section. -------- Indexing -------- Scalar indexing like ``a[12]`` goes via the ``getitem`` function, so according to the proposal as described above, if a dtype delegates ``getitem``, then scalar indexing on NAs will return the object ``np.NA``. (If it doesn't delegate ``getitem``, of course, then it can return whatever it wants.) This seems like the simplest approach, but an alternative would be to add a special case to scalar indexing, where if an ``NPY_NA_AUTO_INDEX`` flag were set, then it would call ``isna`` on the specified element. If this returned false, it would call ``getitem`` as usual; otherwise, it would return a 0-d array containing the specified element. The problem with this is that it breaks expressions like ``if a[i] is np.NA: ...``. (Of course, there is nothing nearly so convenient as that for NaN values now, but then, NaN values don't have their own global singleton.) So for now we stick to scalar indexing just returning ``np.NA``, but this can be revisited if anyone objects. ********************************* Python API for generic NA support ********************************* NumPy will gain a global singleton called numpy.NA, similar to None, but with semantics reflecting its status as a missing value. In particular, trying to treat it as a boolean will raise an exception, and comparisons with it will produce numpy.NA instead of True or False. These basics are adopted from the behavior of the NA value in the R project. To dig deeper into the ideas, http://en.wikipedia.org/wiki/Ternary_logic#Kleene_logic provides a starting point. Most operations on ``np.NA`` (e.g., ``__add__``, ``__mul__``) are overridden to unconditionally return ``np.NA``. The automagic dtype detection used for expressions like ``np.asarray([1, 2, 3])``, ``np.asarray([1.0, 2.0, 3.0])`` will be extended to recognize the ``np.NA`` value, and use it to automatically switch to a built-in NA-enabled dtype (which one being determined by the other elements in the array). A simple ``np.asarray([np.NA])`` will use an NA-enabled float64 dtype (which is analogous to what you get from ``np.asarray([])``).
Note that this means that expressions like ``np.log(np.NA)`` will work: first ``np.NA`` will be coerced to a 0-d NA-float array, and then ``np.log`` will be called on that. Python-level dtype objects gain the following new fields:: NA_supported NA_value ``NA_supported`` is a boolean which simply exposes the value of the ``NPY_NA_SUPPORTED`` flag; it should be true if this dtype allows for NAs, false otherwise. [FIXME: would it be better to just key this off the existence of the ``isna`` function? Even if a dtype decides to implement all other NA handling itself, it still has to define ``isna`` in order to make ``skipNA=`` work correctly.] ``NA_value`` is a 0-d array of the given dtype, and its sole element contains the same bit-pattern as the dtype's underlying ``NA_value`` field. This makes it possible to determine the default bit-pattern for NA values for this type (e.g., with ``np.view(mydtype.NA_value, dtype=int8)``). We *do not* expose the ``NA_extends`` and ``NA_extends_offset`` values at the Python level, at least for now; they're considered an implementation detail (and it's easier to expose them later if they're needed than to unexpose them if they aren't). Two new ufuncs are defined: ``np.isNA`` returns a logical array, with true values wherever the dtype's ``isna`` function returned true. ``np.isnumber`` is only defined for numeric dtypes, and returns True for all elements which are not NA, and for which ``np.isfinite`` would return True. ***************** Builtin NA dtypes ***************** The above describes the generic machinery for NA support in dtypes. It's flexible enough to handle all sorts of situations, but we also want to define a few generally useful NA-supporting dtypes that are available by default. For each built-in dtype, we define an associated NA-supporting dtype, as follows:: floats: the associated dtype uses a specific NaN bit-pattern to indicate NA (chosen for R compatibility) complex: we do whatever R does (FIXME: look this up -- two NA floats, probably?) signed integers: the most-negative signed value is used as NA (chosen for R compatibility) unsigned integers: the most-positive value is used as NA (no R compatibility possible). strings: the first byte (or, in the case of unicode strings, first 4 bytes) is used as a flag to indicate NA, and the rest of the data gives the actual string. (no R compatibility possible) objects: Two options (FIXME): either we don't include an NA-ful version, or we use np.NA as the NA bit pattern. boolean: we do whatever R does (FIXME: look this up -- 0 == FALSE, 1 == TRUE, 2 == NA?) Each of these dtypes is trivially defined using the above machinery, and are what are automatically used by the automagic type inference machinery (for ``np.asarray([True, np.NA, False])``, etc.). They can also be accessed via a new function ``np.withNA``, which takes a regular dtype (or an object that can be coerced to a dtype, like 'float') and returns one of the above dtypes. Ideally ``withNA`` should also take some optional arguments that let you describe which values you want to count as NA, etc., but I'll leave that for a future draft (FIXME). FIXME: If ``d`` is one of the above dtypes, then what should ``d.type`` return? The NEP also contains a proposal for a somewhat elaborate domain-specific-language for describing NA dtypes. I'm not sure how great an idea that is.
(I have a bias against using strings as data structures, and find the already existing strings confusing enough as it is -- also, apparently the NEP version of numpy uses strings like 'f8' when printing dtypes, while my numpy uses object names like 'float64', so I'm not sure what's going on there. ``withNA(float64, arg1=value1)`` seems like a more pleasant way to print a dtype than "NA[f8,value1]", at least to me.) But if people want it, then cool. -------------- Type hierarchy -------------- FIXME: how should we do subtype checks, etc., for NA dtypes? What does ``issubdtype(withNA(float), float)`` return? How about ``issubdtype(withNA(float), np.floating)``? ------------- Serialization ------------- FIXME: How are dtypes stored in .npz or pickle files? Do we need to do anything special here? From jsseabold at gmail.com Wed Jul 6 21:43:25 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 6 Jul 2011 21:43:25 -0400 Subject: [Numpy-discussion] NA/Missing Data Conference Call Summary In-Reply-To: References: <4E14AB8A.90707@noaa.gov> Message-ID: On Wed, Jul 6, 2011 at 7:14 PM, Christopher Jordan-Squire wrote: > On Wed, Jul 6, 2011 at 3:47 PM, wrote: >> On Wed, Jul 6, 2011 at 4:38 PM, ? wrote: >> > On Wed, Jul 6, 2011 at 4:22 PM, Christopher Jordan-Squire >> >> Mean value replacement, or more generally single scalar value >> >> replacement, >> >> is generally not a good idea. It biases downward your standard error >> >> estimates if you use mean replacement, and it will bias both if you use >> >> anything other than mean replacement. The bias is gets worse with more >> >> missing data. So it's worst in the precisely the cases where you'd want >> >> to >> >> fill in the data the most. (Though I admit I'm not too familiar with >> >> time >> >> series, so maybe this doesn't apply. But it's true as a general >> >> principle in >> >> statistics.) I'm not sure why we'd want to make this use case easier. >> >> Another qualification on this (I cannot help it). >> I think this only applies if you use a prefabricated no-missing-values >> algorithm. If I write it myself, I can do the proper correction for >> the reduced number of observations. (similar to the case when we >> ignore correlated information and use statistics based on uncorrelated >> observations which also overestimate the amount of information we have >> available.) >> > > Can you do that sort of technique with longitudinal (panel) data? I'm > honestly curious because I haven't looked into such corrections before. I > haven't been able to find a reference after a few quick google searches. I > don't suppose you know one off the top of your head? > And you're right about the last measurement carried forward. I was just > thinking about filling in all missing values with the same value. > -Chris Jordan-Squire > PS--Thanks for mentioning the statsmodels discussion. I'd been keeping track > of that on a different email account, and I haven't realized it wasn't > forwarding those messages correctly. > Maybe a bit OT, but I've seen people doing imputation using Bayesian MCMC or multiple imputation for missing values in panel data. Google 'data augmentation' or 'multiple imputation'. I haven't looked much into the details yet, but it's definitely not mean replacement. 
FWIW (I haven't been following closely the discussion), there is a distinction in statistics between ignorable and nonignorable missing data, but I can't think of a situation where I would need this at the computational level rather than relying on a (numerically comparable) missing data type(s) a la SAS/Stata. I've also found the odd examples of IGNORE without a clear answer to be scary. Skipper From charlesr.harris at gmail.com Wed Jul 6 22:01:34 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 6 Jul 2011 20:01:34 -0600 Subject: [Numpy-discussion] miniNEP 2: NA support via special dtypes In-Reply-To: References: Message-ID: On Wed, Jul 6, 2011 at 7:34 PM, Nathaniel Smith wrote: > Well, everyone seems to like my first attempt at this so far, so I > guess I'll really stick my foot in it now... here's my second miniNEP, > which lays out a plan for handling dtype/bit-pattern-style NAs. I've > stolen bits of text from both the NEP and the alterNEP for this, but > since the focus is on nailing down the details, most of the content is > new. > > There are many FIXME's noted, where some decisions or more work is > needed... the idea here is to lay out some specifics, so we can figure > out if the idea will work and get the details right. So feedback is > *very* welcome! > > Master version: > https://gist.github.com/1068264 > > Current version for commenting: > > ####################################### > miniNEP 2: NA support via special dtypes > ####################################### > > To try and make more progress on the whole missing values/masked > arrays/... debate, it seems useful to have a more technical discussion > of the pieces which we *can* agree on. This is the second, which > attempts to nail down the details of how NAs can be implemented using > special dtype's. > > ***************** > Table of contents > ***************** > > .. contents:: > > ********* > Rationale > ********* > > An ordinary value is something like an integer or a floating point > number. A missing value is a placeholder for an ordinary value that is > for some reason unavailable. For example, in working with statistical > data, we often build tables in which each row represents one item, and > each column represents properties of that item. For instance, we might > take a group of people and for each one record height, age, education > level, and income, and then stick these values into a table. But then > we discover that our research assistant screwed up and forgot to > record the age of one of our individuals. We could throw out the rest > of their data as well, but this would be wasteful; even such an > incomplete row is still perfectly usable for some analyses (e.g., we > can compute the correlation of height and income). The traditional way > to handle this would be to stick some particular meaningless value in > for the missing data, e.g., recording this person's age as 0. But this > is very error prone; we may later forget about these special values > while running other analyses, and discover to our surprise that babies > have higher incomes than teenagers. (In this case, the solution would > be to just leave out all the items where we have no age recorded, but > this isn't a general solution; many analyses require something more > clever to handle missing values.) So instead of using an ordinary > value like 0, we define a special "missing" value, written "NA" for > "not available". > > There are several possible ways to represent such a value in memory. 
> For instance, we could reserve a specific value (like 0, or a > particular NaN, or the smallest negative integer) and then ensure that > this value is treated specially by all arithmetic and other operations > on our array. Another option would be to add an additional mask array > next to our main array, use this to indicate which values should be > treated as NA, and then extend our array operations to check this mask > array whenever performing computations. Each implementation approach > has various strengths and weaknesses, but here we focus on the former > (value-based) approach exclusively and leave the possible addition of > the latter to future discussion. The core advantages of this approach > are (1) it adds no additional memory overhead, (2) it is > straightforward to store and retrieve such arrays to disk using > existing file storage formats, (3) it allows binary compatibility with > R arrays including NA values, (4) it is compatible with the common > practice of using NaN to indicate missingness when working with > floating point numbers, (5) the dtype is already a place where `weird > things can happen' -- there are a wide variety of dtypes that don't > act like ordinary numbers (including structs, Python objects, > fixed-length strings, ...), so code that accepts arbitrary numpy > arrays already has to be prepared to handle these (even if only by > checking for them and raising an error). Therefore adding yet more new > dtypes has less impact on extension authors than if we change the > ndarray object itself. > > The basic semantics of NA values are as follows. Like any other value, > they must be supported by your array's dtype -- you can't store a > floating point number in an array with dtype=int32, and you can't > store an NA in it either. You need an array with dtype=NAint32 or > something (exact syntax to be determined). Otherwise, NA values act > exactly like any other values. In particular, you can apply arithmetic > functions and so forth to them. By default, any function which takes > an NA as an argument always returns an NA as well, regardless of the > values of the other arguments. This ensures that if we try to compute > the correlation of income with age, we will get "NA", meaning "given > that some of the entries could be anything, the answer could be > anything as well". This reminds us to spend a moment thinking about > how we should rephrase our question to be more meaningful. And as a > convenience for those times when you do decide that you just want the > correlation between the known ages and income, then you can enable > this behavior by adding a single argument to your function call. > > For floating point computations, NAs and NaNs have (almost?) identical > behavior. But they represent different things -- NaN an invalid > computation like 0/0, NA a value that is not available -- and > distinguishing between these things is useful because in some > situations they should be treated differently. (For example, an > imputation procedure should replace NAs with imputed values, but > probably should leave NaNs alone.) And anyway, we can't use NaNs for > integers, or strings, or booleans, so we need NA anyway, and once we > have NA support for all these types, we might as well support it for > floating point too for consistency. 
> > **************** > General strategy > **************** > > Numpy already has a general mechanism for defining new dtypes and > slotting them in so that they're supported by ndarrays, by the casting > machinery, by ufuncs, and so on. In principle, we could implement > Well, actually not in any useful sense, take a look at what Mark went through for the half floats. There is a reason the NEP went with parametrized dtypes and masks. But we would sure welcome a plan and code to make it true, it is one of the areas that could really use improvement. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Jul 6 22:09:50 2011 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Jul 2011 19:09:50 -0700 Subject: [Numpy-discussion] miniNEP 2: NA support via special dtypes In-Reply-To: References: Message-ID: On Wed, Jul 6, 2011 at 7:01 PM, Charles R Harris wrote: >> Numpy already has a general mechanism for defining new dtypes and >> slotting them in so that they're supported by ndarrays, by the casting >> machinery, by ufuncs, and so on. In principle, we could implement > > Well, actually not in any useful sense, take a look at what Mark went > through for the half floats. There is a reason the NEP went with > parametrized dtypes and masks. But we would sure welcome a plan and code to > make it true, it is one of the areas that could really use improvement. Err, yes, that's basically what the next few sentences say? This is basically a draft spec for implementing the parametrized dtypes idea. -- Nathaniel From charlesr.harris at gmail.com Wed Jul 6 22:34:56 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 6 Jul 2011 20:34:56 -0600 Subject: [Numpy-discussion] miniNEP 2: NA support via special dtypes In-Reply-To: References: Message-ID: On Wed, Jul 6, 2011 at 8:09 PM, Nathaniel Smith wrote: > On Wed, Jul 6, 2011 at 7:01 PM, Charles R Harris > wrote: > >> Numpy already has a general mechanism for defining new dtypes and > >> slotting them in so that they're supported by ndarrays, by the casting > >> machinery, by ufuncs, and so on. In principle, we could implement > > > > Well, actually not in any useful sense, take a look at what Mark went > > through for the half floats. There is a reason the NEP went with > > parametrized dtypes and masks. But we would sure welcome a plan and code > to > > make it true, it is one of the areas that could really use improvement. > > Err, yes, that's basically what the next few sentences say? > > This is basically a draft spec for implementing the parametrized dtypes > idea. > > And yet: FIXME: this really needs attention from an expert on numpy's casting rules. But I can't seem to find the docs that explain how casting loops are looked up and decided between (e.g., if you're casting from dtype A to dtype B, which dtype's loops are used?), so I can't go into details. But those details are tricky and they matter... There is also a reason that masks were chosen to be implemented first. The numpy code is freely available and there is no reason not to make experiments or help Mark get some of the current problems solved, it doesn't need to be a one man effort and your feedback will have a lot more impact if you are in the trenches. In particular, I think there is a good deal of work that will need to be done for the sorts, argmax, and the other functions you mention that would give you a good idea of what was involved and how to go about implementing your ideas. 
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Jul 6 23:26:52 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 6 Jul 2011 21:26:52 -0600 Subject: [Numpy-discussion] miniNEP 2: NA support via special dtypes In-Reply-To: References: Message-ID: On Wed, Jul 6, 2011 at 8:34 PM, Charles R Harris wrote: > > > On Wed, Jul 6, 2011 at 8:09 PM, Nathaniel Smith wrote: > >> On Wed, Jul 6, 2011 at 7:01 PM, Charles R Harris >> wrote: >> >> Numpy already has a general mechanism for defining new dtypes and >> >> slotting them in so that they're supported by ndarrays, by the casting >> >> machinery, by ufuncs, and so on. In principle, we could implement >> > >> > Well, actually not in any useful sense, take a look at what Mark went >> > through for the half floats. There is a reason the NEP went with >> > parametrized dtypes and masks. But we would sure welcome a plan and code >> to >> > make it true, it is one of the areas that could really use improvement. >> >> Err, yes, that's basically what the next few sentences say? >> >> This is basically a draft spec for implementing the parametrized dtypes >> idea. >> >> And yet: > > > FIXME: this really needs attention from an expert on numpy's casting > rules. But I can't seem to find the docs that explain how casting > loops are looked up and decided between (e.g., if you're casting from > dtype A to dtype B, which dtype's loops are used?), so I can't go into > details. But those details are tricky and they matter... > > There is also a reason that masks were chosen to be implemented first. The > numpy code is freely available and there is no reason not to make > experiments or help Mark get some of the current problems solved, it doesn't > need to be a one man effort and your feedback will have a lot more impact if > you are in the trenches. In particular, I think there is a good deal of work > that will need to be done for the sorts, argmax, and the other functions you > mention that would give you a good idea of what was involved and how to go > about implementing your ideas. > > Let me lay out a bit more how I see things developing at this point, and bear in mind that I am not a psychic so this is just a guess ;) Mark is going to work at Enthought for maybe 3-4 more weeks and then return to school. Mark is very good, but that is still a very tough schedule and all the things in the NEP may not get finished, let alone all the supporting work that will be needed around the core implementation. After that what Mark does in his spare time is up to him. I expect there will be another numpy release sometime in the Fall, maybe around Nov/Dec, to get the new features, especially the datetime work, out there. At that point the interface is semi-fixed. I like to think that new features should be regarded as experimental for at least one release cycle but that is certainly not official Numpy policy. In any case there is likely going to be a gap of several months where the rate of commits slows down and other folks, if they are interested, have a real opportunity to get involved. After the projected Fall release I see maybe another six months to make changes/extensions to the interface, and this is where new ideas can get worked out, but there needs to be someone with the interest and skill to implement those ideas for that to happen. If no such person shows up, then the interface will be what it is until there is such a person with an interest in carrying things forward. 
But at that point they will need take care to maintain backward compatibility unless pretty much everyone agrees that the then current interface is a disaster. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Jul 6 23:32:34 2011 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 6 Jul 2011 20:32:34 -0700 Subject: [Numpy-discussion] miniNEP 2: NA support via special dtypes In-Reply-To: References: Message-ID: On Wed, Jul 6, 2011 at 7:34 PM, Charles R Harris wrote: > > > On Wed, Jul 6, 2011 at 8:09 PM, Nathaniel Smith wrote: >> >> On Wed, Jul 6, 2011 at 7:01 PM, Charles R Harris >> wrote: >> >> Numpy already has a general mechanism for defining new dtypes and >> >> slotting them in so that they're supported by ndarrays, by the casting >> >> machinery, by ufuncs, and so on. In principle, we could implement >> > >> > Well, actually not in any useful sense, take a look at what Mark went >> > through for the half floats. There is a reason the NEP went with >> > parametrized dtypes and masks. But we would sure welcome a plan and code >> > to >> > make it true, it is one of the areas that could really use improvement. >> >> Err, yes, that's basically what the next few sentences say? >> >> This is basically a draft spec for implementing the parametrized dtypes >> idea. >> > And yet: > > FIXME: this really needs attention from an expert on numpy's casting > rules. But I can't seem to find the docs that explain how casting > loops are looked up and decided between (e.g., if you're casting from > dtype A to dtype B, which dtype's loops are used?), so I can't go into > details. But those details are tricky and they matter... > > There is also a reason that masks were chosen to be implemented first. The > numpy code is freely available and there is no reason not to make > experiments or help Mark get some of the current problems solved, it doesn't > need to be a one man effort and your feedback will have a lot more impact if > you are in the trenches. In particular, I think there is a good deal of work > that will need to be done for the sorts, argmax, and the other functions you > mention that would give you a good idea of what was involved and how to go > about implementing your ideas. Hi Chuck, My goal in posting this was to try to find a way for those of us who disagree to still be productive together. If you'd like to help with that in a constructive way, then please do, but otherwise, can I ask in a polite and well-meaning way that you butt out? Scolding me for not getting "in the trenches" is not helpful. People like Wes and Matthew and I have been "in the trenches" for years building up numpy as a viable platform for statistical computing. (I can't claim that my efforts compare to theirs, but see for instance [1], which is an improved version of R's formula support, one of the other key advantages it has over Python. It works, so I'd have written some docs and released it by now, except I'm defending my PhD in 4 weeks, so, well, you know.) Yes, there are some details missing from the spec I wrote up in a few hours this afternoon, but how about we solve them? There are plenty of people on this list who know more than me, or Mark, or any one of any of us. This problem is complicated, but not *that* complicated. So, you know, let's do this. And maybe that way, in a month, we'll have something that we all actually like, even if it doesn't do everything that we want. 
-- Nathaniel [1] https://github.com/charlton/charlton From Chris.Barker at noaa.gov Thu Jul 7 01:51:36 2011 From: Chris.Barker at noaa.gov (Chris Barker) Date: Wed, 06 Jul 2011 22:51:36 -0700 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: References: <4E14A893.7040407@noaa.gov> Message-ID: <4E154968.3070004@noaa.gov> On 7/6/11 11:57 AM, Mark Wiebe wrote: > On Wed, Jul 6, 2011 at 1:25 PM, Christopher Barker > Is this really true? if you use a bitpattern for IGNORE, haven't you > just lost the ability to get the original value back if you want to stop > ignoring it? Maybe that's not inherent to what an IGNORE means, but it > seems pretty key to me. > > What do you think of renaming IGNORE to SKIP? This isn't a semantics issue -- IGNORE is fine. What I'm getting at is that we need a word (and code) for: "ignore for now, but I might want to use it later" - Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From efiring at hawaii.edu Thu Jul 7 02:46:37 2011 From: efiring at hawaii.edu (Eric Firing) Date: Wed, 06 Jul 2011 20:46:37 -1000 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: <4E154968.3070004@noaa.gov> References: <4E14A893.7040407@noaa.gov> <4E154968.3070004@noaa.gov> Message-ID: <4E15564D.7000009@hawaii.edu> On 07/06/2011 07:51 PM, Chris Barker wrote: > On 7/6/11 11:57 AM, Mark Wiebe wrote: >> On Wed, Jul 6, 2011 at 1:25 PM, Christopher Barker > >> Is this really true? if you use a bitpattern for IGNORE, haven't you >> just lost the ability to get the original value back if you want to stop >> ignoring it? Maybe that's not inherent to what an IGNORE means, but it >> seems pretty key to me. >> >> What do you think of renaming IGNORE to SKIP? > > This isn't a semantics issue -- IGNORE is fine. > > What I'm getting at is that we need a word (and code) for: > > "ignore for now, but I might want to use it later" HIDE? That implies there is still something there, potentially recoverable. Eric > > - Chris > > > > From wkerzendorf at googlemail.com Thu Jul 7 03:51:28 2011 From: wkerzendorf at googlemail.com (Wolfgang Kerzendorf) Date: Thu, 07 Jul 2011 09:51:28 +0200 Subject: [Numpy-discussion] reading in files with fixed with format Message-ID: <4E156580.6010309@gmail.com> Dear all, I have a couple of data files that were written with fortran at a fixed with. That means its tabular data which might not have spaces (it is just specified how many characters each field has and what type it is). Is there anything to read that with scipy and or numpy? Cheers Wolfgang From miguel.deval at gmail.com Thu Jul 7 04:51:40 2011 From: miguel.deval at gmail.com (Miguel de Val-Borro) Date: Thu, 7 Jul 2011 10:51:40 +0200 Subject: [Numpy-discussion] reading in files with fixed with format In-Reply-To: <4E156580.6010309@gmail.com> References: <4E156580.6010309@gmail.com> Message-ID: <20110707085140.GB26058@poincare.pc.linmpi.mpg.de> The function numpy.genfromtxt reads text files into arrays. 
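For instance, a minimal sketch (the field widths and sample records here are made up) looks like this:

import numpy as np
from io import BytesIO

# Three fixed-width fields: 3, 6 and 3 characters wide.
data = BytesIO(b"  1 12.50abc\n  2  3.75def\n")
arr = np.genfromtxt(data, delimiter=(3, 6, 3),
                    dtype=[('id', int), ('value', float), ('tag', 'S3')])
# arr['value'] -> array([ 12.5 ,  3.75])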
There is an example on how to deal with fixed-width columns using the delimiter argument in the docstring and in the I/O chapter of the user guide: http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html#the-delimiter-argument Miguel On Thu, Jul 07, 2011 at 09:51:28AM +0200, Wolfgang Kerzendorf wrote: > Dear all, > > I have a couple of data files that were written with fortran at a fixed > with. That means its tabular data which might not have spaces (it is > just specified how many characters each field has and what type it is). > Is there anything to read that with scipy and or numpy? > > Cheers > Wolfgang > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pgmdevlist at gmail.com Thu Jul 7 05:14:23 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 7 Jul 2011 11:14:23 +0200 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: <4E15564D.7000009@hawaii.edu> References: <4E14A893.7040407@noaa.gov> <4E154968.3070004@noaa.gov> <4E15564D.7000009@hawaii.edu> Message-ID: <73F1F597-5E7F-49C4-B7AF-AC4D8E0592D4@gmail.com> On Jul 7, 2011, at 8:46 AM, Eric Firing wrote: > On 07/06/2011 07:51 PM, Chris Barker wrote: >> On 7/6/11 11:57 AM, Mark Wiebe wrote: >>> On Wed, Jul 6, 2011 at 1:25 PM, Christopher Barker >> >>> Is this really true? if you use a bitpattern for IGNORE, haven't you >>> just lost the ability to get the original value back if you want to stop >>> ignoring it? Maybe that's not inherent to what an IGNORE means, but it >>> seems pretty key to me. >>> >>> What do you think of renaming IGNORE to SKIP? >> >> This isn't a semantics issue -- IGNORE is fine. >> >> What I'm getting at is that we need a word (and code) for: >> >> "ignore for now, but I might want to use it later" > > HIDE? That implies there is still something there, potentially recoverable. > > Eric +1 From d.s.seljebotn at astro.uio.no Thu Jul 7 05:15:33 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Thu, 07 Jul 2011 11:15:33 +0200 Subject: [Numpy-discussion] using the same vocabulary for missing value ideas In-Reply-To: <4E154968.3070004@noaa.gov> References: <4E14A893.7040407@noaa.gov> <4E154968.3070004@noaa.gov> Message-ID: <4E157935.9080707@astro.uio.no> On 07/07/2011 07:51 AM, Chris Barker wrote: > On 7/6/11 11:57 AM, Mark Wiebe wrote: >> On Wed, Jul 6, 2011 at 1:25 PM, Christopher Barker > >> Is this really true? if you use a bitpattern for IGNORE, haven't you >> just lost the ability to get the original value back if you want to stop >> ignoring it? Maybe that's not inherent to what an IGNORE means, but it >> seems pretty key to me. >> >> What do you think of renaming IGNORE to SKIP? > > This isn't a semantics issue -- IGNORE is fine. > > What I'm getting at is that we need a word (and code) for: > > "ignore for now, but I might want to use it later" Wouldn't that be IGNORE+MASK? There's (IGNORE, NA), and (MASK, BITPATTERN), with four combinations: IGNORE+MASK: "ignore for now, but I might want to use it later" NA+MASK: "treat as NA for now, but I might change my mind about that later" [1] IGNORE+BITPATTERN: Simply insert a value in an array that is 0 for addition and 1 for multiplication. NA+BITPATTERN: R's NA. [1] Example on NA+MASK: Temporarily flag something as an invalid outlier to check what effect that has on final estimates.
The statistical method one is using may do something different with NA data (beyond what IGNORE does), you may not know exactly what it does, just that the docs say "support NA's gracefully" and that you temporarily want to flag some outliers as such when calling that function. Dag Sverre From wkerzendorf at googlemail.com Thu Jul 7 05:22:17 2011 From: wkerzendorf at googlemail.com (Wolfgang Kerzendorf) Date: Thu, 07 Jul 2011 11:22:17 +0200 Subject: [Numpy-discussion] reading in files with fixed with format In-Reply-To: <20110707085140.GB26058@poincare.pc.linmpi.mpg.de> References: <4E156580.6010309@gmail.com> <20110707085140.GB26058@poincare.pc.linmpi.mpg.de> Message-ID: <4E157AC9.3020504@gmail.com> Thanks. That is exactly what I need. On 7/07/11 10:51 AM, Miguel de Val-Borro wrote: > The function numpy.genfromtxt reads text files into arrays. There is an > example on how to deal with fixed-width columns using the delimiter > argument in the docstring and in the I/O chapter of the user guide: > http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html#the-delimiter-argument > > Miguel > > On Thu, Jul 07, 2011 at 09:51:28AM +0200, Wolfgang Kerzendorf wrote: >> Dear all, >> >> I have a couple of data files that were written with fortran at a fixed >> with. That means its tabular data which might not have spaces (it is >> just specified how many characters each field has and what type it is). >> Is there anything to read that with scipy and or numpy? >> >> Cheers >> Wolfgang >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jeffspencerd at gmail.com Thu Jul 7 06:23:46 2011 From: jeffspencerd at gmail.com (Jeffrey Spencer) Date: Thu, 07 Jul 2011 20:23:46 +1000 Subject: [Numpy-discussion] Compiling 2.0.0.dev-3071eab version on Red-Hat Error with -lf77blas Message-ID: <4E158932.3090409@gmail.com> The error is below: creating build/temp.linux-x86_64-2.6/numpy/core/blasdot compile options: '-DATLAS_INFO="\"None\"" -Inumpy/core/blasdot -Inumpy/core/include -Ibuild/src.linux-x86_64-2.6/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/home/jspender/include/python2.6 -Ibuild/src.linux-x86_64-2.6/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-2.6/numpy/core/src/umath -c' gcc: numpy/core/blasdot/_dotblas.c numpy/core/blasdot/_dotblas.c: In function 'dotblas_matrixproduct': numpy/core/blasdot/_dotblas.c:239: warning: comparison of distinct pointer types lacks a cast numpy/core/blasdot/_dotblas.c:257: warning: passing argument 3 of '*(PyArray_API + 2240u)' from incompatible pointer type numpy/core/blasdot/_dotblas.c:292: warning: passing argument 3 of '*(PyArray_API + 2240u)'
from incompatible pointer type gcc -pthread -shared build/temp.linux-x86_64-2.6/numpy/core/blasdot/_dotblas.o -L/usr/local/lib -Lbuild/temp.linux-x86_64-2.6 -lf77blas -lcblas -latlas -o build/lib.linux-x86_64-2.6/numpy/core/_dotblas.so /usr/bin/ld: skipping incompatible /usr/local/lib/libf77blas.a when searching for -lf77blas /usr/bin/ld: skipping incompatible /usr/local/lib/libf77blas.a when searching for -lf77blas /usr/bin/ld: cannot find -lf77blas collect2: ld returned 1 exit status /usr/bin/ld: skipping incompatible /usr/local/lib/libf77blas.a when searching for -lf77blas /usr/bin/ld: skipping incompatible /usr/local/lib/libf77blas.a when searching for -lf77blas /usr/bin/ld: cannot find -lf77blas collect2: ld returned 1 exit status error: Command "gcc -pthread -shared build/temp.linux-x86_64-2.6/numpy/core/blasdot/_dotblas.o -L/usr/local/lib -Lbuild/temp.linux-x86_64-2.6 -lf77blas -lcblas -latlas -o build/lib.linux-x86_64-2.6/numpy/core/_dotblas.so" failed with exit status 1 Any help would be appreciated. -- ________________________ Jeffrey Spencer jeffspencerd at gmail.com From jensj at fysik.dtu.dk Thu Jul 7 08:10:32 2011 From: jensj at fysik.dtu.dk (Jens =?ISO-8859-1?Q?J=F8rgen?= Mortensen) Date: Thu, 07 Jul 2011 14:10:32 +0200 Subject: [Numpy-discussion] New arrays in 1.6 not always C-contiguous Message-ID: <1310040632.1736.85.camel@casimir> Hi! With numpy 1.5, I get this: >>> a = np.ones((2, 2)) >>> (2 * a.T).strides (16, 8) With 1.6, I get this: >>> (2 * a.T).strides (8, 16) So, this means I can't count on new arrays being C-contiguous any more. I guess there is a good reason for this. Anyway, I just thought I would mention it here - maybe I'm not the only one making this assumption when passing ndarrays to C code. Jens J?rgen From charlesr.harris at gmail.com Thu Jul 7 08:39:33 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 7 Jul 2011 06:39:33 -0600 Subject: [Numpy-discussion] New arrays in 1.6 not always C-contiguous In-Reply-To: <1310040632.1736.85.camel@casimir> References: <1310040632.1736.85.camel@casimir> Message-ID: 2011/7/7 Jens J?rgen Mortensen > Hi! > > With numpy 1.5, I get this: > > >>> a = np.ones((2, 2)) > >>> (2 * a.T).strides > (16, 8) > > With 1.6, I get this: > > >>> (2 * a.T).strides > (8, 16) > > So, this means I can't count on new arrays being C-contiguous any more. > I guess there is a good reason for this. > > Anyway, I just thought I would mention it here - maybe I'm not the only > one making this assumption when passing ndarrays to C code. > > Yes, I believe that assumption is no longer valid, although Mark can verify the details of that. Essentially the axis can be reordered so as to provide the most efficient memory access during the computation. In particular, combinations of arrays in Fortran order are likely to produce arrays in Fortran order. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From yoshi at rokuko.net Thu Jul 7 09:24:29 2011 From: yoshi at rokuko.net (Yoshi Rokuko) Date: Thu, 07 Jul 2011 15:24:29 +0200 Subject: [Numpy-discussion] New arrays in 1.6 not always C-contiguous In-Reply-To: <1310040632.1736.85.camel@casimir> References: <1310040632.1736.85.camel@casimir> Message-ID: <201107071324.p67DOT8L031257@lotus.yokuts.org> thank you for pointing that out! so how do you change your numpy related c code now, would you like to share? 
best regards, yoshi From wkerzendorf at googlemail.com Thu Jul 7 09:26:44 2011 From: wkerzendorf at googlemail.com (Wolfgang Kerzendorf) Date: Thu, 07 Jul 2011 15:26:44 +0200 Subject: [Numpy-discussion] random numbers from arbitrary distribution Message-ID: <4E15B414.8020105@gmail.com> Hi all, Is there an way to get random numbers from an arbitrary distribution already built-in to numpy. I am interested to do that for a black body distribution Thanks Wolfgang From charlesr.harris at gmail.com Thu Jul 7 09:52:39 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 7 Jul 2011 07:52:39 -0600 Subject: [Numpy-discussion] New arrays in 1.6 not always C-contiguous In-Reply-To: <201107071324.p67DOT8L031257@lotus.yokuts.org> References: <1310040632.1736.85.camel@casimir> <201107071324.p67DOT8L031257@lotus.yokuts.org> Message-ID: On Thu, Jul 7, 2011 at 7:24 AM, Yoshi Rokuko wrote: > thank you for pointing that out! > > so how do you change your numpy related c code now, would you like to > share? > > Either you have to deal with the axes in the c-code -- cython is an option there -- or you can check and make a copy of the array to be sure that it is both contiguous and in c-order. In [18]: a = np.ones((2,2)) In [19]: a.strides Out[19]: (16, 8) In [20]: a.T.strides Out[20]: (8, 16) In [21]: ascontiguousarray(a.T).strides Out[21]: (16, 8) Strictly speaking, you should have been doing that anyway, but sometimes quick and dirty gets the job done ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Thu Jul 7 09:53:18 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 7 Jul 2011 08:53:18 -0500 Subject: [Numpy-discussion] New arrays in 1.6 not always C-contiguous In-Reply-To: <1310040632.1736.85.camel@casimir> References: <1310040632.1736.85.camel@casimir> Message-ID: 2011/7/7 Jens J?rgen Mortensen > Hi! > > With numpy 1.5, I get this: > > >>> a = np.ones((2, 2)) > >>> (2 * a.T).strides > (16, 8) > > With 1.6, I get this: > > >>> (2 * a.T).strides > (8, 16) > > So, this means I can't count on new arrays being C-contiguous any more. > I guess there is a good reason for this. > Yes, this was debated together with several other issues during the 1.6 beta period. The primary reason the default 'order=' setting for ufuncs changed from 'C' to 'K' was performance. For those NumPy users dealing with Fortran memory layouts or memory layouts which are neither C nor Fortran, having everything regress to a C memory layout caused many unnecessary copies and was more than an order of magnitude slower during certain computations. > Anyway, I just thought I would mention it here - maybe I'm not the only > one making this assumption when passing ndarrays to C code. > One way to deal with this is to use PyArray_FromAny with the NPY_C_CONTIGUOUS flag to ensure you have a C-aligned array. If you need to write to the array, you should also use the NPY_UPDATEIFCOPY flag. 
Here's how this may look (This is off the top of my head based on the documentation, I haven't tested it): http://docs.scipy.org/doc/numpy/reference/c-api.array.html#PyArray_FromAny int modify_array(PyArrayObject *arr) { PyArrayObject *c_arr = NULL; c_arr = PyArray_FromAny(arr, NULL, 0, 0, NPY_C_CONTIGUOUS | NPY_UPDATEIFCOPY, NULL); if (c_arr == NULL) { /* Return -1 indicating an error */ return -1; } /* Can now assume c_arr is C contiguous, and both read from it and write to it */ /* This triggers the UPDATE back to the original 'arr' object if a copy was made */ Py_DECREF(c_arr); /* Return 0 indicating success */ return 0; } Hope that helps! -Mark > > Jens J?rgen > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Jul 7 09:55:30 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 7 Jul 2011 09:55:30 -0400 Subject: [Numpy-discussion] random numbers from arbitrary distribution In-Reply-To: <4E15B414.8020105@gmail.com> References: <4E15B414.8020105@gmail.com> Message-ID: On Thu, Jul 7, 2011 at 9:26 AM, Wolfgang Kerzendorf wrote: > Hi all, > > Is there an ?way to get random numbers from an arbitrary distribution > already built-in to numpy. I am interested to do that for a black body > distribution What's a black body distributions? From a quick look at Wikipedia it might be planck, which is available in scipy.stats. A new distribution can be generated by subclassing scipy.stats.distributions.rv_continuous It has a generic random variable generator using the quantile function, ppf. If the ppf is available, then this is fast. If only the pdf is given, generating random variables is sloooow. (cdf is calculated by integrate.quad, ppf is generated from cdf with optimize.fsolve) Some distributions can be generated by a transformation of variables, which can also be very fast. Nothing else to generate arbitrary random variables is available in numpy or scipy. (I just started to read a paper that uses a piecewise polynomial approximation to the ppf that should work for fast generation of random variables when only the pdf is given.) Josef > > Thanks > ? ?Wolfgang > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From yoshi at rokuko.net Thu Jul 7 10:09:59 2011 From: yoshi at rokuko.net (Yoshi Rokuko) Date: Thu, 07 Jul 2011 16:09:59 +0200 Subject: [Numpy-discussion] New arrays in 1.6 not always C-contiguous In-Reply-To: References: <1310040632.1736.85.camel@casimir> Message-ID: <201107071410.p67E9xpA031714@lotus.yokuts.org> +---------------------------------------------------- Mark Wiebe -----------+ > One way to deal with this is to use PyArray_FromAny with the > NPY_C_CONTIGUOUS flag to ensure you have a C-aligned array. If you need to > write to the array, you should also use the NPY_UPDATEIFCOPY flag. Here's > how this may look (This is off the top of my head based on the > documentation, I haven't tested it): > ah so i get a copy from the not contiguous array that is then contiguous (by using NPY_C_CONTIGUOUS) - i thought PyArray_FromAny would return NULL then. thank you. 
From bsouthey at gmail.com Thu Jul 7 10:11:24 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 07 Jul 2011 09:11:24 -0500 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 In-Reply-To: References: Message-ID: <4E15BE8C.1090202@gmail.com> On 07/01/2011 04:46 PM, Ralf Gommers wrote: > Hi, > > I am pleased to announce the availability (only a little later than > planned) of the second release candidate of NumPy 1.6.1. This is a > bugfix release, list of fixed bugs: > #1834 einsum fails for specific shapes > #1837 einsum throws nan or freezes python for specific array shapes > #1838 object <-> structured type arrays regression > #1851 regression for SWIG based code in 1.6.0 > #1863 Buggy results when operating on array copied with astype() > #1870 Fix corner case of object array assignment > #1843 Py3k: fix error with recarray > #1885 nditer: Error in detecting double reduction loop > #1874 f2py: fix --include_paths bug > #1749 Fix ctypes.load_library() > > If no new problems are reported, the final release will be in one > week. Sources and binaries can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ > > Enjoy, > Ralf > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion I finally got around to testing these: I have no errors for my Windows 7 32-bit installs of Python 2.6, 2.7, 3.1 and 3.2 I have no errors for my Linux Fedora 14 64-bit system with Python2.4, Python2.5, Python2.6, Python2.7 and Python3.1 With Linux Fedora 14 64-bit system with Python3.2 provides two warnings that are due to the test but not numpy (ticket 1385 http://projects.scipy.org/scipy/ticket/1385): test_mmap (test_io.TestSaveLoad) ... /usr/local/lib/python3.2/site-packages/numpy/lib/format.py:575: ResourceWarning: unclosed file <_io.BufferedReader name='/tmp/tmpk9cum0'> mode=mode, offset=offset) ok test_lapack (test_build.TestF77Mismatch) ... /usr/local/lib/python3.2/subprocess.py:460: ResourceWarning: unclosed file <_io.BufferedReader name=3> return Popen(*popenargs, **kwargs).wait() /usr/local/lib/python3.2/subprocess.py:460: ResourceWarning: unclosed file <_io.BufferedReader name=8> return Popen(*popenargs, **kwargs).wait() ok With the special debug version of Python2.7, I still get the unicode related error of ticket 1578 http://projects.scipy.org/numpy/ticket/1578 Bruce -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From bsouthey at gmail.com Thu Jul 7 10:37:06 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 07 Jul 2011 09:37:06 -0500 Subject: [Numpy-discussion] Compiling 2.0.0.dev-3071eab version on Red-Hat Error with -lf77blas In-Reply-To: <4E158932.3090409@gmail.com> References: <4E158932.3090409@gmail.com> Message-ID: <4E15C492.1080907@gmail.com> On 07/07/2011 05:23 AM, Jeffrey Spencer wrote: > The error is below: > > creating build/temp.linux-x86_64-2.6/numpy/core/blasdot > compile options: '-DATLAS_INFO="\"None\"" -Inumpy/core/blasdot > -Inumpy/core/include > -Ibuild/src.linux-x86_64-2.6/numpy/core/include/numpy > -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core > -Inumpy/core/src/npymath -Inumpy/core/src/multiarray > -Inumpy/core/src/umath -Inumpy/core/include > -I/home/jspender/include/python2.6 > -Ibuild/src.linux-x86_64-2.6/numpy/core/src/multiarray > -Ibuild/src.linux-x86_64-2.6/numpy/core/src/umath -c' > gcc: numpy/core/blasdot/_dotblas.c > numpy/core/blasdot/_dotblas.c: In function ?dotblas_matrixproduct?: > numpy/core/blasdot/_dotblas.c:239: warning: comparison of distinct > pointer types lacks a cast > numpy/core/blasdot/_dotblas.c:257: warning: passing argument 3 of > ?*(PyArray_API + 2240u)? from incompatible pointer type > numpy/core/blasdot/_dotblas.c:292: warning: passing argument 3 of > ?*(PyArray_API + 2240u)? from incompatible pointer type > gcc -pthread -shared > build/temp.linux-x86_64-2.6/numpy/core/blasdot/_dotblas.o > -L/usr/local/lib -Lbuild/temp.linux-x86_64-2.6 -lf77blas -lcblas -latlas > -o build/lib.linux-x86_64-2.6/numpy/core/_dotblas.so > /usr/bin/ld: skipping incompatible /usr/local/lib/libf77blas.a when > searching for -lf77blas > /usr/bin/ld: skipping incompatible /usr/local/lib/libf77blas.a when > searching for -lf77blas > /usr/bin/ld: cannot find -lf77blas > collect2: ld returned 1 exit status > /usr/bin/ld: skipping incompatible /usr/local/lib/libf77blas.a when > searching for -lf77blas > /usr/bin/ld: skipping incompatible /usr/local/lib/libf77blas.a when > searching for -lf77blas > /usr/bin/ld: cannot find -lf77blas > collect2: ld returned 1 exit status > error: Command "gcc -pthread -shared > build/temp.linux-x86_64-2.6/numpy/core/blasdot/_dotblas.o > -L/usr/local/lib -Lbuild/temp.linux-x86_64-2.6 -lf77blas -lcblas -latlas > -o build/lib.linux-x86_64-2.6/numpy/core/_dotblas.so" failed with exit > status 1 > > Any help would be appreciated. > Python is looking for a 64-bit library as the one in /usr/local/lib/ is either 32-bit or built with a different compiler version. If you have the correct library in another location then you need to point numpy to it or just build everything with the same compiler. Bruce From mwwiebe at gmail.com Thu Jul 7 10:44:43 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 7 Jul 2011 09:44:43 -0500 Subject: [Numpy-discussion] Missing Data development plan Message-ID: It's been a day less than two weeks since I posted my first feedback request on a masked array implementation of missing data. I'd like to thank everyone that contributed to the discussion, and that continues to contribute. I believe my design is very solid thanks to all the feedback, and I understand at the same time there are still concerns that people have about the design. I sincerely hope that those concerns are further discussed and made more clear just as I have spent a lot of effort making sure my ideas are clear and understood by everyone in the discussion. 
Travis has directed me to for the moment focus a majority of my attention on the implementation. He will post further thoughts on the design issues in the next few days when he has enough of a break in his schedule. With the short time available for this implementation, my plan is as follows: 1) Implement the masked implementation of NA nearly to completion. This is the quickest way to get something that people can provide hands-on feedback with, and the NA dtype in my design uses the machinery of the masked implementation for all the computational kernels. 2) Assuming there is enough time left, implement the NA[] parameterized dtype in concert with a derived[] dtype and cleanups of the datetime64[] dtype, with the goal of creating some good structure for the possibility of creating more parameterized dtypes in the future. The derived[] dtype idea is based on an idea Travis had which he called computed columns, but generalized to apply in more contexts. When the time comes, I will post a proposal for feedback on this idea as well. Thanks once again for all the great feedback, and I look forward to getting a prototype into your hands to test as quickly as possible! -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Thu Jul 7 10:56:29 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 7 Jul 2011 09:56:29 -0500 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 In-Reply-To: <4E14AFC2.6000704@uci.edu> References: <4E14AFC2.6000704@uci.edu> Message-ID: On Wed, Jul 6, 2011 at 1:56 PM, Christoph Gohlke wrote: > > > On 7/6/2011 10:57 AM, Russell E. Owen wrote: >> In article >> , >> ? Ralf Gommers ?wrote: >> >>> On Tue, Jul 5, 2011 at 11:41 PM, Russell E. Owen ?wrote: >>> >>>> In article, >>>> ? Ralf Gommers ?wrote: >>>> >>>>> https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ >>>> >>>> Will there be a Mac binary for 32-bit pythons (one that is compatible >>>> with older versions of MacOS X)? At present I only see a 64-bit >>>> 10.6-only version. >>>> >>>> >>>> Yes there will be for the final release (10.4-10.6 compatible). I can't >>> create those on my own computer, so sometimes I don't make them for RCs. >> >> I'm glad they will be present for the final release. >> >> FYI: I built my own 1.6.1rc2 against Python 2.7.2 (the 32-bit Mac >> version from python.org). I reproduced a memory error that I've been >> trying to narrow down. This is ticket 1896: >> >> and the problem is also in 1.6.0. >> >> -- Russell >> > > > I can reproduce this error on Windows. It looks like a serious regression. > > Christoph > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > I do get this error in the code without tinker on the first loop. I did notice that the original array (dataArr) is float64 but the second array (scaledArr) is only float32. The problem is removed by changing the dtype of scaledArr to float64. Thus, it would appear some memory allocation related error to squeeze a float64 result into a memory allocated for a float32 array. 
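A minimal sketch of the dtype mismatch described above. The names dataArr and scaledArr are taken from the report; the actual loop from ticket 1896 is not reproduced here, so this only illustrates the pattern of writing a float64 result into a preallocated float32 buffer:

    import numpy as np

    dataArr = np.linspace(0.0, 1.0, 100).reshape(10, 10)    # float64 source data
    scaledArr = np.empty(dataArr.shape, dtype=np.float32)   # float32 output buffer

    # The float64 result is downcast on the fly to fit the float32 buffer;
    # per the report, the problem disappears when scaledArr is created with
    # dtype=np.float64 instead.
    np.multiply(dataArr, 255.0, out=scaledArr)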
Bruce From mwwiebe at gmail.com Thu Jul 7 11:06:03 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 7 Jul 2011 10:06:03 -0500 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 In-Reply-To: References: <4E14AFC2.6000704@uci.edu> Message-ID: On Thu, Jul 7, 2011 at 9:56 AM, Bruce Southey wrote: > On Wed, Jul 6, 2011 at 1:56 PM, Christoph Gohlke wrote: > > > > > > On 7/6/2011 10:57 AM, Russell E. Owen wrote: > >> In article > >> , > >> Ralf Gommers wrote: > >> > >>> On Tue, Jul 5, 2011 at 11:41 PM, Russell E. Owen wrote: > >>> > >>>> In article, > >>>> Ralf Gommers wrote: > >>>> > >>>>> https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ > >>>> > >>>> Will there be a Mac binary for 32-bit pythons (one that is compatible > >>>> with older versions of MacOS X)? At present I only see a 64-bit > >>>> 10.6-only version. > >>>> > >>>> > >>>> Yes there will be for the final release (10.4-10.6 compatible). I > can't > >>> create those on my own computer, so sometimes I don't make them for > RCs. > >> > >> I'm glad they will be present for the final release. > >> > >> FYI: I built my own 1.6.1rc2 against Python 2.7.2 (the 32-bit Mac > >> version from python.org). I reproduced a memory error that I've been > >> trying to narrow down. This is ticket 1896: > >> > >> and the problem is also in 1.6.0. > >> > >> -- Russell > >> > > > > > > I can reproduce this error on Windows. It looks like a serious > regression. > > > > Christoph > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > I do get this error in the code without tinker on the first loop. > > I did notice that the original array (dataArr) is float64 but the > second array (scaledArr) is only float32. The problem is removed by > changing the dtype of scaledArr to float64. Thus, it would appear some > memory allocation related error to squeeze a float64 result into a > memory allocated for a float32 array. > Can you try it on your platform with the pull request I've made which hopefully fixes it? Here's the link: https://github.com/numpy/numpy/pull/103 Thanks, Mark > Bruce > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Thu Jul 7 12:11:51 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 07 Jul 2011 11:11:51 -0500 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 In-Reply-To: References: <4E14AFC2.6000704@uci.edu> Message-ID: <4E15DAC7.8020107@gmail.com> On 07/07/2011 10:06 AM, Mark Wiebe wrote: > On Thu, Jul 7, 2011 at 9:56 AM, Bruce Southey > wrote: > > On Wed, Jul 6, 2011 at 1:56 PM, Christoph Gohlke > wrote: > > > > > > On 7/6/2011 10:57 AM, Russell E. Owen wrote: > >> In article > >> > >, > >> Ralf Gommers > wrote: > >> > >>> On Tue, Jul 5, 2011 at 11:41 PM, Russell E. Owen > wrote: > >>> > >>>> In article >, > >>>> Ralf Gommers > wrote: > >>>> > >>>>> https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ > >>>> > >>>> Will there be a Mac binary for 32-bit pythons (one that is > compatible > >>>> with older versions of MacOS X)? At present I only see a 64-bit > >>>> 10.6-only version. > >>>> > >>>> > >>>> Yes there will be for the final release (10.4-10.6 > compatible). 
I can't > >>> create those on my own computer, so sometimes I don't make > them for RCs. > >> > >> I'm glad they will be present for the final release. > >> > >> FYI: I built my own 1.6.1rc2 against Python 2.7.2 (the 32-bit Mac > >> version from python.org ). I reproduced a > memory error that I've been > >> trying to narrow down. This is ticket 1896: > >> > >> and the problem is also in 1.6.0. > >> > >> -- Russell > >> > > > > > > I can reproduce this error on Windows. It looks like a serious > regression. > > > > Christoph > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > I do get this error in the code without tinker on the first loop. > > I did notice that the original array (dataArr) is float64 but the > second array (scaledArr) is only float32. The problem is removed by > changing the dtype of scaledArr to float64. Thus, it would appear some > memory allocation related error to squeeze a float64 result into a > memory allocated for a float32 array. > > > Can you try it on your platform with the pull request I've made which > hopefully fixes it? Here's the link: > > https://github.com/numpy/numpy/pull/103 > > Thanks, > Mark > > I do not get the crash with Python2.7 on the users code. But I can not compile this branch under Python3.1 or Python3.2. The last error is below - I can look into this more if needed. Bruce creating build/temp.linux-x86_64-3.1/numpy/core/src/multiarray compile options: '-Inumpy/core/include -Ibuild/src.linux-x86_64-3.1/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python3.1 -Ibuild/src.linux-x86_64-3.1/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-3.1/numpy/core/src/umath -c' gcc: numpy/core/src/multiarray/multiarraymodule_onefile.c In file included from numpy/core/src/multiarray/multiarraymodule_onefile.c:17:0: numpy/core/src/multiarray/arraytypes.c.src: In function ?VOID_getitem?: numpy/core/src/multiarray/arraytypes.c.src:631:94: error: ?NPY_WRITEABLE? undeclared (first use in this function) numpy/core/src/multiarray/arraytypes.c.src:631:94: note: each undeclared identifier is reported only once for each function it appears in numpy/core/src/multiarray/arraytypes.c.src:633:35: warning: assignment from incompatible pointer type In file included from numpy/core/src/multiarray/multiarraymodule_onefile.c:28:0: numpy/core/src/multiarray/getset.c: In function ?array_data_get?: numpy/core/src/multiarray/getset.c:284:5: warning: passing argument 1 of ?PyMemoryView_FromObject? from incompatible pointer type /usr/include/python3.1/memoryobject.h:54:12: note: expected ?struct PyObject *? but argument is of type ?struct PyArrayObject *? numpy/core/src/multiarray/multiarraymodule_onefile.c: At top level: numpy/core/src/multiarray/scalartypes.c.src:2231:1: warning: ?gentype_getsegcount? defined but not used numpy/core/src/multiarray/scalartypes.c.src:2249:1: warning: ?gentype_getcharbuf? defined but not used numpy/core/src/multiarray/mapping.c:75:1: warning: ?_array_ass_item? defined but not used numpy/core/src/multiarray/number.c:266:1: warning: ?array_divide? defined but not used numpy/core/src/multiarray/number.c:464:1: warning: ?array_inplace_divide? defined but not used numpy/core/src/multiarray/buffer.c:25:1: warning: ?array_getsegcount? 
defined but not used numpy/core/src/multiarray/buffer.c:58:1: warning: ?array_getwritebuf? defined but not used numpy/core/src/multiarray/buffer.c:71:1: warning: ?array_getcharbuf? defined but not used In file included from numpy/core/src/multiarray/multiarraymodule_onefile.c:17:0: numpy/core/src/multiarray/arraytypes.c.src: In function ?VOID_getitem?: numpy/core/src/multiarray/arraytypes.c.src:631:94: error: ?NPY_WRITEABLE? undeclared (first use in this function) numpy/core/src/multiarray/arraytypes.c.src:631:94: note: each undeclared identifier is reported only once for each function it appears in numpy/core/src/multiarray/arraytypes.c.src:633:35: warning: assignment from incompatible pointer type In file included from numpy/core/src/multiarray/multiarraymodule_onefile.c:28:0: numpy/core/src/multiarray/getset.c: In function ?array_data_get?: numpy/core/src/multiarray/getset.c:284:5: warning: passing argument 1 of ?PyMemoryView_FromObject? from incompatible pointer type /usr/include/python3.1/memoryobject.h:54:12: note: expected ?struct PyObject *? but argument is of type ?struct PyArrayObject *? numpy/core/src/multiarray/multiarraymodule_onefile.c: At top level: numpy/core/src/multiarray/scalartypes.c.src:2231:1: warning: ?gentype_getsegcount? defined but not used numpy/core/src/multiarray/scalartypes.c.src:2249:1: warning: ?gentype_getcharbuf? defined but not used numpy/core/src/multiarray/mapping.c:75:1: warning: ?_array_ass_item? defined but not used numpy/core/src/multiarray/number.c:266:1: warning: ?array_divide? defined but not used numpy/core/src/multiarray/number.c:464:1: warning: ?array_inplace_divide? defined but not used numpy/core/src/multiarray/buffer.c:25:1: warning: ?array_getsegcount? defined but not used numpy/core/src/multiarray/buffer.c:58:1: warning: ?array_getwritebuf? defined but not used numpy/core/src/multiarray/buffer.c:71:1: warning: ?array_getcharbuf? defined but not used error: Command "gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv -fPIC -Inumpy/core/include -Ibuild/src.linux-x86_64-3.1/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python3.1 -Ibuild/src.linux-x86_64-3.1/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-3.1/numpy/core/src/umath -c numpy/core/src/multiarray/multiarraymodule_onefile.c -o build/temp.linux-x86_64-3.1/numpy/core/src/multiarray/multiarraymodule_onefile.o" failed with exit status 1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Thu Jul 7 12:18:02 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 7 Jul 2011 11:18:02 -0500 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 In-Reply-To: <4E15DAC7.8020107@gmail.com> References: <4E14AFC2.6000704@uci.edu> <4E15DAC7.8020107@gmail.com> Message-ID: On Thu, Jul 7, 2011 at 11:11 AM, Bruce Southey wrote: > ** > On 07/07/2011 10:06 AM, Mark Wiebe wrote: > > On Thu, Jul 7, 2011 at 9:56 AM, Bruce Southey wrote: > >> On Wed, Jul 6, 2011 at 1:56 PM, Christoph Gohlke >> wrote: >> > >> > >> > On 7/6/2011 10:57 AM, Russell E. Owen wrote: >> >> In article >> >> , >> >> Ralf Gommers wrote: >> >> >> >>> On Tue, Jul 5, 2011 at 11:41 PM, Russell E. 
Owen >> wrote: >> >>> >> >>>> In article, >> >>>> Ralf Gommers wrote: >> >>>> >> >>>>> https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ >> >>>> >> >>>> Will there be a Mac binary for 32-bit pythons (one that is compatible >> >>>> with older versions of MacOS X)? At present I only see a 64-bit >> >>>> 10.6-only version. >> >>>> >> >>>> >> >>>> Yes there will be for the final release (10.4-10.6 compatible). I >> can't >> >>> create those on my own computer, so sometimes I don't make them for >> RCs. >> >> >> >> I'm glad they will be present for the final release. >> >> >> >> FYI: I built my own 1.6.1rc2 against Python 2.7.2 (the 32-bit Mac >> >> version from python.org). I reproduced a memory error that I've been >> >> trying to narrow down. This is ticket 1896: >> >> >> >> and the problem is also in 1.6.0. >> >> >> >> -- Russell >> >> >> > >> > >> > I can reproduce this error on Windows. It looks like a serious >> regression. >> > >> > Christoph >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> I do get this error in the code without tinker on the first loop. >> >> I did notice that the original array (dataArr) is float64 but the >> second array (scaledArr) is only float32. The problem is removed by >> changing the dtype of scaledArr to float64. Thus, it would appear some >> memory allocation related error to squeeze a float64 result into a >> memory allocated for a float32 array. >> > > Can you try it on your platform with the pull request I've made which > hopefully fixes it? Here's the link: > > https://github.com/numpy/numpy/pull/103 > > Thanks, > Mark > > >> I do not get the crash with Python2.7 on the users code. But I can not > compile this branch under Python3.1 or Python3.2. The last error is below - > I can look into this more if needed. > I suspect you have gotten the missingdata branch by accident instead of the pull request's one. This is one thing I've found confusing/bad about github, is that the URL they provide always gives you the default branch. You need to switch to the crash1896 branch for the test. It does look like I missed something when cleaning up some NumPy API stuff, however. That build failure log is useful, thanks! -Mark > > Bruce > > > creating build/temp.linux-x86_64-3.1/numpy/core/src/multiarray > compile options: '-Inumpy/core/include > -Ibuild/src.linux-x86_64-3.1/numpy/core/include/numpy > -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core > -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath > -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python3.1 > -Ibuild/src.linux-x86_64-3.1/numpy/core/src/multiarray > -Ibuild/src.linux-x86_64-3.1/numpy/core/src/umath -c' > gcc: numpy/core/src/multiarray/multiarraymodule_onefile.c > In file included from > numpy/core/src/multiarray/multiarraymodule_onefile.c:17:0: > numpy/core/src/multiarray/arraytypes.c.src: In function ?VOID_getitem?: > numpy/core/src/multiarray/arraytypes.c.src:631:94: error: ?NPY_WRITEABLE? 
> undeclared (first use in this function) > numpy/core/src/multiarray/arraytypes.c.src:631:94: note: each undeclared > identifier is reported only once for each function it appears in > numpy/core/src/multiarray/arraytypes.c.src:633:35: warning: assignment from > incompatible pointer type > In file included from > numpy/core/src/multiarray/multiarraymodule_onefile.c:28:0: > numpy/core/src/multiarray/getset.c: In function ?array_data_get?: > numpy/core/src/multiarray/getset.c:284:5: warning: passing argument 1 of > ?PyMemoryView_FromObject? from incompatible pointer type > /usr/include/python3.1/memoryobject.h:54:12: note: expected ?struct > PyObject *? but argument is of type ?struct PyArrayObject *? > numpy/core/src/multiarray/multiarraymodule_onefile.c: At top level: > numpy/core/src/multiarray/scalartypes.c.src:2231:1: warning: > ?gentype_getsegcount? defined but not used > numpy/core/src/multiarray/scalartypes.c.src:2249:1: warning: > ?gentype_getcharbuf? defined but not used > numpy/core/src/multiarray/mapping.c:75:1: warning: ?_array_ass_item? > defined but not used > numpy/core/src/multiarray/number.c:266:1: warning: ?array_divide? defined > but not used > numpy/core/src/multiarray/number.c:464:1: warning: ?array_inplace_divide? > defined but not used > numpy/core/src/multiarray/buffer.c:25:1: warning: ?array_getsegcount? > defined but not used > numpy/core/src/multiarray/buffer.c:58:1: warning: ?array_getwritebuf? > defined but not used > numpy/core/src/multiarray/buffer.c:71:1: warning: ?array_getcharbuf? > defined but not used > In file included from > numpy/core/src/multiarray/multiarraymodule_onefile.c:17:0: > numpy/core/src/multiarray/arraytypes.c.src: In function ?VOID_getitem?: > numpy/core/src/multiarray/arraytypes.c.src:631:94: error: ?NPY_WRITEABLE? > undeclared (first use in this function) > numpy/core/src/multiarray/arraytypes.c.src:631:94: note: each undeclared > identifier is reported only once for each function it appears in > numpy/core/src/multiarray/arraytypes.c.src:633:35: warning: assignment from > incompatible pointer type > In file included from > numpy/core/src/multiarray/multiarraymodule_onefile.c:28:0: > numpy/core/src/multiarray/getset.c: In function ?array_data_get?: > numpy/core/src/multiarray/getset.c:284:5: warning: passing argument 1 of > ?PyMemoryView_FromObject? from incompatible pointer type > /usr/include/python3.1/memoryobject.h:54:12: note: expected ?struct > PyObject *? but argument is of type ?struct PyArrayObject *? > numpy/core/src/multiarray/multiarraymodule_onefile.c: At top level: > numpy/core/src/multiarray/scalartypes.c.src:2231:1: warning: > ?gentype_getsegcount? defined but not used > numpy/core/src/multiarray/scalartypes.c.src:2249:1: warning: > ?gentype_getcharbuf? defined but not used > numpy/core/src/multiarray/mapping.c:75:1: warning: ?_array_ass_item? > defined but not used > numpy/core/src/multiarray/number.c:266:1: warning: ?array_divide? defined > but not used > numpy/core/src/multiarray/number.c:464:1: warning: ?array_inplace_divide? > defined but not used > numpy/core/src/multiarray/buffer.c:25:1: warning: ?array_getsegcount? > defined but not used > numpy/core/src/multiarray/buffer.c:58:1: warning: ?array_getwritebuf? > defined but not used > numpy/core/src/multiarray/buffer.c:71:1: warning: ?array_getcharbuf? 
> defined but not used > error: Command "gcc -pthread -fno-strict-aliasing -DNDEBUG -O2 -g -pipe > -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector > --param=ssp-buffer-size=4 -m64 -mtune=generic -D_GNU_SOURCE -fPIC -fwrapv > -fPIC -Inumpy/core/include > -Ibuild/src.linux-x86_64-3.1/numpy/core/include/numpy > -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core > -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath > -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python3.1 > -Ibuild/src.linux-x86_64-3.1/numpy/core/src/multiarray > -Ibuild/src.linux-x86_64-3.1/numpy/core/src/umath -c > numpy/core/src/multiarray/multiarraymodule_onefile.c -o > build/temp.linux-x86_64-3.1/numpy/core/src/multiarray/multiarraymodule_onefile.o" > failed with exit status 1 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Thu Jul 7 12:40:15 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 07 Jul 2011 11:40:15 -0500 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 In-Reply-To: References: <4E14AFC2.6000704@uci.edu> <4E15DAC7.8020107@gmail.com> Message-ID: <4E15E16F.4010401@gmail.com> On 07/07/2011 11:18 AM, Mark Wiebe wrote: > On Thu, Jul 7, 2011 at 11:11 AM, Bruce Southey > wrote: > > On 07/07/2011 10:06 AM, Mark Wiebe wrote: >> On Thu, Jul 7, 2011 at 9:56 AM, Bruce Southey > > wrote: >> >> On Wed, Jul 6, 2011 at 1:56 PM, Christoph Gohlke >> > wrote: >> > >> > >> > On 7/6/2011 10:57 AM, Russell E. Owen wrote: >> >> In article >> >> >> > >, >> >> Ralf Gommers> > wrote: >> >> >> >>> On Tue, Jul 5, 2011 at 11:41 PM, Russell E. >> Owen> wrote: >> >>> >> >>>> In >> article> >, >> >>>> Ralf Gommers> > wrote: >> >>>> >> >>>>> >> https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ >> >>>> >> >>>> Will there be a Mac binary for 32-bit pythons (one that >> is compatible >> >>>> with older versions of MacOS X)? At present I only see a >> 64-bit >> >>>> 10.6-only version. >> >>>> >> >>>> >> >>>> Yes there will be for the final release (10.4-10.6 >> compatible). I can't >> >>> create those on my own computer, so sometimes I don't >> make them for RCs. >> >> >> >> I'm glad they will be present for the final release. >> >> >> >> FYI: I built my own 1.6.1rc2 against Python 2.7.2 (the >> 32-bit Mac >> >> version from python.org ). I reproduced >> a memory error that I've been >> >> trying to narrow down. This is ticket 1896: >> >> >> >> and the problem is also in 1.6.0. >> >> >> >> -- Russell >> >> >> > >> > >> > I can reproduce this error on Windows. It looks like a >> serious regression. >> > >> > Christoph >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> I do get this error in the code without tinker on the first loop. >> >> I did notice that the original array (dataArr) is float64 but the >> second array (scaledArr) is only float32. The problem is >> removed by >> changing the dtype of scaledArr to float64. Thus, it would >> appear some >> memory allocation related error to squeeze a float64 result >> into a >> memory allocated for a float32 array. >> >> >> Can you try it on your platform with the pull request I've made >> which hopefully fixes it? 
Here's the link: >> >> https://github.com/numpy/numpy/pull/103 >> >> Thanks, >> Mark >> >> > I do not get the crash with Python2.7 on the users code. But I can > not compile this branch under Python3.1 or Python3.2. The last > error is below - I can look into this more if needed. > > > I suspect you have gotten the missingdata branch by accident instead > of the pull request's one. This is one thing I've found confusing/bad > about github, is that the URL they provide always gives you the > default branch. You need to switch to the crash1896 branch for the test. > > It does look like I missed something when cleaning up some NumPy API > stuff, however. That build failure log is useful, thanks! > > -Mark > Yes, your are correct as I just check if the fix was present rather than ensuring I got the correct branch. So the correct branch fixes the Python3.x build issue. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From bergstrj at iro.umontreal.ca Thu Jul 7 13:03:04 2011 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Thu, 7 Jul 2011 13:03:04 -0400 Subject: [Numpy-discussion] potential bug in PyArray_MoveInto and PyArray_CopyInto? In-Reply-To: References: Message-ID: In numpy 1.5.1, ?the functions PyArray_MoveInto and PyArray_CopyInto don't appear to treat strides correctly. Evidence: PyNumber_InPlaceAdd(dst, src), and modifies the correct subarray to which dst points. In the same context, PyArray_MoveInto(dst, src) modifies the first two rows of the underlying matrix instead of the first two columns. PyArray_CopyInto does the same. Is there something subtle going on here? James -- http://www-etud.iro.umontreal.ca/~bergstrj From charlesr.harris at gmail.com Thu Jul 7 13:10:46 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 7 Jul 2011 11:10:46 -0600 Subject: [Numpy-discussion] potential bug in PyArray_MoveInto and PyArray_CopyInto? In-Reply-To: References: Message-ID: On Thu, Jul 7, 2011 at 11:03 AM, James Bergstra wrote: > In numpy 1.5.1, the functions PyArray_MoveInto and PyArray_CopyInto > don't appear to treat strides correctly. > > Evidence: > PyNumber_InPlaceAdd(dst, src), and modifies the correct subarray to > which dst points. > > In the same context, PyArray_MoveInto(dst, src) modifies the first two > rows of the > underlying matrix instead of the first two columns. PyArray_CopyInto > does the same. > > Is there something subtle going on here? > > What are the strides/dims in src and dst? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpyle at post.harvard.edu Thu Jul 7 13:16:47 2011 From: rpyle at post.harvard.edu (Robert Pyle) Date: Thu, 07 Jul 2011 13:16:47 -0400 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 In-Reply-To: <4E15BE8C.1090202@gmail.com> References: <4E15BE8C.1090202@gmail.com> Message-ID: <2486ACC5-C5BA-42C3-9E70-3168FE4D7A48@post.harvard.edu> My system is Mac OSX 10.6.8, python.org 2.7.1. When I run numpy.test(), I see the following warning: >>> import numpy as np >>> np.test() Running unit tests for numpy NumPy version 1.6.1rc2 NumPy is installed in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy Python version 2.7.1 (r271:86882M, Nov 30 2010, 09:39:13) [GCC 4.0.1 (Apple Inc. 
build 5494)] nose version 0.11.3 .............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py:1922: RuntimeWarning: invalid value encountered in absolute return all(less_equal(absolute(x-y), atol + rtol * absolute(y))) Everything else completes with 3 KNOWNFAILs and 1 SKIP. This warning is not new to this release; I've seen it before but haven't tried tracking it down until today. It arises in allclose(). The comments state "If either array contains NaN, then False is returned." but no test for NaN is done, and NaNs are indeed what cause the warning. Inserting if any(isnan(x)) or any(isnan(y)): return False before current line number 1916 in numeric.py seems to fix it. Thanks to all for the great work. Numpy has saved me a lot of grief. Bob Pyle Cambridge, MA From bergstrj at iro.umontreal.ca Thu Jul 7 16:59:19 2011 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Thu, 7 Jul 2011 16:59:19 -0400 Subject: [Numpy-discussion] potential bug in PyArray_MoveInto and PyArray_CopyInto? In-Reply-To: References: Message-ID: On Thu, Jul 7, 2011 at 1:10 PM, Charles R Harris wrote: > > > On Thu, Jul 7, 2011 at 11:03 AM, James Bergstra > wrote: >> >> In numpy 1.5.1, ?the functions PyArray_MoveInto and PyArray_CopyInto >> don't appear to treat strides correctly. >> >> Evidence: >> PyNumber_InPlaceAdd(dst, src), and modifies the correct subarray to >> which dst points. >> >> In the same context, PyArray_MoveInto(dst, src) modifies the first two >> rows of the >> underlying matrix instead of the first two columns. PyArray_CopyInto >> does the same. >> >> Is there something subtle going on here? >> > > What are the strides/dims in src and dst? > > Chuck > In dst: strides = (40,8), dims=(5,2) in src: strides = () dims=() dst was sliced out of a 5x5 array of doubles. src is a 0-d array James -- http://www-etud.iro.umontreal.ca/~bergstrj From dirk.ullrich at googlemail.com Thu Jul 7 19:48:20 2011 From: dirk.ullrich at googlemail.com (Dirk Ullrich) Date: Fri, 8 Jul 2011 01:48:20 +0200 Subject: [Numpy-discussion] Build failure for NumPy's HEAD in Git Message-ID: Hi, the current HEAD of NumPy fails to build. To be more precise: compilation of `numpy/core/src/multiarray/multiarraymodule_onefile.c' fails. It looks like that is caused by splitting the `nditer.c.src' stuff in the same directory into `nditer_api.c', `nditer_constr.c' and `nditer_templ.c.src': If you #include the three new `.c' files in `multiarraymodule_onefile.c' instead of `nditer.c' the problem seems to be fixed. Sorry for posting this to this mailing list instead of filing a bug. But I am new to NumPy and have now account yet for Trac (yet). Dirk From charlesr.harris at gmail.com Thu Jul 7 20:31:52 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 7 Jul 2011 18:31:52 -0600 Subject: [Numpy-discussion] Build failure for NumPy's HEAD in Git In-Reply-To: References: Message-ID: On Thu, Jul 7, 2011 at 5:48 PM, Dirk Ullrich wrote: > Hi, > > the current HEAD of NumPy fails to build. 
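Returning to the allclose() warning reported earlier in this digest: a sketch of the NaN guard Robert Pyle proposes, combined with the comparison quoted from numeric.py. This is only an illustration of the suggested change, not the actual numpy source:

    import numpy as np

    def allclose_sketch(x, y, rtol=1.e-5, atol=1.e-8):
        x = np.asanyarray(x)
        y = np.asanyarray(y)
        # Proposed guard: return False for NaN input, as the docstring already
        # promises, instead of letting absolute(x - y) trigger
        # "RuntimeWarning: invalid value encountered in absolute".
        if np.any(np.isnan(x)) or np.any(np.isnan(y)):
            return False
        return bool(np.all(np.less_equal(np.absolute(x - y),
                                         atol + rtol * np.absolute(y))))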
> To be more precise: compilation of > `numpy/core/src/multiarray/multiarraymodule_onefile.c' fails. It looks > like that is caused by splitting the `nditer.c.src' stuff in the same > directory into `nditer_api.c', `nditer_constr.c' and > `nditer_templ.c.src': If you #include the three new `.c' files in > `multiarraymodule_onefile.c' instead of `nditer.c' the problem seems > to be fixed. > > Sorry for posting this to this mailing list instead of filing a bug. > But I am new to NumPy and have now account yet for Trac (yet). > > Thanks for the report, it should be fixed in 834b5bf. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Fri Jul 8 00:03:51 2011 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 7 Jul 2011 23:03:51 -0500 Subject: [Numpy-discussion] Missing Values Discussion Message-ID: <4B373853-CDDE-46FC-A318-972E72E2CDA6@enthought.com> Hi all, I want to first apologize for stepping into this discussion a bit late and for not being able to participate adequately. However, I want to offer a couple of perspectives, and my opinion about what we should do as well as clarify what I have instructed Mark to do as part of his summer work. First, the discussion has reminded me how valuable it is to get feedback from all points of view. While it does lengthen the process, it significantly enhances the result. I strongly hope we can continue the tradition of respectful discussion on this mailing list where people's views are treated with respect --- even if we don't always have the time to understand them in depth. I also really appreciate people taking the time to visit on the phone call with me as it gave me a chance to understand many opinions quickly and at least start to form a possibly useful opinion. Basically, because there is not consensus and in fact a strong and reasonable opposition to specific points, Mark's NEP as proposed cannot be accepted in its entirety right now. However, I believe an implementation of his NEP is useful and will be instructive in resolving the issues and so I have instructed him to spend Enthought time on the implementation. Any changes that need to be made to the API before it is accepted into a released form of NumPy can still be made even after most of the implementation is completed as far as I understand it. This is because most of the disagreement is about the specific ability to manipulate the masks independently of assigning missing data and the creation of an additional np.HIDE (np.IGNORE) concept at the Python level. Despite some powerful arguments on both sides of the discussion, I am confident that we can figure out an elegant solution that will work long term. My current opinion is that I am very favorable to making easy the use-case that has been repeatedly described of having "missing data" that is *always* missing and then having "hidden data" that you don't want to think about for a particular set of calculations (but you also don't want to through away by over-writing). I think it is important to make it easy to keep that data around without over-writing but also have the "idea" of that kind of missing data different than the idea of data you can't care about because it just isn't there. I also think it is important for the calculation infrastructure to have just one notion of "missing data" which Mark's NEP handles beautifully. 
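To make the missing-versus-hidden distinction concrete with tools that already exist (numpy.ma and nan here are only stand-ins for whatever API comes out of the NEP discussion, not the proposed design itself):

    import numpy as np

    data = np.array([1.0, 2.0, 3.0, 4.0])

    # "Hidden" data: ignore the last point for this particular calculation
    # without overwriting it; the value stays available in hidden.data.
    hidden = np.ma.masked_array(data, mask=[False, False, False, True])
    hidden.sum()        # 6.0

    # "Missing" data: the value simply is not there; nan is a common
    # placeholder today, but the original value cannot be recovered.
    missing = np.array([1.0, 2.0, 3.0, np.nan])
    np.nansum(missing)  # 6.0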
It seems to me that some of the disagreement is one of perspective in that Mark articulates very well the position of "generic programming, make-opaque-the-implementation" perspective with a focus on the implications of missing data for calculations. Nathaniel and Matthew articulate well the perspective of "focusing" on the data object itself and the desire to keep separate the different ideas behind missing data that have been described --- as well as a powerfully described description of the NumPy tradition of exposing the raw data to the Python side without hiding too much of the implementation from the user. I think it's a healthy discussion. But, I would like to see Mark's code get completed so that we can start talking about code examples. Please don't interpret my instructing Mark to finish the code as "it's been decided". I simply think it's the best path forward to ultimately resolving the concerns. I would like to see an API worked out before summer's end --- and I'm hopeful everyone will be excited about what the resulting design is. I do think there is room for agreement in the present debate if we all remember to keep listening to each other. It takes a lot of effort to understand somebody else's point of view. I have been grateful to see evidence I see of that behavior multiple times (in Mark's revamping of the NEP, in Matthew Brett's re-statement of his interpretation of Mark's views, in Nathaniel's working hard to engage the dialogue even in the throes of finishing his PhD, and many other examples). It makes me very happy to be a part of this community. I look forward to times when I can send more thoughtful and technical emails than this one. All the best, -Travis From ischnell at enthought.com Fri Jul 8 01:37:26 2011 From: ischnell at enthought.com (Ilan Schnell) Date: Fri, 8 Jul 2011 00:37:26 -0500 Subject: [Numpy-discussion] ANN: EPD 7.1 released Message-ID: Hello, I am pleased to announce that EPD (Enthought Python Distribution) version 7.1 has been released. The most significant change is the addition of an "EPD Free" version, which has its own very liberal license, and can be downloaded and used free of any charge by anyone (not only academics). "EPD Free" includes a subset of the packages included in the full EPD. The highlights of this subset are: numpy, scipy, matplotlib, traits and chaco. To see which libraries are included in the free vs. full version, please see: http://www.enthought.com/products/epdlibraries.php In addition we have opened our PyPI build mirror for everyone. This means that one can type "enpkg xyz" for 10,000+ packages. However, there are still benefits to becoming an EPD subscriber. http://www.enthought.com/products/getepd.php Apart from the addition of "EPD Free", this release includes updates to over 30 packages, including numpy, scipy, ipython and ETS. We have also added PySide, Qt and MDP to this release. Please find the complete list of additions, updates and bug fixes in the change log: http://www.enthought.com/products/changelog.php About EPD --------- The Enthought Python Distribution (EPD) is a "kitchen-sink-included" distribution of the Python programming language, including over 90 additional tools and libraries. The EPD bundle includes NumPy, SciPy, IPython, 2D and 3D visualization, and many other tools. EPD is currently available as a single-click installer for Windows XP, Vista and 7, MacOSX (10.5 and 10.6), RedHat 3, 4 and 5, as well as Solaris 10 (x86 and x86_64/amd64 on all platforms). 
All versions of EPD (32 and 64-bit) are free for academic use. An annual subscription including installation support is available for individual and commercial use. Additional support options, including customization, bug fixes and training classes are also available: http://www.enthought.com/products/epd_sublevels.php - Ilan From jeffspencerd at gmail.com Fri Jul 8 03:35:18 2011 From: jeffspencerd at gmail.com (Jeffrey Spencer) Date: Fri, 8 Jul 2011 17:35:18 +1000 Subject: [Numpy-discussion] Compiling numpy on Red-Hat Import Error with lapack_lite.so Message-ID: That actually makes sense because I am not sure the gnu that it was compiled with but I think it is different. I have since compiled gcc myself, then python, and atlas libraries. Then I tried to install numpy. It go tthrough the install no worries and found the correct libraries. It stuffed when I tried to import it with this error: >>> import numpy Traceback (most recent call last): File "", line 1, in File "/home/jspender/lib/python2.6/site-packages/numpy/__init__.py", line 137, in import add_newdocs File "/home/jspender/lib/python2.6/site-packages/numpy/add_newdocs.py", line 9, in from numpy.lib import add_newdoc File "/home/jspender/lib/python2.6/site-packages/numpy/lib/__init__.py", line 13, in from polynomial import * File "/home/jspender/lib/python2.6/site-packages/numpy/lib/polynomial.py", line 17, in from numpy.linalg import eigvals, lstsq File "/home/jspender/lib/python2.6/site-packages/numpy/linalg/__init__.py", line 48, in from linalg import * File "/home/jspender/lib/python2.6/site-packages/numpy/linalg/linalg.py", line 23, in from numpy.linalg import lapack_lite ImportError: /home/jspender/lib/python2.6/site-packages/numpy/linalg/lapack_lite.so: undefined symbol: _gfortran_concat_string >>> Any ideas??? Cheers, Jeff On Fri, Jul 8, 2011 at 12:37 AM, Bruce Southey wrote: > On 07/07/2011 05:23 AM, Jeffrey Spencer wrote: > > The error is below: > > > > creating build/temp.linux-x86_64-2.6/numpy/core/blasdot > > compile options: '-DATLAS_INFO="\"None\"" -Inumpy/core/blasdot > > -Inumpy/core/include > > -Ibuild/src.linux-x86_64-2.6/numpy/core/include/numpy > > -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core > > -Inumpy/core/src/npymath -Inumpy/core/src/multiarray > > -Inumpy/core/src/umath -Inumpy/core/include > > -I/home/jspender/include/python2.6 > > -Ibuild/src.linux-x86_64-2.6/numpy/core/src/multiarray > > -Ibuild/src.linux-x86_64-2.6/numpy/core/src/umath -c' > > gcc: numpy/core/blasdot/_dotblas.c > > numpy/core/blasdot/_dotblas.c: In function ?dotblas_matrixproduct?: > > numpy/core/blasdot/_dotblas.c:239: warning: comparison of distinct > > pointer types lacks a cast > > numpy/core/blasdot/_dotblas.c:257: warning: passing argument 3 of > > ?*(PyArray_API + 2240u)? from incompatible pointer type > > numpy/core/blasdot/_dotblas.c:292: warning: passing argument 3 of > > ?*(PyArray_API + 2240u)? 
from incompatible pointer type > > gcc -pthread -shared > > build/temp.linux-x86_64-2.6/numpy/core/blasdot/_dotblas.o > > -L/usr/local/lib -Lbuild/temp.linux-x86_64-2.6 -lf77blas -lcblas -latlas > > -o build/lib.linux-x86_64-2.6/numpy/core/_dotblas.so > > /usr/bin/ld: skipping incompatible /usr/local/lib/libf77blas.a when > > searching for -lf77blas > > /usr/bin/ld: skipping incompatible /usr/local/lib/libf77blas.a when > > searching for -lf77blas > > /usr/bin/ld: cannot find -lf77blas > > collect2: ld returned 1 exit status > > /usr/bin/ld: skipping incompatible /usr/local/lib/libf77blas.a when > > searching for -lf77blas > > /usr/bin/ld: skipping incompatible /usr/local/lib/libf77blas.a when > > searching for -lf77blas > > /usr/bin/ld: cannot find -lf77blas > > collect2: ld returned 1 exit status > > error: Command "gcc -pthread -shared > > build/temp.linux-x86_64-2.6/numpy/core/blasdot/_dotblas.o > > -L/usr/local/lib -Lbuild/temp.linux-x86_64-2.6 -lf77blas -lcblas -latlas > > -o build/lib.linux-x86_64-2.6/numpy/core/_dotblas.so" failed with exit > > status 1 > > > > Any help would be appreciated. > > > Python is looking for a 64-bit library as the one in /usr/local/lib/ is > either 32-bit or built with a different compiler version. If you have > the correct library in another location then you need to point numpy to > it or just build everything with the same compiler. > > Bruce > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeffspencerd at gmail.com Fri Jul 8 04:35:59 2011 From: jeffspencerd at gmail.com (Jeffrey Spencer) Date: Fri, 08 Jul 2011 18:35:59 +1000 Subject: [Numpy-discussion] Compiling numpy on Red-Hat Import Error with lapack_lite.so In-Reply-To: References: Message-ID: <4E16C16F.1020804@gmail.com> I'll answer my own question. It was a mix of using two different fortran compilers so specified the option: python setup.py config_fc --fcompiler=gfortran build. All seems to be going well now. On 07/08/2011 05:35 PM, Jeffrey Spencer wrote: > That actually makes sense because I am not sure the gnu that it was > compiled with but I think it is different. I have since compiled gcc > myself, then python, and atlas libraries. Then I tried to install > numpy. It go tthrough the install no worries and found the correct > libraries. It stuffed when I tried to import it with this error: > > >>> import numpy > Traceback (most recent call last): > File "", line 1, in > File "/home/jspender/lib/python2.6/site-packages/numpy/__init__.py", > line 137, in > import add_newdocs > File > "/home/jspender/lib/python2.6/site-packages/numpy/add_newdocs.py", > line 9, in > from numpy.lib import add_newdoc > File > "/home/jspender/lib/python2.6/site-packages/numpy/lib/__init__.py", > line 13, in > from polynomial import * > File > "/home/jspender/lib/python2.6/site-packages/numpy/lib/polynomial.py", > line 17, in > from numpy.linalg import eigvals, lstsq > File > "/home/jspender/lib/python2.6/site-packages/numpy/linalg/__init__.py", > line 48, in > from linalg import * > File > "/home/jspender/lib/python2.6/site-packages/numpy/linalg/linalg.py", > line 23, in > from numpy.linalg import lapack_lite > ImportError: > /home/jspender/lib/python2.6/site-packages/numpy/linalg/lapack_lite.so: undefined > symbol: _gfortran_concat_string > >>> > > Any ideas??? 
> > Cheers, > Jeff > > On Fri, Jul 8, 2011 at 12:37 AM, Bruce Southey > wrote: > > On 07/07/2011 05:23 AM, Jeffrey Spencer wrote: > > The error is below: > > > > creating build/temp.linux-x86_64-2.6/numpy/core/blasdot > > compile options: '-DATLAS_INFO="\"None\"" -Inumpy/core/blasdot > > -Inumpy/core/include > > -Ibuild/src.linux-x86_64-2.6/numpy/core/include/numpy > > -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core > > -Inumpy/core/src/npymath -Inumpy/core/src/multiarray > > -Inumpy/core/src/umath -Inumpy/core/include > > -I/home/jspender/include/python2.6 > > -Ibuild/src.linux-x86_64-2.6/numpy/core/src/multiarray > > -Ibuild/src.linux-x86_64-2.6/numpy/core/src/umath -c' > > gcc: numpy/core/blasdot/_dotblas.c > > numpy/core/blasdot/_dotblas.c: In function ?dotblas_matrixproduct?: > > numpy/core/blasdot/_dotblas.c:239: warning: comparison of distinct > > pointer types lacks a cast > > numpy/core/blasdot/_dotblas.c:257: warning: passing argument 3 of > > ?*(PyArray_API + 2240u)? from incompatible pointer type > > numpy/core/blasdot/_dotblas.c:292: warning: passing argument 3 of > > ?*(PyArray_API + 2240u)? from incompatible pointer type > > gcc -pthread -shared > > build/temp.linux-x86_64-2.6/numpy/core/blasdot/_dotblas.o > > -L/usr/local/lib -Lbuild/temp.linux-x86_64-2.6 -lf77blas -lcblas > -latlas > > -o build/lib.linux-x86_64-2.6/numpy/core/_dotblas.so > > /usr/bin/ld: skipping incompatible /usr/local/lib/libf77blas.a when > > searching for -lf77blas > > /usr/bin/ld: skipping incompatible /usr/local/lib/libf77blas.a when > > searching for -lf77blas > > /usr/bin/ld: cannot find -lf77blas > > collect2: ld returned 1 exit status > > /usr/bin/ld: skipping incompatible /usr/local/lib/libf77blas.a when > > searching for -lf77blas > > /usr/bin/ld: skipping incompatible /usr/local/lib/libf77blas.a when > > searching for -lf77blas > > /usr/bin/ld: cannot find -lf77blas > > collect2: ld returned 1 exit status > > error: Command "gcc -pthread -shared > > build/temp.linux-x86_64-2.6/numpy/core/blasdot/_dotblas.o > > -L/usr/local/lib -Lbuild/temp.linux-x86_64-2.6 -lf77blas -lcblas > -latlas > > -o build/lib.linux-x86_64-2.6/numpy/core/_dotblas.so" failed > with exit > > status 1 > > > > Any help would be appreciated. > > > Python is looking for a 64-bit library as the one in > /usr/local/lib/ is > either 32-bit or built with a different compiler version. If you have > the correct library in another location then you need to point > numpy to > it or just build everything with the same compiler. > > Bruce > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Fri Jul 8 08:15:53 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 8 Jul 2011 13:15:53 +0100 Subject: [Numpy-discussion] Missing Values Discussion In-Reply-To: <4B373853-CDDE-46FC-A318-972E72E2CDA6@enthought.com> References: <4B373853-CDDE-46FC-A318-972E72E2CDA6@enthought.com> Message-ID: Hi Travis, On Fri, Jul 8, 2011 at 5:03 AM, Travis Oliphant wrote: > > Hi all, > > I want to first apologize for stepping into this discussion a bit late and for not being able to participate adequately. ? However, I want to offer a couple of perspectives, and my opinion about what we should do as well as clarify what I have instructed Mark to do as part of his summer work. 
> > First, the discussion has reminded me how valuable it is to get feedback from all points of view. ?While it does lengthen the process, it significantly enhances the result. ?I strongly hope we can continue the tradition of respectful discussion on this mailing list where people's views are treated with respect --- even if we don't always have the time to understand them in depth. > > I also really appreciate people taking the time to visit on the phone call with me as it gave me a chance to understand many opinions quickly and at least start to form a possibly useful opinion. > > Basically, because there is not consensus and in fact a strong and reasonable opposition to specific points, Mark's NEP as proposed cannot be accepted in its entirety right now. ? However, ?I believe an implementation of his NEP is useful and will be instructive in resolving the issues and so I have instructed him to spend Enthought time on the implementation. ? Any changes that need to be made to the API before it is accepted into a released form of NumPy can still be made even after most of the implementation is completed as far as I understand it. ? This is because most of the disagreement is about the specific ability to manipulate the masks independently of assigning missing data and the creation of an additional np.HIDE (np.IGNORE) concept at the Python level. > > Despite some powerful arguments on both sides of the discussion, I am confident that we can figure out an elegant solution that will work long term. > > My current opinion is that I am very favorable to making easy the use-case that has been repeatedly described of having "missing data" that is *always* missing and then having "hidden data" that you don't want to think about for a particular set of calculations (but you also don't want to through away by over-writing). ? I think it is important to make it easy to keep that data around without over-writing but also have the "idea" of that kind of missing data different than the idea of data you can't care about because it just isn't there. > > I also think it is important for the calculation infrastructure to have just one notion of "missing data" which Mark's NEP handles beautifully. > > It seems to me that some of the disagreement is one of perspective in that Mark articulates very well the position of "generic programming, make-opaque-the-implementation" perspective with a focus on the implications of missing data for calculations. ? ?Nathaniel and Matthew articulate well the perspective of "focusing" on the data object itself and the desire to keep separate the different ideas behind missing data that have been described --- as well as a powerfully described description of the NumPy tradition of exposing the raw data to the Python side without hiding too much of the implementation from the user. > > I think it's a healthy discussion. ? But, I would like to see Mark's code get completed so that we can start talking about code examples. ? Please don't interpret my instructing Mark to finish the code as "it's been decided". ?I simply think it's the best path forward to ultimately resolving the concerns. ? I would like to see an API worked out before summer's end --- and I'm hopeful everyone will be excited about what the resulting design is. > > I do think there is room for agreement in the present debate if we all remember to keep listening to each other. ?It takes a lot of effort to understand somebody else's point of view. 
?I have been grateful to see evidence I see of that behavior multiple times (in Mark's revamping of the NEP, in Matthew Brett's re-statement of his interpretation of Mark's views, in Nathaniel's working hard to engage the dialogue even in the throes of finishing his PhD, and many other examples). > > It makes me very happy to be a part of this community. ?I look forward to times when I can send more thoughtful and technical emails than this one. Thanks for this email - it is very helpful. Personally I was worrying that: A) Mark had not fully grasped our concern B) Disagreement was not welcome and this gave me an uncomfortable feeling about A) the resulting API and B) the discussion. You've dealt with both here, and thank you for that. Can I ask - what do you recommend that we do now, for the discussion? Should we be quiet and wait until there is code to test, or, as Nathaniel has tried to do, work at reaching some compromise that makes sense to some or all parties? Thanks again, Matthew From rblove_lists at comcast.net Fri Jul 8 09:21:17 2011 From: rblove_lists at comcast.net (Robert Love) Date: Fri, 8 Jul 2011 08:21:17 -0500 Subject: [Numpy-discussion] ANN: EPD 7.1 released In-Reply-To: References: Message-ID: How does this match up with the recently announced release of ETS-4.0? Are the versions of the python modules the same? On Jul 8, 2011, at 12:37 AM, Ilan Schnell wrote: > Hello, > > I am pleased to announce that EPD (Enthought Python Distribution) > version 7.1 has been released. The most significant change is the > addition of an "EPD Free" version, which has its own very liberal > license, and can be downloaded and used free of any charge by > anyone (not only academics). "EPD Free" includes a subset of the > packages included in the full EPD. The highlights of this subset are: > numpy, scipy, matplotlib, traits and chaco. To see which libraries > are included in the free vs. full version, please see: > > http://www.enthought.com/products/epdlibraries.php > > In addition we have opened our PyPI build mirror for everyone. > This means that one can type "enpkg xyz" for 10,000+ packages. > However, there are still benefits to becoming an EPD subscriber. > > http://www.enthought.com/products/getepd.php > > Apart from the addition of "EPD Free", this release includes updates > to over 30 packages, including numpy, scipy, ipython and ETS. > We have also added PySide, Qt and MDP to this release. Please find the > complete list of additions, updates and bug fixes in the change log: > > http://www.enthought.com/products/changelog.php > > > About EPD > --------- > The Enthought Python Distribution (EPD) is a "kitchen-sink-included" > distribution of the Python programming language, including over 90 > additional tools and libraries. The EPD bundle includes NumPy, SciPy, > IPython, 2D and 3D visualization, and many other tools. > > EPD is currently available as a single-click installer for Windows XP, > Vista and 7, MacOSX (10.5 and 10.6), RedHat 3, 4 and 5, as well as > Solaris 10 (x86 and x86_64/amd64 on all platforms). > > All versions of EPD (32 and 64-bit) are free for academic use. An > annual subscription including installation support is available for > individual and commercial use. 
Additional support options, including > customization, bug fixes and training classes are also available: > > http://www.enthought.com/products/epd_sublevels.php > > - Ilan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From bsouthey at gmail.com Fri Jul 8 09:22:24 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 08 Jul 2011 08:22:24 -0500 Subject: [Numpy-discussion] Missing Values Discussion In-Reply-To: References: <4B373853-CDDE-46FC-A318-972E72E2CDA6@enthought.com> Message-ID: <4E170490.8050702@gmail.com> On 07/08/2011 07:15 AM, Matthew Brett wrote: > Hi Travis, > > On Fri, Jul 8, 2011 at 5:03 AM, Travis Oliphant wrote: >> Hi all, >> >> I want to first apologize for stepping into this discussion a bit late and for not being able to participate adequately. However, I want to offer a couple of perspectives, and my opinion about what we should do as well as clarify what I have instructed Mark to do as part of his summer work. >> >> First, the discussion has reminded me how valuable it is to get feedback from all points of view. While it does lengthen the process, it significantly enhances the result. I strongly hope we can continue the tradition of respectful discussion on this mailing list where people's views are treated with respect --- even if we don't always have the time to understand them in depth. >> >> I also really appreciate people taking the time to visit on the phone call with me as it gave me a chance to understand many opinions quickly and at least start to form a possibly useful opinion. >> >> Basically, because there is not consensus and in fact a strong and reasonable opposition to specific points, Mark's NEP as proposed cannot be accepted in its entirety right now. However, I believe an implementation of his NEP is useful and will be instructive in resolving the issues and so I have instructed him to spend Enthought time on the implementation. Any changes that need to be made to the API before it is accepted into a released form of NumPy can still be made even after most of the implementation is completed as far as I understand it. This is because most of the disagreement is about the specific ability to manipulate the masks independently of assigning missing data and the creation of an additional np.HIDE (np.IGNORE) concept at the Python level. >> >> Despite some powerful arguments on both sides of the discussion, I am confident that we can figure out an elegant solution that will work long term. >> >> My current opinion is that I am very favorable to making easy the use-case that has been repeatedly described of having "missing data" that is *always* missing and then having "hidden data" that you don't want to think about for a particular set of calculations (but you also don't want to through away by over-writing). I think it is important to make it easy to keep that data around without over-writing but also have the "idea" of that kind of missing data different than the idea of data you can't care about because it just isn't there. >> >> I also think it is important for the calculation infrastructure to have just one notion of "missing data" which Mark's NEP handles beautifully. >> >> It seems to me that some of the disagreement is one of perspective in that Mark articulates very well the position of "generic programming, make-opaque-the-implementation" perspective with a focus on the implications of missing data for calculations. 
Nathaniel and Matthew articulate well the perspective of "focusing" on the data object itself and the desire to keep separate the different ideas behind missing data that have been described --- as well as a powerfully described description of the NumPy tradition of exposing the raw data to the Python side without hiding too much of the implementation from the user. >> >> I think it's a healthy discussion. But, I would like to see Mark's code get completed so that we can start talking about code examples. Please don't interpret my instructing Mark to finish the code as "it's been decided". I simply think it's the best path forward to ultimately resolving the concerns. I would like to see an API worked out before summer's end --- and I'm hopeful everyone will be excited about what the resulting design is. >> >> I do think there is room for agreement in the present debate if we all remember to keep listening to each other. It takes a lot of effort to understand somebody else's point of view. I have been grateful to see evidence I see of that behavior multiple times (in Mark's revamping of the NEP, in Matthew Brett's re-statement of his interpretation of Mark's views, in Nathaniel's working hard to engage the dialogue even in the throes of finishing his PhD, and many other examples). >> >> It makes me very happy to be a part of this community. I look forward to times when I can send more thoughtful and technical emails than this one. > Thanks for this email - it is very helpful. > > Personally I was worrying that: > > A) Mark had not fully grasped our concern > B) Disagreement was not welcome > > and this gave me an uncomfortable feeling about A) the resulting API > and B) the discussion. You've dealt with both here, and thank you for > that. > > Can I ask - what do you recommend that we do now, for the discussion? > Should we be quiet and wait until there is code to test, or, as > Nathaniel has tried to do, work at reaching some compromise that makes > sense to some or all parties? > > Thanks again, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion I agree that this has been very interesting discussion especially the great interaction between everyone. The one thing that we do need now is the code that implements the small set of core ideas (array creation and simple numerical operations). Hopefully that will provide a better grasp of the concepts and the performance differences to determine the acceptability of the approach(es). Bruce From ischnell at enthought.com Fri Jul 8 09:57:41 2011 From: ischnell at enthought.com (Ilan Schnell) Date: Fri, 8 Jul 2011 08:57:41 -0500 Subject: [Numpy-discussion] ANN: EPD 7.1 released In-Reply-To: References: Message-ID: I'm not sure what you mean, when you ask if the Python modules are the same. EPD 7.1 includes ETS 4.0. - Ilan On Fri, Jul 8, 2011 at 8:21 AM, Robert Love wrote: > > How does this match up with the recently announced release of ETS-4.0? ?Are the versions of the python modules the same? > > > > On Jul 8, 2011, at 12:37 AM, Ilan Schnell wrote: > >> Hello, >> >> I am pleased to announce that EPD (Enthought Python Distribution) >> version 7.1 has been released. ?The most significant change is the >> addition of an "EPD Free" version, which has its own very liberal >> license, and can be downloaded and used free of any charge by >> anyone (not only academics). 
?"EPD Free" includes a subset of the >> packages included in the full EPD. ?The highlights of this subset are: >> numpy, scipy, matplotlib, traits and chaco. ?To see which libraries >> are included in the free vs. full version, please see: >> >> ? ? ? ?http://www.enthought.com/products/epdlibraries.php >> >> In addition we have opened our PyPI build mirror for everyone. >> This means that one can type "enpkg xyz" for 10,000+ packages. >> However, there are still benefits to becoming an EPD subscriber. >> >> ? ? ? ?http://www.enthought.com/products/getepd.php >> >> Apart from the addition of "EPD Free", this release includes updates >> to over 30 packages, including numpy, scipy, ipython and ETS. >> We have also added PySide, Qt and MDP to this release. ?Please find the >> complete list of additions, updates and bug fixes in the change log: >> >> ? ? ? ?http://www.enthought.com/products/changelog.php >> >> >> About EPD >> --------- >> The Enthought Python Distribution (EPD) is a "kitchen-sink-included" >> distribution of the Python programming language, including over 90 >> additional tools and libraries. The EPD bundle includes NumPy, SciPy, >> IPython, 2D and 3D visualization, and many other tools. >> >> EPD is currently available as a single-click installer for Windows XP, >> Vista and 7, MacOSX (10.5 and 10.6), RedHat 3, 4 and 5, as well as >> Solaris 10 (x86 and x86_64/amd64 on all platforms). >> >> All versions of EPD (32 and 64-bit) are free for academic use. ?An >> annual subscription including installation support is available for >> individual and commercial use. ?Additional support options, including >> customization, bug fixes and training classes are also available: >> >> ? ? ? ?http://www.enthought.com/products/epd_sublevels.php >> >> - Ilan >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From matthew.brett at gmail.com Fri Jul 8 09:58:42 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 8 Jul 2011 14:58:42 +0100 Subject: [Numpy-discussion] Missing Values Discussion In-Reply-To: <4E170490.8050702@gmail.com> References: <4B373853-CDDE-46FC-A318-972E72E2CDA6@enthought.com> <4E170490.8050702@gmail.com> Message-ID: Hi, Just checking - but is this: On Fri, Jul 8, 2011 at 2:22 PM, Bruce Southey wrote: ... > The one thing that we do need now is the code that implements the small > set of core ideas (array creation and simple numerical operations). > Hopefully that will provide a better grasp of the concepts and the > performance differences to determine the acceptability of the approach(es). in reference to this: > On 07/08/2011 07:15 AM, Matthew Brett wrote: ... >> Can I ask - what do you recommend that we do now, for the discussion? >> Should we be quiet and wait until there is code to test, or, as >> Nathaniel has tried to do, work at reaching some compromise that makes >> sense to some or all parties? ? 
Cheers, Matthew From derek at astro.physik.uni-goettingen.de Fri Jul 8 10:17:16 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Fri, 8 Jul 2011 16:17:16 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 In-Reply-To: <2486ACC5-C5BA-42C3-9E70-3168FE4D7A48@post.harvard.edu> References: <4E15BE8C.1090202@gmail.com> <2486ACC5-C5BA-42C3-9E70-3168FE4D7A48@post.harvard.edu> Message-ID: <8D5FEB83-7BEC-4AB6-B63A-58E25F80372E@astro.physik.uni-goettingen.de> On 07.07.2011, at 7:16PM, Robert Pyle wrote: > .............../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py:1922: RuntimeWarning: invalid value encountered in absolute > return all(less_equal(absolute(x-y), atol + rtol * absolute(y))) > > > Everything else completes with 3 KNOWNFAILs and 1 SKIP. This warning is not new to this release; I've seen it before but haven't tried tracking it down until today. > > It arises in allclose(). The comments state "If either array contains NaN, then False is returned." but no test for NaN is done, and NaNs are indeed what cause the warning. > > Inserting > > if any(isnan(x)) or any(isnan(y)): > return False > > before current line number 1916 in numeric.py seems to fix it. The same warning is still present in the current master, I just never paid attention to it because the tests still pass (it does correctly identify NaNs because they are not less_equal the tolerance), but of course this should be properly fixed as you suggest. Cheers, Derek From bergstrj at iro.umontreal.ca Fri Jul 8 11:15:28 2011 From: bergstrj at iro.umontreal.ca (James Bergstra) Date: Fri, 8 Jul 2011 11:15:28 -0400 Subject: [Numpy-discussion] potential bug in PyArray_MoveInto and PyArray_CopyInto? In-Reply-To: References: Message-ID: On Thu, Jul 7, 2011 at 4:59 PM, James Bergstra wrote: > On Thu, Jul 7, 2011 at 1:10 PM, Charles R Harris > wrote: >> >> >> On Thu, Jul 7, 2011 at 11:03 AM, James Bergstra >> wrote: >>> >>> In numpy 1.5.1, ?the functions PyArray_MoveInto and PyArray_CopyInto >>> don't appear to treat strides correctly. >>> >>> Evidence: >>> PyNumber_InPlaceAdd(dst, src), and modifies the correct subarray to >>> which dst points. >>> >>> In the same context, PyArray_MoveInto(dst, src) modifies the first two >>> rows of the >>> underlying matrix instead of the first two columns. PyArray_CopyInto >>> does the same. >>> >>> Is there something subtle going on here? >>> >> >> What are the strides/dims in src and dst? >> >> Chuck >> > > In dst: strides = (40,8), dims=(5,2) > in src: strides = () dims=() > > dst was sliced out of a 5x5 array of doubles. > src is a 0-d array > > James > -- > http://www-etud.iro.umontreal.ca/~bergstrj > I figured it out - I had forgotten to call PyArray_UpdateFlags after adjusting some strides. James -- http://www-etud.iro.umontreal.ca/~bergstrj From sturla at molden.no Fri Jul 8 11:20:44 2011 From: sturla at molden.no (Sturla Molden) Date: Fri, 08 Jul 2011 17:20:44 +0200 Subject: [Numpy-discussion] New arrays in 1.6 not always C-contiguous In-Reply-To: <201107071324.p67DOT8L031257@lotus.yokuts.org> References: <1310040632.1736.85.camel@casimir> <201107071324.p67DOT8L031257@lotus.yokuts.org> Message-ID: <4E17204C.2030905@molden.no> Den 07.07.2011 15:24, skrev Yoshi Rokuko: > thank you for pointing that out! > > so how do you change your numpy related c code now, would you like to share? 
> Regardless or memory layout, we can always access element array[i,j,k] like this: const int s0 = array->strides[0]; const int s1 = array->strides[1]; const int s2 = array->strides[2]; char *const data = array->data; dtype *element = (dtype*)(data + i*s0 + j*s1 + k*s2); To force a particular layout, I usually call "np.ascontiguousarray" or "np.asfortranarray" in Python or Cython before calling into C or Fortran. These functions will do nothing if the layout is already correct. Sturla From sturla at molden.no Fri Jul 8 12:50:39 2011 From: sturla at molden.no (Sturla Molden) Date: Fri, 08 Jul 2011 18:50:39 +0200 Subject: [Numpy-discussion] New arrays in 1.6 not always C-contiguous In-Reply-To: <1310040632.1736.85.camel@casimir> References: <1310040632.1736.85.camel@casimir> Message-ID: <4E17355F.4000209@molden.no> Den 07.07.2011 14:10, skrev Jens J?rgen Mortensen: > So, this means I can't count on new arrays being C-contiguous any more. > I guess there is a good reason for this. Work with linear algebra (LAPACK) caused excessive and redundant array transpositions. Arrays would be transposed from C to Fortran order before they were passed to LAPACK, and returned arrays were transposed from Fortran to C order when used in Python. Signal and image processing in SciPy (FFTPACK) suffered from the same issue, as did certain optimization (MINPACK). Computer graphics with OpenGL was similarly impaired. The OpenGL library has a C frontent, but requires that all buffers and matrices are stored in Fortran order. The old behaviour of NumPy was very annoying. Now we can rely on NumPy to always use the most efficient memory layout, unless we request one in particular. Yeah, and it also made NumPy look bad compared to Matlab, which always uses Fortran order for this reason ;-) Sturla From bsouthey at gmail.com Fri Jul 8 13:38:21 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 08 Jul 2011 12:38:21 -0500 Subject: [Numpy-discussion] Missing Values Discussion In-Reply-To: References: <4B373853-CDDE-46FC-A318-972E72E2CDA6@enthought.com> <4E170490.8050702@gmail.com> Message-ID: <4E17408D.80705@gmail.com> On 07/08/2011 08:58 AM, Matthew Brett wrote: > Hi, > > Just checking - but is this: > > On Fri, Jul 8, 2011 at 2:22 PM, Bruce Southey wrote: > ... >> The one thing that we do need now is the code that implements the small >> set of core ideas (array creation and simple numerical operations). >> Hopefully that will provide a better grasp of the concepts and the >> performance differences to determine the acceptability of the approach(es). > in reference to this: > >> On 07/08/2011 07:15 AM, Matthew Brett wrote: > ... >>> Can I ask - what do you recommend that we do now, for the discussion? >>> Should we be quiet and wait until there is code to test, or, as >>> Nathaniel has tried to do, work at reaching some compromise that makes >>> sense to some or all parties? > ? > > Cheers, > > Matthew Simply, I think the time for discussion has passed and it is now time to see the 'cards'. I do not know enough (or anything) about the implementation so I need code to know the actual 'cost' of Mark's idea with real situations. I am also curious on the implementation as 'conditional' unmasking can be used implement some of the missing values ideas. That is unmask all values that do not match some special value like max(int) for int arrays and some IEEE 754 range (like 'Indeterminate') for floats. 
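For illustration, a rough sketch of that kind of sentinel-driven unmasking
with the tools we already have (the choice of max(int8) and NaN as the
sentinels here is only an assumption for the example):

import numpy as np
import numpy.ma as ma

ints = np.array([1, 5, np.iinfo(np.int8).max, 7], dtype=np.int8)
floats = np.array([0.2, np.nan, 3.5])

# hide exactly the elements equal to the chosen integer sentinel
masked_ints = ma.masked_equal(ints, np.iinfo(np.int8).max)
# hide NaN/inf, i.e. the IEEE 754 'indeterminate' style values
masked_floats = ma.masked_invalid(floats)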
The reason is that I have major concerns with handling missing values in integer arrays that Mark's idea hopefully will remove. Bruce From matthew.brett at gmail.com Fri Jul 8 13:55:57 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 8 Jul 2011 18:55:57 +0100 Subject: [Numpy-discussion] Missing Values Discussion In-Reply-To: <4E17408D.80705@gmail.com> References: <4B373853-CDDE-46FC-A318-972E72E2CDA6@enthought.com> <4E170490.8050702@gmail.com> <4E17408D.80705@gmail.com> Message-ID: Hi, On Fri, Jul 8, 2011 at 6:38 PM, Bruce Southey wrote: > On 07/08/2011 08:58 AM, Matthew Brett wrote: >> Hi, >> >> Just checking - but is this: >> >> On Fri, Jul 8, 2011 at 2:22 PM, Bruce Southey ?wrote: >> ... >>> The one thing that we do need now is the code that implements the small >>> set of core ideas (array creation and simple numerical operations). >>> Hopefully that will provide a better grasp of the concepts and the >>> performance differences to determine the acceptability of the approach(es). >> in reference to this: >> >>> On 07/08/2011 07:15 AM, Matthew Brett wrote: >> ... >>>> Can I ask - what do you recommend that we do now, for the discussion? >>>> Should we be quiet and wait until there is code to test, or, as >>>> Nathaniel has tried to do, work at reaching some compromise that makes >>>> sense to some or all parties? >> ? >> >> Cheers, >> >> Matthew > Simply, I think the time for discussion has passed and it is now time to > see the 'cards'. I do not know enough (or anything) about the > implementation so I need code to know the actual 'cost' of Mark's idea > with real situations. Yes, I thought that was what you were saying. I disagree and think that discussion of the type that Nathaniel has started is a useful way to think more clearly and specifically about the API and what can be agreed. Otherwise we will come to the same impasse when Mark's code arrives. If that happens, we'll either lose the code because the merge is refused, or be forced into something that may not be the best way forward. Best, Matthew From bsouthey at gmail.com Fri Jul 8 15:34:53 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 8 Jul 2011 14:34:53 -0500 Subject: [Numpy-discussion] Missing Values Discussion In-Reply-To: References: <4B373853-CDDE-46FC-A318-972E72E2CDA6@enthought.com> <4E170490.8050702@gmail.com> <4E17408D.80705@gmail.com> Message-ID: On Fri, Jul 8, 2011 at 12:55 PM, Matthew Brett wrote: > Hi, > > On Fri, Jul 8, 2011 at 6:38 PM, Bruce Southey wrote: >> On 07/08/2011 08:58 AM, Matthew Brett wrote: >>> Hi, >>> >>> Just checking - but is this: >>> >>> On Fri, Jul 8, 2011 at 2:22 PM, Bruce Southey ?wrote: >>> ... >>>> The one thing that we do need now is the code that implements the small >>>> set of core ideas (array creation and simple numerical operations). >>>> Hopefully that will provide a better grasp of the concepts and the >>>> performance differences to determine the acceptability of the approach(es). >>> in reference to this: >>> >>>> On 07/08/2011 07:15 AM, Matthew Brett wrote: >>> ... >>>>> Can I ask - what do you recommend that we do now, for the discussion? >>>>> Should we be quiet and wait until there is code to test, or, as >>>>> Nathaniel has tried to do, work at reaching some compromise that makes >>>>> sense to some or all parties? >>> ? >>> >>> Cheers, >>> >>> Matthew >> Simply, I think the time for discussion has passed and it is now time to >> see the 'cards'. 
I do not know enough (or anything) about the >> implementation so I need code to know the actual 'cost' of Mark's idea >> with real situations. > > Yes, I thought that was what you were saying. > > I disagree and think that discussion of the type that Nathaniel has > started is a useful way to think more clearly and specifically about > the API and what can be agreed. > > Otherwise we will come to the same impasse when Mark's code arrives. > If that happens, we'll either lose the code because the merge is > refused, or be forced into something that may not be the best way > forward. > > Best, > > Matthew > _______________________________________________ Unfortunately we need code from either side as an API etc. is not sufficient to judge anything. But I do not think we will be forced into anything as in the extreme situation you can keep old versions or fork the code in the really extreme case. Bruce From matthew.brett at gmail.com Fri Jul 8 17:35:16 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 8 Jul 2011 22:35:16 +0100 Subject: [Numpy-discussion] Missing Values Discussion In-Reply-To: References: <4B373853-CDDE-46FC-A318-972E72E2CDA6@enthought.com> <4E170490.8050702@gmail.com> <4E17408D.80705@gmail.com> Message-ID: Hi, On Fri, Jul 8, 2011 at 8:34 PM, Bruce Southey wrote: > On Fri, Jul 8, 2011 at 12:55 PM, Matthew Brett wrote: >> Hi, >> >> On Fri, Jul 8, 2011 at 6:38 PM, Bruce Southey wrote: >>> On 07/08/2011 08:58 AM, Matthew Brett wrote: >>>> Hi, >>>> >>>> Just checking - but is this: >>>> >>>> On Fri, Jul 8, 2011 at 2:22 PM, Bruce Southey ?wrote: >>>> ... >>>>> The one thing that we do need now is the code that implements the small >>>>> set of core ideas (array creation and simple numerical operations). >>>>> Hopefully that will provide a better grasp of the concepts and the >>>>> performance differences to determine the acceptability of the approach(es). >>>> in reference to this: >>>> >>>>> On 07/08/2011 07:15 AM, Matthew Brett wrote: >>>> ... >>>>>> Can I ask - what do you recommend that we do now, for the discussion? >>>>>> Should we be quiet and wait until there is code to test, or, as >>>>>> Nathaniel has tried to do, work at reaching some compromise that makes >>>>>> sense to some or all parties? >>>> ? >>>> >>>> Cheers, >>>> >>>> Matthew >>> Simply, I think the time for discussion has passed and it is now time to >>> see the 'cards'. I do not know enough (or anything) about the >>> implementation so I need code to know the actual 'cost' of Mark's idea >>> with real situations. >> >> Yes, I thought that was what you were saying. >> >> I disagree and think that discussion of the type that Nathaniel has >> started is a useful way to think more clearly and specifically about >> the API and what can be agreed. >> >> Otherwise we will come to the same impasse when Mark's code arrives. >> If that happens, we'll either lose the code because the merge is >> refused, or be forced into something that may not be the best way >> forward. >> >> Best, >> >> Matthew >> _______________________________________________ > > > Unfortunately we need code from either side as an API etc. is not > sufficient to judge anything. If I understand correctly, we are not going to get code from either side, we are only going to get code from one side. I cannot now see how the code will inform the discussion about the API, unless it turns out that the proposed API cannot be implemented. The substantial points are not about memory use or performance, but about how the API should work. 
If you can see some way that the code will inform the discussion, please say, I would honestly be grateful. > But I do not think we will be forced > into anything as in the extreme situation you can keep old versions or > fork the code in the really extreme case. That would be a terrible waste, and potentially damaging to the community, so of course we want to do all we can to avoid those outcomes. Best, Matthew From njs at pobox.com Fri Jul 8 18:04:56 2011 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 8 Jul 2011 15:04:56 -0700 Subject: [Numpy-discussion] gist gist: 1068264 In-Reply-To: References: Message-ID: Hi Bruce, I'm replying on the list instead of on github, to make it easier for others to join in the discussion if they want. [For those joining in: this was a comment posted at https://gist.github.com/1068264 ] On Fri, Jul 8, 2011 at 10:36 AM, bsouthey wrote: > I presume missing float values could be addressed with one of the 'special' ranges such as 'Indeterminate' in IEEE 754 (http://babbage.cs.qc.edu/IEEE-754/References.xhtml). The outcome should be determined by the IEEE special operations. Right. An IEEE 754 double has IIRC about 2^53 distinct bit-patterns that all mean "not a number". A few of these are used to signal different invalid operations: In [20]: hex(np.asarray([np.nan]).view(dtype=np.uint64)[0]) Out[20]: '0x7ff8000000000000L' In [21]: hex(np.log([0]).view(dtype=np.uint64)[0]) Out[21]: '0xfff0000000000000L' In [22]: hex(np.divide([0.], [0,]).view(dtype=np.uint64)[0]) Out[22]: '0xfff8000000000000L' ...but that only accounts for, like, 10 of the 2^53 or something. The rest are simply unused. So what R does, and what we would do for dtype-style NAs, is just pick one of those (ideally the same one R uses), and declare that that is *not* not a number; it's NA. > So my real concern is handling integer arrays: > 1) How will you find where the missing values are in an array? If there is a variable that denotes missing values are present (NA_flags?) then do you have to duplicate code to avoid this searching when an array has no missing values? Each dtype has a bunch of C functions associated with it that say how to do comparisons, assignment, etc. In the miniNEP design, we add a new function to this list called 'isna', which every dtype that wants to support NAs has to define. Yes, this does mean that code which wants to treat NAs separately has to check for and call this function if it's present, but that seems to be inevitable... *all* of the dtype C functions are supposedly optional, so we have to check for them before calling them and do something sensible if they aren't defined. We could define a wrapper that calls the function if its defined, or else just fills the provided buffer with zeros (to mean "there are no NAs), and then code which wanted to avoid a special case could use that. But in general we probably do want to handle arrays that might have NAs differently from arrays which don't have NAs, because if there are no NAs present then it's quicker to skip the handling altogether. That's true for any NA implementation. > 2) What happens if a normal operation equates to that value: If you use max(np.int8), such as when adding 1 to an array with an element of 126 or when overflow occurs: >>>> np.arange(120,127, dtype=np.int8)+2 > array([ 122, ?123, ?124, ?125, ?126, ?127, -128], dtype=int8) > The -128 corresponds to the missing element but is the second to last element now missing? This is worse if the overflow is larger. 
Yeah, in the design as written, overflow (among other things) can create accidental NAs. Which kind of sucks. There are a few options: -- Just live with it. -- We could add a flag like NPY_NA_AUTO_CHECK, and when this flag is set, the ufunc loop runs 'isna' on its output buffer before returning. If there are any NAs there that did not arise from NAs in the input, then it raises an error. (The reason we would want to make it a flag is that this checking is pointless for dtypes like NA-string, and mostly pointless for dtypes like NA-float.) Also, we'd only want to enable this if we were using the NPY_NA_AUTO_UFUNC ufunc-delegation logic, because if you registered a special ufunc loop *specifically for your NA-dtype*, then presumably it knows what it's doing. This would also allow such an NA-dtype-specific ufunc loop to return NAs on purpose if it wanted to. -- Use a dtype that adds a separate flag next to the actual integer to indicate NA-ness, instead of stealing one of the integer's values. So your NA-int8 would actually be 2 bytes, where the first byte was 1 to indicate NA, or 0 to indicate that the second byte contains an actual int8. If you do this with larger integers, say an int32, then you have a choice: you could store your int32 in 8 bytes, in which case arithmetic etc. is fast, but you waste a bit of memory. Or you could store your int32 in 5 bytes, in which case arithmetic etc. become somewhat slower, but you don't waste any memory. (This latter case would basically be like using an unaligned or byteswapped array in current numpy, in terms of mechanisms and speed.) -- Nothing in this design rules out a second implementation of NAs based on masking. Personally, as you know, I'm not a big fan, but if it were added anyway, then you could use that for your integers as well. A related issue is, of the many ways we *can* do integer NA-dtype, which one *should* we do by default. I don't have a strong opinion, really; I haven't heard anyone say that they have huge quantities of integer-plus-NA data that they want to manipulate and memory/speed/allowing the full range of values are all really important. (Maybe that's you?) In the design as written, they're all pretty trivial to implement (you just tweak a few magic numbers in the dtype structure), and probably we should support all of them via more-or-less exotic invocations of np.withNA. (E.g., 'np.withNA(np.int32, useflag=True, flagsize=1)' to get a 5-byte int32.) ...I kind of like that NPY_NA_AUTO_CHECK idea, it's pretty clean and would definitely make things safer. I think I'll add it. -- Nathaniel From mwwiebe at gmail.com Fri Jul 8 19:31:43 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 8 Jul 2011 18:31:43 -0500 Subject: [Numpy-discussion] code review request: masked dtype transfers Message-ID: I've just made pull request 105: https://github.com/numpy/numpy/pull/105 This adds public API PyArray_MaskedCopyInto and PyArray_MaskedMoveInto, which behave analogously to the corresponding unmasked functions. To expose this with a reasonable interface, I added a function np.copyto, which takes a 'where=' parameter just like the element-wise ufuncs. One thing which needs discussion is that I've flagged 'putmask' and PyArray_PutMask as deprecated, because 'copyto' PyArray_MaskedMoveInto handle what those functions do but in a more flexible fashion. If there are any objections to deprecating 'putmask' and PyArray_PutMask, please speak up! Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From efiring at hawaii.edu Fri Jul 8 21:48:14 2011 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 08 Jul 2011 15:48:14 -1000 Subject: [Numpy-discussion] code review request: masked dtype transfers In-Reply-To: References: Message-ID: <4E17B35E.6060900@hawaii.edu> On 07/08/2011 01:31 PM, Mark Wiebe wrote: > I've just made pull request 105: > > https://github.com/numpy/numpy/pull/105 > > This adds public API PyArray_MaskedCopyInto and PyArray_MaskedMoveInto, > which behave analogously to the corresponding unmasked functions. To > expose this with a reasonable interface, I added a function np.copyto, > which takes a 'where=' parameter just like the element-wise ufuncs. > > One thing which needs discussion is that I've flagged 'putmask' and > PyArray_PutMask as deprecated, because 'copyto' PyArray_MaskedMoveInto > handle what those functions do but in a more flexible fashion. If there > are any objections to deprecating 'putmask' and PyArray_PutMask, please > speak up! > > Thanks, > Mark Mark, I thought I would do a test comparison of putmask and copyto, so I fetched and checked out your branch and tried to build it (after deleting my build directory), but the build failed: numpy/core/src/multiarray/multiarraymodule_onefile.c:41:20: fatal error: nditer.c: No such file or directory compilation terminated. error: Command "gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c numpy/core/src/multiarray/multiarraymodule_onefile.c -o build/temp.linux-x86_64-2.7/numpy/core/src/multiarray/multiarraymodule_onefile.o" failed with exit status 1 Indeed, with rgrep I see: ./numpy/core/src/multiarray/multiarraymodule_onefile.c:#include "nditer.c" but no sign of nditer.c in the directory tree. Eric From charlesr.harris at gmail.com Fri Jul 8 22:22:51 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 8 Jul 2011 20:22:51 -0600 Subject: [Numpy-discussion] code review request: masked dtype transfers In-Reply-To: <4E17B35E.6060900@hawaii.edu> References: <4E17B35E.6060900@hawaii.edu> Message-ID: On Fri, Jul 8, 2011 at 7:48 PM, Eric Firing wrote: > On 07/08/2011 01:31 PM, Mark Wiebe wrote: > > I've just made pull request 105: > > > > https://github.com/numpy/numpy/pull/105 > > > > This adds public API PyArray_MaskedCopyInto and PyArray_MaskedMoveInto, > > which behave analogously to the corresponding unmasked functions. To > > expose this with a reasonable interface, I added a function np.copyto, > > which takes a 'where=' parameter just like the element-wise ufuncs. > > > > One thing which needs discussion is that I've flagged 'putmask' and > > PyArray_PutMask as deprecated, because 'copyto' PyArray_MaskedMoveInto > > handle what those functions do but in a more flexible fashion. If there > > are any objections to deprecating 'putmask' and PyArray_PutMask, please > > speak up! 
> > > > Thanks, > > Mark > > Mark, > > I thought I would do a test comparison of putmask and copyto, so I > fetched and checked out your branch and tried to build it (after > deleting my build directory), but the build failed: > > numpy/core/src/multiarray/multiarraymodule_onefile.c:41:20: fatal error: > nditer.c: No such file or directory > compilation terminated. > error: Command "gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv > -O2 -Wall -Wstrict-prototypes -fPIC -Inumpy/core/include > -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy > -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core > -Inumpy/core/src/npymath -Inumpy/core/src/multiarray > -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include > -I/usr/include/python2.7 > -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray > -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c > numpy/core/src/multiarray/multiarraymodule_onefile.c -o > > build/temp.linux-x86_64-2.7/numpy/core/src/multiarray/multiarraymodule_onefile.o" > failed with exit status 1 > > Indeed, with rgrep I see: > ./numpy/core/src/multiarray/multiarraymodule_onefile.c:#include "nditer.c" > > but no sign of nditer.c in the directory tree. > > This is fixed in master. The way to use it is git co -b pull-105 curl https://github.com/numpy/numpy/pull/105.patch | git am and then build. That will apply the new stuff as a patch on top of current master. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 8 22:28:04 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 8 Jul 2011 20:28:04 -0600 Subject: [Numpy-discussion] code review request: masked dtype transfers In-Reply-To: References: <4E17B35E.6060900@hawaii.edu> Message-ID: On Fri, Jul 8, 2011 at 8:22 PM, Charles R Harris wrote: > > > On Fri, Jul 8, 2011 at 7:48 PM, Eric Firing wrote: > >> On 07/08/2011 01:31 PM, Mark Wiebe wrote: >> > I've just made pull request 105: >> > >> > https://github.com/numpy/numpy/pull/105 >> > >> > This adds public API PyArray_MaskedCopyInto and PyArray_MaskedMoveInto, >> > which behave analogously to the corresponding unmasked functions. To >> > expose this with a reasonable interface, I added a function np.copyto, >> > which takes a 'where=' parameter just like the element-wise ufuncs. >> > >> > One thing which needs discussion is that I've flagged 'putmask' and >> > PyArray_PutMask as deprecated, because 'copyto' PyArray_MaskedMoveInto >> > handle what those functions do but in a more flexible fashion. If there >> > are any objections to deprecating 'putmask' and PyArray_PutMask, please >> > speak up! >> > >> > Thanks, >> > Mark >> >> Mark, >> >> I thought I would do a test comparison of putmask and copyto, so I >> fetched and checked out your branch and tried to build it (after >> deleting my build directory), but the build failed: >> >> numpy/core/src/multiarray/multiarraymodule_onefile.c:41:20: fatal error: >> nditer.c: No such file or directory >> compilation terminated. 
>> error: Command "gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv >> -O2 -Wall -Wstrict-prototypes -fPIC -Inumpy/core/include >> -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy >> -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core >> -Inumpy/core/src/npymath -Inumpy/core/src/multiarray >> -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include >> -I/usr/include/python2.7 >> -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray >> -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c >> numpy/core/src/multiarray/multiarraymodule_onefile.c -o >> >> build/temp.linux-x86_64-2.7/numpy/core/src/multiarray/multiarraymodule_onefile.o" >> failed with exit status 1 >> >> Indeed, with rgrep I see: >> ./numpy/core/src/multiarray/multiarraymodule_onefile.c:#include "nditer.c" >> >> but no sign of nditer.c in the directory tree. >> >> > This is fixed in master. The way to use it is > > git co -b pull-105 > curl https://github.com/numpy/numpy/pull/105.patch | git am > > and then build. That will apply the new stuff as a patch on top of current > master. > > That's in a clone of github.com/numpy/numpy of course. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Fri Jul 8 23:04:09 2011 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 08 Jul 2011 17:04:09 -1000 Subject: [Numpy-discussion] code review request: masked dtype transfers In-Reply-To: References: Message-ID: <4E17C529.7090908@hawaii.edu> On 07/08/2011 01:31 PM, Mark Wiebe wrote: > I've just made pull request 105: > > https://github.com/numpy/numpy/pull/105 > > This adds public API PyArray_MaskedCopyInto and PyArray_MaskedMoveInto, > which behave analogously to the corresponding unmasked functions. To > expose this with a reasonable interface, I added a function np.copyto, > which takes a 'where=' parameter just like the element-wise ufuncs. > > One thing which needs discussion is that I've flagged 'putmask' and > PyArray_PutMask as deprecated, because 'copyto' PyArray_MaskedMoveInto > handle what those functions do but in a more flexible fashion. If there > are any objections to deprecating 'putmask' and PyArray_PutMask, please > speak up! Mark, Looks good! Some quick tests with large and small arrays show copyto is faster than putmask when the source is an array and only a bit slower when the source is a scalar. Eric > > Thanks, > Mark From charlesr.harris at gmail.com Fri Jul 8 23:24:39 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 8 Jul 2011 21:24:39 -0600 Subject: [Numpy-discussion] code review request: masked dtype transfers In-Reply-To: References: Message-ID: On Fri, Jul 8, 2011 at 5:31 PM, Mark Wiebe wrote: > I've just made pull request 105: > > https://github.com/numpy/numpy/pull/105 > > This adds public API PyArray_MaskedCopyInto and PyArray_MaskedMoveInto, > which behave analogously to the corresponding unmasked functions. To expose > this with a reasonable interface, I added a function np.copyto, which takes > a 'where=' parameter just like the element-wise ufuncs. > > One thing which needs discussion is that I've flagged 'putmask' and > PyArray_PutMask as deprecated, because 'copyto' PyArray_MaskedMoveInto > handle what those functions do but in a more flexible fashion. If there are > any objections to deprecating 'putmask' and PyArray_PutMask, please speak > up! > > I think it is OK to deprecate PyArray_PutMask but it should still work. 
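At the Python level the replacement presumably reads like this (a rough,
untested sketch based on the np.copyto described in the pull request):

import numpy as np

a = np.arange(6, dtype=float)
mask = a > 2.5

np.putmask(a, mask, -1.0)          # current function
np.copyto(a, -1.0, where=mask)     # proposed equivalent from pull 105

One difference to keep in mind is that putmask repeats a too-short value
array rather than broadcasting it, so the two are only interchangeable for
scalar or same-shaped values.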
It may be a loooong time before deprecated API functions can be removed, if ever... As to putmask, I don't really have an opinion, but it should probably be reimplemented to use copyto. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 8 23:31:12 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 8 Jul 2011 21:31:12 -0600 Subject: [Numpy-discussion] code review request: masked dtype transfers In-Reply-To: References: Message-ID: On Fri, Jul 8, 2011 at 9:24 PM, Charles R Harris wrote: > > > On Fri, Jul 8, 2011 at 5:31 PM, Mark Wiebe wrote: > >> I've just made pull request 105: >> >> https://github.com/numpy/numpy/pull/105 >> >> This adds public API PyArray_MaskedCopyInto and PyArray_MaskedMoveInto, >> which behave analogously to the corresponding unmasked functions. To expose >> this with a reasonable interface, I added a function np.copyto, which takes >> a 'where=' parameter just like the element-wise ufuncs. >> >> One thing which needs discussion is that I've flagged 'putmask' and >> PyArray_PutMask as deprecated, because 'copyto' PyArray_MaskedMoveInto >> handle what those functions do but in a more flexible fashion. If there are >> any objections to deprecating 'putmask' and PyArray_PutMask, please speak >> up! >> >> > I think it is OK to deprecate PyArray_PutMask but it should still work. It > may be a loooong time before deprecated API functions can be removed, if > ever... As to putmask, I don't really have an opinion, but it should > probably be reimplemented to use copyto. > > One thing about putmask is that it is widely used in masked arrays, so that needs to be fixed before there is a release. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sat Jul 9 01:03:40 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 9 Jul 2011 00:03:40 -0500 Subject: [Numpy-discussion] code review request: masked dtype transfers In-Reply-To: <4E17B35E.6060900@hawaii.edu> References: <4E17B35E.6060900@hawaii.edu> Message-ID: Sorry, looks like I forgot to rebase against master as Chuck pointed out. -Mark On Fri, Jul 8, 2011 at 8:48 PM, Eric Firing wrote: > On 07/08/2011 01:31 PM, Mark Wiebe wrote: > > I've just made pull request 105: > > > > https://github.com/numpy/numpy/pull/105 > > > > This adds public API PyArray_MaskedCopyInto and PyArray_MaskedMoveInto, > > which behave analogously to the corresponding unmasked functions. To > > expose this with a reasonable interface, I added a function np.copyto, > > which takes a 'where=' parameter just like the element-wise ufuncs. > > > > One thing which needs discussion is that I've flagged 'putmask' and > > PyArray_PutMask as deprecated, because 'copyto' PyArray_MaskedMoveInto > > handle what those functions do but in a more flexible fashion. If there > > are any objections to deprecating 'putmask' and PyArray_PutMask, please > > speak up! > > > > Thanks, > > Mark > > Mark, > > I thought I would do a test comparison of putmask and copyto, so I > fetched and checked out your branch and tried to build it (after > deleting my build directory), but the build failed: > > numpy/core/src/multiarray/multiarraymodule_onefile.c:41:20: fatal error: > nditer.c: No such file or directory > compilation terminated. 
> error: Command "gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv > -O2 -Wall -Wstrict-prototypes -fPIC -Inumpy/core/include > -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy > -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core > -Inumpy/core/src/npymath -Inumpy/core/src/multiarray > -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include > -I/usr/include/python2.7 > -Ibuild/src.linux-x86_64-2.7/numpy/core/src/multiarray > -Ibuild/src.linux-x86_64-2.7/numpy/core/src/umath -c > numpy/core/src/multiarray/multiarraymodule_onefile.c -o > > build/temp.linux-x86_64-2.7/numpy/core/src/multiarray/multiarraymodule_onefile.o" > failed with exit status 1 > > Indeed, with rgrep I see: > ./numpy/core/src/multiarray/multiarraymodule_onefile.c:#include "nditer.c" > > but no sign of nditer.c in the directory tree. > > Eric > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sat Jul 9 01:06:34 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 9 Jul 2011 00:06:34 -0500 Subject: [Numpy-discussion] code review request: masked dtype transfers In-Reply-To: <4E17C529.7090908@hawaii.edu> References: <4E17C529.7090908@hawaii.edu> Message-ID: On Fri, Jul 8, 2011 at 10:04 PM, Eric Firing wrote: > On 07/08/2011 01:31 PM, Mark Wiebe wrote: > > I've just made pull request 105: > > > > https://github.com/numpy/numpy/pull/105 > > > > This adds public API PyArray_MaskedCopyInto and PyArray_MaskedMoveInto, > > which behave analogously to the corresponding unmasked functions. To > > expose this with a reasonable interface, I added a function np.copyto, > > which takes a 'where=' parameter just like the element-wise ufuncs. > > > > One thing which needs discussion is that I've flagged 'putmask' and > > PyArray_PutMask as deprecated, because 'copyto' PyArray_MaskedMoveInto > > handle what those functions do but in a more flexible fashion. If there > > are any objections to deprecating 'putmask' and PyArray_PutMask, please > > speak up! > > Mark, > > Looks good! Some quick tests with large and small arrays show copyto is > faster than putmask when the source is an array and only a bit slower > when the source is a scalar. > With a bit of effort into performance optimization, it can probably be faster in the scalar cases as well. Currently, the masked case is always a function which calls the unmasked inner loop for the values that are unmasked. A faster way would be to create inner loops that handle the mask directly. -Mark > > Eric > > > > > Thanks, > > Mark > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dosena at mail.com Sat Jul 9 09:35:21 2011 From: dosena at mail.com (=?ISO-8859-1?Q?Nathaniel_Dos=E9?=) Date: Sat, 09 Jul 2011 15:35:21 +0200 Subject: [Numpy-discussion] non-integer powers in scimath.power Message-ID: <4E185919.2070900@mail.com> Is it true that numpy.lib.scimath.power() only accepts integer powers? >>> np.info(scimath.power) ... >>> Parameters >>> ---------- >>> x : array_like >>> The input value(s). 
>>> p : array_like of ints In any case, scimath.power() doesn't complain when you send in a non-int, and in fact gives a pretty correct answer: >>> scimath.power(-4., .5) (1.2246063538223773e-16+2j) Note I'm wondering about any non-integers, not just .5 Also note that the "normal" numpy power() function accepts non-ints: >>> np.info(np.power) ... >>> Parameters >>> ---------- >>> x1 : array_like >>> The bases. >>> x2 : array_like >>> The exponents. ... >>> np.power(4, .5) 2.0 (Sorry, I tried to follow the source for scimath.py to see what really happens at the C level, but lose the trail as soon as I hit numpy.core.umath...) -Nathaniel From charlesr.harris at gmail.com Sat Jul 9 11:24:02 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 9 Jul 2011 09:24:02 -0600 Subject: [Numpy-discussion] code review request: masked dtype transfers In-Reply-To: References: <4E17C529.7090908@hawaii.edu> Message-ID: On Fri, Jul 8, 2011 at 11:06 PM, Mark Wiebe wrote: > On Fri, Jul 8, 2011 at 10:04 PM, Eric Firing wrote: > >> On 07/08/2011 01:31 PM, Mark Wiebe wrote: >> > I've just made pull request 105: >> > >> > https://github.com/numpy/numpy/pull/105 >> > >> > This adds public API PyArray_MaskedCopyInto and PyArray_MaskedMoveInto, >> > which behave analogously to the corresponding unmasked functions. To >> > expose this with a reasonable interface, I added a function np.copyto, >> > which takes a 'where=' parameter just like the element-wise ufuncs. >> > >> > One thing which needs discussion is that I've flagged 'putmask' and >> > PyArray_PutMask as deprecated, because 'copyto' PyArray_MaskedMoveInto >> > handle what those functions do but in a more flexible fashion. If there >> > are any objections to deprecating 'putmask' and PyArray_PutMask, please >> > speak up! >> >> Mark, >> >> Looks good! Some quick tests with large and small arrays show copyto is >> faster than putmask when the source is an array and only a bit slower >> when the source is a scalar. >> > > With a bit of effort into performance optimization, it can probably be > faster in the scalar cases as well. Currently, the masked case is always a > function which calls the unmasked inner loop for the values that are > unmasked. A faster way would be to create inner loops that handle the mask > directly. > > Are you planning on doing that somewhere down the line? I'm going to push this with some style and typo mods. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Sat Jul 9 13:53:46 2011 From: efiring at hawaii.edu (Eric Firing) Date: Sat, 09 Jul 2011 07:53:46 -1000 Subject: [Numpy-discussion] code review request: masked dtype transfers In-Reply-To: References: Message-ID: <4E1895AA.5000206@hawaii.edu> On 07/08/2011 01:31 PM, Mark Wiebe wrote: > I've just made pull request 105: > > https://github.com/numpy/numpy/pull/105 > It's merged, which is good, but I have a suggestion relevant to that pull and I suspect to many others to come: use defines and macros to consolidate some of the implementation details. For example: #define MASK_TYPE npy_uint8 #define EXPOSE 1 #define HIDE 0 #define EXPOSED(mask) ( (*(MASK_TYPE *)mask)&0x01 == EXPOSE ) etc. The potential advantages are readability, reduction of scope for typos, and ease of testing alternative implementation details, should that turn out to be desirable. I am assuming that only a few expressions like EXPOSED will be needed *many* places in the code. 
Eric From mwwiebe at gmail.com Sat Jul 9 14:29:18 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 9 Jul 2011 13:29:18 -0500 Subject: [Numpy-discussion] code review request: masked dtype transfers In-Reply-To: <4E1895AA.5000206@hawaii.edu> References: <4E1895AA.5000206@hawaii.edu> Message-ID: On Sat, Jul 9, 2011 at 12:53 PM, Eric Firing wrote: > On 07/08/2011 01:31 PM, Mark Wiebe wrote: > > I've just made pull request 105: > > > > https://github.com/numpy/numpy/pull/105 > > > > It's merged, which is good, but I have a suggestion relevant to that > pull and I suspect to many others to come: use defines and macros to > consolidate some of the implementation details. For example: > > #define MASK_TYPE npy_uint8 > #define EXPOSE 1 > #define HIDE 0 > #define EXPOSED(mask) ( (*(MASK_TYPE *)mask)&0x01 == EXPOSE ) > > etc. > > The potential advantages are readability, reduction of scope for typos, > and ease of testing alternative implementation details, should that turn > out to be desirable. I am assuming that only a few expressions like > EXPOSED will be needed *many* places in the code. > That's a great idea, thanks. The one thing I would slightly adjust is to put everything in a 'macro namespace' to avoid global namespace pollution. Maybe: typedef npy_uint8 npy_mask; #define NPY_MASK NPY_UINT8 #define NPY_MASK_ISEXPOSED(mask) (((mask)&0x01) != 0) #define NPY_MASK_GETPAYLOAD(mask) (((npy_mask)mask) >> 1) #define NPY_MASK_MAKEMASK(exposed, payload) ((npy_mask)(exposed&0x01) | (npy_mask)(payload << 1)) -Mark > Eric > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Jul 9 15:38:10 2011 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 9 Jul 2011 12:38:10 -0700 Subject: [Numpy-discussion] code review request: masked dtype transfers In-Reply-To: References: <4E1895AA.5000206@hawaii.edu> Message-ID: On Sat, Jul 9, 2011 at 11:29 AM, Mark Wiebe wrote: > typedef npy_uint8 npy_mask; > #define NPY_MASK NPY_UINT8 > #define NPY_MASK_ISEXPOSED(mask) (((mask)&0x01) != 0) > #define NPY_MASK_GETPAYLOAD(mask) (((npy_mask)mask) >> 1) > #define NPY_MASK_MAKEMASK(exposed, payload) ((npy_mask)(exposed&0x01) | > (npy_mask)(payload << 1)) Even better, these should be inline functions instead of macros... (or is there some horrible old compiler that we care about that that wouldn't work for?) -- Nathaniel From mwwiebe at gmail.com Sat Jul 9 22:24:26 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 9 Jul 2011 21:24:26 -0500 Subject: [Numpy-discussion] code review request: masked dtype transfers In-Reply-To: References: <4E1895AA.5000206@hawaii.edu> Message-ID: On Sat, Jul 9, 2011 at 2:38 PM, Nathaniel Smith wrote: > On Sat, Jul 9, 2011 at 11:29 AM, Mark Wiebe wrote: > > typedef npy_uint8 npy_mask; > > #define NPY_MASK NPY_UINT8 > > #define NPY_MASK_ISEXPOSED(mask) (((mask)&0x01) != 0) > > #define NPY_MASK_GETPAYLOAD(mask) (((npy_mask)mask) >> 1) > > #define NPY_MASK_MAKEMASK(exposed, payload) ((npy_mask)(exposed&0x01) | > > (npy_mask)(payload << 1)) > > Even better, these should be inline functions instead of macros... (or > is there some horrible old compiler that we care about that that > wouldn't work for?) > That's a good idea, it's always worthwhile to use the little bit of type checking the C compiler will let you. 
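Something along these lines, perhaps (just a sketch with placeholder names,
assuming the usual numpy headers and the same bit layout as the macros
above):

typedef npy_uint8 npy_mask;

static NPY_INLINE npy_bool
NpyMask_IsExposed(npy_mask mask)
{
    return (mask & 0x01) != 0;
}

static NPY_INLINE npy_uint8
NpyMask_GetPayload(npy_mask mask)
{
    return (npy_uint8)(mask >> 1);
}

static NPY_INLINE npy_mask
NpyMask_Create(npy_bool exposed, npy_uint8 payload)
{
    return (npy_mask)((exposed & 0x01) | (payload << 1));
}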
NumPy has a macro NPY_INLINE which is used to define inline functions. -Mark > > -- Nathaniel > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Jul 9 22:53:04 2011 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 9 Jul 2011 19:53:04 -0700 Subject: [Numpy-discussion] code review request: masked dtype transfers In-Reply-To: References: <4E1895AA.5000206@hawaii.edu> Message-ID: On Sat, Jul 9, 2011 at 7:24 PM, Mark Wiebe wrote: > On Sat, Jul 9, 2011 at 2:38 PM, Nathaniel Smith wrote: >> >> On Sat, Jul 9, 2011 at 11:29 AM, Mark Wiebe wrote: >> > typedef npy_uint8 npy_mask; >> > #define NPY_MASK NPY_UINT8 >> > #define NPY_MASK_ISEXPOSED(mask) (((mask)&0x01) != 0) >> > #define NPY_MASK_GETPAYLOAD(mask) (((npy_mask)mask) >> 1) >> > #define NPY_MASK_MAKEMASK(exposed, payload) ((npy_mask)(exposed&0x01) | >> > (npy_mask)(payload << 1)) >> >> Even better, these should be inline functions instead of macros... (or >> is there some horrible old compiler that we care about that that >> wouldn't work for?) > > That's a good idea, it's always worthwhile to use the little bit of type > checking the C compiler will let you. NumPy has a macro NPY_INLINE which is > used to define inline functions. Well, type checking, plus avoiding some of the horrible bizarre pitfalls of using macros. Try running NPY_MASK_MAKEMASK(value > 2, payload1 || payload2) with the above definitions... It's a lot easier to just use functions and not have to spend energy checking for that sort of nonsense :-). -- Nathaniel From mwwiebe at gmail.com Sun Jul 10 16:12:06 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sun, 10 Jul 2011 15:12:06 -0500 Subject: [Numpy-discussion] code review request: masked iteration, fix 'where=' in ufuncs, documentation Message-ID: >From the pull request: https://github.com/numpy/numpy/pull/108 The two flags NPY_ITER_WRITEMASKED and NPY_ITER_ARRAYMASK now fully work. This made it easy to fix the ufunc 'where=' bug. Also added documentation of the new iterator flags and inline functions for dealing with masks. Special thanks to Chuck, Eric, and Nathaniel for reviewing and providing feedback on my last pull request. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgohlke at uci.edu Sun Jul 10 21:57:28 2011 From: cgohlke at uci.edu (Christoph Gohlke) Date: Sun, 10 Jul 2011 18:57:28 -0700 Subject: [Numpy-discussion] numpy build issue on i7-2600K CPU Message-ID: <4E1A5888.1030808@uci.edu> Hello, building numpy 1.6.1rc2 on Windows, i7-2600K CPU, with msvc9 failed with the following error: File "numpy/core/setup_common.py", line 271, in long_double_representation raise ValueError("Could not lock sequences (%s)" % saw) ValueError: Could not lock sequences (None) This problem has been mentioned before at . Opening the configtest.obj file in binary mode fixed the issue for me. A patch is attached. Christoph -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: i7-2600k.diff URL: From roy.lowrance at gmail.com Sun Jul 10 22:26:35 2011 From: roy.lowrance at gmail.com (Roy Lowrance) Date: Sun, 10 Jul 2011 22:26:35 -0400 Subject: [Numpy-discussion] FloatingPointError: overflow encountered in multiply Message-ID: I have a 1D float64 array ts. 
I want to square each element, so I compute x = ts * ts I get a floating point overflow error. However, when I access each element separately and multiple, I get no error: for i in ts.shape[0]: -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Sun Jul 10 22:33:12 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Sun, 10 Jul 2011 21:33:12 -0500 Subject: [Numpy-discussion] gist gist: 1068264 In-Reply-To: References: Message-ID: On Fri, Jul 8, 2011 at 5:04 PM, Nathaniel Smith wrote: > Hi Bruce, > > I'm replying on the list instead of on github, to make it easier for > others to join in the discussion if they want. [For those joining in: > this was a comment posted at https://gist.github.com/1068264 ] > > On Fri, Jul 8, 2011 at 10:36 AM, bsouthey wrote: >> I presume missing float values could be addressed with one of the 'special' ranges such as 'Indeterminate' in IEEE 754 (http://babbage.cs.qc.edu/IEEE-754/References.xhtml). The outcome should be determined by the IEEE special operations. > > Right. An IEEE 754 double has IIRC about 2^53 distinct bit-patterns > that all mean "not a number". A few of these are used to signal > different invalid operations: > > In [20]: hex(np.asarray([np.nan]).view(dtype=np.uint64)[0]) > Out[20]: '0x7ff8000000000000L' > In [21]: hex(np.log([0]).view(dtype=np.uint64)[0]) > Out[21]: '0xfff0000000000000L' > In [22]: hex(np.divide([0.], [0,]).view(dtype=np.uint64)[0]) > Out[22]: '0xfff8000000000000L' > > ...but that only accounts for, like, 10 of the 2^53 or something. The > rest are simply unused. So what R does, and what we would do for > dtype-style NAs, is just pick one of those (ideally the same one R > uses), and declare that that is *not* not a number; it's NA. > >> So my real concern is handling integer arrays: >> 1) How will you find where the missing values are in an array? If there is a variable that denotes missing values are present (NA_flags?) then do you have to duplicate code to avoid this searching when an array has no missing values? > > Each dtype has a bunch of C functions associated with it that say how > to do comparisons, assignment, etc. In the miniNEP design, we add a > new function to this list called 'isna', which every dtype that wants > to support NAs has to define. Starting to lose me here because you are adding memory that your miniNep was not meant to do. > > Yes, this does mean that code which wants to treat NAs separately has > to check for and call this function if it's present, but that seems to > be inevitable... *all* of the dtype C functions are supposedly > optional, so we have to check for them before calling them and do > something sensible if they aren't defined. We could define a wrapper > that calls the function if its defined, or else just fills the > provided buffer with zeros (to mean "there are no NAs), and then code > which wanted to avoid a special case could use that. But in general we > probably do want to handle arrays that might have NAs differently from > arrays which don't have NAs, because if there are no NAs present then > it's quicker to skip the handling altogether. That's true for any NA > implementation. Second problem is that we need memory for at least a new function. We also have code duplication that needs to be in sync. 
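To make the duplication concern concrete, the kind of dual path I have in
mind looks something like this (a Python-level sketch only, with np.isnan
standing in for the proposed dtype-level 'isna', which is an assumption and
not existing API):

import numpy as np

def na_aware_sum(a):
    na = np.isnan(a)           # the extra "where are the NAs" pass
    if na.any():
        return a[~na].sum()    # NA-handling path
    return a.sum()             # fast path; the same reduction written twice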
> >> 2) What happens if a normal operation equates to that value: If you use max(np.int8), such as when adding 1 to an array with an element of 126 or when overflow occurs: >>>>> np.arange(120,127, dtype=np.int8)+2 >> array([ 122, ?123, ?124, ?125, ?126, ?127, -128], dtype=int8) >> The -128 corresponds to the missing element but is the second to last element now missing? This is worse if the overflow is larger. > > Yeah, in the design as written, overflow (among other things) can > create accidental NAs. Which kind of sucks. There are a few options: > > -- Just live with it. Unfortunately that is impossible and other choice words. > > -- We could add a flag like NPY_NA_AUTO_CHECK, and when this flag is > set, the ufunc loop runs 'isna' on its output buffer before returning. > If there are any NAs there that did not arise from NAs in the input, > then it raises an error. (The reason we would want to make it a flag > is that this checking is pointless for dtypes like NA-string, and > mostly pointless for dtypes like NA-float.) Also, we'd only want to > enable this if we were using the NPY_NA_AUTO_UFUNC ufunc-delegation > logic, because if you registered a special ufunc loop *specifically > for your NA-dtype*, then presumably it knows what it's doing. This > would also allow such an NA-dtype-specific ufunc loop to return NAs on > purpose if it wanted to. This appears to me as masking. But my issue here is the complexity of the function involved because ensuring that the calculation is correct probably comes with a large performance penalty. > > -- Use a dtype that adds a separate flag next to the actual integer to > indicate NA-ness, instead of stealing one of the integer's values. So > your NA-int8 would actually be 2 bytes, where the first byte was 1 to > indicate NA, or 0 to indicate that the second byte contains an actual > int8. If you do this with larger integers, say an int32, then you have > a choice: you could store your int32 in 8 bytes, in which case > arithmetic etc. is fast, but you waste a bit of memory. Or you could > store your int32 in 5 bytes, in which case arithmetic etc. become > somewhat slower, but you don't waste any memory. (This latter case > would basically be like using an unaligned or byteswapped array in > current numpy, in terms of mechanisms and speed.) But avoiding any increase in memory was one of the benefits of this miniNEP. It really doesn't matter which integer size you use because you still have the same problem. Also, people use int8 or whatever by choice due say memory constraints. > > -- Nothing in this design rules out a second implementation of NAs > based on masking. Personally, as you know, I'm not a big fan, but if > it were added anyway, then you could use that for your integers as > well. > > A related issue is, of the many ways we *can* do integer NA-dtype, > which one *should* we do by default. I don't have a strong opinion, > really; I haven't heard anyone say that they have huge quantities of > integer-plus-NA data that they want to manipulate and > memory/speed/allowing the full range of values are all really > important. (Maybe that's you?) In the design as written, they're all > pretty trivial to implement (you just tweak a few magic numbers in the > dtype structure), and probably we should support all of them via > more-or-less exotic invocations of np.withNA. (E.g., > 'np.withNA(np.int32, useflag=True, flagsize=1)' to get a 5-byte > int32.) I disagree with the comment that this is 'pretty trivial to implement'. 
I do not think that is trivial to implement with acceptable performance and memory costs. > > ...I kind of like that NPY_NA_AUTO_CHECK idea, it's pretty clean and > would definitely make things safer. I think I'll add it. > > -- Nathaniel I am being difficult as I do agree with many of the underlying idea. But I want something that works with acceptable performance and memory usage (there should be minor penalty of having masked elements over no masked elements). I do not find it acceptable when A.dot(B) is slower than first creating an array without NAs: C=A.noNA(), C.dot(B). Thus to me an API is insufficient to address that. Bruce From bsouthey at gmail.com Sun Jul 10 22:52:29 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Sun, 10 Jul 2011 21:52:29 -0500 Subject: [Numpy-discussion] Missing Values Discussion In-Reply-To: References: <4B373853-CDDE-46FC-A318-972E72E2CDA6@enthought.com> <4E170490.8050702@gmail.com> <4E17408D.80705@gmail.com> Message-ID: On Fri, Jul 8, 2011 at 4:35 PM, Matthew Brett wrote: > Hi, > > On Fri, Jul 8, 2011 at 8:34 PM, Bruce Southey wrote: >> On Fri, Jul 8, 2011 at 12:55 PM, Matthew Brett wrote: >>> Hi, >>> >>> On Fri, Jul 8, 2011 at 6:38 PM, Bruce Southey wrote: >>>> On 07/08/2011 08:58 AM, Matthew Brett wrote: >>>>> Hi, >>>>> >>>>> Just checking - but is this: >>>>> >>>>> On Fri, Jul 8, 2011 at 2:22 PM, Bruce Southey ?wrote: >>>>> ... >>>>>> The one thing that we do need now is the code that implements the small >>>>>> set of core ideas (array creation and simple numerical operations). >>>>>> Hopefully that will provide a better grasp of the concepts and the >>>>>> performance differences to determine the acceptability of the approach(es). >>>>> in reference to this: >>>>> >>>>>> On 07/08/2011 07:15 AM, Matthew Brett wrote: >>>>> ... >>>>>>> Can I ask - what do you recommend that we do now, for the discussion? >>>>>>> Should we be quiet and wait until there is code to test, or, as >>>>>>> Nathaniel has tried to do, work at reaching some compromise that makes >>>>>>> sense to some or all parties? >>>>> ? >>>>> >>>>> Cheers, >>>>> >>>>> Matthew >>>> Simply, I think the time for discussion has passed and it is now time to >>>> see the 'cards'. I do not know enough (or anything) about the >>>> implementation so I need code to know the actual 'cost' of Mark's idea >>>> with real situations. >>> >>> Yes, I thought that was what you were saying. >>> >>> I disagree and think that discussion of the type that Nathaniel has >>> started is a useful way to think more clearly and specifically about >>> the API and what can be agreed. >>> >>> Otherwise we will come to the same impasse when Mark's code arrives. >>> If that happens, we'll either lose the code because the merge is >>> refused, or be forced into something that may not be the best way >>> forward. >>> >>> Best, >>> >>> Matthew >>> _______________________________________________ >> >> >> Unfortunately we need code from either side as an API etc. is not >> sufficient to judge anything. > > If I understand correctly, we are not going to get code from either > side, we are only going to get code from one side. The would be very unfortunate indeed. > > I cannot now see how the code will inform the discussion about the > API, unless it turns out that the proposed API cannot be implemented. > ?The substantial points are not about memory use or performance, but > about how the API should work. ?If you can see some way that the code > will inform the discussion, please say, I would honestly be grateful. 
API's are not my area or even a concern. I am an end user so the code has to work correctly with acceptable performance and memory usage. To that end I have know if doing a+b is faster with less memory than first creating new arrays c and d without missing values then doing c+d. The limited understanding with the masked approach is that the former it should be faster than the latter with some acceptable increase in memory usage. With the miniNEP approach, I do not see that there will be benefits because the function will have to find these and handle them appropriately which may be a 'killer' for integer arrays. > >> But I do not think we will be forced >> into anything as in the extreme situation you can keep old versions or >> fork the code in the really extreme case. > > That would be a terrible waste, and potentially damaging to the > community, so of course we want to do all we can to avoid those > outcomes. > > Best, > > Matthew So I have to support anybody that wants to try a new change especially one that would remove my 'bane' of having functions automatically handle masked arrays. Bruce From charlesr.harris at gmail.com Sun Jul 10 23:00:58 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 10 Jul 2011 21:00:58 -0600 Subject: [Numpy-discussion] FloatingPointError: overflow encountered in multiply In-Reply-To: References: Message-ID: On Sun, Jul 10, 2011 at 8:26 PM, Roy Lowrance wrote: > I have a 1D float64 array ts. I want to square each element, so I compute > x = ts * ts > > I get a floating point overflow error. > > However, when I access each element separately and multiple, I get no > error: > for i in ts.shape[0]: > > Data please. The element by element squaring is handled by python, not numpy, so I expect numpy and python handle the errors differently. Catching floating point errors is a bit unreliable in any case. What OS/compiler are you using? Are you running 32 bit or 64 bit? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Jul 11 00:02:01 2011 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 10 Jul 2011 21:02:01 -0700 Subject: [Numpy-discussion] gist gist: 1068264 In-Reply-To: References: Message-ID: Hi Bruce, I think we have some fundamental misunderstandings about what this proposal would do. Let me see if I can try to be clearer. On Sun, Jul 10, 2011 at 7:33 PM, Bruce Southey wrote: > On Fri, Jul 8, 2011 at 5:04 PM, Nathaniel Smith wrote: >> Each dtype has a bunch of C functions associated with it that say how >> to do comparisons, assignment, etc. In the miniNEP design, we add a >> new function to this list called 'isna', which every dtype that wants >> to support NAs has to define. > > Starting to lose me here because you are adding memory that your > miniNep was not meant to do. The memory overhead that people have been worrying about is if they have, say, an 8 gigabyte array full of doubles, are they also going to need a 1 gigabyte array full of mask bytes. These new functions we're talking about are defined just once for each data, per Python invocation. This comes to, at worst, a few kilobytes total. Also, I should say that there were a few motivations for wanting to support dtype-style NAs; memory usage is only one of them. >> Yes, this does mean that code which wants to treat NAs separately has >> to check for and call this function if it's present, but that seems to >> be inevitable... 
*all* of the dtype C functions are supposedly >> optional, so we have to check for them before calling them and do >> something sensible if they aren't defined. We could define a wrapper >> that calls the function if its defined, or else just fills the >> provided buffer with zeros (to mean "there are no NAs), and then code >> which wanted to avoid a special case could use that. But in general we >> probably do want to handle arrays that might have NAs differently from >> arrays which don't have NAs, because if there are no NAs present then >> it's quicker to skip the handling altogether. That's true for any NA >> implementation. > > Second problem is that we need memory for at least a new function. We > also have code duplication that needs to be in sync. Both the masking and dtype ideas for NA support would require new code be written for Numpy to actually implement the functionality, and this code does take a small amount of memory, yes. But that's true for every feature ever. Also, there isn't any code duplication here, at least as far as I can tell. If you want to add a fast-path then that does use a tiny amount more memory, but that's sometimes worth it for speed. Anyway, my point was just that we can and should decide on a case-by-case basis; if a fast-path isn't worth it in some situation, then we shouldn't add it. Any checking you have to do for bit-pattern NAs, you also have to do for masks, and vice-versa. The checking looks slightly different (comparing for some magic NA value in the array versus checking for some special bits in the mask), but the actual work involved is equivalent. >> Yeah, in the design as written, overflow (among other things) can >> create accidental NAs. Which kind of sucks. There are a few options: >> >> -- Just live with it. > > Unfortunately that is impossible and other choice words. Okay. >> -- We could add a flag like NPY_NA_AUTO_CHECK, and when this flag is >> set, the ufunc loop runs 'isna' on its output buffer before returning. >> If there are any NAs there that did not arise from NAs in the input, >> then it raises an error. (The reason we would want to make it a flag >> is that this checking is pointless for dtypes like NA-string, and >> mostly pointless for dtypes like NA-float.) Also, we'd only want to >> enable this if we were using the NPY_NA_AUTO_UFUNC ufunc-delegation >> logic, because if you registered a special ufunc loop *specifically >> for your NA-dtype*, then presumably it knows what it's doing. This >> would also allow such an NA-dtype-specific ufunc loop to return NAs on >> purpose if it wanted to. > > This appears to me as masking. But my issue here is the complexity of > the function involved because ensuring that the calculation is correct > probably comes with a large performance penalty. I'm not sure what you mean about "appears as masking". There would be some overhead for double-checking that output values didn't accidentally produce NAs, yes. Depending on how caching effects worked out, this overhead might be zero; the bottleneck for most array operations is memory, not CPU, and doing these checks wouldn't require any extra CPU. But every solution does have some trade-offs; if there was a perfect solution then we wouldn't have anything to debate :-). The point is that the dtype-NA approach lets you choose which trade-offs you want to make while still being easy to understand. >> -- Use a dtype that adds a separate flag next to the actual integer to >> indicate NA-ness, instead of stealing one of the integer's values. 
So >> your NA-int8 would actually be 2 bytes, where the first byte was 1 to >> indicate NA, or 0 to indicate that the second byte contains an actual >> int8. If you do this with larger integers, say an int32, then you have >> a choice: you could store your int32 in 8 bytes, in which case >> arithmetic etc. is fast, but you waste a bit of memory. Or you could >> store your int32 in 5 bytes, in which case arithmetic etc. become >> somewhat slower, but you don't waste any memory. (This latter case >> would basically be like using an unaligned or byteswapped array in >> current numpy, in terms of mechanisms and speed.) > > But avoiding any increase in memory was one of the benefits of this > miniNEP. It really doesn't matter which integer size you use because > you still have the same problem. Also, people use int8 or whatever by > choice due say memory constraints. If you insist on functionality that requires an increase in memory, then you have to accept an increase in memory. Wanting to be able to store an int8, have the full range of values available, *plus* NA as a 257th value, means that you need to get an extra byte somewhere. I'm just explaining how you do that :-). My point is just that the proposal is flexible enough to make whichever trade-offs you decide are best for your situation, while still being easy to understand. >> A related issue is, of the many ways we *can* do integer NA-dtype, >> which one *should* we do by default. I don't have a strong opinion, >> really; I haven't heard anyone say that they have huge quantities of >> integer-plus-NA data that they want to manipulate and >> memory/speed/allowing the full range of values are all really >> important. (Maybe that's you?) In the design as written, they're all >> pretty trivial to implement (you just tweak a few magic numbers in the >> dtype structure), and probably we should support all of them via >> more-or-less exotic invocations of np.withNA. (E.g., >> 'np.withNA(np.int32, useflag=True, flagsize=1)' to get a 5-byte >> int32.) > > I disagree with the comment that this is 'pretty trivial to > implement'. I do not think that is trivial to implement with > acceptable performance and memory costs. I hope I made clear above the the necessary memory costs you're thinking of are actually basically non-existent. I'm not sure what you mean about it not being trivial to implement. Like I said, giving the different options is literally a matter of tweaking a few fields, and we want to support those fields for other reasons, plus they aren't very complicated to start with. > I am being difficult as I do agree with many of the underlying idea. > But I want something that works with acceptable performance and memory > usage (there should be minor penalty of having masked elements over no > masked elements). I do not find it acceptable when A.dot(B) is slower > than first creating an array without NAs: C=A.noNA(), C.dot(B). Thus > to me an API is insufficient to address that. First, let me say again that this miniNEP is not intended to compete with the masking idea -- they can coexist. But if we do want to choose between the two ideas, speed won't help you make a decision, because they're both going to use very similar code to do the inner loops, and both are going to be about equally fast. (And both should be faster than making a whole array copy! If not, you should complain until someone fixes it...) 
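(To picture the 2-byte flag-plus-value layout I sketched above, a structured dtype is a decent stand-in -- this is not the proposed API, only an illustration of where the extra byte goes:

import numpy as np

na_int8 = np.dtype([('is_na', np.uint8), ('value', np.int8)])

a = np.zeros(5, dtype=na_int8)
a['value'] = np.arange(5, dtype=np.int8)
a[2] = (1, 0)       # mark the third element as NA
a.itemsize          # 2 -- the one extra byte the flag costs per element
a['is_na']          # array([0, 0, 1, 0, 0], dtype=uint8)

The actual dtype would of course hide the flag behind the usual scalar interface.)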
If anything, bit-pattern-NAs might be slightly faster than masking-NAs, because the masking-NAs will force the inner loops to look at two chunks of memory (one for the mask, and one for the values to do the actual computation), while the inner loop for a bit-pattern-NA only needs to look at the values. But again, I suspect this difference will not be measurable in practice. -- Nathaniel From thouis at gmail.com Mon Jul 11 08:43:51 2011 From: thouis at gmail.com (Thouis (Ray) Jones) Date: Mon, 11 Jul 2011 14:43:51 +0200 Subject: [Numpy-discussion] Post-doctoral position in image & data analysis for image-based high-content screening Message-ID: Short summary: A funded post-doctoral position in data analysis, machine-learning, and statistics applied to biological image-based high-content screening is available at the BioPhenics platform of the Curie Institute (Paris, France). The position involves development and maintenance of tools in Python, Matlab, or R that will be used both to analyze current screens and as a contribution to a larger community. Expertise in machine learning or statistical analysis with large biological datasets is desirable, good knowledge of image processing tools is an asset. Full details at: http://dl.dropbox.com/u/16028921/PostDocApp_MachineLearning_BFX.pdf Thouis Jones Institut Curie From roy.lowrance at gmail.com Mon Jul 11 08:50:34 2011 From: roy.lowrance at gmail.com (Roy Lowrance) Date: Mon, 11 Jul 2011 08:50:34 -0400 Subject: [Numpy-discussion] FloatingPointError: overflow encountered in multiply In-Reply-To: References: Message-ID: I found and fixed the problem. I had performed a distance computation of all point to all points. My points included a query point and I needed to set its distance to something large. So I did distances[query_point] = 1e308; ts = distances / band_width; x = ts * ts and got the overflow, which makes sense. I am eventually going to mask out x values that are greater than 1, so the fix is to set distances[query_point] = 1e10 because I know the geography and know that no reasonable distance exceeds this. Thanks, Chuck for your help. Roy On Sun, Jul 10, 2011 at 11:00 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Sun, Jul 10, 2011 at 8:26 PM, Roy Lowrance wrote: > >> I have a 1D float64 array ts. I want to square each element, so I compute >> x = ts * ts >> >> I get a floating point overflow error. >> >> However, when I access each element separately and multiple, I get no >> error: >> for i in ts.shape[0]: >> >> > Data please. The element by element squaring is handled by python, not > numpy, so I expect numpy and python handle the errors differently. Catching > floating point errors is a bit unreliable in any case. What OS/compiler are > you using? Are you running 32 bit or 64 bit? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Roy Lowrance home: 212 674 9777 mobile: 347 255 2544 -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin.druon at wanadoo.fr Mon Jul 11 08:58:35 2011 From: martin.druon at wanadoo.fr (Martin DRUON) Date: Mon, 11 Jul 2011 14:58:35 +0200 (CEST) Subject: [Numpy-discussion] Problem with ufunc of a numpy.ndarray derived class Message-ID: <21883025.66295.1310389115823.JavaMail.www@wwinf1h24> Hi, I have a problem with the ufunc return type of a numpy.ndarray derived class. 
In fact, I subclass a numpy.ndarray using the tutorial : http://docs.scipy.org/doc/numpy/user/basics.subclassing.html But, for example, if I execute the "max" ufunc from my subclass, the return type differs from the return type of the numpy ufunc. This is my code (testSubclassNumpy.py), "copy/paste" from the tutorial : # -*- coding: utf-8 -*- import numpy class MySubClass(numpy.ndarray): def __new__(cls, input_array, info=None): obj = numpy.asarray(input_array).view(cls) obj.info = info return obj def __array_finalize__(self, obj): #print 'In __array_finalize__:' #print ' self is %s' % repr(self) #print ' obj is %s' % repr(obj) if obj is None: return self.info = getattr(obj, 'info', None) def __array_wrap__(self, out_arr, context=None): #print 'In __array_wrap__:' #print ' self is %s' % repr(self) #print ' arr is %s' % repr(out_arr) # then just call the parent return numpy.ndarray.__array_wrap__(self, out_arr, context) >>> import numpy >>> numpy.__version__ '1.6.0' >>> import testSubclassNumpy >>> a = numpy.random.random(size=(10,10)) >>> t = testSubclassNumpy.MySubClass(a) >>> type(a) >>> type(t) >>> a.max() 0.99207693069079683 >>> t.max() MySubClass(0.9920769306907968) >>> type(numpy.max(a)) >>> type(numpy.max(t)) This problem seems to be appeared with the latest version of numpy. Today, I use Python 2.7.2 + numpy 1.6.0 but I didn't have this problem with python 2.6.6 and numpy 1.5.1. Is it a bug ? or perhaps I have made mistake somewhere... thanks, Martin From matthew.brett at gmail.com Mon Jul 11 09:08:03 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 11 Jul 2011 14:08:03 +0100 Subject: [Numpy-discussion] Missing Values Discussion In-Reply-To: References: <4B373853-CDDE-46FC-A318-972E72E2CDA6@enthought.com> <4E170490.8050702@gmail.com> <4E17408D.80705@gmail.com> Message-ID: Hi, On Mon, Jul 11, 2011 at 3:52 AM, Bruce Southey wrote: > On Fri, Jul 8, 2011 at 4:35 PM, Matthew Brett wrote: >> Hi, >> >> On Fri, Jul 8, 2011 at 8:34 PM, Bruce Southey wrote: >>> On Fri, Jul 8, 2011 at 12:55 PM, Matthew Brett wrote: >>>> Hi, >>>> >>>> On Fri, Jul 8, 2011 at 6:38 PM, Bruce Southey wrote: >>>>> On 07/08/2011 08:58 AM, Matthew Brett wrote: >>>>>> Hi, >>>>>> >>>>>> Just checking - but is this: >>>>>> >>>>>> On Fri, Jul 8, 2011 at 2:22 PM, Bruce Southey ?wrote: >>>>>> ... >>>>>>> The one thing that we do need now is the code that implements the small >>>>>>> set of core ideas (array creation and simple numerical operations). >>>>>>> Hopefully that will provide a better grasp of the concepts and the >>>>>>> performance differences to determine the acceptability of the approach(es). >>>>>> in reference to this: >>>>>> >>>>>>> On 07/08/2011 07:15 AM, Matthew Brett wrote: >>>>>> ... >>>>>>>> Can I ask - what do you recommend that we do now, for the discussion? >>>>>>>> Should we be quiet and wait until there is code to test, or, as >>>>>>>> Nathaniel has tried to do, work at reaching some compromise that makes >>>>>>>> sense to some or all parties? >>>>>> ? >>>>>> >>>>>> Cheers, >>>>>> >>>>>> Matthew >>>>> Simply, I think the time for discussion has passed and it is now time to >>>>> see the 'cards'. I do not know enough (or anything) about the >>>>> implementation so I need code to know the actual 'cost' of Mark's idea >>>>> with real situations. >>>> >>>> Yes, I thought that was what you were saying. 
>>>> >>>> I disagree and think that discussion of the type that Nathaniel has >>>> started is a useful way to think more clearly and specifically about >>>> the API and what can be agreed. >>>> >>>> Otherwise we will come to the same impasse when Mark's code arrives. >>>> If that happens, we'll either lose the code because the merge is >>>> refused, or be forced into something that may not be the best way >>>> forward. >>>> >>>> Best, >>>> >>>> Matthew >>>> _______________________________________________ >>> >>> >>> Unfortunately we need code from either side as an API etc. is not >>> sufficient to judge anything. >> >> If I understand correctly, we are not going to get code from either >> side, we are only going to get code from one side. > > The would be very unfortunate indeed. > >> >> I cannot now see how the code will inform the discussion about the >> API, unless it turns out that the proposed API cannot be implemented. >> ?The substantial points are not about memory use or performance, but >> about how the API should work. ?If you can see some way that the code >> will inform the discussion, please say, I would honestly be grateful. > > API's are not my area or even a concern. ?I am an end user so the code > has to work correctly with acceptable performance and memory usage. To > that end I have know if doing a+b is faster with less memory than > first creating new arrays c and d without missing values then doing > c+d. The limited understanding with the masked approach is that the > former it should be faster than the latter with some acceptable > increase in memory usage. With the miniNEP approach, I do not see that > there will be benefits because the function will have to find these > and handle them appropriately which may be a 'killer' for integer > arrays. > >> >>> But I do not think we will be forced >>> into anything as in the extreme situation you can keep old versions or >>> fork the code in the really extreme case. >> >> That would be a terrible waste, and potentially damaging to the >> community, so of course we want to do all we can to avoid those >> outcomes. >> >> Best, >> >> Matthew > > So I have to support anybody that wants to try a new change especially > one that would remove my 'bane' of having functions automatically > handle masked arrays. This is a very important statement, and it is right at the heart of the problem that I have been trying to raise. Here what you are saying is "I want functions to handle masked arrays" and so "I support a change to handle masked arrays". However, you are replying on another discussion which is "What is the right API to handle masked arrays in relationship to missing values". Specifically you are saying you think discussion should stop on that until the masking implementation is done. My point is this: 1) We must make sure that we discuss the substance of the actual point. 2) In order to do this, we must be very careful to separate the actual point from A) Desire for our own favorite use-case B) General expressions of personal solidarity. If we don't then what we will see is considerable confusion in the discussion, and the destructive formation of cliques. We're scientists - and so we know better than most about the importance of keeping the ideas separate from the people making them. If we want to have clear ideas that will help numpy last as a tool, we need to preserve the quality of our discussion. 
Best, Matthew From bsouthey at gmail.com Mon Jul 11 10:16:06 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 11 Jul 2011 09:16:06 -0500 Subject: [Numpy-discussion] Missing Values Discussion In-Reply-To: References: <4B373853-CDDE-46FC-A318-972E72E2CDA6@enthought.com> <4E170490.8050702@gmail.com> <4E17408D.80705@gmail.com> Message-ID: <4E1B05A6.7080603@gmail.com> On 07/11/2011 08:08 AM, Matthew Brett wrote: > Hi, > > On Mon, Jul 11, 2011 at 3:52 AM, Bruce Southey wrote: >> On Fri, Jul 8, 2011 at 4:35 PM, Matthew Brett wrote: >>> Hi, >>> >>> On Fri, Jul 8, 2011 at 8:34 PM, Bruce Southey wrote: >>>> On Fri, Jul 8, 2011 at 12:55 PM, Matthew Brett wrote: >>>>> Hi, >>>>> >>>>> On Fri, Jul 8, 2011 at 6:38 PM, Bruce Southey wrote: >>>>>> On 07/08/2011 08:58 AM, Matthew Brett wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Just checking - but is this: >>>>>>> >>>>>>> On Fri, Jul 8, 2011 at 2:22 PM, Bruce Southey wrote: >>>>>>> ... >>>>>>>> The one thing that we do need now is the code that implements the small >>>>>>>> set of core ideas (array creation and simple numerical operations). >>>>>>>> Hopefully that will provide a better grasp of the concepts and the >>>>>>>> performance differences to determine the acceptability of the approach(es). >>>>>>> in reference to this: >>>>>>> >>>>>>>> On 07/08/2011 07:15 AM, Matthew Brett wrote: >>>>>>> ... >>>>>>>>> Can I ask - what do you recommend that we do now, for the discussion? >>>>>>>>> Should we be quiet and wait until there is code to test, or, as >>>>>>>>> Nathaniel has tried to do, work at reaching some compromise that makes >>>>>>>>> sense to some or all parties? >>>>>>> ? >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> Matthew >>>>>> Simply, I think the time for discussion has passed and it is now time to >>>>>> see the 'cards'. I do not know enough (or anything) about the >>>>>> implementation so I need code to know the actual 'cost' of Mark's idea >>>>>> with real situations. >>>>> Yes, I thought that was what you were saying. >>>>> >>>>> I disagree and think that discussion of the type that Nathaniel has >>>>> started is a useful way to think more clearly and specifically about >>>>> the API and what can be agreed. >>>>> >>>>> Otherwise we will come to the same impasse when Mark's code arrives. >>>>> If that happens, we'll either lose the code because the merge is >>>>> refused, or be forced into something that may not be the best way >>>>> forward. >>>>> >>>>> Best, >>>>> >>>>> Matthew >>>>> _______________________________________________ >>>> >>>> Unfortunately we need code from either side as an API etc. is not >>>> sufficient to judge anything. >>> If I understand correctly, we are not going to get code from either >>> side, we are only going to get code from one side. >> The would be very unfortunate indeed. >> >>> I cannot now see how the code will inform the discussion about the >>> API, unless it turns out that the proposed API cannot be implemented. >>> The substantial points are not about memory use or performance, but >>> about how the API should work. If you can see some way that the code >>> will inform the discussion, please say, I would honestly be grateful. >> API's are not my area or even a concern. I am an end user so the code >> has to work correctly with acceptable performance and memory usage. To >> that end I have know if doing a+b is faster with less memory than >> first creating new arrays c and d without missing values then doing >> c+d. 
The limited understanding with the masked approach is that the >> former it should be faster than the latter with some acceptable >> increase in memory usage. With the miniNEP approach, I do not see that >> there will be benefits because the function will have to find these >> and handle them appropriately which may be a 'killer' for integer >> arrays. >> >>>> But I do not think we will be forced >>>> into anything as in the extreme situation you can keep old versions or >>>> fork the code in the really extreme case. >>> That would be a terrible waste, and potentially damaging to the >>> community, so of course we want to do all we can to avoid those >>> outcomes. >>> >>> Best, >>> >>> Matthew >> So I have to support anybody that wants to try a new change especially >> one that would remove my 'bane' of having functions automatically >> handle masked arrays. > This is a very important statement, and it is right at the heart of > the problem that I have been trying to raise. > > Here what you are saying is "I want functions to handle masked arrays" > and so "I support a change to handle masked arrays". > > However, you are replying on another discussion which is "What is the > right API to handle masked arrays in relationship to missing values". > Specifically you are saying you think discussion should stop on that > until the masking implementation is done. > > My point is this: > > 1) We must make sure that we discuss the substance of the actual point. > 2) In order to do this, we must be very careful to separate the actual > point from > A) Desire for our own favorite use-case > B) General expressions of personal solidarity. > > If we don't then what we will see is considerable confusion in the > discussion, and the destructive formation of cliques. > > We're scientists - and so we know better than most about the > importance of keeping the ideas separate from the people making them. > If we want to have clear ideas that will help numpy last as a tool, we > need to preserve the quality of our discussion. > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Just to correct you, my position is 'show me the code' not that I support any idea. As you probably can tell, I do have a hard time understanding how either approach will actually work. By having basic code that implements very basic functionality, I, and probably others, will better appreciate what people are referring to and what is the cost in performance and usage. 
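Even a toy comparison along these lines (numpy.ma purely as a stand-in for whichever implementation lands) is the kind of number I am after:

import numpy as np
import numpy.ma as ma
from timeit import timeit

n = 1000000
a = ma.masked_array(np.random.rand(n), mask=np.random.rand(n) < 0.01)
b = ma.masked_array(np.random.rand(n), mask=np.random.rand(n) < 0.01)

def add_with_mask():
    return a + b                       # missing values handled in place

def strip_then_add():
    ok = ~(ma.getmaskarray(a) | ma.getmaskarray(b))
    return a.data[ok] + b.data[ok]     # copy out the valid entries first

timeit(add_with_mask, number=100)
timeit(strip_then_add, number=100)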
Bruce From matthew.brett at gmail.com Mon Jul 11 10:24:49 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 11 Jul 2011 15:24:49 +0100 Subject: [Numpy-discussion] Missing Values Discussion In-Reply-To: <4E1B05A6.7080603@gmail.com> References: <4B373853-CDDE-46FC-A318-972E72E2CDA6@enthought.com> <4E170490.8050702@gmail.com> <4E17408D.80705@gmail.com> <4E1B05A6.7080603@gmail.com> Message-ID: Hi, On Mon, Jul 11, 2011 at 3:16 PM, Bruce Southey wrote: > On 07/11/2011 08:08 AM, Matthew Brett wrote: >> Hi, >> >> On Mon, Jul 11, 2011 at 3:52 AM, Bruce Southey ?wrote: >>> On Fri, Jul 8, 2011 at 4:35 PM, Matthew Brett ?wrote: >>>> Hi, >>>> >>>> On Fri, Jul 8, 2011 at 8:34 PM, Bruce Southey ?wrote: >>>>> On Fri, Jul 8, 2011 at 12:55 PM, Matthew Brett ?wrote: >>>>>> Hi, >>>>>> >>>>>> On Fri, Jul 8, 2011 at 6:38 PM, Bruce Southey ?wrote: >>>>>>> On 07/08/2011 08:58 AM, Matthew Brett wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Just checking - but is this: >>>>>>>> >>>>>>>> On Fri, Jul 8, 2011 at 2:22 PM, Bruce Southey ? ?wrote: >>>>>>>> ... >>>>>>>>> The one thing that we do need now is the code that implements the small >>>>>>>>> set of core ideas (array creation and simple numerical operations). >>>>>>>>> Hopefully that will provide a better grasp of the concepts and the >>>>>>>>> performance differences to determine the acceptability of the approach(es). >>>>>>>> in reference to this: >>>>>>>> >>>>>>>>> On 07/08/2011 07:15 AM, Matthew Brett wrote: >>>>>>>> ... >>>>>>>>>> Can I ask - what do you recommend that we do now, for the discussion? >>>>>>>>>> Should we be quiet and wait until there is code to test, or, as >>>>>>>>>> Nathaniel has tried to do, work at reaching some compromise that makes >>>>>>>>>> sense to some or all parties? >>>>>>>> ? >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> Matthew >>>>>>> Simply, I think the time for discussion has passed and it is now time to >>>>>>> see the 'cards'. I do not know enough (or anything) about the >>>>>>> implementation so I need code to know the actual 'cost' of Mark's idea >>>>>>> with real situations. >>>>>> Yes, I thought that was what you were saying. >>>>>> >>>>>> I disagree and think that discussion of the type that Nathaniel has >>>>>> started is a useful way to think more clearly and specifically about >>>>>> the API and what can be agreed. >>>>>> >>>>>> Otherwise we will come to the same impasse when Mark's code arrives. >>>>>> If that happens, we'll either lose the code because the merge is >>>>>> refused, or be forced into something that may not be the best way >>>>>> forward. >>>>>> >>>>>> Best, >>>>>> >>>>>> Matthew >>>>>> _______________________________________________ >>>>> >>>>> Unfortunately we need code from either side as an API etc. is not >>>>> sufficient to judge anything. >>>> If I understand correctly, we are not going to get code from either >>>> side, we are only going to get code from one side. >>> The would be very unfortunate indeed. >>> >>>> I cannot now see how the code will inform the discussion about the >>>> API, unless it turns out that the proposed API cannot be implemented. >>>> ? The substantial points are not about memory use or performance, but >>>> about how the API should work. ?If you can see some way that the code >>>> will inform the discussion, please say, I would honestly be grateful. >>> API's are not my area or even a concern. ?I am an end user so the code >>> has to work correctly with acceptable performance and memory usage. 
To >>> that end I have know if doing a+b is faster with less memory than >>> first creating new arrays c and d without missing values then doing >>> c+d. The limited understanding with the masked approach is that the >>> former it should be faster than the latter with some acceptable >>> increase in memory usage. With the miniNEP approach, I do not see that >>> there will be benefits because the function will have to find these >>> and handle them appropriately which may be a 'killer' for integer >>> arrays. >>> >>>>> But I do not think we will be forced >>>>> into anything as in the extreme situation you can keep old versions or >>>>> fork the code in the really extreme case. >>>> That would be a terrible waste, and potentially damaging to the >>>> community, so of course we want to do all we can to avoid those >>>> outcomes. >>>> >>>> Best, >>>> >>>> Matthew >>> So I have to support anybody that wants to try a new change especially >>> one that would remove my 'bane' of having functions automatically >>> handle masked arrays. >> This is a very important statement, and it is right at the heart of >> the problem that I have been trying to raise. >> >> Here what you are saying is "I want functions to handle masked arrays" >> and so "I support a change to handle masked arrays". >> >> However, you are replying on another discussion which is "What is the >> right API to handle masked arrays in relationship to missing values". >> ? Specifically you are saying you think discussion should stop on that >> until the masking implementation is done. >> >> My point is this: >> >> 1) We must make sure that we discuss the substance of the actual point. >> 2) In order to do this, we must be very careful to separate the actual >> point from >> A) Desire for our own favorite use-case >> B) General expressions of personal solidarity. >> >> If we don't then what we will see is considerable confusion in the >> discussion, and the destructive formation of cliques. >> >> We're scientists - and so we know better than most about the >> importance of keeping the ideas separate from the people making them. >> If we want to have clear ideas that will help numpy last as a tool, we >> need to preserve the quality of our discussion. >> >> Best, >> >> Matthew >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > Just to correct you, my position is 'show me the code' not that I > support any idea. I'm directly commenting on this phrase in your last email: >>> So I have to support anybody that wants to try a new change especially >>> one that would remove my 'bane' of having functions automatically >>> handle masked arrays. > As you probably can tell, I do have a hard time > understanding how either approach will actually work. >From what you've said, I have the impression that you do not actually care about the API: >>> API's are not my area or even a concern. ?I am an end user so the code >>> has to work correctly with acceptable performance and memory usage. > By having basic > code that implements very basic functionality, I, and probably others, > will better appreciate what people are referring to and what is the cost > in performance and usage. I want to make sure that you realize that Mark and Travis are not proposing to implement both APIs, and so the current implementation is unlikely to resolve the discussion. 
You are thinking of the things of interest to you - masked array integration, memory, performance. That's fine, but, in order to keep the discussion clear and focussed, we must avoid confusing that with the API discussion. Best, Matthew From ralf.gommers at googlemail.com Mon Jul 11 15:28:14 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 11 Jul 2011 21:28:14 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 3 Message-ID: Hi, I am pleased to announce the availability of the third release candidate of NumPy 1.6.1. This is a bugfix release, list of fixed bugs: #1834 einsum fails for specific shapes #1837 einsum throws nan or freezes python for specific array shapes #1838 object <-> structured type arrays regression #1851 regression for SWIG based code in 1.6.0 #1863 Buggy results when operating on array copied with astype() #1870 Fix corner case of object array assignment #1843 Py3k: fix error with recarray #1885 nditer: Error in detecting double reduction loop #1874 f2py: fix --include_paths bug #1749 Fix ctypes.load_library() #1895/1896 iter: writeonly operands weren't always being buffered correctly This third RC has only a single change compared to RC2 (for #1895/1896), which fixes a serious regression in the iterator. If no new problems are reported, the final release will be in one week. Sources and binaries can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ Enjoy, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Mon Jul 11 15:31:15 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 11 Jul 2011 21:31:15 +0200 Subject: [Numpy-discussion] numpy build issue on i7-2600K CPU In-Reply-To: <4E1A5888.1030808@uci.edu> References: <4E1A5888.1030808@uci.edu> Message-ID: Hi Christoph, On Mon, Jul 11, 2011 at 3:57 AM, Christoph Gohlke wrote: > Hello, > > building numpy 1.6.1rc2 on Windows, i7-2600K CPU, with msvc9 failed with > the following error: > > File "numpy/core/setup_common.py", line 271, in long_double_representation > raise ValueError("Could not lock sequences (%s)" % saw) > ValueError: Could not lock sequences (None) > > > This problem has been mentioned before at pipermail/numpy-discussion/**2011-March/055571.html > >. > > > Opening the configtest.obj file in binary mode fixed the issue for me. A > patch is attached. > I did see this, just not before I tagged 1.6.1rc3. If it's reviewed/tested I think it's a simple enough change that it can go in without requiring a new RC. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.wheeler2 at gmail.com Mon Jul 11 17:01:07 2011 From: daniel.wheeler2 at gmail.com (Daniel Wheeler) Date: Mon, 11 Jul 2011 17:01:07 -0400 Subject: [Numpy-discussion] inverting and calculating eigenvalues for many small matrices Message-ID: Hi, I am trying to find the eigenvalues and eigenvectors as well as the inverse for a large number of small matrices. The matrix size (MxM) will typically range from 2x2 to 8x8 at most. The number of matrices (N) can be from 100 up to a million or more. My current solution is to define "eig" and "inv" to be, def inv(A): """ Inverts N MxM matrices, A.shape = (M, M, N), inv(A).shape = (M, M, N). 
""" return np.array(map(np.linalg.inv, A.transpose(2, 0, 1))).transpose(1, 2, 0) def eig(A): """ Calculate the eigenvalues and eigenvectors of N MxM matrices, A.shape = (M, M, N), eig(A)[0].shape = (M, N), eig(A)[1].shape = (M, M, N) """ tmp = zip(*map(np.linalg.eig, A.transpose(2, 0, 1))) return (np.array(tmp[0]).swapaxes(0,1), np.array(tmp[1]).transpose(1,2,0)) The above uses "map" to fake a vector solution, but this is heinously slow. Are there any better ways to do this without resorting to cython or weave (would it even be faster (or possible) to use "np.linalg.eig" and "np.linalg.inv" within cython)? I could write specialized versions when M=2 or M=3, which could be fully vectorized, but I'd rather have a general solution. Are there better algorithms than the ones used in "np.linalg.inv" and "np.linalg.eig" when M < 9, say, that I could hand code using numpy in a fully vectorized way? My end goal is to implement a Riemann flux calculation as part of a finite volume solver. This requires calculating "Abar = R E inv(R)" where R are the right eigenvectors of a matrix A and E is the matrix with the absolute value of the eigenvalues of A along the diagonal (both R and E are sorted in ascending order of their corresponding eigenvalues). As well as using "eig" and "inv" defined above, matrix multiplication and sorted eigenvalues and eigenvectors are also required. Fortunately, these can be vectorized as follows, def sum(a, axis=0): """ Faster than using np.sum. """ return np.tensordot(np.ones(a.shape[axis], 'l'), a, (0, axis)) def mul(A, B): """ Matrix multiply N MxM matrices, A.shape = B.shape = (M, M, N), mul(A, B).shape = (M, M, N) """ return sum(A.swapaxes(0,1)[:, :, np.newaxis] * B[:, np.newaxis], 0) def sortedeig(A): """ Calculates the sorted eigenvalues and eigenvectors of N MxM matrices, A.shape = (M, M, N), sortedeig(A)[0].shape = (M, N), sortedeig(A)[1].shape = (M, M, N). """ N = A.shape[-1] eigenvalues, R = eig(A) order = eigenvalues.argsort(0).swapaxes(0, 1) Nlist = [[i] for i in xrange(N)] return (eigenvalues[order, Nlist].swapaxes(0, 1), R[:, order, Nlist].swapaxes(1, 2)) The above two functions take very little time compared with "eig" and "inv". Using "sum" helps too. Given A, calculating "Abar = R E inv(R)" can then simply be written eigenvalues, R = sortedeig(A) E = abs(eigenvalues) * numerix.identity(eigenvalues.shape[0])[..., np.newaxis] Abar = mul(mul(R, E), inv(R)) Maybe there is a better way to calculate "Abar" rather than explicitly calculating the eigenvalues and eigenvectors for every matrix. Any help is much appreciated. -- Daniel Wheeler From cgohlke at uci.edu Mon Jul 11 17:12:45 2011 From: cgohlke at uci.edu (Christoph Gohlke) Date: Mon, 11 Jul 2011 14:12:45 -0700 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 3 In-Reply-To: References: Message-ID: <4E1B674D.90707@uci.edu> On 7/11/2011 12:28 PM, Ralf Gommers wrote: > Hi, > > I am pleased to announce the availability of the third release candidate > of NumPy 1.6.1. 
This is a bugfix release, list of fixed bugs: > #1834 einsum fails for specific shapes > #1837 einsum throws nan or freezes python for specific array shapes > #1838 object <-> structured type arrays regression > #1851 regression for SWIG based code in 1.6.0 > #1863 Buggy results when operating on array copied with astype() > #1870 Fix corner case of object array assignment > #1843 Py3k: fix error with recarray > #1885 nditer: Error in detecting double reduction loop > #1874 f2py: fix --include_paths bug > #1749 Fix ctypes.load_library() > #1895/1896 iter: writeonly operands weren't always being buffered correctly > > This third RC has only a single change compared to RC2 (for #1895/1896), > which fixes a serious regression in the iterator. If no new problems are > reported, the final release will be in one week. Sources and binaries > can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.6.1rc2/ > > Enjoy, > Ralf > > Hi Ralph. I tested rc3. It looks good, except that on win-amd64 whenever numpy is imported, a 'Forcing DISTUTILS_USE_SDK=1' is printed from line 377 in misc_util.py. Hence some tests of other packages fail. This is due to a recent change: Now every time numpy is imported, numpy.distutils is also imported. Is this necessary or can the import of distutils be deferred? Christoph From robert.kern at gmail.com Mon Jul 11 17:31:01 2011 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 11 Jul 2011 16:31:01 -0500 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 3 In-Reply-To: <4E1B674D.90707@uci.edu> References: <4E1B674D.90707@uci.edu> Message-ID: On Mon, Jul 11, 2011 at 16:12, Christoph Gohlke wrote: > > I tested rc3. It looks good, except that on win-amd64 whenever numpy is > imported, a 'Forcing DISTUTILS_USE_SDK=1' is printed from line 377 in > misc_util.py. Hence some tests of other packages fail. > > This is due to a recent change: > > > Now every time numpy is imported, numpy.distutils is also imported. Is > this necessary or can the import of distutils be deferred? The get_shared_lib_extension() call could be deferred to inside load_library() in ctypeslib.py, yes. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From d.s.seljebotn at astro.uio.no Tue Jul 12 03:51:02 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 12 Jul 2011 09:51:02 +0200 Subject: [Numpy-discussion] inverting and calculating eigenvalues for many small matrices In-Reply-To: References: Message-ID: <4E1BFCE6.5030702@astro.uio.no> On 07/11/2011 11:01 PM, Daniel Wheeler wrote: > Hi, I am trying to find the eigenvalues and eigenvectors as well as > the inverse for a large number of small matrices. The matrix size > (MxM) will typically range from 2x2 to 8x8 at most. The number of > matrices (N) can be from 100 up to a million or more. My current > solution is to define "eig" and "inv" to be, > > def inv(A): > """ > Inverts N MxM matrices, A.shape = (M, M, N), inv(A).shape = (M, M, N). 
> """ > return np.array(map(np.linalg.inv, A.transpose(2, 0, 1))).transpose(1, 2, 0) > > def eig(A): > """ > Calculate the eigenvalues and eigenvectors of N MxM matrices, > A.shape = (M, M, N), eig(A)[0].shape = (M, N), eig(A)[1].shape = (M, > M, N) > """ > tmp = zip(*map(np.linalg.eig, A.transpose(2, 0, 1))) > return (np.array(tmp[0]).swapaxes(0,1), np.array(tmp[1]).transpose(1,2,0)) > > The above uses "map" to fake a vector solution, but this is heinously > slow. Are there any better ways to do this without resorting to cython > or weave (would it even be faster (or possible) to use "np.linalg.eig" > and "np.linalg.inv" within cython)? I could write specialized versions If you want to go the Cython route, here's a start: http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-math/ Dag Sverre From sturla at molden.no Tue Jul 12 09:36:34 2011 From: sturla at molden.no (Sturla Molden) Date: Tue, 12 Jul 2011 15:36:34 +0200 Subject: [Numpy-discussion] Are .M and .H removed in NumPy 1.6? Message-ID: <4E1C4DE2.4030201@molden.no> After upgrading EPD, I just discovered that my ndarrays no longer have .M and .H attributes. Were they deprectated, or is my NumPy not working correctly? Sturla From charlesr.harris at gmail.com Tue Jul 12 09:55:12 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 12 Jul 2011 07:55:12 -0600 Subject: [Numpy-discussion] Are .M and .H removed in NumPy 1.6? In-Reply-To: <4E1C4DE2.4030201@molden.no> References: <4E1C4DE2.4030201@molden.no> Message-ID: On Tue, Jul 12, 2011 at 7:36 AM, Sturla Molden wrote: > After upgrading EPD, I just discovered that my ndarrays no longer have > .M and .H attributes. > > Were they deprectated, or is my NumPy not working correctly? > > I thought they were long gone: http://mail.scipy.org/pipermail/numpy-discussion/2006-July/009247.html Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Tue Jul 12 10:06:54 2011 From: sturla at molden.no (Sturla Molden) Date: Tue, 12 Jul 2011 16:06:54 +0200 Subject: [Numpy-discussion] Are .M and .H removed in NumPy 1.6? In-Reply-To: References: <4E1C4DE2.4030201@molden.no> Message-ID: <4E1C54FE.9020801@molden.no> Den 12.07.2011 15:55, skrev Charles R Harris: > > > On Tue, Jul 12, 2011 at 7:36 AM, Sturla Molden > wrote: > > After upgrading EPD, I just discovered that my ndarrays no longer have > .M and .H attributes. > > Were they deprectated, or is my NumPy not working correctly? > > > I thought they were long gone: > http://mail.scipy.org/pipermail/numpy-discussion/2006-July/009247.html Yes, thanks. Sorry for my confusion. .H and .A are obviously attributes of np.matrix, but there are no .H and .M for np.ndarray. Sturla -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.wheeler2 at gmail.com Tue Jul 12 10:10:58 2011 From: daniel.wheeler2 at gmail.com (Daniel Wheeler) Date: Tue, 12 Jul 2011 10:10:58 -0400 Subject: [Numpy-discussion] inverting and calculating eigenvalues for many small matrices In-Reply-To: <4E1BFCE6.5030702@astro.uio.no> References: <4E1BFCE6.5030702@astro.uio.no> Message-ID: On Tue, Jul 12, 2011 at 3:51 AM, Dag Sverre Seljebotn wrote: > On 07/11/2011 11:01 PM, Daniel Wheeler wrote: >> Hi, I am trying to find the eigenvalues and eigenvectors as well as >> the inverse for a large number of small matrices. 
The matrix size > If you want to go the Cython route, here's a start: > > http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-math/ Thanks for the heads up. Looks like an option. Presumably, it would still have to use "map" even with more direct access to BLAS (still going C <-> python for every matrix)? Also, adding extra non-standard dependencies is a problem as this code is part of a production code that's passed onto others. -- Daniel Wheeler From d.s.seljebotn at astro.uio.no Tue Jul 12 10:52:55 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Tue, 12 Jul 2011 16:52:55 +0200 Subject: [Numpy-discussion] inverting and calculating eigenvalues for many small matrices In-Reply-To: References: <4E1BFCE6.5030702@astro.uio.no> Message-ID: <4E1C5FC7.3070200@astro.uio.no> On 07/12/2011 04:10 PM, Daniel Wheeler wrote: > On Tue, Jul 12, 2011 at 3:51 AM, Dag Sverre Seljebotn > wrote: >> On 07/11/2011 11:01 PM, Daniel Wheeler wrote: >>> Hi, I am trying to find the eigenvalues and eigenvectors as well as >>> the inverse for a large number of small matrices. The matrix size > >> If you want to go the Cython route, here's a start: >> >> http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-math/ > > Thanks for the heads up. Looks like an option. Presumably, it would > still have to use "map" even with more direct access to BLAS (still > going C<-> python for every matrix)? Also, adding extra non-standard > dependencies is a problem as this code is part of a production code > that's passed onto others. > I was thinking you'd use it as a starting point, and actually write low-level for-loops indexing the buffer data pointers in Cython. If you make sure that np.ascontiguousarray or np.asfortranarray, you can do cimport numpy as np np.import_array() ... def func(np.ndarray[double, ndim=3, mode='fortran'] arr): double *buf = PyArray_DATA(arr) # low-level C-like code to get slices of buf and pass to BLAS Dag Sverre From jhibschman+numpy at gmail.com Tue Jul 12 11:14:21 2011 From: jhibschman+numpy at gmail.com (Johann Hibschman) Date: Tue, 12 Jul 2011 10:14:21 -0500 Subject: [Numpy-discussion] object scalars Message-ID: Is there any way to wrap a sequence (in particular a python list) as a numpy object scalar, without it being promoted to an object array? In particular, np.object_([1, 2]).shape == (2,) np.array([1,2], dtype='O').shape == (2,) while I want some_call([1,2]).shape = () Thanks, Johann From sturla at molden.no Tue Jul 12 11:19:07 2011 From: sturla at molden.no (Sturla Molden) Date: Tue, 12 Jul 2011 17:19:07 +0200 Subject: [Numpy-discussion] inverting and calculating eigenvalues for many small matrices In-Reply-To: References: Message-ID: <4E1C65EB.9000502@molden.no> Den 11.07.2011 23:01, skrev Daniel Wheeler: > The above uses "map" to fake a vector solution, but this is heinously > slow. Are there any better ways to do this without resorting to cython > or weave (would it even be faster (or possible) to use "np.linalg.eig" > and "np.linalg.inv" within cython)? I see two problems here: def inv(A): """ Inverts N MxM matrices, A.shape = (M, M, N), inv(A).shape = (M, M, N). """ return np.array(map(np.linalg.inv, A.transpose(2, 0, 1))).transpose(1, 2, 0) At least get rid of the transpositions and non-contiguous memory access. Use shape (N,M,M) in C order or (M,M,N) in Fortran order to make memory access more contiguous and cache friendly. 
Statements like A.transpose(2,0,1) are evil: they just burn the CPU and flood the memory bus; and cache, prefetching, etc. will do no good as everything is out of order. LAPACK is written in Fortran, at least with scipy.linalg I would use shape (M,M,N) in Fortran order to avoid redundant transpositions by f2py. I am not sure what NumPy's lapack_lite does, though. You will have a lot of Python overhead by using np.linalg.inv on each of the N matrices. Therefore, you don't gain much by using map instead of a for loop. Using map will save a few attribute lookups per loop, but there are dozens of them. To make the loop over N matrices fast, there is nothing that beats a loop in C or Fortran (or Cython) if you have a 3D array. And that brings us to the second issue, which is that it would be nice if the solvers in numpy.linalg (and scipy.linalg) were vectorized for 3D arrays. Calling np.linalg.inv from Cython will not help though, as you incur the same overhead as calling np.linalg.inv with map from Python. Another question is if you really need to compute the inverse. A matrix inversion and subsequent matrix multiplication can be replaced by solving a linear system, which only takes half the amount of computation. Sturla From shish at keba.be Tue Jul 12 11:22:05 2011 From: shish at keba.be (Olivier Delalleau) Date: Tue, 12 Jul 2011 11:22:05 -0400 Subject: [Numpy-discussion] object scalars In-Reply-To: References: Message-ID: I found a workaround but it's a bit ugly: def some_call(x): rval = numpy.array(None, dtype='object') rval.fill(x) return rval -=- Olivier 2011/7/12 Johann Hibschman > Is there any way to wrap a sequence (in particular a python list) as a > numpy object scalar, without it being promoted to an object array? > > In particular, > > np.object_([1, 2]).shape == (2,) > np.array([1,2], dtype='O').shape == (2,) > > while I want > > some_call([1,2]).shape = () > > Thanks, > Johann > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Tue Jul 12 11:30:57 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 12 Jul 2011 17:30:57 +0200 Subject: [Numpy-discussion] inverting and calculating eigenvalues for many small matrices In-Reply-To: References: Message-ID: <20110712153057.GG17559@phare.normalesup.org> On Mon, Jul 11, 2011 at 05:01:07PM -0400, Daniel Wheeler wrote: > Hi, I am trying to find the eigenvalues and eigenvectors as well as > the inverse for a large number of small matrices. The matrix size > (MxM) will typically range from 2x2 to 8x8 at most. If you really care about speed, for matrices of this size you shouldn't call a linear algebra pack, but simply precompute the close-form solution. Here is sympy code that I used a while ago to generate fast code to do inverse of SPD matrices. G -------------- next part -------------- A non-text attachment was scrubbed... 
Name: gen_sym_inv.py Type: text/x-python Size: 2553 bytes Desc: not available URL: From daniel.wheeler2 at gmail.com Tue Jul 12 12:15:41 2011 From: daniel.wheeler2 at gmail.com (Daniel Wheeler) Date: Tue, 12 Jul 2011 12:15:41 -0400 Subject: [Numpy-discussion] inverting and calculating eigenvalues for many small matrices In-Reply-To: <4E1C5FC7.3070200@astro.uio.no> References: <4E1BFCE6.5030702@astro.uio.no> <4E1C5FC7.3070200@astro.uio.no> Message-ID: On Tue, Jul 12, 2011 at 10:52 AM, Dag Sverre Seljebotn wrote: > On 07/12/2011 04:10 PM, Daniel Wheeler wrote: >> On Tue, Jul 12, 2011 at 3:51 AM, Dag Sverre Seljebotn >> Thanks for the heads up. Looks like an option. Presumably, it would >> still have to use "map" even with more direct access to BLAS (still >> going C<-> ?python for every matrix)? Also, adding extra non-standard >> dependencies is a problem as this code is part of a production code >> that's passed onto others. >> > > I was thinking you'd use it as a starting point, and actually write > low-level for-loops indexing the buffer data pointers in Cython. I realized that as soon as I'd hit the send button. Thanks. -- Daniel Wheeler From gregwh at gmail.com Tue Jul 12 12:16:29 2011 From: gregwh at gmail.com (greg whittier) Date: Tue, 12 Jul 2011 12:16:29 -0400 Subject: [Numpy-discussion] inverting and calculating eigenvalues for many small matrices In-Reply-To: <20110712153057.GG17559@phare.normalesup.org> References: <20110712153057.GG17559@phare.normalesup.org> Message-ID: On Tue, Jul 12, 2011 at 11:30 AM, Gael Varoquaux wrote: > On Mon, Jul 11, 2011 at 05:01:07PM -0400, Daniel Wheeler wrote: >> Hi, I am trying to find the eigenvalues and eigenvectors as well as >> the inverse for a large number of small matrices. The matrix size >> (MxM) will typically range from 2x2 to 8x8 at most. > > If you really care about speed, for matrices of this size you shouldn't > call a linear algebra pack, but simply precompute the close-form > solution. Here is sympy code that I used a while ago to generate fast > code to do inverse of SPD matrices. > > G This has been a very timely discussion for me since I'm looking to do the same thing with a different application. My interest in diagonalizing lots of small matrices (not inverting). I believe the OP was also interested in eigenvalues. Gael, your code addresses inverses, but I take it something similar for eigenvalues of a matrix bigger than 5x5 doesn't exists since a closed-form solution doesn't exist for finding polynomials roots for order > 5? The ones I'm looking at now happen to be 3x3, so I was thinking of using http://en.wikipedia.org/wiki/Eigenvalue_algorithm#Eigenvalues_of_a_Symmetric_3x3_Matrix but I might have anywhere from 2 to 10 at some point. (To add another spin to this, I recently acquired an NVIDIA Tesla card and am thinking of using it for this problem.) Thanks, Greg From gael.varoquaux at normalesup.org Tue Jul 12 12:18:19 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 12 Jul 2011 18:18:19 +0200 Subject: [Numpy-discussion] inverting and calculating eigenvalues for many small matrices In-Reply-To: References: <20110712153057.GG17559@phare.normalesup.org> Message-ID: <20110712161819.GI17559@phare.normalesup.org> On Tue, Jul 12, 2011 at 12:16:29PM -0400, greg whittier wrote: > Gael, your code addresses inverses, but I take it something similar for > eigenvalues of a matrix bigger than 5x5 doesn't exists since a > closed-form solution doesn't exist for finding polynomials roots for > order > 5? 
I guess so :). > The ones I'm looking at now happen to be 3x3, so I was thinking of > using http://en.wikipedia.org/wiki/Eigenvalue_algorithm#Eigenvalues_of_a_Symmetric_3x3_Matrix > but I might have anywhere from 2 to 10 at some point. I am afraid that this is beyond my skills. Sorry ;$. G From daniel.wheeler2 at gmail.com Tue Jul 12 12:23:19 2011 From: daniel.wheeler2 at gmail.com (Daniel Wheeler) Date: Tue, 12 Jul 2011 12:23:19 -0400 Subject: [Numpy-discussion] inverting and calculating eigenvalues for many small matrices In-Reply-To: <4E1C65EB.9000502@molden.no> References: <4E1C65EB.9000502@molden.no> Message-ID: On Tue, Jul 12, 2011 at 11:19 AM, Sturla Molden wrote: > Den 11.07.2011 23:01, skrev Daniel Wheeler: > To make the loop over N matrices fast, there is nothing that beats a > loop in C or Fortran (or Cython) if you have a 3D array. And that brings > us to the second issue, which is that it would be nice if the solvers in > numpy.linalg (and scipy.linalg) were vectorized for 3D arrays. Amen. > Another question is if you really need to compute the inverse. A matrix > inversion and subsequent matrix multiplication can be replaced by > solving a linear system, which only takes half the amount of computation. Good idea. One possibility for avoiding cython for the inverse is to populate a sparse matrix in pysparse (or scipy) with the small matrices and then linear solve as I don't need the explicit inverse. However, that doesn't help with the eigenvalues. -- Daniel Wheeler From nouiz at nouiz.org Tue Jul 12 12:25:42 2011 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Tue, 12 Jul 2011 12:25:42 -0400 Subject: [Numpy-discussion] New arrays in 1.6 not always C-contiguous In-Reply-To: <4E17355F.4000209@molden.no> References: <1310040632.1736.85.camel@casimir> <4E17355F.4000209@molden.no> Message-ID: Hi, We depend highly on numpy, but don't have the time to follow all the mailing lists regularly of all our tools. Having some information on the release note about this would have been useful to many people I think. Also, did this affect the C-API? Do the default value of newlly created ndarray in C changed? Thanks for the great work. Fr?d?ric Bastien On Fri, Jul 8, 2011 at 12:50 PM, Sturla Molden wrote: > Den 07.07.2011 14:10, skrev Jens J?rgen Mortensen: >> So, this means I can't count on new arrays being C-contiguous any more. >> I guess there is a good reason for this. > > Work with linear algebra (LAPACK) caused excessive and redundant array > transpositions. Arrays would be transposed from C to Fortran order > before they were passed to LAPACK, and returned arrays were transposed > from Fortran to C order when used in Python. Signal and image processing > in SciPy (FFTPACK) suffered from the same issue, as did certain > optimization (MINPACK). Computer graphics with OpenGL was similarly > impaired. The OpenGL library has a C frontent, but requires that all > buffers and matrices are stored in Fortran order. > > The old behaviour of NumPy was very annoying. Now we can rely on NumPy > to always use the most efficient memory layout, unless we request one in > particular. 
> > Yeah, and it also made NumPy look bad compared to Matlab, which always > uses Fortran order for this reason ;-) > > Sturla > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From mwwiebe at gmail.com Tue Jul 12 12:48:16 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 12 Jul 2011 11:48:16 -0500 Subject: [Numpy-discussion] New arrays in 1.6 not always C-contiguous In-Reply-To: References: <1310040632.1736.85.camel@casimir> <4E17355F.4000209@molden.no> Message-ID: 2011/7/12 Fr?d?ric Bastien > Hi, > > We depend highly on numpy, but don't have the time to follow all the > mailing lists regularly of all our tools. Having some information on > the release note about this would have been useful to many people I > think. > You're absolutely right, not including information on this was an oversight on our part. I apologize for that. Also, did this affect the C-API? Do the default value of newlly > created ndarray in C changed? > This only added to the C-API, pre-existing API remained the same for API/ABI compatibility reasons. C code already had to deal with the possibility of differing memory layouts, for example if someone passes in carr.T, something in Fortran order. This change primarily affected the output layout of ufuncs, newly created ndarrays continue to be default 'C' order. > > Thanks for the great work. > Thanks, Mark > > Fr?d?ric Bastien > > On Fri, Jul 8, 2011 at 12:50 PM, Sturla Molden wrote: > > Den 07.07.2011 14:10, skrev Jens J?rgen Mortensen: > >> So, this means I can't count on new arrays being C-contiguous any more. > >> I guess there is a good reason for this. > > > > Work with linear algebra (LAPACK) caused excessive and redundant array > > transpositions. Arrays would be transposed from C to Fortran order > > before they were passed to LAPACK, and returned arrays were transposed > > from Fortran to C order when used in Python. Signal and image processing > > in SciPy (FFTPACK) suffered from the same issue, as did certain > > optimization (MINPACK). Computer graphics with OpenGL was similarly > > impaired. The OpenGL library has a C frontent, but requires that all > > buffers and matrices are stored in Fortran order. > > > > The old behaviour of NumPy was very annoying. Now we can rely on NumPy > > to always use the most efficient memory layout, unless we request one in > > particular. > > > > Yeah, and it also made NumPy look bad compared to Matlab, which always > > uses Fortran order for this reason ;-) > > > > Sturla > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Tue Jul 12 14:47:18 2011 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Tue, 12 Jul 2011 14:47:18 -0400 Subject: [Numpy-discussion] New arrays in 1.6 not always C-contiguous In-Reply-To: References: <1310040632.1736.85.camel@casimir> <4E17355F.4000209@molden.no> Message-ID: Hi, On Tue, Jul 12, 2011 at 12:48 PM, Mark Wiebe wrote: [...] > > This only added to the C-API, pre-existing API remained the same for API/ABI > compatibility reasons. 
C code already had to deal with the possibility of > differing memory layouts, for example if someone passes in carr.T, something > in Fortran order. > This change primarily affected the output layout of ufuncs, newly created > ndarrays continue to be default 'C' order. In that case, there won't be problem with our software(Theano). We already handle correctly all c order for the input, but frequently we allocate some outputs memory and I'm not sure if in all case we check the stride or suppose it is C contiguous. thanks Fr?d?ric Bastien From jhibschman+numpy at gmail.com Tue Jul 12 15:40:25 2011 From: jhibschman+numpy at gmail.com (Johann Hibschman) Date: Tue, 12 Jul 2011 14:40:25 -0500 Subject: [Numpy-discussion] object scalars References: Message-ID: Olivier Delalleau writes: > 2011/7/12 Johann Hibschman > > Is there any way to wrap a sequence (in particular a python list) as a > numpy object scalar, without it being promoted to an object array? > I found a workaround but it's a bit ugly: > def some_call(x): > ? rval = numpy.array(None, dtype='object') > ? rval.fill(x) > ? return rval Thanks, that works for me, as does "rval[()] = x" instead of "rval.fill(x)". Regards, Johann From pwang at streamitive.com Tue Jul 12 16:00:14 2011 From: pwang at streamitive.com (Peter Wang) Date: Tue, 12 Jul 2011 15:00:14 -0500 Subject: [Numpy-discussion] Scipy 2011 Convore thread now open Message-ID: Hi folks, I have gone ahead and created a Convore group for the SciPy 2011 conference: https://convore.com/scipy-2011/ I have already created threads for each of the tutorial topics, and once the conference is underway, we'll create threads for each talk, so that audience can interact and post questions. Everyone is welcome to create topics of their own, in addition to the "official" conference topics. For those who are unfamiliar with Convore, it is a cross between a mailing list and a very souped-up IRC. It's usable for aynchronous discussion, but great for realtime, topical chats. Those of you who were at PyCon this year probably saw what a wonderful tool Convore proved to be for a tech conference. People used it for everything from BoF planning to dinner coordination to good-natured heckling of lightning talk speakers. I'm hoping that it will be used to similarly good effect for the SciPy Cheers, Peter From craigyk at me.com Tue Jul 12 19:39:47 2011 From: craigyk at me.com (Craig Yoshioka) Date: Tue, 12 Jul 2011 16:39:47 -0700 Subject: [Numpy-discussion] named ndarray axes Message-ID: <0FC8B43E-26CD-40ED-A6FA-59DD8D641998@me.com> I brought up a while ago about how it would be nice if numpy arrays could have their axes 'labeled'. = I got an implementation that works pretty well for me and in the process learned quite a few things, and was hoping to foster some more discussion on this topic, as I think I have found a simple/flexible solution to support this at the numpy level. 
Here are *some* examples code snippets from my unit tests on 'Array': a = Array((4,5,6)) # you can assign data to all axes by attribute: a.Axes.Label = (0:'z',1:'y',2:'x'} # or add metadata to each individually: a.Axes[1].Vector = [0,1,0] a.Axes[2].Vector = [0,0,1] # simple case int indexing b = a[0] assert b.shape == (5,6) assert b.Axes.Label == {0:'y',1:'x'} assert b.Axes.Vector == {0:[0,1,0],1:[0,0,1]} # indexing with slices b = a[:,0,:] assert b.shape == (4,6) assert b.Axes.Label == {0:'z',1:'x'} assert b.Axes.Vector == {1:[0,0,1]} # indexing with ellipsis b = a[...,0] assert b.shape == (4,5) assert b.Axes.Label == {0:'z',1:'y'} # indexing with ellipsis, newaxis, etc. b = a[newaxis,...,2,newaxis] assert b.shape == (1,4,5,1) assert b.Axes.Label == {1:'z',2:'y'} # indexing with lists b = a[[1,2],:,[1,2]] assert b.shape == (2,5) assert b.Axes.Label == {0:'z',1:'y'} # most interesting examples, indexing with axes labels---------------- # I was a bit confused about how to handle indexing with mixed axes/non-axes indexes # IE: what does a['x',2:4] mean? on what axis is the 2:4 slice being applied, the first? the first after 'x'? # One option is to disallow mixing (simpler to implement, understand?) # Instead I chose to treat the axis indexing as a forced assignment of an axis to a position. # axis indexing that transposes the first two dimensions, but otherwise does nothing b = a['y','z'] assert b.shape == (5,4,6) assert b.Axes.Label == {0:'y',1:'z',2:'x'} # abusing slices to allow specifying indexes for axes b = a['y':0,'z'] assert b.shape == (4,6) assert b.Axes.Label == {0:'z',1:'x'} # unfortunately that means a slice index on an axis must be written like so: b = a['y':slice(0,2),'x','z'] assert b.shape == (2,6,4) assert b.Axes.Label == {0:'y',1:'x',2:'z'} b = a['y':[1,2,3],'x','z':slice(0:1)] # or due to the forced transposition, this is the same as: c = a['y','x','z'][[1,2,3],:,0:1] assert b.shape == (3,6,1) assert b.Axes.Label == {0:'y',1:'x',2:'z'} assert b.shape == c.shape assert b.Axes == c.Axes #---------------------------------------------------------------------------------------- To do all this I essentially had to recreate the effects of numpy indexing on axes.... This is not ideal, but so far I seem to have addressed most of the indexing I've used, at least. Here is what __getitem__ looks like: def __getitem__(self,idxs): filtered_idxs,transposed_axes,kept_axes = self.idx_axes(idxs) array = self.view(ndarray).transpose(transposed_axes) array = array[filtered_idxs] if isinstance(array,ndarray): array = array.view(Array) array.Axes = self.Axes.keep(kept_axes) return array As you can see idx_axes() essentially recreates a lot of ndarray indexing behavior, so that its effects can be explicitly handled. Having done all this, I think the best way for numpy to support 'labeled' axes in the future is by having numpy itself keep track of a very simple tuple attribute, like shape, and leave more complex axis naming/labeling to subclasses on the python side. As an example, upon creating a new dimension in an array, numpy assigns that dimension a semi-unique id, and this tuple could be used in __array_finalize__. 
For example my __array_finalize__ could look like: def __array_finalize__(self,obj): if hasattr(obj,'axesdata'): for axesid in self.axes: if axesid in obj.axes: self.axesdata[axesid] = obj.axesdata[axesid] This would cover a lot more situations and lead to much simpler code since the work required on the C side would be minimal, but still allow robust and custom tracking and propagation of axes information. Subclasses that tap into this data would react to the result of numpy operations vs. having to predict/anticipate. For example, my __getitem__, relying on the __array_finalize__ above, could look like: def __getitem__(self,idxs): filtered_idxs,transposed_axes= self.idx_axes(idxs) array = self.transpose(transposed_axes) return array[filtered_idxs] Not shown is how much simpler and robust the code for idx_axes would then be. I estimate it would go from 130 loc to < 20 loc. Sorry for the extra long e-mail, -Craig From developer at studioart.org Wed Jul 13 01:36:12 2011 From: developer at studioart.org (Long Duong) Date: Tue, 12 Jul 2011 22:36:12 -0700 Subject: [Numpy-discussion] Scipy 2011 Convore thread now open In-Reply-To: References: Message-ID: Does anybody know if there are there videos of the conference this year? Best regards, Long Duong UC Irvine Biomedical Engineering long at studioart.org On Tue, Jul 12, 2011 at 1:00 PM, Peter Wang wrote: > Hi folks, > > I have gone ahead and created a Convore group for the SciPy 2011 > conference: > > https://convore.com/scipy-2011/ > > I have already created threads for each of the tutorial topics, and > once the conference is underway, we'll create threads for each talk, > so that audience can interact and post questions. Everyone is welcome > to create topics of their own, in addition to the "official" > conference topics. > > For those who are unfamiliar with Convore, it is a cross between a > mailing list and a very souped-up IRC. It's usable for aynchronous > discussion, but great for realtime, topical chats. Those of you who > were at PyCon this year probably saw what a wonderful tool Convore > proved to be for a tech conference. People used it for everything > from BoF planning to dinner coordination to good-natured heckling of > lightning talk speakers. I'm hoping that it will be used to similarly > good effect for the SciPy > > > Cheers, > Peter > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Jul 13 12:42:37 2011 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 13 Jul 2011 11:42:37 -0500 Subject: [Numpy-discussion] Scipy 2011 Convore thread now open In-Reply-To: References: Message-ID: On Wed, Jul 13, 2011 at 00:36, Long Duong wrote: > > Does anybody know if there are there videos of the conference this year? Yes. Announcements will be made when they start going online. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From ischnell at enthought.com Wed Jul 13 17:10:49 2011 From: ischnell at enthought.com (Ilan Schnell) Date: Wed, 13 Jul 2011 16:10:49 -0500 Subject: [Numpy-discussion] Fwd: [Enthought-Dev] Linking issues with libX11.so In-Reply-To: References: Message-ID: Hello List, Varun, who is a debian packager ran into some problems while compiling Enable, as it uses numpy.distutils, which did not locate the location of the X11 libraries correctly. Maybe this can be fixed in the numpy 1.6.1 release. Please the the forwarded conversation below. Thanks Ilan ---------- Forwarded message ---------- From: Varun Hiremath Date: Wed, Jul 13, 2011 at 4:03 PM Subject: [Enthought-Dev] Linking issues with libX11.so To: enthought-dev at enthought.com Hi Ilan, You were right, the issue was with the X11 library. The _plat_support.so was not linked to libX11 and so the chaco examples were failing; and the reason libX11 was not linked was because numpy distutils' x11_info was failing. I figured out that in debian/ubuntu with a new multi_arch build support [1] the libraries are being moved from the standard /usr/lib and /usr/lib64 directories to architecture specific directories like: /usr/lib/i386-linux-gnu/ /usr/lib/x86_64-linux-gnu/ and so the numpy.distutil was failing to find libX11 with the latest version of libX11-dev on debian which installs libX11.so in /usr/lib/x86_64-linux-gnu/ (on my amd64 system). The nump.distutils' scripts need to be updated to handle this, but for now I am using the following patch to force _plat_support.so link with X11 (which I think is always present in the default search path): --------------------------- @@ -230,6 +144,7 @@ ? ? elif plat in ['x11','gtk1']: ? ? ? ? x11_info = get_info('x11', notfound_action=1) ? ? ? ? dict_append(plat_info, **x11_info) + ? ? ? ?dict_append(plat_info, libraries = ['X11']) --------------------------- With this everything seems to be working fine! Thanks, Varun [1] https://wiki.ubuntu.com/MultiarchSpec On Mon, Jul 11, 2011 at 10:53 PM, Ilan Schnell wrote: > Hello Varun, > > the important part is: _plat_support.so: undefined symbol: XCreateImage > This indicates that the kiva/agg/_plat_support.so C extension was > not linked to X11 while compiling. ?Until > https://github.com/enthought/enable/commit/ebecdbfc5c4596282204e61ff687c3ab2442947a > which was made shortly after the release, it was easy to create a broken > Enable build like this one. ?Note that this commit does *not* fix the problem, > it only causes the build to fail right away, instead of creating a broken build. > This was added because of the famous esr quote: > "When you must fail, fail noisily and as soon as possible." > > As enable uses numpy.distutils to build agg, the fix is to edit: > /lib/python2.6/site-packages/numpy/distutils/site.cfg > > and add: > [x11] > library_dirs = ... > include_dirs = ... > > - Ilan > > > On Mon, Jul 11, 2011 at 9:23 PM, Varun Hiremath wrote: >> Hi all, >> >> I am facing another issue running chaco examples with the new ETS 4.0 >> packages. I am getting the following error when I run any chaco >> example: >> -------------------------- >> $$ python zoom_plot.py >> /usr/lib/python2.6/dist-packages/enable/wx/image.py:16: Warning: Error initializing Agg: /usr/lib/python2.6/dist-packages/kiva/agg/_plat_support.so: undefined symbol: XCreateImage >> ?from kiva.agg import CompiledPath, GraphicsContextSystem as GraphicsContext >> Traceback (most recent call last): >> ?File "zoom_plot.py", line 15, in >> ? 
?from enable.api import Component, ComponentEditor >> ?File "/usr/lib/python2.6/dist-packages/enable/api.py", line 42, in >> ? ?from graphics_context import GraphicsContextEnable, ImageGraphicsContextEnable >> ?File "/usr/lib/python2.6/dist-packages/enable/graphics_context.py", line 86, in >> ? ?class GraphicsContextEnable(EnableGCMixin, GraphicsContext): >> TypeError: Error when calling the metaclass bases >> ? ?metaclass conflict: the metaclass of a derived class must be a (non-strict) subclass of the metaclasses of all its bases >> ------------------------- >> >> Does anybody know what could be the problem? >> >> Thanks, >> Varun >> >> p.s. Most of the ETS 4.0 debian packages are now available in debian unstable. >> >> >> On Sat, 09 Jul, 2011 at 12:45:32PM -0500, Ilan Schnell wrote: >>> I'm glad it worked. ?That's a good idea, I'll release traitsui-4.0.1 >>> later today. >>> >>> - Ilan >>> >>> >>> On Sat, Jul 9, 2011 at 11:20 AM, Varun Hiremath wrote: >>> > Hi Ilan, >>> > >>> > Thanks, that worked! Are you planning on doing a point release for >>> > traitsui to fix this bug? It would make packaging easier then. >>> > >>> > Thanks, >>> > Varun >>> > >>> > On Sat, 09 Jul, 2011 at 11:11:26AM -0500, Ilan Schnell wrote: >>> >> Hello Varun, >>> >> >>> >> I ran into the same bug when preparing the EPD 7.1 release. >>> >> The fix is commited to the github master of traitsui: >>> >> https://github.com/enthought/traitsui/commit/4f36a8a27cfa131347dd90d1a8e10a37358cf634 >>> >> >>> >> Just replace the two zip-files with the fixed ones, and it should work. >>> >> >>> >> - Ilan >>> >> >>> >> >>> >> On Sat, Jul 9, 2011 at 10:27 AM, Varun Hiremath wrote: >>> >> > Hi, >>> >> > >>> >> > I was trying to update the debian packages to the new ETS 4.0 release, >>> >> > but I am having some trouble getting mayavi2 running. I get the error >>> >> > shown below when I run mayavi2. Could someone please let me know what >>> >> > might be causing this error? >>> >> > >>> >> > Thanks, >>> >> > Varun >>> >> > >>> >> > ----------------- >>> >> > $ mayavi2 >>> >> > Traceback (most recent call last): >>> >> > ?File "/usr/bin/mayavi2", line 658, in >>> >> > ? ?main() >>> >> > ?File "/usr/bin/mayavi2", line 649, in main >>> >> > ? ?mayavi.main(sys.argv[1:]) >>> >> > ?File "/usr/lib/python2.6/dist-packages/mayavi/plugins/app.py", line 195, in main >>> >> > ? ?app.run() >>> >> > ?File "/usr/lib/python2.6/dist-packages/mayavi/plugins/mayavi_workbench_application.py", line 81, in run >>> >> > ? ?window.open() >>> >> > ?File "/usr/lib/python2.6/dist-packages/pyface/workbench/workbench_window.py", line 144, in open >>> >> > ? ?self._create() >>> >> > ?File "/usr/lib/python2.6/dist-packages/pyface/ui/wx/application_window.py", line 150, in _create >>> >> > ? ?contents = self._create_contents(body) >>> >> > ?File "/usr/lib/python2.6/dist-packages/pyface/workbench/workbench_window.py", line 217, in _create_contents >>> >> > ? ?contents = self.layout.create_initial_layout(parent) >>> >> > ?File "/usr/lib/python2.6/dist-packages/pyface/ui/wx/workbench/workbench_window_layout.py", line 151, in create_initial_layout >>> >> > ? ?self._wx_view_dock_window = WorkbenchDockWindow(parent) >>> >> > ?File "/usr/lib/python2.6/dist-packages/pyface/dock/dock_window.py", line 324, in __init__ >>> >> > ? ?if self.theme.use_theme_color: >>> >> > ?File "/usr/lib/python2.6/dist-packages/pyface/dock/dock_window.py", line 335, in _theme_default >>> >> > ? 
?return dock_window_theme() >>> >> > ?File "/usr/lib/python2.6/dist-packages/traitsui/dock_window_theme.py", line 92, in dock_window_theme >>> >> > ? ?from .default_dock_window_theme import default_dock_window_theme >>> >> > ?File "/usr/lib/python2.6/dist-packages/traitsui/default_dock_window_theme.py", line 39, in >>> >> > ? ?label = ( 0, -3 ), content = ( 7, 6, 0, 0 ) ), >>> >> > ?File "/usr/lib/python2.6/dist-packages/traitsui/theme.py", line 63, in __init__ >>> >> > ? ?self.image = image >>> >> > ?File "/usr/lib/python2.6/dist-packages/traitsui/ui_traits.py", line 229, in validate >>> >> > ? ?self.error( object, name, value ) >>> >> > ?File "/usr/lib/python2.6/dist-packages/traits/trait_handlers.py", line 168, in error >>> >> > ? ?value ) >>> >> > traits.trait_errors.TraitError: The 'image' trait of a Theme instance must be an ImageResource or string that can be used to define one, but a value of '@std:tab_active' was specified. >>> >> > Exception in thread Thread-1: >>> >> > Traceback (most recent call last): >>> >> > ?File "/usr/lib/python2.6/threading.py", line 532, in __bootstrap_inner >>> >> > ? ?self.run() >>> >> > ?File "/usr/lib/python2.6/threading.py", line 484, in run >>> >> > ? ?self.__target(*self.__args, **self.__kwargs) >>> >> > ?File "/usr/lib/python2.6/dist-packages/traitsui/image/image.py", line 329, in _process >>> >> > ? ?if time() > (self.time_stamp + 2.0): >>> >> > TypeError: 'NoneType' object is not callable >>> >> > --------------- >>> >> > _______________________________________________ >>> >> > Enthought-Dev mailing list >>> >> > Enthought-Dev at mail.enthought.com >>> >> > https://mail.enthought.com/mailman/listinfo/enthought-dev >>> >> > >>> >> _______________________________________________ >>> >> Enthought-Dev mailing list >>> >> Enthought-Dev at mail.enthought.com >>> >> https://mail.enthought.com/mailman/listinfo/enthought-dev >>> > _______________________________________________ >>> > Enthought-Dev mailing list >>> > Enthought-Dev at mail.enthought.com >>> > https://mail.enthought.com/mailman/listinfo/enthought-dev >>> > >>> _______________________________________________ >>> Enthought-Dev mailing list >>> Enthought-Dev at mail.enthought.com >>> https://mail.enthought.com/mailman/listinfo/enthought-dev >> _______________________________________________ >> Enthought-Dev mailing list >> Enthought-Dev at mail.enthought.com >> https://mail.enthought.com/mailman/listinfo/enthought-dev >> > _______________________________________________ > Enthought-Dev mailing list > Enthought-Dev at mail.enthought.com > https://mail.enthought.com/mailman/listinfo/enthought-dev > _______________________________________________ Enthought-Dev mailing list Enthought-Dev at mail.enthought.com https://mail.enthought.com/mailman/listinfo/enthought-dev From samquinan at gmail.com Wed Jul 13 18:13:18 2011 From: samquinan at gmail.com (Sam Quinan) Date: Wed, 13 Jul 2011 17:13:18 -0500 Subject: [Numpy-discussion] named ndarray axes In-Reply-To: Message-ID: I'm currently working on interfacing ndarrays with a custom C-representation for n-dimensional arrays. My custom C code provides additional per-axis information (labeling, spacing between samples / range of sample positions along the axis, axis direction, cell vs.node centering, etc.) Subclassing ndarray to hold onto this info is fairly simple, but getting numpy's methods to intelligently modify that information when the array is sliced is something that I'm still trying to figure out. 
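The "fairly simple" part is roughly the sketch below (axisinfo is a made-up stand-in for the real per-axis structure):

import numpy as np

class AxisArray(np.ndarray):
    def __new__(cls, data, axisinfo=None):
        obj = np.asarray(data).view(cls)
        obj.axisinfo = axisinfo if axisinfo is not None else {}
        return obj

    def __array_finalize__(self, obj):
        # runs for explicit construction, views and slices alike, but it
        # only sees the parent array, so it can copy the metadata across
        # without knowing which axes the slice just dropped or reordered
        self.axisinfo = getattr(obj, 'axisinfo', {})

That is exactly where the per-axis info goes stale, so some slicing-aware bookkeeping on top of it seems unavoidable.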
A robust way to attach per-axis info to a given ndarray, whether it just be a label or some more complex structure, would definitely be something I (and likely others) would find useful... That said, I'd love to know more about how the idx_axes() structure in your workaround works... - Sam On 7/13/11 12:00 PM, "numpy-discussion-request at scipy.org" wrote: > Date: Tue, 12 Jul 2011 16:39:47 -0700 > From: Craig Yoshioka > Subject: [Numpy-discussion] named ndarray axes > To: NumPy-Discussion at scipy.org > Message-ID: <0FC8B43E-26CD-40ED-A6FA-59DD8D641998 at me.com> > Content-Type: text/plain; CHARSET=US-ASCII > > I brought up a while ago about how it would be nice if numpy arrays could have > their axes 'labeled'. = I got an implementation that works pretty well for > me and in the process learned quite a few things, and was hoping to foster > some more discussion on this topic, as I think I have found a simple/flexible > solution to support this at the numpy level. > > Here are *some* examples code snippets from my unit tests on 'Array': > > a = Array((4,5,6)) > > # you can assign data to all axes by attribute: > a.Axes.Label = (0:'z',1:'y',2:'x'} > > # or add metadata to each individually: > a.Axes[1].Vector = [0,1,0] > a.Axes[2].Vector = [0,0,1] > > # simple case int indexing > b = a[0] > assert b.shape == (5,6) > assert b.Axes.Label == {0:'y',1:'x'} > assert b.Axes.Vector == {0:[0,1,0],1:[0,0,1]} > > # indexing with slices > b = a[:,0,:] > assert b.shape == (4,6) > assert b.Axes.Label == {0:'z',1:'x'} > assert b.Axes.Vector == {1:[0,0,1]} > > # indexing with ellipsis > b = a[...,0] > assert b.shape == (4,5) > assert b.Axes.Label == {0:'z',1:'y'} > > # indexing with ellipsis, newaxis, etc. > b = a[newaxis,...,2,newaxis] > assert b.shape == (1,4,5,1) > assert b.Axes.Label == {1:'z',2:'y'} > > # indexing with lists > b = a[[1,2],:,[1,2]] > assert b.shape == (2,5) > assert b.Axes.Label == {0:'z',1:'y'} > > # most interesting examples, indexing with axes labels---------------- > # I was a bit confused about how to handle indexing with mixed > axes/non-axes indexes > # IE: what does a['x',2:4] mean? on what axis is the 2:4 slice being > applied, the first? the first after 'x'? > # One option is to disallow mixing (simpler to implement, > understand?) > # Instead I chose to treat the axis indexing as a forced assignment > of an axis to a position. > > # axis indexing that transposes the first two dimensions, but otherwise > does nothing > b = a['y','z'] > assert b.shape == (5,4,6) > assert b.Axes.Label == {0:'y',1:'z',2:'x'} > > # abusing slices to allow specifying indexes for axes > b = a['y':0,'z'] > assert b.shape == (4,6) > assert b.Axes.Label == {0:'z',1:'x'} > > # unfortunately that means a slice index on an axis must be written like > so: > b = a['y':slice(0,2),'x','z'] > assert b.shape == (2,6,4) > assert b.Axes.Label == {0:'y',1:'x',2:'z'} > > b = a['y':[1,2,3],'x','z':slice(0:1)] > # or due to the forced transposition, this is the same as: > c = a['y','x','z'][[1,2,3],:,0:1] > > assert b.shape == (3,6,1) > assert b.Axes.Label == {0:'y',1:'x',2:'z'} > assert b.shape == c.shape > assert b.Axes == c.Axes > > > #----------------------------------------------------------------------------- > ----------- > > > To do all this I essentially had to recreate the effects of numpy indexing on > axes.... This is not ideal, but so far I seem to have addressed most of the > indexing I've used, at least. 
Here is what __getitem__ looks like: > > def __getitem__(self,idxs): > filtered_idxs,transposed_axes,kept_axes = self.idx_axes(idxs) > array = self.view(ndarray).transpose(transposed_axes) > array = array[filtered_idxs] > if isinstance(array,ndarray): > array = array.view(Array) > array.Axes = self.Axes.keep(kept_axes) > return array > > As you can see idx_axes() essentially recreates a lot of ndarray indexing > behavior, so that its effects can be explicitly handled. > > Having done all this, I think the best way for numpy to support 'labeled' axes > in the future is by having numpy itself keep track of a very simple tuple > attribute, like shape, and leave more complex axis naming/labeling to > subclasses on the python side. As an example, upon creating a new dimension > in an array, numpy assigns that dimension a semi-unique id, and this tuple > could be used in __array_finalize__. > > For example my __array_finalize__ could look like: > > def __array_finalize__(self,obj): > if hasattr(obj,'axesdata'): > for axesid in self.axes: > if axesid in obj.axes: > self.axesdata[axesid] = obj.axesdata[axesid] > > > This would cover a lot more situations and lead to much simpler code since the > work required on the C side would be minimal, but still allow robust and > custom tracking and propagation of axes information. > Subclasses that tap into this data would react to the result of numpy > operations vs. having to predict/anticipate. > > For example, my __getitem__, relying on the __array_finalize__ above, could > look like: > > def __getitem__(self,idxs): > filtered_idxs,transposed_axes= self.idx_axes(idxs) > array = self.transpose(transposed_axes) > return array[filtered_idxs] > > Not shown is how much simpler and robust the code for idx_axes would then be. > I estimate it would go from 130 loc to < 20 loc. > > Sorry for the extra long e-mail, > -Craig From samquinan at gmail.com Wed Jul 13 18:47:59 2011 From: samquinan at gmail.com (Sam Quinan) Date: Wed, 13 Jul 2011 17:47:59 -0500 Subject: [Numpy-discussion] Fate of Numpy's Array Interface In-Reply-To: Message-ID: Hey, So I'm working on interfacing numpy ndarrays with an n-dimensional array representation that exists as part of a massive custom C library. Due to the size of the library, hand-coding a c-extension for the library just was not really an option; so we wound up using gcc_xml to generate the proper ctypes code. This works great for accessing our C functions within python, but not so much for trying share memory between numpy and our custom array representations... Passing a pointer to the numpy array data to ctypes is fairly simple, but figuring out the proper way to get memory from ctypes into numpy has been problematic. I know that PEP 3118 is supposed to be superseding the numpy array interface, but PEP 3118 can only be specified on the C side, which is problematic for anybody using ctypes to wrap their C code. The legacy __array_interface__ allows for a python side specification of data buffers, but there appears to be no corresponding interface capability in the PEP 3118 protocol. On top of that add the fact that Python's own support for PEP 3118 has some major bugs (ctypes throwing invalid PEP 3118 codes - http://bugs.python.org/issue10746 :: issues with python's memoryview object - http://bugs.python.org/issue10181), and PEP 3118 seems like a nightmare to deal with. At the same time though, I don't want to simply use the legacy array interface if it's going to be completely deprecated in the near future. 
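For concreteness, the python-side route I mean looks roughly like this (a sketch only: the wrapper class and buffer are made up, and the typestr assumes little-endian float64):

import ctypes
import numpy as np

cbuf = (ctypes.c_double * 12)()      # stand-in for memory owned by the C library

class SharedBuffer(object):
    def __init__(self, buf, shape):
        self._buf = buf              # keep the underlying memory alive
        self.__array_interface__ = {
            'shape': shape,
            'typestr': '<f8',        # little-endian float64 (an assumption)
            'data': (ctypes.addressof(buf), False),  # (address, read-only flag)
            'version': 3,
        }

a = np.asarray(SharedBuffer(cbuf, (3, 4)))   # shares memory with cbuf, no copy

There is no comparably simple python-side hook for PEP 3118, since the buffer protocol has to be implemented at the C level.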
How long before the legacy __array_interface__ goes the way of the dodo? When that happens, are there plans to add support for a python side interface to the PEP 3118 protocol? If not, what is the proper way to interface a ctypes wrapped library with PEP 3118? Thanks, - Sam Quinan From craigyk at me.com Wed Jul 13 19:15:06 2011 From: craigyk at me.com (Craig Yoshioka) Date: Wed, 13 Jul 2011 16:15:06 -0700 Subject: [Numpy-discussion] named ndarray axes In-Reply-To: References: Message-ID: <0C413075-3343-4C9B-8D17-AEE2A8E1593B@me.com> Yup exactly. To enable this sort of tracking I needed to explicitly reverse-engineer the effects of indexing on axes. I figure overriding indexing catches most cases that modify axes, but other holes need to be plugged as well... ie: tranpose, swapaxes. This probably means most C functions that change array axes (np.mean(axis=), etc.) need to be covered as well.... that sucks. BTW, it sounds like you're trying to track very similar data. I am trying to load structural biology data formats, and I try to preserve as much of the metadata as possible, ie: I am encoding unit cell length/angle information as vectors, etc. Here is my implementation: def __getitem__(self,idxs): idxs,trans,keep = idx_axes(self,idxs) array = self.view(np.ndarray).transpose(trans) array = array[idxs] if isinstance(array,ndarray): array = array.view(self.__class__) array.axes = self.axes.transpose(keep) return array def idx_axes(array,idxs): # explicitly expand ellipsis expanded_idxs = idx_expanded(array.ndim,idxs) # determine how the axes will be rearranged as a result of axes-based indexing # and the creation of newaxes remapped_axes = idx_axes_remapped(array.ndim,array.axes,expanded_idxs) # determine numpy compatible transpose, before newaxes are created transposed_axes = idx_axes_transposed(remapped_axes) # determine numpy compatible indexes with axes-based indexing removed filtered_idxs = idx_filtered(expanded_idxs) # determine which axes will be kept after numpy indexing kept_axes = idx_axes_kept(remapped_axes,filtered_idxs) return filtered_idxs,transposed_axes,kept_axes def idx_expanded(ndim,idxs): ''' explicitly expands ellipsis taking into account newaxes ''' if not isinstance(idxs,tuple): return idx_expanded(ndim,(idxs,)) # how many dimensions we will end up having ndim = ndim + idxs.count(newaxis) filler = slice(None) def skip_ellipsis(idxs): return tuple([filler if isinstance(x,type(Ellipsis)) else x for x in idxs]) def fill_ellipsis(ndim,l,r): return (filler,)*(ndim-len(l)-len(r)) # expand first ellipsis, treat all other ellipsis as slices if Ellipsis in idxs: idx = idxs.index(Ellipsis) llist = idxs[:idx] rlist = skip_ellipsis(idxs[idx+1:]) cfill = fill_ellipsis(ndim,llist,rlist) idxs = llist + cfill + rlist return idxs def idx_filtered(idxs): ''' replace indexes on axes with normal numpy indexes ''' def axisAsIdx(idx): if isinstance(idx,str): return slice(None) elif isinstance(idx,slice): if isinstance(idx.start,str): return idx.stop return idx return tuple([axisAsIdx(x) for x in idxs]) def idx_axes_remapped(ndim,axes,idxs): ''' if given a set of array indexes that contain labeled axes, return a tuple that maps axes in the source array to the axes as they will end up in the destination array. Must take into account the spaces created by newaxis indexes. ''' # how many dims are we expecting? 
ndim = ndim + idxs.count(newaxis) # new unique object for marking unassigned locations in mapping unassigned = object() # by default all locations are unsassigned mapping = [unassigned] * ndim # find labels in indexes and set the dims for those locations for dim,label in enumerate(idxs): if label == newaxis: mapping[dim] = label elif isinstance(label,str): mapping[dim] = axes.dimForLabel(label) elif isinstance(label,slice): if isinstance(label.start,str): mapping[dim] = axes.dimForLabel(label.start) # find unassigned dims, in order unmapped = [d for d in range(ndim) if d not in set(mapping)] # fill in remaining unassigned locations with dims for dst,src in enumerate(mapping): if src == unassigned: mapping[dst] = unmapped.pop(0) return tuple(mapping) def idx_axes_transposed(mapping): ''' stripping out newaxes in mapping yields a tuple compatible with transpose ''' return tuple([x for x in mapping if x != newaxis]) def idx_axes_kept(mapping,idxs): ''' remove axes from mapping that will not survive the indexing (ie: ints) ''' kept = [] first_list = True for dst,src in enumerate(mapping): if dst < len(idxs): idx = idxs[dst] if isinstance(idx,int): continue elif isinstance(idx,list): if not first_list: continue first_list = False kept += [src] return tuple(kept) On Jul 13, 2011, at 3:13 PM, Sam Quinan wrote: > I'm currently working on interfacing ndarrays with a custom C-representation > for n-dimensional arrays. My custom C code provides additional per-axis > information (labeling, spacing between samples / range of sample positions > along the axis, axis direction, cell vs.node centering, etc.) Subclassing > ndarray to hold onto this info is fairly simple, but getting numpy's methods > to intelligently modify that information when the array is sliced is > something that I'm still trying to figure out. > > A robust way to attach per-axis info to a given ndarray, whether it just be > a label or some more complex structure, would definitely be something I (and > likely others) would find useful... > > That said, I'd love to know more about how the idx_axes() structure in your > workaround works... > > - Sam > > > > On 7/13/11 12:00 PM, "numpy-discussion-request at scipy.org" > wrote: > >> Date: Tue, 12 Jul 2011 16:39:47 -0700 >> From: Craig Yoshioka >> Subject: [Numpy-discussion] named ndarray axes >> To: NumPy-Discussion at scipy.org >> Message-ID: <0FC8B43E-26CD-40ED-A6FA-59DD8D641998 at me.com> >> Content-Type: text/plain; CHARSET=US-ASCII >> >> I brought up a while ago about how it would be nice if numpy arrays could have >> their axes 'labeled'. = I got an implementation that works pretty well for >> me and in the process learned quite a few things, and was hoping to foster >> some more discussion on this topic, as I think I have found a simple/flexible >> solution to support this at the numpy level. 
>> >> Here are *some* examples code snippets from my unit tests on 'Array': >> >> a = Array((4,5,6)) >> >> # you can assign data to all axes by attribute: >> a.Axes.Label = (0:'z',1:'y',2:'x'} >> >> # or add metadata to each individually: >> a.Axes[1].Vector = [0,1,0] >> a.Axes[2].Vector = [0,0,1] >> >> # simple case int indexing >> b = a[0] >> assert b.shape == (5,6) >> assert b.Axes.Label == {0:'y',1:'x'} >> assert b.Axes.Vector == {0:[0,1,0],1:[0,0,1]} >> >> # indexing with slices >> b = a[:,0,:] >> assert b.shape == (4,6) >> assert b.Axes.Label == {0:'z',1:'x'} >> assert b.Axes.Vector == {1:[0,0,1]} >> >> # indexing with ellipsis >> b = a[...,0] >> assert b.shape == (4,5) >> assert b.Axes.Label == {0:'z',1:'y'} >> >> # indexing with ellipsis, newaxis, etc. >> b = a[newaxis,...,2,newaxis] >> assert b.shape == (1,4,5,1) >> assert b.Axes.Label == {1:'z',2:'y'} >> >> # indexing with lists >> b = a[[1,2],:,[1,2]] >> assert b.shape == (2,5) >> assert b.Axes.Label == {0:'z',1:'y'} >> >> # most interesting examples, indexing with axes labels---------------- >> # I was a bit confused about how to handle indexing with mixed >> axes/non-axes indexes >> # IE: what does a['x',2:4] mean? on what axis is the 2:4 slice being >> applied, the first? the first after 'x'? >> # One option is to disallow mixing (simpler to implement, >> understand?) >> # Instead I chose to treat the axis indexing as a forced assignment >> of an axis to a position. >> >> # axis indexing that transposes the first two dimensions, but otherwise >> does nothing >> b = a['y','z'] >> assert b.shape == (5,4,6) >> assert b.Axes.Label == {0:'y',1:'z',2:'x'} >> >> # abusing slices to allow specifying indexes for axes >> b = a['y':0,'z'] >> assert b.shape == (4,6) >> assert b.Axes.Label == {0:'z',1:'x'} >> >> # unfortunately that means a slice index on an axis must be written like >> so: >> b = a['y':slice(0,2),'x','z'] >> assert b.shape == (2,6,4) >> assert b.Axes.Label == {0:'y',1:'x',2:'z'} >> >> b = a['y':[1,2,3],'x','z':slice(0:1)] >> # or due to the forced transposition, this is the same as: >> c = a['y','x','z'][[1,2,3],:,0:1] >> >> assert b.shape == (3,6,1) >> assert b.Axes.Label == {0:'y',1:'x',2:'z'} >> assert b.shape == c.shape >> assert b.Axes == c.Axes >> >> >> #----------------------------------------------------------------------------- >> ----------- >> >> >> To do all this I essentially had to recreate the effects of numpy indexing on >> axes.... This is not ideal, but so far I seem to have addressed most of the >> indexing I've used, at least. Here is what __getitem__ looks like: >> >> def __getitem__(self,idxs): >> filtered_idxs,transposed_axes,kept_axes = self.idx_axes(idxs) >> array = self.view(ndarray).transpose(transposed_axes) >> array = array[filtered_idxs] >> if isinstance(array,ndarray): >> array = array.view(Array) >> array.Axes = self.Axes.keep(kept_axes) >> return array >> >> As you can see idx_axes() essentially recreates a lot of ndarray indexing >> behavior, so that its effects can be explicitly handled. >> >> Having done all this, I think the best way for numpy to support 'labeled' axes >> in the future is by having numpy itself keep track of a very simple tuple >> attribute, like shape, and leave more complex axis naming/labeling to >> subclasses on the python side. As an example, upon creating a new dimension >> in an array, numpy assigns that dimension a semi-unique id, and this tuple >> could be used in __array_finalize__. 
>> >> For example my __array_finalize__ could look like: >> >> def __array_finalize__(self,obj): >> if hasattr(obj,'axesdata'): >> for axesid in self.axes: >> if axesid in obj.axes: >> self.axesdata[axesid] = obj.axesdata[axesid] >> >> >> This would cover a lot more situations and lead to much simpler code since the >> work required on the C side would be minimal, but still allow robust and >> custom tracking and propagation of axes information. >> Subclasses that tap into this data would react to the result of numpy >> operations vs. having to predict/anticipate. >> >> For example, my __getitem__, relying on the __array_finalize__ above, could >> look like: >> >> def __getitem__(self,idxs): >> filtered_idxs,transposed_axes= self.idx_axes(idxs) >> array = self.transpose(transposed_axes) >> return array[filtered_idxs] >> >> Not shown is how much simpler and robust the code for idx_axes would then be. >> I estimate it would go from 130 loc to < 20 loc. >> >> Sorry for the extra long e-mail, >> -Craig > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From wesmckinn at gmail.com Wed Jul 13 23:26:21 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 13 Jul 2011 23:26:21 -0400 Subject: [Numpy-discussion] named ndarray axes In-Reply-To: <0C413075-3343-4C9B-8D17-AEE2A8E1593B@me.com> References: <0C413075-3343-4C9B-8D17-AEE2A8E1593B@me.com> Message-ID: On Wed, Jul 13, 2011 at 7:15 PM, Craig Yoshioka wrote: > Yup exactly. ?To enable this sort of tracking I needed to explicitly reverse-engineer the effects of indexing on axes. ?I figure overriding indexing catches most cases that modify axes, but other holes need to be plugged as well... ie: tranpose, swapaxes. ?This probably means most C functions that change array axes (np.mean(axis=), etc.) need to be covered as well.... that sucks. > > BTW, it sounds like you're trying to track very similar data. ?I am trying to load structural biology data formats, and I try to preserve as much of the metadata as possible, ie: I am encoding unit cell length/angle information as vectors, etc. > > Here is my implementation: > > ? ?def __getitem__(self,idxs): > ? ? ? ?idxs,trans,keep = idx_axes(self,idxs) > ? ? ? ?array = self.view(np.ndarray).transpose(trans) > ? ? ? ?array = array[idxs] > ? ? ? ?if isinstance(array,ndarray): > ? ? ? ? ? ?array = array.view(self.__class__) > ? ? ? ? ? ?array.axes = self.axes.transpose(keep) > ? ? ? ?return array > > > def idx_axes(array,idxs): > > ? ?# explicitly expand ellipsis > ? ?expanded_idxs = idx_expanded(array.ndim,idxs) > > ? ?# determine how the axes will be rearranged as a result of axes-based indexing > ? ?# and the creation of newaxes > ? ?remapped_axes = idx_axes_remapped(array.ndim,array.axes,expanded_idxs) > > ? ?# determine numpy compatible transpose, before newaxes are created > ? ?transposed_axes = idx_axes_transposed(remapped_axes) > > ? ?# determine numpy compatible indexes with axes-based indexing removed > ? ?filtered_idxs = idx_filtered(expanded_idxs) > > ? ?# determine which axes will be kept after numpy indexing > ? ?kept_axes = idx_axes_kept(remapped_axes,filtered_idxs) > > ? ?return filtered_idxs,transposed_axes,kept_axes > > > def idx_expanded(ndim,idxs): > ? ?''' > ? ?explicitly expands ellipsis taking into account newaxes > ? ?''' > > ? ?if not isinstance(idxs,tuple): > ? ? ? ?return idx_expanded(ndim,(idxs,)) > > ? ?# how many dimensions we will end up having > ? 
?ndim = ndim + idxs.count(newaxis) > > ? ?filler = slice(None) > > ? ?def skip_ellipsis(idxs): > ? ? ? ?return tuple([filler if isinstance(x,type(Ellipsis)) else x for x in idxs]) > > ? ?def fill_ellipsis(ndim,l,r): > ? ? ? ?return (filler,)*(ndim-len(l)-len(r)) > > ? ?# expand first ellipsis, treat all other ellipsis as slices > ? ?if Ellipsis in idxs: > ? ? ? ?idx = idxs.index(Ellipsis) > ? ? ? ?llist = idxs[:idx] > ? ? ? ?rlist = skip_ellipsis(idxs[idx+1:]) > ? ? ? ?cfill = fill_ellipsis(ndim,llist,rlist) > ? ? ? ?idxs = llist + cfill + rlist > > ? ?return idxs > > > def idx_filtered(idxs): > ? ?''' > ? ?replace indexes on axes with normal numpy indexes > ? ?''' > ? ?def axisAsIdx(idx): > ? ? ? ?if isinstance(idx,str): > ? ? ? ? ? ?return slice(None) > ? ? ? ?elif isinstance(idx,slice): > ? ? ? ? ? ?if isinstance(idx.start,str): > ? ? ? ? ? ? ? ?return idx.stop > ? ? ? ?return idx > > ? ?return tuple([axisAsIdx(x) for x in idxs]) > > > def idx_axes_remapped(ndim,axes,idxs): > ? ?''' > ? ?if given a set of array indexes that contain labeled axes, > ? ?return a tuple that maps axes in the source array to the axes > ? ?as they will end up in the destination array. ?Must take into > ? ?account the spaces created by newaxis indexes. > ? ?''' > > ? ?# how many dims are we expecting? > ? ?ndim = ndim + idxs.count(newaxis) > > ? ?# new unique object for marking unassigned locations in mapping > ? ?unassigned = object() > > ? ?# by default all locations are unsassigned > ? ?mapping = [unassigned] * ndim > > ? ?# find labels in indexes and set the dims for those locations > ? ?for dim,label in enumerate(idxs): > ? ? ? ?if label == newaxis: > ? ? ? ? ? ?mapping[dim] = label > ? ? ? ?elif isinstance(label,str): > ? ? ? ? ? ?mapping[dim] = axes.dimForLabel(label) > ? ? ? ?elif isinstance(label,slice): > ? ? ? ? ? ?if isinstance(label.start,str): > ? ? ? ? ? ? ? ?mapping[dim] = axes.dimForLabel(label.start) > > ? ?# find unassigned dims, in order > ? ?unmapped = [d for d in range(ndim) if d not in set(mapping)] > > ? ?# fill in remaining unassigned locations with dims > ? ?for dst,src in enumerate(mapping): > ? ? ? ?if src == unassigned: > ? ? ? ? ? ?mapping[dst] = unmapped.pop(0) > > ? ?return tuple(mapping) > > > def idx_axes_transposed(mapping): > ? ?''' > ? ?stripping out newaxes in mapping yields a tuple compatible with transpose > ? ?''' > ? ?return tuple([x for x in mapping if x != newaxis]) > > > def idx_axes_kept(mapping,idxs): > ? ?''' > ? ?remove axes from mapping that will not survive the indexing (ie: ints) > ? ?''' > ? ?kept = [] > ? ?first_list = True > ? ?for dst,src in enumerate(mapping): > ? ? ? ?if dst < len(idxs): > ? ? ? ? ? ?idx = idxs[dst] > ? ? ? ? ? ?if isinstance(idx,int): > ? ? ? ? ? ? ? ?continue > ? ? ? ? ? ?elif isinstance(idx,list): > ? ? ? ? ? ? ? ?if not first_list: > ? ? ? ? ? ? ? ? ? ?continue > ? ? ? ? ? ? ? ?first_list = False > ? ? ? ?kept += [src] > ? ?return tuple(kept) > > > > > On Jul 13, 2011, at 3:13 PM, Sam Quinan wrote: > >> I'm currently working on interfacing ndarrays with a custom C-representation >> for n-dimensional arrays. My custom C code provides additional per-axis >> information (labeling, spacing between samples / range of sample positions >> along the axis, axis direction, cell vs.node centering, etc.) Subclassing >> ndarray to hold onto this info is fairly simple, but getting numpy's methods >> to intelligently modify that information when the array is sliced is >> something that I'm still trying to figure out. 
>> >> A robust way to attach per-axis info to a given ndarray, whether it just be >> a label or some more complex structure, would definitely be something I (and >> likely others) would find useful... >> >> That said, I'd love to know more about how the idx_axes() structure in your >> workaround works... >> >> - Sam >> >> >> >> On 7/13/11 12:00 PM, "numpy-discussion-request at scipy.org" >> wrote: >> >>> Date: Tue, 12 Jul 2011 16:39:47 -0700 >>> From: Craig Yoshioka >>> Subject: [Numpy-discussion] named ndarray axes >>> To: NumPy-Discussion at scipy.org >>> Message-ID: <0FC8B43E-26CD-40ED-A6FA-59DD8D641998 at me.com> >>> Content-Type: text/plain; CHARSET=US-ASCII >>> >>> I brought up a while ago about how it would be nice if numpy arrays could have >>> their axes 'labeled'. ? ?= I got an implementation that works pretty well for >>> me and in the process learned quite a few things, and was hoping to foster >>> some more discussion on this topic, as I think I have found a simple/flexible >>> solution to support this at the numpy level. >>> >>> Here are *some* examples code snippets from my unit tests on 'Array': >>> >>> ? ?a = Array((4,5,6)) >>> >>> ? ?# you can assign data to all axes by attribute: >>> ? ?a.Axes.Label = (0:'z',1:'y',2:'x'} >>> >>> ? ?# or add metadata to each individually: >>> ? ?a.Axes[1].Vector = [0,1,0] >>> ? ?a.Axes[2].Vector = [0,0,1] >>> >>> ? ?# simple case int indexing >>> ? ?b = a[0] >>> ? ?assert b.shape == (5,6) >>> ? ?assert b.Axes.Label == {0:'y',1:'x'} >>> ? ?assert b.Axes.Vector == {0:[0,1,0],1:[0,0,1]} >>> >>> ? ?# indexing with slices >>> ? ?b = a[:,0,:] >>> ? ?assert b.shape == (4,6) >>> ? ?assert b.Axes.Label == {0:'z',1:'x'} >>> ? ?assert b.Axes.Vector == {1:[0,0,1]} >>> >>> ? ?# indexing with ellipsis >>> ? ?b = a[...,0] >>> ? ?assert b.shape == (4,5) >>> ? ?assert b.Axes.Label == {0:'z',1:'y'} >>> >>> ? ?# indexing with ellipsis, newaxis, etc. >>> ? ?b = a[newaxis,...,2,newaxis] >>> ? ?assert b.shape == (1,4,5,1) >>> ? ?assert b.Axes.Label == {1:'z',2:'y'} >>> >>> ? ?# indexing with lists >>> ? ?b = a[[1,2],:,[1,2]] >>> ? ?assert b.shape == (2,5) >>> ? ?assert b.Axes.Label == {0:'z',1:'y'} >>> >>> ? ?# most interesting examples, indexing with axes labels---------------- >>> ? ?# I was a bit confused about how to handle indexing with mixed >>> axes/non-axes indexes >>> ? ?# IE: what does a['x',2:4] ?mean? ?on what axis is the 2:4 slice being >>> applied, the first? the first after 'x'? >>> ? ?# ? ? ? One option is to disallow mixing (simpler to implement, >>> understand?) >>> ? ?# ? ? ? Instead I chose to treat the axis indexing as a forced assignment >>> of an axis to a position. >>> >>> ? ?# axis indexing that transposes the first two dimensions, but otherwise >>> does nothing >>> ? ?b = a['y','z'] >>> ? ?assert b.shape == (5,4,6) >>> ? ?assert b.Axes.Label == {0:'y',1:'z',2:'x'} >>> >>> ? ?# abusing slices to allow specifying indexes for axes >>> ? ?b = a['y':0,'z'] >>> ? ?assert b.shape == (4,6) >>> ? ?assert b.Axes.Label == {0:'z',1:'x'} >>> >>> ? ?# unfortunately that means a slice index on an axis must be written like >>> so: >>> ? ?b = a['y':slice(0,2),'x','z'] >>> ? ?assert b.shape == (2,6,4) >>> ? ?assert b.Axes.Label == {0:'y',1:'x',2:'z'} >>> >>> ? ?b = a['y':[1,2,3],'x','z':slice(0:1)] >>> ? ?# or due to the forced transposition, this is the same as: >>> ? ?c = a['y','x','z'][[1,2,3],:,0:1] >>> >>> ? ?assert b.shape == (3,6,1) >>> ? ?assert b.Axes.Label == {0:'y',1:'x',2:'z'} >>> ? ?assert b.shape == c.shape >>> ? 
?assert b.Axes == c.Axes >>> >>> >>> #----------------------------------------------------------------------------- >>> ----------- >>> >>> >>> To do all this I essentially had to recreate the effects of numpy indexing on >>> axes.... ?This is not ideal, but so far I seem to have addressed most of the >>> indexing I've used, at least. Here is what __getitem__ looks like: >>> >>> ? ?def __getitem__(self,idxs): >>> ? ? ? ?filtered_idxs,transposed_axes,kept_axes = self.idx_axes(idxs) >>> ? ? ? ?array = self.view(ndarray).transpose(transposed_axes) >>> ? ? ? ?array = array[filtered_idxs] >>> ? ? ? ?if isinstance(array,ndarray): >>> ? ? ? ? ? ?array = array.view(Array) >>> ? ? ? ? ? ?array.Axes = self.Axes.keep(kept_axes) >>> ? ? ? ?return array >>> >>> As you can see idx_axes() essentially recreates a lot of ndarray indexing >>> behavior, so that its effects can be explicitly handled. >>> >>> Having done all this, I think the best way for numpy to support 'labeled' axes >>> in the future is by having numpy itself keep track of a very simple tuple >>> attribute, like shape, and leave more complex axis naming/labeling to >>> subclasses on the python side. ?As an example, upon creating a new dimension >>> in an array, numpy assigns that dimension a semi-unique id, and this tuple >>> could be used in __array_finalize__. >>> >>> For example my __array_finalize__ could look like: >>> >>> def __array_finalize__(self,obj): >>> ? ?if hasattr(obj,'axesdata'): >>> ? ? ? ? for axesid in self.axes: >>> ? ? ? ? ? ? ?if axesid in obj.axes: >>> ? ? ? ? ? ? ? ? ? self.axesdata[axesid] = obj.axesdata[axesid] >>> >>> >>> This would cover a lot more situations and lead to much simpler code since the >>> work required on the C side would be minimal, but still allow robust and >>> custom tracking and propagation of axes information. >>> Subclasses that tap into this data would react to the result of numpy >>> operations vs. having to predict/anticipate. >>> >>> For example, my __getitem__, relying on the __array_finalize__ above, could >>> look like: >>> >>> ? ?def __getitem__(self,idxs): >>> ? ? ? ?filtered_idxs,transposed_axes= self.idx_axes(idxs) >>> ? ? ? ?array = self.transpose(transposed_axes) >>> ? ? ? ?return array[filtered_idxs] >>> >>> Not shown is how much simpler and robust the code for idx_axes would then be. >>> I estimate it would go from 130 loc to < 20 loc. >>> >>> Sorry for the extra long e-mail, >>> -Craig >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Have you guys been following the DataArray discussions or project at all? I think it provides a nearly complete implementation of what you've been describing (named axes): https://github.com/fperez/datarray (no joke: 21 forks!) 
some links: http://inscight.org/2011/05/18/episode_13/ https://convore.com/python-scientific-computing/data-array-in-numpy/ I'm excited that many more people seem to be excited about making this kind of functionality available in the scientific Python stack so understanding every perspective and set of requirements makes a big difference =) best, Wes From craigyk at me.com Wed Jul 13 23:48:28 2011 From: craigyk at me.com (Craig Yoshioka) Date: Wed, 13 Jul 2011 20:48:28 -0700 Subject: [Numpy-discussion] named ndarray axes In-Reply-To: References: <0C413075-3343-4C9B-8D17-AEE2A8E1593B@me.com> Message-ID: <1ECAC305-B400-465C-B418-03B66C055480@me.com> I did take a look at it. It looked way heavier than I needed or wanted, plus last time I looked it didn't support fancy indexing on axes... It does support indexing on 'ticks' though. There is a bit of wheel inventing going on, but I think that's OK, since things should be well worked out and experimented with before becoming lower level... I think my suggestion for adding an index of unique ids as a tuple, like shape, that is maintained though array manipulations is a good one though. It would make implementing any of these axes indexing attempts much easier and more robust. I'm sure the dataarray code could be greatly simplified by this addition to ndarray, just as mine would. This 'uniqueid' tuple could even be extended to make keeping track of ticks easier: On Jul 13, 2011, at 8:26 PM, Wes McKinney wrote: > Have you guys been following the DataArray discussions or project at > all? I think it provides a nearly complete implementation of what > you've been describing (named axes): > > https://github.com/fperez/datarray > > (no joke: 21 forks!) > > some links: > http://inscight.org/2011/05/18/episode_13/ > https://convore.com/python-scientific-computing/data-array-in-numpy/ > > I'm excited that many more people seem to be excited about making this > kind of functionality available in the scientific Python stack so > understanding every perspective and set of requirements makes a big > difference =) > > best, > Wes -------------- next part -------------- An HTML attachment was scrubbed... URL: From nadavh at visionsense.com Thu Jul 14 02:56:29 2011 From: nadavh at visionsense.com (Nadav Horesh) Date: Wed, 13 Jul 2011 23:56:29 -0700 Subject: [Numpy-discussion] Can not compile documentation with python3.2 Message-ID: <26FC23E7C398A64083C980D16001012D246CA2F9E2@VA3DIAXVS361.RED001.local> I installed numpy-1.6.1-rc3 on python3.2, and used the python3 sphinx port (version 1.1pre) to compile the documentation and got this error: ==================================================================================================== nadav at nadav /dev/shm/numpy-1.6.1rc3/doc $ make latex mkdir -p build touch build/generate-stamp mkdir -p build/latex build/doctrees LANG=C sphinx-build -b latex -d build/doctrees source build/latex Running Sphinx v1.1pre 1.6rc3 1.6.1rc3 Exception occurred: File "/usr/lib64/python3.2/site-packages/Sphinx-1.1predev_20110713-py3.2.egg/sphinx/application.py", line 247, in setup_extension mod = __import__(extension, None, None, ['setup']) File "/dev/shm/numpy-1.6.1rc3/doc/sphinxext/numpydoc.py", line 37 title_re = re.compile(ur'^\s*[#*=]{4,}\n[a-z0-9 -]+\n[#*=]{4,}\s*', ^ SyntaxError: invalid syntax ==================================================================================================== Platform: 64bit gentoo linux Nadav. -------------- next part -------------- An HTML attachment was scrubbed... 
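
The SyntaxError in that traceback comes from the ur'...' string literal: Python 3 dropped the combined u/r prefix, so numpydoc.py is rejected before Sphinx can even run under 3.2. A sketch of the kind of one-line change needed (any flags the real file passes to re.compile are omitted here; the pattern is pure ASCII, so losing the unicode prefix does not change what it matches):

import re

# Rejected by Python 3.x (only valid on 2.x):
#   title_re = re.compile(ur'^\s*[#*=]{4,}\n[a-z0-9 -]+\n[#*=]{4,}\s*')

# Accepted by 2.x and 3.x alike -- raw string literals are already unicode on 3.x:
title_re = re.compile(r'^\s*[#*=]{4,}\n[a-z0-9 -]+\n[#*=]{4,}\s*')
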
URL: From mwlodarczak at uni-bielefeld.de Thu Jul 14 05:09:43 2011 From: mwlodarczak at uni-bielefeld.de (Marcin Wlodarczak) Date: Thu, 14 Jul 2011 11:09:43 +0200 Subject: [Numpy-discussion] Masking entries in structured arrays Message-ID: <16604_1310634592_ZZh0v3U7Mkh35.00_4E1EB257.6090906@uni-bielefeld.de> Hi, I was wondering whether it is possible to mask specific entries in a structured array. If I try to do the following: x = ma.masked_array([(2, 1.), (8, 2.)], dtype=[('a',int), ('b', float)]) x_masked = ma.masked_equal(x, 2) I get "AttributeError: 'NotImplementedType' object has no attribute 'ndim'", which actually makes sense since x.shape returns (2,). I really can't think of any way around this problem. Best regards, Marcin From cjordan1 at uw.edu Thu Jul 14 06:36:49 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Thu, 14 Jul 2011 05:36:49 -0500 Subject: [Numpy-discussion] Masking entries in structured arrays In-Reply-To: <16604_1310634592_ZZh0v3U7Mkh35.00_4E1EB257.6090906@uni-bielefeld.de> References: <16604_1310634592_ZZh0v3U7Mkh35.00_4E1EB257.6090906@uni-bielefeld.de> Message-ID: On Thu, Jul 14, 2011 at 4:09 AM, Marcin Wlodarczak < mwlodarczak at uni-bielefeld.de> wrote: > > Hi, > > I was wondering whether it is possible to mask specific entries in a > structured array. If I try to do the following: > > x = ma.masked_array([(2, 1.), (8, 2.)], dtype=[('a',int), ('b', float)]) > x_masked = ma.masked_equal(x, 2) > > I get "AttributeError: 'NotImplementedType' object has no attribute > 'ndim'", which actually makes sense since x.shape returns (2,). I really > can't think of any way around this problem. > > It's not terribly satisfying, but you can iterate over the field names. for field in x.dtype.names: x[field] = np.ma.masked_equal(x[field],2) -Chris Jordan-Squire > Best regards, > Marcin _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Thu Jul 14 16:43:45 2011 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 14 Jul 2011 15:43:45 -0500 Subject: [Numpy-discussion] type-casting differences for comparisons Message-ID: I just came across a real-head scratcher that took me a bit to figure out. I don't know if it counts as a bug or not. I have an array with dtype "f4" and a separate python float. Some elements of this array gets assigned this numpy float64 scalar value. (I know, I should be better off with a mask, but bear with me, this is just a demonstration code to isolate the core problem from a much more complicated program...) import numpy as np a = np.empty((500,500), dtype='f4') a[:] = np.random.random(a.shape) bad_val = 10*a.max() b = np.where(a > 0.8, bad_val, a) Now, the following seems to always evaluate to False, as expected: >>> np.any(b > bad_val) but, if I am (un-)lucky enough, this will sometimes evaluate to True: >>> any([(c > bad_val) for c in b.flat]) What it seems to me is that for the first comparison test, bad_val is casted down to float32 (or maybe b is casted up to float64?), but for the second example, the opposite is true. This can lead to some unexpected behaviors. Is there some sort of difference between type-casting of numpy scalars and numpy arrays? I would expect both to be the same. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... 
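
A small interactive sketch of the asymmetry Ben is describing (the rule itself is spelled out in Robert's reply that follows); only the resulting dtypes matter here, not the values:

>>> import numpy as np
>>> arr32 = np.ones(3, dtype=np.float32)
>>> scalar64 = np.float64(1e10)
>>> (arr32 * scalar64).dtype        # array vs. scalar: the array's dtype wins
dtype('float32')
>>> elem = arr32[0]                 # a float32 *scalar* pulled out of the array
>>> type(elem * scalar64)           # scalar vs. scalar: the larger dtype wins
<type 'numpy.float64'>
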
URL: From robert.kern at gmail.com Thu Jul 14 17:15:12 2011 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 14 Jul 2011 16:15:12 -0500 Subject: [Numpy-discussion] type-casting differences for comparisons In-Reply-To: References: Message-ID: On Thu, Jul 14, 2011 at 15:43, Benjamin Root wrote: > I just came across a real-head scratcher that took me a bit to figure out. I > don't know if it counts as a bug or not. > > I have an array with dtype "f4" and a separate python float. Some elements > of this array gets assigned this numpy float64 scalar value. (I know, I > should be better off with a mask, but bear with me, this is just a > demonstration code to isolate the core problem from a much more complicated > program...) > > import numpy as np > > a = np.empty((500,500), dtype='f4') > a[:] = np.random.random(a.shape) > bad_val = 10*a.max() > > b = np.where(a > 0.8, bad_val, a) > > Now, the following seems to always evaluate to False, as expected: > >>>> np.any(b > bad_val) > > but, if I am (un-)lucky enough, this will sometimes evaluate to True: > >>>> any([(c > bad_val) for c in b.flat]) > > What it seems to me is that for the first comparison test, bad_val is casted > down to float32 (or maybe b is casted up to float64?), but for the second > example, the opposite is true.? This can lead to some unexpected behaviors. > Is there some sort of difference between type-casting of numpy scalars and > numpy arrays?? I would expect both to be the same. Remember, the rule for ufuncs is that when the operation is array-scalar, the array dtype wins (barring cross-kind types which aren't relevant here). For array-array and scalar-scalar, the "largest" dtype wins. So for the first case, array-scalar, bad_val gets downcasted to float32. For the latter case, bad_val remains float64 and upcasts c to float64. Try this: bad_val = np.float32(10) * a.max() -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From Chris.Barker at noaa.gov Thu Jul 14 21:56:33 2011 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu, 14 Jul 2011 18:56:33 -0700 Subject: [Numpy-discussion] Build error on Windows Message-ID: <4E1F9E51.9080702@noaa.gov> Hi folks, I'm trying to build numpy HEAD on Windows in preparation for the SciPy sprints tomorrow. I've never built numpy on Windows, and I'm new to git, so I could be doing any number of things wrong. 
I think I have the latest code: C:\Documents and Settings\Chris\My Documents\SciPy\numpy_git\numpy>git log -1 commit 6fdfd9c070ce943415d75780702a22f4bbd8f837 Author: Ben Walsh Date: Tue Jul 12 14:52:01 2011 +0100 I tried to build it with a simple: python setup.py build_ext --inplace and got: C:\Program Files\Microsoft Visual Studio 9.0\VC\BIN\link.exe /DLL /nologo /INCRE MENTAL:NO /LIBPATH:c:\python27\libs /LIBPATH:c:\python27\PCbuild /LIBPATH:build\ temp.win32-2.7 /EXPORT:init_dummy build\temp.win32-2.7\Release\numpy\core\src\du mmymodule.obj /OUT:numpy\core\_dummy.pyd /IMPLIB:build\temp.win32-2.7\Release\nu mpy\core\src\_dummy.lib /MANIFESTFILE:build\temp.win32-2.7\Release\numpy\core\sr c\_dummy.pyd.manifest LINK : error LNK2001: unresolved external symbol init_dummy build\temp.win32-2.7\Release\numpy\core\src\_dummy.lib : fatal error LNK1120: 1 unresolved externals error: Command "C:\Program Files\Microsoft Visual Studio 9.0\VC\BIN\link.exe /DL L /nologo /INCREMENTAL:NO /LIBPATH:c:\python27\libs /LIBPATH:c:\python27\PCbuild /LIBPATH:build\temp.win32-2.7 /EXPORT:init_dummy build\temp.win32-2.7\Release\n umpy\core\src\dummymodule.obj /OUT:numpy\core\_dummy.pyd /IMPLIB:build\temp.win3 2-2.7\Release\numpy\core\src\_dummy.lib /MANIFESTFILE:build\temp.win32-2.7\Relea se\numpy\core\src\_dummy.pyd.manifest" failed with exit status 1120 Python 2.7.2, python.org build (32 bit) Visual Studio express 2008 That should work, yes? Thanks for any hints. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From cgohlke at uci.edu Thu Jul 14 23:04:39 2011 From: cgohlke at uci.edu (Christoph Gohlke) Date: Thu, 14 Jul 2011 20:04:39 -0700 Subject: [Numpy-discussion] Build error on Windows In-Reply-To: <4E1F9E51.9080702@noaa.gov> References: <4E1F9E51.9080702@noaa.gov> Message-ID: <4E1FAE47.2020207@uci.edu> On 7/14/2011 6:56 PM, Chris Barker wrote: > Hi folks, > > I'm trying to build numpy HEAD on Windows in preparation for the SciPy > sprints tomorrow. I've never built numpy on Windows, and I'm new to git, > so I could be doing any number of things wrong. 
> > I think I have the latest code: > > C:\Documents and Settings\Chris\My Documents\SciPy\numpy_git\numpy>git > log -1 > commit 6fdfd9c070ce943415d75780702a22f4bbd8f837 > Author: Ben Walsh > Date: Tue Jul 12 14:52:01 2011 +0100 > > I tried to build it with a simple: > > python setup.py build_ext --inplace > > and got: > > C:\Program Files\Microsoft Visual Studio 9.0\VC\BIN\link.exe /DLL > /nologo /INCRE > MENTAL:NO /LIBPATH:c:\python27\libs /LIBPATH:c:\python27\PCbuild > /LIBPATH:build\ > temp.win32-2.7 /EXPORT:init_dummy > build\temp.win32-2.7\Release\numpy\core\src\du > mmymodule.obj /OUT:numpy\core\_dummy.pyd > /IMPLIB:build\temp.win32-2.7\Release\nu > mpy\core\src\_dummy.lib > /MANIFESTFILE:build\temp.win32-2.7\Release\numpy\core\sr > c\_dummy.pyd.manifest > LINK : error LNK2001: unresolved external symbol init_dummy > build\temp.win32-2.7\Release\numpy\core\src\_dummy.lib : fatal error > LNK1120: 1 > unresolved externals > error: Command "C:\Program Files\Microsoft Visual Studio > 9.0\VC\BIN\link.exe /DL > L /nologo /INCREMENTAL:NO /LIBPATH:c:\python27\libs > /LIBPATH:c:\python27\PCbuild > /LIBPATH:build\temp.win32-2.7 /EXPORT:init_dummy > build\temp.win32-2.7\Release\n > umpy\core\src\dummymodule.obj /OUT:numpy\core\_dummy.pyd > /IMPLIB:build\temp.win3 > 2-2.7\Release\numpy\core\src\_dummy.lib > /MANIFESTFILE:build\temp.win32-2.7\Relea > se\numpy\core\src\_dummy.pyd.manifest" failed with exit status 1120 > > Python 2.7.2, python.org build (32 bit) > Visual Studio express 2008 > > That should work, yes? > > Thanks for any hints. > > -Chris > Hi Chris, the build should work but it is probably better to install numpy in the site-packages directory (and be prepared for crashes and test failures). A patch for the build issues is attached. Remove the build directory before rebuilding. Christoph -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: msvc9.diff URL: From ralf.gommers at googlemail.com Fri Jul 15 02:55:38 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 15 Jul 2011 08:55:38 +0200 Subject: [Numpy-discussion] Fwd: [Enthought-Dev] Linking issues with libX11.so In-Reply-To: References: Message-ID: On Wed, Jul 13, 2011 at 11:10 PM, Ilan Schnell wrote: > Hello List, > > Varun, who is a debian packager ran into some problems while > compiling Enable, as it uses numpy.distutils, which did not locate > the location of the X11 libraries correctly. Maybe this can be fixed > in the numpy 1.6.1 release. It's a bit late for that, since we're at 1.6.1rc3, this is not a patch and I'm not even sure it should be fixed in numpy.distutils. Right now the searched dirs are ['/usr/X11R6/lib', '/usr/X11/lib', '/usr/lib'] or its 64-bit equivalents. See around line 200 in system_info.py. Is your proposal to add any arch-dependent paths that Debian can think of to that? Leaving it up to the packager to specify the correct path in site.cfg may be cleaner. Or just apply the patch you received to the ETS Agg build. Cheers, Ralf Please the the forwarded conversation > below. > > Thanks Ilan > > > ---------- Forwarded message ---------- > From: Varun Hiremath > Date: Wed, Jul 13, 2011 at 4:03 PM > Subject: [Enthought-Dev] Linking issues with libX11.so > To: enthought-dev at enthought.com > > > Hi Ilan, > > You were right, the issue was with the X11 library. The > _plat_support.so was not linked to libX11 and so the chaco examples > were failing; and the reason libX11 was not linked was because numpy > distutils' x11_info was failing. 
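
For reference, the site.cfg route Ralf suggests above would look roughly like this on a multiarch Debian/Ubuntu system. This is an untested sketch: the multiarch library directory is the one described in the forwarded message, and [x11] is the section numpy.distutils' x11_info reads.

[x11]
library_dirs = /usr/lib/x86_64-linux-gnu
include_dirs = /usr/include
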
> > I figured out that in debian/ubuntu with a new multi_arch build > support [1] the libraries are being moved from the standard /usr/lib > and /usr/lib64 directories to architecture specific directories like: > > /usr/lib/i386-linux-gnu/ > /usr/lib/x86_64-linux-gnu/ > > and so the numpy.distutil was failing to find libX11 with the latest > version of libX11-dev on debian which installs libX11.so in > /usr/lib/x86_64-linux-gnu/ (on my amd64 system). The nump.distutils' > scripts need to be updated to handle this, but for now I am using the > following patch to force _plat_support.so link with X11 (which I think > is always present in the default search path): > > --------------------------- > @@ -230,6 +144,7 @@ > elif plat in ['x11','gtk1']: > x11_info = get_info('x11', notfound_action=1) > dict_append(plat_info, **x11_info) > + dict_append(plat_info, libraries = ['X11']) > > --------------------------- > > With this everything seems to be working fine! > > Thanks, > Varun > > [1] https://wiki.ubuntu.com/MultiarchSpec > > On Mon, Jul 11, 2011 at 10:53 PM, Ilan Schnell > wrote: > > Hello Varun, > > > > the important part is: _plat_support.so: undefined symbol: XCreateImage > > This indicates that the kiva/agg/_plat_support.so C extension was > > not linked to X11 while compiling. Until > > > https://github.com/enthought/enable/commit/ebecdbfc5c4596282204e61ff687c3ab2442947a > > which was made shortly after the release, it was easy to create a broken > > Enable build like this one. Note that this commit does *not* fix the > problem, > > it only causes the build to fail right away, instead of creating a broken > build. > > This was added because of the famous esr quote: > > "When you must fail, fail noisily and as soon as possible." > > > > As enable uses numpy.distutils to build agg, the fix is to edit: > > /lib/python2.6/site-packages/numpy/distutils/site.cfg > > > > and add: > > [x11] > > library_dirs = ... > > include_dirs = ... > > > > - Ilan > > > > > > On Mon, Jul 11, 2011 at 9:23 PM, Varun Hiremath > wrote: > >> Hi all, > >> > >> I am facing another issue running chaco examples with the new ETS 4.0 > >> packages. I am getting the following error when I run any chaco > >> example: > >> -------------------------- > >> $$ python zoom_plot.py > >> /usr/lib/python2.6/dist-packages/enable/wx/image.py:16: Warning: Error > initializing Agg: > /usr/lib/python2.6/dist-packages/kiva/agg/_plat_support.so: undefined > symbol: XCreateImage > >> from kiva.agg import CompiledPath, GraphicsContextSystem as > GraphicsContext > >> Traceback (most recent call last): > >> File "zoom_plot.py", line 15, in > >> from enable.api import Component, ComponentEditor > >> File "/usr/lib/python2.6/dist-packages/enable/api.py", line 42, in > > >> from graphics_context import GraphicsContextEnable, > ImageGraphicsContextEnable > >> File "/usr/lib/python2.6/dist-packages/enable/graphics_context.py", > line 86, in > >> class GraphicsContextEnable(EnableGCMixin, GraphicsContext): > >> TypeError: Error when calling the metaclass bases > >> metaclass conflict: the metaclass of a derived class must be a > (non-strict) subclass of the metaclasses of all its bases > >> ------------------------- > >> > >> Does anybody know what could be the problem? > >> > >> Thanks, > >> Varun > >> > >> p.s. Most of the ETS 4.0 debian packages are now available in debian > unstable. > >> > >> > >> On Sat, 09 Jul, 2011 at 12:45:32PM -0500, Ilan Schnell wrote: > >>> I'm glad it worked. 
That's a good idea, I'll release traitsui-4.0.1 > >>> later today. > >>> > >>> - Ilan > >>> > >>> > >>> On Sat, Jul 9, 2011 at 11:20 AM, Varun Hiremath < > varunhiremath at gmail.com> wrote: > >>> > Hi Ilan, > >>> > > >>> > Thanks, that worked! Are you planning on doing a point release for > >>> > traitsui to fix this bug? It would make packaging easier then. > >>> > > >>> > Thanks, > >>> > Varun > >>> > > >>> > On Sat, 09 Jul, 2011 at 11:11:26AM -0500, Ilan Schnell wrote: > >>> >> Hello Varun, > >>> >> > >>> >> I ran into the same bug when preparing the EPD 7.1 release. > >>> >> The fix is commited to the github master of traitsui: > >>> >> > https://github.com/enthought/traitsui/commit/4f36a8a27cfa131347dd90d1a8e10a37358cf634 > >>> >> > >>> >> Just replace the two zip-files with the fixed ones, and it should > work. > >>> >> > >>> >> - Ilan > >>> >> > >>> >> > >>> >> On Sat, Jul 9, 2011 at 10:27 AM, Varun Hiremath < > varunhiremath at gmail.com> wrote: > >>> >> > Hi, > >>> >> > > >>> >> > I was trying to update the debian packages to the new ETS 4.0 > release, > >>> >> > but I am having some trouble getting mayavi2 running. I get the > error > >>> >> > shown below when I run mayavi2. Could someone please let me know > what > >>> >> > might be causing this error? > >>> >> > > >>> >> > Thanks, > >>> >> > Varun > >>> >> > > >>> >> > ----------------- > >>> >> > $ mayavi2 > >>> >> > Traceback (most recent call last): > >>> >> > File "/usr/bin/mayavi2", line 658, in > >>> >> > main() > >>> >> > File "/usr/bin/mayavi2", line 649, in main > >>> >> > mayavi.main(sys.argv[1:]) > >>> >> > File "/usr/lib/python2.6/dist-packages/mayavi/plugins/app.py", > line 195, in main > >>> >> > app.run() > >>> >> > File > "/usr/lib/python2.6/dist-packages/mayavi/plugins/mayavi_workbench_application.py", > line 81, in run > >>> >> > window.open() > >>> >> > File > "/usr/lib/python2.6/dist-packages/pyface/workbench/workbench_window.py", > line 144, in open > >>> >> > self._create() > >>> >> > File > "/usr/lib/python2.6/dist-packages/pyface/ui/wx/application_window.py", line > 150, in _create > >>> >> > contents = self._create_contents(body) > >>> >> > File > "/usr/lib/python2.6/dist-packages/pyface/workbench/workbench_window.py", > line 217, in _create_contents > >>> >> > contents = self.layout.create_initial_layout(parent) > >>> >> > File > "/usr/lib/python2.6/dist-packages/pyface/ui/wx/workbench/workbench_window_layout.py", > line 151, in create_initial_layout > >>> >> > self._wx_view_dock_window = WorkbenchDockWindow(parent) > >>> >> > File > "/usr/lib/python2.6/dist-packages/pyface/dock/dock_window.py", line 324, in > __init__ > >>> >> > if self.theme.use_theme_color: > >>> >> > File > "/usr/lib/python2.6/dist-packages/pyface/dock/dock_window.py", line 335, in > _theme_default > >>> >> > return dock_window_theme() > >>> >> > File > "/usr/lib/python2.6/dist-packages/traitsui/dock_window_theme.py", line 92, > in dock_window_theme > >>> >> > from .default_dock_window_theme import > default_dock_window_theme > >>> >> > File > "/usr/lib/python2.6/dist-packages/traitsui/default_dock_window_theme.py", > line 39, in > >>> >> > label = ( 0, -3 ), content = ( 7, 6, 0, 0 ) ), > >>> >> > File "/usr/lib/python2.6/dist-packages/traitsui/theme.py", line > 63, in __init__ > >>> >> > self.image = image > >>> >> > File "/usr/lib/python2.6/dist-packages/traitsui/ui_traits.py", > line 229, in validate > >>> >> > self.error( object, name, value ) > >>> >> > File "/usr/lib/python2.6/dist-packages/traits/trait_handlers.py", > line 
168, in error > >>> >> > value ) > >>> >> > traits.trait_errors.TraitError: The 'image' trait of a Theme > instance must be an ImageResource or string that can be used to define one, > but a value of '@std:tab_active' was specified. > >>> >> > Exception in thread Thread-1: > >>> >> > Traceback (most recent call last): > >>> >> > File "/usr/lib/python2.6/threading.py", line 532, in > __bootstrap_inner > >>> >> > self.run() > >>> >> > File "/usr/lib/python2.6/threading.py", line 484, in run > >>> >> > self.__target(*self.__args, **self.__kwargs) > >>> >> > File "/usr/lib/python2.6/dist-packages/traitsui/image/image.py", > line 329, in _process > >>> >> > if time() > (self.time_stamp + 2.0): > >>> >> > TypeError: 'NoneType' object is not callable > >>> >> > --------------- > >>> >> > _______________________________________________ > >>> >> > Enthought-Dev mailing list > >>> >> > Enthought-Dev at mail.enthought.com > >>> >> > https://mail.enthought.com/mailman/listinfo/enthought-dev > >>> >> > > >>> >> _______________________________________________ > >>> >> Enthought-Dev mailing list > >>> >> Enthought-Dev at mail.enthought.com > >>> >> https://mail.enthought.com/mailman/listinfo/enthought-dev > >>> > _______________________________________________ > >>> > Enthought-Dev mailing list > >>> > Enthought-Dev at mail.enthought.com > >>> > https://mail.enthought.com/mailman/listinfo/enthought-dev > >>> > > >>> _______________________________________________ > >>> Enthought-Dev mailing list > >>> Enthought-Dev at mail.enthought.com > >>> https://mail.enthought.com/mailman/listinfo/enthought-dev > >> _______________________________________________ > >> Enthought-Dev mailing list > >> Enthought-Dev at mail.enthought.com > >> https://mail.enthought.com/mailman/listinfo/enthought-dev > >> > > _______________________________________________ > > Enthought-Dev mailing list > > Enthought-Dev at mail.enthought.com > > https://mail.enthought.com/mailman/listinfo/enthought-dev > > > _______________________________________________ > Enthought-Dev mailing list > Enthought-Dev at mail.enthought.com > https://mail.enthought.com/mailman/listinfo/enthought-dev > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Fri Jul 15 03:14:45 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 15 Jul 2011 09:14:45 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 3 In-Reply-To: References: <4E1B674D.90707@uci.edu> Message-ID: On Mon, Jul 11, 2011 at 11:31 PM, Robert Kern wrote: > On Mon, Jul 11, 2011 at 16:12, Christoph Gohlke wrote: > > > > I tested rc3. It looks good, except that on win-amd64 whenever numpy is > > imported, a 'Forcing DISTUTILS_USE_SDK=1' is printed from line 377 in > > misc_util.py. Hence some tests of other packages fail. > > > > This is due to a recent change: > > < > https://github.com/numpy/numpy/commit/025c8c77bb1e633ea6e8a0cb929528b1fbe85efc > > > > > > Now every time numpy is imported, numpy.distutils is also imported. Is > > this necessary or can the import of distutils be deferred? > > The get_shared_lib_extension() call could be deferred to inside > load_library() in ctypeslib.py, yes. > > This is fixed now. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
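
The fix Ralf refers to is essentially the usual deferred-import pattern: the numpy.distutils dependency moves from module-import time into the function that actually needs it. A simplified sketch of that pattern (not the actual ctypeslib code; the lookup logic here is reduced to the bare minimum):

import ctypes
import os

def load_library(libname, loader_path):
    # Deferred: numpy.distutils is only imported when a library is actually
    # loaded through this helper, not as a side effect of "import numpy".
    from numpy.distutils.misc_util import get_shared_lib_extension
    ext = get_shared_lib_extension()
    if not libname.endswith(ext):
        libname = libname + ext
    libdir = os.path.dirname(os.path.abspath(loader_path))
    return ctypes.CDLL(os.path.join(libdir, libname))
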
URL: From ralf.gommers at googlemail.com Fri Jul 15 04:13:15 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 15 Jul 2011 10:13:15 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 In-Reply-To: <8D5FEB83-7BEC-4AB6-B63A-58E25F80372E@astro.physik.uni-goettingen.de> References: <4E15BE8C.1090202@gmail.com> <2486ACC5-C5BA-42C3-9E70-3168FE4D7A48@post.harvard.edu> <8D5FEB83-7BEC-4AB6-B63A-58E25F80372E@astro.physik.uni-goettingen.de> Message-ID: On Fri, Jul 8, 2011 at 4:17 PM, Derek Homeier < derek at astro.physik.uni-goettingen.de> wrote: > On 07.07.2011, at 7:16PM, Robert Pyle wrote: > > > > .............../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py:1922: > RuntimeWarning: invalid value encountered in absolute > > return all(less_equal(absolute(x-y), atol + rtol * absolute(y))) > > > > > > Everything else completes with 3 KNOWNFAILs and 1 SKIP. This warning is > not new to this release; I've seen it before but haven't tried tracking it > down until today. > > > > It arises in allclose(). The comments state "If either array contains > NaN, then False is returned." but no test for NaN is done, and NaNs are > indeed what cause the warning. > > > > Inserting > > > > if any(isnan(x)) or any(isnan(y)): > > return False > > > > before current line number 1916 in numeric.py seems to fix it. > > The same warning is still present in the current master, I just never paid > attention to it because the tests still pass (it does correctly identify > NaNs because they are not less_equal the tolerance), but of course this > should be properly fixed as you suggest. > > Under Python 2.6 I used to see this but it has disappeared. What's going on here? $ python2.7 >>> from numpy import * >>> absolute(nan) __main__:1: RuntimeWarning: invalid value encountered in absolute nan $ python2.6 >>> from numpy import * >>> absolute(nan) nan Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwlodarczak at uni-bielefeld.de Fri Jul 15 05:28:42 2011 From: mwlodarczak at uni-bielefeld.de (Marcin Wlodarczak) Date: Fri, 15 Jul 2011 11:28:42 +0200 Subject: [Numpy-discussion] Masking entries in structured arrays In-Reply-To: References: <16604_1310634592_ZZh0v3U7Mkh35.00_4E1EB257.6090906@uni-bielefeld.de> Message-ID: <12541_1310722111_ZZh0w30L8UdEz.00_4E20084A.7020702@uni-bielefeld.de> On 14.07.2011 12:36, Christopher Jordan-Squire wrote: >> > I was wondering whether it is possible to mask specific entries in a >> > structured array. If I try to do the following: >> > >> > x = ma.masked_array([(2, 1.), (8, 2.)], dtype=[('a',int), ('b', float)]) >> > x_masked = ma.masked_equal(x, 2) >> > >> > I get "AttributeError: 'NotImplementedType' object has no attribute >> > 'ndim'", which actually makes sense since x.shape returns (2,). I really >> > can't think of any way around this problem. >> > >> > > It's not terribly satisfying, but you can iterate over the field names. > for field in x.dtype.names: > x[field] = np.ma.masked_equal(x[field],2) Thanks. It's really a pity it's not possible to use masked_equal directly though. Best, M. 
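
One way to get a single masked structured array out of the per-field approach, rather than overwriting the data in place, is to build a structured boolean mask and attach it in one go. A sketch that seems to work with current numpy.ma, although record-array support there still has rough edges:

import numpy as np
import numpy.ma as ma

x = np.array([(2, 1.0), (8, 2.0)], dtype=[('a', int), ('b', float)])

# Build the mask field by field...
mask = np.zeros(x.shape, dtype=[('a', bool), ('b', bool)])
for name in x.dtype.names:
    mask[name] = (x[name] == 2)

# ...then attach it to the whole record array at once.
xm = ma.masked_array(x, mask=mask)
# xm['a'] is masked where a == 2 and xm['b'] where b == 2, while the
# underlying data stays untouched.
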
From cournape at gmail.com Fri Jul 15 09:08:49 2011 From: cournape at gmail.com (David Cournapeau) Date: Fri, 15 Jul 2011 22:08:49 +0900 Subject: [Numpy-discussion] [ANN] Bento 0.0.6, a packaging solution for python software Message-ID: Hi, I am pleased to announce a new release of bento, a packaging solution for python which aims at reproducibility, extensibility and simplicity. It supports every python version from 2.4 to 3.2. You can take a look at its main features on Bento's main page (http://cournape.github.com/Bento). The main features of this 0.0.6 release are: - Completely revamped distutils compatibility layer: it is now a thin layer around bento infrastructure, so that most bento packages should be pip-installable, while still keeping bento customization capabilities. - Build directory is now customizable through bentomaker with --build-directory option - Out of tree builds support (i.e. running bento in a directory which does not contain bento.info), with global --bento-info option - Hook File can now be specified in recursed bento.info - Preliminary support for .mpkg (Mac OS X native packaging) - More consistent API for extension/compiled library build registration - Both numpy and scipy can now be built with bento + waf as a build backend Bento is discussed on the bento mailing list (http://librelist.com/browser/bento). cheers, David From bsouthey at gmail.com Fri Jul 15 09:20:55 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 15 Jul 2011 08:20:55 -0500 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release candidate 2 In-Reply-To: References: <4E15BE8C.1090202@gmail.com> <2486ACC5-C5BA-42C3-9E70-3168FE4D7A48@post.harvard.edu> <8D5FEB83-7BEC-4AB6-B63A-58E25F80372E@astro.physik.uni-goettingen.de> Message-ID: <4E203EB7.5070701@gmail.com> On 07/15/2011 03:13 AM, Ralf Gommers wrote: > > > On Fri, Jul 8, 2011 at 4:17 PM, Derek Homeier > > wrote: > > On 07.07.2011, at 7:16PM, Robert Pyle wrote: > > > > .............../Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/numeric.py:1922: > RuntimeWarning: invalid value encountered in absolute > > return all(less_equal(absolute(x-y), atol + rtol * absolute(y))) > > > > > > Everything else completes with 3 KNOWNFAILs and 1 SKIP. This > warning is not new to this release; I've seen it before but > haven't tried tracking it down until today. > > > > It arises in allclose(). The comments state "If either array > contains NaN, then False is returned." but no test for NaN is > done, and NaNs are indeed what cause the warning. > > > > Inserting > > > > if any(isnan(x)) or any(isnan(y)): > > return False > > > > before current line number 1916 in numeric.py seems to fix it. > > The same warning is still present in the current master, I just > never paid attention to it because the tests still pass (it does > correctly identify NaNs because they are not less_equal the > tolerance), but of course this should be properly fixed as you > suggest. > > Under Python 2.6 I used to see this but it has disappeared. What's > going on here? > > $ python2.7 > >>> from numpy import * > >>> absolute(nan) > __main__:1: RuntimeWarning: invalid value encountered in absolute > nan > > $ python2.6 > >>> from numpy import * > >>> absolute(nan) > nan > > Ralf I do not see this with 64-bit Python2.7 on my Linux system. So perhaps Mac specific? By the way, all tests pass with Python 2.7 and 3.2 for rc3. 
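
Whatever the reason for the 2.6/2.7 difference turns out to be, code that deliberately pushes NaN through absolute()/allclose() does not have to depend on the interpreter's default warning behaviour; the floating-point error state can be set explicitly around the call. A minimal sketch:

>>> import numpy as np
>>> a = np.array([1.0, np.nan])
>>> with np.errstate(invalid='ignore'):
...     np.allclose(a, a)
...
False
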
Bruce $ python Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> np.__version__ '1.6.1rc3' >>> np.absolute(np.nan) nan $ python Python 2.7 (r27:82500, Sep 16 2010, 18:02:00) [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from numpy import * >>> absolute(nan) nan >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Fri Jul 15 10:31:58 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 15 Jul 2011 09:31:58 -0500 Subject: [Numpy-discussion] Fwd: [Enthought-Dev] Linking issues with libX11.so In-Reply-To: References: Message-ID: <4E204F5E.6080103@gmail.com> On 07/15/2011 01:55 AM, Ralf Gommers wrote: > > > On Wed, Jul 13, 2011 at 11:10 PM, Ilan Schnell > wrote: > > Hello List, > > Varun, who is a debian packager ran into some problems while > compiling Enable, as it uses numpy.distutils, which did not locate > the location of the X11 libraries correctly. Maybe this can be fixed > in the numpy 1.6.1 release. > > > It's a bit late for that, since we're at 1.6.1rc3, this is not a patch > and I'm not even sure it should be fixed in numpy.distutils. > > Right now the searched dirs are ['/usr/X11R6/lib', '/usr/X11/lib', > '/usr/lib'] or its 64-bit equivalents. See around line 200 in > system_info.py. Is your proposal to add any arch-dependent paths that > Debian can think of to that? > > Leaving it up to the packager to specify the correct path in site.cfg > may be cleaner. Or just apply the patch you received to the ETS Agg build. > > Cheers, > Ralf I did not think that numpy uses X11 so this is just some system output that may be useful for the user. Yes, a ticket could be filed but Debian's multarch support is so 'new' (announcement thread: http://thread.gmane.org/gmane.linux.debian.devel.announce/1609). I think it is is premature to address multarch support yet because it does require a specific version of numpy only using the correct libraries when the user has multiple architecture versions installed. That will also be a problem here because just adding the paths alone is probably insufficient for the application to find the correct libraries. Bruce > > > Please the the forwarded conversation > below. > > Thanks Ilan > > > ---------- Forwarded message ---------- > From: Varun Hiremath > > Date: Wed, Jul 13, 2011 at 4:03 PM > Subject: [Enthought-Dev] Linking issues with libX11.so > To: enthought-dev at enthought.com > > > Hi Ilan, > > You were right, the issue was with the X11 library. The > _plat_support.so was not linked to libX11 and so the chaco examples > were failing; and the reason libX11 was not linked was because numpy > distutils' x11_info was failing. > > I figured out that in debian/ubuntu with a new multi_arch build > support [1] the libraries are being moved from the standard /usr/lib > and /usr/lib64 directories to architecture specific directories like: > > /usr/lib/i386-linux-gnu/ > /usr/lib/x86_64-linux-gnu/ > > and so the numpy.distutil was failing to find libX11 with the latest > version of libX11-dev on debian which installs libX11.so in > /usr/lib/x86_64-linux-gnu/ (on my amd64 system). 
The nump.distutils' > scripts need to be updated to handle this, but for now I am using the > following patch to force _plat_support.so link with X11 (which I think > is always present in the default search path): > > --------------------------- > @@ -230,6 +144,7 @@ > elif plat in ['x11','gtk1']: > x11_info = get_info('x11', notfound_action=1) > dict_append(plat_info, **x11_info) > + dict_append(plat_info, libraries = ['X11']) > > --------------------------- > > With this everything seems to be working fine! > > Thanks, > Varun > > [1] https://wiki.ubuntu.com/MultiarchSpec > > On Mon, Jul 11, 2011 at 10:53 PM, Ilan Schnell > > wrote: > > Hello Varun, > > > > the important part is: _plat_support.so: undefined symbol: > XCreateImage > > This indicates that the kiva/agg/_plat_support.so C extension was > > not linked to X11 while compiling. Until > > > https://github.com/enthought/enable/commit/ebecdbfc5c4596282204e61ff687c3ab2442947a > > which was made shortly after the release, it was easy to create > a broken > > Enable build like this one. Note that this commit does *not* > fix the problem, > > it only causes the build to fail right away, instead of creating > a broken build. > > This was added because of the famous esr quote: > > "When you must fail, fail noisily and as soon as possible." > > > > As enable uses numpy.distutils to build agg, the fix is to edit: > > /lib/python2.6/site-packages/numpy/distutils/site.cfg > > > > and add: > > [x11] > > library_dirs = ... > > include_dirs = ... > > > > - Ilan > > > > > > On Mon, Jul 11, 2011 at 9:23 PM, Varun Hiremath > > wrote: > >> Hi all, > >> > >> I am facing another issue running chaco examples with the new > ETS 4.0 > >> packages. I am getting the following error when I run any chaco > >> example: > >> -------------------------- > >> $$ python zoom_plot.py > >> /usr/lib/python2.6/dist-packages/enable/wx/image.py:16: > Warning: Error initializing Agg: > /usr/lib/python2.6/dist-packages/kiva/agg/_plat_support.so: > undefined symbol: XCreateImage > >> from kiva.agg import CompiledPath, GraphicsContextSystem as > GraphicsContext > >> Traceback (most recent call last): > >> File "zoom_plot.py", line 15, in > >> from enable.api import Component, ComponentEditor > >> File "/usr/lib/python2.6/dist-packages/enable/api.py", line > 42, in > >> from graphics_context import GraphicsContextEnable, > ImageGraphicsContextEnable > >> File > "/usr/lib/python2.6/dist-packages/enable/graphics_context.py", > line 86, in > >> class GraphicsContextEnable(EnableGCMixin, GraphicsContext): > >> TypeError: Error when calling the metaclass bases > >> metaclass conflict: the metaclass of a derived class must be > a (non-strict) subclass of the metaclasses of all its bases > >> ------------------------- > >> > >> Does anybody know what could be the problem? > >> > >> Thanks, > >> Varun > >> > >> p.s. Most of the ETS 4.0 debian packages are now available in > debian unstable. > >> > >> > >> On Sat, 09 Jul, 2011 at 12:45:32PM -0500, Ilan Schnell wrote: > >>> I'm glad it worked. That's a good idea, I'll release > traitsui-4.0.1 > >>> later today. > >>> > >>> - Ilan > >>> > >>> > >>> On Sat, Jul 9, 2011 at 11:20 AM, Varun Hiremath > > wrote: > >>> > Hi Ilan, > >>> > > >>> > Thanks, that worked! Are you planning on doing a point > release for > >>> > traitsui to fix this bug? It would make packaging easier then. 
> >>> > > >>> > Thanks, > >>> > Varun > >>> > > >>> > On Sat, 09 Jul, 2011 at 11:11:26AM -0500, Ilan Schnell wrote: > >>> >> Hello Varun, > >>> >> > >>> >> I ran into the same bug when preparing the EPD 7.1 release. > >>> >> The fix is commited to the github master of traitsui: > >>> >> > https://github.com/enthought/traitsui/commit/4f36a8a27cfa131347dd90d1a8e10a37358cf634 > >>> >> > >>> >> Just replace the two zip-files with the fixed ones, and it > should work. > >>> >> > >>> >> - Ilan > >>> >> > >>> >> > >>> >> On Sat, Jul 9, 2011 at 10:27 AM, Varun Hiremath > > wrote: > >>> >> > Hi, > >>> >> > > >>> >> > I was trying to update the debian packages to the new ETS > 4.0 release, > >>> >> > but I am having some trouble getting mayavi2 running. I > get the error > >>> >> > shown below when I run mayavi2. Could someone please let > me know what > >>> >> > might be causing this error? > >>> >> > > >>> >> > Thanks, > >>> >> > Varun > >>> >> > > >>> >> > ----------------- > >>> >> > $ mayavi2 > >>> >> > Traceback (most recent call last): > >>> >> > File "/usr/bin/mayavi2", line 658, in > >>> >> > main() > >>> >> > File "/usr/bin/mayavi2", line 649, in main > >>> >> > mayavi.main(sys.argv[1:]) > >>> >> > File > "/usr/lib/python2.6/dist-packages/mayavi/plugins/app.py", line > 195, in main > >>> >> > app.run() > >>> >> > File > "/usr/lib/python2.6/dist-packages/mayavi/plugins/mayavi_workbench_application.py", > line 81, in run > >>> >> > window.open() > >>> >> > File > "/usr/lib/python2.6/dist-packages/pyface/workbench/workbench_window.py", > line 144, in open > >>> >> > self._create() > >>> >> > File > "/usr/lib/python2.6/dist-packages/pyface/ui/wx/application_window.py", > line 150, in _create > >>> >> > contents = self._create_contents(body) > >>> >> > File > "/usr/lib/python2.6/dist-packages/pyface/workbench/workbench_window.py", > line 217, in _create_contents > >>> >> > contents = self.layout.create_initial_layout(parent) > >>> >> > File > "/usr/lib/python2.6/dist-packages/pyface/ui/wx/workbench/workbench_window_layout.py", > line 151, in create_initial_layout > >>> >> > self._wx_view_dock_window = WorkbenchDockWindow(parent) > >>> >> > File > "/usr/lib/python2.6/dist-packages/pyface/dock/dock_window.py", > line 324, in __init__ > >>> >> > if self.theme.use_theme_color: > >>> >> > File > "/usr/lib/python2.6/dist-packages/pyface/dock/dock_window.py", > line 335, in _theme_default > >>> >> > return dock_window_theme() > >>> >> > File > "/usr/lib/python2.6/dist-packages/traitsui/dock_window_theme.py", > line 92, in dock_window_theme > >>> >> > from .default_dock_window_theme import > default_dock_window_theme > >>> >> > File > "/usr/lib/python2.6/dist-packages/traitsui/default_dock_window_theme.py", > line 39, in > >>> >> > label = ( 0, -3 ), content = ( 7, 6, 0, 0 ) ), > >>> >> > File > "/usr/lib/python2.6/dist-packages/traitsui/theme.py", line 63, in > __init__ > >>> >> > self.image = image > >>> >> > File > "/usr/lib/python2.6/dist-packages/traitsui/ui_traits.py", line > 229, in validate > >>> >> > self.error( object, name, value ) > >>> >> > File > "/usr/lib/python2.6/dist-packages/traits/trait_handlers.py", line > 168, in error > >>> >> > value ) > >>> >> > traits.trait_errors.TraitError: The 'image' trait of a > Theme instance must be an ImageResource or string that can be used > to define one, but a value of '@std:tab_active' was > specified. 
> >>> >> > Exception in thread Thread-1: > >>> >> > Traceback (most recent call last): > >>> >> > File "/usr/lib/python2.6/threading.py", line 532, in > __bootstrap_inner > >>> >> > self.run() > >>> >> > File "/usr/lib/python2.6/threading.py", line 484, in run > >>> >> > self.__target(*self.__args, **self.__kwargs) > >>> >> > File > "/usr/lib/python2.6/dist-packages/traitsui/image/image.py", line > 329, in _process > >>> >> > if time() > (self.time_stamp + 2.0): > >>> >> > TypeError: 'NoneType' object is not callable > >>> >> > --------------- > >>> >> > _______________________________________________ > >>> >> > Enthought-Dev mailing list > >>> >> > Enthought-Dev at mail.enthought.com > > >>> >> > https://mail.enthought.com/mailman/listinfo/enthought-dev > >>> >> > > >>> >> _______________________________________________ > >>> >> Enthought-Dev mailing list > >>> >> Enthought-Dev at mail.enthought.com > > >>> >> https://mail.enthought.com/mailman/listinfo/enthought-dev > >>> > _______________________________________________ > >>> > Enthought-Dev mailing list > >>> > Enthought-Dev at mail.enthought.com > > >>> > https://mail.enthought.com/mailman/listinfo/enthought-dev > >>> > > >>> _______________________________________________ > >>> Enthought-Dev mailing list > >>> Enthought-Dev at mail.enthought.com > > >>> https://mail.enthought.com/mailman/listinfo/enthought-dev > >> _______________________________________________ > >> Enthought-Dev mailing list > >> Enthought-Dev at mail.enthought.com > > >> https://mail.enthought.com/mailman/listinfo/enthought-dev > >> > > _______________________________________________ > > Enthought-Dev mailing list > > Enthought-Dev at mail.enthought.com > > > https://mail.enthought.com/mailman/listinfo/enthought-dev > > > _______________________________________________ > Enthought-Dev mailing list > Enthought-Dev at mail.enthought.com > > https://mail.enthought.com/mailman/listinfo/enthought-dev > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From morph at debian.org Sat Jul 16 05:34:00 2011 From: morph at debian.org (Sandro Tosi) Date: Sat, 16 Jul 2011 11:34:00 +0200 Subject: [Numpy-discussion] Error building numpy (1.5.1 and 1.6.1rc3) with python2.7 debug Message-ID: Hello, while preparing a test upload for 1.6.1rc3 in Debian, I noticed that it gets an error when building blas with python 2.7 in the debug flavor, the build log is at [1]. It's also been confirmed it fails also with 1.5.1 [2] [1] http://people.debian.org/~morph/python-numpy_1.6.1~rc3-1_amd64.build [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=634012 I think it might be a toolchain change in Debian (since 1.5.1 was built successfully and now it fails), but could you please give me a hand in debugging the issue? 
Thanks in advance, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From stefan-usenet at bytereef.org Sat Jul 16 06:42:36 2011 From: stefan-usenet at bytereef.org (Stefan Krah) Date: Sat, 16 Jul 2011 12:42:36 +0200 Subject: [Numpy-discussion] PyBUF_SIMPLE requests Message-ID: <20110716104236.GA15434@sleipnir.bytereef.org> Hello, I'm working on the completion of the PEP-3118 (buffer protocol) implementation in Python core. I was wondering how a PyBUF_SIMPLE buffer request sent to an object with a complex structure should behave. The current Python C-API documentation for a PyBUF_SIMPLE buffer request says: "This is the default flag. The returned buffer exposes a read-only memory area. The format of data is assumed to be raw unsigned bytes, without any particular structure. This is a ?stand-alone? flag constant. It never needs to be ?|?d to the others. The exporter will raise an error if it cannot provide such a contiguous buffer of bytes." To me a memory view "without any particular structure" means that the returned buffer should always expose the whole memory area. But NumPy behaves differently (see below for the printbuffer() hack): Python 3.3.0a0 (default:1dd6908df8f5, Jul 16 2011, 11:16:00) [GCC 4.4.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from numpy import * >>> x = ndarray(buffer=bytearray([1,2,3,4,5,6,7,8,9,10]), shape=[10], strides=[-1], dtype="B", offset=9) >>> x array([10, 9, 8, 7, 6, 5, 4, 3, 2, 1], dtype='uint8') >>> x.printbuffer() PyBUF_FULL: obj: 0x7f527e946240 buf: 0x7f5281268c79 buf[0]: 10 len: 10 itemsize: 1 readonly: 0 ndim: 1 format: B shape: 10 strides: -1 PyBUF_SIMPLE: obj: 0x7f527e946240 buf: 0x7f5281268c79 buf[0]: 10 len: 10 itemsize: 1 readonly: 0 ndim: 0 format: (null) shape: strides: The PyBUF_FULL request returns a reasonable buffer, because consumers that follow the strides will behave correctly. I do not understand the PyBUF_SIMPLE result. According to the C-API docs a consumer would be allowed to access buf[9], which would be invalid. Or is a consumer supposed to observe ndim = 0 and treat the result as a scalar? If so, there is a problem with this approach. For example, consumers like the struct module do not care about ndim and would walk right out of the array. Stefan Krah printbuffer() hack: ============================================================================ diff --git a/numpy/core/src/multiarray/methods.c b/numpy/core/src/multiarray/methods.c index 68f697a..1cd260a 100644 --- a/numpy/core/src/multiarray/methods.c +++ b/numpy/core/src/multiarray/methods.c @@ -2234,6 +2234,51 @@ array_newbyteorder(PyArrayObject *self, PyObject *args) } +static int +do_print(PyObject *self, int flags, const char *flagstring) +{ + Py_buffer view; + int i; + + if (PyObject_GetBuffer(self, &view, flags) < 0) + return -1; + + fprintf(stderr, "%s:\n\n", flagstring); + fprintf(stderr, "obj: %p\n", view.obj); + fprintf(stderr, "buf: %p\n", view.buf); + fprintf(stderr, "buf[0]: %d\n", ((char *)(view.buf))[0]); + fprintf(stderr, "len: %zd\n", view.len); + fprintf(stderr, "itemsize: %zd\n", view.itemsize); + fprintf(stderr, "readonly: %d\n", view. 
readonly); + fprintf(stderr, "ndim: %d\n", view.ndim); + fprintf(stderr, "format: %s\n", view.format); + + fprintf(stderr, "shape: "); + for (i = 0; i < view.ndim; i++) + fprintf(stderr, "%zd ", view.shape[i]); + fprintf(stderr, "\n"); + + fprintf(stderr, "strides: "); + for (i = 0; i < view.ndim; i++) + fprintf(stderr, "%zd ", view.strides[i]); + fprintf(stderr, "\n\n"); + + PyBuffer_Release(&view); + + return 0; +} + +static PyObject * +array_print_getbuf(PyObject *self, PyObject *dummy) +{ + if (do_print(self, PyBUF_FULL, "PyBUF_FULL") < 0) + return NULL; + if (do_print(self, PyBUF_SIMPLE, "PyBUF_SIMPLE") < 0) + return NULL; + + Py_RETURN_NONE; +} + NPY_NO_EXPORT PyMethodDef array_methods[] = { /* for subtypes */ @@ -2426,6 +2471,9 @@ NPY_NO_EXPORT PyMethodDef array_methods[] = { {"view", (PyCFunction)array_view, METH_VARARGS | METH_KEYWORDS, NULL}, + {"printbuffer", + (PyCFunction)array_print_getbuf, + METH_NOARGS, NULL}, {NULL, NULL, 0, NULL} /* sentinel */ }; From martin-numpy at earth.li Sat Jul 16 10:50:11 2011 From: martin-numpy at earth.li (Martin Ling) Date: Sat, 16 Jul 2011 15:50:11 +0100 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available Message-ID: <20110716145010.GY3465@earth.li> Hi all, I have just pushed a package to GitHub which adds a quaternion dtype to NumPy: https://github.com/martinling/numpy_quaternion Some backstory: on Wednesday I gave a talk at SciPy 2011 about an inertial sensing simulation package I have been working on (http://www.imusim.org/). One component I suggested might be reusable from that code was the quaternion math implementation, written in Cython. One of its features is a wrapper class for Nx4 NumPy arrays that supports efficient operations using arrays of quaternion values. Travis Oliphant suggested that a quaternion dtype would be a better solution, and got me talking to Mark Weibe about this. With Mark's help I completed this initial version at yesterday's sprint session. Incidentally, how to do something like this isn't well documented and I would have had little hope without both Mark's in-person help and his previous code (for adding a half-precision float dtype) to refer to. I don't know what the consensus is about whether people writing custom dtypes is a desirable thing, but if it is then the process needs to be made a lot easier. That said, the fact this is doable without patching the numpy core at all is really, really nice. Example usage: >>> import numpy as np >>> import quaternion >>> np.quaternion(1,0,0,0) quaternion(1, 0, 0, 0) >>> q1 = np.quaternion(1,2,3,4) >>> q2 = np.quaternion(5,6,7,8) >>> q1 * q2 quaternion(-60, 12, 30, 24) >>> a = np.array([q1, q2]) >>> a array([quaternion(1, 2, 3, 4), quaternion(5, 6, 7, 8)], dtype=quaternion) >>> exp(a) array([quaternion(1.69392, -0.78956, -1.18434, -1.57912), quaternion(138.909, -25.6861, -29.9671, -34.2481)], dtype=quaternion) The following ufuncs are implemented: add, subtract, multiply, divide, log, exp, power, negative, conjugate, copysign, equal, not_equal, less, less_equal, isnan, isinf, isfinite, absolute Quaternion components are stored as doubles. The package could be extended to support e.g. qfloat, qdouble, qlongdouble Comparison operations follow the same lexicographic ordering as tuples. The unary tests isnan, isinf and isfinite return true if they would return true for any individual component. Real types may be cast to quaternions, giving quaternions with zero for all three imaginary components. 
Complex types may also be cast to quaternions, with their single imaginary component becoming the first imaginary component of the quaternion. Quaternions may not be cast to real or complex types. Comments very welcome. This is my first attempt at NumPy hacking :-) Martin From craigyk at me.com Sat Jul 16 13:38:31 2011 From: craigyk at me.com (Craig Yoshioka) Date: Sat, 16 Jul 2011 10:38:31 -0700 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: <20110716145010.GY3465@earth.li> References: <20110716145010.GY3465@earth.li> Message-ID: <21D155F7-7959-4055-AEFC-C39056B93DD9@me.com> Wow, that makes for a great howto example. Thanks. On Jul 16, 2011, at 7:50 AM, Martin Ling wrote: > Hi all, > > I have just pushed a package to GitHub which adds a quaternion dtype to > NumPy: https://github.com/martinling/numpy_quaternion > > Some backstory: on Wednesday I gave a talk at SciPy 2011 about an > inertial sensing simulation package I have been working on > (http://www.imusim.org/). One component I suggested might be reusable > from that code was the quaternion math implementation, written in > Cython. One of its features is a wrapper class for Nx4 NumPy arrays that > supports efficient operations using arrays of quaternion values. > > Travis Oliphant suggested that a quaternion dtype would be a better > solution, and got me talking to Mark Weibe about this. With Mark's help > I completed this initial version at yesterday's sprint session. > > Incidentally, how to do something like this isn't well documented and I > would have had little hope without both Mark's in-person help and his > previous code (for adding a half-precision float dtype) to refer to. I > don't know what the consensus is about whether people writing custom > dtypes is a desirable thing, but if it is then the process needs to be > made a lot easier. That said, the fact this is doable without patching > the numpy core at all is really, really nice. > > Example usage: > >>>> import numpy as np >>>> import quaternion >>>> np.quaternion(1,0,0,0) > quaternion(1, 0, 0, 0) >>>> q1 = np.quaternion(1,2,3,4) >>>> q2 = np.quaternion(5,6,7,8) >>>> q1 * q2 > quaternion(-60, 12, 30, 24) >>>> a = np.array([q1, q2]) >>>> a > array([quaternion(1, 2, 3, 4), quaternion(5, 6, 7, 8)], > dtype=quaternion) >>>> exp(a) > array([quaternion(1.69392, -0.78956, -1.18434, -1.57912), > quaternion(138.909, -25.6861, -29.9671, -34.2481)], > dtype=quaternion) > > The following ufuncs are implemented: > add, subtract, multiply, divide, log, exp, power, negative, conjugate, > copysign, equal, not_equal, less, less_equal, isnan, isinf, isfinite, > absolute > > Quaternion components are stored as doubles. The package could be extended > to support e.g. qfloat, qdouble, qlongdouble > > Comparison operations follow the same lexicographic ordering as tuples. > > The unary tests isnan, isinf and isfinite return true if they would > return true for any individual component. > > Real types may be cast to quaternions, giving quaternions with zero for > all three imaginary components. Complex types may also be cast to > quaternions, with their single imaginary component becoming the first > imaginary component of the quaternion. Quaternions may not be cast to > real or complex types. > > Comments very welcome. 
This is my first attempt at NumPy hacking :-) > > > Martin > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ben.root at ou.edu Sat Jul 16 15:12:21 2011 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 16 Jul 2011 14:12:21 -0500 Subject: [Numpy-discussion] Problem with boolean-indexing of structured arrays Message-ID: Just a heads-up, I have uncovered a serious bug in one of my programs and traced it down to a boolean indexing of a structured array in the current master branch of numpy. While I am still investigating exactly what is happening, my initial suspicion is that data from another structured array is getting transplanted into the array view that I have. I have not encountered this bug before, but I am currently checking with older versions of numpy to see if it is a regression or not. I will update with more information shortly. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Sat Jul 16 16:45:51 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Sat, 16 Jul 2011 15:45:51 -0500 Subject: [Numpy-discussion] Error building numpy (1.5.1 and 1.6.1rc3) with python2.7 debug In-Reply-To: References: Message-ID: On Sat, Jul 16, 2011 at 4:34 AM, Sandro Tosi wrote: > Hello, > while preparing a test upload for 1.6.1rc3 in Debian, I noticed that > it gets an error when building blas with python 2.7 in the debug > flavor, the build log is at [1]. It's also been confirmed it fails > also with 1.5.1 [2] > > [1] http://people.debian.org/~morph/python-numpy_1.6.1~rc3-1_amd64.build > [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=634012 > > I think it might be a toolchain change in Debian (since 1.5.1 was > built successfully and now it fails), but could you please give me a > hand in debugging the issue? > > Thanks in advance, > -- > Sandro Tosi (aka morph, morpheus, matrixhasu) > My website: http://matrixhasu.altervista.org/ > Me at Debian: http://wiki.debian.org/SandroTosi > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Hi, What do you mean by 'python2.7 debug'? Numpy 1.6.1rc's and earlier build and install with Python 2.7 build in debug mode ($ ./configure --with-pydebug ) on 64-bit Fedora 14 and 15. But, if I can follow you build process (should be the plain 'python setup.py build' to be useful) I think numpy is not finding the correct blas/lapack/atlas libraries so either you may need a site.cfg for that system or install those in the Linux standard locations such as /usr/lib64. You should probably try building without blas, lapack and atlas etc.: BLAS=None LAPACK=None ATLAS=None python setup.py build Bruce From ben.root at ou.edu Sat Jul 16 17:09:29 2011 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 16 Jul 2011 16:09:29 -0500 Subject: [Numpy-discussion] Problem with boolean-indexing of structured arrays In-Reply-To: References: Message-ID: On Sat, Jul 16, 2011 at 2:12 PM, Benjamin Root wrote: > Just a heads-up, I have uncovered a serious bug in one of my programs and > traced it down to a boolean indexing of a structured array in the current > master branch of numpy. While I am still investigating exactly what is > happening, my initial suspicion is that data from another structured array > is getting transplanted into the array view that I have. 
I have not > encountered this bug before, but I am currently checking with older versions > of numpy to see if it is a regression or not. > > I will update with more information shortly. > > Ben Root > Please disregard. It seems that the boolean indexing was working correctly, it was the test that generated the boolean array that had an error and caused incorrect data to be accessed. I apologize for the noise. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From aarchiba at physics.mcgill.ca Sat Jul 16 20:16:44 2011 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Sat, 16 Jul 2011 20:16:44 -0400 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: <20110716145010.GY3465@earth.li> References: <20110716145010.GY3465@earth.li> Message-ID: What a useful package! Apart from helping all the people who know they need quaternions, this package removes one major family of use cases for vectorized small-matrix operations, namely, 3D rotations. Quaternions are the canonical way to represent orientation and rotation in three dimensions, and their multiplication gives (with some fiddling) composition of rotations. The next interesting question is, how well does scipy.interpolate deal with them? For really good rotational paths I seem to recall you want specialized splines, but simply interpolating in the quaternion domain is not a bad quick and dirty approach. Anne (now awaiting octonions, though I've never heard of a practical use for them) On 16 July 2011 10:50, Martin Ling wrote: > Hi all, > > I have just pushed a package to GitHub which adds a quaternion dtype to > NumPy: https://github.com/martinling/numpy_quaternion > > Some backstory: on Wednesday I gave a talk at SciPy 2011 about an > inertial sensing simulation package I have been working on > (http://www.imusim.org/). One component I suggested might be reusable > from that code was the quaternion math implementation, written in > Cython. One of its features is a wrapper class for Nx4 NumPy arrays that > supports efficient operations using arrays of quaternion values. > > Travis Oliphant suggested that a quaternion dtype would be a better > solution, and got me talking to Mark Weibe about this. With Mark's help > I completed this initial version at yesterday's sprint session. > > Incidentally, how to do something like this isn't well documented and I > would have had little hope without both Mark's in-person help and his > previous code (for adding a half-precision float dtype) to refer to. I > don't know what the consensus is about whether people writing custom > dtypes is a desirable thing, but if it is then the process needs to be > made a lot easier. That said, the fact this is doable without patching > the numpy core at all is really, really nice. > > Example usage: > > ?>>> import numpy as np > ?>>> import quaternion > ?>>> np.quaternion(1,0,0,0) > ?quaternion(1, 0, 0, 0) > ?>>> q1 = np.quaternion(1,2,3,4) > ?>>> q2 = np.quaternion(5,6,7,8) > ?>>> q1 * q2 > ?quaternion(-60, 12, 30, 24) > ?>>> a = np.array([q1, q2]) > ?>>> a > ?array([quaternion(1, 2, 3, 4), quaternion(5, 6, 7, 8)], > ? ? ? ?dtype=quaternion) > ?>>> exp(a) > ?array([quaternion(1.69392, -0.78956, -1.18434, -1.57912), > ? ? ? ?quaternion(138.909, -25.6861, -29.9671, -34.2481)], > ? ? ? 
?dtype=quaternion) > > The following ufuncs are implemented: > ?add, subtract, multiply, divide, log, exp, power, negative, conjugate, > ?copysign, equal, not_equal, less, less_equal, isnan, isinf, isfinite, > ?absolute > > Quaternion components are stored as doubles. The package could be extended > to support e.g. qfloat, qdouble, qlongdouble > > Comparison operations follow the same lexicographic ordering as tuples. > > The unary tests isnan, isinf and isfinite return true if they would > return true for any individual component. > > Real types may be cast to quaternions, giving quaternions with zero for > all three imaginary components. Complex types may also be cast to > quaternions, with their single imaginary component becoming the first > imaginary component of the quaternion. Quaternions may not be cast to > real or complex types. > > Comments very welcome. This is my first attempt at NumPy hacking :-) > > > Martin > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ralf.gommers at googlemail.com Sun Jul 17 11:52:19 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 17 Jul 2011 17:52:19 +0200 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: <20110716145010.GY3465@earth.li> References: <20110716145010.GY3465@earth.li> Message-ID: On Sat, Jul 16, 2011 at 4:50 PM, Martin Ling wrote: > Hi all, > > I have just pushed a package to GitHub which adds a quaternion dtype to > NumPy: https://github.com/martinling/numpy_quaternion > > Some backstory: on Wednesday I gave a talk at SciPy 2011 about an > inertial sensing simulation package I have been working on > (http://www.imusim.org/). One component I suggested might be reusable > from that code was the quaternion math implementation, written in > Cython. One of its features is a wrapper class for Nx4 NumPy arrays that > supports efficient operations using arrays of quaternion values. > > Travis Oliphant suggested that a quaternion dtype would be a better > solution, and got me talking to Mark Weibe about this. With Mark's help > I completed this initial version at yesterday's sprint session. > > Incidentally, how to do something like this isn't well documented and I > would have had little hope without both Mark's in-person help and his > previous code (for adding a half-precision float dtype) to refer to. I > don't know what the consensus is about whether people writing custom > dtypes is a desirable thing, but if it is then the process needs to be > made a lot easier. That said, the fact this is doable without patching > the numpy core at all is really, really nice. > > Example usage: > > >>> import numpy as np > >>> import quaternion > >>> np.quaternion(1,0,0,0) > quaternion(1, 0, 0, 0) > >>> q1 = np.quaternion(1,2,3,4) > >>> q2 = np.quaternion(5,6,7,8) > >>> q1 * q2 > quaternion(-60, 12, 30, 24) > >>> a = np.array([q1, q2]) > >>> a > array([quaternion(1, 2, 3, 4), quaternion(5, 6, 7, 8)], > dtype=quaternion) > >>> exp(a) > array([quaternion(1.69392, -0.78956, -1.18434, -1.57912), > quaternion(138.909, -25.6861, -29.9671, -34.2481)], > dtype=quaternion) > > The following ufuncs are implemented: > add, subtract, multiply, divide, log, exp, power, negative, conjugate, > copysign, equal, not_equal, less, less_equal, isnan, isinf, isfinite, > absolute > > Quaternion components are stored as doubles. The package could be extended > to support e.g. 
qfloat, qdouble, qlongdouble > > Comparison operations follow the same lexicographic ordering as tuples. > > The unary tests isnan, isinf and isfinite return true if they would > return true for any individual component. > > Real types may be cast to quaternions, giving quaternions with zero for > all three imaginary components. Complex types may also be cast to > quaternions, with their single imaginary component becoming the first > imaginary component of the quaternion. Quaternions may not be cast to > real or complex types. > > Comments very welcome. This is my first attempt at NumPy hacking :-) > >

Looks very interesting. One thing that is surprising to me is that the quaternion dtype is inserted into the numpy namespace. For dtypes that are planned to be integrated with numpy later on perhaps this makes sense, but in general this doesn't look right, I think. Can you explain your reasoning for doing it like this?

Cheers, Ralf

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From tsyu80 at gmail.com Sun Jul 17 13:15:06 2011
From: tsyu80 at gmail.com (Tony Yu)
Date: Sun, 17 Jul 2011 13:15:06 -0400
Subject: [Numpy-discussion] Numpydoc warnings for methods
Message-ID:

I'm building documentation using Sphinx, and it seems that numpydoc is raising a lot of warnings. Specifically, the warnings look like "failed to import ", "toctree references unknown document u''", "toctree contains reference to nonexisting document ''---for each method defined. The example below reproduces the issue on my system (Sphinx 1.0.7, numpy HEAD). These warnings appear in my build of the numpy docs, as well.

Removing numpydoc from the list of Sphinx extensions gets rid of these warnings (but, of course, adds new warnings if headings for 'Parameters', 'Returns', etc. are present).

Am I doing something wrong here?

Thanks,
-Tony

test_sphinx/foo.py:
===================

class Bar(object):
    """Bar docstring"""

    def baz(self):
        """baz docstring"""
        pass

test_sphinx/doc/source/foo.rst:
===============================

.. autoclass:: foo.Bar
    :members:

Warnings from build:
====================

/Users/Tony/Desktop/test_sphinx/doc/source/foo.rst:13: (WARNING/2) failed to import baz
/Users/Tony/Desktop/test_sphinx/doc/source/foo.rst:13: (WARNING/2) toctree references unknown document u'baz'
/Users/Tony/Desktop/test_sphinx/doc/source/foo.rst:: WARNING: toctree contains reference to nonexisting document 'baz'

-------------- next part -------------- An HTML attachment was scrubbed... URL:

From sturla at molden.no Sun Jul 17 13:57:34 2011
From: sturla at molden.no (Sturla Molden)
Date: Sun, 17 Jul 2011 19:57:34 +0200
Subject: [Numpy-discussion] Adding a linear system type to NumPy?
Message-ID: <4E23228E.8090502@molden.no>

The problem I think we might try to fix is that programmers with less numerical competence are unaware that the matrix expression

(X**-1) * Y

should be written as

np.linalg.solve(X,Y)

I've seen matrix expressions typed exactly as written in linear algebra textbooks numerous times. Matlab does this with the backslash operator, though it is not that intuitive. Also, there seems to be general agreement against a backslash division operator in Python.

So I suggest inverting a NumPy matrix could result in an unsolved linear system type, instead of actually computing the matrix inverse and returning a new matrix. The linear system type would store two arrays, A and X, symbolically representing A * (X**-1). Initially, A would be set to the identity matrix, I.
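To make the idea concrete, here is a rough, untested Python sketch (the class name is only illustrative, and LinearSystem(X) stands in for what X**-1 would return):

import numpy as np

class LinearSystem(object):
    # Sketch only: represents A * (X**-1) for a square X, without ever
    # forming the inverse of X explicitly.
    __array_priority__ = 100.0  # hint so ndarrays defer to the methods below

    def __init__(self, X, A=None):
        self.X = np.asarray(X)
        self.A = np.eye(self.X.shape[0]) if A is None else np.asarray(A)

    def __rmul__(self, Z):
        # Z * (A * X**-1)  ->  stay lazy, just accumulate the left factor
        return LinearSystem(self.X, np.dot(np.asarray(Z), self.A))

    def __mul__(self, Y):
        # (A * X**-1) * Y  ->  evaluate now, as A * solve(X, Y)
        return np.dot(self.A, np.linalg.solve(self.X, np.asarray(Y)))

With something like this, Z * LinearSystem(X) * Y evaluates as np.dot(Z, np.linalg.solve(X, Y)), which is the behaviour described below.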
A matrix expression Y * (X**-1) would result in (1) creation of a LinearSystem object for the inversion of X, and (2) matrix multiplication of Y by A, returning a new LinearSystem object with A updated by A = Y * A.

The matrix expression (X**-1) * Y would result in (1) creation of a LinearSystem object for the inversion of X, and (2) solution of the linear system by calling np.linalg.solve, i.e. np.linalg.solve(X,Y)

The matrix expression Z * (X**-1) * Y would form a linear system type for X, initialize A to Z, and then evaluate np.dot(A, np.linalg.solve(X,Y)), since Python's evaluation order is left to right.

Any other operation on a linear system (e.g. slicing) would trigger formation of the inverse, by solving the system against the identity matrix; the object would then set a flag that the system is solved and behave as an ordinary np.matrix. Thus, (X**-1) * Y would behave as before, but do the "correct" math (instead of explicitly forming the inverse and then multiplying).

Consequently this would be the same as Matlab's backslash operator, only more intuitive, as the syntax would be the same as textbook linear algebra notation.

As for naming, it could e.g. be np.linear_system.

I am just thinking out loud, so forgive me for spamming the list :-)

Sturla

From alan.isaac at gmail.com Sun Jul 17 15:17:11 2011
From: alan.isaac at gmail.com (Alan G Isaac)
Date: Sun, 17 Jul 2011 15:17:11 -0400
Subject: [Numpy-discussion] Adding a linear system type to NumPy?
In-Reply-To: <4E23228E.8090502@molden.no>
References: <4E23228E.8090502@molden.no>
Message-ID: <4E233537.7080300@gmail.com>

On 7/17/2011 1:57 PM, Sturla Molden wrote: > I suggest inverting a NumPy matrix could result in an unsolved linear > system type, instead of actually computing the matrix inverse and > returning a new matrix.

1. Too implicit.
2. Too confusing for new users.
2a. Too confusing for students.

However a "project" method might be nice, where X.project(Y) would do an orthogonal projection of Y onto X. (Then the underlying computation becomes an implementation detail.)

fwiw, Alan Isaac

From ralf.gommers at googlemail.com Sun Jul 17 15:35:19 2011
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sun, 17 Jul 2011 21:35:19 +0200
Subject: [Numpy-discussion] Numpydoc warnings for methods
In-Reply-To: References: Message-ID:

On Sun, Jul 17, 2011 at 7:15 PM, Tony Yu wrote: > I'm building documentation using Sphinx, and it seems that numpydoc is > raising > a lot of warnings. Specifically, the warnings look like "failed to import > ", "toctree > references unknown document u''", "toctree contains reference > to nonexisting document ''---for each method defined. The > example below reproduces the issue on my system (Sphinx 1.0.7, numpy HEAD). > These warnings appear in my build of the numpy docs, as well. > > Removing numpydoc from the list of Sphinx extensions gets rid of these > warnings > (but, of course, adds new warnings if headings for 'Parameters', 'Returns', > etc. are present). > > Am I doing something wrong here? > >

You're not, it's a Sphinx bug that Pauli already has a fix for. See http://projects.scipy.org/numpy/ticket/1772

Ralf

-------------- next part -------------- An HTML attachment was scrubbed...
URL: From ralf.gommers at googlemail.com Sun Jul 17 16:03:03 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 17 Jul 2011 22:03:03 +0200 Subject: [Numpy-discussion] numpy build issue on i7-2600K CPU In-Reply-To: References: <4E1A5888.1030808@uci.edu> Message-ID: On Mon, Jul 11, 2011 at 9:31 PM, Ralf Gommers wrote: > Hi Christoph, > > On Mon, Jul 11, 2011 at 3:57 AM, Christoph Gohlke wrote: > >> Hello, >> >> building numpy 1.6.1rc2 on Windows, i7-2600K CPU, with msvc9 failed with >> the following error: >> >> File "numpy/core/setup_common.py", line 271, in long_double_representation >> raise ValueError("Could not lock sequences (%s)" % saw) >> ValueError: Could not lock sequences (None) >> >> >> This problem has been mentioned before at > pipermail/numpy-discussion/**2011-March/055571.html >> >. >> >> >> Opening the configtest.obj file in binary mode fixed the issue for me. A >> patch is attached. >> > > I did see this, just not before I tagged 1.6.1rc3. If it's reviewed/tested > I think it's a simple enough change that it can go in without requiring a > new RC. > > This looks like a correct fix to me, and I planned to test it on Windows and push it to master first. But since master doesn't compile on Windows for me at the moment I can't do that, and I don't really want to only push it to 1.6.x. So it may have to wait till 1.6.2/1.7.0. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin-numpy at earth.li Sun Jul 17 16:03:17 2011 From: martin-numpy at earth.li (Martin Ling) Date: Sun, 17 Jul 2011 21:03:17 +0100 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: References: <20110716145010.GY3465@earth.li> Message-ID: <20110717200316.GE3465@earth.li> On Sat, Jul 16, 2011 at 08:16:44PM -0400, Anne Archibald wrote: > The next interesting question is, how well does scipy.interpolate deal > with them? For really good rotational paths I seem to recall you want > specialized splines, but simply interpolating in the quaternion domain > is not a bad quick and dirty approach. Hi Anne, Actually that's next on my list. The most commonly used quaternion interpolation algorithm is Ken Shoemake's SLERP. There is an improved one called SQUAD which is C^1-continuous, i.e defines continuous angular velocities as well as just rotations. For the inertial sensing simulator I mentioned we needed C^2 continuitiy, so we implemented the quaternion B-spline algorithm from: M-J Kim, M-S Kim and S Y Shin, "A General Construction Scheme for Unit Quaternion Curves with Simple High Order Derivatives, in "Proceedings of the 22nd Annual Conference on Computer Graphics and Interactive Techniques (SIG-GRAPH?95)", pp. 369-376, ACM, 1995. At the moment this uses our Cython-based quaternion library, but my plan once the quaternion dtype is nailed down would be to rewrite these interpolators to use the dtype, and submit the result to scipy.interpolate. Martin From martin-numpy at earth.li Sun Jul 17 16:24:12 2011 From: martin-numpy at earth.li (Martin Ling) Date: Sun, 17 Jul 2011 21:24:12 +0100 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: References: <20110716145010.GY3465@earth.li> Message-ID: <20110717202412.GF3465@earth.li> On Sun, Jul 17, 2011 at 05:52:19PM +0200, Ralf Gommers wrote: > > Looks very interesting. > > One thing that is surprising to me is that the quaternion dtype is > inserted in to the numpy namespace. 
For dtypes that are planned to be > integrated with numpy later on perhaps this makes sense, but in general > this doesn't look right I think. Can you explain your reasoning to do it > like this? To be honest I didn't think about it a great deal; the basic glue code of the package was copied directly from Mark Weibe's numpy_half module. In retrospect, the rationale is that I would like to see this dtype integrated into NumPy, so the implementation aims to show exactly what that would look like to the user. I would have written this as a patch to NumPy had Mark not shown me it was possible to write it separately. In the general case I would agree with you that application-specific dtypes should not go into the numpy namespace. I think this is a sufficiently general thing that it would be worth including there - it's just another sort of number, after all. That said, I would have no major objections to this being a separate package and not touching the numpy namespace. The key thing is just that it's usable as a dtype. However, if this is going to be a separate package then I'd like to be sure that it's not going to be broken by internal changes to NumPy in the future. At present there is no official documentation of how to write an external dtype. Martin From dsdale24 at gmail.com Sun Jul 17 17:48:39 2011 From: dsdale24 at gmail.com (Darren Dale) Date: Sun, 17 Jul 2011 17:48:39 -0400 Subject: [Numpy-discussion] X11 system info Message-ID: In numpy.distutils.system info: default_x11_lib_dirs = libpaths(['/usr/X11R6/lib','/usr/X11/lib', '/usr/lib'], platform_bits) default_x11_include_dirs = ['/usr/X11R6/include','/usr/X11/include', '/usr/include'] These defaults won't work on the forthcoming Ubuntu 11.10, which installs X into /usr/lib/X11 and /usr/include/X11. Darren From Chris.Barker at noaa.gov Sun Jul 17 17:55:37 2011 From: Chris.Barker at noaa.gov (Chris Barker) Date: Sun, 17 Jul 2011 14:55:37 -0700 Subject: [Numpy-discussion] Build error on Windows In-Reply-To: <4E1FAE47.2020207@uci.edu> References: <4E1F9E51.9080702@noaa.gov> <4E1FAE47.2020207@uci.edu> Message-ID: <4E235A59.8020002@noaa.gov> On 7/14/2011 8:04 PM, Christoph Gohlke wrote: > A patch for the build issues is attached. Remove the build directory > before rebuilding. > Christoph, I had other issues (I think in one case, a *.c file was not getting re-built from the *.c.src file. But anyway, at the end the patch appears to work. Could someone with commit privileges commit it? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: msvc9.diff URL: From d.s.seljebotn at astro.uio.no Sun Jul 17 17:59:31 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 17 Jul 2011 23:59:31 +0200 Subject: [Numpy-discussion] Adding a linear system type to NumPy? In-Reply-To: <4E23228E.8090502@molden.no> References: <4E23228E.8090502@molden.no> Message-ID: <1e52b6c0-1e21-484f-9e5c-76a6d657c960@email.android.com> Something related: This autumn I expect to invest a significant amount of time (more than four weeks full-time) in a package for lazily evaluated, polymorphic linear algebra. 
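To give a flavour, the sort of thing I want to make possible is sketched below (a toy example only -- all class names are purely illustrative, nothing of this exists yet):

import numpy as np

class DiagonalMatrix(object):
    # one storage format among many (dense, sparse, block-diagonal, ...)
    def __init__(self, d):
        self.d = np.asarray(d)

    @property
    def I(self):
        return LazyInverse(self)        # nothing is computed here

    def solve(self, u):
        return np.asarray(u) / self.d   # the "LU" of a diagonal matrix is trivial

class LazyInverse(object):
    def __init__(self, m):
        self.m = m

    def __mul__(self, u):
        # A.I * u dispatches to whatever solver suits A's storage format
        return self.m.solve(u)

so that code written against the matrix interface never needs to know how A is actually stored; a dense matrix would do an LU in its solve(), a sparse one a sparse solve, and so on.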
Matrix = linear operator, a type seperate from arrays -- arrays are treated as vectors/stacked vectors Matrices can be of a variety of storage formats (diagonal, dense, the sparse formats, block-diagonal, a fortran routine that you promise acts linearly on a vector linear, and so on). The point is allowing to write linear algebra code without caring about the storage formats of the inputs. Use * for matmul, and A.I for lazy inversion i.e. (A.I * u) does a LU. In summary, a) I add my vote to this being outside the scope of numpy, b) I hope to do something about this outside of numpy. (I'll only do what is actually relevant to my research of course... But I think that will be enough for an interesting prototype of a full-fledged system for object oriented/polymorphic linear algebra) -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. Sturla Molden wrote: The problem I am thinking we might try to might fix is that programmers with less numerical competence is unaware that the matrix expression (X**-1) * Y should be written as np.linalg.solve(X,Y) I've seen numerous times matrix expressions being typed exactly as written in linear algebra text books. Matlab does this with the backslash operator, though it is not that intuitive. Also, there seems to be general agreement against a backslash division operator in Python. So I suggest inverting a NumPy matrix could result in an unsolved linear system type, instead of actually computing the matrix inverse and returning a new matrix. The linear system type would store two arrays, A and X, symbolically representing A * (X**-1). Initially, A would be set to the indenty matrix, I. A matrix expression Y * (X**-1) would result in (1) creation of a LinearSystem object for the iversion of X, and (2) matrix multiplication of Y by A, returning a new LinearSystem object with A updated by A = Y * A. The matrix expression (X**-1) * Y Would result in (1) creation of a LinearSystem object for the iversion of X, and (2) solution of the linear system by calling np.linalg.solve, i.e. np np.linalg.solve(X,Y) The matrix expression would Z * (X**-1) * Y would form a linear system type for X, initialize A to Z, and then evaluate np.dot(A, np np.linalg.solve(X,Y)) cf. Python's evaluation order is left to right. Any other operation on a linear system (e.g. slicing) would result in formation of the inverse, by solving it against the identity matrix, set a flag that the system is solved, and then just behave as an ordinary np.matrix. Thus, (X**-1) * Y would behave as before, but do the "correct" math (instead of explicitely forming the inverse and then multiplying). Consequently this would be the same as Matlab's backslash operator, only more intuitive, as the syntax would be the same as textbook linear algebra notation. A for naming, it could e.g. be np.linear_system. I am just thinking out loudly, so forgive me for spamming the list :-) Sturla_____________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Sun Jul 17 18:07:51 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 18 Jul 2011 00:07:51 +0200 Subject: [Numpy-discussion] Adding a linear system type to NumPy? 
In-Reply-To: <1e52b6c0-1e21-484f-9e5c-76a6d657c960@email.android.com> References: <4E23228E.8090502@molden.no> <1e52b6c0-1e21-484f-9e5c-76a6d657c960@email.android.com> Message-ID: <9bdf4dae-bad5-4419-bfea-62dfb7d0d15a@email.android.com> More concrete feedback about Sturla's proposal: The problem I have is if you do A = B**-1 Then, A is some 'magic' object, not a NumPy array. That means that it is very different from Matlab's \, which restricts the context, you simply can't do A = B \ I think A.solve(u) is a lot clearer in the case of numpy arrays. Dag -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. Dag Sverre Seljebotn wrote: Something related: This autumn I expect to invest a significant amount of time (more than four weeks full-time) in a package for lazily evaluated, polymorphic linear algebra. Matrix = linear operator, a type seperate from arrays -- arrays are treated as vectors/stacked vectors Matrices can be of a variety of storage formats (diagonal, dense, the sparse formats, block-diagonal, a fortran routine that you promise acts linearly on a vector linear, and so on). The point is allowing to write linear algebra code without caring about the storage formats of the inputs. Use * for matmul, and A.I for lazy inversion i.e. (A.I * u) does a LU. In summary, a) I add my vote to this being outside the scope of numpy, b) I hope to do something about this outside of numpy. (I'll only do what is actually relevant to my research of course... But I think that will be enough for an interesting prototype of a full-fledged system for object oriented/polymorphic linear algebra) -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. Sturla Molden wrote: The problem I am thinking we might try to might fix is that programmers with less numerical competence is unaware that the matrix expression (X**-1) * Y should be written as np.linalg.solve(X,Y) I've seen numerous times matrix expressions being typed exactly as written in linear algebra text books. Matlab does this with the backslash operator, though it is not that intuitive. Also, there seems to be general agreement against a backslash division operator in Python. So I suggest inverting a NumPy matrix could result in an unsolved linear system type, instead of actually computing the matrix inverse and returning a new matrix. The linear system type would store two arrays, A and X, symbolically representing A * (X**-1). Initially, A would be set to the indenty matrix, I. A matrix expression Y * (X**-1) would result in (1) creation of a LinearSystem object for the iversion of X, and (2) matrix multiplication of Y by A, returning a new LinearSystem object with A updated by A = Y * A. The matrix expression (X**-1) * Y Would result in (1) creation of a LinearSystem object for the iversion of X, and (2) solution of the linear system by calling np.linalg.solve, i.e. np np.linalg.solve(X,Y) The matrix expression would Z * (X**-1) * Y would form a linear system type for X, initialize A to Z, and then evaluate np.dot(A, np np.linalg.solve(X,Y)) cf. Python's evaluation order is left to right. Any other operation on a linear system (e.g. slicing) would result in formation of the inverse, by solving it against the identity matrix, set a flag that the system is solved, and then just behave as an ordinary np.matrix. Thus, (X**-1) * Y would behave as before, but do the "correct" math (instead of explicitely forming the inverse and then multiplying). 
Consequently this would be the same as Matlab's backslash operator, only more intuitive, as the syntax would be the same as textbook linear algebra notation. A for naming, it could e.g. be np.linear_system. I am just thinking out loudly, so forgive me for spamming the list :-) Sturla_____________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From tsyu80 at gmail.com Sun Jul 17 19:42:23 2011 From: tsyu80 at gmail.com (Tony Yu) Date: Sun, 17 Jul 2011 19:42:23 -0400 Subject: [Numpy-discussion] Numpydoc warnings for methods In-Reply-To: References: Message-ID: On Sun, Jul 17, 2011 at 3:35 PM, Ralf Gommers wrote: > > On Sun, Jul 17, 2011 at 7:15 PM, Tony Yu wrote: > >> >> Am I doing something wrong here? >> >> You're not, it's a Sphinx bug that Pauli already has a fix for. See > http://projects.scipy.org/numpy/ticket/1772 > > Ralf > I thought I searched pretty thoroughly, but apparently my google skills are lacking. Thanks for the link! -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Mon Jul 18 06:39:34 2011 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 18 Jul 2011 10:39:34 +0000 (UTC) Subject: [Numpy-discussion] PyBUF_SIMPLE requests References: <20110716104236.GA15434@sleipnir.bytereef.org> Message-ID: Sat, 16 Jul 2011 12:42:36 +0200, Stefan Krah wrote: > x = ndarray(buffer=bytearray([1,2,3,4,5,6,7,8,9,10]), > shape=[10], strides=[-1], dtype="B", offset=9) [clip] > I do not understand the PyBUF_SIMPLE result. According to the C-API docs > a consumer would be allowed to access buf[9], which would be invalid. It's a bug in the Numpy implementation. The correct behavior here seems to be that exporting the buffer must fail: there is no way to represent this buffer with the requirements of PyBUF_SIMPLE. Pauli From schmidbe at in.tum.de Mon Jul 18 08:54:18 2011 From: schmidbe at in.tum.de (Markus Schmidberger) Date: Mon, 18 Jul 2011 14:54:18 +0200 Subject: [Numpy-discussion] Your NumPy application on a Computer Cluster in the Cloud - cloudnumbers.com Message-ID: <1310993658.2551.64.camel@schmidb-TravelMate8572TG> Dear NumPy users and experts, cloudnumbers.com provides researchers and companies with the access to resources to perform high performance calculations in the cloud. As cloudnumbers.com's community manager I may invite you to register and test your Python application on a computer cluster in the cloud for free: http://my.cloudnumbers.com/register We are looking forward to get your feedback and consumer insights. Take the chance and have an impact to the development of a new cloud computing calculation platform. Our aim is to change the way of research collaboration is done today by bringing together scientists and businesses from all over the world on a single platform. cloudnumbers.com is a Berlin (Germany) based international high-tech startup striving for enabling everyone to benefit from the High Performance Computing related advantages of the cloud. We provide easy access to applications running on any kind of computer hardware from single core high memory machines up to 1000 cores computer clusters. 
To get more information check out our web-page (http://www.cloudnumbers.com/) or follow our blog about cloud computing, HPC and HPC applications: http://cloudnumbers.com/blog Key features of our platform for efficient computing in the cloud are: * Turn fixed into variable costs and pay only for the capacity you need. Watch our latest saving costs with cloudnumbers.com video: http://www.youtube.com/watch?v=ln_BSVigUhg&feature=player_embedded * Enter the cloud using an intuitive and user friendly platform. Watch our latest cloudnumbers.com in a nutshell video: http://www.youtube.com/watch?v=0ZNEpR_ElV0&feature=player_embedded * Be released from ongoing technological obsolescence and continuous maintenance costs (e.g. linking to libraries or system dependencies) * Accelerated your Python, C, C++, Fortran, R, ... calculations through parallel processing and great computing capacity - more than 1000 cores are available and GPUs are coming soon. * Share your results worldwide (coming soon). * Get high speed access to public databases (please let us know, if your favorite database is missing!). * We have developed a security architecture that meets high requirements of data security and privacy. Read our security white paper: http://d1372nki7bx5yg.cloudfront.net/wp-content/uploads/2011/06/cloudnumberscom-security.whitepaper.pdf Best Markus -- Dr. rer. nat. Markus Schmidberger Senior Community Manager Cloudnumbers.com GmbH Chausseestra?e 6 10119 Berlin www.cloudnumbers.com E-Mail: markus.schmidberger at cloudnumbers.com ************************* Amtsgericht M?nchen, HRB 191138 Gesch?ftsf?hrer: Erik Muttersbach, Markus Fensterer, Moritz v. Petersdorff-Campen From samquinan at gmail.com Mon Jul 18 13:10:36 2011 From: samquinan at gmail.com (Sam Quinan) Date: Mon, 18 Jul 2011 12:10:36 -0500 Subject: [Numpy-discussion] fate of array interface In-Reply-To: Message-ID: As I got no response to my original email, I figured I'd ask one more time just in case somebody who could answer my question(s) missed it the first time... Is array_interface actually being deprecated in favor of PEP 3118? If so, when? Are there eventual plans for python-side buffer specification of PEP 3118 protocol, in order for users to be able to use PEP 3118 with ctypes-extended libraries? Besides this list is there anywhere else I can check for this information? The numpy 1.4 documentation seems to suggest that it will be deprecated http://docs.scipy.org/doc/numpy-1.4.x/reference/arrays.interface.html while the 1.6 and current development versions of documentation don't mention deprecation, but simply inform users that PEP 3118 exists as an alternative http://docs.scipy.org/doc/numpy/reference/arrays.interface.html Thus my confusion... - Sam On 7/13/11 6:15 PM, "numpy-discussion-request at scipy.org" wrote: > Message: 3 > Date: Wed, 13 Jul 2011 17:47:59 -0500 > From: Sam Quinan > Subject: [Numpy-discussion] Fate of Numpy's Array Interface > To: > Message-ID: > Content-Type: text/plain; charset="US-ASCII" > > Hey, > > So I'm working on interfacing numpy ndarrays with an n-dimensional array > representation that exists as part of a massive custom C library. Due to the > size of the library, hand-coding a c-extension for the library just was not > really an option; so we wound up using gcc_xml to generate the proper ctypes > code. This works great for accessing our C functions within python, but not > so much for trying share memory between numpy and our custom array > representations... 
Passing a pointer to the numpy array data to ctypes is > fairly simple, but figuring out the proper way to get memory from ctypes > into numpy has been problematic. > > I know that PEP 3118 is supposed to be superseding the numpy array > interface, but PEP 3118 can only be specified on the C side, which is > problematic for anybody using ctypes to wrap their C code. The legacy > __array_interface__ allows for a python side specification of data buffers, > but there appears to be no corresponding interface capability in the PEP > 3118 protocol. On top of that add the fact that Python's own support for PEP > 3118 has some major bugs (ctypes throwing invalid PEP 3118 codes - > http://bugs.python.org/issue10746 :: issues with python's memoryview object > - http://bugs.python.org/issue10181), and PEP 3118 seems like a nightmare to > deal with. At the same time though, I don't want to simply use the legacy > array interface if it's going to be completely deprecated in the near > future. > > How long before the legacy __array_interface__ goes the way of the dodo? > When that happens, are there plans to add support for a python side > interface to the PEP 3118 protocol? If not, what is the proper way to > interface a ctypes wrapped library with PEP 3118? > > Thanks, > > - Sam Quinan From robert.kern at gmail.com Mon Jul 18 18:29:40 2011 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 18 Jul 2011 17:29:40 -0500 Subject: [Numpy-discussion] fate of array interface In-Reply-To: References: Message-ID: On Mon, Jul 18, 2011 at 12:10, Sam Quinan wrote: > As I got no response to my original email, I figured I'd ask one more time > just in case somebody who could answer my question(s) missed it the first > time... > > Is array_interface actually being deprecated in favor of PEP 3118? No. > The numpy 1.4 documentation seems to suggest that it will be deprecated > > ? ?http://docs.scipy.org/doc/numpy-1.4.x/reference/arrays.interface.html > > while the 1.6 and current development versions of documentation don't > mention deprecation, but simply inform users that PEP 3118 exists as an > alternative > > ? ?http://docs.scipy.org/doc/numpy/reference/arrays.interface.html > > Thus my confusion... It was an error in the 1.4 documentation that was fixed. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From carlosbecker at gmail.com Tue Jul 19 05:05:18 2011 From: carlosbecker at gmail.com (Carlos Becker) Date: Tue, 19 Jul 2011 11:05:18 +0200 Subject: [Numpy-discussion] Array vectorization in numpy Message-ID: Hi, I started with numpy a few days ago. I was timing some array operations and found that numpy takes 3 or 4 times longer than Matlab on a simple array-minus-scalar operation. This looks as if there is a lack of vectorization, even though this is just a guess. I hope this is not reposting. I tried searching the mailing list database but did not find anything related specifically to a problem like this one. 
Here there is the python test code: -------------------------------------------- from datetime import datetime import numpy as np def test(): m = np.ones([2000,2000],float) N = 100 t1 = datetime.now() for x in range(N): k = m - 0.5 t2 = datetime.now() print (t2 - t1).total_seconds() / N -------------------------------------------- And matlab: -------------------------------------------- m = rand(2000,2000); N = 100; tic; for I=1:N k = m - 0.5; end toc / N -------------------------------------------- I have the impression that the speed boost with Matlab is not related to matlab optimizations, since singe-runs also render similar timings. I tried compiling ATLAS for SSE2 and didn't observe any difference. Any clues? Thanks, Carlos -------------- next part -------------- An HTML attachment was scrubbed... URL: From ater1980 at gmail.com Tue Jul 19 06:04:38 2011 From: ater1980 at gmail.com (Alex Ter-Sarkissov) Date: Tue, 19 Jul 2011 22:04:38 +1200 Subject: [Numpy-discussion] import Message-ID: this is probably silly question, I've seen in this in one of the tutorials: from tkinter import * import tkinter.messagebox given that * implies importing the whole module, why would anyone bother with importing a specific command on top of it? -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Tue Jul 19 07:10:47 2011 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 19 Jul 2011 11:10:47 +0000 (UTC) Subject: [Numpy-discussion] Array vectorization in numpy References: Message-ID: Tue, 19 Jul 2011 11:05:18 +0200, Carlos Becker wrote: > Hi, I started with numpy a few days ago. I was timing some array > operations and found that numpy takes 3 or 4 times longer than Matlab on > a simple array-minus-scalar operation. > This looks as if there is a lack of vectorization, even though this is > just a guess. I hope this is not reposting. I tried searching the > mailing list database but did not find anything related specifically to > a problem like this one. I see essentially no performance difference: Matlab [7.10.0.499 (R2010a)]: 0.0321 Numpy [1.6.0]: 0.03117567 If later versions of Matlab can parallelize the computation across multiple processors, that could be one possibility for the difference you see. Alternatively, you may have compiled Numpy with optimizations turned off. From carlosbecker at gmail.com Tue Jul 19 07:40:20 2011 From: carlosbecker at gmail.com (Carlos Becker) Date: Tue, 19 Jul 2011 13:40:20 +0200 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: Message-ID: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> Hi Pauli, thanks for the quick answer. Is there a way to check the optimization flags of numpy after installation? I am away of a matlab installation now, but I remember I saw a single processor active with matlab. I will check it again soon Thanks! El 19/07/2011, a las 13:10, Pauli Virtanen escribi?: > Tue, 19 Jul 2011 11:05:18 +0200, Carlos Becker wrote: >> Hi, I started with numpy a few days ago. I was timing some array >> operations and found that numpy takes 3 or 4 times longer than >> Matlab on >> a simple array-minus-scalar operation. >> This looks as if there is a lack of vectorization, even though this >> is >> just a guess. I hope this is not reposting. I tried searching the >> mailing list database but did not find anything related >> specifically to >> a problem like this one. 
> > I see essentially no performance difference: > > Matlab [7.10.0.499 (R2010a)]: 0.0321 > Numpy [1.6.0]: 0.03117567 > > If later versions of Matlab can parallelize the computation across > multiple processors, that could be one possibility for the difference > you see. Alternatively, you may have compiled Numpy with optimizations > turned off. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From g.plantageneto at gmail.com Tue Jul 19 08:38:52 2011 From: g.plantageneto at gmail.com (Andrea Cimatoribus) Date: Tue, 19 Jul 2011 12:38:52 +0000 Subject: [Numpy-discussion] Alternative to boolean array Message-ID: Dear all, I would like to avoid the use of a boolean array (mask) in the following statement: mask = (A != 0.) B = A[mask] in order to be able to move this bit of code in a cython script (boolean arrays are not yet implemented there, and they slow down execution a lot as they can't be defined explicitely). Any idea of an efficient alternative? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlosbecker at gmail.com Tue Jul 19 11:49:14 2011 From: carlosbecker at gmail.com (Carlos Becker) Date: Tue, 19 Jul 2011 17:49:14 +0200 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> Message-ID: I made more tests with the same operation, restricting Matlab to use a single processing unit. I got: - Matlab: 0.0063 sec avg - Numpy: 0.026 sec avg - Numpy with weave.blitz: 0.0041 Note that weave.blitz is even faster than Matlab (slightly). I tried on an older computer, and I got similar results between matlab and numpy without weave.blitz, so maybe it has to do with 'new' vectorization opcodes. Anyhow, even though these results are not very promising, it gets worse if I try to do something like: result = (m - 0.5)*0.3 and I get the following timings: - Matlab: 0.0089 - Numpy: 0.051 - Numpy with blitz: 0.0043 Now blitz is considerably faster! Anyways, I am concerned about numpy being much slower, in this case taking 2x the time of the previous operation. I guess this is because of the way that python operands/arguments are passed. Should I always use weave.blitz? Carlos On Tue, Jul 19, 2011 at 1:40 PM, Carlos Becker wrote: > Hi Pauli, thanks for the quick answer. > Is there a way to check the optimization flags of numpy after installation? > > I am away of a matlab installation now, but I remember I saw a single > processor active with matlab. I will check it again soon > > Thanks! > > > > El 19/07/2011, a las 13:10, Pauli Virtanen escribi?: > > > Tue, 19 Jul 2011 11:05:18 +0200, Carlos Becker wrote: >> >>> Hi, I started with numpy a few days ago. I was timing some array >>> operations and found that numpy takes 3 or 4 times longer than Matlab on >>> a simple array-minus-scalar operation. >>> This looks as if there is a lack of vectorization, even though this is >>> just a guess. I hope this is not reposting. I tried searching the >>> mailing list database but did not find anything related specifically to >>> a problem like this one. 
>>> >> >> I see essentially no performance difference: >> >> Matlab [7.10.0.499 (R2010a)]: 0.0321 >> Numpy [1.6.0]: 0.03117567 >> >> If later versions of Matlab can parallelize the computation across >> multiple processors, that could be one possibility for the difference >> you see. Alternatively, you may have compiled Numpy with optimizations >> turned off. >> >> ______________________________**_________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/**listinfo/numpy-discussion >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chaoyuejoy at gmail.com Tue Jul 19 11:52:11 2011 From: chaoyuejoy at gmail.com (Chao YUE) Date: Tue, 19 Jul 2011 17:52:11 +0200 Subject: [Numpy-discussion] import In-Reply-To: References: Message-ID: Yes, you're right. The problem is, when you use the first one, you may cause a 'name pollution' to the current namespace. read this: http://bytebaker.com/2008/07/30/python-namespaces/ cheers, Chao 2011/7/19 Alex Ter-Sarkissov > this is probably silly question, I've seen in this in one of the > tutorials: > > from tkinter import * > import tkinter.messagebox > > given that * implies importing the whole module, why would anyone bother > with importing a specific command on top of it? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- *********************************************************************************** Chao YUE Laboratoire des Sciences du Climat et de l'Environnement (LSCE-IPSL) UMR 1572 CEA-CNRS-UVSQ Batiment 712 - Pe 119 91191 GIF Sur YVETTE Cedex Tel: (33) 01 69 08 77 30; Fax:01.69.08.77.16 ************************************************************************************ -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Jul 19 12:08:44 2011 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 19 Jul 2011 11:08:44 -0500 Subject: [Numpy-discussion] Alternative to boolean array In-Reply-To: References: Message-ID: On Tue, Jul 19, 2011 at 07:38, Andrea Cimatoribus wrote: > Dear all, > I would like to avoid the use of a boolean array (mask) in the following > statement: > > mask = (A != 0.) > B?????? = A[mask] > > in order to be able to move this bit of code in a cython script (boolean > arrays are not yet implemented there, and they slow down execution a lot as > they can't be defined explicitely). > Any idea of an efficient alternative? You will have to count the number of True values, create the B array with the right size, then run a simple loop to assign into it where A != 0. This makes you do the comparisons twice. Or you can allocate a B array the same size as A, run your loop to assign into it when A != 0 and incrementing the index into B, then slice out or memcpy out the portion that you assigned. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From charlesr.harris at gmail.com Tue Jul 19 12:19:07 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 19 Jul 2011 10:19:07 -0600 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> Message-ID: On Tue, Jul 19, 2011 at 9:49 AM, Carlos Becker wrote: > I made more tests with the same operation, restricting Matlab to use a > single processing unit. I got: > > - Matlab: 0.0063 sec avg > - Numpy: 0.026 sec avg > - Numpy with weave.blitz: 0.0041 > > Note that weave.blitz is even faster than Matlab (slightly). > I tried on an older computer, and I got similar results between matlab and > numpy without weave.blitz, so maybe it has to do with 'new' vectorization > opcodes. > > Anyhow, even though these results are not very promising, it gets worse if > I try to do something like: > > result = (m - 0.5)*0.3 > > and I get the following timings: > > - Matlab: 0.0089 > - Numpy: 0.051 > - Numpy with blitz: 0.0043 > > Now blitz is considerably faster! Anyways, I am concerned about numpy being > much slower, in this case taking 2x the time of the previous operation. > I guess this is because of the way that python operands/arguments are > passed. Should I always use weave.blitz? > > Out of curiosity, what os/architecture are you running on? What version of numpy are you using? By and large, you shouldn't spend time programming in blitz, it will ruin the whole point of using numpy in the first place. If there is an inefficiency somewhere it is better to fix the core problem, whatever it is. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjordan1 at uw.edu Tue Jul 19 12:44:54 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Tue, 19 Jul 2011 11:44:54 -0500 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> Message-ID: On Tue, Jul 19, 2011 at 11:19 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Tue, Jul 19, 2011 at 9:49 AM, Carlos Becker wrote: > >> I made more tests with the same operation, restricting Matlab to use a >> single processing unit. I got: >> >> - Matlab: 0.0063 sec avg >> - Numpy: 0.026 sec avg >> - Numpy with weave.blitz: 0.0041 >> >> Note that weave.blitz is even faster than Matlab (slightly). >> I tried on an older computer, and I got similar results between matlab and >> numpy without weave.blitz, so maybe it has to do with 'new' vectorization >> opcodes. >> >> Anyhow, even though these results are not very promising, it gets worse if >> I try to do something like: >> >> result = (m - 0.5)*0.3 >> >> and I get the following timings: >> >> - Matlab: 0.0089 >> - Numpy: 0.051 >> - Numpy with blitz: 0.0043 >> >> Now blitz is considerably faster! Anyways, I am concerned about numpy >> being much slower, in this case taking 2x the time of the previous >> operation. >> I guess this is because of the way that python operands/arguments are >> passed. Should I always use weave.blitz? >> >> > Out of curiosity, what os/architecture are you running on? What version of > numpy are you using? > > By and large, you shouldn't spend time programming in blitz, it will ruin > the whole point of using numpy in the first place. If there is an > inefficiency somewhere it is better to fix the core problem, whatever it is. > > > > Chuck > Also what version of matlab were you using? 
-Chris JS > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlosbecker at gmail.com Tue Jul 19 15:27:35 2011 From: carlosbecker at gmail.com (Carlos Becker) Date: Tue, 19 Jul 2011 21:27:35 +0200 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> Message-ID: Hi, everything was run on linux. I am using numpy 2.0.0.dev-64fce7c, but I tried an older version (cannot remember which one now) and saw similar results. Matlab is R2011a, and I used taskset to assign its process to a single core. Linux is 32-bit, on Intel Core i7-2630QM. Besides the matlab/numpy comparison, I think that there is an inherent problem with how expressions are handled, in terms of efficiency. For instance, k = (m - 0.5)*0.3 takes 52msec average here (2000x2000 array), while k = (m - 0.5)*0.3*0.2 takes 0.079, and k = (m - 0.5)*0.3*0.2*0.1 takes 101msec. Placing parentheses around the scalar multipliers shows that it seems to have to do with how expressions are handled, is there sometihng that can be done about this so that numpy can deal with expressions rather than single operations chained by python itself? Maybe I am missing the point as well. ---------------------- Carlos Becker On Tue, Jul 19, 2011 at 6:44 PM, Christopher Jordan-Squire wrote: > > > On Tue, Jul 19, 2011 at 11:19 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Tue, Jul 19, 2011 at 9:49 AM, Carlos Becker wrote: >> >>> I made more tests with the same operation, restricting Matlab to use a >>> single processing unit. I got: >>> >>> - Matlab: 0.0063 sec avg >>> - Numpy: 0.026 sec avg >>> - Numpy with weave.blitz: 0.0041 >>> >>> Note that weave.blitz is even faster than Matlab (slightly). >>> I tried on an older computer, and I got similar results between matlab >>> and numpy without weave.blitz, so maybe it has to do with 'new' >>> vectorization opcodes. >>> >>> Anyhow, even though these results are not very promising, it gets worse >>> if I try to do something like: >>> >>> result = (m - 0.5)*0.3 >>> >>> and I get the following timings: >>> >>> - Matlab: 0.0089 >>> - Numpy: 0.051 >>> - Numpy with blitz: 0.0043 >>> >>> Now blitz is considerably faster! Anyways, I am concerned about numpy >>> being much slower, in this case taking 2x the time of the previous >>> operation. >>> I guess this is because of the way that python operands/arguments are >>> passed. Should I always use weave.blitz? >>> >>> >> Out of curiosity, what os/architecture are you running on? What version of >> numpy are you using? >> >> By and large, you shouldn't spend time programming in blitz, it will ruin >> the whole point of using numpy in the first place. If there is an >> inefficiency somewhere it is better to fix the core problem, whatever it is. >> >> >> >> Chuck >> > > Also what version of matlab were you using? > > -Chris JS > > >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chad.netzer at gmail.com Tue Jul 19 15:29:43 2011 From: chad.netzer at gmail.com (Chad Netzer) Date: Tue, 19 Jul 2011 14:29:43 -0500 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: Message-ID: On Tue, Jul 19, 2011 at 4:05 AM, Carlos Becker wrote: > Hi, I started with numpy a few days ago. I was timing some array operations > and found that numpy takes 3 or 4 times longer than Matlab on a simple > array-minus-scalar operation. Doing these kinds of timings correctly is a tricky issue, and the method you used is at fault. It is testing more than just the vectorized array-minus-scalar operation, it is also timing a range() call and list creation for the loop, as well as vector result object creation and deletion time, both of which add constant overhead to the result (which is itself rather small and susceptible to overhead bias). Whereas the matlab loop range equivalent is part of the syntax itself, and can therefore be optimized better. And depending on the type of garbage collection Matlab uses, it may defer the destruction of the temporaries until after the timing is done (ie. when it exits, whereas Python has to destruct the object on each loop they way you've written it.) First of all, use the 'timeit' module for timing: %python >>> import timeit >>> t=timeit.Timer('k = m - 0.5', setup='import numpy as np;m = np.ones([2000,2000],float)') >>> np.mean(t.repeat(repeat=100, number=1)) 0.022081942558288575 That will at least give you a more accurate timing of just the summing expression itself, and not the loop overhead. Furthermore, you can also reuse the m array for the sum, rather than allocating a new one, which will give you a better idea of just the vectorized subtration time: >>> t=timeit.Timer('m -= 0.5', setup='import numpy as np;m = np.ones([2000,2000],float)') >>> np.mean(t.repeat(repeat=100, number=1)) 0.015955450534820555 Note that the value has dropped considerably. In the end, what you are attempting to time is fairly simple, so any extra overhead you add that is not actually the vectorized sum, will bias your results. You have to be extremely careful with these timing comparisons, since you may be comparing apples to oranges. At the least, try to give the vectorizing code much more work to do, for example you are summing only over about 32 Megs. Try about half a gig, and compare that with Matlab, in order to reduce the percentage of overhead to summing in your timings: >>> t=timeit.Timer('m -= 0.5', setup='import numpy as np;m = np.ones([8092,8092],float)') >>> np.mean(t.repeat(repeat=100, number=1)) 0.26796033143997194 Try comparing these examples to your existing Matlab timings, and you should find Python w/ numpy comparing favorably (or even beating Matlab). Of course, then you could improve your Matlab timings; in the end they should be almost the same when done properly. If not, by all means let us know. -Chad From chad.netzer at gmail.com Tue Jul 19 15:51:51 2011 From: chad.netzer at gmail.com (Chad Netzer) Date: Tue, 19 Jul 2011 14:51:51 -0500 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> Message-ID: On Tue, Jul 19, 2011 at 2:27 PM, Carlos Becker wrote: > Hi, everything was run on linux. 
> Placing parentheses around the scalar multipliers shows that it seems to > have to do with how expressions are handled, is there sometihng that can be > done about this so that numpy can deal with expressions rather than single > operations chained by python itself? Numpy is constrained (when using scalars) to Python's normal expression ordering rules, which tend to evaluate left to right. So, once an expression gets promoted to an array, adding more scalars will do array math rather than being able to collapse all the scalar math. To extend my example from before: >>> t=timeit.Timer('k = m - 0.5 + 0.4 - 0.3 + 0.2 - 0.1', setup='import numpy as np;m = np.ones([8092,8092],float)') >>> np.mean(t.repeat(repeat=10, number=1)) 2.9083001852035522 >>> t=timeit.Timer('k = 0.5 + 0.4 - 0.3 + 0.2 - 0.1 + m', setup='import numpy as np;m = np.ones([8092,8092],float)') >>> np.mean(t.repeat(repeat=10, number=1)) 0.52074816226959231 In the second case, the first 4 sums are done in scalar math, effectively collapsing the work down to '0.7 + m', however in the first case the whole expression is upconverted to an array computation right from the start, making the total amount of work much greater. If python had a way of exposing it's expression tree to objects *during execution* and allowed for delayed expression evaluation, such quirks might be avoidable. But it's a complex issue, and not always a problem in practice. In general, as with all computation numerics, you have to be aware of the underlying evaluation order and associativity to fully understand your results, and if you understand that, you can optimize you yourself. So, to show the difference with your example: >>> t=timeit.Timer('k = (m - 0.5)*0.3*0.2', setup='import numpy as np;m = np.ones([8092,8092],float)') >>> np.mean(t.repeat(repeat=10, number=1)) 1.6823677778244019 >>> t=timeit.Timer('k = 0.2*0.3*(m - 0.5)', setup='import numpy as np;m = np.ones([8092,8092],float)') >>> np.mean(t.repeat(repeat=10, number=1)) 1.1084311008453369 -C From ralf.gommers at googlemail.com Tue Jul 19 15:55:28 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 19 Jul 2011 21:55:28 +0200 Subject: [Numpy-discussion] X11 system info In-Reply-To: References: Message-ID: On Sun, Jul 17, 2011 at 11:48 PM, Darren Dale wrote: > In numpy.distutils.system info: > > default_x11_lib_dirs = libpaths(['/usr/X11R6/lib','/usr/X11/lib', > '/usr/lib'], platform_bits) > default_x11_include_dirs = ['/usr/X11R6/include','/usr/X11/include', > '/usr/include'] > > These defaults won't work on the forthcoming Ubuntu 11.10, which > installs X into /usr/lib/X11 and /usr/include/X11. > > Do you have a link to where this is described? And what about the 64-bit lib path, will that be /usr/lib64/X11? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Tue Jul 19 16:10:16 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 19 Jul 2011 22:10:16 +0200 Subject: [Numpy-discussion] Build error on Windows In-Reply-To: <4E235A59.8020002@noaa.gov> References: <4E1F9E51.9080702@noaa.gov> <4E1FAE47.2020207@uci.edu> <4E235A59.8020002@noaa.gov> Message-ID: On Sun, Jul 17, 2011 at 11:55 PM, Chris Barker wrote: > On 7/14/2011 8:04 PM, Christoph Gohlke wrote: > >> A patch for the build issues is attached. Remove the build directory >> before rebuilding. >> >> Christoph, > > I had other issues (I think in one case, a *.c file was not getting > re-built from the *.c.src file. 
But anyway, at the end the patch appears to > work. > > Could someone with commit privileges commit it? > Can someone explain the change in core/setup.py below? 'cmpl' is apparently for Compaq Portable Math Library, but I can't figure out what the 'm' is for. Ralf --- a/numpy/core/setup.py +++ b/numpy/core/setup.py @@ -349,7 +349,7 @@ def check_types(config_cmd, ext, build_dir): def check_mathlib(config_cmd): # Testing the C math library mathlibs = [] - mathlibs_choices = [[],['m'],['cpml']] + mathlibs_choices = [[],['cpml']] mathlib = os.environ.get('MATHLIB') * * -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Tue Jul 19 16:11:13 2011 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 19 Jul 2011 20:11:13 +0000 (UTC) Subject: [Numpy-discussion] Array vectorization in numpy References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> Message-ID: On Tue, 19 Jul 2011 17:49:14 +0200, Carlos Becker wrote: > I made more tests with the same operation, restricting Matlab to use a > single processing unit. I got: > > - Matlab: 0.0063 sec avg > - Numpy: 0.026 sec avg > - Numpy with weave.blitz: 0.0041 To check if it's an issue with building without optimizations, look at the build log: C compiler: gcc -pthread -fno-strict-aliasing "-ggdb" -fPIC ... gcc: build/src.linux-x86_64-2.7/numpy/core/src/umath/umathmodule.c I.e., look on the "C compiler:" line nearest to the "umathmodule" compilation. Above is an example with no optimization. *** For me, compared to zeroing the memory via memset & plain C implementation (Numpy 1.6.0 / gcc): Blitz: 0.00746664 Numpy: 0.00711051 Zeroing (memset): 0.00263333 Operation in C: 0.00706667 with "gcc -O3 -ffast-math -march=native -mfpmath=sse" optimizations for the C code (involving SSE2 vectorization and whatnot, looking at the assembler output). Numpy is already going essentially at the maximum speed. ----------------- #include #include #include #include int main() { double *a, *b; int N = 2000*2000, M=300; int j; int k; clock_t start, end; a = (double*)malloc(sizeof(double)*N); b = (double*)malloc(sizeof(double)*N); start = clock(); for (k = 0; k < M; ++k) { memset(a, '\0', sizeof(double)*N); } end = clock(); printf("Zeroing (memset): %g\n", ((double)(end-start))/CLOCKS_PER_SEC/M); start = clock(); for (k = 0; k < M; ++k) { for (j = 0; j < N; ++j) { b[j] = a[j] - 0.5; } } end = clock(); printf("Operation in C: %g\n", ((double)(end-start))/CLOCKS_PER_SEC/M); return 0; } From nadavh at visionsense.com Tue Jul 19 16:11:12 2011 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 19 Jul 2011 13:11:12 -0700 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> , Message-ID: <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> For such expressions you should try numexpr package: It allows the same type of optimisation as Matlab does: run a single loop over the matrix elements instead of repetitive loops and intermediate objects creation. Nadav > Besides the matlab/numpy comparison, I think that there is an inherent problem with how expressions are handled, in terms of efficiency. > For instance, k = (m - 0.5)*0.3 takes 52msec average here (2000x2000 array), while k = (m - 0.5)*0.3*0.2 takes 0.079, and k = (m - 0.5)*0.3*0.2*0.1 > takes 101msec. 
> Placing parentheses around the scalar multipliers shows that it seems to have to do with how expressions are handled, is there something that can > be done about this so that numpy can deal with expressions rather than single operations chained by python itself? Maybe I am missing the point as well. ---------------------- Carlos Becker From Chris.Barker at noaa.gov Tue Jul 19 16:32:12 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 19 Jul 2011 13:32:12 -0700 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> Message-ID: <4E25E9CC.6060706@noaa.gov> Carlos Becker wrote: > Besides the matlab/numpy comparison, I think that there is an inherent > problem with how expressions are handled, in terms of efficiency. > For instance, k = (m - 0.5)*0.3 takes 52msec average here (2000x2000 > array), while k = (m - 0.5)*0.3*0.2 takes 0.079, and k = (m - > 0.5)*0.3*0.2*0.1 takes 101msec. > Placing parentheses around the scalar multipliers shows that it seems to > have to do with how expressions are handled, is there something that can > be done about this so that numpy can deal with expressions rather than > single operations chained by python itself? well, it is Python, and Python itself does not know anything about array math -- so you need to be careful to do that correctly yourself. Python aside, understanding how parentheses affect computation, even for algebraically equal expressions, is a good thing. Aside from issues with scalars, a Python expression like: a = a * b * c does: multiply a and b and put it in a temporary; multiply that temporary by c and put that in a temporary; assign the final temporary to a. So it does, in fact, create two "unnecessary" temporaries for this simple expression. If you are concerned about performance, there are ways to control that. "in-place" operators is one: a *= b a *= c will not create any temporaries, and will probably be faster. It still does two loops through the data, though. If your arrays are too big to fit in cache, that could affect performance. To get around that you need to get fancy. One option is numexpr: http://code.google.com/p/numexpr/ numexpr takes an entire expression as input, and can thus optimize some of this at the expression level, make good use of cache, etc. -- it's pretty cool. There are a few other options, including weave, of course. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From carlosbecker at gmail.com Tue Jul 19 16:35:32 2011 From: carlosbecker at gmail.com (Carlos Becker) Date: Tue, 19 Jul 2011 22:35:32 +0200 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: Thanks Chad for the explanation on those details. I am new to python and I still have a lot to learn, this was very useful. Now I get similar results between matlab and numpy when I re-use the memory allocated for m with 'm -= 0.5'. However, if I don't, I obtain this 4x penalty with numpy, even with the 8092x8092 array. Would it be possible to do k = m - 0.5 and pre-allocate k such that python does not have to waste time on that?
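Something like this minimal sketch is what I have in mind -- assuming I am right that the ufuncs can write into an output array that was allocated beforehand (the out argument below is just my guess at how that is spelled):

    import numpy as np

    m = np.ones([2000, 2000], float)
    k = np.empty_like(m)            # allocated once, before the timing loop

    for x in range(100):
        np.subtract(m, 0.5, out=k)  # ideally writes straight into k, no temporary
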
Another interesting case is something like k = m + n + p, which I guess should be run with numexpr in order to accelerate it. Regarding operator evaluation, I thought of the same thing, and it is what would happen in other languages as well if no expression 'templates' ( such as what the Eigen library uses ). I will look at numexpr, I hope I can replace all my matlab image processing and computational needs with python, and interface with C++ code that uses Eigen if I need an extra speed boost. Right now I am trying spyder as a debugger, which looks very nice. If a good debugger is available, it could totally replace matlab/octave for researchers/engineers/etc for some specific needs. I will try numexpr now to see the performance gain. Thanks to all that replied to this topic, it was very useful. On Tue, Jul 19, 2011 at 10:11 PM, Nadav Horesh wrote: > For such expressions you should try numexpr package: It allows the same > type of optimisation as Matlab does: run a single loop over the matrix > elements instead of repetitive loops and intermediate objects creation. > > Nadav > > > Besides the matlab/numpy comparison, I think that there is an inherent > problem with how expressions are handled, in terms of efficiency. > > For instance, k = (m - 0.5)*0.3 takes 52msec average here (2000x2000 > array), while k = (m - 0.5)*0.3*0.2 takes 0.079, and k = (m - > 0.5)*0.3*0.2*0.1 > takes 101msec. > > Placing parentheses around the scalar multipliers shows that it seems to > have to do with how expressions are handled, is there sometihng that can >be > done about this so that numpy can deal with expressions rather than single > operations chained by python itself? Maybe I am missing the point as well. > > ---------------------- > Carlos Becker > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Tue Jul 19 17:06:02 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 19 Jul 2011 16:06:02 -0500 Subject: [Numpy-discussion] pull request review: deprecating PyArrayObject* direct field access Message-ID: https://github.com/numpy/numpy/pull/116 This pull request deprecates direct access to PyArrayObject fields. This direct access has been discouraged for a while through comments in the header file and documentation, but up till now, there was no way to disable it. I've created such a mechanism, and C extensions can test that they don't use deprecated C API by #defining NPY_NO_DEPRECATED_API at the top of the C file. I've confirmed that scipy master builds against this branch, and its test failures look unrelated to these changes (iterative methods failures). Additional testing of different versions and platforms would be appreciated! This also includes a few other miscellaneous changes: - improve error message in PyArray_FromArray - some missingdata NEP changes - allow comma-separated dtype strings to include datetime metadata - http://projects.scipy.org/numpy/ticket/466 -Mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chad.netzer at gmail.com Tue Jul 19 18:15:47 2011 From: chad.netzer at gmail.com (Chad Netzer) Date: Tue, 19 Jul 2011 17:15:47 -0500 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: On Tue, Jul 19, 2011 at 3:35 PM, Carlos Becker wrote: > Thanks Chad for the explanation on those details. I am new to python and I > However, if I don't, I obtain this 4x penalty with numpy, even with the > 8092x8092 array. Would it be possible to do k = m - 0.5 and pre-alllocate k > such that python does not have to waste time on that? I suspect the 4x penalty is related to the expression evaluation overhead (temporaries and copying), so hopefully numexpr() will help, or just remembering to use the in-place operators whenever appropriate. To answer your question, though, you can allocate an array, without initializing it, with the empty() function. Note - if you aren't absolutely sure you are going to overwrite every single element of the array, this could leave you with uninitialized values in your array. I'd just go ahead and use the zeros() function instead, to be safe (it's initialized outside the timeit() timing loop): %python >>> import timeit >>> import numpy as np >>> t=timeit.Timer('k = m - 0.5', setup='import numpy as np;m = np.ones([8092,8092],float); k = np.zeros(m.size, m.dtype)') >>> np.mean(t.repeat(repeat=10, number=1)) 0.58557529449462886 >>> t=timeit.Timer('k = m - 0.5', setup='import numpy as np;m = np.ones([8092,8092],float)') >>> np.mean(t.repeat(repeat=10, number=1)) 0.53153839111328127 >>> t=timeit.Timer('m =- 0.5', setup='import numpy as np;m = np.ones([8092,8092],float)') >>> np.mean(t.repeat(repeat=10, number=1)) 0.038648796081542966 As you can see, preallocation doesn't seem to affect the results all that much, it's the overhead of creating a temporary, then copying it to the result, that seems to matter here. The in-place operation was much faster. Here we see that just copying m to k, takes up more time than the 'k = m + 0.5' operation: >>> t=timeit.Timer('k = m.copy()', setup='import numpy as np;m = np.ones([8092,8092],float)') >>> np.mean(t.repeat(repeat=10, number=1)) 0.63301105499267574 Possibly that is because 8K*8K matrices are a bit too big for this kind of benchmark; I recommend also trying it with 4K*4K, and your original 2K*2K to see if the results are consistent. Remember, the timeit() setup is hiding the initial allocation time of m from the results, but it still exists, and should be accounted for in determining the overall execution time of the in-place operation results. Also, with these large array sizes, make sure these tests are in a fresh python instance, so that the process address space isn't tainted with old object allocations (which may cause your OS to 'swap' the now unused memory, and ruin your timing values). 
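(As an aside, one convenient way to get a fresh interpreter for every measurement is timeit's command-line mode -- roughly like this, with the array size from above; treat these as a sketch rather than invocations I have actually benchmarked:)

    $ python -m timeit -s "import numpy as np; m = np.ones([8092,8092], float)" "k = m - 0.5"
    $ python -m timeit -s "import numpy as np; m = np.ones([8092,8092], float)" "m -= 0.5"

Each run starts from a clean process, so leftover allocations from earlier experiments can't skew the numbers.
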
-Chad From lutz.maibaum at gmail.com Tue Jul 19 18:55:02 2011 From: lutz.maibaum at gmail.com (Lutz Maibaum) Date: Tue, 19 Jul 2011 15:55:02 -0700 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: On Jul 19, 2011, at 3:15 PM, Chad Netzer wrote: > %python >>>> import timeit >>>> import numpy as np > >>>> t=timeit.Timer('k = m - 0.5', setup='import numpy as np;m = np.ones([8092,8092],float); k = np.zeros(m.size, m.dtype)') >>>> np.mean(t.repeat(repeat=10, number=1)) > 0.58557529449462886 > >>>> t=timeit.Timer('k = m - 0.5', setup='import numpy as np;m = np.ones([8092,8092],float)') >>>> np.mean(t.repeat(repeat=10, number=1)) > 0.53153839111328127 I am surprised that there is any difference between these two approaches at all. I would have thought that in both cases a temporary array holding the result of m-0.5 is created, which is then assigned to the variable k. However, it seems that the second approach is about 10% faster (I see a similar difference on my machine). Why would that be? On the other hand, if I actually use the preallocated space for k, and replace the assignment by k[:] = m - 0.5 then this takes about 40% more time, presumably because the content of the temporary array is copied into k (which must be initialized by m.shape instead of m.size in this case). Thanks, Lutz From pav at iki.fi Tue Jul 19 19:10:54 2011 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 19 Jul 2011 23:10:54 +0000 (UTC) Subject: [Numpy-discussion] Array vectorization in numpy References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: On Tue, 19 Jul 2011 17:15:47 -0500, Chad Netzer wrote: > On Tue, Jul 19, 2011 at 3:35 PM, Carlos Becker [clip] >> However, if I don't, I obtain this 4x penalty with numpy, even with the >> 8092x8092 array. Would it be possible to do k = m - 0.5 and >> pre-allocate k such that python does not have to waste time on that? > > I suspect the 4x penalty is related to the expression evaluation > overhead (temporaries and copying), so hopefully numexpr() will help, or > just remembering to use the in-place operators whenever appropriate. Doubtful: k = m - 0.5 does here the same thing as k = np.empty_like(m) np.subtract(m, 0.5, out=k) The memory allocation (empty_like and the subsequent deallocation) costs essentially nothing, and there are no temporaries or copying in `subtract`. *** There's something else going on -- on my machine, the Numpy operation runs exactly at the same speed as C, so this issue must have a platform-dependent explanation. -- Pauli Virtanen From chad.netzer at gmail.com Tue Jul 19 19:42:54 2011 From: chad.netzer at gmail.com (Chad Netzer) Date: Tue, 19 Jul 2011 18:42:54 -0500 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: On Tue, Jul 19, 2011 at 6:10 PM, Pauli Virtanen wrote: > k = m - 0.5 > > does here the same thing as > > k = np.empty_like(m) > np.subtract(m, 0.5, out=k) > > The memory allocation (empty_like and the subsequent deallocation) > costs essentially nothing, and there are no temporaries or copying > in `subtract`.
As verification: >>> import timeit >>> import numpy as np >>> t=timeit.Timer('k = m - 0.5', setup='import numpy as np;m = np.ones([8092,8092],float)') >>> np.mean(t.repeat(repeat=10, number=1)) 0.53904647827148433 >>> t=timeit.Timer('k = np.empty_like(m);np.subtract(m, 0.5, out=k)', setup='import numpy as np;m = np.ones([8092,8092],float)') >>> np.mean(t.repeat(repeat=10, number=1)) 0.54006035327911373 The trivial difference is expected as extra python parsing overhead, I think. Which leads me to apologize, since in my previous post I clearly meant to type "m -= 0.5", not "m =- 0.5", which is *quite* a different operation... Carlos, and Lutz, please take heed. :) In fact, as Lutz pointed out, that example was not at all what I intended to show anyway. So, just to demonstrate how it was wrong: >>> t=timeit.Timer('m =- 0.5', setup='import numpy as np;m = np.ones([8092,8092],float)') >>> np.mean(t.repeat(repeat=10, number=1)) 0.058299207687377931 >>> t=timeit.Timer('m -= 0.5', setup='import numpy as np;m = np.ones([8092,8092],float)') >>> np.mean(t.repeat(repeat=10, number=1)) 0.28192551136016847 >>> t=timeit.Timer('np.subtract(m, 0.5, m)', setup='import numpy as np;m = np.ones([8092,8092],float)') >>> np.mean(t.repeat(repeat=10, number=1)) 0.27014491558074949 >>> t=timeit.Timer('np.subtract(m, 0.5, k)', setup='import numpy as np;m = np.ones([8092,8092],float); k = np.empty_like(m)') >>> np.mean(t.repeat(repeat=10, number=1)) 0.54962997436523442 Perhaps the difference in the last two simply comes down to cache effects (having to iterate over two different large memory blocks, rather than one)? -Chad From sturla at molden.no Wed Jul 20 02:49:14 2011 From: sturla at molden.no (Sturla Molden) Date: Wed, 20 Jul 2011 08:49:14 +0200 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: Message-ID: <4E267A6A.20601@molden.no> There is a total lack of vectorization in your code, so you are right about the lack of vectorization. What happens is that you see the result of the Matlab JIT compiler speeding up the loop. With a vectorized array expression, there will hardly be any difference. Sturla Den 19.07.2011 11:05, skrev Carlos Becker: > Hi, I started with numpy a few days ago. I was timing some array > operations and found that numpy takes 3 or 4 times longer than Matlab > on a simple array-minus-scalar operation. > This looks as if there is a lack of vectorization, even though this is > just a guess. I hope this is not reposting. I tried searching the > mailing list database but did not find anything related specifically > to a problem like this one. > > Here there is the python test code: > > -------------------------------------------- > > from datetime import datetime > > import numpy as np > > def test(): > > m = np.ones([2000,2000],float) > > N = 100 > > t1 = datetime.now() > > for x in range(N): > > k = m - 0.5 > > t2 = datetime.now() > > print (t2 - t1).total_seconds() / N > > -------------------------------------------- > > > And matlab: > > > -------------------------------------------- > > m = rand(2000,2000); > > > N = 100; > > tic; > > for I=1:N > > k = m - 0.5; > > end > > toc / N > > -------------------------------------------- > > > I have the impression that the speed boost with Matlab is not related > to matlab optimizations, since singe-runs also render similar timings. > > I tried compiling ATLAS for SSE2 and didn't observe any difference. > Any clues? 
> > > Thanks, > > Carlos > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlosbecker at gmail.com Wed Jul 20 02:49:21 2011 From: carlosbecker at gmail.com (Carlos Becker) Date: Wed, 20 Jul 2011 08:49:21 +0200 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: Those are very interesting examples. I think that pre-allocation is very important, and something similar happens in Matlab if no pre-allocation is done: it takes 3-4x longer than with pre-allocation. The main difference is that Matlab is able to take into account a pre-allocated array/matrix, probably avoiding the creation of a temporary and writing the results directly in the pre-allocated array. I think this is essential to speed up numpy. Maybe numexpr could handle this in the future? Right now the general use of numexpr is result = numexpr.evaluate("whatever"), so the same problem seems to be there. With this I am not saying that numpy is not worth it, just that for many applications (specially with huge matrices/arrays), pre-allocation does make a huge difference, especially if we want to attract more people to using numpy. ---------------------- Carlos Becker On Wed, Jul 20, 2011 at 1:42 AM, Chad Netzer wrote: > On Tue, Jul 19, 2011 at 6:10 PM, Pauli Virtanen wrote: > > > k = m - 0.5 > > > > does here the same thing as > > > > k = np.empty_like(m) > > np.subtract(m, 0.5, out=k) > > > > The memory allocation (empty_like and the subsequent deallocation) > > costs essentially nothing, and there are no temporaries or copying > > in `subtract`. > > As verification: > > >>> import timeit > >>> import numpy as np > >>> t=timeit.Timer('k = m - 0.5', setup='import numpy as np;m = > np.ones([8092,8092],float)') > >>> np.mean(t.repeat(repeat=10, number=1)) > 0.53904647827148433 > > >>> t=timeit.Timer('k = np.empty_like(m);np.subtract(m, 0.5, out=k)', > setup='import numpy as np;m = np.ones([8092,8092],float)') > >>> np.mean(t.repeat(repeat=10, number=1)) > 0.54006035327911373 > > The trivial difference is expected as extra python parsing overhead, I > think. > > Which leads me to apologize, since in my previous post I clearly meant > to type "m -= 0.5", not "m =- 0.5", which is *quite* a different > operation... Carlos, and Lutz, please take heed. :) In fact, as Lutz > pointed out, that example was not at all what I intended to show > anyway. 
> > > So, just to demonstrate how it was wrong: > > >>> t=timeit.Timer('m =- 0.5', setup='import numpy as np;m = > np.ones([8092,8092],float)') > >>> np.mean(t.repeat(repeat=10, number=1)) > 0.058299207687377931 > > >>> t=timeit.Timer('m -= 0.5', setup='import numpy as np;m = > np.ones([8092,8092],float)') > >>> np.mean(t.repeat(repeat=10, number=1)) > 0.28192551136016847 > > >>> t=timeit.Timer('np.subtract(m, 0.5, m)', setup='import numpy as np;m = > np.ones([8092,8092],float)') > >>> np.mean(t.repeat(repeat=10, number=1)) > 0.27014491558074949 > > >>> t=timeit.Timer('np.subtract(m, 0.5, k)', setup='import numpy as np;m = > np.ones([8092,8092],float); k = np.empty_like(m)') > >>> np.mean(t.repeat(repeat=10, number=1)) > 0.54962997436523442 > > Perhaps the difference in the last two simply comes down to cache > effects (having to iterate over two different large memory blocks, > rather than one)? > > -Chad > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Wed Jul 20 02:55:36 2011 From: sturla at molden.no (Sturla Molden) Date: Wed, 20 Jul 2011 08:55:36 +0200 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> Message-ID: <4E267BE8.8050607@molden.no> Den 19.07.2011 17:49, skrev Carlos Becker: > > - Matlab: 0.0089 > - Numpy: 0.051 > - Numpy with blitz: 0.0043 > > Now blitz is considerably faster! Anyways, I am concerned about numpy > being much slower, in this case taking 2x the time of the previous > operation. > I guess this is because of the way that python operands/arguments are > passed. Should I always use weave.blitz? > That depends on how many milliseconds you intend to save. CPU time is expensive, so you cannot afford to loose any... Sturla From sturla at molden.no Wed Jul 20 03:16:40 2011 From: sturla at molden.no (Sturla Molden) Date: Wed, 20 Jul 2011 09:16:40 +0200 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: <4E2680D8.4010506@molden.no> Den 20.07.2011 08:49, skrev Carlos Becker: > > The main difference is that Matlab is able to take into account a > pre-allocated array/matrix, probably avoiding the creation of a > temporary and writing the results directly in the pre-allocated array. > > I think this is essential to speed up numpy. As for speed, I think those who need Fortran, C or Cython knows where to find it. Yes, in certain situations you can make Matlab run faster than NumPy and vice versa. But I want to see an example of a real problem where it really matters -- not just "my faulty loop was a few milliseconds faster in Matlab." NumPy could use a more intelligent memory management and reuse some storage space, but I am not sure how much difference it would make. Sturla From ben.root at ou.edu Wed Jul 20 03:21:14 2011 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 20 Jul 2011 02:21:14 -0500 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: On Wednesday, July 20, 2011, Carlos Becker wrote: > Those are very interesting examples. 
I think that pre-allocation is very important, and something similar happens in Matlab if no pre-allocation is done: it takes 3-4x longer than with pre-allocation.The main difference is that Matlab is able to take into account a pre-allocated array/matrix, probably avoiding the creation of a temporary and writing the results directly in the pre-allocated array. > > I think this is essential to speed up numpy. Maybe numexpr could handle this in the future? Right now the general use of numexpr is result = numexpr.evaluate("whatever"), so the same problem seems to be there. > > With this I am not saying that numpy is not worth it, just that for many applications (specially with huge matrices/arrays), pre-allocation does make a huge difference, especially if we want to attract more people to using numpy. The ufuncs and many scipy functions take a "out" parameter where you can specify a pre-allocated array. It can be a little awkward writing expressions that way, but the capability is there. But, ultimately, I think the main value with python and numpy is not it's speed, but rather the ease of use and how quickly one can develop working code with it. If you want to squeeze every CPU resources, you could program in assembly, but good luck getting that linear solver done in time. Don't get me wrong, there is always room for improvements, and I would love to see numpy go even faster. However, I doubt that converting matlab users would *require* speed to be the main selling point. Ease of development and full-featured, high-quality standard and third-party libraries have always been the top selling points for me. Ben Root From carlosbecker at gmail.com Wed Jul 20 03:35:37 2011 From: carlosbecker at gmail.com (Carlos Becker) Date: Wed, 20 Jul 2011 09:35:37 +0200 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: Hi all. Thanks for the feedback. My point is not to start a war on matlab/numpy. This comes out of my wish to switch from Matlab to something more appealing. I like numpy and python, being a proper language (not like matlab scripts, whose syntax is patched and destroyed as new versions come up). I am impressed at how complete numpy is, and with numexpr as well. In my case, sometimes it is required to process 1k images or more, and 2x speed improvement in this case means 2 hours of processing vs 4. Someone would say 'switch to C/C++', but this is not my point. This thread came up when comparing matlab and python to find whether performance is somehow similar, and even if numpy is a bit slower, I would not mind. However, 3x-4x is an important difference. I tried the operations with an output argument and didn't see much difference. I have to try that further with other examples. If I find something new I will let you know, I would be glad to switch to numpy soon ;) Cheers, Carlos On Wed, Jul 20, 2011 at 9:21 AM, Benjamin Root wrote: > On Wednesday, July 20, 2011, Carlos Becker wrote: > > Those are very interesting examples. I think that pre-allocation is very > important, and something similar happens in Matlab if no pre-allocation is > done: it takes 3-4x longer than with pre-allocation.The main difference is > that Matlab is able to take into account a pre-allocated array/matrix, > probably avoiding the creation of a temporary and writing the results > directly in the pre-allocated array. 
> > > > I think this is essential to speed up numpy. Maybe numexpr could handle > this in the future? Right now the general use of numexpr is result = > numexpr.evaluate("whatever"), so the same problem seems to be there. > > > > With this I am not saying that numpy is not worth it, just that for many > applications (specially with huge matrices/arrays), pre-allocation does make > a huge difference, especially if we want to attract more people to using > numpy. > > The ufuncs and many scipy functions take a "out" parameter where you > can specify a pre-allocated array. It can be a little awkward writing > expressions that way, but the capability is there. > > But, ultimately, I think the main value with python and numpy is not > it's speed, but rather the ease of use and how quickly one can develop > working code with it. If you want to squeeze every CPU resources, you > could program in assembly, but good luck getting that linear solver > done in time. > > Don't get me wrong, there is always room for improvements, and I would > love to see numpy go even faster. However, I doubt that converting > matlab users would *require* speed to be the main selling point. Ease > of development and full-featured, high-quality standard and > third-party libraries have always been the top selling points for me. > > Ben Root > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Wed Jul 20 03:40:50 2011 From: sturla at molden.no (Sturla Molden) Date: Wed, 20 Jul 2011 09:40:50 +0200 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: Message-ID: <4E268682.8080201@molden.no> Den 19.07.2011 11:05, skrev Carlos Becker: > > > N = 100; > > tic; > > for I=1:N > > k = m - 0.5; > > end > > toc / N > > -------------------------------------------- > > > m = rand(2000,2000); Here, Matlab's JIT compiler can probably hoist the invariant out of the loop, and just do I=N k = m - 0.5 Try this instead: for I=1:N k = I - 0.5; end S.M. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla at molden.no Wed Jul 20 03:49:06 2011 From: sturla at molden.no (Sturla Molden) Date: Wed, 20 Jul 2011 09:49:06 +0200 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: <4E268872.6050802@molden.no> Den 20.07.2011 09:35, skrev Carlos Becker: > > In my case, sometimes it is required to process 1k images or more, and > 2x speed improvement in this case means 2 hours of processing vs 4. Can you demonstrate that Matlab is faster than NumPy for this task? 
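(And for it to be convincing, the timing loop must be one that Matlab's JIT cannot simply hoist. On the NumPy side, a rough sketch of such a loop, where the scalar changes on every pass so the subtraction really has to be redone N times, could look like this:)

    import time
    import numpy as np

    m = np.ones((2000, 2000), float)
    N = 100

    t0 = time.clock()
    for i in range(N):
        k = m - (0.5 + 1e-9 * i)   # operand varies, so nothing can be hoisted out of the loop
    t1 = time.clock()
    print (t1 - t0) / N
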
Sturla From pav at iki.fi Wed Jul 20 04:58:00 2011 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 20 Jul 2011 08:58:00 +0000 (UTC) Subject: [Numpy-discussion] X11 system info References: Message-ID: Tue, 19 Jul 2011 21:55:28 +0200, Ralf Gommers wrote: > On Sun, Jul 17, 2011 at 11:48 PM, Darren Dale > wrote: >> In numpy.distutils.system info: >> >> default_x11_lib_dirs = libpaths(['/usr/X11R6/lib','/usr/X11/lib', >> '/usr/lib'], platform_bits) >> default_x11_include_dirs = ['/usr/X11R6/include','/usr/X11/include', >> '/usr/include'] >> >> These defaults won't work on the forthcoming Ubuntu 11.10, which >> installs X into /usr/lib/X11 and /usr/include/X11. Did you check that some compilation fails because of this? If not, how did you find the information that the location is changed? On Ubuntu 10.04, the libs are in /usr/lib/i386-linux-gnu/ The same seems to be true for the current Ubuntu 10.10 packages: http://packages.ubuntu.com/oneiric/i386/libxrender1/filelist Do you have a link where the change of location is explained? > Do you have a link to where this is described? And what about the > 64-bit lib path, will that be /usr/lib64/X11? These paths can be found runtime via pkg-config, pkg-config --cflags-only-I xproto pkg-config --libs-only-L xproto However, the convention typically is "#include " so having only "/usr/include" in the path should be OK. But maybe just hardcoding more paths would be enough. There's a lot of stuff in `system_info`... Pauli From pav at iki.fi Wed Jul 20 05:04:09 2011 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 20 Jul 2011 09:04:09 +0000 (UTC) Subject: [Numpy-discussion] Array vectorization in numpy References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: Wed, 20 Jul 2011 08:49:21 +0200, Carlos Becker wrote: > Those are very interesting examples. I think that pre-allocation is very > important, and something similar happens in Matlab if no pre-allocation > is done: it takes 3-4x longer than with pre-allocation. The main > difference is that Matlab is able to take into account a pre-allocated > array/matrix, probably avoiding the creation of a temporary and writing > the results directly in the pre-allocated array. You have not demonstrated that the difference you have comes from pre-allocation. If it would come from pre-allocation, how come I get the same speed as an equivalent C implementation, which *does* pre-allocation, using exactly the same benchmark codes as you have posted? -- Pauli Virtanen From pav at iki.fi Wed Jul 20 05:17:14 2011 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 20 Jul 2011 09:17:14 +0000 (UTC) Subject: [Numpy-discussion] Array vectorization in numpy References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: Wed, 20 Jul 2011 09:04:09 +0000, Pauli Virtanen wrote: > Wed, 20 Jul 2011 08:49:21 +0200, Carlos Becker wrote: >> Those are very interesting examples. I think that pre-allocation is >> very important, and something similar happens in Matlab if no >> pre-allocation is done: it takes 3-4x longer than with pre-allocation. >> The main difference is that Matlab is able to take into account a >> pre-allocated array/matrix, probably avoiding the creation of a >> temporary and writing the results directly in the pre-allocated array. > > You have not demonstrated that the difference you have comes from > pre-allocation. 
Also, there are no temporaries in the expression k = m - 0.5 From e.antero.tammi at gmail.com Wed Jul 20 06:57:07 2011 From: e.antero.tammi at gmail.com (eat) Date: Wed, 20 Jul 2011 13:57:07 +0300 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: Hi, On Wed, Jul 20, 2011 at 2:42 AM, Chad Netzer wrote: > On Tue, Jul 19, 2011 at 6:10 PM, Pauli Virtanen wrote: > > > k = m - 0.5 > > > > does here the same thing as > > > > k = np.empty_like(m) > > np.subtract(m, 0.5, out=k) > > > > The memory allocation (empty_like and the subsequent deallocation) > > costs essentially nothing, and there are no temporaries or copying > > in `subtract`. > > As verification: > > >>> import timeit > >>> import numpy as np > >>> t=timeit.Timer('k = m - 0.5', setup='import numpy as np;m = > np.ones([8092,8092],float)') > >>> np.mean(t.repeat(repeat=10, number=1)) > 0.53904647827148433 > > >>> t=timeit.Timer('k = np.empty_like(m);np.subtract(m, 0.5, out=k)', > setup='import numpy as np;m = np.ones([8092,8092],float)') > >>> np.mean(t.repeat(repeat=10, number=1)) > 0.54006035327911373 > > The trivial difference is expected as extra python parsing overhead, I > think. > > Which leads me to apologize, since in my previous post I clearly meant > to type "m -= 0.5", not "m =- 0.5", which is *quite* a different > operation... Carlos, and Lutz, please take heed. :) In fact, as Lutz > pointed out, that example was not at all what I intended to show > anyway. > > > So, just to demonstrate how it was wrong Perhaps slightly OT, but here is something very odd going on. I would expect the performance to be in totally different ballpark. > >>> t=timeit.Timer('m =- 0.5', setup='import numpy as np;m = > np.ones([8092,8092],float)') > >>> np.mean(t.repeat(repeat=10, number=1)) > 0.058299207687377931 > More like: In []: %timeit m =- .5 10000000 loops, best of 3: 35 ns per loop -eat > > >>> t=timeit.Timer('m -= 0.5', setup='import numpy as np;m = > np.ones([8092,8092],float)') > >>> np.mean(t.repeat(repeat=10, number=1)) > 0.28192551136016847 > > >>> t=timeit.Timer('np.subtract(m, 0.5, m)', setup='import numpy as np;m = > np.ones([8092,8092],float)') > >>> np.mean(t.repeat(repeat=10, number=1)) > 0.27014491558074949 > > >>> t=timeit.Timer('np.subtract(m, 0.5, k)', setup='import numpy as np;m = > np.ones([8092,8092],float); k = np.empty_like(m)') > >>> np.mean(t.repeat(repeat=10, number=1)) > 0.54962997436523442 > > Perhaps the difference in the last two simply comes down to cache > effects (having to iterate over two different large memory blocks, > rather than one)? > > -Chad > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chad.netzer at gmail.com Wed Jul 20 07:02:52 2011 From: chad.netzer at gmail.com (Chad Netzer) Date: Wed, 20 Jul 2011 04:02:52 -0700 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: On Tue, Jul 19, 2011 at 11:49 PM, Carlos Becker wrote: > Those are very interesting examples. Cool. 
> I think that pre-allocation is very > important, and something similar happens in Matlab if no pre-allocation is > done: it takes 3-4x longer than with pre-allocation. Can you provide a simple example of this in Matlab code? I'd like to see the code you are testing with, and the numbers you are reporting, all in one post (please). So far we've seen some code in your first post, some numbers in your follow up, but being spread out it makes it hard to know what exactly you are asserting. > The main difference is that Matlab is able to take into account a > pre-allocated array/matrix, probably avoiding the creation of a temporary > and writing the results directly in the pre-allocated array. Now I believe you are guessing. My last example showed the effect of only using a pre-allocated result array in numpy; It was still slower than an in place operation (ie. overwriting the array used to calculate the result), which may be due to machine memory considerations. The simple operation you are testing (an array operated on by a scalar) is dominated by the memory access speeds of reading and writing to the large arrays. With a separate, pre-allocated array, there is twice the memory to read and write to, and hence twice the time. At least that's my guess, are you saying Matlab does this 3-4 times faster than numpy? I'd really like to see the *exact* code you are testing, with the specific numbers you are getting for that code, if it's not too much trouble. > With this I am not saying that numpy is not worth it, just that for many > applications (specially with huge matrices/arrays), pre-allocation does make > a huge difference, especially if we want to attract more people to using > numpy. What do you mean by 'pre-allocated'? It is certainly perfectly feasible to pre-allocate numpy arrays and use them as the target of operations, as my examples showed. And you can also easily do sums and multiplies using in-place array operations, with Python nomenclature. It's true that you have to do some work at optimizing some expressions if you wish to avoid temporary array objects being created during multi-term expression evaluations, but the manual discusses this and gives the reasons why. Is this what you mean by pre-allocation? I'm still not sure where exactly you are seeing a problem; can you show us exactly what Matlab code cannot be made to run as efficiently with numpy? -Chad From carlosbecker at gmail.com Wed Jul 20 07:12:27 2011 From: carlosbecker at gmail.com (Carlos Becker) Date: Wed, 20 Jul 2011 13:12:27 +0200 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: <9C1576F2-50A9-4338-B9EA-610377C8BA12@gmail.com> I will be away from my computer for a week, but what I could try today shows that Matlab JIT is doing some tricks so the results I have shown previously for Matlab are likely to be wrong. In this sense, it seems to be that timings are similar between numpy and matlab if Jit tricks are avoided. Next week I will run more tests. I am planning to summarize the results and put them somewhere on the web, since in some cases numpy +numexpr greatly outperform matlab- however I will first make sure that JIT is not shadowing the conclusions El 20/07/2011, a las 11:04, Pauli Virtanen escribi?: > Wed, 20 Jul 2011 08:49:21 +0200, Carlos Becker wrote: >> Those are very interesting examples. 
I think that pre-allocation is >> very >> important, and something similar happens in Matlab if no pre- >> allocation >> is done: it takes 3-4x longer than with pre-allocation. The main >> difference is that Matlab is able to take into account a pre- >> allocated >> array/matrix, probably avoiding the creation of a temporary and >> writing >> the results directly in the pre-allocated array. > > You have not demonstrated that the difference you have comes from > pre-allocation. > > If it would come from pre-allocation, how come I get the same speed > as an equivalent C implementation, which *does* pre-allocation, using > exactly the same benchmark codes as you have posted? > > -- > Pauli Virtanen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pav at iki.fi Wed Jul 20 07:31:41 2011 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 20 Jul 2011 11:31:41 +0000 (UTC) Subject: [Numpy-discussion] Array vectorization in numpy References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: Wed, 20 Jul 2011 09:04:09 +0000, Pauli Virtanen wrote: > Wed, 20 Jul 2011 08:49:21 +0200, Carlos Becker wrote: >> Those are very interesting examples. I think that pre-allocation is >> very important, and something similar happens in Matlab if no >> pre-allocation is done: it takes 3-4x longer than with pre-allocation. >> The main difference is that Matlab is able to take into account a >> pre-allocated array/matrix, probably avoiding the creation of a >> temporary and writing the results directly in the pre-allocated array. > > You have not demonstrated that the difference you have comes from > pre-allocation. > > If it would come from pre-allocation, how come I get the same speed as > an equivalent C implementation, which *does* pre-allocation, using > exactly the same benchmark codes as you have posted? Ok, I looked at it at a different machine, and in the end I'll have to agree with you :) Some interesting data points: on my Eee laptop (with the test codes below), I get Numpy: 0.0771 Numpy (preallocated): 0.0383 On a different machine, on the other hand: Numpy: 0.0288 Numpy (preallocated): 0.0283 For larger array sizes the situation starts to change (4000x4000): Numpy (allocation & zeroing): 0.161 Numpy: 0.1953 Numpy (preallocated): 0.1153 Also interestingly, Zeroing (memset, per element): 1.04313e-08 # 4000x4000 array Zeroing (memset, per element): 1.05333e-08 # 3000x3000 array Zeroing (memset, per element): 1.04427e-08 # 2048x2048 array Zeroing (memset, per element): 2.24223e-09 # 2048x2047 array Zeroing (memset, per element): 2.1e-09 # 2000x2000 array Zeroing (memset, per element): 1.75e-09 # 200x200 array Zeroing (memset, preallocated, per element): 2.06667e-09 # 3000x3000 Zeroing (memset, preallocated, per element): 2.0504e-09 # 2048x2048 Zeroing (memset, preallocated, per element): 1.94e-09 # 200x200 There is a sharp order-of-magnitude change of speed in malloc+memset of an array, which is not present in memset itself. (This is then also reflected in the Numpy performance -- floating point operations probably don't cost much compared to memory access speed.) It seems that either the kernel or the C library changes the way it handles allocation at that point. 
So yes, it seems that you were right after all: for large arrays preallocation may be an issue, but the exact size limit depends on the machine etc. in question, which is why I didn't manage to see this at first. *** In this particular case, we are somewhat limited by Python on what optimizations we can do. It is not possible to have the expression "k = m - 0.5" be translated into `np.subtract(m, 0.5, k)` instead of `k = np.subtract(m, 0.5)` because this translation is done by Python itself. If the translation is taken away from Python, e.g., by switching to lazy evaluation or via Numexpr, then things can be improved. There have been some ideas around on implementing matrix algebra lazily in this way, with the ability of reusing temporary buffers. The issue of reusing temporaries in expressions such as `a = 0.3*x + y + z` should however be possible to address within Numpy. Pauli ---------------------------------------------------- #include #include #include #include int main() { double *a, *b; int N = 2000*2000, M=100; int j; int k; clock_t start, end; a = (double*)malloc(sizeof(double)*N); b = (double*)malloc(sizeof(double)*N); start = clock(); for (k = 0; k < M; ++k) { memset(b, '\0', sizeof(double)*N); } end = clock(); printf("Zeroing (memset, preallocated, per element): %g\n", ((double)(end-start))/CLOCKS_PER_SEC/M/N); free(b); start = clock(); for (k = 0; k < M; ++k) { b = (double*)malloc(sizeof(double)*N); memset(b, '\0', sizeof(double)*N); free(b); } end = clock(); printf("Zeroing (memset, per element): %g\n", ((double)(end-start))/CLOCKS_PER_SEC/M/N); b = (double*)malloc(sizeof(double)*N); start = clock(); for (k = 0; k < M; ++k) { for (j = 0; j < N; ++j) { b[j] = a[j] - 0.5; } } end = clock(); printf("Operation in C (preallocated): %g\n", ((double)(end-start))/CLOCKS_PER_SEC/M); free(b); start = clock(); for (k = 0; k < M; ++k) { b = (double*)malloc(sizeof(double)*N); for (j = 0; j < N; ++j) { b[j] = a[j] - 0.5; } free(b); } end = clock(); printf("Operation in C: %g\n", ((double)(end-start))/CLOCKS_PER_SEC/M); return 0; } ---------------------------------------------------- import time import numpy as np print np.__version__, np.__file__ m = np.ones([2000, 2000],float) N = 100 print (m.size * m.dtype.itemsize) / 1e6 t1 = time.clock() for x in range(N): k = np.zeros_like(m) t2 = time.clock() print "Numpy (allocation & zeroing):", (t2 - t1) / N t1 = time.clock() for x in range(N): k = m - 0.5 t2 = time.clock() print "Numpy:", (t2 - t1) / N t1 = time.clock() for x in range(N): np.subtract(m, 0.5, k) t2 = time.clock() print "Numpy (preallocated):", (t2 - t1) / N From chad.netzer at gmail.com Wed Jul 20 07:45:47 2011 From: chad.netzer at gmail.com (Chad Netzer) Date: Wed, 20 Jul 2011 04:45:47 -0700 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: On Wed, Jul 20, 2011 at 3:57 AM, eat wrote: > Perhaps slightly OT, but here is something very odd going on. I would expect > the performance to be in totally different ballpark. >> >> >>> t=timeit.Timer('m =- 0.5', setup='import numpy as np;m = >> >>> np.ones([8092,8092],float)') >> >>> np.mean(t.repeat(repeat=10, number=1)) >> 0.058299207687377931 > > More like: > In []: %timeit m =- .5 > 10000000 loops, best of 3: 35 ns per loop > -eat I think that's the effect of the timer having a low resolution, and the repeat value being 10, instead of the default 1000000. 
For the huge array operations, a small repeat value wasn't a problem. But my mistake made it a simple python assignment, and for such a quick operation you need to repeat it a great many times between timer calls to get a meaningful result: In [1]: %timeit m = -0.5 10000000 loops, best of 3: 39.1 ns per loop In [2]: import timeit In [3]: t=timeit.Timer('m = -0.5') In [4]: t.timeit(number=1000000000) Out[35]: 38.36219096183777 So, directly repeating the assignment a billion times puts it into 'nanoseconds per assignment' units, and the results from ipython %timeit and the timeit() call are comparable (approx 39 nanoseconds per loop). -Chad From schut at sarvision.nl Wed Jul 20 08:54:14 2011 From: schut at sarvision.nl (Vincent Schut) Date: Wed, 20 Jul 2011 14:54:14 +0200 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> Message-ID: > with "gcc -O3 -ffast-math -march=native -mfpmath=sse" optimizations > for the C code (involving SSE2 vectorization and whatnot, looking at > the assembler output). Numpy is already going essentially at the maximum > speed. As a related side question that I've been wondering myself for some time already: what is the preferred way to compile numpy/scipy with those gcc optimization flags? Afaik, numpy's setup.py is simply picking up the flags that my distro's python was compiled with... Would the best way be to recompile python myself? Or could I fine-tune the gcc options just for numpy/scipy somehow? Vincent. From robert.kern at gmail.com Wed Jul 20 11:16:14 2011 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 20 Jul 2011 10:16:14 -0500 Subject: [Numpy-discussion] X11 system info In-Reply-To: References: Message-ID: On Wed, Jul 20, 2011 at 03:58, Pauli Virtanen wrote: > Tue, 19 Jul 2011 21:55:28 +0200, Ralf Gommers wrote: >> On Sun, Jul 17, 2011 at 11:48 PM, Darren Dale >> wrote: >>> In numpy.distutils.system info: >>> >>> default_x11_lib_dirs = libpaths(['/usr/X11R6/lib','/usr/X11/lib', >>> '/usr/lib'], platform_bits) >>> default_x11_include_dirs = ['/usr/X11R6/include','/usr/X11/include', >>> '/usr/include'] >>> >>> These defaults won't work on the forthcoming Ubuntu 11.10, which >>> installs X into /usr/lib/X11 and /usr/include/X11. > > Did you check that some compilation fails because of this? Enthought's Enable will probably fail. It uses the system_info infrastructure to find the X11 headers and libraries. It has perennially been fragile to build because of the unexpected variation in locations. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From zelbier at gmail.com Wed Jul 20 11:47:29 2011 From: zelbier at gmail.com (Olivier Verdier) Date: Wed, 20 Jul 2011 17:47:29 +0200 Subject: [Numpy-discussion] Difference between frompyfunc and vectorize? Message-ID: Dear NumPy gurus, I don't get the difference between frompyfunc and vectorize. What are their respective use cases? Thanks!
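For example, with a toy function, a minimal sketch of the kind of comparison I mean (the otypes argument is just my guess at typical usage):

    import numpy as np

    def f(x):
        return x + 1 if x > 0 else x - 1

    uf = np.frompyfunc(f, 1, 1)           # 1 input, 1 output; gives back an object array
    vf = np.vectorize(f, otypes=[float])

    a = np.arange(-2.0, 3.0)
    print uf(a)
    print vf(a)

Both seem to apply f elementwise, so I am unsure when one should be preferred over the other.
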
== Olivier From pav at iki.fi Wed Jul 20 12:08:18 2011 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 20 Jul 2011 16:08:18 +0000 (UTC) Subject: [Numpy-discussion] Array vectorization in numpy References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: Wed, 20 Jul 2011 11:31:41 +0000, Pauli Virtanen wrote: [clip] > There is a sharp order-of-magnitude change of speed in malloc+memset of > an array, which is not present in memset itself. (This is then also > reflected in the Numpy performance -- floating point operations probably > don't cost much compared to memory access speed.) It seems that either > the kernel or the C library changes the way it handles allocation at > that point. The explanation seems to be the following: (a) When the process adjusts the size of its heap, the kernel must zero new pages it gives to the process (because they might contain sensitive information from other processes) [1] (b) GNU libc hangs onto some memory even after free() is called, so that the heap size doesn't need to be adjusted continuously. This is controlled by parameters that can be tuned with the mallopt() function. [2] Because of (a), there is a performance hit probably equivalent to `memset(buf, 0, size)` or more (kernel overheads?) for using newly allocated memory the first time. But because of (b), this hit mainly applies to buffers larger than some threshold. Preallocating can get rid of this overhead, but it probably only matters in places where you reuse the same memory many times, and the operations done are not much more expensive than whatever the kernel needs to do. Alternatively, you can call mallopt(M_TRIM_THRESHOLD, N); mallopt(M_TOP_PAD, N); mallopt(M_MMAP_MAX, 0); with large enough `N`, and let libc manage the memory reuse for you. .. [1] http://stackoverflow.com/questions/1327261 .. [2] http://www.gnu.org/s/hello/manual/libc/Malloc-Tunable-Parameters.html -- Pauli Virtanen From brett.olsen at gmail.com Wed Jul 20 12:16:49 2011 From: brett.olsen at gmail.com (Brett Olsen) Date: Wed, 20 Jul 2011 11:16:49 -0500 Subject: [Numpy-discussion] Alternative to boolean array In-Reply-To: References: Message-ID: On Tue, Jul 19, 2011 at 11:08 AM, Robert Kern wrote: > On Tue, Jul 19, 2011 at 07:38, Andrea Cimatoribus > wrote: >> Dear all, >> I would like to avoid the use of a boolean array (mask) in the following >> statement: >> >> mask = (A != 0.) >> B?????? = A[mask] >> >> in order to be able to move this bit of code in a cython script (boolean >> arrays are not yet implemented there, and they slow down execution a lot as >> they can't be defined explicitely). >> Any idea of an efficient alternative? > > You will have to count the number of True values, create the B array > with the right size, then run a simple loop to assign into it where A > != 0. This makes you do the comparisons twice. > > Or you can allocate a B array the same size as A, run your loop to > assign into it when A != 0 and incrementing the index into B, then > slice out or memcpy out the portion that you assigned. According to my calculations, the last method is the fastest, though the savings aren't considerable. 
In cython, defining some test mask functions (saved as cython_mask.pyx): import numpy as N cimport numpy as N def mask1(N.ndarray[N.int32_t, ndim=1] A): cdef N.ndarray[N.int32_t, ndim=1] B B = A[A != 0] return B def mask2(N.ndarray[N.int32_t, ndim=1] A): cdef int i cdef int count = 0 for i in range(len(A)): if A[i] == 0: continue count += 1 cdef N.ndarray[N.int32_t, ndim=1] B = N.empty(count, dtype=int) count = 0 for i in range(len(A)): if A[i] == 0: continue B[count] = A[i] count += 1 return B def mask3(N.ndarray[N.int32_t, ndim=1] A): cdef N.ndarray[N.int32_t, ndim=1] B = N.empty(len(A), dtype=int) cdef int i cdef int count = 0 for i in range(len(A)): if A[i] == 0: continue B[count] = A[i] count += 1 return B[:count] In [1]: import numpy as N In [2]: import timeit In [3]: from cython_mask import * In [4]: A = N.random.randint(0, 2, 10000) In [5]: def mask4(A): ...: return A[A != 0] ...: In [6]: %timeit mask1(A) 10000 loops, best of 3: 195 us per loop In [7]: %timeit mask2(A) 10000 loops, best of 3: 136 us per loop In [8]: %timeit mask3(A) 10000 loops, best of 3: 117 us per loop In [9]: %timeit mask4(A) 10000 loops, best of 3: 193 us per loop ~Brett From mwwiebe at gmail.com Wed Jul 20 13:00:49 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 20 Jul 2011 12:00:49 -0500 Subject: [Numpy-discussion] Build error on Windows In-Reply-To: References: <4E1F9E51.9080702@noaa.gov> <4E1FAE47.2020207@uci.edu> <4E235A59.8020002@noaa.gov> Message-ID: The 'm' seems to be the math library on Linux, removing it breaks the build for me. I've put this patch, minus removing the 'm', in a pull request along with hopefully a fix for http://projects.scipy.org/numpy/ticket/1909. https://github.com/numpy/numpy/pull/118 -Mark On Tue, Jul 19, 2011 at 3:10 PM, Ralf Gommers wrote: > > > On Sun, Jul 17, 2011 at 11:55 PM, Chris Barker wrote: > >> On 7/14/2011 8:04 PM, Christoph Gohlke wrote: >> >>> A patch for the build issues is attached. Remove the build directory >>> before rebuilding. >>> >>> Christoph, >> >> I had other issues (I think in one case, a *.c file was not getting >> re-built from the *.c.src file. But anyway, at the end the patch appears to >> work. >> >> Could someone with commit privileges commit it? >> > > Can someone explain the change in core/setup.py below? 'cmpl' is apparently > for Compaq Portable Math Library, but I can't figure out what the 'm' is > for. > > Ralf > > --- a/numpy/core/setup.py > +++ b/numpy/core/setup.py > @@ -349,7 +349,7 @@ def check_types(config_cmd, ext, build_dir): > def check_mathlib(config_cmd): > # Testing the C math library > mathlibs = [] > - mathlibs_choices = [[],['m'],['cpml']] > + mathlibs_choices = [[],['cpml']] > mathlib = os.environ.get('MATHLIB') > * * > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bbelson at princeton.edu Wed Jul 20 15:02:30 2011 From: bbelson at princeton.edu (Brandt Belson) Date: Wed, 20 Jul 2011 15:02:30 -0400 Subject: [Numpy-discussion] f2py and openmp on mac os x with gfortran Message-ID: Hello, I'm struggling to create openmp subroutines. I've simplified the problem down to the subroutine below. 
-- play.f90 -- subroutine step(soln,n) implicit none integer n,i real*8 soln(n) !f2py intent(in) n !f2py intent(out) soln !f2py depend(n) soln !$OMP PARALLEL DO do i=1,n soln(i) = .1 end do !$OMP END PARALLEL DO end subroutine step I compile this with the command: f2py -c -m play play.f90 --fcompiler=gfortran --f90flags="-fopenmp" This completes successfully. When I import the module, I get the following error message. $ python -c 'import play' Traceback (most recent call last): File "", line 1, in ImportError: dlopen(./play.so, 2): Symbol not found: _GOMP_parallel_end Referenced from: /home/bbelson/Desktop/SOR/play.so Expected in: flat namespace in /home/bbelson/Desktop/SOR/play.so It seems to me that the linking step is broken, however I did not see any other options in the f2py documentation to change the linking step. Did I miss something? Thanks, Brandt -------------- next part -------------- An HTML attachment was scrubbed... URL: From robince at gmail.com Wed Jul 20 16:09:11 2011 From: robince at gmail.com (Robin) Date: Wed, 20 Jul 2011 22:09:11 +0200 Subject: [Numpy-discussion] f2py and openmp on mac os x with gfortran In-Reply-To: References: Message-ID: I'm not at my Mac to check the exact paths but see if pointing one of the environment variables LD_LIBRARY_PATH or DYLD_LIBRARY_PATH to a directory where the gfortran openmp libraries can be found - this will depend on where you got gfortran from and the version, but you should be able to find it by following the symlink that is the gfortran command and looking for an appropriate lib/ directory near the target of that. Cheers Robin On Wed, Jul 20, 2011 at 9:02 PM, Brandt Belson wrote: > Hello, > I'm struggling to create openmp subroutines. I've simplified the problem > down to the subroutine below. > -- play.f90 -- > subroutine step(soln,n) > ? implicit none > ? integer n,i > ? real*8 soln(n) > > ? !f2py intent(in) n > ? !f2py intent(out) soln > ? !f2py depend(n) soln > !$OMP PARALLEL DO > ? do i=1,n > ? ? soln(i) = .1 > ? end do > !$OMP END PARALLEL DO > end subroutine step > > I compile this with the command: > f2py -c -m play play.f90 --fcompiler=gfortran --f90flags="-fopenmp" > This completes successfully. When I import the module, I get the following > error message. > $ python -c 'import play' > Traceback (most recent call last): > ? File "", line 1, in > ImportError: dlopen(./play.so, 2): Symbol not found: _GOMP_parallel_end > ? Referenced from: /home/bbelson/Desktop/SOR/play.so > ? Expected in: flat namespace > ?in /home/bbelson/Desktop/SOR/play.so > It seems to me that the linking step is broken, however I did not see any > other options in the f2py documentation to change the linking step. Did I > miss something? > Thanks, > Brandt > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From srean.list at gmail.com Wed Jul 20 18:52:08 2011 From: srean.list at gmail.com (srean) Date: Wed, 20 Jul 2011 17:52:08 -0500 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: >> I think this is essential to speed up numpy. Maybe numexpr could handle this in the future? Right now the general use of numexpr is result = numexpr.evaluate("whatever"), so the same problem seems to be there. 
>> >> With this I am not saying that numpy is not worth it, just that for many applications (specially with huge matrices/arrays), pre-allocation does make a huge difference, especially if we want to attract more people to using numpy. > > The ufuncs and many scipy functions take a "out" parameter where you > can specify a pre-allocated array. ?It can be a little awkward writing > expressions that way, but the capability is there. This is a slight digression: is there a way to have a out parameter like semantics with numexpr. I have always used it as a[:] = numexpr(expression) But I dont think numexpr builds the value in place. Is it possible to have side-effects with numexpr as opposed to obtaining values, for example "a= a * b + c" The documentation is not clear about this. Oh and I do not find the "out" parameter awkward at all. Its very handy. Furthermore, if I may, here is a request that the Blitz++ source be updated. Seems like there is a lot of activity on the Blitz++ repository and weave is very handy too and can be used as easily as numexpr. From mwwiebe at gmail.com Wed Jul 20 19:01:33 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 20 Jul 2011 18:01:33 -0500 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: On Wed, Jul 20, 2011 at 5:52 PM, srean wrote: > >> I think this is essential to speed up numpy. Maybe numexpr could handle > this in the future? Right now the general use of numexpr is result = > numexpr.evaluate("whatever"), so the same problem seems to be there. > >> > >> With this I am not saying that numpy is not worth it, just that for many > applications (specially with huge matrices/arrays), pre-allocation does make > a huge difference, especially if we want to attract more people to using > numpy. > > > > The ufuncs and many scipy functions take a "out" parameter where you > > can specify a pre-allocated array. It can be a little awkward writing > > expressions that way, but the capability is there. > > This is a slight digression: is there a way to have a out parameter > like semantics with numexpr. I have always used it as > > a[:] = numexpr(expression) > > But I dont think numexpr builds the value in place. Is it possible to > have side-effects with numexpr as opposed to obtaining values, for > example > > "a= a * b + c" > > The documentation is not clear about this. Oh and I do not find the > "out" parameter awkward at all. Its very handy. Furthermore, if I may, > here is a request that the Blitz++ source be updated. Seems like there > is a lot of activity on the Blitz++ repository and weave is very handy > too and can be used as easily as numexpr. > In order to make sure the 1.6 nditer supports multithreading, I adapted numexpr to use it. The branch which does this is here: http://code.google.com/p/numexpr/source/browse/#svn%2Fbranches%2Fnewiter This supports out, order, and casting parameters, visible here: http://code.google.com/p/numexpr/source/browse/branches/newiter/numexpr/necompiler.py#615 It's pretty much ready to go, just needs someone to do the release management. -Mark _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From paul.anton.letnes at gmail.com Thu Jul 21 04:22:39 2011 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Thu, 21 Jul 2011 10:22:39 +0200 Subject: [Numpy-discussion] f2py and openmp on mac os x with gfortran In-Reply-To: References: Message-ID: <28A449F0-05BE-41A2-AFC8-36810A9CAC04@gmail.com> Hi, I had the same problem. I think this might work: FFLAGS='-fopenmp' f2py -c (etc) The thing is that f2py doesn't let you pass the -fopenmp flag at the right time to the compiler, so you have to use some sort of environment variable trick. By the way, as far as I know, this is the case also on non-mac platforms. Did that do the trick? Cheers, Paul. On 20. juli 2011, at 21.02, Brandt Belson wrote: > Hello, > I'm struggling to create openmp subroutines. I've simplified the problem down to the subroutine below. > > -- play.f90 -- > subroutine step(soln,n) > implicit none > integer n,i > real*8 soln(n) > > !f2py intent(in) n > !f2py intent(out) soln > !f2py depend(n) soln > > !$OMP PARALLEL DO > do i=1,n > soln(i) = .1 > end do > !$OMP END PARALLEL DO > end subroutine step > > > I compile this with the command: > > f2py -c -m play play.f90 --fcompiler=gfortran --f90flags="-fopenmp" > > This completes successfully. When I import the module, I get the following error message. > > $ python -c 'import play' > Traceback (most recent call last): > File "", line 1, in > ImportError: dlopen(./play.so, 2): Symbol not found: _GOMP_parallel_end > Referenced from: /home/bbelson/Desktop/SOR/play.so > Expected in: flat namespace > in /home/bbelson/Desktop/SOR/play.so > > It seems to me that the linking step is broken, however I did not see any other options in the f2py documentation to change the linking step. Did I miss something? > > Thanks, > Brandt > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From fiolj at yahoo.com Thu Jul 21 05:34:13 2011 From: fiolj at yahoo.com (Juan) Date: Thu, 21 Jul 2011 11:34:13 +0200 Subject: [Numpy-discussion] f2py and openmp on mac os x with gfortran In-Reply-To: References: Message-ID: <4E27F295.8050305@yahoo.com> Hi Brandt, I am on linux and see the same problem. It is solved (at least here) if you add at the end the library libgomp, i.e: f2py -c -m play play.f90 --fcompiler=gfortran --f90flags="-fopenmp" -lgomp Hope it helps, Juan > Hello, > I'm struggling to create openmp subroutines. I've simplified the problem > down to the subroutine below. > > -- play.f90 -- > subroutine step(soln,n) > implicit none > integer n,i > real*8 soln(n) > > !f2py intent(in) n > !f2py intent(out) soln > !f2py depend(n) soln > > !$OMP PARALLEL DO > do i=1,n > soln(i) = .1 > end do > !$OMP END PARALLEL DO > end subroutine step > > > I compile this with the command: > > f2py -c -m play play.f90 --fcompiler=gfortran --f90flags="-fopenmp" > > This completes successfully. When I import the module, I get the following > error message. > > $ python -c 'import play' > Traceback (most recent call last): > File "", line 1, in > ImportError: dlopen(./play.so, 2): Symbol not found: _GOMP_parallel_end > Referenced from: /home/bbelson/Desktop/SOR/play.so > Expected in: flat namespace > in /home/bbelson/Desktop/SOR/play.so > > It seems to me that the linking step is broken, however I did not see any > other options in the f2py documentation to change the linking step. Did I > miss something? 
> > Thanks, > Brandt From dsdale24 at gmail.com Thu Jul 21 07:49:45 2011 From: dsdale24 at gmail.com (Darren Dale) Date: Thu, 21 Jul 2011 07:49:45 -0400 Subject: [Numpy-discussion] X11 system info In-Reply-To: References: Message-ID: On Wed, Jul 20, 2011 at 4:58 AM, Pauli Virtanen wrote: > Tue, 19 Jul 2011 21:55:28 +0200, Ralf Gommers wrote: >> On Sun, Jul 17, 2011 at 11:48 PM, Darren Dale >> wrote: >>> In numpy.distutils.system info: >>> >>> ? ?default_x11_lib_dirs = libpaths(['/usr/X11R6/lib','/usr/X11/lib', >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? '/usr/lib'], platform_bits) >>> ? ?default_x11_include_dirs = ['/usr/X11R6/include','/usr/X11/include', >>> ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?'/usr/include'] >>> >>> These defaults won't work on the forthcoming Ubuntu 11.10, which >>> installs X into /usr/lib/X11 and /usr/include/X11. > > Did you check that some compilation fails because of this? > If not, how did you find the information that the location is changed? I discovered the problem when I tried to build the entire Enthought Tool Suite from source on a Kubuntu-11.10 pre-release system. Even after changing the paths to point at the right location, there are other problems, as seen from this traceback for building Enable: /usr/lib/pymodules/python2.7/numpy/distutils/system_info.py:525: UserWarning: Specified path /usr/local/include/python2.7 is invalid. warnings.warn('Specified path %s is invalid.' % d) /usr/lib/pymodules/python2.7/numpy/distutils/system_info.py:525: UserWarning: Specified path /usr/include/suitesparse/python2.7 is invalid. warnings.warn('Specified path %s is invalid.' % d) /usr/lib/pymodules/python2.7/numpy/distutils/system_info.py:525: UserWarning: Specified path is invalid. warnings.warn('Specified path %s is invalid.' % d) /usr/lib/pymodules/python2.7/numpy/distutils/system_info.py:525: UserWarning: Specified path /usr/lib/X1164 is invalid. warnings.warn('Specified path %s is invalid.' % d) Traceback (most recent call last): File "setup.py", line 56, in config = configuration().todict() File "setup.py", line 48, in configuration config.add_subpackage('kiva') File "/usr/lib/pymodules/python2.7/numpy/distutils/misc_util.py", line 972, in add_subpackage caller_level = 2) File "/usr/lib/pymodules/python2.7/numpy/distutils/misc_util.py", line 941, in get_subpackage caller_level = caller_level + 1) File "/usr/lib/pymodules/python2.7/numpy/distutils/misc_util.py", line 878, in _get_configuration_from_setup_py config = setup_module.configuration(*args) File "kiva/setup.py", line 27, in configuration config.add_subpackage('agg') File "/usr/lib/pymodules/python2.7/numpy/distutils/misc_util.py", line 972, in add_subpackage caller_level = 2) File "/usr/lib/pymodules/python2.7/numpy/distutils/misc_util.py", line 941, in get_subpackage caller_level = caller_level + 1) File "/usr/lib/pymodules/python2.7/numpy/distutils/misc_util.py", line 878, in _get_configuration_from_setup_py config = setup_module.configuration(*args) File "kiva/agg/setup.py", line 235, in configuration x11_info = get_info('x11', notfound_action=2) File "/usr/lib/pymodules/python2.7/numpy/distutils/system_info.py", line 308, in get_info return cl().get_info(notfound_action) File "/usr/lib/pymodules/python2.7/numpy/distutils/system_info.py", line 459, in get_info raise self.notfounderror(self.notfounderror.__doc__) numpy.distutils.system_info.X11NotFoundError: X11 libraries not found. 
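An untested stopgap, until system_info itself is taught about the new
layout, might be to extend the default search paths from a small wrapper
before the build starts. The paths below are just the Ubuntu 11.10
locations mentioned above, so treat this as a sketch rather than a fix:

import numpy.distutils.system_info as si

# extend the defaults before any x11 lookup happens; the real fix still
# belongs in numpy.distutils.system_info itself
si.default_x11_lib_dirs.append('/usr/lib/X11')
si.default_x11_include_dirs.append('/usr/include/X11')

# notfound_action=1 warns instead of raising X11NotFoundError
print si.get_info('x11', notfound_action=1)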
From meine at informatik.uni-hamburg.de Thu Jul 21 10:56:21 2011 From: meine at informatik.uni-hamburg.de (Hans Meine) Date: Thu, 21 Jul 2011 16:56:21 +0200 Subject: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? (was: Problem with ufunc of a numpy.ndarray derived class) Message-ID: <201107211656.21611.meine@informatik.uni-hamburg.de> Hi, I have the same problem as Martin DRUON, who wrote 10 days ago: > I have a problem with the ufunc return type of a numpy.ndarray derived > class. In fact, I subclass a numpy.ndarray using the tutorial : > http://docs.scipy.org/doc/numpy/user/basics.subclassing.html > > But, for example, if I execute the "max" ufunc from my subclass, the return > type differs from the return type of the numpy ufunc. BTW: http://projects.scipy.org/numpy/ticket/1904 is meant to describe this, although the example code misses the actual calls to a1.min() and a2.min() in the assertion: # --------------------------------------------------- import numpy class Test(numpy.ndarray): pass a1 = numpy.ndarray((1,)) a2 = Test((1,)) assert type(a1.min()) == type(a2.min()), \ "%s != %s" % (type(a1.min()), type(a2.min())) # --------------------------------------------------- This code fails with 1.6.0, while it worked in 1.3.0. I tend to think that this is a bug (after all, a1.min() does not return ndarray, but an array scalar), but maybe there is also a good reason for this (for us, unexpected) behavor change and a nice solution? Have a nice day, Hans From bbelson at princeton.edu Thu Jul 21 13:50:43 2011 From: bbelson at princeton.edu (Brandt Belson) Date: Thu, 21 Jul 2011 13:50:43 -0400 Subject: [Numpy-discussion] f2py and openmp on mac os x with gfortran Message-ID: Hi all, As Juan said, I didn't include the -lgomp flag for f2py. Once I use that, the f2py module works with openMP as expected. Thanks, Brandt > > Message: 1 > Date: Thu, 21 Jul 2011 11:34:13 +0200 > From: Juan > Subject: Re: [Numpy-discussion] f2py and openmp on mac os x with > gfortran > To: numpy-discussion at scipy.org > Message-ID: <4E27F295.8050305 at yahoo.com> > Content-Type: text/plain; charset=UTF-8 > > Hi Brandt, I am on linux and see the same problem. It is solved (at least > here) > if you add at the end the library libgomp, i.e: > f2py -c -m play play.f90 --fcompiler=gfortran --f90flags="-fopenmp" -lgomp > Hope it helps, > Juan > > > Hello, > > I'm struggling to create openmp subroutines. I've simplified the problem > > down to the subroutine below. > > > > -- play.f90 -- > > subroutine step(soln,n) > > implicit none > > integer n,i > > real*8 soln(n) > > > > !f2py intent(in) n > > !f2py intent(out) soln > > !f2py depend(n) soln > > > > !$OMP PARALLEL DO > > do i=1,n > > soln(i) = .1 > > end do > > !$OMP END PARALLEL DO > > end subroutine step > > > > > > I compile this with the command: > > > > f2py -c -m play play.f90 --fcompiler=gfortran --f90flags="-fopenmp" > > > > This completes successfully. When I import the module, I get the > following > > error message. > > > > $ python -c 'import play' > > Traceback (most recent call last): > > File "", line 1, in > > ImportError: dlopen(./play.so, 2): Symbol not found: _GOMP_parallel_end > > Referenced from: /home/bbelson/Desktop/SOR/play.so > > Expected in: flat namespace > > in /home/bbelson/Desktop/SOR/play.so > > > > It seems to me that the linking step is broken, however I did not see any > > other options in the f2py documentation to change the linking step. Did I > > miss something? 
> > > > Thanks, > > Brandt > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Thu Jul 21 16:43:05 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 21 Jul 2011 22:43:05 +0200 Subject: [Numpy-discussion] ANN: NumPy 1.6.1 release Message-ID: Hi, I am pleased to announce the availability of NumPy 1.6.1. This is a bugfix release for the 1.6.x series; the list of fixed bugs is given below. Sources and binaries can be found at http://sourceforge.net/projects/numpy/files/NumPy/1.6.1/ Thanks to anyone who contributed to this release. Enjoy, The NumPy developers Bug fixes for NumPy 1.6.1 ------------------------- #1834 einsum fails for specific shapes #1837 einsum throws nan or freezes python for specific array shapes #1838 object <-> structured type arrays regression #1851 regression for SWIG based code in 1.6.0 #1863 Buggy results when operating on array copied with astype() #1870 Fix corner case of object array assignment #1843 Py3k: fix error with recarray #1885 nditer: Error in detecting double reduction loop #1874 f2py: fix --include_paths bug #1749 Fix ctypes.load_library() #1895/1896 iter: writeonly operands weren't always being buffered correctly -------------- next part -------------- An HTML attachment was scrubbed... URL: From srean.list at gmail.com Fri Jul 22 01:24:47 2011 From: srean.list at gmail.com (srean) Date: Fri, 22 Jul 2011 00:24:47 -0500 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: >> This is a slight digression: is there a way to have a out parameter >> like semantics with numexpr. I have always used it as >> >> a[:] = numexpr(expression) > In order to make sure the 1.6 nditer supports multithreading, I adapted > numexpr to use it. The branch which does this is here: > http://code.google.com/p/numexpr/source/browse/#svn%2Fbranches%2Fnewiter > This supports out, order, and casting parameters, visible here: > http://code.google.com/p/numexpr/source/browse/branches/newiter/numexpr/necompiler.py#615 > It's pretty much ready to go, just needs someone to do the release > management. > -Mark Oh excellent, I did not know that the out parameter was available. Hope this gets in soon. From miggins at gmail.com Fri Jul 22 07:52:21 2011 From: miggins at gmail.com (Mark Higgins) Date: Fri, 22 Jul 2011 07:52:21 -0400 Subject: [Numpy-discussion] Setup failure on Max OS X Lion Message-ID: <817E3E3D-E3D0-4847-A630-A632ECD1E0B7@gmail.com> I just tried to set up numpy on a new laptop with Mac OS X Lion (10.7) and am running into some problems. The laptop came with python 2.7 installed, but when I downloaded the dmg for numpy from the sourceforge side, it refused to install it saying that it couldn't find python 2.7. Odd, but maybe it was installed in some place numpy didn't expect. So I went to python.org and got the official 2.7 installer dmg and stuck that in. Seemed okay. Then I installed numpy using the dmg. Seemed okay. 
Now, though, when I start python at the terminal (or through Wing IDE) and import numpy, I get Traceback (most recent call last): File "", line 1, in File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/__init__.py", line 78, in from numpy import show_config as show_numpy_config File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/__init__.py", line 137, in import add_newdocs File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/add_newdocs.py", line 9, in from numpy.lib import add_newdoc File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/__init__.py", line 4, in from type_check import * File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/type_check.py", line 8, in import numpy.core.numeric as _nx File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/__init__.py", line 5, in import multiarray ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/multiarray.so, 2): no suitable image found. Did find: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/multiarray.so: no matching architecture in universal wrapper I'm not sure whether this is a problem with where the python 2.7 install is located, or whether it's something bespoke about the new Mac OS. I'm pretty sure I followed a similar route on an older laptop (which was OS X 10.6 and has python 2.6 installed by default) and it worked fine; but it was a while ago and I don't really remember. Any suggestions? From miggins at gmail.com Fri Jul 22 12:22:14 2011 From: miggins at gmail.com (Mark Higgins) Date: Fri, 22 Jul 2011 12:22:14 -0400 Subject: [Numpy-discussion] Setup failure on Max OS X Lion In-Reply-To: <817E3E3D-E3D0-4847-A630-A632ECD1E0B7@gmail.com> References: <817E3E3D-E3D0-4847-A630-A632ECD1E0B7@gmail.com> Message-ID: <6E2A36B4-5280-4235-ADED-E1498A76EEF8@gmail.com> Sorted - I downloaded the wrong numpy dmg - I pulled in the mac os x 10.3-labelled dmg instead of the 10.6-labelled dmg, which was further down the list on sourceforge. :) On Jul 22, 2011, at 7:52 AM, Mark Higgins wrote: > I just tried to set up numpy on a new laptop with Mac OS X Lion (10.7) and am running into some problems. > > The laptop came with python 2.7 installed, but when I downloaded the dmg for numpy from the sourceforge side, it refused to install it saying that it couldn't find python 2.7. Odd, but maybe it was installed in some place numpy didn't expect. > > So I went to python.org and got the official 2.7 installer dmg and stuck that in. Seemed okay. > > Then I installed numpy using the dmg. Seemed okay. 
> > Now, though, when I start python at the terminal (or through Wing IDE) and import numpy, I get > > Traceback (most recent call last): > File "", line 1, in > File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/__init__.py", line 78, in > from numpy import show_config as show_numpy_config > File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/__init__.py", line 137, in > import add_newdocs > File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/add_newdocs.py", line 9, in > from numpy.lib import add_newdoc > File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/__init__.py", line 4, in > from type_check import * > File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/type_check.py", line 8, in > import numpy.core.numeric as _nx > File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/__init__.py", line 5, in > import multiarray > ImportError: dlopen(/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/multiarray.so, 2): no suitable image found. Did find: > /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/core/multiarray.so: no matching architecture in universal wrapper > > I'm not sure whether this is a problem with where the python 2.7 install is located, or whether it's something bespoke about the new Mac OS. I'm pretty sure I followed a similar route on an older laptop (which was OS X 10.6 and has python 2.6 installed by default) and it worked fine; but it was a while ago and I don't really remember. > > Any suggestions? > > From chad.netzer at gmail.com Fri Jul 22 13:53:26 2011 From: chad.netzer at gmail.com (Chad Netzer) Date: Fri, 22 Jul 2011 10:53:26 -0700 Subject: [Numpy-discussion] Setup failure on Max OS X Lion In-Reply-To: <817E3E3D-E3D0-4847-A630-A632ECD1E0B7@gmail.com> References: <817E3E3D-E3D0-4847-A630-A632ECD1E0B7@gmail.com> Message-ID: On Fri, Jul 22, 2011 at 4:52 AM, Mark Higgins wrote: > I just tried to set up numpy on a new laptop with Mac OS X Lion (10.7) and am running into some problems. > > Any suggestions? On a freshly upgraded-to Lion installation: $ which python /usr/bin/python $ python Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> import numpy >>> numpy.__version__ '1.5.1' >>> numpy.__file__ '/System/Library/Frameworks/Python.framework/Versions/2.7/Extras/lib/python/numpy/__init__.py' If you need the latest release, instead of the Apple supplied version 1.5.1: $ sudo easy_install -U numpy --SNIPPED-- $ python Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin Type "help", "copyright", "credits" or "license" for more information. 
>>> import numpy >>> numpy.__version__ '1.6.1' >>> numpy.__file__ '/Library/Python/2.7/site-packages/numpy-1.6.1-py2.7-macosx-10.7-intel.egg/numpy/__init__.pyc' >>> numpy.int_ Consider using 'pip' instead of 'easy_install', for adding and searching PyPI packages: $ sudo easy_install pip $ sudo pip search numpy If you need a newer python later on, consider using Homebrew (after installing Xcode 4.1): http://mxcl.github.com/homebrew/ http://github.com/mxcl/homebrew/wiki/Installation # Note that Homebrew does *not* need to use 'sudo', and after installing python # both 'pip' and 'easy_install' will be the Homebrew installed versions... $ ruby -e "$(curl -fsSL https://raw.github.com/gist/323731)" $ brew update $ brew install python $ hash -r $ python Python 2.7.2 (default, Jul 22 2011, 10:32:09) [GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin Type "help", "copyright", "credits" or "license" for more information. $ pip install --upgrade numpy etc... -Chad From gmane at blindgoat.org Fri Jul 22 14:47:51 2011 From: gmane at blindgoat.org (martin smith) Date: Fri, 22 Jul 2011 14:47:51 -0400 Subject: [Numpy-discussion] Multi-taper spectral analysis code available Message-ID: This is the initial release of a module that implements Thomson's multi-taper spectral analysis algorithms. The code is based on a subroutine from Lees and Park and has, of course, a python interface. References are provided in the readme file. The code has seen substantial usage and should be fairly reliable. Examples are included. It's available at http://code.google.com/p/pymutt/. Martin L. Smith Blindgoat Geophysics From mlist at re-factory.de Sun Jul 24 18:43:46 2011 From: mlist at re-factory.de (Robert Elsner) Date: Mon, 25 Jul 2011 00:43:46 +0200 Subject: [Numpy-discussion] Optimizing recursive loop Message-ID: <4E2CA022.5060600@re-factory.de> Hey Everybody, I am approximating the derivative of nonperiodic functions on [-1,1] using Chebyshev polynomials. The implementation is straightforward and works well but is painfully slow. The routine wastes most of its time on a trivial operation (see comment in the code) Unfortunately the spectral coefficients are calculated recursively and thus I haven't found a way to speed up the code. I am aware of list comprehensions, the speed advantage of calling native numpy functions etc. But no luck yet finding an alternate formulation that speeds up the calculation. I attached a stripped down variant of the code. I am very grateful for tips or directions where to look for a solution (well I know writing a small C extension might be the way to go but Id love to have speedy Python) Note: The DCT takes almost no time compared to the statement in the loop. Cheers Robert -------------- next part -------------- A non-text attachment was scrubbed... Name: optimization.py Type: text/x-python Size: 1173 bytes Desc: not available URL: From mlist at re-factory.de Sun Jul 24 19:10:14 2011 From: mlist at re-factory.de (Robert Elsner) Date: Mon, 25 Jul 2011 01:10:14 +0200 Subject: [Numpy-discussion] Optimizing recursive loop - updated example In-Reply-To: <4E2CA022.5060600@re-factory.de> References: <4E2CA022.5060600@re-factory.de> Message-ID: <4E2CA656.9010509@re-factory.de> Boiled it down a bit more to only include code that actually takes time. First time around I found the other variant more instructive because it shows the discrepancy between the DCT and the loop but might be confusing. 
Thus here the bare minimum that correctly calculates the coefficients of the first derivative from the coefficients of the Chebyshev polynomials. Cheers Robert On 25.07.2011 00:43, Robert Elsner wrote: > Hey Everybody, > > I am approximating the derivative of nonperiodic functions on [-1,1] > using Chebyshev polynomials. The implementation is straightforward and > works well but is painfully slow. The routine wastes most of its time on > a trivial operation (see comment in the code) > Unfortunately the spectral coefficients are calculated recursively and > thus I haven't found a way to speed up the code. I am aware of list > comprehensions, the speed advantage of calling native numpy functions > etc. But no luck yet finding an alternate formulation that speeds up the > calculation. I attached a stripped down variant of the code. I am very > grateful for tips or directions where to look for a solution (well I > know writing a small C extension might be the way to go but Id love to > have speedy Python) > > Note: The DCT takes almost no time compared to the statement in the loop. > > Cheers > Robert > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: optimization.py Type: text/x-python Size: 533 bytes Desc: not available URL: From joonpyro at gmail.com Sun Jul 24 19:38:44 2011 From: joonpyro at gmail.com (Joon Ro) Date: Sun, 24 Jul 2011 18:38:44 -0500 Subject: [Numpy-discussion] Optimizing recursive loop - updated example In-Reply-To: <4E2CA656.9010509@re-factory.de> References: <4E2CA022.5060600@re-factory.de> <4E2CA656.9010509@re-factory.de> Message-ID: For those cases where you cannot vectorize the operation, numpy is usually does not help much. Try using Cython. You will be able to compile the part of the code and the loop will be much faster (can be more than 100 times faster). http://docs.cython.org/ -Joon On Sun, 24 Jul 2011 18:10:14 -0500, Robert Elsner wrote: > Boiled it down a bit more to only include code that actually takes time. > First time around I found the other variant more instructive because it > shows the discrepancy between the DCT and the loop but might be > confusing. Thus here the bare minimum that correctly calculates the > coefficients of the first derivative from the coefficients of the > Chebyshev polynomials. > > Cheers > Robert > > On 25.07.2011 00:43, Robert Elsner wrote: >> Hey Everybody, >> >> I am approximating the derivative of nonperiodic functions on [-1,1] >> using Chebyshev polynomials. The implementation is straightforward and >> works well but is painfully slow. The routine wastes most of its time on >> a trivial operation (see comment in the code) >> Unfortunately the spectral coefficients are calculated recursively and >> thus I haven't found a way to speed up the code. I am aware of list >> comprehensions, the speed advantage of calling native numpy functions >> etc. But no luck yet finding an alternate formulation that speeds up the >> calculation. I attached a stripped down variant of the code. I am very >> grateful for tips or directions where to look for a solution (well I >> know writing a small C extension might be the way to go but Id love to >> have speedy Python) >> >> Note: The DCT takes almost no time compared to the statement in the >> loop. 
>> >> Cheers >> Robert >> >> >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Using Opera's revolutionary email client: http://www.opera.com/mail/ From mlist at re-factory.de Mon Jul 25 06:30:34 2011 From: mlist at re-factory.de (Robert Elsner) Date: Mon, 25 Jul 2011 12:30:34 +0200 Subject: [Numpy-discussion] Optimizing recursive loop - updated example In-Reply-To: References: <4E2CA022.5060600@re-factory.de> <4E2CA656.9010509@re-factory.de> Message-ID: <4E2D45CA.9000301@re-factory.de> Thanks for the hint. I thought about Cython myself but I was unable to get even the slightest speed gain out of it. Here is the equivalent Cython code with the timing and setup.py. I typed (I think). Am I missing something obvious? Cheers Robert On 25.07.2011 01:38, Joon Ro wrote: > For those cases where you cannot vectorize the operation, numpy is > usually does not help much. > Try using Cython. You will be able to compile the part of the code and > the loop will be much faster (can be more than 100 times faster). > > http://docs.cython.org/ > > -Joon > > > On Sun, 24 Jul 2011 18:10:14 -0500, Robert Elsner > wrote: > >> Boiled it down a bit more to only include code that actually takes time. >> First time around I found the other variant more instructive because it >> shows the discrepancy between the DCT and the loop but might be >> confusing. Thus here the bare minimum that correctly calculates the >> coefficients of the first derivative from the coefficients of the >> Chebyshev polynomials. >> >> Cheers >> Robert >> >> On 25.07.2011 00:43, Robert Elsner wrote: >>> Hey Everybody, >>> >>> I am approximating the derivative of nonperiodic functions on [-1,1] >>> using Chebyshev polynomials. The implementation is straightforward and >>> works well but is painfully slow. The routine wastes most of its >>> time on >>> a trivial operation (see comment in the code) >>> Unfortunately the spectral coefficients are calculated recursively and >>> thus I haven't found a way to speed up the code. I am aware of list >>> comprehensions, the speed advantage of calling native numpy functions >>> etc. But no luck yet finding an alternate formulation that speeds up >>> the >>> calculation. I attached a stripped down variant of the code. I am very >>> grateful for tips or directions where to look for a solution (well I >>> know writing a small C extension might be the way to go but Id love to >>> have speedy Python) >>> >>> Note: The DCT takes almost no time compared to the statement in the >>> loop. >>> >>> Cheers >>> Robert >>> >>> >>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- A non-text attachment was scrubbed... Name: c_ext_test.py Type: text/x-python Size: 151 bytes Desc: not available URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: optimization.pyx URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: setup.py Type: text/x-python Size: 345 bytes Desc: not available URL: From mlist at re-factory.de Mon Jul 25 06:34:43 2011 From: mlist at re-factory.de (Robert Elsner) Date: Mon, 25 Jul 2011 12:34:43 +0200 Subject: [Numpy-discussion] Optimizing recursive loop - updated example In-Reply-To: <4E2D45CA.9000301@re-factory.de> References: <4E2CA022.5060600@re-factory.de> <4E2CA656.9010509@re-factory.de> <4E2D45CA.9000301@re-factory.de> Message-ID: <4E2D46C3.2040006@re-factory.de> Yes I did. Slicing and Cython do not mix too well. Using an explicit loop fixes the problem. In case anybody is interested the code is attached. Thanks for your help Robert On 25.07.2011 12:30, Robert Elsner wrote: > Thanks for the hint. I thought about Cython myself but I was unable to > get even the slightest speed gain out of it. > Here is the equivalent Cython code with the timing and setup.py. I typed > (I think). Am I missing something obvious? > > Cheers > Robert > > On 25.07.2011 01:38, Joon Ro wrote: >> For those cases where you cannot vectorize the operation, numpy is >> usually does not help much. >> Try using Cython. You will be able to compile the part of the code and >> the loop will be much faster (can be more than 100 times faster). >> >> http://docs.cython.org/ >> >> -Joon >> >> >> On Sun, 24 Jul 2011 18:10:14 -0500, Robert Elsner >> wrote: >> >>> Boiled it down a bit more to only include code that actually takes time. >>> First time around I found the other variant more instructive because it >>> shows the discrepancy between the DCT and the loop but might be >>> confusing. Thus here the bare minimum that correctly calculates the >>> coefficients of the first derivative from the coefficients of the >>> Chebyshev polynomials. >>> >>> Cheers >>> Robert >>> >>> On 25.07.2011 00:43, Robert Elsner wrote: >>>> Hey Everybody, >>>> >>>> I am approximating the derivative of nonperiodic functions on [-1,1] >>>> using Chebyshev polynomials. The implementation is straightforward and >>>> works well but is painfully slow. The routine wastes most of its >>>> time on >>>> a trivial operation (see comment in the code) >>>> Unfortunately the spectral coefficients are calculated recursively and >>>> thus I haven't found a way to speed up the code. I am aware of list >>>> comprehensions, the speed advantage of calling native numpy functions >>>> etc. But no luck yet finding an alternate formulation that speeds up >>>> the >>>> calculation. I attached a stripped down variant of the code. I am very >>>> grateful for tips or directions where to look for a solution (well I >>>> know writing a small C extension might be the way to go but Id love to >>>> have speedy Python) >>>> >>>> Note: The DCT takes almost no time compared to the statement in the >>>> loop. >>>> >>>> Cheers >>>> Robert >>>> >>>> >>>> >>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: optimization.pyx URL: From pav at iki.fi Mon Jul 25 06:35:05 2011 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 25 Jul 2011 10:35:05 +0000 (UTC) Subject: [Numpy-discussion] Optimizing recursive loop - updated example References: <4E2CA022.5060600@re-factory.de> <4E2CA656.9010509@re-factory.de> <4E2D45CA.9000301@re-factory.de> Message-ID: Mon, 25 Jul 2011 12:30:34 +0200, Robert Elsner wrote: > Thanks for the hint. I thought about Cython myself but I was unable to > get even the slightest speed gain out of it. Here is the equivalent > Cython code with the timing and setup.py. I typed (I think). Am I > missing something obvious? Cython doesn't vectorize the slice operations such as b[:,j-g] but falls back to Numpy on them. You'll need to convert the slice notation to a loop to get speed gains. -- Pauli Virtanen From charlesr.harris at gmail.com Mon Jul 25 09:59:24 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 25 Jul 2011 07:59:24 -0600 Subject: [Numpy-discussion] Optimizing recursive loop - updated example In-Reply-To: <4E2CA656.9010509@re-factory.de> References: <4E2CA022.5060600@re-factory.de> <4E2CA656.9010509@re-factory.de> Message-ID: On Sun, Jul 24, 2011 at 5:10 PM, Robert Elsner wrote: > Boiled it down a bit more to only include code that actually takes time. > First time around I found the other variant more instructive because it > shows the discrepancy between the DCT and the loop but might be > confusing. Thus here the bare minimum that correctly calculates the > coefficients of the first derivative from the coefficients of the > Chebyshev polynomials. > > Have you tried using an (inverse) discrete sine transform to get the derivative? dT_n/dx = n*U_{n-1}, where U_n is the Chebyshev polynomial of the second kind, sin((n+1)\theta)/sin(theta) where cos(\theta) = x. I don't believe the discrete sine transform is part of scipy, but you can just use the inverse fft instead. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlist at re-factory.de Mon Jul 25 12:24:52 2011 From: mlist at re-factory.de (Robert Elsner) Date: Mon, 25 Jul 2011 18:24:52 +0200 Subject: [Numpy-discussion] Optimizing recursive loop - updated example In-Reply-To: References: <4E2CA022.5060600@re-factory.de> <4E2CA656.9010509@re-factory.de> Message-ID: <4E2D98D4.2080304@re-factory.de> I didn't look into that but it definitely sounds interesting. Especially as the coefficient manipulation is mildly unstable for higher derivatives. Need to work out the math first though ;). Thanks for the hint. On 25.07.2011 15:59, Charles R Harris wrote: > On Sun, Jul 24, 2011 at 5:10 PM, Robert Elsner wrote: > >> Boiled it down a bit more to only include code that actually takes time. >> First time around I found the other variant more instructive because it >> shows the discrepancy between the DCT and the loop but might be >> confusing. Thus here the bare minimum that correctly calculates the >> coefficients of the first derivative from the coefficients of the >> Chebyshev polynomials. >> >> > Have you tried using an (inverse) discrete sine transform to get the > derivative? dT_n/dx = n*U_{n-1}, where U_n is the Chebyshev polynomial of > the second kind, sin((n+1)\theta)/sin(theta) where cos(\theta) = x. I don't > believe the discrete sine transform is part of scipy, but you can just use > the inverse fft instead. 
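As a cross-check for whichever route you take, the coefficient
differentiation itself is also available in numpy's polynomial package
(numpy >= 1.4); a rough, untested sketch:

import numpy as np
from numpy.polynomial import chebyshev as cheb

c = np.random.rand(16)        # Chebyshev coefficients of some function
dc = cheb.chebder(c)          # coefficients of the derivative series

# small sanity check: d/dx T_3 = 3*U_2 = 3*T_0 + 6*T_2
print cheb.chebder([0.0, 0.0, 0.0, 1.0])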
> > > > Chuck > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From stefan at sun.ac.za Mon Jul 25 15:19:38 2011 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 25 Jul 2011 12:19:38 -0700 Subject: [Numpy-discussion] code review/build & test for datetime business day API In-Reply-To: References: Message-ID: Hey all, On Tue, Jun 14, 2011 at 4:34 PM, Mark Wiebe wrote: > These functions are now fully implemented and documented. As always, code > reviews are welcome here: > https://github.com/numpy/numpy/pull/87 I haven't been keeping up with the datetime developments, but I noticed the introduction of more names into the root numpy namespace. About a year (or two?) ago at SciPy, there were discussions about organising the NumPy namespace for 2.0, and halting the introduction of new functions into the root namespace. What is the status quo? Regards St?fan From mwwiebe at gmail.com Mon Jul 25 15:35:19 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Mon, 25 Jul 2011 14:35:19 -0500 Subject: [Numpy-discussion] code review/build & test for datetime business day API In-Reply-To: References: Message-ID: 2011/7/25 St?fan van der Walt > Hey all, > > On Tue, Jun 14, 2011 at 4:34 PM, Mark Wiebe wrote: > > These functions are now fully implemented and documented. As always, code > > reviews are welcome here: > > https://github.com/numpy/numpy/pull/87 > > I haven't been keeping up with the datetime developments, but I > noticed the introduction of more names into the root numpy namespace. > About a year (or two?) ago at SciPy, there were discussions about > organising the NumPy namespace for 2.0, and halting the introduction > of new functions into the root namespace. What is the status quo? > I'm trying to make things fit into the existing system as naturally as possible. The discussion you're talking about ideally should have resulted in some guideline documentation about namespaces, but I don't recall seeing something like that in a prominent place anywhere. -Mark > > Regards > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Mon Jul 25 15:43:30 2011 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 25 Jul 2011 12:43:30 -0700 Subject: [Numpy-discussion] code review/build & test for datetime business day API In-Reply-To: References: Message-ID: On Mon, Jul 25, 2011 at 12:35 PM, Mark Wiebe wrote: > I'm trying to make things fit into the existing system as naturally as > possible. The discussion you're talking about ideally should have resulted > in some guideline documentation about namespaces, but I don't recall seeing > something like that in a prominent place anywhere. Probably should have! Either way, it's something to consider: if we introduce those functions now, people will start to use them where they are (np.xyz), introducing another change in usage comes 2.0 (or 3.0 or whichever). 
Regards St?fan From ijstokes at hkl.hms.harvard.edu Mon Jul 25 16:00:50 2011 From: ijstokes at hkl.hms.harvard.edu (Ian Stokes-Rees) Date: Mon, 25 Jul 2011 16:00:50 -0400 Subject: [Numpy-discussion] Problems with numpy binary for Python2.7 + OS X 10.6 In-Reply-To: References: <4E1F9E51.9080702@noaa.gov> <4E1FAE47.2020207@uci.edu> <4E235A59.8020002@noaa.gov> Message-ID: <4E2DCB72.3070608@hkl.hms.harvard.edu> As best I can tell, I have Python 2.7.2 for my system Python: [ijstokes at moose ~]$ python -V Python 2.7.2 [ijstokes at moose ~]$ which python /Library/Frameworks/Python.framework/Versions/2.7/bin/python however when I attempt to install the recent numpy binary python-2.7.2-macosx10.6.dmg I get stopped at the first stage of the install procedure with the error: numpy 1.6.1 can't be installed on this disk. numpy requires System Python 2.7 to install. Any idea what I might be doing wrong? Is it looking for /usr/bin/python2.7? For that, I only have up to 2.6 available. (and 2.5) Cheers, Ian From mwwiebe at gmail.com Mon Jul 25 16:02:02 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Mon, 25 Jul 2011 15:02:02 -0500 Subject: [Numpy-discussion] code review/build & test for datetime business day API In-Reply-To: References: Message-ID: 2011/7/25 St?fan van der Walt > On Mon, Jul 25, 2011 at 12:35 PM, Mark Wiebe wrote: > > I'm trying to make things fit into the existing system as naturally as > > possible. The discussion you're talking about ideally should have > resulted > > in some guideline documentation about namespaces, but I don't recall > seeing > > something like that in a prominent place anywhere. > > Probably should have! Either way, it's something to consider: if we > introduce those functions now, people will start to use them where > they are (np.xyz), introducing another change in usage comes 2.0 (or > 3.0 or whichever). > Absolutely, do you have any suggestions about organizing the datetime functionality? It's as good a place as any to try applying a good namespace convention. -Mark > > Regards > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Jul 25 16:45:46 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 25 Jul 2011 14:45:46 -0600 Subject: [Numpy-discussion] code review/build & test for datetime business day API In-Reply-To: References: Message-ID: 2011/7/25 St?fan van der Walt > Hey all, > > On Tue, Jun 14, 2011 at 4:34 PM, Mark Wiebe wrote: > > These functions are now fully implemented and documented. As always, code > > reviews are welcome here: > > https://github.com/numpy/numpy/pull/87 > > I haven't been keeping up with the datetime developments, but I > noticed the introduction of more names into the root numpy namespace. > About a year (or two?) ago at SciPy, there were discussions about > organising the NumPy namespace for 2.0, and halting the introduction > of new functions into the root namespace. What is the status quo? > > Datetime is now a numpy type, so to that extent is in the base namespace. One could maybe argue that the calender functions belong in another namespace, and some of what is in numpy/lib (poly1d, financial functions) should have been in separate namespaces to begin with. I'm not sure what else in datetime might belong in its own namespace. 
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Mon Jul 25 16:46:12 2011 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 25 Jul 2011 13:46:12 -0700 Subject: [Numpy-discussion] code review/build & test for datetime business day API In-Reply-To: References: Message-ID: On Mon, Jul 25, 2011 at 1:02 PM, Mark Wiebe wrote: >> Probably should have! ?Either way, it's something to consider: if we >> introduce those functions now, people will start to use them where >> they are (np.xyz), introducing another change in usage comes 2.0 (or >> 3.0 or whichever). > > Absolutely, do you have any suggestions about organizing the datetime > functionality? It's as good a place as any to try applying a good namespace > convention. The first thought that comes to mind is simply to keep them in a submodule, so that users can do something like: from numpy.datetime import some_date_func That convention should be very easy to support across the restructuring. The important thing then is to document clearly that these functions are available. Regards St?fan From mwwiebe at gmail.com Mon Jul 25 16:52:48 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Mon, 25 Jul 2011 15:52:48 -0500 Subject: [Numpy-discussion] code review/build & test for datetime business day API In-Reply-To: References: Message-ID: 2011/7/25 St?fan van der Walt > On Mon, Jul 25, 2011 at 1:02 PM, Mark Wiebe wrote: > >> Probably should have! Either way, it's something to consider: if we > >> introduce those functions now, people will start to use them where > >> they are (np.xyz), introducing another change in usage comes 2.0 (or > >> 3.0 or whichever). > > > > Absolutely, do you have any suggestions about organizing the datetime > > functionality? It's as good a place as any to try applying a good > namespace > > convention. > > The first thought that comes to mind is simply to keep them in a > submodule, so that users can do something like: > > from numpy.datetime import some_date_func > > That convention should be very easy to support across the > restructuring. The important thing then is to document clearly that > these functions are available. > Can't use numpy.datetime, since that conflicts with Python's datetime library, especially in pylab. Can't use numpy.datetime64, since that's already the name of the scalar type. I don't like numpy.dt, that name belongs to a delta t variable in Python. I'm not sure what a good name for the namespace is, actually. -Mark > > Regards > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Mon Jul 25 17:00:39 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 25 Jul 2011 23:00:39 +0200 Subject: [Numpy-discussion] code review/build & test for datetime business day API In-Reply-To: References: Message-ID: <20110725210039.GA10839@phare.normalesup.org> On Mon, Jul 25, 2011 at 03:52:48PM -0500, Mark Wiebe wrote: > Can't use numpy.datetime, since that conflicts with Python's datetime > library, especially in pylab. I don't understand that: isn't the point of namespaces to avoid those naming conflicts. To me that's just like saying that numpy.sum shouldn't be named sum because it would conflict with the sum builtin. 
My 2 euro cents (a endangered currency) Gael From charlesr.harris at gmail.com Mon Jul 25 17:11:00 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 25 Jul 2011 15:11:00 -0600 Subject: [Numpy-discussion] code review/build & test for datetime business day API In-Reply-To: <20110725210039.GA10839@phare.normalesup.org> References: <20110725210039.GA10839@phare.normalesup.org> Message-ID: On Mon, Jul 25, 2011 at 3:00 PM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Mon, Jul 25, 2011 at 03:52:48PM -0500, Mark Wiebe wrote: > > Can't use numpy.datetime, since that conflicts with Python's datetime > > library, especially in pylab. > > I don't understand that: isn't the point of namespaces to avoid those > naming conflicts. To me that's just like saying that numpy.sum shouldn't > be named sum because it would conflict with the sum builtin. > > It's just asking for import problems and general confusion to shadow a Python module, that's why we renamed io to npyio. I'm curious as to what would be in the module. My 2 euro cents (a endangered currency) > > Aren't they all? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Mon Jul 25 17:12:00 2011 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 25 Jul 2011 14:12:00 -0700 Subject: [Numpy-discussion] code review/build & test for datetime business day API In-Reply-To: References: Message-ID: On Mon, Jul 25, 2011 at 1:52 PM, Mark Wiebe wrote: > Can't use numpy.datetime, since that conflicts with Python's datetime > library, especially in pylab. We're allowed to name the modules under numpy whatever we like--people know that doing "from numpy import *" can (and already does) cause havoc. But maybe "numpy.time" would suffice as a grouping. Regards St?fan From stefan at sun.ac.za Mon Jul 25 17:13:15 2011 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 25 Jul 2011 14:13:15 -0700 Subject: [Numpy-discussion] code review/build & test for datetime business day API In-Reply-To: References: <20110725210039.GA10839@phare.normalesup.org> Message-ID: On Mon, Jul 25, 2011 at 2:11 PM, Charles R Harris wrote: > It's just asking for import problems and general confusion to shadow a > Python module, that's why we renamed io to npyio. Why? Users can simply do import numpy.io as npyio ? St?fan From ben.root at ou.edu Mon Jul 25 17:19:10 2011 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 25 Jul 2011 16:19:10 -0500 Subject: [Numpy-discussion] code review/build & test for datetime business day API In-Reply-To: <20110725210039.GA10839@phare.normalesup.org> References: <20110725210039.GA10839@phare.normalesup.org> Message-ID: On Monday, July 25, 2011, Gael Varoquaux wrote: > On Mon, Jul 25, 2011 at 03:52:48PM -0500, Mark Wiebe wrote: >> Can't use numpy.datetime, since that conflicts with Python's datetime >> library, especially in pylab. > > I don't understand that: isn't the point of namespaces to avoid those > naming conflicts. To me that's just like saying that numpy.sum shouldn't > be named sum because it would conflict with the sum builtin. > > My 2 euro cents (a endangered currency) > > Gael > I think the problem is that numpy's datetime might not be a drop-in replacement for python's datetime. For operations in python where one would use python's sum, numpy's sum would produce effectively identical results. 
We even have a few bug reports in our tracker for the use of all and any being ever-so-slightly different than python's, and it can cause some confusion in pylab mode. Admittedly, though, I can't come up with a better name, either. My two cents (also endangered)... Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Mon Jul 25 17:21:10 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 25 Jul 2011 17:21:10 -0400 Subject: [Numpy-discussion] code review/build & test for datetime business day API In-Reply-To: References: <20110725210039.GA10839@phare.normalesup.org> Message-ID: 2011/7/25 St?fan van der Walt : > On Mon, Jul 25, 2011 at 2:11 PM, Charles R Harris > wrote: >> It's just asking for import problems and general confusion to shadow a >> Python module, that's why we renamed io to npyio. > > Why? ?Users can simply do > > import numpy.io as npyio ? > IIRC this was changed because of a (now fixed) bug in 2to3. Skipper From rowen at uw.edu Mon Jul 25 17:21:25 2011 From: rowen at uw.edu (Russell E. Owen) Date: Mon, 25 Jul 2011 14:21:25 -0700 Subject: [Numpy-discussion] Problems with numpy binary for Python2.7 + OS X 10.6 References: <4E1F9E51.9080702@noaa.gov> <4E1FAE47.2020207@uci.edu> <4E235A59.8020002@noaa.gov> <4E2DCB72.3070608@hkl.hms.harvard.edu> Message-ID: In article <4E2DCB72.3070608 at hkl.hms.harvard.edu>, Ian Stokes-Rees wrote: > As best I can tell, I have Python 2.7.2 for my system Python: > > [ijstokes at moose ~]$ python -V > Python 2.7.2 > > [ijstokes at moose ~]$ which python > /Library/Frameworks/Python.framework/Versions/2.7/bin/python > > however when I attempt to install the recent numpy binary > python-2.7.2-macosx10.6.dmg I get stopped at the first stage of the > install procedure with the error: > > numpy 1.6.1 can't be installed on this disk. numpy requires System > Python 2.7 to install. > > Any idea what I might be doing wrong? Is it looking for > /usr/bin/python2.7? For that, I only have up to 2.6 available. (and 2.5) > > Cheers, > > Ian I believe the error message is misleading (a known bug). From the path you are probably running python.org python (though it could be ActiveState or built from source). Assuming it really is python.org, the next question is: which of the two flavors of python.org Python do you have: the "10.3" version (which is 32-bit only, but very backward compatible), or the "10.6" version (which includes 64-bit support but requires MacOS X 10.6 or later)? There is a separate numpy installer for each (and unfortunately they are not listed near each other in the file list). Maybe you got that match wrong? If in doubt you could reinstall python from python.org. -- Russell From charlesr.harris at gmail.com Mon Jul 25 17:29:36 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 25 Jul 2011 15:29:36 -0600 Subject: [Numpy-discussion] code review/build & test for datetime business day API In-Reply-To: References: <20110725210039.GA10839@phare.normalesup.org> Message-ID: 2011/7/25 St?fan van der Walt > On Mon, Jul 25, 2011 at 2:11 PM, Charles R Harris > wrote: > > It's just asking for import problems and general confusion to shadow a > > Python module, that's why we renamed io to npyio. > > Why? Users can simply do > > import numpy.io as npyio ? > > It caused problems with 2to3 for one thing because it was getting imported as io in the package. It is just a bad idea to shadow python modules and best avoided. 
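The sort of confusion at stake, sketched with names numpy already shadows under a star import (the generator case is a real difference in the 1.x behaviour; the last comment is the hypothetical being argued about):

    data = [1, -2, 3]
    all(x > 0 for x in data)        # builtin all: False
    from numpy import *             # roughly what pylab mode does
    all(x > 0 for x in data)        # now numpy's all: the generator is wrapped as a
                                    # 0-d object array, which is simply truthy -> True
    # By the same mechanism, a submodule named numpy.datetime imported as
    # "datetime" would hide the standard-library module without being a
    # drop-in replacement for it.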
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nbigaouette at gmail.com Mon Jul 25 17:52:35 2011 From: nbigaouette at gmail.com (Nicolas Bigaouette) Date: Mon, 25 Jul 2011 17:52:35 -0400 Subject: [Numpy-discussion] Get a 1D slice of a 3D data set? Message-ID: Hi all, I have a 3D orthogonal and non-uniform grid representing a scalar field. I'm using matplotlib.image.NonUniformImage() to plot it similarly to imshow(). What I'd like to do is plot the values of the scalar field across a specific line (say, from point A to B). Any suggestion? Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmane at blindgoat.org Tue Jul 26 11:59:31 2011 From: gmane at blindgoat.org (martin smith) Date: Tue, 26 Jul 2011 11:59:31 -0400 Subject: [Numpy-discussion] Multi-taper spectral analysis package: pymutt. Critical update. Message-ID: The current release, version 0.82.0, contains fixes for two major bugs. The first bug is a show-stopping segmentation fault under some versions of Linux and arises from a variable type mismatch in calls to the numpy api. The second bug causes bad spectral values at the Nyquist frequency for series with even lengths and arises from an error in interfacing to a local version of fftpack. I strongly recommend upgrading to the current release. Pymutt is a module that implements Thomson's multi-taper spectral analysis algorithms. The code is based on a subroutine from Lees and Park and has, of course, a python interface. References are provided in the readme file. The code has seen substantial usage and should be fairly reliable. Examples are included. It's available at http://code.google.com/p/pymutt/. Martin L. Smith Blindgoat Geophysics From Chris.Barker at noaa.gov Tue Jul 26 12:59:22 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 26 Jul 2011 09:59:22 -0700 Subject: [Numpy-discussion] Problems with numpy binary for Python2.7 + OS X 10.6 In-Reply-To: <4E2DCB72.3070608@hkl.hms.harvard.edu> References: <4E1F9E51.9080702@noaa.gov> <4E1FAE47.2020207@uci.edu> <4E235A59.8020002@noaa.gov> <4E2DCB72.3070608@hkl.hms.harvard.edu> Message-ID: <4E2EF26A.7040302@noaa.gov> On 7/25/11 1:00 PM, Ian Stokes-Rees wrote: > As best I can tell, I have Python 2.7.2 for my system Python: > > [ijstokes at moose ~]$ python -V > Python 2.7.2 > > [ijstokes at moose ~]$ which python > /Library/Frameworks/Python.framework/Versions/2.7/bin/python yup -- that is probably the python.org python. However, there are now two different builds of 2.7 for OS-X the "10.3" one, which is 32 bit, PPC+Intel, 10.3.9 and above, and the "10.6" build, which is 32bit_64bit, Intel only, and only runs on 10.6 and above. > however when I attempt to install the recent numpy binary > python-2.7.2-macosx10.6.dmg I get stopped at the first stage of the > install procedure with the error: > > numpy 1.6.1 can't be installed on this disk. numpy requires System > Python 2.7 to install. You need a numpy build that matches the python build, I suspect you have a mis-match. Check carefully which ones you downloaded and make sure they match. > Is it looking for > /usr/bin/python2.7? For that, I only have up to 2.6 available. (and 2.5) No. The "System Python" term is a misnomer that somehow has never gotten fixed in the installer -- what it is really looking for is the python.org build. 
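One quick way to see which interpreter and build you actually have before picking an installer (standard library only, works the same on any of the builds):

    import sys, platform
    print sys.version                 # compiler and build date of the interpreter
    print sys.prefix                  # /Library/Frameworks/... for a python.org install
    print platform.architecture()[0]  # '32bit' or '64bit'
    print platform.mac_ver()[0]       # the OS X version you are running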
HTH, -Chris > Cheers, > > Ian > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From stefan at sun.ac.za Tue Jul 26 16:35:06 2011 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 26 Jul 2011 13:35:06 -0700 Subject: [Numpy-discussion] code review/build & test for datetime business day API In-Reply-To: References: <20110725210039.GA10839@phare.normalesup.org> Message-ID: On Mon, Jul 25, 2011 at 2:29 PM, Charles R Harris wrote: >> Why? ?Users can simply do >> >> import numpy.io as npyio ? >> > > It caused problems with 2to3 for one thing because it was getting imported > as io in the package. It is just a bad idea to shadow python modules and > best avoided. Call me hard-headed, but I feel that "just a bad idea" is not a precise enough justification for obfuscating module names. But then, you are the one working on the code at the moment, so you get to say that :) Cheers St?fan From craigyk at me.com Tue Jul 26 20:11:35 2011 From: craigyk at me.com (Craig Yoshioka) Date: Tue, 26 Jul 2011 17:11:35 -0700 Subject: [Numpy-discussion] lazy loading ndarrays Message-ID: I want to subclass ndarray to create a class for image and volume data, and when referencing a file I'd like to have it load the data only when accessed. That way the class can be used to quickly set and manipulate header values, and won't load data unless necessary. What is the best way to do this? Are there any hooks I can use to load the data when an array's values are first accessed or manipulated? I tried some trickery with __array_interface__ but couldn't get it to work very well. Should I just use a memmapped array, and give up on a purely 'lazy' approach? Thanks, and cheers! -Craig From matthew.brett at gmail.com Tue Jul 26 20:15:59 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 26 Jul 2011 17:15:59 -0700 Subject: [Numpy-discussion] lazy loading ndarrays In-Reply-To: References: Message-ID: Hi, On Tue, Jul 26, 2011 at 5:11 PM, Craig Yoshioka wrote: > I want to subclass ndarray to create a class for image and volume data, and when referencing a file I'd like to have it load the data only when accessed. ?That way the class can be used to quickly set and manipulate header values, and won't load data unless necessary. ?What is the best way to do this? ?Are there any hooks I can use to load the data when an array's values are first accessed or manipulated? ?I tried some trickery with __array_interface__ but couldn't get it to work very well. ?Should I just use a memmapped array, and give up on a purely 'lazy' approach? What kind of images are you loading? We do lazy loading in nibabel, for medical image type formats: http://nipy.sourceforge.net/nibabel/ - but our images _have_ arrays and headers, rather than (appearing to be) arrays. Thus something like: import nibabel as nib img = nib.load('my_image.img') # data not loaded at this point data = img.get_data() # data loaded now. Maybe memmapped if the format allows If you think you might have similar needs, I'd be very happy to help you get going in nibabel... 
Best, Matthew From jkington at wisc.edu Tue Jul 26 20:45:43 2011 From: jkington at wisc.edu (Joe Kington) Date: Tue, 26 Jul 2011 16:45:43 -0800 Subject: [Numpy-discussion] lazy loading ndarrays In-Reply-To: References: Message-ID: Similar to what Matthew said, I often find that it's cleaner to make a seperate class with a "data" (or somesuch) property that lazily loads the numpy array. For example, something like: class DataFormat(object): def __init__(self, filename): self.filename = filename for key, value in self._read_header().iteritems(): setattr(self, key, value) @property def data(self): try: return self._data except AttributeError: self._data = self._read_data() return self._data Hope that helps, -Joe On Tue, Jul 26, 2011 at 4:15 PM, Matthew Brett wrote: > Hi, > > On Tue, Jul 26, 2011 at 5:11 PM, Craig Yoshioka wrote: > > I want to subclass ndarray to create a class for image and volume data, > and when referencing a file I'd like to have it load the data only when > accessed. That way the class can be used to quickly set and manipulate > header values, and won't load data unless necessary. What is the best way > to do this? Are there any hooks I can use to load the data when an array's > values are first accessed or manipulated? I tried some trickery with > __array_interface__ but couldn't get it to work very well. Should I just > use a memmapped array, and give up on a purely 'lazy' approach? > > What kind of images are you loading? We do lazy loading in nibabel, > for medical image type formats: > > http://nipy.sourceforge.net/nibabel/ > > - but our images _have_ arrays and headers, rather than (appearing to > be) arrays. Thus something like: > > import nibabel as nib > > img = nib.load('my_image.img') > # data not loaded at this point > data = img.get_data() > # data loaded now. Maybe memmapped if the format allows > > If you think you might have similar needs, I'd be very happy to help > you get going in nibabel... > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From craigyk at me.com Tue Jul 26 22:41:56 2011 From: craigyk at me.com (Craig Yoshioka) Date: Tue, 26 Jul 2011 19:41:56 -0700 Subject: [Numpy-discussion] lazy loading ndarrays In-Reply-To: References: Message-ID: ok, that was an alternative strategy I was going to try... but not my favorite as I'd have to explicitly perform all operations on the data portion of the object, and given numpy's mechanics, assignment would also have to be explicit, and creating new image objects implicitly would be trickier: image3 = Image(image1) image3.data = ( image1.data + 19.0 ) * image2.data vs. image3 = ( image1 + 19 ) * image2 I suppose option A isn't that bad though and getting lazy loading would be very straightforward.... -- On a side note, I prefer this construct for lazy operations... curious to see what people's reactions are, ie: that's horrible! class lazy_property(object): ''' meant to be used for lazy evaluation of object attributes. should represent non-mutable return value, as whatever is returned replaces itself permanently. 
''' def __init__(self,fget): self.fget = fget def __get__(self,obj,cls): value = self.fget(obj) setattr(obj,self.fget.func_name,value) return value class DataFormat(object): def __init__(self,loader): self.loadData = loader @lazy_property def data(self): return self.loadData() On Jul 26, 2011, at 5:45 PM, Joe Kington wrote: > Similar to what Matthew said, I often find that it's cleaner to make a seperate class with a "data" (or somesuch) property that lazily loads the numpy array. > > For example, something like: > > class DataFormat(object): > def __init__(self, filename): > self.filename = filename > for key, value in self._read_header().iteritems(): > setattr(self, key, value) > > @property > def data(self): > try: > return self._data > except AttributeError: > self._data = self._read_data() > return self._data > > Hope that helps, > -Joe > > On Tue, Jul 26, 2011 at 4:15 PM, Matthew Brett wrote: > Hi, > > On Tue, Jul 26, 2011 at 5:11 PM, Craig Yoshioka wrote: > > I want to subclass ndarray to create a class for image and volume data, and when referencing a file I'd like to have it load the data only when accessed. That way the class can be used to quickly set and manipulate header values, and won't load data unless necessary. What is the best way to do this? Are there any hooks I can use to load the data when an array's values are first accessed or manipulated? I tried some trickery with __array_interface__ but couldn't get it to work very well. Should I just use a memmapped array, and give up on a purely 'lazy' approach? > > What kind of images are you loading? We do lazy loading in nibabel, > for medical image type formats: > > http://nipy.sourceforge.net/nibabel/ > > - but our images _have_ arrays and headers, rather than (appearing to > be) arrays. Thus something like: > > import nibabel as nib > > img = nib.load('my_image.img') > # data not loaded at this point > data = img.get_data() > # data loaded now. Maybe memmapped if the format allows > > If you think you might have similar needs, I'd be very happy to help > you get going in nibabel... > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From nadavh at visionsense.com Wed Jul 27 00:40:01 2011 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 26 Jul 2011 21:40:01 -0700 Subject: [Numpy-discussion] lazy loading ndarrays In-Reply-To: References: , Message-ID: <26FC23E7C398A64083C980D16001012D246DFC5F59@VA3DIAXVS361.RED001.local> For lazy data loading I use memory-mapped array (numpy.memmap): I use it to process multi-image files that are much larger than the available RAM. Nadav. ________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Craig Yoshioka [craigyk at me.com] Sent: 27 July 2011 05:41 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] lazy loading ndarrays ok, that was an alternative strategy I was going to try... 
but not my favorite as I'd have to explicitly perform all operations on the data portion of the object, and given numpy's mechanics, assignment would also have to be explicit, and creating new image objects implicitly would be trickier: image3 = Image(image1) image3.data = ( image1.data + 19.0 ) * image2.data vs. image3 = ( image1 + 19 ) * image2 I suppose option A isn't that bad though and getting lazy loading would be very straightforward.... -- On a side note, I prefer this construct for lazy operations... curious to see what people's reactions are, ie: that's horrible! class lazy_property(object): ''' meant to be used for lazy evaluation of object attributes. should represent non-mutable return value, as whatever is returned replaces itself permanently. ''' def __init__(self,fget): self.fget = fget def __get__(self,obj,cls): value = self.fget(obj) setattr(obj,self.fget.func_name,value) return value class DataFormat(object): def __init__(self,loader): self.loadData = loader @lazy_property def data(self): return self.loadData() On Jul 26, 2011, at 5:45 PM, Joe Kington wrote: Similar to what Matthew said, I often find that it's cleaner to make a seperate class with a "data" (or somesuch) property that lazily loads the numpy array. For example, something like: class DataFormat(object): def __init__(self, filename): self.filename = filename for key, value in self._read_header().iteritems(): setattr(self, key, value) @property def data(self): try: return self._data except AttributeError: self._data = self._read_data() return self._data Hope that helps, -Joe On Tue, Jul 26, 2011 at 4:15 PM, Matthew Brett > wrote: Hi, On Tue, Jul 26, 2011 at 5:11 PM, Craig Yoshioka > wrote: > I want to subclass ndarray to create a class for image and volume data, and when referencing a file I'd like to have it load the data only when accessed. That way the class can be used to quickly set and manipulate header values, and won't load data unless necessary. What is the best way to do this? Are there any hooks I can use to load the data when an array's values are first accessed or manipulated? I tried some trickery with __array_interface__ but couldn't get it to work very well. Should I just use a memmapped array, and give up on a purely 'lazy' approach? What kind of images are you loading? We do lazy loading in nibabel, for medical image type formats: http://nipy.sourceforge.net/nibabel/ - but our images _have_ arrays and headers, rather than (appearing to be) arrays. Thus something like: import nibabel as nib img = nib.load('my_image.img') # data not loaded at this point data = img.get_data() # data loaded now. Maybe memmapped if the format allows If you think you might have similar needs, I'd be very happy to help you get going in nibabel... Best, Matthew _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdickinson at enthought.com Wed Jul 27 04:49:24 2011 From: mdickinson at enthought.com (Mark Dickinson) Date: Wed, 27 Jul 2011 09:49:24 +0100 Subject: [Numpy-discussion] nanmin() fails with 'TypeError: cannot reduce a scalar'. Numpy 1.6.0 regression? 
Message-ID: In NumPy 1.6.0, I get the following behaviour: Python 2.7.2 |EPD 7.1-1 (32-bit)| (default, Jul 3 2011, 15:40:35) [GCC 4.0.1 (Apple Inc. build 5493)] on darwin Type "packages", "demo" or "enthought" for more information. >>> import numpy >>> numpy.nanmin(numpy.ma.masked_array([1,2,3,4])) Traceback (most recent call last): File "", line 1, in File "/Library/Frameworks/Python.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/function_base.py", line 1507, in nanmin return np.fmin.reduce(a.flat) TypeError: cannot reduce on a scalar >>> numpy.__version__ '1.6.0' In NumPy version 1.5.1: Python 2.7.2 |EPD 7.1-1 (32-bit)| (default, Jul 3 2011, 15:40:35) [GCC 4.0.1 (Apple Inc. build 5493)] on darwin Type "packages", "demo" or "enthought" for more information. >>> import numpy >>> numpy.nanmin(numpy.ma.masked_array([1,2,3,4])) 1 >>> numpy.__version__ '1.5.1' Was this change intentional? -- Mark From charlesr.harris at gmail.com Wed Jul 27 08:58:14 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 27 Jul 2011 06:58:14 -0600 Subject: [Numpy-discussion] nanmin() fails with 'TypeError: cannot reduce a scalar'. Numpy 1.6.0 regression? In-Reply-To: References: Message-ID: On Wed, Jul 27, 2011 at 2:49 AM, Mark Dickinson wrote: > In NumPy 1.6.0, I get the following behaviour: > > > Python 2.7.2 |EPD 7.1-1 (32-bit)| (default, Jul 3 2011, 15:40:35) > [GCC 4.0.1 (Apple Inc. build 5493)] on darwin > Type "packages", "demo" or "enthought" for more information. > >>> import numpy > >>> numpy.nanmin(numpy.ma.masked_array([1,2,3,4])) > Traceback (most recent call last): > File "", line 1, in > File > "/Library/Frameworks/Python.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/function_base.py", > line 1507, in nanmin > return np.fmin.reduce(a.flat) > TypeError: cannot reduce on a scalar > >>> numpy.__version__ > '1.6.0' > > > In NumPy version 1.5.1: > > Python 2.7.2 |EPD 7.1-1 (32-bit)| (default, Jul 3 2011, 15:40:35) > [GCC 4.0.1 (Apple Inc. build 5493)] on darwin > Type "packages", "demo" or "enthought" for more information. > >>> import numpy > >>> numpy.nanmin(numpy.ma.masked_array([1,2,3,4])) > 1 > >>> numpy.__version__ > '1.5.1' > > > Was this change intentional? > > No, it comes from this In [2]: a = numpy.ma.masked_array([1,2,3,4]) In [3]: array(a.flat) Out[3]: array(, dtype='object') i.e., the a.flat iterator is turned into an object array with one element. I'm not sure what the correct fix for this would be. Please open a ticket. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mdroe at stsci.edu Wed Jul 27 12:46:29 2011 From: mdroe at stsci.edu (Michael Droettboom) Date: Wed, 27 Jul 2011 12:46:29 -0400 Subject: [Numpy-discussion] Numpy master breaking matplotlib build Message-ID: <4E3040E5.6070102@stsci.edu> The return type of PyArray_BYTES in the old API compatibility code seems to have changed recently to (void *) which breaks matplotlib builds. This pull request changes it back. Is this correct? https://github.com/numpy/numpy/pull/121 Mike From mwwiebe at gmail.com Wed Jul 27 12:57:28 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 27 Jul 2011 11:57:28 -0500 Subject: [Numpy-discussion] Numpy master breaking matplotlib build In-Reply-To: <4E3040E5.6070102@stsci.edu> References: <4E3040E5.6070102@stsci.edu> Message-ID: Looks good. It might be good to change it back to (void *) for the PyArray_DATA inline function as well, I changed that during lots of tweaking to get things to build properly. 
-Mark On Wed, Jul 27, 2011 at 11:46 AM, Michael Droettboom wrote: > The return type of PyArray_BYTES in the old API compatibility code seems > to have changed recently to (void *) which breaks matplotlib builds. > This pull request changes it back. Is this correct? > > https://github.com/numpy/numpy/pull/121 > > Mike > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ischnell at enthought.com Wed Jul 27 13:17:53 2011 From: ischnell at enthought.com (Ilan Schnell) Date: Wed, 27 Jul 2011 12:17:53 -0500 Subject: [Numpy-discussion] numpy.sqrt behaving differently on MacOS Lion Message-ID: MacOS Lion: >>> numpy.sqrt([complex(numpy.nan, numpy.inf)]) array([ nan+infj]) Other all system: array([ inf+infj]) This causes a few numpy tests to fail on Lion. The numpy was not compiled using the new LLVM based gcc, it is the same numpy binary I used on other MacOS systems, which was compiled using gcc-4.0.1. However on Lion it is linked to Lions LLVM based gcc runtime, which apparently has some different behavior when it comes to such strange complex values. - Ilan From matthew.brett at gmail.com Wed Jul 27 13:50:09 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 27 Jul 2011 18:50:09 +0100 Subject: [Numpy-discussion] dtype repr change? Message-ID: Hi, I see that (current trunk): In [9]: np.ones((1,), dtype=bool) Out[9]: array([ True], dtype='bool') - whereas (1.5.1): In [2]: np.ones((1,), dtype=bool) Out[2]: array([ True], dtype=bool) That is breaking quite a few doctests. What is the reason for the change? Something to do with more planned dtypes? Thanks a lot, Matthew From mwwiebe at gmail.com Wed Jul 27 13:54:36 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 27 Jul 2011 12:54:36 -0500 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: Message-ID: This was the most consistent way to deal with the parameterized dtype in the repr, making it more future-proof at the same time. It was producing reprs like "array(['2011-01-01'], dtype=datetime64[D])", which is clearly wrong, and putting quotes around it makes it work in general for all possible dtypes, present and future. -Mark On Wed, Jul 27, 2011 at 12:50 PM, Matthew Brett wrote: > Hi, > > I see that (current trunk): > > In [9]: np.ones((1,), dtype=bool) > Out[9]: array([ True], dtype='bool') > > - whereas (1.5.1): > > In [2]: np.ones((1,), dtype=bool) > Out[2]: array([ True], dtype=bool) > > That is breaking quite a few doctests. What is the reason for the > change? Something to do with more planned dtypes? > > Thanks a lot, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Wed Jul 27 13:58:29 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 27 Jul 2011 19:58:29 +0200 Subject: [Numpy-discussion] numpy.sqrt behaving differently on MacOS Lion In-Reply-To: References: Message-ID: On Wed, Jul 27, 2011 at 7:17 PM, Ilan Schnell wrote: > MacOS Lion: > >>> numpy.sqrt([complex(numpy.nan, numpy.inf)]) > array([ nan+infj]) > > Other all system: > array([ inf+infj]) > > This causes a few numpy tests to fail on Lion. 
The numpy > was not compiled using the new LLVM based gcc, it is the > same numpy binary I used on other MacOS systems, which > was compiled using gcc-4.0.1. However on Lion it is linked > to Lions LLVM based gcc runtime, which apparently has some > different behavior when it comes to such strange complex > values. > > These type of complex corner cases fail on several other platforms, there they are marked as skipped. I propose not to start changing this yet - the compiler change is causing problems with scipy ( http://projects.scipy.org/scipy/ticket/1476) and it's not yet clear what the recommended build setup on Lion should be. Regarding binaries, it may be better to distribute separate ones for each version of OS X from numpy 1.7 / 2.0 (we already do for python 2.7). In that case this particular failure will not occur. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jul 27 14:01:15 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 27 Jul 2011 19:01:15 +0100 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: Message-ID: Hi, On Wed, Jul 27, 2011 at 6:54 PM, Mark Wiebe wrote: > This was the most consistent way to deal with the parameterized dtype in the > repr, making it more future-proof at the same time. It was producing reprs > like "array(['2011-01-01'], dtype=datetime64[D])", which is clearly wrong, > and putting quotes around it makes it work in general for all possible > dtypes, present and future. I don't know about you, but I find maintaining doctests across versions changes rather tricky. For our projects, doctests are important as part of the automated tests. At the moment this means that many doctests will break between 1.5.1 and 2.0. What do you think the best way round this problem? See you, Matthew From ischnell at enthought.com Wed Jul 27 14:18:14 2011 From: ischnell at enthought.com (Ilan Schnell) Date: Wed, 27 Jul 2011 13:18:14 -0500 Subject: [Numpy-discussion] numpy.sqrt behaving differently on MacOS Lion In-Reply-To: References: Message-ID: Thanks for you quick response Ralf. Regarding binaries, we are trying to avoid to different EPD binaries for different versions of OSX, as maintaining/distributing/testing more binaries is quite expensive. - Ilan On Wed, Jul 27, 2011 at 12:58 PM, Ralf Gommers wrote: > > > On Wed, Jul 27, 2011 at 7:17 PM, Ilan Schnell > wrote: >> >> MacOS Lion: >> >>> numpy.sqrt([complex(numpy.nan, numpy.inf)]) >> array([ nan+infj]) >> >> Other all system: >> array([ inf+infj]) >> >> This causes a few numpy tests to fail on Lion. ?The numpy >> was not compiled using the new LLVM based gcc, it is the >> same numpy binary I used on other MacOS systems, which >> was compiled using gcc-4.0.1. ?However on Lion it is linked >> to Lions LLVM based gcc runtime, which apparently has some >> different behavior when it comes to such strange complex >> values. >> > These type of complex corner cases fail on several other platforms, there > they are marked as skipped. I propose not to start changing this yet - the > compiler change is causing problems with scipy > (http://projects.scipy.org/scipy/ticket/1476) and it's not yet clear what > the recommended build setup on Lion should be. > > Regarding binaries, it may be better to distribute separate ones for each > version of OS X from numpy 1.7 / 2.0 (we already do for python 2.7). In that > case this particular failure will not occur. 
> > Cheers, > Ralf > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From ralf.gommers at googlemail.com Wed Jul 27 15:00:10 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 27 Jul 2011 21:00:10 +0200 Subject: [Numpy-discussion] numpy.sqrt behaving differently on MacOS Lion In-Reply-To: References: Message-ID: On Wed, Jul 27, 2011 at 8:18 PM, Ilan Schnell wrote: > Thanks for you quick response Ralf. Regarding binaries, we are > trying to avoid to different EPD binaries for different versions of OSX, > as maintaining/distributing/testing more binaries is quite expensive. > > Agreed, it can be expensive. However, for numpy the main time sink is maintaining 10.5/10.6/10.7 systems and build environments on them, which probably can't be avoided anyway. Building and uploading binaries is quick. And more reliable if build_OS_version == usage_OS_version. I can imagine for EPD the situation is different because it's so large. Ralf > > On Wed, Jul 27, 2011 at 12:58 PM, Ralf Gommers > wrote: > > > > > > On Wed, Jul 27, 2011 at 7:17 PM, Ilan Schnell > > wrote: > >> > >> MacOS Lion: > >> >>> numpy.sqrt([complex(numpy.nan, numpy.inf)]) > >> array([ nan+infj]) > >> > >> Other all system: > >> array([ inf+infj]) > >> > >> This causes a few numpy tests to fail on Lion. The numpy > >> was not compiled using the new LLVM based gcc, it is the > >> same numpy binary I used on other MacOS systems, which > >> was compiled using gcc-4.0.1. However on Lion it is linked > >> to Lions LLVM based gcc runtime, which apparently has some > >> different behavior when it comes to such strange complex > >> values. > >> > > These type of complex corner cases fail on several other platforms, there > > they are marked as skipped. I propose not to start changing this yet - > the > > compiler change is causing problems with scipy > > (http://projects.scipy.org/scipy/ticket/1476) and it's not yet clear > what > > the recommended build setup on Lion should be. > > > > Regarding binaries, it may be better to distribute separate ones for each > > version of OS X from numpy 1.7 / 2.0 (we already do for python 2.7). In > that > > case this particular failure will not occur. > > > > Cheers, > > Ralf > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rowen at uw.edu Wed Jul 27 15:00:56 2011 From: rowen at uw.edu (Russell E. Owen) Date: Wed, 27 Jul 2011 12:00:56 -0700 Subject: [Numpy-discussion] numpy.sqrt behaving differently on MacOS Lion References: Message-ID: In article , Ralf Gommers wrote: > On Wed, Jul 27, 2011 at 7:17 PM, Ilan Schnell wrote: > > > MacOS Lion: > > >>> numpy.sqrt([complex(numpy.nan, numpy.inf)]) > > array([ nan+infj]) > > > > Other all system: > > array([ inf+infj]) > > > > This causes a few numpy tests to fail on Lion. The numpy > > was not compiled using the new LLVM based gcc, it is the > > same numpy binary I used on other MacOS systems, which > > was compiled using gcc-4.0.1. 
However on Lion it is linked > > to Lions LLVM based gcc runtime, which apparently has some > > different behavior when it comes to such strange complex > > values. > > > > These type of complex corner cases fail on several other platforms, there > they are marked as skipped. I propose not to start changing this yet - the > compiler change is causing problems with scipy ( > http://projects.scipy.org/scipy/ticket/1476) and it's not yet clear what the > recommended build setup on Lion should be. > > Regarding binaries, it may be better to distribute separate ones for each > version of OS X from numpy 1.7 / 2.0 (we already do for python 2.7). In that > case this particular failure will not occur. Please don't distribute a different numpy binary for each version of MacOS X. That makes it very difficult to distribute bundled applications. The current situation is very reasonable, in my opinion: numpy has two Mac binary distributions for Python 2.7: 32-bit 10.3-and-up and 64-bit 10.6-and-up. These match the python.org python distributions. I can't see wanting any more than one per python.org Mac binary. Note that the numpy Mac binaries are not listed next to each other on the numpy sourceforge download page, so some folks are installing the wrong one. If you add even more os-specific flavors the problem is likely to get worse. -- Russell From charlesr.harris at gmail.com Wed Jul 27 15:20:07 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 27 Jul 2011 13:20:07 -0600 Subject: [Numpy-discussion] nanmin() fails with 'TypeError: cannot reduce a scalar'. Numpy 1.6.0 regression? In-Reply-To: References: Message-ID: On Wed, Jul 27, 2011 at 6:58 AM, Charles R Harris wrote: > > > On Wed, Jul 27, 2011 at 2:49 AM, Mark Dickinson wrote: > >> In NumPy 1.6.0, I get the following behaviour: >> >> >> Python 2.7.2 |EPD 7.1-1 (32-bit)| (default, Jul 3 2011, 15:40:35) >> [GCC 4.0.1 (Apple Inc. build 5493)] on darwin >> Type "packages", "demo" or "enthought" for more information. >> >>> import numpy >> >>> numpy.nanmin(numpy.ma.masked_array([1,2,3,4])) >> Traceback (most recent call last): >> File "", line 1, in >> File >> "/Library/Frameworks/Python.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/function_base.py", >> line 1507, in nanmin >> return np.fmin.reduce(a.flat) >> TypeError: cannot reduce on a scalar >> >>> numpy.__version__ >> '1.6.0' >> >> >> In NumPy version 1.5.1: >> >> Python 2.7.2 |EPD 7.1-1 (32-bit)| (default, Jul 3 2011, 15:40:35) >> [GCC 4.0.1 (Apple Inc. build 5493)] on darwin >> Type "packages", "demo" or "enthought" for more information. >> >>> import numpy >> >>> numpy.nanmin(numpy.ma.masked_array([1,2,3,4])) >> 1 >> >>> numpy.__version__ >> '1.5.1' >> >> >> Was this change intentional? >> >> > No, it comes from this > > In [2]: a = numpy.ma.masked_array([1,2,3,4]) > > In [3]: array(a.flat) > Out[3]: array(, > dtype='object') > > i.e., the a.flat iterator is turned into an object array with one element. > I'm not sure what the correct fix for this would be. Please open a ticket. > > In fact, array no longer recognizes iterators, but a.flat works, so I assume the __array__ attribute of the array iterator is at work. I think nanmin needs to be fixed, because it used a.flat for speed, but it looks like something closer to 'asflat' is needed. In addition, array probably needs to be fixed to accept iterators, I think it used to. How did nanmin interact with the mask of masked arrays in earlier versions? 
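For reference, a small sketch of the failing path and of alternatives that do work (illustrative only, not a proposed fix):

    import numpy as np
    a = np.ma.masked_array([1, 2, 3, 4])

    # 1.6.0 fast path: np.fmin.reduce(a.flat) -- the masked array's flat
    # iterator ends up wrapped in a 0-d object array, hence
    # "TypeError: cannot reduce on a scalar".

    np.fmin.reduce(np.asarray(a).ravel())   # reduce over the underlying data -> 1
                                             # (the mask is ignored here)
    a.min()                                  # the mask-aware reduction -> 1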
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Jul 27 15:25:39 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 27 Jul 2011 14:25:39 -0500 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: Message-ID: On Wed, Jul 27, 2011 at 1:01 PM, Matthew Brett wrote: > Hi, > > On Wed, Jul 27, 2011 at 6:54 PM, Mark Wiebe wrote: > > This was the most consistent way to deal with the parameterized dtype in > the > > repr, making it more future-proof at the same time. It was producing > reprs > > like "array(['2011-01-01'], dtype=datetime64[D])", which is clearly > wrong, > > and putting quotes around it makes it work in general for all possible > > dtypes, present and future. > > I don't know about you, but I find maintaining doctests across > versions changes rather tricky. For our projects, doctests are > important as part of the automated tests. At the moment this means > that many doctests will break between 1.5.1 and 2.0. What do you > think the best way round this problem? > I'm not sure what the best approach is. I think the primary use of doctests should be to validate that the documentation matches the implementation, and anything confirming aspects of a software system should be regular tests. In NumPy, there are platform-dependent differences in 32 vs 64 bit and big vs little endian, so the part of the system that changed already couldn't be relied on consistently. I prefer systems where the code output in the documentation is generated as part of the documentation build process instead of being included in the documentation source files. Cheers, Mark > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Jul 27 15:32:17 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 27 Jul 2011 14:32:17 -0500 Subject: [Numpy-discussion] nanmin() fails with 'TypeError: cannot reduce a scalar'. Numpy 1.6.0 regression? In-Reply-To: References: Message-ID: On Wed, Jul 27, 2011 at 2:20 PM, Charles R Harris wrote: > > > On Wed, Jul 27, 2011 at 6:58 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Wed, Jul 27, 2011 at 2:49 AM, Mark Dickinson > > wrote: >> >>> In NumPy 1.6.0, I get the following behaviour: >>> >>> >>> Python 2.7.2 |EPD 7.1-1 (32-bit)| (default, Jul 3 2011, 15:40:35) >>> [GCC 4.0.1 (Apple Inc. build 5493)] on darwin >>> Type "packages", "demo" or "enthought" for more information. >>> >>> import numpy >>> >>> numpy.nanmin(numpy.ma.masked_array([1,2,3,4])) >>> Traceback (most recent call last): >>> File "", line 1, in >>> File >>> "/Library/Frameworks/Python.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/function_base.py", >>> line 1507, in nanmin >>> return np.fmin.reduce(a.flat) >>> TypeError: cannot reduce on a scalar >>> >>> numpy.__version__ >>> '1.6.0' >>> >>> >>> In NumPy version 1.5.1: >>> >>> Python 2.7.2 |EPD 7.1-1 (32-bit)| (default, Jul 3 2011, 15:40:35) >>> [GCC 4.0.1 (Apple Inc. build 5493)] on darwin >>> Type "packages", "demo" or "enthought" for more information. >>> >>> import numpy >>> >>> numpy.nanmin(numpy.ma.masked_array([1,2,3,4])) >>> 1 >>> >>> numpy.__version__ >>> '1.5.1' >>> >>> >>> Was this change intentional? 
>>> >>> >> No, it comes from this >> >> In [2]: a = numpy.ma.masked_array([1,2,3,4]) >> >> In [3]: array(a.flat) >> Out[3]: array(, >> dtype='object') >> >> i.e., the a.flat iterator is turned into an object array with one >> element. I'm not sure what the correct fix for this would be. Please open a >> ticket. >> >> > In fact, array no longer recognizes iterators, but a.flat works, so I > assume the __array__ attribute of the array iterator is at work. I think > nanmin needs to be fixed, because it used a.flat for speed, but it looks > like something closer to 'asflat' is needed. In addition, array probably > needs to be fixed to accept iterators, I think it used to. > I'd guess this slipped through with something I changed when I was in the array construction part of the system, because the test suite doesn't exercise this. Any NumPy behavior we want preserved needs tests! -Mark > > How did nanmin interact with the mask of masked arrays in earlier versions? > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Wed Jul 27 15:35:02 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 27 Jul 2011 21:35:02 +0200 Subject: [Numpy-discussion] numpy.sqrt behaving differently on MacOS Lion In-Reply-To: References: Message-ID: On Wed, Jul 27, 2011 at 9:00 PM, Russell E. Owen wrote: > In article > , > Ralf Gommers wrote: > > > On Wed, Jul 27, 2011 at 7:17 PM, Ilan Schnell >wrote: > > > > > MacOS Lion: > > > >>> numpy.sqrt([complex(numpy.nan, numpy.inf)]) > > > array([ nan+infj]) > > > > > > Other all system: > > > array([ inf+infj]) > > > > > > This causes a few numpy tests to fail on Lion. The numpy > > > was not compiled using the new LLVM based gcc, it is the > > > same numpy binary I used on other MacOS systems, which > > > was compiled using gcc-4.0.1. However on Lion it is linked > > > to Lions LLVM based gcc runtime, which apparently has some > > > different behavior when it comes to such strange complex > > > values. > > > > > > These type of complex corner cases fail on several other platforms, > there > > they are marked as skipped. I propose not to start changing this yet - > the > > compiler change is causing problems with scipy ( > > http://projects.scipy.org/scipy/ticket/1476) and it's not yet clear what > the > > recommended build setup on Lion should be. > > > > Regarding binaries, it may be better to distribute separate ones for each > > version of OS X from numpy 1.7 / 2.0 (we already do for python 2.7). In > that > > case this particular failure will not occur. > > Please don't distribute a different numpy binary for each version of > MacOS X. That makes it very difficult to distribute bundled applications. > > The current situation is very reasonable, in my opinion: numpy has two > Mac binary distributions for Python 2.7: 32-bit 10.3-and-up and 64-bit > 10.6-and-up. These match the python.org python distributions. I can't > see wanting any more than one per python.org Mac binary. > If 10.6-built binaries are going to work without problems on 10.7 - also for scipy - then two versions is enough. I'm not yet confident this will be the case though. Do the tests for the current 10.6 scipy installer pass on 10.7? And do the 10.3-and-up Python 2.7 and 3.2 binaries work on 10.7? 
Those are explicitly listed as 10.3-10.6 (not 10.7 ...). > Note that the numpy Mac binaries are not listed next to each other on > the numpy sourceforge download page, so some folks are installing the > wrong one. That unfortunately can't be changed, unless I re-upload everything in the desired order. The SF interface has been rewritten, but it's not much of an improvement. Ralf > If you add even more os-specific flavors the problem is > likely to get worse. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jul 27 15:44:21 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 27 Jul 2011 12:44:21 -0700 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: Message-ID: Hi, On Wed, Jul 27, 2011 at 12:25 PM, Mark Wiebe wrote: > On Wed, Jul 27, 2011 at 1:01 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Wed, Jul 27, 2011 at 6:54 PM, Mark Wiebe wrote: >> > This was the most consistent way to deal with the parameterized dtype in >> > the >> > repr, making it more future-proof at the same time. It was producing >> > reprs >> > like "array(['2011-01-01'], dtype=datetime64[D])", which is clearly >> > wrong, >> > and putting quotes around it makes it work in general for all possible >> > dtypes, present and future. >> >> I don't know about you, but I find maintaining doctests across >> versions changes rather tricky. ?For our projects, doctests are >> important as part of the automated tests. ?At the moment this means >> that many doctests will break between 1.5.1 and 2.0. ?What do you >> think the best way round this problem? > > I'm not sure what the best approach is. I think the primary use of doctests > should be to validate that the documentation matches the implementation, and > anything confirming aspects of a software system should be regular tests. > ?In NumPy, there are platform-dependent differences in 32 vs 64 bit and big > vs little endian, so the part of the system that changed already couldn't be > relied on consistently.?I prefer systems where the code output in the > documentation is generated as part of the documentation build process > instead of being included in the documentation source files. Would it be fair to summarize your reply as 'just deal with it'? See you, Matthew From mwwiebe at gmail.com Wed Jul 27 15:47:40 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 27 Jul 2011 14:47:40 -0500 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: Message-ID: On Wed, Jul 27, 2011 at 2:44 PM, Matthew Brett wrote: > Hi, > > On Wed, Jul 27, 2011 at 12:25 PM, Mark Wiebe wrote: > > On Wed, Jul 27, 2011 at 1:01 PM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> On Wed, Jul 27, 2011 at 6:54 PM, Mark Wiebe wrote: > >> > This was the most consistent way to deal with the parameterized dtype > in > >> > the > >> > repr, making it more future-proof at the same time. It was producing > >> > reprs > >> > like "array(['2011-01-01'], dtype=datetime64[D])", which is clearly > >> > wrong, > >> > and putting quotes around it makes it work in general for all possible > >> > dtypes, present and future. > >> > >> I don't know about you, but I find maintaining doctests across > >> versions changes rather tricky. For our projects, doctests are > >> important as part of the automated tests. At the moment this means > >> that many doctests will break between 1.5.1 and 2.0. What do you > >> think the best way round this problem? > > > > I'm not sure what the best approach is. 
I think the primary use of > doctests > > should be to validate that the documentation matches the implementation, > and > > anything confirming aspects of a software system should be regular tests. > > In NumPy, there are platform-dependent differences in 32 vs 64 bit and > big > > vs little endian, so the part of the system that changed already couldn't > be > > relied on consistently. I prefer systems where the code output in the > > documentation is generated as part of the documentation build process > > instead of being included in the documentation source files. > > Would it be fair to summarize your reply as 'just deal with it'? > I'm not sure what else I can do to help you, since I think this aspect of the system should be subject to arbitrary improvement. My recommendation is in general not to use doctests as if they were regular tests. I'd rather not back out the improvements to repr, if that's what you're suggesting should happen. Do you have any other ideas? -Mark > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Jul 27 16:09:07 2011 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 27 Jul 2011 15:09:07 -0500 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: Message-ID: On Wed, Jul 27, 2011 at 14:47, Mark Wiebe wrote: > On Wed, Jul 27, 2011 at 2:44 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Wed, Jul 27, 2011 at 12:25 PM, Mark Wiebe wrote: >> > On Wed, Jul 27, 2011 at 1:01 PM, Matthew Brett >> > wrote: >> >> >> >> Hi, >> >> >> >> On Wed, Jul 27, 2011 at 6:54 PM, Mark Wiebe wrote: >> >> > This was the most consistent way to deal with the parameterized dtype >> >> > in >> >> > the >> >> > repr, making it more future-proof at the same time. It was producing >> >> > reprs >> >> > like "array(['2011-01-01'], dtype=datetime64[D])", which is clearly >> >> > wrong, >> >> > and putting quotes around it makes it work in general for all >> >> > possible >> >> > dtypes, present and future. >> >> >> >> I don't know about you, but I find maintaining doctests across >> >> versions changes rather tricky. ?For our projects, doctests are >> >> important as part of the automated tests. ?At the moment this means >> >> that many doctests will break between 1.5.1 and 2.0. ?What do you >> >> think the best way round this problem? >> > >> > I'm not sure what the best approach is. I think the primary use of >> > doctests >> > should be to validate that the documentation matches the implementation, >> > and >> > anything confirming aspects of a software system should be regular >> > tests. >> > ?In NumPy, there are platform-dependent differences in 32 vs 64 bit and >> > big >> > vs little endian, so the part of the system that changed already >> > couldn't be >> > relied on consistently.?I prefer systems where the code output in the >> > documentation is generated as part of the documentation build process >> > instead of being included in the documentation source files. >> >> Would it be fair to summarize your reply as 'just deal with it'? > > I'm not sure what else I can do to help you, since I think this aspect of > the system should be subject to arbitrary improvement. My recommendation is > in general not to use doctests as if they were regular tests. 
I'd rather not > back out the improvements to repr, if that's what you're suggesting should > happen. Do you have any other ideas? In general, I tend to agree that doctests are not always appropriate. They tend to "overtest" and express things that the tester did not intend. It's just the nature of doctests that you have to accept if you want to use them. In this case, the tester wanted to test that the contents of the array were particular values and that it was a boolean array. Instead, it tested the precise bytes of the repr of the array. The repr of ndarrays are not a stable API, and we don't make guarantees about the precise details of its behavior from version to version. doctests work better to test simpler types and methods that do not have such complicated reprs. Yes, even as part of an automated test suite for functionality, not just to ensure the compliance of documentation examples. That said, you could only quote the dtypes that require the extra [syntax] and leave the current, simpler dtypes alone. That's a pragmatic compromise to the reality of the situation, which is that people do have extensive doctest suites already around, without removing your ability to innovate with the representations of the new dtypes. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From mwwiebe at gmail.com Wed Jul 27 16:12:08 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 27 Jul 2011 15:12:08 -0500 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: Message-ID: On Wed, Jul 27, 2011 at 3:09 PM, Robert Kern wrote: > On Wed, Jul 27, 2011 at 14:47, Mark Wiebe wrote: > > On Wed, Jul 27, 2011 at 2:44 PM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> On Wed, Jul 27, 2011 at 12:25 PM, Mark Wiebe wrote: > >> > On Wed, Jul 27, 2011 at 1:01 PM, Matthew Brett < > matthew.brett at gmail.com> > >> > wrote: > >> >> > >> >> Hi, > >> >> > >> >> On Wed, Jul 27, 2011 at 6:54 PM, Mark Wiebe > wrote: > >> >> > This was the most consistent way to deal with the parameterized > dtype > >> >> > in > >> >> > the > >> >> > repr, making it more future-proof at the same time. It was > producing > >> >> > reprs > >> >> > like "array(['2011-01-01'], dtype=datetime64[D])", which is clearly > >> >> > wrong, > >> >> > and putting quotes around it makes it work in general for all > >> >> > possible > >> >> > dtypes, present and future. > >> >> > >> >> I don't know about you, but I find maintaining doctests across > >> >> versions changes rather tricky. For our projects, doctests are > >> >> important as part of the automated tests. At the moment this means > >> >> that many doctests will break between 1.5.1 and 2.0. What do you > >> >> think the best way round this problem? > >> > > >> > I'm not sure what the best approach is. I think the primary use of > >> > doctests > >> > should be to validate that the documentation matches the > implementation, > >> > and > >> > anything confirming aspects of a software system should be regular > >> > tests. > >> > In NumPy, there are platform-dependent differences in 32 vs 64 bit > and > >> > big > >> > vs little endian, so the part of the system that changed already > >> > couldn't be > >> > relied on consistently. 
I prefer systems where the code output in the > >> > documentation is generated as part of the documentation build process > >> > instead of being included in the documentation source files. > >> > >> Would it be fair to summarize your reply as 'just deal with it'? > > > > I'm not sure what else I can do to help you, since I think this aspect of > > the system should be subject to arbitrary improvement. My recommendation > is > > in general not to use doctests as if they were regular tests. I'd rather > not > > back out the improvements to repr, if that's what you're suggesting > should > > happen. Do you have any other ideas? > > In general, I tend to agree that doctests are not always appropriate. > They tend to "overtest" and express things that the tester did not > intend. It's just the nature of doctests that you have to accept if > you want to use them. In this case, the tester wanted to test that the > contents of the array were particular values and that it was a boolean > array. Instead, it tested the precise bytes of the repr of the array. > The repr of ndarrays are not a stable API, and we don't make > guarantees about the precise details of its behavior from version to > version. doctests work better to test simpler types and methods that > do not have such complicated reprs. Yes, even as part of an automated > test suite for functionality, not just to ensure the compliance of > documentation examples. > > That said, you could only quote the dtypes that require the extra > [syntax] and leave the current, simpler dtypes alone. That's a > pragmatic compromise to the reality of the situation, which is that > people do have extensive doctest suites already around, without > removing your ability to innovate with the representations of the new > dtypes. > That sounds reasonable to me, and I'm happy to review pull requests from anyone who has time to do this change. -Mark > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Wed Jul 27 16:23:36 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 27 Jul 2011 13:23:36 -0700 Subject: [Numpy-discussion] numpy.sqrt behaving differently on MacOS Lion In-Reply-To: References: Message-ID: <4E3073C8.8060601@noaa.gov> On 7/27/11 12:35 PM, Ralf Gommers wrote: > Please don't distribute a different numpy binary for each version of > MacOS X. +1 ! > If 10.6-built binaries are going to work without problems on 10.7 - also > for scipy - then two versions is enough. I'm not yet confident this will > be the case though. Unless Apple has really broken things (and they usually don't in this way), that should be fine. However, a potential arise when folks want to build their own extensions against the python.org (and numpy) binaries. As I understand it, you can not build extensions to the 32 bit 10.3 binary on Lion, because Apple has not distributed the 10.4 sdk with XCode (nor does it support PPC compilation) But I think the 10.6+ binaries are fine. 
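A quick way to check what a given Python (and therefore the extensions built against it) is actually targeting, using only standard-library calls -- the values in the comments are just examples, not a claim about any particular installer:

----------------
import platform
import sysconfig
from distutils.util import get_platform

print(platform.mac_ver()[0])        # OS actually running, e.g. '10.7'
print(get_platform())               # build target, e.g. 'macosx-10.6-intel'
print(sysconfig.get_config_var('MACOSX_DEPLOYMENT_TARGET'))   # e.g. '10.6'
print(platform.architecture()[0])   # '32bit' or '64bit'
----------------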
( wish we had 10.5+ Intel only binaries, as I still need to support 10.5, but there are reasons that wasn't done) No Lion here just yet, so I can't test -- hopefully soon. > Do the tests for the current 10.6 scipy installer > pass on 10.7? And do the 10.3-and-up Python 2.7 and 3.2 binaries work on > 10.7? Those are explicitly listed as 10.3-10.6 (not 10.7 ...). they'll work (the 10.6, not 10.7 is because 10.7 didn't exist yet) -- with the exception of the above, which is, unfortunately, a common numpy/scipy use case. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From ischnell at enthought.com Wed Jul 27 16:33:56 2011 From: ischnell at enthought.com (Ilan Schnell) Date: Wed, 27 Jul 2011 15:33:56 -0500 Subject: [Numpy-discussion] numpy.sqrt behaving differently on MacOS Lion In-Reply-To: <4E3073C8.8060601@noaa.gov> References: <4E3073C8.8060601@noaa.gov> Message-ID: > Please don't distribute a different numpy binary for each version of > MacOS X. +1 Maybe I should mention that I just finished testing all Python packages in EPD under 10.7, and everything (execpt numpy.sqr for weird complex values such as inf/nan) works fine! In particular building C and Fortran extensions with the new LLVM based gcc and importing them into Python (both 32 and 64-bit). There are two MacOS builds of EPD (one 32-bit and 64-bit), they are compiled on 10.5 using gcc 4.0.1 and then tested on 10.5, 10.6 and 10.7. - Ilan On Wed, Jul 27, 2011 at 3:23 PM, Christopher Barker wrote: > On 7/27/11 12:35 PM, Ralf Gommers wrote: >> ? ? Please don't distribute a different numpy binary for each version of >> ? ? MacOS X. > > +1 ! > >> If 10.6-built binaries are going to work without problems on 10.7 - also >> for scipy - then two versions is enough. I'm not yet confident this will >> be the case though. > > Unless Apple has really broken things (and they usually don't in this > way), that should be fine. > > However, a potential arise when folks want to build their own extensions > against the python.org (and numpy) binaries. > > As I understand it, you can not build extensions to the 32 bit 10.3 > binary on Lion, because Apple has not distributed the 10.4 sdk with > XCode (nor does it support PPC compilation) > > But I think the 10.6+ binaries are fine. > > ( wish we had 10.5+ Intel only binaries, as I still need to support > 10.5, but there are reasons that wasn't done) > > No Lion here just yet, so I can't test -- hopefully soon. > >> Do the tests for the current 10.6 scipy installer >> pass on 10.7? And do the 10.3-and-up Python 2.7 and 3.2 binaries work on >> 10.7? Those are explicitly listed as 10.3-10.6 (not 10.7 ...). > > they'll work (the 10.6, not 10.7 is because 10.7 didn't exist yet) -- > with the exception of the above, which is, unfortunately, a common > numpy/scipy use case. > > > -Chris > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R ? ? ? ? ? ?(206) 526-6959 ? voice > 7600 Sand Point Way NE ? (206) 526-6329 ? fax > Seattle, WA ?98115 ? ? ? (206) 526-6317 ? 
main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From matthew.brett at gmail.com Wed Jul 27 17:32:06 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 27 Jul 2011 14:32:06 -0700 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: Message-ID: Hi, On Wed, Jul 27, 2011 at 1:12 PM, Mark Wiebe wrote: > On Wed, Jul 27, 2011 at 3:09 PM, Robert Kern wrote: >> >> On Wed, Jul 27, 2011 at 14:47, Mark Wiebe wrote: >> > On Wed, Jul 27, 2011 at 2:44 PM, Matthew Brett >> > wrote: >> >> >> >> Hi, >> >> >> >> On Wed, Jul 27, 2011 at 12:25 PM, Mark Wiebe wrote: >> >> > On Wed, Jul 27, 2011 at 1:01 PM, Matthew Brett >> >> > >> >> > wrote: >> >> >> >> >> >> Hi, >> >> >> >> >> >> On Wed, Jul 27, 2011 at 6:54 PM, Mark Wiebe >> >> >> wrote: >> >> >> > This was the most consistent way to deal with the parameterized >> >> >> > dtype >> >> >> > in >> >> >> > the >> >> >> > repr, making it more future-proof at the same time. It was >> >> >> > producing >> >> >> > reprs >> >> >> > like "array(['2011-01-01'], dtype=datetime64[D])", which is >> >> >> > clearly >> >> >> > wrong, >> >> >> > and putting quotes around it makes it work in general for all >> >> >> > possible >> >> >> > dtypes, present and future. >> >> >> >> >> >> I don't know about you, but I find maintaining doctests across >> >> >> versions changes rather tricky. ?For our projects, doctests are >> >> >> important as part of the automated tests. ?At the moment this means >> >> >> that many doctests will break between 1.5.1 and 2.0. ?What do you >> >> >> think the best way round this problem? >> >> > >> >> > I'm not sure what the best approach is. I think the primary use of >> >> > doctests >> >> > should be to validate that the documentation matches the >> >> > implementation, >> >> > and >> >> > anything confirming aspects of a software system should be regular >> >> > tests. >> >> > ?In NumPy, there are platform-dependent differences in 32 vs 64 bit >> >> > and >> >> > big >> >> > vs little endian, so the part of the system that changed already >> >> > couldn't be >> >> > relied on consistently.?I prefer systems where the code output in the >> >> > documentation is generated as part of the documentation build process >> >> > instead of being included in the documentation source files. >> >> >> >> Would it be fair to summarize your reply as 'just deal with it'? >> > >> > I'm not sure what else I can do to help you, since I think this aspect >> > of >> > the system should be subject to arbitrary improvement. My recommendation >> > is >> > in general not to use doctests as if they were regular tests. I'd rather >> > not >> > back out the improvements to repr, if that's what you're suggesting >> > should >> > happen. Do you have any other ideas? >> >> In general, I tend to agree that doctests are not always appropriate. >> They tend to "overtest" and express things that the tester did not >> intend. It's just the nature of doctests that you have to accept if >> you want to use them. In this case, the tester wanted to test that the >> contents of the array were particular values and that it was a boolean >> array. Instead, it tested the precise bytes of the repr of the array. >> The repr of ndarrays are not a stable API, and we don't make >> guarantees about the precise details of its behavior from version to >> version. 
doctests work better to test simpler types and methods that >> do not have such complicated reprs. Yes, even as part of an automated >> test suite for functionality, not just to ensure the compliance of >> documentation examples. >> >> That said, you could only quote the dtypes that require the extra >> [syntax] and leave the current, simpler dtypes alone. That's a >> pragmatic compromise to the reality of the situation, which is that >> people do have extensive doctest suites already around, without >> removing your ability to innovate with the representations of the new >> dtypes. > > That sounds reasonable to me, and I'm happy to review pull requests from > anyone who has time to do this change. Forgive me, but this seems almost ostentatiously unhelpful. I understand you have little sympathy for the problem, but, just as a social courtesy, some pointers as to where to look would have been useful. See you, Matthew From alex.flint at gmail.com Wed Jul 27 17:36:51 2011 From: alex.flint at gmail.com (Alex Flint) Date: Wed, 27 Jul 2011 17:36:51 -0400 Subject: [Numpy-discussion] inconsistent semantics for double-slicing Message-ID: When applying two different slicing operations in succession (e.g. select a sub-range, then select using a binary mask) it seems that numpy arrays can be inconsistent with respect to assignment: For example, in this case an array is modified: In [6]: *A = np.arange(5)* In [8]: *A[:][A>2] = 0* In [10]: A Out[10]: *array([0, 1, 2, 0, 0])* Whereas here the original array remains unchanged In [11]: *A = np.arange(5)* In [12]: *A[[0,1,2,3,4]][A>2] = 0* In [13]: A Out[13]: *array([0, 1, 2, 3, 4])* This arose in a less contrived situation in which I was trying to copy a small image into a large image, modulo a mask on the small image. Is this meant to be like this? Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From jbauer-news at web.de Wed Jul 27 17:37:50 2011 From: jbauer-news at web.de (Johann Bauer) Date: Wed, 27 Jul 2011 14:37:50 -0700 Subject: [Numpy-discussion] C-API: multidimensional array indexing? Message-ID: <4E30852E.8080201@web.de> Dear experts, is there a C-API function for numpy which implements Python's multidimensional indexing? Say, I have a 2d-array PyArrayObject * M; and an index int i; how do I extract the i-th row or column M[i,:] respectively M[:,i]? I am looking for a function which gives again a PyArrayObject * and which is a view to M (no copied data; the result should be another PyArrayObject whose data and strides points to the correct memory portion of M). I searched the API documentation, Google and mailing lists for quite a long time but didn't find anything. Can you help me? Thanks, Johann From cjordan1 at uw.edu Wed Jul 27 17:39:33 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 27 Jul 2011 16:39:33 -0500 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: Message-ID: On Wed, Jul 27, 2011 at 3:09 PM, Robert Kern wrote: > On Wed, Jul 27, 2011 at 14:47, Mark Wiebe wrote: > > On Wed, Jul 27, 2011 at 2:44 PM, Matthew Brett > > wrote: > >> > >> Hi, > >> > >> On Wed, Jul 27, 2011 at 12:25 PM, Mark Wiebe wrote: > >> > On Wed, Jul 27, 2011 at 1:01 PM, Matthew Brett < > matthew.brett at gmail.com> > >> > wrote: > >> >> > >> >> Hi, > >> >> > >> >> On Wed, Jul 27, 2011 at 6:54 PM, Mark Wiebe > wrote: > >> >> > This was the most consistent way to deal with the parameterized > dtype > >> >> > in > >> >> > the > >> >> > repr, making it more future-proof at the same time. 
It was > producing > >> >> > reprs > >> >> > like "array(['2011-01-01'], dtype=datetime64[D])", which is clearly > >> >> > wrong, > >> >> > and putting quotes around it makes it work in general for all > >> >> > possible > >> >> > dtypes, present and future. > >> >> > >> >> I don't know about you, but I find maintaining doctests across > >> >> versions changes rather tricky. For our projects, doctests are > >> >> important as part of the automated tests. At the moment this means > >> >> that many doctests will break between 1.5.1 and 2.0. What do you > >> >> think the best way round this problem? > >> > > >> > I'm not sure what the best approach is. I think the primary use of > >> > doctests > >> > should be to validate that the documentation matches the > implementation, > >> > and > >> > anything confirming aspects of a software system should be regular > >> > tests. > >> > In NumPy, there are platform-dependent differences in 32 vs 64 bit > and > >> > big > >> > vs little endian, so the part of the system that changed already > >> > couldn't be > >> > relied on consistently. I prefer systems where the code output in the > >> > documentation is generated as part of the documentation build process > >> > instead of being included in the documentation source files. > >> > >> Would it be fair to summarize your reply as 'just deal with it'? > > > > I'm not sure what else I can do to help you, since I think this aspect of > > the system should be subject to arbitrary improvement. My recommendation > is > > in general not to use doctests as if they were regular tests. I'd rather > not > > back out the improvements to repr, if that's what you're suggesting > should > > happen. Do you have any other ideas? > > In general, I tend to agree that doctests are not always appropriate. > They tend to "overtest" and express things that the tester did not > intend. It's just the nature of doctests that you have to accept if > you want to use them. In this case, the tester wanted to test that the > contents of the array were particular values and that it was a boolean > array. Instead, it tested the precise bytes of the repr of the array. > The repr of ndarrays are not a stable API, and we don't make > guarantees about the precise details of its behavior from version to > version. doctests work better to test simpler types and methods that > do not have such complicated reprs. Yes, even as part of an automated > test suite for functionality, not just to ensure the compliance of > documentation examples. > > That said, you could only quote the dtypes that require the extra > [syntax] and leave the current, simpler dtypes alone. That's a > pragmatic compromise to the reality of the situation, which is that > people do have extensive doctest suites already around, without > removing your ability to innovate with the representations of the new > dtypes. > > +1 -Chris JS > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From wesmckinn at gmail.com Wed Jul 27 17:43:05 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 27 Jul 2011 17:43:05 -0400 Subject: [Numpy-discussion] inconsistent semantics for double-slicing In-Reply-To: References: Message-ID: On Wed, Jul 27, 2011 at 5:36 PM, Alex Flint wrote: > When applying two different slicing operations in succession (e.g. select a > sub-range, then select using a binary mask) it seems that numpy arrays can > be inconsistent with respect to assignment: > For example, in this case an array is modified: > In [6]: A = np.arange(5) > In [8]: A[:][A>2] = 0 > In [10]: A > Out[10]: array([0, 1, 2, 0, 0]) > Whereas here the original array remains unchanged > In [11]: A = np.arange(5) > In [12]: A[[0,1,2,3,4]][A>2] = 0 > In [13]: A > Out[13]: array([0, 1, 2, 3, 4]) > This arose in a less contrived situation in which I was trying to copy a > small image into a large image, modulo a mask on the small image. > Is this meant to be like this? > Alex > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > When you do this: A[[0,1,2,3,4]][A>2] = 0 what is happening is: A.__getitem__([0,1,2,3,4]).__setitem__(A > 2, 0) Whenever you do getitem with "fancy" indexing (i.e. A[[0,1,2,3,4]]), it produces a new object. In the first case, slicing A[:] produces a view on the same data. - Wes From mwwiebe at gmail.com Wed Jul 27 17:59:17 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 27 Jul 2011 16:59:17 -0500 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: Message-ID: On Wed, Jul 27, 2011 at 4:32 PM, Matthew Brett wrote: > Hi, > > On Wed, Jul 27, 2011 at 1:12 PM, Mark Wiebe wrote: > > On Wed, Jul 27, 2011 at 3:09 PM, Robert Kern > wrote: > >> > >> On Wed, Jul 27, 2011 at 14:47, Mark Wiebe wrote: > >> > On Wed, Jul 27, 2011 at 2:44 PM, Matthew Brett < > matthew.brett at gmail.com> > >> > wrote: > >> >> > >> >> Hi, > >> >> > >> >> On Wed, Jul 27, 2011 at 12:25 PM, Mark Wiebe > wrote: > >> >> > On Wed, Jul 27, 2011 at 1:01 PM, Matthew Brett > >> >> > > >> >> > wrote: > >> >> >> > >> >> >> Hi, > >> >> >> > >> >> >> On Wed, Jul 27, 2011 at 6:54 PM, Mark Wiebe > >> >> >> wrote: > >> >> >> > This was the most consistent way to deal with the parameterized > >> >> >> > dtype > >> >> >> > in > >> >> >> > the > >> >> >> > repr, making it more future-proof at the same time. It was > >> >> >> > producing > >> >> >> > reprs > >> >> >> > like "array(['2011-01-01'], dtype=datetime64[D])", which is > >> >> >> > clearly > >> >> >> > wrong, > >> >> >> > and putting quotes around it makes it work in general for all > >> >> >> > possible > >> >> >> > dtypes, present and future. > >> >> >> > >> >> >> I don't know about you, but I find maintaining doctests across > >> >> >> versions changes rather tricky. For our projects, doctests are > >> >> >> important as part of the automated tests. At the moment this > means > >> >> >> that many doctests will break between 1.5.1 and 2.0. What do you > >> >> >> think the best way round this problem? > >> >> > > >> >> > I'm not sure what the best approach is. I think the primary use of > >> >> > doctests > >> >> > should be to validate that the documentation matches the > >> >> > implementation, > >> >> > and > >> >> > anything confirming aspects of a software system should be regular > >> >> > tests. 
> >> >> > In NumPy, there are platform-dependent differences in 32 vs 64 bit > >> >> > and > >> >> > big > >> >> > vs little endian, so the part of the system that changed already > >> >> > couldn't be > >> >> > relied on consistently. I prefer systems where the code output in > the > >> >> > documentation is generated as part of the documentation build > process > >> >> > instead of being included in the documentation source files. > >> >> > >> >> Would it be fair to summarize your reply as 'just deal with it'? > >> > > >> > I'm not sure what else I can do to help you, since I think this aspect > >> > of > >> > the system should be subject to arbitrary improvement. My > recommendation > >> > is > >> > in general not to use doctests as if they were regular tests. I'd > rather > >> > not > >> > back out the improvements to repr, if that's what you're suggesting > >> > should > >> > happen. Do you have any other ideas? > >> > >> In general, I tend to agree that doctests are not always appropriate. > >> They tend to "overtest" and express things that the tester did not > >> intend. It's just the nature of doctests that you have to accept if > >> you want to use them. In this case, the tester wanted to test that the > >> contents of the array were particular values and that it was a boolean > >> array. Instead, it tested the precise bytes of the repr of the array. > >> The repr of ndarrays are not a stable API, and we don't make > >> guarantees about the precise details of its behavior from version to > >> version. doctests work better to test simpler types and methods that > >> do not have such complicated reprs. Yes, even as part of an automated > >> test suite for functionality, not just to ensure the compliance of > >> documentation examples. > >> > >> That said, you could only quote the dtypes that require the extra > >> [syntax] and leave the current, simpler dtypes alone. That's a > >> pragmatic compromise to the reality of the situation, which is that > >> people do have extensive doctest suites already around, without > >> removing your ability to innovate with the representations of the new > >> dtypes. > > > > That sounds reasonable to me, and I'm happy to review pull requests from > > anyone who has time to do this change. > > Forgive me, but this seems almost ostentatiously unhelpful. > I was offering to help, I think you're reading between the lines too much. The kind of response I was trying to invite is more along the lines of "I'd like to help, but I'm not sure where to start. Can you give me some pointers?" I understand you have little sympathy for the problem, but, just as a > social courtesy, some pointers as to where to look would have been > useful. > I do have sympathy for the problem, dealing with bad design decisions made early on in software projects is pretty common. In this case what Robert proposed is a good temporary solution, but ultimately NumPy needs the ability to change its repr and other details like it in order to progress as a software project. If I recall correctly the relevant functions are in Python and called array_repr and array2string, and they're in some of the files in numpy/core. I don't remember the file names, but a grep or find in files should track that down pretty quick. 
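To make that distinction concrete, a rough sketch of the kind of "regular test" Robert describes -- asserting the contents and dtype directly instead of the repr bytes -- could look like this (the array here is only a placeholder for whatever the code under test returns):

----------------
import numpy as np

def test_threshold_mask():
    result = np.array([1, 2, 3, 4]) > 2   # stand-in for the code under test
    # Check what actually matters: the values and the kind of array,
    # not the exact string numpy happens to print for it.
    np.testing.assert_array_equal(result, [False, False, True, True])
    assert result.dtype == np.bool_
----------------

A doctest that compares against "array([False, False, True, True], dtype=bool)" breaks whenever the repr formatting changes, even though the behaviour under test has not.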
Cheers, Mark > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Wed Jul 27 18:07:06 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 28 Jul 2011 00:07:06 +0200 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: Message-ID: <20110727220706.GB14003@phare.normalesup.org> On Wed, Jul 27, 2011 at 04:59:17PM -0500, Mark Wiebe wrote: > but ultimately NumPy needs the ability to change its repr and other > details like it in order to progress as a software project. You have to understand that numpy is the core layer on which people have build pretty huge scientific codebases with fairly little money flowing in to support. Any minor change to numpy cause turmoils in labs and is delt by people (student or researchers) on their spare time. I am not saying that there should not be any changes to numpy, just that the costs and benefits of these changes need to be weighted carefully. Numpy is not a young and agile project its a foundation library. My two cents, Ga?l From mwwiebe at gmail.com Wed Jul 27 18:23:18 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 27 Jul 2011 17:23:18 -0500 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: <20110727220706.GB14003@phare.normalesup.org> References: <20110727220706.GB14003@phare.normalesup.org> Message-ID: On Wed, Jul 27, 2011 at 5:07 PM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Wed, Jul 27, 2011 at 04:59:17PM -0500, Mark Wiebe wrote: > > but ultimately NumPy needs the ability to change its repr and other > > details like it in order to progress as a software project. > > You have to understand that numpy is the core layer on which people have > build pretty huge scientific codebases with fairly little money flowing > in to support. Any minor change to numpy cause turmoils in labs and is > delt by people (student or researchers) on their spare time. I am not > saying that there should not be any changes to numpy, just that the costs > and benefits of these changes need to be weighted carefully. Numpy is not > a young and agile project its a foundation library. > That's absolutely true. In my opinion, the biggest consequence of this is that great caution needs to be taken during the release process, something that Ralf has done a commendable job on. This shouldn't affect the idea that there are progress and improvements in the library. On the contrary, changes to such a core library can have just as many positive ripple effects to all its dependencies as it can have turmoils. The development trunk out of necessity will cause more turmoils than release versions, otherwise there's no way to see whether a change affects a lot of code out there or if its effects will be relatively minor. I appreciate everyone out there that's running the code in master, producing bug reports and in some cases pull requests based on their testing. NumPy isn't a young and agile project, that's correct. It does however have the potential to be agile. Many of the changes I've done deeper in the core are with the aim of allowing NumPy to evolve more quickly without having to change the exposed ABI and API. 
Cheers, Mark > > My two cents, > > Ga?l > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Jul 27 18:25:10 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 27 Jul 2011 17:25:10 -0500 Subject: [Numpy-discussion] C-API: multidimensional array indexing? In-Reply-To: <4E30852E.8080201@web.de> References: <4E30852E.8080201@web.de> Message-ID: Probably the easiest way is to emulate what Python is doing in M[i,:] and M[:,i]. You can create the : with PySlice_New(NULL, NULL, NULL), and the i with PyInt_FromLong. Then create a tuple with Py_BuildValue and use PyObject_GetItem to do the slicing. It is possible to do the same thing directly in C, but as you've found there aren't convenient APIs for this yet. Cheers, Mark On Wed, Jul 27, 2011 at 4:37 PM, Johann Bauer wrote: > Dear experts, > > is there a C-API function for numpy which implements Python's > multidimensional indexing? Say, I have a 2d-array > > PyArrayObject * M; > > and an index > > int i; > > how do I extract the i-th row or column M[i,:] respectively M[:,i]? > > I am looking for a function which gives again a PyArrayObject * and > which is a view to M (no copied data; the result should be another > PyArrayObject whose data and strides points to the correct memory > portion of M). > > I searched the API documentation, Google and mailing lists for quite a > long time but didn't find anything. Can you help me? > > Thanks, Johann > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jul 27 18:35:35 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 27 Jul 2011 15:35:35 -0700 Subject: [Numpy-discussion] Change in behavior of PyArray_BYTES( Message-ID: Hi, I was trying to compile matplotlib against current trunk, and hit an error with this line: char* row0 = PyArray_BYTES(matrix); https://github.com/matplotlib/matplotlib/blob/master/src/agg_py_transforms.cpp The error is: src/agg_py_transforms.cpp:30:26: error: invalid conversion from ?void*? to ?char*? It turned out that the output type of PyArray_BYTES has changed between 1.5.1 and current trunk In 1.5.1, ndarraytypes.h: #define PyArray_BYTES(obj) (((PyArrayObject *)(obj))->data) (resulting in a char *, from the char * bytes member of PyArrayObject) In current trunk we have this: #define PyArray_BYTES(arr) PyArray_DATA(arr) ifndef NPY_NO_DEPRECATED_API then this results in: #define PyArray_DATA(obj) ((void *)(((PyArrayObject_fieldaccess *)(obj))->data)) giving a void * ifdef NPY_NO_DEPRECATED_API then: static NPY_INLINE char * PyArray_DATA(PyArrayObject *arr) { return ((PyArrayObject_fieldaccess *)arr)->data; } resulting in a char * (for both PyArray_DATA and PyArray_BYTES. It seems to me that it would be safer to add back this line: #define PyArray_BYTES(obj) (((PyArrayObject *)(obj))->data) to ndarraytypes.h , within the ifndef NPY_NO_DEPRECATED_API block, to maintain compatibility. Do y'all agree? 
Best, Matthew From mwwiebe at gmail.com Wed Jul 27 18:40:21 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 27 Jul 2011 17:40:21 -0500 Subject: [Numpy-discussion] Change in behavior of PyArray_BYTES( In-Reply-To: References: Message-ID: On Wed, Jul 27, 2011 at 5:35 PM, Matthew Brett wrote: > Hi, > > I was trying to compile matplotlib against current trunk, and hit an > error with this line: > > char* row0 = PyArray_BYTES(matrix); > > > https://github.com/matplotlib/matplotlib/blob/master/src/agg_py_transforms.cpp > > The error is: > > src/agg_py_transforms.cpp:30:26: error: invalid conversion from > ?void*? to ?char*? > > It turned out that the output type of PyArray_BYTES has changed > between 1.5.1 and current trunk > > In 1.5.1, ndarraytypes.h: > > #define PyArray_BYTES(obj) (((PyArrayObject *)(obj))->data) > (resulting in a char *, from the char * bytes member of PyArrayObject) > > In current trunk we have this: > > #define PyArray_BYTES(arr) PyArray_DATA(arr) > > ifndef NPY_NO_DEPRECATED_API then this results in: > > #define PyArray_DATA(obj) ((void *)(((PyArrayObject_fieldaccess > *)(obj))->data)) > > giving a void * > > ifdef NPY_NO_DEPRECATED_API then: > > static NPY_INLINE char * > PyArray_DATA(PyArrayObject *arr) > { > return ((PyArrayObject_fieldaccess *)arr)->data; > } > > resulting in a char * (for both PyArray_DATA and PyArray_BYTES. > > It seems to me that it would be safer to add back this line: > > #define PyArray_BYTES(obj) (((PyArrayObject *)(obj))->data) > > to ndarraytypes.h , within the ifndef NPY_NO_DEPRECATED_API block, to > maintain compatibility. > > Do y'all agree? > Yes, this was an error. Michael Droettboom's pull request to fix it is already merged, so if you update against master it should work. -Mark > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Jul 27 18:47:14 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 27 Jul 2011 15:47:14 -0700 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: <20110727220706.GB14003@phare.normalesup.org> Message-ID: Hi, On Wed, Jul 27, 2011 at 3:23 PM, Mark Wiebe wrote: > On Wed, Jul 27, 2011 at 5:07 PM, Gael Varoquaux > wrote: >> >> On Wed, Jul 27, 2011 at 04:59:17PM -0500, Mark Wiebe wrote: >> > ? ?but ultimately NumPy needs the ability to change its repr and other >> > ? ?details like it in order to progress as a software project. >> >> You have to understand that numpy is the core layer on which people have >> build pretty huge scientific codebases with fairly little money flowing >> in to support. Any minor change to numpy cause turmoils in labs and is >> delt by people (student or researchers) on their spare time. I am not >> saying that there should not be any changes to numpy, just that the costs >> and benefits of these changes need to be weighted carefully. Numpy is not >> a young and agile project its a foundation library. > > That's absolutely true. In my opinion, the biggest consequence of this is > that great caution needs to be taken during the release process, something > that Ralf has done a commendable job on. 
You seem to be saying that if - say - you - put in some backwards incompatibility during development then you are expecting: a) Not to do anything about this until release time and b) That Ralf can clear all that up even though you made the changes. I am sure that most people, myself included, are very glad that you are trying to improve the numpy internals, and know that that is hard, and will cause breakage, from time to time. On the other hand, if we tell you about breakages or incompatibilities, and you tell us 'go fix it yourself', or 'Ralf can fix it later' then that can a) cause bad feeling and b) reduce community ownership of the code and c) make us anxious about stability. Cheers, Matthew From matthew.brett at gmail.com Wed Jul 27 18:50:48 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 27 Jul 2011 15:50:48 -0700 Subject: [Numpy-discussion] Change in behavior of PyArray_BYTES( In-Reply-To: References: Message-ID: Hi, On Wed, Jul 27, 2011 at 3:40 PM, Mark Wiebe wrote: > On Wed, Jul 27, 2011 at 5:35 PM, Matthew Brett > wrote: >> >> Hi, >> >> I was trying to compile matplotlib against current trunk, and hit an >> error with this line: >> >> ? ? ? ? ? ?char* row0 = PyArray_BYTES(matrix); >> >> >> https://github.com/matplotlib/matplotlib/blob/master/src/agg_py_transforms.cpp >> >> The error is: >> >> src/agg_py_transforms.cpp:30:26: error: invalid conversion from >> ?void*? to ?char*? >> >> It turned out that the output type of PyArray_BYTES has changed >> between 1.5.1 and current trunk >> >> In 1.5.1, ndarraytypes.h: >> >> #define PyArray_BYTES(obj) (((PyArrayObject *)(obj))->data) >> (resulting in a char *, from the char * bytes member of PyArrayObject) >> >> In current trunk we have this: >> >> #define PyArray_BYTES(arr) PyArray_DATA(arr) >> >> ifndef NPY_NO_DEPRECATED_API ?then this results in: >> >> #define PyArray_DATA(obj) ((void *)(((PyArrayObject_fieldaccess >> *)(obj))->data)) >> >> giving a void * >> >> ifdef NPY_NO_DEPRECATED_API then: >> >> static NPY_INLINE char * >> PyArray_DATA(PyArrayObject *arr) >> { >> ? ?return ((PyArrayObject_fieldaccess *)arr)->data; >> } >> >> resulting in a char * (for both PyArray_DATA and PyArray_BYTES. >> >> It seems to me that it would be safer to add back this line: >> >> #define PyArray_BYTES(obj) (((PyArrayObject *)(obj))->data) >> >> to ndarraytypes.h , within the ifndef NPY_NO_DEPRECATED_API block, to >> maintain compatibility. >> >> Do y'all agree? > > Yes, this was an error. Michael Droettboom's pull request to fix it is > already merged, so if you update against master it should work. > -Mark Ah - yes - thanks, Matthew From jbauer-news at web.de Wed Jul 27 19:21:33 2011 From: jbauer-news at web.de (Johann Bauer) Date: Wed, 27 Jul 2011 16:21:33 -0700 Subject: [Numpy-discussion] C-API: multidimensional array indexing? In-Reply-To: References: <4E30852E.8080201@web.de> Message-ID: <4E309D7D.4060705@web.de> Thanks, Mark! Problem solved. Johann From mwwiebe at gmail.com Wed Jul 27 19:24:40 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 27 Jul 2011 18:24:40 -0500 Subject: [Numpy-discussion] dtype repr change? 
In-Reply-To: References: <20110727220706.GB14003@phare.normalesup.org> Message-ID: On Wed, Jul 27, 2011 at 5:47 PM, Matthew Brett wrote: > Hi, > > On Wed, Jul 27, 2011 at 3:23 PM, Mark Wiebe wrote: > > On Wed, Jul 27, 2011 at 5:07 PM, Gael Varoquaux > > wrote: > >> > >> On Wed, Jul 27, 2011 at 04:59:17PM -0500, Mark Wiebe wrote: > >> > but ultimately NumPy needs the ability to change its repr and other > >> > details like it in order to progress as a software project. > >> > >> You have to understand that numpy is the core layer on which people have > >> build pretty huge scientific codebases with fairly little money flowing > >> in to support. Any minor change to numpy cause turmoils in labs and is > >> delt by people (student or researchers) on their spare time. I am not > >> saying that there should not be any changes to numpy, just that the > costs > >> and benefits of these changes need to be weighted carefully. Numpy is > not > >> a young and agile project its a foundation library. > > > > That's absolutely true. In my opinion, the biggest consequence of this is > > that great caution needs to be taken during the release process, > something > > that Ralf has done a commendable job on. > > You seem to be saying that if - say - you - put in some backwards > incompatibility during development then you are expecting: > > a) Not to do anything about this until release time and > b) That Ralf can clear all that up even though you made the changes. > Not at all. What I tried to do is explain the rationale for the change, and why I believe third party code should not depend on this aspect of the system. You are free to argue why you believe this point is incorrect, or why even though it is correct, there are pragmatic reasons why a compromise solution should be found. Then we can discuss who should do what and figure out a time frame. That is after all the purpose of the mailing list. The role Ralf is playing in managing the release process does not involve doing all the code fixes, it's a group effort. He gets the credit for making sure everything goes smoothly during the beta and release candidate period, and that all the loose ends are tied up appropriately. > I am sure that most people, myself included, are very glad that you > are trying to improve the numpy internals, and know that that is hard, > and will cause breakage, from time to time. > > On the other hand, if we tell you about breakages or > incompatibilities, and you tell us 'go fix it yourself', or 'Ralf can > fix it later' then that can > > a) cause bad feeling and > b) reduce community ownership of the code and > c) make us anxious about stability. > By working with master, you're participating in the development of NumPy. I'm volunteering as part of this community as much as you are, and I'm doing things to try and increase participation by giving pointers and suggestions in bug reports and pull requests. NumPy has a *very* small development team and a very large user base. I'm a human being, as are all of us in the NumPy community, and not everything I do will be perfect. I am, however, for the moment writing a lot of NumPy code, and there already have been two releases, 1.6.0 and 1.6.1, with changes authored by me which cut deeper in NumPy than I suspect most people realize. I would kindly ask that people be patient with and participate positively in the development process, it is not a straightforward journey from start to finish. 
-Mark > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Jul 27 19:25:20 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 27 Jul 2011 17:25:20 -0600 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: <20110727220706.GB14003@phare.normalesup.org> References: <20110727220706.GB14003@phare.normalesup.org> Message-ID: On Wed, Jul 27, 2011 at 4:07 PM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Wed, Jul 27, 2011 at 04:59:17PM -0500, Mark Wiebe wrote: > > but ultimately NumPy needs the ability to change its repr and other > > details like it in order to progress as a software project. > > You have to understand that numpy is the core layer on which people have > build pretty huge scientific codebases with fairly little money flowing > in to support. Any minor change to numpy cause turmoils in labs and is > delt by people (student or researchers) on their spare time. I am not > saying that there should not be any changes to numpy, just that the costs > and benefits of these changes need to be weighted carefully. Numpy is not > a young and agile project its a foundation library. > > Well, doc tests are just a losing proposition, no one should be using them for writing tests. It's not like this is a new discovery, doc tests have been known to be unstable for years. As to numpy being a settled project, I beg to differ. Without moving forward and routine maintenance numpy will quickly bitrot. I think it would only take a year or two before the decay began to show. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Wed Jul 27 19:29:49 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 28 Jul 2011 01:29:49 +0200 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: <20110727220706.GB14003@phare.normalesup.org> Message-ID: <20110727232949.GF14003@phare.normalesup.org> On Wed, Jul 27, 2011 at 05:25:20PM -0600, Charles R Harris wrote: > Well, doc tests are just a losing proposition, no one should be using them > for writing tests. It's not like this is a new discovery, doc tests have > been known to be unstable for years. Untested documentation is broken in my experience. This is why I do rely a lot on doctests. > As to numpy being a settled project, I beg to differ. Without moving > forward and routine maintenance numpy will quickly bitrot. I think it > would only take a year or two before the decay began to show. I agree; it's a question of finding tradeoff. It is hard to support different versions of numpy because of it's evolution, but we do like it's evolution, but that's what makes it better. However, sometimes doing a bit of compromises on changes that are cosmetic for the old farts like Matthew and me who have legacy codebases to support is appreciated. Ga?l From ed at pythoncharmers.com Wed Jul 27 22:31:48 2011 From: ed at pythoncharmers.com (Ed Schofield) Date: Thu, 28 Jul 2011 12:31:48 +1000 Subject: [Numpy-discussion] Regression in choose() In-Reply-To: <3C246651-E232-4AF4-9838-1F47D9D86A21@enthought.com> References: <3C246651-E232-4AF4-9838-1F47D9D86A21@enthought.com> Message-ID: Hi Travis, hi Olivier, Thanks for your replies last month about the choose() issue. 
I did some further investigation into this. I ran out of time in that project to come up with a patch, but here's what I found, which may be of interest: The compile-time constant NPY_MAXARGS is indeed limiting choose(), but only in recent versions. In NumPy version 1.2.1 this constant was set to the same value of 32, but choose() was not limited in the same way. This code succeeds on NumPy 1.2.1: ---------------- import numpy as np choices = [[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23], [30, 31, 32, 33]] morechoices = choices * 2**22 np.choose([2, 0, 1, 0], morechoices) ---------------- where the list contains 16.7m items. So this is a real regression ... for heavy-duty users of choose(). Thanks again for your thoughts! Best wishes, Ed On Fri, Jun 17, 2011 at 3:05 AM, Travis Oliphant wrote: > Hi Ed, > > I'm pretty sure that this is "bug" is due to the compile-time constant > NPY_MAXARGS defined in include/numpy/ndarraytypes.h I suspect that the > versions you are trying it on where it succeeds as a different compile-time > constant of that value. > > NumPy uses a multi-iterator object (PyArrayMultiIterObject defined in > ndarraytypes.h as well) to broadcast arguments together for ufuncs and for > functions like choose. The data-structure that it uses to do this has a > static array of Iterator objects with room for NPY_MAXARGS iterators. I > think in some versions this compile time constant has been 40 or higher. > Re-compiling NumPy by bumping up that constant will of course require > re-compilation of most extensions modules that use the NumPy API. > > Numeric did not use this approach to broadcast the arguments to choose > together and so likely does not have the same limitation. It would also > not be that difficult to modify the NumPy code to dynamically allocate the > iters array when needed to remove the NPY_MAXARGS limitation. In fact, I > would not mind seeing all the NPY_MAXDIMS and NPY_MAXARGS limitations > removed. To do it well you would probably want to have some minimum > storage-space pre-allocated (say NPY_MAXDIMS as 7 and NPY_MAXARGS as 10 to > avoid the memory allocation in common cases) and just increase that space as > needed dynamically. > > This would be a nice project for someone wanting to learn the NumPy code > base. > > -Travis > > > > > > On Jun 16, 2011, at 1:56 AM, Ed Schofield wrote: > > Hi all, > > I have been investigation the limitation of the choose() method (and > function) to 32 elements. This is a regression in recent versions of NumPy. > I have tested choose() in the following NumPy versions: > > 1.0.4: fine > 1.1.1: bug > 1.2.1: fine > 1.3.0: bug > 1.4.x: bug > 1.5.x: bug > 1.6.x: bug > Numeric 24.3: fine > > (To run the tests on versions of NumPy prior to 1.4.x I used Python 2.4.3. > For the other tests I used Python 2.7.) > > Here 'bug' means the choose() function has the 32-element limitation. I > have been helping an organization to port a large old Numeric-using codebase > to NumPy, and the choose() limitation in recent NumPy versions is throwing a > spanner in the works. The codebase is currently using both NumPy and Numeric > side-by-side, with Numeric only being used for its choose() function, with a > few dozen lines like this: > > a = numpy.array(Numeric.choose(b, c)) > > Here is a simple example that triggers the bug. 
It is a simple extension of > the example from the choose() docstring: > > ---------------- > > import numpy as np > > choices = [[0, 1, 2, 3], [10, 11, 12, 13], > [20, 21, 22, 23], [30, 31, 32, 33]] > > np.choose([2, 3, 1, 0], choices * 8) > > ---------------- > > A side note: the exception message (defined in > core/src/multiarray/iterators.c) is also slightly inconsistent with the > actual behaviour: > > Traceback (most recent call last): > File "chooser.py", line 6, in > np.choose([2, 3, 1, 0], choices * 8) > File "/usr/lib64/python2.7/site-packages/numpy/core/fromnumeric.py", line > 277, in choose > return _wrapit(a, 'choose', choices, out=out, mode=mode) > File "/usr/lib64/python2.7/site-packages/numpy/core/fromnumeric.py", line > 37, in _wrapit > result = getattr(asarray(obj),method)(*args, **kwds) > ValueError: Need between 2 and (32) array objects (inclusive). > > The actual behaviour is that choose() passes with 31 objects but fails with > 32 objects, so this should read "exclusive" rather than "inclusive". (And > why the parentheses around 32?) > > Does anyone know what changed between 1.2.1 and 1.3.0 that introduced the > 32-element limitation to choose(), and whether we might be able to lift this > limitation again for future NumPy versions? I have a couple of days to work > on a patch ... if someone can advise me how to approach this. > > Best wishes, > Ed > > > -- > Dr. Edward Schofield > Python Charmers > +61 (0)405 676 229 > http://pythoncharmers.com > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > --- > Travis Oliphant > Enthought, Inc. > oliphant at enthought.com > 1-512-536-1057 > http://www.enthought.com > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Dr. Edward Schofield Python Charmers +61 (0)405 676 229 http://pythoncharmers.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From rblove_lists at comcast.net Wed Jul 27 23:29:08 2011 From: rblove_lists at comcast.net (Robert Love) Date: Wed, 27 Jul 2011 22:29:08 -0500 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: <20110716145010.GY3465@earth.li> References: <20110716145010.GY3465@earth.li> Message-ID: <54045EAB-6486-413F-BF3B-4AC2DC9E1C1E@comcast.net> To use quaternions I find I often need conversion to/from matrices and to/from Euler angles. Will you add that functionality? Will you handle the left versor and right versor versions? I have a set of pure python code I've sketched out for my needs (aerospace) but would be happy to have an intrinsic Numpy solution. On Jul 16, 2011, at 9:50 AM, Martin Ling wrote: > Hi all, > > I have just pushed a package to GitHub which adds a quaternion dtype to > NumPy: https://github.com/martinling/numpy_quaternion > > Some backstory: on Wednesday I gave a talk at SciPy 2011 about an > inertial sensing simulation package I have been working on > (http://www.imusim.org/). One component I suggested might be reusable > from that code was the quaternion math implementation, written in > Cython. One of its features is a wrapper class for Nx4 NumPy arrays that > supports efficient operations using arrays of quaternion values. 
> > Travis Oliphant suggested that a quaternion dtype would be a better > solution, and got me talking to Mark Weibe about this. With Mark's help > I completed this initial version at yesterday's sprint session. > > Incidentally, how to do something like this isn't well documented and I > would have had little hope without both Mark's in-person help and his > previous code (for adding a half-precision float dtype) to refer to. I > don't know what the consensus is about whether people writing custom > dtypes is a desirable thing, but if it is then the process needs to be > made a lot easier. That said, the fact this is doable without patching > the numpy core at all is really, really nice. > > Example usage: > >>>> import numpy as np >>>> import quaternion >>>> np.quaternion(1,0,0,0) > quaternion(1, 0, 0, 0) >>>> q1 = np.quaternion(1,2,3,4) >>>> q2 = np.quaternion(5,6,7,8) >>>> q1 * q2 > quaternion(-60, 12, 30, 24) >>>> a = np.array([q1, q2]) >>>> a > array([quaternion(1, 2, 3, 4), quaternion(5, 6, 7, 8)], > dtype=quaternion) >>>> exp(a) > array([quaternion(1.69392, -0.78956, -1.18434, -1.57912), > quaternion(138.909, -25.6861, -29.9671, -34.2481)], > dtype=quaternion) > > The following ufuncs are implemented: > add, subtract, multiply, divide, log, exp, power, negative, conjugate, > copysign, equal, not_equal, less, less_equal, isnan, isinf, isfinite, > absolute > > Quaternion components are stored as doubles. The package could be extended > to support e.g. qfloat, qdouble, qlongdouble > > Comparison operations follow the same lexicographic ordering as tuples. > > The unary tests isnan, isinf and isfinite return true if they would > return true for any individual component. > > Real types may be cast to quaternions, giving quaternions with zero for > all three imaginary components. Complex types may also be cast to > quaternions, with their single imaginary component becoming the first > imaginary component of the quaternion. Quaternions may not be cast to > real or complex types. > > Comments very welcome. This is my first attempt at NumPy hacking :-) > > > Martin > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From johan.rade at gmail.com Thu Jul 28 01:50:05 2011 From: johan.rade at gmail.com (=?ISO-8859-1?Q?Johan_R=E5de?=) Date: Thu, 28 Jul 2011 07:50:05 +0200 Subject: [Numpy-discussion] C-API: PyTypeObject* for NumPy scalar types Message-ID: How do I get the PyTypeObject* for a NumPy scalar type such as np.uint8? (The reason I'm asking is the following: I'm writing a C++ extension module. The Python interface to the module has a function f that takes a NumPy scalar type as an argument, for instance f(np.uint8). Then the corresponding C++ function receives a PyObject* and needs to decide which type object it points to.) --Johan From martin-numpy at earth.li Thu Jul 28 08:42:19 2011 From: martin-numpy at earth.li (Martin Ling) Date: Thu, 28 Jul 2011 13:42:19 +0100 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: <54045EAB-6486-413F-BF3B-4AC2DC9E1C1E@comcast.net> References: <20110716145010.GY3465@earth.li> <54045EAB-6486-413F-BF3B-4AC2DC9E1C1E@comcast.net> Message-ID: <20110728124218.GK3465@earth.li> On Wed, Jul 27, 2011 at 10:29:08PM -0500, Robert Love wrote: > > To use quaternions I find I often need conversion to/from matrices and > to/from Euler angles. Will you add that functionality? Yes, I intend to. 
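As a rough sketch of what the matrix case involves, the usual formula for a *unit* quaternion (w, x, y, z) in plain numpy is below -- sign and handedness conventions differ between sources, so treat this as an illustration rather than a promise about the dtype's eventual API:

----------------
import numpy as np

def quaternion_to_matrix(w, x, y, z):
    # 3x3 rotation matrix for a unit quaternion (w, x, y, z),
    # Hamilton convention, rotating column vectors.
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
----------------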
Note that these conversions are already available in the standalone (non-dtype) implementation in imusim.maths.quaternions: http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#setFromEuler http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#toEuler http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#setFromMatrix http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#toMatrix I should do a new release though - the Euler methods there only support ZYX and ZXY order conversions, my development version supports any order. > Will you handle the left versor and right versor versions? I don't know what this means. Please enlighten me and I'll be happy to try! I thought a 'right versor' was a unit quaternion representing an angle of 90 degrees (as in 'right angle') - I don't see what a 'left' one would be. Martin From martin-numpy at earth.li Thu Jul 28 09:54:50 2011 From: martin-numpy at earth.li (Martin Ling) Date: Thu, 28 Jul 2011 14:54:50 +0100 Subject: [Numpy-discussion] Issues with adding new dtypes - customizing ndarray attributes Message-ID: <20110728135450.GL3465@earth.li> Hi, I'd like to kick off some discussion on general issues I've encountered while developing the quaternion dtype (see other thread, and the code at: https://github.com/martinling/numpy_quaternion) The basic issue is that the attributes of ndarray cannot be adapted to the dtype of a given array. Indeed, their behaviour can't be changed at all without patching numpy itself. There are essentially four cases of the problem: 1. Attributes which do the wrong thing even though there is a mechanism that should let them do the right thing, e.g: >>> a = array([quaternion(1,2,3,4), quaternion(5,6,7,8)]) >>> conjugate(a) # correct, calls conjugate ufunc I defined array([quaternion(1, -2, -3, -4), quaternion(5, -6, -7, -8)], dtype=quaternion) >>> a.conjugate() # incorrect, why doesn't it do the same? array([quaternion(1, 2, 3, 4), quaternion(5, 6, 7, 8)], dtype=quaternion) >>> min(a) # works, calls min ufunc I defined quaternion(1, 2, 3, 4) >>> a.min() # fails, why doesn't it do the same? ValueError: No cast function available. 2. Attributes that do the wrong thing with no mechanism to override them: >>> array([q.real for q in a]) array([ 1., 5.]) >>> a.real # would like this to return the same, can't make it do so array([quaternion(1, 2, 3, 4), quaternion(5, 6, 7, 8)], dtype=quaternion) 3. Attributes that don't exist and could be added to suit the dtype: >>> array([q.y for q in a]) array([ 3., 7.]) >>> a.y # would like this to return the same, can't make it do so AttributeError: 'numpy.ndarray' object has no attribute 'y' 4. Attributes that already exist and make no sense for some dtypes: >>> sa = array(['foo', 'bar', 'baz']) >>> sa.imag # why can I even do this? array(['', '', ''], dtype='|S3') We had ?ome discussion about this at the SciPy conference sprints and the consensus seemed to be that allowing dtypes to customize the attributes of ndarrays would be a good thing. This would also be useful for struct arrays, datetime arrays, etc. What do people think? Martin From meine at informatik.uni-hamburg.de Thu Jul 28 10:58:23 2011 From: meine at informatik.uni-hamburg.de (Hans Meine) Date: Thu, 28 Jul 2011 16:58:23 +0200 Subject: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? 
(was: Problem with ufunc of a numpy.ndarray derived class) In-Reply-To: <201107211656.21611.meine@informatik.uni-hamburg.de> References: <201107211656.21611.meine@informatik.uni-hamburg.de> Message-ID: <201107281658.24102.meine@informatik.uni-hamburg.de> Hi again! Am Donnerstag, 21. Juli 2011, 16:56:21 schrieb Hans Meine: > import numpy > > class Test(numpy.ndarray): > pass > > a1 = numpy.ndarray((1,)) > a2 = Test((1,)) > > assert type(a1.min()) == type(a2.min()), \ > "%s != %s" % (type(a1.min()), type(a2.min())) > # --------------------------------------------------- > > This code fails with 1.6.0, while it worked in 1.3.0. I just tried with 1.5.1 (Ubuntu natty), and it works, too. Thus, this behavor-incompatible change happened between 1.5.1 and 1.6.0. > I tend to think that this is a bug (after all, a1.min() does not return > ndarray, but an array scalar), but maybe there is also a good reason for > this (for us, unexpected) behavor change and a nice solution? Unfortunately, I did not receive any answers so far. Have a nice day, Hans From matthew.brett at gmail.com Thu Jul 28 11:42:38 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 28 Jul 2011 08:42:38 -0700 Subject: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? (was: Problem with ufunc of a numpy.ndarray derived class) In-Reply-To: <201107281658.24102.meine@informatik.uni-hamburg.de> References: <201107211656.21611.meine@informatik.uni-hamburg.de> <201107281658.24102.meine@informatik.uni-hamburg.de> Message-ID: Hi, On Thu, Jul 28, 2011 at 7:58 AM, Hans Meine wrote: > Hi again! > > Am Donnerstag, 21. Juli 2011, 16:56:21 schrieb Hans Meine: >> import numpy >> >> class Test(numpy.ndarray): >> ? ? pass >> >> a1 = numpy.ndarray((1,)) >> a2 = Test((1,)) >> >> assert type(a1.min()) == type(a2.min()), \ >> ? "%s != %s" % (type(a1.min()), type(a2.min())) >> # --------------------------------------------------- >> >> This code fails with 1.6.0, while it worked in 1.3.0. > > I just tried with 1.5.1 (Ubuntu natty), and it works, too. > > Thus, this behavor-incompatible change happened between 1.5.1 and 1.6.0. > >> I tend to think that this is a bug (after all, a1.min() does not return >> ndarray, but an array scalar), but maybe there is also a good reason for >> this (for us, unexpected) behavor change and a nice solution? > > Unfortunately, I did not receive any answers so far. Sorry about the lack of replies. If I understand you correctly, the problem is that, for 1.5.1: >>> class Test(np.ndarray): pass >>> type(np.min(Test((1,)))) and for 1.6.0 (and current trunk): >>> class Test(np.ndarray): pass >>> type(np.min(Test((1,)))) So, 1.6.0 is returning a zero-dimensional scalar of the given type, and 1.5.1 returns a python scalar. Zero dimensional scalars are designed to behave in a similar way to python scalars, so the change should be all but invisible in practice. Was there a particular case you ran into where this was a problem? Best, Matthew From ralf.gommers at googlemail.com Thu Jul 28 13:13:15 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Thu, 28 Jul 2011 19:13:15 +0200 Subject: [Numpy-discussion] dtype repr change? 
In-Reply-To: <20110727232949.GF14003@phare.normalesup.org> References: <20110727220706.GB14003@phare.normalesup.org> <20110727232949.GF14003@phare.normalesup.org> Message-ID: On Thu, Jul 28, 2011 at 1:29 AM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Wed, Jul 27, 2011 at 05:25:20PM -0600, Charles R Harris wrote: > > Well, doc tests are just a losing proposition, no one should be using > them > > for writing tests. It's not like this is a new discovery, doc tests > have > > been known to be unstable for years. > > Untested documentation is broken in my experience. This is why I do rely > a lot on doctests. > Rely as in making sure that the examples run once in a while and before a release is of course a good idea. Failures can be inspected and ignored if there are only minor differences in string representation. Relying on doctests as in "they replace the unit tests I should also have written" is another thing altogether - unnecessary and expecting an unrealistic level of backward compatibility. That of course doesn't mean things in numpy should change without a good reason, but it seems there was one. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Jul 28 13:19:40 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 28 Jul 2011 10:19:40 -0700 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: <20110727220706.GB14003@phare.normalesup.org> <20110727232949.GF14003@phare.normalesup.org> Message-ID: Hi, On Thu, Jul 28, 2011 at 10:13 AM, Ralf Gommers wrote: > > > On Thu, Jul 28, 2011 at 1:29 AM, Gael Varoquaux > wrote: >> >> On Wed, Jul 27, 2011 at 05:25:20PM -0600, Charles R Harris wrote: >> > ? ?Well, doc tests are just a losing proposition, no one should be using >> > them >> > ? ?for writing tests. It's not like this is a new discovery, doc tests >> > have >> > ? ?been known to be unstable for years. >> >> Untested documentation is broken in my experience. This is why I do rely >> a lot on doctests. > > Rely as in making sure that the examples run once in a while and before a > release is of course a good idea. Failures can be inspected and ignored if > there are only minor differences in string representation. I think automated and frequent testing of the doctests considerably reduces the risk of broken documentation. If the code-base gets big enough, scanning the errors by eye isn't efficient, and the result can only be running the tests less often and detecting errors less often. > Relying on doctests as in "they replace the unit tests I should also have > written" is another thing altogether - unnecessary and expecting an > unrealistic level of backward compatibility. That of course doesn't mean > things in numpy should change without a good reason, but it seems there was > one. I don't think anyone suggested that doctests should replace unit tests; it's a bit difficult to see why that discussion started. Best, Matthew From njs at pobox.com Thu Jul 28 16:09:27 2011 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 28 Jul 2011 13:09:27 -0700 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: Message-ID: I have a different question about this than the rest of the thread. I'm confused at why there isn't a programmatic way to create a datetime dtype, other than by going through this special string-based mini-language. 
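(Concretely, the only spelling I can see today is the string form, even though structured
dtypes already have a non-string spelling -- a small illustration, so far as I understand
the current state:)

import numpy as np

# datetime64 with day units: reachable only via the string mini-language(?)
dt_days = np.dtype('datetime64[D]')

# structured dtypes, by contrast, can be built programmatically --
# though the datetime unit inside still falls back to a string
rec = np.dtype([('when', 'datetime64[D]'), ('value', np.float64)])
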
I guess I generally think of string-based dtype descriptors as being a legacy thing necessary for compatibility, but probably better to avoid in new code, now that we have nice python ways to describe dtypes with scalar types and such. Probably that's a minority opinion, but even putting it aside: it certainly isn't the case that we can describe arbitrary dtypes using strings right now - think of record types and so on. And even restricting ourselves to atomic styles, I'm skeptical about this claim that we'll be able to use strings for everything in the future, too. My pet possible future dtype is one for categorical data, which would be parametrized by the set of possible categories; I don't relish the idea of making up some ad hoc syntax for specifying such lists within the dtype mini-language. So is the plan actually to promote strings as the canonical way of describing dtypes? Aside from the question of what repr does, shouldn't there actually be some sort of syntax like dtype=np.datetime64("D") available as a working option? - Nathaniel On Jul 27, 2011 10:55 AM, "Mark Wiebe" wrote: > This was the most consistent way to deal with the parameterized dtype in the > repr, making it more future-proof at the same time. It was producing reprs > like "array(['2011-01-01'], dtype=datetime64[D])", which is clearly wrong, > and putting quotes around it makes it work in general for all possible > dtypes, present and future. > > -Mark > > On Wed, Jul 27, 2011 at 12:50 PM, Matthew Brett wrote: > >> Hi, >> >> I see that (current trunk): >> >> In [9]: np.ones((1,), dtype=bool) >> Out[9]: array([ True], dtype='bool') >> >> - whereas (1.5.1): >> >> In [2]: np.ones((1,), dtype=bool) >> Out[2]: array([ True], dtype=bool) >> >> That is breaking quite a few doctests. What is the reason for the >> change? Something to do with more planned dtypes? >> >> Thanks a lot, >> >> Matthew >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Thu Jul 28 16:29:07 2011 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 28 Jul 2011 13:29:07 -0700 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: <20110727220706.GB14003@phare.normalesup.org> <20110727232949.GF14003@phare.normalesup.org> Message-ID: On Thu, Jul 28, 2011 at 10:19 AM, Matthew Brett wrote: > I don't think anyone suggested that doctests should replace unit > tests; it's a bit difficult to see why that discussion started. The conversation started because array([True], dtype=bool) changed to array([True], dtype='bool') or something along those lines. A reasonable expectation is that eval(repr(x)) should produce x. An unreasonable expectation is to expect the repr string to remain exactly the same over versions (as doctest does). So, while there seems to be a simple solution in this case, I don't think the change was unreasonable or wrong. Regards St?fan From matthew.brett at gmail.com Thu Jul 28 16:48:29 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 28 Jul 2011 13:48:29 -0700 Subject: [Numpy-discussion] dtype repr change? 
In-Reply-To: References: <20110727220706.GB14003@phare.normalesup.org> <20110727232949.GF14003@phare.normalesup.org> Message-ID: Hi, 2011/7/28 St?fan van der Walt : > On Thu, Jul 28, 2011 at 10:19 AM, Matthew Brett wrote: >> I don't think anyone suggested that doctests should replace unit >> tests; it's a bit difficult to see why that discussion started. > > The conversation started because array([True], dtype=bool) changed to > array([True], dtype='bool') or something along those lines. ?A > reasonable expectation is that eval(repr(x)) should produce x. ?An > unreasonable expectation is to expect the repr string to remain > exactly the same over versions (as doctest does). > > So, while there seems to be a simple solution in this case, I don't > think the change was unreasonable or wrong. I don't think you'll find anyone saying the change was unreasonable or wrong. It wouldn't really matter if it was, Mark's making a lot of changes, and if some of them are wrong, that's just how it goes when you change stuff. The thread was first about how to deal with the change, and second about the tone of Mark's - and I suppose my own - replies. See you, Matthew From paul.anton.letnes at gmail.com Thu Jul 28 17:05:18 2011 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Thu, 28 Jul 2011 23:05:18 +0200 Subject: [Numpy-discussion] ticket 1793 Message-ID: <483275D8-CBD9-4ECD-AD42-CE4AAF27431B@gmail.com> Hi! In my quest for a bug-free numpy I have, I think, fixed ticket 1793. https://github.com/numpy/numpy/pull/123 Enjoy - feedback welcome, of course. Cheers, Paul. From jlconlin at gmail.com Thu Jul 28 18:19:51 2011 From: jlconlin at gmail.com (Jeremy Conlin) Date: Thu, 28 Jul 2011 16:19:51 -0600 Subject: [Numpy-discussion] Can I index array starting with 1? Message-ID: I have a need to index my array(s) starting with a 1 instead of a 0. The reason for this is to be consistent with the documentation of a format I'm accessing. I know I can just '-1' from what the documentation says, but that can get cumbersome. Is there a magic flag I can pass to a numpy array (maybe when it is created) so I can index starting at 1 instead of the Python default? Thanks, Jeremy From stefan at sun.ac.za Thu Jul 28 18:31:52 2011 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 28 Jul 2011 15:31:52 -0700 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: <20110727220706.GB14003@phare.normalesup.org> <20110727232949.GF14003@phare.normalesup.org> Message-ID: On Thu, Jul 28, 2011 at 1:48 PM, Matthew Brett wrote: > The thread was first about how to deal with the change, and second I'm still curious to know of a technical solution with doctests. Ideally, one would like to specify a set of rules that a line must pass to know whether it matched. The current system of excluding certain checks with "+SKIP" flags etc. seems fragile. Maybe a person can already do that, I'm not sure? But it would be handy to simply have an extra rule added that said: "In any line-set starting with 'array', ignore everything after dtype=" for example. Then each package would be able to keep a customised set of rules. Do you know if doctests supports any sort of manual intervention, like a plugin system? 
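Something along the lines of the following is the kind of hook I have in mind -- a rough,
untested sketch built on the stock doctest module's OutputChecker (class name and regex
purely illustrative):

import doctest
import re

class DtypeAgnosticChecker(doctest.OutputChecker):
    # Accept dtype=bool and dtype='bool' (and friends) as equivalent output.
    def check_output(self, want, got, optionflags):
        if doctest.OutputChecker.check_output(self, want, got, optionflags):
            return True
        # strip quotes around the dtype name and compare once more
        unquote = lambda s: re.sub(r"dtype='([^']+)'", r"dtype=\1", s)
        return doctest.OutputChecker.check_output(
            self, unquote(want), unquote(got), optionflags)

# e.g. runner = doctest.DocTestRunner(checker=DtypeAgnosticChecker())
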
Regards St?fan From stefan at sun.ac.za Thu Jul 28 18:39:15 2011 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 28 Jul 2011 15:39:15 -0700 Subject: [Numpy-discussion] ticket 1793 In-Reply-To: <483275D8-CBD9-4ECD-AD42-CE4AAF27431B@gmail.com> References: <483275D8-CBD9-4ECD-AD42-CE4AAF27431B@gmail.com> Message-ID: On Thu, Jul 28, 2011 at 2:05 PM, Paul Anton Letnes wrote: > In my quest for a bug-free numpy I have, I think, fixed ticket 1793. > https://github.com/numpy/numpy/pull/123 This brings up an interesting question. When raising warnings, they only show for the first time, unless the system is specially configured. Should we consider sending messages to a logger as well? Is there already a way of handling this in numpy? Regards St?fan From stefan at sun.ac.za Thu Jul 28 18:42:54 2011 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 28 Jul 2011 15:42:54 -0700 Subject: [Numpy-discussion] Can I index array starting with 1? In-Reply-To: References: Message-ID: Hi Jeremy On Thu, Jul 28, 2011 at 3:19 PM, Jeremy Conlin wrote: > I have a need to index my array(s) starting with a 1 instead of a 0. > The reason for this is to be consistent with the documentation of a > format I'm accessing. I know I can just '-1' from what the > documentation says, but that can get cumbersome. > > Is there a magic flag I can pass to a numpy array (maybe when it is > created) so I can index starting at 1 instead of the Python default? You may want to have a look at some of the labeled array packages out there, such as larry, datarray, pandas, etc. I'm sure at least one of them allows integer re-labelling, although I'm not certain whether it can be done in a programmatic fashion. An alternative may be to create an indexing function that remaps the input space, e.g.: def ix(n): return n - 1 x[ix(3), ix(5)] But that looks pretty nasty, and you'll have to expand ix quite a bit to handle slices, etc. :/ St?fan From aarchiba at physics.mcgill.ca Thu Jul 28 19:10:51 2011 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Thu, 28 Jul 2011 19:10:51 -0400 Subject: [Numpy-discussion] Can I index array starting with 1? In-Reply-To: References: Message-ID: Don't forget the everything-looks-like-a-nail approach: make all your arrays one bigger than you need and ignore element zero. Anne On 7/28/11, St?fan van der Walt wrote: > Hi Jeremy > > On Thu, Jul 28, 2011 at 3:19 PM, Jeremy Conlin wrote: >> I have a need to index my array(s) starting with a 1 instead of a 0. >> The reason for this is to be consistent with the documentation of a >> format I'm accessing. I know I can just '-1' from what the >> documentation says, but that can get cumbersome. >> >> Is there a magic flag I can pass to a numpy array (maybe when it is >> created) so I can index starting at 1 instead of the Python default? > > You may want to have a look at some of the labeled array packages out > there, such as larry, datarray, pandas, etc. I'm sure at least one of > them allows integer re-labelling, although I'm not certain whether it > can be done in a programmatic fashion. > > An alternative may be to create an indexing function that remaps the > input space, e.g.: > > def ix(n): > return n - 1 > > x[ix(3), ix(5)] > > But that looks pretty nasty, and you'll have to expand ix quite a bit > to handle slices, etc. 
:/ > > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Sent from my mobile device From stefan at sun.ac.za Thu Jul 28 19:19:18 2011 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 28 Jul 2011 16:19:18 -0700 Subject: [Numpy-discussion] Can I index array starting with 1? In-Reply-To: References: Message-ID: On Thu, Jul 28, 2011 at 4:10 PM, Anne Archibald wrote: > Don't forget the everything-looks-like-a-nail approach: make all your > arrays one bigger than you need and ignore element zero. Hehe, why didn't I think of that :) I guess the kind of problem I struggle with more frequently is books written with summations over -m to +n. In those cases, it's often convenient to use the mapping function, so that I can enter the formulas as they occur. St?fan From matthew.brett at gmail.com Thu Jul 28 19:21:46 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 28 Jul 2011 16:21:46 -0700 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: <20110727220706.GB14003@phare.normalesup.org> <20110727232949.GF14003@phare.normalesup.org> Message-ID: Hi, 2011/7/28 St?fan van der Walt : > On Thu, Jul 28, 2011 at 1:48 PM, Matthew Brett wrote: >> The thread was first about how to deal with the change, and second > > I'm still curious to know of a technical solution with doctests. > Ideally, one would like to specify a set of rules that a line must > pass to know whether it matched. ?The current system of excluding > certain checks with "+SKIP" flags etc. seems fragile. Well - you can use the +ELLIPSIS flag, but of course you have ugly and confusing ... in the docstring for the variable bits. > Maybe a person can already do that, I'm not sure? ?But it would be > handy to simply have an extra rule added that said: "In any line-set > starting with 'array', ignore everything after dtype=" for example. > Then each package would be able to keep a customised set of rules. > > Do you know if doctests supports any sort of manual intervention, like > a plugin system? Actually, I was going to ask you that question :) But yes, there's the NumpyDoctest nose plugin, for example. Using it does mean you have to customize nose somehow - in numpy's case by using the 'numpy.test()' machinery. Sympy I believe has a fair amount of machinery to work with doctests, but I haven't looked at that yet, See you, Matthew From derek at astro.physik.uni-goettingen.de Thu Jul 28 19:26:15 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Fri, 29 Jul 2011 01:26:15 +0200 Subject: [Numpy-discussion] Can I index array starting with 1? In-Reply-To: References: Message-ID: On 29.07.2011, at 1:19AM, St?fan van der Walt wrote: > On Thu, Jul 28, 2011 at 4:10 PM, Anne Archibald > wrote: >> Don't forget the everything-looks-like-a-nail approach: make all your >> arrays one bigger than you need and ignore element zero. > > Hehe, why didn't I think of that :) > > I guess the kind of problem I struggle with more frequently is books > written with summations over -m to +n. In those cases, it's often > convenient to use the mapping function, so that I can enter the > formulas as they occur. 
I don't want to open any cans of worms at this point, but given that Fortran90 supports such indexing (arbitrary limits, including negative ones), there definitely are use cases for it (or rather, instances where it is very convenient at least, like in St?fan's books). So I am wondering how much it would take to implement such an enhancement for the standard ndarray... Cheers, Derek From stefan at sun.ac.za Thu Jul 28 19:37:44 2011 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 28 Jul 2011 16:37:44 -0700 Subject: [Numpy-discussion] Can I index array starting with 1? In-Reply-To: References: Message-ID: On Thu, Jul 28, 2011 at 4:26 PM, Derek Homeier wrote: >> I guess the kind of problem I struggle with more frequently is books >> written with summations over -m to +n. ?In those cases, it's often >> convenient to use the mapping function, so that I can enter the >> formulas as they occur. > > I don't want to open any cans of worms at this point, but given that Fortran90 supports such indexing (arbitrary limits, including negative ones), there definitely are use cases for it (or rather, instances where it is very convenient at least, like in St?fan's books). So I am wondering how much it would take to implement such an enhancement for the standard ndarray... Thinking about it, expanding on Anne's solution and the fact that Python allows negative indexing, you can simply reshuffle the array after operations a bit and have everything work out. import numpy as np n = -5 m = 7 x = np.zeros((m - n), dtype=float) # Do some operation with n..m based indexing for i in range(-5, 7): x[i] = i # Construct the output x = np.hstack((x[n:], x[:m])) print x Regards St?fan From aarchiba at physics.mcgill.ca Thu Jul 28 19:38:25 2011 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Thu, 28 Jul 2011 19:38:25 -0400 Subject: [Numpy-discussion] Can I index array starting with 1? In-Reply-To: References: Message-ID: The can is open and the worms are everywhere, so: The big problem with one-based indexing for numpy is interpretation. In python indexing, -1 is the last element of the array, and ranges have a specific meaning. In a hypothetical one-based indexing scheme, would the last element be element 0? if not, what does looking up zero do? What about ranges - do ranges still include the first endpoint and not the second? I suppose one could choose the most pythonic of the 1-based conventions, but do any of them provide from-the-end indexing without special syntax? Once one had decided what to do, implementation would be pretty easy - just make a subclass of ndarray that replaces the indexing function. Anne On 28 July 2011 19:26, Derek Homeier wrote: > On 29.07.2011, at 1:19AM, St?fan van der Walt wrote: > >> On Thu, Jul 28, 2011 at 4:10 PM, Anne Archibald >> wrote: >>> Don't forget the everything-looks-like-a-nail approach: make all your >>> arrays one bigger than you need and ignore element zero. >> >> Hehe, why didn't I think of that :) >> >> I guess the kind of problem I struggle with more frequently is books >> written with summations over -m to +n. ?In those cases, it's often >> convenient to use the mapping function, so that I can enter the >> formulas as they occur. > > I don't want to open any cans of worms at this point, but given that Fortran90 supports such indexing (arbitrary limits, including negative ones), there definitely are use cases for it (or rather, instances where it is very convenient at least, like in St?fan's books). 
So I am wondering how much it would take to implement such an enhancement for the standard ndarray... > > Cheers, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?Derek > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From derek at astro.physik.uni-goettingen.de Thu Jul 28 20:25:51 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Fri, 29 Jul 2011 02:25:51 +0200 Subject: [Numpy-discussion] Can I index array starting with 1? In-Reply-To: References: Message-ID: <92EF6379-A9A7-4E3B-9603-F6A174D555BB@astro.physik.uni-goettingen.de> On 29.07.2011, at 1:38AM, Anne Archibald wrote: > The can is open and the worms are everywhere, so: > > The big problem with one-based indexing for numpy is interpretation. > In python indexing, -1 is the last element of the array, and ranges > have a specific meaning. In a hypothetical one-based indexing scheme, > would the last element be element 0? if not, what does looking up zero > do? What about ranges - do ranges still include the first endpoint and > not the second? I suppose one could choose the most pythonic of the > 1-based conventions, but do any of them provide from-the-end indexing > without special syntax? > I forgot, this definitely needs to be preserved for ndarray! > Once one had decided what to do, implementation would be pretty easy - > just make a subclass of ndarray that replaces the indexing function. In fact, St?fan's reshuffling trick does nearly everything I would expect for using negative indices, maybe the only functionality needed to implement is 1. define an attribute like x.start that could tell appropriate functions (e.g. for print(x) or plot(x)) the "zero-point", so x would be evaluated e.g. at x[-5], wrapping around at [x-1], x[0] to x[-6]... Should have the advantage that anything that's not yet aware of this attribute could simply ignore it. 2. allow to automatically set this starting point when creating something like "x = np.zeros(-5:7)" or setting a shape to (-5:7) - but maybe the latter is leading into very dangerous territory already... Cheers, Derek From rblove_lists at comcast.net Thu Jul 28 22:48:29 2011 From: rblove_lists at comcast.net (Robert Love) Date: Thu, 28 Jul 2011 21:48:29 -0500 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: <20110728124218.GK3465@earth.li> References: <20110716145010.GY3465@earth.li> <54045EAB-6486-413F-BF3B-4AC2DC9E1C1E@comcast.net> <20110728124218.GK3465@earth.li> Message-ID: On Jul 28, 2011, at 7:42 AM, Martin Ling wrote: > On Wed, Jul 27, 2011 at 10:29:08PM -0500, Robert Love wrote: >> >> To use quaternions I find I often need conversion to/from matrices and >> to/from Euler angles. Will you add that functionality? > > Yes, I intend to. Note that these conversions are already available in > the standalone (non-dtype) implementation in imusim.maths.quaternions: > > http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#setFromEuler > http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#toEuler > http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#setFromMatrix > http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#toMatrix > > I should do a new release though - the Euler methods there only support > ZYX and ZXY order conversions, my development version supports any order. 
> >> Will you handle the left versor and right versor versions? > > I don't know what this means. Please enlighten me and I'll be happy to > try! I thought a 'right versor' was a unit quaternion representing an > angle of 90 degrees (as in 'right angle') - I don't see what a 'left' > one would be. > Quaternions have a "handedness" or a sign convention. The recently departed Space Shuttle used a Left versor convention while most things, including Space Station, use the right versor convention, in their flight software. Made for frequent confusion. Let me see if I can illustrate by showing the functions I use for converting a matrix to a quaternion. def Quaternion_Of(m): """ Returns a quaternion in the right versor sense. """ q = N.zeros(4,float) q[0] = 0.5*sqrt(1.0 + m[0,0] + m[1,1] + m[2,2]) q04_inv = 1.0/(4.0*q[0]) q[1] = (m[1,2] - m[2,1])*q04_inv q[2] = (m[2,0] - m[0,2])*q04_inv q[3] = (m[0,1] - m[1,0])*q04_inv return q def Quaternion_Of(m): """ Returns a quaternion in the left versor sense. """ q = N.zeros(4,float) q[0] = 0.5*sqrt(1.0 + m[0,0] + m[1,1] + m[2,2]) q04_inv = 1.0/(4.0*q[0]) q[1] = (m[2,1] - m[1,2])*q04_inv q[2] = (m[0,2] - m[2,0])*q04_inv q[3] = (m[1,0] - m[0,1])*q04_inv return q Or transforming a vector using the different conventions. def Transform(q,v): """ Returns the vector part of q*vq which transforms v from one coordinate system to another. Right Versor """ u = Q.Vector_Part(q) return 2.0*(q[0]*N.cross(v,u) + N.dot(v,u)*u + (q[0]*q[0] - 0.5)*v) def Transform(q,v): """ Returns the vector part of q*vq which transforms v from one coordinate system to another. Left Versor """ u = Q.Vector_Part(q) return 2.0*(q[0]*N.cross(u,v) + N.dot(u,v)*u + (q[0]*q[0] - 0.5)*v) From robert.kern at gmail.com Thu Jul 28 23:16:58 2011 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 28 Jul 2011 22:16:58 -0500 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: References: <20110716145010.GY3465@earth.li> <54045EAB-6486-413F-BF3B-4AC2DC9E1C1E@comcast.net> <20110728124218.GK3465@earth.li> Message-ID: On Thu, Jul 28, 2011 at 21:48, Robert Love wrote: > > On Jul 28, 2011, at 7:42 AM, Martin Ling wrote: > >> On Wed, Jul 27, 2011 at 10:29:08PM -0500, Robert Love wrote: >>> >>> To use quaternions I find I often need conversion to/from matrices and >>> to/from Euler angles. ?Will you add that functionality? >> >> Yes, I intend to. Note that these conversions are already available in >> the standalone (non-dtype) implementation in imusim.maths.quaternions: >> >> http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#setFromEuler >> http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#toEuler >> http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#setFromMatrix >> http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#toMatrix >> >> I should do a new release though - the Euler methods there only support >> ZYX and ZXY order conversions, my development version supports any order. >> >>> Will you handle the left versor and right versor versions? >> >> I don't know what this means. Please enlighten me and I'll be happy to >> try! I thought a 'right versor' was a unit quaternion representing an >> angle of 90 degrees (as in 'right angle') - I don't see what a 'left' >> one would be. >> > > Quaternions have a "handedness" or a sign convention. 
?The recently departed Space Shuttle used a Left versor convention while most things, including Space Station, use the right versor convention, in their flight software. ?Made for frequent confusion. For what it's worth, I have found this paper by James Diebel to be the most complete listing of all of the different conventions and conversions amongst quaternions, Euler angles, and rotation vectors: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.110.5134 -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From robert.kern at gmail.com Thu Jul 28 23:25:47 2011 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 28 Jul 2011 22:25:47 -0500 Subject: [Numpy-discussion] ticket 1793 In-Reply-To: References: <483275D8-CBD9-4ECD-AD42-CE4AAF27431B@gmail.com> Message-ID: 2011/7/28 St?fan van der Walt : > On Thu, Jul 28, 2011 at 2:05 PM, Paul Anton Letnes > wrote: >> In my quest for a bug-free numpy I have, I think, fixed ticket 1793. >> https://github.com/numpy/numpy/pull/123 > > This brings up an interesting question. ?When raising warnings, they > only show for the first time, unless the system is specially > configured. Note that the contents of the message is taken into account; *unique* messages only show up for the first time, but different messages for the same warning issued from the same place will all appear. Taking a quick glance at the pull request, it looks like the filename is included in the message, so the warning will appear once for each file that needs to be warned about. This seems entirely appropriate behavior to me. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From charlesr.harris at gmail.com Thu Jul 28 23:33:40 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 28 Jul 2011 21:33:40 -0600 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: References: <20110716145010.GY3465@earth.li> <54045EAB-6486-413F-BF3B-4AC2DC9E1C1E@comcast.net> <20110728124218.GK3465@earth.li> Message-ID: On Thu, Jul 28, 2011 at 8:48 PM, Robert Love wrote: > > On Jul 28, 2011, at 7:42 AM, Martin Ling wrote: > > > On Wed, Jul 27, 2011 at 10:29:08PM -0500, Robert Love wrote: > >> > >> To use quaternions I find I often need conversion to/from matrices and > >> to/from Euler angles. Will you add that functionality? > > > > Yes, I intend to. Note that these conversions are already available in > > the standalone (non-dtype) implementation in imusim.maths.quaternions: > > > > > http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#setFromEuler > > > http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#toEuler > > > http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#setFromMatrix > > > http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#toMatrix > > > > I should do a new release though - the Euler methods there only support > > ZYX and ZXY order conversions, my development version supports any order. > > > >> Will you handle the left versor and right versor versions? > > > > I don't know what this means. Please enlighten me and I'll be happy to > > try! 
I thought a 'right versor' was a unit quaternion representing an > > angle of 90 degrees (as in 'right angle') - I don't see what a 'left' > > one would be. > > > > Quaternions have a "handedness" or a sign convention. The recently > departed Space Shuttle used a Left versor convention while most things, > including Space Station, use the right versor convention, in their flight > software. Made for frequent confusion. > > Let me see if I can illustrate by showing the functions I use for > converting a matrix to a quaternion. > > > def Quaternion_Of(m): > """ > Returns a quaternion in the right versor sense. > """ > > q = N.zeros(4,float) > > q[0] = 0.5*sqrt(1.0 + m[0,0] + m[1,1] + m[2,2]) > > q04_inv = 1.0/(4.0*q[0]) > q[1] = (m[1,2] - m[2,1])*q04_inv > q[2] = (m[2,0] - m[0,2])*q04_inv > q[3] = (m[0,1] - m[1,0])*q04_inv > > return q > > > > def Quaternion_Of(m): > """ > Returns a quaternion in the left versor sense. > """ > > q = N.zeros(4,float) > > q[0] = 0.5*sqrt(1.0 + m[0,0] + m[1,1] + m[2,2]) > > q04_inv = 1.0/(4.0*q[0]) > q[1] = (m[2,1] - m[1,2])*q04_inv > q[2] = (m[0,2] - m[2,0])*q04_inv > q[3] = (m[1,0] - m[0,1])*q04_inv > > return q > > > Or transforming a vector using the different conventions. > > > def Transform(q,v): > """ > Returns the vector part of q*vq which transforms v from one > coordinate system to another. Right Versor > """ > u = Q.Vector_Part(q) > return 2.0*(q[0]*N.cross(v,u) + > N.dot(v,u)*u + > (q[0]*q[0] - 0.5)*v) > > > def Transform(q,v): > """ > Returns the vector part of q*vq which transforms v from one > coordinate system to another. Left Versor > """ > u = Q.Vector_Part(q) > return 2.0*(q[0]*N.cross(u,v) + > N.dot(u,v)*u + > (q[0]*q[0] - 0.5)*v) > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Jul 28 23:36:16 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 28 Jul 2011 21:36:16 -0600 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: References: <20110716145010.GY3465@earth.li> <54045EAB-6486-413F-BF3B-4AC2DC9E1C1E@comcast.net> <20110728124218.GK3465@earth.li> Message-ID: On Thu, Jul 28, 2011 at 8:48 PM, Robert Love wrote: > > On Jul 28, 2011, at 7:42 AM, Martin Ling wrote: > > > On Wed, Jul 27, 2011 at 10:29:08PM -0500, Robert Love wrote: > >> > >> To use quaternions I find I often need conversion to/from matrices and > >> to/from Euler angles. Will you add that functionality? > > > > Yes, I intend to. Note that these conversions are already available in > > the standalone (non-dtype) implementation in imusim.maths.quaternions: > > > > > http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#setFromEuler > > > http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#toEuler > > > http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#setFromMatrix > > > http://www.imusim.org/docs/api/imusim.maths.quaternions.Quaternion-class.html#toMatrix > > > > I should do a new release though - the Euler methods there only support > > ZYX and ZXY order conversions, my development version supports any order. > > > >> Will you handle the left versor and right versor versions? > > > > I don't know what this means. Please enlighten me and I'll be happy to > > try! 
I thought a 'right versor' was a unit quaternion representing an > > angle of 90 degrees (as in 'right angle') - I don't see what a 'left' > > one would be. > > > > Quaternions have a "handedness" or a sign convention. The recently > departed Space Shuttle used a Left versor convention while most things, > including Space Station, use the right versor convention, in their flight > software. Made for frequent confusion. > > Let me see if I can illustrate by showing the functions I use for > converting a matrix to a quaternion. > > > def Quaternion_Of(m): > """ > Returns a quaternion in the right versor sense. > """ > > q = N.zeros(4,float) > > q[0] = 0.5*sqrt(1.0 + m[0,0] + m[1,1] + m[2,2]) > > q04_inv = 1.0/(4.0*q[0]) > q[1] = (m[1,2] - m[2,1])*q04_inv > q[2] = (m[2,0] - m[0,2])*q04_inv > q[3] = (m[0,1] - m[1,0])*q04_inv > > return q > > > > def Quaternion_Of(m): > """ > Returns a quaternion in the left versor sense. > """ > > q = N.zeros(4,float) > > q[0] = 0.5*sqrt(1.0 + m[0,0] + m[1,1] + m[2,2]) > > q04_inv = 1.0/(4.0*q[0]) > q[1] = (m[2,1] - m[1,2])*q04_inv > q[2] = (m[0,2] - m[2,0])*q04_inv > q[3] = (m[1,0] - m[0,1])*q04_inv > > return q > > > Or transforming a vector using the different conventions. > > > def Transform(q,v): > """ > Returns the vector part of q*vq which transforms v from one > coordinate system to another. Right Versor > """ > u = Q.Vector_Part(q) > return 2.0*(q[0]*N.cross(v,u) + > N.dot(v,u)*u + > (q[0]*q[0] - 0.5)*v) > > > def Transform(q,v): > """ > Returns the vector part of q*vq which transforms v from one > coordinate system to another. Left Versor > """ > u = Q.Vector_Part(q) > return 2.0*(q[0]*N.cross(u,v) + > N.dot(u,v)*u + > (q[0]*q[0] - 0.5)*v) > > So they differ in whether the rotation about the direction of the vector part is left handed or right handed? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From johan.rade at gmail.com Fri Jul 29 03:27:47 2011 From: johan.rade at gmail.com (=?ISO-8859-1?Q?Johan_R=E5de?=) Date: Fri, 29 Jul 2011 09:27:47 +0200 Subject: [Numpy-discussion] C-API: PyTypeObject* for NumPy scalar types In-Reply-To: References: Message-ID: On 2011-07-28 07:50, Johan R?de wrote: > How do I get the PyTypeObject* for a NumPy scalar type such as np.uint8? > > (The reason I'm asking is the following: > I'm writing a C++ extension module. The Python interface to the module > has a function f that takes a NumPy scalar type as an argument, for > instance f(np.uint8). Then the corresponding C++ function receives a > PyObject* and needs to decide which type object it points to.) > > --Johan I have figured out the answer: PyArray_TypeObjectFromType(NPY_UINT8) --Johan From yoshi at rokuko.net Fri Jul 29 04:52:12 2011 From: yoshi at rokuko.net (Yoshi Rokuko) Date: Fri, 29 Jul 2011 10:52:12 +0200 Subject: [Numpy-discussion] c-api return two arrays Message-ID: <201107290852.p6T8qCIE014839@lotus.yokuts.org> hey, i have an algorithm that computes two matrices like that: A(i,k) = (x(i,k) + y(i,k))/norm B(i,k) = (x(i,k) - y(i,k))/norm it would be convenient to have the method like that: >>> A, B = mod.meth(C, prob=.95) is ith possible to return two arrays? best regards From meine at informatik.uni-hamburg.de Fri Jul 29 05:31:24 2011 From: meine at informatik.uni-hamburg.de (Hans Meine) Date: Fri, 29 Jul 2011 11:31:24 +0200 Subject: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? 
(was: Problem with ufunc of a numpy.ndarray derived class) In-Reply-To: References: <201107211656.21611.meine@informatik.uni-hamburg.de> <201107281658.24102.meine@informatik.uni-hamburg.de> Message-ID: <201107291131.24260.meine@informatik.uni-hamburg.de> Am Donnerstag, 28. Juli 2011, 17:42:38 schrieb Matthew Brett: > If I understand you correctly, the problem is that, for 1.5.1: > >>> class Test(np.ndarray): pass > >>> type(np.min(Test((1,)))) > > > > and for 1.6.0 (and current trunk): > >>> class Test(np.ndarray): pass > >>> type(np.min(Test((1,)))) > > Correct. This is the behavior change we're talking (/complaining) about. > So, 1.6.0 is returning a zero-dimensional scalar of the given type, > and 1.5.1 returns a python scalar. I am not sure about the terminology; I would have said that 1.5.1 returns a "numpy array scalar" (as opposed to a built-in python scalar), but at least we agree on the /scalar/ part here. OTOH, 1.6.0 does not return a zero-dimensional /scalar/, but a zero-rank /array/, which may be used to represent scalars in some contexts. But IMO this makes array subclassing much harder, since you need to special-case zero- dimensional arrays in all your member functions. IOW, it is hard to make a zero-dimensional instance of your array subclass "quack" like a scalar. And I do not see any need / reason for the above change. Right now, I found http://projects.scipy.org/numpy/wiki/ZeroRankArray which is an interesting read in this context, but does not seem to mention this change (age of that page is 5-6 years). > Zero dimensional scalars are designed to behave in a similar way to > python scalars, so the change should be all but invisible in practice. Now I am not sure if you're confusing something. I do not complain about min returning a numpy.float64 instead of a float, but about numpy wrapping the scalar in an instance of my *array* subclass. This is also inconsistent with ndarray, and the matrix class contains an explicit workaround, returning self[0,0] in cases where a scalar is expected: class matrix(N.ndarray): [...] def min(self, axis=None, out=None): """...""" return N.ndarray.min(self, axis, out)._align(axis) def _align(self, axis): """A convenience function for operations that need to preserve axis orientation. """ if axis is None: return self[0,0] <=== HERE elif axis==0: return self elif axis==1: return self.transpose() else: raise ValueError, "unsupported axis" > Was there a particular case you ran into where this was a problem? Yes, but it is not easy to reproduce for me right now (changed versions of numpy, our software, etc.). Basically, the problem arose because our ndarray subclass does not support zero-rank-instances fully. (And previously, there was no need for that.) Have a nice day, Hans From numpy-discussion at maubp.freeserve.co.uk Fri Jul 29 05:50:17 2011 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Fri, 29 Jul 2011 10:50:17 +0100 Subject: [Numpy-discussion] c-api return two arrays In-Reply-To: <201107290852.p6T8qCIE014839@lotus.yokuts.org> References: <201107290852.p6T8qCIE014839@lotus.yokuts.org> Message-ID: On Fri, Jul 29, 2011 at 9:52 AM, Yoshi Rokuko wrote: > > hey, i have an algorithm that computes two matrices like that: > > A(i,k) = (x(i,k) + y(i,k))/norm > B(i,k) = (x(i,k) - y(i,k))/norm > > it would be convenient to have the method like that: > >>>> A, B = mod.meth(C, prob=.95) > > is ith possible to return two arrays? > > best regards Yes, return a tuple of two elements. e.g. 
def make_range(center, spread): return center-spread, center+spread low, high = make_range(5,1) assert low == 4 assert high == 6 Peter From meine at informatik.uni-hamburg.de Fri Jul 29 06:12:19 2011 From: meine at informatik.uni-hamburg.de (Hans Meine) Date: Fri, 29 Jul 2011 12:12:19 +0200 Subject: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? (was: Problem with ufunc of a numpy.ndarray derived class) In-Reply-To: <201107291131.24260.meine@informatik.uni-hamburg.de> References: <201107211656.21611.meine@informatik.uni-hamburg.de> <201107291131.24260.meine@informatik.uni-hamburg.de> Message-ID: <201107291212.19475.meine@informatik.uni-hamburg.de> Am Freitag, 29. Juli 2011, 11:31:24 schrieb Hans Meine: > Am Donnerstag, 28. Juli 2011, 17:42:38 schrieb Matthew Brett: > > Was there a particular case you ran into where this was a problem? > [...] > Basically, the problem arose because our ndarray subclass does not support > zero-rank-instances fully. (And previously, there was no need for that.) I just reproduced the problem, it was this exception: /home/hmeine/new_numpy/lib64/python2.6/site-packages/vigra/arraytypes.pyc in reshape(self, shape, order) 587 588 def reshape(self, shape, order='C'): --> 589 res = numpy.ndarray.reshape(self, shape, order) 590 res.axistags = AxisTags(res.ndim) 591 return res TypeError: an integer is required The problem is that 'self' has become a zero-rank array, and those cannot be reshaped in order to add singleton dimensions anymore. IOW, if you implement sth. like broadcasting, this is made much harder. Best, Hans From yoshi at rokuko.net Fri Jul 29 06:37:45 2011 From: yoshi at rokuko.net (Yoshi Rokuko) Date: Fri, 29 Jul 2011 12:37:45 +0200 Subject: [Numpy-discussion] c-api return two arrays In-Reply-To: References: <201107290852.p6T8qCIE014839@lotus.yokuts.org> Message-ID: <201107291037.p6TAbjdJ016526@lotus.yokuts.org> +-------------------------------------------- Peter -----------+ > On Fri, Jul 29, 2011 at 9:52 AM, Yoshi Rokuko wrote: > > > > hey, i have an algorithm that computes two matrices like that: > > > > A(i,k) = (x(i,k) + y(i,k))/norm > > B(i,k) = (x(i,k) - y(i,k))/norm > > > > it would be convenient to have the method like that: > > > >>>> A, B = mod.meth(C, prob=.95) > > > > is ith possible to return two arrays? > > > > best regards > > Yes, return a tuple of two elements. e.g. > > def make_range(center, spread): > return center-spread, center+spread > > low, high = make_range(5,1) > assert low == 4 > assert high == 6 > sorry i had important information only in subject: i'm writing in c - so i'm talking about something like: static PyObject * mod_meth(PyObject *self, PyObject* args) { PyObject *arg; PyArrayObject *A, *B, *C; [...] a magic merge: mergedouts = (A, B) [...] return PyArray_Return(mergedouts); } is it clear what i mean? - yoshi From pav at iki.fi Fri Jul 29 07:22:39 2011 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 29 Jul 2011 11:22:39 +0000 (UTC) Subject: [Numpy-discussion] c-api return two arrays References: <201107290852.p6T8qCIE014839@lotus.yokuts.org> Message-ID: Fri, 29 Jul 2011 10:52:12 +0200, Yoshi Rokuko wrote: [clip] >>>> A, B = mod.meth(C, prob=.95) > > is ith possible to return two arrays? The way to do this in Python is to build a tuple with Py_BuildValue("OO", A, B) and return that. 
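A quick way to sanity-check the C version is to compare it against a pure-NumPy reference
for the same two matrices (names here are only illustrative, not the actual module API):

import numpy as np

def reference(x, y, norm):
    # pure-NumPy version of the two matrices the extension computes
    return (x + y) / norm, (x - y) / norm

# A, B = mod.meth(C, prob=.95)     # the C extension, returning the 2-tuple
# a, b = reference(x, y, norm)     # with x, y, norm derived from C as in the C code
# np.testing.assert_allclose(A, a)
# np.testing.assert_allclose(B, b)
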
From charlesr.harris at gmail.com Fri Jul 29 09:28:37 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 29 Jul 2011 07:28:37 -0600 Subject: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? (was: Problem with ufunc of a numpy.ndarray derived class) In-Reply-To: <201107291212.19475.meine@informatik.uni-hamburg.de> References: <201107211656.21611.meine@informatik.uni-hamburg.de> <201107291131.24260.meine@informatik.uni-hamburg.de> <201107291212.19475.meine@informatik.uni-hamburg.de> Message-ID: On Fri, Jul 29, 2011 at 4:12 AM, Hans Meine wrote: > Am Freitag, 29. Juli 2011, 11:31:24 schrieb Hans Meine: > > Am Donnerstag, 28. Juli 2011, 17:42:38 schrieb Matthew Brett: > > > Was there a particular case you ran into where this was a problem? > > [...] > > Basically, the problem arose because our ndarray subclass does not > support > > zero-rank-instances fully. (And previously, there was no need for that.) > > I just reproduced the problem, it was this exception: > > /home/hmeine/new_numpy/lib64/python2.6/site-packages/vigra/arraytypes.pyc > in > reshape(self, shape, order) > 587 > 588 def reshape(self, shape, order='C'): > --> 589 res = numpy.ndarray.reshape(self, shape, order) > 590 res.axistags = AxisTags(res.ndim) > 591 return res > > TypeError: an integer is required > > The problem is that 'self' has become a zero-rank array, and those cannot > be > reshaped in order to add singleton dimensions anymore. IOW, if you > implement > sth. like broadcasting, this is made much harder. > > What is self and shape in this example? Out of curiosity, if you don't support all the ndarray operations, why are you subclassing ndarray? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Jul 29 09:49:35 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 29 Jul 2011 08:49:35 -0500 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: Message-ID: On Thu, Jul 28, 2011 at 3:09 PM, Nathaniel Smith wrote: > I have a different question about this than the rest of the thread. I'm > confused at why there isn't a programmatic way to create a datetime dtype, > other than by going through this special string-based mini-language. I guess > I generally think of string-based dtype descriptors as being a legacy thing > necessary for compatibility, but probably better to avoid in new code, now > that we have nice python ways to describe dtypes with scalar types and such. > Probably that's a minority opinion, but even putting it aside: it certainly > isn't the case that we can describe arbitrary dtypes using strings right now > - think of record types and so on. And even restricting ourselves to atomic > styles, I'm skeptical about this claim that we'll be able to use strings for > everything in the future, too. My pet possible future dtype is one for > categorical data, which would be parametrized by the set of possible > categories; I don't relish the idea of making up some ad hoc syntax for > specifying such lists within the dtype mini-language. > > So is the plan actually to promote strings as the canonical way of > describing dtypes? Aside from the question of what repr does, shouldn't > there actually be some sort of syntax like dtype=np.datetime64("D") > available as a working option? > I've thought about having something like this in addition to the string format, but haven't worked it through. Calling np.datetime64("D") is creating a datetime64 scalar. 
What would more closely match the string syntax is np.datetime64["D"], which would require overloading __getitem__ in the type object, something I haven't tried. Since this is something that could be just as easily added later, I was treating it as pretty low on the long datetime TODO list. I'm personally more in favour of there being a canonical string representation of each dtype, similar to the way Python repr(obj) is intended to be able to reconstruct the object where possible. It would be nice to come up with an unambiguous string format for struct dtypes, that is definitely something I see as missing. Being able to construct the dtype without using the string is very good, though, too. -Mark > - Nathaniel > On Jul 27, 2011 10:55 AM, "Mark Wiebe" wrote: > > This was the most consistent way to deal with the parameterized dtype in > the > > repr, making it more future-proof at the same time. It was producing > reprs > > like "array(['2011-01-01'], dtype=datetime64[D])", which is clearly > wrong, > > and putting quotes around it makes it work in general for all possible > > dtypes, present and future. > > > > -Mark > > > > On Wed, Jul 27, 2011 at 12:50 PM, Matthew Brett >wrote: > > > >> Hi, > >> > >> I see that (current trunk): > >> > >> In [9]: np.ones((1,), dtype=bool) > >> Out[9]: array([ True], dtype='bool') > >> > >> - whereas (1.5.1): > >> > >> In [2]: np.ones((1,), dtype=bool) > >> Out[2]: array([ True], dtype=bool) > >> > >> That is breaking quite a few doctests. What is the reason for the > >> change? Something to do with more planned dtypes? > >> > >> Thanks a lot, > >> > >> Matthew > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dipo.elegbede at gmail.com Fri Jul 29 09:55:15 2011 From: dipo.elegbede at gmail.com (DIPO ELEGBEDE) Date: Fri, 29 Jul 2011 14:55:15 +0100 Subject: [Numpy-discussion] Numpy Help Message-ID: Hi All, I am fresh on this list and would be looking forward to as much help as I can get. I am hoping to develop ino helping others too after a short while. Kindly help me with this task. I would appreciate if you can point me to an example or brief explanation. I have a 4 by 4 matrix filled with 0s, 1s and 2s. I want to loop through the whole matrix to get the fields with 1s and 2s only and then count how many ones and how many twos. Please help. -- Dipo Elegbede,OCA 08033299270,08077682428 -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Jul 29 10:07:16 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 29 Jul 2011 09:07:16 -0500 Subject: [Numpy-discussion] boolean array indexing and broadcasting rules Message-ID: As part of supporting the NA mask, I've rewritten boolean indexing. 
Here's a timing comparison of my version versus a previous version: In [2]: np.__version__ Out[2]: '1.4.1' In [3]: a = np.zeros((1000,1000)) In [4]: mask = np.random.rand(1000,1000) > 0.5 In [5]: timeit a[mask] = 1.5 10 loops, best of 3: 71.5 ms per loop In [2]: np.__version__ Out[2]: '2.0.0.dev-a1e98d4' In [3]: a = np.zeros((1000,1000)) In [4]: mask = np.random.rand(1000,1000) > 0.5 In [5]: timeit a[mask] = 1.5 100 loops, best of 3: 12.6 ms per loop That's a 5.6 times speedup. Unfortunately, it turns out that the old code didn't use NumPy broadcasting rules. This change found a bug in the hardmask code of numpy.ma, so I think there's definitely a case to be made that it is improvement, but at the same time it looks like the NumPy and SciPy polynomial code is using this behavior. I get test errors of the following form in Scipy: ====================================================================== ERROR: test_orthogonal_eval.TestPolys.test_sh_chebyu ---------------------------------------------------------------------- Traceback (most recent call last): File "/home/mwiebe/virtualenvs/dev/lib/python2.7/site-packages/nose-1.0.0-py2.7.egg/nose/case.py", line 187, in runTest self.test(*self.arg) File "/home/mwiebe/virtualenvs/dev/lib/python2.7/site-packages/scipy/special/tests/test_orthogonal_eval.py", line 111, in test_sh_chebyu param_ranges=[], x_range=[0, 1]) File "/home/mwiebe/virtualenvs/dev/lib/python2.7/site-packages/scipy/special/tests/test_orthogonal_eval.py", line 66, in check_poly ds.check() File "/home/mwiebe/virtualenvs/dev/lib/python2.7/site-packages/scipy/special/_testutils.py", line 165, in check data = data[param_mask] ValueError: operands could not be broadcast together with shapes (100,3) (100) What are people's thoughts on this predicament? Cheers, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Jul 29 10:18:51 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 29 Jul 2011 09:18:51 -0500 Subject: [Numpy-discussion] dtype repr change? In-Reply-To: References: Message-ID: I've merged a pull request from Alok Singhal which implements Robert Kern's idea for this. Thanks, Mark On Wed, Jul 27, 2011 at 12:50 PM, Matthew Brett wrote: > Hi, > > I see that (current trunk): > > In [9]: np.ones((1,), dtype=bool) > Out[9]: array([ True], dtype='bool') > > - whereas (1.5.1): > > In [2]: np.ones((1,), dtype=bool) > Out[2]: array([ True], dtype=bool) > > That is breaking quite a few doctests. What is the reason for the > change? Something to do with more planned dtypes? > > Thanks a lot, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin-numpy at earth.li Fri Jul 29 10:24:35 2011 From: martin-numpy at earth.li (Martin Ling) Date: Fri, 29 Jul 2011 15:24:35 +0100 Subject: [Numpy-discussion] Numpy Help In-Reply-To: References: Message-ID: <20110729142435.GT3465@earth.li> On Fri, Jul 29, 2011 at 02:55:15PM +0100, DIPO ELEGBEDE wrote: > > I have a 4 by 4 matrix filled with 0s, 1s and 2s. > I want to loop through the whole matrix to get the fields with 1s and 2s > only and then count how many ones and how many twos. 
Try this: >>> m = matrix('1,2,0,2;2,2,1,0;0,2,0,2;1,1,0,2') >>> m matrix([[1, 2, 0, 2], [2, 2, 1, 0], [0, 2, 0, 2], [1, 1, 0, 2]]) >>> sum(m == 1) 4 >>> sum(m == 2) 7 This works because 'm == 1' evaluates to a boolean matrix whose elements are true where that element of m is equal to 1: >>> m == 1 matrix([[ True, False, False, False], [False, False, True, False], [False, False, False, False], [ True, True, False, False]], dtype=bool) Calling sum() on this matrix adds up the number of true elements. I suggest you read the NumPy tutorial: http://www.scipy.org/Tentative_NumPy_Tutorial This sort of thing is covered under 'Indexing with Boolean Arrays': http://www.scipy.org/Tentative_NumPy_Tutorial#head-d55e594d46b4f347c20efe1b4c65c92779f06268 Martin From mwwiebe at gmail.com Fri Jul 29 10:30:37 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 29 Jul 2011 09:30:37 -0500 Subject: [Numpy-discussion] Issues with adding new dtypes - customizing ndarray attributes In-Reply-To: <20110728135450.GL3465@earth.li> References: <20110728135450.GL3465@earth.li> Message-ID: On Thu, Jul 28, 2011 at 8:54 AM, Martin Ling wrote: > Hi, > > I'd like to kick off some discussion on general issues I've encountered > while developing the quaternion dtype (see other thread, and the code > at: https://github.com/martinling/numpy_quaternion) > > The basic issue is that the attributes of ndarray cannot be adapted > to the dtype of a given array. Indeed, their behaviour can't be changed > at all without patching numpy itself. > > There are essentially four cases of the problem: > > 1. Attributes which do the wrong thing even though there is a mechanism > that should let them do the right thing, e.g: > > >>> a = array([quaternion(1,2,3,4), quaternion(5,6,7,8)]) > > >>> conjugate(a) # correct, calls conjugate ufunc I defined > array([quaternion(1, -2, -3, -4), quaternion(5, -6, -7, -8)], > dtype=quaternion) > > >>> a.conjugate() # incorrect, why doesn't it do the same? > array([quaternion(1, 2, 3, 4), quaternion(5, 6, 7, 8)], dtype=quaternion) > > >>> min(a) # works, calls min ufunc I defined > quaternion(1, 2, 3, 4) > > >>> a.min() # fails, why doesn't it do the same? > ValueError: No cast function available. > > 2. Attributes that do the wrong thing with no mechanism to override them: > > >>> array([q.real for q in a]) > array([ 1., 5.]) > > >>> a.real # would like this to return the same, can't make it do so > array([quaternion(1, 2, 3, 4), quaternion(5, 6, 7, 8)], dtype=quaternion) > > 3. Attributes that don't exist and could be added to suit the dtype: > > >>> array([q.y for q in a]) > array([ 3., 7.]) > > >>> a.y # would like this to return the same, can't make it do so > AttributeError: 'numpy.ndarray' object has no attribute 'y' > > 4. Attributes that already exist and make no sense for some dtypes: > > >>> sa = array(['foo', 'bar', 'baz']) > > >>> sa.imag # why can I even do this? > array(['', '', ''], dtype='|S3') > > We had ?ome discussion about this at the SciPy conference sprints and > the consensus seemed to be that allowing dtypes to customize the > attributes of ndarrays would be a good thing. This would also be useful > for struct arrays, datetime arrays, etc. > > What do people think? > I was part of this discussion at SciPy, and while I was initially skeptical of giving dtypes the ability to add properties and functions to arrays built with them, the discussion at the SciPy sprint convinced me otherwise. 
Since the author of a dtype is fully aware of what properties and functions an array already have, they can avoid name collisions in a straightforward way. This is different from the recarray case, where assigning field names can be a lot more haphazard, and it's perfectly sane to want a field called 'sum' conflicting with the arr.sum() array method. One example where this would help is with the datetime64 type. I suggested that it might be good to automatically convert Python's datetime objects into datetime64 arrays. Here's a pull request Ben Walsh did towards that: https://github.com/numpy/numpy/pull/111 The point he raises, that np.array([datetime.date(2000, 1, 1)])[0].year would fail, could be addressed through this mechanism. -Mark > > > Martin > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Fri Jul 29 10:57:43 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 29 Jul 2011 16:57:43 +0200 Subject: [Numpy-discussion] boolean array indexing and broadcasting rules In-Reply-To: References: Message-ID: <9A58AB2B-E80E-44CF-9549-EF4A07CC7C8B@gmail.com> On Jul 29, 2011, at 4:07 PM, Mark Wiebe wrote: > As part of supporting the NA mask, I've rewritten boolean indexing. Here's a timing comparison of my version versus a previous version: > > In [2]: np.__version__ > Out[2]: '1.4.1' > In [3]: a = np.zeros((1000,1000)) > In [4]: mask = np.random.rand(1000,1000) > 0.5 > In [5]: timeit a[mask] = 1.5 > 10 loops, best of 3: 71.5 ms per loop > > In [2]: np.__version__ > Out[2]: '2.0.0.dev-a1e98d4' > In [3]: a = np.zeros((1000,1000)) > In [4]: mask = np.random.rand(1000,1000) > 0.5 > In [5]: timeit a[mask] = 1.5 > 100 loops, best of 3: 12.6 ms per loop > > That's a 5.6 times speedup. > > Unfortunately, it turns out that the old code didn't use NumPy broadcasting rules. This change found a bug in the hardmask code of numpy.ma, Good! And I wouldn't worry too much about that (as nobody really uses hard masks anyway) From dipo.elegbede at gmail.com Fri Jul 29 10:58:37 2011 From: dipo.elegbede at gmail.com (DIPO ELEGBEDE) Date: Fri, 29 Jul 2011 15:58:37 +0100 Subject: [Numpy-discussion] Numpy Help In-Reply-To: <20110729142435.GT3465@earth.li> References: <20110729142435.GT3465@earth.li> Message-ID: Thanks Martins, that did the magic. Thanks so much. I'm on the tutorials now. Regards. On 29 Jul 2011 15:24, "Martin Ling" wrote: > On Fri, Jul 29, 2011 at 02:55:15PM +0100, DIPO ELEGBEDE wrote: >> >> I have a 4 by 4 matrix filled with 0s, 1s and 2s. >> I want to loop through the whole matrix to get the fields with 1s and 2s >> only and then count how many ones and how many twos. > > Try this: > >>>> m = matrix('1,2,0,2;2,2,1,0;0,2,0,2;1,1,0,2') > >>>> m > matrix([[1, 2, 0, 2], > [2, 2, 1, 0], > [0, 2, 0, 2], > [1, 1, 0, 2]]) > >>>> sum(m == 1) > 4 >>>> sum(m == 2) > 7 > > This works because 'm == 1' evaluates to a boolean matrix whose elements > are true where that element of m is equal to 1: > >>>> m == 1 > matrix([[ True, False, False, False], > [False, False, True, False], > [False, False, False, False], > [ True, True, False, False]], dtype=bool) > > Calling sum() on this matrix adds up the number of true elements. 
> > I suggest you read the NumPy tutorial: > http://www.scipy.org/Tentative_NumPy_Tutorial > > This sort of thing is covered under 'Indexing with Boolean Arrays': > http://www.scipy.org/Tentative_NumPy_Tutorial#head-d55e594d46b4f347c20efe1b4c65c92779f06268 > > > > Martin > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin-numpy at earth.li Fri Jul 29 11:03:14 2011 From: martin-numpy at earth.li (Martin Ling) Date: Fri, 29 Jul 2011 16:03:14 +0100 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: References: <20110716145010.GY3465@earth.li> <54045EAB-6486-413F-BF3B-4AC2DC9E1C1E@comcast.net> <20110728124218.GK3465@earth.li> Message-ID: <20110729150314.GU3465@earth.li> On Thu, Jul 28, 2011 at 09:48:29PM -0500, Robert Love wrote: > > Quaternions have a "handedness" or a sign convention. The recently > departed Space Shuttle used a Left versor convention while most > things, including Space Station, use the right versor convention, in > their flight software. Made for frequent confusion. > > Let me see if I can illustrate by showing the functions I use for > converting a matrix to a quaternion. > > [snip] OK, the difference here is between quaternion conjugates. Your two matrix-to-quaternion functions return the conjugate of each other, and one of your rotate-vector-by-quaternion functions uses the conjugate of the quaternion. When quaternions are used to represent rotations, quaternion conjugates represent "opposite" rotations. E.g. if you have two spacecraft A and B and are considering the rotation between them, you can describe this as the rotation from A->B or as the rotation from B->A. The two quaternions will be the conjugate of each other. Similarly, if two systems describe the same rotation on the same axes but one defines rotation using the right-hand rule and the other the left-hand rule, their two quaternions will be the conjugate of each other. What this is all about is a matter of what co-ordinate frames you choose to interpret things in, not something about a "handedness" of quaternions. Quaternions themselves do not have a "handedness", they are just numbers. If you have systems using opposite co-ordinate frames, or indeed any other differing co-ordinate systems, then the best approach is to explicitly convert everything into a single chosen co-ordinate frame before doing any calculations. Don't entangle your representation changes (e.g. matrix to quaternion) and transforms (e.g. rotate vector) with co-ordinate frame changes. Your 'left versor' functions are the correct ones. Your 'right versor' functions create and use the opposite rotations. I don't know where you got this 'left versor' and 'right versor' terminology from. This thread seems to be the only hit for these terms together on Google. Martin From mwwiebe at gmail.com Fri Jul 29 11:07:05 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 29 Jul 2011 10:07:05 -0500 Subject: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? 
(was: Problem with ufunc of a numpy.ndarray derived class) In-Reply-To: <201107281658.24102.meine@informatik.uni-hamburg.de> References: <201107211656.21611.meine@informatik.uni-hamburg.de> <201107281658.24102.meine@informatik.uni-hamburg.de> Message-ID: On Thu, Jul 28, 2011 at 9:58 AM, Hans Meine wrote: > Hi again! > > Am Donnerstag, 21. Juli 2011, 16:56:21 schrieb Hans Meine: > > import numpy > > > > class Test(numpy.ndarray): > > pass > > > > a1 = numpy.ndarray((1,)) > > a2 = Test((1,)) > > > > assert type(a1.min()) == type(a2.min()), \ > > "%s != %s" % (type(a1.min()), type(a2.min())) > > # --------------------------------------------------- > > > > This code fails with 1.6.0, while it worked in 1.3.0. > > I just tried with 1.5.1 (Ubuntu natty), and it works, too. > > Thus, this behavor-incompatible change happened between 1.5.1 and 1.6.0. > I dug a little bit into the relevant 1.5.x vs 1.6.x code, in the places I would most suspect a change, but couldn't find anything obvious. Here's the ndarray.min method: https://github.com/numpy/numpy/blob/maintenance%2F1.5.x/numpy/core/src/multiarray/methods.c#L267 https://github.com/numpy/numpy/blob/maintenance%2F1.6.x/numpy/core/src/multiarray/methods.c#L257 it calls PyArray_Min, which is here: https://github.com/numpy/numpy/blob/maintenance%2F1.5.x/numpy/core/src/multiarray/calculation.c#L208 https://github.com/numpy/numpy/blob/maintenance%2F1.6.x/numpy/core/src/multiarray/calculation.c#L208 this calls numpy.minimum.reduce, which ends up here: https://github.com/numpy/numpy/blob/maintenance%2F1.5.x/numpy/core/src/umath/ufunc_object.c#L3194 https://github.com/numpy/numpy/blob/maintenance%2F1.6.x/numpy/core/src/umath/ufunc_object.c#L3851 This function calls PyArray_Return(ret), which is how things get converted into the NumPy scalars: https://github.com/numpy/numpy/blob/maintenance%2F1.5.x/numpy/core/src/umath/ufunc_object.c#L3365 https://github.com/numpy/numpy/blob/maintenance%2F1.6.x/numpy/core/src/umath/ufunc_object.c#L4019 Here's the code for PyArray_Return, which was unchanged: https://github.com/numpy/numpy/blob/maintenance%2F1.5.x/numpy/core/src/multiarray/scalarapi.c#L729 https://github.com/numpy/numpy/blob/maintenance%2F1.6.x/numpy/core/src/multiarray/scalarapi.c#L793 Something more subtle is going on. I don't have the time to dig into this more at the moment, but hopefully these pointers into the code can help anyone out there who has the time to investigate further. We definitely need to add some tests for the desired behavior here. Cheers, -Mark > > > I tend to think that this is a bug (after all, a1.min() does not return > > ndarray, but an array scalar), but maybe there is also a good reason for > > this (for us, unexpected) behavor change and a nice solution? > > Unfortunately, I did not receive any answers so far. > > Have a nice day, > Hans > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Fri Jul 29 11:14:00 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 29 Jul 2011 09:14:00 -0600 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: <20110729150314.GU3465@earth.li> References: <20110716145010.GY3465@earth.li> <54045EAB-6486-413F-BF3B-4AC2DC9E1C1E@comcast.net> <20110728124218.GK3465@earth.li> <20110729150314.GU3465@earth.li> Message-ID: On Fri, Jul 29, 2011 at 9:03 AM, Martin Ling wrote: > On Thu, Jul 28, 2011 at 09:48:29PM -0500, Robert Love wrote: > > > > Quaternions have a "handedness" or a sign convention. The recently > > departed Space Shuttle used a Left versor convention while most > > things, including Space Station, use the right versor convention, in > > their flight software. Made for frequent confusion. > > > > Let me see if I can illustrate by showing the functions I use for > > converting a matrix to a quaternion. > > > > [snip] > > OK, the difference here is between quaternion conjugates. Your two > matrix-to-quaternion functions return the conjugate of each other, and > one of your rotate-vector-by-quaternion functions uses the conjugate of > the quaternion. > > When quaternions are used to represent rotations, quaternion conjugates > represent "opposite" rotations. > > E.g. if you have two spacecraft A and B and are considering the rotation > between them, you can describe this as the rotation from A->B or as the > rotation from B->A. The two quaternions will be the conjugate of each > other. > > Similarly, if two systems describe the same rotation on the same axes > but one defines rotation using the right-hand rule and the other the > left-hand rule, their two quaternions will be the conjugate of each > other. > > What this is all about is a matter of what co-ordinate frames you choose > to interpret things in, not something about a "handedness" of quaternions. > Quaternions themselves do not have a "handedness", they are just numbers. > > If you have systems using opposite co-ordinate frames, or indeed any > other differing co-ordinate systems, then the best approach is to > explicitly convert everything into a single chosen co-ordinate frame > before doing any calculations. Don't entangle your representation > changes (e.g. matrix to quaternion) and transforms (e.g. rotate vector) > with co-ordinate frame changes. > > Your 'left versor' functions are the correct ones. Your 'right versor' > functions create and use the opposite rotations. I don't know where you > got this 'left versor' and 'right versor' terminology from. This thread > seems to be the only hit for these terms together on Google. > > Well, if the shuttle used a different definition then it was out there somewhere. The history of quaternions is rather involved and mixed up with vectors, so it may be the case that there were different conventions. It might also be that the difference was between vector and coordinate rotations, but it is hard to tell without knowing how the code actually made use of the results. The left/right versor terminology is new to me also. Maybe it's like economists never admitting to knowing the word 'derivative' as in calculus, it's all marginal this and marginal that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jason-sage at creativetrax.com Fri Jul 29 11:14:39 2011 From: jason-sage at creativetrax.com (Jason Grout) Date: Fri, 29 Jul 2011 08:14:39 -0700 Subject: [Numpy-discussion] dtype repr change? 
In-Reply-To: References: <20110727220706.GB14003@phare.normalesup.org> <20110727232949.GF14003@phare.normalesup.org> Message-ID: <4E32CE5F.4060105@creativetrax.com> On 7/28/11 4:21 PM, Matthew Brett wrote: > Hi, >> Do you know if doctests supports any sort of manual intervention, like >> a plugin system? > > Actually, I was going to ask you that question :) > > But yes, there's the NumpyDoctest nose plugin, for example. Using it > does mean you have to customize nose somehow - in numpy's case by > using the 'numpy.test()' machinery. Sympy I believe has a fair amount > of machinery to work with doctests, but I haven't looked at that yet, Sage also has a fair amount of machinery dealing with doctests. Almost all of Sage's testing is done in doctests (covering 85.4% of the Sage library, which is 27833 functions). All doctests must pass before a release, and any new functions must have doctests. We do also have some unit tests, and there is sentiment that we should have more unit tests, but the requirement right now is only for doctests. Jason -- Jason Grout From ben.root at ou.edu Fri Jul 29 12:32:54 2011 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 29 Jul 2011 11:32:54 -0500 Subject: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? (was: Problem with ufunc of a numpy.ndarray derived class) In-Reply-To: References: <201107211656.21611.meine@informatik.uni-hamburg.de> <201107281658.24102.meine@informatik.uni-hamburg.de> Message-ID: On Fri, Jul 29, 2011 at 10:07 AM, Mark Wiebe wrote: > On Thu, Jul 28, 2011 at 9:58 AM, Hans Meine < > meine at informatik.uni-hamburg.de> wrote: > >> Hi again! >> >> Am Donnerstag, 21. Juli 2011, 16:56:21 schrieb Hans Meine: >> > import numpy >> > >> > class Test(numpy.ndarray): >> > pass >> > >> > a1 = numpy.ndarray((1,)) >> > a2 = Test((1,)) >> > >> > assert type(a1.min()) == type(a2.min()), \ >> > "%s != %s" % (type(a1.min()), type(a2.min())) >> > # --------------------------------------------------- >> > >> > This code fails with 1.6.0, while it worked in 1.3.0. >> >> I just tried with 1.5.1 (Ubuntu natty), and it works, too. >> >> Thus, this behavor-incompatible change happened between 1.5.1 and 1.6.0. >> > > I dug a little bit into the relevant 1.5.x vs 1.6.x code, in the places I > would most suspect a change, but couldn't find anything obvious. 
Here's the > ndarray.min method: > > > https://github.com/numpy/numpy/blob/maintenance%2F1.5.x/numpy/core/src/multiarray/methods.c#L267 > > https://github.com/numpy/numpy/blob/maintenance%2F1.6.x/numpy/core/src/multiarray/methods.c#L257 > > it calls PyArray_Min, which is here: > > > https://github.com/numpy/numpy/blob/maintenance%2F1.5.x/numpy/core/src/multiarray/calculation.c#L208 > > https://github.com/numpy/numpy/blob/maintenance%2F1.6.x/numpy/core/src/multiarray/calculation.c#L208 > > this calls numpy.minimum.reduce, which ends up here: > > > https://github.com/numpy/numpy/blob/maintenance%2F1.5.x/numpy/core/src/umath/ufunc_object.c#L3194 > > https://github.com/numpy/numpy/blob/maintenance%2F1.6.x/numpy/core/src/umath/ufunc_object.c#L3851 > > This function calls PyArray_Return(ret), which is how things get converted > into the NumPy scalars: > > > https://github.com/numpy/numpy/blob/maintenance%2F1.5.x/numpy/core/src/umath/ufunc_object.c#L3365 > > https://github.com/numpy/numpy/blob/maintenance%2F1.6.x/numpy/core/src/umath/ufunc_object.c#L4019 > > Here's the code for PyArray_Return, which was unchanged: > > > https://github.com/numpy/numpy/blob/maintenance%2F1.5.x/numpy/core/src/multiarray/scalarapi.c#L729 > > https://github.com/numpy/numpy/blob/maintenance%2F1.6.x/numpy/core/src/multiarray/scalarapi.c#L793 > > Something more subtle is going on. I don't have the time to dig into this > more at the moment, but hopefully these pointers into the code can help > anyone out there who has the time to investigate further. We definitely need > to add some tests for the desired behavior here. > > Cheers, > -Mark > > Could it possibly have anything to do with the bug Chuck found in a different thread: No, it comes from this > > In [2]: a = numpy.ma.masked_array([1,2,3, > 4]) > > In [3]: array(a.flat) > Out[3]: array(, > dtype='object') > > i.e., the a.flat iterator is turned into an object array with one element. > I'm not sure what the correct fix for this would be. Please open a ticket. > Don't know if they are related, just throwing it out there. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From martin-numpy at earth.li Fri Jul 29 13:07:34 2011 From: martin-numpy at earth.li (Martin Ling) Date: Fri, 29 Jul 2011 18:07:34 +0100 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: References: <20110716145010.GY3465@earth.li> <54045EAB-6486-413F-BF3B-4AC2DC9E1C1E@comcast.net> <20110728124218.GK3465@earth.li> <20110729150314.GU3465@earth.li> Message-ID: <20110729170734.GZ3465@earth.li> On Fri, Jul 29, 2011 at 09:14:00AM -0600, Charles R Harris wrote: > > Well, if the shuttle used a different definition then it was out there > somewhere. The history of quaternions is rather involved and mixed up with > vectors, so it may be the case that there were different conventions. My point is that these are conventions of co-ordinate frame, not of different representations of quaternions themselves. There's no two "handednesses" of quaternions to support. There are an infinte number of co-ordinate frames, and a quaternion can be interpreted as a rotation in any one of them. It's a matter of interpretation, not calculation. > It might also be that the difference was between vector and > coordinate rotations, but it is hard to tell without knowing how > the code actually made use of the results. Indeed, this is the other place the duality shows up. 
If q is the rotation of frame A relative to frame B, then a vector v in A appears in B as: v' = q * v * q.conjugate while a vector u in B appears in A as: u' = q.conjugate * u * q The former is often thought of as 'rotating the vector' versus the second as 'rotating the co-ordinate frame', but both are actually the same operation performed using a different choice of frames. Martin From njs at pobox.com Fri Jul 29 14:23:38 2011 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 29 Jul 2011 11:23:38 -0700 Subject: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? (was: Problem with ufunc of a numpy.ndarray derived class) In-Reply-To: References: <201107211656.21611.meine@informatik.uni-hamburg.de> <201107281658.24102.meine@informatik.uni-hamburg.de> Message-ID: On Jul 28, 2011 8:43 AM, "Matthew Brett" wrote: > So, 1.6.0 is returning a zero-dimensional scalar of the given type, > and 1.5.1 returns a python scalar. > > Zero dimensional scalars are designed to behave in a similar way to > python scalars, so the change should be all but invisible in practice. > Was there a particular case you ran into where this was a problem? Even so, surely this behavior should be consistent between base class ndarrays and subclasses? If returning 0d arrays is a good idea, then we should do it everywhere. If it's a bad idea, then we shouldn't do it at all...? (In reality, it sounds like this might be some mishap in the __array_wrap__ mechanism?) - Nathaniel From yoshi at rokuko.net Fri Jul 29 14:32:13 2011 From: yoshi at rokuko.net (Yoshi Rokuko) Date: Fri, 29 Jul 2011 20:32:13 +0200 Subject: [Numpy-discussion] c-api return two arrays In-Reply-To: References: <201107290852.p6T8qCIE014839@lotus.yokuts.org> Message-ID: <201107291832.p6TIWDmP020999@lotus.yokuts.org> +-------------------------- Pauli Virtanen -----------+ > Fri, 29 Jul 2011 10:52:12 +0200, Yoshi Rokuko wrote: > [clip] > >>>> A, B = mod.meth(C, prob=.95) > > > > is it possible to return two arrays? > > The way to do this in Python is to build a tuple with > Py_BuildValue("OO", A, B) and return that. that seems to be it, thank you! From charlesr.harris at gmail.com Fri Jul 29 15:52:48 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 29 Jul 2011 13:52:48 -0600 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: <20110729170734.GZ3465@earth.li> References: <20110716145010.GY3465@earth.li> <54045EAB-6486-413F-BF3B-4AC2DC9E1C1E@comcast.net> <20110728124218.GK3465@earth.li> <20110729150314.GU3465@earth.li> <20110729170734.GZ3465@earth.li> Message-ID: On Fri, Jul 29, 2011 at 11:07 AM, Martin Ling wrote: > On Fri, Jul 29, 2011 at 09:14:00AM -0600, Charles R Harris wrote: > > > > Well, if the shuttle used a different definition then it was out there > > somewhere. The history of quaternions is rather involved and mixed up > with > > vectors, so it may be the case that there were different conventions. > > My point is that these are conventions of co-ordinate frame, not of > different representations of quaternions themselves. There's no two > "handednesses" of quaternions to support. There are an infinte number of > co-ordinate frames, and a quaternion can be interpreted as a rotation in > any one of them. It's a matter of interpretation, not calculation. > > > It might also be that the difference was between vector and > > coordinate rotations, but it is hard to tell without knowing how > > the code actually made use of the results. 
> > Indeed, this is the other place the duality shows up. If q is the > rotation of frame A relative to frame B, then a vector v in A appears > in B as: > > v' = q * v * q.conjugate > > while a vector u in B appears in A as: > > u' = q.conjugate * u * q > > The former is often thought of as 'rotating the vector' versus the > second as 'rotating the co-ordinate frame', but both are actually the > same operation performed using a different choice of frames. > > They are different, a vector is an element of a vector space independent of coordinate frames, coordinate frames are a collection of functions from the vector space to scalars. Operationally, rotating vectors is a map from the vector space onto itself, however the coordinates happen to be the same when the inverse rotation is applied to the coordinate frame, it's pretty much the definition of coordinate rotation. But the concepts aren't the same. The similarity between the operations is how covariant vectors got to be called contravariant tensors, the early workers in the field dealt with the coordinates. But that is all to the side ;) I'm wondering about the history of the 'versor' object and in which fields it was used. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Fri Jul 29 15:57:23 2011 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 29 Jul 2011 14:57:23 -0500 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: References: <20110716145010.GY3465@earth.li> <54045EAB-6486-413F-BF3B-4AC2DC9E1C1E@comcast.net> <20110728124218.GK3465@earth.li> <20110729150314.GU3465@earth.li> <20110729170734.GZ3465@earth.li> Message-ID: On Fri, Jul 29, 2011 at 2:52 PM, Charles R Harris wrote: > > > On Fri, Jul 29, 2011 at 11:07 AM, Martin Ling wrote: > >> On Fri, Jul 29, 2011 at 09:14:00AM -0600, Charles R Harris wrote: >> > >> > Well, if the shuttle used a different definition then it was out >> there >> > somewhere. The history of quaternions is rather involved and mixed up >> with >> > vectors, so it may be the case that there were different conventions. >> >> My point is that these are conventions of co-ordinate frame, not of >> different representations of quaternions themselves. There's no two >> "handednesses" of quaternions to support. There are an infinte number of >> co-ordinate frames, and a quaternion can be interpreted as a rotation in >> any one of them. It's a matter of interpretation, not calculation. >> >> > It might also be that the difference was between vector and >> > coordinate rotations, but it is hard to tell without knowing how >> > the code actually made use of the results. >> >> Indeed, this is the other place the duality shows up. If q is the >> rotation of frame A relative to frame B, then a vector v in A appears >> in B as: >> >> v' = q * v * q.conjugate >> >> while a vector u in B appears in A as: >> >> u' = q.conjugate * u * q >> >> The former is often thought of as 'rotating the vector' versus the >> second as 'rotating the co-ordinate frame', but both are actually the >> same operation performed using a different choice of frames. >> >> > They are different, a vector is an element of a vector space independent of > coordinate frames, coordinate frames are a collection of functions from the > vector space to scalars. 
Operationally, rotating vectors is a map from the > vector space onto itself, however the coordinates happen to be the same > when the inverse rotation is applied to the coordinate frame, it's pretty > much the definition of coordinate rotation. But the concepts aren't the > same. The similarity between the operations is how covariant vectors got to > be called contravariant tensors, the early workers in the field dealt with > the coordinates. > > But that is all to the side ;) I'm wondering about the history of the > 'versor' object and in which fields it was used. > > Chuck > > I am starting to get very interested in this quaternion concept (and maybe how I could use it for mplot3d), but I have never come across it before (beyond the typical vector math that I am familiar with). Can anybody recommend a good introductory resource to get me up to speed? Thanks, Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Jul 29 16:16:20 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 29 Jul 2011 14:16:20 -0600 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: References: <20110716145010.GY3465@earth.li> <54045EAB-6486-413F-BF3B-4AC2DC9E1C1E@comcast.net> <20110728124218.GK3465@earth.li> <20110729150314.GU3465@earth.li> <20110729170734.GZ3465@earth.li> Message-ID: On Fri, Jul 29, 2011 at 1:57 PM, Benjamin Root wrote: > > > On Fri, Jul 29, 2011 at 2:52 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Fri, Jul 29, 2011 at 11:07 AM, Martin Ling wrote: >> >>> On Fri, Jul 29, 2011 at 09:14:00AM -0600, Charles R Harris wrote: >>> > >>> > Well, if the shuttle used a different definition then it was out >>> there >>> > somewhere. The history of quaternions is rather involved and mixed >>> up with >>> > vectors, so it may be the case that there were different >>> conventions. >>> >>> My point is that these are conventions of co-ordinate frame, not of >>> different representations of quaternions themselves. There's no two >>> "handednesses" of quaternions to support. There are an infinte number of >>> co-ordinate frames, and a quaternion can be interpreted as a rotation in >>> any one of them. It's a matter of interpretation, not calculation. >>> >>> > It might also be that the difference was between vector and >>> > coordinate rotations, but it is hard to tell without knowing how >>> > the code actually made use of the results. >>> >>> Indeed, this is the other place the duality shows up. If q is the >>> rotation of frame A relative to frame B, then a vector v in A appears >>> in B as: >>> >>> v' = q * v * q.conjugate >>> >>> while a vector u in B appears in A as: >>> >>> u' = q.conjugate * u * q >>> >>> The former is often thought of as 'rotating the vector' versus the >>> second as 'rotating the co-ordinate frame', but both are actually the >>> same operation performed using a different choice of frames. >>> >>> >> They are different, a vector is an element of a vector space independent >> of coordinate frames, coordinate frames are a collection of functions from >> the vector space to scalars. Operationally, rotating vectors is a map from >> the vector space onto itself, however the coordinates happen to be the same >> when the inverse rotation is applied to the coordinate frame, it's pretty >> much the definition of coordinate rotation. But the concepts aren't the >> same. 
The similarity between the operations is how covariant vectors got to >> be called contravariant tensors, the early workers in the field dealt with >> the coordinates. >> >> But that is all to the side ;) I'm wondering about the history of the >> 'versor' object and in which fields it was used. >> >> Chuck >> >> > I am starting to get very interested in this quaternion concept (and maybe > how I could use it for mplot3d), but I have never come across it before > (beyond the typical vector math that I am familiar with). Can anybody > recommend a good introductory resource to get me up to speed? > > Well, there is Robert's recommendation, which looks sort of like a reprise of Klein & Sommerfeld's Theory of the Top, but there are lots of resources out there for the application of quaternions to graphics. The main advantage of quaternions is that they provide a simply connected representation of rotations, there isn't a jump between rotations of +/- 180 degrees. Also, since they exist on the surface of a 4 dimensional ball you can interpolate rotations by a path on that surface, or even approximate same by secant lines between nearby points. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Jul 29 17:05:02 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 29 Jul 2011 16:05:02 -0500 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: References: <20110716145010.GY3465@earth.li> <54045EAB-6486-413F-BF3B-4AC2DC9E1C1E@comcast.net> <20110728124218.GK3465@earth.li> <20110729150314.GU3465@earth.li> <20110729170734.GZ3465@earth.li> Message-ID: On Fri, Jul 29, 2011 at 2:57 PM, Benjamin Root wrote: > > > On Fri, Jul 29, 2011 at 2:52 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Fri, Jul 29, 2011 at 11:07 AM, Martin Ling wrote: >> >>> On Fri, Jul 29, 2011 at 09:14:00AM -0600, Charles R Harris wrote: >>> > >>> > Well, if the shuttle used a different definition then it was out >>> there >>> > somewhere. The history of quaternions is rather involved and mixed >>> up with >>> > vectors, so it may be the case that there were different >>> conventions. >>> >>> My point is that these are conventions of co-ordinate frame, not of >>> different representations of quaternions themselves. There's no two >>> "handednesses" of quaternions to support. There are an infinte number of >>> co-ordinate frames, and a quaternion can be interpreted as a rotation in >>> any one of them. It's a matter of interpretation, not calculation. >>> >>> > It might also be that the difference was between vector and >>> > coordinate rotations, but it is hard to tell without knowing how >>> > the code actually made use of the results. >>> >>> Indeed, this is the other place the duality shows up. If q is the >>> rotation of frame A relative to frame B, then a vector v in A appears >>> in B as: >>> >>> v' = q * v * q.conjugate >>> >>> while a vector u in B appears in A as: >>> >>> u' = q.conjugate * u * q >>> >>> The former is often thought of as 'rotating the vector' versus the >>> second as 'rotating the co-ordinate frame', but both are actually the >>> same operation performed using a different choice of frames. >>> >>> >> They are different, a vector is an element of a vector space independent >> of coordinate frames, coordinate frames are a collection of functions from >> the vector space to scalars. 
Operationally, rotating vectors is a map from >> the vector space onto itself, however the coordinates happen to be the same >> when the inverse rotation is applied to the coordinate frame, it's pretty >> much the definition of coordinate rotation. But the concepts aren't the >> same. The similarity between the operations is how covariant vectors got to >> be called contravariant tensors, the early workers in the field dealt with >> the coordinates. >> >> But that is all to the side ;) I'm wondering about the history of the >> 'versor' object and in which fields it was used. >> >> Chuck >> >> > I am starting to get very interested in this quaternion concept (and maybe > how I could use it for mplot3d), but I have never come across it before > (beyond the typical vector math that I am familiar with). Can anybody > recommend a good introductory resource to get me up to speed? > One resource is the Visualizing Quaternions book, and an earlier paper: http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.22.7438 -Mark > > Thanks, > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tkluck at infty.nl Fri Jul 29 17:18:26 2011 From: tkluck at infty.nl (Timo Kluck) Date: Fri, 29 Jul 2011 23:18:26 +0200 Subject: [Numpy-discussion] numpy.interp running time In-Reply-To: References: Message-ID: Dear numpy developers, The current implementation of numpy.interp(x,xp,fp) comes down to: first calculating all the slopes of the linear interpolant (these are len(xp)-1), then use a binary search to find where x is in xp (running time log(len(xp)). So we obtain a running time of O( len(xp) + len(x)*log(len(xp) ) We could improve this to just O( len(x)*log(len(xp) ) by not caching the slopes. The point is, of course, that this is slightly slower in the common use case where x is is refinement of xp, and where you will have to compute all the slopes anyway. In my personal use case, however, I needed the value of the interp(x0,xp,fp) in order to calculate the next point x1 where I wanted to calculate interp(x1,xp,fp). The current implementation gave a severe running time penalty. I have looked at the source and I could easily produce a patch for this. Would you be interested in it? Cheers, Timo Kluck -------------- next part -------------- An HTML attachment was scrubbed... URL: From carlosbecker at gmail.com Fri Jul 29 17:45:08 2011 From: carlosbecker at gmail.com (Carlos Becker) Date: Fri, 29 Jul 2011 18:45:08 -0300 Subject: [Numpy-discussion] Array vectorization in numpy In-Reply-To: References: <266951DF-4242-4247-9292-E0AB578731C4@gmail.com> <26FC23E7C398A64083C980D16001012D246CA28741@VA3DIAXVS361.RED001.local> Message-ID: Hi. That is really amazing. I checked out that numexpr branch and saw some strange results when evaluating expressions on a multi-core i7 processor. Running the numexpr.test() yields a few 'F', which I suppose are failing tests. I tried to let the tests finish but it takes more than 20 min, is there any way to run the tests individually? Is there a specific mailing list for numexpr, so I can avoid 'spamming' numpy? Thanks! ---------------------- Carlos Becker On Wed, Jul 20, 2011 at 8:01 PM, Mark Wiebe wrote: > > On Wed, Jul 20, 2011 at 5:52 PM, srean wrote: > >> >> I think this is essential to speed up numpy. Maybe numexpr could handle >> this in the future? 
Right now the general use of numexpr is result = >> numexpr.evaluate("whatever"), so the same problem seems to be there. >> >> >> >> With this I am not saying that numpy is not worth it, just that for >> many applications (specially with huge matrices/arrays), pre-allocation does >> make a huge difference, especially if we want to attract more people to >> using numpy. >> > >> > The ufuncs and many scipy functions take a "out" parameter where you >> > can specify a pre-allocated array. It can be a little awkward writing >> > expressions that way, but the capability is there. >> >> This is a slight digression: is there a way to have a out parameter >> like semantics with numexpr. I have always used it as >> >> a[:] = numexpr(expression) >> >> But I dont think numexpr builds the value in place. Is it possible to >> have side-effects with numexpr as opposed to obtaining values, for >> example >> >> "a= a * b + c" >> >> The documentation is not clear about this. Oh and I do not find the >> "out" parameter awkward at all. Its very handy. Furthermore, if I may, >> here is a request that the Blitz++ source be updated. Seems like there >> is a lot of activity on the Blitz++ repository and weave is very handy >> too and can be used as easily as numexpr. >> > > In order to make sure the 1.6 nditer supports multithreading, I adapted > numexpr to use it. The branch which does this is here: > > http://code.google.com/p/numexpr/source/browse/#svn%2Fbranches%2Fnewiter > > This supports out, order, and casting parameters, visible here: > > > http://code.google.com/p/numexpr/source/browse/branches/newiter/numexpr/necompiler.py#615 > > It's pretty much ready to go, just needs someone to do the release > management. > > -Mark > > _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Sat Jul 30 14:52:33 2011 From: efiring at hawaii.edu (Eric Firing) Date: Sat, 30 Jul 2011 08:52:33 -1000 Subject: [Numpy-discussion] numpy.interp running time In-Reply-To: References: Message-ID: <4E3452F1.7010607@hawaii.edu> On 07/29/2011 11:18 AM, Timo Kluck wrote: > Dear numpy developers, > > The current implementation of numpy.interp(x,xp,fp) comes down to: first > calculating all the slopes of the linear interpolant (these are > len(xp)-1), then use a binary search to find where x is in xp (running > time log(len(xp)). So we obtain a running time of > > O( len(xp) + len(x)*log(len(xp) ) > > We could improve this to just > > O( len(x)*log(len(xp) ) > > by not caching the slopes. The point is, of course, that this is > slightly slower in the common use case where x is is refinement of xp, > and where you will have to compute all the slopes anyway. > > In my personal use case, however, I needed the value of the > interp(x0,xp,fp) in order to calculate the next point x1 where I wanted > to calculate interp(x1,xp,fp). The current implementation gave a severe > running time penalty. Maybe the thing to do is to pre-calculate if len(xp) <= len(x), or some such guess as to which method would be more efficient. Eric > > I have looked at the source and I could easily produce a patch for this. > Would you be interested in it? 
> > Cheers, > Timo Kluck > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From stefan at sun.ac.za Sat Jul 30 19:04:36 2011 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 30 Jul 2011 16:04:36 -0700 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: References: <20110716145010.GY3465@earth.li> <54045EAB-6486-413F-BF3B-4AC2DC9E1C1E@comcast.net> <20110728124218.GK3465@earth.li> <20110729150314.GU3465@earth.li> <20110729170734.GZ3465@earth.li> Message-ID: Hi Ben On Fri, Jul 29, 2011 at 12:57 PM, Benjamin Root wrote: > I am starting to get very interested in this quaternion concept (and maybe > how I could use it for mplot3d), but I have never come across it before > (beyond the typical vector math that I am familiar with).? Can anybody > recommend a good introductory resource to get me up to speed? I heard that Ch. 11 of Roger Penrose's "Road to Reality" explain quaternions well. 11 Hypercomplex numbers 11.1 The algebra of quaternions 11.2 The physical role of quaternions? 11.3 Geometry of quaternions 11.4 How to compose rotations 11.5 Clifford algebras 11.6 Grassmann algebras Regards St?fan From meine at informatik.uni-hamburg.de Sun Jul 31 01:57:58 2011 From: meine at informatik.uni-hamburg.de (Hans Meine) Date: Sun, 31 Jul 2011 07:57:58 +0200 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: <20110716145010.GY3465@earth.li> References: <20110716145010.GY3465@earth.li> Message-ID: <06A22474-3A84-4751-BC29-D657F1DC8186@informatik.uni-hamburg.de> Hi Martin, I think it would be more useful if isfinite returned true if *all* elements were finite. (Opposite of isnan and isinf.) HTH, Hans PS: did not check the complex dtype, hopefully that one's no different. (The above has been typed using a small on-screen keyboard, which may account for any typos, briefness or bad formatting.) Am 16.07.2011 um 16:50 schrieb Martin Ling : > Hi all, > > I have just pushed a package to GitHub which adds a quaternion dtype to > NumPy: https://github.com/martinling/numpy_quaternion > > Some backstory: on Wednesday I gave a talk at SciPy 2011 about an > inertial sensing simulation package I have been working on > (http://www.imusim.org/). One component I suggested might be reusable > from that code was the quaternion math implementation, written in > Cython. One of its features is a wrapper class for Nx4 NumPy arrays that > supports efficient operations using arrays of quaternion values. > > Travis Oliphant suggested that a quaternion dtype would be a better > solution, and got me talking to Mark Weibe about this. With Mark's help > I completed this initial version at yesterday's sprint session. > > Incidentally, how to do something like this isn't well documented and I > would have had little hope without both Mark's in-person help and his > previous code (for adding a half-precision float dtype) to refer to. I > don't know what the consensus is about whether people writing custom > dtypes is a desirable thing, but if it is then the process needs to be > made a lot easier. That said, the fact this is doable without patching > the numpy core at all is really, really nice. 
> > Example usage: > >>>> import numpy as np >>>> import quaternion >>>> np.quaternion(1,0,0,0) > quaternion(1, 0, 0, 0) >>>> q1 = np.quaternion(1,2,3,4) >>>> q2 = np.quaternion(5,6,7,8) >>>> q1 * q2 > quaternion(-60, 12, 30, 24) >>>> a = np.array([q1, q2]) >>>> a > array([quaternion(1, 2, 3, 4), quaternion(5, 6, 7, 8)], > dtype=quaternion) >>>> exp(a) > array([quaternion(1.69392, -0.78956, -1.18434, -1.57912), > quaternion(138.909, -25.6861, -29.9671, -34.2481)], > dtype=quaternion) > > The following ufuncs are implemented: > add, subtract, multiply, divide, log, exp, power, negative, conjugate, > copysign, equal, not_equal, less, less_equal, isnan, isinf, isfinite, > absolute > > Quaternion components are stored as doubles. The package could be extended > to support e.g. qfloat, qdouble, qlongdouble > > Comparison operations follow the same lexicographic ordering as tuples. > > The unary tests isnan, isinf and isfinite return true if they would > return true for any individual component. > > Real types may be cast to quaternions, giving quaternions with zero for > all three imaginary components. Complex types may also be cast to > quaternions, with their single imaginary component becoming the first > imaginary component of the quaternion. Quaternions may not be cast to > real or complex types. > > Comments very welcome. This is my first attempt at NumPy hacking :-) > > > Martin > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From meine at informatik.uni-hamburg.de Sun Jul 31 02:38:41 2011 From: meine at informatik.uni-hamburg.de (Hans Meine) Date: Sun, 31 Jul 2011 08:38:41 +0200 Subject: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? (was: Problem with ufunc of a numpy.ndarray derived class) In-Reply-To: References: <201107211656.21611.meine@informatik.uni-hamburg.de> <201107291131.24260.meine@informatik.uni-hamburg.de> <201107291212.19475.meine@informatik.uni-hamburg.de> Message-ID: <704A4698-59F7-42D5-882B-F09954BE2A84@informatik.uni-hamburg.de> On Fri, Jul 29, 2011 at 4:12 AM, Hans Meine wrote: > > /home/hmeine/new_numpy/lib64/python2.6/site-packages/vigra/arraytypes.pyc in > reshape(self, shape, order) > 587 > 588 def reshape(self, shape, order='C'): > --> 589 res = numpy.ndarray.reshape(self, shape, order) > 590 res.axistags = AxisTags(res.ndim) > 591 return res > > TypeError: an integer is required > > The problem is that 'self' has become a zero-rank array, and those cannot be > reshaped in order to add singleton dimensions anymore. IOW, if you implement > sth. like broadcasting, this is made much harder. Am 29.07.2011 um 15:28 schrieb Charles R Harris > > What is self and shape in this example? self is my zero-rank array, shape is (1,1) (just 'int's, I checked), and order is 'F'. The problem is that reshape of a zero-rank-array fails with the above TypeError. > Out of curiosity, if you don't support all the ndarray operations, why are you subclassing ndarray? Oh, we *do* support all ndarray operations, at least we try to, but we did not pay attention to support instances with an empty shape tuple. (And before 1.6.0, this did not bite us.) In fact, we also found a not-too-complex workaround, very similar to the numpy.matrix one, but we still believe the new min() behavior to be strange and probably unwanted. 
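For concreteness, one possible way around the zero-rank reshape problem looks roughly like this (only a sketch, not necessarily the workaround we actually used; MyArray just stands in for our subclass):

import numpy as np

class MyArray(np.ndarray):
    pass

m = np.array(1.0).view(MyArray)   # zero-rank instance, like a2.min() on 1.6.0
# going through a plain ndarray view sidesteps the reshape problem,
# and the subclass type can be restored afterwards:
res = np.asarray(m).reshape((1, 1), order='F').view(MyArray)

That keeps the old code paths working, but of course it only papers over the behaviour change discussed above.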
Have a nice day, Hans From meine at informatik.uni-hamburg.de Sun Jul 31 02:40:41 2011 From: meine at informatik.uni-hamburg.de (Hans Meine) Date: Sun, 31 Jul 2011 08:40:41 +0200 Subject: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? (was: Problem with ufunc of a numpy.ndarray derived class) In-Reply-To: References: <201107211656.21611.meine@informatik.uni-hamburg.de> <201107281658.24102.meine@informatik.uni-hamburg.de> Message-ID: <33D73C73-2C00-4334-8572-D538F768A7F7@informatik.uni-hamburg.de> Am 29.07.2011 um 17:07 schrieb Mark Wiebe: > I dug a little bit into the relevant 1.5.x vs 1.6.x code, in the places I would most suspect a change, but couldn't find anything obvious. Thanks for having a look. This strengthens my suspicion that the behavior change was not intentional. Have a nice day, Hans From meine at informatik.uni-hamburg.de Sun Jul 31 02:50:40 2011 From: meine at informatik.uni-hamburg.de (Hans Meine) Date: Sun, 31 Jul 2011 08:50:40 +0200 Subject: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? (was: Problem with ufunc of a numpy.ndarray derived class) In-Reply-To: References: <201107211656.21611.meine@informatik.uni-hamburg.de> <201107281658.24102.meine@informatik.uni-hamburg.de> Message-ID: <558B668E-F83B-40DB-95CF-F72F0ABEFBD7@informatik.uni-hamburg.de> Am 29.07.2011 um 20:23 schrieb Nathaniel Smith: > Even so, surely this behavior should be consistent between base class > ndarrays and subclasses? If returning 0d arrays is a good idea, then > we should do it everywhere. If it's a bad idea, then we shouldn't do > it at all...? Very well put. That's exactly the reason why I am insisting on this discussion, and why I believe that the behavior change is not intentional. Otherwise, ndarray and matrix should behave like my subclass. (BTW: I did not check masked_array yet.) > (In reality, it sounds like this might be some mishap in the > __array_wrap__ mechanism?) That's exactly my guess. (That could also explain why Mark did not see anything obvious in the code.) In fact, my first thought was "maybe there was a documented change in the __array_wrap__ protocol, which we have to implement now", but obviously that is not the case. Have a nice day, Hans From friedrichromstedt at gmail.com Sun Jul 31 05:03:28 2011 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Sun, 31 Jul 2011 11:03:28 +0200 Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available In-Reply-To: References: <20110716145010.GY3465@earth.li> <54045EAB-6486-413F-BF3B-4AC2DC9E1C1E@comcast.net> <20110728124218.GK3465@earth.li> <20110729150314.GU3465@earth.li> <20110729170734.GZ3465@earth.li> Message-ID: 2011/7/29 Benjamin Root : > I am starting to get very interested in this quaternion concept (and maybe > how I could use it for mplot3d), but I have never come across it before > (beyond the typical vector math that I am familiar with). Can anybody > recommend a good introductory resource to get me up to speed? The time I learned it I used the Wikipedia article, apparently it wasn't that bad that time I used it (haven't checked now). But it needs some hand-crafting to get into it. In principle, if you decompose the operation of Quaternions manually (which involves a bit of algebra, but doable), then you'll see how beautifully it decomposes the vector given in to create a local rotation plane, where it is rotated just as in ordinary polar coordinates. 
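To make that concrete, here is a rough plain-NumPy sketch of the decomposition (rotate_in_plane is just a made-up helper for illustration; for a unit quaternion q built from the same axis and angle it gives the same result as q * v * q.conjugate):

import numpy as np

def rotate_in_plane(v, axis, angle):
    # split v into a part along the (unit) rotation axis, which is left
    # untouched, and a part in the plane perpendicular to it, which is
    # rotated like an ordinary 2-D polar rotation
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    v = np.asarray(v, dtype=float)
    v_par = np.dot(v, axis) * axis      # component along the axis
    v_perp = v - v_par                  # component in the rotation plane
    w = np.cross(axis, v_perp)          # second basis vector of that plane
    return v_par + np.cos(angle) * v_perp + np.sin(angle) * w

Working through why this equals the quaternion product is exactly the bit of algebra mentioned above.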
Friedrich From dirk.ullrich at googlemail.com Sun Jul 31 06:36:22 2011 From: dirk.ullrich at googlemail.com (Dirk Ullrich) Date: Sun, 31 Jul 2011 12:36:22 +0200 Subject: [Numpy-discussion] Error when building numpy with Py3k Message-ID: Hi, trying to build current Git master of numpy with Py3k (Pythin 3.2.1, to be precise) yields to an error: building 'numpy.lib._compiled_base' extension compiling C sources C compiler: gcc -pthread -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=x86-64 -mtune=generic -O2 -pipe -fPIC creating build/temp.linux-x86_64-3.2/numpy/lib creating build/temp.linux-x86_64-3.2/numpy/lib/src compile options: '-Inumpy/core/include -Ibuild/src.linux-x86_64-3.2/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python3.2mu -Ibuild/src.linux-x86_64-3.2/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-3.2/numpy/core/src/umath -c' gcc: numpy/lib/src/_compiled_base.c numpy/lib/src/_compiled_base.c: In function 'pack_or_unpack_bits': numpy/lib/src/_compiled_base.c:1317:51: error: 'PyArrayObject' has no member named 'ob_type' numpy/lib/src/_compiled_base.c:1357:43: error: 'PyArrayObject' has no member named 'ob_type' numpy/lib/src/_compiled_base.c: In function 'pack_or_unpack_bits': numpy/lib/src/_compiled_base.c:1317:51: error: 'PyArrayObject' has no member named 'ob_type' numpy/lib/src/_compiled_base.c:1357:43: error: 'PyArrayObject' has no member named 'ob_type' error: Command "gcc -pthread -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -march=x86-64 -mtune=generic -O2 -pipe -fPIC -Inumpy/core/include -Ibuild/src.linux-x86_64-3.2/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python3.2mu -Ibuild/src.linux-x86_64-3.2/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-3.2/numpy/core/src/umath -c numpy/lib/src/_compiled_base.c -o build/temp.linux-x86_64-3.2/numpy/lib/src/_compiled_base.o" failed with exit status 1 Dirk From bblais at bryant.edu Sun Jul 31 08:48:00 2011 From: bblais at bryant.edu (Brian Blais) Date: Sun, 31 Jul 2011 08:48:00 -0400 Subject: [Numpy-discussion] recommendation for saving data Message-ID: <8807AC87-DA23-49BE-9D6D-74FE528DBBAC@bryant.edu> Hello, I was wondering if there are any recommendations for formats for saving scientific data. I am running a simulation, which has many somewhat-indepedent parts which have their own internal state and parameters. I've been using pickle (gzipped) to save the entire object (which contains subobjects, etc...), but it is getting too unwieldy and I think it is time to look for a more robust solution. Ideally I'd like to have something where I can call a save method on the simulation object, and it will call the save methods on all the children, on down the line all saving into one file. It'd also be nice if it were cross-platform, and I could depend on the files being readable into the future for a while. Are there any good standards for this? What do you use for saving scientific data? 
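For reference, what I do now is essentially this (simplified; the function names are just illustrative):

import gzip
import pickle   # cPickle also works here

def save_sim(sim, filename):
    # the whole simulation object, children and all, goes into one
    # gzipped pickle file
    f = gzip.open(filename, 'wb')
    try:
        pickle.dump(sim, f, protocol=2)
    finally:
        f.close()

def load_sim(filename):
    f = gzip.open(filename, 'rb')
    try:
        return pickle.load(f)
    finally:
        f.close()

That is the part that has become unwieldy and that I would like to replace.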
From martin-numpy at earth.li  Sun Jul 31 09:21:38 2011
From: martin-numpy at earth.li (Martin Ling)
Date: Sun, 31 Jul 2011 14:21:38 +0100
Subject: [Numpy-discussion] Quaternion dtype for NumPy - initial implementation available
In-Reply-To: <06A22474-3A84-4751-BC29-D657F1DC8186@informatik.uni-hamburg.de>
References: <20110716145010.GY3465@earth.li> <06A22474-3A84-4751-BC29-D657F1DC8186@informatik.uni-hamburg.de>
Message-ID: <20110731132137.GB3465@earth.li>

Hi Hans,

Sorry, that is actually what I implemented, I just documented it incorrectly. I have just pushed an update to the README. Thanks for pointing this out!

Martin

On Sun, Jul 31, 2011 at 07:57:58AM +0200, Hans Meine wrote:
>
> Hi Martin,
>
> I think it would be more useful if isfinite returned true if *all* elements were finite. (Opposite of isnan and isinf.)
>
> HTH,
>   Hans
>
> PS: did not check the complex dtype, hopefully that one's no different.
>
> (The above has been typed using a small on-screen keyboard, which may account for any typos, briefness or bad formatting.)
>
> Am 16.07.2011 um 16:50 schrieb Martin Ling :
>
> > Hi all,
> >
> > I have just pushed a package to GitHub which adds a quaternion dtype to
> > NumPy: https://github.com/martinling/numpy_quaternion
> >
> > Some backstory: on Wednesday I gave a talk at SciPy 2011 about an
> > inertial sensing simulation package I have been working on
> > (http://www.imusim.org/). One component I suggested might be reusable
> > from that code was the quaternion math implementation, written in
> > Cython. One of its features is a wrapper class for Nx4 NumPy arrays that
> > supports efficient operations using arrays of quaternion values.
> >
> > Travis Oliphant suggested that a quaternion dtype would be a better
> > solution, and got me talking to Mark Wiebe about this. With Mark's help
> > I completed this initial version at yesterday's sprint session.
> >
> > Incidentally, how to do something like this isn't well documented and I
> > would have had little hope without both Mark's in-person help and his
> > previous code (for adding a half-precision float dtype) to refer to. I
> > don't know what the consensus is about whether people writing custom
> > dtypes is a desirable thing, but if it is then the process needs to be
> > made a lot easier. That said, the fact this is doable without patching
> > the numpy core at all is really, really nice.
> >
> > Example usage:
> >
> > >>> import numpy as np
> > >>> import quaternion
> > >>> np.quaternion(1,0,0,0)
> > quaternion(1, 0, 0, 0)
> > >>> q1 = np.quaternion(1,2,3,4)
> > >>> q2 = np.quaternion(5,6,7,8)
> > >>> q1 * q2
> > quaternion(-60, 12, 30, 24)
> > >>> a = np.array([q1, q2])
> > >>> a
> > array([quaternion(1, 2, 3, 4), quaternion(5, 6, 7, 8)],
> > dtype=quaternion)
> > >>> exp(a)
> > array([quaternion(1.69392, -0.78956, -1.18434, -1.57912),
> > quaternion(138.909, -25.6861, -29.9671, -34.2481)],
> > dtype=quaternion)
> >
> > The following ufuncs are implemented:
> > add, subtract, multiply, divide, log, exp, power, negative, conjugate,
> > copysign, equal, not_equal, less, less_equal, isnan, isinf, isfinite,
> > absolute
> >
> > Quaternion components are stored as doubles. The package could be extended
> > to support e.g. qfloat, qdouble, qlongdouble.
> >
> > Comparison operations follow the same lexicographic ordering as tuples.
> >
> > The unary tests isnan, isinf and isfinite return true if they would
> > return true for any individual component.
> >
> > Real types may be cast to quaternions, giving quaternions with zero for
> > all three imaginary components. Complex types may also be cast to
> > quaternions, with their single imaginary component becoming the first
> > imaginary component of the quaternion. Quaternions may not be cast to
> > real or complex types.
> >
> > Comments very welcome. This is my first attempt at NumPy hacking :-)
> >
> >
> > Martin
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion

From charlesr.harris at gmail.com  Sun Jul 31 12:36:46 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 31 Jul 2011 10:36:46 -0600
Subject: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? (was: Problem with ufunc of a numpy.ndarray derived class)
In-Reply-To: <558B668E-F83B-40DB-95CF-F72F0ABEFBD7@informatik.uni-hamburg.de>
References: <201107211656.21611.meine@informatik.uni-hamburg.de> <201107281658.24102.meine@informatik.uni-hamburg.de> <558B668E-F83B-40DB-95CF-F72F0ABEFBD7@informatik.uni-hamburg.de>
Message-ID: 

On Sun, Jul 31, 2011 at 12:50 AM, Hans Meine <meine at informatik.uni-hamburg.de> wrote:

> Am 29.07.2011 um 20:23 schrieb Nathaniel Smith:
> > Even so, surely this behavior should be consistent between base class
> > ndarrays and subclasses? If returning 0d arrays is a good idea, then
> > we should do it everywhere. If it's a bad idea, then we shouldn't do
> > it at all...?
>
> Very well put. That's exactly the reason why I am insisting on this
> discussion, and why I believe that the behavior change is not intentional.
> Otherwise, ndarray and matrix should behave like my subclass. (BTW: I did
> not check masked_array yet.)
>
> > (In reality, it sounds like this might be some mishap in the
> > __array_wrap__ mechanism?)
>
> That's exactly my guess. (That could also explain why Mark did not see
> anything obvious in the code.)
>

Maybe. There isn't a problem for plain old zero dimensional arrays.

In [1]: a = array(1)

In [2]: a.dtype
Out[2]: dtype('int64')

In [3]: reshape(a, (1,1), order='f')
Out[3]: array([[1]])

This on Linux 64 with latest master.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From fperez.net at gmail.com  Sun Jul 31 13:19:51 2011
From: fperez.net at gmail.com (Fernando Perez)
Date: Sun, 31 Jul 2011 12:19:51 -0500
Subject: [Numpy-discussion] [ANN] IPython 0.11 is officially out
Message-ID: 

Hi all,

on behalf of the IPython development team, I'm thrilled to announce, after more than two years of development work, the official release of IPython 0.11.

This release brings a long list of improvements and new features (along with hopefully few new bugs). We have completely refactored IPython, making it a much more friendly project to participate in by having better separated and organized internals. We hope you will not only use the new tools and libraries, but also join us with new ideas and development.

After this very long development effort, we hope to make a few stabilization releases at a quicker pace, where we iron out the kinks in the new APIs and complete some remaining internal cleanup work. We will then make a (long awaited) IPython 1.0 release with these stable APIs.
*Downloads*

Download links and instructions are at:
http://ipython.org/download.html

And IPython is also on PyPI:
http://pypi.python.org/pypi/ipython

Those contain a built version of the HTML docs; if you want pure source downloads with no docs, those are available on github:

Tarball: https://github.com/ipython/ipython/tarball/rel-0.11
Zipball: https://github.com/ipython/ipython/zipball/rel-0.11

* Features *

Here is a quick listing of the major new features:

- Standalone Qt console
- High-level parallel computing with ZeroMQ
- New model for GUI/plotting support in the terminal
- A two-process architecture
- Fully refactored internal project structure
- Vim integration
- Integration into Microsoft Visual Studio
- Improved unicode support
- Python 3 support
- New profile model
- SQLite storage for history
- New configuration system
- Pasting of code with prompts

And many more...

We closed over 500 tickets, merged over 200 pull requests, and more than 60 people contributed over 2200 commits for the final release.

Please see our release notes for the full details on everything about this release: https://github.com/ipython/ipython/zipball/rel-0.11

As usual, if you find any problem, please file a ticket --or even better, a pull request fixing it-- on our github issues site (https://github.com/ipython/ipython/issues/).

Many thanks to all who contributed!

Fernando, on behalf of the IPython development team.

http://ipython.org

From e.antero.tammi at gmail.com  Sun Jul 31 13:44:06 2011
From: e.antero.tammi at gmail.com (eat)
Date: Sun, 31 Jul 2011 20:44:06 +0300
Subject: [Numpy-discussion] Rationale for returning type-wrapped min() / max() scalars? (was: Problem with ufunc of a numpy.ndarray derived class)
In-Reply-To: 
References: <201107211656.21611.meine@informatik.uni-hamburg.de> <201107281658.24102.meine@informatik.uni-hamburg.de> <558B668E-F83B-40DB-95CF-F72F0ABEFBD7@informatik.uni-hamburg.de>
Message-ID: 

Hi,

On Sun, Jul 31, 2011 at 7:36 PM, Charles R Harris wrote:

>
>
> On Sun, Jul 31, 2011 at 12:50 AM, Hans Meine <meine at informatik.uni-hamburg.de> wrote:
>
>> Am 29.07.2011 um 20:23 schrieb Nathaniel Smith:
>> > Even so, surely this behavior should be consistent between base class
>> > ndarrays and subclasses? If returning 0d arrays is a good idea, then
>> > we should do it everywhere. If it's a bad idea, then we shouldn't do
>> > it at all...?
>>
>> Very well put. That's exactly the reason why I am insisting on this
>> discussion, and why I believe that the behavior change is not intentional.
>> Otherwise, ndarray and matrix should behave like my subclass. (BTW: I did
>> not check masked_array yet.)
>>
>> > (In reality, it sounds like this might be some mishap in the
>> > __array_wrap__ mechanism?)
>>
>> That's exactly my guess. (That could also explain why Mark did not see
>> anything obvious in the code.)
>>
>>
> Maybe. There isn't a problem for plain old zero dimensional arrays.
>
> In [1]: a = array(1)
>
> In [2]: a.dtype
> Out[2]: dtype('int64')
>
> In [3]: reshape(a, (1,1), order='f')
> Out[3]: array([[1]])
>
FWIW:

In []: sys.version
Out[]: '2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)]'
In []: np.version.version
Out[]: '1.6.0'
In []: a= array(1)
In []: a.reshape((1, 1), order= 'F').flags
Out[]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
In []: a.reshape((1, 1), order= 'C').flags
Out[]:
  C_CONTIGUOUS : True
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

Seems to be slightly inconsistent, but does it really matter?

-eat

>
> This on Linux 64 with latest master.
>
> Chuck
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From tkluck at infty.nl  Sun Jul 31 20:56:07 2011
From: tkluck at infty.nl (Timo Kluck)
Date: Mon, 1 Aug 2011 02:56:07 +0200
Subject: [Numpy-discussion] numpy.interp running time
In-Reply-To: <4E3452F1.7010607@hawaii.edu>
References: <4E3452F1.7010607@hawaii.edu>
Message-ID: 

2011/7/30 Eric Firing 

> On 07/29/2011 11:18 AM, Timo Kluck wrote:
> > The current implementation of numpy.interp(x,xp,fp) comes down to: first
> > calculating all the slopes of the linear interpolant (there are
> > len(xp)-1 of them), then using a binary search to find where x is in xp
> > (running time log(len(xp))). So we obtain a running time of
> >
> > O( len(xp) + len(x)*log(len(xp)) )
> >
> > We could improve this to just
> >
> > O( len(x)*log(len(xp)) )
> >
> > by not caching the slopes. The point is, of course, that this is
> > slightly slower in the common use case where x is a refinement of xp,
> > and where you will have to compute all the slopes anyway.
>
> Maybe the thing to do is to pre-calculate if len(xp) <= len(x), or some
> such guess as to which method would be more efficient.
>

What you're suggesting is reasonable. The cutoff at len(xp) <= len(x) can distinguish between the 'refinement' case and the 'just one value' case. I'll implement it for a start. I'm not sure if it is the optimal cutoff in other cases -- or even if it is possible to define such an optimal cutoff. I'll try to get some numerical evidence to see what kind of speed differences we're talking about. I'll post back here when I have more info.

Cheers,
Timo
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
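To make the trade-off Timo describes concrete, here is a rough NumPy-level sketch of the two strategies. The real numpy.interp is written in C; this sketch ignores the handling of points outside xp and of repeated xp values, and the function names are made up for illustration only:

    import numpy as np

    def interp_precompute(x, xp, fp):
        # Strategy 1: compute all len(xp)-1 slopes up front, O(len(xp)),
        # then binary-search each point of x, O(len(x)*log(len(xp))).
        slopes = np.diff(fp) / np.diff(xp)
        i = np.clip(np.searchsorted(xp, x) - 1, 0, len(xp) - 2)
        return fp[i] + slopes[i] * (x - xp[i])

    def interp_on_demand(x, xp, fp):
        # Strategy 2: binary-search first and compute only the slopes of the
        # intervals actually hit, avoiding the O(len(xp)) setup cost when
        # len(x) is much smaller than len(xp).
        i = np.clip(np.searchsorted(xp, x) - 1, 0, len(xp) - 2)
        slopes = (fp[i + 1] - fp[i]) / (xp[i + 1] - xp[i])
        return fp[i] + slopes * (x - xp[i])

    xp = np.linspace(0.0, 10.0, 1000000)
    fp = np.sin(xp)
    x = np.array([0.5, 2.5, 7.25])   # len(x) << len(xp): strategy 2 skips the big setup
    print(interp_precompute(x, xp, fp))
    print(interp_on_demand(x, xp, fp))

A cutoff like len(xp) <= len(x) simply picks strategy 1 when the slope table is cheap relative to the number of lookups, and strategy 2 otherwise.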