From clee at spiralis.merseine.nu Mon Apr 1 09:06:59 2002
From: clee at spiralis.merseine.nu (clee at spiralis.merseine.nu)
Date: Mon Apr 1 09:06:59 2002
Subject: [Numpy-discussion] slice question and bug
Message-ID: <20020401165852.44D3E79B@spiralis.merseine.nu>

Hello,
I'm trying to track down a segv when I do the B[:] operation on an array, "B", that I've built as a view on external data. During the process I ran into the following code (Numeric-21.0):

/* {%c++%} */
extern int PyArray_Free(PyObject *op, char *ptr) {
    PyArrayObject *ap = (PyArrayObject *)op;
    int i, n;

    if (ap->nd > 2) return -1;
    if (ap->nd == 3) {
        n = ap->dimensions[0];
        for (i=0; i<n; i++) {
            free(((char **)ptr)[i]);
        }
    }
    if (ap->nd >= 2) {
        free(ptr);
    }
    Py_DECREF(ap);
    return 0;
}
/* {%c++%} */

The multiple, incompatible tests of ap->nd are the problem.

-chris

From clee at spiralis.merseine.nu Mon Apr 1 10:59:02 2002
From: clee at spiralis.merseine.nu (clee at spiralis.merseine.nu)
Date: Mon Apr 1 10:59:02 2002
Subject: [Numpy-discussion] slice question and bug
In-Reply-To: <20020401165852.44D3E79B@spiralis.merseine.nu>
References: <20020401165852.44D3E79B@spiralis.merseine.nu>
Message-ID: <15528.44386.160013.936132@spiralis.merseine.nu>

clee at spiralis.merseine.nu writes:
>
> Hello,
> I'm trying to track down a segv when I do the B[:] operation on an
> array, "B", a that I've built in as a view on external data. During...
> [snip]

To clarify my own somewhat nonsensical post: When I started composing my message, I was trying to figure out a bug in my own code that caused a crash while doing slice_array. I've since fixed that bug. However, in the process of figuring out what I was doing wrong I was browsing the Numeric source code. While examining PyArray_Free(..) in arrayobject.c, I saw that it returns -1 whenever the number of dimensions is greater than 2, yet it has code that tests for when the number of dimensions equals 3.

So ultimately, my post is just an alert that I think there might be some code that needs to be cleaned up.
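The unreachable branch is easy to see in a small Python model of the control flow. This is only a sketch of the rank tests quoted above, not the C routine itself; the function name free_actions and the action strings are made up for illustration.

```python
def free_actions(nd):
    """Model which steps the quoted PyArray_Free takes for an array of rank nd."""
    if nd > 2:
        return ["return -1"]              # every rank above 2 exits here
    actions = []
    if nd == 3:                           # never true: rank 3 already returned above
        actions.append("free each row pointer")
    if nd >= 2:
        actions.append("free pointer table")
    actions.append("decref array")
    return actions


for nd in (1, 2, 3):
    print(nd, free_actions(nd))
```

No rank ever reaches the "free each row pointer" step, which is the inconsistency being reported.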
Thanks, lacking-caffeine-ly yours -chris From nwagner at mecha.uni-stuttgart.de Wed Apr 3 11:48:47 2002 From: nwagner at mecha.uni-stuttgart.de (Nils Wagner) Date: Wed Apr 3 11:48:47 2002 Subject: [Numpy-discussion] Factorization of complex symmetric matrices Message-ID: <3CAABF60.12D609C0@mecha.uni-stuttgart.de> Hi, I am looking for a suitable factorization of complex symmetric matrices. Where can I find a proper routine ? Nils From ray_drew at yahoo.co.uk Thu Apr 4 02:27:09 2002 From: ray_drew at yahoo.co.uk (Ray Drew) Date: Thu Apr 4 02:27:09 2002 Subject: [Numpy-discussion] RandomArray difference between Python2.1 and 2.2? References: <20020401165852.44D3E79B@spiralis.merseine.nu> <15528.44386.160013.936132@spiralis.merseine.nu> Message-ID: <000b01c1dbc3$65fe6100$6014000a@RDREWXP> Hi, Can anyone explain the following? Python 2.1.1, Numpy version='20.2.0' Python 2.1.1 (#20, Jul 20 2001, 01:19:29) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. IDLE 0.8 -- press F1 for help >>> from RandomArray import * >>> normal(3., 1., (5,)) array([ 2.19091588, 2.44682837, 2.51790264, 4.26374364, 4.56880629]) Python 2.2, Numpy version='20.3' Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. IDLE 0.8 -- press F1 for help >>> from RandomArray import * >>> normal(3., 1., (5,)) array([-3.78572679, -3.63714516, -3.01228334, -4.80211985, -2.57420304]) Why am I getting negative values with Python 2.2? This happens consistently. Any help would be appreciated. Thanks, Ray _________________________________________________________ Do You Yahoo!? Get your free @yahoo.com address at http://mail.yahoo.com From pearu at cens.ioc.ee Thu Apr 4 18:51:36 2002 From: pearu at cens.ioc.ee (Pearu Peterson) Date: Thu Apr 4 18:51:36 2002 Subject: [Numpy-discussion] RandomArray difference between Python2.1 and 2.2? 
In-Reply-To: <000b01c1dbc3$65fe6100$6014000a@RDREWXP>
Message-ID:

On Thu, 4 Apr 2002, Ray Drew wrote:

> Python 2.2, Numpy version='20.3'
>
> Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32
> Type "copyright", "credits" or "license" for more information.
> IDLE 0.8 -- press F1 for help
> >>> from RandomArray import *
> >>> normal(3., 1., (5,))
> array([-3.78572679, -3.63714516, -3.01228334, -4.80211985, -2.57420304])
>
> Why am I getting negative values with Python 2.2? This happens consistently.
> Any help would be appreciated.

This is a bug in Numpy 20.3 and should be fixed in Numpy 21.0.

Pearu

From kelson at fedka.ociw.edu Fri Apr 5 08:23:49 2002
From: kelson at fedka.ociw.edu (Daniel D. Kelson)
Date: Fri Apr 5 08:23:49 2002
Subject: [Numpy-discussion] Error in MLab.py
Message-ID: <200204040140.g341e1e04422@fedka.ociw.edu>

Howdy:

Shouldn't line 296 in MLab.py of Numeric 21.0, which currently reads:

    val = squeeze(dot(transpose(m)*conjugate(y)) / fact)

read:

    val = squeeze(dot(transpose(m),conjugate(y)) / fact)

Thanks,
D.Kelson
Carnegie Observatories
http://www.ociw.edu/~kelson

From DavidA at ActiveState.com Fri Apr 5 13:47:03 2002
From: DavidA at ActiveState.com (David Ascher)
Date: Fri Apr 5 13:47:03 2002
Subject: [Numpy-discussion] Re: [Python-Dev] Array Enhancements
References: <20020405203029.19286.qmail@web12903.mail.yahoo.com> <200204052121.g35LLut20125@pcp742651pcs.reston01.va.comcast.net>
Message-ID: <3CAE1913.ECB27329@activestate.com>

Guido van Rossum wrote:
> > I would propose the following for multi-dimensional arrays:
> >
> > a = array.array('d', 20000, 20000)
> >
> > or:
> >
> > a = array.xarray('d', 20000, 20000)
>
> I just realized that multi-dimensional __getitem__ shouldn't be a big
> deal. The question is, given the above declaration, what a[0] should
> return: the same as a[0, 0] or a copy of a[0, 0:20000] or a reference
> to a[0, 0:20000].

Or a ValueError? In the face of ambiguity, refuse the temptation to guess.
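As a rough illustration of the "ValueError" option, here is a minimal sketch of a 2-D wrapper over the standard array module that refuses a[0] instead of guessing. The class name Array2D and its constructor are hypothetical; nothing like this was proposed for the standard library in the thread.

```python
from array import array


class Array2D:
    """Hypothetical 2-D wrapper over array.array (illustration only)."""

    def __init__(self, typecode, rows, cols):
        self.rows, self.cols = rows, cols
        self._data = array(typecode, [0] * (rows * cols))

    def _flat(self, key):
        # Only a[i, j] is accepted; a[i] could mean an element, a copy of a
        # row, or a reference to a row, so it is rejected outright.
        if not (isinstance(key, tuple) and len(key) == 2):
            raise ValueError("expected a[i, j]; a[i] is ambiguous for a 2-D array")
        i, j = key
        if not (0 <= i < self.rows and 0 <= j < self.cols):
            raise IndexError("index out of range")
        return i * self.cols + j          # row-major position in the flat buffer

    def __getitem__(self, key):
        return self._data[self._flat(key)]

    def __setitem__(self, key, value):
        self._data[self._flat(key)] = value
```

With this, a[1, 2] reads one element, while a[0] raises ValueError rather than picking one of the three candidate meanings.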
IIRC, this issue caused lots of problems in the numpy world. cc'ing Paul in case he wants to jump in to fill in my rusty memory.

Why does submitting a patch to arraymodule seem an easier path than modifying numarray or numpy to support what's needed? I believe that the goals of numarray aren't that different from what Scott is trying to do (memory management APIs, etc.).

I'd like to see fewer multi-dimensional array objects, not more...

--david ascher

From jochen at unc.edu Fri Apr 5 20:56:09 2002
From: jochen at unc.edu (Jochen Küpper)
Date: Fri Apr 5 20:56:09 2002
Subject: [Numpy-discussion] numerical integration
Message-ID:

The following message is a courtesy copy of an article that has been posted to comp.lang.python.announce as well.

I have made a numerical integration package available at
,----
| http://python.jochen-kuepper.de/integrate
`----

This is a copy of the integrate module of scipy by Travis Oliphant plus some small changes and rearrangements to make it work standalone (well, it needs Numeric).

All credits go to the scipy folks, esp. Travis, all errors should be blamed on me.

Greetings,
Jochen

PS: In the long run this module will be phased out in favor of scipy, but for now it might be useful for someone...
--
Einigkeit und Recht und Freiheit                http://www.Jochen-Kuepper.de
    Liberté, Égalité, Fraternité                GnuPG key: 44BCCD8E
        Sex, drugs and rock-n-roll

From andrewm at object-craft.com.au Sun Apr 7 23:32:07 2002
From: andrewm at object-craft.com.au (Andrew McNamara)
Date: Sun Apr 7 23:32:07 2002
Subject: [Numpy-discussion] Puzzling numpy results?
Message-ID: <20020408063157.1659D38F5B@coffee.object-craft.com.au> The behavior I'm seeing with zero length Numeric arrays is not what I would have expected: >>> from Numeric import * >>> array([5]) != array([]) zeros((0,), 'l') >>> array([]) == array([]) zeros((0,), 'l') >>> allclose(array([5]), array([])) 1 This is with Numeric-20.3 (and Numeric-20.2.1) - is this behavior correct, or have I stumbled across a bug? If both sides of the comparison are arrays with a length greater than zero, the comparisons work as expected: >>> array([5]) != array([6]) array([1]) >>> array([5, 5]) != array([6]) array([1, 1]) >>> array([5]) != array([5]) array([0]) The problem came up when I was writing unittests for some Numpy code: under some circumstances, the code under test is expected to return a zero length array: I was somewhat surprised when I couldn't make the test fail! 8-) -- Andrew McNamara, Senior Developer, Object Craft http://www.object-craft.com.au/ From tchur at optushome.com.au Mon Apr 8 13:34:16 2002 From: tchur at optushome.com.au (Tim Churches) Date: Mon Apr 8 13:34:16 2002 Subject: [Numpy-discussion] Puzzling numpy results? References: <20020408063157.1659D38F5B@coffee.object-craft.com.au> Message-ID: <3CB208CE.2270C89@optushome.com.au> Andrew McNamara wrote: > > The behavior I'm seeing with zero length Numeric arrays is not what I > would have expected: > > >>> from Numeric import * > >>> array([5]) != array([]) > zeros((0,), 'l') > >>> array([]) == array([]) > zeros((0,), 'l') > >>> allclose(array([5]), array([])) > 1 The Numpy docs point out that == and != are implemented via the logical ufuncs, and that: "The ``logical'' ufuncs also perform their operations on arrays in elementwise fashion, just like the ``mathematical'' ones." I think this explains the results you are seeing: if you do an element-wise comparison of a length-one array with a zero-length array, the Numpy recycling rule means that you should always get a zero-length result. 
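The recycling rule can be modelled in a few lines of plain Python. This is a sketch of the rule as described here, using lists in place of Numeric arrays; broadcast_length and not_equal are illustrative names, not Numeric functions.

```python
def broadcast_length(n, m):
    """Result length when combining 1-D operands of lengths n and m."""
    if n == m:
        return n
    if n == 1:
        return m      # the length-1 operand is repeated m times, and m may be 0
    if m == 1:
        return n
    raise ValueError("lengths %d and %d do not match" % (n, m))


def not_equal(a, b):
    """Elementwise != over two lists, recycling a length-1 operand."""
    size = broadcast_length(len(a), len(b))

    def pick(seq, i):
        return seq[0] if len(seq) == 1 else seq[i]

    return [int(pick(a, i) != pick(b, i)) for i in range(size)]


print(not_equal([5], []))        # []
print(not_equal([5, 5], [6]))    # [1, 1]
print(all(not_equal([5], [])))   # True
```

The last line hints at why a test built on "all elements satisfy X" cannot fail on an empty result: a reduction over zero elements is vacuously true. Whether allclose is implemented exactly that way in Numeric is an assumption here.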
Note that zeros((0,),'l') is not zero, it is zero zeros. So although the results are surprising (at least to me, and you), I think they are logically correct.

But, if that is the case, why does this hold (which I suspect reflects what you originally expected)?:

>>> from Numeric import *
>>> array([5,6]) != array([])
1
>>> array([5,6]) == array([])
0

Tim C

From xscottg at yahoo.com Thu Apr 11 04:32:03 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Thu Apr 11 04:32:03 2002
Subject: [Numpy-discussion] Introduction
Message-ID: <20020411113152.98373.qmail@web12906.mail.yahoo.com>

Hello All.

I'm interested in this project, and am curious as to what level you are willing to accept outside contribution. I just tried to subscribe to the developers list, but I didn't realize that required admin approval. Hopefully it doesn't look like I was shaking the door without knocking first.

Is this list active? Is this the correct place to talk about Numarray?

A little about me: My name is Scott Gilbert, and I work as a software developer for a company called Rincon Research in Tucson, Arizona. We do a lot of digital signal processing/analysis among other things. In the last year or so, we've started to use Python in various capacities, and we're hoping to use it for more things. We need a good array module for various things. Some are similar to what it looks like Numarray is targeted at (fft, convolutions, etc...), and others are pretty different (providing buffers for reading data from specialized hardware etc...)

About a week ago, I noticed that Guido over in Python developer land was willing to accept patches to the standard array module. As such, I thought I would take that opportunity to try and wedge some desirements and requirements I have into that baseline. Bummer for me, but they weren't exactly excited about bloating out arraymodule.c to meet my needs, and in retrospect that does make good sense.
A number of people suggested that this might be a better place to try and get what I need. So here I am, poking around and wondering if I can play in your sandbox. If you're willing to let me contribute, my specific itches that I need to scratch are below. Otherwise - bummer, and I hope you all catch crabs... :-) ----------------------------------- It's taken me a couple of days to understand what's going on in the source. I've read through the design docs, and the PEP, but it wasn't until I tried to re-implement it that it really clicked. My re-implementation of the array portion of what you're doing is attached. There are still some holes to fill in, but it's fairly complete and supports a whole bunch of things which yours does not (Some of which you might even find useful: Pickling, Bit type). I'm pretty proud of it for only 400 lines of Python (Most of which is the bazillion type declarations). It's probably riddled with bugs as it's less than a day old... After initially thinking that you guys were getting too clever, I've come to realize it's a pretty good design overall. Still I have some changes I would like to make if you'll let me. (Both to the design and the implementation) ------------------------- Following your design for the Array stuff, I've been able to implement a pretty usable array class that supports the bazillion array types I need (Bit, Complex Integer, etc...). This gets me past my core requirements without polluting your world, but unfortunately my new XArray type doesn't play so well with your UFuncs. I think my users will definitely want to use your UFuncs when the time comes, so I want to remedy this situation. The first change I would like to make is to rework your code that verifies that an object is a "usable" array. I think NumArray should only check for the interface required, not the actual type hierarchy. 
By this I mean that the minimum required to be a supported array type is that it support the correct attributes, not that it actually inherit from NDArray:

(quoting from your paper) something like:

    _data
    _shape
    _strides
    _byteoffset
    _aligned
    _contiguous
    _type
    _byteswap

Most of these are just integer fields, or tuples of integers. Ignoring _type for the moment, it appears that the interface required to be a NumArray is much less strict than actually requiring it to derive from NumArray. If you allow me to change a few functions (inputarray() in numarray.py is one small example), I could use my independent XArray class almost as is, and moreover I can implement new array objects (possibly as extension types) for crazy things like working with page aligned memory, memory mapping etc...

Well, that's almost enough. The _type field poses a small problem of sorts. It looks like you don't require a _type to be derived from NumericType, and this is a good thing since it allows me (and others) to implement NumArray compatible arrays without actually requiring NumArray to be present. However, it would be nice if you declared a more comprehensive list of typenames - even if they aren't all implemented in NumArray proper. Who knows, maybe the SciPy guys have a use for complex integers or bit arrays. If you make a reasonable canonical list, our data could be passed back and forth even if NumArray doesn't know what to do with it.

See my attached module for the types of things I'm thinking of. I'm not so concerned about the "Native Types" that are in there, but I think committing to a list of named standard types would be worthwhile. (I suspect there are others that are interested in standard C types even if the size changes between machines...)

If you were to specify a minimal interface like this in the short term, I could begin propagating my array module to my users. I could get my work done now, knowing that I'll be compatible with NumArray proper once it matures.
I'd be willing to participate in making these changes if necessary.

Looking at the big picture, I think it's desirable that there really only be one official standard for ND arrays in the Python world. That way, the various independent groups can all share their independent work. You guys are the heir-apparent, so to speak, from the Python guys' point of view.

I don't know if you're trying to get all of NumArray into the Python distribution or not, but I suspect a good interim step would be to have a PEP that specifies what it means to be a NumArray or NDArray in minimal terms. Perhaps supplying an Array-only module in Python that implements this interface. Again, I'd be willing to help with all of this.

-------------------------

Ok, other suggestions...

Here is the list of things that your design document indicates are required to be a NumArray:

    _data
    _shape
    _strides
    _byteoffset
    _aligned
    _contiguous
    _type
    _byteswap

I believe that one could calculate the values for _aligned and _contiguous from the other fields. So they shouldn't really be part of the interface required. I suspect it is useful for the C implementation of UFuncs to have this information in the NDInfo struct though, so while I would drop them from the attribute interface, I would delegate the task of calculating these values to getNDInfo() and/or getNumInfo().

I also notice that you chose _byteswap to indicate byteswapping is needed. I think a better choice would be to specify the endian-ness of the data (with an _endian attr), and have getNDInfo() and getNumInfo() calculate the _byteswap value for the NDInfo struct.

In my implementation, I came up with a slightly different list:

    self._endian
    self._offset
    self._shape
    self._stride
    self._itemtype
    self._itemsize
    self._itemformat
    self._buffer

The only minimal differences are that _itemsize allows me to work with arrays of bytes without having any clue what the underlying type is (in some cases, _itemtype is "Unknown".)
Secondly, I implemented a "Struct" _itemtype, and _itemformat is useful for this case. (It's the same format string that the struct module in Python uses.)

Also, I specified 0 for _itemsize when the actual items aren't byte addressable. In my module, this only occurred with the Bit type. I figured specifying 0 like this could keep a UFunc that isn't Bit aware from stepping on memory that it isn't allowed to.

-------------------------

Next thought: Memory Mapping

I really like the idea of having Python objects that map huge files a piece at a time without using all of available memory. I've seen this in NumArray's charter as part of the reason for breaking away from Numeric, and I'm curious how you intend to address it.

Right now, the only requirement for _data seems to be that it implement the PyBufferProcs. For memory mapping something else is needed...

I haven't implemented this, so take it as just my rambling thoughts:

With the addition of 3 new, optional, attributes to the NumArray object interface, I think this could be efficiently accomplished:

    _mapproc
    _mapmin
    _mapmax

If _mapproc is present and not None, then it points to a function whose responsibility it is to set _mapmin and _mapmax appropriately. _mapproc takes one argument which is the desired byte offset into the virtual array. This is probably easier to describe with code:

    def _mapproc(self, offset):
        unmap_the_old_range()
        mmap_a_new_range_that_includes_byteoffset()
        self._mapmin = minimum_of_new_range()
        self._mapmax = maximum_of_new_range()

In this way, when the delta between _mapmin and _mapmax is large enough, the UFuncs could act over a large contiguous portion of the _data array at a time before another remapping is necessary. If the byteoffset that a UFunc needs to work with is outside of _mapmin and _mapmax, it must call _mapproc to remedy the situation.

This puts a lot of work into UFuncs that choose to support this. I suppose that is tough to avoid though.
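To make the _mapproc idea concrete, here is a minimal sketch using the standard mmap module. The class name WindowedFile and the byte_at() helper are hypothetical, and the attribute names simply follow the proposal above; none of this is NumArray code.

```python
import mmap
import os


class WindowedFile:
    """Hypothetical reader that keeps only one window of a file mapped."""

    def __init__(self, path, window=mmap.ALLOCATIONGRANULARITY):
        gran = mmap.ALLOCATIONGRANULARITY
        self._file = open(path, "rb")
        self._size = os.path.getsize(path)
        # Round the window down to a whole number of granules (at least one),
        # so an aligned window always contains the offset that triggered it.
        self._window = max(gran, (window // gran) * gran)
        self._map = None
        self._mapmin = self._mapmax = 0   # mapped byte range [_mapmin, _mapmax)

    def _mapproc(self, offset):
        """Remap so that offset falls inside [_mapmin, _mapmax)."""
        if self._map is not None:
            self._map.close()             # unmap the old range
        gran = mmap.ALLOCATIONGRANULARITY
        start = (offset // gran) * gran   # mmap offsets must be granule-aligned
        length = min(self._window, self._size - start)
        self._map = mmap.mmap(self._file.fileno(), length,
                              access=mmap.ACCESS_READ, offset=start)
        self._mapmin, self._mapmax = start, start + length

    def byte_at(self, offset):
        """Return the byte at an absolute file offset, remapping if needed."""
        if not 0 <= offset < self._size:
            raise IndexError(offset)
        if self._map is None or not self._mapmin <= offset < self._mapmax:
            self._mapproc(offset)
        return self._map[offset - self._mapmin]

    def close(self):
        if self._map is not None:
            self._map.close()
        self._file.close()
```

A caller that walks the file sequentially triggers one remap per window; random access can remap on every call, which is the cost the proposal pushes onto each UFunc.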
Also, there are threading issues to think about here. I don't know if UFuncs are going to release the Global Interpreter Lock, but if they do it's possible that multiple threads could have the same PyObject and try to _mapproc different offsets at different times. It is possible to implement a mutex for the NumArray without requiring anything special from the PyObject that implements it... ----------------------------- Ok. That's probably way too much content for an Introductory email. I do have more thoughts on this stuff though. They'll just have to wait for another time. Nice to meet you all, -Scott Gilbert __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: XArray.py URL: From perry at stsci.edu Thu Apr 11 09:02:05 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Apr 11 09:02:05 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: <20020411113152.98373.qmail@web12906.mail.yahoo.com> Message-ID: Hi Scott, I've printed out your message and will try to read and understand it today. It may be a couple days before we can respond, so don't take a lack of an immediate response as disinterest. Thanks, Perry From jmiller at stsci.edu Thu Apr 11 14:36:03 2002 From: jmiller at stsci.edu (Todd Miller) Date: Thu Apr 11 14:36:03 2002 Subject: [Numpy-discussion] slice question and bug References: <20020401165852.44D3E79B@spiralis.merseine.nu> <15528.44386.160013.936132@spiralis.merseine.nu> Message-ID: <3CB60188.1010203@stsci.edu> clee at spiralis.merseine.nu wrote: > >clee at spiralis.merseine.nu writes: > > > > Hello, > > I'm trying to track down a segv when I do the B[:] operation on an > > array, "B", a that I've built in as a view on external data. During... 
> > [snip]
>
>To clarify my own somewhat non-sensical post: When I started composing
>my message, I was trying to figure out a bug in my own code that
>caused a crash while doing slice_array. I've since fixed that bug.
>However, in the process of figuring out what I was doing wrong I
>was browsing the Numeric source code. While examining
>PyArray_Free(..) in arrayobject.c, I saw that returns -1 whenever the
>number of dimensions is greater than 2, yet it has code that tests for
>when the number of dimensions equals 3.
>
>So utimately, my post is just an alert, that I think there might be
>some code that needs to be cleaned up.
>
>Thanks,
> lacking-caffeine-ly yours
> -chris
>

Looking at the code to PyArray_Free, I agree with Chris. When called to free a 2D array, I think that PyArray_Free leaks all of the row storage because ap->nd == 2, not 3:

/* {%c++%} */
extern int PyArray_Free(PyObject *op, char *ptr) {
    PyArrayObject *ap = (PyArrayObject *)op;
    int i, n;

    if (ap->nd > 2) return -1;
    if (ap->nd == 3) {
        n = ap->dimensions[0];
        for (i=0; i<n; i++) {
            free(((char **)ptr)[i]);
        }
    }
    if (ap->nd >= 2) {
        free(ptr);
    }
    Py_DECREF(ap);
    return 0;
}
/* {%c++%} */

Other opinions?

Todd

--
Todd Miller jmiller at stsci.edu
STSCI / SSG (410) 338 4576

From perry at stsci.edu Thu Apr 11 14:57:14 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Apr 11 14:57:14 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020411113152.98373.qmail@web12906.mail.yahoo.com>
Message-ID:

> [mailto:numpy-discussion-admin at lists.sourceforge.net]On Behalf Of Scott
> Gilbert
> Subject: [Numpy-discussion] Introduction
>
>
> Hello All.
>
> I'm interested in this project, and am curious to what level you are
> willing to accept outside contribution. I just tried to subscribe to
> the developers list, but I didn't realize that required admin approval.
> Hopefully it doesn't look like I was shaking the door without knocking > first. > > Is this list active? Is this the correct place to talk about Numarray? Sure. > > Following your design for the Array stuff, I've been able to implement > a pretty usable array class that supports the bazillion array types I > need (Bit, Complex Integer, etc...). This gets me past my core > requirements without polluting your world, but unfortunately my new > XArray type doesn't play so well with your UFuncs. I think my users > will definitely want to use your UFuncs when the time comes, so I want > to remedy this situation. > > The first change I would like to make is to rework your code that > verifies that an object is a "usable" array. I think NumArray should > only check for the interface required, not the actual type hierarchy. > By this I mean that the minimum required to be a supported array type > is that it support the correct attributes, not that it actually inherit > from NDArray: > > (quoting from your paper) something like: > > _data > _shape > _strides > _byteoffset > _aligned > _contiguous > _type > _byteswap > > Most of these are just integer fields, or tuples of integers. Ignoring > _type for the moment, it appears that the interface required to be a > NumArray is much less strict than actually requiring it to derive from > NumArray. If you allow me to change a few functions (inputarray() in > numarray.py is one small example), I could use my independant XArray > class almost as is, and moreover I can implement new array objects > (possibly as extension types) for crazy things like working with page > aligned memory, memory mapping etc... > I guess we are not sure we understand what you mean by interface. In particular, we don't understand why sharing the same object attributes (the private ones you list above) is a benefit to the code you are writing if you aren't also using the low level implementation. 
The above attributes are private and nothing external to the Class should depend on or even know about them. Could you elaborate on what you mean by interface and the relationship between your arrays and numarrays? > > Well, that's almost enough. The _type field poses a small problem of > sorts. It looks like you don't require a _type to be derived from > NumericType, and this is a good thing since it allows me (and others) > to implement NumArray compatible arrays without actually requiring > NumArray to be present. > What do you mean by NumArray compatible? [some issues snipped since we need to understand the interface issue first] > I don't know if you're trying to get all of NumArray into the Python > distribution or not, but I suspect a good interim step would be to have > a PEP that specifies what it means to be a NumArray or NDArray in > minimal terms. Perhaps supplying an Array only module in Python that > implements this interface. Again, I'd be willing to help with all of > this. > We are hoping to get numarray into the distribution [it won't be the end of the world for us if it doesn't happen]. I'll warn you that the PEP is out of date. We are likely to update it only after we feel we are close to having the implementation ready for consideration for including into the standard distribution. I would refer to the actual implementation and the design notes for the time being. > > ------------------------- > > Ok, other suggestions... > > Here is the list of things that your design document indicates are > required to be a NumArray: > > _data > _shape > _strides > _byteoffset > _aligned > _contiguous > _type > _byteswap > > I believe that one could calculate the values for _aligned and > _contiguous from the other fields. So they shouldn't really be part of > the interface required. 
I suspect it is useful for the C > implementation of UFuncs to have this information in the NDINfo struct > though, so while I would drop them from attribute interface, I would > delegate the task of calculating these values to getNDInfo() and/or > getNumInfo(). > > I also notice that you chose _byteswap to indicate byteswapping is > needed. I think a better choice would be to specify the endian-ness of > the data (with an _endian attr), and have getNDInfo() and getNumInfo() > calculte the _byteswap value for the NDInfo struct. > > In my implementation, I came up with a slightly different list: > > self._endian > self._offset > self._shape > self._stride > self._itemtype > self._itemsize > self._itemformat > self._buffer > Some of the name changes are worth considering (like replacing ._byteswap with an endian indicator, though I find _endian completely opaque as to what it would mean--1 means what? little or big?). (BTW, we already have _itemsize). _contiguous and _aligned are things we have been considering changing, but I would have to think about it carefully to determine if they really are redundant. > The only minimal differences are that _itemsize allows me to work with > arrays of bytes without having any clue what the underlying type is (in > some cases, _itemtype is "Unknown".) Secondly, I implemented a > "Struct" _itemtype, and _itemformat is useful for for this case. (It's > the same format string that the struct module in Python uses.) > It looks like you are trying to deal with records with these "structs". We deal with records (efficiently) in a completely different way. Take a look at the recarray module. > Also, I specified 0 for _itemsize when the actual items aren't byte > addressable. In my module, this only occurred with the Bit type. I > figured specifying 0 like this could keep a UFunc that isn't Bit aware > from stepping on memory that it isn't allowed to. > Again, we aren't sure how this works with numarray. 
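On the earlier question of whether _contiguous and _byteswap are redundant, the following sketch shows how both could be derived from shape, strides, item size and a declared byte order. The function names and the 'little'/'big' spelling for an _endian value are assumptions made for illustration. _aligned is deliberately left out, since it also depends on the address at which the buffer itself starts, which these fields do not carry.

```python
import sys


def is_contiguous(shape, strides, itemsize):
    """True if the strides describe a gap-free, row-major (C order) block."""
    if 0 in shape:
        return True                        # no elements, so nothing can be out of place
    expected = itemsize
    for dim, stride in zip(reversed(shape), reversed(strides)):
        if dim != 1 and stride != expected:
            return False                   # a gap or a reordered axis
        expected *= dim
    return True


def needs_byteswap(endian):
    """endian is 'little' or 'big'; swap only when it differs from the host."""
    if endian not in ("little", "big"):
        raise ValueError("unknown byte order: %r" % (endian,))
    return endian != sys.byteorder


print(is_contiguous((3, 4), (32, 8), 8))   # True
print(is_contiguous((3, 4), (64, 8), 8))   # False
```

Spelling the byte order as a word rather than a flag also sidesteps the "1 means what?" objection raised above.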
> -------------------------
>
> Next thought: Memory Mapping
>
> I really like the idea of having Python objects that map huge files a
> piece at time without using all of available memory. I've seen this in
> NumArray's charter as part of the reason for breaking away from
> Numeric, and I'm curious how you intend to address it.
>
> Right now, the only requirement for _data seems to be that it implement
> the PyBufferProcs. For memory mapping something else is needed...
>
> I haven't implemented this, so take it as just my rambling thoughts:
>
> With the addition of 3 new, optional, attributes to the NumArray object
> interface, I think this could be efficiently accomplished:
>
>     _mapproc
>     _mapmin
>     _mapmax
>
> If _mapproc is present and not None, then it points to a function who's
> responsibility it is to set _mapmin and _mapmax appropriately.
> _mapproc takes one argument which is the desired byte offset into the
> virtual array. This is probably easier to describe with code:
>
>     def _mapproc(self, offset):
>         unmap_the_old_range()
>         mmap_a_new_range_that_includes_byteoffset()
>         self._mapmin = minimum_of_new_range()
>         self._mapmax = maximum_of_new_range()
>
> In this way, when the delta between _mapmin and _mapmax is large
> enough, the UFuncs could act over a large contiguous portion of the
> _data array at a time before another remapping is necessary. If the
> byteoffset that a UFunc needs to work with is outside of _mapmin and
> _mapmax, it must call _mapproc to remedy the situation.
>
> This puts a lot of work into UFuncs that choose to support this. I
> suppose that is tough to avoid though.
>

We deal with memory mapping in a completely different way. It's a bit late for me to go into it in great detail, but we wrap the standard library mmap module with a module that lets us manage memory mapped files. This module basically memory maps an entire file and then in effect mallocs segments of that file as buffer objects.
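A minimal sketch of that scheme, assuming only what is described here: the whole file is mapped once, and byte ranges are handed out as buffers that may not overlap. The class MappedFile and its reserve() method are invented names; this is not the STScI module, which had not been released.

```python
import mmap


class MappedFile:
    """Hypothetical wrapper: map a whole file once, hand out disjoint slices."""

    def __init__(self, fileobj):
        # Length 0 maps the entire file; the file must be open for update.
        self._map = mmap.mmap(fileobj.fileno(), 0, access=mmap.ACCESS_WRITE)
        self._view = memoryview(self._map)
        self._reserved = []               # (start, stop) byte ranges handed out
        self._views = []                  # the slices themselves, released on close

    def reserve(self, start, nbytes):
        """Return a buffer over [start, start + nbytes), refusing overlaps."""
        stop = start + nbytes
        if start < 0 or nbytes <= 0 or stop > len(self._map):
            raise ValueError("range lies outside the file")
        for r_start, r_stop in self._reserved:
            if start < r_stop and r_start < stop:
                raise ValueError("range overlaps an existing buffer")
        view = self._view[start:stop]
        self._reserved.append((start, stop))
        self._views.append(view)
        return view

    def close(self):
        for view in self._views:
            view.release()                # slices must be released before unmapping
        self._view.release()
        self._map.close()
```

Each reserved slice could back one array in a file that holds several, which matches the multi-array data files mentioned in the reply.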
This allocation of subsets is needed to ensure that overlapping memory maps buffers don't happen. One can basically reserve part of the memory mapped file as a buffer. Once that is done, nothing else can use that part of the file for another buffer. We do not intend to handle memory maps as a way of sequentially mapping parts of the file to provide windowed views as your code segment above suggests. If you want a buffer that is the whole (large) file, you just get a mapped buffer to the whole thing. (Why wouldn't you?) The above scheme is needed for our purposes because many of our data files contain multiple data arrays and we need a means of creating a numarray object for each one. Most of this machinery has already been implemented, but we haven't released it since our I/O package (for astronomical FITS files) is not yet at the point of being able to use it. > Also, there are threading issues to think about here. I don't know if > UFuncs are going to release the Global Interpreter Lock, but if they do > it's possible that multiple threads could have the same PyObject and > try to _mapproc different offsets at different times. > To tell you the truth, we haven't dealt with the threading issue much. We think about it occasionally, but have deferred dealing with it until we have finished other aspects first. We do want to make it thread safe though. Perry Greenfield From oliphant at ee.byu.edu Thu Apr 11 15:47:04 2002 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 11 15:47:04 2002 Subject: [Numpy-discussion] slice question and bug In-Reply-To: <3CB60188.1010203@stsci.edu> Message-ID: > Looking at the code to PyArray_Free, I agree with Chris. 
Called to free a 2D
> array, I think that PyArray_Free leaks all of the row storage because
> ap->nd == 2, not 3:
>
> /* {%c++%} */
> extern int PyArray_Free(PyObject *op, char *ptr) {
>     PyArrayObject *ap = (PyArrayObject *)op;
>     int i, n;
>
>     if (ap->nd > 2) return -1;
>     if (ap->nd == 3) {
>         n = ap->dimensions[0];
>         for (i=0; i<n; i++) {
>             free(((char **)ptr)[i]);
>         }
>     }
>     if (ap->nd >= 2) {
>         free(ptr);
>     }
>     Py_DECREF(ap);
>     return 0;
> }
> /* {%c++%} */
>

This has been broken since the beginning. I believe the documentation says as much. I've never used it because I always think of 2-D arrays as a block of data not as rows of pointers. It should be fixed, but no one's ever been interested enough to do it.

-Travis Oliphant

From xscottg at yahoo.com Thu Apr 11 21:46:02 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Thu Apr 11 21:46:02 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To:
Message-ID: <20020412044201.63373.qmail@web12908.mail.yahoo.com>

--- Perry Greenfield wrote:
>
> I guess we are not sure we understand what you mean by interface.
> In particular, we don't understand why sharing the same object
> attributes (the private ones you list above) is a benefit to the
> code you are writing if you aren't also using the low level
> implementation. The above attributes are private and nothing
> external to the Class should depend on or even know about them.
> Could you elaborate on what you mean by interface and the
> relationship between your arrays and numarrays?
>

There are several places in your code that check to see if you are working with a valid type for NDArrays. Currently this check consists of asking the following questions:

    'Is it a tuple or list?'
    'Is it a scalar of some sort?'
    'Does it derive from our NDArray class?'

If any of these questions answer true, it does the right thing and moves on. If none of these is true, it raises an exception.

I suppose this is fine if you are only concerned about working with your own implementation of an array type, but I hope you'll consider the following as a minor change that opens up the possibility for other compatible array implementations to work interoperably.

Instead have the code ask the following questions:

    'Is it a tuple or list?'
    'Is it a scalar of some sort?'
    'Does it support the attributes necessary to be like an NDArray object?'

This change is very similar to how you can pass in any Python object to the "pickle.dump()" function, and if it supports the "write()" method it will be called:

    >>> class WhoKnows:
    ...     def write(self, x):
    ...         print x
    >>>
    >>> import pickle
    >>>
    >>> w = WhoKnows()
    >>>
    >>> pickle.dump('some data', w)
    S'some data'
    p1
    .

Until reading your response above, I didn't realize that you consider your single underscore attributes to be totally private. In general, I try to use a single underscore to mean protected (meaning you can use them if you REALLY know what you are doing), hence my confusion. With that in mind, pretend that I suggested the following instead:

The specification of an NDArray is that it has the following attributes:

    ndarray_buffer   - a PyObject which has PyBufferProcs
    ndarray_shape    - a tuple specifying the shape of the array
    ndarray_stride   - a tuple specifying the index multipliers
    ndarray_itemsize - an int/long stating the size of items
    ndarray_itemtype - some representation of type

This would be a very minor change to your functions like inputarray(), getNDInfo(), getNDArray(), but it would allow your UFuncs to work with other implementations of arrays.

As an example similar to the pickle example above:

    import array
    class ScottArray:
        def __init__(self):
            self.ndarray_buffer = array.array('d', [0]*100)
            self.ndarray_shape = (10, 10)
            self.ndarray_stride = (80, 8)
            self.ndarray_itemsize = 8
            self.ndarray_itemtype = 'Float64'

    import numarray

    n = numarray.numarray((10, 10), type='Float64')
    s = ScottArray()

    very_cool = numarray.add(n, s)

This example is kind of silly. I mean, why wouldn't I just use numarray for all of my array needs? Well, that's where my world is a little different than yours I think. Instead of using 'array.array()' above, there are times where I'll need to use 'whizbang.array()' to get a different PyBufferProcs supporting object. Or where I'll need to work with a crazy type in one part of the code, but I'd like to pass it to an extension that combines your types and mine.

In these cases where I need "special memory" or "special types" I could try and get you guys to accept a patch, but this would just pollute your project and probably annoy you in general. A better solution is to create a general standard mechanism for implementing NDArray types, and let me make my own.

In the above example, we could have completely different NDArray implementations working interoperably inside of one UFunc. It seems to me that all it really takes to be an NDArray can be specified by a list of attributes like the one above. (Probably need a few more attributes to be really general: 'ndarray_endian', etc...) In the end, NDArrays are just pointers to a buffer, and descriptors for indexing.

I don't believe this would have any significant effect on the performance of numarray. (The efficient fast C code still gets a pointer to work with.) Moreover, I'd be very willing to contribute patches to make this happen.
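
To make the "does it support the attributes" test concrete, here is a minimal sketch in present-day Python 3 (not the Python 2 of this thread). Only the five ndarray_* attribute names come from the proposal above; the helper names is_ndarray_like and get_float64 are hypothetical, and the sketch assumes the strides are byte strides and the items are native-endian doubles. It is not how numarray itself performs the check.

```python
import array
import struct

# Attribute names from the proposal; everything else here is hypothetical.
NDARRAY_ATTRS = (
    "ndarray_buffer",
    "ndarray_shape",
    "ndarray_stride",
    "ndarray_itemsize",
    "ndarray_itemtype",
)


def is_ndarray_like(obj):
    """Duck-typed check: accept any object carrying all five attributes."""
    return all(hasattr(obj, name) for name in NDARRAY_ATTRS)


def get_float64(obj, index):
    """Read one Float64 item using only the attribute interface."""
    if not is_ndarray_like(obj):
        raise TypeError("object does not provide the ndarray_* attributes")
    if obj.ndarray_itemtype != "Float64":
        raise TypeError("this sketch only handles Float64 items")
    if len(index) != len(obj.ndarray_shape):
        raise IndexError("wrong number of indices")
    for i, n in zip(index, obj.ndarray_shape):
        if not 0 <= i < n:
            raise IndexError("index out of range")
    # Assumption: strides are expressed in bytes, as in (80, 8) above.
    offset = sum(i * s for i, s in zip(index, obj.ndarray_stride))
    return struct.unpack_from("d", obj.ndarray_buffer, offset)[0]


class ScottArray:
    """Same layout as the example above, filled with 0.0 .. 99.0."""

    def __init__(self):
        self.ndarray_buffer = array.array("d", range(100))
        self.ndarray_shape = (10, 10)
        self.ndarray_stride = (80, 8)
        self.ndarray_itemsize = 8
        self.ndarray_itemtype = "Float64"
```

Nothing in get_float64 depends on the class of its argument, which is the point of the proposal: any object that exposes a buffer plus these descriptors could be consumed the same way.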

If you agree, and we can flesh out what this "attribute interface" should be, then I can start distributing my own array module to the engineers where I work without too much fear that they'll be screwed once numarray is stable and they want to mix and match.

Code always lives a lot longer than I want it to, and if I give them something now which doesn't work with your end product, I'll have done them a disservice.

BTW: Allowing other types to fill in as NDArrays also allows other types to implement things like slicing as they see fit (slice and copy contiguous, slice and copy on write, slice and copy by reference, etc...).

>
> We are hoping to get numarray into the distribution [it won't be the
> end of the world for us if it doesn't happen]. I'll warn you that the
> PEP is out of date. We are likely to update it only after we feel
> we are close to having the implementation ready for consideration
> for including into the standard distribution. I would refer to the
> actual implementation and the design notes for the time being.
>

Yeah, I recognize that the PEP is gathering dust at the moment. I'm not having too much trouble following through the source and design docs. It took me a few days to "get it", but that's probably because I'm slower than your average bear. :-)

Regarding the PEP, what I would like to see happen is that if we agree that the "attribute interface" stuff above is the right way to go about things, I would (or we would) submit a milder interim PEP specifying what those attributes are, how they are to be interpreted, and a simple Python module implementing a general NDArray class for consumption. Hopefully this PEP would specify a canonical list of type names as well. Then we could make updates to the other PEP if necessary.

>
> Some of the name changes are worth considering (like replacing ._byteswap
> with an endian indicator, though I find _endian completely opaque as to
> what it would mean--1 means what? little or big?).
(BTW, we already have
> _itemsize). _contiguous and _aligned are things we have been considering
> changing, but I would have to think about it carefully to determine if
> they really are redundant.
>

It's all open for discussion, but I would propose that ndarray_endian be one of:

    '>' - big endian
    '<' - little endian

This is how the standard Python struct module specifies endian, and I've been trying to stay consistent with the baseline when possible.

>
> It looks like you are trying to deal with records with these "structs".
> We deal with records (efficiently) in a completely different way. Take
> a look at the recarray module.
>

Will definitely do. I've called them structs simply because they borrow their format string from the struct module that ships with Python. I'm not hung up on the name, and I wouldn't object to an alias. Too early for me to tell if there is even a difference in the underlying memory, but maybe we'll end up with 'structs' for my notion of things, and 'records' for yours.

>
> We deal with memory mapping a completely different way. It's a bit late
> for me to go into it in great detail, but we wrap the standard library
> mmap module with a module that lets us manage memory mapped files.
> This module basically memory maps an entire file and then in effect
> mallocs segments of that file as buffer objects. This allocation of
> subsets is needed to ensure that overlapping memory maps buffers
> don't happen. One can basically reserve part of the memory mapped file
> as a buffer. Once that is done, nothing else can use that part of the
> file for another buffer. We do not intend to handle memory maps as a
> way of sequentially mapping parts of the file to provide windowed views
> as your code segment above suggests. If you want a buffer that is the
> whole (large) file, you just get a mapped buffer to the whole thing.
> (Why wouldn't you?)
>

I think the idea of taking a 500 megabyte (or 5 gigabyte) file, and windowing 1 meg of actual memory at a time, is pretty attractive. Sometimes we do very large correlations, and there just isn't enough memory to mmap the whole file (much less two files for correlation). Any library that doesn't want to support this business could just raise a NotImplemented error on encountering them.

Maybe I shouldn't be calling this "memory mapping". Even though it could be implemented on top of mmap, truthfully I just want to support a "windowing" interface. If we could specify the windowing attributes and indicate the standard usage that would be great. Maybe:

    ndarray_window(self, offset)
    ndarray_winmin
    ndarray_winmax

>
> The above scheme is needed for our purposes because many of our data files
> contain multiple data arrays and we need a means of creating a numarray
> object for each one. Most of this machinery has already been implemented,
> but we haven't released it since our I/O package (for astronomical FITS
> files) is not yet at the point of being able to use it.
>

There is a group at my company that is using FITS for some stuff. I don't know enough about it to comment though...

Cheers,
-Scott

From perry at stsci.edu Fri Apr 12 17:44:04 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Fri Apr 12 17:44:04 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020412044201.63373.qmail@web12908.mail.yahoo.com>
Message-ID:

Scott Gilbert writes:
> import array
> class ScottArray:
>     def __init__(self):
>         self.ndarray_buffer = array.array('d', [0]*100)
>         self.ndarray_shape = (10, 10)
>         self.ndarray_stride = (80, 8)
>         self.ndarray_itemsize = 8
>         self.ndarray_itemtype = 'Float64'
>
> import numarray
>
> n = numarray.numarray((10, 10), type='Float64')
> s = ScottArray()
>
> very_cool = numarray.add(n, s)
>

But why not (I may have some details wrong, I'm doing this from memory, and I haven't worked on it myself in a bit):

    import array
    import numarray
    import memory # comes with numarray

    class ScottArray(numarray.NumArray):
        def __init__(self):
            # create necessary buffer obj
            buf = memory.writeable_buffer(array.array('d', [0]*100))
            numarray.NumArray.__init__(self, shape=(10, 10),
                                       type=numarray.Float64,
                                       buffer=buf)
            # _strides not settable from constructor yet, but currently
            # if you needed to set it:
            # self._strides = (80, 8)
            # But for this case it would be computed automatically from
            # the supplied shape

    n = numarray.numarray((10, 10), type='Float64')
    s = ScottArray()

    maybe_not_quite_so_cool_but_just_as_functional = n + s

> This example is kind of silly. I mean, why wouldn't I just use numarray
> for all of my array needs? Well, that's where my world is a little
> different than yours I think. Instead of using 'array.array()' above,
> there are times where I'll need to use 'whizbang.array()' to get a
> different PyBufferProcs supporting object. Or where I'll need to work
> with a crazy type in one part of the code,
> but I'd like to pass it to an extension that combines your types and mine.
>
> In these cases where I need "special memory" or "special types" I could
> try and get you guys to accept a patch, but this would just pollute your
> project and probably annoy you in general. A better solution is to create
> a general standard mechanism for implementing NDArray types, and let me
> make my own.
>

From everything I've seen so far, I don't see why you can't just create a NumArray object directly. You can subclass it (and use multiple inheritance if you need to subclass a different object as well) and add whatever customized behavior you want. You can create new kinds of objects as buffers just so long as you satisfy the buffer interface.

>
> In the above example, we could have completely different NDArray
> implementations working interoperably inside of one UFunc. It seems to
> me that all it really takes to be an NDArray can be specified by a list
> of attributes like the one above. (Probably need a few more attributes
> to be really general: 'ndarray_endian', etc...) In the end, NDArrays
> are just pointers to a buffer, and descriptors for indexing.
>

Again, why not just create an NDArray object with the appropriate buffer object and attributes (subclassing if necessary).

>
> I don't believe this would have any significant affect on the
> performance of numarray. (The efficient fast C code still gets a
> pointer to work with.) More over, I'd be very willing to contribute
> patches to make this happen.
>
>
> If you agree, and we can flesh out what this "attribute interface"
> should be, then I can start distributing my own array module to the
> engineers where I work without too much fear that they'll be screwed
> once numarray is stable and they want to mix and match.
>
> Code always lives a lot longer than I want it to, and if I give them
> something now which doesn't work with your end product, I'll have done
> them a disservice.
>

All good in principle, but I haven't yet seen a reason to change numarray.
As far as I can tell, it provides all you need exactly as it is. If you could give an example that demonstrated otherwise...

>
> It's all open for discussion, but I would propose that ndarray_endian
> be one of:
>
>     '>' - big endian
>     '<' - little endian
>
> This is how the standard Python struct module specifies endian, and
> I've been trying to stay consistent with the baseline when possible.
>

To tell you the truth, I'm not crazy about how the struct module handles types or attributes. It's generally far too cryptic for my tastes. Other than providing backward compatibility, we aren't interested in it emulating struct.

> >
> > The above scheme is needed for our purposes because many of our data
> > files contain multiple data arrays and we need a means of creating a
> > numarray object for each one. Most of this machinery has already been
> > implemented, but we haven't released it since our I/O package (for
> > astronomical FITS files) is not yet at the point of being able to use it.
> >
>

I could well misunderstand, but I thought that if you mmap a file in unix in write mode, you do not use up the virtual memory as limited by the physical memory and the paging file. Your only limit becomes the virtual address space available to the processor. If the 32 bit address is your problem, you are far, far better off using a 64-bit processor and operating system than trying to kludge up a windowing memory mechanism.

I could see a way of doing it for ufuncs, but the numeric world (and I would think the DSP world as well) needs far more than element-by-element array functionality. Providing a usable C-api for that kind of memory model would be a nightmare. But I'm not sure if this or the page file is your limitation.
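
For readers following the two memory-mapping approaches, the whole-file scheme (map the entire file once, then reserve non-overlapping segments of it as buffers) can be sketched with the standard library alone. This is not the unreleased numarray module mentioned in the thread: MappedFile, reserve, and demo are hypothetical names, and the sketch is written for present-day Python 3.

```python
import mmap
import os
import tempfile


class MappedFile:
    """Hypothetical helper: maps a whole file and hands out
    non-overlapping segments of it as writable buffers."""

    def __init__(self, path):
        self._file = open(path, "r+b")
        # Length 0 asks mmap to map the entire file.
        self._map = mmap.mmap(self._file.fileno(), 0)
        self._view = memoryview(self._map)
        self._reserved = []   # (start, stop) byte ranges already handed out
        self._segments = []   # buffers to release before unmapping

    def reserve(self, offset, length):
        start, stop = offset, offset + length
        if start < 0 or length <= 0 or stop > len(self._view):
            raise ValueError("segment lies outside the mapped file")
        for a, b in self._reserved:
            if start < b and a < stop:
                raise ValueError("segment overlaps a reserved buffer")
        segment = self._view[start:stop]
        self._reserved.append((start, stop))
        self._segments.append(segment)
        return segment

    def close(self):
        # Every exported buffer must be released before the map can close.
        for segment in self._segments:
            segment.release()
        self._view.release()
        self._map.close()
        self._file.close()


def demo():
    """Reserve two segments, write through one, and try an overlapping one."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * 64)
        os.close(fd)
        mapped = MappedFile(path)
        first = mapped.reserve(0, 16)
        mapped.reserve(16, 16)
        first[:4] = b"abcd"          # the write lands in the mapped file
        try:
            mapped.reserve(8, 16)    # overlaps both earlier segments
            rejected = False
        except ValueError:
            rejected = True
        mapped.close()
        with open(path, "rb") as f:
            return f.read(4), rejected
    finally:
        os.remove(path)
```

Each reserved segment is an independent buffer over its own slice of the file, which matches the case described earlier of one data file holding several arrays. The windowed alternative would instead remap a moving range of the file on demand.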

Perry

From kragen at pobox.com Sat Apr 13 00:25:01 2002
From: kragen at pobox.com (Kragen Sitaker)
Date: Sat Apr 13 00:25:01 2002
Subject: [Numpy-discussion] segfault in Numpy esxtension
Message-ID: <20020413072433.2702DBDC1@panacea.canonical.org>

(All of the below is with regard to Numeric 20.2.0.)

For a consulting client, I wrote an extension module that does the equivalent of sum(take(a, b)), but without the temporary result in between.

I was surprised that when I tried to .resize() the result of this routine, I got a segmentation fault and a core dump. It was crashing at this line in arrayobject.c:

    if (memcmp(self->descr->zero, all_zero, elsize) == 0) {

self->descr, in this case, was the type description for arrays of type "double". It seems that self->descr->zero was 0, as in a null pointer, not a pointer to a location containing (double)0, and this was causing it to crash.

It looks like the .zero fields of the type descriptions (which live in arraytypes.c and _numpy.so) are initialized to be null pointers, and only when the initmultiarray() function in multiarraymodule.c is run are these pointers set to point to actual zeroes somewhere in allocated memory.

I guess Numeric.py imports multiarray.so, which calls initmultiarray(), so the solution for me was to make sure I import Numeric before importing my module (or at least before resizing arrays produced by my module).

But, to my mind, this segfault is a bug --- importing a module that follows all the rules shouldn't put Python in a state that's so dangerously inconsistent that innocent things like .resize() can crash it. Maybe the same .so file that includes the actual data items should be responsible for initializing them --- especially since import_array() imports _numpy without importing multiarray. (I assume there's a reason it wasn't done this way in the first place.)

What do other people think?
-- /* By Kragen Sitaker, http://pobox.com/~kragen/puzzle4.html */ char b[2][10000],*s,*t=b,*d,*e=b+1,**p;main(int c,char**v){int n=atoi(v[1]); strcpy(b,v[2]);while(n--){for(s=t,d=e;*s;s++){for(p=v+3;*p;p++)if(**p==*s){ strcpy(d,*p+2);d+=strlen(d);goto x;}*d++=*s;x:}s=t;t=e;e=s;*d++=0;}puts(t);} From xscottg at yahoo.com Sat Apr 13 03:09:04 2002 From: xscottg at yahoo.com (Scott Gilbert) Date: Sat Apr 13 03:09:04 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: Message-ID: <20020413100823.45837.qmail@web12907.mail.yahoo.com> --- Perry Greenfield wrote: > Scott Gilbert writes: [...] > > > > very_cool = numarray.add(n, s) > > > But why not (I may have some details wrong, I'm doing this > from memory, and I haven't worked on it myself in a bit): > [...] > > maybe_not_quite_so_cool_but_just_as_functional = n + s > [...] > > From everything I've seen so far, I don't see why you can't > just create a NumArray object directly. You can subclass it > (and use multiple inheritance if you need to subclass a different > object as well) and add whatever customized behavior you want. > You can create new kinds of objects as buffers just so long > as you satisfy the buffer interface. > Your point about the optional buffer parameter to the NumArray is well taken. I had seen that when looking through the code, but it slipped my mind for that example. I could very well be wrong about some of these other reasons too... I have a number of reasons listed below for wanting the standard that Python adopts to specify only the interface and not the implementation. You may not find all of these pursuasive, and I apologize in advance if any looks like a criticism. (In my limited years as a professional software developer, I've found that the majority of people can be very defensive and protective of their code. I've been trying to tread lightly, but I don't know if I'm succeeding.) 
However if any of these reasons is persuasive, keep in mind that the actual changes I'm proposing are pretty minimal in scope. And that I'd be willing to submit patches so as to reduce any inconvenience to you. (Not that you have any reason to believe I can code my way out of a box... :-) Ok, here's my list: Philosophical You have a proposal in to the Python guys to make Numarray into the standard _implementation_. I think standards like this should specify an _interface_, not an implementation. Simplicity I can give my users a single XArray.py file, and they can be off and running with something that works right then and there, and it could in many ways be compatible with Numarray (with some slight modifications) when they decide they want the extra functionality of extension modules that you or anyone else who follows your standard provides. But they don't have to compile anything until they really need to. Your implementation leaves me with all or nothing. I'll have to build and use numarray, or I've got an in house only solution. Expediency I want to see a usable standard arise quickly. If you maintain the stance that we should all use the Numarray implementation, instead of just defining a good Numarray interface, everyone has to wait for you to finish things enough to get them accepted by the Python group. Your implementation is complicated, and I suspect they will have many things that they will want you to change before they accept it into their baseline. (If you think my list of suggestions is annoying, wait until you see theirs!) If a simple interface protocol is presented, and a simple pure Python module that implements it. The PEP acceptance process might move along quickly, but you could take your time with implementing your code. Pragmatic You guys aren't finished yet, and I need to give my users an array module ASAP. As such a new project, there are likely to be many bugs floating around in there. 
I think that when you are done, you will probably have a very good library. Moreover, I'm grateful that you are making it open source. That's very generous of you, and the fact that you are tolerating this discussion is definitely appreciated. Still, I can't put off my projects, and I can't task you to work faster. However, I do think we could agree in a very short term that your design for the interface is a good one. I also think that we (or just me if you like) could make a much smaller PEP that would be more readily accepted. Then everyone in this community could proceed at their own pace - knowing that if we followed the simple standard we would have inter operability with each other. Social Normally I wouldn't expect you to care about any of my special issues. You have your own problems to solve. As I said above, it's generous of you to even offer your source code. However, you are (or at least were) trying to push for this to become a standard. As such, considering how to be more general and apply to a wider class of problems should be on your agenda. If it's not, then you shouldn't be creating the standard. If you don't care about numarray becoming standard, I would like to try my hand at submitting the slightly modified version of your design. I won't be compatible with your stuff, but hopefully others will follow suit. Functionality Data Types I have needs for other types of data that you probably have little use for. If I can't coerce you to make a minor change in specification, I really don't think I could coerce you to support brand new data types (complex ints is the one I've beaten to death, because I could use that one in the short term). What happens when someone at my company wants quaternions? I suspect that you won't have direct support for those. 
I know that numarray is supposed to be extensible, but the following raises an exception:

    from numarray import *
    class QuaternionType(NumericType):
        def __init__(self):
            NumericType.__init__(self, "Quaternion", 4*8, 0)
    Quaternion = QuaternionType() # BOOM!

    q = array(shape=(10, 10), type=Quaternion)

Maybe I'm just doing something wrong, but it looks like your code wants "Quaternion" to be in your (private?) typeConverters dictionary.

Ok, try two:

    from numarray import *
    q = NDArray(shape=(10, 10), itemsize=4*8)

    if q[5][5] is None:
        print "No boom, but what can I do with it?"

Maybe this is just a documentation problem. On the other hand, I can do the following pretty readily:

    import array
    class Quat2D:
        def __init__(self, *shape):
            assert len(shape) == 2
            # four doubles per quaternion
            self._buffer = array.array('d', [0])*shape[0]*shape[1]*4
            # strides are counted in doubles; a row holds shape[1] quaternions
            self._shape, self._stride = tuple(shape), (4*shape[1], 4)
            self._itemsize = 4*8

        def __getitem__(self, sub):
            assert isinstance(sub, tuple) and len(sub) == 2
            offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
            return tuple([self._buffer[offset + i] for i in range(4)])

        def __setitem__(self, sub, val):
            assert isinstance(sub, tuple) and len(sub) == 2
            offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
            for i in range(4): self._buffer[offset + i] = val[i]
            return val

    q = Quat2D(10, 10)
    q[5, 5] = (1, 2, 3, 4)
    print q[5, 5]

This isn't very general, but it is short, and it makes a good example. If they get half of their data from calculations using Numarray, and half from whatever I provide them, and then try to mix the results in an extension module that has to know about separate implementations, life is more complicated than it should be.

Operations

I'm going to have to write my own C extension modules for some high performance operations. All I need to get this done is a void* pointer, the shape, stride, itemsize, itemtype, and maybe some other things to get off and running.
You have a growing framework, and you have already indicated that you think of your hidden variables as private. I don't think I or my users should have to understand the whole UFunc framework and API just to create an extension that manipulates a pointer to an array of doubles.

Arrays are simpler than UFuncs. I consider them to be pretty separable parts of your design. If you keep it this way, and it becomes the standard, it seems that I and everyone else will have to understand both parts in order to create an extension module.

Flexibility

Numarray is going to make a choice of how to implement slicing. My guess is that it will be one of "copy contiguous", "copy on write", "copy by reference". I don't know what the correct choice is, but I know that someone else will need something different based on context. Things like UFuncs and other extension modules that do fast C level calculations typically don't need to concern themselves with slicing behaviour.

Design

Your implementation would be similar to having the 'pickle' module require you to derive from a 'Pickleable' base class - instead of simply providing __getstate__ and __setstate__ methods. It's an artificial constraint, and those are usually bad.

>
> All good in principle, but I haven't yet seen a reason to change
> numarray. As far as I can tell, it provides all you need exactly
> as it is. If you could give an example that demonstrated otherwise...
>

Maybe you're right. I suspect you as the author will come up with the quick example that shows how to implement my bizarre quaternion example above. I'm not sure if this makes either of us right or wrong, but if you're not buying any of this, then it's probably time for me to chalk this up to a difference in opinion and move on.

Truthfully this is taking me pretty far from my original tack. Originally I had simply hoped to hack a couple of things into arraymodule.c, and here I am now trying to get a simpler standard in place.
I'll try one last time to convince you with the following two statements:

    - Changing such that you only require the interface is a subtle, but
      noticeable, improvement to your otherwise very good design.

    - It's not a difficult change.

If that doesn't compel you, at least I can walk away knowing I tried. For the volumes I've written, this will probably be my last pesky message if you really don't want to budge on this issue.

>
> To tell you the truth, I'm not crazy about how the struct module
> handles types or attributes. It's generally far too cryptic for
> my tastes. Other than providing backward compatibility, we aren't
> interested in it emulating struct.
>

I consider it a lot like regular expressions. I cringe when I see someone else's, but I don't have much difficulty putting them together.

The alternative of coming up with a different specifier for records/structs is probably a mistake now that the struct module already has its (terse) format specification. Once that is taken into consideration, following all the leads of the struct module makes sense to me.

>
> I could well misunderstand, but I thought that if you mmap a file
> in unix in write mode, you do not use up the virtual memory as
> limited by the physical memory and the paging file. Your only
> limit becomes the virtual address space available to the processor.
>

Regarding efficiency, it depends on the implementations, which vary greatly, and there are other subtleties. I've already written a book above, so I won't tire you with details. I will say that closing a large memory mapped file on top of NFS can be dreadful. It probably takes the same amount of total time or less, but from an interactive analysis point of view it's pretty unpleasant on Tru64 at least.

Also, just mmaping the whole file puts all of the memory use at the discretion of the OS. I might have a gig or two to work with, but if mmap takes them all, other threads will have to contend for memory.
The system (application) as a whole might very well run better if I can retain some control over this. I'm not married to the windowing suggestion. I think it's something to consider, but it might not be a common enough case to try and make a standard mechanism for. If there isn't a way to do it without a kluge, then I'll drop it. Likewise if a simple strategy can't meet anyone's real needs. > > If the 32 bit address is your problem, you are far, far better off > using a 64-bit processor and operating system than trying to kludge up > a windowing memory mechanism. > We don't always get to specify what platform we want to run on. Our customer has other needs, and sometimes hardware support for exotic devices dictate what we'll be using. Frequently it is on 64 bit Alphas, but sometimes the requirement is x86 Linux, or 32 bit Solaris. Finally, our most frustrating piece of legacy software was written in Fortran assuming you could stuff a pointer into an INT*4 and now requires the -taso flag to the compiler for all new code (which turns a sexy 64 bit Alpha into a 32 bit kluge...). Also, much of our data comes on tapes. It's not easy to memory map those. > > I could see a way of doing it for > ufuncs, but the numeric world (and I would think the DSP world > as well) needs far more than element-by-element array functionality. > providing a usable C-api for that kind of memory model would be > a nightmare. But I'm not sure if this or the page file is your > limitation. > I would suggest that any extension module which is not interested in this feature simply raise a NotImplemented exception of some sort. UFuncs could fall into this camp without any criticism from me. All it would have to do is check if the 'window_get' attribute is a callable, and punt an exception. My proposal wasn't necessarily to map in a single element at a time. 
If the C extension was willing to work these beasts at all, it would check to see if the offset it wanted was between window_min and window_max. If it wasn't, then it would call ob.window_get(offset), and the Python object could update window_min and window_max however it sees fit. For instance by remapping 10 or 20 megabytes on both sides.

This particular implementation would allow us to do correlations of a small (mega sample) chunk of data against a HUGE (giga sample) file. This might be the wrong interface, and I'm willing to listen to a better suggestion. It might also be too special of a need to detract from a simpler overall design.

Also, there are other uses for things like this. It could possibly be used to implement sparse arrays. It's probably not the best implementation of that, but it could hide a dict of set data points, and present it to an extension module as a complete array.

Cheers,
-Scott Gilbert

From perry at stsci.edu Sat Apr 13 18:43:02 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Sat Apr 13 18:43:02 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020413100823.45837.qmail@web12907.mail.yahoo.com>
Message-ID:

> Ok, here's my list:
>
> Philosophical
>
>   You have a proposal in to the Python guys to make Numarray into the
>   standard _implementation_. I think standards like this should specify
>   an _interface_, not an implementation.
>

Sure (though there is often more to a standard than just an interface, but certainly an implementation is generally not the standard). I'm not sure why you think we imply the implementation is the standard. We are waiting to rewrite the PEP when we are closer to having the implementation ready, but we've been very open about the design and have asked for input on it for a long time now.
> Simplicity
>
> I can give my users a single XArray.py file, and they can be off and
> running with something that works right then and there, and it could in
> many ways be compatible with Numarray (with some slight modifications)
> when they decide they want the extra functionality of extension modules
> that you or anyone else who follows your standard provides. But they
> don't have to compile anything until they really need to.
>
> Your implementation leaves me with all or nothing. I'll have to build
> and use numarray, or I've got an in house only solution.
>
Hard to comment on this.

> Expediency
>
> I want to see a usable standard arise quickly. If you maintain the
> stance that we should all use the Numarray implementation, instead of
> just defining a good Numarray interface, everyone has to wait for you
> to finish things enough to get them accepted by the Python group. Your
> implementation is complicated, and I suspect they will have many things
> that they will want you to change before they accept it into their
> baseline. (If you think my list of suggestions is annoying, wait until
> you see theirs!)
>
I have the strong sense you misunderstand how the process works. Guido will be driven in large part by the acceptance or non-acceptance of the Numeric community. If they don't buy into it, it won't be part of the standard. If it won't be used by many, it won't be part of the standard. Yes, he will review the design and interface to see if there should be a long term commitment by the Python maintainers to have it in the standard library. We have sent him the design documents, and we do keep him informed. He has given us feedback about it. But for the most part, the judgement is going to be by the Numeric community.

> If a simple interface protocol is presented, and a simple pure Python
> module that implements it. The PEP acceptance process might move along
> quickly, but you could take your time with implementing your code.
> > Pragmatic > > You guys aren't finished yet, and I need to give my users an array > module ASAP. As such a new project, there are likely to be many bugs > floating around in there. I think that when you are done, you will > probably have a very good library. Moreover, I'm grateful that you are > making it open source. That's very generous of you, and the fact that > you are tolerating this discussion is definitely appreciated. > > Still, I can't put off my projects, and I can't task you to > work faster. > > > However, I do think we could agree in a very short term that your design > for the interface is a good one. I also think that we (or just > me if you > like) could make a much smaller PEP that would be more readily accepted. > Then everyone in this community could proceed at their own pace > - knowing > that if we followed the simple standard we would have inter operability > with each other. > I think we still don't understand what you need yet. More elaboration on that later. > Social > > Normally I wouldn't expect you to care about any of my special issues. > You have your own problems to solve. As I said above, it's generous of > you to even offer your source code. > > However, you are (or at least were) trying to push for this to become a > standard. As such, considering how to be more general and apply to a > wider class of problems should be on your agenda. If it's not, then you > shouldn't be creating the standard. > Pleeease. Just because a library developer doesn't happen to meet your needs doesn't mean it can't be part of the standard library. There are plenty of modules in the standard library that could have been made more general in some way, but there they are. The criteria is whether it solves problems for a large community of users, not that it is infinitely extensible or so on. Software development is full of trade-offs and that includes limits to generalization. Sure we can discuss whether things could be made more general or not. 
But because you want it more general doesn't mean we just say "Sure, you define everything!"

> If you don't care about numarray becoming standard, I would like to try
> my hand at submitting the slightly modified version of your design. I
> won't be compatible with your stuff, but hopefully others will follow
> suit.
>
You are free to propose your own standard at any time. No one will stop you from doing so.

> Functionality
>
> Data Types
>
> I have needs for other types of data that you probably have little use
> for. If I can't coerce you to make a minor change in specification, I
> really don't think I could coerce you to support brand new data types
> (complex ints is the one I've beaten to death, because I could use that
>
You are right on complex ints (we won't consider them). One could take numarray and add them if one wanted and have a more extended version. But we won't do it, and we wouldn't support it as being in what we maintain. It's one of those trade-offs.

> one in the short term). What happens when someone at my company wants
> quaternions? I suspect that you won't have direct support for those.
> I know that numarray is supposed to be extensible, but the following
> raises an exception:
>
>     from numarray import *
>
>     class QuaternionType(NumericType):
>         def __init__(self):
>             NumericType.__init__(self, "Quaternion", 4*8, 0)
>
>     Quaternion = QuaternionType() # BOOM!
>
>     q = array(shape=(10, 10), type=Quaternion)
>
> Maybe I'm just doing something wrong, but it looks like your code
> wants "Quaternion" to be in your (private?) typeConverters dictionary.
>
Yep, and there's a good reason for that. Just spend a few minutes thinking about the role types play with array packages and how they have traditionally been implemented. Generally speaking, it is presumed that any two numeric types may be used in a binary operator. So you, Scott, define your special type, Quaternions.
You will need to provide the module all the machinery for knowing what to do with all the other numeric types available. You may not care, but it is a requirement that numarray (and Numeric) know what to do. If that doesn't fit in with your needs, then you shouldn't be trying to use it.

The problem is worse than that. You supply a Quaternion type extension to numarray, and Bob supplies a super long int type (64 bytes!) also. Both of you have gone to the trouble of giving numarray the means of handling all other default numarray types. But you don't know how to handle each other. How do you solve that problem? I don't know. If you do, let us know. Given the requirements, adding new numeric types is not going to allow independent extensions to work with each other. That's fairly limiting, but that's the price that is paid for the feature.

> Ok, try two:
>
>     from numarray import *
>
>     q = NDArray(shape=(10, 10), itemsize=4*8)
>
>     if q[5][5] is None:
>         print("No boom, but what can I do with it?")
>
> Maybe this is just a documentation problem. On the other hand, I can
> do the following pretty readily:
>
>     import array
>
>     class Quat2D:
>         def __init__(self, *shape):
>             assert len(shape) == 2
>             self._buffer = array.array('d', [0])*shape[0]*shape[1]*4
>             # strides in doubles: a row holds shape[1] quaternions of 4 doubles
>             self._shape, self._stride = tuple(shape), (4*shape[1], 4)
>             self._itemsize = 4*8
>
>         def __getitem__(self, sub):
>             assert isinstance(sub, tuple) and len(sub) == 2
>             offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
>             return tuple([self._buffer[offset + i] for i in range(4)])
>
>         def __setitem__(self, sub, val):
>             assert isinstance(sub, tuple) and len(sub) == 2
>             offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
>             for i in range(4): self._buffer[offset + i] = val[i]
>             return val
>
>     q = Quat2D(10, 10)
>     q[5, 5] = (1, 2, 3, 4)
>     print(q[5, 5])
>
> This isn't very general, but it is short, and it makes a good example.
>
I'm not sure what it proves.
If all you need is an array to store some kind of type, be able to index and slice it, and not provide numeric operations, by all means use the existing array module; it does that fine. It's more work to subclass NDArray, but it can do it too, and gives you more capabilities (you won't be able to use index arrays or broadcasting in the array module, for example). The extra functionality comes at some price. Sure, it isn't as simple to extend. It's your choice if it is worth it or not. If you want to add your large quaternion array efficiently, then the array module is worthless. Your example shows nothing about what your real needs for the object are.

> If they get half of their data from calculations using Numarray, and
> half from whatever I provide them, and then try to mix the results in
> an extension module that has to know about separate implementations,
> life is more complicated than it should be.
>
It's how you intend to 'mix' these that I have no clue about.

> Operations
>
> I'm going to have to write my own C extension modules for some high
> performance operations. All I need to get this done is a void* pointer,
> the shape, stride, itemsize, itemtype, and maybe some other things to
> get off and running. You have a growing framework, and you have already
> indicated that you think of your hidden variables as private. I don't
> think I or my users should have to understand the whole UFunc framework
> and API just to create an extension that manipulates a pointer to an
> array of doubles.
>
Sigh. No one said you had to understand the ufunc framework to do so. We are working on a C API that just gives you a simple pointer (it's actually available now, but we aren't going to tout it until we have better documentation).

> Arrays are simpler than UFuncs. I consider them to be pretty separable
> parts of your design.
> If you keep it this way, and it becomes the
> standard, it seems that I and everyone else will have to understand
> both parts in order to create an extension module.
>
Wrong.

> Flexibility
>
> Numarray is going to make a choice of how to implement slicing. My guess
> is that it will be one of "copy contiguous", "copy on write", "copy by
> reference". I don't know what the correct choice is, but I know that
> someone else will need something different based on context. Things like
> UFuncs and other extension modules that do fast C level calculations
> typically don't need to concern themselves with slicing behaviour.
>
And they don't.

> Design
>
> Your implementation would be similar to having the 'pickle' module
> require you to derive from a 'Pickleable' base class - instead of simply
> providing __getstate__ and __setstate__ methods.
>
> It's an artificial constraint, and those are usually bad.
>
You say. You are quite welcome to do your own implementation that doesn't have this 'artificial' constraint. After all your text I *still* don't understand how you intend to use the 'interface' of the private attributes. You haven't provided any example (let alone a compelling one) of why we should accept any object that provides those attributes. Shouldn't the object also provide all the public methods? Shouldn't it also provide indexing and so forth? All in all you are talking about checking quite a few attributes to make sure the object has the interface. And even if it does, *why* in the world would we presume that the C functions used by numarray would work properly with the object you provide? I really don't have a clue as to what you are getting at here, and without some real concrete example illustrating this point, I don't think there is any point to continuing this discussion.

> >
> > All good in principle, but I haven't yet seen a reason to change
> > numarray. As far as I can tell, it provides all you need exactly
> > as it is.
> > If you could give an example that demonstrated otherwise...
> >
>
> Maybe you're right. I suspect you as the author will come up with the
> quick example that shows how to implement my bizarre quaternion example
> above. I'm not sure if this makes either of us right or wrong, but if
> you're not buying any of this, then it's probably time for me to chalk
> this up to a difference in opinion and move on.
>
> Truthfully this is taking me pretty far from my original tack. Originally
> I had simply hoped to hack a couple of things into arraymodule.c, and here
> I am now trying to get a simpler standard in place. I'll try one last time
> to convince you with the following two statements:
>
> - Changing such that you only require the interface is a subtle,
>   but noticeable, improvement to your otherwise very good design.
>
> - It's not a difficult change.
>
>
> If that doesn't compel you, at least I can walk away knowing I tried. For
> the volumes I've written, this will probably be my last pesky message if
> you really don't want to budge on this issue.
>
We're not going to budge until you show us what the hell you are talking about.

>
> The alternative of coming up with a different specifier for records/structs
> is probably a mistake now that the struct module already has its (terse)
> format specification. Once that is taken into consideration, following all
> the leads of the struct module makes sense to me.
>
Again, you are free to do your own, or fork our numarray and do it the way you want. Or do your own from scratch. Or whatever.

> [...]
> Also, just mmaping the whole file puts all of the memory use at the
> discretion of the OS. I might have a gig or two to work with, but if mmap
> takes them all, other threads will have to contend for memory. The system
> (application) as a whole might very well run better if I can retain some
> control over this.
>
>
> I'm not married to the windowing suggestion.
I think it's something to > consider, but it might not be a common enough case to try and make a > standard mechanism for. If there isn't a way to do it without a kluge, > then I'll drop it. Likewise if a simple strategy can't meet anyone's real > needs. > You can forget our doing it. It's out of the question for us. > > > > If the 32 bit address is your problem, you are far, far better off > > using a 64-bit processor and operating system than trying to kludge up > > a windowing memory mechanism. > > > > We don't always get to specify what platform we want to run on. Our > customer has other needs, and sometimes hardware support for > exotic devices > dictate what we'll be using. Frequently it is on 64 bit Alphas, but > sometimes the requirement is x86 Linux, or 32 bit Solaris. > > Finally, our most frustrating piece of legacy software was written in > Fortran assuming you could stuff a pointer into an INT*4 and now requires > the -taso flag to the compiler for all new code (which turns a sexy 64 bit > Alpha into a 32 bit kluge...). > You may have customers with unreasonable demands. We don't have to let them cause an incredible complication in the underlying machinery. (And we won't). And we won't make it work on Windows 3.1 either. We have to draw the line somewhere. Your customers will pay dearly (and you will benefit :-). > Also, much of our data comes on tapes. It's not easy to memory map those. > Your point being? > > > [...] This doesn't seem to be going anywhere. If you can give us a better idea of how your interface needs would be used, at least we could respond to the specific issues. But we don't understand and although we are considering some changes, I'm not going to fold in your requests until we do understand. You may not be happy with the progress we are making either. Sorry, I can't help that. If you need something sooner, you'll need to do something else. Come up with your own system and try to get it into Python. 
Take numarray and do it the way you think it ought to be done and at the rate you think it should be done. You're welcome to. Take the array module and use that as a basis. We'd like numarray to be part of the standard. We'd like it to be the standard package in the Numeric community. But if neither happened, we'd still be working on it. We need it for our own work. Numeric doesn't give us the capabilities that we need. We are using it for our software development and it is being used to reduce HST data now. We are continuing on this regardless. Perry From paul at pfdubois.com Sat Apr 13 19:35:02 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Sat Apr 13 19:35:02 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: Message-ID: <000001c1e35c$d85a2f90$0a01a8c0@NICKLEBY> I haven't been following this discussion (I have a product release on Monday). But I am getting a lot of mail stacking up for numpy-developers which will not go through unless you are one of the registered developers mailing from your registered mail account. All others, please do not use numpy-developers. This is a private channel for the official developers only. I gather from my brief reading that someone is looking for a standard to use now. That standard is Numeric. If you go with that now then when the time comes to switch to Numarray, you'll be in the same boat as the whole community and therefore liable to be able to profit from any conversion tools required. You can reduce your problems to a minimum by sticking with the Python interface where possible. If you have some special need that Numeric is not meeting please realize that what exists is a consensus product after a long evolution and it is not likely to change much to meet your particular needs. There are some areas where what is right for one set of people is wrong for the others. 
From xscottg at yahoo.com  Sun Apr 14 04:20:03 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Sun Apr 14 04:20:03 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To:
Message-ID: <20020414111911.2977.qmail@web12901.mail.yahoo.com>

Perry,

I've been trying to be persuasive, but I think all I've managed to do is to be verbose and annoy you. Please accept my apologies. I really am sorry this is going as poorly as it is.

I'm doing a lousy job of getting my point across, and I'd like to turn around the tone this has taken. Email always comes off as more antagonistic than intended.

Finally, my appeal to the fact that you are proposing a standard was heavy handed. I guess I was trying to use that to force you to consider my position. It clearly backfired...

I'll try to be more to the point.

Here's what I'm proposing, and it's only a suggestion.

*** I think the requirements for being a general purpose "NDArray" can be specified with only the following attributes:

    __array_buffer__   - as buffer object
    __array_shape__    - as tuple of long
    __array_itemsize__ - as int

  Optionally
    __array_stride__   - as tuple of long (get from shape if None)
    __array_offset__   - as int (would default to 0 if not present)

Then anyone who implemented these could work with the same C API for getting the pointer to memory, shape array, stride array, and item size.

The set of operations on a pure "NDArray" is probably pretty minimal (reshape, transpose/rotate, index arrays?). So in order to create a full featured "NumArray", a few more attributes are required:

    __array_itemtype__ - as string?

  Optionally
    __array_endian__   - as 1 char string? (default to the native endian)

This brings the total up to 4 required attributes, and 3 optional ones for a very general purpose array data structure. (I can think of other optional ones, but skip that for now.)

>
> All in all you are talking about checking quite a few attributes
> to make sure the object has the interface.
> And even if it does,
> *why* in the world would we presume that the C functions used by
> numarray would work properly with the object you provide.
>

Because truthfully arrays are little more than a pointer to memory.

That's like asking "why in the world would we presume memcpy() or qsort() would know what to do with your memory?"

>
> You haven't provided any example (let
> alone a compelling one) of why we should accept any object that
> provides those attributes.
>

Well, the UFuncs certainly should reject any object that they don't know how to handle. I'm currently only addressing what it takes to be an NDArray/NumArray object. OTOH, if I can present something to the UFuncs that looks like a known array type, why wouldn't UFuncs want to work with it?

Ok, so what does this buy you?

Well, it probably doesn't buy you personally very much. Your needs are already being met by the current implementation.

Ok, so what does this cost you?

A few translations:

    _data     ->  __array_buffer__
    _shape    ->  __array_shape__
    _strides  ->  __array_stride__
    _itemsize ->  __array_itemsize__
    _offset   ->  __array_offset__
    _type     ->  __array_type__
    _byteswap ->  __array_endian__

This isn't a style criticism. I'm not just asking you to change your names, I'm asking to promote the names to be a "standard interface" much like these things are in many places in Python.

Also requires some small changes to getNDInfo() and getNumInfo() so that they can calculate the derived fields (contiguous, aligned, etc...).

Also requires some changes to your scripts so that it checks for the interface rather than the inheritance.

What are the benefits to anyone else?

- Describes how anyone could implement something that looks and acts like NDArrays or NumArrays. There are probably a lot of reasons to want to do this. I have some reasons that I don't think you value too much. I think others would have reasons which I can't imagine too.
- Allows one standard API for getting at the basics of NDArrays/NumArrays

- Allows anyone to easily implement other data types for NumArrays. The typecode won't match any of your builtin types, but maybe other third parties could agree on other typecodes for their crazy needs and share modules.

- Allows me personally to distribute a separate (and simpler) implementation of NDArrays/NumArrays right now and have the same data objects work with yours when you're all done. If I give the UFuncs a pointer to memory, and the attributes above, why shouldn't it work correctly?

>
> We're not going to budge until you show us what the hell you are talking
> about.
>

Am I doing any better? I am trying.

>
> You are right on complex ints (that we won't consider them). One
> could take numarray and add them if one wanted and have a more
> extended version. But we won't do it, and we wouldn't support it as
> being in what we maintain. It's one of those trade offs.
>

Is there a way, today, without modifying numarray, for me to use numarray as a holder for these esoteric data types? Is that way difficult? Could it be easier?

I'm not asking numarray to know about my types in its core baseline. I'm wondering what it takes to implement new types at all.

>
> Your example shows nothing about what your
> real needs for the object are.
>

My real needs are all over the place. Some of which you've shown me are solvable with the current implementation of numarray. Some of which you've not addressed or said you won't address.

To be explicit:

Here are (at least most of) my _needs_ for array objects:

- support a wide variety of data types (user defined)
- have efficient storage
- support the pickle interface for serialization
- allow alternate sources of underlying memory
- have an easy interface for accessing the pieces necessary to create C extensions (buffer, shape, stride, ...)
- completed and reliable in the near term

Here are (at least some of) my _wants_ for array objects:

- cooperate on some level with other standard array modules (once the standard is set)
- have same API for accessing the pieces (buffer, shape, stride, ...) as all standard array modules will.
- implementation in pure Python so that building extension modules is not required until the fast operations present in those modules are required.
- implemented from a standard that is as good as it can be

Here are (at least some of) my _whims_ for array objects:

- has "windowing" functionality to work efficiently with really large files (on any modern platform).
- alternate implementations for things such as "slicing behaviour" (copy on write, reference).

Loosely following your design, I've already written a module that meets my "needs", I was hoping that we could cooperate towards filling in some of my "wants" (cooperating array modules), and I've brought up my "whims" because I thought they were interesting possibilities for discussion.

I was going to respond to some of your other remarks, but I've probably wasted enough of your time. If you don't respond to this message, I'll take that as a sign that we just aren't going to see eye to eye on any of this, and I won't bother you any more. (I'll be half surprised if you even get this message. From the tone of your last one, I wouldn't be shocked to find out you've already added me to your killfile. :-)

No hard feelings,
    -Scott Gilbert

From perry at stsci.edu  Sun Apr 14 11:55:02 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Sun Apr 14 11:55:02 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020414111911.2977.qmail@web12901.mail.yahoo.com>
Message-ID:

Hi Scott,

Just to be to the point, I'm still missing what I've been asking for, to wit a concrete example that illustrates your point. I'll try to address a few of your points that appear to try to answer that and clarify what I mean by concrete example.

>
> Here's what I'm proposing, and it's only a suggestion.
>
>
> *** I think the requirements for being a general purpose "NDArray"
> can be specified with only the following attributes:
>
>     __array_buffer__   - as buffer object
>     __array_shape__    - as tuple of long
>     __array_itemsize__ - as int
>
>   Optionally
>     __array_stride__   - as tuple of long (get from shape if None)
>     __array_offset__   - as int (would default to 0 if not present)
>
> Then anyone who implemented these could work with the same C API for
> getting the pointer to memory, shape array, stride array, and item size.
>
Then you are talking about standardizing a C-API. But I'm still confused. If you write a class that implements these attributes, is it your C-API that uses them, or do you mean our C-API uses them? If you have your own C-API, then the attributes are not relevant as an interface. If you intend to use our C-API to access your objects, then they are. But if you want to use our C-API, that still doesn't explain why the alternatives aren't acceptable (namely subclassing).

>
> Because truthfully arrays are little more than a pointer to memory.
>
> That's like asking "why in the world would we presume memcpy() or
> qsort() would know what to do with your memory?"
>
Then you misunderstand Numarray. Numarrays are far more than just a pointer to memory. You can get a pointer to memory from them, but they entail much more than that.
Numarray presumes that certain things are possible with NumArray objects (like standard math operations). If you want something that doesn't make such an assumption, you should be using NDArray instead. NDArray makes no presumptions about the contents of the memory other than they are arranged in memory in array fashion.

> >
> > You haven't provided any example (let
> > alone a compelling one) of why we should accept any object that
> > provides those attributes.
> >
>
> Well, the UFuncs certainly should reject any object that they don't
> know how to handle. I'm currently only addressing what it takes to be
> an NDArray/NumArray object. OTOH, if I can present something to the
> UFuncs that looks like a known array type, why wouldn't UFuncs
> want to work with it?
>
If you are presenting numarray with a type it already knows about, why aren't you subclassing it? If you present numarray an object with a type it doesn't know about, then that is pointless. Types and numarray are inextricably intertwined, and shall remain so.

>
> - Allows me personally to distribute a separate (and simpler)
>   implementation of NDArrays/NumArrays right now and have the same data
>   objects work with yours when you're all done. If I give the UFuncs a
>   pointer to memory, and the attributes above, why shouldn't it work
>   correctly?
>
>
> Am I doing any better? I am trying.
>
Not really. More on that later.

>
>
> Is there a way, today, without modifying numarray, for me to use
> numarray as a holder for these esoteric data types? Is that way
> difficult? Could it be easier?
>
No to the first; it isn't intended to serve that purpose. If you just need something to blindly hold values without doing anything with them, use NDArray (and you can add whatever customization you wish regarding what methods or operators are available).

> I'm not asking numarray to know about my types in its core baseline. I'm
> wondering what it takes to implement new types at all.
>
It's possible to extend (but not in any way that makes it automatically usable with anyone else's extension). Currently that sort of extension would not be hard for someone that knows how things work. We haven't documented how to do so, and won't for a while. It's not a high priority for us now.

**********************************************************

What I want to see is a specific example. I'm not going to pay much attention to generalities because I'm still unclear about how you intend to do what you say you will do. Perhaps I'm slow, but I still don't get it.

On the one hand, you ask us to have numarray accept objects with the same 'interface'. Well, if they are not of an existing supported type, that's pointless since numarray won't work properly with them. If it is an existing type, you haven't explained why you can't use numarray directly (or alternatively, create a numarray object that uses the same buffer yours does).

I still haven't seen a specific example that illustrates why you cannot use subclassing or an instance of a numarray object instead. If you need to add a new type, that's possible, but you'll have to spend some time figuring out how to do that for your own extended version. If you just want to use arrays to hold values (of new types), then use NDArray. It doesn't care about types.

But please give a specific case. E.g., "I want complex ints and I will develop a class that will use this to do the following things [it doesn't have to be exhaustive or complete, but include just enough to illustrate the point]. If the attributes were standardized then I would do this and that, and use it with your stuff like this showing you the code (and the behavior I expect)."

Given this I can either show you an alternate solution or I can realize why you are right and we can discuss where to go from there. Otherwise you are wasting your time.
Perry From xscottg at yahoo.com Sun Apr 14 21:10:12 2002 From: xscottg at yahoo.com (Scott Gilbert) Date: Sun Apr 14 21:10:12 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: Message-ID: <20020415040923.5808.qmail@web12903.mail.yahoo.com> --- Perry Greenfield wrote: *** Just skim through my first few responses. About half way through writing this letter, a few things hit me. I still want to propose some changes, but I don't think you'll find them as intrusive... > > > > > Then anyone who implemented these could work with the same C API for > > getting the pointer to memory, shape array, stride array, and item > > size. > > > Then you are talking about standardizing a C-API. But I'm still > confused. If you write a class that implements these attributes, > is it your C-API that uses them, or do you mean our C-API uses > them? > I'm not really talking about standardizing a C-API. I'm talking about standardizing what that C-API would have to do. You would have your C-API as part of numarray proper. And, for the short term, I would have my own C-API as part of what I need to get done. Both C-API's would use the same attributes. Why do I want my own C-API today? Because numarray isn't done yet, and I can't create arrays of the types I need. I'll need a C-API to get at my types. It would be great if the same C-API could get at yours too. > > If you have your own C-API, then the attributes are not > relevant as an interface. If you intend to use our C-API to access > your objects, then they are. > Either C-API could access anything that looks like an NDArray. > > > > > Because truthfully arrays are little more than a pointer to memory. > > > > That's like asking "why in the world would we presume memcpy() or > > qsort() would know what to do with your memory?" > > > > Then you misunderstand Numarray. Numarrays are far more than just > a pointer to memory. You can get a pointer to memory from them, > but they entail much more than that. 
Numarray presumes that certain > things are possible with NumArray objects (like standard math > operations). If you want something that doesn't make such an > assumption, you should be using NDArray instead. NDArray makes > no presumptions about the contents of the memory other than > they are arranged in memory in array fashion. > I think I understand where you're coming from now. (BTW, I think some of our confusion comes from when I'm talking about "Numarray" or "numarray" the package versus "NumArray" and "NDArray" the classes.) *** Ok, I think there is light at the end of this tunnel... I guess what I've been arguing for all along is something a lot like an NDArray where I can specify the typecode (and possibly other things like 'endian' etc...), and that only NDArrays have a minimal set of standardized attributes. With this I can create extensions that will work with anything that looks like an NDArray. Your NDArrays from the numarray package, and my NDArrays of crazy types. I'm still left in the position of having to upcast an NDArray to a full blown NumArray if I ever want to use my NDArrays in a routine meant solely for NumArrays. However this conversion isn't difficult, and I think can do that when needed. Important Question: If an NDArray had a typecode (and it was a known string), is it possible to promote it to one of the standard NumArray types? Lesser Question: If an NDArray had a known typecode, is it desirable for numarray routines to promote the NDArray to a NumArray in the same way that the routines promote a Python list or tuple to a NumArray on the fly? Ok, my new proposal (again, treat it like a suggestion): - Do you think it would be possible to standardize the set of attributes that it requires to be an NDArray? NDArrays are simple and unlikely to change. I think _those_ really are just pointers to memory with array accounting information. We could agree on what exactly constitutes an NDArray. 
- Could this standard set of attributes optionally include the names for the typecode, endian, (and maybe some other) attributes? That doesn't mean that your NDArrays would have to have the typecode, endian or whatever information. It just means that when any class does add a typecode, it adds it as a specially named attribute. I realize that a large part of what I want is interoperability between separate implementations of NDArrays. Anything that has (_data, _shape, _itemsize, _type) is something I could work with in an extension. Some other fields are optional (_strides, _byteoffset) because they have sensible defaults that can be calculated from above in the common case. So the only difference between what you currently have and most of what I'm proposing is that the names of NDArray attributes become standardized. > > If you are presenting numarray with a type it already knows about, > why aren't you subclassing it? > Since I know I'll have to create types that numarray doesn't know about, I know I'm going to have to write a new array class (it's already written). It would be silly of my new array class to not implement the standard types just because numarray _does_ know about them. I now realize that I don't have to give my class to numarray directly. That didn't hit me before. I could promote/upcast it when necessary. The upcast-in and downcast-out thing will add up to extra work and messier code, but it is a workaround. > > If you present numarray an object > with a type it doesn't know about, then that is pointless. > Types and numarray are inextricably intertwined, and shall > remain so. > Understood. I don't want to ruin your NumArrays. > > ********************************************************** > > What I want to see is a specific example. I'm not going to > pay much attention to generalities because I'm still unclear > about how you intend to do what you say you will do. Perhaps > I'm slow, but I still don't get it. 
> Nope, clearly it was me that was being slow. There is still that bit about NDArrays that I'm trying to justify, so my example is below. > > (or alternatively, > create a numarray object that uses the same buffer yours does). > You're right. This hadn't occurred to me until just a little bit ago. > > E.g., "I want > complex ints and I will develop a class that will use this to > do the following things [it doesn't have to be exhaustive or > complete, but include just enough to illustrate the point]. > If the attributes were standardized then I would do this and that, > and use it with your stuff like this showing you the code > (and the behavior I expect)." > Here goes (somewhat hypothetical, but close to the boat I'm currently in): Jon is our FPGA guy who makes screaming fast core files, but our FPGAs don't do floating point. So I have to provide his driver with ComplexInt16 data. Jon and I write an extension module that calls his driver and reads data. We also write a C routine (call it "munge") that takes both ComplexInt16 data and ComplexFloat64 data. We try it out for testing, and pass in my arrays in both places. We could have used Numarray for the ComplexFloat64, but that meant we had to use two array packages, and use two C-APIs in our extension. All we needed was a pointer to an array of doubles, so we stuck with mine. Ok, that part of development is done. Now we present it to the application developers. They're happy and we're rolling. Successful application. Another group finds out about this and they want to use it. They're using numarray for a large part of their application. In fact, they're calculating the ComplexFloat64 half of the data that they want to pass to my "munge" routine using numarray, and they still need to use my ComplexInt32 data to read the FPGA. They're going to be disappointed to find out my extension can't read numarray data, and that they have to convert back and forth between the two. 
And as the list of routines grow, they have to keep track of whether it is a numarray-routine, or a scottarray-routine. It's not so bad for one simple "munge" function, but there are going to be hundreds of functions... I don't expect you to have much sympathy for my having to convert data back and forth between my array types and yours, but it is an avoidable problem. For the most part, we both agree on what parts an NDArray should have. If we could only agree what to name them, and that we'd stick to those names, that would be a large part of it for me. > > Given this I can either show you an alternate solution or > I can realize why you are right and we can discuss where > to go from there. Otherwise you are wasting your time. > Cheers, -Scott __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From jmiller at stsci.edu Mon Apr 15 11:19:09 2002 From: jmiller at stsci.edu (Todd Miller) Date: Mon Apr 15 11:19:09 2002 Subject: [Numpy-discussion] ANN: Numarray-0.3.1 and 0.3.2 Message-ID: <3CBB1955.1010800@stsci.edu> Numarray 0.3.1 and 0.3.2 --------------------------------- Numarray is an array processing package designed to efficiently manipulate large multi-dimensional arrays. Numarray is modelled after Numeric and features c-code generated from python template scripts, the capacity to operate directly on arrays in files, and improved type promotions. Numarray-0.3.1 incorporates a number of bug fixes and enhancements to the C-API, including a minimal Numeric emulation layer which makes it easy to port simple Numeric C-extensions to numarray. The emulation layer is incomplete, so not all Numeric extensions will work, but simple ones *do* with a minimal amount of effort. See Doc/numpy_compat for an example of convolution done using the emulation layer. 
New for Numarray-0.3.1 is the Numarray manual in PDF and HTML formats; other formats are available for users of the source distribution. Numarray-0.3.2 is a source only release to support Alpha/Tru64. It is essentially Numarray-0.3.1 + one portability bug fix. WHERE ----------- Numarray-0.3.1 Windows executable installers and source code tarball are here: http://sourceforge.net/project/showfiles.php?group_id=1369 Numarray is hosted by Source Forge in the same project which hosts Numeric: http://sourceforge.net/projects/numpy/ The web page for Numarray information is at: http://stsdas.stsci.edu/numarray/index.html Trackers for Numarray Bugs, Feature Requests, Support, and Patches are at the Source Forge project for NumPy at: http://sourceforge.net/tracker/?group_id=1369 REQUIREMENTS -------------------------- numarray-0.3.1 requires Python 2.0 or greater. AUTHORS, LICENSE ------------------------------ Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC Hsu, Paul Barrett, and Phil Hodge at the Space Telescope Science Institute. Thanks go to Jochen Kupper of the University of North Carolina for his work on Numarray and for porting the Numarray manual to TeX format. Numarray is made available under a BSD-style License. See LICENSE.txt in the source distribution for details. -- Todd Miller jmiller at stsci.edu From perry at stsci.edu Mon Apr 15 14:20:01 2002 From: perry at stsci.edu (Perry Greenfield) Date: Mon Apr 15 14:20:01 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: <20020415040923.5808.qmail@web12903.mail.yahoo.com> Message-ID: Hi Scott, I'm not going to respond to all points but mainly concentrate on the last section. > > > Important Question: If an NDArray had a typecode (and it was a known string), is it possible to promote it to one of the standard NumArray types? > I think we want to avoid NDArray having any type attribute (Some types have subtypes and then the issue gets really messy). 
We leave it to the subclass to address how types will be handled. > Here goes (somewhat hypothetical, but close to the boat I'm currently in): > > Jon is our FPGA guy who makes screaming fast core files, but our FPGAs > don't do floating point. So I have to provide his driver with > ComplexInt16 > data. > > Jon and I write an extension module that calls his driver and reads data. > We also write a C routine (call it "munge") that takes both ComplexInt16 > data, and ComplexFloat64 data. We try it out for testing, and pass in my > arrays in both places. We could have used Numarray for the > ComplexFloat64, > but that meant we had to use two array packages, and use two C-APIs in our > extension. All we needed was a pointer to an array of doubles, > so we stuck > with mine. > > Ok, that part of development is done. Now we present it to the > application > developers. Their happy and we're rolling. Successful application. > > Another group find out about this and they want to use it. They're using > numarray for a large part of their application. In fact, their > calculating > the ComplexFloat64 half the data that they want to pass to my "munge" > routine using numarray, and they still need to use my ComplexInt32 data to > read the FPGA. > > They're going to be disappointed to find out my extension can't read > numarray data, and that they have to convert back and forth between the > two. And as the list of routines grow, they have to keep track of whether > it is a numarray-routine, or a scottarray-routine. > > It's not so bad for one simple "munge" function, but there are going to be > hundreds of functions... > > I don't expect you to have much sympathy for my having to convert > data back > and forth between my array types and yours, but it is an > avoidable problem. > > > > For the most part, we both agree on what parts an NDArray should have. If > we could only agree what to name them, and that we'd stick to those names, > that would be a large part of it for me. 
> > I'm not sure I understand the problem in all the details I need to. I'll restate it as best as I understand it and you can tell me if I understood incorrectly. You have extension modules that get complex int data from hardware. Other processing may be done to the complex int data in that format so it doesn't make sense to convert it to a more standard format when reading it in. You have C extensions that carry out certain tasks on complex data (in either complex int format or complex floats). You have users that would like to use your routine with numarray. (I haven't seen any specific mention of the need for ufuncs on complex ints so I'll assume you just need complex int arrays as containers for C programs to use.) [If you did need to perform ufuncs on complex ints, then extending numarray locally to handle them would be one possibility, but a little involved at the moment (a little easier later when we reimplement complex), then again, maybe not, the complex stuff is currently subclassed from numarray and not that hard to adapt to ints I think, but it isn't that well done now]. I guess my initial reaction is that you should develop a front-end C-API that handles obtaining data buffers from different sources. You get to define what kinds of things it supports and to change the list of types you support, and it localizes any dependencies on our or anyone else's API to a small section of code. From what I'm hearing, you don't need it to provide much (pointer to arrays and associated information). If we are real bozos and change the interface, it doesn't hurt you much (not that we intend to be bozos or change the C-API willy nilly :-). To elaborate, you define your equivalent of our getNumInfo routine. I don't think I've seen anything that requires explicit dependencies on Python attributes. Sure, you could use the same attribute names and use Python calls to get those just as our getNumInfo routine does, but I think that is bad practice. 
You may find some other representation for arrays out there that doesn't fit this model and you may want to work with those also and you won't be able to get them to adopt our scheme. You say that you don't want your users to have to convert between the two data representations. If they are using your C extensions that is understandable, and avoidable since you've written your programs to deal with the various types. On the other hand, unless you extend numarray, numarray clearly cannot deal with the complex ints so conversion is necessary. But understandably, you would like to eliminate the need for explicit conversions. I think there is an easy way of dealing with this. We haven't implemented this capability yet but we've been talking about having numarray check input values to see if they have a method "tonumarray" [not that we would choose that particular method name, I'm just illustrating the point]. If that method did exist, it would be called to create a numarray from the object. Thus you could add such a method to your class and when it is used in numarray ufuncs or in binary operations with numarray objects, your complex ints are automatically converted to numarray objects (presumably a complex float of some precision). Adding this capability to numarray should be pretty easy. True, the solution that I proposed doesn't protect you from making any changes ever. But we believe we are at a stage in the project where it is dangerous to lock ourselves into lower level details such as the internal description of the array. We still have things to implement and that may cause us to realize that some changes are needed. Our C-API stuff is relatively new. It may see changes in the near future, but likely not many related to what you need. And we intend to shield the C-API from changes in the Python attributes. We could change the name or contents of _byteswap and it would not change anything in the C-API. 
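A minimal sketch of the conversion hook suggested earlier in this message, written in plain Python so that it runs without numarray. Everything here is an assumption for illustration: `tonumarray` is only the placeholder method name used above, `ComplexIntArray`, `as_array` and `add` are hypothetical names, and a plain list stands in for a numarray object. numarray does not implement this.

```python
class ComplexIntArray:
    """Hypothetical user-defined container of (real, imag) integer pairs."""

    def __init__(self, pairs):
        self.pairs = [(int(re), int(im)) for re, im in pairs]

    def tonumarray(self):
        # Placeholder hook name taken from the message; returns a plain
        # list of Python complex numbers standing in for a numarray array.
        return [complex(re, im) for re, im in self.pairs]


def as_array(obj):
    """Coerce obj to the library's array type (a plain list in this sketch)."""
    hook = getattr(obj, "tonumarray", None)
    if callable(hook):
        return hook()
    if isinstance(obj, (list, tuple)):
        return list(obj)
    raise TypeError("cannot convert %r to an array" % (obj,))


def add(a, b):
    """Stand-in for a binary ufunc: converts both operands, then adds."""
    a, b = as_array(a), as_array(b)
    if len(a) != len(b):
        raise ValueError("shape mismatch")
    return [x + y for x, y in zip(a, b)]


samples = ComplexIntArray([(1, 2), (3, -4)])
result = add(samples, [1.0, 2.0])
```

The point of the design is that the library only needs to know the hook name; the foreign class decides how its data becomes a supported type.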
I see premature coupling of low level implementation details as a bad thing, not a good thing. Any changes that are made to the API require changes only to the corresponding routine in your C-API, and all your C applications are shielded from any changes (save rebuilding). If I've misunderstood your examples, please let me know. Perry From xscottg at yahoo.com Mon Apr 15 15:33:10 2002 From: xscottg at yahoo.com (Scott Gilbert) Date: Mon Apr 15 15:33:10 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: Message-ID: <20020415223223.5901.qmail@web12905.mail.yahoo.com> Hi Perry. Well, I don't think I've made any progress convincing you that standardizing what it means to be an interoperable "NDArray" would be good for me or others in the community, but I do appreciate you letting me try. I'll take your suggestion and make my C-API understand a superset of array types. I'll wait to see how the tonumarray() thing pans out. That might meet all of my practical concerns even if I don't think it is as elegant a solution as defining a strong interface. I'll just respond to the one point below. If I had to sum up my argument for why I think separate array implementations could (should) be compatible, it is buried in the answer to this question. > > > > > Important Question: If an NDArray had a typecode (and it was a known > > string), is it possible to promote it to one of the standard NumArray > > types? > > > > I think we want to avoid NDArray having any type attribute (Some types > have subtypes and then the issue gets really messy). We leave it > to the subclass to address how types will be handled. > Ok that's what you're currently doing, but let me rephrase the question. 
:-) Given a "leaf type" -- something that is really well specified and very similar on all modern platforms: "Int32" - not just an arbitrary "Int"; "Float64" - not just an arbitrary "Float". Do you think you could write a general purpose _function_ that converted an "NDArray" to a full featured "NumArray"? I know this would be in Python, but let's pretend it's a C++ prototype to make the types clear:
NumArray NDArray_to_NumArray(NDArray nda, String typecode, Endian end) {
    if (WellKnownLeafTypecodeString(typecode)) {
        /* fill in the blanks here */
        return NumArray(result);
    }
    throw "conversion really is impossible";
}
Cheers and thanks again for your time, -Scott Gilbert From perry at stsci.edu Tue Apr 16 08:15:09 2002 From: perry at stsci.edu (Perry Greenfield) Date: Tue Apr 16 08:15:09 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: <20020415223223.5901.qmail@web12905.mail.yahoo.com> Message-ID: > > > Important Question: If an NDArray had a typecode (and it was a known > > > string), is it possible to promote it to one of the standard NumArray > > > types? > > > > > > > I think we want to avoid NDArray having any type attribute (Some types > > have subtypes and then the issue gets really messy). We leave it > > to the subclass to address how types will be handled. > > > > Ok that's what you're currently doing, but let me rephrase the question. > > :-) > > > Given a "leaf type" -- something that is really well specified and very > similar on all modern platforms: > > "Int32" - not just an arbitrary "Int" > "Float64" - not just an arbitrary "Float") > > > Do you think you could write a general purpose _function_ that > converted an > "NDArray" to a full featured "NumArray"? 
I know this would be in Python, > but let's pretend it's a C++ prototype to make the types clear: > > > NumArray NDArray_to_NumArray(NDArray nda, String typecode, Endian end) { > if (WellKnownLeafTypecodeString(typecode)) { > > /* fill in the blanks here */ > > return NumArray(result) > } > > throw "conversion really is impossible"; > } > I'm not sure I understand exactly what you are trying to do here, but I'll try to address the question as best I can. If one had an NDArray that happened to contain a type that numarray supported, yes it is possible (in fact RecArray does that sort of thing). If your point is that in doing so one must use the private attributes such as _strides, yes that is true. These attributes are private in the sense that users of instances of these objects should never have cause to access them. But it does not mean that classes that subclass NDArray or any of its subclasses should not access them. They are not private in the sense of the class family (one reason we didn't use __strides, since that mechanism is not usable (easily anyway) for subclasses). In that sense, the attributes form an interface within the class family. Some class extenders may need to access them, sure. Perry From omar.mekkaoui at eco.u-cergy.fr Tue Apr 16 11:04:38 2002 From: omar.mekkaoui at eco.u-cergy.fr (mekkaoui) Date: Tue Apr 16 11:04:38 2002 Subject: [Numpy-discussion] Extension under windows Message-ID: <3CBC68D9.64B7C425@eco.u-cergy.fr> Dear Numerical Python Users, I have written an extension using GSL (Gnu Scientific Library) and Numerical Python. This extension works fine under Linux and I would like to do the same under Windows. For that I use Cygwin. 
When I would create the module $ gcc -shared Example.o -o Example.pyd I receive this message : Example.o<.text+0x58>:Example.c: undefined reference to 'PyArg_ParseTuple' Example.o<.text+0x15e>:Example.c: undefined reference to 'Py_BuildValue' Example.o<.text+0x1b1>:Example.c: undefined reference to 'Py_InitModule4' Example.o<.text+0x1c1>:Example.c: undefined reference to 'PyImport_ImportModule' Example.o<.text+0x1db>:Example.c: undefined reference to 'PyModule_GetDict' Example.o<.text+0x1f4>:Example.c: undefined reference to 'PyDict_GetItemString' Example.o<.text+0x206>:Example.c: undefined reference to 'PyCObject_Type' Example.o<.text+0x214>:Example.c: undefined reference to 'PyCObject_AsVoidPtr' Perhaps this command is wrong. Perhaps, anyone could explain or show me a document which explain the procedure clearly ? Thanks in advance for your help Omar From xscottg at yahoo.com Tue Apr 16 16:38:02 2002 From: xscottg at yahoo.com (Scott Gilbert) Date: Tue Apr 16 16:38:02 2002 Subject: [Numpy-discussion] Introduction Message-ID: <20020416233700.72472.qmail@web12904.mail.yahoo.com> --- Perry Greenfield wrote: > > If one had an NDArray that happened to contain a type that numarray > supported, yes it is possible (in fact RecArray does that sort of thing). > > If your point is that in doing so one must use the private attributes > such as _strides, yes that is true. > My point was simply: = One *can* convert from (NDArray + typecode) to a full NumArray = You *do* already convert lists, tuples, ... to NumArrays in ufuncs = So you *could* convert *(NDArrays + typecode) to NumArrays in ufuncs in the same place that checks to see if it is a list, tuple, ... Therefore: = You possibly *could* standardize the attributes in an NDArray (buffer, typecode, shape, stride, offset, ...) 
= If you *did* standardize the attributes, then others *could* build UserDefinedNDArrays however they see fit and they would work with NumArrays However I get the sense that the numarray module is your baby, and you don't want to change him too much. That's very understandable, you're a proud parent. Truth be told, he's a good looking kid, and I look forward to hanging out with him when he's all grown up. We just have a little different view on parenting, and I was hoping my kid would have an easier time playing with yours. Now that I've beaten that silly metaphor to death... :-) Cheers, -Scott ps: It occurs to me, with the strong sense of encapsulation you desire, that I could have presented this better as requesting that you specify a set of standard *methods* instead of attributes. Something like: def __array_getbuffer__(self): def __array_getoffset__(self): def __array_getshape__(self): def __array_getstrides__(self): def __array_getitemsize__(self): def __array_gettypecode__(self): def __array_getendian__(self): # Who knows what the real list would consist of... # We never got to discuss what a really general # purpose description of an NDArray would require... Then anything which implemented those standard *methods* would be a viable NDArray. From my point of view it amounts to about the same thing, but I think it's a better design and that you might like this idea more. However I'm getting out of breath on this topic, and I have other things I need to do (I'm sure this is true for you too), so if you don't see any merit in this idea, I won't push for it any further. Cheers again. __________________________________________________ Do You Yahoo!? Yahoo! 
Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From perry at stsci.edu Tue Apr 16 17:52:03 2002 From: perry at stsci.edu (Perry Greenfield) Date: Tue Apr 16 17:52:03 2002 Subject: [Numpy-discussion] Conclusion In-Reply-To: <20020416233700.72472.qmail@web12904.mail.yahoo.com> Message-ID: After Scott's last display of his powers of persuasion, I lack for a meaningful response. It seems appropriate to declare this thread closed. Besides, I've got to go change some diapers ;-) Perry From paul at pfdubois.com Wed Apr 17 07:17:07 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Wed Apr 17 07:17:07 2002 Subject: [Numpy-discussion] Extension under windows In-Reply-To: <3CBC68D9.64B7C425@eco.u-cergy.fr> Message-ID: <000301c1e61a$20c2a590$0a01a8c0@NICKLEBY> You need to link with the Python library. I suggest you learn to use distutils and then it will load for you correctly on both platforms. The file "setup.py" in the Numeric source distribution is a good if complicated example. Some of the setup.py files in the Packages area are simpler and easier to understand.
From magnus at hetland.org Wed Apr 17 07:32:31 2002 From: magnus at hetland.org (Magnus Lie Hetland) Date: Wed Apr 17 07:32:31 2002 Subject: [Numpy-discussion] Graphs in numarray? Message-ID: <20020417163133.F7565@idi.ntnu.no> I'm looking at various ways of implementing graphs in Python (beyond simple dict-based stuff -- more performance is needed). kjbuckets looks like a nice alternative, as does the Boost Graph Library (not sure how easy it is to use with Boost.Python) but if numarray is to become a part of the standard library, it could be beneficial to use that... For dense graphs, it makes sense to use an adjacency matrix directly in numarray, I should think. (I haven't implemented many graph algorithms with ufuncs yet, but it seems doable...) For sparse graphs I guess some sort of sparse array implementation would be useful, although the archives indicate that creating such a thing isn't a core part of the numarray project. 
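To make the dense-graph case concrete, here is a minimal sketch of an adjacency matrix and two simple operations on it. It uses plain nested Python lists so it runs without any array package; with numarray or Numeric the nested lists would presumably be replaced by a 2-D integer array. The function names are hypothetical and chosen only for this illustration.

```python
def adjacency_matrix(n, edges):
    """Return an n x n 0/1 matrix (list of lists) for a directed graph."""
    adj = [[0] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = 1
    return adj


def out_degrees(adj):
    """Out-degree of each node is the sum of its row."""
    return [sum(row) for row in adj]


def reachable(adj, start):
    """Nodes reachable from start, found by breadth-first search over rows."""
    seen = {start}
    frontier = [start]
    while frontier:
        next_frontier = []
        for u in frontier:
            for v, flag in enumerate(adj[u]):
                if flag and v not in seen:
                    seen.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return sorted(seen)


edges = [(0, 1), (1, 2), (3, 0)]
adj = adjacency_matrix(4, edges)
degrees = out_degrees(adj)
from_zero = reachable(adj, 0)
```

Storage is n**2 entries regardless of the number of edges, which is why this layout only suits dense graphs.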
What do you think -- is it reasonable to use numarray for graph algorithms? Perhaps an additional module with standard graph algorithms would be interesting? (I'm sure I could contribute some if there is any interest...) And -- is there any chance of getting sparse matrices in numarray? -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From perry at stsci.edu Wed Apr 17 12:10:32 2002 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 17 12:10:32 2002 Subject: [Numpy-discussion] Graphs in numarray? In-Reply-To: <20020417163133.F7565@idi.ntnu.no> Message-ID: Hi Magnus, On Behalf Of Magnus Lie Hetland > > I'm looking at various ways of implementing graphs in Python (beyond > simple dict-based stuff -- more performance is needed). kjbuckets > looks like a nice alternative, as does the Boost Graph Library (not > sure how easy it is to use with Boost.Python) but if numarray is to > become a part of the standard library, it could be beneficial to use > that... > > For dense graphs, it makes sense to use an adjacency matrix directly > in numarray, I should think. (I haven't implemented many graph > algorithms with ufuncs yet, but it seems doable...) For sparse graphs > I guess some sort of sparse array implementation would be useful, > although the archives indicate that creating such a thing isn't a core > part of the numarray project. > First of all, it may make sense, but I should say a few words about what scale sizes make sense. Currently numarray is implemented mostly in Python (excepting the very low level, very simple C functions that do the computational and indexing loops. This means it currently has a pretty sizable overhead to set up an array operation (I'm guessing an order of magnitude slower than Numeric). Once set up, it generally is pretty fast. So it is pretty good for very large data sets. Very lousy for very small ones. 
We haven't measured efficiency lately (we are deferring optimization until we have all the major functionality present first), but I wouldn't be at all surprised to find that the set up time can be equal to the time to actually process ~10,000-20,000 elements (i.e., the time spent per element for a 10K array is roughly twice that for much larger arrays). So if you are working with much smaller arrays than 10K, you won't see total execution time decrease much (it was already spending half its time in setup, which doesn't change). We would like to reduce this size threshold in the future, either by optimizing the Python code, or moving some of it into C. This optimization wouldn't be for at least a couple more months; we have more urgent features to deal with. I doubt that we will ever surpass the current Numeric in its performance on small arrays (though who knows, perhaps we can come close). > What do you think -- is it reasonable to use numarray for graph > algorithms? Perhaps an additional module with standard graph > algorithms would be interesting? (I'm sure I could contribute some if > there is any interest...) > Before I go further, I need to find out if the preceding has made you gasp in horror or if the timescale is too slow for you to accept. (This particular issue also makes me wonder if numarray would ever be a suitable substitute for the existing array module). What size graphs are you most concerned about as far as speed goes? > And -- is there any chance of getting sparse matrices in numarray? > Since talk is cheap, yes :-). But I doubt it would be in the "core" and some thought would have to be given to how best to represent them. In one sense, since the underlying storage is different than numarray assumes for all its arrays, sparse arrays don't really share the same underlying C machinery very well. 
While it certainly would be possible to devise a class with the same interface as numarray objects, the implementation may have to be completely different. On the other hand, since numarray has much better support for index arrays, i.e., an array of indices that may be used to index another array of values, an (index array(s), value array) pair may itself serve as a storage model for sparse arrays. One still needs to implement ufuncs and other functions (including simple things like indexing) using different machinery. It is something that would be nice to have, but I can't say when we would get around to it and don't want to raise hopes about how quickly it would appear. Perry From victor at idaccr.org Wed Apr 17 15:25:24 2002 From: victor at idaccr.org (Victor S. Miller) Date: Wed Apr 17 15:25:24 2002 Subject: [Numpy-discussion] The right way to use results of argmax and argmin Message-ID:
# I'm running python 2.0 on Solaris and Numeric 21.0
# I have an m by n array -- called a and have
# j an n long list of integers in range(m), such as
j = argmax(a,0)
# If I set
z = zip(j,range(len(j)))
# and try the statement
res = take(a,z)
# python appears to hang, but if I do
res = array(map(lambda x,a=a: a[x[0],x[1]],z))
# It works.
# Is there a simpler way of doing what I want, and why does take hang?
# is it, perhaps, allocating some n by n work array (this would
# probably make things thrash like crazy)?
-- Victor S. Miller, victor at idaccr.org, CCR, Princeton, NJ 08540 USA | " ... Meanwhile, those of us who can compute can hardly be expected to keep writing papers saying 'I can do the following useless calculation in 2 seconds', and indeed what editor would publish them?" -- Oliver Atkin
From magnus at hetland.org Thu Apr 18 07:55:19 2002 From: magnus at hetland.org (Magnus Lie Hetland) Date: Thu Apr 18 07:55:19 2002 Subject: [Numpy-discussion] Graphs in numarray? 
In-Reply-To: ; from perry@stsci.edu on Wed, Apr 17, 2002 at 03:06:12PM -0400 References: <20020417163133.F7565@idi.ntnu.no> Message-ID: <20020418165403.E300@idi.ntnu.no> Perry Greenfield : [snip] > First of all, it may make sense, but I should say a few words about > what scale sizes make sense. [snip] > So if you are working with much smaller arrays than 10K, you won't > see total execution time decrease much In relation to what? Using dictionaries etc? Using the array module? [snip] > Before I go further, I need to find out if the preceeding has made > you gasp in horror or if the timescale is too slow for you to > accept. Hm. If you need 10000 elements before numarray pays off, I'm starting to wonder if I can use it for anything at all. :I > (This particular issue also makes me wonder if numarray would > ever be a suitable substitute for the existing array module). Indeed. > What size graphs are you most concerned about as far as speed goes? I'm not sure. A wide range, I should imagine. But with only 100 nodes, I'll get 10000 entries in the adjacency matrix, so perhaps it's worthwile anyway? > > And -- is there any chance of getting sparse matrices in numarray? > > Since talk is cheap, yes :-). But I doubt it would be in the "core" > and some thought would have to be given to how best to represent them. > In one sense, since the underlying storage is different than numarray > assumes for all its arrays, sparse arrays don't really share the > same underlying C machinery very well. While it certainly would be > possible to devise a class with the same interface as numarray objects, > the implementation may have to be completely different. Yes, I realise that. > On the other hand, since numarray has much better support for index > arrays, i.e., an array of indices that may be used to index another > array of values, index array(s), value array pair may itself serve > as a storage model for sparse arrays. 
That's an interesting idea, although I don't quite see how it would help in the case of adjacency matrices. (You'd still need at least one n**2 size matrix for n nodes, wouldn't you -- i.e. the index array... Right?) > One still needs to implement ufuncs and other functions (including > simple things like indexing) using different machinery. It is > something that would be nice to have, but I can't say when we would > get around to it and don't want to raise hopes about how quickly it > would appear. No - no problem. Basically, I'm looking for a platform to implement graph algorithms that doesn't necessitate too many installed packages etc. numarray seemed promising since it's a candidate for inclusion in the standard library. I guess I'll just have to do some timing experiments... > Perry -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From perry at stsci.edu Thu Apr 18 08:22:06 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Apr 18 08:22:06 2002 Subject: [Numpy-discussion] Graphs in numarray? In-Reply-To: <20020418165403.E300@idi.ntnu.no> Message-ID: > Behalf Of Magnus Lie Hetland > Perry Greenfield : > [snip] > > First of all, it may make sense, but I should say a few words about > > what scale sizes make sense. > [snip] > > So if you are working with much smaller arrays than 10K, you won't > > see total execution time decrease much > > In relation to what? Using dictionaries etc? Using the array module? No, in relation to operations on a 10K array. Basically, if an operation on a 10K array spends half its time on set up, operations on a 10 element array may only be twice as fast. I'm not making any claims about speed in relation to any other data structure (other than Numeric) > [snip] > > Before I go further, I need to find out if the preceeding has made > > you gasp in horror or if the timescale is too slow for you to > > accept. > > Hm. 
If you need 10000 elements before numarray pays off, I'm starting > to wonder if I can use it for anything at all. :I > I didn't make clear that this threshold may improve in the future (decrease). The corresponding threshold for Numeric is probably around 1000 to 2000 elements. (Likewise, operations on 10 element Numeric arrays are only about twice as fast as for 1K arrays.) We may be able to eventually improve numarray performance to something in that neighborhood (if we are lucky) but I would be surprised to do much better (though if we use caching techniques, perhaps repeated cases of arrays of identical shape, strides, type, etc. may run much faster on subsequent operations). As usual, performance issues can be complicated. You have to keep in mind that Numeric and numarray provide much richer indexing and conversion handling features than something like the array module, and that comes at some price in performance for small arrays. > > (This particular issue also makes me wonder if numarray would > > ever be a suitable substitute for the existing array module). > > Indeed. > > > What size graphs are you most concerned about as far as speed goes? > > I'm not sure. A wide range, I should imagine. But with only 100 nodes, > I'll get 10000 entries in the adjacency matrix, so perhaps it's > worthwile anyway? > That's right, 100 nodes is where performance becomes competitive, and if you are worried about cases larger than that, then it isn't a problem. But if you are operating mostly on small graphs, then it may not be appropriate. The corresponding threshold for Numeric would be on the order of 30 nodes. > > On the other hand, since numarray has much better support for index > > arrays, i.e., an array of indices that may be used to index another > > array of values, index array(s), value array pair may itself serve > > as a storage model for sparse arrays.
> > That's an interesting idea, although I don't quite see how it would > help in the case of adjacency matrices. (You'd still need at least one > n**2 size matrix for n nodes, wouldn't you -- i.e. the index array... > Right?) > Right. > From magnus at hetland.org Thu Apr 18 08:48:17 2002 From: magnus at hetland.org (Magnus Lie Hetland) Date: Thu Apr 18 08:48:17 2002 Subject: [Numpy-discussion] Graphs in numarray? In-Reply-To: ; from perry@stsci.edu on Thu, Apr 18, 2002 at 11:21:46AM -0400 References: <20020418165403.E300@idi.ntnu.no> Message-ID: <20020418174733.A7072@idi.ntnu.no> Perry Greenfield : > [snip] > > In relation to what? Using dictionaries etc? Using the array module? > > No, in relation to operations on a 10K array. Basically, if an operation > on a 10K array spends half its time on set up, operations on a > 10 element array may only be twice as fast. I'm not making any claims > about speed in relation to any other data structure (other than Numeric) Aaah! Sorry to be so dense :) But the speedup in numeric between different sizes isn't as important to me as the speedup compared to other solutions (such as a dict-based one) of course... If a 10 element array is only twice as fast as a 10K array that's no problem if it's still faster than an alternative solution (though I'm sure it might not be...) The same goes for 10K element graphs -- the interesting point has to be whether it's faster than various alternatives (which I'm sure it is). > > [snip] > > > Before I go further, I need to find out if the preceeding has made > > > you gasp in horror or if the timescale is too slow for you to > > > accept. > > > > Hm. If you need 10000 elements before numarray pays off, I'm starting > > to wonder if I can use it for anything at all. :I > > > I didn't make clear that this threshold may improve in the future > (decrease). Right. Good. And -- on small graphs performance probably won't be much of a problem anyway. 
:) > The corresponding threshold for Numeric is probably > around 1000 to 2000 elements. (Likewise, operations on 10 element > Numeric arrays are only about twice as fast as for 1K arrays) > We may be able to eventually improve numarray performance to something > in that neighborhood (if we are luckly) but I would be surprised to > do much better (though if we use caching techniques, perhaps repeated > cases of arrays of identical shape, strides, type, etc. may run > much faster on subsequent operations). As usual, performance issues > can be complicated. You have to keep in mind that Numeric and numarray > provide much richer indexing and conversion handling feature than > something like the array module, and that comes at some price in > performance for small arrays. Of course. I guess an alternative (for the graph situation) could be to wrap the graphs with a common interface with various implementations, so that a solution more optimised for small graphs could be used (in a factory function) if the graph is small... (Not really an issue for me at the moment, but should be easy to do, I guess.) [snip] > > I'm not sure. A wide range, I should imagine. But with only 100 nodes, > > I'll get 10000 entries in the adjacency matrix, so perhaps it's > > worthwile anyway? > > > That's right, a 100 nodes is where performance is being competitive, > and if you feel you are worried about cases larger than that, then > it isn't a problem. Seems probable. For smaller problems I wouldn't be thinking in terms of numarray anyway, I think. (Just using plain Python dicts or something similar.) [snip] > > > On the other hand, since numarray has much better support for index > > > arrays, i.e., an array of indices that may be used to index another > > > array of values, index array(s), value array pair may itself serve > > > as a storage model for sparse arrays. > > > > That's an interesting idea, although I don't quite see how it would > > help in the case of adjacency matrices. 
(You'd still need at least one > > n**2 size matrix for n nodes, wouldn't you -- i.e. the index array... > > Right?) > > > Right. I might as well use a full adjacency matrix, then... So, the conclusion for now is that numarray may well be suited for working with relatively large (100+ nodes), relatively dense graphs. Now, the next interesting question is how much of the standard graph algorithms can be implemented with ufuncs and array operations (which I guess is the key to performance) and not straight for-loops... After all, some of them are quite sequential. -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From rob at pythonemproject.com Thu Apr 18 09:18:31 2002 From: rob at pythonemproject.com (rob) Date: Thu Apr 18 09:18:31 2002 Subject: [Numpy-discussion] Graphs in numarray? References: <20020418165403.E300@idi.ntnu.no> <20020418174733.A7072@idi.ntnu.no> Message-ID: <3CBEF151.C440DCE@pythonemproject.com> I'm sorry I missed the original post, but the topic is important for me. I use the lightweight 3d volume renderer Animabob for most everything. The interface code is in all of the FDTD programs in my website. You just unwind a 3d array and scale it to +/- 128, turn it into chararacters, and you have the input file. I wish Animabob could somehow be turned into a Python package, as in Windows you need Cygwin to run it. I've tried other 3d packages like OpenDX, and they seem to be huge albatrosses. -- ----------------------------- The Numeric Python EM Project www.pythonemproject.com From perry at stsci.edu Thu Apr 18 10:36:19 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Apr 18 10:36:19 2002 Subject: [Numpy-discussion] Graphs in numarray? In-Reply-To: <3CBEF151.C440DCE@pythonemproject.com> Message-ID: Behalf Of rob > > I'm sorry I missed the original post, but the topic is important for > me. I use the lightweight 3d volume renderer Animabob for most > everything. 
The interface code is in all of the FDTD programs in my > website. You just unwind a 3d array and scale it to +/- 128, turn it > into chararacters, and you have the input file. I wish Animabob could > somehow be turned into a Python package, as in Windows you need Cygwin > to run it. I've tried other 3d packages like OpenDX, and they seem to > be huge albatrosses. > It sounds like you are trying to do something different than Magnus, but if what you are looking to do is scale floating or int data to byte size and apply some character mapping, numarray (or Numeric) should be able to do that very well. If that is all you want done, you might find either to be overkill though (if you already wrote a C extension to do so). Perry From perry at stsci.edu Thu Apr 18 10:39:03 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Apr 18 10:39:03 2002 Subject: [Numpy-discussion] Graphs in numarray? In-Reply-To: <20020418174733.A7072@idi.ntnu.no> Message-ID: > Now, the next interesting question is how much of the standard graph > algorithms can be implemented with ufuncs and array operations (which > I guess is the key to performance) and not straight for-loops... After > all, some of them are quite sequential. > I'm not sure about that (not being very familiar with graph algorithms). If you can give me some examples (perhaps off the mailing list) I could say whether they are easily cast into ufunc or library calls. Perry From paul at pfdubois.com Fri Apr 19 07:42:23 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Fri Apr 19 07:42:23 2002 Subject: [Numpy-discussion] [ANN] Pyfort 7.1 Message-ID: <000101c1e7ae$38ac0d50$0a01a8c0@NICKLEBY> Pyfort 7.1 is available at sf.net/projects/pyfortran. Support for single Fortran characters was added. (Michiel de Hoon) Corrected behavior of scalars with C routines. (Michiel de Hoon) Pyfort is a tool for connecting Python to Fortran.
Just to let you know, I'm working on a little tool to make it easier to set up simple projects so that you can build and install them with less effort. I hope to have that available soon. From rob at pythonemproject.com Fri Apr 19 09:00:02 2002 From: rob at pythonemproject.com (rob) Date: Fri Apr 19 09:00:02 2002 Subject: [Numpy-discussion] Icc compiled Python Message-ID: <3CC03E5A.9835FC0A@pythonemproject.com> There has been some discussion on the FreeBSD Ports list about an Icc compiled Python. It benchmarks much faster than the normal gcc compiled version. I'm wondering if anyone here knows anything about it. The discussion can be accessed via www.geocrawler.org/ FreeBSD/ freebsd-ports. Rob. -- ----------------------------- The Numeric Python EM Project www.pythonemproject.com From juenglin at informatik.uni-freiburg.de Sat Apr 20 09:59:13 2002 From: juenglin at informatik.uni-freiburg.de (Ralf Juengling) Date: Sat Apr 20 09:59:13 2002 Subject: [Numpy-discussion] NumPy initiated reference counting Message-ID: <1019321875.8067.141.camel@leto> I'm currently tinkering with the following problem and would like to hear your suggestions: Within a C module I define a new Python type 'IM' (representing an image). The indexing or slicing facilities of NumPy arrays are tailor-made for the manipulation of the internal data of its instances. Thus, I could provide a method 'asarray', which creates a properly typed array object 'a' referring to the data of an IM instance 'im': a = im.asarray() I could use PyArray_FromDimsAndData() to create the array instance. Unfortunately, this wouldn't work, since 'a' would not get notified about the death of 'im'. However, if I could prevent 'im' from being garbage collected before all array instances referring to its data are deleted, it should work. NumPy's array type uses a mechanism to prevent garbage collection of array instances if there are other instances that share data with it.
My idea was to use this mechanism, that is, to let the asarray method increment im's reference count and let a->base refer to im. Do you think this is a reliable approach? Thanks, Ralf -- -------------------------------------------------------------------------- Ralf Jüngling Institut für Informatik - Lehrstuhl für Mustererkennung & Bildverarbeitung Georges-Köhler-Allee Gebäude 52 Tel: +49-(0)761-203-8215 79110 Freiburg Fax: +49-(0)761-203-8262 -------------------------------------------------------------------------- From juenglin at informatik.uni-freiburg.de Sat Apr 20 12:22:51 2002 From: juenglin at informatik.uni-freiburg.de (Ralf Juengling) Date: Sat Apr 20 12:22:51 2002 Subject: [Numpy-discussion] qs on NumPy Message-ID: <1019330305.8067.158.camel@leto> Hi, I did not find a way in Python to check whether a Numeric array instance is a shared array or not. Could you confirm that there is no way? Is there work underway to make Numeric arrays subclassable? Regards, Ralf -- -------------------------------------------------------------------------- Ralf Jüngling Institut für Informatik - Lehrstuhl für Mustererkennung & Bildverarbeitung Georges-Köhler-Allee Gebäude 52 Tel: +49-(0)761-203-8215 79110 Freiburg Fax: +49-(0)761-203-8262 -------------------------------------------------------------------------- From mok at imsb.au.dk Tue Apr 23 04:21:03 2002 From: mok at imsb.au.dk (Morten Kjeldgaard) Date: Tue Apr 23 04:21:03 2002 Subject: [Numpy-discussion] Graphs in numarray? In-Reply-To: <20020417163133.F7565@idi.ntnu.no> Message-ID: > simple dict-based stuff -- more performance is needed). kjbuckets > looks like a nice alternative, as does the Boost Graph Library (not Kjbuckets is *very* nice indeed. It is a compact and very fast implementation. I don't see why you'd want to wrap this functionality into NumPy, which has a very well-defined scope and an efficient implementation. It would be a shame to bloat it with something which is distinctly different.
I have modified kjbuckets so that it compiles and works with Python 2.x. You can pick it up at ftp://xray.imsb.au.dk /pub/birdwash/packages/Python2.1/SRPMS/python-kjbuckets-2.2-7.src.rpm Just do "rpm --rebuild" on it. I sent the patch to the original author, but it appears he is no longer maintaining it. Never mind, it works great. /Morten -- Morten Kjeldgaard | Phone : +45 89 42 50 26 Institute of Molecular and Structural Biology | Fax : +45 86 12 31 78 Aarhus University | Home : +45 86 18 81 80 Gustav Wieds Vej 10 C, DK-8000 Aarhus C, Denmark | http://imsb.au.dk/~mok From magnus at hetland.org Thu Apr 25 07:28:05 2002 From: magnus at hetland.org (Magnus Lie Hetland) Date: Thu Apr 25 07:28:05 2002 Subject: [Numpy-discussion] Graphs in numarray? In-Reply-To: ; from mok@imsb.au.dk on Tue, Apr 23, 2002 at 01:20:04PM +0200 References: <20020417163133.F7565@idi.ntnu.no> Message-ID: <20020425162734.B6821@idi.ntnu.no> Morten Kjeldgaard : > > > > simple dict-based stuff -- more performance is needed). kjbuckets > > looks like a nice alternative, as does the Boost Graph Library (not > > Kjbuckets is *very* nice indeed. Yes, I guess it is. But the project doesn't seem very active... > It is a compact and very fast implementation. I don't see why you'd > want to wrap this functionality into NumPy, which has a very > well-defined scope and an efficient implentation. It would be a > shame to bloat it with something which is discretely different. Yes, I guess you're right. There is no point in adding this sort of thing to numarray. My motivation for using numarray in my implementations was simply that it would mean that the necessery tools would be (or might be in the future ;) available in the standard distribution. > I have modified kjbuckets so that it compiles and works with Python 2.x. > You can pick it up at > > ftp://xray.imsb.au.dk > /pub/birdwash/packages/Python2.1/SRPMS/python-kjbuckets-2.2-7.src.rpm > > Just do "rpm --rebuild" on it. 
> > I sent the patch to the original author, but it appears he is no longer > maintaining it. Never mind, it works great. Well... I do sort of mind... I'm a bit wary of using unmaintained software. Not that I would never do it or anything... But I think it would be a bonus to use stuff that is being actively maintained and developed. But I guess I'll take another look at it. (Any idea where the "kj" prefix comes from, by the way?) > /Morten -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From magnus at hetland.org Thu Apr 25 07:43:10 2002 From: magnus at hetland.org (Magnus Lie Hetland) Date: Thu Apr 25 07:43:10 2002 Subject: [Numpy-discussion] Non-numeric arrays? Message-ID: <20020425164228.C6821@idi.ntnu.no> I can't find this in the docs (although I've heard it's mentioned there)... Is support for non-numeric arrays (such as character arrays or object pointer arrays) as in Numeric planned for numarray? (Perhaps even supported? My version might not be themost recent...) And what about subclasses of numeric types? E.g: # numarray >>> class foo(int): pass >>> a = array(map(foo, xrange(10))) [...] TypeError: Expecting a python numeric type, got a foo # Numeric >>> class foo(int): pass >>> a = array(map(foo, xrange(10))) >>> tupe(a[0]) Neither behaviour seems very helpful -- I guess numarray's is cleaner... (Although in this case I think an object array could have been nice...) -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From jmiller at stsci.edu Thu Apr 25 07:53:04 2002 From: jmiller at stsci.edu (Todd Miller) Date: Thu Apr 25 07:53:04 2002 Subject: [Numpy-discussion] Non-numeric arrays? References: <20020425164228.C6821@idi.ntnu.no> Message-ID: <3CC81814.2010702@stsci.edu> Magnus Lie Hetland wrote: >I can't find this in the docs (although I've heard it's mentioned >there)... Is support for non-numeric arrays (such as character arrays >or object pointer arrays) as in Numeric planned for numarray? 
(Perhaps > Check out chararray for character arrays. Check out recarray for arrays of fixed length structs. To make your own non-numeric arrays, subclass NDArray. > >even supported? My version might not be themost recent...) > >And what about subclasses of numeric types? > >E.g: > ># numarray > >>>>class foo(int): pass >>>>a = array(map(foo, xrange(10))) >>>> >[...] >TypeError: Expecting a python numeric type, got a foo > ># Numeric > >>>>class foo(int): pass >>>>a = array(map(foo, xrange(10))) >>>>tupe(a[0]) >>>> > > >Neither behaviour seems very helpful -- I guess numarray's is >cleaner... (Although in this case I think an object array could have >been nice...) > Object arrays fall into the *eventually* category: planned but not imminent. > > >-- >Magnus Lie Hetland The Anygui Project >http://hetland.org http://anygui.org > >_______________________________________________ >Numpy-discussion mailing list >Numpy-discussion at lists.sourceforge.net >https://lists.sourceforge.net/lists/listinfo/numpy-discussion > Todd -- Todd Miller jmiller at stsci.edu STSCI / SSG (410) 338 4576 From magnus at hetland.org Thu Apr 25 07:54:02 2002 From: magnus at hetland.org (Magnus Lie Hetland) Date: Thu Apr 25 07:54:02 2002 Subject: [Numpy-discussion] Non-numeric arrays? In-Reply-To: <20020425164228.C6821@idi.ntnu.no>; from magnus@hetland.org on Thu, Apr 25, 2002 at 04:42:28PM +0200 References: <20020425164228.C6821@idi.ntnu.no> Message-ID: <20020425165304.D6821@idi.ntnu.no> Magnus Lie Hetland : [snip] Just a quick explanation for why I'm interested in this... I've got a two-dimensional array of ints (or bytes, actually), that I would like to convert to a delimited string (e.g. comma-separated). This works in Numeric: >>> from string import letters >>> alphabet = array(letters) >>> data = arange(24) # E.g... 
>>> data.shape = 6, 4 >>> fields = sum(take(alphabet, data), 1) >>> ','.join(fields) 'abcd,efgh,ijkl,mnop,qrst,uvwx' -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From perry at stsci.edu Thu Apr 25 07:57:05 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Apr 25 07:57:05 2002 Subject: [Numpy-discussion] Non-numeric arrays? In-Reply-To: <20020425164228.C6821@idi.ntnu.no> Message-ID: [I see Todd has already answered this, the following might add a little more detail] > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net > [mailto:numpy-discussion-admin at lists.sourceforge.net]On Behalf Of Magnus > Lie Hetland > Sent: Thursday, April 25, 2002 10:42 AM > To: Numpy-discussion > Subject: [Numpy-discussion] Non-numeric arrays? > > > I can't find this in the docs (although I've heard it's mentioned > there)... Is support for non-numeric arrays (such as character arrays > or object pointer arrays) as in Numeric planned for numarray? (Perhaps > even supported? My version might not be themost recent...) > Yes, in fact there is a character array class included with numarray (but not documented, I believe. For the moment, you'll have to deal with the source. We developed it for use with our I/O library but it seemed to be of general enough use to include with numarray. We also plan to support arrays of Python objects. There are various ways that this could be done and we ought to discuss how it should be done (perhaps multiple ways). But the underlying machinery certainly will support it. > And what about subclasses of numeric types? > > E.g: > > # numarray > >>> class foo(int): pass > >>> a = array(map(foo, xrange(10))) > [...] > TypeError: Expecting a python numeric type, got a foo > > # Numeric > >>> class foo(int): pass > >>> a = array(map(foo, xrange(10))) > >>> tupe(a[0]) > > > Neither behaviour seems very helpful -- I guess numarray's is > cleaner... 
(Although in this case I think an object array could have > been nice...) > We haven't had much time to think about how we deal with numeric subclasses. Certainly one would not use these for efficiency; I can't see any simple way of making such things go fast. But it may be possible to have such things work with numarray ufuncs and other numeric operations in some automatic way. I'd have to think about that. It's not high on the priority list at the moment. (Speaking of which I may post in a few days). Thanks, Perry > From hinsen at cnrs-orleans.fr Thu Apr 25 08:35:06 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Thu Apr 25 08:35:06 2002 Subject: [Numpy-discussion] Graphs in numarray? In-Reply-To: <20020425162734.B6821@idi.ntnu.no> References: <20020417163133.F7565@idi.ntnu.no> <20020425162734.B6821@idi.ntnu.no> Message-ID: Magnus Lie Hetland writes: > (Any idea where the "kj" prefix comes from, by the way?) I asked Aaron Watters about this. The answer: k and j are the initials of his children. Konrad. From jasper at peak.org Mon Apr 29 03:14:04 2002 From: jasper at peak.org (Jasper Phillips) Date: Mon Apr 29 03:14:04 2002 Subject: [Numpy-discussion] Multiple Linear Regression? Message-ID: <200204291013.DAA32745@spock.peak.org> I'm helping my wife with programming for her economics thesis, which needs to calculate a "Multiple Linear Regression" on her data. Does anyone know of any (preferably though not necessarily free) software that can do this? I'm working in Python, but not limited to it as I can relatively freely access other languages. I'm still looking for a library written in Python, but haven't had any luck. My second thought was Matlab, but looking over the Matlab website, I couldn't find anything like this by a name I recognize. It looks like I might be able to construct something out of a combination of Sparse Matrices and Linear Regression, or perhaps the stuff for overdetermined Linear Equations?
Another option may be LAPACK routines, but I'm not familiar with those. Does anyone here have any experience with this kind of stuff? Is there a better place to ask? I'm about ready to take a shot at writing something myself, but I'd really rather avoid this if it's been done before. -Jasper From hinsen at cnrs-orleans.fr Mon Apr 29 05:40:03 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Mon Apr 29 05:40:03 2002 Subject: [Numpy-discussion] Multiple Linear Regression? In-Reply-To: <200204291013.DAA32745@spock.peak.org> References: <200204291013.DAA32745@spock.peak.org> Message-ID: Jasper Phillips writes: > I'm still looking for a library written in Python, but haven't had any luck. Numerical Python has all the basic stuff, but you need to read in and arrange the data yourself. All linear regression problems ultimately become least-squares problems for a system of linear equations, which can be solved using LinearAlgebra.linear_least_squares. Konrad. From Alexandre.Fayolle at logilab.fr Mon Apr 29 06:20:03 2002 From: Alexandre.Fayolle at logilab.fr (Alexandre) Date: Mon Apr 29 06:20:03 2002 Subject: [Numpy-discussion] Multiple Linear Regression? In-Reply-To: <200204291013.DAA32745@spock.peak.org> References: <200204291013.DAA32745@spock.peak.org> Message-ID: <20020429131937.GE30347@orion.logilab.fr> On Mon, Apr 29, 2002 at 03:13:44AM -0700, Jasper Phillips wrote: > I'm helping my wife with programming for her economics thesis, which needs > to calculate a "Multiple Linear Regression" on her data. > > Does anyone know of any (preferably though not necesarrily free) software > that can do this? I'm working in Python, but not limited to it as I > can relatively freely access other languages. > > I'm still looking for a library written in Python, but haven't had any luck. > I'm helping my wife with her History PhD, and have to deal with similar stuff. I found R to be a very useful environment for statistical computations. 
R is a free software clone of S-plus, which is to statistics what Matlab is to linear algebra and automation. Pros: - programming environment, with a high level programming language - extensive statistical and linalg library (using C and FORTRAN code) - lots of third party code available, covering a very wide range of situations - Python bindings available if you don't want to learn the Scheme-like language - Tons of documentation available - Excellent support through the mailing lists - GPL'd - Tons of way to import data (ranging from CSV files to ODBC queries) - 2 printed books available, at Springer Verlag - postscript, png, wmf, X outputs, with precise control of the layout of the graphs and figures available for a nice colourful thesis Cons: - the language can be a bit weird at times (it took me some time to get used to '.' being used instead of '_' and vice versa in the scoping and variable naming), but you can use Python to script R, thanks to RPython - it's quite a big piece of code, with a rather steep learning curve and you need time to get inside it - the documentation is aimed at professional statisticians. I had to dig back in my statistics courses and to buy a couple of books on that topic for the software to become really useful. Asking newbie statistician questions on the r-help mailing list is off-topic - the springer verlag books are very expensive (Modern Applied Statistics with S-plus costs something like 70 euros), but they are great So you have a powerful tool available at your fingertips, designed to do precisely what you need. I think it's worth taking the time to look at it carefully. The more I get to understand the topic, the more ideas I get for new ways of exploring the data of my wife's PhD. Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). 
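[Editorial sketch] Konrad's point earlier in this thread, that a multiple linear regression ultimately becomes a least-squares problem for a system of linear equations, can be illustrated in plain Python. The sketch below forms the normal equations and solves them by Gaussian elimination; the function names and the sample data are made up for this example, and it is only meant to show the idea. For real data a library routine (such as the LinearAlgebra.linear_least_squares function Konrad mentions) would likely be the better choice, since normal equations can be numerically fragile.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; regressors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def multiple_linear_regression(rows, y):
    """Fit y ~ b0 + b1*x1 + ... + bk*xk and return [b0, b1, ..., bk]."""
    # Prepend a constant 1.0 so the first coefficient is the intercept.
    design = [[1.0] + list(map(float, r)) for r in rows]
    p = len(design[0])
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up, noise-free data generated from y = 1 + 2*x1 + 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = multiple_linear_regression(rows, y)
```

With the noise-free sample data, the fitted coefficients should come out close to the intercept and slopes used to generate it.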
From Alexandre.Fayolle at logilab.fr Mon Apr 29 06:28:08 2002 From: Alexandre.Fayolle at logilab.fr (Alexandre) Date: Mon Apr 29 06:28:08 2002 Subject: [Numpy-discussion] Multiple Linear Regression? In-Reply-To: <20020429131937.GE30347@orion.logilab.fr> References: <200204291013.DAA32745@spock.peak.org> <20020429131937.GE30347@orion.logilab.fr> Message-ID: <20020429132741.GF30347@orion.logilab.fr> On Mon, Apr 29, 2002 at 03:19:37PM +0200, Alexandre wrote: > I'm helping my wife with her History PhD, and have to deal with similar > stuff. I found R to be a very useful environment for statistical > computations. R is a free software clone of S-plus, which is to statistics > what Matlab is to linear algebra and automation. Whoops, I forgot to add a couple of URLs: The R project website http://www.r-project.org/ The Comprehensive R Archive Network (CRAN) http://cran.r-project.org/ Using R from Python http://rpy.sourceforge.net/ Using R from Python and Python from R (coding R extensions in Python) http://www.omegahat.org/RSPython/ Cheers, Alexandre Fayolle -- LOGILAB, Paris (France). http://www.logilab.com http://www.logilab.fr http://www.logilab.org Narval, the first software agent available as free software (GPL). From cavallo at kip.uni-heidelberg.de Mon Apr 29 10:10:20 2002 From: cavallo at kip.uni-heidelberg.de (cavallo at kip.uni-heidelberg.de) Date: Mon Apr 29 10:10:20 2002 Subject: [Numpy-discussion] kdfio, 1.1.1 Message-ID: Hi, here is the URL of the latest version of kdfio, a khoros/cantata kdf file importer: nothing special, but it seems to be working now, at least for me;-) You can find it at: http://kdfio.sourceforge.net This is my (very) small contribution to numerical python: inside I plugged in a way to modularize the code (and to write some skeleton semi-automatically) that could speed up writing new code a little bit. Before giving a full announcement on sourceforge I will wait a little bit, just to see whether there are any bugs around.
Feel free to use/change/make what you want,

thanks to all,
antonio cavallo

ps. khoros is available at http://www.khoral.com and it is not a free program: there is just a free student version.

From jmiller at stsci.edu Mon Apr 29 10:14:07 2002
From: jmiller at stsci.edu (Todd Miller)
Date: Mon Apr 29 10:14:07 2002
Subject: [Numpy-discussion] ANN: Numarray-0.3.3
Message-ID: <3CCD7F49.5030809@stsci.edu>

Numarray 0.3.3
---------------------------------

Numarray is an array processing package designed to efficiently manipulate large multi-dimensional arrays. Numarray is modelled after Numeric and features c-code generated from python template scripts, the capacity to operate directly on arrays in files, and improved type promotions.

Numarray-0.3.3 features improved support for arrays of complex numbers, re-implementing complex types using generated code. In addition to being faster, the new complex ufuncs are better integrated with the numarray type system, so operations between numarrays and complex scalars now work properly. This release also fixes a problem experienced by RedHat Linux users installing numarray from source.

WHERE
-----------

Numarray-0.3.3 windows executable installers and the source code tar ball are here:

http://sourceforge.net/project/showfiles.php?group_id=1369

Numarray is hosted by Source Forge in the same project which hosts Numeric:

http://sourceforge.net/projects/numpy/

The web page for Numarray information is at:

http://stsdas.stsci.edu/numarray/index.html

Trackers for Numarray Bugs, Feature Requests, Support, and Patches are at the Source Forge project for NumPy at:

http://sourceforge.net/tracker/?group_id=1369

REQUIREMENTS
--------------------------

numarray-0.3.3 requires Python 2.0 or greater.

AUTHORS, LICENSE
------------------------------

Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC Hsu, Paul Barrett, Phil Hodge at the Space Telescope Science Institute.
Thanks go to Jochen Kupper of the University of North Carolina for his work on Numarray and for porting the Numarray manual to TeX format.

Numarray is made available under a BSD-style License. See LICENSE.txt in the source distribution for details.

--
Todd Miller jmiller at stsci.edu

From haase at msg.ucsf.edu Mon Apr 29 11:18:15 2002
From: haase at msg.ucsf.edu (Sebastian Haase)
Date: Mon Apr 29 11:18:15 2002
Subject: [Numpy-discussion] unsigned short support in NumPy
Message-ID:

Hi all,
I'm _very_ new to NumPy. I was interested in using it for our project, where we acquire data from a CCD camera.

The Problem: Each pixel in the image is a 16 bit gray value. From what I read in the documentation, there is only 8 bit (unsigned integer) support in numpy (or should I say numericarray).

Are there plans to add "unsigned short" (16 bit) support? How much effort would that be?

Regards,
Sebastian Haase

--
        _\\|//_
       (' O-O ')
------------------------------ooO-(_)-Ooo--------------------------------------
Sebastian Haase
University of California, San Francisco
(415)502-4316

From rick at bioinformatics.org Mon Apr 29 11:35:26 2002
From: rick at bioinformatics.org (Rick Ree)
Date: Mon Apr 29 11:35:26 2002
Subject: [Numpy-discussion] testing Numeric.array([0])
Message-ID: <1020105268.12239.27.camel@loco.ucdavis.edu>

Should Numeric.array([0]) test false? This seems counterintuitive, and is not the case for the regular python array module.

This recently caused a subtle bug for me when I wanted to find the indices of an array that met a condition. If only the first element met the condition, the result was array([0]) -- a non-empty result that evaluated false.

If this is the intended behavior, can someone tell me the reason?
thanks,
Rick

From perry at stsci.edu Mon Apr 29 13:31:14 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Mon Apr 29 13:31:14 2002
Subject: [Numpy-discussion] unsigned short support in NumPy
In-Reply-To:
Message-ID:

> Sebastian Haase writes:
>
> Hi all,
> I'm _very_ new to NumPy.
> I was interested in using it for our project, where we acquire data from a
> CCD camera.
>
> The Problem: Each pixel in the image is a 16 bit gray value.
> What I read in the documentation - there is only 8 bit (unsigned
> integer) support in numpy (or should I say numericarray)
>
> Are there plans to add a "unsigned short" (16 bit) support .
> How much effort would that be.
>
There is a reimplementation of Numeric that we are doing that does support unsigned ints (Unsigned Int8, Unsigned Int16 for now). The project is not mature, but a lot of basic capability exists now. You'll have to look it over to judge if it is usable for you now. The new version is called numarray ( http://stsdas.stsci.edu/numarray )

(btw, we acquire data from CCD cameras as well ;-)

Perry

From tchur at optushome.com.au Mon Apr 29 13:46:39 2002
From: tchur at optushome.com.au (Tim Churches)
Date: Mon Apr 29 13:46:39 2002
Subject: [Numpy-discussion] Multiple Linear Regression?
References: <200204291013.DAA32745@spock.peak.org>
Message-ID: <3CCDBB6C.8A983A5C@optushome.com.au>

Jasper Phillips wrote:
>
> I'm helping my wife with programming for her economics thesis, which needs
> to calculate a "Multiple Linear Regression" on her data.
>
> Does anyone know of any (preferably though not necessarily free) software
> that can do this? I'm working in Python, but not limited to it as I
> can relatively freely access other languages.

Jasper,

Use R (a free implementation of S).
See http://www.r-project.org

If you are managing your data in Python and NumPy, you can "embed" R in Python and transparently send data to it using Walter Moreira's wonderful RPy module - see http://rpy.sf.net

Tim C

From clee at spiralis.merseine.nu Mon Apr 1 09:06:59 2002
From: clee at spiralis.merseine.nu (clee at spiralis.merseine.nu)
Date: Mon Apr 1 09:06:59 2002
Subject: [Numpy-discussion] slice question and bug
Message-ID: <20020401165852.44D3E79B@spiralis.merseine.nu>

Hello,
I'm trying to track down a segv when I do the B[:] operation on an array, "B", a that I've built in as a view on external data. During the process I ran into the following code (Numeric-21.0):

/* {%c++%} */
extern int PyArray_Free(PyObject *op, char *ptr) {
    PyArrayObject *ap = (PyArrayObject *)op;
    int i, n;

    if (ap->nd > 2) return -1;
    if (ap->nd == 3) {
        n = ap->dimensions[0];
        for (i=0; i<n; i++) {
            free(((char **)ptr)[i]);
        }
    }
    if (ap->nd >= 2) {
        free(ptr);
    }
    Py_DECREF(ap);
    return 0;
}
/* {%c++%} */

The multiple, incompatible tests of ap->nd are the problem.

-chris

From clee at spiralis.merseine.nu Mon Apr 1 10:59:02 2002
From: clee at spiralis.merseine.nu (clee at spiralis.merseine.nu)
Date: Mon Apr 1 10:59:02 2002
Subject: [Numpy-discussion] slice question and bug
In-Reply-To: <20020401165852.44D3E79B@spiralis.merseine.nu>
References: <20020401165852.44D3E79B@spiralis.merseine.nu>
Message-ID: <15528.44386.160013.936132@spiralis.merseine.nu>

clee at spiralis.merseine.nu writes:
 >
 > Hello,
 > I'm trying to track down a segv when I do the B[:] operation on an
 > array, "B", a that I've built in as a view on external data. During...
 > [snip]

To clarify my own somewhat non-sensical post: When I started composing my message, I was trying to figure out a bug in my own code that caused a crash while doing slice_array. I've since fixed that bug. However, in the process of figuring out what I was doing wrong I was browsing the Numeric source code. While examining PyArray_Free(..)
in arrayobject.c, I saw that it returns -1 whenever the number of dimensions is greater than 2, yet it has code that tests for when the number of dimensions equals 3.

So ultimately, my post is just an alert that I think there might be some code that needs to be cleaned up.

Thanks,
 lacking-caffeine-ly yours
 -chris

From nwagner at mecha.uni-stuttgart.de Wed Apr 3 11:48:47 2002
From: nwagner at mecha.uni-stuttgart.de (Nils Wagner)
Date: Wed Apr 3 11:48:47 2002
Subject: [Numpy-discussion] Factorization of complex symmetric matrices
Message-ID: <3CAABF60.12D609C0@mecha.uni-stuttgart.de>

Hi,

I am looking for a suitable factorization of complex symmetric matrices. Where can I find a proper routine ?

Nils

From ray_drew at yahoo.co.uk Thu Apr 4 02:27:09 2002
From: ray_drew at yahoo.co.uk (Ray Drew)
Date: Thu Apr 4 02:27:09 2002
Subject: [Numpy-discussion] RandomArray difference between Python2.1 and 2.2?
References: <20020401165852.44D3E79B@spiralis.merseine.nu> <15528.44386.160013.936132@spiralis.merseine.nu>
Message-ID: <000b01c1dbc3$65fe6100$6014000a@RDREWXP>

Hi,

Can anyone explain the following?

Python 2.1.1, Numpy version='20.2.0'

Python 2.1.1 (#20, Jul 20 2001, 01:19:29) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> from RandomArray import *
>>> normal(3., 1., (5,))
array([ 2.19091588, 2.44682837, 2.51790264, 4.26374364, 4.56880629])

Python 2.2, Numpy version='20.3'

Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
IDLE 0.8 -- press F1 for help
>>> from RandomArray import *
>>> normal(3., 1., (5,))
array([-3.78572679, -3.63714516, -3.01228334, -4.80211985, -2.57420304])

Why am I getting negative values with Python 2.2? This happens consistently. Any help would be appreciated.

Thanks,

Ray

From pearu at cens.ioc.ee Thu Apr 4 18:51:36 2002
From: pearu at cens.ioc.ee (Pearu Peterson)
Date: Thu Apr 4 18:51:36 2002
Subject: [Numpy-discussion] RandomArray difference between Python2.1 and 2.2?
In-Reply-To: <000b01c1dbc3$65fe6100$6014000a@RDREWXP>
Message-ID:

On Thu, 4 Apr 2002, Ray Drew wrote:

> Python 2.2, Numpy version='20.3'
>
> Python 2.2 (#28, Dec 21 2001, 12:21:22) [MSC 32 bit (Intel)] on win32
> Type "copyright", "credits" or "license" for more information.
> IDLE 0.8 -- press F1 for help
> >>> from RandomArray import *
> >>> normal(3., 1., (5,))
> array([-3.78572679, -3.63714516, -3.01228334, -4.80211985, -2.57420304])
>
> Why am I getting negative values with Python 2.2? This happens consistently.
> Any help would be appreciated.

This is a bug in Numpy 20.3 and should be fixed in Numpy 21.0.

Pearu

From kelson at fedka.ociw.edu Fri Apr 5 08:23:49 2002
From: kelson at fedka.ociw.edu (Daniel D. Kelson)
Date: Fri Apr 5 08:23:49 2002
Subject: [Numpy-discussion] Error in MLab.py
Message-ID: <200204040140.g341e1e04422@fedka.ociw.edu>

Howdy:

Shouldn't line 296 in MLab.py of Numeric 21.0, which currently reads:

    val = squeeze(dot(transpose(m)*conjugate(y)) / fact)

read:

    val = squeeze(dot(transpose(m),conjugate(y)) / fact)

Thanks,

D.Kelson
Carnegie Observatories
http://www.ociw.edu/~kelson

From DavidA at ActiveState.com Fri Apr 5 13:47:03 2002
From: DavidA at ActiveState.com (David Ascher)
Date: Fri Apr 5 13:47:03 2002
Subject: [Numpy-discussion] Re: [Python-Dev] Array Enhancements
References: <20020405203029.19286.qmail@web12903.mail.yahoo.com> <200204052121.g35LLut20125@pcp742651pcs.reston01.va.comcast.net>
Message-ID: <3CAE1913.ECB27329@activestate.com>

Guido van Rossum wrote:
> > I would propose the following for multi-dimensional arrays:
> >
> > a = array.array('d', 20000, 20000)
> >
> > or:
> >
> > a = array.xarray('d', 20000, 20000)
>
> I just realized that multi-dimensional __getitem__
shouldn't be a big
> deal. The question is, given the above declaration, what a[0] should
> return: the same as a[0, 0] or a copy of a[0, 0:20000] or a reference
> to a[0, 0:20000].

Or a ValueError? In the face of ambiguity, refuse the temptation to guess.

IIRC, this issue caused lots of problems in the numpy world. cc'ing Paul in case he wants to jump in to fill in my rusty memory.

Why does submitting a patch to arraymodule seem an easier path than modifying numarray or numpy to support what's needed? I believe that the goals of numarray aren't that different from what Scott is trying to do (memory management APIs, etc.).

I'd like to see fewer multi-dimensional array objects, not more...

--david ascher

From jochen at unc.edu Fri Apr 5 20:56:09 2002
From: jochen at unc.edu (Jochen Küpper)
Date: Fri Apr 5 20:56:09 2002
Subject: [Numpy-discussion] numerical integration
Message-ID:

The following message is a courtesy copy of an article that has been posted to comp.lang.python.announce as well.

I have made a numerical integration package available at

,----
| http://python.jochen-kuepper.de/integrate
`----

This is a copy of the integrate module of scipy by Travis Oliphant plus some small changes and rearrangements to make it work standalone (well, it needs Numeric).

All credits go to the scipy folks, esp. Travis, all errors should be blamed on me.

Greetings,
Jochen

PS: In the long run this module will be phased out in favor of scipy, but for now it might be useful for someone...
--
Einigkeit und Recht und Freiheit                http://www.Jochen-Kuepper.de
    Liberté, Égalité, Fraternité                GnuPG key: 44BCCD8E
        Sex, drugs and rock-n-roll

From andrewm at object-craft.com.au Sun Apr 7 23:32:07 2002
From: andrewm at object-craft.com.au (Andrew McNamara)
Date: Sun Apr 7 23:32:07 2002
Subject: [Numpy-discussion] Puzzling numpy results?
Message-ID: <20020408063157.1659D38F5B@coffee.object-craft.com.au>

The behavior I'm seeing with zero length Numeric arrays is not what I would have expected:

>>> from Numeric import *
>>> array([5]) != array([])
zeros((0,), 'l')
>>> array([]) == array([])
zeros((0,), 'l')
>>> allclose(array([5]), array([]))
1

This is with Numeric-20.3 (and Numeric-20.2.1) - is this behavior correct, or have I stumbled across a bug? If both sides of the comparison are arrays with a length greater than zero, the comparisons work as expected:

>>> array([5]) != array([6])
array([1])
>>> array([5, 5]) != array([6])
array([1, 1])
>>> array([5]) != array([5])
array([0])

The problem came up when I was writing unittests for some Numpy code: under some circumstances, the code under test is expected to return a zero length array: I was somewhat surprised when I couldn't make the test fail! 8-)

--
Andrew McNamara, Senior Developer, Object Craft
http://www.object-craft.com.au/

From tchur at optushome.com.au Mon Apr 8 13:34:16 2002
From: tchur at optushome.com.au (Tim Churches)
Date: Mon Apr 8 13:34:16 2002
Subject: [Numpy-discussion] Puzzling numpy results?
References: <20020408063157.1659D38F5B@coffee.object-craft.com.au>
Message-ID: <3CB208CE.2270C89@optushome.com.au>

Andrew McNamara wrote:
>
> The behavior I'm seeing with zero length Numeric arrays is not what I
> would have expected:
>
> >>> from Numeric import *
> >>> array([5]) != array([])
> zeros((0,), 'l')
> >>> array([]) == array([])
> zeros((0,), 'l')
> >>> allclose(array([5]), array([]))
> 1

The Numpy docs point out that == and != are implemented via the logical ufuncs, and that: "The ``logical'' ufuncs also perform their operations on arrays in elementwise fashion, just like the ``mathematical'' ones."

I think this explains the results you are seeing: if you do an element-wise comparison of a length-one array with a zero-length array, the Numpy recycling rule means that you should always get a zero-length result.
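A minimal sketch of the stretching rule described here, written in plain Python rather than Numeric so that it runs on its own. The helper name `elementwise_ne` is made up for illustration, it handles only the 1-D case, and it is not how Numeric implements its ufuncs:

```python
def elementwise_ne(a, b):
    """Compare two 1-D sequences element by element.

    Hypothetical helper, not part of Numeric: a length-1 operand is
    stretched to the length of the other operand; otherwise the two
    lengths must match.
    """
    if len(a) == 1:
        a = list(a) * len(b)      # stretching to length 0 leaves nothing
    elif len(b) == 1:
        b = list(b) * len(a)
    if len(a) != len(b):
        raise ValueError("lengths do not match")
    return [int(x != y) for x, y in zip(a, b)]


print(elementwise_ne([5], [6]))      # [1]
print(elementwise_ne([5, 5], [6]))   # [1, 1]
print(elementwise_ne([5], []))       # []  (zero comparisons, so zero results)
```

For operands whose lengths cannot be reconciled, such as `[5, 6]` and `[]`, this sketch simply raises an error; it makes no attempt to reproduce what Numeric does in that case.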
Note that zeros((0,),'l') is not zero, it is zero zeros. So although the results are surprising (at least to me, and you), I think the observed results are logically correct.

But, if that is the case, why does this hold (which I suspect reflects what you originally expected)?:

>>> from Numeric import *
>>> array([5,6]) != array([])
1
>>> array([5,6]) == array([])
0

Tim C

From xscottg at yahoo.com Thu Apr 11 04:32:03 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Thu Apr 11 04:32:03 2002
Subject: [Numpy-discussion] Introduction
Message-ID: <20020411113152.98373.qmail@web12906.mail.yahoo.com>

Hello All.

I'm interested in this project, and am curious to what level you are willing to accept outside contribution. I just tried to subscribe to the developers list, but I didn't realize that required admin approval. Hopefully it doesn't look like I was shaking the door without knocking first.

Is this list active? Is this the correct place to talk about Numarray?

A little about me: My name is Scott Gilbert, and I work as a software developer for a company called Rincon Research in Tucson Arizona. We do a lot of digital signal processing/analysis among other things. In the last year or so, we've started to use Python in various capacities, and we're hoping to use it for more things. We need a good array module for various things. Some are similar to what it looks like Numarray is targeted at (fft, convolutions, etc...), and others are pretty different (providing buffers for reading data from specialized hardware etc...)

About a week ago, I noticed that Guido over in Python developer land was willing to accept patches to the standard array module. As such, I thought I would take that opportunity to try and wedge some desirements and requirements I have into that baseline. Bummer for me, but they weren't exactly excited about bloating out arraymodule.c to meet my needs, and in retrospect that does make good sense.
A number of people suggested that this might be a better place to try and get what I need. So here I am, poking around and wondering if I can play in your sandbox. If you're willing to let me contribute, my specific itches that I need to scratch are below. Otherwise - bummer, and I hope you all catch crabs... :-)

-----------------------------------

It's taken me a couple of days to understand what's going on in the source. I've read through the design docs, and the PEP, but it wasn't until I tried to re-implement it that it really clicked.

My re-implementation of the array portion of what you're doing is attached. There are still some holes to fill in, but it's fairly complete and supports a whole bunch of things which yours does not (Some of which you might even find useful: Pickling, Bit type). I'm pretty proud of it for only 400 lines of Python (Most of which is the bazillion type declarations). It's probably riddled with bugs as it's less than a day old...

After initially thinking that you guys were getting too clever, I've come to realize it's a pretty good design overall. Still I have some changes I would like to make if you'll let me. (Both to the design and the implementation)

-------------------------

Following your design for the Array stuff, I've been able to implement a pretty usable array class that supports the bazillion array types I need (Bit, Complex Integer, etc...). This gets me past my core requirements without polluting your world, but unfortunately my new XArray type doesn't play so well with your UFuncs. I think my users will definitely want to use your UFuncs when the time comes, so I want to remedy this situation.

The first change I would like to make is to rework your code that verifies that an object is a "usable" array. I think NumArray should only check for the interface required, not the actual type hierarchy.
By this I mean that the minimum required to be a supported array type is that it support the correct attributes, not that it actually inherit from NDArray:

(quoting from your paper) something like:

    _data
    _shape
    _strides
    _byteoffset
    _aligned
    _contiguous
    _type
    _byteswap

Most of these are just integer fields, or tuples of integers. Ignoring _type for the moment, it appears that the interface required to be a NumArray is much less strict than actually requiring it to derive from NumArray. If you allow me to change a few functions (inputarray() in numarray.py is one small example), I could use my independent XArray class almost as is, and moreover I can implement new array objects (possibly as extension types) for crazy things like working with page aligned memory, memory mapping etc...

Well, that's almost enough. The _type field poses a small problem of sorts. It looks like you don't require a _type to be derived from NumericType, and this is a good thing since it allows me (and others) to implement NumArray compatible arrays without actually requiring NumArray to be present.

However, it would be nice if you declared a more comprehensive list of typenames - even if they aren't all implemented in NumArray proper. Who knows, maybe the SciPy guys have a use for complex integers or bit arrays. If you make a reasonable canonical list, our data could be passed back and forth even if NumArray doesn't know what to do with it. See my attached module for the types of things I'm thinking of. I'm not so concerned about the "Native Types" that are in there, but I do think it is worth committing to a list of named standard types. (I suspect there are others that are interested in standard C types even if the size changes between machines...)

If you were to specify a minimal interface like this in the short term, I could begin propagating my array module to my users. I could get my work done now, knowing that I'll be compatible with NumArray proper once it matures.
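A rough sketch of what such an attribute-based check could look like. The function names are invented for illustration and are not part of numarray; the attribute names are the ones listed above from the design paper:

```python
# Attribute names taken from the list quoted above.
REQUIRED_ATTRS = (
    "_data", "_shape", "_strides", "_byteoffset",
    "_aligned", "_contiguous", "_type", "_byteswap",
)


def missing_array_attrs(obj):
    """Return the required attribute names that `obj` does not provide."""
    return [name for name in REQUIRED_ATTRS if not hasattr(obj, name)]


def is_usable_array(obj):
    """Duck-typed check: accept any object exposing every required attribute,
    regardless of which class it inherits from."""
    return not missing_array_attrs(obj)
```

A check of this shape accepts an unrelated class such as XArray as long as it carries the attributes, and `missing_array_attrs` gives a useful error message when it does not.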
I'd be willing to participate in making these changes if necessary.

Looking at the big picture, I think it's desirable that there really only be one official standard for ND arrays in the Python world. That way, the various independent groups can all share their independent work. You guys are the heir-apparent, so to speak, from the Python guys point of view.

I don't know if you're trying to get all of NumArray into the Python distribution or not, but I suspect a good interim step would be to have a PEP that specifies what it means to be a NumArray or NDArray in minimal terms. Perhaps supplying an Array only module in Python that implements this interface. Again, I'd be willing to help with all of this.

-------------------------

Ok, other suggestions...

Here is the list of things that your design document indicates are required to be a NumArray:

    _data
    _shape
    _strides
    _byteoffset
    _aligned
    _contiguous
    _type
    _byteswap

I believe that one could calculate the values for _aligned and _contiguous from the other fields. So they shouldn't really be part of the interface required. I suspect it is useful for the C implementation of UFuncs to have this information in the NDInfo struct though, so while I would drop them from attribute interface, I would delegate the task of calculating these values to getNDInfo() and/or getNumInfo().

I also notice that you chose _byteswap to indicate byteswapping is needed. I think a better choice would be to specify the endian-ness of the data (with an _endian attr), and have getNDInfo() and getNumInfo() calculate the _byteswap value for the NDInfo struct.

In my implementation, I came up with a slightly different list:

    self._endian
    self._offset
    self._shape
    self._stride
    self._itemtype
    self._itemsize
    self._itemformat
    self._buffer

The only minimal differences are that _itemsize allows me to work with arrays of bytes without having any clue what the underlying type is (in some cases, _itemtype is "Unknown".)
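As a sketch of how `_contiguous` might be derived from the other fields rather than stored, assuming C (row-major) order and strides expressed in bytes. The function name is hypothetical and this is not numarray's own code. Alignment is harder to derive this way, since it also depends on the address of the underlying buffer:

```python
def is_c_contiguous(shape, strides, itemsize):
    """Return True if (shape, strides) describe a gap-free row-major layout.

    Hypothetical helper: `strides` are in bytes, one per dimension.
    """
    if 0 in shape:
        return True               # an empty array has no elements to misplace
    expected = itemsize           # the last axis must step by one item
    for dim, stride in zip(reversed(shape), reversed(strides)):
        # An axis of length 1 is never stepped along, so its stride is moot.
        if dim != 1 and stride != expected:
            return False
        expected *= dim           # next axis outward must skip a whole block
    return True
```

For example, a 3x4 array of 8-byte items with strides (32, 8) passes, while its transpose view with strides (8, 32) on shape (4, 3) does not.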
Secondly, I implemented a "Struct" _itemtype, and _itemformat is useful for this case. (It's the same format string that the struct module in Python uses.)

Also, I specified 0 for _itemsize when the actual items aren't byte addressable. In my module, this only occurred with the Bit type. I figured specifying 0 like this could keep a UFunc that isn't Bit aware from stepping on memory that it isn't allowed to.

-------------------------

Next thought: Memory Mapping

I really like the idea of having Python objects that map huge files a piece at a time without using all of available memory. I've seen this in NumArray's charter as part of the reason for breaking away from Numeric, and I'm curious how you intend to address it.

Right now, the only requirement for _data seems to be that it implement the PyBufferProcs. For memory mapping something else is needed...

I haven't implemented this, so take it as just my rambling thoughts:

With the addition of 3 new, optional, attributes to the NumArray object interface, I think this could be efficiently accomplished:

    _mapproc
    _mapmin
    _mapmax

If _mapproc is present and not None, then it points to a function whose responsibility it is to set _mapmin and _mapmax appropriately. _mapproc takes one argument which is the desired byte offset into the virtual array. This is probably easier to describe with code:

    def _mapproc(self, offset):
        unmap_the_old_range()
        mmap_a_new_range_that_includes_byteoffset()
        self._mapmin = minimum_of_new_range()
        self._mapmax = maximum_of_new_range()

In this way, when the delta between _mapmin and _mapmax is large enough, the UFuncs could act over a large contiguous portion of the _data array at a time before another remapping is necessary. If the byteoffset that a UFunc needs to work with is outside of _mapmin and _mapmax, it must call _mapproc to remedy the situation.

This puts a lot of work into UFuncs that choose to support this. I suppose that is tough to avoid though.
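The pseudocode above might be fleshed out along these lines using the standard mmap module. This is only a sketch under assumptions: the class name `WindowedMap` and the `byte_at` accessor are invented for illustration, the window is read-only, and `_mapmax` is treated as one past the last mapped byte:

```python
import mmap
import os


class WindowedMap:
    """Hypothetical helper: keeps one window of a file mapped at a time."""

    def __init__(self, path, window_bytes):
        gran = mmap.ALLOCATIONGRANULARITY
        # mmap offsets must be multiples of the allocation granularity,
        # so round the window size up to a multiple of it.
        self._window = max(gran, (window_bytes + gran - 1) // gran * gran)
        self._file = open(path, "rb")
        self._size = os.path.getsize(path)
        self._map = None
        self._mapmin = 0   # first byte offset covered by the current window
        self._mapmax = 0   # one past the last byte offset covered

    def _mapproc(self, offset):
        """Remap so that `offset` falls inside [_mapmin, _mapmax)."""
        if not 0 <= offset < self._size:
            raise IndexError("offset outside file")
        if self._map is not None:
            self._map.close()                      # unmap the old range
        start = offset // self._window * self._window
        length = min(self._window, self._size - start)
        self._map = mmap.mmap(self._file.fileno(), length,
                              access=mmap.ACCESS_READ, offset=start)
        self._mapmin = start
        self._mapmax = start + length

    def byte_at(self, offset):
        """Read one byte, remapping only when the offset leaves the window."""
        if self._map is None or not self._mapmin <= offset < self._mapmax:
            self._mapproc(offset)
        return self._map[offset - self._mapmin]

    def close(self):
        if self._map is not None:
            self._map.close()
        self._file.close()
```

A consumer that walks the file sequentially triggers one remap per window, which is the behaviour the proposal is after; random access across windows would remap on nearly every read.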
Also, there are threading issues to think about here. I don't know if UFuncs are going to release the Global Interpreter Lock, but if they do it's possible that multiple threads could have the same PyObject and try to _mapproc different offsets at different times. It is possible to implement a mutex for the NumArray without requiring anything special from the PyObject that implements it...

-----------------------------

Ok. That's probably way too much content for an Introductory email. I do have more thoughts on this stuff though. They'll just have to wait for another time.

Nice to meet you all,

-Scott Gilbert

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: XArray.py
URL:

From perry at stsci.edu Thu Apr 11 09:02:05 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Apr 11 09:02:05 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020411113152.98373.qmail@web12906.mail.yahoo.com>
Message-ID:

Hi Scott,

I've printed out your message and will try to read and understand it today. It may be a couple days before we can respond, so don't take a lack of an immediate response as disinterest.

Thanks, Perry

From jmiller at stsci.edu Thu Apr 11 14:36:03 2002
From: jmiller at stsci.edu (Todd Miller)
Date: Thu Apr 11 14:36:03 2002
Subject: [Numpy-discussion] slice question and bug
References: <20020401165852.44D3E79B@spiralis.merseine.nu> <15528.44386.160013.936132@spiralis.merseine.nu>
Message-ID: <3CB60188.1010203@stsci.edu>

clee at spiralis.merseine.nu wrote:

>
>clee at spiralis.merseine.nu writes:
> >
> > Hello,
> > I'm trying to track down a segv when I do the B[:] operation on an
> > array, "B", a that I've built in as a view on external data. During...
> > [snip]
>
>To clarify my own somewhat non-sensical post: When I started composing
>my message, I was trying to figure out a bug in my own code that
>caused a crash while doing slice_array. I've since fixed that bug.
>However, in the process of figuring out what I was doing wrong I
>was browsing the Numeric source code. While examining
>PyArray_Free(..) in arrayobject.c, I saw that returns -1 whenever the
>number of dimensions is greater than 2, yet it has code that tests for
>when the number of dimensions equals 3.
>
>So ultimately, my post is just an alert, that I think there might be
>some code that needs to be cleaned up.
>
>Thanks,
> lacking-caffeine-ly yours
> -chris
>

Looking at the code to PyArray_Free, I agree with Chris. Called to free a 2D array, I think that PyArray_Free leaks all of the row storage because ap->nd == 2, not 3:

/* {%c++%} */
extern int PyArray_Free(PyObject *op, char *ptr) {
    PyArrayObject *ap = (PyArrayObject *)op;
    int i, n;

    if (ap->nd > 2) return -1;
    if (ap->nd == 3) {
        n = ap->dimensions[0];
        for (i=0; i<n; i++) {
            free(((char **)ptr)[i]);
        }
    }
    if (ap->nd >= 2) {
        free(ptr);
    }
    Py_DECREF(ap);
    return 0;
}
/* {%c++%} */

Other opinions?

Todd

--
Todd Miller jmiller at stsci.edu
STSCI / SSG (410) 338 4576

From perry at stsci.edu Thu Apr 11 14:57:14 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Apr 11 14:57:14 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To: <20020411113152.98373.qmail@web12906.mail.yahoo.com>
Message-ID:

> [mailto:numpy-discussion-admin at lists.sourceforge.net]On Behalf Of Scott
> Gilbert
> Subject: [Numpy-discussion] Introduction
>
>
> Hello All.
>
> I'm interested in this project, and am curious to what level you are
> willing to accept outside contribution. I just tried to subscribe to
> the developers list, but I didn't realize that required admin approval.
> Hopefully it doesn't look like I was shaking the door without knocking
> first.
>
> Is this list active? Is this the correct place to talk about Numarray?

Sure.

>
> Following your design for the Array stuff, I've been able to implement
> a pretty usable array class that supports the bazillion array types I
> need (Bit, Complex Integer, etc...). This gets me past my core
> requirements without polluting your world, but unfortunately my new
> XArray type doesn't play so well with your UFuncs. I think my users
> will definitely want to use your UFuncs when the time comes, so I want
> to remedy this situation.
>
> The first change I would like to make is to rework your code that
> verifies that an object is a "usable" array. I think NumArray should
> only check for the interface required, not the actual type hierarchy.
> By this I mean that the minimum required to be a supported array type
> is that it support the correct attributes, not that it actually inherit
> from NDArray:
>
> (quoting from your paper) something like:
>
> _data
> _shape
> _strides
> _byteoffset
> _aligned
> _contiguous
> _type
> _byteswap
>
> Most of these are just integer fields, or tuples of integers. Ignoring
> _type for the moment, it appears that the interface required to be a
> NumArray is much less strict than actually requiring it to derive from
> NumArray. If you allow me to change a few functions (inputarray() in
> numarray.py is one small example), I could use my independent XArray
> class almost as is, and moreover I can implement new array objects
> (possibly as extension types) for crazy things like working with page
> aligned memory, memory mapping etc...
>

I guess we are not sure we understand what you mean by interface. In particular, we don't understand why sharing the same object attributes (the private ones you list above) is a benefit to the code you are writing if you aren't also using the low level implementation.
The above attributes are private and nothing external to the Class should depend on or even know about them. Could you elaborate on what you mean by interface and the relationship between your arrays and numarrays?

>
> Well, that's almost enough. The _type field poses a small problem of
> sorts. It looks like you don't require a _type to be derived from
> NumericType, and this is a good thing since it allows me (and others)
> to implement NumArray compatible arrays without actually requiring
> NumArray to be present.
>

What do you mean by NumArray compatible?

[some issues snipped since we need to understand the interface issue first]

> I don't know if you're trying to get all of NumArray into the Python
> distribution or not, but I suspect a good interim step would be to have
> a PEP that specifies what it means to be a NumArray or NDArray in
> minimal terms. Perhaps supplying an Array only module in Python that
> implements this interface. Again, I'd be willing to help with all of
> this.
>

We are hoping to get numarray into the distribution [it won't be the end of the world for us if it doesn't happen]. I'll warn you that the PEP is out of date. We are likely to update it only after we feel we are close to having the implementation ready for consideration for including into the standard distribution. I would refer to the actual implementation and the design notes for the time being.

>
> -------------------------
>
> Ok, other suggestions...
>
> Here is the list of things that your design document indicates are
> required to be a NumArray:
>
> _data
> _shape
> _strides
> _byteoffset
> _aligned
> _contiguous
> _type
> _byteswap
>
> I believe that one could calculate the values for _aligned and
> _contiguous from the other fields. So they shouldn't really be part of
> the interface required.
I suspect it is useful for the C > implementation of UFuncs to have this information in the NDINfo struct > though, so while I would drop them from attribute interface, I would > delegate the task of calculating these values to getNDInfo() and/or > getNumInfo(). > > I also notice that you chose _byteswap to indicate byteswapping is > needed. I think a better choice would be to specify the endian-ness of > the data (with an _endian attr), and have getNDInfo() and getNumInfo() > calculte the _byteswap value for the NDInfo struct. > > In my implementation, I came up with a slightly different list: > > self._endian > self._offset > self._shape > self._stride > self._itemtype > self._itemsize > self._itemformat > self._buffer > Some of the name changes are worth considering (like replacing ._byteswap with an endian indicator, though I find _endian completely opaque as to what it would mean--1 means what? little or big?). (BTW, we already have _itemsize). _contiguous and _aligned are things we have been considering changing, but I would have to think about it carefully to determine if they really are redundant. > The only minimal differences are that _itemsize allows me to work with > arrays of bytes without having any clue what the underlying type is (in > some cases, _itemtype is "Unknown".) Secondly, I implemented a > "Struct" _itemtype, and _itemformat is useful for for this case. (It's > the same format string that the struct module in Python uses.) > It looks like you are trying to deal with records with these "structs". We deal with records (efficiently) in a completely different way. Take a look at the recarray module. > Also, I specified 0 for _itemsize when the actual items aren't byte > addressable. In my module, this only occurred with the Bit type. I > figured specifying 0 like this could keep a UFunc that isn't Bit aware > from stepping on memory that it isn't allowed to. > Again, we aren't sure how this works with numarray. 
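The suggestion above that _contiguous and the byteswap flag could be derived from the other fields can be sketched in a few lines of Python. This is only an illustration, not numarray code: the function names are made up, the '<' / '>' endian markers are an assumption borrowed from the struct module, strides are assumed to be in bytes, and only C-style (row-major) contiguity is checked. Alignment is left out because it depends on the actual buffer address.

```python
import sys


def is_contiguous(shape, strides, itemsize):
    """Return True if (shape, strides) describe a packed row-major layout.

    Hypothetical helper: strides are byte counts, one per dimension.
    """
    if len(shape) != len(strides):
        raise ValueError("shape and strides must have the same length")
    if 0 in shape:
        return True  # an empty array has no elements to be out of place
    expected = itemsize
    # Walk from the fastest-varying (last) dimension to the slowest.
    for dim, stride in zip(reversed(shape), reversed(strides)):
        # The stride of a length-1 dimension is never used, so ignore it.
        if dim != 1 and stride != expected:
            return False
        expected *= dim
    return True


def needs_byteswap(endian):
    """Return True if data marked '<' or '>' differs from the host order."""
    if endian not in ("<", ">"):
        raise ValueError("endian must be '<' or '>'")
    native = "<" if sys.byteorder == "little" else ">"
    return endian != native
```

With helpers like these, something in the role of getNDInfo() could fill in the contiguous and byteswap fields itself, so an array object would only need to carry shape, strides, item size and an endian marker.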
> ------------------------- > > Next thought: Memory Mapping > > I really like the idea of having Python objects that map huge files a > piece at time without using all of available memory. I've seen this in > NumArray's charter as part of the reason for breaking away from > Numeric, and I'm curious how you intend to address it. > > Right now, the only requirement for _data seems to be that it implement > the PyBufferProcs. For memory mapping something else is needed... > > I haven't implemented this, so take it as just my rambling thoughts: > > With the addition of 3 new, optional, attributes to the NumArray object > interface, I think this could be efficiently accomplished: > > _mapproc > _mapmin > _mapmax > > If _mapproc is present and not None, then it points to a function who's > responsibility it is to set _mapmin and _mapmax appropriately. > _mapproc takes one argument which is the desired byte offset into the > virtual array. This is probably easier to describe with code: > > def _mapproc(self, offset): > unmap_the_old_range() > mmap_a_new_range_that_includes_byteoffset() > self._mapmin = minimum_of_new_range() > self._mapmax = maximum_of_new_range() > > In this way, when the delta between _mapmin and _mapmax is large > enough, the UFuncs could act over a large contiguous portion of the > _data array at a time before another remapping is necessary. If the > byteoffset that a UFunc needs to work with is outside of _mapmin and > _mapmax, it must call _mapproc to remedy the situation. > > This puts a lot of work into UFuncs that choose to support this. I > suppose that is tough to avoid though. > We deal with memory mapping a completely differnent way. It's a bit late for me to go into it in great detail, but we wrap the standard library mmap module with a module that lets us manage memory mapped files. This module basically memory maps an entire file and then in effect mallocs segments of that file as buffer objects. 
This allocation of subsets is needed to ensure that overlapping memory maps buffers don't happen. One can basically reserve part of the memory mapped file as a buffer. Once that is done, nothing else can use that part of the file for another buffer. We do not intend to handle memory maps as a way of sequentially mapping parts of the file to provide windowed views as your code segment above suggests. If you want a buffer that is the whole (large) file, you just get a mapped buffer to the whole thing. (Why wouldn't you?) The above scheme is needed for our purposes because many of our data files contain multiple data arrays and we need a means of creating a numarray object for each one. Most of this machinery has already been implemented, but we haven't released it since our I/O package (for astronomical FITS files) is not yet at the point of being able to use it. > Also, there are threading issues to think about here. I don't know if > UFuncs are going to release the Global Interpreter Lock, but if they do > it's possible that multiple threads could have the same PyObject and > try to _mapproc different offsets at different times. > To tell you the truth, we haven't dealt with the threading issue much. We think about it occasionally, but have deferred dealing with it until we have finished other aspects first. We do want to make it thread safe though. Perry Greenfield From oliphant at ee.byu.edu Thu Apr 11 15:47:04 2002 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 11 15:47:04 2002 Subject: [Numpy-discussion] slice question and bug In-Reply-To: <3CB60188.1010203@stsci.edu> Message-ID: > Looking at the code to PyArray_Free, I agree with Chris. 
Called to
> free a 2D array, I think that PyArray_Free leaks all of the row storage
> because ap->nd == 2, not 3:
>
> /* {%c++%} */
> extern int PyArray_Free(PyObject *op, char *ptr) {
>     PyArrayObject *ap = (PyArrayObject *)op;
>     int i, n;
>
>     if (ap->nd > 2) return -1;
>     if (ap->nd == 3) {
>         n = ap->dimensions[0];
>         for (i=0; i<n; i++) {
>             free(((char **)ptr)[i]);
>         }
>     }
>     if (ap->nd >= 2) {
>         free(ptr);
>     }
>     Py_DECREF(ap);
>     return 0;
> }
> /* {%c++%} */
>
>

This has been broken since the beginning. I believe the documentation says as much. I've never used it because I always think of 2-D arrays as a block of data, not as rows of pointers. It should be fixed, but no one's ever been interested enough to do it.

-Travis Oliphant

From xscottg at yahoo.com Thu Apr 11 21:46:02 2002 From: xscottg at yahoo.com (Scott Gilbert) Date: Thu Apr 11 21:46:02 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: Message-ID: <20020412044201.63373.qmail@web12908.mail.yahoo.com>

--- Perry Greenfield wrote:
>
> I guess we are not sure we understand what you mean by interface.
> In particular, we don't understand why sharing the same object
> attributes (the private ones you list above) is a benefit to the
> code you are writing if you aren't also using the low level
> implementation. The above attributes are private and nothing
> external to the Class should depend on or even know about them.
> Could you elaborate on what you mean by interface and the
> relationship between your arrays and numarrays?
>

There are several places in your code that check to see if you are working with a valid type for NDArrays. Currently this check consists of asking the following questions:

    'Is it a tuple or list?'
    'Is it a scalar of some sort?'
    'Does it derive from our NDArray class?'

If the answer to any of these questions is yes, it does the right thing and moves on. If none of them is true, it raises an exception.
I suppose this is fine if you are only concerned about working with your own implementation of an array type, but I hope you'll consider the following as a minor change that opens up the possibility for other compatible array implementations to work interoperably. Instead have the code ask the following questions:

    'Is it a tuple or list?'
    'Is it a scalar of some sort?'
    'Does it support the attributes necessary to be like an NDArray object?'

This change is very similar to how you can pass in any Python object to the "pickle.dump()" function, and if it supports the "write()" method it will be called:

    >>> class WhoKnows:
    ...     def write(self, x):
    ...         print x
    >>>
    >>> import pickle
    >>>
    >>> w = WhoKnows()
    >>>
    >>> pickle.dump('some data', w)
    S'some data'
    p1
    .

Until reading your response above, I didn't realize that you consider your single underscore attributes to be totally private. In general, I try to use a single underscore to mean protected (meaning you can use them if you REALLY know what you are doing), hence my confusion. With that in mind, pretend that I suggested the following instead:

The specification of an NDArray is that it has the following attributes:

    ndarray_buffer   - a PyObject which has PyBufferProcs
    ndarray_shape    - a tuple specifying the shape of the array
    ndarray_stride   - a tuple specifying the index multipliers
    ndarray_itemsize - an int/long stating the size of items
    ndarray_itemtype - some representation of type

This would be a very minor change to your functions like inputarray(), getNDInfo(), getNDArray(), but it would allow your UFuncs to work with other implementations of arrays.
As an example similar to the pickle example above:

    import array
    class ScottArray:
        def __init__(self):
            self.ndarray_buffer = array.array('d', [0]*100)
            self.ndarray_shape = (10, 10)
            self.ndarray_stride = (80, 8)
            self.ndarray_itemsize = 8
            self.ndarray_itemtype = 'Float64'

    import numarray

    n = numarray.numarray((10, 10), type='Float64')
    s = ScottArray()

    very_cool = numarray.add(n, s)

This example is kind of silly. I mean, why wouldn't I just use numarray for all of my array needs? Well, that's where my world is a little different than yours I think. Instead of using 'array.array()' above, there are times where I'll need to use 'whizbang.array()' to get a different PyBufferProcs supporting object. Or where I'll need to work with a crazy type in one part of the code, but I'd like to pass it to an extension that combines your types and mine.

In these cases where I need "special memory" or "special types" I could try and get you guys to accept a patch, but this would just pollute your project and probably annoy you in general. A better solution is to create a general standard mechanism for implementing NDArray types, and let me make my own.

In the above example, we could have completely different NDArray implementations working interoperably inside of one UFunc. It seems to me that all it really takes to be an NDArray can be specified by a list of attributes like the one above. (Probably need a few more attributes to be really general: 'ndarray_endian', etc...) In the end, NDArrays are just pointers to a buffer, and descriptors for indexing.

I don't believe this would have any significant effect on the performance of numarray. (The efficient fast C code still gets a pointer to work with.) Moreover, I'd be very willing to contribute patches to make this happen.
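The "does it support the necessary attributes?" check proposed above could look something like the following. This is a sketch only: the ndarray_* names are the ones suggested in this message, the helper names are invented for illustration, and nothing here is part of numarray.

```python
import array

# Attribute names as proposed in the message above; not a numarray API.
REQUIRED_ATTRS = (
    "ndarray_buffer",
    "ndarray_shape",
    "ndarray_stride",
    "ndarray_itemsize",
    "ndarray_itemtype",
)


def missing_ndarray_attrs(obj):
    """Return the names of required attributes that obj does not provide."""
    return [name for name in REQUIRED_ATTRS if not hasattr(obj, name)]


def check_ndarray_like(obj):
    """Accept any object exposing the attribute interface, whatever its class."""
    missing = missing_ndarray_attrs(obj)
    if missing:
        raise TypeError("not NDArray-like, missing: " + ", ".join(missing))
    if len(obj.ndarray_shape) != len(obj.ndarray_stride):
        raise ValueError("shape and stride must have the same length")
    return obj


class ScottArray:
    """Stand-alone array type that inherits from nothing in numarray."""

    def __init__(self):
        self.ndarray_buffer = array.array("d", [0] * 100)
        self.ndarray_shape = (10, 10)
        self.ndarray_stride = (80, 8)
        self.ndarray_itemsize = 8
        self.ndarray_itemtype = "Float64"
```

A function in the role of inputarray() could call something like check_ndarray_like() where it currently tests for the NDArray base class; the list and scalar cases would be handled before this check, exactly as they are today.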
If you agree, and we can flesh out what this "attribute interface" should be, then I can start distributing my own array module to the engineers where I work without too much fear that they'll be screwed once numarray is stable and they want to mix and match. Code always lives a lot longer than I want it to, and if I give them something now which doesn't work with your end product, I'll have done them a disservice. BTW: Allowing other types to fill in as NDArrays also allows other types to implement things like slicing as they see fit (slice and copy contiguious, slice and copy on write, slice and copy by reference, etc...). > > We are hoping to get numarray into the distribution [it won't be the > end of the world for us if it doesn't happen]. I'll warn you that the > PEP is out of date. We are likely to update it only after we feel > we are close to having the implementation ready for consideration > for including into the standard distribution. I would refer to the > actual implementation and the design notes for the time being. > Yeah, I recognize that the PEP is gathering dust at the moment. I'm not having too much trouble following through the source and design docs. It took me a few days to "get it", but that's probably because I'm slower than your average bear. :-) Regarding the PEP, what I would like to see happen is that if we agree that the "attribute interface" stuff above is the right way to go about things, I would (or we would) submit a milder interim PEP specifying what those attributes are, how they are to be interpreted, and a simple Python module implementing a general NDArray class for consumption. Hopefully this PEP would specify a canonical list of type names as well. Then we could make updates to the other PEP if necessary. > > Some of the name changes are worth considering (like replacing ._byteswap > with an endian indicator, though I find _endian completely opaque as to > what it would mean--1 means what? little or big?). 
(BTW, we already have > _itemsize). _contiguous and _aligned are things we have been considering > changing, but I would have to think about it carefully to determine if > they really are redundant. > It's all open for discussion, but I would propose that ndarray_endian be one of: '>' - big endian '<' - little endian This is how the standard Python struct module specifies endian, and I've been trying to stay consistant with the baseline when possible. > > It looks like you are trying to deal with records with these "structs". > We deal with records (efficiently) in a completely different way. Take > a look at the recarray module. > Will definitely do. I've called them structs simply because they borrow their format string from the struct module that ships with Python. I'm not hung up on the name, and I wouldn't object to an alias. Too early for me to tell if there is even a difference in the underlying memory, but maybe we'll end up with 'structs' for my notion of things, and 'records' for yours. > > We deal with memory mapping a completely different way. It's a bit late > for me to go into it in great detail, but we wrap the standard library > mmap module with a module that lets us manage memory mapped files. > This module basically memory maps an entire file and then in effect > mallocs segments of that file as buffer objects. This allocation of > subsets is needed to ensure that overlapping memory maps buffers > don't happen. One can basically reserve part of the memory mapped file > as a buffer. Once that is done, nothing else can use that part of the > file for another buffer. We do not intend to handle memory maps as a > way of sequentially mapping parts of the file to provide windowed views > as your code segment above suggests. If you want a buffer that is the > whole (large) file, you just get a mapped buffer to the whole thing. > (Why wouldn't you?) 
>

I think the idea of taking a 500 megabyte (or 5 gigabyte) file and windowing 1 meg of actual memory at a time is pretty attractive. Sometimes we do very large correlations, and there just isn't enough memory to mmap the whole file (much less two files for correlation). Any library that doesn't want to support this business could just raise a NotImplemented error on encountering such an array.

Maybe I shouldn't be calling this "memory mapping". Even though it could be implemented on top of mmap, truthfully I just want to support a "windowing" interface. If we could specify the windowing attributes and indicate the standard usage that would be great. Maybe:

    ndarray_window(self, offset)
    ndarray_winmin
    ndarray_winmax

>
> The above scheme is needed for our purposes because many of our data files
> contain multiple data arrays and we need a means of creating a numarray
> object for each one. Most of this machinery has already been implemented,
> but we haven't released it since our I/O package (for astronomical FITS
> files) is not yet at the point of being able to use it.
>

There is a group at my company that is using FITS for some stuff. I don't know enough about it to comment though...

Cheers,
    -Scott

From perry at stsci.edu Fri Apr 12 17:44:04 2002 From: perry at stsci.edu (Perry Greenfield) Date: Fri Apr 12 17:44:04 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: <20020412044201.63373.qmail@web12908.mail.yahoo.com> Message-ID:

Scott Gilbert writes:

> import array
> class ScottArray:
>     def __init__(self):
>         self.ndarray_buffer = array.array('d', [0]*100)
>         self.ndarray_shape = (10, 10)
>         self.ndarray_stride = (80, 8)
>         self.ndarray_itemsize = 8
>         self.ndarray_itemtype = 'Float64'
>
> import numarray
>
> n = numarray.numarray((10, 10), type='Float64')
> s = ScottArray()
>
> very_cool = numarray.add(n, s)
>

But why not (I may have some details wrong, I'm doing this from memory, and I haven't worked on it myself in a bit):

    import array
    import numarray
    import memory  # comes with numarray

    class ScottArray(numarray.NumArray):
        def __init__(self):
            # create necessary buffer obj
            buf = memory.writeable_buffer(array.array('d', [0]*100))
            numarray.NumArray.__init__(self, shape=(10, 10),
                                       type=numarray.Float64,
                                       buffer=buf)
            # _strides not settable from constructor yet, but currently
            # if you needed to set it:
            # self._strides = (80, 8)
            # But for this case it would be computed automatically from
            # the supplied shape

    n = numarray.numarray((10, 10), type='Float64')
    s = ScottArray()

    maybe_not_quite_so_cool_but_just_as_functional = n + s

> This example is kind of silly. I mean, why wouldn't I just use numarray
> for all of my array needs? Well, that's where my world is a little
> different than yours I think. Instead of using 'array.array()' above,
> there are times where I'll need to use 'whizbang.array()' to get a
> different PyBufferProcs supporting object. Or where I'll need to work
> with a crazy type in one part of the code,
> but I'd like to pass it to an extension that combines your types and mine.
> > In these cases where I need "special memory" or "special types" I > could try and > get you guys to accept a patch, but this would just pollute your > project and > probably annoy you in general. A better solution is to create a general > standard mechanism for implementing NDArray types, and let me make my own. > >From everything I've seen so far, I don't see why you can't just create a NumArray object directly. You can subclass it (and use multiple inheritance if you need to subclass a different object as well) and add whatever customized behavior you want. You can create new kinds of objects as buffers just so long as you satisfy the buffer interface. > > In the above example, we could have completely different NDArray > implementations working interoperably inside of one UFunc. It > seems to me that > all it really takes to be an NDArray can be specified by a list > of attributes > like the one above. (Probably need a few more attributes to be > really general: > 'ndarray_endian', etc...) In the end, NDArrays are just pointers > to a buffer, > and descriptors for indexing. > Again, why not just create an NDArray object with the appropriate buffer object and attributes (subclassing if necessary). > > I don't believe this would have any significant affect on the > performance of > numarray. (The efficient fast C code still gets a pointer to > work with.) More > over, I'd be very willing to contribute patches to make this happen. > > > If you agree, and we can flesh out what this "attribute > interface" should be, > then I can start distributing my own array module to the > engineers where I work > without too much fear that they'll be screwed once numarray is > stable and they > want to mix and match. > > Code always lives a lot longer than I want it to, and if I give > them something > now which doesn't work with your end product, I'll have done them > a disservice. > All good in principle, but I haven't yet seen a reason to change numarray. 
As far as I can tell, it provides all you need exactly as it is. If you could give an example that demonstrated otherwise... > > It's all open for discussion, but I would propose that > ndarray_endian be one > of: > > '>' - big endian > '<' - little endian > > This is how the standard Python struct module specifies endian, > and I've been > trying to stay consistant with the baseline when possible. > To tell you the truth, I'm not crazy about how the struct module handles types or attributes. It's generally far too cryptic for my tastes. Other than providing backward compatibility, we aren't interested in it emulating struct. > > > > The above scheme is needed for our purposes because many of our > data files > > contain multiple data arrays and we need a means of creating a numarray > > object for each one. Most of this machinery has already been > implemented, > > but we haven't released it since our I/O package (for astronomical FITS > > files) is not yet at the point of being able to use it. > > > > I could well misundertand, but I thought that if you mmap a file in unix in write mode, you do not use up the virtual memory as limited by the physical memory and the paging file. Your only limit becomes the virtual address space available to the processor. If the 32 bit address is your problem, you are far, far better off using a 64-bit processor and operating system than trying to kludge up a windowing memory mechanism. I could see a way of doing it for ufuncs, but the numeric world (and I would think the DSP world as well) needs far more than element-by-element array functionality. providing a usable C-api for that kind of memory model would be a nightmare. But I'm not sure if this or the page file is your limitation. 
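For readers trying to picture the whole-file scheme described earlier in this thread (map the file once, then hand out non-overlapping segments as buffers), here is a rough sketch built only on the standard mmap module. It is not the unreleased numarray module: the class and method names are hypothetical, it is written for current Python rather than the 2.x of this thread, and it assumes a non-empty file opened for update.

```python
import mmap


class MappedFile:
    """Map a whole file once and reserve non-overlapping byte ranges."""

    def __init__(self, path):
        self._file = open(path, "r+b")
        # Length 0 means "map the entire file".
        self._map = mmap.mmap(self._file.fileno(), 0)
        self._view = memoryview(self._map)
        self._ranges = []    # (start, stop) pairs already handed out
        self._segments = []  # views handed out, released again in close()

    def reserve(self, start, nbytes):
        """Return a writable buffer for [start, start + nbytes)."""
        stop = start + nbytes
        if start < 0 or nbytes <= 0 or stop > len(self._map):
            raise ValueError("segment lies outside the mapped file")
        for lo, hi in self._ranges:
            if start < hi and lo < stop:
                raise ValueError("segment overlaps an existing buffer")
        segment = self._view[start:stop]  # no copy, shares the mapping
        self._ranges.append((start, stop))
        self._segments.append(segment)
        return segment

    def close(self):
        # The mapping cannot be closed while views onto it are alive.
        for segment in self._segments:
            segment.release()
        self._view.release()
        self._map.close()
        self._file.close()
```

Each reserved segment supports the buffer protocol, so an array object could be built on top of one of them while a second array uses a different region of the same file. Whether the operating system keeps the whole mapping resident is left entirely to its paging policy, which is the trade-off being debated above.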
Perry From kragen at pobox.com Sat Apr 13 00:25:01 2002 From: kragen at pobox.com (Kragen Sitaker) Date: Sat Apr 13 00:25:01 2002 Subject: [Numpy-discussion] segfault in Numpy esxtension Message-ID: <20020413072433.2702DBDC1@panacea.canonical.org> (All of the below is with regard to Numeric 20.2.0.) For a consulting client, I wrote a extension module that does the equivalent of sum(take(a, b)), but without the temporary result in between. I was surprised that when I tried to .resize() the result of this routine, I got a segmentation fault and a core dump. It was crashing at this line in arrayobject.c: if (memcmp(self->descr->zero, all_zero, elsize) == 0) { self->descr, in this case, was the type description for arrays of type "double". It seems that self->descr->zero was 0, as in a null pointer, not a pointer to a location containing (double)0, and this was causing it to crash. It looks like the .zero fields of the type descriptions (which live in arraytypes.c and _numpy.so) are initialized to be null pointers, and only when the initmultiarray() function in multiarraymodule.c is run are these pointers set to point to actual zeroes somewhere in allocated memory. I guess Numeric.py imports multiarray.so, which calls initmultiarray(), so the solution for me was to make sure I import Numeric before importing my module (or at least before resizing arrays produced by my module). But, to my mind, this segfault is a bug --- importing a module that follows all the rules shouldn't put Python in a state that's so dangerously inconsistent that innocent things like .resize() can crash it. Maybe the same .so file that includes the actual data items should be responsible for initializing them --- especially since import_array() imports _numpy without importing multiarray. (I assume there's a reason it wasn't done this way in the first place.) What do other people think? 
-- /* By Kragen Sitaker, http://pobox.com/~kragen/puzzle4.html */ char b[2][10000],*s,*t=b,*d,*e=b+1,**p;main(int c,char**v){int n=atoi(v[1]); strcpy(b,v[2]);while(n--){for(s=t,d=e;*s;s++){for(p=v+3;*p;p++)if(**p==*s){ strcpy(d,*p+2);d+=strlen(d);goto x;}*d++=*s;x:}s=t;t=e;e=s;*d++=0;}puts(t);} From xscottg at yahoo.com Sat Apr 13 03:09:04 2002 From: xscottg at yahoo.com (Scott Gilbert) Date: Sat Apr 13 03:09:04 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: Message-ID: <20020413100823.45837.qmail@web12907.mail.yahoo.com> --- Perry Greenfield wrote: > Scott Gilbert writes: [...] > > > > very_cool = numarray.add(n, s) > > > But why not (I may have some details wrong, I'm doing this > from memory, and I haven't worked on it myself in a bit): > [...] > > maybe_not_quite_so_cool_but_just_as_functional = n + s > [...] > > From everything I've seen so far, I don't see why you can't > just create a NumArray object directly. You can subclass it > (and use multiple inheritance if you need to subclass a different > object as well) and add whatever customized behavior you want. > You can create new kinds of objects as buffers just so long > as you satisfy the buffer interface. > Your point about the optional buffer parameter to the NumArray is well taken. I had seen that when looking through the code, but it slipped my mind for that example. I could very well be wrong about some of these other reasons too... I have a number of reasons listed below for wanting the standard that Python adopts to specify only the interface and not the implementation. You may not find all of these pursuasive, and I apologize in advance if any looks like a criticism. (In my limited years as a professional software developer, I've found that the majority of people can be very defensive and protective of their code. I've been trying to tread lightly, but I don't know if I'm succeeding.) 
However if any of these reasons is persuasive, keep in mind that the actual changes I'm proposing are pretty minimal in scope. And that I'd be willing to submit patches so as to reduce any inconvenience to you. (Not that you have any reason to believe I can code my way out of a box... :-) Ok, here's my list: Philosophical You have a proposal in to the Python guys to make Numarray into the standard _implementation_. I think standards like this should specify an _interface_, not an implementation. Simplicity I can give my users a single XArray.py file, and they can be off and running with something that works right then and there, and it could in many ways be compatible with Numarray (with some slight modifications) when they decide they want the extra functionality of extension modules that you or anyone else who follows your standard provides. But they don't have to compile anything until they really need to. Your implementation leaves me with all or nothing. I'll have to build and use numarray, or I've got an in house only solution. Expediency I want to see a usable standard arise quickly. If you maintain the stance that we should all use the Numarray implementation, instead of just defining a good Numarray interface, everyone has to wait for you to finish things enough to get them accepted by the Python group. Your implementation is complicated, and I suspect they will have many things that they will want you to change before they accept it into their baseline. (If you think my list of suggestions is annoying, wait until you see theirs!) If a simple interface protocol is presented, and a simple pure Python module that implements it. The PEP acceptance process might move along quickly, but you could take your time with implementing your code. Pragmatic You guys aren't finished yet, and I need to give my users an array module ASAP. As such a new project, there are likely to be many bugs floating around in there. 
I think that when you are done, you will probably have a very good library. Moreover, I'm grateful that you are making it open source. That's very generous of you, and the fact that you are tolerating this discussion is definitely appreciated. Still, I can't put off my projects, and I can't task you to work faster. However, I do think we could agree in a very short term that your design for the interface is a good one. I also think that we (or just me if you like) could make a much smaller PEP that would be more readily accepted. Then everyone in this community could proceed at their own pace - knowing that if we followed the simple standard we would have inter operability with each other. Social Normally I wouldn't expect you to care about any of my special issues. You have your own problems to solve. As I said above, it's generous of you to even offer your source code. However, you are (or at least were) trying to push for this to become a standard. As such, considering how to be more general and apply to a wider class of problems should be on your agenda. If it's not, then you shouldn't be creating the standard. If you don't care about numarray becoming standard, I would like to try my hand at submitting the slightly modified version of your design. I won't be compatible with your stuff, but hopefully others will follow suit. Functionality Data Types I have needs for other types of data that you probably have little use for. If I can't coerce you to make a minor change in specification, I really don't think I could coerce you to support brand new data types (complex ints is the one I've beaten to death, because I could use that one in the short term). What happens when someone at my company wants quaternions? I suspect that you won't have direct support for those. 
I know that numarray is supposed to be extensible, but the following raises an exception:

    from numarray import *

    class QuaternionType(NumericType):
        def __init__(self):
            NumericType.__init__(self, "Quaternion", 4*8, 0)

    Quaternion = QuaternionType()

    # BOOM!
    q = array(shape=(10, 10), type=Quaternion)

Maybe I'm just doing something wrong, but it looks like your code wants "Quaternion" to be in your (private?) typeConverters dictionary.

Ok, try two:

    from numarray import *

    q = NDArray(shape=(10, 10), itemsize=4*8)

    if q[5][5] is None:
        print "No boom, but what can I do with it?"

Maybe this is just a documentation problem. On the other hand, I can do the following pretty readily:

    import array

    class Quat2D:
        def __init__(self, *shape):
            assert len(shape) == 2
            self._buffer = array.array('d', [0])*shape[0]*shape[1]*4
            # strides are counted in doubles: 4 per quaternion,
            # 4*shape[1] per row
            self._shape, self._stride = tuple(shape), (4*shape[1], 4)
            self._itemsize = 4*8

        def __getitem__(self, sub):
            assert isinstance(sub, tuple) and len(sub) == 2
            offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
            return tuple([self._buffer[offset + i] for i in range(4)])

        def __setitem__(self, sub, val):
            assert isinstance(sub, tuple) and len(sub) == 2
            offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
            for i in range(4):
                self._buffer[offset + i] = val[i]
            return val

    q = Quat2D(10, 10)
    q[5, 5] = (1, 2, 3, 4)
    print q[5, 5]

This isn't very general, but it is short, and it makes a good example. If they get half of their data from calculations using Numarray, and half from whatever I provide them, and then try to mix the results in an extension module that has to know about separate implementations, life is more complicated than it should be.

Operations

I'm going to have to write my own C extension modules for some high performance operations. All I need to get this done is a void* pointer, the shape, stride, itemsize, itemtype, and maybe some other things to get off and running.
You have a growing framework, and you have already indicated that you think of your hidden variables as private. I don't think I or my users should have to understand the whole UFunc framework and API just to create an extension that manipulates a pointer to an array of doubles. Arrays are simpler than UFuncs. I consider them to be pretty seperable parts of your design. If you keep it this way, and it becomes the standard, it seems that I and everyone else will have to understand both parts in order to create an extension module. Flexibility Numarray is going to make a choice of how to implement slicing. My guess is that it will be one of "copy contiguous", "copy on write", "copy by reference". I don't know what the correct choice is, but I know that someone else will need something different based on context. Things like UFuncs and other extension modules that do fast C level calculations typically don't need to concern themselves with slicing behaviour. Design Your implementation would be similar to having the 'pickle' module require you to derive from a 'Pickleable' base class - instead of simply providing __getstate__ and __setstate__ methods. It's an artificial constraint, and those are usually bad. > > All good in principle, but I haven't yet seen a reason to change > numarray. As far as I can tell, it provides all you need exactly > as it is. If you could give an example that demonstrated otherwise... > Maybe you're right. I suspect you as the author will come up with the quick example that shows how to implement my bizarre quaternion example above. I'm not sure if this makes either of us right or wrong, but if you're not buying any of this, then it's probably time for me to chock this off to a difference in opinion and move on. Truthfully this is taking me pretty far from my original tack. Originally I had simply hoped to hack a couple of things into arraymodule.c, and here I am now trying to get a simpler standard in place. 
I'll try one last time to convince you with the following two statements:

  - Changing such that you only require the interface is a subtle,
    but noticeable, improvement to your otherwise very good design.

  - It's not a difficult change.

If that doesn't compel you, at least I can walk away knowing I tried. For
the volumes I've written, this will probably be my last pesky message if
you really don't want to budge on this issue.

>
> To tell you the truth, I'm not crazy about how the struct module
> handles types or attributes. It's generally far too cryptic for
> my tastes. Other than providing backward compatibility, we aren't
> interested in it emulating struct.
>

I consider it a lot like regular expressions. I cringe when I see someone
else's, but I don't have much difficulty putting them together.

The alternative of coming up with a different specifier for records/structs
is probably a mistake now that the struct module already has its (terse)
format specification. Once that is taken into consideration, following all
the leads of the struct module makes sense to me.

>
> I could well misunderstand, but I thought that if you mmap a file
> in unix in write mode, you do not use up the virtual memory as
> limited by the physical memory and the paging file. Your only
> limit becomes the virtual address space available to the processor.
>

Regarding efficiency, it depends on the implementations, which vary
greatly, and there are other subtleties. I've already written a book
above, so I won't tire you with details. I will say that closing a large
memory mapped file on top of NFS can be dreadful. It probably takes the
same amount of total time or less, but from an interactive analysis point
of view it's pretty unpleasant, on Tru64 at least.

Also, just mmapping the whole file puts all of the memory use at the
discretion of the OS. I might have a gig or two to work with, but if mmap
takes them all, other threads will have to contend for memory.
The system (application) as a whole might very well run better if I can
retain some control over this.

I'm not married to the windowing suggestion. I think it's something to
consider, but it might not be a common enough case to try and make a
standard mechanism for. If there isn't a way to do it without a kluge,
then I'll drop it. Likewise if a simple strategy can't meet anyone's real
needs.

>
> If the 32 bit address is your problem, you are far, far better off
> using a 64-bit processor and operating system than trying to kludge up
> a windowing memory mechanism.
>

We don't always get to specify what platform we want to run on. Our
customer has other needs, and sometimes hardware support for exotic
devices dictates what we'll be using. Frequently it is on 64 bit Alphas,
but sometimes the requirement is x86 Linux, or 32 bit Solaris.

Finally, our most frustrating piece of legacy software was written in
Fortran assuming you could stuff a pointer into an INT*4 and now requires
the -taso flag to the compiler for all new code (which turns a sexy 64 bit
Alpha into a 32 bit kluge...).

Also, much of our data comes on tapes. It's not easy to memory map those.

>
> I could see a way of doing it for
> ufuncs, but the numeric world (and I would think the DSP world
> as well) needs far more than element-by-element array functionality.
> providing a usable C-api for that kind of memory model would be
> a nightmare. But I'm not sure if this or the page file is your
> limitation.
>

I would suggest that any extension module which is not interested in this
feature simply raise a NotImplemented exception of some sort. UFuncs could
fall into this camp without any criticism from me. All it would have to do
is check if the 'window_get' attribute is a callable, and punt an
exception.

My proposal wasn't necessarily to map in a single element at a time.
If the C extension was willing to work these beasts at all, it would check to see if the offset it wanted was between window_min and window_max. If it wasn't, then it would call ob.window_get(offset), and the Python object could update window_min and window_max however it sees fit. For instance by remapping 10 or 20 megabytes on both sides. This particular implementation would allow us to do correlations of a small (mega sample) chunk of data against a HUGE (giga sample) file. This might be the wrong interface, and I'm willing to listen to a better suggestion. It might also be too special of a need to detract from a simpler overall design. Also, there are other uses for things like this. It could possibly be used to implement sparse arrays. It's probably not the best implementation of that, but it could hide a dict of set data points, and present it to an extension module as a complete array. Cheers, -Scott Gilbert __________________________________________________ Do You Yahoo!? Yahoo! Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From perry at stsci.edu Sat Apr 13 18:43:02 2002 From: perry at stsci.edu (Perry Greenfield) Date: Sat Apr 13 18:43:02 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: <20020413100823.45837.qmail@web12907.mail.yahoo.com> Message-ID: > Ok, here's my list: > > Philosophical > > You have a proposal in to the Python guys to make Numarray into the > standard _implementation_. I think standards like this should specify > an _interface_, not an implementation. > Sure (though there is often more to a standard than just an interface, but certainly an implementation is generally not the standard). I'm not sure why you think we imply the implementation is the standard. We are waiting to rewrite the PEP when we are closer to having the implementation ready, but we've been very open about the design and have asked for input on it for a long time now. 
> Simplicity > > I can give my users a single XArray.py file, and they can be off and > running with something that works right then and there, and it could in > many ways be compatible with Numarray (with some slight modifications) > when they decide they want the extra functionality of extension modules > that you or anyone else who follows your standard provides. But they > don't have to compile anything until they really need to. > > Your implementation leaves me with all or nothing. I'll have to build > and use numarray, or I've got an in house only solution. > Hard to comment on this. > Expediency > > I want to see a usable standard arise quickly. If you maintain the > stance that we should all use the Numarray implementation, instead of > just defining a good Numarray interface, everyone has to wait for you > to finish things enough to get them accepted by the Python group. Your > implementation is complicated, and I suspect they will have many things > that they will want you to change before they accept it into their > baseline. (If you think my list of suggestions is annoying, wait until > you see theirs!) > I have the strong sense you misunderstand how the process works. Guido will be driven in large part by the acceptance or non-acceptance of the Numeric community. If they don't buy into it. It won't be part of the standard. If it won't be used by many, it won't be part of the standard. Yes, he will review the design and interface to see if there should be a long term commitment by the Python maintainers to have it in the standard library. We have sent him the design documents, and we do keep him informed. He has given us feedback about it. But for the most part, the judgement is going to be by the Numeric community. > If a simple interface protocol is presented, and a simple pure Python > module that implements it. The PEP acceptance process might move along > quickly, but you could take your time with implementing your code. 
>
> Pragmatic
>
> You guys aren't finished yet, and I need to give my users an array
> module ASAP. As such a new project, there are likely to be many bugs
> floating around in there. I think that when you are done, you will
> probably have a very good library. Moreover, I'm grateful that you are
> making it open source. That's very generous of you, and the fact that
> you are tolerating this discussion is definitely appreciated.
>
> Still, I can't put off my projects, and I can't task you to
> work faster.
>
>
> However, I do think we could agree in a very short term that your design
> for the interface is a good one. I also think that we (or just me if you
> like) could make a much smaller PEP that would be more readily accepted.
> Then everyone in this community could proceed at their own pace - knowing
> that if we followed the simple standard we would have interoperability
> with each other.
>

I think we still don't understand what you need yet. More elaboration on
that later.

> Social
>
> Normally I wouldn't expect you to care about any of my special issues.
> You have your own problems to solve. As I said above, it's generous of
> you to even offer your source code.
>
> However, you are (or at least were) trying to push for this to become a
> standard. As such, considering how to be more general and apply to a
> wider class of problems should be on your agenda. If it's not, then you
> shouldn't be creating the standard.
>

Pleeease. Just because a library developer doesn't happen to meet your
needs doesn't mean it can't be part of the standard library. There are
plenty of modules in the standard library that could have been made more
general in some way, but there they are. The criterion is whether it
solves problems for a large community of users, not whether it is
infinitely extensible and so on. Software development is full of
trade-offs and that includes limits to generalization. Sure we can discuss
whether things could be made more general or not.
But because you want it more general doesn't mean we just say "Sure, you define everything!" > If you don't care about numarray becoming standard, I would like to try > my hand at submitting the slightly modified version of your design. I > won't be compatible with your stuff, but hopefully others will follow > suit. > You are free to propose your own standard at any time. No one will stop you from doing so. > Functionality > > Data Types > > I have needs for other types of data that you probably have little use > for. If I can't coerce you to make a minor change in specification, I > really don't think I could coerce you to support brand new data types > (complex ints is the one I've beaten to death, because I > could use that > You are right on complex ints (that we won't consider them). One could take numarray and add them if one wanted and have a more extended version. But we won't do it, and we wouldn't support as being in what we maintain. It's one of those trade offs. > one in the short term). What happens when someone at my company wants > quaternions? I suspect that you won't have direct support for those. > I know that numarray is supposed to be extensible, but the following > raises an exception: > > from numarray import * > > class QuaternionType(NumericType): > def __init__(self): > NumericType.__init__(self, "Quaternion", 4*8, 0) > > Quaternion = QuaternionType() # BOOM! > > q = array(shape=(10, 10), type=Quaternion) > > Maybe I'm just doing something wrong, but it looks like your code > wants "Quaternion" to be in your (private?) typeConverters dictionary. > Yep, and there's a good reason for that. Just spend a few minutes thinking about the role types play with array packages and how they have traditionally been implemented. Generally speaking, it is presumed that any two numeric types may be used in a binary operator. So you, Scott, define your special type, Quaternions. 
You will need to provide the module all the machinery for knowing what to
do with all the other numeric types available. You may not care, but it is
a requirement that numarray (and Numeric) know what to do. If that doesn't
fit in with your needs, then you shouldn't be trying to use it.

The problem is worse than that. You supply a Quaternion type extension to
numarray, and Bob supplies a super long int type (64 bytes!) also. Both of
you have gone to the trouble of giving numarray the means of handling all
other default numarray types. But you don't know how to handle each other.
How do you solve that problem? I don't know. If you do, let us know.

Given the requirements, adding new numeric types is not going to allow
independent extensions to work with each other. That's fairly limiting,
but that's the price that is paid for the feature.

> Ok, try two:
>
>     from numarray import *
>
>     q = NDArray(shape=(10, 10), itemsize=4*8)
>
>     if q[5][5] is None:
>         print "No boom, but what can I do with it?"
>
> Maybe this is just a documentation problem. On the other hand, I can
> do the following pretty readily:
>
>     import array
>     class Quat2D:
>         def __init__(self, *shape):
>             assert len(shape) == 2
>             self._buffer = array.array('d', [0])*shape[0]*shape[1]*4
>             self._shape, self._stride = tuple(shape), (4*shape[1], 4)
>             self._itemsize = 4*8
>
>         def __getitem__(self, sub):
>             assert isinstance(sub, tuple) and len(sub) == 2
>             offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
>             return tuple([self._buffer[offset + i] for i in range(4)])
>
>         def __setitem__(self, sub, val):
>             assert isinstance(sub, tuple) and len(sub) == 2
>             offset = sub[0]*self._stride[0] + sub[1]*self._stride[1]
>             for i in range(4): self._buffer[offset + i] = val[i]
>             return val
>
>     q = Quat2D(10, 10)
>     q[5, 5] = (1, 2, 3, 4)
>     print q[5, 5]
>
> This isn't very general, but it is short, and it makes a good example.
>

I'm not sure what it proves.
If all you need is an array to store some kind of type, be able to index
and slice it, and not provide numeric operations, by all means use the
existing array module; it does that fine. It's more work to subclass
NDArray, but it can do it too, and gives you more capabilities (you won't
be able to use index arrays or broadcasting in the array module, for
example). The extra functionality comes at some price. Sure, it isn't as
simple to extend. It's your choice if it is worth it or not. If you want
to add your large quaternion array efficiently, then the array module is
worthless. Your example shows nothing about what your real needs for the
object are.

> If they get half of their data from calculations using Numarray, and
> half from whatever I provide them, and then try to mix the results in
> an extension module that has to know about separate implementations,
> life is more complicated than it should be.
>

It's how you intend to 'mix' these that I have no clue about.

> Operations
>
> I'm going to have to write my own C extension modules for some high
> performance operations. All I need to get this done is a void* pointer,
> the shape, stride, itemsize, itemtype, and maybe some other things to
> get off and running. You have a growing framework, and you have already
> indicated that you think of your hidden variables as private. I don't
> think I or my users should have to understand the whole UFunc framework
> and API just to create an extension that manipulates a pointer to an
> array of doubles.
>

Sigh. No one said you had to understand the ufunc framework to do so.
We are working on a C API that just gives you a simple pointer (it's
actually available now, but we aren't going to tout it until we have
better documentation).

> Arrays are simpler than UFuncs. I consider them to be pretty separable
> parts of your design.
> If you keep it this way, and it becomes the
> standard, it seems that I and everyone else will have to understand
> both parts in order to create an extension module.
>

Wrong.

> Flexibility
>
> Numarray is going to make a choice of how to implement slicing. My guess
> is that it will be one of "copy contiguous", "copy on write", "copy by
> reference". I don't know what the correct choice is, but I know that
> someone else will need something different based on context. Things like
> UFuncs and other extension modules that do fast C level calculations
> typically don't need to concern themselves with slicing behaviour.
>

And they don't.

> Design
>
> Your implementation would be similar to having the 'pickle' module
> require you to derive from a 'Pickleable' base class - instead of simply
> providing __getstate__ and __setstate__ methods.
>
> It's an artificial constraint, and those are usually bad.
>

You say. You are quite welcome to do your own implementation that doesn't
have this 'artificial' constraint.

After all your text I *still* don't understand how you intend to use the
'interface' of the private attributes. You haven't provided any example
(let alone a compelling one) of why we should accept any object that
provides those attributes. Shouldn't the object also provide all the
public methods? Shouldn't it also provide indexing and so forth?

All in all you are talking about checking quite a few attributes to make
sure the object has the interface. And even if it does, *why* in the world
would we presume that the C functions used by numarray would work properly
with the object you provide?

I really don't have a clue as to what you are getting at here, and without
some real concrete example illustrating this point, I don't think there is
any point to continuing this discussion.

> >
> > All good in principle, but I haven't yet seen a reason to change
> > numarray. As far as I can tell, it provides all you need exactly
> > as it is.
If you could give an example that demonstrated otherwise... > > > > Maybe you're right. I suspect you as the author will come up with the > quick example that shows how to implement my bizarre quaternion example > above. I'm not sure if this makes either of us right or wrong, but if > you're not buying any of this, then it's probably time for me to chock > this off to a difference in opinion and move on. > > Truthfully this is taking me pretty far from my original tack. Originally > I had simply hoped to hack a couple of things into arraymodule.c, and here > I am now trying to get a simpler standard in place. I'll try one > last time > to convince you with the following two statements: > > - Changing such that you only require the interface is a subtle, > but noticeable, improvement to your otherwise very good design. > > - It's not a difficult change. > > > If that doesn't compel you, at least I can walk away knowing I tried. For > the volumes I've written, this will probably be my last pesky message if > you really don't want to budge on this issue. > We're not going to budge until you show us what the hell you are talking about. > > The alternative of coming up with a different specifier for > records/structs > is probably a mistake now that the struct module already has it's (terse) > format specification. Once that is taken into consideration, > following all > the leads of the struct module makes sense to me. > Again, you are free to do your own, or fork our numarray and do it the way you want. Or do your own from scratch. Or whatever. > [...] > Also, just mmaping the whole file puts all of the memory use at the > discretion of the OS. I might have a gig or two to work with, but if mmap > takes them all, other threads will have to contend for memory. The system > (application) as a whole might very well run better if I can retain some > control over this. > > > I'm not married to the windowing suggestion. 
I think it's something to > consider, but it might not be a common enough case to try and make a > standard mechanism for. If there isn't a way to do it without a kluge, > then I'll drop it. Likewise if a simple strategy can't meet anyone's real > needs. > You can forget our doing it. It's out of the question for us. > > > > If the 32 bit address is your problem, you are far, far better off > > using a 64-bit processor and operating system than trying to kludge up > > a windowing memory mechanism. > > > > We don't always get to specify what platform we want to run on. Our > customer has other needs, and sometimes hardware support for > exotic devices > dictate what we'll be using. Frequently it is on 64 bit Alphas, but > sometimes the requirement is x86 Linux, or 32 bit Solaris. > > Finally, our most frustrating piece of legacy software was written in > Fortran assuming you could stuff a pointer into an INT*4 and now requires > the -taso flag to the compiler for all new code (which turns a sexy 64 bit > Alpha into a 32 bit kluge...). > You may have customers with unreasonable demands. We don't have to let them cause an incredible complication in the underlying machinery. (And we won't). And we won't make it work on Windows 3.1 either. We have to draw the line somewhere. Your customers will pay dearly (and you will benefit :-). > Also, much of our data comes on tapes. It's not easy to memory map those. > Your point being? > > > [...] This doesn't seem to be going anywhere. If you can give us a better idea of how your interface needs would be used, at least we could respond to the specific issues. But we don't understand and although we are considering some changes, I'm not going to fold in your requests until we do understand. You may not be happy with the progress we are making either. Sorry, I can't help that. If you need something sooner, you'll need to do something else. Come up with your own system and try to get it into Python. 
Take numarray and do it the way you think it ought to be done and at the rate you think it should be done. You're welcome to. Take the array module and use that as a basis. We'd like numarray to be part of the standard. We'd like it to be the standard package in the Numeric community. But if neither happened, we'd still be working on it. We need it for our own work. Numeric doesn't give us the capabilities that we need. We are using it for our software development and it is being used to reduce HST data now. We are continuing on this regardless. Perry From paul at pfdubois.com Sat Apr 13 19:35:02 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Sat Apr 13 19:35:02 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: Message-ID: <000001c1e35c$d85a2f90$0a01a8c0@NICKLEBY> I haven't been following this discussion (I have a product release on Monday). But I am getting a lot of mail stacking up for numpy-developers which will not go through unless you are one of the registered developers mailing from your registered mail account. All others, please do not use numpy-developers. This is a private channel for the official developers only. I gather from my brief reading that someone is looking for a standard to use now. That standard is Numeric. If you go with that now then when the time comes to switch to Numarray, you'll be in the same boat as the whole community and therefore liable to be able to profit from any conversion tools required. You can reduce your problems to a minimum by sticking with the Python interface where possible. If you have some special need that Numeric is not meeting please realize that what exists is a consensus product after a long evolution and it is not likely to change much to meet your particular needs. There are some areas where what is right for one set of people is wrong for the others. 
From xscottg at yahoo.com Sun Apr 14 04:20:03 2002
From: xscottg at yahoo.com (Scott Gilbert)
Date: Sun Apr 14 04:20:03 2002
Subject: [Numpy-discussion] Introduction
In-Reply-To:
Message-ID: <20020414111911.2977.qmail@web12901.mail.yahoo.com>

Perry,

I've been trying to be persuasive, but I think all I've managed to do is
to be verbose and annoy you. Please accept my apologies. I really am sorry
this is going as poorly as it is. I'm doing a lousy job of getting my
point across, and I'd like to turn around the tone this has taken. Email
always comes off as more antagonistic than intended.

Finally, my appeal to the fact that you are proposing a standard was heavy
handed. I guess I was trying to use that to force you to consider my
position. It clearly backfired...

I'll try to be more to the point.

Here's what I'm proposing, and it's only a suggestion.

*** I think the requirements for being a general purpose "NDArray"
can be specified with only the following attributes:

    __array_buffer__   - as buffer object
    __array_shape__    - as tuple of long
    __array_itemsize__ - as int

  Optionally
    __array_stride__   - as tuple of long (get from shape if None)
    __array_offset__   - as int (would default to 0 if not present)

Then anyone who implemented these could work with the same C API for
getting the pointer to memory, shape array, stride array, and item size.

The set of operations on a pure "NDArray" is probably pretty minimal
(reshape, transpose/rotate, index arrays?). So in order to create a full
featured "NumArray", a few more attributes are required:

    __array_itemtype__ - as string?

  Optionally
    __array_endian__   - as 1 char string? (default to the native endian)

This brings the total up to 4 required attributes, and 3 optional ones for
a very general purpose array data structure. (I can think of other
optional ones, but skip that for now.)

>
> All in all you are talking about checking quite a few attributes
> to make sure the object has the interface.
> And even if it does,
> *why* in the world would we presume that the C functions used by
> numarray would work properly with the object you provide.
>

Because truthfully arrays are little more than a pointer to memory.

That's like asking "why in the world would we presume memcpy() or
qsort() would know what to do with your memory?"

>
> You haven't provided any example (let
> alone a compelling one) of why we should accept any object that
> provides those attributes.
>

Well, the UFuncs certainly should reject any object that they don't
know how to handle. I'm currently only addressing what it takes to be
an NDArray/NumArray object. OTOH, if I can present something to the
UFuncs that looks like a known array type, why wouldn't UFuncs want to
work with it?

Ok, so what does this buy you?

Well, it probably doesn't buy you personally very much. Your needs are
already being met by the current implementation.

Ok, so what does this cost you?

A few translations:

    _data     -> __array_buffer__
    _shape    -> __array_shape__
    _strides  -> __array_stride__
    _itemsize -> __array_itemsize__
    _offset   -> __array_offset__
    _type     -> __array_type__
    _byteswap -> __array_endian__

This isn't a style criticism. I'm not just asking you to change your
names, I'm asking to promote the names to be a "standard interface" much
like these things are in many places in Python.

Also requires some small changes to getNDInfo() and getNumInfo() so that
they can calculate the derived fields (contiguous, aligned, etc...).

Also requires some changes to your scripts so that it checks for the
interface rather than the inheritance.

What are the benefits to anyone else?

  - Describes how anyone could implement something that looks and acts
    like NDArrays or NumArrays. There are probably a lot of reasons to
    want to do this. I have some reasons that I don't think you value too
    much. I think others would have reasons which I can't imagine too.
  - Allows one standard API for getting at the basics of
    NDArrays/NumArrays

  - Allows anyone to easily implement other data types for NumArrays.
    The typecode won't match any of your builtin types, but maybe other
    third parties could agree on other typecodes for their crazy needs
    and share modules.

  - Allows me personally to distribute a separate (and simpler)
    implementation of NDArrays/NumArrays right now and have the same data
    objects work with yours when you're all done. If I give the UFuncs a
    pointer to memory, and the attributes above, why shouldn't it work
    correctly?

>
> We're not going to budge until you show us what the hell you are talking
> about.
>

Am I doing any better? I am trying.

>
> You are right on complex ints (that we won't consider them). One
> could take numarray and add them if one wanted and have a more
> extended version. But we won't do it, and we wouldn't support as
> being in what we maintain. It's one of those trade offs.
>

Is there a way, today, without modifying numarray, for me to use
numarray as a holder for these esoteric data types? Is that way difficult?
Could it be easier?

I'm not asking numarray to know about my types in its core baseline. I'm
wondering what it takes to implement new types at all.

>
> Your example shows nothing about what your
> real needs for the object are.
>

My real needs are all over the place. Some of which you've shown me are
solvable with the current implementation of numarray. Some of which you've
not addressed or said you won't address.

To be explicit:

Here are (at least most of) my _needs_ for array objects:

  - support a wide variety of data types (user defined)
  - have efficient storage
  - support the pickle interface for serialization
  - allow alternate sources of underlying memory
  - have an easy interface for accessing the pieces necessary to create
    C extensions (buffer, shape, stride, ...)
- completed and reliable in the near term Here are (at least some of) my _wants_ for array objects: - cooperate on some level with other standard array modules (once the standard is set) - have same API for accessing the pieces (buffer, shape, stride, ...) as all standard array modules will. - implementation in pure Python so that building extension modules is not required until the fast operations present in those modules is required. - implemented from a standard that is as good as it can be Here are (at least some of) my _whims_ for array objects: - has "windowing" functionality to work efficiently with really large files (on any modern platform). - alternate implementations for things such as "slicing behaviour" (copy on write, reference). Loosely following your design, I've already written a module that meets my "needs", I was hoping that we could cooperate towards filling in some of my "wants" (cooperating array modules), and I've brought up my "whims" because I thought they were interesting possibilities for discussion. I was going to respond to some of your other remarks, but I've probably wasted enough of your time. If you don't respond to this message, I'll take that as a sign that we just aren't going to see eye to eye on any of this, and I won't bother you any more. (I'll be half surprised if you even get this message. From the tone of your last one, I wouldn't be shocked to find out you've already added me to your killfile. :-) No hard feelings, -Scott Gilbert __________________________________________________ Do You Yahoo!? Yahoo! 
Tax Center - online filing with TurboTax http://taxes.yahoo.com/ From perry at stsci.edu Sun Apr 14 11:55:02 2002 From: perry at stsci.edu (Perry Greenfield) Date: Sun Apr 14 11:55:02 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: <20020414111911.2977.qmail@web12901.mail.yahoo.com> Message-ID: Hi Scott, Just to be to the point, I'm still missing what I've been asking for, to wit a concrete example that illustrates your point. I'll try to address a few of your points that appear to try to answer that and clarify what I mean by concrete example. > > Here's what I'm proposing, and it's only a suggestion. > > > *** I think the requirements for being a general purpose "NDArray" > can be specified with only the following attributes: > > __array_buffer__ - as buffer object > __array_shape__ - as tuple of long > __array_itemsize__ - as int > > Optionally > __array_stride__ - as tuple of long (get from shape if None) > __array_offset__ - as int (would default to 0 if not present) > > Then anyone who implemented these could work with the same C API for > getting the pointer to memory, shape array, stride array, and item size. > Then you are talking about standardizing a C-API. But I'm still confused. If you write a class that implements these attributes, is it your C-API that uses them, or do you mean our C-API uses them? If you have your own C-API, then the attributes are not relevant as an interface. If you intend to use our C-API to access your objects, then they are. But if you want to use our C-API, that still doesn't explain why the alternatives aren't acceptable (namely subclassing). > > Because truthfully arrays are little more than a pointer to memory. > > That's like asking "why in the world would we presume memcpy() or > qsort() would know what to do with your memory?" > Then you misunderstand Numarray. Numarrays are far more than just a pointer to memory. You can get a pointer to memory from them, but they entail much more than that. 
Numarray presumes that certain things are possible with NumArray objects (like standard math operations). If you want something that doesn't make such an assumption, you should be using NDArray instead. NDArray makes no presumptions about the contents of the memory other than that they are arranged in memory in array fashion. > > > > > You haven't provided any example (let > > alone a compelling one) of why we should accept any object that > > provides those attributes. > > > > Well, the UFuncs certainly should reject any object that they don't > know how to handle. I'm currently only addressing what it takes to be > an NDArray/NumArray object. OTOH, if I can present something to the > UFuncs that looks like a known array type, why wouldn't UFuncs > want to work with it? > If you are presenting numarray with a type it already knows about, why aren't you subclassing it? If you present numarray an object with a type it doesn't know about, then that is pointless. Types and numarray are inextricably intertwined, and shall remain so. > > - Allows me personally to distribute a separate (and simpler) > implementation of NDArrays/NumArrays right now and have the same data > objects work with yours when you're all done. If I give the UFuncs a > pointer to memory, and the attributes above, why shouldn't it work > correctly? > > > Am I doing any better? I am trying. > Not really. More on that later. > > > Is there a way, today, without modifying numarray, for me to use > numarray as a holder for these esoteric data types? Is that way > difficult? > Could it be easier? > No to the first; it isn't intended to serve that purpose. If you just need something to blindly hold values without doing anything with them, use NDArray (and you can add whatever customization you wish regarding what methods or operators are available). > I'm not asking numarray to know about my types in it's core baseline. I'm > wondering what it takes to implement new types at all. 
> It's possible to extend (but not in any way that makes it automatically usable with anyone else's extension). Currently that sort of extension would not be hard for someone that knows how things work. We haven't documented how to do so, and won't for a while. It's not a high priority for us now. ********************************************************** What I want to see is a specific example. I'm not going to pay much attention to generalities because I'm still unclear about how you intend to do what you say you will do. Perhaps I'm slow, but I still don't get it. On the one hand, you ask us to have numarray accept objects with the same 'interface'. Well, if they are not of an existing supported type, that's pointless since numarray won't work properly with them. If it is an existing type, you haven't explained why you can't use numarray directly (or alternatively, create a numarray object that uses the same buffer yours does). I still haven't seen a specific example that illustrates why you cannot use subclassing or an instance of a numarray object instead. If you need to add a new type, that's possible, but you'll have to spend some time figuring out how to do that for your own extended version. If you just want to use arrays to hold values (of new types), then use NDArray. It doesn't care about types. But please give a specific case. E.g., "I want complex ints and I will develop a class that will use this to do the following things [it doesn't have to be exhaustive or complete, but include just enough to illustrate the point]. If the attributes were standardized then I would do this and that, and use it with your stuff like this showing you the code (and the behavior I expect)." Given this I can either show you an alternate solution or I can realize why you are right and we can discuss where to go from there. Otherwise you are wasting your time. 
Perry From xscottg at yahoo.com Sun Apr 14 21:10:12 2002 From: xscottg at yahoo.com (Scott Gilbert) Date: Sun Apr 14 21:10:12 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: Message-ID: <20020415040923.5808.qmail@web12903.mail.yahoo.com> --- Perry Greenfield wrote: *** Just skim through my first few responses. About half way through writing this letter, a few things hit me. I still want to propose some changes, but I don't think you'll find them as intrusive... > > > > > Then anyone who implemented these could work with the same C API for > > getting the pointer to memory, shape array, stride array, and item > > size. > > > Then you are talking about standardizing a C-API. But I'm still > confused. If you write a class that implements these attributes, > is it your C-API that uses them, or do you mean our C-API uses > them? > I'm not really talking about standardizing a C-API. I'm talking about standardizing what that C-API would have to do. You would have your C-API as part of numarray proper. And, for the short term, I would have my own C-API as part of what I need to get done. Both C-API's would use the same attributes. Why do I want my own C-API today? Because numarray isn't done yet, and I can't create arrays of the types I need. I'll need a C-API to get at my types. It would be great if the same C-API could get at yours too. > > If you have your own C-API, then the attributes are not > relevant as an interface. If you intend to use our C-API to access > your objects, then they are. > Either C-API could access anything that looks like an NDArray. > > > > > Because truthfully arrays are little more than a pointer to memory. > > > > That's like asking "why in the world would we presume memcpy() or > > qsort() would know what to do with your memory?" > > > > Then you misunderstand Numarray. Numarrays are far more than just > a pointer to memory. You can get a pointer to memory from them, > but they entail much more than that. 
Numarray presumes that certain > things are possible with NumArray objects (like standard math > operations). If you want something that doesn't make such an > assumption, you should be using NDArray instead. NDArray makes > no presumptions about the contents of the memory other than > they are arranged in memory in array fashion. > I think I understand where you're coming from now. (BTW, I think some of our confusion comes from when I'm talking about "Numarray" or "numarray" the package versus "NumArray" and "NDArray" the classes.) *** Ok, I think there is light at the end of this tunnel... I guess what I've been arguing for all along is something a lot like an NDArray where I can specify the typecode (and possibly other things like 'endian' etc...), and that only NDArrays have a minimal set of standardized attributes. With this I can create extensions that will work with anything that looks like an NDArray. Your NDArrays from the numarray package, and my NDArrays of crazy types. I'm still left in the position of having to upcast an NDArray to a full blown NumArray if I ever want to use my NDArrays in a routine meant solely for NumArrays. However this conversion isn't difficult, and I think I can do that when needed. Important Question: If an NDArray had a typecode (and it was a known string), is it possible to promote it to one of the standard NumArray types? Lesser Question: If an NDArray had a known typecode, is it desirable for numarray routines to promote the NDArray to a NumArray in the same way that the routines promote a Python list or tuple to a NumArray on the fly? Ok, my new proposal (again, treat it like a suggestion): - Do you think it would be possible to standardize the set of attributes that it requires to be an NDArray? NDArrays are simple and unlikely to change. I think _those_ really are just pointers to memory with array accounting information. We could agree on what exactly constitutes an NDArray. 
- Could this standard set of attributes optionally include the names for the typecode, endian, (and maybe some other) attributes? That doesn't mean that your NDArrays would have to have the typecode, endian or whatever information. It just means that when any class does add a typecode, it adds it as a specially named attribute. I realize that a large part of what I want is interoperability between separate implementations of NDArrays. Anything that has (_data, _shape, _itemsize, _type) is something I could work with in an extension. Some other fields are optional (_strides, _byteoffset) because they have sensible defaults that can be calculated from above in the common case. So the only difference between what you currently have and most of what I'm proposing is that the names of NDArray attributes become standardized. > > If you are presenting numarray with a type it already knows about, > why aren't you subclassing it? > Since I know I'll have to create types that numarray doesn't know about, I know I'm going to have to write a new array class (it's already written). It would be silly of my new array class to not implement the standard types just because numarray _does_ know about them. I now realize that I don't have to give my class to numarray directly. That didn't hit me before. I could promote/upcast it when necessary. The upcast-in and downcast-out thing will add up to extra work and messier code, but it is a workaround. > > If you present numarray an object > with a type it doesn't know about, then that is pointless. > Types and numarray are inextricably intertwined, and shall > remain so. > Understood. I don't want to ruin your NumArrays. > > ********************************************************** > > What I want to see is a specific example. I'm not going to > pay much attention to generalities because I'm still unclear > about how you intend to do what you say you will do. Perhaps > I'm slow, but I still don't get it. 
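As a rough illustration of the "sensible defaults" mentioned above for the optional fields, here is a minimal sketch in plain Python. It assumes contiguous row-major (C-order) storage, which the message does not state; `default_strides`, `describe`, and the `ScottArray` class are hypothetical names made up for the example, and only the attribute names (_data, _shape, _itemsize, _type, _strides, _byteoffset) come from the text.

```python
def default_strides(shape, itemsize):
    """Byte strides for a contiguous array, ASSUMING row-major (C) order."""
    strides = []
    step = itemsize
    # Walk the axes from last to first: the last axis is the densest one.
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def describe(obj):
    """Gather the array description, filling in the optional fields."""
    shape = tuple(obj._shape)
    itemsize = obj._itemsize
    strides = getattr(obj, "_strides", None)
    if strides is None:
        # Optional field missing: derive it from shape and itemsize.
        strides = default_strides(shape, itemsize)
    return {
        "data": obj._data,
        "shape": shape,
        "itemsize": itemsize,
        "type": obj._type,
        "strides": tuple(strides),
        "byteoffset": getattr(obj, "_byteoffset", 0),
    }


class ScottArray:
    """Hypothetical third-party array class with only the required fields."""

    def __init__(self, data, shape, itemsize, type_):
        self._data = data
        self._shape = shape
        self._itemsize = itemsize
        self._type = type_


a = ScottArray(bytearray(2 * 3 * 8), (2, 3), 8, "Float64")
info = describe(a)
```

Any object carrying the four required attributes can be handed to `describe`, whichever package created it; that is the kind of interoperability being argued for, shown here in Python rather than at the C level.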
> Nope, clearly it was me that was being slow. There is still that bit about NDArrays that I'm trying to justify, so my example is below. > > (or alternatively, > create a numarray object that uses the same buffer yours does). > You're right. This hadn't occurred to me until just a little bit ago. > > E.g., "I want > complex ints and I will develop a class that will use this to > do the following things [it doesn't have to be exhaustive or > complete, but include just enough to illustrate the point]. > If the attributes were standardized then I would do this and that, > and use it with your stuff like this showing you the code > (and the behavior I expect)." > Here goes (somewhat hypothetical, but close to the boat I'm currently in): Jon is our FPGA guy who makes screaming fast core files, but our FPGAs don't do floating point. So I have to provide his driver with ComplexInt16 data. Jon and I write an extension module that calls his driver and reads data. We also write a C routine (call it "munge") that takes both ComplexInt16 data, and ComplexFloat64 data. We try it out for testing, and pass in my arrays in both places. We could have used Numarray for the ComplexFloat64, but that meant we had to use two array packages, and use two C-APIs in our extension. All we needed was a pointer to an array of doubles, so we stuck with mine. Ok, that part of development is done. Now we present it to the application developers. They're happy and we're rolling. Successful application. Another group finds out about this and they want to use it. They're using numarray for a large part of their application. In fact, they're calculating the ComplexFloat64 half of the data that they want to pass to my "munge" routine using numarray, and they still need to use my ComplexInt32 data to read the FPGA. They're going to be disappointed to find out my extension can't read numarray data, and that they have to convert back and forth between the two. 
And as the list of routines grows, they have to keep track of whether it is a numarray-routine, or a scottarray-routine. It's not so bad for one simple "munge" function, but there are going to be hundreds of functions... I don't expect you to have much sympathy for my having to convert data back and forth between my array types and yours, but it is an avoidable problem. For the most part, we both agree on what parts an NDArray should have. If we could only agree what to name them, and that we'd stick to those names, that would be a large part of it for me. > > Given this I can either show you an alternate solution or > I can realize why you are right and we can discuss where > to go from there. Otherwise you are wasting your time. > Cheers, -Scott From jmiller at stsci.edu Mon Apr 15 11:19:09 2002 From: jmiller at stsci.edu (Todd Miller) Date: Mon Apr 15 11:19:09 2002 Subject: [Numpy-discussion] ANN: Numarray-0.3.1 and 0.3.2 Message-ID: <3CBB1955.1010800@stsci.edu> Numarray 0.3.1 and 0.3.2 --------------------------------- Numarray is an array processing package designed to efficiently manipulate large multi-dimensional arrays. Numarray is modelled after Numeric and features C code generated from Python template scripts, the capacity to operate directly on arrays in files, and improved type promotions. Numarray-0.3.1 incorporates a number of bug fixes and enhancements to the C-API, including a minimal Numeric emulation layer which makes it easy to port simple Numeric C-extensions to numarray. The emulation layer is incomplete, so not all Numeric extensions will work, but simple ones *do* with a minimal amount of effort. See Doc/numpy_compat for an example of convolution done using the emulation layer. 
New for Numarray-0.3.1 is the Numarray manual in PDF and HTML formats; other formats are available for users of the source distribution. Numarray-0.3.2 is a source-only release to support Alpha/Tru64. It is essentially Numarray-0.3.1 + one portability bug fix. WHERE ----------- Numarray-0.3.1 Windows executable installers and source code tarball are here: http://sourceforge.net/project/showfiles.php?group_id=1369 Numarray is hosted by Source Forge in the same project which hosts Numeric: http://sourceforge.net/projects/numpy/ The web page for Numarray information is at: http://stsdas.stsci.edu/numarray/index.html Trackers for Numarray Bugs, Feature Requests, Support, and Patches are at the Source Forge project for NumPy at: http://sourceforge.net/tracker/?group_id=1369 REQUIREMENTS -------------------------- numarray-0.3.1 requires Python 2.0 or greater. AUTHORS, LICENSE ------------------------------ Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC Hsu, Paul Barrett, and Phil Hodge at the Space Telescope Science Institute. Thanks go to Jochen Kupper of the University of North Carolina for his work on Numarray and for porting the Numarray manual to TeX format. Numarray is made available under a BSD-style License. See LICENSE.txt in the source distribution for details. -- Todd Miller jmiller at stsci.edu From perry at stsci.edu Mon Apr 15 14:20:01 2002 From: perry at stsci.edu (Perry Greenfield) Date: Mon Apr 15 14:20:01 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: <20020415040923.5808.qmail@web12903.mail.yahoo.com> Message-ID: Hi Scott, I'm not going to respond to all points but mainly concentrate on the last section. > > > Important Question: If an NDArray had a typecode (and it was a known > string), is it possible to promote it to one of the standard NumArray > types? > I think we want to avoid NDArray having any type attribute (some types have subtypes and then the issue gets really messy). 
We leave it to the subclass to address how types will be handled. > Here goes (somewhat hypothetical, but close to the boat I'm currently in): > > Jon is our FPGA guy who makes screaming fast core files, but our FPGAs > don't do floating point. So I have to provide his driver with > ComplexInt16 > data. > > Jon and I write an extension module that calls his driver and reads data. > We also write a C routine (call it "munge") that takes both ComplexInt16 > data, and ComplexFloat64 data. We try it out for testing, and pass in my > arrays in both places. We could have used Numarray for the > ComplexFloat64, > but that meant we had to use two array packages, and use two C-APIs in our > extension. All we needed was a pointer to an array of doubles, > so we stuck > with mine. > > Ok, that part of development is done. Now we present it to the > application > developers. Their happy and we're rolling. Successful application. > > Another group find out about this and they want to use it. They're using > numarray for a large part of their application. In fact, their > calculating > the ComplexFloat64 half the data that they want to pass to my "munge" > routine using numarray, and they still need to use my ComplexInt32 data to > read the FPGA. > > They're going to be disappointed to find out my extension can't read > numarray data, and that they have to convert back and forth between the > two. And as the list of routines grow, they have to keep track of whether > it is a numarray-routine, or a scottarray-routine. > > It's not so bad for one simple "munge" function, but there are going to be > hundreds of functions... > > I don't expect you to have much sympathy for my having to convert > data back > and forth between my array types and yours, but it is an > avoidable problem. > > > > For the most part, we both agree on what parts an NDArray should have. If > we could only agree what to name them, and that we'd stick to those names, > that would be a large part of it for me. 
> > I'm not sure I understand the problem in all the details I need to. I'll restate it as best as I understand it and you can tell me if I understood incorrectly. You have extension modules that get complex int data from hardware. Other processing may be done to the complex int data in that format so it doesn't make sense to convert it to a more standard format when reading it in. You have C extensions that carry out certain tasks on complex data (in either complex int format or complex floats). You have users that would like to use your routine with numarray. (I haven't seen any specific mention of the need for ufuncs on complex ints so I'll assume you just need complex int arrays as containers for C programs to use.) [If you did need to perform ufuncs on complex ints, then extending numarray locally to handle them would be one possibility, but a little involved at the moment (a little easier later when we reimplement complex), then again, maybe not, the complex stuff is currently subclassed from numarray and not that hard to adapt to ints I think, but it isn't that well done now]. I guess my initial reaction is that you should develop a front-end C-API that handles obtaining data buffers from different sources. You get to define what kinds of things it supports, and it localizes both changes to the list of types you support and any dependencies on our or anyone else's API to a small section of code. From what I'm hearing, you don't need it to provide much (pointer to arrays and associated information). If we are real bozos and change the interface, it doesn't hurt you much (not that we intend to be bozos or change the C-API willy nilly :-). To elaborate, you define your equivalent of our getNumInfo routine. I don't think I've seen anything that requires explicit dependencies on Python attributes. Sure, you could use the same attribute names and use Python calls to get those just as our getNumInfo routine does, but I think that is bad practice. 
You may find some other representation for arrays out there that doesn't fit this model and you may want to work with those also and you won't be able to get them to adopt our scheme. You say that you don't want your users to have to convert between the two data representations. If they are using your C extensions that is understandable, and avoidable since you've written your programs to deal with the various types. On the other hand, unless you extend numarray, numarray clearly cannot deal with the complex ints so conversion is necessary. But understandably, you would like to eliminate the need for explicit conversions. I think there is an easy way of dealing with this. We haven't implemented this capability yet but we've been talking about having numarray check input values to see if they have a method "tonumarray" [not that we would choose that particular method name, I'm just illustrating the point]. If that method did exist, it would be called to create a numarray from the object. Thus you could add such a method to your class and when it is used in numarray ufuncs or in binary operations with numarray objects, your complex ints are automatically converted to numarray objects (presumably a complex float of some precision). Adding this capability to numarray should be pretty easy. True, the solution that I proposed doesn't protect you from making any changes ever. But we believe we are at a stage in the project where it is dangerous to lock ourselves into lower level details such as the internal description of the array. We still have things to implement and that may cause us to realize that some changes are needed. Our C-API stuff is relatively new. It may see changes in the near future, but likely not many related to what you need. And we intend to shield the C-API from changes in the Python attributes. We could change the name or contents of _byteswap and it would not change anything in the C-API. 
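The conversion hook floated above (a method that numarray would look for on incoming objects) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `tonumarray` is the placeholder name used in the message, not an existing numarray method, and `NativeArray`, `ComplexIntArray`, `as_array`, and `add` are hypothetical stand-ins written for this example.

```python
class NativeArray:
    """Hypothetical stand-in for the library's own array type."""

    def __init__(self, values):
        self.values = list(values)


class ComplexIntArray:
    """Hypothetical user class holding complex integers as (re, im) pairs."""

    def __init__(self, pairs):
        self.pairs = [(int(re), int(im)) for re, im in pairs]

    def tonumarray(self):
        # Placeholder hook name from the message: convert to the native type,
        # here by promoting each pair to a Python complex float.
        return NativeArray(complex(re, im) for re, im in self.pairs)


def as_array(obj):
    """Coerce obj to NativeArray, trying the conversion hook before sequences."""
    if isinstance(obj, NativeArray):
        return obj
    hook = getattr(obj, "tonumarray", None)
    if callable(hook):
        return hook()
    if isinstance(obj, (list, tuple)):
        return NativeArray(obj)
    raise TypeError("cannot convert %s to an array" % type(obj).__name__)


def add(a, b):
    """Elementwise add; both operands are coerced on the way in."""
    a, b = as_array(a), as_array(b)
    if len(a.values) != len(b.values):
        raise ValueError("operands must have the same length")
    return NativeArray(x + y for x, y in zip(a.values, b.values))


result = add(ComplexIntArray([(1, 2), (3, -4)]), [0.5, 1.0])
```

The caller never converts explicitly: the coercion lives in one place, next to the existing check for lists and tuples, and a class opts in simply by defining the hook.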
I see premature coupling of low-level implementation details as a bad thing, not a good thing. Any changes that are made to the API require changes only to the corresponding routine in your C-API, and all your C applications are shielded from any changes (save rebuilding). If I've misunderstood your examples, please let me know. Perry From xscottg at yahoo.com Mon Apr 15 15:33:10 2002 From: xscottg at yahoo.com (Scott Gilbert) Date: Mon Apr 15 15:33:10 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: Message-ID: <20020415223223.5901.qmail@web12905.mail.yahoo.com> Hi Perry. Well, I don't think I've made any progress convincing you that standardizing what it means to be an interoperable "NDArray" would be good for me or others in the community, but I do appreciate you letting me try. I'll take your suggestion and make my C-API understand a superset of array types. I'll wait to see how the tonumarray() thing pans out. That might meet all of my practical concerns even if I don't think it is as elegant a solution as defining a strong interface. I'll just respond to the one point below. If I had to sum up my argument for why I think separate array implementations could (should) be compatible, it is buried in the answer to this question. > > > > > Important Question: If an NDArray had a typecode (and it was a known > > string), is it possible to promote it to one of the standard NumArray > > types? > > > > I think we want to avoid NDArray having any type attribute (Some types > have subtypes and then the issue gets really messy). We leave it > to the subclass to address how types will be handled. > Ok that's what you're currently doing, but let me rephrase the question. 
:-) Given a "leaf type" -- something that is really well specified and very similar on all modern platforms:

    "Int32" - not just an arbitrary "Int"
    "Float64" - not just an arbitrary "Float"

Do you think you could write a general purpose _function_ that converted an "NDArray" to a full featured "NumArray"? I know this would be in Python, but let's pretend it's a C++ prototype to make the types clear:

    NumArray NDArray_to_NumArray(NDArray nda, String typecode, Endian end) {
        if (WellKnownLeafTypecodeString(typecode)) {
            /* fill in the blanks here */
            return NumArray(result);
        }
        throw "conversion really is impossible";
    }

Cheers and thanks again for your time, -Scott Gilbert
From perry at stsci.edu Tue Apr 16 08:15:09 2002 From: perry at stsci.edu (Perry Greenfield) Date: Tue Apr 16 08:15:09 2002 Subject: [Numpy-discussion] Introduction In-Reply-To: <20020415223223.5901.qmail@web12905.mail.yahoo.com> Message-ID: > > > Important Question: If an NDArray had a typecode (and it was a known > > > string), is it possible to promote it to one of the standard NumArray > > > types? > > > > > > > I think we want to avoid NDArray having any type attribute (Some types > > have subtypes and then the issue gets really messy). We leave it > > to the subclass to address how types will be handled. > > > > Ok that's what you're currently doing, but let me rephrase the question. > > :-) > > > Given a "leaf type" -- something that is really well specified and very > similar on all modern platforms: > > "Int32" - not just an arbitrary "Int" > "Float64" - not just an arbitrary "Float") > > > Do you think you could write a general purpose _function_ that > converted an > "NDArray" to a full featured "NumArray"? 
I know this would be in Python, > but let's pretend it's a C++ prototype to make the types clear: > > > NumArray NDArray_to_NumArray(NDArray nda, String typecode, Endian end) { > if (WellKnownLeafTypecodeString(typecode)) { > > /* fill in the blanks here */ > > return NumArray(result) > } > > throw "conversion really is impossible"; > } > I'm not sure I understand exactly what you are trying to do here, but I'll try to address the question as best I can. If one had an NDArray that happened to contain a type that numarray supported, yes it is possible (in fact RecArray does that sort of thing). If your point is that in doing so one must use the private attributes such as _strides, yes that is true. These attributes are private in the sense that users of instances of these objects should never have cause to access them. But it does not mean that classes that subclass NDArray or any of its subclasses should not access them. They are not private in the sense of the class family (one reason we didn't use __strides, since that mechanism is not usable (easily anyway) for subclasses). In that sense, the attributes form an interface within the class family. Some class extenders may need to access them, sure. Perry From omar.mekkaoui at eco.u-cergy.fr Tue Apr 16 11:04:38 2002 From: omar.mekkaoui at eco.u-cergy.fr (mekkaoui) Date: Tue Apr 16 11:04:38 2002 Subject: [Numpy-discussion] Extension under windows Message-ID: <3CBC68D9.64B7C425@eco.u-cergy.fr> Dear Numerical Python Users, I have written an extension using GSL (GNU Scientific Library) and Numerical Python. This extension works fine under Linux and I would like to do the same under Windows. For that I use Cygwin. 
When I try to create the module

$ gcc -shared Example.o -o Example.pyd

I receive this message:

Example.o<.text+0x58>:Example.c: undefined reference to 'PyArg_ParseTuple'
Example.o<.text+0x15e>:Example.c: undefined reference to 'Py_BuildValue'
Example.o<.text+0x1b1>:Example.c: undefined reference to 'Py_InitModule4'
Example.o<.text+0x1c1>:Example.c: undefined reference to 'PyImport_ImportModule'
Example.o<.text+0x1db>:Example.c: undefined reference to 'PyModule_GetDict'
Example.o<.text+0x1f4>:Example.c: undefined reference to 'PyDict_GetItemString'
Example.o<.text+0x206>:Example.c: undefined reference to 'PyCObject_Type'
Example.o<.text+0x214>:Example.c: undefined reference to 'PyCObject_AsVoidPtr'

Perhaps this command is wrong. Could anyone explain, or point me to a document that explains, the procedure clearly? Thanks in advance for your help Omar
From xscottg at yahoo.com Tue Apr 16 16:38:02 2002 From: xscottg at yahoo.com (Scott Gilbert) Date: Tue Apr 16 16:38:02 2002 Subject: [Numpy-discussion] Introduction Message-ID: <20020416233700.72472.qmail@web12904.mail.yahoo.com> --- Perry Greenfield wrote: > > If one had an NDArray that happened to contain a type that numarray > supported, yes it is possible (in fact RecArray does that sort of thing). > > If your point is that in doing so one must use the private attributes > such as _strides, yes that is true. > My point was simply: = One *can* convert from (NDArray + typecode) to a full NumArray = You *do* already convert lists, tuples, ... to NumArrays in ufuncs = So you *could* convert (NDArrays + typecode) to NumArrays in ufuncs in the same place that checks to see if it is a list, tuple, ... Therefore: = You possibly *could* standardize the attributes in an NDArray (buffer, typecode, shape, stride, offset, ...) 
= If you *did* standardize the attributes, then others *could* build UserDefinedNDArrays however they see fit and they would work with NumArrays. However, I get the sense that the numarray module is your baby, and you don't want to change him too much. That's very understandable; you're a proud parent. Truth be told, he's a good looking kid, and I look forward to hanging out with him when he's all grown up. We just have a little different view on parenting, and I was hoping my kid would have an easier time playing with yours. Now that I've beaten that silly metaphor to death... :-) Cheers, -Scott ps: It occurs to me, with the strong sense of encapsulation you desire, that I could have presented this better as requesting that you specify a set of standard *methods* instead of attributes. Something like:

    def __array_getbuffer__(self):
    def __array_getoffset__(self):
    def __array_getshape__(self):
    def __array_getstrides__(self):
    def __array_getitemsize__(self):
    def __array_gettypecode__(self):
    def __array_getendian__(self):
    # Who knows what the real list would consist of...
    # We never got to discuss what a really general
    # purpose description of an NDArray would require...

Then anything which implemented those standard *methods* would be a viable NDArray. From my point of view it amounts to about the same thing, but I think it's a better design and that you might like this idea more. However I'm getting out of breath on this topic, and I have other things I need to do (I'm sure this is true for you too), so if you don't see any merit in this idea, I won't push for it any further. Cheers again.
From perry at stsci.edu Tue Apr 16 17:52:03 2002 From: perry at stsci.edu (Perry Greenfield) Date: Tue Apr 16 17:52:03 2002 Subject: [Numpy-discussion] Conclusion In-Reply-To: <20020416233700.72472.qmail@web12904.mail.yahoo.com> Message-ID: After Scott's last display of his powers of persuasion, I lack for a meaningful response. It seems appropriate to declare this thread closed. Besides, I've got to go change some diapers ;-) Perry From paul at pfdubois.com Wed Apr 17 07:17:07 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Wed Apr 17 07:17:07 2002 Subject: [Numpy-discussion] Extension under windows In-Reply-To: <3CBC68D9.64B7C425@eco.u-cergy.fr> Message-ID: <000301c1e61a$20c2a590$0a01a8c0@NICKLEBY> You need to link with the Python library. I suggest you learn to use distutils and then it will load for you correctly on both platforms. The file "setup.py" in the Numeric source distribution is a good if complicated example. Some of the setup.py files in the Packages area are simpler and easier to understand. -----Original Message----- From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-discussion-admin at lists.sourceforge.net] On Behalf Of mekkaoui Sent: Tuesday, April 16, 2002 11:09 AM To: numpy-discussion at lists.sourceforge.net Subject: [Numpy-discussion] Extension under windows Dear Numerical Python Users, I have writen an extension using GSL (Gnu Scientific Library) and Numerical Python. This extension work fine under Linux and I would to do the same under Windows. For that I use Cygwin. 
When I would create the module $ gcc -shared Example.o -o Example.pyd I receive this message : Example.o<.text+0x58>:Example.c: undefined reference to 'PyArg_ParseTuple' Example.o<.text+0x15e>:Example.c: undefined reference to 'Py_BuildValue' Example.o<.text+0x1b1>:Example.c: undefined reference to 'Py_InitModule4' Example.o<.text+0x1c1>:Example.c: undefined reference to 'PyImport_ImportModule' Example.o<.text+0x1db>:Example.c: undefined reference to 'PyModule_GetDict' Example.o<.text+0x1f4>:Example.c: undefined reference to 'PyDict_GetItemString' Example.o<.text+0x206>:Example.c: undefined reference to 'PyCObject_Type' Example.o<.text+0x214>:Example.c: undefined reference to 'PyCObject_AsVoidPtr' Perhaps this command is wrong. Perhaps, anyone could explain or show me a document which explain the procedure clearly ? Thanks in advance for your help Omar _______________________________________________ Numpy-discussion mailing list Numpy-discussion at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/numpy-discussion From magnus at hetland.org Wed Apr 17 07:32:31 2002 From: magnus at hetland.org (Magnus Lie Hetland) Date: Wed Apr 17 07:32:31 2002 Subject: [Numpy-discussion] Graphs in numarray? Message-ID: <20020417163133.F7565@idi.ntnu.no> I'm looking at various ways of implementing graphs in Python (beyond simple dict-based stuff -- more performance is needed). kjbuckets looks like a nice alternative, as does the Boost Graph Library (not sure how easy it is to use with Boost.Python) but if numarray is to become a part of the standard library, it could be beneficial to use that... For dense graphs, it makes sense to use an adjacency matrix directly in numarray, I should think. (I haven't implemented many graph algorithms with ufuncs yet, but it seems doable...) For sparse graphs I guess some sort of sparse array implementation would be useful, although the archives indicate that creating such a thing isn't a core part of the numarray project. 
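To make the dense case above a little more concrete, here is a minimal sketch of reachability on an adjacency matrix, written with plain Python lists so that it runs without any array package. The names `adjacency_matrix` and `reachable` are made up for this example. Each pass of the loop amounts to a boolean vector-matrix product plus two elementwise mask operations, which is the part one would hope to express as whole-array operations in numarray; whether that maps cleanly onto ufuncs is an assumption, not something tested here.

```python
def adjacency_matrix(n, edges):
    """Dense n-by-n boolean matrix for a directed graph given as (u, v) pairs."""
    adj = [[False] * n for _ in range(n)]
    for u, v in edges:
        adj[u][v] = True
    return adj


def reachable(adj, source):
    """Boolean mask of the nodes reachable from source (source included)."""
    n = len(adj)
    seen = [False] * n
    seen[source] = True
    frontier = seen[:]
    while any(frontier):
        # Boolean "vector-matrix product": nodes with an edge from the frontier.
        step = [any(frontier[u] and adj[u][v] for u in range(n))
                for v in range(n)]
        # Elementwise masks: keep only newly discovered nodes, then record them.
        frontier = [s and not old for s, old in zip(step, seen)]
        seen = [old or new for old, new in zip(seen, frontier)]
    return seen


edges = [(0, 1), (1, 2), (3, 4)]
adj = adjacency_matrix(5, edges)
mask = reachable(adj, 0)
```

The matrix costs n*n entries regardless of how many edges exist, which is why this layout only makes sense for dense graphs.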
What do you think -- is it reasonable to use numarray for graph algorithms? Perhaps an additional module with standard graph algorithms would be interesting? (I'm sure I could contribute some if there is any interest...) And -- is there any chance of getting sparse matrices in numarray? -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From perry at stsci.edu Wed Apr 17 12:10:32 2002 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 17 12:10:32 2002 Subject: [Numpy-discussion] Graphs in numarray? In-Reply-To: <20020417163133.F7565@idi.ntnu.no> Message-ID: Hi Magnus, On Behalf Of Magnus Lie Hetland > > I'm looking at various ways of implementing graphs in Python (beyond > simple dict-based stuff -- more performance is needed). kjbuckets > looks like a nice alternative, as does the Boost Graph Library (not > sure how easy it is to use with Boost.Python) but if numarray is to > become a part of the standard library, it could be beneficial to use > that... > > For dense graphs, it makes sense to use an adjacency matrix directly > in numarray, I should think. (I haven't implemented many graph > algorithms with ufuncs yet, but it seems doable...) For sparse graphs > I guess some sort of sparse array implementation would be useful, > although the archives indicate that creating such a thing isn't a core > part of the numarray project. > First of all, it may make sense, but I should say a few words about what scale sizes make sense. Currently numarray is implemented mostly in Python (excepting the very low level, very simple C functions that do the computational and indexing loops. This means it currently has a pretty sizable overhead to set up an array operation (I'm guessing an order of magnitude slower than Numeric). Once set up, it generally is pretty fast. So it is pretty good for very large data sets. Very lousy for very small ones. 
We haven't measured efficiency lately (we are deferring optimization until we have all the major functionality present first), but I wouldn't be at all surprised to find that the set up time can be equal to the time to actually process ~10,000-20,000 elements (i.e., the time spent per element for a 10K array is roughly twice that for much larger arrays). So if you are working with much smaller arrays than 10K, you won't see total execution time decrease much (it was already spending half its time in setup, which doesn't change). We would like to reduce this size threshold in the future, either by optimizing the Python code, or moving some of it into C. This optimization wouldn't be for at least a couple more months; we have more urgent features to deal with. I doubt that we will ever surpass the current Numeric in its performance on small arrays (though who knows, perhaps we can come close). > What do you think -- is it reasonable to use numarray for graph > algorithms? Perhaps an additional module with standard graph > algorithms would be interesting? (I'm sure I could contribute some if > there is any interest...) > Before I go further, I need to find out if the preceding has made you gasp in horror or if the timescale is too slow for you to accept. (This particular issue also makes me wonder if numarray would ever be a suitable substitute for the existing array module). What size graphs are you most concerned about as far as speed goes? > And -- is there any chance of getting sparse matrices in numarray? > Since talk is cheap, yes :-). But I doubt it would be in the "core" and some thought would have to be given to how best to represent them. In one sense, since the underlying storage is different than numarray assumes for all its arrays, sparse arrays don't really share the same underlying C machinery very well.
While it certainly would be possible to devise a class with the same interface as numarray objects, the implementation may have to be completely different. On the other hand, since numarray has much better support for index arrays, i.e., an array of indices that may be used to index another array of values, a pair of index array(s) and value array may itself serve as a storage model for sparse arrays. One still needs to implement ufuncs and other functions (including simple things like indexing) using different machinery. It is something that would be nice to have, but I can't say when we would get around to it and don't want to raise hopes about how quickly it would appear. Perry
From victor at idaccr.org Wed Apr 17 15:25:24 2002 From: victor at idaccr.org (Victor S. Miller) Date: Wed Apr 17 15:25:24 2002 Subject: [Numpy-discussion] The right way to use results of argmax and argmin Message-ID:
# I'm running python 2.0 on Solaris and Numeric 21.0
# I have an m by n array -- called a and have
# j an n long list of integers in range(m), such as
j = argmax(a,0)
# If I set
z = zip(j,range(len(j)))
# and try the statement
res = take(a,z)
# python appears to hang, but if I do
res = array(map(lambda x,a=a: a[x[0],x[1]], z))
# It works.
# Is there a simpler way of doing what I want, and why does take hang?
# is it, perhaps, allocating some n by n work array (this would
# probably make things thrash like crazy)?
--
Victor S. Miller     | " ... Meanwhile, those of us who can compute can hardly
victor at idaccr.org | be expected to keep writing papers saying 'I can do the
CCR, Princeton, NJ   | following useless calculation in 2 seconds', and indeed
08540 USA            | what editor would publish them?" -- Oliver Atkin
From magnus at hetland.org Thu Apr 18 07:55:19 2002 From: magnus at hetland.org (Magnus Lie Hetland) Date: Thu Apr 18 07:55:19 2002 Subject: [Numpy-discussion] Graphs in numarray?
In-Reply-To: ; from perry@stsci.edu on Wed, Apr 17, 2002 at 03:06:12PM -0400 References: <20020417163133.F7565@idi.ntnu.no> Message-ID: <20020418165403.E300@idi.ntnu.no> Perry Greenfield : [snip] > First of all, it may make sense, but I should say a few words about > what scale sizes make sense. [snip] > So if you are working with much smaller arrays than 10K, you won't > see total execution time decrease much In relation to what? Using dictionaries etc? Using the array module? [snip] > Before I go further, I need to find out if the preceeding has made > you gasp in horror or if the timescale is too slow for you to > accept. Hm. If you need 10000 elements before numarray pays off, I'm starting to wonder if I can use it for anything at all. :I > (This particular issue also makes me wonder if numarray would > ever be a suitable substitute for the existing array module). Indeed. > What size graphs are you most concerned about as far as speed goes? I'm not sure. A wide range, I should imagine. But with only 100 nodes, I'll get 10000 entries in the adjacency matrix, so perhaps it's worthwile anyway? > > And -- is there any chance of getting sparse matrices in numarray? > > Since talk is cheap, yes :-). But I doubt it would be in the "core" > and some thought would have to be given to how best to represent them. > In one sense, since the underlying storage is different than numarray > assumes for all its arrays, sparse arrays don't really share the > same underlying C machinery very well. While it certainly would be > possible to devise a class with the same interface as numarray objects, > the implementation may have to be completely different. Yes, I realise that. > On the other hand, since numarray has much better support for index > arrays, i.e., an array of indices that may be used to index another > array of values, index array(s), value array pair may itself serve > as a storage model for sparse arrays. 
That's an interesting idea, although I don't quite see how it would help in the case of adjacency matrices. (You'd still need at least one n**2 size matrix for n nodes, wouldn't you -- i.e. the index array... Right?) > One still needs to implement ufuncs and other functions (including > simple things like indexing) using different machinery. It is > something that would be nice to have, but I can't say when we would > get around to it and don't want to raise hopes about how quickly it > would appear. No - no problem. Basically, I'm looking for a platform to implement graph algorithms that doesn't necessitate too many installed packages etc. numarray seemed promising since it's a candidate for inclusion in the standard library. I guess I'll just have to do some timing experiments... > Perry -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From perry at stsci.edu Thu Apr 18 08:22:06 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Apr 18 08:22:06 2002 Subject: [Numpy-discussion] Graphs in numarray? In-Reply-To: <20020418165403.E300@idi.ntnu.no> Message-ID: > Behalf Of Magnus Lie Hetland > Perry Greenfield : > [snip] > > First of all, it may make sense, but I should say a few words about > > what scale sizes make sense. > [snip] > > So if you are working with much smaller arrays than 10K, you won't > > see total execution time decrease much > > In relation to what? Using dictionaries etc? Using the array module? No, in relation to operations on a 10K array. Basically, if an operation on a 10K array spends half its time on set up, operations on a 10 element array may only be twice as fast. I'm not making any claims about speed in relation to any other data structure (other than Numeric) > [snip] > > Before I go further, I need to find out if the preceeding has made > > you gasp in horror or if the timescale is too slow for you to > > accept. > > Hm. 
If you need 10000 elements before numarray pays off, I'm starting > to wonder if I can use it for anything at all. :I > I didn't make clear that this threshold may improve in the future (decrease). The corresponding threshold for Numeric is probably around 1000 to 2000 elements. (Likewise, operations on 10 element Numeric arrays are only about twice as fast as for 1K arrays) We may be able to eventually improve numarray performance to something in that neighborhood (if we are luckly) but I would be surprised to do much better (though if we use caching techniques, perhaps repeated cases of arrays of identical shape, strides, type, etc. may run much faster on subsequent operations). As usual, performance issues can be complicated. You have to keep in mind that Numeric and numarray provide much richer indexing and conversion handling feature than something like the array module, and that comes at some price in performance for small arrays. > > (This particular issue also makes me wonder if numarray would > > ever be a suitable substitute for the existing array module). > > Indeed. > > > What size graphs are you most concerned about as far as speed goes? > > I'm not sure. A wide range, I should imagine. But with only 100 nodes, > I'll get 10000 entries in the adjacency matrix, so perhaps it's > worthwile anyway? > That's right, a 100 nodes is where performance is being competitive, and if you feel you are worried about cases larger than that, then it isn't a problem. But if you are operating mostly on small graphs, then it may not be appropriate. The corresponding threshold for numeric would be on the order of 30 nodes. > > On the other hand, since numarray has much better support for index > > arrays, i.e., an array of indices that may be used to index another > > array of values, index array(s), value array pair may itself serve > > as a storage model for sparse arrays. 
> > That's an interesting idea, although I don't quite see how it would > help in the case of adjacency matrices. (You'd still need at least one > n**2 size matrix for n nodes, wouldn't you -- i.e. the index array... > Right?) > Right. > From magnus at hetland.org Thu Apr 18 08:48:17 2002 From: magnus at hetland.org (Magnus Lie Hetland) Date: Thu Apr 18 08:48:17 2002 Subject: [Numpy-discussion] Graphs in numarray? In-Reply-To: ; from perry@stsci.edu on Thu, Apr 18, 2002 at 11:21:46AM -0400 References: <20020418165403.E300@idi.ntnu.no> Message-ID: <20020418174733.A7072@idi.ntnu.no> Perry Greenfield : > [snip] > > In relation to what? Using dictionaries etc? Using the array module? > > No, in relation to operations on a 10K array. Basically, if an operation > on a 10K array spends half its time on set up, operations on a > 10 element array may only be twice as fast. I'm not making any claims > about speed in relation to any other data structure (other than Numeric) Aaah! Sorry to be so dense :) But the speedup in numeric between different sizes isn't as important to me as the speedup compared to other solutions (such as a dict-based one) of course... If a 10 element array is only twice as fast as a 10K array that's no problem if it's still faster than an alternative solution (though I'm sure it might not be...) The same goes for 10K element graphs -- the interesting point has to be whether it's faster than various alternatives (which I'm sure it is). > > [snip] > > > Before I go further, I need to find out if the preceeding has made > > > you gasp in horror or if the timescale is too slow for you to > > > accept. > > > > Hm. If you need 10000 elements before numarray pays off, I'm starting > > to wonder if I can use it for anything at all. :I > > > I didn't make clear that this threshold may improve in the future > (decrease). Right. Good. And -- on small graphs performance probably won't be much of a problem anyway. 
:) > The corresponding threshold for Numeric is probably > around 1000 to 2000 elements. (Likewise, operations on 10 element > Numeric arrays are only about twice as fast as for 1K arrays) > We may be able to eventually improve numarray performance to something > in that neighborhood (if we are luckly) but I would be surprised to > do much better (though if we use caching techniques, perhaps repeated > cases of arrays of identical shape, strides, type, etc. may run > much faster on subsequent operations). As usual, performance issues > can be complicated. You have to keep in mind that Numeric and numarray > provide much richer indexing and conversion handling feature than > something like the array module, and that comes at some price in > performance for small arrays. Of course. I guess an alternative (for the graph situation) could be to wrap the graphs with a common interface with various implementations, so that a solution more optimised for small graphs could be used (in a factory function) if the graph is small... (Not really an issue for me at the moment, but should be easy to do, I guess.) [snip] > > I'm not sure. A wide range, I should imagine. But with only 100 nodes, > > I'll get 10000 entries in the adjacency matrix, so perhaps it's > > worthwile anyway? > > > That's right, a 100 nodes is where performance is being competitive, > and if you feel you are worried about cases larger than that, then > it isn't a problem. Seems probable. For smaller problems I wouldn't be thinking in terms of numarray anyway, I think. (Just using plain Python dicts or something similar.) [snip] > > > On the other hand, since numarray has much better support for index > > > arrays, i.e., an array of indices that may be used to index another > > > array of values, index array(s), value array pair may itself serve > > > as a storage model for sparse arrays. > > > > That's an interesting idea, although I don't quite see how it would > > help in the case of adjacency matrices. 
(You'd still need at least one > > n**2 size matrix for n nodes, wouldn't you -- i.e. the index array... > > Right?) > > > Right. I might as well use a full adjacency matrix, then... So, the conclusion for now is that numarray may well be suited for working with relatively large (100+ nodes), relatively dense graphs. Now, the next interesting question is how much of the standard graph algorithms can be implemented with ufuncs and array operations (which I guess is the key to performance) and not straight for-loops... After all, some of them are quite sequential. -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From rob at pythonemproject.com Thu Apr 18 09:18:31 2002 From: rob at pythonemproject.com (rob) Date: Thu Apr 18 09:18:31 2002 Subject: [Numpy-discussion] Graphs in numarray? References: <20020418165403.E300@idi.ntnu.no> <20020418174733.A7072@idi.ntnu.no> Message-ID: <3CBEF151.C440DCE@pythonemproject.com> I'm sorry I missed the original post, but the topic is important for me. I use the lightweight 3d volume renderer Animabob for most everything. The interface code is in all of the FDTD programs in my website. You just unwind a 3d array and scale it to +/- 128, turn it into chararacters, and you have the input file. I wish Animabob could somehow be turned into a Python package, as in Windows you need Cygwin to run it. I've tried other 3d packages like OpenDX, and they seem to be huge albatrosses. -- ----------------------------- The Numeric Python EM Project www.pythonemproject.com From perry at stsci.edu Thu Apr 18 10:36:19 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Apr 18 10:36:19 2002 Subject: [Numpy-discussion] Graphs in numarray? In-Reply-To: <3CBEF151.C440DCE@pythonemproject.com> Message-ID: Behalf Of rob > > I'm sorry I missed the original post, but the topic is important for > me. I use the lightweight 3d volume renderer Animabob for most > everything. 
The interface code is in all of the FDTD programs in my > website. You just unwind a 3d array and scale it to +/- 128, turn it > into chararacters, and you have the input file. I wish Animabob could > somehow be turned into a Python package, as in Windows you need Cygwin > to run it. I've tried other 3d packages like OpenDX, and they seem to > be huge albatrosses. > It sound like you are trying to do something different than Magnus, but if what you are looking to scale floating or int data to byte size and apply some character mapping, numarray (or Numeric) should be able to do that very well. If that is all you want done, you might find either to be overkill though (if you already wrote a C extension to do so). Perry From perry at stsci.edu Thu Apr 18 10:39:03 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Apr 18 10:39:03 2002 Subject: [Numpy-discussion] Graphs in numarray? In-Reply-To: <20020418174733.A7072@idi.ntnu.no> Message-ID: > Now, the next interesting question is how much of the standard graph > algorithms can be implemented with ufuncs and array operations (which > I guess is the key to performance) and not straight for-loops... After > all, some of them are quite sequential. > I'm not sure about that (not being very familiar with graph algorithms). If you can give me some examples (perhaps off the mailing list) I could say whether they are easily cast into ufunc or library calls. Perry From paul at pfdubois.com Fri Apr 19 07:42:23 2002 From: paul at pfdubois.com (Paul F Dubois) Date: Fri Apr 19 07:42:23 2002 Subject: [Numpy-discussion] [ANN] Pyfort 7.1 Message-ID: <000101c1e7ae$38ac0d50$0a01a8c0@NICKLEBY> Pyfort 7.1 is available at sf.net/projects/pyfortran. Support for single Fortran characters was added. (Michiel de Hoon) Corrected behavior of scalars with C routines. (Michiel de Hoon) Pyfort is a tool for connecting Python to Fortran. 
Just to let you know, I'm working on a little tool to make it easier to set up simple projects so that you can build and install them with less effort. I hope to have that available soon. From rob at pythonemproject.com Fri Apr 19 09:00:02 2002 From: rob at pythonemproject.com (rob) Date: Fri Apr 19 09:00:02 2002 Subject: [Numpy-discussion] Icc compiled Python Message-ID: <3CC03E5A.9835FC0A@pythonemproject.com> There has been some discussion on the FreeBSD Ports list about an Icc compiled Python. It benchmarks much faster than the normal gcc compiled version. I'm wondering if anyone here knows anything about it. The discussion can be accessed via www.geocrawler.org/ FreeBSD/ freebsd-ports. Rob. -- ----------------------------- The Numeric Python EM Project www.pythonemproject.com From juenglin at informatik.uni-freiburg.de Sat Apr 20 09:59:13 2002 From: juenglin at informatik.uni-freiburg.de (Ralf Juengling) Date: Sat Apr 20 09:59:13 2002 Subject: [Numpy-discussion] NumPy initiated reference counting Message-ID: <1019321875.8067.141.camel@leto> I'm currently tinkering with the following problem and would like to hear your suggestions: Within a C module I define a new Python type 'IM' (representing an image). The indexing or slicing facilities of NumPy arrays are tailor-made for the manipulation of the internal data of its instances. Thus, I could provide a method 'asarray', which creates a properly typed array object 'a' referring to the data of an IM instance 'im': a = im.asarray() I could use PyArray_FromDimsAndData() to create the array instance. Unfortunately, this wouldn't work, since 'a' would not get notified about the death of 'im'. However, if I could prevent 'im' from being garbage collected before all array instances referring to its data are deleted, it should work. NumPy's array type uses a mechanism to prevent garbage collection of array instances if there are other instances that share data with it.
My idea was to use this mechanism, that is, to let the asarray method increment im's reference count and let a->base refer to im. Do you think this is a reliable approach? Thanks, Ralf -- -------------------------------------------------------------------------- Ralf Jüngling Institut für Informatik - Lehrstuhl für Mustererkennung & Bildverarbeitung Georges-Köhler-Allee Gebäude 52 Tel: +49-(0)761-203-8215 79110 Freiburg Fax: +49-(0)761-203-8262 -------------------------------------------------------------------------- From juenglin at informatik.uni-freiburg.de Sat Apr 20 12:22:51 2002 From: juenglin at informatik.uni-freiburg.de (Ralf Juengling) Date: Sat Apr 20 12:22:51 2002 Subject: [Numpy-discussion] qs on NumPy Message-ID: <1019330305.8067.158.camel@leto> Hi, I did not find a way in Python to check whether a Numeric array instance is a shared array or not. Could you confirm that there is no way? Is there work underway to make Numeric arrays subclassable? Regards, Ralf -- -------------------------------------------------------------------------- Ralf Jüngling Institut für Informatik - Lehrstuhl für Mustererkennung & Bildverarbeitung Georges-Köhler-Allee Gebäude 52 Tel: +49-(0)761-203-8215 79110 Freiburg Fax: +49-(0)761-203-8262 -------------------------------------------------------------------------- From mok at imsb.au.dk Tue Apr 23 04:21:03 2002 From: mok at imsb.au.dk (Morten Kjeldgaard) Date: Tue Apr 23 04:21:03 2002 Subject: [Numpy-discussion] Graphs in numarray? In-Reply-To: <20020417163133.F7565@idi.ntnu.no> Message-ID: > simple dict-based stuff -- more performance is needed). kjbuckets > looks like a nice alternative, as does the Boost Graph Library (not Kjbuckets is *very* nice indeed. It is a compact and very fast implementation. I don't see why you'd want to wrap this functionality into NumPy, which has a very well-defined scope and an efficient implementation. It would be a shame to bloat it with something which is discretely different.
I have modified kjbuckets so that it compiles and works with Python 2.x. You can pick it up at ftp://xray.imsb.au.dk/pub/birdwash/packages/Python2.1/SRPMS/python-kjbuckets-2.2-7.src.rpm Just do "rpm --rebuild" on it. I sent the patch to the original author, but it appears he is no longer maintaining it. Never mind, it works great. /Morten -- Morten Kjeldgaard | Phone : +45 89 42 50 26 Institute of Molecular and Structural Biology | Fax : +45 86 12 31 78 Aarhus University | Home : +45 86 18 81 80 Gustav Wieds Vej 10 C, DK-8000 Aarhus C, Denmark | http://imsb.au.dk/~mok From magnus at hetland.org Thu Apr 25 07:28:05 2002 From: magnus at hetland.org (Magnus Lie Hetland) Date: Thu Apr 25 07:28:05 2002 Subject: [Numpy-discussion] Graphs in numarray? In-Reply-To: ; from mok@imsb.au.dk on Tue, Apr 23, 2002 at 01:20:04PM +0200 References: <20020417163133.F7565@idi.ntnu.no> Message-ID: <20020425162734.B6821@idi.ntnu.no> Morten Kjeldgaard : > > > > simple dict-based stuff -- more performance is needed). kjbuckets > > looks like a nice alternative, as does the Boost Graph Library (not > > Kjbuckets is *very* nice indeed. Yes, I guess it is. But the project doesn't seem very active... > It is a compact and very fast implementation. I don't see why you'd > want to wrap this functionality into NumPy, which has a very > well-defined scope and an efficient implementation. It would be a > shame to bloat it with something which is discretely different. Yes, I guess you're right. There is no point in adding this sort of thing to numarray. My motivation for using numarray in my implementations was simply that it would mean that the necessary tools would be (or might be in the future ;) available in the standard distribution. > I have modified kjbuckets so that it compiles and works with Python 2.x. > You can pick it up at > > ftp://xray.imsb.au.dk/pub/birdwash/packages/Python2.1/SRPMS/python-kjbuckets-2.2-7.src.rpm > > Just do "rpm --rebuild" on it.
> > I sent the patch to the original author, but it appears he is no longer > maintaining it. Never mind, it works great. Well... I do sort of mind... I'm a bit wary of using unmaintained software. Not that I would never do it or anything... But I think it would be a bonus to use stuff that is being actively maintained and developed. But I guess I'll take another look at it. (Any idea where the "kj" prefix comes from, by the way?) > /Morten -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From magnus at hetland.org Thu Apr 25 07:43:10 2002 From: magnus at hetland.org (Magnus Lie Hetland) Date: Thu Apr 25 07:43:10 2002 Subject: [Numpy-discussion] Non-numeric arrays? Message-ID: <20020425164228.C6821@idi.ntnu.no> I can't find this in the docs (although I've heard it's mentioned there)... Is support for non-numeric arrays (such as character arrays or object pointer arrays) as in Numeric planned for numarray? (Perhaps even supported? My version might not be the most recent...) And what about subclasses of numeric types? E.g:
# numarray
>>> class foo(int): pass
>>> a = array(map(foo, xrange(10)))
[...]
TypeError: Expecting a python numeric type, got a foo
# Numeric
>>> class foo(int): pass
>>> a = array(map(foo, xrange(10)))
>>> type(a[0])
Neither behaviour seems very helpful -- I guess numarray's is cleaner... (Although in this case I think an object array could have been nice...) -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From jmiller at stsci.edu Thu Apr 25 07:53:04 2002 From: jmiller at stsci.edu (Todd Miller) Date: Thu Apr 25 07:53:04 2002 Subject: [Numpy-discussion] Non-numeric arrays? References: <20020425164228.C6821@idi.ntnu.no> Message-ID: <3CC81814.2010702@stsci.edu> Magnus Lie Hetland wrote: >I can't find this in the docs (although I've heard it's mentioned >there)... Is support for non-numeric arrays (such as character arrays >or object pointer arrays) as in Numeric planned for numarray?
(Perhaps > Check out chararray for character arrays. Check out recarray for arrays of fixed length structs. To make your own non-numeric arrays, subclass NDArray. > >even supported? My version might not be themost recent...) > >And what about subclasses of numeric types? > >E.g: > ># numarray > >>>>class foo(int): pass >>>>a = array(map(foo, xrange(10))) >>>> >[...] >TypeError: Expecting a python numeric type, got a foo > ># Numeric > >>>>class foo(int): pass >>>>a = array(map(foo, xrange(10))) >>>>tupe(a[0]) >>>> > > >Neither behaviour seems very helpful -- I guess numarray's is >cleaner... (Although in this case I think an object array could have >been nice...) > Object arrays fall into the *eventually* category: planned but not imminent. > > >-- >Magnus Lie Hetland The Anygui Project >http://hetland.org http://anygui.org > >_______________________________________________ >Numpy-discussion mailing list >Numpy-discussion at lists.sourceforge.net >https://lists.sourceforge.net/lists/listinfo/numpy-discussion > Todd -- Todd Miller jmiller at stsci.edu STSCI / SSG (410) 338 4576 From magnus at hetland.org Thu Apr 25 07:54:02 2002 From: magnus at hetland.org (Magnus Lie Hetland) Date: Thu Apr 25 07:54:02 2002 Subject: [Numpy-discussion] Non-numeric arrays? In-Reply-To: <20020425164228.C6821@idi.ntnu.no>; from magnus@hetland.org on Thu, Apr 25, 2002 at 04:42:28PM +0200 References: <20020425164228.C6821@idi.ntnu.no> Message-ID: <20020425165304.D6821@idi.ntnu.no> Magnus Lie Hetland : [snip] Just a quick explanation for why I'm interested in this... I've got a two-dimensional array of ints (or bytes, actually), that I would like to convert to a delimited string (e.g. comma-separated). This works in Numeric: >>> from string import letters >>> alphabet = array(letters) >>> data = arange(24) # E.g... 
>>> data.shape = 6, 4 >>> fields = sum(take(alphabet, data), 1) >>> ','.join(fields) 'abcd,efgh,ijkl,mnop,qrst,uvwx' -- Magnus Lie Hetland The Anygui Project http://hetland.org http://anygui.org From perry at stsci.edu Thu Apr 25 07:57:05 2002 From: perry at stsci.edu (Perry Greenfield) Date: Thu Apr 25 07:57:05 2002 Subject: [Numpy-discussion] Non-numeric arrays? In-Reply-To: <20020425164228.C6821@idi.ntnu.no> Message-ID: [I see Todd has already answered this, the following might add a little more detail] > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net > [mailto:numpy-discussion-admin at lists.sourceforge.net]On Behalf Of Magnus > Lie Hetland > Sent: Thursday, April 25, 2002 10:42 AM > To: Numpy-discussion > Subject: [Numpy-discussion] Non-numeric arrays? > > > I can't find this in the docs (although I've heard it's mentioned > there)... Is support for non-numeric arrays (such as character arrays > or object pointer arrays) as in Numeric planned for numarray? (Perhaps > even supported? My version might not be themost recent...) > Yes, in fact there is a character array class included with numarray (but not documented, I believe. For the moment, you'll have to deal with the source. We developed it for use with our I/O library but it seemed to be of general enough use to include with numarray. We also plan to support arrays of Python objects. There are various ways that this could be done and we ought to discuss how it should be done (perhaps multiple ways). But the underlying machinery certainly will support it. > And what about subclasses of numeric types? > > E.g: > > # numarray > >>> class foo(int): pass > >>> a = array(map(foo, xrange(10))) > [...] > TypeError: Expecting a python numeric type, got a foo > > # Numeric > >>> class foo(int): pass > >>> a = array(map(foo, xrange(10))) > >>> tupe(a[0]) > > > Neither behaviour seems very helpful -- I guess numarray's is > cleaner... 
(Although in this case I think an object array could have > been nice...) > We haven't had much time to think about how we deal with numeric subclasses. Certainly one would not use these for efficiency; I can't see any simple way of making such things go fast. But it may be possible to have such things work with numarray ufuncs and other numeric operations in some automatic way. I'd have to think about that. It's not high on the priority list at the moment. (Speaking of which I may post in a few days). Thanks, Perry > From hinsen at cnrs-orleans.fr Thu Apr 25 08:35:06 2002 From: hinsen at cnrs-orleans.fr (Konrad Hinsen) Date: Thu Apr 25 08:35:06 2002 Subject: [Numpy-discussion] Graphs in numarray? In-Reply-To: <20020425162734.B6821@idi.ntnu.no> References: <20020417163133.F7565@idi.ntnu.no> <20020425162734.B6821@idi.ntnu.no> Message-ID: Magnus Lie Hetland writes: > (Any idea where the "kj" prefix comes from, by the way?) I asked Aaron Watters about this. The answer: k and j are the initials of his children. Konrad. From jasper at peak.org Mon Apr 29 03:14:04 2002 From: jasper at peak.org (Jasper Phillips) Date: Mon Apr 29 03:14:04 2002 Subject: [Numpy-discussion] Multiple Linear Regression? Message-ID: <200204291013.DAA32745@spock.peak.org> I'm helping my wife with programming for her economics thesis, which needs to calculate a "Multiple Linear Regression" on her data. Does anyone know of any (preferably though not necessarily free) software that can do this? I'm working in Python, but not limited to it as I can relatively freely access other languages. I'm still looking for a library written in Python, but haven't had any luck. My second thought was Matlab, but looking over the Matlab website, I couldn't find anything like this by a name I recognize. It looks like I might be able to construct something out of a combination of Sparse Matrices and Linear Regression, or perhaps the stuff for overdetermined Linear Equations?
Another option may be LAPACK routines, but I'm not familiar with those.

Does anyone here have any experience with this kind of stuff? Is there a
better place to ask? I'm about ready to take a shot at writing something
myself, but I'd really rather avoid this if it's been done before.

-Jasper

From hinsen at cnrs-orleans.fr Mon Apr 29 05:40:03 2002
From: hinsen at cnrs-orleans.fr (Konrad Hinsen)
Date: Mon Apr 29 05:40:03 2002
Subject: [Numpy-discussion] Multiple Linear Regression?
In-Reply-To: <200204291013.DAA32745@spock.peak.org>
References: <200204291013.DAA32745@spock.peak.org>
Message-ID:

Jasper Phillips writes:

> I'm still looking for a library written in Python, but haven't had any luck.

Numerical Python has all the basic stuff, but you need to read in and
arrange the data yourself. All linear regression problems ultimately
become least-squares problems for a system of linear equations, which
can be solved using LinearAlgebra.linear_least_squares.

Konrad.

From Alexandre.Fayolle at logilab.fr Mon Apr 29 06:20:03 2002
From: Alexandre.Fayolle at logilab.fr (Alexandre)
Date: Mon Apr 29 06:20:03 2002
Subject: [Numpy-discussion] Multiple Linear Regression?
In-Reply-To: <200204291013.DAA32745@spock.peak.org>
References: <200204291013.DAA32745@spock.peak.org>
Message-ID: <20020429131937.GE30347@orion.logilab.fr>

On Mon, Apr 29, 2002 at 03:13:44AM -0700, Jasper Phillips wrote:
> I'm helping my wife with programming for her economics thesis, which needs
> to calculate a "Multiple Linear Regression" on her data.
>
> Does anyone know of any (preferably though not necessarily free) software
> that can do this? I'm working in Python, but not limited to it as I
> can relatively freely access other languages.
>
> I'm still looking for a library written in Python, but haven't had any luck.
>

I'm helping my wife with her History PhD, and have to deal with similar
stuff. I found R to be a very useful environment for statistical
computations.
R is a free software clone of S-plus, which is to statistics
what Matlab is to linear algebra and automation.

Pros:
 - programming environment, with a high level programming language
 - extensive statistical and linalg library (using C and FORTRAN code)
 - lots of third party code available, covering a very wide range of
   situations
 - Python bindings available if you don't want to learn the Scheme-like
   language
 - Tons of documentation available
 - Excellent support through the mailing lists
 - GPL'd
 - Tons of ways to import data (ranging from CSV files to ODBC queries)
 - 2 printed books available, at Springer Verlag
 - postscript, png, wmf, X outputs, with precise control of the layout
   of the graphs and figures available for a nice colourful thesis

Cons:
 - the language can be a bit weird at times (it took me some time to get
   used to '.' being used instead of '_' and vice versa in the scoping
   and variable naming), but you can use Python to script R, thanks to
   RPython
 - it's quite a big piece of code, with a rather steep learning curve
   and you need time to get inside it
 - the documentation is aimed at professional statisticians. I had to
   dig back in my statistics courses and to buy a couple of books on
   that topic for the software to become really useful. Asking newbie
   statistician questions on the r-help mailing list is off-topic
 - the Springer Verlag books are very expensive (Modern Applied
   Statistics with S-plus costs something like 70 euros), but they are
   great

So you have a powerful tool available at your fingertips, designed to do
precisely what you need. I think it's worth taking the time to look at
it carefully. The more I get to understand the topic, the more ideas I get
for new ways of exploring the data of my wife's PhD.

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).
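
For readers who want to see what the least-squares formulation mentioned
earlier in this thread looks like in code, here is a minimal sketch in
pure Python. It is not the Numeric implementation: `fit_linear_regression`
is a hypothetical helper that solves the normal equations by Gaussian
elimination, standing in for LinearAlgebra.linear_least_squares, and the
sample data is made up for illustration.

```python
def fit_linear_regression(rows, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows    -- list of predictor lists, one per observation (hypothetical input)
    targets -- list of observed responses, same length as rows
    Returns [intercept, coef_1, ..., coef_k].
    """
    # Prepend a constant 1.0 column so the first coefficient is the intercept.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    n_params = len(design[0])

    # Build the augmented matrix [X^T X | X^T y].
    aug = []
    for i in range(n_params):
        row = [sum(r[i] * r[j] for r in design) for j in range(n_params)]
        row.append(sum(r[i] * y for r, y in zip(design, targets)))
        aug.append(row)

    # Forward elimination with partial pivoting.
    for col in range(n_params):
        pivot = max(range(col, n_params), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n_params):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n_params + 1):
                aug[r][c] -= factor * aug[col][c]

    # Back substitution.
    coeffs = [0.0] * n_params
    for i in range(n_params - 1, -1, -1):
        known = sum(aug[i][j] * coeffs[j] for j in range(i + 1, n_params))
        coeffs[i] = (aug[i][n_params] - known) / aug[i][i]
    return coeffs


# Made-up data generated from y = 1 + 2*x1 + 3*x2 with no noise,
# so the fit should recover roughly [1, 2, 3].
observations = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
responses = [1, 3, 4, 6, 8]
coefficients = fit_linear_regression(observations, responses)
```

With real data the normal equations can be poorly conditioned, so a
library least-squares routine is probably the better choice; the sketch
only shows how an intercept and several predictors end up as one system
of linear equations.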

From Alexandre.Fayolle at logilab.fr Mon Apr 29 06:28:08 2002
From: Alexandre.Fayolle at logilab.fr (Alexandre)
Date: Mon Apr 29 06:28:08 2002
Subject: [Numpy-discussion] Multiple Linear Regression?
In-Reply-To: <20020429131937.GE30347@orion.logilab.fr>
References: <200204291013.DAA32745@spock.peak.org> <20020429131937.GE30347@orion.logilab.fr>
Message-ID: <20020429132741.GF30347@orion.logilab.fr>

On Mon, Apr 29, 2002 at 03:19:37PM +0200, Alexandre wrote:
> I'm helping my wife with her History PhD, and have to deal with similar
> stuff. I found R to be a very useful environment for statistical
> computations. R is a free software clone of S-plus, which is to statistics
> what Matlab is to linear algebra and automation.

Whoops, I forgot to add a couple of URLs:

The R project website
http://www.r-project.org/

The Comprehensive R Archive Network (CRAN)
http://cran.r-project.org/

Using R from Python
http://rpy.sourceforge.net/

Using R from Python and Python from R (coding R extensions in Python)
http://www.omegahat.org/RSPython/

Cheers,

Alexandre Fayolle
--
LOGILAB, Paris (France).
http://www.logilab.com http://www.logilab.fr http://www.logilab.org
Narval, the first software agent available as free software (GPL).

From cavallo at kip.uni-heidelberg.de Mon Apr 29 10:10:20 2002
From: cavallo at kip.uni-heidelberg.de (cavallo at kip.uni-heidelberg.de)
Date: Mon Apr 29 10:10:20 2002
Subject: [Numpy-discussion] kdfio, 1.1.1
Message-ID:

Hi,
here is the URL of the latest version of kdfio, a Khoros/Cantata KDF file
importer: nothing special, but it seems to be working now, at least for
me ;-)

You can find it at:
http://kdfio.sourceforge.net

This is my (very) small contribution to Numerical Python: inside, I
plugged in a way to modularize the code (and to write some skeleton code
semi-automatically) that could speed up writing new code a little bit.

Before giving a full announcement on SourceForge I will wait a little
bit, just to see whether any bugs turn up.
Feel free to use/change/make what you want,
thanks to all,
antonio cavallo

PS: Khoros is available at http://www.khoral.com and it is not a free
program: there is just a free student version.

From jmiller at stsci.edu Mon Apr 29 10:14:07 2002
From: jmiller at stsci.edu (Todd Miller)
Date: Mon Apr 29 10:14:07 2002
Subject: [Numpy-discussion] ANN: Numarray-0.3.3
Message-ID: <3CCD7F49.5030809@stsci.edu>

Numarray 0.3.3
---------------------------------

Numarray is an array processing package designed to efficiently
manipulate large multi-dimensional arrays. Numarray is modelled after
Numeric and features C code generated from Python template scripts,
the capacity to operate directly on arrays in files, and improved type
promotions.

Numarray-0.3.3 features improved support for arrays of complex numbers,
re-implementing complex types using generated code. In addition to being
faster, the new complex ufuncs are better integrated with the numarray
type system, so operations between numarrays and complex scalars now
work properly. This release also fixes a problem experienced by RedHat
Linux users installing numarray from source.

WHERE
-----------

Numarray-0.3.3 Windows executable installers and source code tarball
are here:

http://sourceforge.net/project/showfiles.php?group_id=1369

Numarray is hosted by Source Forge in the same project which hosts Numeric:

http://sourceforge.net/projects/numpy/

The web page for Numarray information is at:

http://stsdas.stsci.edu/numarray/index.html

Trackers for Numarray Bugs, Feature Requests, Support, and Patches are
at the Source Forge project for NumPy at:

http://sourceforge.net/tracker/?group_id=1369

REQUIREMENTS
--------------------------

numarray-0.3.3 requires Python 2.0 or greater.

AUTHORS, LICENSE
------------------------------

Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC
Hsu, Paul Barrett, Phil Hodge at the Space Telescope Science Institute.
Thanks go to Jochen Kupper of the University of North Carolina for his
work on Numarray and for porting the Numarray manual to TeX format.

Numarray is made available under a BSD-style License. See LICENSE.txt
in the source distribution for details.

--
Todd Miller jmiller at stsci.edu

From haase at msg.ucsf.edu Mon Apr 29 11:18:15 2002
From: haase at msg.ucsf.edu (Sebastian Haase)
Date: Mon Apr 29 11:18:15 2002
Subject: [Numpy-discussion] unsigned short support in NumPy
Message-ID:

Hi all,
I'm _very_ new to NumPy.
I was interested in using it for our project, where we acquire data from a
CCD camera.

The Problem: Each pixel in the image is a 16 bit gray value.
             From what I read in the documentation, there is only 8 bit
             (unsigned integer) support in numpy (or should I say
             numericarray).

Are there plans to add "unsigned short" (16 bit) support?
How much effort would that be?

Regards,
Sebastian Haase

--
                                 _\\|//_
                                (' O-O ')
------------------------------ooO-(_)-Ooo--------------------------------------
Sebastian Haase
University of California, San Francisco
(415)502-4316

From rick at bioinformatics.org Mon Apr 29 11:35:26 2002
From: rick at bioinformatics.org (Rick Ree)
Date: Mon Apr 29 11:35:26 2002
Subject: [Numpy-discussion] testing Numeric.array([0])
Message-ID: <1020105268.12239.27.camel@loco.ucdavis.edu>

Should Numeric.array([0]) test false? This seems counterintuitive, and
is not the case for the regular python array module.

This recently caused a subtle bug for me when I wanted to find the
indices of an array that met a condition. If only the first element met
the condition, the result was array([0]) -- a non-empty result that
evaluated false.

If this is the intended behavior, can someone tell me the reason?

thanks,
Rick

From perry at stsci.edu Mon Apr 29 13:31:14 2002
From: perry at stsci.edu (Perry Greenfield)
Date: Mon Apr 29 13:31:14 2002
Subject: [Numpy-discussion] unsigned short support in NumPy
In-Reply-To:
Message-ID:

> Sebastian Haase writes:
>
> Hi all,
> I'm _very_ new to NumPy.
> I was interested in using it for our project, where we acquire
> data from a CCD camera.
>
> The Problem: Each pixel in the image is a 16 bit gray value.
>              From what I read in the documentation, there is only 8 bit
>              (unsigned integer) support in numpy (or should I say
>              numericarray).
>
> Are there plans to add "unsigned short" (16 bit) support?
> How much effort would that be?
>
There is a reimplementation of Numeric that we are doing that
does support unsigned ints (Unsigned Int8, Unsigned Int16 for
now). The project is not mature, but a lot of basic capability
exists now. You'll have to look it over to judge if it is usable
for you now. The new version is called numarray
( http://stsdas.stsci.edu/numarray )

(btw, we acquire data from CCD cameras as well ;-)

Perry

From tchur at optushome.com.au Mon Apr 29 13:46:39 2002
From: tchur at optushome.com.au (Tim Churches)
Date: Mon Apr 29 13:46:39 2002
Subject: [Numpy-discussion] Multiple Linear Regression?
References: <200204291013.DAA32745@spock.peak.org>
Message-ID: <3CCDBB6C.8A983A5C@optushome.com.au>

Jasper Phillips wrote:
>
> I'm helping my wife with programming for her economics thesis, which needs
> to calculate a "Multiple Linear Regression" on her data.
>
> Does anyone know of any (preferably though not necessarily free) software
> that can do this? I'm working in Python, but not limited to it as I
> can relatively freely access other languages.

Jasper,

Use R (a free implementation of S). See http://www.r-project.org

If you are managing your data in Python and NumPy, you can "embed" R in
Python and transparently send data to it using Walter Moreira's
wonderful RPy module - see http://rpy.sf.net

Tim C
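
On the earlier question in this digest about truth-testing an index
array: one way to sidestep the pitfall, whatever a given array type
decides its truth value should mean, is to test the length explicitly.
The following is a minimal sketch using the standard library array
module rather than Numeric; `indices_where` is a hypothetical helper
written for illustration and is not part of Numeric or numarray.

```python
from array import array


def indices_where(values, predicate):
    """Return an array holding the positions whose value satisfies predicate."""
    return array("l", [i for i, v in enumerate(values) if predicate(v)])


# Only the first element matches, so the result holds the single index 0.
hits = indices_where([7, 1, 2], lambda v: v > 5)

# Nothing matches, so the result is an empty array.
misses = indices_where([1, 2, 3], lambda v: v > 5)

# "Did anything match?" is a question about length, not about the values,
# so ask it that way instead of relying on the truth value of the array.
found_hits = len(hits) > 0
found_misses = len(misses) > 0
```

Writing the check as a length test keeps the code correct even when the
only matching index happens to be zero, which is exactly the case that
caused the bug described above.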