From falted at pytables.org Thu Jul 1 01:51:39 2004 From: falted at pytables.org (Francesc Alted) Date: Thu Jul 1 01:51:39 2004 Subject: [Numpy-discussion] Speeding up wxPython/numarray In-Reply-To: <1088632048.7526.204.camel@halloween.stsci.edu> References: <40E31B31.7040105@cox.net> <1088632048.7526.204.camel@halloween.stsci.edu> Message-ID: <200407011048.01929.falted@pytables.org> On Wednesday, 30 June 2004 at 23:47, Todd Miller wrote: > > There were a couple of other things I tried that resulted in additional > > small speedups, but the tactics I used were too horrible to reproduce > > here. The main one of interest is that all of the calls to > > NA_updateDataPtr seem to burn some time. However, I don't have any idea > > what one could do about that. > > Francesc Alted had the same comment about NA_updateDataPtr a while ago. > I tried to optimize it then but didn't get anywhere. NA_updateDataPtr() > should be called at most once per extension function (more is > unnecessary but not harmful) but needs to be called at least once as a > consequence of the way the buffer protocol doesn't give locked > pointers. FYI I'm still refusing to call NA_updateDataPtr() in a specific part of my code that requires as much speed as possible. It works just fine from numarray 0.5 on (numarray 0.4 gave a segmentation fault on that). However, Todd already warned me about that and told me that this is unsafe. Nevertheless, I'm using the optimization for read-only purposes (i.e. they are not accessible to users) over numarray objects, and that *seems* to be safe (at least I have not had a single problem since numarray 0.5). I know that I'm walking on the cutting edge, but life is dangerous anyway ;). By the way, that optimization gives me a 70% improvement in element access to NumArray elements. It would be very nice if you can finally achieve additional performance with your recent bet :). 
Good luck!, -- Francesc Alted From haase at msg.ucsf.edu Thu Jul 1 09:06:24 2004 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Thu Jul 1 09:06:24 2004 Subject: [Numpy-discussion] Numarray header PEP In-Reply-To: <20040701053355.M99698@grenoble.cnrs.fr> References: <1088451653.3744.200.camel@localhost.localdomain> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> Message-ID: <200407010904.25498.haase@msg.ucsf.edu> On Wednesday 30 June 2004 11:33 pm, gerard.vermeulen at grenoble.cnrs.fr wrote: > On 30 Jun 2004 17:54:19 -0400, Todd Miller wrote > > > So... you use the "meta" code to provide package specific ordinary > > (not-macro-fied) functions to keep the different versions of the > > Present() and isArray() macros from conflicting. > > > > It would be nice to have a standard approach for using the same > > "extension enhancement code" for both numarray and Numeric. The PEP > > should really be expanded to provide an example of dual support for one > > complete and real function, guts and all, so people can see the process > > end-to-end; Something like a simple arrayprint. That process needs > > to be refined to remove as much tedium and duplication of effort as > > possible. The idea is to make it as close to providing one > > implementation to support both array packages as possible. I think it's > > important to illustrate how to partition the extension module into > > separate compilation units which correctly navigate the dual > > implementation mine field in the easiest possible way. > > > > It would also be nice to add some logic to the meta-functions so that > > which array package gets used is configurable. We did something like > > that for the matplotlib plotting software at the Python level with > > the "numerix" layer, an idea I think we copied from Chaco. 
> > The kind of dispatch I think might be good to support configurability looks like this:
> >
> > PyObject *
> > whatsThis(PyObject *dummy, PyObject *args)
> > {
> >     PyObject *result, *what = NULL;
> >     if (!PyArg_ParseTuple(args, "O", &what))
> >         return 0;
> >     switch(PyArray_Which(what)) {
> >     USE_NUMERIC:
> >         result = Numeric_whatsThis(what); break;
> >     USE_NUMARRAY:
> >         result = Numarray_whatsThis(what); break;
> >     USE_SEQUENCE:
> >         result = Sequence_whatsThis(what); break;
> >     }
> >     Py_INCREF(Py_None);
> >     return Py_None;
> > }
> >
> > In the above, I'm picturing a separate .c file for Numeric_whatsThis and for Numarray_whatsThis. It would be nice to streamline that to one .c and a process which somehow (simply) produces both functions.
> >
> > Or, ideally, the above would be done more like this:
> >
> > PyObject *
> > whatsThis(PyObject *dummy, PyObject *args)
> > {
> >     PyObject *result, *what = NULL;
> >     if (!PyArg_ParseTuple(args, "O", &what))
> >         return 0;
> >     switch(Numerix_Which(what)) {
> >     USE_NUMERIX:
> >         result = Numerix_whatsThis(what); break;
> >     USE_SEQUENCE:
> >         result = Sequence_whatsThis(what); break;
> >     }
> >     Py_INCREF(Py_None);
> >     return Py_None;
> > }
> >
> > Here, a common Numerix implementation supports both numarray and Numeric from a single simple .c. The extension module would do "#include numerix/arrayobject.h" and "import_numerix()" and otherwise just call PyArray_* functions.
> >
> > The current stumbling block is that numarray is not binary compatible with Numeric... so numerix in C falls apart. I haven't analyzed every symbol and struct to see if it is really feasible... but it seems like it is *almost* feasible, at least for typical usage.
> >
> > So, in a nutshell, I think the dual implementation support you demoed is important and we should work up an example and kick it around to make sure it's the best way we can think of doing it. 
> > Then we should add a section to the PEP describing dual support as well. > > I would never apply numarray code to Numeric arrays and the inverse. It > looks dangerous and I do not know if it is possible. The first thing > coming to mind is that numarray and Numeric arrays refer to different type > objects (this is what my pep module uses to differentiate them). So, even > if numarray and Numeric are binary compatible, any 'alien' code referring > to the 'Python-standard part' of the type objects may lead to surprises. A > PEP proposing hacks will raise eyebrows at least. > > Secondly, most people use Numeric *or* numarray and not both. > > So, I prefer: Numeric In => Numeric Out or Numarray In => Numarray Out > (NINO) Of course, Numeric or numarray output can be a user option if NINO > does not apply. (explicit safe conversion between Numeric and numarray is > possible if really needed). > > I'll try to flesh out the demo with real functions in the way you indicated > (going as far as I consider safe). > > The problem of coding the Numeric (or numarray) functions in more than > a single source file has also been addressed. > > It may take 2 weeks because I am off to a conference next week. > > Regards -- Gerard Hi all, first, I would like to state that I don't understand much of this discussion; so the only comment I wanted to make is that IF this were possible, to make (C/C++) code that can live with both Numeric and numarray, then I think it would be used more and more - think: transition phase !! (e.g. someone could start making the FFTW part of scipy numarray friendly without having to switch everything at once [hint ;-)] ) These were just my 2 cents. 
Cheers, Sebastian Haase From jmiller at stsci.edu Thu Jul 1 09:44:13 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 1 09:44:13 2004 Subject: [Numpy-discussion] Numarray header PEP In-Reply-To: <20040701053355.M99698@grenoble.cnrs.fr> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> Message-ID: <1088700210.14402.17.camel@halloween.stsci.edu> On Thu, 2004-07-01 at 02:33, gerard.vermeulen at grenoble.cnrs.fr wrote: > On 30 Jun 2004 17:54:19 -0400, Todd Miller wrote > > > > So... you use the "meta" code to provide package specific ordinary > > (not-macro-fied) functions to keep the different versions of the > > Present() and isArray() macros from conflicting. > > > > It would be nice to have a standard approach for using the same > > "extension enhancement code" for both numarray and Numeric. The PEP > > should really be expanded to provide an example of dual support for one > > complete and real function, guts and all, so people can see the process > > end-to-end; Something like a simple arrayprint. That process needs > > to be refined to remove as much tedium and duplication of effort as > > possible. The idea is to make it as close to providing one > > implementation to support both array packages as possible. I think it's > > important to illustrate how to partition the extension module into > > separate compilation units which correctly navigate the dual > > implementation mine field in the easiest possible way. > > > > It would also be nice to add some logic to the meta-functions so that > > which array package gets used is configurable. We did something like > > that for the matplotlib plotting software at the Python level with > > the "numerix" layer, an idea I think we copied from Chaco. 
> > The kind of dispatch I think might be good to support configurability looks like this:
> >
> > PyObject *
> > whatsThis(PyObject *dummy, PyObject *args)
> > {
> >     PyObject *result, *what = NULL;
> >     if (!PyArg_ParseTuple(args, "O", &what))
> >         return 0;
> >     switch(PyArray_Which(what)) {
> >     USE_NUMERIC:
> >         result = Numeric_whatsThis(what); break;
> >     USE_NUMARRAY:
> >         result = Numarray_whatsThis(what); break;
> >     USE_SEQUENCE:
> >         result = Sequence_whatsThis(what); break;
> >     }
> >     Py_INCREF(Py_None);
> >     return Py_None;
> > }
> >
> > In the above, I'm picturing a separate .c file for Numeric_whatsThis and for Numarray_whatsThis. It would be nice to streamline that to one .c and a process which somehow (simply) produces both functions.
> >
> > Or, ideally, the above would be done more like this:
> >
> > PyObject *
> > whatsThis(PyObject *dummy, PyObject *args)
> > {
> >     PyObject *result, *what = NULL;
> >     if (!PyArg_ParseTuple(args, "O", &what))
> >         return 0;
> >     switch(Numerix_Which(what)) {
> >     USE_NUMERIX:
> >         result = Numerix_whatsThis(what); break;
> >     USE_SEQUENCE:
> >         result = Sequence_whatsThis(what); break;
> >     }
> >     Py_INCREF(Py_None);
> >     return Py_None;
> > }
> >
> > Here, a common Numerix implementation supports both numarray and Numeric from a single simple .c. The extension module would do "#include numerix/arrayobject.h" and "import_numerix()" and otherwise just call PyArray_* functions.
> >
> > The current stumbling block is that numarray is not binary compatible with Numeric... so numerix in C falls apart. I haven't analyzed every symbol and struct to see if it is really feasible... but it seems like it is *almost* feasible, at least for typical usage.
> >
> > So, in a nutshell, I think the dual implementation support you demoed is important and we should work up an example and kick it around to make sure it's the best way we can think of doing it. 
> > Then we should add a section to the PEP describing dual support as well. > > > I would never apply numarray code to Numeric arrays and the inverse. It looks > dangerous and I do not know if it is possible. I think that's definitely the marching orders for now... but you gotta admit, it would be nice. > The first thing coming > to mind is that numarray and Numeric arrays refer to different type objects > (this is what my pep module uses to differentiate them). So, even if > numarray and Numeric are binary compatible, any 'alien' code referring to > the 'Python-standard part' of the type objects may lead to surprises. > A PEP proposing hacks will raise eyebrows at least. I'm a little surprised it took someone to talk me out of it... I'll just concede that this was probably a bad idea. > Secondly, most people use Numeric *or* numarray and not both. A class of question which will arise for developers is this: "X works with Numeric, but X doesn't work with numarray." The reverse also happens occasionally. For this reason, being able to choose would be nice for developers. > So, I prefer: Numeric In => Numeric Out or Numarray In => Numarray Out (NINO) > Of course, Numeric or numarray output can be a user option if NINO does not > apply. When I first heard it, I thought NINO was a good idea, with the limitation that it doesn't apply when a function produces an array without consuming any. But... there is another problem with NINO that Perry Greenfield pointed out: with multiple arguments, there can be a mix of array types. For this reason, it makes sense to be able to coerce all the inputs to a particular array package. This form might look more like:

switch(PyArray_Which()) {
case USE_NUMERIC:
    result = Numeric_doit(a1, a2, a3); break;
case USE_NUMARRAY:
    result = Numarray_doit(a1, a2, a3); break;
case USE_SEQUENCE:
    result = Sequence_doit(a1, a2, a3); break;
}

One last thing: I think it would be useful to be able to drive the code into sequence mode with arrays. 
This would enable easy benchmarking of the performance improvement. > (explicit safe conversion between Numeric and numarray is possible > if really needed). > >I'll try to flesh out the demo with real functions in the way you indicated > (going as far as I consider safe). > > The problem of coding the Numeric (or numarray) functions in more than > a single source file has also been addressed. > > It may take 2 weeks because I am off to a conference next week. Excellent. See you in a couple weeks. Regards, Todd From jmiller at stsci.edu Thu Jul 1 09:59:01 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 1 09:59:01 2004 Subject: [Numpy-discussion] Speeding up wxPython/numarray In-Reply-To: <40E3462A.9080303@cox.net> References: <40E31B31.7040105@cox.net> <1088632048.7526.204.camel@halloween.stsci.edu> <40E3462A.9080303@cox.net> Message-ID: <1088701077.14402.20.camel@halloween.stsci.edu> On Wed, 2004-06-30 at 19:00, Tim Hochberg wrote: > By this do you mean the "#if PY_VERSION_HEX >= 0x02030000 " that is > wrapped around _ndarray_item? If so, I believe that it *is* getting > compiled, it's just never getting called. > > What I think is happening is that the class NumArray inherits its > sq_item from PyClassObject. In particular, I think it picks up > instance_item from Objects/classobject.c. This appears to be fairly > expensive and, I think, ends up calling tp_as_mapping->mp_subscript. > Thus, _ndarray's sq_item slot never gets called. All of this is pretty > iffy since I don't know this stuff very well and I didn't trace it all > the way through. However, it explains what I've seen thus far. > > This is why I ended up using the horrible hack. I'm resetting NumArray's > sq_item to point to _ndarray_item instead of instance_item. I believe > that access at the python level goes through mp_subscript, so it > shouldn't be affected, and only objects at the C level should notice and > they should just get the faster sq_item. 
You will notice that there are > an awful lot of I thinks in the above paragraphs though... Ugh... Thanks for explaining this. > >>I then optimized _ndarray_item (code > >>at end). This halved the execution time of my arbitrary benchmark. This > >>trick may have horrible, unforeseen consequences so use at your own risk. > >> > >> > > > >Right now the sq_item hack strikes me as somewhere between completely > >unnecessary and too scary for me! Maybe if python-dev blessed it. > > > > > Yes, very scary. And it occurs to me that it will break subclasses of > NumArray if they override __getitem__. When these subclasses are > accessed from C they will see nd_array's sq_item instead of the > overridden getitem. However, I think I also know how to fix it. But > it does point out that it is very dangerous and there are probably dark > corners of which I'm unaware. Asking on Python-List or PyDev would > probably be a good idea. > > The nonscary, but painful, fix would be to rewrite NumArray in C. Non-scary to whom? > >This optimization looks good to me. > > > > > Unfortunately, I don't think the optimization to sq_item will affect > much since NumArray appears to override it with > > >>Finally I commented out the __del__ method in numarraycore. This resulted > >>in an additional speedup of 64% for a total speed up of 240%. Still not > >>close to 10x, but a large improvement. However, this is obviously not > >>viable for real use, but it's enough of a speedup that I'll try to see > >>if there's any way to move the shadow stuff back to tp_dealloc. > >> > >> > > > >FYI, the issue with tp_dealloc may have to do with which mode Python is > >compiled in, --with-pydebug, or not. One approach which seems like it > >ought to work (just thought of this!) is to add an extra reference in C > >to the NumArray instance __dict__ (from NumArray.__init__ and stashed > >via a new attribute in the PyArrayObject struct) and then DECREF it as > >the last part of the tp_dealloc. 
> > > > > That sounds promising. I looked at this some, and while INCREFing __dict__ may be the right idea, I forgot that there *is no* Python NumArray.__init__ anymore. So the INCREF needs to be done in C without doing any getattrs; this seems to mean calling a private _PyObject_GetDictPtr function to get a pointer to the __dict__ slot which can be dereferenced to get the __dict__. > [SNIP] > > > > >Well, be picking out your beer. > > > > > I was only about half right, so I'm not sure I qualify... We could always reduce your wages to a 12-pack... Todd From gerard.vermeulen at grenoble.cnrs.fr Thu Jul 1 11:39:08 2004 From: gerard.vermeulen at grenoble.cnrs.fr (Gerard Vermeulen) Date: Thu Jul 1 11:39:08 2004 Subject: [Numpy-discussion] Numarray header PEP In-Reply-To: <1088700210.14402.17.camel@halloween.stsci.edu> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <1088700210.14402.17.camel@halloween.stsci.edu> Message-ID: <20040701203739.31f80e02.gerard.vermeulen@grenoble.cnrs.fr> On 01 Jul 2004 12:43:31 -0400 Todd Miller wrote: > A class of question which will arise for developers is this: "X works > with Numeric, but X doesn't work with numarray." The reverse also > happens occasionally. For this reason, being able to choose would be > nice for developers. > > > So, I prefer: Numeric In => Numeric Out or Numarray In => Numarray Out (NINO) > > Of course, Numeric or numarray output can be a user option if NINO does not > > apply. > > When I first heard it, I thought NINO was a good idea, with the > limitation that it doesn't apply when a function produces an array > without consuming any. But... 
there is another problem with NINO that > Perry Greenfield pointed out: with multiple arguments, there can be a > mix of array types. For this reason, it makes sense to be able to > coerce all the inputs to a particular array package. This form might > look more like: > > switch(PyArray_Which()) { > case USE_NUMERIC: > result = Numeric_doit(a1, a2, a3); break; > case USE_NUMARRAY: > result = Numarray_doit(a1, a2, a3); break; > case USE_SEQUENCE: > result = Sequence_doit(a1, a2, a3); break; > } > > One last thing: I think it would be useful to be able to drive the code > into sequence mode with arrays. This would enable easy benchmarking of > the performance improvement. > > > (explicit safe conversion between Numeric and numarray is possible > > if really needed). Yeah, when I wrote 'if really needed', I was hoping to shift the responsibility of coercion (or conversion) to the Python programmer (my lazy side telling me that it can be done in pure Python). You talked me into doing it in C :-) Regards -- Gerard From tim.hochberg at cox.net Thu Jul 1 11:52:05 2004 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Jul 1 11:52:05 2004 Subject: [Numpy-discussion] Speeding up wxPython/numarray In-Reply-To: <1088701077.14402.20.camel@halloween.stsci.edu> References: <40E31B31.7040105@cox.net> <1088632048.7526.204.camel@halloween.stsci.edu> <40E3462A.9080303@cox.net> <1088701077.14402.20.camel@halloween.stsci.edu> Message-ID: <40E45D3C.7020501@cox.net> Todd Miller wrote: >On Wed, 2004-06-30 at 19:00, Tim Hochberg wrote: > > >>>> >>>> >>>> >>>FYI, the issue with tp_dealloc may have to do with which mode Python is >>>compiled in, --with-pydebug, or not. One approach which seems like it >>>ought to work (just thought of this!) is to add an extra reference in C >>>to the NumArray instance __dict__ (from NumArray.__init__ and stashed >>>via a new attribute in the PyArrayObject struct) and then DECREF it as >>>the last part of the tp_dealloc. 
>>> >>> >>> >>> >>That sounds promising. >> >> > <> > I looked at this some, and while INCREFing __dict__ may be the right > idea, I forgot that there *is no* Python NumArray.__init__ anymore. > > So the INCREF needs to be done in C without doing any getattrs; this > seems to mean calling a private _PyObject_GetDictPtr function to get a > pointer to the __dict__ slot which can be dereferenced to get the > __dict__. Might there be a simpler way? Since you're putting an extra attribute on the PyArrayObject structure anyway, wouldn't it be possible to just stash _shadows there instead of the reference to the dictionary? It appears that the only time _shadows is accessed from python is in __del__. If it were instead an attribute on ndarray, the dealloc problem would go away since the responsibility for deallocing it would fall to ndarray. Since everything else accesses it from C, that shouldn't be much of a problem and should speed that stuff up as well. -tim From cjw at sympatico.ca Thu Jul 1 12:59:01 2004 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Jul 1 12:59:01 2004 Subject: [Numpy-discussion] Numarray header PEP In-Reply-To: <200407010904.25498.haase@msg.ucsf.edu> References: <1088451653.3744.200.camel@localhost.localdomain> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <200407010904.25498.haase@msg.ucsf.edu> Message-ID: <40E46CD3.9090802@sympatico.ca> Sebastian Haase wrote: >On Wednesday 30 June 2004 11:33 pm, gerard.vermeulen at grenoble.cnrs.fr wrote: > > >>On 30 Jun 2004 17:54:19 -0400, Todd Miller wrote >> >> >> >>>So... you use the "meta" code to provide package specific ordinary >>>(not-macro-fied) functions to keep the different versions of the >>>Present() and isArray() macros from conflicting. >>> >>>It would be nice to have a standard approach for using the same >>>"extension enhancement code" for both numarray and Numeric. 
The PEP >>>should really be expanded to provide an example of dual support for one >>>complete and real function, guts and all, so people can see the process >>>end-to-end; Something like a simple arrayprint. That process needs >>>to be refined to remove as much tedium and duplication of effort as >>>possible. The idea is to make it as close to providing one >>>implementation to support both array packages as possible. I think it's >>>important to illustrate how to partition the extension module into >>>separate compilation units which correctly navigate the dual >>>implementation mine field in the easiest possible way. >>> >>>It would also be nice to add some logic to the meta-functions so that >>>which array package gets used is configurable. We did something like >>>that for the matplotlib plotting software at the Python level with >>>the "numerix" layer, an idea I think we copied from Chaco. The kind >>>of dispatch I think might be good to support configurability looks like >>>this: >>> >>>PyObject * >>>whatsThis(PyObject *dummy, PyObject *args) >>>{ >>> PyObject *result, *what = NULL; >>> if (!PyArg_ParseTuple(args, "O", &what)) >>> return 0; >>> switch(PyArray_Which(what)) { >>> USE_NUMERIC: >>> result = Numeric_whatsThis(what); break; >>> USE_NUMARRAY: >>> result = Numarray_whatsThis(what); break; >>> USE_SEQUENCE: >>> result = Sequence_whatsThis(what); break; >>> } >>> Py_INCREF(Py_None); >>> return Py_None; >>>} >>> >>>In the above, I'm picturing a separate .c file for Numeric_whatsThis >>>and for Numarray_whatsThis. It would be nice to streamline that to one >>>.c and a process which somehow (simply) produces both functions. 
>>> >>>Or, ideally, the above would be done more like this: >>> >>>PyObject * >>>whatsThis(PyObject *dummy, PyObject *args) >>>{ >>> PyObject *result, *what = NULL; >>> if (!PyArg_ParseTuple(args, "O", &what)) >>> return 0; >>> switch(Numerix_Which(what)) { >>> USE_NUMERIX: >>> result = Numerix_whatsThis(what); break; >>> USE_SEQUENCE: >>> result = Sequence_whatsThis(what); break; >>> } >>> Py_INCREF(Py_None); >>> return Py_None; >>>} >>> >>>Here, a common Numerix implementation supports both numarray and Numeric >>>from a single simple .c. The extension module would do "#include >>>numerix/arrayobject.h" and "import_numerix()" and otherwise just call >>>PyArray_* functions. >>> >>>The current stumbling block is that numarray is not binary compatible >>>with Numeric... so numerix in C falls apart. I haven't analyzed >>>every symbol and struct to see if it is really feasible... but it >>>seems like it is *almost* feasible, at least for typical usage. >>> >>>So, in a nutshell, I think the dual implementation support you >>>demoed is important and we should work up an example and kick it >>>around to make sure it's the best way we can think of doing it. >>>Then we should add a section to the PEP describing dual support as well. >>> >>> >>I would never apply numarray code to Numeric arrays and the inverse. It >>looks dangerous and I do not know if it is possible. The first thing >>coming to mind is that numarray and Numeric arrays refer to different type >>objects (this is what my pep module uses to differentiate them). So, even >>if numarray and Numeric are binary compatible, any 'alien' code referring >>the the 'Python-standard part' of the type objects may lead to surprises. A >>PEP proposing hacks will raise eyebrows at least. >> >>Secondly, most people use Numeric *or* numarray and not both. >> >>So, I prefer: Numeric In => Numeric Out or Numarray In => Numarray Out >>(NINO) Of course, Numeric or numarray output can be a user option if NINO >>does not apply. 
(explicit safe conversion between Numeric and numarray is >>possible if really needed). >> >>I'll try to flesh out the demo with real functions in the way you indicated >>(going as far as I consider safe). >> >>The problem of coding the Numeric (or numarray) functions in more than >>a single source file has also been addressed. >> >>It may take 2 weeks because I am off to a conference next week. >> >>Regards -- Gerard >> >> > >Hi all, >first, I would like to state that I don't understand much of this discussion; >so the only comment I wanted to make is that IF this were possible, to make >(C/C++) code that can live with both Numeric and numarray, then I think it >would be used more and more - think: transition phase !! (e.g. someone could >start making the FFTW part of scipy numarray friendly without having to >switch everything at once [hint ;-)] ) > >These were just my 2 cents. >Cheers, >Sebastian Haase > > I feel lower on the understanding tree with respect to what is being proposed in the draft PEP, but would still like to offer my 2 cents worth. I get the feeling that numarray is being bent out of shape to fit Numeric. It was my understanding that Numeric had certain weaknesses which made it unacceptable as a Python component and that numarray was intended to provide the same or better functionality within a pythonic framework. numarray has not achieved the expected performance level to date, but progress is being made and I believe that, for larger arrays, numarray has been shown to be superior to Numeric - please correct me if I'm wrong here. The shock came for me when Todd Miller said: <> I looked at this some, and while INCREFing __dict__ may be the right idea, I forgot that there *is no* Python NumArray.__init__ anymore. Wasn't it the intent of numarray to work towards the full use of the Python class structure to provide the benefits which it offers? The Python class has two constructors and one destructor. 
The constructors are __init__ and __new__, the latter only provides the shell of an instance which later has to be initialized. In version 0.9, which I use, there is no __new__, but there is a new function which has a functionality similar to that intended for __new__. Thus, with this change, numarray appears to be moving further away from being pythonic. Colin W From jmiller at stsci.edu Thu Jul 1 13:03:12 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 1 13:03:12 2004 Subject: [Numpy-discussion] Speeding up wxPython/numarray In-Reply-To: <40E45D3C.7020501@cox.net> References: <40E31B31.7040105@cox.net> <1088632048.7526.204.camel@halloween.stsci.edu> <40E3462A.9080303@cox.net> <1088701077.14402.20.camel@halloween.stsci.edu> <40E45D3C.7020501@cox.net> Message-ID: <1088712102.14402.73.camel@halloween.stsci.edu> On Thu, 2004-07-01 at 14:51, Tim Hochberg wrote: > Todd Miller wrote: > > >On Wed, 2004-06-30 at 19:00, Tim Hochberg wrote: > > > > > >>>> > >>>> > >>>> > >>>FYI, the issue with tp_dealloc may have to do with which mode Python is > >>>compiled in, --with-pydebug, or not. One approach which seems like it > >>>ought to work (just thought of this!) is to add an extra reference in C > >>>to the NumArray instance __dict__ (from NumArray.__init__ and stashed > >>>via a new attribute in the PyArrayObject struct) and then DECREF it as > >>>the last part of the tp_dealloc. > >>> > >>> > >>> > >>> > >>That sounds promising. > >> > >> > > <> > > I looked at this some, and while INCREFing __dict__ may be the right > > idea, I forgot that there *is no* Python NumArray.__init__ anymore. > > > > So the INCREF needs to be done in C without doing any getattrs; this > > seems to mean calling a private _PyObject_GetDictPtr function to get a > > pointer to the __dict__ slot which can be dereferenced to get the > > __dict__. > > Might there be a simpler way? 
Since you're putting an extra attribute on > the PyArrayObject structure anyway, wouldn't it be possible to just > stash _shadows there instead of the reference to the dictionary? _shadows is already in the struct. The root problem (I recall) is not the loss of self->_shadows, it's the loss of self->__dict__ before self can be copied onto self->_shadows. The cause of the problem appeared to me to be the tear-down order of self: the NumArray part appeared to be torn down before the _numarray part, and the tp_dealloc needs to do a Python callback where a half-destructed object just won't do. To really know what the problem is, I need to stick tp_dealloc back in and see what breaks. I'm pretty sure the problem was a missing instance __dict__, but my memory is quite fallible. Todd From Chris.Barker at noaa.gov Thu Jul 1 13:18:01 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jul 1 13:18:01 2004 Subject: [Numpy-discussion] How to read data from text files fast? In-Reply-To: <20040701053355.M99698@grenoble.cnrs.fr> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> Message-ID: <40E470D9.8060603@noaa.gov> Hi all, I'm looking for a way to read data from ascii text files quickly. I've found that using the standard python idioms like:

data = zeros((M,N), Float)
for i in range(M):
    data[i] = map(float, file.readline().split())

Can be pretty slow. What I'd like is something like Matlab's fscanf:

data = fscanf(file, "%g", [M,N])

I may have the syntax a little wrong, but the gist is there. What Matlab does is keep recycling the format string until the desired number of elements have been read. It is quite flexible, and ends up being pretty fast. 
Has anyone written something like this for Numeric (or numarray, but I'd prefer Numeric at this point) ? I was surprised not to find something like this in SciPy, maybe I didn't look hard enough. If no one has done this, I guess I'll get started on it.... -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Fernando.Perez at colorado.edu Thu Jul 1 13:28:01 2004 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Thu Jul 1 13:28:01 2004 Subject: [Numpy-discussion] How to read data from text files fast? In-Reply-To: <40E470D9.8060603@noaa.gov> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <40E470D9.8060603@noaa.gov> Message-ID: <40E473A9.5040109@colorado.edu> Chris Barker wrote: > Hi all, > > I'm looking for a way to read data from ascii text files quickly. I've > found that using the standard python idioms like: > > data = array((M,N),Float) > for in range(N): > data.append(map(float,file.readline().split())) > > Can be pretty slow. What I'd like is something like Matlab's fscanf: > > data = fscanf(file, "%g", [M,N] ) > > I may have the syntax a little wrong, but the gist is there. What Matlab > does keep recycling the format string until the desired number of > elements have been read. > > It is quite flexible, and ends up being pretty fast. > > Has anyone written something like this for Numeric (or numarray, but I'd > prefer Numeric at this point) ? > > I was surprised not to find something like this in SciPy, maybe I didn't > look hard enough. scipy.io.read_array? I haven't timed it, because it's been 'fast enough' for my needs. 
For reading binary data files, I have this little utility which is basically a wrapper around Numeric.fromstring (N below is Numeric imported 'as N'). Note that it can read binary .gz files directly, a _huge_ gain for very sparse files representing 3d arrays (I can read a 400k gz file which blows up to ~60MB when unzipped in no time at all, while reading the unzipped file is very slow):

import gzip  # needed for the .gz branch below

def read_bin(fname, dims, typecode, recast_type=None, offset=0, verbose=0):
    """Read in a binary data file. Does NOT check for endianness issues.

    Inputs:
      fname - can be .gz
      dims (nx1,nx2,...,nxd)
      typecode
      recast_type
      offset=0: # of bytes to skip in file *from the beginning* before data starts
    """
    # config parameters
    item_size = N.zeros(1, typecode).itemsize()  # size in bytes
    data_size = N.product(N.array(dims)) * item_size
    # read in data
    if fname.endswith('.gz'):
        data_file = gzip.open(fname)
    else:
        data_file = file(fname)
    data_file.seek(offset)
    data = N.fromstring(data_file.read(data_size), typecode)
    data_file.close()
    data.shape = dims
    if verbose:
        #print 'Read',data_size/item_size,'data points. Shape:',dims
        print 'Read', N.size(data), 'data points. Shape:', dims
    if recast_type is not None:
        data = data.astype(recast_type)
    return data

HTH, f From squirrel at WPI.EDU Thu Jul 1 13:37:13 2004 From: squirrel at WPI.EDU (Christopher T King) Date: Thu Jul 1 13:37:13 2004 Subject: [Numpy-discussion] numarray and SMP Message-ID: (I originally posted this in comp.lang.python and was redirected here) In a quest to speed up numarray computations, I tried writing a 'threaded array' class for use on SMP systems that would distribute its workload across the processors. I hit a snag when I found out that since the Python interpreter is not reentrant, this effectively disables parallel processing in Python.
I've come up with two solutions to this problem, both involving numarray's C functions that perform the actual vector operations: 1) Surround the C vector operations with Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS, thus allowing the vector operations (which don't access Python structures) to run in parallel with the interpreter. Python glue code would take care of threading and locking. 2) Move the parallelization into the C vector functions themselves. This would likely get poorer performance (a chain of vector operations couldn't be combined into one threaded operation). I'd much rather do #1, but will playing around with the interpreter state like that cause any problems? Update from original posting: I've partially implemented method #1 for Float64s. Running on four 2.4GHz Xeons (possibly two with hyperthreading?), I get about a 30% speedup while dividing 10 million Float64s, but a small (<10%) slowdown doing addition or multiplication. The operation was repeated 100 times, with the threads created outside of the loop (i.e. the threads weren't recreated for each iteration). Is there really that much overhead in Python? I can post the code I'm using and the numarray patch if it's requested. From gerard.vermeulen at grenoble.cnrs.fr Thu Jul 1 13:40:07 2004 From: gerard.vermeulen at grenoble.cnrs.fr (gerard.vermeulen at grenoble.cnrs.fr) Date: Thu Jul 1 13:40:07 2004 Subject: [Numpy-discussion] Numarray header PEP In-Reply-To: <40E46CD3.9090802@sympatico.ca> References: <1088451653.3744.200.camel@localhost.localdomain> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <200407010904.25498.haase@msg.ucsf.edu> <40E46CD3.9090802@sympatico.ca> Message-ID: <20040701200934.M74616@grenoble.cnrs.fr> On Thu, 01 Jul 2004 15:58:11 -0400, Colin J. 
Williams wrote > Sebastian Haase wrote: > > >On Wednesday 30 June 2004 11:33 pm, gerard.vermeulen at grenoble.cnrs.fr wrote: > > > > > >>On 30 Jun 2004 17:54:19 -0400, Todd Miller wrote > >> > >> > >> > >>>So... you use the "meta" code to provide package specific ordinary > >>>(not-macro-fied) functions to keep the different versions of the > >>>Present() and isArray() macros from conflicting. > >>> > >>>It would be nice to have a standard approach for using the same > >>>"extension enhancement code" for both numarray and Numeric. The PEP > >>>should really be expanded to provide an example of dual support for one > >>>complete and real function, guts and all, so people can see the process > >>>end-to-end; Something like a simple arrayprint. That process needs > >>>to be refined to remove as much tedium and duplication of effort as > >>>possible. The idea is to make it as close to providing one > >>>implementation to support both array packages as possible. I think it's > >>>important to illustrate how to partition the extension module into > >>>separate compilation units which correctly navigate the dual > >>>implementation mine field in the easiest possible way. > >>> > >>>It would also be nice to add some logic to the meta-functions so that > >>>which array package gets used is configurable. We did something like > >>>that for the matplotlib plotting software at the Python level with > >>>the "numerix" layer, an idea I think we copied from Chaco. 
The kind > >>>of dispatch I think might be good to support configurability looks like > >>>this:

> >>>PyObject *
> >>>whatsThis(PyObject *dummy, PyObject *args)
> >>>{
> >>>    PyObject *result, *what = NULL;
> >>>    if (!PyArg_ParseTuple(args, "O", &what))
> >>>        return 0;
> >>>    switch(PyArray_Which(what)) {
> >>>    USE_NUMERIC:
> >>>        result = Numeric_whatsThis(what); break;
> >>>    USE_NUMARRAY:
> >>>        result = Numarray_whatsThis(what); break;
> >>>    USE_SEQUENCE:
> >>>        result = Sequence_whatsThis(what); break;
> >>>    }
> >>>    Py_INCREF(Py_None);
> >>>    return Py_None;
> >>>}

> >>>In the above, I'm picturing a separate .c file for Numeric_whatsThis > >>>and for Numarray_whatsThis. It would be nice to streamline that to one > >>>.c and a process which somehow (simply) produces both functions. > >>> > >>>Or, ideally, the above would be done more like this:

> >>>PyObject *
> >>>whatsThis(PyObject *dummy, PyObject *args)
> >>>{
> >>>    PyObject *result, *what = NULL;
> >>>    if (!PyArg_ParseTuple(args, "O", &what))
> >>>        return 0;
> >>>    switch(Numerix_Which(what)) {
> >>>    USE_NUMERIX:
> >>>        result = Numerix_whatsThis(what); break;
> >>>    USE_SEQUENCE:
> >>>        result = Sequence_whatsThis(what); break;
> >>>    }
> >>>    Py_INCREF(Py_None);
> >>>    return Py_None;
> >>>}

> >>>Here, a common Numerix implementation supports both numarray and Numeric > >>>from a single simple .c. The extension module would do "#include > >>>numerix/arrayobject.h" and "import_numerix()" and otherwise just call > >>>PyArray_* functions. > >>> > >>>The current stumbling block is that numarray is not binary compatible > >>>with Numeric... so numerix in C falls apart. I haven't analyzed > >>>every symbol and struct to see if it is really feasible... but it > >>>seems like it is *almost* feasible, at least for typical usage.
> >>> > >>>So, in a nutshell, I think the dual implementation support you > >>>demoed is important and we should work up an example and kick it > >>>around to make sure it's the best way we can think of doing it. > >>>Then we should add a section to the PEP describing dual support as well. > >>> > >>> > >>I would never apply numarray code to Numeric arrays and the inverse. It > >>looks dangerous and I do not know if it is possible. The first thing > >>coming to mind is that numarray and Numeric arrays refer to different type > >>objects (this is what my pep module uses to differentiate them). So, even > >>if numarray and Numeric are binary compatible, any 'alien' code referring > >>to the 'Python-standard part' of the type objects may lead to surprises. A > >>PEP proposing hacks will raise eyebrows at least. > >> > >>Secondly, most people use Numeric *or* numarray and not both. > >> > >>So, I prefer: Numeric In => Numeric Out or Numarray In => Numarray Out > >>(NINO) Of course, Numeric or numarray output can be a user option if NINO > >>does not apply. (explicit safe conversion between Numeric and numarray is > >>possible if really needed). > >> > >>I'll try to flesh out the demo with real functions in the way you indicated > >>(going as far as I consider safe). > >> > >>The problem of coding the Numeric (or numarray) functions in more than > >>a single source file has also been addressed. > >> > >>It may take 2 weeks because I am off to a conference next week. > >> > >>Regards -- Gerard > >> > >> > > > >Hi all, > >first, I would like to state that I don't understand much of this discussion; > >so the only comment I wanted to make is that IF this were possible, to make > >(C/C++) code that can live with both Numeric and numarray, then I think it > >would be used more and more - think: transition phase !! (e.g.
someone could > >start making the FFTW part of scipy numarray friendly without having to > >switch everything at once [hint ;-)] ) > > > >These were just my 2 cents. > >Cheers, > >Sebastian Haase > > > > > I feel lower on the understanding tree with respect to what is being > proposed in the draft PEP, but would still like to offer my 2 cents > worth. I get the feeling that numarray is being bent out of shape > to fit Numeric. > What we are discussing are methods to make it possible to import Numeric and numarray in the same extension module. This can be done by separating the colliding APIs of Numeric and numarray in separate *.c files. To achieve this, no changes to Numeric and numarray itself are necessary. In fact, this can be done by the author of the C-extension himself, but since it is not obvious, we are discussing the best methods and would like to provide the necessary glue code. It will make life easier for extension writers and facilitate the transition to numarray. Try to look at the problem from the other side: I am using Numeric (since my life depends on SciPy) but have written an extension that can also import numarray (hoping to get more users). I will never use the methods proposed in the draft PEP, because it excludes importing Numeric. > > It was my understanding that Numeric had certain weaknesses which made > it unacceptable as a Python component and that numarray was intended > to provide the same or better functionality within a pythonic framework. > > numarray has not achieved the expected performance level to date, > but progress is being made and I believe that, for larger arrays, > numarray has been shown to be superior to Numeric - please > correct me if I'm wrong here. > I think you are correct. I don't know why the __init__ has disappeared, but I don't think it is because of the PEP and certainly not because of the thread.
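The two-constructor protocol being discussed can be illustrated in plain Python (a generic sketch, not numarray's actual implementation, which does the equivalent work in C):

```python
class Point:
    def __new__(cls, *args, **kwds):
        # __new__ allocates and returns the bare "shell" of an instance...
        self = object.__new__(cls)
        self.initialized = False
        return self

    def __init__(self, x, y):
        # ...and __init__ then fills it in.
        self.x, self.y = x, y
        self.initialized = True
```

Calling `Point.__new__(Point)` directly yields an uninitialized shell, exactly the "not very useful" state shown later in this thread for `NumArray.__new__`.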
> > The shock came for me when Todd Miller said: > > <> > I looked at this some, and while INCREFing __dict__ maybe the right > idea, I forgot that there *is no* Python NumArray.__init__ anymore. > > Wasn't it the intent of numarray to work towards the full use of the > Python class structure to provide the benefits which it offers? > > The Python class has two constructors and one destructor. > > The constructors are __init__ and __new__, the latter only provides > the shell of an instance which later has to be initialized. In > version 0.9, which I use, there is no __new__, but there is a new > function which has a functionality similar to that intended for > __new__. Thus, with this change, numarray appears to be moving > further away from being pythonic. > Gerard From jmiller at stsci.edu Thu Jul 1 13:46:07 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 1 13:46:07 2004 Subject: [Numpy-discussion] Numarray header PEP In-Reply-To: <40E46CD3.9090802@sympatico.ca> References: <1088451653.3744.200.camel@localhost.localdomain> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <200407010904.25498.haase@msg.ucsf.edu> <40E46CD3.9090802@sympatico.ca> Message-ID: <1088714723.14402.114.camel@halloween.stsci.edu> On Thu, 2004-07-01 at 15:58, Colin J. Williams wrote: > Sebastian Haase wrote: > > >On Wednesday 30 June 2004 11:33 pm, gerard.vermeulen at grenoble.cnrs.fr wrote: > > > > > >>On 30 Jun 2004 17:54:19 -0400, Todd Miller wrote > >> > >> > >> > >>>So... you use the "meta" code to provide package specific ordinary > >>>(not-macro-fied) functions to keep the different versions of the > >>>Present() and isArray() macros from conflicting. > >>> > >>>It would be nice to have a standard approach for using the same > >>>"extension enhancement code" for both numarray and Numeric. 
The PEP > >>>should really be expanded to provide an example of dual support for one > >>>complete and real function, guts and all, so people can see the process > >>>end-to-end; Something like a simple arrayprint. That process needs > >>>to be refined to remove as much tedium and duplication of effort as > >>>possible. The idea is to make it as close to providing one > >>>implementation to support both array packages as possible. I think it's > >>>important to illustrate how to partition the extension module into > >>>separate compilation units which correctly navigate the dual > >>>implementation mine field in the easiest possible way. > >>> > >>>It would also be nice to add some logic to the meta-functions so that > >>>which array package gets used is configurable. We did something like > >>>that for the matplotlib plotting software at the Python level with > >>>the "numerix" layer, an idea I think we copied from Chaco. The kind > >>>of dispatch I think might be good to support configurability looks like > >>>this: > >>> > >>>PyObject * > >>>whatsThis(PyObject *dummy, PyObject *args) > >>>{ > >>> PyObject *result, *what = NULL; > >>> if (!PyArg_ParseTuple(args, "O", &what)) > >>> return 0; > >>> switch(PyArray_Which(what)) { > >>> USE_NUMERIC: > >>> result = Numeric_whatsThis(what); break; > >>> USE_NUMARRAY: > >>> result = Numarray_whatsThis(what); break; > >>> USE_SEQUENCE: > >>> result = Sequence_whatsThis(what); break; > >>> } > >>> Py_INCREF(Py_None); > >>> return Py_None; > >>>} > >>> > >>>In the above, I'm picturing a separate .c file for Numeric_whatsThis > >>>and for Numarray_whatsThis. It would be nice to streamline that to one > >>>.c and a process which somehow (simply) produces both functions. 
> >>> > >>>Or, ideally, the above would be done more like this: > >>> > >>>PyObject * > >>>whatsThis(PyObject *dummy, PyObject *args) > >>>{ > >>> PyObject *result, *what = NULL; > >>> if (!PyArg_ParseTuple(args, "O", &what)) > >>> return 0; > >>> switch(Numerix_Which(what)) { > >>> USE_NUMERIX: > >>> result = Numerix_whatsThis(what); break; > >>> USE_SEQUENCE: > >>> result = Sequence_whatsThis(what); break; > >>> } > >>> Py_INCREF(Py_None); > >>> return Py_None; > >>>} > >>> > >>>Here, a common Numerix implementation supports both numarray and Numeric > >>>from a single simple .c. The extension module would do "#include > >>>numerix/arrayobject.h" and "import_numerix()" and otherwise just call > >>>PyArray_* functions. > >>> > >>>The current stumbling block is that numarray is not binary compatible > >>>with Numeric... so numerix in C falls apart. I haven't analyzed > >>>every symbol and struct to see if it is really feasible... but it > >>>seems like it is *almost* feasible, at least for typical usage. > >>> > >>>So, in a nutshell, I think the dual implementation support you > >>>demoed is important and we should work up an example and kick it > >>>around to make sure it's the best way we can think of doing it. > >>>Then we should add a section to the PEP describing dual support as well. > >>> > >>> > >>I would never apply numarray code to Numeric arrays and the inverse. It > >>looks dangerous and I do not know if it is possible. The first thing > >>coming to mind is that numarray and Numeric arrays refer to different type > >>objects (this is what my pep module uses to differentiate them). So, even > >>if numarray and Numeric are binary compatible, any 'alien' code referring > >>the the 'Python-standard part' of the type objects may lead to surprises. A > >>PEP proposing hacks will raise eyebrows at least. > >> > >>Secondly, most people use Numeric *or* numarray and not both. 
> >> > >>So, I prefer: Numeric In => Numeric Out or Numarray In => Numarray Out > >>(NINO) Of course, Numeric or numarray output can be a user option if NINO > >>does not apply. (explicit safe conversion between Numeric and numarray is > >>possible if really needed). > >> > >>I'll try to flesh out the demo with real functions in the way you indicated > >>(going as far as I consider safe). > >> > >>The problem of coding the Numeric (or numarray) functions in more than > >>a single source file has also been addressed. > >> > >>It may take 2 weeks because I am off to a conference next week. > >> > >>Regards -- Gerard > >> > >> > > > >Hi all, > >first, I would like to state that I don't understand much of this discussion; > >so the only comment I wanted to make is that IF this were possible, to make > >(C/C++) code that can live with both Numeric and numarray, then I think it > >would be used more and more - think: transition phase !! (e.g. someone could > >start making the FFTW part of scipy numarray friendly without having to > >switch everything at once [hint ;-)] ) > > > >These were just my 2 cents. > >Cheers, > >Sebastian Haase > > > > > I feel lower on the understanding tree with respect to what is being > proposed in the draft PEP, but would still like to offer my 2 cents > worth. I get the feeling that numarray is being bent out of shape to > fit Numeric. Yes and no. The numarray team has over time realized the importance of backward compatibility with the dominant array package, Numeric. A lot of people use Numeric now. We're trying to make it as easy as possible to use numarray. > It was my understanding that Numeric had certain weaknesses which made it > unacceptable as a Python component and that numarray was intended to > provide the same or better functionality within a pythonic framework. My understanding is that until there is a consensus on an array package, neither numarray nor Numeric is going into the Python core.
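The C-level dispatch sketched earlier in the thread has a straightforward Python-level analogue (hypothetical code; numpy stands in for one array backend, and the real numerix layer in matplotlib instead selected its backend once at import time):

```python
import numpy as np  # stand-in for one array backend

def whats_this(obj):
    """Python-level analogue of the PyArray_Which()-style dispatch:
    route an object to backend-specific code by its kind."""
    if isinstance(obj, np.ndarray):
        return "array"
    try:
        len(obj)            # cheap generic-sequence check
        return "sequence"
    except TypeError:
        return "other"
```

The per-kind branches would then call into the backend-specific compilation units, as in Todd's C example.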
> numarray has not achieved the expected performance level to date, but > progress is being made and I believe that, for larger arrays, numarray > has been shown to be superior to Numeric - please correct me if I'm > wrong here. I think that's a fair summary. > > The shock came for me when Todd Miller said: > <> > I looked at this some, and while INCREFing __dict__ may be the right > idea, I forgot that there *is no* Python NumArray.__init__ anymore. > > Wasn't it the intent of numarray to work towards the full use of the > Python class structure to provide the benefits which it offers? > Ack. I wasn't trying to start a panic. The __init__ still exists, as does __new__, they're just in C. Sorry if I was unclear. > The Python class has two constructors and one destructor. We're mostly on the same page. > The constructors are __init__ and __new__, the latter only provides the > shell of an instance which later has to be initialized. In version 0.9, > which I use, there is no __new__, It's there, but it's not very useful:

>>> import numarray
>>> numarray.NumArray.__new__
>>> a = numarray.NumArray.__new__(numarray.NumArray)
>>> a.info()
class:
shape: ()
strides: ()
byteoffset: 0
bytestride: 0
itemsize: 0
aligned: 1
contiguous: 1
data: None
byteorder: little
byteswap: 0
type: Any

I don't, however, recommend doing this. > but there is a new function which has > a functionality similar to that intended for __new__. Thus, with this > change, numarray appears to be moving further away from being pythonic. Nope. I'm talking about moving toward better speed with no change in functionality at the Python level. I also think maybe we've gotten list threads crossed here: the "Numarray header PEP" thread is independent of (but admittedly related to) the "Speeding up wxPython/numarray" thread. The Numarray header PEP is about making it easy for packages to write C extensions which *optionally* support numarray (and now Numeric as well).
One aspect of the PEP is getting headers included in the Python core so that extensions can be compiled even when numarray is not installed. The other aspect will be illustrating a good technique for supporting both numarray and Numeric, optionally and with choice, at the same time. Such an extension would still run where there is numarray, Numeric, both, or none installed. Gerard V. has already done some integration of numarray and Numeric with PyQwt so he has a few good ideas on how to do the "good technique" aspect of the PEP. The Speeding up wxPython/numarray thread is about improving the performance of a 50000 point wxPython drawlines which is 10x slower with numarray than Numeric. Tim H. and Chris B. have nailed this down (mostly) to the numarray sequence protocol and destructor, __del__. Regards, Todd From perry at stsci.edu Thu Jul 1 13:57:02 2004 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jul 1 13:57:02 2004 Subject: [Numpy-discussion] Numarray header PEP In-Reply-To: <40E46CD3.9090802@sympatico.ca> Message-ID: Colin J. Williams wrote: > I feel lower on the understanding tree with respect to what is being > proposed in the draft PEP, but would still like to offer my 2 cents > worth. I get the feeling that numarray is being bent out of shape to > fit Numeric. > Todd and Gerard address this point well. > It was my understanding that Numeric had certain weaknesses which made it > unacceptable as a Python component and that numarray was intended to > provide the same or better functionality within a pythonic framework. > Let me reiterate what our motivations were. We wanted to use an array package for our software, and Numeric had enough shortcomings that we needed some changes in behavior (e.g., type coercion for scalars), changes in performance (particularly with regard to memory usage), and enhancements in capabilities (e.g., memory mapping, record arrays, etc.).
It was the opinion of some (Paul Dubois, for example) that a rewrite was in order in any case since the code was not that maintainable (not everyone felt this way, though at the time that wasn't as clear). At the same time there was some hope that Numeric could be accepted into the standard Python distribution. That's something we thought would be good (but wasn't the highest priority for us) and I've come to believe that perhaps a better solution with regard to that is what this PEP is trying to address. In any case Guido made it clear that he would not accept Numeric in its (then) current form. That it be written mostly in Python was something suggested by Guido, and we started off that way, mainly because it would get us going much faster than writing it all in C. We definitely understood that it would also have the consequence of making small array performance worse. We said as much when we started; it wasn't as clear as it is now that many users objected to a factor of a few slower performance (as it turned out, a mostly Python-based implementation was more than an order of magnitude slower for small arrays). > numarray has not achieved the expected performance level to date, but > progress is being made and I believe that, for larger arrays, numarray > has been shown to be superior to Numeric - please correct me if I'm > wrong here. > We never expected numarray to ever reach the performance level for small arrays that Numeric has. If it were within a factor of two I would be thrilled (it's more like a factor of 3 or 4 currently for simple ufuncs). I still don't think it ever will be as fast for small arrays. The focus all along was on handling large arrays, which I think it does quite well, both with regard to memory and speed. Yes, there are some functions and operations that may be much slower. Mainly they need to be called out so they can be improved. Generally we only notice performance issues that affect our software.
Others need to point out remaining large discrepancies. I'm still of the opinion that if small array performance is really important, a very different approach should be used, with a completely different implementation. I would think that improvements of an order of magnitude over what Numeric does now are possible. But since that isn't important to us (STScI), don't expect us to work on that :-) > The shock came for me when Todd Miller said: > > <> > I looked at this some, and while INCREFing __dict__ may be the right > idea, I forgot that there *is no* Python NumArray.__init__ anymore. > > Wasn't it the intent of numarray to work towards the full use of the > Python class structure to provide the benefits which it offers? > > The Python class has two constructors and one destructor. > > The constructors are __init__ and __new__, the latter only provides the > shell of an instance which later has to be initialized. In version 0.9, > which I use, there is no __new__, but there is a new function which has > a functionality similar to that intended for __new__. Thus, with this > change, numarray appears to be moving further away from being pythonic. > I'll agree that optimization is driving the underlying implementation to one that is more complex and that is the drawback (no surprise there). There's Pythonic in use and Pythonic in implementation. We are certainly receptive to better ideas for the implementation, but I doubt that a heavily Python-based implementation is ever going to be competitive for small arrays (unless something like psyco becomes universal, but I think there are a whole mess of problems to be solved for that kind of approach to work well generically).
Perry From perry at stsci.edu Thu Jul 1 15:01:04 2004 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jul 1 15:01:04 2004 Subject: [Numpy-discussion] numarray and SMP In-Reply-To: Message-ID: Christopher T King wrote: > > (I originally posted this in comp.lang.python and was redirected here) > > In a quest to speed up numarray computations, I tried writing a 'threaded > array' class for use on SMP systems that would distribute its workload > across the processors. I hit a snag when I found out that since > the Python > interpreter is not reentrant, this effectively disables parallel > processing in Python. I've come up with two solutions to this problem, > both involving numarray's C functions that perform the actual vector > operations: > > 1) Surround the C vector operations with Py_BEGIN_ALLOW_THREADS and > Py_END_ALLOW_THREADS, thus allowing the vector operations (which don't > access Python structures) to run in parallel with the interpreter. > Python glue code would take care of threading and locking. > > 2) Move the parallelization into the C vector functions themselves. This > would likely get poorer performance (a chain of vector operations > couldn't be combined into one threaded operation). > > I'd much rather do #1, but will playing around with the interpreter state > like that cause any problems? > I don't think so, but it raises a number of questions that I ask just below. > Update from original posting: > > I've partially implemented method #1 for Float64s. Running on four 2.4GHz > Xeons (possibly two with hyperthreading?), I get about a 30% speedup while > dividing 10 million Float64s, but a small (<10%) slowdown doing addition > or multiplication. The operation was repeated 100 times, with the threads > created outside of the loop (i.e. the threads weren't recreated for each > iteration). Is there really that much overhead in Python? I can post the > code I'm using and the numarray patch if it's requested. 
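The Python-side glue of method #1 can be sketched as follows (a hypothetical sketch: numpy and the stdlib threading module stand in for numarray and the patched C ufuncs, and an actual speedup still depends on the C loops releasing the GIL as in the patch being discussed; correctness does not):

```python
import threading
import numpy as np  # stand-in for numarray in this sketch

def threaded_binary_op(op, a, b, out, nthreads=4):
    """Apply a binary ufunc across nthreads threads by slicing axis 0."""
    bounds = [i * a.shape[0] // nthreads for i in range(nthreads + 1)]

    def worker(lo, hi):
        op(a[lo:hi], b[lo:hi], out[lo:hi])   # ufuncs accept an output array

    threads = [threading.Thread(target=worker, args=(bounds[i], bounds[i + 1]))
               for i in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

The threads are created per call here; King's test reuses them across iterations, which is why per-iteration thread overhead matters in his timings.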
> Questions and comments: 1) I suppose you did this for generated ufunc code? (ideally one would put this in the codegenerator stuff but for the purposes of testing it would be fine). I guess we would like to see how you actually changed the code fragment (you can email me or Todd Miller directly if you wish) 2) How much improvement you would see depends on many details. But if you were doing this for 10 million element arrays, I'm surprised you saw such a small improvement (30% for 4 processors isn't worth the trouble it would seem). So seeing the actual test code would be helpful. If the array operation you are doing for numarray aren't simple (that's a specialized use of the word; by that I mean if the arrays are not the same type, aren't contiguous, aren't aligned, or aren't of proper byte-order) then there are a number of other issues that may slow it down quite a bit (and there are ways of improving these for parallel processing). 3) I don't speak as an expert on threading or parallel processors, but I believe so long as you don't call any Python API functions (either directly or indirectly) between the global interpreter lock release and reacquisition, you should be fine. The vector ufunc code in numarray should satisfy this fine. Perry Greenfield From squirrel at WPI.EDU Fri Jul 2 06:37:20 2004 From: squirrel at WPI.EDU (Christopher T King) Date: Fri Jul 2 06:37:20 2004 Subject: [Numpy-discussion] numarray and SMP In-Reply-To: Message-ID: On Thu, 1 Jul 2004, Perry Greenfield wrote: > 1) I suppose you did this for generated ufunc code? (ideally one > would put this in the codegenerator stuff but for the purposes > of testing it would be fine). I guess we would like to see > how you actually changed the code fragment (you can email > me or Todd Miller directly if you wish) Yep, I didn't know it was automatically generated :P > 2) How much improvement you would see depends on many details. 
> But if you were doing this for 10 million element arrays, I'm > surprised you saw such a small improvement (30% for 4 processors > isn't worth the trouble it would seem). So seeing the actual > test code would be helpful. If the array operations you are doing > for numarray aren't simple (that's a specialized use of the word; > by that I mean if the arrays are not the same type, aren't > contiguous, aren't aligned, or aren't of proper byte-order) > then there are a number of other issues that may slow it down > quite a bit (and there are ways of improving these for > parallel processing). I've been careful not to use anything to cause discontiguities in the arrays, and to keep them all the same type (Float64 in this case). See my next post for the code I'm using. From haase at msg.ucsf.edu Fri Jul 2 08:28:01 2004 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Fri Jul 2 08:28:01 2004 Subject: [Numpy-discussion] bug in numarray.maximum.reduce ? In-Reply-To: <200406291705.55454.haase@msg.ucsf.edu> References: <200406291705.55454.haase@msg.ucsf.edu> Message-ID: <200407020827.05407.haase@msg.ucsf.edu> On Tuesday 29 June 2004 05:05 pm, Sebastian Haase wrote:

> Hi,
>
> Is this a bug?:
> >>> # (import numarray as na ; 'd' is a 3 dimensional array)
> >>> d.type()
> Float32
> >>> d[80, 136, 122]
> 80.3997039795
> >>> na.maximum.reduce(d[:,136, 122])
> 85.8426361084
> >>> na.maximum.reduce(d)[136, 122]
> 37.3658103943
> >>> na.maximum.reduce(d,0)[136, 122]
> 37.3658103943
> >>> na.maximum.reduce(d,1)[136, 122]
> Traceback (most recent call last):
> File "", line 1, in ?
> IndexError: Index out of range
>
> I was using na.maximum.reduce(d) to get a "pixelwise" maximum along Z
> (axis 0). But as seen above it does not get it right.
I then tried to > reproduce > > this with some simple arrays, but here it works just fine: > >>> a = na.arange(4*4*4) > >>> a.shape=(4,4,4) > >>> na.maximum.reduce(a) > > [[48 49 50 51] > [52 53 54 55] > [56 57 58 59] > [60 61 62 63]] > > >>> a = na.arange(4*4*4).astype(na.Float32) > >>> a.shape=(4,4,4) > >>> na.maximum.reduce(a) > > [[ 48. 49. 50. 51.] > [ 52. 53. 54. 55.] > [ 56. 57. 58. 59.] > [ 60. 61. 62. 63.]] > > > Any hint ? > > Regards, > Sebastian Haase Hi again, I think the reason that no one responded to this is that it just sounds too unbelievable ... Sorry for the missing piece of information, but 'd' is actually a memmapped array ! >>> d.info() class: shape: (80, 150, 150) strides: (90000, 600, 4) byteoffset: 0 bytestride: 4 itemsize: 4 aligned: 1 contiguous: 1 data: byteorder: big byteswap: 1 type: Float32 >>> dd = d.copy() >>> na.maximum.reduce(dd[:,136, 122]) 85.8426361084 >>> na.maximum.reduce(dd)[136, 122] 85.8426361084 >>> Apparently we are using memmap so frequently now that I didn't even think about that - which is good news for everyone, because it means that it works (mostly). I just see that 'byteorder' is 'big' - I'm running this on an Intel Linux PC. Could this be the problem? Please, some comments! Thanks, Sebastian From jmiller at stsci.edu Fri Jul 2 09:03:08 2004 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jul 2 09:03:08 2004 Subject: [Numpy-discussion] bug in numarray.maximum.reduce ?
In-Reply-To: <200407020827.05407.haase@msg.ucsf.edu> References: <200406291705.55454.haase@msg.ucsf.edu> <200407020827.05407.haase@msg.ucsf.edu> Message-ID: <1088784157.26482.14.camel@halloween.stsci.edu> On Fri, 2004-07-02 at 11:27, Sebastian Haase wrote: > On Tuesday 29 June 2004 05:05 pm, Sebastian Haase wrote: > > Hi, > > > > Is this a bug?: > > >>> # (import numarray as na ; 'd' is a 3 dimensional array) > > >>> d.type() > > > > Float32 > > > > >>> d[80, 136, 122] > > > > 80.3997039795 > > > > >>> na.maximum.reduce(d[:,136, 122]) > > > > 85.8426361084 > > > > >>> na.maximum.reduce(d) [136, 122] > > > > 37.3658103943 > > > > >>> na.maximum.reduce(d,0)[136, 122] > > > > 37.3658103943 > > > > >>> na.maximum.reduce(d,1)[136, 122] > > > > Traceback (most recent call last): > > File "", line 1, in ? > > IndexError: Index out of range > > > > I was using na.maximum.reduce(d) to get a "pixelwise" maximum along Z > > (axis 0). But as seen above it does not get it right. I then tried to > > reproduce > > > > this with some simple arrays, but here it works just fine: > > >>> a = na.arange(4*4*4) > > >>> a.shape=(4,4,4) > > >>> na.maximum.reduce(a) > > > > [[48 49 50 51] > > [52 53 54 55] > > [56 57 58 59] > > [60 61 62 63]] > > > > >>> a = na.arange(4*4*4).astype(na.Float32) > > >>> a.shape=(4,4,4) > > >>> na.maximum.reduce(a) > > > > [[ 48. 49. 50. 51.] > > [ 52. 53. 54. 55.] > > [ 56. 57. 58. 59.] > > [ 60. 61. 62. 63.]] > > > > > > Any hint ? > > > > Regards, > > Sebastian Haase > > Hi again, > I think the reason that no one responded to this is that it just sounds to > unbelievable ... This just slipped through the cracks for me. > Sorry for the missing piece of information, but 'd' is actually a memmapped > array ! 
> >>> d.info() > class: > shape: (80, 150, 150) > strides: (90000, 600, 4) > byteoffset: 0 > bytestride: 4 > itemsize: 4 > aligned: 1 > contiguous: 1 > data: > byteorder: big > byteswap: 1 > type: Float32 > >>> dd = d.copy() > >>> na.maximum.reduce(dd[:,136, 122]) > 85.8426361084 > >>> na.maximum.reduce(dd)[136, 122] > 85.8426361084 > >>> > > Apparently we are using memmap so frequently now that I didn't even think > about that - which is good news for everyone, because it means that it works > (mostly). > > I just see that 'byteorder' is 'big' - I'm running this on an Intel Linux PC. > Could this be the problem? I think byteorder is a good guess at this point. What version of Python and numarray are you using? Regards, Todd From haase at msg.ucsf.edu Fri Jul 2 10:46:01 2004 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Fri Jul 2 10:46:01 2004 Subject: [Numpy-discussion] bug in numarray.maximum.reduce ? In-Reply-To: <1088784157.26482.14.camel@halloween.stsci.edu> References: <200406291705.55454.haase@msg.ucsf.edu> <200407020827.05407.haase@msg.ucsf.edu> <1088784157.26482.14.camel@halloween.stsci.edu> Message-ID: <200407021045.00866.haase@msg.ucsf.edu> On Friday 02 July 2004 09:02 am, Todd Miller wrote: > On Fri, 2004-07-02 at 11:27, Sebastian Haase wrote: > > On Tuesday 29 June 2004 05:05 pm, Sebastian Haase wrote: > > > Hi, > > > > > > Is this a bug?: > > > >>> # (import numarray as na ; 'd' is a 3 dimensional array) > > > >>> d.type() > > > > > > Float32 > > > > > > >>> d[80, 136, 122] > > > > > > 80.3997039795 > > > > > > >>> na.maximum.reduce(d[:,136, 122]) > > > > > > 85.8426361084 > > > > > > >>> na.maximum.reduce(d) [136, 122] > > > > > > 37.3658103943 > > > > > > >>> na.maximum.reduce(d,0)[136, 122] > > > > > > 37.3658103943 > > > > > > >>> na.maximum.reduce(d,1)[136, 122] > > > > > > Traceback (most recent call last): > > > File "", line 1, in ? 
> > > IndexError: Index out of range > > > > > > I was using na.maximum.reduce(d) to get a "pixelwise" maximum along Z > > > (axis 0). But as seen above it does not get it right. I then tried to > > > reproduce > > > > > > this with some simple arrays, but here it works just fine: > > > >>> a = na.arange(4*4*4) > > > >>> a.shape=(4,4,4) > > > >>> na.maximum.reduce(a) > > > > > > [[48 49 50 51] > > > [52 53 54 55] > > > [56 57 58 59] > > > [60 61 62 63]] > > > > > > >>> a = na.arange(4*4*4).astype(na.Float32) > > > >>> a.shape=(4,4,4) > > > >>> na.maximum.reduce(a) > > > > > > [[ 48. 49. 50. 51.] > > > [ 52. 53. 54. 55.] > > > [ 56. 57. 58. 59.] > > > [ 60. 61. 62. 63.]] > > > > > > > > > Any hint ? > > > > > > Regards, > > > Sebastian Haase > > > > Hi again, > > I think the reason that no one responded to this is that it just sounds > > to unbelievable ... > > This just slipped through the cracks for me. > > > Sorry for the missing piece of information, but 'd' is actually a > > memmapped array ! > > > > >>> d.info() > > > > class: > > shape: (80, 150, 150) > > strides: (90000, 600, 4) > > byteoffset: 0 > > bytestride: 4 > > itemsize: 4 > > aligned: 1 > > contiguous: 1 > > data: > > byteorder: big > > byteswap: 1 > > type: Float32 > > > > >>> dd = d.copy() > > >>> na.maximum.reduce(dd[:,136, 122]) > > > > 85.8426361084 > > > > >>> na.maximum.reduce(dd)[136, 122] > > > > 85.8426361084 > > > > > > Apparently we are using memmap so frequently now that I didn't even think > > about that - which is good news for everyone, because it means that it > > works (mostly). > > > > I just see that 'byteorder' is 'big' - I'm running this on an Intel Linux > > PC. Could this be the problem? > > I think byteorder is a good guess at this point. What version of Python > and numarray are you using? Python 2.2.1 (#1, Feb 28 2004, 00:52:10) [GCC 2.95.4 20011002 (Debian prerelease)] on linux2 numarray 0.9 - from CVS on 2004-05-13. 
Regards, Sebastian Haase From jmiller at stsci.edu Fri Jul 2 12:34:09 2004 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jul 2 12:34:09 2004 Subject: [Numpy-discussion] bug in numarray.maximum.reduce ? In-Reply-To: <200407021045.00866.haase@msg.ucsf.edu> References: <200406291705.55454.haase@msg.ucsf.edu> <200407020827.05407.haase@msg.ucsf.edu> <1088784157.26482.14.camel@halloween.stsci.edu> <200407021045.00866.haase@msg.ucsf.edu> Message-ID: <1088796821.5974.15.camel@halloween.stsci.edu> On Fri, 2004-07-02 at 13:45, Sebastian Haase wrote: > On Friday 02 July 2004 09:02 am, Todd Miller wrote: > > On Fri, 2004-07-02 at 11:27, Sebastian Haase wrote: > > > On Tuesday 29 June 2004 05:05 pm, Sebastian Haase wrote: > > > > Hi, > > > > > > > > Is this a bug?: > > > > >>> # (import numarray as na ; 'd' is a 3 dimensional array) > > > > >>> d.type() > > > > > > > > Float32 > > > > > > > > >>> d[80, 136, 122] > > > > > > > > 80.3997039795 > > > > > > > > >>> na.maximum.reduce(d[:,136, 122]) > > > > > > > > 85.8426361084 > > > > > > > > >>> na.maximum.reduce(d) [136, 122] > > > > > > > > 37.3658103943 > > > > > > > > >>> na.maximum.reduce(d,0)[136, 122] > > > > > > > > 37.3658103943 > > > > > > > > >>> na.maximum.reduce(d,1)[136, 122] > > > > > > > > Traceback (most recent call last): > > > > File "", line 1, in ? > > > > IndexError: Index out of range > > > > > > > > I was using na.maximum.reduce(d) to get a "pixelwise" maximum along Z > > > > (axis 0). But as seen above it does not get it right. I then tried to > > > > reproduce > > > > > > > > this with some simple arrays, but here it works just fine: > > > > >>> a = na.arange(4*4*4) > > > > >>> a.shape=(4,4,4) > > > > >>> na.maximum.reduce(a) > > > > > > > > [[48 49 50 51] > > > > [52 53 54 55] > > > > [56 57 58 59] > > > > [60 61 62 63]] > > > > > > > > >>> a = na.arange(4*4*4).astype(na.Float32) > > > > >>> a.shape=(4,4,4) > > > > >>> na.maximum.reduce(a) > > > > > > > > [[ 48. 49. 50. 51.] > > > > [ 52. 53. 54. 
55.] > > > > [ 56. 57. 58. 59.] > > > > [ 60. 61. 62. 63.]] > > > > > > > > > > > > Any hint ? > > > > > > > > Regards, > > > > Sebastian Haase > > > > > > Hi again, > > > I think the reason that no one responded to this is that it just sounds > > > to unbelievable ... > > > > This just slipped through the cracks for me. > > > > > Sorry for the missing piece of information, but 'd' is actually a > > > memmapped array ! > > > > > > >>> d.info() > > > > > > class: > > > shape: (80, 150, 150) > > > strides: (90000, 600, 4) > > > byteoffset: 0 > > > bytestride: 4 > > > itemsize: 4 > > > aligned: 1 > > > contiguous: 1 > > > data: > > > byteorder: big > > > byteswap: 1 > > > type: Float32 > > > > > > >>> dd = d.copy() > > > >>> na.maximum.reduce(dd[:,136, 122]) > > > > > > 85.8426361084 > > > > > > >>> na.maximum.reduce(dd)[136, 122] > > > > > > 85.8426361084 > > > > > > > > > Apparently we are using memmap so frequently now that I didn't even think > > > about that - which is good news for everyone, because it means that it > > > works (mostly). > > > > > > I just see that 'byteorder' is 'big' - I'm running this on an Intel Linux > > > PC. Could this be the problem? > > > > I think byteorder is a good guess at this point. What version of Python > > and numarray are you using? > > Python 2.2.1 (#1, Feb 28 2004, 00:52:10) > [GCC 2.95.4 20011002 (Debian prerelease)] on linux2 > > numarray 0.9 - from CVS on 2004-05-13. > > Regards, > Sebastian Haase Hi Sebastian, I logged this on SF as a bug but won't get to it until next week after numarray-1.0 comes out. Regards, Todd From jmiller at stsci.edu Fri Jul 2 14:06:13 2004 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jul 2 14:06:13 2004 Subject: [Numpy-discussion] ANN: numarray-1.0 released Message-ID: <1088802348.5974.28.camel@halloween.stsci.edu> Release Notes for numarray-1.0 Numarray is an array processing package designed to efficiently manipulate large multi-dimensional arrays. 
Numarray is modeled after Numeric and features c-code generated from python template scripts, the capacity to operate directly on arrays in files, and improved type promotions. I. ENHANCEMENTS 1. User added ufuncs There's a setup.py file in numarray-1.0/Examples/ufunc which demonstrates how a numarray user can define their own universal functions of one or two parameters. Ever wanted to write your own bessel() function for use on arrays? Now you can. Your ufunc can use exactly the same machinery as add(). 2. Ports of Numeric functions A bunch of Numeric functions were ported to numarray in the new libnumeric module. To get these import from numarray.numeric. Most notable among these are put, putmask, take, argmin, and argmax. Also added were sort, argsort, concatenate, repeat and resize. These are independent ports/implementations in C done for the purpose of best Numeric compatibility and small array performance. The numarray versions, which handle additional cases, still exist and are the default in numarray proper. 3. Faster matrix multiply The setup for numarray's matrix multiply was moved into C-code. This makes it faster for small matrices. 4. The numarray "header PEP" A PEP has been started for the inclusion of numarray (and possibly Numeric) C headers into the Python core. The PEP will demonstrate how to provide optional support for arrays (the end-user may or may not have numarray installed and the extension will still work). It may also (eventually) demonstrate how to build extensions which support both numarray and Numeric. Thus, the PEP is seeking to make it possible to distribute extensions which will still compile when numarray (or either) is not present in a user's Python installation, which will work when numarray (or either) is not installed, and which will improve performance when either is installed. The PEP is now in numarray-1.0/Doc/header_pep.txt in docutils format.
We want feedback and consensus before we submit to python-dev so please consider reading it and commenting. For the PEP, the C-API has been partitioned into two parts: a relatively simple Numeric compatible part and the numarray native part. This broke source and binary compatibility with numarray-0.9. See CAUTIONS below for more information. 5. Changes to the manual There are now brief sections on numarray.mlab and numarray.objects in the manual. The discussion of the C-API has been updated. II. CAUTIONS 1. The numarray-1.0 C-API is neither completely source level nor binary compatible with numarray-0.9. First, this means that some 3rd party extensions will no longer compile without errors. Second, this means that binary packages built against numarray-0.9 will fail, probably disastrously, using numarray-1.0. Don't install numarray-1.0 until you are ready to recompile or replace your extensions with numarray-1.0 binaries because 0.9 binaries will not work. In order to support the header PEP, the numarray C-API was partitioned into two parts: Numeric compatible and numarray extensions. You can use the Numeric compatible API (the PyArray_* functions) by including arrayobject.h and calling import_array() in your module init function. You can use the extended API (the NA_* functions) by including libnumarray.h and calling import_libnumarray() in your init function. Because of the partitioning, all numarray extensions must be recompiled to work with 1.0. Extensions using *both* APIs must include both files in order to compile, and must do both imports in order to run. Both APIs share a common PyArrayObject struct. 2. numarray extension writers should note that the documented use of PyArray_INCREF and PyArray_XDECREF (in numarray) was found to be incompatible with Numeric and these functions have therefore been removed from the supported API and will now result in errors. 3. The numarray.objects.ObjectArray parameter order was changed. 4.
The undocumented API function PyArray_DescrFromTypeObj was removed from the Numeric compatible API because it is not provided by Numeric. III. BUGS FIXED / CLOSED See http://sourceforge.net/tracker/?atid=450446&group_id=1369&func=browse for more details. 979834 convolve2d parameter order issues 979775 ObjectArray parameter order 979712 No exception for invalid axis 979702 too many slices fails silently 979123 A[n:n] = x no longer works 979028 matrixmultiply precision 976951 Unpickled numarray types unsable? 977472 CharArray concatenate 970356 bug in accumulate contiguity status 969162 object array bug/ambiguity 963921 bitwise_not over Bool type fails 963706 _reduce_out: problem with otype 942804 numarray C-API include file 932438 suggest moving mlab up a level 932436 mlab docs missing 857628 numarray allclose returns int 839401 Argmax's behavior has changed for ties 817348 a+=1j # Not converted to complex 957089 PyArray_FromObject dim check broken 923046 numarray.objects incompatibility 897854 Type conflict when embedding on OS X 793421 PyArray_INCREF / PyArray_XDECREF deprecated 735479 Build failure on Cygwin 1.3.22 (very current install). 870660 Numarray: CFLAGS build problem 874198 numarray.random_array.random() broken? 874207 not-so random numbers in numarray.random_array 829662 Downcast from Float64 to UInt8 anomaly 867073 numarray diagonal bug? 806705 a tale of two rank-0's 863155 Zero size numarray breaks for loop 922157 argmax returns integer in some cases 934514 suggest nelements -> size 953294 choose bug 955314 strings.num2char bug? 
955336 searchsorted has strange behaviour 955409 MaskedArray problems 953567 Add read-write requirement to NA_InputArray 952705 records striding for > 1D arrays 944690 many numarray array methods not documented 915015 numarray/Numeric incompatabilities 949358 UsesOpPriority unexpected behavior 944678 incorrect help for "size" func/method 888430 NA_NewArray() creates array with wrong endianess 922798 The document Announce.txt is out of date 947080 numarray.image.median bugs 922796 Manual has some dated MA info 931384 What does True mean in a mask? 931379 numeric.ma called MA in manual 933842 Bool arrays don't allow bool assignment 935588 problem parsing argument "nbyte" in callStrideConvCFunc() 936162 problem parsing "nbytes" argument in copyToString() 937680 Error in Lib/numerictypes.py ? 936539 array([cmplx_array, int_array]) fails 936541 a[...,1] += 0 crashes interpreter. 940826 Ufunct operator don't work 935882 take for character arrays? 933783 numarray, _ufuncmodule.c: problem setting buffersize 930014 fromstring typecode param still broken 929841 searchsorted type coercion 924841 numarray.objects rank-0 results 925253 numarray.objects __str__ and __repr__ 913782 Minor error in chapter 12: NUM_ or not? 
889591 wrong header file for C extensions 925073 API manual comments 924854 take() errors 925754 arange() with large argument crashes interpreter 926246 ufunc reduction crash 902153 can't compile under RH9/gcc 3.2.2 916876 searchsorted/histogram broken in versions 0.8 and 0.9 920470 numarray arange() problem 915736 numarray-0.9: Doc/CHANGES not up to date WHERE ----------- Numarray-1.0 windows executable installers, source code, and manual are here: http://sourceforge.net/project/showfiles.php?group_id=1369 Numarray is hosted by Source Forge in the same project which hosts Numeric: http://sourceforge.net/projects/numpy/ The web page for Numarray information is at: http://stsdas.stsci.edu/numarray/index.html Trackers for Numarray Bugs, Feature Requests, Support, and Patches are at the Source Forge project for NumPy at: http://sourceforge.net/tracker/?group_id=1369 REQUIREMENTS ------------------------------ numarray-1.0 requires Python 2.2.2 or greater. AUTHORS, LICENSE ------------------------------ Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC Hsu, Paul Barrett, Phil Hodge at the Space Telescope Science Institute. We'd like to acknowledge the assistance of Francesc Alted, Paul Dubois, Sebastian Haase, Tim Hochberg, Nadav Horesh, Edward C. Jones, Eric Jones, Jochen Küpper, Travis Oliphant, Pearu Peterson, Peter Verveer, Colin Williams, and everyone else who has contributed with comments, bug reports, or patches. Numarray is made available under a BSD-style License. See LICENSE.txt in the source distribution for details. -- Todd Miller jmiller at stsci.edu From paustin at eos.ubc.ca Sat Jul 3 10:11:03 2004 From: paustin at eos.ubc.ca (Philip Austin) Date: Sat Jul 3 10:11:03 2004 Subject: [Numpy-discussion] Bug in numarray.typecode()?
In-Reply-To: <1088796821.5974.15.camel@halloween.stsci.edu> References: <200406291705.55454.haase@msg.ucsf.edu> <200407020827.05407.haase@msg.ucsf.edu> <1088784157.26482.14.camel@halloween.stsci.edu> <200407021045.00866.haase@msg.ucsf.edu> <1088796821.5974.15.camel@halloween.stsci.edu> Message-ID: <16614.59532.288486.645869@gull.eos.ubc.ca> I'm in the process of switching to numarray, but I still need typecode(). I notice that, although it's discouraged, the typecode ids have been extended to all new numarray types described in table 4.1 (p. 19) of the manual, except UInt64. That is, the following script: import numarray as Na print "Numarray version: ",Na.__version__ print Na.array([1],'Int8').typecode() print Na.array([1],'UInt8').typecode() print Na.array([1],'Int16').typecode() print Na.array([1],'UInt16').typecode() print Na.array([1],'Int32').typecode() print Na.array([1],'UInt32').typecode() print Na.array([1],'Float32').typecode() print Na.array([1],'Float64').typecode() print Na.array([1],'Complex32').typecode() print Na.array([1],'Complex64').typecode() print Na.array([1],'Bool').typecode() print Na.array([1],'UInt64').typecode() prints: Numarray version: 1.0 1 b s w l u f d F D 1 Traceback (most recent call last): File "", line 14, in ? File "/usr/lib/python2.3/site-packages/numarray/numarraycore.py", line 1092, in typecode return _nt.typecode[self._type] KeyError: UInt64 Should this print 'U'? Regards, Phil Austin From curzio.basso at unibas.ch Tue Jul 6 02:42:06 2004 From: curzio.basso at unibas.ch (Curzio Basso) Date: Tue Jul 6 02:42:06 2004 Subject: [Numpy-discussion] inconsistencies between docs and C headers? Message-ID: <40EA73C9.7070604@unibas.ch> Hi all, can someone explain me why in the docs functions like NA_NewArray() return a PyObject*, while in the headers they return a PyArrayObject*? Is it just the documentation which is slow to catch up with the development? Or am i missing something? 
thanks, curzio From jmiller at stsci.edu Tue Jul 6 06:35:11 2004 From: jmiller at stsci.edu (Todd Miller) Date: Tue Jul 6 06:35:11 2004 Subject: [Numpy-discussion] Bug in numarray.typecode()? In-Reply-To: <16614.59532.288486.645869@gull.eos.ubc.ca> References: <200406291705.55454.haase@msg.ucsf.edu> <200407020827.05407.haase@msg.ucsf.edu> <1088784157.26482.14.camel@halloween.stsci.edu> <200407021045.00866.haase@msg.ucsf.edu> <1088796821.5974.15.camel@halloween.stsci.edu> <16614.59532.288486.645869@gull.eos.ubc.ca> Message-ID: <1089120859.25460.3.camel@halloween.stsci.edu> On Sat, 2004-07-03 at 13:10, Philip Austin wrote: > I'm in the process of switching to numarray, but I still > need typecode(). I notice that, although it's discouraged, > the typecode ids have been extended to all new numarray > types described in table 4.1 (p. 19) of the manual, except UInt64. > That is, the following script: > > import numarray as Na > print "Numarray version: ",Na.__version__ > print Na.array([1],'Int8').typecode() > print Na.array([1],'UInt8').typecode() > print Na.array([1],'Int16').typecode() > print Na.array([1],'UInt16').typecode() > print Na.array([1],'Int32').typecode() > print Na.array([1],'UInt32').typecode() > print Na.array([1],'Float32').typecode() > print Na.array([1],'Float64').typecode() > print Na.array([1],'Complex32').typecode() > print Na.array([1],'Complex64').typecode() > print Na.array([1],'Bool').typecode() > print Na.array([1],'UInt64').typecode() > > prints: > > Numarray version: 1.0 > 1 > b > s > w > l > u > f > d > F > D > 1 > Traceback (most recent call last): > File "", line 14, in ? > File "/usr/lib/python2.3/site-packages/numarray/numarraycore.py", line 1092, in typecode > return _nt.typecode[self._type] > KeyError: UInt64 > > Should this print 'U'? I think it could, but I wouldn't go so far as to say it should. typecode() is there for backward compatibility with Numeric. 
Since 'U' doesn't work for Numeric, I see no point in adding it to numarray. I'm not sure it would hurt anything other than create the illusion that something which works on numarray will also work on Numeric. If anyone has a good reason to add it, please speak up. Regards, Todd From jmiller at stsci.edu Tue Jul 6 06:58:09 2004 From: jmiller at stsci.edu (Todd Miller) Date: Tue Jul 6 06:58:09 2004 Subject: [Numpy-discussion] inconsistencies between docs and C headers? In-Reply-To: <40EA73C9.7070604@unibas.ch> References: <40EA73C9.7070604@unibas.ch> Message-ID: <1089122261.25460.41.camel@halloween.stsci.edu> On Tue, 2004-07-06 at 05:41, Curzio Basso wrote: > Hi all, > can someone explain me why in the docs functions like NA_NewArray() > return a PyObject*, while in the headers they return a PyArrayObject*? > Is it just the documentation which is slow to catch up with the > development? Yes, it's a bona fide inconsistency. It's not great, but it's fairly harmless since a PyArrayObject is a PyObject. From paustin at eos.ubc.ca Tue Jul 6 09:31:05 2004 From: paustin at eos.ubc.ca (Philip Austin) Date: Tue Jul 6 09:31:05 2004 Subject: [Numpy-discussion] Bug in numarray.typecode()? In-Reply-To: <1089120859.25460.3.camel@halloween.stsci.edu> References: <200406291705.55454.haase@msg.ucsf.edu> <200407020827.05407.haase@msg.ucsf.edu> <1088784157.26482.14.camel@halloween.stsci.edu> <200407021045.00866.haase@msg.ucsf.edu> <1088796821.5974.15.camel@halloween.stsci.edu> <16614.59532.288486.645869@gull.eos.ubc.ca> <1089120859.25460.3.camel@halloween.stsci.edu> Message-ID: <16618.54200.934079.44467@gull.eos.ubc.ca> Todd Miller writes: > > > > Should this print 'U'? > > I think it could, but I wouldn't go so far as to say it should. > typecode() is there for backward compatibility with Numeric. Since 'U' > doesn't work for Numeric, I see no point in adding it to numarray. 
I'm > not sure it would hurt anything other than create the illusion that > something which works on numarray will also work on Numeric. > > If anyone has a good reason to add it, please speak up. > I don't necessarily need typecode, but I couldn't find the inverse of a = array([10], type = 'UInt8') (p. 19) in the manual. That is, I need a method that returns the string representation of a numarray type in a single call (as opposed to the two-step repr(array.type()). This is for code that uses the Boost C++ bindings to numarray. These bindings work via callbacks to python (which eliminates the need to link to the numarray or numeric api). Currently I use typecode() to get an index into a map of types when I need to check that the type of a passed argument is correct: void check_type(boost::python::numeric::array arr, string expected_type){ string actual_type = arr.typecode(); if (actual_type != expected_type) { std::ostringstream stream; stream << "expected Numeric type " << kindstrings[expected_type] << ", found Numeric type " << kindstrings[actual_type] << std::ends; PyErr_SetString(PyExc_TypeError, stream.str().c_str()); throw_error_already_set(); } return; } Unless I'm missing something, without typecode I need a second interpreter call to repr, or I need to import numarray and load all the types into storage for a type object comparison. It's not a showstopper, but since I check every argument in every call, I'd like to avoid this unless absolutely necessary. Regards, Phil From jmiller at stsci.edu Tue Jul 6 11:40:08 2004 From: jmiller at stsci.edu (Todd Miller) Date: Tue Jul 6 11:40:08 2004 Subject: [Numpy-discussion] Missing header_pep.txt Message-ID: <1089139173.26741.2.camel@halloween.stsci.edu> Somehow header_pep.txt didn't make it into the numarray-1.0 source tar-ball. It's now in CVS and also attached. Regards, Todd -------------- next part -------------- An embedded message was scrubbed... 
From: unknown sender Subject: no subject Date: no date Size: 38 URL: From jmiller at stsci.edu Tue Jul 6 10:15:27 2004 From: jmiller at stsci.edu (Todd Miller) Date: 06 Jul 2004 10:15:27 -0400 Subject: ANN: numarray-1.0 released In-Reply-To: <40C2E65B0000343B@cpfe4.be.tisc.dk> References: <40C2E65B0000343B@cpfe4.be.tisc.dk> Message-ID: <1089123327.25460.57.camel@halloween.stsci.edu> On Tue, 2004-07-06 at 02:59, jjm at tiscali.dk wrote: > > The PEP is now in > > numarray-1.0/Doc/header_pep.txt in docutils format. We want feedback > > and consensus before we submit to python-dev so please consider > > reading it and commenting. > > I can't find header_pep.txt! It is not in numarray-1.0.tar.gz. Oops, you're right. I attached it. Apparently I forgot to add it to CVS. Todd -------------- next part -------------- PEP: XXX Title: numerical array headers Version: $Revision: 1.3 $ Last-Modified: $Date: 2002/08/30 04:11:20 $ Author: Todd Miller , Perry Greenfield Discussions-To: numpy-discussion at lists.sf.net Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 02-Jun-2004 Python-Version: 2.4 Post-History: 30-Aug-2002 Abstract ======== We propose the inclusion of three numarray header files within the CPython distribution to facilitate use of numarray array objects as an optional data format for 3rd party modules. The PEP illustrates a simple technique by which a 3rd party extension may support numarrays as input or output values if numarray is installed, and yet the 3rd party extension does not require numarray to be installed to be built. Nothing needs to be changed in the setup.py or makefile for installing with or without numarray, and a subsequent installation of numarray will allow its use without rebuilding the 3rd party extension. Specification ============= This PEP applies only to the CPython platform and only to numarray.
Analogous PEPs could be written for Jython and Python.NET and Numeric, but what is discussed here is a speed optimization that is tightly coupled to CPython and numarray. Three header files to support the numarray C-API should be included in the CPython distribution within a numarray subdirectory of the Python include directory: * numarray/arraybase.h * numarray/libnumeric.h * numarray/arrayobject.h The files are shown prefixed with "numarray" to leave the door open for doing similar PEPs with other packages, such as Numeric. If a plethora of such header contributions is anticipated, a further refinement would be to locate the headers under something like "third_party/numarray". In order to provide enhanced performance for array objects, an extension writer would start by including the numarray C-API in addition to any other Python headers: :: #include "numarray/arrayobject.h" Not shown in this PEP are the API calls which operate on numarrays. These are documented in the numarray manual. What is shown here are two calls which are guaranteed to be safe even when numarray is not installed: * PyArray_Present() * PyArray_isArray() In an extension function that wants to access the numarray API, a test needs to be performed to determine if the API functions are safely callable: :: PyObject * some_array_returning_function(PyObject *m, PyObject *args) { int param; PyObject *result; if (!PyArg_ParseTuple(args, "i", &param)) return NULL; if (PyArray_Present()) { result = numarray_returning_function(param); } else { result = list_returning_function(param); } return result; } Within **numarray_returning_function**, a subset of the numarray C-API (the Numeric compatible API) is available for use so it is possible to create and return numarrays. Within **list_returning_function**, only the standard Python C-API can be used because numarray is assumed to be unavailable in that particular Python installation.
In an extension function that wants to accept numarrays as inputs and provide improved performance over the Python sequence protocol, an additional convenience function exists which diverts arrays to specialized code when numarray is present and the input is an array: :: PyObject * some_array_accepting_function(PyObject *m, PyObject *args) { PyObject *sequence, *result; if (!PyArg_ParseTuple(args, "O", &sequence)) return NULL; if (PyArray_isArray(sequence)) { result = numarray_input_function(sequence); } else { result = sequence_input_function(sequence); } return result; } During module initialization, a numarray enhanced extension must call **import_array()**, a macro which imports numarray and assigns a value to a static API pointer: PyArray_API. Since the API pointer starts with the value NULL and remains so if the numarray import fails, the API pointer serves as a flag that indicates that numarray was successfully imported whenever it is non-NULL. :: static void initfoo(void) { PyObject *m = Py_InitModule3( "foo", _foo_functions, _foo__doc__); if (m == NULL) return; import_array(); } **PyArray_Present()** indicates that numarray was successfully imported. It is defined in terms of the API function pointer as: :: #define PyArray_Present() (PyArray_API != NULL) **PyArray_isArray(s)** indicates that numarray was successfully imported and the given parameter is a numarray instance. It is defined as: :: #define PyArray_isArray(s) (PyArray_Present() && PyArray_Check(s)) Motivation ========== The use of numeric arrays as an interchange format is eminently sensible for many kinds of modules. For example, image, graphics, and audio modules all can accept or generate large amounts of numerical data that could easily use the numarray format.
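The conditional-support pattern sketched above in C has a straightforward Python-level analogue (similar in spirit to the "numerix" layer mentioned earlier in this thread). The following is a hypothetical sketch, not part of the PEP: the names present(), is_array(), and double() are illustrative only, and it assumes numarray exposes a NumArray class, which the numarray manual documents.

```python
# Hypothetical Python-level model of PyArray_Present()/PyArray_isArray():
# detect numarray once at import time and fall back to plain sequences
# whenever it is absent or the input is not an array.
try:
    import numarray  # may or may not be installed on the user's system
    _HAVE_NUMARRAY = True
except ImportError:
    _HAVE_NUMARRAY = False

def present():
    """Python analogue of PyArray_Present()."""
    return _HAVE_NUMARRAY

def is_array(obj):
    """Python analogue of PyArray_isArray(s): installed AND an array."""
    return _HAVE_NUMARRAY and isinstance(obj, numarray.NumArray)

def double(seq):
    """Use the array fast path when possible, else a list fallback."""
    if is_array(seq):
        return seq * 2            # elementwise multiply on the array
    return [x * 2 for x in seq]   # plain-Python fallback path
```

Nothing here needs numarray at install time; a later installation of numarray simply flips _HAVE_NUMARRAY to True, which is exactly the property the PEP wants for C extensions.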
But since numarray is not part of the standard distribution, some authors of 3rd party extensions may be reluctant to add a dependency on another 3rd party extension that isn't absolutely essential for its use, fearing that the extra installation requirements will put off users. Yet not allowing easy interchange with numarray introduces annoyances that need not be present. Normally, in the absence of an explicit ability to generate or use numarray objects, one must write conversion utilities to convert from the data representation used to that for numarray. This typically involves excess copying of data (usually from internal to string to numarray). In cases where the 3rd party package uses buffer objects, the data may not need copying at all. Either many users will have to develop their own conversion routines, or numarray will have to include adapters for many other 3rd party packages. Since numarray is used by many projects, it makes more sense to put the conversion logic on the other side of the fence. There is a clear need for a mechanism that allows 3rd party software to use numarray objects if they are available, without requiring numarray's presence to build and install properly.

Rationale
=========

One solution is to make numarray part of the standard distribution. That may be a good long-term solution, but at the moment the numeric community is in a transition period between the Numeric and numarray packages, which may take years to complete. It is not likely that numarray will be considered for adoption until the transition is complete. Numarray is also a large package, and there is legitimate concern about its inclusion as regards the long-term commitment to support. We can solve that problem by making a few include files part of the Python Standard Distribution and demonstrating how extension writers can write code that uses numarray conditionally.
The API submitted in this PEP is the subset of the numarray API which is most source compatible with Numeric. The headers consist of two handwritten files (arraybase.h and arrayobject.h) and one generated file (libnumeric.h).

arraybase.h contains typedefs and enumerations which are important both to the API presented here and to the larger numarray specific API. arrayobject.h glues together arraybase and libnumeric and is needed for Numeric compatibility. libnumeric.h consists of macros generated from a template and a list of function prototypes. The macros themselves are somewhat intricate in order to provide the compile time checking effect of function prototypes.

Further, the interface takes two forms: one form is used to compile numarray and defines static function prototypes. The other form is used to compile extensions which use the API and defines macros which execute function calls through pointers which are found in a table located using a single public API pointer. These macros also test the value of the API pointer in order to deliver a fatal error should a developer forget to initialize by calling import_array().

The interface chosen here is the subset of numarray most useful for porting existing Numeric code or creating new extensions which can be compiled for either numarray or Numeric. There are a number of other numarray API functions which are omitted here for the sake of simplicity. By choosing to support only the Numeric compatible subset of the numarray C-API, concerns about interface stability are minimized because the Numeric API is well established. However, it should be made clear that the numarray API subset proposed here is source compatible, not binary compatible, with Numeric.
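The "call through a pointer table, guarded by the public API pointer" mechanism can be mimicked in a few lines of Python. Everything here is an invented toy model — _api stands in for PyArray_API and a dict stands in for the function table — but it shows how an uninitialized pointer is turned into a loud error rather than a crash:

```python
# Toy model of the deferred-pointer-table mechanism described above.
_api = None  # plays the role of the static PyArray_API pointer

def import_array():
    """Locate the function table (here: just a dict of callables)."""
    global _api
    _api = {"add": lambda a, b: a + b}  # stand-in for the real table

def api_call(name, *args):
    """Macro analogue: fail loudly if import_array() was never called."""
    if _api is None:
        raise RuntimeError("import_array() was not called at module init")
    return _api[name](*args)
```

Before import_array() runs, any api_call() raises immediately; afterwards, calls are dispatched through the table.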
Resources
=========

* numarray/arraybase.h (http://cvs.sourceforge.net/viewcvs.py/numpy/numarray/Include/numarray/arraybase.h)
* numarray/libnumeric.h (http://cvs.sourceforge.net/viewcvs.py/numpy/numarray/Include/numarray/libnumeric.h)
* numarray/arrayobject.h (http://cvs.sourceforge.net/viewcvs.py/numpy/numarray/Include/numarray/arrayobject.h)
* numarray-1.0 manual PDF
* numarray-1.0 source distribution
* numarray website at STSCI (http://www.stsci.edu/resources/software_hardware/numarray)
* example numarray enhanced extension

References
==========

.. [1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton (http://www.python.org/peps/pep-0001.html)
.. [2] PEP 9, Sample Plaintext PEP Template, Warsaw (http://www.python.org/peps/pep-0009.html)

Copyright
=========

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   End:

From paustin at eos.ubc.ca Tue Jul 6 16:09:02 2004
From: paustin at eos.ubc.ca (Philip Austin)
Date: Tue Jul 6 16:09:02 2004
Subject: [Numpy-discussion] non-intuitive behaviour for isbyteswapped()?
In-Reply-To: <16614.59532.288486.645869@gull.eos.ubc.ca>
References: <200406291705.55454.haase@msg.ucsf.edu> <200407020827.05407.haase@msg.ucsf.edu> <1088784157.26482.14.camel@halloween.stsci.edu> <200407021045.00866.haase@msg.ucsf.edu> <1088796821.5974.15.camel@halloween.stsci.edu> <16614.59532.288486.645869@gull.eos.ubc.ca>
Message-ID: <16619.12490.596884.579782@gull.eos.ubc.ca>

With numarray 1.0 and Mandrake 10 i686 I get the following:

>>> y=N.array([1,1,2,1],type="Float64")
>>> y
array([ 1.,  1.,  2.,  1.])
>>> y.byteswap()
>>> y
array([  3.03865194e-319,   3.03865194e-319,   3.16202013e-322,
         3.03865194e-319])
>>> y.isbyteswapped()
0

Should this be 1?
Thanks, Phil

From paustin at eos.ubc.ca Tue Jul 6 18:43:49 2004
From: paustin at eos.ubc.ca (Philip Austin)
Date: Tue Jul 6 18:43:49 2004
Subject: [Numpy-discussion] optional arguments to the array constructor
Message-ID: <16619.21771.686179.152410@gull.eos.ubc.ca>

(for numpy v1.0 on Mandrake 10 i686)

As noted on p. 25 the array constructor takes up to 5 optional arguments

array(sequence=None, type=None, shape=None, copy=1, savespace=0, typecode=None)

(and raises an exception if both type and typecode are set).

Is there any way to make an alias (copy=0) of an array without passing keyword values? That is, specifying the copy keyword alone works:

test=N.array((1., 3), "Float64", shape=(2,), copy=1, savespace=0)
a=N.array(test, copy=0)
a[1]=999
print test
>>> [   1.  999.]

But when intervening keywords are specified copy won't toggle:

test=N.array((1., 3))
a=N.array(sequence=test, type="Float64", shape=(2,), copy=0)
a[1]=999.
print test
>>> [ 1.  3.]

Which is also the behaviour I see when I drop the keywords:

test=N.array((1., 3))
a=N.array(test, "Float64", (2,), 0)
a[1]=999.
print test
>>> [ 1.  3.]

an additional puzzle is that adding the savespace parameter raises the following exception:

>>> a=N.array(test, "Float64", (2,), 0,0)
Traceback (most recent call last):
  File "", line 1, in ?
  File "/usr/lib/python2.3/site-packages/numarray/numarraycore.py", line 312, in array
    type = getTypeObject(sequence, type, typecode)
  File "/usr/lib/python2.3/site-packages/numarray/numarraycore.py", line 256, in getTypeObject
    rtype = _typeFromTypeAndTypecode(type, typecode)
  File "/usr/lib/python2.3/site-packages/numarray/numarraycore.py", line 243, in _typeFromTypeAndTypecode
    raise ValueError("Can't define both 'type' and 'typecode' for an array.")
ValueError: Can't define both 'type' and 'typecode' for an array.
Thanks for any insights -- Phil

From jmiller at stsci.edu Wed Jul 7 07:58:05 2004
From: jmiller at stsci.edu (Todd Miller)
Date: Wed Jul 7 07:58:05 2004
Subject: [Numpy-discussion] non-intuitive behaviour for isbyteswapped()?
In-Reply-To: <16619.12490.596884.579782@gull.eos.ubc.ca>
References: <200406291705.55454.haase@msg.ucsf.edu> <200407020827.05407.haase@msg.ucsf.edu> <1088784157.26482.14.camel@halloween.stsci.edu> <200407021045.00866.haase@msg.ucsf.edu> <1088796821.5974.15.camel@halloween.stsci.edu> <16614.59532.288486.645869@gull.eos.ubc.ca> <16619.12490.596884.579782@gull.eos.ubc.ca>
Message-ID: <1089212251.29456.212.camel@halloween.stsci.edu>

On Tue, 2004-07-06 at 19:07, Philip Austin wrote:
> With numarray 1.0 and Mandrake 10 i686 I get the following:
>
> >>> y=N.array([1,1,2,1],type="Float64")
> >>> y
> array([ 1.,  1.,  2.,  1.])
> >>> y.byteswap()
> >>> y
> array([  3.03865194e-319,   3.03865194e-319,   3.16202013e-322,
>          3.03865194e-319])
> >>> y.isbyteswapped()
> 0
>
> Should this be 1?

The behavior of byteswap() has been controversial in the past, at one time implementing exactly the behavior I think you expected. Without giving any guarantee for the future, here's how things work now: byteswap() just swaps the bytes. There's a related method, togglebyteorder(), which inverts the sense of the byteorder:

>>> y.byteswap()
>>> y.togglebyteorder()
>>> y.isbyteswapped()
1

The ability to munge bytes and change the sense of byteorder independently is definitely needed... but you're certainly not the first one to ask this question.
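For readers puzzled by the values in Philip's session: swapping the eight bytes of a float64 without updating the byte-order flag makes 1.0 reinterpret as a tiny denormal, which the standard struct module reproduces exactly. This is a stdlib illustration of the effect, not numarray's implementation:

```python
import struct

def swap_float64_bytes(x):
    """Reverse the 8 bytes of a float64 and reinterpret them in the
    original byte order -- what byteswap() alone does to each element."""
    big = struct.pack(">d", x)          # the bytes in big-endian order
    return struct.unpack("<d", big)[0]  # reread them as little-endian

swapped = swap_float64_bytes(1.0)   # the ~3.03865194e-319 denormal above
restored = swap_float64_bytes(swapped)  # swapping twice is the identity
```

Swapping the bytes alone changes the value; only flipping the recorded byte order as well (togglebyteorder() in numarray) makes the array read back as 1.0 again.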
There is also (Numeric compatible) byteswapped(), which both swaps and changes sense, but it creates a copy rather than operating in place:

>>> x = y.byteswapped()
>>> (x is not y) and (x._data is not y._data)
1

Regards, Todd

From jmiller at stsci.edu Wed Jul 7 08:13:05 2004
From: jmiller at stsci.edu (Todd Miller)
Date: Wed Jul 7 08:13:05 2004
Subject: [Numpy-discussion] optional arguments to the array constructor
In-Reply-To: <16619.21771.686179.152410@gull.eos.ubc.ca>
References: <16619.21771.686179.152410@gull.eos.ubc.ca>
Message-ID: <1089213153.29456.229.camel@halloween.stsci.edu>

On Tue, 2004-07-06 at 21:42, Philip Austin wrote:
> (for numpy v1.0 on Mandrake 10 i686)

My guess is you're talking about numarray here. Please be charitable if I'm talking out of turn... I tend to see everything as a numarray issue.

> As noted on p. 25 the array constructor takes up to 5 optional arguments
>
> array(sequence=None, type=None, shape=None, copy=1, savespace=0, typecode=None)
> (and raises an exception if both type and typecode are set).
>
> Is there any way to make an alias (copy=0) of an array without passing
> keyword values?

In numarray, all you have to do to get an alias is:

>>> b = a.view()

It's an alias because:

>>> b._data is a._data
True

> That is, specifying the copy keyword alone works:
>
> test=N.array((1., 3), "Float64", shape=(2,), copy=1, savespace=0)
> a=N.array(test, copy=0)
> a[1]=999
> print test
>
> >>> [   1.  999.]
>
> But when intervening keywords are specified copy won't toggle:
>
> test=N.array((1., 3))
> a=N.array(sequence=test, type="Float64", shape=(2,), copy=0)
> a[1]=999.
> print test
> >>> [ 1.  3.]
>
> Which is also the behaviour I see when I drop the keywords:
>
> test=N.array((1., 3))
> a=N.array(test, "Float64", (2,), 0)
> a[1]=999.
> print test
> >>> [ 1.  3.]
>
> an additional puzzle is that adding the savespace parameter raises
> the following exception:
>
> >>> a=N.array(test, "Float64", (2,), 0,0)
> Traceback (most recent call last):
>   File "", line 1, in ?
>   File "/usr/lib/python2.3/site-packages/numarray/numarraycore.py", line 312, in array
>     type = getTypeObject(sequence, type, typecode)
>   File "/usr/lib/python2.3/site-packages/numarray/numarraycore.py", line 256, in getTypeObject
>     rtype = _typeFromTypeAndTypecode(type, typecode)
>   File "/usr/lib/python2.3/site-packages/numarray/numarraycore.py", line 243, in _typeFromTypeAndTypecode
>     raise ValueError("Can't define both 'type' and 'typecode' for an array.")
> ValueError: Can't define both 'type' and 'typecode' for an array.

All this looks like a documentation problem. The numarray array() signature has been tortured by Numeric backward compatibility, so there has been more flux in it than you would expect. Anyway, the manual is out of date. Here's the current signature from the code:

def array(sequence=None, typecode=None, copy=1, savespace=0,
          type=None, shape=None):

Sorry about the confusion, Todd

From paustin at eos.ubc.ca Wed Jul 7 11:26:11 2004
From: paustin at eos.ubc.ca (Philip Austin)
Date: Wed Jul 7 11:26:11 2004
Subject: [Numpy-discussion] optional arguments to the array constructor
In-Reply-To: <1089213153.29456.229.camel@halloween.stsci.edu>
References: <16619.21771.686179.152410@gull.eos.ubc.ca> <1089213153.29456.229.camel@halloween.stsci.edu>
Message-ID: <16620.16395.603789.28730@gull.eos.ubc.ca>

Todd Miller writes:
> On Tue, 2004-07-06 at 21:42, Philip Austin wrote:
> > (for numpy v1.0 on Mandrake 10 i686)
>
> My guess is you're talking about numarray here. Please be charitable if
> I'm talking out of turn... I tend to see everything as a numarray
> issue.

Right -- I'm still working through the boost test suite for numarray, which is failing a couple of tests that passed (around numarray v0.3).

> All this looks like a documentation problem.
> The numarray array()
> signature has been tortured by Numeric backward compatibility, so there
> has been more flux in it than you would expect. Anyway, the manual is
> out of date. Here's the current signature from the code:
>
> def array(sequence=None, typecode=None, copy=1, savespace=0,
>           type=None, shape=None):

Actually, it seems to be a difference in the way that Numeric and numarray treat the copy flag when a typecode is specified. In Numeric, if no change in type is requested and copy=0, then the constructor goes ahead and produces a view:

import Numeric as nc
test=nc.array([1,2,3],'i')
a=nc.array(test,'i',0)
a[0]=99
print test
>> [99 2 3]

but makes a copy if a cast is required:

test=nc.array([1,2,3],'i')
a=nc.array(test,'F',0)
a[0]=99
print test
>>> [1 2 3]

Looking at numarraycore.py line 305 I see that:

if type is None and typecode is None:
    if copy:
        a = sequence.copy()
    else:
        a = sequence

i.e. numarray skips the check for a type match and ignores the copy flag, even if the type is preserved:

import numarray as ny
test=ny.array([1,2,3],'i')
a=ny.array(test,'i',0)
a._data is test._data
>>> False

It looks like there might have been a comment about this in the docstring, but it got clipped at some point?:

array() constructs a NumArray by calling NumArray, one of its factory functions (fromstring, fromfile, fromlist), or by making a copy of an existing array.
If copy=0, array() will create a new array only if sequence specifies the contents or storage for the array Thanks, Phil From jmiller at stsci.edu Wed Jul 7 12:47:02 2004 From: jmiller at stsci.edu (Todd Miller) Date: Wed Jul 7 12:47:02 2004 Subject: [Numpy-discussion] optional arguments to the array constructor In-Reply-To: <16620.16395.603789.28730@gull.eos.ubc.ca> References: <16619.21771.686179.152410@gull.eos.ubc.ca> <1089213153.29456.229.camel@halloween.stsci.edu> <16620.16395.603789.28730@gull.eos.ubc.ca> Message-ID: <1089229573.29456.544.camel@halloween.stsci.edu> On Wed, 2004-07-07 at 14:25, Philip Austin wrote: > Todd Miller writes: > > On Tue, 2004-07-06 at 21:42, Philip Austin wrote: > > > (for numpy v1.0 on Mandrake 10 i686) > > > > My guess is you're talking about numarray here. Please be charitable if > > I'm talking out of turn... I tend to see everything as a numarray > > issue. > > Right -- I'm still working through the boost test suite for numarray, which is > failing a couple of tests that passed (around numarray v0.3). > > > All this looks like a documentation problem. The numarray array() > > signature has been tortured by Numeric backward compatibility, so there > > has been more flux in it than you would expect. Anyway, the manual is > > out of date. Here's the current signature from the code: > > > > def array(sequence=None, typecode=None, copy=1, savespace=0, > > type=None, shape=None): > > > > Actually, it seems to be a difference in the way that numeric and > numarray treat the copy flag when typecode is specified. 
In numeric,
> if no change in type is requested and copy=0, then the constructor
> goes ahead and produces a view:
>
> import Numeric as nc
> test=nc.array([1,2,3],'i')
> a=nc.array(test,'i',0)
> a[0]=99
> print test
> >> [99 2 3]
>
> but makes a copy if a cast is required:
>
> test=nc.array([1,2,3],'i')
> a=nc.array(test,'F',0)
> a[0]=99
> print test
> >>> [1 2 3]
>
> Looking at numarraycore.py line 305 I see that:
>
> if type is None and typecode is None:
>     if copy:
>         a = sequence.copy()
>     else:
>         a = sequence
>
> i.e. numarray skips the check for a type match and ignores
> the copy flag, even if the type is preserved:
>
> import numarray as ny
> test=ny.array([1,2,3],'i')
> a=ny.array(test,'i',0)
> a._data is test._data
> >>> False

OK, I think I see what you're after and agree that it's a bug. Here's how I'll change the behavior:

>>> import numarray
>>> a = numarray.arange(10)
>>> b = numarray.array(a, copy=0)
>>> a is b
True
>>> b = numarray.array(a, copy=1)
>>> a is b
False

One possible point of note is that array() doesn't return views for copy=0; neither does Numeric; both return the original sequence.

Regards, Todd

From paustin at eos.ubc.ca Wed Jul 7 13:15:04 2004
From: paustin at eos.ubc.ca (Philip Austin)
Date: Wed Jul 7 13:15:04 2004
Subject: [Numpy-discussion] optional arguments to the array constructor
In-Reply-To: <1089229573.29456.544.camel@halloween.stsci.edu>
References: <16619.21771.686179.152410@gull.eos.ubc.ca> <1089213153.29456.229.camel@halloween.stsci.edu> <16620.16395.603789.28730@gull.eos.ubc.ca> <1089229573.29456.544.camel@halloween.stsci.edu>
Message-ID: <16620.22921.791432.143944@gull.eos.ubc.ca>

Todd Miller writes:
> > OK, I think I see what you're after and agree that it's a bug.
Here's
> how I'll change the behavior:
>
> >>> import numarray
> >>> a = numarray.arange(10)
> >>> b = numarray.array(a, copy=0)
> >>> a is b
> True
> >>> b = numarray.array(a, copy=1)
> >>> a is b
> False

Just to be clear -- the above is the current numarray v1.0 behavior (at least on my machine). Numeric compatibility would additionally require that

import numarray
a = numarray.arange(10)
theTypeCode=repr(a.type())
b = numarray.array(a, theTypeCode, copy=0)
print a is b
b = numarray.array(a, copy=1)
print a is b

produce

True
False

While currently it produces

True
True

Having said this, I can work around this difference -- so either a note in the documentation or just removing the copy flag from numarray.array would also be ok. -- Thanks, Phil

From paustin at eos.ubc.ca Wed Jul 7 13:17:03 2004
From: paustin at eos.ubc.ca (Philip Austin)
Date: Wed Jul 7 13:17:03 2004
Subject: [Numpy-discussion] Re: Correction -- optional arguments to the array constructor
In-Reply-To: <1089229573.29456.544.camel@halloween.stsci.edu>
References: <16619.21771.686179.152410@gull.eos.ubc.ca> <1089213153.29456.229.camel@halloween.stsci.edu> <16620.16395.603789.28730@gull.eos.ubc.ca> <1089229573.29456.544.camel@halloween.stsci.edu>
Message-ID: <16620.23066.506262.410021@gull.eos.ubc.ca>

Oops, note the change below at --->

Todd Miller writes:
> > OK, I think I see what you're after and agree that it's a bug. Here's
> how I'll change the behavior:
>
> >>> import numarray
> >>> a = numarray.arange(10)
> >>> b = numarray.array(a, copy=0)
> >>> a is b
> True
> >>> b = numarray.array(a, copy=1)
> >>> a is b
> False

Just to be clear -- the above is the current numarray v1.0 behavior (at least on my machine).
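The Numeric-style rule under discussion — copy=0 returns the input itself when no cast is needed, and copies only when one is — takes just a few lines to express. The sketch below uses the standard library array module and an invented helper name (as_array); it illustrates the requested semantics only, not numarray's actual code:

```python
from array import array

def as_array(sequence, typecode=None, copy=True):
    """Sketch of the Numeric copy rule: alias the input when it is
    already an array of the requested type, otherwise make a new one."""
    if isinstance(sequence, array) and (typecode is None
                                        or sequence.typecode == typecode):
        # No cast needed: honor copy=False by returning the input itself.
        return array(sequence.typecode, sequence) if copy else sequence
    # Not an array, or a cast was requested: a copy is unavoidable.
    return array(typecode if typecode is not None else "d", sequence)
```

With these semantics, as_array(a, a.typecode, copy=False) is an alias for a, while requesting a different typecode always yields a fresh array — the two-case behavior Philip describes for Numeric.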
Numeric compatibility would additionally require that

import numarray
a = numarray.arange(10)
theTypeCode=repr(a.type())
b = numarray.array(a, theTypeCode, copy=0)
print a is b
b = numarray.array(a, copy=1)
print a is b

produce

True
False

While currently it produces

--->
False
False

Having said this, I can work around this difference -- so either a note in the documentation or just removing the copy flag from numarray.array would also be ok. -- Thanks, Phil

From wlanger at bigpond.net.au Thu Jul 8 10:29:01 2004
From: wlanger at bigpond.net.au (Wendy Langer)
Date: Thu Jul 8 10:29:01 2004
Subject: [Numpy-discussion] "buffer not aligned on 8 byte boundary" errors when running numarray.testall.test()
Message-ID:

Hi there all :)

I am having trouble with my installation of numarray. :(

I am a python newbie and a numarray extreme-newbie, so it could be that I don't yet have the first clue what I am doing. ;)

Python 2.3.3 (#51, Feb 16 2004, 04:07:52) [MSC v.1200 32 bit (Intel)] on win32
numarray 1.0

The Python I am using is the one that comes with the "Enthought" version (www.enthought.com), a distro specifically designed to be useful for scientists, so it comes with numerical stuff and scipy and chaco and things like that preinstalled.

I used the windows binary installer. However it came with Numeric and not numarray, so I installed numarray "by hand". This seemed to go ok, and it seems that there is no problem having both Numeric and numarray in the same installation, since they have (obviously) different names (still getting used to this whole modules and namespaces &c &c).

At the bottom of this email I have pasted an example of what it was I was trying to do, and the error messages that the interpreter gave me - but before anyone bothers reading them in any detail, the essential error seems to be as follows:

error: multiply_Float64_scalar_vector: buffer not aligned on 8 byte boundary.
I have no idea what this means, but I do recall that when I ran the numarray.testall.test() procedure after first completing my installation a couple of days ago, it reported a *lot* of problems, many of which sounded quite similar to this. I hoped for the best and thought that perhaps I had "run the test wrong"(!) since numarray seemed to be working ok, and I had investigated many of the examples in chapters 3 and 4 of the user manual without any obvious problems (chapter 3 = "high level overview" and chapter 4 = "array basics").

I decided at the time to leave well enough alone until I actually came across odd or mysterious behaviour ...however that time has come all-too-soon...

The procedure I am using to run the test is as described on page 11 of the excellent user's manual (release 0.8 at http://www.pfdubois.com/numpy/numarray.pdf):

---------------------------------------------
Testing your Installation

Once you have installed numarray, test it with:

C:\numarray> python
Python 2.2.2 (#18, Dec 30 2002, 02:26:03) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> import numarray.testall as testall
>>> testall.test()
numeric: (0, 1115)
records: (0, 48)
strings: (0, 166)
objects: (0, 72)
memmap: (0, 75)

Each line in the above output indicates that 0 of X tests failed. X grows steadily with each release, so the numbers shown above may not be current.
------------------------------------------------------------------------

Anyway, when I ran this, instead of the nice, comforting output above, I got about a million(!) errors and then a final count of 320 failures. This number is not always constant - I recall the first time I ran it it was 209. [I just ran it again and this time it was 324...it all has a rather disturbing air of semi-randomness...]
So below is the (heavily snipped) output from the testall.test() run, and below that is the code where I first noticed a possibly similar error, and below *that* is the output of that code, with the highly suspicious error.... Any suggestions greatly appreciated! I can give you more info about the setup on my computer and so on if you need :)

wendy langer

======================================================================
==================================== IDLE 1.0.2 ==== No Subprocess ====
>>> import numarray.testall as testall
>>> testall.test()
*****************************************************************
Failure in example: x+y
from line #50 of first pass
Exception raised:
Traceback (most recent call last):
  File "C:\PYTHON23\lib\doctest.py", line 442, in _run_examples_inner
    compileflags, 1) in globs
  File "", line 1, in ?
  File "C:\PYTHON23\Lib\site-packages\numarray\numarraycore.py", line 733, in __add__
    return ufunc.add(self, operand)
error: Int32asFloat64: buffer not aligned on 8 byte boundary.
*****************************************************************
Failure in example: x[:] = 0.1
from line #72 of first pass
Exception raised:
Traceback (most recent call last):
  File "C:\PYTHON23\lib\doctest.py", line 442, in _run_examples_inner
    compileflags, 1) in globs
  File "", line 1, in ?
error: Float64asBool: buffer not aligned on 8 byte boundary.
*****************************************************************
Failure in example: y
from line #74 of first pass
Expected:
array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.])
Got:
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
*****************************************************************
Failure in example: x + z
from line #141 of first pass
Exception raised:
Traceback (most recent call last):
  File "C:\PYTHON23\lib\doctest.py", line 442, in _run_examples_inner
    compileflags, 1) in globs
  File "", line 1, in ?
  File "C:\PYTHON23\Lib\site-packages\numarray\numarraycore.py", line 733, in __add__
    return ufunc.add(self, operand)
error: Int32asFloat64: buffer not aligned on 8 byte boundary.
*****************************************************************
*****************************************************************
Failure in example: a2dma = average(a2dm, axis=1)
from line #812 of numarray.ma.dtest
Exception raised:
Traceback (most recent call last):
  File "C:\PYTHON23\lib\doctest.py", line 442, in _run_examples_inner
    compileflags, 1) in globs
  File "", line 1, in ?
  File "C:\PYTHON23\Lib\site-packages\numarray\ma\MA.py", line 1686, in average
    w = Numeric.choose(mask, (1.0, 0.0))
  File "C:\PYTHON23\Lib\site-packages\numarray\ufunc.py", line 1666, in choose
    return _choose(selector, population, outarr, clipmode)
  File "C:\PYTHON23\Lib\site-packages\numarray\ufunc.py", line 1573, in __call__
    result = self._doit(computation_mode, woutarr, cfunc, ufargs, 0)
  File "C:\PYTHON23\Lib\site-packages\numarray\ufunc.py", line 1558, in _doit
    blockingparameters)
error: choose8bytes: buffer not aligned on 8 byte boundary.
*****************************************************************
Failure in example: alltest(a2dma, [1.5, 4.0])
from line #813 of numarray.ma.dtest
Exception raised:
Traceback (most recent call last):
  File "C:\PYTHON23\lib\doctest.py", line 442, in _run_examples_inner
    compileflags, 1) in globs
  File "", line 1, in ?
NameError: name 'a2dma' is not defined
*****************************************************************
1 items had failures:
  320 of 671 in numarray.ma.dtest
***Test Failed*** 320 failures.
numarray.ma: (320, 671)

=========================================================================

import numarray

class anXmatrix:
    def __init__(self, stepsize = 3):
        self.stepsize = stepsize
        self.populate_matrix()

    def describe(self):
        print "I am a ", self.__class__
        print "my stepsize is", self.stepsize
        print "my matrix is: \n"
        print self.matrix

    def populate_matrix(self):
        def xvalues(i,j):
            return self.stepsize*j
        mx = numarray.fromfunction(xvalues, (4,4))
        self.matrix = mx

if __name__ == '__main__':
    print " "
    print "Making anXmatrix..."
    r = anXmatrix(stepsize = 5)
    r.describe()
    r = anXmatrix(stepsize = 0.02)
    r.describe()

============================================================================

Making anXmatrix...
I am a  __main__.anXmatrix
my stepsize is 5
my matrix is:

[[ 0  5 10 15]
 [ 0  5 10 15]
 [ 0  5 10 15]
 [ 0  5 10 15]]
Traceback (most recent call last):
  File "C:\Python23\Lib\site-packages\WendyStuff\wendycode\propagatorstuff\core_objects\domain_objects.py", line 97, in ?
    r = anXmatrix(stepsize = 0.02)
  File "C:\Python23\Lib\site-packages\WendyStuff\wendycode\propagatorstuff\core_objects\domain_objects.py", line 72, in __init__
    self.populate_matrix()
  File "C:\Python23\Lib\site-packages\WendyStuff\wendycode\propagatorstuff\core_objects\domain_objects.py", line 86, in populate_matrix
    mx = numarray.fromfunction(xvalues, (4,4))
  File "C:\PYTHON23\Lib\site-packages\numarray\generic.py", line 1094, in fromfunction
    return apply(function, tuple(indices(dimensions)))
  File "C:\Python23\Lib\site-packages\WendyStuff\wendycode\propagatorstuff\core_objects\domain_objects.py", line 84, in xvalues
    return self.stepsize*j
  File "C:\PYTHON23\Lib\site-packages\numarray\numarraycore.py", line 772, in __rmul__
    r = ufunc.multiply(operand, self)
error: multiply_Float64_scalar_vector: buffer not aligned on 8 byte boundary.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
============================================================================

"You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. Do you understand this? And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat." Albert Einstein

From Chris.Barker at noaa.gov Thu Jul 8 10:58:07 2004
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Thu Jul 8 10:58:07 2004
Subject: [Numpy-discussion] How to read data from text files fast?
In-Reply-To: <40E473A9.5040109@colorado.edu>
References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <40E470D9.8060603@noaa.gov> <40E473A9.5040109@colorado.edu>
Message-ID: <40ED8A6D.5050505@noaa.gov>

Thanks to Fernando Perez and Travis Oliphant for pointing me to:

> scipy.io.read_array

In testing, I've found that it's very slow (for my needs), though quite nifty in other ways, so I'm sure I'll find a use for it in the future.

Travis Oliphant wrote:
> Alternatively, we could move some of the Python code in read_array to
> C to improve the speed.

That was beyond me, so I wrote a very simple module in C that does what I want, and it is very much faster than read_array or a straight Python version. It has two functions:

FileScan(file)
    """
    Reads all the values in the rest of the ascii file, and produces a
    Numeric vector full of Floats (C doubles). All text in the file that
    is not part of a floating point number is skipped over.
    """

FileScanN(file, N)
    """
    Reads N values in the ascii file, and produces a Numeric vector of
    length N full of Floats (C doubles). Raises an exception if there
    are fewer than N numbers in the file. All text in the file that is
    not part of a floating point number is skipped over. After reading N
    numbers, the file is left before the next non-whitespace character
    in the file. This will often leave the file at the start of the next
    line, after scanning a line full of numbers.
    """

I implemented them separately, 'cause I wasn't sure how to deal with optional arguments in a C function. They could easily have been wrapped in a Python function if you wanted one interface. FileScan was much more complex, as I had to deal with all the dynamic memory allocation. I probably took a more complex approach to this than I had to, but it was an exercise for me, being a newbie at C. I also decided not to specify a shape for the resulting array, always returning a rank-1 array, as that made the code easier, and you can always set A.shape afterward. This could be put in a Python wrapper as well. It has the obvious limitation that it only does doubles. I'd like to add longs as well, but probably won't have a need for anything else. The way memory is these days, it seems just as easy to read the long ones, and convert afterward if you want.

Here is a quick benchmark (see below) run with a file that is 63,000 lines, with two comma-delimited numbers on each line. Run on a 1GHz P4 under Linux:

Reading with read_array
it took 16.351712 seconds to read the file with read_array
Reading with Standard Python methods
it took 2.832078 seconds to read the file with standard Python methods
Reading with FileScan
it took 0.444431 seconds to read the file with FileScan
Reading with FileScanN
it took 0.407875 seconds to read the file with FileScanN

As you can see, read_array is painfully slow for this kind of thing, straight Python is OK, and FileScan is pretty darn fast. I've enclosed the C code and setup.py, if anyone wants to take a look, and use it, or give suggestions or bug fixes or whatever, that would be great.
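As a footnote for readers without a C compiler: the "skip anything that is not part of a floating point number" interface described above is easy to approximate in pure Python with a regular expression. This is only a sketch of the described behavior (operating on a string rather than an open file, and returning a list rather than a Numeric vector), not Chris's C module:

```python
import re

# Matches ordinary decimal and exponent-notation floats, e.g. -1.5, 3e2.
_FLOAT_RE = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")

def file_scan(text):
    """Return every float found in the text, skipping all other
    characters (a rough stand-in for the C FileScan above)."""
    return [float(m) for m in _FLOAT_RE.findall(text)]

def file_scan_n(text, n):
    """Like file_scan, but return exactly n values or raise ValueError,
    mirroring FileScanN's too-few-numbers exception."""
    values = file_scan(text)
    if len(values) < n:
        raise ValueError("fewer than %d numbers in input" % n)
    return values[:n]
```

It would be far slower than the C version, of course — the point of the C module is precisely to avoid per-value Python overhead.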
In particular, I don't think I've structured the code very well, and there could be a memory leak, which I have not tested carefully for. Tested only on Linux with Python 2.3.3, Numeric 23.1. If someone wants to port it to numarray, that would be great too. -Chris

The benchmark:

def test6():
    """ Testing various IO options """
    from scipy.io import array_import
    filename = "JunkBig.txt"
    file = open(filename)
    print "Reading with read_array"
    start = time.time()
    A = array_import.read_array(file, ",")
    print "it took %f seconds to read the file with read_array"%(time.time() - start)
    file.close()

    file = open(filename)
    print "Reading with Standard Python methods"
    start = time.time()
    A = []
    for line in file:
        A.append( map( float, line.strip().split(",") ) )
    A = array(A)
    print "it took %f seconds to read the file with standard Python methods"%(time.time() - start)
    file.close()

    file = open(filename)
    print "Reading with FileScan"
    start = time.time()
    A = FileScanner.FileScan(file)
    A.shape = (-1,2)
    print "it took %f seconds to read the file with FileScan"%(time.time() - start)
    file.close()

    file = open(filename)
    print "Reading with FileScanN"
    start = time.time()
    A = FileScanner.FileScanN(file, product(A.shape) )
    A.shape = (-1,2)
    print "it took %f seconds to read the file with FileScanN"%(time.time() - start)

-- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: FileScan_module.c URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: setup.py URL: From jmiller at stsci.edu Thu Jul 8 12:05:02 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 8 12:05:02 2004 Subject: [Numpy-discussion] "buffer not aligned on 8 byte boundary" errors when running numarray.testall.test() In-Reply-To: References: Message-ID: <1089313446.2639.55.camel@halloween.stsci.edu> On Thu, 2004-07-08 at 13:28, Wendy Langer wrote: > Hi there all :) > > I am having trouble with my installation of numarray. :( > > I am a python newbie and a numarray extreme-newbie, so it could be that I > don't yet have the first clue what I am doing. ;) > > > > Python 2.3.3 (#51, Feb 16 2004, 04:07:52) [MSC v.1200 32 bit (Intel)] on > win32 > numarray 1.0 > > > The Python I am using is the one that comes with the "Enthought" version > (www.enthought.com), a distro specifically designed to be useful for > scientists, so it comes with numerical stuff and scipy and chaco and things > like that preinstalled. > > I used the windows binary installer. However it came with Numeric and not > numarray, so I installed numarray "by hand". This seemed to go ok, and it > seems that there is no problem having both Numeric and numarray in the same > installation, since they have (obviously) different names (still getting > used to this whole modules and namespaces &c &c) I don't normally use SciPy, but I normally have both numarray and Numeric installed so there's no inherent conflict there. > At the bottom of this email I have pasted an example of what it was I was > trying to do, and the error messages that the interpreter gave me - but > before anyone bothers reading them in any detail, the essential error seems > to be as follows: > > error: multiply_Float64_scalar_vector: buffer not aligned on 8 byte > boundary. This is a low level exception triggered by a misaligned data buffer. It's low level so it's impossible to tell what the real problem is without more information. 
> I have no idea what this means, but I do recall that when I ran the > numarray.testall.test() procedure after first completing my installation a > couple of days ago, it reported a *lot* of problems, many of which sounded > quite similar to this. That sounds pretty bad. Here's roughly how it should look these days: % python >>> import numarray.testall as testall >>> testall.test() numarray: ((0, 1165), (0, 1165)) numarray.records: (0, 48) numarray.strings: (0, 176) numarray.memmap: (0, 82) numarray.objects: (0, 105) numarray.memorytest: (0, 16) numarray.examples.convolve: ((0, 20), (0, 20), (0, 20), (0, 20)) numarray.convolve: (0, 52) numarray.fft: (0, 75) numarray.linear_algebra: ((0, 46), (0, 51)) numarray.image: (0, 27) numarray.nd_image: (0, 390) numarray.random_array: (0, 53) numarray.ma: (0, 671) The tuple results for your test should all have leading zeros as above. The number of tests varies from release to release. > I hoped for the best and thought that perhaps I had "run the test wrong"(!) > since numarray seemed to be working ok, and I had investigated many of the > examples in chapters 3 and 4 of the user manual withour any obvious problems > (chapter 3 = "high level overview" and chapter 4 = "array basics") > > I decided at the time to leave well enough alone until I actually came > across odd or mysterious behaviour ...however that time has come > all-too-soon... > > > > > The procedure I am using to run the test is as described on page 11 of the > excellent user's manual (release 0.8 at > http://www.pfdubois.com/numpy/numarray.pdf): There's an updated manual here: http://prdownloads.sourceforge.net/numpy/numarray-1.0.pdf?download > -- > Testing your Installation > Once you have installed numarray, test it with: > C:\numarray> python > Python 2.2.2 (#18, Dec 30 2002, 02:26:03) [MSC 32 bit (Intel)] on win32 > Type "copyright", "credits" or "license" for more information. 
> >>> import numarray.testall as testall > >>> testall.test() > numeric: (0, 1115) > records: (0, 48) > strings: (0, 166) > objects: (0, 72) > memmap: (0, 75) > Each line in the above output indicates that 0 of X tests failed. X grows > steadily with each release, so the numbers > shown above may not be current. > -- > > Anyway, when I ran this, instead of the nice, comforting output above, I > got about a million(!) errors and then a final count of 320 failures. This > number is not always constant - I recall the first time I ran it it was 209. > [I just ran it again and this time it was 324...it all has a rather > disturbing air of semi-randomness...] > > > So below is the (heavily snipped) output from the testall.test() run, and > below that is the code where I first noticed a possibly similar error, and > below *that* is the output of that code, with the highly suspicous > error.... > > > Any suggestions greatly appreciated! If you've ever had numarray installed before, go to your site-packages directory and delete numarray as well as any numarray.pth. Then reinstall numarray-1.0. Also, just do: >>> import numarray >>> numarray and see what kind of path is involved getting to the numarray module. > I can give you more info about the setup on my computer and so on if you > need :) I think you already included everything important; the exact variant of Windows you're using might be helpful; I'm not aware of any problems there though. It looks like you're on a well supported platform. I just tested pretty much the same configuration on Windows 2000 Pro, with Python-2.3.4, and it worked fine even with SciPy-0.3. > wendy langer > > > ====================================================================== > > There's something hugely wrong with your test output. I've never seen anything like it other than during development. 
> > > =========================================================================
> >
> > import numarray
> >
> > class anXmatrix:
> >     def __init__(self, stepsize = 3):
> >         self.stepsize = stepsize
> >         self.populate_matrix()
> >
> >     def describe(self):
> >         print "I am a ", self.__class__
> >         print "my stepsize is", self.stepsize
> >         print "my matrix is: \n"
> >         print self.matrix
> >
> >     def populate_matrix(self):
> >         def xvalues(i,j):
> >             return self.stepsize*j
> >         mx = numarray.fromfunction(xvalues, (4,4))
> >         self.matrix = mx
> >
> > if __name__ == '__main__':
> >     print " "
> >     print "Making anXmatrix..."
> >     r = anXmatrix(stepsize = 5)
> >     r.describe()
> >     r = anXmatrix(stepsize = 0.02)
> >     r.describe()
> >
> > ============================================================================

Here's what I get when I run your code, windows or linux:

Making anXmatrix...
I am a  __main__.anXmatrix
my stepsize is 5
my matrix is:

[[ 0  5 10 15]
 [ 0  5 10 15]
 [ 0  5 10 15]
 [ 0  5 10 15]]
I am a  __main__.anXmatrix
my stepsize is 0.02
my matrix is:

[[ 0.    0.02  0.04  0.06]
 [ 0.    0.02  0.04  0.06]
 [ 0.    0.02  0.04  0.06]
 [ 0.    0.02  0.04  0.06]]

Regards, Todd

From Fernando.Perez at colorado.edu Thu Jul 8 12:25:07 2004 From: Fernando.Perez at colorado.edu (Fernando.Perez at colorado.edu) Date: Thu Jul 8 12:25:07 2004 Subject: [Numpy-discussion] How to read data from text files fast?
In-Reply-To: <40ED8A6D.5050505@noaa.gov> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <40E470D9.8060603@noaa.gov> <40E473A9.5040109@colorado.edu> <40ED8A6D.5050505@noaa.gov> Message-ID: <1089314664.40ed9f68e1db5@webmail.colorado.edu> Quoting Chris Barker : > Thanks to Fernando Perez and Travis Oliphant for pointing me to: > > > scipy.io.read_array > > In testing, I've found that it's very slow (for my needs), though quite > nifty in other ways, so I'm sure I'll find a use for it in the future. Just a quick note Travis sent to me privately: he suggested using io.numpyio.fread instead of Numeric.fromstring() for speed reasons. I don't know if it will help in your case, I just mention it in case it helps. Cheers, F From Chris.Barker at noaa.gov Thu Jul 8 12:41:06 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jul 8 12:41:06 2004 Subject: [Numpy-discussion] How to read data from text files fast? In-Reply-To: <1089314664.40ed9f68e1db5@webmail.colorado.edu> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <40E470D9.8060603@noaa.gov> <40E473A9.5040109@colorado.edu> <40ED8A6D.5050505@noaa.gov> <1089314664.40ed9f68e1db5@webmail.colorado.edu> Message-ID: <40EDA2A8.9030300@noaa.gov> Fernando.Perez at colorado.edu wrote: \> Just a quick note Travis sent to me privately: he suggested using > io.numpyio.fread instead of Numeric.fromstring() for speed reasons. I don't > know if it will help in your case, I just mention it in case it helps. 
Thanks, but those are for binary files, which I have to do sometimes, so I'll keep it in mind. However, my problem at hand is text files, and my solution is working nicely, though I'd love a pair of more experienced eyes on the code.... -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From Chris.Barker at noaa.gov Thu Jul 8 13:50:03 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jul 8 13:50:03 2004 Subject: [Numpy-discussion] How to read data from text files fast? In-Reply-To: <004c01c46524$ab808090$ebeca782@stsci.edu> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <40E470D9.8060603@noaa.gov> <40E473A9.5040109@colorado.edu> <40ED8A6D.5050505@noaa.gov> <1089314664.40ed9f68e1db5@webmail.colorado.edu> <40EDA2A8.9030300@noaa.gov> <004c01c46524$ab808090$ebeca782@stsci.edu> Message-ID: <40EDB2BD.4080809@noaa.gov> Todd Miller wrote:

> I looked this over to see how hard it would be to port to numarray. At
> first glance, it looks easy. I didn't really read it closely enough to
> pick up bugs, but what I saw looks good. One thing I did notice was a
> calloc of temporary data space. That seemed like a possible waste: can't
> you just preallocate the array and read your data directly into it?

The short answer is that I'm not very smart! The longer answer is that this is because at first I misunderstood what PyArray_FromDimsAndData was for. For ScanFileN, I'll re-do it as you suggest. For ScanFile, it is unknown at the beginning how big the final array is, and I did a scheme that would allocate the memory as it went, in reasonable sized chunks.
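The grow-in-chunks scheme can be sketched in Python to show the shape of the logic. This is a toy model only, an assumption on my part; the real code is C, allocating fixed-size blocks and making one final copy into a Numeric array:

```python
def scan_growing(f, blocksize=1024):
    """Collect doubles into fixed-size chunks, then join them at the end.
    Mirrors the C strategy: start a new block whenever the current one
    fills up, and make one final copy into a right-sized result."""
    chunks, current = [], []
    for tok in f.read().split():
        try:
            val = float(tok)
        except ValueError:
            continue                # skip non-numeric text
        current.append(val)
        if len(current) == blocksize:
            chunks.append(current)  # block is full, start another
            current = []
    if current:
        chunks.append(current)
    # the "full copy" Chris mentions: join the blocks into one flat list
    return [v for chunk in chunks for v in chunk]
```

The final join is the double copy being discussed: until the temporary blocks are freed, the data exists twice.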
However, this does require a full copy, which is a problem. Since posting, I thought of a MUCH easier scheme:

scan the file, without storing the data, to see how many numbers there are.
rewind the file
allocate the Array
Read the data.

This requires scanning the file twice, which would cost, but would be easier, and prevent an unnecessary copy of the data. I hope I'll get a chance to try it out and see what the performance is like. In the meantime, anyone else have any thoughts? By the way, does it matter whether I use malloc or calloc? I can't really tell the difference from K&R.

> This is
> probably a very minor speed issue, but might be a significant storage issue
> as people are starting to max out 32-bit systems.

yup. This is all pointless if it's not a lot of data, after all. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From Chris.Barker at noaa.gov Thu Jul 8 16:21:16 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jul 8 16:21:16 2004 Subject: [Numpy-discussion] How to read data from text files fast? In-Reply-To: <40EDB2BD.4080809@noaa.gov> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <40E470D9.8060603@noaa.gov> <40E473A9.5040109@colorado.edu> <40ED8A6D.5050505@noaa.gov> <1089314664.40ed9f68e1db5@webmail.colorado.edu> <40EDA2A8.9030300@noaa.gov> <004c01c46524$ab808090$ebeca782@stsci.edu> <40EDB2BD.4080809@noaa.gov> Message-ID: <40EDD64A.1060508@noaa.gov> Chris Barker wrote:

>> can't
>> you just preallocate the array and read your data directly into it?
>
> The short answer is that I'm not very smart!
The longer answer is that

> this is because at first I misunderstood what PyArray_FromDimsAndData
> was for. For ScanFileN, I'll re-do it as you suggest.

I've re-done it. Now I don't double allocate storage for ScanFileN. There was no noticeable difference in performance, but why use memory you don't have to? For ScanFile, it is unknown at the beginning how big the final array is, so I now have two versions. One is what I had before: it allocates memory in blocks of some Buffersize as it reads the file (now set to 1024 elements). Once it's all read in, it creates an appropriate size PyArray, and copies the data to it. This results in a double copy of all the data until the temporary memory is freed. I now also have a ScanFile2, which scans the whole file first, then creates a PyArray, and re-reads the file to fill it up. This version takes about twice as long, confirming my expectation that the time to allocate and copy data is tiny compared to reading and parsing the file. Here's a simple benchmark:

Reading with Standard Python methods
(62936, 2)
it took 2.824013 seconds to read the file with standard Python methods
Reading with FileScan
(62936, 2)
it took 0.400936 seconds to read the file with FileScan
Reading with FileScan2
(62936, 2)
it took 0.752649 seconds to read the file with FileScan2
Reading with FileScanN
(62936, 2)
it took 0.441714 seconds to read the file with FileScanN

So it takes twice as long to count the numbers first, but it's still three times as fast as just doing all this with Python. However, I usually don't think it's worth all this effort for a 3 times speed up, and I tend to make copies of my arrays all over the place with NumPy anyway, so I'm inclined to stick with the first method. Also, if you are really that tight on memory, you could always read it in chunks with ScanFileN. Any feedback anyone wants to give is very welcome. -Chris -- Christopher Barker, Ph.D.
Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: FileScan_module.c URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: setup.py URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: TestFileScan.py URL:

From falted at pytables.org Fri Jul 9 03:55:03 2004 From: falted at pytables.org (Francesc Alted) Date: Fri Jul 9 03:55:03 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion Message-ID: <200407091254.06579.falted@pytables.org> Hi, Since Perry said not too long ago that the numarray crew would ask for suggestions for RecArray improvements, I'm going to suggest a couple. I find quite inconvenient the .tolist() method when applied to RecArray objects as it is now:

>>> r[2:4]
array(
[(3, 33.0, 'c'),
(4, 44.0, 'd')],
formats=['1UInt8', '1Float32', '1a1'],
shape=2,
names=['c1', 'c2', 'c3'])
>>> r[2:4].tolist()
[<numarray.records.Record instance at ...>, <numarray.records.Record instance at ...>]

The suggested behaviour would be:

>>> r[2:4].tolist()
[(3, 33.0, 'c'), (4, 44.0, 'd')]

Another thing is that an element of a recarray would be returned as a tuple instead of as a records.Record object:

>>> r[2]
<numarray.records.Record instance at ...>

The suggested behaviour would be:

>>> r[2]
(3, 33.0, 'c')

I think the latter would be consistent with the convention that a __getitem__(int) of a NumArray object returns a Python type instead of a rank-0 array. In the same way, a __getitem__(int) of a RecArray should return a Python type (a tuple in this case). Below is the code that I use right now to simulate this behaviour, but it would be nice if this code were included in the numarray.records module.
def tolist(arr):
    """Converts a RecArray or Record to a list of rows"""
    outlist = []
    if isinstance(arr, records.Record):
        for i in range(arr.array._nfields):
            outlist.append(arr.array.field(i)[arr.row])
        outlist = tuple(outlist)  # return a tuple for records
    elif isinstance(arr, records.RecArray):
        for j in range(arr.nelements()):
            tmplist = []
            for i in range(arr._nfields):
                tmplist.append(arr.field(i)[j])
            outlist.append(tuple(tmplist))
    return outlist

Cheers, -- Francesc Alted

From thomas_karlsson_569 at hotmail.com Fri Jul 9 08:02:44 2004 From: thomas_karlsson_569 at hotmail.com (Thomas Karlsson) Date: Fri Jul 9 08:02:44 2004 Subject: [Numpy-discussion] Numpy compiling error... Help! Message-ID: Hi, I'm trying to compile/install numpy on a RH9 machine. When doing so I run into problems. I give the command: python setup.py install and get a long answer, with this error at the end:

gcc -shared build/temp.linux-i686-2.2/lapack_litemodule.o -L/usr/lib/atlas -llapack -lcblas -lf77blas -latlas -lg2c -o build/lib.linux-i686-2.2/lapack_lite.so
/usr/bin/ld: cannot find -llapack
collect2: ld returned 1 exit status
error: command 'gcc' failed with exit status 1

Does anyone know what I've done wrong? I've spent a lot of time on this and really need help now... Regards Thomas

From Chris.Barker at noaa.gov Fri Jul 9 09:44:12 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Jul 9 09:44:12 2004 Subject: [Numpy-discussion] How to read data from text files fast? In-Reply-To: <3afee4a2.5cf5a1c3.8234000@expms6.cites.uiuc.edu> References: <3afee4a2.5cf5a1c3.8234000@expms6.cites.uiuc.edu> Message-ID: <40EECAB8.3050900@noaa.gov> Bruce, Thanks for your feedback.
Bruce Southey wrote:

> While I am not really following your thread, I just wanted to comment that the
> Python Cookbook (at least the printed version) has some ways to count lines in a
> file - assuming that the number of lines provides the size.

The number of lines does not necessarily provide the size. In the general case, it doesn't at all. My whole goal here is the general case: being able to read a bunch of numbers out of any format of text file. This can be used as part of a parser for many file formats. If I were shooting for just one format, this would be easier, but not general purpose. Now that I have this, I can write a number of file format parsers in Python with improved performance and easier syntax.

> Under Unix (but not windows), ...

I am aiming for a portable solution.

> Alternatively if sufficient memory is available, storing the file in memory
> (during the counting of elements) should always be faster than reading it a
> second time from the hard disk.

The primary reason to scan the file ahead of time to count the elements is to save the memory of duplicate copies of data. The other reason is to make memory management easier, but since I've already solved that problem, I'm done. thanks, -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From perry at stsci.edu Mon Jul 12 14:15:01 2004 From: perry at stsci.edu (Perry Greenfield) Date: Mon Jul 12 14:15:01 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: <200407091254.06579.falted@pytables.org> Message-ID: Francesc Alted wrote: > > As Perry said not too long ago that numarray crew would ask for > suggestions > for RecArray improvements, I'm going to suggest a couple.
> > I find quite inconvenient the .tolist() method when applied to RecArray > objects as it is now: > > >>> r[2:4] > array( > [(3, 33.0, 'c'), > (4, 44.0, 'd')], > formats=['1UInt8', '1Float32', '1a1'], > shape=2, > names=['c1', 'c2', 'c3']) > >>> r[2:4].tolist() > [, > ] > > > The suggested behaviour would be: > > >>> r[2:4].tolist() > [(3, 33.0, 'c'),(4, 44.0, 'd')] > > Another thing is that an element of recarray would be returned as a tuple > instead as a records.Record object: > > >>> r[2] > > > The suggested behaviour would be: > > >>> r[2] > (3, 33.0, 'c') > > I think the latter would be consistent with the convention that a > __getitem__(int) of a NumArray object returns a python type instead of a > rank-0 array. In the same way, a __getitem__(int) of a RecArray should > return a a python type (a tuple in this case). > These are good examples of where improvements are needed (we are also looking at how best to handle multidimensional arrays and should have a proposal this week). What I'm wondering about is what a single element of a record array should be. Returning a tuple has an undeniable simplicity to it. On the other hand, we've been using recarrays that allow naming the various columns (which we refer to as "fields"). If one can refer to fields of a recarray, shouldn't one be able to refer to a field (by name) of one of it's elements? Or are you proposing that basic recarrays not have that sort of capability (something added by a subclass)? Perry From rowen at u.washington.edu Mon Jul 12 16:09:00 2004 From: rowen at u.washington.edu (Russell E Owen) Date: Mon Jul 12 16:09:00 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: References: Message-ID: At 5:14 PM -0400 2004-07-12, Perry Greenfield wrote: >What I'm wondering about is what a single element of a record array >should be. Returning a tuple has an undeniable simplicity to it. 
>On the other hand, we've been using recarrays that allow naming the
>various columns (which we refer to as "fields"). If one can refer
>to fields of a recarray, shouldn't one be able to refer to a field
>(by name) of one of its elements? Or are you proposing that basic
>recarrays not have that sort of capability (something added by a
>subclass)?

In my opinion, a single item of a record array should be a RecordItem object that is a dictionary that keeps items in field order. Thus:

- use the standard dictionary interface to deal with values by name (except that the keys are always in the correct order).
- one can also get and set all the data at once as a tuple. This is NOT a standard dictionary interface, but is essential. Functions such as getvalues(), setvalues(dataTuple) should do it.

Adopting the full dictionary interface means one gets a standard, mature and fairly complete set of features. ALSO, a RecordItem object can then be used wherever a dictionary object is needed. I suspect it's also useful to have named field access: RecordItem.fieldname, but am a bit reluctant to suggest so many different ways of getting to the data. I assume it will continue to be easy to get all data for a field by naming the appropriate field. That's a really nice feature. It would be even better if a masked array could be used, but I have no idea how hard this would be. Which brings up a side issue: any hope of integrating masked arrays into numarray, such that they could be used wherever a numarray array could be used? Areas where I particularly find myself needing them include nd_image filtering and writing C extensions. -- Russell P.S. I submitted several feature requests and bug reports for records on sourceforge months ago. I hope they'll not be overlooked during the review process.
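Russell's RecordItem idea can be sketched as a small class. The names RecordItem, getvalues() and setvalues() come from his proposal; everything else here is an illustrative assumption, not numarray code:

```python
class RecordItem:
    """Dict-like view of one row of a record array: field names map to
    values, keys stay in field order, and the whole row can be read or
    written at once as a tuple."""

    def __init__(self, names, values):
        self._names = list(names)              # fixed field order
        self._data = dict(zip(self._names, values))

    def __getitem__(self, name):
        return self._data[name]

    def __setitem__(self, name, value):
        if name not in self._data:
            raise KeyError(name)               # fields are fixed, no new keys
        self._data[name] = value

    def keys(self):
        return list(self._names)               # always in field order

    def getvalues(self):
        return tuple(self._data[n] for n in self._names)

    def setvalues(self, values):
        for n, v in zip(self._names, values):
            self._data[n] = v
```

The point of keeping keys in field order is that getvalues() round-trips cleanly with the underlying row tuple.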
From falted at pytables.org Tue Jul 13 01:30:55 2004 From: falted at pytables.org (Francesc Alted) Date: Tue Jul 13 01:30:55 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: References: Message-ID: <200407131028.04791.falted@pytables.org> On Monday 12 July 2004 23:14, Perry Greenfield wrote:

> What I'm wondering about is what a single element of a record array
> should be. Returning a tuple has an undeniable simplicity to it.

Yeah, this is why I'm strongly biased toward this possibility.

> On the other hand, we've been using recarrays that allow naming the
> various columns (which we refer to as "fields"). If one can refer
> to fields of a recarray, shouldn't one be able to refer to a field
> (by name) of one of its elements? Or are you proposing that basic
> recarrays not have that sort of capability (something added by a
> subclass)?

Well, I'm not sure about that. But just in case most people would like to access records by field as well as by index, I would advocate for the possibility that the Record instances behave as similarly as possible to a tuple (or dictionary?). That includes creating appropriate __str__() *and* __repr__() methods as well as a __getitem__() that supports both field names and indices. I'm not sure about whether providing a __getattr__() method would be ok, but for the sake of simplicity, and in order to have (preferably) only one way to do things, I would say no.
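A Record that "behaves as similarly as possible to a tuple" while still honouring field names could look roughly like this. This is only an illustration of the proposal being discussed, not the numarray.records implementation; the class body is an assumption:

```python
class Record:
    """One row of a record array: prints and indexes like a tuple,
    but fields can also be fetched by name."""

    def __init__(self, names, values):
        self._names = tuple(names)
        self._values = tuple(values)

    def __getitem__(self, key):
        if isinstance(key, str):                     # field name
            return self._values[self._names.index(key)]
        return self._values[key]                     # plain index or slice

    def __repr__(self):                              # look like a tuple
        return repr(self._values)

    __str__ = __repr__
```

With this, r[2]["c1"] and r[2][0] both work, which is exactly the dual access being debated in the thread.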
Regards, -- Francesc Alted

From falted at pytables.org Tue Jul 13 02:07:00 2004 From: falted at pytables.org (Francesc Alted) Date: Tue Jul 13 02:07:00 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: <200407131028.04791.falted@pytables.org> References: <200407131028.04791.falted@pytables.org> Message-ID: <200407131106.19557.falted@pytables.org> On Tuesday 13 July 2004 10:28, Francesc Alted wrote:

> On Monday 12 July 2004 23:14, Perry Greenfield wrote:
> > What I'm wondering about is what a single element of a record array
> > should be. Returning a tuple has an undeniable simplicity to it.
>
> Yeah, this is why I'm strongly biased toward this possibility.
>
> > On the other hand, we've been using recarrays that allow naming the
> > various columns (which we refer to as "fields"). If one can refer
> > to fields of a recarray, shouldn't one be able to refer to a field
> > (by name) of one of its elements? Or are you proposing that basic
> > recarrays not have that sort of capability (something added by a
> > subclass)?
>
> Well, I'm not sure about that. But just in case most people would like to
> access records by field as well as by index, I would advocate for the
> possibility that the Record instances behave as similarly as possible to
> a tuple (or dictionary?). That includes creating appropriate __str__() *and*
> __repr__() methods as well as a __getitem__() that supports both field names
> and indices. I'm not sure about whether providing a __getattr__() method
> would be ok, but for the sake of simplicity, and in order to have (preferably)
> only one way to do things, I would say no.

I've been thinking that one way to make returning a tuple for a single element of a RecArray compatible with still being able to retrieve a field by name is to play with RecArray.__getitem__ and let it support key names in addition to indices.
This is better seen with an example. Right now, one can say:

>>> r=records.array([(1,"asds", 24.),(2,"pwdw", 48.)], "1i4,1a4,1f8")
>>> r._fields["c1"]
array([1, 2])
>>> r._fields["c1"][1]
2

What I propose is to be able to say:

>>> r["c1"]
array([1, 2])
>>> r["c1"][1]
2

Which would replace the notation:

>>> r[1]["c1"]
2

which was recently suggested. I.e. the suggestion is to realize RecArrays as a collection of columns, as well as a collection of rows. -- Francesc Alted

From falted at pytables.org Tue Jul 13 02:13:03 2004 From: falted at pytables.org (Francesc Alted) Date: Tue Jul 13 02:13:03 2004 Subject: [Numpy-discussion] PyTables 0.8.1 released Message-ID: <200407131112.15345.falted@pytables.org> PyTables is a hierarchical database package designed to efficiently manage very large amounts of data. PyTables is built on top of the HDF5 library and the numarray package. It features an object-oriented interface that, combined with natural naming and C-code generated from Pyrex sources, makes it a fast, yet extremely easy-to-use tool for interactively saving and retrieving different kinds of datasets. It also provides flexible indexed access on disk to anywhere in the data. The primary purpose of this release is to incorporate updates related to the newly released numarray 1.0. I've taken the opportunity to backport some improvements added in PyTables 0.9 (in alpha stage) as well as to fix the known problems.

Improvements:

- The logic for computing the buffer sizes has been revamped. As a consequence, the performance of writing/reading tables with large record sizes has improved by a factor of ten or more, now exceeding 70 MB/s for writing and 130 MB/s for reading (using compression).
- The maximum record size for tables has been raised to 512 KB (before it was 8 KB, due to some internal limitations).
- Documentation has been improved in many minor details.
As a result of a fix in the underlying documentation system (tbook), chapters now start at odd pages, instead of even. So those of you who want to print double sided will probably have better luck now when aligning pages ;). Another one is that the HTML documentation has improved its look as well.

Bug Fixes:

- Indexing of Arrays with list or tuple flavors (#968131). When retrieving single elements from an array with 'List' or 'Tuple' flavors, an error occurred. This has been corrected and now you can retrieve fileh.root.array[2] without problems for 'List' or 'Tuple' flavored (E, VL)Arrays.
- Iterators on Arrays with list or tuple flavors fail (#968132). When using iterators with Array objects with 'List' or 'Tuple' flavors, an error occurred. This has been corrected.
- Last Index (-1) of Arrays doesn't work (#968149). When accessing the last element in an Array using the notation -1, an empty list (or tuple or array) was returned instead of the proper value. This happened in general with all negative indices. Fixed.
- Table.read(flavor="List") should return pure lists (#972534). However, it used to return a pointer to numarray.records.Record instances, as in:

>>> fileh.root.table.read(1,2,flavor="List")
[<numarray.records.Record instance at ...>]
>>> fileh.root.table.read(1,3,flavor="List")
[<numarray.records.Record instance at ...>, <numarray.records.Record instance at ...>]

Now the next records are returned:

>>> fileh.root.table.read(1,2, flavor="List")
[(' ', 1, 1.0)]
>>> fileh.root.table.read(1,3, flavor="List")
[(' ', 1, 1.0), (' ', 2, 2.0)]

In addition, when reading a single row of a table, a numarray.records.Record pointer was returned:

>>> fileh.root.table[1]
<numarray.records.Record instance at ...>

Now, it returns a tuple:

>>> fileh.root.table[1]
(' ', 1, 1.0)

Which I think is more consistent, and more Pythonic.

- Copy of leaves fails... (#973370). Attempting to copy leaves (Table or Array with different flavors) on top of themselves caused an internal error in PyTables. This has been corrected by silently avoiding the copy and returning the original Leaf as a result.
Minor changes:

- When assigning a value to a non-existing field in a table row, a KeyError is now raised, instead of the AttributeError that was issued before. I think this is more consistent with the type of error.

- Tests have been improved so as to pass the whole suite when compiled in 64-bit mode on a Linux/PowerPC machine (namely a dual-G5 Powermac running a 64-bit 2.6.4 Linux kernel and the preview YDL distribution for G5, with a 64-bit GCC toolchain). Thanks to Ciro Cattuto for testing and reporting the modifications that were needed.

Where can PyTables be applied?
------------------------------

PyTables is not designed to work as a relational database competitor, but rather as a teammate. If you want to work with large datasets of multidimensional data (for example, for multidimensional analysis), or just provide a categorized structure for some portions of your cluttered RDBS, then give PyTables a try. It works well for storing data from data acquisition systems (DAS), simulation software, network data monitoring systems (for example, traffic measurements of IP packets on routers), very large XML files, or for creating a centralized repository for system logs, to name only a few possible uses.

What is a table?
----------------

A table is defined as a collection of records whose values are stored in fixed-length fields. All records have the same structure and all values in each field have the same data type. The terms "fixed-length" and "strict data types" may seem quite strange requirements for a language like Python that supports dynamic data types, but they serve a useful function if the goal is to save very large quantities of data (such as that generated by many scientific applications) in an efficient manner that reduces demand on CPU time and I/O resources.

What is HDF5?
-------------

For those people who know nothing about HDF5, it is a general purpose library and file format for storing scientific data made at NCSA.
HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic constructs, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids. You can also mix and match them in HDF5 files according to your needs.

Platforms
---------

I'm using Linux (Intel 32-bit) as the main development platform, but PyTables should be easy to compile/install on many other UNIX machines. This package has also passed all the tests on an UltraSparc platform with Solaris 7 and Solaris 8. It also compiles and passes all the tests on an SGI Origin2000 with MIPS R12000 processors, with the MIPSPro compiler and running IRIX 6.5. It also runs fine on Linux 64-bit platforms, like an AMD Opteron running SuSE Linux Enterprise Server or a PowerPC G5 with Linux 2.6.x in 64-bit mode. It has also been tested on MacOS X (10.2, but it should work on newer versions as well). Regarding Windows platforms, PyTables has been tested with Windows 2000 and Windows XP (using the Microsoft Visual C compiler), but it should work with other flavors as well.

An example?
-----------

For online code examples, have a look at

http://pytables.sourceforge.net/html/tut/tutorial1-1.html

and, for the newly introduced Variable Length Arrays:

http://pytables.sourceforge.net/html/tut/vlarray2.html

Web site
--------

Go to the PyTables web site for more details:

http://pytables.sourceforge.net/

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may have.

Enjoy!
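The fixed-length record model described under "What is a table?" above can be sketched with nothing but Python's standard-library struct module. The three-field layout below is hypothetical, chosen only for illustration, and is not PyTables' actual on-disk format:

```python
import struct

# Hypothetical fixed-length record: a 4-byte int id, an 8-byte name
# field, and an 8-byte float value (little-endian, no padding).
record_fmt = "<i8sd"
record_size = struct.calcsize(record_fmt)     # every record is 20 bytes

rows = [(1, b"asds", 24.0), (2, b"pwdw", 48.0)]
buf = b"".join(struct.pack(record_fmt, i, n, v) for i, n, v in rows)
assert len(buf) == 2 * record_size

# Because every record has the same size, row k can be read directly
# at offset k * record_size without parsing the rows before it --
# the property that makes fixed-length tables cheap to index on disk.
row = struct.unpack_from(record_fmt, buf, 1 * record_size)
assert row[0] == 2 and row[2] == 48.0
```

Strings shorter than the field width are zero-padded by struct.pack, so every record stays exactly record_size bytes.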
-- Francesc Alted

From jmiller at stsci.edu Tue Jul 13 10:42:04 2004
From: jmiller at stsci.edu (Todd Miller)
Date: Tue Jul 13 10:42:04 2004
Subject: [Numpy-discussion] numarray-1.0 Bug Alert
Message-ID: <1089740511.9509.372.camel@halloween.stsci.edu>

Overview

There is a bug in numarray's Numeric compatible C-API. The bug has been latent for a long time, since numarray-0.3 was released roughly two years ago. It is serious because it results in wrong answers for certain extension functions fed a certain class of arrays.

What's affected

The bug affects numarray's add-on packages or third party extension functions which use the Numeric compatibility C-API. Generally, this means C code that was either ported from Numeric or was written with both Numeric and numarray in mind. This includes the add-on packages numarray.linear_algebra, numarray.fft, numarray.random_array, and numarray.mlab. More recently, it includes the ports of core Numeric functions to numarray.numeric. Because numarray.ma uses numarray.numeric, the bug also affects numarray.ma. Finally, for numarray-1.0 this bug affects the functions numarray.argmin and numarray.argmax; these should be the only two functions in core numarray which are affected.

Detailed Bug Description

The bug is exposed by calling an extension function (written using the Numeric compatible C-API) with an array that has a non-zero _byteoffset attribute. Arrays with non-zero _byteoffset are typically created as a result of partially indexing higher dimensional arrays or slicing arrays. Partially indexing or slicing an array generally results in a sub-array, a view which often refers to an interior region of the original array buffer. Because numarray's PyArrayObject does not currently include its ->byteoffset in its ->data pointer as the Numeric compatibility API assumes it does, an extension function sees the base region of the original array rather than the region belonging to the sub-array.
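The view-with-byte-offset situation described above can be sketched with modern NumPy, used here purely as a stand-in for numarray (numarray itself is long unmaintained): partial indexing yields a view whose data pointer sits at a non-zero byte offset into the base buffer, which is exactly the offset the Numeric compatibility layer was not accounting for.

```python
import numpy as np

# Sketch (NumPy as a stand-in for numarray): partially indexing an
# array yields a view whose data pointer points into the interior of
# the original buffer rather than at its start.
a = np.arange(64, dtype=np.int32)
a.shape = (8, 8)          # reshape in place; a still owns its buffer

sub = a[2:]               # partial index -> view into a's buffer
assert sub.base is a      # shares memory with the original array

# The view's data pointer starts 2 rows into a's buffer:
# a byte offset of 2 rows * 8 columns * 4 bytes = 64 bytes.
off = sub.__array_interface__['data'][0] - a.__array_interface__['data'][0]
assert off == 2 * 8 * a.itemsize == 64
```

An extension that reads from the base pointer while ignoring this offset would see the first rows of `a` instead of the rows belonging to `sub`, which is the wrong-answer failure mode described above.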
Immediate User Workaround

A simple user-level workaround for people who need to use the affected packages and functions today is one like the following:

def make_safe_for_numeric_api(a):
    a = numarray.asarray(a)
    if a._byteoffset != 0:
        return a.copy()
    else:
        return a

The array inputs to an affected extension function need to be wrapped with calls to make_safe_for_numeric_api(). Since this is intrusive and a real fix should be released in the near future, this approach is not recommended.

Long Term Fix

The real fix for the bug appears to be to redefine the semantics of numarray's PyArrayObject ->data pointer to include ->byteoffset, altering the C-API. This should make most existing Numeric compatible extension functions work without modification or recompilation, but will necessitate the re-compilation of some extension functions written using the native numarray API approaches (the NA_* functions and macros). This recompilation will be required because key macros will change, most notably NA_OFFSETDATA. This fix is not the only possible one, and other suggestions are welcome, but changing the semantics of ->data appears to be the best way to facilitate numarray/Numeric interoperability. By doing this fix, numarray operates more like Numeric, so fewer changes need to be made in the future to perform ports of Numeric code to numarray.

Impact of Proposed Fix

Regrettably, the proposed fix will break binary compatibility for clients of the numarray-1.0 native C-API. So, extensions built using the numarray native C-API will need to be rebuilt for numarray-1.1. Extensions that have made direct access to PyArrayObject's ->data and require the original offsetless meaning will also need to change code for numarray-1.1. This is something we *really* wanted to avoid... it just isn't going to happen this time.

The Plan

The current plan is to fix the Numeric compatible API by changing the semantics of ->data and release numarray-1.1 relatively soon, hopefully within 2 weeks.
I'm sorry for any inconvenience this has caused numarray users.

Regards,
Todd Miller

From zingale at ucolick.org Tue Jul 13 12:54:02 2004
From: zingale at ucolick.org (Mike Zingale)
Date: Tue Jul 13 12:54:02 2004
Subject: [Numpy-discussion] differencing numarray arrays.
Message-ID: 

Hi, I am trying to efficiently compute a difference of two 2-d flux arrays, as arises quite commonly in finite-difference/finite-volume methods. Ex:

a = arange(64)
a.shape = (8,8)

I want to create a new array, b, of shape such that

b[i,j] = a[i,j] - a[i-1,j]

for 1 <= i < 8
    0 <= i < 8

I can obviously do this through loops, but this is quite slow. In IDL, which is often compared to numarray/python, this is simple to do with the shift() function, but I cannot find an efficient way to do it with numarray arrays. I tried defining a list

i = range(8)
im1[1:9] = im1[1:9] - 1

and indexing with im1, but this does not work. Any suggestions? For large arrays, this simple differencing in python is very expensive when using loops.

Thanks,

Mike

------------------------------------------------------------------------------
Michael Zingale
UCO/Lick Observatory
UCSC
Santa Cruz, CA 95064

phone: (831) 459-5246
fax: (831) 459-5265
e-mail: zingale at ucolick.org
web: http://www.ucolick.org/~zingale

``Don't worry head, the computer will do our thinking now'' -- Homer

From tim.hochberg at cox.net Tue Jul 13 12:59:00 2004
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Tue Jul 13 12:59:00 2004
Subject: [Numpy-discussion] differencing numarray arrays.
In-Reply-To: 
References: 
Message-ID: <40F43EC4.70903@cox.net>

Mike Zingale wrote:

>Hi, I am trying to efficiently compute a difference of two 2-d flux
>arrays, as arises quite commonly in finite-difference/finite-volume
>methods. Ex:
>
>a = arange(64)
>a.shape = (8,8)
>
>I want to create a new array, b, of shape such that
>
>b[i,j] = a[i,j] - a[i-1,j]
>
>for 1 <= i < 8
> 0 <= i < 8
>

That's supposed to be a j in the second eq., right?
If I understand you right, what you want is:

b = a[1:] - a[:-1]

-tim

>I can obviously do this through loops, but this is quite slow. In IDL,
>which is often compared to numarray/python, this is simple to do with the
>shift() function, but I cannot find an efficient way to do it with
>numarray arrays.
>
>I tried defining a list
>
>i = range(8)
>im1[1:9] = im1[1:9] - 1
>
>and indexing with im1, but this does not work.
>
>Any suggestions? For large array, this simple differencing in python is
>very expensive when using loops.
>
>Thanks,
>
>Mike
>
>------------------------------------------------------------------------------
>Michael Zingale
>UCO/Lick Observatory
>UCSC
>Santa Cruz, CA 95064
>
>phone: (831) 459-5246
>fax: (831) 459-5265
>e-mail: zingale at ucolick.org
>web: http://www.ucolick.org/~zingale
>
>``Don't worry head, the computer will do our thinking now'' -- Homer
>
>
>-------------------------------------------------------
>This SF.Net email sponsored by Black Hat Briefings & Training.
>Attend Black Hat Briefings & Training, Las Vegas July 24-29 -
>digital self defense, top technical experts, no vendor pitches,
>unmatched networking opportunities. Visit www.blackhat.com
>_______________________________________________
>Numpy-discussion mailing list
>Numpy-discussion at lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/numpy-discussion
>

From rkern at ucsd.edu Tue Jul 13 13:01:04 2004
From: rkern at ucsd.edu (Robert Kern)
Date: Tue Jul 13 13:01:04 2004
Subject: [Numpy-discussion] differencing numarray arrays.
In-Reply-To: 
References: 
Message-ID: <40F43F65.9040208@ucsd.edu>

Mike Zingale wrote:
> Hi, I am trying to efficiently compute a difference of two 2-d flux
> arrays, as arises quite commonly in finite-difference/finite-volume
> methods.
Ex:
>
> a = arange(64)
> a.shape = (8,8)
>
> I want to create a new array, b, of shape such that
>
> b[i,j] = a[i,j] - a[i-1,j]
>
> for 1 <= i < 8
> 0 <= i < 8

Try

b = a[1:] - a[:-1]

--
Robert Kern
rkern at ucsd.edu

"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter

From zingale at ucolick.org Tue Jul 13 13:42:02 2004
From: zingale at ucolick.org (Mike Zingale)
Date: Tue Jul 13 13:42:02 2004
Subject: [Numpy-discussion] differencing numarray arrays.
In-Reply-To: <40F44766.9010009@pfdubois.com>
References: <40F44766.9010009@pfdubois.com>
Message-ID: 

thanks, all these responses helped. I guess I was still a little unclear with the slicing abilities in numarray.

Mike

On Tue, 13 Jul 2004, Paul Dubois wrote:

> Two of the responses to your question, while correct, might have seemed
> mysterious to a beginner.
>
> a[1:] - a[:-1]
>
> is actually shorthand for:
>
> a[1:, :] - a[:-1, :]
>
> Or to be even more explicit:
>
> n = 8
> a[1:n, 0:n] - a[0:(n-1), 0:n]
>
> If you had wanted the difference in the second index, you have to use
> the more explicit forms.

From rowen at u.washington.edu Tue Jul 13 17:11:49 2004
From: rowen at u.washington.edu (Russell E Owen)
Date: Tue Jul 13 17:11:49 2004
Subject: [Numpy-discussion] differencing numarray arrays.
In-Reply-To: 
References: <40F44766.9010009@pfdubois.com>
Message-ID: 

At 1:41 PM -0700 2004-07-13, Mike Zingale wrote:
>thanks, all these responses helped. I guess I was still a little
>unclear with the slicing abilities in numarray...

Also note that there is a shift function: numarray.nd_image.shift

In your case I suspect slicing is better, but there are times when one really does want to shift the data (e.g. when one wants the resulting array to be the same shape as the original).
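The idioms from this thread can be sketched with modern NumPy as a stand-in for numarray (the slicing semantics are the same); numpy.diff and numpy.roll are conveniences of the stand-in, not numarray functions:

```python
import numpy as np

a = np.arange(64)
a.shape = (8, 8)

# b[i, j] = a[i, j] - a[i - 1, j] for 1 <= i < 8: result has shape (7, 8)
b = a[1:] - a[:-1]            # shorthand for a[1:, :] - a[:-1, :]
assert b.shape == (7, 8)
assert (b == 8).all()         # consecutive rows of this particular a differ by 8

# np.diff performs the same first difference along a chosen axis.
assert (np.diff(a, axis=0) == b).all()

# np.roll shifts with wraparound and keeps the original shape --
# the closest analogue of the IDL-style shift() mentioned above.
c = a - np.roll(a, 1, axis=0)
assert c.shape == (8, 8)
```

Note that the sliced difference is one row shorter than the input, while the roll-based version keeps the full shape at the price of a wrapped-around first row.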
-- Russell

From kyeser at earthlink.net Tue Jul 13 19:35:39 2004
From: kyeser at earthlink.net (Hee-Seng Kye)
Date: Tue Jul 13 19:35:39 2004
Subject: [Numpy-discussion] a 'for' loop within another 'for' loop?
Message-ID: 

Hi. I wrote a program to calculate sums of every possible combinations of two indices of a list. The main body of the program looks something like this:

r = [0,2,5,6,8]
l = []

for x in range(0, len(r)):
    for y in range(0, len(r)):
        k = r[x]+r[y]
        l.append(k)
print l

1. I've heard that it's not a good idea to have a 'for' loop within another 'for' loop, and I was wondering if there is a more efficient way to do this.

2. Does anyone know if there is a built-in function or module that would do the above task in NumPy or Numarray (or even in Python)?

I would really appreciate it if anyone could let me know. Thanks for your help!

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 715 bytes
Desc: not available
URL: 

From focke at slac.stanford.edu Tue Jul 13 22:02:08 2004
From: focke at slac.stanford.edu (Warren Focke)
Date: Tue Jul 13 22:02:08 2004
Subject: [Numpy-discussion] a 'for' loop within another 'for' loop?
In-Reply-To: 
References: 
Message-ID: 

l = Numeric.add.outer(r, r).flat

oughta do the trick. Should work for numarray, too.

On Tue, 13 Jul 2004, Hee-Seng Kye wrote:

> Hi. I wrote a program to calculate sums of every possible combinations
> of two indices of a list. The main body of the program looks something
> like this:
>
> r = [0,2,5,6,8]
> l = []
>
> for x in range(0, len(r)):
>     for y in range(0, len(r)):
>         k = r[x]+r[y]
>         l.append(k)
> print l
>
> 1. I've heard that it's not a good idea to have a 'for' loop within
> another 'for' loop, and I was wondering if there is a more efficient
> way to do this.
>
> 2. Does anyone know if there is a built-in function or module that
> would do the above task in NumPy or Numarray (or even in Python)?
> > I would really appreciate it if anyone could let me know. > > Thanks for your help! From eric at enthought.com Tue Jul 13 22:09:01 2004 From: eric at enthought.com (eric jones) Date: Tue Jul 13 22:09:01 2004 Subject: [Numpy-discussion] ANN: Reminder -- SciPy 04 is coming up Message-ID: <40F4BF9E.8060103@enthought.com> Hey folks, Just a reminder that SciPy 04 is coming up. More information is here: http://www.scipy.org/wikis/scipy04 About the Conference and Keynote Speaker --------------------------------------------- The 1st annual *SciPy Conference* will be held this year at Caltech, September 2-3, 2004. As some of you may know, we've experienced great participation in two SciPy "Workshops" (with ~70 attendees in both 2002 and 2003) and this year we're graduating to a "conference." With the prestige of a conference comes the responsibility of a keynote address. This year, Jim Hugunin has answered the call and will be speaking to kickoff the meeting on Thursday September 2nd. Jim is the creator of Numeric Python, Jython, and co-designer of AspectJ. Jim is currently working on IronPython--a fast implementation of Python for .NET and Mono. Presenters ----------- We still have room for a few more standard talks, and there is plenty of room for lightning talks. Because of this, we are extending the abstract deadline until July 23rd. Please send your abstract to abstracts at scipy.org. Travis Oliphant is organizing the presentations this year. (Thanks!) Once accepted, papers and/or presentation slides are acceptable and are due by August 20, 2004. Registration ------------- Early registration ($100.00) has been extended to July 23rd. Follow the links off of the main conference site: http://www.scipy.org/wikis/scipy04 After July 23rd, registration will be $150.00. Registration includes breakfast and lunch Thursday & Friday and a very nice dinner Thursday night. Please register as soon as possible as it will help us in planning for food, room sizes, etc. 
Sprints -------- As of now, we really haven't had much of a call for coding sprints for the 3 days prior to SciPy 04. Below is the original announcement about sprints. If you would like to suggest a topic and see if others are interested, please send a message to the list. Otherwise, we'll forgo the sprints session this year. We're also planning three days of informal "Coding Sprints" prior to the conference -- August 30 to September 1, 2004. Conference registration is not required to participate in the sprints. Please email the list, however, if you plan to attend. Topics for these sprints will be determined via the mailing lists as well, so please submit any suggestions for topics to the scipy-user list: list signup: http://www.scipy.org/mailinglists/ list address: scipy-user at scipy.org thanks, eric From kyeser at earthlink.net Tue Jul 13 23:30:13 2004 From: kyeser at earthlink.net (Hee-Seng Kye) Date: Tue Jul 13 23:30:13 2004 Subject: [Numpy-discussion] a 'for' loop within another 'for' loop? In-Reply-To: References: Message-ID: <34CF38C4-D55F-11D8-8504-000393479EE8@earthlink.net> Thank you so much. It works beautifully! On Jul 14, 2004, at 1:01 AM, Warren Focke wrote: > l = Numeric.add.outer(r, r).flat > oughta do the trick. Should work for numarray, too. > > On Tue, 13 Jul 2004, Hee-Seng Kye wrote: > >> Hi. I wrote a program to calculate sums of every possible >> combinations >> of two indices of a list. The main body of the program looks >> something >> like this: >> >> r = [0,2,5,6,8] >> l = [] >> >> for x in range(0, len(r)): >> for y in range(0, len(r)): >> k = r[x]+r[y] >> l.append(k) >> print l >> >> 1. I've heard that it's not a good idea to have a 'for' loop within >> another 'for' loop, and I was wondering if there is a more efficient >> way to do this. >> >> 2. Does anyone know if there is a built-in function or module that >> would do the above task in NumPy or Numarray (or even in Python)? 
>> >> I would really appreciate it if anyone could let me know.
>> >> Thanks for your help!

From falted at pytables.org Wed Jul 14 02:37:06 2004
From: falted at pytables.org (Francesc Alted)
Date: Wed Jul 14 02:37:06 2004
Subject: [Numpy-discussion] numarray-1.0 Bug Alert
In-Reply-To: <1089740511.9509.372.camel@halloween.stsci.edu>
References: <1089740511.9509.372.camel@halloween.stsci.edu>
Message-ID: <200407141136.09436.falted@pytables.org>

On Tuesday 13 July 2004 19:41, Todd Miller wrote:
> The real fix for the bug appears to be to redefine the semantics of
> numarray's PyArrayObject ->data pointer to include ->byteoffset,
> altering the C-API.

Oh well, I'm afraid that I'll be affected by that :(. Just to understand that fully, you mean that real data for an array will start in the future at narr->data, instead of narr->data+narr->byteoffset as it does now?
-- Francesc Alted

From jmiller at stsci.edu Wed Jul 14 04:38:09 2004
From: jmiller at stsci.edu (Todd Miller)
Date: Wed Jul 14 04:38:09 2004
Subject: [Numpy-discussion] numarray-1.0 Bug Alert
In-Reply-To: <200407141136.09436.falted@pytables.org>
References: <1089740511.9509.372.camel@halloween.stsci.edu> <200407141136.09436.falted@pytables.org>
Message-ID: <1089805021.3741.62.camel@localhost.localdomain>

On Wed, 2004-07-14 at 05:36, Francesc Alted wrote:
> On Tuesday 13 July 2004 19:41, Todd Miller wrote:
> > The real fix for the bug appears to be to redefine the semantics of
> > numarray's PyArrayObject ->data pointer to include ->byteoffset,
> > altering the C-API.
>
> Oh well, I'm afraid that I'll be affected by that :(. Just to understand
> that fully, you mean that real data for an array will start in the future at
> narr->data, instead of narr->data+narr->byteoffset as it does now?

That is the current plan. I was thinking developers could just replace the new narr->data with (narr->data - narr->byteoffset) if needed. I'm assuming the planned changes will cost at most a few edits and package redistribution, which I understand is still a major pain in the neck; let me know if the cost is higher than that for some reason.

Regards,
Todd

From paul at pfdubois.com Wed Jul 14 05:57:07 2004
From: paul at pfdubois.com (Paul F. Dubois)
Date: Wed Jul 14 05:57:07 2004
Subject: [Numpy-discussion] a 'for' loop within another 'for' loop?
In-Reply-To: 
References: 
Message-ID: <40F52D8B.9050601@pfdubois.com>

>>> add.reduce(take(r,indices([len(r),len(r)]))).flat
array([ 0, 2, 5, 6, 8, 2, 4, 7, 8, 10, 5, 7, 10, 11, 13, 6, 8, 11, 12, 14, 8, 10, 13, 14, 16])

Always like a good challenge in the morning. God, it is like the old rush of writing APL.

Hee-Seng Kye wrote:
> Hi. I wrote a program to calculate sums of every possible combinations
> of two indices of a list.
The main body of the program looks something > like this: > > r = [0,2,5,6,8] > l = [] > > for x in range(0, len(r)): > for y in range(0, len(r)): > k = r[x]+r[y] > l.append(k) > print l > > 1. I've heard that it's not a good idea to have a 'for' loop within > another 'for' loop, and I was wondering if there is a more efficient way > to do this. > > 2. Does anyone know if there is a built-in function or module that would > do the above task in NumPy or Numarray (or even in Python)? > > I would really appreciate it if anyone could let me know. > > Thanks for your help! From Sebastien.deMentendeHorne at electrabel.com Wed Jul 14 08:41:09 2004 From: Sebastien.deMentendeHorne at electrabel.com (Sebastien.deMentendeHorne at electrabel.com) Date: Wed Jul 14 08:41:09 2004 Subject: [Numpy-discussion] a 'for' loop within another 'for' loop? Message-ID: <035965348644D511A38C00508BF7EAEB145CAF2A@seacex03.eib.electrabel.be> I could not resist to propose an other solution: r = array([0,2,5,6,8]) l = (r[:,NewAxis] + r[NewAxis,:]).flat -----Original Message----- From: Hee-Seng Kye [mailto:kyeser at earthlink.net] Sent: mercredi 14 juillet 2004 4:22 To: numpy-discussion at lists.sourceforge.net Subject: [Numpy-discussion] a 'for' loop within another 'for' loop? Hi. I wrote a program to calculate sums of every possible combinations of two indices of a list. The main body of the program looks something like this: r = [0,2,5,6,8] l = [] for x in range(0, len(r)): for y in range(0, len(r)): k = r[x]+r[y] l.append(k) print l 1. I've heard that it's not a good idea to have a 'for' loop within another 'for' loop, and I was wondering if there is a more efficient way to do this. 2. Does anyone know if there is a built-in function or module that would do the above task in NumPy or Numarray (or even in Python)? I would really appreciate it if anyone could let me know. Thanks for your help! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rowen at u.washington.edu Wed Jul 14 08:48:07 2004 From: rowen at u.washington.edu (Russell E Owen) Date: Wed Jul 14 08:48:07 2004 Subject: [Numpy-discussion] How to median filter a masked array? Message-ID: I want to 3x3 median filter a masked array (2-d array of ints -- an astronomical image), where the masked data and points off the edge are excluded from the local median calculation. Any suggestions for how to do this efficiently? I suspect I have to write it in C, which is an unpleasant prospect. I tried using NaN for points to mask out, but the median filter seems to handle those as "infinity", or something equally inappropriate. In a related vein, has Python come along far enough that it would be reasonable to add support for NaN to numarray -- in the sense that statistics calculations, filters, etc. could be convinced to ignore NaNs? Obviously this support would be contingent on compiling python with IEEE floating point support, but I suspect that's the default on most platforms these days. -- Russell From jdhunter at ace.bsd.uchicago.edu Wed Jul 14 09:51:12 2004 From: jdhunter at ace.bsd.uchicago.edu (John Hunter) Date: Wed Jul 14 09:51:12 2004 Subject: [Numpy-discussion] ANN matplotlib-0.60.2: python graphs and charts Message-ID: matplotlib is a 2D plotting library for python. You can use matplotlib interactively from a python shell or IDE, or embed it in GUI applications (WX, GTK, and Tkinter). matplotlib supports many plot types: line plots, bar charts, log plots, images, pseudocolor plots, legends, date plots, finance charts and more. What's new since matplotlib 0.50 This is the first wide release in 5 months and there has been a tremendous amount of development since then, with new backends, many optimizations, new plotting types, new backends and enhanced text support. See http://matplotlib.sourceforge.net/whats_new.html for details. 
* Todd Miller's tkinter backend (tkagg) with good support for interactive plotting using the standard python shell, ipython or others. matplotlib now runs on windows out of the box with python + numeric/numarray

* Full Numeric / numarray integration with Todd Miller's numerix module. Prebuilt installers for numeric and numarray on win32. Others, please set your numerix settings before building matplotlib, as described on http://matplotlib.sourceforge.net/faq.html#NUMARRAY

* Mathtext: you can write TeX style math expressions anywhere in your figure. http://matplotlib.sourceforge.net/screenshots.html#mathtext_demo.

* Images - figure and axes images with optional interpolated resampling, alpha blending of multiple images, and more with the imshow and figimage commands. Interactive control of colormaps, intensity scaling and colorbars - http://matplotlib.sourceforge.net/screenshots.html#layer_images

* Text: freetype2 support, newline separated strings with arbitrary rotations, Paul Barrett's cross platform font manager. http://matplotlib.sourceforge.net/screenshots.html#align_text

* Jared Wahlstrand's SVG backend (alpha)

* Support for popular financial plot types - http://matplotlib.sourceforge.net/screenshots.html#finance_work2

* Many optimizations and extension code to remove performance bottlenecks. pcolors and scatters are an order of magnitude faster.

* GTKAgg, WXAgg, TkAgg backends for http://antigrain.com (agg) rendering in the GUI canvas. Now all the major GUIs (WX, GTK, Tk) can be used with a common (agg) renderer.

* Many new examples and demos - see http://matplotlib.sf.net/examples or download the src distribution and look in the examples dir.

Documentation and downloads available at http://matplotlib.sourceforge.net.

John Hunter

From verveer at embl-heidelberg.de Wed Jul 14 10:39:59 2004
From: verveer at embl-heidelberg.de (Peter Verveer)
Date: Wed Jul 14 10:39:59 2004
Subject: [Numpy-discussion] How to median filter a masked array?
In-Reply-To: References: Message-ID: <1122AA7E-D5B4-11D8-8510-000A95C92C8E@embl-heidelberg.de> On 14 Jul 2004, at 17:47, Russell E Owen wrote: > I want to 3x3 median filter a masked array (2-d array of ints -- an > astronomical image), where the masked data and points off the edge are > excluded from the local median calculation. Any suggestions for how to > do this efficiently? I don't think that you can do it very efficiently right now with the functions that are available in numarray. > I suspect I have to write it in C, which is an unpleasant prospect. Yes, that is unpleasant, trust me :-) However, in version 1.0 of numarray in the nd_image package, I have added some support for writing filter functions. The generic_filter() function iterates over the array and applies a user-defined filter function at each element. The user-defined function can be written in python or in C, and is called at each element with the values within the filter-footprint as an argument. You would write a function that finds the median of these values, excluding the NaNs (or whatever value that flags the mask.) I would suggest to prototype this function in python and move that to C as soon as it works to your satisfaction. See the numarray manual for more details. Cheers, Peter From rowen at u.washington.edu Wed Jul 14 10:44:39 2004 From: rowen at u.washington.edu (Russell E Owen) Date: Wed Jul 14 10:44:39 2004 Subject: [Numpy-discussion] How to median filter a masked array? In-Reply-To: <40F56462.2030000@pfdubois.com> References: <40F56462.2030000@pfdubois.com> Message-ID: At 9:50 AM -0700 2004-07-14, Paul F. Dubois wrote: >The median filter is prepared to take an argument of a numarray >array but ignorant of and unprepared to deal with masked values. >Using the __array__ trick, both Numeric.MA and numarray.ma would >'know' this and therefore replace the missing values in the filter's >argument with the 'fill value' for that type -- a big number in the >case of real arrays. 
You could explicitly choose that value (say
>using the overall median of the data m) by passing x.filled(m)
>rather than x to the filter.
>
>If there is no such value, you probably do have to do it in C. If
>you wrote it in C, how would you treat missing elements? BTW it
>wouldn't be that hard; just pass both the array and its mask as
>separate elements to a C routine and use SWIG to hook it up.

I already have routines that handle masked data in C to create radial profiles from 2-d integer data (since I could not figure out how to do that in numarray). I chose to pass the mask as a separate array, since I could not find any C interface for numarray.ma and since NaN made no sense for integer data. That code was pretty straightforward.

I wish I could have found a simple way to support multiple array types. I thought using C++ with templates would be the ticket, but absent any examples and after looking through the numarray code, I gave up and took the easy way out. (I didn't use SWIG, though, I just hand-coded everything. Maybe that was a mistake.) I confess that makes me worry about the underpinnings of numarray. It seems an obvious candidate to be written in C++ with templates. I hate to think what the developers have to go through, instead.

In any case, writing a median filter is a bigger deal than taking a radial profile, and since one already existed I thought I'd ask.

>I doubt NaN would help you here; you'd still have to figure out what
>to do in those places. Numeric did not have support for NaN because
>there were portability problems. Probably still are. And you still
>are stuck in a lot of cases anyway.

Well, NaN isn't very general in any case, since it's meaningless for integer data. So maybe that's a red herring. (Though if NaN had worked to mask data I would cheerfully have converted my images to floats to take advantage of it!). What's really wanted is a more unified approach to masked data.
I suppose it's pie in the sky, but I sure wish most of the numarray functions took an optional mask array (or accepted a numarray.ma object -- nice for the user, but probably too painful for words under the hood). I don't think there are major issues with what to do with masked data. Simply ignoring it works in most cases, e.g. mean, std dev, sum, max... In some cases one needs the new mask as output (e.g. matrix multiply). Filtering is a bit subtle: can masked data be treated the same as data off the edge? I hope so, but I'm not sure. Anyway, I am grateful for what we do have. Without Numeric or numarray I would have to write all my image processing code in a different language. -- Russell From gazzar at email.com Wed Jul 14 21:00:03 2004 From: gazzar at email.com (Gary Ruben) Date: Wed Jul 14 21:00:03 2004 Subject: [Numpy-discussion] sum() and mean() broken? Message-ID: <20040715035046.C8BFE1535C5@ws3-1.us4.outblaze.com> I'm getting tracebacks on even the most basic sum() and mean() calls in numarray 1.0 under Windows. Apologies if this has already been reported. Gary >>> from numarray import * >>> arange(10).sum() Traceback (most recent call last): File "", line 1, in -toplevel- arange(10).sum() File "C:\APPS\PYTHON23\Lib\site-packages\numarray\numarraycore.py", line 1106, in sum return ufunc.add.reduce(ufunc.add.areduce(self, type=type).flat, type=type) error: Int32asInt64: buffer not aligned on 8 byte boundary. -- _______________________________________________ Talk More, Pay Less with Net2Phone Direct(R), up to 1500 minutes free! http://www.net2phone.com/cgi-bin/link.cgi?143 From jmiller at stsci.edu Thu Jul 15 06:18:04 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 15 06:18:04 2004 Subject: [Numpy-discussion] sum() and mean() broken?
In-Reply-To: <20040715035046.C8BFE1535C5@ws3-1.us4.outblaze.com> References: <20040715035046.C8BFE1535C5@ws3-1.us4.outblaze.com> Message-ID: <1089897432.2637.34.camel@halloween.stsci.edu> numarray-1.0 is known to have problems with Windows-98, etc. (My guess is any pre-NT Windows). I haven't seen any problems with Windows XP or Windows 2000 Pro. Which Windows variant are you running? Does the numarray selftest pass? It should look something like: >>> import numarray.testall as testall >>> testall.test() numarray: ((0, 1178), (0, 1178)) numarray.records: (0, 48) numarray.strings: (0, 176) numarray.memmap: (0, 82) numarray.objects: (0, 105) numarray.memorytest: (0, 16) numarray.examples.convolve: ((0, 20), (0, 20), (0, 20), (0, 20)) numarray.convolve: (0, 52) numarray.fft: (0, 75) numarray.linear_algebra: ((0, 46), (0, 51)) numarray.image: (0, 27) numarray.nd_image: (0, 390) numarray.random_array: (0, 53) numarray.ma: (0, 671) On Wed, 2004-07-14 at 23:50, Gary Ruben wrote: > I'm getting tracebacks on even the most basic sum() and mean() calls in numarray 1.0 under Windows. Apologies if this has already been reported. > Gary > > >>> from numarray import * > >>> arange(10).sum() > > Traceback (most recent call last): > File "", line 1, in -toplevel- > arange(10).sum() > File "C:\APPS\PYTHON23\Lib\site-packages\numarray\numarraycore.py", line 1106, in sum > return ufunc.add.reduce(ufunc.add.areduce(self, type=type).flat, type=type) > error: Int32asInt64: buffer not aligned on 8 byte boundary. -- From mathieu.gontier at fft.be Thu Jul 15 06:29:04 2004 From: mathieu.gontier at fft.be (Mathieu Gontier) Date: Thu Jul 15 06:29:04 2004 Subject: [Numpy-discussion] static void** libnumarray_API Message-ID: <200407151528.16261.mathieu.gontier@fft.be> Hello, I am developing FEM bindings from a C++ code to Python with Numarray. So, I have the following problem.
In the distribution file 'libnumarray.h', the variable 'libnumarray_API' is defined as a static variable (because the symbol NO_IMPORT is not defined). So, I understand that all the examples are implemented in a single file. But, in my project, I must separate header files and source files in order to solve other problems (like cyclic includes). So, I have two different source files which use numarray: - the file containing the 'init' function, which calls the function 'import_libnumarray()' (which initializes 'libnumarray_API') - a file containing implementations, more precisely an implementation calling numarray functionalities: with its 'static' state, this 'libnumarray_API' is NULL... I tried to compile NumArray with the symbol 'NO_IMPORT' (see libnumarray.h) in order to have an extern variable. But with this symbol, numarray can no longer be imported into the Python environment. So, does someone have a solution that allows using the NumArray API across separate header/source files? Thanks, Mathieu Gontier From curzio.basso at unibas.ch Thu Jul 15 07:22:01 2004 From: curzio.basso at unibas.ch (Curzio Basso) Date: Thu Jul 15 07:22:01 2004 Subject: [Numpy-discussion] NA.dot transposing in place Message-ID: <40F692CC.3000103@unibas.ch> Hi all.
I wonder if anyone noticed the following behaviour (new in 1.0) of the dot/matrixmultiply functions: >>> alpha = NA.arange(10, shape = (10,1)) >>> beta = NA.arange(10, shape = (10,1)) >>> NA.dot(alpha, alpha) array([[285]]) >>> alpha.shape # here it looks like it's doing the transpose in place (1, 10) >>> NA.dot(beta, alpha) array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18], [ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27], [ 0, 4, 8, 12, 16, 20, 24, 28, 32, 36], [ 0, 5, 10, 15, 20, 25, 30, 35, 40, 45], [ 0, 6, 12, 18, 24, 30, 36, 42, 48, 54], [ 0, 7, 14, 21, 28, 35, 42, 49, 56, 63], [ 0, 8, 16, 24, 32, 40, 48, 56, 64, 72], [ 0, 9, 18, 27, 36, 45, 54, 63, 72, 81]]) >>> alpha.shape, beta.shape # but not the second time ((1, 10), (10, 1)) ------------------------------------------------- Can someone explain me what's going on? thanks, curzio From jmiller at stsci.edu Thu Jul 15 07:36:11 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 15 07:36:11 2004 Subject: [Numpy-discussion] static void** libnumarray_API In-Reply-To: <200407151528.16261.mathieu.gontier@fft.be> References: <200407151528.16261.mathieu.gontier@fft.be> Message-ID: <1089902141.2637.61.camel@halloween.stsci.edu> On Thu, 2004-07-15 at 09:28, Mathieu Gontier wrote: > Hello, > > I am developping FEM bendings from a C++ code to Python with Numarray. > So, I have the following problem. > > In the distribution file 'libnumarray.h', the variable 'libnumarray_API' is > defined as a static variable (because of the symbol NO_IMPORT is not > defined). > > Then, I understand that all the examples are implemented in a unique file. > > But, in my project, I must edit header files and source files in order to > solve other problems (like cycle includes). 
So, I have two different source > files which use numarray : > - the file containing the 'init' function which call the function > 'import_libnumarray()' (which initialize 'libnumarray_API') > - a file containing implementations, more precisely an implementation calling > numarray functionnalities: with is 'static' state, this 'libnumarray_API' is > NULL... > > I tried to compile NumArray with the symbol 'NO_IMPORT' (see libnumarray.h) in > order to have an extern variable. But this symbol doesn't allow to import > numarray in the python environment. > > So, does someone have a solution allowing to use NumArray API with > header/source files ? The good news is that the 1.0 headers, at least, work. I intended to capture this form of multi-compilation-unit module in the numpy_compat example... but didn't. I think there are two "tricks" missing in the example. In *a* module of the several modules you're linking together, do the following:

#define NO_IMPORT 1  /* This prevents the definition of the static version of the API var.
                        The extern won't conflict with the real definition below. */
#include "libnumarray.h"

void **libnumarray_API;  /* This defines the missing API var for *all* your compilation units */

This variable will be assigned the API pointer by the import_libnumarray() call. I fixed the numpy_compat example to demonstrate this in CVS, but it has a Numeric flavor. The same principles apply to libnumarray. Note that for numarray-1.0 you must include/import both the Numeric-compatible and native numarray APIs separately if you use both. Regards, Todd From gazzar at email.com Thu Jul 15 07:37:01 2004 From: gazzar at email.com (Gary Ruben) Date: Thu Jul 15 07:37:01 2004 Subject: [Numpy-discussion] sum() and mean() broken? Message-ID: <20040715143500.2CD321CE306@ws3-6.us4.outblaze.com> Thanks Todd, It's under Win98 as you suspected and the selftest definitely doesn't pass. Are you planning on supporting Win98? If so, I'll revert to numarray 0.9.
Otherwise, I'll just use Numeric for this task and restrict playing with numarray 1.0 to my Win2k laptop. thanks, Gary From jmiller at stsci.edu Thu Jul 15 07:38:00 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 15 07:38:00 2004 Subject: [Numpy-discussion] NA.dot transposing in place In-Reply-To: <40F692CC.3000103@unibas.ch> References: <40F692CC.3000103@unibas.ch> Message-ID: <1089902251.2637.64.camel@halloween.stsci.edu> On Thu, 2004-07-15 at 10:21, Curzio Basso wrote: > Hi all. > > I wonder if anyone noticed the following behaviour (new in 1.0) of the > dot/matrixmultiply functions: > > >>> alpha = NA.arange(10, shape = (10,1)) > > >>> beta = NA.arange(10, shape = (10,1)) > > >>> NA.dot(alpha, alpha) > array([[285]]) > > >>> alpha.shape # here it looks like it's doing the transpose in place > (1, 10) > > >>> NA.dot(beta, alpha) > array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], > [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], > [ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18], > [ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27], > [ 0, 4, 8, 12, 16, 20, 24, 28, 32, 36], > [ 0, 5, 10, 15, 20, 25, 30, 35, 40, 45], > [ 0, 6, 12, 18, 24, 30, 36, 42, 48, 54], > [ 0, 7, 14, 21, 28, 35, 42, 49, 56, 63], > [ 0, 8, 16, 24, 32, 40, 48, 56, 64, 72], > [ 0, 9, 18, 27, 36, 45, 54, 63, 72, 81]]) > > >>> alpha.shape, beta.shape # but not the second time > ((1, 10), (10, 1)) > > ------------------------------------------------- > > Can someone explain me what's going on? It's a bug introduced in numarray-1.0. It'll be fixed for 1.1 in a couple weeks. Regards, Todd From jmiller at stsci.edu Thu Jul 15 07:49:14 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 15 07:49:14 2004 Subject: [Numpy-discussion] sum() and mean() broken?
In-Reply-To: <20040715143500.2CD321CE306@ws3-6.us4.outblaze.com> References: <20040715143500.2CD321CE306@ws3-6.us4.outblaze.com> Message-ID: <1089902892.2637.75.camel@halloween.stsci.edu> On Thu, 2004-07-15 at 10:35, Gary Ruben wrote: > Thanks Todd, > It's under Win98 as you suspected and the selftest definitely doesn't pass. > Are you planning on supporting Win98? I'm planning to debug this particular problem because I'm concerned that it's just latent in the newer windows variants. To the degree that Win98 is "free" under the umbrella of win32, it will continue to be supported. An ongoing issue will likely be that Win98 testing doesn't get done on a regular basis... just as problems are reported. Regards, Todd From curzio.basso at unibas.ch Thu Jul 15 07:51:01 2004 From: curzio.basso at unibas.ch (Curzio Basso) Date: Thu Jul 15 07:51:01 2004 Subject: [Numpy-discussion] NA.dot transposing in place In-Reply-To: <1089902251.2637.64.camel@halloween.stsci.edu> References: <40F692CC.3000103@unibas.ch> <1089902251.2637.64.camel@halloween.stsci.edu> Message-ID: <40F6999C.2050101@unibas.ch> Todd Miller wrote: > It's a bug introduced in numarray-1.0. It'll be fixed for 1.1 in a > couple weeks. Ah, ok. Is it related with the bug announced a couple of days ago? From jmiller at stsci.edu Thu Jul 15 08:14:10 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 15 08:14:10 2004 Subject: [Numpy-discussion] NA.dot transposing in place In-Reply-To: <40F6999C.2050101@unibas.ch> References: <40F692CC.3000103@unibas.ch> <1089902251.2637.64.camel@halloween.stsci.edu> <40F6999C.2050101@unibas.ch> Message-ID: <1089904417.2637.147.camel@halloween.stsci.edu> On Thu, 2004-07-15 at 10:50, Curzio Basso wrote: > Todd Miller wrote: > > > It's a bug introduced in numarray-1.0. It'll be fixed for 1.1 in a > > couple weeks. > > Ah, ok. Is it related with the bug announced a couple of days ago? Only peripherally. 
The Numeric compatibility layer problem was discovered as a result of porting a bunch of Numeric functions to numarray... ports done to try to get better small array speed. Similarly, the setup for matrixmultiply was moved into C for numarray-1.0... to try to get better small array speed. numarray-1.0 is disappointingly buggy, but the interest generated by the 1.0 moniker is making the open source model work well so I think 1.1 will be much more solid as a result of strong user feedback. So, thanks for the report. Regards, Todd From cjw at sympatico.ca Thu Jul 15 08:22:07 2004 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Jul 15 08:22:07 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: <200407131106.19557.falted@pytables.org> References: <200407131028.04791.falted@pytables.org> <200407131106.19557.falted@pytables.org> Message-ID: <40F6A106.6020606@sympatico.ca> Francesc Alted wrote: >A Dimarts 13 Juliol 2004 10:28, Francesc Alted va escriure: > > >>A Dilluns 12 Juliol 2004 23:14, Perry Greenfield va escriure: >> >> >>>What I'm wondering about is what a single element of a record array >>>should be. Returning a tuple has an undeniable simplicity to it. >>> >>> >>Yeah, this why I'm strongly biased toward this possibility. >> >> >> >>>On the other hand, we've been using recarrays that allow naming the >>>various columns (which we refer to as "fields"). If one can refer >>>to fields of a recarray, shouldn't one be able to refer to a field >>>(by name) of one of it's elements? Or are you proposing that basic >>>recarrays not have that sort of capability (something added by a >>>subclass)? >>> >>> >>Well, I'm not sure about that. But just in case most of people would like to >>access records by field as well as by index, I would advocate for the >>possibility that the Record instances would behave as similar as possible as >>a tuple (or dictionary?). 
That includes creating appropriate __str__() *and* >>__repr__() methods as well as __getitem__() that supports both name fields >>and indices. I'm not sure about whether providing an __getattr__() method >>would be ok, but for the sake of simplicity and in order to have (preferably) >>only one way to do things, I would say no. >> >> > >I've been thinking that one way to make a single element of a RecArray >return a tuple, while still being able to retrieve a field by name, is to >play with RecArray.__getitem__ and let it support key names in addition to >indices. This would be better seen as an example: > >Right now, one can say: > > > >>>>r=records.array([(1,"asds", 24.),(2,"pwdw", 48.)], "1i4,1a4,1f8") >>>>r._fields["c1"] >>>> >>>> >array([1, 2]) > > >>>>r._fields["c1"][1] >>>> >>>> >2 > >What I propose is to be able to say: > > > >>>>r["c1"] >>>> >>>> >array([1, 2]) > > >>>>r["c1"][1] >>>> I would suggest going a step beyond this, so that one can have r.c1[1], see the script below. I have not explored the assignment of a value to r.c1[1], but it seems to be achievable. If changes along this line are acceptable, it is suggested that fields be renamed cols, or some such, to indicate its wider impact. Colin W. >>>> >>>> >2 > >Which would replace the notation: > > > >>>>r[1]["c1"] >>>> >>>> >2 > >which was recently suggested. > >I.e. the suggestion is to realize RecArrays as a collection of columns, >as well as a collection of rows. > >

# tRecord.py to explore RecArray
import numarray.records as _rec
import sys
#
class Rec1(_rec.RecArray):
    def __new__(cls, buffer, formats, shape=0, names=None, byteoffset=0,
                bytestride=None, byteorder=sys.byteorder, aligned=0):
        # This calls RecArray.__init__ - reason unclear.
        # Why can't the instance be fully created by RecArray.__init__?
        return _rec.RecArray.__new__(cls, buffer, formats=formats, shape=shape,
                                     names=names, byteorder=byteorder, aligned=aligned)

    def __init__(self, buffer, formats, shape=0, names=None, byteoffset=0,
                 bytestride=None, byteorder=sys.byteorder, aligned=0):
        arr = _rec.array(buffer, formats=formats, shape=shape, names=names,
                         byteorder=byteorder, aligned=aligned)
        self.__setstate__(arr.__getstate__())

    def __getattr__(self, name):
        # We reach here if the attribute does not belong to the basic Rec1 set
        return self._fields[name]

    def __getattribute__(self, name):
        return _rec.RecArray.__getattribute__(self, name)

    def __repr__(self):
        return self.__class__.__name__ + _rec.RecArray.__repr__(self)[8:]

    def __setattr__(self, name, value):
        return _rec.RecArray.__setattr__(self, name, value)

    def __str__(self):
        return self.__class__.__name__ + _rec.RecArray.__str__(self)[8:]

if __name__ == '__main__':
    # Francesc Alted 13-Jul-04 05:06
    r = _rec.array([(1,"asds", 24.),(2,"pwdw", 48.)], "1i4,1a4,1f8")
    print r._fields["c1"]
    print r._fields["c1"][1]
    r1 = Rec1([(1,"asds", 24.),(2,"pwdw", 48.)], "1i4,1a4,1f8")
    print r1._fields["c1"]
    print r1._fields["c1"][1]
    # r1.zz= 99 # acceptable
    print r1.c1
    print r1.c1[1]
    try:
        x = r1.ugh
    except:
        print 'ugh not recognized as an attribute'

'''
The above delivers:
[1 2]
2
[1 2]
2
[1 2]
2
ugh not recognized as an attribute
'''

From falted at pytables.org Thu Jul 15 09:12:08 2004 From: falted at pytables.org (Francesc Alted) Date: Thu Jul 15 09:12:08 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: <40F6A106.6020606@sympatico.ca> References: <200407131106.19557.falted@pytables.org> <40F6A106.6020606@sympatico.ca> Message-ID: <200407151811.20359.falted@pytables.org> A Dijous 15 Juliol 2004 17:21, Colin J. Williams va escriure: > >What I propose is to be able to say: > >>>>r["c1"][1] > I would suggest going a step beyond this, so that one can have r.c1[1], > see the script below. Yeah.
I've implemented something similar to access column elements for pytables Table objects. However, the problem in this case is that there are already attributes that "pollute" the column namespace, so that a column named "size" collides with the size() method. I came up with a solution by adding a new "cols" attribute to the Table object that is an instance of a simple class named Cols with no attributes that can pollute the namespace (except some starting with "__" or "_v_"). Then, it is just a matter of providing functionality to access the different columns. In that case, when a reference to a column is made, another object (an instance of the Column class) is returned. This Column object is basically an accessor to column values with __getitem__() and __setitem__() methods. That might sound complicated, but it is not. I'm attaching part of the relevant code below. I personally like that solution in the context of pytables because it extends the "natural naming" convention quite naturally. A similar approach could be applied to RecArray objects as well, although numarray might (and probably does) have other usage conventions. > I have not explored the assignment of a value to r.c1[1], but it seems > to be achievable. In the scheme I've just proposed, the following should be feasible: value = r.cols.c1[1] r.cols.c1[1] = value -- Francesc Alted -----------------------------------------------------------------

class Cols(object):
    """This is a container for columns in a table

    It provides methods to get Column objects that give access to the
    data in the column.

    Like with Group instances and AttributeSet instances, the natural
    naming is used, i.e. you can access the columns on a table as if
    they were normal Cols attributes.

    Instance variables:
        _v_table -- The parent table instance
        _v_colnames -- List with all column names

    Methods:
        __getitem__(colname)
    """

    def __init__(self, table):
        """Create the container to keep the column information.

        table -- The parent table
        """
        self.__dict__["_v_table"] = table
        self.__dict__["_v_colnames"] = table.colnames
        # Put the column in the local dictionary
        for name in table.colnames:
            self.__dict__[name] = Column(table, name)

    def __len__(self):
        return self._v_table.nrows

    def __getitem__(self, name):
        """Get the column named "name" as an item."""
        if not isinstance(name, types.StringType):
            raise TypeError, \
"Only strings are allowed as keys of a Cols instance. You passed object: %s" % name
        # If attribute does not exist, return None
        if not name in self._v_colnames:
            raise AttributeError, \
"Column name '%s' does not exist in table:\n'%s'" % (name, str(self._v_table))
        return self.__dict__[name]

    def __str__(self):
        """The string representation for this object."""
        # The pathname
        pathname = self._v_table._v_pathname
        # Get this class name
        classname = self.__class__.__name__
        # The number of columns
        ncols = len(self._v_colnames)
        return "%s.cols (%s), %s columns" % (pathname, classname, ncols)

    def __repr__(self):
        """A detailed string representation for this object."""
        out = str(self) + "\n"
        for name in self._v_colnames:
            # Get this class name
            classname = getattr(self, name).__class__.__name__
            # The shape for this column
            shape = self._v_table.colshapes[name]
            # The type
            tcol = self._v_table.coltypes[name]
            if shape == 1:
                shape = (1,)
            out += "  %s (%s%s, %s)" % (name, classname, shape, tcol) + "\n"
        return out


class Column(object):
    """This is an accessor for the actual data in a table column

    Instance variables:
        table -- The parent table instance
        name -- The name of the associated column

    Methods:
        __getitem__(key)
    """

    def __init__(self, table, name):
        """Create the container to keep the column information.

        table -- The parent table instance
        name -- The name of the column that is associated with this object
        """
        self.table = table
        self.name = name
        # Check whether an index exists or not
        iname = "_i_"+table.name+"_"+name
        self.index = None
        if iname in table._v_parent._v_indices:
            self.index = Index(where=self, name=iname,
                               expectedrows=table._v_expectedrows)
        else:
            self.index = None

    def __getitem__(self, key):
        """Returns a column element or slice

        It takes different actions depending on the type of the "key"
        parameter: If "key" is an integer, the corresponding element in
        the column is returned as a NumArray/CharArray, or a scalar
        object, depending on its shape. If "key" is a slice, the row
        slice determined by this slice is returned as a NumArray or
        CharArray object (whatever is appropriate).
        """
        if isinstance(key, types.IntType):
            if key < 0:
                # To support negative values
                key += self.table.nrows
            (start, stop, step) = processRange(self.table.nrows, key, key+1, 1)
            return self.table._read(start, stop, step, self.name, None)[0]
        elif isinstance(key, types.SliceType):
            (start, stop, step) = processRange(self.table.nrows, key.start,
                                               key.stop, key.step)
            return self.table._read(start, stop, step, self.name, None)
        else:
            raise TypeError, "'%s' key type is not valid in this context" % \
                  (key)

    def __str__(self):
        """The string representation for this object."""
        # The pathname
        pathname = self.table._v_pathname
        # Get this class name
        classname = self.__class__.__name__
        # The shape for this column
        shape = self.table.colshapes[self.name]
        if shape == 1:
            shape = (1,)
        # The type
        tcol = self.table.coltypes[self.name]
        return "%s.cols.%s (%s%s, %s)" % (pathname, self.name, classname, shape, tcol)

    def __repr__(self):
        """A detailed string representation for this object."""
        return str(self)

From perry at stsci.edu Thu Jul 15 10:39:06 2004 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jul 15 10:39:06 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To:
<200407151811.20359.falted@pytables.org> Message-ID: Francesc Alted wrote: > A Dijous 15 Juliol 2004 17:21, Colin J. Williams va escriure: > > >What I propose is to be able to say: > > >>>>r["c1"][1] > > I would suggest going a step beyond this, so that one can have r.c1[1], > > see the script below. > > Yeah. I've implemented something similar to access column elements for > pytables Table objects. However, the problem in this case is that > there are > already attributes that "pollute" the column namespace, so that a column > named "size" collides with the size() method. > The idea of mapping field names to attributes occurs to everyone quickly, but for the reasons Francesc gives (as well as another I'll mention) we were reluctant to implement it. The other reason is that it would be nice to allow field names that are not legal attributes (e.g., that include spaces or other illegal attribute characters). There are potentially people with data in databases or other similar formats that would like to map field names exactly. Certainly one can still use the attribute approach and not support all field names (or column, or col...), but it does introduce another glitch in the user interface when it works only for a subset of legal names.
> > I personally like that solution in the context of pytables because it > extends the "natural naming" convention quite naturally. A > similar approach > could be applied to RecArray objects as well, although numarray might (and > probably do) have other usage conventions. > > > I have not explored the assignment of a value to r.c1.[1], but it seems > > to be achievable. > > in the schema I've just proposed the next should be feasible: > > value = r.cols.c1[1] > r.cols.c1[1] = value > This solution avoids name collisions but doesn't handle the other problem. This is worth considering, but I thought I'd hear comments about the other issue before deciding (there is also the "more than one way" issue; but this guideline seems to bend quite often to pragmatic concerns). We're still chewing on all the other issues and plan to start floating some proposals, rationales and questions before long. Perry From falted at pytables.org Thu Jul 15 11:21:10 2004 From: falted at pytables.org (Francesc Alted) Date: Thu Jul 15 11:21:10 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: References: Message-ID: <200407152020.00873.falted@pytables.org> A Dijous 15 Juliol 2004 19:37, Perry Greenfield va escriure: > formats that would like to map field name exactly. Well certainly > one can still use the attribute approach and not support all field > names (or column, or col...) it does introduce another glitch in > the user interface when it works only for a subset of legal names. Yep. I forgot that issue. My particular workaround was to provide an optional trMap dictionary at Table (in our case, RecArray) creation time to map those original names that are not valid Python names to valid ones.
That would read something like: >>> r=records.array([(1,"as")], "1i4,1a2", names=["c 1", "c2"], trMap={"c1": "c 1"}) This would indicate that the "c 1" column, which is not a valid Python name (it has a space in the middle), can be accessed using the string "c1", which is a valid Python identifier. That way, r.cols.c1 would access column "c 1". And although I must admit that this solution is not very elegant, it makes it possible to cope with situations where the column names are not valid Python names. -- Francesc Alted From cjw at sympatico.ca Thu Jul 15 17:22:42 2004 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Jul 15 17:22:42 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: References: Message-ID: <40F71F9C.9040008@sympatico.ca> Perry Greenfield wrote: >Francesc Alted wrote: > > >>A Dijous 15 Juliol 2004 17:21, Colin J. Williams va escriure: >> >> >>>>What I propose is to be able to say: >>>> >>>> >>>>>>>r["c1"][1] >>>>>>> >>>>>>> >>>I would suggest going a step beyond this, so that one can have r.c1[1], >>>see the script below. >>> >>> >>Yeah. I've implemented something similar to access column elements for >>pytables Table objects. However, the problem in this case is that >>there are >>already attributes that "pollute" the column namespace, so that a column >>named "size" collides with the size() method. >> >> >> >The idea of mapping field names to attributes occurs to everyone >quickly, but for the reasons Francesc gives (as well as another I'll >mention) we were reluctant to implement it. The other reason is that >it would be nice to allow field names that are not legal attributes >(e.g., that include spaces or other illegal attribute characters). >There are potentially people with data in databases or other similar >formats that would like to map field name exactly. Well certainly >one can still use the attribute approach and not support all field >names (or column, or col...)
it does introduce another glitch in >the user interface when it works only for a subset of legal names. > > It would, I suggest, not be unduly restrictive to bar the existing attribute names but, if that's not acceptable, Francesc has suggested the.col workaround, although I would prefer to avoid the added clutter. Incidentally, there is no current protection against wiping out an existing method: [Dbg]>>> r1.size= 0 [Dbg]>>> r1.size 0 [Dbg]>>> > > >>I came up with a solution by adding a new "cols" attribute to the Table >>object that is an instance of a simple class named Cols with no attributes >>that can pollute the namespace (except some starting by "__" or "_v_"). >>Then, it is just a matter of provide functionality to access the different >>columns. In that case, when a reference of a column is made, >>another object >>(instance of Column class) is returned. This Column object is basically an >>accessor to column values with a __getitem__() and __setitem__() methods. >>That might sound complicated, but it is not. I'm attaching part of the >>relevant code below. >> >>I personally like that solution in the context of pytables because it >>extends the "natural naming" convention quite naturally. A >>similar approach >>could be applied to RecArray objects as well, although numarray might (and >>probably do) have other usage conventions. >> >> >> >>>I have not explored the assignment of a value to r.c1.[1], but it seems >>>to be achievable. >>> >>> >>in the schema I've just proposed the next should be feasible: >> >>value = r.cols.c1[1] >>r.cols.c1[1] = value >> >> >> >This solution avoids name collisions but doesn't handle the other >problem. This is worth considering, but I thought I'd hear comments >about the other issue before deciding it (there is also the >"more than one way" issue as well; but this guideline seems to bend >quite often to pragmatic concerns). 
> To allow for multi-word column names, assignment could replace a space by an underscore and, in retrieval, the reverse could be done - ie. underscore would be banned for a column name. Colin W. > >We're still chewing on all the other issues and plan to start floating >some proposals, rationales and questions before long. > >Perry > > > > From falted at pytables.org Fri Jul 16 02:12:11 2004 From: falted at pytables.org (Francesc Alted) Date: Fri Jul 16 02:12:11 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: <40F71F9C.9040008@sympatico.ca> References: <40F71F9C.9040008@sympatico.ca> Message-ID: <200407161111.41626.falted@pytables.org> A Divendres 16 Juliol 2004 02:21, Colin J. Williams va escriure: > To allow for multi-word column names, assignment could replace a space > by an underscore > and, in retrieval, the reverse could be done - ie. underscore would be > banned for a column name. That's not so easy. What about other chars like '/&%@$()' that cannot be part of python names? Finding a biunivocal map between them and allowed chars would be difficult (if possible at all). Besides, the resulting colnames might become a real mess. Regards, -- Francesc Alted From cjw at sympatico.ca Fri Jul 16 05:41:12 2004 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri Jul 16 05:41:12 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: <200407161111.41626.falted@pytables.org> References: <40F71F9C.9040008@sympatico.ca> <200407161111.41626.falted@pytables.org> Message-ID: <40F7CBC6.2030607@sympatico.ca> Francesc Alted wrote: >A Divendres 16 Juliol 2004 02:21, Colin J. Williams va escriure: > > >>To allow for multi-word column names, assignment could replace a space >>by an underscore >>and, in retrieval, the reverse could be done - ie. underscore would be >>banned for a column name. >> >> > >That's not so easy. What about other chars like '/&%@$()' that cannot be >part of python names? 
Finding a biunivocal map between them and allowed >chars would be difficult (if possible at all). Besides, the resulting >colnames might become a real mess. > >Regards, > > Yes, if the objective is to include special characters or facilitate multi-lingual column names (and it probably should be), then my suggestion is quite inadequate. Perhaps there could be a simple name -> column number mapping in place of _names. References to a column, or a field in a record, could then be through this dictionary. Basic access to data in a record would be by position number, rather than name, but the dictionary would facilitate access by name. Data could be referenced either through the column name: r1.c2[1] or through the record r1[1].c2, with the possibility that the index is multi-dimensional in either case. Colin W. From rowen at u.washington.edu Fri Jul 16 10:55:23 2004 From: rowen at u.washington.edu (Russell E Owen) Date: Fri Jul 16 10:55:23 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: <200407161111.41626.falted@pytables.org> References: <40F71F9C.9040008@sympatico.ca> <200407161111.41626.falted@pytables.org> Message-ID: >A Divendres 16 Juliol 2004 02:21, Colin J. Williams va escriure: >> To allow for multi-word column names, assignment could replace a space >> by an underscore >> and, in retrieval, the reverse could be done - ie. underscore would be >> banned for a column name. > >That's not so easy. What about other chars like '/&%@$()' that cannot be >part of python names? Finding a biunivocal map between them and allowed >chars would be difficult (if possible at all). Besides, the resulting >colnames might become a real mess. Personally, I think the idea of allowing access to fields via attributes is fatally flawed.
The problems raised (non-obvious mapping between field names with special characters and allowable attribute names and also the collision with existing instance variable and method names) clearly show it would be forced and non-pythonic. The obvious solution seems to be some combination of the dict interface (an ordered dict that keeps its keys in original field order) and the list interface. My personal leaning is: - Offer most of the dict methods, including __get/setitem__, keys, values and all iterators, but NOT setdefault, popitem or anything else that adds or deletes a field. - Offer the list version of __get/setitem__, as well, but NONE of list's methods. - Make the default iterator iterate over values, not keys (field names), i.e. have the item act like a list, not a dict, when used as an iterator. In other words, the following all work (where item is one element of a numarray.record array):

item[0] = 10 # set value of field 0 to 10
x = item[0:5] # get value of fields 0 through 4
item[:] = list of replacement values
item["afield"] = 10
"%(afield)s" % item

the methods iterkeys, itervalues, iteritems, keys, values, has_key all work; the method update might work, but it's an error to add new fields -- Russell P.S. Folks are welcome to use my ordered dictionary implementation RO.Alg.OrderedDictionary, which is part of the RO package. It is fully standalone (despite its location in my hierarchy) and is used in production code. From barrett at stsci.edu Fri Jul 16 11:49:01 2004 From: barrett at stsci.edu (Paul Barrett) Date: Fri Jul 16 11:49:01 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: References: <40F71F9C.9040008@sympatico.ca> <200407161111.41626.falted@pytables.org> Message-ID: <40F822E0.5010406@stsci.edu> Russell E Owen wrote: >> A Divendres 16 Juliol 2004 02:21, Colin J.
Williams va escriure: >> >>> To allow for multi-word column names, assignment could replace a space >>> by an underscore >>> and, in retrieval, the reverse could be done - ie. underscore would be >>> banned for a column name. >> >> >> That's not so easy. What about other chars like '/&%@$()' that cannot be >> part of python names? Finding a biunivocal map between them and allowed >> chars would be difficult (if possible at all). Besides, the resulting >> colnames might become a real mess. > > > Personally, I think the idea of allowing access to fields via > attributes is fatally flawed. The problems raised (non-obvious mapping > between field names with special characters and allowable attribute > names and also the collision with existing instance variable and > method names) clearly show it would be forced and non-pythonic. +1 It also makes it difficult to do the following:

a = item[:10, ('age', 'surname', 'firstname')]

where field (or column) 1 is 'firstname', field 2 is 'surname', and field 10 is 'age'. -- Paul -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Branch FAX: 410-338-4767 Baltimore, MD 21218 From jmiller at stsci.edu Fri Jul 16 12:43:02 2004 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jul 16 12:43:02 2004 Subject: [Numpy-discussion] I move your "Bugs" reports... Message-ID: <1090006936.7264.66.camel@halloween.stsci.edu> Not infrequently even very experienced numarray contributors file bug reports in the numpy "Bugs" tracker. Because numpy is a shared SF project with both Numeric and numarray, numarray bugs are actually tracked in the "Numarray Bugs" tracker, here: http://sourceforge.net/tracker/?atid=450446&group_id=1369&func=browse "Numarray Bugs" can also be found through the "Tracker" link at the top of any numpy SF web page. So, don't worry, your painstaking reports are not getting deleted, they're getting relocated to a place where *only* numarray bugs live.
There's probably a better way to do this, but until I find it or someone tells me about it, I thought I should tell everyone what's going on. Thanks to everybody who takes the time to fill out bug reports to make numarray better... Regards, Todd From hsu at stsci.edu Fri Jul 16 13:19:00 2004 From: hsu at stsci.edu (Jin-chung Hsu) Date: Fri Jul 16 13:19:00 2004 Subject: [Numpy-discussion] multidimensional record arrays Message-ID: <200407162018.ANW09710@donner.stsci.edu> There have been a number of questions and suggestions about how the record array facility in numarray could be improved. We've been talking about these internally and thought it would be useful to air some proposals along with discussions of the rationale behind each proposal, as well as discussions of drawbacks and some remaining open questions. Rather than do this in one long message, we will do this in pieces. The first addresses how to improve handling multidimensional record arrays. These will not discuss how or when we implement the proposed enhancements or changes. We first want to come to some consensus (or lacking that, decision) about what the target should be. ********************************************************* Proposal for records module enhancement, to handle record arrays of dimension (rank) higher than 1. Background: The current records module in numarray doesn't handle record arrays of dimension higher than one well. Even though most of the infrastructure for higher dimensionality is already in place, the current implementation for the record arrays was based on the implicit assumption that record arrays are 1-D. This limitation is reflected in the areas of input user interface, indexing, and output. The indexing and output are more straightforward to modify, so I'll discuss them first. Although it is possible to create a multi-dimensional record array, indexing does not work properly for 2 or more dimensions.
For example, for a 2-D record array r, r[i,j] does not give the correct result (but r[i][j] does). This will be fixed. At present, a user cannot print record arrays higher than 1-D. This will also be fixed, as well as incorporating some numarray features (e.g., printing only the beginning and end of an array for large arrays--as is done for numarrays now). Input Interface: There are currently several different ways to construct the record array using the array() function. These include setting the buffer argument to:

(1) None
(2) File object
(3) String object or appropriate buffer object (i.e., binary data)
(4) a list of records (in the form of sequences), for example: [(1,'abc', 2.3), (2,'xyz', 2.4)]
(5) a list of numarrays/chararrays for each field (e.g., effectively 'zipping' the arrays into records)

The first three types of input are very general and can be used to generate multi-dimensional record arrays in the current implementation. All these options need to specify the "shape" argument. The input options that do not work for multi-dimensional record arrays now are the last two. Option 4 (sequence of 'records'): If a user has a multi-dimensional record array and one or more fields are themselves multidimensional arrays, using this option is potentially confusing, since there can be ambiguity regarding what part of a nested sequence structure is the structure of the record array and what should be considered part of the record, since record elements themselves may be arrays.
(Some of the same issues arise for object arrays.) As an example:

--> r=rec.array([([1,2],[3,4]),([11,12],[13,14])])

could be interpreted as a 1-D record array, where each cell is a (num)array:

RecArray[ (array([1, 2]), array([3, 4])), (array([11, 12]), array([13, 14])) ]

or a 2-D record array, where each cell is just a number:

RecArray( [[(1, 2), (3, 4)], [(11, 12), (13, 14)]])

Thus we propose a new argument "rank" (following the convention used in object arrays) to specify the dimensionality of the output record array. In the first example above rank is 1, and in the second example rank=2. If rank is set to None, the highest possible rank will be assumed (in this example, 2). We propose to eventually generalize that to accept any sequence object for the array structure (though there will be the same requirement that exists for other arrays that the nested sequences be of the same type). As would be expected, strings are not permitted as the enclosing sequence. In this future implementation the record 'item' itself must be either:

1) A tuple
2) A subclass of tuple
3) A Record object (this may be taken care of by 2 if we make Record a subclass of tuple; this will be discussed in a subsequent proposal)

This requirement allows distinguishing the sequence of records from Option 5 below. For tuples (or tuple-derived elements), the items of the tuple must be one of the following: basic data types such as int, float, boolean, or string; a numarray or chararray; or an object that can be converted to a numarray or chararray. Option 5 (List of Arrays): Using a list of arrays to construct an N-D record array should be easier than using the previous option. The input syntax is simply:

[array1, array2, array3,...]

The shape of the record array will be determined from the shape of the input arrays as described below. All the user needs to do is to construct the arrays in the list.
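For the rank=None default, the record shape for a list of arrays is the longest leading shape common to all of the inputs: the shortest input shape, which must be a prefix of every other shape. A sketch of that inference as I read the proposal (infer_record_shape is a hypothetical name, not numarray code):

```python
def infer_record_shape(shapes):
    """Default record-array shape for a list of input arrays: the shortest
    input shape, which must be a leading prefix of every other shape;
    otherwise the 'slowest' axes do not match and we raise."""
    shortest = tuple(min(shapes, key=len))
    for s in shapes:
        if tuple(s[:len(shortest)]) != shortest:
            raise ValueError("slowest axes do not match: %r vs %r"
                             % (tuple(s), shortest))
    return shortest
```

With input shapes (2,3,4,5), (2,3,4) and (2,3) this yields (2, 3); with (3,4,5) and (4,5) it raises, because (4, 5) is not a leading prefix of (3, 4, 5).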
There is, similar to option 4, a possible ambiguity: if all the arrays are of the shape, say, (2,3), then the user may intend a 1-D record array of 2 rows where each cell is an array of shape (3,), or a 2-D record array of shape (2,3) where each cell is a single number or string. Thus, the user must either explicitly specify the "shape" or "rank". We propose the following behavior via examples:

Example 1: given:

array1.shape=(2,3,4,5)
array2.shape=(2,3,4)
array3.shape=(2,3)

Rank can only be specified as rank=1 (the record array's shape will then be (2,)) or rank=2 (the record array's shape will then be (2,3)). For rank=None the record shape will be (2,3), i.e. the "highest common denominator": each cell in the first field will be an array of shape (4,5), each cell in the second field will be an array of shape (4,), and each cell in the 3rd field will be a single number or a string. If "shape" is specified, it will take precedence over "rank" and its allowed value in this example will be either 2 or (2,3).

Example 2:

array1.shape=(3,4,5)
array2.shape=(4,5)

This will raise an exception because the 'slowest' axes do not match.

*********

For both the sequence-of-records and list-of-arrays input options, we propose the default value for "rank" be None (current default is 1). This gives consistent behavior with object arrays but does change the current behavior. Also, for both cases, specifying a shape inconsistent with the supplied data will raise an exception. From cjw at sympatico.ca Fri Jul 16 19:46:09 2004 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri Jul 16 19:46:09 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: <40F822E0.5010406@stsci.edu> References: <40F71F9C.9040008@sympatico.ca> <200407161111.41626.falted@pytables.org> <40F822E0.5010406@stsci.edu> Message-ID: <40F892B2.7090706@sympatico.ca> Paul Barrett wrote: > Russell E Owen wrote: >>> A Divendres 16 Juliol 2004 02:21, Colin J.
Williams va escriure: >>> >>>> To allow for multi-word column names, assignment could replace a >>>> space >>>> by an underscore >>>> and, in retrieval, the reverse could be done - ie. underscore >>>> would be >>>> banned for a column name. >>> >>> >>> >>> That's not so easy. What about other chars like '/&%@$()' that >>> cannot be >>> part of python names? Finding a biunivocal map between them and allowed >>> chars would be difficult (if possible at all). Besides, the resulting >>> colnames might become a real mess. >> >> >> >> Personally, I think the idea of allowing access to fields via >> attributes is fatally flawed. The problems raised (non-obvious >> mapping between field names with special characters and allowable >> attribute names and also the collision with existing instance >> variable and method names) clearly show it would be forced and >> non-pythonic. > > > +1 Paul, Below, I've appended my response to Francesc's 08:36 message, it was copied to the list but does not appear in the archive. > > It also make it difficult to do the following: > > a = item[:10, ('age', 'surname', 'firstname')] > > where field (or column) 1 is 'firstname, field 2 is 'surname', and > field 10 is 'age'. > > -- Paul Could you clarify what you have in mind here please? Is this a proposed extension to records.py, as it exists in version 1.0? Colin W. ------------------------------------------------------------------------ Yes, if the objective is to include special characters or facilitate multi-lingual columns names and it probably should be, then my suggestion is quite inadequate. Perhaps there could be a simple name -> column number mapping in place of _names. References to a column, or a field in a record, could then be through this dictionary. Basic access to data in a record would be by position number, rather than name, but the dictionary would facilitate access by name. 
Data could be referenced either through the column name: r1.c2[1] or through the record r1[1].c2, with the possibility that the index is multi-dimensional in either case. Colin W. From gerard.vermeulen at grenoble.cnrs.fr Sun Jul 18 14:25:10 2004 From: gerard.vermeulen at grenoble.cnrs.fr (gerard.vermeulen at grenoble.cnrs.fr) Date: Sun Jul 18 14:25:10 2004 Subject: [Numpy-discussion] Follow-up Numarray header PEP In-Reply-To: <1088632459.7526.213.camel@halloween.stsci.edu> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> Message-ID: <20040718212443.M21561@grenoble.cnrs.fr> Hi Todd, This is a follow-up on the 'header pep' discussion. The attachment numnum-0.1.tar.gz contains the sources for the extension modules pep and numnum. At least on my systems, both modules behave as described in the 'numarray header PEP' when the extension modules implementing the C-API are not present (a situation not foreseen by the macros import_array() of Numeric and especially numarray). IMO, my solution is 'bona fide', but requires further testing. The pep module shows how to handle the colliding C-APIs of the Numeric and numarray extension modules and how to implement automagical conversion between Numeric and numarray arrays. For a technical reason explained in the README, the hard work of doing the conversion between Numeric and numarray arrays has been delegated to the numnum module. The numnum module is useful when one needs to convert from one array type to the other to use an extension module which only exists for the other type (eg. combining numarray's image processing extensions with pygame's Numeric interface): Python 2.3+ (#1, Jan 7 2004, 09:17:35) [GCC 3.3.1 (SuSE Linux)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> import numnum; import Numeric as np; import numarray as na
>>> np1 = np.array([[1, 2], [3, 4]]); na1 = numnum.toNA(np1)
>>> na2 = na.array([[1, 2, 3], [4, 5, 6]]); np2 = numnum.toNP(na2)
>>> print type(np1); np1; type(np2); np2
array([[1, 2],
       [3, 4]])
array([[1, 2, 3],
       [4, 5, 6]],'i')
>>> print type(na1); na1; type(na2); na2
array([[1, 2],
       [3, 4]])
array([[1, 2, 3],
       [4, 5, 6]])
>>>

The pep module shows how to implement array processing functions which use the Numeric, numarray or Sequence C-API:

static PyObject *
wysiwyg(PyObject *dummy, PyObject *args)
{
    PyObject *seq1, *seq2;
    PyObject *result;

    if (!PyArg_ParseTuple(args, "OO", &seq1, &seq2))
        return NULL;

    switch(API) {
    case NumericAPI:
    {
        PyObject *np1 = NN_API->toNP(seq1);
        PyObject *np2 = NN_API->toNP(seq2);
        result = np_wysiwyg(np1, np2);
        Py_XDECREF(np1);
        Py_XDECREF(np2);
        break;
    }
    case NumarrayAPI:
    {
        PyObject *na1 = NN_API->toNA(seq1);
        PyObject *na2 = NN_API->toNA(seq2);
        result = na_wysiwyg(na1, na2);
        Py_XDECREF(na1);
        Py_XDECREF(na2);
        break;
    }
    case SequenceAPI:
        result = seq_wysiwyg(seq1, seq2);
        break;
    default:
        PyErr_SetString(PyExc_RuntimeError, "Should never happen");
        return 0;
    }

    return result;
}

See the README for an example session using the pep module showing that it is possible to pass a mix of Numeric and numarray arrays to pep.wysiwyg().

Notes:

- it is straightforward to adapt pep and numnum so that the conversion functions are linked into pep instead of imported.

- numnum is still 'proof of concept'. I am thinking about methods to make those techniques safer if the numarray (and Numeric?) header files never make it into the Python headers (or to make it safer to use those techniques with Python < 2.4). In particular it would be helpful if the numerical C-APIs export an API version number, similar to the versioning scheme of shared libraries -- see the libtool->versioning info pages.
I am considering three possibilities to release a more polished version of numnum (3rd party extension writers may prefer to link rather than import numnum's functionality): 1. release it from PyQwt's project page 2. register an independent numnum project at SourceForge 3. hand numnum over to the Numerical Python project (frees me from worrying about API changes). Regards -- Gerard Vermeulen -------------- next part -------------- A non-text attachment was scrubbed... Name: numnum-0.1.tar.gz Type: application/gzip Size: 12851 bytes Desc: not available URL: From jmiller at stsci.edu Tue Jul 20 05:49:04 2004 From: jmiller at stsci.edu (Todd Miller) Date: Tue Jul 20 05:49:04 2004 Subject: [Numpy-discussion] Follow-up Numarray header PEP In-Reply-To: <20040718212443.M21561@grenoble.cnrs.fr> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040718212443.M21561@grenoble.cnrs.fr> Message-ID: <1090327693.3749.257.camel@localhost.localdomain> On Sun, 2004-07-18 at 17:24, gerard.vermeulen at grenoble.cnrs.fr wrote: > Hi Todd, > > This is a follow-up on the 'header pep' discussion. Great! I was afraid you were going to disappear back into the ether. Sorry I didn't respond to this yesterday... I saw it but accidentally marked it as "read" and then forgot about it as the day went on. > The attachment numnum-0.1.tar.gz contains the sources for the > extension modules pep and numnum. At least on my systems, both > modules behave as described in the 'numarray header PEP' when the > extension modules implementing the C-API are not present (a situation > not foreseen by the macros import_array() of Numeric and especially > numarray). For numarray, this was *definitely* foreseen at some point, so I'm wondering what doesn't work now... 
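Gerard's modules degrade gracefully when the C-API extension modules are absent; at the Python level the analogous move is to try each array package in turn and fall back, which is roughly what matplotlib's "numerix" layer does. A rough illustration (pick_backend is my name for a hypothetical helper, not part of any of these packages):

```python
import importlib

def pick_backend(names):
    """Return the first importable module from names, or None if none of
    them is installed (graceful degradation instead of an ImportError at
    import time)."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None

# e.g. pick_backend(["numarray", "Numeric"]) on a machine with neither
# package installed returns None rather than blowing up on import.
```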
> IMO, my solution is 'bona fide', but requires further > testing. I'll look it over today or tomorrow and comment more then. > The pep module shows how to handle the colliding C-APIs of the Numeric > and numarray extension modules and how to implement automagical > conversion between Numeric and numarray arrays. Nice; the conversion code sounds like a good addition to me. > For a technical reason explained in the README, the hard work of doing > the conversion between Numeric and numarray arrays has been delegated > to the numnum module. The numnum module is useful when one needs to > convert from one array type to the other to use an extension module > which only exists for the other type (eg. combining numarray's image > processing extensions with pygame's Numeric interface): > > Python 2.3+ (#1, Jan 7 2004, 09:17:35) > [GCC 3.3.1 (SuSE Linux)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import numnum; import Numeric as np; import numarray as na > >>> np1 = np.array([[1, 2], [3, 4]]); na1 = numnum.toNA(np1) > >>> na2 = na.array([[1, 2, 3], [4, 5, 6]]); np2 = numnum.toNP(na2) > >>> print type(np1); np1; type(np2); np2 > > array([[1, 2], > [3, 4]]) > > array([[1, 2, 3], > [4, 5, 6]],'i') > >>> print type(na1); na1; type(na2); na2 > > array([[1, 2], > [3, 4]]) > > array([[1, 2, 3], > [4, 5, 6]]) > >>> > > The pep module shows how to implement array processing functions which > use the Numeric, numarray or Sequence C-API: > > static PyObject * > wysiwyg(PyObject *dummy, PyObject *args) > { > PyObject *seq1, *seq2; > PyObject *result; > > if (!PyArg_ParseTuple(args, "OO", &seq1, &seq2)) > return NULL; > > switch(API) { We'll definitely need to cover API in the PEP. There is a design choice here which needs to be discussed some and any resulting consensus documented. I haven't looked at the attachment yet. 
> case NumericAPI: > { > PyObject *np1 = NN_API->toNP(seq1); > PyObject *np2 = NN_API->toNP(seq2); > result = np_wysiwyg(np1, np2); > Py_XDECREF(np1); > Py_XDECREF(np2); > break; > } > case NumarrayAPI: > { > PyObject *na1 = NN_API->toNA(seq1); > PyObject *na2 = NN_API->toNA(seq2); > result = na_wysiwyg(na1, na2); > Py_XDECREF(na1); > Py_XDECREF(na2); > break; > } > case SequenceAPI: > result = seq_wysiwyg(seq1, seq2); > break; > default: > PyErr_SetString(PyExc_RuntimeError, "Should never happen"); > return 0; > } > > return result; > } > > See the README for an example session using the pep module showing that > it is possible pass a mix of Numeric and numarray arrays to pep.wysiwyg(). > > Notes: > > - it is straightforward to adapt pep and numnum so that the conversion > functions are linked into pep instead of imported. > > - numnum is still 'proof of concept'. I am thinking about methods to > make those techniques safer if the numarray (and Numeric?) header > files make it never into the Python headers (or make it safer to > use those techniques with Python < 2.4). In particular it would > be helpful if the numerical C-APIs export an API version number, > similar to the versioning scheme of shared libraries -- see the > libtool->versioning info pages. I've thought about this a few times; there's certainly a need for it in numarray anyway... and I'm always one release too late. Thanks for the tip on libtool->versioning. > I am considering three possibilities to release a more polished > version of numnum (3rd party extension writers may prefer to link > rather than import numnum's functionality): > > 1. release it from PyQwt's project page > 2. register an independent numnum project at SourceForge > 3. hand numnum over to the Numerical Python project (frees me from > worrying about API changes). > > Regards -- Gerard Vermeulen (3) sounds best to me, for the same reason that numarray is a part of the numpy project and because numnum is a Numeric/numarray tool. 
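For reference, the libtool versioning rule mentioned above boils down to exporting a pair (current, age): a library supports every interface number from current - age up to current, and a client built against interface N keeps working as long as N falls in that window. A small sketch of what the check could look like for a numerical C-API version number (hypothetical names; neither numarray nor Numeric exports this today):

```python
def api_compatible(provided, required):
    """provided: the (current, age) pair a C-API module would export.
    required: the interface number a client extension was built against.
    Compatible iff current - age <= required <= current (the libtool rule)."""
    current, age = provided
    return current - age <= required <= current
```

Under this rule, bumping current while resetting age to 0 deliberately breaks old clients, while bumping current and age together keeps them working.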
There is a small issue of sub-project organization (seperate bug tracking, etc.), but I figure if SF can handle Python, it can handle Numeric, numarray, and probably a number of other packages as well. Something like numnum should not be a problem and so to promote it, it would be good to keep it where people can find it without having to look too hard. For now, I'm again marking your post as "unread" and will revisit it later this week. In the meantime, thanks very much for your efforts with numnum and the PEP. Regards, Todd From perry at stsci.edu Tue Jul 20 09:05:02 2004 From: perry at stsci.edu (Perry Greenfield) Date: Tue Jul 20 09:05:02 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story In-Reply-To: Message-ID: We now turn to the behavior of Records. We'll note that many of the current proposals had been considered in the past but not implemented with more of a 'wait and see' attitude towards what was really necessary and a desire to prevent too many ways of doing the same thing without seeing that there was a real call for them. This proposal deals with the behavior of record array 'items', i.e., what we call Record objects now. The primary issues that have been raised with regard to Record behavior are summarized as follows: 1) Items should be tuples instead of Records 2) Items should be objects, but present tuple and/or dictionary consistent behavior. 3) Field (or column) names should be accessible as Record (and record array) attributes. Issue 1: Should record array items be tuples instead of Records? Francesc Alted made this suggestion recently. Essentially the argument is that tuples are a natural way of representing records. Unfortunately, tuples do not provide a means of accessing fields of a record by name, but only by number. For this reason alone, tuples don't appear to be adequate. Francesc proposed allowing dictionary-like indexing to record arrays to facilitate the field access to tuple entries by name. 
However, it seems that if rarr is a record array, both rarr['column 1'][2] and rarr[2]['column 1'] should work, not just the former. So the short answer is "No". It should be noted that using tuples will force another change in current behavior. Note that the current Record objects are actually views into the record array. Changing the value within a record object changes the record array. Use of tuples won't allow that, since tuples are not mutable. Whole records must be changed in their entirety if single elements of record arrays were set by and returned from tuples. But his comments (as well as those of others) do point out a number of problems with the current implementation that could be improved, and making the Record object support tuple behaviors is quite reasonable. Hence:

Issue 2: Should record array items present tuple and/or dictionary compatible behaviors?

The short answer is, yes, we do agree that they should. This includes many of the proposals made, including:

1) supporting all Tuple capabilities with the following differences:

a) fields are mutable (unlike tuple items) so long as the assigned value is coerceable to the expected type. For example, the current methods of doing so are:

>>> cell = oneRec.field(1)
>>> oneRec.setfield(1, newValue)

This proposal would allow:

>>> cell = oneRec[1]
>>> oneRec[1] = newValue

b) slice assignments are permitted so long as they don't change the size of the record (i.e., no insertion of extra items) and the items can be assigned as permitted for (a).
E.g., oneCell[2:4] = (3, 'abc')

c) __str__ will result in a display looking like that for tuples, __repr__ will show a Record constructor:

>>> print oneRec # as is currently implemented
(1.1, 2, 'abc', 3)
>>> oneRec
Record((1.1, 2, 'abc', 3), formats=['1Float32', '1Int16', '1a3', '1Int32'], names=['abc', 'c2', 'xyz', 'c4'])

(note that how best to handle formats is still being thought about)

2) supporting all Dictionary capabilities with the following differences:

a) keys and items are ordered.
b) keys are restricted to being integers or strings only
c) new keys cannot be dynamically added or deleted as for dictionaries
d) no support for any other dictionary capabilities that can change the number or names of items
e) __str__ will not show a result looking like a dictionary (see 1c)
f) values must meet the Record object's required type (or be coerceable to it)

For example, the current:

>>> cell = oneRec.field('c2')
>>> oneRec.setfield('c2', newValue)

And the proposed added indexing capability:

>>> cell = oneRec['c2']
>>> oneRec['c2'] = newValue

Issue 3: Field (or column) names should be accessible as Record (and record array) attributes. As much as the attribute approach has appeal for simple usage, the problems of name collisions and mismatches between acceptable field names and attribute names strike us, as they do Russell Owen, as being very problematic. The technique of using a special attribute as Francesc suggests (in his case, cols) that contains the field name attributes solves the name collision problem, but not the legality issue; particularly with regard to illegal characters, it's hard to imagine easily remembered mappings between legal attribute representations and the actual field names. We are inclined to try to pass (for now anyway) on mapping fields to attributes in any way.
It seems to us that indexing by name should be convenient enough, as well as fully flexible to really satisfy all needs (and it is needed in any case, since attributes are a clumsy way to use field access when using a variable to specify the field -- yes, one can use getattr(), but it's clumsy).

*******************************************

Record array behavior changes:

1) It will be possible to assign any sequence to a record array item, so long as the sequence contains the right number of fields and each item of the sequence can be coerced to what the record array expects for the corresponding field of the record (addressing numarray feature request 928473 by Russell Owen). I.e.,

>>> recArr[1] = (2, 3.2, 'xyz', 3)

2) One may assign a record to a record array so long as the record matches the record format of the record array (current behavior).

3) Easier construction and initialization of recarrays with default field values (as requested in numarray bug report 928479).

4) Support for lists of field names and formats as detailed in numarray bug report 928488.

5) Field name indexing for record arrays. It will be possible to index record arrays with a field name, i.e., if the index is a string, then what will be returned is a numarray/chararray for that column. (Note that it won't be possible to index record arrays by field number for obvious reasons.) I.e. currently

>>> col = recArr.field('doc')

Can also be

>>> col = recArr['abc']

But the current

>>> col = recArr.field(1)

Cannot become

>>> col = recArr[1]

On the other hand, it will not be permitted to mix a field index with an array index in the same brackets, e.g., rarr[10, 'column 2'] will not be supported. Allowing indexing to have two different interpretations is a bit worrying. But if record array items may be indexed in this manner, it seems natural to permit the same indexing for the record array.
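The either-or dispatch described here (a string index selects a field, anything else is ordinary array indexing, and mixing the two in one subscript is rejected) amounts to a type check in __getitem__. A toy model with a plain list-of-tuples store (RecArraySketch is hypothetical, not numarray code):

```python
class RecArraySketch:
    """Toy record array: field names plus a list of row tuples."""
    def __init__(self, names, rows):
        self._names = list(names)            # field names, in order
        self._rows = [tuple(r) for r in rows]
    def __getitem__(self, key):
        if isinstance(key, str):             # field name -> whole column
            i = self._names.index(key)
            return [row[i] for row in self._rows]
        if isinstance(key, tuple) and any(isinstance(k, str) for k in key):
            # e.g. rarr[10, 'column 2'] -- mixed indexing is not supported
            raise TypeError("cannot mix field names with array indices")
        return self._rows[key]               # ordinary (row) indexing
```

So r['c1'] returns a column, r[1] returns a row, and r[1, 'c1'] is rejected.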
Mixing the two kinds of indexing in one index seems of limited usefulness in the first place and it makes inheriting the existing indexing machinery for NDArrays more complicated (any efficiency gains in avoiding the intermediate object creation by using two separate index operations will likely be offset by the slowness of handling much more complicated mixed indices). Perhaps someone can argue for why mixing field indices with array indices is important, but for now we will prohibit this mode of indexing. This does point to a possible enhancement for the field indexing, namely being able to provide the equivalent of index arrays (e.g., a list of field names) to generate a new record array with a subset of fields. Are there any other issues that should be addressed for improving record arrays? From rowen at u.washington.edu Tue Jul 20 10:15:05 2004 From: rowen at u.washington.edu (Russell E Owen) Date: Tue Jul 20 10:15:05 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story In-Reply-To: References: Message-ID: At 12:04 PM -0400 2004-07-20, Perry Greenfield wrote: >...(a detailed summary of proposed changes to numarray record arrays) +1 on all of it with one exception noted below. This sounds like a first-rate overhaul and is much appreciated. Will it be possible, when creating a new records array, to specify types of a record array as a list of normal numarray types? Currently one has to specify the types as a "formats" string, which is nonstandard. I'm unhappy about one proposal: >... >Record array behavior changes: >... >5) Field name indexing for record arrays. It will be possible to index >record arrays with a field name, i.e., if the index is a string, then what >will be returned is a numarray/chararray for that column. (Note that it >won't be possible to index record arrays by field number for obvious >reasons). > >I.e. 
Currently
>
>>>> col = recArr.field('abc')
>
>Can also be
>
>>>> col = recArr['abc']
>
>But the current
>
>>>> col = recArr.field(1)
>
>Cannot become
>
>>>> col = recArr[1]

I think recarray[field name] is too easily confused with recarray[index] and is unnecessary. I suggest one of two solutions:

- Do nothing. Make users use field(field name or index)

or

- Allow access to the fields via an indexable entity. Simplest for the user would be to use "field" itself:

  recArr.field[1]
  recArr.field["abc"]

(i.e. field becomes an object that can be called or can be accessed via __getitem__.) This could easily support index arrays (a topic you brought up and one that sounds appealing to me):

  recArr.field[index array]

and it might even be practical to support:

  recArr.field[sequence of field indices and/or names]

e.g.

  recArr.field[(ind 1, field name 2, ind 3...)]

You asked about other issues. One that comes to mind is record arrays of record arrays. Should they be allowed? My gut reaction is yes if it's not too hard. Folks always seem to find a use for generality if it's offered. On the other hand, if it's hard, it's not worth the effort. If they are allowed, users are going to want some efficient way to get to a particular field (i.e. in one call, even if the field is several recArrays deep). That could get messy.

Thanks for a great posting. The improvements to record arrays sound first-rate.

-- Russell

From hsu at stsci.edu Wed Jul 21 11:53:40 2004 From: hsu at stsci.edu (Jin-chung Hsu) Date: Wed Jul 21 11:53:40 2004 Subject: [Numpy-discussion] formats in record array Message-ID: <200407211850.AOO09987@donner.stsci.edu>

> From: Russell E Owen
> Subject: Re: [Numpy-discussion] Proposed record array behavior: the rest
> of the story
>
> Will it be possible, when creating a new records array, to specify
> types of a record array as a list of normal numarray types? Currently
> one has to specify the types as a "formats" string, which is
> nonstandard.
In theory it is easy to do that except you can't specify cell arrays, i.e. how do you specify the equivalent of:

formats=['3Int16', '(4,5)Float32']

with the numarray type instances?

JC Hsu

From rlw at stsci.edu Wed Jul 21 12:23:07 2004 From: rlw at stsci.edu (Rick White) Date: Wed Jul 21 12:23:07 2004 Subject: [Numpy-discussion] formats in record array In-Reply-To: <200407211850.AOO09987@donner.stsci.edu> Message-ID: On Wed, 21 Jul 2004, Jin-chung Hsu wrote:

> > From: Russell E Owen
> > Subject: Re: [Numpy-discussion] Proposed record array behavior: the rest
> > of the story
> >
> > Will it be possible, when creating a new records array, to specify
> > types of a record array as a list of normal numarray types? Currently
> > one has to specify the types as a "formats" string, which is
> > nonstandard.
>
> In theory it is easy to do that except you can't specify cell arrays, i.e.
> how do you specify the equivalent of:
>
> formats=['3Int16', '(4,5)Float32']
>
> with the numarray type instances?
>
> JC Hsu

Well, how about one (or both) of these:

formats = 3*(Int16,), 4*(5*(Float32,),)
formats = (3,Int16), ((4,5), Float32)

From kyeser at earthlink.net Wed Jul 21 18:19:07 2004 From: kyeser at earthlink.net (Hee-Seng Kye) Date: Wed Jul 21 18:19:07 2004 Subject: [Numpy-discussion] Is there a better way to do this? Message-ID: <16A7C641-DB7D-11D8-A37A-000393479EE8@earthlink.net> My question is not directly related to NumPy, but since many people here deal with numbers, I was wondering if I could get some help; it would be even better if there is a NumPy (or Numarray) function that takes care of what I want! I'm trying to write a program that computes six-digit numbers, in which the left digit is always smaller than its following digit (i.e., it's always ascending).
The best I could do was to have many embedded 'for' statements:

c = 1
for p0 in range(0, 7):
    for p1 in range(1, 12):
        for p2 in range(2, 12):
            for p3 in range(3, 12):
                for p4 in range(4, 12):
                    for p5 in range(5, 12):
                        if p0 < p1 < p2 < p3 < p4 < p5:
                            print repr(c).rjust(3), "\t",
                            print "%X %X %X %X %X %X" % (p0, p1, p2, p3, p4, p5)
                            c += 1
print "...Done"

This works, except that it's very slow. I need to get it up to nine-digit numbers, in which case it's significantly slow. I was wondering if there is a more efficient way to do this. I would highly appreciate it if anyone could help. Many thanks. -Kye

From jcollins_boulder at earthlink.net Wed Jul 21 18:49:10 2004 From: jcollins_boulder at earthlink.net (Jeffery D. Collins) Date: Wed Jul 21 18:49:10 2004 Subject: [Numpy-discussion] Is there a better way to do this? In-Reply-To: <16A7C641-DB7D-11D8-A37A-000393479EE8@earthlink.net> References: <16A7C641-DB7D-11D8-A37A-000393479EE8@earthlink.net> Message-ID: <40FF1D11.8090606@earthlink.net> Hee-Seng Kye wrote:

> My question is not directly related to NumPy, but since many people
> here deal with numbers, I was wondering if I could get some help; it
> would be even better if there is a NumPy (or Numarray) function that
> takes care of what I want!
>
> I'm trying to write a program that computes six-digit numbers, in
> which the left digit is always smaller than its following digit (i.e.,
> it's always ascending). The best I could do was to have many embedded
> 'for' statements:
>
> c = 1
> for p0 in range(0, 7):
>     for p1 in range(1, 12):
>         for p2 in range(2, 12):
>             for p3 in range(3, 12):
>                 for p4 in range(4, 12):
>                     for p5 in range(5, 12):
>                         if p0 < p1 < p2 < p3 < p4 < p5:
>                             print repr(c).rjust(3), "\t",
>                             print "%X %X %X %X %X %X" % (p0, p1, p2, p3, p4, p5)
>                             c += 1
> print "...Done"
>
> This works, except that it's very slow. I need to get it up to
> nine-digit numbers, in which case it's significantly slow. I was
> wondering if there is a more efficient way to do this.
>
> I would highly appreciate it if anyone could help.

This appears to give the same results and is significantly faster.

def vers1():
    c = 1
    for p0 in range(0, 7):
        for p1 in range(p0+1, 12):
            for p2 in range(p1+1, 12):
                for p3 in range(p2+1, 12):
                    for p4 in range(p3+1, 12):
                        for p5 in range(p4+1, 12):
                            print repr(c).rjust(3), "\t",
                            print "%X %X %X %X %X %X" % (p0, p1, p2, p3, p4, p5)
                            c += 1
    print "...Done"

> Many thanks.
>
> -Kye

-- Jeff

From rlw at stsci.edu Wed Jul 21 22:03:03 2004 From: rlw at stsci.edu (Rick White) Date: Wed Jul 21 22:03:03 2004 Subject: [Numpy-discussion] Is there a better way to do this? In-Reply-To: <16A7C641-DB7D-11D8-A37A-000393479EE8@earthlink.net> Message-ID: On Wed, 21 Jul 2004, Hee-Seng Kye wrote:

> I'm trying to write a program that computes six-digit numbers, in which
> the left digit is always smaller than its following digit (i.e., it's
> always ascending).

Here's another version that is a little faster still:

def f3():
    c = 1
    for p0 in range(0, 7):
        for p1 in range(p0+1, 8):
            for p2 in range(p1+1, 9):
                for p3 in range(p2+1, 10):
                    for p4 in range(p3+1, 11):
                        for p5 in range(p4+1, 12):
                            print repr(c).rjust(3), "\t",
                            print "%X %X %X %X %X %X" % (p0, p1, p2, p3, p4, p5)
                            c += 1
    print "...Done"

This is plenty fast even for 9-digit numbers. In fact it gets a little faster for larger numbers of digits. This problem is completely equivalent to the problem of finding all combinations of 6 numbers chosen from the digits 0..11. If you sort the digits of each combination in ascending order, you get your numbers. So if you search for something like "Python permutations combinations" you can find other algorithms that work. Here's a recursive version:

def f4(n, digits=range(12)):
    if n == 0:
        return [[]]
    rv = []
    for i in range(len(digits)):
        for cc in f4(n-1, digits[i+1:]):
            rv.append([digits[i]] + cc)
    return rv

That returns a list of all the number sets having n digits. It's slower than the loop version but is more general.
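Rick's observation that the problem is just "combinations of 6 digits chosen from 0..11" means today's readers can lean on the standard library: itertools.combinations (added to Python well after this thread, in 2.6) yields exactly these ascending digit sequences. A sketch:

```python
from itertools import combinations

def ascending_numbers(ndigits, base=12):
    # Each combination of ndigits distinct digits from 0..base-1 comes out
    # in ascending order, so these are exactly the sequences wanted here.
    return list(combinations(range(base), ndigits))

six = ascending_numbers(6)    # 924 sequences: C(12, 6)
nine = ascending_numbers(9)   # 220 sequences: C(12, 9), fewer, so it's fast
```

Because combinations are emitted with their elements already sorted, no filtering step like `if p0 < p1 < ... < p5` is needed at all.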
There are fast C versions of this sort of thing out there, I think. Rick White

From falted at pytables.org Thu Jul 22 02:47:27 2004 From: falted at pytables.org (Francesc Alted) Date: Thu Jul 22 02:47:27 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story In-Reply-To: References: Message-ID: <200407221146.41319.falted@pytables.org> Hi,

I agree that the numarray team's overhaul of RecArray access modes is very good, and I agree with most of it.

On Tuesday 20 July 2004 19:14, Russell E Owen wrote:

> I think recarray[field name] is too easily confused with
> recarray[index] and is unnecessary.

Yeah, maybe you are right.

> I suggest one of two solutions:
> - Do nothing. Make users use field(field name or index)
> or
> - Allow access to the fields via an indexable entity. Simplest for
> the user would be to use "field" itself:
>   recArr.field[1]
>   recArr.field["abc"]
> (i.e. field becomes an object that can be called or can be accessed
> via __getitem__)

I prefer the second one. Although I know that you don't like the __getattr__ method, the field object can be used to host one. The main advantage I see in having such a __getattr__ method is that I'm very used to pressing TAB twice in the Python console with its completion capabilities activated. It would be a very nice way of interactively discovering the fields of a RecArray object. I don't know whether this feature is used a lot or not out there, but for me it is just great. I understand, however, that having to include a map to support non-valid Python names for field names can be quite inconvenient.

Regards,

-- Francesc Alted

From cjw at sympatico.ca Thu Jul 22 05:22:01 2004 From: cjw at sympatico.ca (Colin J.
Williams) Date: Thu Jul 22 05:22:01 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story In-Reply-To: <200407221146.41319.falted@pytables.org> References: <200407221146.41319.falted@pytables.org> Message-ID: <40FFB132.10103@sympatico.ca> Francesc Alted wrote:

>Hi,
>
>I agree that the numarray team's overhaul of RecArray access modes is very good
>and I agree with most of it.
>
>On Tuesday 20 July 2004 19:14, Russell E Owen wrote:
>
>>I think recarray[field name] is too easily confused with
>>recarray[index] and is unnecessary.
>
>Yeah, maybe you are right.
>
>>I suggest one of two solutions:
>>- Do nothing. Make users use field(field name or index)
>>or
>>- Allow access to the fields via an indexable entity. Simplest for
>>the user would be to use "field" itself:
>>  recArr.field[1]
>>  recArr.field["abc"]
>>(i.e. field becomes an object that can be called or can be accessed
>>via __getitem__)
>
>I prefer the second one. Although I know that you don't like the __getattr__
>method, the field object can be used to host one. The main advantage I see in
>having such a __getattr__ method is that I'm very used to pressing TAB twice in
>the Python console with its completion capabilities activated. It would be a
>very nice way of interactively discovering the fields of a RecArray object.
>I don't know whether this feature is used a lot or not out there, but for me
>it is just great. I understand, however, that having to include a map to
>support non-valid Python names for field names can be quite inconvenient.
>
>Regards,

Perry's issue 3: perhaps there is a need to separate the name or identifier of a column in a RecArray, or of a field in a Record, from its label. The labels, for display purposes, would default to the column names. The column names would default, as at present, to the Cn form. I like the use of attributes for the column names; it avoids the problem Russell Owen mentioned above.
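The TAB-completion behaviour Francesc describes, and the name-collision worry, can be sketched with a tiny wrapper class. This is a hypothetical illustration in modern Python, not numarray's implementation; the Record class, its _fields attribute, and the sample field names are all invented:

```python
class Record:
    """Hypothetical sketch of attribute-style field access."""

    def __init__(self, fields):
        self._fields = dict(fields)   # field name -> value

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, so real
        # attributes and methods always win over field names -- one way
        # to handle the "danger of conflict" with existing attributes.
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name)

    def __dir__(self):
        # Exposing field names here is what lets TAB completion
        # discover them interactively.
        return sorted(self._fields)

rec = Record({'name': 'Ada', 'age': 36})
```

Fields whose names are not valid Python identifiers (Francesc's caveat) remain reachable only through a mapping interface such as rec._fields['column 2'] in this sketch.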
Suppose we have a simple RecArray with the fields "name" and "age"; it's much simpler to write rec.name or rec.age than rec["name"] or rec["age"]. The problems with the use of attributes, which must be Python names, are (1) they cannot contain accented or special characters (e.g. @, &, *, etc.) and (2) there is a danger of conflict with existing properties or attributes. My guess is that the special characters would be required primarily for display purposes. Thus, the label could meet that need. The danger of conflict could be addressed by raising an exception. There remains a possible problem where identifiers are passed on from some other system, perhaps a database. Thus, the primary identifier of a row in a RecArray would be an integer index and that of a column or field would be a standard Python identifier. Although, at times, it would be useful to be able to index the individual fields (or columns) as part of the usual indexing scheme. Thus rec[2, 3, 4] could identify a record, and rec[2, 3, 4].age or rec[2, 3, 4, 5] could identify the sixth field in that record.

The use of attributes raises the possibility that one could have nested records. For example, suppose one has an address record:

addressRecord
    streetNumber
    streetName
    postalCode
    ...

There could then be a personal record:

personRecord
    ...
    officeAddress
    homeAddress
    ...

One could address a component as rec.homeAddress.postalCode.

Finally, there was mention, earlier in the discussion, of facilitating the indexing of a RecArray. I hope that some way will be found to do this.

Colin W.

From kyeser at earthlink.net Thu Jul 22 13:24:06 2004 From: kyeser at earthlink.net (Hee-Seng Kye) Date: Thu Jul 22 13:24:06 2004 Subject: [Numpy-discussion] Is there a better way to do this? In-Reply-To: References: Message-ID: Thanks a lot everyone for suggestions.
On my slow machine (667 MHz), inefficient programs run even slower, and when I expand the program to calculate 9-digit numbers, there is almost a 2-minute difference! Thanks again. Best, Kye

From sag at hydrosphere.com Thu Jul 22 15:34:11 2004 From: sag at hydrosphere.com (sag at hydrosphere.com) Date: Thu Jul 22 15:34:11 2004 Subject: [Numpy-discussion] Unpickling python 2.2 UserArray objs in python 2.3 Message-ID: <40FFF0A2.26467.FBF2E27@localhost> I have a large bunch of objects that subclass UserArray from Numeric 22. These objects were created and pickled in binary mode in Python 2.2 and stored in a mysql database on Red Hat 8. Using Python 2.2, I can easily retrieve and unpickle the objects. I have just upgraded the system to Fedora Core 2, which supplies Python 2.3.3. After much hassle, I have been able to compile Numeric 1.0 (ver 23) and have tried to unpickle these objects. Now, I get a failure in the loads call. The code is:

import cPickle
obj = cPickle.loads(str(blob))

When this is called, the Python interpreter (via IDLE) goes into a loop in the UserArray __getattr__ function (line 198):

    return getattr(self.array,attr)

>> File "/usr/lib/python2.3/site-packages/Numeric/UserArray.py", line 198, in __getattr__
>>     return getattr(self.array,attr)

No other error is reported, just a stack full of these lines. It seems that at this point, UserArray doesn't know that it has an 'array' attr. This worked just fine in Python 2.2. Has something changed in Python 2.3's cPickle functions, or in how Numeric 23 handles pickle/unpickle, that would make my Python 2.2 blobs unusable in Python 2.3? Is there a solution for this, other than remaking my blobs (not an option - there are literally millions of them), or must I figure out how to access Python 2.2 for this code? So far as I can tell, the string I get back is exactly the same for both versions. Any help you can give me would be appreciated.
Thanks sue giller

From kyeser at earthlink.net Fri Jul 23 07:31:07 2004 From: kyeser at earthlink.net (Hee-Seng Kye) Date: Fri Jul 23 07:31:07 2004 Subject: [Numpy-discussion] A bit long, but would appreciate anyone's help, if time permits! Message-ID: Hi. Like my previous post, my question is not directly related to Numpy, but I couldn't help posting it since many people here deal with numbers. I have a question that requires a bit of explanation. I would highly appreciate it if anyone could read this and offer any suggestions, whenever time permits.

I'm trying to write a program that 1) gives all possible rotations of an ordered list, 2) chooses the ordering that has the smallest difference from first to last element of the rotation, and 3) continues to compare the difference from first to second-to-last element, and so on, if there was a tie in step 2.

The following is the output of a function I wrote. The first 6 lines are all possible rotations of [0,1,3,6,7,10], and this takes care of step 1 mentioned above. The last line provides the differences (mod 12). If the last line were denoted as r, r[0] lists the differences from first to last element of each rotation (p0 through p5), r[1] the differences from first to second-to-last element, and so on.

>>> from normal import normal
>>> normal([0,1,3,6,7,10])
[0, 1, 3, 6, 7, 10]  #p0
[1, 3, 6, 7, 10, 0]  #p1
[3, 6, 7, 10, 0, 1]  #p2
[6, 7, 10, 0, 1, 3]  #p3
[7, 10, 0, 1, 3, 6]  #p4
[10, 0, 1, 3, 6, 7]  #p5

[[10, 11, 10, 9, 11, 9], [7, 9, 9, 7, 8, 8], [6, 6, 7, 6, 6, 5], [3, 5, 4, 4, 5, 3], [1, 2, 3, 1, 3, 2]]  #r

Here is my question. I'm having trouble realizing step 2 (and 3, if necessary). In the above case, the smallest number in r[0] is 9, which is present in both r[0][3] and r[0][5]. This means that p3 and p5 and only p3 and p5 need to be further compared.
r[1][3] is 7, and r[1][5] is 8, so the comparison ends here, and the final result I'm looking for is p3, [6,7,10,0,1,3] (the final 'n' value for 'pn' corresponds to the final 'y' value for 'r[x][y]').

How would I find the smallest values of a list r[0], take only those values (r[0][3] and r[0][5]) for further comparison (r[1][3] and r[1][5]), and finally print a p3?

Thanks again for reading this. If there is anything unclear, please let me know.

Best,
Kye

My code begins here:

#normal.py
def normal(s):
    s.sort()
    r = []
    q = []
    v = []

    for x in range(0, len(s)):
        k = s[x:]+s[0:x]
        r.append(k)

    for y in range(0, len(s)):
        print r[y], '\t'
        d = []
        for yy in range(len(s)-1, 0, -1):
            w = (r[y][yy]-r[y][0])%12
            d.append(w)
        q.append(d)

    for z in range(0, len(s)-1):
        d = []
        for zz in range(0, len(s)):
            w = q[zz][z]
            d.append(w)
        v.append(d)
    print '\n', v

From sag at hydrosphere.com Fri Jul 23 10:09:11 2004 From: sag at hydrosphere.com (sag at hydrosphere.com) Date: Fri Jul 23 10:09:11 2004 Subject: [Numpy-discussion] re: Unpickling python 2.2 userArray objs in python 2.3 Message-ID: <4100F5DD.17007.13BB9C82@localhost> I have further information on my problem of unpickling an object that is based on the Numeric.UserArray class. I can recreate the endless getattr loop with the following code, which is a small subsection of my class:

data = Numeric.ones(31, savespace=1)
ua = UserArray(data)
blob = cPickle.dumps(ua)
obj = cPickle.loads(blob)   <-- fails here

If you pickle the data obj, everything works. This code works in Python 2.2. Is this a bug? Is it fixable?
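Sue's endless-loop symptom is a classic __getattr__ pitfall: during unpickling the instance is created without calling __init__, so when pickle probes the new object for methods like __setstate__, a delegating __getattr__ looks up self.array, which doesn't exist yet, which re-enters __getattr__, forever. A minimal sketch of the usual guard, in modern Python with an invented class name (not the actual Numeric UserArray code):

```python
import pickle

class DelegatingArray:
    """Hypothetical stand-in for a UserArray-like wrapper."""

    def __init__(self, array):
        self.array = array

    def __getattr__(self, attr):
        # During unpickling, self.array may not exist yet; delegating
        # unconditionally would recurse until the stack overflows.
        if attr == 'array':
            raise AttributeError(attr)
        return getattr(self.array, attr)

obj = DelegatingArray([1, 2, 3])
restored = pickle.loads(pickle.dumps(obj))   # round-trips with the guard
```

Without the `if attr == 'array'` guard, the round trip can recurse on Python versions where pickle has to probe the instance itself for __setstate__; with it, the failed lookup raises AttributeError and pickle falls back to its default state handling.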
sue From jmiller at stsci.edu Fri Jul 23 10:30:15 2004 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jul 23 10:30:15 2004 Subject: [Numpy-discussion] Follow-up Numarray header PEP In-Reply-To: <20040718212443.M21561@grenoble.cnrs.fr> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040718212443.M21561@grenoble.cnrs.fr> Message-ID: <1090603727.7138.33.camel@halloween.stsci.edu> Hi Gerard, I finally got to your numnum stuff today... awesome work! You've got lots of good suggestions. Here are some comments: 1. Thanks for catching the early return problem with numarray's import_array(). It's not just bad, it's wrong. It'll be fixed for 1.1. 2. That said, I think expanding the macros in-line in numnum is a mistake. It seems to me that "import_array(); PyErr_Clear();" or something like it ought to be enough... after numarray-1.1 anyway. 3. I think there's a problem in numnum.toNP() because of numarray's array "behavior" issues. A test needs to be done to ensure that the incoming array is not byteswapped or misaligned; if it is, the easy fix is to make a numarray copy of the array before copying it to Numeric. 4. Kudos for the LP64 stuff. numconfig is a thorn in the side of the PEP, so I'll put your techniques into numarray for 1.1. HAS_FLOAT128 is not currently used, so it might be time to ditch it. Anyway, thanks! 5. PyArray_Present() and isArray() are superfluous *now*. I was planning to add them to Numeric. 6. The LGPL may be a problem for us and is probably an issue if we ever try to get numnum into the Python distribution. It would be better to release numnum under the modified BSD license, same as numarray. 7. Your API struct was very clean. Eventually I'll regenerate numarray like that. 8. 
I logged your comments and bug reports on Source Forge and eventually they'll get fixed. A to Z the numnum/pep code is beautiful. Next stop, header PEP update. Regards, Todd On Sun, 2004-07-18 at 17:24, gerard.vermeulen at grenoble.cnrs.fr wrote: > Hi Todd, > > This is a follow-up on the 'header pep' discussion. > > The attachment numnum-0.1.tar.gz contains the sources for the > extension modules pep and numnum. At least on my systems, both > modules behave as described in the 'numarray header PEP' when the > extension modules implementing the C-API are not present (a situation > not foreseen by the macros import_array() of Numeric and especially > numarray). IMO, my solution is 'bona fide', but requires further > testing. > > The pep module shows how to handle the colliding C-APIs of the Numeric > and numarray extension modules and how to implement automagical > conversion between Numeric and numarray arrays. > > For a technical reason explained in the README, the hard work of doing > the conversion between Numeric and numarray arrays has been delegated > to the numnum module. The numnum module is useful when one needs to > convert from one array type to the other to use an extension module > which only exists for the other type (eg. combining numarray's image > processing extensions with pygame's Numeric interface): > > Python 2.3+ (#1, Jan 7 2004, 09:17:35) > [GCC 3.3.1 (SuSE Linux)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. 
> >>> import numnum; import Numeric as np; import numarray as na
> >>> np1 = np.array([[1, 2], [3, 4]]); na1 = numnum.toNA(np1)
> >>> na2 = na.array([[1, 2, 3], [4, 5, 6]]); np2 = numnum.toNP(na2)
> >>> print type(np1); np1; type(np2); np2
>
> array([[1, 2],
>        [3, 4]])
>
> array([[1, 2, 3],
>        [4, 5, 6]],'i')
> >>> print type(na1); na1; type(na2); na2
>
> array([[1, 2],
>        [3, 4]])
>
> array([[1, 2, 3],
>        [4, 5, 6]])
> >>>
>
> The pep module shows how to implement array processing functions which
> use the Numeric, numarray or Sequence C-API:
>
> static PyObject *
> wysiwyg(PyObject *dummy, PyObject *args)
> {
>     PyObject *seq1, *seq2;
>     PyObject *result;
>
>     if (!PyArg_ParseTuple(args, "OO", &seq1, &seq2))
>         return NULL;
>
>     switch(API) {
>     case NumericAPI:
>     {
>         PyObject *np1 = NN_API->toNP(seq1);
>         PyObject *np2 = NN_API->toNP(seq2);
>         result = np_wysiwyg(np1, np2);
>         Py_XDECREF(np1);
>         Py_XDECREF(np2);
>         break;
>     }
>     case NumarrayAPI:
>     {
>         PyObject *na1 = NN_API->toNA(seq1);
>         PyObject *na2 = NN_API->toNA(seq2);
>         result = na_wysiwyg(na1, na2);
>         Py_XDECREF(na1);
>         Py_XDECREF(na2);
>         break;
>     }
>     case SequenceAPI:
>         result = seq_wysiwyg(seq1, seq2);
>         break;
>     default:
>         PyErr_SetString(PyExc_RuntimeError, "Should never happen");
>         return 0;
>     }
>
>     return result;
> }
>
> See the README for an example session using the pep module showing that
> it is possible to pass a mix of Numeric and numarray arrays to pep.wysiwyg().
>
> Notes:
>
> - it is straightforward to adapt pep and numnum so that the conversion
>   functions are linked into pep instead of imported.
>
> - numnum is still 'proof of concept'. I am thinking about methods to
>   make those techniques safer if the numarray (and Numeric?) header
>   files never make it into the Python headers (or make it safer to
>   use those techniques with Python < 2.4).
In particular it would > be helpful if the numerical C-APIs export an API version number, > similar to the versioning scheme of shared libraries -- see the > libtool->versioning info pages. > > I am considering three possibilities to release a more polished > version of numnum (3rd party extension writers may prefer to link > rather than import numnum's functionality): > > 1. release it from PyQwt's project page > 2. register an independent numnum project at SourceForge > 3. hand numnum over to the Numerical Python project (frees me from > worrying about API changes). > > > Regards -- Gerard Vermeulen -- From eric at enthought.com Fri Jul 23 10:56:07 2004 From: eric at enthought.com (eric jones) Date: Fri Jul 23 10:56:07 2004 Subject: [Numpy-discussion] ANN: SciPy04 -- Last day for abstracts and early registration! Message-ID: <4101510B.9050005@enthought.com> Hey Group, Just a reminder that this is the last day to submit abstracts for SciPy04. It is also the last day for early registration. More information is here: http://www.scipy.org/wikis/scipy04 About the Conference and Keynote Speaker --------------------------------------------- The 1st annual *SciPy Conference* will be held this year at Caltech, September 2-3, 2004. As some of you may know, we've experienced great participation in two SciPy "Workshops" (with ~70 attendees in both 2002 and 2003) and this year we're graduating to a "conference." With the prestige of a conference comes the responsibility of a keynote address. This year, Jim Hugunin has answered the call and will be speaking to kickoff the meeting on Thursday September 2nd. Jim is the creator of Numeric Python, Jython, and co-designer of AspectJ. Jim is currently working on IronPython--a fast implementation of Python for .NET and Mono. Presenters ----------- We still have room for a few more standard talks, and there is plenty of room for lightning talks. Because of this, we are extending the abstract deadline until July 23rd. 
Please send your abstract to abstracts at scipy.org. Travis Oliphant is organizing the presentations this year. (Thanks!) Once accepted, papers and/or presentation slides are acceptable and are due by August 20, 2004.

Registration
-------------
Early registration ($100.00) has been extended to July 23rd. Follow the links off of the main conference site: http://www.scipy.org/wikis/scipy04 After July 23rd, registration will be $150.00. Registration includes breakfast and lunch Thursday & Friday and a very nice dinner Thursday night. Please register as soon as possible as it will help us in planning for food, room sizes, etc.

Sprints
--------
As of now, we really haven't had much of a call for coding sprints for the 3 days prior to SciPy 04. Below is the original announcement about sprints. If you would like to suggest a topic and see if others are interested, please send a message to the list. Otherwise, we'll forgo the sprints session this year. We're also planning three days of informal "Coding Sprints" prior to the conference -- August 30 to September 1, 2004. Conference registration is not required to participate in the sprints. Please email the list, however, if you plan to attend. Topics for these sprints will be determined via the mailing lists as well, so please submit any suggestions for topics to the scipy-user list: list signup: http://www.scipy.org/mailinglists/ list address: scipy-user at scipy.org

thanks, eric

From cjw at sympatico.ca Sat Jul 24 07:18:04 2004 From: cjw at sympatico.ca (Colin J. Williams) Date: Sat Jul 24 07:18:04 2004 Subject: [Numpy-discussion] A bit long, but would appreciate anyone's help, if time permits! In-Reply-To: References: Message-ID: <41026F91.3090706@sympatico.ca> Hee-Seng Kye wrote:

> Hi. Like my previous post, my question is not directly related to Numpy,

True, but numarray can be of help.

> but I couldn't help posting it since many people here deal with
> numbers. I have a question that requires a bit of explanation.
I > would highly appreciate it if anyone could read this and offer any > suggestions, whenever time permits. > > I'm trying to write a program that 1) gives all possible rotations of > an ordered list, 2) chooses the ordering that has the smallest > difference from first to last element of the rotation, and 3) > continues to compare the difference from first to second-to-last > element, and so on, if there was a tie in step 2. > > The following is the output of a function I wrote. The first 6 lines > are all possible rotations of [0,1,3,6,7,10], and this takes care of > step 1 mentioned above. The last line provides the differences (mod > 12). If the last line were denoted as r, r[0] lists the differences > from first to last element of each rotation (p0 through p5), r[1] the > differences from first to second-to-last element, and so on. > > >>> from normal import normal > >>> normal([0,1,3,6,7,10]) > [0, 1, 3, 6, 7, 10] #p0 > [1, 3, 6, 7, 10, 0] #p1 > [3, 6, 7, 10, 0, 1] #p2 > [6, 7, 10, 0, 1, 3] #p3 > [7, 10, 0, 1, 3, 6] #p4 > [10, 0, 1, 3, 6, 7] #p5 > > [[10, 11, 10, 9, 11, 9], [7, 9, 9, 7, 8, 8], [6, 6, 7, 6, 6, 5], [3, > 5, 4, 4, 5, 3], [1, 2, 3, 1, 3, 2]] #r > > Here is my question. I'm having trouble realizing step 2 (and 3, if > necessary). In the above case, the smallest number in r[0] is 9, > which is present in both r[0][3] and r[0][5]. This means that p3 and > p5 and only p3 and p5 need to be further compared. r[1][3] is 7, and > r[1][5] is 8, so the comparison ends here, and the final result I'm > looking for is p3, [6,7,10,0,1,3] (the final 'n' value for 'pn' > corresponds to the final 'y' value for 'r[x][y]'). > > How would I find the smallest values of a list r[0], take only those > values (r[0][3] and r[0][5]) for further comparison (r[1][3] and > r[1][5]), and finally print a p3? > > Thanks again for reading this. If there is anything unclear, please > let me know. 
>
> Best,
> Kye
>
> My code begins here:
[snip]

The following reproduces your result, but I'm not sure that it does what you want to do.

Best wishes.

Colin W.

# Kye.py
#normal.py
def normal(s):
    s.sort()
    r = []
    q = []
    v = []

    for x in range(0, len(s)):
        k = s[x:]+s[0:x]
        r.append(k)

    for y in range(0, len(s)):
        print r[y], '\t'
        d = []
        for yy in range(len(s)-1, 0, -1):
            w = (r[y][yy]-r[y][0])%12
            d.append(w)
        q.append(d)

    for z in range(0, len(s)-1):
        d = []
        for zz in range(0, len(s)):
            w = q[zz][z]
            d.append(w)
        v.append(d)
    print '\n', v

def findMinima(i, lst):
    global diff
    print 'lst:', lst, 'i:', i
    res = []
    dataRow = diff[i].take(lst)
    fnd = dataRow.argmin()
    val = val0 = dataRow[fnd]
    while val == val0:
        fndRes = lst[fnd]    # This will become the result iff no duplicate found
        res.append(fnd)
        dataRow[fnd] = 100
        fnd = dataRow.argmin()
        val0 = dataRow[fnd]
    if len(res) == 1:
        return fndRes
    else:
        ret = findMinima(i-1, res)
        return ret

def normal1(s):
    import numarray.numarraycore as _num
    import numarray.numerictypes as _nt
    global diff
    s = _num.array(s)
    s.sort()
    rl = len(s)
    r = _num.zeros(shape=(rl, rl), type=_nt.Int)
    for i in range(rl):
        r[i, 0:rl-i] = s[i:]
        if i:
            r[i, rl-i:] = s[0:i]
    subtr = r[0].repeat(5, 1).resize(6, 5)
    subtr.transpose()
    neg = r[1:] < subtr
    diff = r[1:] - subtr + 12 * neg
    return 'The selected rotation is:', r[findMinima(diff._shape[0]-1, range(diff._shape[1]))]

if __name__ == '__main__':
    print normal1([0,1,3,6,7,10])

> #normal.py
> def normal(s):
>     s.sort()
>     r = []
>     q = []
>     v = []
>
>     for x in range(0, len(s)):
>         k = s[x:]+s[0:x]
>         r.append(k)
>
>     for y in range(0, len(s)):
>         print r[y], '\t'
>         d = []
>         for yy in range(len(s)-1, 0, -1):
>             w = (r[y][yy]-r[y][0])%12
>             d.append(w)
>         q.append(d)
>
>     for z in range(0, len(s)-1):
>         d = []
>         for zz in range(0, len(s)):
>             w = q[zz][z]
>             d.append(w)
>         v.append(d)
>     print '\n', v
>
> -------------------------------------------------------
> This SF.Net email is sponsored by BEA Weblogic Workshop
> FREE Java Enterprise J2EE developer tools!
> Get your free copy of BEA WebLogic Workshop 8.1 today. > http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From riiuwjjnivge at yahoo.com Sat Jul 24 08:38:04 2004 From: riiuwjjnivge at yahoo.com (riiuwjjnivge at yahoo.com) Date: Sat Jul 24 08:38:04 2004 Subject: [Numpy-discussion] Hot Stock Newsflash, ARMM expecting Mass|ve M0nday Ga1ns R753KT98 Message-ID: <249974lbl4oi11j$1so1q6g39$95a678wba@airmen.yahoo.com> E.fficiency Technologies, Inc.'s New Centrif.ugal Chiller Efficiency and Management Tool Can He.lp S.ave Industry Bi.llions in Energy C.osts ARMM lau.nch n.ew s.ervice (EffHVAC) D.ont miss this g.reat inves.tment issue! ARMM is another ho.t public tr.aded comp.any that is set to so.ar on Monday, July 26th.. BIG PR camp.aign sta.rting on 26th of July for ARMM - S.t0ck will e.xpl0de - Just read the news --------------------- P.rice on Friday: 10Cents In our o.pinion N.ext 3 days p.otential p.rice: 35Cents In our o.pinion N.ext 10 days p.otential p.rice: 45Cents --------------------- G.et on B.oard with ARMM and e.njoy some i.ncredible p.rofits in the n.ext 3-10 days_!_! ALL T.ECHNICAL I.NDICATORS SAY - B.U.Y ARMM @ up to 35cents! Significant short term t.rading p.rofits in ARMM are being p.redicted, great n.ews a.lready issued by the c.ompany and big PR c.ampaign on the way in the n.ext few days. C.OMPANY P.ROFILE --------------> American Resource Management, Inc., through its w.holly-owned s.ubsidiary, E.fficiency T.echnologies, Inc. ("EffTec") is a Tulsa, Oklahoma based c.ompany d.edicated to developing energy efficiency m.onitoring programs for c.ommercial/i.ndustrial HVAC systems principally made up of c.entrifugal chillers and boilers. 
From kyeser at earthlink.net Sun Jul 25 04:25:14 2004 From: kyeser at earthlink.net (Hee-Seng Kye) Date: Sun Jul 25 04:25:14 2004 Subject: [Numpy-discussion] Permutation in Numpy Message-ID: <3DC9B4D2-DE2D-11D8-A7E1-000393479EE8@earthlink.net>

#perm.py
def perm(k):
    # Compute the list of all permutations of k
    if len(k) <= 1:
        return [k]
    r = []
    for i in range(len(k)):
        s = k[:i] + k[i+1:]
        p = perm(s)
        for x in p:
            r.append(k[i:i+1] + x)
    return r

Does anyone know if there is a built-in function in Numpy (or Numarray) that does the above task faster (computes the list of all permutations of a list, k)? Or is there a way to make the above function run faster using Numpy?

I'm asking because I need to create a very large list which contains all permutations of range(12), in which case there would be 12! permutations. I created a file test.py:

#!/usr/bin/env python
from perm import perm
print perm(range(12))

And ran the program:

$ ./test.py >> list.txt

The program ran for about 90 minutes and was still running on my machine (667 MHz PowerPC G4, 512 MB SDRAM) until I quit the process as I was getting nervous (and impatient). I would highly appreciate anyone's suggestions.
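An aside on the question above (my addition, not from the thread): the bottleneck is less the per-permutation work than materializing all 12! = 479,001,600 lists at once, which exhausts memory long before the loop finishes. A generator version of the same recursion yields permutations one at a time in constant memory; later Python versions ship this logic in C as itertools.permutations.

```python
import math

def iter_perm(k):
    # Same recursion as perm(), but yields each permutation lazily
    # instead of accumulating all of them in one list.
    if len(k) <= 1:
        yield list(k)
        return
    for i in range(len(k)):
        rest = k[:i] + k[i+1:]
        for tail in iter_perm(rest):
            yield k[i:i+1] + tail

print(math.factorial(12))                         # 479001600 permutations in total
print(sum(1 for _ in iter_perm(list(range(6)))))  # 720 for a small case
```

Writing each permutation to a file as it is produced (rather than printing one giant list) keeps memory flat, though iterating 4.79e8 items will still take a long time on any hardware.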
Many thanks,
Kye

From gerard.vermeulen at grenoble.cnrs.fr Sun Jul 25 22:49:12 2004 From: gerard.vermeulen at grenoble.cnrs.fr (gerard.vermeulen at grenoble.cnrs.fr) Date: Sun Jul 25 22:49:12 2004 Subject: [Numpy-discussion] Follow-up Numarray header PEP In-Reply-To: <1090603727.7138.33.camel@halloween.stsci.edu> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040718212443.M21561@grenoble.cnrs.fr> <1090603727.7138.33.camel@halloween.stsci.edu> Message-ID: <20040726050416.M83815@grenoble.cnrs.fr>

Hi Todd,

Attached is a new version of numnum (including 'topbot', an alternative implementation of numnum). The README contains some additional comments with respect to numarray and Numeric (new comments are preceded by '+', old comments by '-'). There were still some other bugs in numnum, too.

On 23 Jul 2004 13:28:47 -0400, Todd Miller wrote
> I finally got to your numnum stuff today... awesome work! You've got
> lots of good suggestions. Here are some comments:
>
> 1. Thanks for catching the early return problem with numarray's
> import_array(). It's not just bad, it's wrong. It'll be fixed for 1.1.
>
> 2. That said, I think expanding the macros in-line in numnum is a
> mistake. It seems to me that "import_array(); PyErr_Clear();" or
> something like it ought to be enough... after numarray-1.1 anyway.

Indeed, but I am spoiled by C++ and was falling back on gcc -E for debugging.

> 3. I think there's a problem in numnum.toNP() because of numarray's
> array "behavior" issues. A test needs to be done to ensure that the
> incoming array is not byteswapped or misaligned; if it is, the easy
> fix is to make a numarray copy of the array before copying it to Numeric.

Done, but what would be the best function to do this?
And the documentation could insist a little more on the possibility of ill-behaved arrays (see README).

> 4. Kudos for the LP64 stuff. numconfig is a thorn in the side of the
> PEP, so I'll put your techniques into numarray for 1.1.
> HAS_FLOAT128 is not currently used, so it might be time to ditch
> it. Anyway, thanks!

There is a difference between the PEP header files and internal numarray usage. I find in my CVS working copy:

[packer at slow numarray]$ grep HAS_FLOAT */*
Src/_ndarraymodule.c:#if HAS_FLOAT128

and

[packer at slow numarray]$ grep HAS_UINT64 */*
Src/buffer.ch:    #if HAS_UINT64
Src/buffer.ch:    #if HAS_UINT64
Src/buffer.ch:    #if HAS_UINT64
Src/buffer.ch:    #if HAS_UINT64
Src/buffer.ch:    #if HAS_UINT64
Src/libnumarraymodule.c:    #if HAS_UINT64
Src/libnumarraymodule.c:    #if HAS_UINT64
Src/libnumarraymodule.c:    #if HAS_UINT64
Src/libnumarraymodule.c:    #if HAS_UINT64
Src/libnumarraymodule.c:    #if HAS_UINT64

but that is not true for the header files (more important for the PEP):

[packer at slow Include]$ grep HAS_UINT64 */*
[packer at slow Include]$ grep HAS_FLOAT128 */*
numarray/arraybase.h:#if HAS_FLOAT128

> 5. PyArray_Present() and isArray() are superfluous *now*. I was
> planning to add them to Numeric.
>
> 6. The LGPL may be a problem for us and is probably an issue if we ever
> try to get numnum into the Python distribution. It would be better
> to release numnum under the modified BSD license, same as numarray.

Done, with certain regrets because I believe in (L)GPL. The minutes of the last board meeting of the PSF tipped the scale ( http://www.python.org/psf/records/board/minutes-2004-06-18.html ).

What remains to be done is showing how to add numnum's functionality to a 3rd party extension by linking numnum's object files to the extension instead of importing numnum's C-API (numnum should not become another dependency).

Gerard

> 7. Your API struct was very clean. Eventually I'll regenerate numarray
> like that.
>
> 8.
I logged your comments and bug reports on Source Forge and eventually
> they'll get fixed.
>
> A to Z the numnum/pep code is beautiful. Next stop, header PEP update.
>
> Regards,
> Todd
>
> On Sun, 2004-07-18 at 17:24, gerard.vermeulen at grenoble.cnrs.fr wrote:
> > Hi Todd,
> >
> > This is a follow-up on the 'header pep' discussion.
> >
> > The attachment numnum-0.1.tar.gz contains the sources for the
> > extension modules pep and numnum. At least on my systems, both
> > modules behave as described in the 'numarray header PEP' when the
> > extension modules implementing the C-API are not present (a situation
> > not foreseen by the macros import_array() of Numeric and especially
> > numarray). IMO, my solution is 'bona fide', but requires further
> > testing.
> >
> > The pep module shows how to handle the colliding C-APIs of the Numeric
> > and numarray extension modules and how to implement automagical
> > conversion between Numeric and numarray arrays.
> >
> > For a technical reason explained in the README, the hard work of doing
> > the conversion between Numeric and numarray arrays has been delegated
> > to the numnum module. The numnum module is useful when one needs to
> > convert from one array type to the other to use an extension module
> > which only exists for the other type (eg. combining numarray's image
> > processing extensions with pygame's Numeric interface):
> >
> > Python 2.3+ (#1, Jan 7 2004, 09:17:35)
> > [GCC 3.3.1 (SuSE Linux)] on linux2
> > Type "help", "copyright", "credits" or "license" for more information.
> > >>> import numnum; import Numeric as np; import numarray as na
> > >>> np1 = np.array([[1, 2], [3, 4]]); na1 = numnum.toNA(np1)
> > >>> na2 = na.array([[1, 2, 3], [4, 5, 6]]); np2 = numnum.toNP(na2)
> > >>> print type(np1); np1; type(np2); np2
> >
> > array([[1, 2],
> >        [3, 4]])
> >
> > array([[1, 2, 3],
> >        [4, 5, 6]],'i')
> > >>> print type(na1); na1; type(na2); na2
> >
> > array([[1, 2],
> >        [3, 4]])
> >
> > array([[1, 2, 3],
> >        [4, 5, 6]])
> > >>>
> >
> > The pep module shows how to implement array processing functions which
> > use the Numeric, numarray or Sequence C-API:
> >
> > static PyObject *
> > wysiwyg(PyObject *dummy, PyObject *args)
> > {
> >     PyObject *seq1, *seq2;
> >     PyObject *result;
> >
> >     if (!PyArg_ParseTuple(args, "OO", &seq1, &seq2))
> >         return NULL;
> >
> >     switch(API) {
> >     case NumericAPI:
> >     {
> >         PyObject *np1 = NN_API->toNP(seq1);
> >         PyObject *np2 = NN_API->toNP(seq2);
> >         result = np_wysiwyg(np1, np2);
> >         Py_XDECREF(np1);
> >         Py_XDECREF(np2);
> >         break;
> >     }
> >     case NumarrayAPI:
> >     {
> >         PyObject *na1 = NN_API->toNA(seq1);
> >         PyObject *na2 = NN_API->toNA(seq2);
> >         result = na_wysiwyg(na1, na2);
> >         Py_XDECREF(na1);
> >         Py_XDECREF(na2);
> >         break;
> >     }
> >     case SequenceAPI:
> >         result = seq_wysiwyg(seq1, seq2);
> >         break;
> >     default:
> >         PyErr_SetString(PyExc_RuntimeError, "Should never happen");
> >         return 0;
> >     }
> >
> >     return result;
> > }
> >
> > See the README for an example session using the pep module showing that
> > it is possible to pass a mix of Numeric and numarray arrays to pep.wysiwyg().
> >
> > Notes:
> >
> > - it is straightforward to adapt pep and numnum so that the conversion
> >   functions are linked into pep instead of imported.
> >
> > - numnum is still 'proof of concept'. I am thinking about methods to
> >   make those techniques safer if the numarray (and Numeric?) header
> >   files never make it into the Python headers (or make it safer to
> >   use those techniques with Python < 2.4).
In particular it would > > be helpful if the numerical C-APIs export an API version number, > > similar to the versioning scheme of shared libraries -- see the > > libtool->versioning info pages. > > > > I am considering three possibilities to release a more polished > > version of numnum (3rd party extension writers may prefer to link > > rather than import numnum's functionality): > > > > 1. release it from PyQwt's project page > > 2. register an independent numnum project at SourceForge > > 3. hand numnum over to the Numerical Python project (frees me from > > worrying about API changes). > > > > > > Regards -- Gerard Vermeulen > > -- -- Open WebMail Project (http://openwebmail.org) -------------- next part -------------- A non-text attachment was scrubbed... Name: numnum-0.2.tar.gz Type: application/gzip Size: 19729 bytes Desc: not available URL: From perry at stsci.edu Mon Jul 26 08:44:06 2004 From: perry at stsci.edu (Perry Greenfield) Date: Mon Jul 26 08:44:06 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: <40FFB132.10103@sympatico.ca> Message-ID: I'll try to see if I can address all the comments raised (please let me know if I missed something).

1) Russell Owen asked that indexing by field name not be permitted for record arrays and at least one other agreed. Since it is easier to add something like this later rather than take it away, I'll go along with that. So while it will be possible to index a Record by field name, it won't be for record arrays.

2) Russell asked if it would be possible to specify the types of the fields using numarray/chararray type objects. Yes, it will. We will adopt Rick White's 2nd suggestion for handling fields that themselves are arrays, i.e.,

formats = (3,Int16), ((4,5), Float32)

for a 1-d Int16 cell of shape (3,) and a 2-d Float32 cell of shape (4,5). The first suggestion ("formats = 3*(Int16,), 4*(5*(Float32,),)") will not be supported.
While it is very suggestive, it does allow for inconsistent nestings that must be checked and rejected (what if someone supplies (Int16, Int16, Float32) as one of the fields?), which complicates the code. It doesn't read as well.

3) Russell also suggested nesting record arrays. This sort of capability is not being ruled out, but there isn't a chance we can devote resources to this any time soon (can anyone else?)

4) To address the suggestions of Russell and Francesc, I'm proposing that the current "field" method now become an object (callable to retain backward compatibility) that supports:
a) indexing by name or number (just like Records)
b) name to attribute mapping (with restrictions).
So this means 3 ways to do things! As far as attribute access goes, I simply do not want to throw arbitrary attributes into the main object itself. The use of field is comparatively clean since it has no other public attributes. Aside from mapping '_' into spaces, no other illegal attribute characters will be mapped. (The identifier/label suggestion by Colin Williams has some merit, but on the whole, I think it brings more baggage than benefit.) The mapping algorithm is such that it tries to map the attribute to any field name that has either a ' ' or '_' in the place of '_' in the attribute name. While all '_' in the name will take precedence over any other match, there will be no guaranteed order for other cases (e.g., 'x_y z' vs 'x y_z' vs 'x y z'; though 'x_y_z' would be guaranteed to be selected for field.x_y_z if present). Note that the only real need to support indexing, other than consistency, is to support slices. Only slices for numerical indexing will be supported (and not initially). The callable syntax can support index arrays just as easily.
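[Editor's sketch] The mapping rule described above can be written down in a few lines of present-day Python. This is a toy reconstruction, not numarray code: the function name and the brute-force candidate search are invented here; the only behavior taken from the proposal is that the all-underscore spelling wins and other matches carry no guaranteed order.

```python
import itertools

def resolve_field(attr, field_names):
    # Each '_' in the attribute may stand for either '_' or ' ' in the
    # field name; an exact (all-underscore) match takes precedence.
    if attr in field_names:
        return attr
    positions = [i for i, c in enumerate(attr) if c == '_']
    for combo in itertools.product('_ ', repeat=len(positions)):
        candidate = list(attr)
        for pos, ch in zip(positions, combo):
            candidate[pos] = ch
        candidate = ''.join(candidate)
        if candidate in field_names:
            return candidate
    raise AttributeError("no field matches %r" % attr)

print(resolve_field('home_address', ['home address', 'id']))  # -> home address
```

For example, resolve_field('x_y_z', ['x_y_z', 'x y z']) returns 'x_y_z' because the literal spelling takes precedence; which of 'x_y z' and 'x y_z' would win otherwise is deliberately left unspecified by the proposal.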
To summarize

Rarr.field.home_address
Rarr.field['home address']
Rarr.field('home address')

Will all work for a field named "home address"

************************************************ Any comments on these changes to the proposal? Are there those that are opposed to supporting attribute access? Thanks, Perry From rowen at u.washington.edu Mon Jul 26 09:40:06 2004 From: rowen at u.washington.edu (Russell E Owen) Date: Mon Jul 26 09:40:06 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: References: Message-ID: At 11:43 AM -0400 2004-07-26, Perry Greenfield wrote: >I'll try to see if I can address all the comments raised (please let me know >if I missed something). >...(nice proposal elided)... >Any comments on these changes to the proposal? Are there those that are >opposed to supporting attribute access? Overall this sounds great. However, I am still strongly against attribute access. Attributes are usually meant for names that are intrinsic to the design of an object, not to the user's "configuration" of the object. The name mapping proposal isn't bad (thank you for keeping it simple!), but it still feels like a kludge and it adds unnecessary clutter. Your explanation of these limitations was clear, but still, imagine putting that into the manual. It's a lot of "be careful of this" info. That's a red flag to me. Imagine all the folks who don't read carefully. Also imagine those who consider attribute access "the right way to do it" and so want to clean up the limitations. I think you'll see a steady stream of: "why can't I see my field..." "why can't you solve the collision problems" "why can't I use special character thus and so" I personally feel that when a feature is hard to document or adds strange limitations then it probably suggests a flawed design. In this case there is another mechanism that is more natural, has no funny corner cases, and is much more powerful.
Its only disadvantage is the need to type 4 extra characters. Saving 4 characters is simply not sufficient reason to add this dubious feature. Before implementing attribute access I have two suggestions (which can be taken singly or together):
- Postpone the decision until after the rest of the proposal is implemented. See if folks are happy with the mechanisms that are available. I freely confess to hoping that momentum will then kill the idea.
- Discuss it on comp.lang.py. I'd like to see it aired more widely before being adopted.
So far I've seen just a few voices for it and a few others against it. I realize it's not a democracy -- those who write the code get the final say. I also realize some folks will always want it, but that tension between simplicity and expressiveness is intrinsic to any language. If you add everything anybody wants you get a mess, and I want to avoid this mess while we still can. I hope nobody takes offense. I certainly did not mean to imply that those who wish attribute access are inferior in any way. There are features of python I wish it had that will never occur. I honestly can see the appeal of attributes; I was in favor of them myself, early on. It adds an appealing expressiveness that makes some kinds of code read more naturally. But I personally feel it has too many limitations and is unnecessary. Regards, -- Russell From falted at pytables.org Mon Jul 26 11:12:18 2004 From: falted at pytables.org (Francesc Alted) Date: Mon Jul 26 11:12:18 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: References: Message-ID: <200407262011.33067.falted@pytables.org> Hi, Perry, your last proposal sounds good to me. Just a couple of comments.
A Dilluns 26 Juliol 2004 17:43, Perry Greenfield va escriure: > 4) To address the suggestions of Russell and Francesc, I'm proposing that > the current "field" method now become an object (callable to retain backward > compatibility) that supports: > a) indexing by name or number (just like Records) > b) name to attribute mapping (with restrictions). > So that this means 3 ways to do things! As far as attribute access goes, I > simply do not want to throw arbitrary attributes into the main object > itself. The use of field is comparatively clean since it has no other > public attributes. Aside from mapping '_' into spaces, no other illegal > attribute characters will be mapped. (The identifier/label suggestion by > Colin Williams has some merit, but on the whole, I think it brings more > baggage than benefit). The mapping algorithm is such that it tries to map > the attribute to any field name that has either a ' ' or '_' in the place of > '_' in the attribute name. While all '_' in the name will take precedence > over any other match, there will be no guaranteed order for other cases > (e.g., 'x_y z' vs 'x y_z' vs 'x y z'; though 'x_y_z' would be guaranteed to > be selected for field.x_y_z if present) I guess that this mapping algorithm is weak enough to create some problems with special chars that are not supported. I'd prefer the dictionary/tuple-of-pairs mechanism in order to create a user-configured translation. I don't see the problem that Perry mentioned in an earlier message related to guaranteeing the persistence of such an object: we always have pickle, don't we? Or am I missing something? > To summarize > > Rarr.field.home_address > Rarr.field['home address'] > Rarr.field('home address') Supporting Rarr.field['home address'] and Rarr.field('home address') at the same time sounds unnecessary to me. Moreover having a Rarr.field('home_address')[32] (for example) looks a bit strange, and I think Rarr.field['home_address'][32] would be better.
But I repeat, this is my personal feeling. I know that dropping support of __call__() in field will make the change backward incompatible, but perhaps now is a good time to define a better interface to the RecArray object. Another possibility may be to raise a deprecation warning for such a use for a couple of releases. Regards, -- Francesc Alted From barrett at stsci.edu Mon Jul 26 11:25:09 2004 From: barrett at stsci.edu (Paul Barrett) Date: Mon Jul 26 11:25:09 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: References: Message-ID: <41054B5E.8010801@stsci.edu> Russell E Owen wrote: > At 11:43 AM -0400 2004-07-26, Perry Greenfield wrote: > >> I'll try to see if I can address all the comments raised (please let >> me know >> if I missed something). >> ...(nice proposal elided)... >> Any comments on these changes to the proposal? Are there those that are >> opposed to supporting attribute access? > > > Overall this sounds great. > > However, I am still strongly against attribute access. > > Attributes are usually meant for names that are intrinsic to the design > of an object, not to the user's "configuration" of the object. The name > mapping proposal isn't bad (thank you for keeping it simple!), but it > still feels like a kludge and it adds unnecessary clutter. > > Your explanation of these limitations was clear, but still, imagine > putting that into the manual. It's a lot of "be careful of this" info. > That's a red flag to me. Imagine all the folks who don't read carefully. > Also imagine those who consider attribute access "the right way to do > it" and so want to clean up the limitations. I think you'll see a steady > stream of: > "why can't I see my field..." > "why can't you solve the collision problems" > "why can't I use special character thus and so" > > I personally feel that when a feature is hard to document or adds > strange limitations then it probably suggests a flawed design.
> > In this case there is another mechanism that is more natural, has no > funny corner cases, and is much more powerful. Its only disadvantage is > the need to type 4 extra characters. Saving 4 characters is simply > not sufficient reason to add this dubious feature. > > Before implementing attribute access I have two suggestions (which can > be taken singly or together): > - Postpone the decision until after the rest of the proposal is > implemented. See if folks are happy with the mechanisms that are > available. I freely confess to hoping that momentum will then kill the > idea. > - Discuss it on comp.lang.py. I'd like to see it aired more widely > before being adopted. So far I've seen just a few voices for it and a > few others against it. I realize it's not a democracy -- those who write > the code get the final say. I also realize some folks will always want > it, but that tension between simplicity and expressiveness is intrinsic > to any language. If you add everything anybody wants you get a mess, and > I want to avoid this mess while we still can. > > I hope nobody takes offense. I certainly did not mean to imply that > those who wish attribute access are inferior in any way. There are > features of python I wish it had that will never occur. I honestly can > see the appeal of attributes; I was in favor of them myself, early on. > It adds an appealing expressiveness that makes some kinds of code read > more naturally. But I personally feel it has too many limitations and is > unnecessary. That pretty much sums up my opinion.
:) -- Paul -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Branch FAX: 410-338-4767 Baltimore, MD 21218 From falted at pytables.org Mon Jul 26 11:29:19 2004 From: falted at pytables.org (Francesc Alted) Date: Mon Jul 26 11:29:19 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: References: Message-ID: <200407262028.41129.falted@pytables.org> A Dilluns 26 Juliol 2004 18:38, Russell E Owen va escriure: > In this case there is another mechanism that is more natural, has no Well, I guess that depends on what you understand as "natural". For example, for me the "natural" way is adding attributes. However, I must recognize that my point of view could be biased because this can be far more advantageous in the context of large hierarchies of objects where you must specify the complete path to go somewhere. This is typical of software for treating XML documents or any kind of hierarchical data organization system. For a relatively plain structure like RecArray I can understand that this can be regarded as unnecessary. But nevertheless, its adoption continues to sound appealing to me. Anyway, I'd be happy with any decision (regarding field attribute adoption) that is made. > I hope nobody takes offense. I certainly did not mean to imply that Not at all. Discussing is a good (the best?) way to learn more :) -- Francesc Alted From rowen at u.washington.edu Mon Jul 26 11:30:01 2004 From: rowen at u.washington.edu (Russell E Owen) Date: Mon Jul 26 11:30:01 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: <200407262011.33067.falted@pytables.org> References: <200407262011.33067.falted@pytables.org> Message-ID: At 8:11 PM +0200 2004-07-26, Francesc Alted wrote: >... >Supporting Rarr.field['home address'] and Rarr.field('home address') at the >same time sounds unnecessary to me.
Moreover having a >Rarr.field('home_address')[32] (for example) looks a bit strange, and I >think Rarr.field['home_address'][32] would be better. But I repeat, this is >my personal feeling. > >I know that dropping support of __call__() in field will make the change >backward incompatible, but perhaps now is a good time to define a better >interface to the RecArray object. Another possibility may be to raise a >deprecation warning for such a use for a couple of releases. I completely agree. -- Russell From rlw at stsci.edu Mon Jul 26 11:45:11 2004 From: rlw at stsci.edu (Rick White) Date: Mon Jul 26 11:45:11 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: Message-ID: On Mon, 26 Jul 2004, Russell E Owen wrote: > Overall this sounds great. > > However, I am still strongly against attribute access. > > [...] > > In this case there is another mechanism that is more natural, has no > funny corner cases, and is much more powerful. Its only disadvantage > is the need to type 4 extra characters. Saving 4 characters > is simply not sufficient reason to add this dubious feature. I am sympathetic with Russell's point of view on this, but I do think there is more to gain than just typing 4 additional characters. When you read code that is using the dictionary version of attributes, you also are required to read and mentally parse those 4 additional characters. There is value to having clean, easily readable code that goes well beyond saving a little extra typing. If we didn't care about that, we'd probably all be using Perl. :-) Also, I like to use tab-completion during my interactive use of Python. I know how to make that work with attributes, even dynamically created attributes like those for record arrays. And it is really nice to be able to hit tab and have it fill in a name or give a list of all the available columns.
Doing that with the string/dictionary approach could be possible, I guess, but it is a lot trickier. So I do think there are some good reasons for wanting attribute access. Whether they are strong enough to counter Russell's sensible arguments about not cluttering up the interface and documentation, I'm not sure. My personal preference would be to get rid of the mapping between blanks and underscore and to do no mapping of any kind. Then if a column has a name that maps to a legal Python variable, you can access it with an attribute, and if it doesn't then you can't. That doesn't sound particularly hard to understand or explain to me. Rick From hsu at stsci.edu Mon Jul 26 13:40:04 2004 From: hsu at stsci.edu (Jin-chung Hsu) Date: Mon Jul 26 13:40:04 2004 Subject: [Numpy-discussion] plot dense and large arrays, AGG limit? Message-ID: <200407262039.APA12769@donner.stsci.edu> One would expect the following to fill up the plot window:

>>> n=zeros(20000)
>>> n[::2]=1
>>> plot(n)

The plot "stops" a little more than half way, as if it "runs out of ink". It happens on Linux as well as Solaris, using either numarray or Numeric, and both TkAgg and GTKAgg, but not GTK. Is this due to some AGG limitation? JC Hsu From cjw at sympatico.ca Mon Jul 26 14:42:01 2004 From: cjw at sympatico.ca (Colin J. Williams) Date: Mon Jul 26 14:42:01 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: References: Message-ID: <41057A71.40707@sympatico.ca> Russell E Owen wrote: > At 11:43 AM -0400 2004-07-26, Perry Greenfield wrote: > >> I'll try to see if I can address all the comments raised (please let >> me know >> if I missed something). >> ...(nice proposal elided)... >> Any comments on these changes to the proposal? Are there those that are >> opposed to supporting attribute access? > > > Overall this sounds great. > > However, I am still strongly against attribute access.
> > Attributes are usually meant for names that are intrinsic to the > design of an object, not to the user's "configuration" of the object. Russell, I hope that you will elaborate this distinction between design and usage. On the face of it, I would have thought that the two should be closely related. > The name mapping proposal isn't bad (thank you for keeping it > simple!), but it still feels like a kludge and it adds unnecessary > clutter. > > Your explanation of these limitations was clear, but still, imagine > putting that into the manual. It's a lot of "be careful of this" info. > That's a red flag to me. Imagine all the folks who don't read > carefully. Also imagine those who consider attribute access "the right > way to do it" and so want to clean up the limitations. I think you'll > see a steady stream of: > "why can't I see my field..." > "why can't you solve the collision problems" > "why can't I use special character thus and so" > > I personally feel that when a feature is hard to document or adds > strange limitations then it probably suggests a flawed design. > > In this case there is another mechanism that is more natural, has no > funny corner cases, and is much more powerful. Its only disadvantage > is the need to type 4 extra characters. Saving 4 characters > is simply not sufficient reason to add this dubious feature. > > Before implementing attribute access I have two suggestions (which can > be taken singly or together): > - Postpone the decision until after the rest of the proposal is > implemented. See if folks are happy with the mechanisms that are > available. I freely confess to hoping that momentum will then kill the > idea. > - Discuss it on comp.lang.py. I'd like to see it aired more widely > before being adopted. So far I've seen just a few voices for it and a > few others against it. I realize it's not a democracy -- those who > write the code get the final say.
I also realize some folks will > always want it, but that tension between simplicity and expressiveness > is intrinsic to any language. If you add everything anybody wants you > get a mess, and I want to avoid this mess while we still can. There is merit to this suggestion. It would expose the proposal to other experiences. > > > I hope nobody takes offense. I certainly did not mean to imply that > those who wish attribute access are inferior in any way. There are > features of python I wish it had that will never occur. I honestly can > see the appeal of attributes; I was in favor of them myself, early on. > It adds an appealing expressiveness that makes some kind of code read > more naturally. But I personally feel it has too many limitations and > is unnecessary. > > Regards, > > -- Russell Perry Greenfield summarized:

Rarr.field.home_address
Rarr.field['home address']
Rarr.field('home address')

Will all work for a field named "home address"

This is good; it gives the desired functionality. One minor suggestion. We have Rarr.field.home_address; I believe that, in an earlier posting, someone suggested that field.home_address really identifies a column rather than a field. Suppose that home_address is field number 6 in the record. Would Rarr.field[6] be equivalent to the above? This may appear redundant, but it gives a method for selecting a group of columns, e.g. Rarr.field[6:9]. Finally, would Rarr.field.home_address.city or Rarr.field.work_address.city be legitimate? As Russell Owen pointed out, at the end of the day Perry Greenfield will use his judgement as to the best arrangement and we will all live with it. Colin W, > > ------------------------------------------------------- > This SF.Net email is sponsored by BEA Weblogic Workshop > FREE Java Enterprise J2EE developer tools! > Get your free copy of BEA WebLogic Workshop 8.1 today.
> http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From Fernando.Perez at colorado.edu Mon Jul 26 18:19:10 2004 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Mon Jul 26 18:19:10 2004 Subject: [Numpy-discussion] ANN: IPython 0.6.1 is officially out Message-ID: <4105AD66.6030002@colorado.edu> [Please forgive the cross-post, but since I know many scipy/numpy users are also ipython users, and this is a fairly significant update, I decided it was worth doing it.] Hi all, I've just officially uploaded IPython 0.6.1. Many thanks to all who contributed comments, bug reports, ideas and patches. I'd like in particular to thank Ville Vainio, who helped a lot with many of the features for pysh, and was willing to put code in front of his ideas. As always, a big Thank You goes to Enthought and the Scipy crowd for hosting ipython and all its attending support services (bug tracker, mailing lists, website and downloads, etc). The download location, as usual, is: http://ipython.scipy.org/dist A detailed NEWS file can be found here: http://ipython.scipy.org/NEWS, so I won't repeat it. I will only mention the highlights of this release compared to 0.6.0:

* BACKWARDS-INCOMPATIBLE CHANGE: Users will need to update their ipythonrc files and replace '%n' with '\D' in their prompt_in2 settings everywhere. Sorry, but there's otherwise no clean way to get all prompts to properly align. The ipythonrc shipped with IPython has been updated.

* 'pysh' profile, which allows you to use ipython as a system shell. This includes mechanisms for easily capturing shell output into python strings and lists, and for expanding python variables back to the shell. It is started, like all profiles, with 'ipython -p pysh'.
The following is a brief example of the possibilities:

planck[~/test]|3> $$a=ls *.py
planck[~/test]|4> type(a)
              <4>
planck[~/test]|5> for f in a:
               |.>     if f.startswith('e'):
               |.>         wc -l $f
               |.>
113 error.py
9 err.py
2 exit2.py
10 exit.py

You can get the necessary profile into your ~/.ipython directory by running 'ipython -upgrade', or by copying it from the IPython/UserConfig directory (ipythonrc-pysh). Note that running -upgrade will rename your existing config files to prevent clobbering them with new ones. This feature had been long requested by many users, and it's at last officially part of ipython.

* Improved the @alias mechanism. It is now based on a fast, lightweight dictionary implementation, which was a requirement for making the pysh functionality possible. A new pair of magics, @rehash and @rehashx, allow you to load ALL of your $PATH into ipython as aliases at runtime.

* New plot2 function added to the Gnuplot support module, to plot dictionaries and lists/tuples of arrays. Also added automatic EPS generation to hardcopy().

* History is now profile-specific.

* New @bookmark magic to keep a list of directory bookmarks for quick navigation.

* New mechanism for profile-specific persistent data storage. Currently only the new @bookmark system uses it, but it can be extended to hold arbitrary picklable data in the future.

* New @system_verbose magic to view all system calls made by ipython.

* For Windows users: all this functionality now works under Windows, but some external libraries are required. Details here: http://ipython.scipy.org/doc/manual/node2.html#sub:Under-Windows

* Fix bugs with '_' conflicting with the gettext library.

* Many, many other bugfixes and minor enhancements. See the NEWS file linked above for the full details.

Enjoy, and please report any problems. Best, Fernando Perez. From cjw at sympatico.ca Tue Jul 27 11:22:27 2004 From: cjw at sympatico.ca (Colin J.
Williams) Date: Tue Jul 27 11:22:27 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: References: <41057A71.40707@sympatico.ca> Message-ID: <41069D3A.5090903@sympatico.ca> Russell E Owen wrote: > At 5:41 PM -0400 2004-07-26, Colin J. Williams wrote: > >> Russell E Owen wrote: >> >>> At 11:43 AM -0400 2004-07-26, Perry Greenfield wrote: >>> >>>> I'll try to see if I can address all the comments raised (please >>>> let me know >>>> if I missed something). >>>> ...(nice proposal elided)... >>>> Any comments on these changes to the proposal? Are there those >>>> that are >>>> opposed to supporting attribute access? >>> >>> >>> >>> Overall this sounds great. >>> >>> However, I am still strongly against attribute access. >>> >>> Attributes are usually meant for names that are intrinsic to the >>> design of an object, not to the user's "configuration" of the object. >> >> >> Russell, I hope that you will elaborate this distinction between >> design and usage. On the face of it, I would have thought that the >> two should be closely related. > > > To my mind, the design of an object describes the intended behavior of > the object: what kind of data can it deal with and what should it do > to that data. It tends to be "static" in the sense that it is not a > function of how the object is created or what data is contained in the > object. The design of the object usually drives the choice of the > attributes of the object (variables and methods). > > On the other hand, the user's "configuration" of the object is what > the user has done to make a particular instance of an object unique -- > the data the user has loaded into the object. > > I consider the particular named fields of a record array to fall into > the latter category. But it is a gray area. Somebody else might argue > that the record array constructor is an object factory, turning out > an object designed by the user.
> From that alternative perspective, > adding attributes to represent field names is perhaps more natural as > a design. > > I think the main issues are: > - Are there too many ways to address things? (I say yes) This could be true. I guess the test is whether there is a rational justification for each way. > > - Field name mapping: there is no trivial 1:1 mapping between valid > field names and valid attribute names. If one starts with the assumption that field/attribute names are compatible with Python names, then I don't see that this is a problem. The question has been raised as to whether a wider range of names should be permitted, e.g., including such characters as ~`()!???. My view is that such characters should be considered acceptable for data labels, but not for data names; i.e., they are for display, not for manipulation. > > - Nested access. Not sure about this one, but I'd like to hear more. A RecArray is made up of a number of records, each of the same length and data configuration. Each field of a record is of fixed length and type. It wouldn't be a big leap to permit another record in one of the fields. Suppose we have an address record aRec and a personnel record pRec and that rArr is an array of pRec.

aRec
    street: a30
    city: a20
    postalCode: a7

pRec
    id: i4
    firstName: a15
    lastName: a20
    homeAddress: aRec
    workAddress: aRec

Then rArr[16].homeAddress.city could give us the home city for person 16 in rArr. > > > If we do end up with attributes for field names, I really like Rick > White's suggestion of adding an attribute for a field only if the > field name is already a valid attribute name. That neatly avoids the > collision issue and is simple to document. > > -- Russell Best wishes, Colin W.
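[Editor's sketch] The nested access discussed above (rArr[16].homeAddress.city), combined with Rick White's identifier-only rule, can be mocked up with plain classes. This is a toy model only: the Record class, its field() method, and the sample data are all invented here, not numarray API.

```python
class Record:
    """Fields live in a dict; valid-identifier names double as attributes."""
    def __init__(self, fields):
        self._fields = dict(fields)

    def field(self, name):
        # Works for any field name, including ones containing spaces.
        return self._fields[name]

    def __getattr__(self, name):
        # Attribute access can only ever be attempted with a legal Python
        # name, so no name-mapping scheme is needed (Rick White's rule);
        # fields like 'home phone' must go through field().
        try:
            return self._fields[name]
        except KeyError:
            raise AttributeError(name)

home = Record({'street': '12 Main St', 'city': 'Baltimore', 'postalCode': '21218'})
person = Record({'id': 16, 'homeAddress': home, 'home phone': '555-0100'})

print(person.homeAddress.city)      # nested attribute access -> Baltimore
print(person.field('home phone'))   # non-identifier field name -> 555-0100
```

The design point it illustrates: with attributes restricted to names that are already legal identifiers, there are no collisions to resolve and nothing extra to document, while field() remains the general escape hatch.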
> > From falted at pytables.org Tue Jul 27 11:48:00 2004 From: falted at pytables.org (Francesc Alted) Date: Tue Jul 27 11:48:00 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: <41069D3A.5090903@sympatico.ca> References: <41069D3A.5090903@sympatico.ca> Message-ID: <200407272046.52761.falted@pytables.org> A Dimarts 27 Juliol 2004 20:21, Colin J. Williams va escriure: > If one starts with the assumption that field/attribute names are > compatible with Python names, then I don't see that this is a problem. > The question has been raised as to whether a wider range of names should > be permitted e.g.. including such characters as ~`()!???. My view is > that such characters should be considered acceptable for data labels, > but not for data names. i.e. they are for display, not for manipulation. I finally was able to see your point. You mean that naming a field with a non-python identifier would be forbidden, and provide another attribute (like 'title', for example) in case the user wants to add some kind of data label. Kind of: records.array([...], names=["c1","c2","c3"], titles=["F one","time&dime","??"]) and have a new attribute called "titles" that keeps this info. Well, I think that would be a very nice solution IMO. -- Francesc Alted From gerard.vermeulen at grenoble.cnrs.fr Tue Jul 27 13:05:06 2004 From: gerard.vermeulen at grenoble.cnrs.fr (gerard.vermeulen at grenoble.cnrs.fr) Date: Tue Jul 27 13:05:06 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: <200407272046.52761.falted@pytables.org> References: <41069D3A.5090903@sympatico.ca> <200407272046.52761.falted@pytables.org> Message-ID: <20040727191434.M48392@grenoble.cnrs.fr> On Tue, 27 Jul 2004 20:46:52 +0200, Francesc Alted wrote > A Dimarts 27 Juliol 2004 20:21, Colin J. 
Williams va escriure: > > If one starts with the assumption that field/attribute names are > > compatible with Python names, then I don't see that this is a problem. > > The question has been raised as to whether a wider range of names should > > be permitted e.g.. including such characters as ~`()!???. My view is > > that such characters should be considered acceptable for data labels, > > but not for data names. i.e. they are for display, not for manipulation. > > I finally was able to see your point. You mean that naming a field > with a non-python identifier would be forbidden, and provide another > attribute > (like 'title', for example) in case the user wants to add some kind > of data label. Kind of: > > records.array([...], names=["c1","c2","c3"], titles=["F one", > "time&dime","??"]) > > and have a new attribute called "titles" that keeps this info. > > Well, I think that would be a very nice solution IMO. > I agree with Rick, Colin and Francesc on this point: symbolic names are important and I like the commandline completion too. However, I have another concern: Introducing recordArray["column"] as an alternative for recordArray.field("column") breaks a symmetry between for instance 1-d record arrays and 2-d normal arrays. (the symmetry is strongly suggested by their representation: a record array prints almost as a list of tuples and a 2-d normal array almost as a list of lists). Indexing a column of a 2-d normal array is done by normalArray[:, column], so why not recArray[:, "column"] ? It removes the ambiguity between indexing with integers and with strings. Also, leaving the indices in 'natural' order becomes especially important when one envisages (record) arrays containing (record) arrays containing .... 
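[Editor's note: Gerard's recArray[:, "column"] idea boils down to dispatching on the index tuple. A toy sketch, with a list of dicts standing in for a record array — illustrative only, not a numarray patch:]

```python
class ToyRecArray:
    """Toy record array: index as t[row] for a record, or t[rows, "field"] for a column."""
    def __init__(self, records):
        self._recs = list(records)

    def __getitem__(self, key):
        if isinstance(key, tuple):           # t[rows, "field"] form
            rows, field = key
            if isinstance(rows, slice):      # whole-column (or sliced-column) access
                return [r[field] for r in self._recs[rows]]
            return self._recs[rows][field]   # single cell
        return self._recs[key]               # plain t[row]

t = ToyRecArray([{"c1": 1, "c2": "a"}, {"c1": 2, "c2": "b"}])
print(t[:, "c1"])    # column access, like recArray[:, "column"]
print(t[1, "c2"])    # single cell, like recArray[32, "column"]
```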
I understand that this seems to open the door to recArray[32, "column"], but if it is really not feasible to mix integers and strings (or attribute names) as indices, I prefer to use recordArray.column[32] and/or recordArray[32].column rather than recordArray["column"][32]. Even indexing with integers only seems more natural to me than e.g. recordArray["column"][32], since I can always do: column = 7 recordArray[32, column] Regards -- Gerard From rowen at u.washington.edu Tue Jul 27 13:44:02 2004 From: rowen at u.washington.edu (Russell E Owen) Date: Tue Jul 27 13:44:02 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: <41057A71.40707@sympatico.ca> References: <41057A71.40707@sympatico.ca> Message-ID: At 5:41 PM -0400 2004-07-26, Colin J. Williams wrote: >Russell E Owen wrote: > >> At 11:43 AM -0400 2004-07-26, Perry Greenfield wrote: >> >>> I'll try to see if I can address all the comments raised (please >>>let me know >>> if I missed something). >>> ...(nice proposal elided)... >>> Any comments on these changes to the proposal? Are there those that are >>> opposed to supporting attribute access? >> >> >> Overall this sounds great. >> >> However, I am still strongly against attribute access. >> >> Attributes are usually meant for names that are intrinsic to the >>design of an object, not to the user's "configuration" of the >>object. > >Russell, I hope that you will elaborate this distinction between >design and usage. On the face of it, I would have though that the >two should be closely related. To my mind, the design of an object describes the intended behavior of the object: what kind of data can it deal with and what should it do to that data. It tends to be "static" in the sense that it is not a function of how the object is created or what data is contained in the object. The design of the object usually drives the choice of the attributes of the object (variables and methods).
On the other hand, the user's "configuration" of the object is what the user has done to make a particular instance of an object unique -- the data the user has loaded into the object. I consider the particular named fields of a record array to fall into the latter category. But it is a gray area. Somebody else might argue that the record array constructor is an object factory, turning out an object designed by the user. From that alternative perspective, adding attributes to represent field names is perhaps more natural as a design. I think the main issues are: - Are there too many ways to address things? (I say yes) - Field name mapping: there is no trivial 1:1 mapping between valid field names and valid attribute names. - Nested access. Not sure about this one, but I'd like to hear more. If we do end up with attributes for field names, I really like Rick White's suggestion of adding an attribute for a field only if the field name is already a valid attribute name. That neatly avoids the collision issue and is simple to document. -- Russell From falted at pytables.org Wed Jul 28 03:01:23 2004 From: falted at pytables.org (Francesc Alted) Date: Wed Jul 28 03:01:23 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: <20040727191434.M48392@grenoble.cnrs.fr> References: <200407272046.52761.falted@pytables.org> <20040727191434.M48392@grenoble.cnrs.fr> Message-ID: <200407281200.41748.falted@pytables.org> A Dimarts 27 Juliol 2004 22:04, gerard.vermeulen at grenoble.cnrs.fr va escriure: > Introducing recordArray["column"] as an alternative for > recordArray.field("column") breaks a symmetry between for instance 1-d > record arrays and 2-d normal arrays. (the symmetry is strongly suggested > by their representation: a record array prints almost as a list of tuples > and a 2-d normal array almost as a list of lists).
> > Indexing a column of a 2-d normal array is done by normalArray[:, column], > so why not recArray[:, "column"] ? Well, I must recognize that this has its beauty (by revealing the symmetry that you mentioned). However, mixing integers and strings in indices can be, in my opinion, rather confusing for most people. Besides, I guess that the implementation wouldn't be easy. > I prefer to use > > recordArray.column[32] > > and/or > > recordArray[32].column > > rather than recordArray["column"][32]. I would prefer: recordArray.fields.column[32] or recordArray.cols.column[32] (note the use of the plural in fields and cols, which I think is more consistent with its functionality) The problem with: recordArray[32].fields.column is that I don't see it as natural and besides, completion capabilities would be broken after the [] parenthesis. Anyway, as Russell suggested, I don't like recordArray["column"][32], because it would be unnecessary (you can get the same result using recordArray[column_idx][32]). Although I recognize that a recordArray.cols["column"][32] would not hurt my eyes so much. This is because although indices continue to mix ints and strings, the difference is that ".cols" is placed first, giving a new (and unmistakable) meaning to the "column" index.
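[Editor's note: Francesc's recordArray.cols.column[32] spelling amounts to a small accessor object hung off the array. A hypothetical sketch of how such a `cols` attribute could work — illustrative only:]

```python
class Cols:
    """Expose each field both as cols["name"] and, via __getattr__, as cols.name."""
    def __init__(self, data):
        self._data = data                    # field name -> column values

    def __getitem__(self, name):             # cols["column"]
        return self._data[name]

    def __getattr__(self, name):             # cols.column (tab-completable)
        try:
            return self._data[name]
        except KeyError:
            raise AttributeError(name)

class ToyRecArray:
    def __init__(self, data):
        self.cols = Cols(data)

r = ToyRecArray({"column": list(range(100))})
print(r.cols.column[32])      # attribute style
print(r.cols["column"][32])   # item style, same underlying data
```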
Cheers, -- Francesc Alted From gerard.vermeulen at grenoble.cnrs.fr Wed Jul 28 07:00:11 2004 From: gerard.vermeulen at grenoble.cnrs.fr (Gerard Vermeulen) Date: Wed Jul 28 07:00:11 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: <200407281200.41748.falted@pytables.org> References: <200407272046.52761.falted@pytables.org> <20040727191434.M48392@grenoble.cnrs.fr> <200407281200.41748.falted@pytables.org> Message-ID: <20040728155908.28cc135e.gerard.vermeulen@grenoble.cnrs.fr> On Wed, 28 Jul 2004 12:00:40 +0200 Francesc Alted wrote: > A Dimarts 27 Juliol 2004 22:04, gerard.vermeulen at grenoble.cnrs.fr va escriure: > > Introducing recordArray["column"] as an alternative for > > recordArray.field("column") breaks a symmetry between for instance 1-d > > record arrays and 2-d normal arrays. (the symmetry is strongly suggested > > by their representation: a record array prints almost as a list of tuples > > and a 2-d normal array almost as a list of lists). > > > > Indexing a column of a 2-d normal array is done by normalArray[:, column], > > so why not recArray[:, "column"] ? > > Well, I must recognize that this has its beauty (by revealing the simmetry > that you mentioned). However, mixing integer and strings on indices can > be, in my opinion, rather confusing for most people. Then, I guess that > the implementation wouldn't be easy. > > > I prefer to use > > > > recordArray.column[32] > > > > and/or > > > > recordArray[32].column > > > > rather than recordArray["column"][32]. > > I would prefer better: > > recordArray.fields.column[32] > > or > > recordArray.cols.column[32] > > (note the use of the plural in fields and cols, which I think is more > consistent about its functionality) > > The problem with: > > recordArray[32].fields.column > > is that I don't see it as natural and besides, completion capabilities > would be broken after the [] parenthesis. > Two points: 1. 
This is true for vanilla Python but not for IPython-0.6.2: packer at zombie:~> ipython Python 2.3+ (#1, Jan 7 2004, 09:17:35) Type "copyright", "credits" or "license" for more information. IPython 0.6.2 -- An enhanced Interactive Python. ? -> Introduction to IPython's features. @magic -> Information about IPython's 'magic' @ functions. help -> Python's own help system. object? -> Details about 'object'. ?object also works, ?? prints more. In [1]: d = {'Francesc': 0} In [2]: d['Francesc'].__a d['Francesc'].__abs__ d['Francesc'].__add__ d['Francesc'].__and__ In [2]: d['Francesc'].__a You see, the completion mechanism of ipython recognizes d['Francesc'] as an integer. 2. If one accepts that a "field_name" can be used as an attribute, one must be able to say: record.field_name ( == record.field("field_name") ) and (since recordArray[32] returns a record) also: recordArray[32].field_name and not recordArray[32].cols.field_name (sorry, I abhor this) > > Anyway, as Russell suggested, I don't like recordArray["column"][32], > because it would be unnecessary (you can get same result using > recordArray[column_idx][32]). > Thank you for this little slip, you mean recordArray["column"][32] is recordArray[32][column_idx], isn't it? > > Although I recognize that a recordArray.cols["column"][32] would not hurt > my eyes so much. This is because although indices continues to mix ints > and strings, the difference is that ".cols" is placed first, giving a new > (and unmistakable) meaning to the "column" index. > I am just worried that future generalization of indexing will be impossible if the meaning of an indexing operation ("get row" or "get column or field") depends on whether an index is a string or an integer: IMO the meaning should depend on the position in the index list. The example has been chosen to show that I don't mind indexing by strings at all.
If I see array[13, 'ab', 31, 'ba'], I know that 'ab' and 'ba' index record fields as long as the indices are in 'normal' order. Nevertheless, I am aware that Utopia may be hard to implement efficiently, but this reflects my mental picture of nested (record) arrays. (ipython in Utopia would allow me to figure out array[13].ab[31].ba by tab completion and I would translate this to array[13, 'ab', 31, 'ba'] for efficiency in a real program) I think that we agree that recordArray.cols["column"] is better than recordArray["column"], but I don't see why recordArray.cols["column"] is better than the original recordArray.field("column"). Cheers -- Gerard PS: after reading the above, there may be a case to accept only indexing which can be read from left to right, so recordArray[32].field_name is OK, but recordArray.field_name[32] is not. From falted at pytables.org Wed Jul 28 11:16:12 2004 From: falted at pytables.org (Francesc Alted) Date: Wed Jul 28 11:16:12 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: <20040728155908.28cc135e.gerard.vermeulen@grenoble.cnrs.fr> References: <200407281200.41748.falted@pytables.org> <20040728155908.28cc135e.gerard.vermeulen@grenoble.cnrs.fr> Message-ID: <200407282015.48875.falted@pytables.org> A Dimecres 28 Juliol 2004 15:59, Gerard Vermeulen va escriure: > Two points: > > 1. This is true for vanilla Python but not for IPython-0.6.2: > You see, the completion mechanism of ipython recognizes d['Francesc'] as an > integer. Ok. That's nice. IPython is more powerful than I realized :) > 2.
If one accepts that a "field_name" can be used as an attribute, > one must be able to say: > > record.field_name ( == record.field("field_name") ) > > and (since recordArray[32] returns a record) also: > > recordArray[32].field_name > > and not > > recordArray[32].cols.field_name (sorry, I abhor this) Mmm, maybe you are suggesting that the records.Record class have all its methods start with a reserved prefix (like "_" or better, "_v_" for attrs and "_f_" for methods), and forbidding field names that start with these prefixes, so that no collision problems with field names would occur? Well, in such a case adopting this convention for records.Record objects would be far more feasible than doing the same for records.RecArray objects just because the former has very few attrs and methods. I think it's a good idea overall. > > Anyway, as Russell suggested, I don't like recordArray["column"][32], > > because it would be unnecessary (you can get same result using > > recordArray[column_idx][32]). > > > > Thank you for this little slip, you mean recordArray["column"][32] is > recordArray[32][column_idx], isn't it? Uh, my bad. I was (badly) trying to express the same as Russell Owen in a message dated 20th July: """ I think recarray[field name] is too easily confused with recarray[index] and is unnecessary. """ > I think that we agree that recordArray.cols["column"] is better than > recordArray["column"], but I don't see why recordArray.cols["column"] is > better than the original recordArray.field("column"). Good question. Me neither. Are you proposing just keeping recordArray.cols.column as the only way to access columns? > PS: after reading the above, there may be a case to accept only indexing > which can be read from left to right, so > recordArray[32].field_name is OK, but recordArray.field_name[32] is not. Sorry, I don't see the point here (it is most probably my fault given the hours I'm writing this :(. Could you elaborate on that?
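[Editor's note: Gerard's left-to-right reading of mixed indices (his array[13, 'ab', 31, 'ba'] example, quoted above) can be sketched with a small resolver over nested toy data, where integers pick rows and strings pick fields. Illustrative only, not a real implementation:]

```python
def resolve(obj, indices):
    """Walk a mixed index tuple left to right: ints select rows, strings select fields."""
    for idx in indices:
        obj = obj[idx]          # lists handle int indices, dicts handle field names
    return obj

# nested toy data: 20 records whose 'ab' field holds 40 sub-records with a 'ba' field
data = [{"ab": [{"ba": (r, s)} for s in range(40)]} for r in range(20)]

print(resolve(data, (13, "ab", 31, "ba")))   # row 13, field 'ab', row 31, field 'ba'
```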
Cheers, -- Francesc Alted From perry at stsci.edu Wed Jul 28 15:02:04 2004 From: perry at stsci.edu (Perry Greenfield) Date: Wed Jul 28 15:02:04 2004 Subject: FW: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: Message-ID: I guess I've seen enough discussion to try to refine the last delta into what is the last (or next to last) version: So here are the changes to the last updated proposal: 1) I originally intended to narrow attribute access to strictly legal names as Rick White suggested but something got into me to try to handle spaces. I agree with Rick on this. I see that as a very simple rule to remember and don't see it as confusing to allow this. 2) Attribute access still won't be permitted directly on record arrays or records. I'm very much in agreement with Francesc that "fields" is more suggestive than "field" as the name of the record and record array attribute that permits both indexing and attribute access by name. The use of the field method will remain, but will eventually be deprecated. As to other names, namely cols, I'll stick with fields since it started with that usage, and because field is a more appropriate term when dealing with multidimensional record arrays (columns is much more suggestive of simple tables). Non-changes: 3) It will not be possible to index record arrays by column name. So Rarr["column 1"] will not be permitted, but Rarr.fields["column 1"] will. Nor will Rarr[32, "column 1"] be permitted. 4) As for optional labels (for display purposes) I'd like to hold off. I would like to have only one way to associate a name with a field and until it is clearer what extra record array functionality would be associated with labels, I'd rather not include them. Even then, I'm not sure I want to see too much more dragged in (e.g., units, display formats, etc.) These sorts of things may be more appropriate for a subclass.
I realize that no single person will be happy with these choices, but they seem to me to be the best compromise without unduly complicating things, restricting future enhancements, and being too hard to implement. Has anything fallen through the cracks? So what follows is an updated version of what I last sent out: ****************************************************************** 1) Russell Owen asked that indexing by field name not be permitted for record arrays and at least one other agreed. Since it is easier to add something like this later rather than take it away, I'll go along with that. So while it will be possible to index a Record by field name, it won't be for record arrays. 2) Russell asked if it would be possible to specify the types of the fields using numarray/chararray type objects. Yes, it will. We will adopt Rick White's 2nd suggestion for handling fields that themselves are arrays, I.e., formats = (3,Int16), ((4,5), Float32) For a 1-d Int16 cell of shape (3,) and a 2-d Float32 cell of shape (4,5) The first suggestion ("formats = 3*(Int16,), 4*(5*(Float32,),)") will not be supported. While it is very suggestive, it does allow for inconsistent nestings that must be checked and rejected (what if someone supplies (Int16, Int16, Float32) as one of the fields?) which complicates the code. It doesn't read as well. 3) Russell also suggested nesting record arrays. This sort of capability is not being ruled out, but there isn't a chance we can devote resources to this any time soon (can anyone else?) 4) To address the suggestions of Russell and Francesc, I'm proposing that a new attribute "fields" be added that allows: a) indexing by name or number (just like Records) b) names as attributes so long as the name is allowable as a legal attribute. No attempt will be made to map names that are not legal attribute strings into a different attribute name. The field method will remain and be eventually deprecated.
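[Editor's note: a hypothetical sketch of the `fields` accessor behavior Perry describes in point 4 — indexable by field name or number, with an attribute created only when the field name is a legal Python identifier. This is illustrative, not the actual numarray implementation.]

```python
import keyword

class Fields:
    def __init__(self, names, columns):
        self._names = list(names)                  # field order, for numeric indexing
        self._cols = dict(zip(self._names, columns))
        for name in self._names:                   # attribute only for legal identifiers
            if name.isidentifier() and not keyword.iskeyword(name):
                setattr(self, name, self._cols[name])

    def __getitem__(self, key):
        if isinstance(key, int):                   # Rarr.fields[0]
            key = self._names[key]
        return self._cols[key]                     # Rarr.fields["home address"]

f = Fields(["intensity", "home address"],
           [[1.0, 2.0], ["12 Oak St", "1 King St"]])
print(f.intensity)          # legal name: attribute access works
print(f["home address"])    # any name: item access works
print(f[0])                 # numeric indexing, same as f["intensity"]
```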
Note that the only real need to support indexing other than consistency is to support slices. Only slices for numerical indexing will be supported (and not initially). The callable syntax can support index arrays just as easily. To summarize: Rarr.fields['home address'] and Rarr.field('home address') will both work for a field named "home address", but this field cannot be specified as an attribute of Rarr.fields. If there is a field named "intensity" then Rarr.fields.intensity will be permitted. From cookedm at physics.mcmaster.ca Wed Jul 28 16:06:03 2004 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Jul 28 16:06:03 2004 Subject: [Numpy-discussion] Permutation in Numpy In-Reply-To: <3DC9B4D2-DE2D-11D8-A7E1-000393479EE8@earthlink.net> References: <3DC9B4D2-DE2D-11D8-A7E1-000393479EE8@earthlink.net> Message-ID: <20040728230558.GA28651@arbutus.physics.mcmaster.ca> On Sun, Jul 25, 2004 at 07:24:49AM -0400, Hee-Seng Kye wrote: > #perm.py > def perm(k): > # Compute the list of all permutations of k > if len(k) <= 1: > return [k] > r = [] > for i in range(len(k)): > s = k[:i] + k[i+1:] > p = perm(s) > for x in p: > r.append(k[i:i+1] + x) > return r > > Does anyone know if there is a built-in function in Numpy (or Numarray) > that does the above task faster (computes the list of all permutations > of a list, k)? Or is there a way to make the above function run faster > using Numpy? > > I'm asking because I need to create a very large list which contains > all permutations of range(12), in which case there would be 12! > permutations. I created a file test.py: Do you really need a *list* of all those permutations? Think about it: 12! is about 0.5 billion, which is about as much RAM as your machine has. Each permutation is going to be a list taking 20 bytes of overhead plus 4 bytes per entry, so 68 bytes per permutation. You need 32 GB of RAM to store that. You probably want to just be able to access them in order, so a generator is a better bet.
That way, you're only storing the current permutation instead of all of them. Something like def perm(k): k = tuple(k) lk = len(k) if lk <= 1: yield k else: for i in range(lk): s = k[:i] + k[i+1:] t = (k[i],) for x in perm(s): yield t + x Then: for p in perm(range(12)): print p (I'm using tuples instead of lists as that gives better performance here.) For n = 9, your code takes 9.4 s on my machine. The above takes 3 s, and will scale with n (n=12 should take 3s * 10*11*12= 1.1 h). Your original code won't scale with n, as more and more time will be taken up reallocating the list of permutations. We can get fancier and unroll it a bit more: def perm(k): k = tuple(k) lk = len(k) if lk <= 1: yield k elif lk == 2: yield k yield (k[1], k[0]) elif lk == 3: k0, k1, k2 = k yield k yield (k0, k2, k1) yield (k1, k0, k2) yield (k1, k2, k0) yield (k2, k0, k1) yield (k2, k1, k0) else: for i in range(lk): s = k[:i] + k[i+1:] t = (k[i],) for x in perm(s): yield t + x This takes 1.3 s for n = 9 on my machine. Hope this helps. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From kyeser at earthlink.net Wed Jul 28 17:18:46 2004 From: kyeser at earthlink.net (Hee-Seng Kye) Date: Wed Jul 28 17:18:46 2004 Subject: [Numpy-discussion] Permutation in Numpy In-Reply-To: <20040728230558.GA28651@arbutus.physics.mcmaster.ca> References: <3DC9B4D2-DE2D-11D8-A7E1-000393479EE8@earthlink.net> <20040728230558.GA28651@arbutus.physics.mcmaster.ca> Message-ID: <7B005A28-E0F4-11D8-A333-000393479EE8@earthlink.net> Thank you so much for your suggestion! You are right that I only need to access permutations of 12 in order, so your suggestion of using a generator is perfect. In fact, I only need to access the first half of permutations of 12 that begin with 0 (12! / 12 / 2, about 20 million), so the last code you offered would really speed things up. Thanks again.
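[Editor's note, from a later vantage point: Python 2.6 added itertools.permutations, which is exactly this kind of lazy generator implemented in C, yielding tuples in lexicographic order; with it, neither hand-rolled version is needed. A short sketch:]

```python
from itertools import permutations
from math import factorial

# lazy like David's generator: permutations are produced one at a time, not stored
count = sum(1 for _ in permutations(range(8)))
print(count == factorial(8))        # 8! tuples were yielded

# Kye's restricted case -- permutations of range(12) that begin with 0 --
# can be had by fixing the leading 0 and permuting the remaining elements:
with_leading_zero = ((0,) + p for p in permutations(range(1, 12)))  # 11! tuples, lazily
print(next(with_leading_zero))      # (0, 1, ..., 11) comes first
```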
Best, Kye On Jul 28, 2004, at 7:05 PM, David M. Cooke wrote: > On Sun, Jul 25, 2004 at 07:24:49AM -0400, Hee-Seng Kye wrote: >> #perm.py >> def perm(k): >> # Compute the list of all permutations of k >> if len(k) <= 1: >> return [k] >> r = [] >> for i in range(len(k)): >> s = k[:i] + k[i+1:] >> p = perm(s) >> for x in p: >> r.append(k[i:i+1] + x) >> return r >> >> Does anyone know if there is a built-in function in Numpy (or >> Numarray) >> that does the above task faster (computes the list of all permutations >> of a list, k)? Or is there a way to make the above function run >> faster >> using Numpy? >> >> I'm asking because I need to create a very large list which contains >> all permutations of range(12), in which case there would be 12! >> permutations. I created a file test.py: > > Do you really need a *list* of all those permutations? Think about it: > 12! is about 0.5 billion, which is about as much RAM as your machine > has. Each permutation is going to be a list taking 20 bytes of overhead > plus 4 bytes per entry, so 68 bytes per permutation. You need 32 GB of > RAM to store that. > > You probably want to just be able to access them in order, so a > generator is a better bet. That way, you're only storing the current > permutation instead of all of them. Something like > > def perm(k): > k = tuple(k) > lk = len(k) > if lk <= 1: > yield k > else: > for i in range(lk): > s = k[:i] + k[i+1:] > t = (k[i],) > for x in perm(s): > yield t + x > > Then: > > for p in perm(range(12): > print p > > (I'm using tuples instead of lists as that gives a better performance > here.) > > For n = 9, your code takes 9.4 s on my machine. The above take 3 s, and > will scale with n (n=12 should take 3s * 10*11*12= 1.1 h). Your > original > code won't scale with n, as more and more time will be taken up > reallocated the list of permutations. 
> > We can get fancier and unroll it a bit more: > def perm(k): > k = tuple(k) > lk = len(k) > if lk <= 1: > yield k > elif lk == 2: > yield k > yield (k[1], k[0]) > elif lk == 3: > k0, k1, k2 = k > yield k > yield (k0, k2, k1) > yield (k1, k0, k2) > yield (k1, k2, k0) > yield (k2, k0, k1) > yield (k2, k1, k0) > else: > for i in range(lk): > s = k[:i] + k[i+1:] > t = (k[i],) > for x in perm(s): > yield t + x > > This takes 1.3 s for n = 9 on my machine. > > Hope this helps. > > -- > |>|\/|< > /---------------------------------------------------------------------- > ----\ > |David M. Cooke > http://arbutus.physics.mcmaster.ca/dmc/ > |cookedm at physics.mcmaster.ca > > > ------------------------------------------------------- > This SF.Net email is sponsored by BEA Weblogic Workshop > FREE Java Enterprise J2EE developer tools! > Get your free copy of BEA WebLogic Workshop 8.1 today. > http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From falted at pytables.org Thu Jul 29 02:17:04 2004 From: falted at pytables.org (Francesc Alted) Date: Thu Jul 29 02:17:04 2004 Subject: FW: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: References: Message-ID: <200407291116.33599.falted@pytables.org> Hi Perry, Well, after the bunch of messages talking about an *apparently* silly question, I must say that I mostly agree with your last proposal. The only thing that I strongly miss is that you are not decided to include the "titles" parameter to the constructor and the respective attribute. In my opinion, this would allow to forbid declaring illegal names as field names and provide full access to all attributes in *all* the ways you proposed. I think this is another kind of metainformation than just units, display formats, etc. 
A "titles" attribute is about providing functionality, not just adding information. But, as you said, there will always be somebody not completely satisfied ;) Anyway, thanks for listening to all of us and putting some good sense into all the mess that provoked the discussion. Cheers, -- Francesc Alted From Chris.Barker at noaa.gov Thu Jul 29 12:01:05 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jul 29 12:01:05 2004 Subject: [Numpy-discussion] The value of a native Blas Message-ID: <41094891.4040103@noaa.gov> Hi all, I think this is a nifty bit of trivia. After getting my nifty Apple Dual G5, I finally got around to doing a test I had wanted to do for a while. The Numeric package uses LAPACK for the Linear Algebra stuff. For OS-X there are two binary versions available for easy install: One linked against the default, non-optimized version of BLAS (from Jack Jansen's PackMan database) One linked against the Apple-supplied vecLib as the BLAS (from Bob Ippolito's PackMan database, http://undefined.org/python/pimp/) To compare performance, I wrote a little script that generates a random matrix and vector: A, b, and solves the equation: Ax = b for x N = 1000 a = RandomArray.uniform(-1000, 1000, (N,N) ) b = RandomArray.uniform(-1000, 1000, (N,) ) start = time.clock() x = solve_linear_equations(a,b) print "It took %f seconds to solve a %iX%i system"%( time.clock()-start, N, N) And here are the results: With the non-optimized version: It took 3.410000 seconds to solve a 1000X1000 system It took 28.260000 seconds to solve a 2000X2000 system With vecLib: It took 0.360000 seconds to solve a 1000X1000 system It took 2.580000 seconds to solve a 2000X2000 system for a speed increase of over 10 times! Wow! Thanks Bob, for providing that package. I'd be interested to see similar tests on other platforms; I haven't gotten around to figuring out how to use a native BLAS on my Linux box. -Chris -- Christopher Barker, Ph.D.
Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From rsilva at ime.usp.br Thu Jul 29 12:38:06 2004 From: rsilva at ime.usp.br (Paulo J. S. Silva) Date: Thu Jul 29 12:38:06 2004 Subject: [Numpy-discussion] The value of a native Blas In-Reply-To: <41094891.4040103@noaa.gov> References: <41094891.4040103@noaa.gov> Message-ID: <1091129395.29646.44.camel@catirina> > I haven't > gotten around to figuring out how to use a native BLAS on my Linux > box. > On a Debian box, at least, you can install native ATLAS libraries and they come with blas and lapack. For example, if I search for atlas3 packages I find the following atlas packages available: atlas3-base atlas3-3dnow atlas3-sse atlas3-sse2 Best Paulo -- Paulo José da Silva e Silva Professor Assistente do Dep. de Ciência da Computação (Assistant Professor of the Computer Science Dept.) Universidade de São Paulo - Brazil e-mail: rsilva at ime.usp.br Web: http://www.ime.usp.br/~rsilva Teoria é o que não entendemos o suficiente para chamar de prática. (Theory is something we don't understand well enough to call practice.) From stephen.walton at csun.edu Thu Jul 29 12:57:00 2004 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Jul 29 12:57:00 2004 Subject: [Numpy-discussion] The value of a native Blas In-Reply-To: <41094891.4040103@noaa.gov> References: <41094891.4040103@noaa.gov> Message-ID: <1091130954.9805.78.camel@freyer.sfo.csun.edu> On Thu, 2004-07-29 at 11:57, Chris Barker wrote: > One linked against the Apple Supplied vec-lib as the BLAS. (From Bob > Ippolito's PackMan database (http://undefined.org/python/pimp/) Well, I'm a sucker for trying to increase performance :-). AMD's Web site recommends ATLAS as the best source for an Athlon-optimized BLAS.
I happen to have ATLAS installed, and the time for Chris Barker's test went from 4.95 seconds to 0.91 seconds on a dual-Athlon MP 2200+ system. To build numarray 1.0 with this setup, I had to modify addons.py a bit, both to use LAPACK and ATLAS and because ATLAS was built here with the Absoft Fortran compiler version 8.2 (I haven't tried g77). Is anyone interested in this? -- Stephen Walton Dept. of Physics & Astronomy, Cal State Northridge From perry at stsci.edu Thu Jul 29 13:01:05 2004 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jul 29 13:01:05 2004 Subject: [Numpy-discussion] The value of a native Blas In-Reply-To: <1091130954.9805.78.camel@freyer.sfo.csun.edu> Message-ID: On 7/29/04 3:55 PM, "Stephen Walton" wrote: > On Thu, 2004-07-29 at 11:57, Chris Barker wrote: > >> One linked against the Apple Supplied vec-lib as the BLAS. (From Bob >> Ippolito's PackMan database (http://undefined.org/python/pimp/) > > Well, I'm a sucker for trying to increase performance :-) . AMD's Web > site recommends ATLAS as the best source for an Athlon-optimized BLAS. > I happen to have ATLAS installed, and the time for Chris Barker's test > went from 4.95 seconds to 0.91 seconds on a dual-Athlon MP 2200+ system. > > To build numarray 1.0 with this setup, I had to modify addons.py a bit, > both to use LAPACK and ATLAS and because ATLAS was built here with the > Absoft Fortran compiler version 8.2 (I haven't tried g77). Is anyone > interested in this? Well, I guess we are :-) Let us know what you had to do to get it to work. Thanks, Perry From stephen.walton at csun.edu Thu Jul 29 13:28:07 2004 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Jul 29 13:28:07 2004 Subject: [Numpy-discussion] The value of a native Blas In-Reply-To: References: Message-ID: <1091132833.9805.133.camel@freyer.sfo.csun.edu> On Thu, 2004-07-29 at 13:00, Perry Greenfield wrote: > Well, I guess we are :-) Let us know what you had to do to get it to work. 
This is so Absoft-specific that I'm not sure how much it helps others, but here goes: I built LAPACK after modifying the make.inc.LINUX file to set the compiler and linker to /opt/absoft/bin/f77 instead of to g77, and the compile flags to "-O3 -YNO_CDEC". I ran "make config" in the ATLAS directory and told the setup that /opt/absoft/bin/f77 was my Fortran compiler, then did "make install arch=", then followed the scipy.org instructions to combine LAPACK with the one from ATLAS. Finally, I applied the attached patch to addons.py in the numarray directory. Interestingly, the example program runs in 1.43 seconds on a 2.26GHz P4 with the default numarray install (as opposed to 4.95 seconds on the Athlon). I haven't built ATLAS on this platform yet to find how much of an improvement I get. I suppose something similar would work with g77, replacing the Absoft libraries with g2c, but I haven't tried it. -- Stephen Walton Dept. of Physics & Astronomy, Cal State Northridge -------------- next part -------------- A non-text attachment was scrubbed... Name: addons.diff Type: text/x-patch Size: 879 bytes Desc: addons.py diffs URL: From stephen.walton at csun.edu Thu Jul 29 13:38:05 2004 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Jul 29 13:38:05 2004 Subject: [Numpy-discussion] The value of a native Blas In-Reply-To: References: Message-ID: <1091133445.9805.147.camel@freyer.sfo.csun.edu> An addition to my previous post: I also had to do a "setenv USE_LAPACK" in the shell before "python setup.py build" in the numarray directory. [Admin question: I'm not seeing my own posts to this list, even though I'm supposed to according to my Sourceforge preferences.] From Chris.Barker at noaa.gov Thu Jul 29 15:01:07 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jul 29 15:01:07 2004 Subject: [Numpy-discussion] Building Numeric with a native blas ?
In-Reply-To: <1091133445.9805.147.camel@freyer.sfo.csun.edu> References: <1091133445.9805.147.camel@freyer.sfo.csun.edu> Message-ID: <410972BD.8080903@noaa.gov> Hi all, I decided I want to try to get this working on my gentoo linux box. I started by emerging the gentoo atlas package. Now I've gone into the Numeric setup.py, and have gotten confused. These seem to be the relevant lines (unchanged from how they came with Numeric 23.3):

# delete all but the first one in this list if using your own LAPACK/BLAS
sourcelist = [os.path.join('Src', 'lapack_litemodule.c'),
# os.path.join('Src', 'blas_lite.c'),
# os.path.join('Src', 'f2c_lite.c'),
# os.path.join('Src', 'zlapack_lite.c'),
# os.path.join('Src', 'dlapack_lite.c')

That's all well and good, except that they are all deleted except the first one. And it looks like I don't want that one either.

]
# set these to use your own BLAS;
library_dirs_list = ['/usr/lib/atlas']
libraries_list = ['lapack', 'cblas', 'f77blas', 'atlas', 'g2c']
# if you also set `use_dotblas` (see below), you'll need:
# ['lapack', 'cblas', 'f77blas', 'atlas', 'g2c']

This also seems to be set already. I don't have a '/usr/lib/atlas', so I set:

library_dirs_list = []

All the libraries in libraries_list are in /usr/lib/

include_dirs = ['/usr/include/atlas'] # You may need to set this to find cblas.h

cblas.h is in /usr/include/, so I set this to:

include_dirs = []

Now everything compiled and installed just fine, but when I try to use it, I get:

File "/usr/lib/python2.3/site-packages/Numeric/LinearAlgebra.py", line 8, in ?
import lapack_lite
ImportError: dynamic module does not define init function (initlapack_lite)

So I tried adding

sourcelist = [os.path.join('Src', 'lapack_litemodule.c')]

back in. Now I can build and install, but get:

Traceback (most recent call last):
File "./TestBlas.py", line 4, in ?
from LinearAlgebra import *
File "/usr/lib/python2.3/site-packages/Numeric/LinearAlgebra.py", line 8, in ?
import lapack_lite
ImportError: /usr/lib/python2.3/site-packages/Numeric/lapack_lite.so: undefined symbol: dgesdd_

Now I'm stuck. -CHB -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Thu Jul 29 15:26:09 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jul 29 15:26:09 2004 Subject: [Numpy-discussion] Building Numeric with a native blas ? In-Reply-To: <410972BD.8080903@noaa.gov> References: <1091133445.9805.147.camel@freyer.sfo.csun.edu> <410972BD.8080903@noaa.gov> Message-ID: <41097891.8080906@noaa.gov> By the way, I get these same errors when compiling with the setup.py unchanged from how it's distributed with Numeric 23.3

> Traceback (most recent call last):
> File "./TestBlas.py", line 4, in ?
> from LinearAlgebra import *
> File "/usr/lib/python2.3/site-packages/Numeric/LinearAlgebra.py", line
> 8, in ?
> import lapack_lite
> ImportError: /usr/lib/python2.3/site-packages/Numeric/lapack_lite.so:
> undefined symbol: dgesdd_

So something's weird. Stephen Walton wrote:

> one has to merge an LAPACK library built separately with the one
> generated by ATLAS to get a 'complete' LAPACK.

I'll try this, but it's odd that it didn't give an error when compiling or linking. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From stephen.walton at csun.edu Thu Jul 29 15:31:13 2004 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Jul 29 15:31:13 2004 Subject: [Numpy-discussion] Building Numeric with a native blas ?
In-Reply-To: <41097891.8080906@noaa.gov> References: <1091133445.9805.147.camel@freyer.sfo.csun.edu> <410972BD.8080903@noaa.gov> <41097891.8080906@noaa.gov> Message-ID: <1091140216.9805.381.camel@freyer.sfo.csun.edu> On Thu, 2004-07-29 at 15:22, Chris Barker wrote: > Stephen Walton wrote: > > one has to merge an LAPACK library built separately with the one > > generated by ATLAS to get a 'complete' LAPACK. > > I'll try this, but it's odd that it didn't give an error when compiling > or linking. (I neglected to CC the list on my response to Chris, but basically wrote that changes similar to the ones I used for numarray worked in Numeric). Since Numeric and numarray are building shared libraries, undefined external references don't show up until you actually import the Python package represented by the shared libraries. I noticed this in my experiments as well. -- Stephen Walton Dept. of Physics & Astronomy, Cal State Northridge From Chris.Barker at noaa.gov Thu Jul 29 15:41:22 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jul 29 15:41:22 2004 Subject: [Numpy-discussion] Building Numeric with a native blas ? In-Reply-To: <1091140216.9805.381.camel@freyer.sfo.csun.edu> References: <1091133445.9805.147.camel@freyer.sfo.csun.edu> <410972BD.8080903@noaa.gov> <41097891.8080906@noaa.gov> <1091140216.9805.381.camel@freyer.sfo.csun.edu> Message-ID: <41097C0A.7090600@noaa.gov> Stephen Walton wrote: >>>one has to merge an LAPACK library built separately with the one >>>generated by ATLAS to get a 'complete' LAPACK. >> >>I'll try this, but it's odd that it didn't give an error when compiling >>or linking. OK. I did an "emerge lapack" and got lapack installed, then re-built Numeric, and now it works. What's odd is that before I installed lapack all the libs were there, including liblapack. Anyway it works, so I'm happy. One note, however: The setup.py delivered with 23.3 seems to be set up to use a native lapack by default.
Will it work on a system that doesn't have one? -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From stephen.walton at csun.edu Thu Jul 29 16:21:01 2004 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Jul 29 16:21:01 2004 Subject: [Numpy-discussion] Building Numeric with a native blas ? In-Reply-To: <41097C0A.7090600@noaa.gov> References: <1091133445.9805.147.camel@freyer.sfo.csun.edu> <410972BD.8080903@noaa.gov> <41097891.8080906@noaa.gov> <1091140216.9805.381.camel@freyer.sfo.csun.edu> <41097C0A.7090600@noaa.gov> Message-ID: <1091143210.9805.482.camel@freyer.sfo.csun.edu> On Thu, 2004-07-29 at 15:36, Chris Barker wrote: > The setup.py delivered with 23.3 seems to be set up to use a native > lapack by default. Will it work on a system that doesn't have one? No. On my system it fails with a complaint about not finding -llapack, since my ATLAS and LAPACK libraries are in /usr/local/lib/atlas, and the 23.3 setup.py looks in /usr/lib/atlas. -- Stephen Walton Dept. of Physics & Astronomy, Cal State Northridge From cookedm at physics.mcmaster.ca Thu Jul 29 19:53:10 2004 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Thu Jul 29 19:53:10 2004 Subject: [Numpy-discussion] Building Numeric with a native blas ? In-Reply-To: <41097C0A.7090600@noaa.gov> References: <1091133445.9805.147.camel@freyer.sfo.csun.edu> <410972BD.8080903@noaa.gov> <41097891.8080906@noaa.gov> <1091140216.9805.381.camel@freyer.sfo.csun.edu> <41097C0A.7090600@noaa.gov> Message-ID: <20040730025254.GA26933@arbutus.physics.mcmaster.ca> On Thu, Jul 29, 2004 at 03:36:58PM -0700, Chris Barker wrote: > Stephen Walton wrote: > >>>one has to merge an LAPACK library built separately with the one > >>>generated by ATLAS to get a 'complete' LAPACK. 
> >>
> >>I'll try this, but it's odd that it didn't give an error when compiling
> >>or linking.
>
> OK. I did an "emerge lapack" and got lapack installed, then re-built
> Numeric, and now it works. What's odd is that before I installed lapack
> all the libs were there, including liblapack. Anyway it works, so I'm happy.

Atlas might have installed a liblapack, with the (few) functions that it overrides with faster ones. It's by no means a complete LAPACK installation. Have a look at the difference in library sizes; a full LAPACK is a few megs; Atlas's routines are a few hundred K. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From Chris.Barker at noaa.gov Fri Jul 30 09:33:03 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Jul 30 09:33:03 2004 Subject: [Numpy-discussion] Building Numeric with a native blas ? In-Reply-To: <20040730025254.GA26933@arbutus.physics.mcmaster.ca> References: <1091133445.9805.147.camel@freyer.sfo.csun.edu> <410972BD.8080903@noaa.gov> <41097891.8080906@noaa.gov> <1091140216.9805.381.camel@freyer.sfo.csun.edu> <41097C0A.7090600@noaa.gov> <20040730025254.GA26933@arbutus.physics.mcmaster.ca> Message-ID: <410A7733.10408@noaa.gov> David M. Cooke wrote: > Atlas might have installed a liblapack, with the (few) functions that it > overrides with faster ones. It's by no means a complete LAPACK > installation. Have a look at the difference in library sizes; a full > LAPACK is a few megs; Atlas's routines are a few hundred K. OK, I'm really confused now. I got it working, but it seems to have virtually identical performance to the Numeric-supplied lapack-lite. I'm guessing that the LAPACK package I emerged does NOT use the atlas BLAS. If the atlas liblapack doesn't have all of lapack, how in the world are you supposed to use it? I have no idea how I would get the linker to get what it can from the atlas lapack, and the rest from another one. Has anyone done this on Gentoo? If not, how about another linux distro? I don't have to use portage for this after all. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From gerard.vermeulen at grenoble.cnrs.fr Fri Jul 30 10:01:34 2004 From: gerard.vermeulen at grenoble.cnrs.fr (Gerard Vermeulen) Date: Fri Jul 30 10:01:34 2004 Subject: [Numpy-discussion] Building Numeric with a native blas ?
In-Reply-To: <410A7733.10408@noaa.gov> References: <1091133445.9805.147.camel@freyer.sfo.csun.edu> <410972BD.8080903@noaa.gov> <41097891.8080906@noaa.gov> <1091140216.9805.381.camel@freyer.sfo.csun.edu> <41097C0A.7090600@noaa.gov> <20040730025254.GA26933@arbutus.physics.mcmaster.ca> <410A7733.10408@noaa.gov> Message-ID: <20040730190021.67e1ffdd.gerard.vermeulen@grenoble.cnrs.fr> On Fri, 30 Jul 2004 09:28:35 -0700 "Chris Barker" wrote:

> David M. Cooke wrote:
> > Atlas might have installed a liblapack, with the (few) functions that it
> > overrides with faster ones. It's by no means a complete LAPACK
> > installation. Have a look at the difference in library sizes; a full
> > LAPACK is a few megs; Atlas's routines are a few hundred K.
>
> OK, I'm really confused now. I got it working, but it seems to have
> virtually identical performance to the Numeric-supplied lapack-lite.
>
> I'm guessing that the LAPACK package I emerged does NOT use the atlas BLAS.
>
> If the atlas liblapack doesn't have all of lapack, how in the world are
> you supposed to use it? I have no idea how I would get the linker to get
> what it can from the atlas lapack, and the rest from another one.
>
> Has anyone done this on Gentoo? If not, how about another linux distro? I
> don't have to use portage for this after all.
>

I am making my own ATLAS rpms and basically I am doing the following (starting from the ATLAS source directory, with the LAPACK unpacked inside it):

# build lapack
# Note added right now: this assumes that the LAPACK/make.inc has been patched
(cd LAPACK; make lapacklib)

# configuration: leave the blank lines in the 'here' document
# Note added right now: this is dependent on your CPU architecture
if [ "$(hostname)" == "zombie" ] ; then
make config < References: <41094891.4040103@noaa.gov> Message-ID: <1091212658.1454.724.camel@catirina> Hello, I took some time today to do some benchmarks on different uses of lapack on an Athlon Thunderbird 1.2GHz.
Here it goes:

------
Vanilla numarray:
It took 9.970000 seconds to solve a 1000x1000 system

numarray, vanilla blas and lapack:
It took 7.010000 seconds to solve a 1000x1000 system

numarray, atlas blas and vanilla lapack:
It took 1.050000 seconds to solve a 1000x1000 system

numarray, atlas blas and lapack:
It took 0.760000 seconds to solve a 1000x1000 system
------

One nice touch is that Matlab takes 1.3s to solve a system of the same size with the notation A\b. Hence numarray is actually faster than Matlab at solving linear systems :-) I know, probably there is a way to make Matlab use the faster atlas library... Paulo -- Paulo José da Silva e Silva Professor Assistente do Dep. de Ciência da Computação (Assistant Professor of the Computer Science Dept.) Universidade de São Paulo - Brazil e-mail: rsilva at ime.usp.br Web: http://www.ime.usp.br/~rsilva Teoria é o que não entendemos o (Theory is something we don't) suficiente para chamar de prática. (understand well enough to call) (practice) From Chris.Barker at noaa.gov Fri Jul 30 13:15:06 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Jul 30 13:15:06 2004 Subject: [Numpy-discussion] Building Numeric with a native blas -- On Windows Message-ID: <2592d825d632.25d6322592d8@hermes.nos.noaa.gov> Hi all, just to keep this thread moving--- I'm trying to get Numeric working with a native lapack on Windows also. I know little enough about this kind of thing on Linux, and I'm really out of my depth on Windows. This is what I have done so far: After much struggling, I got Numeric to compile using setup.py, and MS Visual Studio .NET 2003 (or whatever the heck it's called!) It all seems to work fine with the included lapack-lite. I downloaded and installed the demo version of the Intel Math Kernel Library.
I set up various paths so that setup.py finds the libs, but now I get linking errors:

unresolved external symbol _dgeev_ referenced in function _lapack_lite_dgetrf

And a whole bunch of others, all corresponding to the various LAPACK calls. I am linking against Intel's mkl_c.lib, which is supposed to have everything in it. Indeed, if I look in the lib file, I find, for example:

...evx._DGEEV._dgeev._DGB ...

so it looks like they are there, but perhaps referred to with only one underscore, at the beginning, rather than one at each end. Now I'm stuck. I suppose I could use ATLAS, but it looked like it was going to take some effort to compile that with MSVC. Has anyone gotten a native BLAS working on Windows? If so, how? Thanks, Chris From gerard.vermeulen at grenoble.cnrs.fr Fri Jul 30 15:04:10 2004 From: gerard.vermeulen at grenoble.cnrs.fr (gerard.vermeulen at grenoble.cnrs.fr) Date: Fri Jul 30 15:04:10 2004 Subject: [Numpy-discussion] Building Numeric with a native blas -- On Windows
I set up various paths so that setup.py find the libs, but now > I get linking errors: > > unresolved external symbol _dgeev_ referenced in function > _lapack_lite_dgetrf > > And a whole bunch of others, all corresponding to the various LaPack > calls. > > I am linking against Intel's mkl_c.lib, which is supposed tohave > everything in it. Indeed, if I look in teh lib file, I find, for example: > > ...evx._DGEEV._dgeev._DGB ... > > so it lkooks like they are there, but perhaps referred to with only one > underscore, at the beginning, rather than one at each end. > > Now I'm stuck. > > I suppose I could use ATLAS, but it looked like it was going to take > some effort to compile that under with MSVC. > > Has anyone gotten a native BLAS working on Windows? if so, how? > In lapack_lite.c, you''ll see: #if defined(NO_APPEND_FORTRAN) lapack_lite_status__ = dgeev(&jobvl,&jobvr,&n,DDATA(a),&lda,DDATA(wr),DDATA(wi),DDATA(vl),&ldvl,DDATA(vr),&ldvr,DDATA(work),&lwork,&info); #else lapack_lite_status__ = dgeev_(&jobvl,&jobvr,&n,DDATA(a),&lda,DDATA(wr),DDATA(wi),DDATA(vl),&ldvl,DDATA(vr),&ldvr,DDATA(work),&lwork,&info); #endif So, try to define NO_APPEND_FORTRAN. If that does not work, you can try to prepend an underscore. You can also try to rip the ATLAS and supposedly ATLAS enhanced lapack libraries out of scipy and build against those (not as good as http://www.scipy.org/documentation/buildatlas4scipywin32.txt, but better than nothing). 
Gerard From falted at pytables.org Thu Jul 1 01:51:39 2004 From: falted at pytables.org (Francesc Alted) Date: Thu Jul 1 01:51:39 2004 Subject: [Numpy-discussion] Speeding up wxPython/numarray In-Reply-To: <1088632048.7526.204.camel@halloween.stsci.edu> References: <40E31B31.7040105@cox.net> <1088632048.7526.204.camel@halloween.stsci.edu> Message-ID: <200407011048.01929.falted@pytables.org> On Wednesday 30 June 2004 23:47, Todd Miller wrote: > > There were a couple of other things I tried that resulted in additional > > small speedups, but the tactics I used were too horrible to reproduce > > here. The main one of interest is that all of the calls to > > NA_updateDataPtr seem to burn some time. However, I don't have any idea > > what one could do about that. > > Francesc Alted had the same comment about NA_updateDataPtr a while ago. > I tried to optimize it then but didn't get anywhere. NA_updateDataPtr() > should be called at most once per extension function (more is > unnecessary but not harmful) but needs to be called at least once as a > consequence of the way the buffer protocol doesn't give locked > pointers. FYI I'm still refusing to call NA_updateDataPtr() in a specific part of my code that requires as much speed as possible. It works just fine from numarray 0.5 on (numarray 0.4 gave a segmentation fault on that). However, Todd already warned me about that and told me that this is unsafe. Nevertheless, I'm using the optimization for read-only purposes (i.e. they are not accessible to users) over numarray objects, and that *seems* to be safe (at least I did not have a single problem after numarray 0.5). I know that I'm walking on the cutting edge, but life is dangerous anyway ;). By the way, that optimization gives me a 70% improvement during element access to NumArray elements. It would be very nice if you can finally achieve additional performance with your recent bet :).
Good luck!, -- Francesc Alted From haase at msg.ucsf.edu Thu Jul 1 09:06:24 2004 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Thu Jul 1 09:06:24 2004 Subject: [Numpy-discussion] Numarray header PEP In-Reply-To: <20040701053355.M99698@grenoble.cnrs.fr> References: <1088451653.3744.200.camel@localhost.localdomain> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> Message-ID: <200407010904.25498.haase@msg.ucsf.edu> On Wednesday 30 June 2004 11:33 pm, gerard.vermeulen at grenoble.cnrs.fr wrote: > On 30 Jun 2004 17:54:19 -0400, Todd Miller wrote > > > So... you use the "meta" code to provide package specific ordinary > > (not-macro-fied) functions to keep the different versions of the > > Present() and isArray() macros from conflicting. > > > > It would be nice to have a standard approach for using the same > > "extension enhancement code" for both numarray and Numeric. The PEP > > should really be expanded to provide an example of dual support for one > > complete and real function, guts and all, so people can see the process > > end-to-end; Something like a simple arrayprint. That process needs > > to be refined to remove as much tedium and duplication of effort as > > possible. The idea is to make it as close to providing one > > implementation to support both array packages as possible. I think it's > > important to illustrate how to partition the extension module into > > separate compilation units which correctly navigate the dual > > implementation mine field in the easiest possible way. > > > > It would also be nice to add some logic to the meta-functions so that > > which array package gets used is configurable. We did something like > > that for the matplotlib plotting software at the Python level with > > the "numerix" layer, an idea I think we copied from Chaco. 
The kind
> > of dispatch I think might be good to support configurability looks like
> > this:
> >
> > PyObject *
> > whatsThis(PyObject *dummy, PyObject *args)
> > {
> >     PyObject *result, *what = NULL;
> >     if (!PyArg_ParseTuple(args, "O", &what))
> >         return 0;
> >     switch(PyArray_Which(what)) {
> >     USE_NUMERIC:
> >         result = Numeric_whatsThis(what); break;
> >     USE_NUMARRAY:
> >         result = Numarray_whatsThis(what); break;
> >     USE_SEQUENCE:
> >         result = Sequence_whatsThis(what); break;
> >     }
> >     Py_INCREF(Py_None);
> >     return Py_None;
> > }
> >
> > In the above, I'm picturing a separate .c file for Numeric_whatsThis and for Numarray_whatsThis. It would be nice to streamline that to one .c and a process which somehow (simply) produces both functions.
> >
> > Or, ideally, the above would be done more like this:
> >
> > PyObject *
> > whatsThis(PyObject *dummy, PyObject *args)
> > {
> >     PyObject *result, *what = NULL;
> >     if (!PyArg_ParseTuple(args, "O", &what))
> >         return 0;
> >     switch(Numerix_Which(what)) {
> >     USE_NUMERIX:
> >         result = Numerix_whatsThis(what); break;
> >     USE_SEQUENCE:
> >         result = Sequence_whatsThis(what); break;
> >     }
> >     Py_INCREF(Py_None);
> >     return Py_None;
> > }
> >
> > Here, a common Numerix implementation supports both numarray and Numeric from a single simple .c. The extension module would do "#include numerix/arrayobject.h" and "import_numerix()" and otherwise just call PyArray_* functions.
> >
> > The current stumbling block is that numarray is not binary compatible with Numeric... so numerix in C falls apart. I haven't analyzed every symbol and struct to see if it is really feasible... but it seems like it is *almost* feasible, at least for typical usage.
> >
> > So, in a nutshell, I think the dual implementation support you demoed is important and we should work up an example and kick it around to make sure it's the best way we can think of doing it.
> > Then we should add a section to the PEP describing dual support as well.

> I would never apply numarray code to Numeric arrays and the inverse. It
> looks dangerous and I do not know if it is possible. The first thing
> coming to mind is that numarray and Numeric arrays refer to different type
> objects (this is what my pep module uses to differentiate them). So, even
> if numarray and Numeric are binary compatible, any 'alien' code referring
> to the 'Python-standard part' of the type objects may lead to surprises. A
> PEP proposing hacks will raise eyebrows at least.
>
> Secondly, most people use Numeric *or* numarray and not both.
>
> So, I prefer: Numeric In => Numeric Out or Numarray In => Numarray Out
> (NINO) Of course, Numeric or numarray output can be a user option if NINO
> does not apply. (explicit safe conversion between Numeric and numarray is
> possible if really needed).
>
> I'll try to flesh out the demo with real functions in the way you indicated
> (going as far as I consider safe).
>
> The problem of coding the Numeric (or numarray) functions in more than
> a single source file has also been addressed.
>
> It may take 2 weeks because I am off to a conference next week.
>
> Regards -- Gerard

Hi all, first, I would like to state that I don't understand much of this discussion; so the only comment I wanted to make is that IF this were possible, to make (C/C++) code that can live with both Numeric and numarray, then I think it would be used more and more - think: transition phase !! (e.g. someone could start making the FFTW part of scipy numarray friendly without having to switch everything at once [hint ;-)] ) These were just my 2 cents.
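The Python-level "numerix" switching Todd mentioned earlier in the thread can be sketched as a tiny selection layer. The NUMERIX environment variable and the use of numpy as the backend here are illustrative only; matplotlib's actual numerix layer differed in detail:

```python
import importlib
import os

# choose the array backend by name at import time;
# "numpy" stands in for the Numeric/numarray choice of 2004
backend_name = os.environ.get("NUMERIX", "numpy")
nx = importlib.import_module(backend_name)

# client code uses only the nx alias, never the package name directly
a = nx.arange(10)
print(backend_name, int(a.sum()))
```

A user (or a test harness) then flips the backend by setting the environment variable before importing, which is exactly the kind of configurability the thread is after.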
Cheers, Sebastian Haase From jmiller at stsci.edu Thu Jul 1 09:44:13 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 1 09:44:13 2004 Subject: [Numpy-discussion] Numarray header PEP In-Reply-To: <20040701053355.M99698@grenoble.cnrs.fr> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> Message-ID: <1088700210.14402.17.camel@halloween.stsci.edu> On Thu, 2004-07-01 at 02:33, gerard.vermeulen at grenoble.cnrs.fr wrote: > On 30 Jun 2004 17:54:19 -0400, Todd Miller wrote > > > > So... you use the "meta" code to provide package specific ordinary > > (not-macro-fied) functions to keep the different versions of the > > Present() and isArray() macros from conflicting. > > > > It would be nice to have a standard approach for using the same > > "extension enhancement code" for both numarray and Numeric. The PEP > > should really be expanded to provide an example of dual support for one > > complete and real function, guts and all, so people can see the process > > end-to-end; Something like a simple arrayprint. That process needs > > to be refined to remove as much tedium and duplication of effort as > > possible. The idea is to make it as close to providing one > > implementation to support both array packages as possible. I think it's > > important to illustrate how to partition the extension module into > > separate compilation units which correctly navigate the dual > > implementation mine field in the easiest possible way. > > > > It would also be nice to add some logic to the meta-functions so that > > which array package gets used is configurable. We did something like > > that for the matplotlib plotting software at the Python level with > > the "numerix" layer, an idea I think we copied from Chaco. 
The kind
> > of dispatch I think might be good to support configurability looks like
> > this:
> >
> > PyObject *
> > whatsThis(PyObject *dummy, PyObject *args)
> > {
> >     PyObject *result, *what = NULL;
> >     if (!PyArg_ParseTuple(args, "O", &what))
> >         return 0;
> >     switch(PyArray_Which(what)) {
> >     USE_NUMERIC:
> >         result = Numeric_whatsThis(what); break;
> >     USE_NUMARRAY:
> >         result = Numarray_whatsThis(what); break;
> >     USE_SEQUENCE:
> >         result = Sequence_whatsThis(what); break;
> >     }
> >     Py_INCREF(Py_None);
> >     return Py_None;
> > }
> >
> > In the above, I'm picturing a separate .c file for Numeric_whatsThis and for Numarray_whatsThis. It would be nice to streamline that to one .c and a process which somehow (simply) produces both functions.
> >
> > Or, ideally, the above would be done more like this:
> >
> > PyObject *
> > whatsThis(PyObject *dummy, PyObject *args)
> > {
> >     PyObject *result, *what = NULL;
> >     if (!PyArg_ParseTuple(args, "O", &what))
> >         return 0;
> >     switch(Numerix_Which(what)) {
> >     USE_NUMERIX:
> >         result = Numerix_whatsThis(what); break;
> >     USE_SEQUENCE:
> >         result = Sequence_whatsThis(what); break;
> >     }
> >     Py_INCREF(Py_None);
> >     return Py_None;
> > }
> >
> > Here, a common Numerix implementation supports both numarray and Numeric from a single simple .c. The extension module would do "#include numerix/arrayobject.h" and "import_numerix()" and otherwise just call PyArray_* functions.
> >
> > The current stumbling block is that numarray is not binary compatible with Numeric... so numerix in C falls apart. I haven't analyzed every symbol and struct to see if it is really feasible... but it seems like it is *almost* feasible, at least for typical usage.
> >
> > So, in a nutshell, I think the dual implementation support you demoed is important and we should work up an example and kick it around to make sure it's the best way we can think of doing it.
> > Then we should add a section to the PEP describing dual support as well.

> I would never apply numarray code to Numeric arrays and the inverse. It looks
> dangerous and I do not know if it is possible.

I think that's definitely the marching orders for now... but you gotta admit, it would be nice.

> The first thing coming
> to mind is that numarray and Numeric arrays refer to different type objects
> (this is what my pep module uses to differentiate them). So, even if
> numarray and Numeric are binary compatible, any 'alien' code referring to
> the 'Python-standard part' of the type objects may lead to surprises.
> A PEP proposing hacks will raise eyebrows at least.

I'm a little surprised it took someone to talk me out of it... I'll just concede that this was probably a bad idea.

> Secondly, most people use Numeric *or* numarray and not both.

A class of question which will arise for developers is this: "X works with Numeric, but X doesn't work with numarray." The reverse also happens occasionally. For this reason, being able to choose would be nice for developers.

> So, I prefer: Numeric In => Numeric Out or Numarray In => Numarray Out (NINO)
> Of course, Numeric or numarray output can be a user option if NINO does not
> apply.

When I first heard it, I thought NINO was a good idea, with the limitation that it doesn't apply when a function produces an array without consuming any. But... there is another problem with NINO that Perry Greenfield pointed out: with multiple arguments, there can be a mix of array types. For this reason, it makes sense to be able to coerce all the inputs to a particular array package. This form might look more like:

switch(PyArray_Which()) {
case USE_NUMERIC:
    result = Numeric_doit(a1, a2, a3); break;
case USE_NUMARRAY:
    result = Numarray_doit(a1, a2, a3); break;
case USE_SEQUENCE:
    result = Sequence_doit(a1, a2, a3); break;
}

One last thing: I think it would be useful to be able to drive the code into sequence mode with arrays.
This would enable easy benchmarking of the performance improvement. > (explicit safe conversion between Numeric and numarray is possible > if really needed). > >I'll try to flesh out the demo with real functions in the way you indicated > (going as far as I consider safe). > > The problem of coding the Numeric (or numarray) functions in more than > a single source file has also been addressed. > > It may take 2 weeks because I am off to a conference next week. Excellent. See you in a couple weeks. Regards, Todd From jmiller at stsci.edu Thu Jul 1 09:59:01 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 1 09:59:01 2004 Subject: [Numpy-discussion] Speeding up wxPython/numarray In-Reply-To: <40E3462A.9080303@cox.net> References: <40E31B31.7040105@cox.net> <1088632048.7526.204.camel@halloween.stsci.edu> <40E3462A.9080303@cox.net> Message-ID: <1088701077.14402.20.camel@halloween.stsci.edu> On Wed, 2004-06-30 at 19:00, Tim Hochberg wrote: > By this do you mean the "#if PY_VERSION_HEX >= 0x02030000 " that is > wrapped around _ndarray_item? If so, I believe that it *is* getting > compiled, it's just never getting called. > > What I think is happening is that the class NumArray inherits its > sq_item from PyClassObject. In particular, I think it picks up > instance_item from Objects/classobject.c. This appears to be fairly > expensive and, I think, ends up calling tp_as_mapping->mp_subscript. > Thus, _ndarray's sq_item slot never gets called. All of this is pretty > iffy since I don't know this stuff very well and I didn't trace it all > the way through. However, it explains what I've seen thus far. > > This is why I ended up using the horrible hack. I'm resetting NumArray's > sq_item to point to _ndarray_item instead of instance_item. I believe > that access at the python level goes through mp_subscript, so it > shouldn't be affected, and only objects at the C level should notice and > they should just get the faster sq_item.
You will notice that there are > an awful lot of I thinks in the above paragraphs though... Ugh... Thanks for explaining this. > >>I then optimized _ndarray_item (code > >>at end). This halved the execution time of my arbitrary benchmark. This > >>trick may have horrible, unforeseen consequences so use at your own risk. > >> > >> > > > >Right now the sq_item hack strikes me as somewhere between completely > >unnecessary and too scary for me! Maybe if python-dev blessed it. > > > > > Yes, very scary. And it occurs to me that it will break subclasses of > NumArray if they override __getitem__. When these subclasses are > accessed from C they will see nd_array's sq_item instead of the > overridden getitem. However, I think I also know how to fix it. But > it does point out that it is very dangerous and there are probably dark > corners of which I'm unaware. Asking on Python-List or PyDev would > probably be a good idea. > > The nonscary, but painful, fix would be to rewrite NumArray in C. Non-scary to whom? > >This optimization looks good to me. > > > > > Unfortunately, I don't think the optimization to sq_item will affect > much since NumArray appears to override it with > > >>Finally I commented out the __del__ method in numarraycore. This resulted > >>in an additional speedup of 64% for a total speed up of 240%. Still not > >>close to 10x, but a large improvement. However, this is obviously not > >>viable for real use, but it's enough of a speedup that I'll try to see > >>if there's any way to move the shadow stuff back to tp_dealloc. > >> > >> > > > >FYI, the issue with tp_dealloc may have to do with which mode Python is > >compiled in, --with-pydebug, or not. One approach which seems like it > >ought to work (just thought of this!) is to add an extra reference in C > >to the NumArray instance __dict__ (from NumArray.__init__ and stashed > >via a new attribute in the PyArrayObject struct) and then DECREF it as > >the last part of the tp_dealloc.
> > > > > That sounds promising. I looked at this some, and while INCREFing __dict__ may be the right idea, I forgot that there *is no* Python NumArray.__init__ anymore. So the INCREF needs to be done in C without doing any getattrs; this seems to mean calling a private _PyObject_GetDictPtr function to get a pointer to the __dict__ slot which can be dereferenced to get the __dict__. > [SNIP] > > > > >Well, be picking out your beer. > > > > > I was only about half right, so I'm not sure I qualify... We could always reduce your wages to a 12-pack... Todd From gerard.vermeulen at grenoble.cnrs.fr Thu Jul 1 11:39:08 2004 From: gerard.vermeulen at grenoble.cnrs.fr (Gerard Vermeulen) Date: Thu Jul 1 11:39:08 2004 Subject: [Numpy-discussion] Numarray header PEP In-Reply-To: <1088700210.14402.17.camel@halloween.stsci.edu> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <1088700210.14402.17.camel@halloween.stsci.edu> Message-ID: <20040701203739.31f80e02.gerard.vermeulen@grenoble.cnrs.fr> On 01 Jul 2004 12:43:31 -0400 Todd Miller wrote: > A class of question which will arise for developers is this: "X works > with Numeric, but X doesn't work with numarray." The reverse also > happens occasionally. For this reason, being able to choose would be > nice for developers. > > > So, I prefer: Numeric In => Numeric Out or Numarray In => Numarray Out (NINO) > > Of course, Numeric or numarray output can be a user option if NINO does not > > apply. > > When I first heard it, I thought NINO was a good idea, with the > limitation that it doesn't apply when a function produces an array > without consuming any. But...
there is another problem with NINO that > Perry Greenfield pointed out: with multiple arguments, there can be a > mix of array types. For this reason, it makes sense to be able to > coerce all the inputs to a particular array package. This form might > look more like: > > switch(PyArray_Which()) { > case USE_NUMERIC: > result = Numeric_doit(a1, a2, a3); break; > case USE_NUMARRAY: > result = Numarray_doit(a1, a2, a3); break; > case USE_SEQUENCE: > result = Sequence_doit(a1, a2, a3); break; > } > > One last thing: I think it would be useful to be able to drive the code > into sequence mode with arrays. This would enable easy benchmarking of > the performance improvement. > > > (explicit safe conversion between Numeric and numarray is possible > > if really needed). Yeah, when I wrote 'if really needed', I was hoping to shift the responsibility of coercion (or conversion) to the Python programmer (my lazy side telling me that it can be done in pure Python). You talked me into doing it in C :-) Regards -- Gerard From tim.hochberg at cox.net Thu Jul 1 11:52:05 2004 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Jul 1 11:52:05 2004 Subject: [Numpy-discussion] Speeding up wxPython/numarray In-Reply-To: <1088701077.14402.20.camel@halloween.stsci.edu> References: <40E31B31.7040105@cox.net> <1088632048.7526.204.camel@halloween.stsci.edu> <40E3462A.9080303@cox.net> <1088701077.14402.20.camel@halloween.stsci.edu> Message-ID: <40E45D3C.7020501@cox.net> Todd Miller wrote: >On Wed, 2004-06-30 at 19:00, Tim Hochberg wrote: > > >>>> >>>> >>>> >>>FYI, the issue with tp_dealloc may have to do with which mode Python is >>>compiled in, --with-pydebug, or not. One approach which seems like it >>>ought to work (just thought of this!) is to add an extra reference in C >>>to the NumArray instance __dict__ (from NumArray.__init__ and stashed >>>via a new attribute in the PyArrayObject struct) and then DECREF it as >>>the last part of the tp_dealloc. 
>>> >>> >>> >>> >>That sounds promising. >> >> > <> > I looked at this some, and while INCREFing __dict__ maybe the right > idea, I forgot that there *is no* Python NumArray.__init__ anymore. > > So the INCREF needs to be done in C without doing any getattrs; this > seems to mean calling a private _PyObject_GetDictPtr function to get a > pointer to the __dict__ slot which can be dereferenced to get the > __dict__. Might there be a simpler way? Since you're putting an extra attribute on the PyArrayObject structure anyway, wouldn't it be possible to just stash _shadows there instead of the reference to the dictionary? It appears that that the only time _shadows is accessed from python is in __del__. If it were instead an attribute on ndarray, the dealloc problem would go away since the responsibility for deallocing it would fall to ndarray. Since everything else accesses it from C, that shouldn't be much of a problem and should speed that stuff up as well. -tim From cjw at sympatico.ca Thu Jul 1 12:59:01 2004 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Jul 1 12:59:01 2004 Subject: [Numpy-discussion] Numarray header PEP In-Reply-To: <200407010904.25498.haase@msg.ucsf.edu> References: <1088451653.3744.200.camel@localhost.localdomain> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <200407010904.25498.haase@msg.ucsf.edu> Message-ID: <40E46CD3.9090802@sympatico.ca> Sebastian Haase wrote: >On Wednesday 30 June 2004 11:33 pm, gerard.vermeulen at grenoble.cnrs.fr wrote: > > >>On 30 Jun 2004 17:54:19 -0400, Todd Miller wrote >> >> >> >>>So... you use the "meta" code to provide package specific ordinary >>>(not-macro-fied) functions to keep the different versions of the >>>Present() and isArray() macros from conflicting. >>> >>>It would be nice to have a standard approach for using the same >>>"extension enhancement code" for both numarray and Numeric. 
The PEP >>>should really be expanded to provide an example of dual support for one >>>complete and real function, guts and all, so people can see the process >>>end-to-end; Something like a simple arrayprint. That process needs >>>to be refined to remove as much tedium and duplication of effort as >>>possible. The idea is to make it as close to providing one >>>implementation to support both array packages as possible. I think it's >>>important to illustrate how to partition the extension module into >>>separate compilation units which correctly navigate the dual >>>implementation mine field in the easiest possible way. >>> >>>It would also be nice to add some logic to the meta-functions so that >>>which array package gets used is configurable. We did something like >>>that for the matplotlib plotting software at the Python level with >>>the "numerix" layer, an idea I think we copied from Chaco. The kind >>>of dispatch I think might be good to support configurability looks like >>>this: >>> >>>PyObject * >>>whatsThis(PyObject *dummy, PyObject *args) >>>{ >>> PyObject *result, *what = NULL; >>> if (!PyArg_ParseTuple(args, "O", &what)) >>> return 0; >>> switch(PyArray_Which(what)) { >>> USE_NUMERIC: >>> result = Numeric_whatsThis(what); break; >>> USE_NUMARRAY: >>> result = Numarray_whatsThis(what); break; >>> USE_SEQUENCE: >>> result = Sequence_whatsThis(what); break; >>> } >>> Py_INCREF(Py_None); >>> return Py_None; >>>} >>> >>>In the above, I'm picturing a separate .c file for Numeric_whatsThis >>>and for Numarray_whatsThis. It would be nice to streamline that to one >>>.c and a process which somehow (simply) produces both functions. 
>>> >>>Or, ideally, the above would be done more like this: >>> >>>PyObject * >>>whatsThis(PyObject *dummy, PyObject *args) >>>{ >>> PyObject *result, *what = NULL; >>> if (!PyArg_ParseTuple(args, "O", &what)) >>> return 0; >>> switch(Numerix_Which(what)) { >>> USE_NUMERIX: >>> result = Numerix_whatsThis(what); break; >>> USE_SEQUENCE: >>> result = Sequence_whatsThis(what); break; >>> } >>> Py_INCREF(Py_None); >>> return Py_None; >>>} >>> >>>Here, a common Numerix implementation supports both numarray and Numeric >>>from a single simple .c. The extension module would do "#include >>>numerix/arrayobject.h" and "import_numerix()" and otherwise just call >>>PyArray_* functions. >>> >>>The current stumbling block is that numarray is not binary compatible >>>with Numeric... so numerix in C falls apart. I haven't analyzed >>>every symbol and struct to see if it is really feasible... but it >>>seems like it is *almost* feasible, at least for typical usage. >>> >>>So, in a nutshell, I think the dual implementation support you >>>demoed is important and we should work up an example and kick it >>>around to make sure it's the best way we can think of doing it. >>>Then we should add a section to the PEP describing dual support as well. >>> >>> >>I would never apply numarray code to Numeric arrays and the inverse. It >>looks dangerous and I do not know if it is possible. The first thing >>coming to mind is that numarray and Numeric arrays refer to different type >>objects (this is what my pep module uses to differentiate them). So, even >>if numarray and Numeric are binary compatible, any 'alien' code referring >>the the 'Python-standard part' of the type objects may lead to surprises. A >>PEP proposing hacks will raise eyebrows at least. >> >>Secondly, most people use Numeric *or* numarray and not both. >> >>So, I prefer: Numeric In => Numeric Out or Numarray In => Numarray Out >>(NINO) Of course, Numeric or numarray output can be a user option if NINO >>does not apply. 
(explicit safe conversion between Numeric and numarray is >>possible if really needed). >> >>I'll try to flesh out the demo with real functions in the way you indicated >>(going as far as I consider safe). >> >>The problem of coding the Numeric (or numarray) functions in more than >>a single source file has also been addressed. >> >>It may take 2 weeks because I am off to a conference next week. >> >>Regards -- Gerard >> >> > >Hi all, >first, I would like to state that I don't understand much of this discussion; >so the only comment I wanted to make is that IF this were possible, to make >(C/C++) code that can live with both Numeric and numarray, then I think it >would be used more and more - think: transition phase !! (e.g. someone could >start making the FFTW part of scipy numarray friendly without having to >switch everything at once [hint ;-)] ) > >These were just my 2 cents. >Cheers, >Sebastian Haase > > I feel lower on the understanding tree with respect to what is being proposed in the draft PEP, but would still like to offer my 2 cents worth. I get the feeling that numarray is being bent out of shape to fit Numeric. It was my understanding that Numeric had certain weaknesses which made it unacceptable as a Python component and that numarray was intended to provide the same or better functionality within a pythonic framework. numarray has not achieved the expected performance level to date, but progress is being made and I believe that, for larger arrays, numarray has been shown to be superior to Numeric - please correct me if I'm wrong here. The shock came for me when Todd Miller said: <> I looked at this some, and while INCREFing __dict__ may be the right idea, I forgot that there *is no* Python NumArray.__init__ anymore. Wasn't it the intent of numarray to work towards the full use of the Python class structure to provide the benefits which it offers? The Python class has two constructors and one destructor.
The constructors are __init__ and __new__, the latter only provides the shell of an instance which later has to be initialized. In version 0.9, which I use, there is no __new__, but there is a new function which has a functionality similar to that intended for __new__. Thus, with this change, numarray appears to be moving further away from being pythonic. Colin W From jmiller at stsci.edu Thu Jul 1 13:03:12 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 1 13:03:12 2004 Subject: [Numpy-discussion] Speeding up wxPython/numarray In-Reply-To: <40E45D3C.7020501@cox.net> References: <40E31B31.7040105@cox.net> <1088632048.7526.204.camel@halloween.stsci.edu> <40E3462A.9080303@cox.net> <1088701077.14402.20.camel@halloween.stsci.edu> <40E45D3C.7020501@cox.net> Message-ID: <1088712102.14402.73.camel@halloween.stsci.edu> On Thu, 2004-07-01 at 14:51, Tim Hochberg wrote: > Todd Miller wrote: > > >On Wed, 2004-06-30 at 19:00, Tim Hochberg wrote: > > > > > >>>> > >>>> > >>>> > >>>FYI, the issue with tp_dealloc may have to do with which mode Python is > >>>compiled in, --with-pydebug, or not. One approach which seems like it > >>>ought to work (just thought of this!) is to add an extra reference in C > >>>to the NumArray instance __dict__ (from NumArray.__init__ and stashed > >>>via a new attribute in the PyArrayObject struct) and then DECREF it as > >>>the last part of the tp_dealloc. > >>> > >>> > >>> > >>> > >>That sounds promising. > >> > >> > > <> > > I looked at this some, and while INCREFing __dict__ maybe the right > > idea, I forgot that there *is no* Python NumArray.__init__ anymore. > > > > So the INCREF needs to be done in C without doing any getattrs; this > > seems to mean calling a private _PyObject_GetDictPtr function to get a > > pointer to the __dict__ slot which can be dereferenced to get the > > __dict__. > > Might there be a simpler way? 
Since you're putting an extra attribute on > the PyArrayObject structure anyway, wouldn't it be possible to just > stash _shadows there instead of the reference to the dictionary? _shadows is already in the struct. The root problem (I recall) is not the loss of self->_shadows, it's the loss self->__dict__ before self can be copied onto self->_shadows. The cause of the problem appeared to me to be the tear down order of self: the NumArray part appeared to be torn down before the _numarray part, and the tp_dealloc needs to do a Python callback where a half destructed object just won't do. To really know what the problem is, I need to stick tp_dealloc back in and see what breaks. I'm pretty sure the problem was a missing instance __dict__, but my memory is quite fallable. Todd From Chris.Barker at noaa.gov Thu Jul 1 13:18:01 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jul 1 13:18:01 2004 Subject: [Numpy-discussion] How to read data from text files fast? In-Reply-To: <20040701053355.M99698@grenoble.cnrs.fr> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> Message-ID: <40E470D9.8060603@noaa.gov> Hi all, I'm looking for a way to read data from ascii text files quickly. I've found that using the standard python idioms like: data = array((M,N),Float) for in range(N): data.append(map(float,file.readline().split())) Can be pretty slow. What I'd like is something like Matlab's fscanf: data = fscanf(file, "%g", [M,N] ) I may have the syntax a little wrong, but the gist is there. What Matlab does keep recycling the format string until the desired number of elements have been read. It is quite flexible, and ends up being pretty fast. 
Has anyone written something like this for Numeric (or numarray, but I'd prefer Numeric at this point) ? I was surprised not to find something like this in SciPy, maybe I didn't look hard enough. If no one has done this, I guess I'll get started on it.... -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Fernando.Perez at colorado.edu Thu Jul 1 13:28:01 2004 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Thu Jul 1 13:28:01 2004 Subject: [Numpy-discussion] How to read data from text files fast? In-Reply-To: <40E470D9.8060603@noaa.gov> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <40E470D9.8060603@noaa.gov> Message-ID: <40E473A9.5040109@colorado.edu> Chris Barker wrote:
> Hi all,
>
> I'm looking for a way to read data from ascii text files quickly. I've
> found that using the standard python idioms like:
>
> data = []
> for i in range(N):
>     data.append(map(float, file.readline().split()))
> data = array(data, Float)
>
> Can be pretty slow. What I'd like is something like Matlab's fscanf:
>
> data = fscanf(file, "%g", [M,N] )
>
> I may have the syntax a little wrong, but the gist is there. What Matlab
> does is keep recycling the format string until the desired number of
> elements have been read.
>
> It is quite flexible, and ends up being pretty fast.
>
> Has anyone written something like this for Numeric (or numarray, but I'd
> prefer Numeric at this point) ?
>
> I was surprised not to find something like this in SciPy, maybe I didn't
> look hard enough.

scipy.io.read_array? I haven't timed it, because it's been 'fast enough' for my needs.
For reading binary data files, I have this little utility which is basically a wrapper around Numeric.fromstring (N below is Numeric imported 'as N'). Note that it can read binary .gz files directly, a _huge_ gain for very sparse files representing 3d arrays (I can read a 400k gz file which blows up to ~60MB when unzipped in no time at all, while reading the unzipped file is very slow):

def read_bin(fname,dims,typecode,recast_type=None,offset=0,verbose=0):
    """Read in a binary data file. Does NOT check for endianness issues.

    Inputs:
      fname - can be .gz
      dims (nx1,nx2,...,nxd)
      typecode
      recast_type
      offset=0: # of bytes to skip in file *from the beginning* before data starts
    """
    # config parameters
    item_size = N.zeros(1,typecode).itemsize()  # size in bytes
    data_size = N.product(N.array(dims))*item_size
    # read in data
    if fname.endswith('.gz'):
        data_file = gzip.open(fname)
    else:
        data_file = file(fname)
    data_file.seek(offset)
    data = N.fromstring(data_file.read(data_size),typecode)
    data_file.close()
    data.shape = dims
    if verbose:
        #print 'Read',data_size/item_size,'data points. Shape:',dims
        print 'Read',N.size(data),'data points. Shape:',dims
    if recast_type is not None:
        data = data.astype(recast_type)
    return data

HTH, f From squirrel at WPI.EDU Thu Jul 1 13:37:13 2004 From: squirrel at WPI.EDU (Christopher T King) Date: Thu Jul 1 13:37:13 2004 Subject: [Numpy-discussion] numarray and SMP Message-ID: (I originally posted this in comp.lang.python and was redirected here) In a quest to speed up numarray computations, I tried writing a 'threaded array' class for use on SMP systems that would distribute its workload across the processors. I hit a snag when I found out that since the Python interpreter is not reentrant, this effectively disables parallel processing in Python.
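The 'threaded array' idea needs Python-level glue that splits the work across worker threads; the C vector core would bracket its loop with Py_BEGIN_ALLOW_THREADS / Py_END_ALLOW_THREADS so the workers can actually overlap. A stdlib-only sketch of that glue follows — the function name is hypothetical and a plain list stands in for a numarray, so here the GIL makes the threads interleave rather than truly run in parallel:

```python
import threading

def threaded_apply(func, data, nthreads=4):
    # Split `data` into nthreads contiguous chunks and apply `func`
    # elementwise, one worker thread per chunk.  This only pays off
    # when func's C implementation releases the GIL; with a pure
    # Python func the threads merely take turns.
    n = len(data)
    chunk = (n + nthreads - 1) // nthreads
    out = [None] * n
    def worker(lo, hi):
        for i in range(lo, hi):
            out[i] = func(data[i])
    threads = []
    for t in range(nthreads):
        lo, hi = t * chunk, min((t + 1) * chunk, n)
        threads.append(threading.Thread(target=worker, args=(lo, hi)))
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return out
```

Creating the threads once and reusing them across iterations, as described below for the benchmark, avoids paying thread-startup cost per operation.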
I've come up with two solutions to this problem, both involving numarray's C functions that perform the actual vector operations: 1) Surround the C vector operations with Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS, thus allowing the vector operations (which don't access Python structures) to run in parallel with the interpreter. Python glue code would take care of threading and locking. 2) Move the parallelization into the C vector functions themselves. This would likely get poorer performance (a chain of vector operations couldn't be combined into one threaded operation). I'd much rather do #1, but will playing around with the interpreter state like that cause any problems? Update from original posting: I've partially implemented method #1 for Float64s. Running on four 2.4GHz Xeons (possibly two with hyperthreading?), I get about a 30% speedup while dividing 10 million Float64s, but a small (<10%) slowdown doing addition or multiplication. The operation was repeated 100 times, with the threads created outside of the loop (i.e. the threads weren't recreated for each iteration). Is there really that much overhead in Python? I can post the code I'm using and the numarray patch if it's requested. From gerard.vermeulen at grenoble.cnrs.fr Thu Jul 1 13:40:07 2004 From: gerard.vermeulen at grenoble.cnrs.fr (gerard.vermeulen at grenoble.cnrs.fr) Date: Thu Jul 1 13:40:07 2004 Subject: [Numpy-discussion] Numarray header PEP In-Reply-To: <40E46CD3.9090802@sympatico.ca> References: <1088451653.3744.200.camel@localhost.localdomain> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <200407010904.25498.haase@msg.ucsf.edu> <40E46CD3.9090802@sympatico.ca> Message-ID: <20040701200934.M74616@grenoble.cnrs.fr> On Thu, 01 Jul 2004 15:58:11 -0400, Colin J. 
Williams wrote > Sebastian Haase wrote: > > >On Wednesday 30 June 2004 11:33 pm, gerard.vermeulen at grenoble.cnrs.fr wrote: > > > > > >>On 30 Jun 2004 17:54:19 -0400, Todd Miller wrote > >> > >> > >> > >>>So... you use the "meta" code to provide package specific ordinary > >>>(not-macro-fied) functions to keep the different versions of the > >>>Present() and isArray() macros from conflicting. > >>> > >>>It would be nice to have a standard approach for using the same > >>>"extension enhancement code" for both numarray and Numeric. The PEP > >>>should really be expanded to provide an example of dual support for one > >>>complete and real function, guts and all, so people can see the process > >>>end-to-end; Something like a simple arrayprint. That process needs > >>>to be refined to remove as much tedium and duplication of effort as > >>>possible. The idea is to make it as close to providing one > >>>implementation to support both array packages as possible. I think it's > >>>important to illustrate how to partition the extension module into > >>>separate compilation units which correctly navigate the dual > >>>implementation mine field in the easiest possible way. > >>> > >>>It would also be nice to add some logic to the meta-functions so that > >>>which array package gets used is configurable. We did something like > >>>that for the matplotlib plotting software at the Python level with > >>>the "numerix" layer, an idea I think we copied from Chaco. 
The kind > >>>of dispatch I think might be good to support configurability looks like > >>>this: > >>> > >>>PyObject * > >>>whatsThis(PyObject *dummy, PyObject *args) > >>>{ > >>> PyObject *result, *what = NULL; > >>> if (!PyArg_ParseTuple(args, "O", &what)) > >>> return 0; > >>> switch(PyArray_Which(what)) { > >>> USE_NUMERIC: > >>> result = Numeric_whatsThis(what); break; > >>> USE_NUMARRAY: > >>> result = Numarray_whatsThis(what); break; > >>> USE_SEQUENCE: > >>> result = Sequence_whatsThis(what); break; > >>> } > >>> Py_INCREF(Py_None); > >>> return Py_None; > >>>} > >>> > >>>In the above, I'm picturing a separate .c file for Numeric_whatsThis > >>>and for Numarray_whatsThis. It would be nice to streamline that to one > >>>.c and a process which somehow (simply) produces both functions. > >>> > >>>Or, ideally, the above would be done more like this: > >>> > >>>PyObject * > >>>whatsThis(PyObject *dummy, PyObject *args) > >>>{ > >>> PyObject *result, *what = NULL; > >>> if (!PyArg_ParseTuple(args, "O", &what)) > >>> return 0; > >>> switch(Numerix_Which(what)) { > >>> USE_NUMERIX: > >>> result = Numerix_whatsThis(what); break; > >>> USE_SEQUENCE: > >>> result = Sequence_whatsThis(what); break; > >>> } > >>> Py_INCREF(Py_None); > >>> return Py_None; > >>>} > >>> > >>>Here, a common Numerix implementation supports both numarray and Numeric > >>>from a single simple .c. The extension module would do "#include > >>>numerix/arrayobject.h" and "import_numerix()" and otherwise just call > >>>PyArray_* functions. > >>> > >>>The current stumbling block is that numarray is not binary compatible > >>>with Numeric... so numerix in C falls apart. I haven't analyzed > >>>every symbol and struct to see if it is really feasible... but it > >>>seems like it is *almost* feasible, at least for typical usage. 
> >>> > >>>So, in a nutshell, I think the dual implementation support you > >>>demoed is important and we should work up an example and kick it > >>>around to make sure it's the best way we can think of doing it. > >>>Then we should add a section to the PEP describing dual support as well. > >>> > >>> > >>I would never apply numarray code to Numeric arrays and the inverse. It > >>looks dangerous and I do not know if it is possible. The first thing > >>coming to mind is that numarray and Numeric arrays refer to different type > >>objects (this is what my pep module uses to differentiate them). So, even > >>if numarray and Numeric are binary compatible, any 'alien' code referring > >>the the 'Python-standard part' of the type objects may lead to surprises. A > >>PEP proposing hacks will raise eyebrows at least. > >> > >>Secondly, most people use Numeric *or* numarray and not both. > >> > >>So, I prefer: Numeric In => Numeric Out or Numarray In => Numarray Out > >>(NINO) Of course, Numeric or numarray output can be a user option if NINO > >>does not apply. (explicit safe conversion between Numeric and numarray is > >>possible if really needed). > >> > >>I'll try to flesh out the demo with real functions in the way you indicated > >>(going as far as I consider safe). > >> > >>The problem of coding the Numeric (or numarray) functions in more than > >>a single source file has also be addressed. > >> > >>It may take 2 weeks because I am off to a conference next week. > >> > >>Regards -- Gerard > >> > >> > > > >Hi all, > >first, I would like to state that I don't understand much of this discussion; > >so the only comment I wanted to make is that IF this where possible, to make > >(C/C++) code that can live with both Numeric and numarray, then I think it > >would be used more and more - think: transition phase !! (e.g. 
someone could > >start making the FFTW part of scipy numarray friendly without having to > >switch everything at one [hint ;-)] ) > > > >These where just my 2 cents. > >Cheers, > >Sebastian Haase > > > > > I feel lower on the understanding tree with respect to what is being > proposed in the draft PEP, but would still like to offer my 2 cents > worth. I get the feeling that numarray is being bent out of shape > to fit Numeric. > What we are discussing are methods to make it possible to import Numeric and numarray in the same extension module. This can be done by separating the colliding APIs of Numeric and numarray in separate *.c files. To achieve this, no changes to Numeric and numarray itself are necessary. In fact, this can be done by the author of the C-extension himself, but since it is not obvious we discuss the best methods and we like to provide the necessary glue code. It will make life easier for extension writers and facilitate the transition to numarray. Try to look at the problem from the other side: I am using Numeric (since my life depends on SciPy) but have written an extension that can also import numarray (hoping to get more users). I will never use the methods proposed in the draft PEP, because it excludes importing Numeric. > > It was my understanding that Numeric had certain weakness which made > it unacceptable as a Python component and that numarray was intended > to provide the same or better functionality within a pythonic framework. > > numarray has not achieved the expected performance level to date, > but progress is being made and I believe that, for larger arrays, > numarray has been shown to be be superior to Numeric - please > correct me if I'm wrong here. > I think you are correct. I don't know why the __init__ has disappeared, but I don't think it is because of the PEP and certainly not because of the thread. 
> > The shock came for me when Todd Miller said: > > <> > I looked at this some, and while INCREFing __dict__ maybe the right > idea, I forgot that there *is no* Python NumArray.__init__ anymore. > > Wasn't it the intent of numarray to work towards the full use of the > Python class structure to provide the benefits which it offers? > > The Python class has two constructors and one destructor. > > The constructors are __init__ and __new__, the latter only provides > the shell of an instance which later has to be initialized. In > version 0.9, which I use, there is no __new__, but there is a new > function which has a functionality similar to that intended for > __new__. Thus, with this change, numarray appears to be moving > further away from being pythonic. > Gerard From jmiller at stsci.edu Thu Jul 1 13:46:07 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 1 13:46:07 2004 Subject: [Numpy-discussion] Numarray header PEP In-Reply-To: <40E46CD3.9090802@sympatico.ca> References: <1088451653.3744.200.camel@localhost.localdomain> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <200407010904.25498.haase@msg.ucsf.edu> <40E46CD3.9090802@sympatico.ca> Message-ID: <1088714723.14402.114.camel@halloween.stsci.edu> On Thu, 2004-07-01 at 15:58, Colin J. Williams wrote: > Sebastian Haase wrote: > > >On Wednesday 30 June 2004 11:33 pm, gerard.vermeulen at grenoble.cnrs.fr wrote: > > > > > >>On 30 Jun 2004 17:54:19 -0400, Todd Miller wrote > >> > >> > >> > >>>So... you use the "meta" code to provide package specific ordinary > >>>(not-macro-fied) functions to keep the different versions of the > >>>Present() and isArray() macros from conflicting. > >>> > >>>It would be nice to have a standard approach for using the same > >>>"extension enhancement code" for both numarray and Numeric. 
The PEP
> >>>should really be expanded to provide an example of dual support for one
> >>>complete and real function, guts and all, so people can see the process
> >>>end-to-end; something like a simple arrayprint. That process needs
> >>>to be refined to remove as much tedium and duplication of effort as
> >>>possible. The idea is to make it as close to providing one
> >>>implementation to support both array packages as possible. I think it's
> >>>important to illustrate how to partition the extension module into
> >>>separate compilation units which correctly navigate the dual
> >>>implementation minefield in the easiest possible way.
> >>>
> >>>It would also be nice to add some logic to the meta-functions so that
> >>>which array package gets used is configurable. We did something like
> >>>that for the matplotlib plotting software at the Python level with
> >>>the "numerix" layer, an idea I think we copied from Chaco. The kind
> >>>of dispatch I think might be good to support configurability looks like
> >>>this:
> >>>
> >>>PyObject *
> >>>whatsThis(PyObject *dummy, PyObject *args)
> >>>{
> >>>    PyObject *result, *what = NULL;
> >>>    if (!PyArg_ParseTuple(args, "O", &what))
> >>>        return 0;
> >>>    switch(PyArray_Which(what)) {
> >>>    case USE_NUMERIC:
> >>>        result = Numeric_whatsThis(what); break;
> >>>    case USE_NUMARRAY:
> >>>        result = Numarray_whatsThis(what); break;
> >>>    case USE_SEQUENCE:
> >>>        result = Sequence_whatsThis(what); break;
> >>>    default:
> >>>        Py_INCREF(Py_None);
> >>>        result = Py_None;
> >>>    }
> >>>    return result;
> >>>}
> >>>
> >>>In the above, I'm picturing a separate .c file for Numeric_whatsThis
> >>>and for Numarray_whatsThis. It would be nice to streamline that to one
> >>>.c and a process which somehow (simply) produces both functions.
> >>>
> >>>Or, ideally, the above would be done more like this:
> >>>
> >>>PyObject *
> >>>whatsThis(PyObject *dummy, PyObject *args)
> >>>{
> >>>    PyObject *result, *what = NULL;
> >>>    if (!PyArg_ParseTuple(args, "O", &what))
> >>>        return 0;
> >>>    switch(Numerix_Which(what)) {
> >>>    case USE_NUMERIX:
> >>>        result = Numerix_whatsThis(what); break;
> >>>    case USE_SEQUENCE:
> >>>        result = Sequence_whatsThis(what); break;
> >>>    default:
> >>>        Py_INCREF(Py_None);
> >>>        result = Py_None;
> >>>    }
> >>>    return result;
> >>>}
> >>>
> >>>Here, a common Numerix implementation supports both numarray and Numeric
> >>>from a single simple .c. The extension module would do "#include
> >>>numerix/arrayobject.h" and "import_numerix()" and otherwise just call
> >>>PyArray_* functions.
> >>>
> >>>The current stumbling block is that numarray is not binary compatible
> >>>with Numeric... so numerix in C falls apart. I haven't analyzed
> >>>every symbol and struct to see if it is really feasible... but it
> >>>seems like it is *almost* feasible, at least for typical usage.
> >>>
> >>>So, in a nutshell, I think the dual implementation support you
> >>>demoed is important and we should work up an example and kick it
> >>>around to make sure it's the best way we can think of doing it.
> >>>Then we should add a section to the PEP describing dual support as well.
> >>
> >>I would never apply numarray code to Numeric arrays or vice versa. It
> >>looks dangerous and I do not know if it is possible. The first thing
> >>coming to mind is that numarray and Numeric arrays refer to different type
> >>objects (this is what my pep module uses to differentiate them). So, even
> >>if numarray and Numeric are binary compatible, any 'alien' code referring
> >>to the 'Python-standard part' of the type objects may lead to surprises. A
> >>PEP proposing hacks will raise eyebrows at least.
> >>
> >>Secondly, most people use Numeric *or* numarray and not both.
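At the Python level, the configurable dispatch sketched above can be modeled the way matplotlib's "numerix" layer did it. The sketch below is hypothetical: the predicates, handler names, and backend stand-ins (bytearray/memoryview standing in for Numeric/numarray arrays) are invented for illustration and are not part of either package.

```python
# Hypothetical "numerix"-style dispatch layer (not actual numarray/Numeric
# API): route a call to whichever backend recognizes the input, falling
# back to generic sequence handling, like PyArray_Which() in the C sketch.

def make_dispatcher(backends, fallback):
    """backends: list of (recognizes, handler) pairs, tried in order."""
    def dispatch(obj):
        for recognizes, handler in backends:
            if recognizes(obj):
                return handler(obj)
        return fallback(obj)
    return dispatch

# Stand-ins for Numeric_whatsThis / Numarray_whatsThis / Sequence_whatsThis:
whats_this = make_dispatcher(
    backends=[
        (lambda o: isinstance(o, bytearray), lambda o: "numeric-like"),
        (lambda o: isinstance(o, memoryview), lambda o: "numarray-like"),
    ],
    fallback=lambda o: "sequence",
)

assert whats_this(bytearray(b"x")) == "numeric-like"   # backend 1 handles it
assert whats_this(memoryview(b"x")) == "numarray-like" # backend 2 handles it
assert whats_this([1, 2, 3]) == "sequence"             # generic fallback
```

Because the handler is chosen from the input's type, this shape also gives Gerard's NINO behavior for free: Numeric in => Numeric out.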
> >>
> >>So, I prefer: Numeric In => Numeric Out or Numarray In => Numarray Out
> >>(NINO). Of course, Numeric or numarray output can be a user option if NINO
> >>does not apply. (Explicit safe conversion between Numeric and numarray is
> >>possible if really needed.)
> >>
> >>I'll try to flesh out the demo with real functions in the way you indicated
> >>(going as far as I consider safe).
> >>
> >>The problem of coding the Numeric (or numarray) functions in more than
> >>a single source file has also been addressed.
> >>
> >>It may take 2 weeks because I am off to a conference next week.
> >>
> >>Regards -- Gerard
> >>
> >
> >Hi all,
> >first, I would like to state that I don't understand much of this discussion;
> >so the only comment I wanted to make is that IF this were possible, to make
> >(C/C++) code that can live with both Numeric and numarray, then I think it
> >would be used more and more - think: transition phase !! (e.g. someone could
> >start making the FFTW part of scipy numarray friendly without having to
> >switch everything at once [hint ;-)] )
> >
> >These were just my 2 cents.
> >Cheers,
> >Sebastian Haase
> >
>
> I feel lower on the understanding tree with respect to what is being
> proposed in the draft PEP, but would still like to offer my 2 cents
> worth. I get the feeling that numarray is being bent out of shape to
> fit Numeric.

Yes and no. The numarray team has over time realized the importance of backward compatibility with the dominant array package, Numeric. A lot of people use Numeric now. We're trying to make it as easy as possible to use numarray.

> It was my understanding that Numeric had certain weaknesses which made it
> unacceptable as a Python component and that numarray was intended to
> provide the same or better functionality within a pythonic framework.

My understanding is that until there is a consensus on an array package, neither numarray nor Numeric is going into the Python core.
> numarray has not achieved the expected performance level to date, but
> progress is being made and I believe that, for larger arrays, numarray
> has been shown to be superior to Numeric - please correct me if I'm
> wrong here.

I think that's a fair summary.

> The shock came for me when Todd Miller said:
>
> <>
> I looked at this some, and while INCREFing __dict__ may be the right
> idea, I forgot that there *is no* Python NumArray.__init__ anymore.
>
> Wasn't it the intent of numarray to work towards the full use of the
> Python class structure to provide the benefits which it offers?

Ack. I wasn't trying to start a panic. The __init__ still exists, as does __new__; they're just in C. Sorry if I was unclear.

> The Python class has two constructors and one destructor.

We're mostly on the same page.

> The constructors are __init__ and __new__, the latter only provides the
> shell of an instance which later has to be initialized. In version 0.9,
> which I use, there is no __new__,

It's there, but it's not very useful:

>>> import numarray
>>> numarray.NumArray.__new__
>>> a = numarray.NumArray.__new__(numarray.NumArray)
>>> a.info()
class:
shape: ()
strides: ()
byteoffset: 0
bytestride: 0
itemsize: 0
aligned: 1
contiguous: 1
data: None
byteorder: little
byteswap: 0
type: Any

I don't, however, recommend doing this.

> but there is a new function which has
> functionality similar to that intended for __new__. Thus, with this
> change, numarray appears to be moving further away from being pythonic.

Nope. I'm talking about moving toward better speed with no change in functionality at the Python level. I also think maybe we've gotten list threads crossed here: the "Numarray header PEP" thread is independent of (but admittedly related to) the "Speeding up wxPython/numarray" thread. The Numarray header PEP is about making it easy for packages to write C extensions which *optionally* support numarray (and now Numeric as well).
One aspect of the PEP is getting headers included in the Python core so that extensions can be compiled even when the numarray is not installed. The other aspect will be illustrating a good technique for supporting both numarray and Numeric, optionally and with choice, at the same time. Such an extension would still run where there is numarray, Numeric, both, or none installed. Gerard V. has already done some integration of numarray and Numeric with PyQwt so he has a few good ideas on how to do the "good technique" aspect of the PEP. The Speeding up wxPython/numarray thread is about improving the performance of a 50000 point wxPython drawlines which is 10x slower with numarray than Numeric. Tim H. and Chris B. have nailed this down (mostly) to the numarray sequence protocol and destructor, __del__. Regards, Todd From perry at stsci.edu Thu Jul 1 13:57:02 2004 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jul 1 13:57:02 2004 Subject: [Numpy-discussion] Numarray header PEP In-Reply-To: <40E46CD3.9090802@sympatico.ca> Message-ID: Collin J. Williams Wrote: > I feel lower on the understanding tree with respect to what is being > proposed in the draft PEP, but would still like to offer my 2 cents > worth. I get the feeling that numarray is being bent out of shape to > fit Numeric. > Todd and Gerard address this point well. > It was my understanding that Numeric had certain weakness which made it > unacceptable as a Python component and that numarray was intended to > provide the same or better functionality within a pythonic framework. > Let me reiterate what our motivations were. We wanted to use an array package for our software, and Numeric had enough shortcomings that we needed some changes in behavior (e.g., type coercion for scalars), changes in performance (particularly with regard to memory usage), and enhancements in capabilities (e.g., memory mapping, record arrays, etc.). 
It was the opinion of some (Paul Dubois, for example) that a rewrite was in order in any case since the code was not that maintainable (not everyone felt this way, though at the time that wasn't as clear). At the same time there was some hope that Numeric could be accepted into the standard Python distribution. That's something we thought would be good (but wasn't the highest priority for us), and I've come to believe that perhaps a better solution in that regard is what this PEP is trying to address. In any case, Guido made it clear that he would not accept Numeric in its (then) current form.

That it be written mostly in Python was something suggested by Guido, and we started off that way, mainly because it would get us going much faster than writing it all in C. We definitely understood that it would also have the consequence of making small array performance worse. We said as much when we started; it wasn't as clear then as it is now that many users objected to a factor of a few slower performance (as it turned out, a mostly Python based implementation was more than an order of magnitude slower for small arrays).

> numarray has not achieved the expected performance level to date, but
> progress is being made and I believe that, for larger arrays, numarray
> has been shown to be superior to Numeric - please correct me if I'm
> wrong here.

We never expected numarray to ever reach the performance level for small arrays that Numeric has. If it were within a factor of two I would be thrilled (it's more like a factor of 3 or 4 currently for simple ufuncs). I still don't think it ever will be as fast for small arrays. The focus all along was on handling large arrays, which I think it does quite well, both with regard to memory and speed. Yes, there are some functions and operations that may be much slower. Mainly they need to be called out so they can be improved. Generally we only notice performance issues that affect our software.
Others need to point out remaining large discrepancies. I'm still of the opinion that if small array performance is really important, a very different approach, with a completely different implementation, should be used. I would think that improvements of an order of magnitude over what Numeric does now are possible. But since that isn't important to us (STScI), don't expect us to work on that :-)

> The shock came for me when Todd Miller said:
>
> <>
> I looked at this some, and while INCREFing __dict__ may be the right
> idea, I forgot that there *is no* Python NumArray.__init__ anymore.
>
> Wasn't it the intent of numarray to work towards the full use of the
> Python class structure to provide the benefits which it offers?
>
> The Python class has two constructors and one destructor.
>
> The constructors are __init__ and __new__, the latter only provides the
> shell of an instance which later has to be initialized. In version 0.9,
> which I use, there is no __new__, but there is a new function which has
> functionality similar to that intended for __new__. Thus, with this
> change, numarray appears to be moving further away from being pythonic.

I'll agree that optimization is driving the underlying implementation to one that is more complex, and that is the drawback (no surprise there). There's Pythonic in use and Pythonic in implementation. We are certainly receptive to better ideas for the implementation, but I doubt that a heavily Python-based implementation is ever going to be competitive for small arrays (unless something like psyco becomes universal, but I think there are a whole mess of problems to be solved for that kind of approach to work well generically).
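The `__new__`/`__init__` two-step debated in this exchange is plain Python protocol, not a numarray invention. A minimal sketch with an ordinary class (the `Box` class is hypothetical, and the code is modern Python rather than the 2.2-era syntax of the thread):

```python
class Box:
    """Hypothetical class showing the two-step construction protocol."""
    def __new__(cls, *args):
        # __new__ allocates the bare instance -- the "shell" described above.
        self = object.__new__(cls)
        self.initialized = False
        return self

    def __init__(self, value):
        # __init__ then fills the shell in.
        self.value = value
        self.initialized = True

b = Box(42)                  # normal construction runs __new__ then __init__
assert b.initialized and b.value == 42

shell = Box.__new__(Box)     # like numarray.NumArray.__new__(numarray.NumArray):
assert not shell.initialized # an allocated but uninitialized shell
```

This mirrors Todd's interpreter session: calling `__new__` directly yields a shell with empty/default state, and it works the same whether the two methods are written in Python or in C.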
Perry From perry at stsci.edu Thu Jul 1 15:01:04 2004 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jul 1 15:01:04 2004 Subject: [Numpy-discussion] numarray and SMP In-Reply-To: Message-ID: Christopher T King wrote: > > (I originally posted this in comp.lang.python and was redirected here) > > In a quest to speed up numarray computations, I tried writing a 'threaded > array' class for use on SMP systems that would distribute its workload > across the processors. I hit a snag when I found out that since > the Python > interpreter is not reentrant, this effectively disables parallel > processing in Python. I've come up with two solutions to this problem, > both involving numarray's C functions that perform the actual vector > operations: > > 1) Surround the C vector operations with Py_BEGIN_ALLOW_THREADS and > Py_END_ALLOW_THREADS, thus allowing the vector operations (which don't > access Python structures) to run in parallel with the interpreter. > Python glue code would take care of threading and locking. > > 2) Move the parallelization into the C vector functions themselves. This > would likely get poorer performance (a chain of vector operations > couldn't be combined into one threaded operation). > > I'd much rather do #1, but will playing around with the interpreter state > like that cause any problems? > I don't think so, but it raises a number of questions that I ask just below. > Update from original posting: > > I've partially implemented method #1 for Float64s. Running on four 2.4GHz > Xeons (possibly two with hyperthreading?), I get about a 30% speedup while > dividing 10 million Float64s, but a small (<10%) slowdown doing addition > or multiplication. The operation was repeated 100 times, with the threads > created outside of the loop (i.e. the threads weren't recreated for each > iteration). Is there really that much overhead in Python? I can post the > code I'm using and the numarray patch if it's requested. 
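Chris's method #1 — one Python thread per chunk of the array, with the heavy loop ideally running while the GIL is released — can be sketched in pure Python. Note this shows only the chunking and thread bookkeeping: with a plain Python loop in the worker no real parallelism is gained, because the GIL is only released inside C code bracketed by Py_BEGIN_ALLOW_THREADS/Py_END_ALLOW_THREADS. The function and its chunking scheme are illustrative, not Chris's actual patch.

```python
import threading

def chunked_divide(a, b, nthreads=4):
    """Divide sequence a by sequence b elementwise, splitting the work
    into nthreads chunks, each handled by its own thread."""
    n = len(a)
    out = [None] * n

    def worker(lo, hi):
        # In the real scheme this loop would be a C vector op that
        # releases the GIL; here it is plain Python for illustration.
        for i in range(lo, hi):
            out[i] = a[i] / b[i]

    step = (n + nthreads - 1) // nthreads
    threads = [threading.Thread(target=worker, args=(lo, min(lo + step, n)))
               for lo in range(0, n, step)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out

a = [float(i) for i in range(1, 101)]
b = [2.0] * 100
assert chunked_divide(a, b) == [x / 2.0 for x in a]
```

Creating the threads once outside a timing loop, as Chris describes, avoids repaying thread start-up cost on every iteration.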
> Questions and comments: 1) I suppose you did this for generated ufunc code? (ideally one would put this in the codegenerator stuff but for the purposes of testing it would be fine). I guess we would like to see how you actually changed the code fragment (you can email me or Todd Miller directly if you wish) 2) How much improvement you would see depends on many details. But if you were doing this for 10 million element arrays, I'm surprised you saw such a small improvement (30% for 4 processors isn't worth the trouble it would seem). So seeing the actual test code would be helpful. If the array operation you are doing for numarray aren't simple (that's a specialized use of the word; by that I mean if the arrays are not the same type, aren't contiguous, aren't aligned, or aren't of proper byte-order) then there are a number of other issues that may slow it down quite a bit (and there are ways of improving these for parallel processing). 3) I don't speak as an expert on threading or parallel processors, but I believe so long as you don't call any Python API functions (either directly or indirectly) between the global interpreter lock release and reacquisition, you should be fine. The vector ufunc code in numarray should satisfy this fine. Perry Greenfield From squirrel at WPI.EDU Fri Jul 2 06:37:20 2004 From: squirrel at WPI.EDU (Christopher T King) Date: Fri Jul 2 06:37:20 2004 Subject: [Numpy-discussion] numarray and SMP In-Reply-To: Message-ID: On Thu, 1 Jul 2004, Perry Greenfield wrote: > 1) I suppose you did this for generated ufunc code? (ideally one > would put this in the codegenerator stuff but for the purposes > of testing it would be fine). I guess we would like to see > how you actually changed the code fragment (you can email > me or Todd Miller directly if you wish) Yep, I didn't know it was automatically generated :P > 2) How much improvement you would see depends on many details. 
> But if you were doing this for 10 million element arrays, I'm > surprised you saw such a small improvement (30% for 4 processors > isn't worth the trouble it would seem). So seeing the actual > test code would be helpful. If the array operation you are doing > for numarray aren't simple (that's a specialized use of the word; > by that I mean if the arrays are not the same type, aren't > contiguous, aren't aligned, or aren't of proper byte-order) > then there are a number of other issues that may slow it down > quite a bit (and there are ways of improving these for > parallel processing). I've been careful not to use anything to cause discontiguities in the arrays, and to keep them all the same type (Float64 in this case). See my next post for the code I'm using. From haase at msg.ucsf.edu Fri Jul 2 08:28:01 2004 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Fri Jul 2 08:28:01 2004 Subject: [Numpy-discussion] bug in numarray.maximum.reduce ? In-Reply-To: <200406291705.55454.haase@msg.ucsf.edu> References: <200406291705.55454.haase@msg.ucsf.edu> Message-ID: <200407020827.05407.haase@msg.ucsf.edu> On Tuesday 29 June 2004 05:05 pm, Sebastian Haase wrote: > Hi, > > Is this a bug?: > >>> # (import numarray as na ; 'd' is a 3 dimensional array) > >>> d.type() > > Float32 > > >>> d[80, 136, 122] > > 80.3997039795 > > >>> na.maximum.reduce(d[:,136, 122]) > > 85.8426361084 > > >>> na.maximum.reduce(d) [136, 122] > > 37.3658103943 > > >>> na.maximum.reduce(d,0)[136, 122] > > 37.3658103943 > > >>> na.maximum.reduce(d,1)[136, 122] > > Traceback (most recent call last): > File "", line 1, in ? > IndexError: Index out of range > > I was using na.maximum.reduce(d) to get a "pixelwise" maximum along Z > (axis 0). But as seen above it does not get it right. 
I then tried to > reproduce > > this with some simple arrays, but here it works just fine: > >>> a = na.arange(4*4*4) > >>> a.shape=(4,4,4) > >>> na.maximum.reduce(a) > > [[48 49 50 51] > [52 53 54 55] > [56 57 58 59] > [60 61 62 63]] > > >>> a = na.arange(4*4*4).astype(na.Float32) > >>> a.shape=(4,4,4) > >>> na.maximum.reduce(a) > > [[ 48. 49. 50. 51.] > [ 52. 53. 54. 55.] > [ 56. 57. 58. 59.] > [ 60. 61. 62. 63.]] > > > Any hint ? > > Regards, > Sebastian Haase Hi again, I think the reason that no one responded to this is that it just sounds to unbelievable ... Sorry for the missing piece of information, but 'd' is actually a memmapped array ! >>> d.info() class: shape: (80, 150, 150) strides: (90000, 600, 4) byteoffset: 0 bytestride: 4 itemsize: 4 aligned: 1 contiguous: 1 data: byteorder: big byteswap: 1 type: Float32 >>> dd = d.copy() >>> na.maximum.reduce(dd[:,136, 122]) 85.8426361084 >>> na.maximum.reduce(dd)[136, 122] 85.8426361084 >>> Apparently we are using memmap so frequently now that I didn't even think about that - which is good news for everyone, because it means that it works (mostly). I just see that 'byteorder' is 'big' - I'm running this on an Intel Linux PC. Could this be the problem? Please some comments ! Thanks, Sebastian From jmiller at stsci.edu Fri Jul 2 09:03:08 2004 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jul 2 09:03:08 2004 Subject: [Numpy-discussion] bug in numarray.maximum.reduce ? 
In-Reply-To: <200407020827.05407.haase@msg.ucsf.edu> References: <200406291705.55454.haase@msg.ucsf.edu> <200407020827.05407.haase@msg.ucsf.edu> Message-ID: <1088784157.26482.14.camel@halloween.stsci.edu> On Fri, 2004-07-02 at 11:27, Sebastian Haase wrote: > On Tuesday 29 June 2004 05:05 pm, Sebastian Haase wrote: > > Hi, > > > > Is this a bug?: > > >>> # (import numarray as na ; 'd' is a 3 dimensional array) > > >>> d.type() > > > > Float32 > > > > >>> d[80, 136, 122] > > > > 80.3997039795 > > > > >>> na.maximum.reduce(d[:,136, 122]) > > > > 85.8426361084 > > > > >>> na.maximum.reduce(d) [136, 122] > > > > 37.3658103943 > > > > >>> na.maximum.reduce(d,0)[136, 122] > > > > 37.3658103943 > > > > >>> na.maximum.reduce(d,1)[136, 122] > > > > Traceback (most recent call last): > > File "", line 1, in ? > > IndexError: Index out of range > > > > I was using na.maximum.reduce(d) to get a "pixelwise" maximum along Z > > (axis 0). But as seen above it does not get it right. I then tried to > > reproduce > > > > this with some simple arrays, but here it works just fine: > > >>> a = na.arange(4*4*4) > > >>> a.shape=(4,4,4) > > >>> na.maximum.reduce(a) > > > > [[48 49 50 51] > > [52 53 54 55] > > [56 57 58 59] > > [60 61 62 63]] > > > > >>> a = na.arange(4*4*4).astype(na.Float32) > > >>> a.shape=(4,4,4) > > >>> na.maximum.reduce(a) > > > > [[ 48. 49. 50. 51.] > > [ 52. 53. 54. 55.] > > [ 56. 57. 58. 59.] > > [ 60. 61. 62. 63.]] > > > > > > Any hint ? > > > > Regards, > > Sebastian Haase > > Hi again, > I think the reason that no one responded to this is that it just sounds to > unbelievable ... This just slipped through the cracks for me. > Sorry for the missing piece of information, but 'd' is actually a memmapped > array ! 
> >>> d.info() > class: > shape: (80, 150, 150) > strides: (90000, 600, 4) > byteoffset: 0 > bytestride: 4 > itemsize: 4 > aligned: 1 > contiguous: 1 > data: > byteorder: big > byteswap: 1 > type: Float32 > >>> dd = d.copy() > >>> na.maximum.reduce(dd[:,136, 122]) > 85.8426361084 > >>> na.maximum.reduce(dd)[136, 122] > 85.8426361084 > >>> > > Apparently we are using memmap so frequently now that I didn't even think > about that - which is good news for everyone, because it means that it works > (mostly). > > I just see that 'byteorder' is 'big' - I'm running this on an Intel Linux PC. > Could this be the problem? I think byteorder is a good guess at this point. What version of Python and numarray are you using? Regards, Todd From haase at msg.ucsf.edu Fri Jul 2 10:46:01 2004 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Fri Jul 2 10:46:01 2004 Subject: [Numpy-discussion] bug in numarray.maximum.reduce ? In-Reply-To: <1088784157.26482.14.camel@halloween.stsci.edu> References: <200406291705.55454.haase@msg.ucsf.edu> <200407020827.05407.haase@msg.ucsf.edu> <1088784157.26482.14.camel@halloween.stsci.edu> Message-ID: <200407021045.00866.haase@msg.ucsf.edu> On Friday 02 July 2004 09:02 am, Todd Miller wrote: > On Fri, 2004-07-02 at 11:27, Sebastian Haase wrote: > > On Tuesday 29 June 2004 05:05 pm, Sebastian Haase wrote: > > > Hi, > > > > > > Is this a bug?: > > > >>> # (import numarray as na ; 'd' is a 3 dimensional array) > > > >>> d.type() > > > > > > Float32 > > > > > > >>> d[80, 136, 122] > > > > > > 80.3997039795 > > > > > > >>> na.maximum.reduce(d[:,136, 122]) > > > > > > 85.8426361084 > > > > > > >>> na.maximum.reduce(d) [136, 122] > > > > > > 37.3658103943 > > > > > > >>> na.maximum.reduce(d,0)[136, 122] > > > > > > 37.3658103943 > > > > > > >>> na.maximum.reduce(d,1)[136, 122] > > > > > > Traceback (most recent call last): > > > File "", line 1, in ? 
> > > IndexError: Index out of range > > > > > > I was using na.maximum.reduce(d) to get a "pixelwise" maximum along Z > > > (axis 0). But as seen above it does not get it right. I then tried to > > > reproduce > > > > > > this with some simple arrays, but here it works just fine: > > > >>> a = na.arange(4*4*4) > > > >>> a.shape=(4,4,4) > > > >>> na.maximum.reduce(a) > > > > > > [[48 49 50 51] > > > [52 53 54 55] > > > [56 57 58 59] > > > [60 61 62 63]] > > > > > > >>> a = na.arange(4*4*4).astype(na.Float32) > > > >>> a.shape=(4,4,4) > > > >>> na.maximum.reduce(a) > > > > > > [[ 48. 49. 50. 51.] > > > [ 52. 53. 54. 55.] > > > [ 56. 57. 58. 59.] > > > [ 60. 61. 62. 63.]] > > > > > > > > > Any hint ? > > > > > > Regards, > > > Sebastian Haase > > > > Hi again, > > I think the reason that no one responded to this is that it just sounds > > to unbelievable ... > > This just slipped through the cracks for me. > > > Sorry for the missing piece of information, but 'd' is actually a > > memmapped array ! > > > > >>> d.info() > > > > class: > > shape: (80, 150, 150) > > strides: (90000, 600, 4) > > byteoffset: 0 > > bytestride: 4 > > itemsize: 4 > > aligned: 1 > > contiguous: 1 > > data: > > byteorder: big > > byteswap: 1 > > type: Float32 > > > > >>> dd = d.copy() > > >>> na.maximum.reduce(dd[:,136, 122]) > > > > 85.8426361084 > > > > >>> na.maximum.reduce(dd)[136, 122] > > > > 85.8426361084 > > > > > > Apparently we are using memmap so frequently now that I didn't even think > > about that - which is good news for everyone, because it means that it > > works (mostly). > > > > I just see that 'byteorder' is 'big' - I'm running this on an Intel Linux > > PC. Could this be the problem? > > I think byteorder is a good guess at this point. What version of Python > and numarray are you using? Python 2.2.1 (#1, Feb 28 2004, 00:52:10) [GCC 2.95.4 20011002 (Debian prerelease)] on linux2 numarray 0.9 - from CVS on 2004-05-13. 
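Todd's byteorder guess is easy to illustrate with the struct module: the same eight bytes decoded big-endian vs. little-endian give two different, plausible-looking Float64 values, which is exactly the failure mode of a missed byteswap on a little-endian Intel box. This is a sketch of the symptom only, not the actual numarray code path.

```python
import struct

value = 85.8426361084           # the true maximum from the memmapped array
big = struct.pack(">d", value)  # stored big-endian on disk

# Correct read: interpret the bytes as big-endian.
assert abs(struct.unpack(">d", big)[0] - value) < 1e-9

# Buggy read: a missed byteswap interprets them as native little-endian,
# yielding a wrong but not obviously insane number.
wrong = struct.unpack("<d", big)[0]
assert wrong != value
```

If one code path (the reduction over a memmapped, byteswapped array) skips the swap while another (reduction over a copy) performs it, the two results diverge just as in the report above.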
Regards, Sebastian Haase From jmiller at stsci.edu Fri Jul 2 12:34:09 2004 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jul 2 12:34:09 2004 Subject: [Numpy-discussion] bug in numarray.maximum.reduce ? In-Reply-To: <200407021045.00866.haase@msg.ucsf.edu> References: <200406291705.55454.haase@msg.ucsf.edu> <200407020827.05407.haase@msg.ucsf.edu> <1088784157.26482.14.camel@halloween.stsci.edu> <200407021045.00866.haase@msg.ucsf.edu> Message-ID: <1088796821.5974.15.camel@halloween.stsci.edu> On Fri, 2004-07-02 at 13:45, Sebastian Haase wrote: > On Friday 02 July 2004 09:02 am, Todd Miller wrote: > > On Fri, 2004-07-02 at 11:27, Sebastian Haase wrote: > > > On Tuesday 29 June 2004 05:05 pm, Sebastian Haase wrote: > > > > Hi, > > > > > > > > Is this a bug?: > > > > >>> # (import numarray as na ; 'd' is a 3 dimensional array) > > > > >>> d.type() > > > > > > > > Float32 > > > > > > > > >>> d[80, 136, 122] > > > > > > > > 80.3997039795 > > > > > > > > >>> na.maximum.reduce(d[:,136, 122]) > > > > > > > > 85.8426361084 > > > > > > > > >>> na.maximum.reduce(d) [136, 122] > > > > > > > > 37.3658103943 > > > > > > > > >>> na.maximum.reduce(d,0)[136, 122] > > > > > > > > 37.3658103943 > > > > > > > > >>> na.maximum.reduce(d,1)[136, 122] > > > > > > > > Traceback (most recent call last): > > > > File "", line 1, in ? > > > > IndexError: Index out of range > > > > > > > > I was using na.maximum.reduce(d) to get a "pixelwise" maximum along Z > > > > (axis 0). But as seen above it does not get it right. I then tried to > > > > reproduce > > > > > > > > this with some simple arrays, but here it works just fine: > > > > >>> a = na.arange(4*4*4) > > > > >>> a.shape=(4,4,4) > > > > >>> na.maximum.reduce(a) > > > > > > > > [[48 49 50 51] > > > > [52 53 54 55] > > > > [56 57 58 59] > > > > [60 61 62 63]] > > > > > > > > >>> a = na.arange(4*4*4).astype(na.Float32) > > > > >>> a.shape=(4,4,4) > > > > >>> na.maximum.reduce(a) > > > > > > > > [[ 48. 49. 50. 51.] > > > > [ 52. 53. 54. 
55.] > > > > [ 56. 57. 58. 59.] > > > > [ 60. 61. 62. 63.]] > > > > > > > > > > > > Any hint ? > > > > > > > > Regards, > > > > Sebastian Haase > > > > > > Hi again, > > > I think the reason that no one responded to this is that it just sounds > > > to unbelievable ... > > > > This just slipped through the cracks for me. > > > > > Sorry for the missing piece of information, but 'd' is actually a > > > memmapped array ! > > > > > > >>> d.info() > > > > > > class: > > > shape: (80, 150, 150) > > > strides: (90000, 600, 4) > > > byteoffset: 0 > > > bytestride: 4 > > > itemsize: 4 > > > aligned: 1 > > > contiguous: 1 > > > data: > > > byteorder: big > > > byteswap: 1 > > > type: Float32 > > > > > > >>> dd = d.copy() > > > >>> na.maximum.reduce(dd[:,136, 122]) > > > > > > 85.8426361084 > > > > > > >>> na.maximum.reduce(dd)[136, 122] > > > > > > 85.8426361084 > > > > > > > > > Apparently we are using memmap so frequently now that I didn't even think > > > about that - which is good news for everyone, because it means that it > > > works (mostly). > > > > > > I just see that 'byteorder' is 'big' - I'm running this on an Intel Linux > > > PC. Could this be the problem? > > > > I think byteorder is a good guess at this point. What version of Python > > and numarray are you using? > > Python 2.2.1 (#1, Feb 28 2004, 00:52:10) > [GCC 2.95.4 20011002 (Debian prerelease)] on linux2 > > numarray 0.9 - from CVS on 2004-05-13. > > Regards, > Sebastian Haase Hi Sebastian, I logged this on SF as a bug but won't get to it until next week after numarray-1.0 comes out. Regards, Todd From jmiller at stsci.edu Fri Jul 2 14:06:13 2004 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jul 2 14:06:13 2004 Subject: [Numpy-discussion] ANN: numarray-1.0 released Message-ID: <1088802348.5974.28.camel@halloween.stsci.edu> Release Notes for numarray-1.0 Numarray is an array processing package designed to efficiently manipulate large multi-dimensional arrays. 
Numarray is modeled after Numeric and features c-code generated from python template scripts, the capacity to operate directly on arrays in files, and improved type promotions.

I. ENHANCEMENTS

1. User added ufuncs

There's a setup.py file in numarray-1.0/Examples/ufunc which demonstrates how a numarray user can define their own universal functions of one or two parameters. Ever wanted to write your own bessel() function for use on arrays? Now you can. Your ufunc can use exactly the same machinery as add().

2. Ports of Numeric functions

A bunch of Numeric functions were ported to numarray in the new libnumeric module. To get these, import from numarray.numeric. Most notable among these are put, putmask, take, argmin, and argmax. Also added were sort, argsort, concatenate, repeat, and resize. These are independent ports/implementations in C done for the purpose of best Numeric compatibility and small array performance. The numarray versions, which handle additional cases, still exist and are the default in numarray proper.

3. Faster matrix multiply

The setup for numarray's matrix multiply was moved into C-code. This makes it faster for small matrices.

4. The numarray "header PEP"

A PEP has been started for the inclusion of numarray (and possibly Numeric) C headers into the Python core. The PEP will demonstrate how to provide optional support for arrays (the end-user may or may not have numarray installed and the extension will still work). It may also (eventually) demonstrate how to build extensions which support both numarray and Numeric. Thus, the PEP is seeking to make it possible to distribute extensions which will still compile when numarray (or either) is not present in a user's Python installation, which will work when numarray (or either) is not installed, and which will improve performance when either is installed. The PEP is now in numarray-1.0/Doc/header_pep.txt in docutils format.
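At the Python level, the optional-support idea reads like the pattern below: the module imports and runs whether or not numarray is installed. The `asarray_or_list` helper and its fallback behavior are hypothetical, invented for illustration.

```python
# Optional backend import: the module loads whether or not numarray exists.
try:
    import numarray as backend      # preferred fast path when installed
    HAVE_NUMARRAY = True
except ImportError:
    backend = None
    HAVE_NUMARRAY = False

def asarray_or_list(seq):
    """Use the array backend when present, else degrade gracefully
    (hypothetical helper, for illustration only)."""
    if HAVE_NUMARRAY:
        return backend.array(seq)
    return list(seq)

result = asarray_or_list([1, 2, 3])
assert HAVE_NUMARRAY or result == [1, 2, 3]
```

The header PEP aims to make the analogous C-level pattern possible: the extension compiles against bundled headers and probes for the package at import time.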
We want feedback and consensus before we submit to python-dev, so please consider reading it and commenting. For the PEP, the C-API has been partitioned into two parts: a relatively simple Numeric compatible part and the numarray native part. This broke source and binary compatibility with numarray-0.9. See CAUTIONS below for more information.

5. Changes to the manual

There are now brief sections on numarray.mlab and numarray.objects in the manual. The discussion of the C-API has been updated.

II. CAUTIONS

1. The numarray-1.0 C-API is neither completely source level nor binary compatible with numarray-0.9. First, this means that some 3rd party extensions will no longer compile without errors. Second, this means that binary packages built against numarray-0.9 will fail, probably disastrously, using numarray-1.0. Don't install numarray-1.0 until you are ready to recompile or replace your extensions with numarray-1.0 binaries, because 0.9 binaries will not work.

In order to support the header PEP, the numarray C-API was partitioned into two parts: Numeric compatible and numarray extensions. You can use the Numeric compatible API (the PyArray_* functions) by including arrayobject.h and calling import_array() in your module init function. You can use the extended API (the NA_* functions) by including libnumarray.h and calling import_libnumarray() in your init function. Because of the partitioning, all numarray extensions must be recompiled to work with 1.0. Extensions using *both* APIs must include both files in order to compile, and must do both imports in order to run. Both APIs share a common PyArrayObject struct.

2. numarray extension writers should note that the documented use of PyArray_INCREF and PyArray_XDECREF (in numarray) was found to be incompatible with Numeric; these functions have therefore been removed from the supported API and will now result in errors.

3. The numarray.objects.ObjectArray parameter order was changed.

4.
The undocumented API function PyArray_DescrFromTypeObj was removed from the Numeric compatible API because it is not provided by Numeric. III. BUGS FIXED / CLOSED See http://sourceforge.net/tracker/?atid=450446&group_id=1369&func=browse for more details. 979834 convolve2d parameter order issues 979775 ObjectArray parameter order 979712 No exception for invalid axis 979702 too many slices fails silently 979123 A[n:n] = x no longer works 979028 matrixmultiply precision 976951 Unpickled numarray types unsable? 977472 CharArray concatenate 970356 bug in accumulate contiguity status 969162 object array bug/ambiguity 963921 bitwise_not over Bool type fails 963706 _reduce_out: problem with otype 942804 numarray C-API include file 932438 suggest moving mlab up a level 932436 mlab docs missing 857628 numarray allclose returns int 839401 Argmax's behavior has changed for ties 817348 a+=1j # Not converted to complex 957089 PyArray_FromObject dim check broken 923046 numarray.objects incompatibility 897854 Type conflict when embedding on OS X 793421 PyArray_INCREF / PyArray_XDECREF deprecated 735479 Build failure on Cygwin 1.3.22 (very current install). 870660 Numarray: CFLAGS build problem 874198 numarray.random_array.random() broken? 874207 not-so random numbers in numarray.random_array 829662 Downcast from Float64 to UInt8 anomaly 867073 numarray diagonal bug? 806705 a tale of two rank-0's 863155 Zero size numarray breaks for loop 922157 argmax returns integer in some cases 934514 suggest nelements -> size 953294 choose bug 955314 strings.num2char bug? 
955336 searchsorted has strange behaviour 955409 MaskedArray problems 953567 Add read-write requirement to NA_InputArray 952705 records striding for > 1D arrays 944690 many numarray array methods not documented 915015 numarray/Numeric incompatabilities 949358 UsesOpPriority unexpected behavior 944678 incorrect help for "size" func/method 888430 NA_NewArray() creates array with wrong endianess 922798 The document Announce.txt is out of date 947080 numarray.image.median bugs 922796 Manual has some dated MA info 931384 What does True mean in a mask? 931379 numeric.ma called MA in manual 933842 Bool arrays don't allow bool assignment 935588 problem parsing argument "nbyte" in callStrideConvCFunc() 936162 problem parsing "nbytes" argument in copyToString() 937680 Error in Lib/numerictypes.py ? 936539 array([cmplx_array, int_array]) fails 936541 a[...,1] += 0 crashes interpreter. 940826 Ufunct operator don't work 935882 take for character arrays? 933783 numarray, _ufuncmodule.c: problem setting buffersize 930014 fromstring typecode param still broken 929841 searchsorted type coercion 924841 numarray.objects rank-0 results 925253 numarray.objects __str__ and __repr__ 913782 Minor error in chapter 12: NUM_ or not? 
889591 wrong header file for C extensions 925073 API manual comments 924854 take() errors 925754 arange() with large argument crashes interpreter 926246 ufunc reduction crash 902153 can't compile under RH9/gcc 3.2.2 916876 searchsorted/histogram broken in versions 0.8 and 0.9 920470 numarray arange() problem 915736 numarray-0.9: Doc/CHANGES not up to date WHERE ----------- Numarray-1.0 windows executable installers, source code, and manual are here: http://sourceforge.net/project/showfiles.php?group_id=1369 Numarray is hosted by Source Forge in the same project which hosts Numeric: http://sourceforge.net/projects/numpy/ The web page for Numarray information is at: http://stsdas.stsci.edu/numarray/index.html Trackers for Numarray Bugs, Feature Requests, Support, and Patches are at the Source Forge project for NumPy at: http://sourceforge.net/tracker/?group_id=1369 REQUIREMENTS ------------------------------ numarray-1.0 requires Python 2.2.2 or greater. AUTHORS, LICENSE ------------------------------ Numarray was written by Perry Greenfield, Rick White, Todd Miller, JC Hsu, Paul Barrett, Phil Hodge at the Space Telescope Science Institute. We'd like to acknowledge the assistance of Francesc Alted, Paul Dubois, Sebastian Haase, Tim Hochberg, Nadav Horesh, Edward C. Jones, Eric Jones, Jochen Küpper, Travis Oliphant, Pearu Peterson, Peter Verveer, Colin Williams, and everyone else who has contributed with comments, bug reports, or patches. Numarray is made available under a BSD-style License. See LICENSE.txt in the source distribution for details. -- Todd Miller jmiller at stsci.edu From paustin at eos.ubc.ca Sat Jul 3 10:11:03 2004 From: paustin at eos.ubc.ca (Philip Austin) Date: Sat Jul 3 10:11:03 2004 Subject: [Numpy-discussion] Bug in numarray.typecode()?
In-Reply-To: <1088796821.5974.15.camel@halloween.stsci.edu> References: <200406291705.55454.haase@msg.ucsf.edu> <200407020827.05407.haase@msg.ucsf.edu> <1088784157.26482.14.camel@halloween.stsci.edu> <200407021045.00866.haase@msg.ucsf.edu> <1088796821.5974.15.camel@halloween.stsci.edu> Message-ID: <16614.59532.288486.645869@gull.eos.ubc.ca> I'm in the process of switching to numarray, but I still need typecode(). I notice that, although it's discouraged, the typecode ids have been extended to all new numarray types described in table 4.1 (p. 19) of the manual, except UInt64. That is, the following script: import numarray as Na print "Numarray version: ",Na.__version__ print Na.array([1],'Int8').typecode() print Na.array([1],'UInt8').typecode() print Na.array([1],'Int16').typecode() print Na.array([1],'UInt16').typecode() print Na.array([1],'Int32').typecode() print Na.array([1],'UInt32').typecode() print Na.array([1],'Float32').typecode() print Na.array([1],'Float64').typecode() print Na.array([1],'Complex32').typecode() print Na.array([1],'Complex64').typecode() print Na.array([1],'Bool').typecode() print Na.array([1],'UInt64').typecode() prints: Numarray version: 1.0 1 b s w l u f d F D 1 Traceback (most recent call last): File "<stdin>", line 14, in ? File "/usr/lib/python2.3/site-packages/numarray/numarraycore.py", line 1092, in typecode return _nt.typecode[self._type] KeyError: UInt64 Should this print 'U'? Regards, Phil Austin From curzio.basso at unibas.ch Tue Jul 6 02:42:06 2004 From: curzio.basso at unibas.ch (Curzio Basso) Date: Tue Jul 6 02:42:06 2004 Subject: [Numpy-discussion] inconsistencies between docs and C headers? Message-ID: <40EA73C9.7070604@unibas.ch> Hi all, can someone explain to me why in the docs functions like NA_NewArray() return a PyObject*, while in the headers they return a PyArrayObject*? Is it just the documentation which is slow to catch up with the development? Or am I missing something?
thanks, curzio From jmiller at stsci.edu Tue Jul 6 06:35:11 2004 From: jmiller at stsci.edu (Todd Miller) Date: Tue Jul 6 06:35:11 2004 Subject: [Numpy-discussion] Bug in numarray.typecode()? In-Reply-To: <16614.59532.288486.645869@gull.eos.ubc.ca> References: <200406291705.55454.haase@msg.ucsf.edu> <200407020827.05407.haase@msg.ucsf.edu> <1088784157.26482.14.camel@halloween.stsci.edu> <200407021045.00866.haase@msg.ucsf.edu> <1088796821.5974.15.camel@halloween.stsci.edu> <16614.59532.288486.645869@gull.eos.ubc.ca> Message-ID: <1089120859.25460.3.camel@halloween.stsci.edu> On Sat, 2004-07-03 at 13:10, Philip Austin wrote: > I'm in the process of switching to numarray, but I still > need typecode(). I notice that, although it's discouraged, > the typecode ids have been extended to all new numarray > types described in table 4.1 (p. 19) of the manual, except UInt64. > That is, the following script: > > import numarray as Na > print "Numarray version: ",Na.__version__ > print Na.array([1],'Int8').typecode() > print Na.array([1],'UInt8').typecode() > print Na.array([1],'Int16').typecode() > print Na.array([1],'UInt16').typecode() > print Na.array([1],'Int32').typecode() > print Na.array([1],'UInt32').typecode() > print Na.array([1],'Float32').typecode() > print Na.array([1],'Float64').typecode() > print Na.array([1],'Complex32').typecode() > print Na.array([1],'Complex64').typecode() > print Na.array([1],'Bool').typecode() > print Na.array([1],'UInt64').typecode() > > prints: > > Numarray version: 1.0 > 1 > b > s > w > l > u > f > d > F > D > 1 > Traceback (most recent call last): > File "", line 14, in ? > File "/usr/lib/python2.3/site-packages/numarray/numarraycore.py", line 1092, in typecode > return _nt.typecode[self._type] > KeyError: UInt64 > > Should this print 'U'? I think it could, but I wouldn't go so far as to say it should. typecode() is there for backward compatibility with Numeric. 
Since 'U' doesn't work for Numeric, I see no point in adding it to numarray. I'm not sure it would hurt anything other than create the illusion that something which works on numarray will also work on Numeric. If anyone has a good reason to add it, please speak up. Regards, Todd From jmiller at stsci.edu Tue Jul 6 06:58:09 2004 From: jmiller at stsci.edu (Todd Miller) Date: Tue Jul 6 06:58:09 2004 Subject: [Numpy-discussion] inconsistencies between docs and C headers? In-Reply-To: <40EA73C9.7070604@unibas.ch> References: <40EA73C9.7070604@unibas.ch> Message-ID: <1089122261.25460.41.camel@halloween.stsci.edu> On Tue, 2004-07-06 at 05:41, Curzio Basso wrote: > Hi all, > can someone explain me why in the docs functions like NA_NewArray() > return a PyObject*, while in the headers they return a PyArrayObject*? > Is it just the documentation which is slow to catch up with the > development? Yes, it's a bona fide inconsistency. It's not great, but it's fairly harmless since a PyArrayObject is a PyObject. From paustin at eos.ubc.ca Tue Jul 6 09:31:05 2004 From: paustin at eos.ubc.ca (Philip Austin) Date: Tue Jul 6 09:31:05 2004 Subject: [Numpy-discussion] Bug in numarray.typecode()? In-Reply-To: <1089120859.25460.3.camel@halloween.stsci.edu> References: <200406291705.55454.haase@msg.ucsf.edu> <200407020827.05407.haase@msg.ucsf.edu> <1088784157.26482.14.camel@halloween.stsci.edu> <200407021045.00866.haase@msg.ucsf.edu> <1088796821.5974.15.camel@halloween.stsci.edu> <16614.59532.288486.645869@gull.eos.ubc.ca> <1089120859.25460.3.camel@halloween.stsci.edu> Message-ID: <16618.54200.934079.44467@gull.eos.ubc.ca> Todd Miller writes: > > > > Should this print 'U'? > > I think it could, but I wouldn't go so far as to say it should. > typecode() is there for backward compatibility with Numeric. Since 'U' > doesn't work for Numeric, I see no point in adding it to numarray. 
I'm > not sure it would hurt anything other than create the illusion that > something which works on numarray will also work on Numeric. > > If anyone has a good reason to add it, please speak up. > I don't necessarily need typecode, but I couldn't find the inverse of a = array([10], type = 'UInt8') (p. 19) in the manual. That is, I need a method that returns the string representation of a numarray type in a single call (as opposed to the two-step repr(array.type()). This is for code that uses the Boost C++ bindings to numarray. These bindings work via callbacks to python (which eliminates the need to link to the numarray or numeric api). Currently I use typecode() to get an index into a map of types when I need to check that the type of a passed argument is correct: void check_type(boost::python::numeric::array arr, string expected_type){ string actual_type = arr.typecode(); if (actual_type != expected_type) { std::ostringstream stream; stream << "expected Numeric type " << kindstrings[expected_type] << ", found Numeric type " << kindstrings[actual_type] << std::ends; PyErr_SetString(PyExc_TypeError, stream.str().c_str()); throw_error_already_set(); } return; } Unless I'm missing something, without typecode I need a second interpreter call to repr, or I need to import numarray and load all the types into storage for a type object comparison. It's not a showstopper, but since I check every argument in every call, I'd like to avoid this unless absolutely necessary. Regards, Phil From jmiller at stsci.edu Tue Jul 6 11:40:08 2004 From: jmiller at stsci.edu (Todd Miller) Date: Tue Jul 6 11:40:08 2004 Subject: [Numpy-discussion] Missing header_pep.txt Message-ID: <1089139173.26741.2.camel@halloween.stsci.edu> Somehow header_pep.txt didn't make it into the numarray-1.0 source tar-ball. It's now in CVS and also attached. Regards, Todd -------------- next part -------------- An embedded message was scrubbed... 
From: unknown sender Subject: no subject Date: no date Size: 38 URL: From jmiller at stsci.edu Tue Jul 6 10:15:27 2004 From: jmiller at stsci.edu (Todd Miller) Date: 06 Jul 2004 10:15:27 -0400 Subject: ANN: numarray-1.0 released In-Reply-To: <40C2E65B0000343B@cpfe4.be.tisc.dk> References: <40C2E65B0000343B@cpfe4.be.tisc.dk> Message-ID: <1089123327.25460.57.camel@halloween.stsci.edu> On Tue, 2004-07-06 at 02:59, jjm at tiscali.dk wrote: > > The PEP is now in > > numarray-1.0/Doc/header_pep.txt in docutils format. We want feedback > > and consensus before we submit to python-dev so please consider > > reading it and commenting. > > I can't find header_pep.txt! It is not in numarray-1.0.tar.gz. Oops, you're right. I attached it. Apparently I forgot to add it to CVS. Todd -------------- next part -------------- PEP: XXX Title: numerical array headers Version: $Revision: 1.3 $ Last-Modified: $Date: 2002/08/30 04:11:20 $ Author: Todd Miller , Perry Greenfield Discussions-To: numpy-discussion at lists.sf.net Status: Draft Type: Standards Track Content-Type: text/x-rst Created: 02-Jun-2004 Python-Version: 2.4 Post-History: 30-Aug-2002 Abstract ======== We propose the inclusion of three numarray header files within the CPython distribution to facilitate use of numarray array objects as an optional data format for 3rd party modules. The PEP illustrates a simple technique by which a 3rd party extension may support numarrays as input or output values if numarray is installed, and yet the 3rd party extension does not require numarray to be installed to be built. Nothing needs to be changed in the setup.py or makefile for installing with or without numarray, and a subsequent installation of numarray will allow its use without rebuilding the 3rd party extension. Specification ============= This PEP applies only to the CPython platform and only to numarray.
Analogous PEPs could be written for Jython and Python.NET and Numeric, but what is discussed here is a speed optimization that is tightly coupled to CPython and numarray. Three header files to support the numarray C-API should be included in the CPython distribution within a numarray subdirectory of the Python include directory: * numarray/arraybase.h * numarray/libnumeric.h * numarray/arrayobject.h The files are shown prefixed with "numarray" to leave the door open for doing similar PEPs with other packages, such as Numeric. If a plethora of such header contributions is anticipated, a further refinement would be to locate the headers under something like "third_party/numarray". In order to provide enhanced performance for array objects, an extension writer would start by including the numarray C-API in addition to any other Python headers: :: #include "numarray/arrayobject.h" Not shown in this PEP are the API calls which operate on numarrays. These are documented in the numarray manual. What is shown here are two calls which are guaranteed to be safe even when numarray is not installed: * PyArray_Present() * PyArray_isArray() In an extension function that wants to access the numarray API, a test needs to be performed to determine if the API functions are safely callable: :: PyObject * some_array_returning_function(PyObject *m, PyObject *args) { int param; PyObject *result; if (!PyArg_ParseTuple(args, "i", &param)) return NULL; if (PyArray_Present()) { result = numarray_returning_function(param); } else { result = list_returning_function(param); } return result; } Within **numarray_returning_function**, a subset of the numarray C-API (the Numeric compatible API) is available for use so it is possible to create and return numarrays. Within **list_returning_function**, only the standard Python C-API can be used because numarray is assumed to be unavailable in that particular Python installation.
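The same conditional-support idea the PEP expresses in C can be sketched at the Python level (an illustrative sketch, not code from the PEP; the module flag and make_range function are invented for the example): try to import numarray at load time, record whether it worked, and branch on that flag, just as PyArray_Present() branches on the API pointer.

```python
# Python-level analogue of the PEP's optional-support pattern
# (an illustrative sketch, not code from the PEP itself).
try:
    import numarray
    HAVE_NUMARRAY = True          # plays the role of PyArray_Present()
except ImportError:
    numarray = None
    HAVE_NUMARRAY = False

def make_range(n):
    """Return 0..n-1 as a numarray if available, else as a plain list."""
    if HAVE_NUMARRAY:
        return numarray.arange(n)   # fast path: numarray installed
    return list(range(n))           # fallback: standard Python only
```

Callers see the same interface either way; only the performance (and the concrete return type) changes with the installation.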
In an extension function that wants to accept numarrays as inputs and provide improved performance over the Python sequence protocol, an additional convenience function exists which diverts arrays to specialized code when numarray is present and the input is an array: :: PyObject * some_array_accepting_function(PyObject *m, PyObject *args) { PyObject *sequence, *result; if (!PyArg_ParseTuple(args, "O", &sequence)) return NULL; if (PyArray_isArray(sequence)) { result = numarray_input_function(sequence); } else { result = sequence_input_function(sequence); } return result; } During module initialization, a numarray enhanced extension must call **import_array()**, a macro which imports numarray and assigns a value to a static API pointer: PyArray_API. Since the API pointer starts with the value NULL and remains so if the numarray import fails, the API pointer serves as a flag that indicates that numarray was successfully imported whenever it is non-NULL. :: static void initfoo(void) { PyObject *m = Py_InitModule3( "foo", _foo_functions, _foo__doc__); if (m == NULL) return; import_array(); } **PyArray_Present()** indicates that numarray was successfully imported. It is defined in terms of the API function pointer as: :: #define PyArray_Present() (PyArray_API != NULL) **PyArray_isArray(s)** indicates that numarray was successfully imported and the given parameter is a numarray instance. It is defined as: :: #define PyArray_isArray(s) (PyArray_Present() && PyArray_Check(s)) Motivation ========== The use of numeric arrays as an interchange format is eminently sensible for many kinds of modules. For example, image, graphics, and audio modules all can accept or generate large amounts of numerical data that could easily use the numarray format.
But since numarray is not part of the standard distribution, some authors of 3rd party extensions may be reluctant to add a dependency on a different 3rd party extension that isn't absolutely essential for its use, for fear of dissuading users who may be put off by extra installation requirements. Yet, not allowing easy interchange with numarray introduces annoyances that need not be present. Normally, in the absence of an explicit ability to generate or use numarray objects, one must write conversion utilities to convert from the data representation used to that for numarray. This typically involves excess copying of data (usually from internal to string to numarray). In cases where the 3rd party uses buffer objects, the data may not need copying at all. Either many users may have to develop their own conversion routines or numarray will have to include adapters for many other 3rd party packages. Since numarray is used by many projects, it makes more sense to put the conversion logic on the other side of the fence. There is a clear need for a mechanism that allows 3rd party software to use numarray objects if it is available without requiring numarray's presence to build and install properly. Rationale ========= One solution is to make numarray part of the standard distribution. That may be a good long-term solution, but at the moment, the numeric community is in a transition period between the Numeric and numarray packages which may take years to complete. It is not likely that numarray will be considered for adoption until the transition is complete. Numarray is also a large package, and there is legitimate concern about its inclusion as regards the long-term commitment to support. We can solve that problem by making a few include files part of the Python Standard Distribution and demonstrating how extension writers can write code that uses numarray conditionally.
The API submitted in this PEP is the subset of the numarray API which is most source compatible with Numeric. The headers consist of two handwritten files (arraybase.h and arrayobject.h) and one generated file (libnumeric.h). arraybase.h contains typedefs and enumerations which are important to both the API presented here and to the larger numarray specific API. arrayobject.h glues together arraybase and libnumeric and is needed for Numeric compatibility. libnumeric.h consists of macros generated from a template and a list of function prototypes. The macros themselves are somewhat intricate in order to provide the compile time checking effect of function prototypes. Further, the interface takes two forms: one form is used to compile numarray and defines static function prototypes. The other form is used to compile extensions which use the API and defines macros which execute function calls through pointers which are found in a table located using a single public API pointer. These macros also test the value of the API pointer in order to deliver a fatal error should a developer forget to initialize by calling import_array(). The interface chosen here is the subset of numarray most useful for porting existing Numeric code or creating new extensions which can be compiled for either numarray or Numeric. There are a number of other numarray API functions which are omitted here for the sake of simplicity. By choosing to support only the Numeric compatible subset of the numarray C-API, concerns about interface stability are minimized because the Numeric API is well established. However, it should be made clear that the numarray API subset proposed here is source compatible, not binary compatible, with Numeric. 
Resources ========= * numarray/arraybase.h (http://cvs.sourceforge.net/viewcvs.py/numpy/numarray/Include/numarray/arraybase.h) * numarray/libnumeric.h (http://cvs.sourceforge.net/viewcvs.py/numpy/numarray/Include/numarray/libnumeric.h) * numarray/arrayobject.h (http://cvs.sourceforge.net/viewcvs.py/numpy/numarray/Include/numarray/arrayobject.h) * numarray-1.0 manual PDF * numarray-1.0 source distribution * numarray website at STSCI (http://www.stsci.edu/resources/software_hardware/numarray) * example numarray enhanced extension References ========== .. [1] PEP 1, PEP Purpose and Guidelines, Warsaw, Hylton (http://www.python.org/peps/pep-0001.html) .. [2] PEP 9, Sample Plaintext PEP Template, Warsaw (http://www.python.org/peps/pep-0009.html) Copyright ========= This document has been placed in the public domain. .. Local Variables: mode: indented-text indent-tabs-mode: nil sentence-end-double-space: t fill-column: 70 End: From paustin at eos.ubc.ca Tue Jul 6 16:09:02 2004 From: paustin at eos.ubc.ca (Philip Austin) Date: Tue Jul 6 16:09:02 2004 Subject: [Numpy-discussion] non-intuitive behaviour for isbyteswapped()? In-Reply-To: <16614.59532.288486.645869@gull.eos.ubc.ca> References: <200406291705.55454.haase@msg.ucsf.edu> <200407020827.05407.haase@msg.ucsf.edu> <1088784157.26482.14.camel@halloween.stsci.edu> <200407021045.00866.haase@msg.ucsf.edu> <1088796821.5974.15.camel@halloween.stsci.edu> <16614.59532.288486.645869@gull.eos.ubc.ca> Message-ID: <16619.12490.596884.579782@gull.eos.ubc.ca> With numarray 1.0 and Mandrake 10 i686 I get the following: >>> y=N.array([1,1,2,1],type="Float64") >>> y array([ 1., 1., 2., 1.]) >>> y.byteswap() >>> y array([ 3.03865194e-319, 3.03865194e-319, 3.16202013e-322, 3.03865194e-319]) >>> y.isbyteswapped() 0 Should this be 1? 
Thanks, Phil From paustin at eos.ubc.ca Tue Jul 6 18:43:49 2004 From: paustin at eos.ubc.ca (Philip Austin) Date: Tue Jul 6 18:43:49 2004 Subject: [Numpy-discussion] optional arguments to the array constructor Message-ID: <16619.21771.686179.152410@gull.eos.ubc.ca> (for numpy v1.0 on Mandrake 10 i686) As noted on p. 25 the array constructor takes up to 5 optional arguments array(sequence=None, type=None, shape=None, copy=1, savespace=0, typecode=None) (and raises an exception if both type and typecode are set). Is there any way to make an alias (copy=0) of an array without passing keyword values? That is, specifying the copy keyword alone works: test=N.array((1., 3), "Float64", shape=(2,), copy=1, savespace=0) a=N.array(test, copy=0) a[1]=999 print test >>> [ 1. 999.] But when intervening keywords are specified copy won't toggle: test=N.array((1., 3)) a=N.array(sequence=test, type="Float64", shape=(2,), copy=0) a[1]=999. print test >>> [ 1. 3.] Which is also the behaviour I see when I drop the keywords: test=N.array((1., 3)) a=N.array(test, "Float64", (2,), 0) a[1]=999. print test >>> [ 1. 3.] an additional puzzle is that adding the savespace parameter raises the following exception: >>> a=N.array(test, "Float64", (2,), 0,0) Traceback (most recent call last): File "<stdin>", line 1, in ? File "/usr/lib/python2.3/site-packages/numarray/numarraycore.py", line 312, in array type = getTypeObject(sequence, type, typecode) File "/usr/lib/python2.3/site-packages/numarray/numarraycore.py", line 256, in getTypeObject rtype = _typeFromTypeAndTypecode(type, typecode) File "/usr/lib/python2.3/site-packages/numarray/numarraycore.py", line 243, in _typeFromTypeAndTypecode raise ValueError("Can't define both 'type' and 'typecode' for an array.") ValueError: Can't define both 'type' and 'typecode' for an array.
Thanks for any insights -- Phil From jmiller at stsci.edu Wed Jul 7 07:58:05 2004 From: jmiller at stsci.edu (Todd Miller) Date: Wed Jul 7 07:58:05 2004 Subject: [Numpy-discussion] non-intuitive behaviour for isbyteswapped()?
In-Reply-To: <16619.12490.596884.579782@gull.eos.ubc.ca> References: <200406291705.55454.haase@msg.ucsf.edu> <200407020827.05407.haase@msg.ucsf.edu> <1088784157.26482.14.camel@halloween.stsci.edu> <200407021045.00866.haase@msg.ucsf.edu> <1088796821.5974.15.camel@halloween.stsci.edu> <16614.59532.288486.645869@gull.eos.ubc.ca> <16619.12490.596884.579782@gull.eos.ubc.ca> Message-ID: <1089212251.29456.212.camel@halloween.stsci.edu> On Tue, 2004-07-06 at 19:07, Philip Austin wrote: > With numarray 1.0 and Mandrake 10 i686 I get the following: > > >>> y=N.array([1,1,2,1],type="Float64") > >>> y > array([ 1., 1., 2., 1.]) > >>> y.byteswap() > >>> y > array([ 3.03865194e-319, 3.03865194e-319, 3.16202013e-322, > 3.03865194e-319]) > >>> y.isbyteswapped() > 0 > > Should this be 1? The behavior of byteswap() has been controversial in the past, at one time implementing exactly the behavior I think you expected. Without giving any guarantee for the future, here's how things work now: byteswap() just swaps the bytes. There's a related method, togglebyteorder(), which inverts the sense of the byteorder: >>> y.byteswap() >>> y.togglebyteorder() >>> y.isbyteswapped() 1 The ability to munge bytes and change the sense of byteorder independently is definitely needed... but you're certainly not the first one to ask this question. 
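The distinction Todd draws here (byteswap() rearranges the bytes; togglebyteorder() updates the recorded byte order) can be demonstrated without numarray at all, using only the standard struct module. This is a standalone sketch; the byteswap8 helper is invented for the illustration and is not a numarray function:

```python
import struct

def byteswap8(buf):
    """Reverse the byte order of each 8-byte (Float64) element in buf."""
    return b''.join(buf[i:i + 8][::-1] for i in range(0, len(buf), 8))

little = struct.pack('<4d', 1.0, 1.0, 2.0, 1.0)  # little-endian Float64s
swapped = byteswap8(little)

# Swapping alone (what y.byteswap() does) leaves the data meaningless if
# you keep reading it with the old byte order -- hence the tiny denormal
# values Phil saw...
garbage = struct.unpack('<4d', swapped)
assert garbage != (1.0, 1.0, 2.0, 1.0)
# ...while reading with the *other* byte order (the sense change that
# togglebyteorder() records) recovers the original data.
assert struct.unpack('>4d', swapped) == (1.0, 1.0, 2.0, 1.0)
```

Doing both steps together corresponds to the Numeric-compatible byteswapped() method described below, except that byteswapped() returns a copy.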
There is also (Numeric compatible) byteswapped(), which both swaps and changes sense, but it creates a copy rather than operating in place: >>> x = y.byteswapped() >>> (x is not y) and (x._data is not y._data) 1 Regards, Todd From jmiller at stsci.edu Wed Jul 7 08:13:05 2004 From: jmiller at stsci.edu (Todd Miller) Date: Wed Jul 7 08:13:05 2004 Subject: [Numpy-discussion] optional arguments to the array constructor In-Reply-To: <16619.21771.686179.152410@gull.eos.ubc.ca> References: <16619.21771.686179.152410@gull.eos.ubc.ca> Message-ID: <1089213153.29456.229.camel@halloween.stsci.edu> On Tue, 2004-07-06 at 21:42, Philip Austin wrote: > (for numpy v1.0 on Mandrake 10 i686) My guess is you're talking about numarray here. Please be charitable if I'm talking out of turn... I tend to see everything as a numarray issue. > As noted on p. 25 the array constructor takes up to 5 optional arguments > > array(sequence=None, type=None, shape=None, copy=1, savespace=0,typecode=None) > (and raises an exception if both type and typecode are set). > > Is there any way to make an alias (copy=0) of an array without passing > keyword values? In numarray, all you have to do to get an alias is: >>> b = a.view() It's an alias because: >>> b._data is a._data True > That is, specifying the copy keyword alone works: > > test=N.array((1., 3), "Float64", shape=(2,), copy=1, savespace=0) > a=N.array(test, copy=0) > a[1]=999 > print test > > >>> [ 1. 999.] > > But when intervening keywords are specified copy won't toggle: > > test=N.array((1., 3)) > a=N.array(sequence=test, type="Float64", shape=(2,), copy=0) > a[1]=999. > print test > >>> [ 1. 3.] > > Which is also the behaviour I see when I drop the keywords: > > test=N.array((1., 3)) > a=N.array(test, "Float64", (2,), 0) > a[1]=999. > print test > >>> [ 1. 3.] 
> > an additional puzzle is that adding the savespace parameter raises > the following exception: > > > >>> a=N.array(test, "Float64", (2,), 0,0) > Traceback (most recent call last): > File "", line 1, in ? > File "/usr/lib/python2.3/site-packages/numarray/numarraycore.py", line 312, in array > type = getTypeObject(sequence, type, typecode) > File "/usr/lib/python2.3/site-packages/numarray/numarraycore.py", line 256, in getTypeObject > rtype = _typeFromTypeAndTypecode(type, typecode) > File "/usr/lib/python2.3/site-packages/numarray/numarraycore.py", line 243, in _typeFromTypeAndTypecode > raise ValueError("Can't define both 'type' and 'typecode' for an array.") > ValueError: Can't define both 'type' and 'typecode' for an array. All this looks like a documentation problem. The numarray array() signature has been tortured by Numeric backward compatibility, so there has been more flux in it than you would expect. Anyway, the manual is out of date. Here's the current signature from the code: def array(sequence=None, typecode=None, copy=1, savespace=0, type=None, shape=None): Sorry about the confusion, Todd From paustin at eos.ubc.ca Wed Jul 7 11:26:11 2004 From: paustin at eos.ubc.ca (Philip Austin) Date: Wed Jul 7 11:26:11 2004 Subject: [Numpy-discussion] optional arguments to the array constructor In-Reply-To: <1089213153.29456.229.camel@halloween.stsci.edu> References: <16619.21771.686179.152410@gull.eos.ubc.ca> <1089213153.29456.229.camel@halloween.stsci.edu> Message-ID: <16620.16395.603789.28730@gull.eos.ubc.ca> Todd Miller writes: > On Tue, 2004-07-06 at 21:42, Philip Austin wrote: > > (for numpy v1.0 on Mandrake 10 i686) > > My guess is you're talking about numarray here. Please be charitable if > I'm talking out of turn... I tend to see everything as a numarray > issue. Right -- I'm still working through the boost test suite for numarray, which is failing a couple of tests that passed (around numarray v0.3). > All this looks like a documentation problem. 
The numarray array() > signature has been tortured by Numeric backward compatibility, so there > has been more flux in it than you would expect. Anyway, the manual is > out of date. Here's the current signature from the code: > > def array(sequence=None, typecode=None, copy=1, savespace=0, > type=None, shape=None): > Actually, it seems to be a difference in the way that Numeric and numarray treat the copy flag when typecode is specified. In Numeric, if no change in type is requested and copy=0, then the constructor goes ahead and produces a view: import Numeric as nc test=nc.array([1,2,3],'i') a=nc.array(test,'i',0) a[0]=99 print test >> [99 2 3] but makes a copy if a cast is required: test=nc.array([1,2,3],'i') a=nc.array(test,'F',0) a[0]=99 print test >>> [1 2 3] Looking at numarraycore.py line 305 I see that: if type is None and typecode is None: if copy: a = sequence.copy() else: a = sequence i.e. numarray skips the check for a type match and ignores the copy flag, even if the type is preserved: import numarray as ny test=ny.array([1,2,3],'i') a=ny.array(test,'i',0) a._data is test._data >>> False It looks like there might have been a comment about this in the docstring, but it got clipped at some point?: array() constructs a NumArray by calling NumArray, one of its factory functions (fromstring, fromfile, fromlist), or by making a copy of an existing array.
If copy=0, array() will create a new array only if sequence specifies the contents or storage for the array Thanks, Phil From jmiller at stsci.edu Wed Jul 7 12:47:02 2004 From: jmiller at stsci.edu (Todd Miller) Date: Wed Jul 7 12:47:02 2004 Subject: [Numpy-discussion] optional arguments to the array constructor In-Reply-To: <16620.16395.603789.28730@gull.eos.ubc.ca> References: <16619.21771.686179.152410@gull.eos.ubc.ca> <1089213153.29456.229.camel@halloween.stsci.edu> <16620.16395.603789.28730@gull.eos.ubc.ca> Message-ID: <1089229573.29456.544.camel@halloween.stsci.edu> On Wed, 2004-07-07 at 14:25, Philip Austin wrote: > Todd Miller writes: > > On Tue, 2004-07-06 at 21:42, Philip Austin wrote: > > > (for numpy v1.0 on Mandrake 10 i686) > > > > My guess is you're talking about numarray here. Please be charitable if > > I'm talking out of turn... I tend to see everything as a numarray > > issue. > > Right -- I'm still working through the boost test suite for numarray, which is > failing a couple of tests that passed (around numarray v0.3). > > > All this looks like a documentation problem. The numarray array() > > signature has been tortured by Numeric backward compatibility, so there > > has been more flux in it than you would expect. Anyway, the manual is > > out of date. Here's the current signature from the code: > > > > def array(sequence=None, typecode=None, copy=1, savespace=0, > > type=None, shape=None): > > > > Actually, it seems to be a difference in the way that numeric and > numarray treat the copy flag when typecode is specified. 
In numeric, > if no change in type is requested and copy=0, then the constructor > goes ahead and produces a view: > > import Numeric as nc > test=nc.array([1,2,3],'i') > a=nc.array(test,'i',0) > a[0]=99 > print test > >> [99 2 3] > > but makes a copy if a cast is required: > > test=nc.array([1,2,3],'i') > a=nc.array(test,'F',0) > a[0]=99 > print test > >>> [1 2 3] > > Looking at numarraycore.py line 305 I see that: > > if type is None and typecode is None: > if copy: > a = sequence.copy() > else: > a = sequence > > i.e. numarray skips the check for a type match and ignores > the copy flag, even if the type is preserved: > > import numarray as ny > test=ny.array([1,2,3],'i') > a=ny.array(test,'i',0) > a._data is test._data > >>> False > OK, I think I see what you're after and agree that it's a bug. Here's how I'll change the behavior: >>> import numarray >>> a = numarray.arange(10) >>> b = numarray.array(a, copy=0) >>> a is b True >>> b = numarray.array(a, copy=1) >>> a is b False One possible point of note is that array() doesn't return views for copy=0; neither does Numeric; both return the original sequence. Regards, Todd From paustin at eos.ubc.ca Wed Jul 7 13:15:04 2004 From: paustin at eos.ubc.ca (Philip Austin) Date: Wed Jul 7 13:15:04 2004 Subject: [Numpy-discussion] optional arguments to the array constructor In-Reply-To: <1089229573.29456.544.camel@halloween.stsci.edu> References: <16619.21771.686179.152410@gull.eos.ubc.ca> <1089213153.29456.229.camel@halloween.stsci.edu> <16620.16395.603789.28730@gull.eos.ubc.ca> <1089229573.29456.544.camel@halloween.stsci.edu> Message-ID: <16620.22921.791432.143944@gull.eos.ubc.ca> Todd Miller writes: > > OK, I think I see what you're after and agree that it's a bug. 
Here's > how I'll change the behavior: > > >>> import numarray > >>> a = numarray.arange(10) > >>> b = numarray.array(a, copy=0) > >>> a is b > True > >>> b = numarray.array(a, copy=1) > >>> a is b > False Just to be clear -- the above is the current numarray v1.0 behavior (at least on my machine). Numeric compatibility would additionally require that import numarray a = numarray.arange(10) theTypeCode=repr(a.type()) b = numarray.array(a, theTypeCode, copy=0) print a is b b = numarray.array(a, copy=1) print a is b produce True False While currently it produces True True Having said this, I can work around this difference -- so either a note in the documentation or just removing the copy flag from numarray.array would also be ok. -- Thanks, Phil From paustin at eos.ubc.ca Wed Jul 7 13:17:03 2004 From: paustin at eos.ubc.ca (Philip Austin) Date: Wed Jul 7 13:17:03 2004 Subject: [Numpy-discussion] Re: Correction -- optional arguments to the array constructor In-Reply-To: <1089229573.29456.544.camel@halloween.stsci.edu> References: <16619.21771.686179.152410@gull.eos.ubc.ca> <1089213153.29456.229.camel@halloween.stsci.edu> <16620.16395.603789.28730@gull.eos.ubc.ca> <1089229573.29456.544.camel@halloween.stsci.edu> Message-ID: <16620.23066.506262.410021@gull.eos.ubc.ca> Oops, note the change below at ---> Todd Miller writes: > > OK, I think I see what you're after and agree that it's a bug. Here's > how I'll change the behavior: > > >>> import numarray > >>> a = numarray.arange(10) > >>> b = numarray.array(a, copy=0) > >>> a is b > True > >>> b = numarray.array(a, copy=1) > >>> a is b > False Just to be clear -- the above is the current numarray v1.0 behavior (at least on my machine).
Numeric compatibility would additionally require that import numarray a = numarray.arange(10) theTypeCode=repr(a.type()) b = numarray.array(a, theTypeCode, copy=0) print a is b b = numarray.array(a, copy=1) print a is b produce True False While currently it produces ---> False False Having said this, I can work around this difference -- so either a note in the documentation or just removing the copy flag from numarray.array would also be ok. -- Thanks, Phil From wlanger at bigpond.net.au Thu Jul 8 10:29:01 2004 From: wlanger at bigpond.net.au (Wendy Langer) Date: Thu Jul 8 10:29:01 2004 Subject: [Numpy-discussion] "buffer not aligned on 8 byte boundary" errors when running numarray.testall.test() Message-ID: Hi there all :) I am having trouble with my installation of numarray. :( I am a python newbie and a numarray extreme-newbie, so it could be that I don't yet have the first clue what I am doing. ;) Python 2.3.3 (#51, Feb 16 2004, 04:07:52) [MSC v.1200 32 bit (Intel)] on win32 numarray 1.0 The Python I am using is the one that comes with the "Enthought" version (www.enthought.com), a distro specifically designed to be useful for scientists, so it comes with numerical stuff and scipy and chaco and things like that preinstalled. I used the windows binary installer. However it came with Numeric and not numarray, so I installed numarray "by hand". This seemed to go ok, and it seems that there is no problem having both Numeric and numarray in the same installation, since they have (obviously) different names (still getting used to this whole modules and namespaces &c &c) At the bottom of this email I have pasted an example of what it was I was trying to do, and the error messages that the interpreter gave me - but before anyone bothers reading them in any detail, the essential error seems to be as follows: error: multiply_Float64_scalar_vector: buffer not aligned on 8 byte boundary.
I have no idea what this means, but I do recall that when I ran the numarray.testall.test() procedure after first completing my installation a couple of days ago, it reported a *lot* of problems, many of which sounded quite similar to this. I hoped for the best and thought that perhaps I had "run the test wrong"(!) since numarray seemed to be working ok, and I had investigated many of the examples in chapters 3 and 4 of the user manual without any obvious problems (chapter 3 = "high level overview" and chapter 4 = "array basics") I decided at the time to leave well enough alone until I actually came across odd or mysterious behaviour ...however that time has come all-too-soon... The procedure I am using to run the test is as described on page 11 of the excellent user's manual (release 0.8 at http://www.pfdubois.com/numpy/numarray.pdf): --------------------------------------------- Testing your Installation Once you have installed numarray, test it with: C:\numarray> python Python 2.2.2 (#18, Dec 30 2002, 02:26:03) [MSC 32 bit (Intel)] on win32 Type "copyright", "credits" or "license" for more information. >>> import numarray.testall as testall >>> testall.test() numeric: (0, 1115) records: (0, 48) strings: (0, 166) objects: (0, 72) memmap: (0, 75) Each line in the above output indicates that 0 of X tests failed. X grows steadily with each release, so the numbers shown above may not be current. ------------------------------------------------------------------------ Anyway, when I ran this, instead of the nice, comforting output above, I got about a million(!) errors and then a final count of 320 failures. This number is not always constant - I recall the first time I ran it it was 209. [I just ran it again and this time it was 324...it all has a rather disturbing air of semi-randomness...]
So below is the (heavily snipped) output from the testall.test() run, and below that is the code where I first noticed a possibly similar error, and below *that* is the output of that code, with the highly suspicious error.... Any suggestions greatly appreciated! I can give you more info about the setup on my computer and so on if you need :) wendy langer ====================================================================== ==================================== IDLE 1.0.2 ==== No Subprocess ==== >>> import numarray.testall as testall >>> testall.test() ***************************************************************** Failure in example: x+y from line #50 of first pass Exception raised: Traceback (most recent call last): File "C:\PYTHON23\lib\doctest.py", line 442, in _run_examples_inner compileflags, 1) in globs File "", line 1, in ? File "C:\PYTHON23\Lib\site-packages\numarray\numarraycore.py", line 733, in __add__ return ufunc.add(self, operand) error: Int32asFloat64: buffer not aligned on 8 byte boundary. ***************************************************************** Failure in example: x[:] = 0.1 from line #72 of first pass Exception raised: Traceback (most recent call last): File "C:\PYTHON23\lib\doctest.py", line 442, in _run_examples_inner compileflags, 1) in globs File "", line 1, in ? error: Float64asBool: buffer not aligned on 8 byte boundary. ***************************************************************** Failure in example: y from line #74 of first pass Expected: array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]) Got: array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]) ***************************************************************** Failure in example: x + z from line #141 of first pass Exception raised: Traceback (most recent call last): File "C:\PYTHON23\lib\doctest.py", line 442, in _run_examples_inner compileflags, 1) in globs File "", line 1, in ?
File "C:\PYTHON23\Lib\site-packages\numarray\numarraycore.py", line 733, in __add__ return ufunc.add(self, operand) error: Int32asFloat64: buffer not aligned on 8 byte boundary. ***************************************************************** ***************************************************************** Failure in example: a2dma = average(a2dm, axis=1) from line #812 of numarray.ma.dtest Exception raised: Traceback (most recent call last): File "C:\PYTHON23\lib\doctest.py", line 442, in _run_examples_inner compileflags, 1) in globs File "", line 1, in ? File "C:\PYTHON23\Lib\site-packages\numarray\ma\MA.py", line 1686, in average w = Numeric.choose(mask, (1.0, 0.0)) File "C:\PYTHON23\Lib\site-packages\numarray\ufunc.py", line 1666, in choose return _choose(selector, population, outarr, clipmode) File "C:\PYTHON23\Lib\site-packages\numarray\ufunc.py", line 1573, in __call__ result = self._doit(computation_mode, woutarr, cfunc, ufargs, 0) File "C:\PYTHON23\Lib\site-packages\numarray\ufunc.py", line 1558, in _doit blockingparameters) error: choose8bytes: buffer not aligned on 8 byte boundary. ***************************************************************** Failure in example: alltest(a2dma, [1.5, 4.0]) from line #813 of numarray.ma.dtest Exception raised: Traceback (most recent call last): File "C:\PYTHON23\lib\doctest.py", line 442, in _run_examples_inner compileflags, 1) in globs File "", line 1, in ? NameError: name 'a2dma' is not defined ***************************************************************** 1 items had failures: 320 of 671 in numarray.ma.dtest ***Test Failed*** 320 failures. 
numarray.ma: (320, 671) ========================================================================= ======================== import numarray class anXmatrix: def __init__(self, stepsize = 3): self.stepsize = stepsize self.populate_matrix() def describe(self): print "I am a ", self.__class__ print "my stepsize is", self.stepsize print "my matrix is: \n" print self.matrix def populate_matrix(self): def xvalues(i,j): return self.stepsize*j mx = numarray.fromfunction(xvalues, (4,4)) self.matrix = mx if __name__ == '__main__': print " " print "Making anXmatrix..." r = anXmatrix(stepsize = 5) r.describe() r = anXmatrix(stepsize = 0.02) r.describe() ============================================================================ ======== Making anXmatrix... I am a __main__.anXmatrix my stepsize is 5 my matrix is: [[ 0 5 10 15] [ 0 5 10 15] [ 0 5 10 15] [ 0 5 10 15]] Traceback (most recent call last): File "C:\Python23\Lib\site-packages\WendyStuff\wendycode\propagatorstuff\core_obj ects\domain_objects.py", line 97, in ? r = anXmatrix(stepsize = 0.02) File "C:\Python23\Lib\site-packages\WendyStuff\wendycode\propagatorstuff\core_obj ects\domain_objects.py", line 72, in __init__ self.populate_matrix() File "C:\Python23\Lib\site-packages\WendyStuff\wendycode\propagatorstuff\core_obj ects\domain_objects.py", line 86, in populate_matrix mx = numarray.fromfunction(xvalues, (4,4)) File "C:\PYTHON23\Lib\site-packages\numarray\generic.py", line 1094, in fromfunction return apply(function, tuple(indices(dimensions))) File "C:\Python23\Lib\site-packages\WendyStuff\wendycode\propagatorstuff\core_obj ects\domain_objects.py", line 84, in xvalues return self.stepsize*j File "C:\PYTHON23\Lib\site-packages\numarray\numarraycore.py", line 772, in __rmul__ r = ufunc.multiply(operand, self) error: multiply_Float64_scalar_vector: buffer not aligned on 8 byte boundary. 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ ============================================================================ ======== "You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. Do you understand this? And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat." Albert Einstein From Chris.Barker at noaa.gov Thu Jul 8 10:58:07 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jul 8 10:58:07 2004 Subject: [Numpy-discussion] How to read data from text files fast? In-Reply-To: <40E473A9.5040109@colorado.edu> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <40E470D9.8060603@noaa.gov> <40E473A9.5040109@colorado.edu> Message-ID: <40ED8A6D.5050505@noaa.gov> Thanks to Fernando Perez and Travis Oliphant for pointing me to: > scipy.io.read_array In testing, I've found that it's very slow (for my needs), though quite nifty in other ways, so I'm sure I'll find a use for it in the future. Travis Oliphant wrote: > Alternatively, we could move some of the Python code in read_array to > C to improve the speed. That was beyond me, so I wrote a very simple module in C that does what I want, and it is very much faster than read_array or straight python version. It has two functions: FileScan(file) """ Reads all the values in rest of the ascii file, and produces a Numeric vector full of Floats (C doubles). All text in the file that is not part of a floating point number is skipped over. """ FileScanN(file, N) """ Reads N values in the ascii file, and produces a Numeric vector of length N full of Floats (C doubles). 
Raises an exception if there are fewer than N numbers in the file. All text in the file that is not part of a floating point number is skipped over. After reading N numbers, the file is left before the next non-whitespace character in the file. This will often leave the file at the start of the next line, after scanning a line full of numbers. """ I implemented them separately, 'cause I wasn't sure how to deal with optional arguments in a C function. They could easily have wrapped in a Python function if you wanted one interface. FileScan was much more complex, as I had to deal with all the dynamic memory allocation. I probably took a more complex approach to this than I had to, but it was an exercise for me, being a newbie at C. I also decided not to specify a shape for the resulting array, always returning a rank-1 array, as that made the code easier, and you can always set A.shape afterward. This could be put in a Python wrapper as well. It has the obvious limitation that it only does doubles. I'd like to add longs as well, but probably won't have a need for anything else. The way memory is these days, it seems just as easy to read the long ones, and convert afterward if you want. Here is a quick benchmark (see below) run with a file that is 63,000 lines, with two comma-delimited numbers on each line. Run on a 1GHz P4 under Linux. Reading with read_array it took 16.351712 seconds to read the file with read_array Reading with Standard Python methods it took 2.832078 seconds to read the file with standard Python methods Reading with FileScan it took 0.444431 seconds to read the file with FileScan Reading with FileScanN it took 0.407875 seconds to read the file with FileScanN As you can see, read_array is painfully slow for this kind of thing, straight Python is OK, and FileScan is pretty darn fast. I've enclosed the C code and setup.py, if anyone wants to take a look, and use it, or give suggestions or bug fixes or whatever, that would be great. 
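[Editor's note: the FileScan behavior described above — skip anything that isn't part of a floating point number, collect the rest — can be approximated in pure Python with a regular expression. This is only an illustrative sketch, not the poster's C implementation; `file_scan` and the pattern are made up for the example:]

```python
import re
from io import StringIO

# Matches one floating-point token; everything else in the stream is
# skipped over, as in the FileScan description above.
_FLOAT = re.compile(r'[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?')

def file_scan(f):
    """Return every number in the rest of the open file as a list of floats."""
    return [float(tok) for tok in _FLOAT.findall(f.read())]

print(file_scan(StringIO("1.5, 2\nsome text 3e2 -4")))  # [1.5, 2.0, 300.0, -4.0]
```

[Converting the returned list to a rank-1 array afterward mirrors the C version's final copy into a Numeric vector.]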
In particular, I don't think I've structured the code very well, and there could be a memory leak, which I have not tested carefully for. Tested only on Linux with Python2.3.3, Numeric 23.1. If someone wants to port it to numarray, that would be great too. -Chris The benchmark: def test6(): """ Testing various IO options """ from scipy.io import array_import filename = "JunkBig.txt" file = open(filename) print "Reading with read_array" start = time.time() A = array_import.read_array(file,",") print "it took %f seconds to read the file with read_array"%(time.time() - start) file.close() file = open(filename) print "Reading with Standard Python methods" start = time.time() A = [] for line in file: A.append( map ( float, line.strip().split(",") ) ) A = array(A) print "it took %f seconds to read the file with standard Python methods"%(time.time() - start) file.close() file = open(filename) print "Reading with FileScan" start = time.time() A = FileScanner.FileScan(file) A.shape = (-1,2) print "it took %f seconds to read the file with FileScan"%(time.time() - start) file.close() file = open(filename) print "Reading with FileScanN" start = time.time() A = FileScanner.FileScanN(file, product(A.shape) ) A.shape = (-1,2) print "it took %f seconds to read the file with FileScanN"%(time.time() - start) -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: FileScan_module.c URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed...
Name: setup.py URL: From jmiller at stsci.edu Thu Jul 8 12:05:02 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 8 12:05:02 2004 Subject: [Numpy-discussion] "buffer not aligned on 8 byte boundary" errors when running numarray.testall.test() In-Reply-To: References: Message-ID: <1089313446.2639.55.camel@halloween.stsci.edu> On Thu, 2004-07-08 at 13:28, Wendy Langer wrote: > Hi there all :) > > I am having trouble with my installation of numarray. :( > > I am a python newbie and a numarray extreme-newbie, so it could be that I > don't yet have the first clue what I am doing. ;) > > > > Python 2.3.3 (#51, Feb 16 2004, 04:07:52) [MSC v.1200 32 bit (Intel)] on > win32 > numarray 1.0 > > > The Python I am using is the one that comes with the "Enthought" version > (www.enthought.com), a distro specifically designed to be useful for > scientists, so it comes with numerical stuff and scipy and chaco and things > like that preinstalled. > > I used the windows binary installer. However it came with Numeric and not > numarray, so I installed numarray "by hand". This seemed to go ok, and it > seems that there is no problem having both Numeric and numarray in the same > installation, since they have (obviously) different names (still getting > used to this whole modules and namespaces &c &c) I don't normally use SciPy, but I normally have both numarray and Numeric installed so there's no inherent conflict there. > At the bottom of this email I have pasted an example of what it was I was > trying to do, and the error messages that the interpreter gave me - but > before anyone bothers reading them in any detail, the essential error seems > to be as follows: > > error: multiply_Float64_scalar_vector: buffer not aligned on 8 byte > boundary. This is a low level exception triggered by a misaligned data buffer. It's low level so it's impossible to tell what the real problem is without more information. 
> I have no idea what this means, but I do recall that when I ran the > numarray.testall.test() procedure after first completing my installation a > couple of days ago, it reported a *lot* of problems, many of which sounded > quite similar to this. That sounds pretty bad. Here's roughly how it should look these days: % python >>> import numarray.testall as testall >>> testall.test() numarray: ((0, 1165), (0, 1165)) numarray.records: (0, 48) numarray.strings: (0, 176) numarray.memmap: (0, 82) numarray.objects: (0, 105) numarray.memorytest: (0, 16) numarray.examples.convolve: ((0, 20), (0, 20), (0, 20), (0, 20)) numarray.convolve: (0, 52) numarray.fft: (0, 75) numarray.linear_algebra: ((0, 46), (0, 51)) numarray.image: (0, 27) numarray.nd_image: (0, 390) numarray.random_array: (0, 53) numarray.ma: (0, 671) The tuple results for your test should all have leading zeros as above. The number of tests varies from release to release. > I hoped for the best and thought that perhaps I had "run the test wrong"(!) > since numarray seemed to be working ok, and I had investigated many of the > examples in chapters 3 and 4 of the user manual withour any obvious problems > (chapter 3 = "high level overview" and chapter 4 = "array basics") > > I decided at the time to leave well enough alone until I actually came > across odd or mysterious behaviour ...however that time has come > all-too-soon... > > > > > The procedure I am using to run the test is as described on page 11 of the > excellent user's manual (release 0.8 at > http://www.pfdubois.com/numpy/numarray.pdf): There's an updated manual here: http://prdownloads.sourceforge.net/numpy/numarray-1.0.pdf?download > -- > Testing your Installation > Once you have installed numarray, test it with: > C:\numarray> python > Python 2.2.2 (#18, Dec 30 2002, 02:26:03) [MSC 32 bit (Intel)] on win32 > Type "copyright", "credits" or "license" for more information. 
> >>> import numarray.testall as testall > >>> testall.test() > numeric: (0, 1115) > records: (0, 48) > strings: (0, 166) > objects: (0, 72) > memmap: (0, 75) > Each line in the above output indicates that 0 of X tests failed. X grows > steadily with each release, so the numbers > shown above may not be current. > -- > > Anyway, when I ran this, instead of the nice, comforting output above, I > got about a million(!) errors and then a final count of 320 failures. This > number is not always constant - I recall the first time I ran it it was 209. > [I just ran it again and this time it was 324...it all has a rather > disturbing air of semi-randomness...] > > > So below is the (heavily snipped) output from the testall.test() run, and > below that is the code where I first noticed a possibly similar error, and > below *that* is the output of that code, with the highly suspicous > error.... > > > Any suggestions greatly appreciated! If you've ever had numarray installed before, go to your site-packages directory and delete numarray as well as any numarray.pth. Then reinstall numarray-1.0. Also, just do: >>> import numarray >>> numarray and see what kind of path is involved getting to the numarray module. > I can give you more info about the setup on my computer and so on if you > need :) I think you already included everything important; the exact variant of Windows you're using might be helpful; I'm not aware of any problems there though. It looks like you're on a well supported platform. I just tested pretty much the same configuration on Windows 2000 Pro, with Python-2.3.4, and it worked fine even with SciPy-0.3. > wendy langer > > > ====================================================================== > > There's something hugely wrong with your test output. I've never seen anything like it other than during development. 
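[Editor's note: Todd's `>>> import numarray` / `>>> numarray` check works because a module's repr shows the file it was loaded from; the same information is available from `__file__`. A sketch, using a stdlib module as a stand-in since numarray itself is unlikely to be installed today:]

```python
import json      # stand-in for numarray in this sketch
import os.path

# __file__ reveals which installation Python actually picked up; a stale
# copy lingering in site-packages (or a leftover .pth file) shows up here.
print(json.__file__)
assert os.path.isfile(json.__file__)
```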
> > > ========================================================================= > > ======================== > > > import numarray > > class anXmatrix: > def __init__(self, stepsize = 3): > self.stepsize = stepsize > self.populate_matrix() > > > def describe(self): > print "I am a ", self.__class__ > print "my stepsize is", self.stepsize > print "my matrix is: \n" > print self.matrix > > def populate_matrix(self): > > def xvalues(i,j): > return self.stepsize*j > > mx = numarray.fromfunction(xvalues, (4,4)) > self.matrix = mx > > > if __name__ == '__main__': > > > print " " > print "Making anXmatrix..." > r = anXmatrix(stepsize = 5) > r.describe() > r = anXmatrix(stepsize = 0.02) > r.describe() > > > > ============================================================================ Here's what I get when I run your code, windows or linux: Making anXmatrix... I am a __main__.anXmatrix my stepsize is 5 my matrix is: [[ 0 5 10 15] [ 0 5 10 15] [ 0 5 10 15] [ 0 5 10 15]] I am a __main__.anXmatrix my stepsize is 0.02 my matrix is: [[ 0. 0.02 0.04 0.06] [ 0. 0.02 0.04 0.06] [ 0. 0.02 0.04 0.06] [ 0. 0.02 0.04 0.06]] Regards, Todd From Fernando.Perez at colorado.edu Thu Jul 8 12:25:07 2004 From: Fernando.Perez at colorado.edu (Fernando.Perez at colorado.edu) Date: Thu Jul 8 12:25:07 2004 Subject: [Numpy-discussion] How to read data from text files fast? 
In-Reply-To: <40ED8A6D.5050505@noaa.gov> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <40E470D9.8060603@noaa.gov> <40E473A9.5040109@colorado.edu> <40ED8A6D.5050505@noaa.gov> Message-ID: <1089314664.40ed9f68e1db5@webmail.colorado.edu> Quoting Chris Barker : > Thanks to Fernando Perez and Travis Oliphant for pointing me to: > > > scipy.io.read_array > > In testing, I've found that it's very slow (for my needs), though quite nifty in other ways, so I'm sure I'll find a use for it in the future. Just a quick note Travis sent to me privately: he suggested using io.numpyio.fread instead of Numeric.fromstring() for speed reasons. I don't know if it will help in your case, I just mention it in case it helps. Cheers, F From Chris.Barker at noaa.gov Thu Jul 8 12:41:06 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jul 8 12:41:06 2004 Subject: [Numpy-discussion] How to read data from text files fast? In-Reply-To: <1089314664.40ed9f68e1db5@webmail.colorado.edu> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <40E470D9.8060603@noaa.gov> <40E473A9.5040109@colorado.edu> <40ED8A6D.5050505@noaa.gov> <1089314664.40ed9f68e1db5@webmail.colorado.edu> Message-ID: <40EDA2A8.9030300@noaa.gov> Fernando.Perez at colorado.edu wrote: > Just a quick note Travis sent to me privately: he suggested using > io.numpyio.fread instead of Numeric.fromstring() for speed reasons. I don't > know if it will help in your case, I just mention it in case it helps.
Thanks, but those are for binary files, which I have to do sometimes, so I'll keep it in mind. However, my problem at hand is text files, and my solution is working nicely, though I'd love a pair of more experienced eyes on the code.... -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Thu Jul 8 13:50:03 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jul 8 13:50:03 2004 Subject: [Numpy-discussion] How to read data from text files fast? In-Reply-To: <004c01c46524$ab808090$ebeca782@stsci.edu> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <40E470D9.8060603@noaa.gov> <40E473A9.5040109@colorado.edu> <40ED8A6D.5050505@noaa.gov> <1089314664.40ed9f68e1db5@webmail.colorado.edu> <40EDA2A8.9030300@noaa.gov> <004c01c46524$ab808090$ebeca782@stsci.edu> Message-ID: <40EDB2BD.4080809@noaa.gov> Todd Miller wrote: > I looked this over to see how hard it would be to port to numarray. At > first glance, it looks easy. I didn't really read it closely enough to > pick up bugs, but what I saw looks good. One thing I did notice was a > calloc of temporary data space. That seemed like a possible waste: can't > you just preallocate the array and read your data directly into it? The short answer is that I'm not very smart! The longer answer is that this is because at first I misunderstood what PyArray_FromDimsAndData was for. For ScanFileN, I'll re-do it as you suggest. For ScanFile, it is unknown at the beginning how big the final array is, and I devised a scheme that would allocate the memory as it went, in reasonable sized chunks.
However, this does require a full copy, which is a problem. Since posting, I thought of a MUCH easier scheme:

scan the file, without storing the data, to see how many numbers there are
rewind the file
allocate the Array
read the data

This requires scanning the file twice, which would cost some time, but would be easier, and prevent an unnecessary copy of the data. I hope I'll get a chance to try it out and see what the performance is like. In the meantime, anyone else have any thoughts? By the way, does it matter whether I use malloc or calloc? I can't really tell the difference from K&R. > This is > probably a very minor speed issue, but might be a significant storage issue > as people are starting to max out 32-bit systems. yup. This is all pointless if it's not a lot of data, after all. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Thu Jul 8 16:21:16 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu Jul 8 16:21:16 2004 Subject: [Numpy-discussion] How to read data from text files fast? In-Reply-To: <40EDB2BD.4080809@noaa.gov> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040701053355.M99698@grenoble.cnrs.fr> <40E470D9.8060603@noaa.gov> <40E473A9.5040109@colorado.edu> <40ED8A6D.5050505@noaa.gov> <1089314664.40ed9f68e1db5@webmail.colorado.edu> <40EDA2A8.9030300@noaa.gov> <004c01c46524$ab808090$ebeca782@stsci.edu> <40EDB2BD.4080809@noaa.gov> Message-ID: <40EDD64A.1060508@noaa.gov> Chris Barker wrote: >> can't >> you just preallocate the array and read your data directly into it? > > The short answer is that I'm not very smart!
> The longer answer is that this is because at first I misunderstood what
> PyArray_FromDimsAndData was for. For ScanFileN, I'll re-do it as you suggest.

I've re-done it. Now I don't double allocate storage for ScanFileN. There was no noticeable difference in performance, but why use memory you don't have to? For ScanFile, it is unknown at the beginning how big the final array is, so I now have two versions. One is what I had before: it allocates memory in blocks of some Buffersize as it reads the file (now set to 1024 elements). Once it's all read in, it creates an appropriately sized PyArray, and copies the data to it. This results in a double copy of all the data until the temporary memory is freed. I now also have a ScanFile2, which scans the whole file first, then creates a PyArray, and re-reads the file to fill it up. This version takes about twice as long, confirming my expectation that the time to allocate and copy data is tiny compared to reading and parsing the file. Here's a simple benchmark:

Reading with Standard Python methods
(62936, 2)
it took 2.824013 seconds to read the file with standard Python methods
Reading with FileScan
(62936, 2)
it took 0.400936 seconds to read the file with FileScan
Reading with FileScan2
(62936, 2)
it took 0.752649 seconds to read the file with FileScan2
Reading with FileScanN
(62936, 2)
it took 0.441714 seconds to read the file with FileScanN

So it takes twice as long to count the numbers first, but it's still three times as fast as just doing all this with Python. However, I don't usually think it's worth all this effort for a 3 times speed up, and I tend to make copies of my arrays all over the place with NumPy anyway, so I'm inclined to stick with the first method. Also, if you are really that tight on memory, you could always read it in chunks with ScanFileN. Any feedback anyone wants to give is very welcome. -Chris -- Christopher Barker, Ph.D.
Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: FileScan_module.c URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: setup.py URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: TestFileScan.py URL: From falted at pytables.org Fri Jul 9 03:55:03 2004 From: falted at pytables.org (Francesc Alted) Date: Fri Jul 9 03:55:03 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion Message-ID: <200407091254.06579.falted@pytables.org> Hi, As Perry said not too long ago that the numarray crew would ask for suggestions for RecArray improvements, I'm going to suggest a couple. I find the .tolist() method quite inconvenient when applied to RecArray objects as it is now:

>>> r[2:4]
array(
[(3, 33.0, 'c'),
(4, 44.0, 'd')],
formats=['1UInt8', '1Float32', '1a1'],
shape=2,
names=['c1', 'c2', 'c3'])
>>> r[2:4].tolist()
[<numarray.records.Record instance at ...>,
<numarray.records.Record instance at ...>]

The suggested behaviour would be:

>>> r[2:4].tolist()
[(3, 33.0, 'c'), (4, 44.0, 'd')]

Another thing is that an element of a recarray should be returned as a tuple instead of as a records.Record object:

>>> r[2]
<numarray.records.Record instance at ...>

The suggested behaviour would be:

>>> r[2]
(3, 33.0, 'c')

I think the latter would be consistent with the convention that a __getitem__(int) of a NumArray object returns a python type instead of a rank-0 array. In the same way, a __getitem__(int) of a RecArray should return a python type (a tuple in this case). Below is the code that I use right now to simulate this behaviour, but it would be nice if the code were included in the numarray.records module.
def tolist(arr):
    """Converts a RecArray or Record to a list of rows"""
    outlist = []
    if isinstance(arr, records.Record):
        for i in range(arr.array._nfields):
            outlist.append(arr.array.field(i)[arr.row])
        outlist = tuple(outlist)  # return a tuple for records
    elif isinstance(arr, records.RecArray):
        for j in range(arr.nelements()):
            tmplist = []
            for i in range(arr._nfields):
                tmplist.append(arr.field(i)[j])
            outlist.append(tuple(tmplist))
    return outlist

Cheers, -- Francesc Alted From thomas_karlsson_569 at hotmail.com Fri Jul 9 08:02:44 2004 From: thomas_karlsson_569 at hotmail.com (Thomas Karlsson) Date: Fri Jul 9 08:02:44 2004 Subject: [Numpy-discussion] Numpy compiling error... Help! Message-ID: Hi I'm trying to compile/install numpy on a RH9 machine. When doing so I run into problems. I give the command: python setup.py install and get a long answer, with this error at the end:

gcc -shared build/temp.linux-i686-2.2/lapack_litemodule.o -L/usr/lib/atlas -llapack -lcblas -lf77blas -latlas -lg2c -o build/lib.linux-i686-2.2/lapack_lite.so
/usr/bin/ld: cannot find -llapack
collect2: ld returned 1 exit status
error: command 'gcc' failed with exit status 1

Does anyone know what I've done wrong? I've spent a lot of time on this and really need help now... Regards Thomas _________________________________________________________________ Hitta rätt på nätet med MSN Sök http://search.msn.se/ From Chris.Barker at noaa.gov Fri Jul 9 09:44:12 2004 From: Chris.Barker at noaa.gov (Chris Barker) Date: Fri Jul 9 09:44:12 2004 Subject: [Numpy-discussion] How to read data from text files fast? In-Reply-To: <3afee4a2.5cf5a1c3.8234000@expms6.cites.uiuc.edu> References: <3afee4a2.5cf5a1c3.8234000@expms6.cites.uiuc.edu> Message-ID: <40EECAB8.3050900@noaa.gov> Bruce, Thanks for your feedback.
Bruce Southey wrote: > While I am not really following your thread, I just wanted to comment that the > Python Cookbook (at least the printed version) has some ways to count lines in a > file - assuming that the number of lines provides the size. The number of lines does not necessarily provide the size. In the general case, it doesn't at all. My whole goal here is the general case: being able to read a bunch of numbers out of any format of text file. This can be used as part of a parser for many file formats. If I were shooting for just one format, this would be easier, but not general purpose. Now that I have this, I can write a number of file format parsers in python with improved performance and easier syntax.

> Under Unix (but not windows),

I am aiming for a portable solution.

> Alternatively if sufficient memory is available, storing the file in memory > (during the counting of elements) should always be faster than reading it a > second time from the hard disk. The primary reason to scan the file ahead of time to count the elements is to save the memory of duplicate copies of data. The other reason is to make memory management easier, but since I've already solved that problem, I'm done. thanks, -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From perry at stsci.edu Mon Jul 12 14:15:01 2004 From: perry at stsci.edu (Perry Greenfield) Date: Mon Jul 12 14:15:01 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: <200407091254.06579.falted@pytables.org> Message-ID: Francesc Alted wrote: > > As Perry said not too long ago that numarray crew would ask for > suggestions > for RecArray improvements, I'm going to suggest a couple.
> > I find quite inconvenient the .tolist() method when applied to RecArray > objects as it is now:
>
> >>> r[2:4]
> array(
> [(3, 33.0, 'c'),
> (4, 44.0, 'd')],
> formats=['1UInt8', '1Float32', '1a1'],
> shape=2,
> names=['c1', 'c2', 'c3'])
> >>> r[2:4].tolist()
> [<numarray.records.Record instance at ...>,
> <numarray.records.Record instance at ...>]
>
> The suggested behaviour would be:
>
> >>> r[2:4].tolist()
> [(3, 33.0, 'c'), (4, 44.0, 'd')]
>
> Another thing is that an element of a recarray should be returned as a tuple > instead of as a records.Record object:
>
> >>> r[2]
> <numarray.records.Record instance at ...>
>
> The suggested behaviour would be:
>
> >>> r[2]
> (3, 33.0, 'c')
>
> I think the latter would be consistent with the convention that a > __getitem__(int) of a NumArray object returns a python type instead of a > rank-0 array. In the same way, a __getitem__(int) of a RecArray should > return a python type (a tuple in this case).

These are good examples of where improvements are needed (we are also looking at how best to handle multidimensional arrays and should have a proposal this week). What I'm wondering about is what a single element of a record array should be. Returning a tuple has an undeniable simplicity to it. On the other hand, we've been using recarrays that allow naming the various columns (which we refer to as "fields"). If one can refer to fields of a recarray, shouldn't one be able to refer to a field (by name) of one of its elements? Or are you proposing that basic recarrays not have that sort of capability (something added by a subclass)? Perry From rowen at u.washington.edu Mon Jul 12 16:09:00 2004 From: rowen at u.washington.edu (Russell E Owen) Date: Mon Jul 12 16:09:00 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: References: Message-ID: At 5:14 PM -0400 2004-07-12, Perry Greenfield wrote: >What I'm wondering about is what a single element of a record array >should be. Returning a tuple has an undeniable simplicity to it.
>On the other hand, we've been using recarrays that allow naming the >various columns (which we refer to as "fields"). If one can refer >to fields of a recarray, shouldn't one be able to refer to a field >(by name) of one of its elements? Or are you proposing that basic >recarrays not have that sort of capability (something added by a >subclass)?

In my opinion, a single item of a record array should be a RecordItem object that is a dictionary that keeps items in field order. Thus:

- use the standard dictionary interface to deal with values by name (except the keys are always in the correct order).
- one can also get and set all the data at once as a tuple. This is NOT a standard dictionary interface, but is essential. Functions such as getvalues(), setvalues(dataTuple) should do it.

Adopting the full dictionary interface means one gets a standard, mature and fairly complete set of features. Also, a RecordItem object can then be used wherever a dictionary object is needed. I suspect it's also useful to have named field access: RecordItem.fieldname but am a bit reluctant to suggest so many different ways of getting to the data. I assume it will continue to be easy to get all data for a field by naming the appropriate field. That's a really nice feature. It would be even better if a masked array could be used, but I have no idea how hard this would be. Which brings up a side issue: any hope of integrating masked arrays into numarray, such that they could be used wherever a numarray array could be used? Areas where I particularly find myself needing them include nd_image filtering and writing C extensions. -- Russell P.S. I submitted several feature requests and bug reports for records on sourceforge months ago. I hope they'll not be overlooked during the review process.
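Russell's RecordItem idea can be sketched briefly in plain modern Python (the class and method names here are hypothetical; nothing like this exists in numarray.records):

```python
class RecordItem(dict):
    """One row of a record array, acting as a dictionary whose keys
    (the field names) stay in field order."""

    def __init__(self, fields, values):
        # a modern Python dict preserves insertion order, which gives
        # "keys always in the correct order" for free
        super().__init__(zip(fields, values))
        self._fields = list(fields)

    def getvalues(self):
        """Return all the data at once, as a tuple in field order."""
        return tuple(self[name] for name in self._fields)

    def setvalues(self, values):
        """Set all the data at once from a tuple."""
        if len(values) != len(self._fields):
            raise ValueError("expected %d values" % len(self._fields))
        for name, value in zip(self._fields, values):
            self[name] = value

# one row of the r[2:4] example from earlier in this thread
item = RecordItem(["c1", "c2", "c3"], (3, 33.0, "c"))
```

Because RecordItem subclasses dict, it can be passed anywhere a dictionary is expected, which is the interoperability point made above.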
From falted at pytables.org Tue Jul 13 01:30:55 2004 From: falted at pytables.org (Francesc Alted) Date: Tue Jul 13 01:30:55 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: References: Message-ID: <200407131028.04791.falted@pytables.org> On Monday 12 July 2004 23:14, Perry Greenfield wrote: > What I'm wondering about is what a single element of a record array > should be. Returning a tuple has an undeniable simplicity to it. Yeah, this is why I'm strongly biased toward this possibility. > On the other hand, we've been using recarrays that allow naming the > various columns (which we refer to as "fields"). If one can refer > to fields of a recarray, shouldn't one be able to refer to a field > (by name) of one of its elements? Or are you proposing that basic > recarrays not have that sort of capability (something added by a > subclass)? Well, I'm not sure about that. But just in case most people would like to access records by field as well as by index, I would advocate for the possibility that the Record instances would behave as similarly as possible to a tuple (or dictionary?). That includes creating appropriate __str__() *and* __repr__() methods as well as a __getitem__() that supports both field names and indices. I'm not sure about whether providing an __getattr__() method would be ok, but for the sake of simplicity, and in order to have (preferably) only one way to do things, I would say no.
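A Record that behaves as much as possible like a tuple while still supporting field-name lookup, as described above, could look roughly like this (a sketch with hypothetical names, not the actual numarray.records implementation):

```python
class Record:
    """One row of a record array, indexable by position or by field name."""

    def __init__(self, names, values):
        self._names = list(names)
        self._values = tuple(values)

    def __getitem__(self, key):
        if isinstance(key, (int, slice)):
            return self._values[key]                 # tuple-like: rec[0], rec[1:]
        return self._values[self._names.index(key)]  # by field name: rec["c1"]

    def __repr__(self):
        # print like the plain tuple, e.g. (3, 33.0, 'c')
        return repr(self._values)

    __str__ = __repr__

rec = Record(["c1", "c2", "c3"], (3, 33.0, "c"))
```

This keeps one access method, __getitem__(), for both styles, in line with the preference for only one way to do things.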
Regards, -- Francesc Alted From falted at pytables.org Tue Jul 13 02:07:00 2004 From: falted at pytables.org (Francesc Alted) Date: Tue Jul 13 02:07:00 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: <200407131028.04791.falted@pytables.org> References: <200407131028.04791.falted@pytables.org> Message-ID: <200407131106.19557.falted@pytables.org> On Tuesday 13 July 2004 10:28, Francesc Alted wrote: > On Monday 12 July 2004 23:14, Perry Greenfield wrote: > > What I'm wondering about is what a single element of a record array > > should be. Returning a tuple has an undeniable simplicity to it. > > Yeah, this is why I'm strongly biased toward this possibility. > > > On the other hand, we've been using recarrays that allow naming the > > various columns (which we refer to as "fields"). If one can refer > > to fields of a recarray, shouldn't one be able to refer to a field > > (by name) of one of its elements? Or are you proposing that basic > > recarrays not have that sort of capability (something added by a > > subclass)? > > Well, I'm not sure about that. But just in case most people would like to > access records by field as well as by index, I would advocate for the > possibility that the Record instances would behave as similarly as possible > to a tuple (or dictionary?). That includes creating appropriate __str__() > *and* __repr__() methods as well as a __getitem__() that supports both field > names and indices. I'm not sure about whether providing an __getattr__() > method would be ok, but for the sake of simplicity, and in order to have > (preferably) only one way to do things, I would say no. I've been thinking that one way to make returning a tuple for a single element of a RecArray compatible with still being able to retrieve a field by name is to play with RecArray.__getitem__ and let it support key names in addition to indices.
This would be better seen as an example. Right now, one can say:

>>> r=records.array([(1,"asds", 24.),(2,"pwdw", 48.)], "1i4,1a4,1f8")
>>> r._fields["c1"]
array([1, 2])
>>> r._fields["c1"][1]
2

What I propose is to be able to say:

>>> r["c1"]
array([1, 2])
>>> r["c1"][1]
2

Which would replace the notation:

>>> r[1]["c1"]
2

which was recently suggested. I.e. the suggestion is to realize RecArrays as a collection of columns, as well as a collection of rows. -- Francesc Alted From falted at pytables.org Tue Jul 13 02:13:03 2004 From: falted at pytables.org (Francesc Alted) Date: Tue Jul 13 02:13:03 2004 Subject: [Numpy-discussion] PyTables 0.8.1 released Message-ID: <200407131112.15345.falted@pytables.org> PyTables is a hierarchical database package designed to efficiently manage very large amounts of data. PyTables is built on top of the HDF5 library and the numarray package. It features an object-oriented interface that, combined with natural naming and C-code generated from Pyrex sources, makes it a fast, yet extremely easy-to-use tool for interactively saving and retrieving different kinds of datasets. It also provides flexible indexed access on disk to anywhere in the data. The primary purpose of this release is to incorporate updates related to the newly released numarray 1.0. I've taken the opportunity to backport some improvements added in PyTables 0.9 (in alpha stage) as well as to fix the known problems.

Improvements:

- The logic for computing the buffer sizes has been revamped. As a consequence, the performance of writing/reading tables with large record sizes has improved by a factor of ten or more, now exceeding 70 MB/s for writing and 130 MB/s for reading (using compression).
- The maximum record size for tables has been raised to 512 KB (before it was 8 KB, due to some internal limitations).
- Documentation has been improved in many minor details.
As a result of a fix in the underlying documentation system (tbook), chapters now start at odd pages, instead of even. So those of you who want to print double-sided will probably have better luck now when aligning pages ;). Another one is that the HTML documentation has improved its look as well.

Bug Fixes:

- Indexing of Arrays with list or tuple flavors (#968131)

When retrieving single elements from an array with 'List' or 'Tuple' flavors, an error occurred. This has been corrected and now you can retrieve fileh.root.array[2] without problems for 'List' or 'Tuple' flavored (E, VL)Arrays.

- Iterators on Arrays with list or tuple flavors fail (#968132)

When using iterators with Array objects with 'List' or 'Tuple' flavors, an error occurred. This has been corrected.

- Last Index (-1) of Arrays doesn't work (#968149)

When accessing the last element in an Array using the notation -1, an empty list (or tuple or array) was returned instead of the proper value. This happened in general with all negative indices. Fixed.

- Table.read(flavor="List") should return pure lists (#972534)

However, it used to return pointers to numarray.records.Record instances, as in:

>>> fileh.root.table.read(1,2,flavor="List")
[<numarray.records.Record instance at ...>]
>>> fileh.root.table.read(1,3,flavor="List")
[<numarray.records.Record instance at ...>, <numarray.records.Record instance at ...>]

Now the following records are returned:

>>> fileh.root.table.read(1,2, flavor="List")
[(' ', 1, 1.0)]
>>> fileh.root.table.read(1,3, flavor="List")
[(' ', 1, 1.0), (' ', 2, 2.0)]

In addition, when reading a single row of a table, a numarray.records.Record pointer was returned:

>>> fileh.root.table[1]
<numarray.records.Record instance at ...>

Now, it returns a tuple:

>>> fileh.root.table[1]
(' ', 1, 1.0)

Which I think is more consistent, and more Pythonic.

- Copy of leaves fails... (#973370)

Attempting to copy leaves (Table or Array with different flavors) on top of themselves caused an internal error in PyTables. This has been corrected by silently avoiding the copy and returning the original Leaf as a result.
Minor changes:

- When assigning a value to a non-existing field in a table row, a KeyError is now raised, instead of the AttributeError that was issued before. I think this is more consistent with the type of error.
- Tests have been improved so as to pass the whole suite when compiled in 64 bit mode on a Linux/PowerPC machine (namely a dual-G5 Powermac running a 64-bit, 2.6.4 Linux kernel and the preview YDL distribution for G5, with a 64-bit GCC toolchain). Thanks to Ciro Cattuto for testing and reporting the modifications that were needed.

Where can PyTables be applied?
------------------------------

PyTables is not designed to work as a relational database competitor, but rather as a teammate. If you want to work with large datasets of multidimensional data (for example, for multidimensional analysis), or just provide a categorized structure for some portions of your cluttered RDBMS, then give PyTables a try. It works well for storing data from data acquisition systems (DAS), simulation software, network data monitoring systems (for example, traffic measurements of IP packets on routers), very large XML files, or for creating a centralized repository for system logs, to name only a few possible uses.

What is a table?
----------------

A table is defined as a collection of records whose values are stored in fixed-length fields. All records have the same structure and all values in each field have the same data type. The terms "fixed-length" and "strict data types" seem to be quite a strange requirement for a language like Python that supports dynamic data types, but they serve a useful function if the goal is to save very large quantities of data (such as is generated by many scientific applications, for example) in an efficient manner that reduces demand on CPU time and I/O resources.

What is HDF5?
-------------

For those people who know nothing about HDF5, it is a general purpose library and file format for storing scientific data made at NCSA.
HDF5 can store two primary objects: datasets and groups. A dataset is essentially a multidimensional array of data elements, and a group is a structure for organizing objects in an HDF5 file. Using these two basic constructs, one can create and store almost any kind of scientific data structure, such as images, arrays of vectors, and structured and unstructured grids. You can also mix and match them in HDF5 files according to your needs.

Platforms
---------

I'm using Linux (Intel 32-bit) as the main development platform, but PyTables should be easy to compile/install on many other UNIX machines. This package has also passed all the tests on an UltraSparc platform with Solaris 7 and Solaris 8. It also compiles and passes all the tests on an SGI Origin2000 with MIPS R12000 processors, with the MIPSPro compiler and running IRIX 6.5. It also runs fine on Linux 64-bit platforms, like an AMD Opteron running SuSe Linux Enterprise Server or a PowerPC G5 with Linux 2.6.x in 64-bit mode. It has also been tested on MacOS X platforms (10.2, but it should also work on newer versions). Regarding Windows platforms, PyTables has been tested with Windows 2000 and Windows XP (using the Microsoft Visual C compiler), but it should work with other flavors as well.

An example?
-----------

For online code examples, have a look at http://pytables.sourceforge.net/html/tut/tutorial1-1.html and, for the newly introduced Variable Length Arrays: http://pytables.sourceforge.net/html/tut/vlarray2.html

Web site
--------

Go to the PyTables web site for more details: http://pytables.sourceforge.net/

Share your experience
---------------------

Let me know of any bugs, suggestions, gripes, kudos, etc. you may have. Enjoy!
-- Francesc Alted From jmiller at stsci.edu Tue Jul 13 10:42:04 2004 From: jmiller at stsci.edu (Todd Miller) Date: Tue Jul 13 10:42:04 2004 Subject: [Numpy-discussion] numarray-1.0 Bug Alert Message-ID: <1089740511.9509.372.camel@halloween.stsci.edu>

Overview

There is a bug in numarray's Numeric compatible C-API. The bug has been latent for a long time, since numarray-0.3 was released roughly two years ago. It is serious because it results in wrong answers for certain extension functions fed a certain class of arrays.

What's affected

The bug affects numarray's add-on packages and third party extension functions which use the Numeric compatibility C-API. Generally, this means C-code that was either ported from Numeric or was written with both Numeric and numarray in mind. This includes the add-on packages numarray.linear_algebra, numarray.fft, numarray.random_array, and numarray.mlab. More recently, it includes the ports of core Numeric functions to numarray.numeric. Because numarray.ma uses numarray.numeric, the bug also affects numarray.ma. Finally, for numarray-1.0 this bug affects the functions numarray.argmin and numarray.argmax; these should be the only two functions in core numarray which are affected.

Detailed Bug Description

The bug is exposed by calling an extension function (written using the Numeric compatible C-API) with an array that has a non-zero _byteoffset attribute. Arrays with non-zero _byteoffset are typically created as a result of partially indexing higher dimensional arrays or slicing arrays. Partially indexing or slicing an array generally results in a sub-array, a view which often refers to an interior region of the original array buffer. Because numarray's PyArrayObject does not currently include its ->byteoffset in its ->data pointer as the Numeric compatibility API assumes it does, an extension function sees the base region of the original array rather than the region belonging to the sub-array.
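The sub-array situation described above can be illustrated with modern numpy, used here purely as an analogue (numarray's _byteoffset attribute has no direct numpy equivalent, but the buffer arithmetic is the same): a slice is a view whose data pointer starts partway into the base array's buffer, and an extension that ignores that offset reads the wrong elements.

```python
import numpy as np

a = np.arange(64).reshape(8, 8)   # base array; its view offset is zero
b = a[2:4]                        # sub-array: a view into a's buffer

# byte offset of the view's data pointer from the base buffer
base_addr = a.__array_interface__["data"][0]
view_addr = b.__array_interface__["data"][0]
offset = view_addr - base_addr

# the view starts two full rows into the buffer
expected = 2 * 8 * a.itemsize

# an extension reading from base_addr instead of view_addr would see
# a[0] where it should see a[2] -- which is the bug described above
```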
Immediate User Workaround

A simple user level workaround for people that need to use the affected packages and functions today is one like the following:

def make_safe_for_numeric_api(a):
    a = numarray.asarray(a)
    if a._byteoffset != 0:
        return a.copy()
    else:
        return a

The array inputs to an affected extension function need to be wrapped with calls to make_safe_for_numeric_api(). Since this is intrusive and a real fix should be released in the near future, this approach is not recommended.

Long Term Fix

The real fix for the bug appears to be to redefine the semantics of numarray's PyArrayObject ->data pointer to include ->byteoffset, altering the C-API. This should make most existing Numeric compatible extension functions work without modification or recompilation, but will necessitate the re-compilation of some extension functions written using the native numarray API approaches (the NA_* functions and macros). This recompilation will be required because key macros will change, most notably NA_OFFSETDATA. This fix is not the only possible one, and other suggestions are welcome, but changing the semantics of ->data appears to be the best way to facilitate numarray/Numeric interoperability. By doing this fix, numarray operates more like Numeric so fewer changes need to be made in the future to perform ports of Numeric code to numarray.

Impact of Proposed Fix

Regrettably, the proposed fix will break binary compatibility for clients of the numarray-1.0 native C-API. So, extensions built using the numarray native C-API will need to be rebuilt for numarray-1.1. Extensions that have made direct access to PyArrayObject's ->data and require the original offsetless meaning will also need to change code for numarray-1.1. This is something we *really* wanted to avoid... it just isn't going to happen this time.

The Plan

The current plan is to fix the Numeric compatible API by changing the semantics of ->data and release numarray-1.1 relatively soon, hopefully within 2 weeks.
I'm sorry for any inconvenience this has caused numarray users. Regards, Todd Miller From zingale at ucolick.org Tue Jul 13 12:54:02 2004 From: zingale at ucolick.org (Mike Zingale) Date: Tue Jul 13 12:54:02 2004 Subject: [Numpy-discussion] differencing numarray arrays. Message-ID: Hi, I am trying to efficiently compute a difference of two 2-d flux arrays, as arises quite commonly in finite-difference/finite-volume methods. Ex: a = arange(64) a.shape = (8,8) I want to do create a new array, b, of shape such that b[i,j] = a[i,j] - a[i-1,j] for 1 <= i < 8 0 <= i < 8 I can obviously do this through loops, but this is quite slow. In IDL, which is often compared to numarray/python, this is simple to do with the shift() function, but I cannot find an efficient way to do it with numarray arrays. I tried defining a list i = range(8) im1[1:9] = im1[1:9] - 1 and indexing with im1, but this does not work. Any suggestions? For large array, this simple differencing in python is very expensive when using loops. Thanks, Mike ------------------------------------------------------------------------------ Michael Zingale UCO/Lick Observatory UCSC Santa Cruz, CA 95064 phone: (831) 459-5246 fax: (831) 459-5265 e-mail: zingale at ucolick.org web: http://www.ucolick.org/~zingale ``Don't worry head, the computer will do our thinking now'' -- Homer From tim.hochberg at cox.net Tue Jul 13 12:59:00 2004 From: tim.hochberg at cox.net (Tim Hochberg) Date: Tue Jul 13 12:59:00 2004 Subject: [Numpy-discussion] differencing numarray arrays. In-Reply-To: References: Message-ID: <40F43EC4.70903@cox.net> Mike Zingale wrote: >Hi, I am trying to efficiently compute a difference of two 2-d flux >arrays, as arises quite commonly in finite-difference/finite-volume >methods. Ex: > >a = arange(64) >a.shape = (8,8) > >I want to do create a new array, b, of shape such that > >b[i,j] = a[i,j] - a[i-1,j] > >for 1 <= i < 8 > 0 <= i < 8 > > That's supposed to be a j in the second eq., right? 
If I understand you right, what you want is: b = a[1:] - a[:-1] -tim >I can obviously do this through loops, but this is quite slow. In IDL, >which is often compared to numarray/python, this is simple to do with the >shift() function, but I cannot find an efficient way to do it with >numarray arrays. > >I tried defining a list > >i = range(8) >im1[1:9] = im1[1:9] - 1 > >and indexing with im1, but this does not work. > >Any suggestions? For large array, this simple differencing in python is >very expensive when using loops. > >Thanks, > >Mike > >------------------------------------------------------------------------------ >Michael Zingale >UCO/Lick Observatory >UCSC >Santa Cruz, CA 95064 > >phone: (831) 459-5246 >fax: (831) 459-5265 >e-mail: zingale at ucolick.org >web: http://www.ucolick.org/~zingale > >``Don't worry head, the computer will do our thinking now'' -- Homer > > > >------------------------------------------------------- >This SF.Net email sponsored by Black Hat Briefings & Training. >Attend Black Hat Briefings & Training, Las Vegas July 24-29 - >digital self defense, top technical experts, no vendor pitches, >unmatched networking opportunities. Visit www.blackhat.com >_______________________________________________ >Numpy-discussion mailing list >Numpy-discussion at lists.sourceforge.net >https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > From rkern at ucsd.edu Tue Jul 13 13:01:04 2004 From: rkern at ucsd.edu (Robert Kern) Date: Tue Jul 13 13:01:04 2004 Subject: [Numpy-discussion] differencing numarray arrays. In-Reply-To: References: Message-ID: <40F43F65.9040208@ucsd.edu> Mike Zingale wrote: > Hi, I am trying to efficiently compute a difference of two 2-d flux > arrays, as arises quite commonly in finite-difference/finite-volume > methods. 
Ex: > > a = arange(64) > a.shape = (8,8) > > I want to do create a new array, b, of shape such that > > b[i,j] = a[i,j] - a[i-1,j] > > for 1 <= i < 8 > 0 <= i < 8 Try b = a[1:] - a[:-1] -- Robert Kern rkern at ucsd.edu "In the fields of hell where the grass grows high Are the graves of dreams allowed to die." -- Richard Harter From zingale at ucolick.org Tue Jul 13 13:42:02 2004 From: zingale at ucolick.org (Mike Zingale) Date: Tue Jul 13 13:42:02 2004 Subject: [Numpy-discussion] differencing numarray arrays. In-Reply-To: <40F44766.9010009@pfdubois.com> References: <40F44766.9010009@pfdubois.com> Message-ID: thanks, all these responses helped. I guess I was still a little unclear with the slicing abilities in numarray. Mike On Tue, 13 Jul 2004, Paul Dubois wrote: > Two of the responses to your question, while correct, might have seemed > mysterious to a beginner. > > a[1:] - a[:-1] > > is actually shorthand for: > > a[1:, :] - a[:-1, :] > > Or to be even more explicit: > > n = 8 > a[1:n, 0:n] - a[0:(n-1), 0:n] > > If you had wanted the difference in the second index, you have to use > the more explicit forms. > > > From rowen at u.washington.edu Tue Jul 13 17:11:49 2004 From: rowen at u.washington.edu (Russell E Owen) Date: Tue Jul 13 17:11:49 2004 Subject: [Numpy-discussion] differencing numarray arrays. In-Reply-To: References: <40F44766.9010009@pfdubois.com> Message-ID: At 1:41 PM -0700 2004-07-13, Mike Zingale wrote: >thanks, all these responses helped. I guess I was still a little >unclear with the slicing abilities in numarray... Also note that there is a shift function: numarray.nd_image.shift In your case I suspect slicing is better, but there are times when one really does want to shift the data (e.g. when one wants the resulting array to be the same shape as the original). 
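The slicing idioms from the replies above can be written out as a short sketch (modern numpy here as a stand-in for Numeric/numarray; the idiom is identical):

```python
import numpy as np

a = np.arange(64).reshape(8, 8)

# difference along the first axis, shorthand for a[1:, :] - a[:-1, :];
# element [i, j] holds a[i+1, j] - a[i, j], so the result has 7 rows
b = a[1:] - a[:-1]

# difference along the second index needs the explicit form
c = a[:, 1:] - a[:, :-1]   # shape (8, 7)
```

For this particular a, consecutive rows differ by 8 everywhere and consecutive columns by 1, which makes the result easy to check by eye.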
-- Russell From kyeser at earthlink.net Tue Jul 13 19:35:39 2004 From: kyeser at earthlink.net (Hee-Seng Kye) Date: Tue Jul 13 19:35:39 2004 Subject: [Numpy-discussion] a 'for' loop within another 'for' loop? Message-ID: Hi. I wrote a program to calculate sums of every possible combinations of two indices of a list. The main body of the program looks something like this: r = [0,2,5,6,8] l = [] for x in range(0, len(r)): for y in range(0, len(r)): k = r[x]+r[y] l.append(k) print l 1. I've heard that it's not a good idea to have a 'for' loop within another 'for' loop, and I was wondering if there is a more efficient way to do this. 2. Does anyone know if there is a built-in function or module that would do the above task in NumPy or Numarray (or even in Python)? I would really appreciate it if anyone could let me know. Thanks for your help! -------------- next part -------------- A non-text attachment was scrubbed... Name: not available Type: text/enriched Size: 715 bytes Desc: not available URL: From focke at slac.stanford.edu Tue Jul 13 22:02:08 2004 From: focke at slac.stanford.edu (Warren Focke) Date: Tue Jul 13 22:02:08 2004 Subject: [Numpy-discussion] a 'for' loop within another 'for' loop? In-Reply-To: References: Message-ID: l = Numeric.add.outer(r, r).flat oughta do the trick. Should work for numarray, too. On Tue, 13 Jul 2004, Hee-Seng Kye wrote: > Hi. I wrote a program to calculate sums of every possible combinations > of two indices of a list. The main body of the program looks something > like this: > > r = [0,2,5,6,8] > l = [] > > for x in range(0, len(r)): > for y in range(0, len(r)): > k = r[x]+r[y] > l.append(k) > print l > > 1. I've heard that it's not a good idea to have a 'for' loop within > another 'for' loop, and I was wondering if there is a more efficient > way to do this. > > 2. Does anyone know if there is a built-in function or module that > would do the above task in NumPy or Numarray (or even in Python)? 
> > I would really appreciate it if anyone could let me know. > > Thanks for your help! From eric at enthought.com Tue Jul 13 22:09:01 2004 From: eric at enthought.com (eric jones) Date: Tue Jul 13 22:09:01 2004 Subject: [Numpy-discussion] ANN: Reminder -- SciPy 04 is coming up Message-ID: <40F4BF9E.8060103@enthought.com> Hey folks, Just a reminder that SciPy 04 is coming up. More information is here: http://www.scipy.org/wikis/scipy04 About the Conference and Keynote Speaker --------------------------------------------- The 1st annual *SciPy Conference* will be held this year at Caltech, September 2-3, 2004. As some of you may know, we've experienced great participation in two SciPy "Workshops" (with ~70 attendees in both 2002 and 2003) and this year we're graduating to a "conference." With the prestige of a conference comes the responsibility of a keynote address. This year, Jim Hugunin has answered the call and will be speaking to kickoff the meeting on Thursday September 2nd. Jim is the creator of Numeric Python, Jython, and co-designer of AspectJ. Jim is currently working on IronPython--a fast implementation of Python for .NET and Mono. Presenters ----------- We still have room for a few more standard talks, and there is plenty of room for lightning talks. Because of this, we are extending the abstract deadline until July 23rd. Please send your abstract to abstracts at scipy.org. Travis Oliphant is organizing the presentations this year. (Thanks!) Once accepted, papers and/or presentation slides are acceptable and are due by August 20, 2004. Registration ------------- Early registration ($100.00) has been extended to July 23rd. Follow the links off of the main conference site: http://www.scipy.org/wikis/scipy04 After July 23rd, registration will be $150.00. Registration includes breakfast and lunch Thursday & Friday and a very nice dinner Thursday night. Please register as soon as possible as it will help us in planning for food, room sizes, etc. 
Sprints -------- As of now, we really haven't had much of a call for coding sprints for the 3 days prior to SciPy 04. Below is the original announcement about sprints. If you would like to suggest a topic and see if others are interested, please send a message to the list. Otherwise, we'll forgo the sprints session this year. We're also planning three days of informal "Coding Sprints" prior to the conference -- August 30 to September 1, 2004. Conference registration is not required to participate in the sprints. Please email the list, however, if you plan to attend. Topics for these sprints will be determined via the mailing lists as well, so please submit any suggestions for topics to the scipy-user list: list signup: http://www.scipy.org/mailinglists/ list address: scipy-user at scipy.org thanks, eric From kyeser at earthlink.net Tue Jul 13 23:30:13 2004 From: kyeser at earthlink.net (Hee-Seng Kye) Date: Tue Jul 13 23:30:13 2004 Subject: [Numpy-discussion] a 'for' loop within another 'for' loop? In-Reply-To: References: Message-ID: <34CF38C4-D55F-11D8-8504-000393479EE8@earthlink.net> Thank you so much. It works beautifully! On Jul 14, 2004, at 1:01 AM, Warren Focke wrote: > l = Numeric.add.outer(r, r).flat > oughta do the trick. Should work for numarray, too. > > On Tue, 13 Jul 2004, Hee-Seng Kye wrote: > >> Hi. I wrote a program to calculate sums of every possible >> combinations >> of two indices of a list. The main body of the program looks >> something >> like this: >> >> r = [0,2,5,6,8] >> l = [] >> >> for x in range(0, len(r)): >> for y in range(0, len(r)): >> k = r[x]+r[y] >> l.append(k) >> print l >> >> 1. I've heard that it's not a good idea to have a 'for' loop within >> another 'for' loop, and I was wondering if there is a more efficient >> way to do this. >> >> 2. Does anyone know if there is a built-in function or module that >> would do the above task in NumPy or Numarray (or even in Python)? 
>> >> I would really appreciate it if anyone could let me know. >> >> Thanks for your help! > From falted at pytables.org Wed Jul 14 02:37:06 2004 From: falted at pytables.org (Francesc Alted) Date: Wed Jul 14 02:37:06 2004 Subject: [Numpy-discussion] numarray-1.0 Bug Alert In-Reply-To: <1089740511.9509.372.camel@halloween.stsci.edu> References: <1089740511.9509.372.camel@halloween.stsci.edu> Message-ID: <200407141136.09436.falted@pytables.org> A Dimarts 13 Juliol 2004 19:41, Todd Miller va escriure: > The real fix for the bug appears to be to redefine the semantics of > numarray's PyArrayObject ->data pointer to include ->byteoffset, > altering the C-API. Oh well, I'm afraid that I'll be affected by that :(. Just to understand that fully, you mean that real data for an array will start in the future at narr->data, instead of narr->data+narr->byteoffset as it does now?
-- Francesc Alted From jmiller at stsci.edu Wed Jul 14 04:38:09 2004 From: jmiller at stsci.edu (Todd Miller) Date: Wed Jul 14 04:38:09 2004 Subject: [Numpy-discussion] numarray-1.0 Bug Alert In-Reply-To: <200407141136.09436.falted@pytables.org> References: <1089740511.9509.372.camel@halloween.stsci.edu> <200407141136.09436.falted@pytables.org> Message-ID: <1089805021.3741.62.camel@localhost.localdomain> On Wed, 2004-07-14 at 05:36, Francesc Alted wrote: > A Dimarts 13 Juliol 2004 19:41, Todd Miller va escriure: > > The real fix for the bug appears to be to redefine the semantics of > > numarray's PyArrayObject ->data pointer to include ->byteoffset, > > altering the C-API. > > Oh well, I'm afraid that I'll be affected by that :(. Just to understand > that fully, you mean that real data for an array will start in the future at > narr->data, instead of narr->data+narr->byteoffset as it does now? That is the current plan. I was thinking developers could just replace the new narr->data with (narr->data - narr->byteoffset) if needed. I'm assuming the planned changes will cost at most a few edits and package redistribution, which I understand is still a major pain in the neck; let me know if the cost is higher than that for some reason. Regards, Todd From paul at pfdubois.com Wed Jul 14 05:57:07 2004 From: paul at pfdubois.com (Paul F. Dubois) Date: Wed Jul 14 05:57:07 2004 Subject: [Numpy-discussion] a 'for' loop within another 'for' loop? In-Reply-To: References: Message-ID: <40F52D8B.9050601@pfdubois.com> >>> add.reduce(take(r,indices([len(r),len(r)]))).flat array([ 0, 2, 5, 6, 8, 2, 4, 7, 8, 10, 5, 7, 10, 11, 13, 6, 8, 11, 12, 14, 8, 10, 13, 14, 16]) Always like a good challenge in the morning. God, it is like the old rush of writing APL. Hee-Seng Kye wrote: > Hi. I wrote a program to calculate sums of every possible combinations > of two indices of a list. 
The main body of the program looks something > like this: > > r = [0,2,5,6,8] > l = [] > > for x in range(0, len(r)): > for y in range(0, len(r)): > k = r[x]+r[y] > l.append(k) > print l > > 1. I've heard that it's not a good idea to have a 'for' loop within > another 'for' loop, and I was wondering if there is a more efficient way > to do this. > > 2. Does anyone know if there is a built-in function or module that would > do the above task in NumPy or Numarray (or even in Python)? > > I would really appreciate it if anyone could let me know. > > Thanks for your help! From Sebastien.deMentendeHorne at electrabel.com Wed Jul 14 08:41:09 2004 From: Sebastien.deMentendeHorne at electrabel.com (Sebastien.deMentendeHorne at electrabel.com) Date: Wed Jul 14 08:41:09 2004 Subject: [Numpy-discussion] a 'for' loop within another 'for' loop? Message-ID: <035965348644D511A38C00508BF7EAEB145CAF2A@seacex03.eib.electrabel.be> I could not resist to propose an other solution: r = array([0,2,5,6,8]) l = (r[:,NewAxis] + r[NewAxis,:]).flat -----Original Message----- From: Hee-Seng Kye [mailto:kyeser at earthlink.net] Sent: mercredi 14 juillet 2004 4:22 To: numpy-discussion at lists.sourceforge.net Subject: [Numpy-discussion] a 'for' loop within another 'for' loop? Hi. I wrote a program to calculate sums of every possible combinations of two indices of a list. The main body of the program looks something like this: r = [0,2,5,6,8] l = [] for x in range(0, len(r)): for y in range(0, len(r)): k = r[x]+r[y] l.append(k) print l 1. I've heard that it's not a good idea to have a 'for' loop within another 'for' loop, and I was wondering if there is a more efficient way to do this. 2. Does anyone know if there is a built-in function or module that would do the above task in NumPy or Numarray (or even in Python)? I would really appreciate it if anyone could let me know. Thanks for your help! -------------- next part -------------- An HTML attachment was scrubbed... 
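The three vectorized answers in this thread all agree with the original nested loops. A sketch in modern NumPy, where Numeric's NewAxis is spelled np.newaxis and .flat is replaced by ravel() to get a fresh flattened array:

```python
import numpy as np

r = np.array([0, 2, 5, 6, 8])

# Nested-loop reference, as in the original post
loops = [r[x] + r[y] for x in range(len(r)) for y in range(len(r))]

# Warren's outer-product form
outer = np.add.outer(r, r).ravel()

# Sebastien's broadcasting form (NewAxis -> np.newaxis)
bcast = (r[:, np.newaxis] + r[np.newaxis, :]).ravel()

assert (outer == loops).all() and (bcast == loops).all()
```

Both vectorized forms build the full len(r) x len(r) table of pairwise sums in one shot, which is exactly what the double for loop accumulates one element at a time.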
URL: From rowen at u.washington.edu Wed Jul 14 08:48:07 2004 From: rowen at u.washington.edu (Russell E Owen) Date: Wed Jul 14 08:48:07 2004 Subject: [Numpy-discussion] How to median filter a masked array? Message-ID: I want to 3x3 median filter a masked array (2-d array of ints -- an astronomical image), where the masked data and points off the edge are excluded from the local median calculation. Any suggestions for how to do this efficiently? I suspect I have to write it in C, which is an unpleasant prospect. I tried using NaN for points to mask out, but the median filter seems to handle those as "infinity", or something equally inappropriate. In a related vein, has Python come along far enough that it would be reasonable to add support for NaN to numarray -- in the sense that statistics calculations, filters, etc. could be convinced to ignore NaNs? Obviously this support would be contingent on compiling python with IEEE floating point support, but I suspect that's the default on most platforms these days. -- Russell From jdhunter at ace.bsd.uchicago.edu Wed Jul 14 09:51:12 2004 From: jdhunter at ace.bsd.uchicago.edu (John Hunter) Date: Wed Jul 14 09:51:12 2004 Subject: [Numpy-discussion] ANN matplotlib-0.60.2: python graphs and charts Message-ID: matplotlib is a 2D plotting library for python. You can use matplotlib interactively from a python shell or IDE, or embed it in GUI applications (WX, GTK, and Tkinter). matplotlib supports many plot types: line plots, bar charts, log plots, images, pseudocolor plots, legends, date plots, finance charts and more. What's new since matplotlib 0.50 This is the first wide release in 5 months and there has been a tremendous amount of development since then, with new backends, many optimizations, new plotting types, new backends and enhanced text support. See http://matplotlib.sourceforge.net/whats_new.html for details. 
* Todd Miller's tkinter backend (tkagg) with good support for interactive plotting using the standard python shell, ipython or others. matplotlib now runs on windows out of the box with python + numeric/numarray * Full Numeric / numarray integration with Todd Miller's numerix module. Prebuilt installers for numeric and numarray on win32. Others, please set your numerix settings before building matplotlib, as described on http://matplotlib.sourceforge.net/faq.html#NUMARRAY * Mathtext: you can write TeX style math expressions anywhere in your figure. http://matplotlib.sourceforge.net/screenshots.html#mathtext_demo. * Images - figure and axes images with optional interpolated resampling, alpha blending of multiple images, and more with the imshow and figimage commands. Interactive control of colormaps, intensity scaling and colorbars - http://matplotlib.sourceforge.net/screenshots.html#layer_images * Text: freetype2 support, newline separated strings with arbitrary rotations, Paul Barrett's cross platform font manager. http://matplotlib.sourceforge.net/screenshots.html#align_text * Jared Wahlstrand's SVG backend (alpha) * Support for popular financial plot types - http://matplotlib.sourceforge.net/screenshots.html#finance_work2 * Many optimizations and extension code to remove performance bottlenecks. pcolors and scatters are an order of magnitude faster. * GTKAgg, WXAgg, TkAgg backends for http://antigrain.com (agg) rendering in the GUI canvas. Now all the major GUIs (WX, GTK, Tk) can be used with a common (agg) renderer. * Many new examples and demos - see http://matplotlib.sf.net/examples or download the src distribution and look in the examples dir. Documentation and downloads available at http://matplotlib.sourceforge.net. John Hunter From verveer at embl-heidelberg.de Wed Jul 14 10:39:59 2004 From: verveer at embl-heidelberg.de (Peter Verveer) Date: Wed Jul 14 10:39:59 2004 Subject: [Numpy-discussion] How to median filter a masked array?
In-Reply-To: References: Message-ID: <1122AA7E-D5B4-11D8-8510-000A95C92C8E@embl-heidelberg.de> On 14 Jul 2004, at 17:47, Russell E Owen wrote: > I want to 3x3 median filter a masked array (2-d array of ints -- an > astronomical image), where the masked data and points off the edge are > excluded from the local median calculation. Any suggestions for how to > do this efficiently? I don't think that you can do it very efficiently right now with the functions that are available in numarray. > I suspect I have to write it in C, which is an unpleasant prospect. Yes, that is unpleasant, trust me :-) However, in version 1.0 of numarray in the nd_image package, I have added some support for writing filter functions. The generic_filter() function iterates over the array and applies a user-defined filter function at each element. The user-defined function can be written in python or in C, and is called at each element with the values within the filter-footprint as an argument. You would write a function that finds the median of these values, excluding the NaNs (or whatever value that flags the mask.) I would suggest to prototype this function in python and move that to C as soon as it works to your satisfaction. See the numarray manual for more details. Cheers, Peter From rowen at u.washington.edu Wed Jul 14 10:44:39 2004 From: rowen at u.washington.edu (Russell E Owen) Date: Wed Jul 14 10:44:39 2004 Subject: [Numpy-discussion] How to median filter a masked array? In-Reply-To: <40F56462.2030000@pfdubois.com> References: <40F56462.2030000@pfdubois.com> Message-ID: At 9:50 AM -0700 2004-07-14, Paul F. Dubois wrote: >The median filter is prepared to take an argument of a numarray >array but ignorant of and unprepared to deal with masked values. >Using the __array__ trick, both Numeric.MA and numarray.ma would >'know' this and therefore replace the missing values in the filter's >argument with the 'fill value' for that type -- a big number in the >case of real arrays. 
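As a concrete starting point for the prototype Peter suggests, here is a pure-Python/NumPy 3x3 masked median. The name masked_median3 and the separate boolean-mask argument are choices of this sketch, not an existing numarray API, and edge pixels simply see a smaller window:

```python
import numpy as np

def masked_median3(img, mask):
    """3x3 median filter that skips masked pixels and points off the edge.

    img  -- 2-D array; mask -- boolean array, True where data is invalid.
    Prototype only: a production version would be moved to C, as the
    thread discusses.
    """
    out = np.empty(img.shape, dtype=float)
    ny, nx = img.shape
    for i in range(ny):
        for j in range(nx):
            # Clip the 3x3 footprint at the array edges
            i0, i1 = max(i - 1, 0), min(i + 2, ny)
            j0, j1 = max(j - 1, 0), min(j + 2, nx)
            window = img[i0:i1, j0:j1]
            good = window[~mask[i0:i1, j0:j1]]
            # If every neighbour is masked, there is nothing to take
            out[i, j] = np.median(good) if good.size else np.nan
    return out
```

On a 3x3 ramp np.arange(9).reshape(3, 3), the centre output is 4.0; masking the corner value 0 shifts it to 4.5, showing the masked pixel really is excluded rather than filled.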
You could explicitly choose that value (say >using the overall median of the data m) by passing x.filled(m) >rather than x to the filter. > >If there is no such value, you probably do have to do it in C. If >you wrote it in C, how would you treat missing elements? BTW it >wouldn't be that hard; just pass both the array and its mask as >separate elements to a C routine and use SWIG to hook it up. I already have routines that handle masked data in C to create a radial profiles from 2-d integer data (since I could not figure out how to do that in numarray). I chose to pass the mask as a separate array, since I could not find any C interface for numarray.ma and since NaN made no sense for integer data. That code was pretty straightforward. I wish I could have found a simple way to support multiple array types. I thought using C++ with prototypes would be the ticket, but absent any examples and after looking through the numarray code, I gave up and took the easy way out. (I didn't use SWIG, though, I just hand coded everything. Maybe that was a mistake.) I confess that makes me worry about the underpinnings of numarray. It seems an obvious candidate to be written in C++ with prototypes. I hate to think what the developers have to go through, instead. In any case, writing a median filter is a bigger deal than taking a radial profile, and since one already existed I thought I'd ask. >I doubt NaN would help you here; you'd still have to figure out what >to do in those places. Numeric did not have support for NaN because >there were portability problems. Probably still are. And you still >are stuck in a lot of cases anyway. Well, NaN isn't very general in any case, since it's meaningless for integer data. So maybe that's a red herring. (Though if NaN had worked to mask data I would cheerfully have converted my images to floats to take advantage of it!). What's really wanted is a more unified approach to masked data. 
I suppose it's pie in the sky, but I sure wish most the numarray functions took an optional mask array (or accepted a numarray.ma object -- nice for the user, but probably too painful for words under the hood). I don't think there are major issues with what to do with masked data. Simply ignoring it works in most cases, e.g. mean, std dev, sum, max... In some cases one needs the new mask as output (e.g. matrix multiply). Filtering is a bit subtle: can masked data be treated the same as data off the edge? I hope so, but I'm not sure. Anyway, I am grateful for what we do have. Without Numeric or numarray I would have to write all my image processing code in a different language. -- Russell From gazzar at email.com Wed Jul 14 21:00:03 2004 From: gazzar at email.com (Gary Ruben) Date: Wed Jul 14 21:00:03 2004 Subject: [Numpy-discussion] sum() and mean() broken? Message-ID: <20040715035046.C8BFE1535C5@ws3-1.us4.outblaze.com> I'm getting tracebacks on even the most basic sum() and mean() calls in numarray 1.0 under Windows. Apologies if this has already been reported. Gary >>> from numarray import * >>> arange(10).sum() Traceback (most recent call last): File "", line 1, in -toplevel- arange(10).sum() File "C:\APPS\PYTHON23\Lib\site-packages\numarray\numarraycore.py", line 1106, in sum return ufunc.add.reduce(ufunc.add.areduce(self, type=type).flat, type=type) error: Int32asInt64: buffer not aligned on 8 byte boundary. -- _______________________________________________ Talk More, Pay Less with Net2Phone Direct(R), up to 1500 minutes free! http://www.net2phone.com/cgi-bin/link.cgi?143 From jmiller at stsci.edu Thu Jul 15 06:18:04 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 15 06:18:04 2004 Subject: [Numpy-discussion] sum() and mean() broken? 
In-Reply-To: <20040715035046.C8BFE1535C5@ws3-1.us4.outblaze.com> References: <20040715035046.C8BFE1535C5@ws3-1.us4.outblaze.com> Message-ID: <1089897432.2637.34.camel@halloween.stsci.edu> numarray-1.0 is known to have problems with Windows-98, etc. (My guess is any Pre-NT windows). I haven't seen any problems with Windows XP or Windows 2000 Pro. Which windows variant are you running? Does the numarray selftest pass? It should look something like: >>> import numarray.testall as testall >>> testall.test() numarray: ((0, 1178), (0, 1178)) numarray.records: (0, 48) numarray.strings: (0, 176) numarray.memmap: (0, 82) numarray.objects: (0, 105) numarray.memorytest: (0, 16) numarray.examples.convolve: ((0, 20), (0, 20), (0, 20), (0, 20)) numarray.convolve: (0, 52) numarray.fft: (0, 75) numarray.linear_algebra: ((0, 46), (0, 51)) numarray.image: (0, 27) numarray.nd_image: (0, 390) numarray.random_array: (0, 53) numarray.ma: (0, 671) On Wed, 2004-07-14 at 23:50, Gary Ruben wrote: > I'm getting tracebacks on even the most basic sum() and mean() calls in numarray 1.0 under Windows. Apologies if this has already been reported. > Gary > > >>> from numarray import * > >>> arange(10).sum() > > Traceback (most recent call last): > File "", line 1, in -toplevel- > arange(10).sum() > File "C:\APPS\PYTHON23\Lib\site-packages\numarray\numarraycore.py", line 1106, in sum > return ufunc.add.reduce(ufunc.add.areduce(self, type=type).flat, type=type) > error: Int32asInt64: buffer not aligned on 8 byte boundary. -- From mathieu.gontier at fft.be Thu Jul 15 06:29:04 2004 From: mathieu.gontier at fft.be (Mathieu Gontier) Date: Thu Jul 15 06:29:04 2004 Subject: [Numpy-discussion] static void** libnumarray_API Message-ID: <200407151528.16261.mathieu.gontier@fft.be> Hello, I am developping FEM bendings from a C++ code to Python with Numarray. So, I have the following problem.
In the distribution file 'libnumarray.h', the variable 'libnumarray_API' is defined as a static variable (because of the symbol NO_IMPORT is not defined). Then, I understand that all the examples are implemented in a unique file. But, in my project, I must edit header files and source files in order to solve other problems (like cycle includes). So, I have two different source files which use numarray : - the file containing the 'init' function which call the function 'import_libnumarray()' (which initialize 'libnumarray_API') - a file containing implementations, more precisely an implementation calling numarray functionnalities: with is 'static' state, this 'libnumarray_API' is NULL... I tried to compile NumArray with the symbol 'NO_IMPORT' (see libnumarray.h) in order to have an extern variable. But this symbol doesn't allow to import numarray in the python environment. So, does someone have a solution allowing to use NumArray API with header/source files ? Thanks, Mathieu Gontier From curzio.basso at unibas.ch Thu Jul 15 07:22:01 2004 From: curzio.basso at unibas.ch (Curzio Basso) Date: Thu Jul 15 07:22:01 2004 Subject: [Numpy-discussion] NA.dot transposing in place Message-ID: <40F692CC.3000103@unibas.ch> Hi all. 
I wonder if anyone noticed the following behaviour (new in 1.0) of the dot/matrixmultiply functions: >>> alpha = NA.arange(10, shape = (10,1)) >>> beta = NA.arange(10, shape = (10,1)) >>> NA.dot(alpha, alpha) array([[285]]) >>> alpha.shape # here it looks like it's doing the transpose in place (1, 10) >>> NA.dot(beta, alpha) array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18], [ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27], [ 0, 4, 8, 12, 16, 20, 24, 28, 32, 36], [ 0, 5, 10, 15, 20, 25, 30, 35, 40, 45], [ 0, 6, 12, 18, 24, 30, 36, 42, 48, 54], [ 0, 7, 14, 21, 28, 35, 42, 49, 56, 63], [ 0, 8, 16, 24, 32, 40, 48, 56, 64, 72], [ 0, 9, 18, 27, 36, 45, 54, 63, 72, 81]]) >>> alpha.shape, beta.shape # but not the second time ((1, 10), (10, 1)) ------------------------------------------------- Can someone explain me what's going on? thanks, curzio From jmiller at stsci.edu Thu Jul 15 07:36:11 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 15 07:36:11 2004 Subject: [Numpy-discussion] static void** libnumarray_API In-Reply-To: <200407151528.16261.mathieu.gontier@fft.be> References: <200407151528.16261.mathieu.gontier@fft.be> Message-ID: <1089902141.2637.61.camel@halloween.stsci.edu> On Thu, 2004-07-15 at 09:28, Mathieu Gontier wrote: > Hello, > > I am developping FEM bendings from a C++ code to Python with Numarray. > So, I have the following problem. > > In the distribution file 'libnumarray.h', the variable 'libnumarray_API' is > defined as a static variable (because of the symbol NO_IMPORT is not > defined). > > Then, I understand that all the examples are implemented in a unique file. > > But, in my project, I must edit header files and source files in order to > solve other problems (like cycle includes). 
So, I have two different source > files which use numarray : > - the file containing the 'init' function which call the function > 'import_libnumarray()' (which initialize 'libnumarray_API') > - a file containing implementations, more precisely an implementation calling > numarray functionnalities: with is 'static' state, this 'libnumarray_API' is > NULL... > > I tried to compile NumArray with the symbol 'NO_IMPORT' (see libnumarray.h) in > order to have an extern variable. But this symbol doesn't allow to import > numarray in the python environment. > > So, does someone have a solution allowing to use NumArray API with > header/source files ? The good news is that the 1.0 headers, at least, work. I intended to capture this form of multi-compilation-unit module in the numpy_compat example... but didn't. I think there's two "tricks" missing in the example. In *a* module of the several modules you're linking together, do the following: #define NO_IMPORT 1 /* This prevents the definition of the static version of the API var. The extern won't conflict with the real definition below. */ #include "libnumarray.h" void **libnumarray_API; /* This defines the missing API var for *all* your compilation units */ This variable will be assigned the API pointer by the import_libnumarray() call. I fixed the numpy_compat example to demonstrate this in CVS but they have a Numeric flavor. The same principles apply to libnumarray. Note that for numarray-1.0 you must include/import both the Numeric compatible and native numarray APIs separately if you use both. Regards, Todd From gazzar at email.com Thu Jul 15 07:37:01 2004 From: gazzar at email.com (Gary Ruben) Date: Thu Jul 15 07:37:01 2004 Subject: [Numpy-discussion] sum() and mean() broken? Message-ID: <20040715143500.2CD321CE306@ws3-6.us4.outblaze.com> Thanks Todd, It's under Win98 as you suspected and the selftest definitely doesn't pass. Are you planning on supporting Win98? If so, I'll revert to numarray 0.9. 
Otherwise, I'll just use Numeric for this task and restrict playing with numarray 1.0 to my Win2k laptop. thanks, Gary -- _______________________________________________ Talk More, Pay Less with Net2Phone Direct(R), up to 1500 minutes free! http://www.net2phone.com/cgi-bin/link.cgi?143 From jmiller at stsci.edu Thu Jul 15 07:38:00 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 15 07:38:00 2004 Subject: [Numpy-discussion] NA.dot transposing in place In-Reply-To: <40F692CC.3000103@unibas.ch> References: <40F692CC.3000103@unibas.ch> Message-ID: <1089902251.2637.64.camel@halloween.stsci.edu> On Thu, 2004-07-15 at 10:21, Curzio Basso wrote: > Hi all. > > I wonder if anyone noticed the following behaviour (new in 1.0) of the > dot/matrixmultiply functions: > > >>> alpha = NA.arange(10, shape = (10,1)) > > >>> beta = NA.arange(10, shape = (10,1)) > > >>> NA.dot(alpha, alpha) > array([[285]]) > > >>> alpha.shape # here it looks like it's doing the transpose in place > (1, 10) > > >>> NA.dot(beta, alpha) > array([[ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0], > [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], > [ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18], > [ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27], > [ 0, 4, 8, 12, 16, 20, 24, 28, 32, 36], > [ 0, 5, 10, 15, 20, 25, 30, 35, 40, 45], > [ 0, 6, 12, 18, 24, 30, 36, 42, 48, 54], > [ 0, 7, 14, 21, 28, 35, 42, 49, 56, 63], > [ 0, 8, 16, 24, 32, 40, 48, 56, 64, 72], > [ 0, 9, 18, 27, 36, 45, 54, 63, 72, 81]]) > > >>> alpha.shape, beta.shape # but not the second time > ((1, 10), (10, 1)) > > ------------------------------------------------- > > Can someone explain me what's going on? It's a bug introduced in numarray-1.0. It'll be fixed for 1.1 in a couple weeks. Regards, Todd From jmiller at stsci.edu Thu Jul 15 07:49:14 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 15 07:49:14 2004 Subject: [Numpy-discussion] sum() and mean() broken? 
In-Reply-To: <20040715143500.2CD321CE306@ws3-6.us4.outblaze.com> References: <20040715143500.2CD321CE306@ws3-6.us4.outblaze.com> Message-ID: <1089902892.2637.75.camel@halloween.stsci.edu> On Thu, 2004-07-15 at 10:35, Gary Ruben wrote: > Thanks Todd, > It's under Win98 as you suspected and the selftest definitely doesn't pass. > Are you planning on supporting Win98? I'm planning to debug this particular problem because I'm concerned that it's just latent in the newer windows variants. To the degree that Win98 is "free" under the umbrella of win32, it will continue to be supported. An ongoing issue will likely be that Win98 testing doesn't get done on a regular basis... just as problems are reported. Regards, Todd From curzio.basso at unibas.ch Thu Jul 15 07:51:01 2004 From: curzio.basso at unibas.ch (Curzio Basso) Date: Thu Jul 15 07:51:01 2004 Subject: [Numpy-discussion] NA.dot transposing in place In-Reply-To: <1089902251.2637.64.camel@halloween.stsci.edu> References: <40F692CC.3000103@unibas.ch> <1089902251.2637.64.camel@halloween.stsci.edu> Message-ID: <40F6999C.2050101@unibas.ch> Todd Miller wrote: > It's a bug introduced in numarray-1.0. It'll be fixed for 1.1 in a > couple weeks. Ah, ok. Is it related with the bug announced a couple of days ago? From jmiller at stsci.edu Thu Jul 15 08:14:10 2004 From: jmiller at stsci.edu (Todd Miller) Date: Thu Jul 15 08:14:10 2004 Subject: [Numpy-discussion] NA.dot transposing in place In-Reply-To: <40F6999C.2050101@unibas.ch> References: <40F692CC.3000103@unibas.ch> <1089902251.2637.64.camel@halloween.stsci.edu> <40F6999C.2050101@unibas.ch> Message-ID: <1089904417.2637.147.camel@halloween.stsci.edu> On Thu, 2004-07-15 at 10:50, Curzio Basso wrote: > Todd Miller wrote: > > > It's a bug introduced in numarray-1.0. It'll be fixed for 1.1 in a > > couple weeks. > > Ah, ok. Is it related with the bug announced a couple of days ago? Only peripherally. 
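For the record, the intended semantics that the 1.1 fix restored are that dot() computes the product without touching its operands' shapes. A sketch in modern NumPy, where the (1,10) by (10,1) product is written with an explicit transpose rather than numarray-1.0's buggy in-place one:

```python
import numpy as np

alpha = np.arange(10).reshape(10, 1)
beta = np.arange(10).reshape(10, 1)

inner = np.dot(alpha.T, beta)   # (1,10) . (10,1) -> (1,1)
outer = np.dot(beta, alpha.T)   # (10,1) . (1,10) -> (10,10)

assert inner[0, 0] == 285       # 0^2 + 1^2 + ... + 9^2
assert alpha.shape == (10, 1)   # operands are left untouched,
assert beta.shape == (10, 1)    # unlike the behaviour Curzio reports
```

In Curzio's transcript, alpha.shape silently became (1, 10) after the first dot() call; with the fixed semantics, a second NA.dot(beta, alpha) would be a shape mismatch rather than the 10x10 outer product he got by accident.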
The Numeric compatibility layer problem was discovered as a result of porting a bunch of Numeric functions to numarray... ports done to try to get better small array speed. Similarly, the setup for matrixmultiply was moved into C for numarray-1.0... to try to get better small array speed. numarray-1.0 is disappointingly buggy, but the interest generated by the 1.0 moniker is making the open source model work well so I think 1.1 will be much more solid as a result of strong user feedback. So, thanks for the report. Regards, Todd From cjw at sympatico.ca Thu Jul 15 08:22:07 2004 From: cjw at sympatico.ca (Colin J. Williams) Date: Thu Jul 15 08:22:07 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: <200407131106.19557.falted@pytables.org> References: <200407131028.04791.falted@pytables.org> <200407131106.19557.falted@pytables.org> Message-ID: <40F6A106.6020606@sympatico.ca> Francesc Alted wrote: >A Dimarts 13 Juliol 2004 10:28, Francesc Alted va escriure: > > >>A Dilluns 12 Juliol 2004 23:14, Perry Greenfield va escriure: >> >> >>>What I'm wondering about is what a single element of a record array >>>should be. Returning a tuple has an undeniable simplicity to it. >>> >>> >>Yeah, this why I'm strongly biased toward this possibility. >> >> >> >>>On the other hand, we've been using recarrays that allow naming the >>>various columns (which we refer to as "fields"). If one can refer >>>to fields of a recarray, shouldn't one be able to refer to a field >>>(by name) of one of it's elements? Or are you proposing that basic >>>recarrays not have that sort of capability (something added by a >>>subclass)? >>> >>> >>Well, I'm not sure about that. But just in case most of people would like to >>access records by field as well as by index, I would advocate for the >>possibility that the Record instances would behave as similar as possible as >>a tuple (or dictionary?). 
>>That includes creating appropriate __str__() *and*
>>__repr__() methods as well as a __getitem__() that supports both field names
>>and indices. I'm not sure about whether providing a __getattr__() method
>>would be ok, but for the sake of simplicity and in order to have (preferably)
>>only one way to do things, I would say no.
>
>I've been thinking that one way to reconcile returning a tuple for a
>single element of a RecArray with still being able to retrieve a field by
>name is to play with RecArray.__getitem__ and let it support key names
>in addition to indices. This would be better seen as an example:
>
>Right now, one can say:
>
>>>>r=records.array([(1,"asds", 24.),(2,"pwdw", 48.)], "1i4,1a4,1f8")
>>>>r._fields["c1"]
>array([1, 2])
>>>>r._fields["c1"][1]
>2
>
>What I propose is to be able to say:
>
>>>>r["c1"]
>array([1, 2])
>>>>r["c1"][1]

I would suggest going a step beyond this, so that one can have r.c1[1], see the script below. I have not explored the assignment of a value to r.c1[1], but it seems to be achievable. If changes along this line are acceptable, it is suggested that fields be renamed cols, or some such, to indicate the wider impact.

Colin W.

>2
>
>Which would replace the notation:
>
>>>>r[1]["c1"]
>2
>
>which was recently suggested.
>
>I.e. the suggestion is to realize RecArrays as a collection of columns,
>as well as a collection of rows.

# tRecord.py to explore RecArray
import numarray.records as _rec
import sys
#
class Rec1(_rec.RecArray):

    def __new__(cls, buffer, formats, shape=0, names=None, byteoffset=0,
                bytestride=None, byteorder=sys.byteorder, aligned=0):
        # This calls RecArray.__init__ - reason unclear.
        # Why can't the instance be fully created by RecArray.__init__?
        return _rec.RecArray.__new__(cls, buffer, formats=formats,
                                     shape=shape, names=names,
                                     byteorder=byteorder, aligned=aligned)

    def __init__(self, buffer, formats, shape=0, names=None, byteoffset=0,
                 bytestride=None, byteorder=sys.byteorder, aligned=0):
        arr = _rec.array(buffer, formats=formats, shape=shape, names=names,
                         byteorder=byteorder, aligned=aligned)
        self.__setstate__(arr.__getstate__())

    def __getattr__(self, name):
        # We reach here if the attribute does not belong to the basic Rec1 set
        return self._fields[name]

    def __getattribute__(self, name):
        return _rec.RecArray.__getattribute__(self, name)

    def __repr__(self):
        return self.__class__.__name__ + _rec.RecArray.__repr__(self)[8:]

    def __setattr__(self, name, value):
        return _rec.RecArray.__setattr__(self, name, value)

    def __str__(self):
        return self.__class__.__name__ + _rec.RecArray.__str__(self)[8:]

if __name__ == '__main__':
    # Francesc Alted 13-Jul-04 05:06
    r = _rec.array([(1, "asds", 24.), (2, "pwdw", 48.)], "1i4,1a4,1f8")
    print r._fields["c1"]
    print r._fields["c1"][1]
    r1 = Rec1([(1, "asds", 24.), (2, "pwdw", 48.)], "1i4,1a4,1f8")
    print r1._fields["c1"]
    print r1._fields["c1"][1]
    # r1.zz= 99 # acceptable
    print r1.c1
    print r1.c1[1]
    try:
        x = r1.ugh
    except:
        print 'ugh not recognized as an attribute'

'''
The above delivers:
[1 2]
2
[1 2]
2
[1 2]
2
ugh not recognized as an attribute
'''

From falted at pytables.org Thu Jul 15 09:12:08 2004
From: falted at pytables.org (Francesc Alted)
Date: Thu Jul 15 09:12:08 2004
Subject: [Numpy-discussion] RecArray.tolist() suggestion
In-Reply-To: <40F6A106.6020606@sympatico.ca>
References: <200407131106.19557.falted@pytables.org> <40F6A106.6020606@sympatico.ca>
Message-ID: <200407151811.20359.falted@pytables.org>

On Thursday 15 July 2004 17:21, Colin J. Williams wrote:
> >What I propose is to be able to say:
> >>>>r["c1"][1]
> I would suggest going a step beyond this, so that one can have r.c1[1],
> see the script below.

Yeah.
I've implemented something similar to access column elements for pytables Table objects. However, the problem in this case is that there are already attributes that "pollute" the column namespace, so that a column named "size" collides with the size() method.

I came up with a solution by adding a new "cols" attribute to the Table object that is an instance of a simple class named Cols with no attributes that can pollute the namespace (except some starting with "__" or "_v_"). Then, it is just a matter of providing functionality to access the different columns. In that case, when a column is referenced, another object (an instance of the Column class) is returned. This Column object is basically an accessor to column values with __getitem__() and __setitem__() methods. That might sound complicated, but it is not. I'm attaching part of the relevant code below.

I personally like that solution in the context of pytables because it extends the "natural naming" convention quite naturally. A similar approach could be applied to RecArray objects as well, although numarray might (and probably does) have other usage conventions.

> I have not explored the assignment of a value to r.c1[1], but it seems
> to be achievable.

In the scheme I've just proposed, the following should be feasible:

value = r.cols.c1[1]
r.cols.c1[1] = value

--
Francesc Alted

-----------------------------------------------------------------
class Cols(object):
    """This is a container for columns in a table

    It provides methods to get Column objects that give access to the
    data in the column.

    Like with Group instances and AttributeSet instances, the natural
    naming is used, i.e. you can access the columns on a table as if
    they were normal Cols attributes.

    Instance variables:

        _v_table -- The parent table instance
        _v_colnames -- List with all column names

    Methods:

        __getitem__(colname)
    """

    def __init__(self, table):
        """Create the container to keep the column information.

        table -- The parent table
        """
        self.__dict__["_v_table"] = table
        self.__dict__["_v_colnames"] = table.colnames
        # Put the column in the local dictionary
        for name in table.colnames:
            self.__dict__[name] = Column(table, name)

    def __len__(self):
        return self._v_table.nrows

    def __getitem__(self, name):
        """Get the column named "name" as an item."""
        if not isinstance(name, types.StringType):
            raise TypeError, \
"Only strings are allowed as keys of a Cols instance. You passed object: %s" % name
        # If the column does not exist, raise an AttributeError
        if not name in self._v_colnames:
            raise AttributeError, \
"Column name '%s' does not exist in table:\n'%s'" % (name, str(self._v_table))
        return self.__dict__[name]

    def __str__(self):
        """The string representation for this object."""
        # The pathname
        pathname = self._v_table._v_pathname
        # Get this class name
        classname = self.__class__.__name__
        # The number of columns
        ncols = len(self._v_colnames)
        return "%s.cols (%s), %s columns" % (pathname, classname, ncols)

    def __repr__(self):
        """A detailed string representation for this object."""
        out = str(self) + "\n"
        for name in self._v_colnames:
            # Get this class name
            classname = getattr(self, name).__class__.__name__
            # The shape for this column
            shape = self._v_table.colshapes[name]
            # The type
            tcol = self._v_table.coltypes[name]
            if shape == 1:
                shape = (1,)
            out += "  %s (%s%s, %s)" % (name, classname, shape, tcol) + "\n"
        return out


class Column(object):
    """This is an accessor for the actual data in a table column

    Instance variables:

        table -- The parent table instance
        name -- The name of the associated column

    Methods:

        __getitem__(key)
    """

    def __init__(self, table, name):
        """Create the container to keep the column information.

        table -- The parent table instance
        name -- The name of the column that is associated with this object
        """
        self.table = table
        self.name = name
        # Check whether an index exists or not
        iname = "_i_" + table.name + "_" + name
        self.index = None
        if iname in table._v_parent._v_indices:
            self.index = Index(where=self, name=iname,
                               expectedrows=table._v_expectedrows)
        else:
            self.index = None

    def __getitem__(self, key):
        """Returns a column element or slice

        It takes different actions depending on the type of the "key"
        parameter:

        If "key" is an integer, the corresponding element in the column
        is returned as a NumArray/CharArray, or a scalar object,
        depending on its shape. If "key" is a slice, the row slice
        determined by this slice is returned as a NumArray or CharArray
        object (whatever is appropriate).
        """
        if isinstance(key, types.IntType):
            if key < 0:
                # To support negative values
                key += self.table.nrows
            (start, stop, step) = processRange(self.table.nrows, key, key+1, 1)
            return self.table._read(start, stop, step, self.name, None)[0]
        elif isinstance(key, types.SliceType):
            (start, stop, step) = processRange(self.table.nrows, key.start,
                                               key.stop, key.step)
            return self.table._read(start, stop, step, self.name, None)
        else:
            raise TypeError, "'%s' key type is not valid in this context" % \
                  (key)

    def __str__(self):
        """The string representation for this object."""
        # The pathname
        pathname = self.table._v_pathname
        # Get this class name
        classname = self.__class__.__name__
        # The shape for this column
        shape = self.table.colshapes[self.name]
        if shape == 1:
            shape = (1,)
        # The type
        tcol = self.table.coltypes[self.name]
        return "%s.cols.%s (%s%s, %s)" % (pathname, self.name, classname,
                                          shape, tcol)

    def __repr__(self):
        """A detailed string representation for this object."""
        return str(self)

From perry at stsci.edu Thu Jul 15 10:39:06 2004
From: perry at stsci.edu (Perry Greenfield)
Date: Thu Jul 15 10:39:06 2004
Subject: [Numpy-discussion] RecArray.tolist() suggestion
In-Reply-To:
<200407151811.20359.falted@pytables.org> Message-ID:

Francesc Alted wrote:
> On Thursday 15 July 2004 17:21, Colin J. Williams wrote:
> > >What I propose is to be able to say:
> > >>>>r["c1"][1]
> > I would suggest going a step beyond this, so that one can have r.c1[1],
> > see the script below.
>
> Yeah. I've implemented something similar to access column elements for
> pytables Table objects. However, the problem in this case is that there are
> already attributes that "pollute" the column namespace, so that a column
> named "size" collides with the size() method.
>
The idea of mapping field names to attributes occurs to everyone quickly, but for the reasons Francesc gives (as well as another I'll mention) we were reluctant to implement it. The other reason is that it would be nice to allow field names that are not legal attributes (e.g., that include spaces or other illegal attribute characters). There are potentially people with data in databases or other similar formats that would like to map field names exactly. While one can certainly still use the attribute approach and not support all field names (or column, or col...), it does introduce another glitch in the user interface when it works only for a subset of legal names.

> I came up with a solution by adding a new "cols" attribute to the Table
> object that is an instance of a simple class named Cols with no attributes
> that can pollute the namespace (except some starting by "__" or "_v_").
> Then, it is just a matter of provide functionality to access the different
> columns. In that case, when a reference of a column is made, another object
> (instance of Column class) is returned. This Column object is basically an
> accessor to column values with a __getitem__() and __setitem__() methods.
> That might sound complicated, but it is not. I'm attaching part of the
> relevant code below.
> > I personally like that solution in the context of pytables because it
> > extends the "natural naming" convention quite naturally. A similar approach
> > could be applied to RecArray objects as well, although numarray might (and
> > probably do) have other usage conventions.
> >
> > > I have not explored the assignment of a value to r.c1.[1], but it seems
> > > to be achievable.
> >
> > in the schema I've just proposed the next should be feasible:
> >
> > value = r.cols.c1[1]
> > r.cols.c1[1] = value
>
This solution avoids name collisions but doesn't handle the other problem. This is worth considering, but I thought I'd hear comments about the other issue before deciding on it (there is also the "more than one way" issue as well; but this guideline seems to bend quite often to pragmatic concerns).

We're still chewing on all the other issues and plan to start floating some proposals, rationales and questions before long.

Perry

From falted at pytables.org Thu Jul 15 11:21:10 2004
From: falted at pytables.org (Francesc Alted)
Date: Thu Jul 15 11:21:10 2004
Subject: [Numpy-discussion] RecArray.tolist() suggestion
In-Reply-To: References: Message-ID: <200407152020.00873.falted@pytables.org>

On Thursday 15 July 2004 19:37, Perry Greenfield wrote:
> formats that would like to map field name exactly. Well certainly
> one can still use the attribute approach and not support all field
> names (or column, or col...) it does introduce another glitch in
> the user interface when it works only for a subset of legal names.

Yep. I forgot that issue. My particular workaround on that was to provide an optional trMap dictionary during Table (in our case, RecArray) creation time to map those original names that are not valid python names to valid ones.
That would read something like:

>>> r=records.array([(1,"as")], "1i4,1a2", names=["c 1", "c2"], trMap={"c1": "c 1"})

That would indicate that the "c 1" column, which is not a valid python name (it has a space in the middle), can be accessed using the string "c1", which is a valid python id. That way, r.cols.c1 would access column "c 1". And although I must admit that this solution is not very elegant, it allows one to cope with those situations where the column names are not valid python names.

--
Francesc Alted

From cjw at sympatico.ca Thu Jul 15 17:22:42 2004
From: cjw at sympatico.ca (Colin J. Williams)
Date: Thu Jul 15 17:22:42 2004
Subject: [Numpy-discussion] RecArray.tolist() suggestion
In-Reply-To: References: Message-ID: <40F71F9C.9040008@sympatico.ca>

Perry Greenfield wrote:
>Francesc Alted wrote:
>>On Thursday 15 July 2004 17:21, Colin J. Williams wrote:
>>>>What I propose is to be able to say:
>>>>>>>r["c1"][1]
>>>I would suggest going a step beyond this, so that one can have r.c1[1],
>>>see the script below.
>>
>>Yeah. I've implemented something similar to access column elements for
>>pytables Table objects. However, the problem in this case is that
>>there are
>>already attributes that "pollute" the column namespace, so that a column
>>named "size" collides with the size() method.
>
>The idea of mapping field names to attributes occurs to everyone
>quickly, but for the reasons Francesc gives (as well as another I'll
>mention) we were reluctant to implement it. The other reason is that
>it would be nice to allow field names that are not legal attributes
>(e.g., that include spaces or other illegal attribute characters).
>There are potentially people with data in databases or other similar
>formats that would like to map field name exactly. Well certainly
>one can still use the attribute approach and not support all field
>names (or column, or col...)
>it does introduce another glitch in
>the user interface when it works only for a subset of legal names.

It would, I suggest, not be unduly restrictive to bar the existing attribute names but, if that's not acceptable, Francesc has suggested the .cols workaround, although I would prefer to avoid the added clutter. Incidentally, there is no current protection against wiping out an existing method:

[Dbg]>>> r1.size= 0
[Dbg]>>> r1.size
0
[Dbg]>>>

>>I came up with a solution by adding a new "cols" attribute to the Table
>>object that is an instance of a simple class named Cols with no attributes
>>that can pollute the namespace (except some starting by "__" or "_v_").
>>Then, it is just a matter of provide functionality to access the different
>>columns. In that case, when a reference of a column is made,
>>another object
>>(instance of Column class) is returned. This Column object is basically an
>>accessor to column values with a __getitem__() and __setitem__() methods.
>>That might sound complicated, but it is not. I'm attaching part of the
>>relevant code below.
>>
>>I personally like that solution in the context of pytables because it
>>extends the "natural naming" convention quite naturally. A
>>similar approach
>>could be applied to RecArray objects as well, although numarray might (and
>>probably do) have other usage conventions.
>>
>>>I have not explored the assignment of a value to r.c1.[1], but it seems
>>>to be achievable.
>>
>>in the schema I've just proposed the next should be feasible:
>>
>>value = r.cols.c1[1]
>>r.cols.c1[1] = value
>
>This solution avoids name collisions but doesn't handle the other
>problem. This is worth considering, but I thought I'd hear comments
>about the other issue before deciding it (there is also the
>"more than one way" issue as well; but this guideline seems to bend
>quite often to pragmatic concerns).
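Colin's observation above, that nothing stops an assignment like r1.size = 0 from silently replacing the size() method, could be addressed with a __setattr__ guard. A minimal pure-Python sketch of the idea; the class name and the set of protected method names are hypothetical, not numarray's actual RecArray:

```python
class GuardedRecord(object):
    """Hypothetical record-like object that refuses attribute
    assignments which would shadow an existing method."""

    # Assumed method names to protect; a real RecArray has many more.
    _reserved = frozenset(['size', 'tolist', 'field'])

    def __init__(self, **fields):
        # Bypass __setattr__ while installing the field dict itself.
        self.__dict__['_fields'] = dict(fields)

    def __setattr__(self, name, value):
        if name in self._reserved:
            raise AttributeError(
                "%r is a method name; assign via item syntax instead" % name)
        self._fields[name] = value

    def __getattr__(self, name):
        # Called only when normal lookup fails, so methods still win.
        try:
            return self.__dict__['_fields'][name]
        except KeyError:
            raise AttributeError(name)

    def size(self):
        return len(self._fields)
```

With such a guard, r1.size = 0 would raise AttributeError instead of wiping out the method, at the cost of making the reserved-name list part of the interface.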
> To allow for multi-word column names, assignment could replace a space by an underscore and, in retrieval, the reverse could be done - ie. underscore would be banned for a column name. Colin W. > >We're still chewing on all the other issues and plan to start floating >some proposals, rationales and questions before long. > >Perry > > > > From falted at pytables.org Fri Jul 16 02:12:11 2004 From: falted at pytables.org (Francesc Alted) Date: Fri Jul 16 02:12:11 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: <40F71F9C.9040008@sympatico.ca> References: <40F71F9C.9040008@sympatico.ca> Message-ID: <200407161111.41626.falted@pytables.org> A Divendres 16 Juliol 2004 02:21, Colin J. Williams va escriure: > To allow for multi-word column names, assignment could replace a space > by an underscore > and, in retrieval, the reverse could be done - ie. underscore would be > banned for a column name. That's not so easy. What about other chars like '/&%@$()' that cannot be part of python names? Finding a biunivocal map between them and allowed chars would be difficult (if possible at all). Besides, the resulting colnames might become a real mess. Regards, -- Francesc Alted From cjw at sympatico.ca Fri Jul 16 05:41:12 2004 From: cjw at sympatico.ca (Colin J. Williams) Date: Fri Jul 16 05:41:12 2004 Subject: [Numpy-discussion] RecArray.tolist() suggestion In-Reply-To: <200407161111.41626.falted@pytables.org> References: <40F71F9C.9040008@sympatico.ca> <200407161111.41626.falted@pytables.org> Message-ID: <40F7CBC6.2030607@sympatico.ca> Francesc Alted wrote: >A Divendres 16 Juliol 2004 02:21, Colin J. Williams va escriure: > > >>To allow for multi-word column names, assignment could replace a space >>by an underscore >>and, in retrieval, the reverse could be done - ie. underscore would be >>banned for a column name. >> >> > >That's not so easy. What about other chars like '/&%@$()' that cannot be >part of python names? 
>Finding a biunivocal map between them and allowed
>chars would be difficult (if possible at all). Besides, the resulting
>colnames might become a real mess.
>
>Regards,

Yes, if the objective is to include special characters or facilitate multi-lingual column names (and it probably should be), then my suggestion is quite inadequate.

Perhaps there could be a simple name -> column number mapping in place of _names. References to a column, or a field in a record, could then be through this dictionary. Basic access to data in a record would be by position number, rather than name, but the dictionary would facilitate access by name.

Data could be referenced either through the column name: r1.c2[1] or through the record r1[1].c2, with the possibility that the index is multi-dimensional in either case.

Colin W.

From rowen at u.washington.edu Fri Jul 16 10:55:23 2004
From: rowen at u.washington.edu (Russell E Owen)
Date: Fri Jul 16 10:55:23 2004
Subject: [Numpy-discussion] RecArray.tolist() suggestion
In-Reply-To: <200407161111.41626.falted@pytables.org>
References: <40F71F9C.9040008@sympatico.ca> <200407161111.41626.falted@pytables.org>
Message-ID:

>On Friday 16 July 2004 02:21, Colin J. Williams wrote:
>> To allow for multi-word column names, assignment could replace a space
>> by an underscore
>> and, in retrieval, the reverse could be done - ie. underscore would be
>> banned for a column name.
>
>That's not so easy. What about other chars like '/&%@$()' that cannot be
>part of python names? Finding a biunivocal map between them and allowed
>chars would be difficult (if possible at all). Besides, the resulting
>colnames might become a real mess.

Personally, I think the idea of allowing access to fields via attributes is fatally flawed.
The problems raised (non-obvious mapping between field names with special characters and allowable attribute names and also the collision with existing instance variable and method names) clearly show it would be forced and non-pythonic.

The obvious solution seems to be some combination of the dict interface (an ordered dict that keeps its keys in original field order) and the list interface. My personal leaning is:

- Offer most of the dict methods, including __get/setitem__, keys, values and all iterators, but NOT setdefault, popitem or anything else that adds or deletes a field.
- Offer the list version of __get/setitem__ as well, but NONE of list's methods.
- Make the default iterator iterate over values, not keys (field names), i.e. have the item act like a list, not a dict, when used as an iterator.

In other words, the following all work (where item is one element of a numarray.record array):

item[0] = 10 # set value of field 0 to 10
x = item[0:5] # get value of fields 0 through 4
item[:] = list of replacement values
item["afield"] = 10
"%(afield)s" % item

the methods iterkeys, itervalues, iteritems, keys, values, has_key all work;
the method update might work, but it's an error to add new fields

-- Russell

P.S. Folks are welcome to use my ordered dictionary implementation RO.Alg.OrderedDictionary, which is part of the RO package. It is fully standalone (despite its location in my hierarchy) and is used in production code.

From barrett at stsci.edu Fri Jul 16 11:49:01 2004
From: barrett at stsci.edu (Paul Barrett)
Date: Fri Jul 16 11:49:01 2004
Subject: [Numpy-discussion] RecArray.tolist() suggestion
In-Reply-To: References: <40F71F9C.9040008@sympatico.ca> <200407161111.41626.falted@pytables.org>
Message-ID: <40F822E0.5010406@stsci.edu>

Russell E Owen wrote:
>> A Divendres 16 Juliol 2004 02:21, Colin J.
Williams va escriure:
>>> To allow for multi-word column names, assignment could replace a space
>>> by an underscore
>>> and, in retrieval, the reverse could be done - ie. underscore would be
>>> banned for a column name.
>>
>> That's not so easy. What about other chars like '/&%@$()' that cannot be
>> part of python names? Finding a biunivocal map between them and allowed
>> chars would be difficult (if possible at all). Besides, the resulting
>> colnames might become a real mess.
>
> Personally, I think the idea of allowing access to fields via
> attributes is fatally flawed. The problems raised (non-obvious mapping
> between field names with special characters and allowable attribute
> names and also the collision with existing instance variable and
> method names) clearly show it would be forced and non-pythonic.

+1

It also makes it difficult to do the following:

a = item[:10, ('age', 'surname', 'firstname')]

where field (or column) 1 is 'firstname', field 2 is 'surname', and field 10 is 'age'.

-- Paul

--
Paul Barrett, PhD          Space Telescope Science Institute
Phone: 410-338-4475        ESS/Science Software Branch
FAX: 410-338-4767          Baltimore, MD 21218

From jmiller at stsci.edu Fri Jul 16 12:43:02 2004
From: jmiller at stsci.edu (Todd Miller)
Date: Fri Jul 16 12:43:02 2004
Subject: [Numpy-discussion] I move your "Bugs" reports...
Message-ID: <1090006936.7264.66.camel@halloween.stsci.edu>

Not infrequently even very experienced numarray contributors file bug reports in the numpy "Bugs" tracker. Because numpy is a shared SF project with both Numeric and numarray, numarray bugs are actually tracked in the "Numarray Bugs" tracker, here:

http://sourceforge.net/tracker/?atid=450446&group_id=1369&func=browse

"Numarray Bugs" can also be found through the "Tracker" link at the top of any numpy SF web page. So, don't worry, your painstaking reports are not getting deleted, they're getting relocated to a place where *only* numarray bugs live.
There's probably a better way to do this, but until I find it or someone tells me about it, I thought I should tell everyone what's going on. Thanks to everybody who takes the time to fill out bug reports to make numarray better...

Regards,
Todd

From hsu at stsci.edu Fri Jul 16 13:19:00 2004
From: hsu at stsci.edu (Jin-chung Hsu)
Date: Fri Jul 16 13:19:00 2004
Subject: [Numpy-discussion] multidimensional record arrays
Message-ID: <200407162018.ANW09710@donner.stsci.edu>

There have been a number of questions and suggestions about how the record array facility in numarray could be improved. We've been talking about these internally and thought it would be useful to air some proposals, along with discussions of the rationale behind each proposal as well as discussions of drawbacks and some remaining open questions. Rather than do this in one long message, we will do this in pieces. The first addresses how to improve the handling of multidimensional record arrays. These will not discuss how or when we implement the proposed enhancements or changes. We first want to come to some consensus (or, lacking that, a decision) about what the target should be.

*********************************************************

Proposal for records module enhancement, to handle record arrays of dimension (rank) higher than 1.

Background:

The current records module in numarray doesn't handle record arrays of dimension higher than one well. Even though most of the infrastructure for higher dimensionality is already in place, the current implementation of record arrays was based on the implicit assumption that record arrays are 1-D. This limitation is reflected in the areas of the input user interface, indexing, and output. Indexing and output are more straightforward to modify, so I'll discuss them first. Although it is possible to create a multi-dimensional record array, indexing does not work properly for 2 or more dimensions.
For example, for a 2-D record array r, r[i,j] does not give the correct result (but r[i][j] does). This will be fixed. At present, a user cannot print record arrays higher than 1-D. This will also be fixed, as well as incorporating some numarray features (e.g., printing only the beginning and end of an array for large arrays, as is done for numarrays now).

Input Interface:

There are currently several different ways to construct a record array using the array() function. These include setting the buffer argument to:

(1) None
(2) File object
(3) String object or appropriate buffer object (i.e., binary data)
(4) a list of records (in the form of sequences), for example: [(1,'abc', 2.3), (2,'xyz', 2.4)]
(5) a list of numarrays/chararrays for each field (e.g., effectively 'zipping' the arrays into records)

The first three types of input are very general and can be used to generate multi-dimensional record arrays in the current implementation. All these options need to specify the "shape" argument. The input options that do not work for multi-dimensional record arrays now are the last two.

Option 4 (sequence of 'records')

If a user has a multi-dimensional record array and one or more fields are also multidimensional arrays, using this option is potentially confusing: there can be ambiguity regarding what part of a nested sequence structure is the structure of the record array and what should be considered part of the record, since record elements themselves may be arrays.
(Some of the same issues arise for object arrays.) As an example:

--> r=rec.array([([1,2],[3,4]),([11,12],[13,14])])

could be interpreted as a 1-D record array, where each cell is a (num)array:

RecArray[
(array([1, 2]), array([3, 4])),
(array([11, 12]), array([13, 14]))
]

or a 2-D record array, where each cell is just a number:

RecArray(
[[(1, 2), (3, 4)],
[(11, 12), (13, 14)]])

Thus we propose a new argument "rank" (following the convention used in object arrays) to specify the dimensionality of the output record array. In the first example above rank is 1, and in the second rank is 2. If rank is set to None, the highest possible rank will be assumed (in this example, 2).

We propose to eventually generalize that to accept any sequence object for the array structure (though there will be the same requirement that exists for other arrays, namely that the nested sequences be of the same type). As would be expected, strings are not permitted as the enclosing sequence. In this future implementation the record 'item' itself must be either:

1) A tuple
2) A subclass of tuple
3) A Record object (this may be taken care of by 2 if we make Record a subclass of tuple; this will be discussed in a subsequent proposal)

This requirement allows distinguishing the sequence of records from Option 5 below. For tuples (or tuple-derived elements), the items of the tuple must be one of the following: basic data types such as int, float, boolean, or string; a numarray or chararray; or an object that can be converted to a numarray or chararray.

Option 5 (List of Arrays)

Using a list of arrays to construct an N-D record array should be easier than using the previous option. The input syntax is simply:

[array1, array2, array3,...]

The shape of the record array will be determined from the shape of the input arrays as described below. All the user needs to do is to construct the arrays in the list.
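Both input options hinge on the same question: where does the record-array structure stop and the per-cell structure begin? Under the tuple-as-record requirement proposed for option 4, rank and shape inference become mechanical: descend through lists and stop at the first tuple. A rough pure-Python illustration (hypothetical helpers, not numarray's implementation):

```python
def infer_shape(data):
    """Shape of a nested-sequence record array under the proposed rule:
    descend through lists, stop at the first tuple (a record)."""
    if isinstance(data, list) and data:
        return (len(data),) + infer_shape(data[0])
    return ()  # a tuple (record) or a scalar: no further dimensions

def infer_rank(data):
    """Highest rank consistent with the tuple-as-record rule."""
    return len(infer_shape(data))
```

Note that under this reading the ambiguous example above disappears: [([1,2],[3,4]), ([11,12],[13,14])] infers rank 1 (the tuples mark the records, whose cells are arrays), while [[(1,2),(3,4)], [(11,12),(13,14)]] infers rank 2; an explicit "rank" argument is then only needed for inputs that do not follow the convention.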
There is, similar to option 4, a possible ambiguity: if all the arrays are of shape, say, (2,3), then the user may intend a 1-D record array of 2 rows where each cell is an array of shape (3,), or a 2-D record array of shape (2,3) where each cell is a single number or string. Thus, the user must explicitly specify either "shape" or "rank". We propose the following behavior via examples:

Example 1: given:

array1.shape=(2,3,4,5)
array2.shape=(2,3,4)
array3.shape=(2,3)

Rank can only be specified as rank=1 (the record array's shape will then be (2,)) or rank=2 (the record array's shape will then be (2,3)). For rank=None the record shape will be (2,3), i.e. the "highest common denominator": each cell in the first field will be an array of shape (4,5), each cell in the second field will be an array of shape (4,), and each cell in the 3rd field will be a single number or a string. If "shape" is specified, it will take precedence over "rank" and its allowed value in this example will be either 2, or (2,3).

Example 2:

array1.shape=(3,4,5)
array2.shape=(4,5)

This will raise an exception because the 'slowest' axes do not match.

*********

For both the sequence-of-records and list-of-arrays input options, we propose that the default value for "rank" be None (the current default is 1). This gives behavior consistent with object arrays but does change the current behavior. Also, for both cases, specifying a shape inconsistent with the supplied data will raise an exception.

From cjw at sympatico.ca Fri Jul 16 19:46:09 2004
From: cjw at sympatico.ca (Colin J. Williams)
Date: Fri Jul 16 19:46:09 2004
Subject: [Numpy-discussion] RecArray.tolist() suggestion
In-Reply-To: <40F822E0.5010406@stsci.edu>
References: <40F71F9C.9040008@sympatico.ca> <200407161111.41626.falted@pytables.org> <40F822E0.5010406@stsci.edu>
Message-ID: <40F892B2.7090706@sympatico.ca>

Paul Barrett wrote:
> Russell E Owen wrote:
>>> A Divendres 16 Juliol 2004 02:21, Colin J.
Williams va escriure:
>>>> To allow for multi-word column names, assignment could replace a
>>>> space
>>>> by an underscore
>>>> and, in retrieval, the reverse could be done - ie. underscore
>>>> would be
>>>> banned for a column name.
>>>
>>> That's not so easy. What about other chars like '/&%@$()' that
>>> cannot be
>>> part of python names? Finding a biunivocal map between them and allowed
>>> chars would be difficult (if possible at all). Besides, the resulting
>>> colnames might become a real mess.
>>
>> Personally, I think the idea of allowing access to fields via
>> attributes is fatally flawed. The problems raised (non-obvious
>> mapping between field names with special characters and allowable
>> attribute names and also the collision with existing instance
>> variable and method names) clearly show it would be forced and
>> non-pythonic.
>
> +1

Paul,

Below, I've appended my response to Francesc's 08:36 message; it was copied to the list but does not appear in the archive.

> It also make it difficult to do the following:
>
> a = item[:10, ('age', 'surname', 'firstname')]
>
> where field (or column) 1 is 'firstname, field 2 is 'surname', and
> field 10 is 'age'.
>
> -- Paul

Could you clarify what you have in mind here, please? Is this a proposed extension to records.py, as it exists in version 1.0?

Colin W.

------------------------------------------------------------------------

Yes, if the objective is to include special characters or facilitate multi-lingual column names (and it probably should be), then my suggestion is quite inadequate.

Perhaps there could be a simple name -> column number mapping in place of _names. References to a column, or a field in a record, could then be through this dictionary. Basic access to data in a record would be by position number, rather than name, but the dictionary would facilitate access by name.
Data could be referenced either through the column name: r1.c2[1] or through the record r1[1].c2, with the possibility that the index is multi-dimensional in either case. Colin W. From gerard.vermeulen at grenoble.cnrs.fr Sun Jul 18 14:25:10 2004 From: gerard.vermeulen at grenoble.cnrs.fr (gerard.vermeulen at grenoble.cnrs.fr) Date: Sun Jul 18 14:25:10 2004 Subject: [Numpy-discussion] Follow-up Numarray header PEP In-Reply-To: <1088632459.7526.213.camel@halloween.stsci.edu> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> Message-ID: <20040718212443.M21561@grenoble.cnrs.fr> Hi Todd, This is a follow-up on the 'header pep' discussion. The attachment numnum-0.1.tar.gz contains the sources for the extension modules pep and numnum. At least on my systems, both modules behave as described in the 'numarray header PEP' when the extension modules implementing the C-API are not present (a situation not foreseen by the macros import_array() of Numeric and especially numarray). IMO, my solution is 'bona fide', but requires further testing. The pep module shows how to handle the colliding C-APIs of the Numeric and numarray extension modules and how to implement automagical conversion between Numeric and numarray arrays. For a technical reason explained in the README, the hard work of doing the conversion between Numeric and numarray arrays has been delegated to the numnum module. The numnum module is useful when one needs to convert from one array type to the other to use an extension module which only exists for the other type (eg. combining numarray's image processing extensions with pygame's Numeric interface): Python 2.3+ (#1, Jan 7 2004, 09:17:35) [GCC 3.3.1 (SuSE Linux)] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> import numnum; import Numeric as np; import numarray as na
>>> np1 = np.array([[1, 2], [3, 4]]); na1 = numnum.toNA(np1)
>>> na2 = na.array([[1, 2, 3], [4, 5, 6]]); np2 = numnum.toNP(na2)
>>> print type(np1); np1; type(np2); np2
array([[1, 2],
       [3, 4]])
array([[1, 2, 3],
       [4, 5, 6]],'i')
>>> print type(na1); na1; type(na2); na2
array([[1, 2],
       [3, 4]])
array([[1, 2, 3],
       [4, 5, 6]])
>>>

The pep module shows how to implement array processing functions which use the Numeric, numarray or Sequence C-API:

static PyObject *
wysiwyg(PyObject *dummy, PyObject *args)
{
    PyObject *seq1, *seq2;
    PyObject *result;

    if (!PyArg_ParseTuple(args, "OO", &seq1, &seq2))
        return NULL;

    switch(API) {
    case NumericAPI:
    {
        PyObject *np1 = NN_API->toNP(seq1);
        PyObject *np2 = NN_API->toNP(seq2);
        result = np_wysiwyg(np1, np2);
        Py_XDECREF(np1);
        Py_XDECREF(np2);
        break;
    }
    case NumarrayAPI:
    {
        PyObject *na1 = NN_API->toNA(seq1);
        PyObject *na2 = NN_API->toNA(seq2);
        result = na_wysiwyg(na1, na2);
        Py_XDECREF(na1);
        Py_XDECREF(na2);
        break;
    }
    case SequenceAPI:
        result = seq_wysiwyg(seq1, seq2);
        break;
    default:
        PyErr_SetString(PyExc_RuntimeError, "Should never happen");
        return 0;
    }

    return result;
}

See the README for an example session using the pep module showing that it is possible to pass a mix of Numeric and numarray arrays to pep.wysiwyg().

Notes:

- it is straightforward to adapt pep and numnum so that the conversion functions are linked into pep instead of imported.

- numnum is still 'proof of concept'. I am thinking about methods to make those techniques safer if the numarray (and Numeric?) header files never make it into the Python headers (or to make it safer to use those techniques with Python < 2.4). In particular it would be helpful if the numerical C-APIs exported an API version number, similar to the versioning scheme of shared libraries -- see the libtool->versioning info pages.
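The libtool scheme Gerard points to can be stated as a one-line compatibility test. A sketch of how an importing extension might check an exported (current, age) pair (the function and numbers are illustrative; neither Numeric nor numarray exported such a version at the time):

```python
def api_compatible(built_against, current, age):
    """libtool-style check: a C-API advertising (current, age) implements
    interface numbers current-age .. current, so an extension compiled
    against interface `built_against` should load only in that range."""
    return current - age <= built_against <= current

# A hypothetical numarray exporting current=3, age=1 would accept
# extensions built against interfaces 2 and 3, but reject 1 and 4.
print(api_compatible(2, 3, 1), api_compatible(4, 3, 1))  # True False
```

The same test done in C at import time would turn today's silent crashes on API mismatch into a clean ImportError.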
I am considering three possibilities to release a more polished version of numnum (3rd party extension writers may prefer to link rather than import numnum's functionality): 1. release it from PyQwt's project page 2. register an independent numnum project at SourceForge 3. hand numnum over to the Numerical Python project (frees me from worrying about API changes). Regards -- Gerard Vermeulen -------------- next part -------------- A non-text attachment was scrubbed... Name: numnum-0.1.tar.gz Type: application/gzip Size: 12851 bytes Desc: not available URL: From jmiller at stsci.edu Tue Jul 20 05:49:04 2004 From: jmiller at stsci.edu (Todd Miller) Date: Tue Jul 20 05:49:04 2004 Subject: [Numpy-discussion] Follow-up Numarray header PEP In-Reply-To: <20040718212443.M21561@grenoble.cnrs.fr> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040718212443.M21561@grenoble.cnrs.fr> Message-ID: <1090327693.3749.257.camel@localhost.localdomain> On Sun, 2004-07-18 at 17:24, gerard.vermeulen at grenoble.cnrs.fr wrote: > Hi Todd, > > This is a follow-up on the 'header pep' discussion. Great! I was afraid you were going to disappear back into the ether. Sorry I didn't respond to this yesterday... I saw it but accidentally marked it as "read" and then forgot about it as the day went on. > The attachment numnum-0.1.tar.gz contains the sources for the > extension modules pep and numnum. At least on my systems, both > modules behave as described in the 'numarray header PEP' when the > extension modules implementing the C-API are not present (a situation > not foreseen by the macros import_array() of Numeric and especially > numarray). For numarray, this was *definitely* foreseen at some point, so I'm wondering what doesn't work now... 
> IMO, my solution is 'bona fide', but requires further > testing. I'll look it over today or tomorrow and comment more then. > The pep module shows how to handle the colliding C-APIs of the Numeric > and numarray extension modules and how to implement automagical > conversion between Numeric and numarray arrays. Nice; the conversion code sounds like a good addition to me. > For a technical reason explained in the README, the hard work of doing > the conversion between Numeric and numarray arrays has been delegated > to the numnum module. The numnum module is useful when one needs to > convert from one array type to the other to use an extension module > which only exists for the other type (eg. combining numarray's image > processing extensions with pygame's Numeric interface): > > Python 2.3+ (#1, Jan 7 2004, 09:17:35) > [GCC 3.3.1 (SuSE Linux)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> import numnum; import Numeric as np; import numarray as na > >>> np1 = np.array([[1, 2], [3, 4]]); na1 = numnum.toNA(np1) > >>> na2 = na.array([[1, 2, 3], [4, 5, 6]]); np2 = numnum.toNP(na2) > >>> print type(np1); np1; type(np2); np2 > > array([[1, 2], > [3, 4]]) > > array([[1, 2, 3], > [4, 5, 6]],'i') > >>> print type(na1); na1; type(na2); na2 > > array([[1, 2], > [3, 4]]) > > array([[1, 2, 3], > [4, 5, 6]]) > >>> > > The pep module shows how to implement array processing functions which > use the Numeric, numarray or Sequence C-API: > > static PyObject * > wysiwyg(PyObject *dummy, PyObject *args) > { > PyObject *seq1, *seq2; > PyObject *result; > > if (!PyArg_ParseTuple(args, "OO", &seq1, &seq2)) > return NULL; > > switch(API) { We'll definitely need to cover API in the PEP. There is a design choice here which needs to be discussed some and any resulting consensus documented. I haven't looked at the attachment yet. 
> case NumericAPI:
> {
>     PyObject *np1 = NN_API->toNP(seq1);
>     PyObject *np2 = NN_API->toNP(seq2);
>     result = np_wysiwyg(np1, np2);
>     Py_XDECREF(np1);
>     Py_XDECREF(np2);
>     break;
> }
> case NumarrayAPI:
> {
>     PyObject *na1 = NN_API->toNA(seq1);
>     PyObject *na2 = NN_API->toNA(seq2);
>     result = na_wysiwyg(na1, na2);
>     Py_XDECREF(na1);
>     Py_XDECREF(na2);
>     break;
> }
> case SequenceAPI:
>     result = seq_wysiwyg(seq1, seq2);
>     break;
> default:
>     PyErr_SetString(PyExc_RuntimeError, "Should never happen");
>     return 0;
> }
>
> return result;
> }
>
> See the README for an example session using the pep module showing that
> it is possible pass a mix of Numeric and numarray arrays to pep.wysiwyg().
>
> Notes:
>
> - it is straightforward to adapt pep and numnum so that the conversion
>   functions are linked into pep instead of imported.
>
> - numnum is still 'proof of concept'. I am thinking about methods to
>   make those techniques safer if the numarray (and Numeric?) header
>   files make it never into the Python headers (or make it safer to
>   use those techniques with Python < 2.4). In particular it would
>   be helpful if the numerical C-APIs export an API version number,
>   similar to the versioning scheme of shared libraries -- see the
>   libtool->versioning info pages.

I've thought about this a few times; there's certainly a need for it in numarray anyway... and I'm always one release too late. Thanks for the tip on libtool->versioning.

> I am considering three possibilities to release a more polished
> version of numnum (3rd party extension writers may prefer to link
> rather than import numnum's functionality):
>
> 1. release it from PyQwt's project page
> 2. register an independent numnum project at SourceForge
> 3. hand numnum over to the Numerical Python project (frees me from
>    worrying about API changes).
>
> Regards -- Gerard Vermeulen

(3) sounds best to me, for the same reason that numarray is a part of the numpy project and because numnum is a Numeric/numarray tool.
There is a small issue of sub-project organization (separate bug tracking, etc.), but I figure if SF can handle Python, it can handle Numeric, numarray, and probably a number of other packages as well. Something like numnum should not be a problem, and to promote it, it would be good to keep it where people can find it without having to look too hard. For now, I'm again marking your post as "unread" and will revisit it later this week. In the meantime, thanks very much for your efforts with numnum and the PEP. Regards, Todd

From perry at stsci.edu Tue Jul 20 09:05:02 2004 From: perry at stsci.edu (Perry Greenfield) Date: Tue Jul 20 09:05:02 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story In-Reply-To: Message-ID: We now turn to the behavior of Records. We'll note that many of the current proposals had been considered in the past but not implemented; we took more of a 'wait and see' attitude toward what was really necessary, wanting to avoid too many ways of doing the same thing before there was a real call for them. This proposal deals with the behavior of record array 'items', i.e., what we call Record objects now. The primary issues that have been raised with regard to Record behavior are summarized as follows:

1) Items should be tuples instead of Records.
2) Items should be objects, but present tuple- and/or dictionary-consistent behavior.
3) Field (or column) names should be accessible as Record (and record array) attributes.

Issue 1: Should record array items be tuples instead of Records?

Francesc Alted made this suggestion recently. Essentially the argument is that tuples are a natural way of representing records. Unfortunately, tuples do not provide a means of accessing fields of a record by name, but only by number. For this reason alone, tuples don't appear to be adequate. Francesc proposed allowing dictionary-like indexing of record arrays to facilitate access to tuple entries by name.
However, it seems that if rarr is a record array, both rarr['column 1'][2] and rarr[2]['column 1'] should work, not just the former. So the short answer is "No".

It should be noted that using tuples would force another change in current behavior. The current Record objects are actually views into the record array: changing a value within a record object changes the record array. Use of tuples won't allow that, since tuples are not mutable; if single elements of record arrays were set by and returned from tuples, whole records would have to be changed in their entirety. But his comments (as well as those of others) do point out a number of problems with the current implementation that could be improved, and making the Record object support tuple behaviors is quite reasonable. Hence:

Issue 2: Should record array items present tuple and/or dictionary compatible behaviors?

The short answer is, yes, we do agree that they should. This includes many of the proposals made, including:

1) supporting all tuple capabilities with the following differences:

a) fields are mutable (unlike tuple items) so long as the assigned value is coerceable to the expected type. For example, the current methods of doing so are:

>>> cell = oneRec.field(1)
>>> oneRec.setfield(1, newValue)

This proposal would allow:

>>> cell = oneRec[1]
>>> oneRec[1] = newValue

b) slice assignments are permitted so long as they don't change the size of the record (i.e., no insertion of extra items) and the items can be assigned as permitted for a).
E.g.,

>>> oneRec[2:4] = (3, 'abc')

c) __str__ will result in a display looking like that for tuples; __repr__ will show a Record constructor:

>>> print oneRec # as is currently implemented
(1.1, 2, 'abc', 3)
>>> oneRec
Record((1.1, 2, 'abc', 3), formats=['1Float32', '1Int16', '1a3', '1Int32'], names=['abc', 'c2', 'xyz', 'c4'])

(note that how best to handle formats is still being thought about)

2) supporting all dictionary capabilities with the following differences:

a) keys and items are ordered.
b) keys are restricted to being integers or strings only.
c) new keys cannot be dynamically added or deleted as for dictionaries.
d) no support for any other dictionary capabilities that can change the number or names of items.
e) __str__ will not show a result looking like a dictionary (see 1c).
f) values must meet the Record object's required type (or be coerceable to it).

For example, the current:

>>> cell = oneRec.field('c2')
>>> oneRec.setfield('c2', newValue)

And the proposed added indexing capability:

>>> cell = oneRec['c2']
>>> oneRec['c2'] = newValue

Issue 3: Field (or column) names should be accessible as Record (and record array) attributes.

As much as the attribute approach has appeal for simple usage, the problems of name collisions and of mismatches between acceptable field names and attribute names strike us, as they do Russell Owen, as very problematic. The technique Francesc suggests of using a special attribute (in his case, cols) that contains the field name attributes solves the name collision problem, but not the legality issue: particularly with regard to illegal characters, it's hard to imagine easily remembered mappings between legal attribute representations and the actual field names. We are inclined to pass (for now anyway) on mapping fields to attributes in any way.
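The tuple- and dictionary-style item access proposed under issues 1 and 2 can be sketched as one __getitem__/__setitem__ pair that dispatches on the key type. This is a simplified model, not the actual numarray implementation; type coercion is reduced to nothing and the names are illustrative:

```python
class Record:
    """Sketch of the proposed item behavior: integer keys and slices act
    like a tuple, string keys like an ordered, fixed-key dictionary."""
    def __init__(self, values, names):
        self._names = list(names)
        self._values = list(values)

    def _index(self, key):
        return self._names.index(key) if isinstance(key, str) else key

    def __getitem__(self, key):
        if isinstance(key, slice):
            return tuple(self._values[key])
        return self._values[self._index(key)]

    def __setitem__(self, key, value):
        if isinstance(key, slice):
            if len(value) != len(self._values[key]):   # point 1b: size fixed
                raise ValueError("slice assignment must preserve record size")
            self._values[key] = list(value)
        else:
            self._values[self._index(key)] = value

    def __str__(self):                                  # point 1c: tuple-like
        return str(tuple(self._values))

oneRec = Record((1.1, 2, 'abc', 3), names=('c1', 'c2', 'xyz', 'c4'))
oneRec[1] = 5             # tuple-style mutable field
oneRec['xyz'] = 'def'     # dictionary-style access
oneRec[2:4] = ('ghi', 7)  # slice assignment of matching size
print(oneRec)             # (1.1, 5, 'ghi', 7)
```

Keys stay ordered and fixed simply because the underlying lists never change length, matching points 2a-2d.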
It seems to us that indexing by name should be convenient enough, as well as flexible enough to really satisfy all needs. (Indexing by name is needed in any case, since attributes are a clumsy way to access a field when the field is specified by a variable; yes, one can use getattr(), but it's clumsy.)

*******************************************

Record array behavior changes:

1) It will be possible to assign any sequence to a record array item, so long as the sequence contains the right number of fields and each item of the sequence can be coerced to what the record array expects for the corresponding field of the record (addressing numarray feature request 928473 by Russell Owen). I.e.,

>>> recArr[1] = (2, 3.2, 'xyz', 3)

2) One may assign a record to a record array so long as the record matches the record format of the record array (current behavior).

3) Easier construction and initialization of recarrays with default field values, as requested in numarray bug report 928479.

4) Support for lists of field names and formats, as detailed in numarray bug report 928488.

5) Field name indexing for record arrays. It will be possible to index record arrays with a field name; i.e., if the index is a string, then what will be returned is a numarray/chararray for that column. (Note that it won't be possible to index record arrays by field number, for obvious reasons.) I.e., currently:

>>> col = recArr.field('abc')

Can also be:

>>> col = recArr['abc']

But the current:

>>> col = recArr.field(1)

Cannot become:

>>> col = recArr[1]

On the other hand, it will not be permitted to mix a field index with an array index in the same brackets; e.g., rarr[10, 'column 2'] will not be supported. Allowing indexing to have two different interpretations is a bit worrying. But if record array items may be indexed in this manner, it seems natural to permit the same indexing for the record array.
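Behavior change 5, together with the ban on mixed indices, can be modeled with a __getitem__ that treats string keys as field names and everything else as ordinary row indexing. A toy sketch, not the actual numarray machinery (names are illustrative):

```python
class RecArraySketch:
    """String index -> whole column; integer index -> row, as today."""
    def __init__(self, rows, names):
        self._rows = [list(r) for r in rows]
        self._names = list(names)

    def __getitem__(self, key):
        if isinstance(key, str):               # field-name indexing
            col = self._names.index(key)
            return [row[col] for row in self._rows]
        if isinstance(key, tuple):             # e.g. rarr[10, 'column 2']
            raise TypeError("mixing field and array indices is not supported")
        return self._rows[key]                 # plain row indexing

recArr = RecArraySketch([(1, 'a'), (2, 'b'), (3, 'c')], names=('num', 'txt'))
print(recArr['txt'])   # column by field name
print(recArr[1])       # row by position, unchanged
```

Field numbers stay unavailable as indices simply because integers already mean rows, which is the "obvious reason" cited above.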
Mixing the two kinds of indexing in one index seems of limited usefulness in the first place and it makes inheriting the existing indexing machinery for NDArrays more complicated (any efficiency gains in avoiding the intermediate object creation by using two separate index operations will likely be offset by the slowness of handling much more complicated mixed indices). Perhaps someone can argue for why mixing field indices with array indices is important, but for now we will prohibit this mode of indexing. This does point to a possible enhancement for the field indexing, namely being able to provide the equivalent of index arrays (e.g., a list of field names) to generate a new record array with a subset of fields. Are there any other issues that should be addressed for improving record arrays? From rowen at u.washington.edu Tue Jul 20 10:15:05 2004 From: rowen at u.washington.edu (Russell E Owen) Date: Tue Jul 20 10:15:05 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story In-Reply-To: References: Message-ID: At 12:04 PM -0400 2004-07-20, Perry Greenfield wrote: >...(a detailed summary of proposed changes to numarray record arrays) +1 on all of it with one exception noted below. This sounds like a first-rate overhaul and is much appreciated. Will it be possible, when creating a new records array, to specify types of a record array as a list of normal numarray types? Currently one has to specify the types as a "formats" string, which is nonstandard. I'm unhappy about one proposal: >... >Record array behavior changes: >... >5) Field name indexing for record arrays. It will be possible to index >record arrays with a field name, i.e., if the index is a string, then what >will be returned is a numarray/chararray for that column. (Note that it >won't be possible to index record arrays by field number for obvious >reasons). > >I.e. 
Currently > >>>> col = recArr.field('doc') > >Can also be > >>>> col = recArr['abc'] > >But the current > >>>> col = recArr.field(1) > >Cannot become > >>>> col = recArr[1]

I think recarray[field name] is too easily confused with recarray[index] and is unnecessary. I suggest one of two solutions:

- Do nothing. Make users use field(field name or index)

or

- Allow access to the fields via an indexable entity. Simplest for the user would be to use "field" itself:

recArr.field[1]
recArr.field["abc"]

(i.e. field becomes an object that can be called or can be accessed via __getitem__) This could easily support index arrays (a topic you brought up and that sounds appealing to me):

recArr.field[index array]

and it might even be practical to support:

recArr.field[sequence of field indices and/or names]

e.g.

recArr.field[(ind 1, field name 2, ind 3...)]

You asked about other issues. One that comes to mind is record arrays of record arrays. Should they be allowed? My gut reaction is yes if it's not too hard. Folks always seem to find a use for generality if it's offered. On the other hand, if it's hard, it's not worth the effort. If they are allowed, users are going to want some efficient way to get to a particular field (i.e. in one call even if the field is several recArrays deep). That could get messy. Thanks for a great posting. The improvements to record arrays sound first-rate. -- Russell

From hsu at stsci.edu Wed Jul 21 11:53:40 2004 From: hsu at stsci.edu (Jin-chung Hsu) Date: Wed Jul 21 11:53:40 2004 Subject: [Numpy-discussion] formats in record array Message-ID: <200407211850.AOO09987@donner.stsci.edu> > From: Russell E Owen > Subject: Re: [Numpy-discussion] Proposed record array behavior: the rest > of the story > Will it be possible, when creating a new records array, to specify > types of a record array as a list of normal numarray types? Currently > one has to specify the types as a "formats" string, which is > nonstandard.
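Russell's suggestion above, that "field" be an entity which is both callable and indexable, can be sketched with a small helper object (all names here are illustrative only; this is not records.py code):

```python
class FieldAccessor:
    """Supports recArr.field(1), recArr.field[1], recArr.field['abc'],
    and sequences of mixed field names and indices."""
    def __init__(self, columns, names):
        self._columns = columns        # one sequence per column
        self._names = list(names)

    def __call__(self, key):           # preserves the current call spelling
        return self[key]

    def __getitem__(self, key):
        if isinstance(key, (list, tuple)):     # sequence of indices/names
            return [self[k] for k in key]
        if isinstance(key, str):
            key = self._names.index(key)
        return self._columns[key]

class RecArraySketch:
    def __init__(self, columns, names):
        self.field = FieldAccessor(columns, names)

rec = RecArraySketch([[1, 2], ['a', 'b']], names=('num', 'txt'))
print(rec.field(1))           # current call style still works
print(rec.field['num'])       # new: indexing by name
print(rec.field[('txt', 0)])  # new: mixed names and indices
```

Because the accessor owns the lookup, field names never collide with array attributes or methods.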
In theory it is easy to do that except you can't specify cell arrays, i.e. how do you specify the equivalent of: formats=['3Int16', '(4,5)Float32'] with the numarray type instances? JC Hsu From rlw at stsci.edu Wed Jul 21 12:23:07 2004 From: rlw at stsci.edu (Rick White) Date: Wed Jul 21 12:23:07 2004 Subject: [Numpy-discussion] formats in record array In-Reply-To: <200407211850.AOO09987@donner.stsci.edu> Message-ID: On Wed, 21 Jul 2004, Jin-chung Hsu wrote: > > From: Russell E Owen > > Subject: Re: [Numpy-discussion] Proposed record array behavior: the rest > > of the story > > > > Will it be possible, when creating a new records array, to specify > > types of a record array as a list of normal numarray types? Currently > > one has to specify the types as a "formats" string, which is > > nonstandard. > > In theory it is easy to do that except you can't specify cell arrays, i.e. > how do you specify the equivalent of: > > formats=['3Int16', '(4,5)Float32'] > > with the numarray type instances? > > JC Hsu Well, how about one (or both) of these: formats = 3*(Int16,), 4*(5*(Float32,),) formats = (3,Int16), ((4,5), Float32) From kyeser at earthlink.net Wed Jul 21 18:19:07 2004 From: kyeser at earthlink.net (Hee-Seng Kye) Date: Wed Jul 21 18:19:07 2004 Subject: [Numpy-discussion] Is there a better way to do this? Message-ID: <16A7C641-DB7D-11D8-A37A-000393479EE8@earthlink.net> My question is not directly related to NumPy, but since many people here deal with numbers, I was wondering if I could get some help; it would be even better if there is a NumPy (or Numarray) function that takes care of what I want! I'm trying to write a program that computes six-digit numbers, in which the left digit is always smaller than its following digit (i.e., it's always ascending). 
The best I could do was to have many nested 'for' statements:

c = 1
for p0 in range(0, 7):
    for p1 in range(1, 12):
        for p2 in range(2, 12):
            for p3 in range(3, 12):
                for p4 in range(4, 12):
                    for p5 in range(5, 12):
                        if p0 < p1 < p2 < p3 < p4 < p5:
                            print repr(c).rjust(3), "\t",
                            print "%X %X %X %X %X %X" % (p0, p1, p2, p3, p4, p5)
                            c += 1
print "...Done"

This works, except that it's very slow. I need to get it up to nine-digit numbers, in which case it's significantly slower. I was wondering if there is a more efficient way to do this. I would highly appreciate it if anyone could help. Many thanks. -Kye

From jcollins_boulder at earthlink.net Wed Jul 21 18:49:10 2004 From: jcollins_boulder at earthlink.net (Jeffery D. Collins) Date: Wed Jul 21 18:49:10 2004 Subject: [Numpy-discussion] Is there a better way to do this? In-Reply-To: <16A7C641-DB7D-11D8-A37A-000393479EE8@earthlink.net> References: <16A7C641-DB7D-11D8-A37A-000393479EE8@earthlink.net> Message-ID: <40FF1D11.8090606@earthlink.net> Hee-Seng Kye wrote:

> My question is not directly related to NumPy, but since many people
> here deal with numbers, I was wondering if I could get some help; it
> would be even better if there is a NumPy (or Numarray) function that
> takes care of what I want!
>
> I'm trying to write a program that computes six-digit numbers, in
> which the left digit is always smaller than its following digit (i.e.,
> it's always ascending). The best I could do was to have many embedded
> 'for' statement:
>
> c = 1
> for p0 in range(0, 7):
>     for p1 in range(1, 12):
>         for p2 in range(2, 12):
>             for p3 in range(3, 12):
>                 for p4 in range(4, 12):
>                     for p5 in range(5, 12):
>                         if p0 < p1 < p2 < p3 < p4 < p5:
>                             print repr(c).rjust(3), "\t",
>                             print "%X %X %X %X %X %X" % (p0, p1, p2, p3, p4, p5)
>                             c += 1
> print "...Done"
>
> This works, except that it's very slow. I need to get it up to
> nine-digit numbers, in which case it's significantly slow. I was
> wondering if there is a more efficient way to do this.
> > I would highly appreciate it if anyone could help.

This appears to give the same results and is significantly faster.

def vers1():
    c = 1
    for p0 in range(0, 7):
        for p1 in range(p0+1, 12):
            for p2 in range(p1+1, 12):
                for p3 in range(p2+1, 12):
                    for p4 in range(p3+1, 12):
                        for p5 in range(p4+1, 12):
                            print repr(c).rjust(3), "\t",
                            print "%X %X %X %X %X %X" % (p0, p1, p2, p3, p4, p5)
                            c += 1
    print "...Done"

> > Many thanks.
> > -Kye

-- Jeff

From rlw at stsci.edu Wed Jul 21 22:03:03 2004 From: rlw at stsci.edu (Rick White) Date: Wed Jul 21 22:03:03 2004 Subject: [Numpy-discussion] Is there a better way to do this? In-Reply-To: <16A7C641-DB7D-11D8-A37A-000393479EE8@earthlink.net> Message-ID: On Wed, 21 Jul 2004, Hee-Seng Kye wrote:

> I'm trying to write a program that computes six-digit numbers, in which
> the left digit is always smaller than its following digit (i.e., it's
> always ascending).

Here's another version that is a little faster still:

def f3():
    c = 1
    for p0 in range(0, 7):
        for p1 in range(p0+1, 8):
            for p2 in range(p1+1, 9):
                for p3 in range(p2+1, 10):
                    for p4 in range(p3+1, 11):
                        for p5 in range(p4+1, 12):
                            print repr(c).rjust(3), "\t",
                            print "%X %X %X %X %X %X" % (p0, p1, p2, p3, p4, p5)
                            c += 1
    print "...Done"

This is plenty fast even for 9-digit numbers. In fact it gets a little faster for larger numbers of digits. This problem is completely equivalent to the problem of finding all combinations of 6 numbers chosen from the digits 0..11. If you sort the digits of each combination in ascending order, you get your numbers. So if you search for something like "Python permutations combinations" you can find other algorithms that work. Here's a recursive version:

def f4(n, digits=range(12)):
    if n==0:
        return [[]]
    rv = []
    for i in range(len(digits)):
        for cc in f4(n-1, digits[i+1:]):
            rv.append([digits[i]]+cc)
    return rv

That returns a list of all the number sets having n digits. It's slower than the loop version but is more general.
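As Rick notes, the search is exactly combination enumeration. In Python versions later than this thread (itertools.combinations arrived in 2.6), the whole thing collapses to a single library call:

```python
from itertools import combinations

# All ascending 6-digit selections from the digits 0..11, already sorted.
digits = list(combinations(range(12), 6))
print(len(digits))            # 924 == C(12, 6)
print(digits[0], digits[-1])  # (0, 1, 2, 3, 4, 5) (6, 7, 8, 9, 10, 11)
```

Scaling to nine digits is just combinations(range(12), 9), still only C(12, 9) = 220 tuples.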
There are fast C versions of this sort of thing out there, I think. Rick White

From falted at pytables.org Thu Jul 22 02:47:27 2004 From: falted at pytables.org (Francesc Alted) Date: Thu Jul 22 02:47:27 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story In-Reply-To: References: Message-ID: <200407221146.41319.falted@pytables.org> Hi, I think the numarray team's overhaul of RecArray access modes is very good, and I agree with most of it. A Dimarts 20 Juliol 2004 19:14, Russell E Owen va escriure: > I think recarray[field name] is too easily confused with > recarray[index] and is unnecessary. Yeah, maybe you are right. > I suggest one of two solutions: > - Do nothing. Make users use field(field name or index) > or > - Allow access to the fields via an indexable entity. Simplest for > the user would be to use "field" itself: > recArr.field[1] > recArr.field["abc"] > (i.e. field becomes an object that can be called or can be accessed > via __getitem__) I prefer the second one. Although I know that you don't like the __getattr__ method, the field object can be used to host one. The main advantage I see in having such a __getattr__ method is that I'm very used to pressing TAB twice in the python console with its completion capabilities activated. It would be a very nice way of interactively discovering the fields of a RecArray object. I don't know whether this feature is used a lot or not out there, but for me it is just great. I understand, however, that having to include a map to support non-valid Python names for field names can be quite inconvenient. Regards, -- Francesc Alted

From cjw at sympatico.ca Thu Jul 22 05:22:01 2004 From: cjw at sympatico.ca (Colin J.
Williams) Date: Thu Jul 22 05:22:01 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story In-Reply-To: <200407221146.41319.falted@pytables.org> References: <200407221146.41319.falted@pytables.org> Message-ID: <40FFB132.10103@sympatico.ca> Francesc Alted wrote: >Hi, > >I agree that numarray team's overhaul of RecArray access modes is very good >and I agree most of it. > >A Dimarts 20 Juliol 2004 19:14, Russell E Owen va escriure: > > >>I think recarray[field name] is too easily confused with >>recarray[index] and is unnecessary. >> >> > >Yeah, maybe you are right. > > > >>I suggest one of two solutions: >>- Do nothing. Make users use field(field name or index) >>or >>- Allow access to the fields via an indexable entity. Simplest for >>the user would be to use "field" itself: >> recArr.field[1] >> recArr.field["abc"] >>(i.e. field becomes an object that can be called or can be accessed >>via __getitem__) >> >> > >I prefer the second one. Although I know that you don't like the __getattr__ >method, the field object can be used to host one. The main advantage I see >having such a __getattr__ method is that I'm very used to press TAB twice in >the python console with its completion capabilities activated. It would be a >very nice way of interactively discovering the fields of a RecArray object. >I don't know whether this feature is used a lot or not out there, but for me >is just great. I understand, however, that having to include a map to >suport non-vbalid python names for field names can be quite inconvenient. > >Regards, > > Perry's issue 3. Perhaps there is a need to separate the name or identifier of a column in a RecArray or a field in a Record from its label. The labels, for display purposes, would default to the column names. The column names would default, as at present, to the Cn form. I like the use of attributes for the column names, it avoids the problem Russell Owen mentioned above. 
Suppose we have a simple RecArray with the fields "name" and "age"; it's much simpler to write rec.name or rec.age than rec["name"] or rec["age"]. The problems with the use of attributes, which must be Python names, are (1) they cannot have accented or special characters, e.g. ?, ?, @, &, *, etc., and (2) there is a danger of conflict with existing properties or attributes. My guess is that the special characters would be required primarily for display purposes. Thus, the label could meet that need. The danger of conflict could be addressed by raising an exception. There remains a possible problem where identifiers are passed on from some other system, perhaps a database. Thus, the primary identifier of a row in a RecArray would be an integer index, and that of a column or field would be a standard Python identifier. Although, at times, it would be useful to be able to index the individual fields (or columns) as part of the usual indexing scheme. Thus rec[2, 3, 4] could identify a record, and rec[2, 3, 4].age or rec[2, 3, 4, 5] could identify the sixth field in that record. The use of attributes raises the possibility that one could have nested records. For example, suppose one has an address record:

addressRecord
    streetNumber
    streetName
    postalCode
    ...

There could then be a personal record:

personRecord
    ...
    officeAddress
    homeAddress
    ...

One could address a component as rec.homeAddress.postalCode. Finally, there was mention, earlier in the discussion, of facilitating the indexing of a RecArray. I hope that some way will be found to do this. Colin W.

From kyeser at earthlink.net Thu Jul 22 13:24:06 2004 From: kyeser at earthlink.net (Hee-Seng Kye) Date: Thu Jul 22 13:24:06 2004 Subject: [Numpy-discussion] Is there a better way to do this? In-Reply-To: References: Message-ID: Thanks a lot everyone for suggestions.
On my slow machine (667 MHz), inefficient programs run even slower, and when I expand the program to calculate 9-digit numbers, there is almost a 2-minute difference! Thanks again. Best, Kye From sag at hydrosphere.com Thu Jul 22 15:34:11 2004 From: sag at hydrosphere.com (sag at hydrosphere.com) Date: Thu Jul 22 15:34:11 2004 Subject: [Numpy-discussion] Unpickling python 2.2 UserArray objs in python 2.3 Message-ID: <40FFF0A2.26467.FBF2E27@localhost> I have a large bunch of objects that subclass UserArray from Numeric 22. These objects were created and pickled in binary mode in Python 2.2 and stored in a MySQL database on Red Hat 8. Using Python 2.2, I can easily retrieve and unpickle the objects. I have just upgraded the system to Fedora Core 2, which supplies Python 2.3.3. After much hassle, I have been able to compile Numeric 1.0 (ver 23) and have tried to unpickle these objects. Now, I get a failure in the loads call. The code is: import cPickle obj = cPickle.loads(str(blob)) When this is called, the python interpreter (via IDLE) goes into a loop in the UserArray __getattr__ function (line 198): return getattr(self.array,attr) >> File "/usr/lib/python2.3/site-packages/Numeric/UserArray.py" line 198, in __getattr__ >> return getattr(self.array,attr) No other error is reported, just a stack full of these lines. It seems that at this point, UserArray doesn't know that it has an 'array' attr. This worked just fine in Python 2.2. Has something changed in Python 2.3 cPickle functions or in how Numeric 23 handles pickle/unpickle that would make my Python 2.2 blobs unusable in Python 2.3? Is there a solution for this, other than remaking my blobs (not an option - there are literally millions of them), or must I figure out how to access Python 2.2 for this code? So far as I can tell, the string I get back is exactly the same for both versions. Any help you can give me would be appreciated.
Thanks sue giller From kyeser at earthlink.net Fri Jul 23 07:31:07 2004 From: kyeser at earthlink.net (Hee-Seng Kye) Date: Fri Jul 23 07:31:07 2004 Subject: [Numpy-discussion] A bit long, but would appreciate anyone's help, if time permits! Message-ID: Hi. Like my previous post, my question is not directly related to Numpy, but I couldn't help posting it since many people here deal with numbers. I have a question that requires a bit of explanation. I would highly appreciate it if anyone could read this and offer any suggestions, whenever time permits. I'm trying to write a program that 1) gives all possible rotations of an ordered list, 2) chooses the ordering that has the smallest difference from first to last element of the rotation, and 3) continues to compare the difference from first to second-to-last element, and so on, if there was a tie in step 2. The following is the output of a function I wrote. The first 6 lines are all possible rotations of [0,1,3,6,7,10], and this takes care of step 1 mentioned above. The last line provides the differences (mod 12). If the last line were denoted as r, r[0] lists the differences from first to last element of each rotation (p0 through p5), r[1] the differences from first to second-to-last element, and so on. >>> from normal import normal >>> normal([0,1,3,6,7,10]) [0, 1, 3, 6, 7, 10] #p0 [1, 3, 6, 7, 10, 0] #p1 [3, 6, 7, 10, 0, 1] #p2 [6, 7, 10, 0, 1, 3] #p3 [7, 10, 0, 1, 3, 6] #p4 [10, 0, 1, 3, 6, 7] #p5 [[10, 11, 10, 9, 11, 9], [7, 9, 9, 7, 8, 8], [6, 6, 7, 6, 6, 5], [3, 5, 4, 4, 5, 3], [1, 2, 3, 1, 3, 2]] #r Here is my question. I'm having trouble realizing step 2 (and 3, if necessary). In the above case, the smallest number in r[0] is 9, which is present in both r[0][3] and r[0][5]. This means that p3 and p5 and only p3 and p5 need to be further compared. 
r[1][3] is 7, and r[1][5] is 8, so the comparison ends here, and the final result I'm looking for is p3, [6,7,10,0,1,3] (the final 'n' value for 'pn' corresponds to the final 'y' value for 'r[x][y]'). How would I find the smallest values of a list r[0], take only those values (r[0][3] and r[0][5]) for further comparison (r[1][3] and r[1][5]), and finally print a p3? Thanks again for reading this. If there is anything unclear, please let me know. Best, Kye

My code begins here:

#normal.py
def normal(s):
    s.sort()
    r = []
    q = []
    v = []

    for x in range(0, len(s)):
        k = s[x:]+s[0:x]
        r.append(k)

    for y in range(0, len(s)):
        print r[y], '\t'
        d = []
        for yy in range(len(s)-1, 0, -1):
            w = (r[y][yy]-r[y][0])%12
            d.append(w)
        q.append(d)

    for z in range(0, len(s)-1):
        d = []
        for zz in range(0, len(s)):
            w = q[zz][z]
            d.append(w)
        v.append(d)
    print '\n', v

From sag at hydrosphere.com Fri Jul 23 10:09:11 2004 From: sag at hydrosphere.com (sag at hydrosphere.com) Date: Fri Jul 23 10:09:11 2004 Subject: [Numpy-discussion] re: Unpickling python 2.2 userArray objs in python 2.3 Message-ID: <4100F5DD.17007.13BB9C82@localhost> I have further information on my problem of unpickling an object that is based on the Numeric.UserArray class. I can recreate the endless getattr loop with the following code, which is a small subsection of my class:

data = Numeric.ones(31, savespace=1)
ua = UserArray(data)
blob = cPickle.dumps(ua)
obj = cPickle.loads(blob)   # <-- fails here

If you pickle the data object, everything works. This code works in Python 2.2. Is this a bug? Is it fixable?
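The endless loop above is characteristic of a delegating __getattr__ under pickle: the unpickler rebuilds the instance without calling __init__, so when it probes the half-built object for attributes (such as __setstate__), __getattr__ runs before self.array exists and re-enters itself. A minimal sketch of the mechanism and one possible guard, written for modern Python -- this is not the actual Numeric UserArray source, and the class names here are invented:

```python
# Minimal sketch of the runaway-__getattr__ failure mode -- NOT the
# actual Numeric UserArray code; Broken and Guarded are invented names.
import pickle

class Broken:
    """Delegates every unknown attribute to self.array, unguarded."""
    def __init__(self, data):
        self.array = data
    def __getattr__(self, attr):
        # If 'array' is not in the instance dict yet, this lookup
        # re-enters __getattr__ and never terminates.
        return getattr(self.array, attr)

class Guarded:
    """Same delegation, but safe before 'array' has been set."""
    def __init__(self, data):
        self.array = data
    def __getattr__(self, attr):
        if attr == 'array' or 'array' not in self.__dict__:
            raise AttributeError(attr)
        return getattr(self.array, attr)

# The unpickler rebuilds instances without calling __init__, roughly:
b = Broken.__new__(Broken)
try:
    b.array                  # loops: __getattr__ -> self.array -> ...
except RecursionError:
    pass

# The guarded version survives a full pickle round trip.
obj = pickle.loads(pickle.dumps(Guarded([1, 2, 3])))
assert obj.array == [1, 2, 3]
```

A change between Python 2.2 and 2.3 in how cPickle probes instance attributes during rebuild could expose exactly this kind of latent recursion, which would fit the symptoms Sue describes.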
sue From jmiller at stsci.edu Fri Jul 23 10:30:15 2004 From: jmiller at stsci.edu (Todd Miller) Date: Fri Jul 23 10:30:15 2004 Subject: [Numpy-discussion] Follow-up Numarray header PEP In-Reply-To: <20040718212443.M21561@grenoble.cnrs.fr> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040718212443.M21561@grenoble.cnrs.fr> Message-ID: <1090603727.7138.33.camel@halloween.stsci.edu> Hi Gerard, I finally got to your numnum stuff today... awesome work! You've got lots of good suggestions. Here are some comments:

1. Thanks for catching the early return problem with numarray's import_array(). It's not just bad, it's wrong. It'll be fixed for 1.1.

2. That said, I think expanding the macros in-line in numnum is a mistake. It seems to me that "import_array(); PyErr_Clear();" or something like it ought to be enough... after numarray-1.1 anyway.

3. I think there's a problem in numnum.toNP() because of numarray's array "behavior" issues. A test needs to be done to ensure that the incoming array is not byteswapped or misaligned; if it is, the easy fix is to make a numarray copy of the array before copying it to Numeric.

4. Kudos for the LP64 stuff. numconfig is a thorn in the side of the PEP, so I'll put your techniques into numarray for 1.1. HAS_FLOAT128 is not currently used, so it might be time to ditch it. Anyway, thanks!

5. PyArray_Present() and isArray() are superfluous *now*. I was planning to add them to Numeric.

6. The LGPL may be a problem for us and is probably an issue if we ever try to get numnum into the Python distribution. It would be better to release numnum under the modified BSD license, same as numarray.

7. Your API struct was very clean. Eventually I'll regenerate numarray like that.

8.
I logged your comments and bug reports on Source Forge and eventually they'll get fixed. A to Z the numnum/pep code is beautiful. Next stop, header PEP update. Regards, Todd On Sun, 2004-07-18 at 17:24, gerard.vermeulen at grenoble.cnrs.fr wrote: > Hi Todd, > > This is a follow-up on the 'header pep' discussion. > > The attachment numnum-0.1.tar.gz contains the sources for the > extension modules pep and numnum. At least on my systems, both > modules behave as described in the 'numarray header PEP' when the > extension modules implementing the C-API are not present (a situation > not foreseen by the macros import_array() of Numeric and especially > numarray). IMO, my solution is 'bona fide', but requires further > testing. > > The pep module shows how to handle the colliding C-APIs of the Numeric > and numarray extension modules and how to implement automagical > conversion between Numeric and numarray arrays. > > For a technical reason explained in the README, the hard work of doing > the conversion between Numeric and numarray arrays has been delegated > to the numnum module. The numnum module is useful when one needs to > convert from one array type to the other to use an extension module > which only exists for the other type (eg. combining numarray's image > processing extensions with pygame's Numeric interface): > > Python 2.3+ (#1, Jan 7 2004, 09:17:35) > [GCC 3.3.1 (SuSE Linux)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. 
> >>> import numnum; import Numeric as np; import numarray as na
> >>> np1 = np.array([[1, 2], [3, 4]]); na1 = numnum.toNA(np1)
> >>> na2 = na.array([[1, 2, 3], [4, 5, 6]]); np2 = numnum.toNP(na2)
> >>> print type(np1); np1; type(np2); np2
>
> array([[1, 2],
>        [3, 4]])
>
> array([[1, 2, 3],
>        [4, 5, 6]],'i')
> >>> print type(na1); na1; type(na2); na2
>
> array([[1, 2],
>        [3, 4]])
>
> array([[1, 2, 3],
>        [4, 5, 6]])
> >>>
>
> The pep module shows how to implement array processing functions which
> use the Numeric, numarray or Sequence C-API:
>
> static PyObject *
> wysiwyg(PyObject *dummy, PyObject *args)
> {
>     PyObject *seq1, *seq2;
>     PyObject *result;
>
>     if (!PyArg_ParseTuple(args, "OO", &seq1, &seq2))
>         return NULL;
>
>     switch(API) {
>     case NumericAPI:
>     {
>         PyObject *np1 = NN_API->toNP(seq1);
>         PyObject *np2 = NN_API->toNP(seq2);
>         result = np_wysiwyg(np1, np2);
>         Py_XDECREF(np1);
>         Py_XDECREF(np2);
>         break;
>     }
>     case NumarrayAPI:
>     {
>         PyObject *na1 = NN_API->toNA(seq1);
>         PyObject *na2 = NN_API->toNA(seq2);
>         result = na_wysiwyg(na1, na2);
>         Py_XDECREF(na1);
>         Py_XDECREF(na2);
>         break;
>     }
>     case SequenceAPI:
>         result = seq_wysiwyg(seq1, seq2);
>         break;
>     default:
>         PyErr_SetString(PyExc_RuntimeError, "Should never happen");
>         return 0;
>     }
>
>     return result;
> }
>
> See the README for an example session using the pep module showing that
> it is possible to pass a mix of Numeric and numarray arrays to pep.wysiwyg().
>
> Notes:
>
> - it is straightforward to adapt pep and numnum so that the conversion
> functions are linked into pep instead of imported.
>
> - numnum is still 'proof of concept'. I am thinking about methods to
> make those techniques safer if the numarray (and Numeric?) header
> files never make it into the Python headers (or make it safer to
> use those techniques with Python < 2.4).
In particular it would > be helpful if the numerical C-APIs export an API version number, > similar to the versioning scheme of shared libraries -- see the > libtool->versioning info pages. > > I am considering three possibilities to release a more polished > version of numnum (3rd party extension writers may prefer to link > rather than import numnum's functionality): > > 1. release it from PyQwt's project page > 2. register an independent numnum project at SourceForge > 3. hand numnum over to the Numerical Python project (frees me from > worrying about API changes). > > > Regards -- Gerard Vermeulen -- From eric at enthought.com Fri Jul 23 10:56:07 2004 From: eric at enthought.com (eric jones) Date: Fri Jul 23 10:56:07 2004 Subject: [Numpy-discussion] ANN: SciPy04 -- Last day for abstracts and early registration! Message-ID: <4101510B.9050005@enthought.com> Hey Group, Just a reminder that this is the last day to submit abstracts for SciPy04. It is also the last day for early registration. More information is here: http://www.scipy.org/wikis/scipy04 About the Conference and Keynote Speaker --------------------------------------------- The 1st annual *SciPy Conference* will be held this year at Caltech, September 2-3, 2004. As some of you may know, we've experienced great participation in two SciPy "Workshops" (with ~70 attendees in both 2002 and 2003) and this year we're graduating to a "conference." With the prestige of a conference comes the responsibility of a keynote address. This year, Jim Hugunin has answered the call and will be speaking to kickoff the meeting on Thursday September 2nd. Jim is the creator of Numeric Python, Jython, and co-designer of AspectJ. Jim is currently working on IronPython--a fast implementation of Python for .NET and Mono. Presenters ----------- We still have room for a few more standard talks, and there is plenty of room for lightning talks. Because of this, we are extending the abstract deadline until July 23rd. 
Please send your abstract to abstracts at scipy.org. Travis Oliphant is organizing the presentations this year. (Thanks!) Once accepted, papers and/or presentation slides are acceptable and are due by August 20, 2004. Registration ------------- Early registration ($100.00) has been extended to July 23rd. Follow the links off of the main conference site: http://www.scipy.org/wikis/scipy04 After July 23rd, registration will be $150.00. Registration includes breakfast and lunch Thursday & Friday and a very nice dinner Thursday night. Please register as soon as possible as it will help us in planning for food, room sizes, etc. Sprints -------- As of now, we really haven't had much of a call for coding sprints for the 3 days prior to SciPy 04. Below is the original announcement about sprints. If you would like to suggest a topic and see if others are interested, please send a message to the list. Otherwise, we'll forgo the sprints session this year. We're also planning three days of informal "Coding Sprints" prior to the conference -- August 30 to September 1, 2004. Conference registration is not required to participate in the sprints. Please email the list, however, if you plan to attend. Topics for these sprints will be determined via the mailing lists as well, so please submit any suggestions for topics to the scipy-user list: list signup: http://www.scipy.org/mailinglists/ list address: scipy-user at scipy.org thanks, eric From cjw at sympatico.ca Sat Jul 24 07:18:04 2004 From: cjw at sympatico.ca (Colin J. Williams) Date: Sat Jul 24 07:18:04 2004 Subject: [Numpy-discussion] A bit long, but would appreciate anyone's help, if time permits! In-Reply-To: References: Message-ID: <41026F91.3090706@sympatico.ca> Hee-Seng Kye wrote: > Hi. Like my previous post, my question is not directly related to Numpy, True, but numarray can be of help. > but I couldn't help posting it since many people here deal with > numbers. I have a question that requires a bit of explanation. 
I > would highly appreciate it if anyone could read this and offer any > suggestions, whenever time permits. > > I'm trying to write a program that 1) gives all possible rotations of > an ordered list, 2) chooses the ordering that has the smallest > difference from first to last element of the rotation, and 3) > continues to compare the difference from first to second-to-last > element, and so on, if there was a tie in step 2. > > The following is the output of a function I wrote. The first 6 lines > are all possible rotations of [0,1,3,6,7,10], and this takes care of > step 1 mentioned above. The last line provides the differences (mod > 12). If the last line were denoted as r, r[0] lists the differences > from first to last element of each rotation (p0 through p5), r[1] the > differences from first to second-to-last element, and so on. > > >>> from normal import normal > >>> normal([0,1,3,6,7,10]) > [0, 1, 3, 6, 7, 10] #p0 > [1, 3, 6, 7, 10, 0] #p1 > [3, 6, 7, 10, 0, 1] #p2 > [6, 7, 10, 0, 1, 3] #p3 > [7, 10, 0, 1, 3, 6] #p4 > [10, 0, 1, 3, 6, 7] #p5 > > [[10, 11, 10, 9, 11, 9], [7, 9, 9, 7, 8, 8], [6, 6, 7, 6, 6, 5], [3, > 5, 4, 4, 5, 3], [1, 2, 3, 1, 3, 2]] #r > > Here is my question. I'm having trouble realizing step 2 (and 3, if > necessary). In the above case, the smallest number in r[0] is 9, > which is present in both r[0][3] and r[0][5]. This means that p3 and > p5 and only p3 and p5 need to be further compared. r[1][3] is 7, and > r[1][5] is 8, so the comparison ends here, and the final result I'm > looking for is p3, [6,7,10,0,1,3] (the final 'n' value for 'pn' > corresponds to the final 'y' value for 'r[x][y]'). > > How would I find the smallest values of a list r[0], take only those > values (r[0][3] and r[0][5]) for further comparison (r[1][3] and > r[1][5]), and finally print a p3? > > Thanks again for reading this. If there is anything unclear, please > let me know. 
> Best,
> Kye
>
> My code begins here:
[snip]

The following reproduces your result, but I'm not sure that it does what you want to do. Best wishes. Colin W.

# Kye.py
#normal.py
def normal(s):
    s.sort()
    r = []
    q = []
    v = []

    for x in range(0, len(s)):
        k = s[x:]+s[0:x]
        r.append(k)

    for y in range(0, len(s)):
        print r[y], '\t'
        d = []
        for yy in range(len(s)-1, 0, -1):
            w = (r[y][yy]-r[y][0])%12
            d.append(w)
        q.append(d)

    for z in range(0, len(s)-1):
        d = []
        for zz in range(0, len(s)):
            w = q[zz][z]
            d.append(w)
        v.append(d)
    print '\n', v

def findMinima(i, lst):
    global diff
    print 'lst:', lst, 'i:', i
    res= []
    dataRow= diff[i].take(lst)
    fnd= dataRow.argmin()
    val= val0= dataRow[fnd]
    while val == val0:
        fndRes= lst[fnd]    # This will become the result iff no duplicate found
        res.append(fnd)
        dataRow[fnd]= 100
        fnd= dataRow.argmin()
        val0= dataRow[fnd]
    if len(res) == 1:
        return fndRes
    else:
        ret= findMinima(i-1, res)
        return ret

def normal1(s):
    import numarray.numarraycore as _num
    import numarray.numerictypes as _nt
    global diff
    s= _num.array(s)
    s.sort()
    rl= len(s)
    r= _num.zeros(shape= (rl, rl), type= _nt.Int)
    for i in range(rl):
        r[i, 0:rl-i]= s[i:]
        if i:
            r[i, rl-i:]= s[0:i]
    subtr= r[0].repeat(5, 1).resize(6, 5)
    subtr.transpose()
    neg= r[1:] < subtr
    diff= r[1:]-subtr + 12 * neg
    return 'The selected rotation is:', r[findMinima(diff._shape[0]-1, range(diff._shape[1]))]

if __name__ == '__main__':
    print normal1([0,1,3,6,7,10])

> #normal.py
> def normal(s):
>     s.sort()
>     r = []
>     q = []
>     v = []
>
>     for x in range(0, len(s)):
>         k = s[x:]+s[0:x]
>         r.append(k)
>
>     for y in range(0, len(s)):
>         print r[y], '\t'
>         d = []
>         for yy in range(len(s)-1, 0, -1):
>             w = (r[y][yy]-r[y][0])%12
>             d.append(w)
>         q.append(d)
>
>     for z in range(0, len(s)-1):
>         d = []
>         for zz in range(0, len(s)):
>             w = q[zz][z]
>             d.append(w)
>         v.append(d)
>     print '\n', v
>
> -------------------------------------------------------
> This SF.Net email is sponsored by BEA Weblogic Workshop
> FREE Java Enterprise J2EE developer tools!
> Get your free copy of BEA WebLogic Workshop 8.1 today. > http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From riiuwjjnivge at yahoo.com Sat Jul 24 08:38:04 2004 From: riiuwjjnivge at yahoo.com (riiuwjjnivge at yahoo.com) Date: Sat Jul 24 08:38:04 2004 Subject: [Numpy-discussion] Hot Stock Newsflash, ARMM expecting Mass|ve M0nday Ga1ns R753KT98 Message-ID: <249974lbl4oi11j$1so1q6g39$95a678wba@airmen.yahoo.com> E.fficiency Technologies, Inc.'s New Centrif.ugal Chiller Efficiency and Management Tool Can He.lp S.ave Industry Bi.llions in Energy C.osts ARMM lau.nch n.ew s.ervice (EffHVAC) D.ont miss this g.reat inves.tment issue! ARMM is another ho.t public tr.aded comp.any that is set to so.ar on Monday, July 26th.. BIG PR camp.aign sta.rting on 26th of July for ARMM - S.t0ck will e.xpl0de - Just read the news --------------------- P.rice on Friday: 10Cents In our o.pinion N.ext 3 days p.otential p.rice: 35Cents In our o.pinion N.ext 10 days p.otential p.rice: 45Cents --------------------- G.et on B.oard with ARMM and e.njoy some i.ncredible p.rofits in the n.ext 3-10 days_!_! ALL T.ECHNICAL I.NDICATORS SAY - B.U.Y ARMM @ up to 35cents! Significant short term t.rading p.rofits in ARMM are being p.redicted, great n.ews a.lready issued by the c.ompany and big PR c.ampaign on the way in the n.ext few days. C.OMPANY P.ROFILE --------------> American Resource Management, Inc., through its w.holly-owned s.ubsidiary, E.fficiency T.echnologies, Inc. ("EffTec") is a Tulsa, Oklahoma based c.ompany d.edicated to developing energy efficiency m.onitoring programs for c.ommercial/i.ndustrial HVAC systems principally made up of c.entrifugal chillers and boilers. 
Centrifugal chillers are the single largest energy-using components in most facilities and can typically consume more than 50% of the total electrical usage. Centrifugal chillers running inefficiently result in substantially higher e.nergy c.osts, decreased equipment reliability and shortened l.ifespan. EffTec has developed a p.owerful, easy-to-use, online d.iagnostic s.ervice called EffHVAC that gives f.acilities the a.bility to document, m.onitor, e.valuate and m.anage c.entrifugal c.hiller system p.erformance. EffHVAC c.reated detailed reports that contain a w.ealth of i.nformation that can be used to improve operations and save t.housands of d.ollars in u.tility c.osts. EffTec offers c.omprehensive and f.lexible HVAC consulting and training. Our t.eam consists of industry-recognized e.xperts in HVAC system design, efficiency, preventive and proactive maintenance, repair, chemistry, computer programming and m.arketing. Combine EffHVAC with our consulting services and start d.eveloping a w.orld-class HVAC program to improve your b.ottom line. Inform.ation within this email contains "f.orward look.ing state.ments" within the meaning of Sect.ion 27A of the Sec.urities Ac.t of 1933 and Sect.ion 21B of the Securit.ies Exc.hange Ac.t of 1934. Any stat.ements that express or involve discu.ssions with resp.ect to pre.dictions, goa.ls, expec.tations, be.liefs, pl.ans, proje.ctions, object.ives, assu.mptions or fut.ure eve.nts or perform.ance are not stat.ements of histo.rical fact and may be "forw.ard loo.king stat.ements." For.ward looking state.ments are based on expect.ations, estim.ates and project.ions at the time the statem.ents are made that involve a number of risks and uncertainties which could cause actual results or events to differ materially from those prese.ntly anticipated. 
Forward look.ing statements in this action may be identified through the use of words su.ch as: "pro.jects", "for.esee", "expects", "est.imates," "be.lieves," "underst.ands" "wil.l," "part of: "anticip.ates," or that by stat.ements indi.cating certain actions "may," "cou.ld," or "might" occur. All information provided within this em.ail pertai.ning to inv.esting, st.ocks, securi.ties must be under.stood as informa.tion provided and not investm.ent advice. Eme.rging Equity Al.ert advi.ses all re.aders and subscrib.ers to seek advice from a registered profe.ssional secu.rities represent.ative before dec.iding to trade in sto.cks featured within this ema.il. None of the mate.rial within this rep.ort shall be constr.ued as any kind of invest.ment advi.ce. Please have in mind that the interpr.etation of the witer of this newsl.etter about the news published by the company does not represent the com.pany official sta.tement and in fact may differ from the real meaning of what the news rele.ase meant to say. Please read the news release by your.self and judge by yourself about the detai.ls in it. In compli.ance with Sec.tion 17(b), we discl.ose the hol.ding of ARMM s.hares prior to the publi.cation of this report. Be aware of an inher.ent co.nflict of interest res.ulting from such holdi.ngs due to our intent to pro.fit from the liqui.dation of these shares. Sh.ares may be s.old at any time, even after posi.tive state.ments have been made regard.ing the above company. Since we own sh.ares, there is an inher.ent conf.lict of inte.rest in our statem.ents and opin.ions. Readers of this publi.cation are cauti.oned not to place und.ue relia.nce on forw.ard-looki.ng statements, which are based on certain assump.tions and expectati.ons invo.lving various risks and uncert.ainties, that could cause results to differ materi..ally from those set forth in the forw.ard- looking state.ments. 
Please be advi.sed that noth.ing within this em.ail shall cons.titute a solic.itation or an offer to buy or sell any s.ecurity menti.oned her.ein. This news.letter is neither a regi.stered inves.tment ad.visor nor affil.iated with any brok.er or dealer. All statements made are our e.xpress o.pinion only and should be treated as such. We may own, buy and sell any securi.ties menti.oned at any time. This r.eport includes forw.ard-looki.ng stat.ements within the meaning of The Pri.vate Securi.ties Litig.ation Ref.orm Ac.t of 1995. These state.ments may include terms as "expe.ct", "bel.ieve", "ma.y", "wi.ll", "mo.ve","und.ervalued" and "inte.nd" or simil.ar terms. This news.letter was paid 11500 dollars from th.ird p.arty to se.nd this report. PL.EASE DO YOUR OWN D.UE DI.LIGENCE B.EFORE INVES.TING IN ANY PRO.FILED COMP.ANY. You may lo.se mon.ey from inve.sting in Pen.ny St.ocks. A_RM_M - our NEW stck pick - GREAT N.EWS V650OE49 >A.RMM - our NEW s_t_0_c_k p1ck = GREAT N_E_WS V3501136 NnnEW St_ock Pick - Hug.e Mon-day - /ArMm\ m468MV68 NewW Stoc-k Pick + Hug.e Mon-day - ArMm = Earn_1ngs 1497cJ72 Mas_sive G.a1ns - F0r-casted For Mond#y g984iJ69 Monday F0rcaSST is A>R.M.M - Read & Earnn Z8697B79 In-Creased Earn-ings Report - AR-MM - For Monday Morning l547BH81 EX PLO SIVE Gain-s - ALERT for MONDAY T288xC38 NewsWire - Double your Monday Earn>ings! q664qv16 A,L,E,R,T - A>R>M>M- This st0ck is h0t - They announced great news l993L941 A>RM is about to EXPL0DE - A c t n_o_w Z484TE26 - Ma-jor TradeeE Al_ert! !e330vH15 1O to 2O cent in=crease monday. Ma_jor ALer.t. c8620c55 New P1ck Bownd to Dou_ble & Tri_ple. A.R/M.M.. 
I942qD93 B1gGa1ns For-M0nday = (2X)Double Your Pr0fits!y747s506 UpCOMING Mondays Hot/test St O CK {2x} PROF!TS L572lS00 Get Ins1ders SEcrEt_s - A|R|M|M Sets to Expl0de U812Jb41 Ab0ut To Expl0de - y142qK13 Hot Stock Newsflash, ARMM expecting Mass|ve M0nday Ga1ns 7074WE36 M0nday Ga1ns, *ARMM*, St0ck NewsW1re g504mo93 {3x} Ur m0nDay Pr0FITS - A\R\M\M w433T229 Break.ing New.s for ARM.M - American Resource Management, Inc. E.fficiency Technologies, Inc.'s New Centrif.ugal Chiller Efficiency and Management Tool Can He.lp S.ave Industry Bi.llions in Energy C.osts ARMM lau.nch n.ew s.ervice (EffHVAC) D.ont miss this g.reat inves.tment issue! ARMM is another ho.t public tr.aded comp.any that is set to so.ar on Monday, July 26th.. BIG PR camp.aign sta.rting on 26th of July for ARMM - S.t0ck will e.xpl0de - Just read the news --------------------- P.rice on Friday: 10Cents In our o.pinion N.ext 3 days p.otential p.rice: 35Cents In our o.pinion N.ext 10 days p.otential p.rice: 45Cents --------------------- G.et on B.oard with ARMM and e.njoy some i.ncredible p.rofits in the n.ext 3-10 days_!_! ALL T.ECHNICAL I.NDICATORS SAY - B.U.Y ARMM @ up to 35cents! Significant short term t.rading p.rofits in ARMM are being p.redicted, great n.ews a.lready issued by the c.ompany and big PR c.ampaign on the way in the n.ext few days. C.OMPANY P.ROFILE --------------> American Resource Management, Inc., through its w.holly-owned s.ubsidiary, E.fficiency T.echnologies, Inc. ("EffTec") is a Tulsa, Oklahoma based c.ompany d.edicated to developing energy efficiency m.onitoring programs for c.ommercial/i.ndustrial HVAC systems principally made up of c.entrifugal chillers and boilers. Centrifugal chillers are the single largest energy-using components in most facilities and can typically consume more than 50% of the total electrical usage. Centrifugal chillers running inefficiently result in substantially higher e.nergy c.osts, decreased equipment reliability and shortened l.ifespan. 
EffTec has developed a p.owerful, easy-to-use, online d.iagnostic s.ervice called EffHVAC that gives f.acilities the a.bility to document, m.onitor, e.valuate and m.anage c.entrifugal c.hiller system p.erformance. EffHVAC c.reated detailed reports that contain a w.ealth of i.nformation that can be used to improve operations and save t.housands of d.ollars in u.tility c.osts. EffTec offers c.omprehensive and f.lexible HVAC consulting and training. Our t.eam consists of industry-recognized e.xperts in HVAC system design, efficiency, preventive and proactive maintenance, repair, chemistry, computer programming and m.arketing. Combine EffHVAC with our consulting services and start d.eveloping a w.orld-class HVAC program to improve your b.ottom line. Inform.ation within this email contains "f.orward look.ing state.ments" within the meaning of Sect.ion 27A of the Sec.urities Ac.t of 1933 and Sect.ion 21B of the Securit.ies Exc.hange Ac.t of 1934. Any stat.ements that express or involve discu.ssions with resp.ect to pre.dictions, goa.ls, expec.tations, be.liefs, pl.ans, proje.ctions, object.ives, assu.mptions or fut.ure eve.nts or perform.ance are not stat.ements of histo.rical fact and may be "forw.ard loo.king stat.ements." For.ward looking state.ments are based on expect.ations, estim.ates and project.ions at the time the statem.ents are made that involve a number of risks and uncertainties which could cause actual results or events to differ materially from those prese.ntly anticipated. Forward look.ing statements in this action may be identified through the use of words su.ch as: "pro.jects", "for.esee", "expects", "est.imates," "be.lieves," "underst.ands" "wil.l," "part of: "anticip.ates," or that by stat.ements indi.cating certain actions "may," "cou.ld," or "might" occur. All information provided within this em.ail pertai.ning to inv.esting, st.ocks, securi.ties must be under.stood as informa.tion provided and not investm.ent advice. 
Eme.rging Equity Al.ert advi.ses all re.aders and subscrib.ers to seek advice from a registered profe.ssional secu.rities represent.ative before dec.iding to trade in sto.cks featured within this ema.il. None of the mate.rial within this rep.ort shall be constr.ued as any kind of invest.ment advi.ce. Please have in mind that the interpr.etation of the witer of this newsl.etter about the news published by the company does not represent the com.pany official sta.tement and in fact may differ from the real meaning of what the news rele.ase meant to say. Please read the news release by your.self and judge by yourself about the detai.ls in it. In compli.ance with Sec.tion 17(b), we discl.ose the hol.ding of ARMM s.hares prior to the publi.cation of this report. Be aware of an inher.ent co.nflict of interest res.ulting from such holdi.ngs due to our intent to pro.fit from the liqui.dation of these shares. Sh.ares may be s.old at any time, even after posi.tive state.ments have been made regard.ing the above company. Since we own sh.ares, there is an inher.ent conf.lict of inte.rest in our statem.ents and opin.ions. Readers of this publi.cation are cauti.oned not to place und.ue relia.nce on forw.ard-looki.ng statements, which are based on certain assump.tions and expectati.ons invo.lving various risks and uncert.ainties, that could cause results to differ materi..ally from those set forth in the forw.ard- looking state.ments. Please be advi.sed that noth.ing within this em.ail shall cons.titute a solic.itation or an offer to buy or sell any s.ecurity menti.oned her.ein. This news.letter is neither a regi.stered inves.tment ad.visor nor affil.iated with any brok.er or dealer. All statements made are our e.xpress o.pinion only and should be treated as such. We may own, buy and sell any securi.ties menti.oned at any time. This r.eport includes forw.ard-looki.ng stat.ements within the meaning of The Pri.vate Securi.ties Litig.ation Ref.orm Ac.t of 1995. 
These state.ments may include terms as "expe.ct", "bel.ieve", "ma.y", "wi.ll", "mo.ve","und.ervalued" and "inte.nd" or simil.ar terms. This news.letter was paid 11500 dollars from th.ird p.arty to se.nd this report. PL.EASE DO YOUR OWN D.UE DI.LIGENCE B.EFORE INVES.TING IN ANY PRO.FILED COMP.ANY. You may lo.se mon.ey from inve.sting in Pen.ny St.ocks. barycentric deform conservator cacophony critter addison armament complain difluoride boris discriminatory boron abo deoxyribose boorish compote belfast carolingian court albania accentuate belshazzar bridesmaid breakwater brandish average bolshevism coppery
Readers of this publi.cation are cauti.oned not to place und.ue relia.nce on forw.ard-looki.ng statements, which are based on certain assump.tions and expectati.ons invo.lving various risks and uncert.ainties, that could cause results to differ materi..ally from those set forth in the forw.ard- looking state.ments. Please be advi.sed that noth.ing within this em.ail shall cons.titute a solic.itation or an offer to buy or sell any s.ecurity menti.oned her.ein. This news.letter is neither a regi.stered inves.tment ad.visor nor affil.iated with any brok.er or dealer. All statements made are our e.xpress o.pinion only and should be treated as such. We may own, buy and sell any securi.ties menti.oned at any time. This r.eport includes forw.ard-looki.ng stat.ements within the meaning of The Pri.vate Securi.ties Litig.ation Ref.orm Ac.t of 1995. These state.ments may include terms as "expe.ct", "bel.ieve", "ma.y", "wi.ll", "mo.ve","und.ervalued" and "inte.nd" or simil.ar terms. This news.letter was paid 11500 dollars from th.ird p.arty to se.nd this report. PL.EASE DO YOUR OWN D.UE DI.LIGENCE B.EFORE INVES.TING IN ANY PRO.FILED COMP.ANY. You may lo.se mon.ey from inve.sting in Pen.ny St.ocks. A_RM_M - our NEW stck pick - GREAT N.EWS V650OE49 >A.RMM - our NEW s_t_0_c_k p1ck = GREAT N_E_WS V3501136 NnnEW St_ock Pick - Hug.e Mon-day - /ArMm\ m468MV68 NewW Stoc-k Pick + Hug.e Mon-day - ArMm = Earn_1ngs 1497cJ72 Mas_sive G.a1ns - F0r-casted For Mond#y g984iJ69 Monday F0rcaSST is A>R.M.M - Read & Earnn Z8697B79 In-Creased Earn-ings Report - AR-MM - For Monday Morning l547BH81 EX PLO SIVE Gain-s - ALERT for MONDAY T288xC38 NewsWire - Double your Monday Earn>ings! q664qv16 A,L,E,R,T - A>R>M>M- This st0ck is h0t - They announced great news l993L941 A>RM is about to EXPL0DE - A c t n_o_w Z484TE26 - Ma-jor TradeeE Al_ert! !e330vH15 1O to 2O cent in=crease monday. Ma_jor ALer.t. c8620c55 New P1ck Bownd to Dou_ble & Tri_ple. A.R/M.M.. 
I942qD93 B1gGa1ns For-M0nday = (2X)Double Your Pr0fits!y747s506 UpCOMING Mondays Hot/test St O CK {2x} PROF!TS L572lS00 Get Ins1ders SEcrEt_s - A|R|M|M Sets to Expl0de U812Jb41 Ab0ut To Expl0de - y142qK13 Hot Stock Newsflash, ARMM expecting Mass|ve M0nday Ga1ns 7074WE36 M0nday Ga1ns, *ARMM*, St0ck NewsW1re g504mo93 {3x} Ur m0nDay Pr0FITS - A\R\M\M w433T229 Break.ing New.s for ARM.M - American Resource Management, Inc. E.fficiency Technologies, Inc.'s New Centrif.ugal Chiller Efficiency and Management Tool Can He.lp S.ave Industry Bi.llions in Energy C.osts ARMM lau.nch n.ew s.ervice (EffHVAC) D.ont miss this g.reat inves.tment issue! ARMM is another ho.t public tr.aded comp.any that is set to so.ar on Monday, July 26th.. BIG PR camp.aign sta.rting on 26th of July for ARMM - S.t0ck will e.xpl0de - Just read the news --------------------- P.rice on Friday: 10Cents In our o.pinion N.ext 3 days p.otential p.rice: 35Cents In our o.pinion N.ext 10 days p.otential p.rice: 45Cents --------------------- G.et on B.oard with ARMM and e.njoy some i.ncredible p.rofits in the n.ext 3-10 days_!_! ALL T.ECHNICAL I.NDICATORS SAY - B.U.Y ARMM @ up to 35cents! Significant short term t.rading p.rofits in ARMM are being p.redicted, great n.ews a.lready issued by the c.ompany and big PR c.ampaign on the way in the n.ext few days. C.OMPANY P.ROFILE --------------> American Resource Management, Inc., through its w.holly-owned s.ubsidiary, E.fficiency T.echnologies, Inc. ("EffTec") is a Tulsa, Oklahoma based c.ompany d.edicated to developing energy efficiency m.onitoring programs for c.ommercial/i.ndustrial HVAC systems principally made up of c.entrifugal chillers and boilers. Centrifugal chillers are the single largest energy-using components in most facilities and can typically consume more than 50% of the total electrical usage. Centrifugal chillers running inefficiently result in substantially higher e.nergy c.osts, decreased equipment reliability and shortened l.ifespan. 
EffTec has developed a p.owerful, easy-to-use, online d.iagnostic s.ervice called EffHVAC that gives f.acilities the a.bility to document, m.onitor, e.valuate and m.anage c.entrifugal c.hiller system p.erformance. EffHVAC c.reated detailed reports that contain a w.ealth of i.nformation that can be used to improve operations and save t.housands of d.ollars in u.tility c.osts. EffTec offers c.omprehensive and f.lexible HVAC consulting and training. Our t.eam consists of industry-recognized e.xperts in HVAC system design, efficiency, preventive and proactive maintenance, repair, chemistry, computer programming and m.arketing. Combine EffHVAC with our consulting services and start d.eveloping a w.orld-class HVAC program to improve your b.ottom line. Inform.ation within this email contains "f.orward look.ing state.ments" within the meaning of Sect.ion 27A of the Sec.urities Ac.t of 1933 and Sect.ion 21B of the Securit.ies Exc.hange Ac.t of 1934. Any stat.ements that express or involve discu.ssions with resp.ect to pre.dictions, goa.ls, expec.tations, be.liefs, pl.ans, proje.ctions, object.ives, assu.mptions or fut.ure eve.nts or perform.ance are not stat.ements of histo.rical fact and may be "forw.ard loo.king stat.ements." For.ward looking state.ments are based on expect.ations, estim.ates and project.ions at the time the statem.ents are made that involve a number of risks and uncertainties which could cause actual results or events to differ materially from those prese.ntly anticipated. Forward look.ing statements in this action may be identified through the use of words su.ch as: "pro.jects", "for.esee", "expects", "est.imates," "be.lieves," "underst.ands" "wil.l," "part of: "anticip.ates," or that by stat.ements indi.cating certain actions "may," "cou.ld," or "might" occur. All information provided within this em.ail pertai.ning to inv.esting, st.ocks, securi.ties must be under.stood as informa.tion provided and not investm.ent advice. 
Eme.rging Equity Al.ert advi.ses all re.aders and subscrib.ers to seek advice from a registered profe.ssional secu.rities represent.ative before dec.iding to trade in sto.cks featured within this ema.il. None of the mate.rial within this rep.ort shall be constr.ued as any kind of invest.ment advi.ce. Please have in mind that the interpr.etation of the witer of this newsl.etter about the news published by the company does not represent the com.pany official sta.tement and in fact may differ from the real meaning of what the news rele.ase meant to say. Please read the news release by your.self and judge by yourself about the detai.ls in it. In compli.ance with Sec.tion 17(b), we discl.ose the hol.ding of ARMM s.hares prior to the publi.cation of this report. Be aware of an inher.ent co.nflict of interest res.ulting from such holdi.ngs due to our intent to pro.fit from the liqui.dation of these shares. Sh.ares may be s.old at any time, even after posi.tive state.ments have been made regard.ing the above company. Since we own sh.ares, there is an inher.ent conf.lict of inte.rest in our statem.ents and opin.ions. Readers of this publi.cation are cauti.oned not to place und.ue relia.nce on forw.ard-looki.ng statements, which are based on certain assump.tions and expectati.ons invo.lving various risks and uncert.ainties, that could cause results to differ materi..ally from those set forth in the forw.ard- looking state.ments. Please be advi.sed that noth.ing within this em.ail shall cons.titute a solic.itation or an offer to buy or sell any s.ecurity menti.oned her.ein. This news.letter is neither a regi.stered inves.tment ad.visor nor affil.iated with any brok.er or dealer. All statements made are our e.xpress o.pinion only and should be treated as such. We may own, buy and sell any securi.ties menti.oned at any time. This r.eport includes forw.ard-looki.ng stat.ements within the meaning of The Pri.vate Securi.ties Litig.ation Ref.orm Ac.t of 1995. 
From kyeser at earthlink.net Sun Jul 25 04:25:14 2004 From: kyeser at earthlink.net (Hee-Seng Kye) Date: Sun Jul 25 04:25:14 2004 Subject: [Numpy-discussion] Permutation in Numpy Message-ID: <3DC9B4D2-DE2D-11D8-A7E1-000393479EE8@earthlink.net>

#perm.py
def perm(k):
    # Compute the list of all permutations of k
    if len(k) <= 1:
        return [k]
    r = []
    for i in range(len(k)):
        s = k[:i] + k[i+1:]
        p = perm(s)
        for x in p:
            r.append(k[i:i+1] + x)
    return r

Does anyone know if there is a built-in function in Numpy (or Numarray) that does the above task faster (computes the list of all permutations of a list, k)? Or is there a way to make the above function run faster using Numpy? I'm asking because I need to create a very large list which contains all permutations of range(12), in which case there would be 12! permutations. I created a file test.py:

#!/usr/bin/env python
from perm import perm
print perm(range(12))

And ran the program:

$ ./test.py >> list.txt

The program ran for about 90 minutes and was still running on my machine (667 MHz PowerPC G4, 512 MB SDRAM) until I quit the process as I was getting nervous (and impatient). I would highly appreciate anyone's suggestions.
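For scale: 12! = 479,001,600 permutations, far too many to build as a single in-memory list on a 512 MB machine, which is why the run never finishes. A lazy generator sidesteps the memory problem. The sketch below uses itertools.permutations, which was only added to the standard library in Python 2.6 (after this thread), so treat it as a present-day alternative rather than an answer available to the original poster:

```python
from itertools import islice, permutations

def perm_lazy(k):
    """Yield the permutations of k one at a time instead of
    materializing all 12! = 479,001,600 of them as one list."""
    for p in permutations(k):
        yield list(p)

# Sanity check on a small input: 4! = 24 permutations.
perms4 = list(perm_lazy(range(4)))
print(len(perms4))       # 24
print(perms4[0])         # [0, 1, 2, 3]

# For range(12), pull only what you actually need:
first_three = list(islice(perm_lazy(range(12)), 3))
print(first_three[0])    # [0, 1, 2, ..., 11]
```

Writing each permutation to a file as it is produced, instead of printing the whole list, keeps memory flat no matter how large the input.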
Many thanks, Kye From gerard.vermeulen at grenoble.cnrs.fr Sun Jul 25 22:49:12 2004 From: gerard.vermeulen at grenoble.cnrs.fr (gerard.vermeulen at grenoble.cnrs.fr) Date: Sun Jul 25 22:49:12 2004 Subject: [Numpy-discussion] Follow-up Numarray header PEP In-Reply-To: <1090603727.7138.33.camel@halloween.stsci.edu> References: <1088451653.3744.200.camel@localhost.localdomain> <20040629194456.44a1fa7f.gerard.vermeulen@grenoble.cnrs.fr> <1088536183.17789.346.camel@halloween.stsci.edu> <20040629211800.M55753@grenoble.cnrs.fr> <1088632459.7526.213.camel@halloween.stsci.edu> <20040718212443.M21561@grenoble.cnrs.fr> <1090603727.7138.33.camel@halloween.stsci.edu> Message-ID: <20040726050416.M83815@grenoble.cnrs.fr> Hi Todd, Attached is a new version of numnum (including 'topbot', an alternative implementation of numnum). The README contains some additional comments with respect to numarray and Numeric (new comments are preceded by '+', old comments by '-'). There were still some other bugs in numnum, too. On 23 Jul 2004 13:28:47 -0400, Todd Miller wrote > I finally got to your numnum stuff today... awesome work! You've got > lots of good suggestions. Here are some comments: > > 1. Thanks for catching the early return problem with numarray's > import_array(). It's not just bad, it's wrong. It'll be fixed for 1.1. > > 2. That said, I think expanding the macros in-line in numnum is a > mistake. It seems to me that "import_array(); PyErr_Clear();" or > something like it ought to be enough... after numarray-1.1 anyway. > Indeed, but I am spoiled by C++ and was falling back on gcc -E for debugging. > > 3. I think there's a problem in numnum.toNP() because of numarray's > array "behavior" issues. A test needs to be done to ensure that the > incoming array is not byteswapped or misaligned; if it is, the easy > fix is to make a numarray copy of the array before copying it to Numeric. > Done, but what would be the best function to do this?
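Point 3 above (copy ill-behaved arrays before handing them to Numeric) can be captured in a few lines. This is a hedged sketch: the predicate names (isbyteswapped/isaligned/iscontiguous) are quoted from memory of the numarray Python API, not from this thread, and the stand-in class exists only so the sketch runs without numarray installed. The key fact is that a fresh copy() is aligned, native-endian and contiguous by construction:

```python
def well_behaved(a):
    """Return `a` itself if it is safe to hand to Numeric,
    otherwise a normalized copy of it."""
    if a.isbyteswapped() or not a.isaligned() or not a.iscontiguous():
        return a.copy()  # copy() normalizes all three properties
    return a

# Minimal stand-in for a numarray array, for illustration only.
class FakeArray:
    def __init__(self, swapped=False, copied=False):
        self.swapped, self.copied = swapped, copied
    def isbyteswapped(self):
        return self.swapped
    def isaligned(self):
        return True
    def iscontiguous(self):
        return True
    def copy(self):
        return FakeArray(swapped=False, copied=True)

print(well_behaved(FakeArray()).copied)              # False: passed through
print(well_behaved(FakeArray(swapped=True)).copied)  # True: a copy was made
```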
And the documentation could insist a little more on the possibility of ill-behaved arrays (see README). > > 4. Kudos for the LP64 stuff. numconfig is a thorn in the side of the > PEP, so I'll put your techniques into numarray for 1.1. > HAS_FLOAT128 is not currently used, so it might be time to ditch > it. Anyway, thanks! > There is a difference between the PEP header files and internal numarray usage. I find in my CVS working copy:

[packer at slow numarray]$ grep HAS_FLOAT */*
Src/_ndarraymodule.c:#if HAS_FLOAT128

and

[packer at slow numarray]$ grep HAS_UINT64 */*
Src/buffer.ch: #if HAS_UINT64
Src/buffer.ch: #if HAS_UINT64
Src/buffer.ch: #if HAS_UINT64
Src/buffer.ch: #if HAS_UINT64
Src/buffer.ch: #if HAS_UINT64
Src/libnumarraymodule.c: #if HAS_UINT64
Src/libnumarraymodule.c: #if HAS_UINT64
Src/libnumarraymodule.c: #if HAS_UINT64
Src/libnumarraymodule.c: #if HAS_UINT64
Src/libnumarraymodule.c: #if HAS_UINT64

but that is not true for the header files (more important for the PEP):

[packer at slow Include]$ grep HAS_UINT64 */*
[packer at slow Include]$ grep HAS_FLOAT128 */*
numarray/arraybase.h:#if HAS_FLOAT128

> > 5. PyArray_Present() and isArray() are superfluous *now*. I was > planning to add them to Numeric. > > 6. The LGPL may be a problem for us and is probably an issue if we ever > try to get numnum into the Python distribution. It would be better > to release numnum under the modified BSD license, same as numarray. > Done, with certain regrets because I believe in (L)GPL. The minutes of the last board meeting of the PSF tipped the scale ( http://www.python.org/psf/records/board/minutes-2004-06-18.html ) What remains to be done is showing how to add numnum's functionality to a 3rd party extension by linking numnum's object files to the extension instead of importing numnum's C-API (numnum should not become another dependency) Gerard > > 7. Your API struct was very clean. Eventually I'll regenerate numarray > like that. > > 8.
I logged your comments and bug reports on Source Forge and eventually > they'll get fixed. > > A to Z the numnum/pep code is beautiful. Next stop, header PEP update. > > Regards, > Todd > > > On Sun, 2004-07-18 at 17:24, gerard.vermeulen at grenoble.cnrs.fr wrote: > > Hi Todd, > > > > This is a follow-up on the 'header pep' discussion. > > > > The attachment numnum-0.1.tar.gz contains the sources for the > > extension modules pep and numnum. At least on my systems, both > > modules behave as described in the 'numarray header PEP' when the > > extension modules implementing the C-API are not present (a situation > > not foreseen by the macros import_array() of Numeric and especially > > numarray). IMO, my solution is 'bona fide', but requires further > > testing. > > > > The pep module shows how to handle the colliding C-APIs of the Numeric > > and numarray extension modules and how to implement automagical > > conversion between Numeric and numarray arrays. > > > > For a technical reason explained in the README, the hard work of doing > > the conversion between Numeric and numarray arrays has been delegated > > to the numnum module. The numnum module is useful when one needs to > > convert from one array type to the other to use an extension module > > which only exists for the other type (eg. combining numarray's image > > processing extensions with pygame's Numeric interface): > > > > Python 2.3+ (#1, Jan 7 2004, 09:17:35) > > [GCC 3.3.1 (SuSE Linux)] on linux2 > > Type "help", "copyright", "credits" or "license" for more information. 
> > >>> import numnum; import Numeric as np; import numarray as na
> > >>> np1 = np.array([[1, 2], [3, 4]]); na1 = numnum.toNA(np1)
> > >>> na2 = na.array([[1, 2, 3], [4, 5, 6]]); np2 = numnum.toNP(na2)
> > >>> print type(np1); np1; type(np2); np2
> >
> > array([[1, 2],
> >        [3, 4]])
> >
> > array([[1, 2, 3],
> >        [4, 5, 6]],'i')
> > >>> print type(na1); na1; type(na2); na2
> >
> > array([[1, 2],
> >        [3, 4]])
> >
> > array([[1, 2, 3],
> >        [4, 5, 6]])
> > >>>
> >
> > The pep module shows how to implement array processing functions which
> > use the Numeric, numarray or Sequence C-API:
> >
> > static PyObject *
> > wysiwyg(PyObject *dummy, PyObject *args)
> > {
> >     PyObject *seq1, *seq2;
> >     PyObject *result;
> >
> >     if (!PyArg_ParseTuple(args, "OO", &seq1, &seq2))
> >         return NULL;
> >
> >     switch(API) {
> >     case NumericAPI:
> >     {
> >         PyObject *np1 = NN_API->toNP(seq1);
> >         PyObject *np2 = NN_API->toNP(seq2);
> >         result = np_wysiwyg(np1, np2);
> >         Py_XDECREF(np1);
> >         Py_XDECREF(np2);
> >         break;
> >     }
> >     case NumarrayAPI:
> >     {
> >         PyObject *na1 = NN_API->toNA(seq1);
> >         PyObject *na2 = NN_API->toNA(seq2);
> >         result = na_wysiwyg(na1, na2);
> >         Py_XDECREF(na1);
> >         Py_XDECREF(na2);
> >         break;
> >     }
> >     case SequenceAPI:
> >         result = seq_wysiwyg(seq1, seq2);
> >         break;
> >     default:
> >         PyErr_SetString(PyExc_RuntimeError, "Should never happen");
> >         return 0;
> >     }
> >
> >     return result;
> > }
> >
> > See the README for an example session using the pep module showing that
> > it is possible to pass a mix of Numeric and numarray arrays to pep.wysiwyg().
> >
> > Notes:
> >
> > - it is straightforward to adapt pep and numnum so that the conversion
> >   functions are linked into pep instead of imported.
> >
> > - numnum is still 'proof of concept'. I am thinking about methods to
> >   make those techniques safer if the numarray (and Numeric?) header
> >   files never make it into the Python headers (or make it safer to
> >   use those techniques with Python < 2.4).
In particular it would > > be helpful if the numerical C-APIs export an API version number, > > similar to the versioning scheme of shared libraries -- see the > > libtool->versioning info pages. > > > > I am considering three possibilities to release a more polished > > version of numnum (3rd party extension writers may prefer to link > > rather than import numnum's functionality): > > > > 1. release it from PyQwt's project page > > 2. register an independent numnum project at SourceForge > > 3. hand numnum over to the Numerical Python project (frees me from > > worrying about API changes). > > > > > > Regards -- Gerard Vermeulen > > -- -- Open WebMail Project (http://openwebmail.org) -------------- next part -------------- A non-text attachment was scrubbed... Name: numnum-0.2.tar.gz Type: application/gzip Size: 19729 bytes Desc: not available URL: From perry at stsci.edu Mon Jul 26 08:44:06 2004 From: perry at stsci.edu (Perry Greenfield) Date: Mon Jul 26 08:44:06 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: <40FFB132.10103@sympatico.ca> Message-ID: I'll try to see if I can address all the comments raised (please let me know if I missed something). 1) Russell Owen asked that indexing by field name not be permitted for record arrays and at least one other agreed. Since it is easier to add something like this later rather than take it away, I'll go along with that. So while it will be possible to index a Record by field name, it won't be for record arrays. 2) Russell asked if it would be possible to specify the types of the fields using numarray/chararray type objects. Yes, it will. We will adopt Rick White's 2nd suggestion for handling fields that themselves are arrays, I.e., formats = (3,Int16), ((4,5), Float32) For a 1-d Int16 cell of shape (3,) and a 2-d Float32 cell of shape (4,5) The first suggestion ("formats = 3*(Int16,), 4*(5*(Float32,),)") will not be supported. 
While it is very suggestive, it does allow for inconsistent nestings that must be checked and rejected (what if someone supplies (Int16, Int16, Float32) as one of the fields?) which complicates the code. It doesn't read as well. 3) Russell also suggested nesting record arrays. This sort of capability is not being ruled out, but there isn't a chance we can devote resources to this any time soon (can anyone else?) 4) To address the suggestions of Russell and Francesc, I'm proposing that the current "field" method now become an object (callable to retain backward compatibility) that supports: a) indexing by name or number (just like Records) b) name to attribute mapping (with restrictions). So that this means 3 ways to do things! As far as attribute access goes, I simply do not want to throw arbitrary attributes into the main object itself. The use of field is comparatively clean since it has no other public attributes. Aside from mapping '_' into spaces, no other illegal attribute characters will be mapped. (The identifier/label suggestion by Colin Williams has some merit, but on the whole, I think it brings more baggage than benefit). The mapping algorithm is such that it tries to map the attribute to any field name that has either a ' ' or '_' in the place of '_' in the attribute name. While all '_' in the name will take precedence over any other match, there will be no guaranteed order for other cases (e.g., 'x_y z' vs 'x y_z' vs 'x y z'; though 'x_y_z' would be guaranteed to be selected for field.x_y_z if present) Note that the only real need to support indexing other than consistency is to support slices. Only slices for numerical indexing will be supported (and not initially). The callable syntax can support index arrays just as easily.
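The matching rule described above can be sketched in a few lines. This is a hypothetical helper, not the proposal's actual implementation: an attribute name matches the identically spelled field first, and otherwise any field whose name differs from it only by having ' ' where the attribute has '_':

```python
def resolve_field(attr, field_names):
    """Map an attribute name to a field name under the proposed rule."""
    if attr in field_names:           # exact (all '_') match takes precedence
        return attr
    for name in field_names:
        if name.replace(' ', '_') == attr:
            return name               # first hit wins; order not guaranteed
    raise AttributeError(attr)

print(resolve_field('home_address', ['age', 'home address']))  # 'home address'
print(resolve_field('x_y_z', ['x_y z', 'x_y_z']))              # 'x_y_z'
```

Note the ambiguity Perry points out: for field.x_y_z with fields 'x_y z', 'x y_z' and 'x y z' all present (and no 'x_y_z'), the loop returns whichever happens to come first.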
To summarize

Rarr.field.home_address
Rarr.field['home address']
Rarr.field('home address')

Will all work for a field named "home address" ************************************************ Any comments on these changes to the proposal? Are there those that are opposed to supporting attribute access? Thanks, Perry From rowen at u.washington.edu Mon Jul 26 09:40:06 2004 From: rowen at u.washington.edu (Russell E Owen) Date: Mon Jul 26 09:40:06 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: References: Message-ID: At 11:43 AM -0400 2004-07-26, Perry Greenfield wrote: >I'll try to see if I can address all the comments raised (please let me know >if I missed something). >...(nice proposal elided)... >Any comments on these changes to the proposal? Are there those that are >opposed to supporting attribute access? Overall this sounds great. However, I am still strongly against attribute access. Attributes are usually meant for names that are intrinsic to the design of an object, not to the user's "configuration" of the object. The name mapping proposal isn't bad (thank you for keeping it simple!), but it still feels like a kludge and it adds unnecessary clutter. Your explanation of these limitations was clear, but still, imagine putting that into the manual. It's a lot of "be careful of this" info. That's a red flag to me. Imagine all the folks who don't read carefully. Also imagine those who consider attribute access "the right way to do it" and so want to clean up the limitations. I think you'll see a steady stream of: "why can't I see my field..." "why can't you solve the collision problems" "why can't I use special character thus and so" I personally feel that when a feature is hard to document or adds strange limitations then it probably suggests a flawed design. In this case there is another mechanism that is more natural, has no funny corner cases, and is much more powerful.
Its only disadvantage is the need for typing 4 extra characters. Saving 4 characters is simply not sufficient reason to add this dubious feature. Before implementing attribute access I have two suggestions (which can be taken singly or together):

- Postpone the decision until after the rest of the proposal is implemented. See if folks are happy with the mechanisms that are available. I freely confess to hoping that momentum will then kill the idea.
- Discuss it on comp.lang.py. I'd like to see it aired more widely before being adopted. So far I've seen just a few voices for it and a few others against it. I realize it's not a democracy -- those who write the code get the final say. I also realize some folks will always want it, but that tension between simplicity and expressiveness is intrinsic to any language. If you add everything anybody wants you get a mess, and I want to avoid this mess while we still can.

I hope nobody takes offense. I certainly did not mean to imply that those who wish attribute access are inferior in any way. There are features of python I wish it had that will never occur. I honestly can see the appeal of attributes; I was in favor of them myself, early on. It adds an appealing expressiveness that makes some kind of code read more naturally. But I personally feel it has too many limitations and is unnecessary. Regards, -- Russell From falted at pytables.org Mon Jul 26 11:12:18 2004 From: falted at pytables.org (Francesc Alted) Date: Mon Jul 26 11:12:18 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: References: Message-ID: <200407262011.33067.falted@pytables.org> Hi, Perry, your last proposal sounds good to me. Just a couple of comments.
On Monday 26 July 2004 17:43, Perry Greenfield wrote: > 4) To address the suggestions of Russell and Francesc, I'm proposing that > the current "field" method now become an object (callable to retain backward > compatibility) that supports: > a) indexing by name or number (just like Records) > b) name to attribute mapping (with restrictions). > So that this means 3 ways to do things! As far as attribute access goes, I > simply do not want to throw arbitrary attributes into the main object > itself. The use of field is comparatively clean since it has no other > public attributes. Aside from mapping '_' into spaces, no other illegal > attribute characters will be mapped. (The identifier/label suggestion by > Colin Williams has some merit, but on the whole, I think it brings more > baggage than benefit). The mapping algorithm is such that it tries to map > the attribute to any field name that has either a ' ' or '_' in the place of > '_' in the attribute name. While all '_' in the name will take precedence > over any other match, there will be no guaranteed order for other cases > (e.g., 'x_y z' vs 'x y_z' vs 'x y z'; though 'x_y_z' would be guaranteed to > be selected for field.x_y_z if present) I guess that this mapping algorithm is weak enough to create some problems with special chars that are not supported. I'd prefer the dictionary/tuple of pairs mechanism in order to create a user-configured translation. I don't see the problem that Perry mentioned in an earlier message related to guaranteeing the persistence of such an object: we always have pickle, don't we? Or am I missing something?

> To summarize
>
> Rarr.field.home_address
> Rarr.field['home address']
> Rarr.field('home address')

Supporting Rarr.field['home address'] and Rarr.field('home address') at the same time sounds unnecessary to me. Moreover having a Rarr.field('home_address')[32] (for example) looks a bit strange, and I think Rarr.field['home_address'][32] would be better.
But I repeat, this is my personal feeling. I know that dropping support of __call__() in field will make the change backward incompatible, but perhaps now is a good time to define a better interface to the RecArray object. Another possibility may be to raise a deprecation warning for such a use for a couple of releases. Regards, -- Francesc Alted From barrett at stsci.edu Mon Jul 26 11:25:09 2004 From: barrett at stsci.edu (Paul Barrett) Date: Mon Jul 26 11:25:09 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: References: Message-ID: <41054B5E.8010801@stsci.edu> Russell E Owen wrote: > At 11:43 AM -0400 2004-07-26, Perry Greenfield wrote: > >> I'll try to see if I can address all the comments raised (please let >> me know >> if I missed something). >> ...(nice proposal elided)... >> Any comments on these changes to the proposal? Are there those that are >> opposed to supporting attribute access? > > > Overall this sounds great. > > However, I am still strongly against attribute access. > > Attributes are usually meant for names that are intrinsic to the design > of an object, not to the user's "configuration" of the object. The name > mapping proposal isn't bad (thank you for keeping it simple!), but it > still feels like a kludge and it adds unnecessary clutter. > > Your explanation of this limitations was clear, but still, imagine > putting that into the manual. It's a lot of "be careful of this" info. > That's a red flag to me. Imagine all the folks who don't read carefully. > Also imagine those who consider attribute access "the right way to do > it" and so want to clean up the limitations. I think you'll see a steady > stream of: > "why can't I see my field..." > "why can't you solve the collision problems" > "why can't I use special character thus and so" > > I personally feel that when a feature is hard to document or adds > strange limitations then it probably suggests a flawed design.
> > In this case there is another mechanism that is more natural, has no > funny corner cases, and is much more powerful. Its only disadvantage is > the need to type 4 extra characters. Saving 4 characters is simply > not sufficient reason to add this dubious feature. > > Before implementing attribute access I have two suggestions (which can > be taken singly or together): > - Postpone the decision until after the rest of the proposal is > implemented. See if folks are happy with the mechanisms that are > available. I freely confess to hoping that momentum will then kill the > idea. > - Discuss it on comp.lang.py. I'd like to see it aired more widely > before being adopted. So far I've seen just a few voices for it and a > few others against it. I realize it's not a democracy -- those who write > the code get the final say. I also realize some folks will always want > it, but that tension between simplicity and expressiveness is intrinsic > to any language. If you add everything anybody wants you get a mess, and > I want to avoid this mess while we still can. > > I hope nobody takes offense. I certainly did not mean to imply that > those who wish attribute access are inferior in any way. There are > features of python I wish it had that will never occur. I honestly can > see the appeal of attributes; I was in favor of them myself, early on. > It adds an appealing expressiveness that makes some kinds of code read > more naturally. But I personally feel it has too many limitations and is > unnecessary. That pretty much sums up my opinion.
:) -- Paul -- Paul Barrett, PhD Space Telescope Science Institute Phone: 410-338-4475 ESS/Science Software Branch FAX: 410-338-4767 Baltimore, MD 21218 From falted at pytables.org Mon Jul 26 11:29:19 2004 From: falted at pytables.org (Francesc Alted) Date: Mon Jul 26 11:29:19 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: References: Message-ID: <200407262028.41129.falted@pytables.org> A Dilluns 26 Juliol 2004 18:38, Russell E Owen va escriure: > In this case there is another mechanism that is more natural, has no Well, I guess that depends on what you understand as "natural". For example, for me the "natural" way is adding attributes. However, I must recognize that my point of view could be biased because this can be far more advantageous in the context of large hierarchies of objects where you must specify the complete path to get somewhere. This is typical of software for treating XML documents or any kind of hierarchical data organization system. For a relatively plain structure like RecArray I can understand that this can be regarded as unnecessary. But nevertheless, its adoption continues to sound appealing to me. Anyway, I'd be happy with any decision (regarding field attribute adoption) that would be made. > I hope nobody takes offense. I certainly did not mean to imply that Not at all. Discussing is a good (the best?) way to learn more :) -- Francesc Alted From rowen at u.washington.edu Mon Jul 26 11:30:01 2004 From: rowen at u.washington.edu (Russell E Owen) Date: Mon Jul 26 11:30:01 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: <200407262011.33067.falted@pytables.org> References: <200407262011.33067.falted@pytables.org> Message-ID: At 8:11 PM +0200 2004-07-26, Francesc Alted wrote: >... >Supporting Rarr.field['home address'] and Rarr.field('home address') at the >same time sounds unnecessary to me.
Moreover having a >Rarr.field('home_address')[32] (for example) looks a bit strange, and I >think Rarr.field['home_address'][32] would be better. But I repeat, this is >my personal feeling. > >I know that dropping support of __call__() in field will make the change >backward incompatible, but perhaps now is a good time to define a better >interface to the RecArray object. Another possibility may be to raise a >deprecation warning for such use for a couple of releases. I completely agree. -- Russell From rlw at stsci.edu Mon Jul 26 11:45:11 2004 From: rlw at stsci.edu (Rick White) Date: Mon Jul 26 11:45:11 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: Message-ID: On Mon, 26 Jul 2004, Russell E Owen wrote: > Overall this sounds great. > > However, I am still strongly against attribute access. > > [...] > > In this case there is another mechanism that is more natural, has no > funny corner cases, and is much more powerful. Its only disadvantage > is the need to type 4 extra characters. Saving 4 characters > is simply not sufficient reason to add this dubious feature. I am sympathetic with Russell's point of view on this, but I do think there is more to gain than just typing 4 additional characters. When you read code that is using the dictionary version of attributes, you also are required to read and mentally parse those 4 additional characters. There is value to having clean, easily readable code that goes well beyond saving a little extra typing. If we didn't care about that, we'd probably all be using Perl. :-) Also, I like to use tab-completion during my interactive use of Python. I know how to make that work with attributes, even dynamically created attributes like those for record arrays. And it is really nice to be able to type and have it fill in a name or give a list of all the available columns.
Doing that with the string/dictionary approach could be possible, I guess, but it is a lot trickier. So I do think there are some good reasons for wanting attribute access. Whether they are strong enough to counter Russell's sensible arguments about not cluttering up the interface and documentation, I'm not sure. My personal preference would be to get rid of the mapping between blanks and underscore and to do no mapping of any kind. Then if a column has a name that maps to a legal Python variable, you can access it with an attribute, and if it doesn't then you can't. That doesn't sound particularly hard to understand or explain to me. Rick From hsu at stsci.edu Mon Jul 26 13:40:04 2004 From: hsu at stsci.edu (Jin-chung Hsu) Date: Mon Jul 26 13:40:04 2004 Subject: [Numpy-discussion] plot dense and large arrays, AGG limit? Message-ID: <200407262039.APA12769@donner.stsci.edu> One would expect the following will fill up the plot window: >>> n=zeros(20000) >>> n[::2]=1 >>> plot(n) The plot "stops" a little more than half way, as if it "runs out of ink". It happens on Linux as well as Solaris, using either numarray or Numeric, and both TkAgg and GTKAgg, but not GTK. Is this due to some AGG limitation? JC Hsu From cjw at sympatico.ca Mon Jul 26 14:42:01 2004 From: cjw at sympatico.ca (Colin J. Williams) Date: Mon Jul 26 14:42:01 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: References: Message-ID: <41057A71.40707@sympatico.ca> Russell E Owen wrote: > At 11:43 AM -0400 2004-07-26, Perry Greenfield wrote: > >> I'll try to see if I can address all the comments raised (please let >> me know >> if I missed something). >> ...(nice proposal elided)... >> Any comments on these changes to the proposal? Are there those that are >> opposed to supporting attribute access? > > > Overall this sounds great. > > However, I am still strongly against attribute access.
> > Attributes are usually meant for names that are intrinsic to the > design of an object, not to the user's "configuration" of the object. Russell, I hope that you will elaborate on this distinction between design and usage. On the face of it, I would have thought that the two should be closely related. > The name mapping proposal isn't bad (thank you for keeping it > simple!), but it still feels like a kludge and it adds unnecessary > clutter. > > Your explanation of these limitations was clear, but still, imagine > putting that into the manual. It's a lot of "be careful of this" info. > That's a red flag to me. Imagine all the folks who don't read > carefully. Also imagine those who consider attribute access "the right > way to do it" and so want to clean up the limitations. I think you'll > see a steady stream of: > "why can't I see my field..." > "why can't you solve the collision problems" > "why can't I use special character thus and so" > > I personally feel that when a feature is hard to document or adds > strange limitations then it probably suggests a flawed design. > > In this case there is another mechanism that is more natural, has no > funny corner cases, and is much more powerful. Its only disadvantage is > the need to type 4 extra characters. Saving 4 characters is > simply not sufficient reason to add this dubious feature. > > Before implementing attribute access I have two suggestions (which can > be taken singly or together): > - Postpone the decision until after the rest of the proposal is > implemented. See if folks are happy with the mechanisms that are > available. I freely confess to hoping that momentum will then kill the > idea. > - Discuss it on comp.lang.py. I'd like to see it aired more widely > before being adopted. So far I've seen just a few voices for it and a > few others against it. I realize it's not a democracy -- those who > write the code get the final say.
I also realize some folks will > always want it, but that tension between simplicity and expressiveness > is intrinsic to any language. If you add everything anybody wants you > get a mess, and I want to avoid this mess while we still can. There is merit to this suggestion. It would expose the proposal to other experiences. > > > I hope nobody takes offense. I certainly did not mean to imply that > those who wish attribute access are inferior in any way. There are > features of python I wish it had that will never occur. I honestly can > see the appeal of attributes; I was in favor of them myself, early on. > It adds an appealing expressiveness that makes some kinds of code read > more naturally. But I personally feel it has too many limitations and > is unnecessary. > > Regards, > > -- Russell Perry Greenfield summarized: Rarr.field.home_address Rarr.field['home address'] Rarr.field('home address') All will work for a field named "home address". This is good, it gives the desired functionality. One minor suggestion. We have Rarr.X.home_address, I believe that, in an earlier posting, someone suggested that X.home_address really identifies a column rather than a field. Suppose that home_address is field number 6 in the record. Would Rarr.field[6] be equivalent to the above? This may appear redundant, but it gives a method for selecting a group of columns, e.g. Rarr.field[6:9] Finally, would Rarr.field.home_address.city or Rarr.field.work_address.city be legitimate? As Russell Owen pointed out, at the end of the day Perry Greenfield will use his judgement as to the best arrangement and we will all live with it. Colin W.
From Fernando.Perez at colorado.edu Mon Jul 26 18:19:10 2004 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Mon Jul 26 18:19:10 2004 Subject: [Numpy-discussion] ANN: IPython 0.6.1 is officially out Message-ID: <4105AD66.6030002@colorado.edu> [Please forgive the cross-post, but since I know many scipy/numpy users are also ipython users, and this is a fairly significant update, I decided it was worth doing it.] Hi all, I've just officially uploaded IPython 0.6.1. Many thanks to all who contributed comments, bug reports, ideas and patches. I'd like in particular to thank Ville Vainio, who helped a lot with many of the features for pysh, and was willing to put code in front of his ideas. As always, a big Thank You goes to Enthought and the Scipy crowd for hosting ipython and all its attending support services (bug tracker, mailing lists, website and downloads, etc). The download location, as usual, is: http://ipython.scipy.org/dist A detailed NEWS file can be found here: http://ipython.scipy.org/NEWS, so I won't repeat it. I will only mention the highlights of this release compared to 0.6.0: * BACKWARDS-INCOMPATIBLE CHANGE: Users will need to update their ipythonrc files and replace '%n' with '\D' in their prompt_in2 settings everywhere. Sorry, but there's otherwise no clean way to get all prompts to properly align. The ipythonrc shipped with IPython has been updated. * 'pysh' profile, which allows you to use ipython as a system shell. This includes mechanisms for easily capturing shell output into python strings and lists, and for expanding python variables back to the shell. It is started, like all profiles, with 'ipython -p pysh'.
The following is a brief example of the possibilities: planck[~/test]|3> $$a=ls *.py planck[~/test]|4> type(a) <4> planck[~/test]|5> for f in a: |.> if f.startswith('e'): |.> wc -l $f |.> 113 error.py 9 err.py 2 exit2.py 10 exit.py You can get the necessary profile into your ~/.ipython directory by running 'ipython -upgrade', or by copying it from the IPython/UserConfig directory (ipythonrc-pysh). Note that running -upgrade will rename your existing config files to prevent clobbering them with new ones. This feature had been long requested by many users, and it's at last officially part of ipython. * Improved the @alias mechanism. It is now based on a fast, lightweight dictionary implementation, which was a requirement for making the pysh functionality possible. A new pair of magics, @rehash and @rehashx, allow you to load ALL of your $PATH into ipython as aliases at runtime. * New plot2 function added to the Gnuplot support module, to plot dictionaries and lists/tuples of arrays. Also added automatic EPS generation to hardcopy(). * History is now profile-specific. * New @bookmark magic to keep a list of directory bookmarks for quick navigation. * New mechanism for profile-specific persistent data storage. Currently only the new @bookmark system uses it, but it can be extended to hold arbitrary picklable data in the future. * New @system_verbose magic to view all system calls made by ipython. * For Windows users: all this functionality now works under Windows, but some external libraries are required. Details here: http://ipython.scipy.org/doc/manual/node2.html#sub:Under-Windows * Fix bugs with '_' conflicting with the gettext library. * Many, many other bugfixes and minor enhancements. See the NEWS file linked above for the full details. Enjoy, and please report any problems. Best, Fernando Perez. From cjw at sympatico.ca Tue Jul 27 11:22:27 2004 From: cjw at sympatico.ca (Colin J. 
Williams) Date: Tue Jul 27 11:22:27 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: References: <41057A71.40707@sympatico.ca> Message-ID: <41069D3A.5090903@sympatico.ca> Russell E Owen wrote: > At 5:41 PM -0400 2004-07-26, Colin J. Williams wrote: > >> Russell E Owen wrote: >> >>> At 11:43 AM -0400 2004-07-26, Perry Greenfield wrote: >>> >>>> I'll try to see if I can address all the comments raised (please >>>> let me know >>>> if I missed something). >>>> ...(nice proposal elided)... >>>> Any comments on these changes to the proposal? Are there those >>>> that are >>>> opposed to supporting attribute access? >>> >>> >>> >>> Overall this sounds great. >>> >>> However, I am still strongly against attribute access. >>> >>> Attributes are usually meant for names that are intrinsic to the >>> design of an object, not to the user's "configuration" of the object. >> >> >> Russell, I hope that you will elaborate on this distinction between >> design and usage. On the face of it, I would have thought that the >> two should be closely related. > > > To my mind, the design of an object describes the intended behavior of > the object: what kind of data can it deal with and what should it do > to that data. It tends to be "static" in the sense that it is not a > function of how the object is created or what data is contained in the > object. The design of the object usually drives the choice of the > attributes of the object (variables and methods). > > On the other hand, the user's "configuration" of the object is what > the user has done to make a particular instance of an object unique -- > the data the user has loaded into the object. > > I consider the particular named fields of a record array to fall into > the latter category. But it is a gray area. Somebody else might argue > that the record array constructor is an object factory, turning out > an object designed by the user.
From that alternative perspective, > adding attributes to represent field names is perhaps more natural as > a design. > > I think the main issues are: > - Are there too many ways to address things? (I say yes) This could be true. I guess the test is whether there is a rational justification for each way. > > - Field name mapping: there is no trivial 1:1 mapping between valid > field names and valid attribute names. If one starts with the assumption that field/attribute names are compatible with Python names, then I don't see that this is a problem. The question has been raised as to whether a wider range of names should be permitted, e.g. including such characters as ~`()!???. My view is that such characters should be considered acceptable for data labels, but not for data names. i.e. they are for display, not for manipulation. > > - Nested access. Not sure about this one, but I'd like to hear more. A RecArray is made up of a number of records, each of the same length and data configuration. Each field of a record is of fixed length and type. It wouldn't be a big leap to permit another record in one of the fields. Suppose we have an address record aRec and a personnel record pRec and that rArr is an array of pRec. aRec: street a30, city a20, postalCode a7. pRec: id i4, firstName a15, lastName a20, homeAddress aRec, workAddress aRec. Then rArr[16].homeAddress.city could give us the home city for person 16 in rArr. > > > If we do end up with attributes for field names, I really like Rick > White's suggestion of adding an attribute for a field only if the > field name is already a valid attribute name. That neatly avoids the > collision issue and is simple to document. > > -- Russell Best wishes, Colin W.
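Colin's nested-record layout can be mocked up in plain Python. This is purely an illustrative sketch with hypothetical names and data; numarray's RecArray of the time did not support nested records, which is exactly what the "Nested access" bullet is asking about.

```python
# Plain-Python mock-up of the nested record layout sketched above
# (aRec nested inside pRec). All names and data values are hypothetical.

class Record:
    """A trivial record: keyword arguments become attributes."""
    def __init__(self, **fields):
        self.__dict__.update(fields)

def make_person(pid, first, last, home_city, work_city):
    # aRec-like address records nested inside a pRec-like person record
    return Record(id=pid, firstName=first, lastName=last,
                  homeAddress=Record(street='', city=home_city, postalCode=''),
                  workAddress=Record(street='', city=work_city, postalCode=''))

rArr = [make_person(i, 'First%d' % i, 'Last%d' % i, 'Baltimore', 'Laurel')
        for i in range(20)]

rArr[16].homeAddress.city   # -> 'Baltimore'
```

A real RecArray would store the fields in fixed-width packed form (a30, i4, etc.) rather than as Python objects, but the access chain is the point of the example.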
From falted at pytables.org Tue Jul 27 11:48:00 2004 From: falted at pytables.org (Francesc Alted) Date: Tue Jul 27 11:48:00 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: <41069D3A.5090903@sympatico.ca> References: <41069D3A.5090903@sympatico.ca> Message-ID: <200407272046.52761.falted@pytables.org> A Dimarts 27 Juliol 2004 20:21, Colin J. Williams va escriure: > If one starts with the assumption that field/attribute names are > compatible with Python names, then I don't see that this is a problem. > The question has been raised as to whether a wider range of names should > be permitted, e.g. including such characters as ~`()!???. My view is > that such characters should be considered acceptable for data labels, > but not for data names. i.e. they are for display, not for manipulation. I finally was able to see your point. You mean that naming a field with a non-Python identifier would be forbidden, and another attribute would be provided (like 'title', for example) in case the user wants to add some kind of data label. Kind of: records.array([...], names=["c1","c2","c3"], titles=["F one","time&dime","??"]) and have a new attribute called "titles" that keeps this info. Well, I think that would be a very nice solution IMO. -- Francesc Alted From gerard.vermeulen at grenoble.cnrs.fr Tue Jul 27 13:05:06 2004 From: gerard.vermeulen at grenoble.cnrs.fr (gerard.vermeulen at grenoble.cnrs.fr) Date: Tue Jul 27 13:05:06 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: <200407272046.52761.falted@pytables.org> References: <41069D3A.5090903@sympatico.ca> <200407272046.52761.falted@pytables.org> Message-ID: <20040727191434.M48392@grenoble.cnrs.fr> On Tue, 27 Jul 2004 20:46:52 +0200, Francesc Alted wrote > A Dimarts 27 Juliol 2004 20:21, Colin J.
Williams va escriure: > > If one starts with the assumption that field/attribute names are > > compatible with Python names, then I don't see that this is a problem. > > The question has been raised as to whether a wider range of names should > > be permitted e.g.. including such characters as ~`()!???. My view is > > that such characters should be considered acceptable for data labels, > > but not for data names. i.e. they are for display, not for manipulation. > > I finally was able to see your point. You mean that naming a field > with a non-python identifier would be forbidden, and provide another > attribute > (like 'title', for example) in case the user wants to add some kind > of data label. Kind of: > > records.array([...], names=["c1","c2","c3"], titles=["F one", > "time&dime","??"]) > > and have a new attribute called "titles" that keeps this info. > > Well, I think that would be a very nice solution IMO. > I agree with Rick, Colin and Francesc on this point: symbolic names are important and I like the commandline completion too. However, I have another concern: Introducing recordArray["column"] as an alternative for recordArray.field("column") breaks a symmetry between for instance 1-d record arrays and 2-d normal arrays. (the symmetry is strongly suggested by their representation: a record array prints almost as a list of tuples and a 2-d normal array almost as a list of lists). Indexing a column of a 2-d normal array is done by normalArray[:, column], so why not recArray[:, "column"] ? It removes the ambiguity between indexing with integers and with strings. Also, leaving the indices in 'natural' order becomes especially important when one envisages (record) arrays containing (record) arrays containing .... 
I understand that this seems to open the door to recArray[32, "column"], but if it is really not feasible to mix integers and strings (or attribute names) as indices, I prefer to use recordArray.column[32] and/or recordArray[32].column rather than recordArray["column"][32]. Even indexing with integers only seems more natural to me than e.g. recordArray["column"][32], since I can always do: column = 7 recordArray[32, column] Regards -- Gerard From rowen at u.washington.edu Tue Jul 27 13:44:02 2004 From: rowen at u.washington.edu (Russell E Owen) Date: Tue Jul 27 13:44:02 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: <41057A71.40707@sympatico.ca> References: <41057A71.40707@sympatico.ca> Message-ID: At 5:41 PM -0400 2004-07-26, Colin J. Williams wrote: >Russell E Owen wrote: > >> At 11:43 AM -0400 2004-07-26, Perry Greenfield wrote: >> >>> I'll try to see if I can address all the comments raised (please >>>let me know >>> if I missed something). >>> ...(nice proposal elided)... >>> Any comments on these changes to the proposal? Are there those that are >>> opposed to supporting attribute access? >> >> >> Overall this sounds great. >> >> However, I am still strongly against attribute access. >> >> Attributes are usually meant for names that are intrinsic to the >>design of an object, not to the user's "configuration" of the >>object. > >Russell, I hope that you will elaborate on this distinction between >design and usage. On the face of it, I would have thought that the >two should be closely related. To my mind, the design of an object describes the intended behavior of the object: what kind of data can it deal with and what should it do to that data. It tends to be "static" in the sense that it is not a function of how the object is created or what data is contained in the object. The design of the object usually drives the choice of the attributes of the object (variables and methods).
On the other hand, the user's "configuration" of the object is what the user has done to make a particular instance of an object unique -- the data the user has loaded into the object. I consider the particular named fields of a record array to fall into the latter category. But it is a gray area. Somebody else might argue that the record array constructor is an object factory, turning out an object designed by the user. From that alternative perspective, adding attributes to represent field names is perhaps more natural as a design. I think the main issues are: - Are there too many ways to address things? (I say yes) - Field name mapping: there is no trivial 1:1 mapping between valid field names and valid attribute names. - Nested access. Not sure about this one, but I'd like to hear more. If we do end up with attributes for field names, I really like Rick White's suggestion of adding an attribute for a field only if the field name is already a valid attribute name. That neatly avoids the collision issue and is simple to document. -- Russell From falted at pytables.org Wed Jul 28 03:01:23 2004 From: falted at pytables.org (Francesc Alted) Date: Wed Jul 28 03:01:23 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: <20040727191434.M48392@grenoble.cnrs.fr> References: <200407272046.52761.falted@pytables.org> <20040727191434.M48392@grenoble.cnrs.fr> Message-ID: <200407281200.41748.falted@pytables.org> A Dimarts 27 Juliol 2004 22:04, gerard.vermeulen at grenoble.cnrs.fr va escriure: > Introducing recordArray["column"] as an alternative for > recordArray.field("column") breaks a symmetry between for instance 1-d > record arrays and 2-d normal arrays. (the symmetry is strongly suggested > by their representation: a record array prints almost as a list of tuples > and a 2-d normal array almost as a list of lists).
> > Indexing a column of a 2-d normal array is done by normalArray[:, column], > so why not recArray[:, "column"] ? Well, I must recognize that this has its beauty (by revealing the symmetry that you mentioned). However, mixing integers and strings in indices can be, in my opinion, rather confusing for most people. Then, I guess that the implementation wouldn't be easy. > I prefer to use > > recordArray.column[32] > > and/or > > recordArray[32].column > > rather than recordArray["column"][32]. I would prefer: recordArray.fields.column[32] or recordArray.cols.column[32] (note the use of the plural in fields and cols, which I think is more consistent about its functionality) The problem with: recordArray[32].fields.column is that I don't see it as natural and besides, completion capabilities would be broken after the [] parenthesis. Anyway, as Russell suggested, I don't like recordArray["column"][32], because it would be unnecessary (you can get the same result using recordArray[column_idx][32]). Although I recognize that a recordArray.cols["column"][32] would not hurt my eyes so much. This is because although indices continue to mix ints and strings, the difference is that ".cols" is placed first, giving a new (and unmistakable) meaning to the "column" index.
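Gerard's recArray[:, "column"] idea can be prototyped with a __getitem__ that walks a mixed index tuple from left to right. This is purely a sketch under assumed semantics (string selects a field, integer or slice selects rows); neither numarray nor Numeric indexed records this way.

```python
# Sketch of the proposal that a string index selects a field while
# integer/slice indices select rows, interpreted left to right.
# Purely illustrative; not how numarray or Numeric actually worked.

class RecArraySketch:
    def __init__(self, names, rows):
        self._names = list(names)  # field names, in order
        self._rows = list(rows)    # each row is a tuple of field values

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        result = self._rows
        for k in key:
            if isinstance(k, str):
                col = self._names.index(k)
                if isinstance(result, list):   # many rows: field of each row
                    result = [row[col] for row in result]
                else:                          # a single row (a tuple)
                    result = result[col]
            else:                              # int or slice: row indexing
                result = result[k]
        return result

ra = RecArraySketch(['id', 'city'], [(1, 'Grenoble'), (2, 'Baltimore')])
ra[:, 'city']   # -> ['Grenoble', 'Baltimore']
ra[1, 'city']   # -> 'Baltimore'
```

The list-versus-tuple check is a crude stand-in for the rank bookkeeping a real array would do, which hints at why the thread suspects the implementation "wouldn't be easy".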
Cheers, -- Francesc Alted From gerard.vermeulen at grenoble.cnrs.fr Wed Jul 28 07:00:11 2004 From: gerard.vermeulen at grenoble.cnrs.fr (Gerard Vermeulen) Date: Wed Jul 28 07:00:11 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: <200407281200.41748.falted@pytables.org> References: <200407272046.52761.falted@pytables.org> <20040727191434.M48392@grenoble.cnrs.fr> <200407281200.41748.falted@pytables.org> Message-ID: <20040728155908.28cc135e.gerard.vermeulen@grenoble.cnrs.fr> On Wed, 28 Jul 2004 12:00:40 +0200 Francesc Alted wrote: > A Dimarts 27 Juliol 2004 22:04, gerard.vermeulen at grenoble.cnrs.fr va escriure: > > Introducing recordArray["column"] as an alternative for > > recordArray.field("column") breaks a symmetry between for instance 1-d > > record arrays and 2-d normal arrays. (the symmetry is strongly suggested > > by their representation: a record array prints almost as a list of tuples > > and a 2-d normal array almost as a list of lists). > > > > Indexing a column of a 2-d normal array is done by normalArray[:, column], > > so why not recArray[:, "column"] ? > > Well, I must recognize that this has its beauty (by revealing the simmetry > that you mentioned). However, mixing integer and strings on indices can > be, in my opinion, rather confusing for most people. Then, I guess that > the implementation wouldn't be easy. > > > I prefer to use > > > > recordArray.column[32] > > > > and/or > > > > recordArray[32].column > > > > rather than recordArray["column"][32]. > > I would prefer better: > > recordArray.fields.column[32] > > or > > recordArray.cols.column[32] > > (note the use of the plural in fields and cols, which I think is more > consistent about its functionality) > > The problem with: > > recordArray[32].fields.column > > is that I don't see it as natural and besides, completion capabilities > would be broken after the [] parenthesis. > Two points: 1. 
This is true for vanilla Python but not for IPython-0.6.2: packer at zombie:~> ipython Python 2.3+ (#1, Jan 7 2004, 09:17:35) Type "copyright", "credits" or "license" for more information. IPython 0.6.2 -- An enhanced Interactive Python. ? -> Introduction to IPython's features. @magic -> Information about IPython's 'magic' @ functions. help -> Python's own help system. object? -> Details about 'object'. ?object also works, ?? prints more. In [1]: d = {'Francesc': 0} In [2]: d['Francesc'].__a d['Francesc'].__abs__ d['Francesc'].__add__ d['Francesc'].__and__ In [2]: d['Francesc'].__a You see, the completion mechanism of ipython recognizes d['Francesc'] as an integer. 2. If one accepts that a "field_name" can be used as an attribute, one must be able to say: record.field_name ( == record.field("field_name") ) and (since recordArray[32] returns a record) also: recordArray[32].field_name and not recordArray[32].cols.field_name (sorry, I abhor this) > > Anyway, as Russell suggested, I don't like recordArray["column"][32], > because it would be unnecessary (you can get the same result using > recordArray[column_idx][32]). > Thank you for this little slip, you mean recordArray["column"][32] is recordArray[32][column_idx], isn't it? > > Although I recognize that a recordArray.cols["column"][32] would not hurt > my eyes so much. This is because although indices continue to mix ints > and strings, the difference is that ".cols" is placed first, giving a new > (and unmistakable) meaning to the "column" index. > I am just worried that future generalization of indexing will be impossible if the meaning of an indexing operation ("get row" or "get column or field") depends on whether an index is a string or an integer: IMO the meaning should depend on the position in the index list. The example has been chosen to show that I don't mind indexing by strings at all.
If I see array[13, 'ab', 31, 'ba'], I know that 'ab' and 'ba' index record fields as long as the indices are in 'normal' order. Nevertheless, I am aware that Utopia may be hard to implement efficiently, but this reflects my mental picture of nested (record) arrays. (ipython in Utopia would allow me to figure out array[13].ab[31].ba by tab completion and I would translate this to array[13, 'ab', 31, 'ba'] for efficiency in a real program) I think that we agree that recordArray.cols["column"] is better than recordArray["column"], but I don't see why recordArray.cols["column"] is better than the original recordArray.field("column"). Cheers -- Gerard PS: after reading the above, there may be a case to accept only indexing which can be read from left to right, so recordArray[32].field_name is OK, but recordArray.field_name[32] is not. From falted at pytables.org Wed Jul 28 11:16:12 2004 From: falted at pytables.org (Francesc Alted) Date: Wed Jul 28 11:16:12 2004 Subject: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated In-Reply-To: <20040728155908.28cc135e.gerard.vermeulen@grenoble.cnrs.fr> References: <200407281200.41748.falted@pytables.org> <20040728155908.28cc135e.gerard.vermeulen@grenoble.cnrs.fr> Message-ID: <200407282015.48875.falted@pytables.org> A Dimecres 28 Juliol 2004 15:59, Gerard Vermeulen va escriure: > Two points: > > 1. This is true for vanilla Python but not for IPython-0.6.2: > You see, the completion mechanism of ipython recognizes d['Francesc'] as an > integer. Ok. That's nice. IPython is more powerful than I realized :) > 2.
If one accepts that a "field_name" can be used as an attribute,
> one must be able to say:
>
> record.field_name ( == record.field("field_name") )
>
> and (since recordArray[32] returns a record) also:
>
> recordArray[32].field_name
>
> and not
>
> recordArray[32].cols.field_name (sorry, I abhor this)

Mmm, are you maybe suggesting that the records.Record class have all its
methods start with a reserved prefix (like "_" or, better, "_v_" for attrs
and "_f_" for methods), and that field names be forbidden from starting with
these prefixes, so that no collision problems with field names would occur?
Well, in such a case, adopting this convention for records.Record objects
would be far more feasible than doing the same for records.RecArray objects,
just because the former has very few attrs and methods. I think it's a good
idea overall.

> > Anyway, as Russell suggested, I don't like recordArray["column"][32],
> > because it would be unnecessary (you can get same result using
> > recordArray[column_idx][32]).
>
> Thank you for this little slip, you mean recordArray["column"][32] is
> recordArray[32][column_idx], isn't it?

Uh, my bad. I was (badly) trying to express the same as Russell Owen in a
message dated 20th July:

"""
I think recarray[field name] is too easily confused with recarray[index]
and is unnecessary.
"""

> I think that we agree that recordArray.cols["column"] is better than
> recordArray["column"], but I don't see why recordArray.cols["column"] is
> better than the original recordArray.field("column").

Good question. Me neither. Are you proposing to just keep
recordArray.cols.column as the only way to access columns?

> PS: after reading the above, there may be a case to accept only indexing
> which can be read from left to right, so
> recordArray[32].field_name is OK, but recordArray.field_name[32] is not.

Sorry, I don't see the point here (it is most probably my fault, given the
hour at which I'm writing this :(). Could you elaborate on that?
Cheers,

-- Francesc Alted

From perry at stsci.edu Wed Jul 28 15:02:04 2004
From: perry at stsci.edu (Perry Greenfield)
Date: Wed Jul 28 15:02:04 2004
Subject: FW: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated
In-Reply-To:
Message-ID:

I guess I've seen enough discussion to try to refine the last delta into
what is the last (or next-to-last) version. So here are the changes to the
last updated proposal:

1) I originally intended to narrow attribute access to strictly legal names
as Rick White suggested, but something got into me to try to handle spaces.
I agree with Rick on this. I see that as a very simple rule to remember, and
I don't see it as confusing to allow this.

2) Attribute access still won't be permitted directly on record arrays or
records. I'm very much in agreement with Francesc that "fields" is more
suggestive than "field" for the record and record array object that permits
both indexing and attribute access by name. The use of the field method will
remain, but will eventually be deprecated. As for other names, namely cols,
I'll stick with fields since it started with that usage, and because "field"
is a more appropriate term when dealing with multidimensional record arrays
("columns" is much more suggestive of simple tables).

Non-changes:

3) It will not be possible to index record arrays by column name. So
Rarr["column 1"] will not be permitted, but Rarr.fields["column 1"] will.
Nor will Rarr[32, "column 1"] be permitted.

4) As for optional labels (for display purposes), I'd like to hold off. I
would like to have only one way to associate a name with a field, and until
it is clearer what extra record array functionality would be associated with
labels, I'd rather not include them. Even then, I'm not sure I want to see
too much more dragged in (e.g., units, display formats, etc.). These sorts
of things may be more appropriate for a subclass.
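[Editorial note: the access rules in 2) and 3) can be sketched in a few
lines of plain Python. The Fields class and the field names "intensity" and
"column 1" below are purely illustrative, not numarray's actual
implementation: indexing works for any field name or field number, while
attribute access works only for names that are legal Python identifiers.]

```python
# Illustrative sketch only -- a hypothetical stand-in for the proposed
# "fields" accessor; numarray's real implementation may differ.
class Fields:
    def __init__(self, names, columns):
        self._names = list(names)               # field names, in order
        self._data = dict(zip(names, columns))  # name -> column values

    def __getitem__(self, key):
        # Index by field number or by field name (any name, even with spaces).
        if isinstance(key, int):
            key = self._names[key]
        return self._data[key]

    def __getattr__(self, name):
        # Attribute access works only for legal-identifier field names.
        try:
            return self.__dict__["_data"][name]
        except KeyError:
            raise AttributeError(name)

fields = Fields(["intensity", "column 1"],
                [[1.5, 2.5], ["a", "b"]])
```

With this sketch, fields["column 1"], fields[1], and fields.intensity all
resolve to columns, while "column 1" is reachable only by indexing, matching
the rules above.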
I realize that no single person will be happy with these choices, but they
seem to me to be the best compromise without unduly complicating things,
restricting future enhancements, or being too hard to implement. Has
anything fallen through the cracks?

So what follows is an updated version of what I last sent out:

******************************************************************

1) Russell Owen asked that indexing by field name not be permitted for
record arrays, and at least one other agreed. Since it is easier to add
something like this later rather than take it away, I'll go along with that.
So while it will be possible to index a Record by field name, it won't be
for record arrays.

2) Russell asked if it would be possible to specify the types of the fields
using numarray/chararray type objects. Yes, it will. We will adopt Rick
White's 2nd suggestion for handling fields that themselves are arrays, i.e.,

formats = (3, Int16), ((4,5), Float32)

for a 1-d Int16 cell of shape (3,) and a 2-d Float32 cell of shape (4,5).

The first suggestion ("formats = 3*(Int16,), 4*(5*(Float32,),)") will not be
supported. While it is very suggestive, it does allow for inconsistent
nestings that must be checked and rejected (what if someone supplies
(Int16, Int16, Float32) as one of the fields?), which complicates the code.
It also doesn't read as well.

3) Russell also suggested nesting record arrays. This sort of capability is
not being ruled out, but there isn't a chance we can devote resources to
this any time soon (can anyone else?).

4) To address the suggestions of Russell and Francesc, I'm proposing that a
new attribute "fields" be added that allows:

a) indexing by name or number (just like Records)
b) names as attributes, so long as the name is allowable as a legal
attribute. No attempt will be made to map names that are not legal attribute
strings into a different attribute name.

The field method will remain and eventually be deprecated.
Note that the only real need to support indexing, other than consistency, is
to support slices. Only slices for numerical indexing will be supported (and
not initially). The callable syntax can support index arrays just as easily.

To summarize:

Rarr.fields['home address']
Rarr.field('home address')

will both work for a field named "home address", but this field cannot be
specified as an attribute of Rarr.fields. If there is a field named
"intensity", then

Rarr.fields.intensity

will be permitted.

From cookedm at physics.mcmaster.ca Wed Jul 28 16:06:03 2004
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Wed Jul 28 16:06:03 2004
Subject: [Numpy-discussion] Permutation in Numpy
In-Reply-To: <3DC9B4D2-DE2D-11D8-A7E1-000393479EE8@earthlink.net>
References: <3DC9B4D2-DE2D-11D8-A7E1-000393479EE8@earthlink.net>
Message-ID: <20040728230558.GA28651@arbutus.physics.mcmaster.ca>

On Sun, Jul 25, 2004 at 07:24:49AM -0400, Hee-Seng Kye wrote:
> #perm.py
> def perm(k):
>     # Compute the list of all permutations of k
>     if len(k) <= 1:
>         return [k]
>     r = []
>     for i in range(len(k)):
>         s = k[:i] + k[i+1:]
>         p = perm(s)
>         for x in p:
>             r.append(k[i:i+1] + x)
>     return r
>
> Does anyone know if there is a built-in function in Numpy (or Numarray)
> that does the above task faster (computes the list of all permutations
> of a list, k)? Or is there a way to make the above function run faster
> using Numpy?
>
> I'm asking because I need to create a very large list which contains
> all permutations of range(12), in which case there would be 12!
> permutations. I created a file test.py:

Do you really need a *list* of all those permutations? Think about it:
12! is about 0.5 billion, which is about as much RAM as your machine has.
Each permutation is going to be a list taking 20 bytes of overhead plus 4
bytes per entry, so 68 bytes per permutation. You need 32 GB of RAM to
store that.

You probably want to just be able to access them in order, so a generator
is a better bet.
That way, you're only storing the current permutation instead of all of
them. Something like

def perm(k):
    k = tuple(k)
    lk = len(k)
    if lk <= 1:
        yield k
    else:
        for i in range(lk):
            s = k[:i] + k[i+1:]
            t = (k[i],)
            for x in perm(s):
                yield t + x

Then:

for p in perm(range(12)):
    print p

(I'm using tuples instead of lists as that gives better performance here.)

For n = 9, your code takes 9.4 s on my machine. The above takes 3 s, and
will scale with n (n = 12 should take 3 s * 10*11*12 = 1.1 h). Your original
code won't scale with n, as more and more time will be taken up reallocating
the list of permutations.

We can get fancier and unroll it a bit more:

def perm(k):
    k = tuple(k)
    lk = len(k)
    if lk <= 1:
        yield k
    elif lk == 2:
        yield k
        yield (k[1], k[0])
    elif lk == 3:
        k0, k1, k2 = k
        yield k
        yield (k0, k2, k1)
        yield (k1, k0, k2)
        yield (k1, k2, k0)
        yield (k2, k0, k1)
        yield (k2, k1, k0)
    else:
        for i in range(lk):
            s = k[:i] + k[i+1:]
            t = (k[i],)
            for x in perm(s):
                yield t + x

This takes 1.3 s for n = 9 on my machine.

Hope this helps.

--
|>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From kyeser at earthlink.net Wed Jul 28 17:18:46 2004
From: kyeser at earthlink.net (Hee-Seng Kye)
Date: Wed Jul 28 17:18:46 2004
Subject: [Numpy-discussion] Permutation in Numpy
In-Reply-To: <20040728230558.GA28651@arbutus.physics.mcmaster.ca>
References: <3DC9B4D2-DE2D-11D8-A7E1-000393479EE8@earthlink.net> <20040728230558.GA28651@arbutus.physics.mcmaster.ca>
Message-ID: <7B005A28-E0F4-11D8-A333-000393479EE8@earthlink.net>

Thank you so much for your suggestion! You are right that I only need to
access permutations of 12 in order, so your suggestion of using a generator
is perfect. In fact, I only need to access the first half of the
permutations of 12 that begin with 0 (12! / 12 / 2, about 20 million), so
the last code you offered would really speed things up. Thanks again.
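[Editorial note: a quick cross-check of the generator approach. The snippet
below restates the recursive tuple-yielding generator in a self-contained
form; the helper names are illustrative, not from the thread. The number of
tuples produced should equal n!:]

```python
# Self-contained restatement of the recursive tuple-yielding generator.
def perm(k):
    k = tuple(k)
    if len(k) <= 1:
        yield k
    else:
        for i in range(len(k)):
            rest = k[:i] + k[i + 1:]
            for tail in perm(rest):
                yield (k[i],) + tail

def nperms(n):
    # Count the permutations of range(n); should equal n! (factorial of n).
    return sum(1 for _ in perm(range(n)))
```

Note that for range(n) the first tuple yielded is always the identity
ordering, and all permutations beginning with 0 come first, which fits the
need to enumerate only the permutations of 12 that begin with 0.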
Best,
Kye

On Jul 28, 2004, at 7:05 PM, David M. Cooke wrote:

> On Sun, Jul 25, 2004 at 07:24:49AM -0400, Hee-Seng Kye wrote:
>> #perm.py
>> def perm(k):
>>     # Compute the list of all permutations of k
>>     if len(k) <= 1:
>>         return [k]
>>     r = []
>>     for i in range(len(k)):
>>         s = k[:i] + k[i+1:]
>>         p = perm(s)
>>         for x in p:
>>             r.append(k[i:i+1] + x)
>>     return r
>>
>> Does anyone know if there is a built-in function in Numpy (or
>> Numarray)
>> that does the above task faster (computes the list of all permutations
>> of a list, k)? Or is there a way to make the above function run
>> faster
>> using Numpy?
>>
>> I'm asking because I need to create a very large list which contains
>> all permutations of range(12), in which case there would be 12!
>> permutations. I created a file test.py:
>
> Do you really need a *list* of all those permutations? Think about it:
> 12! is about 0.5 billion, which is about as much RAM as your machine
> has. Each permutation is going to be a list taking 20 bytes of overhead
> plus 4 bytes per entry, so 68 bytes per permutation. You need 32 GB of
> RAM to store that.
>
> You probably want to just be able to access them in order, so a
> generator is a better bet. That way, you're only storing the current
> permutation instead of all of them. Something like
>
> def perm(k):
>     k = tuple(k)
>     lk = len(k)
>     if lk <= 1:
>         yield k
>     else:
>         for i in range(lk):
>             s = k[:i] + k[i+1:]
>             t = (k[i],)
>             for x in perm(s):
>                 yield t + x
>
> Then:
>
> for p in perm(range(12)):
>     print p
>
> (I'm using tuples instead of lists as that gives a better performance
> here.)
>
> For n = 9, your code takes 9.4 s on my machine. The above takes 3 s, and
> will scale with n (n=12 should take 3s * 10*11*12 = 1.1 h). Your
> original code won't scale with n, as more and more time will be taken up
> reallocating the list of permutations.
>
> We can get fancier and unroll it a bit more:
>
> def perm(k):
>     k = tuple(k)
>     lk = len(k)
>     if lk <= 1:
>         yield k
>     elif lk == 2:
>         yield k
>         yield (k[1], k[0])
>     elif lk == 3:
>         k0, k1, k2 = k
>         yield k
>         yield (k0, k2, k1)
>         yield (k1, k0, k2)
>         yield (k1, k2, k0)
>         yield (k2, k0, k1)
>         yield (k2, k1, k0)
>     else:
>         for i in range(lk):
>             s = k[:i] + k[i+1:]
>             t = (k[i],)
>             for x in perm(s):
>                 yield t + x
>
> This takes 1.3 s for n = 9 on my machine.
>
> Hope this helps.
>
> --
> |>|\/|<
> /--------------------------------------------------------------------------\
> |David M. Cooke
> http://arbutus.physics.mcmaster.ca/dmc/
> |cookedm at physics.mcmaster.ca
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by BEA Weblogic Workshop
> FREE Java Enterprise J2EE developer tools!
> Get your free copy of BEA WebLogic Workshop 8.1 today.
> http://ads.osdn.com/?ad_id=4721&alloc_id=10040&op=click
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
>

From falted at pytables.org Thu Jul 29 02:17:04 2004
From: falted at pytables.org (Francesc Alted)
Date: Thu Jul 29 02:17:04 2004
Subject: FW: [Numpy-discussion] Proposed record array behavior: the rest of the story: updated
In-Reply-To:
References:
Message-ID: <200407291116.33599.falted@pytables.org>

Hi Perry,

Well, after the bunch of messages discussing an *apparently* silly question,
I must say that I mostly agree with your last proposal. The only thing that
I strongly miss is that you have decided not to include the "titles"
parameter in the constructor and the corresponding attribute. In my opinion,
this would make it possible to forbid illegal names as field names while
still providing full access to all attributes in *all* the ways you
proposed. I think this is a different kind of metainformation from units,
display formats, etc.
A "titles" attribute is about providing functionality, not just adding
information. But, as you said, there will always be somebody not completely
satisfied ;)

Anyway, thanks for listening to all of us and putting some good sense into
all the mess that this discussion provoked.

Cheers,

-- Francesc Alted

From Chris.Barker at noaa.gov Thu Jul 29 12:01:05 2004
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Thu Jul 29 12:01:05 2004
Subject: [Numpy-discussion] The value of a native Blas
Message-ID: <41094891.4040103@noaa.gov>

Hi all,

I think this is a nifty bit of trivia. After getting my nifty Apple Dual G5,
I finally got around to doing a test I had wanted to do for a while. The
Numeric package uses LAPACK for the Linear Algebra stuff. For OS-X there are
two binary versions available for easy install:

One linked against the default, non-optimized version of BLAS (from Jack
Jansen's PackMan database)

One linked against the Apple-supplied vec-lib as the BLAS (from Bob
Ippolito's PackMan database, http://undefined.org/python/pimp/)

To compare performance, I wrote a little script that generates a random
matrix and vector, A and b, and solves the equation Ax = b for x:

import time
import RandomArray
from LinearAlgebra import solve_linear_equations

N = 1000
a = RandomArray.uniform(-1000, 1000, (N,N) )
b = RandomArray.uniform(-1000, 1000, (N,) )
start = time.clock()
x = solve_linear_equations(a,b)
print "It took %f seconds to solve a %iX%i system"%( time.clock()-start, N, N)

And here are the results:

With the non-optimized version:
It took 3.410000 seconds to solve a 1000X1000 system
It took 28.260000 seconds to solve a 2000X2000 system

With vec-Lib:
It took 0.360000 seconds to solve a 1000X1000 system
It took 2.580000 seconds to solve a 2000X2000 system

for a speed increase of over 10 times! Wow! Thanks Bob, for providing that
package.

I'd be interested to see similar tests on other platforms; I haven't gotten
around to figuring out how to use a native BLAS on my Linux box.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov

From rsilva at ime.usp.br Thu Jul 29 12:38:06 2004
From: rsilva at ime.usp.br (Paulo J. S. Silva)
Date: Thu Jul 29 12:38:06 2004
Subject: [Numpy-discussion] The value of a native Blas
In-Reply-To: <41094891.4040103@noaa.gov>
References: <41094891.4040103@noaa.gov>
Message-ID: <1091129395.29646.44.camel@catirina>

> I haven't
> gotten around to figuring out how to use a native BLAS on my Linux
> box.

At least on a Debian box, you can install native ATLAS libraries, and they
come with blas and lapack. For example, if I search for atlas3 packages, I
find the following atlas packages available:

atlas3-base
atlas3-3dnow
atlas3-sse
atlas3-sse2

Best,

Paulo

--
Paulo José da Silva e Silva
Professor Assistente do Dep. de Ciência da Computação
(Assistant Professor of the Computer Science Dept.)
Universidade de São Paulo - Brazil

e-mail: rsilva at ime.usp.br
Web: http://www.ime.usp.br/~rsilva

Teoria é o que não entendemos o     (Theory is something we don't)
suficiente para chamar de prática.  (understand well enough to call)
                                    (practice)

From stephen.walton at csun.edu Thu Jul 29 12:57:00 2004
From: stephen.walton at csun.edu (Stephen Walton)
Date: Thu Jul 29 12:57:00 2004
Subject: [Numpy-discussion] The value of a native Blas
In-Reply-To: <41094891.4040103@noaa.gov>
References: <41094891.4040103@noaa.gov>
Message-ID: <1091130954.9805.78.camel@freyer.sfo.csun.edu>

On Thu, 2004-07-29 at 11:57, Chris Barker wrote:
> One linked against the Apple Supplied vec-lib as the BLAS. (From Bob
> Ippolito's PackMan database (http://undefined.org/python/pimp/)

Well, I'm a sucker for trying to increase performance :-). AMD's Web site
recommends ATLAS as the best source for an Athlon-optimized BLAS.
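[Editorial note: a sketch for anyone repeating the thread's timing test
today. It uses NumPy's numpy.linalg.solve as a stand-in for Numeric's
solve_linear_equations; numpy, default_rng, perf_counter, and the function
name time_solve are substitutions of mine, not what the thread used, and
absolute timings will of course differ:]

```python
import time

import numpy as np

def time_solve(n, seed=0):
    """Time solving a random n-by-n system Ax = b, like the thread's script."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(-1000, 1000, (n, n))
    b = rng.uniform(-1000, 1000, (n,))
    start = time.perf_counter()
    x = np.linalg.solve(a, b)
    elapsed = time.perf_counter() - start
    residual = float(np.max(np.abs(a @ x - b)))  # sanity check on the solution
    return elapsed, residual
```

Calling time_solve(1000) reports how long the BLAS/LAPACK in use needs for
the 1000x1000 case; running the same script against builds linked with
different BLAS libraries reproduces the comparison made in this thread.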
I happen to have ATLAS installed, and the time for Chris Barker's test went from 4.95 seconds to 0.91 seconds on a dual-Athlon MP 2200+ system. To build numarray 1.0 with this setup, I had to modify addons.py a bit, both to use LAPACK and ATLAS and because ATLAS was built here with the Absoft Fortran compiler version 8.2 (I haven't tried g77). Is anyone interested in this? -- Stephen Walton Dept. of Physics & Astronomy, Cal State Northridge From perry at stsci.edu Thu Jul 29 13:01:05 2004 From: perry at stsci.edu (Perry Greenfield) Date: Thu Jul 29 13:01:05 2004 Subject: [Numpy-discussion] The value of a native Blas In-Reply-To: <1091130954.9805.78.camel@freyer.sfo.csun.edu> Message-ID: On 7/29/04 3:55 PM, "Stephen Walton" wrote: > On Thu, 2004-07-29 at 11:57, Chris Barker wrote: > >> One linked against the Apple Supplied vec-lib as the BLAS. (From Bob >> Ippolito's PackMan database (http://undefined.org/python/pimp/) > > Well, I'm a sucker for trying to increase performance :-) . AMD's Web > site recommends ATLAS as the best source for an Athlon-optimized BLAS. > I happen to have ATLAS installed, and the time for Chris Barker's test > went from 4.95 seconds to 0.91 seconds on a dual-Athlon MP 2200+ system. > > To build numarray 1.0 with this setup, I had to modify addons.py a bit, > both to use LAPACK and ATLAS and because ATLAS was built here with the > Absoft Fortran compiler version 8.2 (I haven't tried g77). Is anyone > interested in this? Well, I guess we are :-) Let us know what you had to do to get it to work. Thanks, Perry From stephen.walton at csun.edu Thu Jul 29 13:28:07 2004 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Jul 29 13:28:07 2004 Subject: [Numpy-discussion] The value of a native Blas In-Reply-To: References: Message-ID: <1091132833.9805.133.camel@freyer.sfo.csun.edu> On Thu, 2004-07-29 at 13:00, Perry Greenfield wrote: > Well, I guess we are :-) Let us know what you had to do to get it to work. 
This is so Absoft-specific that I'm not sure how much it helps others, but
here goes:

I built LAPACK after modifying the make.inc.LINUX file to set the compiler
and linker to /opt/absoft/bin/f77 instead of to g77, and the compile flags
to "-O3 -YNO_CDEC". I ran "make config" in the ATLAS directory and told the
setup that /opt/absoft/bin/f77 was my Fortran compiler, then did "make
install arch=", then followed the scipy.org instructions to combine LAPACK
with the one from ATLAS. Finally, I applied the attached patch to addons.py
in the numarray directory.

Interestingly, the example program runs in 1.43 seconds on a 2.26GHz P4 with
the default numarray install (as opposed to 4.95 seconds on the Athlon). I
haven't built ATLAS on this platform yet to find how much of an improvement
I get.

I suppose something similar would work with g77, replacing the Absoft
libraries with g2c, but I haven't tried it.

--
Stephen Walton
Dept. of Physics & Astronomy, Cal State Northridge

-------------- next part --------------
A non-text attachment was scrubbed...
Name: addons.diff
Type: text/x-patch
Size: 879 bytes
Desc: addons.py diffs
URL:

From stephen.walton at csun.edu Thu Jul 29 13:38:05 2004
From: stephen.walton at csun.edu (Stephen Walton)
Date: Thu Jul 29 13:38:05 2004
Subject: [Numpy-discussion] The value of a native Blas
In-Reply-To:
References:
Message-ID: <1091133445.9805.147.camel@freyer.sfo.csun.edu>

An addition to my previous post: I also had to do a "setenv USE_LAPACK" in
the shell before "python setup.py build" in the numarray directory.

[Admin question: I'm not seeing my own posts to this list, even though I'm
supposed to according to my Sourceforge preferences.]

From Chris.Barker at noaa.gov Thu Jul 29 15:01:07 2004
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Thu Jul 29 15:01:07 2004
Subject: [Numpy-discussion] Building Numeric with a native blas ?
In-Reply-To: <1091133445.9805.147.camel@freyer.sfo.csun.edu>
References: <1091133445.9805.147.camel@freyer.sfo.csun.edu>
Message-ID: <410972BD.8080903@noaa.gov>

Hi all,

I decided I want to try to get this working on my Gentoo Linux box. I
started by emerging the Gentoo atlas package. Now I've gone into the Numeric
setup.py, and have gotten confused. These seem to be the relevant lines
(unchanged from how they came with Numeric 23.3):

# delete all but the first one in this list if using your own LAPACK/BLAS
sourcelist = [os.path.join('Src', 'lapack_litemodule.c'),
#             os.path.join('Src', 'blas_lite.c'),
#             os.path.join('Src', 'f2c_lite.c'),
#             os.path.join('Src', 'zlapack_lite.c'),
#             os.path.join('Src', 'dlapack_lite.c')

That's all well and good, except that they are all deleted except the first
one. And it looks like I don't want that one either.

              ]
# set these to use your own BLAS;
library_dirs_list = ['/usr/lib/atlas']
libraries_list = ['lapack', 'cblas', 'f77blas', 'atlas', 'g2c']
# if you also set `use_dotblas` (see below), you'll need:
# ['lapack', 'cblas', 'f77blas', 'atlas', 'g2c']

This also seems to be set already. I don't have a '/usr/lib/atlas', so I
set:

library_dirs_list = []

All the libraries in libraries_list are in /usr/lib/.

include_dirs = ['/usr/include/atlas'] # You may need to set this to find cblas.h

cblas.h is in /usr/include/, so I set this to:

include_dirs = []

Now everything compiled and installed just fine, but when I try to use it,
I get:

  File "/usr/lib/python2.3/site-packages/Numeric/LinearAlgebra.py", line 8, in ?
    import lapack_lite
ImportError: dynamic module does not define init function (initlapack_lite)

So I tried adding

sourcelist = [os.path.join('Src', 'lapack_litemodule.c')]

back in. Now I can build and install, but get:

Traceback (most recent call last):
  File "./TestBlas.py", line 4, in ?
    from LinearAlgebra import *
  File "/usr/lib/python2.3/site-packages/Numeric/LinearAlgebra.py", line 8, in ?
    import lapack_lite
ImportError: /usr/lib/python2.3/site-packages/Numeric/lapack_lite.so: undefined symbol: dgesdd_

Now I'm stuck.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov

From Chris.Barker at noaa.gov Thu Jul 29 15:26:09 2004
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Thu Jul 29 15:26:09 2004
Subject: [Numpy-discussion] Building Numeric with a native blas ?
In-Reply-To: <410972BD.8080903@noaa.gov>
References: <1091133445.9805.147.camel@freyer.sfo.csun.edu> <410972BD.8080903@noaa.gov>
Message-ID: <41097891.8080906@noaa.gov>

By the way, I get these same errors when compiling with the setup.py
unchanged from how it's distributed with Numeric 23.3:

> Traceback (most recent call last):
>   File "./TestBlas.py", line 4, in ?
>     from LinearAlgebra import *
>   File "/usr/lib/python2.3/site-packages/Numeric/LinearAlgebra.py", line
> 8, in ?
>     import lapack_lite
> ImportError: /usr/lib/python2.3/site-packages/Numeric/lapack_lite.so:
> undefined symbol: dgesdd_

So something's weird.

Stephen Walton wrote:
> one has to merge an LAPACK library built separately with the one
> generated by ATLAS to get a 'complete' LAPACK.

I'll try this, but it's odd that it didn't give an error when compiling or
linking.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov

From stephen.walton at csun.edu Thu Jul 29 15:31:13 2004
From: stephen.walton at csun.edu (Stephen Walton)
Date: Thu Jul 29 15:31:13 2004
Subject: [Numpy-discussion] Building Numeric with a native blas ?
In-Reply-To: <41097891.8080906@noaa.gov>
References: <1091133445.9805.147.camel@freyer.sfo.csun.edu> <410972BD.8080903@noaa.gov> <41097891.8080906@noaa.gov>
Message-ID: <1091140216.9805.381.camel@freyer.sfo.csun.edu>

On Thu, 2004-07-29 at 15:22, Chris Barker wrote:
> Stephen Walton wrote:
> > one has to merge an LAPACK library built separately with the one
> > generated by ATLAS to get a 'complete' LAPACK.
>
> I'll try this, but it's odd that it didn't give an error when compiling
> or linking.

(I neglected to CC the list on my response to Chris, but basically wrote
that changes similar to the ones I used for numarray worked in Numeric.)

Since Numeric and numarray are building shared libraries, undefined external
references don't show up until you actually import the Python package
represented by the shared libraries. I noticed this in my experiments as
well.

--
Stephen Walton
Dept. of Physics & Astronomy, Cal State Northridge

From Chris.Barker at noaa.gov Thu Jul 29 15:41:22 2004
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Thu Jul 29 15:41:22 2004
Subject: [Numpy-discussion] Building Numeric with a native blas ?
In-Reply-To: <1091140216.9805.381.camel@freyer.sfo.csun.edu>
References: <1091133445.9805.147.camel@freyer.sfo.csun.edu> <410972BD.8080903@noaa.gov> <41097891.8080906@noaa.gov> <1091140216.9805.381.camel@freyer.sfo.csun.edu>
Message-ID: <41097C0A.7090600@noaa.gov>

Stephen Walton wrote:
>>> one has to merge an LAPACK library built separately with the one
>>> generated by ATLAS to get a 'complete' LAPACK.
>>
>> I'll try this, but it's odd that it didn't give an error when compiling
>> or linking.

OK. I did an "emerge lapack" and got lapack installed, then rebuilt Numeric,
and now it works. What's odd is that before I installed lapack all the libs
were there, including liblapack. Anyway it works, so I'm happy.

One note, however: The setup.py delivered with 23.3 seems to be set up to
use a native lapack by default.
Will it work on a system that doesn't have one? -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From stephen.walton at csun.edu Thu Jul 29 16:21:01 2004 From: stephen.walton at csun.edu (Stephen Walton) Date: Thu Jul 29 16:21:01 2004 Subject: [Numpy-discussion] Building Numeric with a native blas ? In-Reply-To: <41097C0A.7090600@noaa.gov> References: <1091133445.9805.147.camel@freyer.sfo.csun.edu> <410972BD.8080903@noaa.gov> <41097891.8080906@noaa.gov> <1091140216.9805.381.camel@freyer.sfo.csun.edu> <41097C0A.7090600@noaa.gov> Message-ID: <1091143210.9805.482.camel@freyer.sfo.csun.edu> On Thu, 2004-07-29 at 15:36, Chris Barker wrote: > The setup.py delivered with 23.3 seems to be set up to use a native > lapack by default. Will it work on a system that doesn't have one? No. On my system it fails with a complaint about not finding -llapack, since my ATLAS and LAPACK libraries are in /usr/local/lib/atlas, and the 23.3 setup.py looks in /usr/lib/atlas. -- Stephen Walton Dept. of Physics & Astronomy, Cal State Northridge From cookedm at physics.mcmaster.ca Thu Jul 29 19:53:10 2004 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Thu Jul 29 19:53:10 2004 Subject: [Numpy-discussion] Building Numeric with a native blas ? In-Reply-To: <41097C0A.7090600@noaa.gov> References: <1091133445.9805.147.camel@freyer.sfo.csun.edu> <410972BD.8080903@noaa.gov> <41097891.8080906@noaa.gov> <1091140216.9805.381.camel@freyer.sfo.csun.edu> <41097C0A.7090600@noaa.gov> Message-ID: <20040730025254.GA26933@arbutus.physics.mcmaster.ca> On Thu, Jul 29, 2004 at 03:36:58PM -0700, Chris Barker wrote: > Stephen Walton wrote: > >>>one has to merge an LAPACK library built separately with the one > >>>generated by ATLAS to get a 'complete' LAPACK. 
> >> > >>I'll try this, but it's odd that it didn't give an error when compiling > >>or linking. > > OK. I did an "emerge lapack" and got lapack installed, then re-build > Numeric, and now it works. What's odd is that before I installed lapack > all the libs were there, including liblapack. Anyway it works, so I'm happy. Atlas might have installed a liblapack, with the (few) functions that it overrides with faster ones. It's by no means a complete LAPACK installation. Have a look at the difference in library sizes; a full LAPACK is a few megs; Atlas's routines are a few hundred K. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From Mailer-Daemon at rome.hostforweb.net Fri Jul 30 05:57:19 2004 From: Mailer-Daemon at rome.hostforweb.net (Mail Delivery System) Date: Fri Jul 30 05:57:19 2004 Subject: [Numpy-discussion] Mail delivery failed: returning message to sender Message-ID: This message was created automatically by mail delivery software. A message that you sent could not be delivered to one or more of its recipients. This is a permanent error. The following address(es) failed: camdisc at cambodia.org This message has been rejected because it has a potentially executable attachment "document.pif" This form of attachment has been used by recent viruses or other malware. If you meant to send this file then please package it up as a zip file and resend it. ------ This is a copy of the message, including all the headers. ------ From numpy-discussion at lists.sourceforge.net Fri Jul 30 08:56:42 2004 From: numpy-discussion at lists.sourceforge.net (numpy-discussion at lists.sourceforge.net) Date: Fri, 30 Jul 2004 14:56:42 +0200 Subject: Thanks! Message-ID: Your file is attached. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: document.pif
Type: application/octet-stream
Size: 17424 bytes
Desc: not available
URL:

From Chris.Barker at noaa.gov Fri Jul 30 09:33:03 2004
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Fri Jul 30 09:33:03 2004
Subject: [Numpy-discussion] Building Numeric with a native blas ?
In-Reply-To: <20040730025254.GA26933@arbutus.physics.mcmaster.ca>
References: <1091133445.9805.147.camel@freyer.sfo.csun.edu> <410972BD.8080903@noaa.gov> <41097891.8080906@noaa.gov> <1091140216.9805.381.camel@freyer.sfo.csun.edu> <41097C0A.7090600@noaa.gov> <20040730025254.GA26933@arbutus.physics.mcmaster.ca>
Message-ID: <410A7733.10408@noaa.gov>

David M. Cooke wrote:
> Atlas might have installed a liblapack, with the (few) functions that it
> overrides with faster ones. It's by no means a complete LAPACK
> installation. Have a look at the difference in library sizes; a full
> LAPACK is a few megs; Atlas's routines are a few hundred K.

OK, I'm really confused now. I got it working, but it seems to have
virtually identical performance to the Numeric-supplied lapack-lite.

I'm guessing that the LAPACK package I emerged does NOT use the atlas BLAS.

If the atlas liblapack doesn't have all of lapack, how in the world are you
supposed to use it? I have no idea how I would get the linker to get what it
can from the atlas lapack, and the rest from another one.

Has anyone done this on Gentoo? If not, how about another Linux distro? I
don't have to use portage for this, after all.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT (206) 526-6959 voice
7600 Sand Point Way NE (206) 526-6329 fax
Seattle, WA 98115 (206) 526-6317 main reception

Chris.Barker at noaa.gov

From gerard.vermeulen at grenoble.cnrs.fr Fri Jul 30 10:01:34 2004
From: gerard.vermeulen at grenoble.cnrs.fr (Gerard Vermeulen)
Date: Fri Jul 30 10:01:34 2004
Subject: [Numpy-discussion] Building Numeric with a native blas ?
In-Reply-To: <410A7733.10408@noaa.gov>
References: <1091133445.9805.147.camel@freyer.sfo.csun.edu> <410972BD.8080903@noaa.gov> <41097891.8080906@noaa.gov> <1091140216.9805.381.camel@freyer.sfo.csun.edu> <41097C0A.7090600@noaa.gov> <20040730025254.GA26933@arbutus.physics.mcmaster.ca> <410A7733.10408@noaa.gov>
Message-ID: <20040730190021.67e1ffdd.gerard.vermeulen@grenoble.cnrs.fr>

On Fri, 30 Jul 2004 09:28:35 -0700 "Chris Barker" wrote:

> David M. Cooke wrote:
> > Atlas might have installed a liblapack, with the (few) functions that it
> > overrides with faster ones. It's by no means a complete LAPACK
> > installation. Have a look at the difference in library sizes; a full
> > LAPACK is a few megs; Atlas's routines are a few hundred K.
>
> OK, I'm really confused now. I got it working, but it seems to have
> virtually identical performance to the Numeric-supplied lapack-lite.
>
> I'm guessing that the LAPACK package I emerged does NOT use the atlas BLAS.
>
> If the atlas liblapack doesn't have all of lapack, how in the world are
> you supposed to use it? I have no idea how I would get the linker to take
> what it can from the atlas lapack and the rest from another one.
>
> Has anyone done this on Gentoo? If not, how about another Linux distro? I
> don't have to use portage for this, after all.
>

I am making my own ATLAS rpms, and basically I am doing the following
(starting from the ATLAS source directory, with LAPACK unpacked inside it):

# build lapack
# Note added right now: this assumes that LAPACK/make.inc has been patched
(cd LAPACK; make lapacklib)

# configuration: leave the blank lines in the 'here' document
# Note added right now: this is dependent on your CPU architecture
if [ $(hostname)=="zombie" ] ; then make config <

References: <41094891.4040103@noaa.gov>
Message-ID: <1091212658.1454.724.camel@catirina>

Hello,

I took some time today to run some benchmarks on different uses of
lapack on an Athlon Thunderbird 1.2 GHz.
Here it goes:

------
Vanilla numarray:
It took 9.970000 seconds to solve a 1000x1000 system

numarray, vanilla blas and lapack:
It took 7.010000 seconds to solve a 1000x1000 system

numarray, atlas blas and vanilla lapack:
It took 1.050000 seconds to solve a 1000x1000 system

numarray, atlas blas and lapack:
It took 0.760000 seconds to solve a 1000x1000 system
------

One nice touch is that matlab takes 1.3s to solve a system of the same
size with the A\b notation. Hence numarray is actually faster than
matlab at solving linear systems :-) I know, there is probably a way to
make matlab use the faster atlas library...

Paulo

--
Paulo José da Silva e Silva
Professor Assistente do Dep. de Ciência da Computação
(Assistant Professor of the Computer Science Dept.)
Universidade de São Paulo - Brazil

e-mail: rsilva at ime.usp.br    Web: http://www.ime.usp.br/~rsilva

Teoria é o que não entendemos o     (Theory is something we don't)
suficiente para chamar de prática.  (understand well enough to call)
                                    (practice)

From Chris.Barker at noaa.gov Fri Jul 30 13:15:06 2004
From: Chris.Barker at noaa.gov (Chris Barker)
Date: Fri Jul 30 13:15:06 2004
Subject: [Numpy-discussion] Building Numeric with a native blas -- On Windows
Message-ID: <2592d825d632.25d6322592d8@hermes.nos.noaa.gov>

Hi all,

just to keep this thread moving--- I'm trying to get Numeric working
with a native lapack on Windows also. I know little enough about this
kind of thing on Linux, and I'm really out of my depth on Windows.

This is what I have done so far:

After much struggling, I got Numeric to compile using setup.py and
MS Visual Studio .NET 2003 (or whatever the heck it's called!)

It all seems to work fine with the included lapack-lite.

I downloaded and installed the demo version of the Intel Math Kernel
Library.
I set up various paths so that setup.py finds the libs, but now I get
linking errors:

unresolved external symbol _dgeev_ referenced in function
_lapack_lite_dgetrf

And a whole bunch of others, all corresponding to the various LAPACK
calls.

I am linking against Intel's mkl_c.lib, which is supposed to have
everything in it. Indeed, if I look in the lib file, I find, for example:

...evx._DGEEV._dgeev._DGB ...

so it looks like they are there, but perhaps referred to with only one
underscore, at the beginning, rather than one at each end.

Now I'm stuck.

I suppose I could use ATLAS, but it looked like it was going to take
some effort to compile that with MSVC.

Has anyone gotten a native BLAS working on Windows? If so, how?

Thanks,
Chris

From gerard.vermeulen at grenoble.cnrs.fr Fri Jul 30 15:04:10 2004
From: gerard.vermeulen at grenoble.cnrs.fr (gerard.vermeulen at grenoble.cnrs.fr)
Date: Fri Jul 30 15:04:10 2004
Subject: [Numpy-discussion] Building Numeric with a native blas -- On Windows
In-Reply-To: <2592d825d632.25d6322592d8@hermes.nos.noaa.gov>
References: <2592d825d632.25d6322592d8@hermes.nos.noaa.gov>
Message-ID: <20040730215031.M28229@grenoble.cnrs.fr>

On Fri, 30 Jul 2004 13:14:23 -0700, Chris Barker wrote
> Hi all,
>
> just to keep this thread moving--- I'm trying to get Numeric working
> with a native lapack on Windows also. I know little enough about this
> kind of thing on Linux, and I'm really out of my depth on Windows.
>
> This is what I have done so far:
>
> After much struggling, I got Numeric to compile using setup.py and
> MS Visual Studio .NET 2003 (or whatever the heck it's called!)
>
> It all seems to work fine with the included lapack-lite.
>
> I downloaded and installed the demo version of the Intel Math Kernel
> Library.
> I set up various paths so that setup.py finds the libs, but now
> I get linking errors:
>
> unresolved external symbol _dgeev_ referenced in function
> _lapack_lite_dgetrf
>
> And a whole bunch of others, all corresponding to the various LAPACK
> calls.
>
> I am linking against Intel's mkl_c.lib, which is supposed to have
> everything in it. Indeed, if I look in the lib file, I find, for example:
>
> ...evx._DGEEV._dgeev._DGB ...
>
> so it looks like they are there, but perhaps referred to with only one
> underscore, at the beginning, rather than one at each end.
>
> Now I'm stuck.
>
> I suppose I could use ATLAS, but it looked like it was going to take
> some effort to compile that with MSVC.
>
> Has anyone gotten a native BLAS working on Windows? If so, how?
>

In lapack_lite.c, you'll see:

#if defined(NO_APPEND_FORTRAN)
    lapack_lite_status__ = dgeev(&jobvl,&jobvr,&n,DDATA(a),&lda,
        DDATA(wr),DDATA(wi),DDATA(vl),&ldvl,DDATA(vr),&ldvr,
        DDATA(work),&lwork,&info);
#else
    lapack_lite_status__ = dgeev_(&jobvl,&jobvr,&n,DDATA(a),&lda,
        DDATA(wr),DDATA(wi),DDATA(vl),&ldvl,DDATA(vr),&ldvr,
        DDATA(work),&lwork,&info);
#endif

So, try to define NO_APPEND_FORTRAN. If that does not work, you can try
to prepend an underscore.

You can also try to rip the ATLAS and supposedly ATLAS-enhanced lapack
libraries out of scipy and build against those (not as good as
http://www.scipy.org/documentation/buildatlas4scipywin32.txt, but better
than nothing).

Gerard