From robert.kern at gmail.com Sat Apr 1 00:20:00 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sat Apr 1 00:20:00 2006 Subject: [Numpy-discussion] Trac maintenance Message-ID: <442E3770.6030809@gmail.com> I've been doing a bit of maintenance on the Trac instances for numpy and scipy. In particular, I've removed the default "component1" and "milestone2" nonsense and put meaningful values in their place. If you have any requests, or you think my component lists are bogus, enter a ticket, set the component to "Trac" and assign it to rkern. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From tim.hochberg at cox.net Sat Apr 1 06:57:17 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sat Apr 1 06:57:17 2006 Subject: [Numpy-discussion] numpy error handling In-Reply-To: <442E2F05.5080809@ieee.org> References: <442DE773.4060104@cox.net> <442E2F05.5080809@ieee.org> Message-ID: <442E94AD.1040200@cox.net> Travis Oliphant wrote: > Tim Hochberg wrote: > >> >> I've just been looking at how numpy handles changing the behaviour >> that is triggered when there are numeric error conditions (overflow, >> underflow, etc.). If I understand it correctly, and that's a big if, >> I don't think I like it nearly as much as what numarray has in >> place. >> >> It appears that numpy uses the two functions, seterr and geterr, to >> set and query the error handling. These set/read a secret variable >> stored in the local scope. > > This approach was decided on after discussions with Guido, who didn't > like the idea of pushing and popping from a global stack. I'm not > sure I'm completely in love with it myself, but it is actually more > flexible than the numarray approach. > > You can get the numarray approach back simply by setting the error in > the builtin scope (instead of in the local scope, which is done by > default). I saw that you could set it at different levels, but missed the implications. However, it's still missing one feature: thread-local storage. I would argue that the __builtin__ data should actually be stored in threading.local() instead of __builtin__. Then you could set up an equivalent stack system to numarray's. > Then, at the end of the function, you can restore it. If it was felt > useful to create a stack to handle this on the builtin level then that > is easily done as well. I've used the numarray error handling stuff for some time. My experience with it has led me to the following conclusions: 1. You don't use it that often. I have about 26 KLOC that's "active" and in that I use pushMode just 15 times. For comparison, I use asarray a tad over 100 times. 2. pushMode and popMode, modulo spelling, is the way to set errors. Once the with statement is around, that will be even better. 3. I, personally, would be very unlikely to use the local and global error handling; I'd just as soon see them go away, particularly if it helps performance, but I won't lobby for it. >> I assume that the various ufuncs then examine that value to determine >> how to handle errors. The secret variable approach is a little >> clunky, but that's not what concerns me. What concerns me is that >> this approach is *only* useful for built-in numpy functions and falls >> down if we call any user-defined functions. >> >> Suppose we want to be warned on underflow. Setting this is as simple as:
>> >> def func(*args): >> numpy.seterr(under='warn') >> # do stuff with args >> return result >> >> Since seterr is local to the function, we don't have to reset the >> error handling at the end, which is convenient. And, this works fine >> if all we are doing is calling numpy functions and methods. However, >> if we are calling a function of our own devising we're out of luck >> since the called function will not inherit the error settings that we >> have set. > > Again, you have control over where you set the "secret" variable > (local, global (module), and builtin). I also don't see how that's > any more clunky than a "secret" stack. In numarray, the stack is in the numarray module itself (actually in the Error object). They base their threading local behaviour off of thread.get_ident, not threading.local. That's not clunky at all, although it's arguably wrong since thread.get_ident can reuse ids from dead threads. In practice it's probably hard to get into trouble doing this, but I still wouldn't emulate it. I think that this was written before thread-local storage, so it was probably the best that could be done. However, if you use threading.local, it will be clunky in a similar sense. You'll be storing data in a global namespace you don't control and you've got to hope that no one stomps on your variable name. When you have local and module level secret storage names as well you're just doing a lot more of that and the chance of collision and confusion goes up from almost zero to very small. > You may set the error in the builtin scope --- in fact it would > probably be trivial to implement a stack based on this and implement the > > pushMode > popMode > > interface of numarray. Yes. Modulo the thread-local issue, I believe that this would indeed be easy. > > But, I think this question does deserve a bit of debate. I don't > think there has been a serious discussion over the method. To help > Tim and others understand what happens: > > When a ufunc is called, a specific variable name is searched for in > the following name-spaces in the following order: > > 1) local > 2) global > 3) builtin > > (There is a bit of an optimization in that when the error mode is the > default mode --- do nothing --- a global flag is set which by-passes the > search for the name). > The first time the variable name is found, the error mode is read from > that variable. This error mode is placed as part of the ufunc loop > object. At the end of each 1-d loop the IEEE error mode flags are > checked (depending on the state of the error mode) and appropriate > action taken. > > By the way, it would not be too difficult to change how the error mode > is set (probably an hour's worth of work). So, concern over > implementation changes should not be a factor right now. > Currently the error mode is read from a variable using standard > scoping rules. It would save the (not insignificant) name-space > lookup time to instead use a global stack (i.e. a Python list) and > just get the error mode from the top of that stack. > >> Thus we have no way to influence the error settings of functions >> downstream from us. > > Of course, there is a way to do this by setting the variable in the > global or builtin scope as I've described above. > What's really the argument here is whether having the flexibility at > the local and global name-spaces is really worth the extra name-lookups > for each ufunc.
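For concreteness, here is a rough pure-Python sketch of the lookup order just described (the variable name below is made up, and the real search happens in C inside the ufunc machinery, so treat this purely as an illustration):

    import sys

    _ERRMODE_NAME = "__numpy_errmode__"   # hypothetical name; the real one differs
    _DEFAULT_MODE = 0                     # 0 means "do nothing": skip all checks

    def _lookup_errmode():
        # Emulate the search order: caller's locals, then globals, then builtins.
        frame = sys._getframe(1)
        for ns in (frame.f_locals, frame.f_globals, frame.f_builtins):
            if _ERRMODE_NAME in ns:
                return ns[_ERRMODE_NAME]
        return _DEFAULT_MODE

Written out this way the cost is easy to see: up to three dictionary lookups per ufunc call before any computation starts, which is exactly what the by-passing flag for the default mode avoids.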
> > I've argued that the numarray behavior can result from using the > builtin namespace for the error control (perhaps with better > Python-side support for setting and retrieving it). What numpy has is > control at the global and local namespace level as well, which can > override the builtin name-space behavior. > > So, we should at least frame the discussion in terms of what is > actually possible. Yes, sorry for spreading misinformation. >> >> I also would prefer more verbose keys ala numarray (underflow, >> overflow, dividebyzero and invalid) than those currently used by >> numpy (under, over, divide and invalid). > > > In my mind, verbose keys are just extra baggage unless they are really > self documenting. You just need reminders and clues. It seems to be > a preference thing. I guess I hate typing long strings when only the > first few letters clue me in to what is being talked about. In this case, overflow, underflow and dividebyzero seem pretty self documenting to me. And 'invalid' is pretty cryptic in both implementations. This may be a matter of taste, but I tend to prefer short pithy names for functions that I use a lot, or that are crammed a bunch to a line. In functions like this, that are more rarely used and get a full line to themselves, I lean towards the more verbose. >> And (will he never stop) I like numarray's defaults better here too: >> overflow='warn', underflow='ignore', dividebyzero='warn', >> invalid='warn'. Currently, numpy defaults to ignore for all cases. >> These last points are relatively minor though. > > This has optimization issues the way the code is written now. The > defaults are there to produce the fastest loops. Can you elaborate on this a bit? Reading between the lines, there seem to be two issues related to speed here. One is the actual namespace lookup of the error mode -- there's a setting that says we are using the defaults, so don't bother to look. This saves the namespace lookup. Changing the defaults shouldn't affect the timing of that. I'm not sure how this would interact with thread-local storage though. The second issue is that running the core loop with no checks in place is faster. That means that to get maximum performance you want to be running both at the default setting and with no checks, which implies that the default setting needs to be no checking. Is that correct? I think there should be a way to finesse this issue, but I'll wait for the dust to settle a bit on the local, global, builtin issue before I propose anything. Particularly since by finesse I mean: do something moderately unsavory. > So, I'm hesitant to change them based only on ambiguous preferences. It's not entirely plucked out of the air. As I recall, the decision was arrived at something like this: 1. Errors should never pass silently (unless explicitly silenced). 2. Let's have everything raise by default. 3. In practice this was no good because you often wanted to look at the results and see where the problem was. 4. OK, let's have everything warn. 5. This almost worked, but underflow was almost never a real error, so everyone always overrode underflow. A default that you always need to override is not a good default. 6. So, warn for everything except underflow. Ignore that. And that's where numarray is today. I and others have been using that error system happily for quite some time now. At least I haven't heard any complaints for quite a while. > Good feedback. Thanks again for taking the time to look at this and > offer review. You're very welcome.
Thanks for all of the work you've been putting in to make the grand numerification happen. -tim From arnd.baecker at web.de Sat Apr 1 09:09:06 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Sat Apr 1 09:09:06 2006 Subject: [Numpy-discussion] extension to xrange for numpy Message-ID: Dear numpy enthusiasts, one python command which is extremely useful in 1D situations is `xrange`. However, for higher dimensional settings we strongly lack the commands `yrange` and `zrange`. These could be shorthands for the corresponding constructs with `:,NewAxis` added. Any comments, suggestion and even implementations are very welcome, Arnd P.S.: What I am not sure about is the right command for the 4-dimensional case - which letter should be used after the "z"? (it seems that "a" would be a very natural choice...) From faltet at carabos.com Sat Apr 1 11:01:05 2006 From: faltet at carabos.com (Francesc Altet) Date: Sat Apr 1 11:01:05 2006 Subject: [Numpy-discussion] ANN: PyTables 1.3 released Message-ID: <200604012100.38726.faltet@carabos.com> ========================= Announcing PyTables 1.3 ========================= This is a new major release of PyTables. The most remarkable feature added in this version is a complete support (well, almost, because unicode arrays are not there yet) for NumPy objects. Improved support for native HDF5 is there as well. As an aside, I'm happy to inform you that the PyTables web site (http://www.pytables.org) has been converted into a wiki so that users can contribute to the project with recipes or any other document. Try it out! Go to the (new) PyTables web site for downloading the beast: http://www.pytables.org/ or keep reading for more info about the new features and bugs fixed. Changes more in depth ===================== Improvements: - Support for NumPy objects in all the objects of PyTables, namely: Array, CArray, EArray, VLArray and Table. All the numerical and character (except unicode arrays) flavors are supported as well as plain and nested heterogeneous NumPy arrays. PyTables leverages the adoption of the array interface (http://numeric.scipy.org/array_interface.html) for a very efficient conversion between all the numarray (which continues to be the native flavor for PyTables) object to/from NumPy/Numeric. - The FLAVOR schema in PyTables has been refined and simplified. Now, the only 'flavors' allowed for data objects are: "numarray", "numpy", "numeric" and "python". The changes has been made so that they are fully backward compatible with existing PyTables files. However, when users would try to use old flavors (like "Numeric" or "Tuple") in existing code, a ``DeprecationWarning`` will be issued in order to encourage them to migrate to the new flavors as soon as possible. - Nested fields can be specified in the "field" parameter of Table.read by using a '/' as a separator between fields (e.g. 'Info/value'). - The Table.Cols accessor has received a new ``__setitem__()`` method that allows doing things like: table.cols[4] = record table.cols.x[4:1000:2] = array # homogeneous column table.cols.Info[4:1000:2] = recarray # nested column - A clean-up function (using ``atexit``) has been registered so that remaining opened files are closed when a user hits a ^C, for example. That would help to avoid ending with corrupted files. - Native HDF5 compound datasets that are contiguous are supported now. Before, only chunked datasets were supported. - Updated (and much improved) sections about compression issues in the User's Guide. 
It includes new benchmarks made with PyTables 1.3 and a exhaustive comparison between Zlib, LZO and bzip2. - The HTML version of manual is made now from the docbook2html package for an improved look (IMO). Bug fixes: - Solved a problem when trying to save CharArrays with itemsize = 0 as attributes of nodes. Now, these objects are pickled in order to prevent HDF5 from crashing. - Fixed some alignment issues with nested record arrays under certain architectures (e.g. PowerPC). - Fixed automatic conversions when a VLArray is read in a platform with a byte ordering different from the file. Deprecated features: - Due to recurrent problems with the UCL compression library, it has been declared deprecated from this version on. You can still compile PyTables with UCL support (using the --force-ucl), but you are urged to not use it anymore and convert any existing datafiles with UCL to other supported library (zlib, lzo or bzip2) with the ``ptrepack`` utility. Backward-incompatible changes: - Please, see ``RELEASE-NOTES.txt`` file. Important note for Windows users ================================ If you are willing to use PyTables with Python 2.4 in Windows platforms, you will need to get the HDF5 library compiled for MSVC 7.1, aka .NET 2003. It can be found at: ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/bin/windows/5-165-win-net.ZIP Users of Python 2.3 on Windows will have to download the version of HDF5 compiled with MSVC 6.0 available in: ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/bin/windows/5-165-win.ZIP What it is ========== **PyTables** is a package for managing hierarchical datasets and designed to efficiently cope with extremely large amounts of data (with support for full 64-bit file addressing). It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code, makes it a very easy-to-use tool for high performance data storage and retrieval. PyTables runs on top of the HDF5 library and numarray (but NumPy and Numeric are also supported) package for achieving maximum throughput and convenient use. Besides, PyTables I/O for table objects is buffered, implemented in C and carefully tuned so that you can reach much better performance with PyTables than with your own home-grown wrappings to the HDF5 library. PyTables sports indexing capabilities as well, allowing doing selections in tables exceeding one billion of rows in just seconds. Platforms ========= This version has been extensively checked on quite a few platforms, like Linux on Intel32 (Pentium), Win on Intel32 (Pentium), Linux on Intel64 (Itanium2), FreeBSD on AMD64 (Opteron), Linux on PowerPC (and PowerPC64) and MacOSX on PowerPC. For other platforms, chances are that the code can be easily compiled and run without further issues. Please, contact us in case you are experiencing problems. Resources ========= Go to the PyTables web site for more details: http://www.pytables.org About the HDF5 library: http://hdf.ncsa.uiuc.edu/HDF5/ About numarray: http://www.stsci.edu/resources/software_hardware/numarray To know more about the company behind the PyTables development, see: http://www.carabos.com/ Acknowledgments =============== Thanks to various the users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for a (incomplete) list of contributors. Many thanks also to SourceForge who have helped to make and distribute this package! 
And last but not least, a big thank you to THG (http://www.hdfgroup.org/) for sponsoring many of the new features recently introduced in PyTables. Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. ---- **Enjoy data!** -- The PyTables Team -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From oliphant.travis at ieee.org Sat Apr 1 12:20:01 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sat Apr 1 12:20:01 2006 Subject: [Numpy-discussion] numpy error handling In-Reply-To: <442E94AD.1040200@cox.net> References: <442DE773.4060104@cox.net> <442E2F05.5080809@ieee.org> <442E94AD.1040200@cox.net> Message-ID: <442EE026.8060806@ieee.org> Tim Hochberg wrote: >> >> You can get the numarray approach back simply by setting the error in >> the builtin scope (instead of in the local scope which is done by >> default. > > I saw that you could set it at different levels, but missed the > implications. However, it's still missing one feature, thread local > storage. I would argue that the __builtin__ data should actually be > stored in threading.local() instead of __builtin__. Then you could > setup an equivalent stack system to numpy's. Yes, the per-thread storage escaped me. But, threading.local() only exists in Python 2.4 and NumPy is supposed to be compatible with Python 2.3 What about PyThreadState_GetDict() ? and then default to use the builtin dictionary if this returns NULL? I'm actually not particularly enthused about the three name-space lookups. Changing it to only 1 place to look may be better. It would require a setting and restoring operation. A stack could be used, but why not just use local variables (i.e. save = numpy.seterr(dividebyzero='warn') ... numpy.seterr(restore=save) > > I've used the numarray error handling stuff for some time. My > experience with it has led me to the following conclusions: > > 1. You don't use it that often. I have about 26 KLOC that's "active" > and in that I use pushMode just 15 times. For comparison, I use > asarray a tad over 100 times. > 2. pushMode and popMode, modulo spelling, is the way to set errors. > Once the with statement is around, that will be even better. > 3. I, personally, would be very unlikely to use the local and global > error handling, I'd just as soon see them go away, particularly if > it helps performance, but I won't lobby for it. > This is good feedback. I have almost zero experience with changing the error handling. So, I'm not sure what features are desireable. Eliminating unnecessary name-lookups is usually a good thing. > > In numarray, the stack is in the numarray module itself (actually in > the Error object). They base their threading local behaviour off of > thread.get_ident, not threading.local. That's not clunky at all, > although it's arguably wrong since thread.get_ident can reuse ids from > dead threads. In practice it's probably hard to get into trouble doing > this, but I still wouldn't emulate it. I think that this was written > before thread local storage, so it was probably the best that could be > done. Right, but thread local storage is still Python 2.4 only.... What about PyThreadState_GetDict() ? > > However, if you use threading.local, it will be clunky in a similar > sense. You'll be storing data in a global namespace you don't control > and you've got to hope that no one stomps on your variable name. 
The PyThreadState_GetDict() documentation states that extension module writers should use a unique name based on their extension module. > When you have local and module level secret storage names as well > you're just doing a lot more of that and the chance of collision and > confusion goes up from almost zero to very small. This is true. Similar to the C-variable naming issues. >> So, we should at least frame the discussion in terms of what is >> actually possible. > > Yes, sorry for spreading misinformation. But you did point out the very important thread-local storage fact that I had missed. This alone makes me willing to revamp what we are doing. > In this case, overflow, underflow and dividebyzero seem pretty self > documenting to me. And 'invalid' is pretty cryptic in both > implementations. This may be a matter of taste, but I tend to prefer > short pithy names for functions that I use a lot, or that are crammed a > bunch to a line. In functions like this, that are more rarely used and > get a full line to themselves, I lean towards the more verbose. The rarely-used factor is a persuasive argument. > Can you elaborate on this a bit? Reading between the lines, there seem > to be two issues related to speed here. One is the actual namespace > lookup of the error mode -- there's a setting that says we are using > the defaults, so don't bother to look. This saves the namespace > lookup. Changing the defaults shouldn't affect the timing of that. > I'm not sure how this would interact with thread-local storage though. > > The second issue is that running the core loop with no checks in place > is faster. Basically, on the C-level, the error mode is an integer with specific bits allocated to the various error-possibilities (2 bits per possibility). If this is 0 then the error checking is not even done (thus no error handling at all). Yes, the name-lookup optimization could work with any defaults (but with thread-specific storage it couldn't work anyway). One question I have with threads and error handling, though: Right now, the ufuncs release the Python lock during computation (and re-acquire it to do error handling if needed). If another ufunc was started by another Python thread and ran with different error handling, wouldn't the IEEE flags get confused about which ufunc was setting what? The flags are only checked after each 1-d loop. If another thread set the processor flag, the current thread could get very confused. This seems like a problem that I'm not sure how to handle. > It's not entirely plucked out of the air. As I recall, the decision > was arrived at something like this: > > 1. Errors should never pass silently (unless explicitly silenced). > 2. Let's have everything raise by default. > 3. In practice this was no good because you often wanted to look at > the results and see where the problem was. > 4. OK, let's have everything warn. > 5. This almost worked, but underflow was almost never a real error, > so everyone always overrode underflow. A default that you always > need to override is not a good default. > 6. So, warn for everything except underflow. Ignore that. > > And that's where numarray is today. I and others have been using that > error system happily for quite some time now. At least I haven't heard > any complaints for quite a while. I can appreciate this choice, but I don't agree that errors should never pass silently. The fact that people disagree about this is the reason for the error handling.
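On the per-thread storage side of this, a minimal Python-2.3-compatible stand-in for threading.local(), keyed on thread.get_ident() the way numarray does it, might look like the sketch below (all names are made up, and it inherits the caveat Tim raised: ids of dead threads can be reused, and stale entries would need to be cleaned up somewhere):

    import thread

    _errmode_by_thread = {}   # maps thread id -> error-mode integer
    _DEFAULT_MODE = 0         # 0 = no checking at all

    def set_errmode(mode):
        _errmode_by_thread[thread.get_ident()] = mode

    def get_errmode():
        # Threads that never set a mode fall back to the default.
        return _errmode_by_thread.get(thread.get_ident(), _DEFAULT_MODE)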
Note that overflow is not detected everywhere for integers --- we have to simulate the floating-point errors for them. Only on integer multiply is it detected. Checking for it would slow down all other integer arithmetic --- one solution, of course is to have two different integer additions (one that checks for overflow and another that doesn't). There is really a bit of work left here to do. Best, -Travis From tim.hochberg at cox.net Sat Apr 1 14:01:04 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sat Apr 1 14:01:04 2006 Subject: [Numpy-discussion] numpy error handling In-Reply-To: <442EE026.8060806@ieee.org> References: <442DE773.4060104@cox.net> <442E2F05.5080809@ieee.org> <442E94AD.1040200@cox.net> <442EE026.8060806@ieee.org> Message-ID: <442EF7D9.9010404@cox.net> Travis Oliphant wrote: > Tim Hochberg wrote: > >>> >>> You can get the numarray approach back simply by setting the error >>> in the builtin scope (instead of in the local scope which is done by >>> default. >> >> >> I saw that you could set it at different levels, but missed the >> implications. However, it's still missing one feature, thread local >> storage. I would argue that the __builtin__ data should actually be >> stored in threading.local() instead of __builtin__. Then you could >> setup an equivalent stack system to numpy's. > > Yes, the per-thread storage escaped me. But, threading.local() only > exists in Python 2.4 and NumPy is supposed to be compatible with > Python 2.3 > > What about PyThreadState_GetDict() ? and then default to use the > builtin dictionary if this returns NULL? That sounds reasonable. I've never used that, but the name sounds promising! > I'm actually not particularly enthused about the three name-space > lookups. Changing it to only 1 place to look may be better. It > would require a setting and restoring operation. A stack could be > used, but why not just use local variables (i.e. > save = numpy.seterr(dividebyzero='warn') > > ... > > numpy.seterr(restore=save) That would work as well, I think. It gets a little hairy if you want to set error nestedly in a single function, but I've never done that, so I'm not too worried about it. Besides, what I really want to support is 'with', which I imagine we can support using the above as a base. >> I've used the numarray error handling stuff for some time. My >> experience with it has led me to the following conclusions: >> >> 1. You don't use it that often. I have about 26 KLOC that's "active" >> and in that I use pushMode just 15 times. For comparison, I use >> asarray a tad over 100 times. >> 2. pushMode and popMode, modulo spelling, is the way to set errors. >> Once the with statement is around, that will be even better. >> 3. I, personally, would be very unlikely to use the local and global >> error handling, I'd just as soon see them go away, particularly if >> it helps performance, but I won't lobby for it. >> > > This is good feedback. I have almost zero experience with changing > the error handling. So, I'm not sure what features are desireable. > Eliminating unnecessary name-lookups is usually a good thing. I hope some of the other numarray users chime in. A sample of one is not very good data! >> In numarray, the stack is in the numarray module itself (actually in >> the Error object). They base their threading local behaviour off of >> thread.get_ident, not threading.local. That's not clunky at all, >> although it's arguably wrong since thread.get_ident can reuse ids >> from dead threads. 
In practice it's probably hard to get into trouble >> doing this, but I still wouldn't emulate it. I think that this was >> written before thread local storage, so it was probably the best that >> could be done. > > > Right, but thread local storage is still Python 2.4 only.... > > What about PyThreadState_GetDict() ? That sounds reasonable. Essentially we would be rolling our own threading.local() >> >> However, if you use threading.local, it will be clunky in a similar >> sense. You'll be storing data in a global namespace you don't >> control and you've got to hope that no one stomps on your variable name. > > The PyThreadState_GetDict() documenation states that extension module > writers should use a unique name based on their extension module. > >> When you have local and module level secret storage names as well >> you're just doing a lot more of that and the chance of collision and >> confusion goes up from almost zero to very small. > > This is true. Similar to the C-variable naming issues. > >>> So, we should at least frame the discussion in terms of what is >>> actually possible. >> >> >> Yes, sorry for spreading misinformation. > > > But you did point out the very important thread-local storage fact > that I had missed. This alone makes me willing to revamp what we are > doing. > >> >> In this case, overflow, underflow and dividebyzero seem pretty self >> documenting to me. And 'invalid' is pretty cryptic in both >> implementations. This may be a matter of taste, but I tend to prefer >> short pithy names for functions that I use a lot, or that crammed a >> bunch to a line. In functions like this, that are more rarely used >> and get a full line to themselves I lean to towards the more verbose. > > > The rarely-used factor is a persuasive argument. > >> Can you elaborate on this a bit? Reading between the lines, there >> seem to be two issues related to speed here. One is the actual >> namespace lookup of the error mode -- there's a setting that says we >> are using the defaults, so don't bother to look. This saves the >> namespace lookup. Changing the defaults shouldn't affect the timing >> of that. I'm not sure how this would interact with thread local >> storage though. >> >> The second issue is that running the core loop with no checks in >> place is faster. > > Basically, on the C-level, the error mode is an integer with specific > bits allocated to the various error-possibilites (2-bits per > possibility). If this is 0 then the error checking is not even done > (thus no error handling at all). > Yes the name-lookup optimization could work with any defaults (but > with thread-specific storage couldn't work anyway). > > One question I have with threads and error handling though? Right > now, the ufuncs release the Python lock during computation (and > re-acquire it to do error handling if needed). If another ufunc was > started by another Python thread and ran with different error > handling, wouldn't the IEEE flags get confused about which ufunc was > setting what? The flags are only checked after each 1-d loop. If > another thread set the processor flag, the current thread could get > very confused. > > This seems like a problem that I'm not sure how to handle. Yeah, me either. It seems that somehow we'll need to block until all current operations are done, but I don't know how to do that off the top of my head. Perhaps ufuncs need to lock the flags when they start and release them when they finish. 
This looks feasible, but I'm not sure of the proper incantation to get this right. The ufuncs would all need to be able to increment and decrement the lock, whatever it is, even though they are in different threads. Meanwhile the setting code should only be able to work when the lock is unheld. It's some sort of poly thread recursive lock thing. I'll think about it, perhaps there's an obvious way. >> >> It's not entirely plucked out of the air. As I recall, the decision >> was arrived at something like this: >> >> 1. Errors should never pass silently (unless explicitly silenced). >> 2. Let's have everything raise by default. >> 3. In practice this was no good because you often wanted to look at >> the results and see where the problem was. >> 4. OK, let's have everything warn. >> 5. This almost worked, but underflow was almost never a real error, >> so everyone always overrode underflow. A default that you always >> need to override is not a good default. >> 6. So, warn for everything except underflow. Ignore that. >> >> And that's where numarray is today. I and others have been using that >> error system happily for quite some time now. At least I haven't >> heard any complaints for quite a while. > > > I can appreciate this choice, but I don't agree that errors should > never pass silently. You'll notice that we ended up with a slightly more nuanced choice. Besides, the full quote is important: "errors should not pass silently unless explicitly silenced". That's quite a bit different than a blanket "errors should never pass silently". > The fact that people disagree about this is the reason for the error > handling. Yes. While I like the above defaults, if we have a reasonable approach I can just set them at startup and forget about them. Let's try not to penalize me too much for that though. > Note that overflow is not detected everywhere for integers --- we have > to simulate the floating-point errors for them. Only on integer > multiply is it detected. Checking for it would slow down all other > integer arithmetic --- one solution, of course, is to have two > different integer additions (one that checks for overflow and another > that doesn't). Or just document it and don't worry about it. If I'm doing integer arithmetic and I need overflow detection, I can generally cast to doubles and do my math there, casting back at the end as needed. This doesn't seem worth too much extra complication. Is my floating point bias showing? > There is really a bit of work left here to do. Yep. Looks like it, but nothing insurmountable. -tim From strawman at astraw.com Sat Apr 1 15:56:03 2006 From: strawman at astraw.com (Andrew Straw) Date: Sat Apr 1 15:56:03 2006 Subject: [Numpy-discussion] numpy error handling In-Reply-To: <442EF7D9.9010404@cox.net> References: <442DE773.4060104@cox.net> <442E2F05.5080809@ieee.org> <442E94AD.1040200@cox.net> <442EE026.8060806@ieee.org> <442EF7D9.9010404@cox.net> Message-ID: <442F130E.3060802@astraw.com> Tim Hochberg wrote: > Travis Oliphant wrote: > >> >> One question I have with threads and error handling, though: Right >> now, the ufuncs release the Python lock during computation (and >> re-acquire it to do error handling if needed). If another ufunc was >> started by another Python thread and ran with different error >> handling, wouldn't the IEEE flags get confused about which ufunc was >> setting what? The flags are only checked after each 1-d loop. If >> another thread set the processor flag, the current thread could get >> very confused.
>> >> This seems like a problem that I'm not sure how to handle. > > > Yeah, me either. It seems that somehow we'll need to block until all > current operations are done, but I don't know how to do that off the > top of my head. Perhaps ufuncs need to lock the flags when they start > and release them when they finish. This looks feasible, but I'm not > sure of the proper incantation to get this right. The ufuncs would all > need to be able able to increment and decrement the lock, whatever it > is, even though they are in different threads. Meanwhile the setting > code should only be able to work when the lock is unheld. It's some > sort of poly thread recursive lock thing. I'll think about it, perhaps > there's an obvious way. I am also absolutely no expert in this area, but isn't this exactly what the kernel supports multiple threads for? In other words, I'm not sure we have to worry about it at all. I expect that the kernel sets/restores the CPU/FPU error flags on thread switches and this is part of the cost associated with switching threads. As I understand it, linux threads are actually implemented as new processes, so if we did have to be worried about this, wouldn't we also have to be worried that program A might alter the FPU error state while we're also using program B? This is just my unsophisticated and possibly wrong understanding of these things. If anyone can help clarify the issue, I'd be glad to be enlightened. Cheers! Andrew From aisaac at american.edu Sat Apr 1 16:12:01 2006 From: aisaac at american.edu (Alan G Isaac) Date: Sat Apr 1 16:12:01 2006 Subject: [Numpy-discussion] extension to xrange for numpy In-Reply-To: References: Message-ID: On Sat, 1 Apr 2006, (CEST) Arnd Baecker apparently wrote: > one python command which is extremely useful in 1D > situations is `xrange`. Which will very soon be 'range'. Cheers, Alan Isaac From gruben at bigpond.net.au Sat Apr 1 18:46:07 2006 From: gruben at bigpond.net.au (Gary Ruben) Date: Sat Apr 1 18:46:07 2006 Subject: [Numpy-discussion] extension to xrange for numpy In-Reply-To: References: Message-ID: <442F41AE.1080806@bigpond.net.au> A few rough thoughts: I'm a bit ambivalent about this. It's not very n-dimensional and enforces an x,y,z,(t?) ordering of the array dimensions which some programmers may not want to adhere to. On the occasions I've had to write code which loops over multiple dimensions, I've found the python cookbook routines for permutation and combination generators really useful so I'd find some sort of numpy iterator equivalents of these more useful. This would allow list comprehensions like [f(x,y,z) for (x,y,z) in ndrange(10,10,10)] It would also be good to have it able to specify the rank of the object returned to allow whole array rows or matrices to be returned i.e. array slices. Maybe the ndrange function could allow something like [f(xy,z) for (xy,z) in ndrange((10,0,1),10)] where you use a tuple to specify a range and the axes to slice out. [f(x,yz) for (x,yz) in ndrange(10,(10,1,2))] [f(xz,y) for (xz,y) in ndrange((10,0,2),(10,1))] On the other hand your idea would potentially make some code a lot easier to understand, so I'm not against it and if it was picked up, I'd propose "t" or "w" for the 4th dimension. It might help to post some code that you think might benefit from your idea. Gary R. Arnd Baecker wrote: > Dear numpy enthusiasts, > > one python command which is extremely useful in 1D situations > is `xrange`. 
However, for higher dimensional > settings we strongly lack the commands `yrange` and `zrange`. > These could be shorthands for the corresponding > constructs with `:,NewAxis` added. > > Any comments, suggestion and even implementations are very welcome, > > Arnd > > P.S.: What I am not sure about is the right command for > the 4-dimensional case - which letter should be used after the "z"? > (it seems that "a" would be a very natural choice...) From rob at hooft.net Sat Apr 1 22:38:04 2006 From: rob at hooft.net (Rob Hooft) Date: Sat Apr 1 22:38:04 2006 Subject: [Numpy-discussion] numpy error handling In-Reply-To: <442EE026.8060806@ieee.org> References: <442DE773.4060104@cox.net> <442E2F05.5080809@ieee.org> <442E94AD.1040200@cox.net> <442EE026.8060806@ieee.org> Message-ID: <442F7114.40908@hooft.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Travis Oliphant wrote: | save = numpy.seterr(dividebyzero='warn') | | ... | | numpy.seterr(restore=save) Most of this discussion is outside of my scope, but I have programmed this kind of pattern in a different way before: ~ save = context.push(something) ~ ... ~ del save i.e. the destructor of the saved context object restores the old situation. In most cases it will be called by letting "save" go out of scope. I know that relying on timely object destruction can be troublesome when porting to Jython, but it is very convenient in CPython. If that goes too far, one could make a separate method on save: ~ save.pop() This can do sanity checking too (are we really at the top of the stack? Only called once?). The destructor should check whether pop has been called. Rob - -- Rob W.W. Hooft || rob at hooft.net || http://www.hooft.net/people/rob/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFEL3EUH7J/Cv8rb3QRAuvsAJ9PO6ZITdVSm+hIwxkWDHHbTNFHdQCcDSWI Iv7gupkFc8+Fby/5MFwHQf4= =zE/o -----END PGP SIGNATURE----- From aisaac at american.edu Sun Apr 2 06:58:34 2006 From: aisaac at american.edu (Alan G Isaac) Date: Sun Apr 2 06:58:34 2006 Subject: [Numpy-discussion] extension to xrange for numpy In-Reply-To: <442F41AE.1080806@bigpond.net.au> References: <442F41AE.1080806@bigpond.net.au> Message-ID: On Sun, 02 Apr 2006, Gary Ruben apparently wrote: > I'd find some sort of numpy iterator equivalents of these more > useful. This would allow list comprehensions like > [f(x,y,z) for (x,y,z) in ndrange(10,10,10)] How is this better than using ogrid? E.g., >>> x=N.ogrid[:3,:2] >>> N.power(*x) array([[1, 0], [1, 1], [1, 2]]) Thanks, Alan From cjw at sympatico.ca Sun Apr 2 07:22:09 2006 From: cjw at sympatico.ca (Colin J. Williams) Date: Sun Apr 2 07:22:09 2006 Subject: [Numpy-discussion] first impressions with numpy In-Reply-To: <442DD638.60706@cox.net> References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net> <442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu> <442DD638.60706@cox.net> Message-ID: <442FDDD5.8050404@sympatico.ca> Tim Hochberg wrote: > Sebastian Haase wrote: > >> Thanks Tim, >> that's OK - I got the idea... >> BTW, is there a (policy) reason that you sent the first email just to >> me and not the mailing list !? > > > No. Just clumsy fingers. Probably the same reason the functions got > all garbled! > >> >> I would really be more interested in comments to my first point ;-) >> I think it's important that numpy will not be to cryptic and only for >> "hackers", but nice to look at ... 
(hope you get what I mean ;-) > > Well, I think it's probably a good idea and it sounds like Travis likes > the idea "for some of the builtin types". I suspect that's code for > "not types for which it doesn't make sense, like recarrays". > Tim, Could you elaborate on this please? Surely, it would be good for all functions and methods to have meaningful parameter lists and good doc strings. Colin W. From tim.hochberg at cox.net Sun Apr 2 08:11:17 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 2 08:11:17 2006 Subject: [Numpy-discussion] first impressions with numpy In-Reply-To: <442FDDD5.8050404@sympatico.ca> References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net> <442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu> <442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca> Message-ID: <442FE950.8090000@cox.net> Colin J. Williams wrote: > Tim Hochberg wrote: > >> Sebastian Haase wrote: >> >>> Thanks Tim, >>> that's OK - I got the idea... >>> BTW, is there a (policy) reason that you sent the first email just >>> to me and not the mailing list !? >> >> >> >> No. Just clumsy fingers. Probably the same reason the functions got >> all garbled! >> >>> >>> I would really be more interested in comments to my first point ;-) >>> I think it's important that numpy will not be too cryptic and only >>> for "hackers", but nice to look at ... (hope you get what I mean ;-) >> >> >> >> Well, I think it's probably a good idea and it sounds like Travis >> likes the idea "for some of the builtin types". I suspect that's code >> for "not types for which it doesn't make sense, like recarrays". >> > Tim, > > Could you elaborate on this please? Surely, it would be good for all > functions and methods to have meaningful parameter lists and good doc > strings. This isn't really about parameter lists and docstrings, it's about __str__ and possibly __repr__. The basic issue is that the way dtypes are displayed is powerful, but unfriendly. If I create an array of integers: >>> a = arange(4) >>> print repr(a.dtype), str(a.dtype) dtype('<i4') int32 It would be friendlier if the repr here were simply dtype(int32). On the other hand, a byte-swapped dtype('>i4') is not the same as dtype(int32) on my machine and should probably not be displayed using int32[1]. These cases should be rare in practice and it seems fine to fall back to the less friendly but more flexible notation. Recarrays were probably not such a good example. Here is an example from a recarray: dtype([('x', '<f8'), ('z', '<c16')]) This would work fine if repr were instead: dtype([('x', float64), ('z', complex128)]) Anyway, this all seems reasonable to me at first glance. That said, I don't plan to work on this, I've got other fish to fry at the moment. [1] Confusingly, dtype('>i4').name is 'int32', which seems wrong. From tim.hochberg at cox.net Sun Apr 2 08:41:24 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 2 08:41:24 2006 Subject: [Numpy-discussion] numpy error handling In-Reply-To: <442F7114.40908@hooft.net> References: <442DE773.4060104@cox.net> <442E2F05.5080809@ieee.org> <442E94AD.1040200@cox.net> <442EE026.8060806@ieee.org> <442F7114.40908@hooft.net> Message-ID: <442FF03F.2000406@cox.net> Rob Hooft wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Travis Oliphant wrote: > | save = numpy.seterr(dividebyzero='warn') > | > | ... > | > | numpy.seterr(restore=save) > > Most of this discussion is outside of my scope, but I have programmed > this kind of pattern in a different way before: > > ~ save = context.push(something) > ~ ... > ~ del save > > i.e. the destructor of the saved context object restores the old > situation. In most cases it will be called by letting "save" go out of > scope. I know that relying on timely object destruction can be > troublesome when porting to Jython, but it is very convenient in CPython.
> > If that goes too far, one could make a separate method on save: > > ~ save.pop() > > This can do sanity checking too (are we really at the top of the stack? > Only called once?). The destructor should check whether pop has been > called. Well, the syntax that *I* really want is this: class error_mode(object): def __init__(self, all=None, overflow=None, underflow=None, dividebyzero=None, invalid=None): self._args = (overflow, overflow, underflow, dividebyzero, invalid) def __enter__(self): self._save = numpy.seterr(*self._args) def __exit__(self): numpy.seterr(self._save) That way, in a few months, I can do this: with error_mode(overflow='raise'): # do stuff and it will be almost impossible to mess up. This syntax is lighter and cleaner than a stack or relying on garbage collection to free the resources. So, for my purposes, the simple syntax Travis proposes is perfectly adequate and simpler to implement and get right than a stack based approach. If 'with' wasn't coming down the pipe, I would push for a stack, but I like Travis' proposal just fine. YMMV of course. -tim From tim.hochberg at cox.net Sun Apr 2 08:52:09 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 2 08:52:09 2006 Subject: [Numpy-discussion] observations Message-ID: <442FF2F8.3030906@cox.net> I've been doing a *lot* of playing with numpy over the last several days, so expect various observations to trickle from my abode over the next week or so. Here's the first installment. * tostring probably needs the order flag. I think you want the string generated from a multidimensional array in Fortran and C order to differ. * With the evolution of the order flag, ascontiguousarray is probably redundant, scarcely after it was added. b = asarray(a, order="C") Is actually clearer in intent than: b = ascontiguousarray(a) Does the latter leave a contiguous, Fortran order array alone? That's probably almost never what one wants. Unless your working with Fortran arrays, in which case the opposite ambiguity applies. Regards, -tim From tim.hochberg at cox.net Sun Apr 2 11:20:03 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 2 11:20:03 2006 Subject: [Numpy-discussion] extension to xrange for numpy In-Reply-To: <442F41AE.1080806@bigpond.net.au> References: <442F41AE.1080806@bigpond.net.au> Message-ID: <44301590.4050707@cox.net> Gary Ruben wrote: > A few rough thoughts: > > I'm a bit ambivalent about this. It's not very n-dimensional and > enforces an x,y,z,(t?) ordering of the array dimensions which some > programmers may not want to adhere to. On the occasions I've had to > write code which loops over multiple dimensions, I've found the python > cookbook routines for permutation and combination generators really > useful > > > > > so I'd find some sort of numpy iterator equivalents of these more > useful. This would allow list comprehensions like > > [f(x,y,z) for (x,y,z) in ndrange(10,10,10)] > > It would also be good to have it able to specify the rank of the > object returned to allow whole array rows or matrices to be returned > i.e. array slices. Maybe the ndrange function could allow something like > > [f(xy,z) for (xy,z) in ndrange((10,0,1),10)] > where you use a tuple to specify a range and the axes to slice out. > [f(x,yz) for (x,yz) in ndrange(10,(10,1,2))] > [f(xz,y) for (xz,y) in ndrange((10,0,2),(10,1))] > > On the other hand your idea would potentially make some code a lot > easier to understand, so I'm not against it and if it was picked up, > I'd propose "t" or "w" for the 4th dimension. 
It might help to post > some code that you think might benefit from your idea. Bah, humbug! "Not every two-line Python function has to come pre-written" -- Tim Peters on C.L.P def xrange(*args, **kwargs): return arange(*args, **kwargs) def yrange(*args, **kwargs): return padshape(arange(*args, **kwargs), 2) def zrange(*args, **kwargs): return padshape(arange(*args, **kwargs), 3) def trange(*args, **kwargs): return padshape(arange(*args, **kwargs), 4) Of course, then you need padshape, which I'd be happy to contribute. I'm of the opinion that we should be trying to improve the usefulness of a smallish set of core primitives, not adding endless new functions. Stuff like this, which is of interest in a relatively limited domain and is trivial to implement when needed, should either not be added at all, or added in a separate module. >>> len(dir(numpy)) 476 Does anyone know what all of that does? I certainly don't. And I doubt anyone uses more than a fraction of that interface. I wouldn't be the least bit surprised if there are old moldy parts of that that are essentially unused. And, unused code is buggy code in my experience. "Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." -- Antoine de Saint-Exupery It's probably difficult at this point in numpy's life cycle to remove stuff or even reorganize things substantially. Besides, I'm sure all the developers have their hands full doing more important, or at least less contentious, things. Still, I think we should cast a more critical eye on new stuff before adding it. Regards, -tim > > Gary R. > > Arnd Baecker wrote: > >> Dear numpy enthusiasts, >> >> one python command which is extremely useful in 1D situations >> is `xrange`. However, for higher dimensional >> settings we strongly lack the commands `yrange` and `zrange`. >> These could be shorthands for the corresponding >> constructs with `:,NewAxis` added. >> >> Any comments, suggestion and even implementations are very welcome, >> >> Arnd >> >> P.S.: What I am not sure about is the right command for >> the 4-dimensional case - which letter should be used after the "z"? >> (it seems that "a" would be a very natural choice...) From arnd.baecker at web.de Sun Apr 2 11:23:04 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Sun Apr 2 11:23:04 2006 Subject: [Numpy-discussion] extension to xrange for numpy In-Reply-To: <442F41AE.1080806@bigpond.net.au> References: <442F41AE.1080806@bigpond.net.au> Message-ID: Hi, On Sun, 2 Apr 2006, Gary Ruben wrote: > A few rough thoughts: [... useful stuff snipped ... ] > On the other hand your idea would potentially make some code a lot > easier to understand, so I'm not against it and if it was picked up, I'd > propose "t" or "w" for the 4th dimension. It might help to post some > code that you think might benefit from your idea. Hope you don't jump at me, but I would like to wait until April 1st next year then ...
((hmm, maybe my post contained too much of a possible truth to be considered as an April fools joke - yrange and zrange have been a running gag in our group for a while now - strange German humor ...;-)) Anyway, I hope I did not waste too much of your time ... Best, Arnd > Gary R. > > Arnd Baecker wrote: > > Dear numpy enthusiasts, > > > > one python command which is extremely useful in 1D situations > > is `xrange`. However, for higher dimensional > > settings we strongly lack the commands `yrange` and `zrange`. > > These could be shorthands for the corresponding > > constructs with `:,NewAxis` added. > > > > Any comments, suggestion and even implementations are very welcome, > > > > Arnd > > > > P.S.: What I am not sure about is the right command for > > the 4-dimensional case - which letter should be used after the "z"? > > (it seems that "a" would be a very natural choice...) > > From tim.hochberg at cox.net Sun Apr 2 11:34:03 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 2 11:34:03 2006 Subject: [Numpy-discussion] extension to xrange for numpy In-Reply-To: References: <442F41AE.1080806@bigpond.net.au> Message-ID: <44301908.2000607@cox.net> Arnd Baecker wrote: >Hi, > >On Sun, 2 Apr 2006, Gary Ruben wrote: > > > >>A few rough thoughts: >> >> > >[... useful stuff snipped ... ] > > > >>On the other hand your idea would potentially make some code a lot >>easier to understand, so I'm not against it and if it was picked up, I'd >>propose "t" or "w" for the 4th dimension. It might help to post some >>code that you think might benefit from your idea. >> >> > >Hope you don't jump at me, but I would like to >wait until April 1st next year then ... >((hmm, maybe my post contained too much of a possible truth >to be considered as an April fools joke - >yrange and zrange have been a running gag in our group for >a while now - strange German humor ...;-)) > >Anyway, I hope I did not waste too much of your time ... > > Ouch! Got me anyway... >Best, Arnd > > > > >>Gary R. >> >>Arnd Baecker wrote: >> >> >>>Dear numpy enthusiasts, >>> >>>one python command which is extremely useful in 1D situations >>>is `xrange`. However, for higher dimensional >>>settings we strongly lack the commands `yrange` and `zrange`. >>>These could be shorthands for the corresponding >>>constructs with `:,NewAxis` added. >>> >>>Any comments, suggestion and even implementations are very welcome, >>> >>>Arnd >>> >>>P.S.: What I am not sure about is the right command for >>>the 4-dimensional case - which letter should be used after the "z"? >>>(it seems that "a" would be a very natural choice...) >>> >>> >> >> > > >------------------------------------------------------- >This SF.Net email is sponsored by xPML, a groundbreaking scripting language >that extends applications into web and mobile media. Attend the live webcast >and join the prime developer group breaking into this new coding territory! 
>http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 >_______________________________________________ >Numpy-discussion mailing list >Numpy-discussion at lists.sourceforge.net >https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > > From schofield at ftw.at Sun Apr 2 13:05:02 2006 From: schofield at ftw.at (Ed Schofield) Date: Sun Apr 2 13:05:02 2006 Subject: [Numpy-discussion] Deprecating old names In-Reply-To: <44301590.4050707@cox.net> References: <442F41AE.1080806@bigpond.net.au> <44301590.4050707@cox.net> Message-ID: <44302EA9.9050302@ftw.at> Tim Hochberg wrote, in a different thread: > >>> len(dir(numpy)) > 476 > > Does anyone know what all of that does? I certainly don't. And I doubt > anyone uses more than a fraction of that interface. I wouldn't be the > least bit suprised if there are old moldy parts of that are > essentially used. And, unused code is buggy code in my experience. > > "Perfection is achieved, not when there is nothing more to add, but > when there is nothing left to take away." -- Antoine de Saint-Exupery I'd like to revise a proposal I made last week. Then I proposed that we reduce namespace clutter by not importing the contents of the oldnumeric namespace by default. But Travis didn't want to deprecate the functional interfaces (sum(), take(), etc), so I now propose instead that we split up the contents of oldnumeric.py into interfaces we want to keep around indefinitely and interfaces we don't. The ones we want to keep could go into another file, e.g. fromnumeric.py, whose contents are imported into the numpy namespace by default. The deprecated ones could stay in oldnumeric.py, and could be accessible through 'from oldnumeric import *' at the top of source files, but not imported by default. Strong candidates for deprecation are the capitalised type names, like Int8, Complex64, UnsignedInt. I'd also argue for deprecating NewAxis, UFuncType, ArrayType, arraytype, and anything else that duplicates functionality available under NumPy under a different name. Two of the Python design principles (from http://www.python.org/dev/culture/) are: - There should be one -- and preferably only one -- obvious way to do it. - Namespaces are one honking great idea -- let's do more of those! Let's clean up the cruft! -- Ed From gruben at bigpond.net.au Sun Apr 2 16:06:10 2006 From: gruben at bigpond.net.au (Gary Ruben) Date: Sun Apr 2 16:06:10 2006 Subject: [Numpy-discussion] extension to xrange for numpy In-Reply-To: References: <442F41AE.1080806@bigpond.net.au> Message-ID: <443058AE.2070808@bigpond.net.au> Doh! It's OK Arnd; I've recently seen you (or someone else withe the same name) acknowledged in a PhD I've been reading so I suspect you're a nice guy :-) And, thanks Alan. I knew about mgrid but not ogrid. One small way in which that example might be better than using ogrid is that you could avoid creating the index arrays and lazily generate the indices. However, ogrid is better than mgrid in this respect. thanks, Gary Alan G Isaac wrote: > On Sun, 02 Apr 2006, Gary Ruben apparently wrote: >> I'd find some sort of numpy iterator equivalents of these more >> useful. This would allow list comprehensions like >> [f(x,y,z) for (x,y,z) in ndrange(10,10,10)] > > How is this better than using ogrid? 
E.g., > >>>> x=N.ogrid[:3,:2] >>>> N.power(*x) > array([[1, 0], > [1, 1], > [1, 2]]) > > Thanks, > Alan From zpincus at stanford.edu Sun Apr 2 16:07:07 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Sun Apr 2 16:07:07 2006 Subject: [Numpy-discussion] Speed up function on cross product of two sets? Message-ID: Hi folks, I have an inner loop that looks like this: out = [] for elem1 in l1: for elem2 in l2: out.append(do_something(l1, l2)) result = do_something_else(out) where do_something and do_something_else are implemented with only numpy ufuncs, and l1 and l2 are numpy arrays. As an example, I need to compute the median distance from any element in one set to any element in another set. What's the best way to speed this sort of thing up with numpy (e.g. push as much down into the underlying C as possible)? I could re- write do_something with the numexpr tools (which are very cool), but that doesn't address the fact that I've still got nested loops living in Python. Perhaps there's some way in numpy to make one big honking array that contains all the pairs from the two lists, and then just run my do_something on that huge array, but that of course scales poorly. Any thoughts? Zach From tim.hochberg at cox.net Sun Apr 2 16:53:05 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 2 16:53:05 2006 Subject: [Numpy-discussion] Speed up function on cross product of two sets? In-Reply-To: References: Message-ID: <443063C0.3050002@cox.net> Zachary Pincus wrote: > Hi folks, Hi Zach, > > I have an inner loop that looks like this: > out = [] > for elem1 in l1: > for elem2 in l2: > out.append(do_something(l1, l2)) this is do_something(elem1, elem2), correct? > result = do_something_else(out) > > where do_something and do_something_else are implemented with only > numpy ufuncs, and l1 and l2 are numpy arrays. > > As an example, I need to compute the median distance from any element > in one set to any element in another set. > > What's the best way to speed this sort of thing up with numpy (e.g. > push as much down into the underlying C as possible)? I could re- > write do_something with the numexpr tools (which are very cool), but > that doesn't address the fact that I've still got nested loops living > in Python. The exact approach I'd take would depend on the sizes of l1 and l2 and a certain amount of trial and error. However, the first thing I'd try is: n1 = len(l1) n2 = len(l2) out = numpy.zeros([n1*n2], appropriate_dtype) for i, elem1 in enumerate(l1): out[i*n2:(i+1)*n2] = do_something(elem1, l2) result = do_something_else(out) That may work as is, or you may have to tweak do_something slightly to handle l2 correctly. You might also try to do the operations in place and stuff the results into out directly by using X= and three argument ufuncs. I'd not do that at first though. One thing to consider is that, in my experience, numpy works best on chunks of about 10,000 elements. I believe that this is a function of cache size. Anyway, this may affect the choice of which of l1 and l2 you continue to loop over, and which you vectorize. If they both might get really big, you could even consider chopping up l2 when you vectorize it. Again I wouldn't do that unless it really looks like you need it. If that all sounds opaque, feel free to ask more questions. Or if you have questions about microoptimizing the guts of do_something, I have a bunch of experience with that and I like a good puzzle.
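Filled in for the median-distance example, the sketch above becomes something like the following (the data sets and the absolute-difference "do_something" are stand-ins for illustration):

import numpy

l1 = numpy.random.rand(100)     # stand-in for the first set
l2 = numpy.random.rand(250)     # stand-in for the second set

n1, n2 = len(l1), len(l2)
out = numpy.zeros(n1 * n2, dtype=float)
for i, elem1 in enumerate(l1):
    # the vectorized inner loop: distances from one element of l1
    # to every element of l2, written into one slice of out
    out[i*n2:(i+1)*n2] = numpy.absolute(elem1 - l2)

# "do_something_else": the median of all the pairwise distances
out.sort()
m = len(out)
result = 0.5 * (out[(m - 1) // 2] + out[m // 2])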
> > Perhaps there's some way in numpy to make one big honking array that > contains all the pairs from the two lists, and then just run my > do_something on that huge array, but that of course scales poorly. I know of at least one way, but it's a bit of a kludge. I don't think I'd try that though. As you said, it scales poorly. As long as you can vectorize your inner loop, it's not necessary and sometimes makes things worse, to vectorize your outer loop as well. That's assuming your inner loop is large, it doesn't help if your inner loop is 3 elements long for instance, but that doesn't seem like it should be a problem here. Regards, -tim From haase at msg.ucsf.edu Sun Apr 2 17:01:04 2006 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Sun Apr 2 17:01:04 2006 Subject: [Numpy-discussion] first impressions with numpy In-Reply-To: <442FE950.8090000@cox.net> References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net> <442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu> <442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca> <442FE950.8090000@cox.net> Message-ID: <44306594.50305@msg.ucsf.edu> Tim Hochberg wrote: > This would work fine if repr were instead: > > dtype([('x', float64), ('z', complex128)]) > > Anyway, this all seems reasonable to me at first glance. That said, I > don't plan to work on this, I've got other fish to fry at the moment. A new point: Please remind me (and probably others): when did it get decided to introduce 'complex128' to mean numarray's complex64 and the 'complex64' to mean numarray's complex32 ? I do understand the logic that 128 is really the bit-size of one (complex) element - but I also liked the old way, because: 1. e.g. in fft transforms, float32 would "go with" complex32 and float64 with complex64 2. complex128 is one character extra (longer) and also (alphabetically) now sorts before(!) complex64 These might just be my personal (idiotic ;-) comments - but I would appreciate some feedback/comments. Also: Is it now to late to (re-)start a discussion on this !? Thanks - Sebastian Haase From haase at msg.ucsf.edu Sun Apr 2 17:09:07 2006 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Sun Apr 2 17:09:07 2006 Subject: [Numpy-discussion] first impressions with numpy In-Reply-To: <442FE950.8090000@cox.net> References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net> <442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu> <442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca> <442FE950.8090000@cox.net> Message-ID: <44306774.5030507@msg.ucsf.edu> Tim Hochberg wrote: > This would work fine if repr were instead: > > dtype([('x', float64), ('z', complex128)]) > > Anyway, this all seems reasonable to me at first glance. That said, I > don't plan to work on this, I've got other fish to fry at the moment. A new point: Please remind me (and probably others): when did it get decided to introduce 'complex128' to mean numarray's complex64 and the 'complex64' to mean numarray's complex32 ? I do understand the logic that 128 is really the bit-size of one (complex) element - but I also liked the old way, because: 1. e.g. in fft transforms, float32 would "go with" complex32 and float64 with complex64 2. complex128 is one character extra (longer) and also (alphabetically) now sorts before(!) complex64 3 Mostly of course: this new naming will confuse all my code and introduce hard to find bugs - when I see complex64 I will "think" the old way for quite some time ... 
These might just be my personal (idiotic ;-) comments - but I would appreciate some feedback/comments. Also: Is it now to late to (re-)start a discussion on this !? Thanks - Sebastian Haase From zpincus at stanford.edu Sun Apr 2 17:17:06 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Sun Apr 2 17:17:06 2006 Subject: [Numpy-discussion] Speed up function on cross product of two sets? In-Reply-To: <443063C0.3050002@cox.net> References: <443063C0.3050002@cox.net> Message-ID: <8717BF2C-01EB-422A-B63A-7807E607DEE9@stanford.edu> Tim - Thanks for your suggestions -- that all makes good sense. It sounds like the general take home message is, as always: "the first thing to try is to vectorize your inner loop." Zach >> I have a inner loop that looks like this: >> out = [] >> for elem1 in l1: >> for elem2 in l2: >> out.append(do_something(l1, l2)) > > this is do_something(elem1, elem2), correct? > >> result = do_something_else(out) >> >> where do_something and do_something_else are implemented with >> only numpy ufuncs, and l1 and l2 are numpy arrays. >> >> As an example, I need to compute the median distance from any >> element in one set to any element in another set. >> >> What's the best way to speed this sort of thing up with numpy >> (e.g. push as much down into the underlying C as possible)? I >> could re- write do_something with the numexpr tools (which are >> very cool), but that doesn't address the fact that I've still got >> nested loops living in Python. > > The exact approach I'd take would depend on the sizes of l1 and l2 > and a certain amount of trial and error. However, the first thing > I'd try is: > > n1 = len(l1) > n2 = len(l2) > out = numpy.zeros([n1*n2], appropriate_dtype) > for i, elem1 in enumerate(l1): > out[i*n2:(i+1)*n2] = do_something(elem1, l1) > result = do_something_else(out) > > That may work as is, or you may have to tweak do_something slightly > to handle l1 correctly. You might also try to do the operations in > place and stuff the results into out directly by using X= and three > argument ufuncs. I'd not do that at first though. > > One thing to consider is that, in my experience, numpy works best > on chunks of about 10,000 elements. I believe that this is a > function of cache size. Anyway, this may choice of which of l1 and > l2 you continue to loop over, and which you vectorize. If they both > might get really big, you could even consider chopping up l1 when > you vectorize it. Again I wouldn't do that unless it really looks > like you need it. > > If that all sounds opaque, feel free to ask more questions. Or if > you have questions about microoptimizing the guts of do_something, > I have a bunch of experience with that and I like a good puzzle. > >> >> Perhaps there's some way in numpy to make one big honking array >> that contains all the pairs from the two lists, and then just run >> my do_something on that huge array, but that of course scales >> poorly. > > I know of at least one way, but it's a bit of a kludge. I don't > think I'd try that though. As you said, it scales poorly. As long > as you can vectorize your inner loop, it's not necessary and > sometimes makes things worse, to vectorize your outer loop as well. > That's assuming your inner loop is large, it doesn't help if your > inner loop is 3 elements long for instance, but that doesn't seem > like it should be a problem here. 
> > Regards, > > -tim > From haase at msg.ucsf.edu Sun Apr 2 17:21:14 2006 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Sun Apr 2 17:21:14 2006 Subject: [Fwd: Re: [Numpy-discussion] first impressions with numpy] Message-ID: <44306A2C.4040606@msg.ucsf.edu> supposedly meant for the whole list ... From: Tim Hochberg Sebastian Haase wrote: > Tim Hochberg wrote: > > >> This would work fine if repr were instead: >> >> dtype([('x', float64), ('z', complex128)]) >> >> Anyway, this all seems reasonable to me at first glance. That said, I >> don't plan to work on this, I've got other fish to fry at the moment. > > > A new point: Please remind me (and probably others): when did it get > decided to introduce 'complex128' to mean numarray's complex64 > and the 'complex64' to mean numarray's complex32 ? I haven't the faintest idea -- it happened when I was off in Numarray land I assume. Or it was always that way? No idea. Hopefully Travis will answer this. -tim > > I do understand the logic that 128 is really the bit-size of one > (complex) element - but I also liked the old way, because: > 1. e.g. in fft transforms, float32 would "go with" complex32 > and float64 with complex64 > 2. complex128 is one character extra (longer) and also > (alphabetically) now sorts before(!) complex64 > 3 Mostly of course: this new naming will confuse all my code and > introduce hard to find bugs - when I see complex64 I will "think" the > old way for quite some time ... > > > These might just be my personal (idiotic ;-) comments - but I would > appreciate some feedback/comments. > Also: Is it now to late to (re-)start a discussion on this !? > > Thanks > - Sebastian Haase > > From tim.hochberg at cox.net Sun Apr 2 17:53:01 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 2 17:53:01 2006 Subject: [Numpy-discussion] Speed up function on cross product of two sets? In-Reply-To: <8717BF2C-01EB-422A-B63A-7807E607DEE9@stanford.edu> References: <443063C0.3050002@cox.net> <8717BF2C-01EB-422A-B63A-7807E607DEE9@stanford.edu> Message-ID: <443071BA.4090606@cox.net> Zachary Pincus wrote: > Tim - > > Thanks for your suggestions -- that all makes good sense. > > It sounds like the general take home message is, as always: "the > first thing to try is to vectorize your inner loop." Exactly and far more pithy than my meanderings. If I were going to make a list it would look something like: 0. Think about your algorithm. 1. Vectorize your inner loop. 2. Eliminate temporaries 3. Ask for help 4. Recode in C. 5 Accept that your code will never be fast. Step zero should probably be repeated after every other step ;) -tim > > Zach > > >>> I have a inner loop that looks like this: >>> out = [] >>> for elem1 in l1: >>> for elem2 in l2: >>> out.append(do_something(l1, l2)) >> >> >> this is do_something(elem1, elem2), correct? >> >>> result = do_something_else(out) >>> >>> where do_something and do_something_else are implemented with only >>> numpy ufuncs, and l1 and l2 are numpy arrays. >>> >>> As an example, I need to compute the median distance from any >>> element in one set to any element in another set. >>> >>> What's the best way to speed this sort of thing up with numpy >>> (e.g. push as much down into the underlying C as possible)? I >>> could re- write do_something with the numexpr tools (which are very >>> cool), but that doesn't address the fact that I've still got >>> nested loops living in Python. 
>> >> >> The exact approach I'd take would depend on the sizes of l1 and l2 >> and a certain amount of trial and error. However, the first thing >> I'd try is: >> >> n1 = len(l1) >> n2 = len(l2) >> out = numpy.zeros([n1*n2], appropriate_dtype) >> for i, elem1 in enumerate(l1): >> out[i*n2:(i+1)*n2] = do_something(elem1, l1) >> result = do_something_else(out) >> >> That may work as is, or you may have to tweak do_something slightly >> to handle l1 correctly. You might also try to do the operations in >> place and stuff the results into out directly by using X= and three >> argument ufuncs. I'd not do that at first though. >> >> One thing to consider is that, in my experience, numpy works best on >> chunks of about 10,000 elements. I believe that this is a function >> of cache size. Anyway, this may choice of which of l1 and l2 you >> continue to loop over, and which you vectorize. If they both might >> get really big, you could even consider chopping up l1 when you >> vectorize it. Again I wouldn't do that unless it really looks like >> you need it. >> >> If that all sounds opaque, feel free to ask more questions. Or if >> you have questions about microoptimizing the guts of do_something, I >> have a bunch of experience with that and I like a good puzzle. >> >>> >>> Perhaps there's some way in numpy to make one big honking array >>> that contains all the pairs from the two lists, and then just run >>> my do_something on that huge array, but that of course scales poorly. >> >> >> I know of at least one way, but it's a bit of a kludge. I don't >> think I'd try that though. As you said, it scales poorly. As long >> as you can vectorize your inner loop, it's not necessary and >> sometimes makes things worse, to vectorize your outer loop as well. >> That's assuming your inner loop is large, it doesn't help if your >> inner loop is 3 elements long for instance, but that doesn't seem >> like it should be a problem here. >> >> Regards, >> >> -tim >> > > > From oliphant.travis at ieee.org Sun Apr 2 21:14:01 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sun Apr 2 21:14:01 2006 Subject: [Numpy-discussion] Deprecating old names In-Reply-To: <44302EA9.9050302@ftw.at> References: <442F41AE.1080806@bigpond.net.au> <44301590.4050707@cox.net> <44302EA9.9050302@ftw.at> Message-ID: <4430A0BF.1080207@ieee.org> Ed Schofield wrote: > Tim Hochberg wrote, in a different thread: > >> >>> len(dir(numpy)) >> 476 >> >> Does anyone know what all of that does? I certainly don't. And I doubt >> anyone uses more than a fraction of that interface. I wouldn't be the >> least bit suprised if there are old moldy parts of that are >> essentially used. And, unused code is buggy code in my experience. >> >> "Perfection is achieved, not when there is nothing more to add, but >> when there is nothing left to take away." -- Antoine de Saint-Exupery >> > > I'd like to revise a proposal I made last week. Then I proposed that we > reduce namespace clutter by not importing the contents of the oldnumeric > namespace by default. But Travis didn't want to deprecate the > functional interfaces (sum(), take(), etc), so I now propose instead > that we split up the contents of oldnumeric.py into interfaces we want > to keep around indefinitely and interfaces we don't. Good idea... -Travis From rob at hooft.net Sun Apr 2 22:46:09 2006 From: rob at hooft.net (Rob W.W. 
Hooft) Date: Sun Apr 2 22:46:09 2006 Subject: [Fwd: Re: [Numpy-discussion] first impressions with numpy] In-Reply-To: <44306A2C.4040606@msg.ucsf.edu> References: <44306A2C.4040606@msg.ucsf.edu> Message-ID: <4430B5D6.7020907@hooft.net> Sebastian Haase wrote: >> A new point: Please remind me (and probably others): when did it get >> decided to introduce 'complex128' to mean numarray's complex64 >> and the 'complex64' to mean numarray's complex32 ? > > > I haven't the faintest idea -- it happened when I was off in Numarray > land I assume. Or it was always that way? No idea. Hopefully Travis will > answer this. Fortran heritage? REAL*8 is paired with COMPLEX*16 there.... Regards, Rob Hooft From arnd.baecker at web.de Mon Apr 3 02:18:08 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Mon Apr 3 02:18:08 2006 Subject: [Numpy-discussion] Speed up function on cross product of two sets? In-Reply-To: References: Message-ID: Hi, On Sun, 2 Apr 2006, Zachary Pincus wrote: > Hi folks, > > I have a inner loop that looks like this: > out = [] > for elem1 in l1: > for elem2 in l2: > out.append(do_something(l1, l2)) > result = do_something_else(out) > > where do_something and do_something_else are implemented with only > numpy ufuncs, and l1 and l2 are numpy arrays. > > As an example, I need to compute the median distance from any element > in one set to any element in another set. > > What's the best way to speed this sort of thing up with numpy (e.g. > push as much down into the underlying C as possible)? I could re- > write do_something with the numexpr tools (which are very cool), but > that doesn't address the fact that I've still got nested loops living > in Python. If do_something eats arrays, you could try: result = do_something(l1[:,NewAxis], l2) E.g.: from numpy import * l1 = linspace(0.0, pi, 10) l2 = linspace(0.0, pi, 3) def f(y, x): return sin(y)*cos(x) print f(l1[:,NewAxis], l2) ((Note that I just learned in some other thread that with numpy there is an alternative to NewAxis, but I haven't figured out which that is ...)) Best, Arnd From zpincus at stanford.edu Mon Apr 3 08:50:10 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Mon Apr 3 08:50:10 2006 Subject: [Numpy-discussion] Speed up function on cross product of two sets? In-Reply-To: <443071BA.4090606@cox.net> References: <443063C0.3050002@cox.net> <8717BF2C-01EB-422A-B63A-7807E607DEE9@stanford.edu> <443071BA.4090606@cox.net> Message-ID: > If I were going to make a list it would look something like: > > 0. Think about your algorithm. > 1. Vectorize your inner loop. > 2. Eliminate temporaries > 3. Ask for help > 4. Recode in C. > 5 Accept that your code will never be fast. > > Step zero should probably be repeated after every other step ;) Thanks for this list -- it's a good one. Since we're discussing this, could I ask about the best way to eliminate temporaries? If you're using ufuncs, is there some way to make them work in-place? Or is the lowest-hanging fruit (temporary- wise) typically elsewhere? Zach From tim.hochberg at cox.net Mon Apr 3 10:10:40 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Mon Apr 3 10:10:40 2006 Subject: [Numpy-discussion] Speed up function on cross product of two sets? In-Reply-To: References: Message-ID: <44315633.4010600@cox.net> Arnd Baecker wrote: [SNIP] >((Note that I just learned in some other thread that with numpy there is >an alternative to NewAxis, but I haven't figured out which that is ...)) > > If you're old school you could just use None. But you probably mean 'newaxis'. 
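A quick sketch confirming that the spellings agree, using Arnd's example values:

import numpy

l1 = numpy.linspace(0.0, numpy.pi, 10)
l2 = numpy.linspace(0.0, numpy.pi, 3)

def f(y, x):
    return numpy.sin(y) * numpy.cos(x)

# numpy.newaxis is simply an alias for None, so these are equivalent
a = f(l1[:, numpy.newaxis], l2)
b = f(l1[:, None], l2)
print a.shape, numpy.all(a == b)    # (10, 3) True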
-tim From robert.kern at gmail.com Mon Apr 3 10:19:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon Apr 3 10:19:02 2006 Subject: [Numpy-discussion] Re: Speed up function on cross product of two sets? In-Reply-To: References: <443063C0.3050002@cox.net> <8717BF2C-01EB-422A-B63A-7807E607DEE9@stanford.edu> <443071BA.4090606@cox.net> Message-ID: Zachary Pincus wrote: >> If I were going to make a list it would look something like: >> >> 0. Think about your algorithm. >> 1. Vectorize your inner loop. >> 2. Eliminate temporaries >> 3. Ask for help >> 4. Recode in C. >> 5 Accept that your code will never be fast. >> >> Step zero should probably be repeated after every other step ;) > > Thanks for this list -- it's a good one. > > Since we're discussing this, could I ask about the best way to > eliminate temporaries? If you're using ufuncs, is there some way to > make them work in-place? Or is the lowest-hanging fruit (temporary- > wise) typically elsewhere? Many binary ufuncs take an optional third argument which is an array which the ufunc should put the result in. In [2]: x = arange(10) In [3]: y = arange(10) In [4]: id(x) Out[4]: 91297984 In [5]: add(x, y, x) Out[5]: array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18]) In [6]: id(Out[5]) Out[6]: 91297984 In [7]: x Out[7]: array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18]) -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From tim.hochberg at cox.net Mon Apr 3 10:36:05 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Mon Apr 3 10:36:05 2006 Subject: [Numpy-discussion] Speed up function on cross product of two sets? In-Reply-To: References: <443063C0.3050002@cox.net> <8717BF2C-01EB-422A-B63A-7807E607DEE9@stanford.edu> <443071BA.4090606@cox.net> Message-ID: <44315CD6.3010001@cox.net> Zachary Pincus wrote: >> If I were going to make a list it would look something like: >> >> 0. Think about your algorithm. >> 1. Vectorize your inner loop. >> 2. Eliminate temporaries >> 3. Ask for help >> 4. Recode in C. >> 5 Accept that your code will never be fast. >> >> Step zero should probably be repeated after every other step ;) > > > Thanks for this list -- it's a good one. > > Since we're discussing this, could I ask about the best way to > eliminate temporaries? If you're using ufuncs, is there some way to > make them work in-place? Or is the lowest-hanging fruit (temporary- > wise) typically elsewhere? The least cryptic is to use *=, +=, where you can. But that only gets you so far. As you guessed, there is a secret extra argument to ufuncs that allows you to do results in place. One could replace scratch=a*(b+sqrt(a)) with: >>> scratch = zeros([5], dtype=float) >>> a = arange(5, dtype=float) >>> b = arange(5, dtype=float) >>> sqrt(a, scratch) array([ 0. , 1. , 1.41421356, 1.73205081, 2. ]) >>> add(scratch, b, scratch) array([ 0. , 2. , 3.41421356, 4.73205081, 6. ]) >>> multiply(a, scratch, scratch) array([ 0. , 2. , 6.82842712, 14.19615242, 24. ]) The downside of this is that your code goes from comprehensible to insanely cryptic pretty fast. I only resort to this in extreme circumstances. You could also use numexpr, which should be faster and is much less cryptic, but may not be completely stable yet. Oh, and don't forget step 0, that's sometimes a good way to reduce temporaries.
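For completeness, a sketch of the less cryptic *=/+= spelling of the same computation, which trades the hidden temporaries for one explicit scratch array:

import numpy

a = numpy.arange(5, dtype=float)
b = numpy.arange(5, dtype=float)

# same value as a*(b + sqrt(a)), built up in a single buffer
scratch = numpy.sqrt(a)    # the one allocation
scratch += b               # in place
scratch *= a               # in place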
regards, -tim From verveer at embl-heidelberg.de Mon Apr 3 12:00:04 2006 From: verveer at embl-heidelberg.de (Peter Verveer) Date: Mon Apr 3 12:00:04 2006 Subject: [Numpy-discussion] Re: Speed up function on cross product of two sets? In-Reply-To: References: <443063C0.3050002@cox.net> <8717BF2C-01EB-422A-B63A-7807E607DEE9@stanford.edu> <443071BA.4090606@cox.net> Message-ID: <012B117C-4046-4058-B7F9-AC5EDB68A532@embl-heidelberg.de> On 3 Apr 2006, at 19:17, Robert Kern wrote: > Zachary Pincus wrote: >>> If I were going to make a list it would look something like: >>> >>> 0. Think about your algorithm. >>> 1. Vectorize your inner loop. >>> 2. Eliminate temporaries >>> 3. Ask for help >>> 4. Recode in C. >>> 5 Accept that your code will never be fast. >>> >>> Step zero should probably be repeated after every other step ;) >> >> Thanks for this list -- it's a good one. >> >> Since we're discussing this, could I ask about the best way to >> eliminate temporaries? If you're using ufuncs, is there some way to >> make them work in-place? Or is the lowest-hanging fruit (temporary- >> wise) typically elsewhere? > > Many binary ufuncs take an optional third argument which is an > array which the > ufunc should put the result in. I wished many times that all functions would support an optional output argument. It is not only important for speed optimization, but also if you work with large data sets. I guess the use of a return values is much more natural but when the point comes that you want to optimize your algorithm, the ability to use an output argument instead is very valuable. It would be nice if all functions by default would support a standard keyword argument 'output', just like ufuncs do. I suppose these could in principle be added while still maintaining backwards compatibility. Cheers, Peter From oliphant at ee.byu.edu Mon Apr 3 15:59:06 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 3 15:59:06 2006 Subject: [Numpy-discussion] first impressions with numpy In-Reply-To: <44306594.50305@msg.ucsf.edu> References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net> <442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu> <442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca> <442FE950.8090000@cox.net> <44306594.50305@msg.ucsf.edu> Message-ID: <4431A8A0.9010604@ee.byu.edu> Sebastian Haase wrote: > Tim Hochberg wrote: > > >> This would work fine if repr were instead: >> >> dtype([('x', float64), ('z', complex128)]) >> >> Anyway, this all seems reasonable to me at first glance. That said, I >> don't plan to work on this, I've got other fish to fry at the moment. > > > A new point: Please remind me (and probably others): when did it get > decided to introduce 'complex128' to mean numarray's complex64 > and the 'complex64' to mean numarray's complex32 ? It was last February (i.e. 2005) when I first started posting regarding the new NumPy. I claimed it was more consistent to use actual bit-widths. A few people, including Perry, indicated they weren't opposed to the change and so I went ahead with it. You can read relevant posts by searching on numpy-discussion at lists.sourceforge.net Discussions are always welcome. I suppose it's not too late to change something like this --- but it's getting there... 
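A quick way to check the convention at the prompt, as a sketch (itemsize is in bytes):

import numpy

# the numeric suffix counts total bits, i.e. itemsize * 8
print numpy.dtype(numpy.complex128).itemsize    # 16: two 64-bit floats
print numpy.dtype(numpy.complex64).itemsize     # 8: two 32-bit floats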
-Travis From ryanlists at gmail.com Mon Apr 3 17:50:03 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Mon Apr 3 17:50:03 2006 Subject: [Numpy-discussion] string matrices Message-ID: I am trying to use NumPy to generate some matrix inputs to Maxima for symbolic analysis. I am using a fair number of matrix.astype('S%d'%maxlen) statements. This seems to work very well. It also doesn't seem to pad the elements in anyway if maxlen is bigger than I need, which is great. This may seem like a dumb computer science question, but what is the memory/performance cost of making maxlen bigger than I want (but making sure that it is way bigger than I need so that the elements don't get truncated)? If my biggest matrices will be 13x13, how long can the strings be before I consume more than a few megs (or a few dozen megs) of memory? Thanks, Ryan From haase at msg.ucsf.edu Mon Apr 3 22:06:05 2006 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Mon Apr 3 22:06:05 2006 Subject: [Numpy-discussion] Vote: complex64 vs complex128 (was: first impressions with numpy In-Reply-To: <4431A8A0.9010604@ee.byu.edu> References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net> <442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu> <442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca> <442FE950.8090000@cox.net> <44306594.50305@msg.ucsf.edu> <4431A8A0.9010604@ee.byu.edu> Message-ID: <4431FE90.6060301@msg.ucsf.edu> Hi, Could we start another poll on this !? I think I would vote +1 for complex32 & complex64 mostly just because of "that's what I'm used to" But I'm curious to hear what others "know to be in use" - e.g. Matlab or IDL ! - Thanks Sebastian Haase Travis Oliphant wrote: > Sebastian Haase wrote: > >> Tim Hochberg wrote: >> >> >>> This would work fine if repr were instead: >>> >>> dtype([('x', float64), ('z', complex128)]) >>> >>> Anyway, this all seems reasonable to me at first glance. That said, I >>> don't plan to work on this, I've got other fish to fry at the moment. >> >> >> A new point: Please remind me (and probably others): when did it get >> decided to introduce 'complex128' to mean numarray's complex64 >> and the 'complex64' to mean numarray's complex32 ? > > It was last February (i.e. 2005) when I first started posting regarding > the new NumPy. I claimed it was more consistent to use actual > bit-widths. A few people, including Perry, indicated they weren't > opposed to the change and so I went ahead with it. > > You can read relevant posts by searching on > numpy-discussion at lists.sourceforge.net > > Discussions are always welcome. I suppose it's not too late to change > something like this --- but it's getting there... > > -Travis From robert.kern at gmail.com Mon Apr 3 22:41:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon Apr 3 22:41:02 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: <4431FE90.6060301@msg.ucsf.edu> References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net> <442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu> <442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca> <442FE950.8090000@cox.net> <44306594.50305@msg.ucsf.edu> <4431A8A0.9010604@ee.byu.edu> <4431FE90.6060301@msg.ucsf.edu> Message-ID: Sebastian Haase wrote: > Hi, > Could we start another poll on this !? Please, let's leave voting as a method of last resort. > I think I would vote > +1 for complex32 & complex64 mostly just because of "that's what I'm > used to" > > But I'm curious to hear what others "know to be in use" - e.g. 
Matlab or > IDL ! On the merits of the issue, I like the new scheme better. For whatever reason, I tend to remember it when coding. With Numeric, I would frequently second-guess myself and go to the prompt and tab-complete to look at all of the options and reason out the one I wanted. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From tim.hochberg at cox.net Mon Apr 3 22:49:02 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Mon Apr 3 22:49:02 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net> <442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu> <442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca> <442FE950.8090000@cox.net> <44306594.50305@msg.ucsf.edu> <4431A8A0.9010604@ee.byu.edu> <4431FE90.6060301@msg.ucsf.edu> Message-ID: <443208B9.40106@cox.net> Robert Kern wrote: >Sebastian Haase wrote: > > >>Hi, >>Could we start another poll on this !? >> >> > >Please, let's leave voting as a method of last resort. > > > >>I think I would vote >>+1 for complex32 & complex64 mostly just because of "that's what I'm >>used to" >> >>But I'm curious to hear what others "know to be in use" - e.g. Matlab or >>IDL ! >> >> > >On the merits of the issue, I like the new scheme better. For whatever reason, I >tend to remember it when coding. With Numeric, I would frequently second-guess >myself and go to the prompt and tab-complete to look at all of the options and >reason out the one I wanted. > > I can't bring myself to care. I almost always use dtype=complex and on the rare times I don't I can never remember what the scheme is regardless of which scheme it is / was / will be. On the other hand, if the scheme was Complex32x2 and Complex64x2, I could probably decipher what that was without looking it up. It is a little ugly and weird I admit, but that probably wouldn't bother me. Regards, -tim From arnd.baecker at web.de Mon Apr 3 23:36:00 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Mon Apr 3 23:36:00 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net> <442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu> <442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca> <442FE950.8090000@cox.net> <44306594.50305@msg.ucsf.edu> <4431A8A0.9010604@ee.byu.edu> <4431FE90.6060301@msg.ucsf.edu> Message-ID: On Tue, 4 Apr 2006, Robert Kern wrote: > Sebastian Haase wrote: > > Hi, > > Could we start another poll on this !? > > Please, let's leave voting as a method of last resort. > > > I think I would vote > > +1 for complex32 & complex64 mostly just because of "that's what I'm > > used to" > > > > But I'm curious to hear what others "know to be in use" - e.g. Matlab or > > IDL ! > > On the merits of the issue, I like the new scheme better. For whatever reason, I > tend to remember it when coding. With Numeric, I would frequently second-guess > myself and go to the prompt and tab-complete to look at all of the options and > reason out the one I wanted. In order to get an opinion on the subject: How would one presently find out about the meaning of complex64 and complex128? The following attempt does not help: In [1]:import numpy In [2]:numpy.complex64?
Type: type Base Class: String Form: Namespace: Interactive Docstring: In [3]:numpy.complex128? Type: type Base Class: String Form: Namespace: Interactive Docstring: I also looked in Travis' "Guide to NumPy", where the different types are discussed on page 18 (referring to the sample chapters at http://www.tramy.us/guidetoscipy.html) Maybe chapter 12 contains more info on this ((our library was still not able to buy the 20 copies since this request was approved a month ago ...)) Best, Arnd From cjw at sympatico.ca Tue Apr 4 06:20:44 2006 From: cjw at sympatico.ca (Colin J. Williams) Date: Tue Apr 4 06:20:44 2006 Subject: [Numpy-discussion] Vote: complex64 vs complex128 In-Reply-To: <4431FE90.6060301@msg.ucsf.edu> References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net> <442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu> <442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca> <442FE950.8090000@cox.net> <44306594.50305@msg.ucsf.edu> <4431A8A0.9010604@ee.byu.edu> <4431FE90.6060301@msg.ucsf.edu> Message-ID: <443271C9.6080907@sympatico.ca> Sebastian Haase wrote: > Hi, > Could we start another poll on this !? > > I think I would vote > +1 for complex32 & complex64 mostly just because of "that's what I'm > used to" +1 Most people look to the number to give a clue as to the precision of the value. Colin W. > > But I'm curious to hear what others "know to be in use" - e.g. Matlab > or IDL ! > > - Thanks > Sebastian Haase > > > > Travis Oliphant wrote: > >> Sebastian Haase wrote: >> >>> Tim Hochberg wrote: >>> >>> >>>> This would work fine if repr were instead: >>>> >>>> dtype([('x', float64), ('z', complex128)]) >>>> >>>> Anyway, this all seems reasonable to me at first glance. That said, >>>> I don't plan to work on this, I've got other fish to fry at the >>>> moment. >>> >>> >>> >>> A new point: Please remind me (and probably others): when did it get >>> decided to introduce 'complex128' to mean numarray's complex64 >>> and the 'complex64' to mean numarray's complex32 ? >> >> >> It was last February (i.e. 2005) when I first started posting >> regarding the new NumPy. I claimed it was more consistent to use >> actual bit-widths. A few people, including Perry, indicated they >> weren't opposed to the change and so I went ahead with it. >> >> You can read relevant posts by searching on >> numpy-discussion at lists.sourceforge.net >> >> Discussions are always welcome. I suppose it's not too late to >> change something like this --- but it's getting there... >> >> -Travis > > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > that extends applications into web and mobile media. Attend the live > webcast > and join the prime developer group breaking into this new coding > territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From ryanlists at gmail.com Tue Apr 4 07:27:01 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Tue Apr 4 07:27:01 2006 Subject: [Numpy-discussion] Re: string matrices In-Reply-To: References: Message-ID: I actually have a problem with the elements of a string matrix from astype('S#'). The shorter elements in my matrix have a bunch of terms like '1.0', because the matrix they started from was a float. 
I need to keep the float type, but want to get rid of the '.0 ' when I convert the string output to latex. I was going to check if element[-2:]=='.0' but ran into this problem: In [15]: temp[-2:] Out[15]: '\x00\x00' In [16]: temp.strip() Out[16]: '1.0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' I think I can get rid of the \x00's by calling str(element), but is this a feature or a bug? It would be slightly cleaner for me if the string matrix elements didn't have the trailing null characters (or whatever those are), but this may not be possible given the underlying representation. Thanks, Ryan On 4/3/06, Ryan Krauss wrote: > I am trying to use NumPy to generate some matrix inputs to Maxima for > symbolic analysis. I am using a fair number of > matrix.astype('S%d'%maxlen) statements. This seems to work very well. > It also doesn't seem to pad the elements in anyway if maxlen is > bigger than I need, which is great. This may seem like a dumb > computer science question, but what is the memory/performance cost of > making maxlen bigger than I want (but making sure that it is way > bigger than I need so that the elements don't get truncated)? If my > biggest matrices will be 13x13, how long can the strings be before I > consume more than a few megs (or a few dozen megs) of memory? > > Thanks, > > Ryan > From charlesr.harris at gmail.com Tue Apr 4 08:16:07 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue Apr 4 08:16:07 2006 Subject: [Numpy-discussion] Vote: complex64 vs complex128 In-Reply-To: <443271C9.6080907@sympatico.ca> References: <442D9124.5020905@msg.ucsf.edu> <442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu> <442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca> <442FE950.8090000@cox.net> <44306594.50305@msg.ucsf.edu> <4431A8A0.9010604@ee.byu.edu> <4431FE90.6060301@msg.ucsf.edu> <443271C9.6080907@sympatico.ca> Message-ID: I can't get worked up over this one way or the other: complex128 make sense if I count bits, complex64 makes sense if I note precision; I just have to remember the numpy convention. One could argue that complex64 is the more conventional choice and so has the virtue of least surprise, but I don't think it is terribly difficult to become accustomed to using complex128 in its place. I suppose this is one of those programmer's vs user's point of view thingees. For the guy writing general low level numpy code what matters is the length of the type, how many bytes have to be moved and so on, and from the other point of view what counts is the precision of the arithmetic. Chuck On 4/4/06, Colin J. Williams wrote: > > Sebastian Haase wrote: > > > Hi, > > Could we start another poll on this !? > > > > I think I would vote > > +1 for complex32 & complex64 mostly just because of "that's what I'm > > used to" > > +1 Most people look to the number to give a clue as to the precision of > the value. > > Colin W. > > > > > But I'm curious to hear what others "know to be in use" - e.g. Matlab > > or IDL ! > > > > - Thanks > > Sebastian Haase > > > > > > > > Travis Oliphant wrote: > > > >> Sebastian Haase wrote: > >> > >>> Tim Hochberg wrote: > >>> > >>> > >>>> This would work fine if repr were instead: > >>>> > >>>> dtype([('x', float64), ('z', complex128)]) > >>>> > >>>> Anyway, this all seems reasonable to me at first glance. That said, > >>>> I don't plan to work on this, I've got other fish to fry at the > >>>> moment. 
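The two viewpoints are easy to put side by side, as a sketch (the names are the current numpy ones):

import numpy

# the "length" view: bits per element; under the "precision" view,
# the complex figures would be halved
for t in (numpy.float32, numpy.float64, numpy.complex64, numpy.complex128):
    dt = numpy.dtype(t)
    print dt.name, dt.itemsize * 8, 'bits per element'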
> >>> > >>> > >>> > >>> A new point: Please remind me (and probably others): when did it get > >>> decided to introduce 'complex128' to mean numarray's complex64 > >>> and the 'complex64' to mean numarray's complex32 ? > >> > >> > >> It was last February (i.e. 2005) when I first started posting > >> regarding the new NumPy. I claimed it was more consistent to use > >> actual bit-widths. A few people, including Perry, indicated they > >> weren't opposed to the change and so I went ahead with it. > >> > >> You can read relevant posts by searching on > >> numpy-discussion at lists.sourceforge.net > >> > >> Discussions are always welcome. I suppose it's not too late to > >> change something like this --- but it's getting there... > >> > >> -Travis > > > > > > > > > > ------------------------------------------------------- > > This SF.Net email is sponsored by xPML, a groundbreaking scripting > > language > > that extends applications into web and mobile media. Attend the live > > webcast > > and join the prime developer group breaking into this new coding > > territory! > > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > that extends applications into web and mobile media. Attend the live > webcast > and join the prime developer group breaking into this new coding > territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at carabos.com Tue Apr 4 08:49:11 2006 From: faltet at carabos.com (Francesc Altet) Date: Tue Apr 4 08:49:11 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: References: <442D9124.5020905@msg.ucsf.edu> <4431FE90.6060301@msg.ucsf.edu> Message-ID: <200604041747.57180.faltet@carabos.com> A Dimarts 04 Abril 2006 07:40, Robert Kern va escriure: > Sebastian Haase wrote: > > I think I would vote > > +1 for complex32 & complex64 mostly just because of "that's what I'm > > used to" > > > > But I'm curious to hear what others "know to be in use" - e.g. Matlab or > > IDL ! > > On the merits of the issue, I like the new scheme better. For whatever > reason, I tend to remember it when coding. With Numeric, I would frequently > second-guess myself and go to the prompt and tab-complete to look at all of > the options and reason out the one I wanted. I agree with Robert. From the very beginning NumPy design has been very consequent with typeEXTENT_IN_BITS mapping (even for unicode), and if we go back to numarray (complex32/complex64) convention, this would be the only exception to this rule. Perhaps I'm a bit biased by being a developer more interested in type 'sizes' that in 'precision' issues, but I'd definitely prefer a completely consistent approach for this matter. So +1 for complex64 & complex128 Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. 
??Enjoy Data "-" From haase at msg.ucsf.edu Tue Apr 4 09:33:07 2006 From: haase at msg.ucsf.edu (Sebastian Haase) Date: Tue Apr 4 09:33:07 2006 Subject: [Numpy-discussion] Vote: complex64 vs complex128 In-Reply-To: References: <442D9124.5020905@msg.ucsf.edu> <443271C9.6080907@sympatico.ca> Message-ID: <200604040929.15815.haase@msg.ucsf.edu> On Tuesday 04 April 2006 08:09, Charles R Harris wrote: > I can't get worked up over this one way or the other: complex128 make sense > if I count bits, complex64 makes sense if I note precision; I just have to > remember the numpy convention. One could argue that complex64 is the more > conventional choice and so has the virtue of least surprise, but I don't > think it is terribly difficult to become accustomed to using complex128 in > its place. I suppose this is one of those programmer's vs user's point of > view thingees. For the guy writing general low level numpy code what > matters is the length of the type, how many bytes have to be moved and so > on, and from the other point of view what counts is the precision of the > arithmetic. I kind of like your comparison of programmer vs user ;-) And so I was "hoping" that numpy (and scipy !!) is intended for the users - like supposedly IDL and Matlab are... No one likes my "backwards compatibility" argument !? Thanks - Sebastian Haase PS: I understand that voting is only for a last resort - some people, always use na.Complex and na.Float and don't care - BUT I use single precision all the time because my image data is already getting to large. So I have to look at this every day, and as Travis pointed out, now is about the last chance to possibly change complex128 to complex64 ... > > Chuck > > On 4/4/06, Colin J. Williams wrote: > > Sebastian Haase wrote: > > > Hi, > > > Could we start another poll on this !? > > > > > > I think I would vote > > > +1 for complex32 & complex64 mostly just because of "that's what I'm > > > used to" > > > > +1 Most people look to the number to give a clue as to the precision of > > the value. > > > > Colin W. > > > > > But I'm curious to hear what others "know to be in use" - e.g. Matlab > > > or IDL ! > > > > > > - Thanks > > > Sebastian Haase > > > > > > Travis Oliphant wrote: > > >> Sebastian Haase wrote: > > >>> Tim Hochberg wrote: > > >>> > > >>> > > >>>> This would work fine if repr were instead: > > >>>> > > >>>> dtype([('x', float64), ('z', complex128)]) > > >>>> > > >>>> Anyway, this all seems reasonable to me at first glance. That said, > > >>>> I don't plan to work on this, I've got other fish to fry at the > > >>>> moment. > > >>> > > >>> A new point: Please remind me (and probably others): when did it get > > >>> decided to introduce 'complex128' to mean numarray's complex64 > > >>> and the 'complex64' to mean numarray's complex32 ? > > >> > > >> It was last February (i.e. 2005) when I first started posting > > >> regarding the new NumPy. I claimed it was more consistent to use > > >> actual bit-widths. A few people, including Perry, indicated they > > >> weren't opposed to the change and so I went ahead with it. > > >> > > >> You can read relevant posts by searching on > > >> numpy-discussion at lists.sourceforge.net > > >> > > >> Discussions are always welcome. I suppose it's not too late to > > >> change something like this --- but it's getting there... 
> > >> > > >> -Travis From robert.kern at gmail.com Tue Apr 4 09:52:11 2006 From: robert.kern at gmail.com (Robert Kern) Date: Tue Apr 4 09:52:11 2006 Subject: [Numpy-discussion] Re: string matrices In-Reply-To: References: Message-ID: Ryan Krauss wrote: > I actually have a problem with the elements of a string matrix from > astype('S#'). The shorter elements in my matrix have a bunch of terms > like '1.0', because the matrix they started from was a float. I need > to keep the float type, but want to get rid of the '.0 ' when I > convert the string output to latex. I was going to check if > element[-2:]=='.0' but ran into this problem: > > In [15]: temp[-2:] > Out[15]: '\x00\x00' > > In [16]: temp.strip() > Out[16]: '1.0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' > > I think I can get rid of the \x00's by calling str(element), but is > this a feature or a bug? Probably both. :-) On the one hand, you want to be able to get a useful string out of the array; the nulls are just padding, and the string that you put in was '1.0'. However, suppose that the string you put in was '1.\x00'. Then you would get the "wrong" string out. However, the only real alternative is to also store an integer containing the length of the string with each element. That probably interferes with some of the uses of string arrays. > It would be slightly cleaner for me if the > string matrix elements didn't have the trailing null characters (or > whatever those are), but this may not be possible given the underlying > representation. You can also use temp.strip('\x00') which is a bit more explicit. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From zpincus at stanford.edu Tue Apr 4 09:54:06 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Tue Apr 4 09:54:06 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: <443208B9.40106@cox.net> References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net> <442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu> <442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca> <442FE950.8090000@cox.net> <44306594.50305@msg.ucsf.edu> <4431A8A0.9010604@ee.byu.edu> <4431FE90.6060301@msg.ucsf.edu> <443208B9.40106@cox.net> Message-ID: > On the other hand, if the scheme was Complex32x2 and Complex64x2, > I could probably decipher what that was without looking it up. It > is is a little ugly and weird I admit, but that probably wouldn't > bother me. On consideration, I'm +1 on Tim's suggestion here, if any change is going to be made. At least it has the virtue of being relatively clear, if a bit ugly. Zach From jh at oobleck.astro.cornell.edu Tue Apr 4 11:14:04 2006 From: jh at oobleck.astro.cornell.edu (Joe Harrington) Date: Tue Apr 4 11:14:04 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> (numpy-discussion-request@lists.sourceforge.net) References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> Message-ID: <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> When I first heard of Complex128, my first response was, "Cool! I didn't even know there was a Double128!" Folks seem to agree that precision-based naming would be most intuitive to new users, but that length-based naming would be most intuitive to low-level programmers. 
This is a high-level package, whose purpose is to hide the numerical details and programming drudgery from the user as much as possible, while still offering high performance and not limiting capability too much. For this type of package, a good metric is "when it doesn't restrict capability, do what makes sense for new/naiive users". So, I favor Complex32 and Complex64. When you say "complex", everyone knows you mean 2 numbers. When you say 32 or 64 or 128, in the context of bits for floating values, almost everyone assumes you are talking that many bits of precision to represent one number. Consider future conversations about precision and data size. In precision discussions, you'd always have to clarify that complex128 had 64 bits of precision, just to make sure everyone was on the same key (particularly when 128-bit machines arrive). In data-size discussions, everyone would know to double the size for the two components. No extra clarification would be needed. IDL's behavior is irrelevant to us, since they just say "complex", and "dcomplex" for 32-bit and 64-bit precision. --jh-- From oliphant.travis at ieee.org Tue Apr 4 11:25:11 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Tue Apr 4 11:25:11 2006 Subject: [Numpy-discussion] Re: string matrices In-Reply-To: References: Message-ID: <4432B9C2.7040307@ieee.org> Ryan Krauss wrote: > I actually have a problem with the elements of a string matrix from > astype('S#'). The shorter elements in my matrix have a bunch of terms > like '1.0', because the matrix they started from was a float. I need > to keep the float type, but want to get rid of the '.0 ' when I > convert the string output to latex. I was going to check if > element[-2:]=='.0' but ran into this problem > > In [15]: temp[-2:] > Out[15]: '\x00\x00' > > In [16]: temp.strip() > Out[16]: '1.0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' > > I think I can get rid of the \x00's by calling str(element), but is > this a feature or a bug? Of course the elements are padded with '\x00' so that they are all the same length, but we have been trying to make it so that it doesn't matter. Equality testing is one area where it still does. We are using the underlying string equality testing (and it doesn't strip the '\x00'). So, I guess it's a missing feature at this point. -Travis From tim.hochberg at cox.net Tue Apr 4 11:41:10 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Tue Apr 4 11:41:10 2006 Subject: [Numpy-discussion] Re: string matrices In-Reply-To: References: Message-ID: <4432BD89.3050501@cox.net> Robert Kern wrote: >Ryan Krauss wrote: > > >>I actually have a problem with the elements of a string matrix from >>astype('S#'). The shorter elements in my matrix have a bunch of terms >>like '1.0', because the matrix they started from was a float. I need >>to keep the float type, but want to get rid of the '.0 ' when I >>convert the string output to latex. I was going to check if >>element[-2:]=='.0' but ran into this problem: >> >>In [15]: temp[-2:] >>Out[15]: '\x00\x00' >> >>In [16]: temp.strip() >>Out[16]: '1.0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >> >>I think I can get rid of the \x00's by calling str(element), but is >>this a feature or a bug? >> >> > >Probably both. :-) On the one hand, you want to be able to get a useful string >out of the array; the nulls are just padding, and the string that you put in was >'1.0'. However, suppose that the string you put in was '1.\x00'. 
Then you would >get the "wrong" string out. > >However, the only real alternative is to also store an integer containing the >length of the string with each element. That probably interferes with some of >the uses of string arrays. > > > >>It would be slightly cleaner for me if the >>string matrix elements didn't have the trailing null characters (or >>whatever those are), but this may not be possible given the underlying >>representation. >> >> > >You can also use temp.strip('\x00') which is a bit more explicit. > > > Or even temp.rstrip('\x00') which works for all those time you pad the front of your string with '\x00' ;) -tim From faltet at carabos.com Tue Apr 4 11:46:08 2006 From: faltet at carabos.com (Francesc Altet) Date: Tue Apr 4 11:46:08 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> Message-ID: <200604042045.39955.faltet@carabos.com> A Dimarts 04 Abril 2006 20:13, Joe Harrington va escriure: > When I first heard of Complex128, my first response was, "Cool! I > didn't even know there was a Double128!" > > Folks seem to agree that precision-based naming would be most > intuitive to new users, but that length-based naming would be most > intuitive to low-level programmers. This is a high-level package, > whose purpose is to hide the numerical details and programming > drudgery from the user as much as possible, while still offering high > performance and not limiting capability too much. For this type of > package, a good metric is "when it doesn't restrict capability, do > what makes sense for new/naiive users". > > So, I favor Complex32 and Complex64. When you say "complex", everyone > knows you mean 2 numbers. When you say 32 or 64 or 128, in the > context of bits for floating values, almost everyone assumes you are > talking that many bits of precision to represent one number. Consider > future conversations about precision and data size. In precision > discussions, you'd always have to clarify that complex128 had 64 bits > of precision, just to make sure everyone was on the same key > (particularly when 128-bit machines arrive). In data-size > discussions, everyone would know to double the size for the two > components. No extra clarification would be needed. Well, from my point of view of "low-level" user, I don't specially like this, but I understand the "high-level" position to be much more important in terms of number of users. Besides, I also see that NumPy should be adressed specially to the requirements of the later users. So for me is fine with complex32/complex64. Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From robert.kern at gmail.com Tue Apr 4 12:15:08 2006 From: robert.kern at gmail.com (Robert Kern) Date: Tue Apr 4 12:15:08 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> Message-ID: Joe Harrington wrote: > When I first heard of Complex128, my first response was, "Cool! I > didn't even know there was a Double128!" > > Folks seem to agree that precision-based naming would be most > intuitive to new users, but that length-based naming would be most > intuitive to low-level programmers. 
I'm pretty sure that when any of us say that such-and-such is going to make the
most sense to new users, we're just guessing. Or projecting our experienced-user
prejudices onto them. If I had to register my guess, I would say that either way
will make just as much sense to new users.

I think it's time that we start taking backwards compatibility with previous
releases of numpy seriously and not break numpy code without clear, significant
gains in usability.

--
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
  -- Umberto Eco

From aisaac at american.edu Tue Apr 4 12:38:05 2006
From: aisaac at american.edu (Alan G Isaac)
Date: Tue Apr 4 12:38:05 2006
Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128
In-Reply-To:
References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu>
Message-ID:

On Tue, 04 Apr 2006, Robert Kern apparently wrote:
> I would say that either way will make just as much sense
> to new users.

User's perspective: agreed. Just give me
i. consistency and
ii. an easy way to inspect the object for its meaning.

Cheers,
Alan Isaac

From tim.hochberg at cox.net Tue Apr 4 12:52:04 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Tue Apr 4 12:52:04 2006
Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128
In-Reply-To:
References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu>
Message-ID: <4432CE1F.3010209@cox.net>

Robert Kern wrote:

>Joe Harrington wrote:
>>When I first heard of Complex128, my first response was, "Cool! I
>>didn't even know there was a Double128!"
>> [...]
>
>I'm pretty sure that when any of us say that such-and-such is going to make the
>most sense to new users, we're just guessing. Or projecting our experienced-user
>prejudices onto them. If I had to register my guess, I would say that either way
>will make just as much sense to new users.

Agreed.

>I think it's time that we start taking backwards compatibility with previous
>releases of numpy seriously and not break numpy code without clear, significant
>gains in usability.

So what does that mean in this case? The current status, nice for
existing users of numpy? Or the old status, nice for people
transitioning to numpy from Numeric? It's hard to know which way these
backwards compatibility arguments cut when they involve reverting a
change from some old behaviour.

I've got an idea.
Rather than go round and round about complex64 versus complex128, let's
just leave things as they are and add a docstring to complex128 and
complex64 explaining the situation. [code...code...]

>>> help(complex128)
class complex128scalar(complexfloatingscalar, complex)
 |  complex128: composed of two 64 bit floats
 |
 |  Method resolution order:
 |      complex128scalar
 |      complexfloatingscalar
 |      inexactscalar
 |      numberscalar
 |      genericscalar
 |      complex
 |      object
 ...

If someone wants to give me some better text for the docstring, I'll go
ahead and commit this change. Heck, if you've got some text for the other
scalar objects (within reason) I'll be happy to add that at the same time.

Regards,

-tim

From robert.kern at gmail.com Tue Apr 4 13:06:01 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Tue Apr 4 13:06:01 2006
Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128
In-Reply-To: <4432CE1F.3010209@cox.net>
References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> <4432CE1F.3010209@cox.net>
Message-ID:

Tim Hochberg wrote:
> Robert Kern wrote:

>> I think it's time that we start taking backwards compatibility with
>> previous releases of numpy seriously and not break numpy code without
>> clear, significant gains in usability.
>>
> So what does that mean in this case? The current status, nice for
> existing users of numpy? Or the old status, nice for people
> transitioning to numpy from Numeric? It's hard to know which way these
> backwards compatibility arguments cut when they involve reverting a
> change from some old behaviour.

I mean numpy. Neither complex64 nor complex128 are backwards-compatible
with Numeric. Complex32 and Complex64 already exist and are hopefully
isolated as compatibility aliases for typecodes. By
backwards-compatibility, I refer to code, not habits.

> I've got an idea. Rather than go round and round about complex64 versus
> complex128, let's just leave things as they are and add a docstring to
> complex128 and complex64 explaining the situation. [...]
>
> If someone wants to give me some better text for the docstring, I'll go
> ahead and commit this change. Heck, if you've got some text for the other
> scalar objects (within reason) I'll be happy to add that at the same time.

+1

--
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
  -- Umberto Eco

From oliphant at ee.byu.edu Tue Apr 4 13:42:38 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue Apr 4 13:42:38 2006
Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128
In-Reply-To:
References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu>
Message-ID: <4432D9C3.3040109@ee.byu.edu>

Robert Kern wrote:

>Joe Harrington wrote:
>>When I first heard of Complex128, my first response was, "Cool! I
>>didn't even know there was a Double128!"
>> [...]
>I'm pretty sure that when any of us say that such-and-such is going to make the
>most sense to new users, we're just guessing. Or projecting our experienced-user
>prejudices onto them. If I had to register my guess, I would say that either way
>will make just as much sense to new users.

Totally agree. I don't see the argument that Complex64 is a "precision"
description. To a new user it could go either way depending on their
previous experience. I think most new users won't even use the bit width
names but will instead use 'complex' and be done with it...

>I think it's time that we start taking backwards compatibility with previous
>releases of numpy seriously and not break numpy code without clear, significant
>gains in usability.

+1

-Travis

From perry at stsci.edu Tue Apr 4 14:09:02 2006
From: perry at stsci.edu (Perry Greenfield)
Date: Tue Apr 4 14:09:02 2006
Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128
In-Reply-To: <4432D9C3.3040109@ee.byu.edu>
References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> <4432D9C3.3040109@ee.byu.edu>
Message-ID: <6e9f9be0cfb968840dc4314d65c9e655@stsci.edu>

On Apr 4, 2006, at 4:40 PM, Travis Oliphant wrote:
>
> Totally agree. I don't see the argument that Complex64 is a
> "precision" description. To a new user it could go either way
> depending on their previous experience. I think most new users won't
> even use the bit width names but will instead use 'complex' and be
> done with it...
>
>> I think it's time that we start taking backwards compatibility with
>> previous releases of numpy seriously and not break numpy code without
>> clear, significant gains in usability.
>>
> +1
>

The issue that just won't go away. We did it the current way for
numarray initially and were persuaded to switch to be compatible with
Numeric. I agree that it isn't obvious what the number means for
complex. That ambiguity will always be there. Unless we did a real user
test to find out, we wouldn't know for sure what future users would most
likely expect. But in the end, pick one and let's not change it again
(or even talk about changing it). It doesn't matter that much to me
which it is.

Perry

From oliphant at ee.byu.edu Tue Apr 4 14:18:59 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue Apr 4 14:18:59 2006
Subject: [Numpy-discussion] NumPy documentation
Message-ID: <4432E27E.6030906@ee.byu.edu>

I received a rather hurtful email today that was very discouraging to me
personally. Basically, I was called "lame" and a "wolf" in sheep's
clothing because I'm charging for documentation. Fortunately it's the
first email of that nature I've received. Others have disagreed with my
choice to charge for the documentation but at least they've not resorted
to personal attacks on me and my motivations. Please know that such
emails do have an impact. While I try to build a tough skin, such
unappreciative statements reduce my enthusiasm for working on NumPy
significantly.

My purpose, however, is not to rant about the misguided words of one
person. He brought up a point that I want to clarify.
He asked if I "would sue" if somebody else wrote documentation for NumPy.
I want to be perfectly clear that this is a ridiculous statement that
barely deserves a response. Of course I wouldn't. First of all, it would
be extreme circumstances indeed for me to resort to that course of action
(basically a company would have to copy my book and start distributing it
on a large scale, belligerently). Second of all, I would love to see
*more* documentation for NumPy.

If there are other (less vocal) people out there who are not using NumPy
because of my book, then I certainly feel sorry about that. Please dig in
and create the documentation you so urgently want to be free. I will not
stand in your way, but may even help.

But please consider that time is money. Most people are better off
spending their time on something else and just cooperating with others by
paying for the book. But, I'm not going to dislike or have any kind of
ill feelings with anyone who decides to spend their time on
"documentation." In fact, I'll appreciate it just like everyone else. I
love the growth of the SciPy Wiki. There are some great recipes and
examples there. This is fantastic. I'm 100% behind this kind of work.
Rather than write some kind of "replacement" documentation, contribute
docstrings to the code and recipes to the Wiki. Then, those that can't or
won't buy the book will still have plenty of resources to use to learn
NumPy.

I'm completely behind all forms of "free" information on NumPy / SciPy
and related tools. The only reason I have to charge for the documentation
is that I just don't have the resources to simply donate *all* of my
time. I want to thank all of you who have already purchased the
documentation. It has been extremely helpful to me personally and
professionally. Without you, my time to spend on NumPy would have been
significantly reduced. Thank you very much.

Best wishes,

-Travis

From Chris.Barker at noaa.gov Tue Apr 4 14:48:01 2006
From: Chris.Barker at noaa.gov (Christopher Barker)
Date: Tue Apr 4 14:48:01 2006
Subject: [Numpy-discussion] NumPy documentation
In-Reply-To: <4432E27E.6030906@ee.byu.edu>
References: <4432E27E.6030906@ee.byu.edu>
Message-ID: <4432E973.8070601@noaa.gov>

Travis,

I'm very sorry to hear that you got such a response. It was completely
unwarranted. I am often quite surprised at the vitriol that sometimes
results from people that are not getting what they want from an open
source project. Indeed, the comment about "suing" makes it completely
clear that this individual completely misunderstood your intentions (and
the reality of copyright law: you would only have a course of action if
your book was copied!).

When you first announced the book, I know there was a fair bit of
discussion about it, and you made it quite clear how reasonable your
position is.
Personally, I think financing open source projects by writing and selling
books about them is an excellent approach: it works well for everyone. My
freedom is not restricted, you get some compensation for your time.
Ideally, I'd like to see comprehensive reference documentation distributed
for free, while more in-depth explanatory docs could be either free or
not. One of these days I'll put my keyboard where my mouth is and actually
write a doc string or two! In the meantime, I am absolutely thrilled that
you've put as much effort into numpy as you have. You are doing a fabulous
job, and I hope the appreciation of all is clear to you.

thank you,
-Chris

PS: If we get a reasonable budget next year, I'll be sure to buy a few
copies of your book.

--
Christopher Barker, Ph.D.
Oceanographer

NOAA/OR&R/HAZMAT          (206) 526-6959  voice
7600 Sand Point Way NE    (206) 526-6329  fax
Seattle, WA 98115         (206) 526-6317  main reception

Chris.Barker at noaa.gov

From tim.hochberg at cox.net Tue Apr 4 15:37:06 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Tue Apr 4 15:37:06 2006
Subject: [Numpy-discussion] NumPy documentation
In-Reply-To: <4432E973.8070601@noaa.gov>
References: <4432E27E.6030906@ee.byu.edu> <4432E973.8070601@noaa.gov>
Message-ID: <4432F4DD.6060000@cox.net>

Travis,

I'm sorry to hear that you received such an unwarranted attack. Although,
sadly, not terribly surprised; there are plenty of unpleasant fanatics of
various stripes that roam the bitstreams. Let me add a hearty "me too" to
everything that Chris just said.

This finally motivated me to go out and buy your book, something that's
been on my list of things that I should do "one of these days now". I'm
hoping that makes this mystery person unhappy.

Regards,
-tim

From svetosch at gmx.net Tue Apr 4 16:03:02 2006
From: svetosch at gmx.net (Sven Schreiber)
Date: Tue Apr 4 16:03:02 2006
Subject: [Numpy-discussion] kron with matrices
Message-ID: <4432FADE.3070705@gmx.net>

Hi,

First of all, thanks for including kron in numpy, it's very useful. Now I
have just built numpy from svn for the first time in order to spot
matrix-related bugs before a new release as promised. That worked well,
thanks to the great wiki instructions. The old bugs (in linalg) are gone,
but I wonder whether the following behavior is another one:

>>> import numpy as n
>>> n.kron(n.asmatrix(n.ones((1,2))), n.asmatrix(n.zeros((2,2))))
array([[0, 0, 0, 0],
       [0, 0, 0, 0]])

I would prefer if kron returned a matrix at least if both inputs are
matrices, as in the given example.

Thanks,
Sven

From jdhunter at ace.bsd.uchicago.edu Tue Apr 4 16:10:13 2006
From: jdhunter at ace.bsd.uchicago.edu (John Hunter)
Date: Tue Apr 4 16:10:13 2006
Subject: [Numpy-discussion] NumPy documentation
In-Reply-To: <4432E27E.6030906@ee.byu.edu> (Travis Oliphant's message of "Tue, 04 Apr 2006 15:17:50 -0600")
References: <4432E27E.6030906@ee.byu.edu>
Message-ID: <87wte5ndot.fsf@peds-pc311.bsd.uchicago.edu>

>>>>> "Travis" == Travis Oliphant writes:

    Travis> I received a rather hurtful email today that was very
    Travis> discouraging to me personally. Basically, I was called
    Travis> "lame" and a "wolf" in sheep's clothing because I'm
    Travis> charging for documentation. Fortunately it's the first

Wow, harsh. I would just like to (for a second time) voice my support for
your charging for documentation, and throw out a couple of points for
people to consider who oppose it.
I think a low-ball estimate of the dollar value of the amount of time
Travis has donated to scientific python is about $500,000 (5 years,
full-time, $100k/yr -- this is low ball because he has probably donated
more time and he is certainly worth more than that annually!). If he gets
the $300,000 or so he hopes to raise from this book, he still has a net
contribution of more than $200k. Those of you who are critical: have you
put in that much of your time or money?

Secondly, I know personally that Travis has resisted several offers to
lure him from academia into industry. Academia, by its nature, affords
more flexibility to develop open source software driven by issues of
breadth and quality rather than deadlines and customer demands. By
charging for this book, it makes it more feasible for him to continue to
work in academia and support these projects. Travis and I share some
similarities: we both have a wife and kids, with low-paying academic
careers, and lead active python projects. Only Travis leads two projects
to my one and he has five kids to my three. I recently left academia for
a job in industry because of financial considerations, and while my firm
is supportive of my matplotlib development (we use it and python
extensively in house), it does leave me less time for development.

So to those of you grumbling to Travis directly or behind the scenes,
think about what he is giving and back off. And start donating some of
your own time instead of encouraging Travis to donate more of his.

JDH

From aisaac at american.edu Tue Apr 4 16:27:10 2006
From: aisaac at american.edu (Alan G Isaac)
Date: Tue Apr 4 16:27:10 2006
Subject: [Numpy-discussion] NumPy documentation
In-Reply-To: <4432E27E.6030906@ee.byu.edu>
References: <4432E27E.6030906@ee.byu.edu>
Message-ID:

On Tue, 04 Apr 2006, Travis Oliphant apparently wrote:
> I'm not going to dislike or have any kind of ill feelings
> with anyone who decides to spend their time on
> "documentation." In fact, I'll appreciate it just like
> everyone else.

Of course you were extremely clear about this from the beginning. Thank
you for numpy!!!

Alan Isaac
(grateful user of numpy)

PS Your book is *very* helpful.

From zpincus at stanford.edu Tue Apr 4 16:48:06 2006
From: zpincus at stanford.edu (Zachary Pincus)
Date: Tue Apr 4 16:48:06 2006
Subject: [Numpy-discussion] NumPy documentation
In-Reply-To: <4432F4DD.6060000@cox.net>
References: <4432E27E.6030906@ee.byu.edu> <4432E973.8070601@noaa.gov> <4432F4DD.6060000@cox.net>
Message-ID: <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu>

Hi folks -

I must admit that when I first saw the trelgol web page, I was briefly a
bit confused and put off about the prospect of moving to numpy from
Numeric. Now, it didn't take long for me to come to my senses and realize
(a) that no formerly-free documentation had been revoked, (b) that there
was enough documentation about the C API in the numpy distribution to get
me started, (c) that there was a lot of support available on the email
list, and most importantly (d) that Travis and many others are extremely
generous with their time, both in answering emails on the numpy list and
in making numpy better.

I now of course wholeheartedly agree with everything everyone has said in
this thread, and with the idea behind selling the documentation. In fact,
I feel a bit ashamed that I ever felt otherwise, even though it was just
for a few minutes. However, were I a more grumpy (or stupid) type, I
might not have come to my senses as rapidly, or ever.
That would have been my loss, of course. But, perhaps a few little things
could help newcomers better understand the rationale behind the ebook.

Basically, everyone on this list knows (and supports, it seems!) the
reasoning behind selling the docs, because it was discussed on the list.
However, it's not hard to imagine someone new to numpy, or maybe a
convert from Numeric (who was used to the large, free manual) scratching
their head a little when confronted with http://www.tramy.us/ . (It's
less reasonable to imagine someone then going on to personally attack
Travis in email -- that's absolutely unconscionable.)

I would suggest that the link from the scipy page be changed to point to
http://www.tramy.us/guidetoscipy.html , which is a little more clearly
about the ebook, and a little less about the publishing method. It might
not hurt to expand a bit on that page and mention the basic reasoning
behind selling the docs, and even (if you see fit, Travis) to maybe
include links to the other numpy documentation resources (list archive
and sign up page, old and out-of-date Numeric reference [with maybe some
mention of why buying the book would be better, but that the old ref at
least gives the right high-level picture to get a newcomer started using
numpy], and the numpy wiki pages). Any of this would certainly put a
newcomer in a more charitable state of mind, and forestall any lingering
concerns about greed or any such foolishness.

Since free advice is worth exactly what you paid for it, feel free to
ignore any or all of this. I just wanted to mention a few easy things
that I think might help newcomers understand and feel good about the
ebook (the first step toward buying it!).

Zach

On Apr 4, 2006, at 5:36 PM, Tim Hochberg wrote:
>
> Travis,
>
> I'm sorry to hear that you received such an unwarranted attack. [...]

From zpincus at stanford.edu Tue Apr 4 17:19:18 2006
From: zpincus at stanford.edu (Zachary Pincus)
Date: Tue Apr 4 17:19:18 2006
Subject: [Numpy-discussion] array constructor from generators?
Message-ID:

Hi folks,

Sorry if this has already been discussed, but do you all think it a good
idea to extend the array constructor so that it can accept generators
instead of lists?

I often construct arrays from list comprehensions on generators, e.g.
to read a tab-delimited file in:
  numpy.array([map(float, line.split()) for line in file])
or making an array of pairs of numbers:
  numpy.array([f for f in unique_combinations(input, 2)])

If the array constructor accepted generators (and turned them into lists
behind the scenes, or even evaluated them lazily while filling in the
memory buffer, not sure what would be more efficient), the above could be
written somewhat more cleanly:
  numpy.array(map(float, line.split()) for line in file)
(using a generator expression) and
  numpy.array(unique_combinations(input, 2))
the latter is especially a win.

Moreover, it's becoming more standard for any python thing that can
accept a list to also accept a generator.

The downside is that currently, passing array() an object makes a 0-d
object array with that object. If this were changed, then passing array()
an iterator object would be handled differently than passing array any
other object. This might possibly be a fatal flaw in this idea.

I'd be happy to look into implementing this functionality if people think
it is a good idea, and could give me some tips as to the best way to
implement it.

Zach

From wbaxter at gmail.com Tue Apr 4 17:24:38 2006
From: wbaxter at gmail.com (Bill Baxter)
Date: Tue Apr 4 17:24:38 2006
Subject: [Numpy-discussion] NumPy documentation
In-Reply-To: <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu>
References: <4432E27E.6030906@ee.byu.edu> <4432E973.8070601@noaa.gov> <4432F4DD.6060000@cox.net> <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu>
Message-ID:

First of all, it sounds like the individual who mailed Travis about being
a "wolf in sheep's clothing" is suffering from the delusion that you can
actually get rich by selling technical documentation at 40 bucks a pop.

Travis does have a web page up somewhere explaining all his rationale --
I ran across it somewhere. I remember when I saw it I was thinking
"that's bizarre -- why on earth would you have to make a whole web page
to justify selling something you yourself created?" I mean, like it or
not, Travis wrote it so he can do whatever he wants with it. That's just
common sense. Something apparently some lack.

It reminds me of the story my father told me when I was like 8 years old
about a man who shows up one day and gives a little boy a dollar bill.
The boy is ecstatic, and thanks the man profusely. Then the next day the
same thing, another dollar. The boy can't believe his luck. The whole
week the guy comes, then it becomes a month, and then a year. Every day
another dollar. Eventually it becomes such a routine that the boy doesn't
even bother to thank the guy. Then one day the man doesn't show up. The
little boy is furious. He was counting on that dollar, he already knew
how he was going to spend every penny. The person who emailed Travis is
just like that little boy, furious for not getting the dollar that wasn't
his to begin with, rather than being thankful for the $365 he was given
out of the blue for no particular reason.

--bb

From tim.hochberg at cox.net Tue Apr 4 17:41:15 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Tue Apr 4 17:41:15 2006
Subject: [Numpy-discussion] array constructor from generators?
In-Reply-To:
References:
Message-ID: <44331200.2020604@cox.net>

Zachary Pincus wrote:

> Hi folks,
>
> Sorry if this has already been discussed, but do you all think it a
> good idea to extend the array constructor so that it can accept
> generators instead of lists?
> I often construct arrays from list comprehensions on generators, e.g.
> to read a tab-delimited file in:
>   numpy.array([map(float, line.split()) for line in file])
> or making an array of pairs of numbers:
>   numpy.array([f for f in unique_combinations(input, 2)])
>
> If the array constructor accepted generators (and turned them into
> lists behind the scenes, or even evaluated them lazily while filling
> in the memory buffer, not sure what would be more efficient), the
> above could be written somewhat more cleanly:
>   numpy.array(map(float, line.split()) for line in file)
> (using a generator expression) and
>   numpy.array(unique_combinations(input, 2))
> the latter is especially a win.
>
> Moreover, it's becoming more standard for any python thing that can
> accept a list to also accept a generator.
>
> The downside is that currently, passing array() an object makes a 0-d
> object array with that object. If this were changed, then passing
> array() an iterator object would be handled differently than passing
> array any other object. This might possibly be a fatal flaw in this
> idea.

You pretty much can't count on anything when trying to implicitly create
object arrays anyway. There's already buckets of special cases to make
the other array types user friendly. In other words I don't think we
should care. You do have to be careful to special case iterators after
all the other special case machinery, so that lists and whatnot that are
treated efficiently don't get slowed down.

> I'd be happy to look into implementing this functionality if people
> think it is a good idea, and could give me some tips as to the best
> way to implement it.

Hi Zach,

I brought this up last week and Travis was OK with it. I have it on my
todo list, but if you are in a hurry you're welcome to do it instead.

If you do look at it, consider looking into the __length_hint__ parameter
that's slated to go into Python 2.5. When this is present, it's
potentially a big win, since you can preallocate the array and fill it
directly from the iterator. Without this, you probably can't do much
better than just building a list from the iterator and then an array from
the list. What would work well would be to build a list, then steal its
memory. I'm not sure if that's feasible without leaking a reference to
the list though.

Also, with iterators, specifying dtype will make a huge difference. If an
object has __length_hint__ and you specify dtype, then you can
preallocate the array as I suggested above. However, if dtype is not
specified, you still need to build the list completely, determine what
type it is, allocate the array memory and then copy the values into it.
Much less efficient!

Regards,

-tim

> Zach
> [...]

From robert.kern at gmail.com Tue Apr 4 17:50:05 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Tue Apr 4 17:50:05 2006
Subject: [Numpy-discussion] Re: array constructor from generators?
In-Reply-To: References: Message-ID: Zachary Pincus wrote: > The downside is that currently, passing array() an object makes a 0-d > object array with that object. If this were changed, then passing > array() an iterator object would be handled differently than passing > array any other object. This might possibly be a fatal flaw in this idea. I don't think so. We can pass appropriate lists to array(), and it handles them fine. Iterator objects are just another kind of object that gets special treatment. The tricky bit is recognizing them. > I'd be happy to look in to implementing this functionality if people > think it is a good idea, and could give me some tips as to the best way > to implement it. I think a prerequisite for turning an arbitrary iterable into a numpy array is to iterate over it and store all of the objects in a temporary buffer that expands with a sensible strategy. I can't think of a better buffer object than regular Python lists. I think you can recognize when you have to use the temporary list strategy by seeing if the input has .__iter__() but not .__len__(). I'd have to refresh myself on the details of PyArray_New to be more sure, though. As Tim suggests, 2.5's __length_hint__ will also help. Another note of caution: You are going to have to deal with iterators of iterators of iterators of.... I'm not sure if that actually overly complicates matters; I haven't looked at PyArray_New for some time. Enjoy! -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From ted.horst at earthlink.net Tue Apr 4 21:33:04 2006 From: ted.horst at earthlink.net (Ted Horst) Date: Tue Apr 4 21:33:04 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432E27E.6030906@ee.byu.edu> References: <4432E27E.6030906@ee.byu.edu> Message-ID: <9E52488D-6736-4F10-A045-2D5B39CBD3F5@earthlink.net> I'll just add my voice to the people speaking up to support Travis's efforts. I buy lots of books, and most of the time I don't think too much about who I am supporting when I buy them, but I probably would have bought this book even if I didn't need that level of documentation just to help support what I see as very important work. I don't see how writing about an open source project and using the proceeds to further that project could be seen as anything other than a positive. I also just want to say how impressed I am with what Travis has accomplished with this project. From the organizational effort, patience, and persistence of bringing the various communities together to the quality and quantity of the ideas, code, and discussions, his contributions have been inspiring. Ted Horst From eric at enthought.com Tue Apr 4 21:59:10 2006 From: eric at enthought.com (eric jones) Date: Tue Apr 4 21:59:10 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432E27E.6030906@ee.byu.edu> References: <4432E27E.6030906@ee.byu.edu> Message-ID: <44334E74.3000406@enthought.com> Travis Oliphant wrote: > > I received a rather hurtful email today that was very discouraging to > me personally. Basically, I was called "lame" and a "wolf" in sheep's > clothing because I'm charging for documentation. Hmmmm.... Chickens getting eaten by foxes. Farmer builds wire coop. Coop destroyed by foxes. More chickens eaten. Wolf builds wooden coop for free. Also stands guard but for a fee. No more chickens eaten. 
Most chickens gladly pay. A few grumble about extortion! That's fine. Let
them take the guard. Foxes aren't so afraid of Chickens. This chicken
will take his chances with this wolf. Turns out it's just a lame chicken
in wolves' clothing. Smart chicken, he is.

Dumb letter. Dumb story. Let's see here, you're a chicken. Check. Travis
is smart wolf-chicken... yeah that works. Numpy is the wooden chicken
coop. errr... Guard duty is documentation. hmmm... foxes, not sure...
Guess I should keep my day job.

Slightly more seriously... There's a chicken's foot full of people on the
planet that could have done what Travis has pulled off -- I've actually
thought about this a little. Maybe Jim Hugunin could have done it given
similar time and motivation. After that, I come up a little short of
candidates -- so maybe it's just a pig's foot full. I consider us lucky
that one of the few people able to fuse Numeric/numarray bailed us out
and did it.

Documentation is another matter as far as scarcity of qualified authors.
I would trust any number of yayhoos to create at least passable
documentation for Travis' creation. Heck, David Ascher managed to write
the Numeric documentation. That said, writing docs is work, hard to do
well, and not nearly as much fun as writing actual code (for the people
on this list anyway). That significantly lowers the probability of it
getting done. In fact, I believe LLNL funded the first documentation
effort to help ensure that it happened (though I'm not positive about
that). And, think of the creek we'd be up if he chose to keep the library
and give away the docs.

I'm all for someone writing free documentation. It'd be great to have.
And, if it were as good as Travis', I might even use it. Still, it would
probably be better for the world if you spent your time on other things
that don't already have a solution (like documenting SciPy...). Once that
and all similar problems are solved, loop back around and do the NumPy
docs.

One other comment. I've used another amazing library called agg
(www.antigrain.com) extensively for rendering in kiva/chaco. I view Maxim
(the author of Agg) and graphics rendering in a similar light as Travis
and Numpy -- there are only a handful of people that could have written
agg. For that I am hugely grateful. On the downside, agg is very complex
and has very little documentation. Still a number of people use it
without complaint. Based on the evidence, if Maxim wrote documentation
and charged for it, the number of complaints would actually increase. It
is just silly. I would pay his price and sing his praises for the days of
my life that he gave back to me.

eric

ps.
# Based on a definitive monte carlo simulation, one of every hundred
# chickens will complain. Don't believe me. Try it.
dist = stats.uniform(0.0, 1.0)
for chicken in chickens:
    if dist.rvs()[0] < 0.01:
        print "extortion"

From pfdubois at gmail.com Tue Apr 4 22:01:02 2006
From: pfdubois at gmail.com (Paul Dubois)
Date: Tue Apr 4 22:01:02 2006
Subject: [Numpy-discussion] NumPy documentation
In-Reply-To: <9E52488D-6736-4F10-A045-2D5B39CBD3F5@earthlink.net>
References: <4432E27E.6030906@ee.byu.edu> <9E52488D-6736-4F10-A045-2D5B39CBD3F5@earthlink.net>
Message-ID:

Amen.

On 04 Apr 2006 21:33:12 -0700, Ted Horst wrote:
>
> I'll just add my voice to the people speaking up to support Travis's
> efforts. [...]
From jdhunter at ace.bsd.uchicago.edu Tue Apr 4 22:54:01 2006
From: jdhunter at ace.bsd.uchicago.edu (John Hunter)
Date: Tue Apr 4 22:54:01 2006
Subject: [Numpy-discussion] NumPy documentation
In-Reply-To: <44334E74.3000406@enthought.com> (eric jones's message of "Tue, 04 Apr 2006 23:58:28 -0500")
References: <4432E27E.6030906@ee.byu.edu> <44334E74.3000406@enthought.com>
Message-ID: <873bgsa7vp.fsf@peds-pc311.bsd.uchicago.edu>

>>>>> "eric" == eric jones writes:

    eric> Let's see here, you're a chicken. Check. Travis is smart
    eric> wolf-chicken... yeah that works. Numpy is the wooden chicken
    eric> coop. errr... Guard duty is documentation. hmmm... foxes,
    eric> not sure...

And I thought you didn't drink anything stronger than Dr Pepper :-)

JDH

From sransom at nrao.edu Wed Apr 5 00:04:03 2006
From: sransom at nrao.edu (Scott Ransom)
Date: Wed Apr 5 00:04:03 2006
Subject: [Numpy-discussion] NumPy documentation
In-Reply-To: <9E52488D-6736-4F10-A045-2D5B39CBD3F5@earthlink.net>
References: <4432E27E.6030906@ee.byu.edu> <9E52488D-6736-4F10-A045-2D5B39CBD3F5@earthlink.net>
Message-ID: <20060405070150.GB8682@ssh.cv.nrao.edu>

As someone who has been actively using Numeric/Numarray/Numpy for about 7
years now, I heartily agree.

Thanks, Travis.

Scott

On Tue, Apr 04, 2006 at 11:32:42PM -0500, Ted Horst wrote:
>
> I'll just add my voice to the people speaking up to support Travis's
> efforts. [...]
--
Scott M. Ransom            Address:  NRAO
Phone:  (434) 296-0320               520 Edgemont Rd.
email:  sransom at nrao.edu         Charlottesville, VA 22903 USA
GPG Fingerprint: 06A9 9553 78BE 16DB 407B  FFCA 9BFA B6FF FFD3 2989

From charlesr.harris at gmail.com Wed Apr 5 00:27:02 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Wed Apr 5 00:27:02 2006
Subject: [Numpy-discussion] NumPy documentation
In-Reply-To: <4432E27E.6030906@ee.byu.edu>
References: <4432E27E.6030906@ee.byu.edu>
Message-ID:

Travis,

On 4/4/06, Travis Oliphant wrote:
>
> I received a rather hurtful email today that was very discouraging to me
> personally. Basically, I was called "lame" and a "wolf" in sheep's
> clothing because I'm charging for documentation.

Geez, what's with that? There are any number of "real" books out on
python, I don't hear folks bitching. I think it's wonderful that we have
such a good reference. I mean, look at numarray 8) I spent the money for
your book and it didn't hurt a bit and was well worth the cost. Anyone
who has tried to write extensive documentation on a big project knows how
much work it takes, it isn't easy. Thanks for taking the time and sweat
to do so.

Chuck

From arnd.baecker at web.de Wed Apr 5 01:51:08 2006
From: arnd.baecker at web.de (Arnd Baecker)
Date: Wed Apr 5 01:51:08 2006
Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128
In-Reply-To:
References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> <4432CE1F.3010209@cox.net>
Message-ID:

On Tue, 4 Apr 2006, Robert Kern wrote:

> Tim Hochberg wrote:
[...]
> > >>> help(complex128)
> > class complex128scalar(complexfloatingscalar, complex)
> >  | complex128: composed of two 64 bit floats
> >  |
> >  | Method resolution order:
> >  |     complex128scalar
> >  |     complexfloatingscalar
> >  |     inexactscalar
> >  |     numberscalar
> >  |     genericscalar
> >  |     complex
> >  |     object
> > ...

I am puzzled why this does not show up with IPython:

In [1]:import numpy
In [2]:numpy.complex128?
Type:           type
Base Class:
String Form:
Namespace:      Interactive
Docstring:

whereas

In [3]:help(numpy.complex128)

shows the above! So this might be more of an IPython question (I am
running IPython 0.7.2.svn), but maybe numpy does some magic tricks to
hide the docs from IPython (surely not on purpose ...)? It seems that
numpy.complex128.__doc__ is None.

Best, Arnd

From meesters at uni-mainz.de Wed Apr 5 02:03:06 2006
From: meesters at uni-mainz.de (Christian Meesters)
Date: Wed Apr 5 02:03:06 2006
Subject: [Numpy-discussion] NumPy documentation
In-Reply-To:
References: <4432E27E.6030906@ee.byu.edu>
Message-ID: <200604051048.52766.meesters@uni-mainz.de>

I'm glad, Travis, that you got such supportive replies - but I didn't
expect anything else. Just let me give two more cents:

a) I am a grateful user of Numpy/Scipy, too.
b) I am among those who fully understand and support your decisions about
selling the book.
c) I didn't buy the book - yet. (Simply forgotten after a minor PayPal
problem I had.)
d) ad c): This will change soon.
And e): Thank you for all your work put into Numpy/Scipy!

Christian

From amcmorl at gmail.com Wed Apr 5 02:30:01 2006
From: amcmorl at gmail.com (amcmorl)
Date: Wed Apr 5 02:30:01 2006
Subject: [Numpy-discussion] Newbie indexing question and print order
Message-ID: <44338DF4.7050603@gmail.com>

Hi all,

I'm having a bit of trouble getting my head around numpy's indexing
capabilities. A quick summary of the problem is that I want to
lookup/index in nD from a second array of rank n+1, such that the last
(or first, I guess) dimension contains the lookup co-ordinates for the
value to extract from the first array. Here's a 2D (3,3) example:

In [12]:print ar
[[ 0.15  0.75  0.2 ]
 [ 0.82  0.5   0.77]
 [ 0.21  0.91  0.59]]

In [24]:print inds
[[[1 1]
  [1 1]
  [2 1]]

 [[2 2]
  [0 0]
  [1 0]]

 [[1 1]
  [0 0]
  [2 1]]]

then somehow return the array (barring me making any row/column errors):

In [26]: c = ar.somefancyindexingroutinehere(inds)

In [26]:print c
[[ 0.5   0.5   0.91]
 [ 0.59  0.15  0.82]
 [ 0.5   0.15  0.91]]

i.e. c[x,y] = a[ inds[x,y,0], inds[x,y,1] ]

Any suggestions? It looks like it should be relatively simple using 'put'
or 'take' or 'fetch' or 'sit' or something like that, but I'm not getting
it.

While I'm here, can someone help me understand the rationale behind
'print' printing row, column (i.e. a[0,1] = 0.75 in the above example)
rather than x, y (= column, row, in which case 0.75 would be in the first
column and second row), which seems to me more intuitive.

I'm really enjoying getting into numpy - I can see it'll be
simpler/faster coding than my previous environments, despite me not
knowing my way at the moment, and that python has better opportunities
for extensibility. So, many thanks for your great work.

--
Angus McMorland
email a.mcmorland at auckland.ac.nz
mobile +64-21-155-4906

PhD Student, Neurophysiology / Multiphoton & Confocal Imaging
Physiology, University of Auckland
phone +64-9-3737-599 x89707

Armourer, Auckland University Fencing
Secretary, Fencing North Inc.

From faltet at carabos.com Wed Apr 5 02:56:06 2006
From: faltet at carabos.com (Francesc Altet)
Date: Wed Apr 5 02:56:06 2006
Subject: [Numpy-discussion] NumPy documentation
In-Reply-To: <4432E27E.6030906@ee.byu.edu>
References: <4432E27E.6030906@ee.byu.edu>
Message-ID: <1144230907.7563.14.camel@localhost.localdomain>

Travis,

First of all, I think that you should be happy that you received *only*
one mail of this kind in the year and some months that you have been at
the NumPy project. As somebody already noted: "take a large enough
community, and you will always find a person (or several) that thinks
that the wisest developer and the best professional is evil". We can
discuss at length why this should happen, but the answer is easy: it's
human nature.

Let me also THANK YOU not only for your impressive dedication to the
NumPy project but also for your openness to other ideas and for being
the best advocate of the "I prefer to code, rather than talk" mantra.
Let's do more of this and let others talk. I'm positive that 99% of the
community is with you, and that's the only consideration that matters.

Best,
Francesc

On Tue, 04 Apr 2006 at 15:17 -0600, Travis Oliphant wrote:
> I received a rather hurtful email today that was very discouraging to me
> personally. [...]
--
>0,0<   Francesc Altet     http://www.carabos.com/
V V     Cárabos Coop. V.   Enjoy Data
 "-"
From pau.gargallo at gmail.com Wed Apr 5 03:10:01 2006
From: pau.gargallo at gmail.com (Pau Gargallo)
Date: Wed Apr 5 03:10:01 2006
Subject: [Numpy-discussion] Newbie indexing question and print order
In-Reply-To: <44338DF4.7050603@gmail.com>
References: <44338DF4.7050603@gmail.com>
Message-ID: <6ef8f3380604050309t1ed4c79bv395ed1a9fb45ce9d@mail.gmail.com>

hi,

i had the same problem and i defined a function with a similar syntax to
interp2 which i call take2 to solve it:

from numpy import *

def take2( a, x, y ):
    # flat index into C-ordered (row-major) a is row*ncols + col
    return take( ravel(a), x*a.shape[1] + y )

a = array( [[ 0.15, 0.75, 0.2 ],
            [ 0.82, 0.5,  0.77],
            [ 0.21, 0.91, 0.59]] )

xy = array([ [[1, 1], [1, 1], [2, 1]],
             [[2, 2], [0, 0], [1, 0]],
             [[1, 1], [0, 0], [2, 1]]] )

print take2( a, xy[...,0], xy[...,1] )

i hope this helps you.
pau

On 4/5/06, amcmorl wrote:
> Hi all,
>
> I'm having a bit of trouble getting my head around numpy's indexing
> capabilities. [...]
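For the lookup question above, numpy's fancy indexing should also work
directly, with no flat-index arithmetic at all: passing one integer index
array per axis selects elementwise, so the coordinate planes of inds can
be used as the indices themselves. A minimal sketch, assuming the a and
inds arrays exactly as given in Angus's message:

from numpy import array

a = array([[ 0.15, 0.75, 0.2 ],
           [ 0.82, 0.5,  0.77],
           [ 0.21, 0.91, 0.59]])

inds = array([ [[1, 1], [1, 1], [2, 1]],
               [[2, 2], [0, 0], [1, 0]],
               [[1, 1], [0, 0], [2, 1]]])

# index axis 0 with the first coordinate plane and axis 1 with the
# second, so that c[x,y] == a[inds[x,y,0], inds[x,y,1]]
c = a[inds[..., 0], inds[..., 1]]

print c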
From tim.hochberg at cox.net Wed Apr 5 05:30:14 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Wed Apr 5 05:30:14 2006
Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128
In-Reply-To:
References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> <4432CE1F.3010209@cox.net>
Message-ID: <4433B816.1080307@cox.net>

Arnd Baecker wrote:

>On Tue, 4 Apr 2006, Robert Kern wrote:
>
>>Tim Hochberg wrote:
>
>[...]
>
>I am puzzled why this does not show up with IPython:
>
>In [1]:import numpy
>In [2]:numpy.complex128?
> [...]
>
>It seems that numpy.complex128.__doc__ is None

That's right, none of the scalar types have docstrings at present. The
builtin help (AKA pydoc.help) tracks back through all the base classes
and presents all kinds of extra information. The result tends to be
awfully verbose; so much so that I just stuffed a function called hint
into __builtins__ that just prints the results of pydoc.describe and
pydoc.getdoc. It's quite possible that such a function already exists,
maybe even in pydoc, but oddly enough the docs for pydoc are pretty
impenetrable.

Here I've added basic docstrings to the complex types. I was hoping
someone would have some ideas for other stuff that should go into the
docstrings, but perhaps I'll just commit that change as is. Here's what I
see here using hint:

>>> hint(numpy.float64)   # Still no docstring
class float64scalar
>>> hint(numpy.complex64)   # Now has a terse docstring
class complex64scalar
 | Composed of two 32 bit floats
>>> hint(numpy.complex128)   # Same here.
class complex128scalar
 | Composed of two 64 bit floats

Regards,

-tim

From arnd.baecker at web.de Wed Apr 5 05:48:02 2006
From: arnd.baecker at web.de (Arnd Baecker)
Date: Wed Apr 5 05:48:02 2006
Subject: [Numpy-discussion] Speed up function on cross product of two sets?
In-Reply-To: <44315633.4010600@cox.net>
References: <44315633.4010600@cox.net>
Message-ID:

On Mon, 3 Apr 2006, Tim Hochberg wrote:

> Arnd Baecker wrote:
>
> [SNIP]
>
> >((Note that I just learned in some other thread that with numpy there is
> >an alternative to NewAxis, but I haven't figured out which that is ...))
>
> If you're old school you could just use None.

Well, I have been using python/Numeric/... for a while, but I am
definitely not old school - I was not aware that NewAxis is a longer
spelling of None ;-)

> But you probably mean 'newaxis'.

yes - perfect! Many thanks.

BTW, it seems that we have no Numeric to numpy transition remarks on
www.scipy.org.
I only found http://www.scipy.org/PearuPeterson/NumpyVersusNumeric and of course Travis' "Guide to NumPy" contains a detailed list of necessary changes in chapter 2.6.1. In addition ``site-packages/numpy/lib/convertcode.py`` provides an automatic conversion. Would it be helpful to start a new wiki page "ConvertingFromNumeric" (similar to http://www.scipy.org/Converting_from_numarray) which aims at summarizing the necessary changes or expand Pearu's page (if he agrees) on this? Best, Arnd From arnd.baecker at web.de Wed Apr 5 05:57:16 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Wed Apr 5 05:57:16 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: <4433B816.1080307@cox.net> References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> <4432CE1F.3010209@cox.net> <4433B816.1080307@cox.net> Message-ID: Hi, On Wed, 5 Apr 2006, Tim Hochberg wrote: [...] > That's right, none of the scalar types have docstrings at present. The > builtin help (AKA pydoc.help) tracks back through all the base classes > and presents all kinds of extra information. I see - so that might be something Ipython could do as well (if that's really what we would like to see...) > The result tends to be > awfully verbose; so much so that I just stuffed a function called hint > into __builtins___ that just prints the results of pydoc.describe and > pydoc.getdoc. It's quite possible that such a function already exists, > maybe even in pydoc, but oddly enough the docs for pydoc are pretty > impenatrable. > > Here I've added basic docstrings to the complex types. I was hoping > someone would have some ideas for other stuff that should go into the > docstrings, but perhaps I'll just commit that change as is. Here's what > I see here using hint: > > >>> hint(numpy.float64) # Still no docstring > class float64scalar > >>> hint(numpy.complex64) # Now has a terse docstring > class complex64scalar > | Composed of two 32 bit floats > >>> hint(numpy.complex128) # Same here. > class complex128scalar > | Composed of two 64 bit floats That looks much better. I am a bit unsure about `hint` though for the following reasons: There are quite a few ways to access documentation: - help(defined_object) - help("numpy.complex128") - scipy.info(defined_object) - hint(defined_object) - defined_object? # with IPython (and then of course the pydoc commands as well ...). Clearly, I would prefer to have "?" in IPython as the only thing one needs to know about accessing documentation. There are surely many aspects to consider here, but I have to rush now ... Best, Arnd From tim.hochberg at cox.net Wed Apr 5 06:24:11 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 5 06:24:11 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> <4432CE1F.3010209@cox.net> <4433B816.1080307@cox.net> Message-ID: <4433C4CC.7010003@cox.net> Arnd Baecker wrote: >Hi, > >On Wed, 5 Apr 2006, Tim Hochberg wrote: > >[...] > > > >>That's right, none of the scalar types have docstrings at present. The >>builtin help (AKA pydoc.help) tracks back through all the base classes >>and presents all kinds of extra information. >> >> > >I see - so that might be something Ipython could do as well >(if that's really what we would like to see...) 
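The hint helper Tim mentions isn't shown anywhere in the thread, so here is a minimal sketch of what it might look like -- the body and output format are guesses; only pydoc.describe and pydoc.getdoc are assumed:

import pydoc

def hint(obj):
    # one-line description, e.g. "class complex128scalar"
    print pydoc.describe(obj)
    # just the object's own docstring, without the base-class
    # dump that pydoc.help produces
    doc = pydoc.getdoc(obj)
    if doc:
        print doc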
> > > >>The result tends to be >>awfully verbose; so much so that I just stuffed a function called hint >>into __builtins___ that just prints the results of pydoc.describe and >>pydoc.getdoc. It's quite possible that such a function already exists, >>maybe even in pydoc, but oddly enough the docs for pydoc are pretty >>impenatrable. >> >>Here I've added basic docstrings to the complex types. I was hoping >>someone would have some ideas for other stuff that should go into the >>docstrings, but perhaps I'll just commit that change as is. Here's what >>I see here using hint: >> >> >>> hint(numpy.float64) # Still no docstring >>class float64scalar >> >>> hint(numpy.complex64) # Now has a terse docstring >>class complex64scalar >> | Composed of two 32 bit floats >> >>> hint(numpy.complex128) # Same here. >>class complex128scalar >> | Composed of two 64 bit floats >> >> > >That looks much better. >I am a bit unsure about `hint` though for the following reasons: >There are quite a few ways to access documentation: > - help(defined_object) > - help("numpy.complex128") > - scipy.info(defined_object) > - hint(defined_object) > - defined_object? # with IPython >(and then of course the pydoc commands as well ...). > > Sorry, I was unclear. Hint is only for my enjoyment -- it's not related to numpy. I just tossed it into my sitecustomize file. I was just get sick of doing help(complex64) and getting pages of text when all I cared about was the docstring. I suppose I could just have done "print complex64.__doc__", but I felt like hint might be useful. However, it's not something I was proposing to add to numpy, the changes I was talking about are strictly in the docstrings of complexXXX. -tim >Clearly, I would prefer to have "?" in IPython as the only thing one needs >to know about accessing documentation. > >There are surely many aspects to consider here, but I have to rush now ... > >Best, Arnd > > > > > > From emsellem at obs.univ-lyon1.fr Wed Apr 5 06:33:23 2006 From: emsellem at obs.univ-lyon1.fr (Eric Emsellem) Date: Wed Apr 5 06:33:23 2006 Subject: [Numpy-discussion] A random.normal function with stdev as array Message-ID: <4433C6D6.5080800@obs.univ-lyon1.fr> Hi, I am trying to optimize a code where I derive random numbers many times and having an array of values for the stdev parameter. I wish to have an efficient way of doing something like: ################## stdev = array([1.1,1.2,1.0,2.2]) result = numpy.zeros(stdev.shape, Float) for i in range(len(stdev)) : result[i] = numpy.random.normal(0, stdev[i]) ################## In my case, stdev can in fact be an array of a few millions floats... so I really need to optimize things. Any hint on how to code this efficiently ? And in general, where could I find tips for optimizing a code where I unfortunately have too many loops such as "for i in range(Nbody) : " with Nbody being > 10^6 ? thanks! Eric From dd55 at cornell.edu Wed Apr 5 06:34:00 2006 From: dd55 at cornell.edu (Darren Dale) Date: Wed Apr 5 06:34:00 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <9E52488D-6736-4F10-A045-2D5B39CBD3F5@earthlink.net> References: <4432E27E.6030906@ee.byu.edu> <9E52488D-6736-4F10-A045-2D5B39CBD3F5@earthlink.net> Message-ID: <200604050932.56744.dd55@cornell.edu> On Wednesday 05 April 2006 00:32, Ted Horst wrote: > I'll just add my voice to the people speaking up to support Travis's > efforts. 
I buy lots of books, and most of the time I don't think too > much about who I am supporting when I buy them, but I probably would > have bought this book even if I didn't need that level of > documentation just to help support what I see as very important > work. I don't see how writing about an open source project and using > the proceeds to further that project could be seen as anything other > than a positive. > > I also just want to say how impressed I am with what Travis has > accomplished with this project. From the organizational effort, > patience, and persistence of bringing the various communities > together to the quality and quantity of the ideas, code, and > discussions, his contributions have been inspiring. I agree. I support what Travis has done. From pearu at scipy.org Wed Apr 5 07:18:02 2006 From: pearu at scipy.org (Pearu Peterson) Date: Wed Apr 5 07:18:02 2006 Subject: [Numpy-discussion] Speed up function on cross product of two sets? In-Reply-To: References: <44315633.4010600@cox.net> Message-ID: On Wed, 5 Apr 2006, Arnd Baecker wrote: > BTW, it seems that we have no Numeric to numpy transition remarks in > www.scipy.org. I only found > http://www.scipy.org/PearuPeterson/NumpyVersusNumeric > and of course Travis' "Guide to NumPy" contains a detailed list of > necessary changes in chapter 2.6.1. > In addition ``site-packages/numpy/lib/convertcode.py`` provides an > automatic conversion. > > Would it be helpful to start a new wiki page "ConvertingFromNumeric" > (similar to http://www.scipy.org/Converting_from_numarray) > which aims at summarizing the necessary changes > or expand Pearu's page (if he agrees) on this? It's better to start a new wiki page similar to Converting_from_numarray (I like the table). Btw, I have a few notes about the necessary changes for the Numeric->numpy transition in the following page: http://svn.enthought.com/enthought/wiki/NumpyPort#NotesonchangesduetoreplacingNumeric/scipy_basewithnumpy Feel free to grab these notes. Pearu From perry at stsci.edu Wed Apr 5 08:08:19 2006 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 5 08:08:19 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: References: <4432E27E.6030906@ee.byu.edu> <9E52488D-6736-4F10-A045-2D5B39CBD3F5@earthlink.net> Message-ID: Speaking as someone who thinks he knows what kind of effort is involved in creating numpy, I suspect relatively few have any idea of the effort and skill that is required to do what Travis has done. Indeed, I wouldn't be surprised if Travis hadn't fully anticipated at the start what he was getting himself into, and if he hasn't asked himself more than once whether he would do it again had he known [I imagine that many worthy and memorable efforts fall into this category. Much human progress springs out of such initial optimism.] John Hunter is right that Travis's contributions to this and other scipy-related projects amount to years of work. For those that find it objectionable that Travis is trying to get some partial compensation for this work, consider whether there was anyone at all in the Python community willing to do this as well as he has for free, or even for what he will actually recover from the book. I doubt it very much. Fortunately, I think the number of people that object to Travis charging for the book is small. Unfortunately, their impact can be disproportionately large. 
I hope Travis can effectively ignore them. Perry From lennart.ohlsson at cs.lth.se Wed Apr 5 08:12:20 2006 From: lennart.ohlsson at cs.lth.se (Lennart Ohlsson) Date: Wed Apr 5 08:12:20 2006 Subject: [Numpy-discussion] Re: Newbie indexing question and print order Message-ID: <008201c658c3$30d06ab0$2f32eb82@cs060109> Hi, Although I mainly use for 2D takes here is an nd-version of such a function: def vtake(a, indices): """Corresponding to take in numpy but with vector valued indices""" indexrank = indices.shape[-1] flattedindex = 0 for i in range(indexrank): flattedindex = flattedindex*a.shape[i] + indices[...,i] flattedshape = (-1,) + a.shape[indexrank:] return a.reshape(flattedshape).take(flattedindex) - Lennart On 4/5/06, Pau Gargallo wrote: hi, i had the same problem and i defined a function with a similar sintax to interp2 which i call take2 to solve it: from numpy import * def take2( a, x,y ): return take( ravel(a), x + y*a.shape[0] ) a = array( [[ 0.15, 0.75, 0.2 ], [ 0.82, 0.5, 0.77], [ 0.21, 0.91, 0.59]] ) xy = array([ [[1, 1], [1, 1], [2, 1]], [[2, 2], [0, 0], [1, 0]], [[1, 1], [0, 0], [2, 1]]] ) print take2( a, xy[...,0], xy[...,1] ) i hope this helps you. pau On 4/5/06, amcmorl wrote: > Hi all, > > I'm having a bit of trouble getting my head around numpy's indexing > capabilities. A quick summary of the problem is that I want to > lookup/index in nD from a second array of rank n+1, such that the last > (or first, I guess) dimension contains the lookup co-ordinates for the > value to extract from the first array. Here's a 2D (3,3) example: > > In [12]:print ar > [[ 0.15 0.75 0.2 ] > [ 0.82 0.5 0.77] > [ 0.21 0.91 0.59]] > > In [24]:print inds > [[[1 1] > [1 1] > [2 1]] > > [[2 2] > [0 0] > [1 0]] > > [[1 1] > [0 0] > [2 1]]] > > then somehow return the array (barring me making any row/column errors): > In [26]: c = ar.somefancyindexingroutinehere(inds) > > In [26]:print c > [[ 0.5 0.5 0.91] > [ 0.59 0.15 0.82] > [ 0.5 0.15 0.91]] > > i.e. c[x,y] = a[ inds[x,y,0], inds[x,y,1] ] > > Any suggestions? It looks like it should be relatively simple using > 'put' or 'take' or 'fetch' or 'sit' or something like that, but I'm not > getting it. > > While I'm here, can someone help me understand the rationale behind > 'print' printing row, column (i.e. a[0,1] = 0.75 in the above example > rather than x, y (=column, row; in which case 0.75 would be in the first > column and second row), which seems to me to be more intuitive. > > I'm really enjoying getting into numpy - I can see it'll be > simpler/faster coding than my previous environments, despite me not > knowing my way at the moment, and that python has better opportunities > for extensibility. So, many thanks for your great work. > -- > Angus McMorland > email a.mcmorland at auckland.ac.nz > mobile +64-21-155-4906 > > PhD Student, Neurophysiology / Multiphoton & Confocal Imaging > Physiology, University of Auckland > phone +64-9-3737-599 x89707 > > Armourer, Auckland University Fencing > Secretary, Fencing North Inc. > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory! 
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From a.h.jaffe at gmail.com Wed Apr 5 08:18:03 2006 From: a.h.jaffe at gmail.com (Andrew Jaffe) Date: Wed Apr 5 08:18:03 2006 Subject: [Numpy-discussion] weird interaction: pickle, numpy, matplotlib.hist Message-ID: <4433DF85.7030109@gmail.com> Hi All, I've encountered a strange problem: I've been running some python code on both a linux box and OS X, both with python 2.4.1 and the latest numpy and matplotlib from svn. I have found that when I transfer pickled numpy arrays from one machine to the other (in either direction), the resulting data *looks* all right (i.e., it is a numpy array of the correct type with the correct values at the correct indices), but it seems to produce the wrong result in (at least) one circumstance: matplotlib.hist() gives the completely wrong picture (and set of bins). This can be ameliorated by running the array through arr=numpy.asarray(arr, dtype=numpy.float64) but this seems like a complete kludge (and is only needed when you do the transfer between machines). I've attached a minimal code that exhibits the problem: try test_pickle_hist.test(write=True) on one machine, transfer the output file to another machine, and run test_pickle_hist.test(write=False) on another, and you should see a very strange result (and it should be fixed if you set asarray=True). Any ideas? Andrew -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test_pickle_hist.py URL: From ryanlists at gmail.com Wed Apr 5 08:23:06 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Wed Apr 5 08:23:06 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: References: <4432E27E.6030906@ee.byu.edu> Message-ID: I just realized that my "Amen" to all of this went only to Alan Isaac. I don't "reply-to-all" by default. In response to Perry's comment: "I hope Travis can effectively ignore them." I think a spam filter with "wolf" and "sheep" might be a good start, but it could accidentally delete some interesting "poetry" . Ryan On 4/4/06, Ryan Krauss wrote: > Let me add my thanks and also say that as a grad student who plans to > buy your book once I graduate, NumPy's use is not inhibited by Travis > charging for the documentation. > > Thanks! > > Ryan Krauss > > On 4/4/06, Alan G Isaac wrote: > > On Tue, 04 Apr 2006, Travis Oliphant apparently wrote: > > > I'm not going to dislike or have any kind of ill feelings > > > with anyone who decides to spend their time on > > > "documentation." In fact, I'll appreciate it just like > > > everyone else. > > > > Of course you were extremely clear about this from the > > beginning. Thank you for numpy!!! > > Alan Isaac (grateful user of numpy) > > PS Your book is *very* helpful. > > > > > > > > > > > > ------------------------------------------------------- > > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > > that extends applications into web and mobile media. Attend the live webcast > > and join the prime developer group breaking into this new coding territory! 
> > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > From zpincus at stanford.edu Wed Apr 5 08:32:02 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Wed Apr 5 08:32:02 2006 Subject: [Numpy-discussion] array constructor from generators? In-Reply-To: <44331200.2020604@cox.net> References: <44331200.2020604@cox.net> Message-ID: <884F03C6-599C-426A-A0A0-97009B63EACB@stanford.edu> [sorry if this comes through twice -- seems to have not sent the first time] Hi folks, tim> > I brought this up last week and Travis was OK with it. I have it on > my todo list, but if you are in a hurry you're welcome to do it > instead. Sorry if that was on the list and I missed it! Hate to be adding more noise than signal. At any rate, I'm not in a hurry, but I'd be happy to help where I can. (Though for the next week or so I think I'm swamped...) tim> > If you do look at it, consider looking into the '__length_hint__ > parameter that's slated to go into Python 2.5. When this is > present, it's potentially a big win, since you can preallocate the > array and fill it directly from the iterator. Without this, you > probably can't do much better than just building a list from the > array. What would work well would be to build a list, then steal > its memory. I'm not sure if that's feasible without leaking a > reference to the list though. Can you steal its memory and then give it some dummy memory that it can free without problems, so that the list can be deallocated without trouble? Does anyone know if you can just give the list a NULL pointer for it's memory and then immediately decref it? free (NULL) should always be safe, I think. (??) > Also, with iterators, specifying dtype will make a huge difference. > If an object has __length_hint__ and you specify dtype, then you > can preallocate the array as I suggested above. However, if dtype > is not specified, you still need to build the list completely, > determine what type it is, allocate the array memory and then copy > the values into it. Much less efficient! How accurate is __length_hint__ going to be? It could lead to a fair bit of special case code for growing and shrinking the final array if __length_hint__ turns out to be wrong. Code that python lists already have, moreover. If the list's memory can be stolen safely, how does this strategy sound: - Given a generator, build it up into a list internally, and then steal the list's memory. - If a dtype is provided, wrap the generator with another generator that casts the original generator's output to the correct dtype. Then use the wrapped generator to create a list of the proper dtype, and steal that list's memory. A potential problem with stealing list memory is that it could waste memory if the list has more bytes allocated than it is using (I'm not sure if python lists can get this way, but I presume that they resize themselves only every so often, like C++ or Java vectors, so most of the time they have some allocated but unused bytes). If lists have a squeeze method that's guaranteed not to cause any copies, or if this can be added with judicious use of realloc, then that problem is obviated. robert> > Another note of caution: You are going to have to deal with > iterators of > iterators of iterators of.... 
I'm not sure if that actually overly > complicates > matters; I haven't looked at PyArray_New for some time. Enjoy! This is a good point. Numpy does fine with nested lists, but what should it do with nested generators? I originally thought that basically 'array(generator)' should make the exact same thing as 'array([f for f in generator])'. However, for nested generators, this would be an object array of generators. I'm not sure which is better -- having more special cases for generators that make generators, or having a simple rubric like above for how generators are treated. Any thoughts? Zach From robert.kern at gmail.com Wed Apr 5 08:36:03 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed Apr 5 08:36:03 2006 Subject: [Numpy-discussion] Re: A random.normal function with stdev as array In-Reply-To: <4433C6D6.5080800@obs.univ-lyon1.fr> References: <4433C6D6.5080800@obs.univ-lyon1.fr> Message-ID: Eric Emsellem wrote: > Hi, > > I am trying to optimize a code where I derive random numbers many times > and having an array of values for the stdev parameter. > > I wish to have an efficient way of doing something like: > ################## > stdev = array([1.1,1.2,1.0,2.2]) > result = numpy.zeros(stdev.shape, Float) > for i in range(len(stdev)) : > result[i] = numpy.random.normal(0, stdev[i]) > ################## You can use the fact that the standard deviation of a normal distribution is a scale parameter. You can get random normal deviates of varying standard deviation by multiplying a standard normal deviate by the desired standard deviation (how's that for confusing terminology, eh?). result = numpy.random.standard_normal(stdev.shape) * stdev > In my case, stdev can in fact be an array of a few millions floats... > so I really need to optimize things. > > Any hint on how to code this efficiently ? > > And in general, where could I find tips for optimizing a code where I > unfortunately have too many loops such as "for i in range(Nbody) : " > with Nbody being > 10^6 ? Tim Hochberg recently made this list: """ 0. Think about your algorithm. 1. Vectorize your inner loop. 2. Eliminate temporaries 3. Ask for help 4. Recode in C. 5. Accept that your code will never be fast. Step zero should probably be repeated after every other step ;) """ That's probably the best general advice. To get better advice, we would need to know the specifics of the problem. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From a.h.jaffe at gmail.com Wed Apr 5 08:48:27 2006 From: a.h.jaffe at gmail.com (Andrew Jaffe) Date: Wed Apr 5 08:48:27 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist [sort() method problem?] Message-ID: OK, I think I've managed to track the problem down a bit further: the sort() method is failing for arrays pickled on another machine! That is, it's definitely not sorting the array, but changing to a very strange order (neither the way it started nor sorted). Again, the array seems to otherwise behave fine (indeed, it even satisfies all(a==a1) for a pair that behave differently in this circumstance). Hmmm... A On 4/5/06, Andrew Jaffe wrote: > > Hi All, > > I've encountered a strange problem: I've been running some python code > on both a linux box and OS X, both with python 2.4.1 and the latest > numpy and matplotlib from svn. 
> > I have found that when I transfer pickled numpy arrays from one machine > to the other (in either direction), the resulting data *looks* all right > (i.e., it is a numpy array of the correct type with the correct values > at the correct indices), but it seems to produce the wrong result in (at > least) one circumstance: matplotlib.hist() gives the completely wrong > picture (and set of bins). > > This can be ameliorated by running the array through > arr=numpy.asarray(arr, dtype=numpy.float64) > but this seems like a complete kludge (and is only needed when you do > the transfer between machines). > > I've attached a minimal code that exhibits the problem: try > test_pickle_hist.test(write=True) > on one machine, transfer the output file to another machine, and run > test_pickle_hist.test(write=False) > on another, and you should see a very strange result (and it should be > fixed if you set asarray=True). > > Any ideas? > > Andrew > > > import cPickle > import numpy > import pylab > > def test(write=True,asarray=False): > > a = numpy.linspace(-3,3,num=100) > > if write: > f1 = file("a.cpkl", 'w') > cPickle.dump(a, f1) > f1.close() > > f1 = open("a.cpkl", 'r') > a1 = cPickle.load(f1) > f1.close() > > pylab.subplot(1,2,1) > h = pylab.hist(a) > > if asarray: > a1 = numpy.asarray(a1, dtype=numpy.float64) > > pylab.subplot(1,2,2) > h1 = pylab.hist(a1) > > return a, a1 > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From byrnes at bu.edu Wed Apr 5 08:58:21 2006 From: byrnes at bu.edu (John Byrnes) Date: Wed Apr 5 08:58:21 2006 Subject: [Numpy-discussion] A random.normal function with stdev as array In-Reply-To: <4433C6D6.5080800@obs.univ-lyon1.fr> References: <4433C6D6.5080800@obs.univ-lyon1.fr> Message-ID: <20060405155736.GA9364@localhost.localdomain> Hi Eric, In the past , I've done things like ###### normdist = lambda x: numpy.random.normal(0,x) vecnormal = numpy.vectorize(normdist) stdev = numpy.array([1.1,1.2,1.0,2.2]) result = vecnormal(stdev) ###### This works fine for up to 10k elements for stdev for some reason. Any larger then that and i get a Bus error on my PPC mac and a segfault on my x86 linux box. I'm running numpy 0.9.7.2325 on both machines. Perhaps for larger inputs, you could break up your loop into smaller vectorized chunks. Regards, John On Wed, Apr 05, 2006 at 03:32:06PM +0200, Eric Emsellem wrote: > Hi, > > I am trying to optimize a code where I derive random numbers many times > and having an array of values for the stdev parameter. > > I wish to have an efficient way of doing something like: > ################## > stdev = array([1.1,1.2,1.0,2.2]) > result = numpy.zeros(stdev.shape, Float) > for i in range(len(stdev)) : > result[i] = numpy.random.normal(0, stdev[i]) > ################## > > In my case, stdev can in fact be an array of a few millions floats... > so I really need to optimize things. > > Any hint on how to code this efficiently ? > > And in general, where could I find tips for optimizing a code where I > unfortunately have too many loops such as "for i in range(Nbody) : " > with Nbody being > 10^6 ? > > thanks! > Eric > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory! 
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- If liberty and equality, as is thought by some are chiefly to be found in democracy, they will be best attained when all persons alike share in the government to the utmost. -- Aristotle, Politics From bsouthey at gmail.com Wed Apr 5 09:05:03 2006 From: bsouthey at gmail.com (Bruce Southey) Date: Wed Apr 5 09:05:03 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432E27E.6030906@ee.byu.edu> References: <4432E27E.6030906@ee.byu.edu> Message-ID: Hi, Sorry that you received such an email. It is one thing to disagree with your choice but it is inexcusable to dictate what you should do with your code/documentation (not to mention the language). Unfortunately, this appears to be the result of the typical confusion of what 'free' refers to in open source software. If this person thought that purchasing documentation is bad then I wonder what they think of the PyMOL project: "If you use PyMOL at work, then you are asked and expected to sponsor the project by purchasing a PyMOL Subscription" (http://www.pymol.org/funding.html)! Really the 'book' issue is more an excuse than a real reason for people not to use numpy. Personally I really think that you should get the 1.0 release out that probably would change some minds. Based on the list postings, the stability of numpy already exceeds a typical 1.0 release level. Regards Bruce From schofield at ftw.at Wed Apr 5 09:10:05 2006 From: schofield at ftw.at (Ed Schofield) Date: Wed Apr 5 09:10:05 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432E27E.6030906@ee.byu.edu> References: <4432E27E.6030906@ee.byu.edu> Message-ID: <4433EC3C.9050706@ftw.at> I'd also like to express my gratitude, Travis, for all the time and energy you've donated to both NumPy and SciPy. I also fully support your decision to charge for your book. Perhaps your correspondent expects your book to be free because it's online. Perhaps some re-branding -- from "fee-based documentation" to "book" or "handbook for users and developers" -- would help to avoid evoking such unfair responses? Incidentally, you mention on on the site that you'll print and bind hard-copy version once your sales reach 200 copies. I think this would help to encourage libraries and conservative institutions to purchase copies. Are your sales still under this level?! I'm now going to order a copy for my institution -- and a hard copy when it's available :) -- Ed From robert.kern at gmail.com Wed Apr 5 09:11:01 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed Apr 5 09:11:01 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist In-Reply-To: <4433DF85.7030109@gmail.com> References: <4433DF85.7030109@gmail.com> Message-ID: Andrew Jaffe wrote: > Hi All, > > I've encountered a strange problem: I've been running some python code > on both a linux box and OS X, both with python 2.4.1 and the latest > numpy and matplotlib from svn. 
> > I have found that when I transfer pickled numpy arrays from one machine > to the other (in either direction), the resulting data *looks* all right > (i.e., it is a numpy array of the correct type with the correct values > at the correct indices), but it seems to produce the wrong result in (at > least) one circumstance: matplotlib.hist() gives the completely wrong > picture (and set of bins). > > This can be ameliorated by running the array through > arr=numpy.asarray(arr, dtype=numpy.float64) > but this seems like a complete kludge (and is only needed when you do > the transfer between machines). You have a byteorder issue. Your Linux box, which I presume has an Intel or AMD CPU, is little-endian, where your OS X box, which I presume has a PPC CPU, is big-endian. numpy arrays can store their data in either endianness on either kind of platform; their dtype objects tell you which byteorder they are using. In the dtype specifications below, '>' means big-endian (I am using a PPC PowerBook), and '<' means little-endian. In [31]: a = linspace(0, 10, 11) In [32]: a Out[32]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]) In [33]: a.dtype Out[33]: dtype('>f8') In [34]: b = a.newbyteorder() In [35]: b Out[35]: array([ 0.00000000e+000, 3.03865194e-319, 3.16202013e-322, 1.04346664e-320, 2.05531309e-320, 2.56123631e-320, 3.06715953e-320, 3.57308275e-320, 4.07900597e-320, 4.33196758e-320, 4.58492919e-320]) In [36]: b.dtype Out[36]: dtype('<f8') From Chris.Barker at noaa.gov Wed Apr 5 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed Apr 5 2006 Subject: [Numpy-discussion] NumPy documentation References: <4432E27E.6030906@ee.byu.edu> <4432E973.8070601@noaa.gov> <4432F4DD.6060000@cox.net> <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu> Message-ID: <4433F1F6.4010603@noaa.gov> Zachary Pincus wrote: > from Numeric (who was used to the large, free manual) Which brings up a question: Is the source to the old Numeric manual available? It would be nice to "port" it to SciPy. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From bsouthey at gmail.com Wed Apr 5 09:46:03 2006 From: bsouthey at gmail.com (Bruce Southey) Date: Wed Apr 5 09:46:03 2006 Subject: [Numpy-discussion] A random.normal function with stdev as array In-Reply-To: <4433C6D6.5080800@obs.univ-lyon1.fr> References: <4433C6D6.5080800@obs.univ-lyon1.fr> Message-ID: Hi, Can you provide more details on what you are doing, especially how you are using this? The one item that is not directly part of Tim's list is that sometimes you need to reorder your loops (perhaps this is part of "Think about your algorithm"?). Loop swapping is very common to improve performance. However, it usually requires a very clear head or someone else to do it. Also, you might need to break loops into pieces where you repeat the same tasks and computations over and over. The other aspect is to do some algebra on the calculations, as the stdev is essentially a constant, so depending on how you use it you can factor it out further. Again it all depends on what you are actually doing with these numbers. From a different view, you need to be very careful with your (pseudo)random number generator with that many samples. These have a tendency to repeat so your random number stream is no longer random. See the Wikipedia entry: http://en.wikipedia.org/wiki/Pseudorandom_number_generator If I recall correctly, the Python random number generator is a Mersenne twister but ranlib is not and so prone to the mentioned problems. 
I do not know if SciPy adds any other generators. Finally I would also cheat by reducing the stdev values because in many cases you will not see a real difference between a normal with mean zero and variance 1.0 and a normal with mean zero and variance 1.1 (especially if you are doing more than comparing distributions so there are more sources of 'error') unless you have a really large number of samples. Regards Bruce On 4/5/06, Eric Emsellem wrote: > Hi, > > I am trying to optimize a code where I derive random numbers many times > and having an array of values for the stdev parameter. > > I wish to have an efficient way of doing something like: > ################## > stdev = array([1.1,1.2,1.0,2.2]) > result = numpy.zeros(stdev.shape, Float) > for i in range(len(stdev)) : > result[i] = numpy.random.normal(0, stdev[i]) > ################## > > In my case, stdev can in fact be an array of a few millions floats... > so I really need to optimize things. > > Any hint on how to code this efficiently ? > > And in general, where could I find tips for optimizing a code where I > unfortunately have too many loops such as "for i in range(Nbody) : " > with Nbody being > 10^6 ? > > thanks! > Eric > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From tim.hochberg at cox.net Wed Apr 5 09:58:08 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 5 09:58:08 2006 Subject: [Numpy-discussion] A random.normal function with stdev as array] Message-ID: <4433F71B.5080201@cox.net> Eric Emsellem wrote: > Hi, > this is illuminating in fact. These are things I would not have > thought about. > > I am trying at the moment to understand why two versions of my program > have a difference of about 10% (here it is 2sec for 1000 points, so > you can imagine for 10^6...) although the code is nearly the same. > > I have loops such as: > > #################### > bigarray = array of Nbig points > for i in range(N) : > bigarray = bigarray + calculation > #################### If you tell us more about calculation, we could probably help more. This sounds like you want to vectorize the inner loop, but you may in fact have already done that. There's nothing wrong with looping in python as long as you amortize the loop overhead over a large number of operations. Thus, the advice to vectorize your *inner* loop, not vectorize all loops. Attempting the latter can lead to impenatrable code, usually doesn't help signifigantly and sometimes slows things down as you overflow the cache with big matrices. > > I thought to do it by: > #################### > bigarray = numpy.sum(array([calculation for i in range(N)])) > #################### > not sure this is good... I suspect not, but timeit is your friend.... > > And you are basically saying that > > bigarray = bigarray + calculation > > is better than > > bigarray += calculation > > or is it strictly equivalent? (in terms of CPU...) Actually the reverse. "bigarray += calculation" should be better in terms of both speed and memory usage. 
In this case it's also clearer, so it's an improvement all around. They both do the same number of adds, but the first allocates more memory and pushes more data back and forth between main memory and the cache. The point I was making about += verus + was that I wouldn't in general recommend: a = some_func() a += something_else over: a = some_func() + something_else because it's less clear. In cases, where you do need really need the speed, it's fine, but most of the time that's not the case. In your case, the speedup is fairly minor, I believe because random.normal is fairly expensive. If you instead compare these two ways of computing a cube, you'll see a much larger difference (37%). >>> setup = "import numpy; stddev=numpy.arange(1e6,dtype=float)%3" >>> timeit.Timer('stddev * stddev * stddev', setup).timeit(20) 1.206557537340359 >>> timeit.Timer('result = stddev*stddev; result *= stddev', setup).timeit(20) 0.88055493086403658 However, if you work with smaller matrices, the effect almost disappears (5%): >>> setup = "import numpy; stddev=numpy.arange(1e4,dtype=float)%3" >>> timeit.Timer('result = stddev*stddev; result *= stddev', setup).time 0.10166515576702295 >>> timeit.Timer('stddev * stddev * stddev', setup).timeit(2000) 0.10613667379493563 I believe that's because the speedup is nearly all due to reducing the amount of data you move around. In the second case everything fits in the cache, so this effect is minor. In the first you are pushing data back and forth to main memory so it's fairly large. On my machine these sort of effects kick in somewhere between 10,000 and 100,000 elements. > > thanks for the help, and sorry for the dum questions Not a problem. These are all legitimate questions that you can't really be expected to know without a fair amount of experience with numpy or its predecessors. It would be cool if someone added a page to the wicki on the topic so we could start collecting and orgainizing this information. For all I know there's one already there though -- I should probably check. -tim > > Eric > > Tim Hochberg wrote: > >> Eric Emsellem wrote: >> >>> >>>> >>>> >>>> Since stdev essentially scales the output of random, wouldn't the >>>> followin be equivalent to the above? >>>> >>>> result = numpy.random.normal(0, 1, stddev.shape) >>>> result *= stdev >>>> >>> yes indeed, this is a good option where in fact I could do >>> >>> result = stddev * numpy.random.normal(0, 1, stddev.shape) >>> >>> in one line. >>> thanks for the tip >> >> >> Indeed you can. However, keep in mind that the one line version is >> equivalent to: >> >> temp = numpy.random.normal(0, 1, stddev.shape) >> result = stddev * temp >> >> That is, it creates an extra temporary variable only to throw it >> away. The two line version I posted above avoids that temporary and >> thus should be both faster and less memory hungry. It's always good >> to check these things however: >> >> >>> setup = "import numpy; stddev=numpy.arange(1e6,dtype=float)%3" >> >>> timeit.Timer('stddev * numpy.random.normal(0, 1, stddev.shape)', >> setup).timeit(20) >> 3.4527201082819232 >> >>> timeit.Timer('result = numpy.random.normal(0, 1, stddev.shape); >> result*=stddev', setup).timeit(20) >> 3.1093337281693607 >> >> So, yes, the two line method is marginally faster (about 10%). Most >> of the time you shouldn't care about this: the one line version is >> clearer and most of the code you write isn't a bottleneck. Starting >> out writing this as the two line version is premature optimization. 
I >> used it here since the question was about optimization . >> >> I see Robert Kern just posted my list. If you want to put this in >> terms of that list, then: >> >> 0. Think about your algorithm >> => Recognize that stddev is a scale parameter >> 1. Vectorize your inner loop. >> => This is a no brainer after 0 resulting in the one line version >> 2. Eliminate temporaries >> => This results in the two line version. >> ... >> >> Also key here is recognizing when to stop. Steps 0 is always >> appropriate and step 1 is almost always good, resulting in code that >> is both clearer and faster. However, once you get to step 2 and >> beyond you tend to trade speed/memory usage for clarity. Not always: >> sometime *= and friends are clearer, but often, particularly if you >> start resorting to three arg ufuncs. So, my advice is to stop >> optimizing as soon as your code is fast enough. >> >> >>> (of course this is not strictly equivalent depending on the random >>> generator, but that will be fine for my purpose) >> >> >> I'll have to take your word for it -- after the normal distribution >> my knowledge in the area peters out rapidly/ >> >> Regards, >> >> -tim >> >> > From emsellem at obs.univ-lyon1.fr Wed Apr 5 10:06:04 2006 From: emsellem at obs.univ-lyon1.fr (Eric Emsellem) Date: Wed Apr 5 10:06:04 2006 Subject: [Numpy-discussion] A random.normal function with stdev as array In-Reply-To: References: <4433C6D6.5080800@obs.univ-lyon1.fr> Message-ID: <4433F8D1.7090305@obs.univ-lyon1.fr> An HTML attachment was scrubbed... URL: From perry at stsci.edu Wed Apr 5 10:09:01 2006 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 5 10:09:01 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4433F1F6.4010603@noaa.gov> References: <4432E27E.6030906@ee.byu.edu> <4432E973.8070601@noaa.gov> <4432F4DD.6060000@cox.net> <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu> <4433F1F6.4010603@noaa.gov> Message-ID: On Apr 5, 2006, at 12:36 PM, Christopher Barker wrote: > Zachary Pincus wrote: >> from Numeric (who was used to the large, free manual) > > Which brings up a question: Is the source to the old Numeric manual > available? it would be nice to "port" it to SciPy. Sort of. The original source was in Framemaker format. It was converted to the Python latex framework in the process of being adopted to numarray. The source for that is available on the numarray repository. If you want the framemaker source, I may be able to dig that up somewhere (or I may have lost track of it :-). Paul Dubois can likely provide it as well; that's who gave me the source. Perry From hetland at tamu.edu Wed Apr 5 10:15:27 2006 From: hetland at tamu.edu (Robert Hetland) Date: Wed Apr 5 10:15:27 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4433F1F6.4010603@noaa.gov> References: <4432E27E.6030906@ee.byu.edu> <4432E973.8070601@noaa.gov> <4432F4DD.6060000@cox.net> <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu> <4433F1F6.4010603@noaa.gov> Message-ID: Let's not forget that this documentation will eventually be free *no matter what* -- after a financial goal is met or after a certain amount of time. This makes it fundamentally different than a published book (and in my opinion, much better). I personally think this is an innovative way to create a free product that everybody wants, but nobody wants to do. 
-Rob ----- Rob Hetland, Assistant Professor Dept of Oceanography, Texas A&M University p: 979-458-0096, f: 979-845-6331 e: hetland at tamu.edu, w: http://pong.tamu.edu From fonnesbeck at gmail.com Wed Apr 5 10:28:10 2006 From: fonnesbeck at gmail.com (Chris Fonnesbeck) Date: Wed Apr 5 10:28:10 2006 Subject: Fwd: [Numpy-discussion] NumPy documentation In-Reply-To: <723eb6930604051026q7dbcaad2w47c059f6c88e8db7@mail.gmail.com> References: <4432E27E.6030906@ee.byu.edu> <723eb6930604051026q7dbcaad2w47c059f6c88e8db7@mail.gmail.com> Message-ID: <723eb6930604051027m5aac408dnbba356ebdcb389ac@mail.gmail.com> On 4/4/06, Travis Oliphant wrote: > > I received a rather hurtful email today that was very discouraging to me > personally. Basically, I was called "lame" and a "wolf" in sheep's > clothing because I'm charging for documentation. There is one in every crowd, it seems. This email, and any others like it, should be utterly ignored, in the hopes that their authors will go elsewhere for scientific computing solutions. If they had spent any time at all on this list, they would have noticed the seemingly boundless attention and support that Travis bestows upon both scipy and its user community. Chris -- Chris Fonnesbeck + Atlanta, GA + http://trichech.us -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Apr 5 10:29:07 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed Apr 5 10:29:07 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4433EC3C.9050706@ftw.at> References: <4432E27E.6030906@ee.byu.edu> <4433EC3C.9050706@ftw.at> Message-ID: Heh, On 4/5/06, Ed Schofield wrote: > Perhaps some re-branding -- from "fee-based documentation" to > "book" or "handbook for users and developers" I think that's a great idea! "Handbook for Users and Developers" sounds much better and doesn't have that nasty "documentation should be free" implication. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Apr 5 11:35:01 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed Apr 5 11:35:01 2006 Subject: [Numpy-discussion] Re: A random.normal function with stdev as array In-Reply-To: <4433F8D1.7090305@obs.univ-lyon1.fr> References: <4433C6D6.5080800@obs.univ-lyon1.fr> <4433F8D1.7090305@obs.univ-lyon1.fr> Message-ID: > Bruce Southey wrote: >>>From a different view, you need to be very careful with your >>(pseudo)random number generator with that many samples. These have a >>tendency to repeat so your random number stream is no longer random. >>See the Wikipedia entry: >>http://en.wikipedia.org/wiki/Pseudorandom_number_generator >> >>If I recall correctly, the Python random number generator is a >>Mersenne twister but ranlib is not and so prone to the mentioned >>problems. I do not know if SciPy adds any other generators. numpy.random uses the Mersenne Twister. RANLIB is dead! Long live MT19937! -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From Chris.Barker at noaa.gov Wed Apr 5 11:59:04 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed Apr 5 11:59:04 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: References: <4432E27E.6030906@ee.byu.edu> <4432E973.8070601@noaa.gov> <4432F4DD.6060000@cox.net> <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu> <4433F1F6.4010603@noaa.gov> Message-ID: <44341348.3050505@noaa.gov> Perry Greenfield wrote: > Sort of. The original source was in Framemaker format. It was converted > to the Python latex framework in the process of being adopted to > numarray. The source for that is available on the numarray repository. > If you want the framemaker source, I may be able to dig that up > somewhere (or I may have lost track of it :-). Paul Dubois can likely > provide it as well; that's who gave me the source. Thanks. That's good news. Now, when I'm done with everything else I want to work on..... LaTeX is a better option for me anyway. In fact, it's a better option for anyone that doesn't already use FrameMaker, as you can at least edit some of the text without knowing or using LaTeX at all. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Wed Apr 5 12:07:10 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed Apr 5 12:07:10 2006 Subject: [Numpy-discussion] array constructor from generators? In-Reply-To: References: Message-ID: <44341538.4040907@noaa.gov> Zachary Pincus wrote: > I often construct arrays from list comprehensions on generators, > numpy.array([map(float, line.split()) for line in file]) I know there are other uses, and this was just an example, but you can now do: numpy.fromfile(file, dtype=numpy.Float, sep="\t") Which is much faster and cleaner, if you ask me. Thanks for adding this, Travis! Tim Hochberg wrote: > Without this, you probably can't do much > better than just building a list from the array. What would work well > would be to build a list, then steal its memory. Perhaps another option is to borrow the machinery from fromfile (see above), that builds an array without knowing how big it is when it starts. I haven't looked at the code, but I know that Travis got at least the idea, if not the method, from my FileScanner module I wrote a while back, and that dynamically allocated the memory it needed as it grew. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From tim.hochberg at cox.net Wed Apr 5 12:16:11 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 5 12:16:11 2006 Subject: [Numpy-discussion] array constructor from generators? In-Reply-To: <884F03C6-599C-426A-A0A0-97009B63EACB@stanford.edu> References: <44331200.2020604@cox.net> <884F03C6-599C-426A-A0A0-97009B63EACB@stanford.edu> Message-ID: <4434175D.10103@cox.net> Zachary Pincus wrote: > [sorry if this comes through twice -- seems to have not sent the > first time] I've only seen it once so far, but my numpy mail seems to be coming through all out of order right now. > Hi folks, > > tim> > >> I brought this up last week and Travis was OK with it. I have it on >> my todo list, but if you are in a hurry you're welcome to do it >> instead. > > > Sorry if that was on the list and I missed it! 
Hate to be adding more > noise than signal. At any rate, I'm not in a hurry, but I'd be happy > to help where I can. (Though for the next week or so I think I'm > swamped...) There was no real discussion then. I said I thought it was a good idea. Travis said OK. That was about it. > tim> > >> If you do look at it, consider looking into the '__length_hint__ >> parameter that's slated to go into Python 2.5. When this is present, >> it's potentially a big win, since you can preallocate the array and >> fill it directly from the iterator. Without this, you probably can't >> do much better than just building a list from the array. What would >> work well would be to build a list, then steal its memory. I'm not >> sure if that's feasible without leaking a reference to the list though. > > > Can you steal its memory and then give it some dummy memory that it > can free without problems, so that the list can be deallocated > without trouble? Does anyone know if you can just give the list a > NULL pointer for it's memory and then immediately decref it? free > (NULL) should always be safe, I think. (??) That might well work, but now I realize that using a list this way probably won't work out well for other reasons. >> Also, with iterators, specifying dtype will make a huge difference. >> If an object has __length_hint__ and you specify dtype, then you can >> preallocate the array as I suggested above. However, if dtype is not >> specified, you still need to build the list completely, determine >> what type it is, allocate the array memory and then copy the values >> into it. Much less efficient! > > > How accurate is __length_hint__ going to be? It could lead to a fair > bit of special case code for growing and shrinking the final array if > __length_hint__ turns out to be wrong. see below. > Code that python lists already have, moreover. If we don't know dtype up front, lists are great. All the code is there and we need to look at all of the elements before we know what the elements are anyway. However, if you do know what dtype is the situation is different. Since these are generators, the object they create may only last until the next next() call if we don't hold onto it. That means that for a matrix of size N, generating thw whole list is going to require N*(sizeof(long) + sizeof(pyobjType) + sizeof(dtype)), versus just N*sizeof(dtype) if we're careful. I'm not sure what all of those various sizes are, but I'm going to guess that we'd be at least doubling our memory. All is not lost however. When we know the dtype, we should just use a *python* array to hold the data. It works just like a list, but on packed data. > > If the list's memory can be stolen safely, how does this strategy sound: Let me break this into two cases: 1. We don't know the dtype. > - Given a generator, build it up into a list internally +1 > , and then steal the list's memory. -0.5 I'm not sure this buys us as much as I thought initially. The list memory is PyObject*, so this would only work on dtypes no larger than the size of a pointer, usually that means no larger than a long. So, basically this would work on most of the integer types, but not the floating point types. And, it adds extra complexity to support two different cases. I'd be inclined to start with just copying the objects out of the list. If someone feels like it later, they can come back and try to optimize the case of integers to steal the lists memory.. 
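As an aside on the packed python-array idea above: the known-dtype path can be prototyped in pure Python, before any C-level memory stealing, by appending into a stdlib array.array and copying the packed bytes out at the end. A rough sketch only -- fromiter_sketch is a made-up name, and it assumes numpy.frombuffer accepts any buffer-exporting object such as array.array:

import array
import numpy

def fromiter_sketch(iterable, typecode='d'):
    buf = array.array(typecode)   # packed storage that appends like a list
    for item in iterable:
        buf.append(item)
    # copy the packed buffer into a fresh numpy array (a copy, not a steal)
    return numpy.frombuffer(buf, dtype=typecode).copy()

squares = fromiter_sketch(x * x for x in range(10))

The point of the packed buffer is the memory figure discussed above: one machine double per element instead of a pointer plus a float object per element.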
Keep in mind that once we have a list, we can simply pass it to the machinery that already exists for creating arrays from lists, making our lives much easier. > - If a dtype is provided, wrap the generator with another generator > that casts the original generator's output to the correct dtype. Then > use the wrapped generator to create a list of the proper dtype, and > steal that list's memory. -1. This wastes a lot of space and sort of defeats the purpose of the whole exercise in my mind. 2. Dtype is known. The case where dtype is provided is more complicated, but this is the case we really want to support well. Actually though, I think we can simplify it by judicious punting. Case 2a: Array is not 1-dimensional. Punt and fall back on the general code above. We can determine this simply by testing the first element. If it's not int/float/complex/whatever-other-scalar-values-we-have, fall back to case 1. Case 2b: length_hint is not given. In this case, we build up the array in a python array, steal the data, possibly realloc and we're done. Case 2c: length_hint is given. Same as above, but preallocate the appropriate amount of memory, growing if length_hint lies. > > A potential problem with stealing list memory is that it could waste > memory if the list has more bytes allocated than it is using (I'm not > sure if python lists can get this way, but I presume that they resize > themselves only every so often, like C++ or Java vectors, so most of > the time they have some allocated but unused bytes). If lists have a > squeeze method that's guaranteed not to cause any copies, or if this > can be added with judicious use of realloc, then that problem is > obviated. I imagine once you steal the memory, realloc would be the thing to try. However, I don't think it's worth stealing the memory from lists. I do think it's worth stealing the memory from python arrays however, and I'm sure that the same issue exists there. We'll have to look at how the deallocation for an array works. It probably uses Py_XDecref, in which case we can just replace the memory with NULL and we'll be fine. OK, just had a look at the code for the python array object (Modules/arraymodule.c). Looks like it'll be a piece of cake. We can allocate it to the exact size we want if we have length_hint, otherwise resize only overallocates by 6%. That's not enough to worry about reallocing. Stealing the data looks like it shouldn't be a problem either, just NULL ob_item as you suggested. Regards, -tim > > robert> > >> Another note of caution: You are going to have to deal with >> iterators of >> iterators of iterators of.... I'm not sure if that actually overly >> complicates >> matters; I haven't looked at PyArray_New for some time. Enjoy! > > > This is a good point. Numpy does fine with nested lists, but what > should it do with nested generators? I originally thought that > basically 'array(generator)' should make the exact same thing as > 'array([f for f in generator])'. However, for nested generators, this > would be an object array of generators. > > I'm not sure which is better -- having more special cases for > generators that make generators, or having a simple rubric like above > for how generators are treated. > > Any thoughts? > > Zach
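A rough pure-Python sketch of the 1-d, known-dtype strategy Tim describes (the function name is made up; the real payoff would come from stealing the python array's packed buffer in C rather than copying, as is done here):

    import array
    import numpy

    def fromiter_1d(it, typecode='d'):
        # a python array is "just like a list, but on packed data"
        buf = array.array(typecode, it)
        # the C version would steal buf's memory; here we just copy
        return numpy.array(buf, dtype=typecode)

    a = fromiter_1d(x*x for x in xrange(10))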
From aisaac at american.edu Wed Apr 5 14:01:01 2006 From: aisaac at american.edu (Alan G Isaac) Date: Wed Apr 5 14:01:01 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4433EC3C.9050706@ftw.at> References: <4432E27E.6030906@ee.byu.edu><4433EC3C.9050706@ftw.at> Message-ID: On Wed, 05 Apr 2006, Ed Schofield apparently wrote: > you mention on the site that you'll print and bind > a hard-copy version once your sales reach 200 copies. > I think this would help to encourage libraries and > conservative institutions to purchase copies. Unfortunately, my library falls in this category. They were uncertain how to enforce the copyright with an electronic copy. (They are still thinking about it, last I heard.) Cheers, Alan Isaac From rahul.kanwar at gmail.com Wed Apr 5 16:25:01 2006 From: rahul.kanwar at gmail.com (Rahul Kanwar) Date: Wed Apr 5 16:25:01 2006 Subject: [Numpy-discussion] Numpy on 64 bit Xeon with ifort and mkl Message-ID: <63dec5bf0604051624k70c565baw70347a2fd571c253@mail.gmail.com> Hello, I am trying to compile Numpy on a 64 bit Xeon with ifort and the mkl libraries, running Suse 10.0 linux. I had set the MKLROOT variable to the mkl library root but it couldn't find the 64 bit library. After a little bit of snooping I found the following in numpy/distutils/cpuinfo.py ------------------------------ def _is_XEON(self): return re.match(r'.*?XEON\b', self.info[0]['model name']) is not None _is_Xeon = _is_XEON ------------------------------ I changed XEON to Xeon and it worked and was able to identify the em64t libraries. But it again got stuck with the following message.
I used the following command to build Numpy python setup.py config_fc --fcompiler=intel install ------------------------------ building 'numpy.core._dotblas' extension compiling C sources gcc options: '-pthread -fno-strict-aliasing -DNDEBUG -O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -g -fPIC' compile options: '-Inumpy/core/blasdot -I/opt/intel/mkl/8.0.2/include -Inumpy/core/include -Ibuild/src/numpy/core -Inumpy/core/src -Inumpy/core/include -I/usr/include/python2.4 -c' gcc -pthread -shared build/temp.linux-x86_64-2.4/numpy/core/blasdot/_dotblas.o -L/opt/intel/mkl/8.0.2/lib/em64t -lmkl_em64t -lmkl -lvml -lguide -lpthread -o build/lib.linux-x86_64-2.4/numpy/core/_dotblas.so /usr/lib64/gcc/x86_64-suse-linux/4.0.2/../../../../x86_64-suse-linux/bin/ld: /opt/intel/mkl/8.0.2/lib/em64t/libmkl_em64t.a(def_cgemm_omp.o): relocation R_X86_64_PC32 against `_mkl_blas_def_cgemm_276__par_loop0' can not be used when making a shared object; recompile with -fPIC /usr/lib64/gcc/x86_64-suse-linux/4.0.2/../../../../x86_64-suse-linux/bin/ld: final link failed: Bad value collect2: ld returned 1 exit status /usr/lib64/gcc/x86_64-suse-linux/4.0.2/../../../../x86_64-suse-linux/bin/ld: /opt/intel/mkl/8.0.2/lib/em64t/libmkl_em64t.a(def_cgemm_omp.o): relocation R_X86_64_PC32 against `_mkl_blas_def_cgemm_276__par_loop0' can not be used when making a shared object; recompile with -fPIC /usr/lib64/gcc/x86_64-suse-linux/4.0.2/../../../../x86_64-suse-linux/bin/ld: final link failed: Bad value collect2: ld returned 1 exit status error: Command "gcc -pthread -shared build/temp.linux-x86_64-2.4/numpy/core/blasdot/_dotblas.o -L/opt/intel/mkl/8.0.2/lib/em64t -lmkl_em64t -lmkl -lvml -lguide -lpthread -o build/lib.linux-x86_64-2.4/numpy/core/_dotblas.so" failed with exit status 1 ---------------------------------------------- I successfully compiled it without the -lmkl_em64t flag, but when I import numpy in python it gives an error that some symbol is missing. I think that maybe if I use ifort as the linker instead of gcc then things will work out properly, but I couldn't find how to change the linker to ifort. Anyone there who can help me with this problem? regards, Rahul From robert.kern at gmail.com Wed Apr 5 17:17:04 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed Apr 5 17:17:04 2006 Subject: [Numpy-discussion] Re: Numpy on 64 bit Xeon with ifort and mkl In-Reply-To: <63dec5bf0604051624k70c565baw70347a2fd571c253@mail.gmail.com> References: <63dec5bf0604051624k70c565baw70347a2fd571c253@mail.gmail.com> Message-ID: Rahul Kanwar wrote: > I successfully compiled it without the -lmkl_em64t flag, but when I import > numpy in python it gives an error that some symbol is missing. I think > that maybe if I use ifort as the linker instead of gcc then things > will work out properly, but I couldn't find how to change the linker > to ifort. Anyone there who can help me with this problem? It's not likely that using ifort to link will help. The problem is this bit: > /opt/intel/mkl/8.0.2/lib/em64t/libmkl_em64t.a(def_cgemm_omp.o): > relocation R_X86_64_PC32 against `_mkl_blas_def_cgemm_276__par_loop0' > can not be used when making a shared object; recompile with -fPIC You are linking against static libraries which were not compiled to be "position independent;" that is, they can't be used in shared libraries, which are what Python extension modules are.
C.f.: http://en.wikipedia.org/wiki/Position_independent_code Look around in /opt/intel/; they've almost certainly provided shared library versions of the MKL that could be used. Google gives me these, for example: http://www.intel.com/support/performancetools/libraries/mkl/linux/sb/cs-017267.htm http://www.intel.com/software/products/mkl/docs/mklgs_lnx.htm#Linking_Your_Application_with_Intel_MKL -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From ryanlists at gmail.com Wed Apr 5 19:50:07 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Wed Apr 5 19:50:07 2006 Subject: [Numpy-discussion] eye(N,dtype='S10') Message-ID: I am trying to create a function that can return a matrix that is either made up of complex numbers or strings depending on the input. I have created a symbolic string class to help me with that and it works well. One clumsy part is that in several cases I want to create an identity matrix and just replace a couple of elements. I currently have to do this in two steps: In [27]: mymat=numpy.eye(4,dtype='f') In [28]: mymat.astype('S10') Out[28]: array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]], dtype=(string,10)) I create a floating point matrix in the string case rather than a complex matrix so I don't have to parse the +0.0j stuff. But what I would really like is to be able to just create either a complex matrix or a string matrix at the beginning. But trying numpy.eye(4,dtype='S10') produces array([[True, False, False, False], [False, True, False, False], [False, False, True, False], [False, False, False, True]], dtype=(string,10)) rather than array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]], dtype=(string,10)) I need 1's and 0's rather than True and False because when I am done, I put the string representation into an input script to Maxima and Maxima wouldn't handle the True and False values well. Is there a way to directly create an identity string matrix with '1' and '0' instead of True and False? Thanks, Ryan
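A workaround sketch for Ryan's question (untested against the numpy of this era; numpy.where and the astype chain are assumed to behave as documented): go through an integer representation so the string conversion yields '1' and '0' rather than 'True' and 'False'.

    import numpy
    # route 1: float identity -> integer -> length-10 strings
    mymat = numpy.eye(4).astype(numpy.int32).astype('S10')
    # route 2: build the strings directly from the boolean pattern
    mymat = numpy.where(numpy.eye(4) != 0, '1', '0')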
From arnd.baecker at web.de Wed Apr 5 23:51:03 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Wed Apr 5 23:51:03 2006 Subject: [Numpy-discussion] Converting from Numeric (was: Speed up function on cross product of two sets?) In-Reply-To: References: <44315633.4010600@cox.net> Message-ID: Moin Moin, On Wed, 5 Apr 2006, Pearu Peterson wrote: > On Wed, 5 Apr 2006, Arnd Baecker wrote: > > > BTW, it seems that we have no Numeric to numpy transition remarks in > > www.scipy.org. I only found > > http://www.scipy.org/PearuPeterson/NumpyVersusNumeric > > and of course Travis' "Guide to NumPy" contains a detailed list of > > necessary changes in chapter 2.6.1. > > In addition ``site-packages/numpy/lib/convertcode.py`` provides an > > automatic conversion. > > > > Would it be helpful to start a new wiki page "ConvertingFromNumeric" > > (similar to http://www.scipy.org/Converting_from_numarray) > > which aims at summarizing the necessary changes > > or expand Pearu's page (if he agrees) on this? > > It's better to start a new wiki page similar to Converting_from_numarray > (I like the table). Based on the above links I have set up a first draft at http://www.scipy.org/Converting_from_Numeric It is surely not complete and there are a couple of things which have to be checked for correctness (I tried out some, but not all ...). Also some remarks on using the new features of numpy (e.g., use array indexing instead of take and put...) might be useful. > Btw, I have a few notes about the necessary changes for > Numeric->numpy transition in the following page: > > http://svn.enthought.com/enthought/wiki/NumpyPort#NotesonchangesduetoreplacingNumeric/scipy_basewithnumpy > > Feel free to grab these notes. Great - thanks, I tried to incorporate them as well. Best, Arnd From cimrman3 at ntc.zcu.cz Thu Apr 6 01:48:05 2006 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu Apr 6 01:48:05 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432E27E.6030906@ee.byu.edu> References: <4432E27E.6030906@ee.byu.edu> Message-ID: <4434D58D.2010505@ntc.zcu.cz> Travis Oliphant wrote: > > I received a rather hurtful email today that was very discouraging to me > ... Coming late on line, I can just +1 to all the support and appreciation you have received so far! r. From oliphant.travis at ieee.org Thu Apr 6 01:54:01 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 6 01:54:01 2006 Subject: [Numpy-discussion] Speed up function on cross product of two sets? In-Reply-To: References: <44315633.4010600@cox.net> Message-ID: <4434D6DF.2020306@ieee.org> Arnd Baecker wrote: > BTW, it seems that we have no Numeric to numpy transition remarks in > www.scipy.org. I only found > http://www.scipy.org/PearuPeterson/NumpyVersusNumeric > and of course Travis' "Guide to NumPy" contains a detailed list of > necessary changes in chapter 2.6.1. > For clarification: this is in the sample chapter available on-line to all.... > In addition ``site-packages/numpy/lib/convertcode.py`` provides an > automatic conversion. > > Would it be helpful to start a new wiki page "ConvertingFromNumeric" > (similar to http://www.scipy.org/Converting_from_numarray) > which aims at summarizing the necessary changes > or expand Pearu's page (if he agrees) on this? > Absolutely. I did the Numarray page because I'd written a lot on Converting from Numeric (even providing convertcode.py) but very little for numarray --- except the ndimage conversion. So, I started the Numarray page. Sounds like a great idea to have a dual page. -Travis From oliphant.travis at ieee.org Thu Apr 6 02:21:02 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 6 02:21:02 2006 Subject: [Numpy-discussion] array constructor from generators? In-Reply-To: References: <44331200.2020604@cox.net> Message-ID: <4434DD42.8010205@ieee.org> > Can you steal its memory and then give it some dummy memory that it > can free without problems, so that the list can be deallocated without > trouble? Does anyone know if you can just give the list a NULL pointer > for its memory and then immediately decref it? free(NULL) should > always be safe, I think. (??) > I don't think you can steal a list's memory since each list element is actually a pointer to some other Python object. However, a Python array's memory could be stolen (as Tim mentions later). > This is a good point. Numpy does fine with nested lists, but what > should it do with nested generators? I originally thought that > basically 'array(generator)' should make the exact same thing as > 'array([f for f in generator])'.
However, for nested generators, this > would be an object array of generators. > > I'm not sure which is better -- having more special cases for > generators that make generators, or having a simple rubric like above > for how generators are treated. I like the idea that generators of generators act the same as lists of lists (i.e. recursively defined). Basically to implement this, we need to repeat Array_FromSequence discover_depth discover_dimensions discover_itemsize Or, just maybe we can figure out a way to enhance those functions so that creating an array from generators works the same as creating an array from sequences. Right now, the sequence interface is used. Perhaps we could figure out a way to use a more abstract interface which would include both generators and sequences. If that causes too much alteration then I don't think it's worth it and we just repeat those functions for generators. Now, I think there are two cases here that are being discussed as one 1) Creating arrays from iterators --- array(iter(xrange(10))) 2) Creating arrays from generators --- array(x for x in xrange(10)) Both of these cases really ought to be handled and really should be integrated into the Array_FromSequence code. That code is inherited from Numeric and was written before iterators and generators arose on the scene. There ought to be a way to unify all of these notions (Actually if you handle iterators, then sequences will come along for the ride since sequences can behave as iterators). I'd rather see one place in the code that handles these cases. But, working code is usually better than dreamy plans :-) -Travis From oliphant.travis at ieee.org Thu Apr 6 02:38:04 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 6 02:38:04 2006 Subject: [Numpy-discussion] A random.normal function with stdev as array In-Reply-To: <20060405155736.GA9364@localhost.localdomain> References: <4433C6D6.5080800@obs.univ-lyon1.fr> <20060405155736.GA9364@localhost.localdomain> Message-ID: <4434E13B.4000702@ieee.org> John Byrnes wrote: > Hi Eric, > > In the past, I've done things like > > ###### > normdist = lambda x: numpy.random.normal(0,x) > vecnormal = numpy.vectorize(normdist) > > stdev = numpy.array([1.1,1.2,1.0,2.2]) > result = vecnormal(stdev) > > ###### > > This works fine for up to 10k elements for stdev for some reason. > Any larger than that and I get a Bus error on my PPC mac and a segfault on > my x86 linux box. > > This needs to be tracked down. It looks like some kind of error is not being caught correctly. You should not get a segfault. Could you provide a stack-trace when the problem occurs? One issue is that vectorize is using object arrays under the covers, which is consuming roughly 2x the memory you may expect. An object array is created and the function is called for every element. This object array is then converted to a number type after the fact. The segfault should be tracked down in any case. -Travis
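A vectorize-free sketch of John's computation (not from the thread; it relies only on broadcasting and the identity N(0, s^2) = s * N(0, 1)):

    import numpy
    stdev = numpy.array([1.1, 1.2, 1.0, 2.2])
    # draw unit normals of the right shape, then scale elementwise
    result = numpy.random.standard_normal(stdev.shape) * stdev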
From pau.gargallo at gmail.com Thu Apr 6 02:44:03 2006 From: pau.gargallo at gmail.com (Pau Gargallo) Date: Thu Apr 6 02:44:03 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: References: <4432E27E.6030906@ee.byu.edu> <4432E973.8070601@noaa.gov> <4432F4DD.6060000@cox.net> <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu> <4433F1F6.4010603@noaa.gov> Message-ID: <6ef8f3380604060243u2f54efc3r2baba94688c5d0af@mail.gmail.com> On 4/5/06, Perry Greenfield wrote: > > On Apr 5, 2006, at 12:36 PM, Christopher Barker wrote: > > > Zachary Pincus wrote: > >> from Numeric (who was used to the large, free manual) > > > > Which brings up a question: Is the source to the old Numeric manual > > available? it would be nice to "port" it to SciPy. > > Sort of. The original source was in Framemaker format. It was converted > to the Python latex framework in the process of being adapted to > numarray. The source for that is available on the numarray repository. > If you want the framemaker source, I may be able to dig that up > somewhere (or I may have lost track of it :-). Paul Dubois can likely > provide it as well; that's who gave me the source. > > Perry > +1 to any support for Travis Oliphant. Your work is really helping us. I am quite ignorant about licences and copyright things, so I would like to know: 1.- Is it OK to just copy the old Numeric documentation to the wiki and use it as a starting point for a more complete and updated doc? 2.- Would that be fine for the authors? I guess it will be very useful to everyone (especially beginners) to have an extended version of this documentation where there are many examples of use for every function. The wiki seems a very efficient way to build such a thing. It will take some time to manually copy-paste everything to the wiki, but it is doable. What do you think? pau From oliphant.travis at ieee.org Thu Apr 6 02:46:02 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 6 02:46:02 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist In-Reply-To: References: <4433DF85.7030109@gmail.com> Message-ID: <4434E31B.5030306@ieee.org> Robert Kern wrote: > You have a byteorder issue. Your Linux box, which I presume has an Intel or AMD > CPU, is little-endian, while your OS X box, which I presume has a PPC CPU, is > big-endian. numpy arrays can store their data in either endianness on either > kind of platform; their dtype objects tell you which byteorder they are using. > > In [54]: c.sort() > > In [55]: c > Out[55]: array([ 0., 2., 3., 4., 5., 6., 7., 8., 9., 10., 1.]) > > > This is a bug. > > http://projects.scipy.org/scipy/numpy/ticket/47 > Good catch. This bug was due to an oversight when adding the new sorting functions. The case of byte-swapped data was not handled. Judicious use of copyswap on the buffer fixed it. But, this brings up the point that currently the pickled raw data which is read in as a string by Python is used as the memory for the new array (i.e. the string memory is "stolen"). This should work. The fact that it didn't with sort was a bug that is now fixed in SVN. However, operations on out-of-byte-order arrays will always be slower. Thus, perhaps on pickle read the data should be copied to native byte-order if necessary. Opinions? -Travis
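What the proposed normalization could look like at the Python level (a sketch only -- the real change would live in the C unpickling path; dtype.isnative and dtype.newbyteorder are standard numpy API):

    import numpy

    def to_native(a):
        # copy to native byte order only when necessary
        if a.dtype.isnative:
            return a
        return a.astype(a.dtype.newbyteorder('='))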
From benjamin at decideur.info Thu Apr 6 03:23:09 2006 From: benjamin at decideur.info (Benjamin Thyreau) Date: Thu Apr 6 03:23:09 2006 Subject: [Numpy-discussion] Recarray and shared datas Message-ID: <200604061020.k36AKIsQ018238@decideur.info> Hi, Numpy has a nice recarray feature, i.e. records which can hold column names. I'd like to use such a feature in order to better interact with R, i.e. passing R data to Python without a copy. The current rpy bindings do a full copy, and convert to a plain ndarray. Looking at the recarray api in the Guide, and also at the source code, I don't find any recarray constructor which can take shared data (all the examples from section 8.6 are doing copies). Is there some way to do it, in Python or in C? Or are there any plans to? Thanks for the info -- Benjamin Thyreau CEA/SHFJ Orsay From oliphant.travis at ieee.org Thu Apr 6 03:40:05 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 6 03:40:05 2006 Subject: [Numpy-discussion] Newbie indexing question and print order In-Reply-To: <44338DF4.7050603@gmail.com> References: <44338DF4.7050603@gmail.com> Message-ID: <4434E522.3060101@ieee.org> amcmorl wrote: > Hi all, > > I'm having a bit of trouble getting my head around numpy's indexing > capabilities. A quick summary of the problem is that I want to > lookup/index in nD from a second array of rank n+1, such that the last > (or first, I guess) dimension contains the lookup co-ordinates for the > value to extract from the first array. Here's a 2D (3,3) example: > > In [12]:print ar > [[ 0.15 0.75 0.2 ] > [ 0.82 0.5 0.77] > [ 0.21 0.91 0.59]] > > In [24]:print inds > [[[1 1] > [1 1] > [2 1]] > > [[2 2] > [0 0] > [1 0]] > > [[1 1] > [0 0] > [2 1]]] > > then somehow return the array (barring me making any row/column errors): > In [26]: c = ar.somefancyindexingroutinehere(inds) > You can do this with "fancy-indexing". Obviously it is going to take some time for people to get used to this idea as none of the responses yet suggest it. But the following works. c = ar[inds[...,0],inds[...,1]] gives the desired effect. Thus, your simple description c[x,y] = ar[inds[x,y,0],inds[x,y,1]] is a text-book description of what fancy-indexing does. Best regards, -Travis > In [26]:print c > [[ 0.5 0.5 0.91] > [ 0.59 0.15 0.82] > [ 0.5 0.15 0.91]] > > i.e. c[x,y] = a[ inds[x,y,0], inds[x,y,1] ] > > Any suggestions? It looks like it should be relatively simple using > 'put' or 'take' or 'fetch' or 'sit' or something like that, but I'm not > getting it. > > While I'm here, can someone help me understand the rationale behind > 'print' printing row, column (i.e. a[0,1] = 0.75 in the above example) > rather than x, y (=column, row; in which case 0.75 would be in the first > column and second row), which seems to me to be more intuitive. > > I'm really enjoying getting into numpy - I can see it'll be > simpler/faster coding than my previous environments, despite me not > knowing my way at the moment, and that python has better opportunities > for extensibility. So, many thanks for your great work.
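For concreteness, the recipe applied to the example arrays from the original post (an illustrative session, not part of the thread):

    >>> import numpy
    >>> ar = numpy.array([[0.15, 0.75, 0.2],
    ...                   [0.82, 0.5, 0.77],
    ...                   [0.21, 0.91, 0.59]])
    >>> inds = numpy.array([[[1, 1], [1, 1], [2, 1]],
    ...                     [[2, 2], [0, 0], [1, 0]],
    ...                     [[1, 1], [0, 0], [2, 1]]])
    >>> ar[inds[..., 0], inds[..., 1]]
    array([[ 0.5 ,  0.5 ,  0.91],
           [ 0.59,  0.15,  0.82],
           [ 0.5 ,  0.15,  0.91]])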
From faltet at carabos.com Thu Apr 6 03:44:02 2006 From: faltet at carabos.com (Francesc Altet) Date: Thu Apr 6 03:44:02 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist In-Reply-To: <4434E31B.5030306@ieee.org> References: <4433DF85.7030109@gmail.com> <4434E31B.5030306@ieee.org> Message-ID: <200604061243.48122.faltet@carabos.com> On Thursday 06 April 2006 11:44, Travis Oliphant wrote: > But, this brings up the point that currently the pickled raw data which > is read in as a string by Python is used as the memory for the new array > (i.e. the string memory is "stolen"). This should work. The fact > that it didn't with sort was a bug that is now fixed in SVN. However, > operations on out-of-byte-order arrays will always be slower. Thus, > perhaps on pickle read the data should be copied to native byte-order if > necessary. Yes, I think that converting directly to native byteorder at unpickling time would be the best. Cheers! -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. "Enjoy Data" "-" From a.u.r.e.l.i.a.n at gmx.net Thu Apr 6 04:16:11 2006 From: a.u.r.e.l.i.a.n at gmx.net (Johannes Loehnert) Date: Thu Apr 6 04:16:11 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist In-Reply-To: <200604061243.48122.faltet@carabos.com> References: <4433DF85.7030109@gmail.com> <4434E31B.5030306@ieee.org> <200604061243.48122.faltet@carabos.com> Message-ID: <200604061315.23340.a.u.r.e.l.i.a.n@gmx.net> Hi, > > But, this brings up the point that currently the pickled raw data which > > is read in as a string by Python is used as the memory for the new array > > (i.e. the string memory is "stolen"). This should work. The fact > > that it didn't with sort was a bug that is now fixed in SVN. However, > > operations on out-of-byte-order arrays will always be slower. Thus, > > perhaps on pickle read the data should be copied to native byte-order if > > necessary. > > Yes, I think that converting directly to native byteorder at > unpickling time would be the best. If you stored your data in the wrong byte order for some odd reason (maybe you use a library that requires a certain byte order), then you would want pickle to deliver the data back exactly as stored. I think this should be made a user option in some way, although I do not know a good place for it right now. Johannes From cimrman3 at ntc.zcu.cz Thu Apr 6 05:16:07 2006 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu Apr 6 05:16:07 2006 Subject: [Numpy-discussion] site.cfg.example In-Reply-To: <4435020B.9040705@iam.uni-stuttgart.de> References: <44280161.4030708@ntc.zcu.cz> <442808AF.6090006@ftw.at> <44280C20.8000003@ntc.zcu.cz> <44297152.9000305@ftw.at> <442A698C.9000104@ntc.zcu.cz> <442A7E78.1030901@ftw.at> <442A86D2.20902@ntc.zcu.cz> <442A9A67.8050106@ftw.at> <442A9F8D.906@ntc.zcu.cz> <443253D4.90806@iam.uni-stuttgart.de> <4434D699.5030102@ntc.zcu.cz> <4434D8D3.7050200@iam.uni-stuttgart.de> <4434FC6B.3000905@ntc.zcu.cz> <4435020B.9040705@iam.uni-stuttgart.de> Message-ID: <44350672.4020008@ntc.zcu.cz> I have added numpy/site.cfg.example to the SVN. It should contain a list of all possible sections and relevant fields, so that a (new) user sees what can be configured and then just copies the file to numpy/site.cfg, removes the unwanted sections and edits the wanted ones. If you think it is a good idea and have a section that is not present or properly described, contribute it, please :-) When/if the file grows, we can put it on the Wiki. cheers, r.
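For readers who have not seen such a file, the entries follow numpy.distutils conventions and might look roughly like this (the section and field names below are standard, but the paths are made up):

    [atlas]
    library_dirs = /usr/local/lib/atlas
    atlas_libs = lapack, f77blas, cblas, atlas

    [fftw]
    libraries = fftw3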
From tim.hochberg at cox.net Thu Apr 6 08:39:00 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Apr 6 08:39:00 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist In-Reply-To: <200604061315.23340.a.u.r.e.l.i.a.n@gmx.net> References: <4433DF85.7030109@gmail.com> <4434E31B.5030306@ieee.org> <200604061243.48122.faltet@carabos.com> <200604061315.23340.a.u.r.e.l.i.a.n@gmx.net> Message-ID: <44353646.6010009@cox.net> Johannes Loehnert wrote: >Hi, > > >>>But, this brings up the point that currently the pickled raw data which >>>is read in as a string by Python is used as the memory for the new array >>>(i.e. the string memory is "stolen"). This should work. The fact >>>that it didn't with sort was a bug that is now fixed in SVN. However, >>>operations on out-of-byte-order arrays will always be slower. Thus, >>>perhaps on pickle read the data should be copied to native byte-order if >>>necessary. >>> >>> >>Yes, I think that converting directly to native byteorder at >>unpickling time would be the best. >> >> > >If you stored your data in the wrong byte order for some odd reason (maybe you use >a library that requires a certain byte order), then you would want pickle to >deliver the data back exactly as stored. I think this should be made a user >option in some way, although I do not know a good place for it right now. > > If this is really something we want to do, it seems that the "correct" solution is to have a different dtype when an object defaults to a given byte order than when it is forced to that byte order. Pickle could keep track of that and do the right thing on loading. For example, an array whose byte order was set explicitly could carry a different dtype than one that merely defaults to the machine's order. From tim.hochberg at cox.net Thu Apr 6 2006 From: tim.hochberg at cox.net (Tim Hochberg) Subject: [Numpy-discussion] Re: array constructor from generators? In-Reply-To: <4434DD42.8010205@ieee.org> References: <44331200.2020604@cox.net> <4434DD42.8010205@ieee.org> Message-ID: <44353880.2040406@cox.net> Travis Oliphant wrote: > >> Can you steal its memory and then give it some dummy memory that it >> can free without problems, so that the list can be deallocated >> without trouble? Does anyone know if you can just give the list a >> NULL pointer for its memory and then immediately decref it? >> free(NULL) should always be safe, I think. (??) >> > I don't think you can steal a list's memory since each list element is > actually a pointer to some other Python object. > However, a Python array's memory could be stolen (as Tim mentions later). > >> This is a good point. Numpy does fine with nested lists, but what >> should it do with nested generators? I originally thought that >> basically 'array(generator)' should make the exact same thing as >> 'array([f for f in generator])'. However, for nested generators, this >> would be an object array of generators. >> >> I'm not sure which is better -- having more special cases for >> generators that make generators, or having a simple rubric like above >> for how generators are treated. > > I like the idea that generators of generators act the same as lists > of lists (i.e. recursively defined). Basically to implement this, we > need to repeat > > Array_FromSequence > discover_depth > discover_dimensions > discover_itemsize > > Or, just maybe we can figure out a way to enhance those functions so > that creating an array from generators works the same as creating an > array from sequences. Right now, the sequence interface is used. > Perhaps we could figure out a way to use a more abstract interface > which would include both generators and sequences. If that causes too > much alteration then I don't think it's worth it and we just repeat > those functions for generators.
> Now, I think there are two cases here that are being discussed as one > > 1) Creating arrays from iterators --- array(iter(xrange(10))) > 2) Creating arrays from generators --- array(x for x in xrange(10)) > > Both of these cases really ought to be handled and really should be > integrated into the Array_FromSequence code. That code is inherited > from Numeric and was written before iterators and generators arose on > the scene. There ought to be a way to unify all of these notions > (Actually if you handle iterators, then sequences will come along for > the ride since sequences can behave as iterators). > I'd rather see one place in the code that handles these cases. But, > working code is usually better than dreamy plans :-) I agree with all of this. However, there's one specific case that I think we should optimize the heck out of. In fact, I'd be tempted as a first cut to only implement this case and raise exceptions in the other cases until we get around to implementing them. This one case is: * dtype known * 1-dimensional I care about this case because it's common and we can do it efficiently. In the other cases I could write a python function that does almost as good a job as we're likely to do in C, both in terms of speed and memory usage. So the known dtype, 1D case adds important functionality while the other "merely" adds convenience (and consistency). Those are good, but personally the added functionality is higher on my priority list. -tim From byrnes at bu.edu Thu Apr 6 09:15:25 2006 From: byrnes at bu.edu (John Byrnes) Date: Thu Apr 6 09:15:25 2006 Subject: [Numpy-discussion] A random.normal function with stdev as array In-Reply-To: <4434E13B.4000702@ieee.org> References: <4433C6D6.5080800@obs.univ-lyon1.fr> <20060405155736.GA9364@localhost.localdomain> <4434E13B.4000702@ieee.org> Message-ID: <20060406161450.GA18606@localhost.localdomain> On Thu, Apr 06, 2006 at 03:36:59AM -0600, Travis Oliphant wrote: > John Byrnes wrote: > >Hi Eric, > > > >In the past, I've done things like > > > >###### > >normdist = lambda x: numpy.random.normal(0,x) > >vecnormal = numpy.vectorize(normdist) > > > >stdev = numpy.array([1.1,1.2,1.0,2.2]) > >result = vecnormal(stdev) > > > >###### > > > >This works fine for up to 10k elements for stdev for some reason. > >Any larger than that and I get a Bus error on my PPC mac and a segfault on > >my x86 linux box. > > > > > > This needs to be tracked down. It looks like some kind of error is not > being caught correctly. You should not get a segfault. Could you > provide a stack-trace when the problem occurs? > > One issue is that vectorize is using object arrays under the covers, > which is consuming roughly 2x the memory you may expect. An > object array is created and the function is called for every element. > This object array is then converted to a number type after the fact. > > The segfault should be tracked down in any case. > > -Travis > > > Hi Travis, Here is a backtrace from gdb on my mac. John #0 0x00470b88 in log1pl () #1 0x00000000 in ?? 
() Cannot access memory at address 0x0 Cannot access memory at address 0x0 #2 0x004708ec in log1pl () #3 0x1000c348 in PyObject_Call (func=0x4, arg=0x4, kw=0x15fb) at /Users/bob/src/Python-2.4.1/Objects/abstract.c:1751 #4 0x1007ce34 in ext_do_call (func=0x1, pp_stack=0xbfffed90, flags=211904, na=8656012, nk=1194304) at /Users/bob/src/Python-2.4.1/Python/ceval.c:3824 #5 0x1007a230 in PyEval_EvalFrame (f=0x848410) at /Users/bob/src/Python-2.4.1/Python/ceval.c:2203 #6 0x1007b284 in PyEval_EvalCodeEx (co=0x2, globals=0x4, locals=0x1, args=0x3, argcount=1049072, kws=0x841150, kwcount=1, defs=0x8411fc, defcount=0, closure=0x0) at /Users/bob/src/Python-2.4.1/Python/ceval.c:2730 #7 0x10026274 in function_call (func=0x880bb0, arg=0x1001f0, kw=0x848410) at /Users/bob/src/Python-2.4.1/Objects/funcobject.c:548 #8 0x1000c348 in PyObject_Call (func=0x4, arg=0x4, kw=0x15fb) at /Users/bob/src/Python-2.4.1/Objects/abstract.c:1751 #9 0x10015a88 in instancemethod_call (func=0x52eef0, arg=0x54a170, kw=0x0) at /Users/bob/src/Python-2.4.1/Objects/classobject.c:2431 #10 0x1000c348 in PyObject_Call (func=0x4, arg=0x4, kw=0x15fb) at /Users/bob/src/Python-2.4.1/Objects/abstract.c:1751 #11 0x10059358 in slot_tp_call (self=0x53e4f0, args=0x5b310, kwds=0x0) at /Users/bob/src/Python-2.4.1/Objects/typeobject.c:4526 #12 0x1000c348 in PyObject_Call (func=0x4, arg=0x4, kw=0x15fb) at /Users/bob/src/Python-2.4.1/Objects/abstract.c:1751 #13 0x1007c9e4 in do_call (func=0x53e4f0, pp_stack=0x53e4f0, na=0, nk=8655844) at /Users/bob/src/Python-2.4.1/Python/ceval.c:3755 #14 0x1007c6dc in call_function (pp_stack=0x0, oparg=4) at /Users/bob/src/Python-2.4.1/Python/ceval.c:3570 #15 0x1007a140 in PyEval_EvalFrame (f=0x10e200) at /Users/bob/src/Python-2.4.1/Python/ceval.c:2163 #16 0x1007c83c in fast_function (func=0x4, pp_stack=0x10e360, n=268927488, na=268755664, nk=1) at /Users/bob/src/Python-2.4.1/Python/ceval.c:3629 #17 0x1007c6c4 in call_function (pp_stack=0xbffff5bc, oparg=4) at /Users/bob/src/Python-2.4.1/Python/ceval.c:3568 #18 0x1007a140 in PyEval_EvalFrame (f=0x10e030) at /Users/bob/src/Python-2.4.1/Python/ceval.c:2163 #19 0x1007b284 in PyEval_EvalCodeEx (co=0x0, globals=0x4, locals=0x1, args=0x10078200, argcount=1049072, kws=0x841150, kwcount=1, defs=0x8411fc, defcount=0, closure=0x0) at /Users/bob/src/Python-2.4.1/Python/ceval.c:2730 #20 0x1007e678 in PyEval_EvalCode (co=0x4, globals=0x4, locals=0x15fb) at /Users/bob/src/Python-2.4.1/Python/ceval.c:484 #21 0x100b2ee0 in run_node (n=0x10078200, filename=0x4
, globals=0x0, locals=0x10e180, flags=0x2) at /Users/bob/src/Python-2.4.1/Python/pythonrun.c:1265 #22 0x100b23b0 in PyRun_InteractiveOneFlags (fp=0x54a1a5, filename=0x56ca0 "", flags=0x10e030) at /Users/bob/src/Python-2.4.1/Python/pythonrun.c:762 #23 0x100b2190 in PyRun_InteractiveLoopFlags (fp=0x56b94, filename=0xd440 "", flags=0x100f21b8) at /Users/bob/src/Python-2.4.1/Python/pythonrun.c:695 #24 0x100b3bb0 in PyRun_AnyFileExFlags (fp=0xa0001554, filename=0x100f36ac "", closeit=0, flags=0xbffff934) at /Users/bob/src/Python-2.4.1/Python/pythonrun.c:658 #25 0x100bf640 in Py_Main (argc=269413412, argv=0x20000000) at /Users/bob/src/Python-2.4.1/Modules/main.c:484 #26 0x000018d0 in start () #27 0x8fe1a278 in __dyld__dyld_start () From ndarray at mac.com Thu Apr 6 12:42:17 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 6 12:42:17 2006 Subject: [Numpy-discussion] What is diagonal for nd>2? Message-ID: It looks like the definition of the diagonal changed somewhere between Numeric 24.0 and numpy: In Numeric: >>> x = Numeric.arange(2*4*4) >>> x = Numeric.reshape(x, (2, 4, 4)) >>> x array([[[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15]], [[16, 17, 18, 19], [20, 21, 22, 23], [24, 25, 26, 27], [28, 29, 30, 31]]]) >>> Numeric.diagonal(x) array([[ 0, 5, 10, 15], [16, 21, 26, 31]]) But in numpy: >>> import numpy as Numeric >>> x = Numeric.arange(2*4*4) >>> x = Numeric.reshape(x, (2, 4, 4)) >>> Numeric.diagonal(x) array([[ 0, 20], [ 1, 21], [ 2, 22], [ 3, 23]]) The old logic seems to be clear: x is a pair of matrices and diagonal returns a pair of diagonals, but the new logic seems unclear: the diagonal returns the first rows of the two matrices, transposed. Does anyone know when this change was introduced and why? From pgmdevlist at mailcan.com Thu Apr 6 13:51:04 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Thu Apr 6 13:51:04 2006 Subject: [Numpy-discussion] What is diagonal for nd>2? In-Reply-To: References: Message-ID: <200604061652.30764.pgmdevlist@mailcan.com> > Does anyone know when this change was introduced and why? Isn't it more a problem of default values? By default, x.diagonal() == x.diagonal(0,0,1) x.diagonal() array([[ 0, 20], [ 1, 21], [ 2, 22], [ 3, 23]]) If you want the paired diagonal: x.diagonal(0,1,-1) array([[ 0, 5, 10, 15], [16, 21, 26, 31]]) From ndarray at mac.com Thu Apr 6 14:46:10 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 6 14:46:10 2006 Subject: [Numpy-discussion] What is diagonal for nd>2? In-Reply-To: <200604061652.30764.pgmdevlist@mailcan.com> References: <200604061652.30764.pgmdevlist@mailcan.com> Message-ID: I see. However, something needs to be changed. In the current version help(diagonal) prints the following: {{{ Help on function diagonal in module numpy.core.oldnumeric: diagonal(a, offset=0, axis1=0, axis2=1) diagonal(a, offset=0, axis1=0, axis2=1) returns the given diagonals defined by the last two dimensions of the array. }}} I would think axes 0 and 1 are the first, not the last two dimensions. We can either change the documentation or change the defaults in oldnumeric. I would vote for the change in defaults because oldnumeric is a compatibility module and should not introduce changes. In addition, the fact that the reduced axes become the first (rather than the last or one of the axis1 and axis2) dimension should be spelled out in the docstring. On 4/6/06, Pierre GM wrote: > > > Does anyone know when this change was introduced and why?
> Isn't it more a problem of default values? > By default, x.diagonal() == x.diagonal(0,0,1) > > x.diagonal() > array([[ 0, 20], > [ 1, 21], > [ 2, 22], > [ 3, 23]]) > > If you want the paired diagonal: > x.diagonal(0,1,-1) > array([[ 0, 5, 10, 15], > [16, 21, 26, 31]]) > From Chris.Barker at noaa.gov Thu Apr 6 14:59:03 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu Apr 6 14:59:03 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist In-Reply-To: <4434E31B.5030306@ieee.org> References: <4433DF85.7030109@gmail.com> <4434E31B.5030306@ieee.org> Message-ID: <44358EEA.4080609@noaa.gov> Travis Oliphant wrote: > Thus, perhaps on pickle read the data should be copied to native byte-order if > necessary. +1 Those that are working with non-native byte order on purpose presumably know what they are doing, and can check and swap as necessary -- or use tofile and fromfile, which I presume don't do any byteswapping for you. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at mailcan.com Thu Apr 6 15:01:03 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Thu Apr 6 15:01:03 2006 Subject: [Numpy-discussion] What is diagonal for nd>2? In-Reply-To: References: <200604061652.30764.pgmdevlist@mailcan.com> Message-ID: <200604061802.20457.pgmdevlist@mailcan.com> > I would think axes 0 and 1 are the first, not the last two dimensions. We > can either change the documentation or change the defaults in > oldnumeric. I would vote for the change in defaults because oldnumeric is > a compatibility module and should not introduce changes. So, change the default to: diagonal(a, offset=0, axis1=-2, axis2=-1) ? That'd make sense, I'm for that... From ndarray at mac.com Thu Apr 6 16:11:01 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 6 16:11:01 2006 Subject: [Numpy-discussion] New patch for MA In-Reply-To: <200603280427.52789.pgmdevlist@mailcan.com> References: <200603280427.52789.pgmdevlist@mailcan.com> Message-ID: I have applied the patch with minor modifications. See <http://projects.scipy.org/scipy/numpy/changeset/2331>. Here are a few suggestions for posting patches. 1. If you are using svn, please post the output of "svn diff" in the project root directory (the directory that *contains* "numpy", not the "numpy" directory). 2. If appropriate, add unit tests to an existing file instead of creating a new one. (In the case of ma, the correct file is test_ma.py.) 3. If you follow recommendation #1, this will happen automatically; if you cannot use svn for some reason, concatenate the output of diff for code and tests in the same patch file. Here are some topics for discussion. 1. I've initially implemented some ma array methods by wrapping existing module level functions. I am not sure this is the best approach to implement new methods. It is probably cleaner to implement them as methods and provide wrappers at the module level similar to oldnumeric. 2. I am not sure cumprod and cumsum should fill masked elements with 1 and 0. I would think the result should be masked if any prior element along the axis being accumulated is masked. To ignore masked elements, filled can be called explicitly before cum[prod|sum].
One of the problems with filling by default is that 1 or 0 are not appropriate values for object arrays (for example, "" is an appropriate fill value for cumsum of an array of strings). On 3/28/06, Pierre GM wrote: > > Folks, > You can find a new patch for MA on the wiki > > http://projects.scipy.org/scipy/numpy/attachment/wiki/MaskedArray/ma-200603280900.patch > along with a test suite. > The 'clip' method should now work with array arguments. Also added were > cumsum, cumprod, std, var and squeeze. > I'll deal with flags, setflags, setfield, dump and others when I have a > better idea of how it works -- which probably won't happen anytime soon, > as I > don't really have time to dig into the code for these functions. AAMOF, I'm > more interested in checking/patching some other aspects of numpy for MA > (eg, > mlab...) > Once again, please send me your comments and suggestions. > Thx for everything > P. From michael.sorich at gmail.com Thu Apr 6 17:41:19 2006 From: michael.sorich at gmail.com (Michael Sorich) Date: Thu Apr 6 17:41:19 2006 Subject: [Numpy-discussion] New patch for MA In-Reply-To: References: <200603280427.52789.pgmdevlist@mailcan.com> Message-ID: <16761e100604061733r586cca6cr94d72c554b54fdd0@mail.gmail.com> On 4/7/06, Sasha wrote: > > > 2. I am not sure cumprod and cumsum should fill masked elements with 1 and > 0. I would think the result should be masked if any prior element along the > axis being accumulated is masked. To ignore masked elements, filled can be > called explicitly before cum[prod|sum]. One of the problems with filling by > default is that 1 or 0 are not appropriate values for object arrays (for > example, "" is an appropriate fill value for cumsum of an array of strings). > > There are often a number of options for how masked values can be dealt with. In general (not just with cum*), I would prefer for the result to be masked when masked values are involved unless I explicitly indicate what should be done with the masked values. Otherwise it is too easy to forget that some default manipulation of masked values has been applied. In R there is commonly an na.action or na.rm parameter to functions. Michael From pgmdevlist at mailcan.com Thu Apr 6 19:19:02 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Thu Apr 6 19:19:02 2006 Subject: [Numpy-discussion] New patch for MA In-Reply-To: References: <200603280427.52789.pgmdevlist@mailcan.com> Message-ID: <200604062218.05876.pgmdevlist@mailcan.com> Sasha, Thanks for your advice with SVN. I'll make sure to use that method from now on. > 1. I've initially implemented some ma array methods by wrapping existing > module level functions. I am not sure this is the best approach to > implement new methods. 
It is probably cleaner to implement them as methods > and provide wrappers at the module level similar to oldnumeric. Well, I tried to stick to the latest convention, getting rid of the _wrapit part. Let me know. > > 2. I am not sure cumprod and cumsum should fill masked elements with 1 and > 0. Good point for the object/string arrays, yet other cases I overlooked (I'm still not used to object arrays, I'm now realizing they're quite useful). Actually, I coded that way because it's how I use these functions. But well, as many settings as users, eh? Michael's suggestion of introducing R-like options sounds interesting, but I wonder whether it would not be a bit heavy for methods, with the introduction of an extra flag. That'd be great for functions, though. So, for cumsum and cumprod methods, maybe we could stick to Sasha's and Michael's preference (mask all values after the first missing), and we would just have to create two functions. We could use the 4 R ones: na.omit, na.fail, na.pass, na.exclude. For our current problem (cumsum,cumprod) na.omit: would return the result I implemented (fill with 0 or 1) na.fail: would return masked values after the first missing na.exclude: would correspond to compressed().cumsum() ? I don't like that, it changes the initial length/size na.pass: I don't know... From ndarray at mac.com Thu Apr 6 21:14:01 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 6 21:14:01 2006 Subject: [Numpy-discussion] New patch for MA In-Reply-To: <16761e100604061733r586cca6cr94d72c554b54fdd0@mail.gmail.com> References: <200603280427.52789.pgmdevlist@mailcan.com> <16761e100604061733r586cca6cr94d72c554b54fdd0@mail.gmail.com> Message-ID: On 4/6/06, Michael Sorich wrote: > ... I would prefer for the result to be masked > when masked values are involved unless I explicitly indicate what should be > done with the masked values. ... This is the case in r2332: >>> from numpy.core.ma import * >>> print array([1,2,3], mask=[0,1,0]).cumsum() [1 -- --] From a.mcmorland at auckland.ac.nz Fri Apr 7 00:30:07 2006 From: a.mcmorland at auckland.ac.nz (Angus McMorland) Date: Fri Apr 7 00:30:07 2006 Subject: [Numpy-discussion] Newbie indexing question [fancy indexing in nD] In-Reply-To: <4434E522.3060101@ieee.org> References: <44338DF4.7050603@gmail.com> <4434E522.3060101@ieee.org> Message-ID: <4435F672.1040701@auckland.ac.nz> Hi again. Thanks, everyone, for your quick replies. Travis Oliphant wrote: > amcmorl wrote: > >> Hi all, >> >> I'm having a bit of trouble getting my head around numpy's indexing >> capabilities. A quick summary of the problem is that I want to >> lookup/index in nD from a second array of rank n+1, such that the last >> (or first, I guess) dimension contains the lookup co-ordinates for the >> value to extract from the first array. Here's a 2D (3,3) example: >> >> In [12]:print ar >> [[ 0.15 0.75 0.2 ] >> [ 0.82 0.5 0.77] >> [ 0.21 0.91 0.59]] >> >> In [24]:print inds >> [[[1 1] >> [1 1] >> [2 1]] >> >> [[2 2] >> [0 0] >> [1 0]] >> >> [[1 1] >> [0 0] >> [2 1]]] >> >> then somehow return the array (barring me making any row/column errors): >> In [26]: c = ar.somefancyindexingroutinehere(inds) > > You can do this with "fancy-indexing". Obviously it is going to take > some time for people to get used to this idea as none of the responses > yet suggest it. > But the following works. > c = ar[inds[...,0],inds[...,1]] > > gives the desired effect. > > Thus, your simple description c[x,y] = ar[inds[x,y,0],inds[x,y,1]] is a > text-book description of what fancy-indexing does. 
Great. Turns out I wasn't too far off then. I've written a quick function of my own that extends the fancy indexing to nD: def fancy_index_nd(ar, ind): evList = ['ar['] for i in range(len(ar.shape)): evList = evList + [' ind[...,%d]' % i] if i < len(ar.shape) - 1: evList = evList + [","] evList = evList + [' ]'] return eval(''.join(evList)) 1) Am I missing a simpler way to extend the fancy-indexing to n-dimensions? If not... 2) it seems (conceptually) that this might be a little faster than the routines that have to calculate a flat index. Hopefully it could be of use to people. Any thoughts? Cheers, Angus -- Angus McMorland email a.mcmorland at auckland.ac.nz mobile +64-21-155-4906 PhD Student, Neurophysiology / Multiphoton & Confocal Imaging Physiology, University of Auckland phone +64-9-3737-599 x89707 Armourer, Auckland University Fencing Secretary, Fencing North Inc. From pau.gargallo at gmail.com Fri Apr 7 02:37:05 2006 From: pau.gargallo at gmail.com (Pau Gargallo) Date: Fri Apr 7 02:37:05 2006 Subject: [Numpy-discussion] Newbie indexing question [fancy indexing in nD] In-Reply-To: <4435F672.1040701@auckland.ac.nz> References: <44338DF4.7050603@gmail.com> <4434E522.3060101@ieee.org> <4435F672.1040701@auckland.ac.nz> Message-ID: <6ef8f3380604070236m2d606983l82403cbc2305fefa@mail.gmail.com> you can do things like a[ list( ind[...,i] for i in range(ind.shape[-1]) ) ] if the indices could be accessed as ind[i] instead of ind[...,i] (transposing the indices array) then you could simply do: a[ list(ind) ] pau On 4/7/06, Angus McMorland wrote: > Hi again. > > Thanks, everyone, for your quick replies. > > Travis Oliphant wrote: > > amcmorl wrote: > > > >> Hi all, > >> > >> I'm having a bit of trouble getting my head around numpy's indexing > >> capabilities. A quick summary of the problem is that I want to > >> lookup/index in nD from a second array of rank n+1, such that the last > >> (or first, I guess) dimension contains the lookup co-ordinates for the > >> value to extract from the first array. Here's a 2D (3,3) example: > >> > >> In [12]:print ar > >> [[ 0.15 0.75 0.2 ] > >> [ 0.82 0.5 0.77] > >> [ 0.21 0.91 0.59]] > >> > >> In [24]:print inds > >> [[[1 1] > >> [1 1] > >> [2 1]] > >> > >> [[2 2] > >> [0 0] > >> [1 0]] > >> > >> [[1 1] > >> [0 0] > >> [2 1]]] > >> > >> then somehow return the array (barring me making any row/column errors): > >> In [26]: c = ar.somefancyindexingroutinehere(inds) > > > > You can do this with "fancy-indexing". Obviously it is going to take > > some time for people to get used to this idea as none of the responses > > yet suggest it. > > But the following works. > > c = ar[inds[...,0],inds[...,1]] > > > > gives the desired effect. > > > > Thus, your simple description c[x,y] = ar[inds[x,y,0],inds[x,y,1]] is a > > text-book description of what fancy-indexing does. > > Great. Turns out I wasn't too far off then. I've written a quick > function of my own that extends the fancy indexing to nD: > > def fancy_index_nd(ar, ind): > evList = ['ar['] > for i in range(len(ar.shape)): > evList = evList + [' ind[...,%d]' % i] > if i < len(ar.shape) - 1: > evList = evList + [","] > evList = evList + [' ]'] > return eval(''.join(evList)) > > 1) Am I missing a simpler way to extend the fancy-indexing to > n-dimensions? If not... > > 2) it seems (conceptually) that this might be a little faster than the > routines that have to calculate a flat index. Hopefully it could be of > use to people. Any thoughts? > > Cheers, > > Angus > -- > Angus McMorland > email a.mcmorland at auckland.ac.nz > mobile +64-21-155-4906 > > PhD Student, Neurophysiology / Multiphoton & Confocal Imaging > Physiology, University of Auckland > phone +64-9-3737-599 x89707 > > Armourer, Auckland University Fencing > Secretary, Fencing North Inc.
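A tuple-based variant of the same idea (a sketch, not from the thread): numpy treats a tuple of index arrays exactly like the comma-separated form, so the eval() in Angus's function can be avoided.

    import numpy

    def fancy_index_nd(ar, ind):
        # one index array per dimension, taken from the last axis of ind
        return ar[tuple(ind[..., i] for i in range(ind.shape[-1]))]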
From a.h.jaffe at gmail.com Fri Apr 7 06:54:09 2006 From: a.h.jaffe at gmail.com (Andrew Jaffe) Date: Fri Apr 7 06:54:09 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist In-Reply-To: <4434E31B.5030306@ieee.org> References: <4433DF85.7030109@gmail.com> <4434E31B.5030306@ieee.org> Message-ID: <44366E71.7060601@gmail.com> Travis Oliphant wrote: > But, this brings up the point that currently the pickled raw data which > is read in as a string by Python is used as the memory for the new array > (i.e. the string memory is "stolen"). This should work. The fact > that it didn't with sort was a bug that is now fixed in SVN. However, > operations on out-of-byte-order arrays will always be slower. Thus, > perhaps on pickle read the data should be copied to native byte-order if > necessary. +1 from me, too. I assume that byteswapping is fast compared to I/O in most cases, and the only times when you wouldn't want it would be 'advanced' usage that the developer could take control of via a custom reduce, __getstate__, __setstate__, etc. Andrew ______________________________________________________________________ Andrew Jaffe a.jaffe at imperial.ac.uk Astrophysics Group +44 207 594-7526 Blackett Laboratory, Room 1013 FAX 7541 Imperial College, Prince Consort Road London SW7 2AZ ENGLAND http://astro.imperial.ac.uk/~jaffe From ndarray at mac.com Fri Apr 7 10:26:06 2006 From: ndarray at mac.com (Sasha) Date: Fri Apr 7 10:26:06 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: Message-ID: I am posting a reply to my own post in the hope of generating some discussion of the original proposal. I am proposing to add a "filled" method to ndarray. This can be a pass-through, an alias to "copy" or a method to replace nans or some other type-specific values. This will allow code that uses "filled" to work on ndarrays without changes. On 3/22/06, Sasha wrote: > > In an ideal world, any function that accepts ndarray would accept > ma.array and vice versa. Moreover, if the ma.array has no masked
> Moreover, if the ma.array has no masked
> elements and the same data as ndarray, the result should be the same.
> Obviously current implementation falls short of this goal, but there
> is one feature that seems to make this goal unachievable.
>
> This feature is the "filled" method of ma.array. Pydoc for this
> method reports the following:
>
> | filled(self, fill_value=None)
> | A numeric array with masked values filled. If fill_value is None,
> | use self.fill_value().
> |
> | If mask is nomask, copy data only if not contiguous.
> | Result is always a contiguous, numeric array.
> | # Is contiguous really necessary now?
>
> That is not the best possible description ("filled" is "filled"), but
> the essence is that the result of a.filled(value) is a contiguous
> ndarray obtained from the masked array by copying non-masked elements
> and using value for masked values.
>
> I would like to propose to add a "filled" method to ndarray. I see
> several possibilities and would like to hear your opinion:
>
> 1. Make filled simply return self.
>
> 2. Make filled return a contiguous copy.
>
> 3. Make filled replace nans with the fill_value if array is of
> floating point type.
>
> Unfortunately, adding "filled" will result in a rather confusing
> situation where "fill" and "filled" both exist and have very different
> meanings.
>
> I would like to note that "fill" is a somewhat odd ndarray method.
> AFAICT, it is the only non-special method that mutates the array. It
> appears to be just a performance trick: the same result can be achieved
> with "a[...] = ".
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From webb.sprague at gmail.com Fri Apr 7 10:38:03 2006
From: webb.sprague at gmail.com (Webb Sprague)
Date: Fri Apr 7 10:38:03 2006
Subject: [Numpy-discussion] Tiling / disk storage for matrix in numpy?
Message-ID: 

Hi all,

Is there a way in numpy to associate a (large) matrix with a disk file,
then tile and index it, then cache it as you process the various pieces?
This is pretty important with massive image files, which can't fit into
working memory, but in which (for example) you might be doing a
convolution on a 100 x 100 pixel window on a small subset of the image.

I know that caching algorithms are (1) complicated and (2) never
general. But there you go. Perhaps I can't find it, perhaps it would be
a good project for the future? If HDF or something does this already,
could someone point me in the right direction?

Thx
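Webb's question is, in effect, a memory-mapped array plus a chunk cache.
numpy's memmap covers the disk-backed-array half; a minimal sketch,
assuming a reasonably current numpy (the file name and shapes here are
made up, and true chunked/tiled storage would still need something like
HDF5 via PyTables on top):

import numpy as np

# Create (or reopen with mode='r+') a disk-backed float32 image.
img = np.memmap('image.dat', dtype=np.float32, mode='w+',
                shape=(2000, 2000))

# Slicing touches only the pages that back this 100 x 100 window,
# so the full image never has to fit in working memory.
window = img[500:600, 500:600]
print(window.mean())

img.flush()   # push any modified pages back to the file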
From tim.hochberg at cox.net Fri Apr 7 11:22:05 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Fri Apr 7 11:22:05 2006
Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled
In-Reply-To: 
References: 
Message-ID: <4436AE31.7000306@cox.net>

Sasha wrote:

> I am posting a reply to my own post in a hope to generate some
> discussion of the original proposal.
>
> I am proposing to add a "filled" method to ndarray. This can be a
> pass-through, an alias to "copy" or a method to replace nans or some
> other type-specific values. This will allow code that uses "filled"
> to work on ndarrays without changes.

In general, I'm skeptical of adding more methods to the ndarray object
-- there are plenty already. In addition, it appears that both the
method and function versions of filled are "dangerous" in the sense that
they sometimes return the array itself and sometimes a copy. Finally,
changing ndarray to support masked array feels a bit like the tail
wagging the dog.

Let me throw out an alternative proposal. I will admit up front that
this proposal is based on exactly zero experience with masked array, so
there may be some stupidities in it, but perhaps it will lead to an
alternative solution.

def asUnmaskedArray(obj, fill_value=None):
    mask = getattr(obj, 'mask', False)  # no mask attribute -> plain array
    if mask is False:
        return obj
    if fill_value is None:
        fill_value = obj.get_fill_value()
    newobj = obj.data().copy()
    newobj[mask] = fill_value
    return newobj

Or something like that anyway. This particular version should work on
any array as long as, if it exports a mask attribute, it also exports
get_fill_value and data. At least once any bugs are ironed out, I
haven't tested it.

ma would have to be modified to use this instead of using filled
everywhere, but that seems more appropriate than tacking on another
method to ndarray IMO. One advantage of this approach is that most
array-like objects that don't subclass ndarray will work with this
automagically. If we keep expanding the methods of ndarray, it's harder
and harder to implement other array-like objects since they have to
implement more and more methods, most of which are irrelevant to their
particular case. The more we can implement stuff like this in terms of
some relatively small set of core primitives, the happier we'll all be
in the long run.

This also builds on the idea of trying to push as much of the array/view
ambiguity into the asXXXArray corner.

Regards,

-tim

>
> On 3/22/06, *Sasha* > wrote:
>
>     In an ideal world, any function that accepts ndarray would accept
>     ma.array and vice versa. Moreover, if the ma.array has no masked
>     elements and the same data as ndarray, the result should be the same.
>     Obviously current implementation falls short of this goal, but there
>     is one feature that seems to make this goal unachievable.
>
>     This feature is the "filled" method of ma.array. Pydoc for this
>     method reports the following:
>
>     | filled(self, fill_value=None)
>     | A numeric array with masked values filled. If fill_value is
>     None,
>     | use self.fill_value().
>     |
>     | If mask is nomask, copy data only if not contiguous.
>     | Result is always a contiguous, numeric array.
>     | # Is contiguous really necessary now?
>
>     That is not the best possible description ("filled" is "filled"), but
>     the essence is that the result of a.filled(value) is a contiguous
>     ndarray obtained from the masked array by copying non-masked elements
>     and using value for masked values.
>
>     I would like to propose to add a "filled" method to ndarray. I see
>     several possibilities and would like to hear your opinion:
>
>     1. Make filled simply return self.
>
>     2. Make filled return a contiguous copy.
>
>     3. Make filled replace nans with the fill_value if array is of
>     floating point type.
>
>     Unfortunately, adding "filled" will result in a rather confusing
>     situation where "fill" and "filled" both exist and have very different
>     meanings.
>
>     I would like to note that "fill" is a somewhat odd ndarray method.
>     AFAICT, it is the only non-special method that mutates the array. It
>     appears to be just a performance trick: the same result can be
>     achieved
>     with "a[...] = ".
>
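A quick exercise of the asUnmaskedArray sketch above, restated so the
snippet is self-contained; the Toy class is hypothetical, standing in
for anything that exports the mask/get_fill_value/data protocol Tim
describes (none of these names are real numpy API):

import numpy as np

def asUnmaskedArray(obj, fill_value=None):   # restated from above
    mask = getattr(obj, 'mask', False)
    if mask is False:
        return obj
    if fill_value is None:
        fill_value = obj.get_fill_value()
    newobj = obj.data().copy()
    newobj[mask] = fill_value
    return newobj

class Toy:
    def __init__(self, data, mask, fill):
        self.mask = mask          # boolean array: True = masked
        self._data = data
        self._fill = fill
    def get_fill_value(self):
        return self._fill
    def data(self):
        return self._data

t = Toy(np.array([1.0, 2.0, 3.0]), np.array([False, True, False]), -1.0)
print(asUnmaskedArray(t))             # -> [ 1. -1.  3.]
print(asUnmaskedArray(np.arange(3)))  # plain ndarrays pass straight through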
From ndarray at mac.com Fri Apr 7 12:20:15 2006
From: ndarray at mac.com (Sasha)
Date: Fri Apr 7 12:20:15 2006
Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled
In-Reply-To: <4436AE31.7000306@cox.net>
References: <4436AE31.7000306@cox.net>
Message-ID: 

On 4/7/06, Tim Hochberg wrote:
>
> ...
> In general, I'm skeptical of adding more methods to the ndarray object
> -- there are plenty already.

I've also proposed to drop "fill" in favor of optimizing x[...] = .
Having both "fill" and "filled" in the interface is plain awkward.
You may like the combined proposal better because it does not change
the total number of methods :-)

> In addition, it appears that both the method and function versions of
> filled are "dangerous" in the sense that they sometimes return the array
> itself and sometimes a copy.

This is true in ma, but may certainly be changed.

> Finally, changing ndarray to support masked array feels a bit like the
> tail wagging the dog.

I disagree. Numpy is pretty much alone among the array languages because
it does not have "native" support for missing values. For the floating
point types some rudimentary support for nans exists, but is not really
usable. There is no missing values mechanism for integer types. I
believe adding "filled" and maybe "mask" to ndarray (not necessarily
under these names) could be a meaningful step towards "native" support
for missing values.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From webb.sprague at gmail.com Fri Apr 7 12:36:00 2006
From: webb.sprague at gmail.com (Webb Sprague)
Date: Fri Apr 7 12:36:00 2006
Subject: [Numpy-discussion] Silly array question
Message-ID: 

In R, if you have an Nx2 array of integers, you can use that to index
a TxS array, yielding a 1xN result. Is there a way to do that in
numpy? I looked for a pairs function but I couldn't find it, vaguely
remembering that might be around... I know it would be a trivial loop
to write, but a numpy array function would be faster (I hope).

Example

I = [[0,0], [1,1], [2,2], [1,1]]
M = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9,10,11, 12],
     [13, 14, 15, 16]]

M[I] = [1,6,11,6].

Thanks!

From ndarray at mac.com Fri Apr 7 12:53:03 2006
From: ndarray at mac.com (Sasha)
Date: Fri Apr 7 12:53:03 2006
Subject: [Numpy-discussion] Silly array question
In-Reply-To: 
References: 
Message-ID: 

>>> M.ravel()[dot(I,(4,1))]
array([ 1, 6, 11, 6])

On 4/7/06, Webb Sprague wrote:
>
> In R, if you have an Nx2 array of integers, you can use that to index
> a TxS array, yielding a 1xN result. Is there a way to do that in
> numpy? I looked for a pairs function but I couldn't find it, vaguely
> remembering that might be around... I know it would be a trivial loop
> to write, but a numpy array function would be faster (I hope).
>
> Example
>
> I = [[0,0], [1,1], [2,2], [1,1]]
> M = [[1, 2, 3, 4],
>      [5, 6, 7, 8],
>      [9,10,11, 12],
>      [13, 14, 15, 16]]
>
> M[I] = [1,6,11,6].
>
> Thanks!
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by xPML, a groundbreaking scripting
> language
> that extends applications into web and mobile media. Attend the live
> webcast
> and join the prime developer group breaking into this new coding
> territory!
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From efiring at hawaii.edu Fri Apr 7 13:22:06 2006
From: efiring at hawaii.edu (Eric Firing)
Date: Fri Apr 7 13:22:06 2006
Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled
In-Reply-To: 
References: <4436AE31.7000306@cox.net>
Message-ID: <4436C965.8020808@hawaii.edu>

Sasha wrote:
>
>
> On 4/7/06, *Tim Hochberg* > wrote:
>
> ...
> In general, I'm skeptical of adding more methods to the ndarray object > -- there are plenty already. > > > I've also proposed to drop "fill" in favor of optimizing x[...] = > . Having both "fill" and "filled" in the interface is plain > awkward. You may like the combined proposal better because it does not > change the total number of methods :-) > > > In addition, it appears that both the method and function versions of > filled are "dangerous" in the sense that they sometimes return the > array > itself and sometimes a copy. > > > This is true in ma, but may certainly be changed. > > > Finally, changing ndarray to support masked array feels a bit like the > tail wagging the dog. > > > I disagree. Numpy is pretty much alone among the array languages because > it does not have "native" support for missing values. For the floating > point types some rudimental support for nans exists, but is not really > usable. There is no missing values machanism for integer types. I > believe adding "filled" and maybe "mask" to ndarray (not necessarily > under these names) could be a meaningful step towards "native" support > for missing values. I agree strongly with you, Sasha. I get the impression that the world of numerical computation is divided into those who work with idealized "data", where nothing is missing, and those who work with real observations, where there is always something missing. As an oceanographer, I am solidly in the latter category. If good support for missing values is not built in, it has to be bolted on, and it becomes clunky and awkward. I was reluctant to speak up about this earlier because I thought it was too much to ask of Travis when he was in the midst of putting numpy on solid ground. But I am delighted that missing value support has a champion among numpy developers, and I agree that now is the time to change it from "bolted on" to "integrated". Eric From Chris.Barker at noaa.gov Fri Apr 7 13:28:02 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri Apr 7 13:28:02 2006 Subject: [Numpy-discussion] Silly array question In-Reply-To: References: Message-ID: <4436CB1C.3040308@noaa.gov> Webb Sprague wrote: > In R, if you have an Nx2 array of integers, you can use that to index > an TxS array, yielding a 1xN result. this seems to work: >>> import numpy as N >>> I = N.array([[0,0], [1,1], [2,2], [1,1]]) >>> I array([[0, 0], [1, 1], [2, 2], [1, 1]]) >>> M = N. array( [[1, 2, 3, 4], [5, 6, 7, 8], [9,10,11, 12], [13, 14, 15, 16]]) >>> M array([[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12], [13, 14, 15, 16]]) >>> M[I[:,0], I[:,1]] array([ 1, 6, 11, 6]) -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From ndarray at mac.com Fri Apr 7 13:56:02 2006 From: ndarray at mac.com (Sasha) Date: Fri Apr 7 13:56:02 2006 Subject: [Numpy-discussion] Silly array question In-Reply-To: <4436CB1C.3040308@noaa.gov> References: <4436CB1C.3040308@noaa.gov> Message-ID: One more obfuscated numpy entry: >>> M[tuple(transpose(I))] array([ 1, 6, 11, 6]) On 4/7/06, Christopher Barker wrote: > > > > Webb Sprague wrote: > > In R, if you have an Nx2 array of integers, you can use that to index > > an TxS array, yielding a 1xN result. > > this seems to work: > > >>> import numpy as N > >>> I = N.array([[0,0], [1,1], [2,2], [1,1]]) > >>> I > array([[0, 0], > [1, 1], > [2, 2], > [1, 1]]) > > >>> M = N. 
array( [[1, 2, 3, 4], [5, 6, 7, 8], [9,10,11, 12], [13, 14, > 15, 16]]) > > >>> M > array([[ 1, 2, 3, 4], > [ 5, 6, 7, 8], > [ 9, 10, 11, 12], > [13, 14, 15, 16]]) > > >>> M[I[:,0], I[:,1]] > array([ 1, 6, 11, 6]) > > -- > Christopher Barker, Ph.D. > Oceanographer > > NOAA/OR&R/HAZMAT (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > that extends applications into web and mobile media. Attend the live > webcast > and join the prime developer group breaking into this new coding > territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From webb.sprague at gmail.com Fri Apr 7 14:00:10 2006 From: webb.sprague at gmail.com (Webb Sprague) Date: Fri Apr 7 14:00:10 2006 Subject: [Numpy-discussion] Silly array question In-Reply-To: References: <4436CB1C.3040308@noaa.gov> Message-ID: I appreciate everyone's help, but is there a NON obfuscated way to do this without looping? I think Chris's is my favorite, but I didn't know I was starting a contest :) > >>> M[I[:,0], I[:,1]] > array([ 1, 6, 11, 6]) W From webb.sprague at gmail.com Fri Apr 7 14:05:04 2006 From: webb.sprague at gmail.com (Webb Sprague) Date: Fri Apr 7 14:05:04 2006 Subject: [Numpy-discussion] Silly array question In-Reply-To: References: <4436CB1C.3040308@noaa.gov> Message-ID: Ok, so now I get it M[(tuple for rows), (tuple for columns)] Whew On 4/7/06, Webb Sprague wrote: > I appreciate everyone's help, but is there a NON obfuscated way to do > this without looping? I think Chris's is my favorite, but I didn't > know I was starting a contest :) > > > >>> M[I[:,0], I[:,1]] > > array([ 1, 6, 11, 6]) > > W > From tim.hochberg at cox.net Fri Apr 7 14:16:06 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Fri Apr 7 14:16:06 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <4436C965.8020808@hawaii.edu> References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu> Message-ID: <4436D6D1.6040302@cox.net> Eric Firing wrote: > Sasha wrote: > >> >> >> On 4/7/06, *Tim Hochberg* > > wrote: >> >> ... >> In general, I'm skeptical of adding more methods to the ndarray >> object >> -- there are plenty already. >> >> >> I've also proposed to drop "fill" in favor of optimizing x[...] = >> . Having both "fill" and "filled" in the interface is plain >> awkward. You may like the combined proposal better because it does >> not change the total number of methods :-) >> >> >> In addition, it appears that both the method and function >> versions of >> filled are "dangerous" in the sense that they sometimes return the >> array >> itself and sometimes a copy. >> >> >> This is true in ma, but may certainly be changed. >> >> >> Finally, changing ndarray to support masked array feels a bit >> like the >> tail wagging the dog. >> >> I disagree. Numpy is pretty much alone among the array languages >> because it does not have "native" support for missing values. For >> the floating point types some rudimental support for nans exists, >> but is not really usable. 
>> There is no missing values mechanism for
>> integer types. I believe adding "filled" and maybe "mask" to ndarray
>> (not necessarily under these names) could be a meaningful step
>> towards "native" support for missing values.
>
>
> I agree strongly with you, Sasha. I get the impression that the world
> of numerical computation is divided into those who work with idealized
> "data", where nothing is missing, and those who work with real
> observations, where there is always something missing.

I think your experience is clouding your judgement here. Or at least
this comes off as unnecessarily pejorative. There's a large class of
people who work with data that doesn't have missing values either
because of the nature of data acquisition or because they're doing
simulations. I take zillions of measurements with digital oscilloscopes
and they *never* have missing values. Clipped values, yes, but even if I
somehow could query the scope about which values were actually clipped
or simply make an educated guess based on their value, the facilities of
ma would be useless to me. The clipped values are what I would want in
any case. I also do a lot of work with simulations derived from this
and other data. I don't come across missing values here but again, if I
did, the way ma works would not help me. I'd have to treat them either
by rejecting the data outright or by some sort of interpolation.

> As an oceanographer, I am solidly in the latter category. If good
> support for missing values is not built in, it has to be bolted on,
> and it becomes clunky and awkward.

This may be a false dichotomy. It's certainly not obvious to me that
this is so. At least if "bolted on" means "not adding a filled method to
ndarray".

> I was reluctant to speak up about this earlier because I thought it
> was too much to ask of Travis when he was in the midst of putting
> numpy on solid ground. But I am delighted that missing value support
> has a champion among numpy developers, and I agree that now is the
> time to change it from "bolted on" to "integrated".

I have no objection to ma support improving. In fact I think it would be
great although I don't foresee it helping me anytime soon. I also
support Sasha's goal of being able to mix MaskedArrays and ndarrays
reasonably seamlessly.

However, I do think the situation needs more thought. Slapping filled
and mask onto ndarray is the path of least resistance, but it's not
clear that it's the best one.

If we do decide we are going to add both of these methods to ndarray
(with filled returning a copy!), then it may be worth considering making
ndarray a subclass of MaskedArray. Conceptually this makes sense, since
at this point an ndarray will just be a MaskedArray where mask is always
False. I think that they could share much of the implementation except
that ndarray would be set up to use methods that ignored the mask
attribute since they would know that it's always false. Even that might
not be worth it, since the check for whether mask is True/False is just
a pointer compare.

It may in fact be best just to do away with MaskedArray entirely, moving
the functionality into ndarray. That may have performance implications,
although I don't see them at the moment, and I don't know if there are
other methods/attributes that this would imply need to be moved over,
although it looks like just mask, filled and possibly fill_value,
although the latter looks a little dubious to me.

Either of the above two options would certainly improve the quality of
MaskedArray.
Copy for instance seems not to have been implemented, and who knows what other dark corners remain unexplored here. There's a whole spectrum of possibilities here from ones that don't intrude on ndarray at all to ones that profoundly change it. Sasha's suggestion looks like it's probably the simplest thing in the short term, but I don't know that it's the best long term solution. I think it needs more thought and discussion, which is after all what Sasha asked for ;) Regards, -tim From Chris.Barker at noaa.gov Fri Apr 7 15:13:02 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri Apr 7 15:13:02 2006 Subject: [Numpy-discussion] Silly array question In-Reply-To: References: <4436CB1C.3040308@noaa.gov> Message-ID: <4436E3C9.2040807@noaa.gov> Sasha wrote: > One more obfuscated numpy entry: > >>>> M[tuple(transpose(I))] > array([ 1, 6, 11, 6]) exactly. Can anyone explain why that works, but: M[transpose(I)] or M[I] doesn't? -Chris - Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From efiring at hawaii.edu Fri Apr 7 15:37:03 2006 From: efiring at hawaii.edu (Eric Firing) Date: Fri Apr 7 15:37:03 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <4436D6D1.6040302@cox.net> References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> Message-ID: <4436E95B.4090009@hawaii.edu> Tim Hochberg wrote: > Eric Firing wrote: > >> Sasha wrote: >> >>> >>> >>> On 4/7/06, *Tim Hochberg* >> > wrote: >>> >>> ... >>> In general, I'm skeptical of adding more methods to the ndarray >>> object >>> -- there are plenty already. >>> >>> >>> I've also proposed to drop "fill" in favor of optimizing x[...] = >>> . Having both "fill" and "filled" in the interface is plain >>> awkward. You may like the combined proposal better because it does >>> not change the total number of methods :-) >>> >>> >>> In addition, it appears that both the method and function >>> versions of >>> filled are "dangerous" in the sense that they sometimes return the >>> array >>> itself and sometimes a copy. >>> >>> >>> This is true in ma, but may certainly be changed. >>> >>> >>> Finally, changing ndarray to support masked array feels a bit >>> like the >>> tail wagging the dog. >>> >>> I disagree. Numpy is pretty much alone among the array languages >>> because it does not have "native" support for missing values. For >>> the floating point types some rudimental support for nans exists, >>> but is not really usable. There is no missing values machanism for >>> integer types. I believe adding "filled" and maybe "mask" to ndarray >>> (not necessarily under these names) could be a meaningful step >>> towards "native" support for missing values. >> >> >> >> I agree strongly with you, Sasha. I get the impression that the world >> of numerical computation is divided into those who work with idealized >> "data", where nothing is missing, and those who work with real >> observations, where there is always something missing. > > > I think your experience is clouding your judgement here. Or at least > this comes off as unnecessarily perjorative. There's a large class of > people who work with data that doesn't have missing values either > because of the nature of data acquisition or because they're doing > simulations. I take zillions of measurements with digital oscillopscopes > and they *never* have missing values. 
Clipped values, yes, but even if I > somehow could queery the scope about which values were actually clipped > or simply make an educated guess based on their value, the facilities of > ma would be useless to me. The clipped values are what I would want in > any case. I also do a lot of work with simulations derived from this > and other data. I don't come across missing values here but again, if I > did, the way ma works would not help me. I'd have to treat them either > by rejecting the data outright or by some sort of interpolation. Tim, The point is well-taken, and I apologize. I stated my case badly. (I would be delighted if I did not have to be concerned with missing values-they are a pain regardless of how well a numerical package handles them.) > >> As an oceanographer, I am solidly in the latter category. If good >> support for missing values is not built in, it has to be bolted on, >> and it becomes clunky and awkward. > > > This may be a false dichotomy. It's certainly not obvious to me that > this is so. At least if "bolted on" means "not adding a filled method to > ndarray". I probably overstated it, but I think we actually agree. I intended to lend support to the priority of making missing-value support as seamless and painless as possible. It will help some people, and not others. > >> I was reluctant to speak up about this earlier because I thought it >> was too much to ask of Travis when he was in the midst of putting >> numpy on solid ground. But I am delighted that missing value support >> has a champion among numpy developers, and I agree that now is the >> time to change it from "bolted on" to "integrated". > > > > I have no objection to ma support improving. In fact I think it would be > great although I don't forsee it helping me anytime soon. I also support > Sasha's goal of being able to mix MaskedArrays and ndarrays reasonably > seemlessly. > > However, I do think the situation needs more thought. Slapping filled > and mask onto ndarray is the path of least resistance, but it's not > clear that it's the best one. > > If we do decide we are going to add both of these methods to ndarray > (with filled returning a copy!), then it may worth considering making > ndarray a subclass of MaskedArray. Conceptually this makes sense, since > at this point an ndarray will just be a MaskedArray where mask is always > False. I think that they could share much of the implementation except > that ndarray would be set up to use methods that ignored the mask > attribute since they would know that it's always false. Even that might > not be worth it, since the check for whether mask is True/False is just > a pointer compare. > > It may in fact be best just to do away with MaskedArray entirely, moving > the functionality into ndarray. That may have performance implications, > although I don't seem them at the moment, and I don't know if there are > other methods/attributes that this would imply need to be moved over, > although it looks like just mask, filled and possibly filled_value, > although the latter looks a little dubious to me. > This is exactly the option that I was afraid to bring up because I thought it might be too disruptive, and because I am not contributing to numpy, and probably don't have the competence (or time) to do so. > Either of the above two options would certainly improve the quality of > MaskedArray. Copy for instance seems not to have been implemented, and > who knows what other dark corners remain unexplored here. 
> > There's a whole spectrum of possibilities here from ones that don't > intrude on ndarray at all to ones that profoundly change it. Sasha's > suggestion looks like it's probably the simplest thing in the short > term, but I don't know that it's the best long term solution. I think it > needs more thought and discussion, which is after all what Sasha asked > for ;) Exactly! Thank you for broadening the discussion. Eric From ndarray at mac.com Fri Apr 7 15:38:04 2006 From: ndarray at mac.com (Sasha) Date: Fri Apr 7 15:38:04 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <4436D6D1.6040302@cox.net> References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> Message-ID: On 4/7/06, Tim Hochberg wrote: > [...] > > However, I do think the situation needs more thought. Slapping filled > and mask onto ndarray is the path of least resistance, but it's not > clear that it's the best one. Completely agree. I have many gripes about current ma implementation of both "filled" and "mask". filled: 1. I don't like default fill value. It should be mandatory to supply fill value. 2. It should return masked array (with trivial mask), not ndarray. 3. The name conflicts with the "fill" method. 4. View/Copy inconsistency. Does not provide a method to fill values in-place. mask: 1. I've got rid of mask returning None in favor of False_ (boolean array scalar), but it is still not perfect. I would prefer data.shape == mask.shape invariant and if space saving/performance is deemed necessary use zero-stride arrays. 2. I don't like the name. "Missing" or "na" would be better. > If we do decide we are going to add both of these methods to ndarray > (with filled returning a copy!), then it may worth considering making > ndarray a subclass of MaskedArray. Conceptually this makes sense, since > at this point an ndarray will just be a MaskedArray where mask is always > False. I think that they could share much of the implementation except > that ndarray would be set up to use methods that ignored the mask > attribute since they would know that it's always false. Even that might > not be worth it, since the check for whether mask is True/False is just > a pointer compare. > The tail becoming the dog! Yet I agree, this makes sense from the implementation point of view. From OOP perspective this would make sense if arrays were immutable, but since mask is settable in MaskedArray, making it constant in the subclass will violate the substitution principle. I would not object making mask read only, however. > It may in fact be best just to do away with MaskedArray entirely, moving > the functionality into ndarray. That may have performance implications, > although I don't seem them at the moment, and I don't know if there are > other methods/attributes that this would imply need to be moved over, > although it looks like just mask, filled and possibly filled_value, > although the latter looks a little dubious to me. > I think MA can coexist with ndarray and share the interface. Ndarray can use special bit-patterns like IEEE NaN to indicate missing floating point values. Add-on modules can redefine arithmetic to make INT_MIN behave as a missing marker for signed integers (R, K and J (I think) languages use this approach). Applications that need missing values support across the board will use MA. > Either of the above two options would certainly improve the quality of > MaskedArray. 
> Copy for instance seems not to have been implemented, and
> who knows what other dark corners remain unexplored here.
>
More (corners) than you want to know about! Reimplementing MA in C
would be a worthwhile goal (and what you suggest seems to require just
that), but it is too big of a project. I suggest that we focus on the
interface first. If existing MA interface is rejected (which is
likely) for ndarray, we can easily experiment with the alternatives
within MA, which is pure python.

> There's a whole spectrum of possibilities here from ones that don't
> intrude on ndarray at all to ones that profoundly change it. Sasha's
> suggestion looks like it's probably the simplest thing in the short
> term, but I don't know that it's the best long term solution. I think it
> needs more thought and discussion, which is after all what Sasha asked
> for ;)

Exactly!

From robert.kern at gmail.com Fri Apr 7 15:39:02 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Fri Apr 7 15:39:02 2006
Subject: [Numpy-discussion] Re: Silly array question
In-Reply-To: <4436E3C9.2040807@noaa.gov>
References: <4436CB1C.3040308@noaa.gov> <4436E3C9.2040807@noaa.gov>
Message-ID: 

Christopher Barker wrote:
> Sasha wrote:
>
>> One more obfuscated numpy entry:
>>
>>>>> M[tuple(transpose(I))]
>>
>> array([ 1, 6, 11, 6])
>
> exactly. Can anyone explain why that works, but:
>
> M[transpose(I)]
>
> or
> M[I]
>
> doesn't?

There's some typechecking going on in __getitem__. Tuples are presumed
to mean that each item in the tuple is indexing on a different axis.
Non-tuples are presumed to be fancy array-indexing.

--
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
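A short session illustrating Robert's point, assuming a current numpy
(M and I are the arrays from Webb's question):

import numpy as np

M = np.arange(1, 17).reshape(4, 4)
I = np.array([[0, 0], [1, 1], [2, 2], [1, 1]])

# tuple: one index array per axis -> per-pair element lookup
print(M[tuple(I.T)])    # [ 1  6 11  6]

# bare array: fancy indexing along the first axis -> rows
print(M[I].shape)       # (4, 2, 4)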
From pgmdevlist at mailcan.com Fri Apr 7 15:54:01 2006
From: pgmdevlist at mailcan.com (Pierre GM)
Date: Fri Apr 7 15:54:01 2006
Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled
In-Reply-To: <4436D6D1.6040302@cox.net>
References: <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net>
Message-ID: <200604071844.37724.pgmdevlist@mailcan.com>

Folks,
I'm more or less in Eric's field (hydrology), and we do have to deal
with missing values that we can't interpolate straightforwardly (that
is, without some dark statistical magic). Purely discarding the data is
not an option either. MA fills the need, most of it.

I think one of the issues is what is meant by 'masked data':
- a missing observation ?
- a NAN ?
- a data we don't want to consider at one particular point ?
For the last point, think about raster maps or bitmaps: calculations
should be performed on a chunk of data, the initial data left untouched,
and the result should both have the same size as the original, and be
valid only on the initial chunk. The current MA implementation, with its
_data part and its _mask part, works nicely for the 3rd point.

- I wonder whether implementing a 'filled' method for ndarrays is really
better than letting the user create a MaskedArray, where the NANs are
masked. In any case, a 'filled' method should always return a copy, as
it's no longer the initial data.

- I'm not sure what to do with the idea of making ndarray a subclass of
MA. On one side, Tim pointed rightly that a ndarray is just a MA with a
'False' mask. Actually, I'm a bit frustrated with the standard 'asarray'
that shows up in many functions. I'd prefer something like "if the
argument is a non-numpy sequence (tuples, lists), transform it into an
ndarray, but if it's already a ndarray or a MA, leave it as it is. Don't
touch the mask if present". That's how MA.asarray works, but
unfortunately the std "asarray" gets rid of the mask (and you end up
with something which is not what you'd expect). A 'mask=False' attribute
in ndarray would be nice. On another side, some methods/functions make
sense only on unmasked ndarray (FFT, solving equations), some others are
a bit tricky to implement (diff ? median...). Some exception could be
raised if the arguments of these functions return True with ismasked
(cf below), or that could be simplified if 'mask' was a default
attribute of ndarrays. I regularly have to use an ismasked function
(cf below).

def ismasked(a):
    if hasattr(a,'mask'):
        return a.mask.any()
    else:
        return False

We're going towards MA as the default object. But then again, what would
be the behavior to deal with missing values ? Using R-like na.actions ?
That'd be great, but it's getting more complex.
Oh, and another thing: if 'mask', or 'masked' becomes a default
attribute of ndarrays, how do we define a mask? As a boolean ndarray
whose 'mask' is always 'False' ? How do you __repr__ it ?

- I agree that 'fill_value' is not very useful. If I want to fill an
array, I'm happy to specify what value I want it filled with. In fact,
I'd be happier to specify 'values'. I often have to work with 2D arrays,
each column representing a different variable. If this array has to be
filled, I'd like each column to be filled with one particular value, not
necessarily the same along all columns: something like

column_stack([A[:,k].filled(filler[k]) for k in range(A.shape[1])])

with filler a 1xA.shape[1] array of filling values. Of course, we could
imagine the same thing for rows, or higher dimensions...

Sorry for the rants...
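The per-column fill Pierre describes can also be done in one shot by
broadcasting a row of fill values; a sketch on a plain array with NaNs
standing in for masked entries (A and filler are made-up names, assuming
a current numpy):

import numpy as np

A = np.array([[1.0, np.nan],
              [np.nan, 4.0]])
filler = np.array([-1.0, -2.0])    # one fill value per column

# filler broadcasts across rows, so column k gets filler[k]
filled = np.where(np.isnan(A), filler, A)
# [[ 1. -2.]
#  [-1.  4.]]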
From pgmdevlist at mailcan.com Fri Apr 7 16:13:02 2006
From: pgmdevlist at mailcan.com (Pierre GM)
Date: Fri Apr 7 16:13:02 2006
Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled
In-Reply-To: 
References: <4436D6D1.6040302@cox.net>
Message-ID: <200604071914.44752.pgmdevlist@mailcan.com>

> filled:
> 1. I don't like default fill value. It should be mandatory to
> supply fill value.

+1

> 2. It should return masked array (with trivial mask), not ndarray.

-1. Unless 'mask/missing/na' becomes a default in ndarray, and other
basic ndarray functions know how to deal with MA seamlessly

> 3. The name conflicts with the "fill" method.

fillmask ? clog ?

> 4. View/Copy inconsistency. Does not provide a method to fill values
> in-place.

But once again, I don't think it should be the default behaviour ! A
filled array should always be a copy of the initial array. Changing in
place means changing the initial data, and I foresee lots of fun trying
to find the original back. No ctrl+Z.

> mask:
>
> 1. I've got rid of mask returning None in favor of False_ (boolean
> array scalar), but it is still not perfect. I would prefer data.shape
> == mask.shape invariant and if space saving/performance is deemed
> necessary use zero-stride arrays.

You lost me on the strides, but I agree with data.shape==mask.shape as
a std

> 2. I don't like the name. "Missing" or "na" would be better.

Once again, it's a point of view. Masked data also means 'data that I
don't wanna see now, but that I may want to see later'. Like masking a
bitmap/raster area. +0 for na, no for missing.

> I would not object making mask read only, however.

Good point, but I was more and more thinking of the opposite. I have a
set of data that I group in three classes. Plotting one class is
straightforward, I just have to mask the other two. Do I really
want/need three objects for the same data ? Can't I just save three
masks, and then run a data[mask] ?

> If existing MA interface is rejected (which is
> likely) for ndarray, we can easily experiment with the alternatives
> within MA, which is pure python.

Er... How many of us are using MA on a regular basis ? Aren't we a
minority ? It'd seem wiser to adapt MA to numpy, in Python (but maybe
that's the 19th-century French integration model I grew up with that
makes me talk here...)

From ndarray at mac.com Fri Apr 7 16:31:03 2006
From: ndarray at mac.com (Sasha)
Date: Fri Apr 7 16:31:03 2006
Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled
In-Reply-To: <200604071844.37724.pgmdevlist@mailcan.com>
References: <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net>
	<200604071844.37724.pgmdevlist@mailcan.com>
Message-ID: 

On 4/7/06, Pierre GM wrote:
> ...
> We're going towards MA as the default object.
>
I will be against changing the array structure to handle missing
values. Let's keep the discussion focused on the interface. Once we
agree on the interface, it will be clear if any structural changes are
necessary.

> But then again, what would be the behavior to deal with missing values ?

We can postpone this discussion as well. Just adding a mask attribute
that returns False and a filled method that returns a copy is an
example of a minimalistic change.

> Using R-like na.actions ? That'd be great, but it's getting more complex.
>
I don't like na.actions. I think missing values should behave like IEEE
NaNs and in the floating point case should be represented by NaNs. The
functionality provided by na.actions can always be achieved by calling
an extra function (filled or compress).

> Oh, and another thing: if 'mask', or 'masked' becomes a default attribute of
> ndarrays, how do we define a mask? As a boolean ndarray whose 'mask' is
> always 'False' ? How do you __repr__ it ?
>
See above. For ndarray mask is always False unless an add-on module is
loaded that redefines arithmetic to recognize special bit-patterns such
as NaN or INT_MIN.
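For reference, both escape hatches Sasha mentions have concrete
spellings in the masked-array world; a sketch using the modern numpy.ma
names:

import numpy as np
import numpy.ma as ma

x = ma.masked_values([1.0, -999.0, 3.0], -999.0)
print(x.filled(0.0))     # [ 1.  0.  3.] -- masked slots replaced
print(x.compressed())    # [ 1.  3.]     -- masked slots dropped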
From tim.hochberg at cox.net Fri Apr 7 17:09:11 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Fri Apr 7 17:09:11 2006
Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled
In-Reply-To: 
References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu>
	<4436D6D1.6040302@cox.net>
Message-ID: <4436FF73.7080408@cox.net>

Sasha wrote:

>On 4/7/06, Tim Hochberg wrote:
>
>>[...]
>>
>>However, I do think the situation needs more thought. Slapping filled
>>and mask onto ndarray is the path of least resistance, but it's not
>>clear that it's the best one.
>>
>
>Completely agree. I have many gripes about current ma implementation
>of both "filled" and "mask".
>
>filled:
>
>1. I don't like default fill value. It should be mandatory to
>supply fill value.
>
That makes perfect sense. If anything should have a default fill value,
it's the function calling filled, not the arrays themselves.

>2. It should return masked array (with trivial mask), not ndarray.
>
So, just with mask = False? In a follow-on message Pierre disagrees and
claims that what you really want is the ndarray since not everything
will accept it. Then I guess you'd need to call b.filled(fill).data. I
agree with Sasha in principle but Pierre, perhaps in practice. I almost
suggested it get renamed a.asndarray(fill), except that asXXX has the
wrong connotations. I think this one needs to bounce around some more.

>3. The name conflicts with the "fill" method.
>
I thought you wanted to kill that. I'd certainly support that. Can't we
just special case __setitem__ for that one case so that the performance
is just as good if performance is really the issue?

>4. View/Copy inconsistency. Does not provide a method to fill values in-place.
>
b[b.mask] = fill_value; b.unmask()

seems to work for this purpose. Can we just have filled return a copy?

>mask:
>
>1. I've got rid of mask returning None in favor of False_ (boolean
>array scalar), but it is still not perfect. I would prefer data.shape
>== mask.shape invariant and if space saving/performance is deemed
>necessary use zero-stride arrays.
>
Interesting idea. Is that feasible yet?

>2. I don't like the name. "Missing" or "na" would be better.
>
I'm not on board here, although really I'd like to hear from other
people who use the package. 'na' seems too cryptic to me and 'missing'
too specific -- there might be other reasons to mask a value other than
it being missing. The problem with mask is that it's not clear whether
True means the data is useful or unuseful. Keep throwing out names,
maybe one will stick.

>>If we do decide we are going to add both of these methods to ndarray
>>(with filled returning a copy!), then it may be worth considering making
>>ndarray a subclass of MaskedArray. Conceptually this makes sense, since
>>at this point an ndarray will just be a MaskedArray where mask is always
>>False. I think that they could share much of the implementation except
>>that ndarray would be set up to use methods that ignored the mask
>>attribute since they would know that it's always false. Even that might
>>not be worth it, since the check for whether mask is True/False is just
>>a pointer compare.
>>
>
>The tail becoming the dog! Yet I agree, this makes sense from the
>implementation point of view. From OOP perspective this would make
>sense if arrays were immutable, but since mask is settable in
>MaskedArray, making it constant in the subclass will violate the
>substitution principle. I would not object making mask read only,
>however.
>
How do you set the mask? I keep getting attribute errors when I try it.
And unmask would be a noop on an ndarray.

>>It may in fact be best just to do away with MaskedArray entirely, moving
>>the functionality into ndarray. That may have performance implications,
>>although I don't see them at the moment, and I don't know if there are
>>other methods/attributes that this would imply need to be moved over,
>>although it looks like just mask, filled and possibly fill_value,
>>although the latter looks a little dubious to me.
>>
>
>I think MA can coexist with ndarray and share the interface. Ndarray
>can use special bit-patterns like IEEE NaN to indicate missing
>floating point values. Add-on modules can redefine arithmetic to make
>INT_MIN behave as a missing marker for signed integers (R, K and J (I
>think) languages use this approach). Applications that need missing
>values support across the board will use MA.
>
>>Either of the above two options would certainly improve the quality of
>>MaskedArray. Copy for instance seems not to have been implemented, and
>>who knows what other dark corners remain unexplored here.
>>
>
>More (corners) than you want to know about!
>Reimplementing MA in C
>would be a worthwhile goal (and what you suggest seems to require just
>that), but it is too big of a project. I suggest that we focus on the
>interface first. If existing MA interface is rejected (which is
>likely) for ndarray, we can easily experiment with the alternatives
>within MA, which is pure python.
>
Perhaps MaskedArray should inherit from ndarray for the time being. Many
of the methods would need to be reimplemented anyway, but it would make
asanyarray work. Someone was just complaining about asarray munging his
arrays. That's correct behaviour, but it would be nice if asanyarray did
the right thing. I suppose we could just special case asanyarray to
ignore MaskedArrays, that might be better since it's less constraining
from an implementation side too.

>>There's a whole spectrum of possibilities here from ones that don't
>>intrude on ndarray at all to ones that profoundly change it. Sasha's
>>suggestion looks like it's probably the simplest thing in the short
>>term, but I don't know that it's the best long term solution. I think it
>>needs more thought and discussion, which is after all what Sasha asked
>>for ;)
>>
>
>Exactly!
>
This may be an opportune time to propose something that's been cooking
in the back of my head for a week or so now: A stripped down array
superclass. The details of this are not at all locked down, but here's
a strawman proposal.

We add an array superclass, call it basearray, that has the same
C-structure as the existing ndarray. However, it has *no* methods or
attributes. It's simply a big blob of data. Functions that work on the
C structure of arrays (ufuncs, etc) would still work on these arrays, as
would asarray, so it could be converted to an ndarray as necessary.

In addition, we would supply a minimal set of functions that would
operate on this object. These functions would be chosen so that the
current array interface could be implemented on top of them and the
basearray object in pure Python. These functions would be things like
set_shape(a, shape), etc. They would be segregated off in their own
namespace, not in the numpy core. [Note that I'm not proposing we
actually implement ndarray this way, just that we make it possible].

This leads to several useful outcomes.

1. If we're careful, this could be the basic array object that we
propose, at least for the first round, for inclusion in the Python core.
It's not useful for anything but passing data between various
applications that understand the data structure, but that in itself
could be a huge win. And the fact that it's dirt simple would probably
be an advantage to getting it into the core.

2. It provides a useful marker class. MA could inherit from it (and use
itself for its data attribute) and then asanyarray would behave
properly. MA could also use this, or a subclass, as the mask object
preventing anyone from accidentally using it as data (they could always
use it on purpose with asarray).

3. It provides a platform for people to build other, ndarray-like
classes in pure Python. This is my main interest. I've put together a
thin shell over numpy that strips it down to its absolute essentials
including a stripped down version of ndarray that removes most of the
methods. All of the __array_wrap__[1] stuff works quite well most of the
time, but there are still some issues with being a subclass when this
particular class is conceptually a superclass. If we had an array
superclass of some sort, I believe that these would be resolved.
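A toy, pure-Python rendering of the proposed split, just to make the
shape of the idea concrete (every name here is made up; the real
basearray would share ndarray's C structure rather than being a Python
class):

class basearray(object):
    """A blob of data: a flat buffer plus a shape, and nothing else."""
    def __init__(self, buf, shape):
        self.buf = list(buf)
        self.shape = tuple(shape)

# The interface lives in a separate namespace of plain functions, so a
# full ndarray-like class could be written on top of them in pure Python.
def size(a):
    n = 1
    for dim in a.shape:
        n *= dim
    return n

def set_shape(a, shape):
    n = 1
    for dim in shape:
        n *= dim
    if n != size(a):
        raise ValueError("total size must be preserved")
    a.shape = tuple(shape)

b = basearray(range(6), (2, 3))
set_shape(b, (3, 2))
assert b.shape == (3, 2)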
In principle at least, this shouldn't be that hard. I think it should
mostly be rearranging some code and adding some wrappers to existing
functions. That's in principle. In practice, I'm not certain yet as I
haven't investigated the code in question in much depth yet.

I've been meaning to write this up into a more fleshed out proposal, but
I got distracted by the whole Protocol discussion on python-dev3000.
This writeup is pretty weak, but hopefully you get the idea.

Anyway, this is something that I would be willing to put some time on
that would benefit both me and probably the MA folks as well.

Regards,

-tim

From efiring at hawaii.edu Fri Apr 7 17:27:09 2006
From: efiring at hawaii.edu (Eric Firing)
Date: Fri Apr 7 17:27:09 2006
Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled
In-Reply-To: <4436FF73.7080408@cox.net>
References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu>
	<4436D6D1.6040302@cox.net> <4436FF73.7080408@cox.net>
Message-ID: <44370328.2060508@hawaii.edu>

Tim Hochberg wrote:
[...]
>
>> 2. I don't like the name. "Missing" or "na" would be better.
>>
>
> I'm not on board here, although really I'd like to hear from other
> people who use the package. 'na' seems too cryptic to me and 'missing'
> too specific -- there might be other reasons to mask a value other than
> it being missing. The problem with mask is that it's not clear whether
> True means the data is useful or unuseful. Keep throwing out names,
> maybe one will stick.

"hide" or "hidden"? A mask value of True essentially hides the
underlying value.

Eric

From ndarray at mac.com Fri Apr 7 17:56:24 2006
From: ndarray at mac.com (Sasha)
Date: Fri Apr 7 17:56:24 2006
Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled
In-Reply-To: <4436FF73.7080408@cox.net>
References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu>
	<4436D6D1.6040302@cox.net> <4436FF73.7080408@cox.net>
Message-ID: 

On 4/7/06, Tim Hochberg wrote:
> [...]
> Perhaps MaskedArray should inherit from ndarray for the time being. Many
> of the methods would need to be reimplemented anyway, but it would make
> asanyarray work.
>>Someone was just complaining about asarray munging his
>>arrays. That's correct behaviour, but it would be nice if asanyarray did
>>the right thing. I suppose we could just special case asanyarray to
>>ignore MaskedArrays, that might be better since it's less constraining
>>from an implementation side too.
>>
>
>Just for the record. Currently MA does not inherit from ndarray.
>
Right, I checked that. That's why asanyarray won't work now with MA
(unless someone changed the implementation of that while I wasn't
looking).

>There are some benefits to be gained from changing MA design from
>containment to inheritance, but I am very sceptical about the use of
>inheritance in the array setting.
>
That's probably a sensible position. Still it would be nice to have
asanyarray pass masked arrays through somehow. I haven't thought this
through very well, but I wonder if it would make sense for asanyarray to
pass any object that supplies __array__. I'm leery of special casing
asanyarray just for MA; somehow that seems the wrong approach.

>>This may be an opportune time to propose something that's been cooking in
>>the back of my head for a week or so now: A stripped down array
>>superclass.
>>
>
>This is a very worthwhile idea and I hate to see it buried in a
>non-descriptive thread. I've copied your proposal to the wiki at
>.
>
Thanks for doing that. I'm glad you like the general idea. I do plan to
write it through and try to get a better handle on what this would
entail and what the consequences would be. However, I'm not sure exactly
when I'll get around to it so it's probably better that a rough draft be
out there for people to think about in the interim.

-tim

From ndarray at mac.com Fri Apr 7 18:47:09 2006
From: ndarray at mac.com (Sasha)
Date: Fri Apr 7 18:47:09 2006
Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled
In-Reply-To: <4436FF73.7080408@cox.net>
References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu>
	<4436D6D1.6040302@cox.net> <4436FF73.7080408@cox.net>
Message-ID: 

On 4/7/06, Tim Hochberg wrote:
> [...]
> >1. I don't like default fill value. It should be mandatory to
> >supply fill value.
> >
> That makes perfect sense. If anything should have a default fill value,
> it's the function calling filled, not the arrays themselves.
>
It looks like we are getting close to a consensus on this one. I will
remove fill_value attribute.

[...]
> >3. The name conflicts with the "fill" method.
> >
> I thought you wanted to kill that. I'd certainly support that. Can't we
> just special case __setitem__ for that one case so that the performance
> is just as good if performance is really the issue?
>
I'll propose a patch.

> >4. View/Copy inconsistency. Does not provide a method to fill values in-place.
> >
> b[b.mask] = fill_value; b.unmask()
>
> seems to work for this purpose. Can we just have filled return a copy?
>
+1

> >mask:
> >
> >1. I've got rid of mask returning None in favor of False_ (boolean
> >array scalar), but it is still not perfect. I would prefer data.shape
> >== mask.shape invariant and if space saving/performance is deemed
> >necessary use zero-stride arrays.
> >
> Interesting idea. Is that feasible yet?
>
It is not feasible in a pure python module like ma, but easy in ndarray.
We can also reset the writeable flag to avoid various problems that zero
strides may cause. I'll propose a patch.
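The zero-stride trick Sasha refers to can be sketched with today's
numpy: a single byte pretending to be a full-size constant mask, with
the writeable flag dropped so nothing can write through the aliasing
(the names here are made up):

import numpy as np
from numpy.lib.stride_tricks import as_strided

buf = np.zeros(1, dtype=bool)                       # one real element
mask = as_strided(buf, shape=(1000, 1000), strides=(0, 0))
mask.flags.writeable = False                        # guard the alias

print(mask.shape, mask.any())   # (1000, 1000) False -- costs one byte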
> >2. I don't like the name. "Missing" or "na" would be better.
> >
> I'm not on board here, although really I'd like to hear from other
> people who use the package. 'na' seems too cryptic to me and 'missing'
> too specific -- there might be other reasons to mask a value other than
> it being missing. The problem with mask is that it's not clear whether
> True means the data is useful or unuseful. Keep throwing out names,
> maybe one will stick.
>
The problem with the "mask" name is that ndarray already has an
unrelated "putmask" method. On the other hand putmask is redundant with
fancy indexing. I have no other problem with the "mask" name, so we may
just decide to get rid of "putmask".

> [...]
> How do you set the mask? I keep getting attribute errors when I try it.

a[i] = masked

makes the i-th element masked. If mask is an array, you can just set
its elements.

> And unmask would be a noop on an ndarray.
>
Yes.

[...]

From ndarray at mac.com Fri Apr 7 18:56:01 2006
From: ndarray at mac.com (Sasha)
Date: Fri Apr 7 18:56:01 2006
Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled
In-Reply-To: <44371593.8060806@cox.net>
References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu>
	<4436D6D1.6040302@cox.net> <4436FF73.7080408@cox.net>
	<44371593.8060806@cox.net>
Message-ID: 

On 4/7/06, Tim Hochberg wrote:
> [...]
> Still it would be nice to have asanyarray pass masked arrays through
> somehow. I haven't thought this through very well, but I wonder if it
> would make sense for asanyarray to pass any object that supplies
> __array__. I'm leery of special casing asanyarray just for MA; somehow
> that seems the wrong approach.

One possibility is to make asanyarray pass through objects that have
the __array_wrap__ attribute.
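A sketch of the pass-through Sasha suggests; the helper name is
hypothetical, not numpy API (note that modern ndarrays and MaskedArrays
both expose __array_wrap__, so both would pass through untouched):

import numpy as np

def pass_through_wrappers(obj):
    # Anything that knows how to wrap results stays as-is;
    # everything else is coerced to a plain ndarray.
    if hasattr(obj, '__array_wrap__'):
        return obj
    return np.asarray(obj)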
"putmask" really seems overkill indeed. I wouldn't miss it. > How do you set the mask? I keep getting attribute errors when I try it. > And unmask would be a noop on an ndarray. I've implemented something like that for some classes (inheriting from MA.MaskedArray). Never really used it yet, though #-------------------------------------------- def applymask(self,m): if not MA.is_mask(m): raise MA.MAError,"Invalid mask !" elif self._data.shape != m.shape: raise MA.MAError,"Mask and data not compatible." else: self._dmask = m > This may be an oportune time to propose something that's been cooking in > the back of my head for a week or so now: A stripped down array > superclass. That'd be great indeed, and may solve some problems reported on th list about subclassing ndarray. AAMOF, I gave up trying to use ndarray as a superclass, and rely only on MA From zdm105 at tom.com Sat Apr 8 01:56:02 2006 From: zdm105 at tom.com (=?GB2312?B?NNTCMTUtMTbJz7qjLzIxLTIyye7b2g==?=) Date: Sat Apr 8 01:56:02 2006 Subject: [Numpy-discussion] =?GB2312?B?QUTUy9PDRVhDRUy02b34ytCzodOqz/rT67LGzvG53MDt?= Message-ID: An HTML attachment was scrubbed... URL: From webb.sprague at gmail.com Sat Apr 8 20:02:11 2006 From: webb.sprague at gmail.com (Webb Sprague) Date: Sat Apr 8 20:02:11 2006 Subject: [Numpy-discussion] Unexpected change of array used to index another array Message-ID: Hi. I indexed an 10 x 10(called bigM below) with another array (OFFS_TMP below). I suppose because OFFS_TMP has negative numbers, it was changed to cycle around to 9 wherever there is a negative 1 (which is the forward version of -1 if you are a 10 x 10 matrix). You can analogous behavior with -2 => 8, etc. Is changing the indexing matrix really the correct behavior? The result of using the index seems to be fine. Has this story been told already and I didn't know it? Below is my ipython session. 
In [57]: OFFS_TMP
Out[57]:
array([[-1,  1],
       [ 0,  1],
       [ 1,  1],
       [-1,  0],
       [ 0,  0],
       [ 1,  0],
       [-1, -1],
       [ 0, -1],
       [ 1, -1]])

In [58]: bigM[OFFS_TMP]
Out[58]:
array([[[False, True, False, False, True, False, True, True, True, False],
        [False, True, False, True, True, False, False, False, True, True]],
       [[True, False, True, False, True, True, False, False, False, True],
        [False, True, False, True, True, False, False, False, True, True]],
       [[False, True, False, True, True, False, False, False, True, True],
        [False, True, False, True, True, False, False, False, True, True]],
       [[False, True, False, False, True, False, True, True, True, False],
        [True, False, True, False, True, True, False, False, False, True]],
       [[True, False, True, False, True, True, False, False, False, True],
        [True, False, True, False, True, True, False, False, False, True]],
       [[False, True, False, True, True, False, False, False, True, True],
        [True, False, True, False, True, True, False, False, False, True]],
       [[False, True, False, False, True, False, True, True, True, False],
        [False, True, False, False, True, False, True, True, True, False]],
       [[True, False, True, False, True, True, False, False, False, True],
        [False, True, False, False, True, False, True, True, True, False]],
       [[False, True, False, True, True, False, False, False, True, True],
        [False, True, False, False, True, False, True, True, True, False]]], dtype=bool)

In [59]: OFFS_TMP
Out[59]:
array([[9, 1],
       [0, 1],
       [1, 1],
       [9, 0],
       [0, 0],
       [1, 0],
       [9, 9],
       [0, 9],
       [1, 9]])

From robert.kern at gmail.com Sat Apr 8 21:17:28 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sat Apr 8 21:17:28 2006 Subject: [Numpy-discussion] Re: Unexpected change of array used to index another array In-Reply-To: References: Message-ID: Webb Sprague wrote: > Hi. > > I indexed a 10 x 10 array (called bigM below) with another array (OFFS_TMP > below). I suppose because OFFS_TMP has negative numbers, it was > changed to cycle around to 9 wherever there is a negative 1 (which is > the forward version of -1 if you are a 10 x 10 matrix). You can get > analogous behavior with -2 => 8, etc. Is changing the indexing matrix > really the correct behavior? The result of using the index seems to > be fine. Has this story been told already and I didn't know it? I think it's a bug. I've located the problem, but I'm not familiar with that part of the code so I'm not entirely sure how to go about fixing it. http://projects.scipy.org/scipy/numpy/ticket/49 -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
From webb.sprague at gmail.com Sun Apr 9 15:21:01 2006 From: webb.sprague at gmail.com (Webb Sprague) Date: Sun Apr 9 15:21:01 2006 Subject: [Numpy-discussion] numpy.floor() is supposed to return an int, but returns a float Message-ID: Could someone explain this behavior:

In [13]: type(N.floor(1))
Out[13]:

In [14]: N.floor?
Type: ufunc
String Form:
Namespace: Interactive
Docstring: y = floor(x) elementwise largest integer <= x

I wouldn't complain, except the only time I use floor() is to make indices (dividing ages by age widths, for example). Thanks!

From tim.hochberg at cox.net Sun Apr 9 15:30:02 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 9 15:30:02 2006 Subject: [Numpy-discussion] numpy.floor() is supposed to return an int, but returns a float In-Reply-To: References: Message-ID: <44398AFD.4050304@cox.net> Webb Sprague wrote: >Could someone explain this behavior: > >In [13]: type(N.floor(1)) >Out[13]: > >In [14]: N.floor? >Type: ufunc >String Form: >Namespace: Interactive >Docstring: > y = floor(x) elementwise largest integer <= x > >I wouldn't complain, except the only time I use floor() is to make >indices (dividing ages by age widths, for example). > > Well, floor returns an integer, but not an int -- it's an integral floating point value. What you want is: numpy.floor(1).astype(int) (If you're only using scalars, you might also consider int(floor(x)) instead.) Regards, -tim >Thanks!

From webb.sprague at gmail.com Sun Apr 9 15:40:02 2006 From: webb.sprague at gmail.com (Webb Sprague) Date: Sun Apr 9 15:40:02 2006 Subject: [Numpy-discussion] numpy.floor() is supposed to return an int, but returns a float In-Reply-To: <44398AFD.4050304@cox.net> References: <44398AFD.4050304@cox.net> Message-ID: I think the docstring implies that numpy.floor() returns an integer value. One can cast the float value to a usable integer value, but either the docstring should read something different or the function should be changed (my preference). "y = floor(x) elementwise largest integer <= x" is the docstring. As far as "integral valued float" versus "integer", this distinction seems a little obscure... I am sure the difference is very important in some contexts, but I for one think that floor should return a straight up integer, if just for code style (see example below). Plus it will be upcast to a float whenever necessary, so floor(4.5) + .75 == 4.75 whether floor() returns an int or a float. fooMatrix[numpy.floor(age/ageWidth)] is better (easier to type, read, and debug) than fooMatrix[numpy.floor(age/ageWidth).astype(int)] If there is an explanation as to why an integral valued float is a better return value, I would be interested in a link.
Thx W

From robert.kern at gmail.com Sun Apr 9 15:46:04 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 9 15:46:04 2006 Subject: [Numpy-discussion] Re: numpy.floor() is supposed to return an int, but returns a float In-Reply-To: References: <44398AFD.4050304@cox.net> Message-ID: Webb Sprague wrote: > If there is an explanation as to why an integral valued float is a > better return value, I would be interested in a link.

In [4]: import numpy
In [5]: numpy.floor(2.**50)
Out[5]: 1125899906842624.0
In [6]: numpy.floor(2.**50).astype(int)
Out[6]: 2147483647

-- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From tim.hochberg at cox.net Sun Apr 9 16:07:02 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 9 16:07:02 2006 Subject: [Numpy-discussion] numpy.floor() is supposed to return an int, but returns a float In-Reply-To: References: <44398AFD.4050304@cox.net> Message-ID: <443993E3.1090901@cox.net> Webb Sprague wrote: >I think the docstring implies that numpy.floor() returns an integer >value. > You've been programming too much! Everywhere but the computer programming world, 1.0 is an integer. Even there, many (most?) computer languages avoid the term integer, using int, Int or something similar. The distinction made between ints and integral floating point values is mostly an artificial one resulting from implementation issues. Making this distinction is also a handy, if imperfect, proxy for exact versus inexact numbers. >One can cast the float value to a usable integer value, but >either the docstring should read something different or the function >should be changed (my preference). > >"y = floor(x) elementwise largest integer <= x" is the docstring. > >As far as "integral valued float" versus "integer", this distinction >seems a little obscure... > An integral floating point value *is* an integer, just ask any 12-year-old. What's obscure is the way concepts of integers and reals get mapped to ints and floats. Don't get me wrong, these are reasonable compromises given the sad reality that computers are not so hot at representing infinite quantities. However, we get sucked into thinking that integers and ints are really the same things at our peril. Similarly for floats and reals. >I am sure the difference is very important >in some contexts, but I for one think that floor should return a >straight up integer, > It's a ufunc. Ufuncs in general return the same type that they operate on. So, not only would this be difficult, it would make the signature of ufuncs harder to remember. Also, as Robert Kern just pointed out, not all integral FP values can be represented as ints. > if just for code style (see example below). Plus >it will be upcast to a float whenever necessary, so floor(4.5) + .75 >== 4.75 whether floor() returns an int or a float. > > Not every two-line Python function has to come pre-written -- Tim Peters on C.L.P

    def webbsfloor(x):
        return numpy.floor(x).astype(int)

>fooMatrix[numpy.floor(age/ageWidth)] > >is better (easier to type, read, and debug) than > >fooMatrix[numpy.floor(age/ageWidth).astype(int)] > >If there is an explanation as to why an integral valued float is a >better return value, I would be interested in a link. > > I think there's at least four reasons: 1. It would be a pain. 2. It would make the ufuncs inconsistent. 3.
It's a thin wrapper over C's floor, so people coming from that language would be confused. 4. It wouldn't work for numbers with very large magnitudes. Pick any three. Regards, -tim

From tim.hochberg at cox.net Sun Apr 9 20:09:03 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 9 20:09:03 2006 Subject: [Numpy-discussion] numpy.floor() is supposed to return an int, but returns a float In-Reply-To: <443993E3.1090901@cox.net> References: <44398AFD.4050304@cox.net> <443993E3.1090901@cox.net> Message-ID: <4439CC7E.90704@cox.net> Tim Hochberg wrote: > Webb Sprague wrote: > >> I think the docstring implies that numpy.floor() returns an integer >> value. > > You've been programming too much! > > Everywhere but the computer programming world, 1.0 is an integer. Even > there, many (most?) computer languages avoid the term integer, using > int, Int or something similar. The distinction made between ints and > integral floating point values is mostly an artificial one resulting > from implementation issues. Making this distinction is also a handy, > if imperfect, proxy for exact versus inexact numbers. > >> One can cast the float value to a usable integer value, but >> either the docstring should read something different or the function >> should be changed (my preference). >> >> "y = floor(x) elementwise largest integer <= x" is the docstring. > Let me just add that, since this seems to cause confusion, it would be appropriate to amend the docstring to be explicit that this always returns an integral floating point value. If someone wants to suggest wording, I can figure out where to put it. One possibility is: "y = floor(x) elementwise largest integer <= x; note that the result is a floating point value" or "y = floor(x) elementwise largest integral float <= x" Neither of those is great, but perhaps they'll inspire someone to do better. -tim >> >> As far as "integral valued float" versus "integer", this distinction >> seems a little obscure... >> > An integral floating point value *is* an integer, just ask any 12-year-old. > What's obscure is the way concepts of integers and reals get > mapped to ints and floats. Don't get me wrong, these are reasonable > compromises given the sad reality that computers are not so hot at > representing infinite quantities. However, we get sucked into > thinking that integers and ints are really the same things at our > peril. Similarly for floats and reals. > >> I am sure the difference is very important >> in some contexts, but I for one think that floor should return a >> straight up integer, >> > It's a ufunc. Ufuncs in general return the same type that they operate > on. So, not only would this be difficult, it would make the signature > of ufuncs harder to remember. > > Also, as Robert Kern just pointed out, not all integral FP values can > be represented as ints. > >> if just for code style (see example below). Plus >> it will be upcast to a float whenever necessary, so floor(4.5) + .75 >> == 4.75 whether floor() returns an int or a float. >> >> > Not every two-line Python function has to come pre-written -- Tim > Peters on C.L.P > > def webbsfloor(x): > return numpy.floor(x).astype(int) > >> fooMatrix[numpy.floor(age/ageWidth)] >> >> is better (easier to type, read, and debug) than >> >> fooMatrix[numpy.floor(age/ageWidth).astype(int)] >> >> If there is an explanation as to why an integral valued float is a >> better return value, I would be interested in a link. >> >> > I think there's at least four reasons: > > 1. It would be a pain. > 2.
It would make the ufuncs inconsistent. > 3. It's a thin wrapper over C's floor, so people coming from that > language would be confused. > 4. It wouldn't work for numbers with very large magnitudes. > > Pick any three. > > Regards, > > -tim

From charlesr.harris at gmail.com Sun Apr 9 22:12:02 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun Apr 9 22:12:02 2006 Subject: [Numpy-discussion] numpy.floor() is supposed to return an int, but returns a float In-Reply-To: <4439CC7E.90704@cox.net> References: <44398AFD.4050304@cox.net> <443993E3.1090901@cox.net> <4439CC7E.90704@cox.net> Message-ID: Tim, On 4/9/06, Tim Hochberg wrote: > Let me just add that, since this seems to cause confusion, it would be > appropriate to amend the docstring to be explicit that this always > returns an integral floating point value. If someone wants to suggest > wording, I can figure out where to put it. One possibility is: > > "y = floor(x) elementwise largest integer <= x; note that the result > is a floating point value" > > or > > "y = floor(x) elementwise largest integral float <= x" How about, "for each item in x returns the largest integral float <= item." Chuck P.S. I too once found the C definition of the floor function annoying, but I got used to it. Sorta like getting used to a broken leg. The main problem is that the result can't be used as an index without conversion to a "real" integer. Integers aren't members of the reals (or rationals): apart from +/- 1, integers don't have inverses. There happens to be an injective ring homomorphism of the integers into the reals, but that is not the same thing. On the other hand, ints are generally not big enough to hold all of the integral doubles, so as a practical matter the originators made the best choice. Things do get a bit weird for large floats because above a certain threshold floats are already integral values.

From charlesr.harris at gmail.com Sun Apr 9 22:21:02 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun Apr 9 22:21:02 2006 Subject: [Numpy-discussion] Unexpected change of array used to index another array In-Reply-To: References: Message-ID: On 4/8/06, Webb Sprague wrote: > > Hi. > > I indexed a 10 x 10 array (called bigM below) with another array (OFFS_TMP > below). I suppose because OFFS_TMP has negative numbers, it was > changed to cycle around to 9 wherever there is a negative 1 (which is > the forward version of -1 if you are a 10 x 10 matrix). You can get > analogous behavior with -2 => 8, etc. Is changing the indexing matrix > really the correct behavior? The result of using the index seems to > be fine. Has this story been told already and I didn't know it? It's the python way:

    >>> a = [1,2,3]
    >>> a[-1]
    3

It gives a convenient way to index from the end of the array. But I'm not sure that was your question. Chuck

From robert.kern at gmail.com Mon Apr 10 00:02:01 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon Apr 10 00:02:01 2006 Subject: [Numpy-discussion] Re: Unexpected change of array used to index another array In-Reply-To: References: Message-ID: Charles R Harris wrote: > > On 4/8/06, *Webb Sprague* > wrote: > > Hi. > > I indexed a 10 x 10 array (called bigM below) with another array (OFFS_TMP > below).
I suppose because OFFS_TMP has negative numbers, it was > changed to cycle around to 9 wherever there is a negative 1 (which is > the forward version of -1 if you are a 10 x 10 matrix). You can get > analogous behavior with -2 => 8, etc. Is changing the indexing matrix > really the correct behavior? The result of using the index seems to > be fine. Has this story been told already and I didn't know it? > > It's the python way: > >>>> a = [1,2,3] >>>> a[-1] > 3 > > It gives a convenient way to index from the end of the array. But I'm > not sure that was your question. That's not the issue. The problem was that the index array was being modified in-place simply by being used as an index array. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From arnd.baecker at web.de Mon Apr 10 04:01:05 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Mon Apr 10 04:01:05 2006 Subject: [Numpy-discussion] Speed up function on cross product of two sets? In-Reply-To: <4434D6DF.2020306@ieee.org> References: <44315633.4010600@cox.net> <4434D6DF.2020306@ieee.org> Message-ID: On Thu, 6 Apr 2006, Travis Oliphant wrote: > Arnd Baecker wrote: > > BTW, it seems that we have no Numeric to numpy transition remarks in > > www.scipy.org. I only found > > http://www.scipy.org/PearuPeterson/NumpyVersusNumeric > > and of course Travis' "Guide to NumPy" contains a detailed list of > > necessary changes in chapter 2.6.1. > > > For clarification: this is in the sample chapter available on-line to > all.... yes, I should have emphasized that. I tried to make this also clearer at http://www.scipy.org/Converting_from_Numeric > > In addition ``site-packages/numpy/lib/convertcode.py`` provides an > > automatic conversion. > > > > Would it be helpful to start a new wiki page "ConvertingFromNumeric" > > (similar to http://www.scipy.org/Converting_from_numarray) > > which aims at summarizing the necessary changes > > or expand Pearu's page (if he agrees) on this? > > > > Absolutely. I did the Numarray page because I'd written a lot on > Converting from Numeric (even providing convertcode.py) but very little > for numarray --- except the ndimage conversion. So, I started the > Numarray page. Sounds like a great idea to have a dual page. Best, Arnd P.S.: BTW +1 to all that has been said in the other thread on NumPy documentation - you are really doing a brilliant job, Travis!!!

From webb.sprague at gmail.com Mon Apr 10 07:16:04 2006 From: webb.sprague at gmail.com (Webb Sprague) Date: Mon Apr 10 07:16:04 2006 Subject: [Numpy-discussion] Unexpected change of array used to index another array In-Reply-To: References: Message-ID: > > It's the python way: > > >>> a = [1,2,3] > >>> a[-1] > 3 > > It gives a convenient way to index from the end of the array. But I'm not > sure that was your question. No, there was a bug: when using one matrix to index another, the indexing matrix gets changed. As if you did:

    >>> a = [1,2,3]
    >>> i = -1
    >>> a[i]
    3
    >>> print i
    2

(as though merely using i as an index had turned it into the equivalent positive index). I know about the negative trick in simple python lists; I was trying to do something in matrices (where it works too), but that wasn't the issue. Thanks for trying to help, though. W
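For the record, a minimal sketch of the reported misbehavior (the numbers follow the session above; this is the bug tracked in ticket #49, and fixed versions leave the index array untouched):

    import numpy as np

    idx = np.array([[-1, 1], [0, 1]])
    bigM = np.zeros((10, 10), dtype=bool)
    _ = bigM[idx]      # the indexing itself works fine
    print(idx[0, 0])   # -1 on fixed versions; 9 on the buggy 2006 builds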
From webb.sprague at gmail.com Mon Apr 10 07:19:22 2006 From: webb.sprague at gmail.com (Webb Sprague) Date: Mon Apr 10 07:19:22 2006 Subject: [Numpy-discussion] numpy.floor() is supposed to return an int, but returns a float In-Reply-To: References: <44398AFD.4050304@cox.net> <443993E3.1090901@cox.net> <4439CC7E.90704@cox.net> Message-ID: > > "y = floor(x) elementwise largest integer <= x; note that the result > > is a floating point value" I prefer this, if it makes any difference. The others are more succinct, but less likely to help others in my situation. > I too once found the C definition of the floor function annoying, but I got > used to it. Sorta like getting used to a broken leg. Annoying yes, crippling no. I guess I should have grown up on a real programming language :)

From tim.hochberg at cox.net Mon Apr 10 09:13:03 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Mon Apr 10 09:13:03 2006 Subject: [Numpy-discussion] numpy.floor() is supposed to return an int, but returns a float In-Reply-To: References: <44398AFD.4050304@cox.net> <443993E3.1090901@cox.net> <4439CC7E.90704@cox.net> Message-ID: <443A844C.7070306@cox.net> Charles R Harris wrote: > Tim, > > On 4/9/06, *Tim Hochberg* > wrote: > > Let me just add that, since this seems to cause confusion, it would be > appropriate to amend the docstring to be explicit that this always > returns an integral floating point value. If someone wants to suggest > wording, I can figure out where to put it. One possibility is: > > "y = floor(x) elementwise largest integer <= x; note that the > result > is a floating point value" > > or > > "y = floor(x) elementwise largest integral float <= x" > > > How about, "for each item in x returns the largest integral float <= > item." That seems pretty good. I'll wait a day or so and see what else shows up. > > Chuck > > P.S. > > I too once found the C definition of the floor function annoying, but > I got used to it. Sorta like getting used to a broken leg. The main > problem is that the result can't be used as an index without > conversion to a "real" integer. Integers aren't members of the reals > (or rationals): apart from +/- 1, integers don't have inverses. > There happens to be an injective ring homomorphism of the integers > into the reals, but that is not the same thing. I'm not conversant with the terminology [here I rummage through google to try to get the terminology sort of right], but as I understand it integers (I) are a subset of reals (R). The ring that you construct with integers consists of the set of integers plus the operations of addition/subtraction and multiplication as well as an identity. I've seen that specified as something like (I, +/-, *, 0). Similarly, the set of reals (R) and the field that one constructs from them are not really the same thing. So while the ring of integers is not a subset of the field of reals (the statement doesn't even make sense when put that way), the set of integers is a subset of the set of reals. I think that most people, outside of computer programmers and perhaps math majors, think of the set of integers, not the field of integers, to the extent that they think about integers and reals at all. I imagine most people would conjure up some Dali-like image when confronted with the notion of a field of integers! (C-int, +/-, *, 0) actually forms a finite ring, which is not at all the same thing as the ring of integers.
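For instance, a quick illustration of the difference (a sketch; whether the overflow wraps silently, warns, or raises depends on the numpy version and error state):

    import numpy as np

    a = np.int32(2147483647)   # largest 32-bit int
    print(a + np.int32(1))     # -2147483648: modular wraparound,
                               # something true integers never do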
Bit twiddlers tend to understand and even exploit this, but a lot of people conflate the field of ints with the field of integers. This works fine as long as your values are small in magnitude, but eventually it will rise up and bite you. Floats are even worse, since they don't even form a field; I think they're actually a semiring because of INF/NAN/IND, but I'm not certain about that. Issues with floating point pop up everywhere and if you squint the right way, you can blame them on their lack of fieldness. Which is closely tied to their finite range and precision, which is what bites people. Because Python automatically promotes (Python) ints to (Python) longs, Python ints map, for most purposes, onto the field of integers. However, in numpy we're stuck using C-ints for performance reasons, so we'd be wise to keep the differences between ints and integers in the back of our mind. This is wandering rather far afield (although it's entertaining). > On the other hand, ints are generally not big enough to hold all of > the integral doubles, so as a practical matter the originators made > the best choice. Things do get a bit weird for large floats because > above a certain threshold floats are already integral values. Another issue at the moment is that integer division does an implicit flooring or truncation (I believe it's implementation dependent in C) in both C and Python, so if you aren't using floor to produce an index, something I've been known to do, having it return an integer could also lead to nasty surprises. For example:

    def half_integer(x):
        "return nearest half integer below x"
        return floor(2*x) / 2

would start failing mysteriously. Of course the above is an overflow magnet, so perhaps it's not the best example. Eventually, '/' is going to mean true_division and '//' will mean floor_division, so this particular issue will go away. Regards, -tim

From bsouthey at gmail.com Mon Apr 10 09:16:08 2006 From: bsouthey at gmail.com (Bruce Southey) Date: Mon Apr 10 09:16:08 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> <200604071844.37724.pgmdevlist@mailcan.com> Message-ID: Hi, On 4/7/06, Sasha wrote: > On 4/7/06, Pierre GM wrote: > > ... > > We're going towards MA as the default object. > > > I will be against changing the array structure to handle missing > values. Let's keep the discussion focused on the interface. Once we > agree on the interface, it will be clear if any structural changes are > necessary. > > > > But then again, what would be the behavior to deal with missing values ? > > We can postpone this discussion as well. Just adding a mask attribute that > returns False and a filled method that returns a copy is an example of a > minimalistic change. I think that the usage of MA is important because this often dictates the interface. The other aspect is the penalty that is imposed by requiring masked features, especially in situations that don't need any of these features. > > > Using R-like na.actions ? That'd be great, but it's getting more complex. > > > I don't like na.actions. I think missing values should behave like > IEEE NaNs and in the floating point case should be represented by > NaNs. I think the issue relates to how masked values should be handled in computation. Does it matter if the result of an operation is due to a masked value or a numerical problem (like dividing by zero)? (I am presuming that it is possible to identify this difference.)
If not, then I support the idea of treating masked values as NaN. >The functionality provided by na.actions can always be achieved > by calling an extra function (filled or compress). I am not clear on what you actually mean here. For example, if you are summing across a particular dimension, I would presume that any masked value would be ignored and that there would be some record of the fact that a masked value was encountered. This would allow that 'extra function' to handle the associated result. Alternatively the 'extra function' would have to be included as an argument - which is what the na.actions do. Regards Bruce

From ndarray at mac.com Mon Apr 10 09:49:05 2006 From: ndarray at mac.com (Sasha) Date: Mon Apr 10 09:49:05 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> <200604071844.37724.pgmdevlist@mailcan.com> Message-ID: On 4/10/06, Bruce Southey wrote: > > [...] > I think the issue relates to how masked values should be handled in > computation. Does it matter if the result of an operation is due to a > masked value or a numerical problem (like dividing by zero)? (I am > presuming that it is possible to identify this difference.) If not, > then I support the idea of treating masked values as NaN. > The IEEE standard provides plenty of spare bits in NaNs to represent pretty much everything, and some languages take advantage of that feature. (I believe NA and NaN are distinct in R.) In MA, however, mask elements are boolean and no distinction is made between various reasons for not having a data element. For consistency, a non-trivial (not always false) implementation of ndarray.mask should return "not finite" and ignore bits that distinguish NaNs and infinities. > >The functionality provided by na.actions can always be achieved > > by calling an extra function (filled or compress). > > I am not clear on what you actually mean here. For example, if you > are summing across a particular dimension, I would presume that any > masked value would be ignored and that there would be some record of > the fact that a masked value was encountered. This would allow that > 'extra function' to handle the associated result. Alternatively the > 'extra function' would have to be included as an argument - which is > what the na.actions do. > If you sum along a particular dimension and encounter a masked value, the result is masked. The same is true if you encounter a NaN - the result is NaN. If you would like to ignore masked values, you write a.filled(0).sum() instead of a.sum(). In the 1-d case, you can also use a.compress().sum(). In other words, what in R you achieve with a flag, such as in sum(a, na.rm=TRUE), in numpy you achieve by an explicit call to "fill". This is not quite the same as na.actions in R, but that is what I had in mind.
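A short sketch of these idioms as they survive in today's numpy.ma (note the modern method is spelled compressed(), and numpy.ma ultimately kept the omit-masked-values behavior for sum):

    import numpy.ma as ma

    a = ma.array([1.0, 2.0, 3.0], mask=[0, 1, 0])
    print(a.sum())               # 4.0 -- the masked addend is omitted
    print(a.filled(0).sum())     # 4.0 -- same result, but the 0 is explicit
    print(a.compressed().sum())  # 4.0 -- drop masked entries (1-d only)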
From pgmdevlist at mailcan.com Mon Apr 10 10:58:02 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Mon Apr 10 10:58:02 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: Message-ID: <200604101356.44903.pgmdevlist@mailcan.com> > If you sum along a particular dimension and encounter a masked value, > the result is masked. That's not how it currently works (still on 0.9.6):

    x = arange(12).reshape(3,4)
    MA.masked_where((x%5==0) | (x%3==0), x).sum(0)
    array(data = [12 1 2 18], mask = [False False False False], fill_value=999999)

and frankly, I'd be quite frustrated if it had to change: - `filled` is not an ndarray method, which means that a.filled(0).sum() fails if a is not MA. Right now, I can use a.sum() without having to check the nature of a first. - this behavior was already in Numeric - All my scripts rely on it (but I guess that's my problem) - The current way reflects how masks are used in GIS or image processing. > If you would like to ignore masked values, you write > a.filled(0).sum() instead of a.sum(). In the 1-d case, you can also use > a.compress().sum(). Once again, Sasha, I'd agree with you if it wasn't a major difference > In other words, what in R you achieve with a > flag, such as in sum(a, na.rm=TRUE), in numpy you achieve by an > explicit call to "fill". This is not quite the same as na.actions in > R, but that is what I had in mind. I kinda like the idea of a flag, though

From ndarray at mac.com Mon Apr 10 11:37:00 2006 From: ndarray at mac.com (Sasha) Date: Mon Apr 10 11:37:00 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <200604101356.44903.pgmdevlist@mailcan.com> References: <200604101356.44903.pgmdevlist@mailcan.com> Message-ID: On 4/10/06, Pierre GM wrote: > > If you sum along a particular dimension and encounter a masked value, > > the result is masked. > > That's not how it currently works (still on 0.9.6): > > [... longish example snipped ...]

    >>> ma.array([1,1], mask=[0,1]).sum()
    1

> and frankly, I'd be quite frustrated if it had to change: > - `filled` is not an ndarray method, which means that a.filled(0).sum() fails > if a is not MA. Right now, I can use a.sum() without having to check the > nature of a first. This is exactly the point of the current discussion: make filled a method of ndarray. With the current behavior, how would you achieve masking (no fill) of a.sum()? > - this behavior was already in Numeric That's true, but it makes the result of sum(a) different from __builtins__.sum(a). I believe consistency with the python conventions is more important than with legacy Numeric in the long run. > [...] > - The current way reflects how masks are used in GIS or image processing. > Can you elaborate on this? Note that in R na.rm is false by default in sum: > sum(c(1,NA)) [1] NA So it looks like the convention is different in the field of statistics. > > If you would like to ignore masked values, you write > > a.filled(0).sum() instead of a.sum(). In the 1-d case, you can also use > > a.compress().sum(). > > Once again, Sasha, I'd agree with you if it wasn't a major difference Array methods are a very recent addition to ma. We can still use this window of opportunity to get things right before too many people get used to the wrong behavior. (Note that I changed your implementation of cumsum and cumprod.) > > > In other words, what in R you achieve with a > > flag, such as in sum(a, na.rm=TRUE), in numpy you achieve by an > > explicit call to "fill". This is not quite the same as na.actions in > > R, but that is what I had in mind. > > I kinda like the idea of a flag, though With the flag approach, making ndarray and ma.array interfaces consistent would require adding an extra argument to many methods. Instead, I propose to add one method, filled, to ndarray.
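A hypothetical sketch of what the proposal amounts to: give plain arrays a trivial filled, so the same expression works whether or not the input is masked (a function form is shown here; the actual patch would add a method):

    import numpy as np

    def filled(a, value):
        if hasattr(a, 'filled'):   # masked array: replace masked slots
            return a.filled(value)
        return np.asarray(a)       # plain ndarray: nothing to fill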
From pgmdevlist at mailcan.com Mon Apr 10 13:37:07 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Mon Apr 10 13:37:07 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: <200604101356.44903.pgmdevlist@mailcan.com> Message-ID: <200604101638.29979.pgmdevlist@mailcan.com> > > [... longish example snipped ...] > > >>> ma.array([1,1], mask=[0,1]).sum() > > 1 So ? The result is not `masked`, the missing value has been omitted.

    MA.array([[1,1],[1,1]], mask=[[0,1],[1,0]]).sum()
    array(data = [1 1], mask = [False False], fill_value=999999)

> This is exactly the point of the current discussion: make filled a > method of ndarray. Mrf. I'm still not convinced, but I have nothing against it. Along with a mask=False_ by default ? > With the current behavior, how would you achieve masking (no fill) of a.sum()? Er, why would I want to get MA.masked along one axis if one value is masked ? The current behavior is to mask only if all the values along that axis are masked:

    MA.array([[1,1],[1,1]], mask=[[0,1],[1,1]]).sum()
    array(data = [1 999999], mask = [False True], fill_value=999999)

With a.filled(0).sum(), how would you distinguish between the cases (a) at least one value is not masked and (b) all values are masked ? (OK, by querying the mask with something in the line of a._mask.all(axis), but it's longer... Oh well, I'll just have to adapt) > > - this behavior was already in Numeric > > That's true, but it makes the result of sum(a) different from > __builtins__.sum(a). I believe consistency with the python > conventions is more important than with legacy Numeric in the long > run. > > Array methods are a very recent addition to ma. We can still use this > window of opportunity to get things right before too many people get > used to the wrong behavior. (Note that I changed your implementation > of cumsum and cumprod.) Good points... We'll just have to put strong warnings everywhere. > > > - The current way reflects how masks are used in GIS or image processing. > > Can you elaborate on this? Note that in R na.rm is false by default in sum: > > sum(c(1,NA)) > > [1] NA > > So it looks like the convention is different in the field of statistics. MMh. *digs in his old GRASS scripts* OK, my bad. I had to fill missing values somehow, or at least check whether there were any before processing. I'll double check on that. Please temporarily forget that comment. > With the flag approach, making ndarray and ma.array interfaces > consistent would require adding an extra argument to many methods. > Instead, I propose to add one method, filled, to ndarray. OK, good point. On a semantic aspect: While digging these GRASS scripts I mentioned, I realized/remembered that masked values are called 'null', when there's no data, a NaN, or just when you want to hide some values. What about 'null' instead of 'mask', 'missing', 'na' ?
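The explicit query Pierre alludes to, sketched with the public helper getmaskarray rather than the private ._mask:

    import numpy.ma as ma

    a = ma.array([[1, 1], [1, 1]], mask=[[0, 1], [1, 1]])
    totals = a.filled(0).sum(axis=0)        # [1 0]
    empty = ma.getmaskarray(a).all(axis=0)  # [False  True]
    result = ma.array(totals, mask=empty)   # [1 --]: the all-masked column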
> >MA.array([[1,1],[1,1]],mask=[[0,1],[1,0]]).sum() >array(data = [1 1], mask = [False False], fill_value=999999) > > > > >>This is exactly the point of the current discussion: make fill a >>method of ndarray. >> >> >Mrf. I'm still not convinced, but I have nothing against it. Along with a >mask=False_ by default ? > > > >>With the current behavior, how would you achieve masking (no fill) a.sum()? >> >> >Er, why would I want to get MA.masked along one axis if one value is masked ? > > Any number of reasons I would think. It depends on what your using the data for. If the sum is the total amount that you spent in the month, and a masked value means you lost that check stub, then you don't know how much you actually spent and that value should be masked. To chose a boring example. >The current behavior is to mask only if all the values along that axis are >masked: > >MA.array([[1,1],[1,1]],mask=[[0,1],[1,1]]).sum() >array(data = [1 999999], mask = [False True], fill_value=999999) > >With a.filled(0).sum(), how would you distinguish between the cases (a) at >least one value is not masked and (b) all values are masked ? (OK, by >querying the mask with something in the line of a a._mask.all(axis), but it's >longer... Oh well, I'll just to adapt) > > Actually I'm going to ask you the same question. Why would care if all of the values are masked? I may be missing something, but either there's a sensible default value, in which case it doesn't matter how many values are masked, or you can't handle any masked values and the result should be masked if there are any masks in the input. Sasha's proposal handle those two cases well. Your behaviour a little more clunkily, but I'd like to understand why you want that behaviour. Regards, -tim > > >>>- this behavior was already in Numeric >>> >>> >>That's true, but it makes the result of sum(a) different from >>__builtins__.sum(a). I believe consistency with the python >>conventions is more important than with legacy Numeric in the long >>run. >> >>Array methods are a very recent addition to ma. We can still use this >>window of opportunity to get things right before to many people get >>used to the wrong behavior. (Note that I changed your implementation >>of cumsum and cumprod.) >> >> > >Good points... We'll just have to put strong warnings everywhere. > > > >>>- The current way reflects how mask are used in GIS or image processing. >>> >>> >>Can you elaborate on this? Note that in R na.rm is false by default in sum: >> >> >>>sum(c(1,NA)) >>> >>> >>[1] NA >> >>So it looks like the convention is different in the field of statistics. >> >> > >MMh. *digs in his old GRASS scripts* >OK, my bad. I had to fill missing values somehow, or at least check whether >there were any before processing. I'll double check on that. Please >temporarily forget that comment. > > > >>With the flag approach making ndarray and ma.array interfaces >>consistent would require adding an extra argument to many methods. >>Instead, I poropose to add one method: fill to ndarray. >> >> >OK, good point. > > >On a semantic aspect: >While digging these GRASS scripts I mentioned, I realized/remembered that >masked values are called 'null', when there's no data, a NAN, or just when >you want to hide some values. What about 'null' instead of >'mask','missing','na' ? > > > >------------------------------------------------------- >This SF.Net email is sponsored by xPML, a groundbreaking scripting language >that extends applications into web and mobile media. 
From oliphant at ee.byu.edu Mon Apr 10 15:07:06 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 10 15:07:06 2006 Subject: [Numpy-discussion] Recarray and shared datas In-Reply-To: <200604061020.k36AKIsQ018238@decideur.info> References: <200604061020.k36AKIsQ018238@decideur.info> Message-ID: <443AD6CF.4010800@ee.byu.edu> Benjamin Thyreau wrote: >Hi, >Numpy has a nice feature of recarray, i.e. records which can hold column names. >I'd like to use such a feature in order to better interact with R, i.e. passing >R data to python without copy. The current rpy bindings do a full copy, and >convert to a simple ndarray. Looking at the recarray api in the Guide, >and also at the source code, i don't find any recarray constructor which can >take shared data (all the examples from section 8.6 are doing copies). >Is there some way to do it ? In Python or in C ? Or are there any plans to ? > > Yes, you can share data with a recarray because a "recarray" is just a numpy array with a fancy data-type and with attribute access overriding to do "field" lookups if the attribute cannot otherwise be found. What exactly are you trying to share data with? I'm having a hard time understanding how to answer your question without more information. Best, -Travis

From oliphant at ee.byu.edu Mon Apr 10 15:14:05 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 10 15:14:05 2006 Subject: [Numpy-discussion] Tiling / disk storage for matrix in numpy? In-Reply-To: References: Message-ID: <443AD889.7020004@ee.byu.edu> Webb Sprague wrote: >Hi all, > >Is there a way in numpy to associate a (large) matrix with a disk >file, then tile and index it, then cache it as you process the >various pieces? This is pretty important with massive image files, >which can't fit into working memory, but in which (for example) you >might be doing a convolution on a 100 x 100 pixel window on a small >subset of the image. > > I suppose if you used a memory-mapped array, then you would be at the mercy of the operating system's caching. But, this would be the easiest way. -Travis

From oliphant at ee.byu.edu Mon Apr 10 15:21:07 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 10 15:21:07 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: <4436AE31.7000306@cox.net> Message-ID: <443ADA43.8060400@ee.byu.edu> Sasha wrote: > > On 4/7/06, *Tim Hochberg* > wrote: > > ... > In general, I'm skeptical of adding more methods to the ndarray object > -- there are plenty already. > > I've also proposed to drop "fill" in favor of optimizing x[...] = <value>. Having both "fill" and "filled" in the interface is plain > awkward. You may like the combined proposal better because it does > not change the total number of methods :-) > > In addition, it appears that both the method and function versions of > filled are "dangerous" in the sense that they sometimes return the > array itself and sometimes a copy. > > This is true in ma, but may certainly be changed. > > Finally, changing ndarray to support masked array feels a bit like the > tail wagging the dog. > > I disagree.
Numpy is pretty much alone among the array languages > because it does not have "native" support for missing values. For the > floating point types some rudimentary support for nans exists, but it is > not really usable. There is no missing values mechanism for integer > types. I believe adding "filled" and maybe "mask" to ndarray (not > necessarily under these names) could be a meaningful step towards > "native" support for missing values. Supporting missing values is a useful thing (but not for every usage of arrays). Thus, ultimately, I see missing-value arrays as a solid sub-class of the basic array class. I'm glad Sasha is working on missing value arrays and have tried to be supportive. I'm a little hesitant to add a special-case method basically for one particular sub-class, though, unless it is the only workable solution. We are still exploring this whole sub-class space and have not really mastered it... -Travis

From oliphant at ee.byu.edu Mon Apr 10 15:44:07 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 10 15:44:07 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <4436FF73.7080408@cox.net> References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> <4436FF73.7080408@cox.net> Message-ID: <443ADF9A.9050001@ee.byu.edu> > This may be an opportune time to propose something that's been cooking > in the back of my head for a week or so now: A stripped down array > superclass. The details of this are not at all locked down, but here's > a strawman proposal. This is in essence what I've been proposing since SciPy 2005. I want what goes into Python to be essentially just this super-class. Look at this http://numeric.scipy.org/array_interface.html and check out this

    svn co http://svn.scipy.org/svn/PEP arrayPEP

I've obviously been way over-booked to do this myself. Nick Coghlan expressed interest in this idea (he called it dimarray, but I like basearray better). > > We add an array superclass, call it basearray, that has the same > C-structure as the existing ndarray. However, it has *no* methods or > attributes. Why not give it the attributes corresponding to its C-structure? I'm happy with no methods though. > 1. If we're careful, this could be the basic array object that > we propose, at least for the first round, for inclusion in the > Python core. It's not useful for anything but passing data between > various applications that understand the data structure, but that in > itself could be a huge win. And the fact that it's dirt simple would > probably be an advantage to getting it into the core. The only extra thing I'm proposing is to add the data-descriptor object into the Python core as well --- otherwise what do you do with the PyArray_Descr * part of the C-structure? > 2. It provides a useful marker class. MA could inherit from it > (and use itself for its data attribute) and then asanyarray would > behave properly. MA could also use this, or a subclass, as the mask > object, preventing anyone from accidentally using it as data (they > could always use it on purpose with asarray). > 3. It provides a platform for people to build other, > ndarray-like classes in pure Python. This is my main interest. I've > put together a thin shell over numpy that strips it down to its > absolute essentials including a stripped down version of ndarray that > removes most of the methods.
All of the __array_wrap__[1] stuff > works quite well most of the time, but there are still some issues > with being a subclass when this particular class is conceptually a > superclass. If we had an array superclass of some sort, I believe > that these would be resolved. > > In principle at least, this shouldn't be that hard. I think it should > mostly be rearranging some code and adding some wrappers to existing > functions. That's in principle. In practice, I'm not certain yet as I > haven't investigated the code in question in much depth yet. I've been > meaning to write this up into a more fleshed out proposal, but I got > distracted by the whole Protocol discussion on python-dev3000. This > writeup is pretty weak, but hopefully you get the idea. This is exactly what needs to be done to improve array-support in Python. This is the conclusion I came to and I'm glad to see that Tim has now basically reached the same conclusion. There are obviously some details to work out. But, having a base structure to inherit from would be perfect. -Travis

From oliphant at ee.byu.edu Mon Apr 10 15:49:01 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 10 15:49:01 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <200604072258.34153.pgmdevlist@mailcan.com> References: <4436FF73.7080408@cox.net> <200604072258.34153.pgmdevlist@mailcan.com> Message-ID: <443AE0A1.3000002@ee.byu.edu> Pierre GM wrote: >>decide to get rid of "putmask". >> > >"putmask" really seems overkill indeed. I wouldn't miss it. > I'm not opposed to getting rid of putmask either. Several of the newer methods are open for discussion before 1.0. I'd have to check to be sure, but .take and .put are not entirely replaced by fancy-indexing. Also, fancy indexing has enough overhead that a method doing exactly what you want is faster. -Travis

From ndarray at mac.com Mon Apr 10 16:06:00 2006 From: ndarray at mac.com (Sasha) Date: Mon Apr 10 16:06:00 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <200604101638.29979.pgmdevlist@mailcan.com> References: <200604101356.44903.pgmdevlist@mailcan.com> <200604101638.29979.pgmdevlist@mailcan.com> Message-ID: On 4/10/06, Pierre GM wrote: > > > [... longish example snipped ...] > > > > > >>> ma.array([1,1], mask=[0,1]).sum() > > > > 1 > So ? The result is not `masked`, the missing value has been omitted. > I am just making your point with a shorter example. > [...] > Mrf. I'm still not convinced, but I have nothing against it. Along with a > mask=False_ by default ? > It looks like there is little opposition here. I'll submit a patch soon and unless better names are suggested, it will probably go in. > > With the current behavior, how would you achieve masking (no fill) of a.sum()? > Er, why would I want to get MA.masked along one axis if one value is masked ? Because if you don't know one of the addends you don't know the sum. Replacing missing values with zeros is not always the right strategy. If you know that your data has non-zero mean, for example, you might want to replace missing values with the mean instead of zero.
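A hypothetical helper sketching the propagating behavior argued for here (numpy.ma's own sum omits masked values instead):

    import numpy.ma as ma

    def strict_sum(a, axis=None):
        return ma.array(a.filled(0).sum(axis),
                        mask=ma.getmaskarray(a).any(axis))

    a = ma.array([[1, 1], [1, 1]], mask=[[0, 1], [1, 0]])
    print(strict_sum(a, axis=0))  # [-- --]: each column has a masked entry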
> The current behavior is to mask only if all the values along that axis are > masked: > > MA.array([[1,1],[1,1]], mask=[[0,1],[1,1]]).sum() > array(data = [1 999999], mask = [False True], fill_value=999999) > I did not realize that, but it is really bad. What is the justification for this? In R:

    > sum(c(NA,NA), na.rm=TRUE)
    [1] 0

What does MATLAB do in this case? > With a.filled(0).sum(), how would you distinguish between the cases (a) at > least one value is not masked and (b) all values are masked ? (OK, by > querying the mask with something in the line of a._mask.all(axis), but it's > longer... Oh well, I'll just have to adapt) > Exactly. Explicit is better than implicit. The Zen of Python. > > > - this behavior was already in Numeric > > That's true, but it makes the result of sum(a) different from > > __builtins__.sum(a). I believe consistency with the python > > conventions is more important than with legacy Numeric in the long > > run. > > > > Array methods are a very recent addition to ma. We can still use this > > window of opportunity to get things right before too many people get > > used to the wrong behavior. (Note that I changed your implementation > > of cumsum and cumprod.) > > Good points... We'll just have to put strong warnings everywhere. > Do you agree with my proposal as long as we have explicit warnings in the documentation that methods behave differently from legacy functions? > [... GIS comment snipped ...] > > With the flag approach, making ndarray and ma.array interfaces > > consistent would require adding an extra argument to many methods. > > Instead, I propose to add one method, filled, to ndarray. > OK, good point. > > On a semantic aspect: > While digging these GRASS scripts I mentioned, I realized/remembered that > masked values are called 'null', when there's no data, a NaN, or just when > you want to hide some values. What about 'null' instead of > 'mask', 'missing', 'na' ? > I don't think "null" returning an array of bools will create a lot of enthusiasm. It sounds more like ma.masked as in a[i] = ma.masked. Besides, there is probably a reason why python uses the name "None" instead of "Null" - I just don't know what it is :-).

From tim.hochberg at cox.net Mon Apr 10 16:09:03 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Mon Apr 10 16:09:03 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <443ADF9A.9050001@ee.byu.edu> References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> <4436FF73.7080408@cox.net> <443ADF9A.9050001@ee.byu.edu> Message-ID: <443AE5C7.8010804@cox.net> Travis Oliphant wrote: > >> This may be an opportune time to propose something that's been cooking >> in the back of my head for a week or so now: A stripped down array >> superclass. The details of this are not at all locked down, but >> here's a strawman proposal. > > This is in essence what I've been proposing since SciPy 2005. I want > what goes into Python to be essentially just this super-class. > Look at this http://numeric.scipy.org/array_interface.html > > and check out this > > svn co http://svn.scipy.org/svn/PEP arrayPEP > > I've obviously been way over-booked to do this myself. Nick > Coghlan expressed interest in this idea (he called it dimarray, but I > like basearray better). I'll look these over. I suppose I should have been paying more attention before! >> >> We add an array superclass, call it basearray, that has the same >> C-structure as the existing ndarray.
>> However, it has *no* methods or >> attributes. > > Why not give it the attributes corresponding to its C-structure? I'm > happy with no methods though. Mainly because I didn't want to argue too much about whether a given method or attribute was a good idea, and I was in a hurry when I tossed that proposal out. It seemed better to start with the most stripped down proposal I could come up with and see what people demanded I add. I'm actually sort of inclined to give it *read-only* attributes associated with the C-structure, but no methods. That way you can examine the shape, type, etc., but you can't set them [I'm specifically thinking of shape here, but there may be others]. I think that there are cases where you don't want the base array to be mutable at all, but I don't think introspection should be a problem. If the attributes were settable, you could always override them with readonly properties, but it'd be cleaner to just start with readonly functionality and add setability (is that a word?) only in those cases where it's needed. > >> 1. If we're careful, this could be the basic array object that >> we propose, at least for the first round, for inclusion in the >> Python core. It's not useful for anything but passing data between >> various applications that understand the data structure, but that in >> itself could be a huge win. And the fact that it's dirt simple would >> probably be an advantage to getting it into the core. > > The only extra thing I'm proposing is to add the data-descriptor > object into the Python core as well --- otherwise what do you do > with the PyArray_Descr * part of the C-structure? Good point. > >> 2. It provides a useful marker class. MA could inherit from it >> (and use itself for its data attribute) and then asanyarray would >> behave properly. MA could also use this, or a subclass, as the mask >> object, preventing anyone from accidentally using it as data (they >> could always use it on purpose with asarray). > >> 3. It provides a platform for people to build other, >> ndarray-like classes in pure Python. This is my main interest. I've >> put together a thin shell over numpy that strips it down to its >> absolute essentials including a stripped down version of ndarray that >> removes most of the methods. All of the __array_wrap__[1] stuff >> works quite well most of the time, but there are still some issues >> with being a subclass when this particular class is conceptually a >> superclass. If we had an array superclass of some sort, I believe >> that these would be resolved. >> >> In principle at least, this shouldn't be that hard. I think it should >> mostly be rearranging some code and adding some wrappers to existing >> functions. That's in principle. In practice, I'm not certain yet as I >> haven't investigated the code in question in much depth yet. I've >> been meaning to write this up into a more fleshed out proposal, but I >> got distracted by the whole Protocol discussion on python-dev3000. >> This writeup is pretty weak, but hopefully you get the idea. > > This is exactly what needs to be done to improve array-support in > Python. This is the conclusion I came to and I'm glad to see that Tim > has now basically reached the same conclusion. There are obviously > some details to work out. But, having a base structure to inherit > from would be perfect. > Hmm. This idea seems to have a fair bit of consensus behind it. I guess that means I'd better look into exactly what it would take to make it work. The details of what attributes to expose, etc. are probably not too important to work out immediately. Regards, -tim
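A pure-Python strawman of the proposed basearray: just the fields of the C structure, exposed read-only, with no arithmetic and no other methods (a hypothetical sketch following the discussion, not a real API):

    class basearray(object):
        def __init__(self, data, shape, dtype, strides):
            self._data = data
            self._shape = tuple(shape)
            self._dtype = dtype
            self._strides = tuple(strides)

        # read-only views of the C-level fields
        shape = property(lambda self: self._shape)
        dtype = property(lambda self: self._dtype)
        strides = property(lambda self: self._strides)
        data = property(lambda self: self._data)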
The details of what attributes to expose, etc are probably not too important to work out immediately. Regards, -tim From pierregm at engr.uga.edu Mon Apr 10 16:24:01 2006 From: pierregm at engr.uga.edu (Pierre GM) Date: Mon Apr 10 16:24:01 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <443AC5CB.2000704@cox.net> References: <200604101638.29979.pgmdevlist@mailcan.com> <443AC5CB.2000704@cox.net> Message-ID: <200604101923.36290.pierregm@engr.uga.edu> > [Sasha] > > So ? The result is not `masked`, the missing value has been omitted. > I am just making your point with a shorter example. OK, now I get it :) > >Er, why would I want to get MA.masked along one axis if one value is > > masked ? > > [Tim] > Any number of reasons I would think. I understand that, and I eventually agree it should be the default. > [Sasha] > Because if you don't know one of the addends you don't know the sum. Unless you want to discard some data on purpose. > Replacing missing values with zeros is not always the right strategy. > If you know that your data has non-zero mean, for example, you might > want to replace missing values with the mean instead of zero. Hence the need to get rid of filled_values >[Tim] > Actually I'm going to ask you the same question. Why would care if all > of the values are masked? > > MA.array([[1,1],[1,1]],mask=[[0,1],[1,1]]).sum() > > array(data = [1 999999], mask = [False True], fill_value=999999) > > [Sasha] > I did not realize that, but it is really bad. What is the > justification for this? Masked values are not necessarily nans or missing. I quite regularly mask values that do not satisfy a given condition. For various reasons, I can't compress the array, I need to preserve its shape. With the current behavior, a.sum() gives me the sum of the values that satisfy the condition. If there's no such value, the result is masked, and that way I know that the condition was never met. Here, I could use Sasha's method combined with a._mask.all, no problem Another example: let x a 2D array with missing values, to be normalized along one axis. Currently, x/x.sum() give the result I want (provided it's true division). Sasha's method would give me a completely masked array. > > Good points... We'll just have to put strong warnings everywhere. > [Sasha] > Do you agree with my proposal as long as we have explicit warnings in > the documentation that methods behave differently from legacy > functions? Your points are quite valid. I'm just worried it's gonna break a lot of things in the next future. And where do we stop ? So, if we follow Sasha's way: x.prod() should be the same, right ? What about a.min(), a.max() ? a.mean() ? From oliphant at ee.byu.edu Mon Apr 10 16:37:06 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 10 16:37:06 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist In-Reply-To: <44366E71.7060601@gmail.com> References: <4433DF85.7030109@gmail.com> <4434E31B.5030306@ieee.org> <44366E71.7060601@gmail.com> Message-ID: <443AEC07.5070904@ee.byu.edu> Andrew Jaffe wrote: > Travis Oliphant wrote: > >> But, this brings up the point that currently the pickled raw-data >> which is read-in as a string by Python is used as the memory for the >> new array (i.e. the string memory is "stolen"). This should work. >> The fact that it didn't with sort was a bug that is now fixed in >> SVN. However, operations on out-of-byte-order arrays will always be >> slower. 
Thus, perhaps on pickle read the data should be copied to >> native byte-order if necessary. > > > +1 from me, too. I assume that byteswapping is fast compared to I/O in > most cases, and the only times when you wouldn't want it would be > 'advanced' usage that the developer could take control of via a custom > reduce, __getstate__, __setstate__, etc. > There was one reasonable objection, and one proposal to further complicate the array object to handle both cases :-) But most were supportive of automatic conversion to the platform byte-order on pickle-read. This is probably what most people expect if they are using Pickle anyway. So, I've added it to SVN. -Travis From michael.sorich at gmail.com Mon Apr 10 16:45:07 2006 From: michael.sorich at gmail.com (Michael Sorich) Date: Mon Apr 10 16:45:07 2006 Subject: [Numpy-discussion] Recarray and shared datas In-Reply-To: <200604061020.k36AKIsQ018238@decideur.info> References: <200604061020.k36AKIsQ018238@decideur.info> Message-ID: <16761e100604101644v1c447aa1xb646e1d44d8672f8@mail.gmail.com> On 4/6/06, Benjamin Thyreau wrote: > > Hi, > Numpy has a nice feature of recarray, ie. record which can hold columns > names. > I'd like to use such a feature in order to better interact with R, ie. > passing > R datas to python without copy. The current rpy bindings do a full copy, > and > convert to simple ndarray. Looking at the recarray api in the Guide, > and also at the source code, i don't find any recarray constructor which > can > get shared datas (all the examples from section 8.6 are doing copies). > Is there some way to do it ? in Python or in C ? Or is there any plans to > ? As a current user of rpy (at least until I can easily do the equivalent in numpy/scipy) this sound very interesting. What will happen if the R data.frame has NA data? I don't think the recarray can currently handle masked data. Oh well, one step forward at a time. Good luck. Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.sorich at gmail.com Mon Apr 10 17:18:15 2006 From: michael.sorich at gmail.com (Michael Sorich) Date: Mon Apr 10 17:18:15 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: <200604101356.44903.pgmdevlist@mailcan.com> <200604101638.29979.pgmdevlist@mailcan.com> Message-ID: <16761e100604101717y6a8dbecat4800d8a77bb3615a@mail.gmail.com> On 4/11/06, Sasha wrote: > > On 4/10/06, Pierre GM wrote: > > > > [... longish example snipped ...] > > > > > > > >>> ma.array([1,1], mask=[0,1]).sum() > > > > > > 1 > > So ? The result is not `masked`, the missing value has been omitted. > > > I am just making your point with a shorter example. > > > [...] > > Mrf. I'm still not convinced, but I have nothing against it. Along with > a > > mask=False_ by default ? > > > It looks like there is little opposition here. I'll submit a patch > soon and unless better names are suggested, it will probably go in. > > > > With the current behavior, how would you achieve masking (no fill) > a.sum()? > > Er, why would I want to get MA.masked along one axis if one value is > masked ? > > Because if you don't know one of the addends you don't know the sum. > Replacing missing values with zeros is not always the right strategy. > If you know that your data has non-zero mean, for example, you might > want to replace missing values with the mean instead of zero. I feel that in general implicitly replacing masked values will definitely lead to bugs in my code. 
Unless it is really obvious what the best way to deal with the masked
values is for the particular function, I would definitely prefer to be
explicit about it. In most cases there are a number of reasonable
options for what can be done. Masking the result when masked values are
involved seems the most transparent default option. For example, it
gives me a really bad feeling to think that sum will automatically
return the sum of all non-masked values. When dealing with large
datasets, I will not always know when I need to be careful of missing
values. Summing over the non-masked arrays will often not be the
appropriate course and I fear that I will not notice that this has
actually occurred. If masked values are returned it is pretty obvious
what has happened and easy to go back and explicitly handle the masked
data in another way if appropriate.

Mike
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ndarray at mac.com Mon Apr 10 19:46:00 2006
From: ndarray at mac.com (Sasha)
Date: Mon Apr 10 19:46:00 2006
Subject: [Numpy-discussion] Recarray and shared datas
In-Reply-To: <16761e100604101644v1c447aa1xb646e1d44d8672f8@mail.gmail.com>
References: <200604061020.k36AKIsQ018238@decideur.info>
	<16761e100604101644v1c447aa1xb646e1d44d8672f8@mail.gmail.com>
Message-ID:

This thread probably belongs to rpy-list, so I'll cross-post.

I may be wrong, but I think R data frames are stored column-wise unlike
recarrays. This also means that data sharing between R and numpy is
feasible even without recarrays. RPy support for doing this should
probably wait until RPy 2.0, when R objects become wrapped in a Python
type. That type will need to provide the __array_struct__ interface to
allow data sharing.

NA data handling in numpy is a topic of an active discussion now. A
numpy array with data shared with an R vector will see NAs differently
for different types. For ints, it will be INT_MIN (-2^31 on 32-bit
machines); for floats it will be a NaN with some special bit-pattern in
the mantissa and thus not fully compatible with numpy's nan.

I would like to use this cross-post as an opportunity to invite RPy
users to participate in numpy's discussion of missing (or masked)
values. See the "ndarray.fill and ma.array.filled" thread.

On 4/10/06, Michael Sorich wrote:
> On 4/6/06, Benjamin Thyreau wrote:
> > Hi,
> > Numpy has a nice feature of recarray, ie. record which can hold
> > columns names.
> > I'd like to use such a feature in order to better interact with R,
> > ie. passing R datas to python without copy. The current rpy bindings
> > do a full copy, and convert to simple ndarray. Looking at the
> > recarray api in the Guide, and also at the source code, i don't find
> > any recarray constructor which can get shared datas (all the examples
> > from section 8.6 are doing copies).
> > Is there some way to do it ? in Python or in C ? Or is there any
> > plans to ?
>
> As a current user of rpy (at least until I can easily do the equivalent
> in numpy/scipy) this sound very interesting. What will happen if the R
> data.frame has NA data? I don't think the recarray can currently handle
> masked data. Oh well, one step forward at a time. Good luck.
> > Mike > > > From tim.hochberg at cox.net Mon Apr 10 19:49:01 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Mon Apr 10 19:49:01 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <443AE0A1.3000002@ee.byu.edu> References: <4436FF73.7080408@cox.net> <200604072258.34153.pgmdevlist@mailcan.com> <443AE0A1.3000002@ee.byu.edu> Message-ID: <443B1957.7060301@cox.net> Travis Oliphant wrote: > Pierre GM wrote: > >>> decide to get rid of "putmask". >>> >> >> >> "putmask" really seems overkill indeed. I wouldn't miss it. >> >> > > I'm not opposed to getting rid of putmask either. Several of the > newer methods are open for discussion before 1.0. I'd have to check > to be sure, but .take and .put are not entirely replaced by > fancy-indexing. Also, fancy indexing has enough overhead that a > method doing exactly what you want is faster. I'm curious, what use cases does fancy indexing not handle that take works for? Not counting speed issues. Regards, -tim From bsouthey at gmail.com Tue Apr 11 12:47:02 2006 From: bsouthey at gmail.com (Bruce Southey) Date: Tue Apr 11 12:47:02 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <200604101923.36290.pierregm@engr.uga.edu> References: <200604101638.29979.pgmdevlist@mailcan.com> <443AC5CB.2000704@cox.net> <200604101923.36290.pierregm@engr.uga.edu> Message-ID: Hi, My view is solely as user so I really do appreciate the thought that you all are putting into this! I am somewhat concerned that having to use filled() is an extra level of complexity and computational burden. For example, in computing the mean/average I using filled would require a one effort to get the sum and another to count the non-masked elements. For at least summation would it make more sense to add an optional flag(s) such that there appears little difference between a normal array and a masked array? For example, a.sum() is the current default a.sum(filled_value=x) where x is some value such as zero or other user defined value. a.sum(ignore_mask=True) or similar to address whether or not masked values should be used. I am also not clear on what happens with other operations or dimensions. Regards Bruce On 4/10/06, Pierre GM wrote: > > [Sasha] > > > So ? The result is not `masked`, the missing value has been omitted. > > I am just making your point with a shorter example. > > OK, now I get it :) > > > > >Er, why would I want to get MA.masked along one axis if one value is > > > masked ? > > > > [Tim] > > Any number of reasons I would think. > > I understand that, and I eventually agree it should be the default. > > > [Sasha] > > Because if you don't know one of the addends you don't know the sum. > Unless you want to discard some data on purpose. > > > Replacing missing values with zeros is not always the right strategy. > > If you know that your data has non-zero mean, for example, you might > > want to replace missing values with the mean instead of zero. > Hence the need to get rid of filled_values > > >[Tim] > > Actually I'm going to ask you the same question. Why would care if all > > of the values are masked? > > > > MA.array([[1,1],[1,1]],mask=[[0,1],[1,1]]).sum() > > > array(data = [1 999999], mask = [False True], fill_value=999999) > > > > [Sasha] > > I did not realize that, but it is really bad. What is the > > justification for this? > > Masked values are not necessarily nans or missing. I quite regularly mask > values that do not satisfy a given condition. 
For various reasons, I can't > compress the array, I need to preserve its shape. > > With the current behavior, a.sum() gives me the sum of the values that satisfy > the condition. If there's no such value, the result is masked, and that way I > know that the condition was never met. Here, I could use Sasha's method > combined with a._mask.all, no problem > > Another example: let x a 2D array with missing values, to be normalized along > one axis. Currently, x/x.sum() give the result I want (provided it's true > division). Sasha's method would give me a completely masked array. > > > > > Good points... We'll just have to put strong warnings everywhere. > > [Sasha] > > Do you agree with my proposal as long as we have explicit warnings in > > the documentation that methods behave differently from legacy > > functions? > > Your points are quite valid. I'm just worried it's gonna break a lot of things > in the next future. And where do we stop ? So, if we follow Sasha's way: > x.prod() should be the same, right ? What about a.min(), a.max() ? a.mean() ? > > > > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From travis at enthought.com Tue Apr 11 13:11:04 2006 From: travis at enthought.com (Travis N. Vaught) Date: Tue Apr 11 13:11:04 2006 Subject: [Numpy-discussion] ANN: SciPy 2006 Conference Message-ID: <443C0D36.80608@enthought.com> Greetings, The *SciPy 2006 Conference* is scheduled for August 17-18, 2006 at CalTech. A tremendous amount of work has gone into SciPy and Numpy over the past few months, and the scientific python community around these and other tools has truly flourished[1]. The Scipy 2006 Conference is an excellent opportunity to exchange ideas, learn techniques, contribute code and affect the direction of scientific computing with Python. Conference details are at http://www.scipy.org/SciPy2006 Keynote ------- Python language author Guido van Rossum (!) has agreed to be the Keynote speaker at this year's Conference. http://www.python.org/~guido/ Registration: ------------- Registration is now open. You may register early online for $100.00 at http://www.enthought.com/scipy06. Registration includes breakfast and lunch Thursday & Friday and a very nice dinner Thursday night. After July 14, 2006, registration will cost $150.00. Call for Presenters ------------------- If you are interested in presenting at the conference, you may submit an abstract in Plain Text, PDF or MS Word formats to abstracts at scipy.org -- the deadline for abstract submission is July 7, 2006. Papers and/or presentation slides are acceptable and are due by August 4, 2006. Tutorial Sessions ----------------- Several people have expressed interest in attending a tutorial session. The Wednesday before the conference might be a good day for this. Please email the list if you have particular topics that you are interested in. 
Here's a preliminary list: - Migrating from Numeric or Numarray to Numpy - 2D Visualization with Python - 3D Visualization with Python - Introduction to Scientific Computing with Python - Building Scientific Simulation Applications - Traits/TraitsUI Please rate these and add others in a subsequent thread to the SciPy-user mailing list. Perhaps we can pick 4-6 top ideas and recruit speakers as demand dictates. The authoritative list will be tracked here: http://www.scipy.org/SciPy2006/TutorialSessions Coding Sprints -------------- If anyone would like to arrive earlier (Monday and Tuesday the 14th and 15th of August), we can borrow a room on the CalTech campus to sit and code against particular libraries or apps of interest. Please register your interest in these coding sprints on the SciPy-user mailing list as well. The authoritative list will be tracked here: http://www.scipy.org/SciPy2006/CodingSprints Mailing list address: scipy-user at scipy.org Mailing list archives: http://dir.gmane.org/gmane.comp.python.scientific.user Mailing list signup: http://www.scipy.net/mailman/listinfo/scipy-user [1] Some stats: NumPy has averaged over 16,000 downloads per month Sept. 05 to March 06. SciPy has averaged over 3,800 downloads per month in Feb. and March 06. (both scipy and numpy figures do not include the 2000 instances per month downloaded as part of the Python Enthought Edition Distribution for Windows.) From rowen at cesmail.net Tue Apr 11 13:32:14 2006 From: rowen at cesmail.net (Russell E. Owen) Date: Tue Apr 11 13:32:14 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled References: <4436AE31.7000306@cox.net> Message-ID: In article , Sasha wrote: > I disagree. Numpy is pretty much alone among the array languages because it > does not have "native" support for missing values. For the floating point > types some rudimental support for nans exists, but is not really usable. > There is no missing values machanism for integer types. I believe adding > "filled" and maybe "mask" to ndarray (not necessarily under these names) > could be a meaningful step towards "native" support for missing values. I completely agree with this. I would really like to see proper native support for arrays with masked values in numpy (such that all ufuncs, functions, etc. work with masked arrays). I would be thrilled to be able to filter masked arrays, for instance. -- Russell From tim.hochberg at cox.net Tue Apr 11 16:15:04 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Tue Apr 11 16:15:04 2006 Subject: [Numpy-discussion] Let's blame Java [was ndarray.fill and ma.array.filled] In-Reply-To: References: <200604101638.29979.pgmdevlist@mailcan.com> <443AC5CB.2000704@cox.net> <200604101923.36290.pierregm@engr.uga.edu> Message-ID: <443C38BE.8090606@cox.net> As I understand it, the goal that Sasha is pursuing here is to make masked arrays and normal arrays interchangeable as much as practical. I believe that there is reasonable consensus that this is desirable. Sasha has proposed a compromise solution that adds minimal attributes to ndarray while allowing a lot of interoperability between ma and ndarray. However it has it's clunky aspects as evidenced by the pushback he's been getting from masked array users. Here's one example. In the masked array context it seems perfectly reasonable to pass a fill value to sum. That is: x.sum(fill=0.0) But, if you want to preserve interoperability, that means you have to add fill arguments to all of the ndarray methods and what do you have? A mess! 
Particularly if some *other* package comes along that we decide is
important to support in the same manner as ma. Then we have another set
of methods or keyword args that we need to tack on to ndarray. Ugh!

However, I know who, or rather what, to blame for our problems: the
object-oriented hype industry in general and Java in particular <0.1
wink>. Why? Because the root of the problem here is the move from
functions to methods in numpy. I appreciate a nice method as much as
the next person, but they're not always better than the equivalent
function and in this case they're worse.

Let's fantasize for a minute that most of the methods of ndarray
vanished and instead we went back to functions. Just to show that I'm
not a total purist, I'll let the mask attribute stay on both
MaskedArray and ndarray. However, filled bites the dust on *both*
MaskedArray and ndarray just like the rest. How would we deal with sum
then? Something like this:

# ma.py

def filled(x, fill):
    x = x.copy()
    if x.mask is not False:
        x[x.mask] = fill   # write the fill value into the masked slots
        x.unmask()         # then drop the mask
    return x

def sum(x, axis, fill=None):
    if fill is not None:
        x = filled(x, fill)
    # I'm blowing off the correct treatment of the fill=None case here
    # because I'm lazy
    return add.reduce(x, axis)

# numpy.py (or __init__ or oldnumeric or something)

def sum(x, axis):
    if x.mask is not False:
        raise ValueError("use ma.sum for masked arrays")
    return add.reduce(x, axis)

[Fixing the fill=None case and dealing correctly with dtype is left as
an exercise for the reader.]

All of a sudden all of the problems we're running into go away. Users
of masked arrays simply use the functions from ma and can use ndarrays
and masked arrays interchangeably. On the other hand, users of
non-masked arrays aren't burdened with the extra interface and if they
accidentally get passed a masked array they quickly find out about it
(you don't want to be accidentally using masked arrays in an
application that doesn't expect them -- that way lies disaster).

I realize that railing against methods is tilting at windmills, but
somehow I can't help myself ;-|

Regards,

-tim

From aisaac at american.edu Tue Apr 11 20:45:01 2006
From: aisaac at american.edu (Alan G Isaac)
Date: Tue Apr 11 20:45:01 2006
Subject: [Numpy-discussion] reminder: dtype for empty, zeros, ones
Message-ID:

I notice that the empty, ones, and zeros still have an integer default
dtype (numpy 0.9.6). I had the impression that this was slated to
change to a float dtype, on the reasonable assumption that new users
will otherwise be surprised. Perhaps I remember this incorrectly.

Cheers,
Alan Isaac

From tim.hochberg at cox.net Tue Apr 11 21:27:00 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Tue Apr 11 21:27:00 2006
Subject: [Numpy-discussion] Let's blame Java [was ndarray.fill and
	ma.array.filled]
In-Reply-To: <443C38BE.8090606@cox.net>
References: <200604101638.29979.pgmdevlist@mailcan.com>
	<443AC5CB.2000704@cox.net> <200604101923.36290.pierregm@engr.uga.edu>
	<443C38BE.8090606@cox.net>
Message-ID: <443C81E2.4090800@cox.net>

[Tim rants a lot]

Just to be clear, I'm not advocating getting rid of methods. I'm not
advocating anything, that just seems to get me into trouble ;-)

I still blame Java though.
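For what it's worth, usage under the function-style sketch in the
previous message (a hypothetical API; the module names are just the
ones from that sketch) would look something like:

    import ma, numpy
    s = ma.sum(x, axis=0, fill=0.0)  # masked-aware, fill made explicit
    t = numpy.sum(y, axis=0)         # plain arrays; raises on masked input

so the choice of semantics is visible at the call site.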
Regards,

-tim

From stefan at sun.ac.za Tue Apr 11 22:47:14 2006
From: stefan at sun.ac.za (Stefan van der Walt)
Date: Tue Apr 11 22:47:14 2006
Subject: [Numpy-discussion] sqrt and divide
Message-ID: <20060412054517.GA27756@sun.ac.za>

Hi all

Two quick questions regarding unintuitive numpy behaviour:

Why is the square root of -1 not equal to the square root of -1+0j?

In [5]: N.sqrt(-1.)
Out[5]: nan

In [6]: N.sqrt(-1.+0j)
Out[6]: 1j

Is there an easier way of dividing two scalars than using divide?

In [9]: N.divide(1.,0)
Out[9]: inf

(also

In [8]: N.divide(1,0)
Out[8]: 0

should probably return inf / nan?)

Regards
Stéfan

From robert.kern at gmail.com Tue Apr 11 23:16:03 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Tue Apr 11 23:16:03 2006
Subject: [Numpy-discussion] Re: sqrt and divide
In-Reply-To: <20060412054517.GA27756@sun.ac.za>
References: <20060412054517.GA27756@sun.ac.za>
Message-ID:

Stefan van der Walt wrote:
> Hi all
>
> Two quick questions regarding unintuitive numpy behaviour:
>
> Why is the square root of -1 not equal to the square root of -1+0j?
>
> In [5]: N.sqrt(-1.)
> Out[5]: nan
>
> In [6]: N.sqrt(-1.+0j)
> Out[6]: 1j

It is frequently the case that the argument being passed to sqrt() is
expected to be non-negative and all of the surrounding code strictly
deals with numbers in the real domain. If the argument happens to be
negative, then it is a sign of a bug earlier in the code or a floating
point instability. Returning nan gives the programmer the opportunity
for sqrt() to complain loudly and expose bugs instead of silently
upcasting to a complex type. Programmers who *do* want to work in the
complex domain can easily perform the cast explicitly.

> Is there an easier way of dividing two scalars than using divide?
>
> In [9]: N.divide(1.,0)
> Out[9]: inf

x/y ?

> (also
>
> In [8]: N.divide(1,0)
> Out[8]: 0
>
> should probably return inf / nan?)

inf and nan are floating point values. The definition of int division
used when both arguments to divide() are ints also yields ints, not
floats.

--
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco
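For illustration, the explicit cast Robert describes amounts to this (a
minimal sketch; expected outputs shown as comments):

    import numpy as N
    N.sqrt(-1.)           # nan: the computation stays in the real domain
    N.sqrt(complex(-1.))  # 1j: the caller opts in to the complex domain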
From faltet at carabos.com Wed Apr 12 01:51:12 2006
From: faltet at carabos.com (Francesc Altet)
Date: Wed Apr 12 01:51:12 2006
Subject: [Numpy-discussion] Tiling / disk storage for matrix in numpy?
In-Reply-To:
References:
Message-ID: <200604121050.15552.faltet@carabos.com>

On Friday 07 April 2006 19:30, Webb Sprague wrote:
> Hi all,
>
> Is there a way in numpy to associate a (large) matrix with a disk
> file, then tile and index it, then cache it as you process the
> various pieces? This is pretty important with massive image files,
> which can't fit into working memory, but in which (for example) you
> might be doing a convolution on a 100 x 100 pixel window on a small
> subset of the image.
>
> I know that caching algorithms are (1) complicated and (2) never
> general. But there you go.
>
> Perhaps I can't find it, perhaps it would be a good project for the
> future? If HDF or something does this already, could someone point me
> in the right direction?

In addition to using shared memory arrays, you may also want to
experiment with compressing images on-disk and reading small chunks to
operate with them in-memory. This has the advantage that, if your image
is compressible enough (and most of them are), the total size of the
image in-file will be smaller, leaving more room for the underlying OS
filesystem cache to fit larger areas of the image.

Here is a small PyTables program that exemplifies the concept:

import tables
import numpy

# Create a container for the image in the file
f = tables.openFile('image.h5', 'w')
img = f.createEArray(f.root, 'img',
                     tables.Atom(shape=(1024,0), dtype='Int32',
                                 flavor='numpy'),
                     filters=tables.Filters(complevel=1),
                     expectedrows=1024)

# Add 1024 rows to the image
for i in xrange(1024):
    img.append((numpy.random.randn(1024,1)*1024).astype('int32'))
img.flush()

# Get small chunks of the image in memory and operate with them
cs = 100
for i in xrange(0, 1024-2*cs, cs):
    # Get 100x100 squares
    chunk1 = img[i:i+cs, i:i+cs]
    chunk2 = img[i+cs:i+2*cs, i+cs:i+2*cs]
    chunk3 = chunk1*chunk2  # Trivial operation with them

f.close()

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

From stefan at sun.ac.za Wed Apr 12 05:43:27 2006
From: stefan at sun.ac.za (Stefan van der Walt)
Date: Wed Apr 12 05:43:27 2006
Subject: [Numpy-discussion] Vectorize bug
Message-ID: <20060412124032.GA30471@sun.ac.za>

Hello all

Vectorize segfaults for large arrays. I filed the bug at
http://projects.scipy.org/scipy/numpy/ticket/52

The offending code is

import numpy as N
x = N.linspace(-3,2,10000)
y = N.vectorize(lambda x: x)
# Segfaults here
y(x)

Regards
Stéfan

From cimrman3 at ntc.zcu.cz Wed Apr 12 05:59:28 2006
From: cimrman3 at ntc.zcu.cz (Robert Cimrman)
Date: Wed Apr 12 05:59:28 2006
Subject: [Numpy-discussion] shape setting problem
Message-ID: <443CF984.9070306@ntc.zcu.cz>

Hi,

I have found a weird behaviour when setting the shape of a view of an
array, see below...

r.
---
In [43]:a = nm.zeros( (10,5) )

In [44]:b = a[:,2]

In [47]:b.fill( 3 )

In [48]:a
Out[48]:
array([[0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0]])
-------------------------------------------ok

In [49]:b.fill( 0 )

In [50]:a
Out[50]:
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

In [51]:b.shape = (5,2)

In [52]:b
Out[52]:
array([[0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0]])

In [53]:b.fill( 3 )

In [54]:a
Out[54]:
array([[0, 0, 3, 3, 3],
       [3, 3, 3, 3, 3],
       [3, 3, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])
------------------------------------ wrong?

Should not this give the same result as Out[48]?

From aisaac at american.edu Wed Apr 12 06:11:11 2006
From: aisaac at american.edu (Alan G Isaac)
Date: Wed Apr 12 06:11:11 2006
Subject: [Numpy-discussion] Re: sqrt and divide
In-Reply-To:
References: <20060412054517.GA27756@sun.ac.za>
Message-ID:

> Stefan van der Walt wrote:
>> In [8]: N.divide(1,0)
>> Out[8]: 0
>> should probably return inf / nan?)

On Wed, 12 Apr 2006, Robert Kern apparently wrote:
> inf and nan are floating point values. The definition of
> int division used when both arguments to divide() are ints
> also yields ints, not floats.

But the Python behavior seems better for this case.

>>> 1/0
Traceback (most recent call last):
  File "", line 1, in ?
ZeroDivisionError: integer division or modulo by zero

fwiw,
Alan Isaac

From tim.hochberg at cox.net Wed Apr 12 08:36:05 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Wed Apr 12 08:36:05 2006
Subject: [Numpy-discussion] Re: sqrt and divide
In-Reply-To:
References: <20060412054517.GA27756@sun.ac.za>
Message-ID: <443D1E2B.5040604@cox.net>

Robert Kern wrote:

>Stefan van der Walt wrote:
>
>>Hi all
>>
>>Two quick questions regarding unintuitive numpy behaviour:
>>
>>Why is the square root of -1 not equal to the square root of -1+0j?
>>
>>In [5]: N.sqrt(-1.)
>>Out[5]: nan
>>
>>In [6]: N.sqrt(-1.+0j)
>>Out[6]: 1j
>>
>
>It is frequently the case that the argument being passed to sqrt() is
>expected to be non-negative and all of the surrounding code strictly
>deals with numbers in the real domain. If the argument happens to be
>negative, then it is a sign of a bug earlier in the code or a floating
>point instability. Returning nan gives the programmer the opportunity
>for sqrt() to complain loudly and expose bugs instead of silently
>upcasting to a complex type. Programmers who *do* want to work in the
>complex domain can easily perform the cast explicitly.
>
>>Is there an easier way of dividing two scalars than using divide?
>>
>>In [9]: N.divide(1.,0)
>>Out[9]: inf
>
>x/y ?
>
>>(also
>>
>>In [8]: N.divide(1,0)
>>Out[8]: 0
>>
>>should probably return inf / nan?)
>
>inf and nan are floating point values. The definition of int division
>used when both arguments to divide() are ints also yields ints, not
>floats.

This relates to the discussion that Travis and I were having about
error handling last week. The current defaults for handling errors are
to ignore them all. This is for speed reasons, although our discussion
may have alleviated some of these. The numarray default was to ignore
underflow, but warn for the rest; this seemed to work well in practice.
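In numpy's seterr terms that numarray default would be roughly the
following (a sketch; 'under' and 'divide' are the spellings used
elsewhere in this thread, and I'm assuming 'over' and 'invalid' follow
the same pattern):

    import numpy
    numpy.seterr(under='ignore', over='warn', divide='warn',
                 invalid='warn')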
However, this example points in another possible direction.... Travis
mentioned that checking the various error conditions in integer
operations was painful and slowed things down since there wasn't
machine support for it. My current opinion is that we should just punt
on overflow and let integers overflow silently. That's what bit
twiddlers want anyway and it'll be somewhere between difficult and
impossible to do a good job. I don't think invalid and underflow apply
to integers, so that leaves divide. I think my preference here would be
for int divide to raise by default. That would require that there be
five error classes, shown here with my preferred defaults:

    divide_by_zero="warn", overflow="warn", underflow="ignore",
    invalid="warn"
    int_divide_by_zero="raise"

The first four apply to floating point (and complex) operations, while
the last applies to integer operations. The separation of warnings into
two classes also helps avoid the expectation that we should be doing
something useful about integer overflow.

I don't *think* this should be too difficult; just stick an
int_divide_by_zero flag on some thread-local variable and set it to
true when there's been a divide by zero, checking on the way out of the
ufunc machinery. I haven't tried it though, so it may be much harder
than I envision.

In any event, the current divide by zero checking seems to be a bit
broken. I took a quick look at the code and it's not obvious why
(unless my optimizer is eliding the error generation code?). This is
the behaviour I see under windows compiled using VC7:

>>> one = np.array(1)
>>> zero = np.array(0)
>>> one/zero
0
>>> np.seterr(divide='raise')
>>> one/zero # Should raise an error
0
>>> (one*1.0 / zero) # Works for floats though?!
Traceback (most recent call last):
  File "", line 1, in ?
FloatingPointError: divide by zero encountered in divide

Regards,

-tim

From pfdubois at gmail.com Wed Apr 12 13:00:04 2006
From: pfdubois at gmail.com (Paul Dubois)
Date: Wed Apr 12 13:00:04 2006
Subject: [Numpy-discussion] Seeking articles for special issue on Python
	and Science and Engineering
Message-ID:

IEEE's magazine, Computing in Science and Engineering (CiSE), has asked
me to put together a theme issue on the use of Python in Science and
Engineering. I will write an overview to be accompanied by 3-5 articles
of a few pages (say 3000 words or so) each. The deadline for
manuscripts will be in the Fall and publication early next year.

I would like to select articles that show a diverse set of applications
or tools, to give our readers a sense of whether or not Python might be
useful in their own work. I will tailor the overview to "fill in the
holes" a bit since with only a few articles we can't cover everything.

Note that these are expository pieces, not research reports. We have a
peer-reviewed section for the latter. Think "Scientific American" with
respect to level: everybody gets something out of it, maybe a little
more for those who know about the area.

Please contact me if you are interested in writing such an article. The
process is that I work with you on the shape of the article, then you
write it, and our editorial staff helps you get it ready for
publication. There is no annoying review process except that I am
annoying. Ideas for cover art to go with the issue are always welcome.

Information about CiSE and our author's guidelines are at
computer.org/cise. It has a fairly large readership as such things go.
Thanks,
Paul Dubois
Editor, Scientific Programming Department
CiSE
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefan at sun.ac.za Wed Apr 12 13:50:16 2006
From: stefan at sun.ac.za (Stefan van der Walt)
Date: Wed Apr 12 13:50:16 2006
Subject: [Numpy-discussion] Re: sqrt and divide
In-Reply-To:
References: <20060412054517.GA27756@sun.ac.za>
Message-ID: <20060412204927.GA11408@alpha>

On Wed, Apr 12, 2006 at 01:14:54AM -0500, Robert Kern wrote:
> Stefan van der Walt wrote:
> > Why is the square root of -1 not equal to the square root of -1+0j?
> >
> > In [5]: N.sqrt(-1.)
> > Out[5]: nan
> >
> > In [6]: N.sqrt(-1.+0j)
> > Out[6]: 1j
>
> It is frequently the case that the argument being passed to sqrt() is
> expected to be non-negative and all of the surrounding code strictly
> deals with numbers in the real domain. If the argument happens to be
> negative, then it is a sign of a bug earlier in the code or a floating
> point instability. Returning nan gives the programmer the opportunity
> for sqrt() to complain loudly and expose bugs instead of silently
> upcasting to a complex type. Programmers who *do* want to work in the
> complex domain can easily perform the cast explicitly.

The current docstring (specified in generate_umath.py) states

    y = sqrt(x) square-root elementwise.

It would help a lot if it could explain the above constraint, e.g.

    y = sqrt(x) square-root elementwise. If x is real (and not
    complex), the domain is restricted to x >= 0.

> > In [9]: N.divide(1.,0)
> > Out[9]: inf
>
> x/y ?

On my system, x/y (for x=1., y=0) throws a ZeroDivisionError. Are the
two divisions supposed to behave the same?

Thanks for your feedback!

Regards
Stéfan

From robert.kern at gmail.com Wed Apr 12 14:08:06 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Wed Apr 12 14:08:06 2006
Subject: [Numpy-discussion] Re: sqrt and divide
In-Reply-To: <20060412204927.GA11408@alpha>
References: <20060412054517.GA27756@sun.ac.za>
	<20060412204927.GA11408@alpha>
Message-ID:

Stefan van der Walt wrote:
> On Wed, Apr 12, 2006 at 01:14:54AM -0500, Robert Kern wrote:
>
>>Stefan van der Walt wrote:
>>
>>>Why is the square root of -1 not equal to the square root of -1+0j?
>>>
>>>In [5]: N.sqrt(-1.)
>>>Out[5]: nan
>>>
>>>In [6]: N.sqrt(-1.+0j)
>>>Out[6]: 1j
>>
>>It is frequently the case that the argument being passed to sqrt() is
>>expected to be non-negative and all of the surrounding code strictly
>>deals with numbers in the real domain. If the argument happens to be
>>negative, then it is a sign of a bug earlier in the code or a floating
>>point instability. Returning nan gives the programmer the opportunity
>>for sqrt() to complain loudly and expose bugs instead of silently
>>upcasting to a complex type. Programmers who *do* want to work in the
>>complex domain can easily perform the cast explicitly.
>
> The current docstring (specified in generate_umath.py) states
>
>     y = sqrt(x) square-root elementwise.
>
> It would help a lot if it could explain the above constraint, e.g.
>
>     y = sqrt(x) square-root elementwise. If x is real (and not
>     complex), the domain is restricted to x >= 0.

I'll get around to it sometime. In the meantime, please make a ticket:

http://projects.scipy.org/scipy/numpy/newticket

>>>In [9]: N.divide(1.,0)
>>>Out[9]: inf
>>
>>x/y ?
>
> On my system, x/y (for x=1., y=0) throws a ZeroDivisionError. Are
> the two divisions supposed to behave the same?

Not exactly, no. Specifically, the error handling is, by design, more
flexible with numpy than regular float objects.
If you want that flexibility, then you need to use numpy scalars or ufuncs. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From jmgore75 at gmail.com Wed Apr 12 14:30:05 2006 From: jmgore75 at gmail.com (Jeremy Gore) Date: Wed Apr 12 14:30:05 2006 Subject: [Numpy-discussion] Massive differences in numpy vs. numeric string handling Message-ID: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com> In Numeric: Numeric.array('test') -> array([t, e, s, t],'c'); shape = (4,) Numeric.array(['test','two']) -> array([[t, e, s, t], [t, w, o, ]],'c') but in numpy: numpy.array('test') -> array('test', dtype='|S4'); shape = () numpy.array('test','S1') -> array('t', dtype='|S1'); shape = () in fact you have to do an extra list cast: numpy.array(list('test'),'S1') -> array([t, e, s, t], dtype='|S1'); shape = (4,) to get the desired result. I don't think this is very pythonic, as strings are fully indexable and iterable objects. Furthermore, converting/treating a string as an array of characters is a very common thing. convertcode.py would not appear to convert this part of the code correctly either. Also, the use of quotes in the shape () array but not in the shape (4,) array is inconsistent. I realize the ability to use strings of arbitrary length as array elements is important in numpy, but there really should be a more natural option to convert/cast strings as character arrays. Also, unlike Numeric.equal and 'c' arrays, numpy.equal cannot compare '|S1' arrays or presumably other strings for equality, although this is a very useful comparison to make. For the record, I have used the Numeric (and to a lesser degree the numarray) module extensively in bioinformatics applications for its speed and brevity. Jeremy From oliphant at ee.byu.edu Wed Apr 12 15:04:06 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 12 15:04:06 2006 Subject: [Numpy-discussion] Massive differences in numpy vs. numeric string handling In-Reply-To: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com> References: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com> Message-ID: <443D7939.2060406@ee.byu.edu> Jeremy Gore wrote: > In Numeric: > > Numeric.array('test') -> array([t, e, s, t],'c'); shape = (4,) > Numeric.array(['test','two']) -> > array([[t, e, s, t], > [t, w, o, ]],'c') > > but in numpy: > > numpy.array('test') -> array('test', dtype='|S4'); shape = () > numpy.array('test','S1') -> array('t', dtype='|S1'); shape = () > > in fact you have to do an extra list cast: > > numpy.array(list('test'),'S1') -> array([t, e, s, t], dtype='|S1'); > shape = (4,) > > to get the desired result. I don't think this is very pythonic, as > strings are fully indexable and iterable objects. Let's not cast this discussion in Pythonic vs. un-pythonic because that does not really shed light on the issues. NumPy adds full support for string arrays. Numeric had this step-child called a character array which was really just an array of bytes that printed differently. This does raise some compatibility issues that have been hard to get exactly right, and convertcode indeed does not really solve the problem for a heavy character-array user. I have resisted simply adding back a 1-character string data-type back into NumPy, but that could be done if it is really necessary. But, I don't think it is. 
> Furthermore, converting/treating a string as an array of characters
> is a very common thing. convertcode.py would not appear to convert
> this part of the code correctly either. Also, the use of quotes in
> the shape () array but not in the shape (4,) array is inconsistent.
>
> I realize the ability to use strings of arbitrary length as array
> elements is important in numpy, but there really should be a more
> natural option to convert/cast strings as character arrays.

Perhaps all that is needed to simplify handling is to handle the 'S1'
case better so that array('test','S1') works the same as
array('test','c') used to work (i.e. not stopping at strings for the
sequence decomposition).

> Also, unlike Numeric.equal and 'c' arrays, numpy.equal cannot compare
> '|S1' arrays or presumably other strings for equality, although this
> is a very useful comparison to make.

This is a known missing feature due to the fact that comparisons use
ufuncs but ufuncs are not supported for variable-length arrays.
Currently, however, you can use the chararray class which does allow
comparisons of strings.

There are simple ways to work around this, of course. If you do have
'S1' arrays, then you can simply view them as unsigned bytes (using the
.view method) and do comparison that way. If s1 and s2 are "character
arrays":

s1.view(ubyte) >= s2.view(ubyte)

-Travis

From tim.hochberg at cox.net Wed Apr 12 15:15:05 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Wed Apr 12 15:15:05 2006
Subject: [Numpy-discussion] Massive differences in numpy vs. numeric
	string handling
In-Reply-To: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com>
References: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com>
Message-ID: <443D7B74.6040808@cox.net>

Jeremy Gore wrote:

> In Numeric:
>
> Numeric.array('test') -> array([t, e, s, t],'c'); shape = (4,)
> Numeric.array(['test','two']) ->
> array([[t, e, s, t],
>        [t, w, o, ]],'c')
>
> but in numpy:
>
> numpy.array('test') -> array('test', dtype='|S4'); shape = ()
> numpy.array('test','S1') -> array('t', dtype='|S1'); shape = ()
>
> in fact you have to do an extra list cast:
>
> numpy.array(list('test'),'S1') -> array([t, e, s, t], dtype='|S1');
> shape = (4,)

The creation of arrays from python objects is full of all kinds of
weird special cases. For numerical arrays this works pretty well, but
for other sorts of arrays, like strings and even worse, objects, it's
impossible to always guess the correct kind of thing to return. I'll
leave it to the various string array users to battle it out over what's
the right way to convert strings. However, in the meantime or if you do
not prevail in this debate, I suggest you slap an appropriate three
line function into your code somewhere. If all you care about is the
interface issue, use:

def chararray(astring):
    return numpy.array(list(astring), 'S1')

If you are worried about the performance of this, you could use the
more cryptic, but more efficient:

def chararray(astring):
    a = numpy.array(astring)
    return numpy.ndarray([len(astring)], 'S1', a.data)

Perhaps these will let you sleep at night.

Regards,

-tim

> to get the desired result. I don't think this is very pythonic, as
> strings are fully indexable and iterable objects. Furthermore,
> converting/treating a string as an array of characters is a very
> common thing. convertcode.py would not appear to convert this part
> of the code correctly either. Also, the use of quotes in the shape
> () array but not in the shape (4,) array is inconsistent.
> > I realize the ability to use strings of arbitrary length as array > elements is important in numpy, but there really should be a more > natural option to convert/cast strings as character arrays. > > Also, unlike Numeric.equal and 'c' arrays, numpy.equal cannot compare > '|S1' arrays or presumably other strings for equality, although this > is a very useful comparison to make. > > For the record, I have used the Numeric (and to a lesser degree the > numarray) module extensively in bioinformatics applications for its > speed and brevity. > > Jeremy > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > that extends applications into web and mobile media. Attend the live > webcast > and join the prime developer group breaking into this new coding > territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > From oliphant at ee.byu.edu Wed Apr 12 15:16:01 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 12 15:16:01 2006 Subject: [Numpy-discussion] [SciPy-user] Regarding what "where" returns In-Reply-To: References: <443C0D36.80608@enthought.com> <443D39F6.6040805@enthought.com> <443D601E.3020500@enthought.com> Message-ID: <443D7BD7.3060007@ee.byu.edu> Perry Greenfield wrote: >We've noticed that in numpy that the where() function behaves >differently than for numarray. In numarray, where() (when used with a >mask or condition array only) always returns a tuple of index arrays, >even for the 1D case whereas numpy returns an index array for the 1D >case and a tuple for higher dimension cases. While the tuple is a >annoyance for users when they want to manipulate the 1D case, the >benefit is that one always knows that where is returning a tuple, and >thus can write code accordingly. The problem with the current numpy >behavior is that it requires special case testing to see which kind >return one has before manipulating if you aren't certain of what the >dimensionality of the argument is going to be. > > I think this is reasonable. I don't think much thought went in to the current behavior as it simply defaults to the behavior of the nonzero method (where just defaults to nonzero in the circumstances you are describing). The nonzero method has it's behavior because of the nonzero function in Numeric (which only worked with 1-d and returned an array not a tuple). Ideally, I think we should fix the nonzero method and where to have the same behavior (both return tuples --- that's actually what the docstring of nonzero says right now). The nonzero function can be special-cased to index the tuple for backward compatibility. -Travis From tim.hochberg at cox.net Wed Apr 12 15:32:04 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 12 15:32:04 2006 Subject: [Numpy-discussion] Massive differences in numpy vs. 
numeric string handling In-Reply-To: <443D7939.2060406@ee.byu.edu> References: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com> <443D7939.2060406@ee.byu.edu> Message-ID: <443D7F5E.1020007@cox.net> Travis Oliphant wrote: > Jeremy Gore wrote: > >> In Numeric: >> >> Numeric.array('test') -> array([t, e, s, t],'c'); shape = (4,) >> Numeric.array(['test','two']) -> >> array([[t, e, s, t], >> [t, w, o, ]],'c') >> >> but in numpy: >> >> numpy.array('test') -> array('test', dtype='|S4'); shape = () >> numpy.array('test','S1') -> array('t', dtype='|S1'); shape = () >> >> in fact you have to do an extra list cast: >> >> numpy.array(list('test'),'S1') -> array([t, e, s, t], dtype='|S1'); >> shape = (4,) >> >> to get the desired result. I don't think this is very pythonic, as >> strings are fully indexable and iterable objects. > > > > Let's not cast this discussion in Pythonic vs. un-pythonic because > that does not really shed light on the issues. > > NumPy adds full support for string arrays. Numeric had this > step-child called a character array which was really just an array of > bytes that printed differently. > This does raise some compatibility issues that have been hard to get > exactly right, and convertcode indeed does not really solve the > problem for a heavy character-array user. I have resisted simply > adding back a 1-character string data-type back into NumPy, but that > could be done if it is really necessary. But, I don't think it is. > >> Furthermore, converting/treating a string as an array of >> characters is a very common thing. convertcode.py would not appear >> to convert this part of the code correctly either. Also, the use of >> quotes in the shape () array but not in the shape (4,) array is >> inconsistent. > > >> >> >> I realize the ability to use strings of arbitrary length as array >> elements is important in numpy, but there really should be a more >> natural option to convert/cast strings as character arrays. > > > Perhaps all that is needed to simplify handling is to handle the 'S1' > case better so that > > array('test','S1') works the same as array('test','c') used to work > (i.e. not stopping at strings for the sequence decomposition). It seems a little wacky that 'S2' and 'S1' would have vastly different behaviour. >> >> Also, unlike Numeric.equal and 'c' arrays, numpy.equal cannot >> compare '|S1' arrays or presumably other strings for equality, >> although this is a very useful comparison to make. > > > This is a known missing feature due to the fact that comparisons use > ufuncs but ufuncs are not supported for variable-length arrays. > Currently, however you can use the chararray class which does allow > comparisons of strings. It seems like this should be easy to worm around in __cmp__ (or array_compare or however it's spelled). Since the strings really have a fixed length, they're more or less equivalent to byte arrays with one extra dimension. Writing a little lexographic comparison thing on top of the results of a ufunc operating on the result of a compare of these byte arrays should be a piece of cake; in theory at least. > > There are simple ways to work around this, of course. If you do have > 'S1' arrays, then you can simply view them as unsigned bytes (using > the .view method) and do comparison that way. > if s1 and s2 are "character arrays" > > s1.view(ubyte) >= s2.view(ubyte) Nice! 
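For example, a quick sketch of that byte-view trick (assuming
1-character string arrays; expected output in the comment):

    import numpy
    s1 = numpy.array(list('abc'), 'S1')
    s2 = numpy.array(list('abd'), 'S1')
    # compare the raw bytes instead of the strings
    print s1.view(numpy.ubyte) >= s2.view(numpy.ubyte)  # [True True False]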
Regards, -tim From oliphant at ee.byu.edu Wed Apr 12 15:47:04 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 12 15:47:04 2006 Subject: ***[Possible UCE]*** Re: [Numpy-discussion] Massive differences in numpy vs. numeric string handling In-Reply-To: <443D7F5E.1020007@cox.net> References: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com> <443D7939.2060406@ee.byu.edu> <443D7F5E.1020007@cox.net> Message-ID: <443D8336.60606@ee.byu.edu> Tim Hochberg wrote: > > It seems a little wacky that 'S2' and 'S1' would have vastly different > behaviour. True. Much better is a compatibility function such as the one you gave. >> This is a known missing feature due to the fact that comparisons use >> ufuncs but ufuncs are not supported for variable-length arrays. >> Currently, however you can use the chararray class which does allow >> comparisons of strings. > > > It seems like this should be easy to worm around in __cmp__ (or > array_compare or however it's spelled). Since the strings really have > a fixed length, they're more or less equivalent to byte arrays with > one extra dimension. Writing a little lexographic comparison thing on > top of the results of a ufunc operating on the result of a compare of > these byte arrays should be a piece of cake; in theory at least. Yes, indeed it could be handled there as well. It's the rich_compare function (all the cases are handled there...). Right now, equality testing is special-cased a bit (inheriting behavior from Numeric). I've gone back and forth on whether I should put effort into handling variable-length arrays with ufuncs (which might be better long-term --- or just an example of feature bloat as I can't think of many use cases except this one), or just special-case the needed comparisons (which would take less thought to implement). I'm leaning towards the latter case --- special-case comparison of string arrays in the rich_compare function. The next thing to think about is then Unicode arrays. The problem with comparisons on unicode arrays though is "how do you compare unicode strings" in a meaningful way (i.e. what is alphabetical?). -Travis From oliphant at ee.byu.edu Wed Apr 12 15:56:03 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 12 15:56:03 2006 Subject: [Numpy-discussion] Re: ***[Possible UCE]*** [SciPy-user] Regarding what "where" returns In-Reply-To: References: <443C0D36.80608@enthought.com> <443D39F6.6040805@enthought.com> <443D601E.3020500@enthought.com> Message-ID: <443D857F.9000605@ee.byu.edu> Perry Greenfield wrote: >We've noticed that in numpy that the where() function behaves >differently than for numarray. In numarray, where() (when used with a >mask or condition array only) always returns a tuple of index arrays, >even for the 1D case whereas numpy returns an index array for the 1D >case and a tuple for higher dimension cases. While the tuple is a >annoyance for users when they want to manipulate the 1D case, the >benefit is that one always knows that where is returning a tuple, and >thus can write code accordingly. The problem with the current numpy >behavior is that it requires special case testing to see which kind >return one has before manipulating if you aren't certain of what the >dimensionality of the argument is going to be. > > I went ahead and made this change to the code. The nonzero function still behaves as before (and in fact only works for 1-d arrays as it did in Numeric). The where(condition) function works the same as condition.nonzero() and both always return a tuple. 
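A sketch of the behavior after this change (the 1-d result is a
one-element tuple that callers unpack explicitly):

    import numpy
    a = numpy.array([3, 0, 5])
    idx, = numpy.where(a > 0)  # where() returns a tuple even for 1-d input
    print idx                  # [0 2]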
I had to change exactly one piece of code that used the new where syntax. This does represent a code breakage with the where syntax (but only if you used the newer, numarray-introduced usage). I think this is a small-enough segment that we can make this change. -Travis From robert.kern at gmail.com Wed Apr 12 15:57:06 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed Apr 12 15:57:06 2006 Subject: [Numpy-discussion] Re: Massive differences in numpy vs. numeric string handling In-Reply-To: <443D7B74.6040808@cox.net> References: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com> <443D7B74.6040808@cox.net> Message-ID: Tim Hochberg wrote: > Jeremy Gore wrote: > >> In Numeric: >> >> Numeric.array('test') -> array([t, e, s, t],'c'); shape = (4,) >> Numeric.array(['test','two']) -> >> array([[t, e, s, t], >> [t, w, o, ]],'c') >> >> but in numpy: >> >> numpy.array('test') -> array('test', dtype='|S4'); shape = () >> numpy.array('test','S1') -> array('t', dtype='|S1'); shape = () >> >> in fact you have to do an extra list cast: >> >> numpy.array(list('test'),'S1') -> array([t, e, s, t], dtype='|S1'); >> shape = (4,) > > The creation of arrays from python objects is full of all kinds of weird > special cases. For numerical arrays this is works pretty well , but for > other sorts of arrays, like strings and even worse, objects, it's > impossible to always guess the correct kind of thing to return. I'll > leave it to the various string array users to battle it out over what's > the right way to convert strings. However, in the meantime or if you do > not prevail in this debate, I suggest you slap an appropriate three line > function into your code somewhere. I would suggest this way of thinking about it: numpy.array() shouldn't have to handle every possible way to construct an array. People building less-common arrays from less-common Python objects may have to use a different constructor if they want to do so in a natural way. Implementing every possible combination in numpy.array() *and* making it intuitive and readable are incommensurate goals, in my opinion. > If all you care about is the interface issues use: > > def chararray(astring): > return numpy.array(list(astring), 'S1') > > If you are worried about the performance of this, you could use the more > cryptic, but more efficient: > > def chararray(astring): > a = numpy.array(astring) > return numpy.ndarray([len(astring)], 'S1', a.data) Better: In [31]: fromstring('test', dtype('S1')) Out[31]: array([t, e, s, t], dtype='|S1') There's still the issue of N-D arrays of character, though. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From oliphant at ee.byu.edu Wed Apr 12 17:04:05 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 12 17:04:05 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy Message-ID: <443D9543.8040601@ee.byu.edu> The next release of NumPy will be 0.9.8 Before this release is made, I want to make sure the following tickets are implemented http://projects.scipy.org/scipy/numpy/ticket/54 http://projects.scipy.org/scipy/numpy/ticket/55 http://projects.scipy.org/scipy/numpy/ticket/56 Once 0.9.8 is out, I'd like to name the next release NumPy 1.0 Release Candidate 1 and have a series of release candidates so that hopefully by SciPy 2006 conference, NumPy 1.0 is out. 
This also dove-tails nicely with the Python 2.5 release schedule so that NumPy 1.0 should work with Python 2.5 and be fully 64-bit capable for handling very-large arrays. The recent discussions and bug-reports have been very helpful. If you have found a bug, please report it on the Trac pages so that we don't lose sight of it. Report bugs by "submitting a ticket" here: http://projects.scipy.org/scipy/numpy/newticket -Travis From oliphant at ee.byu.edu Wed Apr 12 17:11:04 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 12 17:11:04 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443D9543.8040601@ee.byu.edu> References: <443D9543.8040601@ee.byu.edu> Message-ID: <443D96DC.3060501@ee.byu.edu> Travis Oliphant wrote: > > The next release of NumPy will be 0.9.8 > > Before this release is made, I want to make sure the following > tickets are implemented > > http://projects.scipy.org/scipy/numpy/ticket/54 > http://projects.scipy.org/scipy/numpy/ticket/55 > http://projects.scipy.org/scipy/numpy/ticket/56 So you don't have to read each one individually: #54 : implement thread-based error-handling modes #55 : finish scalar-math implementation which recognizes same error-handling #56 : implement rich_comparisons on string arrays and unicode arrays. -Travis From robert.kern at gmail.com Wed Apr 12 17:19:07 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed Apr 12 17:19:07 2006 Subject: [Numpy-discussion] Re: Toward release 1.0 of NumPy In-Reply-To: <443D9543.8040601@ee.byu.edu> References: <443D9543.8040601@ee.byu.edu> Message-ID: Travis Oliphant wrote: > > The next release of NumPy will be 0.9.8 I have added a "0.9.8 Release" milestone to the Trac and have scheduled all of these tickets for that milestone. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From tim.hochberg at cox.net Wed Apr 12 17:59:12 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 12 17:59:12 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443D96DC.3060501@ee.byu.edu> References: <443D9543.8040601@ee.byu.edu> <443D96DC.3060501@ee.byu.edu> Message-ID: <443DA1B1.8040406@cox.net> Travis Oliphant wrote: > Travis Oliphant wrote: > >> >> The next release of NumPy will be 0.9.8 >> >> Before this release is made, I want to make sure the following >> tickets are implemented >> >> http://projects.scipy.org/scipy/numpy/ticket/54 >> http://projects.scipy.org/scipy/numpy/ticket/55 >> http://projects.scipy.org/scipy/numpy/ticket/56 > > > > So you don't have to read each one individually: > > > #54 : implement thread-based error-handling modes > #55 : finish scalar-math implementation which recognizes same > error-handling > #56 : implement rich_comparisons on string arrays and unicode arrays. I'll help with #54 at least, since I was the complainer, er I mean, since I brought that one up. It's probably better to get that started before #55 anyway. The open issues that I see connected to this are: 1. Better support for catching integer divide by zero. That doesn't work at all here, I'm guessing because my optimizer is too smart. I spent a half hour this morning trying how to set the divide by zero flag directly using VC7, but I couldn't find anything. I suppose I could see if there's some pragma to turn off optimization around that one function. 
However, I'm interested in what you think of stuffing the integer divide by zero information directly into a flag on the thread local object and then checking it on the way out. This is cleaner in that it doesn't rely on platform specific flag setting ifdeffery and it allows us to consider issue #2.

2. Breaking integer divide by zero out from floating point divide by zero. The former is more serious in that it's silent. The latter returns INF, so you can see that something happened by examining your results, while the former returns zero. That has much more potential for confusion and silent bugs. Thus, it seems reasonable to be able to set the error handling differently for integer divide by zero and floating point divide by zero. Note that this would allow integer divide by zero to be set to 'raise' and still run all the FP ops at max speed, since the flag saying do no error checking could ignore the int_divide_by_zero setting.

3. Tossing out the overflow checking on integer operations. It's incomplete anyway and it slows things down. I don't really expect my integer operations to be overflow checked, and personally I think that incomplete checking is worse than no checking. I think we should at least disable the support for the time being and possibly revisit this later when we have time to do a complete job and if it seems necessary.

4. Different defaults. I'd like to enable different defaults without slowing things down in the really super fast case.

Looking at this list now, it looks like only #4 needs to be addressed when doing the initial implementation of the thread local error handling, and even that one can be done in parallel, so I guess we should just start with creating the thread local object and see what happens. If you like I can start working on this, although I may not be able to get much done on it till Monday.

Regards,

-tim

From simon at arrowtheory.com Wed Apr 12 18:17:03 2006
From: simon at arrowtheory.com (Simon Burton)
Date: Wed Apr 12 18:17:03 2006
Subject: [Numpy-discussion] index objects are not broadcastable to a single shape
Message-ID: <20060413111612.3bb4e6fc.simon@arrowtheory.com>

This must be up there with the most useless, confusing error messages:

>>> a=numpy.array([1,2,3])
>>> b=numpy.array([1,2,3,4])
>>> a*b
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: index objects are not broadcastable to a single shape
>>>

Simon.

--
Simon Burton, B.Sc.
Licensed PO Box 8066 ANU Canberra 2601 Australia
Ph.
61 02 6249 6940 http://arrowtheory.com From oliphant at ee.byu.edu Wed Apr 12 18:25:03 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 12 18:25:03 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443DA1B1.8040406@cox.net> References: <443D9543.8040601@ee.byu.edu> <443D96DC.3060501@ee.byu.edu> <443DA1B1.8040406@cox.net> Message-ID: <443DA866.3090806@ee.byu.edu> Tim Hochberg wrote: > Travis Oliphant wrote: > >> Travis Oliphant wrote: >> >>> >>> The next release of NumPy will be 0.9.8 >>> >>> Before this release is made, I want to make sure the following >>> tickets are implemented >>> >>> http://projects.scipy.org/scipy/numpy/ticket/54 >>> http://projects.scipy.org/scipy/numpy/ticket/55 >>> http://projects.scipy.org/scipy/numpy/ticket/56 >> >> >> >> >> So you don't have to read each one individually: >> >> >> #54 : implement thread-based error-handling modes >> #55 : finish scalar-math implementation which recognizes same >> error-handling >> #56 : implement rich_comparisons on string arrays and unicode arrays. > > > I'll help with #54 at least, since I was the complainer, er I mean, > since I brought that one up. It's probably better to get that started > before #55 anyway. The open issues that I see connected to this are: Great. I agree that #54 needs to be done before #55 (error handling is what's been holding up #55 the whole time. > > 1. Better support for catching integer divide by zero. That doesn't > work at all here, Probably a platform/compiler issue. The numarray equivalent code had an if statement to prevent the compiler from optimizing it away. Perhaps we need to do something like that. Also, perhaps VC7 has some means to set the divide by zero error more directly and we can just use that. > I'm guessing because my optimizer is too smart. I spent a half hour > this morning trying how to set the divide by zero flag directly using > VC7, but I couldn't find anything. I suppose I could see if there's > some pragma to turn off optimization around that one function. > However, I'm interested in what you think of stuffing the integer > divide by zero information directly into a flag on the thread local > object and then checking it on the way out. Hmm.. The only issue is that dictionary look-ups are more expensive then register look-ups. This could be costly. > This is cleaner in that it doesn't rely on platform specific flag > setting ifdeffery and it allows us to consider issue #2. > > 2. Breaking integer divide by zero out from floating point divide > by zero. The former is more serious in that it's silent. The latter > returns INF, so you can see that something happened by examing your > results, while the former returns zero. That has much more potential > for confusion and silents bugs. Thus, it seems reasonable to be able > to set the error handling different for integer divide by zero and > floating point divide by zero. Note that this would allow integer > divide by zero to be set to 'raise' and still run all the FP ops at > max speed, since the flag saying do no error checking could ignore the > int_divide_by_zero setting. Interesting proposal. Yes, it is true that integer division returning zero is less well-justified. But, I'm still concerned with doing a dictionary lookup for every divide-by-zero, and (more importantly) to check to see if a divide-by-zero has occurred. The dictionary lookups is the largest source of small-array slow-down when comparing Numeric to NumPy. > > 3. 
Tossing out the overflow checking on integer operations. It's > incomplete anyway and it slows things down. I don't really expect my > integer operations to be overflow checked, and personally I think that > incomplete checking is worse than no checking. I think we should at > least disable the support for the time being and possibly revisit this > latter when we have time to do a complete job and if it seems necessary. I'm all for that. I think it makes the code slower and because it is incomplete (addition and subtraction don't do it), it makes for harder-to-explain code. On the scalar operations, we should check for over-flow, however... > > 4. Different defaults I'd like to enable different defaults without > slowing things down in the really super fast case. The discussion on different defaults is fine. The slow-down is that with the current defaults, the error register flags are not actually checked if the default has not been changed. With the numarray-defaults, the register flags would be checked at the end of each 1-d loop. I'm not sure what kind of slow-down that would bring. Certainly for 1-d cases, there would be little difference. One could actually simply store different defaults (but it would result in minor slow-downs because the register flags would be checked. -Travis From oliphant at ee.byu.edu Wed Apr 12 18:30:03 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 12 18:30:03 2006 Subject: [Numpy-discussion] index objects are not broadcastable to a single shape In-Reply-To: <20060413111612.3bb4e6fc.simon@arrowtheory.com> References: <20060413111612.3bb4e6fc.simon@arrowtheory.com> Message-ID: <443DA966.1020301@ee.byu.edu> Simon Burton wrote: >This must be up there with the most useless confusing error messages: > > > >>>>a=numpy.array([1,2,3]) >>>>b=numpy.array([1,2,3,4]) >>>>a*b >>>> >>>> >Traceback (most recent call last): > File "", line 1, in ? >ValueError: index objects are not broadcastable to a single shape > > > > > The problem with these error messages is that some code is used in a wide-variety of circumstances. The original error message was conceived in thinking about the application of the code to one circumstance while this particular error is occurring in a different one. The standard behavior is to just propagate the error up. Better error messages means catching a lot more errors and special-casing error messages. It can be done, but it's tedious work. -Travis From simon at arrowtheory.com Wed Apr 12 20:34:04 2006 From: simon at arrowtheory.com (Simon Burton) Date: Wed Apr 12 20:34:04 2006 Subject: [Numpy-discussion] index objects are not broadcastable to a single shape In-Reply-To: <443DA966.1020301@ee.byu.edu> References: <20060413111612.3bb4e6fc.simon@arrowtheory.com> <443DA966.1020301@ee.byu.edu> Message-ID: <20060413133326.2889a5c5.simon@arrowtheory.com> On Wed, 12 Apr 2006 19:29:10 -0600 Travis Oliphant wrote: > The problem with these error messages is that some code is used in a > wide-variety of circumstances. The original error message was conceived > in thinking about the application of the code to one circumstance while > this particular error is occurring in a different one. > > The standard behavior is to just propagate the error up. Better error > messages means catching a lot more errors and special-casing error > messages. It can be done, but it's tedious work. OK. Can the error message be a little more generic, longer, etc. ? "shape mismatch (index objects are not broadcastable to a single shape)" ? I don't know either. 
I'm just thinking about all the new numpy/python users at work here that I will need to hand hold. Error messages like this are pretty scary. Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From tim.hochberg at cox.net Wed Apr 12 21:59:01 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 12 21:59:01 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443DA866.3090806@ee.byu.edu> References: <443D9543.8040601@ee.byu.edu> <443D96DC.3060501@ee.byu.edu> <443DA1B1.8040406@cox.net> <443DA866.3090806@ee.byu.edu> Message-ID: <443DD9D9.9080004@cox.net> Travis Oliphant wrote: > Tim Hochberg wrote: > >> Travis Oliphant wrote: >> >>> Travis Oliphant wrote: >>> >>>> >>>> The next release of NumPy will be 0.9.8 >>>> >>>> Before this release is made, I want to make sure the following >>>> tickets are implemented >>>> >>>> http://projects.scipy.org/scipy/numpy/ticket/54 >>>> http://projects.scipy.org/scipy/numpy/ticket/55 >>>> http://projects.scipy.org/scipy/numpy/ticket/56 >>> >>> >>> >>> >>> >>> So you don't have to read each one individually: >>> >>> >>> #54 : implement thread-based error-handling modes >>> #55 : finish scalar-math implementation which recognizes same >>> error-handling >>> #56 : implement rich_comparisons on string arrays and unicode arrays. >> >> >> >> I'll help with #54 at least, since I was the complainer, er I mean, >> since I brought that one up. It's probably better to get that started >> before #55 anyway. The open issues that I see connected to this are: > > > Great. I agree that #54 needs to be done before #55 (error handling > is what's been holding up #55 the whole time. > >> >> 1. Better support for catching integer divide by zero. That >> doesn't work at all here, > > > Probably a platform/compiler issue. The numarray equivalent code had > an if statement to prevent the compiler from optimizing it away. > Perhaps we need to do something like that. Also, perhaps VC7 has > some means to set the divide by zero error more directly and we can > just use that. > >> I'm guessing because my optimizer is too smart. I spent a half hour >> this morning trying how to set the divide by zero flag directly using >> VC7, but I couldn't find anything. I suppose I could see if there's >> some pragma to turn off optimization around that one function. >> However, I'm interested in what you think of stuffing the integer >> divide by zero information directly into a flag on the thread local >> object and then checking it on the way out. > > > > Hmm.. The only issue is that dictionary look-ups are more expensive > then register look-ups. This could be costly. > > >> This is cleaner in that it doesn't rely on platform specific flag >> setting ifdeffery and it allows us to consider issue #2. >> >> 2. Breaking integer divide by zero out from floating point divide >> by zero. The former is more serious in that it's silent. The latter >> returns INF, so you can see that something happened by examing your >> results, while the former returns zero. That has much more potential >> for confusion and silents bugs. Thus, it seems reasonable to be able >> to set the error handling different for integer divide by zero and >> floating point divide by zero. Note that this would allow integer >> divide by zero to be set to 'raise' and still run all the FP ops at >> max speed, since the flag saying do no error checking could ignore >> the int_divide_by_zero setting. > > > > Interesting proposal. 
Yes, it is true that integer division
> returning zero is less well-justified. But, I'm still concerned with
> doing a dictionary lookup for every divide-by-zero, and (more
> importantly) to check to see if a divide-by-zero has occurred. The
> dictionary lookups is the largest source of small-array slow-down when
> comparing Numeric to NumPy.

Well, assuming that we can fix the error flag setting code here, we could still break the divide by zero error handling out by doing some special casing in the ufunc machinery since the ufuncs presumably can figure out their own types. Still, the thread local storage option is cleaner if we can figure out a way to make the dictionary lookups fast enough.

The lookup in the failing case is not a big deal I don't think. First, it's normally an error so I don't mind introducing some slowing. Second, it should be easy to only do the lookup once. Just have a flag that ensures that after the first lookup, the divide by zero flag is not set a second time. I guess the bigger issue is the lookup on the way out to see if anything failed. I have a plan, which I'll present at the bottom.

>> 3. Tossing out the overflow checking on integer operations. It's
>> incomplete anyway and it slows things down. I don't really expect my
>> integer operations to be overflow checked, and personally I think
>> that incomplete checking is worse than no checking. I think we should
>> at least disable the support for the time being and possibly revisit
>> this latter when we have time to do a complete job and if it seems
>> necessary.
>
> I'm all for that. I think it makes the code slower and because it is
> incomplete (addition and subtraction don't do it), it makes for
> harder-to-explain code.
>
> On the scalar operations, we should check for over-flow, however...

OK.

>> 4. Different defaults I'd like to enable different defaults without
>> slowing things down in the really super fast case.
>
> The discussion on different defaults is fine. The slow-down is that
> with the current defaults, the error register flags are not actually
> checked if the default has not been changed. With the
> numarray-defaults, the register flags would be checked at the end of
> each 1-d loop. I'm not sure what kind of slow-down that would
> bring. Certainly for 1-d cases, there would be little difference.
>
> One could actually simply store different defaults (but it would
> result in minor slow-downs because the register flags would be checked.
>

OK, here's my plan. It sounds like it will work, but this threading business is always tricky so find holes in it if you can.

1. As we've discussed we grow some thread local storage. This storage has flags check_divide, check_over, check_under, check_invalid and check_int_divide. It also has a flag int_divide_err. These flags are initialized to False, but then may immediately be set to a different default value. This is to simplify #3.

2. We grow 6 static longs that correspond to the above and are initialized to zero. They should be called check_divide_count, etc. or something similar.

3. Whenever a flag is switched from False to True its corresponding global is incremented. Similarly, when switched from True to False the global is decremented.

4. When a divide by integer zero occurs, we check the int_divide_err flag. If it is false, we set it to true and also increment int_divide_err_count. We also set a local flag so that we don't do this again in that call to the ufunc core function.
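In rough Python terms, the bookkeeping for steps 1-4 might look something like this (just a sketch -- the real version would live in C, and every name here is a placeholder):

    import threading

    _local = threading.local()     # 1. per-thread flags live here
    check_int_divide_count = 0     # 2. one static counter per flag
    int_divide_err_count = 0

    def set_check_int_divide(on):
        # 3. keep the global counter in sync when a thread flips its flag
        global check_int_divide_count
        old = getattr(_local, 'check_int_divide', False)
        if on and not old:
            check_int_divide_count += 1
        elif old and not on:
            check_int_divide_count -= 1
        _local.check_int_divide = on

    def note_int_divide_by_zero():
        # 4. called from the inner loop on an integer zero-divide; a C
        # version would do its own locking/atomic increments as needed
        global int_divide_err_count
        if check_int_divide_count and not getattr(_local, 'int_divide_err', False):
            _local.int_divide_err = True
            int_divide_err_count += 1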
We can actually skip this whole step if check_int_divide_count is zero.

With all that in place, I think we should be able to do things efficiently. The ufunc can check whether any of the XXX_check_counts are nonzero and turn on register flag checking as appropriate. If an error occurs, it still only has to go to the per thread dictionary if the count for that particular error type is nonzero. Similarly, if the count int_divide_err_count is nonzero, the ufunc will have to go to the dictionary. If the error was set in this thread, then appropriate action (including possibly nothing) is taken and int_divide_err_count is decremented.

That all sounds more complicated than it really is, at least in my head ;) Anyway, try to find the holes in it. It should be able to run at full speed if you turn off error checking in all threads. It should run at almost full speed as long as there aren't any errors that are being checked in *any thread*. I think in practice this means that most of the speed hit that is seen in numarray won't be here. It doesn't actually matter what the defaults are; turning off all error checking will still be fast.

Regards,

-tim

From arnd.baecker at web.de Thu Apr 13 00:58:04 2006
From: arnd.baecker at web.de (Arnd Baecker)
Date: Thu Apr 13 00:58:04 2006
Subject: [Numpy-discussion] Massive differences in numpy vs. numeric string handling
In-Reply-To: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com>
References: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com>
Message-ID:

On Wed, 12 Apr 2006, Jeremy Gore wrote:

> In Numeric: [...]
> but in numpy: [...]
> For the record, I have used the Numeric (and to a lesser degree the
> numarray) module extensively in bioinformatics applications for its
> speed and brevity.

If (after this round of discussion) there remain any differences, it would be good if you could add them to the wiki at

http://www.scipy.org/Converting_from_Numeric

Best, Arnd

P.S.: The same applies of course to any other differences which show up!

From svetosch at gmx.net Thu Apr 13 01:20:02 2006
From: svetosch at gmx.net (Sven Schreiber)
Date: Thu Apr 13 01:20:02 2006
Subject: [Numpy-discussion] Toward release 1.0 of NumPy
In-Reply-To: <443D9543.8040601@ee.byu.edu>
References: <443D9543.8040601@ee.byu.edu>
Message-ID: <443E096D.3040407@gmx.net>

Travis Oliphant wrote:
>
> The next release of NumPy will be 0.9.8
>
> The recent discussions and bug-reports have been very helpful. If you
> have found a bug, please report it on the Trac pages so that we don't
> lose sight of it.
> Report bugs by "submitting a ticket" here:
>

Before submitting the following as a bug, I would like to repeat what I posted earlier (no replies) to check whether you agree it's a bug:

The "kron" (Kronecker product) function returns numpy-arrays even if both arguments are numpy-matrices; imho that's a bug in light of the proclaimed goal of preserving matrices where possible/sensible.

On a related issue, "eye" also still returns a numpy-array instead of a numpy-matrix. At least one person (I think it was Ed Schofield) agreed that it would be better to return a numpy-matrix, given that another function ("identity") already returns a numpy-array. Currently, one of the two functions seems redundant.

So unless somebody tells me otherwise, I will submit these two things as bugs/tickets.

Great that numpy soon will be officially stable!

Cheers,
Sven

From pgmdevlist at mailcan.com Thu Apr 13 01:41:02 2006
From: pgmdevlist at mailcan.com (Pierre GM)
Date: Thu Apr 13 01:41:02 2006
Subject: [Numpy-discussion] range/arange
Message-ID: <200604130507.40241.pgmdevlist@mailcan.com>

Folks,
Could any of you explain me why the two following commands give different results ? It's mere curiosity, for my personal edification.

[(m-5)/10 for m in arange(1,10)]
[0, 0, 0, 0, 0, 0, 0, 0, 0]

[(m-5)/10 for m in range(1,10)]
[-1, -1, -1, -1, 0, 0, 0, 0, 0]

From lars.bittrich at googlemail.com Thu Apr 13 02:30:01 2006
From: lars.bittrich at googlemail.com (Lars Bittrich)
Date: Thu Apr 13 02:30:01 2006
Subject: [Numpy-discussion] range/arange
In-Reply-To: <200604130507.40241.pgmdevlist@mailcan.com>
References: <200604130507.40241.pgmdevlist@mailcan.com>
Message-ID: <200604131123.56171.lars.bittrich@googlemail.com>

Hi,

On Thursday 13 April 2006 11:07, Pierre GM wrote:
> Could any of you explain me why the two following commands give different
> results ? It's mere curiosity, for my personal edification.
>
> [(m-5)/10 for m in arange(1,10)]
> [0, 0, 0, 0, 0, 0, 0, 0, 0]
>
> [(m-5)/10 for m in range(1,10)]
> [-1, -1, -1, -1, 0, 0, 0, 0, 0]

I have no idea where the reason is located exactly, but it seems to be caused by different types of range and arange.

In [15]: type(arange(1,10)[0])
Out[15]: <type 'int32scalar'>

In [14]: type(range(1,10)[0])
Out[14]: <type 'int'>

If you use for example:

In [16]: -1/10
Out[16]: -1

you get the normal behavior of the "floor" function.

In [17]: floor(-.1)
Out[17]: -1.0

The behavior of int32scalar seems more intuitive to me.

Best regards,
Lars

From robert.kern at gmail.com Thu Apr 13 05:17:05 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Thu Apr 13 05:17:05 2006
Subject: [Numpy-discussion] Re: range/arange
In-Reply-To: <200604130507.40241.pgmdevlist@mailcan.com>
References: <200604130507.40241.pgmdevlist@mailcan.com>
Message-ID:

Pierre GM wrote:
> Folks,
> Could any of you explain me why the two following commands give different
> results ? It's mere curiosity, for my personal edification.
>
> [(m-5)/10 for m in arange(1,10)]
> [0, 0, 0, 0, 0, 0, 0, 0, 0]
>
> [(m-5)/10 for m in range(1,10)]
> [-1, -1, -1, -1, 0, 0, 0, 0, 0]

Python's rule for integer division is to round towards negative infinity. C's rule (if it has one; I think it may be platform dependent) is to round towards 0. When it comes to arithmetic, numpy tends to expose the C behavior because it's fastest. As Lars pointed out, the type of the object that you get from iterating over an array is a numpy int32scalar object, so the numpy behavior is used.
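To see the two rounding rules side by side in plain Python (illustration only, no numpy involved):

    # Python's integer division rounds toward negative infinity:
    print (1 - 5) / 10           # -1, which is what the range() version shows
    # C-style truncation toward zero, which the int32scalar arithmetic exposes:
    print int((1 - 5) / 10.0)    # 0, which is what the arange() version shows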
-- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From fullung at gmail.com Thu Apr 13 05:18:04 2006 From: fullung at gmail.com (Albert Strasheim) Date: Thu Apr 13 05:18:04 2006 Subject: [Numpy-discussion] Segfault when indexing on second or higher dimension with list or tuple Message-ID: <20060413121710.GA30372@dogbert.sdsl.sun.ac.za> Hello all, The following segfault bug was discovered in NumPy 0.9.7.2348 by someone at our Python workshop: import numpy as N F = N.zeros((1,1)) F[:,[0]] = 0 The following also segfaults: F[:,(0,)] = 0 Something seems to go wrong when one uses a tuple or a list to index into a NumPy array on the second or higher dimension, since the following code works: F = N.zeros((1,)) F[[0]] = 0 The Trac ticket is here: http://projects.scipy.org/scipy/numpy/ticket/59 If someone gets around to fixing this, please include some test cases. Thanks! Regards, Albert From cimrman3 at ntc.zcu.cz Thu Apr 13 05:24:02 2006 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu Apr 13 05:24:02 2006 Subject: [Numpy-discussion] Re: ***[Possible UCE]*** [SciPy-user] Regarding what "where" returns In-Reply-To: <443D857F.9000605@ee.byu.edu> References: <443C0D36.80608@enthought.com> <443D39F6.6040805@enthought.com> <443D601E.3020500@enthought.com> <443D857F.9000605@ee.byu.edu> Message-ID: <443E42A2.80402@ntc.zcu.cz> Travis Oliphant wrote: > I went ahead and made this change to the code. The nonzero function > still behaves as before (and in fact only works for 1-d arrays as it did > in Numeric). > > The where(condition) function works the same as condition.nonzero() and > both always return a tuple. So, for 1-d arrays, using 'nonzero( condition )' should be faster than 'where( condition )[0]', right? r. From charlesr.harris at gmail.com Thu Apr 13 05:35:13 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu Apr 13 05:35:13 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443E096D.3040407@gmx.net> References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> Message-ID: Sven, On 4/13/06, Sven Schreiber wrote: > > Travis Oliphant schrieb: > > > > The next release of NumPy will be 0.9.8 > > > > > The recent discussions and bug-reports have been very helpful. If you > > have found a bug, please report it on the Trac pages so that we don't > > lose sight of it. > > Report bugs by "submitting a ticket" here: > > > > Before submitting the following as a bug, I would like to repeat what I > posted earlier (no replies) to check whether you agree it's a bug: > > The "kron" (Kronecker product) function returns numpy-arrays even if > both arguments are numpy-matrices; imho that's a bug in light of the > proclaimed goal of preserving matrices where possible/sensible. What would you do instead? The Kronecker product (aka Tensor product) of two matrices isn't a matrix. I suppose you could make it one by appealing to the universal property -- bilinear map on the Cartesian product of linear spaces -> linear map on the tensor product of linear spaces -- but that seems a bit abstract for numpy and you would need to define the indices of the resulting object as some sort of pair. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From pjssilva at ime.usp.br Thu Apr 13 05:51:02 2006
From: pjssilva at ime.usp.br (Paulo Jose da Silva e Silva)
Date: Thu Apr 13 05:51:02 2006
Subject: [Numpy-discussion] Re: range/arange
In-Reply-To:
References: <200604130507.40241.pgmdevlist@mailcan.com>
Message-ID: <1144932598.16449.5.camel@localhost.localdomain>

On Thu, 2006-04-13 at 07:15 -0500, Robert Kern wrote:
>
> Python's rule for integer division is to round towards negative infinity. C's
> rule (if it has one; I think it may be platform dependent) is to round towards
> 0. When it comes to arithmetic, numpy tends to expose the C behavior because
> it's fastest. As Lars pointed out, the type of the object that you get from
> iterating over an array is a numpy int32scalar object, so the numpy behavior is
> used.
>

Actually, in the C99 standard the division was defined to always truncate towards zero; see item 25 in:

http://home.datacomm.ch/t_wolf/tw/c/c9x_changes.html

So it is not platform dependent anymore.

Paulo

Note: It once was platform dependent. Old gcc (for Linux) would truncate towards infinity. I know this because of a "bug" in somebody else's code. It took me quite some time to discover that the problem was the shift in gcc behavior in this matter.

From aisaac at american.edu Thu Apr 13 07:02:11 2006
From: aisaac at american.edu (Alan G Isaac)
Date: Thu Apr 13 07:02:11 2006
Subject: [Numpy-discussion] Toward release 1.0 of NumPy
In-Reply-To:
References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net>
Message-ID:

On Thu, 13 Apr 2006, Charles R Harris apparently wrote:
> The Kronecker product (aka Tensor product) of two
> matrices isn't a matrix.

That is an unusual way to describe things in the world of econometrics. Here is a more common way:

http://planetmath.org/encyclopedia/KroneckerProduct.html

I share Sven's expectation.
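For concreteness (a small sketch; the point is only that the result is itself an ordinary block matrix):

    import numpy

    a = numpy.array([[1, 2],
                     [3, 4]])
    b = numpy.array([[1, 0],
                     [0, 1]])
    # kron(a, b) is the block matrix whose (i, j) block is a[i, j] * b
    print numpy.kron(a, b)
    # [[1 0 2 0]
    #  [0 1 0 2]
    #  [3 0 4 0]
    #  [0 3 0 4]]

So when a and b come in as numpy-matrices, a 4x4 numpy-matrix seems like the natural thing to hand back.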
Cheers, Alan Isaac From fullung at gmail.com Thu Apr 13 07:24:02 2006 From: fullung at gmail.com (Albert Strasheim) Date: Thu Apr 13 07:24:02 2006 Subject: [Numpy-discussion] Segfault when indexing on second or higher dimension with list or tuple In-Reply-To: <20060413121710.GA30372@dogbert.sdsl.sun.ac.za> References: <20060413121710.GA30372@dogbert.sdsl.sun.ac.za> Message-ID: <20060413142246.GA6870@dogbert.sdsl.sun.ac.za> Hello all I've attached a test case that reproduces the bug to the ticket: http://projects.scipy.org/scipy/numpy/attachment/ticket/59/test_list_tuple_indexing.diff I've also created a test case for the recent vectorize bug: http://projects.scipy.org/scipy/numpy/attachment/ticket/52/test_vectorize.diff Regards, Albert On Thu, 13 Apr 2006, Albert Strasheim wrote: > Hello all, > > The following segfault bug was discovered in NumPy 0.9.7.2348 by > someone at our Python workshop: > > import numpy as N > F = N.zeros((1,1)) > F[:,[0]] = 0 > > The following also segfaults: > > F[:,(0,)] = 0 > > Something seems to go wrong when one uses a tuple or a list to index > into a NumPy array on the second or higher dimension, since the > following code works: > > F = N.zeros((1,)) > F[[0]] = 0 > > The Trac ticket is here: > > http://projects.scipy.org/scipy/numpy/ticket/59 > > If someone gets around to fixing this, please include some test cases. > > Thanks! > > Regards, > > Albert > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From oliphant.travis at ieee.org Thu Apr 13 07:58:05 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 13 07:58:05 2006 Subject: [Numpy-discussion] index objects are not broadcastable to a single shape In-Reply-To: <20060413133326.2889a5c5.simon@arrowtheory.com> References: <20060413111612.3bb4e6fc.simon@arrowtheory.com> <443DA966.1020301@ee.byu.edu> <20060413133326.2889a5c5.simon@arrowtheory.com> Message-ID: <443E66AC.2020108@ieee.org> Simon Burton wrote: > On Wed, 12 Apr 2006 19:29:10 -0600 > Travis Oliphant wrote: > > >> The problem with these error messages is that some code is used in a >> wide-variety of circumstances. The original error message was conceived >> in thinking about the application of the code to one circumstance while >> this particular error is occurring in a different one. >> >> The standard behavior is to just propagate the error up. Better error >> messages means catching a lot more errors and special-casing error >> messages. It can be done, but it's tedious work. >> > > OK. Can the error message be a little more generic, longer, etc. ? > > Absolutely, I should have finished the above message with an appeal for more helpful generic messages. All suggestions are welcome. > "shape mismatch (index objects are not broadcastable to a single shape)" ? > Definitely better. I would probably drop the index qualifier as well. Thanks for the tip. 
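Something like this, say (exact wording to be determined):

    >>> a*b
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    ValueError: shape mismatch (objects are not broadcastable to a single shape)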
-Travis From oliphant.travis at ieee.org Thu Apr 13 08:16:13 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 13 08:16:13 2006 Subject: [Numpy-discussion] Segfault when indexing on second or higher dimension with list or tuple In-Reply-To: <20060413121710.GA30372@dogbert.sdsl.sun.ac.za> References: <20060413121710.GA30372@dogbert.sdsl.sun.ac.za> Message-ID: <443E6B01.7000906@ieee.org> Albert Strasheim wrote: > Hello all, > > The following segfault bug was discovered in NumPy 0.9.7.2348 by > someone at our Python workshop: > > import numpy as N > F = N.zeros((1,1)) > F[:,[0]] = 0 > > The following also segfaults: > > F[:,(0,)] = 0 > > Something seems to go wrong when one uses a tuple or a list to index > into a NumPy array on the second or higher dimension, since the > following code works: > > The segfault was due to an error condition not being caught. This is now fixed, so now you get (a rather cryptic error). Now, to figure out why this code doesn't work.... -Travis From oliphant.travis at ieee.org Thu Apr 13 08:29:01 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 13 08:29:01 2006 Subject: [Numpy-discussion] Segfault when indexing on second or higher dimension with list or tuple In-Reply-To: <443E6B01.7000906@ieee.org> References: <20060413121710.GA30372@dogbert.sdsl.sun.ac.za> <443E6B01.7000906@ieee.org> Message-ID: <443E6DF1.5020206@ieee.org> Travis Oliphant wrote: > Albert Strasheim wrote: >> Hello all, >> >> The following segfault bug was discovered in NumPy 0.9.7.2348 by >> someone at our Python workshop: >> >> import numpy as N >> F = N.zeros((1,1)) >> F[:,[0]] = 0 >> >> The following also segfaults: >> >> F[:,(0,)] = 0 >> >> Something seems to go wrong when one uses a tuple or a list to index >> into a NumPy array on the second or higher dimension, since the >> following code works: >> >> > The segfault was due to an error condition not being caught. This is > now fixed, so now you get (a rather cryptic error). Now, to figure > out why this code doesn't work.... > The problem is that the code is not handling arbitrary shapes on the RHS of the equal sign. I'll enter a ticket and fix this before 0.9.8. Basically, right now, the RHS needs to have the same shape as the LHS so F[:,[0]] = [[0]] should work already. -Travis From oliphant.travis at ieee.org Thu Apr 13 08:43:14 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 13 08:43:14 2006 Subject: [Numpy-discussion] Re: ***[Possible UCE]*** [SciPy-user] Regarding what "where" returns In-Reply-To: <443E42A2.80402@ntc.zcu.cz> References: <443C0D36.80608@enthought.com> <443D39F6.6040805@enthought.com> <443D601E.3020500@enthought.com> <443D857F.9000605@ee.byu.edu> <443E42A2.80402@ntc.zcu.cz> Message-ID: <443E7150.2010006@ieee.org> Robert Cimrman wrote: > Travis Oliphant wrote: >> I went ahead and made this change to the code. The nonzero >> function still behaves as before (and in fact only works for 1-d >> arrays as it did in Numeric). >> >> The where(condition) function works the same as condition.nonzero() >> and both always return a tuple. > > So, for 1-d arrays, using 'nonzero( condition )' should be faster than > 'where( condition )[0]', right? > No. since the function just selects off the first element of the tuple returned by the method... 
'condition.nonzero()[0]' may be *slightly* faster than 'where(condition)[0]' however

-Travis

From tim.hochberg at cox.net Thu Apr 13 08:44:47 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Thu Apr 13 08:44:47 2006
Subject: [Numpy-discussion] Toward release 1.0 of NumPy
In-Reply-To:
References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net>
Message-ID: <443E7109.6080808@cox.net>

Alan G Isaac wrote:

>On Thu, 13 Apr 2006, Charles R Harris apparently wrote:
>
>>The Kronecker product (aka Tensor product) of two
>>matrices isn't a matrix.
>
>That is an unusual way to describe things in
>the world of econometrics. Here is a more
>common way:
>http://planetmath.org/encyclopedia/KroneckerProduct.html
>I share Sven's expectation.

MathWorld also agrees with you. As does the documentation (as best as I can tell) and the actual output of kron. I think Charles must be thinking of the tensor product instead.

In fact, if you look at the code you see this:

    # TODO: figure out how to keep arrays the same

I think that in general this is going to be a bit of an issue whenever we have multiple arguments. Let me propose the world's second dumbest (in a good way, maybe) procedure:

    def kron(a, b):
        wrappers = [(getattr(x, '__array_priority__', 0), x.__array_wrap__)
                    for x in [a, b] if hasattr(x, '__array_wrap__')]
        if wrappers:
            wrappers.sort()              # highest priority wins ties
            priority, wrap = wrappers[-1]
        else:
            wrap = None
        # ....
        result = concatenate(concatenate(o, axis=1), axis=1)
        if wrap is not None:
            result = wrap(result)
        return result

This generalizes what _wrapit does to arbitrary arguments: it breaks 'ties' where more than one argument wants to wrap something by using __array_priority__. You'd actually want to factor out the wrapper finding code.

Thoughts? Better plans?

-tim

From ryanlists at gmail.com Thu Apr 13 09:11:10 2006
From: ryanlists at gmail.com (Ryan Krauss)
Date: Thu Apr 13 09:11:10 2006
Subject: [Numpy-discussion] where
Message-ID:

Can someone help me understand the proper use of where?

I want to use it like this

myvect=where(f>19.5 and phase>0, f, phase)

but I seem to be getting or rather than and.

Thanks,

Ryan

From oliphant at ee.byu.edu Thu Apr 13 09:18:05 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Apr 13 09:18:05 2006
Subject: [Numpy-discussion] where
In-Reply-To:
References:
Message-ID: <443E79A5.2000700@ee.byu.edu>

Ryan Krauss wrote:

>Can someone help me understand the proper use of where?
>
>I want to use it like this
>
>myvect=where(f>19.5 and phase>0, f, phase)
>
>but I seem to be getting or rather than and.

It is probably your use of the 'and' statement. Use '&' instead:

(f > 19.5) & (phase > 0)

What version are you using? In numarray and NumPy the use of 'and' like this should raise an error if 'f' and/or 'phase' are arrays of more than one element.

-Travis

From ryanlists at gmail.com Thu Apr 13 09:27:06 2006
From: ryanlists at gmail.com (Ryan Krauss)
Date: Thu Apr 13 09:27:06 2006
Subject: [Numpy-discussion] where
In-Reply-To: <443E79A5.2000700@ee.byu.edu>
References: <443E79A5.2000700@ee.byu.edu>
Message-ID:

Does where return a mask?

If I do

myvect=where((f > 19.5) & (phase > 0),f,phase)

myvect is the same length as f and phase and there is some modification of the values where the condition is met, but what that modification is is unclear to me.

If I do

myind=where((f > 19.5) & (phase > 0))

I seem to get the indices of the points where both conditions are met.

I am using version 0.9.5.2043.
I see those kinds of errors about truth testing an array often, but not in this case. Thanks, Ryan On 4/13/06, Travis Oliphant wrote: > Ryan Krauss wrote: > > >Can someone help me understand the proper use of where? > > > >I want to use it like this > > > >myvect=where(f>19.5 and phase>0, f, phase) > > > >but I seem to be getting or rather than and. > > > > > > > It is probably your use of the 'and' statement. Use '&' instead > > (f > 19.5) & (phase > 0) > > What version are you using. In numarray and NumPy the use of 'and' like > this should raise an error if 'f' and/or 'phase' are arrays of more than > one element. > > -Travis > > From oliphant at ee.byu.edu Thu Apr 13 09:39:04 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 13 09:39:04 2006 Subject: [Numpy-discussion] where In-Reply-To: References: <443E79A5.2000700@ee.byu.edu> Message-ID: <443E7E7B.2030203@ee.byu.edu> Ryan Krauss wrote: >Does where return a mask? > > Only in the second use case... >If I do >myvect=where((f > 19.5) & (phase > 0),f,phase) >myvect is the same length as f and phase and there is some >modification of the values where the condition is met, but what that >modification is is unclear to me. > > The behavior of where(condition, for_true, for_false) is to return an array of the same shape as condition with elements of for_true where condition is true and for_false where condition is false. Thus myvect will contain elements of f where the condition is met and elements of phase otherwise. >If I do >myind=where((f > 19.5) & (phase > 0)) >I seem to get the indices of the points where both conditions are met. > > Yes. That is correct. It is a different use-case... Note, however, that in the current SVN version of NumPy, this use-case will always return a tuple of indices (use the nonzero function instead for behavior that will stay constant). For your 1-d example (I'm guessing it's 1-d) where will return a length-1 tuple. >I am using version 0.9.5.2043. I see those kinds of errors about >truth testing an array often, but not in this case. > > That is strange. What are the sizes of f and phase? -Travis From robert.kern at gmail.com Thu Apr 13 09:42:04 2006 From: robert.kern at gmail.com (Robert Kern) Date: Thu Apr 13 09:42:04 2006 Subject: [Numpy-discussion] Re: where In-Reply-To: References: <443E79A5.2000700@ee.byu.edu> Message-ID: Ryan Krauss wrote: > Does where return a mask? > > If I do > myvect=where((f > 19.5) & (phase > 0),f,phase) > myvect is the same length as f and phase and there is some > modification of the values where the condition is met, but what that > modification is is unclear to me. > > If I do > myind=where((f > 19.5) & (phase > 0)) > I seem to get the indices of the points where both conditions are met. > > I am using version 0.9.5.2043. I see those kinds of errors about > truth testing an array often, but not in this case. Have you read the docstring? In [33]: where? Type: builtin_function_or_method Base Class: String Form: Namespace: Interactive Docstring: where(condition, | x, y) is shaped like condition and has elements of x and y where condition is respectively true or false. If x or y are not given, then it is equivalent to nonzero(condition). -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From ryanlists at gmail.com Thu Apr 13 09:44:01 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Thu Apr 13 09:44:01 2006 Subject: [Numpy-discussion] where In-Reply-To: <443E7E7B.2030203@ee.byu.edu> References: <443E79A5.2000700@ee.byu.edu> <443E7E7B.2030203@ee.byu.edu> Message-ID: f and phase are each (4250,) I have something that is working but doesn't use where. Can this be done easier using where: f1=f>19.5 f2=f<38 myf=f1&f2 myp=phase>0 myind=myf&myp correction=myind*-360 newphase=phase+correction Basically, can where give me an output vector of the same size as f and phase where the output is either 1 or 0? Ryan On 4/13/06, Travis Oliphant wrote: > Ryan Krauss wrote: > > >Does where return a mask? > > > > > Only in the second use case... > > >If I do > >myvect=where((f > 19.5) & (phase > 0),f,phase) > >myvect is the same length as f and phase and there is some > >modification of the values where the condition is met, but what that > >modification is is unclear to me. > > > > > > The behavior of > > where(condition, for_true, for_false) > > is to return an array of the same shape as condition with elements of > for_true where condition is true and > for_false where condition is false. > > Thus myvect will contain elements of f where the condition is met and > elements of phase otherwise. > > >If I do > >myind=where((f > 19.5) & (phase > 0)) > >I seem to get the indices of the points where both conditions are met. > > > > > Yes. That is correct. It is a different use-case... Note, however, > that in the current SVN version of NumPy, this use-case will always > return a tuple of indices (use the nonzero function instead for behavior > that will stay constant). For your 1-d example (I'm guessing it's 1-d) > where will return a length-1 tuple. > > >I am using version 0.9.5.2043. I see those kinds of errors about > >truth testing an array often, but not in this case. > > > > > That is strange. What are the sizes of f and phase? > > -Travis > > From robert.kern at gmail.com Thu Apr 13 09:54:05 2006 From: robert.kern at gmail.com (Robert Kern) Date: Thu Apr 13 09:54:05 2006 Subject: [Numpy-discussion] Re: where In-Reply-To: References: <443E79A5.2000700@ee.byu.edu> <443E7E7B.2030203@ee.byu.edu> Message-ID: Ryan Krauss wrote: > f and phase are each (4250,) > > I have something that is working but doesn't use where. Can this be > done easier using where: > > f1=f>19.5 > f2=f<38 > myf=f1&f2 > myp=phase>0 > myind=myf&myp > correction=myind*-360 > newphase=phase+correction (untested) phase[((f>19.5) & (f<38)) & (phase>0)] -= 360 > Basically, can where give me an output vector of the same size as f > and phase where the output is either 1 or 0? Why? The condition array that you would pass into where() is already such an array. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco

From arnd.baecker at web.de Thu Apr 13 10:07:14 2006
From: arnd.baecker at web.de (Arnd Baecker)
Date: Thu Apr 13 10:07:14 2006
Subject: [Numpy-discussion] range/arange
In-Reply-To: <200604131123.56171.lars.bittrich@googlemail.com>
References: <200604130507.40241.pgmdevlist@mailcan.com> <200604131123.56171.lars.bittrich@googlemail.com>
Message-ID:

On Thu, 13 Apr 2006, Lars Bittrich wrote:

> Hi,
>
> On Thursday 13 April 2006 11:07, Pierre GM wrote:
> > Could any of you explain me why the two following commands give different
> > results ? It's mere curiosity, for my personal edification.
> >
> > [(m-5)/10 for m in arange(1,10)]
> > [0, 0, 0, 0, 0, 0, 0, 0, 0]
> >
> > [(m-5)/10 for m in range(1,10)]
> > [-1, -1, -1, -1, 0, 0, 0, 0, 0]
>
> I have no idea where the reason is located exactly, but it seems to be caused
> by different types of range and arange.

Interestingly with Numeric you get the following:

In [1]: from Numeric import *
In [2]: [(m-5)/10 for m in arange(1,10)]
Out[2]: [-1, -1, -1, -1, 0, 0, 0, 0, 0]
In [3]: type(arange(1,10)[0])
Out[3]: <type 'int'>

Will this cause any trouble for projects transitioning from Numeric to numpy? Presumably a proper explanation (which?) should go into the scipy wiki ("Converting from Numeric").

> In [15]: type(arange(1,10)[0])
> Out[15]: <type 'int32scalar'>
>
> In [14]: type(range(1,10)[0])
> Out[14]: <type 'int'>
>
> If you use for example:
>
> In [16]: -1/10
> Out[16]: -1
>
> you get the normal behavior of the "floor" function.
>
> In [17]: floor(-.1)
> Out[17]: -1.0
>
> The behavior of int32scalar seems more intuitive to me.

Me too.

Best, Arnd

From ryanlists at gmail.com Thu Apr 13 10:12:06 2006
From: ryanlists at gmail.com (Ryan Krauss)
Date: Thu Apr 13 10:12:06 2006
Subject: [Numpy-discussion] Re: where
In-Reply-To:
References: <443E79A5.2000700@ee.byu.edu>
Message-ID:

Sorry, I can't explain myself. I read the docstring and it didn't make sense before. Now it seems clear enough. Somehow I got it in my head that I needed to be passing f and phase so that condition could use them. It turns out that this:

myvect=where((f>19.5) & (f<38) & (phase>0),ones(shape(phase)),zeros(shape(phase)))

does exactly what I want.

Ryan

On 4/13/06, Robert Kern wrote:
> Ryan Krauss wrote:
> > Does where return a mask?
> >
> > If I do
> > myvect=where((f > 19.5) & (phase > 0),f,phase)
> > myvect is the same length as f and phase and there is some
> > modification of the values where the condition is met, but what that
> > modification is is unclear to me.
> >
> > If I do
> > myind=where((f > 19.5) & (phase > 0))
> > I seem to get the indices of the points where both conditions are met.
> >
> > I am using version 0.9.5.2043. I see those kinds of errors about
> > truth testing an array often, but not in this case.
>
> Have you read the docstring?
>
> In [33]: where?
> Type: builtin_function_or_method
> Base Class:
> String Form:
> Namespace: Interactive
> Docstring:
> where(condition, | x, y) is shaped like condition and has elements of x and
> y where condition is respectively true or false. If x or y are not given, then
> it is equivalent to nonzero(condition).
>
> --
> Robert Kern
> robert.kern at gmail.com

From ryanlists at gmail.com Thu Apr 13 10:15:03 2006
From: ryanlists at gmail.com (Ryan Krauss)
Date: Thu Apr 13 10:15:03 2006
Subject: [Numpy-discussion] Re: where
In-Reply-To:
References: <443E79A5.2000700@ee.byu.edu> <443E7E7B.2030203@ee.byu.edu>
Message-ID:

> Why? The condition array that you would pass into where() is already such an array.

That is the key point I was missing. Until I played around with the conditions myself I didn't get that I was passing in an explicit array of 1's and 0's. I guess I thought I was passing in some magic expression that where was somehow making sense of. That is why I thought I would need to pass f and phase to the function.
>
> Ryan
>
> [snip]

From oliphant at ee.byu.edu Thu Apr 13 10:49:06 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Apr 13 10:49:06 2006
Subject: [Numpy-discussion] range/arange
In-Reply-To:
References: <200604130507.40241.pgmdevlist@mailcan.com> <200604131123.56171.lars.bittrich@googlemail.com>
Message-ID: <443E8EEB.9070609@ee.byu.edu>

Arnd Baecker wrote:

> [snip]
>
> Will this cause any trouble for projects
> transitioning from Numeric to numpy?
> Presumably a proper explanation (which?)
> should go into the scipy wiki ("Converting from Numeric").

Yes, some discussion will be needed about the fact that NumPy now has its
own scalars. This will give us quite a bit more flexibility moving forward
and should be seamless for the most part.

-Travis

From pgmdevlist at mailcan.com Thu Apr 13 11:29:09 2006
From: pgmdevlist at mailcan.com (Pierre GM)
Date: Thu Apr 13 11:29:09 2006
Subject: [Numpy-discussion] Re: range/arange
In-Reply-To:
References: <200604130507.40241.pgmdevlist@mailcan.com>
Message-ID: <200604131456.48570.pgmdevlist@mailcan.com>

> Python's rule for integer division is to round towards negative infinity.
> C's rule (if it has one; I think it may be platform dependent) is to round
> towards 0.

Ah OK. That makes sense, and it's something I'll have to keep in mind later
on. Thanks y'all for your answers, I feel quite edified now :)
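The two rounding rules Pierre is being pointed at can be checked in a few
lines. A standalone sketch, with the truncating rule written out as a
hypothetical helper (c_div is not a numpy or Python function, just an
illustration):

import math

# Python's integer division floors: it rounds towards negative infinity,
# in agreement with math.floor.
assert (-1) // 10 == -1
assert math.floor(-0.1) == -1.0

# C-style division truncates, i.e. rounds towards zero.  This is what the
# 2006-era int32 scalar in the thread above was doing when (1-5)/10 came
# out as 0.
def c_div(a, b):
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q

assert c_div(-1, 10) == 0       # truncation
assert c_div(1 - 5, 10) == 0    # numpy's answer in the original example
assert (1 - 5) // 10 == -1      # Python's answer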
From ndarray at mac.com Thu Apr 13 11:53:00 2006
From: ndarray at mac.com (Sasha)
Date: Thu Apr 13 11:53:00 2006
Subject: [Numpy-discussion] Toward release 1.0 of NumPy
In-Reply-To: <443D9543.8040601@ee.byu.edu>
References: <443D9543.8040601@ee.byu.edu>
Message-ID:

On 4/12/06, Travis Oliphant wrote:
> ... This also dove-tails nicely
> with the Python 2.5 release schedule so that NumPy 1.0 should work with
> Python 2.5 and be fully 64-bit capable for handling very-large arrays.

I would like to mention one feature that is going to appear in Python 2.5
that covers some of the functionality of NumPy. I am talking about the
ctypes module. Like NumPy, ctypes provides a set of Python classes that
represent basic C types:

c_byte
c_char
c_char_p
c_double
c_float
c_int
c_long
c_short
c_ubyte
...

and the ability to describe composite structures. The latter functionality
is very close to what the dtype class provides in numpy.

There are some features in ctypes that I like better than similar features
in numpy. For example, in ctypes a fixed-width array is described by
multiplying a basic type by an integer:

>>> c_char * 10

I find this approach more elegant than numpy's dtype('S10').

It looks like there is some synergy to be exploited here, particularly in
the area of record arrays.

From oliphant at ee.byu.edu Thu Apr 13 12:49:02 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Apr 13 12:49:02 2006
Subject: [Numpy-discussion] Toward release 1.0 of NumPy
In-Reply-To:
References: <443D9543.8040601@ee.byu.edu>
Message-ID: <443EAB01.8040700@ee.byu.edu>

Sasha wrote:

> [snip]
>
> It looks like there is some synergy to be exploited here, particularly
> in the area of record arrays.

Definitely. I'm not familiar enough with ctypes to do this. Any help is
appreciated.

-Travis
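The parallel Sasha draws can be made concrete. A small illustrative sketch,
not from the original mails, putting the standard-library ctypes spelling
next to the numpy one (the Point structure is an invented example):

import ctypes
import numpy

TenChars = ctypes.c_char * 10        # fixed-width array type: type * length
print(TenChars)                      # a c_char array class of length 10
print(numpy.dtype('S10').itemsize)   # 10 -- numpy's spelling of that field

# Composite structures line up the same way:
class Point(ctypes.Structure):
    _fields_ = [('x', ctypes.c_double), ('y', ctypes.c_double)]

point_dtype = numpy.dtype([('x', numpy.float64), ('y', numpy.float64)])
print(ctypes.sizeof(Point), point_dtype.itemsize)   # both 16, typically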
From charlesr.harris at gmail.com Thu Apr 13 13:33:08 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu Apr 13 13:33:08 2006
Subject: [Numpy-discussion] Toward release 1.0 of NumPy
In-Reply-To: <443E7109.6080808@cox.net>
References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net>
Message-ID:

Tim,

On 4/13/06, Tim Hochberg wrote:
> Alan G Isaac wrote:
> > On Thu, 13 Apr 2006, Charles R Harris apparently wrote:
> > > The Kronecker product (aka Tensor product) of two
> > > matrices isn't a matrix.
> >
> > That is an unusual way to describe things in
> > the world of econometrics. Here is a more
> > common way:
> > http://planetmath.org/encyclopedia/KroneckerProduct.html
> > I share Sven's expectation.
>
> mathworld also agrees with you. As does the documentation (as best as I
> can tell) and the actual output of kron. I think Charles must be
> thinking of the tensor product instead.

It *is* the tensor product, A \tensor B, but it is not the most general
tensor with four indices, just as a bivector is not the most general tensor
with two indices. Numerically, kron chooses to represent the tensor product
of two vector spaces a, b with dimensions n, m respectively as the direct
sum of n copies of b, and the tensor product of two operators takes the
given form. More generally, the B matrix in each spot could be replaced
with an arbitrary matrix of the correct dimensions and you would recover
the general tensor with four indices.

Anyway, it sounds like you are proposing that the tensor (outer) product of
two matrices be reshaped to run over two indices. It seems that likewise
the tensor (outer) product of two vectors should be reshaped to run over
one index (i.e. flat). That would do the trick.

Chuck

From charlesr.harris at gmail.com Thu Apr 13 14:19:01 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu Apr 13 14:19:01 2006
Subject: [Numpy-discussion] Toward release 1.0 of NumPy
In-Reply-To:
References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net>
Message-ID:

Tim,

In particular:

def kron(a, b):
    n = shape(a)[1]*shape(b)[1]
    c = transpose(multiply.outer(a, b), axes=(0, 2, 1, 3)).reshape(-1, n)
    # wrap c as a matrix.

On 4/13/06, Charles R Harris wrote:
> [snip]
>
> Chuck

From tim.hochberg at cox.net Thu Apr 13 14:32:04 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Thu Apr 13 14:32:04 2006
Subject: [Numpy-discussion] Toward release 1.0 of NumPy
In-Reply-To:
References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net>
Message-ID: <443EC2B4.807@cox.net>

Charles R Harris wrote:

> [snip]
> Anyway, it sounds like you are proposing that the tensor (outer) product
> of two matrices be reshaped to run over two indices. It seems that
> likewise the tensor (outer) product of two vectors should be reshaped to
> run over one index (i.e. flat). That would do the trick.

I'm not proposing anything. I don't care at all what kron does. I just want
to fix the return type if that's feasible so that people stop complaining
about it. As far as I can tell, kron already returns a flattened tensor
product of some sort. I believe the general tensor product that you are
talking about is already covered by multiply.outer, but I'm not sure so
correct me if I'm wrong. Here's what kron does at present:

>>> a
array([[1, 1],
       [1, 1]])
>>> kron(a,a)  # => 4x4 matrix
array([[1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1],
       [1, 1, 1, 1]])
>>> kron(a,a[0])  # => 8x1
array([1, 1, 1, 1, 1, 1, 1, 1])
>>> kron(a[0], a[0])
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "C:\Python24\Lib\site-packages\numpy\lib\shape_base.py", line 577, in kron
    result = concatenate(concatenate(o, axis=1), axis=1)
ValueError: 0-d arrays can't be concatenated
>>> b.shape
(2, 2, 2)
>>> kron(b,b).shape
(4, 4, 2, 2)

So, it looks like the 2d x 2d product obeys Alan's definition. The other
products are probably all broken.

Regards,

-tim

From charlesr.harris at gmail.com Thu Apr 13 16:02:04 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu Apr 13 16:02:04 2006
Subject: [Numpy-discussion] Toward release 1.0 of NumPy
In-Reply-To: <443EC2B4.807@cox.net>
References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> <443EC2B4.807@cox.net>
Message-ID:

On 4/13/06, Tim Hochberg wrote:
> [snip]
> Here's what kron does at present:
>
> >>> a
> array([[1, 1],
>        [1, 1]])
> >>> kron(a,a)  # => 4x4 matrix
> array([[1, 1, 1, 1],
>        [1, 1, 1, 1],
>        [1, 1, 1, 1],
>        [1, 1, 1, 1]])

Good at first look. Let's see a simpler version... Nevermind, seems numpy
isn't working on this machine (X86_64, fc5 64 bit) at the moment, maybe I
need to check out a clean version.

> >>> kron(a,a[0])  # => 8x1
> array([1, 1, 1, 1, 1, 1, 1, 1])

Looks broken. a[0] should be an operator (matrix), so either it should be
(2,1) or (1,2). In the first case, the return should have shape (4,2), in
the latter (2,4). Should probably raise an error, as the result strikes me
as ambiguous. But I have to admit I am not sure what the point of this
particular construction is.

> >>> kron(a[0], a[0])
> Traceback (most recent call last):
>   ...
> ValueError: 0-d arrays can't be concatenated

See above. This could be (1,4) or (4,1), depending.

> >>> b.shape
> (2, 2, 2)
> >>> kron(b,b).shape
> (4, 4, 2, 2)

I think this is doing transpose(outer(b,b), axes=(0,2,1,3)) and reshaping
the first 4 indices into 2. Again, I am not sure what the point is for
these operators. Now another way to get all this functionality is to have a
contraction function or method with a list of axes. For instance, consider
the matrices A(i,j) and B(k,l) operating on x(j) and y(l) like A(i,j)x(j)
and B(k,l)y(l), then the outer product of all of these is

A(i,j)B(k,l)x(j)y(l)

with the summation convention on the indices j and l. The result should be
the same as kron(A,B)*kron(x,y) up to a permutation of rows and columns. It
is just a question of which basis is used and how the elements are indexed.

> So, it looks like the 2d x 2d product obeys Alan's definition. The other
> products are probably all broken.

Chuck

From aisaac at american.edu Thu Apr 13 16:21:08 2006
From: aisaac at american.edu (Alan G Isaac)
Date: Thu Apr 13 16:21:08 2006
Subject: [Numpy-discussion] Toward release 1.0 of NumPy
In-Reply-To: <443EC2B4.807@cox.net>
References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> <443EC2B4.807@cox.net>
Message-ID:

On Thu, 13 Apr 2006, Tim Hochberg apparently wrote:
> Here's what kron does at present:

As possible context:
http://www.mathworks.com/access/helpdesk/help/techdoc/ref/kron.html#998881
http://www.aptech.com/pdf_man/basicgauss.pdf p.69
In this sense, the 2-d handling is not surprising.

Cheers,
Alan Isaac
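Alan's references define kron(A, B) as the block matrix whose (i, j) block
is A[i, j]*B. That the reshaped outer product sketched earlier in the
thread computes the same thing is easy to verify; a standalone check, with
arbitrarily chosen shapes:

import numpy as np

a = np.arange(6.0).reshape(2, 3)
b = np.arange(8.0).reshape(2, 4)

# Block-matrix definition: block (i, j) of the result is a[i, j] * b.
blocks = np.multiply.outer(a, b)                  # shape (2, 3, 2, 4)
k = blocks.transpose(0, 2, 1, 3).reshape(2 * 2, 3 * 4)

assert np.allclose(k, np.kron(a, b))              # agrees with numpy's kron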
From charlesr.harris at gmail.com Thu Apr 13 16:32:01 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu Apr 13 16:32:01 2006
Subject: [Numpy-discussion] Toward release 1.0 of NumPy
In-Reply-To:
References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> <443EC2B4.807@cox.net>
Message-ID:

Hi,

On 4/13/06, Alan G Isaac wrote:
> On Thu, 13 Apr 2006, Tim Hochberg apparently wrote:
> > Here's what kron does at present:
>
> As possible context:
> http://www.mathworks.com/access/helpdesk/help/techdoc/ref/kron.html#998881
> http://www.aptech.com/pdf_man/basicgauss.pdf p.69
> In this sense, the 2-d handling is not surprising.

Yep, that is what the little Python routine I gave above does. Note that in
these cases only matrices are involved. Matlab, for instance, defines
vectors as (1,n) or (n,1), which is actually helpful in minding the
distinction between a vector space and its dual. I don't know how the numpy
matrix package works, but vectors of rank 1 are going to be a constant
source of ambiguity.

> Cheers,
> Alan Isaac

Chuck

From tim.hochberg at cox.net Thu Apr 13 16:37:04 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Thu Apr 13 16:37:04 2006
Subject: [Numpy-discussion] Toward release 1.0 of NumPy
In-Reply-To:
References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> <443EC2B4.807@cox.net>
Message-ID: <443EDFE7.6010509@cox.net>

Charles R Harris wrote:

> [snip]
>
> > I'm not proposing anything. I don't care at all what kron does. I just
> > want to fix the return type if that's feasible so that people stop
> > complaining about it. As far as I can tell, kron already returns a
> > flattened tensor product of some sort.
> > I believe the general tensor product that you are talking about is
> > already covered by multiply.outer, but I'm not sure so correct me if
> > I'm wrong. Here's what kron does at present:
> >
> > >>> kron(a,a[0])  # => 8x1
> > array([1, 1, 1, 1, 1, 1, 1, 1])
>
> Looks broken. a[0] should be an operator (matrix), so either it should
> be (2,1) or (1,2).

Since a is an array here, a[0] is shape (2,). Let's repeat this exercise
using matrices, which are always rank-2, and see if they make sense.

>>> m
matrix([[1, 1],
        [1, 1]])
>>> kron(m, m[0])
matrix([[1, 1, 1, 1],
        [1, 1, 1, 1]])
>>> kron(m,m[:,0])
matrix([[1, 1],
        [1, 1],
        [1, 1],
        [1, 1]])

That looks OK.

> In the first case, the return should have shape (4,2), in the latter
> (2,4). Should probably raise an error, as the result strikes me as
> ambiguous. But I have to admit I am not sure what the point of this
> particular construction is.
>
> > >>> kron(a[0], a[0])
> > Traceback (most recent call last):
> >   ...
> > ValueError: 0-d arrays can't be concatenated

>>> kron(m[0], m[0])
matrix([[1, 1, 1, 1]])
>>> kron(m[:,0], m[:,0])
matrix([[1],
        [1],
        [1],
        [1]])
>>> kron(m[:,0],m[0])
matrix([[1, 1],
        [1, 1]])

> See above. This could be (1,4) or (4,1), depending.

All of these look like they're probably right without thinking about it
too hard.

> > >>> b.shape
> > (2, 2, 2)
> > >>> kron(b,b).shape
> > (4, 4, 2, 2)
>
> [snip]

Here's my best guess as to what is going on:

1. There is a relatively large group of people who use Kronecker product
as Alan does (probably the matrix as opposed to tensor math folks). I'm
guessing it's a large group since they manage to write the definitions at
both mathworld and planetmath.
2. kron was meant to implement this.
2.5 People who need the other meaning of kron can just use outer, so no
real conflict.
3. The implementation was either inappropriately generalized or it was
assumed that all inputs would be matrices (and hence rank-2).

Assuming 3. is correct, and I'd like to hear from people if they think that
the behaviour in the non rank-2 cases is sensible, the next question is
whether the behaviour in the rank-2 cases makes sense. It seems to, but I'm
not a user of kron.
If both of the preceding are true, it seems like a complete fix entails the
following two things:

1. Forbid arguments that are not rank-2. This allows all matrices, which is
really the main target here, I think.
2. Fix the return type issue. I have a fix for this ready to commit, but I
want to figure out the first part as well.

Regards,

-tim
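Tim's two-point fix can be sketched in a few lines. This is only an
illustration of the proposal as stated, not the patch he actually
committed, and kron_rank2 is a made-up name:

import numpy as np

def kron_rank2(a, b):
    # Point 1: forbid anything that is not rank-2; matrices always qualify.
    if np.ndim(a) != 2 or np.ndim(b) != 2:
        raise ValueError("kron is only defined for rank-2 arguments")
    out = np.kron(np.asarray(a), np.asarray(b))
    # Point 2: fix the return type, so matrix inputs give a matrix back.
    if isinstance(a, np.matrix) or isinstance(b, np.matrix):
        out = np.matrix(out)
    return out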
From charlesr.harris at gmail.com Thu Apr 13 17:14:32 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu Apr 13 17:14:32 2006
Subject: [Numpy-discussion] Toward release 1.0 of NumPy
In-Reply-To: <443EDFE7.6010509@cox.net>
References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> <443EC2B4.807@cox.net> <443EDFE7.6010509@cox.net>
Message-ID:

On 4/13/06, Tim Hochberg wrote:
> [snip]
>
> Here's my best guess as to what is going on:
>
> 1. There is a relatively large group of people who use Kronecker product
> as Alan does (probably the matrix as opposed to tensor math folks). I'm
> guessing it's a large group since they manage to write the definitions at
> both mathworld and planetmath.
> 2. kron was meant to implement this.
> 2.5 People who need the other meaning of kron can just use outer, so no
> real conflict.
> 3. The implementation was either inappropriately generalized or it was
> assumed that all inputs would be matrices (and hence rank-2).

Uh-huh.

> Assuming 3. is correct, and I'd like to hear from people if they think
> that the behaviour in the non rank-2 cases is sensible, the next question
> is whether the behaviour in the rank-2 cases makes sense. It seems to,
> but I'm not a user of kron. If both of the preceding are true, it seems
> like a complete fix entails the following two things:
>
> 1. Forbid arguments that are not rank-2. This allows all matrices, which
> is really the main target here, I think.
> 2. Fix the return type issue. I have a fix for this ready to commit, but
> I want to figure out the first part as well.
I think it was inappropriately generalized; it is hard to make sense of
what kron means for rank > 2. So I vote for restricting the usage to
matrices, or arrays of rank two. This avoids both the ambiguity of rank-1
arrays and the big "why?" that arises for arrays with rank > 2. Note that
in tensor algebra the rank-1 problem is solved by the use of upper or lower
indices: lower index => [1,n], upper index => [n,1]. Hmm, I should check
that kron is associative:

kron(kron(a,b),c) == kron(a, kron(b,c))

like a good tensor product should be. I suspect it is.

> Regards,
>
> -tim

Chuck
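Charles's associativity conjecture is easy to confirm numerically; a quick
standalone check, with arbitrarily chosen shapes:

import numpy as np

np.random.seed(0)
a = np.random.rand(2, 3)
b = np.random.rand(3, 2)
c = np.random.rand(2, 2)

# kron associates, as a tensor product should:
assert np.allclose(np.kron(np.kron(a, b), c),
                   np.kron(a, np.kron(b, c)))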
From charlesr.harris at gmail.com Thu Apr 13 17:22:01 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu Apr 13 17:22:01 2006
Subject: [Numpy-discussion] Problem on FC5
Message-ID:

Has anyone else seen this:

> Python 2.4.2 (#1, Feb 12 2006, 03:45:41)
> [GCC 4.1.0 20060210 (Red Hat 4.1.0-0.24)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> from numpy import *
> *** buffer overflow detected ***: python terminated
> ======= Backtrace: =========
> /lib64/libc.so.6(__chk_fail+0x2f)[0x32c76dee3f]
> /usr/lib64/python2.4/site-packages/numpy/core/multiarray.so[0x2aaaae191099]

This is on FC5-x86_64. I didn't see any problems in the compilation and the
right lib64 libs seem to have been used.

Chuck

From ivazquez at ivazquez.net Thu Apr 13 17:48:18 2006
From: ivazquez at ivazquez.net (Ignacio Vazquez-Abrams)
Date: Thu Apr 13 17:48:18 2006
Subject: [Numpy-discussion] Problem on FC5
In-Reply-To:
References:
Message-ID: <1144975662.3758.3.camel@ignacio.lan>

On Thu, 2006-04-13 at 18:21 -0600, Charles R Harris wrote:
> this is on FC5-x86_64. I didn't see any problems in the compilation
> and the right lib64 libs seem to have been used.

Self-built or from Fedora Extras?

--
Ignacio Vazquez-Abrams
http://fedora.ivazquez.net/

gpg --keyserver hkp://subkeys.pgp.net --recv-key 38028b72

From charlesr.harris at gmail.com Thu Apr 13 19:04:10 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Thu Apr 13 19:04:10 2006
Subject: [Numpy-discussion] Problem on FC5
In-Reply-To: <1144975662.3758.3.camel@ignacio.lan>
References: <1144975662.3758.3.camel@ignacio.lan>
Message-ID:

OK, I solved this problem by deleting the numpy directory in site-packages.
I probably should have tried that first :-/

On 4/13/06, Ignacio Vazquez-Abrams wrote:
> On Thu, 2006-04-13 at 18:21 -0600, Charles R Harris wrote:
> > this is on FC5-x86_64. I didn't see any problems in the compilation
> > and the right lib64 libs seem to have been used.
>
> Self-built or from Fedora Extras?

From chanley at stsci.edu Fri Apr 14 07:27:03 2006
From: chanley at stsci.edu (Christopher Hanley)
Date: Fri Apr 14 07:27:03 2006
Subject: [Numpy-discussion] numpy.test() segfaults under Solaris 8
Message-ID: <443FB11E.5040102@stsci.edu>

From the daily Solaris 8 regression tests:

Found 5 tests for numpy.distutils.misc_util
Found 4 tests for numpy.lib.getlimits
Found 30 tests for numpy.core.numerictypes
Found 13 tests for numpy.core.umath
Found 8 tests for numpy.lib.arraysetops
Found 42 tests for numpy.lib.type_check
Found 90 tests for numpy.core.multiarray
Found 3 tests for numpy.dft.helper
Found 36 tests for numpy.core.ma
Found 2 tests for numpy.core.oldnumeric
Found 9 tests for numpy.lib.twodim_base
Found 8 tests for numpy.core.defmatrix
Found 1 tests for numpy.lib.ufunclike
Found 32 tests for numpy.lib.function_base
Found 1 tests for numpy.lib.polynomial
Found 6 tests for numpy.core.records
Found 17 tests for numpy.core.numeric
Found 4 tests for numpy.lib.index_tricks
Found 44 tests for numpy.lib.shape_base
Found 0 tests for __main__

[many "Warning: No test file found" lines and several hundred individual
test results elided; every test up to this point reported "ok"]

check_simple (numpy.lib.tests.test_function_base.test_unwrap) ... ok
check_vectorize (numpy.lib.tests.test_function_base.test_vectorize)Segmentation Fault (core dumped)

This is a clean checkout and build of numpy that is done every morning on a
Solaris 8 system. We are currently using python 2.4.2 on this machine. The
equivalent build and test on a RHE system passed with no problems.

Chris

From fullung at gmail.com Fri Apr 14 08:15:14 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Fri Apr 14 08:15:14 2006
Subject: [Numpy-discussion] Vectorize bug
In-Reply-To: <20060412124032.GA30471@sun.ac.za>
Message-ID: <006c01c65fd6$2d043b90$0502010a@dsp.sun.ac.za>

Hello all

There still seems to be a problem with vectorize (or something else). So far
I've only been able to reproduce the problem by running the test suite 5
times under IPython on Windows (weird, eh?). Details here:

http://projects.scipy.org/scipy/numpy/ticket/52

If anybody has some ideas on how to do a proper debug build with MinGW so
that I can get a useful stack trace from the Visual Studio debugger, I can
narrow down the problem further.

Regards,

Albert

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-
> discussion-admin at lists.sourceforge.net] On Behalf Of Stefan van der Walt
> Sent: 12 April 2006 14:41
> To: numpy-discussion at lists.sourceforge.net
> Subject: [Numpy-discussion] Vectorize bug
>
> Hello all
>
> Vectorize segfaults for large arrays.
I filed the bug at > > http://projects.scipy.org/scipy/numpy/ticket/52 > > The offending code is
>
> import numpy as N
> x = N.linspace(-3,2,10000)
> y = N.vectorize(lambda x: x)
>
> # Segfaults here
> y(x)
>
> Regards
> Stéfan
From fullung at gmail.com Fri Apr 14 08:18:02 2006 From: fullung at gmail.com (Albert Strasheim) Date: Fri Apr 14 08:18:02 2006 Subject: [Numpy-discussion] numpy.test() segfaults under Solaris 8 In-Reply-To: <443FB11E.5040102@stsci.edu> Message-ID: <006d01c65fd6$85b72450$0502010a@dsp.sun.ac.za> Hello Chris I am seeing this same crash on Windows under IPython with revision 2351 of NumPy from SVN. If you can get a useful stack trace on your platform, you could add some details to this ticket: http://projects.scipy.org/scipy/numpy/ticket/52 Regards, Albert > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy- > discussion-admin at lists.sourceforge.net] On Behalf Of Christopher Hanley > Sent: 14 April 2006 16:27 > To: numpy-discussion > Subject: [Numpy-discussion] numpy.test() segfaults under Solaris 8 > > From the daily Solaris 8 regression tests: > check_vectorize > (numpy.lib.tests.test_function_base.test_vectorize)Segmentation Fault > (core dumped) > > This is a clean checkout and build of numpy that is done every morning > on a Solaris 8 system. We are currently using python 2.4.2 on this > machine. The equivalent build and test on a RHE system passed with no > problems. > > Chris From faltet at xot.carabos.com Fri Apr 14 14:36:06 2006 From: faltet at xot.carabos.com (faltet at xot.carabos.com) Date: Fri Apr 14 14:36:06 2006 Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy Message-ID: <20060414213511.GA14355@xot.carabos.com> Hi, I'm seeing some slowness in NumPy when dealing with strided arrays. numarray is dealing better with these situations, so I guess that something could be done in NumPy about this. Below are the situations that I've found up to now (maybe there are others). For the timings, I've used numpy 0.9.7.2278 and numarray 1.5.1. 
It seems that NumPy copy() method is almost 3x slower than in numarray:

In [105]: npcopy=timeit.Timer('b=a.copy()','import numpy as np;a=np.arange(1000000,dtype="Float64")[::10]')
In [106]: npcopy.repeat(3,10)
Out[106]: [0.171913146972656, 0.175906896591186, 0.171195983886718]

In [107]: nacopy=timeit.Timer('b=a.copy()','import numarray as np;a=np.arange(1000000,type="Float64")[::10]')
In [108]: nacopy.repeat(3,10)
Out[108]: [0.065090894699096, 0.0630550384521484, 0.0626609325408935]

However, a copy without strides performs similarly in both packages

In [127]: npcopy2=timeit.Timer('b=a.copy()','import numpy as np;a=np.arange(1000000,dtype="Float64")')
In [128]: npcopy2.repeat(3,10)
Out[128]: [0.24657797813415527, 0.24657106399536133, 0.2464911937713623]

In [129]: nacopy2=timeit.Timer('b=a.copy()','import numarray as np;a=np.arange(1000000,type="Float64")')
In [130]: nacopy2.repeat(3,10)
Out[130]: [0.244544982910156, 0.251885890960693, 0.2419440746307373]

--------------------------------------------

where() seems more than 2x slower in NumPy than in numarray:

In [136]: tnpf=timeit.Timer('np.where(a + b < 10, a, b)','import numpy as np;a=np.arange(100000,dtype="float64");b=a*2')
In [137]: tnpf.repeat(3,10)
Out[137]: [0.225586891174316, 0.22503495216369629, 0.224209785461425]

In [138]: tnaf=timeit.Timer('np.where(a + b < 2, a, b)','import numarray as np;a=np.arange(100000,type="Float64");b=a*2')
In [139]: tnaf.repeat(3,10)
Out[139]: [0.108436822891235, 0.1069340705871582, 0.10654377937316895]

However, for where() without parameters, NumPy performs slightly better than numarray:

In [143]: tnpf2=timeit.Timer('np.where(a + b < 10)','import numpy as np;a=np.arange(100000,dtype="float64");b=a*2')
In [144]: tnpf2.repeat(3,10)
Out[144]: [0.0759999752044677, 0.0731539726257324, 0.073034048080444336]

In [145]: tnaf2=timeit.Timer('np.where(a + b < 2)','import numarray as np;a=np.arange(100000,type="Float64");b=a*2')
In [146]: tnaf2.repeat(3,10)
Out[146]: [0.0890851020812988, 0.0853078365325927, 0.085799932479858398]

Cheers, Francesc From oliphant at ee.byu.edu Fri Apr 14 14:54:06 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 14 14:54:06 2006 Subject: [Numpy-discussion] Vectorize bug In-Reply-To: <006c01c65fd6$2d043b90$0502010a@dsp.sun.ac.za> References: <006c01c65fd6$2d043b90$0502010a@dsp.sun.ac.za> Message-ID: <444019E8.8000700@ee.byu.edu> Albert Strasheim wrote: >Hello all > >There still seems to be a problem with vectorize (or something else). So far >I've only been able to reproduce the problem by running the test suite 5 >times under IPython on Windows (weird, eh?). Details here: > >http://projects.scipy.org/scipy/numpy/ticket/52 > > I'm pretty sure it's a reference-counting issue. I think I found the problem and it should now be fixed. I'm hoping this will clear up the Solaris issue as well. -Travis From oliphant at ee.byu.edu Fri Apr 14 16:04:02 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 14 16:04:02 2006 Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy In-Reply-To: <20060414213511.GA14355@xot.carabos.com> References: <20060414213511.GA14355@xot.carabos.com> Message-ID: <44402A2A.9050300@ee.byu.edu> faltet at xot.carabos.com wrote: >Hi, > >I'm seeing some slowness in NumPy when dealing with strided arrays. >numarray is dealing better with these situations, so I guess that >something could be done in NumPy about this. Below are the situations >that I've found up to now (maybe there are others). 
> For the timings, I've used numpy 0.9.7.2278 and numarray 1.5.1. > > What I've found in experiments like this in the past is that numarray is good at striding in one direction but much worse at striding in another direction for multi-dimensional arrays. Of course my experiments were not complete. That just seemed to be the case. The array-iterator construct handles almost all of these cases. The copy method is a good place to start since it uses that code. -Travis From fullung at gmail.com Fri Apr 14 16:34:06 2006 From: fullung at gmail.com (Albert Strasheim) Date: Fri Apr 14 16:34:06 2006 Subject: [Numpy-discussion] Vectorize bug In-Reply-To: <444019E8.8000700@ee.byu.edu> Message-ID: <00f301c6601b$d340a350$0502010a@dsp.sun.ac.za> Hello Travis I'm still getting the same crash when running via IPython, which is the only way I've been able to reproduce the crash on Windows. Just to confirm:

In [1]: import numpy
In [2]: numpy.__version__
Out[2]: '0.9.7.2356'

The crash now happens in check_large, which is the new name of the test method in question. Cheers, Albert > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy- > discussion-admin at lists.sourceforge.net] On Behalf Of Travis Oliphant > Sent: 14 April 2006 23:54 > To: numpy-discussion > Subject: Re: [Numpy-discussion] Vectorize bug > > Albert Strasheim wrote: > > >Hello all > > > >There still seems to be a problem with vectorize (or something else). So > far > >I've only been able to reproduce the problem by running the test suite 5 > >times under IPython on Windows (weird, eh?). Details here: > > > >http://projects.scipy.org/scipy/numpy/ticket/52 > > > > > I'm pretty sure it's a reference-counting issue. I think I found the > problem and it should now be fixed. > > I'm hoping this will clear up the Solaris issue as well. > > -Travis From oliphant at ee.byu.edu Fri Apr 14 16:43:07 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Fri Apr 14 16:43:07 2006 Subject: [Numpy-discussion] Vectorize bug In-Reply-To: <00f301c6601b$d340a350$0502010a@dsp.sun.ac.za> References: <00f301c6601b$d340a350$0502010a@dsp.sun.ac.za> Message-ID: <44403354.2040708@ee.byu.edu> Albert Strasheim wrote: >Hello Travis > >I'm still getting the same crash when running via IPython, which is the only >way I've been able to reproduce the crash on Windows. > >Just to confirm: > >In [1]: import numpy > >In [2]: numpy.__version__ >Out[2]: '0.9.7.2356' > >The crash now happens in check_large, which is the new name of the test >method in question. > > Do you have SciPy installed? Make sure you are not importing an old version of SciPy. I cannot reproduce this problem. -Travis From fullung at gmail.com Fri Apr 14 16:55:04 2006 From: fullung at gmail.com (Albert Strasheim) Date: Fri Apr 14 16:55:04 2006 Subject: [Numpy-discussion] Vectorize bug In-Reply-To: <44403354.2040708@ee.byu.edu> Message-ID: <00fa01c6601e$c7707840$0502010a@dsp.sun.ac.za> Hello I don't have SciPy installed. Is there any way of doing a debug build of the C code so that I can investigate this problem? You say that you cannot reproduce this problem. Are you trying to reproduce it on Linux or on Windows under IPython? I have also been unable to reproduce the crash on Linux, but as we saw earlier, this crash also cropped up on Solaris, without having to run the tests N times. 
Regards, Albert > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy- > discussion-admin at lists.sourceforge.net] On Behalf Of Travis Oliphant > Sent: 15 April 2006 01:42 > To: numpy-discussion > Subject: Re: [Numpy-discussion] Vectorize bug > > Albert Strasheim wrote: > > >Hello Travis > > > >I'm still getting the same crash when running via IPython, which is the > only > >way I've been able to reproduce the crash on Windows. > > > >Just to confirm: > > > >In [1]: import numpy > > > >In [2]: numpy.__version__ > >Out[2]: '0.9.7.2356' > > > >The crash now happens in check_large, which is the new name of the test > >method in question. > > > > > Do you have SciPy installed? > > Make sure you are not importing an old version of SciPy. > > I cannot reproduce this problem. > > -Travis From fullung at gmail.com Fri Apr 14 16:58:03 2006 From: fullung at gmail.com (Albert Strasheim) Date: Fri Apr 14 16:58:03 2006 Subject: [Numpy-discussion] Summer of Code 2006 Message-ID: <00fb01c6601f$26e19b10$0502010a@dsp.sun.ac.za> Hello all The Google Summer of Code site for 2006 is up: http://code.google.com/soc/ Maybe the NumPy team can propose a few projects to be funded by this program. Personally, I'd be interested in working on the build system, especially on Windows, and/or extending the test suite. Regards, Albert From fullung at gmail.com Fri Apr 14 17:19:05 2006 From: fullung at gmail.com (Albert Strasheim) Date: Fri Apr 14 17:19:05 2006 Subject: [Numpy-discussion] Vectorize bug In-Reply-To: <00fa01c6601e$c7707840$0502010a@dsp.sun.ac.za> Message-ID: <00fc01c66022$1b51fb70$0502010a@dsp.sun.ac.za> Hello all I think Valgrind might be very useful in tracking down this bug. http://valgrind.org/ Example usage:

~/bin/valgrind \
-v --error-limit=no --leak-check=full \
python -c 'import numpy; numpy.test()'

Valgrind emits many warnings for things going on inside Python on my Fedora Core 4 system, but there are also a lot of interesting things going on in the NumPy code. 
Some warnings that someone might want to look at:

==26750== Use of uninitialised value of size 4
==26750==    at 0x453D4B1: DOUBLE_to_OBJECT (arraytypes.inc:4470)
==26750==    by 0x46AB3F3: PyUFunc_GenericFunction (ufuncobject.c:1566)
==26750==    by 0x46ABE9F: ufunc_generic_call (ufuncobject.c:2653)

==26750== Conditional jump or move depends on uninitialised value(s)
==26750==    at 0x4556055: PyArray_Newshape (multiarraymodule.c:524)
==26750==    by 0x45568F4: PyArray_Reshape (multiarraymodule.c:369)
==26750==    by 0x4556931: array_shape_set (arrayobject.c:4642)

==26750== Address 0x41D2010 is 392 bytes inside a block of size 1,648 free'd
==26750==    at 0x4004F6B: free (vg_replace_malloc.c:235)
==26750==    by 0x46A53C3: ufuncloop_dealloc (ufuncobject.c:1280)
==26750==    by 0x46AAD60: PyUFunc_GenericFunction (ufuncobject.c:1656)
==26750==    by 0x46ABE9F: ufunc_generic_call (ufuncobject.c:2653)

==26750== Conditional jump or move depends on uninitialised value(s)
==26750==    at 0x454EE52: PyArray_NewFromDescr (arrayobject.c:4119)
==26750==    by 0x4550919: PyArray_GetField (arraymethods.c:265)
==26750==    by 0x456C05A: array_subscript (arrayobject.c:2010)
==26750==    by 0x456D606: array_subscript_nice (arrayobject.c:2250)

==26750== Conditional jump or move depends on uninitialised value(s)
==26750==    at 0x455ED1D: PyArray_MapIterReset (arrayobject.c:7788)
==26750==    by 0x456D087: array_ass_sub (arrayobject.c:1812)

A possible memory leak:

==26750== 6,051 (1,120 direct, 4,931 indirect) bytes in 28 blocks are definitely lost in loss record 35 of 55
==26750==    at 0x400444E: malloc (vg_replace_malloc.c:149)
==26750==    by 0x45442D8: array_alloc (arrayobject.c:5332)
==26750==    by 0x454F19D: PyArray_NewFromDescr (arrayobject.c:4155)
==26750==    by 0x46A61E4: construct_loop (ufuncobject.c:1000)
==26750==    by 0x46AAD09: PyUFunc_GenericFunction (ufuncobject.c:1401)
==26750==    by 0x46ABE9F: ufunc_generic_call (ufuncobject.c:2653)
==26750==    by 0x454243B: PyArray_GenericBinaryFunction (arrayobject.c:2593)
==26750==    by 0x456DA2C: PyArray_Round (multiarraymodule.c:291)

The following error is generated when the test segfaults:

==26750== Process terminating with default action of signal 11 (SIGSEGV)
==26750==  Access not within mapped region at address 0x10FFFF
==26750==    at 0x453D4B1: DOUBLE_to_OBJECT (arraytypes.inc:4470)
==26750==    by 0x46AB3F3: PyUFunc_GenericFunction (ufuncobject.c:1566)
==26750==    by 0x46ABE9F: ufunc_generic_call (ufuncobject.c:2653)

Regards, Albert > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy- > discussion-admin at lists.sourceforge.net] On Behalf Of Albert Strasheim > Sent: 15 April 2006 01:55 > To: 'numpy-discussion' > Subject: RE: [Numpy-discussion] Vectorize bug > > Hello > > I don't have SciPy installed. Is there any way of doing a debug build of > the > C code so that I can investigate this problem? > > You say that you cannot reproduce this problem. Are you trying to > reproduce > it on Linux or on Windows under IPython? I have also been unable to > reproduce the crash on Linux, but as we saw earlier, this crash also > cropped
> > Regards, > > Albert > > > -----Original Message----- > > From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy- > > discussion-admin at lists.sourceforge.net] On Behalf Of Travis Oliphant > > Sent: 15 April 2006 01:42 > > To: numpy-discussion > > Subject: Re: [Numpy-discussion] Vectorize bug > > > > Albert Strasheim wrote: > > > > >Hello Travis > > > > > >I'm still getting the same crash when running via IPython, which is the > > only > > >way I've been able to reproduce the crash on Windows. > > > > > >Just to confirm: > > > > > >In [1]: import numpy > > > > > >In [2]: numpy.__version__ > > >Out[2]: '0.9.7.2356' > > > > > >The crash now happens in check_large, which is the new name of the test > > >method in question. > > > > > > > > Do you have SciPy installed? > > > > Make sure you are not importing an old version of SciPy. > > > > I cannot reproduce this problem. > > > > -Travis > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > that extends applications into web and mobile media. Attend the live > webcast > and join the prime developer group breaking into this new coding > territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From oliphant.travis at ieee.org Fri Apr 14 18:20:03 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Fri Apr 14 18:20:03 2006 Subject: [Numpy-discussion] Vectorize bug In-Reply-To: <00fc01c66022$1b51fb70$0502010a@dsp.sun.ac.za> References: <00fc01c66022$1b51fb70$0502010a@dsp.sun.ac.za> Message-ID: <44404A18.1070202@ieee.org> Albert Strasheim wrote: > Hello all > > I think Valgrind might be very useful in tracking down this bug. > > http://valgrind.org/ > It's a good suggestion. I've run the code through Valgrind, several times before releasing the first version of NumPy. I tracked down many memory leaks that way already. There may be errors that have creeped in, but Valgrind does not help with reference counting errors which this may be. But, I need to be able to reproduce the problem to have any hope of finding it. -Travis From oliphant.travis at ieee.org Fri Apr 14 18:21:09 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Fri Apr 14 18:21:09 2006 Subject: [Numpy-discussion] Vectorize bug In-Reply-To: <00fa01c6601e$c7707840$0502010a@dsp.sun.ac.za> References: <00fa01c6601e$c7707840$0502010a@dsp.sun.ac.za> Message-ID: <44404A5B.5010802@ieee.org> Albert Strasheim wrote: > Hello > > I don't have SciPy installed. Is there any way of doing a debug build of the > C code so that I can investigate this problem? > > You say that you cannot reproduce this problem. Are you trying to reproduce > it on Linux or on Windows under IPython? I have also been unable to > reproduce the crash on Linux, but as we saw earlier, this crash also cropped > up on Solaris, without having to run the tests N times. > > I've tried under Linux with IPython and cannot reproduce the error. I've run numpy.test() 100 times with no error. I'm not sure if the Solaris crash is fixed or not yet after the recent changes to SVN. There may be more than one bug here... 
-Travis From oliphant.travis at ieee.org Fri Apr 14 18:47:01 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Fri Apr 14 18:47:01 2006 Subject: [Numpy-discussion] Vectorize bug In-Reply-To: <00fc01c66022$1b51fb70$0502010a@dsp.sun.ac.za> References: <00fc01c66022$1b51fb70$0502010a@dsp.sun.ac.za> Message-ID: <44405068.203@ieee.org> Albert Strasheim wrote: > Hello all > > I think Valgrind might be very useful in tracking down this bug. > > http://valgrind.org/ > > Example usage: > > ~/bin/valgrind \ > -v --error-limit=no --leak-check=full \ > python -c 'import numpy; numpy.test()' > > Valgrind emits many warnings for things going on inside Python on my Fedora > Core 4 system, but there are also a lot of interesting things going on in the > NumPy code. > > Some warnings that someone might want to look at: > > ==26750== Use of uninitialised value of size 4 > ==26750== at 0x453D4B1: DOUBLE_to_OBJECT (arraytypes.inc:4470) > ==26750== by 0x46AB3F3: PyUFunc_GenericFunction (ufuncobject.c:1566) > ==26750== by 0x46ABE9F: ufunc_generic_call (ufuncobject.c:2653) > I think this may be the culprit. The buffer was not being initialized to NULL and so DECREF was being called on whatever was there. This can produce strange results indeed depending on the environment. I've initialized the buffer now for loops involving OBJECTs (this same error has happened a couple of times as it's one of the big ones for object arrays). I thought I fixed all places where it might occur, but apparently not... Perhaps you could try the code again. From oliphant.travis at ieee.org Fri Apr 14 18:49:03 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Fri Apr 14 18:49:03 2006 Subject: [Numpy-discussion] Vectorize bug In-Reply-To: <00fc01c66022$1b51fb70$0502010a@dsp.sun.ac.za> References: <00fc01c66022$1b51fb70$0502010a@dsp.sun.ac.za> Message-ID: <444050DA.6050809@ieee.org> Albert Strasheim wrote: > Hello all > > I think Valgrind might be very useful in tracking down this bug. > > http://valgrind.org/ > > Example usage: > > ~/bin/valgrind \ > -v --error-limit=no --leak-check=full \ > python -c 'import numpy; numpy.test()' > Here's the command that I run to test a Python script provided at the command line:

valgrind --tool=memcheck --leak-check=yes --error-limit=no -v --log-file=testmem --suppressions=valgrind-python.supp --show-reachable=yes --num-callers=10 python $1

The valgrind-python.supp file will suppress the complaints valgrind emits for Python. -Travis From robert.kern at gmail.com Fri Apr 14 22:21:00 2006 From: robert.kern at gmail.com (Robert Kern) Date: Fri Apr 14 22:21:00 2006 Subject: [Numpy-discussion] Re: Summer of Code 2006 In-Reply-To: <00fb01c6601f$26e19b10$0502010a@dsp.sun.ac.za> References: <00fb01c6601f$26e19b10$0502010a@dsp.sun.ac.za> Message-ID: Albert Strasheim wrote: > Hello all > > The Google Summer of Code site for 2006 is up: > > http://code.google.com/soc/ > > Maybe the NumPy team can propose a few projects to be funded by this > program. Personally, I'd be interested in working on the build system, > especially on Windows, and/or extending the test suite. What work do you think needs to be done on the build system? (I'm not contending the point; I'm just curious.) -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From fullung at gmail.com Sat Apr 15 02:26:04 2006 From: fullung at gmail.com (Albert Strasheim) Date: Sat Apr 15 02:26:04 2006 Subject: [Numpy-discussion] Re: Summer of Code 2006 In-Reply-To: Message-ID: <013501c6606e$86888200$0502010a@dsp.sun.ac.za> Hello all > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy- > discussion-admin at lists.sourceforge.net] On Behalf Of Robert Kern > Sent: 15 April 2006 07:20 > To: numpy-discussion at lists.sourceforge.net > Subject: [Numpy-discussion] Re: Summer of Code 2006 > > Albert Strasheim wrote: > > Hello all > > > > The Google Summer of Code site for 2006 is up: > > > > http://code.google.com/soc/ > > > > Maybe the NumPy team can propose a few projects to be funded by this > > program. Personally, I'd be interested in working on the build system, > > especially on Windows, and/or extending the test suite. > > What work do you think needs to be done on the build system? (I'm not > contending the point; I'm just curious.) Let me start by saying that the build system works fine for what I think is the default case, i.e. building NumPy on Linux with preinstalled LAPACK and BLAS. However, as soon as you vary any of those parameters, things get interesting. I've spent the past couple of days trying to build NumPy on Windows with ATLAS and CLAPACK with MinGW and Visual Studio .NET 2003 and VS 8. I don't know if it's just me, but this seems to be very hard. This could probably be partly attributed to the build systems of these libraries and to the lack of documentation, but I've also run into problems with NumPy build scripts. For example, the inclusion of the gcc library in the list of libraries when building Fortran code with MinGW causes the build to break. Also, building FLAPACK from source causes the build to fail (too many open files). While these errors on their own aren't particularly serious, I think it would be helpful to set up an automated system to check that builds of the various configurations NumPy supports can actually be done. There are probably a few million ways to build NumPy, but it would be nice if we could make sure that the N most common configurations always work, and provide documentation for "trying this at home." I also think it would be useful to set up a system that performs regular builds of the latest revision from the SVN repository. I think anyone attempting this is going to run into a few issues with the build scripts, especially when trying to build on multiple platforms. Things I would like to get right, which I think are much harder than they need to be (feel free to disagree): - Windows builds in general - Visual Studio .NET 2003 builds - Visual C++ Toolkit 2003 builds - Visual Studio 2005 builds - Builds with ATLAS and CLAPACK The reason I'm interested in the Microsoft compilers is that they have many features to help us make sure that the code is correct, both at compile time and at run time. Any comments? Anybody building on Windows that finds the process to be completely painless? Regards, Albert From fullung at gmail.com Sat Apr 15 02:42:06 2006 From: fullung at gmail.com (Albert Strasheim) Date: Sat Apr 15 02:42:06 2006 Subject: [Numpy-discussion] Vectorize bug In-Reply-To: <44404A5B.5010802@ieee.org> Message-ID: <013601c66070$d2377010$0502010a@dsp.sun.ac.za> Hello all The crash I was seeing seems to be fixed in revision 2358. 
Regards, Albert > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy- > discussion-admin at lists.sourceforge.net] On Behalf Of Travis Oliphant > Sent: 15 April 2006 03:20 > To: numpy-discussion > Subject: Re: [Numpy-discussion] Vectorize bug > > Albert Strasheim wrote: > > Hello > > > > I don't have SciPy installed. Is there any way of doing a debug build of > the > > C code so that I can investigate this problem? > > > > You say that you cannot reproduce this problem. Are you trying to > reproduce > > it on Linux or on Windows under IPython? I have also been unable to > > reproduce the crash on Linux, but as we saw earlier, this crash also > cropped > > up on Solaris, without having to run the tests N times. > > > > > I've tried under Linux with IPython and cannot reproduce the error. > I've run numpy.test() 100 times with no error. > > I'm not sure if the Solaris crash is fixed or not yet after the recent > changes to SVN. There may be more than one bug here... > > -Travis From fullung at gmail.com Sat Apr 15 04:59:03 2006 From: fullung at gmail.com (Albert Strasheim) Date: Sat Apr 15 04:59:03 2006 Subject: [Numpy-discussion] bool_ leaks memory Message-ID: <014701c66083$e3ca5c30$0502010a@dsp.sun.ac.za> Hello all According to Valgrind 3.1.1, the following code leaks memory:

from numpy import bool_
bool_(1)

Valgrind says:

==32531== 82 (80 direct, 2 indirect) bytes in 2 blocks are definitely lost in loss record 7 of 25
==32531==    at 0x400444E: malloc (vg_replace_malloc.c:149)
==32531==    by 0x45442E8: array_alloc (arrayobject.c:5330)
==32531==    by 0x454F18D: PyArray_NewFromDescr (arrayobject.c:4153)
==32531==    by 0x4551844: Array_FromScalar (arrayobject.c:5768)
==32531==    by 0x45602B7: PyArray_FromAny (arrayobject.c:6630)
==32531==    by 0x4570065: bool_arrtype_new (scalartypes.inc:2855)
==32531==    by 0x2FBF6E: (within /usr/lib/libpython2.4.so.1.0)
==32531==    by 0x2C53B3: PyObject_Call (in /usr/lib/libpython2.4.so.1.0)

The second leak that Valgrind reports is from this code in ma.py:

MaskType = bool_
nomask = MaskType(0)

Tested with NumPy 0.9.7.2358. Trac ticket at http://projects.scipy.org/scipy/numpy/ticket/60 Regards, Albert From faltet at xot.carabos.com Sat Apr 15 05:06:01 2006 From: faltet at xot.carabos.com (faltet at xot.carabos.com) Date: Sat Apr 15 05:06:01 2006 Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy In-Reply-To: <44402A2A.9050300@ee.byu.edu> References: <20060414213511.GA14355@xot.carabos.com> <44402A2A.9050300@ee.byu.edu> Message-ID: <20060415120451.GA15123@xot.carabos.com> On Fri, Apr 14, 2006 at 05:03:06PM -0600, Travis Oliphant wrote: > What I've found in experiments like this in the past is that numarray is > good at striding in one direction but much worse at striding in another > direction for multi-dimensional arrays. Of course my experiments were > not complete. That just seemed to be the case. > > The array-iterator construct handles almost all of these cases. The > copy method is a good place to start since it uses that code. I'm not sure this is directly related with striding. 
Look at this:

In [5]: npcopy=timeit.Timer('a=a.copy()','import numpy as np; a=np.arange(1000000,dtype="Float64")[::10]')
In [6]: npcopy.repeat(3,10)
Out[6]: [0.061118125915527344, 0.061014175415039062, 0.063937187194824219]

In [7]: npcopy2=timeit.Timer('b=a.copy()','import numpy as np; a=np.arange(1000000,dtype="Float64")[::10]')
In [8]: npcopy2.repeat(3,10)
Out[8]: [0.29984092712402344, 0.29889702796936035, 0.29834103584289551]

You see? assigning to a new variable makes the copy go 5x slower! numarray is also affected by this, but not as much:

In [9]: nacopy=timeit.Timer('a=a.copy()','import numarray as np; a=np.arange(1000000,type="Float64")[::10]')
In [10]: nacopy.repeat(3,10)
Out[10]: [0.039573907852172852, 0.037765979766845703, 0.038245916366577148]

In [11]: nacopy2=timeit.Timer('b=a.copy()','import numarray as np; a=np.arange(1000000,type="Float64")[::10]')
In [12]: nacopy2.repeat(3,10)
Out[12]: [0.073218107223510742, 0.07414698600769043, 0.072872161865234375]

i.e. just a 2x slowdown. I don't understand this effect: in both cases we are doing a plain copy, no? I'm missing something, but not sure what it is. Regards, -- Francesc From fullung at gmail.com Sat Apr 15 06:38:02 2006 From: fullung at gmail.com (Albert Strasheim) Date: Sat Apr 15 06:38:02 2006 Subject: [Numpy-discussion] Vectorize bug In-Reply-To: <444050DA.6050809@ieee.org> Message-ID: <014e01c66091$b6b6b730$0502010a@dsp.sun.ac.za> Hello all I did some more Valgrinding and reduced all the warnings still produced when running NumPy revision 0.9.7.2358 to a few lines of code. The relevant Trac tickets:

http://projects.scipy.org/scipy/numpy/ticket/60
http://projects.scipy.org/scipy/numpy/ticket/61
http://projects.scipy.org/scipy/numpy/ticket/62
http://projects.scipy.org/scipy/numpy/ticket/64
http://projects.scipy.org/scipy/numpy/ticket/65

If anybody else wants to play with Valgrind, you can find the Valgrind suppressions for Python 2.4 here: http://svn.python.org/projects/python/branches/release24-maint/Misc/valgrind-python.supp See also http://svn.python.org/projects/python/branches/release24-maint/Misc/README.valgrind Regards, Albert > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy- > discussion-admin at lists.sourceforge.net] On Behalf Of Travis Oliphant > Sent: 15 April 2006 03:48 > To: numpy-discussion > Subject: Re: [Numpy-discussion] Vectorize bug > > Albert Strasheim wrote: > > Hello all > > > > I think Valgrind might be very useful in tracking down this bug. > > > > http://valgrind.org/ > > > > Example usage: > > > > ~/bin/valgrind \ > > -v --error-limit=no --leak-check=full \ > > python -c 'import numpy; numpy.test()' > > > > Here's the command that I run to test a Python script provided at the > command line: > > valgrind --tool=memcheck --leak-check=yes --error-limit=no -v > --log-file=testmem --suppressions=valgrind-python.supp > --show-reachable=yes --num-callers=10 python $1 > > > The valgrind-python.supp file will suppress the complaints valgrind > emits for Python. > > > -Travis From cjw at sympatico.ca Sat Apr 15 08:01:03 2006 From: cjw at sympatico.ca (Colin J. 
Williams) Date: Sat Apr 15 08:01:03 2006 Subject: [Numpy-discussion] Summer of Code 2006 In-Reply-To: <00fb01c6601f$26e19b10$0502010a@dsp.sun.ac.za> References: <00fb01c6601f$26e19b10$0502010a@dsp.sun.ac.za> Message-ID: <44410A87.70205@sympatico.ca> Albert Strasheim wrote: >Hello all > >The Google Summer of Code site for 2006 is up: > >http://code.google.com/soc/ > >Maybe the NumPy team can propose a few projects to be funded by this >program. Personally, I'd be interested in working on the build system, >especially on Windows, and/or extending the test suite. > >Regards, > >Albert > > > > I believe that the Python Software Foundation (http://www.python.org/psf/grants/) offers funding from time to time. Colin W. From Saqib.Sohail at colorado.edu Sat Apr 15 08:51:02 2006 From: Saqib.Sohail at colorado.edu (Saqib bin Sohail) Date: Sat Apr 15 08:51:02 2006 Subject: [Numpy-discussion] Code Question Message-ID: <1145116214.444116365d326@webmail.colorado.edu> Hi guys I have never used python, but I wanted to compute FFT of audio files, I came upon a page which had python code, so I installed Numpy but after beating the bush for a few days, I have finally come in here to ask. After taking the FFT I want to output it to a file and then use gnuplot to plot it. When I installed NumPy, and ran the tests, it seemed that all passed without a problem. My input is a .dat file converted from .wav file by sox. Here is the code which obviously doesn't work because it seems that changes have occurred since this code was written. (not my code, just from some website where a guy had written on how to do things which i require)

import Numeric
import FFT
out_array=Numeric.array(out)
out_fft=FFT.fft(out)

offt=open('outfile_fft.dat','w')
for x in range(len(out_fft)/2):
    offt.write('%f %f\n'%(1.0*x/wtime,abs(out_fft[x].real)))

I do the following at the python prompt

import numarray
myFile = open('test.dat', 'r')
my_array = numarray.arra(myFile)

/* at this stage I wanted to see if it was correctly read */

print myArray
[1632837691 1701605485 1952535072 ..., 538976288 538976288 168632368]

it seems that these values do not correspond to the values in the file (but I guess the array is considering these as ints when in fact these are floats) anyway the problem starts when i try to do fft, because I can't seem to find module or how to invoke it, the second problem is writing to the file, that code obviously doesn't work, and in my search through various documentations, i found arrayrange() but couldn't make it to work, call me stupid, but despite going through several examples, i haven't been able to make the for loop worked in any case, it would be very kind of someone if he could at least tell me what i am doing wrong and reply a simple example so that I can modify my code or at least be able to understand . Thanks -- Saqib bin Sohail PhD ECE University of Colorado at Boulder Res: (303) 786 0636 http://ucsu.colorado.edu/~sohail/index.html From ndarray at mac.com Sat Apr 15 09:10:07 2006 From: ndarray at mac.com (Sasha) Date: Sat Apr 15 09:10:07 2006 Subject: [Numpy-discussion] Vectorize bug In-Reply-To: <44404A18.1070202@ieee.org> References: <00fc01c66022$1b51fb70$0502010a@dsp.sun.ac.za> <44404A18.1070202@ieee.org> Message-ID: On 4/14/06, Travis Oliphant wrote: > ... > There may be errors that have crept in, but Valgrind does not help > with reference counting errors, which this may be. > ... Valgrind is a little bit more helpful if python is compiled using the --without-pymalloc config option. In addition to valgrind, memory problems can be exposed by using the --with-pydebug option. 
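(As a minimal sketch of the latter: sys.gettotalrefcount only exists in a --with-pydebug build, and exercise_suspect_code below is a hypothetical stand-in for whatever operation is suspected of leaking references:)

import sys

def exercise_suspect_code():
    # hypothetical stand-in for the suspect operation,
    # e.g. repeatedly calling a vectorized function
    pass

before = sys.gettotalrefcount()
for i in range(1000):
    exercise_suspect_code()
after = sys.gettotalrefcount()
# a total that keeps growing with the iteration count points at a leak
print after - before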
From faltet at xot.carabos.com Sat Apr 15 10:29:01 2006 From: faltet at xot.carabos.com (faltet at xot.carabos.com) Date: Sat Apr 15 10:29:01 2006 Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy In-Reply-To: <44410972.4090502@cox.net> References: <20060414213511.GA14355@xot.carabos.com> <44402A2A.9050300@ee.byu.edu> <20060415120451.GA15123@xot.carabos.com> <44410972.4090502@cox.net> Message-ID: <20060415172755.GA15274@xot.carabos.com> On Sat, Apr 15, 2006 at 07:55:46AM -0700, Tim Hochberg wrote: > >I'm not sure this is directly related with striding. Look at this: > > > >In [5]: npcopy=timeit.Timer('a=a.copy()','import numpy as np; > >a=np.arange(1000000,dtype="Float64")[::10]') > > > >In [6]: npcopy.repeat(3,10) > >Out[6]: [0.061118125915527344, 0.061014175415039062, > >0.063937187194824219] > > > >In [7]: npcopy2=timeit.Timer('b=a.copy()','import numpy as np; > >a=np.arange(1000000,dtype="Float64")[::10]') > > > >In [8]: npcopy2.repeat(3,10) > >Out[8]: [0.29984092712402344, 0.29889702796936035, 0.29834103584289551] > > > >You see? assigning to a new variable makes the copy go 5x > >slower! > > > You are being tricked! In the first case, the array is discontiguous for > the first copy but for every subsequent copy is contiguous since you > replace 'a'. In the second case, the array is discontiguous for every copy Oh, yes! Thanks for noting this! So in order to compare apples with apples, the difference between numarray and numpy in case of strided copies is:

In [87]: npcopy_stride=timeit.Timer('b=a.copy()','import numpy as np; a=np.arange(1000000,dtype="Float64")[::10]')
In [88]: npcopy_stride.repeat(3,10)
Out[88]: [0.30013298988342285, 0.29976487159729004, 0.29945492744445801]

In [89]: nacopy_stride=timeit.Timer('b=a.copy()','import numarray as np; a=np.arange(1000000,type="Float64")[::10]')
In [90]: nacopy_stride.repeat(3,10)
Out[90]: [0.07545709609985351, 0.0731458663940429, 0.073173046112060547]

so numpy is approximately 4x slower than numarray. Cheers, Francesc From oliphant.travis at ieee.org Sat Apr 15 10:51:18 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sat Apr 15 10:51:18 2006 Subject: [Numpy-discussion] Re: Summer of Code 2006 In-Reply-To: <013501c6606e$86888200$0502010a@dsp.sun.ac.za> References: <013501c6606e$86888200$0502010a@dsp.sun.ac.za> Message-ID: <44413251.3080505@ieee.org> Albert Strasheim wrote: > Hello all > > > Let me start by saying that the build system works fine for what I think is > the default case, i.e. building NumPy on Linux with preinstalled LAPACK and > BLAS. However, as soon as you vary any of those parameters, things get > interesting. > It also builds fine with mingw and pre-installed ATLAS (I do it all the time). It also builds fine with no-installed ATLAS (or LAPACK or BLAS) with mingw32 and Linux. It also builds on Mac OS X. It also builds on Solaris, AIX, and Cygwin. Work also went in recently to make sure it builds with a Visual Studio Compiler (the one Tim Hochberg was using...) So, I think it's a bit unfair to say that varying from only a Linux build causes "things to get interesting". Definitely there are configurations that can require a specialized site.cfg file and it can be difficult if you build with a compiler that was not used to build Python itself. But, it's not a one-platform build system. I just want that to be clear. 
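(One quick way to see what a given machine's build would actually pick up, sketched here with numpy.distutils directly; 'atlas' and 'lapack' are standard system_info section names, and an empty dict means the library was not detected:)

from numpy.distutils.system_info import get_info

# an empty dict here means the corresponding library was not found,
# so this shows at a glance which optimized libraries a build would use
print get_info('atlas')
print get_info('lapack')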
Documentation on the site.cfg file could be more prominent, of course, and this was aided recently by the addition of an example file to the source tree. The expert on the build system is Pearu Peterson. He has been very responsive to suggested fixes and problems that people have experienced. Robert Kern, David Cooke, and I also have some familiarity with the build system enough to assist from time to time. All help is greatly appreciated, however, as I know you can come up with configurations that do cause things to "get interesting." The more configurations that we get tested and working, the better off we will be. The more people who understand the build system well enough to help fix it, the better off we'll be as well. So, I definitely don't want to discourage any ideas you have on improving the build system. Thanks for being willing to dive in and help. -Travis > I've spent the past couple of days trying to build NumPy on Windows with > ATLAS and CLAPACK with MinGW and Visual Studio .NET 2003 and VS 8. I don't > know if it's just me, but this seems to be very hard. This could probably be > partly attributed to the build systems of these libraries and to the lack of > documentation, but I've also run into problems with NumPy build scripts. > > For example, the inclusion of the gcc library in the list of libraries when > building Fortran code with MinGW causes the build to break. Also, building > FLAPACK from source causes the build to fail (too many open files). > > While these errors on their own aren't particularly serious, I think it > would be helpful to set up an automated system to check that builds of the > various configurations NumPy supports can actually be done. There are > probably a few million ways to build NumPy, but it would be nice if we could > make sure that the N most common configurations always work, and provide > documentation for "trying this at home." > > I also think it would be useful to set up a system that performs regular > builds of the latest revision from the SVN repository. I think anyone > attempting this is going to run into a few issues with the build scripts, > especially when trying to build on multiple platforms. > > Things I would like to get right, which I think are much harder than they > need to be (feel free to disagree): > > - Windows builds in general > - Visual Studio .NET 2003 builds > - Visual C++ Toolkit 2003 builds > - Visual Studio 2005 builds > - Builds with ATLAS and CLAPACK > > The reason I'm interested in the Microsoft compilers is that they have many > features to help us make sure that the code is correct, both at compile time > and at run time. > > Any comments? Anybody building on Windows that finds the process to be > completely painless? > > Regards, > > Albert
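(For reference, the general shape of a site.cfg stanza for pointing the build at a non-standard ATLAS; the paths and library names below are purely illustrative, not a recommended configuration:)

[atlas]
library_dirs = /usr/local/lib/atlas
include_dirs = /usr/local/include/atlas
atlas_libs = lapack, f77blas, cblas, atlas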
From oliphant.travis at ieee.org Sat Apr 15 10:55:02 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sat Apr 15 10:55:02 2006 Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy In-Reply-To: <20060415172755.GA15274@xot.carabos.com> References: <20060414213511.GA14355@xot.carabos.com> <44402A2A.9050300@ee.byu.edu> <20060415120451.GA15123@xot.carabos.com> <44410972.4090502@cox.net> <20060415172755.GA15274@xot.carabos.com> Message-ID: <4441333D.50906@ieee.org> faltet at xot.carabos.com wrote: > On Sat, Apr 15, 2006 at 07:55:46AM -0700, Tim Hochberg wrote: > >>> I'm not sure this is directly related with striding. Look at this: >>> >>> In [5]: npcopy=timeit.Timer('a=a.copy()','import numpy as np; >>> a=np.arange(1000000,dtype="Float64")[::10]') >>> >>> In [6]: npcopy.repeat(3,10) >>> Out[6]: [0.061118125915527344, 0.061014175415039062, >>> 0.063937187194824219] >>> >>> In [7]: npcopy2=timeit.Timer('b=a.copy()','import numpy as np; >>> a=np.arange(1000000,dtype="Float64")[::10]') >>> >>> In [8]: npcopy2.repeat(3,10) >>> Out[8]: [0.29984092712402344, 0.29889702796936035, 0.29834103584289551] >>> >>> You see? assigning to a new variable makes the copy go 5x >>> slower! >>> >>> >> You are being tricked! In the first case, the array is discontiguous for >> the first copy but for every subsequent copy is contiguous since you >> replace 'a'. In the second case, the array is discontiguous for every copy >> > > Oh, yes! Thanks for noting this! So in order to compare apples with > apples, the difference between numarray and numpy in case of strided > copies is: > > In [87]: npcopy_stride=timeit.Timer('b=a.copy()','import numpy as np; > a=np.arange(1000000,dtype="Float64")[::10]') > > In [88]: npcopy_stride.repeat(3,10) > Out[88]: [0.30013298988342285, 0.29976487159729004, 0.29945492744445801] > > In [89]: nacopy_stride=timeit.Timer('b=a.copy()','import numarray as np; > a=np.arange(1000000,type="Float64")[::10]') > > In [90]: nacopy_stride.repeat(3,10) > Out[90]: [0.07545709609985351, 0.0731458663940429, 0.073173046112060547] > > so numpy is approximately 4x slower than numarray. > > This also seems to vary from compiler to compiler. On my system it's not quite so different (about 1.5x slower). I'm wondering what the effect of an inlined memmove is. Essentially numarray has an inlined for-loop to copy bytes while NumPy calls memmove. I'll try that out and see... -Travis From ryanlists at gmail.com Sat Apr 15 10:58:17 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Sat Apr 15 10:58:17 2006 Subject: [Numpy-discussion] Re: Summer of Code 2006 In-Reply-To: <44413251.3080505@ieee.org> References: <013501c6606e$86888200$0502010a@dsp.sun.ac.za> <44413251.3080505@ieee.org> Message-ID: As I understand the summer of code, we can basically get a full-time student (who gets paid $4500 for the summer) at no cost to us, as long as someone is willing to coach and define the project. (NumPy/SciPy would actually get $500 from Google). So, I think it would be great if we could define some projects and see what happens. (I am trying to graduate this summer, so maybe I should shut up if I can't help much). 
Ryan On 4/15/06, Travis Oliphant wrote: > Albert Strasheim wrote: > > Let me start by saying that the build system works fine for what I think is > > the default case, i.e. building NumPy on Linux with preinstalled LAPACK and > > BLAS. However, as soon as you vary any of those parameters, things get > > interesting. > > > It also builds fine with mingw and pre-installed ATLAS (I do it all the > time).
From robert.kern at gmail.com Sat Apr 15 11:31:01 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sat Apr 15 11:31:01 2006 Subject: [Numpy-discussion] Re: Summer of Code 2006 In-Reply-To: <44410A87.70205@sympatico.ca> References: <00fb01c6601f$26e19b10$0502010a@dsp.sun.ac.za> <44410A87.70205@sympatico.ca> Message-ID: Colin J. Williams wrote: > I believe that the Python Software Foundation > (http://www.python.org/psf/grants/) offers funding from time to time. However, it likes to fund new projects, not continuing ones. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From oliphant.travis at ieee.org Sat Apr 15 11:35:04 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sat Apr 15 11:35:04 2006 Subject: [Numpy-discussion] Vectorize bug In-Reply-To: <014e01c66091$b6b6b730$0502010a@dsp.sun.ac.za> References: <014e01c66091$b6b6b730$0502010a@dsp.sun.ac.za> Message-ID: <44413C9B.3080507@ieee.org> Albert Strasheim wrote: > Hello all > > I did some more Valgrinding and reduced all the warnings still produced when > running NumPy revision 0.9.7.2358 to a few lines of code. The relevant Trac > tickets: > > http://projects.scipy.org/scipy/numpy/ticket/60 > http://projects.scipy.org/scipy/numpy/ticket/61 > http://projects.scipy.org/scipy/numpy/ticket/62 > http://projects.scipy.org/scipy/numpy/ticket/64 > http://projects.scipy.org/scipy/numpy/ticket/65 > > This is very useful. Thank you for isolating the code producing the warnings like this. It makes it much easier to debug. 
-Travis From robert.kern at gmail.com Sat Apr 15 12:00:06 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sat Apr 15 12:00:06 2006 Subject: [Numpy-discussion] Re: Code Question In-Reply-To: <1145116214.444116365d326@webmail.colorado.edu> References: <1145116214.444116365d326@webmail.colorado.edu> Message-ID: Saqib bin Sohail wrote: > Hi guys > > I have never used python, but I wanted to compute FFT of audio files, I came > upon a page which had python code, so I installed Numpy but after beating the > bush for a few days, I have finally come in here to ask. After taking the FFT I > want to output it to a file and then use gnuplot to plot it. > When I installed NumPy, and ran the tests, it seemed that all passed without a > problem. My input is a .dat file converted from .wav file by sox. > > Here is the code which obviously doesn't work because it seems that changes > have occurred since this code was written. (not my code, just from some website > where a guy had written on how to do things which i require) Okay, first some history. Originally, the package was named Numeric; occasionally, it was referred to by its nickname NumPy. Some years ago, a group needed features that couldn't be done in the Numeric codebase, so they started a rewrite called numarray. For various reasons that I don't want to get into, another group needed features that couldn't be done in the numarray codebase, so a second rewrite happened and this package is the one that is currently getting the most developer attention. It is called numpy. Since you are a new user, I highly recommend that you use numpy instead of Numeric or numarray. http://numeric.scipy.org/ > import Numeric > import FFT > out_array=Numeric.array(out) > out_fft=FFT.fft(out) > > offt=open('outfile_fft.dat','w') > for x in range(len(out_fft)/2): > offt.write('%f %f\n'%(1.0*x/wtime,abs(out_fft[x].real))) Rewritten for numpy (but untested):

import numpy

# Assuming that the file contains 32-bit floats, and not 64-bit floats
data = numpy.fromfile('test.dat', dtype=numpy.float32)
out_fft = numpy.refft(data)
# Note: refft does the FFT on real data and thus throws away the negative
# frequencies since they are redundant. len(out_fft) != len(data)

# and now I'm confused because the code references variables that weren't
# created anywhere, so I'm going to output the power spectrum
n = len(out_fft)
freqs = numpy.arange(n, dtype=numpy.float32) / len(data)
power = out_fft.real*out_fft.real + out_fft.imag*out_fft.imag
outarray = numpy.column_stack((freqs, power))
assert outarray.shape == (n, 2)

offt = open('outfile_fft.dat', 'w')
try:
    for f, p in outarray:
        offt.write('%f %f\n' % (f, p))
finally:
    offt.close()

> I do the following at the python prompt > > import numarray > myFile = open('test.dat', 'r') > my_array = numarray.arra(myFile) > > /* at this stage I wanted to see if it was correctly read */ > > print myArray > [1632837691 1701605485 1952535072 ..., 538976288 538976288 168632368] > > it seems that these values do not correspond to the values in the file (but I > guess the array is considering these as ints when in fact these are floats) Indeed. There is no way for the array constructor to know the data type in the file unless you tell it. The default type is int. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From Saqib.Sohail at colorado.edu Sat Apr 15 13:42:02 2006 From: Saqib.Sohail at colorado.edu (Saqib bin Sohail) Date: Sat Apr 15 13:42:02 2006 Subject: [Numpy-discussion] Code Question In-Reply-To: <06041504462800.00752@rbastian> References: <1145116214.444116365d326@webmail.colorado.edu> <06041504462800.00752@rbastian> Message-ID: <1145133678.44415a6e5d8f7@webmail.colorado.edu> Thanks a lot for your detailed email, unfortunately both of the following imports don't work:

import Gnuplot
import fft as FFT
from numarray import *

I think I need the Gnuplot package, but what I can't understand is why fft is not being imported; do I need to install the NumPy package with special options to install fft? Quoting René Bastian : > On Saturday 15 April 2006 17:50, Saqib bin Sohail wrote: > > Hi guys > > > > I have never used python, but I wanted to compute FFT of audio files, I > > came upon a page which had python code, so I installed Numpy but after > > beating the bush for a few days, I have finally come in here to ask. After > > taking the FFT I want to output it to a file and then use gnuplot to plot > > it. > > With the module Gnuplot.py you can plot arrays
>
> import Gnuplot
>
> g =Gnuplot.Gnuplot()
> g.plot(w) # w is an array
> raw_input("Enter")
> g.reset()
>
> I use numarray
>
> Some code :
> ----------------
>
> import fft as FFT
> from numarray import *
>
> T = arrayrange(0.0, 2*pi, 1.0/1000)
> a = sin(2*pi*440.0*T)
>
> r = FFT.fft(a)
> print r
> g.plot(r)
> raw_input("Enter")
> ....
> r = FFT.inverse_real_fft(a)
> r = FFT.real_fft(a)
> r = FFT.hermite_fft(a)
>
> g.reset()
> ----------------
>
> > When I installed NumPy, and ran the tests, it seemed that all passed without > > a problem. My input is a .dat file converted from .wav file by sox. > > > > Here is the code which obviously doesn't work because it seems that changes > > have occurred since this code was written. (not my code, just from some > > website where a guy had written on how to do things which i require) > > > > import Numeric > > import FFT > > out_array=Numeric.array(out) > > out_fft=FFT.fft(out) > > > > offt=open('outfile_fft.dat','w') > > for x in range(len(out_fft)/2): > > offt.write('%f %f\n'%(1.0*x/wtime,abs(out_fft[x].real))) > > > > I do the following at the python prompt > > > > import numarray > > myFile = open('test.dat', 'r') > > my_array = numarray.arra(myFile) > Read the manual on how to load a file of floats > I think there is a mistake > > /* at this stage I wanted to see if it was correctly read */ > > > > print myArray > > [1632837691 1701605485 1952535072 ..., 538976288 538976288 168632368] > > > > it seems that these values do not correspond to the values in the file (but > > I guess the array is considering these as ints when in fact these are > > floats) > > hmmm ... > > > > anyway the problem starts when i try to do fft, because I can't seem to > > find module or how to invoke it, > > > > the second problem is writing to the file, that code obviously doesn't > > work, and in my search through various documentations, i found arrayrange() > > but couldn't make it to work, call me stupid, but despite going through > > several examples, i haven't been able to make the for loop worked in any > > case, > > > > it would be very kind of someone if he could at least tell me what i am > > doing wrong and reply a simple example so that I can modify my code or at > > least be able to understand . 
> > Thanks
> >
> > --
> > Saqib bin Sohail
> > PhD ECE
> > University of Colorado at Boulder
> > Res: (303) 786 0636
> > http://ucsu.colorado.edu/~sohail/index.html
> >
> > -------------------------------------------------------
>
> --
> René Bastian
> http://pythoneon.musiques-rb.org "Musique en Python"
>

-- 
Saqib bin Sohail
PhD ECE
University of Colorado at Boulder
Res: (303) 786 0636
http://ucsu.colorado.edu/~sohail/index.html

From robert.kern at gmail.com Sun Apr 16 02:37:05 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Sun Apr 16 02:37:05 2006
Subject: [Numpy-discussion] Trac Wikis closed for anonymous edits until further notice
Message-ID: <44421025.9060804@gmail.com>

We've been hit badly by spammers, so I can only presume our Trac sites are now on the traded spam lists. I am going to turn off anonymous edits for now. Ticket creation will probably still be left open for now. Many thanks to David Cooke for quickly removing the spam.

I am looking into ways to allow people to register themselves with the Trac sites so they can edit the Wikis and submit tickets without needing to be added by a project admin.

-- 
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco

From a.h.jaffe at gmail.com Sun Apr 16 12:36:01 2006
From: a.h.jaffe at gmail.com (Andrew Jaffe)
Date: Sun Apr 16 12:36:01 2006
Subject: [Numpy-discussion] g95 detection not working
Message-ID: <44429C55.2030500@gmail.com>

Hi all,

at least on my setup (OS X, Python 2.4.1, latest svn of numpy and scipy), config_fc fails to recognize my g95 compiler, which was directly downloaded from http://g95.sourceforge.net/ (and always has failed, I think). This is because the current version string doesn't conform to the regexp pattern; the version string is
"""
G95 (GCC 4.0.3 (g95!) Apr 12 2006)
Copyright (C) 2002-2005 Free Software Foundation, Inc.

G95 comes with NO WARRANTY, to the extent permitted by law.
You may redistribute copies of G95
under the terms of the GNU General Public License.
For more information about these matters, see the file named COPYING
"""

I've attached a patch below, although this identifies the version string with the date of the release, rather than the gcc version; I'm not sure which is the right one to use!

Andrew

--- numpy/distutils/fcompiler/g95.py (revision 2360)
+++ numpy/distutils/fcompiler/g95.py (working copy)
@@ -9,7 +9,7 @@
 class G95FCompiler(FCompiler):
     compiler_type = 'g95'
-    version_pattern = r'G95.*\(experimental\) \(g95!\) (?P<version>.*)\).*'
+    version_pattern = r'G95.*\(g95!\) (?P<version>.*)\).*'
     executables = {
         'version_cmd' : ["g95", "--version"],

From robert.kern at gmail.com Sun Apr 16 12:50:05 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Sun Apr 16 12:50:05 2006
Subject: [Numpy-discussion] Re: g95 detection not working
In-Reply-To: <44429C55.2030500@gmail.com>
References: <44429C55.2030500@gmail.com>
Message-ID:

Andrew Jaffe wrote:
> Hi all,
>
> at least on my setup (OS X, Python 2.4.1, latest svn of numpy and
> scipy), config_fc fails to recognize my g95 compiler, which was directly
> downloaded from http://g95.sourceforge.net/ (and always has failed, I
> think). This is because the current version string doesn't conform to
> the regexp pattern; the version string is
> """
> G95 (GCC 4.0.3 (g95!) Apr 12 2006)
> Copyright (C) 2002-2005 Free Software Foundation, Inc.
>
> G95 comes with NO WARRANTY, to the extent permitted by law.
> You may redistribute copies of G95
> under the terms of the GNU General Public License.
> For more information about these matters, see the file named COPYING
> """
>
> I've attached a patch below, although this identifies the version string
> with the date of the release, rather than the gcc version; I'm not sure
> which is the right one to use!

We need the actual version number; in this case, "4.0.3".

-- 
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco

From a.h.jaffe at gmail.com Sun Apr 16 13:53:03 2006
From: a.h.jaffe at gmail.com (Andrew Jaffe)
Date: Sun Apr 16 13:53:03 2006
Subject: [Numpy-discussion] Re: g95 detection not working
In-Reply-To:
References: <44429C55.2030500@gmail.com>
Message-ID: <4442AE89.8080303@gmail.com>

Robert Kern wrote:
> Andrew Jaffe wrote:
>> Hi all,
>>
>> at least on my setup (OS X, Python 2.4.1, latest svn of numpy and
>> scipy), config_fc fails to recognize my g95 compiler, which was directly
>> downloaded from http://g95.sourceforge.net/ (and always has failed, I
>> think). This is because the current version string doesn't conform to
>> the regexp pattern; the version string is
>> """
>> G95 (GCC 4.0.3 (g95!) Apr 12 2006)
>> Copyright (C) 2002-2005 Free Software Foundation, Inc.
>>
>> G95 comes with NO WARRANTY, to the extent permitted by law.
>> You may redistribute copies of G95
>> under the terms of the GNU General Public License.
>> For more information about these matters, see the file named COPYING
>> """
>>
>> I've attached a patch below, although this identifies the version string
>> with the date of the release, rather than the gcc version; I'm not sure
>> which is the right one to use!
>
> We need the actual version number; in this case, "4.0.3".

Thanks -- OK, in that case the following regexp works for me:

version_pattern = r'G95.*\(GCC (?P<version>.*) \(g95!\)'

But are there different versions of the version string?

Also on an unrelated f2py note: is the f2py mailing list being read by the f2py developers? I've posted a question (about the status of F9x "types") without reply...

Yours,

Andrew

From robert.kern at gmail.com Sun Apr 16 13:56:02 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Sun Apr 16 13:56:02 2006
Subject: [Numpy-discussion] Re: g95 detection not working
In-Reply-To: <44429C55.2030500@gmail.com>
References: <44429C55.2030500@gmail.com>
Message-ID:

Andrew Jaffe wrote:
> Hi all,
>
> at least on my setup (OS X, Python 2.4.1, latest svn of numpy and
> scipy), config_fc fails to recognize my g95 compiler, which was directly
> downloaded from http://g95.sourceforge.net/ (and always has failed, I
> think). This is because the current version string doesn't conform to
> the regexp pattern; the version string is
> """
> G95 (GCC 4.0.3 (g95!) Apr 12 2006)
> Copyright (C) 2002-2005 Free Software Foundation, Inc.
>
> G95 comes with NO WARRANTY, to the extent permitted by law.
> You may redistribute copies of G95
> under the terms of the GNU General Public License.
> For more information about these matters, see the file named COPYING
> """
>
> I've attached a patch below, although this identifies the version string
> with the date of the release, rather than the gcc version; I'm not sure
> which is the right one to use!
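For concreteness, both candidate patterns really do pull a field out of the banner quoted above. The following is a standalone re check rather than the distutils machinery, and the idea that the group must be named "version" for FCompiler's version matching is an assumption based on the patch earlier in this thread:

import re

banner = "G95 (GCC 4.0.3 (g95!) Apr 12 2006)"

# the patch's pattern: captures the release date
date_pattern = r'G95.*\(g95!\) (?P<version>.*)\).*'
# the GCC-based pattern: captures the compiler version number
gcc_pattern = r'G95.*\(GCC (?P<version>.*) \(g95!\)'

print re.match(date_pattern, banner).group('version')  # Apr 12 2006
print re.match(gcc_pattern, banner).group('version')   # 4.0.3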
Also, note that you can override the get_version() method entirely, if it's easier to grab the version using something other than a regex. You can look at hpux.py and ibm.py for examples.

-- 
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco

From Saqib.Sohail at colorado.edu Sun Apr 16 14:02:04 2006
From: Saqib.Sohail at colorado.edu (Saqib bin Sohail)
Date: Sun Apr 16 14:02:04 2006
Subject: [Numpy-discussion] Code Question
In-Reply-To:
References: <1145116214.444116365d326@webmail.colorado.edu> <1145133185.4441588148cb3@webmail.colorado.edu>
Message-ID: <1145221290.4442b0aa55961@webmail.colorado.edu>

Thanks Guys for all your prompt responses. I have tried to use the provided solutions, but I have had my share of issues, mixed with my lack of knowledge, to the point that I feel quite embarrassed to bother you guys.

Issue 1

I am running FC 3 with native python-2.3 and then I installed python-2.4 in it. numarray-1.5.1 seems to have installed with success in python-2.3. I have tried to install numpy-0.9.6-1.i586.rpm but I don't have python-base, and when I try to install python-base I get a long list of dependencies which I need. I haven't pursued further down that line; unfortunately I haven't been able to use numarray. I don't know how to use it, because people have repeatedly told me to use numpy, but I can't seem to get that installed.

Issue 2

To input the file, Ryan suggested to use scipy. I don't want to go down that path if only there is a simple way to input the file (I can clean up the file and format it in the right way in perl, I can do that in a heartbeat).

Issue 3

I don't want to use gnuplot functionality, or matplotlib; if only I am able to write the file, then again I can use perl to format it and use gnuplot then.

So if there is the simplest of ways in which I can just

i) read the file (formatting will be done in perl)
ii) get the fft
iii) write the file or files (and then use perl to format for gnuplot)

I am sure all of you will say why not use the existing functionalities, but after 3 days I haven't gotten anywhere. All I need to do is get the FFT of some sound files so that I can verify the results of the FFTs and compare them with my FFT code in VxWorks.

And Pierre, I started reading diveintopython.pdf but got nowhere when I tried two of its examples; the attached image shows what happened when I tried to run one of the examples on python-2.3, and the output wasn't according to what the guide suggested. (no output to be precise)

http://jobim.colorado.edu/~sohail/pythonExample.JPG

Thanks again guys.

Quoting Ryan Krauss :

> I guess it depends on how much you want to learn and what you want to do.
>
> I was able to load your data using
> data=scipy.io.read_array('monkey.dat')
>
> I had to comment out the first line to make it work. I couldn't make
> the fromfile method of numpy work because the data is actually fixed
> width.
>
> If you don't want to install scipy, you would need to learn enough
> Python to read the file and clean it up a little by hand.
>
> It seems like the first column is time and the second is the signal
> you want to fft. I was able to fft it with:
> myfft=numpy.fft(data[:,1])
> (I don't have the latest version of numpy and don't seem to have the
> refft function Robert mentioned).
> > t=data[:,0] > df=1/max(t) > df > maxf=8012 > fvect=arange(0,maxf+df,df) > > plot(fvect,abs(myfft)) > > I am plotting using matplotlib and the resulting figures are attached. > > If you really want to learn python for scientific and plotting > applications, I would highly recommend a few packages: > SciPy - some additional capabilities beyond Numpy (optimization, ode's , ...) > ipython - it is a really good interactive python shell > matplotlib - the best python 2d plotting package I am aware of > > Let me know if you have any additional questions. You can find out > about each package by googling it. They are all closely related to > Numpy and all have good mailing lists to help you. > > Ryan > > On 4/15/06, Saqib bin Sohail wrote: > > Do let me know if you get somewhere. > > > > Thanks > > > > > > Quoting Ryan Krauss : > > > > > email me the dat file and I could play with it a bit. If I can read > > > your input file, the rest should be easy. > > > > > > Ryan > > > > > > On 4/15/06, Saqib bin Sohail wrote: > > > > Hi guys > > > > > > > > I have never used python, but I wanted to compute FFT of audio files, I > > > came > > > > upon a page which had python code, so I installed Numpy but after > beating > > > the > > > > bush for a few days, I have finally come in here to ask. After taking > the > > > FFT I > > > > want to output it to a file and the use gnuplot to plot it. > > > > > > > > When I instaled NumPy, and ran the tests, it seemed that all passed > without > > > a > > > > problem. My input is a .dat file converted from .wav file by sox. > > > > > > > > Here is the code which obviously doesn't work because it seems that > changes > > > > have occured since this code was written. (not my code, just from some > > > website > > > > where a guy had written on how to do things which i require) > > > > > > > > import Numeric > > > > import FFT > > > > out_array=Numeric.array(out) > > > > out_fft=FFT.fft(out) > > > > > > > > offt=open('outfile_fft.dat','w') > > > > for x in range(len(out_fft)/2): > > > > offt.write('%f %f\n'%(1.0*x/wtime,abs(out_fft[x].real))) > > > > > > > > > > > > I do the following at the python prompt > > > > > > > > import numarray > > > > myFile = open('test.dat', 'r') > > > > my_array = numarray.arra(myFile) > > > > > > > > /* at this stage I wanted to see if it was correctly read */ > > > > > > > > print myArray > > > > [1632837691 1701605485 1952535072 ..., 538976288 538976288 > 168632368] > > > > > > > > it seems that these values do not correspond to the values in the file > (but > > > I > > > > guess the array is considering these as ints when infact these are > floats) > > > > > > > > anyway the problem starts when i try to do fft, because I can't seem to > > > find > > > > module or how to invoke it, > > > > > > > > the second problem is writing to the file, that code obviously doesn't > > > work, > > > > and in my search through various documentations, i found arrayrange() > but > > > > couldn't make it to work, call me stupid, but despite going through > several > > > > examples, i haven't been able to make the for loop worked in any case, > > > > > > > > it would be very kind of someone if he could at least tell me what i am > > > doing > > > > wrong and reply a simple example so that I can modify my code or at > least > > > be > > > > able to understand . 
> > > > > > > > Thanks > > > > > > > > > > > > > > > > -- > > > > Saqib bin Sohail > > > > PhD ECE > > > > University of Colorado at Boulder > > > > Res: (303) 786 0636 > > > > http://ucsu.colorado.edu/~sohail/index.html > > > > > > > > > > > > ------------------------------------------------------- > > > > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > > > > that extends applications into web and mobile media. Attend the live > > > webcast > > > > and join the prime developer group breaking into this new coding > territory! > > > > > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > > > > _______________________________________________ > > > > Numpy-discussion mailing list > > > > Numpy-discussion at lists.sourceforge.net > > > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > > > > > > > > > > > -- > > Saqib bin Sohail > > PhD ECE > > University of Colorado at Boulder > > Res: (303) 786 0636 > > http://ucsu.colorado.edu/~sohail/index.html > > > > > -- Saqib bin Sohail PhD ECE University of Colorado at Boulder Res: (303) 786 0636 http://ucsu.colorado.edu/~sohail/index.html From robert.kern at gmail.com Sun Apr 16 14:03:01 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 16 14:03:01 2006 Subject: [Numpy-discussion] Re: g95 detection not working In-Reply-To: <4442AE89.8080303@gmail.com> References: <44429C55.2030500@gmail.com> <4442AE89.8080303@gmail.com> Message-ID: Andrew Jaffe wrote: > Thanks -- OK, in that case the following regexp works for me: > > version_pattern = r'G95.*\(GCC (?P.*) \(g95!\)' > > But are there different versions of the version string? Possibly. I don't really know. > Also on an unrelated f2py note: is the f2py mailing list being read by > the f2py developers? I've posted a question (about the status of F9x > "types") without reply... Pearu is really the only f2py developer, and he has just flown from his home in Estonia to Austin to work with us at Enthought for a month. I presume he has been busy preparing for his journey. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Sun Apr 16 14:26:06 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 16 14:26:06 2006 Subject: [Numpy-discussion] Re: Code Question In-Reply-To: <1145221290.4442b0aa55961@webmail.colorado.edu> References: <1145116214.444116365d326@webmail.colorado.edu> <1145133185.4441588148cb3@webmail.colorado.edu> <1145221290.4442b0aa55961@webmail.colorado.edu> Message-ID: Saqib bin Sohail wrote: > An Pierre, I started reading diveintopython.pdf but got nowhere when I tried > two of its examples, the attached image shows that when I tried to run one of > the examples on python-2.3 and the output wasn't according to what the guide > suggested. (no output to be precise) > > http://jobim.colorado.edu/~sohail/pythonExample.JPG Note the indentation. Indentation is important in Python. > Quoting Ryan Krauss : >>(I don't have the latest version of numpy and don't seem to have the >>refft function Robert mentioned). My example was wrong. It should have used "numpy.dft.refft()", not "numpy.refft()". 
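For the archives, the earlier sketch with that correction applied, still untested and still assuming the .dat file holds raw 32-bit floats (in later numpy releases the same function became numpy.fft.rfft):

import numpy

data = numpy.fromfile('test.dat', dtype=numpy.float32)
out_fft = numpy.dft.refft(data)   # real-input FFT, per the correction above

n = len(out_fft)
freqs = numpy.arange(n, dtype=numpy.float32) / len(data)
power = out_fft.real*out_fft.real + out_fft.imag*out_fft.imag

offt = open('outfile_fft.dat', 'w')
try:
    for i in range(n):
        offt.write('%f %f\n' % (freqs[i], power[i]))
finally:
    offt.close()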
-- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Sun Apr 16 14:37:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 16 14:37:02 2006 Subject: [Numpy-discussion] Re: Code Question In-Reply-To: <1145221290.4442b0aa55961@webmail.colorado.edu> References: <1145116214.444116365d326@webmail.colorado.edu> <1145133185.4441588148cb3@webmail.colorado.edu> <1145221290.4442b0aa55961@webmail.colorado.edu> Message-ID: Saqib bin Sohail wrote: > I am sure all of you will say why not use the existing functionalities, but > after 3 days I haven't gotten anywhere. All I need to do is get FFT of some > sound files so that I can verify the result of FFT's and compare them with my > FFT code in VxWorks. Well, if you are just trying to get an independent verification of your VxWorks FFT code, and you are much more comfortable with Perl, then you might want to use one of the FFT libraries available for Perl like Math::FFT. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From a.h.jaffe at gmail.com Sun Apr 16 15:18:02 2006 From: a.h.jaffe at gmail.com (Andrew Jaffe) Date: Sun Apr 16 15:18:02 2006 Subject: [Numpy-discussion] where() has started returning a tuple!? Message-ID: I think the following behavior is (only recently) wrong: In [7]: numpy.__version__ Out[7]: '0.9.7.2360' In [8]: numpy.nonzero([True, False, True]) Out[8]: array([0, 2]) In [9]: numpy.where([True, False, True]) Out[9]: (array([0, 2]),) Note the tuple output to where(), which should be the same as nonzero. Andrew From perry at stsci.edu Sun Apr 16 20:18:02 2006 From: perry at stsci.edu (Perry Greenfield) Date: Sun Apr 16 20:18:02 2006 Subject: [Numpy-discussion] where() has started returning a tuple!? In-Reply-To: Message-ID: see: http://sourceforge.net/mailarchive/forum.php?thread_id=10165581&forum_id=489 0 > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net > [mailto:numpy-discussion-admin at lists.sourceforge.net]On Behalf Of Andrew > Jaffe > Sent: Sunday, April 16, 2006 6:17 PM > To: numpy-discussion at lists.sourceforge.net > Subject: [Numpy-discussion] where() has started returning a tuple!? > > > I think the following behavior is (only recently) wrong: > > In [7]: numpy.__version__ > Out[7]: '0.9.7.2360' > > In [8]: numpy.nonzero([True, False, True]) > Out[8]: array([0, 2]) > > In [9]: numpy.where([True, False, True]) > Out[9]: (array([0, 2]),) > > Note the tuple output to where(), which should be the same as nonzero. > > Andrew > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking > scripting language > that extends applications into web and mobile media. Attend the > live webcast > and join the prime developer group breaking into this new coding > territory! 
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From a.h.jaffe at gmail.com Mon Apr 17 00:53:04 2006 From: a.h.jaffe at gmail.com (Andrew Jaffe) Date: Mon Apr 17 00:53:04 2006 Subject: [Numpy-discussion] Re: where() has started returning a tuple!? In-Reply-To: References: Message-ID: Aha, missed that thread (and the docstring -- my bad). And actually I misunderstood the effect of the change, anyway: a[where(a>0)] is still fine, it's just other activities like iterating over where(a>0) that is no longer possible in the same way. Thanks for the pointer! Andrew Perry Greenfield wrote: > see: > > http://sourceforge.net/mailarchive/forum.php?thread_id=10165581&forum_id=489 > 0 > >> -----Original Message----- >> From: numpy-discussion-admin at lists.sourceforge.net >> [mailto:numpy-discussion-admin at lists.sourceforge.net]On Behalf Of Andrew >> Jaffe >> Sent: Sunday, April 16, 2006 6:17 PM >> To: numpy-discussion at lists.sourceforge.net >> Subject: [Numpy-discussion] where() has started returning a tuple!? >> >> >> I think the following behavior is (only recently) wrong: >> >> In [7]: numpy.__version__ >> Out[7]: '0.9.7.2360' >> >> In [8]: numpy.nonzero([True, False, True]) >> Out[8]: array([0, 2]) >> >> In [9]: numpy.where([True, False, True]) >> Out[9]: (array([0, 2]),) >> >> Note the tuple output to where(), which should be the same as nonzero. >> >> Andrew >> From ryanlists at gmail.com Mon Apr 17 05:57:03 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Mon Apr 17 05:57:03 2006 Subject: [Numpy-discussion] Re: Code Question In-Reply-To: References: <1145116214.444116365d326@webmail.colorado.edu> <1145133185.4441588148cb3@webmail.colorado.edu> <1145221290.4442b0aa55961@webmail.colorado.edu> Message-ID: Alright Saqib, Robert is right that you should try fft in perl if you don't want to learn Python. But as I understand it, you want to read in this file, fft it, and write the fft to a file using only numarray. Attached is a script that does that. Most of the script is just low-level file io to avoid having to install scipy to read and write the arrays. Hope this helps, Ryan On 4/16/06, Robert Kern wrote: > Saqib bin Sohail wrote: > > > I am sure all of you will say why not use the existing functionalities, but > > after 3 days I haven't gotten anywhere. All I need to do is get FFT of some > > sound files so that I can verify the result of FFT's and compare them with my > > FFT code in VxWorks. > > Well, if you are just trying to get an independent verification of your VxWorks > FFT code, and you are much more comfortable with Perl, then you might want to > use one of the FFT libraries available for Perl like Math::FFT. > > -- > Robert Kern > robert.kern at gmail.com > > "I have come to believe that the whole world is an enigma, a harmless enigma > that is made terrible by our own mad attempt to interpret it as though it had > an underlying truth." > -- Umberto Eco > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory! 
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: read_fft_write_numarray.py Type: text/x-python Size: 872 bytes Desc: not available URL: From chanley at stsci.edu Mon Apr 17 06:24:06 2006 From: chanley at stsci.edu (Christopher Hanley) Date: Mon Apr 17 06:24:06 2006 Subject: [Numpy-discussion] Vectorize bug In-Reply-To: <44404A5B.5010802@ieee.org> References: <00fa01c6601e$c7707840$0502010a@dsp.sun.ac.za> <44404A5B.5010802@ieee.org> Message-ID: <4443969D.4090604@stsci.edu> Travis Oliphant wrote: > I'm not sure if the Solaris crash is fixed or not yet after the recent > changes to SVN. There may be more than one bug here... The numpy.test() unit tests no longer cause segfaults on Solaris. All of my daily numpy regression tests are now passing for Solaris. Thank you for your time and help, Chris From michael.sorich at gmail.com Mon Apr 17 17:13:09 2006 From: michael.sorich at gmail.com (Michael Sorich) Date: Mon Apr 17 17:13:09 2006 Subject: [Numpy-discussion] using NaN, INT_MIN etc in ndarray instead of a masked array Message-ID: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> On 4/8/06, Sasha wrote: > > ... > See above. For ndarray mask is always False unless an add-on module is > loaded that redefines arithmetic to recognize special bit-patterns > such as NaN or INT_MIN. > > Is it possible to implement masked values using these special bit patterns in the ndarray instead of using a separate MA class? If so has there been any thought as to whether this may be the better option. I think it would be preferable if the ability to handle masked data was available in the standard array class (ndarray), as this would increase the likelihood that functions built for numeric arrays will handle masked values well. It seems that ndarray already has decent support for nans (isnan() returns the equivalent of a boolean mask array), indicating that such an approach may be acceptable. How difficult is it to generalise the concept to other data types (int, string, bool)? Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Apr 17 19:53:01 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon Apr 17 19:53:01 2006 Subject: [Numpy-discussion] Re: using NaN, INT_MIN etc in ndarray instead of a masked array In-Reply-To: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> References: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> Message-ID: Michael Sorich wrote: > On 4/8/06, *Sasha* > wrote: > > ... > > See above. For ndarray mask is always False unless an add-on module is > loaded that redefines arithmetic to recognize special bit-patterns > such as NaN or INT_MIN. > > Is it possible to implement masked values using these special bit > patterns in the ndarray instead of using a separate MA class? If so has > there been any thought as to whether this may be the better option. I > think it would be preferable if the ability to handle masked data was > available in the standard array class (ndarray), as this would increase > the likelihood that functions built for numeric arrays will handle > masked values well. 
It seems that ndarray already has decent support for > nans (isnan() returns the equivalent of a boolean mask array), > indicating that such an approach may be acceptable. How difficult is it > to generalise the concept to other data types (int, string, bool)? Well, I'm certainly dead set against any change that would make all arrays that happen to contain those special values to be treated as masked arrays. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From oliphant.travis at ieee.org Mon Apr 17 23:04:04 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Apr 17 23:04:04 2006 Subject: [Numpy-discussion] using NaN, INT_MIN etc in ndarray instead of a masked array In-Reply-To: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> References: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> Message-ID: <44448138.2080402@ieee.org> Michael Sorich wrote: > On 4/8/06, *Sasha* > wrote: > > ... > > See above. For ndarray mask is always False unless an add-on module is > loaded that redefines arithmetic to recognize special bit-patterns > such as NaN or INT_MIN. > > > Is it possible to implement masked values using these special bit > patterns in the ndarray instead of using a separate MA class? If so > has there been any thought as to whether this may be the better > option. I think it would be preferable if the ability to handle masked > data was available in the standard array class (ndarray), as this > would increase the likelihood that functions built for numeric arrays > will handle masked values well. It seems that ndarray already has > decent support for nans (isnan() returns the equivalent of a boolean > mask array), indicating that such an approach may be acceptable. How > difficult is it to generalise the concept to other data types (int, > string, bool)? > I don't think the approach can be generalized at all. It would only work with floating-point values and therefore is not particularly exciting. I think ultimately, making masked arrays a C-based sub-class is where masked array should go. For now the Python-based class is a good environment for developing the ideas behind how to preserve masked arrays through other functions if it is possible. It seems that masked arrays must do things quite differently than other arrays on certain applications, and I'm not altogether clear on how to support them in all the NumPy code. Because masked arrays are not used by everybody who uses NumPy arrays, it should be a separate sub-class. Ultimately, I hope we will get the basic array object into Python (what Tim was calling the super array) before 2.6 -Travis From svetosch at gmx.net Tue Apr 18 01:15:01 2006 From: svetosch at gmx.net (Sven Schreiber) Date: Tue Apr 18 01:15:01 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443EDFE7.6010509@cox.net> References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> <443EC2B4.807@cox.net> <443EDFE7.6010509@cox.net> Message-ID: <44449FC4.8020406@gmx.net> [Sorry for the late reaction, I was on vacation.] Tim Hochberg schrieb: >> > Here's my best guess as to what is going on: > 1. There is a relatively large group of people who use Kronecker > product as Alan does (probably the matrix as opposed to tensor math > folks). 
I'm guessing it's a large group since they manage to write the > definitions at both mathworld and planetmath. Yes. > 2. kron was meant to implement this. That's what I thought, anyway. > 2.5 People who need the other meaning of kron can just use outer, so > no real conflict. > 3. The implementation was either inappropriately generalized or it > was assumed that all inputs would be matrices (and hence rank-2). > > Assuming 3. is correct, and I'd like to hear from people if they think > that the behaviour in the non rank-2 cases is sensible, the next > question is whether the behaviour in the rank-2 cases makes sense. It > seem to, but I'm not a user of kron. If both of the preceeding are true, > it seems like a complete fix entails the following two things: > 1. Forbid arguments that are not rank-2. This allows all matrices, > which is really the main target here I think. > 2. Fix the return type issue. I have a fix for this ready to commit, > but I want to figure out the first part as well. > Both 1 and 2 sound very good to me as a user. So, should I still submit a new ticket about kron, or is it already being fixed? Greetings, Sven From a.u.r.e.l.i.a.n at gmx.net Tue Apr 18 01:46:04 2006 From: a.u.r.e.l.i.a.n at gmx.net (Johannes Loehnert) Date: Tue Apr 18 01:46:04 2006 Subject: [Numpy-discussion] Re: where In-Reply-To: References: Message-ID: <200604181045.05058.a.u.r.e.l.i.a.n@gmx.net> On Thursday 13 April 2006 19:16, Ryan Krauss wrote: > which makes this: > myvect=where((f>19.5) & (f<38) & > (phase>0),ones(shape(phase)),zeros(shape(phase))) > > actually really silly, sense all it is a complicated way to get back > the input of > (f>19.5) & (f<38) & (phase>0) > ...but you should cast the second to signed int32, otherwise a = (f>19.5) & (f<38) & (phase>0) print a-1 will give an array of 0's and 255's :) (since boolean arrays are by default upcast to unsigned int8) Johannes From ryanlists at gmail.com Tue Apr 18 05:31:15 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Tue Apr 18 05:31:15 2006 Subject: [Numpy-discussion] Re: where In-Reply-To: <200604181045.05058.a.u.r.e.l.i.a.n@gmx.net> References: <200604181045.05058.a.u.r.e.l.i.a.n@gmx.net> Message-ID: You are right. I actually did run into a problem with this. I was trying to subtract 360 degrees from the phase of some fft data and I multiplied -360 (no dot) times my bool array. It took me a while to track that one down. Ryan On 4/18/06, Johannes Loehnert wrote: > On Thursday 13 April 2006 19:16, Ryan Krauss wrote: > > which makes this: > > myvect=where((f>19.5) & (f<38) & > > (phase>0),ones(shape(phase)),zeros(shape(phase))) > > > > actually really silly, sense all it is a complicated way to get back > > the input of > > (f>19.5) & (f<38) & (phase>0) > > > > ...but you should cast the second to signed int32, otherwise > > a = (f>19.5) & (f<38) & (phase>0) > print a-1 > > will give an array of 0's and 255's :) (since boolean arrays are by default > upcast to unsigned int8) > > Johannes > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory! 
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From tim.hochberg at cox.net Tue Apr 18 06:24:09 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Tue Apr 18 06:24:09 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <44449FC4.8020406@gmx.net> References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> <443EC2B4.807@cox.net> <443EDFE7.6010509@cox.net> <44449FC4.8020406@gmx.net> Message-ID: <4444E7DD.2010209@cox.net> Sven Schreiber wrote: >[Sorry for the late reaction, I was on vacation.] > >Tim Hochberg schrieb: > > > >>Here's my best guess as to what is going on: >> 1. There is a relatively large group of people who use Kronecker >>product as Alan does (probably the matrix as opposed to tensor math >>folks). I'm guessing it's a large group since they manage to write the >>definitions at both mathworld and planetmath. >> >> > >Yes. > > > >> 2. kron was meant to implement this. >> >> > >That's what I thought, anyway. > > > >> 2.5 People who need the other meaning of kron can just use outer, so >>no real conflict. >> 3. The implementation was either inappropriately generalized or it >>was assumed that all inputs would be matrices (and hence rank-2). >> >>Assuming 3. is correct, and I'd like to hear from people if they think >>that the behaviour in the non rank-2 cases is sensible, the next >>question is whether the behaviour in the rank-2 cases makes sense. It >>seem to, but I'm not a user of kron. If both of the preceeding are true, >>it seems like a complete fix entails the following two things: >> 1. Forbid arguments that are not rank-2. This allows all matrices, >>which is really the main target here I think. >> 2. Fix the return type issue. I have a fix for this ready to commit, >>but I want to figure out the first part as well. >> >> >> > >Both 1 and 2 sound very good to me as a user. > >So, should I still submit a new ticket about kron, or is it already >being fixed? > > Go ahead and submit a ticket if you would. I have a fix here, but I've been waiting to submit it till I heard from some other people who use kron (and because I've been swamped the last couple of days). If you submit the ticket, that'll keep it from falling through the cracks. Thanks for the feedback, -tim >Greetings, >Sven > > > > From ndarray at mac.com Tue Apr 18 07:06:22 2006 From: ndarray at mac.com (Sasha) Date: Tue Apr 18 07:06:22 2006 Subject: [Numpy-discussion] using NaN, INT_MIN etc in ndarray instead of a masked array In-Reply-To: <44448138.2080402@ieee.org> References: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> <44448138.2080402@ieee.org> Message-ID: On 4/18/06, Travis Oliphant wrote: > Michael Sorich wrote: > ... > > Is it possible to implement masked values using these special bit > > patterns in the ndarray instead of using a separate MA class? If so > > has there been any thought as to whether this may be the better > > option. I think it would be preferable if the ability to handle masked > > data was available in the standard array class (ndarray), as this > > would increase the likelihood that functions built for numeric arrays > > will handle masked values well. 
It seems that ndarray already has
> > decent support for nans (isnan() returns the equivalent of a boolean
> > mask array), indicating that such an approach may be acceptable. How
> > difficult is it to generalise the concept to other data types (int,
> > string, bool)?
> >
> I don't think the approach can be generalized at all. It would only
> work with floating-point values and therefore is not particularly exciting.
>
Not true. R supports "NA" for all its types except raw bytes.
For example:

> x<-logical(5)
> x
[1] FALSE FALSE FALSE FALSE FALSE
> x[1:2]=NA
> !x
[1] NA NA TRUE TRUE TRUE

> I think ultimately, making masked arrays a C-based sub-class is where
> masked array should go. For now the Python-based class is a good
> environment for developing the ideas behind how to preserve masked
> arrays through other functions if it is possible.
>
I've voiced my opposition to subclassing before. Here I believe it is more appropriate to have an add-on module that installs alternative math functions. Having two classes in the same application that are subtly different in the corner cases is already a problem with ma.array vs. ndarray; adding a third class will only make things worse.

> It seems that masked arrays must do things quite differently than other
> arrays on certain applications, and I'm not altogether clear on how to
> support them in all the NumPy code. Because masked arrays are not used
> by everybody who uses NumPy arrays, it should be a separate sub-class.
>
As far as I understand, people who don't use MA don't deal with missing values. For this category of users there will be no visible effect no matter how missing values are treated, as long as in the absence of missing values normal rules apply. Yes, many functions must treat missing values differently, but the same is true for NaNs. NumPy allows floating point arrays to have nans, but there is no real support beyond what happened to work at the OS level. For example:

>>> sort([5,nan,3,2])
array([ 5. , nan, 2. , 3. ])

Also, what is the justification for

>>> int_(nan)
0

?

> Ultimately, I hope we will get the basic array object into Python (what
> Tim was calling the super array) before 2.6

As far as I understand, that object will not come with arithmetic rules or math functions. Therefore, I don't see how this is relevant to the present discussion.

From oliphant.travis at ieee.org Tue Apr 18 09:39:11 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Tue Apr 18 09:39:11 2006
Subject: [Numpy-discussion] using NaN, INT_MIN etc in ndarray instead of a masked array
In-Reply-To:
References: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> <44448138.2080402@ieee.org>
Message-ID: <44451611.9070707@ieee.org>

Sasha wrote:
> On 4/18/06, Travis Oliphant wrote:
>
>> Michael Sorich wrote:
>> ...
>>
>>> Is it possible to implement masked values using these special bit
>>> patterns in the ndarray instead of using a separate MA class? If so
>>> has there been any thought as to whether this may be the better
>>> option. I think it would be preferable if the ability to handle masked
>>> data was available in the standard array class (ndarray), as this
>>> would increase the likelihood that functions built for numeric arrays
>>> will handle masked values well. It seems that ndarray already has
>>> decent support for nans (isnan() returns the equivalent of a boolean
>>> mask array), indicating that such an approach may be acceptable.
How >>> difficult is it to generalise the concept to other data types (int, >>> string, bool)? >>> >>> >> I don't think the approach can be generalized at all. It would only >> work with floating-point values and therefore is not particularly exciting. >> >> > Not true. R supports "NA" for all its types except raw bytes. > For example: > > >> x<-logical(5) >> x >> > [1] FALSE FALSE FALSE FALSE FALSE > >> x[1:2]=NA >> !x >> > [1] NA NA TRUE TRUE TRUE > For Boolean values there is "room" for a NA value, but what about arbitrary integers. Does R just limit the range of the integer value? That's what I meant: "fiddling with special-values" doesn't generalize to all data-types. >> arrays through other functions if it is possible. >> >> > I've voiced my opposition to subclassing before. And you haven't been very clear about why you are opposed. Just voicing concern is not enough. Python sub-classing in C amounts to exactly what masked arrays are: arrays with additional components in their structure (i.e. a mask). Please be more specific about whatever your concerns are with sub-classing. > Here I believe it is > more appropriate to have an add-on module that installs alternative > math functions. Sure that will work. But, we're talking about more than math functions. Ultimately masked array users will want *every* function they use to work "right" with masked arrays. > Having two classes in the same application that a > subtly different in the corner cases is already a problem with > ma.array vs. ndarray, adding the third class will only make things > worse. > I don't know what you are talking about. What is the "third class?" I'm talking about just making ma.array construct a sub-class.. >> It seems that masked arrays must do things quite differently than other >> arrays on certain applications, and I'm not altogether clear on how to >> support them in all the NumPy code. Because masked arrays are not used >> by everybody who uses NumPy arrays, it should be a separate sub-class. >> >> > As far as I understand, people who don't use MA don't deal with > missing values. For this category of users there will be no visible > effect no matter how missing values are treated as long as in the > absence of missing values, normal rules apply. Yes, many functions > must treat missing values differently, but the same is true for NaNs. > NumPy allows floating point arrays to have nans, but there is no real > support beyong what happened to work at the OS level. > Or we deal with missing values differently (i.e. manage it ourselves). Sure, there will be no behavioral effect, but the code will have to be re-written to "do the right thing" with masked arrays in such a way as to not slow everything else down (that's at least an "if" statement sprinkled throughout every sub-routine). Many people are not enthused about complicating the basic array object any more than necessary. If it can be shown that masked arrays can be integrated into the ndarray object without inordinate complication and/or slowness, then I don't think people would mind. The best way to prove that is to create a sub-class and change only the methods / functions that are necessary. That's really all I'm saying. > >> Ultimately, I hope we will get the basic array object into Python (what >> Tim was calling the super array) before 2.6 >> > > As far as I understand, that object will not come with arithmetic > rules or math functions. Therefore, I don't see how this is relevant > to the present discussion. 
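To make the R comparison concrete: a sentinel-based integer NA can be emulated by hand on top of a plain ndarray, but nothing in numpy itself treats INT_MIN specially, so the masking below is entirely user code, which is exactly the kind of add-on behaviour being debated (an illustrative sketch only):

import sys
import numpy

NA = -sys.maxint - 1                  # R's integer NA bit pattern (INT_MIN)
x = numpy.array([1, 2, NA, 4])

mask = (x == NA)                      # recover a boolean mask from the sentinel
good = x[numpy.logical_not(mask)]     # drop the "missing" element by hand
print good.sum()                      # 7: reduced over non-missing values only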
> Because it will help all array objects talk more cleanly to each other. But, if you are so opposed to sub-classing (which I'm not sure why in this case), then it may not matter. -Travis From strang at nmr.mgh.harvard.edu Tue Apr 18 10:37:03 2006 From: strang at nmr.mgh.harvard.edu (Gary Strangman) Date: Tue Apr 18 10:37:03 2006 Subject: [Numpy-discussion] using NaN, INT_MIN etc in ndarray instead of a masked array In-Reply-To: <44451611.9070707@ieee.org> References: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> <44448138.2080402@ieee.org> <44451611.9070707@ieee.org> Message-ID: >> Not true. R supports "NA" for all its types except raw bytes. >> For example: (snip) > > For Boolean values there is "room" for a NA value, but what about arbitrary > integers. Does R just limit the range of the integer value? That's what I > meant: "fiddling with special-values" doesn't generalize to all data-types. In R, I believe NA = -sys.maxint-1 Gary From oliphant.travis at ieee.org Tue Apr 18 11:09:03 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Tue Apr 18 11:09:03 2006 Subject: [Numpy-discussion] String (and unicode) comparisons and per-thread error handling fixed Message-ID: <44452B04.4090403@ieee.org> String comparisons were added last week. Today, I added per-thread error handling to NumPy. There is 1 more enhancement (scalar math) prior to 0.9.8 release --- but it will probably take 1-2 weeks. The new error handling means that the three-scope system is gone. Now, there is only one per-Python-thread global scope for error handling. If you change the error handling it will affect all ufuncs. Because of this, the seterr function now returns an object with the old error-handling information. This object must be passed to umath.seterrobj() in order to restore the error handling. -Travis From tim.hochberg at cox.net Tue Apr 18 11:21:06 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Tue Apr 18 11:21:06 2006 Subject: [Numpy-discussion] String (and unicode) comparisons and per-thread error handling fixed In-Reply-To: <44452B04.4090403@ieee.org> References: <44452B04.4090403@ieee.org> Message-ID: <44452D53.70009@cox.net> Travis Oliphant wrote: > > String comparisons were added last week. Today, I added per-thread > error handling to NumPy. There is 1 more enhancement (scalar math) > prior to 0.9.8 release --- but it will probably take 1-2 weeks. Oops! I'm about 2/3 done doing this one too. I think I'll go ahead and finish mine up and see how our approaches stack up performance wise and see if there's any of mine that's useful to roll into yours. -tim > > The new error handling means that the three-scope system is gone. > Now, there is only one per-Python-thread global scope for error > handling. If you change the error handling it will affect all > ufuncs. Because of this, the seterr function now returns an object > with the old error-handling information. This object must be passed > to umath.seterrobj() in order to restore the error handling. > > -Travis > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > that extends applications into web and mobile media. Attend the live > webcast > and join the prime developer group breaking into this new coding > territory! 
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > From oliphant.travis at ieee.org Tue Apr 18 12:14:14 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Tue Apr 18 12:14:14 2006 Subject: [Numpy-discussion] String (and unicode) comparisons and per-thread error handling fixed In-Reply-To: <44452D53.70009@cox.net> References: <44452B04.4090403@ieee.org> <44452D53.70009@cox.net> Message-ID: <44453A5E.4020506@ieee.org> Tim Hochberg wrote: > Travis Oliphant wrote: > >> >> String comparisons were added last week. Today, I added per-thread >> error handling to NumPy. There is 1 more enhancement (scalar math) >> prior to 0.9.8 release --- but it will probably take 1-2 weeks. > > Oops! I'm about 2/3 done doing this one too. I think I'll go ahead > and finish mine up and see how our approaches stack up performance > wise and see if there's any of mine that's useful to roll into yours. Darn. I thought I gave you enough time.... :-) On the other hand, all I did was change the way the error-mode is being looked-up (from the three dictionaries to just one). It's not much different than before except for that. I didn't do anything about the other ideas you spoke of. I did add a simple object to reset the error mode when it gets deleted, and had to fiddle with the seterr code a little to accept that object so that both methods of resetting the error mode work. A stack can certainly be built on top of what is now there (I'm thinking for numarray compatibility...), but I didn't do that. Sorry for stepping on your toes. I'm just anxious... I'll be gone for a couple of days and won't be working on NumPy/SciPy, so feel free to adjust. -Travis From rhl at astro.princeton.edu Tue Apr 18 13:07:04 2006 From: rhl at astro.princeton.edu (Robert Lupton) Date: Tue Apr 18 13:07:04 2006 Subject: [Numpy-discussion] Infinite recursion in numpy called from swig generated code In-Reply-To: References: <5809AC56-B2DF-4403-B7BC-9AEEAAC78505@astro.princeton.edu> <43FD32E4.10600@ieee.org> <44203F91.7010505@ieee.org> Message-ID: The latest version of swig (1.3.28 or 1.3.29) has broken my multiple-inheritance-from-C-and-numpy application; more specifically, it generates an infinite loop in numpy-land. I'm using numpy (0.9.6), and here's the offending code. Ideas anyone? I've pasted the crucial part of numpy.lib.UserArray onto the end of this message (how do I know? because you can replace the "from numpy.lib.UserArray" with this, and the problem persists). ##################################################### from numpy.lib.UserArray import * import types class myImage(types.ObjectType): def __init__(self, *args): this = None try: self.this.append(this) except: self.this = this class Image(UserArray, myImage): def __init__(self, *args): myImage.__init__(self, *args) ##################################################### The symptoms are: from recursionBug import *; Image(myImage()) ------------------------------------------------------------ Traceback (most recent call last): File "", line 1, in ? 
File "recursionBug.py", line 32, in __init__ myImage.__init__(self, *args) File "recursionBug.py", line 26, in __init__ except: self.this = this File "/sw/lib/python2.4/site-packages/numpy/lib/UserArray.py", line 187, in __setattr__ self.array.__setattr__(attr, value) File "/sw/lib/python2.4/site-packages/numpy/lib/UserArray.py", line 193, in __getattr__ return self.array.__getattribute__(attr) ... File "/sw/lib/python2.4/site-packages/numpy/lib/UserArray.py", line 193, in __getattr__ return self.array.__getattribute__(attr) File "/sw/lib/python2.4/site-packages/numpy/lib/UserArray.py", line 193, in __getattr__ return self.array.__getattribute__(attr) RuntimeError: maximum recursion depth exceeded The following stripped down piece of numpy seems to be the problem: class UserArray(object): def __setattr__(self,attr,value): try: self.array.__setattr__(attr, value) except AttributeError: object.__setattr__(self, attr, value) # Only called after other approaches fail. def __getattr__(self,attr): return self.array.__getattribute__(attr) R From cookedm at physics.mcmaster.ca Tue Apr 18 13:10:02 2006 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Tue Apr 18 13:10:02 2006 Subject: [Numpy-discussion] Trac Wikis closed for anonymous edits until further notice In-Reply-To: <44421025.9060804@gmail.com> (Robert Kern's message of "Sun, 16 Apr 2006 04:36:37 -0500") References: <44421025.9060804@gmail.com> Message-ID: Robert Kern writes: > We've been hit badly by spammers, so I can only presume our Trac sites are now > on the traded spam lists. I am going to turn off anonymous edits for now. Ticket > creation will probably still be left open for now. Another thing that's concerned me is closing of tickets by anonymous; can we turn that off? It disturbs me when I'm browsing the RSS feed and I see that. If a user who's not a developer thinks it could be closed, they could post a comment saying that, and a developer could close it. > Many thanks to David Cooke for quickly removing the spam. The RSS feeds are great for that. Although having a way to quickly revert a change would have made it easier :-) > I am looking into ways to allow people to register themselves with the Trac > sites so they can edit the Wikis and submit tickets without needing to be added > by a project admin. that'd be good. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From oliphant.travis at ieee.org Tue Apr 18 13:50:09 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Tue Apr 18 13:50:09 2006 Subject: [Numpy-discussion] Infinite recursion in numpy called from swig generated code In-Reply-To: References: <5809AC56-B2DF-4403-B7BC-9AEEAAC78505@astro.princeton.edu> <43FD32E4.10600@ieee.org> <44203F91.7010505@ieee.org> Message-ID: <444550CF.6090100@ieee.org> Robert Lupton wrote: > The latest version of swig (1.3.28 or 1.3.29) has broken my > multiple-inheritance-from-C-and-numpy application; more specifically, > it generates an infinite loop in numpy-land. I'm using numpy (0.9.6), > and here's the offending code. Ideas anyone? I've pasted the crucial > part of numpy.lib.UserArray onto the end of this message (how do I know? > because you can replace the "from numpy.lib.UserArray" with this, and > the problem persists). This is a problem in the getattr code of UserArray. This is fixed in SVN. 
But, you can just replace the getattr code in UserArray.py with the following:

def __getattr__(self,attr):
    if (attr == 'array'):
        return object.__getattribute__(self, attr)
    return self.array.__getattribute__(attr)

Thanks for finding and reporting this.

-Travis

From christian at marquardt.sc Tue Apr 18 14:48:06 2006
From: christian at marquardt.sc (Christian Marquardt)
Date: Tue Apr 18 14:48:06 2006
Subject: [Numpy-discussion] using NaN, INT_MIN etc in ndarray instead of a masked array
In-Reply-To:
References: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> <44448138.2080402@ieee.org> <44451611.9070707@ieee.org>
Message-ID: <20053.84.167.224.64.1145396854.squirrel@webmail.marquardt.sc>

On Tue, April 18, 2006 19:36, Gary Strangman wrote:
>
>>> Not true. R supports "NA" for all its types except raw bytes.
>>> For example:
> (snip)
>>
>> For Boolean values there is "room" for a NA value, but what about
>> arbitrary
>> integers. Does R just limit the range of the integer value? That's
>> what I
>> meant: "fiddling with special-values" doesn't generalize to all
>> data-types.
>
> In R, I believe NA = -sys.maxint-1

Don't know if this helps, but I have found the following in the R Data Import/Export Manual (in section 6.5.1, available at http://cran.r-project.org/doc/manuals/R-data.html):

   The missing value for R logical and integer types is INT_MIN, the
   smallest representable int defined in the C header limits.h, normally
   corresponding to the bit pattern 0x80000000.

For doubles (I think R only uses double precision internally), it's a bit more complex apparently; in the section mentioned above, the authors explain that

   [If R's internal constant definitions / library functions can't be
   used], on all common platforms IEC 60559 (aka IEEE 754) arithmetic is
   used, so standard C facilities can be used to test for or set Inf,
   -Inf and NaN values. On such platforms NA is represented by the NaN
   value with low-word 0x7a2 (1954 in decimal).

The implementation of the floating point NA value is done in the file arithmetic.c of the R source code; the relevant code snippets defining the NA "value" are (I believe)

typedef union
{
    double value;
    unsigned int word[2];
} ieee_double;

#ifdef WORDS_BIGENDIAN
static CONST int hw = 0;
static CONST int lw = 1;
#else  /* !WORDS_BIGENDIAN */
static CONST int hw = 1;
static CONST int lw = 0;
#endif /* WORDS_BIGENDIAN */

static double R_ValueOfNA(void)
{
    /* The gcc shipping with RedHat 9 gets this wrong without
     * the volatile declaration. Thanks to Marc Schwartz. */
    volatile ieee_double x;
    x.word[hw] = 0x7ff00000;
    x.word[lw] = 1954;
    return x.value;
}

and the tests for a number being NA or NaN are

int R_IsNA(double x)
{
    if (isnan(x)) {
        ieee_double y;
        y.value = x;
        return (y.word[lw] == 1954);
    }
    return 0;
}

int R_IsNaN(double x)
{
    if (isnan(x)) {
        ieee_double y;
        y.value = x;
        return (y.word[lw] != 1954);
    }
    return 0;
}

Hope this is useful,

Christian.

From twegener at radlogic.com.au Tue Apr 18 18:07:02 2006
From: twegener at radlogic.com.au (Tim Wegener)
Date: Tue Apr 18 18:07:02 2006
Subject: [Numpy-discussion] Backporting numpy to Python 2.2
Message-ID: <20060419103554.4ac1df4a.twegener@radlogic.com.au>

Hi,

I am attempting to backport numpy-0.9.6 to be compatible with python 2.2. (Some of our machines run python 2.2 as part of Red Hat 9 and Red Hat 7.3 and it is hazardous to alter the standard setup.) I was able to change most of the 2.3-isms to be 2.2 compatible (see the attached patch).
However I had problems compiling the following C module:

In file included from numpy/core/src/multiarraymodule.c:64:
numpy/core/src/arrayobject.c: In function `arraydescr_dealloc':
numpy/core/src/arrayobject.c:8417: warning: passing arg 1 of pointer to function from incompatible pointer type
numpy/core/src/multiarraymodule.c: In function `PyArray_DescrConverter':
numpy/core/src/multiarraymodule.c:4072: `PyBool_Type' undeclared (first use in this function)
numpy/core/src/multiarraymodule.c: In function `setup_scalartypes':
numpy/core/src/multiarraymodule.c:5736: `PyBool_Type' undeclared (first use in this function)
numpy/core/src/multiarraymodule.c: In function `initmultiarray':
numpy/core/src/multiarraymodule.c:5897: `PyObject_SelfIter' undeclared (first use in this function)
error: Command "gcc -DNDEBUG -O2 -g -pipe -march=i386 -mcpu=i686 -D_GNU_SOURCE -fPIC -fPIC -Ibuild/src/numpy/core/src -Inumpy/core/include -Ibuild/src/numpy/core -Inumpy/core/src -Inumpy/core/include -I/usr/include/python2.2 -c numpy/core/src/multiarraymodule.c -o build/temp.linux-i686-2.2/multiarraymodule.o" failed with exit status 1

Is it possible to modify this module for python 2.2 compatibility, or have I reached a dead end? It would be great if numpy were compatible with 2.2 out of the box, given that 2.3 is only a couple of years old and 2.2 is still quite widely deployed. I am trying to migrate to numpy from Numeric, which worked happily with 2.2.

FYI, a quick summary of the compatibility amendments to the python code:
- backported os.walk
- backported enumerate
- backported distutils.log
- used slices instead of list.index(item, )
- used 'r' mode instead of 'U' mode (it didn't seem that universal newline support was needed where it was used)
- used the {} way of building a new dict rather than using keyword args to the dict constructor
- from __future__ import generators
- used str.count(substr) rather than substr in str
- used os.sep rather than os.path.sep
- commented out some of the new Configuration keyword arguments (download_url and classifiers)

The above don't really affect the functionality, but a couple of more unusual changes were needed as well:
- had to add "self.compiler.exe_extension = ''" to numpy/distutils/command/config.py (see patch)
- had to change the following to an empty dict: "kws = {'depends':ext.depends}" in numpy/distutils/command/build_ext.py (see patch)

These two changes may have unwanted side effects, and a better fix is probably needed there.

Regards,
Tim

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: numpy-0.9.6_patched_for_py2.2_diff.txt
URL:

From oliphant at ee.byu.edu Tue Apr 18 20:03:01 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue Apr 18 20:03:01 2006
Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy
In-Reply-To: <20060414213511.GA14355@xot.carabos.com>
References: <20060414213511.GA14355@xot.carabos.com>
Message-ID: <4445A822.60207@ee.byu.edu>

faltet at xot.carabos.com wrote:

>Hi,
>
>I'm seeing some slowness in NumPy when dealing with strided arrays.
>numarray is dealing better with these situations, so I guess that
>something could be done in NumPy about this. Below are the situations
>that I've found up to now (maybe there are others). For the timings,
>I've used numpy 0.9.7.2278 and numarray 1.5.1.
>
>

The source of this slowness is the use in numarray of special-cases for certain-sized byte-copies.
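The slowdown is easy to see from pure Python; a small hedged timing sketch (figures vary by machine) that copies a contiguous array and then a strided view of the same data:

    import timeit
    # Copying a strided view exercises the strided-copy path under discussion.
    setup = "import numpy; a = numpy.arange(1000000.0)%s"
    for view in ("", "[::10]"):
        t = timeit.Timer("b = a.copy()", setup % view)
        print view or "contiguous", min(t.repeat(3, 10))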
Apparently, it is *much* faster to do

((double *)dst)[0] = ((double *)src)[0]

when you have aligned data than it is to do

memmove(dst, src, sizeof(double))

This is a useful piece of knowledge to have for optimization. There may be other optimizations like that already used by Numarray but still needing to be adapted for NumPy. I applied an optimization to take advantage of this when possible and got a 10x speed-up in the 1-d case.

My timings for your benchmark with current SVN of NumPy are:

NumPy: [0.021701812744140625, 0.021739959716796875, 0.021548032760620117]
Numarray: [0.052516937255859375, 0.052685976028442383, 0.052355051040649414]

Old timings:

NumPy: [~0.09, ~0.09, ~0.09]
Numarray: [~0.05, ~0.05, ~0.05]

-Travis

From ndarray at mac.com Tue Apr 18 20:26:16 2006
From: ndarray at mac.com (Sasha)
Date: Tue Apr 18 20:26:16 2006
Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy
In-Reply-To: <4445A822.60207@ee.byu.edu>
References: <20060414213511.GA14355@xot.carabos.com> <4445A822.60207@ee.byu.edu>
Message-ID:

On 4/18/06, Travis Oliphant wrote:
> [...]
> Apparently, it is *much* faster to do
>
> ((double *)dst)[0] = ((double *)src)[0]
>
> when you have aligned data than it is to do
>
> memmove(dst, src, sizeof(double))
>
> This is a useful piece of knowledge to have for optimization.

This is not surprising because memmove has to assume arbitrary alignment and possibility of overlap between src and dst areas.

From tim.hochberg at cox.net Wed Apr 19 08:58:04 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Wed Apr 19 08:58:04 2006
Subject: [Numpy-discussion] seterr changes
Message-ID: <44465DEE.8090703@cox.net>

Hi Travis et al,

I started looking at your seterr changes. I stared at yours for a while then I stared at mine for a while. Then I decided that mine wouldn't work right in the presence of threads. Then I decided that yours wouldn't work right in the presence of threads either. Specifically, it looks like ufunc_update_use_defaults isn't going to work. I think I know how to fix that, but I'm not sure that it's worth the trouble since I also did some benchmarking and it appears that the benefit of special casing is minimal.

I looked at six cases: small (len-1), medium (len-1e4) and large (len-1e6) arrays with error checking on and error checking off. For medium and large arrays, I could discern no difference at all. For small arrays, there may be some difference, but it appears to be less than 5%. I'm not sure it's worth working through a bunch of finicky thread stuff to get just 5% back. If these benchmark numbers hold up I'd be inclined to rip out the use_default support since it's complicated enough that I know we'll end up chasing a few evil thread related bugs down through it.

I'll include the benchmarking code below.
If people could (a) look it over and confirm that I'm not doing something bogus and (b) try it on some different platforms and see if they see a more significant difference, I'd appreciate it.

I'm also curious about the seterr interface. It returns ufunc_values_obj. I wasn't sure how one is supposed to pass that back into seterr, so I modified seterr to instead return a dictionary. I also modified it so that the seterr function itself has no defaults (or rather they're all None). Instead, any unspecified values are taken from the current error state. Thus seterr(divide="warn") changes only the divide state, leaving the other entries alone.

Regards,

-tim

if True:
    from timeit import Timer
    setup = """
import numpy
numpy.seterr(divide="%s")
a = numpy.zeros([%s], dtype=float)
"""
    for size in [1, 10000, 1000000]:
        for i in range(3):
            for state in ['ignore', 'warn']:
                reps = min(100000000 / size, 100000)
                timer = Timer("a * a", setup % (state, size))
                print "%s|%s =>" % (state, size), timer.timeit(reps)
        print
    print

From arkaitz.bitorika at gmail.com Wed Apr 19 10:30:03 2006
From: arkaitz.bitorika at gmail.com (Arkaitz Bitorika)
Date: Wed Apr 19 10:30:03 2006
Subject: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter
Message-ID: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com>

Hi,

I'm embedding Python in a big C++ program (the NS network simulator) and I have problems when importing the numpy module; I get a Floating Point exception. The C code that causes the exception is:

Py_Initialize();
PyObject* module = PyImport_ImportModule("numpy");
Py_DECREF(module);

I'm running Ubuntu Breezy on a dual processor Dell machine, with the stock python and numpy 0.9.6. One strange thing is that I haven't been able to reproduce the crash by writing a minimal C program with the code above; it only crashes when added to my program. I've been embedding Python for ages in the same program and other modules work fine, only numpy fails.

I've debugged the issue a bit and I've seen that the exception is thrown when the numpy __init__.py tries to import the core module. The GDB backtrace is pasted at the end. Any idea what may be going wrong?
Thanks, Arkaitz 0xb7900fd2 in initumath () at build/src/numpy/core/src/umathmodule.c:10321 10321 pinf *= mul; (gdb) bt #0 0xb7900fd2 in initumath () at build/src/numpy/core/src/umathmodule.c:10321 #1 0xb7e4e310 in _PyImport_LoadDynamicModule () from /usr/lib/libpython2.4.so.1.0 #2 0xb7e4c450 in _PyImport_FindModule () from /usr/lib/libpython2.4.so.1.0 #3 0xb7e4cc01 in PyImport_ReloadModule () from /usr/lib/libpython2.4.so.1.0 #4 0xb7e4ce26 in PyImport_ReloadModule () from /usr/lib/libpython2.4.so.1.0 #5 0xb7e4d2c6 in PyImport_ImportModuleEx () from /usr/lib/libpython2.4.so.1.0 #6 0xb7e22d9e in _PyUnicodeUCS4_ToLowercase () from /usr/lib/libpython2.4.so.1.0 #7 0xb7df5923 in PyCFunction_Call () from /usr/lib/libpython2.4.so.1.0 #8 0xb7dc8fdf in PyObject_Call () from /usr/lib/libpython2.4.so.1.0 #9 0xb7e2a92c in PyEval_CallObjectWithKeywords () from /usr/lib/libpython2.4.so.1.0 #10 0xb7e2e8f9 in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #11 0xb7e31a2d in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0 #12 0xb7e31b76 in PyEval_EvalCode () from /usr/lib/libpython2.4.so.1.0 #13 0xb7e4a525 in PyImport_ExecCodeModuleEx () from /usr/lib/libpython2.4.so.1.0 #14 0xb7e4a8e9 in PyImport_ExecCodeModule () from /usr/lib/libpython2.4.so.1.0 #15 0xb7e4c73e in _PyImport_FindModule () from /usr/lib/libpython2.4.so.1.0 #16 0xb7e4cc01 in PyImport_ReloadModule () from /usr/lib/libpython2.4.so.1.0 #17 0xb7e4ce26 in PyImport_ReloadModule () from /usr/lib/libpython2.4.so.1.0 #18 0xb7e4d2c6 in PyImport_ImportModuleEx () from /usr/lib/libpython2.4.so.1.0 #19 0xb7e22d9e in _PyUnicodeUCS4_ToLowercase () from /usr/lib/libpython2.4.so.1.0 #20 0xb7df5923 in PyCFunction_Call () from /usr/lib/libpython2.4.so.1.0 #21 0xb7dc8fdf in PyObject_Call () from /usr/lib/libpython2.4.so.1.0 #22 0xb7e2a92c in PyEval_CallObjectWithKeywords () from /usr/lib/libpython2.4.so.1.0 #23 0xb7e2e8f9 in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #24 0xb7e31a2d in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0 #25 0xb7e31b76 in PyEval_EvalCode () from /usr/lib/libpython2.4.so.1.0 #26 0xb7e5667f in PyRun_String () from /usr/lib/libpython2.4.so.1.0 #27 0xb7e2fce6 in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #28 0xb7e31a2d in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0 #29 0xb7e3011a in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #30 0xb7e31a2d in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0 #31 0xb7de31b6 in PyFunction_SetClosure () from /usr/lib/libpython2.4.so.1.0 #32 0xb7dc8fdf in PyObject_Call () from /usr/lib/libpython2.4.so.1.0 #33 0xb7dd079b in PyMethod_New () from /usr/lib/libpython2.4.so.1.0 #34 0xb7dc8fdf in PyObject_Call () from /usr/lib/libpython2.4.so.1.0 #35 0xb7dcfd7b in PyInstance_NewRaw () from /usr/lib/libpython2.4.so.1.0 #36 0xb7dc8fdf in PyObject_Call () from /usr/lib/libpython2.4.so.1.0 #37 0xb7e2f5d2 in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #38 0xb7e31a2d in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0 #39 0xb7e31b76 in PyEval_EvalCode () from /usr/lib/libpython2.4.so.1.0 #40 0xb7e4a525 in PyImport_ExecCodeModuleEx () from /usr/lib/libpython2.4.so.1.0 #41 0xb7e4a8e9 in PyImport_ExecCodeModule () from /usr/lib/libpython2.4.so.1.0 #42 0xb7e4c73e in _PyImport_FindModule () from /usr/lib/libpython2.4.so.1.0 #43 0xb7e4cc01 in PyImport_ReloadModule () from /usr/lib/libpython2.4.so.1.0 #44 0xb7e4ce26 in PyImport_ReloadModule () from /usr/lib/libpython2.4.so.1.0 #45 0xb7e4d2c6 in PyImport_ImportModuleEx () from 
/usr/lib/libpython2.4.so.1.0 #46 0xb7e22d9e in _PyUnicodeUCS4_ToLowercase () from /usr/lib/libpython2.4.so.1.0 #47 0xb7df5923 in PyCFunction_Call () from /usr/lib/libpython2.4.so.1.0 #48 0xb7dc8fdf in PyObject_Call () from /usr/lib/libpython2.4.so.1.0 #49 0xb7dcc6c0 in PyObject_CallFunction () from /usr/lib/libpython2.4.so.1.0 #50 0xb7e4d745 in PyImport_Import () from /usr/lib/libpython2.4.so.1.0 #51 0xb7e4d918 in PyImport_ImportModule () from /usr/lib/libpython2.4.so.1.0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From strawman at astraw.com Wed Apr 19 10:38:11 2006 From: strawman at astraw.com (Andrew Straw) Date: Wed Apr 19 10:38:11 2006 Subject: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter In-Reply-To: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> References: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> Message-ID: <44467576.1020708@astraw.com> Arkaitz Bitorika wrote: > Hi, > > I'm embedding Python in a big C++ program (the NS network simulator) > and I have problems when importing the numpy module, I get a Floating > Point exception. The C code that causes the exception is: I guess you mean a CPU/kernel level floating point exception (SIGFPE), not a Python exception? > > Py_Initialize(); > PyObject* module = PyImport_ImportModule("numpy"); > Py_DECREF(module); > > > I'm running Ubuntu Breezy on a dual processor Dell machine, with the > stock python and numpy 0.9.6. One strange thing is that I haven't been > able to reproduce the crash by writing a minimal C program with the > code above, it only crashes when added to my program. Does your program change error bits on the FPU or SSE units on your processor? (What processor are you using?) > I've been embedding Python for ages on the same program and other > modules work fine, only numpy fails. Most other modules don't use the SSE units, so wouldn't get hit by such a bug. > > I've debugged the issue a bit and I've seen that the exception is > thrown when the numpy __init__.py tries to import the core module. The > GDB backtrace is pasted at the end. > Any idea what may be going wrong? glibc 2.3.2 (e.g. in debian sarge) has a bug where the SSE unit has an error bit set wrong. But I'd guess Ubuntu isn't using this version of glibc, so I think the problem may be elsewhere. http://sources.redhat.com/bugzilla/show_bug.cgi?id=10 From strawman at astraw.com Wed Apr 19 11:30:10 2006 From: strawman at astraw.com (Andrew Straw) Date: Wed Apr 19 11:30:10 2006 Subject: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter In-Reply-To: References: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> <44467576.1020708@astraw.com> Message-ID: <4446819D.3030401@astraw.com> Arkaitz Bitorika wrote: > > On 19 Apr 2006, at 18:37, Andrew Straw wrote: > >> >>> I've been embedding Python for ages on the same program and other >>> modules work fine, only numpy fails. >> >> >> Most other modules don't use the SSE units, so wouldn't get hit by such >> a bug. > > > Is there a way of not using those units from numpy, to check if > that's what's going on? I think that numpy only accesses the SSE units through ATLAS or other external library. So, build numpy without ATLAS. But I'm not 100% sure anymore if there aren't any optimizations that directly use SSE if it's available. > Or alternatively, how would I check if my program is messing with the > SSE bits? Hmm, I think that's a bit hairy. 
I'd suggest simply asking the C++ library's mailing list if they alter the error bits on the control registers of the SSE unit. (Out of curiousity, what library is it?) If you want hairy, though, I think you'd have to check from C with the appropriate calls -- I'd start with the source code in that bug report. It looks like they're inlining an assembly statement to query a SSE control register. From faltet at xot.carabos.com Wed Apr 19 14:49:02 2006 From: faltet at xot.carabos.com (faltet at xot.carabos.com) Date: Wed Apr 19 14:49:02 2006 Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy In-Reply-To: <4445A822.60207@ee.byu.edu> References: <20060414213511.GA14355@xot.carabos.com> <4445A822.60207@ee.byu.edu> Message-ID: <20060419214814.GA21524@xot.carabos.com> On Tue, Apr 18, 2006 at 09:01:54PM -0600, Travis Oliphant wrote: > faltet at xot.carabos.com wrote: > The source of this slowness is the use in numarray of special-cases for > certain-sized byte-copies. > > Apparently, it is *much* faster to do > > ((double *)dst)[0] = ((double *)src)[0] > > when you have aligned data than it is to do > > memmove(dst, src, sizeof(double)) Mmm.. very interesting. > My timings for your benchmark with current SVN of NumPy are: > > NumPy: [0.021701812744140625, 0.021739959716796875, 0.021548032760620117] > Numarray: [0.052516937255859375, 0.052685976028442383, 0.052355051040649414] Well, in my machine and using numpy SVN version: numpy: [0.0974161624908447, 0.0621590614318847, 0.0612149238586425] numarray: [0.0658359527587890, 0.0623040199279785, 0.0627131462097167] So, numpy and numarray exhibits same performance now (it's curious why you are actually getting better performance in your platform). However: In [25]: stnac=timeit.Timer('b=a.copy()','import numarray as np; a=np.arange(1000000,dtype="complex128")[::10]') In [26]: stnpc=timeit.Timer('b=a.copy()','import numpy as np; a=np.arange(1000000,dtype="complex128")[::10]') In [27]: stnac.repeat(3,10) Out[27]: [0.11303496360778809, 0.11540508270263672, 0.11556506156921387] In [28]: stnpc.repeat(3,10) Out[28]: [0.21353006362915039, 0.21468400955200195, 0.21390914916992188] So, it seems that you forgot optimizing complex types. Fortunately, the cure is easy; after adding the attached patch I'm getting: In [3]: stnpc.repeat(3,10) Out[3]: [0.10468602180480957, 0.10204982757568359, 0.10242295265197754] so, good performance for numpy in copying strided complex128 is achieved as well. Thanks for looking into this! Francesc ====================================================================== --- numpy/core/src/arrayobject.c (revision 2381) +++ numpy/core/src/arrayobject.c (working copy) @@ -629,6 +629,14 @@ char *tout = dst; char *tin = src; switch(elsize) { + case 16: + for (i=0; i References: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> <44467576.1020708@astraw.com> <4446819D.3030401@astraw.com> Message-ID: <20060420091351.475439ab.simon@arrowtheory.com> On Wed, 19 Apr 2006 11:29:49 -0700 Andrew Straw wrote: > > > > > Is there a way of not using those units from numpy, to check if > > that's what's going on? > > I think that numpy only accesses the SSE units through ATLAS or other > external library. So, build numpy without ATLAS. But I'm not 100% sure > anymore if there aren't any optimizations that directly use SSE if it's > available. We had to disable attlas-sse on our debian system for these exact reasons. Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 
61 02 6249 6940 http://arrowtheory.com From tom.denniston at alum.dartmouth.org Wed Apr 19 17:17:18 2006 From: tom.denniston at alum.dartmouth.org (Tom Denniston) Date: Wed Apr 19 17:17:18 2006 Subject: [Numpy-discussion] LAPACK question building numpy Message-ID: Is there a way to pass a command line argument to setup.py for numpy that does the equivalent of a make using the flags: -L/home/tdennist/lib -lmkl_lapack -lmkl_lapack32 -lmkl_ia32 -lmkl -lguide All i can find on the subject is a page on the scipy wiki that says to use the variable LAPACK and set it to a .a file. When I do so I get undefined symbol problems. I this is probably really obvous and documented somewhere but I haven't been able to find it. I don't really know where to look. --Tom From strawman at astraw.com Wed Apr 19 18:59:03 2006 From: strawman at astraw.com (Andrew Straw) Date: Wed Apr 19 18:59:03 2006 Subject: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter In-Reply-To: <20060420091351.475439ab.simon@arrowtheory.com> References: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> <44467576.1020708@astraw.com> <4446819D.3030401@astraw.com> <20060420091351.475439ab.simon@arrowtheory.com> Message-ID: <4446EAB9.7010209@astraw.com> Simon Burton wrote: >On Wed, 19 Apr 2006 11:29:49 -0700 >Andrew Straw wrote: > > > >>>Is there a way of not using those units from numpy, to check if >>>that's what's going on? >>> >>> >>I think that numpy only accesses the SSE units through ATLAS or other >>external library. So, build numpy without ATLAS. But I'm not 100% sure >>anymore if there aren't any optimizations that directly use SSE if it's >>available. >> >> > >We had to disable attlas-sse on our debian system for these exact >reasons. > > If you're using debian sarge and the problem is your glibc, you can fix it: http://www.its.caltech.edu/~astraw/coding.html#id3 From robert.kern at gmail.com Wed Apr 19 19:43:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed Apr 19 19:43:02 2006 Subject: [Numpy-discussion] Re: LAPACK question building numpy In-Reply-To: References: Message-ID: Tom Denniston wrote: > Is there a way to pass a command line argument to setup.py for numpy > that does the equivalent of a make using the flags: > -L/home/tdennist/lib -lmkl_lapack -lmkl_lapack32 -lmkl_ia32 -lmkl -lguide > > All i can find on the subject is a page on the scipy wiki that says to > use the variable LAPACK and set it to a .a file. When I do so I get > undefined symbol problems. > > I this is probably really obvous and documented somewhere but I > haven't been able to find it. I don't really know where to look. Don't worry, it's not really well documented. Create a file called site.cfg in the root source directory. There's an example site.cfg.example there. Unfortunately, it's pretty sparse at the moment. Now, I'm not terribly familiar with the MKL, so I don't know what libraries do what, but here is my guess at the appropriate things you will need in site.cfg: [DEFAULT] library_dirs=/home/tdennist/lib:/some/other/path/perhaps include_dirs=/home/tdennist/include [blas_opt] libraries=whatever_the_mkl_blas_lib_is,mkl_ia32,mkl,guide [lapack_opt] libraries=mkl_lapack,mkl_lapack32,mkl_ia32,mkl,guide There's some more documentation in numpy/distutils/system_info.py . -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco

From faltet at xot.carabos.com Wed Apr 19 19:46:03 2006
From: faltet at xot.carabos.com (faltet at xot.carabos.com)
Date: Wed Apr 19 19:46:03 2006
Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy
In-Reply-To: <20060419214814.GA21524@xot.carabos.com>
References: <20060414213511.GA14355@xot.carabos.com> <4445A822.60207@ee.byu.edu> <20060419214814.GA21524@xot.carabos.com>
Message-ID: <20060420024510.GA21987@xot.carabos.com>

On Wed, Apr 19, 2006 at 09:48:14PM +0000, faltet at xot.carabos.com wrote:
> On Tue, Apr 18, 2006 at 09:01:54PM -0600, Travis Oliphant wrote:
> > Apparently, it is *much* faster to do
> >
> > ((double *)dst)[0] = ((double *)src)[0]
> >
> > when you have aligned data than it is to do
> >
> > memmove(dst, src, sizeof(double))
>
> Mmm.. very interesting.

A follow-up on this. After analyzing the issue somewhat, it seems that the problem with the memcpy() version was not the call itself, but the parameter that was passed as the number of bytes to copy. As this parameter's value was unknown at compile time, the compiler cannot generate optimized code for it and always has to fetch the value from memory (or cache).

In the version of the code that you optimized, you avoided this because you are telling the compiler (i.e. specifying at compile time) the exact extent of the data copy, allowing it to generate optimum code for the copy operation. However, if you do a similar thing but using the call (using doubles here):

memcpy(tout, tin, 8);

instead of:

((Float64 *)tout)[0] = ((Float64 *)tin)[0];

and repeat the operation for the other types, then you can achieve performance similar to the pointer version.

On the other hand, I see that you have disabled the optimization for unaligned data through the use of a check. Is there any reason for doing that? If I remove this check, I can achieve performance similar to numarray's (a bit better, in fact).

I'm attaching a small benchmark script that compares the performance of copying a 1D vector of 1 million elements in contiguous, strided (2 and 10), and strided (2 and 10 again) & unaligned flavors. The results for my machine (p4 at 2 GHz) are:

For the original numpy code (i.e.
For the pointer optimised code but releasing the unaligned data check: time for numpy contiguous --> 0.236 time for numarray contiguous --> 0.231 time for numpy strided (2) --> 0.213 time for numarray strided (2) --> 0.262 time for numpy strided (10) --> 0.297 time for numarray strided (10) --> 0.261 time for numpy strided (2) & unaligned--> 0.263 time for numarray strided (2) & unaligned--> 0.403 time for numpy strided (10) & unaligned--> 0.452 time for numarray strided (10) & unaligned--> 0.432 Ei! numpy is very similar to numarray in all cases, except for the strided with 2 elements and unaligned case, where numpy performs a 50% better. Finally, and just for showing the effect of providing memcpy with size information in compilation time, the numpy code using memcpy() with this optimization on (and disabling the alignment check, of course!): time for numpy contiguous --> 0.234 time for numarray contiguous --> 0.233 time for numpy strided (2) --> 0.223 time for numarray strided (2) --> 0.262 time for numpy strided (10) --> 0.285 time for numarray strided (10) --> 0.262 time for numpy strided (2) & unaligned--> 0.261 time for numarray strided (2) & unaligned--> 0.401 time for numpy strided (10) & unaligned--> 0.42 time for numarray strided (10) & unaligned--> 0.436 you can see that the figures are very similar to the previous case. So Travis, you may want to use the pointer indirection approach or the memcpy() one, whichever you prefer. Well, I just wanted to point this out. Time for sleep! Francesc -------------- next part -------------- A non-text attachment was scrubbed... Name: bench-copy.py Type: text/x-python Size: 2054 bytes Desc: not available URL: From tim.hochberg at cox.net Wed Apr 19 19:57:06 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 19 19:57:06 2006 Subject: [Numpy-discussion] Summer of Code ideas Message-ID: <4446F8D8.40909@cox.net> Discussing ideas for summer of code projects seems to be all the rage right now on various other Python lists, so I though I'd throw out a few that I've had. There are several different things that could be done with numexpr including: 1. Adding broadcasting. 2. Coercing arrays a chunk at a time instead of all at once when coercion is necessary. 3. Fancier syntax. I think that some variant of the following could be made to work: with deferred_evaluation: # Converts everything in local namespace to special objects # all of these math operations are deferred a = 5 + b*32 c = a + 73 # Now all objects are restored and deferred experesions are evaluated. This might be cool or it might be useless, but it sounds fun to try. I haven't talked to David Cooke about any of these and since numexpr is really his project he should be consulted before anyone tries these. There's also some stuff to be done on the basearray front. I expect I'll have the actual basearray object together in the next couple of weeks depending on my level of busyness, but there'll be a lot of other stuff to do besides just that. My general plan it to build a toolkit around basearray that can be used to build other array packages. These packages might be lighter weight than numpy or they might be specialized in some way that's not really compatible with numpy and ndarray. There's also room for potential for experimentation with protocols / generic functions. If anyones interested I suggest you read the thread (currently dormant) on python-3000.devel on this topic. 
There are lots of possible applications for this in numpy including using them to implement or replace: * asarray * __array_priority__ (by making the ufuncs and thus __add__, etc overloaded functions). * __array__, __array_wrap__, etc. * all the various functions that are giving us trouble with MA. * probably a bunch of other stuff. The basic basearray toolkit I mentioned above would be a good place to experiment with stuff like this, once it exists, since in theory it will be simpler than the full numpy codebase and you don't have to worry so much about backwards compatibility. Anyway, that's a bunch of random ideas that I at least find interesting. Regards, -tim From oliphant at ee.byu.edu Wed Apr 19 20:44:02 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 19 20:44:02 2006 Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy In-Reply-To: <20060420024510.GA21987@xot.carabos.com> References: <20060414213511.GA14355@xot.carabos.com> <4445A822.60207@ee.byu.edu> <20060419214814.GA21524@xot.carabos.com> <20060420024510.GA21987@xot.carabos.com> Message-ID: <44470255.302@ee.byu.edu> faltet at xot.carabos.com wrote: >On Wed, Apr 19, 2006 at 09:48:14PM +0000, faltet at xot.carabos.com wrote: > > >>On Tue, Apr 18, 2006 at 09:01:54PM -0600, Travis Oliphant wrote: >> >> >>>Apparently, it is *much* faster to do >>> >>>((double *)dst)[0] = ((double *)src)[0] >>> >>>when you have aligned data than it is to do >>> >>>memmove(dst, src, sizeof(double)) >>> >>> >>Mmm.. very interesting. >> >> > >A follow-up on this. After analyzing somewhat the issue, it seems that >the problem with the memcpy() version was not the call itself, but the >parameter that was passed as the number of bytes to copy. As this was a >parameter whose value was unknown in compile time, the compiler cannot >generate optimized code for it and always has to fetch its value from >memory (or cache). > > >In the version of the code that you optimized, you managed to do this >because you are telling to the compiler (i.e. specifying at compile >time) the exact extend of the data copy, so allowing it to generate >optimum code for the copy operation. However, if you do a similar >thing but using the call (using doubles here): > >memcpy(tout, tin, 8); > >instead of: > >((Float64 *)tout)[0] = ((Float64 *)tin)[0]; > >and repeat the operation for the other types, then you can achieve >similar performance than the pointer version. > > This is good to know. It certainly makes sense. I'll test it on my system when I get back. >On another hand, I see that you have disabled the optimization for >unaligned data through the use of a check. Is there any reason for >doing that? If I remove this check, I can achieve similar performance >than for numarray (a bit better, in fact). > > The only reason was to avoid pointer dereferencing on misaligned data (dereferencing a misaligned pointer causes bus errors on Solaris). But, if we can achieve it with a memmove, then there is no reason to limit the code. >I'm attaching a small benchmark script that compares the performance >of copying a 1D vector of 1 million of elements in contiguous, strided >(2 and 10), and strided (2 and 10 again) & unaligned flavors. The >results for my machine (p4 at 2 GHz) are: > >For the original numpy code (i.e. 
before Travis optimization): > >time for numpy contiguous --> 0.234 >time for numarray contiguous --> 0.229 >time for numpy strided (2) --> 1.605 >time for numarray strided (2) --> 0.263 >time for numpy strided (10) --> 1.72 >time for numarray strided (10) --> 0.264 >time for numpy strided (2) & unaligned--> 1.736 >time for numarray strided (2) & unaligned--> 0.402 >time for numpy strided (10) & unaligned--> 1.872 >time for numarray strided (10) & unaligned--> 0.435 > >where you can see that, for 1e6 elements the slowdown of original >numpy is almost 7x (!). Remember that in the previous benchmarks sent >here the slowdown was 3x, but we were copying 10 times less data. > >For the pointer optimised code (i.e. the current SVN version): > >time for numpy contiguous --> 0.238 >time for numarray contiguous --> 0.232 >time for numpy strided (2) --> 0.214 >time for numarray strided (2) --> 0.264 >time for numpy strided (10) --> 0.299 >time for numarray strided (10) --> 0.262 >time for numpy strided (2) & unaligned--> 1.736 >time for numarray strided (2) & unaligned--> 0.401 >time for numpy strided (10) & unaligned--> 1.874 >time for numarray strided (10) & unaligned--> 0.433 > >here you can see that your figures are very similar to numarray except >for unaligned data (4x slower). > >For the pointer optimised code but releasing the unaligned data check: > >time for numpy contiguous --> 0.236 >time for numarray contiguous --> 0.231 >time for numpy strided (2) --> 0.213 >time for numarray strided (2) --> 0.262 >time for numpy strided (10) --> 0.297 >time for numarray strided (10) --> 0.261 >time for numpy strided (2) & unaligned--> 0.263 >time for numarray strided (2) & unaligned--> 0.403 >time for numpy strided (10) & unaligned--> 0.452 >time for numarray strided (10) & unaligned--> 0.432 > >Ei! numpy is very similar to numarray in all cases, except for the >strided with 2 elements and unaligned case, where numpy performs a 50% >better. > >Finally, and just for showing the effect of providing memcpy with size >information in compilation time, the numpy code using memcpy() with >this optimization on (and disabling the alignment check, of course!): > >time for numpy contiguous --> 0.234 >time for numarray contiguous --> 0.233 >time for numpy strided (2) --> 0.223 >time for numarray strided (2) --> 0.262 >time for numpy strided (10) --> 0.285 >time for numarray strided (10) --> 0.262 >time for numpy strided (2) & unaligned--> 0.261 >time for numarray strided (2) & unaligned--> 0.401 >time for numpy strided (10) & unaligned--> 0.42 >time for numarray strided (10) & unaligned--> 0.436 > >you can see that the figures are very similar to the previous case. So >Travis, you may want to use the pointer indirection approach or the >memcpy() one, whichever you prefer. > >Well, I just wanted to point this out. Time for sleep! > > > Very, very useful information. 1000 Thank you's for talking the time to investigate and assemble it. Do you think the memmove would work similarly? -Travis From tom.denniston at alum.dartmouth.org Thu Apr 20 08:07:04 2006 From: tom.denniston at alum.dartmouth.org (Tom Denniston) Date: Thu Apr 20 08:07:04 2006 Subject: [Numpy-discussion] Re: LAPACK question building numpy In-Reply-To: References: Message-ID: Thanks for your help. I will try this. 
--Tom On 4/19/06, Robert Kern wrote: > Tom Denniston wrote: > > Is there a way to pass a command line argument to setup.py for numpy > > that does the equivalent of a make using the flags: > > -L/home/tdennist/lib -lmkl_lapack -lmkl_lapack32 -lmkl_ia32 -lmkl -lguide > > > > All i can find on the subject is a page on the scipy wiki that says to > > use the variable LAPACK and set it to a .a file. When I do so I get > > undefined symbol problems. > > > > I this is probably really obvous and documented somewhere but I > > haven't been able to find it. I don't really know where to look. > > Don't worry, it's not really well documented. Create a file called site.cfg in > the root source directory. There's an example site.cfg.example there. > Unfortunately, it's pretty sparse at the moment. Now, I'm not terribly familiar > with the MKL, so I don't know what libraries do what, but here is my guess at > the appropriate things you will need in site.cfg: > > [DEFAULT] > library_dirs=/home/tdennist/lib:/some/other/path/perhaps > include_dirs=/home/tdennist/include > > [blas_opt] > libraries=whatever_the_mkl_blas_lib_is,mkl_ia32,mkl,guide > > [lapack_opt] > libraries=mkl_lapack,mkl_lapack32,mkl_ia32,mkl,guide > > There's some more documentation in numpy/distutils/system_info.py . > > -- > Robert Kern > robert.kern at gmail.com > > "I have come to believe that the whole world is an enigma, a harmless enigma > that is made terrible by our own mad attempt to interpret it as though it had > an underlying truth." > -- Umberto Eco > > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? > Get stuff done quickly with pre-integrated technology to make your job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From faltet at xot.carabos.com Thu Apr 20 09:42:04 2006 From: faltet at xot.carabos.com (faltet at xot.carabos.com) Date: Thu Apr 20 09:42:04 2006 Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy In-Reply-To: <44470255.302@ee.byu.edu> References: <20060414213511.GA14355@xot.carabos.com> <4445A822.60207@ee.byu.edu> <20060419214814.GA21524@xot.carabos.com> <20060420024510.GA21987@xot.carabos.com> <44470255.302@ee.byu.edu> Message-ID: <20060420164132.GA23763@xot.carabos.com> On Wed, Apr 19, 2006 at 09:39:01PM -0600, Travis Oliphant wrote: >>On another hand, I see that you have disabled the optimization for >>unaligned data through the use of a check. Is there any reason for >>doing that? If I remove this check, I can achieve similar performance >>than for numarray (a bit better, in fact). > >The only reason was to avoid pointer dereferencing on misaligned data >(dereferencing a misaligned pointer causes bus errors on Solaris). >But, if we can achieve it with a memmove, then there is no reason to >limit the code. I see. Well, I've tried out with memmove instead than memcpy, and I can reproduce the same slowdown than it was seen previously to using your pointer addressing optimisation. I'm afraid that Shasha was right in that memmove check for not overwriting destination is the responsible for this. 
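For readers wondering what the overlap issue looks like in practice, a small hedged illustration in plain numpy; the printed result depends on whether the underlying copy is overlap-safe:

    import numpy
    a = numpy.arange(6)
    # The source a[:-1] and the destination a[1:] alias the same buffer.
    # An overlap-safe copy (memmove semantics) yields [0 0 1 2 3 4];
    # a naive front-to-back element copy (memcpy-style) would keep
    # propagating the first element and yield [0 0 0 0 0 0].
    a[1:] = a[:-1]
    print a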
Having said that, and although I must admit that I don't know in deep the different situations under which the source of a copy may overlap the destination, my guess is that for typical element sizes (i.e. [1], 2, 4, 8 and 16) for which the optimization has been done, there is not any harm on using memcpy instead of memmove (admittedly, you may come with a counter-example of this, but I do hope you don't). In any case, the use of memcpy is completely equivalent to the current optimization using pointers except that, hopefully, pointer addressing is not made on unaligned data. So, perhaps using the memcpy approach in Solaris (under Sparc I guess) may avoid the bus errors. It would be nice if anyone with access to such a platform can confirm this point. I'm attaching a patch for current SVN numpy that uses the memcpy approach. Feel free to try it against the benchmarks (also attached). One last word, I've added a case for typesize 1 in addition of the existing ones as this effectively improves the speed for 1-byte types. Below are the speeds without the 1-byte case optimisation: time for numpy contiguous --> 0.03 time for numarray contiguous --> 0.062 time for numpy strided (2) --> 0.078 time for numarray strided (2) --> 0.064 time for numpy strided (10) --> 0.081 time for numarray strided (10) --> 0.07 I haven't added a case for the unaligned case because this makes non-sense for 1 byte sized types. and here with the 1-byte case optimisation added: time for numpy contiguous --> 0.03 time for numarray contiguous --> 0.062 time for numpy strided (2) --> 0.054 time for numarray strided (2) --> 0.065 time for numpy strided (10) --> 0.061 time for numarray strided (10) --> 0.07 you can notice an speed-up between a 30% and 45% over the previous case. Cheers, -------------- next part -------------- --- numpy/core/src/arrayobject.c (revision 2381) +++ numpy/core/src/arrayobject.c (working copy) @@ -628,28 +628,44 @@ intp i, j; char *tout = dst; char *tin = src; + /* For typical datasizes, the memcpy call is much faster than memmove + and perfectely safe */ switch(elsize) { + case 16: + for (i=0; ind) == src->nd && (nd > 0) && + if (!swap && (nd = dest->nd) == src->nd && (nd > 0) && PyArray_CompareLists(dest->dimensions, src->dimensions, nd)) { int maxaxis=0, maxdim=dest->dimensions[0]; int i; -------------- next part -------------- A non-text attachment was scrubbed... Name: bench-copy.py Type: text/x-python Size: 2053 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: bench-copy1.py Type: text/x-python Size: 1168 bytes Desc: not available URL: From rng7 at cornell.edu Thu Apr 20 13:49:13 2006 From: rng7 at cornell.edu (Ryan Gutenkunst) Date: Thu Apr 20 13:49:13 2006 Subject: [Numpy-discussion] Bypassing a[2].item()? Message-ID: <4447F397.7010006@cornell.edu> Hi all, I'm porting some code from old scipy to new scipy, and I've run into a rather large performance problem. The heart of the code is integrating a system of nonlinear differential equations using odeint. The function that dominates the time to run calculates the right hand side, given a current state x. (len(x) ~ 50.) Abstracted, the function looks like: def rhs(x) output = scipy.zeros(10, scipy.Float) a = x[0] b = x[1] ... output[0] = a/b + c*sqrt(d)... output[1] = b-a + 2*b... ... return output (I copy the elements of the current state to local variables to avoid the cost of repeatedly calling x.__getitem__, and to make the resulting equations easier to read.) 
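(A hedged aside on that pattern, not from the original code: for a float array, tolist() converts to plain Python floats in a single call, so the per-element __getitem__ cost can be hoisted; whether that helps depends on exactly the issue discussed next.)

    vals = x.tolist()          # one call; elements are plain Python floats
    a, b = vals[0], vals[1]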
When using numpy, a and b are now array scalars and the arithmetic is much slower, resulting in about a factor of 10 increase in runtimes from those using Numeric. I've tried doing: a = x[0].item(), which allows the arimetic be done on pure scalars. This is a little faster, but still results in a factor of 3 increase in runtime from old scipy. I imagine the slowdown comes from having to call __getitem__() followed by item() So questions: 1) I haven't followed the details of the array scalar discussions. Is it anticipated that array scalar arithmetic will eventually be as fast as arithmetic in native python types? 2) If not, is it possible to get a "pure" scalar directly from an array in one function call? Thanks for any help, Ryan -- Ryan Gutenkunst | Cornell LASSP | "It is not the mountain | we conquer but ourselves." Clark 535 / (607)227-7914 | -- Sir Edmund Hillary AIM: JepettoRNG | http://www.physics.cornell.edu/~rgutenkunst/ From robert.kern at gmail.com Thu Apr 20 14:20:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Thu Apr 20 14:20:02 2006 Subject: [Numpy-discussion] Re: Bypassing a[2].item()? In-Reply-To: <4447F397.7010006@cornell.edu> References: <4447F397.7010006@cornell.edu> Message-ID: Ryan Gutenkunst wrote: > So questions: > 1) I haven't followed the details of the array scalar discussions. Is it > anticipated that array scalar arithmetic will eventually be as fast as > arithmetic in native python types? More or less, if I'm not mistaken. This ticket is aimed at that: http://projects.scipy.org/scipy/numpy/ticket/55 > 2) If not, is it possible to get a "pure" scalar directly from an array > in one function call? float(x[0]) seems to be faster on my PowerBook. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From rng7 at cornell.edu Thu Apr 20 15:21:11 2006 From: rng7 at cornell.edu (Ryan Gutenkunst) Date: Thu Apr 20 15:21:11 2006 Subject: [Numpy-discussion] Re: Bypassing a[2].item()? In-Reply-To: References: <4447F397.7010006@cornell.edu> Message-ID: <9b9f0633c5a242a6ab8a199708c8dd94@cornell.edu> On Apr 20, 2006, at 5:18 PM, Robert Kern wrote: > Ryan Gutenkunst wrote: > >> So questions: >> 1) I haven't followed the details of the array scalar discussions. Is >> it >> anticipated that array scalar arithmetic will eventually be as fast as >> arithmetic in native python types? > > More or less, if I'm not mistaken. This ticket is aimed at that: > > http://projects.scipy.org/scipy/numpy/ticket/55 Good to hear. >> 2) If not, is it possible to get a "pure" scalar directly from an >> array >> in one function call? > > float(x[0]) seems to be faster on my PowerBook. It's faster for me, too, but float(x[0]) is still much slower than using Numeric where x[0] suffices. I guess I'll just have to warn my users away from the new scipy until numpy 0.9.8 comes out and scalar math is sped up. Cheers, Ryan -- Ryan Gutenkunst | Cornell Dept. of Physics | "It is not the mountain | we conquer but ourselves." Clark 535 / (607)255-6068 | -- Sir Edmund Hillary AIM: JepettoRNG | http://www.physics.cornell.edu/~rgutenkunst/ From robert.kern at gmail.com Thu Apr 20 16:22:09 2006 From: robert.kern at gmail.com (Robert Kern) Date: Thu Apr 20 16:22:09 2006 Subject: [Numpy-discussion] Re: Bypassing a[2].item()? 
In-Reply-To: <9b9f0633c5a242a6ab8a199708c8dd94@cornell.edu> References: <4447F397.7010006@cornell.edu> <9b9f0633c5a242a6ab8a199708c8dd94@cornell.edu> Message-ID: Ryan Gutenkunst wrote: > On Apr 20, 2006, at 5:18 PM, Robert Kern wrote: > >> Ryan Gutenkunst wrote: >>> 2) If not, is it possible to get a "pure" scalar directly from an array >>> in one function call? >> >> float(x[0]) seems to be faster on my PowerBook. > > It's faster for me, too, but float(x[0]) is still much slower than using > Numeric where x[0] suffices. I guess I'll just have to warn my users > away from the new scipy until numpy 0.9.8 comes out and scalar math is > sped up. For that matter, a plain "x[0]" seems to be about 3x faster with Numeric than numpy. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From oliphant at ee.byu.edu Thu Apr 20 20:16:02 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 20 20:16:02 2006 Subject: [Numpy-discussion] Re: Bypassing a[2].item()? In-Reply-To: References: <4447F397.7010006@cornell.edu> <9b9f0633c5a242a6ab8a199708c8dd94@cornell.edu> Message-ID: <44484E44.2050300@ee.byu.edu> Robert Kern wrote: >Ryan Gutenkunst wrote: > > >>On Apr 20, 2006, at 5:18 PM, Robert Kern wrote: >> >> >> >>>Ryan Gutenkunst wrote: >>> >>> > > > >>>>2) If not, is it possible to get a "pure" scalar directly from an array >>>>in one function call? >>>> >>>> >>>float(x[0]) seems to be faster on my PowerBook. >>> >>> >>It's faster for me, too, but float(x[0]) is still much slower than using >>Numeric where x[0] suffices. I guess I'll just have to warn my users >>away from the new scipy until numpy 0.9.8 comes out and scalar math is >>sped up. >> >> > >For that matter, a plain "x[0]" seems to be about 3x faster with Numeric than numpy. > > > We are already special-casing the integer select code but could special-case the getitem code so that if nd==1 a faster construction is used. I think right now a 0-dim array is being created only to get destroyed later on return. Please add a ticket as this extremely common operation should be made as fast as possible. This is a little tricky because array_big_item is called in a few places and is expected to return an array. If it returns a scalar in those places segfaults can occur. Either checks need to be made in each of those cases or the special-casing needs to be in array_big_item_nice. I'm not sure which I prefer.... -Travis From simon at arrowtheory.com Thu Apr 20 23:24:59 2006 From: simon at arrowtheory.com (Simon Burton) Date: Thu Apr 20 23:24:59 2006 Subject: [Numpy-discussion] announce: pyjit, a little jit for creating numpy ufuncs Message-ID: <20060421162336.42285837.simon@arrowtheory.com> Hi, Inspired by numexpr, pypy and llvm, i've built a simple JIT for creating numpy "ufuncs" (they are not yet real ufuncs). It uses llvm[1] as the backend machine code generator. The main things it can do are: *) parse simple python code (function def's) *) generate SSA assembly code for llvm *) build ufunc code for applying to numpy array's When I say simple I mean it: def calc(a,b): c = (a+b)/2.0 return c No control flow or type inference has been implemented. As with numexpr, significant speedups are possible. I'm putting this announce here to see what the other numpy'ers think. $ svn co http://rubis.rsise.anu.edu.au/local/repos/elefant/pyjit bye, Simon. 
[1] http://llvm.org/

--
Simon Burton, B.Sc.
Licensed PO Box 8066
ANU Canberra 2601
Australia
Ph. 61 02 6249 6940
http://arrowtheory.com

From cookedm at physics.mcmaster.ca Fri Apr 21 09:27:00 2006
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Fri Apr 21 09:27:00 2006
Subject: [Numpy-discussion] Source release of 0.9.6 on sourceforge is wrong
Message-ID:

Travis,

Looks like you uploaded the bdist .tar.gz of NumPy 0.9.6 to sourceforge, instead of the sdist. The one there isn't the source, it's a binary distribution of a 32-bit Linux compile.

It's been over a month, with 2684 downloads, and I can't find a mention that anybody's noticed this before... Have we silently lost people who think we're on crack, or are there 2684 people who haven't looked at what they got?

[On another note, the download URL on PyPi won't work with setuptools; I've fixed the setup.py in svn to use the correct one, but if you could fix it on PyPi and set it to http://sourceforge.net/project/showfiles.php?group_id=1369&package_id=175103 then people can use easy_install to install numpy.]

-- |>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From cookedm at physics.mcmaster.ca Fri Apr 21 09:30:01 2006
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Fri Apr 21 09:30:01 2006
Subject: [Numpy-discussion] Source release of 0.9.6 on sourceforge is wrong
In-Reply-To: (David M. Cooke's message of "Fri, 21 Apr 2006 12:25:52 -0400")
References: Message-ID:

cookedm at physics.mcmaster.ca (David M. Cooke) writes:

> Travis,
>
> Looks like you uploaded the bdist .tar.gz of NumPy 0.9.6 to
> sourceforge, instead of the sdist. The one there isn't the source,
> it's a binary distribution of a 32-bit Linux compile.

Gah! My bad! When I convinced easy_install to grab the source, it grabbed numpy-0.9.6-py2.4-linux-i686.tar.gz instead, which of course is a binary package.

*why* it grabbed that one is another story (that's not my platform! I'm on py2.4-linux-x86_64).

-- |>|\/|<
/--------------------------------------------------------------------------\
|David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
|cookedm at physics.mcmaster.ca

From ndarray at mac.com Fri Apr 21 09:35:02 2006
From: ndarray at mac.com (Sasha)
Date: Fri Apr 21 09:35:02 2006
Subject: [Numpy-discussion] Source release of 0.9.6 on sourceforge is wrong
In-Reply-To: References: Message-ID:

I've downloaded numpy-0.9.6.tar.gz from SF about a month ago and it was fine:

> tar tzf ~/Archives/numpy-0.9.6.tar.gz
numpy-0.9.6/
numpy-0.9.6/numpy/
numpy-0.9.6/numpy/core/
numpy-0.9.6/numpy/core/blasdot/
numpy-0.9.6/numpy/core/blasdot/_dotblas.c
numpy-0.9.6/numpy/core/blasdot/cblas.h
...

On 4/21/06, David M. Cooke wrote:
> Travis,
>
> Looks like you uploaded the bdist .tar.gz of NumPy 0.9.6 to
> sourceforge, instead of the sdist. The one there isn't the source,
> it's a binary distribution of a 32-bit Linux compile.
>
> It's been over a month, with 2684 downloads, and I can't find a
> mention that anybody's noticed this before... Have we silently lost
> people who think we're on crack, or are there 2684 people who haven't
> looked at what they got?
>
> [On another note, the download URL on PyPI won't work with
> setuptools; I've fixed the setup.py in svn to use the correct one, but
> if you could fix it on PyPI and set it to
> http://sourceforge.net/project/showfiles.php?group_id=1369&package_id=175103
> then people can use easy_install to install numpy.]
>
> --
> |>|\/|<
> /--------------------------------------------------------------------------\
> |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
> |cookedm at physics.mcmaster.ca

From bsouthey at gmail.com Fri Apr 21 10:35:02 2006
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri Apr 21 10:35:02 2006
Subject: [Numpy-discussion] Source release of 0.9.6 on sourceforge is wrong
In-Reply-To: References: Message-ID:

Hi,
I concur, as I downloaded and installed it yesterday (April 20) afternoon (from my ls -l):

2006-04-20 13:38 numpy-0.9.6.tar.gz

I had no problems installing that version, as the import numpy appeared to work.

Regards
Bruce

On 4/21/06, Sasha wrote:
> I downloaded numpy-0.9.6.tar.gz from SF about a month ago and it was fine:
>
> > tar tzf ~/Archives/numpy-0.9.6.tar.gz
> numpy-0.9.6/
> numpy-0.9.6/numpy/
> numpy-0.9.6/numpy/core/
> numpy-0.9.6/numpy/core/blasdot/
> numpy-0.9.6/numpy/core/blasdot/_dotblas.c
> numpy-0.9.6/numpy/core/blasdot/cblas.h
> ...
>
> On 4/21/06, David M. Cooke wrote:
> > Travis,
> >
> > Looks like you uploaded the bdist .tar.gz of NumPy 0.9.6 to
> > sourceforge, instead of the sdist. The one there isn't the source,
> > it's a binary distribution of a 32-bit Linux compile.
> >
> > It's been over a month, with 2684 downloads, and I can't find a
> > mention that anybody's noticed this before... Have we silently lost
> > people who think we're on crack, or are there 2684 people who haven't
> > looked at what they got?
> >
> > [On another note, the download URL on PyPI won't work with
> > setuptools; I've fixed the setup.py in svn to use the correct one, but
> > if you could fix it on PyPI and set it to
> > http://sourceforge.net/project/showfiles.php?group_id=1369&package_id=175103
> > then people can use easy_install to install numpy.]
> >
> > --
> > |>|\/|<
> > /--------------------------------------------------------------------------\
> > |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
> > |cookedm at physics.mcmaster.ca

From robert.kern at gmail.com Fri Apr 21 11:28:11 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Fri Apr 21 11:28:11 2006
Subject: [Numpy-discussion] Re: Source release of 0.9.6 on sourceforge is wrong
In-Reply-To: References: Message-ID:

David M. Cooke wrote:
> cookedm at physics.mcmaster.ca (David M. Cooke) writes:
>
>>Travis,
>>
>>Looks like you uploaded the bdist .tar.gz of NumPy 0.9.6 to
>>sourceforge, instead of the sdist. The one there isn't the source,
>>it's a binary distribution of a 32-bit Linux compile.
>
> Gah! My bad! When I convinced easy_install to grab the source, it
> grabbed numpy-0.9.6-py2.4-linux-i686.tar.gz instead, which of course is a
> binary package.
>
> *why* it grabbed that one is another story (that's not my platform!
> I'm on py2.4-linux-x86_64).

Phillip Eby tells me that the bdist_dumb packages there confuse some versions of setuptools. He fixed it this morning.

-- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From faltet at xot.carabos.com Fri Apr 21 13:56:04 2006
From: faltet at xot.carabos.com (faltet at xot.carabos.com)
Date: Fri Apr 21 13:56:04 2006
Subject: [Numpy-discussion] numexpr enhancements
Message-ID: <20060421205530.GA25020@xot.carabos.com>

Hi,

After looking at the numpy performance issues on strided and unaligned data, I decided to have a try at the numexpr package and finally implemented better support for them. As a result, numexpr can now reach a 2x performance improvement for simple expressions, like 'a>2.'.

Along the way, I've added support for boolean expressions (&, | and ~, as in the where() function), a new boolean data type (important to get better performance on boolean expressions) and support for numarray (maintaining the compatibility with numpy, of course).

I've called the new package numexpr 0.2 so as not to confuse it with the existing 0.1. Well, let's hope that numexpr can continue making its way towards integration in numpy.

You can fetch this new package at:

http://www.carabos.com/downloads/divers/numexpr-0.2.tar.gz

Finally, let me say that numexpr is a wonderful toy to get your hands dirty ;-) Many thanks to David (and Tim) for this!

Cheers!

Francesc

From hetland at tamu.edu Fri Apr 21 15:02:12 2006
From: hetland at tamu.edu (Robert Hetland)
Date: Fri Apr 21 15:02:12 2006
Subject: [Numpy-discussion] 'append' array method request.
Message-ID:

I find myself writing things like

x = []; y = []; t = []
for line in open(filename).readlines():
    xstr, ystr, tstr = line.split()
    x.append(float(xstr))
    y.append(float(ystr))
    t.append(dateutil.parser.parse(tstr)) # or something similar
x = asarray(x)
y = asarray(y)
t = asarray(t)

I think it would be nice to be able to create empty arrays, and append the values onto the end as I loop through the file without creating the intermediate list. Is this reasonable? Is there a way to do this with existing methods or functions that I am missing? Is there a better way altogether?

-Rob.

----- Rob Hetland, Assistant Professor Dept of Oceanography, Texas A&M University p: 979-458-0096, f: 979-845-6331 e: hetland at tamu.edu, w: http://pong.tamu.edu

From robert.kern at gmail.com Fri Apr 21 15:13:07 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Fri Apr 21 15:13:07 2006
Subject: [Numpy-discussion] Re: 'append' array method request.
In-Reply-To: References: Message-ID:

Robert Hetland wrote:
>
> I find myself writing things like
>
> x = []; y = []; t = []
> for line in open(filename).readlines():
>     xstr, ystr, tstr = line.split()
>     x.append(float(xstr))
>     y.append(float(ystr))
>     t.append(dateutil.parser.parse(tstr)) # or something similar
> x = asarray(x)
> y = asarray(y)
> t = asarray(t)
>
> I think it would be nice to be able to create empty arrays, and append
> the values onto the end as I loop through the file without creating the
> intermediate list. Is this reasonable?

Not in the core array object, no. We can't make the underlying pointer point to something else (because you've just reallocated the whole memory block to add an item to the array) without invalidating all of the views on that array. This is also the reason that numpy arrays can't use the standard library's array module as its storage.

That said:

> Is there a way to do this with
> existing methods or functions that I am missing? Is there a better way
> altogether?

We've done performance tests before. The fastest way that I've found is to use the stdlib array module to accumulate values (it uses the same preallocation strategy that Python lists use, and you can't create views from them, so you are always safe) and then create the numpy array using fromstring on that object (stdlib arrays obey the buffer protocol, so they will be treated like strings of binary data). I posted timings one or two or three years ago on one of the scipy lists. However, lists are fine if you don't need blazing speed/low memory usage.

-- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From ndarray at mac.com Fri Apr 21 15:20:01 2006
From: ndarray at mac.com (Sasha)
Date: Fri Apr 21 15:20:01 2006
Subject: [Numpy-discussion] 'append' array method request.
In-Reply-To: References: Message-ID:

On 4/21/06, Robert Hetland wrote:
> [...]
> I think it would be nice to be able to create empty arrays, and
> append the values onto the end as I loop through the file without
> creating the intermediate list. Is this reasonable? Is there a way
> to do this with existing methods or functions that I am missing? Is
> there a better way altogether?
>
Numpy arrays cannot grow in-place because there is no way for an array to tell if its data is shared with other arrays.
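(For concreteness, a rough sketch of the accumulate-then-convert pattern Robert describes above; the filename, the two-column layout, and the float dtype here are invented for illustration, and tostring() is used to hand the raw bytes to fromstring:

import array
import numpy

xacc = array.array('d')   # stdlib array: list-like appends, contiguous C buffer
yacc = array.array('d')
for line in open('data.txt'):
    xstr, ystr = line.split()
    xacc.append(float(xstr))
    yacc.append(float(ystr))
# stdlib arrays obey the buffer protocol, so their raw bytes can be
# copied straight into a numpy array in one shot
x = numpy.fromstring(xacc.tostring(), dtype=numpy.float64)
y = numpy.fromstring(yacc.tostring(), dtype=numpy.float64)

The dateutil column is omitted here, since object columns need a list anyway.)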
You can use Python's standard library arrays instead of lists:

>>> from numpy import *
>>> import array as a
>>> x = a.array('i',[])
>>> x.append(1)
>>> x.append(2)
>>> x.append(3)
>>> ndarray(len(x), dtype=int, buffer=x)
array([1, 2, 3])

Note that the data is not copied:

>>> ndarray(len(x), dtype=int, buffer=x)[1] = 20
>>> x
array('i', [1, 20, 3])

From charlesr.harris at gmail.com Fri Apr 21 18:50:02 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri Apr 21 18:50:02 2006
Subject: [Numpy-discussion] 'append' array method request.
In-Reply-To: References: Message-ID:

Hi,

On 4/21/06, Robert Hetland wrote:
>
> I find myself writing things like
>
> x = []; y = []; t = []
> for line in open(filename).readlines():
>     xstr, ystr, tstr = line.split()
>     x.append(float(xstr))
>     y.append(float(ystr))
>     t.append(dateutil.parser.parse(tstr)) # or something similar
> x = asarray(x)
> y = asarray(y)
> t = asarray(t)

I think you can read the ascii file directly into an array with numeric conversions (fromfile) then just reshape it to have x,y,z columns. For example:

$[charris at E011704 ~]$ cat input.txt
1 2 3
4 5 6
7 8 9

Then after importing numpy into ipython:

In [6]:fromfile('input.txt',sep=' ').reshape(-1,3)
Out[6]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Chuck

From oliphant.travis at ieee.org Fri Apr 21 19:51:07 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Fri Apr 21 19:51:07 2006
Subject: [Numpy-discussion] Re: seterr changes
In-Reply-To: <44465DEE.8090703@cox.net>
References: <44465DEE.8090703@cox.net>
Message-ID: <444999E2.1040009@ieee.org>

Tim Hochberg wrote:
>
> Hi Travis et al,
>
> I started looking at your seterr changes.

Thank you very much for the help on this. I'm not an expert on threaded code by any means. In fact, as you clearly point out, I don't eat and drink what will work under threaded environments and what won't. Clearly global variables are problematic. That is the problem with the update_use_defaults bit, right? This is the way it was being managed before and I just changed names a bit to use PyThreadState_GetDict for the dictionary (it seems possible to use only from C until Python 2.4).

I say if it only buys 5% on small arrays then it's not worth it, as there are other fish to fry to make up for that 5%, and I agree that tracking down threading problems due to a finagled global variable is sticky. I did not think about the threading issues deeply enough.

> I'm also curious about the seterr interface. It returns
> ufunc_values_obj. I wasn't sure how one is supposed to pass that
> back in to seterr, so I modified seterr to instead return a
> dictionary. I also modified it so that the seterr function itself has
> no defaults (or rather they're all None). Instead, any unspecified
> values are taken from the current error state. Thus
> seterr(divide="warn") changes only the divide state, leaving the other
> entries alone.

Returning an object is a late-in-the-game idea and should be critiqued. It can be passed to seterr (an attribute check grabs the actual list --- did you want to change it to a dictionary?). Doesn't a small list have faster access than a small dictionary?

I'll look over your commits and comment later if I think of anything...

I'm thrilled with your work.
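(A minimal sketch of the round trip the dictionary return makes possible, assuming seterr hands back the previous settings as a dict and accepts them again as keyword arguments, which is what Tim's change implies:

import numpy

old = numpy.seterr(over='raise')   # returns the prior state as a dict
try:
    pass  # calculations that should trap overflow would go here
finally:
    numpy.seterr(**old)            # restore exactly what was in effect before

)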
Best, -Travis From bitorika at cs.tcd.ie Sat Apr 22 03:18:00 2006 From: bitorika at cs.tcd.ie (bitorika at cs.tcd.ie) Date: Sat Apr 22 03:18:00 2006 Subject: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter In-Reply-To: <4446819D.3030401@astraw.com> References: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> <44467576.1020708@astraw.com> <4446819D.3030401@astraw.com> Message-ID: <35791.134.226.38.190.1145701016.squirrel@webmail.cs.tcd.ie> >> On 19 Apr 2006, at 18:37, Andrew Straw wrote: > I think that numpy only accesses the SSE units through ATLAS or other > external library. So, build numpy without ATLAS. But I'm not 100% sure > anymore if there aren't any optimizations that directly use SSE if it's > available. I've tried getting rid of all atlas, blas and lapack packages in my system and rebuilding numpy to use its own unoptimised lapack_lite, but no luck. Just trying to import numpy with PyImport_ImportModule("numpy") causes the program to crash with just a "Floating point exception" message output. The program I'm embedding Python in is the NS Network Simulator (http://www.isi.edu/nsnam/ns/). It's a complex C++ beast with its own Object-Tcl interpreter, but it's been working fine with embedded Python except for this numpy crash. I've used Numeric before and it worked fine as well. I'm lost now regarding what to work on to find a solution, anyone familiar with numpy internals has any suggestion? Thanks, Arkaitz From jordi.bofill at upc.edu Sat Apr 22 09:46:00 2006 From: jordi.bofill at upc.edu (Jordi Bofill) Date: Sat Apr 22 09:46:00 2006 Subject: [Numpy-discussion] Re: Dumping record arrays References: <200603302127.24231.pgmdevlist@mailcan.com> Message-ID: Pierre GM wrote: > Folks, > I'd like to dump/pickle some record arrays. The pickling works, the > unpickling raises a ValueError (on my version of numpy 0.9.6). (cf below). > Is this already corrected in the svn version ? > Thx > > > ########################################################################### > # > > x1 = array([21,32,14]) > x2 = array(['my','first','name']) > x3 = array([3.1, 4.5, 6.2]) > r = rec.fromarrays([x1,x2,x3], names='id, word, number') > > r.dump('dumper') > rb=load('dumper') > --------------------------------------------------------------------------- > exceptions.ValueError Traceback (most > recent call last) > > /home/backtopop/Work/workspace-python/pyflows/src/ > > /usr/lib64/python2.4/site-packages/numpy/core/numeric.py in load(file) > 331 if isinstance(file, type("")): > 332 file = _file(file,"rb") > --> 333 return _cload(file) > 334 > 335 # These are all essentially abbreviations > > /usr/lib64/python2.4/site-packages/numpy/core/_internal.py in > _reconstruct(subtype, shape, dtype) > 251 > 252 def _reconstruct(subtype, shape, dtype): > --> 253 return ndarray.__new__(subtype, shape, dtype) > 254 > 255 > > ValueError: ('data-type with unspecified variable length', _reconstruct at 0x2aaaafcf1578>, (, > (0,), 'V')) > > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language that extends applications into web and mobile media. Attend the > live webcast and join the prime developer group breaking into this new > coding territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 I'm newbie moving from numarray and I also get this error. I tried svn records.py with the same result. Any hope in getting it fixed? 
The error can be reproduced from the source example:

import numpy.core.records as rec
r=rec.fromrecords([(456,'dbe',1.2),(2,'de',1.3)],names='col1,col2,col3')
import cPickle
print cPickle.loads(cPickle.dumps(r))

---------------------------------------------------------------------------
exceptions.ValueError Traceback (most recent call last)

/home/jordi/temp/

/usr/lib/python2.4/site-packages/numpy/core/_internal.py in _reconstruct(subtype, shape, dtype)
    251
    252 def _reconstruct(subtype, shape, dtype):
--> 253     return ndarray.__new__(subtype, shape, dtype)
    254
    255

ValueError: ('data-type with unspecified variable length', <function _reconstruct at ...>, (<class 'numpy.core.records.recarray'>, (0,), 'V'))

From oliphant.travis at ieee.org Sat Apr 22 10:19:00 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sat Apr 22 10:19:00 2006
Subject: [Numpy-discussion] Re: Dumping record arrays
In-Reply-To: References: <200603302127.24231.pgmdevlist@mailcan.com>
Message-ID: <444A653A.9020402@ieee.org>

Jordi Bofill wrote:
> Pierre GM wrote:
>
>> Folks,
>> I'd like to dump/pickle some record arrays. The pickling works, the
>> unpickling raises a ValueError (on my version of numpy 0.9.6). (cf below).
>> Is this already corrected in the svn version ?
>> Thx
>>
>> ###########################################################################
>> #
>>
>> x1 = array([21,32,14])
>> x2 = array(['my','first','name'])
>> x3 = array([3.1, 4.5, 6.2])
>> r = rec.fromarrays([x1,x2,x3], names='id, word, number')
>>

This is fixed in SVN (but you have to get more than just the SVN records.py script). The needed change is in the __reduce__ method of the array object (which is in C). A re-compile is needed.

NumPy 0.9.8 should be out in a few weeks.

Best,

-Travis

>> r.dump('dumper')
>> rb=load('dumper')
>> ---------------------------------------------------------------------------
>> exceptions.ValueError Traceback (most
>> recent call last)
>>
>> /home/backtopop/Work/workspace-python/pyflows/src/
>>
>> /usr/lib64/python2.4/site-packages/numpy/core/numeric.py in load(file)
>> 331 if isinstance(file, type("")):
>> 332 file = _file(file,"rb")
>> --> 333 return _cload(file)
>> 334
>> 335 # These are all essentially abbreviations
>>
>> /usr/lib64/python2.4/site-packages/numpy/core/_internal.py in
>> _reconstruct(subtype, shape, dtype)
>> 251
>> 252 def _reconstruct(subtype, shape, dtype):
>> --> 253 return ndarray.__new__(subtype, shape, dtype)
>> 254
>> 255
>>
>> ValueError: ('data-type with unspecified variable length', <function
>> _reconstruct at 0x2aaaafcf1578>, (<class 'numpy.core.records.recarray'>,
>> (0,), 'V'))
>
> I'm newbie moving from numarray and I also get this error. I tried svn
> records.py with the same result. Any hope in getting it fixed?
> The error can be reproduced from the source example:
>
> import numpy.core.records as rec
> r=rec.fromrecords([(456,'dbe',1.2),(2,'de',1.3)],names='col1,col2,col3')
> import cPickle
> print cPickle.loads(cPickle.dumps(r))
> ---------------------------------------------------------------------------
> exceptions.ValueError Traceback (most recent
> call last)
>
> /home/jordi/temp/
>
> /usr/lib/python2.4/site-packages/numpy/core/_internal.py in
> _reconstruct(subtype, shape, dtype)
> 251
> 252 def _reconstruct(subtype, shape, dtype):
> --> 253 return ndarray.__new__(subtype, shape, dtype)
> 254
> 255
>
> ValueError: ('data-type with unspecified variable length', <function
> _reconstruct at 0xb78fce64>, (<class 'numpy.core.records.recarray'>,
> (0,), 'V'))

From fullung at gmail.com Sat Apr 22 10:53:05 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Sat Apr 22 10:53:05 2006
Subject: [Numpy-discussion] Re: seterr changes
In-Reply-To: <444999E2.1040009@ieee.org>
Message-ID: <005701c66635$82b3a930$0502010a@dsp.sun.ac.za>

Hello all

I was just wondering if someone could provide some example code that would cause an error if invalid is set to 'raise'?

I also noticed that seterr returns the old values. Is this really useful? Consider its use in an IPython session:

In [184]: N.geterr()
Out[184]: {'over': 'ignore', 'divide': 'ignore', 'invalid': 'ignore', 'under': 'ignore'}

In [185]: N.seterr(over='raise')
Out[185]: {'over': 'ignore', 'divide': 'ignore', 'invalid': 'ignore', 'under': 'ignore'}

I think the following pattern would make sense, but it seems it doesn't work at present:

old=N.geterr()
N.seterr(over='raise')
# do some calculations that might overflow
N.seterr(old)

This currently causes the following error:

Traceback (most recent call last):
  File "", line 1, in ?
  File "C:\Python24\Lib\site-packages\numpy\core\numeric.py", line 426, in seterr
    maskvalue = ((_errdict[divide] << SHIFT_DIVIDEBYZERO) +
TypeError: dict objects are unhashable

Is this intended? I think it would be useful to be able to restore all the error states in one go.

Regards,

Albert

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-
> discussion-admin at lists.sourceforge.net] On Behalf Of Travis Oliphant
> Sent: 22 April 2006 04:50
> To: tim.hochberg at ieee.org; numpy-discussion
> Subject: [Numpy-discussion] Re: seterr changes
>
> Tim Hochberg wrote:
> >
> > Hi Travis et al,
> >
> > I started looking at your seterr changes.
> Thank you very much for the help on this. I'm not an expert on threaded
> code by any means. In fact, as you clearly point out, I don't eat and
> drink what will work under threaded environments and what won't. Clearly
> global variables are problematic. That is the problem with the
> update_use_defaults bit, right? This is the way it was being managed
> before and I just changed names a bit to use PyThreadState_GetDict for
> the dictionary (it seems possible to use only from C until Python 2.4).
> > I say if it only buys 5% on small arrays then it's not worth it as there > are other fish to fry to make up for that 5% and I agree that tracking > down threading problems due to a fanagled global variable is sticky. I > did not think about the threading issues deeply enough. > > > I'm also curious about the seterr interface. It returns > > ufunc_values_obj. I'm wasn't sure how one is supposed to pass that > > back in to seterr, so I modified seterr to instead return a > > dictionary. I also modified it so that the seterr function itself has > > no defaults (or rather they're all None). Instead, any unspecified > > values are taken from the current error state. Thus > > seterr(divide="warn") changes only the divide state, leaving the other > > entries alone. > Returning an object is a late-in-the-game idea and should be critiqued. > It can be passed to seterr (an attribute check grabs the actual list --- > did you want to change it to a dictionary?). Doesn't a small list have > faster access than a small dictionary? > > I'll look over your commits and comment later if I think of anything... > > I'm thrilled with your work. > > Best, > > -Travis From rob at hooft.net Sat Apr 22 11:48:01 2006 From: rob at hooft.net (Rob Hooft) Date: Sat Apr 22 11:48:01 2006 Subject: [Numpy-discussion] Re: seterr changes In-Reply-To: <005701c66635$82b3a930$0502010a@dsp.sun.ac.za> References: <005701c66635$82b3a930$0502010a@dsp.sun.ac.za> Message-ID: <444A7A35.5090906@hooft.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Albert Strasheim wrote: | old=N.geterr() | N.seterr(over='raise') | # so some calculations that might overflow | N.seterr(old) You should try (but I didn't): N.seterr(**old) Rob - -- Rob W.W. Hooft || rob at hooft.net || http://www.hooft.net/people/rob/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFESno1H7J/Cv8rb3QRAppZAKCGBRSvL++wg3wFer6odmG8sxyrFwCfQ1nq p0aVr4r+Z1ZfRBGQgir+KX0= =eZMa -----END PGP SIGNATURE----- From strawman at astraw.com Sat Apr 22 12:13:02 2006 From: strawman at astraw.com (Andrew Straw) Date: Sat Apr 22 12:13:02 2006 Subject: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter In-Reply-To: <35791.134.226.38.190.1145701016.squirrel@webmail.cs.tcd.ie> References: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> <44467576.1020708@astraw.com> <4446819D.3030401@astraw.com> <35791.134.226.38.190.1145701016.squirrel@webmail.cs.tcd.ie> Message-ID: <444A8026.3030307@astraw.com> bitorika at cs.tcd.ie wrote: >>>On 19 Apr 2006, at 18:37, Andrew Straw wrote: >>> >>> >>I think that numpy only accesses the SSE units through ATLAS or other >>external library. So, build numpy without ATLAS. But I'm not 100% sure >>anymore if there aren't any optimizations that directly use SSE if it's >>available. >> >> > >I've tried getting rid of all atlas, blas and lapack packages in my system >and rebuilding numpy to use its own unoptimised lapack_lite, but no luck. >Just trying to import numpy with PyImport_ImportModule("numpy") causes the >program to crash with just a "Floating point exception" message output. > >The program I'm embedding Python in is the NS Network Simulator >(http://www.isi.edu/nsnam/ns/). It's a complex C++ beast with its own >Object-Tcl interpreter, but it's been working fine with embedded Python >except for this numpy crash. I've used Numeric before and it worked fine >as well. 
>
> I'm lost now regarding what to work on to find a solution, anyone familiar
> with numpy internals has any suggestion?
>

OK, going back to your original gdb traceback, it looks like the SIGFPE originated in the following function in umathmodule.c:

static double
pinf_init(void)
{
    double mul = 1e10;
    double tmp = 0.0;
    double pinf;

    pinf = mul;
    for (;;) {
        pinf *= mul;
        if (pinf == tmp) break;
        tmp = pinf;
    }
    return pinf;
}

If you try just that function (instead of the whole Python interpreter and numpy module) and still get the exception, you'll be that much closer to narrowing down the issue.

From robert.kern at gmail.com Sat Apr 22 18:58:01 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Sat Apr 22 18:58:01 2006
Subject: [Numpy-discussion] Re: Backporting numpy to Python 2.2
In-Reply-To: <20060419103554.4ac1df4a.twegener@radlogic.com.au>
References: <20060419103554.4ac1df4a.twegener@radlogic.com.au>
Message-ID:

Tim Wegener wrote:
> Hi,
>
> I am attempting to backport numpy-0.9.6 to be compatible with python 2.2. (Some of our machines run python 2.2 as part of Red Hat 9 and Red Hat 7.3 and it is hazardous to alter the standard setup.) I was able to change most of the 2.3-isms to be 2.2 compatible (see the attached patch). However I had problems compiling the following c module:

I was hoping that Travis would jump in and talk about the reasons that he targeted 2.3 and not 2.2. I don't think that it's going to be feasible to target 2.2 at this point. If nothing else, I've long since forgotten how to write 2.2 code. More seriously, doing an overhaul of all of the C code in numpy to use the older API is just going to make the code clumsier and more difficult to maintain.

I think it is going to be much easier for you to install a second, more recent Python interpreter on your machines than it will be for you to maintain a 2.2-compatible branch. Linux installations, even Red Hat, usually handle having multiple versions of Python installed side by side just fine. You don't have to remove Python 2.2.

-- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From zpincus at stanford.edu Sat Apr 22 20:48:00 2006
From: zpincus at stanford.edu (Zachary Pincus)
Date: Sat Apr 22 20:48:00 2006
Subject: [Numpy-discussion] Matrix and var method
Message-ID: <83468068-4E41-45A1-9753-90CEADF34722@stanford.edu>

Hi folks,

I just ran across an error with numpy.matrix types: the var() method does not seem to work! (I've tried all sorts of permutations on the matrix shape, and the axis parameter to var; nothing works.) Perhaps this has already been fixed -- I haven't updated my numpy in a week or so. If so, sorry; if not, I hope this helps.
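(A workaround in the meantime — suggested by the fact that, in the failing session below, the plain-array call succeeds while the matrix call dies inside matrix.__mul__ — is to drop to an ndarray view before reducing; a quick sketch:

import numpy

m = numpy.matrix([[1, 2, 3], [1, 2, 3]])
# asarray returns a plain ndarray view, so * is elementwise again
# and var() no longer trips over matrix multiplication
print numpy.asarray(m).var()

)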
Zach In [1]: import numpy In [2]: numpy.__version__ Out[2]: '0.9.7.2335' In [3]: numpy.matrix([[1,2,3], [1,2,3]]).var() ------------------------------------------------------------------------ --- exceptions.ValueError Traceback (most recent call last) /Users/zpincus/ /Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site- packages/numpy/core/defmatrix.py in __mul__(self, other) 147 if isinstance(other, N.ndarray) or N.isscalar(other) or \ 148 not hasattr(other, '__rmul__'): --> 149 return N.dot(self, other) 150 else: 151 return NotImplemented ValueError: matrices are not aligned In [4]: numpy.array([[1,2,3], [1,2,3]]).var() Out[4]: 0.80000000000000004 From a.mcmorland at auckland.ac.nz Sun Apr 23 17:40:02 2006 From: a.mcmorland at auckland.ac.nz (Angus McMorland) Date: Sun Apr 23 17:40:02 2006 Subject: [Numpy-discussion] Error installing on amd64 Debian-unstable Message-ID: <444C1E24.8030603@auckland.ac.nz> I had no troubles installing numpy and scipy on my 32-bit laptop, but cannot get numpy to install on my amd64 debian desktop. I've pulled in the latest svn versions, then run: $ python setup.py install Installation seems to run okay (no error messages), but the following happens: In [1]: import numpy import core -> failed: /usr/lib/python2.3/site-packages/numpy/core/_sort.so: undefined symbol: PyArray_CompareUCS4 import lib -> failed: module compiled against version 90703 of C-API but this version of numpy is 90704 import linalg -> failed: module compiled against version 90703 of C-API but this version of numpy is 90704 import dft -> failed: cannot import name asarray import random -> failed: 'module' object has no attribute 'dtype' --------------------------------------------------------------------------- exceptions.ImportError Traceback (most recent call last) /home/amcmorl/ /usr/lib/python2.3/site-packages/numpy/__init__.py 47 return NumpyTest().test(level, verbosity) 48 ---> 49 import add_newdocs 50 51 if __doc__ is not None: /usr/lib/python2.3/site-packages/numpy/add_newdocs.py ----> 2 from lib import add_newdoc 3 4 add_newdoc('numpy.core','dtype', 5 [('fields', "Fields of the data-typedescr if any."), 6 ('alignment', "Needed alignment for this data-type"), ImportError: cannot import name add_newdoc Can anyone suggest what I'm doing wrong? Cheers, A. -- Angus McMorland email a.mcmorland at auckland.ac.nz mobile +64-21-155-4906 PhD Student, Neurophysiology / Multiphoton & Confocal Imaging Physiology, University of Auckland phone +64-9-3737-599 x89707 Armourer, Auckland University Fencing Secretary, Fencing North Inc. From robert.kern at gmail.com Sun Apr 23 17:55:08 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 23 17:55:08 2006 Subject: [Numpy-discussion] Re: Error installing on amd64 Debian-unstable In-Reply-To: <444C1E24.8030603@auckland.ac.nz> References: <444C1E24.8030603@auckland.ac.nz> Message-ID: Angus McMorland wrote: > I had no troubles installing numpy and scipy on my 32-bit laptop, but > cannot get numpy to install on my amd64 debian desktop. I've pulled in > the latest svn versions, then run: > > $ python setup.py install > > Installation seems to run okay (no error messages), but the following > happens: > > In [1]: import numpy > import core -> failed: > /usr/lib/python2.3/site-packages/numpy/core/_sort.so: undefined symbol: > PyArray_CompareUCS4 > import lib -> failed: module compiled against version 90703 of C-API but > this version of numpy is 90704 Please delete the build/ directory and the installed numpy package and rebuild. 
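Concretely, the clean rebuild amounts to something like the following from the numpy source tree (the site-packages path is taken from the traceback above; removing it may need root):

$ rm -rf build
$ rm -rf /usr/lib/python2.3/site-packages/numpy
$ python setup.py install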
If the problem persists, please let us know. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Sun Apr 23 17:58:22 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 23 17:58:22 2006 Subject: [Numpy-discussion] Changing the Trac authentication Message-ID: <444C20E5.7090309@gmail.com> I will be changing the Trac authentication over the next hour or so. I will be installing the AccountManagerPlugin to allow users to create accounts for themselves without needing to have SVN write access. Anonymous users will not be able to edit the Wikis or tickets. Non-developer, but registered users will be able to do so with some restrictions, notably not being able to resolve tickets. Developers who currently have accounts will have the same username/password as before. If you have problems using the Trac sites before I announce that I am done, please wait until I am finished. If there are still problems, please let me know and I will try to fix them as soon as possible. Thank you for your patience. Hopefully, this change will resolve the spam problem. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Sun Apr 23 18:12:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 23 18:12:02 2006 Subject: [Numpy-discussion] Re: Changing the Trac authentication In-Reply-To: <444C20E5.7090309@gmail.com> References: <444C20E5.7090309@gmail.com> Message-ID: <444C25A9.8080701@gmail.com> Robert Kern wrote: > I will be changing the Trac authentication over the next hour or so. Never mind. I'll have to do it tomorrow when I get to the office. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From rmuller at sandia.gov Mon Apr 24 09:12:13 2006 From: rmuller at sandia.gov (Rick Muller) Date: Mon Apr 24 09:12:13 2006 Subject: [Numpy-discussion] Problems building numpy Message-ID: <02801766-7F45-48EE-AD4A-7B4B0590C9AC@sandia.gov> Numpy really builds nicely now, and I appreciate all of the hard work that people have put into portability of this code. That being said, I just had my first system where Numpy failed to build. It's on a redhat 7.3 (yes, we have a 7.3 box. I didn't believe it either. not my decision.) and I get the following error when trying to run Numpy: Python 2.4.3 (#1, Apr 24 2006, 09:54:46) [GCC 3.2.3 20030502 (Red Hat Linux 3.2.3-42)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from numpy import array import linalg -> failed: /usr/local/lib/python2.4/site-packages/numpy/ linalg/lapack_lite.so: undefined symbol: s_wsfe If this is easy to fix, I'd prefer to fix it. However, if the numpy developers have better things to do than to support a 10-year-old operating system (and I suspect that they do), I'm cool with that. 
Rick Rick Muller rmuller at sandia.gov From arkaitz.bitorika at gmail.com Mon Apr 24 09:24:03 2006 From: arkaitz.bitorika at gmail.com (Arkaitz Bitorika) Date: Mon Apr 24 09:24:03 2006 Subject: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter In-Reply-To: <444A8026.3030307@astraw.com> References: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> <44467576.1020708@astraw.com> <4446819D.3030401@astraw.com> <35791.134.226.38.190.1145701016.squirrel@webmail.cs.tcd.ie> <444A8026.3030307@astraw.com> Message-ID: Andrew, I've verified that the function causes the exception when embedded in the program but not when used from a simple C program with just a main () function. The successful version iterates 31 times over the for loop while the crashing one fails the 30th time that it does "pinf *= mul". Now we know exactly where the crash is, but no idea how to fix it ;). It doesn't look it should be related to SSE2 flags, it's just doing a big multiplication, but I don't know enough about low level C and floating point operations to understand why it may be throwing the exception there. Any idea how I could avoid that function crashing? Thanks, Arkaitz On 22 Apr 2006, at 20:12, Andrew Straw wrote: > OK, going back to your original gdb traceback, it looks like the > SIGFPE > originated in the following funtion in umathmodule.c: > > static double > pinf_init(void) > { > double mul = 1e10; > double tmp = 0.0; > double pinf; > > pinf = mul; > for (;;) { > pinf *= mul; > if (pinf == tmp) break; > tmp = pinf; > } > return pinf; > } > > If you try just that function (instead of the whole Python interpreter > and numpy module) and still get the exception, you'll be that much > closer to narrowing down the issue. From robert.kern at gmail.com Mon Apr 24 09:53:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon Apr 24 09:53:02 2006 Subject: [Numpy-discussion] Re: Problems building numpy In-Reply-To: <02801766-7F45-48EE-AD4A-7B4B0590C9AC@sandia.gov> References: <02801766-7F45-48EE-AD4A-7B4B0590C9AC@sandia.gov> Message-ID: Rick Muller wrote: > Numpy really builds nicely now, and I appreciate all of the hard work > that people have put into portability of this code. > > That being said, I just had my first system where Numpy failed to > build. It's on a redhat 7.3 (yes, we have a 7.3 box. I didn't believe > it either. not my decision.) and I get the following error when trying > to run Numpy: > > Python 2.4.3 (#1, Apr 24 2006, 09:54:46) > [GCC 3.2.3 20030502 (Red Hat Linux 3.2.3-42)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> from numpy import array > import linalg -> failed: /usr/local/lib/python2.4/site-packages/numpy/ > linalg/lapack_lite.so: undefined symbol: s_wsfe > > If this is easy to fix, I'd prefer to fix it. However, if the numpy > developers have better things to do than to support a 10-year-old > operating system (and I suspect that they do), I'm cool with that. This usually means that you are not linking in the g2c library: http://www.scipy.org/FAQ#head-26562f0a9e046b53eae17de300fc06408f9c91a8 -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From ndarray at mac.com Mon Apr 24 10:07:06 2006 From: ndarray at mac.com (Sasha) Date: Mon Apr 24 10:07:06 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). 
Message-ID: I was looking at ticket 76: http://projects.scipy.org/scipy/numpy/ticket/76 At first, I concluded that the ticket was valid and that >>> a = zeros([5,2]) >>> a[:] = arange(5) should raise an error as it did in Numeric. However, once I started looking at the code, I've realized that numpy supports more flexible broadcasting rules than Numeric. For example: >>> x = zeros([10]) >>> x[:] = 1,2 >>> x array([1, 2, 1, 2, 1, 2, 1, 2, 1, 2]) That would be an error in Numeric. Given that the above is valid, the result in Ticket 76 actually makes sense. I believe it is time to have some discussion about the future of broadcasting rules in numpy. Can anyone provide a summary of the status quo? From oliphant.travis at ieee.org Mon Apr 24 10:43:05 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Apr 24 10:43:05 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: Message-ID: <444D0DF7.2060307@ieee.org> Sasha wrote: > I was looking at ticket 76: > > http://projects.scipy.org/scipy/numpy/ticket/76 > > At first, I concluded that the ticket was valid and that > > >>>> a = zeros([5,2]) >>>> a[:] = arange(5) >>>> > > should raise an error as it did in Numeric. However, once I started > looking at the code, I've realized that numpy supports more flexible > broadcasting rules than Numeric. > This really isn't in the category of "broadcasting" as I see it. My understanding is that broadcasting refers to operations involving more than one array on the input side. It's really just a "universal function" concept. A copying operation is not handled using the same rules. In this case, for example, Numeric used to raise an error but in NumPy the array will be copied as many times as possible into the array. I don't believe ticket #76 is actually an error. This behavior could be changed if somebody wants to write the code to change it but only until version 1.0. It would be very difficult to change the other broadcasting behavior which was inherited from Numeric, however. The only possibility I see is adding new useful functionality where Numeric used to raise an error. -Travis From zpincus at stanford.edu Mon Apr 24 10:57:04 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Mon Apr 24 10:57:04 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <444D0DF7.2060307@ieee.org> References: <444D0DF7.2060307@ieee.org> Message-ID: <4AB1DE92-E877-4E22-83AB-69DDBB32FB25@stanford.edu> > It would be very difficult to change the other broadcasting > behavior which was inherited from Numeric, however. The only > possibility I see is adding new useful functionality where Numeric > used to raise an error. Well, there is one case that I run into all of the time where the broadcasting rules seem a bit constraining: In [1]: import numpy In [2]: numpy.__version__ '0.9.7.2335' In [3]: a = numpy.ones([50, 100]) In [4]: means = a.mean(axis = 1) In [5]: print a.shape, means.shape (50, 100) (50,) In [5]: a / means ValueError: index objects are not broadcastable to a single shape In [6]: (a.transpose() / means).transpose() #this works It's obvious why this doesn't work due to the broadcasting rules, but it also seems (to me, in this case at least) obvious what I am trying to do. I don't think I'm suggesting that the broadcasting rules be changed to allow matching-from-the-right in the general case, since that seems likely to make the broadcasting rules even more difficult to grok. But there do seem to be a lot of (....transpose () ... 
).transpose() bits in my code. Is there anything to be done here? I presume not, but I just wanted to mention it. Zach From oliphant.travis at ieee.org Mon Apr 24 11:25:06 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Apr 24 11:25:06 2006 Subject: ***[Possible UCE]*** Re: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <4AB1DE92-E877-4E22-83AB-69DDBB32FB25@stanford.edu> References: <444D0DF7.2060307@ieee.org> <4AB1DE92-E877-4E22-83AB-69DDBB32FB25@stanford.edu> Message-ID: <444D17E6.1070104@ieee.org> Zachary Pincus wrote: >> It would be very difficult to change the other broadcasting behavior >> which was inherited from Numeric, however. The only possibility I >> see is adding new useful functionality where Numeric used to raise an >> error. > > Well, there is one case that I run into all of the time where the > broadcasting rules seem a bit constraining: > > In [1]: import numpy > In [2]: numpy.__version__ > '0.9.7.2335' > In [3]: a = numpy.ones([50, 100]) > In [4]: means = a.mean(axis = 1) > In [5]: print a.shape, means.shape > (50, 100) (50,) > In [5]: a / means > ValueError: index objects are not broadcastable to a single shape > In [6]: (a.transpose() / means).transpose() > #this works > > It's obvious why this doesn't work due to the broadcasting rules, but > it also seems (to me, in this case at least) obvious what I am trying > to do. I don't think I'm suggesting that the broadcasting rules be > changed to allow matching-from-the-right in the general case, since > that seems likely to make the broadcasting rules even more difficult > to grok. But there do seem to be a lot of (....transpose() ... > ).transpose() bits in my code. > > Is there anything to be done here? I presume not, but I just wanted to > mention it. Yes, just be more explicit about which end to tack extra dimensions onto (the automatic extension always assumes pre-pending...) a / means[:,newaxis] is the suggested spelling... -Travis From ndarray at mac.com Mon Apr 24 11:30:05 2006 From: ndarray at mac.com (Sasha) Date: Mon Apr 24 11:30:05 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <4AB1DE92-E877-4E22-83AB-69DDBB32FB25@stanford.edu> References: <444D0DF7.2060307@ieee.org> <4AB1DE92-E877-4E22-83AB-69DDBB32FB25@stanford.edu> Message-ID: On 4/24/06, Zachary Pincus wrote: > [...] > In [5]: print a.shape, means.shape > (50, 100) (50,) > In [5]: a / means > ValueError: index objects are not broadcastable to a single shape > In [6]: (a.transpose() / means).transpose() > #this works This works too: >>> x = a / means[:,newaxis] no .transpose() :-). From ndarray at mac.com Mon Apr 24 11:49:04 2006 From: ndarray at mac.com (Sasha) Date: Mon Apr 24 11:49:04 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <444D0DF7.2060307@ieee.org> References: <444D0DF7.2060307@ieee.org> Message-ID: On 4/24/06, Travis Oliphant wrote: > [...] > A copying operation is not handled using the same rules. In this case, > for example, Numeric used to raise an error but in NumPy the array will > be copied as many times as possible into the array. I don't believe > ticket #76 is actually an error. > I disagree on the terminology. In my view broadcasting means repeating the values of the array to fit into a different shape no matter what dictates the new shape an operand or the receiver. IMHO the following is slightly confusing: >>> a = zeros([5,2]) >>> a[...] += arange(5) Traceback (most recent call last): File "", line 1, in ? 
ValueError: shape mismatch: objects cannot be broadcast to a single shape

>>> x+(1,)
array([1, 1, 1, 1])

I suggest that we make ufunc consistent with slice assignment. Currently:

>>> x[:]=1,1
>>> x[:]=1,1,1
Traceback (most recent call last):
  File "", line 1, in ?
ValueError: number of elements in destination must be integer multiple of number of elements in source

From cookedm at physics.mcmaster.ca Mon Apr 24 13:13:09 2006
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Mon Apr 24 13:13:09 2006
Subject: [Numpy-discussion] numexpr enhancements
In-Reply-To: <20060421205530.GA25020@xot.carabos.com> (faltet@xot.carabos.com's message of "Fri, 21 Apr 2006 20:55:30 +0000")
References: <20060421205530.GA25020@xot.carabos.com>
Message-ID:

faltet at xot.carabos.com writes:

> Hi,
>
> After looking at the numpy performance issues on strided and unaligned
> data, I decided to have a try at the numexpr package and finally
> implemented better support for them. As a result, numexpr can now reach
> a 2x performance improvement for simple expressions, like 'a>2.'.
>
> Along the way, I've added support for boolean expressions (&, | and ~, as
> in the where() function), a new boolean data type (important to get
> better performance on boolean expressions) and support for numarray
> (maintaining the compatibility with numpy, of course).
>
> I've called the new package numexpr 0.2 so as not to confuse it with the
> existing 0.1. Well, let's hope that numexpr can continue making its way
> towards integration in numpy.
>
> You can fetch this new package at:
>
> http://www.carabos.com/downloads/divers/numexpr-0.2.tar.gz
>
> Finally, let me say that numexpr is a wonderful toy to get your hands
> dirty ;-) Many thanks to David (and Tim) for this!

Unfortunately, real life (damn Ph.D.! :-) has gotten in my way, so I'm not going to be able to look at this for a while. But I'll add it to my list.

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From cookedm at physics.mcmaster.ca Mon Apr 24 13:18:05 2006
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Mon Apr 24 13:18:05 2006
Subject: [Numpy-discussion] announce: pyjit, a little jit for creating numpy ufuncs
In-Reply-To: <20060421162336.42285837.simon@arrowtheory.com> (Simon Burton's message of "Fri, 21 Apr 2006 16:23:36 +1000")
References: <20060421162336.42285837.simon@arrowtheory.com>
Message-ID:

Simon Burton writes:

> Hi,
>
> Inspired by numexpr, pypy and llvm, I've built a simple
> JIT for creating numpy "ufuncs" (they are not yet real ufuncs).
> It uses llvm[1] as the backend machine code generator.

Cool! I had a look at LLVM, but I wanted something to go into SciPy, and that was too heavy a dependence.
However, I could see doing more stuff with this than I can easily with numexpr.

> The main things it can do are:
>
> *) parse simple python code (function def's)
> *) generate SSA assembly code for llvm
> *) build ufunc code for applying to numpy arrays
>
> When I say simple I mean it:
>
> def calc(a,b):
>     c = (a+b)/2.0
>     return c
>
> No control flow or type inference has been implemented.
>
> As with numexpr, significant speedups are possible.
>
> I'm putting this announce here to see what the other numpy'ers think.
>
> $ svn co http://rubis.rsise.anu.edu.au/local/repos/elefant/pyjit
>
> [1] http://llvm.org/

How do the speedups compare with numexpr? Are there any lessons you learned from this that could apply to numexpr? Could we have a common frontend for numexpr/pyjit, and a different backend for each? Then each wouldn't have to reinvent the wheel in parsing (the same thought goes for weave, too...)

I don't have much time to look at it (real life sucking my time :-(), but I'll have a look when I do have the time.

-- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca

From oliphant.travis at ieee.org Mon Apr 24 14:22:02 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Mon Apr 24 14:22:02 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
In-Reply-To: References: <444D0DF7.2060307@ieee.org>
Message-ID: <444D4143.4020204@ieee.org>

Sasha wrote:
> On 4/24/06, Travis Oliphant wrote:
>
>> [...]
>> A copying operation is not handled using the same rules. In this case,
>> for example, Numeric used to raise an error but in NumPy the array will
>> be copied as many times as possible into the array. I don't believe
>> ticket #76 is actually an error.
>>
>>
> I disagree on the terminology. In my view broadcasting means
> repeating the values of the array to fit into a different shape no
> matter what dictates the new shape an operand or the receiver.
>

I can understand that view. But, that's not been the historical use of broadcasting, which has always been only a "ufunc" concept. Code to implement a broader view of broadcasting across more operations, if people decide that is appropriate, could be done (carefully), but I don't have time to write it.

-Travis

From oliphant.travis at ieee.org Mon Apr 24 14:25:02 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Mon Apr 24 14:25:02 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
In-Reply-To: References: <444D0DF7.2060307@ieee.org>
Message-ID: <444D41FE.7050904@ieee.org>

Sasha wrote:
> In this category, I would suggest to allow broadcasting to any
> multiple of the dimension even if the dimension is not 1. I don't see
> what makes 1 so special.
>

What's so special about 1 is that the code for it is relatively straightforward and already implemented using strides. Altering the code to allow any multiple of the dimension would be harder and slower.

-Travis

From oliphant.travis at ieee.org Mon Apr 24 14:30:01 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Mon Apr 24 14:30:01 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
In-Reply-To: References: <444D0DF7.2060307@ieee.org>
Message-ID: <444D4329.9050700@ieee.org>

Sasha wrote:
>>>> x[:]=1,1
>>>> x[:]=1,1,1
>>>>
> Traceback (most recent call last):
>   File "", line 1, in ?
> ValueError: number of elements in destination must be integer multiple > of number of elements in source > I think the only reasonable thing to do is to raise an error unless the shapes were compatible like Numeric did and eliminate the multiple copying feature. This would bring the desired consistency. -Travis From strawman at astraw.com Mon Apr 24 14:33:01 2006 From: strawman at astraw.com (Andrew Straw) Date: Mon Apr 24 14:33:01 2006 Subject: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter In-Reply-To: References: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> <44467576.1020708@astraw.com> <4446819D.3030401@astraw.com> <35791.134.226.38.190.1145701016.squirrel@webmail.cs.tcd.ie> <444A8026.3030307@astraw.com> Message-ID: <444D43D0.3040308@astraw.com> This doesn't seem like an issue with numpy. Your test proved that. I'm curious what the outcome is, but I'm afraid there's not much we can do. At this point I think you should write the ns2 people and see what they say. Their program seems to be responsible for twiddling the FPU/SSE flags, so I think the issue is better solved, or at least discussed, by them. Cheers! Andrew Arkaitz Bitorika wrote: > Andrew, > > I've verified that the function causes the exception when embedded in > the program but not when used from a simple C program with just a main > () function. The successful version iterates 31 times over the for > loop while the crashing one fails the 30th time that it does "pinf *= > mul". > > Now we know exactly where the crash is, but no idea how to fix it ;). > It doesn't look it should be related to SSE2 flags, it's just doing a > big multiplication, but I don't know enough about low level C and > floating point operations to understand why it may be throwing the > exception there. Any idea how I could avoid that function crashing? > > Thanks, > Arkaitz > > On 22 Apr 2006, at 20:12, Andrew Straw wrote: > >> OK, going back to your original gdb traceback, it looks like the SIGFPE >> originated in the following funtion in umathmodule.c: >> >> static double >> pinf_init(void) >> { >> double mul = 1e10; >> double tmp = 0.0; >> double pinf; >> >> pinf = mul; >> for (;;) { >> pinf *= mul; >> if (pinf == tmp) break; >> tmp = pinf; >> } >> return pinf; >> } >> >> If you try just that function (instead of the whole Python interpreter >> and numpy module) and still get the exception, you'll be that much >> closer to narrowing down the issue. > From oliphant.travis at ieee.org Mon Apr 24 17:40:04 2006 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Mon Apr 24 17:40:04 2006 Subject: [Numpy-discussion] Re: Backporting numpy to Python 2.2 In-Reply-To: <20060419103554.4ac1df4a.twegener@radlogic.com.au> References: <20060419103554.4ac1df4a.twegener@radlogic.com.au> Message-ID: Tim Wegener wrote: > Hi, > > I am attempting to backport numpy-0.9.6 to be compatible with python 2.2. (Some of our machines run python 2.2 as part of Red Hat 9 and Red Hat 7.3 and it is hazardous to alter the standard setup.) I was able to change most of the 2.3-isms to be 2.2 compatible (see the attached patch). However I had problems compiling the following c module: I targeted Python 2.3 because it added some very nice constructs (Python 2.4 added even more but I disciplined myself not to use them). I think it is not impossible to back-port it to Python 2.2 but I agree with Robert that I wonder if it is worth the effort. In this case Python 2.3 added the bool type which is used in NumPy. 
Basically this type would have to be constructed (the code could be grabbed from Python 2.3) in order to be used. The addition of the boolean type is probably the single biggest change that would make back-porting to 2.2 difficult. There may be others as well, but they are probably easier to work around...

-Travis

From robert.kern at gmail.com Mon Apr 24 18:00:01 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Mon Apr 24 18:00:01 2006
Subject: [Numpy-discussion] Changing the Trac authentication, for real this time!
Message-ID: <444D7458.3020402@gmail.com>

If you encounter errors accessing the Trac sites for NumPy and SciPy over the next hour or so, please wait until I have announced that I have finished. If things are still broken after that, please let me know and I will try to fix it immediately. The details of the changes were posted to the previous thread "Changing the Trac authentication". Apologies for any disruption and for the noise.

-- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From ndarray at mac.com Mon Apr 24 18:26:07 2006
From: ndarray at mac.com (Sasha)
Date: Mon Apr 24 18:26:07 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
In-Reply-To: <444D4329.9050700@ieee.org>
References: <444D0DF7.2060307@ieee.org> <444D4329.9050700@ieee.org>
Message-ID:

On 4/24/06, Travis Oliphant wrote:
> Sasha wrote:
> >>>> x[:]=1,1
> >>>> x[:]=1,1,1
> >>>>
> > Traceback (most recent call last):
> >   File "", line 1, in ?
> > ValueError: number of elements in destination must be integer multiple
> > of number of elements in source
> >
> I think the only reasonable thing to do is to raise an error unless the
> shapes were compatible like Numeric did and eliminate the multiple
> copying feature.

I've attached a patch to the ticket:

I don't see why slice assignment cannot reuse the ufunc code. It looks like slice assignment can just be dispatched to a trivial (pass-through) ufunc. This approach may even prove to be faster because type-aware copying loops can be faster than memmove on popular platforms.

From robert.kern at gmail.com Mon Apr 24 19:39:02 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Mon Apr 24 19:39:02 2006
Subject: [Numpy-discussion] Re: Changing the Trac authentication, for real this time!
In-Reply-To: <444D7458.3020402@gmail.com>
References: <444D7458.3020402@gmail.com>
Message-ID: <444D8BA2.1080407@gmail.com>

I hate computers. It's still not done.

-- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From stephen.walton at csun.edu Mon Apr 24 20:49:03 2006
From: stephen.walton at csun.edu (Stephen Walton)
Date: Mon Apr 24 20:49:03 2006
Subject: [Numpy-discussion] Re: Problems building numpy
In-Reply-To: References: <02801766-7F45-48EE-AD4A-7B4B0590C9AC@sandia.gov>
Message-ID: <444D9C0C.3030006@csun.edu>

Robert Kern wrote:

>Rick Muller wrote:
>
>>That being said, I just had my first system where Numpy failed to
>>build. It's on a redhat 7.3 (yes, we have a 7.3 box. I didn't believe
>>it either. not my decision.) and I get the following error when trying
>>to run Numpy:
>>
>This usually means that you are not linking in the g2c library.
> > On Redhat 7.3, I don't believe there was a g2c library, but an f2c one. So -lf2c is needed at the link step (and f2c needs to be installed). From robert.kern at gmail.com Mon Apr 24 20:54:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon Apr 24 20:54:02 2006 Subject: [Numpy-discussion] Re: Problems building numpy In-Reply-To: <444D9C0C.3030006@csun.edu> References: <02801766-7F45-48EE-AD4A-7B4B0590C9AC@sandia.gov> <444D9C0C.3030006@csun.edu> Message-ID: Stephen Walton wrote: > Robert Kern wrote: > >> Rick Muller wrote: >> >>> That being said, I just had my first system where Numpy failed to >>> build. It's on a redhat 7.3 (yes, we have a 7.3 box. I didn't believe >>> it either. not my decision.) and I get the following error when trying >>> to run Numpy: >> >> This usually means that you are not linking in the g2c library. >> > On Redhat 7.3, I don't believe there was a g2c library, but an f2c one. > So -lf2c is needed at the link step (and f2c needs to be installed). Well, there's libf2c which is a library provided by f2c, a program that converts FORTRAN to C. And then there's libg2c which is provided by g77. They really are different and, I don't think, interchangeable. Note that libg2c will be stuck several ellipses down in the bowels of /usr/lib/gcc/.../.../libg2c.a not up in /usr/lib/. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From stephen.walton at csun.edu Mon Apr 24 21:09:01 2006 From: stephen.walton at csun.edu (Stephen Walton) Date: Mon Apr 24 21:09:01 2006 Subject: [Numpy-discussion] Re: Problems building numpy In-Reply-To: References: <02801766-7F45-48EE-AD4A-7B4B0590C9AC@sandia.gov> <444D9C0C.3030006@csun.edu> Message-ID: <444DA0A5.80902@csun.edu> Robert Kern wrote: >Well, there's libf2c which is a library provided by f2c, a program that converts >FORTRAN to C. And then there's libg2c which is provided by g77. They really are >different > Oh, I knew that. My point was that there were some old Redhat releases (I don't recall if 7.3 is that old, probably not) which didn't include g77, just an f77 shell script which called f2c and cc. From robert.kern at gmail.com Mon Apr 24 21:14:01 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon Apr 24 21:14:01 2006 Subject: [Numpy-discussion] Re: Problems building numpy In-Reply-To: <444DA0A5.80902@csun.edu> References: <02801766-7F45-48EE-AD4A-7B4B0590C9AC@sandia.gov> <444D9C0C.3030006@csun.edu> <444DA0A5.80902@csun.edu> Message-ID: Stephen Walton wrote: > Robert Kern wrote: > >> Well, there's libf2c which is a library provided by f2c, a program >> that converts >> FORTRAN to C. And then there's libg2c which is provided by g77. They >> really are >> different > > Oh, I knew that. My point was that there were some old Redhat releases > (I don't recall if 7.3 is that old, probably not) which didn't include > g77, just an f77 shell script which called f2c and cc. Oy. I'm not sure if even we support that. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco

From rob at hooft.net Mon Apr 24 21:25:01 2006
From: rob at hooft.net (Rob Hooft)
Date: Mon Apr 24 21:25:01 2006
Subject: [Numpy-discussion] Re: Problems building numpy
In-Reply-To: <444DA0A5.80902@csun.edu>
References: <02801766-7F45-48EE-AD4A-7B4B0590C9AC@sandia.gov> <444D9C0C.3030006@csun.edu> <444DA0A5.80902@csun.edu>
Message-ID: <444DA473.2010000@hooft.net>

Stephen Walton wrote:
| Robert Kern wrote:
|
|> Well, there's libf2c which is a library provided by f2c, a program
|> that converts
|> FORTRAN to C. And then there's libg2c which is provided by g77. They
|> really are
|> different
|
| Oh, I knew that. My point was that there were some old Redhat releases
| (I don't recall if 7.3 is that old, probably not) which didn't include
| g77, just an f77 shell script which called f2c and cc.

And in addition, very old versions of g77 (I'm not sure to which RedHat version this age corresponds) used f2c's library unmodified. I think the f2c/cc times (the compiler script was called fcomp?) were a bit older. I moved back to my current job with RedHat 4.x (1997), and I worked with self-compiled g77 already in my previous job....

Rob
--
Rob W.W. Hooft || rob at hooft.net || http://www.hooft.net/people/rob/

From oliphant.travis at ieee.org Mon Apr 24 21:31:02 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Mon Apr 24 21:31:02 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
In-Reply-To:
References: <444D0DF7.2060307@ieee.org> <444D4329.9050700@ieee.org>
Message-ID: <444DA5D4.4080104@ieee.org>

Sasha wrote:
> On 4/24/06, Travis Oliphant wrote:
>
>> Sasha wrote:
>>
>>>>>> x[:]=1,1
>>>>>> x[:]=1,1,1
>>>>>>
>>> Traceback (most recent call last):
>>> File "", line 1, in ?
>>> ValueError: number of elements in destination must be integer multiple
>>> of number of elements in source
>>>
>> I think the only reasonable thing to do is to raise an error unless the
>> shapes were compatible like Numeric did and eliminate the multiple
>> copying feature.
>>
>
> I've attached a patch to the ticket:
>
> I don't see why slice assignment cannot reuse the ufunc code. It
> looks like slice assignment can just be dispatched to a trivial
> (pass-through) ufunc. This approach may even prove to be faster
> because type-aware copying loops can be faster than memmove on popular
> platforms.
>
It could re-use that code but there are at least two drawbacks to that approach:

1) The overhead of the ufunc for small array copies.

2) The special-casing that would be needed for variable-size arrays (string, unicode, void...) which are not supported by the ufunc machinery, and we've already improved the copying by making them type-aware.

Right now copying is handled by the data-type functions (not the ufuncs). Perhaps what should be done instead is to allow for strided copying in the copyswapn function. To fully support record arrays with object components the copy operation for the VOID case needs to be recursive when fields are defined.

-Travis

From oliphant.travis at ieee.org Mon Apr 24 22:00:02 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Mon Apr 24 22:00:02 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
In-Reply-To:
References: <444D0DF7.2060307@ieee.org> <444D4329.9050700@ieee.org>
Message-ID: <444DACB8.50203@ieee.org>

Sasha wrote:
> On 4/24/06, Travis Oliphant wrote:
>
> I've attached a patch to the ticket:
>
I don't think the patch will do your definition of "the right thing" (i.e. mirror broadcasting behavior) in all cases. For example if "a" is 2x3x4x5 and "b" is 2x1x1x5, then a[...] = b will not fill the right sub-space of "a" with the contents of "b".

The PyArray_CopyInto gets called in a lot of places. Have you checked all of them to be sure that altering the semantics of copying (which are currently different than broadcasting) will work correctly? I agree that one can demonstrate a slight inconsistency. But, I'd rather have the inconsistency and tell people that copying and assignment is not a broadcasting ufunc, than feign consistency and have it not quite right.

-Travis

From robert.kern at gmail.com Mon Apr 24 22:22:03 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Mon Apr 24 22:22:03 2006
Subject: [Numpy-discussion] Re: [SciPy-dev] Google Summer of Code
In-Reply-To: <44476AEA.7080003@decsai.ugr.es>
References: <44476AEA.7080003@decsai.ugr.es>
Message-ID: <444DB033.4000906@gmail.com>

[Cross-posted because this is partially an announcement. Continuing discussion should go to only one list, please.]

Antonio Arauzo Azofra wrote:
> Google Summer of Code
> http://code.google.com/soc/
>
> Have you considered participating as a Mentoring organization? Offering
> any project about Scipy?

I'm not sure which "you" you are referring to here, but yes! Unfortunately, it was a bit late in the process to be applying as a mentoring organization. Google started consolidating mentoring organizations. However, I and several others at Enthought are volunteering to mentor through the PSF. I encourage others on these lists to do the same or to apply as students, whichever is appropriate. We'll be happy to provide SVN workspace for numpy and scipy SoC projects.

I've added one fairly general scipy entry to the python.org Wiki page listing project ideas:

http://wiki.python.org/moin/SummerOfCode

If you have more specific ideas, please add them to the Wiki.

Potential mentors: Neal Norwitz is coordinating PSF mentors this year and has asked that those he or Guido does not know personally give personal references. If you've been active on this list, I'm sure we can play the "Two Degrees of Separation From Guido Game" and get you a reference from someone else here.

--
Robert Kern
robert.kern at gmail.com
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco

From oliphant.travis at ieee.org Mon Apr 24 22:27:02 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Mon Apr 24 22:27:02 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
In-Reply-To: <444DACB8.50203@ieee.org>
References: <444D0DF7.2060307@ieee.org> <444D4329.9050700@ieee.org> <444DACB8.50203@ieee.org>
Message-ID: <444DB302.30903@ieee.org>

Travis Oliphant wrote:
> Sasha wrote:
>> On 4/24/06, Travis Oliphant wrote:
>> I've attached a patch to the ticket:
>>
> I don't think the patch will do your definition of "the right thing"
> (i.e. mirror broadcasting behavior) in all cases. For example if "a"
> is 2x3x4x5 and "b" is 2x1x1x5, then a[...] = b will not fill the
> right sub-space of "a" with the contents of "b".
>
> The PyArray_CopyInto gets called in a lot of places. Have you checked
> all of them to be sure that altering the semantics of copying (which
> are currently different than broadcasting) will work correctly? I
> agree that one can demonstrate a slight inconsistency. But, I'd
> rather have the inconsistency and tell people that copying and
> assignment is not a broadcasting ufunc, than feign consistency and
> have it not quite right.
>
Of course, as I've said I'm not opposed to the consistency. To do it "right", one should use PyArray_MultiIterNew which abstracts the concept of broadcasting into iterators (and uses the broadcastable checking code that's already written --- so you guarantee consistency). I'm not sure what overhead it would bring. But, special cases could be checked-for (scalar, and same-size arrays for example). I'm also thinking that copyswapn should grow stride arguments so that it can be used more generally.

-Travis

From lroubeyrie at limair.asso.fr Tue Apr 25 00:39:04 2006
From: lroubeyrie at limair.asso.fr (Lionel Roubeyrie)
Date: Tue Apr 25 00:39:04 2006
Subject: [Numpy-discussion] equality with masked object
Message-ID: <200604250938.48648.lroubeyrie@limair.asso.fr>

Hi all,
I have a problem with masked_object (and masked_values too), like in this short example:
###########################################
lionel[Données]8>test=array([1,2,3,inf,5])
lionel[Données]9>test = ma.masked_object(test, inf)
lionel[Données]10>print test[3], type(test[3])
--
lionel[Données]11>print test.max(), type(test.max())
5.0
lionel[Données]12>test[3] == test.max()
Sortie[12]: array(data = [True], mask = True, fill_value=?)
###########################################
Why does 5.0 == -- return True? Is a float the same as a masked object?
thanks

--
Lionel Roubeyrie - lroubeyrie at limair.asso.fr
LIMAIR
http://www.limair.asso.fr

From nicolas.chauvat at logilab.fr Tue Apr 25 03:22:15 2006
From: nicolas.chauvat at logilab.fr (Nicolas Chauvat)
Date: Tue Apr 25 03:22:15 2006
Subject: [Numpy-discussion] announce: pyjit, a little jit for creating numpy ufuncs
In-Reply-To:
References: <20060421162336.42285837.simon@arrowtheory.com>
Message-ID: <20060425102134.GI24645@crater.logilab.fr>

On Mon, Apr 24, 2006 at 04:17:16PM -0400, David M. Cooke wrote:
> Simon Burton writes:
>
> > Hi,
> >
> > Inspired by numexpr, pypy and llvm, i've built a simple
> > JIT for creating numpy "ufuncs" (they are not yet real ufuncs).
> > It uses llvm[1] as the backend machine code generator.
>
> Cool! I had a look at LLVM, but I wanted something to go into SciPy,
> and that was too heavy a dependence. However, I could see doing more
> stuff with this than I can easily with numexpr.

Hello,

People interested in this might also be interested in PyPy's rctypes and the exploratory work done in PyPy to annotate code using arrays. The goal is "write Python code using numeric arrays and other C libs, then ask PyPy to translate it to C while removing the python wrapper of the C libs, then compile". Then you can run the code as python code when developing and compile the whole thing from C to assembly when speed matters.

Please note it is a goal. We are not there yet.
But any help will be welcome :)

--
Nicolas Chauvat

logilab.fr - services en informatique avancée et gestion de connaissances

From steffen.loeck at gmx.de Tue Apr 25 04:25:22 2006
From: steffen.loeck at gmx.de (Steffen Loeck)
Date: Tue Apr 25 04:25:22 2006
Subject: [Numpy-discussion] vectorize problem
Message-ID: <200604251324.42987.steffen.loeck@gmx.de>

Hello all,

I have a problem using scalar variables in a vectorized function:

from numpy import vectorize

def f(x):
    if x>0: return 1
    else: return 0

F = vectorize(f)

F(1)

gives the error message:
---------------------------------------------------------------------------
exceptions.AttributeError Traceback (most recent call last)

.../function_base.py in __call__(self, *args)
    619
    620         if self.nout == 1:
--> 621             return self.ufunc(*args).astype(self.otypes[0])
    622         else:
    623             return tuple([x.astype(c) for x, c in
zip(self.ufunc(*args), self.otypes)])

AttributeError: 'int' object has no attribute 'astype'

Is there any way to get vectorized functions working with scalars again?

Regards
Steffen

From ndarray at mac.com Tue Apr 25 06:17:13 2006
From: ndarray at mac.com (Sasha)
Date: Tue Apr 25 06:17:13 2006
Subject: [Numpy-discussion] equality with masked object
In-Reply-To: <200604250938.48648.lroubeyrie@limair.asso.fr>
References: <200604250938.48648.lroubeyrie@limair.asso.fr>
Message-ID:

On 4/25/06, Lionel Roubeyrie wrote:
>
> Why does 5.0 == -- return True? Is a float the same as a masked object?
> thanks

It does not. It returns ma.masked:

>>> test[3] is ma.masked
True

You should not access masked data - it makes no sense. The current behavior is historical and I don't really like it. Masked scalars are replaced by the ma.masked singleton in subscript operations to allow the a[i] is masked idiom. In my view it is not worth the trouble, but my suggestion to get rid of that feature was not met with much enthusiasm.

From ndarray at mac.com Tue Apr 25 06:59:07 2006
From: ndarray at mac.com (Sasha)
Date: Tue Apr 25 06:59:07 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
In-Reply-To: <444DACB8.50203@ieee.org>
References: <444D0DF7.2060307@ieee.org> <444D4329.9050700@ieee.org> <444DACB8.50203@ieee.org>
Message-ID:

On 4/25/06, Travis Oliphant wrote:
> Sasha wrote:
> > On 4/24/06, Travis Oliphant wrote:
> >
> > I've attached a patch to the ticket:
> >
> I don't think the patch will do your definition of "the right thing"
> (i.e. mirror broadcasting behavior) in all cases. For example if "a" is
> 2x3x4x5 and "b" is 2x1x1x5, then a[...] = b will not fill the right
> sub-space of "a" with the contents of "b".
>

You are right, but it is not the fault of my code. My code checks shapes correctly, but the code that follows does not implement broadcasting. I did not realize that. This also explains why we disagreed on whether slice assignment is the same as broadcasting before.

> The PyArray_CopyInto gets called in a lot of places. Have you checked
> all of them to be sure that altering the semantics of copying (which are
> currently different than broadcasting) will work correctly? I agree
> that one can demonstrate a slight inconsistency. But, I'd rather have
> the inconsistency and tell people that copying and assignment is not a
> broadcasting ufunc, than feign consistency and have it not quite right.
>

That's why I would rather use an identity ufunc for slice assignment instead of PyArray_CopyInto.
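A minimal sketch of the semantics in question (the 2x3x4x5 and 2x1x1x5 shapes reuse Travis's example above; the values are invented): a ufunc operation stretches the size-1 axes of "b" across the matching sub-spaces of "a", which is the behaviour slice assignment would share if it were dispatched to a pass-through ufunc:

>>> import numpy
>>> a = numpy.zeros((2, 3, 4, 5))
>>> b = numpy.ones((2, 1, 1, 5))
>>> a += b      # ufunc broadcasting: b is repeated along axes 1 and 2
>>> a.sum()     # all 2*3*4*5 = 120 elements were filled
120.0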
From perry at stsci.edu Tue Apr 25 08:21:02 2006
From: perry at stsci.edu (Perry Greenfield)
Date: Tue Apr 25 08:21:02 2006
Subject: [Numpy-discussion] Re: Backporting numpy to Python 2.2
In-Reply-To:
References: <20060419103554.4ac1df4a.twegener@radlogic.com.au>
Message-ID: <93BC9AD0-A6CA-4128-B0EE-9999F4CE8077@stsci.edu>

On Apr 24, 2006, at 8:38 PM, Travis E. Oliphant wrote:

> Tim Wegener wrote:
>> Hi, I am attempting to backport numpy-0.9.6 to be compatible with
>> python 2.2. (Some of our machines run python 2.2 as part of Red
>> Hat 9 and Red Hat 7.3 and it is hazardous to alter the standard
>> setup.) I was able to change most of the 2.3-isms to be 2.2
>> compatible (see the attached patch). However I had problems
>> compiling the following c module:
>
> I targeted Python 2.3 because it added some very nice constructs
> (Python 2.4 added even more but I disciplined myself not to use them).
>
> I think it is not impossible to back-port it to Python 2.2 but I
> agree with Robert that I wonder if it is worth the effort.
>
> In this case Python 2.3 added the bool type which is used in NumPy.
> Basically this type would have to be constructed (the code could be
> grabbed from Python 2.3) in order to be used.
>
> The addition of the boolean type is probably the single biggest
> change that would make back-porting to 2.2 difficult.

If I recall correctly, True and False were added in one of the 2.2 patch releases (one of those rare new features added in a patch release). Only as constant definitions using 0 and 1, and not the current boolean implementation. So depending on what the current dependencies on booleans are, it may or may not be usable from 2.2.3. But I also wonder if it is worth the effort. I tend to think not.

Perry

From ndarray at mac.com Tue Apr 25 10:27:10 2006
From: ndarray at mac.com (Sasha)
Date: Tue Apr 25 10:27:10 2006
Subject: [Numpy-discussion] Question about __array_struct__
Message-ID:

I am trying to add __array_struct__ attribute to R object wrappers in RPy. This is desirable because it eliminates a compile-time dependency on an array module and makes the binary compatible with either Numeric or numpy.

R has four types of data: logical, integer, float, and character. The first three map perfectly to Numpy with inter->data simply pointing to an appropriate internal memory area. The character type, however, is more problematic. In R character arrays are arrays of variable length strings and therefore similar to Numpy object arrays holding python strings. Obviously, there is no memory area that can be reused. I've tried to allocate new memory in the __array_struct__ getter, but this presents a problem: I cannot deallocate that memory in the CObject destructor because it is passed to the newly created array which lives long after the interface object is deleted.
The __array_struct__ mechanism does not seem to allow making the new array assume ownership of the data, but even if it did, I do not know what memory allocator is appropriate.

The only solution that I can think of is to create a dummy buffer type with the sole purpose of deleting an array of PyObjects and make an instance of that type the "base" of the new array.

Can anyone suggest a better approach?

From strawman at astraw.com Tue Apr 25 10:52:08 2006
From: strawman at astraw.com (Andrew Straw)
Date: Tue Apr 25 10:52:08 2006
Subject: [Numpy-discussion] Question about __array_struct__
In-Reply-To:
References:
Message-ID: <444E619C.6030802@astraw.com>

Sasha wrote:

>I cannot deallocate that memory in CObject destructor because it is
>passed to the newly created array which lives long after the interface
>object is deleted.
>
Normally, the array that's viewing the data held by the __array_struct__ should keep a reference to the base object alive, thus preventing the issue. If the base object isn't a Python object, you'll have to create some kind of Python type that will ensure the original data is not freed, although this would normally take place via refcounts if the data source was a Python object.

> The __array_struct__ mechanism does not seem to
>allow making the new array assume ownership of the data, but even if
>it did, I do not know what memory allocator is appropriate.
>
>The only solution that I can think of is to create a dummy buffer type
>with the sole purpose of deleting an array of PyObjects and make an
>instance of that type the "base" of the new array.
>
Yes, that's what I do. (See http://www.scipy.org/Cookbook/ArrayStruct_and_Pyrex for example.)

From fullung at gmail.com Tue Apr 25 14:16:06 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Tue Apr 25 14:16:06 2006
Subject: [Numpy-discussion] SWIG wrappers: Inplace arrays
Message-ID: <006b01c668ad$68b12ab0$0502010a@dsp.sun.ac.za>

Hello all

I am using the SWIG Numpy typemaps to wrap some C code. I ran into the following problem when wrapping a function with INPLACE_ARRAY1.

In Python, I create the following array:

x = array([], dtype='<i4')

When this is passed to the C function expecting an int*, it goes via obj_to_array_no_conversion in numpy.i where a direct comparison of the typecodes is done, at which point a TypeError is raised.

In this case:

desired type = int [typecode 5]
actual type = long [typecode 7]

The typecode is obtained as follows:

#define array_type(a) (int)(((PyArrayObject *)a)->descr->type_num)

Given that I created the array with '<i4', I expected type_num to be int instead of long. Why isn't this happening?

Assuming there is a good reason for type_num being what it is, I think obj_to_array_no_conversion needs to be slightly cleverer about the conversions it allows. Is there any way to figure out that int and long are actually identical (at least on my system) using the Numpy C API? Any other suggestions or comments for solving this problem?

From tim.hochberg at cox.net Tue Apr 25 14:24:05 2006
From: tim.hochberg at cox.net (tim.hochberg at cox.net)
Date: Tue Apr 25 14:24:05 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
Message-ID: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net>

---- Travis Oliphant wrote:
> Sasha wrote:
> > In this category, I would suggest to allow broadcasting to any
> > multiple of the dimension even if the dimension is not 1. I don't see
> > what makes 1 so special.
> >
> What's so special about 1 is that the code for it is relatively
> straightforward and already implemented using strides. Altering the
> code to allow any multiple of the dimension would be harder and slower.

It also does the right thing most of the time and is easy to understand. It's my expectation that opening up broadcasting will be more effective in masking errors than in enabling useful new behaviour.

I think that's my ticket being discussed here. If so, it was motivated by a case that stopped working because the looser broadcasting behaviour was preventing some other broadcasting from taking place. I'm not home right now, so I can't provide details; I'll do that on Thursday.

Just keep in mind that it's much easier to keep the broadcasting rules restrictive for now and loosen them up later than to try to tighten them up later if loosening them up turns out to not be a good idea.

-tim

From oliphant at ee.byu.edu Tue Apr 25 15:55:04 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue Apr 25 15:55:04 2006
Subject: [Numpy-discussion] SWIG wrappers: Inplace arrays
In-Reply-To: <006b01c668ad$68b12ab0$0502010a@dsp.sun.ac.za>
References: <006b01c668ad$68b12ab0$0502010a@dsp.sun.ac.za>
Message-ID: <444EA88B.4050704@ee.byu.edu>

Albert Strasheim wrote:

>Hello all
>
>I am using the SWIG Numpy typemaps to wrap some C code. I ran into the
>following problem when wrapping a function with INPLACE_ARRAY1.
>
>In Python, I create the following array:
>
>x = array([], dtype='<i4')
>
>When this is passed to the C function expecting an int*, it goes via
>obj_to_array_no_conversion in numpy.i where a direct comparison of the
>typecodes is done, at which point a TypeError is raised.
>
>In this case:
>
>desired type = int [typecode 5]
>actual type = long [typecode 7]
>
>The typecode is obtained as follows:
>
>#define array_type(a) (int)(((PyArrayObject *)a)->descr->type_num)
>
>Given that I created the array with '<i4', I expected type_num to be
>int instead of long. Why isn't this happening?
>
Actually there is ambiguity: 'i4' can be either int or long. If you want to guarantee an int type then use dtype=intc.

>Assuming there is a good reason for type_num being what it is, I think
>obj_to_array_no_conversion needs to be slightly cleverer about the
>conversions it allows. Is there any way to figure out that int and long are
>actually identical (at least on my system) using the Numpy C API? Any other
>suggestions or comments for solving this problem?
>
Yes. You can use one of

PyArray_EquivTypes(PyArray_Descr *dtype1, PyArray_Descr *dtype2)
PyArray_EquivTypenums(int typenum1, int typenum2)
PyArray_EquivArrTypes(PyObject *array1, PyObject *array2)

These return TRUE (non-zero) if the two type representations are equivalent.

-Travis

From oliphant at ee.byu.edu Tue Apr 25 16:07:05 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Tue Apr 25 16:07:05 2006
Subject: [Numpy-discussion] SWIG wrappers: Inplace arrays
In-Reply-To: <006b01c668ad$68b12ab0$0502010a@dsp.sun.ac.za>
References: <006b01c668ad$68b12ab0$0502010a@dsp.sun.ac.za>
Message-ID: <444EAB81.3070001@ee.byu.edu>

Albert Strasheim wrote:

>Hello all
>
>I am using the SWIG Numpy typemaps to wrap some C code. I ran into the
>following problem when wrapping a function with INPLACE_ARRAY1.
>
>In Python, I create the following array:
>
>x = array([], dtype='<i4')
>
>When this is passed to the C function expecting an int*, it goes via
>obj_to_array_no_conversion in numpy.i where a direct comparison of the
>typecodes is done, at which point a TypeError is raised.
>
>In this case:
>
>desired type = int [typecode 5]
>actual type = long [typecode 7]
>
>The typecode is obtained as follows:
>
>#define array_type(a) (int)(((PyArrayObject *)a)->descr->type_num)
>
>Given that I created the array with '<i4', I expected type_num to be
>int instead of long. Why isn't this happening?
>
>Assuming there is a good reason for type_num being what it is, I think
>obj_to_array_no_conversion needs to be slightly cleverer about the
>conversions it allows. Is there any way to figure out that int and long are
>actually identical (at least on my system) using the Numpy C API? Any other
>suggestions or comments for solving this problem?
>
Here is the relevant new numpy.i code (just checked in...)

PyArrayObject* obj_to_array_no_conversion(PyObject* input, int typecode) {
    PyArrayObject* ary = NULL;
    if (is_array(input) &&
        (typecode == PyArray_NOTYPE ||
         PyArray_EquivTypenums(array_type(input), typecode))) {
        ary = (PyArrayObject*) input;
    }
    else if (is_array(input)) {
        char* desired_type = typecode_string(typecode);
        char* actual_type = typecode_string(array_type(input));
        PyErr_Format(PyExc_TypeError,
                     "Array of type '%s' required. Array of type '%s' given",
                     desired_type, actual_type);
        ary = NULL;
    }
    else {
        char* desired_type = typecode_string(typecode);
        char* actual_type = pytype_string(input);
        PyErr_Format(PyExc_TypeError,
                     "Array of type '%s' required. A %s was given",
                     desired_type, actual_type);
        ary = NULL;
    }
    return ary;
}

From ndarray at mac.com Tue Apr 25 18:17:04 2006
From: ndarray at mac.com (Sasha)
Date: Tue Apr 25 18:17:04 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
In-Reply-To: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net>
References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net>
Message-ID:

On 4/25/06, tim.hochberg at cox.net wrote:
>
> ---- Travis Oliphant wrote:
> > Sasha wrote:
> > > In this category, I would suggest to allow broadcasting to any
> > > multiple of the dimension even if the dimension is not 1. I don't see
> > > what makes 1 so special.
> > >
> > What's so special about 1 is that the code for it is relatively
> > straightforward and already implemented using strides. Altering the
> > code to allow any multiple of the dimension would be harder and slower.

I don't think so. The same zero-stride trick that allows size-1 broadcasting can be used to implement repetition. I did not review the C code, but the following Python fragment shows that the loop that is already in numpy can be used to implement repetition by simply manipulating shapes and strides:

>>> x = zeros(6)
>>> reshape(x,(3,2))[...] = 1,2
>>> x
array([1, 2, 1, 2, 1, 2])

> It also does the right thing most of the time and is easy to understand.

Easy to understand? Let me quote Travis' book on this: "Broadcasting can be understood by four rules: ... While perhaps somewhat difficult to explain, broadcasting can be quite useful and becomes second nature rather quickly." I may be slow, but it did not become second nature for me. I am still getting bitten by subtle differences between unit length 1-d arrays and 0-d arrays.

> It's my expectation that opening up broadcasting will be more effective in masking
> errors than in enabling useful new behaviour.
>
In my experience broadcasting length-1 and not broadcasting other lengths is very error prone as it is. I understand that restricting broadcasting to make it a strictly dimension-increasing operation is not possible for two reasons:

1. Numpy cannot break legacy Numeric code.
2. It is not possible to differentiate between a 1-d array that broadcasts column-wise vs. one that broadcasts row-wise.

In my view none of these reasons is valid. In my experience Numeric code that relies on dimension-preserving broadcasting is already broken, only in a subtle and hard to reproduce way. Similarly the need to broadcast over a non-leading dimension is a sign of bad design. In rare cases where such broadcasting is desirable, it can be easily done via swapaxes, which is a cheap operation. Nevertheless, I lost that battle some time ago.

On the other hand I don't see much problem in making dimension-preserving broadcasting more permissive. In R, for example, (1-d) arrays can be broadcast to arbitrary size. This has an additional benefit that 1-d to 2-d broadcasting requires no special code, it just happens because matrices inherit arithmetic from vectors. I've never had a problem with R rules being too loose.

> I think that's my ticket being discussed here. If so, it was motivated by a case that
> stopped working because the looser broadcasting behaviour was preventing some
> other broadcasting from taking place. I'm not home right now, so I can't provide
> details; I'll do that on Thursday.

In my view the problem that your ticket highlighted is not so much in the particular set of broadcasting rules, but in the fact that a[...] = b uses one set of rules while a[...] += b uses another. This is *very* confusing.

> Just keep in mind that it's much easier to keep the broadcasting rules restrictive for
> now and loosen them up later than to try to tighten them up later if loosening them up
> turns out to not be a good idea.

You are preaching to the choir!

From simon at arrowtheory.com Tue Apr 25 18:29:01 2006
From: simon at arrowtheory.com (Simon Burton)
Date: Tue Apr 25 18:29:01 2006
Subject: [Numpy-discussion] announce: pyjit, a little jit for creating numpy ufuncs
In-Reply-To:
References: <20060421162336.42285837.simon@arrowtheory.com>
Message-ID: <20060426112808.531d652b.simon@arrowtheory.com>

On Mon, 24 Apr 2006 16:17:16 -0400
cookedm at physics.mcmaster.ca (David M. Cooke) wrote:

>
> How do the speedups compare with numexpr?

numexpr segfaults for me (running timings.py):

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1209670912 (LWP 31768)]
0xb7d2b696 in PyArray_NewFromDescr (subtype=0x626e6769, descr=0x64007469, nd=1919251557, dims=0x656e696d, strides=0x782d2073, data=0x656c6520, flags=1953391981, obj=0x65736977) at arrayobject.c:3942
3942    arrayobject.c: No such file or directory.
        in arrayobject.c

Simon.

--
Simon Burton, B.Sc.
Licensed PO Box 8066
ANU Canberra 2601
Australia
Ph. 61 02 6249 6940
http://arrowtheory.com

From robert.kern at gmail.com Tue Apr 25 20:10:07 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Tue Apr 25 20:10:07 2006
Subject: [Numpy-discussion] Chang*ed* the Trac authentication
Message-ID: <444EE463.10007@gmail.com>

Trying not to embarrass myself again, I made the changes without telling you. :-)

In order to create or modify Wiki pages or tickets on the NumPy and SciPy Tracs, you will have to be logged in. You can register yourself by clicking the "Register" link in the upper right-hand corner of the page.
Developers who previously had accounts have the same username/password as before. You can now change your password if you like. Only developers have the ability to close tickets, delete Wiki pages entirely, or create new ticket reports (and possibly a couple of other things). Developers, please enter your name and email by clicking on the "Settings" link up at top once logged in.

Thank you for your patience. If there are any problems, please email me, and I will try to correct them quickly.

--
Robert Kern
robert.kern at gmail.com
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco

From oliphant.travis at ieee.org Tue Apr 25 22:26:01 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Tue Apr 25 22:26:01 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
In-Reply-To:
References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net>
Message-ID: <444F0420.9000500@ieee.org>

Sasha wrote:
> On 4/25/06, tim.hochberg at cox.net wrote:
>
>> ---- Travis Oliphant wrote:
>>
>>> Sasha wrote:
>>>
>>>> In this category, I would suggest to allow broadcasting to any
>>>> multiple of the dimension even if the dimension is not 1. I don't see
>>>> what makes 1 so special.
>>>>
>>> What's so special about 1 is that the code for it is relatively
>>> straightforward and already implemented using strides. Altering the
>>> code to allow any multiple of the dimension would be harder and slower.
>>>
>
> I don't think so. The same zero-stride trick that allows size-1
> broadcasting can be used to implement repetition. I did not review
> the C code, but the following Python fragment shows that the loop that
> is already in numpy can be used to implement repetition by simply
> manipulating shapes and strides:
>
I don't think anyone is fundamentally opposed to multiple repetitions. We're just being cautious. Also, as you've noted, the assignment code is currently not using the ufunc broadcasting code and so they really aren't the same thing, yet.

>
>> It's my expectation that opening up broadcasting will be more effective in masking
>> errors than in enabling useful new behaviour.
>>
> In my experience broadcasting length-1 and not broadcasting other
> lengths is very error prone as it is.

That's not been my experience. But, I don't know R very well. I'm very interested in what ideas you can bring.

> I understand that restricting
> broadcasting to make it a strictly dimension-increasing operation is
> not possible for two reasons:
>
> 1. Numpy cannot break legacy Numeric code.
> 2. It is not possible to differentiate between 1-d array that
> broadcasts column-wise vs. one that broadcasts row-wise.
>
> In my view none of these reasons is valid. In my experience Numeric
> code that relies on dimension-preserving broadcasting is already
> broken, only in a subtle and hard to reproduce way.

I definitely don't agree with you here. Dimension-preserving broadcasting is at the heart of the utility of broadcasting and it is very, very useful for that. Calling it subtly broken suggests that you don't understand it and have never used it for its intended purpose. I've used dimension-preserving broadcasting literally hundreds of times. It's rather bold of you to say that all of that code is "broken".

Now, I'm sure there are other useful ways to "broadcast", but dimension-preserving is essentially what broadcasting *is* in NumPy.
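To make the dimension-preserving case concrete, a minimal sketch (the data is invented): an array of shape (3, 1) is stretched along its size-1 axis against an array of shape (3, 2), without changing the number of dimensions:

>>> import numpy
>>> x = numpy.arange(6.).reshape(3, 2)
>>> row_means = x.mean(axis=1).reshape(3, 1)   # same ndim as x, size-1 last axis
>>> x - row_means                              # (3, 1) stretched against (3, 2)
array([[-0.5,  0.5],
       [-0.5,  0.5],
       [-0.5,  0.5]])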
If anything it is the dimension-increasing rule that is somewhat arbitrary (e.g. why prepend with ones). Perhaps you want to introduce some other way for non-commensurate shapes to interact in an operation. I think you will find many open minds on this list (although probably not anyone who will want to code it up :-) ). We do welcome the discussion. Your experience with other array-like languages is helpful.

> Similarly the
> need to broadcast over non-leading dimension is a sign of bad design.
> In rare cases where such broadcasting is desirable, it can be easily
> done via swapaxes which is a cheap operation.
>
Again, it would help if you would refrain from using negative words about coding styles that are different from your own. Such broadcasting is not that rare. It happens quite frequently, actually. The point of a language like Python is that you can write algorithms simply without struggling with optimization questions up front like you seem to be hinting at.

> On the other hand I don't see much problem in making
> dimension-preserving broadcasting more permissive. In R, for example,
> (1-d) arrays can be broadcast to arbitrary size. This has an
> additional benefit that 1-d to 2-d broadcasting requires no special
> code, it just happens because matrices inherit arithmetic from
> vectors. I've never had a problem with R rules being too loose.
>
So, please explain exactly what you mean. Only a few on this list know what the R rules even are.

> In my view the problem that your ticket highlighted is not so much in
> the particular set of broadcasting rules, but in the fact that a[...]
> = b uses one set of rules while a[...] += b uses another. This is
> *very* confusing.
>
Yes, this is admittedly confusing. But, it's an outgrowth of the way Numeric code developed. Broadcasting was always only a ufunc concept in Numeric, and copying was not a ufunc. NumPy grew out of Numeric code. I was not trying to mimic broadcasting behavior when I wrote the array copy and array setting code. Perhaps I should have been.

I'm willing to change the code on this one, but only if the new copy code actually does implement broadcasting behavior equivalently. And going through the ufunc machinery is probably a waste of effort because the copy code must be written for variable length arrays anyway (and ufuncs don't support them). However, the broadcasting machinery has been abstracted in NumPy and can therefore be re-used in the copying code. In Numeric, broadcasting was basically implemented deep inside a confusing while loop.

-Travis

From fullung at gmail.com Tue Apr 25 23:42:05 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Tue Apr 25 23:42:05 2006
Subject: [Numpy-discussion] SWIG wrappers: Passing NULL pointers or arrays
Message-ID: <00dd01c668fc$6d04b470$0502010a@dsp.sun.ac.za>

Hello all,

I'm currently wrapping a C library (libsvm) with NumPy.
libsvm has a few structs similar to the following:

struct svm_parameter {
    double* weight;
    int nr_weight;
};

In my SWIG wrapper I did the following:

struct svm_parameter {
    %immutable;
    int nr_weight;
    %mutable;
    double* weight;
    %extend {
        svm_parameter() {
            struct svm_parameter* param =
                (struct svm_parameter*) malloc(sizeof(struct svm_parameter));
            param->nr_weight = 0;
            param->weight = 0;
            return param;
        }
        ~svm_parameter() {
            free(self->weight);
            free(self);
        }
        void _set_weight(double* IN_ARRAY1, int DIM1) {
            free(self->weight);
            self->nr_weight = DIM1;
            self->weight = malloc(sizeof(double) * DIM1);
            if (!self->weight) {
                SWIG_exception(SWIG_MemoryError, "OOM");
            }
            memcpy(self->weight, IN_ARRAY1, sizeof(double) * DIM1);
            return;
        fail:
            self->nr_weight = 0;
            self->weight = 0;
        }
    }
};

This works pretty well (suggestions welcome though). However, one feature that I think is lacking from the current array typemaps is a way of passing NULL to the C function. On the Python side I want to be able to do:

svm_parameter.weight = N.array([1.0,2.0])

or

svm_parameter.weight = None

This heads off to __setattr__ where the following happens:

def __setattr__(self, attr, val):
    if attr in ['weight', 'weight_label']:
        set_func = getattr(self, '_set_%s' % (attr,))
        set_func(val)
    else:
        super(svm_parameter, self).__setattr__(attr, val)

At this point the typemap magic kicks in. However, passing a None doesn't work, because somewhere down the line somebody checks for the int argument. The current typemap looks like this:

%define TYPEMAP_IN1(type,typecode)
%typemap(in) (type* IN_ARRAY1, int DIM1)
             (PyArrayObject* array=NULL, int is_new_object) {
    int size[1] = {-1};
    array = obj_to_array_contiguous_allow_conversion($input, typecode, &is_new_object);
    if (!array || !require_dimensions(array,1) || !require_size(array,size,1))
        SWIG_fail;
    $1 = (type*) array->data;
    $2 = array->dimensions[0];
}
%typemap(freearg) (type* IN_ARRAY1, int DIM1) {
    if (is_new_object$argnum && array$argnum)
        Py_DECREF(array$argnum);
}
%enddef

I quickly hacked up the following typemap that seems to deal gracefully with a None passed instead of an array. Changed lines:

if ($input == Py_None) {
    is_new_object = 0;
    $1 = NULL;
    $2 = 0;
}
else {
    int size[1] = {-1};
    array = obj_to_array_contiguous_allow_conversion($input, typecode, &is_new_object);
    if (!array || !require_dimensions(array,1) || !require_size(array,size,1))
        SWIG_fail;
    $1 = (type*) array->data;
    $2 = array->dimensions[0];
}

Now I can write my set_weight function as follows:

void _set_weight(double* IN_ARRAY1, int DIM1) {
    free(self->weight);
    self->weight = 0;
    self->nr_weight = DIM1;
    if (DIM1 > 0) {
        self->weight = malloc(sizeof(double) * DIM1);
        if (!self->weight) {
            SWIG_exception(SWIG_MemoryError, "OOM");
        }
        memcpy(self->weight, IN_ARRAY1, sizeof(double) * DIM1);
    }
    return;
fail:
    self->nr_weight = 0;
}

Does it make sense to add this to the typemaps? Any other comments? Are there better ways to accomplish this?
Regards, Albert

From arnd.baecker at web.de Wed Apr 26 00:52:01 2006
From: arnd.baecker at web.de (Arnd Baecker)
Date: Wed Apr 26 00:52:01 2006
Subject: [Numpy-discussion] vectorize problem
In-Reply-To: <200604251324.42987.steffen.loeck@gmx.de>
References: <200604251324.42987.steffen.loeck@gmx.de>
Message-ID:

Hi,

On Tue, 25 Apr 2006, Steffen Loeck wrote:

> Hello all,
>
> I have a problem using scalar variables in a vectorized function:
>
> from numpy import vectorize
>
> def f(x):
>     if x>0: return 1
>     else: return 0
>
> F = vectorize(f)
>
> F(1)
>
> gives the error message:
> ---------------------------------------------------------------------------
> exceptions.AttributeError Traceback (most recent call last)
>
> .../function_base.py in __call__(self, *args)
>     619
>     620         if self.nout == 1:
> --> 621             return self.ufunc(*args).astype(self.otypes[0])
>     622         else:
>     623             return tuple([x.astype(c) for x, c in
> zip(self.ufunc(*args), self.otypes)])
>
> AttributeError: 'int' object has no attribute 'astype'

Ouch - that's not nice - a lot of my code relies on the fact that (old scipy) vectorize happily eats scalars *and* arrays.

I am not familiar with the code of numpy.vectorize (which has indeed changed quite a bit compared to the old scipy.vectorize), but maybe it is only a simple change?

> Is there any way to get vectorized functions working with scalars again?

+1

(or is there a particular reason why "vectorized" functions should not be able to operate on scalars?)

Best, Arnd

From pgmdevlist at mailcan.com Wed Apr 26 01:06:04 2006
From: pgmdevlist at mailcan.com (Pierre GM)
Date: Wed Apr 26 01:06:04 2006
Subject: [Numpy-discussion] A python interface for loess ?
Message-ID: <200604260329.17115.pgmdevlist@mailcan.com>

Folks,
Would any of you be aware of a Python interface to the loess routines?

http://netlib.bell-labs.com/netlib/a/dloess.gz

I could use the R implementation through Rpy, but I would prefer to stick to Python...
Thanks a lot in advance
P.

From arnd.baecker at web.de Wed Apr 26 02:39:05 2006
From: arnd.baecker at web.de (Arnd Baecker)
Date: Wed Apr 26 02:39:05 2006
Subject: [Numpy-discussion] concatenate, doc-string
Message-ID:

Hi,

the doc-string of concatenate is pretty short:

numpy.concatenate?
Docstring:
    concatenate((a1,a2,...),axis=None).

Would the following be better:
"""
concatenate((a1, a2,...), axis=None) joins the tuple `(a1, a2, ...)` of
sequences (or arrays) into a single numpy array.

Example::

    print concatenate( ([0,1,2], [5,6,7]))
"""

((The ``(or arrays)`` could be omitted if sequences include arrays by default, though it might not be obvious to beginners ...))

I was also tempted to suggest a dtype argument,
concatenate( ([0,1,2], [5,6,7]), dtype=numpy.Float)
but I am not sure if that would be a good idea ...

Best, Arnd

From gnchen at cortechs.net Wed Apr 26 06:52:01 2006
From: gnchen at cortechs.net (Gennan Chen)
Date: Wed Apr 26 06:52:01 2006
Subject: [Numpy-discussion] SWIG for 3D array
Message-ID:

Hi!

I would like to use SWIG to wrap my code. However, it seems the current numpy.i can only map 1- and 2-D arrays, not 3-D. Is that correct? Or am I missing something here? I don't mind spending some time to do it like scipy.ndimage if numpy.i does not support N-D arrays. But I am new to writing extensions for Python, and I really have a hard time understanding how to deal with reference counting issues. Does anyone know where I can find a good reference for that? Or a simple example in numpy would be appreciated....
Gen-Nan Chen, PhD
Chief Scientist
Research and Development Group
CorTechs Labs Inc (www.cortechs.net)
1020 Prospect St., #304, La Jolla, CA, 92037
Tel: 1-858-459-9700 ext 16
Fax: 1-858-459-9705
Email: gnchen at cortechs.net

From oliphant.travis at ieee.org Wed Apr 26 10:05:01 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Wed Apr 26 10:05:01 2006
Subject: [Numpy-discussion] vectorize problem
In-Reply-To:
References: <200604251324.42987.steffen.loeck@gmx.de>
Message-ID: <444FA7E7.2070303@ieee.org>

Arnd Baecker wrote:
> Hi,
>
> On Tue, 25 Apr 2006, Steffen Loeck wrote:
>
>> Hello all,
>>
>> I have a problem using scalar variables in a vectorized function:
>>
>> from numpy import vectorize
>>
>> def f(x):
>>     if x>0: return 1
>>     else: return 0
>>
>> F = vectorize(f)
>>
>> F(1)
>>
>> gives the error message:
>> ---------------------------------------------------------------------------
>> exceptions.AttributeError Traceback (most recent call last)
>>
>> .../function_base.py in __call__(self, *args)
>>     619
>>     620         if self.nout == 1:
>> --> 621             return self.ufunc(*args).astype(self.otypes[0])
>>     622         else:
>>     623             return tuple([x.astype(c) for x, c in
>> zip(self.ufunc(*args), self.otypes)])
>>
>> AttributeError: 'int' object has no attribute 'astype'
>>
>
> Ouch - that's not nice - a lot of my code relies the fact that (old
> scipy) vectorize happily eats scalars *and* arrays.
>
> I am not familiar with the code of numpy.vectorize (which has indeed
> changed quite a bit compared to the old scipy.vectorize),
> but maybe it is only a simple change?
>
It is just a simple change. Scalars are supposed to be supported. They aren't only as a side-effect of the switch to not return object-scalars. I did not update the vectorize code to handle the scalar return value from the object ufunc (which is now no-longer an object-scalar with the methods of arrays (like astype) but is instead the underlying object). I'll add a check.

-Travis

From cookedm at physics.mcmaster.ca Wed Apr 26 12:33:01 2006
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Wed Apr 26 12:33:01 2006
Subject: [Numpy-discussion] Chang*ed* the Trac authentication
In-Reply-To: <444EE463.10007@gmail.com> (Robert Kern's message of "Tue, 25 Apr 2006 22:09:23 -0500")
References: <444EE463.10007@gmail.com>
Message-ID:

Robert Kern writes:

> Trying not to embarass myself again, I made the changes without telling you. :-)
>
> In order to create or modify Wiki pages or tickets on the NumPy and SciPy Tracs,
> you will have to be logged in. You can register yourself by clicking the
> "Register" link in the upper right-hand corner of the page.
>
> Developers who previously had accounts have the same username/password as
> before. You can now change your password if you like. Only developers have the
> ability to close tickets, delete Wiki pages entirely, or create new ticket
> reports (and possibly a couple of other things). Developers, please enter your
> name and email by clicking on the "Settings" link up at top once logged in.
>
> Thank you for your patience.
If there are any problems, please email me, and I > will try to correct them quickly. Thanks Robert; I hope this helps with our spam problem to an extent. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From cookedm at physics.mcmaster.ca Wed Apr 26 12:48:04 2006 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 26 12:48:04 2006 Subject: [Numpy-discussion] concatenate, doc-string In-Reply-To: (Arnd Baecker's message of "Wed, 26 Apr 2006 11:38:26 +0200 (CEST)") References: Message-ID: Arnd Baecker writes: > Hi, > > the doc-string of concatentate is pretty short: > > numpy.concatenate? > Docstring: > concatenate((a1,a2,...),axis=None). > > Would the following be better: > """ > concatenate((a1, a2,...), axis=None) joins the tuple `(a1, a2, ...)` of > sequences (or arrays) into a single numpy array. > > Example:: > > print concatenate( ([0,1,2], [5,6,7])) > """ > > ((The ``(or arrays)`` could be omitted if sequences include array by > default, though it might not be obvious to beginners ...)) Here's what I just checked in: concatenate((a1, a2, ...), axis=None) joins arrays together The tuple of sequences (a1, a2, ...) are joined along the given axis (default is the first one) into a single numpy array. Example: >>> concatenate( ([0,1,2], [5,6,7]) ) array([0, 1, 2, 5, 6, 7]) > I was also tempted to suggest a dtype argument, > concatenate( ([0,1,2], [5,6,7]), dtype=numpy.Float) > but I am not sure if that would be a good idea ... Well, that would require more code, so I didn't do it :-) -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From arnd.baecker at web.de Wed Apr 26 14:03:02 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Wed Apr 26 14:03:02 2006 Subject: [Numpy-discussion] concatenate, doc-string In-Reply-To: References: Message-ID: On Wed, 26 Apr 2006, David M. Cooke wrote: > Arnd Baecker writes: > > > Hi, > > > > the doc-string of concatentate is pretty short: > > > > numpy.concatenate? > > Docstring: > > concatenate((a1,a2,...),axis=None). > > > > Would the following be better: > > """ > > concatenate((a1, a2,...), axis=None) joins the tuple `(a1, a2, ...)` of > > sequences (or arrays) into a single numpy array. > > > > Example:: > > > > print concatenate( ([0,1,2], [5,6,7])) > > """ > > > > ((The ``(or arrays)`` could be omitted if sequences include array by > > default, though it might not be obvious to beginners ...)) > > Here's what I just checked in: > > concatenate((a1, a2, ...), axis=None) joins arrays together > > The tuple of sequences (a1, a2, ...) are joined along the given axis > (default is the first one) into a single numpy array. > > Example: > > >>> concatenate( ([0,1,2], [5,6,7]) ) > array([0, 1, 2, 5, 6, 7]) Great - many thanks!! There are some further routines which might benefit from some more explanation/examples - so if you don't mind I will try to suggest some additions (I could check them in directly, I think, but as I am not a native speaker I feel better to post them here for review/improvement). > > I was also tempted to suggest a dtype argument, > > concatenate( ([0,1,2], [5,6,7]), dtype=numpy.Float) > > but I am not sure if that would be a good idea ... 
> > Well, that would require more code, so I didn't do it :-)

;-)

It might also be problematic when one of the sequence elements would not fit into the output type.

Best, Arnd

From ndarray at mac.com Wed Apr 26 14:18:06 2006
From: ndarray at mac.com (Sasha)
Date: Wed Apr 26 14:18:06 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
In-Reply-To: <444F0420.9000500@ieee.org>
References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org>
Message-ID:

I would like to apologize up-front if anyone found my overly general arguments inappropriate. I did not intend to be critical of anyone's code or design other than my own. Any references to "bad design" or "broken code" are related to my own misguided attempts to use some of the Numeric features in the past. It turned out that dimension-preserving broadcasting was the wrong feature to use for a specific class of problems that I am dealing with most of the time. This does not mean that it cannot be used appropriately in other domains. I was wrong in posting overly general opinions without providing specific examples. I will try to do better in this post.

Before I do that, however, let me try to explain why I hold strong views on certain things. In my view the most appealing feature in Python is the Zen of Python, and in particular "There should be one-- and preferably only one --obvious way to do it." In my view Python represents the "hard science" approach, appealing to physics and math types, while Perl is more of a "soft science" language. (There is nothing wrong with either Perl or soft sciences.) This is what makes Python so appealing for scientific computing. Unfortunately, it is a fact of life that there are always many ways to solve the same problem, and a successful "pythonic" design has to pick one (preferably the best) of the possible ways and make it obvious.

This said, let me present a specific problem that I will use to illustrate my points below. Suppose we study school statistics in different cities. Let city A have 10 schools with 20 classes and 30 students in each. It is natural to organize the data collected about the students in a 10x20x30 array. It is also natural to collect some of the data at the per-school or per-class level. This data may come from aggregating student-level statistics (say an average test score) or from characteristics that are class- or school-specific (say the grade or primary language). There are two obvious ways to represent such data: 1) we can use 3-d arrays for everything and make the shape of the per-class array 10x20x1 and the shape of the per-school array 10x1x1; or 2) use 2-d and 1-d arrays.

The first approach seems to be more flexible. We can also have 10x1x30 or 1x1x30 arrays to represent data which varies along the student dimension, but is constant across schools or classes. However, this added benefit is illusory: the first student in one class list has no relationship to the first student in another class, so in this particular problem an average score of the first student across classes makes no sense (it will also depend on whether students are ordered alphabetically or by an achievement rank). On the other hand this approach has a very significant drawback: functions that process city data have no way to distinguish between per-school data and a lucky city that can afford educating its students in individual classes.
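A minimal sketch of the two representations (the shapes follow the toy example; the data is invented):

>>> import numpy
>>> scores = numpy.zeros((10, 20, 30))           # 10 schools x 20 classes x 30 students
>>> class_avg = scores.mean(axis=2)              # per-class aggregate, shape (10, 20)
>>> class_avg_3d = class_avg.reshape(10, 20, 1)  # the all-3-d variant, shape (10, 20, 1)
>>> (scores - class_avg_3d).shape                # dimension-preserving broadcast
(10, 20, 30)

Note that nothing in the shape (10, 20, 1) itself says whether the trailing 1 means "aggregated over students" or "a class with a single student"; that is the drawback just described.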
Just as it is extremely unlikely to have one student per class in our toy example, in real-world problems it is not unreasonable to assume that a dimension of size 1 represents aggregate data. Software designed around this assumption is what I would call broken in a subtle way. Please see more below.

On 4/26/06, Travis Oliphant wrote:
> Sasha wrote:
> > On 4/25/06, tim.hochberg at cox.net wrote:
> >
> >> ---- Travis Oliphant wrote:
> [...]
> I don't think anyone is fundamentally opposed to multiple repetitions. We're just being cautious. Also, as you've noted, the assignment code is currently not using the ufunc broadcasting code and so they really aren't the same thing, yet.

It looks like there is a lot of development in this area going on at the moment. Please let me know if I can help.

> [...]
> > In my experience broadcasting length-1 and not broadcasting other lengths is very error prone as it is.
>
> That's not been my experience.

I should have been more specific. As I explained above, the special properties of length-1 led me to design a system that distinguished aggregate data by testing for unit length. This system was subtly broken. In a rare case when the population had only one element, the system was producing wrong results.

> But, I don't know R very well. I'm very interested in what ideas you can bring.

R takes a very simple approach: everything is a vector. There are no scalars; if you need a scalar, you use a vector of length 1. Broadcasting is simply repetition:

> x <- rep(0,10)
> x + c(1,2)
 [1] 1 2 1 2 1 2 1 2 1 2

The length of the larger vector does not even need to be a multiple of the shorter, but in this case a warning is issued:

> x + c(1,2,3)
 [1] 1 2 3 1 2 3 1 2 3 1
Warning message:
longer object length is not a multiple of shorter object length in: x + c(1, 2, 3)

Multi-dimensional arrays are implemented by setting a "dim" attribute:

> dim(x) <- c(2,5)
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    0    0    0    0
[2,]    0    0    0    0    0

(R uses Fortran order). Broadcasting ignores the dim attribute, but does the right thing for conformable vectors:

> x + c(1,2)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    1
[2,]    2    2    2    2    2

However, the following is unfortunate:

> x + 1:5
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    2    4
[2,]    2    4    1    3    5

> > I understand that restricting broadcasting to make it a strictly dimension-increasing operation is not possible for two reasons:
> >
> > 1. Numpy cannot break legacy Numeric code.
> > 2. It is not possible to differentiate between a 1-d array that broadcasts column-wise vs. one that broadcasts row-wise.
> >
> > In my view none of these reasons is valid. In my experience Numeric code that relies on dimension-preserving broadcasting is already broken, only in a subtle and hard to reproduce way.
>
> I definitely don't agree with you here. Dimension-preserving broadcasting is at the heart of the utility of broadcasting and it is very, very useful for that. Calling it subtly broken suggests that you don't understand it and have never used it for its intended purpose. I've used dimension-preserving broadcasting literally hundreds of times. It's rather bold of you to say that all of that code is "broken"

Sorry I was not specific in the original post. I hope you now understand where I come from. Can you point me to some examples of the correct way to use dimension-preserving broadcasting?
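Is something along these lines what you have in mind? This is only my guess at a typical idiom (normalizing the rows of a 2-d array while keeping the length-1 axis):

    import numpy as np

    x = np.random.rand(3, 5)
    row_sums = x.sum(axis=1)[:, np.newaxis]  # shape (3, 1): dimension preserved
    normalized = x / row_sums                # (3, 5) / (3, 1) stretches along axis 1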
I would assume that it is probably more useful in the problem domains where there is no natural ordering of the dimensions, unlike in the hierarchial data example that I used. > Now, I'm sure there are other useful ways to "broadcast", but > dimension-preserving is essentially what broadcasting *is* in NumPy. > If anything it is the dimension-increasing rule that is somewhat > arbitrary (e.g. why prepend with ones). > The dimension-increasing broadcasting is very natural when you deal with hierarchical data where various dimensions correspond to the levels of aggregation. As I explained above, average student score per class makes sense while the average score per student over classes does not. It is very common to combine per-class data with per-student data by broadcasting per-class data. For example, the total time spent by student is a sum spent in regular per-class session plus individual elected courses. > > Perhaps you want to introduce some other way for non-commensurate shapes > to interact in an operation. I think you will find many open minds on > this list (although probably not anyone who will want to code it up :-) > ). We do welcome the discussion. Your experience with other > array-like languages is helpful. > I will be happy to contribute code if I see interest. > > > Similarly the > > need to broadcast over non-leading dimension is a sign of bad design. > > In rare cases where such broadcasting is desirable, it can be easily > > done via swapaxes which is a cheap operation. > > > > Again, it would help if you would refrain from using negative words > about coding styles that are different from your own. Such > broadcasting is not that rare. It happens quite frequently, actually. > The point of a language like Python is that you can write algorithms > simply without struggling with optimization questions up front like you > seem to be hinting at. > I hope you understand that I did not mean to criticize anyone's coding style. I was not really hinting at optimization issues, I just had a particular design problem in mind (see above). Incidentally, dimension-increasing broadcasting does tend to lead to more efficient code both in terms of memory utilization and more straightforward algorithms with fewer special cases, but this was not really what I was referring to. > > On the other hand I don't see much problem in making > > dimension-preserving broadcasting more permissive. In R, for example, > > (1-d) arrays can be broadcast to arbitrary size. This has an > > additional benefit that 1-d to 2-d broadcasting requires no special > > code, it just happens because matrices inherit arithmetics from > > vectors. I've never had a problem with R rules being too loose. > > > > So, please explain exactly what you mean. Only a few on this list know > what the R rules even are. See above. > > In my view the problem that your ticket highlighted is not so much in > > the particular set of broadcasting rules, but in the fact that a[...] > > = b uses one set of rules while a[...] += b uses another. This is > > *very* confusing. > > > > Yes, this is admittedly confusing. But, it's an outgrowth of the way > Numeric code developed. Broadcasting was always only a ufunc concept in > Numeric, and copying was not a ufunc. NumPy grew out of Numeric > code. I was not trying to mimick broadcasting behavior when I wrote > the array copy and array setting code. Perhaps I should have been. 
> In the spirit of appealing to obscure languages ;-), let me mention that in the K language (kx.com) element assignment is implemented using an Amend primitive that takes four arguments: @[x,i,f,y] is more or less equivalent to numpy's x[i] = f(x[i], y[i]), where x, y and i are vectors and f is a binary (broadcasting) function. Thus, x[i] += y[i] can be written as @[x,i,+,y] and x[i] = y[i] is @[x,i,:,y], where ':' denotes a binary function that returns its second argument and ignores the first. The K interpreter's Linux binary is less than 200K, and that includes a simple X window GUI! Such a small code size would not be possible without picking the right set of primitives and avoiding special-case code.

> I'm willing to change the code on this one, but only if the new copy code actually does implement broadcasting behavior equivalently. And going through the ufunc machinery is probably a waste of effort because the copy code must be written for variable length arrays anyway (and ufuncs don't support them).
>

I know close to nothing about variable length arrays. When I need to deal with relational database data, I transpose it so that each column gets into its own fixed-length array. This is the approach that both R and K take. However, at least at the C level, I don't see why the ufunc code cannot be generalized to handle variable length arrays. At the Python level, pre-defined arithmetic or math functions are probably not feasible for variable length, but the ability to define a variable length array function by just writing an inner loop implementation may be quite useful.

> However, the broadcasting machinery has been abstracted in NumPy and can therefore be re-used in the copying code. In Numeric, broadcasting was basically implemented deep inside a confusing while loop.

I've never understood Numeric's while loop and completely agree with your characterization. I am still studying the numpy code, but it is clearly a big improvement.

From shhong at u.washington.edu Wed Apr 26 14:19:01 2006 From: shhong at u.washington.edu (Sungho Hong) Date: Wed Apr 26 14:19:01 2006 Subject: [Numpy-discussion] Building Numpy with Windows and MKL? Message-ID: <207B8B70-6328-421D-8343-B32506AF47CA@u.washington.edu>

Has anyone tried to install numpy with MS Windows and the Intel Math Kernel Library, especially using the VC 2003 compiler? I began with MKLROOT=C:\Program Files\Inter\plsuite, but setup.py seems to have a problem with finding the library path. In that case, how do I manually set up all the relevant paths? Thanks. - SH

From ryanlists at gmail.com Wed Apr 26 14:21:07 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Wed Apr 26 14:21:07 2006 Subject: [Numpy-discussion] array.min() vs. min(array) Message-ID:

I was spending some time trying to track down how to speed up an algorithm that gets called a bunch of times during an optimization. I was startled when I finally figured out that most of the time was wasted by using the built-in Python min function. It turns out that in my case, using array.min() (i.e. the method of the Numpy array) is 300-500 times faster than the built-in Python min function (i.e. min(array)). So, thank you Travis and everyone who has put so much time into thinking through Numpy and making it fast (as well as making sure it is correct). And to the rest of us: use the Numpy array methods whenever you can.
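For the curious, here is a minimal sketch of how to time the difference (the array size is made up and the numbers will of course vary from machine to machine):

    import timeit

    setup = "import numpy; a = numpy.arange(1000000.0)"
    print(timeit.Timer("min(a)", setup).timeit(10))   # generic sequence path
    print(timeit.Timer("a.min()", setup).timeit(10))  # native array method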
Thanks, Ryan From oliphant.travis at ieee.org Wed Apr 26 14:42:05 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed Apr 26 14:42:05 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: References: Message-ID: <444FE909.5080209@ieee.org> Ryan Krauss wrote: > I was spending some time trying to track down how to speed up an > algorithm that gets called a bunch of times during an optimization. I > was startled when I finally figured out that most of the time was > wasted by using the built-in pyhton min function. It turns out that > in my case, using array.min() (i.e. the method of the Numpy array) is > 300-500 times faster than the built-in python min function (i.e. > min(array)). > > So, thank you Travis and everyone who has put so much time into > thinking through Numpy and making it fast (as well as making sure it > is correct). The builtin min function is a bit confusing because it usually does work on NumPy arrays. But, as you've noticed it is always slower because it uses the "generic sequence interface" that NumPy arrays expose. So, it's basically not much faster than a Python loop. In this case you are also being hit by the fact that scalarmath is not yet implemented (it's getting close though...) so the returned array scalars are being compared using the bulky ufunc machinery on each element separately. In Python 2.5 we are going to have the same issues with the new any() and all() functions of Python. -Travis From wbaxter at gmail.com Wed Apr 26 14:56:12 2006 From: wbaxter at gmail.com (Bill Baxter) Date: Wed Apr 26 14:56:12 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> Message-ID: Is that a representative example? It seems highly unlikely that in real life every one of the schools would have exactly 20 classes, and each of those exactly 30 students. I don't know anything about R or the way things are typically done with statistical languages -- maybe this is the norm there -- but from a pure CompSci data structures perspective, a 3D array seems ill-suited for this type of hierarchical data. Something more flexible, along the lines of a Python list of list of list, seems more apropriate. --bill On 4/27/06, Sasha wrote: > Suppose we study school statistics in > different cities. Let city A have 10 schools with 20 classes and 30 > students in each. It is natural to organize the data collected about > the students in a 10x20x30 array. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndarray at mac.com Wed Apr 26 15:24:07 2006 From: ndarray at mac.com (Sasha) Date: Wed Apr 26 15:24:07 2006 Subject: [Numpy-discussion] concatenate, doc-string In-Reply-To: References: Message-ID: On 4/26/06, David M. Cooke wrote: > .... > Here's what I just checked in: > > concatenate((a1, a2, ...), axis=None) joins arrays together > > The tuple of sequences (a1, a2, ...) are joined along the given axis > (default is the first one) into a single numpy array. > > Example: > > >>> concatenate( ([0,1,2], [5,6,7]) ) > array([0, 1, 2, 5, 6, 7]) > The first argument does not have to be a tuple: >>> print concatenate([[0,1,2], [5,6,7]]) [0 1 2 5 6 7] but the docstring is probably ok given that the alternative is "sequence of sequences" ... From ndarray at mac.com Wed Apr 26 15:58:04 2006 From: ndarray at mac.com (Sasha) Date: Wed Apr 26 15:58:04 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). 
In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> Message-ID: On 4/26/06, Bill Baxter wrote: > Is that a representative example? It seems highly unlikely that in real > life every one of the schools would have exactly 20 classes, and each of > those exactly 30 students. You should not take my toy example too seriousely. However, with support for missing values, 3-d arrays may provide an efficient representation for a more realistic scenario when you only know upper bounds for the number of students/classes. Smaller schools will have missing values in their arrays. > I don't know anything about R or the way things > are typically done with statistical languages -- maybe this is the norm > there -- but from a pure CompSci data structures perspective, a 3D array > seems ill-suited for this type of hierarchical data. Something more > flexible, along the lines of a Python list of list of list, seems more > apropriate. > You are right. I am sorely missing ragged array support in numpy like the one available in K. Numpy supports nested arrays, but does not optimize the most common case when nested arrays are of the same type. > --bill > > > On 4/27/06, Sasha wrote: > > > Suppose we study school statistics in > > different cities. Let city A have 10 schools with 20 classes and 30 > > students in each. It is natural to organize the data collected about > > the students in a 10x20x30 array. > > > From ndarray at mac.com Wed Apr 26 16:16:07 2006 From: ndarray at mac.com (Sasha) Date: Wed Apr 26 16:16:07 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> Message-ID: On 4/26/06, Sasha wrote: > On 4/26/06, Bill Baxter wrote: > > Is that a representative example? It seems highly unlikely that in real > > life every one of the schools would have exactly 20 classes, and each of > > those exactly 30 students. > > You should not take my toy example too seriousely. However, with > support for missing values, 3-d arrays may provide an efficient > representation for a more realistic scenario when you only know upper > bounds for the number of students/classes. Smaller schools will have > missing values in their arrays. In addition, it is reasonable to sample a fixed number of classes from each school and a fixed number of students from each class at random for a statistical study. From simon at arrowtheory.com Wed Apr 26 16:41:04 2006 From: simon at arrowtheory.com (Simon Burton) Date: Wed Apr 26 16:41:04 2006 Subject: [Numpy-discussion] obtain indexes of a sort ? Message-ID: <20060427094025.10172889.simon@arrowtheory.com> Is it possible to obtain a permutation (array of indices) representing the transform that sorts an array ? Is there a numpy way of doing this ? I can do it in python as: a = [ 6, 5, 99, 2 ] idxs = range(len(a)) z = zip(idxs,a) def zcmp(u,v): if u[1]<=v[1]: return -1 return 1 z.sort( zcmp ) idxs = [u[0] for u in z] # <--- permutation Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From pgmdevlist at mailcan.com Wed Apr 26 16:45:02 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Wed Apr 26 16:45:02 2006 Subject: [Numpy-discussion] obtain indexes of a sort ? 
In-Reply-To: <20060427094025.10172889.simon@arrowtheory.com> References: <20060427094025.10172889.simon@arrowtheory.com> Message-ID: <200604261944.01584.pgmdevlist@mailcan.com> On Wednesday 26 April 2006 19:40, Simon Burton wrote: > Is it possible to obtain a permutation (array of indices) > representing the transform that sorts an array ? Is there a numpy way > of doing this ? I guess argsort() could be what you want From ndarray at mac.com Wed Apr 26 16:45:03 2006 From: ndarray at mac.com (Sasha) Date: Wed Apr 26 16:45:03 2006 Subject: [Numpy-discussion] obtain indexes of a sort ? In-Reply-To: <20060427094025.10172889.simon@arrowtheory.com> References: <20060427094025.10172889.simon@arrowtheory.com> Message-ID: >>> argsort([ 6, 5, 99, 2 ]) array([3, 1, 0, 2]) On 4/26/06, Simon Burton wrote: > > Is it possible to obtain a permutation (array of indices) > representing the transform that sorts an array ? Is there a numpy way > of doing this ? > > I can do it in python as: > > a = [ 6, 5, 99, 2 ] > idxs = range(len(a)) > z = zip(idxs,a) > def zcmp(u,v): > if u[1]<=v[1]: > return -1 > return 1 > z.sort( zcmp ) > idxs = [u[0] for u in z] # <--- permutation > > Simon. > > -- > Simon Burton, B.Sc. > Licensed PO Box 8066 > ANU Canberra 2601 > Australia > Ph. 61 02 6249 6940 > http://arrowtheory.com > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? > Get stuff done quickly with pre-integrated technology to make your job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From zpincus at stanford.edu Wed Apr 26 16:46:05 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Wed Apr 26 16:46:05 2006 Subject: [Numpy-discussion] obtain indexes of a sort ? In-Reply-To: <20060427094025.10172889.simon@arrowtheory.com> References: <20060427094025.10172889.simon@arrowtheory.com> Message-ID: <800F9820-F672-4EBF-8F48-3C3AEF17FC34@stanford.edu> a.argsort() or numpy.argsort(a) Zach On Apr 26, 2006, at 4:40 PM, Simon Burton wrote: > > Is it possible to obtain a permutation (array of indices) > representing the transform that sorts an array ? Is there a numpy way > of doing this ? > > I can do it in python as: > > a = [ 6, 5, 99, 2 ] > idxs = range(len(a)) > z = zip(idxs,a) > def zcmp(u,v): > if u[1]<=v[1]: > return -1 > return 1 > z.sort( zcmp ) > idxs = [u[0] for u in z] # <--- permutation > > Simon. > > -- > Simon Burton, B.Sc. > Licensed PO Box 8066 > ANU Canberra 2601 > Australia > Ph. 61 02 6249 6940 > http://arrowtheory.com > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, > security? > Get stuff done quickly with pre-integrated technology to make your > job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache > Geronimo > http://sel.as-us.falkag.net/sel? 
> cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From pearu at scipy.org Wed Apr 26 16:56:05 2006 From: pearu at scipy.org (Pearu Peterson) Date: Wed Apr 26 16:56:05 2006 Subject: [Numpy-discussion] Possible ref.count bug in changeset #2422 Message-ID: Hi, Shouldn't result be Py_INCRE'ted when it is equal to Py_NotImplemented and returned from array_richcompare? Pearu From doug5y at shaw.ca Wed Apr 26 17:10:05 2006 From: doug5y at shaw.ca (Doug Nadworny) Date: Wed Apr 26 17:10:05 2006 Subject: [Numpy-discussion] Can't install numpy-0.9.6-1.i586.rpm on FC5 Message-ID: <44500B9E.10602@shaw.ca> when trying to install numpy-0.9.6-1.i586.rpm on Fedora Core 5, rpm reports incorrectly that python is the incorrect version, even though it is correct: >rpm -i --test numpy-0.9.6-1.i586.rpm ## Tests dependences of rpm package error: Failed dependencies: python-base >= 2.4 is needed by numpy-0.9.6-1.i586 >python -V Python 2.4.2 Is there a way around this? TIA, Doug N From cookedm at physics.mcmaster.ca Wed Apr 26 17:20:05 2006 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 26 17:20:05 2006 Subject: [Numpy-discussion] Possible ref.count bug in changeset #2422 In-Reply-To: (Pearu Peterson's message of "Wed, 26 Apr 2006 18:55:55 -0500 (CDT)") References: Message-ID: Pearu Peterson writes: > Hi, > > Shouldn't result be Py_INCRE'ted when it is equal to Py_NotImplemented > and returned from array_richcompare? Theoretically, yes, but since the case statement "should" cover all cases, it doesn't matter. Bad code style though on my part; I've added a default: case instead. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From silesalvarado at hotmail.com Wed Apr 26 17:33:04 2006 From: silesalvarado at hotmail.com (Hugo Siles) Date: Wed Apr 26 17:33:04 2006 Subject: [Numpy-discussion] crush!!!! Message-ID: HI, I have a problem when I run the following options in python: >>>from Numeric import * >>>from Linear algebra I define a matrix 'a' which prints correctly, calculates its inverse, determinat and so for but when I try to calculate the eigenvalues, such as >>> c = eigenvalues(a) the system just crushs without any message I made this test because in some other programs with source code happens the same thing. I hope some body can help, thanks Hugo Siles From ivazquez at ivazquez.net Wed Apr 26 17:33:08 2006 From: ivazquez at ivazquez.net (Ignacio Vazquez-Abrams) Date: Wed Apr 26 17:33:08 2006 Subject: [Numpy-discussion] Can't install numpy-0.9.6-1.i586.rpm on FC5 In-Reply-To: <44500B9E.10602@shaw.ca> References: <44500B9E.10602@shaw.ca> Message-ID: <1146098100.16081.15.camel@ignacio.lan> On Wed, 2006-04-26 at 18:09 -0600, Doug Nadworny wrote: > when trying to install numpy-0.9.6-1.i586.rpm on Fedora Core 5, rpm > reports incorrectly that python is the incorrect version, even though it > is correct: > > >rpm -i --test numpy-0.9.6-1.i586.rpm ## Tests dependences of rpm package > error: Failed dependencies: > python-base >= 2.4 is needed by numpy-0.9.6-1.i586 > >python -V > Python 2.4.2 Alright, alright, I'll update it already... 
-- Ignacio Vazquez-Abrams http://fedora.ivazquez.net/ gpg --keyserver hkp://subkeys.pgp.net --recv-key 38028b72

From ndarray at mac.com Wed Apr 26 18:15:04 2006 From: ndarray at mac.com (Sasha) Date: Wed Apr 26 18:15:04 2006 Subject: [Numpy-discussion] crush!!!! In-Reply-To: References: Message-ID:

Numeric computes eigenvalues by calling LAPACK's dgeev subroutine. Depending on the installation, Numeric may either use its own subset of LAPACK (translated from Fortran to C) or link to the system-supplied LAPACK libraries. It is possible that there is a bug in your system's LAPACK libraries. Some LAPACK bugs related to extended precision calculations were reported recently. What you observe is unlikely to be a Numeric bug. Note, however, that Numeric is no longer actively supported. If you can reproduce the same problem with numpy, it is likely to get more attention. Also, you have to give us some means to reproduce your matrix a if you expect more than general advice.

On 4/26/06, Hugo Siles wrote:
> Hi,
>
> I have a problem when I run the following in python:
>
> >>> from Numeric import *
> >>> from LinearAlgebra import *
> I define a matrix 'a' which prints correctly, calculates its inverse, determinant and so forth, but when I try to calculate the eigenvalues, such as
> >>> c = eigenvalues(a)
> the system just crashes without any message.
> I made this test because the same thing happens in some other programs with source code.
>
> I hope somebody can help, thanks
>
> Hugo Siles

From strawman at astraw.com Wed Apr 26 19:26:05 2006 From: strawman at astraw.com (Andrew Straw) Date: Wed Apr 26 19:26:05 2006 Subject: [Numpy-discussion] SWIG for 3D array In-Reply-To: References: Message-ID: <44502B85.3000504@astraw.com>

Gennan Chen wrote:
> And I really have a hard time understanding how to deal with reference counting issues. Anyone know where I can find a good reference for that?

http://docs.python.org/ext/refcounts.html

From oliphant.travis at ieee.org Wed Apr 26 20:30:12 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed Apr 26 20:30:12 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> Message-ID: <44503A8A.2050701@ieee.org>

Sasha wrote:
> In my view the most appealing feature in Python is the Zen of Python, and in particular "There should be one-- and preferably only one --obvious way to do it." In my view Python represents the "hard science" approach appealing to physics and math types while Perl is more of a "soft science" language.

Interesting analogy. I've not heard that expression before.
> Unfortunately, it is a fact of life that there are always many ways to solve the same problem, and a successful "pythonic" design has to pick one (preferably the best) of the possible ways and make it obvious.
>

And it's probably impossible to agree as to what is "best" because of the different uses that arrays receive. That's one reason I'm anxious to get a basic structure-only basearray into Python itself.

> This said, let me present a specific problem that I will use to illustrate my points below. Suppose we study school statistics in different cities. Let city A have 10 schools with 20 classes and 30 students in each. It is natural to organize the data collected about the students in a 10x20x30 array. It is also natural to collect some of the data at the per-school or per-class level. This data may come from aggregating student-level statistics (say an average test score) or from characteristics that are class or school specific (say the grade or primary language). There are two obvious ways to present such data: 1) we can use 3-d arrays for everything and make the shape of the per-class array 10x20x1 and the shape of the per-school array 10x1x1; or 2) use 2-d and 1-d arrays. The first approach seems to be more flexible. We can also have 10x1x30 or 1x1x30 arrays to represent data which varies along the student dimension, but is constant across schools or classes. However, this added benefit is illusory: the first student in one class list has no relationship to the first student in another class, so in this particular problem an average score of the first student across classes makes no sense (it will also depend on whether students are ordered alphabetically or by an achievement rank).
>
> On the other hand, this approach has a very significant drawback: functions that process city data have no way to distinguish between per-school data and a lucky city that can afford educating its students in individual classes.

I think I see what you are saying. This is a very specific circumstance. I can verify that the ndarray has not been designed to distinguish such hierarchical data. You will never be able to tell from the array itself if a dimension of length 1 means aggregate data or not. I don't see that as a limitation of the ndarray but as evidence that another object (i.e. an R-like data-frame) should probably be used. Such an object could even be built on top of the ndarray.

>> [...]
>> I don't think anyone is fundamentally opposed to multiple repetitions. We're just being cautious. Also, as you've noted, the assignment code is currently not using the ufunc broadcasting code and so they really aren't the same thing, yet.
>>
>
> It looks like there is a lot of development in this area going on at the moment. Please let me know if I can help.
>

Well, I did some refactoring to make it easier to expose the basic concept of the ufunc elsewhere:

1) Adjusting the inputs to a common shape (this is what I call broadcasting --- it appears to me that you use the term a little more loosely)

2) Setting up iterators to iterate over all but the longest dimension so that the inner loop is done.

These are the key ingredients to a fast ufunc.
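For step 1, the rule itself is simple enough to model in a few lines of Python; this is only a sketch of the shape logic, not the actual C implementation:

    def broadcast_shape(shape1, shape2):
        # pad the shorter shape with ones on the left
        ndim = max(len(shape1), len(shape2))
        s1 = (1,) * (ndim - len(shape1)) + tuple(shape1)
        s2 = (1,) * (ndim - len(shape2)) + tuple(shape2)
        # dimensions are compatible if they are equal or one of them is 1
        result = []
        for d1, d2 in zip(s1, s2):
            if d1 == d2 or d1 == 1 or d2 == 1:
                result.append(max(d1, d2))
            else:
                raise ValueError("shapes cannot be broadcast together")
        return tuple(result)

    print(broadcast_shape((10, 20, 30), (20, 1)))  # -> (10, 20, 30)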
There is one more optimization in the ufunc machinery for the contiguous case (when the inner loop is all that is needed), and then there is code to handle the buffering needed for unaligned and/or byte-swapped data. The final thing that makes a ufunc is the precise signature of the inner loop. Every inner loop has the same signature. This signature does not contain a slot for the length of the array element (that's a big reason why variable-length arrays are not supported in ufuncs). The ufuncs could be adapted, of course, but it was a bigger fish than I wanted to try and fry pre-1.0.

Note, though, that I haven't used these concepts yet to implement ufunc-like copying. The PyArray_Cast function will also need to be adjusted at the same time, and this could actually prove more difficult as it must implement buffering. Of course it could give us a chance to abstract out the buffered, broadcasted call as well. That might make a useful C-API function. Any help you can provide would be greatly appreciated. I'm focused right now on the scalar math module, as without it NumPy is still slower for people that use a lot of array elements.

>> [...]
>>
>>> In my experience broadcasting length-1 and not broadcasting other lengths is very error prone as it is.
>>>
>> That's not been my experience.
>>
>
> I should have been more specific. As I explained above, the special properties of length-1 led me to design a system that distinguished aggregate data by testing for unit length. This system was subtly broken. In a rare case when the population had only one element, the system was producing wrong results.
>

Yes, I can see that now. Your comments make a lot more sense. Trying to use ndarrays to represent hierarchical data can cause these subtle issues. The ndarray is a "flat" object in the sense that every element is seen as "equal" to every other element.

>> dim(x) <- c(2,5)
>> x
>>
> [,1] [,2] [,3] [,4] [,5]
> [1,] 0 0 0 0 0
> [2,] 0 0 0 0 0
>
> (R uses Fortran order). Broadcasting ignores the dim attribute, but does the right thing for conformable vectors:
>

Thanks for the description of R.

>> x + c(1,2)
>>
> [,1] [,2] [,3] [,4] [,5]
> [1,] 1 1 1 1 1
> [2,] 2 2 2 2 2
>
> However, the following is unfortunate:
>

Ahh... So, it looks like R does for arithmetic what NumPy copying is currently doing (treating both as flat spaces to fill).

>> x
>>
> Sorry I was not specific in the original post. I hope you now understand where I come from. Can you point me to some examples of the correct way to use dimension-preserving broadcasting? I would assume that it is probably more useful in the problem domains where there is no natural ordering of the dimensions, unlike in the hierarchical data example that I used.
>

Yes, the ndarray does not recognize any natural ordering to the dimensions at all. Every dimension is "equal." It's designed to be a very basic object. I'll post some examples later. I've got to go right now.

> The dimension-increasing broadcasting is very natural when you deal with hierarchical data where various dimensions correspond to the levels of aggregation. As I explained above, an average student score per class makes sense while the average score per student over classes does not. It is very common to combine per-class data with per-student data by broadcasting the per-class data. For example, the total time spent by a student is the time spent in regular per-class sessions plus individual elected courses.
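To make that concrete with the school shapes, here is a small numpy sketch (note that numpy pads shapes on the left, so the trailing student axis has to be added explicitly):

    import numpy as np

    class_hours = np.random.rand(10, 20)         # per-class data
    elective_hours = np.random.rand(10, 20, 30)  # per-student data
    # (10, 20) would be padded on the left to (1, 10, 20), which does not
    # align with (10, 20, 30); the student axis must be added by hand:
    total = class_hours[:, :, np.newaxis] + elective_hours  # shape (10, 20, 30)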
> I think you've hit on something here regarding the use of an array for "hierachial" data. I'm not sure I understand the implications entirely, but at least it helps me a little bit see what your concerns really are. > I hope you understand that I did not mean to criticize anyone's coding > style. I was not really hinting at optimization issues, I just had a > particular design problem in mind (see above). I do understand much better now. I still need to think about the hierarchial case a bit more. My basic concept of an array which definitely biases me is a medical imaging volume.... (i.e. the X-ray density at each location in 3-space). I could use improved understanding of how to use array's effectively in hierarchies. Perhaps we can come up with some useful concepts (or maybe another useful structure that inherits from the basearray) and can therefore share data effectively with the ndarray.... > In the spirit of appealing to obscure languages ;-), let me mention > that in the K language (kx.com) element assignment is implemented > using an Amend primitive that takes four arguments: @[x,i,f,y] id more > or less equivalent to numpy's x[i] = f(x[i], y[i]), where x, y and i > are vectors and f is a binary (broadcasting) function. Thus, x[i] += > y[i] can be written as @[x,i,+,y] and x[i] = y[i] is @[x,i,:,y], where > ':' denotes a binary function that returns it's second argument and > ignores the first. K interpretor's Linux binary is less than 200K and > that includes a simple X window GUI! Such small code size would not be > possible without picking the right set of primitives and avoiding > special case code. > Not to mention limiting the number of data-types :-) > I know close to nothing about variable length arrays. When I need to > deal with the relational database data, I transpose it so that each > column gets into its own fixed length array. Yeah, that was my strategy too and what I always suggested to the numarray folks who wanted the variable-length arrays. But, memory-mapping can't be done that way.... > This is the approach > that both R and K take. However, at least at the C level, I don't see > why ufunc code cannot be generalized to handle variable length arrays. > They of course, could be, it's just more re-factoring than I wanted to do. The biggest issue is the underlying 1-d loop function signature. I hesitated to change the signature because that would break compatibility with Numeric extension modules that defined ufuncs (like scipy-special...) The length could piggy-back in the data argument passed into those functions, but doing that right was more work than I wanted to do. If you solve that problem, everything else could be made to work without too much trouble. > At the python level, pre-defined arithmetic or math functions are > probably not feasible for variable length, but the ability to define a > variable length array function by just writing an inner loop > implementation may be quite useful. > Yes, it could have helped write the string comparisons much faster :-) >> However, the broadcasting machinery has been abstracted in NumPy and can >> therefore be re-used in the copying code. In Numeric, broadcasting was >> basically implemented deep inside a confusing while loop. >> > > I've never understood the Numeric's while loop and completely agree > with your characterization. I am still studying the numpy code, but > it is clearly a big improvement. > Well, it's more straightforward because I'm not the genius Jim Hugunin is. 
It makes heavy use of the iterator concept which I finally grok'd while trying to write things (and realized I had basically already implemented it in writing the old scipy.vectorize). I welcome many more eyes on the code. I know I've taken shortcuts in places that should be improved. Thanks for your continued help and useful comments. -Travis

From tim.hochberg at cox.net Wed Apr 26 21:02:10 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 26 21:02:10 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> Message-ID: <44504296.8040602@cox.net>

I haven't fully waded through all the various replies to this thread. I plan to do that and send a reply on specific points later. This message is more of a historical, motivational or possibly philosophical nature.

First off, NumPy has used the term "broadcast" to mean the same thing since its inception, and changing the terminology now is asking for confusion. *In the context of this mailing list*, I think we should use "broadcast" in the numpy sense and use appropriate qualifiers when referring to how other array packages practice broadcasting. Referring to broadcasting as "shape-preserving broadcasting" or some such doesn't seem to make things any clearer and adds a bunch of excess verbiage. In any event, I plan to omit any "broadcast" qualifiers here.

The following understanding was formed by using and occasionally helping with the development of NumPy since it was developed in 1995 or thereabouts. That doesn't mean that my understanding agrees with that of the primary developers of the time; I may misremember things, and my recollections are likely tinged by the experience I've had with NumPy in the interim. So, don't take this as definitive, but perhaps it will help provide some insight into what NumPy's broadcasting is supposed to be.

Let's first dispense with the padding of dimensions. As I recall, this was a way to make matrix-like operations easier. This was way before there was a matrix class, and by defining padding in this way 1-D vectors could generally be treated as column vectors. Row vectors still needed to be 2-D (1xN), but they tended to be less frequent, so that was less of a burden. Or maybe I have that backwards; in any event, they were put there to facilitate matrix-like uses of numpy arrays. Given that there is a matrix class at this point, I doubt I would automagically pad the dimensions if I were designing numpy from scratch now. Since the dimension padding is at least partly a historical accident, and since it is in some sense orthogonal to the main point of numpy's broadcasting, I'm going to pretend it doesn't exist for the rest of this discussion.

At its core, broadcasting is about adjusting the shapes of two arrays so that they match. Consider an array 'A' and an array 'B' with shapes (3, Any) and (Any, 4). Here, 'Any' means that the given dimension of the array is unspecified and can take on any value that is convenient for functions operating on the array. If we add 'A' and 'B' together, we'd like the two 'Any' dimensions to stretch appropriately so that the result is an array of shape (3, 4). Similarly, adding an array of shape (3, 4) to an array of shape (Any, 4) should work and produce an array of shape (3, 4). So far, this is pretty straightforward; I believe it also bears a fair amount of resemblance to Sasha's 0-stride ideas.
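In numpy as it stands, with 'Any' spelled as '1' (the compromise described next), that stretching looks like this:

    >>> from numpy import zeros
    >>> (zeros((3, 1)) + zeros((1, 4))).shape
    (3, 4)
    >>> (zeros((3, 4)) + zeros((1, 4))).shape
    (3, 4)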
The complicating factor is that there wasn't a good way to spell 'Any' at the time. Or maybe we were lazy. Or maybe there was some other reason that I'm forgetting. In any event, we ended up spelling 'Any' as '1'. That means that there's no way to distinguish between a dimension that's of length 1 for some legitimate reason and one that is that length just for stretchability. It would be an interesting experiment to see how things would work with no padding and with an explicit 'Any' value available for dimensions. However, it's probably too much work and would result in too many backwards compatibility problems for NumPy proper. [Half-baked thoughts on how to do this though: newaxis would produce a new axis with length -1 (or some other marker length). This would be treated as length-1 axes are treated now. However, length-1 axes would no longer broadcast. Padding would be right out.]

In summary, the Platonic ideal of broadcasting is simple and clean. In practice it's more complicated for two reasons. First, the padding of dimensions; I believe that this is mostly historical baggage. The second is the conflation of '1' and 'Any' (a name that I made up for this message, so don't go searching for it). This may be a historical accident and/or an implementation artifact, but there may actually be some more practical reasons behind this as well that I am forgetting. Hopefully that is mildly informative, Regards, -tim

From kwgoodman at gmail.com Wed Apr 26 21:46:08 2006 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed Apr 26 21:46:08 2006 Subject: [Numpy-discussion] matrix.std() returns array Message-ID:

I noticed that the mean of a matrix is a matrix but the standard deviation of a matrix is an array. Is that the expected behavior? I'm also getting the wrong values (0 and nan) for the standard deviation. Did I mess something up?

I'm trying to learn scipy (and python) by porting a small Octave program. I installed numpy from svn (today) on a Debian box. And numpy.test() says OK. Here's an example:

>> numpy.__version__
'0.9.7.2416'
>> x = asmatrix(random.uniform(0,1,(3,3)))
>> x
matrix([[ 0.56771284, 0.57053769, 0.57505946],
        [ 0.10479534, 0.81692248, 0.91829316],
        [ 0.48627829, 0.59255983, 0.32628573]])
>> x.mean(0)
matrix([[ 0.38626216, 0.66000667, 0.60654612]])
>> x.std(0)
array([ nan, 0. , 0. ])
What about changing the example to: """ Examples: >>> concatenate(([0, 1, 2], [5, 6, 7])) array([0, 1, 2, 5, 6, 7]) >>> concatenate([[0, 1, 2], [5, 6, 7]]) array([0, 1, 2, 5, 6, 7]) >>> z = arange(5) >>> concatenate(([0, 1, 2], [5, 6, 7], z)) array([0, 1, 2, 5, 6, 7, 0, 1, 2, 3, 4]) """ Best, Arnd From Chris.Barker at noaa.gov Wed Apr 26 23:42:02 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed Apr 26 23:42:02 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: <444FE909.5080209@ieee.org> References: <444FE909.5080209@ieee.org> Message-ID: <445067C6.3050805@noaa.gov> Travis Oliphant wrote: > In Python 2.5 we are going to have the same issues with the new any() > and all() functions of Python. "Namespaces are one honking great idea -- let's do more of those!" Yet another reason to deprecate import * ! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From arnd.baecker at web.de Wed Apr 26 23:49:06 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Wed Apr 26 23:49:06 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: <444FE909.5080209@ieee.org> References: <444FE909.5080209@ieee.org> Message-ID: Moin, On Wed, 26 Apr 2006, Travis Oliphant wrote: > Ryan Krauss wrote: > > I was spending some time trying to track down how to speed up an > > algorithm that gets called a bunch of times during an optimization. I > > was startled when I finally figured out that most of the time was > > wasted by using the built-in pyhton min function. It turns out that > > in my case, using array.min() (i.e. the method of the Numpy array) is > > 300-500 times faster than the built-in python min function (i.e. > > min(array)). > > > > So, thank you Travis and everyone who has put so much time into > > thinking through Numpy and making it fast (as well as making sure it > > is correct). > > The builtin min function is a bit confusing because it usually does work > on NumPy arrays. But, as you've noticed it is always slower because it > uses the "generic sequence interface" that NumPy arrays expose. So, > it's basically not much faster than a Python loop. In this case you are > also being hit by the fact that scalarmath is not yet implemented (it's > getting close though...) so the returned array scalars are being > compared using the bulky ufunc machinery on each element separately. > > In Python 2.5 we are going to have the same issues with the new any() > and all() functions of Python. I am just preparing a small text to collect such cases for the wiki. However, I am not sure about a good name for such a page: http://www.scipy.org/Cookbook/Speed http://www.scipy.org/Cookbook/SpeedProblems http://www.scipy.org/Cookbook/Performance ? (As usual, it is easy to start a page, than to properly maintain it. OTOH things like this get lost very quickly, in particular with this nice amount of traffic here). In addition this also relates to - profiling (For example I would like to add the contents of http://mail.enthought.com/pipermail/enthought-dev/2006-January/001075.html to the wiki at some point) - psyco - pyrex - f2py - weave - numexpr - ... Presently much of this is listed in the Cookbook under "Using NumPy With Other Languages (Advanced)", and therefore the above "Python only" issues don't quite fit. Any suggestions? 
Best, Arnd

From arnd.baecker at web.de Wed Apr 26 23:51:07 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Wed Apr 26 23:51:07 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: <445067C6.3050805@noaa.gov> References: <444FE909.5080209@ieee.org> <445067C6.3050805@noaa.gov> Message-ID:

On Wed, 26 Apr 2006, Christopher Barker wrote:
> Travis Oliphant wrote:
>
> > In Python 2.5 we are going to have the same issues with the new any() and all() functions of Python.
>
> "Namespaces are one honking great idea -- let's do more of those!"
>
> Yet another reason to deprecate import * !

Yep! But it would not work for `min` as there is no such function in numpy. (would we need one?...)

Best, Arnd

From Chris.Barker at noaa.gov Thu Apr 27 00:00:05 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu Apr 27 00:00:05 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> Message-ID: <44506BE6.10301@noaa.gov>

As Sasha quite clearly pointed out, when you do aggregation, you really do want to reduce the dimensionality of your data. In fact, that's something that always bit me with MATLAB. If I had a matrix that happened to have a dimension of 1, MATLAB would interpret it as a vector. I ended up writing functions like "SumColumns" that would check if it was a single row vector before calling sum, so that I wouldn't suddenly get a scalar result if a matrix happened to have one row.

Once you reduce dimensionality with aggregating functions, I can see how it would be natural to want to use broadcasting to merge the reduced data and full data. However, I can't see how you could do that cleanly. How is the code to know whether a rank-1 array represents a column or row when multiplied with a rank-2 array? There is simply no way to know, in general. I suppose we could define a convention, like: "rank-1 arrays will be interpreted as row vectors for broadcasting." etc. for higher dimensions. However, I've found that even in my code, I don't find one convention always makes the most sense for all applications, so I'm just as happy to make it clear with a lot of calls like: v.shape = (-1, 1)

NOTE: It appears that numpy does, in fact, use such a convention:

>>> v = N.arange(5)
>>> m = N.ones((5,5))
>>> v * m
array([[0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4],
       [0, 1, 2, 3, 4]])
>>> v.shape = (-1,1)
>>> v * m
array([[0, 0, 0, 0, 0],
       [1, 1, 1, 1, 1],
       [2, 2, 2, 2, 2],
       [3, 3, 3, 3, 3],
       [4, 4, 4, 4, 4]])

So what's the disagreement about? -Chris

-- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From Chris.Barker at noaa.gov Thu Apr 27 00:10:03 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu Apr 27 00:10:03 2006 Subject: [Numpy-discussion] concatenate, doc-string In-Reply-To: References: Message-ID: <44506E2F.9040902@noaa.gov>

David M. Cooke wrote:
> Here's what I just checked in:
>
> concatenate((a1, a2, ...), axis=None) joins arrays together
>
> The tuple of sequences (a1, a2, ...) are joined along the given axis
> (default is the first one) into a single numpy array.
>
> Example:
>
> >>> concatenate( ([0,1,2], [5,6,7]) )
> array([0, 1, 2, 5, 6, 7])

While we're at it, why not an example of how the axis argument works:

>>> concatenate( (ones((1,3)), zeros((1,3))) )
array([[1, 1, 1],
       [0, 0, 0]])
>>> concatenate( (ones((1,3)), zeros((1,3))), axis = 0 )
array([[1, 1, 1],
       [0, 0, 0]])
>>> concatenate( (ones((1,3)), zeros((1,3))), axis = 1 )
array([[1, 1, 1, 0, 0, 0]])

I'm not sure I like this example, but it's an easy way to do a one-liner. -Chris

-- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov

From oliphant.travis at ieee.org Thu Apr 27 00:53:00 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 27 00:53:00 2006 Subject: [Numpy-discussion] matrix.std() returns array In-Reply-To: References: Message-ID: <4450780C.9060403@ieee.org>

Keith Goodman wrote:
> I noticed that the mean of a matrix is a matrix but the standard deviation of a matrix is an array. Is that the expected behavior? I'm also getting the wrong values (0 and nan) for the standard deviation. Did I mess something up?
>
> I'm trying to learn scipy (and python) by porting a small Octave program. I installed numpy from svn (today) on a Debian box. And numpy.test() says OK.
>

This should be fixed now in SVN. If somebody can add a test that would be great. Note that the methods taking axes also now preserve row and column orientation for matrices. -Travis

From oliphant.travis at ieee.org Thu Apr 27 01:03:04 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 27 01:03:04 2006 Subject: [Numpy-discussion] Warnings in NumPy SVN Message-ID: <44507A9D.8070902@ieee.org>

I want to apologize for the relative instability of the SVN tree in the past couple of days. Getting the scalarmath layout working took more C-API changes than I had anticipated. The SVN version of NumPy now builds scalarmath by default. The basic layout of the module is complete. However, there are many basic functions that are missing. As a result, during compilation you will get many warnings about undefined functions. If an attempt were made to load the module, it would cause an error as well due to undefined symbols. These undefined symbols are all the basic operations on fundamental C data-types that either need a function defined or a #define statement made. The names have this form:

@name@_ctype_@oper@

where @name@ is one of the 16 number-like types and @oper@ is one of the operations needing to be supported. The function (or macro) needs to implement the operation on the basic data-type and, if necessary, set an error flag in the floating-point registers. If anybody has time to help implement these basic operations, it would be greatly appreciated. -Travis

From zpincus at stanford.edu Thu Apr 27 01:22:05 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Thu Apr 27 01:22:05 2006 Subject: [Numpy-discussion] matrix.std() returns array In-Reply-To: <4450780C.9060403@ieee.org> References: <4450780C.9060403@ieee.org> Message-ID: <05B8DC8B-CD68-4EF2-BB2B-6FFABABF812E@stanford.edu>

On a slightly-related note, was anyone able to reproduce the exception with matrix types and the var() method? e.g. numpy.matrix([[1,2,3], [1,2,3]]).var() complains about unaligned data... Presumably if std is fixed in SVN, so is var. Also if a std unit test is added, a var one should be too.
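Something along these lines, perhaps; this is only a rough sketch (the function name is made up, and I have not checked it against the test-suite conventions):

    import numpy

    def check_matrix_std_var():
        x = numpy.asmatrix(numpy.random.rand(3, 3))
        # reductions along an axis should stay matrices...
        assert isinstance(x.std(0), numpy.matrix)
        assert isinstance(x.var(0), numpy.matrix)
        # ...and agree with the plain ndarray results
        a = numpy.asarray(x)
        assert numpy.allclose(x.std(0), a.std(0))
        assert numpy.allclose(x.var(0), a.var(0))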
Zach On Apr 27, 2006, at 12:51 AM, Travis Oliphant wrote: > Keith Goodman wrote: >> I noticed that the mean of a matrix is a matrix but the standard >> deviation of a matrix is an array. Is that the expected behavior? I'm >> also getting the wrong values (0 and nan) for the standard deviation. >> Did I mess something up? >> >> I'm trying to learn scipy (and python) by porting a small Octave >> program. I installed numpy from svn (today) on a Debian box. And >> numpy.test() says OK. >> >> > This should be fixed now in SVN. If somebody can add a test that > would be great. > > Note, that the methods taking axes also now preserve row and column > orientation for matrices. > > -Travis > > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, > security? > Get stuff done quickly with pre-integrated technology to make your > job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache > Geronimo > http://sel.as-us.falkag.net/sel? > cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From arnd.baecker at web.de Thu Apr 27 03:06:17 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Thu Apr 27 03:06:17 2006 Subject: [Numpy-discussion] vectorize problem In-Reply-To: <444FA7E7.2070303@ieee.org> References: <200604251324.42987.steffen.loeck@gmx.de> <444FA7E7.2070303@ieee.org> Message-ID: On Wed, 26 Apr 2006, Travis Oliphant wrote: [...] > It is just a simple change. Scalars are supposed to be supported. > They aren't only as a side-effect of the switch to not return > object-scalars. I did not update the vectorize code to handle the > scalar return value from the object ufunc (which is now no-longer an > object-scalar with the methods of arrays (like astype) but is instead > the underlying object). > > I'll add a check. Works perfect now - many thanks! This reminds me of some other issue when trying to vectorize f2py-wrapped functions: Pearu suggested a fix in terms of a more general way to determine the number of arguments of a callable Python object, http://www.scipy.net/pipermail/scipy-user/2006-April/007617.html However, it seems that this has fallen through the cracks (and I don't see how to incorporate it into numpy.vectorize...) Is this another simple one? ;-) Many thanks, Arnd From gruben at bigpond.net.au Thu Apr 27 05:05:02 2006 From: gruben at bigpond.net.au (Gary Ruben) Date: Thu Apr 27 05:05:02 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: References: <444FE909.5080209@ieee.org> Message-ID: <4450B34F.8010501@bigpond.net.au> Hi Arnd, You could call it PerformanceTips and include some search terms like "speed" in the page so search engines pick them up. Gary R. Arnd Baecker wrote: > I am just preparing a small text to collect such cases for the wiki. > > However, I am not sure about a good name for such a page: > http://www.scipy.org/Cookbook/Speed > http://www.scipy.org/Cookbook/SpeedProblems > http://www.scipy.org/Cookbook/Performance > ? From ryanlists at gmail.com Thu Apr 27 06:41:08 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Thu Apr 27 06:41:08 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: <4450B34F.8010501@bigpond.net.au> References: <444FE909.5080209@ieee.org> <4450B34F.8010501@bigpond.net.au> Message-ID: I think this is a great idea. 
We get a lot of these kinds of questions on the list, and the collective wisdom of people here who have really dug into this is really impressive. But, that wisdom does need to be a little easier to find. Speaking of which, I don't always feel like I get trustworthy results out of the profiler, so when I really want to know what is going on I find myself doing this alot: t1=time.time() [block of code here] t2=time.time() [more code] t3=time.time() and then comparing t3-t2 and t2-t1 to narrow down where the code is spending its time. Does anyone have good tips on how to do good profiling? Or is this question so vague and counter-intuitive that I seem silly and I had better come back with a believable example? Thanks, Ryan On 4/27/06, Gary Ruben wrote: > Hi Arnd, > > You could call it PerformanceTips and include some search terms like > "speed" in the page so search engines pick them up. > > Gary R. > > Arnd Baecker wrote: > > > I am just preparing a small text to collect such cases for the wiki. > > > > However, I am not sure about a good name for such a page: > > http://www.scipy.org/Cookbook/Speed > > http://www.scipy.org/Cookbook/SpeedProblems > > http://www.scipy.org/Cookbook/Performance > > ? > > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? > Get stuff done quickly with pre-integrated technology to make your job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From arnd.baecker at web.de Thu Apr 27 06:56:08 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Thu Apr 27 06:56:08 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: <4450B34F.8010501@bigpond.net.au> References: <444FE909.5080209@ieee.org> <4450B34F.8010501@bigpond.net.au> Message-ID: On Thu, 27 Apr 2006, Gary Ruben wrote: > Hi Arnd, > > You could call it PerformanceTips and include some search terms like > "speed" in the page so search engines pick them up. Alright, I put all I know on this (which is not that much ;-) at http://www.scipy.org/PerformanceTips The pointers to weave/f2py/pyrex/ (ah - psyco is missing) will have to be added. Also the profiling/benchmarking aspect, which is important (actually more important even before thinking about PerformanceTips) needs to be put somewhere, maybe even separately under http://www.scipy.org/BenchmarkingAndProfiling Best, Arnd > Gary R. > > Arnd Baecker wrote: > > > I am just preparing a small text to collect such cases for the wiki. > > > > However, I am not sure about a good name for such a page: > > http://www.scipy.org/Cookbook/Speed > > http://www.scipy.org/Cookbook/SpeedProblems > > http://www.scipy.org/Cookbook/Performance > > ? > > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? 
> Get stuff done quickly with pre-integrated technology to make your job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > From arnd.baecker at web.de Thu Apr 27 07:02:16 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Thu Apr 27 07:02:16 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: References: <444FE909.5080209@ieee.org> <4450B34F.8010501@bigpond.net.au> Message-ID: On Thu, 27 Apr 2006, Ryan Krauss wrote: > I think this is a great idea. We get a lot of these kinds of > questions on the list, and the collective wisdom of people here who > have really dug into this is really impressive. But, that wisdom does > need to be a little easier to find. > > Speaking of which, I don't always feel like I get trustworthy results > out of the profiler, so when I really want to know what is going on I > find myself doing this alot: > > t1=time.time() > [block of code here] > t2=time.time() > [more code] > t3=time.time() > > and then comparing t3-t2 and t2-t1 to narrow down where the code is > spending its time. > > Does anyone have good tips on how to do good profiling? Or is this > question so vague and counter-intuitive that I seem silly and I had > better come back with a believable example? Maybe this one is of interest then: http://www.physik.tu-dresden.de/~baecker/comp_talks.html and goto "Python and Co - some recent developments" Quite late in the talk there is an example on Profiling (sorry, it seems that no direct linking is possible) The corresponding files are at http://www.physik.tu-dresden.de/~baecker/talks/pyco/BenchExamples/ Essentially it is an example of using kcachegrind to display the results of hotshot (see also: http://mail.enthought.com/pipermail/enthought-dev/2006-January/001075.html ) Best, Arnd > Thanks, > > Ryan > > On 4/27/06, Gary Ruben wrote: > > Hi Arnd, > > > > You could call it PerformanceTips and include some search terms like > > "speed" in the page so search engines pick them up. > > > > Gary R. > > > > Arnd Baecker wrote: > > > > > I am just preparing a small text to collect such cases for the wiki. > > > > > > However, I am not sure about a good name for such a page: > > > http://www.scipy.org/Cookbook/Speed > > > http://www.scipy.org/Cookbook/SpeedProblems > > > http://www.scipy.org/Cookbook/Performance > > > ? > > > > > > > > ------------------------------------------------------- > > Using Tomcat but need to do more? Need to support web services, security? > > Get stuff done quickly with pre-integrated technology to make your job easier > > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? 
> Get stuff done quickly with pre-integrated technology to make your job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > http://sel.as-us.falkag.net/sel?cmd_______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > From faltet at carabos.com Thu Apr 27 07:08:06 2006 From: faltet at carabos.com (Francesc Altet) Date: Thu Apr 27 07:08:06 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: References: <4450B34F.8010501@bigpond.net.au> Message-ID: <200604271606.52780.faltet@carabos.com> A Dijous 27 Abril 2006 15:40, Ryan Krauss va escriure: > I think this is a great idea. We get a lot of these kinds of > questions on the list, and the collective wisdom of people here who > have really dug into this is really impressive. But, that wisdom does > need to be a little easier to find. > > Speaking of which, I don't always feel like I get trustworthy results > out of the profiler, so when I really want to know what is going on I > find myself doing this alot: > > t1=time.time() > [block of code here] > t2=time.time() > [more code] > t3=time.time() > > and then comparing t3-t2 and t2-t1 to narrow down where the code is > spending its time. > > Does anyone have good tips on how to do good profiling? Or is this > question so vague and counter-intuitive that I seem silly and I had > better come back with a believable example? Well, if you are on Linux, and want to time C extension, then oprofile is a *very* good option. Another profiling tool is Cachegrind, part of Valgrind. It uses the processor emulation of Valgrind to run the executable, and catches all memory accesses for the trace. In addition, you can combine the output of oprofile with Cachegrind. In [3] one can see more info about these and more tools. [1] http://oprofile.sourceforge.net [2] http://kcachegrind.sourceforge.net/ [3] https://uimon.cern.ch/twiki/bin/view/Atlas/OptimisingCode Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From lroubeyrie at limair.asso.fr Thu Apr 27 08:41:03 2006 From: lroubeyrie at limair.asso.fr (Lionel Roubeyrie) Date: Thu Apr 27 08:41:03 2006 Subject: [Numpy-discussion] equality with masked object In-Reply-To: References: <200604250938.48648.lroubeyrie@limair.asso.fr> Message-ID: <200604271740.11385.lroubeyrie@limair.asso.fr> Hi, thanks for your answer, but my problem is that I want to obtain the index of the max value in each column of a 2d masked array, then how can I do that without comparaison? Thanks Le Mardi 25 Avril 2006 15:10, Sasha a ?crit?: > On 4/25/06, Lionel Roubeyrie wrote: > > Why 5.0 == -- return True? A float is it the same as a masked object? > > thanks > > It does not. It returns ma.masked : > >>> test[3] is ma.masked > > True > > You should not access masked data - it makes no sense. The current > behavior is historical and I don't really like it. Masked scalars are > replaced by ma.masked singleton in subscript operations to allow a[i] > is masked idiom. In my view it is not worth the trouble, but my > suggestion to get rid of that feature was not met with much > enthusiasm. > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? 
> Get stuff done quickly with pre-integrated technology to make your job > easier Download IBM WebSphere Application Server v.1.0.1 based on Apache > Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid0709&bid&3057&dat1642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- Lionel Roubeyrie - lroubeyrie at limair.asso.fr LIMAIR http://www.limair.asso.fr From ndarray at mac.com Thu Apr 27 08:57:07 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 27 08:57:07 2006 Subject: [Numpy-discussion] equality with masked object In-Reply-To: <200604271740.11385.lroubeyrie@limair.asso.fr> References: <200604250938.48648.lroubeyrie@limair.asso.fr> <200604271740.11385.lroubeyrie@limair.asso.fr> Message-ID: On 4/27/06, Lionel Roubeyrie wrote: >[....................] I want to obtain the index of > the max value in each column of a 2d masked array, then how can I do that > without comparaison? ma.argmax(x, axis=0, fill_value=ma.maximum_fill_value(x)) or better: argmax(x.fill(ma.maximum_fill_value(x)), axis=0) From kwgoodman at gmail.com Thu Apr 27 09:32:10 2006 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu Apr 27 09:32:10 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <44506BE6.10301@noaa.gov> References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> <44506BE6.10301@noaa.gov> Message-ID: On 4/26/06, Christopher Barker wrote: > something that always bit me with MATLAB. If I had a matrix that > happened to have a dimension of 1, MATLAB would interpret it as a > vector. I ended up writing functions like "SumColumns" that would check > if it was a single row vector before calling sum, so that I wouldn't > suddenly get a scaler result if a matrix happened to have on row. In Octave or Matlab, all you need to do is sum(x,1). For example: >> x = rand(1,4) x = 0.56755 0.24575 0.53804 0.36521 >> sum(x,1) ans = 0.56755 0.24575 0.53804 0.36521 From schofield at ftw.at Thu Apr 27 09:50:03 2006 From: schofield at ftw.at (Ed Schofield) Date: Thu Apr 27 09:50:03 2006 Subject: [Numpy-discussion] matrix operations with axis=None In-Reply-To: <4450780C.9060403@ieee.org> References: <4450780C.9060403@ieee.org> Message-ID: <4450F6F4.2060800@ftw.at> Travis Oliphant wrote: > Keith Goodman wrote: >> I noticed that the mean of a matrix is a matrix but the standard >> deviation of a matrix is an array. Is that the expected behavior? I'm >> also getting the wrong values (0 and nan) for the standard deviation. >> Did I mess something up? > This should be fixed now in SVN. If somebody can add a test that > would be great. > > Note, that the methods taking axes also now preserve row and column > orientation for matrices. > Well done for doing this. In fact, you beat me to it by a few hours; I was going to post a patch this morning to preserve orientation with matrix operations. The approach I took was different in one respect. Matrix objects currently return a matrix of shape (1, 1) from methods with an axis=None argument. For example: >>> x = asmatrix(random.uniform(0,1,(3,3))) >>> x.std() matrix([[ 0.26890557]]) >>> x.argmax() matrix([[4]]) I believe this behaviour is unfortunate, and that an operation aggregating a matrix over all dimensions should return a scalar. I've posted a patch at http://projects.scipy.org/scipy/numpy/ticket/83 that modifies this behaviour to return scalars (as rank-0 arrays) instead. 
It also removes some code duplication. The behaviour with the patch is: >>> x.std() 0.29610630190701492 >>> x.std().shape () >>> x.argmax() 3 Returning scalars from methods with an axis=None argument is the current behaviour of scipy sparse matrices, while axis=0 or axis=1 yields a sparse matrix with height or width 1, like numpy matrices. A (1 x 1) sparse matrix would be a strange object indeed, and would not be usable in all contexts where scalars are expected. I suspect the same would hold for (1 x 1) dense matrices. One example is that they cannot be used as indices for Python lists. For some matrix methods, such as argmax, returning a scalar would be highly desirable, since it allows simpler code. A potential drawback to this change is that matrix operations aggregating along all dimensions, which would now share the behaviour of numpy arrays, would no longer be consistent with matrix operations that aggregate along only one dimension, which currently do not reduce dimension, because matrices are inherently 2-d. This could be an argument for introducing a new vector class to represent one-dimensional data with orientation. -- Ed From gnchen at cortechs.net Thu Apr 27 09:56:12 2006 From: gnchen at cortechs.net (Gennan Chen) Date: Thu Apr 27 09:56:12 2006 Subject: [Numpy-discussion] newbie for writing numpy/scipy extensions Message-ID: <228EDE46-B760-44BA-A987-273F2ADC9B81@cortechs.net> Hi! All, I have just started writing my own Python extension based on numpy. Couple of questions here: 1. I have some utility functions, such as wrappers for PyArray_GETPTR*, that need to be accessed by different extension modules. So, I put them in utils.h and utils.c. In utils.h, I need to include "numpy/arrayobject.h". But the compilation failed when I include it again in my extension module function, wrap.c: #include "numpy/arrayobject.h" #include "utils.h" When I remove it and use #include "utils.h" the compilation works. So, is it true that I can only include arrayobject.h once? 2. Which import should I use in my initial function: import_array() or import_libnumarray() Gen-Nan Chen, PhD Chief Scientist Research and Development Group CorTechs Labs Inc (www.cortechs.net) 1020 Prospect St., #304, La Jolla, CA, 92037 Tel: 1-858-459-9700 ext 16 Fax: 1-858-459-9705 Email: gnchen at cortechs.net From ndarray at mac.com Thu Apr 27 09:59:11 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 27 09:59:11 2006 Subject: [Numpy-discussion] Warnings in NumPy SVN In-Reply-To: <44507A9D.8070902@ieee.org> References: <44507A9D.8070902@ieee.org> Message-ID: On 4/27/06, Travis Oliphant wrote: > [...] > The function (or macro) needs to implement the operation on the basic > data-type and if necessary set an error-flag in the floating-point > registers. > > If anybody has time to help implement these basic operations, it would > be greatly appreciated. I can help. To make sure we don't duplicate our effort, let's do the following: 1. I will add place-holders for all the necessary functions to make them return "NotImplemented". 2. I will then follow up with the list of functions that need to be filled out and we can then split the work. 3. We will also need to write tests that will make sure scalars behave similarly to dimensionless arrays. If anyone would like to help with this, it will be greatly appreciated. No C coding skills are necessary for that.
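For concreteness, a first cut at such a consistency test could look roughly like this (just a sketch, not the final test layout):

import operator
import numpy

def check_scalar_vs_zero_d():
    ops = [operator.add, operator.sub, operator.mul]
    a, b = numpy.int32(7), numpy.int32(3)
    za, zb = numpy.array(7), numpy.array(3)   # dimensionless arrays
    for op in ops:
        assert op(a, b) == op(za, zb)
        assert op(a, zb) == op(za, b)   # mixed operands should match too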
From oliphant at ee.byu.edu Thu Apr 27 10:01:07 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 27 10:01:07 2006 Subject: [Numpy-discussion] matrix operations with axis=None In-Reply-To: <4450F6F4.2060800@ftw.at> References: <4450780C.9060403@ieee.org> <4450F6F4.2060800@ftw.at> Message-ID: <4450F7F2.1050707@ee.byu.edu> Ed Schofield wrote: >Travis Oliphant wrote: > > >>Keith Goodman wrote: >> >> >>>I noticed that the mean of a matrix is a matrix but the standard >>>deviation of a matrix is an array. Is that the expected behavior? I'm >>>also getting the wrong values (0 and nan) for the standard deviation. >>>Did I mess something up? >>> >>> >>This should be fixed now in SVN. If somebody can add a test that >>would be great. >> >>Note, that the methods taking axes also now preserve row and column >>orientation for matrices. >> >> >> >Well done for doing this. > >In fact, you beat me to it by a few hours; I was going to post a patch >this morning to preserve orientation with matrix operations. The >approach I took was different in one respect. > > I like your function-call approach as it ensures consistent behavior. >Returning scalars from methods with an axis=None argument is the current >behaviour of scipy sparse matrices, while axis=0 or axis=1 yields a >sparse matrix with height or width 1, like numpy matrices. A (1 x 1) >sparse matrix would be a strange object indeed, and would not be usable >in all contexts where scalars are expected. I suspect the same would >hold for (1 x 1) dense matrices. One example is that they cannot be >used as indices for Python lists. For some matrix methods, such as >argmax, returning a scalar would be highly desirable by allowing simpler >code. > >A potential drawback to this change is that matrix operations >aggregating along all dimensions, which would now share the behaviour of >numpy arrays, would be no longer be consistent with matrix operations >that aggregate along only one dimension, which currently do not reduce >dimension, because matrices are inherently 2-d. This could be an >argument for introducing a new vector class to represent one-dimensional >data with orientation. > > There is one more problem in that matrix-operations will not be preserved in all cases as they would have before. However, I suppose somebody doing a reduce over all dimensions would probably not expect the result to be a matrix, so I don't think it's a big drawback. Consistency with sparse matrices is another reason for returning a scalar. -Travis From Fernando.Perez at colorado.edu Thu Apr 27 10:04:01 2006 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Thu Apr 27 10:04:01 2006 Subject: [Numpy-discussion] Warnings in NumPy SVN In-Reply-To: References: <44507A9D.8070902@ieee.org> Message-ID: <4450F93D.9050905@colorado.edu> Sasha wrote: > On 4/27/06, Travis Oliphant wrote: > >>[...] >>The function (or macro) needs to implement the operation on the basic >>data-type and if necessary set an error-flag in the floating-point >>registers. >> >>If anybody has time to help implement these basic operations, it would >>be greatly appreciated. > > > I can help. To make sure we don't duplicate our effort, let's do the following: > > 1. I will add place-holders for all the necessary functions to make > them return "NotImplemented". just a minor reminder: raise NotImplementedError is the standard idiom for this. 
Cheers, f From kwgoodman at gmail.com Thu Apr 27 10:05:05 2006 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu Apr 27 10:05:05 2006 Subject: [Numpy-discussion] matrix.std() returns array In-Reply-To: <4450780C.9060403@ieee.org> References: <4450780C.9060403@ieee.org> Message-ID: On 4/27/06, Travis Oliphant wrote: > This should be fixed now in SVN. If somebody can add a test that would > be great. > > Note, that the methods taking axes also now preserve row and column > orientation for matrices. Hey, it works. Thank you. From ndarray at mac.com Thu Apr 27 10:52:01 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 27 10:52:01 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> <44506BE6.10301@noaa.gov> Message-ID: On 4/27/06, Keith Goodman wrote: > [...] > In Octave or Matlab, all you need to do is sum(x,1). For example: > > >> x = rand(1,4) > x = > > 0.56755 0.24575 0.53804 0.36521 > > >> sum(x,1) > ans = > > 0.56755 0.24575 0.53804 0.36521 > How is this different from Numpy: >>> x = matrix(rand(4)) >>> sum(x.T, 1) matrix([[ 0.36186805], [ 0.90198107], [ 0.60407661], [ 0.49523327]]) From kwgoodman at gmail.com Thu Apr 27 11:05:03 2006 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu Apr 27 11:05:03 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> <44506BE6.10301@noaa.gov> Message-ID: On 4/27/06, Sasha wrote: > On 4/27/06, Keith Goodman wrote: > > [...] > > In Octave or Matlab, all you need to do is sum(x,1). For example: > > > > >> x = rand(1,4) > > x = > > > > 0.56755 0.24575 0.53804 0.36521 > > > > >> sum(x,1) > > ans = > > > > 0.56755 0.24575 0.53804 0.36521 > > > > How is this different from Numpy: > > >>> x = matrix(rand(4)) > >>> sum(x.T, 1) > matrix([[ 0.36186805], > [ 0.90198107], > [ 0.60407661], > [ 0.49523327]]) > Exactly. That's why the OP doesn't need to write a special function in Matlab called SumColumns. From Chris.Barker at noaa.gov Thu Apr 27 11:11:03 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu Apr 27 11:11:03 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> <44506BE6.10301@noaa.gov> Message-ID: <4451090C.5020901@noaa.gov> Keith Goodman wrote: > Exactly. That's why the OP doesn't need to write a special function in > Matlab called SumColumns. "Didn't". I haven't used MATLAB for much in years. Back in the day, that feature didn't exist. Or at least was poorly enough documented that i didn't think it existed. Matlab didn't used to only support 2-d arrays as well. Anyway, the point was that a (n,) array and a (n,1) array and a (1,n) array are all different, and that difference should be preserved. I'm still confused as to what behavior Sasha wants that doesn't exist. -Chris -- Christopher Barker, Ph.D. 
Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at ee.byu.edu Thu Apr 27 11:17:02 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 27 11:17:02 2006 Subject: [Numpy-discussion] Warnings in NumPy SVN In-Reply-To: References: <44507A9D.8070902@ieee.org> Message-ID: <44510A6E.4090906@ee.byu.edu> Sasha wrote: >On 4/27/06, Travis Oliphant wrote: > > >>[...] >>The function (or macro) needs to implement the operation on the basic >>data-type and if necessary set an error-flag in the floating-point >>registers. >> >>If anybody has time to help implement these basic operations, it would >>be greatly appreciated. >> >> > >I can help. To make sure we don't duplicate our effort, let's do the following: > > > Thanks for your help. >1. I will add place-holders for all the necessary functions to make > > >them return "NotImplemented". > > The Python-object-returning functions are already there. All that is missing is the ctype functions to actually do the computation. So, I'm not sure what you mean. >2. I will then follow up with the list of functions that need to be >filled out and we can then split the work. > > This would be good to get a list. Some of the functions may require some repetition of what's in umathmodule.c. Let's just do the repetition for now and think about code refactoring after we know better what is actually duplicated. >3. We will also need to write tests that will make sure scalars behave >similar to dimensionless arrays. If anyone would like to help with >this, it will be greately appreciated. No C coding skills are >necessary for that. > > Tests would be necessary to ensure consistency. Thanks for jumping in... -Travis From cookedm at physics.mcmaster.ca Thu Apr 27 11:30:05 2006 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Thu Apr 27 11:30:05 2006 Subject: [Numpy-discussion] Warnings in NumPy SVN In-Reply-To: <4450F93D.9050905@colorado.edu> (Fernando Perez's message of "Thu, 27 Apr 2006 11:02:53 -0600") References: <44507A9D.8070902@ieee.org> <4450F93D.9050905@colorado.edu> Message-ID: Fernando Perez writes: > Sasha wrote: >> On 4/27/06, Travis Oliphant wrote: >> >>>[...] >>>The function (or macro) needs to implement the operation on the basic >>>data-type and if necessary set an error-flag in the floating-point >>>registers. >>> >>>If anybody has time to help implement these basic operations, it would >>>be greatly appreciated. >> I can help. To make sure we don't duplicate our effort, let's do >> the following: >> 1. I will add place-holders for all the necessary functions to make >> them return "NotImplemented". > > just a minor reminder: > > raise NotImplementedError > > is the standard idiom for this. Just a note: For __xxx__ methods, "return NotImplemented" is the standard idiom. See section 3.3.8 (Coercion rules) of the Python 2.4 language manual: For most intents and purposes, an operator that returns NotImplemented is treated the same as one that is not implemented at all. I believe the idea is that it's not actually an error for an __xxx__ method to not be implemented, as there are fallbacks. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. 
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From ndarray at mac.com Thu Apr 27 11:32:08 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 27 11:32:08 2006 Subject: [Numpy-discussion] Warnings in NumPy SVN In-Reply-To: <44510A6E.4090906@ee.byu.edu> References: <44507A9D.8070902@ieee.org> <44510A6E.4090906@ee.byu.edu> Message-ID: On 4/27/06, Travis Oliphant wrote: > [ ... ] > > The Python-object-returning functions are already there. All that is > missing is the ctype functions to actually do the computation. So, I'm > not sure what you mean. > I did not realize that. However, it is still reasonable to add non-working prototypes to kill the warnings first marked by /* XXX */. I will do that before the end of the day. > >2. I will then follow up with the list of functions that need to be > >filled out and we can then split the work. > > > > > This would be good to get a list. See attached. -------------- next part -------------- byte_ctype_multiply ubyte_ctype_multiply short_ctype_multiply ushort_ctype_multiply int_ctype_multiply uint_ctype_multiply long_ctype_multiply ulong_ctype_multiply longlong_ctype_multiply ulonglong_ctype_multiply byte_ctype_divide ubyte_ctype_divide short_ctype_divide ushort_ctype_divide int_ctype_divide uint_ctype_divide long_ctype_divide ulong_ctype_divide longlong_ctype_divide ulonglong_ctype_divide byte_ctype_remainder ubyte_ctype_remainder short_ctype_remainder ushort_ctype_remainder int_ctype_remainder uint_ctype_remainder long_ctype_remainder ulong_ctype_remainder longlong_ctype_remainder ulonglong_ctype_remainder byte_ctype_divmod ubyte_ctype_divmod short_ctype_divmod ushort_ctype_divmod int_ctype_divmod uint_ctype_divmod long_ctype_divmod ulong_ctype_divmod longlong_ctype_divmod ulonglong_ctype_divmod byte_ctype_power ubyte_ctype_power short_ctype_power ushort_ctype_power int_ctype_power uint_ctype_power long_ctype_power ulong_ctype_power longlong_ctype_power ulonglong_ctype_power byte_ctype_floor_divide ubyte_ctype_floor_divide short_ctype_floor_divide ushort_ctype_floor_divide int_ctype_floor_divide uint_ctype_floor_divide long_ctype_floor_divide ulong_ctype_floor_divide longlong_ctype_floor_divide ulonglong_ctype_floor_divide byte_ctype_true_divide ubyte_ctype_true_divide short_ctype_true_divide ushort_ctype_true_divide int_ctype_true_divide uint_ctype_true_divide long_ctype_true_divide ulong_ctype_true_divide longlong_ctype_true_divide ulonglong_ctype_true_divide byte_ctype_lshift ubyte_ctype_lshift short_ctype_lshift ushort_ctype_lshift int_ctype_lshift uint_ctype_lshift long_ctype_lshift ulong_ctype_lshift longlong_ctype_lshift ulonglong_ctype_lshift byte_ctype_rshift ubyte_ctype_rshift short_ctype_rshift ushort_ctype_rshift int_ctype_rshift uint_ctype_rshift long_ctype_rshift ulong_ctype_rshift longlong_ctype_rshift ulonglong_ctype_rshift byte_ctype_and ubyte_ctype_and short_ctype_and ushort_ctype_and int_ctype_and uint_ctype_and long_ctype_and ulong_ctype_and longlong_ctype_and ulonglong_ctype_and byte_ctype_or ubyte_ctype_or short_ctype_or ushort_ctype_or int_ctype_or uint_ctype_or long_ctype_or ulong_ctype_or longlong_ctype_or ulonglong_ctype_or byte_ctype_xor ubyte_ctype_xor short_ctype_xor ushort_ctype_xor int_ctype_xor uint_ctype_xor long_ctype_xor ulong_ctype_xor longlong_ctype_xor ulonglong_ctype_xor float_ctype_remainder double_ctype_remainder longdouble_ctype_remainder cfloat_ctype_remainder cdouble_ctype_remainder clongdouble_ctype_remainder float_ctype_divmod double_ctype_divmod longdouble_ctype_divmod 
cfloat_ctype_divmod cdouble_ctype_divmod clongdouble_ctype_divmod float_ctype_power double_ctype_power longdouble_ctype_power cfloat_ctype_power cdouble_ctype_power clongdouble_ctype_power cfloat_cfloat_divide cdouble_cfloat_divide clongdouble_cfloat_divide byte_ctype_negative ubyte_ctype_negative short_ctype_negative ushort_ctype_negative int_ctype_negative uint_ctype_negative long_ctype_negative ulong_ctype_negative longlong_ctype_negative ulonglong_ctype_negative float_ctype_negative double_ctype_negative longdouble_ctype_negative cfloat_ctype_negative cdouble_ctype_negative clongdouble_ctype_negative byte_ctype_positive ubyte_ctype_positive short_ctype_positive ushort_ctype_positive int_ctype_positive uint_ctype_positive long_ctype_positive ulong_ctype_positive longlong_ctype_positive ulonglong_ctype_positive float_ctype_positive double_ctype_positive longdouble_ctype_positive cfloat_ctype_positive cdouble_ctype_positive clongdouble_ctype_positive byte_ctype_absolute ubyte_ctype_absolute short_ctype_absolute ushort_ctype_absolute int_ctype_absolute uint_ctype_absolute long_ctype_absolute ulong_ctype_absolute longlong_ctype_absolute ulonglong_ctype_absolute float_ctype_absolute double_ctype_absolute longdouble_ctype_absolute cfloat_ctype_absolute cdouble_ctype_absolute clongdouble_ctype_absolute byte_ctype_nonzero ubyte_ctype_nonzero short_ctype_nonzero ushort_ctype_nonzero int_ctype_nonzero uint_ctype_nonzero long_ctype_nonzero ulong_ctype_nonzero longlong_ctype_nonzero ulonglong_ctype_nonzero float_ctype_nonzero double_ctype_nonzero longdouble_ctype_nonzero cfloat_ctype_nonzero cdouble_ctype_nonzero clongdouble_ctype_nonzero byte_ctype_invert ubyte_ctype_invert short_ctype_invert ushort_ctype_invert int_ctype_invert uint_ctype_invert long_ctype_invert ulong_ctype_invert longlong_ctype_invert ulonglong_ctype_invert float_ctype_invert double_ctype_invert longdouble_ctype_invert cfloat_ctype_invert cdouble_ctype_invert clongdouble_ctype_invert From cookedm at physics.mcmaster.ca Thu Apr 27 11:32:11 2006 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Thu Apr 27 11:32:11 2006 Subject: [Numpy-discussion] newbie for writing numpy/scipy extensions In-Reply-To: <228EDE46-B760-44BA-A987-273F2ADC9B81@cortechs.net> (Gennan Chen's message of "Thu, 27 Apr 2006 09:55:42 -0700") References: <228EDE46-B760-44BA-A987-273F2ADC9B81@cortechs.net> Message-ID: Gennan Chen writes: > Hi! All, > > I just start writing my own python extension based on numpy. Couple > of questions here: > > 1. I have some utility functions, such as wrappers for > PyArray_GETPTR* needed be access by different extension modules. So, > I put them in utlis.h and utlis.c. In utils.h, I need to include > "numpy/arrayobject.h". But the compilation failed when I include it > again in my extension module function, wrap.c: > > #include "numpy/arrayobject.h" > #include "utils.h" > > When I remove it and use > > #include "utils.h" > > the compilation works. So, is it true that I can only include > arrayobject.h once? What is the compiler error message? > 2. which import I should use in my initial function: > > import_array() This one. It's the one to use for Numeric, numarray, and numpy. > or > import_libnumarray() This is for numarray, the other Numeric derivative. It pulls in the numarray-specific stuff IIRC. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. 
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From oliphant at ee.byu.edu Thu Apr 27 11:36:06 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 27 11:36:06 2006 Subject: [Numpy-discussion] newbie for writing numpy/scipy extensions In-Reply-To: <228EDE46-B760-44BA-A987-273F2ADC9B81@cortechs.net> References: <228EDE46-B760-44BA-A987-273F2ADC9B81@cortechs.net> Message-ID: <44510F04.3020806@ee.byu.edu> Gennan Chen wrote: > Hi! All, > > I just start writing my own python extension based on numpy. Couple > of questions here: > > 1. I have some utility functions, such as wrappers for > PyArray_GETPTR* needed be access by different extension modules. So, > I put them in utlis.h and utlis.c. In utils.h, I need to include > "numpy/arrayobject.h". But the compilation failed when I include it > again in my extension module function, wrap.c: > > #include "numpy/arrayobject.h" > #include "utils.h" > > When I remove it and use > > #include "utils.h" > > the compilation works. So, is it true that I can only include > arrayobject.h once? No, you can include arrayobject.h more than once. However, if you make use of C-API functions (not just macros that access elements of the array) in more than one file for the same extension module, you need to do a couple of things to make it work. In the original file you must define PY_ARRAY_UNIQUE_SYMBOL to something unique to your extension module before you include the arrayobject.h file. In the helper c file you must define PY_ARRAY_UNIQUE_SYMBOL and define NO_IMPORT_ARRAY prior to including the arrayobject.h Thus, in wrap.c you do (feel free to change the name from _chen_extension to something else) #define PY_ARRAY_UNIQUE_SYMBOL _chen_extension #include "numpy/arrayobject.h" and in utils.c you do #define PY_ARRAY_UNIQUE_SYMBOL _chen_extension #define NO_IMPORT_ARRAY #include "numpy/arrayobject.h" > > 2. which import I should use in my initial function: > > import_array() import_array() -Travis From oliphant at ee.byu.edu Thu Apr 27 11:40:10 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 27 11:40:10 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <4451090C.5020901@noaa.gov> References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> <44506BE6.10301@noaa.gov> <4451090C.5020901@noaa.gov> Message-ID: <44510FD2.1090502@ee.byu.edu> Christopher Barker wrote: > Keith Goodman wrote: > >> Exactly. That's why the OP doesn't need to write a special function in >> Matlab called SumColumns. > > > "Didn't". I haven't used MATLAB for much in years. Back in the day, > that feature didn't exist. Or at least was poorly enough documented > that i didn't think it existed. Matlab didn't used to only support 2-d > arrays as well. > > Anyway, the point was that a (n,) array and a (n,1) array and a (1,n) > array are all different, and that difference should be preserved. > > I'm still confused as to what behavior Sasha wants that doesn't exist. I'm not exactly sure. But, one of the things I think he has suggested (please tell me if my understanding is wrong) is to allow a 2x3 array to be "broadcast" to a (2n)x(3m) array by repeated copying as needed. 
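For concreteness, that kind of repetition can already be spelled out by hand (a rough sketch, with n = m = 2):

>>> from numpy import arange, concatenate
>>> a = arange(6).reshape(2,3)
>>> b = concatenate([a, a], axis=0)   # 2x3 -> 4x3
>>> c = concatenate([b, b], axis=1)   # 4x3 -> 4x6

so the proposal is essentially to let ufuncs do this copying implicitly whenever the shapes are exact multiples.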
-Travis From gnchen at cortechs.net Thu Apr 27 12:24:38 2006 From: gnchen at cortechs.net (Gennan Chen) Date: Thu Apr 27 12:24:38 2006 Subject: [Numpy-discussion] newbie for writing numpy/scipy extensions In-Reply-To: <44510F04.3020806@ee.byu.edu> References: <228EDE46-B760-44BA-A987-273F2ADC9B81@cortechs.net> <44510F04.3020806@ee.byu.edu> Message-ID: Thanks! That solve the problem. May I ask what does those #define really means?? Gen On Apr 27, 2006, at 11:35 AM, Travis Oliphant wrote: > Gennan Chen wrote: > >> Hi! All, >> >> I just start writing my own python extension based on numpy. >> Couple of questions here: >> >> 1. I have some utility functions, such as wrappers for >> PyArray_GETPTR* needed be access by different extension modules. >> So, I put them in utlis.h and utlis.c. In utils.h, I need to >> include "numpy/arrayobject.h". But the compilation failed when I >> include it again in my extension module function, wrap.c: >> >> #include "numpy/arrayobject.h" >> #include "utils.h" >> >> When I remove it and use >> >> #include "utils.h" >> >> the compilation works. So, is it true that I can only include >> arrayobject.h once? > > > No, you can include arrayobject.h more than once. However, if you > make use of C-API functions (not just macros that access elements > of the array) in more than one file for the same extension module, > you need to do a couple of things to make it work. > > In the original file you must define PY_ARRAY_UNIQUE_SYMBOL to > something unique to your extension module before you include the > arrayobject.h file. > > In the helper c file you must define PY_ARRAY_UNIQUE_SYMBOL and > define NO_IMPORT_ARRAY prior to including the arrayobject.h > > Thus, in wrap.c you do (feel free to change the name from > _chen_extension to something else) > > #define PY_ARRAY_UNIQUE_SYMBOL _chen_extension #include "numpy/ > arrayobject.h" > > and in > > utils.c you do > > #define PY_ARRAY_UNIQUE_SYMBOL _chen_extension #define > NO_IMPORT_ARRAY > #include "numpy/arrayobject.h" > > >> >> 2. which import I should use in my initial function: >> >> import_array() > > > import_array() > > -Travis > > From gnchen at cortechs.net Thu Apr 27 12:24:41 2006 From: gnchen at cortechs.net (Gennan Chen) Date: Thu Apr 27 12:24:41 2006 Subject: [Numpy-discussion] newbie for writing numpy/scipy extensions In-Reply-To: References: <228EDE46-B760-44BA-A987-273F2ADC9B81@cortechs.net> Message-ID: <8CD47186-A354-4C8A-B5AF-8BEC2CE82D2E@cortechs.net> Got it. Looks like ndimage still used the old one. Gen-Nan Chen, PhD Chief Scientist Research and Development Group CorTechs Labs Inc (www.cortechs.net) 1020 Prospect St., #304, La Jolla, CA, 92037 Tel: 1-858-459-9700 ext 16 Fax: 1-858-459-9705 Email: gnchen at cortechs.net On Apr 27, 2006, at 11:31 AM, David M. Cooke wrote: > Gennan Chen writes: > >> Hi! All, >> >> I just start writing my own python extension based on numpy. Couple >> of questions here: >> >> 1. I have some utility functions, such as wrappers for >> PyArray_GETPTR* needed be access by different extension modules. So, >> I put them in utlis.h and utlis.c. In utils.h, I need to include >> "numpy/arrayobject.h". But the compilation failed when I include it >> again in my extension module function, wrap.c: >> >> #include "numpy/arrayobject.h" >> #include "utils.h" >> >> When I remove it and use >> >> #include "utils.h" >> >> the compilation works. So, is it true that I can only include >> arrayobject.h once? > > What is the compiler error message? > >> 2. 
which import I should use in my initial function: >> >> import_array() > > This one. It's the one to use for Numeric, numarray, and numpy. > >> or >> import_libnumarray() > > This is for numarray, the other Numeric derivative. It pulls in the > numarray-specific stuff IIRC. > > -- > |>|\/|< > /--------------------------------------------------------------------- > -----\ > |David M. Cooke http:// > arbutus.physics.mcmaster.ca/dmc/ > |cookedm at physics.mcmaster.ca > From ndarray at mac.com Thu Apr 27 12:29:03 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 27 12:29:03 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <44510FD2.1090502@ee.byu.edu> References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> <44506BE6.10301@noaa.gov> <4451090C.5020901@noaa.gov> <44510FD2.1090502@ee.byu.edu> Message-ID: On 4/27/06, Travis Oliphant wrote: > [...] > > I'm still confused as to what behavior Sasha wants that doesn't exist. > > > I'm not exactly sure. But, one of the things I think he has suggested > (please tell me if my understanding is wrong) is to allow a 2x3 array to > be "broadcast" to a (2n)x(3m) array by repeated copying as needed. Yes, this is the only new feature that I've suggested. I was also hoping that the same code that allows shape=(3,) being broadcast to shape (2,3) can be reused to broadcast (3,) to (6,). The idea is that since in terms of memory operations broadcasting and repetition is the same, the code can be reused. The idea is that since repetition can be achieved using broadcasting: >>> x = zeros(3) >>> x.reshape((2,3)) += arange(3) >>> x array([0, 1, 2, 0, 1, 2]) if we allow x += arange(3), it can use the same code as broadcasting internally. From ndarray at mac.com Thu Apr 27 12:30:05 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 27 12:30:05 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> <44506BE6.10301@noaa.gov> <4451090C.5020901@noaa.gov> <44510FD2.1090502@ee.byu.edu> Message-ID: On 4/27/06, Sasha wrote: > >>> x.reshape((2,3)) += arange(3) Oops, that should have been >>> x.reshape((2,3))[...] += arange(3) From Fernando.Perez at colorado.edu Thu Apr 27 12:58:02 2006 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Thu Apr 27 12:58:02 2006 Subject: [Numpy-discussion] Warnings in NumPy SVN In-Reply-To: References: <44507A9D.8070902@ieee.org> <4450F93D.9050905@colorado.edu> Message-ID: <44512213.9090902@colorado.edu> David M. Cooke wrote: > Fernando Perez writes: > > >>Sasha wrote: >> >>>On 4/27/06, Travis Oliphant wrote: >>> >>> >>>>[...] >>>>The function (or macro) needs to implement the operation on the basic >>>>data-type and if necessary set an error-flag in the floating-point >>>>registers. >>>> >>>>If anybody has time to help implement these basic operations, it would >>>>be greatly appreciated. >>> >>>I can help. To make sure we don't duplicate our effort, let's do >>>the following: >>>1. I will add place-holders for all the necessary functions to make >>>them return "NotImplemented". >> >>just a minor reminder: >> >> raise NotImplementedError >> >>is the standard idiom for this. > > > Just a note: For __xxx__ methods, "return NotImplemented" is the > standard idiom. 
See section 3.3.8 (Coercion rules) of the Python 2.4 > language manual: > > For most intents and purposes, an operator that returns > NotImplemented is treated the same as one that is not implemented > at all. > > I believe the idea is that it's not actually an error for an __xxx__ > method to not be implemented, as there are fallbacks. You are right. It's worth remembering that the actual syntaxes are return NotImplemented and raise NotImplementedError /without/ quotes (as per the original msg), since these are actual Python built-ins, not strings. That way they can be properly handled by their return value or proper exception handling. Cheers, f From nvf at MIT.EDU Thu Apr 27 21:02:03 2006 From: nvf at MIT.EDU (Nick Fotopoulos) Date: Thu Apr 27 21:02:03 2006 Subject: [Numpy-discussion] Freeing memory allocated in C Message-ID: <1C119933-F93B-47C6-AADA-4A61DF16B745@mit.edu> Dear numpy-discussion, I have written a Python module in C which wraps a C library (FrameL) in order to read data from specially formatted files into Python arrays. It works, but I think I have a memory leak, and I can't see what I might be doing wrong. This Python wrapper is almost identical to a Matlab wrapper, but the Matlab version doesn't leak. Perhaps someone here can help me out? I have read in many places that to return an array, one should wrap with PyArray_FromDimsAndData (or more modern versions) and then return it without freeing the memory. Does the same principle hold for strings? Are the following example snippets correct? // output2 = x-axis values relative to first data point. data = malloc(nData*sizeof(double)); for(i=0; i<nData; i++) { data[i] = vect->startX[0]+(double)i*dt; } shape[0] = nData; out2 = (PyArrayObject *) PyArray_FromDimsAndData(1,shape,PyArray_DOUBLE,(char *)data); //snip // output5 = gps start time as a string utc = vect->GTime - vect->ULeapS + FRGPSTAI; out5 = malloc(200*sizeof(char)); sprintf(out5,"Starting GPS time:%.1f UTC=%s", vect->GTime,FrStrGTime(utc)); //snip -- Free all memory not assigned to a return object return Py_BuildValue("(OOOdsss)",out1,out2,out3,out4,out5,out6,out7); I see in the Numpy book that I should modernize PyArray_FromDimsAndData, but will it be incompatible with users who have only Numeric? If the code above should not leak under your inspection, are there any other common places that Python C modules often leak that I should check? As a side note, here is how I have been defining "leak". I have been measuring memory usage by opening a pipe to ps to check rss between reading in frames and invoking del on them. Memory usage increases, but does not decrease. In contrast, if I commit the same data in an array to a pickle file and read that in, invoking del reduces memory usage.
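(For reference, the measurement itself is roughly the following, with ps reporting the resident set size in kilobytes:

import os

def rss():
    # resident set size of the current process, as reported by ps
    return int(os.popen('ps -o rss= -p %d' % os.getpid()).read())

and the comparison is just rss() before and after the del.)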
Many thanks, Nick From robert.kern at gmail.com Thu Apr 27 21:14:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Thu Apr 27 21:14:02 2006 Subject: [Numpy-discussion] Re: Freeing memory allocated in C In-Reply-To: <1C119933-F93B-47C6-AADA-4A61DF16B745@mit.edu> References: <1C119933-F93B-47C6-AADA-4A61DF16B745@mit.edu> Message-ID: Nick Fotopoulos wrote: > Dear numpy-discussion, > > I have written a python module in C which wraps a C library (FrameL) in > order to read data from specially formatted files into Python arrays. > It works, but I think have a memory leak, and I can't see what I might > be doing wrong. This Python wrapper is almost identical to a Matlab > wrapper, but the Matlab version doesn't leak. Perhaps someone here can > help me out? > > I have read in many places that to return an array, one should wrap > with PyArray_FromDimsAndData (or more modern versions) and then return > it without freeing the memory. Does the same principle hold for > strings? Are the following example snippets correct? > > // output2 = x-axis values relative to first data point. > data = malloc(nData*sizeof(double)); > for(i=0; i<nData; i++) { > data[i] = vect->startX[0]+(double)i*dt; > } > shape[0] = nData; > out2 = (PyArrayObject *) > PyArray_FromDimsAndData(1,shape,PyArray_DOUBLE,(char *)data); I wouldn't rely on PyArray_FromDimsAndData doing the right thing. Instead of malloc'ing a block of memory, why don't you create an empty array of the right size, use its data pointer to fill it with that for-loop, and then return that array object? > //snip > > // output5 = gps start time as a string > utc = vect->GTime - vect->ULeapS + FRGPSTAI; > out5 = malloc(200*sizeof(char)); > sprintf(out5,"Starting GPS time:%.1f UTC=%s", > vect->GTime,FrStrGTime(utc)); > > //snip -- Free all memory not assigned to a return object > > return Py_BuildValue("(OOOdsss)",out1,out2,out3,out4,out5,out6,out7); > > I see in the Numpy book that I should modernize > PyArray_FromDimsAndData, but will it be incompatible with users who > have only Numeric? Yes. However, I would suggest that new code should probably just use numpy fully, especially if the restrictions of the old Numeric API are causing you pain. The longer people support both, the longer people will *have* to support both. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From oliphant.travis at ieee.org Thu Apr 27 21:40:04 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 27 21:40:04 2006 Subject: [Numpy-discussion] Freeing memory allocated in C In-Reply-To: <1C119933-F93B-47C6-AADA-4A61DF16B745@mit.edu> References: <1C119933-F93B-47C6-AADA-4A61DF16B745@mit.edu> Message-ID: <44519C6E.80006@ieee.org> Nick Fotopoulos wrote: > Dear numpy-discussion, > > I have written a python module in C which wraps a C library (FrameL) > in order to read data from specially formatted files into Python > arrays. It works, but I think have a memory leak, and I can't see > what I might be doing wrong. This Python wrapper is almost identical > to a Matlab wrapper, but the Matlab version doesn't leak. Perhaps > someone here can help me out? > > I have read in many places that to return an array, one should wrap > with PyArray_FromDimsAndData (or more modern versions) and then return > it without freeing the memory. Does the same principle hold for > strings?
Are the following example snippets correct? Why don't you just use PyArray_FromDims and let NumPy manage the memory? FromDimsAndData is only for situations where you can't manage the memory with Python. Therefore the memory is never freed. If you do want to have NumPy deallocate the memory when you are done, then you have to 1) Make sure you are using the same allocator as NumPy is... _pya_malloc is defined in arrayobject.h (in NumPy but not in Numeric) 2) Reset the array flag so that OWN_DATA is set out2->flags |= OWN_DATA As long as you are using the same memory allocator, this should work. The OWN_DATA flag instructs the deallocator to free the data. But, I would strongly suggest just using PyArray_FromDims and let NumPy allocate the new array for you. > > // output2 = x-axis values relative to first data point. > data = malloc(nData*sizeof(double)); > for(i=0; i<nData; i++) { > data[i] = vect->startX[0]+(double)i*dt; > } > shape[0] = nData; > out2 = (PyArrayObject *) > PyArray_FromDimsAndData(1,shape,PyArray_DOUBLE,(char *)data); > > //snip > > // output5 = gps start time as a string > utc = vect->GTime - vect->ULeapS + FRGPSTAI; > out5 = malloc(200*sizeof(char)); > sprintf(out5,"Starting GPS time:%.1f UTC=%s", > vect->GTime,FrStrGTime(utc)); > > //snip -- Free all memory not assigned to a return object > > return Py_BuildValue("(OOOdsss)",out1,out2,out3,out4,out5,out6,out7); > > > I see in the Numpy book that I should modernize > PyArray_FromDimsAndData, but will it be incompatible with users who > have only Numeric? Yes, the only issue, however, is that PyArray_FromDims and friends will only allow int-length sizes which on 64-bit computers is not as large as intp-length sizes. So, if you don't care about allowing large sizes then you can use the old Numeric C-API. > > If the code above should not leak under your inspection, are there any > other common places that python C modules often leak that I should check? All of the malloc calls in your code leak. In general you should not assume that Python will deallocate memory you have allocated. Python uses its own memory manager so even if you manage to arrange things so that Python will free your memory (and you really have to hack things to do that), then you can run into trouble if you try mixing system malloc calls with Python's deallocation. The proper strategy for your arrays is to use PyArray_SimpleNew and then get the data-pointer to fill using PyArray_DATA(...). The proper way to handle strings is to create a new string (say using PyString_FromFormat) and then return everything as objects. /* make sure shape is defined as intp unless you don't care about 64-bit */ obj2 = PyArray_SimpleNew(1, shape, PyArray_DOUBLE); data = (double *)PyArray_DATA(obj2); [snip...] out5 = PyString_FromFormat("Starting GPS time:%.1f UTC=%s", vect->GTime,FrStrGTime(utc)); return Py_BuildValue("(NNNdNNN)",out1,out2,out3,out4,out5,out6,out7); Make sure you use the 'N' tag so that another reference count isn't generated. The 'O' tag will increase the reference count of your objects by one which is not necessarily what you want (but sometimes you do). Good luck, -Travis From oliphant.travis at ieee.org Fri Apr 28 00:14:16 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Fri Apr 28 00:14:16 2006 Subject: [Numpy-discussion] Scalar math module is ready for testing Message-ID: <4451C076.40608@ieee.org> The scalar math module is complete and ready to be tested. It should speed up code that relies heavily on scalar arithmetic by by-passing the ufunc machinery.
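To get a feel for the speed difference, something like the following can be timed once in a fresh interpreter with the scalarmath import and once without it (fresh interpreters because, as noted below, it cannot be switched off) -- just a sketch:

import timeit
setup = "import numpy; import numpy.core.scalarmath; x = numpy.float64(0.5)"
t = timeit.Timer("x*x + x", setup)
print t.timeit(100000)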
It needs lots of testing to be sure that it is doing the "right" thing. To enable scalarmath you need to import numpy.core.scalarmath You cannot disable it once it's enabled except by restarting Python. If we need that feature we can add it. The array scalars respond to the error modes of ufuncs. There is an experimental function called alter_scalars that replaces the Python int, float, and complex number tables with the array scalar equivalents. Thus, to amaze (or seriously annoy) your Python friends you can do import numpy.core.scalarmath as ncs ncs.alter_scalars(int) 1 / 0 This will return 0 unless you change the error modes... ncs.restore_scalars(int) Will put things back the way Guido intended.... Please try it out and send us error reports. Many thanks to Sasha for his help in getting all the code so it at least compiles and loads. All bugs should be blamed on me, though... Best, -Travis From arnd.baecker at web.de Fri Apr 28 00:48:04 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Fri Apr 28 00:48:04 2006 Subject: [Numpy-discussion] Scalar math module is ready for testing In-Reply-To: <4451C076.40608@ieee.org> References: <4451C076.40608@ieee.org> Message-ID: Hi Travis, On Fri, 28 Apr 2006, Travis Oliphant wrote: > > The scalar math module is complete and ready to be tested. It should > speed up code that relies heavily on scalar arithmetic by by-passing the > ufunc machinery. > > It needs lots of testing to be sure that it is doing the "right" > thing. To enable scalarmath you need to > > import numpy.core.scalarmath > > You cannot disable it once it's enabled except by restarting Python. If > we need that feature we can add it. The array scalars respond to the > error modes of ufuncs. > > There is an experimental function called alter_scalars that replaces the > Python int, float, and complex number tables with the array scalar > equivalents. Thus, to amaze (or seriously annoy) your Python friends LOL ;-) > you can do > > import numpy.core.scalarmath as ncs > > ncs.alter_scalars(int) > > 1 / 0 > > This will return 0 unless you change the error modes... > > ncs.restore_scalars(int) > > Will put things back the way Guido intended.... > > > Please try it out and send us error reports. Many thanks to Sasha for > his help in getting all the code so it at least compiles and loads. All > bugs should be blamed on me, though...
Well, it does not compile for me (64 Bit opteron, as usual;-): gcc options: '-pthread -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC' compile options: '-Inumpy/core/include -Ibuild/src.linux-x86_64-2.4/numpy/core -Inumpy/core/src -Inumpy/core/include -I/scr/python/include/python2.4 -c' gcc: build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.c build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.c:472: error: redefinition of 'ulong_ctype_multiply' build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.c:421: error: previous definition of 'ulong_ctype_multiply' was here build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.c:421: warning: 'ulong_ctype_multiply' defined but not used build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.c:472: error: redefinition of 'ulong_ctype_multiply' build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.c:421: error: previous definition of 'ulong_ctype_multiply' was here build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.c:421: warning: 'ulong_ctype_multiply' defined but not used error: Command "gcc -pthread -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -Inumpy/core/include -Ibuild/src.linux-x86_64-2.4/numpy/core -Inumpy/core/src -Inumpy/core/include -I/scr/python/include/python2.4 -c build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.c -o build/temp.linux-x86_64-2.4/build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.o" failed with exit status 1 (I can't look into this now - meeting in -2 minutes ;-) Best, Arnd From schofield at ftw.at Fri Apr 28 01:32:00 2006 From: schofield at ftw.at (Ed Schofield) Date: Fri Apr 28 01:32:00 2006 Subject: [Numpy-discussion] Scalar math module is ready for testing In-Reply-To: <4451C076.40608@ieee.org> References: <4451C076.40608@ieee.org> Message-ID: <4451D3F0.7080408@ftw.at> Travis Oliphant wrote: > > The scalar math module is complete and ready to be tested. It should > speed up code that relies heavily on scalar arithmetic by by-passing > the ufunc machinery. Excellent! > It needs lots of testing to be sure that it is doing the "right" thing. With revision 2454 I get a segfault in numpy.test() after importing numpy.core.scalarmath: check_1 (numpy.distutils.tests.test_misc_util.test_appendpath) ... ok check_2 (numpy.distutils.tests.test_misc_util.test_appendpath) ... ok check_3 (numpy.distutils.tests.test_misc_util.test_appendpath) ... ok check_gpaths (numpy.distutils.tests.test_misc_util.test_gpaths) ... ok check_1 (numpy.distutils.tests.test_misc_util.test_minrelpath) ... ok check_singleton (numpy.lib.tests.test_getlimits.test_double) Program received signal SIGSEGV, Segmentation fault. [Switching to Thread -1208403744 (LWP 11232)] 0xb7142cf7 in int_richcompare (self=0x81c0ab8, other=0x8141dbc, cmp_op=3) at build/src.linux-i686-2.4/numpy/core/src/scalarmathmodule.c:19120 19120 PyArrayScalar_RETURN_TRUE; (gdb) bt #0 0xb7142cf7 in int_richcompare (self=0x81c0ab8, other=0x8141dbc, cmp_op=3) at build/src.linux-i686-2.4/numpy/core/src/scalarmathmodule.c:19120 #1 0x0807ce1f in PyObject_Print () #2 0x0807e451 in PyObject_RichCompare () Is this helpful? 
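A session along these lines should reproduce it (assuming python and numpy were built with debugging symbols):

$ gdb --args python -c "import numpy.core.scalarmath; import numpy; numpy.test()"
(gdb) run
... SIGSEGV ...
(gdb) bt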
-- Ed From steffen.loeck at gmx.de Fri Apr 28 01:34:07 2006 From: steffen.loeck at gmx.de (Steffen Loeck) Date: Fri Apr 28 01:34:07 2006 Subject: [Numpy-discussion] Scalar math module is ready for testing In-Reply-To: <4451C076.40608@ieee.org> References: <4451C076.40608@ieee.org> Message-ID: <200604281033.19781.steffen.loeck@gmx.de> On Friday 28 April 2006 09:12 am, Travis Oliphant wrote: > Please try it out and send us error reports. Many thanks to Sasha for > his help in getting all the code so it at least compiles and loads. All > bugs should be blamed on me, though... Running the tests with numpy.test(10) i get: /test/lib/python2.3/site-packages/numpy/testing/numpytest.py:179: DeprecationWarning: Non-ASCII character '\xf2' in file/test/lib/python2.3/site-packages/numpy/lib/tests/test_ufunclike.pyc on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details m = imp.load_module(name, open(filename), filename,('.py','U',1)) E................................../test/lib/python2.3/site-packages/numpy/testing/numpytest.py:179: DeprecationWarning: Non-ASCII character '\xf2' in file test/lib/python2.3/site-packages/numpy/lib/tests/test_polynomial.pyc on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details m = imp.load_module(name, open(filename), filename,('.py','U',1)) E........................................................................... ====================================================================== ERROR: check_doctests (numpy.lib.tests.test_ufunclike.test_docs) ---------------------------------------------------------------------- Traceback (most recent call last): File "/test/lib/python2.3/site-packages/numpy/lib/tests/test_ufunclike.py", line 59, in check_doctests def check_doctests(self): return self.rundocs() File "/test//lib/python2.3/site-packages/numpy/testing/numpytest.py", line 179, in rundocs m = imp.load_module(name, open(filename), filename,('.py','U',1)) File "test/lib/python2.3/site-packages/numpy/lib/tests/test_ufunclike.pyc", line 1 ;? ^ SyntaxError: invalid syntax ====================================================================== ERROR: check_doctests (numpy.lib.tests.test_polynomial.test_docs) ---------------------------------------------------------------------- Traceback (most recent call last): File "/test/lib/python2.3/site-packages/numpy/lib/tests/test_polynomial.py", line 79, in check_doctests def check_doctests(self): return self.rundocs() File "/test//lib/python2.3/site-packages/numpy/testing/numpytest.py", line 179, in rundocs m = imp.load_module(name, open(filename), filename,('.py','U',1)) File "/test/lib/python2.3/site-packages/numpy/lib/tests/test_polynomial.pyc", line 1 ;? ^ SyntaxError: invalid syntax I have no idea, where this comes from. Regards, Steffen From fullung at gmail.com Fri Apr 28 02:39:03 2006 From: fullung at gmail.com (Albert Strasheim) Date: Fri Apr 28 02:39:03 2006 Subject: [Numpy-discussion] newbie for writing numpy/scipy extensions In-Reply-To: <8CD47186-A354-4C8A-B5AF-8BEC2CE82D2E@cortechs.net> Message-ID: <018c01c66aa7$77764480$0a84a8c0@dsp.sun.ac.za> Hello all I've collected the information from this thread along with links to some recent threads on writing C extensions on the wiki at: http://www.scipy.org/Cookbook/C_Extensions Feel free to contribute! 
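As a companion to the wiki page, the build side of a small extension can be driven by numpy.distutils in a handful of lines (a sketch only; the module and source file names here are hypothetical, and the Extension import is assumed to be re-exported by numpy.distutils.core):

    # setup.py -- build a C extension against the numpy headers
    from numpy.distutils.core import setup, Extension

    setup(name='mymodule',
          version='0.1',
          ext_modules=[Extension('mymodule',
                                 sources=['wrap.c', 'utils.c'])])

Running "python setup.py build_ext --inplace" should then produce an importable mymodule next to the sources.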
Regards, Albert

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-
> discussion-admin at lists.sourceforge.net] On Behalf Of Gennan Chen
> Sent: 27 April 2006 21:23
> To: David M.Cooke
> Cc: Numpy-discussion at lists.sourceforge.net
> Subject: Re: [Numpy-discussion] newbie for writing numpy/scipy extensions
>
> Got it. Looks like ndimage still used the old one.
>
> Gen-Nan Chen, PhD
> Chief Scientist
> Research and Development Group
> CorTechs Labs Inc (www.cortechs.net)
> 1020 Prospect St., #304, La Jolla, CA, 92037
> Tel: 1-858-459-9700 ext 16
> Fax: 1-858-459-9705
> Email: gnchen at cortechs.net
>
> On Apr 27, 2006, at 11:31 AM, David M. Cooke wrote:
>
> > Gennan Chen writes:
> >
> >> Hi! All,
> >>
> >> I just started writing my own python extension based on numpy. Couple
> >> of questions here:
> >>
> >> 1. I have some utility functions, such as wrappers for
> >> PyArray_GETPTR*, that need to be accessed by different extension modules.
> >> So, I put them in utils.h and utils.c. In utils.h, I need to include
> >> "numpy/arrayobject.h". But the compilation failed when I include it
> >> again in my extension module function, wrap.c:
> >>
> >> #include "numpy/arrayobject.h"
> >> #include "utils.h"
> >>
> >> When I remove it and use
> >>
> >> #include "utils.h"
> >>
> >> the compilation works. So, is it true that I can only include
> >> arrayobject.h once?
> >
> > What is the compiler error message?
> >
> >> 2. which import I should use in my initial function:
> >>
> >> import_array()
> >
> > This one. It's the one to use for Numeric, numarray, and numpy.
> >
> >> or
> >> import_libnumarray()
> >
> > This is for numarray, the other Numeric derivative. It pulls in the
> > numarray-specific stuff IIRC.
> >
> > --
> > |>|\/|<
> > /--------------------------------------------------------------------------\
> > |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/
> > |cookedm at physics.mcmaster.ca

From lcordier at point45.com Fri Apr 28 06:36:10 2006
From: lcordier at point45.com (Louis Cordier)
Date: Fri Apr 28 06:36:10 2006
Subject: [Numpy-discussion] Bug
Message-ID: 

Hi, I am not sure if this is the proper place to do a bug post. I looked at the active tickets on http://projects.scipy.org/scipy/numpy/ but didn't feel confident to go and create a new one. ;)

Anyway the current release version 0.9.6 has some broken behavior. I guess some example code would illustrate it best.

---8<----------------

>>> z = numpy.zeros((10,10), 'O')
>>> z.fill(None)
>>> z.fill([])
Segmentation fault (core dumped)

This happens on both Linux and FreeBSD machines. (both builds use *_lite versions of Lapack)

Linux bellagio 2.6.11-1.1369_FC4 #1 Thu Jun 2 22:55:56 EDT 2005 i686 i686 i386 GNU/Linux
Python 2.4.1
gcc version 4.0.0 20050519 (Red Hat 4.0.0-8)

FreeBSD cerberus.intranet 5.4-RELEASE-p12 FreeBSD 5.4-RELEASE-p12 #0: Wed Mar 15 16:06:48 UTC 2006
Python 2.4.2
gcc version 3.4.2 [FreeBSD] 20040728

I assume fill() will need to make a copy of the object for each coordinate in the matrix.

---8<----------------

While,

>>> import numpy
>>> z = numpy.zeros((2,2), 'O')
>>> z
array([[0, 0],
       [0, 0]], dtype=object)
>>> z.fill([1])
>>> z
array([[1, 1],
       [1, 1]], dtype=object)

and

>>> z.fill([1,2,3])
>>> z
array([[1, 1],
       [1, 1]], dtype=object)

I would have expected,

>>> z
array([[[1], [1]],
       [[1], [1]]], dtype=object)

and

>>> z
array([[[1, 2, 3], [1, 2, 3]],
       [[1, 2, 3], [1, 2, 3]]], dtype=object)

Regards, Louis. 
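For anyone hitting this today, an explicit loop gives the expected per-element behaviour and avoids the crashing fill() path entirely (a minimal sketch in plain Python; it assumes only that assigning to a single element of an object array stores the object itself):

    import numpy
    z = numpy.zeros((2, 2), 'O')
    # store a fresh list in every cell instead of calling z.fill([1, 2, 3])
    for i in range(z.shape[0]):
        for j in range(z.shape[1]):
            z[i, j] = [1, 2, 3]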
-- 
Louis Cordier cell: +27721472305
Point45 Entertainment (Pty) Ltd. http://www.point45.org

From ndarray at mac.com Fri Apr 28 09:04:09 2006
From: ndarray at mac.com (Sasha)
Date: Fri Apr 28 09:04:09 2006
Subject: [Numpy-discussion] Bug
In-Reply-To: 
References: 
Message-ID: 

The core dump is definitely a bug. I reproduced it on my Linux system. Please create a ticket. I am not sure whether fill should copy objects or not. When you populate an array with immutable objects, creating multiple copies is a waste.

On 4/28/06, Louis Cordier wrote:
>
> Hi, I am not sure if this is the proper place to do a bug post.
> I looked at the active tickets on http://projects.scipy.org/scipy/numpy/
> but didn't feel confident to go and create a new one. ;)
>
> Anyway the current release version 0.9.6 have some broken behavior.
> I guess some example code would illustrate it best.
>
> ---8<----------------
>
> >>> z = numpy.zeros((10,10), 'O')
> >>> z.fill(None)
> >>> z.fill([])
> Segmentation fault (core dumped)
>
> This happens on both Linux and FreeBSD machines.
> (both builds use *_lite versions of Lapack)
>
> Linux bellagio 2.6.11-1.1369_FC4 #1 Thu Jun 2 22:55:56 EDT 2005 i686 i686
> i386 GNU/Linux
> Python 2.4.1
> gcc version 4.0.0 20050519 (Red Hat 4.0.0-8)
>
> FreeBSD cerberus.intranet 5.4-RELEASE-p12 FreeBSD 5.4-RELEASE-p12 #0: Wed
> Mar 15 16:06:48 UTC 2006
> Python 2.4.2
> gcc version 3.4.2 [FreeBSD] 20040728
>
> I assume fill() will need to make a copy, of the object
> for each coordinate in the matix.
>
> ---8<----------------
>
> While,
>
> >>> import numpy
> >>> z = numpy.zeros((2,2), 'O')
> >>> z
> array([[0, 0],
>        [0, 0]], dtype=object)
> >>> z.fill([1])
> >>> z
> array([[1, 1],
>        [1, 1]], dtype=object)
>
> and
>
> >>> z.fill([1,2,3])
> >>> z
> array([[1, 1],
>        [1, 1]], dtype=object)
>
> I would have expected,
>
> >>> z
> array([[[1], [1]],
>        [[1], [1]]], dtype=object)
>
> and
>
> >>> z
> array([[[1, 2, 3], [1, 2, 3]],
>        [[1, 2, 3], [1, 2, 3]]], dtype=object)
>
> Regards, Louis.
>
> --
> Louis Cordier cell: +27721472305
> Point45 Entertainment (Pty) Ltd. http://www.point45.org
>
> -------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion
>

From ndarray at mac.com Fri Apr 28 10:04:08 2006
From: ndarray at mac.com (Sasha)
Date: Fri Apr 28 10:04:08 2006
Subject: [Numpy-discussion] Bug
In-Reply-To: 
References: 
Message-ID: 

See <http://projects.scipy.org/scipy/numpy/ticket/86>.

On 4/28/06, Sasha wrote:
> The core dump is definitely a bug. I reproduced it on my Linux
> system. Please create a ticket. I am not sure whether fill should
> copy objects or not. When you populate an array with immutable
> objects, creating multiple copies is a waste.
>
> On 4/28/06, Louis Cordier wrote:
> >
> > Hi, I am not sure if this is the proper place to do a bug post.
> > I looked at the active tickets on http://projects.scipy.org/scipy/numpy/
> > but didn't feel confident to go and create a new one. ;)
> >
> > Anyway the current release version 0.9.6 have some broken behavior.
> > I guess some example code would illustrate it best. 
> > > > ---8<---------------- > > > > >>> z = numpy.zeros((10,10), 'O') > > >>> z.fill(None) > > >>> z.fill([]) > > Segmentation fault (core dumped) > > > > This happens on both Linux and FreeBSD machines. > > (both builds use *_lite versions of Lapack) > > > > Linux bellagio 2.6.11-1.1369_FC4 #1 Thu Jun 2 22:55:56 EDT 2005 i686 i686 > > i386 GNU/Linux > > Python 2.4.1 > > gcc version 4.0.0 20050519 (Red Hat 4.0.0-8) > > > > FreeBSD cerberus.intranet 5.4-RELEASE-p12 FreeBSD 5.4-RELEASE-p12 #0: Wed > > Mar 15 16:06:48 UTC 2006 > > Python 2.4.2 > > gcc version 3.4.2 [FreeBSD] 20040728 > > > > I assume fill() will need to make a copy, of the object > > for each coordinate in the matix. > > > > ---8<---------------- > > > > While, > > > > >>> import numpy > > >>> z = numpy.zeros((2,2), 'O') > > >>> z > > array([[0, 0], > > [0, 0]], dtype=object) > > >>> z.fill([1]) > > >>> z > > array([[1, 1], > > [1, 1]], dtype=object) > > > > and > > > > >>> z.fill([1,2,3]) > > >>> z > > array([[1, 1], > > [1, 1]], dtype=object) > > > > > > I would have expected, > > > > >>> z > > array([[[1], [1]], > > [[1], [1]]], dtype=object) > > > > and > > > > >>> z > > array([[[1, 2, 3], [1, 2, 3]], > > [[1, 2, 3], [1, 2, 3]]], dtype=object) > > > > > > Regards, Louis. > > > > -- > > Louis Cordier cell: +27721472305 > > Point45 Entertainment (Pty) Ltd. http://www.point45.org > > > > > > > > ------------------------------------------------------- > > Using Tomcat but need to do more? Need to support web services, security? > > Get stuff done quickly with pre-integrated technology to make your job easier > > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > From lcordier at point45.com Fri Apr 28 10:24:04 2006 From: lcordier at point45.com (Louis Cordier) Date: Fri Apr 28 10:24:04 2006 Subject: [Numpy-discussion] Bug In-Reply-To: References: Message-ID: > See . >> > >>> z.fill([1,2,3]) >> > >>> z >> > array([[1, 1], >> > [1, 1]], dtype=object) >> > >> > I would have expected, >> > >> > >>> z >> > array([[[1, 2, 3], [1, 2, 3]], >> > [[1, 2, 3], [1, 2, 3]]], dtype=object) Souldn't the second example be a ticket ? Or is it part of #86 ? Regards, Louis. -- Louis Cordier cell: +27721472305 Point45 Entertainment (Pty) Ltd. http://www.point45.org From ndarray at mac.com Fri Apr 28 10:49:02 2006 From: ndarray at mac.com (Sasha) Date: Fri Apr 28 10:49:02 2006 Subject: [Numpy-discussion] Bug In-Reply-To: References: Message-ID: On 4/28/06, Louis Cordier wrote: > Souldn't the second example be a ticket ? > Or is it part of #86 ? I think all your examples are different signs of the same problem. You can help by converting your examples into unit tests to be added to say test_multiarray.py and attaching a patch to the ticket. A brief comment for the developers: the problem that Louis reported is caused by the fact that x.fill([]) creates an empty array internally instead of a scalar object array containing an empty list. 
Note that numpy does not even have a good notation for the required object: >>> from numpy import * >>> x = zeros(1,'O') >>> x.shape=() >>> x[()] = [] >>> x array([], dtype=object) >>> x.shape () but >>> array([], dtype=object).shape (0,) From fullung at gmail.com Fri Apr 28 15:32:13 2006 From: fullung at gmail.com (Albert Strasheim) Date: Fri Apr 28 15:32:13 2006 Subject: [Numpy-discussion] Scalar math module is ready for testing In-Reply-To: <4451C076.40608@ieee.org> Message-ID: <007701c66b13$8365df00$0a84a8c0@dsp.sun.ac.za> Hello Travis I'm having some problems compiling the scalarmath code with the Visual Studio .NET 2003 compiler. Specifically, the compiler is failing to link in the llabs, fabsf and sqrtf functions. The reason it is not finding these symbols could be explained by the following errors I get when building the object file by hand using the parameters distutils passes to the compiler (for some reason distutils is suppressing compiler output -- this is pretty, but it makes debugging build failures hard): build\src.win32-2.4\numpy\core\src\scalarmathmodule.c(1737) : warning C4013: 'llabs' undefined; assuming extern returning int build\src.win32-2.4\numpy\core\src\scalarmathmodule.c(1751) : warning C4013: 'fabsf' undefined; assuming extern returning int build\src.win32-2.4\numpy\core\src\scalarmathmodule.c(1773) : warning C4013: 'sqrtf' undefined; assuming extern returning int In c:\Program Files\Microsoft Visual Studio .NET 2003\vc7\crt\src\math.h I have the following (extra code stripped): ... #ifndef __cplusplus #define acosl(x) ((long double)acos((double)(x))) #define asinl(x) ((long double)asin((double)(x))) #define atanl(x) ((long double)atan((double)(x))) ... /* NOTE! no sqrtf or fabsf is defined in this block */ #else /* __cplusplus */ ... #if !defined (_M_MRX000) && !defined (_M_ALPHA) && !defined (_M_IA64) /* NOTE! none of the above are defined on x86 */ ... inline float fabsf(float _X) {return ((float)fabs((double)_X)); } ... inline float sqrtf(float _X) {return ((float)sqrt((double)_X)); } ... #endif /* !defined (_M_MRX000) && !defined (_M_ALPHA) && !defined (_M_IA64) */ #endif /* __cplusplus */ >From this it would seem that Microsoft doesn't consider sqrtf and fabsf to be part of the C language? However, the C++ code provides a clue for how they implemented it. Also, llabs isn't defined anywhere. From reading the MSDN docs, I suspect it is called _abs64 on Windows. Regards, Albert > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy- > discussion-admin at lists.sourceforge.net] On Behalf Of Travis Oliphant > Sent: 28 April 2006 09:13 > To: numpy-discussion > Subject: [Numpy-discussion] Scalar math module is ready for testing > > > The scalar math module is complete and ready to be tested. It should > speed up code that relies heavily on scalar arithmetic by by-passing the > ufunc machinery. > > It needs lots of testing to be sure that it is doing the "right" > thing. To enable scalarmath you need to > > import numpy.core.scalarmath > > You cannot disable it once it's enabled except by restarting Python. If > we need that feature we can add it. The array scalars respond to the > error modes of ufuncs. > > There is an experimental function called alter_scalars that replaces the > Python int, float, and complex number tables with the array scalar > equivalents. 
Thus, to amaze (or seriously annoy) your Python friends
> you can do
>
> import numpy.core.scalarmath as ncs
>
> ncs.alter_scalars(int)
>
> 1 / 0
>
> This will return 0 unless you change the error modes...
>
> ncs.restore_scalars(int)
>
> Will put things back the way Guido intended....
>
> Please try it out and send us error reports. Many thanks to Sasha for
> his help in getting all the code so it at least compiles and loads. All
> bugs should be blamed on me, though...
>
> Best,
>
> -Travis

From jonathan.taylor at stanford.edu Fri Apr 28 16:21:15 2006
From: jonathan.taylor at stanford.edu (Jonathan Taylor)
Date: Fri Apr 28 16:21:15 2006
Subject: [Numpy-discussion] confusing recarray behaviour
Message-ID: <44528318.6010604@stanford.edu>

I'm new to recarrays and have been struggling with them. I keep getting an exception

TypeError: expected a readable buffer object

with no informative traceback. What I pass to N.array seems to agree with the examples in numpybook.

Below is an example that does work for me (excuse the longish example but it was just cut and paste to make my life easier). In my code, funny things happen (see ipython excerpt below this). In particular, I have a list v with v[0:2] = V and with the same dtype "ddesc" I get this exception when I change V to v[0:2]. Any help would be appreciated.

---------------------------------------------------------------------------------------

import numpy as N

timedesc = N.dtype({'names':['tm_year', 'tm_mon', 'tm_mday', 'tm_hour',
                             'tm_min', 'tm_sec', 'tm_wday', 'tm_yday', 'tm_isdst'],
                    'formats':['i2']*9})

ddesc = N.dtype({'names': ('Week', 'Date', 'Institution', 'SeqNo', 'HeightDone',
                           'Height', 'UnitsH', 'WeightDone', 'Weight', 'Units',
                           'PulseDone', 'Pulse', 'BPdone', 'BPSys', 'BPDia',
                           'PID', 'RN'),
                 'formats': ['f4', timedesc] + ['f4']*15})

V = [(12.0, (2005, 4, 22, 0, 0, 0, 4, 112, -1), 501.0, 1.0, 2.0, 0.0, 0, 1.0,
      91.5, 1.0, 1.0, 87.0, 1.0, 129.0, 76.0, 107.0, 11.0),
     (24.0, (2005, 2, 1, 0, 0, 0, 1, 32, -1), 504.0, 1.0, 2.0, 0.0, 0, 1.0,
      166.0, 2.0, 1.0, 84.0, 1.0, 128.0, 78.0, 401.0, 7.0)
]

w=N.array(V, dtype=ddesc)

--------------------------------------------------------------------------------------------------

In [97]:v[0:2] == V
Out[97]:True

In [98]:N.array(V, ddesc)
Out[98]:
array([ (12.0, (2005, 4, 22, 0, 0, 0, 4, 112, -1), 501.0, 1.0, 2.0, 0.0, 0.0, 1.0, 91.5, 1.0, 1.0, 87.0, 1.0, 129.0, 76.0, 107.0, 11.0),
        (24.0, (2005, 2, 1, 0, 0, 0, 1, 32, -1), 504.0, 1.0, 2.0, 0.0, 0.0, 1.0, 166.0, 2.0, 1.0, 84.0, 1.0, 128.0, 78.0, 401.0, 7.0)],
      dtype=[('Week', '<f4'), ...])

[...]

TypeError: expected a readable buffer object

--

------------------------------------------------------------------------
I'm part of the Team in Training: please support our efforts for the
Leukemia and Lymphoma Society!

http://www.active.com/donate/tntsvmb/tntsvmbJTaylor

GO TEAM !!!

------------------------------------------------------------------------
Jonathan Taylor Tel: 650.723.9230
Dept. of Statistics Fax: 650.725.8977
Sequoia Hall, 137 www-stat.stanford.edu/~jtaylo
390 Serra Mall
Stanford, CA 94305

-------------- next part --------------
A non-text attachment was scrubbed... 
Name: jonathan.taylor.vcf
Type: text/x-vcard
Size: 329 bytes
Desc: not available
URL: 

From Fernando.Perez at colorado.edu Fri Apr 28 16:21:17 2006
From: Fernando.Perez at colorado.edu (Fernando Perez)
Date: Fri Apr 28 16:21:17 2006
Subject: [Numpy-discussion] [OT] A weekend floating point/compiler question
Message-ID: <44528F49.3080005@colorado.edu>

Hi all,

this is somewhat off-topic, since it's really a gcc/g77 question. Yet for us here (my group) it may lead to the decision to stop using g77 for all fortran code and switch to another compiler for our python-wrapped libraries. So it did arise in the context of python usage of in-house code, and I'm appealing to anyone who may want to play a little with the question and help. Feel free to reply off-list to keep the noise down on the list.

The problem arose in some in-house library, but can be boiled down to this:

planck[f77bug]> cat testbug.f
      program testbug
c
      implicit real *8 (a-h,o-z)
c
      half = 0.5d0
      x = 0.49d0
      nnx = 100
      iax = (x+half)*nnx

      print *, 'Should be 99:',iax

      stop
      end
c EOF
planck[f77bug]> g77 -o testbug.g77 testbug.f
planck[f77bug]> ./testbug.g77
 Should be 99: 98

This can be seen as computing (x/n+1/2)*n and comparing it to x+n/2. Yes, I know about the dangers of floating point roundoff error (I didn't write the original code), but a variation of this is used inside a library that began crashing for certain inputs. The point is that this same code works fine with the Intel and Lahey compilers, but not with g77.

Now, to add a bit of mystery to the question, I wrote the following C code:

planck[f77bug]> cat scanbug.c
#include <stdio.h>

int main(int argc, char* argv[])
{
    double x;
    double eps = 1e-2;
    double x0 = 0.0;
    double xmax = 1.0;
    int nnx = 100;
    int i = 0;
    double dax;
    int iax, iax_direct;

    x = x0;
    while (x < xmax) {
[...]

References: <1C119933-F93B-47C6-AADA-4A61DF16B745@mit.edu> <44519C6E.80006@ieee.org>
Message-ID: 

Many thanks, with your help, I got it working without any leaks. I need to run on ~10 TB of data, so fixing this leak sure helps my program scale.

One error in the code below is that PyString_FromFormat does not accept %f, so I created a regular string and created the PyString with PyString_FromString (it seems to copy data), then freed the regular string. Is there any better way to do that?

I'm curious why I didn't see any explanation of PyArray_DATA in the NumPy book. It seems really important, especially if you're touting it as the Proper Strategy.

Finally, Robert encouraged me to stop using the legacy interface. I'm happy to do so, but I have to cater to my users. Approximately how old a version of Numeric (and Numarray) will still work with PyArray_SimpleNew?

Thanks,
Nick

On Apr 28, 2006, at 12:39 AM, Travis Oliphant wrote:

> The proper strategy for your arrays is to use PyArray_SimpleNew and
> then get the data-pointer to fill using PyArray_DATA(...). The
> proper way to handle strings is to create a new string (say using
> PyString_FromFormat) and then return everything as objects.
>
> /* make sure shape is defined as intp unless you don't care about
> 64-bit */
> obj2 = PyArray_SimpleNew(1, shape, PyArray_DOUBLE);
> data = (double *)PyArray_DATA(obj2);
> [snip...]
> out5 = PyString_FromFormat("Starting GPS time:%.1f UTC=%s",
> vect->GTime,FrStrGTime(utc));
>
> return Py_BuildValue("(NNNdNNN)",out1,out2,out3,out4,out5,out6,out7);
>
> Make sure you use the 'N' tag so that another reference count isn't
> generated. 
The 'O' tag will increase the reference count of your
> objects by one which is not necessarily what you want (but
> sometimes you do).

From robert.kern at gmail.com Fri Apr 28 16:43:18 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Fri Apr 28 16:43:18 2006
Subject: [Numpy-discussion] Re: Freeing memory allocated in C
In-Reply-To: 
References: <1C119933-F93B-47C6-AADA-4A61DF16B745@mit.edu> <44519C6E.80006@ieee.org>
Message-ID: 

Nick Fotopoulos wrote:

> I'm curious why I didn't see any explanation of PyArray_DATA in the
> NumPy book. It seems really important, especially if you're touting it
> as the Proper Strategy.

Section 13.3 talks about PyArray_DATA.

> Finally, Robert encouraged me to stop using the legacy interface. I'm
> happy to do so, but I have to cater to my users. Approximately how old a
> version of Numeric (and Numarray) will still work with PyArray_SimpleNew?

None. It is new to Numpy. The old way would be to use PyArray_FromDims.

--
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From Fernando.Perez at colorado.edu Fri Apr 28 16:55:02 2006
From: Fernando.Perez at colorado.edu (Fernando Perez)
Date: Fri Apr 28 16:55:02 2006
Subject: [Numpy-discussion] A weekend floating point/compiler question
Message-ID: <4452AB3F.8090700@colorado.edu>

Hi Robert and George,

We found a bug in g77 v. 3.4.4 as well as in gcc, which manifests itself in the following little snippet:

planck[f77bug]> cat testbug.f
      program testbug
c
      implicit real *8 (a-h,o-z)
c
      half = 0.5d0
      x = 0.49d0
      nnx = 100
      iax = (x+half)*nnx

      print *, 'Should be 99:',iax

      stop
      end
c EOF
planck[f77bug]> g77 -o testbug.g77 testbug.f
planck[f77bug]> ./testbug.g77
 Should be 99: 98

This can be seen as computing (x/n+1/2)*n and comparing it to x+n/2. Greg is using this in a number of places inside a library, which had never given trouble before when built with other compilers, like the sun, IBM, Intel and Lahey ones. Now with g77 it gives the result above.

Questions:

1. Have you seen similar behavior in the past?

2. If we switch away from g77, what do you suggest moving towards? We ran paranoia on ifort, lahey and g77, and lahey was the best performing of all. The intel one has the advantage of being free. On the other hand, paranoia did complain about arithmetic issues with it (though the above code works fine with intel).

Any ideas you can give us would be very appreciated.

Cheers,

Fernando and Greg.

ps. Apparently g77 v 3.3.2 does NOT have this problem.

From robert.kern at gmail.com Fri Apr 28 16:58:15 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Fri Apr 28 16:58:15 2006
Subject: [Numpy-discussion] Re: [OT] A weekend floating point/compiler question
In-Reply-To: <44528F49.3080005@colorado.edu>
References: <44528F49.3080005@colorado.edu>
Message-ID: <4452ABFE.2040307@gmail.com>

Fernando Perez wrote:

> Any ideas/comments? Shouldn't the result be independent of the
> intermediate double var? It is for icc, can this be considered a gcc bug?

It seems like it might be processor-specific. On my G4 Powerbook (g77 3.4.4, gcc 3.3) and AMD64 Linux desktop (g77 3.4.5, gcc 4.0.2), both programs give the expected results. Specifically, the Intel 80-bit FPU thingy is probably a factor.

It might be worth filing a bug report against gcc. If nothing else, you might get a better explanation of what's going on. 
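Independent of compiler and FPU mode, the defensive fix for this class of bug is to round to nearest before truncating; a minimal sketch in Python (the function name is hypothetical, and positive inputs are assumed):

    def to_index(x, half, n):
        # Adding 0.5 before int() turns truncation into round-to-nearest,
        # so a product that lands one ulp below 99.0 still maps to 99,
        # whether intermediates are kept in 64 or 80 bits.
        return int((x + half) * n + 0.5)

    print to_index(0.49, 0.5, 100)   # -> 99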
-- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Fernando.Perez at colorado.edu Fri Apr 28 17:13:16 2006 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Fri Apr 28 17:13:16 2006 Subject: [Numpy-discussion] A weekend floating point/compiler question In-Reply-To: <4452AB3F.8090700@colorado.edu> References: <4452AB3F.8090700@colorado.edu> Message-ID: <4452AF7D.6040008@colorado.edu> Fernando Perez wrote: > Hi Robert and George, Sorry! I was writing the same question to two colleagues and forgot to change the TO line. My apology. Cheers, f From gnchen at cortechs.net Fri Apr 28 18:08:03 2006 From: gnchen at cortechs.net (Gennan Chen) Date: Fri Apr 28 18:08:03 2006 Subject: [Numpy-discussion] Guide to Numpy book Message-ID: <3FA6601C-819F-4F15-A670-829FC428F47B@cortechs.net> Hi! What is the newest version of Guide to numpy? The recent one I got is dated at Jan 9 2005 on the cover. Gen-Nan Chen, PhD Chief Scientist Research and Development Group CorTechs Labs Inc (www.cortechs.net) 1020 Prospect St., #304, La Jolla, CA, 92037 Tel: 1-858-459-9700 ext 16 Fax: 1-858-459-9705 Email: gnchen at cortechs.net From luis at geodynamics.org Fri Apr 28 18:29:03 2006 From: luis at geodynamics.org (Luis Armendariz) Date: Fri Apr 28 18:29:03 2006 Subject: [Numpy-discussion] Guide to Numpy book In-Reply-To: <3FA6601C-819F-4F15-A670-829FC428F47B@cortechs.net> References: <3FA6601C-819F-4F15-A670-829FC428F47B@cortechs.net> Message-ID: <4452C145.8050803@geodynamics.org> Gennan Chen wrote: > Hi! > > What is the newest version of Guide to numpy? The recent one I got is > dated at Jan 9 2005 on the cover. > The one I got yesterday is dated March 15, 2006. -Luis From robert.kern at gmail.com Sat Apr 29 00:31:22 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sat Apr 29 00:31:22 2006 Subject: [Numpy-discussion] Re: A python interface for loess ? In-Reply-To: <200604260329.17115.pgmdevlist@mailcan.com> References: <200604260329.17115.pgmdevlist@mailcan.com> Message-ID: <4453162E.1040901@gmail.com> Pierre GM wrote: > Folks, > Would any of you be aware of a Python interface to the loess routines ? > http://netlib.bell-labs.com/netlib/a/dloess.gz Not specifically this code, but there is a pure Python+old Numeric implementation of lowess in BioPython, specifically in the Bio.Statistics subpackage. It's short and could be easily ported to use numpy. http://www.biopython.org -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From chris at pseudogreen.org Sat Apr 29 09:09:11 2006 From: chris at pseudogreen.org (Christopher Stawarz) Date: Sat Apr 29 09:09:11 2006 Subject: [Numpy-discussion] Re: A weekend floating point/compiler question Message-ID: <01fa3363e635409f488757070c5f8268@pseudogreen.org> Hi, I don't think this is a GCC bug, but it does seem to be related to Intel's 80-bit floating-point architecture. As of the Pentium 3, Intel and compatible processors have two sets of instructions for performing floating-point operations: the original 8087 set, which do all computations at 80-bit precision, and SSE (and their extension SSE2), which don't use extended precision. GCC allows you to select either instruction set. 
Unfortunately, in the absence of an explicit choice, it uses a default target that varies by platform: The i386 version defaults to 8087 instructions, while the x86-64 version defaults to SSE. See http://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/i386-and-x86_002d64- Options.html for details. I can make your test programs behave correctly on a Pentium 4 by selecting SSE2: devel12-35: g77 testbug.f devel12-36: ./a.out Should be 99: 98 devel12-37: g77 -msse2 -mfpmath=sse testbug.f devel12-38: ./a.out Should be 99: 99 devel12-39: gcc scanbug.c devel12-40: ./a.out | head -1 ERROR at x=3.000000e-02! devel12-41: gcc -msse2 -mfpmath=sse scanbug.c devel12-42: ./a.out devel12-43: Interestingly, I expected to be able to induce incorrect results on an Opteron by using 8087, but that wasn't the case (both instruction sets produced the correct result). I'll have to think about why that's happening -- maybe casting between ints and doubles differs between 32 and 64-bit architectures? I've never used the Intel or Lahey Fortran compilers, but I suspect they must be generating SSE instructions by default. Actually, it's interesting that the 80-bit computations are causing problems here, since it's easy to come up with examples where they give you better results than computations done without the extra bits. Hope that helps, Chris From charlesr.harris at gmail.com Sat Apr 29 10:25:01 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat Apr 29 10:25:01 2006 Subject: [Numpy-discussion] A weekend floating point/compiler question In-Reply-To: <4452AB3F.8090700@colorado.edu> References: <4452AB3F.8090700@colorado.edu> Message-ID: On 4/28/06, Fernando Perez wrote: > > Hi Robert and George, > > We found a bug in g77 v. 3.4.4 as well as in gcc, which manifests itself > in > the following little snippet: > > planck[f77bug]> cat testbug.f > program testbug > c > implicit real *8 (a-h,o-z) > c > half = 0.5d0 > x = 0.49d0 > nnx = 100 > iax = (x+half)*nnx > > print *, 'Should be 99:',iax > > stop > end > > c EOF I don't see why the answer should be 99. The number .99 can not be exactly represented in IEEE floating point, in fact it is ~ 0.9899999999999999911182. So as you can see the result is perfectly correct given the standard conversion to int by truncation. IMHO, this is programmer error, not a compiler problem and should be fixed in the code. Now you may get slightly different results depending on roundoff error if you indulge in such things as (.5 + .49)*100 vs (.33 + .17 + .49)*100, and since these numbers are constants they may also be precomputed by the compiler and the results will depend on the accuracy of the compiler's computation. The whole construction is ambiguous. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Apr 29 10:43:08 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat Apr 29 10:43:08 2006 Subject: [Numpy-discussion] A weekend floating point/compiler question In-Reply-To: References: <4452AB3F.8090700@colorado.edu> Message-ID: On 4/29/06, Charles R Harris wrote: > > > > On 4/28/06, Fernando Perez wrote: > > > > Hi Robert and George, > > > > We found a bug in g77 v. 
3.4.4 as well as in gcc, which manifests itself
> > in
> > the following little snippet:
> >
> > planck[f77bug]> cat testbug.f
> >       program testbug
> > c
> >       implicit real *8 (a-h,o-z)
> > c
> >       half = 0.5d0
> >       x = 0.49d0
> >       nnx = 100
> >       iax = (x+half)*nnx
> >
> >       print *, 'Should be 99:',iax
> >
> >       stop
> >       end
> >
> > c EOF
>
> I don't see why the answer should be 99. The number .99 can not be exactly
> represented in IEEE floating point, in fact it is ~
> 0.9899999999999999911182. So as you can see the result is perfectly
> correct given the standard conversion to int by truncation. IMHO, this is
> programmer error, not a compiler problem and should be fixed in the code.
> Now you may get slightly different results depending on roundoff error if
> you indulge in such things as (.5 + .49)*100 vs (.33 + .17 + .49)*100, and
> since these numbers are constants they may also be precomputed by the
> compiler and the results will depend on the accuracy of the compiler's
> computation. The whole construction is ambiguous.
>
> Chuck
>

As an example:

#include <stdio.h>

int main(int argc, char** argv)
{
    int x = 100;
    long double y = .49;
    long double z = .50;
    printf("%25.22Lf\n", (y + z)*x);
    return 0;
}

prints 98.9999999999999991118216 whereas the same code with doubles instead of long doubles prints 99.0000000000000000000000.

Chuck

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From oliphant.travis at ieee.org Sat Apr 29 13:13:05 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sat Apr 29 13:13:05 2006
Subject: [Numpy-discussion] confusing recarray behaviour
In-Reply-To: <44528318.6010604@stanford.edu>
References: <44528318.6010604@stanford.edu>
Message-ID: <4453C8B7.8040000@ieee.org>

Jonathan Taylor wrote:
>
> What I pass to N.array seems to agree with the examples in numpybook.
>
> Below is an example that does work for me (excuse the longish example
> but it was just cut and paste to make my life easier). In my code,
> funny things happen
> (see ipython excerpt below this). In particular, I have a list v with
> v[0:2] = V and with the
> same dtype "ddesc" I get this exception when I change V to v[0:2].

Please show us what v is.

If I run v = V[:] and then try N.array(v[0:2],ddesc) I don't get any error. So something else must be going on.

Which version are you running?

-Travis

From fullung at gmail.com Sat Apr 29 14:30:10 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Sat Apr 29 14:30:10 2006
Subject: [Numpy-discussion] Array data and struct alignment
Message-ID: <001601c66bd4$0a37ddb0$0a84a8c0@dsp.sun.ac.za>

Hello all

I'm busy wrapping a C library with NumPy. Some of the functions operate on a buffer containing structs that look like this:

struct node {
    int index;
    double value;
};

On the Python side, I do the following to set up my data. examples is a list containing lists or dicts.

nodes = []
for example in examples:
    if type(example) is dict:
        nodes.append(example.items())
    else:
        nodes.append(zip(range(1, len(example)+1), example))

descr = [('index','intc',1),('value','f8',1)]
self.nodes = map(lambda x: array(x, dtype=descr), nodes)

Assume examples = [[1.0, 2.0, 3.0], {4: 4.0}]. 
The nodes array can now be accessed in various useful ways:

nodes[0][0] -> (1, 1.0)
nodes[1][0] -> (4, 4.0)
nodes[0]['index'] -> [1,2,3]
nodes[0]['value'] -> [1.0,2.0,3.0]
nodes[1]['index'] -> [4]
nodes[1]['value'] -> [4.0]

On the C side I can now do the following:

PyObject* Svm_GetStructNode(PyObject* obj, PyObject* args)
{
    PyObject* op1;
    struct node* node;
    if(!PyArg_ParseTuple(args, "O", &op1)) {
        return NULL;
    }
    node = (struct node*) PyArray_DATA(op1);
    return Py_BuildValue("(id)", node->index, node->value);
}

However, this only works if struct node is tightly packed (#pragma pack(1) with the Visual C compiler).

I don't know how feasible this is, but it would be useful if NumPy could be told to pack its data on n-byte boundaries or on "same as the compiler" boundaries. I realise that there can be problems when mixing code compiled by more than one compiler, etc., etc., but a simple unit test can check for this.

Any thoughts?

Regards, Albert

From oliphant.travis at ieee.org Sat Apr 29 14:58:01 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sat Apr 29 14:58:01 2006
Subject: [Numpy-discussion] Array data and struct alignment
In-Reply-To: <001601c66bd4$0a37ddb0$0a84a8c0@dsp.sun.ac.za>
References: <001601c66bd4$0a37ddb0$0a84a8c0@dsp.sun.ac.za>
Message-ID: <4453E10E.5090108@ieee.org>

Albert Strasheim wrote:
> Hello all
>
> I'm busy wrapping a C library with NumPy. Some of the functions operate on a
> buffer containing structs that look like this:
>
> struct node {
>     int index;
>     double value;
> };
>
> [snip]
> However, this only works if struct node is tightly packed (#pragma pack(1)
> with the Visual C compiler).
>
> I don't know how feasible this is, but it would be useful if NumPy could be
> told to pack its data on n-byte boundaries or on "same as the compiler"
> boundaries. I realise that there can be problems when mixing code compiled
> by more than one compiler, etc., etc., but a simple unit test can check for
> this.
>

When you create a data-type using the dtype(...) syntax there is an align keyword that will "align" the data according to how the compiler does it. I'm not sure if it always works right so please test it out.

So, in your case you should be able to say:

descr = dtype([('index',intc),('value','f8')], align=1)

Note, I've eliminated some unnecessary verbiage in your description. Currently this is giving me an error that I will look into. 
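A quick cross-check that an aligned descr really matches the compiler's struct layout is to compare its itemsize against sizeof(struct node); a sketch (assuming a 4-byte int, an 8-byte double, and the usual padding to 16 bytes — not verified on every platform):

    from numpy import dtype, intc

    descr = dtype({'names': ['index', 'value'],
                   'formats': [intc, 'f8']}, align=1)
    # 4 (int) + 4 (padding) + 8 (double) on typical x86 ABIs
    assert descr.itemsize == 16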
-Travis

From oliphant.travis at ieee.org Sat Apr 29 15:11:07 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sat Apr 29 15:11:07 2006
Subject: [Numpy-discussion] Array data and struct alignment
In-Reply-To: <4453E293.7080502@ieee.org>
References: <001601c66bd4$0a37ddb0$0a84a8c0@dsp.sun.ac.za> <4453E293.7080502@ieee.org>
Message-ID: <4453E449.20407@ieee.org>

Travis Oliphant wrote:
> Albert Strasheim wrote:
>> Hello all
>>
>> I'm busy wrapping a C library with NumPy. Some of the functions
>> operate on a buffer containing structs that look like this:
>>
>> struct node {
>>     int index;
>>     double value;
>> };
>>
>
> In my previous discussion I was wrong. You cannot use the
> array_descriptor format for a data-type and the align keyword at the
> same time. You need to use a different method to specify fields.
>
> This, for example:
>
> descr = dtype({'names':['index', 'value'],
> 'formats':[intc,'f8']},align=1)
>
> On my (32-bit) system it doesn't produce any difference from align=0.
>
> -Travis
>

However notice the difference with

>>> dtype({'names':['index', 'value'], 'formats':[short,'f8']},align=1)
dtype([('index', '<i2'), ('', '|V6'), ('value', '<f8')])

>>> dtype({'names':['index', 'value'], 'formats':[short,'f8']},align=0)
dtype([('index', '<i2'), ('value', '<f8')])

There is padding inserted in the first-case. This corresponds to how the compiler packs a short; double struct on my system. The default is align=0. You need to use the dtype() constructor to change the default. The auto-constructor used in dtype= keyword calls will not change the alignment from align=0.

-Travis

From Fernando.Perez at colorado.edu Sat Apr 29 2006
From: Fernando.Perez at colorado.edu (Fernando Perez)
Subject: Re: [Numpy-discussion] A weekend floating point/compiler question
In-Reply-To: 
References: <4452AB3F.8090700@colorado.edu>
Message-ID: <4453F3A6.9030309@colorado.edu>

Charles R Harris wrote:

>>I don't see why the answer should be 99. The number .99 can not be exactly
>>represented in IEEE floating point, in fact it is ~
>>0.9899999999999999911182. So as you can see the result is perfectly
>>correct given the standard conversion to int by truncation. IMHO, this is
>>programmer error, not a compiler problem and should be fixed in the code.
>>Now you may get slightly different results depending on roundoff error if
>>you indulge in such things as (.5 + .49)*100 vs (.33 + .17 + .49)*100, and
>>since these numbers are constants they may also be precomputed by the
>>compiler and the results will depend on the accuracy of the compiler's
>>computation. The whole construction is ambiguous.
>>
>>Chuck
>>
>
> As an example: [...]

Thanks to yours and the other replies. I did try resetting the FPU control word as suggested to only 64 bits, and in fact the 'problem' does disappear, and I suspect that's also why Robert sees differences in CPUs without the extra 16 internal FPU bits.

I do agree that I don't like code like this, but unfortunately this one is outside of my control.

For the sake of completeness (since this thread has some educational value on the vagaries of FP arithmetic), I've slightly extended your example to:

abdul[f77bug]> cat print99.c
#include <stdio.h>

int main(int argc, char** argv)
{
    int x = 100;

    float fy = .49;
    float fz = .50;
    float fw = (fy + fz)*x;
    int ifw = fw;

    double y = .49;
    double z = .50;
    double w = (y + z)*x;
    int iw = w;

    long double ly = .49;
    long double lz = .50;
    long double lw = (ly + lz)*x;
    int ilw = lw;

    printf("floats:\n");
    printf("w=%25.22f, iw=%d\n", fw,ifw);
    printf("doubles:\n");
    printf("w=%25.22f, iw=%d\n", w,iw);
    printf("long doubles:\n");
    printf("w=%25.22Lf, iw=%d\n", lw,ilw);

    return 0;
}
// EOF

which gives on my box (AMD chip, running 32-bit fedora3):

abdul[f77bug]> ./print99.gcc
floats:
w=99.0000000000000000000000, iw=99
doubles:
w=99.0000000000000000000000, iw=99
long doubles:
w=98.9999999999999991118216, iw=98

This is consistent with the calculations done in 80 bits giving also different results.

One of the nice things about this community is precisely this kind of friendly expertise. Many thanks to all. 
Cheers,

f

From fullung at gmail.com Sat Apr 29 17:27:15 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Sat Apr 29 17:27:15 2006
Subject: [Numpy-discussion] Array data and struct alignment
In-Reply-To: <4453E449.20407@ieee.org>
Message-ID: <001d01c66bec$c556ece0$0a84a8c0@dsp.sun.ac.za>

Thanks Travis, this works like a charm.

For the curious, here's a quick way to see if your system is doing the right thing:

In [87]: descr = dtype({'names':['a', 'b'], 'formats':[byte,'f8']},align=1)

In [88]: descr
Out[88]: dtype([('a', '|i1'), ('', '|V7'), ('b', '<f8')])

[...]

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-
> discussion-admin at lists.sourceforge.net] On Behalf Of Travis Oliphant
> Sent: 30 April 2006 00:10
> To: numpy-discussion
> Subject: Re: [Numpy-discussion] Array data and struct alignment
>
> Travis Oliphant wrote:
> > Albert Strasheim wrote:
> >> Hello all
> >>
> >> I'm busy wrapping a C library with NumPy. Some of the functions
> >> operate on a
> >> buffer containing structs that look like this:
> >>
> >> struct node {
> >>     int index;
> >>     double value;
> >> };
> >>
> >
> > In my previous discussion I was wrong. You cannot use the
> > array_descriptor format for a data-type and the align keyword at the
> > same time. You need to use a different method to specify fields.
> >
> > This, for example:
> >
> > descr = dtype({'names':['index', 'value'],
> > 'formats':[intc,'f8']},align=1)
> >
> > On my (32-bit) system it doesn't produce any difference from align=0.
> >
> > -Travis
> >
> However notice the difference with
>
> >>> dtype({'names':['index', 'value'], 'formats':[short,'f8']},align=1)
> dtype([('index', '<i2'), ('', '|V6'), ('value', '<f8')])
>
> >>> dtype({'names':['index', 'value'], 'formats':[short,'f8']},align=0)
> dtype([('index', '<i2'), ('value', '<f8')])
>
> There is padding inserted in the first-case. This corresponds to how
> the compiler packs a short; double struct on my system. The default is
> align=0. You need to use the dtype() constructor to change the
> default. The auto-constructor used in dtype= keyword calls will not
> change the alignment from align=0.
>
> -Travis

From jonathan.taylor at stanford.edu Sat Apr 29 19:56:03 2006
From: jonathan.taylor at stanford.edu (Jonathan Taylor)
Date: Sat Apr 29 19:56:03 2006
Subject: [Numpy-discussion] confusing recarray behaviour
In-Reply-To: <4453C8B7.8040000@ieee.org>
References: <44528318.6010604@stanford.edu> <4453C8B7.8040000@ieee.org>
Message-ID: <44542730.4050609@stanford.edu>

Here is a pickle file with v and desc, v is just a list of tuples with integer and string entries.

My point with my example is that when I had two identical lists (i.e. v[0:2] == V) one time I got an error, the other time I didn't and the traceback had no information, i.e. I couldn't get anywhere with pdb.

I am using svn revision 2456.

Jonathan

Travis Oliphant wrote:
> Jonathan Taylor wrote:
>
>>
>> What I pass to N.array seems to agree with the examples in numpybook.
>>
>> Below is an example that does work for me (excuse the longish example
>> but it was just cut and paste to make my life easier). In my code,
>> funny things happen
>> (see ipython excerpt below this). In particular, I have a list v with
>> v[0:2] = V and with the
>> same dtype "ddesc" I get this exception when I change V to v[0:2].
>
> Please show us what v is.
>
> If I run v = V[:] and then try N.array(v[0:2],ddesc) I don't get any
> error. So something else must be going on.
>
> Which version are you running? 
> > -Travis
>
>
> -------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job
> easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache
> Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion

--

------------------------------------------------------------------------
I'm part of the Team in Training: please support our efforts for the
Leukemia and Lymphoma Society!

http://www.active.com/donate/tntsvmb/tntsvmbJTaylor

GO TEAM !!!

------------------------------------------------------------------------
Jonathan Taylor Tel: 650.723.9230
Dept. of Statistics Fax: 650.725.8977
Sequoia Hall, 137 www-stat.stanford.edu/~jtaylo
390 Serra Mall
Stanford, CA 94305

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dump.pickle
URL: 

From ndarray at mac.com Sun Apr 30 10:12:06 2006
From: ndarray at mac.com (Sasha)
Date: Sun Apr 30 10:12:06 2006
Subject: [Numpy-discussion] [Numeric] "put" into object array corrupts memory
In-Reply-To: 
References: 
Message-ID: 

I know that Numeric is no longer maintained, but since this bug cost me two sleepless nights, I think it is appropriate to announce the bug and the fix to the list.

---------- Forwarded message ----------
From: SourceForge.net
Date: Apr 30, 2006 12:58 PM
Subject: [ numpy-Bugs-1479376 ] [Numeric] "put" into object array corrupts memory
To: noreply at sourceforge.net

Bugs item #1479376, was opened at 2006-04-30 12:46
Message generated for change (Comment added) made by belopolsky
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=101369&aid=1479376&group_id=1369

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Fatal Error
Group: Normal bug
Status: Open
Priority: 5
Submitted By: Alexander Belopolsky (belopolsky)
Assigned to: Nobody/Anonymous (nobody)
Summary: [Numeric] "put" into object array corrupts memory

Initial Comment:
This is one of those bugs that are easier to fix than to reproduce:

$ cat test-put.py
class A(object):
    def __del__(self):
        print "deleting %r" % self
a = A()
from Numeric import *
x = array([None], 'O')
y = array([a], 'O')
put(x,[0],y)
del a,y
print "exiting"

$ python test-put.py
deleting <__main__.A object at 0xf7e4d24c>
exiting
Fatal Python error: deletion of interned string failed
Aborted (core dumped)

Numeric version: 24.2

----------------------------------------------------------------------

>Comment By: Alexander Belopolsky (belopolsky)
Date: 2006-04-30 12:58

Message:
Logged In: YES
user_id=835142

Attached patch fixes the bug. 
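Distilled into a regression test, the report reduces to a few lines (a sketch against old Numeric; the assert only checks that the value lands and the interpreter survives):

    import Numeric
    x = Numeric.array([None], 'O')
    y = Numeric.array([{'value': 1}], 'O')
    Numeric.put(x, [0], y)   # used to corrupt memory on object arrays
    assert x[0] == {'value': 1}
    print "ok"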
----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=101369&aid=1479376&group_id=1369

From vidar+list at 37mm.no Sun Apr 30 16:27:00 2006
From: vidar+list at 37mm.no (Vidar Gundersen)
Date: Sun Apr 30 16:27:00 2006
Subject: [Numpy-discussion] Guide to Numpy book
In-Reply-To: <4452C145.8050803@geodynamics.org> (Luis Armendariz's message of "Fri, 28 Apr 2006 18:28:37 -0700")
References: <3FA6601C-819F-4F15-A670-829FC428F47B@cortechs.net> <4452C145.8050803@geodynamics.org>
Message-ID: 

===== Original message from Luis Armendariz | 29 Apr 2006:
>> What is the newest version of Guide to numpy? The recent one I got is
>> dated at Jan 9 2005 on the cover.
> The one I got yesterday is dated March 15, 2006.

aren't the updates supposed to be sent out to customers when available?

From ted.horst at earthlink.net Sun Apr 30 16:50:08 2006
From: ted.horst at earthlink.net (Ted Horst)
Date: Sun Apr 30 16:50:08 2006
Subject: [Numpy-discussion] Scalar math module is ready for testing
In-Reply-To: <4451C076.40608@ieee.org>
References: <4451C076.40608@ieee.org>
Message-ID: <3856FA57-539D-47DE-8427-2A6BB508F917@earthlink.net>

Here is an issue I am having with scalarmath:

>>> import numpy
>>> numpy.__version__
'0.9.7.2462'
>>> import numpy.core.scalarmath
>>> a = numpy.array([1], 'h')
>>> 1*a
array([1], dtype=int16)
>>> 1*a[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: unsupported operand type(s) for *: 'int' and 'int16scalar'

This happens because PyArray_CanCastSafely returns false for casting from int to short. alter_scalars(int) fixes this, but I have lots of non-numpy code that I don't want to behave differently.

Ted

On Apr 28, 2006, at 02:12, Travis Oliphant wrote:

> The scalar math module is complete and ready to be tested. It
> should speed up code that relies heavily on scalar arithmetic by by-
> passing the ufunc machinery.

From fullung at gmail.com Sun Apr 30 17:11:05 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Sun Apr 30 17:11:05 2006
Subject: [Numpy-discussion] Creating a descr with aligned=1 using the C API
Message-ID: <000601c66cb3$b762a940$0a84a8c0@dsp.sun.ac.za>

Hello all

I was wondering what the best way would be to create the following descr using the C API:

descr = dtype({'names' : ['index', 'value'], 'formats' : [intc, 'f8']}, align=1)

One could use PyArray_DescrConverter in multiarraymodule.c, but there doesn't seem to be a way to specify aligned=1 and one would have to build the dict object before being able to pass it on for conversion.

Unless there's another easy way I'm missing, the API could possibly do with a function like PyArray_DescrFromCommaString(const char*, int align) which calls _convert_from_commastring. By the way, what is the general format of these commastrings?

Comments appreciated.

Regards, Albert

From tim.hochberg at cox.net Sun Apr 30 19:33:03 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Sun Apr 30 19:33:03 2006
Subject: [Numpy-discussion] basearray lives!
Message-ID: <445573B0.6020408@cox.net>

After a fashion anyway. I implemented the simplest thing that could possibly work and I've left out some stuff that even I think we need (docstring, repr and str). Still it exists, ndarray inherits from it and some stuff seems to work automagically. 
>>> import numpy as n
>>> ba = n.basearray([3,3], int, n.arange(9))
>>> ba
<numpy.basearray object at 0x...>
>>> a = asarray(ba)
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> a + ba
array([[ 0, 2, 4],
       [ 6, 8, 10],
       [12, 14, 16]])
>>> isinstance(a, n.basearray)
True
>>> type(ba)
<type 'numpy.basearray'>
>>> type(a)
<type 'numpy.ndarray'>
>>> len(dir(ba))
19
>>> len(dir(a))
156

Travis: should I go ahead and check this into the trunk? It shouldn't interfere with anything. The only change to ndarray is the tp_base, which sets up the inheritance.

-tim

From ndarray at mac.com Sun Apr 30 20:27:09 2006
From: ndarray at mac.com (Sasha)
Date: Sun Apr 30 20:27:09 2006
Subject: [Numpy-discussion] basearray lives!
In-Reply-To: <445573B0.6020408@cox.net>
References: <445573B0.6020408@cox.net>
Message-ID: 

Let me add my $.02.

I am very much in favor of a basic array object. I would probably go much further than Tim in simplifying it. No need for repr/str. No number protocol. No sequence/mapping protocol either. Maybe even no dimensions/striding etc. What is left? Not much on top of buffer protocol: the type description.

I've expressed this opinion several times before (and was criticised for not supporting it:-): I don't think a basearray should be a base class. The main reason is that in most cases subclasses will need to adapt all the array methods. In many cases (speaking from ma experience, but probably matrix folks can relate) the adaptation is not automatic and has to be done on a method by method basis. Exposure of the base class methods without adaptation or with wrong adaptation leads to errors. Unless the base array is truly minimalistic and stays this way, methods that are added to the base class in the future will likely not work unadapted.

The only implementation that uses inheritance that I would like would be something similar to python's object type: rich C API and no Python API.

Would you consider checking your implementation in without modifying ndarray's tp_base?

On 4/30/06, Tim Hochberg wrote:
>
> After a fashion anyway. I implemented the simplest thing that could
> possibly work and I've left out some stuff that even I think we need
> (docstring, repr and str). Still it exists, ndarray inherits from it and
> some stuff seems to work automagically.
>
> >>> import numpy as n
> >>> ba = n.basearray([3,3], int, n.arange(9))
> >>> ba
> <numpy.basearray object at 0x...>
> >>> a = asarray(ba)
> >>> a
> array([[0, 1, 2],
>        [3, 4, 5],
>        [6, 7, 8]])
> >>> a + ba
> array([[ 0, 2, 4],
>        [ 6, 8, 10],
>        [12, 14, 16]])
> >>> isinstance(a, n.basearray)
> True
> >>> type(ba)
> <type 'numpy.basearray'>
> >>> type(a)
> <type 'numpy.ndarray'>
> >>> len(dir(ba))
> 19
> >>> len(dir(a))
> 156
>
> Travis: should I go ahead and check this into the trunk? It shouldn't
> interfere with anything. The only change to ndarray is the tp_base,
> which sets up the inheritance.
>
> -tim
>
>
> -------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security? 
From ndarray at mac.com Sun Apr 30 20:27:09 2006
From: ndarray at mac.com (Sasha)
Date: Sun Apr 30 20:27:09 2006
Subject: [Numpy-discussion] basearray lives!
In-Reply-To: <445573B0.6020408@cox.net>
References: <445573B0.6020408@cox.net>
Message-ID:

Let me add my $.02. I am very much in favor of a basic array object. I
would probably go much further than Tim in simplifying it. No need for
repr/str. No number protocol. No sequence/mapping protocol either. Maybe
even no dimensions/striding etc. What is left? Not much on top of the
buffer protocol: the type description.

I've expressed this opinion several times before (and was criticised for
not supporting it :-)): I don't think a basearray should be a base class.
The main reason is that in most cases subclasses will need to adapt all
the array methods. In many cases (speaking from ma experience, but
probably matrix folks can relate) the adaptation is not automatic and has
to be done on a method-by-method basis. Exposure of the base class
methods without adaptation, or with wrong adaptation, leads to errors.
Unless the base array is truly minimalistic and stays this way, methods
that are added to the base class in the future will likely not work
unadapted.

The only implementation using inheritance that I would like would be
something similar to Python's object type: rich C API and no Python API.
Would you consider checking your implementation in without modifying
ndarray's tp_base?

On 4/30/06, Tim Hochberg <tim.hochberg at cox.net> wrote:
>
> After a fashion anyway. I implemented the simplest thing that could
> possibly work and I've left out some stuff that even I think we need
> (docstring, repr and str). Still it exists, ndarray inherits from it and
> some stuff seems to work automagically.
>
> >>> import numpy as n
> >>> ba = n.basearray([3,3], int, n.arange(9))
> >>> ba
> <numpy.basearray object at 0x...>
> >>> a = asarray(ba)
> >>> a
> array([[0, 1, 2],
>        [3, 4, 5],
>        [6, 7, 8]])
> >>> a + ba
> array([[ 0,  2,  4],
>        [ 6,  8, 10],
>        [12, 14, 16]])
> >>> isinstance(a, n.basearray)
> True
> >>> type(ba)
> <type 'numpy.basearray'>
> >>> type(a)
> <type 'numpy.ndarray'>
> >>> len(dir(ba))
> 19
> >>> len(dir(a))
> 156
>
> Travis: should I go ahead and check this into the trunk? It shouldn't
> interfere with anything. The only change to ndarray is the tp_base,
> which sets up the inheritance.
>
> -tim

From oliphant.travis at ieee.org Sun Apr 30 21:45:05 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sun Apr 30 21:45:05 2006
Subject: [Numpy-discussion] Creating a descr with aligned=1 using the C API
In-Reply-To: <000601c66cb3$b762a940$0a84a8c0@dsp.sun.ac.za>
References: <000601c66cb3$b762a940$0a84a8c0@dsp.sun.ac.za>
Message-ID: <44559204.3020902@ieee.org>

Albert Strasheim wrote:
> Hello all
>
> I was wondering what the best way would be to create the following descr
> using the C API:
>

You can use the "new" method:

PyArray_Descr *dtype;
PyObject *dict;

/* args tuple is (dict, align); kwds may be NULL */
dtype = (PyArray_Descr *)PyArrayDescr_Type.tp_new(&PyArrayDescr_Type,
            Py_BuildValue("(Oi)", dict, 1), NULL);

where the dict is the one you give. Yes, this could be an easier-to-use
API.

> descr = dtype({'names' : ['index', 'value'], 'formats' : [intc, 'f8']},
> align=1)
>
> One could use PyArray_DescrConverter in multiarraymodule.c, but there
> doesn't seem to be a way to specify aligned=1 and one would have to build
> the dict object before being able to pass it on for conversion.
>
> Unless there's another easy way I'm missing, the API could possibly do with
> a function like PyArray_DescrFromCommaString(const char*, int align) which
> calls _convert_from_commastring. By the way, what is the general format of
> these commastrings?
>

It's in the NumPy book and it's also documented by numarray...

-Travis
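What the align flag actually changes in the resulting descr can be
inspected from Python. A sketch (checked against a recent NumPy; the
numbers assume a platform where intc is 4 bytes):

>>> import numpy
>>> spec = {'names': ['index', 'value'], 'formats': [numpy.intc, 'f8']}
>>> numpy.dtype(spec, align=0).itemsize      # packed: 4 + 8
12
>>> d = numpy.dtype(spec, align=1)
>>> d.itemsize                               # 'value' padded to an 8-byte boundary
16
>>> [d.fields[name][1] for name in d.names]  # field offsets
[0, 8]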
From oliphant.travis at ieee.org Sun Apr 30 21:49:02 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sun Apr 30 21:49:02 2006
Subject: [Numpy-discussion] basearray lives!
In-Reply-To: <445573B0.6020408@cox.net>
References: <445573B0.6020408@cox.net>
Message-ID: <445592EB.1000406@ieee.org>

Tim Hochberg wrote:
>
> After a fashion anyway. I implemented the simplest thing that could
> possibly work and I've left out some stuff that even I think we need
> (docstring, repr and str). Still it exists, ndarray inherits from it
> and some stuff seems to work automagically.
>
> >>> import numpy as n
> >>> ba = n.basearray([3,3], int, n.arange(9))
> >>> ba
> <numpy.basearray object at 0x...>
> >>> a = asarray(ba)
> >>> a
> array([[0, 1, 2],
>        [3, 4, 5],
>        [6, 7, 8]])
> >>> a + ba
> array([[ 0,  2,  4],
>        [ 6,  8, 10],
>        [12, 14, 16]])
> >>> isinstance(a, n.basearray)
> True
> >>> type(ba)
> <type 'numpy.basearray'>
> >>> type(a)
> <type 'numpy.ndarray'>
> >>> len(dir(ba))
> 19
> >>> len(dir(a))
> 156
>
> Travis: should I go ahead and check this into the trunk? It shouldn't
> interfere with anything. The only change to ndarray is the tp_base,
> which sets up the inheritance.
>

I say go ahead. We can then all deal with it there and improve upon it.
The ndarray used to inherit from another array and things worked.
Python's inheritance in C is actually quite slick, especially for
structural issues.

I agree that the basearray should have minimal operations (I would not
even define several of the protocols for it). I'd probably keep only the
buffer and mapping protocols, and even then probably only a simple
mapping protocol (i.e. no fancy-indexing) that then gets enhanced by the
ndarray.

Thanks for the work.

-Travis
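For readers wondering what a "simple" mapping protocol keeps and what
fancy indexing adds on top, a short session (behavior as in a present-day
numpy):

>>> import numpy as n
>>> a = n.arange(9).reshape(3, 3)
>>> v = a[1:3]       # simple (slice) indexing: a view onto the same data
>>> v[0, 0] = 99
>>> a[1, 0]          # the base array sees the change
99
>>> c = a[[0, 2]]    # fancy (integer-array) indexing: always a copy
>>> c[0, 0] = -1
>>> a[0, 0]          # the original is untouched
0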
Setting this is as simple as: >> >> def func(*args): >> numpy.seterr(under='warn') >> # do stuff with args >> return result >> >> Since seterr is local to the function, we don't have to reset the >> error handling at the end, which is convenient. And, this works fine >> if all we are doing is calling numpy functions and methods. However, >> if we are calling a function of our own devising we're out of luck >> since the called function will not inherit the error settings that we >> have set. > > Again, you have control over where you set the "secret" variable > (local, global (module), and builtin). I also don't see how that's > anymore clunky then a "secret" stack. In numarray, the stack is in the numarray module itself (actually in the Error object). They base their threading local behaviour off of thread.get_ident, not threading.local. That's not clunky at all, although it's arguably wrong since thread.get_ident can reuse ids from dead threads. In practice it's probably hard to get into trouble doing this, but I still wouldn't emulate it. I think that this was written before thread local storage, so it was probably the best that could be done. However, if you use threading.local, it will be clunky in a similar sense. You'll be storing data in a global namespace you don't control and you've got to hope that no one stomps on your variable name. When you have local and module level secret storage names as well you're just doing a lot more of that and the chance of collision and confusion goes up from almost zero to very small. > You may set the error in the builtin scope --- in fact it would > probably be trivial to implement a stack based on this and implement the > > pushMode > popMode > > interface of numarray. Yes. Modulo the thread local issue, I believe that this would indeed be easy. > > But, I think this question does deserve a bit of debate. I don't > think there has been a serious discussion over the method. To help > Tim and others understand what happens: > > When a ufunc is called, a specific variable name is searched for in > the following name-spaces in the following order: > > 1) local > 2) global > 3) builtin > > (There is a bit of an optimization in that when the error mode is the > default mode --- do nothing, a global flag is set which by-passes the > search for the name). > The first time the variable name is found, the error mode is read from > that variable. This error mode is placed as part of the ufunc loop > object. At the end of each 1-d loop the IEEE error mode flags are > checked (depending on the state of the error mode) and appropriate > action taken. > > By the way, it would not be too difficult to change how the error mode > is set (probably an hour's worth of work). So, concern over > implementation changes should not be a factor right now. > Currently the error mode is read from a variable using standard > scoping rules. It would save the (not insignificant) name-space > lookup time to instead use a global stack (i.e. a Python list) and > just get the error mode from the top of that stack. > >> Thus we have no way to influence the error settings of functions >> downstream from us. > > Of course, there is a way to do this by setting the variable in the > global or builtin scope as I've described above. > What's really the argument here, is whether having the flexibility at > the local and global name-spaces really worth the extra name-lookups > for each ufunc. 
> > I've argued that the numarray behavior can result from using the > builtin namespace for the error control. (perhaps with better > Python-side support for setting and retrieving it). What numpy has is > control at the global and local namespace level as well which can > override the builtin name-space behavior. > > So, we should at least frame the discussion in terms of what is > actually possible. Yes, sorry for spreading misinformation. >> >> I also would prefer more verbose keys ala numarray (underflow, >> overflow, dicidebyzero and invalid) than those currently used by >> numpy (under, over, divide and invalid). > > > In my mind, verbose keys are just extra baggage unless they are really > self documenting. You just need reminders and clues. It seems to be > a preference thing. I guess I hate typing long strings when only the > first few letters clue me in to what is being talked about. In this case, overflow, underflow and dividebyzero seem pretty self documenting to me. And 'invalid' is pretty cryptic in both implementations. This may be a matter of taste, but I tend to prefer short pithy names for functions that I use a lot, or that crammed a bunch to a line. In functions like this, that are more rarely used and get a full line to themselves I lean to towards the more verbose. >> And (will he never stop) I like numarrays defaults better here too: >> overflow='warn', underflow='ignore', dividebyzero='warn', >> invalid='warn'. Currently, numpy defaults to ignore for all cases. >> These last points are relatively minor though. > > This has optimization issues the way the code is written now. The > defaults are there to produce the fastest loops. Can you elaborate on this a bit? Reading between the lines, there seem to be two issues related to speed here. One is the actual namespace lookup of the error mode -- there's a setting that says we are using the defaults, so don't bother to look. This saves the namespace lookup. Changing the defaults shouldn't affect the timing of that. I'm not sure how this would interact with thread local storage though. The second issue is that running the core loop with no checks in place is faster. That means that to get maximum performance you want to be running both at the default setting and with no checks, which implies that the default setting needs to be no checking. Is that correct? I think there should be a way to finesse this issue, but I'll wait for the dust to settle a bit on the local, global, builtin issue before I propose anything. Particularly since by finesse I mean: do something moderately unsavory. > So, I'm hesitant to change them based only on ambiguous preferences. It's not entirely plucked out of the error. As I recall, the decision was arrived at something likes this: 1. Errors should never pass silently (unless explicitly silenced). 2. Let's have everything raise by default 3. In practice this was no good because you often wanted to look at the results and see where the problem was. 4. OK, let's have everything warn 5. This almost worked, but underflow was almost never a real error, so everyone always overrode underflow. A default that you always need to override is not a good default. 6. So, warn for everything except underflow. Ignore that. And that's where numarry is today. I and other have been using that error system happily for quite some time now. At least I haven't heard any complaints for quite a while. > Good feedback. Thanks again for taking the time to look at this and > offer review. You're very welcome. 
Thanks for all of the work you've been putting in to make the grand numerification happen. -tim From arnd.baecker at web.de Sat Apr 1 09:09:06 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Sat Apr 1 09:09:06 2006 Subject: [Numpy-discussion] extension to xrange for numpy Message-ID: Dear numpy enthusiasts, one python command which is extremely useful in 1D situations is `xrange`. However, for higher dimensional settings we strongly lack the commands `yrange` and `zrange`. These could be shorthands for the corresponding constructs with `:,NewAxis` added. Any comments, suggestion and even implementations are very welcome, Arnd P.S.: What I am not sure about is the right command for the 4-dimensional case - which letter should be used after the "z"? (it seems that "a" would be a very natural choice...) From faltet at carabos.com Sat Apr 1 11:01:05 2006 From: faltet at carabos.com (Francesc Altet) Date: Sat Apr 1 11:01:05 2006 Subject: [Numpy-discussion] ANN: PyTables 1.3 released Message-ID: <200604012100.38726.faltet@carabos.com> ========================= Announcing PyTables 1.3 ========================= This is a new major release of PyTables. The most remarkable feature added in this version is a complete support (well, almost, because unicode arrays are not there yet) for NumPy objects. Improved support for native HDF5 is there as well. As an aside, I'm happy to inform you that the PyTables web site (http://www.pytables.org) has been converted into a wiki so that users can contribute to the project with recipes or any other document. Try it out! Go to the (new) PyTables web site for downloading the beast: http://www.pytables.org/ or keep reading for more info about the new features and bugs fixed. Changes more in depth ===================== Improvements: - Support for NumPy objects in all the objects of PyTables, namely: Array, CArray, EArray, VLArray and Table. All the numerical and character (except unicode arrays) flavors are supported as well as plain and nested heterogeneous NumPy arrays. PyTables leverages the adoption of the array interface (http://numeric.scipy.org/array_interface.html) for a very efficient conversion between all the numarray (which continues to be the native flavor for PyTables) object to/from NumPy/Numeric. - The FLAVOR schema in PyTables has been refined and simplified. Now, the only 'flavors' allowed for data objects are: "numarray", "numpy", "numeric" and "python". The changes has been made so that they are fully backward compatible with existing PyTables files. However, when users would try to use old flavors (like "Numeric" or "Tuple") in existing code, a ``DeprecationWarning`` will be issued in order to encourage them to migrate to the new flavors as soon as possible. - Nested fields can be specified in the "field" parameter of Table.read by using a '/' as a separator between fields (e.g. 'Info/value'). - The Table.Cols accessor has received a new ``__setitem__()`` method that allows doing things like: table.cols[4] = record table.cols.x[4:1000:2] = array # homogeneous column table.cols.Info[4:1000:2] = recarray # nested column - A clean-up function (using ``atexit``) has been registered so that remaining opened files are closed when a user hits a ^C, for example. That would help to avoid ending with corrupted files. - Native HDF5 compound datasets that are contiguous are supported now. Before, only chunked datasets were supported. - Updated (and much improved) sections about compression issues in the User's Guide. 
It includes new benchmarks made with PyTables 1.3 and a exhaustive comparison between Zlib, LZO and bzip2. - The HTML version of manual is made now from the docbook2html package for an improved look (IMO). Bug fixes: - Solved a problem when trying to save CharArrays with itemsize = 0 as attributes of nodes. Now, these objects are pickled in order to prevent HDF5 from crashing. - Fixed some alignment issues with nested record arrays under certain architectures (e.g. PowerPC). - Fixed automatic conversions when a VLArray is read in a platform with a byte ordering different from the file. Deprecated features: - Due to recurrent problems with the UCL compression library, it has been declared deprecated from this version on. You can still compile PyTables with UCL support (using the --force-ucl), but you are urged to not use it anymore and convert any existing datafiles with UCL to other supported library (zlib, lzo or bzip2) with the ``ptrepack`` utility. Backward-incompatible changes: - Please, see ``RELEASE-NOTES.txt`` file. Important note for Windows users ================================ If you are willing to use PyTables with Python 2.4 in Windows platforms, you will need to get the HDF5 library compiled for MSVC 7.1, aka .NET 2003. It can be found at: ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/bin/windows/5-165-win-net.ZIP Users of Python 2.3 on Windows will have to download the version of HDF5 compiled with MSVC 6.0 available in: ftp://ftp.ncsa.uiuc.edu/HDF/HDF5/current/bin/windows/5-165-win.ZIP What it is ========== **PyTables** is a package for managing hierarchical datasets and designed to efficiently cope with extremely large amounts of data (with support for full 64-bit file addressing). It features an object-oriented interface that, combined with C extensions for the performance-critical parts of the code, makes it a very easy-to-use tool for high performance data storage and retrieval. PyTables runs on top of the HDF5 library and numarray (but NumPy and Numeric are also supported) package for achieving maximum throughput and convenient use. Besides, PyTables I/O for table objects is buffered, implemented in C and carefully tuned so that you can reach much better performance with PyTables than with your own home-grown wrappings to the HDF5 library. PyTables sports indexing capabilities as well, allowing doing selections in tables exceeding one billion of rows in just seconds. Platforms ========= This version has been extensively checked on quite a few platforms, like Linux on Intel32 (Pentium), Win on Intel32 (Pentium), Linux on Intel64 (Itanium2), FreeBSD on AMD64 (Opteron), Linux on PowerPC (and PowerPC64) and MacOSX on PowerPC. For other platforms, chances are that the code can be easily compiled and run without further issues. Please, contact us in case you are experiencing problems. Resources ========= Go to the PyTables web site for more details: http://www.pytables.org About the HDF5 library: http://hdf.ncsa.uiuc.edu/HDF5/ About numarray: http://www.stsci.edu/resources/software_hardware/numarray To know more about the company behind the PyTables development, see: http://www.carabos.com/ Acknowledgments =============== Thanks to various the users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for a (incomplete) list of contributors. Many thanks also to SourceForge who have helped to make and distribute this package! 
And last but not least, a big thank you to THG (http://www.hdfgroup.org/) for sponsoring many of the new features recently introduced in PyTables. Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. ---- **Enjoy data!** -- The PyTables Team -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From oliphant.travis at ieee.org Sat Apr 1 12:20:01 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sat Apr 1 12:20:01 2006 Subject: [Numpy-discussion] numpy error handling In-Reply-To: <442E94AD.1040200@cox.net> References: <442DE773.4060104@cox.net> <442E2F05.5080809@ieee.org> <442E94AD.1040200@cox.net> Message-ID: <442EE026.8060806@ieee.org> Tim Hochberg wrote: >> >> You can get the numarray approach back simply by setting the error in >> the builtin scope (instead of in the local scope which is done by >> default. > > I saw that you could set it at different levels, but missed the > implications. However, it's still missing one feature, thread local > storage. I would argue that the __builtin__ data should actually be > stored in threading.local() instead of __builtin__. Then you could > setup an equivalent stack system to numpy's. Yes, the per-thread storage escaped me. But, threading.local() only exists in Python 2.4 and NumPy is supposed to be compatible with Python 2.3 What about PyThreadState_GetDict() ? and then default to use the builtin dictionary if this returns NULL? I'm actually not particularly enthused about the three name-space lookups. Changing it to only 1 place to look may be better. It would require a setting and restoring operation. A stack could be used, but why not just use local variables (i.e. save = numpy.seterr(dividebyzero='warn') ... numpy.seterr(restore=save) > > I've used the numarray error handling stuff for some time. My > experience with it has led me to the following conclusions: > > 1. You don't use it that often. I have about 26 KLOC that's "active" > and in that I use pushMode just 15 times. For comparison, I use > asarray a tad over 100 times. > 2. pushMode and popMode, modulo spelling, is the way to set errors. > Once the with statement is around, that will be even better. > 3. I, personally, would be very unlikely to use the local and global > error handling, I'd just as soon see them go away, particularly if > it helps performance, but I won't lobby for it. > This is good feedback. I have almost zero experience with changing the error handling. So, I'm not sure what features are desireable. Eliminating unnecessary name-lookups is usually a good thing. > > In numarray, the stack is in the numarray module itself (actually in > the Error object). They base their threading local behaviour off of > thread.get_ident, not threading.local. That's not clunky at all, > although it's arguably wrong since thread.get_ident can reuse ids from > dead threads. In practice it's probably hard to get into trouble doing > this, but I still wouldn't emulate it. I think that this was written > before thread local storage, so it was probably the best that could be > done. Right, but thread local storage is still Python 2.4 only.... What about PyThreadState_GetDict() ? > > However, if you use threading.local, it will be clunky in a similar > sense. You'll be storing data in a global namespace you don't control > and you've got to hope that no one stomps on your variable name. 
The PyThreadState_GetDict() documenation states that extension module writers should use a unique name based on their extension module. > When you have local and module level secret storage names as well > you're just doing a lot more of that and the chance of collision and > confusion goes up from almost zero to very small. This is true. Similar to the C-variable naming issues. >> So, we should at least frame the discussion in terms of what is >> actually possible. > > Yes, sorry for spreading misinformation. But you did point out the very important thread-local storage fact that I had missed. This alone makes me willing to revamp what we are doing. > > In this case, overflow, underflow and dividebyzero seem pretty self > documenting to me. And 'invalid' is pretty cryptic in both > implementations. This may be a matter of taste, but I tend to prefer > short pithy names for functions that I use a lot, or that crammed a > bunch to a line. In functions like this, that are more rarely used and > get a full line to themselves I lean to towards the more verbose. The rarely-used factor is a persuasive argument. > Can you elaborate on this a bit? Reading between the lines, there seem > to be two issues related to speed here. One is the actual namespace > lookup of the error mode -- there's a setting that says we are using > the defaults, so don't bother to look. This saves the namespace > lookup. Changing the defaults shouldn't affect the timing of that. > I'm not sure how this would interact with thread local storage though. > > The second issue is that running the core loop with no checks in place > is faster. Basically, on the C-level, the error mode is an integer with specific bits allocated to the various error-possibilites (2-bits per possibility). If this is 0 then the error checking is not even done (thus no error handling at all). Yes the name-lookup optimization could work with any defaults (but with thread-specific storage couldn't work anyway). One question I have with threads and error handling though? Right now, the ufuncs release the Python lock during computation (and re-acquire it to do error handling if needed). If another ufunc was started by another Python thread and ran with different error handling, wouldn't the IEEE flags get confused about which ufunc was setting what? The flags are only checked after each 1-d loop. If another thread set the processor flag, the current thread could get very confused. This seems like a problem that I'm not sure how to handle. > > It's not entirely plucked out of the error. As I recall, the decision > was arrived at something likes this: > > 1. Errors should never pass silently (unless explicitly silenced). > 2. Let's have everything raise by default > 3. In practice this was no good because you often wanted to look at > the results and see where the problem was. > 4. OK, let's have everything warn > 5. This almost worked, but underflow was almost never a real error, > so everyone always overrode underflow. A default that you always > need to override is not a good default. > 6. So, warn for everything except underflow. Ignore that. > > And that's where numarry is today. I and other have been using that > error system happily for quite some time now. At least I haven't heard > any complaints for quite a while. I can appreciate this choice, but I don't agree that errors should never pass silently. The fact that people disagree about this is the reason for the error handling. 
Note that overflow is not detected everywhere for integers --- we have to simulate the floating-point errors for them. Only on integer multiply is it detected. Checking for it would slow down all other integer arithmetic --- one solution, of course is to have two different integer additions (one that checks for overflow and another that doesn't). There is really a bit of work left here to do. Best, -Travis From tim.hochberg at cox.net Sat Apr 1 14:01:04 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sat Apr 1 14:01:04 2006 Subject: [Numpy-discussion] numpy error handling In-Reply-To: <442EE026.8060806@ieee.org> References: <442DE773.4060104@cox.net> <442E2F05.5080809@ieee.org> <442E94AD.1040200@cox.net> <442EE026.8060806@ieee.org> Message-ID: <442EF7D9.9010404@cox.net> Travis Oliphant wrote: > Tim Hochberg wrote: > >>> >>> You can get the numarray approach back simply by setting the error >>> in the builtin scope (instead of in the local scope which is done by >>> default. >> >> >> I saw that you could set it at different levels, but missed the >> implications. However, it's still missing one feature, thread local >> storage. I would argue that the __builtin__ data should actually be >> stored in threading.local() instead of __builtin__. Then you could >> setup an equivalent stack system to numpy's. > > Yes, the per-thread storage escaped me. But, threading.local() only > exists in Python 2.4 and NumPy is supposed to be compatible with > Python 2.3 > > What about PyThreadState_GetDict() ? and then default to use the > builtin dictionary if this returns NULL? That sounds reasonable. I've never used that, but the name sounds promising! > I'm actually not particularly enthused about the three name-space > lookups. Changing it to only 1 place to look may be better. It > would require a setting and restoring operation. A stack could be > used, but why not just use local variables (i.e. > save = numpy.seterr(dividebyzero='warn') > > ... > > numpy.seterr(restore=save) That would work as well, I think. It gets a little hairy if you want to set error nestedly in a single function, but I've never done that, so I'm not too worried about it. Besides, what I really want to support is 'with', which I imagine we can support using the above as a base. >> I've used the numarray error handling stuff for some time. My >> experience with it has led me to the following conclusions: >> >> 1. You don't use it that often. I have about 26 KLOC that's "active" >> and in that I use pushMode just 15 times. For comparison, I use >> asarray a tad over 100 times. >> 2. pushMode and popMode, modulo spelling, is the way to set errors. >> Once the with statement is around, that will be even better. >> 3. I, personally, would be very unlikely to use the local and global >> error handling, I'd just as soon see them go away, particularly if >> it helps performance, but I won't lobby for it. >> > > This is good feedback. I have almost zero experience with changing > the error handling. So, I'm not sure what features are desireable. > Eliminating unnecessary name-lookups is usually a good thing. I hope some of the other numarray users chime in. A sample of one is not very good data! >> In numarray, the stack is in the numarray module itself (actually in >> the Error object). They base their threading local behaviour off of >> thread.get_ident, not threading.local. That's not clunky at all, >> although it's arguably wrong since thread.get_ident can reuse ids >> from dead threads. 
In practice it's probably hard to get into trouble >> doing this, but I still wouldn't emulate it. I think that this was >> written before thread local storage, so it was probably the best that >> could be done. > > > Right, but thread local storage is still Python 2.4 only.... > > What about PyThreadState_GetDict() ? That sounds reasonable. Essentially we would be rolling our own threading.local() >> >> However, if you use threading.local, it will be clunky in a similar >> sense. You'll be storing data in a global namespace you don't >> control and you've got to hope that no one stomps on your variable name. > > The PyThreadState_GetDict() documenation states that extension module > writers should use a unique name based on their extension module. > >> When you have local and module level secret storage names as well >> you're just doing a lot more of that and the chance of collision and >> confusion goes up from almost zero to very small. > > This is true. Similar to the C-variable naming issues. > >>> So, we should at least frame the discussion in terms of what is >>> actually possible. >> >> >> Yes, sorry for spreading misinformation. > > > But you did point out the very important thread-local storage fact > that I had missed. This alone makes me willing to revamp what we are > doing. > >> >> In this case, overflow, underflow and dividebyzero seem pretty self >> documenting to me. And 'invalid' is pretty cryptic in both >> implementations. This may be a matter of taste, but I tend to prefer >> short pithy names for functions that I use a lot, or that crammed a >> bunch to a line. In functions like this, that are more rarely used >> and get a full line to themselves I lean to towards the more verbose. > > > The rarely-used factor is a persuasive argument. > >> Can you elaborate on this a bit? Reading between the lines, there >> seem to be two issues related to speed here. One is the actual >> namespace lookup of the error mode -- there's a setting that says we >> are using the defaults, so don't bother to look. This saves the >> namespace lookup. Changing the defaults shouldn't affect the timing >> of that. I'm not sure how this would interact with thread local >> storage though. >> >> The second issue is that running the core loop with no checks in >> place is faster. > > Basically, on the C-level, the error mode is an integer with specific > bits allocated to the various error-possibilites (2-bits per > possibility). If this is 0 then the error checking is not even done > (thus no error handling at all). > Yes the name-lookup optimization could work with any defaults (but > with thread-specific storage couldn't work anyway). > > One question I have with threads and error handling though? Right > now, the ufuncs release the Python lock during computation (and > re-acquire it to do error handling if needed). If another ufunc was > started by another Python thread and ran with different error > handling, wouldn't the IEEE flags get confused about which ufunc was > setting what? The flags are only checked after each 1-d loop. If > another thread set the processor flag, the current thread could get > very confused. > > This seems like a problem that I'm not sure how to handle. Yeah, me either. It seems that somehow we'll need to block until all current operations are done, but I don't know how to do that off the top of my head. Perhaps ufuncs need to lock the flags when they start and release them when they finish. 
This looks feasible, but I'm not sure of the proper incantation to get this right. The ufuncs would all need to be able able to increment and decrement the lock, whatever it is, even though they are in different threads. Meanwhile the setting code should only be able to work when the lock is unheld. It's some sort of poly thread recursive lock thing. I'll think about it, perhaps there's an obvious way. >> >> It's not entirely plucked out of the error. As I recall, the decision >> was arrived at something likes this: >> >> 1. Errors should never pass silently (unless explicitly silenced). >> 2. Let's have everything raise by default >> 3. In practice this was no good because you often wanted to look at >> the results and see where the problem was. >> 4. OK, let's have everything warn >> 5. This almost worked, but underflow was almost never a real error, >> so everyone always overrode underflow. A default that you always >> need to override is not a good default. >> 6. So, warn for everything except underflow. Ignore that. >> >> And that's where numarry is today. I and other have been using that >> error system happily for quite some time now. At least I haven't >> heard any complaints for quite a while. > > > I can appreciate this choice, but I don't agree that errors should > never pass silently. You'll notice that we ended up with a slightly more nuanced choice. Besides, the full quote is import: "errors should not pass silently unless explicitly silenced". That's quite a bit different than a blanket error should never pass silently. > The fact that people disagree about this is the reason for the error > handling. Yes. While I like the above defaults, if we have a reasonable approach I can just set them at startup and forget about them. Let's try not to penalize me too much for that though. > Note that overflow is not detected everywhere for integers --- we have > to simulate the floating-point errors for them. Only on integer > multiply is it detected. Checking for it would slow down all other > integer arithmetic --- one solution, of course is to have two > different integer additions (one that checks for overflow and another > that doesn't). Or just document it and don't worry about it. If I'm doing integer arithmetic and I need overflow detection, I can generally cast to doubles and do my math there, casting back at the end as needed. This doesn't seem worth too much extra complication. Is my floating point bias showing? > There is really a bit of work left here to do. Yep. Looks like it, but nothing insurmountable. -tim From strawman at astraw.com Sat Apr 1 15:56:03 2006 From: strawman at astraw.com (Andrew Straw) Date: Sat Apr 1 15:56:03 2006 Subject: [Numpy-discussion] numpy error handling In-Reply-To: <442EF7D9.9010404@cox.net> References: <442DE773.4060104@cox.net> <442E2F05.5080809@ieee.org> <442E94AD.1040200@cox.net> <442EE026.8060806@ieee.org> <442EF7D9.9010404@cox.net> Message-ID: <442F130E.3060802@astraw.com> Tim Hochberg wrote: > Travis Oliphant wrote: > >> >> One question I have with threads and error handling though? Right >> now, the ufuncs release the Python lock during computation (and >> re-acquire it to do error handling if needed). If another ufunc was >> started by another Python thread and ran with different error >> handling, wouldn't the IEEE flags get confused about which ufunc was >> setting what? The flags are only checked after each 1-d loop. If >> another thread set the processor flag, the current thread could get >> very confused. 
>> >> This seems like a problem that I'm not sure how to handle. > > > Yeah, me either. It seems that somehow we'll need to block until all > current operations are done, but I don't know how to do that off the > top of my head. Perhaps ufuncs need to lock the flags when they start > and release them when they finish. This looks feasible, but I'm not > sure of the proper incantation to get this right. The ufuncs would all > need to be able able to increment and decrement the lock, whatever it > is, even though they are in different threads. Meanwhile the setting > code should only be able to work when the lock is unheld. It's some > sort of poly thread recursive lock thing. I'll think about it, perhaps > there's an obvious way. I am also absolutely no expert in this area, but isn't this exactly what the kernel supports multiple threads for? In other words, I'm not sure we have to worry about it at all. I expect that the kernel sets/restores the CPU/FPU error flags on thread switches and this is part of the cost associated with switching threads. As I understand it, linux threads are actually implemented as new processes, so if we did have to be worried about this, wouldn't we also have to be worried that program A might alter the FPU error state while we're also using program B? This is just my unsophisticated and possibly wrong understanding of these things. If anyone can help clarify the issue, I'd be glad to be enlightened. Cheers! Andrew From aisaac at american.edu Sat Apr 1 16:12:01 2006 From: aisaac at american.edu (Alan G Isaac) Date: Sat Apr 1 16:12:01 2006 Subject: [Numpy-discussion] extension to xrange for numpy In-Reply-To: References: Message-ID: On Sat, 1 Apr 2006, (CEST) Arnd Baecker apparently wrote: > one python command which is extremely useful in 1D > situations is `xrange`. Which will very soon be 'range'. Cheers, Alan Isaac From gruben at bigpond.net.au Sat Apr 1 18:46:07 2006 From: gruben at bigpond.net.au (Gary Ruben) Date: Sat Apr 1 18:46:07 2006 Subject: [Numpy-discussion] extension to xrange for numpy In-Reply-To: References: Message-ID: <442F41AE.1080806@bigpond.net.au> A few rough thoughts: I'm a bit ambivalent about this. It's not very n-dimensional and enforces an x,y,z,(t?) ordering of the array dimensions which some programmers may not want to adhere to. On the occasions I've had to write code which loops over multiple dimensions, I've found the python cookbook routines for permutation and combination generators really useful so I'd find some sort of numpy iterator equivalents of these more useful. This would allow list comprehensions like [f(x,y,z) for (x,y,z) in ndrange(10,10,10)] It would also be good to have it able to specify the rank of the object returned to allow whole array rows or matrices to be returned i.e. array slices. Maybe the ndrange function could allow something like [f(xy,z) for (xy,z) in ndrange((10,0,1),10)] where you use a tuple to specify a range and the axes to slice out. [f(x,yz) for (x,yz) in ndrange(10,(10,1,2))] [f(xz,y) for (xz,y) in ndrange((10,0,2),(10,1))] On the other hand your idea would potentially make some code a lot easier to understand, so I'm not against it and if it was picked up, I'd propose "t" or "w" for the 4th dimension. It might help to post some code that you think might benefit from your idea. Gary R. Arnd Baecker wrote: > Dear numpy enthusiasts, > > one python command which is extremely useful in 1D situations > is `xrange`. 
However, for higher dimensional > settings we strongly lack the commands `yrange` and `zrange`. > These could be shorthands for the corresponding > constructs with `:,NewAxis` added. > > Any comments, suggestion and even implementations are very welcome, > > Arnd > > P.S.: What I am not sure about is the right command for > the 4-dimensional case - which letter should be used after the "z"? > (it seems that "a" would be a very natural choice...) From rob at hooft.net Sat Apr 1 22:38:04 2006 From: rob at hooft.net (Rob Hooft) Date: Sat Apr 1 22:38:04 2006 Subject: [Numpy-discussion] numpy error handling In-Reply-To: <442EE026.8060806@ieee.org> References: <442DE773.4060104@cox.net> <442E2F05.5080809@ieee.org> <442E94AD.1040200@cox.net> <442EE026.8060806@ieee.org> Message-ID: <442F7114.40908@hooft.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Travis Oliphant wrote: | save = numpy.seterr(dividebyzero='warn') | | ... | | numpy.seterr(restore=save) Most of this discussion is outside of my scope, but I have programmed this kind of pattern in a different way before: ~ save = context.push(something) ~ ... ~ del save i.e. the destructor of the saved context object restores the old situation. In most cases it will be called by letting "save" go out of scope. I know that relying on timely object destruction can be troublesome when porting to Jython, but it is very convenient in CPython. If that goes too far, one could make a separate method on save: ~ save.pop() This can do sanity checking too (are we really at the top of the stack? Only called once?). The destructor should check whether pop has been called. Rob - -- Rob W.W. Hooft || rob at hooft.net || http://www.hooft.net/people/rob/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFEL3EUH7J/Cv8rb3QRAuvsAJ9PO6ZITdVSm+hIwxkWDHHbTNFHdQCcDSWI Iv7gupkFc8+Fby/5MFwHQf4= =zE/o -----END PGP SIGNATURE----- From aisaac at american.edu Sun Apr 2 06:58:34 2006 From: aisaac at american.edu (Alan G Isaac) Date: Sun Apr 2 06:58:34 2006 Subject: [Numpy-discussion] extension to xrange for numpy In-Reply-To: <442F41AE.1080806@bigpond.net.au> References: <442F41AE.1080806@bigpond.net.au> Message-ID: On Sun, 02 Apr 2006, Gary Ruben apparently wrote: > I'd find some sort of numpy iterator equivalents of these more > useful. This would allow list comprehensions like > [f(x,y,z) for (x,y,z) in ndrange(10,10,10)] How is this better than using ogrid? E.g., >>> x=N.ogrid[:3,:2] >>> N.power(*x) array([[1, 0], [1, 1], [1, 2]]) Thanks, Alan From cjw at sympatico.ca Sun Apr 2 07:22:09 2006 From: cjw at sympatico.ca (Colin J. Williams) Date: Sun Apr 2 07:22:09 2006 Subject: [Numpy-discussion] first impressions with numpy In-Reply-To: <442DD638.60706@cox.net> References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net> <442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu> <442DD638.60706@cox.net> Message-ID: <442FDDD5.8050404@sympatico.ca> Tim Hochberg wrote: > Sebastian Haase wrote: > >> Thanks Tim, >> that's OK - I got the idea... >> BTW, is there a (policy) reason that you sent the first email just to >> me and not the mailing list !? > > > No. Just clumsy fingers. Probably the same reason the functions got > all garbled! > >> >> I would really be more interested in comments to my first point ;-) >> I think it's important that numpy will not be to cryptic and only for >> "hackers", but nice to look at ... 
(hope you get what I mean ;-) > > > Well, I think it's probably a good idea and it sounds like Travis like > the idea " for some of the builtin types". I suspect that's code for > "not types for which it doesn't make sense, like recarrays". > Tim, Could you elaborate on this please? Surely, it would be good for all functions and methods to have meaningful parameter lists and good doc strings. Colin W. From tim.hochberg at cox.net Sun Apr 2 08:11:17 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 2 08:11:17 2006 Subject: [Numpy-discussion] first impressions with numpy In-Reply-To: <442FDDD5.8050404@sympatico.ca> References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net> <442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu> <442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca> Message-ID: <442FE950.8090000@cox.net> Colin J. Williams wrote: > Tim Hochberg wrote: > >> Sebastian Haase wrote: >> >>> Thanks Tim, >>> that's OK - I got the idea... >>> BTW, is there a (policy) reason that you sent the first email just >>> to me and not the mailing list !? >> >> >> >> No. Just clumsy fingers. Probably the same reason the functions got >> all garbled! >> >>> >>> I would really be more interested in comments to my first point ;-) >>> I think it's important that numpy will not be to cryptic and only >>> for "hackers", but nice to look at ... (hope you get what I mean ;-) >> >> >> >> Well, I think it's probably a good idea and it sounds like Travis >> like the idea " for some of the builtin types". I suspect that's code >> for "not types for which it doesn't make sense, like recarrays". >> > Tim, > > Could you elaborate on this please? Surely, it would be good for all > functions and methods to have meaningful parameter lists and good doc > strings. This isn't really about parameter lists and docstrings, it's about __str__ and possibly __repr__. The basic issue is that the way dtypes are displayed is powerful, but unfriendly. If I create an array of integers: >>> a = arange(4) >>> print repr(a.dtype), str(a.dtype) dtype('i4') is not the same as dtype(int32) on my machine and should probably not be displayed using int32[1]. These cases should be rare in practice and it seems fine to fall back to the less friendly but more flexible notation. Recarrays were probably not such a good example. Here is an example from a recarray: dtype([('x', 'i4').name is 'int32' which seems wrong. From tim.hochberg at cox.net Sun Apr 2 08:41:24 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 2 08:41:24 2006 Subject: [Numpy-discussion] numpy error handling In-Reply-To: <442F7114.40908@hooft.net> References: <442DE773.4060104@cox.net> <442E2F05.5080809@ieee.org> <442E94AD.1040200@cox.net> <442EE026.8060806@ieee.org> <442F7114.40908@hooft.net> Message-ID: <442FF03F.2000406@cox.net> Rob Hooft wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Travis Oliphant wrote: > | save = numpy.seterr(dividebyzero='warn') > | > | ... > | > | numpy.seterr(restore=save) > > Most of this discussion is outside of my scope, but I have programmed > this kind of pattern in a different way before: > > ~ save = context.push(something) > ~ ... > ~ del save > > i.e. the destructor of the saved context object restores the old > situation. In most cases it will be called by letting "save" go out of > scope. I know that relying on timely object destruction can be > troublesome when porting to Jython, but it is very convenient in CPython. 
> > If that goes too far, one could make a separate method on save: > > ~ save.pop() > > This can do sanity checking too (are we really at the top of the stack? > Only called once?). The destructor should check whether pop has been > called. Well, the syntax that *I* really want is this: class error_mode(object): def __init__(self, all=None, overflow=None, underflow=None, dividebyzero=None, invalid=None): self._args = (overflow, overflow, underflow, dividebyzero, invalid) def __enter__(self): self._save = numpy.seterr(*self._args) def __exit__(self): numpy.seterr(self._save) That way, in a few months, I can do this: with error_mode(overflow='raise'): # do stuff and it will be almost impossible to mess up. This syntax is lighter and cleaner than a stack or relying on garbage collection to free the resources. So, for my purposes, the simple syntax Travis proposes is perfectly adequate and simpler to implement and get right than a stack based approach. If 'with' wasn't coming down the pipe, I would push for a stack, but I like Travis' proposal just fine. YMMV of course. -tim From tim.hochberg at cox.net Sun Apr 2 08:52:09 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 2 08:52:09 2006 Subject: [Numpy-discussion] observations Message-ID: <442FF2F8.3030906@cox.net> I've been doing a *lot* of playing with numpy over the last several days, so expect various observations to trickle from my abode over the next week or so. Here's the first installment. * tostring probably needs the order flag. I think you want the string generated from a multidimensional array in Fortran and C order to differ. * With the evolution of the order flag, ascontiguousarray is probably redundant, scarcely after it was added. b = asarray(a, order="C") Is actually clearer in intent than: b = ascontiguousarray(a) Does the latter leave a contiguous, Fortran order array alone? That's probably almost never what one wants. Unless your working with Fortran arrays, in which case the opposite ambiguity applies. Regards, -tim From tim.hochberg at cox.net Sun Apr 2 11:20:03 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 2 11:20:03 2006 Subject: [Numpy-discussion] extension to xrange for numpy In-Reply-To: <442F41AE.1080806@bigpond.net.au> References: <442F41AE.1080806@bigpond.net.au> Message-ID: <44301590.4050707@cox.net> Gary Ruben wrote: > A few rough thoughts: > > I'm a bit ambivalent about this. It's not very n-dimensional and > enforces an x,y,z,(t?) ordering of the array dimensions which some > programmers may not want to adhere to. On the occasions I've had to > write code which loops over multiple dimensions, I've found the python > cookbook routines for permutation and combination generators really > useful > > > > > so I'd find some sort of numpy iterator equivalents of these more > useful. This would allow list comprehensions like > > [f(x,y,z) for (x,y,z) in ndrange(10,10,10)] > > It would also be good to have it able to specify the rank of the > object returned to allow whole array rows or matrices to be returned > i.e. array slices. Maybe the ndrange function could allow something like > > [f(xy,z) for (xy,z) in ndrange((10,0,1),10)] > where you use a tuple to specify a range and the axes to slice out. > [f(x,yz) for (x,yz) in ndrange(10,(10,1,2))] > [f(xz,y) for (xz,y) in ndrange((10,0,2),(10,1))] > > On the other hand your idea would potentially make some code a lot > easier to understand, so I'm not against it and if it was picked up, > I'd propose "t" or "w" for the 4th dimension. 
It might help to post > some code that you think might benefit from your idea. Bah, humbug! "Not every two-line Python function has to come pre-written" -- Tim Peters on C.L.P def xrange(*args, **kwargs): return arange(*args, **kwargs) def yrange(*args, **kwargs): return padshape(arange(*args, **kwargs), 2) def zrange(*args, **kwargs): return padshape(arange(*args, **kwargs), 3) def trange(*args, **kwargs): return padshape(arange(*args, **kwargs), 4) Of course, then you need padshape which I'd be happy to contribute. I'm of the opinion that we should be trying to improve the usefullness of a smallish set of core primitives, not adding endless new functions. Stuff like this, which is of interest in a relatively limited domain and is trivial to implement when needed, should either not be added at all, or added in a separate module. >>> len(dir(numpy)) 476 Does anyone know what all of that does? I certainly don't. And I doubt anyone uses more than a fraction of that interface. I wouldn't be the least bit suprised if there are old moldy parts of that are essentially used. And, unused code is buggy code in my experience. "Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away." -- Antoine de Saint-Exupery It's probably difficult at this point in numpy's life cycle to remove stuff or even reorganize things substantially. Besides, I'm sure all the developers have their hands full doing more important, or at least less contentious, things. Still, I think we should cast a more critical eye on new stuff before adding it. Regards, -tim > > Gary R. > > Arnd Baecker wrote: > >> Dear numpy enthusiasts, >> >> one python command which is extremely useful in 1D situations >> is `xrange`. However, for higher dimensional >> settings we strongly lack the commands `yrange` and `zrange`. >> These could be shorthands for the corresponding >> constructs with `:,NewAxis` added. >> >> Any comments, suggestion and even implementations are very welcome, >> >> Arnd >> >> P.S.: What I am not sure about is the right command for >> the 4-dimensional case - which letter should be used after the "z"? >> (it seems that "a" would be a very natural choice...) > > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > that extends applications into web and mobile media. Attend the live > webcast > and join the prime developer group breaking into this new coding > territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > From arnd.baecker at web.de Sun Apr 2 11:23:04 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Sun Apr 2 11:23:04 2006 Subject: [Numpy-discussion] extension to xrange for numpy In-Reply-To: <442F41AE.1080806@bigpond.net.au> References: <442F41AE.1080806@bigpond.net.au> Message-ID: Hi, On Sun, 2 Apr 2006, Gary Ruben wrote: > A few rough thoughts: [... useful stuff snipped ... ] > On the other hand your idea would potentially make some code a lot > easier to understand, so I'm not against it and if it was picked up, I'd > propose "t" or "w" for the 4th dimension. It might help to post some > code that you think might benefit from your idea. Hope you don't jump at me, but I would like to wait until April 1st next year then ... 
((hmm, maybe my post contained too much of a possible truth to be considered as an April fools joke - yrange and zrange have been a running gag in our group for a while now - strange German humor ...;-)) Anyway, I hope I did not waste too much of your time ... Best, Arnd > Gary R. > > Arnd Baecker wrote: > > Dear numpy enthusiasts, > > > > one python command which is extremely useful in 1D situations > > is `xrange`. However, for higher dimensional > > settings we strongly lack the commands `yrange` and `zrange`. > > These could be shorthands for the corresponding > > constructs with `:,NewAxis` added. > > > > Any comments, suggestion and even implementations are very welcome, > > > > Arnd > > > > P.S.: What I am not sure about is the right command for > > the 4-dimensional case - which letter should be used after the "z"? > > (it seems that "a" would be a very natural choice...) > > From tim.hochberg at cox.net Sun Apr 2 11:34:03 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 2 11:34:03 2006 Subject: [Numpy-discussion] extension to xrange for numpy In-Reply-To: References: <442F41AE.1080806@bigpond.net.au> Message-ID: <44301908.2000607@cox.net> Arnd Baecker wrote: >Hi, > >On Sun, 2 Apr 2006, Gary Ruben wrote: > > > >>A few rough thoughts: >> >> > >[... useful stuff snipped ... ] > > > >>On the other hand your idea would potentially make some code a lot >>easier to understand, so I'm not against it and if it was picked up, I'd >>propose "t" or "w" for the 4th dimension. It might help to post some >>code that you think might benefit from your idea. >> >> > >Hope you don't jump at me, but I would like to >wait until April 1st next year then ... >((hmm, maybe my post contained too much of a possible truth >to be considered as an April fools joke - >yrange and zrange have been a running gag in our group for >a while now - strange German humor ...;-)) > >Anyway, I hope I did not waste too much of your time ... > > Ouch! Got me anyway... >Best, Arnd > > > > >>Gary R. >> >>Arnd Baecker wrote: >> >> >>>Dear numpy enthusiasts, >>> >>>one python command which is extremely useful in 1D situations >>>is `xrange`. However, for higher dimensional >>>settings we strongly lack the commands `yrange` and `zrange`. >>>These could be shorthands for the corresponding >>>constructs with `:,NewAxis` added. >>> >>>Any comments, suggestion and even implementations are very welcome, >>> >>>Arnd >>> >>>P.S.: What I am not sure about is the right command for >>>the 4-dimensional case - which letter should be used after the "z"? >>>(it seems that "a" would be a very natural choice...) >>> >>> >> >> > > >------------------------------------------------------- >This SF.Net email is sponsored by xPML, a groundbreaking scripting language >that extends applications into web and mobile media. Attend the live webcast >and join the prime developer group breaking into this new coding territory! 
From schofield at ftw.at  Sun Apr 2 13:05:02 2006
From: schofield at ftw.at (Ed Schofield)
Date: Sun Apr 2 13:05:02 2006
Subject: [Numpy-discussion] Deprecating old names
In-Reply-To: <44301590.4050707@cox.net>
References: <442F41AE.1080806@bigpond.net.au> <44301590.4050707@cox.net>
Message-ID: <44302EA9.9050302@ftw.at>

Tim Hochberg wrote, in a different thread:

> >>> len(dir(numpy))
> 476
>
> Does anyone know what all of that does? I certainly don't. And I doubt
> anyone uses more than a fraction of that interface. I wouldn't be the
> least bit surprised if there are old moldy parts of it that are
> essentially unused. And, unused code is buggy code in my experience.
>
> "Perfection is achieved, not when there is nothing more to add, but
> when there is nothing left to take away." -- Antoine de Saint-Exupery

I'd like to revise a proposal I made last week. Then I proposed that we
reduce namespace clutter by not importing the contents of the oldnumeric
namespace by default. But Travis didn't want to deprecate the functional
interfaces (sum(), take(), etc), so I now propose instead that we split up
the contents of oldnumeric.py into interfaces we want to keep around
indefinitely and interfaces we don't. The ones we want to keep could go
into another file, e.g. fromnumeric.py, whose contents are imported into
the numpy namespace by default. The deprecated ones could stay in
oldnumeric.py, and could be accessible through 'from oldnumeric import *'
at the top of source files, but not imported by default.

Strong candidates for deprecation are the capitalised type names, like
Int8, Complex64, UnsignedInt. I'd also argue for deprecating NewAxis,
UFuncType, ArrayType, arraytype, and anything else that duplicates
functionality available under NumPy under a different name.

Two of the Python design principles (from
http://www.python.org/dev/culture/) are:

- There should be one -- and preferably only one -- obvious way to do it.
- Namespaces are one honking great idea -- let's do more of those!

Let's clean up the cruft!

-- Ed

From gruben at bigpond.net.au  Sun Apr 2 16:06:10 2006
From: gruben at bigpond.net.au (Gary Ruben)
Date: Sun Apr 2 16:06:10 2006
Subject: [Numpy-discussion] extension to xrange for numpy
In-Reply-To:
References: <442F41AE.1080806@bigpond.net.au>
Message-ID: <443058AE.2070808@bigpond.net.au>

Doh! It's OK Arnd; I've recently seen you (or someone else with the same
name) acknowledged in a PhD I've been reading so I suspect you're a nice
guy :-)

And, thanks Alan. I knew about mgrid but not ogrid. One small way in which
that example might be better than using ogrid is that you could avoid
creating the index arrays and lazily generate the indices. However, ogrid
is better than mgrid in this respect.

thanks,
Gary

Alan G Isaac wrote:
> On Sun, 02 Apr 2006, Gary Ruben apparently wrote:
>> I'd find some sort of numpy iterator equivalents of these more
>> useful. This would allow list comprehensions like
>> [f(x,y,z) for (x,y,z) in ndrange(10,10,10)]
>
> How is this better than using ogrid? E.g.,
>
>>>> x=N.ogrid[:3,:2]
>>>> N.power(*x)
> array([[1, 0],
>        [1, 1],
>        [1, 2]])
>
> Thanks,
> Alan
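The ndrange Gary describes is not a numpy builtin (today's numpy ships numpy.ndindex, which provides essentially this iteration), but a minimal pure-Python generator along those lines might be:

    def ndrange(*shape):
        # Yield index tuples in C (row-major) order, exactly as nested
        # for-loops over range(shape[0]), range(shape[1]), ... would.
        if not shape:
            yield ()
        else:
            for head in range(shape[0]):
                for tail in ndrange(*shape[1:]):
                    yield (head,) + tail

This makes [f(x, y, z) for (x, y, z) in ndrange(10, 10, 10)] run without materializing any index arrays, at the cost of looping in Python rather than in C.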
From zpincus at stanford.edu  Sun Apr 2 16:07:07 2006
From: zpincus at stanford.edu (Zachary Pincus)
Date: Sun Apr 2 16:07:07 2006
Subject: [Numpy-discussion] Speed up function on cross product of two sets?
Message-ID:

Hi folks,

I have an inner loop that looks like this:

out = []
for elem1 in l1:
    for elem2 in l2:
        out.append(do_something(l1, l2))
result = do_something_else(out)

where do_something and do_something_else are implemented with only numpy
ufuncs, and l1 and l2 are numpy arrays.

As an example, I need to compute the median distance from any element in
one set to any element in another set.

What's the best way to speed this sort of thing up with numpy (e.g. push
as much down into the underlying C as possible)? I could re-write
do_something with the numexpr tools (which are very cool), but that
doesn't address the fact that I've still got nested loops living in
Python.

Perhaps there's some way in numpy to make one big honking array that
contains all the pairs from the two lists, and then just run my
do_something on that huge array, but that of course scales poorly.

Any thoughts?

Zach

From tim.hochberg at cox.net  Sun Apr 2 16:53:05 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Sun Apr 2 16:53:05 2006
Subject: [Numpy-discussion] Speed up function on cross product of two sets?
In-Reply-To:
References:
Message-ID: <443063C0.3050002@cox.net>

Zachary Pincus wrote:

> Hi folks,

Hi Zach,

> I have an inner loop that looks like this:
> out = []
> for elem1 in l1:
>     for elem2 in l2:
>         out.append(do_something(l1, l2))

this is do_something(elem1, elem2), correct?

> result = do_something_else(out)
>
> where do_something and do_something_else are implemented with only
> numpy ufuncs, and l1 and l2 are numpy arrays.
>
> As an example, I need to compute the median distance from any element
> in one set to any element in another set.
>
> What's the best way to speed this sort of thing up with numpy (e.g.
> push as much down into the underlying C as possible)? I could re-write
> do_something with the numexpr tools (which are very cool), but that
> doesn't address the fact that I've still got nested loops living
> in Python.

The exact approach I'd take would depend on the sizes of l1 and l2 and a
certain amount of trial and error. However, the first thing I'd try is:

n1 = len(l1)
n2 = len(l2)
out = numpy.zeros([n1*n2], appropriate_dtype)
for i, elem1 in enumerate(l1):
    out[i*n2:(i+1)*n2] = do_something(elem1, l2)
result = do_something_else(out)

That may work as is, or you may have to tweak do_something slightly to
handle l2 correctly. You might also try to do the operations in place and
stuff the results into out directly by using X= and three argument ufuncs.
I'd not do that at first though.

One thing to consider is that, in my experience, numpy works best on
chunks of about 10,000 elements. I believe that this is a function of
cache size. Anyway, this may affect the choice of which of l1 and l2 you
continue to loop over, and which you vectorize. If they both might get
really big, you could even consider chopping up l1 when you vectorize it.
Again I wouldn't do that unless it really looks like you need it.

If that all sounds opaque, feel free to ask more questions. Or if you have
questions about microoptimizing the guts of do_something, I have a bunch
of experience with that and I like a good puzzle.

> Perhaps there's some way in numpy to make one big honking array that
> contains all the pairs from the two lists, and then just run my
> do_something on that huge array, but that of course scales poorly.

I know of at least one way, but it's a bit of a kludge. I don't think I'd
try that though. As you said, it scales poorly. As long as you can
vectorize your inner loop, it's not necessary, and sometimes makes things
worse, to vectorize your outer loop as well. That's assuming your inner
loop is large; it doesn't help if your inner loop is 3 elements long for
instance, but that doesn't seem like it should be a problem here.

Regards,

-tim
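Applied to Zach's concrete example, the median distance between two sets, Tim's recipe might come out like the following sketch, assuming 1-d arrays and absolute difference as the distance; median_cross_distance is a made-up name, and numpy.median is assumed available:

    import numpy

    def median_cross_distance(l1, l2):
        # Vectorize the inner loop over l2; keep the cheap Python loop over l1.
        n1, n2 = len(l1), len(l2)
        out = numpy.empty(n1 * n2)
        for i, elem1 in enumerate(l1):
            out[i*n2:(i+1)*n2] = numpy.abs(elem1 - l2)
        return numpy.median(out)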
From haase at msg.ucsf.edu  Sun Apr 2 17:09:07 2006
From: haase at msg.ucsf.edu (Sebastian Haase)
Date: Sun Apr 2 17:09:07 2006
Subject: [Numpy-discussion] first impressions with numpy
In-Reply-To: <442FE950.8090000@cox.net>
References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net>
	<442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu>
	<442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca>
	<442FE950.8090000@cox.net>
Message-ID: <44306774.5030507@msg.ucsf.edu>

Tim Hochberg wrote:

> This would work fine if repr were instead:
>
> dtype([('x', float64), ('z', complex128)])
>
> Anyway, this all seems reasonable to me at first glance. That said, I
> don't plan to work on this, I've got other fish to fry at the moment.

A new point: Please remind me (and probably others): when did it get
decided to introduce 'complex128' to mean numarray's complex64 and the
'complex64' to mean numarray's complex32 ?

I do understand the logic that 128 is really the bit-size of one (complex)
element - but I also liked the old way, because:

1. e.g. in fft transforms, float32 would "go with" complex32
   and float64 with complex64
2. complex128 is one character extra (longer) and also (alphabetically)
   now sorts before(!) complex64
3. Mostly of course: this new naming will confuse all my code and
   introduce hard to find bugs - when I see complex64 I will "think" the
   old way for quite some time ...

These might just be my personal (idiotic ;-) comments - but I would
appreciate some feedback/comments. Also: Is it now too late to (re-)start
a discussion on this !?

Thanks
- Sebastian Haase

From zpincus at stanford.edu  Sun Apr 2 17:17:06 2006
From: zpincus at stanford.edu (Zachary Pincus)
Date: Sun Apr 2 17:17:06 2006
Subject: [Numpy-discussion] Speed up function on cross product of two sets?
In-Reply-To: <443063C0.3050002@cox.net>
References: <443063C0.3050002@cox.net>
Message-ID: <8717BF2C-01EB-422A-B63A-7807E607DEE9@stanford.edu>

Tim -

Thanks for your suggestions -- that all makes good sense.

It sounds like the general take home message is, as always: "the first
thing to try is to vectorize your inner loop."

Zach

>> I have an inner loop that looks like this:
>> out = []
>> for elem1 in l1:
>>     for elem2 in l2:
>>         out.append(do_something(l1, l2))
>
> this is do_something(elem1, elem2), correct?
>
>> result = do_something_else(out)
>>
>> where do_something and do_something_else are implemented with
>> only numpy ufuncs, and l1 and l2 are numpy arrays.
>>
>> As an example, I need to compute the median distance from any
>> element in one set to any element in another set.
>>
>> What's the best way to speed this sort of thing up with numpy
>> (e.g. push as much down into the underlying C as possible)? I
>> could re-write do_something with the numexpr tools (which are
>> very cool), but that doesn't address the fact that I've still got
>> nested loops living in Python.
>
> The exact approach I'd take would depend on the sizes of l1 and l2
> and a certain amount of trial and error. However, the first thing
> I'd try is:
>
> n1 = len(l1)
> n2 = len(l2)
> out = numpy.zeros([n1*n2], appropriate_dtype)
> for i, elem1 in enumerate(l1):
>     out[i*n2:(i+1)*n2] = do_something(elem1, l2)
> result = do_something_else(out)
>
> That may work as is, or you may have to tweak do_something slightly
> to handle l2 correctly. You might also try to do the operations in
> place and stuff the results into out directly by using X= and three
> argument ufuncs. I'd not do that at first though.
>
> One thing to consider is that, in my experience, numpy works best
> on chunks of about 10,000 elements. I believe that this is a
> function of cache size. Anyway, this may affect the choice of which
> of l1 and l2 you continue to loop over, and which you vectorize. If
> they both might get really big, you could even consider chopping up
> l1 when you vectorize it. Again I wouldn't do that unless it really
> looks like you need it.
>
> If that all sounds opaque, feel free to ask more questions. Or if
> you have questions about microoptimizing the guts of do_something,
> I have a bunch of experience with that and I like a good puzzle.
>
>> Perhaps there's some way in numpy to make one big honking array
>> that contains all the pairs from the two lists, and then just run
>> my do_something on that huge array, but that of course scales
>> poorly.
>
> I know of at least one way, but it's a bit of a kludge. I don't
> think I'd try that though. As you said, it scales poorly. As long
> as you can vectorize your inner loop, it's not necessary, and
> sometimes makes things worse, to vectorize your outer loop as well.
> That's assuming your inner loop is large; it doesn't help if your
> inner loop is 3 elements long for instance, but that doesn't seem
> like it should be a problem here.
>
> Regards,
>
> -tim

From haase at msg.ucsf.edu  Sun Apr 2 17:21:14 2006
From: haase at msg.ucsf.edu (Sebastian Haase)
Date: Sun Apr 2 17:21:14 2006
Subject: [Fwd: Re: [Numpy-discussion] first impressions with numpy]
Message-ID: <44306A2C.4040606@msg.ucsf.edu>

supposedly meant for the whole list ...

From: Tim Hochberg

Sebastian Haase wrote:
> Tim Hochberg wrote:
>
>> This would work fine if repr were instead:
>>
>> dtype([('x', float64), ('z', complex128)])
>>
>> Anyway, this all seems reasonable to me at first glance. That said, I
>> don't plan to work on this, I've got other fish to fry at the moment.
>
> A new point: Please remind me (and probably others): when did it get
> decided to introduce 'complex128' to mean numarray's complex64
> and the 'complex64' to mean numarray's complex32 ?

I haven't the faintest idea -- it happened when I was off in Numarray land
I assume. Or it was always that way? No idea. Hopefully Travis will answer
this.

-tim

> I do understand the logic that 128 is really the bit-size of one
> (complex) element - but I also liked the old way, because:
> 1. e.g. in fft transforms, float32 would "go with" complex32
>    and float64 with complex64
> 2. complex128 is one character extra (longer) and also
>    (alphabetically) now sorts before(!) complex64
> 3. Mostly of course: this new naming will confuse all my code and
>    introduce hard to find bugs - when I see complex64 I will "think"
>    the old way for quite some time ...
>
> These might just be my personal (idiotic ;-) comments - but I would
> appreciate some feedback/comments.
> Also: Is it now too late to (re-)start a discussion on this !?
>
> Thanks
> - Sebastian Haase

From tim.hochberg at cox.net  Sun Apr 2 17:53:01 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Sun Apr 2 17:53:01 2006
Subject: [Numpy-discussion] Speed up function on cross product of two sets?
In-Reply-To: <8717BF2C-01EB-422A-B63A-7807E607DEE9@stanford.edu>
References: <443063C0.3050002@cox.net>
	<8717BF2C-01EB-422A-B63A-7807E607DEE9@stanford.edu>
Message-ID: <443071BA.4090606@cox.net>

Zachary Pincus wrote:
> Tim -
>
> Thanks for your suggestions -- that all makes good sense.
>
> It sounds like the general take home message is, as always: "the
> first thing to try is to vectorize your inner loop."

Exactly and far more pithy than my meanderings. If I were going to make a
list it would look something like:

0. Think about your algorithm.
1. Vectorize your inner loop.
2. Eliminate temporaries.
3. Ask for help.
4. Recode in C.
5. Accept that your code will never be fast.

Step zero should probably be repeated after every other step ;)

-tim

> Zach
>
>>> I have an inner loop that looks like this:
>>> out = []
>>> for elem1 in l1:
>>>     for elem2 in l2:
>>>         out.append(do_something(l1, l2))
>>
>> this is do_something(elem1, elem2), correct?
>>
>>> result = do_something_else(out)
>>>
>>> where do_something and do_something_else are implemented with only
>>> numpy ufuncs, and l1 and l2 are numpy arrays.
>>>
>>> As an example, I need to compute the median distance from any
>>> element in one set to any element in another set.
>>>
>>> What's the best way to speed this sort of thing up with numpy
>>> (e.g. push as much down into the underlying C as possible)? I
>>> could re-write do_something with the numexpr tools (which are very
>>> cool), but that doesn't address the fact that I've still got
>>> nested loops living in Python.
>> >> >> The exact approach I'd take would depend on the sizes of l1 and l2 >> and a certain amount of trial and error. However, the first thing >> I'd try is: >> >> n1 = len(l1) >> n2 = len(l2) >> out = numpy.zeros([n1*n2], appropriate_dtype) >> for i, elem1 in enumerate(l1): >> out[i*n2:(i+1)*n2] = do_something(elem1, l1) >> result = do_something_else(out) >> >> That may work as is, or you may have to tweak do_something slightly >> to handle l1 correctly. You might also try to do the operations in >> place and stuff the results into out directly by using X= and three >> argument ufuncs. I'd not do that at first though. >> >> One thing to consider is that, in my experience, numpy works best on >> chunks of about 10,000 elements. I believe that this is a function >> of cache size. Anyway, this may choice of which of l1 and l2 you >> continue to loop over, and which you vectorize. If they both might >> get really big, you could even consider chopping up l1 when you >> vectorize it. Again I wouldn't do that unless it really looks like >> you need it. >> >> If that all sounds opaque, feel free to ask more questions. Or if >> you have questions about microoptimizing the guts of do_something, I >> have a bunch of experience with that and I like a good puzzle. >> >>> >>> Perhaps there's some way in numpy to make one big honking array >>> that contains all the pairs from the two lists, and then just run >>> my do_something on that huge array, but that of course scales poorly. >> >> >> I know of at least one way, but it's a bit of a kludge. I don't >> think I'd try that though. As you said, it scales poorly. As long >> as you can vectorize your inner loop, it's not necessary and >> sometimes makes things worse, to vectorize your outer loop as well. >> That's assuming your inner loop is large, it doesn't help if your >> inner loop is 3 elements long for instance, but that doesn't seem >> like it should be a problem here. >> >> Regards, >> >> -tim >> > > > From oliphant.travis at ieee.org Sun Apr 2 21:14:01 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sun Apr 2 21:14:01 2006 Subject: [Numpy-discussion] Deprecating old names In-Reply-To: <44302EA9.9050302@ftw.at> References: <442F41AE.1080806@bigpond.net.au> <44301590.4050707@cox.net> <44302EA9.9050302@ftw.at> Message-ID: <4430A0BF.1080207@ieee.org> Ed Schofield wrote: > Tim Hochberg wrote, in a different thread: > >> >>> len(dir(numpy)) >> 476 >> >> Does anyone know what all of that does? I certainly don't. And I doubt >> anyone uses more than a fraction of that interface. I wouldn't be the >> least bit suprised if there are old moldy parts of that are >> essentially used. And, unused code is buggy code in my experience. >> >> "Perfection is achieved, not when there is nothing more to add, but >> when there is nothing left to take away." -- Antoine de Saint-Exupery >> > > I'd like to revise a proposal I made last week. Then I proposed that we > reduce namespace clutter by not importing the contents of the oldnumeric > namespace by default. But Travis didn't want to deprecate the > functional interfaces (sum(), take(), etc), so I now propose instead > that we split up the contents of oldnumeric.py into interfaces we want > to keep around indefinitely and interfaces we don't. Good idea... -Travis From rob at hooft.net Sun Apr 2 22:46:09 2006 From: rob at hooft.net (Rob W.W. 
Hooft)
Date: Sun Apr 2 22:46:09 2006
Subject: [Fwd: Re: [Numpy-discussion] first impressions with numpy]
In-Reply-To: <44306A2C.4040606@msg.ucsf.edu>
References: <44306A2C.4040606@msg.ucsf.edu>
Message-ID: <4430B5D6.7020907@hooft.net>

Sebastian Haase wrote:
>> A new point: Please remind me (and probably others): when did it get
>> decided to introduce 'complex128' to mean numarray's complex64
>> and the 'complex64' to mean numarray's complex32 ?
>
> I haven't the faintest idea -- it happened when I was off in Numarray
> land I assume. Or it was always that way? No idea. Hopefully Travis will
> answer this.

Fortran heritage? REAL*8 is paired with COMPLEX*16 there....

Regards,

Rob Hooft

From arnd.baecker at web.de  Mon Apr 3 02:18:08 2006
From: arnd.baecker at web.de (Arnd Baecker)
Date: Mon Apr 3 02:18:08 2006
Subject: [Numpy-discussion] Speed up function on cross product of two sets?
In-Reply-To:
References:
Message-ID:

Hi,

On Sun, 2 Apr 2006, Zachary Pincus wrote:

> Hi folks,
>
> I have an inner loop that looks like this:
> out = []
> for elem1 in l1:
>     for elem2 in l2:
>         out.append(do_something(l1, l2))
> result = do_something_else(out)
>
> where do_something and do_something_else are implemented with only
> numpy ufuncs, and l1 and l2 are numpy arrays.
>
> As an example, I need to compute the median distance from any element
> in one set to any element in another set.
>
> What's the best way to speed this sort of thing up with numpy (e.g.
> push as much down into the underlying C as possible)? I could re-write
> do_something with the numexpr tools (which are very cool), but that
> doesn't address the fact that I've still got nested loops living
> in Python.

If do_something eats arrays, you could try:

result = do_something(l1[:,NewAxis], l2)

E.g.:

from numpy import *

l1 = linspace(0.0, pi, 10)
l2 = linspace(0.0, pi, 3)

def f(y, x):
    return sin(y)*cos(x)

print f(l1[:,NewAxis], l2)

((Note that I just learned in some other thread that with numpy there is
an alternative to NewAxis, but I haven't figured out which that is ...))

Best, Arnd

From zpincus at stanford.edu  Mon Apr 3 08:50:10 2006
From: zpincus at stanford.edu (Zachary Pincus)
Date: Mon Apr 3 08:50:10 2006
Subject: [Numpy-discussion] Speed up function on cross product of two sets?
In-Reply-To: <443071BA.4090606@cox.net>
References: <443063C0.3050002@cox.net>
	<8717BF2C-01EB-422A-B63A-7807E607DEE9@stanford.edu>
	<443071BA.4090606@cox.net>
Message-ID:

> If I were going to make a list it would look something like:
>
> 0. Think about your algorithm.
> 1. Vectorize your inner loop.
> 2. Eliminate temporaries.
> 3. Ask for help.
> 4. Recode in C.
> 5. Accept that your code will never be fast.
>
> Step zero should probably be repeated after every other step ;)

Thanks for this list -- it's a good one.

Since we're discussing this, could I ask about the best way to eliminate
temporaries? If you're using ufuncs, is there some way to make them work
in-place? Or is the lowest-hanging fruit (temporary-wise) typically
elsewhere?

Zach

From tim.hochberg at cox.net  Mon Apr 3 10:10:40 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Mon Apr 3 10:10:40 2006
Subject: [Numpy-discussion] Speed up function on cross product of two sets?
In-Reply-To:
References:
Message-ID: <44315633.4010600@cox.net>

Arnd Baecker wrote:

[SNIP]

>((Note that I just learned in some other thread that with numpy there is
>an alternative to NewAxis, but I haven't figured out which that is ...))

If you're old school you could just use None. But you probably mean
'newaxis'.
-tim

From robert.kern at gmail.com  Mon Apr 3 10:19:02 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Mon Apr 3 10:19:02 2006
Subject: [Numpy-discussion] Re: Speed up function on cross product of two sets?
In-Reply-To:
References: <443063C0.3050002@cox.net>
	<8717BF2C-01EB-422A-B63A-7807E607DEE9@stanford.edu>
	<443071BA.4090606@cox.net>
Message-ID:

Zachary Pincus wrote:
>> If I were going to make a list it would look something like:
>>
>> 0. Think about your algorithm.
>> 1. Vectorize your inner loop.
>> 2. Eliminate temporaries.
>> 3. Ask for help.
>> 4. Recode in C.
>> 5. Accept that your code will never be fast.
>>
>> Step zero should probably be repeated after every other step ;)
>
> Thanks for this list -- it's a good one.
>
> Since we're discussing this, could I ask about the best way to
> eliminate temporaries? If you're using ufuncs, is there some way to
> make them work in-place? Or is the lowest-hanging fruit (temporary-
> wise) typically elsewhere?

Many binary ufuncs take an optional third argument, which is an array in
which the ufunc should put the result.

In [2]: x = arange(10)

In [3]: y = arange(10)

In [4]: id(x)
Out[4]: 91297984

In [5]: add(x, y, x)
Out[5]: array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [6]: id(Out[5])
Out[6]: 91297984

In [7]: x
Out[7]: array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

-- Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth." -- Umberto Eco

From tim.hochberg at cox.net  Mon Apr 3 10:36:05 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Mon Apr 3 10:36:05 2006
Subject: [Numpy-discussion] Speed up function on cross product of two sets?
In-Reply-To:
References: <443063C0.3050002@cox.net>
	<8717BF2C-01EB-422A-B63A-7807E607DEE9@stanford.edu>
	<443071BA.4090606@cox.net>
Message-ID: <44315CD6.3010001@cox.net>

Zachary Pincus wrote:
>> If I were going to make a list it would look something like:
>>
>> 0. Think about your algorithm.
>> 1. Vectorize your inner loop.
>> 2. Eliminate temporaries.
>> 3. Ask for help.
>> 4. Recode in C.
>> 5. Accept that your code will never be fast.
>>
>> Step zero should probably be repeated after every other step ;)
>
> Thanks for this list -- it's a good one.
>
> Since we're discussing this, could I ask about the best way to
> eliminate temporaries? If you're using ufuncs, is there some way to
> make them work in-place? Or is the lowest-hanging fruit (temporary-
> wise) typically elsewhere?

The least cryptic is to use *=, +=, where you can. But that only gets you
so far. As you guessed, there is a secret extra argument to ufuncs that
allows you to do results in place. One could replace scratch=a*(b+sqrt(a))
with:

>>> scratch = zeros([5], dtype=float)
>>> a = arange(5, dtype=float)
>>> b = arange(5, dtype=float)
>>> sqrt(a, scratch)
array([ 0.        ,  1.        ,  1.41421356,  1.73205081,  2.        ])
>>> add(scratch, b, scratch)
array([ 0.        ,  2.        ,  3.41421356,  4.73205081,  6.        ])
>>> multiply(a, scratch)
array([  0.        ,   2.        ,   6.82842712,  14.19615242,  24.        ])

The downside of this is that your code goes from comprehensible to
insanely cryptic pretty fast. I only resort to this in extreme
circumstances.

You could also use numexpr, which should be faster and is much less
cryptic, but may not be completely stable yet.

Oh, and don't forget step 0, that's sometimes a good way to reduce
temporaries.

regards,

-tim
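For comparison, the numexpr route Tim mentions might look like the following sketch, assuming the numexpr package is installed; evaluate() compiles the whole expression and runs it in one pass over the operands, so sqrt(a) and b+sqrt(a) never appear as full-size Python-level temporaries:

    import numpy
    import numexpr

    a = numpy.arange(5, dtype=float)
    b = numpy.arange(5, dtype=float)
    # Same result as a*(b+sqrt(a)), without the intermediate arrays.
    result = numexpr.evaluate("a * (b + sqrt(a))")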
From verveer at embl-heidelberg.de  Mon Apr 3 12:00:04 2006
From: verveer at embl-heidelberg.de (Peter Verveer)
Date: Mon Apr 3 12:00:04 2006
Subject: [Numpy-discussion] Re: Speed up function on cross product of two sets?
In-Reply-To:
References: <443063C0.3050002@cox.net>
	<8717BF2C-01EB-422A-B63A-7807E607DEE9@stanford.edu>
	<443071BA.4090606@cox.net>
Message-ID: <012B117C-4046-4058-B7F9-AC5EDB68A532@embl-heidelberg.de>

On 3 Apr 2006, at 19:17, Robert Kern wrote:

> Zachary Pincus wrote:
>>> If I were going to make a list it would look something like:
>>>
>>> 0. Think about your algorithm.
>>> 1. Vectorize your inner loop.
>>> 2. Eliminate temporaries.
>>> 3. Ask for help.
>>> 4. Recode in C.
>>> 5. Accept that your code will never be fast.
>>>
>>> Step zero should probably be repeated after every other step ;)
>>
>> Thanks for this list -- it's a good one.
>>
>> Since we're discussing this, could I ask about the best way to
>> eliminate temporaries? If you're using ufuncs, is there some way to
>> make them work in-place? Or is the lowest-hanging fruit (temporary-
>> wise) typically elsewhere?
>
> Many binary ufuncs take an optional third argument, which is an
> array in which the ufunc should put the result.

I have wished many times that all functions would support an optional
output argument. It is not only important for speed optimization, but also
if you work with large data sets. I guess the use of a return value is
much more natural, but when the point comes that you want to optimize your
algorithm, the ability to use an output argument instead is very valuable.

It would be nice if all functions by default would support a standard
keyword argument 'output', just like ufuncs do. I suppose these could in
principle be added while still maintaining backwards compatibility.

Cheers, Peter

From oliphant at ee.byu.edu  Mon Apr 3 15:59:06 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Mon Apr 3 15:59:06 2006
Subject: [Numpy-discussion] first impressions with numpy
In-Reply-To: <44306594.50305@msg.ucsf.edu>
References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net>
	<442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu>
	<442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca>
	<442FE950.8090000@cox.net> <44306594.50305@msg.ucsf.edu>
Message-ID: <4431A8A0.9010604@ee.byu.edu>

Sebastian Haase wrote:
> Tim Hochberg wrote:
>
>> This would work fine if repr were instead:
>>
>> dtype([('x', float64), ('z', complex128)])
>>
>> Anyway, this all seems reasonable to me at first glance. That said, I
>> don't plan to work on this, I've got other fish to fry at the moment.
>
> A new point: Please remind me (and probably others): when did it get
> decided to introduce 'complex128' to mean numarray's complex64
> and the 'complex64' to mean numarray's complex32 ?

It was last February (i.e. 2005) when I first started posting regarding
the new NumPy. I claimed it was more consistent to use actual bit-widths.
A few people, including Perry, indicated they weren't opposed to the
change and so I went ahead with it.

You can read relevant posts by searching on
numpy-discussion at lists.sourceforge.net

Discussions are always welcome. I suppose it's not too late to change
something like this --- but it's getting there...
-Travis

From ryanlists at gmail.com  Mon Apr 3 17:50:03 2006
From: ryanlists at gmail.com (Ryan Krauss)
Date: Mon Apr 3 17:50:03 2006
Subject: [Numpy-discussion] string matrices
Message-ID:

I am trying to use NumPy to generate some matrix inputs to Maxima for
symbolic analysis. I am using a fair number of matrix.astype('S%d'%maxlen)
statements. This seems to work very well. It also doesn't seem to pad the
elements in any way if maxlen is bigger than I need, which is great. This
may seem like a dumb computer science question, but what is the
memory/performance cost of making maxlen bigger than I want (but making
sure that it is way bigger than I need so that the elements don't get
truncated)? If my biggest matrices will be 13x13, how long can the strings
be before I consume more than a few megs (or a few dozen megs) of memory?

Thanks,

Ryan
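The memory side of Ryan's question is easy to bound, since a string array stores exactly itemsize bytes per element. A quick check, as a sketch (nbytes is assumed here; size * itemsize gives the same number):

    >>> import numpy
    >>> m = numpy.zeros((13, 13), dtype='S1000')    # 13x13, 1000-byte strings
    >>> m.nbytes                                    # 13 * 13 * 1000
    169000

So even a generous maxlen of 10,000 costs only about 1.7 MB for a 13x13 matrix; reaching "a few dozen megs" would take strings well over a hundred kilobytes each.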
From haase at msg.ucsf.edu  Mon Apr 3 22:06:05 2006
From: haase at msg.ucsf.edu (Sebastian Haase)
Date: Mon Apr 3 22:06:05 2006
Subject: [Numpy-discussion] Vote: complex64 vs complex128 (was: first impressions with numpy)
In-Reply-To: <4431A8A0.9010604@ee.byu.edu>
References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net>
	<442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu>
	<442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca>
	<442FE950.8090000@cox.net> <44306594.50305@msg.ucsf.edu>
	<4431A8A0.9010604@ee.byu.edu>
Message-ID: <4431FE90.6060301@msg.ucsf.edu>

Hi,

Could we start another poll on this !?

I think I would vote +1 for complex32 & complex64 mostly just because of
"that's what I'm used to"

But I'm curious to hear what others "know to be in use" - e.g. Matlab or
IDL !

- Thanks
Sebastian Haase

Travis Oliphant wrote:
> Sebastian Haase wrote:
>
>> Tim Hochberg wrote:
>>
>>> This would work fine if repr were instead:
>>>
>>> dtype([('x', float64), ('z', complex128)])
>>>
>>> Anyway, this all seems reasonable to me at first glance. That said, I
>>> don't plan to work on this, I've got other fish to fry at the moment.
>>
>> A new point: Please remind me (and probably others): when did it get
>> decided to introduce 'complex128' to mean numarray's complex64
>> and the 'complex64' to mean numarray's complex32 ?
>
> It was last February (i.e. 2005) when I first started posting
> regarding the new NumPy. I claimed it was more consistent to use
> actual bit-widths. A few people, including Perry, indicated they
> weren't opposed to the change and so I went ahead with it.
>
> You can read relevant posts by searching on
> numpy-discussion at lists.sourceforge.net
>
> Discussions are always welcome. I suppose it's not too late to
> change something like this --- but it's getting there...
>
> -Travis

From robert.kern at gmail.com  Mon Apr 3 22:41:02 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Mon Apr 3 22:41:02 2006
Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128
In-Reply-To: <4431FE90.6060301@msg.ucsf.edu>
References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net>
	<442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu>
	<442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca>
	<442FE950.8090000@cox.net> <44306594.50305@msg.ucsf.edu>
	<4431A8A0.9010604@ee.byu.edu> <4431FE90.6060301@msg.ucsf.edu>
Message-ID:

Sebastian Haase wrote:
> Hi,
> Could we start another poll on this !?

Please, let's leave voting as a method of last resort.

> I think I would vote
> +1 for complex32 & complex64 mostly just because of "that's what I'm
> used to"
>
> But I'm curious to hear what others "know to be in use" - e.g. Matlab or
> IDL !

On the merits of the issue, I like the new scheme better. For whatever
reason, I tend to remember it when coding. With Numeric, I would frequently
second-guess myself and go to the prompt and tab-complete to look at all of
the options and reason out the one I wanted.

-- Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth." -- Umberto Eco

From tim.hochberg at cox.net  Mon Apr 3 22:49:02 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Mon Apr 3 22:49:02 2006
Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128
In-Reply-To:
References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net>
	<442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu>
	<442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca>
	<442FE950.8090000@cox.net> <44306594.50305@msg.ucsf.edu>
	<4431A8A0.9010604@ee.byu.edu> <4431FE90.6060301@msg.ucsf.edu>
Message-ID: <443208B9.40106@cox.net>

Robert Kern wrote:

>Sebastian Haase wrote:
>
>>Hi,
>>Could we start another poll on this !?
>
>Please, let's leave voting as a method of last resort.
>
>>I think I would vote
>>+1 for complex32 & complex64 mostly just because of "that's what I'm
>>used to"
>>
>>But I'm curious to hear what others "know to be in use" - e.g. Matlab or
>>IDL !
>
>On the merits of the issue, I like the new scheme better. For whatever
>reason, I tend to remember it when coding. With Numeric, I would frequently
>second-guess myself and go to the prompt and tab-complete to look at all of
>the options and reason out the one I wanted.

I can't bring myself to care. I almost always use dtype=complex and on the
rare times I don't I can never remember what the scheme is regardless of
which scheme it is / was / will be.

On the other hand, if the scheme was Complex32x2 and Complex64x2, I could
probably decipher what that was without looking it up. It is a little ugly
and weird, I admit, but that probably wouldn't bother me.

Regards,

-tim

From arnd.baecker at web.de  Mon Apr 3 23:36:00 2006
From: arnd.baecker at web.de (Arnd Baecker)
Date: Mon Apr 3 23:36:00 2006
Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128
In-Reply-To:
References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net>
	<442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu>
	<442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca>
	<442FE950.8090000@cox.net> <44306594.50305@msg.ucsf.edu>
	<4431A8A0.9010604@ee.byu.edu> <4431FE90.6060301@msg.ucsf.edu>
Message-ID:

On Tue, 4 Apr 2006, Robert Kern wrote:

> Sebastian Haase wrote:
> > Hi,
> > Could we start another poll on this !?
>
> Please, let's leave voting as a method of last resort.
>
> > I think I would vote
> > +1 for complex32 & complex64 mostly just because of "that's what I'm
> > used to"
> >
> > But I'm curious to hear what others "know to be in use" - e.g. Matlab or
> > IDL !
>
> On the merits of the issue, I like the new scheme better. For whatever
> reason, I tend to remember it when coding. With Numeric, I would
> frequently second-guess myself and go to the prompt and tab-complete to
> look at all of the options and reason out the one I wanted.

In order to get an opinion on the subject: How would one presently find
out about the meaning of complex64 and complex128? The following attempt
does not help:

In [1]: import numpy

In [2]: numpy.complex64?
Type:           type
Base Class:
String Form:
Namespace:      Interactive
Docstring:

In [3]: numpy.complex128?
Type:           type
Base Class:
String Form:
Namespace:      Interactive
Docstring:

I also looked in Travis' "Guide to NumPy", where the different types are
discussed on page 18 (referring to the sample chapters at
http://www.tramy.us/guidetoscipy.html). Maybe chapter 12 contains more info
on this ((our library was still not able to buy the 20 copies since this
request was approved a month ago ...))

Best, Arnd
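Lacking docstrings, the dtype object itself is one place the answer can be read off at the prompt; a sketch of that kind of inspection (itemsize counts the bytes of a whole element, i.e. both components of the complex number):

    >>> import numpy
    >>> numpy.dtype(numpy.complex64).itemsize     # two float32s
    8
    >>> numpy.dtype(numpy.complex128).itemsize    # two float64s
    16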
From cjw at sympatico.ca  Tue Apr 4 06:20:44 2006
From: cjw at sympatico.ca (Colin J. Williams)
Date: Tue Apr 4 06:20:44 2006
Subject: [Numpy-discussion] Vote: complex64 vs complex128
In-Reply-To: <4431FE90.6060301@msg.ucsf.edu>
References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net>
	<442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu>
	<442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca>
	<442FE950.8090000@cox.net> <44306594.50305@msg.ucsf.edu>
	<4431A8A0.9010604@ee.byu.edu> <4431FE90.6060301@msg.ucsf.edu>
Message-ID: <443271C9.6080907@sympatico.ca>

Sebastian Haase wrote:
> Hi,
> Could we start another poll on this !?
>
> I think I would vote
> +1 for complex32 & complex64 mostly just because of "that's what I'm
> used to"

+1 Most people look to the number to give a clue as to the precision of
the value.

Colin W.

> But I'm curious to hear what others "know to be in use" - e.g. Matlab
> or IDL !
>
> - Thanks
> Sebastian Haase
>
> Travis Oliphant wrote:
>
>> Sebastian Haase wrote:
>>
>>> Tim Hochberg wrote:
>>>
>>>> This would work fine if repr were instead:
>>>>
>>>> dtype([('x', float64), ('z', complex128)])
>>>>
>>>> Anyway, this all seems reasonable to me at first glance. That said,
>>>> I don't plan to work on this, I've got other fish to fry at the
>>>> moment.
>>>
>>> A new point: Please remind me (and probably others): when did it get
>>> decided to introduce 'complex128' to mean numarray's complex64
>>> and the 'complex64' to mean numarray's complex32 ?
>>
>> It was last February (i.e. 2005) when I first started posting
>> regarding the new NumPy. I claimed it was more consistent to use
>> actual bit-widths. A few people, including Perry, indicated they
>> weren't opposed to the change and so I went ahead with it.
>>
>> You can read relevant posts by searching on
>> numpy-discussion at lists.sourceforge.net
>>
>> Discussions are always welcome. I suppose it's not too late to
>> change something like this --- but it's getting there...
>>
>> -Travis

From ryanlists at gmail.com  Tue Apr 4 07:27:01 2006
From: ryanlists at gmail.com (Ryan Krauss)
Date: Tue Apr 4 07:27:01 2006
Subject: [Numpy-discussion] Re: string matrices
In-Reply-To:
References:
Message-ID:

I actually have a problem with the elements of a string matrix from
astype('S#'). The shorter elements in my matrix have a bunch of terms like
'1.0', because the matrix they started from was a float. I need to keep
the float type, but want to get rid of the '.0' when I convert the string
output to latex. I was going to check if element[-2:]=='.0' but ran into
this problem:

In [15]: temp[-2:]
Out[15]: '\x00\x00'

In [16]: temp.strip()
Out[16]: '1.0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

I think I can get rid of the \x00's by calling str(element), but is this a
feature or a bug? It would be slightly cleaner for me if the string matrix
elements didn't have the trailing null characters (or whatever those are),
but this may not be possible given the underlying representation.

Thanks,

Ryan

On 4/3/06, Ryan Krauss wrote:
> I am trying to use NumPy to generate some matrix inputs to Maxima for
> symbolic analysis. I am using a fair number of
> matrix.astype('S%d'%maxlen) statements. This seems to work very well.
> It also doesn't seem to pad the elements in any way if maxlen is
> bigger than I need, which is great. This may seem like a dumb
> computer science question, but what is the memory/performance cost of
> making maxlen bigger than I want (but making sure that it is way
> bigger than I need so that the elements don't get truncated)? If my
> biggest matrices will be 13x13, how long can the strings be before I
> consume more than a few megs (or a few dozen megs) of memory?
>
> Thanks,
>
> Ryan

From charlesr.harris at gmail.com  Tue Apr 4 08:16:07 2006
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue Apr 4 08:16:07 2006
Subject: [Numpy-discussion] Vote: complex64 vs complex128
In-Reply-To: <443271C9.6080907@sympatico.ca>
References: <442D9124.5020905@msg.ucsf.edu> <442DB655.2050203@cox.net>
	<442DB91F.9030103@msg.ucsf.edu> <442DD638.60706@cox.net>
	<442FDDD5.8050404@sympatico.ca> <442FE950.8090000@cox.net>
	<44306594.50305@msg.ucsf.edu> <4431A8A0.9010604@ee.byu.edu>
	<4431FE90.6060301@msg.ucsf.edu> <443271C9.6080907@sympatico.ca>
Message-ID:

I can't get worked up over this one way or the other: complex128 makes
sense if I count bits, complex64 makes sense if I note precision; I just
have to remember the numpy convention. One could argue that complex64 is
the more conventional choice and so has the virtue of least surprise, but
I don't think it is terribly difficult to become accustomed to using
complex128 in its place. I suppose this is one of those programmer's vs
user's point of view thingees. For the guy writing general low level
numpy code what matters is the length of the type, how many bytes have to
be moved and so on, and from the other point of view what counts is the
precision of the arithmetic.

Chuck

On 4/4/06, Colin J. Williams wrote:
>
> Sebastian Haase wrote:
>
> > Hi,
> > Could we start another poll on this !?
> >
> > I think I would vote
> > +1 for complex32 & complex64 mostly just because of "that's what I'm
> > used to"
>
> +1 Most people look to the number to give a clue as to the precision of
> the value.
>
> Colin W.
>
> >
> > But I'm curious to hear what others "know to be in use" - e.g. Matlab
> > or IDL !
> >
> > - Thanks
> > Sebastian Haase
> >
> > Travis Oliphant wrote:
> >
> >> Sebastian Haase wrote:
> >>
> >>> Tim Hochberg wrote:
> >>>
> >>>> This would work fine if repr were instead:
> >>>>
> >>>> dtype([('x', float64), ('z', complex128)])
> >>>>
> >>>> Anyway, this all seems reasonable to me at first glance. That said,
> >>>> I don't plan to work on this, I've got other fish to fry at the
> >>>> moment.
> >>>
> >>> A new point: Please remind me (and probably others): when did it get
> >>> decided to introduce 'complex128' to mean numarray's complex64
> >>> and the 'complex64' to mean numarray's complex32 ?
> >>
> >> It was last February (i.e. 2005) when I first started posting
> >> regarding the new NumPy. I claimed it was more consistent to use
> >> actual bit-widths. A few people, including Perry, indicated they
> >> weren't opposed to the change and so I went ahead with it.
> >>
> >> You can read relevant posts by searching on
> >> numpy-discussion at lists.sourceforge.net
> >>
> >> Discussions are always welcome. I suppose it's not too late to
> >> change something like this --- but it's getting there...
> >>
> >> -Travis

From faltet at carabos.com  Tue Apr 4 08:49:11 2006
From: faltet at carabos.com (Francesc Altet)
Date: Tue Apr 4 08:49:11 2006
Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128
In-Reply-To:
References: <442D9124.5020905@msg.ucsf.edu> <4431FE90.6060301@msg.ucsf.edu>
Message-ID: <200604041747.57180.faltet@carabos.com>

On Tuesday 04 April 2006 07:40, Robert Kern wrote:
> Sebastian Haase wrote:
> > I think I would vote
> > +1 for complex32 & complex64 mostly just because of "that's what I'm
> > used to"
> >
> > But I'm curious to hear what others "know to be in use" - e.g. Matlab or
> > IDL !
>
> On the merits of the issue, I like the new scheme better. For whatever
> reason, I tend to remember it when coding. With Numeric, I would
> frequently second-guess myself and go to the prompt and tab-complete to
> look at all of the options and reason out the one I wanted.

I agree with Robert. From the very beginning NumPy design has been very
consistent with the typeEXTENT_IN_BITS mapping (even for unicode), and if
we go back to the numarray (complex32/complex64) convention, this would be
the only exception to this rule. Perhaps I'm a bit biased by being a
developer more interested in type 'sizes' than in 'precision' issues, but
I'd definitely prefer a completely consistent approach for this matter.

So +1 for complex64 & complex128

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"
From haase at msg.ucsf.edu  Tue Apr 4 09:33:07 2006
From: haase at msg.ucsf.edu (Sebastian Haase)
Date: Tue Apr 4 09:33:07 2006
Subject: [Numpy-discussion] Vote: complex64 vs complex128
In-Reply-To:
References: <442D9124.5020905@msg.ucsf.edu> <443271C9.6080907@sympatico.ca>
Message-ID: <200604040929.15815.haase@msg.ucsf.edu>

On Tuesday 04 April 2006 08:09, Charles R Harris wrote:
> I can't get worked up over this one way or the other: complex128 makes
> sense if I count bits, complex64 makes sense if I note precision; I just
> have to remember the numpy convention. One could argue that complex64 is
> the more conventional choice and so has the virtue of least surprise, but
> I don't think it is terribly difficult to become accustomed to using
> complex128 in its place. I suppose this is one of those programmer's vs
> user's point of view thingees. For the guy writing general low level
> numpy code what matters is the length of the type, how many bytes have to
> be moved and so on, and from the other point of view what counts is the
> precision of the arithmetic.

I kind of like your comparison of programmer vs user ;-)
And so I was "hoping" that numpy (and scipy !!) is intended for the users -
like supposedly IDL and Matlab are...

No one likes my "backwards compatibility" argument !?

Thanks
- Sebastian Haase

PS: I understand that voting is only for a last resort - some people always
use na.Complex and na.Float and don't care - BUT I use single precision all
the time because my image data is already getting too large. So I have to
look at this every day, and as Travis pointed out, now is about the last
chance to possibly change complex128 to complex64 ...

> Chuck
>
> On 4/4/06, Colin J. Williams wrote:
> > Sebastian Haase wrote:
> > > Hi,
> > > Could we start another poll on this !?
> > >
> > > I think I would vote
> > > +1 for complex32 & complex64 mostly just because of "that's what I'm
> > > used to"
> >
> > +1 Most people look to the number to give a clue as to the precision of
> > the value.
> >
> > Colin W.
> >
> > > But I'm curious to hear what others "know to be in use" - e.g. Matlab
> > > or IDL !
> > >
> > > - Thanks
> > > Sebastian Haase
> > >
> > > Travis Oliphant wrote:
> > >> Sebastian Haase wrote:
> > >>> Tim Hochberg wrote:
> > >>>
> > >>>> This would work fine if repr were instead:
> > >>>>
> > >>>> dtype([('x', float64), ('z', complex128)])
> > >>>>
> > >>>> Anyway, this all seems reasonable to me at first glance. That said,
> > >>>> I don't plan to work on this, I've got other fish to fry at the
> > >>>> moment.
> > >>>
> > >>> A new point: Please remind me (and probably others): when did it get
> > >>> decided to introduce 'complex128' to mean numarray's complex64
> > >>> and the 'complex64' to mean numarray's complex32 ?
> > >>
> > >> It was last February (i.e. 2005) when I first started posting
> > >> regarding the new NumPy. I claimed it was more consistent to use
> > >> actual bit-widths. A few people, including Perry, indicated they
> > >> weren't opposed to the change and so I went ahead with it.
> > >>
> > >> You can read relevant posts by searching on
> > >> numpy-discussion at lists.sourceforge.net
> > >>
> > >> Discussions are always welcome. I suppose it's not too late to
> > >> change something like this --- but it's getting there...
> > >> > > >> -Travis From robert.kern at gmail.com Tue Apr 4 09:52:11 2006 From: robert.kern at gmail.com (Robert Kern) Date: Tue Apr 4 09:52:11 2006 Subject: [Numpy-discussion] Re: string matrices In-Reply-To: References: Message-ID: Ryan Krauss wrote: > I actually have a problem with the elements of a string matrix from > astype('S#'). The shorter elements in my matrix have a bunch of terms > like '1.0', because the matrix they started from was a float. I need > to keep the float type, but want to get rid of the '.0 ' when I > convert the string output to latex. I was going to check if > element[-2:]=='.0' but ran into this problem: > > In [15]: temp[-2:] > Out[15]: '\x00\x00' > > In [16]: temp.strip() > Out[16]: '1.0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' > > I think I can get rid of the \x00's by calling str(element), but is > this a feature or a bug? Probably both. :-) On the one hand, you want to be able to get a useful string out of the array; the nulls are just padding, and the string that you put in was '1.0'. However, suppose that the string you put in was '1.\x00'. Then you would get the "wrong" string out. However, the only real alternative is to also store an integer containing the length of the string with each element. That probably interferes with some of the uses of string arrays. > It would be slightly cleaner for me if the > string matrix elements didn't have the trailing null characters (or > whatever those are), but this may not be possible given the underlying > representation. You can also use temp.strip('\x00') which is a bit more explicit. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From zpincus at stanford.edu Tue Apr 4 09:54:06 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Tue Apr 4 09:54:06 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: <443208B9.40106@cox.net> References: <442D9124.5020905@msg.ucsf.edu> <442D9695.2050900@cox.net> <442DB655.2050203@cox.net> <442DB91F.9030103@msg.ucsf.edu> <442DD638.60706@cox.net> <442FDDD5.8050404@sympatico.ca> <442FE950.8090000@cox.net> <44306594.50305@msg.ucsf.edu> <4431A8A0.9010604@ee.byu.edu> <4431FE90.6060301@msg.ucsf.edu> <443208B9.40106@cox.net> Message-ID: > On the other hand, if the scheme was Complex32x2 and Complex64x2, > I could probably decipher what that was without looking it up. It > is is a little ugly and weird I admit, but that probably wouldn't > bother me. On consideration, I'm +1 on Tim's suggestion here, if any change is going to be made. At least it has the virtue of being relatively clear, if a bit ugly. Zach From jh at oobleck.astro.cornell.edu Tue Apr 4 11:14:04 2006 From: jh at oobleck.astro.cornell.edu (Joe Harrington) Date: Tue Apr 4 11:14:04 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> (numpy-discussion-request@lists.sourceforge.net) References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> Message-ID: <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> When I first heard of Complex128, my first response was, "Cool! I didn't even know there was a Double128!" Folks seem to agree that precision-based naming would be most intuitive to new users, but that length-based naming would be most intuitive to low-level programmers. 
This is a high-level package, whose purpose is to hide the numerical details and programming drudgery from the user as much as possible, while still offering high performance and not limiting capability too much. For this type of package, a good metric is "when it doesn't restrict capability, do what makes sense for new/naiive users". So, I favor Complex32 and Complex64. When you say "complex", everyone knows you mean 2 numbers. When you say 32 or 64 or 128, in the context of bits for floating values, almost everyone assumes you are talking that many bits of precision to represent one number. Consider future conversations about precision and data size. In precision discussions, you'd always have to clarify that complex128 had 64 bits of precision, just to make sure everyone was on the same key (particularly when 128-bit machines arrive). In data-size discussions, everyone would know to double the size for the two components. No extra clarification would be needed. IDL's behavior is irrelevant to us, since they just say "complex", and "dcomplex" for 32-bit and 64-bit precision. --jh-- From oliphant.travis at ieee.org Tue Apr 4 11:25:11 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Tue Apr 4 11:25:11 2006 Subject: [Numpy-discussion] Re: string matrices In-Reply-To: References: Message-ID: <4432B9C2.7040307@ieee.org> Ryan Krauss wrote: > I actually have a problem with the elements of a string matrix from > astype('S#'). The shorter elements in my matrix have a bunch of terms > like '1.0', because the matrix they started from was a float. I need > to keep the float type, but want to get rid of the '.0 ' when I > convert the string output to latex. I was going to check if > element[-2:]=='.0' but ran into this problem > > In [15]: temp[-2:] > Out[15]: '\x00\x00' > > In [16]: temp.strip() > Out[16]: '1.0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' > > I think I can get rid of the \x00's by calling str(element), but is > this a feature or a bug? Of course the elements are padded with '\x00' so that they are all the same length, but we have been trying to make it so that it doesn't matter. Equality testing is one area where it still does. We are using the underlying string equality testing (and it doesn't strip the '\x00'). So, I guess it's a missing feature at this point. -Travis From tim.hochberg at cox.net Tue Apr 4 11:41:10 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Tue Apr 4 11:41:10 2006 Subject: [Numpy-discussion] Re: string matrices In-Reply-To: References: Message-ID: <4432BD89.3050501@cox.net> Robert Kern wrote: >Ryan Krauss wrote: > > >>I actually have a problem with the elements of a string matrix from >>astype('S#'). The shorter elements in my matrix have a bunch of terms >>like '1.0', because the matrix they started from was a float. I need >>to keep the float type, but want to get rid of the '.0 ' when I >>convert the string output to latex. I was going to check if >>element[-2:]=='.0' but ran into this problem: >> >>In [15]: temp[-2:] >>Out[15]: '\x00\x00' >> >>In [16]: temp.strip() >>Out[16]: '1.0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00' >> >>I think I can get rid of the \x00's by calling str(element), but is >>this a feature or a bug? >> >> > >Probably both. :-) On the one hand, you want to be able to get a useful string >out of the array; the nulls are just padding, and the string that you put in was >'1.0'. However, suppose that the string you put in was '1.\x00'. 
Then you would >get the "wrong" string out. > >However, the only real alternative is to also store an integer containing the >length of the string with each element. That probably interferes with some of >the uses of string arrays. > > > >>It would be slightly cleaner for me if the >>string matrix elements didn't have the trailing null characters (or >>whatever those are), but this may not be possible given the underlying >>representation. >> >> > >You can also use temp.strip('\x00') which is a bit more explicit. > > > Or even temp.rstrip('\x00') which works for all those time you pad the front of your string with '\x00' ;) -tim From faltet at carabos.com Tue Apr 4 11:46:08 2006 From: faltet at carabos.com (Francesc Altet) Date: Tue Apr 4 11:46:08 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> Message-ID: <200604042045.39955.faltet@carabos.com> A Dimarts 04 Abril 2006 20:13, Joe Harrington va escriure: > When I first heard of Complex128, my first response was, "Cool! I > didn't even know there was a Double128!" > > Folks seem to agree that precision-based naming would be most > intuitive to new users, but that length-based naming would be most > intuitive to low-level programmers. This is a high-level package, > whose purpose is to hide the numerical details and programming > drudgery from the user as much as possible, while still offering high > performance and not limiting capability too much. For this type of > package, a good metric is "when it doesn't restrict capability, do > what makes sense for new/naiive users". > > So, I favor Complex32 and Complex64. When you say "complex", everyone > knows you mean 2 numbers. When you say 32 or 64 or 128, in the > context of bits for floating values, almost everyone assumes you are > talking that many bits of precision to represent one number. Consider > future conversations about precision and data size. In precision > discussions, you'd always have to clarify that complex128 had 64 bits > of precision, just to make sure everyone was on the same key > (particularly when 128-bit machines arrive). In data-size > discussions, everyone would know to double the size for the two > components. No extra clarification would be needed. Well, from my point of view of "low-level" user, I don't specially like this, but I understand the "high-level" position to be much more important in terms of number of users. Besides, I also see that NumPy should be adressed specially to the requirements of the later users. So for me is fine with complex32/complex64. Cheers, -- >0,0< Francesc Altet ? ? http://www.carabos.com/ V V C?rabos Coop. V. ??Enjoy Data "-" From robert.kern at gmail.com Tue Apr 4 12:15:08 2006 From: robert.kern at gmail.com (Robert Kern) Date: Tue Apr 4 12:15:08 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> Message-ID: Joe Harrington wrote: > When I first heard of Complex128, my first response was, "Cool! I > didn't even know there was a Double128!" > > Folks seem to agree that precision-based naming would be most > intuitive to new users, but that length-based naming would be most > intuitive to low-level programmers. 
This is a high-level package, > whose purpose is to hide the numerical details and programming > drudgery from the user as much as possible, while still offering high > performance and not limiting capability too much. For this type of > package, a good metric is "when it doesn't restrict capability, do > what makes sense for new/naiive users". > > So, I favor Complex32 and Complex64. When you say "complex", everyone > knows you mean 2 numbers. When you say 32 or 64 or 128, in the > context of bits for floating values, almost everyone assumes you are > talking that many bits of precision to represent one number. Consider > future conversations about precision and data size. In precision > discussions, you'd always have to clarify that complex128 had 64 bits > of precision, just to make sure everyone was on the same key > (particularly when 128-bit machines arrive). In data-size > discussions, everyone would know to double the size for the two > components. No extra clarification would be needed. Well, from my point of view as a "low-level" user, I don't especially like this, but I understand the "high-level" position to be much more important in terms of number of users. Besides, I also see that NumPy should be addressed especially to the requirements of the latter users. So complex32/complex64 is fine with me. Cheers, -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-" From robert.kern at gmail.com Tue Apr 4 12:15:08 2006 From: robert.kern at gmail.com (Robert Kern) Date: Tue Apr 4 12:15:08 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> Message-ID: Joe Harrington wrote: > When I first heard of Complex128, my first response was, "Cool! I > didn't even know there was a Double128!" > > Folks seem to agree that precision-based naming would be most > intuitive to new users, but that length-based naming would be most > intuitive to low-level programmers. This is a high-level package, > whose purpose is to hide the numerical details and programming > drudgery from the user as much as possible, while still offering high > performance and not limiting capability too much. For this type of > package, a good metric is "when it doesn't restrict capability, do > what makes sense for new/naiive users". I'm pretty sure that when any of us say that such-and-such is going to make the most sense to new users, we're just guessing. Or projecting our experienced-user prejudices onto them. If I had to register my guess, I would say that either way will make just as much sense to new users. I think it's time that we start taking backwards compatibility with previous releases of numpy seriously and not break numpy code without clear, significant gains in usability. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From aisaac at american.edu Tue Apr 4 12:38:05 2006 From: aisaac at american.edu (Alan G Isaac) Date: Tue Apr 4 12:38:05 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> Message-ID: On Tue, 04 Apr 2006, Robert Kern apparently wrote: > I would say that either way will make just as much sense > to new users. User's perspective: agreed. Just give me i. consistency and ii. an easy way to inspect the object for its meaning. Cheers, Alan Isaac From tim.hochberg at cox.net Tue Apr 4 12:52:04 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Tue Apr 4 12:52:04 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> Message-ID: <4432CE1F.3010209@cox.net> Robert Kern wrote: >Joe Harrington wrote: > > >>When I first heard of Complex128, my first response was, "Cool! I >>didn't even know there was a Double128!" >> >>Folks seem to agree that precision-based naming would be most >>intuitive to new users, but that length-based naming would be most >>intuitive to low-level programmers. This is a high-level package, >>whose purpose is to hide the numerical details and programming >>drudgery from the user as much as possible, while still offering high >>performance and not limiting capability too much. For this type of >>package, a good metric is "when it doesn't restrict capability, do >>what makes sense for new/naiive users". >> >> > >I'm pretty sure that when any of us say that such-and-such is going to make the >most sense to new users, we're just guessing. Or projecting our experienced-user >prejudices onto them. If I had to register my guess, I would say that either way >will make just as much sense to new users. > > Agreed. >I think it's time that we start taking backwards compatibility with previous >releases of numpy seriously and not break numpy code without clear, significant >gains in usability. > > So what does that mean in this case? The current status; nice for existing users of numpy. Or, the old status, nice for people transitioning to numpy from Numeric. It's hard to know which way these backwards compatibility arguments cut when they involve reverting a change from some old behaviour. I've got an idea.
Rather than go round and round about complex64 versus complex128, let's just leave things as they are and add a docstring to complex128 and complex64 explaining the situation. [code...code...] >>> help(complex128) class complex128scalar(complexfloatingscalar, complex) | complex128: composed of two 64 bit floats | | Method resolution order: | complex128scalar | complexfloatingscalar | inexactscalar | numberscalar | genericscalar | complex | object ... If someone wants to give me some better text for the docstring, I'll go ahead and commit this change. Heck, if you've got some text for the other scalar objects (within reason) I'll be happy to add that at the same time. Regards, -tim From robert.kern at gmail.com Tue Apr 4 13:06:01 2006 From: robert.kern at gmail.com (Robert Kern) Date: Tue Apr 4 13:06:01 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: <4432CE1F.3010209@cox.net> References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> <4432CE1F.3010209@cox.net> Message-ID: Tim Hochberg wrote: > Robert Kern wrote: >> I think it's time that we start taking backwards compatibility with >> previous >> releases of numpy seriously and not break numpy code without clear, >> significant >> gains in usability. >> > So what does that mean in this case? The current status; nice for > existing users of numpy. Or, the old status, nice for people > transitioning to numpy from Numeric. It's hard to know which way these > backwards compatibility arguments cut when they involve reverting a > change from some old behaviour. I mean numpy. Neither complex64 nor complex128 are backwards-compatible with Numeric. Complex32 and Complex64 already exist and are hopefully isolated as compatibility aliases for typecodes. By backwards-compatibility, I refer to code, not habits. > I've got an idea. Rather than go round and round about complex64 versus > complex128, let's just leave things as they are and add a docstring to > complex128 and complex64 explaining the situation. [code...code...] > > >>> help(complex128) > class complex128scalar(complexfloatingscalar, complex) > | complex128: composed of two 64 bit floats > | > | Method resolution order: > | complex128scalar > | complexfloatingscalar > | inexactscalar > | numberscalar > | genericscalar > | complex > | object > ... > > If someone wants to give me some better text for the docstring, I'll go > ahead and commit this change. Heck, if you've got some text for the other > scalar objects (within reason) I'll be happy to add that at the same time. +1 -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From oliphant at ee.byu.edu Tue Apr 4 13:42:38 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 4 13:42:38 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> Message-ID: <4432D9C3.3040109@ee.byu.edu> Robert Kern wrote: >Joe Harrington wrote: > > >>When I first heard of Complex128, my first response was, "Cool! I >>didn't even know there was a Double128!" >> >>Folks seem to agree that precision-based naming would be most >>intuitive to new users, but that length-based naming would be most >>intuitive to low-level programmers.
This is a high-level package, >>whose purpose is to hide the numerical details and programming >>drudgery from the user as much as possible, while still offering high >>performance and not limiting capability too much. For this type of >>package, a good metric is "when it doesn't restrict capability, do >>what makes sense for new/naiive users". >> >> > >I'm pretty sure that when any of us say that such-and-such is going to make the >most sense to new users, we're just guessing. Or projecting our experienced-user >prejudices onto them. If I had to register my guess, I would say that either way >will make just as much sense to new users. > > Totally agree. I don't see the argument that Complex64 is a "precision" description. To a new user it could go either way depending on their previous experience. I think most new users won't even use the bit width names but will instead use 'complex' and be done with it... >I think it's time that we start taking backwards compatibility with previous >releases of numpy seriously and not break numpy code without clear, significant >gains in usability. > > +1 -Travis From perry at stsci.edu Tue Apr 4 14:09:02 2006 From: perry at stsci.edu (Perry Greenfield) Date: Tue Apr 4 14:09:02 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: <4432D9C3.3040109@ee.byu.edu> References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> <4432D9C3.3040109@ee.byu.edu> Message-ID: <6e9f9be0cfb968840dc4314d65c9e655@stsci.edu> On Apr 4, 2006, at 4:40 PM, Travis Oliphant wrote: > > Totally agree. I don't see the argument that Complex64 is a > "precision" description. To a new user it could go either way > depending on their previous experience. I think most new users won't > even use the bit width names but will instead use 'complex' and be > done with it... > >> I think it's time that we start taking backwards compatibility with >> previous >> releases of numpy seriously and not break numpy code without clear, >> significant >> gains in usability. >> > +1 > The issue that just won't go away. We did it the current way for numarray initially and were persuaded to switch to be compatible with Numeric. I agree that it isn't obvious what the number means for complex. That ambiguity will always be there. Unless we did a real user test to find out, we wouldn't know for sure what future users would most likely expect. But in the end, pick one and let's not change it again (or even talk about changing it). It doesn't matter that much to me which it is. Perry From oliphant at ee.byu.edu Tue Apr 4 14:18:59 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 4 14:18:59 2006 Subject: [Numpy-discussion] NumPy documentation Message-ID: <4432E27E.6030906@ee.byu.edu> I received a rather hurtful email today that was very discouraging to me personally. Basically, I was called "lame" and a "wolf" in sheep's clothing because I'm charging for documentation. Fortunately it's the first email of that nature I've received. Others have disagreed with my choice to charge for the documentation but at least they've not resorted to personal attacks on me and my motivations. Please know that such emails do have an impact. While I try to build a tough skin, such unappreciative statements reduce my enthusiasm for working on NumPy significantly. My purpose, however, is not to rant about the misguided words of one person. He brought up a point that I want to clarify. 
He asked if I "would sue" if somebody else wrote documentation for NumPy. I want to be perfectly clear that this is a ridiculous statement that barely deserves a response. Of course I wouldn't. First of all, it would be extreme circumstances indeed for me to resort to that course of action (basically a company would have to copy my book and start distributing it on a large scale, belligerently). Second of all, I would love to see *more* documentation for NumPy. If there are other (less vocal) people out there who are not using NumPy because of my book, then I certainly feel sorry about that. Please dig in and create the documentation you so urgently want to be free. I will not stand in your way, but may even help. But please consider that time is money. Most people are better off spending their time on something else and just cooperating with others by paying for the book. But, I'm not going to dislike or have any kind of ill feelings with anyone who decides to spend their time on "documentation." In fact, I'll appreciate it just like everyone else. I love the growth of the SciPy Wiki. There are some great recipes and examples there. This is fantastic. I'm 100% behind this kind of work. Rather than write some kind of "replacement" documentation, contribute docstrings to the code and recipes to the Wiki. Then, those that can't or won't buy the book will still have plenty of resources to use to learn NumPy. I'm completely behind all forms of "free" information on NumPy / SciPy and related tools. The only reason I have to charge for the documentation is that I just don't have the resources to simply donate *all* of my time. I want to thank all of you who have already purchased the documentation. It has been extremely helpful to me personally and professionally. Without you, my time to spend on NumPy would have been significantly reduced. Thank you very much. Best wishes, -Travis From Chris.Barker at noaa.gov Tue Apr 4 14:48:01 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue Apr 4 14:48:01 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432E27E.6030906@ee.byu.edu> References: <4432E27E.6030906@ee.byu.edu> Message-ID: <4432E973.8070601@noaa.gov> Travis, I'm very sorry to hear that you got such a response. It was completely unwarranted. I am often quite surprised at the vitriol that sometimes results from people that are not getting what they want from an open source project. Indeed, the comment about "suing" makes it completely clear that this individual completely misunderstood your intentions (and the reality of copyright law: you would only have a course of action if your book was copied!). When you first announced the book, I know there was a fair bit of discussion about it, and you made it quite clear how reasonable your position is.
Personally, I think financing open source projects by writing and selling books about them is an excellent approach: it works well for everyone. My freedom is not restricted, you get some compensation for your time. Ideally, I'd like to see comprehensive reference documentation distributed for free, while more in-depth explanatory docs could be either free or not. One of these days I'll put my keyboard where my mouth is and actually write a doc string or two! In the meantime, I am absolutely thrilled that you've put as much effort into numpy as you have. You are doing a fabulous job, and I hope the appreciation of all is clear to you. thank you, -Chris PS: If we get a reasonable budget next year, I'll be sure to buy a few copies of your book. -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From tim.hochberg at cox.net Tue Apr 4 15:37:06 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Tue Apr 4 15:37:06 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432E973.8070601@noaa.gov> References: <4432E27E.6030906@ee.byu.edu> <4432E973.8070601@noaa.gov> Message-ID: <4432F4DD.6060000@cox.net> Travis, I'm sorry to hear that you received such an unwarranted attack. Although, sadly, not terribly surprised; there are plenty of unpleasant fanatics of various stripes that roam the bitstreams. Let me add a hearty "me too" to everything that Chris just said. This finally motivated me to go out and buy your book, something that's been on my list of things that I should do "one of these days now". I'm hoping that makes this mystery person unhappy. Regards, -tim From svetosch at gmx.net Tue Apr 4 16:03:02 2006 From: svetosch at gmx.net (Sven Schreiber) Date: Tue Apr 4 16:03:02 2006 Subject: [Numpy-discussion] kron with matrices Message-ID: <4432FADE.3070705@gmx.net> Hi, first of all thanks for including kron in numpy, it's very useful. Now I have just built numpy from svn for the first time in order to spot matrix-related bugs before a new release as promised. That worked well, thanks to the great wiki instructions. The old bugs (in linalg) are gone, but I wonder whether the following behavior is another one: >>> import numpy as n >>> n.kron(n.asmatrix(n.ones((1,2))), n.asmatrix(n.zeros((2,2)))) array([[0, 0, 0, 0], [0, 0, 0, 0]]) I would prefer if kron returned a matrix at least if both inputs are matrices, as in the given example. Thanks, Sven From jdhunter at ace.bsd.uchicago.edu Tue Apr 4 16:10:13 2006 From: jdhunter at ace.bsd.uchicago.edu (John Hunter) Date: Tue Apr 4 16:10:13 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432E27E.6030906@ee.byu.edu> (Travis Oliphant's message of "Tue, 04 Apr 2006 15:17:50 -0600") References: <4432E27E.6030906@ee.byu.edu> Message-ID: <87wte5ndot.fsf@peds-pc311.bsd.uchicago.edu> >>>>> "Travis" == Travis Oliphant writes: Travis> I received a rather hurtful email today that was very Travis> discouraging to me personally. Basically, I was called Travis> "lame" and a "wolf" in sheep's clothing because I'm Travis> charging for documentation. Fortunately it's the first Wow, harsh. I would just like to (for a second time) voice my support for your charging for documentation, and throw out a couple of points for people to consider who oppose it.
I think a low-ball estimate of the dollar value of the amount of time Travis has donated to scientific python is about $500,000 dollars (5 years, full-time, $100k/yr -- this is low ball because he has probably donated more time and he is certainly worth more than that annually!). If he gets the $300,000 or so dollars he hopes to raise from this book, he still has a net contribution of more than $200k. Those of you who are critical: have you put in that much of your time or money? Secondly, I know personally that Travis has resisted several offers to lure him from academia into industry. Academia, by its nature, affords more flexibility to develop open source software driven by issues of breadth and quality rather than deadlines and customer demands. By charging for this book, it makes it more feasible for him to continue to work in academia and support these projects. Travis and I share some similarities: we both have a wife and kids, with low-paying academic careers, and lead active python projects. Only Travis leads two projects to my one and he has five kids to my three. I recently left academia for a job in industry because of financial considerations, and while my firm is supportive of my matplotlib development (we use it and python extensively in house), it does leave me less time for development. So to those of you grumbling to Travis directly or behind the scenes, think about what he is giving and back off. And start donating some of your own time instead of encouraging Travis to donate more of his. JDH From aisaac at american.edu Tue Apr 4 16:27:10 2006 From: aisaac at american.edu (Alan G Isaac) Date: Tue Apr 4 16:27:10 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432E27E.6030906@ee.byu.edu> References: <4432E27E.6030906@ee.byu.edu> Message-ID: On Tue, 04 Apr 2006, Travis Oliphant apparently wrote: > I'm not going to dislike or have any kind of ill feelings > with anyone who decides to spend their time on > "documentation." In fact, I'll appreciate it just like > everyone else. Of course you were extremely clear about this from the beginning. Thank you for numpy!!! Alan Isaac (grateful user of numpy) PS Your book is *very* helpful. From zpincus at stanford.edu Tue Apr 4 16:48:06 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Tue Apr 4 16:48:06 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432F4DD.6060000@cox.net> References: <4432E27E.6030906@ee.byu.edu> <4432E973.8070601@noaa.gov> <4432F4DD.6060000@cox.net> Message-ID: <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu> Hi folks - I must admit that when I first saw the trelgol web page, I was briefly a bit confused and put off about the prospect of moving to numpy from Numeric. Now, it didn't take long for me to come to my senses and realize (a) that no formerly-free documentation had been revoked, (b) that there was enough documentation about the C API in the numpy distribution to get me started, (c) that there was a lot of support available on the email list, and most importantly (d) that Travis and many others are extremely generous with their time, both in answering emails on the numpy list and in making numpy better. I now of course wholeheartedly agree with everything everyone has said in this thread, and with the idea behind selling the documentation. In fact, I feel a bit ashamed that I ever felt otherwise, even though it was just for a few minutes. However, were I a more grumpy (or stupid) type, I might not have come to my senses as rapidly, or ever. 
That would have been my loss, of course. But, perhaps a few little things could help newcomers better understand the rationale behind the ebook. Basically, everyone on this list knows (and supports, it seems!) the reasoning behind selling the docs, because it was discussed on the list. However, it's not hard to imagine someone new to numpy, or maybe a convert from Numeric (who was used to the large, free manual) scratching their head a little when confronted with http:// www.tramy.us/ . (It's less reasonable to imagine someone then going on to personally attack Travis in email -- that's absolutely unconscionable.) I would suggest that the link from the scipy page be changed to point to http://www.tramy.us/guidetoscipy.html , which is a little more clearly about the ebook, and a little less about the publishing method. It might not hurt to expand a bit on that page and mention the basic reasoning behind selling the docs, and even (if you see fit, Travis) to maybe include links to the other numpy documentation resources (list archive and sign up page, old and out-of-date Numeric reference [with maybe some mention of why buying the book would be better, but that the old ref at least gives the right high-level picture to get a newcomer started using numpy], and the numpy wiki pages). Any of this would certainly put a newcomer in a more charitable state of mind, and forestall any lingering concerns about greed or any such foolishness. Since free advice is worth exactly what you paid for it, feel free to ignore any or all of this. I just wanted to mention a few easy things that I think might help newcomers understand and feel good about the ebook (the first step toward buying it!). Zach On Apr 4, 2006, at 5:36 PM, Tim Hochberg wrote: > > Travis, > > I'm sorry to hear that you received such an unwarranted attack. > Although, sadly, not terribly suprised; there are plenty of > unpleasant fanatics of various stripes that roam the bitstreams. > Let me add a hearty "me too" to everything that Chris just said. > > This finally motivated me to go out and buy your book, something > that's been on my list of things that I should do "one of these > days now". I'm hoping that makes this mystery person unhappy. > > Regards, > -tim > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > that extends applications into web and mobile media. Attend the > live webcast > and join the prime developer group breaking into this new coding > territory! > http://sel.as-us.falkag.net/sel? > cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From zpincus at stanford.edu Tue Apr 4 17:19:18 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Tue Apr 4 17:19:18 2006 Subject: [Numpy-discussion] array constructor from generators? Message-ID: Hi folks, Sorry if this has already been discussed, but do you all think it a good idea to extend the array constructor so that it can accept generators instead of lists? I often construct arrays from list comprehensions on generators, e.g. 
to read a tab-delimited file in: numpy.array([map(float, line.split()) for line in file]) or making an array of pairs of numbers: numpy.array([f for f in unique_combinations(input, 2)]) If the array constructor accepted generators (and turned them into lists behind the scenes, or even evaluated them lazily while filling in the memory buffer, not sure what would be more efficient), the above could be written somewhat more cleanly: numpy.array(map(float, line.split()) for line in file) (using a generator expression) and numpy.array(unique_combinations(input, 2)) the latter is especially a win. Moreover, it's becoming more standard for any python thing that can accept a list to also accept a generator. The downside is that currently, passing array() an object makes a 0-d object array with that object. If this were changed, then passing array() an iterator object would be handled differently than passing array any other object. This might possibly be a fatal flaw in this idea. I'd be happy to look into implementing this functionality if people think it is a good idea, and could give me some tips as to the best way to implement it. Zach From wbaxter at gmail.com Tue Apr 4 17:24:38 2006 From: wbaxter at gmail.com (Bill Baxter) Date: Tue Apr 4 17:24:38 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu> References: <4432E27E.6030906@ee.byu.edu> <4432E973.8070601@noaa.gov> <4432F4DD.6060000@cox.net> <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu> Message-ID: First of all, it sounds like the individual who mailed Travis about being a "wolf in sheep's clothing" is suffering from the delusion that you can actually get rich by selling technical documentation at 40 bucks a pop. Travis does have a web page up somewhere explaining all his rationale -- I ran across it at some point. I remember when I saw it I was thinking "that's bizarre -- why on earth would you have to make a whole web page to justify selling something you yourself created?" I mean, like it or not, Travis wrote it so he can do whatever he wants with it. That's just common sense. Something some apparently lack. It reminds me of the story my father told me when I was like 8 years old about a man who shows up one day and gives a little boy a dollar bill. The boy is ecstatic, and thanks the man profusely. Then the next day the same thing, another dollar. The boy can't believe his luck. The whole week the guy comes, then it becomes a month, and then a year. Every day another dollar. Eventually it becomes such a routine that the boy doesn't even bother to thank the guy. Then one day the man doesn't show up. The little boy is furious. He was counting on that dollar, he already knew how he was going to spend every penny. The person who emailed Travis is just like that little boy, furious for not getting the dollar that wasn't his to begin with, rather than being thankful for the $365 he was given out of the blue for no particular reason. --bb From tim.hochberg at cox.net Tue Apr 4 17:41:15 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Tue Apr 4 17:41:15 2006 Subject: [Numpy-discussion] array constructor from generators? In-Reply-To: References: Message-ID: <44331200.2020604@cox.net> Zachary Pincus wrote: > Hi folks, > > Sorry if this has already been discussed, but do you all think it a > good idea to extend the array constructor so that it can accept > generators instead of lists?
> > I often construct arrays from list comprehensions on generators, e.g. > to read a tab-delimited file in: > numpy.array([map(float, line.split()) for line in file]) > or making an array of pairs of numbers: > numpy.array([f for f in unique_combinations(input, 2)]) > > If the array constructor accepted generators (and turned them into > lists behind the scenes, or even evaluated them lazily while filling > in the memory buffer, not sure what would be more efficient), the > above could be written somewhat more cleanly: > numpy.array(map(float, line.split() for line in file) (using a > generator expression) > and > numpy.array(unique_combinations(input, 2)) > > the latter is especially a win. > > Moreover, it's becoming more standard for any python thing that can > accept a list to also accept a generator. > > The downside is that currently, passing array() an object makes a 0-d > object array with that object. If this were changed, then passing > array() an iterator object would be handled differently than passing > array any other object. This might possibly be a fatal flaw in this > idea. You pretty much can't count on anything when trying to implicitly create object arrays anyway. There's already buckets of special cases to make the other array types user friendly. In other words I don't think we should care. You do have to be careful to special case iterators after all the other special case machinery, so that lists and whatnot that are treated efficiently don't get slowed down. > > I'd be happy to look in to implementing this functionality if people > think it is a good idea, and could give me some tips as to the best > way to implement it. Hi Zach, I brought this up last week and Travis was OK with it. I have it on my todo list, but if you are in a hurry you're welcome to do it instead. If you do look at it, consider looking into the '__length_hint__ parameter that's slated to go into Python 2.5. When this is present, it's potentially a big win, since you can preallocate the array and fill it directly from the iterator. Without this, you probably can't do much better than just building a list from the array. What would work well would be to build a list, then steal its memory. I'm not sure if that's feasible without leaking a reference to the list though. Also, with iterators, specifying dtype will make a huge difference. If an object has __length_hint__ and you specify dtype, then you can preallocate the array as I suggested above. However, if dtype is not specified, you still need to build the list completely, determine what type it is, allocate the array memory and then copy the values into it. Much less efficient! Regards, -tim > > Zach > > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > that extends applications into web and mobile media. Attend the live > webcast > and join the prime developer group breaking into this new coding > territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > From robert.kern at gmail.com Tue Apr 4 17:50:05 2006 From: robert.kern at gmail.com (Robert Kern) Date: Tue Apr 4 17:50:05 2006 Subject: [Numpy-discussion] Re: array constructor from generators? 
In-Reply-To: References: Message-ID: Zachary Pincus wrote: > The downside is that currently, passing array() an object makes a 0-d > object array with that object. If this were changed, then passing > array() an iterator object would be handled differently than passing > array any other object. This might possibly be a fatal flaw in this idea. I don't think so. We can pass appropriate lists to array(), and it handles them fine. Iterator objects are just another kind of object that gets special treatment. The tricky bit is recognizing them. > I'd be happy to look in to implementing this functionality if people > think it is a good idea, and could give me some tips as to the best way > to implement it. I think a prerequisite for turning an arbitrary iterable into a numpy array is to iterate over it and store all of the objects in a temporary buffer that expands with a sensible strategy. I can't think of a better buffer object than regular Python lists. I think you can recognize when you have to use the temporary list strategy by seeing if the input has .__iter__() but not .__len__(). I'd have to refresh myself on the details of PyArray_New to be more sure, though. As Tim suggests, 2.5's __length_hint__ will also help. Another note of caution: You are going to have to deal with iterators of iterators of iterators of.... I'm not sure if that actually overly complicates matters; I haven't looked at PyArray_New for some time. Enjoy! -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From ted.horst at earthlink.net Tue Apr 4 21:33:04 2006 From: ted.horst at earthlink.net (Ted Horst) Date: Tue Apr 4 21:33:04 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432E27E.6030906@ee.byu.edu> References: <4432E27E.6030906@ee.byu.edu> Message-ID: <9E52488D-6736-4F10-A045-2D5B39CBD3F5@earthlink.net> I'll just add my voice to the people speaking up to support Travis's efforts. I buy lots of books, and most of the time I don't think too much about who I am supporting when I buy them, but I probably would have bought this book even if I didn't need that level of documentation just to help support what I see as very important work. I don't see how writing about an open source project and using the proceeds to further that project could be seen as anything other than a positive. I also just want to say how impressed I am with what Travis has accomplished with this project. From the organizational effort, patience, and persistence of bringing the various communities together to the quality and quantity of the ideas, code, and discussions, his contributions have been inspiring. Ted Horst From eric at enthought.com Tue Apr 4 21:59:10 2006 From: eric at enthought.com (eric jones) Date: Tue Apr 4 21:59:10 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432E27E.6030906@ee.byu.edu> References: <4432E27E.6030906@ee.byu.edu> Message-ID: <44334E74.3000406@enthought.com> Travis Oliphant wrote: > > I received a rather hurtful email today that was very discouraging to > me personally. Basically, I was called "lame" and a "wolf" in sheep's > clothing because I'm charging for documentation. Hmmmm.... Chickens getting eaten by foxes. Farmer builds wire coop. Coop destroyed by foxes. More chickens eaten. Wolf builds wooden coop for free. Also stands guard but for a fee. No more chickens eaten. 
Most chickens gladly pay. A few grumble about extortion! That's fine. Let them take the guard. Foxes aren't so afraid of Chickens. This chicken will take his chances with this wolf. Turns out it's just a lame chicken in wolf's clothing. Smart chicken, he is. Dumb letter. Dumb story. Let's see here: you're a chicken. Check. Travis is smart wolf-chicken... yeah that works. Numpy is the wooden chicken coop. errr... Guard duty is documentation. hmmm... foxes, not sure... Guess I should keep my day job. Slightly more seriously... There's a chicken's foot full of people on the planet that could have done what Travis has pulled off -- I've actually thought about this a little. Maybe Jim Hugunin could have done it given similar time and motivation. After that, I come up a little short of candidates -- so maybe it's just a pig's foot full. I consider us lucky that one of the few people able to fuse Numeric/numarray bailed us out and did it. Documentation is another matter as far as scarcity of qualified authors. I would trust any number of yayhoos to create at least passable documentation for Travis' creation. Heck, David Ascher managed to write the Numeric documentation. That said, writing docs is work, hard to do well, and not nearly as much fun as writing actual code (for the people on this list anyway). That significantly lowers the probability of it getting done. In fact, I believe LLNL funded the first documentation effort to help ensure that it happened (though I'm not positive about that). And, think of the creek we'd be up if he chose to keep the library and give away the docs. I'm all for someone writing free documentation. It'd be great to have. And, if it were as good as Travis', I might even use it. Still, it would probably be better for the world if you spent your time on other things that don't already have a solution (like documenting SciPy...). Once that and all similar problems are solved, loop back around and do the NumPy docs. One other comment. I've used another amazing library called agg (www.antigrain.com) extensively for rendering in kiva/chaco. I view Maxim (the author of Agg) and graphics rendering in a similar light as Travis and Numpy -- there are only a handful of people that could have written agg. For that I am hugely grateful. On the downside, agg is very complex and has very little documentation. Still a number of people use it without complaint. Based on the evidence, if Maxim wrote documentation and charged for it, the number of complaints would actually increase. It is just silly. I would pay his price and sing his praises for the days of my life that he gave back to me. eric ps. # Based on a definitive monte carlo simulation, one of every hundred chickens will # complain. Don't believe me. Try it. dist = stats.uniform(0.0, 1.0) for chicken in chickens: if dist.rvs()[0] < 0.01: print "extortion" From pfdubois at gmail.com Tue Apr 4 22:01:02 2006 From: pfdubois at gmail.com (Paul Dubois) Date: Tue Apr 4 22:01:02 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <9E52488D-6736-4F10-A045-2D5B39CBD3F5@earthlink.net> References: <4432E27E.6030906@ee.byu.edu> <9E52488D-6736-4F10-A045-2D5B39CBD3F5@earthlink.net> Message-ID: Amen. On 04 Apr 2006 21:33:12 -0700, Ted Horst wrote: > > > I'll just add my voice to the people speaking up to support Travis's > efforts.
I buy lots of books, and most of the time I don't think too > much about who I am supporting when I buy them, but I probably would > have bought this book even if I didn't need that level of > documentation just to help support what I see as very important > work. I don't see how writing about an open source project and using > the proceeds to further that project could be seen as anything other > than a positive. > > I also just want to say how impressed I am with what Travis has > accomplished with this project. From the organizational effort, > patience, and persistence of bringing the various communities > together to the quality and quantity of the ideas, code, and > discussions, his contributions have been inspiring. > > Ted Horst > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > that extends applications into web and mobile media. Attend the live > webcast > and join the prime developer group breaking into this new coding > territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jdhunter at ace.bsd.uchicago.edu Tue Apr 4 22:54:01 2006 From: jdhunter at ace.bsd.uchicago.edu (John Hunter) Date: Tue Apr 4 22:54:01 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <44334E74.3000406@enthought.com> (eric jones's message of "Tue, 04 Apr 2006 23:58:28 -0500") References: <4432E27E.6030906@ee.byu.edu> <44334E74.3000406@enthought.com> Message-ID: <873bgsa7vp.fsf@peds-pc311.bsd.uchicago.edu> >>>>> "eric" == eric jones writes: eric> Let see here, your a chicken. check. Travis is smart eric> wolf-chicken... yeah that works. Numpy is the wooden chicken eric> coop. errr... Guard duty is documentation. hmmm... foxes, eric> not sure... And I thought you didn't drink anything stronger than Dr Pepper :-) JDH From sransom at nrao.edu Wed Apr 5 00:04:03 2006 From: sransom at nrao.edu (Scott Ransom) Date: Wed Apr 5 00:04:03 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <9E52488D-6736-4F10-A045-2D5B39CBD3F5@earthlink.net> References: <4432E27E.6030906@ee.byu.edu> <9E52488D-6736-4F10-A045-2D5B39CBD3F5@earthlink.net> Message-ID: <20060405070150.GB8682@ssh.cv.nrao.edu> As someone who has been actively using Numeric/Numarray/Numpy for about 7 years, now, I heartily agree. Thanks, Travis. Scott On Tue, Apr 04, 2006 at 11:32:42PM -0500, Ted Horst wrote: > > I'll just add my voice to the people speaking up to support Travis's > efforts. I buy lots of books, and most of the time I don't think too > much about who I am supporting when I buy them, but I probably would > have bought this book even if I didn't need that level of > documentation just to help support what I see as very important > work. I don't see how writing about an open source project and using > the proceeds to further that project could be seen as anything other > than a positive. > > I also just want to say how impressed I am with what Travis has > accomplished with this project. From the organizational effort, > patience, and persistence of bringing the various communities > together to the quality and quantity of the ideas, code, and > discussions, his contributions have been inspiring. 
> > Ted Horst > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- -- Scott M. Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom at nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From charlesr.harris at gmail.com Wed Apr 5 00:27:02 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed Apr 5 00:27:02 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432E27E.6030906@ee.byu.edu> References: <4432E27E.6030906@ee.byu.edu> Message-ID: Travis, On 4/4/06, Travis Oliphant wrote: > > > I received a rather hurtful email today that was very discouraging to me > personally. Basically, I was called "lame" and a "wolf" in sheep's > clothing because I'm charging for documentation. Geez, what's with that. There are any number of "real" books out on python, I don't hear folks bitching. I think it's wonderful that we have such a good reference. I mean, look at numarray 8) I spent the money for your book and it didn't hurt a bit and was well worth the cost. Anyone who has tried to write extensive documentation on a big project knows how much work it takes, it isn't easy. Thanks for taking the time and sweat to do so. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From arnd.baecker at web.de Wed Apr 5 01:51:08 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Wed Apr 5 01:51:08 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> <4432CE1F.3010209@cox.net> Message-ID: On Tue, 4 Apr 2006, Robert Kern wrote: > Tim Hochberg wrote: [...] > > >>> help(complex128) > > class complex128scalar(complexfloatingscalar, complex) > > | complex128: composed of two 64 bit floats > > | > > | Method resolution order: > > | complex128scalar > > | complexfloatingscalar > > | inexactscalar > > | numberscalar > > | genericscalar > > | complex > > | object > > ... I am puzzled why this does not show up with Ipython: In [1]:import numpy In [2]:numpy.complex128? Type: type Base Class: String Form: Namespace: Interactive Docstring: whereas In [3]:help(numpy.complex128) shows the above! So this might be more of an IPython question (I am running IPython 0.7.2.svn), but maybe numpy does some magic tricks to hide the docs from IPython (surely not on purpose ...)? It seems that numpy.complex128.__doc__ is None. Best, Arnd From meesters at uni-mainz.de Wed Apr 5 02:03:06 2006 From: meesters at uni-mainz.de (Christian Meesters) Date: Wed Apr 5 02:03:06 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: References: <4432E27E.6030906@ee.byu.edu> Message-ID: <200604051048.52766.meesters@uni-mainz.de> I'm glad Travis, that you got such supportive replies - but didn't expect anything else. Just let me give two more cents: a) I am a grateful user of Numpy/Scipy, too. 
b) I am among those who fully understand and support your decisions about selling the book. c) I didn't buy the book - yet. (Simply forgotten after a minor PayPal problem I had.) d) ad c): This will change soon. And e): Thank you for all your work put into Numpy/Scipy! Christian From amcmorl at gmail.com Wed Apr 5 02:30:01 2006 From: amcmorl at gmail.com (amcmorl) Date: Wed Apr 5 02:30:01 2006 Subject: [Numpy-discussion] Newbie indexing question and print order Message-ID: <44338DF4.7050603@gmail.com> Hi all, I'm having a bit of trouble getting my head around numpy's indexing capabilities. A quick summary of the problem is that I want to lookup/index in nD from a second array of rank n+1, such that the last (or first, I guess) dimension contains the lookup co-ordinates for the value to extract from the first array. Here's a 2D (3,3) example: In [12]:print ar [[ 0.15 0.75 0.2 ] [ 0.82 0.5 0.77] [ 0.21 0.91 0.59]] In [24]:print inds [[[1 1] [1 1] [2 1]] [[2 2] [0 0] [1 0]] [[1 1] [0 0] [2 1]]] then somehow return the array (barring me making any row/column errors): In [26]: c = ar.somefancyindexingroutinehere(inds) In [26]:print c [[ 0.5 0.5 0.91] [ 0.59 0.15 0.82] [ 0.5 0.15 0.91]] i.e. c[x,y] = a[ inds[x,y,0], inds[x,y,1] ] Any suggestions? It looks like it should be relatively simple using 'put' or 'take' or 'fetch' or 'sit' or something like that, but I'm not getting it. While I'm here, can someone help me understand the rationale behind 'print' printing row, column (i.e. a[0,1] = 0.75 in the above example) rather than x, y (= column, row, in which case 0.75 would be in the first column and second row), which seems to me to be more intuitive? I'm really enjoying getting into numpy - I can see it'll be simpler/faster coding than my previous environments, despite me not knowing my way at the moment, and that python has better opportunities for extensibility. So, many thanks for your great work. -- Angus McMorland email a.mcmorland at auckland.ac.nz mobile +64-21-155-4906 PhD Student, Neurophysiology / Multiphoton & Confocal Imaging Physiology, University of Auckland phone +64-9-3737-599 x89707 Armourer, Auckland University Fencing Secretary, Fencing North Inc. From faltet at carabos.com Wed Apr 5 02:56:06 2006 From: faltet at carabos.com (Francesc Altet) Date: Wed Apr 5 02:56:06 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432E27E.6030906@ee.byu.edu> References: <4432E27E.6030906@ee.byu.edu> Message-ID: <1144230907.7563.14.camel@localhost.localdomain> Travis, First of all, I think you should be happy that you have received *only* one mail of this kind in the year and some months that you have been at the NumPy project. As somebody already noted: "take a large enough community, and you will always find a person (or several) that thinks that the wisest developer and the best professional is evil". We could discuss at length why this should happen, but the answer is easy: it's human nature. Let me also THANK YOU not only for your impressive dedication to the NumPy project but also for your openness to other ideas and for being the best advocate of the "I prefer to code, rather than talk" mantra. Let's do more of this and let others talk. I'm positive that 99% of the community is with you, and that's the only consideration that matters. Best, Francesc On Tue, 04 Apr 2006 at 15:17 -0600, Travis Oliphant wrote: > I received a rather hurtful email today that was very discouraging to me > personally. Basically, I was called "lame" and a "wolf" in sheep's > clothing because I'm charging for documentation.
Basically, I was called "lame" and a "wolf" in sheep's > clothing because I'm charging for documentation. Fortunately it's the > first email of that nature I've received. Others have disagreed with my > choice to charge for the documentation but at least they've not resorted > to personal attacks on me and my motivations. Please know that such > emails do have an impact. While I try to build a tough skin, such > unappreciative statements reduce my enthusiasm for working on NumPy > significantly. > > My purpose, however, is not to rant about the misguided words of one > person. He brought up a point that I want to clarify. He asked if I > "would sue" if somebody else wrote documentation for NumPy. I want to > be perfectly clear that this is a ridiculous statement that barely > deserves a response. Of course I wouldn't. First of all, it would be > extreme circumstances indeed for me to resort to that course of action > (basically a company would have to copy my book and start distributing > it on a large scale, belligerently). Second of all, I would love to see > *more* documentation for NumPy. > > If there are other (less vocal) people out there who are not using NumPy > because of my book, then I certainly feel sorry about that. Please dig > in and create the documentation you so urgently want to be free. I > will not stand in your way, but may even help. > > But please consider that time is money. Most people are better off > spending their time on something else and just cooperating with others > by paying for the book. But, I'm not going to dislike or have any kind > of ill feelings with anyone who decides to spend their time on > "documentation." In fact, I'll appreciate it just like everyone else. > I love the growth of the SciPy Wiki. There are some great recipes and > examples there. This is fantastic. I'm 100% behind this kind of work. > Rather than write some kind of "replacement" documentation, contribute > docstrings to the code and recipes to the Wiki. Then, those that can't > or won't buy the book will still have plenty of resources to use to > learn NumPy. > > I'm completely behind all forms of "free" information on NumPy / SciPy > and related tools. The only reason I have to charge for the > documentation is that I just don't have the resources to simply donate > *all* of my time. I want to thank all of you who have already > purchased the documentation. It has been extremely helpful to me > personally and professionally. Without you, my time to spend on NumPy > would have been significantly reduced. Thank you very much. > > Best wishes, > > -Travis > > > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- >0,0< Francesc Altet http://www.carabos.com/ V V C?rabos Coop. V. 
Enjoy Data "-" From pau.gargallo at gmail.com Wed Apr 5 03:10:01 2006 From: pau.gargallo at gmail.com (Pau Gargallo) Date: Wed Apr 5 03:10:01 2006 Subject: [Numpy-discussion] Newbie indexing question and print order In-Reply-To: <44338DF4.7050603@gmail.com> References: <44338DF4.7050603@gmail.com> Message-ID: <6ef8f3380604050309t1ed4c79bv395ed1a9fb45ce9d@mail.gmail.com> hi, i had the same problem and i defined a function with a similar syntax to interp2 which i call take2 to solve it: from numpy import * def take2( a, x,y ): return take( ravel(a), x*a.shape[1] + y ) (a[i,j] sits at flat index i*a.shape[1] + j in C order, so this matches your c[x,y] = a[ inds[x,y,0], inds[x,y,1] ]; with a recent numpy, the fancy-indexing form a[x,y] with integer arrays x and y should give the same result directly) a = array( [[ 0.15, 0.75, 0.2 ], [ 0.82, 0.5, 0.77], [ 0.21, 0.91, 0.59]] ) xy = array([ [[1, 1], [1, 1], [2, 1]], [[2, 2], [0, 0], [1, 0]], [[1, 1], [0, 0], [2, 1]]] ) print take2( a, xy[...,0], xy[...,1] ) i hope this helps you. pau On 4/5/06, amcmorl wrote: > Hi all, > > I'm having a bit of trouble getting my head around numpy's indexing > capabilities. A quick summary of the problem is that I want to > lookup/index in nD from a second array of rank n+1, such that the last > (or first, I guess) dimension contains the lookup co-ordinates for the > value to extract from the first array. Here's a 2D (3,3) example: > > In [12]:print ar > [[ 0.15 0.75 0.2 ] > [ 0.82 0.5 0.77] > [ 0.21 0.91 0.59]] > > In [24]:print inds > [[[1 1] > [1 1] > [2 1]] > > [[2 2] > [0 0] > [1 0]] > > [[1 1] > [0 0] > [2 1]]] > > then somehow return the array (barring me making any row/column errors): > In [26]: c = ar.somefancyindexingroutinehere(inds) > > In [26]:print c > [[ 0.5 0.5 0.91] > [ 0.59 0.15 0.82] > [ 0.5 0.15 0.91]] > > i.e. c[x,y] = a[ inds[x,y,0], inds[x,y,1] ] > > Any suggestions? It looks like it should be relatively simple using > 'put' or 'take' or 'fetch' or 'sit' or something like that, but I'm not > getting it. > > While I'm here, can someone help me understand the rationale behind > 'print' printing row, column (i.e. a[0,1] = 0.75 in the above example) > rather than x, y (= column, row, in which case 0.75 would be in the first > column and second row), which seems to me to be more intuitive? > > I'm really enjoying getting into numpy - I can see it'll be > simpler/faster coding than my previous environments, despite me not > knowing my way at the moment, and that python has better opportunities > for extensibility. So, many thanks for your great work. > -- > Angus McMorland > email a.mcmorland at auckland.ac.nz > mobile +64-21-155-4906 > > PhD Student, Neurophysiology / Multiphoton & Confocal Imaging > Physiology, University of Auckland > phone +64-9-3737-599 x89707 > > Armourer, Auckland University Fencing > Secretary, Fencing North Inc. > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory!
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From tim.hochberg at cox.net Wed Apr 5 05:30:14 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 5 05:30:14 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> <4432CE1F.3010209@cox.net> Message-ID: <4433B816.1080307@cox.net> Arnd Baecker wrote: >On Tue, 4 Apr 2006, Robert Kern wrote: > > > >>Tim Hochberg wrote: >> >> > >[...] > > > >>> >>> help(complex128) >>> class complex128scalar(complexfloatingscalar, complex) >>> | complex128: composed of two 64 bit floats >>> | >>> | Method resolution order: >>> | complex128scalar >>> | complexfloatingscalar >>> | inexactscalar >>> | numberscalar >>> | genericscalar >>> | complex >>> | object >>> ... >>> >>> > >I am puzzled why this does not show up with Ipython: > >In [1]:import numpy >In [2]:numpy.complex128? >Type: type >Base Class: >String Form: >Namespace: Interactive >Docstring: > > >whereas > >In [3]:help(numpy.complex128) > >shows the above! >So this might be more of an IPython question (I am running IPython >0.7.2.svn), but maybe numpy does some magic tricks to hide the docs from >IPython (surely not on purpose ...)? >It seems that numpy.complex128.__doc__ is None > That's right, none of the scalar types have docstrings at present. The builtin help (AKA pydoc.help) tracks back through all the base classes and presents all kinds of extra information. The result tends to be awfully verbose; so much so that I just stuffed a function called hint into __builtins___ that just prints the results of pydoc.describe and pydoc.getdoc. It's quite possible that such a function already exists, maybe even in pydoc, but oddly enough the docs for pydoc are pretty impenatrable. Here I've added basic docstrings to the complex types. I was hoping someone would have some ideas for other stuff that should go into the docstrings, but perhaps I'll just commit that change as is. Here's what I see here using hint: >>> hint(numpy.float64) # Still no docstring class float64scalar >>> hint(numpy.complex64) # Now has a terse docstring class complex64scalar | Composed of two 32 bit floats >>> hint(numpy.complex128) # Same here. class complex128scalar | Composed of two 64 bit floats Regards, -tim From arnd.baecker at web.de Wed Apr 5 05:48:02 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Wed Apr 5 05:48:02 2006 Subject: [Numpy-discussion] Speed up function on cross product of two sets? In-Reply-To: <44315633.4010600@cox.net> References: <44315633.4010600@cox.net> Message-ID: On Mon, 3 Apr 2006, Tim Hochberg wrote: > Arnd Baecker wrote: > > [SNIP] > > >((Note that I just learned in some other thread that with numpy there is > >an alternative to NewAxis, but I haven't figured out which that is ...)) > > > > > If you're old school you could just use None. Well, I have been using python/Numeric/... for a while, but I am definitively not old school - I was not aware that NewAxis is a longer spelling of None ;-) > But you probably mean 'newaxis'. yes - perfect! Many thanks. BTW, it seems that we have no Numeric to numpy transition remarks in www.scipy.org. 
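(For concreteness, a few of the typical renames such a transition page would presumably collect -- a sketch drawn from changes mentioned on this list, not an exhaustive conversion guide:

    import numpy

    a = numpy.zeros((3, 3), float)   # Numeric: zeros((3, 3), Float)
    b = a[numpy.newaxis, :]          # Numeric: a[NewAxis, :]
    c = numpy.arange(10)             # Numeric: arrayrange(10)
    print a.dtype.char               # Numeric: a.typecode()

The convertcode.py script mentioned below automates much of this kind of mechanical substitution.)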
I only found http://www.scipy.org/PearuPeterson/NumpyVersusNumeric and of course Travis' "Guide to NumPy" contains a detailed list of necessary changes in chapter 2.6.1. In addition ``site-packages/numpy/lib/convertcode.py`` provides an automatic conversion. Would it be helpful to start a new wiki page "ConvertingFromNumeric" (similar to http://www.scipy.org/Converting_from_numarray) which aims at summarizing the necessary changes or expand Pearu's page (if he agrees) on this? Best, Arnd From arnd.baecker at web.de Wed Apr 5 05:57:16 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Wed Apr 5 05:57:16 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: <4433B816.1080307@cox.net> References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> <4432CE1F.3010209@cox.net> <4433B816.1080307@cox.net> Message-ID: Hi, On Wed, 5 Apr 2006, Tim Hochberg wrote: [...] > That's right, none of the scalar types have docstrings at present. The > builtin help (AKA pydoc.help) tracks back through all the base classes > and presents all kinds of extra information. I see - so that might be something Ipython could do as well (if that's really what we would like to see...) > The result tends to be > awfully verbose; so much so that I just stuffed a function called hint > into __builtins___ that just prints the results of pydoc.describe and > pydoc.getdoc. It's quite possible that such a function already exists, > maybe even in pydoc, but oddly enough the docs for pydoc are pretty > impenatrable. > > Here I've added basic docstrings to the complex types. I was hoping > someone would have some ideas for other stuff that should go into the > docstrings, but perhaps I'll just commit that change as is. Here's what > I see here using hint: > > >>> hint(numpy.float64) # Still no docstring > class float64scalar > >>> hint(numpy.complex64) # Now has a terse docstring > class complex64scalar > | Composed of two 32 bit floats > >>> hint(numpy.complex128) # Same here. > class complex128scalar > | Composed of two 64 bit floats That looks much better. I am a bit unsure about `hint` though for the following reasons: There are quite a few ways to access documentation: - help(defined_object) - help("numpy.complex128") - scipy.info(defined_object) - hint(defined_object) - defined_object? # with IPython (and then of course the pydoc commands as well ...). Clearly, I would prefer to have "?" in IPython as the only thing one needs to know about accessing documentation. There are surely many aspects to consider here, but I have to rush now ... Best, Arnd From tim.hochberg at cox.net Wed Apr 5 06:24:11 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 5 06:24:11 2006 Subject: [Numpy-discussion] Re: Vote: complex64 vs complex128 In-Reply-To: References: <20060404155003.AF09016220@sc8-sf-spam2.sourceforge.net> <200604041813.k34IDmAP011500@oobleck.astro.cornell.edu> <4432CE1F.3010209@cox.net> <4433B816.1080307@cox.net> Message-ID: <4433C4CC.7010003@cox.net> Arnd Baecker wrote: >Hi, > >On Wed, 5 Apr 2006, Tim Hochberg wrote: > >[...] > > > >>That's right, none of the scalar types have docstrings at present. The >>builtin help (AKA pydoc.help) tracks back through all the base classes >>and presents all kinds of extra information. >> >> > >I see - so that might be something Ipython could do as well >(if that's really what we would like to see...) 
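(Tim's hint helper itself never appears in the thread. A minimal sketch of something equivalent, assuming only what he says above -- that it prints the results of pydoc.describe and pydoc.getdoc:

    import pydoc

    def hint(obj):
        # one line describing the object, plus its own docstring,
        # skipping the base-class dump that help() produces
        print pydoc.describe(obj)
        doc = pydoc.getdoc(obj)
        if doc:
            print doc

Dropping something like this into sitecustomize.py, as Tim describes later, makes it available everywhere.)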
> > > >>The result tends to be >>awfully verbose; so much so that I just stuffed a function called hint >>into __builtins___ that just prints the results of pydoc.describe and >>pydoc.getdoc. It's quite possible that such a function already exists, >>maybe even in pydoc, but oddly enough the docs for pydoc are pretty >>impenatrable. >> >>Here I've added basic docstrings to the complex types. I was hoping >>someone would have some ideas for other stuff that should go into the >>docstrings, but perhaps I'll just commit that change as is. Here's what >>I see here using hint: >> >> >>> hint(numpy.float64) # Still no docstring >>class float64scalar >> >>> hint(numpy.complex64) # Now has a terse docstring >>class complex64scalar >> | Composed of two 32 bit floats >> >>> hint(numpy.complex128) # Same here. >>class complex128scalar >> | Composed of two 64 bit floats >> >> > >That looks much better. >I am a bit unsure about `hint` though for the following reasons: >There are quite a few ways to access documentation: > - help(defined_object) > - help("numpy.complex128") > - scipy.info(defined_object) > - hint(defined_object) > - defined_object? # with IPython >(and then of course the pydoc commands as well ...). > > Sorry, I was unclear. Hint is only for my enjoyment -- it's not related to numpy. I just tossed it into my sitecustomize file. I was just get sick of doing help(complex64) and getting pages of text when all I cared about was the docstring. I suppose I could just have done "print complex64.__doc__", but I felt like hint might be useful. However, it's not something I was proposing to add to numpy, the changes I was talking about are strictly in the docstrings of complexXXX. -tim >Clearly, I would prefer to have "?" in IPython as the only thing one needs >to know about accessing documentation. > >There are surely many aspects to consider here, but I have to rush now ... > >Best, Arnd > > > > > > From emsellem at obs.univ-lyon1.fr Wed Apr 5 06:33:23 2006 From: emsellem at obs.univ-lyon1.fr (Eric Emsellem) Date: Wed Apr 5 06:33:23 2006 Subject: [Numpy-discussion] A random.normal function with stdev as array Message-ID: <4433C6D6.5080800@obs.univ-lyon1.fr> Hi, I am trying to optimize a code where I derive random numbers many times and having an array of values for the stdev parameter. I wish to have an efficient way of doing something like: ################## stdev = array([1.1,1.2,1.0,2.2]) result = numpy.zeros(stdev.shape, Float) for i in range(len(stdev)) : result[i] = numpy.random.normal(0, stdev[i]) ################## In my case, stdev can in fact be an array of a few millions floats... so I really need to optimize things. Any hint on how to code this efficiently ? And in general, where could I find tips for optimizing a code where I unfortunately have too many loops such as "for i in range(Nbody) : " with Nbody being > 10^6 ? thanks! Eric From dd55 at cornell.edu Wed Apr 5 06:34:00 2006 From: dd55 at cornell.edu (Darren Dale) Date: Wed Apr 5 06:34:00 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <9E52488D-6736-4F10-A045-2D5B39CBD3F5@earthlink.net> References: <4432E27E.6030906@ee.byu.edu> <9E52488D-6736-4F10-A045-2D5B39CBD3F5@earthlink.net> Message-ID: <200604050932.56744.dd55@cornell.edu> On Wednesday 05 April 2006 00:32, Ted Horst wrote: > I'll just add my voice to the people speaking up to support Travis's > efforts. 
I buy lots of books, and most of the time I don't think too > much about who I am supporting when I buy them, but I probably would > have bought this book even if I didn't need that level of > documentation just to help support what I see as very important > work. I don't see how writing about an open source project and using > the proceeds to further that project could be seen as anything other > than a positive. > > I also just want to say how impressed I am with what Travis has > accomplished with this project. From the organizational effort, > patience, and persistence of bringing the various communities > together to the quality and quantity of the ideas, code, and > discussions, his contributions have been inspiring. I agree. I support of what Travis has done. From pearu at scipy.org Wed Apr 5 07:18:02 2006 From: pearu at scipy.org (Pearu Peterson) Date: Wed Apr 5 07:18:02 2006 Subject: [Numpy-discussion] Speed up function on cross product of two sets? In-Reply-To: References: <44315633.4010600@cox.net> Message-ID: On Wed, 5 Apr 2006, Arnd Baecker wrote: > BTW, it seems that we have no Numeric to numpy transition remarks in > www.scipy.org. I only found > http://www.scipy.org/PearuPeterson/NumpyVersusNumeric > and of course Travis' "Guide to NumPy" contains a detailed list of > necessary changes in chapter 2.6.1. > In addition ``site-packages/numpy/lib/convertcode.py`` provides an > automatic conversion. > > Would it be helpful to start a new wiki page "ConvertingFromNumeric" > (similar to http://www.scipy.org/Converting_from_numarray) > which aims at summarizing the necessary changes > or expand Pearu's page (if he agrees) on this? It's better to start a new wiki page similar to Converting_from_numarray (I like the table). Btw, I have few notes about the necessary changes for Numeric->numpy transition in the following page: http://svn.enthought.com/enthought/wiki/NumpyPort#NotesonchangesduetoreplacingNumeric/scipy_basewithnumpy Feel free to grab these notes. Pearu From zpincus at stanford.edu Wed Apr 5 08:04:33 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Wed Apr 5 08:04:33 2006 Subject: [Numpy-discussion] array constructor from generators? In-Reply-To: <44331200.2020604@cox.net> References: <44331200.2020604@cox.net> Message-ID: tim> > I brought this up last week and Travis was OK with it. I have it on > my todo list, but if you are in a hurry you're welcome to do it > instead. Sorry if that was on the list and I missed it! Hate to be adding more noise than signal. At any rate, I'm not in a hurry, but I'd be happy to help where I can. (Though for the next week or so I think I'm swamped...) tim> > If you do look at it, consider looking into the '__length_hint__ > parameter that's slated to go into Python 2.5. When this is > present, it's potentially a big win, since you can preallocate the > array and fill it directly from the iterator. Without this, you > probably can't do much better than just building a list from the > array. What would work well would be to build a list, then steal > its memory. I'm not sure if that's feasible without leaking a > reference to the list though. Can you steal its memory and then give it some dummy memory that it can free without problems, so that the list can be deallocated without trouble? Does anyone know if you can just give the list a NULL pointer for it's memory and then immediately decref it? free (NULL) should always be safe, I think. (??) > Also, with iterators, specifying dtype will make a huge difference. 
> If an object has __length_hint__ and you specify dtype, then you > can preallocate the array as I suggested above. However, if dtype > is not specified, you still need to build the list completely, > determine what type it is, allocate the array memory and then copy > the values into it. Much less efficient! How accurate is __length_hint__ going to be? It could lead to a fair bit of special case code for growing and shrinking the final array if __length_hint__ turns out to be wrong. Code that python lists already have, moreover. If the list's memory can be stolen safely, how does this strategy sound: - Given a generator, build it up into a list internally, and then steal the list's memory. - If a dtype is provided, wrap the generator with another generator that casts the original generator's output to the correct dtype. Then use the wrapped generator to create a list of the proper dtype, and steal that list's memory. A potential problem with stealing list memory is that it could waste memory if the list has more bytes allocated than it is using (I'm not sure if python lists can get this way, but I presume that they resize themselves only every so often, like C++ or Java vectors, so most of the time they have some allocated but unused bytes). If lists have a squeeze method that's guaranteed not to cause any copies, or if this can be added with judicious use of realloc, then that problem is obviated. robert> > Another note of caution: You are going to have to deal with > iterators of > iterators of iterators of.... I'm not sure if that actually overly > complicates > matters; I haven't looked at PyArray_New for some time. Enjoy! This is a good point. Numpy does fine with nested lists, but what should it do with nested generators? I originally thought that basically 'array(generator)' should make the exact same thing as 'array([f for f in generator])'. However, for nested generators, this would be an object array of generators. I'm not sure which is better -- having more special cases for generators that make generators, or having a simple rubric like above for how generators are treated. Any thoughts? Zach From perry at stsci.edu Wed Apr 5 08:08:19 2006 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 5 08:08:19 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: References: <4432E27E.6030906@ee.byu.edu> <9E52488D-6736-4F10-A045-2D5B39CBD3F5@earthlink.net> Message-ID: Speaking as someone who thinks he knows what kind of effort is involved in creating numpy, I suspect relatively few have any idea of the effort and skill that is required to do what Travis has done. Indeed, I wouldn't be surprised if Travis hadn't fully anticipated at the start what he was getting himself into, and if he hasn't asked himself more than once whether he would do it again had he known [I imagine that many worthy and memorable efforts fall into this category. Much human progress springs out of such initial optimism.] John Hunter is right that Travis's contributions to this and other scipy-related projects amount to years of work. For those that find it objectionable that Travis is trying to get some partial compensation for this work, consider whether there was any one at all in the Python community willing to do this as well as he as for free, or even for what he will actually recover from the book. I doubt it very much. Fortunately, I think the number of people that object to Travis charging for the book is small. Unfortunately, their impact can be disproportionately large. 
I hope Travis can effectively ignore them. Perry From lennart.ohlsson at cs.lth.se Wed Apr 5 08:12:20 2006 From: lennart.ohlsson at cs.lth.se (Lennart Ohlsson) Date: Wed Apr 5 08:12:20 2006 Subject: [Numpy-discussion] Re: Newbie indexing question and print order Message-ID: <008201c658c3$30d06ab0$2f32eb82@cs060109> Hi, Although I mainly use it for 2D takes, here is an nd-version of such a function: def vtake(a, indices): """Corresponding to take in numpy but with vector valued indices""" indexrank = indices.shape[-1] flattedindex = 0 for i in range(indexrank): flattedindex = flattedindex*a.shape[i] + indices[...,i] flattedshape = (-1,) + a.shape[indexrank:] return a.reshape(flattedshape).take(flattedindex) - Lennart On 4/5/06, Pau Gargallo wrote: hi, i had the same problem and i defined a function with a similar syntax to interp2 which i call take2 to solve it: from numpy import * def take2( a, x,y ): return take( ravel(a), x*a.shape[1] + y ) a = array( [[ 0.15, 0.75, 0.2 ], [ 0.82, 0.5, 0.77], [ 0.21, 0.91, 0.59]] ) xy = array([ [[1, 1], [1, 1], [2, 1]], [[2, 2], [0, 0], [1, 0]], [[1, 1], [0, 0], [2, 1]]] ) print take2( a, xy[...,0], xy[...,1] ) i hope this helps you. pau On 4/5/06, amcmorl wrote: > Hi all, > > I'm having a bit of trouble getting my head around numpy's indexing > capabilities. A quick summary of the problem is that I want to > lookup/index in nD from a second array of rank n+1, such that the last > (or first, I guess) dimension contains the lookup co-ordinates for the > value to extract from the first array. Here's a 2D (3,3) example: > > In [12]:print ar > [[ 0.15 0.75 0.2 ] > [ 0.82 0.5 0.77] > [ 0.21 0.91 0.59]] > > In [24]:print inds > [[[1 1] > [1 1] > [2 1]] > > [[2 2] > [0 0] > [1 0]] > > [[1 1] > [0 0] > [2 1]]] > > then somehow return the array (barring me making any row/column errors): > In [26]: c = ar.somefancyindexingroutinehere(inds) > > In [26]:print c > [[ 0.5 0.5 0.91] > [ 0.59 0.15 0.82] > [ 0.5 0.15 0.91]] > > i.e. c[x,y] = a[ inds[x,y,0], inds[x,y,1] ] > > Any suggestions? It looks like it should be relatively simple using > 'put' or 'take' or 'fetch' or 'sit' or something like that, but I'm not > getting it. > > While I'm here, can someone help me understand the rationale behind > 'print' printing row, column (i.e. a[0,1] = 0.75 in the above example) > rather than x, y (= column, row; in which case 0.75 would be in the first > column and second row), which seems to me to be more intuitive. > > I'm really enjoying getting into numpy - I can see it'll be > simpler/faster coding than my previous environments, despite me not > knowing my way at the moment, and that python has better opportunities > for extensibility. So, many thanks for your great work. > -- > Angus McMorland > email a.mcmorland at auckland.ac.nz > mobile +64-21-155-4906 > > PhD Student, Neurophysiology / Multiphoton & Confocal Imaging > Physiology, University of Auckland > phone +64-9-3737-599 x89707 > > Armourer, Auckland University Fencing > Secretary, Fencing North Inc. > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory!
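(For what it's worth, vtake as defined above does return exactly the array Angus asked for. A usage sketch with the 3x3 example from the thread, assuming take() with an array of indices returns an array shaped like those indices:

    from numpy import array
    # vtake as in Lennart's message above

    a = array([[ 0.15, 0.75, 0.2 ],
               [ 0.82, 0.5,  0.77],
               [ 0.21, 0.91, 0.59]])
    inds = array([[[1, 1], [1, 1], [2, 1]],
                  [[2, 2], [0, 0], [1, 0]],
                  [[1, 1], [0, 0], [2, 1]]])

    c = vtake(a, inds)
    # c[x,y] == a[inds[x,y,0], inds[x,y,1]]:
    # [[ 0.5   0.5   0.91]
    #  [ 0.59  0.15  0.82]
    #  [ 0.5   0.15  0.91]]

The values match the ones Angus listed.)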
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From a.h.jaffe at gmail.com Wed Apr 5 08:18:03 2006 From: a.h.jaffe at gmail.com (Andrew Jaffe) Date: Wed Apr 5 08:18:03 2006 Subject: [Numpy-discussion] weird interaction: pickle, numpy, matplotlib.hist Message-ID: <4433DF85.7030109@gmail.com> Hi All, I've encountered a strange problem: I've been running some python code on both a linux box and OS X, both with python 2.4.1 and the latest numpy and matplotlib from svn. I have found that when I transfer pickled numpy arrays from one machine to the other (in either direction), the resulting data *looks* all right (i.e., it is a numpy array of the correct type with the correct values at the correct indices), but it seems to produce the wrong result in (at least) one circumstance: matplotlib.hist() gives the completely wrong picture (and set of bins). This can be ameliorated by running the array through arr=numpy.asarray(arr, dtype=numpy.float64) but this seems like a complete kludge (and is only needed when you do the transfer between machines). I've attached a minimal code that exhibits the problem: try test_pickle_hist.test(write=True) on one machine, transfer the output file to another machine, and run test_pickle_hist.test(write=False) on another, and you should see a very strange result (and it should be fixed if you set asarray=True). Any ideas? Andrew -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test_pickle_hist.py URL: From ryanlists at gmail.com Wed Apr 5 08:23:06 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Wed Apr 5 08:23:06 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: References: <4432E27E.6030906@ee.byu.edu> Message-ID: I just realized that my "Amen" to all of this went only to Alan Isaac. I don't "reply-to-all" by default. In response to Perry's comment: "I hope Travis can effectively ignore them." I think a spam filter with "wolf" and "sheep" might be a good start, but it could accidentally delete some interesting "poetry" . Ryan On 4/4/06, Ryan Krauss wrote: > Let me add my thanks and also say that as a grad student who plans to > buy your book once I graduate, NumPy's use is not inhibited by Travis > charging for the documentation. > > Thanks! > > Ryan Krauss > > On 4/4/06, Alan G Isaac wrote: > > On Tue, 04 Apr 2006, Travis Oliphant apparently wrote: > > > I'm not going to dislike or have any kind of ill feelings > > > with anyone who decides to spend their time on > > > "documentation." In fact, I'll appreciate it just like > > > everyone else. > > > > Of course you were extremely clear about this from the > > beginning. Thank you for numpy!!! > > Alan Isaac (grateful user of numpy) > > PS Your book is *very* helpful. > > > > > > > > > > > > ------------------------------------------------------- > > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > > that extends applications into web and mobile media. Attend the live webcast > > and join the prime developer group breaking into this new coding territory! 
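(Andrew's symptom above is diagnosed further down the thread as a byte-order mismatch: a pickled array keeps the dtype of the machine that wrote it. A sketch of a one-shot normalization after unpickling -- the dtype.byteorder and dtype.newbyteorder calls here are an assumption about the dtype interface, not code taken from the thread:

    import numpy

    def to_native(arr):
        # '=' means native byte order; '|' means byte order does not apply
        if arr.dtype.byteorder not in ('=', '|'):
            # byteswap the data and relabel the dtype in one step
            return arr.astype(arr.dtype.newbyteorder('='))
        return arr

This converts once at load time instead of kludging with asarray at every use.)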
> > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > From zpincus at stanford.edu Wed Apr 5 08:32:02 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Wed Apr 5 08:32:02 2006 Subject: [Numpy-discussion] array constructor from generators? In-Reply-To: <44331200.2020604@cox.net> References: <44331200.2020604@cox.net> Message-ID: <884F03C6-599C-426A-A0A0-97009B63EACB@stanford.edu> [sorry if this comes through twice -- seems to have not sent the first time] Hi folks, tim> > I brought this up last week and Travis was OK with it. I have it on > my todo list, but if you are in a hurry you're welcome to do it > instead. Sorry if that was on the list and I missed it! Hate to be adding more noise than signal. At any rate, I'm not in a hurry, but I'd be happy to help where I can. (Though for the next week or so I think I'm swamped...) tim> > If you do look at it, consider looking into the '__length_hint__ > parameter that's slated to go into Python 2.5. When this is > present, it's potentially a big win, since you can preallocate the > array and fill it directly from the iterator. Without this, you > probably can't do much better than just building a list from the > array. What would work well would be to build a list, then steal > its memory. I'm not sure if that's feasible without leaking a > reference to the list though. Can you steal its memory and then give it some dummy memory that it can free without problems, so that the list can be deallocated without trouble? Does anyone know if you can just give the list a NULL pointer for it's memory and then immediately decref it? free (NULL) should always be safe, I think. (??) > Also, with iterators, specifying dtype will make a huge difference. > If an object has __length_hint__ and you specify dtype, then you > can preallocate the array as I suggested above. However, if dtype > is not specified, you still need to build the list completely, > determine what type it is, allocate the array memory and then copy > the values into it. Much less efficient! How accurate is __length_hint__ going to be? It could lead to a fair bit of special case code for growing and shrinking the final array if __length_hint__ turns out to be wrong. Code that python lists already have, moreover. If the list's memory can be stolen safely, how does this strategy sound: - Given a generator, build it up into a list internally, and then steal the list's memory. - If a dtype is provided, wrap the generator with another generator that casts the original generator's output to the correct dtype. Then use the wrapped generator to create a list of the proper dtype, and steal that list's memory. A potential problem with stealing list memory is that it could waste memory if the list has more bytes allocated than it is using (I'm not sure if python lists can get this way, but I presume that they resize themselves only every so often, like C++ or Java vectors, so most of the time they have some allocated but unused bytes). If lists have a squeeze method that's guaranteed not to cause any copies, or if this can be added with judicious use of realloc, then that problem is obviated. robert> > Another note of caution: You are going to have to deal with > iterators of > iterators of iterators of.... 
I'm not sure if that actually overly > complicates > matters; I haven't looked at PyArray_New for some time. Enjoy! This is a good point. Numpy does fine with nested lists, but what should it do with nested generators? I originally thought that basically 'array(generator)' should make the exact same thing as 'array([f for f in generator])'. However, for nested generators, this would be an object array of generators. I'm not sure which is better -- having more special cases for generators that make generators, or having a simple rubric like above for how generators are treated. Any thoughts? Zach From robert.kern at gmail.com Wed Apr 5 08:36:03 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed Apr 5 08:36:03 2006 Subject: [Numpy-discussion] Re: A random.normal function with stdev as array In-Reply-To: <4433C6D6.5080800@obs.univ-lyon1.fr> References: <4433C6D6.5080800@obs.univ-lyon1.fr> Message-ID: Eric Emsellem wrote: > Hi, > > I am trying to optimize a code where I derive random numbers many times > and having an array of values for the stdev parameter. > > I wish to have an efficient way of doing something like: > ################## > stdev = array([1.1,1.2,1.0,2.2]) > result = numpy.zeros(stdev.shape, Float) > for i in range(len(stdev)) : > result[i] = numpy.random.normal(0, stdev[i]) > ################## You can use the fact that the standard deviation of a normal distribution is a scale parameter. You can get random normal deviates of varying standard deviation by multiplying a standard normal deviate by the desired standard deviation (how's that for confusing terminology, eh?). result = numpy.random.standard_normal(stdev.shape) * stdev > In my case, stdev can in fact be an array of a few millions floats... > so I really need to optimize things. > > Any hint on how to code this efficiently ? > > And in general, where could I find tips for optimizing a code where I > unfortunately have too many loops such as "for i in range(Nbody) : " > with Nbody being > 10^6 ? Tim Hochberg recently made this list: """ 0. Think about your algorithm. 1. Vectorize your inner loop. 2. Eliminate temporaries 3. Ask for help 4. Recode in C. 5. Accept that your code will never be fast. Step zero should probably be repeated after every other step ;) """ That's probably the best general advice. To get better advice, we would need to know the specifics of the problem. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From a.h.jaffe at gmail.com Wed Apr 5 08:48:27 2006 From: a.h.jaffe at gmail.com (Andrew Jaffe) Date: Wed Apr 5 08:48:27 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist [sort() method problem?] Message-ID: OK, I think I've managed to track the problem down a bit further: the sort() method is failing for arrays pickled on another machine! That is, it's definitely not sorting the array, but changing to a very strange order (neither the way it started nor sorted). Again, the array seems to otherwise behave fine (indeed, it even satisfies all(a==a1) for a pair that behave differently in this circumstance). Hmmm... A On 4/5/06, Andrew Jaffe wrote: > > Hi All, > > I've encountered a strange problem: I've been running some python code > on both a linux box and OS X, both with python 2.4.1 and the latest > numpy and matplotlib from svn. 
> > I have found that when I transfer pickled numpy arrays from one machine > to the other (in either direction), the resulting data *looks* all right > (i.e., it is a numpy array of the correct type with the correct values > at the correct indices), but it seems to produce the wrong result in (at > least) one circumstance: matplotlib.hist() gives the completely wrong > picture (and set of bins). > > This can be ameliorated by running the array through > arr=numpy.asarray(arr, dtype=numpy.float64) > but this seems like a complete kludge (and is only needed when you do > the transfer between machines). > > I've attached a minimal code that exhibits the problem: try > test_pickle_hist.test(write=True) > on one machine, transfer the output file to another machine, and run > test_pickle_hist.test(write=False) > on another, and you should see a very strange result (and it should be > fixed if you set asarray=True). > > Any ideas? > > Andrew > > > import cPickle > import numpy > import pylab > > def test(write=True,asarray=False): > > a = numpy.linspace(-3,3,num=100) > > if write: > f1 = file("a.cpkl", 'w') > cPickle.dump(a, f1) > f1.close() > > f1 = open("a.cpkl", 'r') > a1 = cPickle.load(f1) > f1.close() > > pylab.subplot(1,2,1) > h = pylab.hist(a) > > if asarray: > a1 = numpy.asarray(a1, dtype=numpy.float64) > > pylab.subplot(1,2,2) > h1 = pylab.hist(a1) > > return a, a1 > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From byrnes at bu.edu Wed Apr 5 08:58:21 2006 From: byrnes at bu.edu (John Byrnes) Date: Wed Apr 5 08:58:21 2006 Subject: [Numpy-discussion] A random.normal function with stdev as array In-Reply-To: <4433C6D6.5080800@obs.univ-lyon1.fr> References: <4433C6D6.5080800@obs.univ-lyon1.fr> Message-ID: <20060405155736.GA9364@localhost.localdomain> Hi Eric, In the past , I've done things like ###### normdist = lambda x: numpy.random.normal(0,x) vecnormal = numpy.vectorize(normdist) stdev = numpy.array([1.1,1.2,1.0,2.2]) result = vecnormal(stdev) ###### This works fine for up to 10k elements for stdev for some reason. Any larger then that and i get a Bus error on my PPC mac and a segfault on my x86 linux box. I'm running numpy 0.9.7.2325 on both machines. Perhaps for larger inputs, you could break up your loop into smaller vectorized chunks. Regards, John On Wed, Apr 05, 2006 at 03:32:06PM +0200, Eric Emsellem wrote: > Hi, > > I am trying to optimize a code where I derive random numbers many times > and having an array of values for the stdev parameter. > > I wish to have an efficient way of doing something like: > ################## > stdev = array([1.1,1.2,1.0,2.2]) > result = numpy.zeros(stdev.shape, Float) > for i in range(len(stdev)) : > result[i] = numpy.random.normal(0, stdev[i]) > ################## > > In my case, stdev can in fact be an array of a few millions floats... > so I really need to optimize things. > > Any hint on how to code this efficiently ? > > And in general, where could I find tips for optimizing a code where I > unfortunately have too many loops such as "for i in range(Nbody) : " > with Nbody being > 10^6 ? > > thanks! > Eric > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory! 
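(The two working answers in this thread can be put side by side. A rough sketch -- the size stays below the 10k ceiling John reports for the vectorize approach, and the uniform spread of widths is made up purely for illustration:

    import numpy

    stdev = numpy.random.uniform(0.5, 2.5, 10000)

    # John's approach: vectorize() re-enters Python once per element
    vecnormal = numpy.vectorize(lambda s: numpy.random.normal(0, s))
    slow = vecnormal(stdev)

    # Robert's scale-parameter trick: one C-level pass plus one multiply
    fast = numpy.random.standard_normal(stdev.shape) * stdev

Both produce normal deviates with the requested per-element standard deviations; only the second should be expected to scale to millions of elements.)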
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- If liberty and equality, as is thought by some are chiefly to be found in democracy, they will be best attained when all persons alike share in the government to the utmost. -- Aristotle, Politics From bsouthey at gmail.com Wed Apr 5 09:05:03 2006 From: bsouthey at gmail.com (Bruce Southey) Date: Wed Apr 5 09:05:03 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432E27E.6030906@ee.byu.edu> References: <4432E27E.6030906@ee.byu.edu> Message-ID: Hi, Sorry that you received such an email. It is one thing to disagree with your choice but it is inexcusable to dictate what you should do with your code/documentation (not to mention the language). Unfortunately, this appears to be the result of the typical confusion of what 'free' refers to in open source software. If this person thought that purchasing documentation is bad then I wonder what they think of the PyMOL project: "If you use PyMOL at work, then you are asked and expected to sponsor the project by purchasing a PyMOL Subscription" (http://www.pymol.org/funding.html)! Really the 'book' issue is more an excuse than a real reason for people not to use numpy. Personally I really think that you should get the 1.0 release out that probably would change some minds. Based on the list postings, the stability of numpy already exceeds a typical 1.0 release level. Regards Bruce From schofield at ftw.at Wed Apr 5 09:10:05 2006 From: schofield at ftw.at (Ed Schofield) Date: Wed Apr 5 09:10:05 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432E27E.6030906@ee.byu.edu> References: <4432E27E.6030906@ee.byu.edu> Message-ID: <4433EC3C.9050706@ftw.at> I'd also like to express my gratitude, Travis, for all the time and energy you've donated to both NumPy and SciPy. I also fully support your decision to charge for your book. Perhaps your correspondent expects your book to be free because it's online. Perhaps some re-branding -- from "fee-based documentation" to "book" or "handbook for users and developers" -- would help to avoid evoking such unfair responses? Incidentally, you mention on on the site that you'll print and bind hard-copy version once your sales reach 200 copies. I think this would help to encourage libraries and conservative institutions to purchase copies. Are your sales still under this level?! I'm now going to order a copy for my institution -- and a hard copy when it's available :) -- Ed From robert.kern at gmail.com Wed Apr 5 09:11:01 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed Apr 5 09:11:01 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist In-Reply-To: <4433DF85.7030109@gmail.com> References: <4433DF85.7030109@gmail.com> Message-ID: Andrew Jaffe wrote: > Hi All, > > I've encountered a strange problem: I've been running some python code > on both a linux box and OS X, both with python 2.4.1 and the latest > numpy and matplotlib from svn. 
> > I have found that when I transfer pickled numpy arrays from one machine > to the other (in either direction), the resulting data *looks* all right > (i.e., it is a numpy array of the correct type with the correct values > at the correct indices), but it seems to produce the wrong result in (at > least) one circumstance: matplotlib.hist() gives the completely wrong > picture (and set of bins). > > This can be ameliorated by running the array through > arr=numpy.asarray(arr, dtype=numpy.float64) > but this seems like a complete kludge (and is only needed when you do > the transfer between machines). You have a byteorder issue. Your Linux box, which I presume has an Intel or AMD CPU, is little-endian, whereas your OS X box, which I presume has a PPC CPU, is big-endian. numpy arrays can store their data in either endianness on either kind of platform; their dtype objects tell you which byteorder they are using. In the dtype specifications below, '>' means big-endian (I am using a PPC PowerBook), and '<' means little-endian. In [31]: a = linspace(0, 10, 11) In [32]: a Out[32]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.]) In [33]: a.dtype Out[33]: dtype('>f8') In [34]: b = a.newbyteorder() In [35]: b Out[35]: array([ 0.00000000e+000, 3.03865194e-319, 3.16202013e-322, 1.04346664e-320, 2.05531309e-320, 2.56123631e-320, 3.06715953e-320, 3.57308275e-320, 4.07900597e-320, 4.33196758e-320, 4.58492919e-320]) In [36]: b.dtype Out[36]: dtype('<f8') -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Chris.Barker at noaa.gov Wed Apr 5 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed Apr 5 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu> References: <4432E27E.6030906@ee.byu.edu> <4432E973.8070601@noaa.gov> <4432F4DD.6060000@cox.net> <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu> Message-ID: <4433F1F6.4010603@noaa.gov> Zachary Pincus wrote: > from Numeric (who was used to the large, free manual) Which brings up a question: Is the source to the old Numeric manual available? It would be nice to "port" it to SciPy. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From bsouthey at gmail.com Wed Apr 5 09:46:03 2006 From: bsouthey at gmail.com (Bruce Southey) Date: Wed Apr 5 09:46:03 2006 Subject: [Numpy-discussion] A random.normal function with stdev as array In-Reply-To: <4433C6D6.5080800@obs.univ-lyon1.fr> References: <4433C6D6.5080800@obs.univ-lyon1.fr> Message-ID: Hi, Can you provide more details on what you are doing, especially how you are using this? The one item that is not directly part of Tim's list is that sometimes you need to reorder your loops (perhaps this is part of "Think about your algorithm"?). Loop swapping is very common to improve performance. However, it usually requires a very clear head or someone else to do it. Also, you might need to break loops into pieces where you repeat the same tasks and computations over and over. The other aspect is to do some algebra on the calculations as the stdev is essentially a constant so depending on how you use it you can factor it out further. Again it all depends on what you are actually doing with these numbers. From a different view, you need to be very careful with your (pseudo)random number generator with that many samples. These have a tendency to repeat so your random number stream is no longer random. See the Wikipedia entry: http://en.wikipedia.org/wiki/Pseudorandom_number_generator If I recall correctly, the Python random number generator is a Mersenne twister but ranlib is not and so prone to the mentioned problems.
I do not know if SciPy adds any other generators. Finally I would also cheat by reducing the stdev values because in many cases you will not see a real difference between a normal with mean zero and variance 1.0 and a normal with mean zero and variance 1.1 (especially if you are doing more than comparing distributions so there are more sources of 'error') unless you have a really large number of samples. Regards Bruce On 4/5/06, Eric Emsellem wrote: > Hi, > > I am trying to optimize a code where I derive random numbers many times > and having an array of values for the stdev parameter. > > I wish to have an efficient way of doing something like: > ################## > stdev = array([1.1,1.2,1.0,2.2]) > result = numpy.zeros(stdev.shape, Float) > for i in range(len(stdev)) : > result[i] = numpy.random.normal(0, stdev[i]) > ################## > > In my case, stdev can in fact be an array of a few millions floats... > so I really need to optimize things. > > Any hint on how to code this efficiently ? > > And in general, where could I find tips for optimizing a code where I > unfortunately have too many loops such as "for i in range(Nbody) : " > with Nbody being > 10^6 ? > > thanks! > Eric > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From tim.hochberg at cox.net Wed Apr 5 09:58:08 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 5 09:58:08 2006 Subject: [Numpy-discussion] A random.normal function with stdev as array] Message-ID: <4433F71B.5080201@cox.net> Eric Emsellem wrote: > Hi, > this is illuminating in fact. These are things I would not have > thought about. > > I am trying at the moment to understand why two versions of my program > have a difference of about 10% (here it is 2sec for 1000 points, so > you can imagine for 10^6...) although the code is nearly the same. > > I have loops such as: > > #################### > bigarray = array of Nbig points > for i in range(N) : > bigarray = bigarray + calculation > #################### If you tell us more about calculation, we could probably help more. This sounds like you want to vectorize the inner loop, but you may in fact have already done that. There's nothing wrong with looping in python as long as you amortize the loop overhead over a large number of operations. Thus, the advice to vectorize your *inner* loop, not vectorize all loops. Attempting the latter can lead to impenetrable code, usually doesn't help significantly and sometimes slows things down as you overflow the cache with big matrices. > > I thought to do it by: > #################### > bigarray = numpy.sum(array([calculation for i in range(N)])) > #################### > not sure this is good... I suspect not, but timeit is your friend.... > > And you are basically saying that > > bigarray = bigarray + calculation > > is better than > > bigarray += calculation > > or is it strictly equivalent? (in terms of CPU...) Actually the reverse. "bigarray += calculation" should be better in terms of both speed and memory usage.
In this case it's also clearer, so it's an improvement all around. They both do the same number of adds, but the first allocates more memory and pushes more data back and forth between main memory and the cache. The point I was making about += versus + was that I wouldn't in general recommend: a = some_func() a += something_else over: a = some_func() + something_else because it's less clear. In cases where you really do need the speed, it's fine, but most of the time that's not the case. In your case, the speedup is fairly minor, I believe because random.normal is fairly expensive. If you instead compare these two ways of computing a cube, you'll see a much larger difference (37%). >>> setup = "import numpy; stddev=numpy.arange(1e6,dtype=float)%3" >>> timeit.Timer('stddev * stddev * stddev', setup).timeit(20) 1.206557537340359 >>> timeit.Timer('result = stddev*stddev; result *= stddev', setup).timeit(20) 0.88055493086403658 However, if you work with smaller matrices, the effect almost disappears (5%): >>> setup = "import numpy; stddev=numpy.arange(1e4,dtype=float)%3" >>> timeit.Timer('result = stddev*stddev; result *= stddev', setup).timeit(2000) 0.10166515576702295 >>> timeit.Timer('stddev * stddev * stddev', setup).timeit(2000) 0.10613667379493563 I believe that's because the speedup is nearly all due to reducing the amount of data you move around. In the second case everything fits in the cache, so this effect is minor. In the first you are pushing data back and forth to main memory so it's fairly large. On my machine these sorts of effects kick in somewhere between 10,000 and 100,000 elements. > > thanks for the help, and sorry for the dumb questions Not a problem. These are all legitimate questions that you can't really be expected to know without a fair amount of experience with numpy or its predecessors. It would be cool if someone added a page to the wiki on the topic so we could start collecting and organizing this information. For all I know there's one already there though -- I should probably check. -tim > > Eric > > Tim Hochberg wrote: > >> Eric Emsellem wrote: >> >>> >>>> >>>> >>>> Since stdev essentially scales the output of random, wouldn't the >>>> following be equivalent to the above? >>>> >>>> result = numpy.random.normal(0, 1, stddev.shape) >>>> result *= stdev >>>> >>> yes indeed, this is a good option where in fact I could do >>> >>> result = stddev * numpy.random.normal(0, 1, stddev.shape) >>> >>> in one line. >>> thanks for the tip >> >> >> Indeed you can. However, keep in mind that the one line version is >> equivalent to: >> >> temp = numpy.random.normal(0, 1, stddev.shape) >> result = stddev * temp >> >> That is, it creates an extra temporary variable only to throw it >> away. The two line version I posted above avoids that temporary and >> thus should be both faster and less memory hungry. It's always good >> to check these things however: >> >> >>> setup = "import numpy; stddev=numpy.arange(1e6,dtype=float)%3" >> >>> timeit.Timer('stddev * numpy.random.normal(0, 1, stddev.shape)', >> setup).timeit(20) >> 3.4527201082819232 >> >>> timeit.Timer('result = numpy.random.normal(0, 1, stddev.shape); >> result*=stddev', setup).timeit(20) >> 3.1093337281693607 >> >> So, yes, the two line method is marginally faster (about 10%). Most >> of the time you shouldn't care about this: the one line version is >> clearer and most of the code you write isn't a bottleneck. Starting >> out writing this as the two line version is premature optimization.
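(Eric's accumulation loop can be timed the same way, in the "timeit is your friend" spirit -- sizes and repeat counts here are arbitrary:

    import timeit

    setup = "import numpy; big = numpy.zeros(1000000); calc = numpy.ones(1000000)"

    # allocates a fresh temporary array on every pass
    print timeit.Timer("big = big + calc", setup).timeit(100)
    # in-place add: reuses big's memory, no temporary
    print timeit.Timer("big += calc", setup).timeit(100)

As with the cube example, the gap should be largest once the arrays outgrow the cache.)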
I >> used it here since the question was about optimization . >> >> I see Robert Kern just posted my list. If you want to put this in >> terms of that list, then: >> >> 0. Think about your algorithm >> => Recognize that stddev is a scale parameter >> 1. Vectorize your inner loop. >> => This is a no brainer after 0 resulting in the one line version >> 2. Eliminate temporaries >> => This results in the two line version. >> ... >> >> Also key here is recognizing when to stop. Steps 0 is always >> appropriate and step 1 is almost always good, resulting in code that >> is both clearer and faster. However, once you get to step 2 and >> beyond you tend to trade speed/memory usage for clarity. Not always: >> sometime *= and friends are clearer, but often, particularly if you >> start resorting to three arg ufuncs. So, my advice is to stop >> optimizing as soon as your code is fast enough. >> >> >>> (of course this is not strictly equivalent depending on the random >>> generator, but that will be fine for my purpose) >> >> >> I'll have to take your word for it -- after the normal distribution >> my knowledge in the area peters out rapidly/ >> >> Regards, >> >> -tim >> >> > From emsellem at obs.univ-lyon1.fr Wed Apr 5 10:06:04 2006 From: emsellem at obs.univ-lyon1.fr (Eric Emsellem) Date: Wed Apr 5 10:06:04 2006 Subject: [Numpy-discussion] A random.normal function with stdev as array In-Reply-To: References: <4433C6D6.5080800@obs.univ-lyon1.fr> Message-ID: <4433F8D1.7090305@obs.univ-lyon1.fr> An HTML attachment was scrubbed... URL: From perry at stsci.edu Wed Apr 5 10:09:01 2006 From: perry at stsci.edu (Perry Greenfield) Date: Wed Apr 5 10:09:01 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4433F1F6.4010603@noaa.gov> References: <4432E27E.6030906@ee.byu.edu> <4432E973.8070601@noaa.gov> <4432F4DD.6060000@cox.net> <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu> <4433F1F6.4010603@noaa.gov> Message-ID: On Apr 5, 2006, at 12:36 PM, Christopher Barker wrote: > Zachary Pincus wrote: >> from Numeric (who was used to the large, free manual) > > Which brings up a question: Is the source to the old Numeric manual > available? it would be nice to "port" it to SciPy. Sort of. The original source was in Framemaker format. It was converted to the Python latex framework in the process of being adopted to numarray. The source for that is available on the numarray repository. If you want the framemaker source, I may be able to dig that up somewhere (or I may have lost track of it :-). Paul Dubois can likely provide it as well; that's who gave me the source. Perry From hetland at tamu.edu Wed Apr 5 10:15:27 2006 From: hetland at tamu.edu (Robert Hetland) Date: Wed Apr 5 10:15:27 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4433F1F6.4010603@noaa.gov> References: <4432E27E.6030906@ee.byu.edu> <4432E973.8070601@noaa.gov> <4432F4DD.6060000@cox.net> <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu> <4433F1F6.4010603@noaa.gov> Message-ID: Let's not forget that this documentation will eventually be free *no matter what* -- after a financial goal is met or after a certain amount of time. This makes it fundamentally different than a published book (and in my opinion, much better). I personally think this is an innovative way to create a free product that everybody wants, but nobody wants to do. 
-Rob ----- Rob Hetland, Assistant Professor Dept of Oceanography, Texas A&M University p: 979-458-0096, f: 979-845-6331 e: hetland at tamu.edu, w: http://pong.tamu.edu From fonnesbeck at gmail.com Wed Apr 5 10:28:10 2006 From: fonnesbeck at gmail.com (Chris Fonnesbeck) Date: Wed Apr 5 10:28:10 2006 Subject: Fwd: [Numpy-discussion] NumPy documentation In-Reply-To: <723eb6930604051026q7dbcaad2w47c059f6c88e8db7@mail.gmail.com> References: <4432E27E.6030906@ee.byu.edu> <723eb6930604051026q7dbcaad2w47c059f6c88e8db7@mail.gmail.com> Message-ID: <723eb6930604051027m5aac408dnbba356ebdcb389ac@mail.gmail.com> On 4/4/06, Travis Oliphant wrote: > > I received a rather hurtful email today that was very discouraging to me > personally. Basically, I was called "lame" and a "wolf" in sheep's > clothing because I'm charging for documentation. There is one in every crowd, it seems. This email, and any others like it, should be utterly ignored, in the hopes that their authors will go elsewhere for scientific computing solutions. If they had spent any time at all on this list, they would have noticed the seemingly boundless attention and support that Travis bestows upon both scipy and its user community. Chris -- Chris Fonnesbeck + Atlanta, GA + http://trichech.us -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Apr 5 10:29:07 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed Apr 5 10:29:07 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4433EC3C.9050706@ftw.at> References: <4432E27E.6030906@ee.byu.edu> <4433EC3C.9050706@ftw.at> Message-ID: Heh, On 4/5/06, Ed Schofield wrote: > Perhaps some re-branding -- from "fee-based documentation" to > "book" or "handbook for users and developers" I think that's a great idea! "Handbook for Users and Developers" sounds much better and doesn't have that nasty "documentation should be free" implication. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Apr 5 11:35:01 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed Apr 5 11:35:01 2006 Subject: [Numpy-discussion] Re: A random.normal function with stdev as array In-Reply-To: <4433F8D1.7090305@obs.univ-lyon1.fr> References: <4433C6D6.5080800@obs.univ-lyon1.fr> <4433F8D1.7090305@obs.univ-lyon1.fr> Message-ID: > Bruce Southey wrote: >>>From a different view, you need to be very careful with your >>(pseudo)random number generator with that many samples. These have a >>tendency to repeat so your random number stream is no longer random. >>See the Wikipedia entry: >>http://en.wikipedia.org/wiki/Pseudorandom_number_generator >> >>If I recall correctly, the Python random number generator is a >>Mersenne twister but ranlib is not and so prone to the mentioned >>problems. I do not know if SciPy adds any other generators. numpy.random uses the Mersenne Twister. RANLIB is dead! Long live MT19937! -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From Chris.Barker at noaa.gov Wed Apr 5 11:59:04 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed Apr 5 11:59:04 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: References: <4432E27E.6030906@ee.byu.edu> <4432E973.8070601@noaa.gov> <4432F4DD.6060000@cox.net> <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu> <4433F1F6.4010603@noaa.gov> Message-ID: <44341348.3050505@noaa.gov> Perry Greenfield wrote: > Sort of. The original source was in Framemaker format. It was converted > to the Python latex framework in the process of being adopted to > numarray. The source for that is available on the numarray repository. > If you want the framemaker source, I may be able to dig that up > somewhere (or I may have lost track of it :-). Paul Dubois can likely > provide it as well; that's who gave me the source. Thanks. That's good news. Now, when I'm done with everything else I want to work on..... LaTeX is a better option for me anyway. In fact, it's a better option for anyone that doesn't already use FrameMaker, as you can at least edit some of the text without knowing or using LaTeX at all. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Wed Apr 5 12:07:10 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed Apr 5 12:07:10 2006 Subject: [Numpy-discussion] array constructor from generators? In-Reply-To: References: Message-ID: <44341538.4040907@noaa.gov> Zachary Pincus wrote: > I often construct arrays from list comprehensions on generators, > numpy.array([map(float, line.split()) for line in file]) I know there are other uses, and this was just an example, but you can now do: numpy.fromfile(file, dtype=numpy.Float, sep="\t") Which is much faster and cleaner, if you ask me. Thanks for adding this, Travis! Tim Hochberg wrote: > Without this, you probably can't do much > better than just building a list from the array. What would work well > would be to build a list, then steal its memory. Perhaps another option is to borrow the machinery from fromfile (see above), that builds an array without knowing how big it is when it starts. I haven't looked at the code, but I know that Travis got at least the idea, if not the method, from my FileScanner module I wrote a while back, and that dynamically allocated the memory it needed as it grew. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From tim.hochberg at cox.net Wed Apr 5 12:16:11 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 5 12:16:11 2006 Subject: [Numpy-discussion] array constructor from generators? In-Reply-To: <884F03C6-599C-426A-A0A0-97009B63EACB@stanford.edu> References: <44331200.2020604@cox.net> <884F03C6-599C-426A-A0A0-97009B63EACB@stanford.edu> Message-ID: <4434175D.10103@cox.net> Zachary Pincus wrote: > [sorry if this comes through twice -- seems to have not sent the > first time] I've only seen it once so far, but my numpy mail seems to be coming through all out of order right now. > Hi folks, > > tim> > >> I brought this up last week and Travis was OK with it. I have it on >> my todo list, but if you are in a hurry you're welcome to do it >> instead. > > > Sorry if that was on the list and I missed it! 
Hate to be adding more > noise than signal. At any rate, I'm not in a hurry, but I'd be happy > to help where I can. (Though for the next week or so I think I'm > swamped...) There was no real discussion then. I said I thought it was a good idea. Travis said OK. That was about it. > tim> > >> If you do look at it, consider looking into the '__length_hint__' >> parameter that's slated to go into Python 2.5. When this is present, >> it's potentially a big win, since you can preallocate the array and >> fill it directly from the iterator. Without this, you probably can't >> do much better than just building a list from the array. What would >> work well would be to build a list, then steal its memory. I'm not >> sure if that's feasible without leaking a reference to the list though. > > > Can you steal its memory and then give it some dummy memory that it > can free without problems, so that the list can be deallocated > without trouble? Does anyone know if you can just give the list a > NULL pointer for its memory and then immediately decref it? free > (NULL) should always be safe, I think. (??) That might well work, but now I realize that using a list this way probably won't work out well for other reasons. >> Also, with iterators, specifying dtype will make a huge difference. >> If an object has __length_hint__ and you specify dtype, then you can >> preallocate the array as I suggested above. However, if dtype is not >> specified, you still need to build the list completely, determine >> what type it is, allocate the array memory and then copy the values >> into it. Much less efficient! > > > How accurate is __length_hint__ going to be? It could lead to a fair > bit of special case code for growing and shrinking the final array if > __length_hint__ turns out to be wrong. see below. > Code that python lists already have, moreover. If we don't know dtype up front, lists are great. All the code is there and we need to look at all of the elements before we know what the elements are anyway. However, if you do know what the dtype is, the situation is different. Since these are generators, the object they create may only last until the next next() call if we don't hold onto it. That means that for a matrix of size N, generating the whole list is going to require N*(sizeof(long) + sizeof(pyobjType) + sizeof(dtype)), versus just N*sizeof(dtype) if we're careful. I'm not sure what all of those various sizes are, but I'm going to guess that we'd be at least doubling our memory. All is not lost however. When we know the dtype, we should just use a *python* array to hold the data. It works just like a list, but on packed data. > > If the list's memory can be stolen safely, how does this strategy sound: Let me break this into two cases: 1. We don't know the dtype. > - Given a generator, build it up into a list internally +1 > , and then steal the list's memory. -0.5 I'm not sure this buys us as much as I thought initially. The list memory is PyObject*, so this would only work on dtypes no larger than the size of a pointer, usually that means no larger than a long. So, basically this would work on most of the integer types, but not the floating point types. And, it adds extra complexity to support two different cases. I'd be inclined to start with just copying the objects out of the list. If someone feels like it later, they can come back and try to optimize the case of integers to steal the list's memory.
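To make the two paths concrete, here is a rough Python-level sketch (the helper name is made up for illustration; a real implementation would live in C and steal the python array's buffer instead of copying through a string):

import array
import numpy

def array_from_iter_sketch(iterable, typecode=None):
    if typecode is None:
        # unknown dtype: build a list and let the existing
        # array-from-sequence machinery figure out the type
        return numpy.array(list(iterable))
    # known dtype: accumulate in a python array, which grows like a
    # list but stores packed scalars rather than PyObject* pointers
    buf = array.array(typecode)
    for item in iterable:
        buf.append(item)
    return numpy.fromstring(buf.tostring(), dtype=typecode)

# e.g. array_from_iter_sketch((x*x for x in xrange(10)), 'd')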
Keep in mind that once we have a list, we can simply pass it to the machinery that already exists for creating arrays from lists, making our lives much easier. > - If a dtype is provided, wrap the generator with another generator > that casts the original generator's output to the correct dtype. Then > use the wrapped generator to create a list of the proper dtype, and > steal that list's memory. -1. This wastes a lot of space and sort of defeats the purpose of the whole exercise in my mind. 2. Dtype is known. The case where dtype is provided is more complicated, but this is the case we really want to support well. Actually though, I think we can simplify it by judicious punting. Case 2a. Array is not 1-dimensional. Punt and fall back on the general code above. We can determine this simply by testing the first element. If it's not int/float/complex/whatever-other-scalar-values-we-have, fall back to case 1. Case 2b: length_hint is not given. In this case, we build up the array in a python array, steal the data, possibly realloc and we're done. Case 2c: length_hint is given. Same as above, but preallocate the appropriate amount of memory, growing if length_hint lies. > > A potential problem with stealing list memory is that it could waste > memory if the list has more bytes allocated than it is using (I'm not > sure if python lists can get this way, but I presume that they resize > themselves only every so often, like C++ or Java vectors, so most of > the time they have some allocated but unused bytes). If lists have a > squeeze method that's guaranteed not to cause any copies, or if this > can be added with judicious use of realloc, then that problem is > obviated. I imagine once you steal the memory, realloc would be the thing to try. However, I don't think it's worth stealing the memory from lists. I do think it's worth stealing the memory from python arrays however, and I'm sure that the same issue exists there. We'll have to look at how the deallocation for an array works. It probably uses Py_XDecref, in which case we can just replace the memory with NULL and we'll be fine. OK, just had a look at the code for the python array object (Modules/arraymodule.c). Looks like it'll be a piece of cake. We can allocate it to the exact size we want if we have length_hint, otherwise resize only overallocates by 6%. That's not enough to worry about reallocing. Stealing the data looks like it shouldn't be a problem either, just NULL ob_item as you suggested. Regards, -tim > > robert> > >> Another note of caution: You are going to have to deal with > >> iterators of > >> iterators of iterators of.... I'm not sure if that actually overly > >> complicates > >> matters; I haven't looked at PyArray_New for some time. Enjoy! > > > This is a good point. Numpy does fine with nested lists, but what > should it do with nested generators? I originally thought that > basically 'array(generator)' should make the exact same thing as > 'array([f for f in generator])'. However, for nested generators, this > would be an object array of generators. > > I'm not sure which is better -- having more special cases for > generators that make generators, or having a simple rubric like above > for how generators are treated. > > Any thoughts? > > Zach
From aisaac at american.edu Wed Apr 5 14:01:01 2006 From: aisaac at american.edu (Alan G Isaac) Date: Wed Apr 5 14:01:01 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4433EC3C.9050706@ftw.at> References: <4432E27E.6030906@ee.byu.edu><4433EC3C.9050706@ftw.at> Message-ID: On Wed, 05 Apr 2006, Ed Schofield apparently wrote: > you mention on the site that you'll print and bind > a hard-copy version once your sales reach 200 copies. > I think this would help to encourage libraries and > conservative institutions to purchase copies. Unfortunately, my library falls in this category. They were uncertain how to enforce the copyright with an electronic copy. (They are still thinking about it, last I heard.) Cheers, Alan Isaac From rahul.kanwar at gmail.com Wed Apr 5 16:25:01 2006 From: rahul.kanwar at gmail.com (Rahul Kanwar) Date: Wed Apr 5 16:25:01 2006 Subject: [Numpy-discussion] Numpy on 64 bit Xeon with ifort and mkl Message-ID: <63dec5bf0604051624k70c565baw70347a2fd571c253@mail.gmail.com> Hello, I am trying to compile Numpy on a 64 bit Xeon with ifort and the mkl libraries, running Suse 10.0 linux. I had set the MKLROOT variable to the mkl library root but it couldn't find the 64 bit library. After a little bit of snooping I found the following in numpy/distutils/cpuinfo.py ------------------------------ def _is_XEON(self): return re.match(r'.*?XEON\b', self.info[0]['model name']) is not None _is_Xeon = _is_XEON ------------------------------ I changed XEON to Xeon and it worked and was able to identify the em64t libraries. But it again got stuck with the following message.
I used the following command to build Numpy python setup.py config_fc --fcompiler=intel install ------------------------------ building 'numpy.core._dotblas' extension compiling C sources gcc options: '-pthread -fno-strict-aliasing -DNDEBUG -O2 -fmessage-length=0 -Wall -D_FORTIFY_SOURCE=2 -g -fPIC' compile options: '-Inumpy/core/blasdot -I/opt/intel/mkl/8.0.2/include -Inumpy/core/include -Ibuild/src/numpy/core -Inumpy/core/src -Inumpy/core/include -I/usr/include/python2.4 -c' gcc -pthread -shared build/temp.linux-x86_64-2.4/numpy/core/blasdot/_dotblas.o -L/opt/intel/mkl/8.0.2/lib/em64t -lmkl_em64t -lmkl -lvml -lguide -lpthread -o build/lib.linux-x86_64-2.4/numpy/core/_dotblas.so /usr/lib64/gcc/x86_64-suse-linux/4.0.2/../../../../x86_64-suse-linux/bin/ld: /opt/intel/mkl/8.0.2/lib/em64t/libmkl_em64t.a(def_cgemm_omp.o): relocation R_X86_64_PC32 against `_mkl_blas_def_cgemm_276__par_loop0' can not be used when making a shared object; recompile with -fPIC /usr/lib64/gcc/x86_64-suse-linux/4.0.2/../../../../x86_64-suse-linux/bin/ld: final link failed: Bad value collect2: ld returned 1 exit status /usr/lib64/gcc/x86_64-suse-linux/4.0.2/../../../../x86_64-suse-linux/bin/ld: /opt/intel/mkl/8.0.2/lib/em64t/libmkl_em64t.a(def_cgemm_omp.o): relocation R_X86_64_PC32 against `_mkl_blas_def_cgemm_276__par_loop0' can not be used when making a shared object; recompile with -fPIC /usr/lib64/gcc/x86_64-suse-linux/4.0.2/../../../../x86_64-suse-linux/bin/ld: final link failed: Bad value collect2: ld returned 1 exit status error: Command "gcc -pthread -shared build/temp.linux-x86_64-2.4/numpy/core/blasdot/_dotblas.o -L/opt/intel/mkl/8.0.2/lib/em64t -lmkl_em64t -lmkl -lvml -lguide -lpthread -o build/lib.linux-x86_64-2.4/numpy/core/_dotblas.so" failed with exit status 1 ---------------------------------------------- I successfully compiled it without the -lmkl_em64t flag but when I import numpy in Python it gives an error that some symbol is missing. I think that maybe if I use ifort as the linker instead of gcc then things will work out properly, but I couldn't find how to change the linker to ifort. Anyone out there who can help me with this problem? regards, Rahul From robert.kern at gmail.com Wed Apr 5 17:17:04 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed Apr 5 17:17:04 2006 Subject: [Numpy-discussion] Re: Numpy on 64 bit Xeon with ifort and mkl In-Reply-To: <63dec5bf0604051624k70c565baw70347a2fd571c253@mail.gmail.com> References: <63dec5bf0604051624k70c565baw70347a2fd571c253@mail.gmail.com> Message-ID: Rahul Kanwar wrote: > I successfully compiled it without the -lmkl_em64t flag but when I import > numpy in Python it gives an error that some symbol is missing. I think > that maybe if I use ifort as the linker instead of gcc then things > will work out properly, but I couldn't find how to change the linker > to ifort. Anyone out there who can help me with this problem? It's not likely that using ifort to link will help. The problem is this bit: > /opt/intel/mkl/8.0.2/lib/em64t/libmkl_em64t.a(def_cgemm_omp.o): > relocation R_X86_64_PC32 against `_mkl_blas_def_cgemm_276__par_loop0' > can not be used when making a shared object; recompile with -fPIC You are linking against static libraries which were not compiled to be "position independent;" that is, they can't be used in shared libraries, which are what Python extension modules are.
C.f.: http://en.wikipedia.org/wiki/Position_independent_code Look around in /opt/intel/; they've almost certainly provided shared library versions of the MKL that could be used. Google gives me these, for example: http://www.intel.com/support/performancetools/libraries/mkl/linux/sb/cs-017267.htm http://www.intel.com/software/products/mkl/docs/mklgs_lnx.htm#Linking_Your_Application_with_Intel_MKL -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From ryanlists at gmail.com Wed Apr 5 19:50:07 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Wed Apr 5 19:50:07 2006 Subject: [Numpy-discussion] eye(N,dtype='S10') Message-ID: I am trying to create a function that can return a matrix that is either made up of complex numbers or strings depending on the input. I have created a symbolic string class to help me with that and it works well. One clumsy part is that in several cases I want to create an identity matrix and just replace a couple of elements. I currently have to do this in two steps: In [27]: mymat=numpy.eye(4,dtype='f') In [28]: mymat.astype('S10') Out[28]: array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]], dtype=(string,10)) I create a floating point matrix in the string case rather than a complex matrix so I don't have to parse the +0.0j stuff. But what I would really like is to be able to just create either a complex matrix or a string matrix at the beginning. But trying numpy.eye(4,dtype='S10') produces array([[True, False, False, False], [False, True, False, False], [False, False, True, False], [False, False, False, True]], dtype=(string,10)) rather than array([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]], dtype=(string,10)) I need 1's and 0's rather than True and False because when I am done, I put the string representation into an input script to Maxima and Maxima wouldn't handle the True and False values well. Is there a way to directly create an identity string matrix with '1' and '0' instead of True and False? Thanks, Ryan From arnd.baecker at web.de Wed Apr 5 23:51:03 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Wed Apr 5 23:51:03 2006 Subject: [Numpy-discussion] Converting from Numeric (was: Speed up function on cross product of two sets?) In-Reply-To: References: <44315633.4010600@cox.net> Message-ID: Moin Moin, On Wed, 5 Apr 2006, Pearu Peterson wrote: > On Wed, 5 Apr 2006, Arnd Baecker wrote: > > > BTW, it seems that we have no Numeric to numpy transition remarks in > > www.scipy.org. I only found > > http://www.scipy.org/PearuPeterson/NumpyVersusNumeric > > and of course Travis' "Guide to NumPy" contains a detailed list of > > necessary changes in chapter 2.6.1. > > In addition ``site-packages/numpy/lib/convertcode.py`` provides an > > automatic conversion. > > > > Would it be helpful to start a new wiki page "ConvertingFromNumeric" > > (similar to http://www.scipy.org/Converting_from_numarray) > > which aims at summarizing the necessary changes > > or expand Pearu's page (if he agrees) on this? > > It's better to start a new wiki page similar to Converting_from_numarray > (I like the table).
Based on the above links I have set up a first draft at http://www.scipy.org/Converting_from_Numeric It is surely not complete and there are a couple of things which have to be checked for correctness (I tried out some, but not all ...). Also some remarks on using the new features of numpy (e.g., use array indexing instead of take and put...) might be useful. > Btw, I have a few notes about the necessary changes for > Numeric->numpy transition in the following page: > > http://svn.enthought.com/enthought/wiki/NumpyPort#NotesonchangesduetoreplacingNumeric/scipy_basewithnumpy > > Feel free to grab these notes. Great - thanks, I tried to incorporate them as well. Best, Arnd From cimrman3 at ntc.zcu.cz Thu Apr 6 01:48:05 2006 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu Apr 6 01:48:05 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: <4432E27E.6030906@ee.byu.edu> References: <4432E27E.6030906@ee.byu.edu> Message-ID: <4434D58D.2010505@ntc.zcu.cz> Travis Oliphant wrote: > > I received a rather hurtful email today that was very discouraging to me > ... Coming late on line, I can just add my +1 to all the support and appreciation you have received so far! r. From oliphant.travis at ieee.org Thu Apr 6 01:54:01 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 6 01:54:01 2006 Subject: [Numpy-discussion] Speed up function on cross product of two sets? In-Reply-To: References: <44315633.4010600@cox.net> Message-ID: <4434D6DF.2020306@ieee.org> Arnd Baecker wrote: > BTW, it seems that we have no Numeric to numpy transition remarks in > www.scipy.org. I only found > http://www.scipy.org/PearuPeterson/NumpyVersusNumeric > and of course Travis' "Guide to NumPy" contains a detailed list of > necessary changes in chapter 2.6.1. > For clarification: this is in the sample chapter available on-line to all.... > In addition ``site-packages/numpy/lib/convertcode.py`` provides an > automatic conversion. > > Would it be helpful to start a new wiki page "ConvertingFromNumeric" > (similar to http://www.scipy.org/Converting_from_numarray) > which aims at summarizing the necessary changes > or expand Pearu's page (if he agrees) on this? > Absolutely. I did the Numarray page because I'd written a lot on Converting from Numeric (even providing convertcode.py) but very little for numarray --- except the ndimage conversion. So, I started the Numarray page. Sounds like a great idea to have a dual page. -Travis From oliphant.travis at ieee.org Thu Apr 6 02:21:02 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 6 02:21:02 2006 Subject: [Numpy-discussion] array constructor from generators? In-Reply-To: References: <44331200.2020604@cox.net> Message-ID: <4434DD42.8010205@ieee.org> > Can you steal its memory and then give it some dummy memory that it > can free without problems, so that the list can be deallocated without > trouble? Does anyone know if you can just give the list a NULL pointer > for its memory and then immediately decref it? free(NULL) should > always be safe, I think. (??) > I don't think you can steal a list's memory since each list element is actually a pointer to some other Python object. However, a Python array's memory could be stolen (as Tim mentions later). > This is a good point. Numpy does fine with nested lists, but what > should it do with nested generators? I originally thought that > basically 'array(generator)' should make the exact same thing as > 'array([f for f in generator])'.
However, for nested generators, this > would be an object array of generators. > > I'm not sure which is better -- having more special cases for > generators that make generators, or having a simple rubric like above > for how generators are treated. I like the idea that generators of generators act the same as lists of lists (i.e. recursively defined). Basically to implement this, we need to repeat Array_FromSequence, discover_depth, discover_dimensions and discover_itemsize. Or, just maybe we can figure out a way to enhance those functions so that creating an array from generators works the same as creating an array from sequences. Right now, the sequence interface is used. Perhaps we could figure out a way to use a more abstract interface which would include both generators and sequences. If that causes too much alteration then I don't think it's worth it and we just repeat those functions for generators. Now, I think there are two cases here that are being discussed as one 1) Creating arrays from iterators --- array( iter(xrange(10)) ) 2) Creating arrays from generators --- array(x for x in xrange(10)) Both of these cases really ought to be handled and really should be integrated into the Array_FromSequence code. That code is inherited from Numeric and was written before iterators and generators arose on the scene. There ought to be a way to unify all of these notions (Actually if you handle iterators, then sequences will come along for the ride since sequences can behave as iterators). I'd rather see one place in the code that handles these cases. But, working code is usually better than dreamy plans :-) -Travis From oliphant.travis at ieee.org Thu Apr 6 02:38:04 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 6 02:38:04 2006 Subject: [Numpy-discussion] A random.normal function with stdev as array In-Reply-To: <20060405155736.GA9364@localhost.localdomain> References: <4433C6D6.5080800@obs.univ-lyon1.fr> <20060405155736.GA9364@localhost.localdomain> Message-ID: <4434E13B.4000702@ieee.org> John Byrnes wrote: > Hi Eric, > > In the past, I've done things like > > ###### > normdist = lambda x: numpy.random.normal(0,x) > vecnormal = numpy.vectorize(normdist) > > stdev = numpy.array([1.1,1.2,1.0,2.2]) > result = vecnormal(stdev) > > ###### > > This works fine for up to 10k elements for stdev for some reason. > Any larger than that and I get a Bus error on my PPC mac and a segfault on > my x86 linux box. > > This needs to be tracked down. It looks like some kind of error is not being caught correctly. You should not get a segfault. Could you provide a stack-trace when the problem occurs? One issue is that vectorize is using object arrays under the covers, which consumes roughly 2x the memory you might expect. An object array is created and the function is called for every element. This object array is then converted to a number type after the fact. The segfault should be tracked down in any case.
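In the meantime, a vectorize-free workaround may help: since N(0, s**2) is just s times N(0, 1), a single vectorized draw of standard normals can be scaled by the stdev array (a sketch, not tested against the failing versions above):

import numpy

stdev = numpy.array([1.1, 1.2, 1.0, 2.2])
# scale standard normal draws elementwise by the desired stdevs
result = numpy.random.standard_normal(stdev.shape) * stdev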
-Travis From pau.gargallo at gmail.com Thu Apr 6 02:44:03 2006 From: pau.gargallo at gmail.com (Pau Gargallo) Date: Thu Apr 6 02:44:03 2006 Subject: [Numpy-discussion] NumPy documentation In-Reply-To: References: <4432E27E.6030906@ee.byu.edu> <4432E973.8070601@noaa.gov> <4432F4DD.6060000@cox.net> <1A94CA82-2EB5-4145-9EA9-453DE60AE684@stanford.edu> <4433F1F6.4010603@noaa.gov> Message-ID: <6ef8f3380604060243u2f54efc3r2baba94688c5d0af@mail.gmail.com> On 4/5/06, Perry Greenfield wrote: > > On Apr 5, 2006, at 12:36 PM, Christopher Barker wrote: > > > Zachary Pincus wrote: > >> from Numeric (who was used to the large, free manual) > > > > Which brings up a question: Is the source to the old Numeric manual > > available? it would be nice to "port" it to SciPy. > > Sort of. The original source was in Framemaker format. It was converted > to the Python latex framework in the process of being adopted to > numarray. The source for that is available on the numarray repository. > If you want the framemaker source, I may be able to dig that up > somewhere (or I may have lost track of it :-). Paul Dubois can likely > provide it as well; that's who gave me the source. > > Perry > +1 to any support for Travis Oliphant. Your work is really helping us. I am quite ignorant about licences and copyright things, so I would like to know: 1.- Is it OK to just copy the old Numeric documentation to the wiki and use it as a starting point for a more complete and updated doc? 2.- Would that be fine with the authors? I guess it will be very useful to everyone (especially beginners) to have an extended version of this documentation where there are many examples of use for every function. The wiki seems a very efficient way to build such a thing. It will take some time to manually copy-paste everything to the wiki, but it is doable. What do you think? pau From oliphant.travis at ieee.org Thu Apr 6 02:46:02 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 6 02:46:02 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist In-Reply-To: References: <4433DF85.7030109@gmail.com> Message-ID: <4434E31B.5030306@ieee.org> Robert Kern wrote: > You have a byteorder issue. Your Linux box, which I presume has an Intel or AMD > CPU, is little-endian, whereas your OS X box, which I presume has a PPC CPU, is > big-endian. numpy arrays can store their data in either endianness on either > kind of platform; their dtype objects tell you which byteorder they are using. > > In [54]: c.sort() > > In [55]: c > Out[55]: array([ 0., 2., 3., 4., 5., 6., 7., 8., 9., 10., 1.]) > > > This is a bug. > > http://projects.scipy.org/scipy/numpy/ticket/47 > Good catch. This bug was due to an oversight when adding the new sorting functions. The case of byte-swapped data was not handled. Judicious use of copyswap on the buffer fixed it. But, this brings up the point that currently the pickled raw-data which is read-in as a string by Python is used as the memory for the new array (i.e. the string memory is "stolen"). This should work. The fact that it didn't with sort was a bug that is now fixed in SVN. However, operations on out-of-byte-order arrays will always be slower. Thus, perhaps on pickle read the data should be copied to native byte-order if necessary. Opinions?
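For reference, the conversion under discussion is cheap to express at the Python level (a sketch only; dtype.newbyteorder('=') requests native byte order):

import numpy

def to_native(a):
    # copy the data to native byte order only if it is byte-swapped
    if a.dtype.isnative:
        return a
    return a.astype(a.dtype.newbyteorder('='))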
-Travis From benjamin at decideur.info Thu Apr 6 03:23:09 2006 From: benjamin at decideur.info (Benjamin Thyreau) Date: Thu Apr 6 03:23:09 2006 Subject: [Numpy-discussion] Recarray and shared datas Message-ID: <200604061020.k36AKIsQ018238@decideur.info> Hi, Numpy has a nice feature, recarray, i.e. records which can hold column names. I'd like to use such a feature in order to better interact with R, i.e. passing R data to Python without copying. The current rpy bindings do a full copy, and convert to a simple ndarray. Looking at the recarray api in the Guide, and also at the source code, I don't find any recarray constructor which can use shared data (all the examples from section 8.6 are doing copies). Is there some way to do it, in Python or in C? Or are there any plans to? Thanks for the info -- Benjamin Thyreau CEA/SHFJ Orsay From oliphant.travis at ieee.org Thu Apr 6 03:40:05 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 6 03:40:05 2006 Subject: [Numpy-discussion] Newbie indexing question and print order In-Reply-To: <44338DF4.7050603@gmail.com> References: <44338DF4.7050603@gmail.com> Message-ID: <4434E522.3060101@ieee.org> amcmorl wrote: > Hi all, > > I'm having a bit of trouble getting my head around numpy's indexing > capabilities. A quick summary of the problem is that I want to > lookup/index in nD from a second array of rank n+1, such that the last > (or first, I guess) dimension contains the lookup co-ordinates for the > value to extract from the first array. Here's a 2D (3,3) example: > > In [12]:print ar > [[ 0.15 0.75 0.2 ] > [ 0.82 0.5 0.77] > [ 0.21 0.91 0.59]] > > In [24]:print inds > [[[1 1] > [1 1] > [2 1]] > > [[2 2] > [0 0] > [1 0]] > > [[1 1] > [0 0] > [2 1]]] > > then somehow return the array (barring me making any row/column errors): > In [26]: c = ar.somefancyindexingroutinehere(inds) > You can do this with "fancy-indexing". Obviously it is going to take some time for people to get used to this idea as none of the responses yet suggest it. But the following works. c = ar[inds[...,0],inds[...,1]] gives the desired effect. Thus, your simple description c[x,y] = ar[inds[x,y,0],inds[x,y,1]] is a text-book description of what fancy-indexing does. Best regards, -Travis > In [26]:print c > [[ 0.5 0.5 0.91] > [ 0.59 0.15 0.82] > [ 0.5 0.15 0.91]] > > i.e. c[x,y] = a[ inds[x,y,0], inds[x,y,1] ] > > Any suggestions? It looks like it should be relatively simple using > 'put' or 'take' or 'fetch' or 'sit' or something like that, but I'm not > getting it. > > While I'm here, can someone help me understand the rationale behind > 'print' printing row, column (i.e. a[0,1] = 0.75 in the above example) > rather than x, y (=column, row; in which case 0.75 would be in the first > column and second row), which seems to me to be more intuitive. > > I'm really enjoying getting into numpy - I can see it'll be > simpler/faster coding than my previous environments, despite me not > knowing my way at the moment, and that python has better opportunities > for extensibility. So, many thanks for your great work. 
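Generalizing this to n dimensions is mostly a matter of splitting the last axis of the index array into one index array per axis, e.g. (a sketch with a hypothetical helper name):

import numpy

def take_nd(ar, inds):
    # inds has shape result_shape + (ar.ndim,); one index array per axis
    return ar[tuple(inds[..., i] for i in range(ar.ndim))]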
From faltet at carabos.com Thu Apr 6 03:44:02 2006 From: faltet at carabos.com (Francesc Altet) Date: Thu Apr 6 03:44:02 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist In-Reply-To: <4434E31B.5030306@ieee.org> References: <4433DF85.7030109@gmail.com> <4434E31B.5030306@ieee.org> Message-ID: <200604061243.48122.faltet@carabos.com> On Thursday 06 April 2006 11:44, Travis Oliphant wrote: > But, this brings up the point that currently the pickled raw-data which > is read-in as a string by Python is used as the memory for the new array > (i.e. the string memory is "stolen"). This should work. The fact > that it didn't with sort was a bug that is now fixed in SVN. However, > operations on out-of-byte-order arrays will always be slower. Thus, > perhaps on pickle read the data should be copied to native byte-order if > necessary. Yes, I think that converting directly to native byteorder at unpickling time would be the best. Cheers! -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-" From a.u.r.e.l.i.a.n at gmx.net Thu Apr 6 04:16:11 2006 From: a.u.r.e.l.i.a.n at gmx.net (Johannes Loehnert) Date: Thu Apr 6 04:16:11 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist In-Reply-To: <200604061243.48122.faltet@carabos.com> References: <4433DF85.7030109@gmail.com> <4434E31B.5030306@ieee.org> <200604061243.48122.faltet@carabos.com> Message-ID: <200604061315.23340.a.u.r.e.l.i.a.n@gmx.net> Hi, > > But, this brings up the point that currently the pickled raw-data which > > is read-in as a string by Python is used as the memory for the new array > > (i.e. the string memory is "stolen"). This should work. The fact > > that it didn't with sort was a bug that is now fixed in SVN. However, > > operations on out-of-byte-order arrays will always be slower. Thus, > > perhaps on pickle read the data should be copied to native byte-order if > > necessary. > > Yes, I think that converting directly to native byteorder at > unpickling time would be the best. If you stored your data in the wrong byte order for some odd reason (maybe you use a library that requires a certain byte order), then you would want pickle to deliver the data back exactly as stored. I think this should be made a user option in some way, although I do not know a good place for it right now. Johannes From cimrman3 at ntc.zcu.cz Thu Apr 6 05:16:07 2006 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu Apr 6 05:16:07 2006 Subject: [Numpy-discussion] site.cfg.example In-Reply-To: <4435020B.9040705@iam.uni-stuttgart.de> References: <44280161.4030708@ntc.zcu.cz> <442808AF.6090006@ftw.at> <44280C20.8000003@ntc.zcu.cz> <44297152.9000305@ftw.at> <442A698C.9000104@ntc.zcu.cz> <442A7E78.1030901@ftw.at> <442A86D2.20902@ntc.zcu.cz> <442A9A67.8050106@ftw.at> <442A9F8D.906@ntc.zcu.cz> <443253D4.90806@iam.uni-stuttgart.de> <4434D699.5030102@ntc.zcu.cz> <4434D8D3.7050200@iam.uni-stuttgart.de> <4434FC6B.3000905@ntc.zcu.cz> <4435020B.9040705@iam.uni-stuttgart.de> Message-ID: <44350672.4020008@ntc.zcu.cz> I have added numpy/site.cfg.example to the SVN. It should contain a list of all possible sections and relevant fields, so that a (new) user sees what can be configured and then just copies the file to numpy/site.cfg, removes the unwanted sections and edits the wanted ones. If you think it is a good idea and have a section that is not present or properly described, contribute it, please :-) When/if the file grows, we can put it on the Wiki. cheers, r. 
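For anyone who has not seen the file, a site.cfg entry looks roughly like this (the section, keys and paths below are illustrative only, not a tested configuration):

[atlas]
library_dirs = /usr/local/lib/atlas
include_dirs = /usr/local/include/atlas
atlas_libs = lapack, f77blas, cblas, atlas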
From tim.hochberg at cox.net Thu Apr 6 08:39:00 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Apr 6 08:39:00 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist In-Reply-To: <200604061315.23340.a.u.r.e.l.i.a.n@gmx.net> References: <4433DF85.7030109@gmail.com> <4434E31B.5030306@ieee.org> <200604061243.48122.faltet@carabos.com> <200604061315.23340.a.u.r.e.l.i.a.n@gmx.net> Message-ID: <44353646.6010009@cox.net> Johannes Loehnert wrote: >Hi, > > > >>>But, this brings up the point that currently the pickled raw-data which >>>is read-in as a string by Python is used as the memory for the new array >>>(i.e. the string memory is "stolen"). This should work. The fact >>>that it didn't with sort was a bug that is now fixed in SVN. However, >>>operations on out-of-byte-order arrays will always be slower. Thus, >>>perhaps on pickle read the data should be copied to native byte-order if >>>necessary. >>> >>> >>Yes, I think that converting directly to native byteorder at >>unpickling time would be the best. >> >> > >If you stored your data in the wrong byte order for some odd reason (maybe you use >a library that requires a certain byte order), then you would want pickle to >deliver the data back exactly as stored. I think this should be made a user >option in some way, although I do not know a good place for it right now. > > If this is really something we want to do, it seems that the "correct" solution is to have a different dtype when an object defaults to a given byte order than when it is forced to that byte order. Pickle could keep track of that and do the right thing on loading. From tim.hochberg at cox.net (Tim Hochberg) Subject: [Numpy-discussion] Re: array constructor from generators? In-Reply-To: <4434DD42.8010205@ieee.org> References: <44331200.2020604@cox.net> <4434DD42.8010205@ieee.org> Message-ID: <44353880.2040406@cox.net>
> > Now, I think there are two cases here that are being discussed as one > > 1) Creating arrays from iterators --- array( iter(xrange(10) ) > 2) Creating arrays from generators --- array(x for x in xrange(10)) > > Both of these cases really ought to be handled and really should be > integrated into the Array_FromSequence code. That code is inherited > from Numeric and was written before iterators and generators arose on > the scene. There ought to be a way to unify all of these notions > (Actually if you handle iterators, then sequences will come along for > the ride since sequences can behave as iterators). > I'd rather see one place in the code that handles these cases. But, > working code is usually better than dreamy plans :-) I agree with all of this. However, there's one specific case that I think we should optimize the heck out of. In fact, I'd be tempted as a first cut to only implement this case and raise exceptions in the other cases until we get around to implementing them. This one case is: * dtype known * 1-dimensional I care about this case because it's common and we can do it efficiently. In the other cases I could write a python function that does almost as good of a job as we're likely to do in C both in terms of speed and memory usage. So the known dtype, 1D case adds important functionality while the other "merely" adds convenience (and consistency). Those are good, but personally the added functionality is higher on my priority list. -tim From byrnes at bu.edu Thu Apr 6 09:15:25 2006 From: byrnes at bu.edu (John Byrnes) Date: Thu Apr 6 09:15:25 2006 Subject: [Numpy-discussion] A random.normal function with stdev as array In-Reply-To: <4434E13B.4000702@ieee.org> References: <4433C6D6.5080800@obs.univ-lyon1.fr> <20060405155736.GA9364@localhost.localdomain> <4434E13B.4000702@ieee.org> Message-ID: <20060406161450.GA18606@localhost.localdomain> On Thu, Apr 06, 2006 at 03:36:59AM -0600, Travis Oliphant wrote: > John Byrnes wrote: > >Hi Eric, > > > >In the past , I've done things like > > > >###### > >normdist = lambda x: numpy.random.normal(0,x) > >vecnormal = numpy.vectorize(normdist) > > > >stdev = numpy.array([1.1,1.2,1.0,2.2]) > >result = vecnormal(stdev) > > > >###### > > > >This works fine for up to 10k elements for stdev for some reason. > >Any larger then that and i get a Bus error on my PPC mac and a segfault on > >my x86 linux box. > > > > > > This needs to be tracked down. It looks like some-kind of error is not > being caught correctly. You should not get a segfault. Could you > provide a stack-trace when the problem occurs? > > One issue is that vectorize is using object arrays under the covers > which is consuming roughly 2x the memory than you may think. An > object array is created and the function is called for every element. > This object array is then converted to a number type after the fact. > > The segfault should be tracked down in any case. > > -Travis > > > Hi Travis, Here is a backtrace from gdb on my mac. John #0 0x00470b88 in log1pl () #1 0x00000000 in ?? 
() Cannot access memory at address 0x0 Cannot access memory at address 0x0 #2 0x004708ec in log1pl () #3 0x1000c348 in PyObject_Call (func=0x4, arg=0x4, kw=0x15fb) at /Users/bob/src/Python-2.4.1/Objects/abstract.c:1751 #4 0x1007ce34 in ext_do_call (func=0x1, pp_stack=0xbfffed90, flags=211904, na=8656012, nk=1194304) at /Users/bob/src/Python-2.4.1/Python/ceval.c:3824 #5 0x1007a230 in PyEval_EvalFrame (f=0x848410) at /Users/bob/src/Python-2.4.1/Python/ceval.c:2203 #6 0x1007b284 in PyEval_EvalCodeEx (co=0x2, globals=0x4, locals=0x1, args=0x3, argcount=1049072, kws=0x841150, kwcount=1, defs=0x8411fc, defcount=0, closure=0x0) at /Users/bob/src/Python-2.4.1/Python/ceval.c:2730 #7 0x10026274 in function_call (func=0x880bb0, arg=0x1001f0, kw=0x848410) at /Users/bob/src/Python-2.4.1/Objects/funcobject.c:548 #8 0x1000c348 in PyObject_Call (func=0x4, arg=0x4, kw=0x15fb) at /Users/bob/src/Python-2.4.1/Objects/abstract.c:1751 #9 0x10015a88 in instancemethod_call (func=0x52eef0, arg=0x54a170, kw=0x0) at /Users/bob/src/Python-2.4.1/Objects/classobject.c:2431 #10 0x1000c348 in PyObject_Call (func=0x4, arg=0x4, kw=0x15fb) at /Users/bob/src/Python-2.4.1/Objects/abstract.c:1751 #11 0x10059358 in slot_tp_call (self=0x53e4f0, args=0x5b310, kwds=0x0) at /Users/bob/src/Python-2.4.1/Objects/typeobject.c:4526 #12 0x1000c348 in PyObject_Call (func=0x4, arg=0x4, kw=0x15fb) at /Users/bob/src/Python-2.4.1/Objects/abstract.c:1751 #13 0x1007c9e4 in do_call (func=0x53e4f0, pp_stack=0x53e4f0, na=0, nk=8655844) at /Users/bob/src/Python-2.4.1/Python/ceval.c:3755 #14 0x1007c6dc in call_function (pp_stack=0x0, oparg=4) at /Users/bob/src/Python-2.4.1/Python/ceval.c:3570 #15 0x1007a140 in PyEval_EvalFrame (f=0x10e200) at /Users/bob/src/Python-2.4.1/Python/ceval.c:2163 #16 0x1007c83c in fast_function (func=0x4, pp_stack=0x10e360, n=268927488, na=268755664, nk=1) at /Users/bob/src/Python-2.4.1/Python/ceval.c:3629 #17 0x1007c6c4 in call_function (pp_stack=0xbffff5bc, oparg=4) at /Users/bob/src/Python-2.4.1/Python/ceval.c:3568 #18 0x1007a140 in PyEval_EvalFrame (f=0x10e030) at /Users/bob/src/Python-2.4.1/Python/ceval.c:2163 #19 0x1007b284 in PyEval_EvalCodeEx (co=0x0, globals=0x4, locals=0x1, args=0x10078200, argcount=1049072, kws=0x841150, kwcount=1, defs=0x8411fc, defcount=0, closure=0x0) at /Users/bob/src/Python-2.4.1/Python/ceval.c:2730 #20 0x1007e678 in PyEval_EvalCode (co=0x4, globals=0x4, locals=0x15fb) at /Users/bob/src/Python-2.4.1/Python/ceval.c:484 #21 0x100b2ee0 in run_node (n=0x10078200, filename=0x4
, globals=0x0, locals=0x10e180, flags=0x2) at /Users/bob/src/Python-2.4.1/Python/pythonrun.c:1265 #22 0x100b23b0 in PyRun_InteractiveOneFlags (fp=0x54a1a5, filename=0x56ca0 "", flags=0x10e030) at /Users/bob/src/Python-2.4.1/Python/pythonrun.c:762 #23 0x100b2190 in PyRun_InteractiveLoopFlags (fp=0x56b94, filename=0xd440 "", flags=0x100f21b8) at /Users/bob/src/Python-2.4.1/Python/pythonrun.c:695 #24 0x100b3bb0 in PyRun_AnyFileExFlags (fp=0xa0001554, filename=0x100f36ac "", closeit=0, flags=0xbffff934) at /Users/bob/src/Python-2.4.1/Python/pythonrun.c:658 #25 0x100bf640 in Py_Main (argc=269413412, argv=0x20000000) at /Users/bob/src/Python-2.4.1/Modules/main.c:484 #26 0x000018d0 in start () #27 0x8fe1a278 in __dyld__dyld_start () From ndarray at mac.com Thu Apr 6 12:42:17 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 6 12:42:17 2006 Subject: [Numpy-discussion] What is diagonal for nd>2? Message-ID: It looks like the definition of the diagonal changed somewhere between Numeric 24.0 and numpy: In Numeric: >>> x = Numeric.arange(2*4*4) >>> x = Numeric.reshape(x, (2, 4, 4)) >>> x array([[[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15]], [[16, 17, 18, 19], [20, 21, 22, 23], [24, 25, 26, 27], [28, 29, 30, 31]]]) >>> Numeric.diagonal(x) array([[ 0, 5, 10, 15], [16, 21, 26, 31]]) But in numpy: >>> import numpy as Numeric >>> x = Numeric.arange(2*4*4) >>> x = Numeric.reshape(x, (2, 4, 4)) >>> Numeric.diagonal(x) array([[ 0, 20], [ 1, 21], [ 2, 22], [ 3, 23]]) The old logic seems to be clear: x is a pair of matrices and diagonal returns a pair of diagonals, but the new logic seems unclear: the diagonal returns the first rows of the two matrices transposed. Does anyone know when this change was introduced and why? From pgmdevlist at mailcan.com Thu Apr 6 13:51:04 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Thu Apr 6 13:51:04 2006 Subject: [Numpy-discussion] What is diagonal for nd>2? In-Reply-To: References: Message-ID: <200604061652.30764.pgmdevlist@mailcan.com> > Does anyone know when this change was introduced and why? Isn't it more a problem of default values? By default, x.diagonal() == x.diagonal(0,0,1) x.diagonal() array([[ 0, 20], [ 1, 21], [ 2, 22], [ 3, 23]]) If you want the paired diagonal: x.diagonal(0,1,-1) array([[ 0, 5, 10, 15], [16, 21, 26, 31]]) From ndarray at mac.com Thu Apr 6 14:46:10 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 6 14:46:10 2006 Subject: [Numpy-discussion] What is diagonal for nd>2? In-Reply-To: <200604061652.30764.pgmdevlist@mailcan.com> References: <200604061652.30764.pgmdevlist@mailcan.com> Message-ID:
> > Isn't it more a problem of default values ? > By default, x.diagonal() == x.diagonal(0,0,1) > > x.diagonal() > array([[ 0, 20], > [ 1, 21], > [ 2, 22], > [ 3, 23]]) > > If you want the paired diagonal: > x.diagonal(0,1,-1) > array([[ 0, 5, 10, 15], > [16, 21, 26, 31]]) > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Thu Apr 6 14:59:03 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu Apr 6 14:59:03 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist In-Reply-To: <4434E31B.5030306@ieee.org> References: <4433DF85.7030109@gmail.com> <4434E31B.5030306@ieee.org> Message-ID: <44358EEA.4080609@noaa.gov> Travis Oliphant wrote: > Thus, > perhaps on pickle read the data should be copied to native byte-order if > necessary. +1 Those that are working with non-native byte order on purpose presumably know what they are doing, and can check and swap as necessary -- or use tofile and fromfile, which I presume don't do any byteswapping for you. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at mailcan.com Thu Apr 6 15:01:03 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Thu Apr 6 15:01:03 2006 Subject: [Numpy-discussion] What is diagonal for nd>2? In-Reply-To: References: <200604061652.30764.pgmdevlist@mailcan.com> Message-ID: <200604061802.20457.pgmdevlist@mailcan.com> > I would think axes 0 and 1 are the first, not the last two dimensions. We > can either change the documentation or change the defaults in the > oldnumeric. I would vote for the change in defaults because oldnumeric is > a compatibility module and should not introduce changes. So, change the default to: diagonal(a, offset=0, axis1=-2, axis2=-1) ? That'd make sense, I'm for that... From ndarray at mac.com Thu Apr 6 16:11:01 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 6 16:11:01 2006 Subject: [Numpy-discussion] New patch for MA In-Reply-To: <200603280427.52789.pgmdevlist@mailcan.com> References: <200603280427.52789.pgmdevlist@mailcan.com> Message-ID: I have applied the patch with minor modifications. See < http://projects.scipy.org/scipy/numpy/changeset/2331>. Here are a few suggestions for posting patches. 1. If you are using svn, please post output of "svn diff" in the project root directory (the directory that *contains* "numpy", not the "numpy" directory. 2. If appropriate, add unit tests to an existing file instead of creating a new one. (In case of ma, the correct file is test_ma.py). 3. If you follow recommendation #1, this will happen automatically, if you cannot use svn for some reason, concatenate the output of diff for code and test in the same patch file. Here are some topics for discussion. 1. I've initially implemented some ma array methods by wrapping existing module level functions. I am not sure this is the best approach to implement new methods. It is probably cleaner to implement them as methods and provide wrappers at the module level similar to oldnumeric. 2. I am not sure cumprod and cumsum should fill masked elements with 1 and 0. I would think the result should be masked if any prior element along the axis being accumulated is masked. To ignore masked elements, filled can be called explicitly before cum[prod|sum]. 
One of the problems with filling by default is that 1 or 0 are not appropriate values for object arrays (for example, "" is an appropriate fill value for cumsum of an array of strings). On 3/28/06, Pierre GM wrote: > > Folks, > You can find a new patch for MA on the wiki > > http://projects.scipy.org/scipy/numpy/attachment/wiki/MaskedArray/ma-200603280900.patch > along with a test suite. > The 'clip' method should now work with array arguments. Were also added > cumsum, cumprod, std, var and squeeze. > I'll deal with flags, setflags, setfield, dump and others when I'll have a > better idea of how it works -- which probably won't happen anytime soon, > as I > don't really have time to dig in the code for these functions. AAMOF, I'm > more interested in checking/patching some other aspects of numpy for MA > (eg, > mlab...) > Once again, please send me your comments and suggestions. > Thx for everything > P. > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > that extends applications into web and mobile media. Attend the live > webcast > and join the prime developer group breaking into this new coding > territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.sorich at gmail.com Thu Apr 6 17:41:19 2006 From: michael.sorich at gmail.com (Michael Sorich) Date: Thu Apr 6 17:41:19 2006 Subject: [Numpy-discussion] New patch for MA In-Reply-To: References: <200603280427.52789.pgmdevlist@mailcan.com> Message-ID: <16761e100604061733r586cca6cr94d72c554b54fdd0@mail.gmail.com> On 4/7/06, Sasha wrote: > > > 2. I am not sure cumprod and cumsum should fill masked elements with 1 and > 0. I would think the result should be masked if any prior element along the > axis being accumulated is masked. To ignore masked elements, filled can be > called explicitly before cum[prod|sum]. One of the problems with filling by > default is that 1 or 0 are not appropriate values for object arrays (for > example, "" is an appropriate fill value for cumsum of an array of strings). > > There are often a number of options for how masked values can be dealt with. In general (not just with cum*), I would prefer for the result to be masked when masked values are involved unless I explicitly indicate what should be done with the masked values. Otherwise it is too easy to forget that some default maniputlation of masked values has been applied. In R there is commonly an na.action or na.rm parameter to functions. Michael -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at mailcan.com Thu Apr 6 19:19:02 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Thu Apr 6 19:19:02 2006 Subject: [Numpy-discussion] New patch for MA In-Reply-To: References: <200603280427.52789.pgmdevlist@mailcan.com> Message-ID: <200604062218.05876.pgmdevlist@mailcan.com> Sasha, Thanks for your advice with SVN. I'll make sure to use that method from now on. > 1. I've initially implemented some ma array methods by wrapping existing > module level functions. I am not sure this is the best approach to > implement new methods. 
It is probably cleaner to implement them as methods > and provide wrappers at the module level similar to oldnumeric. Well, I tried to stick to the latest convention, getting rid of the _wrapit part. Let me know. > > 2. I am not sure cumprod and cumsum should fill masked elements with 1 and > 0. Good point for the object/string arrays, yet other cases I overlooked (I'm still not used to object arrays, I'm now realizing they're quite useful). Actually, I coded that way because it's how I use these functions. But well, as many settings as users, eh? Michael's suggestion of introducing R-like options sounds interesting, but I wonder whether it would not be a bit heavy for methods, with the introduction of an extra flag. That'd be great for functions, though. So, for cumsum and cumprod methods, maybe we could stick to Sasha's and Michael's preference (mask all values after the first missing), and we would just have to create two functions. We could use the 4 R ones: na.omit, na.fail, na.pass, na.exclude. For our current problem (cumsum,cumprod) na.omit: would return the result I implemented (fill with 0 or 1) na.fail: would return masked values after the first missing na.exclude: would correspond to compressed().cumsum() ? I don't like that, it changes the initial length/size na.pass: I don't know... From ndarray at mac.com Thu Apr 6 21:14:01 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 6 21:14:01 2006 Subject: [Numpy-discussion] New patch for MA In-Reply-To: <16761e100604061733r586cca6cr94d72c554b54fdd0@mail.gmail.com> References: <200603280427.52789.pgmdevlist@mailcan.com> <16761e100604061733r586cca6cr94d72c554b54fdd0@mail.gmail.com> Message-ID: On 4/6/06, Michael Sorich wrote: > ... I would prefer for the result to be masked > when masked values are involved unless I explicitly indicate what should be > done with the masked values. ... This is the case in r2332: >>> from numpy.core.ma import * >>> print array([1,2,3], mask=[0,1,0]).cumsum() [1 -- --] From a.mcmorland at auckland.ac.nz Fri Apr 7 00:30:07 2006 From: a.mcmorland at auckland.ac.nz (Angus McMorland) Date: Fri Apr 7 00:30:07 2006 Subject: [Numpy-discussion] Newbie indexing question [fancy indexing in nD] In-Reply-To: <4434E522.3060101@ieee.org> References: <44338DF4.7050603@gmail.com> <4434E522.3060101@ieee.org> Message-ID: <4435F672.1040701@auckland.ac.nz> Hi again. Thanks, everyone, for your quick replies. Travis Oliphant wrote: > amcmorl wrote: > >> Hi all, >> >> I'm having a bit of trouble getting my head around numpy's indexing >> capabilities. A quick summary of the problem is that I want to >> lookup/index in nD from a second array of rank n+1, such that the last >> (or first, I guess) dimension contains the lookup co-ordinates for the >> value to extract from the first array. Here's a 2D (3,3) example: >> >> In [12]:print ar >> [[ 0.15 0.75 0.2 ] >> [ 0.82 0.5 0.77] >> [ 0.21 0.91 0.59]] >> >> In [24]:print inds >> [[[1 1] >> [1 1] >> [2 1]] >> >> [[2 2] >> [0 0] >> [1 0]] >> >> [[1 1] >> [0 0] >> [2 1]]] >> >> then somehow return the array (barring me making any row/column errors): >> In [26]: c = ar.somefancyindexingroutinehere(inds) > > You can do this with "fancy-indexing". Obviously it is going to take > some time for people to get used to this idea as none of the responses > yet suggest it. > But the following works. > c = ar[inds[...,0],inds[...,1]] > > gives the desired effect. > > Thus, your simple description c[x,y] = ar[inds[x,y,0],inds[x,y,1]] is a > text-book description of what fancy-indexing does. 
Great. Turns out I wasn't too far off then. I've written a quick function of my own that extends the fancy indexing to nD: def fancy_index_nd(ar, ind): evList = ['ar['] for i in range(len(ar.shape)): evList = evList + [' ind[...,%d]' % i] if i < len(ar.shape) - 1: evList = evList + [","] evList = evList + [' ]'] return eval(''.join(evList)) 1) Am I missing a simpler way to extend the fancy-indexing to n-dimensions? If not... 2) this seems (conceptually) that it might be a little faster than the routines that have to calculate a flat index. Hopefully it could be of use to people. Any thoughts? Cheers, Angus -- Angus McMorland email a.mcmorland at auckland.ac.nz mobile +64-21-155-4906 PhD Student, Neurophysiology / Multiphoton & Confocal Imaging Physiology, University of Auckland phone +64-9-3737-599 x89707 Armourer, Auckland University Fencing Secretary, Fencing North Inc. From pau.gargallo at gmail.com Fri Apr 7 02:37:05 2006 From: pau.gargallo at gmail.com (Pau Gargallo) Date: Fri Apr 7 02:37:05 2006 Subject: [Numpy-discussion] Newbie indexing question [fancy indexing in nD] In-Reply-To: <4435F672.1040701@auckland.ac.nz> References: <44338DF4.7050603@gmail.com> <4434E522.3060101@ieee.org> <4435F672.1040701@auckland.ac.nz> Message-ID: <6ef8f3380604070236m2d606983l82403cbc2305fefa@mail.gmail.com> you can do things like a[ list( ind[...,i] for i in range(.shape[-1]) ) ] if the indices could be accessed as ind[i] instead of ind[...,i] (transposing the indices array) then you could simply do: a[ list(ind) ] pau On 4/7/06, Angus McMorland wrote: > Hi again. > > Thanks, everyone, for your quick replies. > > Travis Oliphant wrote: > > amcmorl wrote: > > > >> Hi all, > >> > >> I'm having a bit of trouble getting my head around numpy's indexing > >> capabilities. A quick summary of the problem is that I want to > >> lookup/index in nD from a second array of rank n+1, such that the last > >> (or first, I guess) dimension contains the lookup co-ordinates for the > >> value to extract from the first array. Here's a 2D (3,3) example: > >> > >> In [12]:print ar > >> [[ 0.15 0.75 0.2 ] > >> [ 0.82 0.5 0.77] > >> [ 0.21 0.91 0.59]] > >> > >> In [24]:print inds > >> [[[1 1] > >> [1 1] > >> [2 1]] > >> > >> [[2 2] > >> [0 0] > >> [1 0]] > >> > >> [[1 1] > >> [0 0] > >> [2 1]]] > >> > >> then somehow return the array (barring me making any row/column errors): > >> In [26]: c = ar.somefancyindexingroutinehere(inds) > > > > You can do this with "fancy-indexing". Obviously it is going to take > > some time for people to get used to this idea as none of the responses > > yet suggest it. > > But the following works. > > c = ar[inds[...,0],inds[...,1]] > > > > gives the desired effect. > > > > Thus, your simple description c[x,y] = ar[inds[x,y,0],inds[x,y,1]] is a > > text-book description of what fancy-indexing does. > > Great. Turns out I wasn't too far off then. I've written a quick > function of my own that extends the fancy indexing to nD: > > def fancy_index_nd(ar, ind): > evList = ['ar['] > for i in range(len(ar.shape)): > evList = evList + [' ind[...,%d]' % i] > if i < len(ar.shape) - 1: > evList = evList + [","] > evList = evList + [' ]'] > return eval(''.join(evList)) > > 1) Am I missing a simpler way to extend the fancy-indexing to > n-dimensions? If not... > > 2) this seems (conceptually) that it might be a little faster than the > routines that have to calculate a flat index. Hopefully it could be of > use to people. Any thoughts? 
> > Cheers,
> > Angus
> --
> Angus McMorland
> email a.mcmorland at auckland.ac.nz
> mobile +64-21-155-4906
>
> PhD Student, Neurophysiology / Multiphoton & Confocal Imaging
> Physiology, University of Auckland
> phone +64-9-3737-599 x89707
>
> Armourer, Auckland University Fencing
> Secretary, Fencing North Inc.
>

From a.h.jaffe at gmail.com Fri Apr 7 06:54:09 2006
From: a.h.jaffe at gmail.com (Andrew Jaffe)
Date: Fri Apr 7 06:54:09 2006
Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist
In-Reply-To: <4434E31B.5030306@ieee.org>
References: <4433DF85.7030109@gmail.com> <4434E31B.5030306@ieee.org>
Message-ID: <44366E71.7060601@gmail.com>

Travis Oliphant wrote:
> But, this brings up the point that currently the pickled raw-data which
> is read-in as a string by Python is used as the memory for the new array
> (i.e. the string memory is "stolen"). This should work. The fact
> that it didn't with sort was a bug that is now fixed in SVN. However,
> operations on out-of-byte-order arrays will always be slower. Thus,
> perhaps on pickle read the data should be copied to native byte-order if
> necessary.

+1 from me, too.

I assume that byteswapping is fast compared to I/O in most cases, and
the only times when you wouldn't want it would be 'advanced' usage that
the developer could take control of via a custom reduce, __getstate__,
__setstate__, etc.

Andrew

______________________________________________________________________
Andrew Jaffe a.jaffe at imperial.ac.uk
Astrophysics Group +44 207 594-7526
Blackett Laboratory, Room 1013 FAX 7541
Imperial College, Prince Consort Road
London SW7 2AZ ENGLAND
http://astro.imperial.ac.uk/~jaffe

From ndarray at mac.com Fri Apr 7 10:26:06 2006
From: ndarray at mac.com (Sasha)
Date: Fri Apr 7 10:26:06 2006
Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled
In-Reply-To:
References:
Message-ID:

I am posting a reply to my own post in a hope to generate some
discussion of the original proposal.

I am proposing to add a "filled" method to ndarray. This can be a
pass-through, an alias to "copy" or a method to replace nans or some
other type-specific values. This will allow code that uses "filled"
work on ndarrays without changes.

On 3/22/06, Sasha wrote:
>
> In an ideal world, any function that accepts ndarray would accept
> ma.array and vice versa.
Moreover, if the ma.array has no masked > elements and the same data as ndarray, the result should be the same. > Obviously current implementation falls short of this goal, but there > is one feature that seems to make this goal unachievable. > > This feature is the "filled" method of ma.array. Pydoc for this > method reports the following: > > | filled(self, fill_value=None) > | A numeric array with masked values filled. If fill_value is None, > | use self.fill_value(). > | > | If mask is nomask, copy data only if not contiguous. > | Result is always a contiguous, numeric array. > | # Is contiguous really necessary now? > > > That is not the best possible description ("filled" is "filled"), but > the essence is that the result of a.filled(value) is a contiguous > ndarray obtained from the masked array by copying non-masked elements > and using value for masked values. > > I would like to propose to add a "filled" method to ndarray. I see > several possibilities and would like to hear your opinion: > > 1. Make filled simply return self. > > 2. Make filled return a contiguous copy. > > 3. Make filled replace nans with the fill_value if array is of > floating point type. > > > Unfortunately, adding "filled" will result is a rather confusing > situation where "fill" and "filled" both exist and have very different > meanings. > > I would like to note that "fill" is a somewhat odd ndarray method. > AFAICT, it is the only non-special method that mutates the array. It > appears to be just a performance trick: the same result can be achived > with "a[...] = ". > -------------- next part -------------- An HTML attachment was scrubbed... URL: From webb.sprague at gmail.com Fri Apr 7 10:38:03 2006 From: webb.sprague at gmail.com (Webb Sprague) Date: Fri Apr 7 10:38:03 2006 Subject: [Numpy-discussion] Tiling / disk storage for matrix in numpy? Message-ID: Hi all, Is there a way in numpy to associate a (large) matrix with a disk file, then and tile and index it, then cache it as you process the various pieces? This is pretty important with massive image files, which can't fit into working memory, but in which (for example) you might be doing a convolution on a 100 x 100 pixel window on a small subset of the image. I know that caching algorithms are (1) complicated and (2) never general. But there you go. Perhaps I can't find it, perhaps it would be a good project for the future? If HDF or something does this already, could someone point me in the right direction? Thx From tim.hochberg at cox.net Fri Apr 7 11:22:05 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Fri Apr 7 11:22:05 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: Message-ID: <4436AE31.7000306@cox.net> Sasha wrote: > I am posting a reply to my own post in a hope to generate some > discussion of the original proposal. > > I am proposing to add a "filled" method to ndarray. This can be a > pass-through, an alias to "copy" or a method to replace nans or some > other type-specific values. This will allow code that uses "filled" > work on > ndarrays without changes. In general, I'm skeptical of adding more methods to the ndarray object -- there are plenty already. In addition, it appears that both the method and function versions of filled are "dangerous" in the sense that they sometimes return the array itself and sometimes a copy. Finally, changing ndarray to support masked array feels a bit like the tail wagging the dog. Let me throw out an alternative proposal. 
I will admit up front that this proposal is based on exactly zero
experience with masked array, so there may be some stupidities in it,
but perhaps it will lead to an alternative solution.

def asUnmaskedArray(obj, fill_value=None):
    # objects without a mask pass through untouched
    mask = getattr(obj, 'mask', False)
    if mask is False:
        return obj
    if fill_value is None:
        fill_value = obj.get_fill_value()
    # copy the underlying data and overwrite the masked slots
    newobj = obj.data().copy()
    newobj[mask] = fill_value
    return newobj

Or something like that anyway. This particular version should work on
any array as long as, if it exports a mask attribute, it also exports
get_fill_value and data. At least once any bugs are ironed out, I
haven't tested it. ma would have to be modified to use this instead of
using filled everywhere, but that seems more appropriate than tacking
on another method to ndarray IMO.

One advantage of this approach is that most array like objects that
don't subclass ndarray will work with this automagically. If we keep
expanding the methods of ndarray, it's harder and harder to implement
other array like objects since they have to implement more and more
methods, most of which are irrelevant to their particular case. The
more we can implement stuff like this in terms of some relatively
small set of core primitives, the happier we'll all be in the long
run.

This also builds on the idea of trying to push as much of the
array/view ambiguity into the asXXXArray corner.

Regards,

-tim

>
>
> On 3/22/06, *Sasha* > wrote:
>
> In an ideal world, any function that accepts ndarray would accept
> ma.array and vice versa. Moreover, if the ma.array has no masked
> elements and the same data as ndarray, the result should be the same.
> Obviously current implementation falls short of this goal, but there
> is one feature that seems to make this goal unachievable.
>
> This feature is the "filled" method of ma.array. Pydoc for this
> method reports the following:
>
> | filled(self, fill_value=None)
> | A numeric array with masked values filled. If fill_value is
> None,
> | use self.fill_value().
> |
> | If mask is nomask, copy data only if not contiguous.
> | Result is always a contiguous, numeric array.
> | # Is contiguous really necessary now?
>
>
> That is not the best possible description ("filled" is "filled"), but
> the essence is that the result of a.filled(value) is a contiguous
> ndarray obtained from the masked array by copying non-masked elements
> and using value for masked values.
>
> I would like to propose to add a "filled" method to ndarray. I see
> several possibilities and would like to hear your opinion:
>
> 1. Make filled simply return self.
>
> 2. Make filled return a contiguous copy.
>
> 3. Make filled replace nans with the fill_value if array is of
> floating point type.
>
>
> Unfortunately, adding "filled" will result is a rather confusing
> situation where "fill" and "filled" both exist and have very different
> meanings.
>
> I would like to note that "fill" is a somewhat odd ndarray method.
> AFAICT, it is the only non-special method that mutates the array. It
> appears to be just a performance trick: the same result can be
> achived
> with "a[...] = ".
>

From ndarray at mac.com Fri Apr 7 12:20:15 2006
From: ndarray at mac.com (Sasha)
Date: Fri Apr 7 12:20:15 2006
Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled
In-Reply-To: <4436AE31.7000306@cox.net>
References: <4436AE31.7000306@cox.net>
Message-ID:

On 4/7/06, Tim Hochberg wrote:
>
> ...
> In general, I'm skeptical of adding more methods to the ndarray object
> -- there are plenty already.
I've also proposed to drop "fill" in favor of optimizing x[...] = . Having both "fill" and "filled" in the interface is plain awkward. You may like the combined proposal better because it does not change the total number of methods :-) In addition, it appears that both the method and function versions of > filled are "dangerous" in the sense that they sometimes return the array > itself and sometimes a copy. This is true in ma, but may certainly be changed. > Finally, changing ndarray to support masked array feels a bit like the > tail wagging the dog. I disagree. Numpy is pretty much alone among the array languages because it does not have "native" support for missing values. For the floating point types some rudimental support for nans exists, but is not really usable. There is no missing values machanism for integer types. I believe adding "filled" and maybe "mask" to ndarray (not necessarily under these names) could be a meaningful step towards "native" support for missing values. -------------- next part -------------- An HTML attachment was scrubbed... URL: From webb.sprague at gmail.com Fri Apr 7 12:36:00 2006 From: webb.sprague at gmail.com (Webb Sprague) Date: Fri Apr 7 12:36:00 2006 Subject: [Numpy-discussion] Silly array question Message-ID: In R, if you have an Nx2 array of integers, you can use that to index an TxS array, yielding a 1xN result. Is there a way to do that in numpy? I looked for a pairs function but I coudn't find it, vaguely remembering that might be around... I know it would be a trivial loop to write, but a numpy array function would be faster (I hope). Example I = [[0,0], [1,1], [2,2], [1,1]] M = [[1, 2, 3, 4], [5, 6, 7, 8], [9,10,11, 12], [13, 14, 15, 16]] M[I] = [1,6,11,6]. Thanks! From ndarray at mac.com Fri Apr 7 12:53:03 2006 From: ndarray at mac.com (Sasha) Date: Fri Apr 7 12:53:03 2006 Subject: [Numpy-discussion] Silly array question In-Reply-To: References: Message-ID: >>> M.ravel()[dot(I,(4,1))] array([ 1, 6, 11, 6]) On 4/7/06, Webb Sprague wrote: > > In R, if you have an Nx2 array of integers, you can use that to index > an TxS array, yielding a 1xN result. Is there a way to do that in > numpy? I looked for a pairs function but I coudn't find it, vaguely > remembering that might be around... I know it would be a trivial loop > to write, but a numpy array function would be faster (I hope). > > Example > > I = [[0,0], [1,1], [2,2], [1,1]] > M = [[1, 2, 3, 4], > [5, 6, 7, 8], > [9,10,11, 12], > [13, 14, 15, 16]] > > M[I] = [1,6,11,6]. > > Thanks! > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > that extends applications into web and mobile media. Attend the live > webcast > and join the prime developer group breaking into this new coding > territory! > http://sel.as-us.falkag.net/sel?cmdlnk&kid0944&bid$1720&dat1642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Fri Apr 7 13:22:06 2006 From: efiring at hawaii.edu (Eric Firing) Date: Fri Apr 7 13:22:06 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: <4436AE31.7000306@cox.net> Message-ID: <4436C965.8020808@hawaii.edu> Sasha wrote: > > > On 4/7/06, *Tim Hochberg* > wrote: > > ... 
> In general, I'm skeptical of adding more methods to the ndarray object > -- there are plenty already. > > > I've also proposed to drop "fill" in favor of optimizing x[...] = > . Having both "fill" and "filled" in the interface is plain > awkward. You may like the combined proposal better because it does not > change the total number of methods :-) > > > In addition, it appears that both the method and function versions of > filled are "dangerous" in the sense that they sometimes return the > array > itself and sometimes a copy. > > > This is true in ma, but may certainly be changed. > > > Finally, changing ndarray to support masked array feels a bit like the > tail wagging the dog. > > > I disagree. Numpy is pretty much alone among the array languages because > it does not have "native" support for missing values. For the floating > point types some rudimental support for nans exists, but is not really > usable. There is no missing values machanism for integer types. I > believe adding "filled" and maybe "mask" to ndarray (not necessarily > under these names) could be a meaningful step towards "native" support > for missing values. I agree strongly with you, Sasha. I get the impression that the world of numerical computation is divided into those who work with idealized "data", where nothing is missing, and those who work with real observations, where there is always something missing. As an oceanographer, I am solidly in the latter category. If good support for missing values is not built in, it has to be bolted on, and it becomes clunky and awkward. I was reluctant to speak up about this earlier because I thought it was too much to ask of Travis when he was in the midst of putting numpy on solid ground. But I am delighted that missing value support has a champion among numpy developers, and I agree that now is the time to change it from "bolted on" to "integrated". Eric From Chris.Barker at noaa.gov Fri Apr 7 13:28:02 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri Apr 7 13:28:02 2006 Subject: [Numpy-discussion] Silly array question In-Reply-To: References: Message-ID: <4436CB1C.3040308@noaa.gov> Webb Sprague wrote: > In R, if you have an Nx2 array of integers, you can use that to index > an TxS array, yielding a 1xN result. this seems to work: >>> import numpy as N >>> I = N.array([[0,0], [1,1], [2,2], [1,1]]) >>> I array([[0, 0], [1, 1], [2, 2], [1, 1]]) >>> M = N. array( [[1, 2, 3, 4], [5, 6, 7, 8], [9,10,11, 12], [13, 14, 15, 16]]) >>> M array([[ 1, 2, 3, 4], [ 5, 6, 7, 8], [ 9, 10, 11, 12], [13, 14, 15, 16]]) >>> M[I[:,0], I[:,1]] array([ 1, 6, 11, 6]) -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From ndarray at mac.com Fri Apr 7 13:56:02 2006 From: ndarray at mac.com (Sasha) Date: Fri Apr 7 13:56:02 2006 Subject: [Numpy-discussion] Silly array question In-Reply-To: <4436CB1C.3040308@noaa.gov> References: <4436CB1C.3040308@noaa.gov> Message-ID: One more obfuscated numpy entry: >>> M[tuple(transpose(I))] array([ 1, 6, 11, 6]) On 4/7/06, Christopher Barker wrote: > > > > Webb Sprague wrote: > > In R, if you have an Nx2 array of integers, you can use that to index > > an TxS array, yielding a 1xN result. > > this seems to work: > > >>> import numpy as N > >>> I = N.array([[0,0], [1,1], [2,2], [1,1]]) > >>> I > array([[0, 0], > [1, 1], > [2, 2], > [1, 1]]) > > >>> M = N. 
array( [[1, 2, 3, 4], [5, 6, 7, 8], [9,10,11, 12], [13, 14, > 15, 16]]) > > >>> M > array([[ 1, 2, 3, 4], > [ 5, 6, 7, 8], > [ 9, 10, 11, 12], > [13, 14, 15, 16]]) > > >>> M[I[:,0], I[:,1]] > array([ 1, 6, 11, 6]) > > -- > Christopher Barker, Ph.D. > Oceanographer > > NOAA/OR&R/HAZMAT (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > that extends applications into web and mobile media. Attend the live > webcast > and join the prime developer group breaking into this new coding > territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From webb.sprague at gmail.com Fri Apr 7 14:00:10 2006 From: webb.sprague at gmail.com (Webb Sprague) Date: Fri Apr 7 14:00:10 2006 Subject: [Numpy-discussion] Silly array question In-Reply-To: References: <4436CB1C.3040308@noaa.gov> Message-ID: I appreciate everyone's help, but is there a NON obfuscated way to do this without looping? I think Chris's is my favorite, but I didn't know I was starting a contest :) > >>> M[I[:,0], I[:,1]] > array([ 1, 6, 11, 6]) W From webb.sprague at gmail.com Fri Apr 7 14:05:04 2006 From: webb.sprague at gmail.com (Webb Sprague) Date: Fri Apr 7 14:05:04 2006 Subject: [Numpy-discussion] Silly array question In-Reply-To: References: <4436CB1C.3040308@noaa.gov> Message-ID: Ok, so now I get it M[(tuple for rows), (tuple for columns)] Whew On 4/7/06, Webb Sprague wrote: > I appreciate everyone's help, but is there a NON obfuscated way to do > this without looping? I think Chris's is my favorite, but I didn't > know I was starting a contest :) > > > >>> M[I[:,0], I[:,1]] > > array([ 1, 6, 11, 6]) > > W > From tim.hochberg at cox.net Fri Apr 7 14:16:06 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Fri Apr 7 14:16:06 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <4436C965.8020808@hawaii.edu> References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu> Message-ID: <4436D6D1.6040302@cox.net> Eric Firing wrote: > Sasha wrote: > >> >> >> On 4/7/06, *Tim Hochberg* > > wrote: >> >> ... >> In general, I'm skeptical of adding more methods to the ndarray >> object >> -- there are plenty already. >> >> >> I've also proposed to drop "fill" in favor of optimizing x[...] = >> . Having both "fill" and "filled" in the interface is plain >> awkward. You may like the combined proposal better because it does >> not change the total number of methods :-) >> >> >> In addition, it appears that both the method and function >> versions of >> filled are "dangerous" in the sense that they sometimes return the >> array >> itself and sometimes a copy. >> >> >> This is true in ma, but may certainly be changed. >> >> >> Finally, changing ndarray to support masked array feels a bit >> like the >> tail wagging the dog. >> >> I disagree. Numpy is pretty much alone among the array languages >> because it does not have "native" support for missing values. For >> the floating point types some rudimental support for nans exists, >> but is not really usable. 
There is no missing values machanism for >> integer types. I believe adding "filled" and maybe "mask" to ndarray >> (not necessarily under these names) could be a meaningful step >> towards "native" support for missing values. > > > I agree strongly with you, Sasha. I get the impression that the world > of numerical computation is divided into those who work with idealized > "data", where nothing is missing, and those who work with real > observations, where there is always something missing. I think your experience is clouding your judgement here. Or at least this comes off as unnecessarily perjorative. There's a large class of people who work with data that doesn't have missing values either because of the nature of data acquisition or because they're doing simulations. I take zillions of measurements with digital oscillopscopes and they *never* have missing values. Clipped values, yes, but even if I somehow could queery the scope about which values were actually clipped or simply make an educated guess based on their value, the facilities of ma would be useless to me. The clipped values are what I would want in any case. I also do a lot of work with simulations derived from this and other data. I don't come across missing values here but again, if I did, the way ma works would not help me. I'd have to treat them either by rejecting the data outright or by some sort of interpolation. > As an oceanographer, I am solidly in the latter category. If good > support for missing values is not built in, it has to be bolted on, > and it becomes clunky and awkward. This may be a false dichotomy. It's certainly not obvious to me that this is so. At least if "bolted on" means "not adding a filled method to ndarray". > I was reluctant to speak up about this earlier because I thought it > was too much to ask of Travis when he was in the midst of putting > numpy on solid ground. But I am delighted that missing value support > has a champion among numpy developers, and I agree that now is the > time to change it from "bolted on" to "integrated". I have no objection to ma support improving. In fact I think it would be great although I don't forsee it helping me anytime soon. I also support Sasha's goal of being able to mix MaskedArrays and ndarrays reasonably seemlessly. However, I do think the situation needs more thought. Slapping filled and mask onto ndarray is the path of least resistance, but it's not clear that it's the best one. If we do decide we are going to add both of these methods to ndarray (with filled returning a copy!), then it may worth considering making ndarray a subclass of MaskedArray. Conceptually this makes sense, since at this point an ndarray will just be a MaskedArray where mask is always False. I think that they could share much of the implementation except that ndarray would be set up to use methods that ignored the mask attribute since they would know that it's always false. Even that might not be worth it, since the check for whether mask is True/False is just a pointer compare. It may in fact be best just to do away with MaskedArray entirely, moving the functionality into ndarray. That may have performance implications, although I don't seem them at the moment, and I don't know if there are other methods/attributes that this would imply need to be moved over, although it looks like just mask, filled and possibly filled_value, although the latter looks a little dubious to me. Either of the above two options would certainly improve the quality of MaskedArray. 
Copy for instance seems not to have been implemented, and who knows what other dark corners remain unexplored here. There's a whole spectrum of possibilities here from ones that don't intrude on ndarray at all to ones that profoundly change it. Sasha's suggestion looks like it's probably the simplest thing in the short term, but I don't know that it's the best long term solution. I think it needs more thought and discussion, which is after all what Sasha asked for ;) Regards, -tim From Chris.Barker at noaa.gov Fri Apr 7 15:13:02 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri Apr 7 15:13:02 2006 Subject: [Numpy-discussion] Silly array question In-Reply-To: References: <4436CB1C.3040308@noaa.gov> Message-ID: <4436E3C9.2040807@noaa.gov> Sasha wrote: > One more obfuscated numpy entry: > >>>> M[tuple(transpose(I))] > array([ 1, 6, 11, 6]) exactly. Can anyone explain why that works, but: M[transpose(I)] or M[I] doesn't? -Chris - Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From efiring at hawaii.edu Fri Apr 7 15:37:03 2006 From: efiring at hawaii.edu (Eric Firing) Date: Fri Apr 7 15:37:03 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <4436D6D1.6040302@cox.net> References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> Message-ID: <4436E95B.4090009@hawaii.edu> Tim Hochberg wrote: > Eric Firing wrote: > >> Sasha wrote: >> >>> >>> >>> On 4/7/06, *Tim Hochberg* >> > wrote: >>> >>> ... >>> In general, I'm skeptical of adding more methods to the ndarray >>> object >>> -- there are plenty already. >>> >>> >>> I've also proposed to drop "fill" in favor of optimizing x[...] = >>> . Having both "fill" and "filled" in the interface is plain >>> awkward. You may like the combined proposal better because it does >>> not change the total number of methods :-) >>> >>> >>> In addition, it appears that both the method and function >>> versions of >>> filled are "dangerous" in the sense that they sometimes return the >>> array >>> itself and sometimes a copy. >>> >>> >>> This is true in ma, but may certainly be changed. >>> >>> >>> Finally, changing ndarray to support masked array feels a bit >>> like the >>> tail wagging the dog. >>> >>> I disagree. Numpy is pretty much alone among the array languages >>> because it does not have "native" support for missing values. For >>> the floating point types some rudimental support for nans exists, >>> but is not really usable. There is no missing values machanism for >>> integer types. I believe adding "filled" and maybe "mask" to ndarray >>> (not necessarily under these names) could be a meaningful step >>> towards "native" support for missing values. >> >> >> >> I agree strongly with you, Sasha. I get the impression that the world >> of numerical computation is divided into those who work with idealized >> "data", where nothing is missing, and those who work with real >> observations, where there is always something missing. > > > I think your experience is clouding your judgement here. Or at least > this comes off as unnecessarily perjorative. There's a large class of > people who work with data that doesn't have missing values either > because of the nature of data acquisition or because they're doing > simulations. I take zillions of measurements with digital oscillopscopes > and they *never* have missing values. 
Clipped values, yes, but even if I > somehow could queery the scope about which values were actually clipped > or simply make an educated guess based on their value, the facilities of > ma would be useless to me. The clipped values are what I would want in > any case. I also do a lot of work with simulations derived from this > and other data. I don't come across missing values here but again, if I > did, the way ma works would not help me. I'd have to treat them either > by rejecting the data outright or by some sort of interpolation. Tim, The point is well-taken, and I apologize. I stated my case badly. (I would be delighted if I did not have to be concerned with missing values-they are a pain regardless of how well a numerical package handles them.) > >> As an oceanographer, I am solidly in the latter category. If good >> support for missing values is not built in, it has to be bolted on, >> and it becomes clunky and awkward. > > > This may be a false dichotomy. It's certainly not obvious to me that > this is so. At least if "bolted on" means "not adding a filled method to > ndarray". I probably overstated it, but I think we actually agree. I intended to lend support to the priority of making missing-value support as seamless and painless as possible. It will help some people, and not others. > >> I was reluctant to speak up about this earlier because I thought it >> was too much to ask of Travis when he was in the midst of putting >> numpy on solid ground. But I am delighted that missing value support >> has a champion among numpy developers, and I agree that now is the >> time to change it from "bolted on" to "integrated". > > > > I have no objection to ma support improving. In fact I think it would be > great although I don't forsee it helping me anytime soon. I also support > Sasha's goal of being able to mix MaskedArrays and ndarrays reasonably > seemlessly. > > However, I do think the situation needs more thought. Slapping filled > and mask onto ndarray is the path of least resistance, but it's not > clear that it's the best one. > > If we do decide we are going to add both of these methods to ndarray > (with filled returning a copy!), then it may worth considering making > ndarray a subclass of MaskedArray. Conceptually this makes sense, since > at this point an ndarray will just be a MaskedArray where mask is always > False. I think that they could share much of the implementation except > that ndarray would be set up to use methods that ignored the mask > attribute since they would know that it's always false. Even that might > not be worth it, since the check for whether mask is True/False is just > a pointer compare. > > It may in fact be best just to do away with MaskedArray entirely, moving > the functionality into ndarray. That may have performance implications, > although I don't seem them at the moment, and I don't know if there are > other methods/attributes that this would imply need to be moved over, > although it looks like just mask, filled and possibly filled_value, > although the latter looks a little dubious to me. > This is exactly the option that I was afraid to bring up because I thought it might be too disruptive, and because I am not contributing to numpy, and probably don't have the competence (or time) to do so. > Either of the above two options would certainly improve the quality of > MaskedArray. Copy for instance seems not to have been implemented, and > who knows what other dark corners remain unexplored here. 
> > There's a whole spectrum of possibilities here from ones that don't > intrude on ndarray at all to ones that profoundly change it. Sasha's > suggestion looks like it's probably the simplest thing in the short > term, but I don't know that it's the best long term solution. I think it > needs more thought and discussion, which is after all what Sasha asked > for ;) Exactly! Thank you for broadening the discussion. Eric From ndarray at mac.com Fri Apr 7 15:38:04 2006 From: ndarray at mac.com (Sasha) Date: Fri Apr 7 15:38:04 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <4436D6D1.6040302@cox.net> References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> Message-ID: On 4/7/06, Tim Hochberg wrote: > [...] > > However, I do think the situation needs more thought. Slapping filled > and mask onto ndarray is the path of least resistance, but it's not > clear that it's the best one. Completely agree. I have many gripes about current ma implementation of both "filled" and "mask". filled: 1. I don't like default fill value. It should be mandatory to supply fill value. 2. It should return masked array (with trivial mask), not ndarray. 3. The name conflicts with the "fill" method. 4. View/Copy inconsistency. Does not provide a method to fill values in-place. mask: 1. I've got rid of mask returning None in favor of False_ (boolean array scalar), but it is still not perfect. I would prefer data.shape == mask.shape invariant and if space saving/performance is deemed necessary use zero-stride arrays. 2. I don't like the name. "Missing" or "na" would be better. > If we do decide we are going to add both of these methods to ndarray > (with filled returning a copy!), then it may worth considering making > ndarray a subclass of MaskedArray. Conceptually this makes sense, since > at this point an ndarray will just be a MaskedArray where mask is always > False. I think that they could share much of the implementation except > that ndarray would be set up to use methods that ignored the mask > attribute since they would know that it's always false. Even that might > not be worth it, since the check for whether mask is True/False is just > a pointer compare. > The tail becoming the dog! Yet I agree, this makes sense from the implementation point of view. From OOP perspective this would make sense if arrays were immutable, but since mask is settable in MaskedArray, making it constant in the subclass will violate the substitution principle. I would not object making mask read only, however. > It may in fact be best just to do away with MaskedArray entirely, moving > the functionality into ndarray. That may have performance implications, > although I don't seem them at the moment, and I don't know if there are > other methods/attributes that this would imply need to be moved over, > although it looks like just mask, filled and possibly filled_value, > although the latter looks a little dubious to me. > I think MA can coexist with ndarray and share the interface. Ndarray can use special bit-patterns like IEEE NaN to indicate missing floating point values. Add-on modules can redefine arithmetic to make INT_MIN behave as a missing marker for signed integers (R, K and J (I think) languages use this approach). Applications that need missing values support across the board will use MA. > Either of the above two options would certainly improve the quality of > MaskedArray. 
Copy for instance seems not to have been implemented, and > who knows what other dark corners remain unexplored here. > More (corners) than you want to know about! Reimplementing MA in C would be a worthwhile goal (and what you suggest seems to require just that), but it is too big of a project. I suggest that we focus on the interface first. If existing MA interface is rejected (which is likely) for ndarray, we can easily experiment with the alternatives within MA, which is pure python. > There's a whole spectrum of possibilities here from ones that don't > intrude on ndarray at all to ones that profoundly change it. Sasha's > suggestion looks like it's probably the simplest thing in the short > term, but I don't know that it's the best long term solution. I think it > needs more thought and discussion, which is after all what Sasha asked > for ;) Exactly! From robert.kern at gmail.com Fri Apr 7 15:39:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Fri Apr 7 15:39:02 2006 Subject: [Numpy-discussion] Re: Silly array question In-Reply-To: <4436E3C9.2040807@noaa.gov> References: <4436CB1C.3040308@noaa.gov> <4436E3C9.2040807@noaa.gov> Message-ID: Christopher Barker wrote: > Sasha wrote: > >> One more obfuscated numpy entry: >> >>>>> M[tuple(transpose(I))] >> >> array([ 1, 6, 11, 6]) > > exactly. Can anyone explain why that works, but: > > M[transpose(I)] > > or > M[I] > > doesn't? There's some typechecking going on in __getitem__. Tuples are presumed to mean that each item in the tuple is indexing on a different axis. Non-tuples are presumed to be fancy array-indexing. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From pgmdevlist at mailcan.com Fri Apr 7 15:54:01 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Fri Apr 7 15:54:01 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <4436D6D1.6040302@cox.net> References: <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> Message-ID: <200604071844.37724.pgmdevlist@mailcan.com> Folks, I'm more or less in Eric's field (hydrology), and we do have to deal with missing values, that we can't interpolate straightforwardly (that is, without some dark statistical magic). Purely discarding the data is not an option either. MA fills the need, most of it. I think one of the issues is what is meant by 'masked data': - a missing observation ? - a NAN ? - a data we don't want to consider at one particular point ? For the last point, think about raster maps or bitmaps: calculations should be performed on a chunk of data, the initial data left untouched, and the result should both have the same size as the original, and valid only on the initial chunk. The current MA implementation, with its _data part and is _mask part, works nicely for the 3rd point. - I wonder whether implementing a 'filled' method for ndarrays is really better than letting the user create a MaskedArray, where the NANs are masked.In any case, a 'filled' method should always return a copy, as it's no longer the initial data. - I'm not sure what to do with the idea of making ndarray a subclass of MA . One on side, Tim pointed rightly that a ndarray is just a MA with a 'False' mask. Actually, I'm a bit frustrated with the standard 'asarray' that shows up in many functions. 
I'd prefer something like "if the argument is a non-numpy sequence (tuples,lists), transforming it in a ndarray, but if it's already a ndarray or a MA, leave it as it is. Don't touch the mask if present". That's how MA.asarray works, but unfortunately the std "asarray" gets rid of the mask (and you end up with something which is not what you'd expect). A 'mask=False' attribute in ndarray would be nice. On another, some methods/functions make sense only on unmasked ndarray (FFT, solving equations), some others are a bit tricky to implement (diff ? median...). Some exception could be raised if the arguments of these functions return True with ismasked (cf below), or that could be simplified if 'mask' was a default attribute of numarrays. I regularly have to use a ismasked function (cf below). def ismasked(a): if hasattr(a,'mask'): return a.mask.any() else: return False We're going towards MA as the default object. But then again, what would be the behavior to deal with missing values ? Using R-like na.actions ? That'd be great, but it's getting more complex. Oh, and another thing: if 'mask', or 'masked' becomes a default attribute of ndarrays, how do we define a mask? As a boolean ndarray whose 'mask' is always 'False' ? How do you __repr__ it ? - I agree that 'filled_value' is not very useful. If I want to fill an array, I'm happy to specify what value I want it filled with. In facts, I'd be happier to specifiy 'values'. I often have to work with 2D arrays, each column representing a different variable. If this array has to be filled, I'd like each column to be filled with one particular value, not necessarily the same along all columns: something like column_stack([A[:,k].filled(filler[k]) for k in range(A.shape[1])]) with filler a 1xA.shape[1] array of filling values. Of course, we could imagine the same thing for rows, or higher dimensions... Sorry for the rants... From pgmdevlist at mailcan.com Fri Apr 7 16:13:02 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Fri Apr 7 16:13:02 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: <4436D6D1.6040302@cox.net> Message-ID: <200604071914.44752.pgmdevlist@mailcan.com> > filled: > 1. I don't like default fill value. It should be mandatory to > supply fill value. +1 > 2. It should return masked array (with trivial mask), not ndarray. -1. Unless 'mask/missing/na' becomes a default in ndarray, and other basic ndarray functions know how to deal with MA seamlessly > 3. The name conflicts with the "fill" method. fillmask ? clog ? > 4. View/Copy inconsistency. Does not provide a method to fill values > in-place. But once again, I don't think it should be the default behaviour ! A filled array should always be a copy of the initial array. Changing in place means changing the initial data, and I foresee lots of fun to find the original back. No ctrl+Z. > mask: > > 1. I've got rid of mask returning None in favor of False_ (boolean > array scalar), but it is still not perfect. I would prefer data.shape > == mask.shape invariant and if space saving/performance is deemed > necessary use zero-stride arrays. You,lost me on the strides, but I agree with data.shape==mask.shape as a std > 2. I don't like the name. "Missing" or "na" would be better. Once again, it's a point of view. Masked data also means 'data that I don't wanna see now, but that I may want to see later'. Like masking an bitmap/raster area. +0 for na, no for missing. > I would not object making mask read only, however. Good point. 
but I was more and more thinking of the opposite. I have a set of data that I group in three classes. Plotting one class is straightforward, I just have to mask the other two. Do I really want/need three objects for the same data ? Can't I just save three masks, and then run a data[mask] ? > If existing MA interface is rejected (which is > likely) for ndarray, we can easily experiment with the alternatives > within MA, which is pure python. Er... How many of us are using MA on a regular basis ? Aren't we a minority ? It'd seem wiser to adapt MA to numpy, in Python (but maybe that's the XIXe French integration model I grew up with that makes me talk here...) From ndarray at mac.com Fri Apr 7 16:31:03 2006 From: ndarray at mac.com (Sasha) Date: Fri Apr 7 16:31:03 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <200604071844.37724.pgmdevlist@mailcan.com> References: <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> <200604071844.37724.pgmdevlist@mailcan.com> Message-ID: On 4/7/06, Pierre GM wrote: > ... > We're going towards MA as the default object. > I will be against changing the array structure to handle missing values. Let's keep the discussion focuced on the interface. Once we agree on the interface, it will be clear if any structural changes are necessary. > But then again, what would be the behavior to deal with missing values ? We can postpone this discussion as well. Just add mask attribute that returns False and filled method that returns a copy is an example of a minimalistic change. > Using R-like na.actions ? That'd be great, but it's getting more complex. > I don't like na.actions. I think missing values should behave like IEEE NaNs and in the floating point case should be represented by NaNs. The functionality provided by na.actions can always be achieved by calling an extra function (filled or compress). > Oh, and another thing: if 'mask', or 'masked' becomes a default attribute of > ndarrays, how do we define a mask? As a boolean ndarray whose 'mask' is > always 'False' ? How do you __repr__ it ? > See above. For ndarray mask is always False unless an add-on module is loaded that redefines arithmetic to recognize special bit-patterns such as NaN or INT_MIN. From tim.hochberg at cox.net Fri Apr 7 17:09:11 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Fri Apr 7 17:09:11 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> Message-ID: <4436FF73.7080408@cox.net> Sasha wrote: >On 4/7/06, Tim Hochberg wrote: > > >>[...] >> >>However, I do think the situation needs more thought. Slapping filled >>and mask onto ndarray is the path of least resistance, but it's not >>clear that it's the best one. >> >> > >Completely agree. I have many gripes about current ma implementation >of both "filled" and "mask". > >filled: > >1. I don't like default fill value. It should be mandatory to >supply fill value. > > That makes perfect sense. If anything should have a default fill value, it's the functsion calling filled, not the arrays themselves. >2. It should return masked array (with trivial mask), not ndarray. > > So, just with mask = False? In a follow on message Pierre disagress and claims that what you really want is the ndarray since not everything will accept. Then I guess you'd need to call b.filled(fill).data. I agree with Sasha in principle but Pierre, perhaps in practice. 
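(A tiny session may make the two positions concrete; the first line is
how numpy.core.ma behaves today, the comment sketches the hypothetical
alternative under discussion:)

import numpy.core.ma as MA

b = MA.array([1, 2, 3], mask=[0, 1, 0])
nd = b.filled(0)    # today: a plain ndarray, array([1, 0, 3])

# Under Sasha's proposal (hypothetical), filled would instead return a
# masked array with a trivial mask, and code that needs the bare
# ndarray -- Pierre's concern -- would read b.filled(0).data instead.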
I'm almost suggested it get renames a.asndarray(fill), except that asXXX has the wrong conotations. I think this one needs to bounce around some more. >3. The name conflicts with the "fill" method. > > I thought you wanted to kill that. I'd certainly support that. Can't we just special case __setitem__ for that one case so that the performance is just as good if performance is really the issue? >4. View/Copy inconsistency. Does not provide a method to fill values in-place. > > b[b.mask] = fill_value; b.unmask() seems to work for this purpose. Can we just have filled return a copy? >mask: > >1. I've got rid of mask returning None in favor of False_ (boolean >array scalar), but it is still not perfect. I would prefer data.shape >== mask.shape invariant and if space saving/performance is deemed >necessary use zero-stride arrays. > > Interesting idea. Is that feasible yet? >2. I don't like the name. "Missing" or "na" would be better. > > I'm not on board here, although really I'd like to here from other people who use the package. 'na' seems to cryptic to me and 'missing' to specific -- there might be other reasons to mask a value other it being missing. The problem with mask is that it's not clear whether True means the data is useful or unuseful. Keep throwing out names, maybe one will stick. > > >>If we do decide we are going to add both of these methods to ndarray >>(with filled returning a copy!), then it may worth considering making >>ndarray a subclass of MaskedArray. Conceptually this makes sense, since >>at this point an ndarray will just be a MaskedArray where mask is always >>False. I think that they could share much of the implementation except >>that ndarray would be set up to use methods that ignored the mask >>attribute since they would know that it's always false. Even that might >>not be worth it, since the check for whether mask is True/False is just >>a pointer compare. >> >> >> > >The tail becoming the dog! Yet I agree, this makes sense from the >implementation point of view. From OOP perspective this would make >sense if arrays were immutable, but since mask is settable in >MaskedArray, making it constant in the subclass will violate the >substitution principle. I would not object making mask read only, >however. > > How do you set the mask? I keep getting attribute errors when I try it. And unmask would be a noop on an ndarray. > > >>It may in fact be best just to do away with MaskedArray entirely, moving >>the functionality into ndarray. That may have performance implications, >>although I don't seem them at the moment, and I don't know if there are >>other methods/attributes that this would imply need to be moved over, >>although it looks like just mask, filled and possibly filled_value, >>although the latter looks a little dubious to me. >> >> >> >I think MA can coexist with ndarray and share the interface. Ndarray >can use special bit-patterns like IEEE NaN to indicate missing >floating point values. Add-on modules can redefine arithmetic to make >INT_MIN behave as a missing marker for signed integers (R, K and J (I >think) languages use this approach). Applications that need missing >values support across the board will use MA. > > > > >>Either of the above two options would certainly improve the quality of >>MaskedArray. Copy for instance seems not to have been implemented, and >>who knows what other dark corners remain unexplored here. >> >> >> >More (corners) than you want to know about! 
Reimplementing MA in C >would be a worthwhile goal (and what you suggest seems to require just >that), but it is too big of a project. I suggest that we focus on the >interface first. If existing MA interface is rejected (which is >likely) for ndarray, we can easily experiment with the alternatives >within MA, which is pure python. > > Perhaps MaskedArray should inherit from ndarray for the time being. Many of the methods would need to reimplemented anyway, but it would make asanyarray work. Someone was just complaining about asarray munging his arrays. That's correct behaviour, but it would be nice if asanyarray did the right thing. I suppose we could just special case asanyarray to ignore MaskedArrays, that might be better since it's less constraining from an implementation side too. >>There's a whole spectrum of possibilities here from ones that don't >>intrude on ndarray at all to ones that profoundly change it. Sasha's >>suggestion looks like it's probably the simplest thing in the short >>term, but I don't know that it's the best long term solution. I think it >>needs more thought and discussion, which is after all what Sasha asked >>for ;) >> >> > >Exactly! > > This may be an oportune time to propose something that's been cooking in the back of my head for a week or so now: A stripped down array superclass. The details of this are not at all locked down, but here's a strawman proposal. We add an array superclass. call it basearray, that has the same C-structure as the existing ndarray. However, it has *no* methods or attributes. It's simply a big blob of data. Functions that work on the C structure of arrays (ufuncs, etc) would still work on this arrays, as would asarray, so it could be converted to an ndarray as necessary. In addition, we would supply a minimal set of functions that would operate on this object. These functions would be chosen so that the current array interface could be implemented on top of them and the basearray object in pure python. These functions would be things like set_shape(a, shape), etc. They would be segregated off in their own namespace, not in the numpy core. [Note that I'm not proposing we actually implement ndarray this way, just that we make is possible]. This leads to several useful outcomes. 1. If we're careful, this could be the basic array object that we propose, at least for the first roun,d for inclusion in the Python core. It's not useful for anything but passing data betwen various application that understand the data structure, but that in itself could be a huge win. And the fact that it's dirt simple would probably be an advantage to getting it into the core. 2. It provides a useful marker class. MA could inherit from it (and use itself for it's data attribute) and then asanyarray would behave properly. MA could also use this, or a subclass, as the mask object preventing anyone from accidentally using it as data (they could always use it on purpose with asarray). 3. It provides a platform for people to build other, ndarray-like classes in Pure python. This is my main interest. I've put together a thin shell over numpy that strips it down to it's abolute essentials including a stripped down version of ndarray that removes most of the methods. All of the __array_wrap__[1] stuff works quite well most of the time, but there's still some issues with being a subclass when this particular class is conceptually a superclass. If we had an array superclass of some sort, I believe that these would be resolved. 
In principle at least, this shouldn't be that hard. I think it should mostly be rearanging some code and adding some wrappers to existing functions. That's in principle. In practice, I'm not certain yet as I haven't investigated the code in question in much depth yet. I've been meaning to write this up into a more fleshed out proposal, but I got distracted by the whole Protocol discussion on python-dev3000. This writeup is pretty weak, but hopefully you get the idea. Anyway, this is somethig that I would be willing to put some time on that would benefit both me and probably the MA folks as well. Regards, -tim From efiring at hawaii.edu Fri Apr 7 17:27:09 2006 From: efiring at hawaii.edu (Eric Firing) Date: Fri Apr 7 17:27:09 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <4436FF73.7080408@cox.net> References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> <4436FF73.7080408@cox.net> Message-ID: <44370328.2060508@hawaii.edu> Tim Hochberg wrote: [...] > >> 2. I don't like the name. "Missing" or "na" would be better. >> >> > I'm not on board here, although really I'd like to here from other > people who use the package. 'na' seems to cryptic to me and 'missing' to > specific -- there might be other reasons to mask a value other it being > missing. The problem with mask is that it's not clear whether > True means the data is useful or unuseful. Keep throwing out names, > maybe one will stick. "hide" or "hidden"? A mask value of True essentially hides the underlying value. Eric From ndarray at mac.com Fri Apr 7 17:56:24 2006 From: ndarray at mac.com (Sasha) Date: Fri Apr 7 17:56:24 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <4436FF73.7080408@cox.net> References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> <4436FF73.7080408@cox.net> Message-ID: On 4/7/06, Tim Hochberg wrote: > [...] > Perhaps MaskedArray should inherit from ndarray for the time being. Many > of the methods would need to reimplemented anyway, but it would make > asanyarray work. Someone was just complaining about asarray munging his > arrays. That's correct behaviour, but it would be nice if asanyarray did > the right thing. I suppose we could just special case asanyarray to > ignore MaskedArrays, that might be better since it's less constraining > from an implementation side too. > Just for the record. Currently MA does not inherit from ndarray. There are some benefits to be gained from changing MA design from containment to inheritance, by I am very sceptical about the use of inheritance in the array setting. > > > This may be an oportune time to propose something that's been cooking in > the back of my head for a week or so now: A stripped down array > superclass. This is a very worthwhile idea and I hate to see it burried in a non-descriptive thread. I've copied your proposal to the wiki at . From tim.hochberg at cox.net Fri Apr 7 18:44:02 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Fri Apr 7 18:44:02 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> <4436FF73.7080408@cox.net> Message-ID: <44371593.8060806@cox.net> Sasha wrote: >On 4/7/06, Tim Hochberg wrote: > > >>[...] >>Perhaps MaskedArray should inherit from ndarray for the time being. Many >>of the methods would need to reimplemented anyway, but it would make >>asanyarray work. 
Someone was just complaining about asarray munging his >>arrays. That's correct behaviour, but it would be nice if asanyarray did >>the right thing. I suppose we could just special case asanyarray to >>ignore MaskedArrays, that might be better since it's less constraining >>from an implementation side too. >> >> >> >Just for the record. Currently MA does not inherit from ndarray. > > Right, I checked that. That's why asanyarray won't work now with MA (unless someone changed the implementation of that while I wan't looking. >There are some benefits to be gained from changing MA design from >containment to inheritance, by I am very sceptical about the use of >inheritance in the array setting. > > That's probably a sensible position. Still it would be nice to have asanyarray pass masked arrays through somehow. I haven't thought this through very well, but I wonder if it would make sense for asanyarray to pass any object that supplies __array__. I'm leary of special casing asanyarray just for MA; somehow that seems the wrong approach. >>This may be an oportune time to propose something that's been cooking in >>the back of my head for a week or so now: A stripped down array >>superclass. >> >> > >This is a very worthwhile idea and I hate to see it burried in a >non-descriptive thread. I've copied your proposal to the wiki at >. > > Thanks for doing that. I'm glad you like the general idea. I do plan to write it through and try to get a better handle on what this would entail and what the consequences would be. However, I'm not sure exactly when I'll get around to it so it's probably better that a rough draft be out there for people to think about in the interim. -tim > > > From ndarray at mac.com Fri Apr 7 18:47:09 2006 From: ndarray at mac.com (Sasha) Date: Fri Apr 7 18:47:09 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <4436FF73.7080408@cox.net> References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> <4436FF73.7080408@cox.net> Message-ID: On 4/7/06, Tim Hochberg wrote: > [...] > >1. I don't like default fill value. It should be mandatory to > >supply fill value. > > > > > That makes perfect sense. If anything should have a default fill value, > it's the functsion calling filled, not the arrays themselves. > It looks like we are getting close to a consensus on this one. I will remove fill_value attribute. [...] > >3. The name conflicts with the "fill" method. > > > > > I thought you wanted to kill that. I'd certainly support that. Can't we > just special case __setitem__ for that one case so that the performance > is just as good if performance is really the issue? > I'll propose a patch. > >4. View/Copy inconsistency. Does not provide a method to fill values in-place. > > > > > b[b.mask] = fill_value; b.unmask() > > seems to work for this purpose. Can we just have filled return a copy? > +1 > >mask: > > > >1. I've got rid of mask returning None in favor of False_ (boolean > >array scalar), but it is still not perfect. I would prefer data.shape > >== mask.shape invariant and if space saving/performance is deemed > >necessary use zero-stride arrays. > > > > > Interesting idea. Is that feasible yet? > It is not feasible in pure python module like ma, but easy in ndarray. We can also reset the writeable flag to avoid various problems that zero strides may cause. I'll propose a patch. > >2. I don't like the name. "Missing" or "na" would be better. 
> > > > > I'm not on board here, although really I'd like to hear from other > people who use the package. 'na' seems too cryptic to me and 'missing' too > specific -- there might be other reasons to mask a value other than it being > missing. The problem with mask is that it's not clear whether > True means the data is useful or unuseful. Keep throwing out names, > maybe one will stick. > The problem with the "mask" name is that ndarray already has an unrelated "putmask" method. On the other hand putmask is redundant with fancy indexing. I have no other problem with the "mask" name, so we may just decide to get rid of "putmask". > [...] > How do you set the mask? I keep getting attribute errors when I try it. a[i] = masked makes the i-th element masked. If mask is an array, you can just set its elements. > And unmask would be a noop on an ndarray. > Yes. [...] From ndarray at mac.com Fri Apr 7 18:56:01 2006 From: ndarray at mac.com (Sasha) Date: Fri Apr 7 18:56:01 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <44371593.8060806@cox.net> References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> <4436FF73.7080408@cox.net> <44371593.8060806@cox.net> Message-ID: On 4/7/06, Tim Hochberg wrote: > [...] > Still it would be nice to have asanyarray pass masked arrays through > somehow. I haven't thought this through very well, but I wonder if it > would make sense for asanyarray to pass any object that supplies > __array__. I'm leery of special casing asanyarray just for MA; somehow > that seems the wrong approach. One possibility is to make asanyarray pass through objects that have an __array_wrap__ attribute. From pgmdevlist at mailcan.com Fri Apr 7 20:40:03 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Fri Apr 7 20:40:03 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <4436FF73.7080408@cox.net> References: <4436FF73.7080408@cox.net> Message-ID: <200604072258.34153.pgmdevlist@mailcan.com> > >2. It should return masked array (with trivial mask), not ndarray. > > So, just with mask = False? In a follow-on message Pierre disagrees and > claims that what you really want is the ndarray, since not everything > will accept one. Then I guess you'd need to call b.filled(fill).data. I > agree with Sasha in principle but Pierre, perhaps, in practice. Well, if 'mask' became a default argument of ndarray, that wouldn't be a problem any longer. I'm quite for that. > I > almost suggested it get renamed a.asndarray(fill), except that asXXX has > the wrong connotations. I think this one needs to bounce around some more. tondarray(fill) ? > >4. View/Copy inconsistency. Does not provide a method to fill values > > in-place. > seems to work for this purpose. Can we just have filled return a copy? Yes ! > > The problem with mask is that it's not clear whether > > True means the data is useful or unuseful. I have to think twice every time I want to create a mask, since True means in fact that I don't want the data, whereas True selects the data for ndarray... > "hide" or "hidden"? A mask value of True essentially hides the > underlying value. Unless there's no underlying value ;). Rose, rose... I'm happy with mask, it reminds me of GRASS and gimp > The problem with the "mask" name is that ndarray already has an unrelated > "putmask" method. On the other hand putmask is redundant with fancy > indexing. I have no other problem with the "mask" name, so we may just > decide to get rid of "putmask".
"putmask" really seems overkill indeed. I wouldn't miss it. > How do you set the mask? I keep getting attribute errors when I try it. > And unmask would be a noop on an ndarray. I've implemented something like that for some classes (inheriting from MA.MaskedArray). Never really used it yet, though #-------------------------------------------- def applymask(self,m): if not MA.is_mask(m): raise MA.MAError,"Invalid mask !" elif self._data.shape != m.shape: raise MA.MAError,"Mask and data not compatible." else: self._dmask = m > This may be an oportune time to propose something that's been cooking in > the back of my head for a week or so now: A stripped down array > superclass. That'd be great indeed, and may solve some problems reported on th list about subclassing ndarray. AAMOF, I gave up trying to use ndarray as a superclass, and rely only on MA From zdm105 at tom.com Sat Apr 8 01:56:02 2006 From: zdm105 at tom.com (=?GB2312?B?NNTCMTUtMTbJz7qjLzIxLTIyye7b2g==?=) Date: Sat Apr 8 01:56:02 2006 Subject: [Numpy-discussion] =?GB2312?B?QUTUy9PDRVhDRUy02b34ytCzodOqz/rT67LGzvG53MDt?= Message-ID: An HTML attachment was scrubbed... URL: From webb.sprague at gmail.com Sat Apr 8 20:02:11 2006 From: webb.sprague at gmail.com (Webb Sprague) Date: Sat Apr 8 20:02:11 2006 Subject: [Numpy-discussion] Unexpected change of array used to index another array Message-ID: Hi. I indexed an 10 x 10(called bigM below) with another array (OFFS_TMP below). I suppose because OFFS_TMP has negative numbers, it was changed to cycle around to 9 wherever there is a negative 1 (which is the forward version of -1 if you are a 10 x 10 matrix). You can analogous behavior with -2 => 8, etc. Is changing the indexing matrix really the correct behavior? The result of using the index seems to be fine. Has this story been told already and I didn't know it? Below is my ipython session. 
In [57]: OFFS_TMP
Out[57]:
array([[-1,  1],
       [ 0,  1],
       [ 1,  1],
       [-1,  0],
       [ 0,  0],
       [ 1,  0],
       [-1, -1],
       [ 0, -1],
       [ 1, -1]])

In [58]: bigM[OFFS_TMP]
Out[58]:
array([[[False, True, False, False, True, False, True, True, True, False],
        [False, True, False, True, True, False, False, False, True, True]],
       [[True, False, True, False, True, True, False, False, False, True],
        [False, True, False, True, True, False, False, False, True, True]],
       [[False, True, False, True, True, False, False, False, True, True],
        [False, True, False, True, True, False, False, False, True, True]],
       [[False, True, False, False, True, False, True, True, True, False],
        [True, False, True, False, True, True, False, False, False, True]],
       [[True, False, True, False, True, True, False, False, False, True],
        [True, False, True, False, True, True, False, False, False, True]],
       [[False, True, False, True, True, False, False, False, True, True],
        [True, False, True, False, True, True, False, False, False, True]],
       [[False, True, False, False, True, False, True, True, True, False],
        [False, True, False, False, True, False, True, True, True, False]],
       [[True, False, True, False, True, True, False, False, False, True],
        [False, True, False, False, True, False, True, True, True, False]],
       [[False, True, False, True, True, False, False, False, True, True],
        [False, True, False, False, True, False, True, True, True, False]]], dtype=bool)

In [59]: OFFS_TMP
Out[59]:
array([[9, 1],
       [0, 1],
       [1, 1],
       [9, 0],
       [0, 0],
       [1, 0],
       [9, 9],
       [0, 9],
       [1, 9]])

From robert.kern at gmail.com Sat Apr 8 21:17:28 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sat Apr 8 21:17:28 2006 Subject: [Numpy-discussion] Re: Unexpected change of array used to index another array In-Reply-To: References: Message-ID: Webb Sprague wrote: > Hi. > > I indexed a 10 x 10 array (called bigM below) with another array (OFFS_TMP > below). I suppose because OFFS_TMP has negative numbers, it was > changed to cycle around to 9 wherever there is a -1 (which is > the forward version of -1 if you are a 10 x 10 matrix). You can > see analogous behavior with -2 => 8, etc. Is changing the indexing matrix > really the correct behavior? The result of using the index seems to > be fine. Has this story been told already and I didn't know it? I think it's a bug. I've located the problem, but I'm not familiar with that part of the code so I'm not entirely sure how to go about fixing it. http://projects.scipy.org/scipy/numpy/ticket/49 -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
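A defensive sketch of the workaround for the behavior Webb observed, on builds where the bug is present: index with a throwaway copy so the offsets array survives intact. The names and shapes here are illustrative only, and current NumPy no longer mutates index arrays:

import numpy as np

bigM = np.zeros((10, 10), dtype=bool)
OFFS = np.array([[-1, 1], [0, 1], [1, 1]])

# Index through a copy so any in-place wraparound hits the copy,
# not the original offsets.
result = bigM[OFFS.copy()]
assert (OFFS == [[-1, 1], [0, 1], [1, 1]]).all()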
From webb.sprague at gmail.com Sun Apr 9 15:21:01 2006 From: webb.sprague at gmail.com (Webb Sprague) Date: Sun Apr 9 15:21:01 2006 Subject: [Numpy-discussion] numpy.floor() is supposed to return an int, but returns a float Message-ID: Could someone explain this behavior:

In [13]: type(N.floor(1))
Out[13]: <type 'numpy.float64'>

In [14]: N.floor?
Type:        ufunc
String Form: <ufunc 'floor'>
Namespace:   Interactive
Docstring:
    y = floor(x) elementwise largest integer <= x

I wouldn't complain, except the only time I use floor() is to make indices (dividing ages by age widths, for example). Thanks! From tim.hochberg at cox.net Sun Apr 9 15:30:02 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 9 15:30:02 2006 Subject: [Numpy-discussion] numpy.floor() is supposed to return an int, but returns a float In-Reply-To: References: Message-ID: <44398AFD.4050304@cox.net> Webb Sprague wrote: >Could someone explain this behavior: > >In [13]: type(N.floor(1)) >Out[13]: <type 'numpy.float64'> > >In [14]: N.floor? >Type: ufunc >String Form: <ufunc 'floor'> >Namespace: Interactive >Docstring: > y = floor(x) elementwise largest integer <= x > >I wouldn't complain, except the only time I use floor() is to make >indices (dividing ages by age widths, for example). > > Well, floor returns an integer, but not an int -- it's an integral floating point value. What you want is: numpy.floor(1).astype(int) (If you're only using scalars, you might also consider int(floor(x)) instead.) Regards, -tim >Thanks! > > From webb.sprague at gmail.com Sun Apr 9 15:40:02 2006 From: webb.sprague at gmail.com (Webb Sprague) Date: Sun Apr 9 15:40:02 2006 Subject: [Numpy-discussion] numpy.floor() is supposed to return an int, but returns a float In-Reply-To: <44398AFD.4050304@cox.net> References: <44398AFD.4050304@cox.net> Message-ID: I think the docstring implies that numpy.floor() returns an integer value. One can cast the float value to a usable integer value, but either the docstring should read something different or the function should be changed (my preference). "y = floor(x) elementwise largest integer <= x" is the docstring. As far as "integral valued float" versus "integer", this distinction seems a little obscure... I am sure the difference is very important in some contexts, but I for one think that floor should return a straight-up integer, if just for code style (see example below). Plus it will be upcast to a float whenever necessary, so floor(4.5) + .75 == 4.75 whether floor() returns an int or a float. fooMatrix[numpy.floor(age/ageWidth)] is better (easier to type, read, and debug) than fooMatrix[numpy.floor(age/ageWidth).astype(int)] If there is an explanation as to why an integral valued float is a better return value, I would be interested in a link.
Thx W From robert.kern at gmail.com Sun Apr 9 15:46:04 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 9 15:46:04 2006 Subject: [Numpy-discussion] Re: numpy.floor() is supposed to return an int, but returns a float In-Reply-To: References: <44398AFD.4050304@cox.net> Message-ID: Webb Sprague wrote: > If there is an explanation as to why an integral valued float is a > better return value, I would be interested in a link.

In [4]: import numpy

In [5]: numpy.floor(2.**50)
Out[5]: 1125899906842624.0

In [6]: numpy.floor(2.**50).astype(int)
Out[6]: 2147483647

-- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From tim.hochberg at cox.net Sun Apr 9 16:07:02 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 9 16:07:02 2006 Subject: [Numpy-discussion] numpy.floor() is supposed to return an int, but returns a float In-Reply-To: References: <44398AFD.4050304@cox.net> Message-ID: <443993E3.1090901@cox.net> Webb Sprague wrote: >I think the docstring implies that numpy.floor() returns an integer >value. > You've been programming too much! Everywhere but the computer programming world, 1.0 is an integer. Even there, many (most?) computer languages avoid the term integer, using int, Int or something similar. The distinction made between ints and integral floating point values is mostly an artificial one resulting from implementation issues. Making this distinction is also a handy, if imperfect, proxy for exact versus inexact numbers. >One can cast the float value to a usable integer value, but >either the docstring should read something different or the function >should be changed (my preference). > >"y = floor(x) elementwise largest integer <= x" is the docstring. > >As far as "integral valued float" versus "integer", this distinction >seems a little obscure... > An integral floating point value *is* an integer, just ask any 12 year old. What's obscure is the way concepts of integers and reals get mapped to ints and floats. Don't get me wrong, these are reasonable compromises given the sad reality that computers are not so hot at representing infinite quantities. However, we get sucked into thinking that integers and ints are really the same things at our peril. Similarly for floats and reals. > I am sure the difference is very important >in some contexts, but I for one think that floor should return a >straight-up integer, > It's a ufunc. Ufuncs in general return the same type that they operate on. So, not only would this be difficult, it would make the signature of ufuncs harder to remember. Also, as Robert Kern just pointed out, not all integral FP values can be represented as ints. > if just for code style (see example below). Plus >it will be upcast to a float whenever necessary, so floor(4.5) + .75 >== 4.75 whether floor() returns an int or a float. > > Not every two-line Python function has to come pre-written -- Tim Peters on C.L.P

def webbsfloor(x):
    return numpy.floor(x).astype(int)

>fooMatrix[numpy.floor(age/ageWidth)] > >is better (easier to type, read, and debug) than > >fooMatrix[numpy.floor(age/ageWidth).astype(int)] > >If there is an explanation as to why an integral valued float is a >better return value, I would be interested in a link. > > I think there are at least four reasons: 1. It would be a pain. 2. It would make the ufuncs inconsistent. 3.
It's a thin wrapper over C's floor, so people coming from that language would be confused. 4. It wouldn't work for numbers with very large magnitudes. Pick any three. Regards, -tim From tim.hochberg at cox.net Sun Apr 9 20:09:03 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 9 20:09:03 2006 Subject: [Numpy-discussion] numpy.floor() is supposed to return an int, but returns a float In-Reply-To: <443993E3.1090901@cox.net> References: <44398AFD.4050304@cox.net> <443993E3.1090901@cox.net> Message-ID: <4439CC7E.90704@cox.net> Tim Hochberg wrote: > Webb Sprague wrote: > >> I think the docstring implies that numpy.floor() returns an integer >> value. > > You've been programming too much! > > Everywhere but the computer programming world, 1.0 is an integer. Even > there, many (most?) computer languages avoid the term integer, using > int, Int or something similar. The distinction made between ints and > integral floating point values is mostly an artificial one resulting > from implementation issues. Making this distinction is also a handy, > if imperfect, proxy for exact versus inexact numbers. > >> One can cast the float value to a usable integer value, but >> either the docstring should read something different or the function >> should be changed (my preference). >> >> "y = floor(x) elementwise largest integer <= x" is the docstring. > Let me just add that, since this seems to cause confusion, it would be appropriate to amend the docstring to be explicit that this always returns an integral floating point value. If someone wants to suggest wording, I can figure out where to put it. One possibility is: "y = floor(x) elementwise largest integer <= x; note that the result is a floating point value" or "y = floor(x) elementwise largest integral float <= x" Neither of those is great, but perhaps they'll inspire someone to do better. -tim >> >> As far as "integral valued float" versus "integer", this distinction >> seems a little obscure... >> > An integral floating point value *is* an integer, just ask any 12 year > old. What's obscure is the way concepts of integers and reals get > mapped to ints and floats. Don't get me wrong, these are reasonable > compromises given the sad reality that computers are not so hot at > representing infinite quantities. However, we get sucked into > thinking that integers and ints are really the same things at our > peril. Similarly for floats and reals. > >> I am sure the difference is very important >> in some contexts, but I for one think that floor should return a >> straight-up integer, >> > It's a ufunc. Ufuncs in general return the same type that they operate > on. So, not only would this be difficult, it would make the signature > of ufuncs harder to remember. > > Also, as Robert Kern just pointed out, not all integral FP values can > be represented as ints. > >> if just for code style (see example below). Plus >> it will be upcast to a float whenever necessary, so floor(4.5) + .75 >> == 4.75 whether floor() returns an int or a float. >> >> > Not every two-line Python function has to come pre-written -- Tim > Peters on C.L.P > > def webbsfloor(x): > return numpy.floor(x).astype(int) > >> fooMatrix[numpy.floor(age/ageWidth)] >> >> is better (easier to type, read, and debug) than >> >> fooMatrix[numpy.floor(age/ageWidth).astype(int)] >> >> If there is an explanation as to why an integral valued float is a >> better return value, I would be interested in a link. >> >> > I think there are at least four reasons: > > 1. It would be a pain. > 2.
It would make the ufuncs inconsistent. > 3. It's a thin wrapper over C's floor, so people coming from that > language would be confused. > 4. It wouldn't work for numbers with very large magnitudes. > > Pick any three. > > > Regards, > > -tim > From charlesr.harris at gmail.com Sun Apr 9 22:12:02 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun Apr 9 22:12:02 2006 Subject: [Numpy-discussion] numpy.floor() is supposed to return an int, but returns a float In-Reply-To: <4439CC7E.90704@cox.net> References: <44398AFD.4050304@cox.net> <443993E3.1090901@cox.net> <4439CC7E.90704@cox.net> Message-ID: Tim, On 4/9/06, Tim Hochberg wrote: > Let me just add that, since this seems to cause confusion, it would be > appropriate to amend the docstring to be explicit that this always > returns an integral floating point value. If someone wants to suggest > wording, I can figure out where to put it. One possibility is: > > "y = floor(x) elementwise largest integer <= x; note that the result > is a floating point value" > > or > > "y = floor(x) elementwise largest integral float <= x" How about, "for each item in x returns the largest integral float <= item." Chuck P.S. I too once found the C definition of the floor function annoying, but I got used to it. Sorta like getting used to a broken leg. The main problem is that the result can't be used as an index without conversion to a "real" integer. Integers aren't members of the reals (or rationals): apart from +/- 1, integers don't have inverses. There happens to be an injective ring homomorphism of the integers into the reals, but that is not the same thing. On the other hand, ints are generally not big enough to hold all of the integral doubles, so as a practical matter the originators made the best choice. Things do get a bit weird for large floats because above a certain threshold floats are already integral values. From charlesr.harris at gmail.com Sun Apr 9 22:21:02 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun Apr 9 22:21:02 2006 Subject: [Numpy-discussion] Unexpected change of array used to index another array In-Reply-To: References: Message-ID: On 4/8/06, Webb Sprague wrote: > > Hi. > > I indexed a 10 x 10 array (called bigM below) with another array (OFFS_TMP > below). I suppose because OFFS_TMP has negative numbers, it was > changed to cycle around to 9 wherever there is a -1 (which is > the forward version of -1 if you are a 10 x 10 matrix). You can > see analogous behavior with -2 => 8, etc. Is changing the indexing matrix > really the correct behavior? The result of using the index seems to > be fine. Has this story been told already and I didn't know it? It's the python way:

>>> a = [1,2,3]
>>> a[-1]
3

It gives a convenient way to index from the end of the array. But I'm not sure that was your question. Chuck From robert.kern at gmail.com Mon Apr 10 00:02:01 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon Apr 10 00:02:01 2006 Subject: [Numpy-discussion] Re: Unexpected change of array used to index another array In-Reply-To: References: Message-ID: Charles R Harris wrote: > > On 4/8/06, *Webb Sprague* > wrote: > > Hi. > > I indexed a 10 x 10 array (called bigM below) with another array (OFFS_TMP > below).
I suppose because OFFS_TMP has negative numbers, it was > changed to cycle around to 9 wherever there is a -1 (which is > the forward version of -1 if you are a 10 x 10 matrix). You can > see analogous behavior with -2 => 8, etc. Is changing the indexing matrix > really the correct behavior? The result of using the index seems to > be fine. Has this story been told already and I didn't know it? > > It's the python way: > >>>> a = [1,2,3] >>>> a[-1] > 3 > > It gives a convenient way to index from the end of the array. But I'm > not sure that was your question. That's not the issue. The problem was that the index array was being modified in-place simply by being used as an index array. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From arnd.baecker at web.de Mon Apr 10 04:01:05 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Mon Apr 10 04:01:05 2006 Subject: [Numpy-discussion] Speed up function on cross product of two sets? In-Reply-To: <4434D6DF.2020306@ieee.org> References: <44315633.4010600@cox.net> <4434D6DF.2020306@ieee.org> Message-ID: On Thu, 6 Apr 2006, Travis Oliphant wrote: > Arnd Baecker wrote: > > BTW, it seems that we have no Numeric to numpy transition remarks in > > www.scipy.org. I only found > > http://www.scipy.org/PearuPeterson/NumpyVersusNumeric > > and of course Travis' "Guide to NumPy" contains a detailed list of > > necessary changes in chapter 2.6.1. > > > For clarification: this is in the sample chapter available on-line to > all.... yes, I should have emphasized that. I tried to make this also clearer at http://www.scipy.org/Converting_from_Numeric > > In addition ``site-packages/numpy/lib/convertcode.py`` provides an > > automatic conversion. > > > > Would it be helpful to start a new wiki page "ConvertingFromNumeric" > > (similar to http://www.scipy.org/Converting_from_numarray) > > which aims at summarizing the necessary changes > > or expand Pearu's page (if he agrees) on this? > > > > Absolutely. I did the Numarray page because I'd written a lot on > Converting from Numeric (even providing convertcode.py) but very little > for numarray --- except the ndimage conversion. So, I started the > Numarray page. Sounds like a great idea to have a dual page. Best, Arnd P.S.: BTW +1 to all that has been said in the other thread on NumPy documentation - you are really doing a brilliant job, Travis!!! From webb.sprague at gmail.com Mon Apr 10 07:16:04 2006 From: webb.sprague at gmail.com (Webb Sprague) Date: Mon Apr 10 07:16:04 2006 Subject: [Numpy-discussion] Unexpected change of array used to index another array In-Reply-To: References: Message-ID: > > It's the python way: > > >>> a = [1,2,3] > >>> a[-1] > 3 > > It gives a convenient way to index from the end of the array. But I'm not > sure that was your question. No, there was a bug: when using one matrix to index another, the indexing matrix gets changed. As if you did

>>> i = -1
>>> a = [1,2,3]
>>> a[i]
3
>>> print i
2

I know about the negative trick in simple python lists, I was trying to do something with matrices (where it works too), but that wasn't the issue. Thanks for trying to help, though.
W From webb.sprague at gmail.com Mon Apr 10 07:19:22 2006 From: webb.sprague at gmail.com (Webb Sprague) Date: Mon Apr 10 07:19:22 2006 Subject: [Numpy-discussion] numpy.floor() is supposed to return an int, but returns a float In-Reply-To: References: <44398AFD.4050304@cox.net> <443993E3.1090901@cox.net> <4439CC7E.90704@cox.net> Message-ID: > > "y = floor(x) elementwise largest integer <= x; note that the result > > is a floating point value" I prefer this, if it makes any difference. The others are more succinct, but less likely to help others in my situation. > I too once found the C definition of the floor function annoying, but I got > used to it. Sorta like getting used to a broken leg. Annoying yes, crippling no. I guess I should have grown up on a real programming language :) From tim.hochberg at cox.net Mon Apr 10 09:13:03 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Mon Apr 10 09:13:03 2006 Subject: [Numpy-discussion] numpy.floor() is supposed to return an int, but returns a float In-Reply-To: References: <44398AFD.4050304@cox.net> <443993E3.1090901@cox.net> <4439CC7E.90704@cox.net> Message-ID: <443A844C.7070306@cox.net> Charles R Harris wrote: > Tim, > > On 4/9/06, *Tim Hochberg* > wrote: > > Let me just add that, since this seems to cause confusion, it would be > appropriate to amend the docstring to be explicit that this always > returns an integral floating point value. If someone wants to suggest > wording, I can figure out where to put it. One possibility is: > > "y = floor(x) elementwise largest integer <= x; note that the > result > is a floating point value" > > or > > "y = floor(x) elementwise largest integral float <= x" > > > How about, "for each item in x returns the largest integral float <= > item." That seems pretty good. I'll wait a day or so and see what else shows up. > > Chuck > > P.S. > > I too once found the C definition of the floor function annoying, but > I got used to it. Sorta like getting used to a broken leg. The main > problem is that the result can't be used as an index without > conversion to a "real" integer. Integers aren't members of the reals > (or rationals): apart from +/- 1, integers don't have inverses. > There happens to be an injective ring homomorphism of the integers > into the reals, but that is not the same thing. I'm not conversant with the terminology [here I rummage through google to try to get the terminology sort of right], but as I understand it integers (I) are a subset of reals (R). The ring that you construct with integers consists of the set of integers plus the operations of addition/subtraction and multiplication as well as an identity. I've seen that specified as something like (I, +/-, *, 0). Similarly, the set of reals (R) and the field that one constructs from them are not really the same thing. So while the ring of integers is not a subset of the field of reals (the statement doesn't even make sense when put that way), the set of integers is a subset of the set of reals. I think that most people, outside of computer programmers and perhaps math majors, think of the set of integers, not the ring of integers, to the extent that they think about integers and reals at all. I imagine most people would conjure up some Dali-like image when confronted with the notion of a ring of integers! (C-int, +/-, *, 0) actually forms a finite ring, which is not at all the same thing as the ring of integers. Bit twiddlers tend to understand and even exploit this, but a lot of people conflate the ring of ints with the ring of integers.
Bit twiddlers tend to understand and even exploit this, but a lot of people conflate the field of ints with the field of integers. This works fine as long as your values are small in magnitude, but eventually will rise up and bite you. Floats are even worse, since they don't even form a field, I think they're actually a semiring because of INF/NAN/IND, but I'm not certain about that. Issues with floating point pop up everywhere and if you squint the right way, you can blame them on their lack of fieldness. Which is closely tied to their finite range and precision, which is what bites people. Because Python automatically promotes (Python) ints to (Python) longs, Python ints map, for most puposes, onto the field of integers. However, in numpy wer're stuck using C-ints for performance reasons, so we'd be wise to keep the differences between ints and integers in the back of our mind. This is wandering rather far afield (although it's entertaining). > On the other hand, ints are generally not big enough to hold all of > the integral doubles, so as a practical matter the originators made > the best choice. Things do get a bit weird for large floats because > above a certain threshold floats are already integral values. Another issue at the moment is that integer division does an implicit flooring or truncation (I believe it's implementation dependant in C) in both C and Python, so if you aren't using floor to produce an index, something I've been known to do, having it return an integer could also lead to nasty suprises. For example: def half_integer(x): "return nearest half integer below x" return floor(2*x) / 2 Would start failing mysteriously. Of course the above is an overflow magnet, so perhaps it's not the best example. Eventually, '/' is going to mean true_division and '//' will mean floor_division, so this particular issue will go away. Regards, -tim > > > From bsouthey at gmail.com Mon Apr 10 09:16:08 2006 From: bsouthey at gmail.com (Bruce Southey) Date: Mon Apr 10 09:16:08 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> <200604071844.37724.pgmdevlist@mailcan.com> Message-ID: Hi, On 4/7/06, Sasha wrote: > On 4/7/06, Pierre GM wrote: > > ... > > We're going towards MA as the default object. > > > I will be against changing the array structure to handle missing > values. Let's keep the discussion focuced on the interface. Once we > agree on the interface, it will be clear if any structural changes are > necessary. > > > > But then again, what would be the behavior to deal with missing values ? > > We can postpone this discussion as well. Just add mask attribute that > returns False and filled method that returns a copy is an example of a > minimalistic change. I think that the usage of MA is important because this often dictates the interface. The other aspect is the penalty that is imposed by requiring a masked features especially to situations that don't need any of these features. > > > Using R-like na.actions ? That'd be great, but it's getting more complex. > > > > I don't like na.actions. I think missing values should behave like > IEEE NaNs and in the floating point case should be represented by > NaNs. I think the issue related to how masked values should be handled in computation. Does it matter if the result of an operation is due to a masked value or numerical problem (like dividing by zero)? (I am presuming that it is possible to identify this difference.) 
If not, then I support the idea of treating masked values as NaN. >The functionality provided by na.actions can always be achieved > by calling an extra function (filled or compress). I am not clear on what you actually mean here. For example, if you are summing across a particular dimension, I would presume that any masked value would be ignored and that there would be some record of the fact that a masked value was encountered. This would allow that 'extra function' to handle the associated result. Alternatively the 'extra function' would have to be included as an argument - which is what the na.actions do. Regards Bruce From ndarray at mac.com Mon Apr 10 09:49:05 2006 From: ndarray at mac.com (Sasha) Date: Mon Apr 10 09:49:05 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> <200604071844.37724.pgmdevlist@mailcan.com> Message-ID: On 4/10/06, Bruce Southey wrote: > > [...] > I think the issue is related to how masked values should be handled in > computation. Does it matter if the result of an operation is due to a > masked value or a numerical problem (like dividing by zero)? (I am > presuming that it is possible to identify this difference.) If not, > then I support the idea of treating masked values as NaN. > The IEEE standard provides plenty of spare bits in NaNs to represent pretty much everything, and some languages take advantage of that feature. (I believe NA and NaN are distinct in R.) In MA, however, mask elements are boolean and no distinction is made between the various reasons for not having a data element. For consistency, a non-trivial (not always false) implementation of ndarray.mask should return "not finite" and ignore the bits that distinguish NaNs and infinities. > >The functionality provided by na.actions can always be achieved > > by calling an extra function (filled or compress). > > I am not clear on what you actually mean here. For example, if you > are summing across a particular dimension, I would presume that any > masked value would be ignored and that there would be some record of > the fact that a masked value was encountered. This would allow that > 'extra function' to handle the associated result. Alternatively the > 'extra function' would have to be included as an argument - which is > what the na.actions do. > If you sum along a particular dimension and encounter a masked value, the result is masked. The same is true if you encounter a NaN - the result is NaN. If you would like to ignore masked values, you write a.filled(0).sum() instead of a.sum(). In the 1d case, you can also use a.compress().sum(). In other words, what in R you achieve with a flag, such as in sum(a, na.rm=TRUE), in numpy you achieve with an explicit call to "fill". This is not quite the same as na.actions in R, but that is what I had in mind. From pgmdevlist at mailcan.com Mon Apr 10 10:58:02 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Mon Apr 10 10:58:02 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: Message-ID: <200604101356.44903.pgmdevlist@mailcan.com> > If you sum along a particular dimension and encounter a masked value, > the result is masked.
That's not how it currently works (still on 0.9.6):

x=arange(12).reshape(3,4)
MA.masked_where((x%5==0) | (x%3==0),x).sum(0)
array(data = [12 1 2 18], mask = [False False False False], fill_value=999999)

and frankly, I'd be quite frustrated if it had to change: - `filled` is not an ndarray method, which means that a.filled(0).sum() fails if a is not MA. Right now, I can use a.sum() without having to check the nature of a first. - this behavior was already in Numeric - All my scripts rely on it (but I guess that's my problem) - The current way reflects how masks are used in GIS or image processing. > If you would like to ignore masked values, you write > a.filled(0).sum() instead of a.sum(). In the 1d case, you can also use > a.compress().sum(). Once again, Sasha, I'd agree with you if it weren't such a major difference > In other words, what in R you achieve with a > flag, such as in sum(a, na.rm=TRUE), in numpy you achieve with an > explicit call to "fill". This is not quite the same as na.actions in > R, but that is what I had in mind. I kinda like the idea of a flag, though From ndarray at mac.com Mon Apr 10 11:37:00 2006 From: ndarray at mac.com (Sasha) Date: Mon Apr 10 11:37:00 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <200604101356.44903.pgmdevlist@mailcan.com> References: <200604101356.44903.pgmdevlist@mailcan.com> Message-ID: On 4/10/06, Pierre GM wrote: > > If you sum along a particular dimension and encounter a masked value, > > the result is masked. > > That's not how it currently works (still on 0.9.6): > > [... longish example snipped ...]

>>> ma.array([1,1], mask=[0,1]).sum()
1

> and frankly, I'd be quite frustrated if it had to change: > - `filled` is not an ndarray method, which means that a.filled(0).sum() fails > if a is not MA. Right now, I can use a.sum() without having to check the > nature of a first. This is exactly the point of the current discussion: make filled a method of ndarray. With the current behavior, how would you achieve masking (no fill) in a.sum()? > - this behavior was already in Numeric That's true, but it makes the result of sum(a) different from __builtins__.sum(a). I believe consistency with the python conventions is more important than with legacy Numeric in the long run. > [...] > - The current way reflects how masks are used in GIS or image processing. > Can you elaborate on this? Note that in R na.rm is false by default in sum: > sum(c(1,NA)) [1] NA So it looks like the convention is different in the field of statistics. > > If you would like to ignore masked values, you write > > a.filled(0).sum() instead of a.sum(). In the 1d case, you can also use > > a.compress().sum(). > > Once again, Sasha, I'd agree with you if it weren't such a major difference Array methods are a very recent addition to ma. We can still use this window of opportunity to get things right before too many people get used to the wrong behavior. (Note that I changed your implementation of cumsum and cumprod.) > > In other words, what in R you achieve with a > > flag, such as in sum(a, na.rm=TRUE), in numpy you achieve with an > > explicit call to "fill". This is not quite the same as na.actions in > > R, but that is what I had in mind. > > I kinda like the idea of a flag, though With the flag approach, making ndarray and ma.array interfaces consistent would require adding an extra argument to many methods. Instead, I propose to add one method, filled, to ndarray.
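A minimal sketch of the semantics Sasha is proposing, written here as a free function since ndarray itself never did grow such a method; the spellings follow today's numpy.ma (the package under discussion was then called MA), and the helper name is purely illustrative:

import numpy as np
import numpy.ma as ma

def filled(a, fill_value):
    # Proposed semantics: replace masked entries, and pass plain
    # arrays through (as a copy, so the result is always safe to mutate).
    if isinstance(a, ma.MaskedArray):
        return a.filled(fill_value)
    return np.array(a)

x = ma.array([1, 1], mask=[False, True])
print(filled(x, 0).sum())                 # 1: the masked addend counts as 0
print(filled(np.array([1, 1]), 0).sum())  # 2: a no-op for a plain ndarray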
From pgmdevlist at mailcan.com Mon Apr 10 13:37:07 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Mon Apr 10 13:37:07 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: <200604101356.44903.pgmdevlist@mailcan.com> Message-ID: <200604101638.29979.pgmdevlist@mailcan.com> > > [... longish example snipped ...] > > > >>> ma.array([1,1], mask=[0,1]).sum() > > 1 So ? The result is not `masked`, the missing value has been omitted.

MA.array([[1,1],[1,1]],mask=[[0,1],[1,0]]).sum()
array(data = [1 1], mask = [False False], fill_value=999999)

> This is exactly the point of the current discussion: make filled a > method of ndarray. Mrf. I'm still not convinced, but I have nothing against it. Along with a mask=False_ by default ? > With the current behavior, how would you achieve masking (no fill) in a.sum()? Er, why would I want to get MA.masked along one axis if one value is masked ? The current behavior is to mask only if all the values along that axis are masked:

MA.array([[1,1],[1,1]],mask=[[0,1],[1,1]]).sum()
array(data = [1 999999], mask = [False True], fill_value=999999)

With a.filled(0).sum(), how would you distinguish between the cases (a) at least one value is not masked and (b) all values are masked ? (OK, by querying the mask with something in the line of a._mask.all(axis), but it's longer... Oh well, I'll just have to adapt.) > > - this behavior was already in Numeric > > That's true, but it makes the result of sum(a) different from > __builtins__.sum(a). I believe consistency with the python > conventions is more important than with legacy Numeric in the long > run. > > Array methods are a very recent addition to ma. We can still use this > window of opportunity to get things right before too many people get > used to the wrong behavior. (Note that I changed your implementation > of cumsum and cumprod.) Good points... We'll just have to put strong warnings everywhere. > > > > - The current way reflects how masks are used in GIS or image processing. > > Can you elaborate on this? Note that in R na.rm is false by default in sum: > > sum(c(1,NA)) > > [1] NA > > So it looks like the convention is different in the field of statistics. MMh. *digs in his old GRASS scripts* OK, my bad. I had to fill missing values somehow, or at least check whether there were any before processing. I'll double check on that. Please temporarily forget that comment. > With the flag approach, making ndarray and ma.array interfaces > consistent would require adding an extra argument to many methods. > Instead, I propose to add one method, filled, to ndarray. OK, good point. On a semantic aspect: While digging through these GRASS scripts I mentioned, I realized/remembered that masked values are called 'null', whether there's no data, a NaN, or just when you want to hide some values. What about 'null' instead of 'mask', 'missing', 'na' ? From tim.hochberg at cox.net Mon Apr 10 14:14:02 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Mon Apr 10 14:14:02 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <200604101638.29979.pgmdevlist@mailcan.com> References: <200604101356.44903.pgmdevlist@mailcan.com> <200604101638.29979.pgmdevlist@mailcan.com> Message-ID: <443AC5CB.2000704@cox.net> Pierre GM wrote: >>>[... longish example snipped ...] >>> >>> >>>>>ma.array([1,1], mask=[0,1]).sum() >>>>> >>>>> >>1 >> >> >So ? The result is not `masked`, the missing value has been omitted.
>MA.array([[1,1],[1,1]],mask=[[0,1],[1,0]]).sum() >array(data = [1 1], mask = [False False], fill_value=999999) > > > > >>This is exactly the point of the current discussion: make filled a >>method of ndarray. >> >> >Mrf. I'm still not convinced, but I have nothing against it. Along with a >mask=False_ by default ? > > > >>With the current behavior, how would you achieve masking (no fill) in a.sum()? >> >> >Er, why would I want to get MA.masked along one axis if one value is masked ? > > Any number of reasons, I would think. It depends on what you're using the data for. If the sum is the total amount that you spent in the month, and a masked value means you lost that check stub, then you don't know how much you actually spent and that value should be masked. To choose a boring example. >The current behavior is to mask only if all the values along that axis are >masked: > >MA.array([[1,1],[1,1]],mask=[[0,1],[1,1]]).sum() >array(data = [1 999999], mask = [False True], fill_value=999999) > >With a.filled(0).sum(), how would you distinguish between the cases (a) at >least one value is not masked and (b) all values are masked ? (OK, by >querying the mask with something in the line of a._mask.all(axis), but it's >longer... Oh well, I'll just have to adapt.) > > Actually I'm going to ask you the same question. Why would you care if all of the values are masked? I may be missing something, but either there's a sensible default value, in which case it doesn't matter how many values are masked, or you can't handle any masked values and the result should be masked if there are any masks in the input. Sasha's proposal handles those two cases well. Yours handles them a little more clunkily, but I'd like to understand why you want that behaviour. Regards, -tim > > >>>- this behavior was already in Numeric >>> >>> >>That's true, but it makes the result of sum(a) different from >>__builtins__.sum(a). I believe consistency with the python >>conventions is more important than with legacy Numeric in the long >>run. >> >>Array methods are a very recent addition to ma. We can still use this >>window of opportunity to get things right before too many people get >>used to the wrong behavior. (Note that I changed your implementation >>of cumsum and cumprod.) >> >> > >Good points... We'll just have to put strong warnings everywhere. > > > >>>- The current way reflects how masks are used in GIS or image processing. >>> >>> >>Can you elaborate on this? Note that in R na.rm is false by default in sum: >> >> >>>sum(c(1,NA)) >>> >>> >>[1] NA >> >>So it looks like the convention is different in the field of statistics. >> >> > >MMh. *digs in his old GRASS scripts* >OK, my bad. I had to fill missing values somehow, or at least check whether >there were any before processing. I'll double check on that. Please >temporarily forget that comment. > > > >>With the flag approach, making ndarray and ma.array interfaces >>consistent would require adding an extra argument to many methods. >>Instead, I propose to add one method, filled, to ndarray. >> >> >OK, good point. > > >On a semantic aspect: >While digging through these GRASS scripts I mentioned, I realized/remembered that >masked values are called 'null', whether there's no data, a NaN, or just when >you want to hide some values. What about 'null' instead of >'mask', 'missing', 'na' ?
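For reference, how today's numpy.ma resolves the question Tim and Pierre are debating: masked addends are skipped, an all-masked reduction comes back masked, and an explicit fill is always available. A small sketch in modern spelling rather than either package's actual 2006 code:

import numpy.ma as ma

x = ma.array([1, 1], mask=[False, True])
print(x.sum())            # 1: the masked addend is skipped
print(x.filled(0).sum())  # 1: same number, but the fill is explicit

y = ma.array([1, 1], mask=[True, True])
print(y.sum())            # masked: nothing unmasked is left to sum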
From oliphant at ee.byu.edu Mon Apr 10 15:07:06 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 10 15:07:06 2006 Subject: [Numpy-discussion] Recarray and shared datas In-Reply-To: <200604061020.k36AKIsQ018238@decideur.info> References: <200604061020.k36AKIsQ018238@decideur.info> Message-ID: <443AD6CF.4010800@ee.byu.edu> Benjamin Thyreau wrote: >Hi, >Numpy has a nice feature of recarray, i.e. records which can hold column names. >I'd like to use such a feature in order to better interact with R, i.e. passing >R data to python without copying. The current rpy bindings do a full copy, and >convert to a simple ndarray. Looking at the recarray api in the Guide, >and also at the source code, i don't find any recarray constructor which can >share data (all the examples from section 8.6 make copies). >Is there some way to do it ? in Python or in C ? Or are there any plans to ? > > > Yes, you can share data with a recarray because a "recarray" is just a numpy array with a fancy data-type and with attribute access overriding to do "field" lookups if the attribute cannot otherwise be found. What exactly are you trying to share data with? I'm having a hard time understanding how to answer your question without more information. Best, -Travis From oliphant at ee.byu.edu Mon Apr 10 15:14:05 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 10 15:14:05 2006 Subject: [Numpy-discussion] Tiling / disk storage for matrix in numpy? In-Reply-To: References: Message-ID: <443AD889.7020004@ee.byu.edu> Webb Sprague wrote: >Hi all, > >Is there a way in numpy to associate a (large) matrix with a disk >file, then tile and index it, then cache it as you process the >various pieces? This is pretty important with massive image files, >which can't fit into working memory, but in which (for example) you >might be doing a convolution on a 100 x 100 pixel window on a small >subset of the image. > > > I suppose if you used a memory-mapped array, then you would be at the mercy of the operating system's caching. But, this would be the easiest way. -Travis From oliphant at ee.byu.edu Mon Apr 10 15:21:07 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 10 15:21:07 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: <4436AE31.7000306@cox.net> Message-ID: <443ADA43.8060400@ee.byu.edu> Sasha wrote: > > > On 4/7/06, *Tim Hochberg* > wrote: > > ... > In general, I'm skeptical of adding more methods to the ndarray object > -- there are plenty already. > > > I've also proposed to drop "fill" in favor of optimizing x[...] = > . Having both "fill" and "filled" in the interface is plain > awkward. You may like the combined proposal better because it does > not change the total number of methods :-) > > > In addition, it appears that both the method and function versions of > filled are "dangerous" in the sense that they sometimes return the > array > itself and sometimes a copy. > > > This is true in ma, but may certainly be changed. > > > Finally, changing ndarray to support masked array feels a bit like the > tail wagging the dog. > > > I disagree.
Numpy is pretty much alone among the array languages > because it does not have "native" support for missing values. For the > floating point types some rudimentary support for nans exists, but it is > not really usable. There is no missing values mechanism for the integer > types. I believe adding "filled" and maybe "mask" to ndarray (not > necessarily under these names) could be a meaningful step towards > "native" support for missing values. Supporting missing values is a useful thing (but not for every usage of arrays). Thus, ultimately, I see missing-value arrays as a solid sub-class of the basic array class. I'm glad Sasha is working on missing value arrays and have tried to be supportive. I'm a little hesitant to add a special-case method basically for one particular sub-class, though, unless it is the only workable solution. We are still exploring this whole sub-class space and have not really mastered it... -Travis From oliphant at ee.byu.edu Mon Apr 10 15:44:07 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 10 15:44:07 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <4436FF73.7080408@cox.net> References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> <4436FF73.7080408@cox.net> Message-ID: <443ADF9A.9050001@ee.byu.edu> > This may be an opportune time to propose something that's been cooking > in the back of my head for a week or so now: A stripped down array > superclass. The details of this are not at all locked down, but here's > a strawman proposal. This is in essence what I've been proposing since SciPy 2005. I want what goes into Python to be essentially just this super-class. Look at this http://numeric.scipy.org/array_interface.html and check out this svn co http://svn.scipy.org/svn/PEP arrayPEP I've obviously been way over-booked to do this myself. Nick Coghlan expressed interest in this idea (he called it dimarray, but I like basearray better). > > We add an array superclass, call it basearray, that has the same > C-structure as the existing ndarray. However, it has *no* methods or > attributes. Why not give it the attributes corresponding to its C-structure? I'm happy with no methods though. > 1. If we're careful, this could be the basic array object that > we propose, at least for the first round, for inclusion in the > Python core. It's not useful for anything but passing data between > various applications that understand the data structure, but that in > itself could be a huge win. And the fact that it's dirt simple would > probably be an advantage to getting it into the core. The only extra thing I'm proposing is to add the data-descriptor object into the Python core as well --- otherwise what do you do with the PyArray_Descr * part of the C-structure? > 2. It provides a useful marker class. MA could inherit from it > (and use itself for its data attribute) and then asanyarray would > behave properly. MA could also use this, or a subclass, as the mask > object, preventing anyone from accidentally using it as data (they > could always use it on purpose with asarray). > 3. It provides a platform for people to build other, > ndarray-like classes in pure python. This is my main interest. I've > put together a thin shell over numpy that strips it down to its > absolute essentials, including a stripped down version of ndarray that > removes most of the methods.
All of the __array_wrap__[1] stuff > works quite well most of the time, but there are still some issues > with being a subclass when this particular class is conceptually a > superclass. If we had an array superclass of some sort, I believe > that these would be resolved. > > In principle at least, this shouldn't be that hard. I think it should > mostly be rearranging some code and adding some wrappers to existing > functions. That's in principle. In practice, I'm not certain yet as I > haven't investigated the code in question in much depth yet. I've been > meaning to write this up into a more fleshed out proposal, but I got > distracted by the whole Protocol discussion on python-dev3000. This > writeup is pretty weak, but hopefully you get the idea. This is exactly what needs to be done to improve array-support in Python. This is the conclusion I came to and I'm glad to see that Tim is now basically reaching the same conclusion. There are obviously some details to work out. But, having a base structure to inherit from would be perfect. -Travis From oliphant at ee.byu.edu Mon Apr 10 15:49:01 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 10 15:49:01 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <200604072258.34153.pgmdevlist@mailcan.com> References: <4436FF73.7080408@cox.net> <200604072258.34153.pgmdevlist@mailcan.com> Message-ID: <443AE0A1.3000002@ee.byu.edu> Pierre GM wrote: >>decide to get rid of "putmask". >> >> > >"putmask" really seems overkill indeed. I wouldn't miss it. > > I'm not opposed to getting rid of putmask either. Several of the newer methods are open for discussion before 1.0. I'd have to check to be sure, but .take and .put are not entirely replaced by fancy-indexing. Also, fancy indexing has enough overhead that a method doing exactly what you want is faster. -Travis From ndarray at mac.com Mon Apr 10 16:06:00 2006 From: ndarray at mac.com (Sasha) Date: Mon Apr 10 16:06:00 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <200604101638.29979.pgmdevlist@mailcan.com> References: <200604101356.44903.pgmdevlist@mailcan.com> <200604101638.29979.pgmdevlist@mailcan.com> Message-ID: On 4/10/06, Pierre GM wrote: > > > [... longish example snipped ...] > > > > > >>> ma.array([1,1], mask=[0,1]).sum() > > > > 1 > So ? The result is not `masked`, the missing value has been omitted. > I am just making your point with a shorter example. > [...] > Mrf. I'm still not convinced, but I have nothing against it. Along with a > mask=False_ by default ? > It looks like there is little opposition here. I'll submit a patch soon and unless better names are suggested, it will probably go in. > > With the current behavior, how would you achieve masking (no fill) in a.sum()? > Er, why would I want to get MA.masked along one axis if one value is masked ? Because if you don't know one of the addends you don't know the sum. Replacing missing values with zeros is not always the right strategy. If you know that your data has non-zero mean, for example, you might want to replace missing values with the mean instead of zero.
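A short illustration of that fill-with-the-mean idea, again in today's numpy.ma spelling rather than the MA package of the time:

import numpy.ma as ma

x = ma.array([1.0, 2.0, 3.0, 4.0], mask=[False, False, True, False])

# x.mean() averages only the unmasked entries (7/3 here), so the filled
# total is not biased toward zero the way x.filled(0).sum() would be.
total = x.filled(x.mean()).sum()   # 7 + 7/3, not 7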
> The current behavior is to mask only if all the values along that axis are > masked: > > MA.array([[1,1],[1,1]],mask=[[0,1],[1,1]]).sum() > array(data = [1 999999], mask = [False True], fill_value=999999) > I did not realize that, but it is really bad. What is the justification for this? In R: > sum(c(NA,NA), na.rm=TRUE) [1] 0 What does MATLAB do in this case? > With a.filled(0).sum(), how would you distinguish between the cases (a) at > least one value is not masked and (b) all values are masked ? (OK, by > querying the mask with something in the line of a._mask.all(axis), but it's > longer... Oh well, I'll just have to adapt.) > Exactly. Explicit is better than implicit. The Zen of Python. > > > - this behavior was already in Numeric > > > > That's true, but it makes the result of sum(a) different from > > __builtins__.sum(a). I believe consistency with the python > > conventions is more important than with legacy Numeric in the long > > run. > > > > Array methods are a very recent addition to ma. We can still use this > > window of opportunity to get things right before too many people get > > used to the wrong behavior. (Note that I changed your implementation > > of cumsum and cumprod.) > > Good points... We'll just have to put strong warnings everywhere. > Do you agree with my proposal as long as we have explicit warnings in the documentation that methods behave differently from legacy functions? > [... GIS comment snipped ...] > > With the flag approach, making ndarray and ma.array interfaces > > consistent would require adding an extra argument to many methods. > > Instead, I propose to add one method, filled, to ndarray. > OK, good point. > > > On a semantic aspect: > While digging through these GRASS scripts I mentioned, I realized/remembered that > masked values are called 'null', whether there's no data, a NaN, or just when > you want to hide some values. What about 'null' instead of > 'mask', 'missing', 'na' ? > I don't think "null" returning an array of bools will create a lot of enthusiasm. It sounds more like ma.masked, as in a[i] = ma.masked. Besides, there is probably a reason why python uses the name "None" instead of "Null" - I just don't know what it is :-). From tim.hochberg at cox.net Mon Apr 10 16:09:03 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Mon Apr 10 16:09:03 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <443ADF9A.9050001@ee.byu.edu> References: <4436AE31.7000306@cox.net> <4436C965.8020808@hawaii.edu> <4436D6D1.6040302@cox.net> <4436FF73.7080408@cox.net> <443ADF9A.9050001@ee.byu.edu> Message-ID: <443AE5C7.8010804@cox.net> Travis Oliphant wrote: > >> This may be an opportune time to propose something that's been cooking >> in the back of my head for a week or so now: A stripped down array >> superclass. The details of this are not at all locked down, but >> here's a strawman proposal. > > This is in essence what I've been proposing since SciPy 2005. I want > what goes into Python to be essentially just this super-class. > Look at this http://numeric.scipy.org/array_interface.html > > and check out this > > svn co http://svn.scipy.org/svn/PEP arrayPEP > > I've obviously been way over-booked to do this myself. Nick > Coghlan expressed interest in this idea (he called it dimarray, but I > like basearray better). I'll look these over. I suppose I should have been paying more attention before! >> >> We add an array superclass, call it basearray, that has the same >> C-structure as the existing ndarray.
However, it has *no* methods or >> attributes. > > Why not give it the attributes corresponding to its C-structure. I'm > happy with no methods though. Mainly because I didn't want to argue too much about whether a given method or attribute was a good idea, and I was in a hurry when I tossed that proposal out. It seemed better to start with the most stripped down proposal I could come up with and see what people demanded I add. I'm actually sort of inclined to give it *read-only* attributes associated with the C-structure, but no methods. That way you can examine the shape, type, etc., but you can't set them [I'm specifically thinking of shape here, but there may be others]. I think that there are cases where you don't want the base array to be mutable at all, but I don't think introspection should be a problem. If the attributes were settable, you could always override them with readonly properties, but it'd be cleaner to just start with readonly functionality and add setability (is that a word?) only in those cases where it's needed. > >> 1. If we're careful, this could be the basic array object that >> we propose, at least for the first round, for inclusion in the >> Python core. It's not useful for anything but passing data between >> various applications that understand the data structure, but that in >> itself could be a huge win. And the fact that it's dirt simple would >> probably be an advantage to getting it into the core. > > The only extra thing I'm proposing is to add the data-descriptor > object into the Python core as well --- otherwise what do you do > with the PyArray_Descr * part of the C-structure? Good point. > >> 2. It provides a useful marker class. MA could inherit from it >> (and use itself for its data attribute) and then asanyarray would >> behave properly. MA could also use this, or a subclass, as the mask >> object preventing anyone from accidentally using it as data (they >> could always use it on purpose with asarray). > >> 3. It provides a platform for people to build other, >> ndarray-like classes in pure Python. This is my main interest. I've >> put together a thin shell over numpy that strips it down to its >> absolute essentials including a stripped down version of ndarray that >> removes most of the methods. All of the __array_wrap__[1] stuff >> works quite well most of the time, but there are still some issues >> with being a subclass when this particular class is conceptually a >> superclass. If we had an array superclass of some sort, I believe >> that these would be resolved. >> >> In principle at least, this shouldn't be that hard. I think it should >> mostly be rearranging some code and adding some wrappers to existing >> functions. That's in principle. In practice, I'm not certain yet as I >> haven't investigated the code in question in much depth yet. I've >> been meaning to write this up into a more fleshed out proposal, but I >> got distracted by the whole Protocol discussion on python-dev3000. >> This writeup is pretty weak, but hopefully you get the idea. > > This is exactly what needs to be done to improve array-support in > Python. This is the conclusion I came to and I'm glad to see that Tim > has now basically reached the same conclusion. There are obviously > some details to work out. But, having a base structure to inherit > from would be perfect. > Hmm. This idea seems to have a fair bit of consensus behind it. I guess that means I'd better look into exactly what it would take to make it work.
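To make the read-only idea concrete, here is roughly the shape of thing I'm picturing, as a pure-Python sketch only (the real basearray would live in C, and none of these names are settled):

    class basearray(object):
        # minimal array core: a block of memory plus read-only introspection
        def __init__(self, shape, dtype, data):
            self._shape = tuple(shape)
            self._dtype = dtype
            self._data = data   # buffer-like object holding the elements
        # read-only descriptors: you can examine the structure but not rebind it
        @property
        def shape(self):
            return self._shape
        @property
        def dtype(self):
            return self._dtype
        @property
        def data(self):
            return self._data

No arithmetic, no indexing, no methods -- subclasses (ndarray, masked arrays, and friends) would layer those on top.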
The details of what attributes to expose, etc. are probably not too important to work out immediately. Regards, -tim From pierregm at engr.uga.edu Mon Apr 10 16:24:01 2006 From: pierregm at engr.uga.edu (Pierre GM) Date: Mon Apr 10 16:24:01 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <443AC5CB.2000704@cox.net> References: <200604101638.29979.pgmdevlist@mailcan.com> <443AC5CB.2000704@cox.net> Message-ID: <200604101923.36290.pierregm@engr.uga.edu> > [Sasha] > > So ? The result is not `masked`, the missing value has been omitted. > I am just making your point with a shorter example. OK, now I get it :) > >Er, why would I want to get MA.masked along one axis if one value is > > masked ? > > [Tim] > Any number of reasons I would think. I understand that, and I eventually agree it should be the default. > [Sasha] > Because if you don't know one of the addends you don't know the sum. Unless you want to discard some data on purpose. > Replacing missing values with zeros is not always the right strategy. > If you know that your data has non-zero mean, for example, you might > want to replace missing values with the mean instead of zero. Hence the need to get rid of filled_values. >[Tim] > Actually I'm going to ask you the same question. Why would you care if all > of the values are masked? > > MA.array([[1,1],[1,1]],mask=[[0,1],[1,1]]).sum() > > array(data = [1 999999], mask = [False True], fill_value=999999) > > [Sasha] > I did not realize that, but it is really bad. What is the > justification for this? Masked values are not necessarily nans or missing. I quite regularly mask values that do not satisfy a given condition. For various reasons, I can't compress the array, I need to preserve its shape. With the current behavior, a.sum() gives me the sum of the values that satisfy the condition. If there's no such value, the result is masked, and that way I know that the condition was never met. Here, I could use Sasha's method combined with a._mask.all, no problem. Another example: let x be a 2D array with missing values, to be normalized along one axis. Currently, x/x.sum() gives the result I want (provided it's true division). Sasha's method would give me a completely masked array. > > > Good points... We'll just have to put strong warnings everywhere. > [Sasha] > Do you agree with my proposal as long as we have explicit warnings in > the documentation that methods behave differently from legacy > functions? Your points are quite valid. I'm just worried it's gonna break a lot of things in the near future. And where do we stop ? So, if we follow Sasha's way: x.prod() should be the same, right ? What about a.min(), a.max() ? a.mean() ? From oliphant at ee.byu.edu Mon Apr 10 16:37:06 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Mon Apr 10 16:37:06 2006 Subject: [Numpy-discussion] Re: weird interaction: pickle, numpy, matplotlib.hist In-Reply-To: <44366E71.7060601@gmail.com> References: <4433DF85.7030109@gmail.com> <4434E31B.5030306@ieee.org> <44366E71.7060601@gmail.com> Message-ID: <443AEC07.5070904@ee.byu.edu> Andrew Jaffe wrote: > Travis Oliphant wrote: > >> But, this brings up the point that currently the pickled raw-data >> which is read-in as a string by Python is used as the memory for the >> new array (i.e. the string memory is "stolen"). This should work. >> The fact that it didn't with sort was a bug that is now fixed in >> SVN. However, operations on out-of-byte-order arrays will always be >> slower.
Thus, perhaps on pickle read the data should be copied to >> native byte-order if necessary. > > +1 from me, too. I assume that byteswapping is fast compared to I/O in > most cases, and the only times when you wouldn't want it would be > 'advanced' usage that the developer could take control of via a custom > reduce, __getstate__, __setstate__, etc. > There was one reasonable objection, and one proposal to further complicate the array object to handle both cases :-) But most were supportive of automatic conversion to the platform byte-order on pickle-read. This is probably what most people expect if they are using Pickle anyway. So, I've added it to SVN. -Travis From michael.sorich at gmail.com Mon Apr 10 16:45:07 2006 From: michael.sorich at gmail.com (Michael Sorich) Date: Mon Apr 10 16:45:07 2006 Subject: [Numpy-discussion] Recarray and shared datas In-Reply-To: <200604061020.k36AKIsQ018238@decideur.info> References: <200604061020.k36AKIsQ018238@decideur.info> Message-ID: <16761e100604101644v1c447aa1xb646e1d44d8672f8@mail.gmail.com> On 4/6/06, Benjamin Thyreau wrote: > > Hi, > Numpy has a nice feature of recarray, i.e. records which can hold column > names. > I'd like to use such a feature in order to better interact with R, i.e. > passing > R data to Python without copying. The current rpy bindings do a full copy, > and > convert to a simple ndarray. Looking at the recarray api in the Guide, > and also at the source code, I don't find any recarray constructor which > can > get shared data (all the examples from section 8.6 are doing copies). > Is there some way to do it, in Python or in C? Or are there any plans to? As a current user of rpy (at least until I can easily do the equivalent in numpy/scipy) this sounds very interesting. What will happen if the R data.frame has NA data? I don't think the recarray can currently handle masked data. Oh well, one step forward at a time. Good luck. Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.sorich at gmail.com Mon Apr 10 17:18:15 2006 From: michael.sorich at gmail.com (Michael Sorich) Date: Mon Apr 10 17:18:15 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: References: <200604101356.44903.pgmdevlist@mailcan.com> <200604101638.29979.pgmdevlist@mailcan.com> Message-ID: <16761e100604101717y6a8dbecat4800d8a77bb3615a@mail.gmail.com> On 4/11/06, Sasha wrote: > > On 4/10/06, Pierre GM wrote: > > > > [... longish example snipped ...] > > > > > > > >>> ma.array([1,1], mask=[0,1]).sum() > > > > > > 1 > > So ? The result is not `masked`, the missing value has been omitted. > > > I am just making your point with a shorter example. > > > [...] > > Mrf. I'm still not convinced, but I have nothing against it. Along with > a > > mask=False_ by default ? > > > It looks like there is little opposition here. I'll submit a patch > soon and unless better names are suggested, it will probably go in. > > > > With the current behavior, how would you achieve masking (no fill) > a.sum()? > > Er, why would I want to get MA.masked along one axis if one value is > masked ? > > Because if you don't know one of the addends you don't know the sum. > Replacing missing values with zeros is not always the right strategy. > If you know that your data has non-zero mean, for example, you might > want to replace missing values with the mean instead of zero. I feel that in general implicitly replacing masked values will definitely lead to bugs in my code.
Unless it is really obvious what the best way to deal with the masked values is for the particular function, I would definitely prefer to be explicit about it. In most cases there are a number of reasonable options for what can be done. Masking the result when masked values are involved seems the most transparent default option. For example, it gives me a really bad feeling to think that sum will automatically return the sum of all non-masked values. When dealing with large datasets, I will not always know when I need to be careful of missing values. Summing over only the non-masked values will often not be the appropriate course and I fear that I will not notice that this has actually occurred. If masked values are returned it is pretty obvious what has happened and easy to go back and explicitly handle the masked data in another way if appropriate. Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndarray at mac.com Mon Apr 10 19:46:00 2006 From: ndarray at mac.com (Sasha) Date: Mon Apr 10 19:46:00 2006 Subject: [Numpy-discussion] Recarray and shared datas In-Reply-To: <16761e100604101644v1c447aa1xb646e1d44d8672f8@mail.gmail.com> References: <200604061020.k36AKIsQ018238@decideur.info> <16761e100604101644v1c447aa1xb646e1d44d8672f8@mail.gmail.com> Message-ID: This thread probably belongs to rpy-list, so I'll cross-post. I may be wrong, but I think R data frames are stored column-wise unlike recarrays. This also means that data sharing between R and numpy is feasible even without recarrays. RPy support for doing this should probably wait until RPy 2.0 when R objects become wrapped in a Python type. That type will need to provide the __array_struct__ interface to allow data sharing. NA data handling in numpy is a topic of an active discussion now. A numpy array with data shared with an R vector will see NAs differently for different types. For ints, it will be INT_MIN (-2^31 on 32-bit machines), for floats it will be a NaN with some special bit-pattern in the mantissa and thus not fully compatible with numpy's nan. I would like to use this cross-post as an opportunity to invite RPy users to participate in numpy's discussion of missing (or masked) values. See "ndarray.fill and ma.array.filled" thread. On 4/10/06, Michael Sorich wrote: > On 4/6/06, Benjamin Thyreau wrote: > > > Hi, > > Numpy has a nice feature of recarray, i.e. records which can hold column names. > > I'd like to use such a feature in order to better interact with R, i.e. > passing > > R data to Python without copying. The current rpy bindings do a full copy, > and > > convert to a simple ndarray. Looking at the recarray api in the Guide, > > and also at the source code, I don't find any recarray constructor which > can > > get shared data (all the examples from section 8.6 are doing copies). > > Is there some way to do it, in Python or in C? Or are there any plans to? > > > As a current user of rpy (at least until I can easily do the equivalent in > numpy/scipy) this sounds very interesting. What will happen if the R > data.frame has NA data? I don't think the recarray can currently handle > masked data. Oh well, one step forward at a time. Good luck.
> > Mike > > > From tim.hochberg at cox.net Mon Apr 10 19:49:01 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Mon Apr 10 19:49:01 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <443AE0A1.3000002@ee.byu.edu> References: <4436FF73.7080408@cox.net> <200604072258.34153.pgmdevlist@mailcan.com> <443AE0A1.3000002@ee.byu.edu> Message-ID: <443B1957.7060301@cox.net> Travis Oliphant wrote: > Pierre GM wrote: > >>> decide to get rid of "putmask". >>> >> >> >> "putmask" really seems overkill indeed. I wouldn't miss it. >> >> > > I'm not opposed to getting rid of putmask either. Several of the > newer methods are open for discussion before 1.0. I'd have to check > to be sure, but .take and .put are not entirely replaced by > fancy-indexing. Also, fancy indexing has enough overhead that a > method doing exactly what you want is faster. I'm curious, what use cases does fancy indexing not handle that take works for? Not counting speed issues. Regards, -tim From bsouthey at gmail.com Tue Apr 11 12:47:02 2006 From: bsouthey at gmail.com (Bruce Southey) Date: Tue Apr 11 12:47:02 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled In-Reply-To: <200604101923.36290.pierregm@engr.uga.edu> References: <200604101638.29979.pgmdevlist@mailcan.com> <443AC5CB.2000704@cox.net> <200604101923.36290.pierregm@engr.uga.edu> Message-ID: Hi, My view is solely as a user so I really do appreciate the thought that you all are putting into this! I am somewhat concerned that having to use filled() is an extra level of complexity and computational burden. For example, in computing the mean/average, using filled would require one pass to get the sum and another to count the non-masked elements. For summation at least, would it make more sense to add an optional flag (or flags) such that there appears little difference between a normal array and a masked array? For example:

    a.sum() is the current default
    a.sum(filled_value=x) where x is some value such as zero or other user defined value.
    a.sum(ignore_mask=True) or similar to address whether or not masked values should be used.

I am also not clear on what happens with other operations or dimensions. Regards Bruce On 4/10/06, Pierre GM wrote: > > [Sasha] > > > So ? The result is not `masked`, the missing value has been omitted. > > I am just making your point with a shorter example. > > OK, now I get it :) > > > > > >Er, why would I want to get MA.masked along one axis if one value is > > > masked ? > > > > [Tim] > > Any number of reasons I would think. > > I understand that, and I eventually agree it should be the default. > > > [Sasha] > > Because if you don't know one of the addends you don't know the sum. > Unless you want to discard some data on purpose. > > > Replacing missing values with zeros is not always the right strategy. > > If you know that your data has non-zero mean, for example, you might > > want to replace missing values with the mean instead of zero. > Hence the need to get rid of filled_values. > > >[Tim] > > Actually I'm going to ask you the same question. Why would you care if all > > of the values are masked? > > > > MA.array([[1,1],[1,1]],mask=[[0,1],[1,1]]).sum() > > > array(data = [1 999999], mask = [False True], fill_value=999999) > > > > [Sasha] > > I did not realize that, but it is really bad. What is the > > justification for this? > > Masked values are not necessarily nans or missing. I quite regularly mask > values that do not satisfy a given condition.
> For various reasons, I can't > compress the array, I need to preserve its shape. > > With the current behavior, a.sum() gives me the sum of the values that satisfy > the condition. If there's no such value, the result is masked, and that way I > know that the condition was never met. Here, I could use Sasha's method > combined with a._mask.all, no problem. > > Another example: let x be a 2D array with missing values, to be normalized along > one axis. Currently, x/x.sum() gives the result I want (provided it's true > division). Sasha's method would give me a completely masked array. > > > > > Good points... We'll just have to put strong warnings everywhere. > > [Sasha] > > Do you agree with my proposal as long as we have explicit warnings in > > the documentation that methods behave differently from legacy > > functions? > > Your points are quite valid. I'm just worried it's gonna break a lot of things > in the near future. And where do we stop ? So, if we follow Sasha's way: > x.prod() should be the same, right ? What about a.min(), a.max() ? a.mean() ? > > > > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From travis at enthought.com Tue Apr 11 13:11:04 2006 From: travis at enthought.com (Travis N. Vaught) Date: Tue Apr 11 13:11:04 2006 Subject: [Numpy-discussion] ANN: SciPy 2006 Conference Message-ID: <443C0D36.80608@enthought.com> Greetings, The *SciPy 2006 Conference* is scheduled for August 17-18, 2006 at CalTech. A tremendous amount of work has gone into SciPy and Numpy over the past few months, and the scientific python community around these and other tools has truly flourished[1]. The SciPy 2006 Conference is an excellent opportunity to exchange ideas, learn techniques, contribute code and affect the direction of scientific computing with Python. Conference details are at http://www.scipy.org/SciPy2006

Keynote
-------
Python language author Guido van Rossum (!) has agreed to be the Keynote speaker at this year's Conference. http://www.python.org/~guido/

Registration:
-------------
Registration is now open. You may register early online for $100.00 at http://www.enthought.com/scipy06. Registration includes breakfast and lunch Thursday & Friday and a very nice dinner Thursday night. After July 14, 2006, registration will cost $150.00.

Call for Presenters
-------------------
If you are interested in presenting at the conference, you may submit an abstract in Plain Text, PDF or MS Word formats to abstracts at scipy.org -- the deadline for abstract submission is July 7, 2006. Papers and/or presentation slides are acceptable and are due by August 4, 2006.

Tutorial Sessions
-----------------
Several people have expressed interest in attending a tutorial session. The Wednesday before the conference might be a good day for this. Please email the list if you have particular topics that you are interested in.
Here's a preliminary list:

- Migrating from Numeric or Numarray to Numpy
- 2D Visualization with Python
- 3D Visualization with Python
- Introduction to Scientific Computing with Python
- Building Scientific Simulation Applications
- Traits/TraitsUI

Please rate these and add others in a subsequent thread to the SciPy-user mailing list. Perhaps we can pick 4-6 top ideas and recruit speakers as demand dictates. The authoritative list will be tracked here: http://www.scipy.org/SciPy2006/TutorialSessions

Coding Sprints
--------------
If anyone would like to arrive earlier (Monday and Tuesday the 14th and 15th of August), we can borrow a room on the CalTech campus to sit and code against particular libraries or apps of interest. Please register your interest in these coding sprints on the SciPy-user mailing list as well. The authoritative list will be tracked here: http://www.scipy.org/SciPy2006/CodingSprints

Mailing list address: scipy-user at scipy.org
Mailing list archives: http://dir.gmane.org/gmane.comp.python.scientific.user
Mailing list signup: http://www.scipy.net/mailman/listinfo/scipy-user

[1] Some stats: NumPy has averaged over 16,000 downloads per month Sept. 05 to March 06. SciPy has averaged over 3,800 downloads per month in Feb. and March 06. (both scipy and numpy figures do not include the 2000 instances per month downloaded as part of the Python Enthought Edition Distribution for Windows.) From rowen at cesmail.net Tue Apr 11 13:32:14 2006 From: rowen at cesmail.net (Russell E. Owen) Date: Tue Apr 11 13:32:14 2006 Subject: [Numpy-discussion] Re: ndarray.fill and ma.array.filled References: <4436AE31.7000306@cox.net> Message-ID: In article , Sasha wrote: > I disagree. Numpy is pretty much alone among the array languages because it > does not have "native" support for missing values. For the floating point > types some rudimentary support for nans exists, but is not really usable. > There is no missing-values mechanism for integer types. I believe adding > "filled" and maybe "mask" to ndarray (not necessarily under these names) > could be a meaningful step towards "native" support for missing values. I completely agree with this. I would really like to see proper native support for arrays with masked values in numpy (such that all ufuncs, functions, etc. work with masked arrays). I would be thrilled to be able to filter masked arrays, for instance. -- Russell From tim.hochberg at cox.net Tue Apr 11 16:15:04 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Tue Apr 11 16:15:04 2006 Subject: [Numpy-discussion] Let's blame Java [was ndarray.fill and ma.array.filled] In-Reply-To: References: <200604101638.29979.pgmdevlist@mailcan.com> <443AC5CB.2000704@cox.net> <200604101923.36290.pierregm@engr.uga.edu> Message-ID: <443C38BE.8090606@cox.net> As I understand it, the goal that Sasha is pursuing here is to make masked arrays and normal arrays interchangeable as much as practical. I believe that there is reasonable consensus that this is desirable. Sasha has proposed a compromise solution that adds minimal attributes to ndarray while allowing a lot of interoperability between ma and ndarray. However, it has its clunky aspects, as evidenced by the pushback he's been getting from masked array users. Here's one example. In the masked array context it seems perfectly reasonable to pass a fill value to sum. That is:

    x.sum(fill=0.0)

But, if you want to preserve interoperability, that means you have to add fill arguments to all of the ndarray methods and what do you have? A mess!
Particularly if some *other* package comes along that we decide is important to support in the same manner as ma. Then we have another set of methods or keyword args that we need to tack on to ndarray. Ugh! However, I know who, or rather what, to blame for our problems: the object-oriented hype industry in general and Java in particular <0.1 wink>. Why? Because the root of the problem here is the move from functions to methods in numpy. I appreciate a nice method as much as the next person, but they're not always better than the equivalent function and in this case they're worse. Let's fantasize for a minute that most of the methods of ndarray vanished and instead we went back to functions. Just to show that I'm not a total purist, I'll let the mask attribute stay on both MaskedArray and ndarray. However, filled bites the dust on *both* MaskedArray and ndarray just like the rest. How would we deal with sum then? Something like this:

    # ma.py
    def filled(x, fill):
        x = x.copy()
        if x.mask is not False:
            x[x.mask] = fill
            x.umask()
        return x

    def sum(x, axis, fill=None):
        if fill is not None:
            x = filled(x, fill)
        # I'm blowing off the correct treatment of the fill=None case here because I'm lazy
        return add.reduce(x, axis)

    # numpy.py (or __init__ or oldnumeric or something)
    def sum(x, axis):
        if x.mask is not False:
            raise ValueError("use ma.sum for masked arrays")
        return add.reduce(x, axis)

[Fixing the fill=None case and dealing correctly with dtype is left as an exercise for the reader.] All of a sudden all of the problems we're running into go away. Users of masked arrays simply use the functions from ma and can use ndarrays and masked arrays interchangeably. On the other hand, users of non-masked arrays aren't burdened with the extra interface and if they accidentally get passed a masked array they quickly find out about it (you don't want to be accidentally using masked arrays in an application that doesn't expect them -- that way lies disaster). I realize that railing against methods is tilting at windmills, but somehow I can't help myself ;-| Regards, -tim From aisaac at american.edu Tue Apr 11 20:45:01 2006 From: aisaac at american.edu (Alan G Isaac) Date: Tue Apr 11 20:45:01 2006 Subject: [Numpy-discussion] reminder: dtype for empty, zeros, ones Message-ID: I notice that the empty, ones, and zeros still have an integer default dtype (numpy 0.9.6). I had the impression that this was slated to change to a float dtype, on the reasonable assumption that new users will otherwise be surprised. Perhaps I remember this incorrectly. Cheers, Alan Isaac From tim.hochberg at cox.net Tue Apr 11 21:27:00 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Tue Apr 11 21:27:00 2006 Subject: [Numpy-discussion] Let's blame Java [was ndarray.fill and ma.array.filled] In-Reply-To: <443C38BE.8090606@cox.net> References: <200604101638.29979.pgmdevlist@mailcan.com> <443AC5CB.2000704@cox.net> <200604101923.36290.pierregm@engr.uga.edu> <443C38BE.8090606@cox.net> Message-ID: <443C81E2.4090800@cox.net> [Tim rants a lot] Just to be clear, I'm not advocating getting rid of methods. I'm not advocating anything, that just seems to get me into trouble ;-) I still blame Java though.
Regards, -tim From stefan at sun.ac.za Tue Apr 11 22:47:14 2006 From: stefan at sun.ac.za (Stefan van der Walt) Date: Tue Apr 11 22:47:14 2006 Subject: [Numpy-discussion] sqrt and divide Message-ID: <20060412054517.GA27756@sun.ac.za> Hi all Two quick questions regarding unintuitive numpy behaviour: Why is the square root of -1 not equal to the square root of -1+0j?

    In [5]: N.sqrt(-1.)
    Out[5]: nan

    In [6]: N.sqrt(-1.+0j)
    Out[6]: 1j

Is there an easier way of dividing two scalars than using divide?

    In [9]: N.divide(1.,0)
    Out[9]: inf

(also

    In [8]: N.divide(1,0)
    Out[8]: 0

should probably return inf / nan?) Regards Stéfan From robert.kern at gmail.com Tue Apr 11 23:16:03 2006 From: robert.kern at gmail.com (Robert Kern) Date: Tue Apr 11 23:16:03 2006 Subject: [Numpy-discussion] Re: sqrt and divide In-Reply-To: <20060412054517.GA27756@sun.ac.za> References: <20060412054517.GA27756@sun.ac.za> Message-ID: Stefan van der Walt wrote: > Hi all > > Two quick questions regarding unintuitive numpy behaviour: > > Why is the square root of -1 not equal to the square root of -1+0j? > > In [5]: N.sqrt(-1.) > Out[5]: nan > > In [6]: N.sqrt(-1.+0j) > Out[6]: 1j It is frequently the case that the argument being passed to sqrt() is expected to be non-negative and all of their code strictly deals with numbers in the real domain. If the argument happens to be negative, then it is a sign of a bug earlier in the code or a floating point instability. Returning nan gives the programmer the opportunity for sqrt() to complain loudly and expose bugs instead of silently upcasting to a complex type. Programmers who *do* want to work in the complex domain can easily perform the cast explicitly. > Is there an easier way of dividing two scalars than using divide? > > In [9]: N.divide(1.,0) > Out[9]: inf x/y ? > (also > > In [8]: N.divide(1,0) > Out[8]: 0 > > should probably return inf / nan?) inf and nan are floating point values. The definition of int division used when both arguments to divide() are ints also yields ints, not floats. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
From faltet at carabos.com Wed Apr 12 01:51:12 2006 From: faltet at carabos.com (Francesc Altet) Date: Wed Apr 12 01:51:12 2006 Subject: [Numpy-discussion] Tiling / disk storage for matrix in numpy? In-Reply-To: References: Message-ID: <200604121050.15552.faltet@carabos.com> On Friday 07 April 2006 19:30, Webb Sprague wrote: > Hi all, > > Is there a way in numpy to associate a (large) matrix with a disk > file, then tile and index it, then cache it as you process the > various pieces? This is pretty important with massive image files, > which can't fit into working memory, but in which (for example) you > might be doing a convolution on a 100 x 100 pixel window on a small > subset of the image. > > I know that caching algorithms are (1) complicated and (2) never > general. But there you go. > > Perhaps I can't find it, perhaps it would be a good project for the > future? If HDF or something does this already, could someone point me > in the right direction? In addition to using shared memory arrays, you may also want to experiment with compressing images on-disk and reading small chunks to operate with them in-memory. This has the advantage that, if your image is compressible enough (and most of them are), the total size of the image in-file will be smaller, leaving more room for the underlying OS filesystem cache to fit larger areas of the image. Here is a small PyTables program that exemplifies the concept:

    import tables
    import numpy

    # Create a container for the image in file
    f = tables.openFile('image.h5', 'w')
    img = f.createEArray(f.root, 'img',
                         tables.Atom(shape=(1024,0), dtype='Int32', flavor='numpy'),
                         filters=tables.Filters(complevel=1),
                         expectedrows=1024)
    # Add 1024 rows to image
    for i in xrange(1024):
        img.append((numpy.randn(1024,1)*1024).astype('int32'))
    img.flush()

    # Get small chunks of the image in memory and operate with them
    cs = 100
    for i in xrange(0, 1024-2*cs, cs):
        # Get 100x100 squares
        chunk1 = img[i:i+cs, i:i+cs]
        chunk2 = img[i+cs:i+2*cs, i+cs:i+2*cs]
        chunk3 = chunk1*chunk2  # Trivial operation with them
    f.close()

Cheers,

--
>0,0<   Francesc Altet     http://www.carabos.com/
V   V   Cárabos Coop. V.   Enjoy Data
 "-"

From stefan at sun.ac.za Wed Apr 12 05:43:27 2006 From: stefan at sun.ac.za (Stefan van der Walt) Date: Wed Apr 12 05:43:27 2006 Subject: [Numpy-discussion] Vectorize bug Message-ID: <20060412124032.GA30471@sun.ac.za> Hello all Vectorize segfaults for large arrays. I filed the bug at http://projects.scipy.org/scipy/numpy/ticket/52 The offending code is

    import numpy as N
    x = N.linspace(-3,2,10000)
    y = N.vectorize(lambda x: x)
    # Segfaults here
    y(x)

Regards Stéfan From cimrman3 at ntc.zcu.cz Wed Apr 12 05:59:28 2006 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed Apr 12 05:59:28 2006 Subject: [Numpy-discussion] shape setting problem Message-ID: <443CF984.9070306@ntc.zcu.cz> Hi, I have found a weird behaviour when setting the shape of a view of an array, see below... r.
---
In [43]:a = nm.zeros( (10,5) )
In [44]:b = a[:,2]
In [47]:b.fill( 3 )
In [48]:a
Out[48]:
array([[0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 3, 0, 0]])
-------------------------------------------ok
In [49]:b.fill( 0 )
In [50]:a
Out[50]:
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])
In [51]:b.shape = (5,2)
In [52]:b
Out[52]:
array([[0, 0],
       [0, 0],
       [0, 0],
       [0, 0],
       [0, 0]])
In [53]:b.fill( 3 )
In [54]:a
Out[54]:
array([[0, 0, 3, 3, 3],
       [3, 3, 3, 3, 3],
       [3, 3, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])
------------------------------------ wrong?
Should not this give the same result as Out[48]? From aisaac at american.edu Wed Apr 12 06:11:11 2006 From: aisaac at american.edu (Alan G Isaac) Date: Wed Apr 12 06:11:11 2006 Subject: [Numpy-discussion] Re: sqrt and divide In-Reply-To: References: <20060412054517.GA27756@sun.ac.za> Message-ID: > Stefan van der Walt wrote: >> In [8]: N.divide(1,0) >> Out[8]: 0 >> should probably return inf / nan?) On Wed, 12 Apr 2006, Robert Kern apparently wrote: > inf and nan are floating point values. The definition of > int division used when both arguments to divide() are ints > also yields ints, not floats. But the Python behavior seems better for this case.

    >>> 1/0
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    ZeroDivisionError: integer division or modulo by zero

fwiw, Alan Isaac From tim.hochberg at cox.net Wed Apr 12 08:36:05 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 12 08:36:05 2006 Subject: [Numpy-discussion] Re: sqrt and divide In-Reply-To: References: <20060412054517.GA27756@sun.ac.za> Message-ID: <443D1E2B.5040604@cox.net> Robert Kern wrote: >Stefan van der Walt wrote: > > >>Hi all >> >>Two quick questions regarding unintuitive numpy behaviour: >> >>Why is the square root of -1 not equal to the square root of -1+0j? >> >>In [5]: N.sqrt(-1.) >>Out[5]: nan >> >>In [6]: N.sqrt(-1.+0j) >>Out[6]: 1j >> >> > >It is frequently the case that the argument being passed to sqrt() is expected >to be non-negative and all of their code strictly deals with numbers in the real >domain. If the argument happens to be negative, then it is a sign of a bug >earlier in the code or a floating point instability. Returning nan gives the >programmer the opportunity for sqrt() to complain loudly and expose bugs instead >of silently upcasting to a complex type. Programmers who *do* want to work in >the complex domain can easily perform the cast explicitly. > > > >>Is there an easier way of dividing two scalars than using divide? >> >>In [9]: N.divide(1.,0) >>Out[9]: inf >> >> > >x/y ? > > > >>(also >> >>In [8]: N.divide(1,0) >>Out[8]: 0 >> >>should probably return inf / nan?) >> >> > >inf and nan are floating point values. The definition of int division used when >both arguments to divide() are ints also yields ints, not floats. > > This relates to the discussion that Travis and I were having about error handling last week. The current defaults for handling errors are to ignore them all. This is for speed reasons, although our discussion may have alleviated some of these. The numarray default was to ignore underflow, but warn for the rest; this seemed to work well in practice.
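In terms of the current seterr interface, those numarray-ish defaults would be spelled something like this (a sketch -- the exact keyword names may not match what's in SVN):

    import numpy
    numpy.seterr(divide='warn', over='warn', under='ignore', invalid='warn')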
However, this example points in another possible direction.... Travis mentioned that checking the various error conditions in integer operations was painful and slowed things down since there wasn't machine support for it. My current opinion is that we should just punt on overflow and let integers overflow silently. That's what bit twiddlers want anyway and it'll be somewhere between difficult and impossible to do a good job. I don't think invalid and underflow apply to integers, so that leaves divide. I think my preference here would be for int divide to raise by default. That would require that there be five error classes, shown here with my preferred defaults:

    divide_by_zero="warn", overflow="warn", underflow="ignore", invalid="warn"
    int_divide_by_zero="raise"

The first four apply to floating point (and complex) operations, while the last applies to integer operations. The separation of warnings into two classes also helps avoid the expectation that we should be doing something useful about integer overflow. I don't *think* this should be too difficult; just stick an int_divide_by_zero flag on some thread_local variable and set it to true when there's been a divide by zero, checking on the way out of the ufunc machinery. I haven't tried it though, so it may be much harder than I envision. In any event, the current divide by zero checking seems to be a bit broken. I took a quick look at the code and it's not obvious why (unless my optimizer is eliding the error generation code?). This is the behaviour I see under Windows compiled using VC7:

    >>> one = np.array(1)
    >>> zero = np.array(0)
    >>> one/zero
    0
    >>> np.seterr(divide='raise')
    >>> one/zero # Should raise an error
    0
    >>> (one*1.0 / zero) # Works for floats though?!
    Traceback (most recent call last):
      File "<stdin>", line 1, in ?
    FloatingPointError: divide by zero encountered in divide

Regards, -tim From pfdubois at gmail.com Wed Apr 12 13:00:04 2006 From: pfdubois at gmail.com (Paul Dubois) Date: Wed Apr 12 13:00:04 2006 Subject: [Numpy-discussion] Seeking articles for special issue on Python and Science and Engineering Message-ID: IEEE's magazine, Computing in Science and Engineering (CiSE), has asked me to put together a theme issue on the use of Python in Science and Engineering. I will write an overview to be accompanied by 3-5 articles of a few pages (say 3000 words or so) each. The deadline for manuscripts will be in the Fall and publication early next year. I would like to select articles that show a diverse set of applications or tools, to give our readers a sense of whether or not Python might be useful in their own work. I will tailor the overview to "fill in the holes" a bit since with only a few articles we can't cover everything. Note that these are expository pieces, not research reports. We have a peer-reviewed section for the latter. Think "Scientific American" with respect to level: everybody gets something out of it, maybe a little more for those who know about the area. Please contact me if you are interested in writing such an article. The process is that I work with you on the shape of the article, then you write it, and our editorial staff helps you get it ready for publication. There is no annoying review process except that I am annoying. Ideas for cover art to go with the issue are always welcome. Information about CiSE and our author's guidelines are at computer.org/cise. It has a fairly large readership as such things go.
Thanks, Paul Dubois Editor, Scientific Programming Department CiSE -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Wed Apr 12 13:50:16 2006 From: stefan at sun.ac.za (Stefan van der Walt) Date: Wed Apr 12 13:50:16 2006 Subject: [Numpy-discussion] Re: sqrt and divide In-Reply-To: References: <20060412054517.GA27756@sun.ac.za> Message-ID: <20060412204927.GA11408@alpha> On Wed, Apr 12, 2006 at 01:14:54AM -0500, Robert Kern wrote: > Stefan van der Walt wrote: > > Why is the square root of -1 not equal to the square root of -1+0j? > > > > In [5]: N.sqrt(-1.) > > Out[5]: nan > > > > In [6]: N.sqrt(-1.+0j) > > Out[6]: 1j > > It is frequently the case that the argument being passed to sqrt() is expected > to be non-negative and all of their code strictly deals with numbers in the real > domain. If the argument happens to be negative, then it is a sign of a bug > earlier in the code or a floating point instability. Returning nan gives the > programmer the opportunity for sqrt() to complain loudly and expose bugs instead > of silently upcasting to a complex type. Programmers who *do* want to work in > the complex domain can easily perform the cast explicitly. The current docstring (specified in generate_umath.py) states

    y = sqrt(x) square-root elementwise.

It would help a lot if it could explain the above constraint, e.g.

    y = sqrt(x) square-root elementwise. If x is real (and not complex),
    the domain is restricted to x >= 0.

> > In [9]: N.divide(1.,0) > > Out[9]: inf > > x/y ? On my system, x/y (for x=1., y=0) throws a ZeroDivisionError. Are the two divisions supposed to behave the same? Thanks for your feedback! Regards Stéfan From robert.kern at gmail.com Wed Apr 12 14:08:06 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed Apr 12 14:08:06 2006 Subject: [Numpy-discussion] Re: sqrt and divide In-Reply-To: <20060412204927.GA11408@alpha> References: <20060412054517.GA27756@sun.ac.za> <20060412204927.GA11408@alpha> Message-ID: Stefan van der Walt wrote: > On Wed, Apr 12, 2006 at 01:14:54AM -0500, Robert Kern wrote: > >>Stefan van der Walt wrote: >> >>>Why is the square root of -1 not equal to the square root of -1+0j? >>> >>>In [5]: N.sqrt(-1.) >>>Out[5]: nan >>> >>>In [6]: N.sqrt(-1.+0j) >>>Out[6]: 1j >> >>It is frequently the case that the argument being passed to sqrt() is expected >>to be non-negative and all of their code strictly deals with numbers in the real >>domain. If the argument happens to be negative, then it is a sign of a bug >>earlier in the code or a floating point instability. Returning nan gives the >>programmer the opportunity for sqrt() to complain loudly and expose bugs instead >>of silently upcasting to a complex type. Programmers who *do* want to work in >>the complex domain can easily perform the cast explicitly. > > The current docstring (specified in generate_umath.py) states > > y = sqrt(x) square-root elementwise. > > It would help a lot if it could explain the above constraint, e.g. > > y = sqrt(x) square-root elementwise. If x is real (and not complex), > the domain is restricted to x >= 0. I'll get around to it sometime. In the meantime, please make a ticket: http://projects.scipy.org/scipy/numpy/newticket >>>In [9]: N.divide(1.,0) >>>Out[9]: inf >> >>x/y ? > > On my system, x/y (for x=1., y=0) throws a ZeroDivisionError. Are > the two divisions supposed to behave the same? Not exactly, no. Specifically, the error handling is, by design, more flexible with numpy than regular float objects.
If you want that flexibility, then you need to use numpy scalars or ufuncs. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From jmgore75 at gmail.com Wed Apr 12 14:30:05 2006 From: jmgore75 at gmail.com (Jeremy Gore) Date: Wed Apr 12 14:30:05 2006 Subject: [Numpy-discussion] Massive differences in numpy vs. numeric string handling Message-ID: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com> In Numeric:

    Numeric.array('test') -> array([t, e, s, t],'c'); shape = (4,)
    Numeric.array(['test','two']) ->
    array([[t, e, s, t],
           [t, w, o, ]],'c')

but in numpy:

    numpy.array('test') -> array('test', dtype='|S4'); shape = ()
    numpy.array('test','S1') -> array('t', dtype='|S1'); shape = ()

in fact you have to do an extra list cast:

    numpy.array(list('test'),'S1') -> array([t, e, s, t], dtype='|S1'); shape = (4,)

to get the desired result. I don't think this is very pythonic, as strings are fully indexable and iterable objects. Furthermore, converting/treating a string as an array of characters is a very common thing. convertcode.py would not appear to convert this part of the code correctly either. Also, the use of quotes in the shape () array but not in the shape (4,) array is inconsistent. I realize the ability to use strings of arbitrary length as array elements is important in numpy, but there really should be a more natural option to convert/cast strings as character arrays. Also, unlike Numeric.equal and 'c' arrays, numpy.equal cannot compare '|S1' arrays or presumably other strings for equality, although this is a very useful comparison to make. For the record, I have used the Numeric (and to a lesser degree the numarray) module extensively in bioinformatics applications for its speed and brevity. Jeremy From oliphant at ee.byu.edu Wed Apr 12 15:04:06 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 12 15:04:06 2006 Subject: [Numpy-discussion] Massive differences in numpy vs. numeric string handling In-Reply-To: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com> References: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com> Message-ID: <443D7939.2060406@ee.byu.edu> Jeremy Gore wrote: > In Numeric: > > Numeric.array('test') -> array([t, e, s, t],'c'); shape = (4,) > Numeric.array(['test','two']) -> > array([[t, e, s, t], > [t, w, o, ]],'c') > > but in numpy: > > numpy.array('test') -> array('test', dtype='|S4'); shape = () > numpy.array('test','S1') -> array('t', dtype='|S1'); shape = () > > in fact you have to do an extra list cast: > > numpy.array(list('test'),'S1') -> array([t, e, s, t], dtype='|S1'); > shape = (4,) > > to get the desired result. I don't think this is very pythonic, as > strings are fully indexable and iterable objects. Let's not cast this discussion in Pythonic vs. un-pythonic because that does not really shed light on the issues. NumPy adds full support for string arrays. Numeric had this step-child called a character array which was really just an array of bytes that printed differently. This does raise some compatibility issues that have been hard to get exactly right, and convertcode indeed does not really solve the problem for a heavy character-array user. I have resisted simply adding a 1-character string data-type back into NumPy, but that could be done if it is really necessary. But, I don't think it is.
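To be concrete about why: with the current code you can already get the old character-array behaviour out of a string without a new data-type. A sketch (fromstring and view are the pieces I'd lean on; treat the exact spellings as illustrative):

    import numpy

    # a 1-d array of single characters from a string
    c = numpy.fromstring('test', dtype='S1')   # shape (4,), dtype '|S1'
    # a 2-d "character array" from equal-length strings
    c2 = numpy.array(['test', 'twos'], dtype='S4').view('S1').reshape(2, 4)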
> Furthermore, converting/treating a string as an array of characters > is a very common thing. convertcode.py would not appear to convert > this part of the code correctly either. Also, the use of quotes in > the shape () array but not in the shape (4,) array is inconsistent. > > > I realize the ability to use strings of arbitrary length as array > elements is important in numpy, but there really should be a more > natural option to convert/cast strings as character arrays. Perhaps all that is needed to simplify handling is to handle the 'S1' case better so that array('test','S1') works the same as array('test','c') used to work (i.e. not stopping at strings for the sequence decomposition). > > Also, unlike Numeric.equal and 'c' arrays, numpy.equal cannot compare > '|S1' arrays or presumably other strings for equality, although this > is a very useful comparison to make. This is a known missing feature due to the fact that comparisons use ufuncs but ufuncs are not supported for variable-length arrays. Currently, however, you can use the chararray class which does allow comparisons of strings. There are simple ways to work around this, of course. If you do have 'S1' arrays, then you can simply view them as unsigned bytes (using the .view method) and do comparison that way. If s1 and s2 are "character arrays":

    s1.view(ubyte) >= s2.view(ubyte)

-Travis From tim.hochberg at cox.net Wed Apr 12 15:15:05 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 12 15:15:05 2006 Subject: [Numpy-discussion] Massive differences in numpy vs. numeric string handling In-Reply-To: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com> References: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com> Message-ID: <443D7B74.6040808@cox.net> Jeremy Gore wrote: > In Numeric: > > Numeric.array('test') -> array([t, e, s, t],'c'); shape = (4,) > Numeric.array(['test','two']) -> > array([[t, e, s, t], > [t, w, o, ]],'c') > > but in numpy: > > numpy.array('test') -> array('test', dtype='|S4'); shape = () > numpy.array('test','S1') -> array('t', dtype='|S1'); shape = () > > in fact you have to do an extra list cast: > > numpy.array(list('test'),'S1') -> array([t, e, s, t], dtype='|S1'); > shape = (4,) The creation of arrays from python objects is full of all kinds of weird special cases. For numerical arrays this works pretty well, but for other sorts of arrays, like strings and even worse, objects, it's impossible to always guess the correct kind of thing to return. I'll leave it to the various string array users to battle it out over what's the right way to convert strings. However, in the meantime or if you do not prevail in this debate, I suggest you slap an appropriate three line function into your code somewhere. If all you care about is the interface issue, use:

    def chararray(astring):
        return numpy.array(list(astring), 'S1')

If you are worried about the performance of this, you could use the more cryptic, but more efficient:

    def chararray(astring):
        a = numpy.array(astring)
        return numpy.ndarray([len(astring)], 'S1', a.data)

Perhaps these will let you sleep at night. Regards, -tim > > to get the desired result. I don't think this is very pythonic, as > strings are fully indexable and iterable objects. Furthermore, > converting/treating a string as an array of characters is a very > common thing. convertcode.py would not appear to convert this part > of the code correctly either. Also, the use of quotes in the shape > () array but not in the shape (4,) array is inconsistent.
> > I realize the ability to use strings of arbitrary length as array > elements is important in numpy, but there really should be a more > natural option to convert/cast strings as character arrays. > > Also, unlike Numeric.equal and 'c' arrays, numpy.equal cannot compare > '|S1' arrays or presumably other strings for equality, although this > is a very useful comparison to make. > > For the record, I have used the Numeric (and to a lesser degree the > numarray) module extensively in bioinformatics applications for its > speed and brevity. > > Jeremy > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > that extends applications into web and mobile media. Attend the live > webcast > and join the prime developer group breaking into this new coding > territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > From oliphant at ee.byu.edu Wed Apr 12 15:16:01 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 12 15:16:01 2006 Subject: [Numpy-discussion] [SciPy-user] Regarding what "where" returns In-Reply-To: References: <443C0D36.80608@enthought.com> <443D39F6.6040805@enthought.com> <443D601E.3020500@enthought.com> Message-ID: <443D7BD7.3060007@ee.byu.edu> Perry Greenfield wrote: >We've noticed that in numpy that the where() function behaves >differently than for numarray. In numarray, where() (when used with a >mask or condition array only) always returns a tuple of index arrays, >even for the 1D case whereas numpy returns an index array for the 1D >case and a tuple for higher dimension cases. While the tuple is a >annoyance for users when they want to manipulate the 1D case, the >benefit is that one always knows that where is returning a tuple, and >thus can write code accordingly. The problem with the current numpy >behavior is that it requires special case testing to see which kind >return one has before manipulating if you aren't certain of what the >dimensionality of the argument is going to be. > > I think this is reasonable. I don't think much thought went in to the current behavior as it simply defaults to the behavior of the nonzero method (where just defaults to nonzero in the circumstances you are describing). The nonzero method has it's behavior because of the nonzero function in Numeric (which only worked with 1-d and returned an array not a tuple). Ideally, I think we should fix the nonzero method and where to have the same behavior (both return tuples --- that's actually what the docstring of nonzero says right now). The nonzero function can be special-cased to index the tuple for backward compatibility. -Travis From tim.hochberg at cox.net Wed Apr 12 15:32:04 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 12 15:32:04 2006 Subject: [Numpy-discussion] Massive differences in numpy vs. 
numeric string handling In-Reply-To: <443D7939.2060406@ee.byu.edu> References: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com> <443D7939.2060406@ee.byu.edu> Message-ID: <443D7F5E.1020007@cox.net> Travis Oliphant wrote: > Jeremy Gore wrote: > >> In Numeric: >> >> Numeric.array('test') -> array([t, e, s, t],'c'); shape = (4,) >> Numeric.array(['test','two']) -> >> array([[t, e, s, t], >> [t, w, o, ]],'c') >> >> but in numpy: >> >> numpy.array('test') -> array('test', dtype='|S4'); shape = () >> numpy.array('test','S1') -> array('t', dtype='|S1'); shape = () >> >> in fact you have to do an extra list cast: >> >> numpy.array(list('test'),'S1') -> array([t, e, s, t], dtype='|S1'); >> shape = (4,) >> >> to get the desired result. I don't think this is very pythonic, as >> strings are fully indexable and iterable objects. > > > > Let's not cast this discussion in Pythonic vs. un-pythonic because > that does not really shed light on the issues. > > NumPy adds full support for string arrays. Numeric had this > step-child called a character array which was really just an array of > bytes that printed differently. > This does raise some compatibility issues that have been hard to get > exactly right, and convertcode indeed does not really solve the > problem for a heavy character-array user. I have resisted simply > adding back a 1-character string data-type back into NumPy, but that > could be done if it is really necessary. But, I don't think it is. > >> Furthermore, converting/treating a string as an array of >> characters is a very common thing. convertcode.py would not appear >> to convert this part of the code correctly either. Also, the use of >> quotes in the shape () array but not in the shape (4,) array is >> inconsistent. > > >> >> >> I realize the ability to use strings of arbitrary length as array >> elements is important in numpy, but there really should be a more >> natural option to convert/cast strings as character arrays. > > > Perhaps all that is needed to simplify handling is to handle the 'S1' > case better so that > > array('test','S1') works the same as array('test','c') used to work > (i.e. not stopping at strings for the sequence decomposition). It seems a little wacky that 'S2' and 'S1' would have vastly different behaviour. >> >> Also, unlike Numeric.equal and 'c' arrays, numpy.equal cannot >> compare '|S1' arrays or presumably other strings for equality, >> although this is a very useful comparison to make. > > > This is a known missing feature due to the fact that comparisons use > ufuncs but ufuncs are not supported for variable-length arrays. > Currently, however you can use the chararray class which does allow > comparisons of strings. It seems like this should be easy to worm around in __cmp__ (or array_compare or however it's spelled). Since the strings really have a fixed length, they're more or less equivalent to byte arrays with one extra dimension. Writing a little lexographic comparison thing on top of the results of a ufunc operating on the result of a compare of these byte arrays should be a piece of cake; in theory at least. > > There are simple ways to work around this, of course. If you do have > 'S1' arrays, then you can simply view them as unsigned bytes (using > the .view method) and do comparison that way. > if s1 and s2 are "character arrays" > > s1.view(ubyte) >= s2.view(ubyte) Nice! 
Regards, -tim From oliphant at ee.byu.edu Wed Apr 12 15:47:04 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 12 15:47:04 2006 Subject: ***[Possible UCE]*** Re: [Numpy-discussion] Massive differences in numpy vs. numeric string handling In-Reply-To: <443D7F5E.1020007@cox.net> References: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com> <443D7939.2060406@ee.byu.edu> <443D7F5E.1020007@cox.net> Message-ID: <443D8336.60606@ee.byu.edu> Tim Hochberg wrote: > > It seems a little wacky that 'S2' and 'S1' would have vastly different > behaviour. True. Much better is a compatibility function such as the one you gave. >> This is a known missing feature due to the fact that comparisons use >> ufuncs but ufuncs are not supported for variable-length arrays. >> Currently, however you can use the chararray class which does allow >> comparisons of strings. > > > It seems like this should be easy to worm around in __cmp__ (or > array_compare or however it's spelled). Since the strings really have > a fixed length, they're more or less equivalent to byte arrays with > one extra dimension. Writing a little lexographic comparison thing on > top of the results of a ufunc operating on the result of a compare of > these byte arrays should be a piece of cake; in theory at least. Yes, indeed it could be handled there as well. It's the rich_compare function (all the cases are handled there...). Right now, equality testing is special-cased a bit (inheriting behavior from Numeric). I've gone back and forth on whether I should put effort into handling variable-length arrays with ufuncs (which might be better long-term --- or just an example of feature bloat as I can't think of many use cases except this one), or just special-case the needed comparisons (which would take less thought to implement). I'm leaning towards the latter case --- special-case comparison of string arrays in the rich_compare function. The next thing to think about is then Unicode arrays. The problem with comparisons on unicode arrays though is "how do you compare unicode strings" in a meaningful way (i.e. what is alphabetical?). -Travis From oliphant at ee.byu.edu Wed Apr 12 15:56:03 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 12 15:56:03 2006 Subject: [Numpy-discussion] Re: ***[Possible UCE]*** [SciPy-user] Regarding what "where" returns In-Reply-To: References: <443C0D36.80608@enthought.com> <443D39F6.6040805@enthought.com> <443D601E.3020500@enthought.com> Message-ID: <443D857F.9000605@ee.byu.edu> Perry Greenfield wrote: >We've noticed that in numpy that the where() function behaves >differently than for numarray. In numarray, where() (when used with a >mask or condition array only) always returns a tuple of index arrays, >even for the 1D case whereas numpy returns an index array for the 1D >case and a tuple for higher dimension cases. While the tuple is a >annoyance for users when they want to manipulate the 1D case, the >benefit is that one always knows that where is returning a tuple, and >thus can write code accordingly. The problem with the current numpy >behavior is that it requires special case testing to see which kind >return one has before manipulating if you aren't certain of what the >dimensionality of the argument is going to be. > > I went ahead and made this change to the code. The nonzero function still behaves as before (and in fact only works for 1-d arrays as it did in Numeric). The where(condition) function works the same as condition.nonzero() and both always return a tuple. 
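For example, a quick sketch of the now-uniform behavior (1-d case shown):

import numpy

cond = numpy.array([0, 3, 0, 5]) > 0
out = numpy.where(cond)       # always a tuple of index arrays
assert isinstance(out, tuple)
idx, = out                    # unpack the 1-d case explicitly
print(idx)                    # -> [1 3]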
I had to change exactly one piece of code that used the new where syntax. This does represent a code breakage with the where syntax (but only if you used the newer, numarray-introduced usage). I think this is a small-enough segment that we can make this change. -Travis From robert.kern at gmail.com Wed Apr 12 15:57:06 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed Apr 12 15:57:06 2006 Subject: [Numpy-discussion] Re: Massive differences in numpy vs. numeric string handling In-Reply-To: <443D7B74.6040808@cox.net> References: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com> <443D7B74.6040808@cox.net> Message-ID: Tim Hochberg wrote: > Jeremy Gore wrote: > >> In Numeric: >> >> Numeric.array('test') -> array([t, e, s, t],'c'); shape = (4,) >> Numeric.array(['test','two']) -> >> array([[t, e, s, t], >> [t, w, o, ]],'c') >> >> but in numpy: >> >> numpy.array('test') -> array('test', dtype='|S4'); shape = () >> numpy.array('test','S1') -> array('t', dtype='|S1'); shape = () >> >> in fact you have to do an extra list cast: >> >> numpy.array(list('test'),'S1') -> array([t, e, s, t], dtype='|S1'); >> shape = (4,) > > The creation of arrays from python objects is full of all kinds of weird > special cases. For numerical arrays this is works pretty well , but for > other sorts of arrays, like strings and even worse, objects, it's > impossible to always guess the correct kind of thing to return. I'll > leave it to the various string array users to battle it out over what's > the right way to convert strings. However, in the meantime or if you do > not prevail in this debate, I suggest you slap an appropriate three line > function into your code somewhere. I would suggest this way of thinking about it: numpy.array() shouldn't have to handle every possible way to construct an array. People building less-common arrays from less-common Python objects may have to use a different constructor if they want to do so in a natural way. Implementing every possible combination in numpy.array() *and* making it intuitive and readable are incommensurate goals, in my opinion. > If all you care about is the interface issues use: > > def chararray(astring): > return numpy.array(list(astring), 'S1') > > If you are worried about the performance of this, you could use the more > cryptic, but more efficient: > > def chararray(astring): > a = numpy.array(astring) > return numpy.ndarray([len(astring)], 'S1', a.data) Better: In [31]: fromstring('test', dtype('S1')) Out[31]: array([t, e, s, t], dtype='|S1') There's still the issue of N-D arrays of character, though. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From oliphant at ee.byu.edu Wed Apr 12 17:04:05 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 12 17:04:05 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy Message-ID: <443D9543.8040601@ee.byu.edu> The next release of NumPy will be 0.9.8 Before this release is made, I want to make sure the following tickets are implemented http://projects.scipy.org/scipy/numpy/ticket/54 http://projects.scipy.org/scipy/numpy/ticket/55 http://projects.scipy.org/scipy/numpy/ticket/56 Once 0.9.8 is out, I'd like to name the next release NumPy 1.0 Release Candidate 1 and have a series of release candidates so that hopefully by SciPy 2006 conference, NumPy 1.0 is out. 
This also dove-tails nicely with the Python 2.5 release schedule so that NumPy 1.0 should work with Python 2.5 and be fully 64-bit capable for handling very-large arrays. The recent discussions and bug-reports have been very helpful. If you have found a bug, please report it on the Trac pages so that we don't lose sight of it. Report bugs by "submitting a ticket" here: http://projects.scipy.org/scipy/numpy/newticket -Travis From oliphant at ee.byu.edu Wed Apr 12 17:11:04 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 12 17:11:04 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443D9543.8040601@ee.byu.edu> References: <443D9543.8040601@ee.byu.edu> Message-ID: <443D96DC.3060501@ee.byu.edu> Travis Oliphant wrote: > > The next release of NumPy will be 0.9.8 > > Before this release is made, I want to make sure the following > tickets are implemented > > http://projects.scipy.org/scipy/numpy/ticket/54 > http://projects.scipy.org/scipy/numpy/ticket/55 > http://projects.scipy.org/scipy/numpy/ticket/56 So you don't have to read each one individually: #54 : implement thread-based error-handling modes #55 : finish scalar-math implementation which recognizes same error-handling #56 : implement rich_comparisons on string arrays and unicode arrays. -Travis From robert.kern at gmail.com Wed Apr 12 17:19:07 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed Apr 12 17:19:07 2006 Subject: [Numpy-discussion] Re: Toward release 1.0 of NumPy In-Reply-To: <443D9543.8040601@ee.byu.edu> References: <443D9543.8040601@ee.byu.edu> Message-ID: Travis Oliphant wrote: > > The next release of NumPy will be 0.9.8 I have added a "0.9.8 Release" milestone to the Trac and have scheduled all of these tickets for that milestone. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From tim.hochberg at cox.net Wed Apr 12 17:59:12 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 12 17:59:12 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443D96DC.3060501@ee.byu.edu> References: <443D9543.8040601@ee.byu.edu> <443D96DC.3060501@ee.byu.edu> Message-ID: <443DA1B1.8040406@cox.net> Travis Oliphant wrote: > Travis Oliphant wrote: > >> >> The next release of NumPy will be 0.9.8 >> >> Before this release is made, I want to make sure the following >> tickets are implemented >> >> http://projects.scipy.org/scipy/numpy/ticket/54 >> http://projects.scipy.org/scipy/numpy/ticket/55 >> http://projects.scipy.org/scipy/numpy/ticket/56 > > > > So you don't have to read each one individually: > > > #54 : implement thread-based error-handling modes > #55 : finish scalar-math implementation which recognizes same > error-handling > #56 : implement rich_comparisons on string arrays and unicode arrays. I'll help with #54 at least, since I was the complainer, er I mean, since I brought that one up. It's probably better to get that started before #55 anyway. The open issues that I see connected to this are: 1. Better support for catching integer divide by zero. That doesn't work at all here, I'm guessing because my optimizer is too smart. I spent a half hour this morning trying how to set the divide by zero flag directly using VC7, but I couldn't find anything. I suppose I could see if there's some pragma to turn off optimization around that one function. 
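In the meantime, here is a rough pure-Python sketch of the kind of per-thread
mode storage I have in mind (push_mode/pop_mode are illustrative names only,
not proposed API):

import threading

_state = threading.local()

def _modes():
    # one stack per thread, created lazily on first use in that thread
    if not hasattr(_state, 'stack'):
        _state.stack = [{'divide': 'ignore', 'over': 'ignore',
                         'under': 'ignore', 'invalid': 'ignore'}]
    return _state.stack

def push_mode(**kwds):
    # copy the current top so unspecified settings are inherited
    top = dict(_modes()[-1])
    top.update(kwds)
    _modes().append(top)

def pop_mode():
    # never pop the per-thread defaults
    if len(_modes()) > 1:
        _modes().pop()

def current_mode():
    return _modes()[-1]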
However, I'm interested in what you think of stuffing the integer divide by
zero information directly into a flag on the thread local object and then
checking it on the way out. This is cleaner in that it doesn't rely on
platform specific flag setting ifdeffery and it allows us to consider issue #2.

2. Breaking integer divide by zero out from floating point divide by zero.
The former is more serious in that it's silent. The latter returns INF, so
you can see that something happened by examining your results, while the
former returns zero. That has much more potential for confusion and silent
bugs. Thus, it seems reasonable to be able to set the error handling
differently for integer divide by zero and floating point divide by zero.
Note that this would allow integer divide by zero to be set to 'raise' and
still run all the FP ops at max speed, since the flag saying do no error
checking could ignore the int_divide_by_zero setting.

3. Tossing out the overflow checking on integer operations. It's incomplete
anyway and it slows things down. I don't really expect my integer operations
to be overflow checked, and personally I think that incomplete checking is
worse than no checking. I think we should at least disable the support for
the time being and possibly revisit this later when we have time to do a
complete job and if it seems necessary.

4. Different defaults. I'd like to enable different defaults without slowing
things down in the really super fast case.

Looking at this list now, it looks like only #4 needs to be addressed when
doing the initial implementation of the thread local error handling, and
even that one can be done in parallel, so I guess we should just start with
creating the thread local object and see what happens. If you like I can
start working on this, although I may not be able to get much done on it
till Monday.

Regards,

-tim

From simon at arrowtheory.com Wed Apr 12 18:17:03 2006
From: simon at arrowtheory.com (Simon Burton)
Date: Wed Apr 12 18:17:03 2006
Subject: [Numpy-discussion] index objects are not broadcastable to a single shape
Message-ID: <20060413111612.3bb4e6fc.simon@arrowtheory.com>

This must be up there with the most useless confusing error messages:

>>> a=numpy.array([1,2,3])
>>> b=numpy.array([1,2,3,4])
>>> a*b
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ValueError: index objects are not broadcastable to a single shape
>>>

Simon.

--
Simon Burton, B.Sc.
Licensed PO Box 8066
ANU Canberra 2601
Australia
Ph.
61 02 6249 6940 http://arrowtheory.com From oliphant at ee.byu.edu Wed Apr 12 18:25:03 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 12 18:25:03 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443DA1B1.8040406@cox.net> References: <443D9543.8040601@ee.byu.edu> <443D96DC.3060501@ee.byu.edu> <443DA1B1.8040406@cox.net> Message-ID: <443DA866.3090806@ee.byu.edu> Tim Hochberg wrote: > Travis Oliphant wrote: > >> Travis Oliphant wrote: >> >>> >>> The next release of NumPy will be 0.9.8 >>> >>> Before this release is made, I want to make sure the following >>> tickets are implemented >>> >>> http://projects.scipy.org/scipy/numpy/ticket/54 >>> http://projects.scipy.org/scipy/numpy/ticket/55 >>> http://projects.scipy.org/scipy/numpy/ticket/56 >> >> >> >> >> So you don't have to read each one individually: >> >> >> #54 : implement thread-based error-handling modes >> #55 : finish scalar-math implementation which recognizes same >> error-handling >> #56 : implement rich_comparisons on string arrays and unicode arrays. > > > I'll help with #54 at least, since I was the complainer, er I mean, > since I brought that one up. It's probably better to get that started > before #55 anyway. The open issues that I see connected to this are: Great. I agree that #54 needs to be done before #55 (error handling is what's been holding up #55 the whole time. > > 1. Better support for catching integer divide by zero. That doesn't > work at all here, Probably a platform/compiler issue. The numarray equivalent code had an if statement to prevent the compiler from optimizing it away. Perhaps we need to do something like that. Also, perhaps VC7 has some means to set the divide by zero error more directly and we can just use that. > I'm guessing because my optimizer is too smart. I spent a half hour > this morning trying how to set the divide by zero flag directly using > VC7, but I couldn't find anything. I suppose I could see if there's > some pragma to turn off optimization around that one function. > However, I'm interested in what you think of stuffing the integer > divide by zero information directly into a flag on the thread local > object and then checking it on the way out. Hmm.. The only issue is that dictionary look-ups are more expensive then register look-ups. This could be costly. > This is cleaner in that it doesn't rely on platform specific flag > setting ifdeffery and it allows us to consider issue #2. > > 2. Breaking integer divide by zero out from floating point divide > by zero. The former is more serious in that it's silent. The latter > returns INF, so you can see that something happened by examing your > results, while the former returns zero. That has much more potential > for confusion and silents bugs. Thus, it seems reasonable to be able > to set the error handling different for integer divide by zero and > floating point divide by zero. Note that this would allow integer > divide by zero to be set to 'raise' and still run all the FP ops at > max speed, since the flag saying do no error checking could ignore the > int_divide_by_zero setting. Interesting proposal. Yes, it is true that integer division returning zero is less well-justified. But, I'm still concerned with doing a dictionary lookup for every divide-by-zero, and (more importantly) to check to see if a divide-by-zero has occurred. The dictionary lookups is the largest source of small-array slow-down when comparing Numeric to NumPy. > > 3. 
Tossing out the overflow checking on integer operations. It's > incomplete anyway and it slows things down. I don't really expect my > integer operations to be overflow checked, and personally I think that > incomplete checking is worse than no checking. I think we should at > least disable the support for the time being and possibly revisit this > latter when we have time to do a complete job and if it seems necessary. I'm all for that. I think it makes the code slower and because it is incomplete (addition and subtraction don't do it), it makes for harder-to-explain code. On the scalar operations, we should check for over-flow, however... > > 4. Different defaults I'd like to enable different defaults without > slowing things down in the really super fast case. The discussion on different defaults is fine. The slow-down is that with the current defaults, the error register flags are not actually checked if the default has not been changed. With the numarray-defaults, the register flags would be checked at the end of each 1-d loop. I'm not sure what kind of slow-down that would bring. Certainly for 1-d cases, there would be little difference. One could actually simply store different defaults (but it would result in minor slow-downs because the register flags would be checked. -Travis From oliphant at ee.byu.edu Wed Apr 12 18:30:03 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 12 18:30:03 2006 Subject: [Numpy-discussion] index objects are not broadcastable to a single shape In-Reply-To: <20060413111612.3bb4e6fc.simon@arrowtheory.com> References: <20060413111612.3bb4e6fc.simon@arrowtheory.com> Message-ID: <443DA966.1020301@ee.byu.edu> Simon Burton wrote: >This must be up there with the most useless confusing error messages: > > > >>>>a=numpy.array([1,2,3]) >>>>b=numpy.array([1,2,3,4]) >>>>a*b >>>> >>>> >Traceback (most recent call last): > File "", line 1, in ? >ValueError: index objects are not broadcastable to a single shape > > > > > The problem with these error messages is that some code is used in a wide-variety of circumstances. The original error message was conceived in thinking about the application of the code to one circumstance while this particular error is occurring in a different one. The standard behavior is to just propagate the error up. Better error messages means catching a lot more errors and special-casing error messages. It can be done, but it's tedious work. -Travis From simon at arrowtheory.com Wed Apr 12 20:34:04 2006 From: simon at arrowtheory.com (Simon Burton) Date: Wed Apr 12 20:34:04 2006 Subject: [Numpy-discussion] index objects are not broadcastable to a single shape In-Reply-To: <443DA966.1020301@ee.byu.edu> References: <20060413111612.3bb4e6fc.simon@arrowtheory.com> <443DA966.1020301@ee.byu.edu> Message-ID: <20060413133326.2889a5c5.simon@arrowtheory.com> On Wed, 12 Apr 2006 19:29:10 -0600 Travis Oliphant wrote: > The problem with these error messages is that some code is used in a > wide-variety of circumstances. The original error message was conceived > in thinking about the application of the code to one circumstance while > this particular error is occurring in a different one. > > The standard behavior is to just propagate the error up. Better error > messages means catching a lot more errors and special-casing error > messages. It can be done, but it's tedious work. OK. Can the error message be a little more generic, longer, etc. ? "shape mismatch (index objects are not broadcastable to a single shape)" ? I don't know either. 
I'm just thinking about all the new numpy/python users at work here that I will need to hand hold. Error messages like this are pretty scary. Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From tim.hochberg at cox.net Wed Apr 12 21:59:01 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 12 21:59:01 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443DA866.3090806@ee.byu.edu> References: <443D9543.8040601@ee.byu.edu> <443D96DC.3060501@ee.byu.edu> <443DA1B1.8040406@cox.net> <443DA866.3090806@ee.byu.edu> Message-ID: <443DD9D9.9080004@cox.net> Travis Oliphant wrote: > Tim Hochberg wrote: > >> Travis Oliphant wrote: >> >>> Travis Oliphant wrote: >>> >>>> >>>> The next release of NumPy will be 0.9.8 >>>> >>>> Before this release is made, I want to make sure the following >>>> tickets are implemented >>>> >>>> http://projects.scipy.org/scipy/numpy/ticket/54 >>>> http://projects.scipy.org/scipy/numpy/ticket/55 >>>> http://projects.scipy.org/scipy/numpy/ticket/56 >>> >>> >>> >>> >>> >>> So you don't have to read each one individually: >>> >>> >>> #54 : implement thread-based error-handling modes >>> #55 : finish scalar-math implementation which recognizes same >>> error-handling >>> #56 : implement rich_comparisons on string arrays and unicode arrays. >> >> >> >> I'll help with #54 at least, since I was the complainer, er I mean, >> since I brought that one up. It's probably better to get that started >> before #55 anyway. The open issues that I see connected to this are: > > > Great. I agree that #54 needs to be done before #55 (error handling > is what's been holding up #55 the whole time. > >> >> 1. Better support for catching integer divide by zero. That >> doesn't work at all here, > > > Probably a platform/compiler issue. The numarray equivalent code had > an if statement to prevent the compiler from optimizing it away. > Perhaps we need to do something like that. Also, perhaps VC7 has > some means to set the divide by zero error more directly and we can > just use that. > >> I'm guessing because my optimizer is too smart. I spent a half hour >> this morning trying how to set the divide by zero flag directly using >> VC7, but I couldn't find anything. I suppose I could see if there's >> some pragma to turn off optimization around that one function. >> However, I'm interested in what you think of stuffing the integer >> divide by zero information directly into a flag on the thread local >> object and then checking it on the way out. > > > > Hmm.. The only issue is that dictionary look-ups are more expensive > then register look-ups. This could be costly. > > >> This is cleaner in that it doesn't rely on platform specific flag >> setting ifdeffery and it allows us to consider issue #2. >> >> 2. Breaking integer divide by zero out from floating point divide >> by zero. The former is more serious in that it's silent. The latter >> returns INF, so you can see that something happened by examing your >> results, while the former returns zero. That has much more potential >> for confusion and silents bugs. Thus, it seems reasonable to be able >> to set the error handling different for integer divide by zero and >> floating point divide by zero. Note that this would allow integer >> divide by zero to be set to 'raise' and still run all the FP ops at >> max speed, since the flag saying do no error checking could ignore >> the int_divide_by_zero setting. > > > > Interesting proposal. 
Yes, it is true that integer division
> returning zero is less well-justified. But, I'm still concerned with
> doing a dictionary lookup for every divide-by-zero, and (more
> importantly) to check to see if a divide-by-zero has occurred. The
> dictionary lookups is the largest source of small-array slow-down when
> comparing Numeric to NumPy.

Well, assuming that we can fix the error flag setting code here, we could
still break the divide by zero error handling out by doing some special
casing in the ufunc machinery since the ufuncs presumably can figure out
their own types. Still, the thread local storage option is cleaner if we
can figure out a way to make the dictionary lookups fast enough.

The lookup in the failing case is not a big deal I don't think. First, it's
normally an error so I don't mind introducing some slowing. Second, it
should be easy to only do the lookup once. Just have a flag that ensures
that after the first lookup, the divide by zero flag is not set a second
time. I guess the bigger issue is the lookup on the way out to see if
anything failed. I have a plan, which I'll present at the bottom.

>>
>> 3. Tossing out the overflow checking on integer operations. It's
>> incomplete anyway and it slows things down. I don't really expect my
>> integer operations to be overflow checked, and personally I think
>> that incomplete checking is worse than no checking. I think we should
>> at least disable the support for the time being and possibly revisit
>> this later when we have time to do a complete job and if it seems
>> necessary.
>
> I'm all for that. I think it makes the code slower and because it is
> incomplete (addition and subtraction don't do it), it makes for
> harder-to-explain code.
>
> On the scalar operations, we should check for over-flow, however...

OK.

>
>> 4. Different defaults. I'd like to enable different defaults without
>> slowing things down in the really super fast case.
>
> The discussion on different defaults is fine. The slow-down is that
> with the current defaults, the error register flags are not actually
> checked if the default has not been changed. With the
> numarray-defaults, the register flags would be checked at the end of
> each 1-d loop. I'm not sure what kind of slow-down that would
> bring. Certainly for 1-d cases, there would be little difference.
>
> One could actually simply store different defaults (but it would
> result in minor slow-downs because the register flags would be checked.

OK, here's my plan. It sounds like it will work, but this threading
business is always tricky so find holes in it if you can.

1. As we've discussed we grow some thread local storage. This storage has
flags check_divide, check_over, check_under, check_invalid and
check_int_divide. It also has a flag int_divide_err. These flags are
initialized to False, but then may immediately be set to a different
default value. This is to simplify #3.

2. We grow 6 static longs that correspond to the above and are initialized
to zero. They should be called check_divide_count, etc. or something
similar.

3. Whenever a flag is switched from False to True its corresponding global
is incremented. Similarly, when switched from True to False the global is
decremented.

4. When a divide by integer zero occurs, we check the int_divide_err flag.
If it is false, we set it to true and also increment int_divide_err_count.
We also set a local flag so that we don't do this again in that call to the
ufunc core function.
We can actually skip this whole step if check_int_divide_count is zero.

With all that in place, I think we should be able to do things efficiently.
The ufunc can check whether any of the XXX_check_counts are nonzero and
turn on register flag checking as appropriate. If an error occurs, it still
only has to go to the per thread dictionary if the count for that
particular error type is nonzero. Similarly, if the count
int_divide_err_count is nonzero, the ufunc will have to go to the
dictionary. If the error was set in this thread, then appropriate action
(including possibly nothing) is taken and int_divide_err_count is
decremented.

That all sounds more complicated than it really is, at least in my head ;)
Anyway, try to find the holes in it. It should be able to run at full speed
if you turn off error checking in all threads. It should run at almost full
speed as long as there aren't any errors that are being checked in *any
thread*. I think in practice this means that most of the speed hit that is
seen in numarray won't be here. It doesn't actually matter what the
defaults are; turning off all error checking will still be fast.

Regards,

-tim

From arnd.baecker at web.de Thu Apr 13 00:58:04 2006
From: arnd.baecker at web.de (Arnd Baecker)
Date: Thu Apr 13 00:58:04 2006
Subject: [Numpy-discussion] Massive differences in numpy vs. numeric string handling
In-Reply-To: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com>
References: <3C3B42A3-962D-4F34-B704-AB1BF0E2390A@gmail.com>
Message-ID:

On Wed, 12 Apr 2006, Jeremy Gore wrote:

> In Numeric:
[...]
> but in numpy:
[...]
> For the record, I have used the Numeric (and to a lesser degree the
> numarray) module extensively in bioinformatics applications for its
> speed and brevity.

If (after this round of discussion) there remain any differences, it would
be good if you could add them to the wiki at
http://www.scipy.org/Converting_from_Numeric

Best, Arnd

P.S.: The same applies of course to any other differences which show up!

From svetosch at gmx.net Thu Apr 13 01:20:02 2006
From: svetosch at gmx.net (Sven Schreiber)
Date: Thu Apr 13 01:20:02 2006
Subject: [Numpy-discussion] Toward release 1.0 of NumPy
In-Reply-To: <443D9543.8040601@ee.byu.edu>
References: <443D9543.8040601@ee.byu.edu>
Message-ID: <443E096D.3040407@gmx.net>

Travis Oliphant wrote:
>
> The next release of NumPy will be 0.9.8
>
> The recent discussions and bug-reports have been very helpful. If you
> have found a bug, please report it on the Trac pages so that we don't
> lose sight of it.
> Report bugs by "submitting a ticket" here: > Before submitting the following as a bug, I would like to repeat what I posted earlier (no replies) to check whether you agree it's a bug: The "kron" (Kronecker product) function returns numpy-arrays even if both arguments are numpy-matrices; imho that's a bug in light of the proclaimed goal of preserving matrices where possible/sensible. On a related issue, "eye" also still returns a numpy-array instead of a numpy-matrix. At least one person (I think it was Ed Schofield) agreed that it would be better to return a numpy-matrix, given that another function ("identity") already returns a numpy-array. Currently, one of the two functions seems redundant. So unless somebody tells me otherwise, I will submit these two things as bugs/tickets. Great that numpy soon will be officially stable! Cheers, Sven From pgmdevlist at mailcan.com Thu Apr 13 01:41:02 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Thu Apr 13 01:41:02 2006 Subject: [Numpy-discussion] range/arange Message-ID: <200604130507.40241.pgmdevlist@mailcan.com> Folks, Could any of you explain me why the two following commands give different results ? It's mere curiosity, for my personal edification. [(m-5)/10 for m in arange(1,10)] [0, 0, 0, 0, 0, 0, 0, 0, 0] [(m-5)/10 for m in range(1,10)] [-1, -1, -1, -1, 0, 0, 0, 0, 0] From lars.bittrich at googlemail.com Thu Apr 13 02:30:01 2006 From: lars.bittrich at googlemail.com (Lars Bittrich) Date: Thu Apr 13 02:30:01 2006 Subject: [Numpy-discussion] range/arange In-Reply-To: <200604130507.40241.pgmdevlist@mailcan.com> References: <200604130507.40241.pgmdevlist@mailcan.com> Message-ID: <200604131123.56171.lars.bittrich@googlemail.com> Hi, On Thursday 13 April 2006 11:07, Pierre GM wrote: > Could any of you explain me why the two following commands give different > results ? It's mere curiosity, for my personal edification. > > [(m-5)/10 for m in arange(1,10)] > [0, 0, 0, 0, 0, 0, 0, 0, 0] > > [(m-5)/10 for m in range(1,10)] > [-1, -1, -1, -1, 0, 0, 0, 0, 0] I have no idea where the reason is located exactly, but it seems to be caused by different types of range and arange. In [15]:type(arange(1,10)[0]) Out[15]: In [14]:type(range(1,10)[0]) Out[14]: If you use for example: In [16]:-1/10 Out[16]:-1 you get the normal behavior of the "floor" function. In [17]:floor(-.1) Out[17]:-1.0 The behavior of int32scalar seems more intuitive to me. Best regards, Lars From robert.kern at gmail.com Thu Apr 13 05:17:05 2006 From: robert.kern at gmail.com (Robert Kern) Date: Thu Apr 13 05:17:05 2006 Subject: [Numpy-discussion] Re: range/arange In-Reply-To: <200604130507.40241.pgmdevlist@mailcan.com> References: <200604130507.40241.pgmdevlist@mailcan.com> Message-ID: Pierre GM wrote: > Folks, > Could any of you explain me why the two following commands give different > results ? It's mere curiosity, for my personal edification. > > [(m-5)/10 for m in arange(1,10)] > [0, 0, 0, 0, 0, 0, 0, 0, 0] > > [(m-5)/10 for m in range(1,10)] > [-1, -1, -1, -1, 0, 0, 0, 0, 0] Python's rule for integer division is to round towards negative infinity. C's rule (if it has one; I think it may be platform dependent) is to round towards 0. When it comes to arithmetic, numpy tends to expose the C behavior because it's fastest. As Lars pointed out, the type of the object that you get from iterating over an array is a numpy int32scalar object, so the numpy behavior is used. 
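Here is a self-contained way to see the two conventions side by side (pure
Python; C99-style truncation emulated with math.trunc):

import math

def c_style_div(a, b):
    # C99 integer division: take the exact quotient, truncate towards zero
    return math.trunc(float(a) / b)

print([(m - 5) // 10 for m in range(1, 10)])
# Python floor division: [-1, -1, -1, -1, 0, 0, 0, 0, 0]
print([c_style_div(m - 5, 10) for m in range(1, 10)])
# C-style truncation:    [0, 0, 0, 0, 0, 0, 0, 0, 0]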
-- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From fullung at gmail.com Thu Apr 13 05:18:04 2006 From: fullung at gmail.com (Albert Strasheim) Date: Thu Apr 13 05:18:04 2006 Subject: [Numpy-discussion] Segfault when indexing on second or higher dimension with list or tuple Message-ID: <20060413121710.GA30372@dogbert.sdsl.sun.ac.za> Hello all, The following segfault bug was discovered in NumPy 0.9.7.2348 by someone at our Python workshop: import numpy as N F = N.zeros((1,1)) F[:,[0]] = 0 The following also segfaults: F[:,(0,)] = 0 Something seems to go wrong when one uses a tuple or a list to index into a NumPy array on the second or higher dimension, since the following code works: F = N.zeros((1,)) F[[0]] = 0 The Trac ticket is here: http://projects.scipy.org/scipy/numpy/ticket/59 If someone gets around to fixing this, please include some test cases. Thanks! Regards, Albert From cimrman3 at ntc.zcu.cz Thu Apr 13 05:24:02 2006 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu Apr 13 05:24:02 2006 Subject: [Numpy-discussion] Re: ***[Possible UCE]*** [SciPy-user] Regarding what "where" returns In-Reply-To: <443D857F.9000605@ee.byu.edu> References: <443C0D36.80608@enthought.com> <443D39F6.6040805@enthought.com> <443D601E.3020500@enthought.com> <443D857F.9000605@ee.byu.edu> Message-ID: <443E42A2.80402@ntc.zcu.cz> Travis Oliphant wrote: > I went ahead and made this change to the code. The nonzero function > still behaves as before (and in fact only works for 1-d arrays as it did > in Numeric). > > The where(condition) function works the same as condition.nonzero() and > both always return a tuple. So, for 1-d arrays, using 'nonzero( condition )' should be faster than 'where( condition )[0]', right? r. From charlesr.harris at gmail.com Thu Apr 13 05:35:13 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu Apr 13 05:35:13 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443E096D.3040407@gmx.net> References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> Message-ID: Sven, On 4/13/06, Sven Schreiber wrote: > > Travis Oliphant schrieb: > > > > The next release of NumPy will be 0.9.8 > > > > > The recent discussions and bug-reports have been very helpful. If you > > have found a bug, please report it on the Trac pages so that we don't > > lose sight of it. > > Report bugs by "submitting a ticket" here: > > > > Before submitting the following as a bug, I would like to repeat what I > posted earlier (no replies) to check whether you agree it's a bug: > > The "kron" (Kronecker product) function returns numpy-arrays even if > both arguments are numpy-matrices; imho that's a bug in light of the > proclaimed goal of preserving matrices where possible/sensible. What would you do instead? The Kronecker product (aka Tensor product) of two matrices isn't a matrix. I suppose you could make it one by appealing to the universal property -- bilinear map on the Cartesian product of linear spaces -> linear map on the tensor product of linear spaces -- but that seems a bit abstract for numpy and you would need to define the indices of the resulting object as some sort of pair. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From pjssilva at ime.usp.br Thu Apr 13 05:51:02 2006
From: pjssilva at ime.usp.br (Paulo Jose da Silva e Silva)
Date: Thu Apr 13 05:51:02 2006
Subject: [Numpy-discussion] Re: range/arange
In-Reply-To:
References: <200604130507.40241.pgmdevlist@mailcan.com>
Message-ID: <1144932598.16449.5.camel@localhost.localdomain>

On Thu, 2006-04-13 at 07:15 -0500, Robert Kern wrote:
>
> Python's rule for integer division is to round towards negative infinity. C's
> rule (if it has one; I think it may be platform dependent) is to round towards
> 0. When it comes to arithmetic, numpy tends to expose the C behavior because
> it's fastest. As Lars pointed out, the type of the object that you get from
> iterating over an array is a numpy int32scalar object, so the numpy behavior is
> used.
>

Actually, in the C99 standard the division was defined to always truncate
towards zero; see item 25 in:

http://home.datacomm.ch/t_wolf/tw/c/c9x_changes.html

So it is not platform dependent anymore.

Paulo

Note: It once was platform dependent. Old gcc (for Linux) would truncate
towards infinity. I know this because of a "bug" in somebody else's code.
It took me quite some time to discover that the problem was the shift in
gcc's behavior in this matter.

From aisaac at american.edu Thu Apr 13 07:02:11 2006
From: aisaac at american.edu (Alan G Isaac)
Date: Thu Apr 13 07:02:11 2006
Subject: [Numpy-discussion] Toward release 1.0 of NumPy
In-Reply-To:
References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net>
Message-ID:

On Thu, 13 Apr 2006, Charles R Harris apparently wrote:
> The Kronecker product (aka Tensor product) of two
> matrices isn't a matrix.

That is an unusual way to describe things in the world of econometrics.
Here is a more common way:
http://planetmath.org/encyclopedia/KroneckerProduct.html
I share Sven's expectation.
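For concreteness, kron already produces the ordinary block matrix that
definition describes (plain arrays shown; output hand-checked):

import numpy

a = numpy.array([[1, 2],
                 [3, 4]])
print(numpy.kron(a, numpy.eye(2, dtype=int)))
# [[1 0 2 0]
#  [0 1 0 2]
#  [3 0 4 0]
#  [0 3 0 4]]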
Cheers, Alan Isaac From fullung at gmail.com Thu Apr 13 07:24:02 2006 From: fullung at gmail.com (Albert Strasheim) Date: Thu Apr 13 07:24:02 2006 Subject: [Numpy-discussion] Segfault when indexing on second or higher dimension with list or tuple In-Reply-To: <20060413121710.GA30372@dogbert.sdsl.sun.ac.za> References: <20060413121710.GA30372@dogbert.sdsl.sun.ac.za> Message-ID: <20060413142246.GA6870@dogbert.sdsl.sun.ac.za> Hello all I've attached a test case that reproduces the bug to the ticket: http://projects.scipy.org/scipy/numpy/attachment/ticket/59/test_list_tuple_indexing.diff I've also created a test case for the recent vectorize bug: http://projects.scipy.org/scipy/numpy/attachment/ticket/52/test_vectorize.diff Regards, Albert On Thu, 13 Apr 2006, Albert Strasheim wrote: > Hello all, > > The following segfault bug was discovered in NumPy 0.9.7.2348 by > someone at our Python workshop: > > import numpy as N > F = N.zeros((1,1)) > F[:,[0]] = 0 > > The following also segfaults: > > F[:,(0,)] = 0 > > Something seems to go wrong when one uses a tuple or a list to index > into a NumPy array on the second or higher dimension, since the > following code works: > > F = N.zeros((1,)) > F[[0]] = 0 > > The Trac ticket is here: > > http://projects.scipy.org/scipy/numpy/ticket/59 > > If someone gets around to fixing this, please include some test cases. > > Thanks! > > Regards, > > Albert > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory! > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From oliphant.travis at ieee.org Thu Apr 13 07:58:05 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 13 07:58:05 2006 Subject: [Numpy-discussion] index objects are not broadcastable to a single shape In-Reply-To: <20060413133326.2889a5c5.simon@arrowtheory.com> References: <20060413111612.3bb4e6fc.simon@arrowtheory.com> <443DA966.1020301@ee.byu.edu> <20060413133326.2889a5c5.simon@arrowtheory.com> Message-ID: <443E66AC.2020108@ieee.org> Simon Burton wrote: > On Wed, 12 Apr 2006 19:29:10 -0600 > Travis Oliphant wrote: > > >> The problem with these error messages is that some code is used in a >> wide-variety of circumstances. The original error message was conceived >> in thinking about the application of the code to one circumstance while >> this particular error is occurring in a different one. >> >> The standard behavior is to just propagate the error up. Better error >> messages means catching a lot more errors and special-casing error >> messages. It can be done, but it's tedious work. >> > > OK. Can the error message be a little more generic, longer, etc. ? > > Absolutely, I should have finished the above message with an appeal for more helpful generic messages. All suggestions are welcome. > "shape mismatch (index objects are not broadcastable to a single shape)" ? > Definitely better. I would probably drop the index qualifier as well. Thanks for the tip. 
-Travis From oliphant.travis at ieee.org Thu Apr 13 08:16:13 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 13 08:16:13 2006 Subject: [Numpy-discussion] Segfault when indexing on second or higher dimension with list or tuple In-Reply-To: <20060413121710.GA30372@dogbert.sdsl.sun.ac.za> References: <20060413121710.GA30372@dogbert.sdsl.sun.ac.za> Message-ID: <443E6B01.7000906@ieee.org> Albert Strasheim wrote: > Hello all, > > The following segfault bug was discovered in NumPy 0.9.7.2348 by > someone at our Python workshop: > > import numpy as N > F = N.zeros((1,1)) > F[:,[0]] = 0 > > The following also segfaults: > > F[:,(0,)] = 0 > > Something seems to go wrong when one uses a tuple or a list to index > into a NumPy array on the second or higher dimension, since the > following code works: > > The segfault was due to an error condition not being caught. This is now fixed, so now you get (a rather cryptic error). Now, to figure out why this code doesn't work.... -Travis From oliphant.travis at ieee.org Thu Apr 13 08:29:01 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 13 08:29:01 2006 Subject: [Numpy-discussion] Segfault when indexing on second or higher dimension with list or tuple In-Reply-To: <443E6B01.7000906@ieee.org> References: <20060413121710.GA30372@dogbert.sdsl.sun.ac.za> <443E6B01.7000906@ieee.org> Message-ID: <443E6DF1.5020206@ieee.org> Travis Oliphant wrote: > Albert Strasheim wrote: >> Hello all, >> >> The following segfault bug was discovered in NumPy 0.9.7.2348 by >> someone at our Python workshop: >> >> import numpy as N >> F = N.zeros((1,1)) >> F[:,[0]] = 0 >> >> The following also segfaults: >> >> F[:,(0,)] = 0 >> >> Something seems to go wrong when one uses a tuple or a list to index >> into a NumPy array on the second or higher dimension, since the >> following code works: >> >> > The segfault was due to an error condition not being caught. This is > now fixed, so now you get (a rather cryptic error). Now, to figure > out why this code doesn't work.... > The problem is that the code is not handling arbitrary shapes on the RHS of the equal sign. I'll enter a ticket and fix this before 0.9.8. Basically, right now, the RHS needs to have the same shape as the LHS so F[:,[0]] = [[0]] should work already. -Travis From oliphant.travis at ieee.org Thu Apr 13 08:43:14 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 13 08:43:14 2006 Subject: [Numpy-discussion] Re: ***[Possible UCE]*** [SciPy-user] Regarding what "where" returns In-Reply-To: <443E42A2.80402@ntc.zcu.cz> References: <443C0D36.80608@enthought.com> <443D39F6.6040805@enthought.com> <443D601E.3020500@enthought.com> <443D857F.9000605@ee.byu.edu> <443E42A2.80402@ntc.zcu.cz> Message-ID: <443E7150.2010006@ieee.org> Robert Cimrman wrote: > Travis Oliphant wrote: >> I went ahead and made this change to the code. The nonzero >> function still behaves as before (and in fact only works for 1-d >> arrays as it did in Numeric). >> >> The where(condition) function works the same as condition.nonzero() >> and both always return a tuple. > > So, for 1-d arrays, using 'nonzero( condition )' should be faster than > 'where( condition )[0]', right? > No. since the function just selects off the first element of the tuple returned by the method... 
'condition.nonzero()[0]' may be *slightly* faster than 'where(condition)[0]',
however.

-Travis

From tim.hochberg at cox.net Thu Apr 13 08:44:47 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Thu Apr 13 08:44:47 2006
Subject: [Numpy-discussion] Toward release 1.0 of NumPy
In-Reply-To:
References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net>
Message-ID: <443E7109.6080808@cox.net>

Alan G Isaac wrote:
>On Thu, 13 Apr 2006, Charles R Harris apparently wrote:
>
>>The Kronecker product (aka Tensor product) of two
>>matrices isn't a matrix.
>>
>
>That is an unusual way to describe things in
>the world of econometrics. Here is a more
>common way:
>http://planetmath.org/encyclopedia/KroneckerProduct.html
>I share Sven's expectation.
>

mathworld also agrees with you. As does the documentation (as best as I
can tell) and the actual output of kron. I think Charles must be thinking
of the tensor product instead. In fact, if you look at the code you see
this:

# TODO: figure out how to keep arrays the same

I think that in general this is going to be a bit of an issue whenever we
have multiple arguments. Let me propose the world's second dumbest (in a
good way, maybe) procedure:

def kron(a, b):
    wrappers = [(getattr(x, '__array_priority__', 0), x.__array_wrap__)
                for x in [a, b] if hasattr(x, '__array_wrap__')]
    if wrappers:
        wrappers.sort()  # so the highest-priority wrapper ends up last
        priority, wrap = wrappers[-1]
    else:
        wrap = None
    # ....
    result = concatenate(concatenate(o, axis=1), axis=1)
    if wrap is not None:
        result = wrap(result)
    return result

This generalizes what _wrapit does for arbitrary arguments. It breaks
'ties' where more than one argument wants to wrap something by using
__array_priority__. You'd actually want to factor out the wrapper finding
code. This generalizes what _wrapit does to multiple dimensions.

Thoughts? Better plans?

-tim

From ryanlists at gmail.com Thu Apr 13 09:11:10 2006
From: ryanlists at gmail.com (Ryan Krauss)
Date: Thu Apr 13 09:11:10 2006
Subject: [Numpy-discussion] where
Message-ID:

Can someone help me understand the proper use of where?

I want to use it like this

myvect=where(f>19.5 and phase>0, f, phase)

but I seem to be getting or rather than and.

Thanks,

Ryan

From oliphant at ee.byu.edu Thu Apr 13 09:18:05 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Thu Apr 13 09:18:05 2006
Subject: [Numpy-discussion] where
In-Reply-To:
References:
Message-ID: <443E79A5.2000700@ee.byu.edu>

Ryan Krauss wrote:

>Can someone help me understand the proper use of where?
>
>I want to use it like this
>
>myvect=where(f>19.5 and phase>0, f, phase)
>
>but I seem to be getting or rather than and.
>

It is probably your use of the 'and' statement. Use '&' instead

(f > 19.5) & (phase > 0)

What version are you using? In numarray and NumPy the use of 'and' like
this should raise an error if 'f' and/or 'phase' are arrays of more than
one element.

-Travis

From ryanlists at gmail.com Thu Apr 13 09:27:06 2006
From: ryanlists at gmail.com (Ryan Krauss)
Date: Thu Apr 13 09:27:06 2006
Subject: [Numpy-discussion] where
In-Reply-To: <443E79A5.2000700@ee.byu.edu>
References: <443E79A5.2000700@ee.byu.edu>
Message-ID:

Does where return a mask?

If I do
myvect=where((f > 19.5) & (phase > 0),f,phase)
myvect is the same length as f and phase and there is some modification of
the values where the condition is met, but what that modification is is
unclear to me.

If I do
myind=where((f > 19.5) & (phase > 0))
I seem to get the indices of the points where both conditions are met.

I am using version 0.9.5.2043.
I see those kinds of errors about truth testing an array often, but not in this case. Thanks, Ryan On 4/13/06, Travis Oliphant wrote: > Ryan Krauss wrote: > > >Can someone help me understand the proper use of where? > > > >I want to use it like this > > > >myvect=where(f>19.5 and phase>0, f, phase) > > > >but I seem to be getting or rather than and. > > > > > > > It is probably your use of the 'and' statement. Use '&' instead > > (f > 19.5) & (phase > 0) > > What version are you using. In numarray and NumPy the use of 'and' like > this should raise an error if 'f' and/or 'phase' are arrays of more than > one element. > > -Travis > > From oliphant at ee.byu.edu Thu Apr 13 09:39:04 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 13 09:39:04 2006 Subject: [Numpy-discussion] where In-Reply-To: References: <443E79A5.2000700@ee.byu.edu> Message-ID: <443E7E7B.2030203@ee.byu.edu> Ryan Krauss wrote: >Does where return a mask? > > Only in the second use case... >If I do >myvect=where((f > 19.5) & (phase > 0),f,phase) >myvect is the same length as f and phase and there is some >modification of the values where the condition is met, but what that >modification is is unclear to me. > > The behavior of where(condition, for_true, for_false) is to return an array of the same shape as condition with elements of for_true where condition is true and for_false where condition is false. Thus myvect will contain elements of f where the condition is met and elements of phase otherwise. >If I do >myind=where((f > 19.5) & (phase > 0)) >I seem to get the indices of the points where both conditions are met. > > Yes. That is correct. It is a different use-case... Note, however, that in the current SVN version of NumPy, this use-case will always return a tuple of indices (use the nonzero function instead for behavior that will stay constant). For your 1-d example (I'm guessing it's 1-d) where will return a length-1 tuple. >I am using version 0.9.5.2043. I see those kinds of errors about >truth testing an array often, but not in this case. > > That is strange. What are the sizes of f and phase? -Travis From robert.kern at gmail.com Thu Apr 13 09:42:04 2006 From: robert.kern at gmail.com (Robert Kern) Date: Thu Apr 13 09:42:04 2006 Subject: [Numpy-discussion] Re: where In-Reply-To: References: <443E79A5.2000700@ee.byu.edu> Message-ID: Ryan Krauss wrote: > Does where return a mask? > > If I do > myvect=where((f > 19.5) & (phase > 0),f,phase) > myvect is the same length as f and phase and there is some > modification of the values where the condition is met, but what that > modification is is unclear to me. > > If I do > myind=where((f > 19.5) & (phase > 0)) > I seem to get the indices of the points where both conditions are met. > > I am using version 0.9.5.2043. I see those kinds of errors about > truth testing an array often, but not in this case. Have you read the docstring? In [33]: where? Type: builtin_function_or_method Base Class: String Form: Namespace: Interactive Docstring: where(condition, | x, y) is shaped like condition and has elements of x and y where condition is respectively true or false. If x or y are not given, then it is equivalent to nonzero(condition). -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From ryanlists at gmail.com Thu Apr 13 09:44:01 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Thu Apr 13 09:44:01 2006 Subject: [Numpy-discussion] where In-Reply-To: <443E7E7B.2030203@ee.byu.edu> References: <443E79A5.2000700@ee.byu.edu> <443E7E7B.2030203@ee.byu.edu> Message-ID: f and phase are each (4250,) I have something that is working but doesn't use where. Can this be done easier using where: f1=f>19.5 f2=f<38 myf=f1&f2 myp=phase>0 myind=myf&myp correction=myind*-360 newphase=phase+correction Basically, can where give me an output vector of the same size as f and phase where the output is either 1 or 0? Ryan On 4/13/06, Travis Oliphant wrote: > Ryan Krauss wrote: > > >Does where return a mask? > > > > > Only in the second use case... > > >If I do > >myvect=where((f > 19.5) & (phase > 0),f,phase) > >myvect is the same length as f and phase and there is some > >modification of the values where the condition is met, but what that > >modification is is unclear to me. > > > > > > The behavior of > > where(condition, for_true, for_false) > > is to return an array of the same shape as condition with elements of > for_true where condition is true and > for_false where condition is false. > > Thus myvect will contain elements of f where the condition is met and > elements of phase otherwise. > > >If I do > >myind=where((f > 19.5) & (phase > 0)) > >I seem to get the indices of the points where both conditions are met. > > > > > Yes. That is correct. It is a different use-case... Note, however, > that in the current SVN version of NumPy, this use-case will always > return a tuple of indices (use the nonzero function instead for behavior > that will stay constant). For your 1-d example (I'm guessing it's 1-d) > where will return a length-1 tuple. > > >I am using version 0.9.5.2043. I see those kinds of errors about > >truth testing an array often, but not in this case. > > > > > That is strange. What are the sizes of f and phase? > > -Travis > > From robert.kern at gmail.com Thu Apr 13 09:54:05 2006 From: robert.kern at gmail.com (Robert Kern) Date: Thu Apr 13 09:54:05 2006 Subject: [Numpy-discussion] Re: where In-Reply-To: References: <443E79A5.2000700@ee.byu.edu> <443E7E7B.2030203@ee.byu.edu> Message-ID: Ryan Krauss wrote: > f and phase are each (4250,) > > I have something that is working but doesn't use where. Can this be > done easier using where: > > f1=f>19.5 > f2=f<38 > myf=f1&f2 > myp=phase>0 > myind=myf&myp > correction=myind*-360 > newphase=phase+correction (untested) phase[((f>19.5) & (f<38)) & (phase>0)] -= 360 > Basically, can where give me an output vector of the same size as f > and phase where the output is either 1 or 0? Why? The condition array that you would pass into where() is already such an array. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From arnd.baecker at web.de Thu Apr 13 10:07:14 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Thu Apr 13 10:07:14 2006 Subject: [Numpy-discussion] range/arange In-Reply-To: <200604131123.56171.lars.bittrich@googlemail.com> References: <200604130507.40241.pgmdevlist@mailcan.com> <200604131123.56171.lars.bittrich@googlemail.com> Message-ID: On Thu, 13 Apr 2006, Lars Bittrich wrote: > Hi, > > On Thursday 13 April 2006 11:07, Pierre GM wrote: > > Could any of you explain me why the two following commands give different > > results ? It's mere curiosity, for my personal edification. > > > > [(m-5)/10 for m in arange(1,10)] > > [0, 0, 0, 0, 0, 0, 0, 0, 0] > > > > [(m-5)/10 for m in range(1,10)] > > [-1, -1, -1, -1, 0, 0, 0, 0, 0] > > I have no idea where the reason is located exactly, but it seems to be caused > by different types of range and arange. Interestingly with Numeric you get the following: In [1]: from Numeric import * In [2]: [(m-5)/10 for m in arange(1,10)] Out[2]: [-1, -1, -1, -1, 0, 0, 0, 0, 0] In [3]: type(arange(1,10)[0]) Out[3]: <type 'int'> Will this cause any trouble for projects transitioning from Numeric to numpy? Presumably a proper explanation (which?) should go into the scipy wiki ("Converting from Numeric"). > In [15]:type(arange(1,10)[0]) > Out[15]: <type 'int32scalar'> > > In [14]:type(range(1,10)[0]) > Out[14]: <type 'int'> > > If you use for example: > > In [16]:-1/10 > Out[16]:-1 > > you get the normal behavior of the "floor" function. > > In [17]:floor(-.1) > Out[17]:-1.0 > > The behavior of int32scalar seems more intuitive to me. Me too. Best, Arnd From ryanlists at gmail.com Thu Apr 13 10:12:06 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Thu Apr 13 10:12:06 2006 Subject: [Numpy-discussion] Re: where In-Reply-To: References: <443E79A5.2000700@ee.byu.edu> Message-ID: Sorry, I can't explain myself. I read the docstring and it didn't make sense before. Now it seems clear enough. Somehow I got it in my head that I needed to be passing f and phase so that condition could use them. It turns out that this: myvect=where((f>19.5) & (f<38) & (phase>0),ones(shape(phase)),zeros(shape(phase))) does exactly what I want. Ryan On 4/13/06, Robert Kern wrote: > Ryan Krauss wrote: > > Does where return a mask? > > > > If I do > > myvect=where((f > 19.5) & (phase > 0),f,phase) > > myvect is the same length as f and phase and there is some > > modification of the values where the condition is met, but what that > > modification is is unclear to me. > > > > If I do > > myind=where((f > 19.5) & (phase > 0)) > > I seem to get the indices of the points where both conditions are met. > > > > I am using version 0.9.5.2043. I see those kinds of errors about > > truth testing an array often, but not in this case. > > Have you read the docstring? > > In [33]: where? > Type: builtin_function_or_method > Base Class: <type 'builtin_function_or_method'> > String Form: <built-in function where> > Namespace: Interactive > Docstring: > where(condition, | x, y) is shaped like condition and has elements of x and > y where condition is respectively true or false. If x or y are not given, then > it is equivalent to nonzero(condition). > > -- > Robert Kern > robert.kern at gmail.com > > "I have come to believe that the whole world is an enigma, a harmless enigma > that is made terrible by our own mad attempt to interpret it as though it had > an underlying truth." > -- Umberto Eco From ryanlists at gmail.com Thu Apr 13 10:15:03 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Thu Apr 13 10:15:03 2006 Subject: [Numpy-discussion] Re: where In-Reply-To: References: <443E79A5.2000700@ee.byu.edu> <443E7E7B.2030203@ee.byu.edu> Message-ID: > Why? The condition array that you would pass into where() is already such an array. That is the key point I was missing. Until I played around with the conditions myself I didn't get that I was passing in an explicit array of 1's and 0's. I guess I thought I was passing in some magic expression that where was somehow making sense. That is why I thought I would need to pass f and phase to the function. Ryan On 4/13/06, Robert Kern wrote: > Ryan Krauss wrote: > > f and phase are each (4250,) > > > > I have something that is working but doesn't use where. Can this be > > done easier using where: > > > > f1=f>19.5 > > f2=f<38 > > myf=f1&f2 > > myp=phase>0 > > myind=myf&myp > > correction=myind*-360 > > newphase=phase+correction > > (untested) > phase[((f>19.5) & (f<38)) & (phase>0)] -= 360 > > > Basically, can where give me an output vector of the same size as f > > and phase where the output is either 1 or 0? > > Why? The condition array that you would pass into where() is already such an array. > > -- > Robert Kern > robert.kern at gmail.com > > "I have come to believe that the whole world is an enigma, a harmless enigma > that is made terrible by our own mad attempt to interpret it as though it had > an underlying truth." > -- Umberto Eco From ryanlists at gmail.com Thu Apr 13 10:17:14 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Thu Apr 13 10:17:14 2006 Subject: [Numpy-discussion] Re: where In-Reply-To: References: <443E79A5.2000700@ee.byu.edu> <443E7E7B.2030203@ee.byu.edu> Message-ID: which makes this: myvect=where((f>19.5) & (f<38) & (phase>0),ones(shape(phase)),zeros(shape(phase))) actually really silly, since all it is is a complicated way to get back the input of (f>19.5) & (f<38) & (phase>0) Ryan On 4/13/06, Ryan Krauss wrote: > > Why? The condition array that you would pass into where() is already such an array. > > That is the key point I was missing. Until I played around with the > conditions myself I didn't get that I was passing in an explicit array > of 1's and 0's. I guess I thought I was passing in some magic > expression that where was somehow making sense. That is why I > thought I would need to pass f and phase to the function. > > Ryan > > On 4/13/06, Robert Kern wrote: > > Ryan Krauss wrote: > > > f and phase are each (4250,) > > > > > > I have something that is working but doesn't use where. Can this be > > > done easier using where: > > > > > > f1=f>19.5 > > > f2=f<38 > > > myf=f1&f2 > > > myp=phase>0 > > > myind=myf&myp > > > correction=myind*-360 > > > newphase=phase+correction > > > > (untested) > > phase[((f>19.5) & (f<38)) & (phase>0)] -= 360 > > > > > Basically, can where give me an output vector of the same size as f > > > and phase where the output is either 1 or 0? > > > > Why? The condition array that you would pass into where() is already such an array. > > > > -- > > Robert Kern > > robert.kern at gmail.com > > > > "I have come to believe that the whole world is an enigma, a harmless enigma > > that is made terrible by our own mad attempt to interpret it as though it had > > an underlying truth." > > -- Umberto Eco From oliphant at ee.byu.edu Thu Apr 13 10:49:06 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 13 10:49:06 2006 Subject: [Numpy-discussion] range/arange In-Reply-To: References: <200604130507.40241.pgmdevlist@mailcan.com> <200604131123.56171.lars.bittrich@googlemail.com> Message-ID: <443E8EEB.9070609@ee.byu.edu> Arnd Baecker wrote: >On Thu, 13 Apr 2006, Lars Bittrich wrote: > > > >>Hi, >> >>On Thursday 13 April 2006 11:07, Pierre GM wrote: >> >> >>>Could any of you explain me why the two following commands give different >>>results ? It's mere curiosity, for my personal edification. >>> >>>[(m-5)/10 for m in arange(1,10)] >>>[0, 0, 0, 0, 0, 0, 0, 0, 0] >>> >>>[(m-5)/10 for m in range(1,10)] >>>[-1, -1, -1, -1, 0, 0, 0, 0, 0] >>> >>> >>I have no idea where the reason is located exactly, but it seems to be caused >>by different types of range and arange. >> >> > > >Interestingly with Numeric you get the following: > >In [1]: from Numeric import * >In [2]: [(m-5)/10 for m in arange(1,10)] >Out[2]: [-1, -1, -1, -1, 0, 0, 0, 0, 0] >In [3]: type(arange(1,10)[0]) >Out[3]: <type 'int'> > >Will this cause any trouble for projects >transitioning from Numeric to numpy? >Presumably a proper explanation (which?) >should go into the scipy wiki ("Converting from Numeric"). > > > Yes, some discussion will be needed about the fact that NumPy now has its own scalars. This will give us quite a bit more flexibility moving forward and should be seamless for the most part. -Travis From pgmdevlist at mailcan.com Thu Apr 13 11:29:09 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Thu Apr 13 11:29:09 2006 Subject: [Numpy-discussion] Re: range/arange In-Reply-To: References: <200604130507.40241.pgmdevlist@mailcan.com> Message-ID: <200604131456.48570.pgmdevlist@mailcan.com> > Python's rule for integer division is to round towards negative infinity. > C's rule (if it has one; I think it may be platform dependent) is to round > towards 0. Ah OK. That makes sense, and it's something I'll have to keep in mind later on.
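For the archives, the whole difference fits in a couple of lines (a sketch, assuming numpy's int32 scalar keeps the C-style truncation shown above):

>>> -4/10            # Python int: rounds towards negative infinity
-1
>>> from numpy import int32
>>> int32(-4)/10     # numpy int32 scalar: rounds towards zero, like C
0

which is exactly why the arange version of my list comprehension came out all zeros.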
Thanks y'all for your answers, I feel quite edified now :) From ndarray at mac.com Thu Apr 13 11:53:00 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 13 11:53:00 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443D9543.8040601@ee.byu.edu> References: <443D9543.8040601@ee.byu.edu> Message-ID: On 4/12/06, Travis Oliphant wrote: > ... This also dove-tails nicely > with the Python 2.5 release schedule so that NumPy 1.0 should work with > Python 2.5 and be fully 64-bit capable for handling very-large arrays. > I would like to mention one feature that is going to appear in Python 2.5 that is covering some of the functionality of NumPy. I am talking about the ctypes module . Like NumPy, ctypes provides a set of python classes that represent basic C types: c_byte c_char c_char_p c_double c_float c_int c_long c_short c_ubyte ... and the ability to describe composite structures. The later functionality is very close to what dtype class provides in numpy. There are some features in ctype that I like better than similar features in numpy. For example, in ctypes a fixed width array is described by multiplying basic type by an integer: >>> c_char * 10 I find this approach more elegant than numpy's dtype('S10'). It looks like there is some synergy to be exploited here, particularly in the area of record arrays. From oliphant at ee.byu.edu Thu Apr 13 12:49:02 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 13 12:49:02 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: References: <443D9543.8040601@ee.byu.edu> Message-ID: <443EAB01.8040700@ee.byu.edu> Sasha wrote: >On 4/12/06, Travis Oliphant wrote: > > >>... This also dove-tails nicely >>with the Python 2.5 release schedule so that NumPy 1.0 should work with >>Python 2.5 and be fully 64-bit capable for handling very-large arrays. >> >> >> > >I would like to mention one feature that is going to appear in Python >2.5 that is covering some of the functionality of NumPy. I am talking >about the ctypes module >. Like >NumPy, ctypes provides a set of python classes that represent basic C >types: > > c_byte > c_char > c_char_p > c_double > c_float > c_int > c_long > c_short > c_ubyte > ... > >and the ability to describe composite structures. The later >functionality is very close to what dtype class provides in numpy. > >There are some features in ctype that I like better than similar >features in numpy. For example, in ctypes a fixed width array is >described by multiplying basic type by an integer: > > >>>>c_char * 10 >>>> >>>> > > >I find this approach more elegant than numpy's dtype('S10'). > >It looks like there is some synergy to be exploited here, particularly >in the area of record arrays. > > Definitely. I'm not familiar enough with c_types to do this. Any help is appreciated. -Travis From charlesr.harris at gmail.com Thu Apr 13 13:33:08 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu Apr 13 13:33:08 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443E7109.6080808@cox.net> References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> Message-ID: Tim, On 4/13/06, Tim Hochberg wrote: > > Alan G Isaac wrote: > > >On Thu, 13 Apr 2006, Charles R Harris apparently wrote: > > > > > >>The Kronecker product (aka Tensor product) of two > >>matrices isn't a matrix. > >> > >> > > > >That is an unusual way to describe things in > >the world of econometrics. 
Here is a more >common way: >http://planetmath.org/encyclopedia/KroneckerProduct.html >I share Sven's expectation. > > > mathworld also agrees with you. As does the documentation (as best as I > can tell) and the actual output of kron. I think Charles must be > thinking of the tensor product instead. It *is* the tensor product, A \tensor B, but it is not the most general tensor with four indices just as a bivector is not the most general tensor with two indices. Numerically, kron chooses to represent the tensor product of two vector spaces a, b with dimensions n,m respectively as the direct sum of n copies of b, and the tensor product of two operators takes the given form. More generally, the B matrix in each spot could be replaced with an arbitrary matrix of the correct dimensions and you would recover the general tensor with four indices. Anyway, it sounds like you are proposing that the tensor (outer) product of two matrices be reshaped to run over two indices. It seems that likewise the tensor (outer) product of two vectors should be reshaped to run over one index (i.e. flat). That would do the trick. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Apr 13 14:19:01 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu Apr 13 14:19:01 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> Message-ID: Tim, In particular: def kron(a,b): n = shape(a)[1]*shape(b)[1] c = transpose(multiply.outer(a,b), axes=(0,2,1,3)).reshape(-1,n) # wrap c as a matrix. On 4/13/06, Charles R Harris wrote: > > Tim, > > On 4/13/06, Tim Hochberg < tim.hochberg at cox.net> wrote: > > > > Alan G Isaac wrote: > > > > >On Thu, 13 Apr 2006, Charles R Harris apparently wrote: > > > > > > > > >>The Kronecker product (aka Tensor product) of two > > >>matrices isn't a matrix. > > >> > > >> > > > > > >That is an unusual way to describe things in > > >the world of econometrics. Here is a more > > >common way: > > > http://planetmath.org/encyclopedia/KroneckerProduct.html > > >I share Sven's expectation. > > > > > > > > mathworld also agrees with you. As does the documentation (as best as I > > can tell) and the actual output of kron. I think Charles must be > > thinking of the tensor product instead. > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.hochberg at cox.net Thu Apr 13 14:32:04 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Apr 13 14:32:04 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> Message-ID: <443EC2B4.807@cox.net> Charles R Harris wrote: > Tim, > > On 4/13/06, *Tim Hochberg* > wrote: > > Alan G Isaac wrote: > > >On Thu, 13 Apr 2006, Charles R Harris apparently wrote: > > > > > >>The Kronecker product (aka Tensor product) of two > >>matrices isn't a matrix. > >> > >> > > > >That is an unusual way to describe things in > >the world of econometrics. Here is a more > >common way: > >http://planetmath.org/encyclopedia/KroneckerProduct.html > > >I share Sven's expectation. > > > > > mathworld also agrees with you. As does the documentation (as best > as I > can tell) and the actual output of kron. I think Charles must be > thinking of the tensor product instead.
> > > It *is* the tensor product, A \tensor B, but it is not the most > general tensor with four indices just as a bivector is not the most > general tensor with two indices. Numerically, kron chooses to > represent the tensor product of two vector spaces a, b with dimensions > n,m respectively as the direct sum of n copies of b, and the tensor > product of two operators takes the given form. More generally, the B > matrix in each spot could be replaced with an arbitrary matrix of the > correct dimensions and you would recover the general tensor with four > indices. > > Anyway, it sounds like you are proposing that the tensor (outer) > product of two matrices be reshaped to run over two indices. It seems > that likewise the tensor (outer) product of two vectors should be > reshaped to run over one index ( i.e. flat). That would do the trick. I'm not proposing anything. I don't care at all what kron does. I just want to fix the return type if that's feasible so that people stop complaining about it. As far as I can tell, kron already returns a flattened tensor product of some sort. I believe the general tensor product that you are talking about is already covered by multiply.outer, but I'm not sure so correct me if I'm wrong. Here's what kron does as present: >>> a array([[1, 1], [1, 1]]) >>> kron(a,a) # => 4x4 matrix array([[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]) >>> kron(a,a[0]) => 8x1 array([1, 1, 1, 1, 1, 1, 1, 1]) >>> kron(a[0], a[0]) Traceback (most recent call last): File "", line 1, in ? File "C:\Python24\Lib\site-packages\numpy\lib\shape_base.py", line 577, in kron result = concatenate(concatenate(o, axis=1), axis=1) ValueError: 0-d arrays can't be concatenated >>> b.shape (2, 2, 2) >>> kron(b,b).shape (4, 4, 2, 2) So, it looks like the 2d x 2d product obeys Alan's definition. The other products are probably all broken. Regards, -tim From charlesr.harris at gmail.com Thu Apr 13 16:02:04 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu Apr 13 16:02:04 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443EC2B4.807@cox.net> References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> <443EC2B4.807@cox.net> Message-ID: On 4/13/06, Tim Hochberg wrote: > > Charles R Harris wrote: > > > Tim, > > > > On 4/13/06, *Tim Hochberg* > > wrote: > > > > Alan G Isaac wrote: > > > > >On Thu, 13 Apr 2006, Charles R Harris apparently wrote: > > > > > > > > >>The Kronecker product (aka Tensor product) of two > > >>matrices isn't a matrix. > > >> > > >> > > > > > >That is an unusual way to describe things in > > >the world of econometrics. Here is a more > > >common way: > > >http://planetmath.org/encyclopedia/KroneckerProduct.html > > > > >I share Sven's expectation. > > > > > > > > mathworld also agrees with you. As does the documentation (as best > > as I > > can tell) and the actual output of kron. I think Charles must be > > thinking of the tensor product instead. > > > > > > It *is* the tensor product, A \tensor B, but it is not the most > > general tensor with four indices just as a bivector is not the most > > general tensor with two indices. Numerically, kron chooses to > > represent the tensor product of two vector spaces a, b with dimensions > > n,m respectively as the direct sum of n copies of b, and the tensor > > product of two operators takes the given form. 
More generally, the B > > matrix in each spot could be replaced with an arbitrary matrix of the > > correct dimensions and you would recover the general tensor with four > > indices. > > > > Anyway, it sounds like you are proposing that the tensor (outer) > > product of two matrices be reshaped to run over two indices. It seems > > that likewise the tensor (outer) product of two vectors should be > > reshaped to run over one index ( i.e. flat). That would do the trick. > > I'm not proposing anything. I don't care at all what kron does. I just > want to fix the return type if that's feasible so that people stop > complaining about it. As far as I can tell, kron already returns a > flattened tensor product of some sort. I believe the general tensor > product that you are talking about is already covered by multiply.outer, > but I'm not sure so correct me if I'm wrong. Here's what kron does as > present: > > >>> a > array([[1, 1], > [1, 1]]) > >>> kron(a,a) # => 4x4 matrix > array([[1, 1, 1, 1], > [1, 1, 1, 1], > [1, 1, 1, 1], > [1, 1, 1, 1]]) Good at first look. Lets see a simpler version... Nevermind, seems numpy isn't working on this machine (X86_64, fc5 64 bit) at the moment, maybe I need to check out a clean version. >>> kron(a,a[0]) => 8x1 > array([1, 1, 1, 1, 1, 1, 1, 1]) Looks broken. a[0] should be an operator (matrix), so either it should be (2,1) or (1,2). In the first case, the return should have shape (4,2), in the latter (2,4). Should probably raise an error as the result strikes me as ambiguous. But I have to admit I am not sure what the point of this particular construction is. >>> kron(a[0], a[0]) > Traceback (most recent call last): > File "", line 1, in ? > File "C:\Python24\Lib\site-packages\numpy\lib\shape_base.py", line > 577, in kron > result = concatenate(concatenate(o, axis=1), axis=1) > ValueError: 0-d arrays can't be concatenated See above. this could be (1,4) or (4,1), depending. >>> b.shape > (2, 2, 2) > >>> kron(b,b).shape > (4, 4, 2, 2) I think this is doing transpose(outer(b,b), axis=(0,2,1,3)) and reshaping the first 4 indices into 2. Again, I am not sure what the point is for these operators. Now another way to get all this functionality is to have a contraction function or method with a list of axis. For instance, consider the matrices A(i,j) and B(k,l) operating on x(j) and y(l) like A(i,j)x(j) and B(k,l)y(l), then the outer product of all of these is A(i,j)B(k,l)x(j)y(l) with the summation convention on the indices j and l. The result should be the same as kron(A,B)*kron(x,y) up to a permutation of rows and columes. It is just a question of which basis is used and how the elements are indexed. So, it looks like the 2d x 2d product obeys Alan's definition. The other > products are probably all broken. > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Thu Apr 13 16:21:08 2006 From: aisaac at american.edu (Alan G Isaac) Date: Thu Apr 13 16:21:08 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443EC2B4.807@cox.net> References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> <443EC2B4.807@cox.net> Message-ID: On Thu, 13 Apr 2006, Tim Hochberg apparently wrote: > Here's what kron does as present: As possible context: http://www.mathworks.com/access/helpdesk/help/techdoc/ref/kron.html#998881 http://www.aptech.com/pdf_man/basicgauss.pdf p.69 In this sense, the 2-d handling is not surprising. 
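For concreteness, a small session of what those pages describe (untested, written from memory) -- each entry a[i,j] of the first argument becomes the block a[i,j]*b:

>>> a = array([[1, 2], [3, 4]])
>>> kron(eye(2), a)
array([[ 1.,  2.,  0.,  0.],
       [ 3.,  4.,  0.,  0.],
       [ 0.,  0.,  1.,  2.],
       [ 0.,  0.,  3.,  4.]])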
Cheers, Alan Isaac From charlesr.harris at gmail.com Thu Apr 13 16:32:01 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu Apr 13 16:32:01 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> <443EC2B4.807@cox.net> Message-ID: Hi, On 4/13/06, Alan G Isaac wrote: > > On Thu, 13 Apr 2006, Tim Hochberg apparently wrote: > > Here's what kron does as present: > > As possible context: > http://www.mathworks.com/access/helpdesk/help/techdoc/ref/kron.html#998881 > http://www.aptech.com/pdf_man/basicgauss.pdf p.69 > In this sense, the 2-d handling is not surprising. Yep, that is what the little python routine I gave above does. Note that in these cases only matrices are involved. Matlab, for instance, defines vectors as (1,n) or (n,1), which is actually helpful in minding the distinction between a vector space and its dual. I don't know how the numpy matrix package works, but the vectors of rank 1 are going to be a constant source of ambiguity. Cheers, > Alan Isaac Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.hochberg at cox.net Thu Apr 13 16:37:04 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Thu Apr 13 16:37:04 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> <443EC2B4.807@cox.net> Message-ID: <443EDFE7.6010509@cox.net> Charles R Harris wrote: > > > On 4/13/06, *Tim Hochberg* > wrote: > > Charles R Harris wrote: > > > Tim, > > > > On 4/13/06, *Tim Hochberg* > > >> wrote: > > > > Alan G Isaac wrote: > > > > >On Thu, 13 Apr 2006, Charles R Harris apparently wrote: > > > > > > > > >>The Kronecker product (aka Tensor product) of two > > >>matrices isn't a matrix. > > >> > > >> > > > > > >That is an unusual way to describe things in > > >the world of econometrics. Here is a more > > >common way: > > >http://planetmath.org/encyclopedia/KroneckerProduct.html > > < http://planetmath.org/encyclopedia/KroneckerProduct.html> > > >I share Sven's expectation. > > > > > > > > mathworld also agrees with you. As does the documentation > (as best > > as I > > can tell) and the actual output of kron. I think Charles must be > > thinking of the tensor product instead. > > > > > > It *is* the tensor product, A \tensor B, but it is not the most > > general tensor with four indices just as a bivector is not the most > > general tensor with two indices. Numerically, kron chooses to > > represent the tensor product of two vector spaces a, b with > dimensions > > n,m respectively as the direct sum of n copies of b, and the tensor > > product of two operators takes the given form. More generally, the B > > matrix in each spot could be replaced with an arbitrary matrix > of the > > correct dimensions and you would recover the general tensor with > four > > indices. > > > > Anyway, it sounds like you are proposing that the tensor (outer) > > product of two matrices be reshaped to run over two indices. It > seems > > that likewise the tensor (outer) product of two vectors should be > > reshaped to run over one index ( i.e. flat). That would do the > trick. > > I'm not proposing anything. I don't care at all what kron does. I > just > want to fix the return type if that's feasible so that people stop > complaining about it. As far as I can tell, kron already returns a > flattened tensor product of some sort. 
I believe the general tensor > product that you are talking about is already covered by > multiply.outer, > but I'm not sure so correct me if I'm wrong. Here's what kron does as > present: > > >>> a > array([[1, 1], > [1, 1]]) > >>> kron(a,a) # => 4x4 matrix > array([[1, 1, 1, 1], > [1, 1, 1, 1], > [1, 1, 1, 1], > [1, 1, 1, 1]]) > > > Good at first look. Lets see a simpler version... Nevermind, seems > numpy isn't working on this machine (X86_64, fc5 64 bit) at the > moment, maybe I need to check out a clean version. > > >>> kron(a,a[0]) => 8x1 > array([1, 1, 1, 1, 1, 1, 1, 1]) > > > Looks broken. a[0] should be an operator (matrix), so either it should > be (2,1) or (1,2). Since a is an array here, a[0] is shape (2,). Let's repeat this exercise using matrices, which are always rank-2 and see if they make sense. >>> m matrix([[1, 1], [1, 1]]) >>> kron(m, m[0]) matrix([[1, 1, 1, 1], [1, 1, 1, 1]]) >>> kron(m,m[:,0]) matrix([[1, 1], [1, 1], [1, 1], [1, 1]]) That looks OK. > In the first case, the return should have shape (4,2), in the latter > (2,4). Should probably raise an error as the result strikes me as > ambiguous. But I have to admit I am not sure what the point of this > particular construction is. > > >>> kron(a[0], a[0]) > Traceback (most recent call last): > File "", line 1, in ? > File "C:\Python24\Lib\site-packages\numpy\lib\shape_base.py", line > 577, in kron > result = concatenate(concatenate(o, axis=1), axis=1) > ValueError: 0-d arrays can't be concatenated > > >>> kron(m[0], m[0]) matrix([[1, 1, 1, 1]]) >>> kron(m[:,0], m[:,0]) matrix([[1], [1], [1], [1]]) >>> kron(m[:,0],m[0]) matrix([[1, 1], [1, 1]]) > See above. this could be (1,4) or (4,1), depending. All of these look like they're probably right without thinking about it too hard. > > >>> b.shape > (2, 2, 2) > >>> kron(b,b).shape > (4, 4, 2, 2) > > > I think this is doing transpose(outer(b,b), axis=(0,2,1,3)) and > reshaping the first 4 indices into 2. Again, I am not sure what the > point is for these operators. Now another way to get all this > functionality is to have a contraction function or method with a list > of axis. For instance, consider the matrices A(i,j) and B(k,l) > operating on x(j) and y(l) like A(i,j)x(j) and B(k,l)y(l), then the > outer product of all of these is > > A(i,j)B(k,l)x(j)y(l) > > with the summation convention on the indices j and l. The result > should be the same as kron(A,B)*kron(x,y) up to a permutation of rows > and columes. It is just a question of which basis is used and how the > elements are indexed. > > So, it looks like the 2d x 2d product obeys Alan's definition. The > other > products are probably all broken. > Here's my best guess as to what is going on: 1. There is a relatively large group of people who use Kronecker product as Alan does (probably the matrix as opposed to tensor math folks). I'm guessing it's a large group since they manage to write the definitions at both mathworld and planetmath. 2. kron was meant to implement this. 2.5 People who need the other meaning of kron can just use outer, so no real conflict. 3. The implementation was either inappropriately generalized or it was assumed that all inputs would be matrices (and hence rank-2). Assuming 3. is correct, and I'd like to hear from people if they think that the behaviour in the non rank-2 cases is sensible, the next question is whether the behaviour in the rank-2 cases makes sense. It seems to, but I'm not a user of kron.
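(For reference, the rank-2 rule in code -- a hypothetical sketch with a made-up name, not what numpy.lib.shape_base currently does:

from numpy import asarray, multiply

def kron2d(a, b):
    # Kronecker product for rank-2 inputs only: each entry a[i,j]
    # becomes the block a[i,j]*b in the result.
    a, b = asarray(a), asarray(b)
    if a.ndim != 2 or b.ndim != 2:
        raise ValueError("kron2d expects rank-2 arguments")
    c = multiply.outer(a, b)          # shape (n0, n1, m0, m1)
    c = c.transpose((0, 2, 1, 3))     # shape (n0, m0, n1, m1)
    return c.reshape(a.shape[0]*b.shape[0], a.shape[1]*b.shape[1])

It is just the outer product with the axes interleaved and then collapsed pairwise.)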
If both of the preceeding are true, it seems like a complete fix entails the following two things: 1. Forbid arguments that are not rank-2. This allows all matrices, which is really the main target here I think. 2. Fix the return type issue. I have a fix for this ready to commit, but I want to figure out the first part as well. Regards, -tim From charlesr.harris at gmail.com Thu Apr 13 17:14:32 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu Apr 13 17:14:32 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443EDFE7.6010509@cox.net> References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> <443EC2B4.807@cox.net> <443EDFE7.6010509@cox.net> Message-ID: On 4/13/06, Tim Hochberg wrote: > > Charles R Harris wrote: > > > > > > > On 4/13/06, *Tim Hochberg* > > wrote: > > > > Charles R Harris wrote: > > > > > Tim, > > > > > > On 4/13/06, *Tim Hochberg* > > > > >> > wrote: > > > > > > Alan G Isaac wrote: > > > > > > >On Thu, 13 Apr 2006, Charles R Harris apparently wrote: > > > > > > > > > > > >>The Kronecker product (aka Tensor product) of two > > > >>matrices isn't a matrix. > > > >> > > > >> > > > > > > > >That is an unusual way to describe things in > > > >the world of econometrics. Here is a more > > > >common way: > > > >http://planetmath.org/encyclopedia/KroneckerProduct.html > > > < http://planetmath.org/encyclopedia/KroneckerProduct.html> > > > >I share Sven's expectation. > > > > > > > > > > > mathworld also agrees with you. As does the documentation > > (as best > > > as I > > > can tell) and the actual output of kron. I think Charles must > be > > > thinking of the tensor product instead. > > > > > > > > > It *is* the tensor product, A \tensor B, but it is not the most > > > general tensor with four indices just as a bivector is not the > most > > > general tensor with two indices. Numerically, kron chooses to > > > represent the tensor product of two vector spaces a, b with > > dimensions > > > n,m respectively as the direct sum of n copies of b, and > the tensor > > > product of two operators takes the given form. More generally, the > B > > > matrix in each spot could be replaced with an arbitrary matrix > > of the > > > correct dimensions and you would recover the general tensor with > > four > > > indices. > > > > > > Anyway, it sounds like you are proposing that the tensor (outer) > > > product of two matrices be reshaped to run over two indices. It > > seems > > > that likewise the tensor (outer) product of two vectors should be > > > reshaped to run over one index ( i.e. flat). That would do the > > trick. > > > > I'm not proposing anything. I don't care at all what kron does. I > > just > > want to fix the return type if that's feasible so that people stop > > complaining about it. As far as I can tell, kron already returns a > > flattened tensor product of some sort. I believe the general tensor > > product that you are talking about is already covered by > > multiply.outer, > > but I'm not sure so correct me if I'm wrong. Here's what kron does > as > > present: > > > > >>> a > > array([[1, 1], > > [1, 1]]) > > >>> kron(a,a) # => 4x4 matrix > > array([[1, 1, 1, 1], > > [1, 1, 1, 1], > > [1, 1, 1, 1], > > [1, 1, 1, 1]]) > > > > > > Good at first look. Lets see a simpler version... Nevermind, seems > > numpy isn't working on this machine (X86_64, fc5 64 bit) at the > > moment, maybe I need to check out a clean version. 
> > > > >>> kron(a,a[0]) => 8x1 > > array([1, 1, 1, 1, 1, 1, 1, 1]) > > > > > > Looks broken. a[0] should be an operator (matrix), so either it should > > be (2,1) or (1,2). > > Since a is an array here, a[0] is shape (2,). Let's repeat this > excercise using matrices, which are always rank-2 and see if they make > sense. > > >>> m > matrix([[1, 1], > [1, 1]]) > >>> kron(m, m[0]) > matrix([[1, 1, 1, 1], > [1, 1, 1, 1]]) > >>> kron(m,m[:,0]) > matrix([[1, 1], > [1, 1], > [1, 1], > [1, 1]]) > > That looks OK. > > > In the first case, the return should have shape (4,2), in the latter > > (2,4). Should probably raise an error as the result strikes me as > > ambiguous. But I have to admit I am not sure what the point of this > > particular construction is. > > > > >>> kron(a[0], a[0]) > > Traceback (most recent call last): > > File "", line 1, in ? > > File "C:\Python24\Lib\site-packages\numpy\lib\shape_base.py", line > > 577, in kron > > result = concatenate(concatenate(o, axis=1), axis=1) > > ValueError: 0-d arrays can't be concatenated > > > > > >>> kron(m[0], m[0]) > matrix([[1, 1, 1, 1]]) > >>> kron(m[:,0], m[:,0]) > matrix([[1], > [1], > [1], > [1]]) > >>> kron(m[:,0],m[0]) > matrix([[1, 1], > [1, 1]]) > > > See above. this could be (1,4) or (4,1), depending. > > All of these look like they're probably right without thinking about it > too hard. > > > > > >>> b.shape > > (2, 2, 2) > > >>> kron(b,b).shape > > (4, 4, 2, 2) > > > > > > I think this is doing transpose(outer(b,b), axis=(0,2,1,3)) and > > reshaping the first 4 indices into 2. Again, I am not sure what the > > point is for these operators. Now another way to get all this > > functionality is to have a contraction function or method with a list > > of axis. For instance, consider the matrices A(i,j) and B(k,l) > > operating on x(j) and y(l) like A(i,j)x(j) and B(k,l)y(l), then the > > outer product of all of these is > > > > A(i,j)B(k,l)x(j)y(l) > > > > with the summation convention on the indices j and l. The result > > should be the same as kron(A,B)*kron(x,y) up to a permutation of rows > > and columes. It is just a question of which basis is used and how the > > elements are indexed. > > > > So, it looks like the 2d x 2d product obeys Alan's definition. The > > other > > products are probably all broken. > > > Here's my best guess as to what is going on: > 1. There is a relatively large group of people who use Kronecker > product as Alan does (probably the matrix as opposed to tensor math > folks). I'm guessing it's a large group since they manage to write the > definitions at both mathworld and planetmath. > 2. kron was meant to implement this. > 2.5 People who need the other meaning of kron can just use outer, so > no real conflict. > 3. The implementation was either inappropriately generalized or it > was assumed that all inputs would be matrices (and hence rank-2). Uh-huh. Assuming 3. is correct, and I'd like to hear from people if they think > that the behaviour in the non rank-2 cases is sensible, the next > question is whether the behaviour in the rank-2 cases makes sense. It > seem to, but I'm not a user of kron. If both of the preceeding are true, > it seems like a complete fix entails the following two things: > 1. Forbid arguments that are not rank-2. This allows all matrices, > which is really the main target here I think. > 2. Fix the return type issue. I have a fix for this ready to commit, > but I want to figure out the first part as well. 
I think it was inappropriately generalized, it is hard to make sense of what kron means for rank > 2. So I vote for restricting the usage to matrices, or arrays of rank two. This avoids both the ambiguity of rank-one arrays and the big "why" that arises for arrays with rank > 2. Note that in tensor algebra the rank 1 problem is solved by the use of upper or lower indices, lower index => [1,n], upper index => [n,1]. Hmm, I should check that kron is associative: kron(kron(a,b),c) == kron(a, kron(b,c)) like a good tensor product should be. I suspect it is. Regards, > > -tim Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Apr 13 17:22:01 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu Apr 13 17:22:01 2006 Subject: [Numpy-discussion] Problem on FC5 Message-ID: Has anyone else seen this: Python 2.4.2 (#1, Feb 12 2006, 03:45:41) > [GCC 4.1.0 20060210 (Red Hat 4.1.0-0.24)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > >>> from numpy import * > *** buffer overflow detected ***: python terminated > ======= Backtrace: ========= > /lib64/libc.so.6(__chk_fail+0x2f)[0x32c76dee3f] > > /usr/lib64/python2.4/site-packages/numpy/core/multiarray.so[0x2aaaae191099] this is on FC5-x86_64. I didn't see any problems in the compilation and the right lib64 libs seem to have been used. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ivazquez at ivazquez.net Thu Apr 13 17:48:18 2006 From: ivazquez at ivazquez.net (Ignacio Vazquez-Abrams) Date: Thu Apr 13 17:48:18 2006 Subject: [Numpy-discussion] Problem on FC5 In-Reply-To: References: Message-ID: <1144975662.3758.3.camel@ignacio.lan> On Thu, 2006-04-13 at 18:21 -0600, Charles R Harris wrote: > this is on FC5-x86_64. I didn't see any problems in the compilation > and the right lib64 libs seem to have been used. Self-built or from Fedora Extras? -- Ignacio Vazquez-Abrams http://fedora.ivazquez.net/ gpg --keyserver hkp://subkeys.pgp.net --recv-key 38028b72 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 191 bytes Desc: This is a digitally signed message part URL: From charlesr.harris at gmail.com Thu Apr 13 19:04:10 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu Apr 13 19:04:10 2006 Subject: [Numpy-discussion] Problem on FC5 In-Reply-To: <1144975662.3758.3.camel@ignacio.lan> References: <1144975662.3758.3.camel@ignacio.lan> Message-ID: OK, I solved this problem by deleting the numpy directory in site-packages. I probably should have tried that first :-/ On 4/13/06, Ignacio Vazquez-Abrams wrote: > > On Thu, 2006-04-13 at 18:21 -0600, Charles R Harris wrote: > > this is on FC5-x86_64. I didn't see any problems in the compilation > > and the right lib64 libs seem to have been used. > > Self-built or from Fedora Extras? > > -- > Ignacio Vazquez-Abrams > http://fedora.ivazquez.net/ > > gpg --keyserver hkp://subkeys.pgp.net --recv-key 38028b72 > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From chanley at stsci.edu Fri Apr 14 07:27:03 2006 From: chanley at stsci.edu (Christopher Hanley) Date: Fri Apr 14 07:27:03 2006 Subject: [Numpy-discussion] numpy.test() segfaults under Solaris 8 Message-ID: <443FB11E.5040102@stsci.edu> From the daily Solaris 8 regression tests: Found 5 tests for numpy.distutils.misc_util Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/core/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/core/tests for module Found 4 tests for numpy.lib.getlimits Found 30 tests for numpy.core.numerictypes Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/random/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/random/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/linalg/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/testing/tests for module Found 13 tests for numpy.core.umath Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/distutils/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/distutils/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/linalg/tests for module Found 8 tests for numpy.lib.arraysetops Warning: No test file found in /data/basil5/numpy/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/distutils/tests for module Found 42 tests for numpy.lib.type_check Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/distutils/tests for module Found 90 tests for numpy.core.multiarray Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/distutils/tests for module Warning: No test file found in /data/basil5/numpy/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/distutils/tests for module Found 3 tests for numpy.dft.helper Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/lib/tests for module Found 36 tests for numpy.core.ma Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/f2py/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/lib/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/core/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/core/tests for module Found 2 tests for numpy.core.oldnumeric Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/core/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/distutils/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/linalg/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/dft/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/dft/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/random/tests for module Found 9 tests for numpy.lib.twodim_base Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/distutils/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/core/tests for module Warning: No test file found in /data/basil5/numpy/tests for module Warning: No test file found in 
/data/basil5/site-packages/lib/python/numpy/dft/tests for module Found 8 tests for numpy.core.defmatrix Warning: No test file found in /data/basil5/numpy/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/testing/tests for module Found 1 tests for numpy.lib.ufunclike Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/lib/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/lib/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/dft/tests for module Found 32 tests for numpy.lib.function_base Found 1 tests for numpy.lib.polynomial Warning: No test file found in /data/basil5/numpy/tests for module Found 6 tests for numpy.core.records Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/testing/tests for module Found 17 tests for numpy.core.numeric Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/core/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/core/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/testing/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/lib/tests for module Found 4 tests for numpy.lib.index_tricks Found 44 tests for numpy.lib.shape_base Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/lib/tests for module Warning: No test file found in /data/basil5/site-packages/lib/python/numpy/linalg/tests for module Found 0 tests for __main__ check_1 (numpy.distutils.tests.test_misc_util.test_appendpath) ... ok check_2 (numpy.distutils.tests.test_misc_util.test_appendpath) ... ok check_3 (numpy.distutils.tests.test_misc_util.test_appendpath) ... ok check_gpaths (numpy.distutils.tests.test_misc_util.test_gpaths) ... ok check_1 (numpy.distutils.tests.test_misc_util.test_minrelpath) ... ok check_singleton (numpy.lib.tests.test_getlimits.test_double) ... ok check_singleton (numpy.lib.tests.test_getlimits.test_longdouble) ... ok check_singleton (numpy.lib.tests.test_getlimits.test_python_float) ... ok check_singleton (numpy.lib.tests.test_getlimits.test_single) ... ok Check creation from list of list of tuples ... ok Check creation from list of tuples ... ok Check creation from tuples ... ok Check creation from list of list of tuples ... ok Check creation from list of tuples ... ok Check creation from tuples ... ok Check creation from list of list of tuples ... ok Check creation from list of tuples ... ok Check creation from tuples ... ok Check creation from list of list of tuples ... ok Check creation from list of tuples ... ok Check creation from tuples ... ok Check creation of 0-dimensional objects ... ok Check creation of multi-dimensional objects ... ok Check creation of single-dimensional objects ... ok Check creation of 0-dimensional objects ... ok Check creation of multi-dimensional objects ... ok Check creation of single-dimensional objects ... ok Check reading the top fields of a nested array ... ok Check reading the nested fields of a nested array (1st level) ... ok Check access nested descriptors of a nested array (1st level) ... ok Check reading the nested fields of a nested array (2nd level) ... ok Check access nested descriptors of a nested array (2nd level) ... ok Check reading the top fields of a nested array ... ok Check reading the nested fields of a nested array (1st level) ... 
ok Check access nested descriptors of a nested array (1st level) ... ok Check reading the nested fields of a nested array (2nd level) ... ok Check access nested descriptors of a nested array (2nd level) ... ok check_access_fields (numpy.core.tests.test_numerictypes.test_read_values_plain_multiple) ... ok check_access_fields (numpy.core.tests.test_numerictypes.test_read_values_plain_single) ... ok test_mixed (numpy.core.tests.test_umath.test_choose) ... ok check_expm1 (numpy.core.tests.test_umath.test_expm1) ... ok check_floating_point (numpy.core.tests.test_umath.test_floating_point) ... ok check_log1p (numpy.core.tests.test_umath.test_log1p) ... ok check_reduce_complex (numpy.core.tests.test_umath.test_maximum) ... ok check_reduce_complex (numpy.core.tests.test_umath.test_minimum) ... ok check_power_complex (numpy.core.tests.test_umath.test_power) ... ok check_power_float (numpy.core.tests.test_umath.test_power) ... ok test_array_with_context (numpy.core.tests.test_umath.test_special_methods) ... ok test_failing_wrap (numpy.core.tests.test_umath.test_special_methods) ... ok test_old_wrap (numpy.core.tests.test_umath.test_special_methods) ... ok test_priority (numpy.core.tests.test_umath.test_special_methods) ... ok test_wrap (numpy.core.tests.test_umath.test_special_methods) ... ok check_intersect1d (numpy.lib.tests.test_arraysetops.test_aso) ... ok check_intersect1d_nu (numpy.lib.tests.test_arraysetops.test_aso) ... ok check_manyways (numpy.lib.tests.test_arraysetops.test_aso) ... ok check_setdiff1d (numpy.lib.tests.test_arraysetops.test_aso) ... ok check_setmember1d (numpy.lib.tests.test_arraysetops.test_aso) ... ok check_setxor1d (numpy.lib.tests.test_arraysetops.test_aso) ... ok check_union1d (numpy.lib.tests.test_arraysetops.test_aso) ... ok check_unique1d (numpy.lib.tests.test_arraysetops.test_aso) ... ok check_cmplx (numpy.lib.tests.test_type_check.test_imag) ... ok check_real (numpy.lib.tests.test_type_check.test_imag) ... ok check_fail (numpy.lib.tests.test_type_check.test_iscomplex) ... ok check_pass (numpy.lib.tests.test_type_check.test_iscomplex) ... ok check_basic (numpy.lib.tests.test_type_check.test_iscomplexobj) ... ok check_complex (numpy.lib.tests.test_type_check.test_isfinite) ... ok check_complex1 (numpy.lib.tests.test_type_check.test_isfinite) ... ok check_goodvalues (numpy.lib.tests.test_type_check.test_isfinite) ... ok check_ind (numpy.lib.tests.test_type_check.test_isfinite) ... ok check_integer (numpy.lib.tests.test_type_check.test_isfinite) ... ok check_neginf (numpy.lib.tests.test_type_check.test_isfinite) ... ok check_posinf (numpy.lib.tests.test_type_check.test_isfinite) ... ok check_goodvalues (numpy.lib.tests.test_type_check.test_isinf) ... ok check_ind (numpy.lib.tests.test_type_check.test_isinf) ... ok check_neginf (numpy.lib.tests.test_type_check.test_isinf) ... ok check_neginf_scalar (numpy.lib.tests.test_type_check.test_isinf) ... ok check_posinf (numpy.lib.tests.test_type_check.test_isinf) ... ok check_posinf_scalar (numpy.lib.tests.test_type_check.test_isinf) ... ok check_complex (numpy.lib.tests.test_type_check.test_isnan) ... ok check_complex1 (numpy.lib.tests.test_type_check.test_isnan) ... ok check_goodvalues (numpy.lib.tests.test_type_check.test_isnan) ... ok check_ind (numpy.lib.tests.test_type_check.test_isnan) ... ok check_integer (numpy.lib.tests.test_type_check.test_isnan) ... ok check_neginf (numpy.lib.tests.test_type_check.test_isnan) ... ok check_posinf (numpy.lib.tests.test_type_check.test_isnan) ... 
ok check_generic (numpy.lib.tests.test_type_check.test_isneginf) ... ok check_generic (numpy.lib.tests.test_type_check.test_isposinf) ... ok check_fail (numpy.lib.tests.test_type_check.test_isreal) ... ok check_pass (numpy.lib.tests.test_type_check.test_isreal) ... ok check_basic (numpy.lib.tests.test_type_check.test_isrealobj) ... ok check_basic (numpy.lib.tests.test_type_check.test_isscalar) ... ok check_default_1 (numpy.lib.tests.test_type_check.test_mintypecode) ... ok check_default_2 (numpy.lib.tests.test_type_check.test_mintypecode) ... ok check_default_3 (numpy.lib.tests.test_type_check.test_mintypecode) ... ok check_complex_bad (numpy.lib.tests.test_type_check.test_nan_to_num) ... ok check_complex_bad2 (numpy.lib.tests.test_type_check.test_nan_to_num) ... ok check_complex_good (numpy.lib.tests.test_type_check.test_nan_to_num) ... ok check_generic (numpy.lib.tests.test_type_check.test_nan_to_num) ... ok check_integer (numpy.lib.tests.test_type_check.test_nan_to_num) ... ok check_cmplx (numpy.lib.tests.test_type_check.test_real) ... ok check_real (numpy.lib.tests.test_type_check.test_real) ... ok check_basic (numpy.lib.tests.test_type_check.test_real_if_close) ... ok Check assignment of 0-dimensional objects with values ... ok Check assignment of multi-dimensional objects with values ... ok Check assignment of single-dimensional objects with values ... ok Check assignment of 0-dimensional objects with values ... ok Check assignment of multi-dimensional objects with values ... ok Check assignment of single-dimensional objects with values ... ok Check assignment of 0-dimensional objects with values ... ok Check assignment of multi-dimensional objects with values ... ok Check assignment of single-dimensional objects with values ... ok Check assignment of 0-dimensional objects with values ... ok Check assignment of multi-dimensional objects with values ... ok Check assignment of single-dimensional objects with values ... ok Check assignment of 0-dimensional objects with values ... ok Check assignment of multi-dimensional objects with values ... ok Check assignment of single-dimensional objects with values ... ok Check assignment of 0-dimensional objects with values ... ok Check assignment of multi-dimensional objects with values ... ok Check assignment of single-dimensional objects with values ... ok check_attributes (numpy.core.tests.test_multiarray.test_attributes) ... ok check_dtypeattr (numpy.core.tests.test_multiarray.test_attributes) ... ok check_fill (numpy.core.tests.test_multiarray.test_attributes) ... ok check_set_stridesattr (numpy.core.tests.test_multiarray.test_attributes) ... ok check_stridesattr (numpy.core.tests.test_multiarray.test_attributes) ... ok check_test_interning (numpy.core.tests.test_multiarray.test_bool) ... ok Check byteorder of 0-dimensional objects ... ok Check byteorder of multi-dimensional objects ... ok Check byteorder of single-dimensional objects ... ok Check byteorder of 0-dimensional objects ... ok Check byteorder of multi-dimensional objects ... ok Check byteorder of single-dimensional objects ... ok Check byteorder of 0-dimensional objects ... ok Check byteorder of multi-dimensional objects ... ok Check byteorder of single-dimensional objects ... ok Check byteorder of 0-dimensional objects ... ok Check byteorder of multi-dimensional objects ... ok Check byteorder of single-dimensional objects ... ok Check byteorder of 0-dimensional objects ... ok Check byteorder of multi-dimensional objects ... ok Check byteorder of single-dimensional objects ... 
ok Check byteorder of 0-dimensional objects ... ok Check byteorder of multi-dimensional objects ... ok Check byteorder of single-dimensional objects ... ok Check creation of 0-dimensional objects with values ... ok Check creation of multi-dimensional objects with values ... ok Check creation of single-dimensional objects with values ... ok Check creation of 0-dimensional objects with values ... ok Check creation of multi-dimensional objects with values ... ok Check creation of single-dimensional objects with values ... ok Check creation of 0-dimensional objects with values ... ok Check creation of multi-dimensional objects with values ... ok Check creation of single-dimensional objects with values ... ok Check creation of 0-dimensional objects with values ... ok Check creation of multi-dimensional objects with values ... ok Check creation of single-dimensional objects with values ... ok Check creation of 0-dimensional objects with values ... ok Check creation of multi-dimensional objects with values ... ok Check creation of single-dimensional objects with values ... ok Check creation of 0-dimensional objects with values ... ok Check creation of multi-dimensional objects with values ... ok Check creation of single-dimensional objects with values ... ok Check creation of 0-dimensional objects ... ok Check creation of multi-dimensional objects ... ok Check creation of single-dimensional objects ... ok Check creation of 0-dimensional objects ... ok Check creation of multi-dimensional objects ... ok Check creation of single-dimensional objects ... ok Check creation of 0-dimensional objects ... ok Check creation of multi-dimensional objects ... ok Check creation of single-dimensional objects ... ok check_from_attribute (numpy.core.tests.test_multiarray.test_creation) ... ok check_construction (numpy.core.tests.test_multiarray.test_dtypedescr) ... ok check_list (numpy.core.tests.test_multiarray.test_fancy_indexing) ... ok check_tuple (numpy.core.tests.test_multiarray.test_fancy_indexing) ... ok check_otherflags (numpy.core.tests.test_multiarray.test_flags) ... ok check_writeable (numpy.core.tests.test_multiarray.test_flags) ... ok check_ascii (numpy.core.tests.test_multiarray.test_fromstring) ... ok check_binary (numpy.core.tests.test_multiarray.test_fromstring) ... ok check_test_round (numpy.core.tests.test_multiarray.test_methods) ... ok check_both (numpy.core.tests.test_multiarray.test_pickling) ... ok check_test_zero_rank (numpy.core.tests.test_multiarray.test_subscripting) ... ok check_constructor (numpy.core.tests.test_multiarray.test_zero_rank) ... ok check_ellipsis_subscript (numpy.core.tests.test_multiarray.test_zero_rank) ... ok check_ellipsis_subscript_assignment (numpy.core.tests.test_multiarray.test_zero_rank) ... ok check_empty_subscript (numpy.core.tests.test_multiarray.test_zero_rank) ... ok check_empty_subscript_assignment (numpy.core.tests.test_multiarray.test_zero_rank) ... ok check_invalid_newaxis (numpy.core.tests.test_multiarray.test_zero_rank) ... ok check_invalid_subscript (numpy.core.tests.test_multiarray.test_zero_rank) ... ok check_invalid_subscript_assignment (numpy.core.tests.test_multiarray.test_zero_rank) ... ok check_newaxis (numpy.core.tests.test_multiarray.test_zero_rank) ... ok check_output (numpy.core.tests.test_multiarray.test_zero_rank) ... ok check_definition (numpy.dft.tests.test_helper.test_fftfreq) ... ok check_definition (numpy.dft.tests.test_helper.test_fftshift) ... ok check_inverse (numpy.dft.tests.test_helper.test_fftshift) ... 
ok test_clip (numpy.core.tests.test_ma.test_array_methods) ... ok test_cumprod (numpy.core.tests.test_ma.test_array_methods) ... ok test_cumsum (numpy.core.tests.test_ma.test_array_methods) ... ok test_ptp (numpy.core.tests.test_ma.test_array_methods) ... ok test_swapaxes (numpy.core.tests.test_ma.test_array_methods) ... ok test_trace (numpy.core.tests.test_ma.test_array_methods) ... ok test_varstd (numpy.core.tests.test_ma.test_array_methods) ... ok check_testAPI (numpy.core.tests.test_ma.test_ma) ... ok Test add, sum, product. ... ok Test of basic arithmetic. ... ok check_testArrayAttributes (numpy.core.tests.test_ma.test_ma) ... ok check_testArrayMethods (numpy.core.tests.test_ma.test_ma) ... ok Test of average. ... ok More tests of average. ... ok Test of basic array creation and properties in 1 dimension. ... ok Test of basic array creation and properties in 2 dimensions. ... ok Test of conversions and indexing ... ok Tests of some subtle points of copying and sizing. ... ok Test of inplace operations and rich comparisons ... ok check_testMaPut (numpy.core.tests.test_ma.test_ma) ... ok Test of masked element ... ok Test of minumum, maximum. ... ok check_testMixedArithmetic (numpy.core.tests.test_ma.test_ma) ... ok Test of other odd features ... ok Test of pickling ... ok Test of put ... ok check_testScalarArithmetic (numpy.core.tests.test_ma.test_ma) ... ok check_testSingleElementSubscript (numpy.core.tests.test_ma.test_ma) ... ok Test of take, transpose, inner, outer products ... ok check_testToPython (numpy.core.tests.test_ma.test_ma) ... ok Test various functions such as sin, cos. ... ok Test count ... ok check_testUfuncRegression (numpy.core.tests.test_ma.test_ufuncs) ... ok test_minmax (numpy.core.tests.test_ma.test_ufuncs) ... ok test_nonzero (numpy.core.tests.test_ma.test_ufuncs) ... ok test_reduce (numpy.core.tests.test_ma.test_ufuncs) ... ok check_bug_r2089 (numpy.core.tests.test_oldnumeric.test_put) ... ok check_array_subclass (numpy.core.tests.test_oldnumeric.test_wrapit) ... ok check_matrix (numpy.lib.tests.test_twodim_base.test_diag) ... ok check_vector (numpy.lib.tests.test_twodim_base.test_diag) ... ok check_2d (numpy.lib.tests.test_twodim_base.test_eye) ... ok check_basic (numpy.lib.tests.test_twodim_base.test_eye) ... ok check_diag (numpy.lib.tests.test_twodim_base.test_eye) ... ok check_diag2d (numpy.lib.tests.test_twodim_base.test_eye) ... ok check_basic (numpy.lib.tests.test_twodim_base.test_fliplr) ... ok check_basic (numpy.lib.tests.test_twodim_base.test_flipud) ... ok check_basic (numpy.lib.tests.test_twodim_base.test_rot90) ... ok check_basic (numpy.core.tests.test_defmatrix.test_algebra) ... ok check_basic (numpy.core.tests.test_defmatrix.test_casting) ... ok check_basic (numpy.core.tests.test_defmatrix.test_ctor) ... ok check_asmatrix (numpy.core.tests.test_defmatrix.test_properties) ... ok check_basic (numpy.core.tests.test_defmatrix.test_properties) ... ok check_comparisons (numpy.core.tests.test_defmatrix.test_properties) ... ok check_noaxis (numpy.core.tests.test_defmatrix.test_properties) ... ok Test whether matrix.sum(axis=1) preserves orientation. ... ok Doctest: numpy.lib.tests.test_ufunclike ... ok check_basic (numpy.lib.tests.test_function_base.test_all) ... ok check_nd (numpy.lib.tests.test_function_base.test_all) ... ok check_basic (numpy.lib.tests.test_function_base.test_amax) ... ok check_basic (numpy.lib.tests.test_function_base.test_amin) ... ok check_basic (numpy.lib.tests.test_function_base.test_angle) ... 
check_basic (numpy.lib.tests.test_function_base.test_any) ... ok
check_nd (numpy.lib.tests.test_function_base.test_any) ... ok
check_basic (numpy.lib.tests.test_function_base.test_average) ... ok
check_basic (numpy.lib.tests.test_function_base.test_cumprod) ... ok
check_basic (numpy.lib.tests.test_function_base.test_cumsum) ... ok
check_basic (numpy.lib.tests.test_function_base.test_diff) ... ok
check_nd (numpy.lib.tests.test_function_base.test_diff) ... ok
check_basic (numpy.lib.tests.test_function_base.test_extins) ... ok
check_both (numpy.lib.tests.test_function_base.test_extins) ... ok
check_insert (numpy.lib.tests.test_function_base.test_extins) ... ok
check_bartlett (numpy.lib.tests.test_function_base.test_filterwindows) ... ok
check_blackman (numpy.lib.tests.test_function_base.test_filterwindows) ... ok
check_hamming (numpy.lib.tests.test_function_base.test_filterwindows) ... ok
check_hanning (numpy.lib.tests.test_function_base.test_filterwindows) ... ok
check_simple (numpy.lib.tests.test_function_base.test_histogram) ... ok
check_basic (numpy.lib.tests.test_function_base.test_linspace) ... ok
check_corner (numpy.lib.tests.test_function_base.test_linspace) ... ok
check_basic (numpy.lib.tests.test_function_base.test_logspace) ... ok
check_basic (numpy.lib.tests.test_function_base.test_prod) ... ok
check_basic (numpy.lib.tests.test_function_base.test_ptp) ... ok
check_simple (numpy.lib.tests.test_function_base.test_sinc) ... ok
check_simple (numpy.lib.tests.test_function_base.test_trapz) ... ok
check_basic (numpy.lib.tests.test_function_base.test_trim_zeros) ... ok
check_leading_skip (numpy.lib.tests.test_function_base.test_trim_zeros) ... ok
check_trailing_skip (numpy.lib.tests.test_function_base.test_trim_zeros) ... ok
check_simple (numpy.lib.tests.test_function_base.test_unwrap) ... ok
check_vectorize (numpy.lib.tests.test_function_base.test_vectorize)Segmentation Fault (core dumped)

This is a clean checkout and build of numpy that is done every morning on a Solaris 8 system. We are currently using python 2.4.2 on this machine. The equivalent build and test on a RHE system passed with no problems.

Chris

From fullung at gmail.com Fri Apr 14 08:15:14 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Fri Apr 14 08:15:14 2006
Subject: [Numpy-discussion] Vectorize bug
In-Reply-To: <20060412124032.GA30471@sun.ac.za>
Message-ID: <006c01c65fd6$2d043b90$0502010a@dsp.sun.ac.za>

Hello all

There still seems to be a problem with vectorize (or something else). So far I've only been able to reproduce the problem by running the test suite 5 times under IPython on Windows (weird, eh?). Details here:

http://projects.scipy.org/scipy/numpy/ticket/52

If anybody has some ideas on how to do a proper debug build with MinGW so that I can get a useful stack trace from the Visual Studio debugger, I can narrow down the problem further.

Regards,

Albert

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-
> discussion-admin at lists.sourceforge.net] On Behalf Of Stefan van der Walt
> Sent: 12 April 2006 14:41
> To: numpy-discussion at lists.sourceforge.net
> Subject: [Numpy-discussion] Vectorize bug
>
> Hello all
>
> Vectorize segfaults for large arrays. I filed the bug at
>
> http://projects.scipy.org/scipy/numpy/ticket/52
>
> The offending code is
>
> import numpy as N
> x = N.linspace(-3,2,10000)
> y = N.vectorize(lambda x: x)
>
> # Segfaults here
> y(x)
>
> Regards
> Stéfan

From fullung at gmail.com Fri Apr 14 08:18:02 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Fri Apr 14 08:18:02 2006
Subject: [Numpy-discussion] numpy.test() segfaults under Solaris 8
In-Reply-To: <443FB11E.5040102@stsci.edu>
Message-ID: <006d01c65fd6$85b72450$0502010a@dsp.sun.ac.za>

Hello Chris

I am seeing this same crash on Windows under IPython with revision 2351 of NumPy from SVN. If you can get a useful stack trace on your platform, you could add some details to this ticket:

http://projects.scipy.org/scipy/numpy/ticket/52

Regards,

Albert

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-
> discussion-admin at lists.sourceforge.net] On Behalf Of Christopher Hanley
> Sent: 14 April 2006 16:27
> To: numpy-discussion
> Subject: [Numpy-discussion] numpy.test() segfaults under Solaris 8
>
> From the daily Solaris 8 regression tests:
> check_vectorize
> (numpy.lib.tests.test_function_base.test_vectorize)Segmentation Fault
> (core dumped)
>
> This is a clean checkout and build of numpy that is done every morning
> on a Solaris 8 system. We are currently using python 2.4.2 on this
> machine. The equivalent build and test on a RHE system passed with no
> problems.
>
> Chris

From faltet at xot.carabos.com Fri Apr 14 14:36:06 2006
From: faltet at xot.carabos.com (faltet at xot.carabos.com)
Date: Fri Apr 14 14:36:06 2006
Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy
Message-ID: <20060414213511.GA14355@xot.carabos.com>

Hi,

I'm seeing some slowness in NumPy when dealing with strided arrays. numarray is dealing better with these situations, so I guess that something could be done in NumPy about this. Below are the situations that I've found up to now (maybe there are others). For the timings, I've used numpy 0.9.7.2278 and numarray 1.5.1.
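To make the setup concrete: the timed arrays below are non-contiguous views that skip over a larger base buffer. A minimal sketch of such a view, with the same parameters as the timings:

import numpy as np
a = np.arange(1000000, dtype="Float64")[::10]  # take every 10th element: a strided view
print a.shape    # (100000,)
print a.strides  # (80,): 10 elements * 8 bytes between consecutive items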
It seems that NumPy copy() method is almost 3x slower than in numarray:

In [105]: npcopy=timeit.Timer('b=a.copy()','import numpy as np;a=np.arange(1000000,dtype="Float64")[::10]')

In [106]: npcopy.repeat(3,10)
Out[106]: [0.171913146972656, 0.175906896591186, 0.171195983886718]

In [107]: nacopy=timeit.Timer('b=a.copy()','import numarray as np;a=np.arange(1000000,type="Float64")[::10]')

In [108]: nacopy.repeat(3,10)
Out[108]: [0.065090894699096, 0.0630550384521484, 0.0626609325408935]

However, a copy without strides performs similarly in both packages:

In [127]: npcopy2=timeit.Timer('b=a.copy()','import numpy as np;a=np.arange(1000000,dtype="Float64")')

In [128]: npcopy2.repeat(3,10)
Out[128]: [0.24657797813415527, 0.24657106399536133, 0.2464911937713623]

In [129]: nacopy2=timeit.Timer('b=a.copy()','import numarray as np;a=np.arange(1000000,type="Float64")')

In [130]: nacopy2.repeat(3,10)
Out[130]: [0.244544982910156, 0.251885890960693, 0.2419440746307373]

--------------------------------------------

where() seems more than 2x slower in NumPy than in numarray:

In [136]: tnpf=timeit.Timer('np.where(a + b < 10, a, b)','import numpy as np;a=np.arange(100000,dtype="float64");b=a*2')

In [137]: tnpf.repeat(3,10)
Out[137]: [0.225586891174316, 0.22503495216369629, 0.224209785461425]

In [138]: tnaf=timeit.Timer('np.where(a + b < 2, a, b)','import numarray as np;a=np.arange(100000,type="Float64");b=a*2')

In [139]: tnaf.repeat(3,10)
Out[139]: [0.108436822891235, 0.1069340705871582, 0.10654377937316895]

However, for where() without parameters, NumPy performs slightly better than numarray:

In [143]: tnpf2=timeit.Timer('np.where(a + b < 10)','import numpy as np;a=np.arange(100000,dtype="float64");b=a*2')

In [144]: tnpf2.repeat(3,10)
Out[144]: [0.0759999752044677, 0.0731539726257324, 0.073034048080444336]

In [145]: tnaf2=timeit.Timer('np.where(a + b < 2)','import numarray as np;a=np.arange(100000,type="Float64");b=a*2')

In [146]: tnaf2.repeat(3,10)
Out[146]: [0.0890851020812988, 0.0853078365325927, 0.085799932479858398]

Cheers,

Francesc

From oliphant at ee.byu.edu Fri Apr 14 14:54:06 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri Apr 14 14:54:06 2006
Subject: [Numpy-discussion] Vectorize bug
In-Reply-To: <006c01c65fd6$2d043b90$0502010a@dsp.sun.ac.za>
References: <006c01c65fd6$2d043b90$0502010a@dsp.sun.ac.za>
Message-ID: <444019E8.8000700@ee.byu.edu>

Albert Strasheim wrote:

>Hello all
>
>There still seems to be a problem with vectorize (or something else). So far
>I've only been able to reproduce the problem by running the test suite 5
>times under IPython on Windows (weird, eh?). Details here:
>
>http://projects.scipy.org/scipy/numpy/ticket/52
>
>
I'm pretty sure it's a reference-counting issue. I think I found the problem and it should now be fixed.

I'm hoping this will clear up the Solaris issue as well.

-Travis

From oliphant at ee.byu.edu Fri Apr 14 16:04:02 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri Apr 14 16:04:02 2006
Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy
In-Reply-To: <20060414213511.GA14355@xot.carabos.com>
References: <20060414213511.GA14355@xot.carabos.com>
Message-ID: <44402A2A.9050300@ee.byu.edu>

faltet at xot.carabos.com wrote:

>Hi,
>
>I'm seeing some slowness in NumPy when dealing with strided arrays.
>numarray is dealing better with these situations, so I guess that
>something could be done in NumPy about this. Below are the situations
>that I've found up to now (maybe there are others). For the timings,
>I've used numpy 0.9.7.2278 and numarray 1.5.1.
>
>
What I've found in experiments like this in the past is that numarray is good at striding in one direction but much worse at striding in another direction for multi-dimensional arrays. Of course my experiments were not complete. That just seemed to be the case.

The array-iterator construct handles almost all of these cases. The copy method is a good place to start since it uses that code.

-Travis

From fullung at gmail.com Fri Apr 14 16:34:06 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Fri Apr 14 16:34:06 2006
Subject: [Numpy-discussion] Vectorize bug
In-Reply-To: <444019E8.8000700@ee.byu.edu>
Message-ID: <00f301c6601b$d340a350$0502010a@dsp.sun.ac.za>

Hello Travis

I'm still getting the same crash when running via IPython, which is the only way I've been able to reproduce the crash on Windows.

Just to confirm:

In [1]: import numpy

In [2]: numpy.__version__
Out[2]: '0.9.7.2356'

The crash now happens in check_large, which is the new name of the test method in question.

Cheers,

Albert

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-
> discussion-admin at lists.sourceforge.net] On Behalf Of Travis Oliphant
> Sent: 14 April 2006 23:54
> To: numpy-discussion
> Subject: Re: [Numpy-discussion] Vectorize bug
>
> Albert Strasheim wrote:
>
> >Hello all
> >
> >There still seems to be a problem with vectorize (or something else). So far
> >I've only been able to reproduce the problem by running the test suite 5
> >times under IPython on Windows (weird, eh?). Details here:
> >
> >http://projects.scipy.org/scipy/numpy/ticket/52
> >
> >
> I'm pretty sure it's a reference-counting issue. I think I found the
> problem and it should now be fixed.
>
> I'm hoping this will clear up the Solaris issue as well.
>
> -Travis

From oliphant at ee.byu.edu Fri Apr 14 16:43:07 2006
From: oliphant at ee.byu.edu (Travis Oliphant)
Date: Fri Apr 14 16:43:07 2006
Subject: [Numpy-discussion] Vectorize bug
In-Reply-To: <00f301c6601b$d340a350$0502010a@dsp.sun.ac.za>
References: <00f301c6601b$d340a350$0502010a@dsp.sun.ac.za>
Message-ID: <44403354.2040708@ee.byu.edu>

Albert Strasheim wrote:

>Hello Travis
>
>I'm still getting the same crash when running via IPython, which is the only
>way I've been able to reproduce the crash on Windows.
>
>Just to confirm:
>
>In [1]: import numpy
>
>In [2]: numpy.__version__
>Out[2]: '0.9.7.2356'
>
>The crash now happens in check_large, which is the new name of the test
>method in question.
>
>
Do you have SciPy installed?

Make sure you are not importing an old version of SciPy.

I cannot reproduce this problem.

-Travis

From fullung at gmail.com Fri Apr 14 16:55:04 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Fri Apr 14 16:55:04 2006
Subject: [Numpy-discussion] Vectorize bug
In-Reply-To: <44403354.2040708@ee.byu.edu>
Message-ID: <00fa01c6601e$c7707840$0502010a@dsp.sun.ac.za>

Hello

I don't have SciPy installed. Is there any way of doing a debug build of the C code so that I can investigate this problem?

You say that you cannot reproduce this problem. Are you trying to reproduce it on Linux or on Windows under IPython? I have also been unable to reproduce the crash on Linux, but as we saw earlier, this crash also cropped up on Solaris, without having to run the tests N times.
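For reference, "running the tests N times" amounts to repeating the suite in a single IPython session until it falls over, along these lines:

In [1]: import numpy

In [2]: for i in range(5):
   ...:     numpy.test()
   ...:     # on this Windows machine the crash shows up around the fifth run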
Regards,

Albert

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-
> discussion-admin at lists.sourceforge.net] On Behalf Of Travis Oliphant
> Sent: 15 April 2006 01:42
> To: numpy-discussion
> Subject: Re: [Numpy-discussion] Vectorize bug
>
> Albert Strasheim wrote:
>
> >Hello Travis
> >
> >I'm still getting the same crash when running via IPython, which is the
> only
> >way I've been able to reproduce the crash on Windows.
> >
> >Just to confirm:
> >
> >In [1]: import numpy
> >
> >In [2]: numpy.__version__
> >Out[2]: '0.9.7.2356'
> >
> >The crash now happens in check_large, which is the new name of the test
> >method in question.
> >
> >
> Do you have SciPy installed?
>
> Make sure you are not importing an old version of SciPy.
>
> I cannot reproduce this problem.
>
> -Travis

From fullung at gmail.com Fri Apr 14 16:58:03 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Fri Apr 14 16:58:03 2006
Subject: [Numpy-discussion] Summer of Code 2006
Message-ID: <00fb01c6601f$26e19b10$0502010a@dsp.sun.ac.za>

Hello all

The Google Summer of Code site for 2006 is up:

http://code.google.com/soc/

Maybe the NumPy team can propose a few projects to be funded by this program. Personally, I'd be interested in working on the build system, especially on Windows, and/or extending the test suite.

Regards,

Albert

From fullung at gmail.com Fri Apr 14 17:19:05 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Fri Apr 14 17:19:05 2006
Subject: [Numpy-discussion] Vectorize bug
In-Reply-To: <00fa01c6601e$c7707840$0502010a@dsp.sun.ac.za>
Message-ID: <00fc01c66022$1b51fb70$0502010a@dsp.sun.ac.za>

Hello all

I think Valgrind might be very useful in tracking down this bug.

http://valgrind.org/

Example usage:

~/bin/valgrind \
    -v --error-limit=no --leak-check=full \
    python -c 'import numpy; numpy.test()'

Valgrind emits many warnings for things going on inside Python on my Fedora Core 4 system, but there are also a lot of interesting things going on in the NumPy code.
Some warnings that someone might want to look at:

==26750== Use of uninitialised value of size 4
==26750==    at 0x453D4B1: DOUBLE_to_OBJECT (arraytypes.inc:4470)
==26750==    by 0x46AB3F3: PyUFunc_GenericFunction (ufuncobject.c:1566)
==26750==    by 0x46ABE9F: ufunc_generic_call (ufuncobject.c:2653)

==26750== Conditional jump or move depends on uninitialised value(s)
==26750==    at 0x4556055: PyArray_Newshape (multiarraymodule.c:524)
==26750==    by 0x45568F4: PyArray_Reshape (multiarraymodule.c:369)
==26750==    by 0x4556931: array_shape_set (arrayobject.c:4642)

==26750== Address 0x41D2010 is 392 bytes inside a block of size 1,648 free'd
==26750==    at 0x4004F6B: free (vg_replace_malloc.c:235)
==26750==    by 0x46A53C3: ufuncloop_dealloc (ufuncobject.c:1280)
==26750==    by 0x46AAD60: PyUFunc_GenericFunction (ufuncobject.c:1656)
==26750==    by 0x46ABE9F: ufunc_generic_call (ufuncobject.c:2653)

==26750== Conditional jump or move depends on uninitialised value(s)
==26750==    at 0x454EE52: PyArray_NewFromDescr (arrayobject.c:4119)
==26750==    by 0x4550919: PyArray_GetField (arraymethods.c:265)
==26750==    by 0x456C05A: array_subscript (arrayobject.c:2010)
==26750==    by 0x456D606: array_subscript_nice (arrayobject.c:2250)

==26750== Conditional jump or move depends on uninitialised value(s)
==26750==    at 0x455ED1D: PyArray_MapIterReset (arrayobject.c:7788)
==26750==    by 0x456D087: array_ass_sub (arrayobject.c:1812)

A possible memory leak:

==26750== 6,051 (1,120 direct, 4,931 indirect) bytes in 28 blocks are definitely lost in loss record 35 of 55
==26750==    at 0x400444E: malloc (vg_replace_malloc.c:149)
==26750==    by 0x45442D8: array_alloc (arrayobject.c:5332)
==26750==    by 0x454F19D: PyArray_NewFromDescr (arrayobject.c:4155)
==26750==    by 0x46A61E4: construct_loop (ufuncobject.c:1000)
==26750==    by 0x46AAD09: PyUFunc_GenericFunction (ufuncobject.c:1401)
==26750==    by 0x46ABE9F: ufunc_generic_call (ufuncobject.c:2653)
==26750==    by 0x454243B: PyArray_GenericBinaryFunction (arrayobject.c:2593)
==26750==    by 0x456DA2C: PyArray_Round (multiarraymodule.c:291)

The following error is generated when the test segfaults:

==26750== Process terminating with default action of signal 11 (SIGSEGV)
==26750==  Access not within mapped region at address 0x10FFFF
==26750==    at 0x453D4B1: DOUBLE_to_OBJECT (arraytypes.inc:4470)
==26750==    by 0x46AB3F3: PyUFunc_GenericFunction (ufuncobject.c:1566)
==26750==    by 0x46ABE9F: ufunc_generic_call (ufuncobject.c:2653)

Regards,

Albert

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-
> discussion-admin at lists.sourceforge.net] On Behalf Of Albert Strasheim
> Sent: 15 April 2006 01:55
> To: 'numpy-discussion'
> Subject: RE: [Numpy-discussion] Vectorize bug
>
> Hello
>
> I don't have SciPy installed. Is there any way of doing a debug build of
> the C code so that I can investigate this problem?
>
> You say that you cannot reproduce this problem. Are you trying to
> reproduce it on Linux or on Windows under IPython? I have also been
> unable to reproduce the crash on Linux, but as we saw earlier, this crash
> also cropped up on Solaris, without having to run the tests N times.
> Regards,
>
> Albert
>
> > -----Original Message-----
> > From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-
> > discussion-admin at lists.sourceforge.net] On Behalf Of Travis Oliphant
> > Sent: 15 April 2006 01:42
> > To: numpy-discussion
> > Subject: Re: [Numpy-discussion] Vectorize bug
> >
> > Albert Strasheim wrote:
> >
> > >Hello Travis
> > >
> > >I'm still getting the same crash when running via IPython, which is the
> > only
> > >way I've been able to reproduce the crash on Windows.
> > >
> > >Just to confirm:
> > >
> > >In [1]: import numpy
> > >
> > >In [2]: numpy.__version__
> > >Out[2]: '0.9.7.2356'
> > >
> > >The crash now happens in check_large, which is the new name of the test
> > >method in question.
> > >
> > >
> > Do you have SciPy installed?
> >
> > Make sure you are not importing an old version of SciPy.
> >
> > I cannot reproduce this problem.
> >
> > -Travis

From oliphant.travis at ieee.org Fri Apr 14 18:20:03 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Fri Apr 14 18:20:03 2006
Subject: [Numpy-discussion] Vectorize bug
In-Reply-To: <00fc01c66022$1b51fb70$0502010a@dsp.sun.ac.za>
References: <00fc01c66022$1b51fb70$0502010a@dsp.sun.ac.za>
Message-ID: <44404A18.1070202@ieee.org>

Albert Strasheim wrote:
> Hello all
>
> I think Valgrind might be very useful in tracking down this bug.
>
> http://valgrind.org/
>
It's a good suggestion. I've run the code through Valgrind several times before releasing the first version of NumPy. I tracked down many memory leaks that way already.

There may be errors that have crept in, but Valgrind does not help with reference counting errors, which this may be.

But, I need to be able to reproduce the problem to have any hope of finding it.

-Travis

From oliphant.travis at ieee.org Fri Apr 14 18:21:09 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Fri Apr 14 18:21:09 2006
Subject: [Numpy-discussion] Vectorize bug
In-Reply-To: <00fa01c6601e$c7707840$0502010a@dsp.sun.ac.za>
References: <00fa01c6601e$c7707840$0502010a@dsp.sun.ac.za>
Message-ID: <44404A5B.5010802@ieee.org>

Albert Strasheim wrote:
> Hello
>
> I don't have SciPy installed. Is there any way of doing a debug build of the
> C code so that I can investigate this problem?
>
> You say that you cannot reproduce this problem. Are you trying to reproduce
> it on Linux or on Windows under IPython? I have also been unable to
> reproduce the crash on Linux, but as we saw earlier, this crash also cropped
> up on Solaris, without having to run the tests N times.
>
>
I've tried under Linux with IPython and cannot reproduce the error. I've run numpy.test() 100 times with no error.

I'm not sure if the Solaris crash is fixed or not yet after the recent changes to SVN. There may be more than one bug here...
-Travis

From oliphant.travis at ieee.org Fri Apr 14 18:47:01 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Fri Apr 14 18:47:01 2006
Subject: [Numpy-discussion] Vectorize bug
In-Reply-To: <00fc01c66022$1b51fb70$0502010a@dsp.sun.ac.za>
References: <00fc01c66022$1b51fb70$0502010a@dsp.sun.ac.za>
Message-ID: <44405068.203@ieee.org>

Albert Strasheim wrote:
> Hello all
>
> I think Valgrind might be very useful in tracking down this bug.
>
> http://valgrind.org/
>
> Example usage:
>
> ~/bin/valgrind \
>     -v --error-limit=no --leak-check=full \
>     python -c 'import numpy; numpy.test()'
>
> Valgrind emits many warnings for things going on inside Python on my Fedora
> Core 4 system, but there are also a lot of interesting things going on in the
> NumPy code.
>
> Some warnings that someone might want to look at:
>
> ==26750== Use of uninitialised value of size 4
> ==26750==    at 0x453D4B1: DOUBLE_to_OBJECT (arraytypes.inc:4470)
> ==26750==    by 0x46AB3F3: PyUFunc_GenericFunction (ufuncobject.c:1566)
> ==26750==    by 0x46ABE9F: ufunc_generic_call (ufuncobject.c:2653)
>
I think this may be the culprit. The buffer was not being initialized to NULL, and so DECREF was being called on whatever was there. This can produce strange results indeed, depending on the environment.

I've initialized the buffer now for loops involving OBJECTs (this same error has happened a couple of times, as it's one of the big ones for object arrays). I thought I fixed all places where it might occur, but apparently not...

Perhaps you could try the code again.

From oliphant.travis at ieee.org Fri Apr 14 18:49:03 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Fri Apr 14 18:49:03 2006
Subject: [Numpy-discussion] Vectorize bug
In-Reply-To: <00fc01c66022$1b51fb70$0502010a@dsp.sun.ac.za>
References: <00fc01c66022$1b51fb70$0502010a@dsp.sun.ac.za>
Message-ID: <444050DA.6050809@ieee.org>

Albert Strasheim wrote:
> Hello all
>
> I think Valgrind might be very useful in tracking down this bug.
>
> http://valgrind.org/
>
> Example usage:
>
> ~/bin/valgrind \
>     -v --error-limit=no --leak-check=full \
>     python -c 'import numpy; numpy.test()'
>
Here's the command that I run to test a Python script provided at the command line:

valgrind --tool=memcheck --leak-check=yes --error-limit=no -v --log-file=testmem --suppressions=valgrind-python.supp --show-reachable=yes --num-callers=10 python $1

The valgrind-python.supp file will suppress the complaints valgrind emits for Python.

-Travis

From robert.kern at gmail.com Fri Apr 14 22:21:00 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Fri Apr 14 22:21:00 2006
Subject: [Numpy-discussion] Re: Summer of Code 2006
In-Reply-To: <00fb01c6601f$26e19b10$0502010a@dsp.sun.ac.za>
References: <00fb01c6601f$26e19b10$0502010a@dsp.sun.ac.za>
Message-ID:

Albert Strasheim wrote:
> Hello all
>
> The Google Summer of Code site for 2006 is up:
>
> http://code.google.com/soc/
>
> Maybe the NumPy team can propose a few projects to be funded by this
> program. Personally, I'd be interested in working on the build system,
> especially on Windows, and/or extending the test suite.

What work do you think needs to be done on the build system? (I'm not contending the point; I'm just curious.)

--
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco

From fullung at gmail.com Sat Apr 15 02:26:04 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Sat Apr 15 02:26:04 2006
Subject: [Numpy-discussion] Re: Summer of Code 2006
In-Reply-To:
Message-ID: <013501c6606e$86888200$0502010a@dsp.sun.ac.za>

Hello all

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-
> discussion-admin at lists.sourceforge.net] On Behalf Of Robert Kern
> Sent: 15 April 2006 07:20
> To: numpy-discussion at lists.sourceforge.net
> Subject: [Numpy-discussion] Re: Summer of Code 2006
>
> Albert Strasheim wrote:
> > Hello all
> >
> > The Google Summer of Code site for 2006 is up:
> >
> > http://code.google.com/soc/
> >
> > Maybe the NumPy team can propose a few projects to be funded by this
> > program. Personally, I'd be interested in working on the build system,
> > especially on Windows, and/or extending the test suite.
>
> What work do you think needs to be done on the build system? (I'm not
> contending the point; I'm just curious.)

Let me start by saying that the build system works fine for what I think is the default case, i.e. building NumPy on Linux with preinstalled LAPACK and BLAS. However, as soon as you vary any of those parameters, things get interesting.

I've spent the past couple of days trying to build NumPy on Windows with ATLAS and CLAPACK with MinGW and Visual Studio .NET 2003 and VS 8. I don't know if it's just me, but this seems to be very hard. This could probably be partly attributed to the build systems of these libraries and to the lack of documentation, but I've also run into problems with NumPy build scripts.

For example, the inclusion of the gcc library in the list of libraries when building Fortran code with MinGW causes the build to break. Also, building FLAPACK from source causes the build to fail (too many open files).

While these errors on their own aren't particularly serious, I think it would be helpful to set up an automated system to check that builds of the various configurations NumPy supports can actually be done. There are probably a few million ways to build NumPy, but it would be nice if we could make sure that the N most common configurations always work, and provide documentation for "trying this at home."

I also think it would be useful to set up a system that performs regular builds of the latest revision from the SVN repository. I think anyone attempting this is going to run into a few issues with the build scripts, especially when trying to build on multiple platforms.

Things I would like to get right, which I think are much harder than they need to be (feel free to disagree):

- Windows builds in general
- Visual Studio .NET 2003 builds
- Visual C++ Toolkit 2003 builds
- Visual Studio 2005 builds
- Builds with ATLAS and CLAPACK

The reason I'm interested in the Microsoft compilers is that they have many features to help us make sure that the code is correct, both at compile time and at run time.

Any comments? Anybody building on Windows that finds the process to be completely painless?

Regards,

Albert

From fullung at gmail.com Sat Apr 15 02:42:06 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Sat Apr 15 02:42:06 2006
Subject: [Numpy-discussion] Vectorize bug
In-Reply-To: <44404A5B.5010802@ieee.org>
Message-ID: <013601c66070$d2377010$0502010a@dsp.sun.ac.za>

Hello all

The crash I was seeing seems to be fixed in revision 2358.
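Just to confirm, in the same IPython setup that crashed before:

In [1]: import numpy

In [2]: numpy.__version__
Out[2]: '0.9.7.2358'

In [3]: numpy.test()  # repeated runs no longer segfault here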
Regards,

Albert

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-
> discussion-admin at lists.sourceforge.net] On Behalf Of Travis Oliphant
> Sent: 15 April 2006 03:20
> To: numpy-discussion
> Subject: Re: [Numpy-discussion] Vectorize bug
>
> Albert Strasheim wrote:
> > Hello
> >
> > I don't have SciPy installed. Is there any way of doing a debug build of
> > the C code so that I can investigate this problem?
> >
> > You say that you cannot reproduce this problem. Are you trying to
> > reproduce it on Linux or on Windows under IPython? I have also been
> > unable to reproduce the crash on Linux, but as we saw earlier, this
> > crash also cropped up on Solaris, without having to run the tests N times.
> >
> I've tried under Linux with IPython and cannot reproduce the error.
> I've run numpy.test() 100 times with no error.
>
> I'm not sure if the Solaris crash is fixed or not yet after the recent
> changes to SVN. There may be more than one bug here...
>
> -Travis

From fullung at gmail.com Sat Apr 15 04:59:03 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Sat Apr 15 04:59:03 2006
Subject: [Numpy-discussion] bool_ leaks memory
Message-ID: <014701c66083$e3ca5c30$0502010a@dsp.sun.ac.za>

Hello all

According to Valgrind 3.1.1, the following code leaks memory:

from numpy import bool_
bool_(1)

Valgrind says:

==32531== 82 (80 direct, 2 indirect) bytes in 2 blocks are definitely lost in loss record 7 of 25
==32531==    at 0x400444E: malloc (vg_replace_malloc.c:149)
==32531==    by 0x45442E8: array_alloc (arrayobject.c:5330)
==32531==    by 0x454F18D: PyArray_NewFromDescr (arrayobject.c:4153)
==32531==    by 0x4551844: Array_FromScalar (arrayobject.c:5768)
==32531==    by 0x45602B7: PyArray_FromAny (arrayobject.c:6630)
==32531==    by 0x4570065: bool_arrtype_new (scalartypes.inc:2855)
==32531==    by 0x2FBF6E: (within /usr/lib/libpython2.4.so.1.0)
==32531==    by 0x2C53B3: PyObject_Call (in /usr/lib/libpython2.4.so.1.0)

The second leak that Valgrind reports is from this code in ma.py:

MaskType = bool_
nomask = MaskType(0)

Tested with NumPy 0.9.7.2358. Trac ticket at http://projects.scipy.org/scipy/numpy/ticket/60

Regards,

Albert

From faltet at xot.carabos.com Sat Apr 15 05:06:01 2006
From: faltet at xot.carabos.com (faltet at xot.carabos.com)
Date: Sat Apr 15 05:06:01 2006
Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy
In-Reply-To: <44402A2A.9050300@ee.byu.edu>
References: <20060414213511.GA14355@xot.carabos.com> <44402A2A.9050300@ee.byu.edu>
Message-ID: <20060415120451.GA15123@xot.carabos.com>

On Fri, Apr 14, 2006 at 05:03:06PM -0600, Travis Oliphant wrote:
> What I've found in experiments like this in the past is that numarray is
> good at striding in one direction but much worse at striding in another
> direction for multi-dimensional arrays. Of course my experiments were
> not complete. That just seemed to be the case.
>
> The array-iterator construct handles almost all of these cases. The
> copy method is a good place to start since it uses that code.

I'm not sure this is directly related with striding.
Look at this:

In [5]: npcopy=timeit.Timer('a=a.copy()','import numpy as np; a=np.arange(1000000,dtype="Float64")[::10]')

In [6]: npcopy.repeat(3,10)
Out[6]: [0.061118125915527344, 0.061014175415039062, 0.063937187194824219]

In [7]: npcopy2=timeit.Timer('b=a.copy()','import numpy as np; a=np.arange(1000000,dtype="Float64")[::10]')

In [8]: npcopy2.repeat(3,10)
Out[8]: [0.29984092712402344, 0.29889702796936035, 0.29834103584289551]

You see? Assigning to a new variable makes the copy go 5x slower!

numarray is also affected by this, but not as much:

In [9]: nacopy=timeit.Timer('a=a.copy()','import numarray as np; a=np.arange(1000000,type="Float64")[::10]')

In [10]: nacopy.repeat(3,10)
Out[10]: [0.039573907852172852, 0.037765979766845703, 0.038245916366577148]

In [11]: nacopy2=timeit.Timer('b=a.copy()','import numarray as np; a=np.arange(1000000,type="Float64")[::10]')

In [12]: nacopy2.repeat(3,10)
Out[12]: [0.073218107223510742, 0.07414698600769043, 0.072872161865234375]

i.e. just a 2x slowdown.

I don't understand this effect: in both cases we are doing a plain copy, no? I'm missing something, but not sure what it is.

Regards,

--
Francesc

From fullung at gmail.com Sat Apr 15 06:38:02 2006
From: fullung at gmail.com (Albert Strasheim)
Date: Sat Apr 15 06:38:02 2006
Subject: [Numpy-discussion] Vectorize bug
In-Reply-To: <444050DA.6050809@ieee.org>
Message-ID: <014e01c66091$b6b6b730$0502010a@dsp.sun.ac.za>

Hello all

I did some more Valgrinding and reduced all the warnings still produced when running NumPy revision 0.9.7.2358 to a few lines of code. The relevant Trac tickets:

http://projects.scipy.org/scipy/numpy/ticket/60
http://projects.scipy.org/scipy/numpy/ticket/61
http://projects.scipy.org/scipy/numpy/ticket/62
http://projects.scipy.org/scipy/numpy/ticket/64
http://projects.scipy.org/scipy/numpy/ticket/65

If anybody else wants to play with Valgrind, you can find the Valgrind suppressions for Python 2.4 here:

http://svn.python.org/projects/python/branches/release24-maint/Misc/valgrind-python.supp

See also

http://svn.python.org/projects/python/branches/release24-maint/Misc/README.valgrind

Regards,

Albert

> -----Original Message-----
> From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy-
> discussion-admin at lists.sourceforge.net] On Behalf Of Travis Oliphant
> Sent: 15 April 2006 03:48
> To: numpy-discussion
> Subject: Re: [Numpy-discussion] Vectorize bug
>
> Albert Strasheim wrote:
> > Hello all
> >
> > I think Valgrind might be very useful in tracking down this bug.
> >
> > http://valgrind.org/
> >
> > Example usage:
> >
> > ~/bin/valgrind \
> >     -v --error-limit=no --leak-check=full \
> >     python -c 'import numpy; numpy.test()'
> >
> Here's the command that I run to test a Python script provided at the
> command line:
>
> valgrind --tool=memcheck --leak-check=yes --error-limit=no -v
> --log-file=testmem --suppressions=valgrind-python.supp
> --show-reachable=yes --num-callers=10 python $1
>
> The valgrind-python.supp file will suppress the complaints valgrind
> emits for Python.
>
> -Travis

From cjw at sympatico.ca Sat Apr 15 08:01:03 2006
From: cjw at sympatico.ca (Colin J. Williams)
Date: Sat Apr 15 08:01:03 2006
Subject: [Numpy-discussion] Summer of Code 2006
In-Reply-To: <00fb01c6601f$26e19b10$0502010a@dsp.sun.ac.za>
References: <00fb01c6601f$26e19b10$0502010a@dsp.sun.ac.za>
Message-ID: <44410A87.70205@sympatico.ca>

Albert Strasheim wrote:

>Hello all
>
>The Google Summer of Code site for 2006 is up:
>
>http://code.google.com/soc/
>
>Maybe the NumPy team can propose a few projects to be funded by this
>program. Personally, I'd be interested in working on the build system,
>especially on Windows, and/or extending the test suite.
>
>Regards,
>
>Albert
>
I believe that the Python Software Foundation (http://www.python.org/psf/grants/) offers funding from time to time.

Colin W.

From Saqib.Sohail at colorado.edu Sat Apr 15 08:51:02 2006
From: Saqib.Sohail at colorado.edu (Saqib bin Sohail)
Date: Sat Apr 15 08:51:02 2006
Subject: [Numpy-discussion] Code Question
Message-ID: <1145116214.444116365d326@webmail.colorado.edu>

Hi guys

I have never used python, but I wanted to compute the FFT of audio files. I came upon a page which had python code, so I installed NumPy, but after beating the bush for a few days, I have finally come in here to ask. After taking the FFT I want to output it to a file and then use gnuplot to plot it.

When I installed NumPy and ran the tests, it seemed that all passed without a problem. My input is a .dat file converted from a .wav file by sox.

Here is the code, which obviously doesn't work because it seems that changes have occurred since this code was written. (Not my code, just from some website where a guy had written on how to do the things which I require.)

import Numeric
import FFT
out_array=Numeric.array(out)
out_fft=FFT.fft(out)

offt=open('outfile_fft.dat','w')
for x in range(len(out_fft)/2):
    offt.write('%f %f\n'%(1.0*x/wtime,abs(out_fft[x].real)))

I do the following at the python prompt:

import numarray
myFile = open('test.dat', 'r')
my_array = numarray.array(myFile)

/* at this stage I wanted to see if it was correctly read */

print my_array
[1632837691 1701605485 1952535072 ..., 538976288 538976288 168632368]

It seems that these values do not correspond to the values in the file (but I guess the array is considering these as ints when in fact these are floats).

Anyway, the problem starts when I try to do the FFT, because I can't seem to find the module or how to invoke it. The second problem is writing to the file: that code obviously doesn't work, and in my search through various documentations I found arrayrange() but couldn't make it work. Call me stupid, but despite going through several examples, I haven't been able to make the for loop work in any case.

It would be very kind of someone if he could at least tell me what I am doing wrong and reply with a simple example so that I can modify my code, or at least be able to understand.

Thanks

--
Saqib bin Sohail
PhD ECE
University of Colorado at Boulder
Res: (303) 786 0636
http://ucsu.colorado.edu/~sohail/index.html

From ndarray at mac.com Sat Apr 15 09:10:07 2006
From: ndarray at mac.com (Sasha)
Date: Sat Apr 15 09:10:07 2006
Subject: [Numpy-discussion] Vectorize bug
In-Reply-To: <44404A18.1070202@ieee.org>
References: <00fc01c66022$1b51fb70$0502010a@dsp.sun.ac.za> <44404A18.1070202@ieee.org>
Message-ID:

On 4/14/06, Travis Oliphant wrote:
> ...
> There may be errors that have crept in, but Valgrind does not help
> with reference counting errors, which this may be.
> ...

Valgrind is a little bit more helpful if python is compiled using the --without-pymalloc config option.
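That amounts to rebuilding the interpreter roughly like this (a sketch; the source directory is illustrative):

cd Python-2.4.2
./configure --without-pymalloc
make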
In addition to valgrind, memory problems can be exposed by using the --with-pydebug option.

From faltet at xot.carabos.com Sat Apr 15 10:29:01 2006
From: faltet at xot.carabos.com (faltet at xot.carabos.com)
Date: Sat Apr 15 10:29:01 2006
Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy
In-Reply-To: <44410972.4090502@cox.net>
References: <20060414213511.GA14355@xot.carabos.com> <44402A2A.9050300@ee.byu.edu> <20060415120451.GA15123@xot.carabos.com> <44410972.4090502@cox.net>
Message-ID: <20060415172755.GA15274@xot.carabos.com>

On Sat, Apr 15, 2006 at 07:55:46AM -0700, Tim Hochberg wrote:
> >I'm not sure this is directly related with striding. Look at this:
> >
> >In [5]: npcopy=timeit.Timer('a=a.copy()','import numpy as np;
> >a=np.arange(1000000,dtype="Float64")[::10]')
> >
> >In [6]: npcopy.repeat(3,10)
> >Out[6]: [0.061118125915527344, 0.061014175415039062,
> >0.063937187194824219]
> >
> >In [7]: npcopy2=timeit.Timer('b=a.copy()','import numpy as np;
> >a=np.arange(1000000,dtype="Float64")[::10]')
> >
> >In [8]: npcopy2.repeat(3,10)
> >Out[8]: [0.29984092712402344, 0.29889702796936035, 0.29834103584289551]
> >
> >You see? Assigning to a new variable makes the copy go 5x slower!
> >
> You are being tricked! In the first case, the array is discontiguous for
> the first copy, but for every subsequent copy it is contiguous, since you
> replace 'a'. In the second case, the array is discontiguous for every copy.

Oh, yes! Thanks for noting this! So in order to compare apples with apples, the difference between numarray and numpy in the case of strided copies is:

In [87]: npcopy_stride=timeit.Timer('b=a.copy()','import numpy as np; a=np.arange(1000000,dtype="Float64")[::10]')

In [88]: npcopy_stride.repeat(3,10)
Out[88]: [0.30013298988342285, 0.29976487159729004, 0.29945492744445801]

In [89]: nacopy_stride=timeit.Timer('b=a.copy()','import numarray as np; a=np.arange(1000000,type="Float64")[::10]')

In [90]: nacopy_stride.repeat(3,10)
Out[90]: [0.07545709609985351, 0.0731458663940429, 0.073173046112060547]

so numpy is approximately 4x slower than numarray.

Cheers,

Francesc

From oliphant.travis at ieee.org Sat Apr 15 10:51:18 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sat Apr 15 10:51:18 2006
Subject: [Numpy-discussion] Re: Summer of Code 2006
In-Reply-To: <013501c6606e$86888200$0502010a@dsp.sun.ac.za>
References: <013501c6606e$86888200$0502010a@dsp.sun.ac.za>
Message-ID: <44413251.3080505@ieee.org>

Albert Strasheim wrote:
> Hello all
>
> Let me start by saying that the build system works fine for what I think is
> the default case, i.e. building NumPy on Linux with preinstalled LAPACK and
> BLAS. However, as soon as you vary any of those parameters, things get
> interesting.
>
It also builds fine with mingw and pre-installed ATLAS (I do it all the time). It also builds fine with no installed ATLAS (or LAPACK or BLAS) with mingw32 and Linux. It also builds on Mac OS X. It also builds on Solaris, AIX, and Cygwin. Work also went in recently to make sure it builds with a Visual Studio compiler (the one Tim Hochberg was using...).

So, I think it's a bit unfair to say that varying from only a Linux build causes "things to get interesting". Definitely there are configurations that can require a specialized site.cfg file, and it can be difficult if you build with a compiler that was not used to build Python itself. But, it's not a one-platform build system. I just want that to be clear.

Documentation on the site.cfg file could be more prominent, of course, and this was aided recently by the addition of an example file to the source tree.
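For illustration, a minimal site.cfg fragment for a non-standard ATLAS install looks something like this (the section and key names follow the example file; the paths are machine-specific):

[atlas]
library_dirs = /usr/local/lib/atlas
atlas_libs = lapack, f77blas, cblas, atlas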
The expert on the build system is Pearu Peterson. He has been very responsive to suggested fixes and problems that people have experienced. Robert Kern, David Cooke, and I also have some familiarity with the build system, enough to assist from time to time.

All help is greatly appreciated, however, as I know you can come up with configurations that do cause things to "get interesting." The more configurations that we get tested and working, the better off we will be. The more people who understand the build system well enough to help fix it, the better off we'll be as well. So, I definitely don't want to discourage any ideas you have on improving the build system.

Thanks for being willing to dive in and help.

-Travis

> I've spent the past couple of days trying to build NumPy on Windows with
> ATLAS and CLAPACK with MinGW and Visual Studio .NET 2003 and VS 8. I don't
> know if it's just me, but this seems to be very hard. This could probably be
> partly attributed to the build systems of these libraries and to the lack of
> documentation, but I've also run into problems with NumPy build scripts.
>
> For example, the inclusion of the gcc library in the list of libraries when
> building Fortran code with MinGW causes the build to break. Also, building
> FLAPACK from source causes the build to fail (too many open files).
>
> While these errors on their own aren't particularly serious, I think it
> would be helpful to set up an automated system to check that builds of the
> various configurations NumPy supports can actually be done. There are
> probably a few million ways to build NumPy, but it would be nice if we could
> make sure that the N most common configurations always work, and provide
> documentation for "trying this at home."
>
> I also think it would be useful to set up a system that performs regular
> builds of the latest revision from the SVN repository. I think anyone
> attempting this is going to run into a few issues with the build scripts,
> especially when trying to build on multiple platforms.
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From oliphant.travis at ieee.org Sat Apr 15 10:55:02 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sat Apr 15 10:55:02 2006 Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy In-Reply-To: <20060415172755.GA15274@xot.carabos.com> References: <20060414213511.GA14355@xot.carabos.com> <44402A2A.9050300@ee.byu.edu> <20060415120451.GA15123@xot.carabos.com> <44410972.4090502@cox.net> <20060415172755.GA15274@xot.carabos.com> Message-ID: <4441333D.50906@ieee.org> faltet at xot.carabos.com wrote: > On Sat, Apr 15, 2006 at 07:55:46AM -0700, Tim Hochberg wrote: > >>> I'm not sure this is directly related with striding. Look at this: >>> >>> In [5]: npcopy=timeit.Timer('a=a.copy()','import numpy as np; >>> a=np.arange(1000000,dtype="Float64")[::10]') >>> >>> In [6]: npcopy.repeat(3,10) >>> Out[6]: [0.061118125915527344, 0.061014175415039062, >>> 0.063937187194824219] >>> >>> In [7]: npcopy2=timeit.Timer('b=a.copy()','import numpy as np; >>> a=np.arange(1000000,dtype="Float64")[::10]') >>> >>> In [8]: npcopy2.repeat(3,10) >>> Out[8]: [0.29984092712402344, 0.29889702796936035, 0.29834103584289551] >>> >>> You see? assigning to a new variable makes the copy go 5x times >>> slower! >>> >>> >> You are being tricked! In the first case, the array is discontiguous for >> the first copy but for every subsequenc copy is contiguous since you >> replace 'a'. In the second case, the array is discontiguous for every copy >> > > Oh, yes!. Thanks for noting this!. So in order to compare apples with > apples, the difference between numarray and numpy in case of strided > copies is: > > In [87]: npcopy_stride=timeit.Timer('b=a.copy()','import numpy as np; > a=np.arange(1000000,dtype="Float64")[::10]') > > In [88]: npcopy_stride.repeat(3,10) > Out[88]: [0.30013298988342285, 0.29976487159729004, 0.29945492744445801] > > In [89]: nacopy_stride=timeit.Timer('b=a.copy()','import numarray as np; > a=np.arange(1000000,type="Float64")[::10]') > > In [90]: nacopy_stride.repeat(3,10) > Out[90]: [0.07545709609985351, 0.0731458663940429, 0.073173046112060547] > > so numpy is aproximately 4x times slower than numarray. > > This also seems to vary from compiler to compiler. On my system it's not quite so different (about 1.5x slower). I'm wondering what the effect of an inlined memmove is. Essentially numarray has an inlined for-loop to copy bytes while NumPy calles memmove. I'll try that out and see... -Travis From ryanlists at gmail.com Sat Apr 15 10:58:17 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Sat Apr 15 10:58:17 2006 Subject: [Numpy-discussion] Re: Summer of Code 2006 In-Reply-To: <44413251.3080505@ieee.org> References: <013501c6606e$86888200$0502010a@dsp.sun.ac.za> <44413251.3080505@ieee.org> Message-ID: As I understand the summer of code, we can basically get a full time student (who gets paid $4500 for the summer) at no cost to us, as long as someone is willing to coach and define the project. (NumPy/SciPy would actually get $500 from Google). So, I think it would be great if we could define some projects and see what happens. (I am trying to graduate this summer, so maybe I should shut up if I can't help much). 
Ryan

On 4/15/06, Travis Oliphant wrote:
> Albert Strasheim wrote:
> > Hello all
> >
> > Let me start by saying that the build system works fine for what I think is
> > the default case, i.e. building NumPy on Linux with preinstalled LAPACK and
> > BLAS. However, as soon as you vary any of those parameters, things get
> > interesting.
> >
> It also builds fine with mingw and pre-installed ATLAS (I do it all the
> time). It also builds fine with no installed ATLAS (or LAPACK or BLAS)
> with mingw32 and Linux. It also builds on Mac OS X. It also builds on
> Solaris, AIX, and Cygwin. Work also went in recently to make sure it
> builds with a Visual Studio compiler (the one Tim Hochberg was using...).
>
> So, I think it's a bit unfair to say that varying from only a Linux
> build causes "things to get interesting". Definitely there are
> configurations that can require a specialized site.cfg file, and it can
> be difficult if you build with a compiler that was not used to build
> Python itself. But, it's not a one-platform build system. I just
> want that to be clear.
>
> Documentation on the site.cfg file could be more prominent, of course,
> and this was aided recently by the addition of an example file to the
> source tree.
>
> The expert on the build system is Pearu Peterson. He has been very
> responsive to suggested fixes and problems that people have
> experienced. Robert Kern, David Cooke, and I also have some
> familiarity with the build system, enough to assist from time to time.
>
> All help is greatly appreciated, however, as I know you can come up with
> configurations that do cause things to "get interesting." The more
> configurations that we get tested and working, the better off we will
> be. The more people who understand the build system well enough to
> help fix it, the better off we'll be as well. So, I definitely don't
> want to discourage any ideas you have on improving the build system.
>
> Thanks for being willing to dive in and help.
>
> -Travis
>
> > I've spent the past couple of days trying to build NumPy on Windows with
> > ATLAS and CLAPACK with MinGW and Visual Studio .NET 2003 and VS 8. I don't
> > know if it's just me, but this seems to be very hard. This could probably be
> > partly attributed to the build systems of these libraries and to the lack of
> > documentation, but I've also run into problems with NumPy build scripts.
> >
> > For example, the inclusion of the gcc library in the list of libraries when
> > building Fortran code with MinGW causes the build to break. Also, building
> > FLAPACK from source causes the build to fail (too many open files).
> >
> > While these errors on their own aren't particularly serious, I think it
> > would be helpful to set up an automated system to check that builds of the
> > various configurations NumPy supports can actually be done. There are
> > probably a few million ways to build NumPy, but it would be nice if we could
> > make sure that the N most common configurations always work, and provide
> > documentation for "trying this at home."
> >
> > I also think it would be useful to set up a system that performs regular
> > builds of the latest revision from the SVN repository. I think anyone
> > attempting this is going to run into a few issues with the build scripts,
> > especially when trying to build on multiple platforms.
> > Things I would like to get right, which I think are much harder than they
> > need to be (feel free to disagree):
> >
> > - Windows builds in general
> > - Visual Studio .NET 2003 builds
> > - Visual C++ Toolkit 2003 builds
> > - Visual Studio 2005 builds
> > - Builds with ATLAS and CLAPACK
> >
> > The reason I'm interested in the Microsoft compilers is that they have many
> > features to help us make sure that the code is correct, both at compile time
> > and at run time.
> >
> > Any comments? Anybody building on Windows that finds the process to be
> > completely painless?
> >
> > Regards,
> >
> > Albert

From robert.kern at gmail.com Sat Apr 15 11:31:01 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Sat Apr 15 11:31:01 2006
Subject: [Numpy-discussion] Re: Summer of Code 2006
In-Reply-To: <44410A87.70205@sympatico.ca>
References: <00fb01c6601f$26e19b10$0502010a@dsp.sun.ac.za> <44410A87.70205@sympatico.ca>
Message-ID:

Colin J. Williams wrote:
> I believe that the Python Software Foundation
> (http://www.python.org/psf/grants/) offers funding from time to time.

However, it likes to fund new projects, not continuing ones.

--
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco

From oliphant.travis at ieee.org Sat Apr 15 11:35:04 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sat Apr 15 11:35:04 2006
Subject: [Numpy-discussion] Vectorize bug
In-Reply-To: <014e01c66091$b6b6b730$0502010a@dsp.sun.ac.za>
References: <014e01c66091$b6b6b730$0502010a@dsp.sun.ac.za>
Message-ID: <44413C9B.3080507@ieee.org>

Albert Strasheim wrote:
> Hello all
>
> I did some more Valgrinding and reduced all the warnings still produced when
> running NumPy revision 0.9.7.2358 to a few lines of code. The relevant Trac
> tickets:
>
> http://projects.scipy.org/scipy/numpy/ticket/60
> http://projects.scipy.org/scipy/numpy/ticket/61
> http://projects.scipy.org/scipy/numpy/ticket/62
> http://projects.scipy.org/scipy/numpy/ticket/64
> http://projects.scipy.org/scipy/numpy/ticket/65
>
This is very useful. Thank you for isolating the code producing the warnings like this. It makes it much easier to debug.
-Travis

From robert.kern at gmail.com Sat Apr 15 12:00:06 2006
From: robert.kern at gmail.com (Robert Kern)
Date: Sat Apr 15 12:00:06 2006
Subject: [Numpy-discussion] Re: Code Question
In-Reply-To: <1145116214.444116365d326@webmail.colorado.edu>
References: <1145116214.444116365d326@webmail.colorado.edu>
Message-ID:

Saqib bin Sohail wrote:
> Hi guys
>
> I have never used python, but I wanted to compute the FFT of audio files. I
> came upon a page which had python code, so I installed NumPy, but after
> beating the bush for a few days, I have finally come in here to ask. After
> taking the FFT I want to output it to a file and then use gnuplot to plot it.
> When I installed NumPy and ran the tests, it seemed that all passed without a
> problem. My input is a .dat file converted from a .wav file by sox.
>
> Here is the code, which obviously doesn't work because it seems that changes
> have occurred since this code was written. (Not my code, just from some
> website where a guy had written on how to do the things which I require.)

Okay, first some history. Originally, the package was named Numeric; occasionally, it was referred to by its nickname NumPy. Some years ago, a group needed features that couldn't be done in the Numeric codebase, so they started a rewrite called numarray. For various reasons that I don't want to get into, another group needed features that couldn't be done in the numarray codebase, so a second rewrite happened, and this package is the one that is currently getting the most developer attention. It is called numpy.

Since you are a new user, I highly recommend that you use numpy instead of Numeric or numarray.

http://numeric.scipy.org/

> import Numeric
> import FFT
> out_array=Numeric.array(out)
> out_fft=FFT.fft(out)
>
> offt=open('outfile_fft.dat','w')
> for x in range(len(out_fft)/2):
>     offt.write('%f %f\n'%(1.0*x/wtime,abs(out_fft[x].real)))

Rewritten for numpy (but untested):

import numpy

# Assuming that the file contains 32-bit floats, and not 64-bit floats
data = numpy.fromfile('test.dat', dtype=numpy.float32)

out_fft = numpy.refft(data)
# Note: refft does the FFT on real data and thus throws away the negative
# frequencies since they are redundant. len(out_fft) != len(data)

# and now I'm confused because the code references variables that weren't
# created anywhere, so I'm going to output the power spectrum
n = len(out_fft)
freqs = numpy.arange(n, dtype=numpy.float32) / len(data)
power = out_fft.real*out_fft.real + out_fft.imag*out_fft.imag
outarray = numpy.column_stack((freqs, power))
assert outarray.shape == (n, 2)

offt = open('outfile_fft.dat', 'w')
try:
    for f, p in outarray:
        offt.write('%f %f\n' % (f, p))
finally:
    offt.close()

> I do the following at the python prompt:
>
> import numarray
> myFile = open('test.dat', 'r')
> my_array = numarray.array(myFile)
>
> /* at this stage I wanted to see if it was correctly read */
>
> print my_array
> [1632837691 1701605485 1952535072 ..., 538976288 538976288 168632368]
>
> It seems that these values do not correspond to the values in the file (but I
> guess the array is considering these as ints when in fact these are floats).

Indeed. There is no way for the array constructor to know the data type in the file unless you tell it. The default type is int.

--
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
From Saqib.Sohail at colorado.edu Sat Apr 15 13:42:02 2006 From: Saqib.Sohail at colorado.edu (Saqib bin Sohail) Date: Sat Apr 15 13:42:02 2006 Subject: [Numpy-discussion] Code Question In-Reply-To: <06041504462800.00752@rbastian> References: <1145116214.444116365d326@webmail.colorado.edu> <06041504462800.00752@rbastian> Message-ID: <1145133678.44415a6e5d8f7@webmail.colorado.edu> Thanks a lot for your detailed email, unfortunately both of the following imports don't work: import Gnuplot import fft as FFT from numarray import * I think I need the Gnuplot package, but what I can't understand is why fft is not being imported; do I need to install the NumPy package with special options to install fft? Quoting René Bastian : > Le Samedi 15 Avril 2006 17:50, Saqib bin Sohail a écrit : > > Hi guys > > > > I have never used python, but I wanted to compute FFT of audio files, I > > came upon a page which had python code, so I installed Numpy but after > > beating the bush for a few days, I have finally come in here to ask. After > > taking the FFT I want to output it to a file and the use gnuplot to plot > > it. > > With the module Gnuplot.py you can plot arrays > > import Gnuplot > > g =Gnuplot.Gnuplot() > g.plot(w) # w is an array > raw_input("Enter") > g.reset() > > I use numarray > > Some code : > ---------------- > > import fft as FFT > from numarray import * > > T = arrayrange(0.0, 2*pi, 1.0/1000) > a = sin(2*pi*440.0*T) > > r = FFT.fft(a) > print r > g.plot(r) > raw_input("Enter") > .... > r = FFT.inverse_real_fft(a) > r = FFT.real_fft(a) > r = FFT.hermite_fft(a) > > g.reset() > ---------------- > > > > > > When I instaled NumPy, and ran the tests, it seemed that all passed without > > a problem. My input is a .dat file converted from .wav file by sox. > > > > > > Here is the code which obviously doesn't work because it seems that changes > > have occured since this code was written. (not my code, just from some > > website where a guy had written on how to do things which i require) > > > > import Numeric > > import FFT > > out_array=Numeric.array(out) > > out_fft=FFT.fft(out) > > > > > > offt=open('outfile_fft.dat','w') > > for x in range(len(out_fft)/2): > > offt.write('%f %f\n'%(1.0*x/wtime,abs(out_fft[x].real))) > > > > > > I do the following at the python prompt > > > > import numarray > > myFile = open('test.dat', 'r') > > my_array = numarray.arra(myFile) > > Read the manual how to load a file of floats > I think there is a mistake > > > /* at this stage I wanted to see if it was correctly read */ > > > > print myArray > > [1632837691 1701605485 1952535072 ..., 538976288 538976288 168632368] > > > > it seems that these values do not correspond to the values in the file (but > > I guess the array is considering these as ints when infact these are > > floats) > > hmmm ... > > > > > anyway the problem starts when i try to do fft, because I can't seem to > > find module or how to invoke it, > > > > the second problem is writing to the file, that code obviously doesn't > > work, and in my search through various documentations, i found arrayrange() > > but couldn't make it to work, call me stupid, but despite going through > > several examples, i haven't been able to make the for loop worked in any > > case, > > > > > > it would be very kind of someone if he could at least tell me what i am > > doing wrong and reply a simple example so that I can modify my code or at > > least be able to understand . > > > > Thanks > > > > > > > > -- > > Saqib bin Sohail > > PhD ECE > > University of Colorado at Boulder > > Res: (303) 786 0636 > > http://ucsu.colorado.edu/~sohail/index.html > > > > > > ------------------------------------------------------- > > -- > René Bastian > http://pythoneon.musiques-rb.org "Musique en Python" > > -- Saqib bin Sohail PhD ECE University of Colorado at Boulder Res: (303) 786 0636 http://ucsu.colorado.edu/~sohail/index.html
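One likely reason the bare "import fft" fails: numarray ships its FFT routines as a subpackage rather than a top-level module, so on a stock install the import usually has to be qualified. A sketch of René's example rewritten that way (assuming numarray 1.x; untested here, and Gnuplot.py is a separate package that does need its own install):

import numarray.fft as FFT
from numarray import arange, sin, pi

t = arange(0.0, 1.0, 1.0/8000)   # one second sampled at 8 kHz
a = sin(2*pi*440.0*t)            # a 440 Hz tone
r = FFT.real_fft(a)              # spectrum of a real-valued signal
print r[:5]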
From robert.kern at gmail.com Sun Apr 16 02:37:05 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 16 02:37:05 2006 Subject: [Numpy-discussion] Trac Wikis closed for anonymous edits until further notice Message-ID: <44421025.9060804@gmail.com> We've been hit badly by spammers, so I can only presume our Trac sites are now on the traded spam lists. I am going to turn off anonymous edits for now. Ticket creation will probably still be left open for now. Many thanks to David Cooke for quickly removing the spam. I am looking into ways to allow people to register themselves with the Trac sites so they can edit the Wikis and submit tickets without needing to be added by a project admin. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From a.h.jaffe at gmail.com Sun Apr 16 12:36:01 2006 From: a.h.jaffe at gmail.com (Andrew Jaffe) Date: Sun Apr 16 12:36:01 2006 Subject: [Numpy-discussion] g95 detection not working Message-ID: <44429C55.2030500@gmail.com> Hi all, at least on my setup (OS X, Python 2.4.1, latest svn of numpy and scipy), config_fc fails to recognize my g95 compiler, which was directly downloaded from http://g95.sourceforge.net/ (and always has failed, I think). This is because the current version string doesn't conform to the regexp pattern; the version string is """ G95 (GCC 4.0.3 (g95!) Apr 12 2006) Copyright (C) 2002-2005 Free Software Foundation, Inc. G95 comes with NO WARRANTY, to the extent permitted by law. You may redistribute copies of G95 under the terms of the GNU General Public License. For more information about these matters, see the file named COPYING """ I've attached a patch below, although this identifies the version string with the date of the release, rather than the gcc version; I'm not sure which is the right one to use! Andrew --- numpy/distutils/fcompiler/g95.py (revision 2360) +++ numpy/distutils/fcompiler/g95.py (working copy) @@ -9,7 +9,7 @@ class G95FCompiler(FCompiler): compiler_type = 'g95' - version_pattern = r'G95.*\(experimental\) \(g95!\) (?P<version>.*)\).*' + version_pattern = r'G95.*\(g95!\) (?P<version>.*)\).*' executables = { 'version_cmd' : ["g95", "--version"],
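A quick interactive check shows why the stock pattern misses this banner while the patched one matches it (patterns as in the diff above; illustrative):

import re

banner = "G95 (GCC 4.0.3 (g95!) Apr 12 2006)"

old = r'G95.*\(experimental\) \(g95!\) (?P<version>.*)\).*'
new = r'G95.*\(g95!\) (?P<version>.*)\).*'

print re.match(old, banner)                    # None: the banner has no "(experimental)"
print re.match(new, banner).group('version')   # 'Apr 12 2006', the release date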
From robert.kern at gmail.com Sun Apr 16 12:50:05 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 16 12:50:05 2006 Subject: [Numpy-discussion] Re: g95 detection not working In-Reply-To: <44429C55.2030500@gmail.com> References: <44429C55.2030500@gmail.com> Message-ID: Andrew Jaffe wrote: > Hi all, > > at least on my setup (OS X, Python 2.4.1, latest svn of numpy and > scipy), config_fc fails to recognize my g95 compiler, which was directly > downloaded from http://g95.sourceforge.net/ (and always has failed, I > think). This is because the current version string doesn't conform to > the regexp pattern; the version string is > """ > G95 (GCC 4.0.3 (g95!) Apr 12 2006) > Copyright (C) 2002-2005 Free Software Foundation, Inc. > > G95 comes with NO WARRANTY, to the extent permitted by law. > You may redistribute copies of G95 > under the terms of the GNU General Public License. > For more information about these matters, see the file named COPYING > """ > > I've attached a patch below, although this identifies the version string > with the date of the release, rather than the gcc version; I'm not sure > which is the right one to use! We need the actual version number; in this case, "4.0.3". -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From a.h.jaffe at gmail.com Sun Apr 16 13:53:03 2006 From: a.h.jaffe at gmail.com (Andrew Jaffe) Date: Sun Apr 16 13:53:03 2006 Subject: [Numpy-discussion] Re: g95 detection not working In-Reply-To: References: <44429C55.2030500@gmail.com> Message-ID: <4442AE89.8080303@gmail.com> Robert Kern wrote: > Andrew Jaffe wrote: >> Hi all, >> >> at least on my setup (OS X, Python 2.4.1, latest svn of numpy and >> scipy), config_fc fails to recognize my g95 compiler, which was directly >> downloaded from http://g95.sourceforge.net/ (and always has failed, I >> think). This is because the current version string doesn't conform to >> the regexp pattern; the version string is >> """ >> G95 (GCC 4.0.3 (g95!) Apr 12 2006) >> Copyright (C) 2002-2005 Free Software Foundation, Inc. >> >> G95 comes with NO WARRANTY, to the extent permitted by law. >> You may redistribute copies of G95 >> under the terms of the GNU General Public License. >> For more information about these matters, see the file named COPYING >> """ >> >> I've attached a patch below, although this identifies the version string >> with the date of the release, rather than the gcc version; I'm not sure >> which is the right one to use! > > We need the actual version number; in this case, "4.0.3". Thanks -- OK, in that case the following regexp works for me: version_pattern = r'G95.*\(GCC (?P<version>.*) \(g95!\)' But are there different versions of the version string? Also on an unrelated f2py note: is the f2py mailing list being read by the f2py developers? I've posted a question (about the status of F9x "types") without reply... Yours, Andrew From robert.kern at gmail.com Sun Apr 16 13:56:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 16 13:56:02 2006 Subject: [Numpy-discussion] Re: g95 detection not working In-Reply-To: <44429C55.2030500@gmail.com> References: <44429C55.2030500@gmail.com> Message-ID: Andrew Jaffe wrote: > Hi all, > > at least on my setup (OS X, Python 2.4.1, latest svn of numpy and > scipy), config_fc fails to recognize my g95 compiler, which was directly > downloaded from http://g95.sourceforge.net/ (and always has failed, I > think). This is because the current version string doesn't conform to > the regexp pattern; the version string is > """ > G95 (GCC 4.0.3 (g95!) Apr 12 2006) > Copyright (C) 2002-2005 Free Software Foundation, Inc. > > G95 comes with NO WARRANTY, to the extent permitted by law. > You may redistribute copies of G95 > under the terms of the GNU General Public License. > For more information about these matters, see the file named COPYING > """ > > I've attached a patch below, although this identifies the version string > with the date of the release, rather than the gcc version; I'm not sure > which is the right one to use! Also, note that you can override the get_version() method entirely, if it's easier to grab the version using something other than a regex. You can look at hpux.py and ibm.py for examples. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
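A rough sketch of what such an override can look like, following the class skeleton in g95.py above (the banner-parsing details here are hypothetical, not taken from hpux.py or ibm.py):

import os
import re
from numpy.distutils.fcompiler import FCompiler

class G95FCompiler(FCompiler):
    compiler_type = 'g95'

    def get_version(self):
        # Hypothetical override: run the compiler and parse the banner
        # by hand instead of relying on a single version_pattern regex.
        output = os.popen('g95 --version').read()
        m = re.search(r'\(GCC (?P<version>[\d.]+)', output)
        return m and m.group('version')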
From Saqib.Sohail at colorado.edu Sun Apr 16 14:02:04 2006 From: Saqib.Sohail at colorado.edu (Saqib bin Sohail) Date: Sun Apr 16 14:02:04 2006 Subject: [Numpy-discussion] Code Question In-Reply-To: References: <1145116214.444116365d326@webmail.colorado.edu> <1145133185.4441588148cb3@webmail.colorado.edu> Message-ID: <1145221290.4442b0aa55961@webmail.colorado.edu> Thanks Guys for all your prompt responses. I have tried to use the provided solutions but I have had my share of issues, mixed with my lack of knowledge, to the point that I feel quite embarrassed to bother you guys. Issue 1 I am running FC 3 with native python-2.3 and then I installed python-2.4 in it. numarray-1.5.1 seems to have installed with success in python-2.3. I have tried to install numpy-0.9.6-1.i586.rpm but I don't have python-base and when I try to install python-base I get a long list of dependencies which I need. I haven't further pursued down that line, unfortunately I haven't been able to use numarray, I don't know how to use it because people have repeatedly told me to use numpy but I can't seem to get that installed. Issue 2 To input the file, Ryan suggested to use scipy, I don't want to go down that path, if only there is a simple way to input the file, (i can clean up the file and format it in the right way in perl, I can do that in a heartbeat) Issue 3 I don't want to use gnuplot functionality, or mathplot, if only I am able to write the file then again I can use perl to format it and use gnuplot then, So if there is the simplest of ways in which I can just i) read the file (formatting will be done in perl) ii) get the fft iii) write the file or files (and then use perl to format for gnuplot) I am sure all of you will say why not use the existing functionalities, but after 3 days I haven't gotten anywhere. All I need to do is get FFT of some sound files so that I can verify the result of FFT's and compare them with my FFT code in VxWorks. An Pierre, I started reading diveintopython.pdf but got nowhere when I tried two of its examples, the attached image shows that when I tried to run one of the examples on python-2.3 the output wasn't according to what the guide suggested. (no output to be precise) http://jobim.colorado.edu/~sohail/pythonExample.JPG Thanks again guys. Quoting Ryan Krauss : > I guess it depends on how much you want to learn and what you want to do. > > I was able to load your data using > data=scipy.io.read_array('monkey.dat') > > I had to comment out the first line to make it work. I couldn't make > the fromfile method of numpy work because the data is actually fixed > width. > > If you don't want to install scipy, you would need to learn enough > Python to read the file and clean it up a little by hand. > > It seems like the first column is time and the second is the signal > you want to fft. I was able to fft it with: > myfft=numpy.fft(data[:,1]) > (I don't have the latest version of numpy and don't seem to have the > refft function Robert mentioned).
> > t=data[:,0] > df=1/max(t) > df > maxf=8012 > fvect=arange(0,maxf+df,df) > > plot(fvect,abs(myfft)) > > I am plotting using matplotlib and the resulting figures are attached. > > If you really want to learn python for scientific and plotting > applications, I would highly recommend a few packages: > SciPy - some additional capabilities beyond Numpy (optimization, ode's , ...) > ipython - it is a really good interactive python shell > matplotlib - the best python 2d plotting package I am aware of > > Let me know if you have any additional questions. You can find out > about each package by googling it. They are all closely related to > Numpy and all have good mailing lists to help you. > > Ryan > > On 4/15/06, Saqib bin Sohail wrote: > > Do let me know if you get somewhere. > > > > Thanks > > > > > > Quoting Ryan Krauss : > > > > > email me the dat file and I could play with it a bit. If I can read > > > your input file, the rest should be easy. > > > > > > Ryan > > > > > > On 4/15/06, Saqib bin Sohail wrote: > > > > Hi guys > > > > > > > > I have never used python, but I wanted to compute FFT of audio files, I > > > came > > > > upon a page which had python code, so I installed Numpy but after > beating > > > the > > > > bush for a few days, I have finally come in here to ask. After taking > the > > > FFT I > > > > want to output it to a file and the use gnuplot to plot it. > > > > > > > > When I instaled NumPy, and ran the tests, it seemed that all passed > without > > > a > > > > problem. My input is a .dat file converted from .wav file by sox. > > > > > > > > Here is the code which obviously doesn't work because it seems that > changes > > > > have occured since this code was written. (not my code, just from some > > > website > > > > where a guy had written on how to do things which i require) > > > > > > > > import Numeric > > > > import FFT > > > > out_array=Numeric.array(out) > > > > out_fft=FFT.fft(out) > > > > > > > > offt=open('outfile_fft.dat','w') > > > > for x in range(len(out_fft)/2): > > > > offt.write('%f %f\n'%(1.0*x/wtime,abs(out_fft[x].real))) > > > > > > > > > > > > I do the following at the python prompt > > > > > > > > import numarray > > > > myFile = open('test.dat', 'r') > > > > my_array = numarray.arra(myFile) > > > > > > > > /* at this stage I wanted to see if it was correctly read */ > > > > > > > > print myArray > > > > [1632837691 1701605485 1952535072 ..., 538976288 538976288 > 168632368] > > > > > > > > it seems that these values do not correspond to the values in the file > (but > > > I > > > > guess the array is considering these as ints when infact these are > floats) > > > > > > > > anyway the problem starts when i try to do fft, because I can't seem to > > > find > > > > module or how to invoke it, > > > > > > > > the second problem is writing to the file, that code obviously doesn't > > > work, > > > > and in my search through various documentations, i found arrayrange() > but > > > > couldn't make it to work, call me stupid, but despite going through > several > > > > examples, i haven't been able to make the for loop worked in any case, > > > > > > > > it would be very kind of someone if he could at least tell me what i am > > > doing > > > > wrong and reply a simple example so that I can modify my code or at > least > > > be > > > > able to understand . 
> > > > > > > > Thanks > > > > > > > > > > > > > > > > -- > > > > Saqib bin Sohail > > > > PhD ECE > > > > University of Colorado at Boulder > > > > Res: (303) 786 0636 > > > > http://ucsu.colorado.edu/~sohail/index.html > > > > > > > > > > > > ------------------------------------------------------- > > > > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > > > > that extends applications into web and mobile media. Attend the live > > > webcast > > > > and join the prime developer group breaking into this new coding > territory! > > > > > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > > > > _______________________________________________ > > > > Numpy-discussion mailing list > > > > Numpy-discussion at lists.sourceforge.net > > > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > > > > > > > > > > > -- > > Saqib bin Sohail > > PhD ECE > > University of Colorado at Boulder > > Res: (303) 786 0636 > > http://ucsu.colorado.edu/~sohail/index.html > > > > > -- Saqib bin Sohail PhD ECE University of Colorado at Boulder Res: (303) 786 0636 http://ucsu.colorado.edu/~sohail/index.html From robert.kern at gmail.com Sun Apr 16 14:03:01 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 16 14:03:01 2006 Subject: [Numpy-discussion] Re: g95 detection not working In-Reply-To: <4442AE89.8080303@gmail.com> References: <44429C55.2030500@gmail.com> <4442AE89.8080303@gmail.com> Message-ID: Andrew Jaffe wrote: > Thanks -- OK, in that case the following regexp works for me: > > version_pattern = r'G95.*\(GCC (?P.*) \(g95!\)' > > But are there different versions of the version string? Possibly. I don't really know. > Also on an unrelated f2py note: is the f2py mailing list being read by > the f2py developers? I've posted a question (about the status of F9x > "types") without reply... Pearu is really the only f2py developer, and he has just flown from his home in Estonia to Austin to work with us at Enthought for a month. I presume he has been busy preparing for his journey. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Sun Apr 16 14:26:06 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 16 14:26:06 2006 Subject: [Numpy-discussion] Re: Code Question In-Reply-To: <1145221290.4442b0aa55961@webmail.colorado.edu> References: <1145116214.444116365d326@webmail.colorado.edu> <1145133185.4441588148cb3@webmail.colorado.edu> <1145221290.4442b0aa55961@webmail.colorado.edu> Message-ID: Saqib bin Sohail wrote: > An Pierre, I started reading diveintopython.pdf but got nowhere when I tried > two of its examples, the attached image shows that when I tried to run one of > the examples on python-2.3 and the output wasn't according to what the guide > suggested. (no output to be precise) > > http://jobim.colorado.edu/~sohail/pythonExample.JPG Note the indentation. Indentation is important in Python. > Quoting Ryan Krauss : >>(I don't have the latest version of numpy and don't seem to have the >>refft function Robert mentioned). My example was wrong. It should have used "numpy.dft.refft()", not "numpy.refft()". 
-- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Sun Apr 16 14:37:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 16 14:37:02 2006 Subject: [Numpy-discussion] Re: Code Question In-Reply-To: <1145221290.4442b0aa55961@webmail.colorado.edu> References: <1145116214.444116365d326@webmail.colorado.edu> <1145133185.4441588148cb3@webmail.colorado.edu> <1145221290.4442b0aa55961@webmail.colorado.edu> Message-ID: Saqib bin Sohail wrote: > I am sure all of you will say why not use the existing functionalities, but > after 3 days I haven't gotten anywhere. All I need to do is get FFT of some > sound files so that I can verify the result of FFT's and compare them with my > FFT code in VxWorks. Well, if you are just trying to get an independent verification of your VxWorks FFT code, and you are much more comfortable with Perl, then you might want to use one of the FFT libraries available for Perl like Math::FFT. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From a.h.jaffe at gmail.com Sun Apr 16 15:18:02 2006 From: a.h.jaffe at gmail.com (Andrew Jaffe) Date: Sun Apr 16 15:18:02 2006 Subject: [Numpy-discussion] where() has started returning a tuple!? Message-ID: I think the following behavior is (only recently) wrong: In [7]: numpy.__version__ Out[7]: '0.9.7.2360' In [8]: numpy.nonzero([True, False, True]) Out[8]: array([0, 2]) In [9]: numpy.where([True, False, True]) Out[9]: (array([0, 2]),) Note the tuple output to where(), which should be the same as nonzero. Andrew From perry at stsci.edu Sun Apr 16 20:18:02 2006 From: perry at stsci.edu (Perry Greenfield) Date: Sun Apr 16 20:18:02 2006 Subject: [Numpy-discussion] where() has started returning a tuple!? In-Reply-To: Message-ID: see: http://sourceforge.net/mailarchive/forum.php?thread_id=10165581&forum_id=489 0 > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net > [mailto:numpy-discussion-admin at lists.sourceforge.net]On Behalf Of Andrew > Jaffe > Sent: Sunday, April 16, 2006 6:17 PM > To: numpy-discussion at lists.sourceforge.net > Subject: [Numpy-discussion] where() has started returning a tuple!? > > > I think the following behavior is (only recently) wrong: > > In [7]: numpy.__version__ > Out[7]: '0.9.7.2360' > > In [8]: numpy.nonzero([True, False, True]) > Out[8]: array([0, 2]) > > In [9]: numpy.where([True, False, True]) > Out[9]: (array([0, 2]),) > > Note the tuple output to where(), which should be the same as nonzero. > > Andrew > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking > scripting language > that extends applications into web and mobile media. Attend the > live webcast > and join the prime developer group breaking into this new coding > territory! 
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From a.h.jaffe at gmail.com Mon Apr 17 00:53:04 2006 From: a.h.jaffe at gmail.com (Andrew Jaffe) Date: Mon Apr 17 00:53:04 2006 Subject: [Numpy-discussion] Re: where() has started returning a tuple!? In-Reply-To: References: Message-ID: Aha, missed that thread (and the docstring -- my bad). And actually I misunderstood the effect of the change, anyway: a[where(a>0)] is still fine, it's just other activities like iterating over where(a>0) that is no longer possible in the same way. Thanks for the pointer! Andrew Perry Greenfield wrote: > see: > > http://sourceforge.net/mailarchive/forum.php?thread_id=10165581&forum_id=489 > 0 > >> -----Original Message----- >> From: numpy-discussion-admin at lists.sourceforge.net >> [mailto:numpy-discussion-admin at lists.sourceforge.net]On Behalf Of Andrew >> Jaffe >> Sent: Sunday, April 16, 2006 6:17 PM >> To: numpy-discussion at lists.sourceforge.net >> Subject: [Numpy-discussion] where() has started returning a tuple!? >> >> >> I think the following behavior is (only recently) wrong: >> >> In [7]: numpy.__version__ >> Out[7]: '0.9.7.2360' >> >> In [8]: numpy.nonzero([True, False, True]) >> Out[8]: array([0, 2]) >> >> In [9]: numpy.where([True, False, True]) >> Out[9]: (array([0, 2]),) >> >> Note the tuple output to where(), which should be the same as nonzero. >> >> Andrew >> From ryanlists at gmail.com Mon Apr 17 05:57:03 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Mon Apr 17 05:57:03 2006 Subject: [Numpy-discussion] Re: Code Question In-Reply-To: References: <1145116214.444116365d326@webmail.colorado.edu> <1145133185.4441588148cb3@webmail.colorado.edu> <1145221290.4442b0aa55961@webmail.colorado.edu> Message-ID: Alright Saqib, Robert is right that you should try fft in perl if you don't want to learn Python. But as I understand it, you want to read in this file, fft it, and write the fft to a file using only numarray. Attached is a script that does that. Most of the script is just low-level file io to avoid having to install scipy to read and write the arrays. Hope this helps, Ryan On 4/16/06, Robert Kern wrote: > Saqib bin Sohail wrote: > > > I am sure all of you will say why not use the existing functionalities, but > > after 3 days I haven't gotten anywhere. All I need to do is get FFT of some > > sound files so that I can verify the result of FFT's and compare them with my > > FFT code in VxWorks. > > Well, if you are just trying to get an independent verification of your VxWorks > FFT code, and you are much more comfortable with Perl, then you might want to > use one of the FFT libraries available for Perl like Math::FFT. > > -- > Robert Kern > robert.kern at gmail.com > > "I have come to believe that the whole world is an enigma, a harmless enigma > that is made terrible by our own mad attempt to interpret it as though it had > an underlying truth." > -- Umberto Eco > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory! 
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: read_fft_write_numarray.py Type: text/x-python Size: 872 bytes Desc: not available URL: From chanley at stsci.edu Mon Apr 17 06:24:06 2006 From: chanley at stsci.edu (Christopher Hanley) Date: Mon Apr 17 06:24:06 2006 Subject: [Numpy-discussion] Vectorize bug In-Reply-To: <44404A5B.5010802@ieee.org> References: <00fa01c6601e$c7707840$0502010a@dsp.sun.ac.za> <44404A5B.5010802@ieee.org> Message-ID: <4443969D.4090604@stsci.edu> Travis Oliphant wrote: > I'm not sure if the Solaris crash is fixed or not yet after the recent > changes to SVN. There may be more than one bug here... The numpy.test() unit tests no longer cause segfaults on Solaris. All of my daily numpy regression tests are now passing for Solaris. Thank you for your time and help, Chris From michael.sorich at gmail.com Mon Apr 17 17:13:09 2006 From: michael.sorich at gmail.com (Michael Sorich) Date: Mon Apr 17 17:13:09 2006 Subject: [Numpy-discussion] using NaN, INT_MIN etc in ndarray instead of a masked array Message-ID: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> On 4/8/06, Sasha wrote: > > ... > See above. For ndarray mask is always False unless an add-on module is > loaded that redefines arithmetic to recognize special bit-patterns > such as NaN or INT_MIN. > > Is it possible to implement masked values using these special bit patterns in the ndarray instead of using a separate MA class? If so has there been any thought as to whether this may be the better option. I think it would be preferable if the ability to handle masked data was available in the standard array class (ndarray), as this would increase the likelihood that functions built for numeric arrays will handle masked values well. It seems that ndarray already has decent support for nans (isnan() returns the equivalent of a boolean mask array), indicating that such an approach may be acceptable. How difficult is it to generalise the concept to other data types (int, string, bool)? Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Apr 17 19:53:01 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon Apr 17 19:53:01 2006 Subject: [Numpy-discussion] Re: using NaN, INT_MIN etc in ndarray instead of a masked array In-Reply-To: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> References: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> Message-ID: Michael Sorich wrote: > On 4/8/06, *Sasha* > wrote: > > ... > > See above. For ndarray mask is always False unless an add-on module is > loaded that redefines arithmetic to recognize special bit-patterns > such as NaN or INT_MIN. > > Is it possible to implement masked values using these special bit > patterns in the ndarray instead of using a separate MA class? If so has > there been any thought as to whether this may be the better option. I > think it would be preferable if the ability to handle masked data was > available in the standard array class (ndarray), as this would increase > the likelihood that functions built for numeric arrays will handle > masked values well. 
It seems that ndarray already has decent support for > nans (isnan() returns the equivalent of a boolean mask array), > indicating that such an approach may be acceptable. How difficult is it > to generalise the concept to other data types (int, string, bool)? Well, I'm certainly dead set against any change that would make all arrays that happen to contain those special values to be treated as masked arrays. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From oliphant.travis at ieee.org Mon Apr 17 23:04:04 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Apr 17 23:04:04 2006 Subject: [Numpy-discussion] using NaN, INT_MIN etc in ndarray instead of a masked array In-Reply-To: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> References: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> Message-ID: <44448138.2080402@ieee.org> Michael Sorich wrote: > On 4/8/06, *Sasha* > wrote: > > ... > > See above. For ndarray mask is always False unless an add-on module is > loaded that redefines arithmetic to recognize special bit-patterns > such as NaN or INT_MIN. > > > Is it possible to implement masked values using these special bit > patterns in the ndarray instead of using a separate MA class? If so > has there been any thought as to whether this may be the better > option. I think it would be preferable if the ability to handle masked > data was available in the standard array class (ndarray), as this > would increase the likelihood that functions built for numeric arrays > will handle masked values well. It seems that ndarray already has > decent support for nans (isnan() returns the equivalent of a boolean > mask array), indicating that such an approach may be acceptable. How > difficult is it to generalise the concept to other data types (int, > string, bool)? > I don't think the approach can be generalized at all. It would only work with floating-point values and therefore is not particularly exciting. I think ultimately, making masked arrays a C-based sub-class is where masked array should go. For now the Python-based class is a good environment for developing the ideas behind how to preserve masked arrays through other functions if it is possible. It seems that masked arrays must do things quite differently than other arrays on certain applications, and I'm not altogether clear on how to support them in all the NumPy code. Because masked arrays are not used by everybody who uses NumPy arrays, it should be a separate sub-class. Ultimately, I hope we will get the basic array object into Python (what Tim was calling the super array) before 2.6 -Travis From svetosch at gmx.net Tue Apr 18 01:15:01 2006 From: svetosch at gmx.net (Sven Schreiber) Date: Tue Apr 18 01:15:01 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <443EDFE7.6010509@cox.net> References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> <443EC2B4.807@cox.net> <443EDFE7.6010509@cox.net> Message-ID: <44449FC4.8020406@gmx.net> [Sorry for the late reaction, I was on vacation.] Tim Hochberg schrieb: >> > Here's my best guess as to what is going on: > 1. There is a relatively large group of people who use Kronecker > product as Alan does (probably the matrix as opposed to tensor math > folks). 
I'm guessing it's a large group since they manage to write the > definitions at both mathworld and planetmath. Yes. > 2. kron was meant to implement this. That's what I thought, anyway. > 2.5 People who need the other meaning of kron can just use outer, so > no real conflict. > 3. The implementation was either inappropriately generalized or it > was assumed that all inputs would be matrices (and hence rank-2). > > Assuming 3. is correct, and I'd like to hear from people if they think > that the behaviour in the non rank-2 cases is sensible, the next > question is whether the behaviour in the rank-2 cases makes sense. It > seem to, but I'm not a user of kron. If both of the preceeding are true, > it seems like a complete fix entails the following two things: > 1. Forbid arguments that are not rank-2. This allows all matrices, > which is really the main target here I think. > 2. Fix the return type issue. I have a fix for this ready to commit, > but I want to figure out the first part as well. > Both 1 and 2 sound very good to me as a user. So, should I still submit a new ticket about kron, or is it already being fixed? Greetings, Sven From a.u.r.e.l.i.a.n at gmx.net Tue Apr 18 01:46:04 2006 From: a.u.r.e.l.i.a.n at gmx.net (Johannes Loehnert) Date: Tue Apr 18 01:46:04 2006 Subject: [Numpy-discussion] Re: where In-Reply-To: References: Message-ID: <200604181045.05058.a.u.r.e.l.i.a.n@gmx.net> On Thursday 13 April 2006 19:16, Ryan Krauss wrote: > which makes this: > myvect=where((f>19.5) & (f<38) & > (phase>0),ones(shape(phase)),zeros(shape(phase))) > > actually really silly, sense all it is a complicated way to get back > the input of > (f>19.5) & (f<38) & (phase>0) > ...but you should cast the second to signed int32, otherwise a = (f>19.5) & (f<38) & (phase>0) print a-1 will give an array of 0's and 255's :) (since boolean arrays are by default upcast to unsigned int8) Johannes From ryanlists at gmail.com Tue Apr 18 05:31:15 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Tue Apr 18 05:31:15 2006 Subject: [Numpy-discussion] Re: where In-Reply-To: <200604181045.05058.a.u.r.e.l.i.a.n@gmx.net> References: <200604181045.05058.a.u.r.e.l.i.a.n@gmx.net> Message-ID: You are right. I actually did run into a problem with this. I was trying to subtract 360 degrees from the phase of some fft data and I multiplied -360 (no dot) times my bool array. It took me a while to track that one down. Ryan On 4/18/06, Johannes Loehnert wrote: > On Thursday 13 April 2006 19:16, Ryan Krauss wrote: > > which makes this: > > myvect=where((f>19.5) & (f<38) & > > (phase>0),ones(shape(phase)),zeros(shape(phase))) > > > > actually really silly, sense all it is a complicated way to get back > > the input of > > (f>19.5) & (f<38) & (phase>0) > > > > ...but you should cast the second to signed int32, otherwise > > a = (f>19.5) & (f<38) & (phase>0) > print a-1 > > will give an array of 0's and 255's :) (since boolean arrays are by default > upcast to unsigned int8) > > Johannes > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting language > that extends applications into web and mobile media. Attend the live webcast > and join the prime developer group breaking into this new coding territory! 
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From tim.hochberg at cox.net Tue Apr 18 06:24:09 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Tue Apr 18 06:24:09 2006 Subject: [Numpy-discussion] Toward release 1.0 of NumPy In-Reply-To: <44449FC4.8020406@gmx.net> References: <443D9543.8040601@ee.byu.edu> <443E096D.3040407@gmx.net> <443E7109.6080808@cox.net> <443EC2B4.807@cox.net> <443EDFE7.6010509@cox.net> <44449FC4.8020406@gmx.net> Message-ID: <4444E7DD.2010209@cox.net> Sven Schreiber wrote: >[Sorry for the late reaction, I was on vacation.] > >Tim Hochberg schrieb: > > > >>Here's my best guess as to what is going on: >> 1. There is a relatively large group of people who use Kronecker >>product as Alan does (probably the matrix as opposed to tensor math >>folks). I'm guessing it's a large group since they manage to write the >>definitions at both mathworld and planetmath. >> >> > >Yes. > > > >> 2. kron was meant to implement this. >> >> > >That's what I thought, anyway. > > > >> 2.5 People who need the other meaning of kron can just use outer, so >>no real conflict. >> 3. The implementation was either inappropriately generalized or it >>was assumed that all inputs would be matrices (and hence rank-2). >> >>Assuming 3. is correct, and I'd like to hear from people if they think >>that the behaviour in the non rank-2 cases is sensible, the next >>question is whether the behaviour in the rank-2 cases makes sense. It >>seem to, but I'm not a user of kron. If both of the preceeding are true, >>it seems like a complete fix entails the following two things: >> 1. Forbid arguments that are not rank-2. This allows all matrices, >>which is really the main target here I think. >> 2. Fix the return type issue. I have a fix for this ready to commit, >>but I want to figure out the first part as well. >> >> >> > >Both 1 and 2 sound very good to me as a user. > >So, should I still submit a new ticket about kron, or is it already >being fixed? > > Go ahead and submit a ticket if you would. I have a fix here, but I've been waiting to submit it till I heard from some other people who use kron (and because I've been swamped the last couple of days). If you submit the ticket, that'll keep it from falling through the cracks. Thanks for the feedback, -tim >Greetings, >Sven > > > > From ndarray at mac.com Tue Apr 18 07:06:22 2006 From: ndarray at mac.com (Sasha) Date: Tue Apr 18 07:06:22 2006 Subject: [Numpy-discussion] using NaN, INT_MIN etc in ndarray instead of a masked array In-Reply-To: <44448138.2080402@ieee.org> References: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> <44448138.2080402@ieee.org> Message-ID: On 4/18/06, Travis Oliphant wrote: > Michael Sorich wrote: > ... > > Is it possible to implement masked values using these special bit > > patterns in the ndarray instead of using a separate MA class? If so > > has there been any thought as to whether this may be the better > > option. I think it would be preferable if the ability to handle masked > > data was available in the standard array class (ndarray), as this > > would increase the likelihood that functions built for numeric arrays > > will handle masked values well. 
It seems that ndarray already has > > decent support for nans (isnan() returns the equivalent of a boolean > > mask array), indicating that such an approach may be acceptable. How > > difficult is it to generalise the concept to other data types (int, > > string, bool)? > > > I don't think the approach can be generalized at all. It would only > work with floating-point values and therefore is not particularly exciting. > Not true. R supports "NA" for all its types except raw bytes. For example: > > x<-logical(5) > x [1] FALSE FALSE FALSE FALSE FALSE > x[1:2]=NA > !x [1] NA NA TRUE TRUE TRUE > I think ultimately, making masked arrays a C-based sub-class is where > masked array should go. For now the Python-based class is a good > environment for developing the ideas behind how to preserve masked > arrays through other functions if it is possible. > I've voiced my opposition to subclassing before. Here I believe it is more appropriate to have an add-on module that installs alternative math functions. Having two classes in the same application that are subtly different in the corner cases is already a problem with ma.array vs. ndarray; adding a third class will only make things worse. > It seems that masked arrays must do things quite differently than other > arrays on certain applications, and I'm not altogether clear on how to > support them in all the NumPy code. Because masked arrays are not used > by everybody who uses NumPy arrays, it should be a separate sub-class. > As far as I understand, people who don't use MA don't deal with missing values. For this category of users there will be no visible effect no matter how missing values are treated as long as in the absence of missing values, normal rules apply. Yes, many functions must treat missing values differently, but the same is true for NaNs. NumPy allows floating point arrays to have nans, but there is no real support beyond what happened to work at the OS level. For example: >>> sort([5,nan,3,2]) array([ 5. , nan, 2. , 3. ]) Also, what is the justification for >>> int_(nan) 0 ? > Ultimately, I hope we will get the basic array object into Python (what > Tim was calling the super array) before 2.6 As far as I understand, that object will not come with arithmetic rules or math functions. Therefore, I don't see how this is relevant to the present discussion.
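For concreteness, R's integer convention can be imitated in numpy (the sentinel name INT_NA is invented here; nothing in numpy assigns it any meaning):

import numpy

INT_NA = -2**31                # R stores integer NA as INT_MIN

x = numpy.array([1, 2, INT_NA, 4])
mask = (x == INT_NA)           # the mask is recoverable from the data itself
print mask
print numpy.compress(mask == False, x).sum()   # 7: reductions must skip NA by hand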
From oliphant.travis at ieee.org Tue Apr 18 09:39:11 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Tue Apr 18 09:39:11 2006 Subject: [Numpy-discussion] using NaN, INT_MIN etc in ndarray instead of a masked array In-Reply-To: References: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> <44448138.2080402@ieee.org> Message-ID: <44451611.9070707@ieee.org> Sasha wrote: > On 4/18/06, Travis Oliphant wrote: > >> Michael Sorich wrote: >> ... >>> Is it possible to implement masked values using these special bit >>> patterns in the ndarray instead of using a separate MA class? If so >>> has there been any thought as to whether this may be the better >>> option. I think it would be preferable if the ability to handle masked >>> data was available in the standard array class (ndarray), as this >>> would increase the likelihood that functions built for numeric arrays >>> will handle masked values well. It seems that ndarray already has >>> decent support for nans (isnan() returns the equivalent of a boolean >>> mask array), indicating that such an approach may be acceptable. How >>> difficult is it to generalise the concept to other data types (int, >>> string, bool)? >>> >>> >> I don't think the approach can be generalized at all. It would only >> work with floating-point values and therefore is not particularly exciting. >> >> > Not true. R supports "NA" for all its types except raw bytes. > For example: > > >> x<-logical(5) >> x >> > [1] FALSE FALSE FALSE FALSE FALSE > >> x[1:2]=NA >> !x >> > [1] NA NA TRUE TRUE TRUE > For Boolean values there is "room" for a NA value, but what about arbitrary integers. Does R just limit the range of the integer value? That's what I meant: "fiddling with special-values" doesn't generalize to all data-types. >> arrays through other functions if it is possible. >> > I've voiced my opposition to subclassing before. And you haven't been very clear about why you are opposed. Just voicing concern is not enough. Python sub-classing in C amounts to exactly what masked arrays are: arrays with additional components in their structure (i.e. a mask). Please be more specific about whatever your concerns are with sub-classing. > Here I believe it is > more appropriate to have an add-on module that installs alternative > math functions. Sure that will work. But, we're talking about more than math functions. Ultimately masked array users will want *every* function they use to work "right" with masked arrays. > Having two classes in the same application that are > subtly different in the corner cases is already a problem with > ma.array vs. ndarray; adding a third class will only make things > worse. > I don't know what you are talking about. What is the "third class"? I'm talking about just making ma.array construct a sub-class. >> It seems that masked arrays must do things quite differently than other >> arrays on certain applications, and I'm not altogether clear on how to >> support them in all the NumPy code. Because masked arrays are not used >> by everybody who uses NumPy arrays, it should be a separate sub-class. >> >> > As far as I understand, people who don't use MA don't deal with > missing values. For this category of users there will be no visible > effect no matter how missing values are treated as long as in the > absence of missing values, normal rules apply. Yes, many functions > must treat missing values differently, but the same is true for NaNs. > NumPy allows floating point arrays to have nans, but there is no real > support beyond what happened to work at the OS level. > Or we deal with missing values differently (i.e. manage it ourselves). Sure, there will be no behavioral effect, but the code will have to be re-written to "do the right thing" with masked arrays in such a way as to not slow everything else down (that's at least an "if" statement sprinkled throughout every sub-routine). Many people are not enthused about complicating the basic array object any more than necessary. If it can be shown that masked arrays can be integrated into the ndarray object without inordinate complication and/or slowness, then I don't think people would mind. The best way to prove that is to create a sub-class and change only the methods / functions that are necessary. That's really all I'm saying. > >> Ultimately, I hope we will get the basic array object into Python (what >> Tim was calling the super array) before 2.6 >> > > As far as I understand, that object will not come with arithmetic > rules or math functions. Therefore, I don't see how this is relevant > to the present discussion. >
> Because it will help all array objects talk more cleanly to each other. But, if you are so opposed to sub-classing (which I'm not sure why in this case), then it may not matter. -Travis From strang at nmr.mgh.harvard.edu Tue Apr 18 10:37:03 2006 From: strang at nmr.mgh.harvard.edu (Gary Strangman) Date: Tue Apr 18 10:37:03 2006 Subject: [Numpy-discussion] using NaN, INT_MIN etc in ndarray instead of a masked array In-Reply-To: <44451611.9070707@ieee.org> References: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> <44448138.2080402@ieee.org> <44451611.9070707@ieee.org> Message-ID: >> Not true. R supports "NA" for all its types except raw bytes. >> For example: (snip) > > For Boolean values there is "room" for a NA value, but what about arbitrary > integers. Does R just limit the range of the integer value? That's what I > meant: "fiddling with special-values" doesn't generalize to all data-types. In R, I believe NA = -sys.maxint-1 Gary From oliphant.travis at ieee.org Tue Apr 18 11:09:03 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Tue Apr 18 11:09:03 2006 Subject: [Numpy-discussion] String (and unicode) comparisons and per-thread error handling fixed Message-ID: <44452B04.4090403@ieee.org> String comparisons were added last week. Today, I added per-thread error handling to NumPy. There is 1 more enhancement (scalar math) prior to 0.9.8 release --- but it will probably take 1-2 weeks. The new error handling means that the three-scope system is gone. Now, there is only one per-Python-thread global scope for error handling. If you change the error handling it will affect all ufuncs. Because of this, the seterr function now returns an object with the old error-handling information. This object must be passed to umath.seterrobj() in order to restore the error handling. -Travis From tim.hochberg at cox.net Tue Apr 18 11:21:06 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Tue Apr 18 11:21:06 2006 Subject: [Numpy-discussion] String (and unicode) comparisons and per-thread error handling fixed In-Reply-To: <44452B04.4090403@ieee.org> References: <44452B04.4090403@ieee.org> Message-ID: <44452D53.70009@cox.net> Travis Oliphant wrote: > > String comparisons were added last week. Today, I added per-thread > error handling to NumPy. There is 1 more enhancement (scalar math) > prior to 0.9.8 release --- but it will probably take 1-2 weeks. Oops! I'm about 2/3 done doing this one too. I think I'll go ahead and finish mine up and see how our approaches stack up performance wise and see if there's any of mine that's useful to roll into yours. -tim > > The new error handling means that the three-scope system is gone. > Now, there is only one per-Python-thread global scope for error > handling. If you change the error handling it will affect all > ufuncs. Because of this, the seterr function now returns an object > with the old error-handling information. This object must be passed > to umath.seterrobj() in order to restore the error handling. > > -Travis > > > > ------------------------------------------------------- > This SF.Net email is sponsored by xPML, a groundbreaking scripting > language > that extends applications into web and mobile media. Attend the live > webcast > and join the prime developer group breaking into this new coding > territory! 
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From oliphant.travis at ieee.org Tue Apr 18 12:14:14 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Tue Apr 18 12:14:14 2006 Subject: [Numpy-discussion] String (and unicode) comparisons and per-thread error handling fixed In-Reply-To: <44452D53.70009@cox.net> References: <44452B04.4090403@ieee.org> <44452D53.70009@cox.net> Message-ID: <44453A5E.4020506@ieee.org> Tim Hochberg wrote: > Travis Oliphant wrote: > >> >> String comparisons were added last week. Today, I added per-thread >> error handling to NumPy. There is 1 more enhancement (scalar math) >> prior to 0.9.8 release --- but it will probably take 1-2 weeks. > > Oops! I'm about 2/3 done doing this one too. I think I'll go ahead > and finish mine up and see how our approaches stack up performance > wise and see if there's any of mine that's useful to roll into yours. Darn. I thought I gave you enough time.... :-) On the other hand, all I did was change the way the error-mode is being looked-up (from the three dictionaries to just one). It's not much different than before except for that. I didn't do anything about the other ideas you spoke of. I did add a simple object to reset the error mode when it gets deleted, and had to fiddle with the seterr code a little to accept that object so that both methods of resetting the error mode work. A stack can certainly be built on top of what is now there (I'm thinking for numarray compatibility...), but I didn't do that. Sorry for stepping on your toes. I'm just anxious... I'll be gone for a couple of days and won't be working on NumPy/SciPy, so feel free to adjust. -Travis From rhl at astro.princeton.edu Tue Apr 18 13:07:04 2006 From: rhl at astro.princeton.edu (Robert Lupton) Date: Tue Apr 18 13:07:04 2006 Subject: [Numpy-discussion] Infinite recursion in numpy called from swig generated code In-Reply-To: References: <5809AC56-B2DF-4403-B7BC-9AEEAAC78505@astro.princeton.edu> <43FD32E4.10600@ieee.org> <44203F91.7010505@ieee.org> Message-ID: The latest version of swig (1.3.28 or 1.3.29) has broken my multiple-inheritance-from-C-and-numpy application; more specifically, it generates an infinite loop in numpy-land. I'm using numpy (0.9.6), and here's the offending code. Ideas anyone? I've pasted the crucial part of numpy.lib.UserArray onto the end of this message (how do I know? because you can replace the "from numpy.lib.UserArray" with this, and the problem persists). ##################################################### from numpy.lib.UserArray import * import types class myImage(types.ObjectType): def __init__(self, *args): this = None try: self.this.append(this) except: self.this = this class Image(UserArray, myImage): def __init__(self, *args): myImage.__init__(self, *args) ##################################################### The symptoms are: from recursionBug import *; Image(myImage()) ------------------------------------------------------------ Traceback (most recent call last): File "<stdin>", line 1, in ?
File "recursionBug.py", line 32, in __init__ myImage.__init__(self, *args) File "recursionBug.py", line 26, in __init__ except: self.this = this File "/sw/lib/python2.4/site-packages/numpy/lib/UserArray.py", line 187, in __setattr__ self.array.__setattr__(attr, value) File "/sw/lib/python2.4/site-packages/numpy/lib/UserArray.py", line 193, in __getattr__ return self.array.__getattribute__(attr) ... File "/sw/lib/python2.4/site-packages/numpy/lib/UserArray.py", line 193, in __getattr__ return self.array.__getattribute__(attr) File "/sw/lib/python2.4/site-packages/numpy/lib/UserArray.py", line 193, in __getattr__ return self.array.__getattribute__(attr) RuntimeError: maximum recursion depth exceeded The following stripped down piece of numpy seems to be the problem: class UserArray(object): def __setattr__(self,attr,value): try: self.array.__setattr__(attr, value) except AttributeError: object.__setattr__(self, attr, value) # Only called after other approaches fail. def __getattr__(self,attr): return self.array.__getattribute__(attr) R From cookedm at physics.mcmaster.ca Tue Apr 18 13:10:02 2006 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Tue Apr 18 13:10:02 2006 Subject: [Numpy-discussion] Trac Wikis closed for anonymous edits until further notice In-Reply-To: <44421025.9060804@gmail.com> (Robert Kern's message of "Sun, 16 Apr 2006 04:36:37 -0500") References: <44421025.9060804@gmail.com> Message-ID: Robert Kern writes: > We've been hit badly by spammers, so I can only presume our Trac sites are now > on the traded spam lists. I am going to turn off anonymous edits for now. Ticket > creation will probably still be left open for now. Another thing that's concerned me is closing of tickets by anonymous; can we turn that off? It disturbs me when I'm browsing the RSS feed and I see that. If a user who's not a developer thinks it could be closed, they could post a comment saying that, and a developer could close it. > Many thanks to David Cooke for quickly removing the spam. The RSS feeds are great for that. Although having a way to quickly revert a change would have made it easier :-) > I am looking into ways to allow people to register themselves with the Trac > sites so they can edit the Wikis and submit tickets without needing to be added > by a project admin. that'd be good. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From oliphant.travis at ieee.org Tue Apr 18 13:50:09 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Tue Apr 18 13:50:09 2006 Subject: [Numpy-discussion] Infinite recursion in numpy called from swig generated code In-Reply-To: References: <5809AC56-B2DF-4403-B7BC-9AEEAAC78505@astro.princeton.edu> <43FD32E4.10600@ieee.org> <44203F91.7010505@ieee.org> Message-ID: <444550CF.6090100@ieee.org> Robert Lupton wrote: > The latest version of swig (1.3.28 or 1.3.29) has broken my > multiple-inheritance-from-C-and-numpy application; more specifically, > it generates an infinite loop in numpy-land. I'm using numpy (0.9.6), > and here's the offending code. Ideas anyone? I've pasted the crucial > part of numpy.lib.UserArray onto the end of this message (how do I know? > because you can replace the "from numpy.lib.UserArray" with this, and > the problem persists). This is a problem in the getattr code of UserArray. This is fixed in SVN. 
But, you can just replace the getattr code in UserArray.py with the following: def __getattr__(self,attr): if (attr == 'array'): return object.__getattr__(self, attr) return self.array.__getattribute__(attr) Thanks for finding and reporting this. -Travis From christian at marquardt.sc Tue Apr 18 14:48:06 2006 From: christian at marquardt.sc (Christian Marquardt) Date: Tue Apr 18 14:48:06 2006 Subject: [Numpy-discussion] using NaN, INT_MIN etc in ndarray instead of a masked array In-Reply-To: References: <16761e100604171712v195b47f5q111cb2c4519a03db@mail.gmail.com> <44448138.2080402@ieee.org> <44451611.9070707@ieee.org> Message-ID: <20053.84.167.224.64.1145396854.squirrel@webmail.marquardt.sc> On Tue, April 18, 2006 19:36, Gary Strangman wrote: > >>> Not true. R supports "NA" for all its types except raw bytes. >>> For example: > (snip) >> >> For Boolean values there is "room" for a NA value, but what about >> arbitrary >> integers. Does R just limit the range of the integer value? That's >> what I >> meant: "fiddling with special-values" doesn't generalize to all >> data-types. > > In R, I believe NA = -sys.maxint-1 Don't know if this helps, but I have found the following in the R Data Import/Export Manual (in section 6.5.1, available at http://cran.r-project.org/doc/manuals/R-data.html): The missing value for R logical and integer types is INT_MIN, the smallest representable int defined in the C header limits.h, normally corresponding to the bit pattern 0xffffffff. For doubles (I think R only uses double precision internally), it's a bit more complex apparently; in the section mentioned above, the authors explain that [If R's internal constant definitions / library functions can't be used], on all common platforms IEC 60559 (aka IEEE 754) arithmetic is used, so standard C facilities can be used to test for or set Inf, -Inf and NaN values. On such platforms NA is represented by the NaN value with low-word 0x7a2 (1954 in decimal). The implementation of the floating point NA value is done in the file arithmetics.c of the R source code; the relevant code snippets defining the NA "value" are (I believe) typedef union { double value; unsigned int word[2]; } ieee_double; #ifdef WORDS_BIGENDIAN static CONST int hw = 0; static CONST int lw = 1; #else /* !WORDS_BIGENDIAN */ static CONST int hw = 1; static CONST int lw = 0; #endif /* WORDS_BIGENDIAN */ static double R_ValueOfNA(void) { /* The gcc shipping with RedHat 9 gets this wrong without * the volatile declaration. Thanks to Marc Schwartz. */ volatile ieee_double x; x.word[hw] = 0x7ff00000; x.word[lw] = 1954; return x.value; } and the tests for a number being NA or NaN are int R_IsNA(double x) { if (isnan(x)) { ieee_double y; y.value = x; return (y.word[lw] == 1954); } return 0; } int R_IsNaN(double x) { if (isnan(x)) { ieee_double y; y.value = x; return (y.word[lw] != 1954); } return 0; } Hope this is useful, Christian. From twegener at radlogic.com.au Tue Apr 18 18:07:02 2006 From: twegener at radlogic.com.au (Tim Wegener) Date: Tue Apr 18 18:07:02 2006 Subject: [Numpy-discussion] Backporting numpy to Python 2.2 Message-ID: <20060419103554.4ac1df4a.twegener@radlogic.com.au> Hi, I am attempting to backport numpy-0.9.6 to be compatible with python 2.2. (Some of our machines run python 2.2 as part of Red Hat 9 and Red Hat 7.3 and it is hazardous to alter the standard setup.) I was able to change most of the 2.3-isms to be 2.2 compatible (see the attached patch). 
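A typical shim for one of those 2.3-isms -- a sketch of the idea, not the actual patch -- is the enumerate backport, which on 2.2 also needs the generators future-import:

from __future__ import generators

try:
    enumerate
except NameError:
    def enumerate(iterable):
        # Minimal stand-in for the enumerate() builtin added in Python 2.3.
        i = 0
        for item in iterable:
            yield i, item
            i += 1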
However I had problems compiling the following c module: In file included from numpy/core/src/multiarraymodule.c:64: numpy/core/src/arrayobject.c: In function `arraydescr_dealloc': numpy/core/src/arrayobject.c:8417: warning: passing arg 1 of pointer to function from incompatible pointer type numpy/core/src/multiarraymodule.c: In function `PyArray_DescrConverter': numpy/core/src/multiarraymodule.c:4072: `PyBool_Type' undeclared (first use in this function) numpy/core/src/multiarraymodule.c: In function `setup_scalartypes': numpy/core/src/multiarraymodule.c:5736: `PyBool_Type' undeclared (first use in this function) numpy/core/src/multiarraymodule.c: In function `initmultiarray': numpy/core/src/multiarraymodule.c:5897: `PyObject_SelfIter' undeclared (first use in this function) error: Command "gcc -DNDEBUG -O2 -g -pipe -march=i386 -mcpu=i686 -D_GNU_SOURCE -fPIC -fPIC -Ibuild/src/numpy/core/src -Inumpy/core/include -Ibuild/src/numpy/core -Inumpy/core/src -Inumpy/core/include -I/usr/include/python2.2 -c numpy/core/src/multiarraymodule.c -o build/temp.linux-i686-2.2/multiarraymodule.o" failed with exit status 1 Is it possible to modify this module for python 2.2 compatibility or have I reached a dead end? It would be great if numpy were compatible with 2.2 out of the box, given that 2.3 is only a couple of years old (new), and 2.2 is still quite widely deployed. I am trying to migrate to numpy from Numeric, which worked happily with 2.2. FYI, a quick summary of the compatibility amendments to the python code: - backported os.walk - backported enumerate - backported distutils.log - used slices instead of list.index(item, ) - used 'r' mode instead of 'U' mode (it didn't seem that universal newline support was needed where it was used) - used the {} way of building a new dict rather than using keyword args to the dict constructor - from __future__ import generators - used str.count(substr) rather than substr in str - used os.sep rather than os.path.sep - commented out some of the new Configuration keyword arguments (download_url and classifiers) The above don't really affect the functionality, but a couple of more unusual changes were needed as well: - had to add "self.compiler.exe_extension = ''" to numpy/distutils/command/config.py (see patch) - had to change the following to an empty dict: "kws = {'depends':ext.depends}" in numpy/distutils/command/build_ext.py (see patch) These two changes may have unwanted side effects, and a better fix is probably needed there. Regards, Tim -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: numpy-0.9.6_patched_for_py2.2_diff.txt URL: From oliphant at ee.byu.edu Tue Apr 18 20:03:01 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 18 20:03:01 2006 Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy In-Reply-To: <20060414213511.GA14355@xot.carabos.com> References: <20060414213511.GA14355@xot.carabos.com> Message-ID: <4445A822.60207@ee.byu.edu> faltet at xot.carabos.com wrote: >Hi, > >I'm seeing some slowness in NumPy when dealing with strided arrays. >numarray is dealing better with these situations, so I guess that >something could be done in NumPy about this. Below are the situations >that I've found up to now (maybe there are others). For the timings, >I've used numpy 0.9.7.2278 and numarray 1.5.1. > > The source of this slowness is the use in numarray of special-cases for certain-sized byte-copies.
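You can see the effect from Python with a quick strided-copy timing; this sketch is modelled on the bench-copy script posted later in this thread, and the array size and dtype are just illustrative:

import timeit

cases = [("contiguous",
          "import numpy; a = numpy.arange(1000000, dtype='float64')"),
         ("strided (10)",
          "import numpy; a = numpy.arange(1000000, dtype='float64')[::10]")]
for name, setup in cases:
    # a.copy() exercises the strided inner copy loop being discussed here.
    t = timeit.Timer("b = a.copy()", setup)
    print "time for numpy %s -->" % name, min(t.repeat(3, 10))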
Apparently, it is *much* faster to do ((double *)dst)[0] = ((double *)src)[0] when you have aligned data than it is to do memmove(dst, src, sizeof(double)) This is a useful piece of knowledge to have for optimization. There may be other optimizations like that already used by Numarray but still needing to be adapted for NumPy. I applied an optimization to take advantage of this when possible and got a 10x speed-up in the 1-d case. My timings for your benchmark with current SVN of NumPy are: NumPy: [0.021701812744140625, 0.021739959716796875, 0.021548032760620117] Numarray: [0.052516937255859375, 0.052685976028442383, 0.052355051040649414] Old timings: NumPy: [~0.09, ~0.09, ~0.09] Numarray: [~0.05, ~0.05, ~0.05] -Travis From ndarray at mac.com Tue Apr 18 20:26:16 2006 From: ndarray at mac.com (Sasha) Date: Tue Apr 18 20:26:16 2006 Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy In-Reply-To: <4445A822.60207@ee.byu.edu> References: <20060414213511.GA14355@xot.carabos.com> <4445A822.60207@ee.byu.edu> Message-ID: On 4/18/06, Travis Oliphant wrote: > [...] > Apparently, it is *much* faster to do > > ((double *)dst)[0] = ((double *)src)[0] > > when you have aligned data than it is to do > > memmove(dst, src, sizeof(double)) > > This is a useful piece of knowledge to have for optimization. This is not surprising because memmove has to assume arbitrary alignment and possibility of overlap between src and dst areas. From tim.hochberg at cox.net Wed Apr 19 08:58:04 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 19 08:58:04 2006 Subject: [Numpy-discussion] seterr changes Message-ID: <44465DEE.8090703@cox.net> Hi Travis et al, I started looking at your seterr changes. I stared at yours for a while then I stared at mine for a while. Then I decided that mine wouldn't work right in the presence of threads. Then I decided that yours wouldn't work right in the presence of threads either. Specifically, it looks like ufunc_update_use_defaults isn't going to work. I think I know how to fix that, but I'm not sure that it's worth the trouble since I also did some benchmarking and it appears that the benefit of special casing is minimal. I looked at six cases: small (len-1), medium (len-1e4) and large (len-1e6) arrays with error checking on and error checking off. For medium and large arrays, I could discern no difference at all. For small arrays, there may be some difference, but it appears to be less than 5%. I'm not sure it's worth working through a bunch of finicky thread stuff to get just 5% back. If these benchmark numbers hold up I'd be inclined to rip out the use_default support since it's complicated enough that I know we'll end up chasing a few evil thread related bugs down through it. I'll include the benchmarking code below.
If people could (a) look it over and confirm that I'm not doing something bogus and (b) try it on some different platforms and see if they see a more significant difference, I'd appreciate it. I'm also curious about the seterr interface. It returns ufunc_values_obj. I wasn't sure how one is supposed to pass that back in to seterr, so I modified seterr to instead return a dictionary. I also modified it so that the seterr function itself has no defaults (or rather they're all None). Instead, any unspecified values are taken from the current error state. Thus seterr(divide="warn") changes only the divide state, leaving the other entries alone. Regards, -tim if True: from timeit import Timer setup = """ import numpy numpy.seterr(divide="%s") a = numpy.zeros([%s], dtype=float) """ for size in [1, 10000, 1000000]: for i in range(3): for state in ['ignore', 'warn']: reps = min(100000000 / size, 100000) timer = Timer("a * a", setup % (state, size)) print "%s|%s =>" % (state, size), timer.timeit(reps) print print From arkaitz.bitorika at gmail.com Wed Apr 19 10:30:03 2006 From: arkaitz.bitorika at gmail.com (Arkaitz Bitorika) Date: Wed Apr 19 10:30:03 2006 Subject: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter Message-ID: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> Hi, I'm embedding Python in a big C++ program (the NS network simulator) and I have problems when importing the numpy module: I get a Floating Point exception. The C code that causes the exception is: Py_Initialize(); PyObject* module = PyImport_ImportModule("numpy"); Py_DECREF(module); I'm running Ubuntu Breezy on a dual processor Dell machine, with the stock python and numpy 0.9.6. One strange thing is that I haven't been able to reproduce the crash by writing a minimal C program with the code above, it only crashes when added to my program. I've been embedding Python for ages on the same program and other modules work fine, only numpy fails. I've debugged the issue a bit and I've seen that the exception is thrown when the numpy __init__.py tries to import the core module. The GDB backtrace is pasted at the end. Any idea what may be going wrong?
Thanks, Arkaitz 0xb7900fd2 in initumath () at build/src/numpy/core/src/umathmodule.c:10321 10321 pinf *= mul; (gdb) bt #0 0xb7900fd2 in initumath () at build/src/numpy/core/src/umathmodule.c:10321 #1 0xb7e4e310 in _PyImport_LoadDynamicModule () from /usr/lib/libpython2.4.so.1.0 #2 0xb7e4c450 in _PyImport_FindModule () from /usr/lib/libpython2.4.so.1.0 #3 0xb7e4cc01 in PyImport_ReloadModule () from /usr/lib/libpython2.4.so.1.0 #4 0xb7e4ce26 in PyImport_ReloadModule () from /usr/lib/libpython2.4.so.1.0 #5 0xb7e4d2c6 in PyImport_ImportModuleEx () from /usr/lib/libpython2.4.so.1.0 #6 0xb7e22d9e in _PyUnicodeUCS4_ToLowercase () from /usr/lib/libpython2.4.so.1.0 #7 0xb7df5923 in PyCFunction_Call () from /usr/lib/libpython2.4.so.1.0 #8 0xb7dc8fdf in PyObject_Call () from /usr/lib/libpython2.4.so.1.0 #9 0xb7e2a92c in PyEval_CallObjectWithKeywords () from /usr/lib/libpython2.4.so.1.0 #10 0xb7e2e8f9 in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #11 0xb7e31a2d in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0 #12 0xb7e31b76 in PyEval_EvalCode () from /usr/lib/libpython2.4.so.1.0 #13 0xb7e4a525 in PyImport_ExecCodeModuleEx () from /usr/lib/libpython2.4.so.1.0 #14 0xb7e4a8e9 in PyImport_ExecCodeModule () from /usr/lib/libpython2.4.so.1.0 #15 0xb7e4c73e in _PyImport_FindModule () from /usr/lib/libpython2.4.so.1.0 #16 0xb7e4cc01 in PyImport_ReloadModule () from /usr/lib/libpython2.4.so.1.0 #17 0xb7e4ce26 in PyImport_ReloadModule () from /usr/lib/libpython2.4.so.1.0 #18 0xb7e4d2c6 in PyImport_ImportModuleEx () from /usr/lib/libpython2.4.so.1.0 #19 0xb7e22d9e in _PyUnicodeUCS4_ToLowercase () from /usr/lib/libpython2.4.so.1.0 #20 0xb7df5923 in PyCFunction_Call () from /usr/lib/libpython2.4.so.1.0 #21 0xb7dc8fdf in PyObject_Call () from /usr/lib/libpython2.4.so.1.0 #22 0xb7e2a92c in PyEval_CallObjectWithKeywords () from /usr/lib/libpython2.4.so.1.0 #23 0xb7e2e8f9 in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #24 0xb7e31a2d in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0 #25 0xb7e31b76 in PyEval_EvalCode () from /usr/lib/libpython2.4.so.1.0 #26 0xb7e5667f in PyRun_String () from /usr/lib/libpython2.4.so.1.0 #27 0xb7e2fce6 in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #28 0xb7e31a2d in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0 #29 0xb7e3011a in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #30 0xb7e31a2d in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0 #31 0xb7de31b6 in PyFunction_SetClosure () from /usr/lib/libpython2.4.so.1.0 #32 0xb7dc8fdf in PyObject_Call () from /usr/lib/libpython2.4.so.1.0 #33 0xb7dd079b in PyMethod_New () from /usr/lib/libpython2.4.so.1.0 #34 0xb7dc8fdf in PyObject_Call () from /usr/lib/libpython2.4.so.1.0 #35 0xb7dcfd7b in PyInstance_NewRaw () from /usr/lib/libpython2.4.so.1.0 #36 0xb7dc8fdf in PyObject_Call () from /usr/lib/libpython2.4.so.1.0 #37 0xb7e2f5d2 in PyEval_EvalFrame () from /usr/lib/libpython2.4.so.1.0 #38 0xb7e31a2d in PyEval_EvalCodeEx () from /usr/lib/libpython2.4.so.1.0 #39 0xb7e31b76 in PyEval_EvalCode () from /usr/lib/libpython2.4.so.1.0 #40 0xb7e4a525 in PyImport_ExecCodeModuleEx () from /usr/lib/libpython2.4.so.1.0 #41 0xb7e4a8e9 in PyImport_ExecCodeModule () from /usr/lib/libpython2.4.so.1.0 #42 0xb7e4c73e in _PyImport_FindModule () from /usr/lib/libpython2.4.so.1.0 #43 0xb7e4cc01 in PyImport_ReloadModule () from /usr/lib/libpython2.4.so.1.0 #44 0xb7e4ce26 in PyImport_ReloadModule () from /usr/lib/libpython2.4.so.1.0 #45 0xb7e4d2c6 in PyImport_ImportModuleEx () from 
/usr/lib/libpython2.4.so.1.0 #46 0xb7e22d9e in _PyUnicodeUCS4_ToLowercase () from /usr/lib/libpython2.4.so.1.0 #47 0xb7df5923 in PyCFunction_Call () from /usr/lib/libpython2.4.so.1.0 #48 0xb7dc8fdf in PyObject_Call () from /usr/lib/libpython2.4.so.1.0 #49 0xb7dcc6c0 in PyObject_CallFunction () from /usr/lib/libpython2.4.so.1.0 #50 0xb7e4d745 in PyImport_Import () from /usr/lib/libpython2.4.so.1.0 #51 0xb7e4d918 in PyImport_ImportModule () from /usr/lib/libpython2.4.so.1.0 -------------- next part -------------- An HTML attachment was scrubbed... URL: From strawman at astraw.com Wed Apr 19 10:38:11 2006 From: strawman at astraw.com (Andrew Straw) Date: Wed Apr 19 10:38:11 2006 Subject: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter In-Reply-To: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> References: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> Message-ID: <44467576.1020708@astraw.com> Arkaitz Bitorika wrote: > Hi, > > I'm embedding Python in a big C++ program (the NS network simulator) > and I have problems when importing the numpy module, I get a Floating > Point exception. The C code that causes the exception is: I guess you mean a CPU/kernel level floating point exception (SIGFPE), not a Python exception? > > Py_Initialize(); > PyObject* module = PyImport_ImportModule("numpy"); > Py_DECREF(module); > > > I'm running Ubuntu Breezy on a dual processor Dell machine, with the > stock python and numpy 0.9.6. One strange thing is that I haven't been > able to reproduce the crash by writing a minimal C program with the > code above, it only crashes when added to my program. Does your program change error bits on the FPU or SSE units on your processor? (What processor are you using?) > I've been embedding Python for ages on the same program and other > modules work fine, only numpy fails. Most other modules don't use the SSE units, so wouldn't get hit by such a bug. > > I've debugged the issue a bit and I've seen that the exception is > thrown when the numpy __init__.py tries to import the core module. The > GDB backtrace is pasted at the end. > Any idea what may be going wrong? glibc 2.3.2 (e.g. in debian sarge) has a bug where the SSE unit has an error bit set wrong. But I'd guess Ubuntu isn't using this version of glibc, so I think the problem may be elsewhere. http://sources.redhat.com/bugzilla/show_bug.cgi?id=10 From strawman at astraw.com Wed Apr 19 11:30:10 2006 From: strawman at astraw.com (Andrew Straw) Date: Wed Apr 19 11:30:10 2006 Subject: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter In-Reply-To: References: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> <44467576.1020708@astraw.com> Message-ID: <4446819D.3030401@astraw.com> Arkaitz Bitorika wrote: > > On 19 Apr 2006, at 18:37, Andrew Straw wrote: > >> >>> I've been embedding Python for ages on the same program and other >>> modules work fine, only numpy fails. >> >> >> Most other modules don't use the SSE units, so wouldn't get hit by such >> a bug. > > > Is there a way of not using those units from numpy, to check if > that's what's going on? I think that numpy only accesses the SSE units through ATLAS or other external library. So, build numpy without ATLAS. But I'm not 100% sure anymore if there aren't any optimizations that directly use SSE if it's available. > Or alternatively, how would I check if my program is messing with the > SSE bits? Hmm, I think that's a bit hairy. 
I'd suggest simply asking the C++ library's mailing list if they alter the error bits on the control registers of the SSE unit. (Out of curiosity, what library is it?) If you want hairy, though, I think you'd have to check from C with the appropriate calls -- I'd start with the source code in that bug report. It looks like they're inlining an assembly statement to query an SSE control register. From faltet at xot.carabos.com Wed Apr 19 14:49:02 2006 From: faltet at xot.carabos.com (faltet at xot.carabos.com) Date: Wed Apr 19 14:49:02 2006 Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy In-Reply-To: <4445A822.60207@ee.byu.edu> References: <20060414213511.GA14355@xot.carabos.com> <4445A822.60207@ee.byu.edu> Message-ID: <20060419214814.GA21524@xot.carabos.com> On Tue, Apr 18, 2006 at 09:01:54PM -0600, Travis Oliphant wrote: > faltet at xot.carabos.com wrote: > The source of this slowness is the use in numarray of special-cases for > certain-sized byte-copies. > > Apparently, it is *much* faster to do > > ((double *)dst)[0] = ((double *)src)[0] > > when you have aligned data than it is to do > > memmove(dst, src, sizeof(double)) Mmm.. very interesting. > My timings for your benchmark with current SVN of NumPy are: > > NumPy: [0.021701812744140625, 0.021739959716796875, 0.021548032760620117] > Numarray: [0.052516937255859375, 0.052685976028442383, 0.052355051040649414] Well, on my machine and using the numpy SVN version: numpy: [0.0974161624908447, 0.0621590614318847, 0.0612149238586425] numarray: [0.0658359527587890, 0.0623040199279785, 0.0627131462097167] So, numpy and numarray exhibit the same performance now (it's curious why you are actually getting better performance on your platform). However: In [25]: stnac=timeit.Timer('b=a.copy()','import numarray as np; a=np.arange(1000000,dtype="complex128")[::10]') In [26]: stnpc=timeit.Timer('b=a.copy()','import numpy as np; a=np.arange(1000000,dtype="complex128")[::10]') In [27]: stnac.repeat(3,10) Out[27]: [0.11303496360778809, 0.11540508270263672, 0.11556506156921387] In [28]: stnpc.repeat(3,10) Out[28]: [0.21353006362915039, 0.21468400955200195, 0.21390914916992188] So, it seems that you forgot to optimize the complex types. Fortunately, the cure is easy; after adding the attached patch I'm getting: In [3]: stnpc.repeat(3,10) Out[3]: [0.10468602180480957, 0.10204982757568359, 0.10242295265197754] so, good performance for numpy in copying strided complex128 is achieved as well. Thanks for looking into this! Francesc ====================================================================== --- numpy/core/src/arrayobject.c (revision 2381) +++ numpy/core/src/arrayobject.c (working copy) @@ -629,6 +629,14 @@ char *tout = dst; char *tin = src; switch(elsize) { + case 16: + for (i=0; i References: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> <44467576.1020708@astraw.com> <4446819D.3030401@astraw.com> Message-ID: <20060420091351.475439ab.simon@arrowtheory.com> On Wed, 19 Apr 2006 11:29:49 -0700 Andrew Straw wrote: > > > > > Is there a way of not using those units from numpy, to check if > > that's what's going on? > > I think that numpy only accesses the SSE units through ATLAS or other > external library. So, build numpy without ATLAS. But I'm not 100% sure > anymore if there aren't any optimizations that directly use SSE if it's > available. We had to disable atlas-sse on our debian system for these exact reasons. Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph.
61 02 6249 6940 http://arrowtheory.com From tom.denniston at alum.dartmouth.org Wed Apr 19 17:17:18 2006 From: tom.denniston at alum.dartmouth.org (Tom Denniston) Date: Wed Apr 19 17:17:18 2006 Subject: [Numpy-discussion] LAPACK question building numpy Message-ID: Is there a way to pass a command line argument to setup.py for numpy that does the equivalent of a make using the flags: -L/home/tdennist/lib -lmkl_lapack -lmkl_lapack32 -lmkl_ia32 -lmkl -lguide All I can find on the subject is a page on the scipy wiki that says to use the variable LAPACK and set it to a .a file. When I do so I get undefined symbol problems. I think this is probably really obvious and documented somewhere but I haven't been able to find it. I don't really know where to look. --Tom From strawman at astraw.com Wed Apr 19 18:59:03 2006 From: strawman at astraw.com (Andrew Straw) Date: Wed Apr 19 18:59:03 2006 Subject: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter In-Reply-To: <20060420091351.475439ab.simon@arrowtheory.com> References: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> <44467576.1020708@astraw.com> <4446819D.3030401@astraw.com> <20060420091351.475439ab.simon@arrowtheory.com> Message-ID: <4446EAB9.7010209@astraw.com> Simon Burton wrote: >On Wed, 19 Apr 2006 11:29:49 -0700 >Andrew Straw wrote: > > > >>>Is there a way of not using those units from numpy, to check if >>>that's what's going on? >>> >>> >>I think that numpy only accesses the SSE units through ATLAS or other >>external library. So, build numpy without ATLAS. But I'm not 100% sure >>anymore if there aren't any optimizations that directly use SSE if it's >>available. >> >> > >We had to disable atlas-sse on our debian system for these exact >reasons. > > If you're using debian sarge and the problem is your glibc, you can fix it: http://www.its.caltech.edu/~astraw/coding.html#id3 From robert.kern at gmail.com Wed Apr 19 19:43:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Wed Apr 19 19:43:02 2006 Subject: [Numpy-discussion] Re: LAPACK question building numpy In-Reply-To: References: Message-ID: Tom Denniston wrote: > Is there a way to pass a command line argument to setup.py for numpy > that does the equivalent of a make using the flags: > -L/home/tdennist/lib -lmkl_lapack -lmkl_lapack32 -lmkl_ia32 -lmkl -lguide > > All I can find on the subject is a page on the scipy wiki that says to > use the variable LAPACK and set it to a .a file. When I do so I get > undefined symbol problems. > > I think this is probably really obvious and documented somewhere but I > haven't been able to find it. I don't really know where to look. Don't worry, it's not really well documented. Create a file called site.cfg in the root source directory. There's an example site.cfg.example there. Unfortunately, it's pretty sparse at the moment. Now, I'm not terribly familiar with the MKL, so I don't know what libraries do what, but here is my guess at the appropriate things you will need in site.cfg: [DEFAULT] library_dirs=/home/tdennist/lib:/some/other/path/perhaps include_dirs=/home/tdennist/include [blas_opt] libraries=whatever_the_mkl_blas_lib_is,mkl_ia32,mkl,guide [lapack_opt] libraries=mkl_lapack,mkl_lapack32,mkl_ia32,mkl,guide There's some more documentation in numpy/distutils/system_info.py . -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco From faltet at xot.carabos.com Wed Apr 19 19:46:03 2006 From: faltet at xot.carabos.com (faltet at xot.carabos.com) Date: Wed Apr 19 19:46:03 2006 Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy In-Reply-To: <20060419214814.GA21524@xot.carabos.com> References: <20060414213511.GA14355@xot.carabos.com> <4445A822.60207@ee.byu.edu> <20060419214814.GA21524@xot.carabos.com> Message-ID: <20060420024510.GA21987@xot.carabos.com> On Wed, Apr 19, 2006 at 09:48:14PM +0000, faltet at xot.carabos.com wrote: > On Tue, Apr 18, 2006 at 09:01:54PM -0600, Travis Oliphant wrote: > > Apparently, it is *much* faster to do > > > > ((double *)dst)[0] = ((double *)src)[0] > > > > when you have aligned data than it is to do > > > > memmove(dst, src, sizeof(double)) > > Mmm.. very interesting. A follow-up on this. After analyzing the issue somewhat, it seems that the problem with the memmove() version was not the call itself, but the parameter that was passed as the number of bytes to copy. As this was a parameter whose value was unknown at compile time, the compiler cannot generate optimized code for it and always has to fetch its value from memory (or cache). In the version of the code that you optimized, you managed to avoid this because you are telling the compiler (i.e. specifying at compile time) the exact extent of the data copy, allowing it to generate optimum code for the copy operation. However, if you do a similar thing but using the call (using doubles here): memcpy(tout, tin, 8); instead of: ((Float64 *)tout)[0] = ((Float64 *)tin)[0]; and repeat the operation for the other types, then you can achieve similar performance to the pointer version. On the other hand, I see that you have disabled the optimization for unaligned data through the use of a check. Is there any reason for doing that? If I remove this check, I can achieve similar performance to numarray (a bit better, in fact). I'm attaching a small benchmark script that compares the performance of copying a 1D vector of 1 million elements in contiguous, strided (2 and 10), and strided (2 and 10 again) & unaligned flavors. The results for my machine (p4 at 2 GHz) are: For the original numpy code (i.e. before Travis's optimization): time for numpy contiguous --> 0.234 time for numarray contiguous --> 0.229 time for numpy strided (2) --> 1.605 time for numarray strided (2) --> 0.263 time for numpy strided (10) --> 1.72 time for numarray strided (10) --> 0.264 time for numpy strided (2) & unaligned--> 1.736 time for numarray strided (2) & unaligned--> 0.402 time for numpy strided (10) & unaligned--> 1.872 time for numarray strided (10) & unaligned--> 0.435 where you can see that, for 1e6 elements, the slowdown of original numpy is almost 7x (!). Remember that in the previous benchmarks sent here the slowdown was 3x, but we were copying 10 times less data. For the pointer optimised code (i.e. the current SVN version): time for numpy contiguous --> 0.238 time for numarray contiguous --> 0.232 time for numpy strided (2) --> 0.214 time for numarray strided (2) --> 0.264 time for numpy strided (10) --> 0.299 time for numarray strided (10) --> 0.262 time for numpy strided (2) & unaligned--> 1.736 time for numarray strided (2) & unaligned--> 0.401 time for numpy strided (10) & unaligned--> 1.874 time for numarray strided (10) & unaligned--> 0.433 here you can see that your figures are very similar to numarray except for unaligned data (4x slower).
For the pointer optimised code but releasing the unaligned data check: time for numpy contiguous --> 0.236 time for numarray contiguous --> 0.231 time for numpy strided (2) --> 0.213 time for numarray strided (2) --> 0.262 time for numpy strided (10) --> 0.297 time for numarray strided (10) --> 0.261 time for numpy strided (2) & unaligned--> 0.263 time for numarray strided (2) & unaligned--> 0.403 time for numpy strided (10) & unaligned--> 0.452 time for numarray strided (10) & unaligned--> 0.432 Ei! numpy is very similar to numarray in all cases, except for the strided with 2 elements and unaligned case, where numpy performs about 50% better. Finally, and just for showing the effect of providing memcpy with size information at compile time, the numpy code using memcpy() with this optimization on (and disabling the alignment check, of course!): time for numpy contiguous --> 0.234 time for numarray contiguous --> 0.233 time for numpy strided (2) --> 0.223 time for numarray strided (2) --> 0.262 time for numpy strided (10) --> 0.285 time for numarray strided (10) --> 0.262 time for numpy strided (2) & unaligned--> 0.261 time for numarray strided (2) & unaligned--> 0.401 time for numpy strided (10) & unaligned--> 0.42 time for numarray strided (10) & unaligned--> 0.436 you can see that the figures are very similar to the previous case. So Travis, you may want to use the pointer indirection approach or the memcpy() one, whichever you prefer. Well, I just wanted to point this out. Time for sleep! Francesc -------------- next part -------------- A non-text attachment was scrubbed... Name: bench-copy.py Type: text/x-python Size: 2054 bytes Desc: not available URL: From tim.hochberg at cox.net Wed Apr 19 19:57:06 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 19 19:57:06 2006 Subject: [Numpy-discussion] Summer of Code ideas Message-ID: <4446F8D8.40909@cox.net> Discussing ideas for summer of code projects seems to be all the rage right now on various other Python lists, so I thought I'd throw out a few that I've had. There are several different things that could be done with numexpr including: 1. Adding broadcasting. 2. Coercing arrays a chunk at a time instead of all at once when coercion is necessary. 3. Fancier syntax. I think that some variant of the following could be made to work: with deferred_evaluation: # Converts everything in local namespace to special objects # all of these math operations are deferred a = 5 + b*32 c = a + 73 # Now all objects are restored and deferred expressions are evaluated. This might be cool or it might be useless, but it sounds fun to try. I haven't talked to David Cooke about any of these and since numexpr is really his project he should be consulted before anyone tries these. There's also some stuff to be done on the basearray front. I expect I'll have the actual basearray object together in the next couple of weeks depending on my level of busyness, but there'll be a lot of other stuff to do besides just that. My general plan is to build a toolkit around basearray that can be used to build other array packages. These packages might be lighter weight than numpy or they might be specialized in some way that's not really compatible with numpy and ndarray. There's also room for experimentation with protocols / generic functions. If anyone's interested I suggest you read the thread (currently dormant) on python-3000.devel on this topic.
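To make the "generic functions" idea concrete, here is a toy single-dispatch version -- purely a sketch, far simpler than anything discussed on that thread, and new-style classes only:

registry = {}

def implement(typ):
    # Decorator: register func as the implementation for instances of typ.
    def register(func):
        registry[typ] = func
        return func
    return register

def generic(obj, *args):
    # Dispatch on the type of the first argument, walking the MRO so that
    # subclasses pick up their base class's implementation.
    for typ in type(obj).__mro__:
        if typ in registry:
            return registry[typ](obj, *args)
    raise TypeError("no implementation for %s" % type(obj).__name__)

An asarray built on a mechanism like this would be openly extensible: third parties could register converters for their own array-like types instead of numpy hard-coding the possibilities.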
There are lots of possible applications for this in numpy including using them to implement or replace: * asarray * __array_priority__ (by making the ufuncs and thus __add__, etc overloaded functions). * __array__, __array_wrap__, etc. * all the various functions that are giving us trouble with MA. * probably a bunch of other stuff. The basic basearray toolkit I mentioned above would be a good place to experiment with stuff like this, once it exists, since in theory it will be simpler than the full numpy codebase and you don't have to worry so much about backwards compatibility. Anyway, that's a bunch of random ideas that I at least find interesting. Regards, -tim From oliphant at ee.byu.edu Wed Apr 19 20:44:02 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Wed Apr 19 20:44:02 2006 Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy In-Reply-To: <20060420024510.GA21987@xot.carabos.com> References: <20060414213511.GA14355@xot.carabos.com> <4445A822.60207@ee.byu.edu> <20060419214814.GA21524@xot.carabos.com> <20060420024510.GA21987@xot.carabos.com> Message-ID: <44470255.302@ee.byu.edu> faltet at xot.carabos.com wrote: >On Wed, Apr 19, 2006 at 09:48:14PM +0000, faltet at xot.carabos.com wrote: > > >>On Tue, Apr 18, 2006 at 09:01:54PM -0600, Travis Oliphant wrote: >> >> >>>Apparently, it is *much* faster to do >>> >>>((double *)dst)[0] = ((double *)src)[0] >>> >>>when you have aligned data than it is to do >>> >>>memmove(dst, src, sizeof(double)) >>> >>> >>Mmm.. very interesting. >> >> > >A follow-up on this. After analyzing somewhat the issue, it seems that >the problem with the memcpy() version was not the call itself, but the >parameter that was passed as the number of bytes to copy. As this was a >parameter whose value was unknown in compile time, the compiler cannot >generate optimized code for it and always has to fetch its value from >memory (or cache). > > >In the version of the code that you optimized, you managed to do this >because you are telling to the compiler (i.e. specifying at compile >time) the exact extend of the data copy, so allowing it to generate >optimum code for the copy operation. However, if you do a similar >thing but using the call (using doubles here): > >memcpy(tout, tin, 8); > >instead of: > >((Float64 *)tout)[0] = ((Float64 *)tin)[0]; > >and repeat the operation for the other types, then you can achieve >similar performance than the pointer version. > > This is good to know. It certainly makes sense. I'll test it on my system when I get back. >On another hand, I see that you have disabled the optimization for >unaligned data through the use of a check. Is there any reason for >doing that? If I remove this check, I can achieve similar performance >than for numarray (a bit better, in fact). > > The only reason was to avoid pointer dereferencing on misaligned data (dereferencing a misaligned pointer causes bus errors on Solaris). But, if we can achieve it with a memmove, then there is no reason to limit the code. >I'm attaching a small benchmark script that compares the performance >of copying a 1D vector of 1 million of elements in contiguous, strided >(2 and 10), and strided (2 and 10 again) & unaligned flavors. The >results for my machine (p4 at 2 GHz) are: > >For the original numpy code (i.e. 
before Travis optimization): > >time for numpy contiguous --> 0.234 >time for numarray contiguous --> 0.229 >time for numpy strided (2) --> 1.605 >time for numarray strided (2) --> 0.263 >time for numpy strided (10) --> 1.72 >time for numarray strided (10) --> 0.264 >time for numpy strided (2) & unaligned--> 1.736 >time for numarray strided (2) & unaligned--> 0.402 >time for numpy strided (10) & unaligned--> 1.872 >time for numarray strided (10) & unaligned--> 0.435 > >where you can see that, for 1e6 elements the slowdown of original >numpy is almost 7x (!). Remember that in the previous benchmarks sent >here the slowdown was 3x, but we were copying 10 times less data. > >For the pointer optimised code (i.e. the current SVN version): > >time for numpy contiguous --> 0.238 >time for numarray contiguous --> 0.232 >time for numpy strided (2) --> 0.214 >time for numarray strided (2) --> 0.264 >time for numpy strided (10) --> 0.299 >time for numarray strided (10) --> 0.262 >time for numpy strided (2) & unaligned--> 1.736 >time for numarray strided (2) & unaligned--> 0.401 >time for numpy strided (10) & unaligned--> 1.874 >time for numarray strided (10) & unaligned--> 0.433 > >here you can see that your figures are very similar to numarray except >for unaligned data (4x slower). > >For the pointer optimised code but releasing the unaligned data check: > >time for numpy contiguous --> 0.236 >time for numarray contiguous --> 0.231 >time for numpy strided (2) --> 0.213 >time for numarray strided (2) --> 0.262 >time for numpy strided (10) --> 0.297 >time for numarray strided (10) --> 0.261 >time for numpy strided (2) & unaligned--> 0.263 >time for numarray strided (2) & unaligned--> 0.403 >time for numpy strided (10) & unaligned--> 0.452 >time for numarray strided (10) & unaligned--> 0.432 > >Ei! numpy is very similar to numarray in all cases, except for the >strided with 2 elements and unaligned case, where numpy performs a 50% >better. > >Finally, and just for showing the effect of providing memcpy with size >information in compilation time, the numpy code using memcpy() with >this optimization on (and disabling the alignment check, of course!): > >time for numpy contiguous --> 0.234 >time for numarray contiguous --> 0.233 >time for numpy strided (2) --> 0.223 >time for numarray strided (2) --> 0.262 >time for numpy strided (10) --> 0.285 >time for numarray strided (10) --> 0.262 >time for numpy strided (2) & unaligned--> 0.261 >time for numarray strided (2) & unaligned--> 0.401 >time for numpy strided (10) & unaligned--> 0.42 >time for numarray strided (10) & unaligned--> 0.436 > >you can see that the figures are very similar to the previous case. So >Travis, you may want to use the pointer indirection approach or the >memcpy() one, whichever you prefer. > >Well, I just wanted to point this out. Time for sleep! > > > Very, very useful information. 1000 Thank you's for talking the time to investigate and assemble it. Do you think the memmove would work similarly? -Travis From tom.denniston at alum.dartmouth.org Thu Apr 20 08:07:04 2006 From: tom.denniston at alum.dartmouth.org (Tom Denniston) Date: Thu Apr 20 08:07:04 2006 Subject: [Numpy-discussion] Re: LAPACK question building numpy In-Reply-To: References: Message-ID: Thanks for your help. I will try this. 
--Tom On 4/19/06, Robert Kern wrote: > Tom Denniston wrote: > > Is there a way to pass a command line argument to setup.py for numpy > > that does the equivalent of a make using the flags: > > -L/home/tdennist/lib -lmkl_lapack -lmkl_lapack32 -lmkl_ia32 -lmkl -lguide > > > > All I can find on the subject is a page on the scipy wiki that says to > > use the variable LAPACK and set it to a .a file. When I do so I get > > undefined symbol problems. > > > > I think this is probably really obvious and documented somewhere but I > > haven't been able to find it. I don't really know where to look. > > Don't worry, it's not really well documented. Create a file called site.cfg in > the root source directory. There's an example site.cfg.example there. > Unfortunately, it's pretty sparse at the moment. Now, I'm not terribly familiar > with the MKL, so I don't know what libraries do what, but here is my guess at > the appropriate things you will need in site.cfg: > > [DEFAULT] > library_dirs=/home/tdennist/lib:/some/other/path/perhaps > include_dirs=/home/tdennist/include > > [blas_opt] > libraries=whatever_the_mkl_blas_lib_is,mkl_ia32,mkl,guide > > [lapack_opt] > libraries=mkl_lapack,mkl_lapack32,mkl_ia32,mkl,guide > > There's some more documentation in numpy/distutils/system_info.py . > > -- > Robert Kern > robert.kern at gmail.com > > "I have come to believe that the whole world is an enigma, a harmless enigma > that is made terrible by our own mad attempt to interpret it as though it had > an underlying truth." > -- Umberto Eco > > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? > Get stuff done quickly with pre-integrated technology to make your job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From faltet at xot.carabos.com Thu Apr 20 09:42:04 2006 From: faltet at xot.carabos.com (faltet at xot.carabos.com) Date: Thu Apr 20 09:42:04 2006 Subject: [Numpy-discussion] Performance problems with strided arrays in NumPy In-Reply-To: <44470255.302@ee.byu.edu> References: <20060414213511.GA14355@xot.carabos.com> <4445A822.60207@ee.byu.edu> <20060419214814.GA21524@xot.carabos.com> <20060420024510.GA21987@xot.carabos.com> <44470255.302@ee.byu.edu> Message-ID: <20060420164132.GA23763@xot.carabos.com> On Wed, Apr 19, 2006 at 09:39:01PM -0600, Travis Oliphant wrote: >>On another hand, I see that you have disabled the optimization for >>unaligned data through the use of a check. Is there any reason for >>doing that? If I remove this check, I can achieve similar performance >>than for numarray (a bit better, in fact). > >The only reason was to avoid pointer dereferencing on misaligned data >(dereferencing a misaligned pointer causes bus errors on Solaris). >But, if we can achieve it with a memmove, then there is no reason to >limit the code. I see. Well, I've tried out memmove instead of memcpy, and I can reproduce the same slowdown that was seen prior to your pointer-addressing optimisation. I'm afraid that Sasha was right in that memmove's check for overlapping source and destination areas is responsible for this.
Having said that, and although I must admit that I don't know in depth the different situations under which the source of a copy may overlap the destination, my guess is that for the typical element sizes (i.e. [1], 2, 4, 8 and 16) for which the optimization has been done, there is no harm in using memcpy instead of memmove (admittedly, you may come up with a counter-example of this, but I do hope you don't). In any case, the use of memcpy is completely equivalent to the current optimization using pointers except that, hopefully, pointer addressing is not made on unaligned data. So, perhaps using the memcpy approach on Solaris (under Sparc I guess) may avoid the bus errors. It would be nice if anyone with access to such a platform could confirm this point. I'm attaching a patch for current SVN numpy that uses the memcpy approach. Feel free to try it against the benchmarks (also attached). One last word, I've added a case for typesize 1 in addition to the existing ones as this effectively improves the speed for 1-byte types. Below are the speeds without the 1-byte case optimisation: time for numpy contiguous --> 0.03 time for numarray contiguous --> 0.062 time for numpy strided (2) --> 0.078 time for numarray strided (2) --> 0.064 time for numpy strided (10) --> 0.081 time for numarray strided (10) --> 0.07 I haven't added a case for the unaligned case because this makes no sense for 1-byte sized types. and here with the 1-byte case optimisation added: time for numpy contiguous --> 0.03 time for numarray contiguous --> 0.062 time for numpy strided (2) --> 0.054 time for numarray strided (2) --> 0.065 time for numpy strided (10) --> 0.061 time for numarray strided (10) --> 0.07 you can notice a speed-up of between 30% and 45% over the previous case. Cheers, -------------- next part -------------- --- numpy/core/src/arrayobject.c (revision 2381) +++ numpy/core/src/arrayobject.c (working copy) @@ -628,28 +628,44 @@ intp i, j; char *tout = dst; char *tin = src; + /* For typical datasizes, the memcpy call is much faster than memmove + and perfectely safe */ switch(elsize) { + case 16: + for (i=0; ind) == src->nd && (nd > 0) && + if (!swap && (nd = dest->nd) == src->nd && (nd > 0) && PyArray_CompareLists(dest->dimensions, src->dimensions, nd)) { int maxaxis=0, maxdim=dest->dimensions[0]; int i; -------------- next part -------------- A non-text attachment was scrubbed... Name: bench-copy.py Type: text/x-python Size: 2053 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: bench-copy1.py Type: text/x-python Size: 1168 bytes Desc: not available URL: From rng7 at cornell.edu Thu Apr 20 13:49:13 2006 From: rng7 at cornell.edu (Ryan Gutenkunst) Date: Thu Apr 20 13:49:13 2006 Subject: [Numpy-discussion] Bypassing a[2].item()? Message-ID: <4447F397.7010006@cornell.edu> Hi all, I'm porting some code from old scipy to new scipy, and I've run into a rather large performance problem. The heart of the code is integrating a system of nonlinear differential equations using odeint. The function that dominates the time to run calculates the right hand side, given a current state x. (len(x) ~ 50.) Abstracted, the function looks like: def rhs(x): output = scipy.zeros(10, scipy.Float) a = x[0] b = x[1] ... output[0] = a/b + c*sqrt(d)... output[1] = b-a + 2*b... ... return output (I copy the elements of the current state to local variables to avoid the cost of repeatedly calling x.__getitem__, and to make the resulting equations easier to read.)
When using numpy, a and b are now array scalars and the arithmetic is much slower, resulting in about a factor of 10 increase in runtimes from those using Numeric. I've tried doing: a = x[0].item(), which allows the arithmetic to be done on pure scalars. This is a little faster, but still results in a factor of 3 increase in runtime from old scipy. I imagine the slowdown comes from having to call __getitem__() followed by item(). So questions: 1) I haven't followed the details of the array scalar discussions. Is it anticipated that array scalar arithmetic will eventually be as fast as arithmetic in native python types? 2) If not, is it possible to get a "pure" scalar directly from an array in one function call? Thanks for any help, Ryan -- Ryan Gutenkunst | Cornell LASSP | "It is not the mountain | we conquer but ourselves." Clark 535 / (607)227-7914 | -- Sir Edmund Hillary AIM: JepettoRNG | http://www.physics.cornell.edu/~rgutenkunst/ From robert.kern at gmail.com Thu Apr 20 14:20:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Thu Apr 20 14:20:02 2006 Subject: [Numpy-discussion] Re: Bypassing a[2].item()? In-Reply-To: <4447F397.7010006@cornell.edu> References: <4447F397.7010006@cornell.edu> Message-ID: Ryan Gutenkunst wrote: > So questions: > 1) I haven't followed the details of the array scalar discussions. Is it > anticipated that array scalar arithmetic will eventually be as fast as > arithmetic in native python types? More or less, if I'm not mistaken. This ticket is aimed at that: http://projects.scipy.org/scipy/numpy/ticket/55 > 2) If not, is it possible to get a "pure" scalar directly from an array > in one function call? float(x[0]) seems to be faster on my PowerBook. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From rng7 at cornell.edu Thu Apr 20 15:21:11 2006 From: rng7 at cornell.edu (Ryan Gutenkunst) Date: Thu Apr 20 15:21:11 2006 Subject: [Numpy-discussion] Re: Bypassing a[2].item()? In-Reply-To: References: <4447F397.7010006@cornell.edu> Message-ID: <9b9f0633c5a242a6ab8a199708c8dd94@cornell.edu> On Apr 20, 2006, at 5:18 PM, Robert Kern wrote: > Ryan Gutenkunst wrote: > >> So questions: >> 1) I haven't followed the details of the array scalar discussions. Is >> it >> anticipated that array scalar arithmetic will eventually be as fast as >> arithmetic in native python types? > > More or less, if I'm not mistaken. This ticket is aimed at that: > > http://projects.scipy.org/scipy/numpy/ticket/55 Good to hear. >> 2) If not, is it possible to get a "pure" scalar directly from an >> array >> in one function call? > > float(x[0]) seems to be faster on my PowerBook. It's faster for me, too, but float(x[0]) is still much slower than using Numeric where x[0] suffices. I guess I'll just have to warn my users away from the new scipy until numpy 0.9.8 comes out and scalar math is sped up. Cheers, Ryan -- Ryan Gutenkunst | Cornell Dept. of Physics | "It is not the mountain | we conquer but ourselves." Clark 535 / (607)255-6068 | -- Sir Edmund Hillary AIM: JepettoRNG | http://www.physics.cornell.edu/~rgutenkunst/ From robert.kern at gmail.com Thu Apr 20 16:22:09 2006 From: robert.kern at gmail.com (Robert Kern) Date: Thu Apr 20 16:22:09 2006 Subject: [Numpy-discussion] Re: Bypassing a[2].item()?
In-Reply-To: <9b9f0633c5a242a6ab8a199708c8dd94@cornell.edu> References: <4447F397.7010006@cornell.edu> <9b9f0633c5a242a6ab8a199708c8dd94@cornell.edu> Message-ID: Ryan Gutenkunst wrote: > On Apr 20, 2006, at 5:18 PM, Robert Kern wrote: > >> Ryan Gutenkunst wrote: >>> 2) If not, is it possible to get a "pure" scalar directly from an array >>> in one function call? >> >> float(x[0]) seems to be faster on my PowerBook. > > It's faster for me, too, but float(x[0]) is still much slower than using > Numeric where x[0] suffices. I guess I'll just have to warn my users > away from the new scipy until numpy 0.9.8 comes out and scalar math is > sped up. For that matter, a plain "x[0]" seems to be about 3x faster with Numeric than numpy. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From oliphant at ee.byu.edu Thu Apr 20 20:16:02 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 20 20:16:02 2006 Subject: [Numpy-discussion] Re: Bypassing a[2].item()? In-Reply-To: References: <4447F397.7010006@cornell.edu> <9b9f0633c5a242a6ab8a199708c8dd94@cornell.edu> Message-ID: <44484E44.2050300@ee.byu.edu> Robert Kern wrote: >Ryan Gutenkunst wrote: > > >>On Apr 20, 2006, at 5:18 PM, Robert Kern wrote: >> >> >> >>>Ryan Gutenkunst wrote: >>> >>> > > > >>>>2) If not, is it possible to get a "pure" scalar directly from an array >>>>in one function call? >>>> >>>> >>>float(x[0]) seems to be faster on my PowerBook. >>> >>> >>It's faster for me, too, but float(x[0]) is still much slower than using >>Numeric where x[0] suffices. I guess I'll just have to warn my users >>away from the new scipy until numpy 0.9.8 comes out and scalar math is >>sped up. >> >> > >For that matter, a plain "x[0]" seems to be about 3x faster with Numeric than numpy. > > > We are already special-casing the integer select code but could special-case the getitem code so that if nd==1 a faster construction is used. I think right now a 0-dim array is being created only to get destroyed later on return. Please add a ticket as this extremely common operation should be made as fast as possible. This is a little tricky because array_big_item is called in a few places and is expected to return an array. If it returns a scalar in those places segfaults can occur. Either checks need to be made in each of those cases or the special-casing needs to be in array_big_item_nice. I'm not sure which I prefer.... -Travis From simon at arrowtheory.com Thu Apr 20 23:24:59 2006 From: simon at arrowtheory.com (Simon Burton) Date: Thu Apr 20 23:24:59 2006 Subject: [Numpy-discussion] announce: pyjit, a little jit for creating numpy ufuncs Message-ID: <20060421162336.42285837.simon@arrowtheory.com> Hi, Inspired by numexpr, pypy and llvm, i've built a simple JIT for creating numpy "ufuncs" (they are not yet real ufuncs). It uses llvm[1] as the backend machine code generator. The main things it can do are: *) parse simple python code (function def's) *) generate SSA assembly code for llvm *) build ufunc code for applying to numpy array's When I say simple I mean it: def calc(a,b): c = (a+b)/2.0 return c No control flow or type inference has been implemented. As with numexpr, significant speedups are possible. I'm putting this announce here to see what the other numpy'ers think. $ svn co http://rubis.rsise.anu.edu.au/local/repos/elefant/pyjit bye, Simon. 
[1] http://llvm.org/ -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From cookedm at physics.mcmaster.ca Fri Apr 21 09:27:00 2006 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 21 09:27:00 2006 Subject: [Numpy-discussion] Source release of 0.9.6 on sourceforge is wrong Message-ID: Travis, Looks like you uploaded the bdist .tar.gz of NumPy 0.9.6 to sourceforge, instead of the sdist. The one there isn't the source, it's a binary distribution of a 32-bit Linux compile. It's been over a month, with 2684 downloads, and I can't find a mention that anybody's noticed this before... Have we silently lost people who think we're on crack, or are there 2684 people who haven't looked at what they got? [On another note, the download URL on PyPi won't work with setuptools; I've fixed the setup.py in svn to use the correct one, but if you could fix it on PyPi and set it to http://sourceforge.net/project/showfiles.php?group_id=1369&package_id=175103 then people can use easy_install to install numpy.] -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From cookedm at physics.mcmaster.ca Fri Apr 21 09:30:01 2006 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Fri Apr 21 09:30:01 2006 Subject: [Numpy-discussion] Source release of 0.9.6 on sourceforge is wrong In-Reply-To: (David M. Cooke's message of "Fri, 21 Apr 2006 12:25:52 -0400") References: Message-ID: cookedm at physics.mcmaster.ca (David M. Cooke) writes: > Travis, > > Looks like you uploaded the bdist .tar.gz of NumPy 0.9.6 to > sourceforge, instead of the sdist. The one there isn't the source, > it's a binary distribution of a 32-bit Linux compile. Gah! My bad! When I convinced easy_install to grab the source, it grabbed numpy-0.9.6-py2.4-linux-i686.tar.gz instead, which of course is a binary package. *why* it grabbed that one is another story (that's not my platform! I'm on py2.4-linux-x86_64). -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From ndarray at mac.com Fri Apr 21 09:35:02 2006 From: ndarray at mac.com (Sasha) Date: Fri Apr 21 09:35:02 2006 Subject: [Numpy-discussion] Source release of 0.9.6 on sourceforge is wrong In-Reply-To: References: Message-ID: I've downloaded numpy-0.9.6.tar.gz from SF about a month ago and it was fine: > tar tzf ~/Archives/numpy-0.9.6.tar.gz numpy-0.9.6/ numpy-0.9.6/numpy/ numpy-0.9.6/numpy/core/ numpy-0.9.6/numpy/core/blasdot/ numpy-0.9.6/numpy/core/blasdot/_dotblas.c numpy-0.9.6/numpy/core/blasdot/cblas.h ... On 4/21/06, David M. Cooke wrote: > Travis, > > Looks like you uploaded the bdist .tar.gz of NumPy 0.9.6 to > sourceforge, instead of the sdist. The one there isn't the source, > it's a binary distribution of a 32-bit Linux compile.
> > It's been over a month, with 2684 downloads, and I can't find a > mention that anybody's noticed this before... Have we silently lost > people who think we're on crack, or are there 2684 people who haven't > looked at what they got? > > [On another note, the download URL on PyPi won't work with > setuptools; I've fixed the setup.py in svn to use the correct one, but > if you could fix it on PyPi and set it to > http://sourceforge.net/project/showfiles.php?group_id=1369&package_id=175103 > then people can use easy_install to install numpy.] > > -- > |>|\/|< > /--------------------------------------------------------------------------\ > |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ > |cookedm at physics.mcmaster.ca

From bsouthey at gmail.com Fri Apr 21 10:35:02 2006 From: bsouthey at gmail.com (Bruce Southey) Date: Fri Apr 21 10:35:02 2006 Subject: [Numpy-discussion] Source release of 0.9.6 on sourceforge is wrong In-Reply-To: References: Message-ID: Hi,

I concur, as I downloaded and installed it yesterday (April 20) afternoon (from my ls -l): 2006-04-20 13:38 numpy-0.9.6.tar.gz

I had no problems installing that version, as the import numpy appeared to work.

Regards Bruce

On 4/21/06, Sasha wrote: > I've downloaded numpy-0.9.6.tar.gz from SF about a month ago and it was fine: > > > tar tzf ~/Archives/numpy-0.9.6.tar.gz > numpy-0.9.6/ > numpy-0.9.6/numpy/ > numpy-0.9.6/numpy/core/ > numpy-0.9.6/numpy/core/blasdot/ > numpy-0.9.6/numpy/core/blasdot/_dotblas.c > numpy-0.9.6/numpy/core/blasdot/cblas.h > ... > > On 4/21/06, David M. Cooke wrote: > > Travis, > > > > Looks like you uploaded the bdist .tar.gz of NumPy 0.9.6 to > > sourceforge, instead of the sdist. The one there isn't the source, > > it's a binary distribution of a 32-bit Linux compile. > > > > It's been over a month, with 2684 downloads, and I can't find a > > mention that anybody's noticed this before... Have we silently lost > > people who think we're on crack, or are there 2684 people who haven't > > looked at what they got? > > > > [On another note, the download URL on PyPi won't work with > > setuptools; I've fixed the setup.py in svn to use the correct one, but > > if you could fix it on PyPi and set it to > > http://sourceforge.net/project/showfiles.php?group_id=1369&package_id=175103 > > then people can use easy_install to install numpy.] > > > > -- > > |>|\/|< > > /--------------------------------------------------------------------------\ > > |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ > > |cookedm at physics.mcmaster.ca

From robert.kern at gmail.com Fri Apr 21 11:28:11 2006 From: robert.kern at gmail.com (Robert Kern) Date: Fri Apr 21 11:28:11 2006 Subject: [Numpy-discussion] Re: Source release of 0.9.6 on sourceforge is wrong In-Reply-To: References: Message-ID: David M. Cooke wrote: > cookedm at physics.mcmaster.ca (David M. Cooke) writes: > >>Travis, >> >>Looks like you uploaded the bdist .tar.gz of NumPy 0.9.6 to >>sourceforge, instead of the sdist. The one there isn't the source, >>it's a binary distribution of a 32-bit Linux compile. > > Gah! My bad! When I convinced easy_install to grab the source, it > grabbed numpy-0.9.6-py2.4-linux-i686.tar.gz instead, which of course is a > binary package. > > *why* it grabbed that one is another story (that's not my platform! > I'm on py2.4-linux-x86_64).

Phillip Eby tells me that the bdist_dumb packages there confuse some versions of setuptools. He fixed it this morning.

-- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From faltet at xot.carabos.com Fri Apr 21 13:56:04 2006 From: faltet at xot.carabos.com (faltet at xot.carabos.com) Date: Fri Apr 21 13:56:04 2006 Subject: [Numpy-discussion] numexpr enhancements Message-ID: <20060421205530.GA25020@xot.carabos.com> Hi,

After looking at the numpy performance issues on strided and unaligned data, I decided to have a try at the numexpr package and finally implemented better support for them. As a result, numexpr can now reach a 2x performance improvement for simple expressions, like 'a>2.'.

Along the way, I've added support for boolean expressions (&, | and ~, as in the where() function), a new boolean data type (important to get better performance on boolean expressions) and support for numarray (maintaining compatibility with numpy, of course).

I've called the new package numexpr 0.2 so as not to confuse it with the existing 0.1. Well, let's hope that numexpr can continue making its way towards integration in numpy.

You can fetch this new package at:

http://www.carabos.com/downloads/divers/numexpr-0.2.tar.gz

Finally, let me say that numexpr is a wonderful toy to get your hands dirty ;-) Many thanks to David (and Tim) for this!

Cheers! Francesc
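To illustrate the kind of expression this speeds up (a sketch: evaluate() is assumed to be numexpr's compile-and-run entry point as discussed on this list, and the arrays here are made up):

    import numpy
    from numexpr import evaluate

    a = numpy.arange(1e6)
    b = numpy.arange(1e6) % 4

    # compiled into a single pass over a and b; the boolean intermediates
    # never become full-size temporary arrays
    mask = evaluate("(a > 2.) & ~(b > 1.)")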
From hetland at tamu.edu Fri Apr 21 15:02:12 2006 From: hetland at tamu.edu (Robert Hetland) Date: Fri Apr 21 15:02:12 2006 Subject: [Numpy-discussion] 'append' array method request. Message-ID: I find myself writing things like

x = []; y = []; t = []
for line in open(filename).readlines():
    xstr, ystr, tstr = line.split()
    x.append(float(xstr))
    y.append(float(ystr))
    t.append(dateutil.parser.parse(tstr)) # or something similar
x = asarray(x)
y = asarray(y)
t = asarray(t)

I think it would be nice to be able to create empty arrays, and append the values onto the end as I loop through the file without creating the intermediate list. Is this reasonable? Is there a way to do this with existing methods or functions that I am missing? Is there a better way altogether?

-Rob.

----- Rob Hetland, Assistant Professor Dept of Oceanography, Texas A&M University p: 979-458-0096, f: 979-845-6331 e: hetland at tamu.edu, w: http://pong.tamu.edu

From robert.kern at gmail.com Fri Apr 21 15:13:07 2006 From: robert.kern at gmail.com (Robert Kern) Date: Fri Apr 21 15:13:07 2006 Subject: [Numpy-discussion] Re: 'append' array method request. In-Reply-To: References: Message-ID: Robert Hetland wrote: > > I find myself writing things like > > x = []; y = []; t = [] > for line in open(filename).readlines(): > xstr, ystr, tstr = line.split() > x.append(float(xstr)) > y.append(float(ystr)) > t.append(dateutil.parser.parse(tstr)) # or something similar > x = asarray(x) > y = asarray(y) > t = asarray(t) > > I think it would be nice to be able to create empty arrays, and append > the values onto the end as I loop through the file without creating the > intermediate list. Is this reasonable?

Not in the core array object, no. We can't make the underlying pointer point to something else (because you've just reallocated the whole memory block to add an item to the array) without invalidating all of the views on that array. This is also the reason that numpy arrays can't use the standard library's array module as its storage.

That said:

> Is there a way to do this with > existing methods or functions that I am missing? Is there a better way > altogether?

We've done performance tests before. The fastest way that I've found is to use the stdlib array module to accumulate values (it uses the same preallocation strategy that Python lists use, and you can't create views from them, so you are always safe) and then create the numpy array using fromstring on that object (stdlib arrays obey the buffer protocol, so they will be treated like strings of binary data). I posted timings one or two or three years ago on one of the scipy lists.

However, lists are fine if you don't need blazing speed/low memory usage.

-- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
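A sketch of the idiom Robert describes (the stdlib array as the growable buffer, one copy into numpy at the end; 'data.txt' and its one-float-per-line layout are made up for illustration):

    import array
    import numpy

    acc = array.array('d')            # doubles; grows amortized, like a list
    for line in open('data.txt'):     # assumed format: one float per line
        acc.append(float(line))

    # stdlib arrays expose their contents as raw binary data
    x = numpy.fromstring(acc.tostring(), dtype=float)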
From ndarray at mac.com Fri Apr 21 15:20:01 2006 From: ndarray at mac.com (Sasha) Date: Fri Apr 21 15:20:01 2006 Subject: [Numpy-discussion] 'append' array method request. In-Reply-To: References: Message-ID: On 4/21/06, Robert Hetland wrote: > [...] > I think it would be nice to be able to create empty arrays, and > append the values onto the end as I loop through the file without > creating the intermediate list. Is this reasonable? Is there a way > to do this with existing methods or functions that I am missing? Is > there a better way altogether?

Numpy arrays cannot grow in-place because there is no way for an array to tell if its data is shared with other arrays.

You can use Python's standard library arrays instead of lists:

>>> from numpy import *
>>> import array as a
>>> x = a.array('i',[])
>>> x.append(1)
>>> x.append(2)
>>> x.append(3)
>>> ndarray(len(x), dtype=int, buffer=x)
array([1, 2, 3])

Note that data is not copied:

>>> ndarray(len(x), dtype=int, buffer=x)[1] = 20
>>> x
array('i', [1, 20, 3])

From charlesr.harris at gmail.com Fri Apr 21 18:50:02 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri Apr 21 18:50:02 2006 Subject: [Numpy-discussion] 'append' array method request. In-Reply-To: References: Message-ID: Hi,

On 4/21/06, Robert Hetland wrote: > > > I find myself writing things like > > x = []; y = []; t = [] > for line in open(filename).readlines(): > xstr, ystr, tstr = line.split() > x.append(float(xstr)) > y.append(float(ystr)) > t.append(dateutil.parser.parse(tstr)) # or something similar > x = asarray(x) > y = asarray(y) > t = asarray(t)

I think you can read the ascii file directly into an array with numeric conversions (fromfile), then just reshape it to have x,y,z columns. For example:

$[charris at E011704 ~]$ cat input.txt
1 2 3
4 5 6
7 8 9

Then after importing numpy into ipython:

In [6]: fromfile('input.txt',sep=' ').reshape(-1,3)
Out[6]:
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

Chuck

From oliphant.travis at ieee.org Fri Apr 21 19:51:07 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Fri Apr 21 19:51:07 2006 Subject: [Numpy-discussion] Re: seterr changes In-Reply-To: <44465DEE.8090703@cox.net> References: <44465DEE.8090703@cox.net> Message-ID: <444999E2.1040009@ieee.org> Tim Hochberg wrote: > > Hi Travis et al, > > I started looking at your seterr changes.

Thank you very much for the help on this. I'm not an expert on threaded code by any means. In fact, as you clearly point out, I don't eat and drink what will work under threaded environments and what won't. Clearly global variables are problematic. That is the problem with the update_use_defaults bit, right? This is the way it was being managed before and I just changed names a bit to use PyThreadState_GetDict for the dictionary (it seems possible to use only from C until Python 2.4).

I say if it only buys 5% on small arrays then it's not worth it, as there are other fish to fry to make up for that 5%, and I agree that tracking down threading problems due to a finagled global variable is sticky. I did not think about the threading issues deeply enough.

> I'm also curious about the seterr interface. It returns > ufunc_values_obj. I wasn't sure how one is supposed to pass that > back into seterr, so I modified seterr to instead return a > dictionary. I also modified it so that the seterr function itself has > no defaults (or rather they're all None). Instead, any unspecified > values are taken from the current error state. Thus > seterr(divide="warn") changes only the divide state, leaving the other > entries alone.

Returning an object is a late-in-the-game idea and should be critiqued. It can be passed to seterr (an attribute check grabs the actual list --- did you want to change it to a dictionary?). Doesn't a small list have faster access than a small dictionary?

I'll look over your commits and comment later if I think of anything...

I'm thrilled with your work.

Best,

-Travis
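Travis's list-versus-dict question is easy to measure directly (an illustrative timing sketch; the mode names mirror the thread, and absolute numbers will vary by machine and Python build):

    import timeit

    setup_list = "modes = ['ignore', 'ignore', 'ignore', 'ignore']"
    setup_dict = ("modes = {'divide': 'ignore', 'over': 'ignore', "
                  "'under': 'ignore', 'invalid': 'ignore'}")

    # index into a 4-element list vs. key lookup in a 4-entry dict
    print timeit.Timer("modes[2]", setup_list).timeit()
    print timeit.Timer("modes['over']", setup_dict).timeit()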
From bitorika at cs.tcd.ie Sat Apr 22 03:18:00 2006 From: bitorika at cs.tcd.ie (bitorika at cs.tcd.ie) Date: Sat Apr 22 03:18:00 2006 Subject: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter In-Reply-To: <4446819D.3030401@astraw.com> References: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> <44467576.1020708@astraw.com> <4446819D.3030401@astraw.com> Message-ID: <35791.134.226.38.190.1145701016.squirrel@webmail.cs.tcd.ie>

>> On 19 Apr 2006, at 18:37, Andrew Straw wrote:

> I think that numpy only accesses the SSE units through ATLAS or other > external library. So, build numpy without ATLAS. But I'm not 100% sure > anymore if there aren't any optimizations that directly use SSE if it's > available.

I've tried getting rid of all atlas, blas and lapack packages in my system and rebuilding numpy to use its own unoptimised lapack_lite, but no luck. Just trying to import numpy with PyImport_ImportModule("numpy") causes the program to crash with just a "Floating point exception" message output.

The program I'm embedding Python in is the NS Network Simulator (http://www.isi.edu/nsnam/ns/). It's a complex C++ beast with its own Object-Tcl interpreter, but it's been working fine with embedded Python except for this numpy crash. I've used Numeric before and it worked fine as well.

I'm lost now regarding what to work on to find a solution; does anyone familiar with numpy internals have any suggestions?

Thanks, Arkaitz
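One quick sanity check that the rebuilt numpy really is free of ATLAS (this assumes the __config__ module that numpy's build machinery generates; the exact output format of that era may differ):

    import numpy
    # prints the blas/lapack/atlas sections recorded when numpy was built;
    # after removing ATLAS these should all read NOT AVAILABLE
    numpy.__config__.show()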
From jordi.bofill at upc.edu Sat Apr 22 09:46:00 2006 From: jordi.bofill at upc.edu (Jordi Bofill) Date: Sat Apr 22 09:46:00 2006 Subject: [Numpy-discussion] Re: Dumping record arrays References: <200603302127.24231.pgmdevlist@mailcan.com> Message-ID: Pierre GM wrote:

> Folks,
> I'd like to dump/pickle some record arrays. The pickling works, the
> unpickling raises a ValueError (on my version of numpy 0.9.6). (cf below).
> Is this already corrected in the svn version?
> Thx
>
> ###########################################################################
> #
>
> x1 = array([21,32,14])
> x2 = array(['my','first','name'])
> x3 = array([3.1, 4.5, 6.2])
> r = rec.fromarrays([x1,x2,x3], names='id, word, number')
>
> r.dump('dumper')
> rb=load('dumper')
> ---------------------------------------------------------------------------
> exceptions.ValueError Traceback (most recent call last)
>
> /home/backtopop/Work/workspace-python/pyflows/src/
>
> /usr/lib64/python2.4/site-packages/numpy/core/numeric.py in load(file)
> 331 if isinstance(file, type("")):
> 332 file = _file(file,"rb")
> --> 333 return _cload(file)
> 334
> 335 # These are all essentially abbreviations
>
> /usr/lib64/python2.4/site-packages/numpy/core/_internal.py in
> _reconstruct(subtype, shape, dtype)
> 251
> 252 def _reconstruct(subtype, shape, dtype):
> --> 253 return ndarray.__new__(subtype, shape, dtype)
> 254
> 255
>
> ValueError: ('data-type with unspecified variable length', <function
> _reconstruct at 0x2aaaafcf1578>, (,
> (0,), 'V'))

I'm a newbie moving from numarray and I also get this error. I tried svn records.py with the same result. Any hope in getting it fixed? The error can be reproduced from the source example:

import numpy.core.records as rec
r=rec.fromrecords([(456,'dbe',1.2),(2,'de',1.3)],names='col1,col2,col3')
import cPickle
print cPickle.loads(cPickle.dumps(r))

---------------------------------------------------------------------------
exceptions.ValueError Traceback (most recent call last)

/home/jordi/temp/

/usr/lib/python2.4/site-packages/numpy/core/_internal.py in _reconstruct(subtype, shape, dtype)
251
252 def _reconstruct(subtype, shape, dtype):
--> 253 return ndarray.__new__(subtype, shape, dtype)
254
255

ValueError: ('data-type with unspecified variable length', , (, (0,), 'V'))

From oliphant.travis at ieee.org Sat Apr 22 10:19:00 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sat Apr 22 10:19:00 2006 Subject: [Numpy-discussion] Re: Dumping record arrays In-Reply-To: References: <200603302127.24231.pgmdevlist@mailcan.com> Message-ID: <444A653A.9020402@ieee.org> Jordi Bofill wrote:

> Pierre GM wrote:
>
>> Folks,
>> I'd like to dump/pickle some record arrays. The pickling works, the
>> unpickling raises a ValueError (on my version of numpy 0.9.6). (cf below).
>> Is this already corrected in the svn version?
>> Thx
>>
> ###########################################################################
>
>> #
>>
>> x1 = array([21,32,14])
>> x2 = array(['my','first','name'])
>> x3 = array([3.1, 4.5, 6.2])
>> r = rec.fromarrays([x1,x2,x3], names='id, word, number')
>>

This is fixed in SVN (but you have to get more than just the SVN records.py script). The needed change is in the __reduce__ method of the array object (which is in C). A re-compile is needed.

NumPy 0.9.8 should be out in a few weeks.

Best,

-Travis

>> r.dump('dumper')
>> rb=load('dumper')
>> ---------------------------------------------------------------------------
>> exceptions.ValueError Traceback (most recent call last)
>>
>> /home/backtopop/Work/workspace-python/pyflows/src/
>>
>> /usr/lib64/python2.4/site-packages/numpy/core/numeric.py in load(file)
>> 331 if isinstance(file, type("")):
>> 332 file = _file(file,"rb")
>> --> 333 return _cload(file)
>> 334
>> 335 # These are all essentially abbreviations
>>
>> /usr/lib64/python2.4/site-packages/numpy/core/_internal.py in
>> _reconstruct(subtype, shape, dtype)
>> 251
>> 252 def _reconstruct(subtype, shape, dtype):
>> --> 253 return ndarray.__new__(subtype, shape, dtype)
>> 254
>> 255
>>
>> ValueError: ('data-type with unspecified variable length', <function
>> _reconstruct at 0x2aaaafcf1578>, (,
>> (0,), 'V'))
>
> I'm a newbie moving from numarray and I also get this error. I tried svn
> records.py with the same result. Any hope in getting it fixed?
> The error can be reproduced from the source example:
>
> import numpy.core.records as rec
> r=rec.fromrecords([(456,'dbe',1.2),(2,'de',1.3)],names='col1,col2,col3')
> import cPickle
> print cPickle.loads(cPickle.dumps(r))
> ---------------------------------------------------------------------------
> exceptions.ValueError Traceback (most recent call last)
>
> /home/jordi/temp/
>
> /usr/lib/python2.4/site-packages/numpy/core/_internal.py in
> _reconstruct(subtype, shape, dtype)
> 251
> 252 def _reconstruct(subtype, shape, dtype):
> --> 253 return ndarray.__new__(subtype, shape, dtype)
> 254
> 255
>
> ValueError: ('data-type with unspecified variable length', <function
> _reconstruct at 0xb78fce64>, (,
> (0,), 'V'))
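For anyone tracking the fix, the round-trip that the __reduce__ change is meant to restore looks like this (a sketch of the expected post-fix behaviour, not output from the patched build):

    import cPickle
    import numpy.core.records as rec

    r = rec.fromrecords([(456, 'dbe', 1.2), (2, 'de', 1.3)],
                        names='col1,col2,col3')
    r2 = cPickle.loads(cPickle.dumps(r))   # raises the ValueError above before the fix
    print r2.col1                          # the same records back once it works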
From fullung at gmail.com Sat Apr 22 10:53:05 2006 From: fullung at gmail.com (Albert Strasheim) Date: Sat Apr 22 10:53:05 2006 Subject: [Numpy-discussion] Re: seterr changes In-Reply-To: <444999E2.1040009@ieee.org> Message-ID: <005701c66635$82b3a930$0502010a@dsp.sun.ac.za> Hello all

I was just wondering if someone could provide some example code that would cause an error if invalid is set to 'raise'?

I also noticed that seterr returns the old values. Is this really useful? Consider its use in an IPython session:

In [184]: N.geterr()
Out[184]: {'over': 'ignore', 'divide': 'ignore', 'invalid': 'ignore', 'under': 'ignore'}

In [185]: N.seterr(over='raise')
Out[185]: {'over': 'ignore', 'divide': 'ignore', 'invalid': 'ignore', 'under': 'ignore'}

I think the following pattern would make sense, but it seems it doesn't work at present:

old=N.geterr()
N.seterr(over='raise')
# do some calculations that might overflow
N.seterr(old)

This currently causes the following error:

Traceback (most recent call last):
File "", line 1, in ?
File "C:\Python24\Lib\site-packages\numpy\core\numeric.py", line 426, in seterr
maskvalue = ((_errdict[divide] << SHIFT_DIVIDEBYZERO) +
TypeError: dict objects are unhashable

Is this intended? I think it would be useful to be able to restore all the error states in one go.

Regards, Albert

> -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy- > discussion-admin at lists.sourceforge.net] On Behalf Of Travis Oliphant > Sent: 22 April 2006 04:50 > To: tim.hochberg at ieee.org; numpy-discussion > Subject: [Numpy-discussion] Re: seterr changes > > Tim Hochberg wrote: > > > > Hi Travis et al, > > > > I started looking at your seterr changes. > Thank you very much for the help on this. I'm not an expert on threaded > code by any means. In fact, as you clearly point out, I don't eat and > drink what will work under threaded environments and what won't. Clearly > global variables are problematic. That is the problem with the > update_use_defaults bit, right? This is the way it was being managed > before and I just changed names a bit to use PyThreadState_GetDict for > the dictionary (it seems possible to use only from C until Python 2.4). > > I say if it only buys 5% on small arrays then it's not worth it, as there > are other fish to fry to make up for that 5%, and I agree that tracking > down threading problems due to a finagled global variable is sticky. I > did not think about the threading issues deeply enough. > > > I'm also curious about the seterr interface. It returns > > ufunc_values_obj. I wasn't sure how one is supposed to pass that > > back into seterr, so I modified seterr to instead return a > > dictionary. I also modified it so that the seterr function itself has > > no defaults (or rather they're all None). Instead, any unspecified > > values are taken from the current error state. Thus > > seterr(divide="warn") changes only the divide state, leaving the other > > entries alone. > Returning an object is a late-in-the-game idea and should be critiqued. > It can be passed to seterr (an attribute check grabs the actual list --- > did you want to change it to a dictionary?). Doesn't a small list have > faster access than a small dictionary? > > I'll look over your commits and comment later if I think of anything... > > I'm thrilled with your work. > > Best, > > -Travis
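The pattern Albert is after does work once the returned dict is splatted back in; Rob Hooft's reply just below gives the key detail. Collected as a sketch (N is numpy, as in Albert's session):

    import numpy as N

    old = N.seterr(over='raise')   # returns the previous modes as a dict
    try:
        pass                       # calculations that might overflow go here
    finally:
        N.seterr(**old)            # splat the dict back: all four states restored in one go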
From rob at hooft.net Sat Apr 22 11:48:01 2006 From: rob at hooft.net (Rob Hooft) Date: Sat Apr 22 11:48:01 2006 Subject: [Numpy-discussion] Re: seterr changes In-Reply-To: <005701c66635$82b3a930$0502010a@dsp.sun.ac.za> References: <005701c66635$82b3a930$0502010a@dsp.sun.ac.za> Message-ID: <444A7A35.5090906@hooft.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1

Albert Strasheim wrote:
| old=N.geterr()
| N.seterr(over='raise')
| # do some calculations that might overflow
| N.seterr(old)

You should try (but I didn't):

N.seterr(**old)

Rob - -- Rob W.W. Hooft || rob at hooft.net || http://www.hooft.net/people/rob/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFESno1H7J/Cv8rb3QRAppZAKCGBRSvL++wg3wFer6odmG8sxyrFwCfQ1nq p0aVr4r+Z1ZfRBGQgir+KX0= =eZMa -----END PGP SIGNATURE-----

From strawman at astraw.com Sat Apr 22 12:13:02 2006 From: strawman at astraw.com (Andrew Straw) Date: Sat Apr 22 12:13:02 2006 Subject: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter In-Reply-To: <35791.134.226.38.190.1145701016.squirrel@webmail.cs.tcd.ie> References: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> <44467576.1020708@astraw.com> <4446819D.3030401@astraw.com> <35791.134.226.38.190.1145701016.squirrel@webmail.cs.tcd.ie> Message-ID: <444A8026.3030307@astraw.com> bitorika at cs.tcd.ie wrote:

>>>On 19 Apr 2006, at 18:37, Andrew Straw wrote:
>>
>>I think that numpy only accesses the SSE units through ATLAS or other
>>external library. So, build numpy without ATLAS. But I'm not 100% sure
>>anymore if there aren't any optimizations that directly use SSE if it's
>>available.
>
>I've tried getting rid of all atlas, blas and lapack packages in my system
>and rebuilding numpy to use its own unoptimised lapack_lite, but no luck.
>Just trying to import numpy with PyImport_ImportModule("numpy") causes the
>program to crash with just a "Floating point exception" message output.
>
>The program I'm embedding Python in is the NS Network Simulator
>(http://www.isi.edu/nsnam/ns/). It's a complex C++ beast with its own
>Object-Tcl interpreter, but it's been working fine with embedded Python
>except for this numpy crash. I've used Numeric before and it worked fine
>as well.
>
>I'm lost now regarding what to work on to find a solution; does anyone
>familiar with numpy internals have any suggestions?
>

OK, going back to your original gdb traceback, it looks like the SIGFPE originated in the following function in umathmodule.c:

static double
pinf_init(void)
{
    double mul = 1e10;
    double tmp = 0.0;
    double pinf;

    pinf = mul;
    for (;;) {
        pinf *= mul;
        if (pinf == tmp) break;
        tmp = pinf;
    }
    return pinf;
}

If you try just that function (instead of the whole Python interpreter and numpy module) and still get the exception, you'll be that much closer to narrowing down the issue.
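For reference, Andrew's C loop transliterated to Python (an illustrative sketch of what numpy's import-time infinity probe computes; under a host that unmasks the overflow trap, the multiply is where the SIGFPE would fire):

    # grow until the value saturates at IEEE infinity
    mul = 1e10
    pinf = mul
    tmp = 0.0
    while pinf != tmp:
        tmp = pinf
        pinf *= mul    # overflows to inf after roughly 30 iterations
    print pinf         # inf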
From robert.kern at gmail.com Sat Apr 22 18:58:01 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sat Apr 22 18:58:01 2006 Subject: [Numpy-discussion] Re: Backporting numpy to Python 2.2 In-Reply-To: <20060419103554.4ac1df4a.twegener@radlogic.com.au> References: <20060419103554.4ac1df4a.twegener@radlogic.com.au> Message-ID: Tim Wegener wrote:

> Hi,
>
> I am attempting to backport numpy-0.9.6 to be compatible with python 2.2. (Some of our machines run python 2.2 as part of Red Hat 9 and Red Hat 7.3 and it is hazardous to alter the standard setup.) I was able to change most of the 2.3-isms to be 2.2 compatible (see the attached patch). However I had problems compiling the following c module:

I was hoping that Travis would jump in and talk about the reasons that he targeted 2.3 and not 2.2.

I don't think that it's going to be feasible to target 2.2 at this point. If nothing else, I've long since forgotten how to write 2.2 code. More seriously, doing an overhaul of all of the C code in numpy to use the older API is just going to make the code clumsier and more difficult to maintain.

I think it is going to be much easier for you to install a second, more recent Python interpreter on your machines than it will be for you to maintain a 2.2-compatible branch. Linux installations, even Red Hat, usually handle having multiple versions of Python installed side by side just fine. You don't have to remove Python 2.2.

-- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco

From zpincus at stanford.edu Sat Apr 22 20:48:00 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Sat Apr 22 20:48:00 2006 Subject: [Numpy-discussion] Matrix and var method Message-ID: <83468068-4E41-45A1-9753-90CEADF34722@stanford.edu> Hi folks,

I just ran across an error with numpy.matrix types: the var() method does not seem to work! (I've tried all sorts of permutations on the matrix shape, and the axis parameter to var; nothing works.)

Perhaps this has already been fixed -- I haven't updated my numpy in a week or so. If so, sorry; if not, I hope this helps.

Zach

In [1]: import numpy
In [2]: numpy.__version__
Out[2]: '0.9.7.2335'
In [3]: numpy.matrix([[1,2,3], [1,2,3]]).var()
---------------------------------------------------------------------------
exceptions.ValueError Traceback (most recent call last)

/Users/zpincus/

/Library/Frameworks/Python.framework/Versions/2.4/lib/python2.4/site-packages/numpy/core/defmatrix.py in __mul__(self, other)
147 if isinstance(other, N.ndarray) or N.isscalar(other) or \
148 not hasattr(other, '__rmul__'):
--> 149 return N.dot(self, other)
150 else:
151 return NotImplemented

ValueError: matrices are not aligned

In [4]: numpy.array([[1,2,3], [1,2,3]]).var()
Out[4]: 0.80000000000000004
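A workaround consistent with Zach's report while the defmatrix bug is open (a sketch: the reduction trips over matrix __mul__, so compute on a plain ndarray view instead):

    import numpy

    m = numpy.matrix([[1, 2, 3], [1, 2, 3]])
    v = numpy.asarray(m).var()   # 0.8, matching the plain-array result above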
From a.mcmorland at auckland.ac.nz Sun Apr 23 17:40:02 2006 From: a.mcmorland at auckland.ac.nz (Angus McMorland) Date: Sun Apr 23 17:40:02 2006 Subject: [Numpy-discussion] Error installing on amd64 Debian-unstable Message-ID: <444C1E24.8030603@auckland.ac.nz> I had no troubles installing numpy and scipy on my 32-bit laptop, but cannot get numpy to install on my amd64 debian desktop. I've pulled in the latest svn versions, then run:

$ python setup.py install

Installation seems to run okay (no error messages), but the following happens:

In [1]: import numpy
import core -> failed: /usr/lib/python2.3/site-packages/numpy/core/_sort.so: undefined symbol: PyArray_CompareUCS4
import lib -> failed: module compiled against version 90703 of C-API but this version of numpy is 90704
import linalg -> failed: module compiled against version 90703 of C-API but this version of numpy is 90704
import dft -> failed: cannot import name asarray
import random -> failed: 'module' object has no attribute 'dtype'
---------------------------------------------------------------------------
exceptions.ImportError Traceback (most recent call last)

/home/amcmorl/

/usr/lib/python2.3/site-packages/numpy/__init__.py
47 return NumpyTest().test(level, verbosity)
48
---> 49 import add_newdocs
50
51 if __doc__ is not None:

/usr/lib/python2.3/site-packages/numpy/add_newdocs.py
----> 2 from lib import add_newdoc
3
4 add_newdoc('numpy.core','dtype',
5 [('fields', "Fields of the data-typedescr if any."),
6 ('alignment', "Needed alignment for this data-type"),

ImportError: cannot import name add_newdoc

Can anyone suggest what I'm doing wrong?

Cheers, A.

-- Angus McMorland email a.mcmorland at auckland.ac.nz mobile +64-21-155-4906 PhD Student, Neurophysiology / Multiphoton & Confocal Imaging Physiology, University of Auckland phone +64-9-3737-599 x89707 Armourer, Auckland University Fencing Secretary, Fencing North Inc.

From robert.kern at gmail.com Sun Apr 23 17:55:08 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 23 17:55:08 2006 Subject: [Numpy-discussion] Re: Error installing on amd64 Debian-unstable In-Reply-To: <444C1E24.8030603@auckland.ac.nz> References: <444C1E24.8030603@auckland.ac.nz> Message-ID: Angus McMorland wrote:

> I had no troubles installing numpy and scipy on my 32-bit laptop, but
> cannot get numpy to install on my amd64 debian desktop. I've pulled in
> the latest svn versions, then run:
>
> $ python setup.py install
>
> Installation seems to run okay (no error messages), but the following
> happens:
>
> In [1]: import numpy
> import core -> failed:
> /usr/lib/python2.3/site-packages/numpy/core/_sort.so: undefined symbol:
> PyArray_CompareUCS4
> import lib -> failed: module compiled against version 90703 of C-API but
> this version of numpy is 90704

Please delete the build/ directory and the installed numpy package and rebuild.
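Concretely, that amounts to something like the following (an illustrative sequence; the site-packages prefix is taken from the traceback above, so adjust to your setup):

$ rm -rf build
$ rm -rf /usr/lib/python2.3/site-packages/numpy
$ python setup.py install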
If the problem persists, please let us know. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Sun Apr 23 17:58:22 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 23 17:58:22 2006 Subject: [Numpy-discussion] Changing the Trac authentication Message-ID: <444C20E5.7090309@gmail.com> I will be changing the Trac authentication over the next hour or so. I will be installing the AccountManagerPlugin to allow users to create accounts for themselves without needing to have SVN write access. Anonymous users will not be able to edit the Wikis or tickets. Non-developer, but registered users will be able to do so with some restrictions, notably not being able to resolve tickets. Developers who currently have accounts will have the same username/password as before. If you have problems using the Trac sites before I announce that I am done, please wait until I am finished. If there are still problems, please let me know and I will try to fix them as soon as possible. Thank you for your patience. Hopefully, this change will resolve the spam problem. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Sun Apr 23 18:12:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sun Apr 23 18:12:02 2006 Subject: [Numpy-discussion] Re: Changing the Trac authentication In-Reply-To: <444C20E5.7090309@gmail.com> References: <444C20E5.7090309@gmail.com> Message-ID: <444C25A9.8080701@gmail.com> Robert Kern wrote: > I will be changing the Trac authentication over the next hour or so. Never mind. I'll have to do it tomorrow when I get to the office. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From rmuller at sandia.gov Mon Apr 24 09:12:13 2006 From: rmuller at sandia.gov (Rick Muller) Date: Mon Apr 24 09:12:13 2006 Subject: [Numpy-discussion] Problems building numpy Message-ID: <02801766-7F45-48EE-AD4A-7B4B0590C9AC@sandia.gov> Numpy really builds nicely now, and I appreciate all of the hard work that people have put into portability of this code. That being said, I just had my first system where Numpy failed to build. It's on a redhat 7.3 (yes, we have a 7.3 box. I didn't believe it either. not my decision.) and I get the following error when trying to run Numpy: Python 2.4.3 (#1, Apr 24 2006, 09:54:46) [GCC 3.2.3 20030502 (Red Hat Linux 3.2.3-42)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> from numpy import array import linalg -> failed: /usr/local/lib/python2.4/site-packages/numpy/ linalg/lapack_lite.so: undefined symbol: s_wsfe If this is easy to fix, I'd prefer to fix it. However, if the numpy developers have better things to do than to support a 10-year-old operating system (and I suspect that they do), I'm cool with that. 
Rick Rick Muller rmuller at sandia.gov From arkaitz.bitorika at gmail.com Mon Apr 24 09:24:03 2006 From: arkaitz.bitorika at gmail.com (Arkaitz Bitorika) Date: Mon Apr 24 09:24:03 2006 Subject: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter In-Reply-To: <444A8026.3030307@astraw.com> References: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> <44467576.1020708@astraw.com> <4446819D.3030401@astraw.com> <35791.134.226.38.190.1145701016.squirrel@webmail.cs.tcd.ie> <444A8026.3030307@astraw.com> Message-ID: Andrew, I've verified that the function causes the exception when embedded in the program but not when used from a simple C program with just a main () function. The successful version iterates 31 times over the for loop while the crashing one fails the 30th time that it does "pinf *= mul". Now we know exactly where the crash is, but no idea how to fix it ;). It doesn't look it should be related to SSE2 flags, it's just doing a big multiplication, but I don't know enough about low level C and floating point operations to understand why it may be throwing the exception there. Any idea how I could avoid that function crashing? Thanks, Arkaitz On 22 Apr 2006, at 20:12, Andrew Straw wrote: > OK, going back to your original gdb traceback, it looks like the > SIGFPE > originated in the following funtion in umathmodule.c: > > static double > pinf_init(void) > { > double mul = 1e10; > double tmp = 0.0; > double pinf; > > pinf = mul; > for (;;) { > pinf *= mul; > if (pinf == tmp) break; > tmp = pinf; > } > return pinf; > } > > If you try just that function (instead of the whole Python interpreter > and numpy module) and still get the exception, you'll be that much > closer to narrowing down the issue. From robert.kern at gmail.com Mon Apr 24 09:53:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon Apr 24 09:53:02 2006 Subject: [Numpy-discussion] Re: Problems building numpy In-Reply-To: <02801766-7F45-48EE-AD4A-7B4B0590C9AC@sandia.gov> References: <02801766-7F45-48EE-AD4A-7B4B0590C9AC@sandia.gov> Message-ID: Rick Muller wrote: > Numpy really builds nicely now, and I appreciate all of the hard work > that people have put into portability of this code. > > That being said, I just had my first system where Numpy failed to > build. It's on a redhat 7.3 (yes, we have a 7.3 box. I didn't believe > it either. not my decision.) and I get the following error when trying > to run Numpy: > > Python 2.4.3 (#1, Apr 24 2006, 09:54:46) > [GCC 3.2.3 20030502 (Red Hat Linux 3.2.3-42)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> from numpy import array > import linalg -> failed: /usr/local/lib/python2.4/site-packages/numpy/ > linalg/lapack_lite.so: undefined symbol: s_wsfe > > If this is easy to fix, I'd prefer to fix it. However, if the numpy > developers have better things to do than to support a 10-year-old > operating system (and I suspect that they do), I'm cool with that. This usually means that you are not linking in the g2c library: http://www.scipy.org/FAQ#head-26562f0a9e046b53eae17de300fc06408f9c91a8 -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From ndarray at mac.com Mon Apr 24 10:07:06 2006 From: ndarray at mac.com (Sasha) Date: Mon Apr 24 10:07:06 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). 
Message-ID: I was looking at ticket 76: http://projects.scipy.org/scipy/numpy/ticket/76 At first, I concluded that the ticket was valid and that >>> a = zeros([5,2]) >>> a[:] = arange(5) should raise an error as it did in Numeric. However, once I started looking at the code, I've realized that numpy supports more flexible broadcasting rules than Numeric. For example: >>> x = zeros([10]) >>> x[:] = 1,2 >>> x array([1, 2, 1, 2, 1, 2, 1, 2, 1, 2]) That would be an error in Numeric. Given that the above is valid, the result in Ticket 76 actually makes sense. I believe it is time to have some discussion about the future of broadcasting rules in numpy. Can anyone provide a summary of the status quo? From oliphant.travis at ieee.org Mon Apr 24 10:43:05 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Apr 24 10:43:05 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: Message-ID: <444D0DF7.2060307@ieee.org> Sasha wrote: > I was looking at ticket 76: > > http://projects.scipy.org/scipy/numpy/ticket/76 > > At first, I concluded that the ticket was valid and that > > >>>> a = zeros([5,2]) >>>> a[:] = arange(5) >>>> > > should raise an error as it did in Numeric. However, once I started > looking at the code, I've realized that numpy supports more flexible > broadcasting rules than Numeric. > This really isn't in the category of "broadcasting" as I see it. My understanding is that broadcasting refers to operations involving more than one array on the input side. It's really just a "universal function" concept. A copying operation is not handled using the same rules. In this case, for example, Numeric used to raise an error but in NumPy the array will be copied as many times as possible into the array. I don't believe ticket #76 is actually an error. This behavior could be changed if somebody wants to write the code to change it but only until version 1.0. It would be very difficult to change the other broadcasting behavior which was inherited from Numeric, however. The only possibility I see is adding new useful functionality where Numeric used to raise an error. -Travis From zpincus at stanford.edu Mon Apr 24 10:57:04 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Mon Apr 24 10:57:04 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <444D0DF7.2060307@ieee.org> References: <444D0DF7.2060307@ieee.org> Message-ID: <4AB1DE92-E877-4E22-83AB-69DDBB32FB25@stanford.edu> > It would be very difficult to change the other broadcasting > behavior which was inherited from Numeric, however. The only > possibility I see is adding new useful functionality where Numeric > used to raise an error. Well, there is one case that I run into all of the time where the broadcasting rules seem a bit constraining: In [1]: import numpy In [2]: numpy.__version__ '0.9.7.2335' In [3]: a = numpy.ones([50, 100]) In [4]: means = a.mean(axis = 1) In [5]: print a.shape, means.shape (50, 100) (50,) In [5]: a / means ValueError: index objects are not broadcastable to a single shape In [6]: (a.transpose() / means).transpose() #this works It's obvious why this doesn't work due to the broadcasting rules, but it also seems (to me, in this case at least) obvious what I am trying to do. I don't think I'm suggesting that the broadcasting rules be changed to allow matching-from-the-right in the general case, since that seems likely to make the broadcasting rules even more difficult to grok. But there do seem to be a lot of (....transpose () ... 
).transpose() bits in my code. Is there anything to be done here? I presume not, but I just wanted to mention it. Zach From oliphant.travis at ieee.org Mon Apr 24 11:25:06 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Apr 24 11:25:06 2006 Subject: ***[Possible UCE]*** Re: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <4AB1DE92-E877-4E22-83AB-69DDBB32FB25@stanford.edu> References: <444D0DF7.2060307@ieee.org> <4AB1DE92-E877-4E22-83AB-69DDBB32FB25@stanford.edu> Message-ID: <444D17E6.1070104@ieee.org> Zachary Pincus wrote: >> It would be very difficult to change the other broadcasting behavior >> which was inherited from Numeric, however. The only possibility I >> see is adding new useful functionality where Numeric used to raise an >> error. > > Well, there is one case that I run into all of the time where the > broadcasting rules seem a bit constraining: > > In [1]: import numpy > In [2]: numpy.__version__ > '0.9.7.2335' > In [3]: a = numpy.ones([50, 100]) > In [4]: means = a.mean(axis = 1) > In [5]: print a.shape, means.shape > (50, 100) (50,) > In [5]: a / means > ValueError: index objects are not broadcastable to a single shape > In [6]: (a.transpose() / means).transpose() > #this works > > It's obvious why this doesn't work due to the broadcasting rules, but > it also seems (to me, in this case at least) obvious what I am trying > to do. I don't think I'm suggesting that the broadcasting rules be > changed to allow matching-from-the-right in the general case, since > that seems likely to make the broadcasting rules even more difficult > to grok. But there do seem to be a lot of (....transpose() ... > ).transpose() bits in my code. > > Is there anything to be done here? I presume not, but I just wanted to > mention it. Yes, just be more explicit about which end to tack extra dimensions onto (the automatic extension always assumes pre-pending...) a / means[:,newaxis] is the suggested spelling... -Travis From ndarray at mac.com Mon Apr 24 11:30:05 2006 From: ndarray at mac.com (Sasha) Date: Mon Apr 24 11:30:05 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <4AB1DE92-E877-4E22-83AB-69DDBB32FB25@stanford.edu> References: <444D0DF7.2060307@ieee.org> <4AB1DE92-E877-4E22-83AB-69DDBB32FB25@stanford.edu> Message-ID: On 4/24/06, Zachary Pincus wrote: > [...] > In [5]: print a.shape, means.shape > (50, 100) (50,) > In [5]: a / means > ValueError: index objects are not broadcastable to a single shape > In [6]: (a.transpose() / means).transpose() > #this works This works too: >>> x = a / means[:,newaxis] no .transpose() :-). From ndarray at mac.com Mon Apr 24 11:49:04 2006 From: ndarray at mac.com (Sasha) Date: Mon Apr 24 11:49:04 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <444D0DF7.2060307@ieee.org> References: <444D0DF7.2060307@ieee.org> Message-ID: On 4/24/06, Travis Oliphant wrote: > [...] > A copying operation is not handled using the same rules. In this case, > for example, Numeric used to raise an error but in NumPy the array will > be copied as many times as possible into the array. I don't believe > ticket #76 is actually an error. > I disagree on the terminology. In my view broadcasting means repeating the values of the array to fit into a different shape no matter what dictates the new shape an operand or the receiver. IMHO the following is slightly confusing: >>> a = zeros([5,2]) >>> a[...] += arange(5) Traceback (most recent call last): File "", line 1, in ? 
ValueError: shape mismatch: objects cannot be broadcast to a single shape but >>> a[...] = arange(5) is ok. > This behavior could be changed if somebody wants to write the code to > change it but only until version 1.0. It would be very difficult to > change the other broadcasting behavior which was inherited from Numeric, > however. The only possibility I see is adding new useful functionality > where Numeric used to raise an error. In this category, I would suggest to allow broadcasting to any multiple of the dimension even if the dimension is not 1. I don't see what makes 1 so special. >>> x = zeros(4) >>> x+(1,2) Traceback (most recent call last): File "", line 1, in ? ValueError: shape mismatch: objects cannot be broadcast to a single shape >>> x+(1,) array([1, 1, 1, 1]) I suggest that we make ufunc sonsistent with slice assignment. Currently: >>> x[:]=1,1 >>> x[:]=1,1,1 Traceback (most recent call last): File "", line 1, in ? ValueError: number of elements in destination must be integer multiple of number of elements in source From cookedm at physics.mcmaster.ca Mon Apr 24 13:13:09 2006 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Mon Apr 24 13:13:09 2006 Subject: [Numpy-discussion] numexpr enhancements In-Reply-To: <20060421205530.GA25020@xot.carabos.com> (faltet@xot.carabos.com's message of "Fri, 21 Apr 2006 20:55:30 +0000") References: <20060421205530.GA25020@xot.carabos.com> Message-ID: faltet at xot.carabos.com writes: > Hi, > > After looking at the numpy performance issues on strided and unaligned > data, I decided to have a try at the numexpr package and finally > implemented better suport for them. As a result, numexpr can reach now > a 2x of performance improvement for simple expressions, like 'a>2.'. > > In the way, I've added support for boolean expressions (&, | and ~, as > in the where() function), a new boolean data type (important to get > better performance on boolean expressions) and support for numarray > (maintaining the compatibility with numpy, of course). > > I've called the new package numexpr 0.2 to not confuse it with existing > 0.1. Well, let's hope that numexpr can continue making its way towards > integration in numpy. > > You can fetch this new package at: > > http://www.carabos.com/downloads/divers/numexpr-0.2.tar.gz > > Finally, let me say that numexpr is a wonderful toy to get your hands > dirty ;-) Many thanks to David (and Tim) for this! Unfortunately, real life (damn Ph.D.! :-) has gotten in my way, so I'm not going to be able to look at this for a while. But I'll add it to my list. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From cookedm at physics.mcmaster.ca Mon Apr 24 13:18:05 2006 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Mon Apr 24 13:18:05 2006 Subject: [Numpy-discussion] announce: pyjit, a little jit for creating numpy ufuncs In-Reply-To: <20060421162336.42285837.simon@arrowtheory.com> (Simon Burton's message of "Fri, 21 Apr 2006 16:23:36 +1000") References: <20060421162336.42285837.simon@arrowtheory.com> Message-ID: Simon Burton writes: > Hi, > > Inspired by numexpr, pypy and llvm, i've built a simple > JIT for creating numpy "ufuncs" (they are not yet real ufuncs). > It uses llvm[1] as the backend machine code generator. Cool! I had a look at LLVM, but I wanted something to go into SciPy, and that was too heavy a dependence. 
However, I could see doing more stuff with this than I can easily with numexpr. > The main things it can do are: > > *) parse simple python code (function def's) > *) generate SSA assembly code for llvm > *) build ufunc code for applying to numpy array's > > When I say simple I mean it: > > def calc(a,b): > c = (a+b)/2.0 > return c > > No control flow or type inference has been implemented. > > As with numexpr, significant speedups are possible. > > I'm putting this announce here to see what the other numpy'ers think. > > $ svn co http://rubis.rsise.anu.edu.au/local/repos/elefant/pyjit > > [1] http://llvm.org/ How do the speedups compare with numexpr? Are there any lessons you learned from this that could apply to numexpr? Could we have a common frontend for numexpr/pyjit, and a different backend for each? Then each wouldn't have to reinvent the wheel in parsing (the same thought goes with weave, too...) I don't have much time to look at it (real life sucking my time :-(), but I'll have a look when I do have the time. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From oliphant.travis at ieee.org Mon Apr 24 14:22:02 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Apr 24 14:22:02 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <444D0DF7.2060307@ieee.org> Message-ID: <444D4143.4020204@ieee.org> Sasha wrote: > On 4/24/06, Travis Oliphant wrote: > >> [...] >> A copying operation is not handled using the same rules. In this case, >> for example, Numeric used to raise an error but in NumPy the array will >> be copied as many times as possible into the array. I don't believe >> ticket #76 is actually an error. >> >> > I disagree on the terminology. In my view broadcasting means > repeating the values of the array to fit into a different shape no > matter what dictates the new shape an operand or the receiver. > I can understand that view. But, that's not been the historical use of broadcasting which has always been only a "ufunc" concept. Code to implement a broader view of broadcasting across more operations if people decide that is appropriate could be done (carefully), but I don't have time to write it. -Travis From oliphant.travis at ieee.org Mon Apr 24 14:25:02 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Apr 24 14:25:02 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <444D0DF7.2060307@ieee.org> Message-ID: <444D41FE.7050904@ieee.org> Sasha wrote: > In this category, I would suggest to allow broadcasting to any > multiple of the dimension even if the dimension is not 1. I don't see > what makes 1 so special. > What's so special about 1 is that the code for it is relatively straightforward and already implemented using strides. Altering the code to allow any multiple of the dimension would be harder and slower. -Travis From oliphant.travis at ieee.org Mon Apr 24 14:30:01 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Apr 24 14:30:01 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <444D0DF7.2060307@ieee.org> Message-ID: <444D4329.9050700@ieee.org> Sasha wrote: >>>> x[:]=1,1 >>>> x[:]=1,1,1 >>>> > Traceback (most recent call last): > File "", line 1, in ? 
> ValueError: number of elements in destination must be integer multiple > of number of elements in source > I think the only reasonable thing to do is to raise an error unless the shapes were compatible like Numeric did and eliminate the multiple copying feature. This would bring the desired consistency. -Travis From strawman at astraw.com Mon Apr 24 14:33:01 2006 From: strawman at astraw.com (Andrew Straw) Date: Mon Apr 24 14:33:01 2006 Subject: [Numpy-discussion] Floating point exception with numpy and embedded python interpreter In-Reply-To: References: <81934aa60604191029h4d8a8d9bl550fa58cc67d3d5e@mail.gmail.com> <44467576.1020708@astraw.com> <4446819D.3030401@astraw.com> <35791.134.226.38.190.1145701016.squirrel@webmail.cs.tcd.ie> <444A8026.3030307@astraw.com> Message-ID: <444D43D0.3040308@astraw.com> This doesn't seem like an issue with numpy. Your test proved that. I'm curious what the outcome is, but I'm afraid there's not much we can do. At this point I think you should write the ns2 people and see what they say. Their program seems to be responsible for twiddling the FPU/SSE flags, so I think the issue is better solved, or at least discussed, by them. Cheers! Andrew Arkaitz Bitorika wrote: > Andrew, > > I've verified that the function causes the exception when embedded in > the program but not when used from a simple C program with just a main > () function. The successful version iterates 31 times over the for > loop while the crashing one fails the 30th time that it does "pinf *= > mul". > > Now we know exactly where the crash is, but no idea how to fix it ;). > It doesn't look it should be related to SSE2 flags, it's just doing a > big multiplication, but I don't know enough about low level C and > floating point operations to understand why it may be throwing the > exception there. Any idea how I could avoid that function crashing? > > Thanks, > Arkaitz > > On 22 Apr 2006, at 20:12, Andrew Straw wrote: > >> OK, going back to your original gdb traceback, it looks like the SIGFPE >> originated in the following funtion in umathmodule.c: >> >> static double >> pinf_init(void) >> { >> double mul = 1e10; >> double tmp = 0.0; >> double pinf; >> >> pinf = mul; >> for (;;) { >> pinf *= mul; >> if (pinf == tmp) break; >> tmp = pinf; >> } >> return pinf; >> } >> >> If you try just that function (instead of the whole Python interpreter >> and numpy module) and still get the exception, you'll be that much >> closer to narrowing down the issue. > From oliphant.travis at ieee.org Mon Apr 24 17:40:04 2006 From: oliphant.travis at ieee.org (Travis E. Oliphant) Date: Mon Apr 24 17:40:04 2006 Subject: [Numpy-discussion] Re: Backporting numpy to Python 2.2 In-Reply-To: <20060419103554.4ac1df4a.twegener@radlogic.com.au> References: <20060419103554.4ac1df4a.twegener@radlogic.com.au> Message-ID: Tim Wegener wrote: > Hi, > > I am attempting to backport numpy-0.9.6 to be compatible with python 2.2. (Some of our machines run python 2.2 as part of Red Hat 9 and Red Hat 7.3 and it is hazardous to alter the standard setup.) I was able to change most of the 2.3-isms to be 2.2 compatible (see the attached patch). However I had problems compiling the following c module: I targeted Python 2.3 because it added some very nice constructs (Python 2.4 added even more but I disciplined myself not to use them). I think it is not impossible to back-port it to Python 2.2 but I agree with Robert that I wonder if it is worth the effort. In this case Python 2.3 added the bool type which is used in NumPy. 
Basically this type would have to be constructed (the code could be grabbed from Python 2.3) in order to be used. The addition of the boolean type is probably the single biggest change that would make back-porting to 2.2 difficult. There may be others as well but they are probably easier to work around... -Travis From robert.kern at gmail.com Mon Apr 24 18:00:01 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon Apr 24 18:00:01 2006 Subject: [Numpy-discussion] Changing the Trac authentication, for real this time! Message-ID: <444D7458.3020402@gmail.com> If you encounter errors accessing the Trac sites for NumPy and SciPy over the next hour or so, please wait until I have announced that I have finished. If things are still broken after that, please let me know and I will try to fix it immediately. The details of the changes were posted to the previous thread "Changing the Trac authentication". Apologies for any disruption and for the noise. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From ndarray at mac.com Mon Apr 24 18:26:07 2006 From: ndarray at mac.com (Sasha) Date: Mon Apr 24 18:26:07 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <444D4329.9050700@ieee.org> References: <444D0DF7.2060307@ieee.org> <444D4329.9050700@ieee.org> Message-ID: On 4/24/06, Travis Oliphant wrote: > Sasha wrote: > >>>> x[:]=1,1 > >>>> x[:]=1,1,1 > >>>> > > Traceback (most recent call last): > > File "", line 1, in ? > > ValueError: number of elements in destination must be integer multiple > > of number of elements in source > > > I think the only reasonable thing to do is to raise an error unless the > shapes were compatible like Numeric did and eliminate the multiple > copying feature. I've attached a patch to the ticket: I don't see why slice assignment cannot reuse the ufunc code. It looks like slice assignment can just be dispatched to a trivial (pass-through) ufunc. This aproach may even prove to be faster because type-aware copying loops can be faster than memmove on popular platforms. From robert.kern at gmail.com Mon Apr 24 19:39:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon Apr 24 19:39:02 2006 Subject: [Numpy-discussion] Re: Changing the Trac authentication, for real this time! In-Reply-To: <444D7458.3020402@gmail.com> References: <444D7458.3020402@gmail.com> Message-ID: <444D8BA2.1080407@gmail.com> I hate computers. It's still not done. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From stephen.walton at csun.edu Mon Apr 24 20:49:03 2006 From: stephen.walton at csun.edu (Stephen Walton) Date: Mon Apr 24 20:49:03 2006 Subject: [Numpy-discussion] Re: Problems building numpy In-Reply-To: References: <02801766-7F45-48EE-AD4A-7B4B0590C9AC@sandia.gov> Message-ID: <444D9C0C.3030006@csun.edu> Robert Kern wrote: >Rick Muller wrote: > >> >> >>That being said, I just had my first system where Numpy failed to >>build. It's on a redhat 7.3 (yes, we have a 7.3 box. I didn't believe >>it either. not my decision.) and I get the following error when trying >>to run Numpy: >> >> >> >This usually means that you are not linking in the g2c library. 
> > On Redhat 7.3, I don't believe there was a g2c library, but an f2c one. So -lf2c is needed at the link step (and f2c needs to be installed). From robert.kern at gmail.com Mon Apr 24 20:54:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon Apr 24 20:54:02 2006 Subject: [Numpy-discussion] Re: Problems building numpy In-Reply-To: <444D9C0C.3030006@csun.edu> References: <02801766-7F45-48EE-AD4A-7B4B0590C9AC@sandia.gov> <444D9C0C.3030006@csun.edu> Message-ID: Stephen Walton wrote: > Robert Kern wrote: > >> Rick Muller wrote: >> >>> That being said, I just had my first system where Numpy failed to >>> build. It's on a redhat 7.3 (yes, we have a 7.3 box. I didn't believe >>> it either. not my decision.) and I get the following error when trying >>> to run Numpy: >> >> This usually means that you are not linking in the g2c library. >> > On Redhat 7.3, I don't believe there was a g2c library, but an f2c one. > So -lf2c is needed at the link step (and f2c needs to be installed). Well, there's libf2c which is a library provided by f2c, a program that converts FORTRAN to C. And then there's libg2c which is provided by g77. They really are different and, I don't think, interchangeable. Note that libg2c will be stuck several ellipses down in the bowels of /usr/lib/gcc/.../.../libg2c.a not up in /usr/lib/. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From stephen.walton at csun.edu Mon Apr 24 21:09:01 2006 From: stephen.walton at csun.edu (Stephen Walton) Date: Mon Apr 24 21:09:01 2006 Subject: [Numpy-discussion] Re: Problems building numpy In-Reply-To: References: <02801766-7F45-48EE-AD4A-7B4B0590C9AC@sandia.gov> <444D9C0C.3030006@csun.edu> Message-ID: <444DA0A5.80902@csun.edu> Robert Kern wrote: >Well, there's libf2c which is a library provided by f2c, a program that converts >FORTRAN to C. And then there's libg2c which is provided by g77. They really are >different > Oh, I knew that. My point was that there were some old Redhat releases (I don't recall if 7.3 is that old, probably not) which didn't include g77, just an f77 shell script which called f2c and cc. From robert.kern at gmail.com Mon Apr 24 21:14:01 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon Apr 24 21:14:01 2006 Subject: [Numpy-discussion] Re: Problems building numpy In-Reply-To: <444DA0A5.80902@csun.edu> References: <02801766-7F45-48EE-AD4A-7B4B0590C9AC@sandia.gov> <444D9C0C.3030006@csun.edu> <444DA0A5.80902@csun.edu> Message-ID: Stephen Walton wrote: > Robert Kern wrote: > >> Well, there's libf2c which is a library provided by f2c, a program >> that converts >> FORTRAN to C. And then there's libg2c which is provided by g77. They >> really are >> different > > Oh, I knew that. My point was that there were some old Redhat releases > (I don't recall if 7.3 is that old, probably not) which didn't include > g77, just an f77 shell script which called f2c and cc. Oy. I'm not sure if even we support that. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From rob at hooft.net Mon Apr 24 21:25:01 2006 From: rob at hooft.net (Rob Hooft) Date: Mon Apr 24 21:25:01 2006 Subject: [Numpy-discussion] Re: Problems building numpy In-Reply-To: <444DA0A5.80902@csun.edu> References: <02801766-7F45-48EE-AD4A-7B4B0590C9AC@sandia.gov> <444D9C0C.3030006@csun.edu> <444DA0A5.80902@csun.edu> Message-ID: <444DA473.2010000@hooft.net> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Stephen Walton wrote: | Robert Kern wrote: | |> Well, there's libf2c which is a library provided by f2c, a program |> that converts |> FORTRAN to C. And then there's libg2c which is provided by g77. They |> really are |> different | | Oh, I knew that. My point was that there were some old Redhat releases | (I don't recall if 7.3 is that old, probably not) which didn't include | g77, just an f77 shell script which called f2c and cc. And in addition, very old versions of g77 (I'm not sure to which RedHat version this age corresponds) used f2c's library unmodified. I think the f2c/cc times (the compiler script was called fcomp?) were a bit older. I moved back to my current job with RedHat 4.x (1997), and I worked with self-compiled g77 already in my previous job.... Rob - -- Rob W.W. Hooft || rob at hooft.net || http://www.hooft.net/people/rob/ -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.2.2 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFETaRzH7J/Cv8rb3QRAqtEAKCsDcj3tO7Gcvgsyj0CaDCu99JLSgCgjgjp sB7u8S0krk5a1G2bYC+h9cQ= =MLOS -----END PGP SIGNATURE----- From oliphant.travis at ieee.org Mon Apr 24 21:31:02 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Apr 24 21:31:02 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <444D0DF7.2060307@ieee.org> <444D4329.9050700@ieee.org> Message-ID: <444DA5D4.4080104@ieee.org> Sasha wrote: > On 4/24/06, Travis Oliphant wrote: > >> Sasha wrote: >> >>>>>> x[:]=1,1 >>>>>> x[:]=1,1,1 >>>>>> >>>>>> >>> Traceback (most recent call last): >>> File "", line 1, in ? >>> ValueError: number of elements in destination must be integer multiple >>> of number of elements in source >>> >>> >> I think the only reasonable thing to do is to raise an error unless the >> shapes were compatible like Numeric did and eliminate the multiple >> copying feature. >> > > I've attached a patch to the ticket: > > > > I don't see why slice assignment cannot reuse the ufunc code. It > looks like slice assignment can just be dispatched to a trivial > (pass-through) ufunc. This aproach may even prove to be faster > because type-aware copying loops can be faster than memmove on popular > platforms. > > It could re-use that code but there are at least two drawbacks to that approach: 1) The overhead of the ufunc for small array copies. 2) The special-casing that would be needed for variable-size arrays (string, unicode, void...) which are not supported by the ufunc machinery. and we've already improved the copying by making them type-aware. Right now copying is handled by the data-type functions (not the ufuncs). Perhaps what should be done instead is to allow for strided copying in the copyswapn function. To fully support record arrays with object components the copy operation for the VOID case needs to be recursive when fields are defined. -Travis From oliphant.travis at ieee.org Mon Apr 24 22:00:02 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Apr 24 22:00:02 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). 
In-Reply-To: References: <444D0DF7.2060307@ieee.org> <444D4329.9050700@ieee.org> Message-ID: <444DACB8.50203@ieee.org> Sasha wrote: > On 4/24/06, Travis Oliphant wrote: > > I've attached a patch to the ticket: > > > I don't think the patch will do your definition of "the right thing" (i.e. mirror broadcasting behavior) in all cases. For example if "a" is 2x3x4x5 and "b" is 2x1x1x5, then a[...] = b will not fill the right sub-space of "a" with the contents of "b". The PyArray_CopyInto gets called in a lot of places. Have you checked all of them to be sure that altering the semantics of copying (which are currently different than broadcasting) will work correctly? I agree that one can demonstrate a slight in-consistency. But, I'd rather have the inconsistency and tell people that copying and assignment is not a broadcasting ufunc, then feign consistency and have it not quite right. -Travis From robert.kern at gmail.com Mon Apr 24 22:22:03 2006 From: robert.kern at gmail.com (Robert Kern) Date: Mon Apr 24 22:22:03 2006 Subject: [Numpy-discussion] Re: [SciPy-dev] Google Summer of Code In-Reply-To: <44476AEA.7080003@decsai.ugr.es> References: <44476AEA.7080003@decsai.ugr.es> Message-ID: <444DB033.4000906@gmail.com> [Cross-posted because this is partially an announcement. Continuing discussion should go to only one list, please.] Antonio Arauzo Azofra wrote: > Google Summer of Code > http://code.google.com/soc/ > > Have you considered participating as a Mentoring organization? Offering > any project about Scipy? I'm not sure which "you" you are referring to here, but yes! Unfortunately, it was a bit late in the process to be applying as a mentoring organization. Google started consolidating mentoring organizations. However, I and several others at Enthought are volunteering to mentor through the PSF. I encourage others on these lists to do the same or to apply as students, whichever is appropriate. We'll be happy to provide SVN workspace for numpy and scipy SoC projects. I've added one fairly general scipy entry to the python.org Wiki page listing project ideas: http://wiki.python.org/moin/SummerOfCode If you have more specific ideas, please add them to the Wiki. Potential mentors: Neal Norwitz is coordinating PSF mentors this year and has asked that those he or Guido does not know personally to give personal references. If you've been active on this list, I'm sure we can play the "Two Degrees of Separation From Guido Game" and get you a reference from someone else here. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From oliphant.travis at ieee.org Mon Apr 24 22:27:02 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Mon Apr 24 22:27:02 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <444DACB8.50203@ieee.org> References: <444D0DF7.2060307@ieee.org> <444D4329.9050700@ieee.org> <444DACB8.50203@ieee.org> Message-ID: <444DB302.30903@ieee.org> Travis Oliphant wrote: > Sasha wrote: >> On 4/24/06, Travis Oliphant wrote: >> I've attached a patch to the ticket: >> >> >> >> > I don't think the patch will do your definition of "the right thing" > (i.e. mirror broadcasting behavior) in all cases. For example if "a" > is 2x3x4x5 and "b" is 2x1x1x5, then a[...] = b will not fill the > right sub-space of "a" with the contents of "b". 
> > > The PyArray_CopyInto gets called in a lot of places. Have you checked > all of them to be sure that altering the semantics of copying (which > are currently different than broadcasting) will work correctly? I > agree that one can demonstrate a slight in-consistency. But, I'd > rather have the inconsistency and tell people that copying and > assignment is not a broadcasting ufunc, then feign consistency and > have it not quite right. > Of course, as I've said I'm not opposed to the consistency. To do it "right", one should use PyArray_MultiIterNew which abstracts the concept of broadcasting into iterators (and uses the broadcastable checking code that's already written --- so you guarantee consistency). I'm not sure what overhead it would bring. But, special cases could be checked-for (scalar, and same-size arrays for example). I'm also thinking that copyswapn should grow stride arguments so that it can be used more generally. -Travis From lroubeyrie at limair.asso.fr Tue Apr 25 00:39:04 2006 From: lroubeyrie at limair.asso.fr (Lionel Roubeyrie) Date: Tue Apr 25 00:39:04 2006 Subject: [Numpy-discussion] equality with masked object Message-ID: <200604250938.48648.lroubeyrie@limair.asso.fr> Hi all, I have a problem with masked_object (and masked_values to) like in this sort example : ########################################### lionel[Donn?es]8>test=array([1,2,3,inf,5]) lionel[Donn?es]9>test = ma.masked_object(test, inf) lionel[Donn?es]10>print test[3], type(test[3]) -- lionel[Donn?es]11>print test.max(), type(test.max()) 5.0 lionel[Donn?es]12>test[3] == test.max() Sortie[12]: array(data = [True], mask = True, fill_value=?) ########################################### Why 5.0 == -- return True? A float is it the same as a masked object? thanks -- Lionel Roubeyrie - lroubeyrie at limair.asso.fr LIMAIR http://www.limair.asso.fr From nicolas.chauvat at logilab.fr Tue Apr 25 03:22:15 2006 From: nicolas.chauvat at logilab.fr (Nicolas Chauvat) Date: Tue Apr 25 03:22:15 2006 Subject: [Numpy-discussion] announce: pyjit, a little jit for creating numpy ufuncs In-Reply-To: References: <20060421162336.42285837.simon@arrowtheory.com> Message-ID: <20060425102134.GI24645@crater.logilab.fr> On Mon, Apr 24, 2006 at 04:17:16PM -0400, David M. Cooke wrote: > Simon Burton writes: > > > Hi, > > > > Inspired by numexpr, pypy and llvm, i've built a simple > > JIT for creating numpy "ufuncs" (they are not yet real ufuncs). > > It uses llvm[1] as the backend machine code generator. > > Cool! I had a look at LLVM, but I wanted something to go into SciPy, > and that was too heavy a dependence. However, I could see doing more > stuff with this than I can easily with numexpr. Hello, People interested in this might also be interested in PyPy's rctypes and the exploratory work done in PyPy to annotate code using arrays. The goal is "write Python code using numeric arrays and other C libs, then ask PyPy to translate it to C while removing the python wrapper of the C libs, then compile". Then you can run the code as python code when developping and compile the all thing from C to assembly when speed matters. Please note it is a goal. We are not there yet. 
But any help will be welcome :) -- Nicolas Chauvat logilab.fr - services en informatique avancée et gestion de connaissances From steffen.loeck at gmx.de Tue Apr 25 04:25:22 2006 From: steffen.loeck at gmx.de (Steffen Loeck) Date: Tue Apr 25 04:25:22 2006 Subject: [Numpy-discussion] vectorize problem Message-ID: <200604251324.42987.steffen.loeck@gmx.de> Hello all, I have a problem using scalar variables in a vectorized function: from numpy import vectorize def f(x): if x>0: return 1 else: return 0 F = vectorize(f) F(1) gives the error message: --------------------------------------------------------------------------- exceptions.AttributeError Traceback (most recent call last) .../function_base.py in __call__(self, *args) 619 620 if self.nout == 1: --> 621 return self.ufunc(*args).astype(self.otypes[0]) 622 else: 623 return tuple([x.astype(c) for x, c in zip(self.ufunc(*args), self.otypes)]) AttributeError: 'int' object has no attribute 'astype' Is there any way to get vectorized functions working with scalars again? Regards Steffen From ndarray at mac.com Tue Apr 25 06:17:13 2006 From: ndarray at mac.com (Sasha) Date: Tue Apr 25 06:17:13 2006 Subject: [Numpy-discussion] equality with masked object In-Reply-To: <200604250938.48648.lroubeyrie@limair.asso.fr> References: <200604250938.48648.lroubeyrie@limair.asso.fr> Message-ID: On 4/25/06, Lionel Roubeyrie wrote: > > Why does 5.0 == -- return True? Is a float the same as a masked object? > thanks It does not. It returns ma.masked: >>> test[3] is ma.masked True You should not access masked data - it makes no sense. The current behavior is historical and I don't really like it. Masked scalars are replaced by the ma.masked singleton in subscript operations to allow the a[i] is masked idiom. In my view it is not worth the trouble, but my suggestion to get rid of that feature was not met with much enthusiasm. From ndarray at mac.com Tue Apr 25 06:59:07 2006 From: ndarray at mac.com (Sasha) Date: Tue Apr 25 06:59:07 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <444DACB8.50203@ieee.org> References: <444D0DF7.2060307@ieee.org> <444D4329.9050700@ieee.org> <444DACB8.50203@ieee.org> Message-ID: On 4/25/06, Travis Oliphant wrote: > Sasha wrote: > > On 4/24/06, Travis Oliphant wrote: > > > > I've attached a patch to the ticket: > > > > > > > I don't think the patch will do your definition of "the right thing" > (i.e. mirror broadcasting behavior) in all cases. For example if "a" is > 2x3x4x5 and "b" is 2x1x1x5, then a[...] = b will not fill the right > sub-space of "a" with the contents of "b". >
From charges at humortadela.com.br Tue Apr 25 07:23:06 2006 From: charges at humortadela.com.br (Humortadela) Date: Tue Apr 25 07:23:06 2006 Subject: [Numpy-discussion] Voce recebeu uma charge humortadela Message-ID: <80a6946d133576735a9bca9dea6ea1c3@humortadela.com.br> An HTML attachment was scrubbed... URL: From charges at humortadela.com.br Tue Apr 25 07:24:03 2006 From: charges at humortadela.com.br (Humortadela) Date: Tue Apr 25 07:24:03 2006 Subject: [Numpy-discussion] Voce recebeu uma charge humortadela Message-ID: <80a6946d133576735a9bca9dea6ea1c3@humortadela.com.br> An HTML attachment was scrubbed... URL: From perry at stsci.edu Tue Apr 25 08:21:02 2006 From: perry at stsci.edu (Perry Greenfield) Date: Tue Apr 25 08:21:02 2006 Subject: [Numpy-discussion] Re: Backporting numpy to Python 2.2 In-Reply-To: References: <20060419103554.4ac1df4a.twegener@radlogic.com.au> Message-ID: <93BC9AD0-A6CA-4128-B0EE-9999F4CE8077@stsci.edu> On Apr 24, 2006, at 8:38 PM, Travis E. Oliphant wrote: > Tim Wegener wrote: >> Hi, I am attempting to backport numpy-0.9.6 to be compatible with >> python 2.2. (Some of our machines run python 2.2 as part of Red >> Hat 9 and Red Hat 7.3 and it is hazardous to alter the standard >> setup.) I was able to change most of the 2.3-isms to be 2.2 >> compatible (see the attached patch). However I had problems >> compiling the following c module: > > I targeted Python 2.3 because it added some very nice constructs > (Python 2.4 added even more but I disciplined myself not to use them). > > I think it is not impossible to back-port it to Python 2.2 but I > agree with Robert that I wonder if it is worth the effort. > > In this case Python 2.3 added the bool type which is used in NumPy. > Basically this type would have to be constructed (the code could be > grabbed from Python 2.3) in order to be used. > > The addition of the boolean type is probably the single biggest > change that would make back-porting to 2.2 difficult. If I recall correctly, True and False were added in one of the 2.2 patch releases (one of those rare new features added in a patch release). Only as constant definitions using 0 and 1, and not the current boolean implementation. So depending on what the current dependencies on booleans are, it may or may not be usable from 2.2.3. But I also wonder if it is worth the effort. I tend to think not. Perry From ndarray at mac.com Tue Apr 25 10:27:10 2006 From: ndarray at mac.com (Sasha) Date: Tue Apr 25 10:27:10 2006 Subject: [Numpy-discussion] Question about __array_struct__ Message-ID: I am trying to add __array_struct__ attribute to R object wrappers in RPy. This is desirable because it eliminates a compile-time dependency on an array module and makes the binary compatible with either Numeric or numpy. R has four types of data: logical, integer, float, and character. The first three map perfectly to Numpy with inter->data simply pointing to an appropriate internal memory area. The character type, however is more problematic. In R character arrays are arrays of variable length strings and therefore similar to Numpy object arrays holding python strings. Obviously, there is no memory area that can be reused. I've tried to allocate new memory in __array_struct__ getter, but this presents a problem: I cannot deallocate that memory in CObject destructor because it is passed to the newly created array which lives long after the interface object is deleted. 
The __array_struct__ mechanism does not seem to allow to cause the new array assume ownership of the data, but even if it did, I do not know what memory allocator is appropriate. The only solution that I can think of is to create a dummy buffer type with the sole purpose of deleting an array of PyObjects and make an instance of that type the "base" of the new array. Can anyone suggest a better approach? From strawman at astraw.com Tue Apr 25 10:52:08 2006 From: strawman at astraw.com (Andrew Straw) Date: Tue Apr 25 10:52:08 2006 Subject: [Numpy-discussion] Question about __array_struct__ In-Reply-To: References: Message-ID: <444E619C.6030802@astraw.com> Sasha wrote: >I cannot deallocate that memory in CObject destructor because it is >passed to the newly created array which lives long after the interface >object is deleted. > Normally, the array that's viewing the data held by the __array_struct__ should keep a reference to the base object alive, thus preventing the issue. If the base object isn't a Python object, you'll have to create some kind of Python type that will ensure the original data is not freed, although this would normally take place via refcounts if the data source was a Python object. > The __array_struct__ mechanism does not seem to >allow to cause the new array assume ownership of the data, but even if >it did, I do not know what memory allocator is appropriate. > >The only solution that I can think of is to create a dummy buffer type >with the sole purpose of deleting an array of PyObjects and make an >instance of that type the "base" of the new array. > > Yes, that's I do. (See http://www.scipy.org/Cookbook/ArrayStruct_and_Pyrex for example.) From fullung at gmail.com Tue Apr 25 14:16:06 2006 From: fullung at gmail.com (Albert Strasheim) Date: Tue Apr 25 14:16:06 2006 Subject: [Numpy-discussion] SWIG wrappers: Inplace arrays Message-ID: <006b01c668ad$68b12ab0$0502010a@dsp.sun.ac.za> Hello all I am using the SWIG Numpy typemaps to wrap some C code. I ran into the following problem when wrapping a function with INPLACE_ARRAY1. In Python, I create the following array: x = array([],dtype='descr->type_num) Given that I created the array with ' ---- Travis Oliphant wrote: > Sasha wrote: > > In this category, I would suggest to allow broadcasting to any > > multiple of the dimension even if the dimension is not 1. I don't see > > what makes 1 so special. > > > What's so special about 1 is that the code for it is relatively > straightforward and already implemented using strides. Altering the > code to allow any multiple of the dimension would be harder and slower. It also does the right thing most of the time and is easy to understand. It's my expectation that oppening up broadcasting will be more effective in masking errors than in enabling useful new behaviour. I think that's my ticket being discussed here. If so, it was motivated by a case that stopped working because the looser broadcasting behaviour was preventing some other broadcasting from taking place. I'm not home right now, so I can't provide details; I'll do that on Thursday. Just keep in mind that it's much easier to keep the broadcasting rules restrictive for now and loosen them up later than to try to tighten them up later if loosening them up turns out to not be a good idea. -tim From tim.hochberg at cox.net Tue Apr 25 14:24:05 2006 From: tim.hochberg at cox.net (tim.hochberg at cox.net) Date: Tue Apr 25 14:24:05 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). 
Message-ID: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> ---- Travis Oliphant wrote: > Sasha wrote: > > In this category, I would suggest to allow broadcasting to any > > multiple of the dimension even if the dimension is not 1. I don't see > > what makes 1 so special. > > > What's so special about 1 is that the code for it is relatively > straightforward and already implemented using strides. Altering the > code to allow any multiple of the dimension would be harder and slower. It also does the right thing most of the time and is easy to understand. It's my expectation that oppening up broadcasting will be more effective in masking errors than in enabling useful new behaviour. I think that's my ticket being discussed here. If so, it was motivated by a case that stopped working because the looser broadcasting behaviour was preventing some other broadcasting from taking place. I'm not home right now, so I can't provide details; I'll do that on Thursday. Just keep in mind that it's much easier to keep the broadcasting rules restrictive for now and loosen them up later than to try to tighten them up later if loosening them up turns out to not be a good idea. -tim From oliphant at ee.byu.edu Tue Apr 25 15:55:04 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 25 15:55:04 2006 Subject: [Numpy-discussion] SWIG wrappers: Inplace arrays In-Reply-To: <006b01c668ad$68b12ab0$0502010a@dsp.sun.ac.za> References: <006b01c668ad$68b12ab0$0502010a@dsp.sun.ac.za> Message-ID: <444EA88B.4050704@ee.byu.edu> Albert Strasheim wrote: >Hello all > >I am using the SWIG Numpy typemaps to wrap some C code. I ran into the >following problem when wrapping a function with INPLACE_ARRAY1. > >In Python, I create the following array: > >x = array([],dtype=' >When this is passed to the C function expecting an int*, it goes via >obj_to_array_no_conversion in numpy.i where a direct comparison of the >typecodes is done, at which point a TypeError is raised. > >In this case: > >desired type = int [typecode 5] >actual type = long [typecode 7] > >The typecode is obtained as follows: > >#define array_type(a) (int)(((PyArrayObject *)a)->descr->type_num) > >Given that I created the array with 'int instead of long. Why isn't this happening? > > Actually there is ambiguity i4 can be either int or long. If you want to guarantee an int-type then use dtype=intc). >Assuming the is a good reason for type_num being what it is, I think >obj_to_array_no_conversion needs to be slightly cleverer about the >conversions it allows. Is there any way to figure out that int and long are >actually identical (at least on my system) using the Numpy C API? Any other >suggestions or comments for solving this problem? > > > Yes. You can use one of PyArray_EquivTypes(PyArray_Descr *dtype1, PyArray_Descr *dtype2) PyArray_EquivTypenums(int typenum1, int typenum2) PyArray_EquivArrTypes(PyObject *array1, PyObject *array2) These return TRUE (non-zero) if the two type representations are equivalent. -Travis From oliphant at ee.byu.edu Tue Apr 25 16:07:05 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Tue Apr 25 16:07:05 2006 Subject: [Numpy-discussion] SWIG wrappers: Inplace arrays In-Reply-To: <006b01c668ad$68b12ab0$0502010a@dsp.sun.ac.za> References: <006b01c668ad$68b12ab0$0502010a@dsp.sun.ac.za> Message-ID: <444EAB81.3070001@ee.byu.edu> Albert Strasheim wrote: >Hello all > >I am using the SWIG Numpy typemaps to wrap some C code. I ran into the >following problem when wrapping a function with INPLACE_ARRAY1. 
> >In Python, I create the following array: > >x = array([],dtype=' >When this is passed to the C function expecting an int*, it goes via >obj_to_array_no_conversion in numpy.i where a direct comparison of the >typecodes is done, at which point a TypeError is raised. > >In this case: > >desired type = int [typecode 5] >actual type = long [typecode 7] > >The typecode is obtained as follows: > >#define array_type(a) (int)(((PyArrayObject *)a)->descr->type_num) > >Given that I created the array with 'int instead of long. Why isn't this happening? > >Assuming the is a good reason for type_num being what it is, I think >obj_to_array_no_conversion needs to be slightly cleverer about the >conversions it allows. Is there any way to figure out that int and long are >actually identical (at least on my system) using the Numpy C API? Any other >suggestions or comments for solving this problem? > > > Here is the relevant new numpy.i code (just checked in...) PyArrayObject* obj_to_array_no_conversion(PyObject* input, int typecode) { PyArrayObject* ary = NULL; if (is_array(input) && (typecode == PyArray_NOTYPE || PyArray_EquivTypenums(array_type(input), typecode)) { ary = (PyArrayObject*) input; } else if is_array(input) { char* desired_type = typecode_string(typecode); char* actual_type = typecode_string(array_type(input)); PyErr_Format(PyExc_TypeError, "Array of type '%s' required. Array of type '%s' given", desired_type, actual_type); ary = NULL; } else { char * desired_type = typecode_string(typecode); char * actual_type = pytype_string(input); PyErr_Format(PyExc_TypeError, "Array of type '%s' required. A %s was given", desired_type, actual_type); ary = NULL; } return ary; } From ndarray at mac.com Tue Apr 25 18:17:04 2006 From: ndarray at mac.com (Sasha) Date: Tue Apr 25 18:17:04 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> Message-ID: On 4/25/06, tim.hochberg at cox.net wrote: > > ---- Travis Oliphant wrote: > > Sasha wrote: > > > In this category, I would suggest to allow broadcasting to any > > > multiple of the dimension even if the dimension is not 1. I don't see > > > what makes 1 so special. > > > > > What's so special about 1 is that the code for it is relatively > > straightforward and already implemented using strides. Altering the > > code to allow any multiple of the dimension would be harder and slower. I don't think so. The same zero-stride trick that allows size-1 broadcasting can be used to implement repetition. I did not review the C code, but the following Python fragment shows that the loop that is already in numpy can be used to implement repetition by simply manipulating shapes and strides: >>> x = zeros(6) >>> reshape(x,(3,2))[...] = 1,2 >>> x array([1, 2, 1, 2, 1, 2]) > It also does the right thing most of the time and is easy to understand. Easy to understand? Let me quote Travis' book on this: "Broadcasting can be understood by four rules: ... While perhaps somewhat difficult to explain, broadcasting can be quite useful and becomes second nature rather quickly." I may be slow, but it did not become second nature for me. I am still getting bitten by subtle differences between unit length 1-d arrays and 0-d arrays. > It's my expectation that oppening up broadcasting will be more effective in masking > errors than in enabling useful new behaviour. 
> In my experience broadcasting length-1 and not broadcasting other lengths is very error prone as it is. I understand that restricting broadcasting to make it a strictly dimension-increasing operation is not possible for two reasons: 1. Numpy cannot break legacy Numeric code. 2. It is not possible to differentiate between 1-d array that broadcasts column-wise vs. one that broadcasts raw-wise. In my view none of these reasons is valid. In my experience Numeric code that relies on dimension-preserving broadcasting is already broken, only in a subtle and hard to reproduce way. Similarly the need to broadcast over non-leading dimension is a sign of bad design. In rare cases where such broadcasting is desirable, it can be easily done via swapaxes which is a cheap operation. Nevertheless, I've lost that battle some time ago. On the other hand I don't see much problem in making dimension-preserving broadcasting more permissive. In R, for example, (1-d) arrays can be broadcast to arbitrary size. This has an additional benefit that 1-d to 2-d broadcasting requires no special code, it just happens because matrices inherit arithmetics from vectors. I've never had a problem with R rules being too loose. > I think that's my ticket being discussed here. If so, it was motivated by a case that > stopped working because the looser broadcasting behaviour was preventing some > other broadcasting from taking place. I'm not home right now, so I can't provide > details; I'll do that on Thursday. In my view the problem that your ticket highlighted is not so much in the particular set of broadcasting rules, but in the fact that a[...] = b uses one set of rules while a[...] += b uses another. This is *very* confusing. > Just keep in mind that it's much easier to keep the broadcasting rules restrictive for > now and loosen them up later than to try to tighten them up later if loosening them up > turns out to not be a good idea. You are preaching to the choir! From simon at arrowtheory.com Tue Apr 25 18:29:01 2006 From: simon at arrowtheory.com (Simon Burton) Date: Tue Apr 25 18:29:01 2006 Subject: [Numpy-discussion] announce: pyjit, a little jit for creating numpy ufuncs In-Reply-To: References: <20060421162336.42285837.simon@arrowtheory.com> Message-ID: <20060426112808.531d652b.simon@arrowtheory.com> On Mon, 24 Apr 2006 16:17:16 -0400 cookedm at physics.mcmaster.ca (David M. Cooke) wrote: > > How do the speedups compare with numexpr? numexpr segfaults for me (runing timings.py): Program received signal SIGSEGV, Segmentation fault. [Switching to Thread -1209670912 (LWP 31768)] 0xb7d2b696 in PyArray_NewFromDescr (subtype=0x626e6769, descr=0x64007469, nd=1919251557, dims=0x656e696d, strides=0x782d2073, data=0x656c6520, flags=1953391981, obj=0x65736977) at arrayobject.c:3942 3942 arrayobject.c: No such file or directory. in arrayobject.c Simon. -- Simon Burton, B.Sc. Licensed PO Box 8066 ANU Canberra 2601 Australia Ph. 61 02 6249 6940 http://arrowtheory.com From robert.kern at gmail.com Tue Apr 25 20:10:07 2006 From: robert.kern at gmail.com (Robert Kern) Date: Tue Apr 25 20:10:07 2006 Subject: [Numpy-discussion] Chang*ed* the Trac authentication Message-ID: <444EE463.10007@gmail.com> Trying not to embarass myself again, I made the changes without telling you. :-) In order to create or modify Wiki pages or tickets on the NumPy and SciPy Tracs, you will have to be logged in. You can register yourself by clicking the "Register" link in the upper right-hand corner of the page. 
Developers who previously had accounts have the same username/password as before. You can now change your password if you like. Only developers have the ability to close tickets, delete Wiki pages entirely, or create new ticket reports (and possibly a couple of other things). Developers, please enter your name and email by clicking on the "Settings" link up at top once logged in. Thank you for your patience. If there are any problems, please email me, and I will try to correct them quickly. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From oliphant.travis at ieee.org Tue Apr 25 22:26:01 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Tue Apr 25 22:26:01 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> Message-ID: <444F0420.9000500@ieee.org> Sasha wrote: > On 4/25/06, tim.hochberg at cox.net wrote: > >> ---- Travis Oliphant wrote: >> >>> Sasha wrote: >>> >>>> In this category, I would suggest to allow broadcasting to any >>>> multiple of the dimension even if the dimension is not 1. I don't see >>>> what makes 1 so special. >>>> >>>> >>> What's so special about 1 is that the code for it is relatively >>> straightforward and already implemented using strides. Altering the >>> code to allow any multiple of the dimension would be harder and slower. >>> > > I don't think so. The same zero-stride trick that allows size-1 > broadcasting can be used to implement repetition. I did not review > the C code, but the following Python fragment shows that the loop that > is already in numpy can be used to implement repetition by simply > manipulating shapes and strides: > I don't think anyone is fundamentally opposed to multiple repetitions. We're just being cautious. Also, as you've noted, the assignment code is currently not using the ufunc broadcasting code and so they really aren't the same thing, yet. > >> It's my expectation that oppening up broadcasting will be more effective in masking >> errors than in enabling useful new behaviour. >> >> > In my experience broadcasting length-1 and not broadcasting other > lengths is very error prone as it is. That's not been my experience. But, I don't know R very well. I'm very interested in what ideas you can bring. > I understand that restricting > broadcasting to make it a strictly dimension-increasing operation is > not possible for two reasons: > > 1. Numpy cannot break legacy Numeric code. > 2. It is not possible to differentiate between 1-d array that > broadcasts column-wise vs. one that broadcasts raw-wise. > > In my view none of these reasons is valid. In my experience Numeric > code that relies on dimension-preserving broadcasting is already > broken, only in a subtle and hard to reproduce way. I definitely don't agree with you here. Dimension-preserving broadcasting is at the heart of the utility of broadcasting and it is very, very useful for that. Calling it subtly broken suggests that you don't understand it and have never used it for it's intended purpose. I've used dimension-preserving broadcasting literally hundreds of times. It's rather bold of you to say that all of that code is "broken" Now, I'm sure there are other useful ways to "broadcast", but dimension-preserving is essentially what broadcasting *is* in NumPy. 
If anything it is the dimension-increasing rule that is somewhat arbitrary (e.g. why prepend with ones). Perhaps you want to introduce some other way for non-commensurate shapes to interact in an operation. I think you will find many open minds on this list (although probably not anyone who will want to code it up :-) ). We do welcome the discussion. Your experience with other array-like languages is helpful. > Similarly the > need to broadcast over non-leading dimension is a sign of bad design. > In rare cases where such broadcasting is desirable, it can be easily > done via swapaxes which is a cheap operation. > Again, it would help if you would refrain from using negative words about coding styles that are different from your own. Such broadcasting is not that rare. It happens quite frequently, actually. The point of a language like Python is that you can write algorithms simply without struggling with optimization questions up front like you seem to be hinting at. > On the other hand I don't see much problem in making > dimension-preserving broadcasting more permissive. In R, for example, > (1-d) arrays can be broadcast to arbitrary size. This has an > additional benefit that 1-d to 2-d broadcasting requires no special > code, it just happens because matrices inherit arithmetics from > vectors. I've never had a problem with R rules being too loose. > So, please explain exactly what you mean. Only a few on this list know what the R rules even are. > In my view the problem that your ticket highlighted is not so much in > the particular set of broadcasting rules, but in the fact that a[...] > = b uses one set of rules while a[...] += b uses another. This is > *very* confusing. > Yes, this is admittedly confusing. But, it's an outgrowth of the way Numeric code developed. Broadcasting was always only a ufunc concept in Numeric, and copying was not a ufunc. NumPy grew out of Numeric code. I was not trying to mimick broadcasting behavior when I wrote the array copy and array setting code. Perhaps I should have been. I'm willing to change the code on this one, but only if the new copy code actually does implement broadcasting behavior equivalently. And going through the ufunc machinery is probably a waste of effort because the copy code must be written for variable length arrays anyway (and ufuncs don't support them). However, the broadcasting machinery has been abstracted in NumPy and can therefore be re-used in the copying code. In Numeric, broadcasting was basically implemented deep inside a confusing while loop. -Travis From fullung at gmail.com Tue Apr 25 23:42:05 2006 From: fullung at gmail.com (Albert Strasheim) Date: Tue Apr 25 23:42:05 2006 Subject: [Numpy-discussion] SWIG wrappers: Passing NULL pointers or arrays Message-ID: <00dd01c668fc$6d04b470$0502010a@dsp.sun.ac.za> Hello all, I've currently wrapping a C library (libsvm) with NumPy. 
libsvm has a few structs similiar to the following: struct svm_parameter { double* weight; int nr_weight; }; In my SWIG wrapper I did the following: struct svm_parameter { %immutable; int nr_weight; %mutable; double* weight; %extend { svm_parameter() { struct svm_parameter* param = (struct svm_parameter*) malloc(sizeof(struct svm_parameter)); param->nr_weight = 0; param->weight = 0; return param; } ~svm_parameter() { free(self->weight); free(self); } void _set_weight(double* IN_ARRAY1, int DIM1) { free(self->weight); self->nr_weight = DIM1; self->weight = malloc(sizeof(double) * DIM1); if (!self->weight) { SWIG_exception(SWIG_MemoryError, "OOM"); } memcpy(self->weight, IN_ARRAY1, sizeof(double) * DIM1); return; fail: self->nr_weight = 0; self->weight = 0; } } }; This works pretty well (suggestion welcome though). However, one feature that I think is lacking from the current array typemaps is a way of passing NULL to the C function. On the Python side I want to be able to do: svm_parameter.weight = N.array([1.0,2.0]) or svm_parameter.weight = None This heads off to __setattr__ where the following happens: def __setattr__(self, attr, val): if attr in ['weight', 'weight_label']: set_func = getattr(self, '_set_%s' % (attr,)) set_func(val) else: super(svm_parameter, self).__setattr__(attr, val) At this point the typemap magic kicks in. However, passing a None doesn't work, because somewhere down the line somebody checks for the int argument. The current typemap looks like this: %define TYPEMAP_IN1(type,typecode) %typemap(in) (type* IN_ARRAY1, int DIM1) (PyArrayObject* array=NULL, int is_new_object) { int size[1] = {-1}; array = obj_to_array_contiguous_allow_conversion($input, typecode, &is_new_object); if (!array || !require_dimensions(array,1) || !require_size(array,size,1)) SWIG_fail; $1 = (type*) array->data; $2 = array->dimensions[0]; } %typemap(freearg) (type* IN_ARRAY1, int DIM1) { if (is_new_object$argnum && array$argnum) Py_DECREF(array$argnum); } %enddef I quickly hacked up the following typemap that seems to deal gracefully when a None is passed instead of an array. Changed lines: if ($input == Py_None) { is_new_object = 0; $1 = NULL; $2 = 0; } else { int size[1] = {-1}; array = obj_to_array_contiguous_allow_conversion($input, typecode, &is_new_object); if (!array || !require_dimensions(array,1) || !require_size(array,size,1)) SWIG_fail; $1 = (type*) array->data; $2 = array->dimensions[0]; } Now I can write my set_weight function as follows: void _set_weight(double* IN_ARRAY1, int DIM1) { free(self->weight); self->weight = 0; self->nr_weight = DIM1; if (DIM1 > 0) { self->weight = malloc(sizeof(double) * DIM1); if (!self->weight) { SWIG_exception(SWIG_MemoryError, "OOM"); } memcpy(self->weight, IN_ARRAY1, sizeof(double) * DIM1); } return; fail: self->nr_weight = 0; } Does it make sense to add this to the typemaps? Any other comments? Are there better ways to accomplish this? 
Regards, Albert From arnd.baecker at web.de Wed Apr 26 00:52:01 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Wed Apr 26 00:52:01 2006 Subject: [Numpy-discussion] vectorize problem In-Reply-To: <200604251324.42987.steffen.loeck@gmx.de> References: <200604251324.42987.steffen.loeck@gmx.de> Message-ID: Hi, On Tue, 25 Apr 2006, Steffen Loeck wrote: > Hello all, > > I have a problem using scalar variables in a vectorized function: > > from numpy import vectorize > > def f(x): > if x>0: return 1 > else: return 0 > > F = vectorize(f) > > F(1) > > gives the error message: > --------------------------------------------------------------------------- > exceptions.AttributeError Traceback (most recent call last) > > .../function_base.py in __call__(self, *args) > 619 > 620 if self.nout == 1: > --> 621 return self.ufunc(*args).astype(self.otypes[0]) > 622 else: > 623 return tuple([x.astype(c) for x, c in > zip(self.ufunc(*args), self.otypes)]) > > AttributeError: 'int' object has no attribute 'astype' Ouch - that's not nice - a lot of my code relies the fact that (old scipy) vectorize happily eats scalars *and* arrays. I am not familiar with the code of numpy.vectorize (which has indeed changed quite a bit compared to the old scipy.vectorize), but maybe it is only a simple change? > Is there any way to get vectorized functions working with scalars again? +1 (or is there a particular reason why "vectorized" functions should not be able to operate on scalars?) Best, Arnd From pgmdevlist at mailcan.com Wed Apr 26 01:06:04 2006 From: pgmdevlist at mailcan.com (Pierre GM) Date: Wed Apr 26 01:06:04 2006 Subject: [Numpy-discussion] A python interface for loess ? Message-ID: <200604260329.17115.pgmdevlist@mailcan.com> Folks, Would any of you be aware of a Python interface to the loess routines ? http://netlib.bell-labs.com/netlib/a/dloess.gz I could use the R implementation through Rpy, but I would prefer to stick to Python... Thanks a lot in advance P. From arnd.baecker at web.de Wed Apr 26 02:39:05 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Wed Apr 26 02:39:05 2006 Subject: [Numpy-discussion] concatenate, doc-string Message-ID: Hi, the doc-string of concatentate is pretty short: numpy.concatenate? Docstring: concatenate((a1,a2,...),axis=None). Would the following be better: """ concatenate((a1, a2,...), axis=None) joins the tuple `(a1, a2, ...)` of sequences (or arrays) into a single numpy array. Example:: print concatenate( ([0,1,2], [5,6,7])) """ ((The ``(or arrays)`` could be omitted if sequences include array by default, though it might not be obvious to beginners ...)) I was also tempted to suggest a dtype argument, concatenate( ([0,1,2], [5,6,7]), dtype=numpy.Float) but I am not sure if that would be a good idea ... Best, Arnd From gnchen at cortechs.net Wed Apr 26 06:52:01 2006 From: gnchen at cortechs.net (Gennan Chen) Date: Wed Apr 26 06:52:01 2006 Subject: [Numpy-discussion] SWIG for 3D array Message-ID: Hi! I will like to use SWIG to wrap my code. However, it seems the current numpy.i only can map 1 and 2D array, but not 3D. Is it correct? Or I miss something here. I don't mind spend some time to do it like scipy.ndimage if numpy.i did not support ND arrary. But I am new to write extension to Python. And I really have hard time to understand how to deal with reference counting issues. Anyone know where I can know a good reference for that? Or a simple example in numpy will be appreciated.... 
Gen-Nan Chen, PhD Chief Scientist Research and Development Group CorTechs Labs Inc (www.cortechs.net) 1020 Prospect St., #304, La Jolla, CA, 92037 Tel: 1-858-459-9700 ext 16 Fax: 1-858-459-9705 Email: gnchen at cortechs.net From oliphant.travis at ieee.org Wed Apr 26 10:05:01 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Wed Apr 26 10:05:01 2006 Subject: [Numpy-discussion] vectorize problem In-Reply-To: References: <200604251324.42987.steffen.loeck@gmx.de> Message-ID: <444FA7E7.2070303@ieee.org> Arnd Baecker wrote: > Hi, > > On Tue, 25 Apr 2006, Steffen Loeck wrote: > > >> Hello all, >> >> I have a problem using scalar variables in a vectorized function: >> >> from numpy import vectorize >> >> def f(x): >> if x>0: return 1 >> else: return 0 >> >> F = vectorize(f) >> >> F(1) >> >> gives the error message: >> --------------------------------------------------------------------------- >> exceptions.AttributeError Traceback (most recent call last) >> >> .../function_base.py in __call__(self, *args) >> 619 >> 620 if self.nout == 1: >> --> 621 return self.ufunc(*args).astype(self.otypes[0]) >> 622 else: >> 623 return tuple([x.astype(c) for x, c in >> zip(self.ufunc(*args), self.otypes)]) >> >> AttributeError: 'int' object has no attribute 'astype' >> > > Ouch - that's not nice - a lot of my code relies the fact that (old > scipy) vectorize happily eats scalars *and* arrays. > > I am not familiar with the code of numpy.vectorize (which has indeed > changed quite a bit compared to the old scipy.vectorize), > but maybe it is only a simple change? > It is just a simple change. Scalars are supposed to be supported. They aren't only as a side-effect of the switch to not return object-scalars. I did not update the vectorize code to handle the scalar return value from the object ufunc (which is now no-longer an object-scalar with the methods of arrays (like astype) but is instead the underlying object). I'll add a check. -Travis From jrl at gatewayengineers.com Wed Apr 26 12:29:01 2006 From: jrl at gatewayengineers.com (Frida Maldonado) Date: Wed Apr 26 12:29:01 2006 Subject: [Numpy-discussion] vat Message-ID: <001a01c66967$82f94541$ddc46747@ijopi.sewtp> An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: controversy.gif Type: image/gif Size: 28493 bytes Desc: not available URL: From cookedm at physics.mcmaster.ca Wed Apr 26 12:33:01 2006 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 26 12:33:01 2006 Subject: [Numpy-discussion] Chang*ed* the Trac authentication In-Reply-To: <444EE463.10007@gmail.com> (Robert Kern's message of "Tue, 25 Apr 2006 22:09:23 -0500") References: <444EE463.10007@gmail.com> Message-ID: Robert Kern writes: > Trying not to embarass myself again, I made the changes without telling you. :-) > > In order to create or modify Wiki pages or tickets on the NumPy and SciPy Tracs, > you will have to be logged in. You can register yourself by clicking the > "Register" link in the upper right-hand corner of the page. > > Developers who previously had accounts have the same username/password as > before. You can now change your password if you like. Only developers have the > ability to close tickets, delete Wiki pages entirely, or create new ticket > reports (and possibly a couple of other things). Developers, please enter your > name and email by clicking on the "Settings" link up at top once logged in. > > Thank you for your patience. 
If there are any problems, please email me, and I > will try to correct them quickly. Thanks Robert; I hope this helps with our spam problem to an extent. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From cookedm at physics.mcmaster.ca Wed Apr 26 12:48:04 2006 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Wed Apr 26 12:48:04 2006 Subject: [Numpy-discussion] concatenate, doc-string In-Reply-To: (Arnd Baecker's message of "Wed, 26 Apr 2006 11:38:26 +0200 (CEST)") References: Message-ID: Arnd Baecker writes: > Hi, > > the doc-string of concatentate is pretty short: > > numpy.concatenate? > Docstring: > concatenate((a1,a2,...),axis=None). > > Would the following be better: > """ > concatenate((a1, a2,...), axis=None) joins the tuple `(a1, a2, ...)` of > sequences (or arrays) into a single numpy array. > > Example:: > > print concatenate( ([0,1,2], [5,6,7])) > """ > > ((The ``(or arrays)`` could be omitted if sequences include array by > default, though it might not be obvious to beginners ...)) Here's what I just checked in: concatenate((a1, a2, ...), axis=None) joins arrays together The tuple of sequences (a1, a2, ...) are joined along the given axis (default is the first one) into a single numpy array. Example: >>> concatenate( ([0,1,2], [5,6,7]) ) array([0, 1, 2, 5, 6, 7]) > I was also tempted to suggest a dtype argument, > concatenate( ([0,1,2], [5,6,7]), dtype=numpy.Float) > but I am not sure if that would be a good idea ... Well, that would require more code, so I didn't do it :-) -- |>|\/|< /--------------------------------------------------------------------------\ |David M. Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From arnd.baecker at web.de Wed Apr 26 14:03:02 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Wed Apr 26 14:03:02 2006 Subject: [Numpy-discussion] concatenate, doc-string In-Reply-To: References: Message-ID: On Wed, 26 Apr 2006, David M. Cooke wrote: > Arnd Baecker writes: > > > Hi, > > > > the doc-string of concatentate is pretty short: > > > > numpy.concatenate? > > Docstring: > > concatenate((a1,a2,...),axis=None). > > > > Would the following be better: > > """ > > concatenate((a1, a2,...), axis=None) joins the tuple `(a1, a2, ...)` of > > sequences (or arrays) into a single numpy array. > > > > Example:: > > > > print concatenate( ([0,1,2], [5,6,7])) > > """ > > > > ((The ``(or arrays)`` could be omitted if sequences include array by > > default, though it might not be obvious to beginners ...)) > > Here's what I just checked in: > > concatenate((a1, a2, ...), axis=None) joins arrays together > > The tuple of sequences (a1, a2, ...) are joined along the given axis > (default is the first one) into a single numpy array. > > Example: > > >>> concatenate( ([0,1,2], [5,6,7]) ) > array([0, 1, 2, 5, 6, 7]) Great - many thanks!! There are some further routines which might benefit from some more explanation/examples - so if you don't mind I will try to suggest some additions (I could check them in directly, I think, but as I am not a native speaker I feel better to post them here for review/improvement). > > I was also tempted to suggest a dtype argument, > > concatenate( ([0,1,2], [5,6,7]), dtype=numpy.Float) > > but I am not sure if that would be a good idea ... 
> > Well, that would require more code, so I didn't do it :-)

;-) It might also be problematic when one of the sequence elements does not fit into the output type.

Best, Arnd

From ndarray at mac.com Wed Apr 26 14:18:06 2006
From: ndarray at mac.com (Sasha)
Date: Wed Apr 26 14:18:06 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
In-Reply-To: <444F0420.9000500@ieee.org>
References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org>
Message-ID:

I would like to apologize up-front if anyone found my overly general arguments inappropriate. I did not intend to be critical about anyone's code or design other than my own. Any references to "bad design" or "broken code" are related to my own misguided attempts to use some of the Numeric features in the past. It turned out that dimension-preserving broadcasting was the wrong feature to use for a specific class of problems that I deal with most of the time. This does not mean that it cannot be used appropriately in other domains. I was wrong in posting overly general opinions without providing specific examples. I will try to do better in this post.

Before I do that, however, let me try to explain why I hold strong views on certain things. In my view the most appealing feature in Python is the Zen of Python, and in particular "There should be one-- and preferably only one --obvious way to do it." In my view Python represents the "hard science" approach, appealing to physics and math types, while Perl is more of a "soft science" language. (There is nothing wrong with either Perl or soft sciences.) This is what makes Python so appealing for scientific computing. Unfortunately, it is a fact of life that there are always many ways to solve the same problem, and a successful "pythonic" design has to pick one (preferably the best) of the possible ways and make it obvious.

This said, let me present a specific problem that I will use to illustrate my points below. Suppose we study school statistics in different cities. Let city A have 10 schools with 20 classes and 30 students in each. It is natural to organize the data collected about the students in a 10x20x30 array. It is also natural to collect some of the data at the per-school or per-class level. This data may come from aggregating student-level statistics (say, average test score) or from characteristics that are class- or school-specific (say, the grade or primary language). There are two obvious ways to present such data: 1) we can use 3-d arrays for everything and make the shape of the per-class array 10x20x1 and the shape of the per-school array 10x1x1; or 2) use 2-d and 1-d arrays. The first approach seems to be more flexible. We can also have 10x1x30 or 1x1x30 arrays to represent data which varies along the student dimension, but is constant across schools or classes. However, this added benefit is illusory: the first student in one class list has no relationship to the first student in another class, so in this particular problem an average score of the first student across classes makes no sense (it will also depend on whether students are ordered alphabetically or by an achievement rank).

On the other hand, this approach has a very significant drawback: functions that process city data have no way to distinguish between per-school data and a lucky city that can afford educating its students in individual classes.
Just as it is extremely unlikely to have one student per class in our toy example, in real-world problems it is not unreasonable to assume that a dimension of size 1 represents aggregate data. Software designed on the basis of this assumption is what I would call broken in a subtle way. Please see more below.

On 4/26/06, Travis Oliphant wrote:
> Sasha wrote:
> > On 4/25/06, tim.hochberg at cox.net wrote:
> > >> ---- Travis Oliphant wrote:
> [...]
> I don't think anyone is fundamentally opposed to multiple repetitions.
> We're just being cautious. Also, as you've noted, the assignment code
> is currently not using the ufunc broadcasting code and so they really
> aren't the same thing, yet.

It looks like there is a lot of development in this area going on at the moment. Please let me know if I can help.

> [...]
> > In my experience broadcasting length-1 and not broadcasting other
> > lengths is very error prone as it is.
>
> That's not been my experience.

I should have been more specific. As I explained above, the special properties of length-1 led me to design a system that distinguished aggregate data by testing for unit length. This system was subtly broken. In a rare case when the population had only one element, the system was producing wrong results.

> But, I don't know R very well. I'm very
> interested in what ideas you can bring.

R takes a very simple approach: everything is a vector. There are no scalars; if you need a scalar, you use a vector of length 1. Broadcasting is simply repetition:

> x <- rep(0,10)
> x + c(1,2)
 [1] 1 2 1 2 1 2 1 2 1 2

The length of the larger vector does not even need to be a multiple of the shorter, but in this case a warning is issued:

> x + c(1,2,3)
 [1] 1 2 3 1 2 3 1 2 3 1
Warning message:
longer object length is not a multiple of shorter object length in: x + c(1, 2, 3)

Multi-dimensional arrays are implemented by setting a "dim" attribute:

> dim(x) <- c(2,5)
> x
     [,1] [,2] [,3] [,4] [,5]
[1,]    0    0    0    0    0
[2,]    0    0    0    0    0

(R uses Fortran order). Broadcasting ignores the dim attribute, but does the right thing for conformable vectors:

> x + c(1,2)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    1    1    1    1
[2,]    2    2    2    2    2

However, the following is unfortunate:

> x + 1:5
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    2    4
[2,]    2    4    1    3    5

> > I understand that restricting
> > broadcasting to make it a strictly dimension-increasing operation is
> > not possible for two reasons:
> >
> > 1. Numpy cannot break legacy Numeric code.
> > 2. It is not possible to differentiate between a 1-d array that
> > broadcasts column-wise vs. one that broadcasts row-wise.
> >
> > In my view none of these reasons is valid. In my experience Numeric
> > code that relies on dimension-preserving broadcasting is already
> > broken, only in a subtle and hard to reproduce way.
>
> I definitely don't agree with you here. Dimension-preserving
> broadcasting is at the heart of the utility of broadcasting and it is
> very, very useful for that. Calling it subtly broken suggests that you
> don't understand it and have never used it for its intended purpose.
> I've used dimension-preserving broadcasting literally hundreds of
> times. It's rather bold of you to say that all of that code is "broken".

Sorry I was not specific in the original post. I hope you now understand where I come from. Can you point me to some examples of the correct way to use dimension-preserving broadcasting?
I would assume that it is probably more useful in the problem domains where there is no natural ordering of the dimensions, unlike in the hierarchial data example that I used. > Now, I'm sure there are other useful ways to "broadcast", but > dimension-preserving is essentially what broadcasting *is* in NumPy. > If anything it is the dimension-increasing rule that is somewhat > arbitrary (e.g. why prepend with ones). > The dimension-increasing broadcasting is very natural when you deal with hierarchical data where various dimensions correspond to the levels of aggregation. As I explained above, average student score per class makes sense while the average score per student over classes does not. It is very common to combine per-class data with per-student data by broadcasting per-class data. For example, the total time spent by student is a sum spent in regular per-class session plus individual elected courses. > > Perhaps you want to introduce some other way for non-commensurate shapes > to interact in an operation. I think you will find many open minds on > this list (although probably not anyone who will want to code it up :-) > ). We do welcome the discussion. Your experience with other > array-like languages is helpful. > I will be happy to contribute code if I see interest. > > > Similarly the > > need to broadcast over non-leading dimension is a sign of bad design. > > In rare cases where such broadcasting is desirable, it can be easily > > done via swapaxes which is a cheap operation. > > > > Again, it would help if you would refrain from using negative words > about coding styles that are different from your own. Such > broadcasting is not that rare. It happens quite frequently, actually. > The point of a language like Python is that you can write algorithms > simply without struggling with optimization questions up front like you > seem to be hinting at. > I hope you understand that I did not mean to criticize anyone's coding style. I was not really hinting at optimization issues, I just had a particular design problem in mind (see above). Incidentally, dimension-increasing broadcasting does tend to lead to more efficient code both in terms of memory utilization and more straightforward algorithms with fewer special cases, but this was not really what I was referring to. > > On the other hand I don't see much problem in making > > dimension-preserving broadcasting more permissive. In R, for example, > > (1-d) arrays can be broadcast to arbitrary size. This has an > > additional benefit that 1-d to 2-d broadcasting requires no special > > code, it just happens because matrices inherit arithmetics from > > vectors. I've never had a problem with R rules being too loose. > > > > So, please explain exactly what you mean. Only a few on this list know > what the R rules even are. See above. > > In my view the problem that your ticket highlighted is not so much in > > the particular set of broadcasting rules, but in the fact that a[...] > > = b uses one set of rules while a[...] += b uses another. This is > > *very* confusing. > > > > Yes, this is admittedly confusing. But, it's an outgrowth of the way > Numeric code developed. Broadcasting was always only a ufunc concept in > Numeric, and copying was not a ufunc. NumPy grew out of Numeric > code. I was not trying to mimick broadcasting behavior when I wrote > the array copy and array setting code. Perhaps I should have been. 
>
In the spirit of appealing to obscure languages ;-), let me mention that in the K language (kx.com) element assignment is implemented using an Amend primitive that takes four arguments: @[x,i,f,y] is more or less equivalent to numpy's x[i] = f(x[i], y[i]), where x, y and i are vectors and f is a binary (broadcasting) function. Thus, x[i] += y[i] can be written as @[x,i,+,y] and x[i] = y[i] is @[x,i,:,y], where ':' denotes a binary function that returns its second argument and ignores the first. The K interpreter's Linux binary is less than 200K, and that includes a simple X window GUI! Such a small code size would not be possible without picking the right set of primitives and avoiding special-case code.

> I'm willing to change the code on this one, but only if the new copy
> code actually does implement broadcasting behavior equivalently. And
> going through the ufunc machinery is probably a waste of effort because
> the copy code must be written for variable length arrays anyway (and
> ufuncs don't support them).

I know close to nothing about variable length arrays. When I need to deal with relational database data, I transpose it so that each column gets into its own fixed-length array. This is the approach that both R and K take. However, at least at the C level, I don't see why the ufunc code cannot be generalized to handle variable length arrays. At the Python level, pre-defined arithmetic or math functions are probably not feasible for variable length, but the ability to define a variable length array function by just writing an inner loop implementation may be quite useful.

> However, the broadcasting machinery has been abstracted in NumPy and can
> therefore be re-used in the copying code. In Numeric, broadcasting was
> basically implemented deep inside a confusing while loop.

I've never understood Numeric's while loop and completely agree with your characterization. I am still studying the numpy code, but it is clearly a big improvement.

From shhong at u.washington.edu Wed Apr 26 14:19:01 2006
From: shhong at u.washington.edu (Sungho Hong)
Date: Wed Apr 26 14:19:01 2006
Subject: [Numpy-discussion] Building Numpy with Windows and MKL?
Message-ID: <207B8B70-6328-421D-8343-B32506AF47CA@u.washington.edu>

Has anyone tried to install numpy with MS Windows and the Intel Math Kernel Library, especially using the VC 2003 compiler? I began with MKLROOT=C:\Program Files\Inter\plsuite, but setup.py seems to have a problem with finding the library path. In that case, how do I set up all the relevant paths manually?

Thanks.
- SH

From ryanlists at gmail.com Wed Apr 26 14:21:07 2006
From: ryanlists at gmail.com (Ryan Krauss)
Date: Wed Apr 26 14:21:07 2006
Subject: [Numpy-discussion] array.min() vs. min(array)
Message-ID:

I was spending some time trying to track down how to speed up an algorithm that gets called a bunch of times during an optimization. I was startled when I finally figured out that most of the time was wasted by using the built-in Python min function. It turns out that in my case, using array.min() (i.e. the method of the Numpy array) is 300-500 times faster than the built-in Python min function (i.e. min(array)).

So, thank you Travis and everyone who has put so much time into thinking through Numpy and making it fast (as well as making sure it is correct). And to the rest of us, use the Numpy array methods whenever you can.
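[A minimal way to reproduce the gap Ryan describes; this is an illustrative sketch only -- the array size and repeat count are arbitrary, and absolute timings will vary by machine:]

    import timeit

    setup = "import numpy; a = numpy.random.rand(100000)"

    # a.min() runs a single C loop over the raw data buffer.
    fast = timeit.Timer("a.min()", setup)
    # min(a) walks the generic sequence interface, comparing one
    # element at a time, which is where the 300-500x factor comes from.
    slow = timeit.Timer("min(a)", setup)

    print(fast.timeit(number=100))
    print(slow.timeit(number=100))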
Thanks,

Ryan

From oliphant.travis at ieee.org Wed Apr 26 14:42:05 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Wed Apr 26 14:42:05 2006
Subject: [Numpy-discussion] array.min() vs. min(array)
In-Reply-To:
References:
Message-ID: <444FE909.5080209@ieee.org>

Ryan Krauss wrote:
> I was spending some time trying to track down how to speed up an
> algorithm that gets called a bunch of times during an optimization. I
> was startled when I finally figured out that most of the time was
> wasted by using the built-in Python min function. It turns out that
> in my case, using array.min() (i.e. the method of the Numpy array) is
> 300-500 times faster than the built-in python min function (i.e.
> min(array)).
>
> So, thank you Travis and everyone who has put so much time into
> thinking through Numpy and making it fast (as well as making sure it
> is correct).

The builtin min function is a bit confusing because it usually does work on NumPy arrays. But, as you've noticed, it is always slower because it uses the "generic sequence interface" that NumPy arrays expose. So, it's basically not much faster than a Python loop. In this case you are also being hit by the fact that scalarmath is not yet implemented (it's getting close though...) so the returned array scalars are being compared using the bulky ufunc machinery on each element separately.

In Python 2.5 we are going to have the same issues with the new any() and all() functions of Python.

-Travis

From wbaxter at gmail.com Wed Apr 26 14:56:12 2006
From: wbaxter at gmail.com (Bill Baxter)
Date: Wed Apr 26 14:56:12 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
In-Reply-To:
References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org>
Message-ID:

Is that a representative example? It seems highly unlikely that in real life every one of the schools would have exactly 20 classes, and each of those exactly 30 students. I don't know anything about R or the way things are typically done with statistical languages -- maybe this is the norm there -- but from a pure CompSci data structures perspective, a 3D array seems ill-suited for this type of hierarchical data. Something more flexible, along the lines of a Python list of lists of lists, seems more appropriate.

--bill

On 4/27/06, Sasha wrote:
> Suppose we study school statistics in
> different cities. Let city A have 10 schools with 20 classes and 30
> students in each. It is natural to organize the data collected about
> the students in a 10x20x30 array.

From ndarray at mac.com Wed Apr 26 15:24:07 2006
From: ndarray at mac.com (Sasha)
Date: Wed Apr 26 15:24:07 2006
Subject: [Numpy-discussion] concatenate, doc-string
In-Reply-To:
References:
Message-ID:

On 4/26/06, David M. Cooke wrote:
> ....
> Here's what I just checked in:
>
> concatenate((a1, a2, ...), axis=None) joins arrays together
>
> The tuple of sequences (a1, a2, ...) are joined along the given axis
> (default is the first one) into a single numpy array.
>
> Example:
>
> >>> concatenate( ([0,1,2], [5,6,7]) )
> array([0, 1, 2, 5, 6, 7])

The first argument does not have to be a tuple:

>>> print concatenate([[0,1,2], [5,6,7]])
[0 1 2 5 6 7]

but the docstring is probably ok given that the alternative is "sequence of sequences" ...

From ndarray at mac.com Wed Apr 26 15:58:04 2006
From: ndarray at mac.com (Sasha)
Date: Wed Apr 26 15:58:04 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
In-Reply-To:
References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org>
Message-ID:

On 4/26/06, Bill Baxter wrote:
> Is that a representative example? It seems highly unlikely that in real
> life every one of the schools would have exactly 20 classes, and each of
> those exactly 30 students.

You should not take my toy example too seriously. However, with support for missing values, 3-d arrays may provide an efficient representation for a more realistic scenario when you only know upper bounds for the number of students/classes. Smaller schools will have missing values in their arrays.

> I don't know anything about R or the way things
> are typically done with statistical languages -- maybe this is the norm
> there -- but from a pure CompSci data structures perspective, a 3D array
> seems ill-suited for this type of hierarchical data. Something more
> flexible, along the lines of a Python list of lists of lists, seems more
> appropriate.

You are right. I am sorely missing ragged array support in numpy like the one available in K. Numpy supports nested arrays, but does not optimize the most common case when nested arrays are of the same type.

> --bill
>
> On 4/27/06, Sasha wrote:
> > Suppose we study school statistics in
> > different cities. Let city A have 10 schools with 20 classes and 30
> > students in each. It is natural to organize the data collected about
> > the students in a 10x20x30 array.

From ndarray at mac.com Wed Apr 26 16:16:07 2006
From: ndarray at mac.com (Sasha)
Date: Wed Apr 26 16:16:07 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
In-Reply-To:
References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org>
Message-ID:

On 4/26/06, Sasha wrote:
> On 4/26/06, Bill Baxter wrote:
> > Is that a representative example? It seems highly unlikely that in real
> > life every one of the schools would have exactly 20 classes, and each of
> > those exactly 30 students.
>
> You should not take my toy example too seriously. However, with
> support for missing values, 3-d arrays may provide an efficient
> representation for a more realistic scenario when you only know upper
> bounds for the number of students/classes. Smaller schools will have
> missing values in their arrays.

In addition, it is reasonable to sample a fixed number of classes from each school and a fixed number of students from each class at random for a statistical study.

From simon at arrowtheory.com Wed Apr 26 16:41:04 2006
From: simon at arrowtheory.com (Simon Burton)
Date: Wed Apr 26 16:41:04 2006
Subject: [Numpy-discussion] obtain indexes of a sort ?
Message-ID: <20060427094025.10172889.simon@arrowtheory.com>

Is it possible to obtain a permutation (array of indices) representing the transform that sorts an array? Is there a numpy way of doing this?

I can do it in python as:

a = [ 6, 5, 99, 2 ]
idxs = range(len(a))
z = zip(idxs,a)
def zcmp(u,v):
    if u[1]<=v[1]:
        return -1
    return 1
z.sort( zcmp )
idxs = [u[0] for u in z]   # <--- permutation

Simon.

--
Simon Burton, B.Sc.
Licensed PO Box 8066
ANU Canberra 2601
Australia
Ph. 61 02 6249 6940
http://arrowtheory.com

From pgmdevlist at mailcan.com Wed Apr 26 16:45:02 2006
From: pgmdevlist at mailcan.com (Pierre GM)
Date: Wed Apr 26 16:45:02 2006
Subject: [Numpy-discussion] obtain indexes of a sort ?
In-Reply-To: <20060427094025.10172889.simon@arrowtheory.com> References: <20060427094025.10172889.simon@arrowtheory.com> Message-ID: <200604261944.01584.pgmdevlist@mailcan.com> On Wednesday 26 April 2006 19:40, Simon Burton wrote: > Is it possible to obtain a permutation (array of indices) > representing the transform that sorts an array ? Is there a numpy way > of doing this ? I guess argsort() could be what you want From ndarray at mac.com Wed Apr 26 16:45:03 2006 From: ndarray at mac.com (Sasha) Date: Wed Apr 26 16:45:03 2006 Subject: [Numpy-discussion] obtain indexes of a sort ? In-Reply-To: <20060427094025.10172889.simon@arrowtheory.com> References: <20060427094025.10172889.simon@arrowtheory.com> Message-ID: >>> argsort([ 6, 5, 99, 2 ]) array([3, 1, 0, 2]) On 4/26/06, Simon Burton wrote: > > Is it possible to obtain a permutation (array of indices) > representing the transform that sorts an array ? Is there a numpy way > of doing this ? > > I can do it in python as: > > a = [ 6, 5, 99, 2 ] > idxs = range(len(a)) > z = zip(idxs,a) > def zcmp(u,v): > if u[1]<=v[1]: > return -1 > return 1 > z.sort( zcmp ) > idxs = [u[0] for u in z] # <--- permutation > > Simon. > > -- > Simon Burton, B.Sc. > Licensed PO Box 8066 > ANU Canberra 2601 > Australia > Ph. 61 02 6249 6940 > http://arrowtheory.com > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? > Get stuff done quickly with pre-integrated technology to make your job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From zpincus at stanford.edu Wed Apr 26 16:46:05 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Wed Apr 26 16:46:05 2006 Subject: [Numpy-discussion] obtain indexes of a sort ? In-Reply-To: <20060427094025.10172889.simon@arrowtheory.com> References: <20060427094025.10172889.simon@arrowtheory.com> Message-ID: <800F9820-F672-4EBF-8F48-3C3AEF17FC34@stanford.edu> a.argsort() or numpy.argsort(a) Zach On Apr 26, 2006, at 4:40 PM, Simon Burton wrote: > > Is it possible to obtain a permutation (array of indices) > representing the transform that sorts an array ? Is there a numpy way > of doing this ? > > I can do it in python as: > > a = [ 6, 5, 99, 2 ] > idxs = range(len(a)) > z = zip(idxs,a) > def zcmp(u,v): > if u[1]<=v[1]: > return -1 > return 1 > z.sort( zcmp ) > idxs = [u[0] for u in z] # <--- permutation > > Simon. > > -- > Simon Burton, B.Sc. > Licensed PO Box 8066 > ANU Canberra 2601 > Australia > Ph. 61 02 6249 6940 > http://arrowtheory.com > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, > security? > Get stuff done quickly with pre-integrated technology to make your > job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache > Geronimo > http://sel.as-us.falkag.net/sel? 
> cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion

From pearu at scipy.org Wed Apr 26 16:56:05 2006
From: pearu at scipy.org (Pearu Peterson)
Date: Wed Apr 26 16:56:05 2006
Subject: [Numpy-discussion] Possible ref.count bug in changeset #2422
Message-ID:

Hi,

Shouldn't result be Py_INCREF'ed when it is equal to Py_NotImplemented and returned from array_richcompare?

Pearu

From doug5y at shaw.ca Wed Apr 26 17:10:05 2006
From: doug5y at shaw.ca (Doug Nadworny)
Date: Wed Apr 26 17:10:05 2006
Subject: [Numpy-discussion] Can't install numpy-0.9.6-1.i586.rpm on FC5
Message-ID: <44500B9E.10602@shaw.ca>

When trying to install numpy-0.9.6-1.i586.rpm on Fedora Core 5, rpm incorrectly reports that python is the wrong version, even though it is correct:

>rpm -i --test numpy-0.9.6-1.i586.rpm  ## Tests dependencies of rpm package
error: Failed dependencies:
        python-base >= 2.4 is needed by numpy-0.9.6-1.i586
>python -V
Python 2.4.2

Is there a way around this?

TIA,
Doug N

From cookedm at physics.mcmaster.ca Wed Apr 26 17:20:05 2006
From: cookedm at physics.mcmaster.ca (David M. Cooke)
Date: Wed Apr 26 17:20:05 2006
Subject: [Numpy-discussion] Possible ref.count bug in changeset #2422
In-Reply-To: (Pearu Peterson's message of "Wed, 26 Apr 2006 18:55:55 -0500 (CDT)")
References:
Message-ID:

Pearu Peterson writes:
> Hi,
>
> Shouldn't result be Py_INCREF'ed when it is equal to Py_NotImplemented
> and returned from array_richcompare?

Theoretically, yes, but since the case statement "should" cover all cases, it doesn't matter. Bad code style though on my part; I've added a default: case instead.

From silesalvarado at hotmail.com Wed Apr 26 17:33:04 2006
From: silesalvarado at hotmail.com (Hugo Siles)
Date: Wed Apr 26 17:33:04 2006
Subject: [Numpy-discussion] crush!!!!
Message-ID:

HI,

I have a problem when I run the following in python:

>>> from Numeric import *
>>> from LinearAlgebra import *

I define a matrix 'a' which prints correctly, and calculates its inverse, determinant and so forth, but when I try to calculate the eigenvalues, such as

>>> c = eigenvalues(a)

the system just crashes without any message. I made this test because in some other programs with source code the same thing happens.

I hope somebody can help, thanks

Hugo Siles

From ivazquez at ivazquez.net Wed Apr 26 17:33:08 2006
From: ivazquez at ivazquez.net (Ignacio Vazquez-Abrams)
Date: Wed Apr 26 17:33:08 2006
Subject: [Numpy-discussion] Can't install numpy-0.9.6-1.i586.rpm on FC5
In-Reply-To: <44500B9E.10602@shaw.ca>
References: <44500B9E.10602@shaw.ca>
Message-ID: <1146098100.16081.15.camel@ignacio.lan>

On Wed, 2006-04-26 at 18:09 -0600, Doug Nadworny wrote:
> when trying to install numpy-0.9.6-1.i586.rpm on Fedora Core 5, rpm
> incorrectly reports that python is the wrong version, even though it
> is correct:
>
> >rpm -i --test numpy-0.9.6-1.i586.rpm  ## Tests dependencies of rpm package
> error: Failed dependencies:
>         python-base >= 2.4 is needed by numpy-0.9.6-1.i586
> >python -V
> Python 2.4.2

Alright, alright, I'll update it already...
--
Ignacio Vazquez-Abrams
http://fedora.ivazquez.net/
gpg --keyserver hkp://subkeys.pgp.net --recv-key 38028b72

From ndarray at mac.com Wed Apr 26 18:15:04 2006
From: ndarray at mac.com (Sasha)
Date: Wed Apr 26 18:15:04 2006
Subject: [Numpy-discussion] crush!!!!
In-Reply-To:
References:
Message-ID:

Numeric computes eigenvalues by calling lapack's dgeev subroutine. Depending on installation, Numeric may either use its own subset of lapack (translated from Fortran to C) or link to the system-supplied lapack libraries. It is possible that there is a bug in your system's lapack libraries. Some lapack bugs related to extended precision calculations were reported recently. What you observe is unlikely to be a Numeric bug.

Note, however, that Numeric is no longer actively supported. If you can reproduce the same problem with numpy, it will likely get more attention. Also you have to give us some means to reproduce your matrix 'a' if you expect more than general advice.

On 4/26/06, Hugo Siles wrote:
> HI,
>
> I have a problem when I run the following in python:
>
> >>> from Numeric import *
> >>> from LinearAlgebra import *
>
> I define a matrix 'a' which prints correctly, and calculates its inverse,
> determinant and so forth, but when I try to calculate the eigenvalues, such as
>
> >>> c = eigenvalues(a)
>
> the system just crashes without any message. I made this test because in some
> other programs with source code the same thing happens.
>
> I hope somebody can help, thanks
>
> Hugo Siles

From strawman at astraw.com Wed Apr 26 19:26:05 2006
From: strawman at astraw.com (Andrew Straw)
Date: Wed Apr 26 19:26:05 2006
Subject: [Numpy-discussion] SWIG for 3D array
In-Reply-To:
References:
Message-ID: <44502B85.3000504@astraw.com>

Gennan Chen wrote:
> And I really have a hard time understanding how to deal with reference
> counting issues. Anyone know where I can find a good reference for that?

http://docs.python.org/ext/refcounts.html

From oliphant.travis at ieee.org Wed Apr 26 20:30:12 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Wed Apr 26 20:30:12 2006
Subject: [Numpy-discussion] Broadcasting rules (Ticket 76).
In-Reply-To:
References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org>
Message-ID: <44503A8A.2050701@ieee.org>

Sasha wrote:
> In my view the most appealing feature in Python is
> the Zen of Python, and in
> particular "There should be one-- and preferably only one --obvious
> way to do it." In my view Python represents the "hard science"
> approach appealing to physics and math types while Perl is more of a
> "soft science" language.

Interesting analogy. I've not heard that expression before.
> Unfortunately, it is the fact of life that there are > always many ways to solve the same problem and a successful "pythonic" > design has to pick one (preferably the best) of the possible ways and > make it obvious. > And it's probably impossible to agree as to what is "best" because of the different uses that array's receive. That's one reason I'm anxious to get a basic structure-only basearray into Python itself. > This said, let me present a specific problem that I will use to > illustrate my points below. Suppose we study school statistics in > different cities. Let city A have 10 schools with 20 classes and 30 > students in each. It is natural to organize the data collected about > the students in a 10x20x30 array. It is also natural to collect some > of the data at the per-school or per-class level. This data may come > from aggregating student level statistics (say average test score) or > from the characteristics that are class or school specific (say the > grade or primary language). There are two obvious ways to present > such data. 1) We can use 3-d arrays for everything and make the shape > of the per-class array 10x20x1 and the shape of per-school array > 10x1x1; and 2) use 2-d and 1-d arrays. The first approach seems to be > more flexible. We can also have 10x1x30 or 1x1x30 arrays to represent > data which varies along the student dimension, but is constant across > schools or classes. However, this added benefit is illusory: the > first student in one class list has no relationship to the first > student in the other class, so in this particular problem an average > score of the first student across classes makes no sense (it will also > depend on whether students are ordered alphabetically or by an > achievement rank). > > On the other hand this approach has a very significant drawback: > functions that process city data have no way to distinguish between > per-school data and a lucky city that can afford educating its > students in individual classes. Just as it is extremely unlikely to > have one student per class in our toy example, in real-world problems > it is not unreasonable to assume that dimension of size 1 represents > aggregate data. A software designed based on this assumption is what > I would call broken in a subtle way. > I think I see what you are saying. This is a very specific circumstance. I can verify that the ndarray has not been designed to distinguish such hierarchial data. You will never be able to tell from the array itself if a dimension of length 1 means aggregate data or not. I don't see that as a limitation of the ndarray but as evidence that another object (i.e. an R-like data-frame) should probably be used. Such an object could even be built on top of the ndarray. >> [...] >> I don't think anyone is fundamentally opposed to multiple repetitions. >> We're just being cautious. Also, as you've noted, the assignment code >> is currently not using the ufunc broadcasting code and so they really >> aren't the same thing, yet. >> > > It looks like there is a lot of development in this area going on at > the moment. Please let me know if I can help. > Well, I did some refactoring to make it easier to expose the basic concept of the ufunc elsewhere: 1) Adjusting the inputs to a common shape (this is what I call broadcasting --- it appears to me that you use the term a little more loosely) 2) Setting up iterators to iterate over all but the longest dimension so that the inner loop is done. These are the key ingredients to a fast ufunc. 
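[A rough pure-Python sketch of step 1, the shape-matching rule Travis describes; the function name and error message here are made up for illustration, and the real implementation lives in C:]

    import numpy

    def broadcast_shape(*shapes):
        # Right-align the shapes by prepending 1s, then require each
        # dimension to match or be 1 (a 1 stretches to the other size).
        ndim = max([len(s) for s in shapes])
        padded = [(1,) * (ndim - len(s)) + tuple(s) for s in shapes]
        result = []
        for dims in zip(*padded):
            d = max(dims)
            for x in dims:
                if x != 1 and x != d:
                    raise ValueError("shapes %r are not broadcastable" % (shapes,))
            result.append(d)
        return tuple(result)

    print(broadcast_shape((3, 4), (4,)))               # (3, 4)
    print(broadcast_shape((10, 20, 30), (10, 20, 1)))  # (10, 20, 30)
    # numpy agrees:
    print((numpy.zeros((3, 4)) + numpy.zeros(4)).shape)  # (3, 4)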
There is one more optimization in the ufunc machinery for the contiguous case (when the inner loop is all that is needed) and then there is code to handle the buffering needed for unaligned and/or byte-swapped data. The final thing that makes a ufunc is the precise signature of the inner loop. Every inner loop has the same signature. This signature does not contain a slot for the length of the array element (that's a big reason why variable-length arrays are not supported in ufuncs). The ufuncs could be adapted, of course, but it was a bigger fish than I wanted to try and fry pre-1.0.

Note, though, that I haven't used these concepts yet to implement ufunc-like copying. The PyArray_Cast function will also need to be adjusted at the same time and this could actually prove more difficult as it must implement buffering. Of course it could give us a chance to abstract-out the buffered, broadcasted call as well. That might make a useful C-API function.

Any help you can provide would be greatly appreciated. I'm focused right now on the scalar math module as without it, NumPy is still slower for people that use a lot of array elements.

>> [...]
>> >>> In my experience broadcasting length-1 and not broadcasting other
>> >>> lengths is very error prone as it is.
>>
>> That's not been my experience.
>
> I should have been more specific. As I explained above, the special
> properties of length-1 led me to design a system that distinguished
> aggregate data by testing for unit length. This system was subtly
> broken. In a rare case when the population had only one element, the
> system was producing wrong results.

Yes, I can see that now. Your comments make a lot more sense. Trying to use ndarrays to represent hierarchial data can cause these subtle issues. The ndarray is a "flat" object in the sense that every element is seen as "equal" to every other element.
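[Sasha's ambiguity, made concrete; a hypothetical illustration using the school example from earlier in the thread:]

    import numpy

    per_school = numpy.zeros((10, 1, 1))  # school-level aggregates
    tiny_city  = numpy.zeros((10, 1, 1))  # raw data: 1 class of 1 student per school

    # Nothing in the array records whether a length-1 axis means
    # "aggregated away" or "really just one element" -- the shapes,
    # and hence all broadcasting behavior, are identical:
    print(per_school.shape == tiny_city.shape)   # True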
> I think you've hit on something here regarding the use of an array for "hierachial" data. I'm not sure I understand the implications entirely, but at least it helps me a little bit see what your concerns really are. > I hope you understand that I did not mean to criticize anyone's coding > style. I was not really hinting at optimization issues, I just had a > particular design problem in mind (see above). I do understand much better now. I still need to think about the hierarchial case a bit more. My basic concept of an array which definitely biases me is a medical imaging volume.... (i.e. the X-ray density at each location in 3-space). I could use improved understanding of how to use array's effectively in hierarchies. Perhaps we can come up with some useful concepts (or maybe another useful structure that inherits from the basearray) and can therefore share data effectively with the ndarray.... > In the spirit of appealing to obscure languages ;-), let me mention > that in the K language (kx.com) element assignment is implemented > using an Amend primitive that takes four arguments: @[x,i,f,y] id more > or less equivalent to numpy's x[i] = f(x[i], y[i]), where x, y and i > are vectors and f is a binary (broadcasting) function. Thus, x[i] += > y[i] can be written as @[x,i,+,y] and x[i] = y[i] is @[x,i,:,y], where > ':' denotes a binary function that returns it's second argument and > ignores the first. K interpretor's Linux binary is less than 200K and > that includes a simple X window GUI! Such small code size would not be > possible without picking the right set of primitives and avoiding > special case code. > Not to mention limiting the number of data-types :-) > I know close to nothing about variable length arrays. When I need to > deal with the relational database data, I transpose it so that each > column gets into its own fixed length array. Yeah, that was my strategy too and what I always suggested to the numarray folks who wanted the variable-length arrays. But, memory-mapping can't be done that way.... > This is the approach > that both R and K take. However, at least at the C level, I don't see > why ufunc code cannot be generalized to handle variable length arrays. > They of course, could be, it's just more re-factoring than I wanted to do. The biggest issue is the underlying 1-d loop function signature. I hesitated to change the signature because that would break compatibility with Numeric extension modules that defined ufuncs (like scipy-special...) The length could piggy-back in the data argument passed into those functions, but doing that right was more work than I wanted to do. If you solve that problem, everything else could be made to work without too much trouble. > At the python level, pre-defined arithmetic or math functions are > probably not feasible for variable length, but the ability to define a > variable length array function by just writing an inner loop > implementation may be quite useful. > Yes, it could have helped write the string comparisons much faster :-) >> However, the broadcasting machinery has been abstracted in NumPy and can >> therefore be re-used in the copying code. In Numeric, broadcasting was >> basically implemented deep inside a confusing while loop. >> > > I've never understood the Numeric's while loop and completely agree > with your characterization. I am still studying the numpy code, but > it is clearly a big improvement. > Well, it's more straightforward because I'm not the genius Jim Hugunin is. 
It makes heavy use of the iterator concept which I finally grok'd while trying to write things (and realized I had basically already implemented in writing the old scipy.vectorize). I welcome many more eyes on the code. I know I've taken shortcuts in places that should be improved. Thanks for your continued help and useful comments. -Travis From tim.hochberg at cox.net Wed Apr 26 21:02:10 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Wed Apr 26 21:02:10 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> Message-ID: <44504296.8040602@cox.net> I haven't fully waded through all the various replies and to this thread. I plan to do that and send a reply on specific points later. This is message is more of a historical, motivational or possibly philosophical nature. First off, NumPy has used the term "broadcast" to mean the same thing since its inception and changing the terminology now is asking for confusion. *In the context of this mailing list *,I think we should use "broadcast" in the numpy sense and use appropriate qualifiers when referring to how other array packages practice broadcasting. Referring to broadcasting as "shape-preserving broadcasting" or some such doesn't seems to make things any clearer and adds a bunch of excess verbiage. In any event, I plan to omit any "broadcast" qualifiers here. The following understanding was formed by using and occasionally helping with development of NumPy since it was developed in 1995 or thereabouts. That doesn't mean that my understanding aggrees with the primary developers of the time, I may misremember things and my recollections are likely tinged by the experience I've had with NumPy in the interim. So, don't take this as definitive, but perhaps it will help provide some insight into what NumPy's broadcasting is supposed to be. Let's first dispense with the padding of dimensions. As I recall, this was a way to make matrix like operations easier. This was way before there was a matrix class and by defining padding in this way 1-D vectors could generally be treated as column vectors. Row vectors still needed to be 2-D (1xN), but they tended to be less frequent, so that was less of a burden. Or maybe I have that backwards, in any event they were put there to to facilitate matrix-like uses of numpy arrays. Given that there is a matrix class at this point, I doubt I would automagically pad the dimensions if I were designing numpy from scratch now. Since the dimension padding is at least partly historical accident and since it is in some sense orthogonal to the main point of numpy's broadcasting I'm going to pretend it doesn't exist for the rest of this discussion. At it's core broadcasting is about adjusting the shapes of two arrays so that they match. Consider an array 'A' and an array 'B' with shaps (3, Any) and (Any, 4). Here, 'Any' means that the given dimension of the array is unspecified and can take on any value that is convenient for functions operating on the array. If we add 'A' and 'B' together we'd like the two 'Any' dimensions to stretch appropriately so that the result was an array of shape (3, 4). Similarly adding and array of shape (3, 4) to an array of shape (Any, 4) should work and produce an array of shape (3, 4). So far, this is pretty straightforward; I believe, it also bears a fair amount of resemblance to Sasha's 0-stride ideas. 
The complicating factor is that there wasn't a good way to spell 'Any' at the time. Or maybe we were lazy. Or maybe there was some other reason that I'm forgetting. In any event, we ended up spelling 'Any' as '1'. That means that there's no way to distinguish between a dimension that's of length-1 for some legitimate reason and one that is that length just for stretchability. It would be an interesting experiment to see how things would work with no padding and with an explicit 'Any' value available for dimensions. However, it's probably too much work and would result in too many backwards compatibility problems for NumPy proper. [Half baked thoughts on how to do this though: newaxis would produce a new axis with length -1 (or some other marker length). This would be treated as length-1 axes are treated now. However, length-1axes would no longer broadcast. Padding would be right out.] In summary, the platonic ideal of broadcasting is simple and clean. In practice it's more complicated for two reasons. First, padding the dimensions.I believe that this is mostly historical baggage. The second is the conflation of '1' and 'Any' (a name that I made up for this message, so don't go searching for it). This may be an hostorical accident and/or implementation artifact, but there may actually be some more practical reasons behind this as well that I am forgetting. Hopefully that is mildly informative, Regards, -tim From kwgoodman at gmail.com Wed Apr 26 21:46:08 2006 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed Apr 26 21:46:08 2006 Subject: [Numpy-discussion] matrix.std() returns array Message-ID: I noticed that the mean of a matrix is a matrix but the standard deviation of a matrix is an array. Is that the expected behavior? I'm also getting the wrong values (0 and nan) for the standard deviation. Did I mess something up? I'm trying to learn scipy (and python) by porting a small Octave program. I installed numpy from svn (today) on a Debian box. And numpy.test() says OK. Here's an example: >> numpy.__version__ '0.9.7.2416' >> x = asmatrix(random.uniform(0,1,(3,3))) >> x matrix([[ 0.56771284, 0.57053769, 0.57505946], [ 0.10479534, 0.81692248, 0.91829316], [ 0.48627829, 0.59255983, 0.32628573]]) >> x.mean(0) matrix([[ 0.38626216, 0.66000667, 0.60654612]]) >> x.std(0) array([ nan, 0. , 0. ]) From arnd.baecker at web.de Wed Apr 26 23:01:03 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Wed Apr 26 23:01:03 2006 Subject: [Numpy-discussion] concatenate, doc-string In-Reply-To: References: Message-ID: On Wed, 26 Apr 2006, Sasha wrote: > On 4/26/06, David M. Cooke wrote: > > .... > > Here's what I just checked in: > > > > concatenate((a1, a2, ...), axis=None) joins arrays together > > > > The tuple of sequences (a1, a2, ...) are joined along the given axis > > (default is the first one) into a single numpy array. > > > > Example: > > > > >>> concatenate( ([0,1,2], [5,6,7]) ) > > array([0, 1, 2, 5, 6, 7]) > > > > The first argument does not have to be a tuple: > > >>> print concatenate([[0,1,2], [5,6,7]]) > [0 1 2 5 6 7] > > but the docstring is probably ok given that the alternative is > "sequence of sequences" ... Seems to be the usual problem of either being slightly unprecise but understandable or legally correct but impossible to understand (in particular for beginners). 
What about changing the example to: """ Examples: >>> concatenate(([0, 1, 2], [5, 6, 7])) array([0, 1, 2, 5, 6, 7]) >>> concatenate([[0, 1, 2], [5, 6, 7]]) array([0, 1, 2, 5, 6, 7]) >>> z = arange(5) >>> concatenate(([0, 1, 2], [5, 6, 7], z)) array([0, 1, 2, 5, 6, 7, 0, 1, 2, 3, 4]) """ Best, Arnd From Chris.Barker at noaa.gov Wed Apr 26 23:42:02 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed Apr 26 23:42:02 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: <444FE909.5080209@ieee.org> References: <444FE909.5080209@ieee.org> Message-ID: <445067C6.3050805@noaa.gov> Travis Oliphant wrote: > In Python 2.5 we are going to have the same issues with the new any() > and all() functions of Python. "Namespaces are one honking great idea -- let's do more of those!" Yet another reason to deprecate import * ! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From arnd.baecker at web.de Wed Apr 26 23:49:06 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Wed Apr 26 23:49:06 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: <444FE909.5080209@ieee.org> References: <444FE909.5080209@ieee.org> Message-ID: Moin, On Wed, 26 Apr 2006, Travis Oliphant wrote: > Ryan Krauss wrote: > > I was spending some time trying to track down how to speed up an > > algorithm that gets called a bunch of times during an optimization. I > > was startled when I finally figured out that most of the time was > > wasted by using the built-in pyhton min function. It turns out that > > in my case, using array.min() (i.e. the method of the Numpy array) is > > 300-500 times faster than the built-in python min function (i.e. > > min(array)). > > > > So, thank you Travis and everyone who has put so much time into > > thinking through Numpy and making it fast (as well as making sure it > > is correct). > > The builtin min function is a bit confusing because it usually does work > on NumPy arrays. But, as you've noticed it is always slower because it > uses the "generic sequence interface" that NumPy arrays expose. So, > it's basically not much faster than a Python loop. In this case you are > also being hit by the fact that scalarmath is not yet implemented (it's > getting close though...) so the returned array scalars are being > compared using the bulky ufunc machinery on each element separately. > > In Python 2.5 we are going to have the same issues with the new any() > and all() functions of Python. I am just preparing a small text to collect such cases for the wiki. However, I am not sure about a good name for such a page: http://www.scipy.org/Cookbook/Speed http://www.scipy.org/Cookbook/SpeedProblems http://www.scipy.org/Cookbook/Performance ? (As usual, it is easy to start a page, than to properly maintain it. OTOH things like this get lost very quickly, in particular with this nice amount of traffic here). In addition this also relates to - profiling (For example I would like to add the contents of http://mail.enthought.com/pipermail/enthought-dev/2006-January/001075.html to the wiki at some point) - psyco - pyrex - f2py - weave - numexpr - ... Presently much of this is listed in the Cookbook under "Using NumPy With Other Languages (Advanced)", and therefore the above "Python only" issues don't quite fit. Any suggestions? 
Best, Arnd From arnd.baecker at web.de Wed Apr 26 23:51:07 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Wed Apr 26 23:51:07 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: <445067C6.3050805@noaa.gov> References: <444FE909.5080209@ieee.org> <445067C6.3050805@noaa.gov> Message-ID: On Wed, 26 Apr 2006, Christopher Barker wrote: > Travis Oliphant wrote: > > > In Python 2.5 we are going to have the same issues with the new any() > > and all() functions of Python. > > "Namespaces are one honking great idea -- let's do more of those!" > > Yet another reason to deprecate import * ! Yep! But it would not work for `min` as there is no such function in numpy. (would we need one?...) Best, Arnd From Chris.Barker at noaa.gov Thu Apr 27 00:00:05 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu Apr 27 00:00:05 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> Message-ID: <44506BE6.10301@noaa.gov> As Sasha quite clearly pointed out, when you do aggregation, you really do want to reduce the dimensionality of your data. IN fact, that's something that always bit me with MATLAB. If I had a matrix that happened to have a dimension of 1, MATLAB would interpret it as a vector. I ended up writing functions like "SumColumns" that would check if it was a single row vector before calling sum, so that I wouldn't suddenly get a scaler result if a matrix happened to have on row. Once you reduce dimensionality with aggregating functions, I can see how it would be natural to want to use broadcasting to to merge the reduced data and full data. However, I can't see how you could do that cleanly. How is the code to know whether a rank-1 array represents a column or row when multiplied with a rank-2 array? There is simply no way to know, in general. I suppose we could define a convention, like: "rank-1 arrays will be interpreted as row vectors for broadcasting." etc. for higher dimensions. However, I've found that even in my code, I don't find one convention always makes the most sense for all applications, so I'm just as happy to make it clear with a lot of calls like: v.shape = (-1, 1) NOTE: It appears that numpy does, in fact, use such a convention: >>> v = N.arange(5) >>> m = N.ones((5,5)) >>> v * m array([[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]]) >>> v.shape = (-1,1) >>> v * m array([[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [2, 2, 2, 2, 2], [3, 3, 3, 3, 3], [4, 4, 4, 4, 4]]) So what's the disagreement about? -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Thu Apr 27 00:10:03 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu Apr 27 00:10:03 2006 Subject: [Numpy-discussion] concatenate, doc-string In-Reply-To: References: Message-ID: <44506E2F.9040902@noaa.gov> David M. Cooke wrote: > Here's what I just checked in: > > concatenate((a1, a2, ...), axis=None) joins arrays together > > The tuple of sequences (a1, a2, ...) are joined along the given axis > (default is the first one) into a single numpy array. 
> > Example: > > >>> concatenate( ([0,1,2], [5,6,7]) ) > array([0, 1, 2, 5, 6, 7]) While we're at it, why not an example of how the axis argument works: >>> concatenate( (ones((1,3)), zeros((1,3))) ) array([[1, 1, 1], [0, 0, 0]]) >>> concatenate( (ones((1,3)), zeros((1,3))), axis = 0 ) array([[1, 1, 1], [0, 0, 0]]) >>> concatenate( (ones((1,3)), zeros((1,3))), axis = 1 ) array([[1, 1, 1, 0, 0, 0]]) I'm not sure I like this example, but it's a easy way to do a one liner. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant.travis at ieee.org Thu Apr 27 00:53:00 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 27 00:53:00 2006 Subject: [Numpy-discussion] matrix.std() returns array In-Reply-To: References: Message-ID: <4450780C.9060403@ieee.org> Keith Goodman wrote: > I noticed that the mean of a matrix is a matrix but the standard > deviation of a matrix is an array. Is that the expected behavior? I'm > also getting the wrong values (0 and nan) for the standard deviation. > Did I mess something up? > > I'm trying to learn scipy (and python) by porting a small Octave > program. I installed numpy from svn (today) on a Debian box. And > numpy.test() says OK. > > This should be fixed now in SVN. If somebody can add a test that would be great. Note, that the methods taking axes also now preserve row and column orientation for matrices. -Travis From oliphant.travis at ieee.org Thu Apr 27 01:03:04 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 27 01:03:04 2006 Subject: [Numpy-discussion] Warnings in NumPy SVN Message-ID: <44507A9D.8070902@ieee.org> I want to apologize for the relative instability of the SVN tree in the past couple of days. Getting the scalarmath layout working took more C-API changes than I had anticipated. The SVN version of NumPy now builds scalarmath by default. The basic layout of the module is complete. However, there are many basic functions that are missing. As a result, during compile you will get many warnings about undefined functions. If an attempt were made to load the module it would cause an error as well due to undefined symbols. These undefined symbols are all the basic operations on fundamental c data-types that either need a function defined or a #define statement made. The names have this form: @name at _ctype_@oper@ where @name@ is one of the 16 Number-like types and @oper@ is one of the operations needing to be supported. The function (or macro) needs to implement the operation on the basic data-type and if necessary set an error-flag in the floating-point registers. If anybody has time to help implement these basic operations, it would be greatly appreciated. -Travis From zpincus at stanford.edu Thu Apr 27 01:22:05 2006 From: zpincus at stanford.edu (Zachary Pincus) Date: Thu Apr 27 01:22:05 2006 Subject: [Numpy-discussion] matrix.std() returns array In-Reply-To: <4450780C.9060403@ieee.org> References: <4450780C.9060403@ieee.org> Message-ID: <05B8DC8B-CD68-4EF2-BB2B-6FFABABF812E@stanford.edu> On a slightly-related note, was anyone able to reproduce the exception with matrix types and the var() method? e.g. numpy.matrix([[1,2,3], [1,2,3]]).var() complains about unaligned data... Presumably if std is fixed in SVN, so is var. Also if a std unit test is added, a var one should be too. 
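[A sketch of what such a test might look like; the function name is hypothetical, and the expected types follow Travis's description above:]

    import numpy
    from numpy import asmatrix, random

    def test_matrix_std_var():
        # std/var of a matrix should come back as matrices and preserve
        # the row orientation, matching mean()
        x = asmatrix(random.uniform(0, 1, (3, 3)))
        assert isinstance(x.std(0), numpy.matrix)
        assert isinstance(x.var(0), numpy.matrix)
        assert x.std(0).shape == (1, 3)
        # values should agree with the plain-array result
        a = numpy.asarray(x)
        assert numpy.allclose(numpy.asarray(x.std(0)).ravel(), a.std(0))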
Zach On Apr 27, 2006, at 12:51 AM, Travis Oliphant wrote: > Keith Goodman wrote: >> I noticed that the mean of a matrix is a matrix but the standard >> deviation of a matrix is an array. Is that the expected behavior? I'm >> also getting the wrong values (0 and nan) for the standard deviation. >> Did I mess something up? >> >> I'm trying to learn scipy (and python) by porting a small Octave >> program. I installed numpy from svn (today) on a Debian box. And >> numpy.test() says OK. >> >> > This should be fixed now in SVN. If somebody can add a test that > would be great. > > Note, that the methods taking axes also now preserve row and column > orientation for matrices. > > -Travis > > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, > security? > Get stuff done quickly with pre-integrated technology to make your > job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache > Geronimo > http://sel.as-us.falkag.net/sel? > cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion From arnd.baecker at web.de Thu Apr 27 03:06:17 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Thu Apr 27 03:06:17 2006 Subject: [Numpy-discussion] vectorize problem In-Reply-To: <444FA7E7.2070303@ieee.org> References: <200604251324.42987.steffen.loeck@gmx.de> <444FA7E7.2070303@ieee.org> Message-ID: On Wed, 26 Apr 2006, Travis Oliphant wrote: [...] > It is just a simple change. Scalars are supposed to be supported. > They aren't only as a side-effect of the switch to not return > object-scalars. I did not update the vectorize code to handle the > scalar return value from the object ufunc (which is now no-longer an > object-scalar with the methods of arrays (like astype) but is instead > the underlying object). > > I'll add a check. Works perfect now - many thanks! This reminds me of some other issue when trying to vectorize f2py-wrapped functions: Pearu suggested a fix in terms of a more general way to determine the number of arguments of a callable Python object, http://www.scipy.net/pipermail/scipy-user/2006-April/007617.html However, it seems that this has fallen through the cracks (and I don't see how to incorporate it into numpy.vectorize...) Is this another simple one? ;-) Many thanks, Arnd From gruben at bigpond.net.au Thu Apr 27 05:05:02 2006 From: gruben at bigpond.net.au (Gary Ruben) Date: Thu Apr 27 05:05:02 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: References: <444FE909.5080209@ieee.org> Message-ID: <4450B34F.8010501@bigpond.net.au> Hi Arnd, You could call it PerformanceTips and include some search terms like "speed" in the page so search engines pick them up. Gary R. Arnd Baecker wrote: > I am just preparing a small text to collect such cases for the wiki. > > However, I am not sure about a good name for such a page: > http://www.scipy.org/Cookbook/Speed > http://www.scipy.org/Cookbook/SpeedProblems > http://www.scipy.org/Cookbook/Performance > ? From ryanlists at gmail.com Thu Apr 27 06:41:08 2006 From: ryanlists at gmail.com (Ryan Krauss) Date: Thu Apr 27 06:41:08 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: <4450B34F.8010501@bigpond.net.au> References: <444FE909.5080209@ieee.org> <4450B34F.8010501@bigpond.net.au> Message-ID: I think this is a great idea. 
We get a lot of these kinds of questions on the list, and the collective wisdom of people here who have really dug into this is really impressive. But, that wisdom does need to be a little easier to find. Speaking of which, I don't always feel like I get trustworthy results out of the profiler, so when I really want to know what is going on I find myself doing this a lot: t1=time.time() [block of code here] t2=time.time() [more code] t3=time.time() and then comparing t3-t2 and t2-t1 to narrow down where the code is spending its time. Does anyone have good tips on how to do good profiling? Or is this question so vague and counter-intuitive that I seem silly and I had better come back with a believable example? Thanks, Ryan On 4/27/06, Gary Ruben wrote: > Hi Arnd, > > You could call it PerformanceTips and include some search terms like > "speed" in the page so search engines pick them up. > > Gary R. > > Arnd Baecker wrote: > > > I am just preparing a small text to collect such cases for the wiki. > > > > However, I am not sure about a good name for such a page: > > http://www.scipy.org/Cookbook/Speed > > http://www.scipy.org/Cookbook/SpeedProblems > > http://www.scipy.org/Cookbook/Performance > > ? > > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? > Get stuff done quickly with pre-integrated technology to make your job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From arnd.baecker at web.de Thu Apr 27 06:56:08 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Thu Apr 27 06:56:08 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: <4450B34F.8010501@bigpond.net.au> References: <444FE909.5080209@ieee.org> <4450B34F.8010501@bigpond.net.au> Message-ID: On Thu, 27 Apr 2006, Gary Ruben wrote: > Hi Arnd, > > You could call it PerformanceTips and include some search terms like > "speed" in the page so search engines pick them up. Alright, I put all I know on this (which is not that much ;-) at http://www.scipy.org/PerformanceTips The pointers to weave/f2py/pyrex/ (ah - psyco is missing) will have to be added. Also the profiling/benchmarking aspect, which is important (actually more important even before thinking about PerformanceTips) needs to be put somewhere, maybe even separately under http://www.scipy.org/BenchmarkingAndProfiling Best, Arnd > Gary R. > > Arnd Baecker wrote: > > > I am just preparing a small text to collect such cases for the wiki. > > > > However, I am not sure about a good name for such a page: > > http://www.scipy.org/Cookbook/Speed > > http://www.scipy.org/Cookbook/SpeedProblems > > http://www.scipy.org/Cookbook/Performance > > ? > > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? 
> Get stuff done quickly with pre-integrated technology to make your job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > From arnd.baecker at web.de Thu Apr 27 07:02:16 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Thu Apr 27 07:02:16 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: References: <444FE909.5080209@ieee.org> <4450B34F.8010501@bigpond.net.au> Message-ID: On Thu, 27 Apr 2006, Ryan Krauss wrote: > I think this is a great idea. We get a lot of these kinds of > questions on the list, and the collective wisdom of people here who > have really dug into this is really impressive. But, that wisdom does > need to be a little easier to find. > > Speaking of which, I don't always feel like I get trustworthy results > out of the profiler, so when I really want to know what is going on I > find myself doing this alot: > > t1=time.time() > [block of code here] > t2=time.time() > [more code] > t3=time.time() > > and then comparing t3-t2 and t2-t1 to narrow down where the code is > spending its time. > > Does anyone have good tips on how to do good profiling? Or is this > question so vague and counter-intuitive that I seem silly and I had > better come back with a believable example? Maybe this one is of interest then: http://www.physik.tu-dresden.de/~baecker/comp_talks.html and goto "Python and Co - some recent developments" Quite late in the talk there is an example on Profiling (sorry, it seems that no direct linking is possible) The corresponding files are at http://www.physik.tu-dresden.de/~baecker/talks/pyco/BenchExamples/ Essentially it is an example of using kcachegrind to display the results of hotshot (see also: http://mail.enthought.com/pipermail/enthought-dev/2006-January/001075.html ) Best, Arnd > Thanks, > > Ryan > > On 4/27/06, Gary Ruben wrote: > > Hi Arnd, > > > > You could call it PerformanceTips and include some search terms like > > "speed" in the page so search engines pick them up. > > > > Gary R. > > > > Arnd Baecker wrote: > > > > > I am just preparing a small text to collect such cases for the wiki. > > > > > > However, I am not sure about a good name for such a page: > > > http://www.scipy.org/Cookbook/Speed > > > http://www.scipy.org/Cookbook/SpeedProblems > > > http://www.scipy.org/Cookbook/Performance > > > ? > > > > > > > > ------------------------------------------------------- > > Using Tomcat but need to do more? Need to support web services, security? > > Get stuff done quickly with pre-integrated technology to make your job easier > > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? 
> Get stuff done quickly with pre-integrated technology to make your job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > From faltet at carabos.com Thu Apr 27 07:08:06 2006 From: faltet at carabos.com (Francesc Altet) Date: Thu Apr 27 07:08:06 2006 Subject: [Numpy-discussion] array.min() vs. min(array) In-Reply-To: References: <4450B34F.8010501@bigpond.net.au> Message-ID: <200604271606.52780.faltet@carabos.com> On Thursday 27 April 2006 15:40, Ryan Krauss wrote: > I think this is a great idea. We get a lot of these kinds of > questions on the list, and the collective wisdom of people here who > have really dug into this is really impressive. But, that wisdom does > need to be a little easier to find. > > Speaking of which, I don't always feel like I get trustworthy results > out of the profiler, so when I really want to know what is going on I > find myself doing this alot: > > t1=time.time() > [block of code here] > t2=time.time() > [more code] > t3=time.time() > > and then comparing t3-t2 and t2-t1 to narrow down where the code is > spending its time. > > Does anyone have good tips on how to do good profiling? Or is this > question so vague and counter-intuitive that I seem silly and I had > better come back with a believable example? Well, if you are on Linux, and want to time C extensions, then oprofile is a *very* good option. Another profiling tool is Cachegrind, part of Valgrind. It uses the processor emulation of Valgrind to run the executable, and catches all memory accesses for the trace. In addition, you can combine the output of oprofile with Cachegrind. In [3] one can see more info about these and more tools. [1] http://oprofile.sourceforge.net [2] http://kcachegrind.sourceforge.net/ [3] https://uimon.cern.ch/twiki/bin/view/Atlas/OptimisingCode Cheers, -- >0,0< Francesc Altet http://www.carabos.com/ V V Cárabos Coop. V. Enjoy Data "-" From lroubeyrie at limair.asso.fr Thu Apr 27 08:41:03 2006 From: lroubeyrie at limair.asso.fr (Lionel Roubeyrie) Date: Thu Apr 27 08:41:03 2006 Subject: [Numpy-discussion] equality with masked object In-Reply-To: References: <200604250938.48648.lroubeyrie@limair.asso.fr> Message-ID: <200604271740.11385.lroubeyrie@limair.asso.fr> Hi, thanks for your answer, but my problem is that I want to obtain the index of the max value in each column of a 2d masked array, then how can I do that without comparison? Thanks On Tuesday 25 April 2006 15:10, Sasha wrote: > On 4/25/06, Lionel Roubeyrie wrote: > > Why 5.0 == -- return True? A float is it the same as a masked object? > > thanks > > It does not. It returns ma.masked : > >>> test[3] is ma.masked > > True > > You should not access masked data - it makes no sense. The current > behavior is historical and I don't really like it. Masked scalars are > replaced by ma.masked singleton in subscript operations to allow a[i] > is masked idiom. In my view it is not worth the trouble, but my > suggestion to get rid of that feature was not met with much > enthusiasm. > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? 
> Get stuff done quickly with pre-integrated technology to make your job > easier Download IBM WebSphere Application Server v.1.0.1 based on Apache > Geronimo http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion -- Lionel Roubeyrie - lroubeyrie at limair.asso.fr LIMAIR http://www.limair.asso.fr From ndarray at mac.com Thu Apr 27 08:57:07 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 27 08:57:07 2006 Subject: [Numpy-discussion] equality with masked object In-Reply-To: <200604271740.11385.lroubeyrie@limair.asso.fr> References: <200604250938.48648.lroubeyrie@limair.asso.fr> <200604271740.11385.lroubeyrie@limair.asso.fr> Message-ID: On 4/27/06, Lionel Roubeyrie wrote: >[....................] I want to obtain the index of > the max value in each column of a 2d masked array, then how can I do that > without comparaison? ma.argmax(x, axis=0, fill_value=ma.maximum_fill_value(x)) or better: argmax(x.filled(ma.maximum_fill_value(x)), axis=0) From kwgoodman at gmail.com Thu Apr 27 09:32:10 2006 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu Apr 27 09:32:10 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <44506BE6.10301@noaa.gov> References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> <44506BE6.10301@noaa.gov> Message-ID: On 4/26/06, Christopher Barker wrote: > something that always bit me with MATLAB. If I had a matrix that > happened to have a dimension of 1, MATLAB would interpret it as a > vector. I ended up writing functions like "SumColumns" that would check > if it was a single row vector before calling sum, so that I wouldn't > suddenly get a scaler result if a matrix happened to have on row. In Octave or Matlab, all you need to do is sum(x,1). For example: >> x = rand(1,4) x = 0.56755 0.24575 0.53804 0.36521 >> sum(x,1) ans = 0.56755 0.24575 0.53804 0.36521 From schofield at ftw.at Thu Apr 27 09:50:03 2006 From: schofield at ftw.at (Ed Schofield) Date: Thu Apr 27 09:50:03 2006 Subject: [Numpy-discussion] matrix operations with axis=None In-Reply-To: <4450780C.9060403@ieee.org> References: <4450780C.9060403@ieee.org> Message-ID: <4450F6F4.2060800@ftw.at> Travis Oliphant wrote: > Keith Goodman wrote: >> I noticed that the mean of a matrix is a matrix but the standard >> deviation of a matrix is an array. Is that the expected behavior? I'm >> also getting the wrong values (0 and nan) for the standard deviation. >> Did I mess something up? > This should be fixed now in SVN. If somebody can add a test that > would be great. > > Note, that the methods taking axes also now preserve row and column > orientation for matrices. > Well done for doing this. In fact, you beat me to it by a few hours; I was going to post a patch this morning to preserve orientation with matrix operations. The approach I took was different in one respect. Matrix objects currently return a matrix of shape (1, 1) from methods with an axis=None argument. For example: >>> x = asmatrix(random.uniform(0,1,(3,3))) >>> x.std() matrix([[ 0.26890557]]) >>> x.argmax() matrix([[4]]) I believe this behaviour is unfortunate, and that an operation aggregating a matrix over all dimensions should return a scalar. I've posted a patch at http://projects.scipy.org/scipy/numpy/ticket/83 that modifies this behaviour to return scalars (as rank-0 arrays) instead. 
It also removes some code duplication. The behaviour with the patch is: >>> x.std() 0.29610630190701492 >>> x.std().shape () >>> x.argmax() 3 Returning scalars from methods with an axis=None argument is the current behaviour of scipy sparse matrices, while axis=0 or axis=1 yields a sparse matrix with height or width 1, like numpy matrices. A (1 x 1) sparse matrix would be a strange object indeed, and would not be usable in all contexts where scalars are expected. I suspect the same would hold for (1 x 1) dense matrices. One example is that they cannot be used as indices for Python lists. For some matrix methods, such as argmax, returning a scalar would be highly desirable by allowing simpler code. A potential drawback to this change is that matrix operations aggregating along all dimensions, which would now share the behaviour of numpy arrays, would no longer be consistent with matrix operations that aggregate along only one dimension, which currently do not reduce dimension, because matrices are inherently 2-d. This could be an argument for introducing a new vector class to represent one-dimensional data with orientation. -- Ed From gnchen at cortechs.net Thu Apr 27 09:56:12 2006 From: gnchen at cortechs.net (Gennan Chen) Date: Thu Apr 27 09:56:12 2006 Subject: [Numpy-discussion] newbie for writing numpy/scipy extensions Message-ID: <228EDE46-B760-44BA-A987-273F2ADC9B81@cortechs.net> Hi! All, I just started writing my own python extension based on numpy. Couple of questions here: 1. I have some utility functions, such as wrappers for PyArray_GETPTR*, that need to be accessed by different extension modules. So, I put them in utils.h and utils.c. In utils.h, I need to include "numpy/arrayobject.h". But the compilation failed when I include it again in my extension module file, wrap.c: #include "numpy/arrayobject.h" #include "utils.h" When I remove it and use #include "utils.h" the compilation works. So, is it true that I can only include arrayobject.h once? 2. Which import should I use in my init function: import_array() or import_libnumarray() Gen-Nan Chen, PhD Chief Scientist Research and Development Group CorTechs Labs Inc (www.cortechs.net) 1020 Prospect St., #304, La Jolla, CA, 92037 Tel: 1-858-459-9700 ext 16 Fax: 1-858-459-9705 Email: gnchen at cortechs.net From ndarray at mac.com Thu Apr 27 09:59:11 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 27 09:59:11 2006 Subject: [Numpy-discussion] Warnings in NumPy SVN In-Reply-To: <44507A9D.8070902@ieee.org> References: <44507A9D.8070902@ieee.org> Message-ID: On 4/27/06, Travis Oliphant wrote: > [...] > The function (or macro) needs to implement the operation on the basic > data-type and if necessary set an error-flag in the floating-point > registers. > > If anybody has time to help implement these basic operations, it would > be greatly appreciated. I can help. To make sure we don't duplicate our effort, let's do the following: 1. I will add place-holders for all the necessary functions to make them return "NotImplemented". 2. I will then follow up with the list of functions that need to be filled out and we can then split the work. 3. We will also need to write tests that will make sure scalars behave similarly to dimensionless arrays. If anyone would like to help with this, it will be greatly appreciated. No C coding skills are necessary for that. 
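A sketch of the kind of cross-check point 3 calls for -- the types and operators below are placeholders for a fuller sweep, and operator.div is the Python 2 division hook:

import operator
import numpy

def check_scalar_vs_zero_d(typ, a, b):
    sa, sb = typ(a), typ(b)                            # array scalars
    za, zb = numpy.array(a, typ), numpy.array(b, typ)  # 0-d arrays (ufunc path)
    for op in (operator.add, operator.sub, operator.mul, operator.div):
        assert op(sa, sb) == op(za, zb), (typ, op.__name__)

for typ in (numpy.int16, numpy.int32, numpy.float32, numpy.float64):
    check_scalar_vs_zero_d(typ, 7, 3)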
From oliphant at ee.byu.edu Thu Apr 27 10:01:07 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 27 10:01:07 2006 Subject: [Numpy-discussion] matrix operations with axis=None In-Reply-To: <4450F6F4.2060800@ftw.at> References: <4450780C.9060403@ieee.org> <4450F6F4.2060800@ftw.at> Message-ID: <4450F7F2.1050707@ee.byu.edu> Ed Schofield wrote: >Travis Oliphant wrote: > > >>Keith Goodman wrote: >> >> >>>I noticed that the mean of a matrix is a matrix but the standard >>>deviation of a matrix is an array. Is that the expected behavior? I'm >>>also getting the wrong values (0 and nan) for the standard deviation. >>>Did I mess something up? >>> >>> >>This should be fixed now in SVN. If somebody can add a test that >>would be great. >> >>Note, that the methods taking axes also now preserve row and column >>orientation for matrices. >> >> >> >Well done for doing this. > >In fact, you beat me to it by a few hours; I was going to post a patch >this morning to preserve orientation with matrix operations. The >approach I took was different in one respect. > > I like your function-call approach as it ensures consistent behavior. >Returning scalars from methods with an axis=None argument is the current >behaviour of scipy sparse matrices, while axis=0 or axis=1 yields a >sparse matrix with height or width 1, like numpy matrices. A (1 x 1) >sparse matrix would be a strange object indeed, and would not be usable >in all contexts where scalars are expected. I suspect the same would >hold for (1 x 1) dense matrices. One example is that they cannot be >used as indices for Python lists. For some matrix methods, such as >argmax, returning a scalar would be highly desirable by allowing simpler >code. > >A potential drawback to this change is that matrix operations >aggregating along all dimensions, which would now share the behaviour of >numpy arrays, would be no longer be consistent with matrix operations >that aggregate along only one dimension, which currently do not reduce >dimension, because matrices are inherently 2-d. This could be an >argument for introducing a new vector class to represent one-dimensional >data with orientation. > > There is one more problem in that matrix-operations will not be preserved in all cases as they would have before. However, I suppose somebody doing a reduce over all dimensions would probably not expect the result to be a matrix, so I don't think it's a big drawback. Consistency with sparse matrices is another reason for returning a scalar. -Travis From Fernando.Perez at colorado.edu Thu Apr 27 10:04:01 2006 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Thu Apr 27 10:04:01 2006 Subject: [Numpy-discussion] Warnings in NumPy SVN In-Reply-To: References: <44507A9D.8070902@ieee.org> Message-ID: <4450F93D.9050905@colorado.edu> Sasha wrote: > On 4/27/06, Travis Oliphant wrote: > >>[...] >>The function (or macro) needs to implement the operation on the basic >>data-type and if necessary set an error-flag in the floating-point >>registers. >> >>If anybody has time to help implement these basic operations, it would >>be greatly appreciated. > > > I can help. To make sure we don't duplicate our effort, let's do the following: > > 1. I will add place-holders for all the necessary functions to make > them return "NotImplemented". just a minor reminder: raise NotImplementedError is the standard idiom for this. 
Cheers, f From kwgoodman at gmail.com Thu Apr 27 10:05:05 2006 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu Apr 27 10:05:05 2006 Subject: [Numpy-discussion] matrix.std() returns array In-Reply-To: <4450780C.9060403@ieee.org> References: <4450780C.9060403@ieee.org> Message-ID: On 4/27/06, Travis Oliphant wrote: > This should be fixed now in SVN. If somebody can add a test that would > be great. > > Note, that the methods taking axes also now preserve row and column > orientation for matrices. Hey, it works. Thank you. From ndarray at mac.com Thu Apr 27 10:52:01 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 27 10:52:01 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> <44506BE6.10301@noaa.gov> Message-ID: On 4/27/06, Keith Goodman wrote: > [...] > In Octave or Matlab, all you need to do is sum(x,1). For example: > > >> x = rand(1,4) > x = > > 0.56755 0.24575 0.53804 0.36521 > > >> sum(x,1) > ans = > > 0.56755 0.24575 0.53804 0.36521 > How is this different from Numpy: >>> x = matrix(rand(4)) >>> sum(x.T, 1) matrix([[ 0.36186805], [ 0.90198107], [ 0.60407661], [ 0.49523327]]) From kwgoodman at gmail.com Thu Apr 27 11:05:03 2006 From: kwgoodman at gmail.com (Keith Goodman) Date: Thu Apr 27 11:05:03 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> <44506BE6.10301@noaa.gov> Message-ID: On 4/27/06, Sasha wrote: > On 4/27/06, Keith Goodman wrote: > > [...] > > In Octave or Matlab, all you need to do is sum(x,1). For example: > > > > >> x = rand(1,4) > > x = > > > > 0.56755 0.24575 0.53804 0.36521 > > > > >> sum(x,1) > > ans = > > > > 0.56755 0.24575 0.53804 0.36521 > > > > How is this different from Numpy: > > >>> x = matrix(rand(4)) > >>> sum(x.T, 1) > matrix([[ 0.36186805], > [ 0.90198107], > [ 0.60407661], > [ 0.49523327]]) > Exactly. That's why the OP doesn't need to write a special function in Matlab called SumColumns. From Chris.Barker at noaa.gov Thu Apr 27 11:11:03 2006 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu Apr 27 11:11:03 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> <44506BE6.10301@noaa.gov> Message-ID: <4451090C.5020901@noaa.gov> Keith Goodman wrote: > Exactly. That's why the OP doesn't need to write a special function in > Matlab called SumColumns. "Didn't". I haven't used MATLAB for much in years. Back in the day, that feature didn't exist. Or at least was poorly enough documented that I didn't think it existed. Matlab also used to support only 2-d arrays. Anyway, the point was that a (n,) array and a (n,1) array and a (1,n) array are all different, and that difference should be preserved. I'm still confused as to what behavior Sasha wants that doesn't exist. -Chris -- Christopher Barker, Ph.D. 
Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From oliphant at ee.byu.edu Thu Apr 27 11:17:02 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 27 11:17:02 2006 Subject: [Numpy-discussion] Warnings in NumPy SVN In-Reply-To: References: <44507A9D.8070902@ieee.org> Message-ID: <44510A6E.4090906@ee.byu.edu> Sasha wrote: >On 4/27/06, Travis Oliphant wrote: > > >>[...] >>The function (or macro) needs to implement the operation on the basic >>data-type and if necessary set an error-flag in the floating-point >>registers. >> >>If anybody has time to help implement these basic operations, it would >>be greatly appreciated. >> >> > >I can help. To make sure we don't duplicate our effort, let's do the following: > > > Thanks for your help. >1. I will add place-holders for all the necessary functions to make > > >them return "NotImplemented". > > The Python-object-returning functions are already there. All that is missing is the ctype functions to actually do the computation. So, I'm not sure what you mean. >2. I will then follow up with the list of functions that need to be >filled out and we can then split the work. > > This would be good to get a list. Some of the functions may require some repetition of what's in umathmodule.c. Let's just do the repetition for now and think about code refactoring after we know better what is actually duplicated. >3. We will also need to write tests that will make sure scalars behave >similar to dimensionless arrays. If anyone would like to help with >this, it will be greately appreciated. No C coding skills are >necessary for that. > > Tests would be necessary to ensure consistency. Thanks for jumping in... -Travis From cookedm at physics.mcmaster.ca Thu Apr 27 11:30:05 2006 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Thu Apr 27 11:30:05 2006 Subject: [Numpy-discussion] Warnings in NumPy SVN In-Reply-To: <4450F93D.9050905@colorado.edu> (Fernando Perez's message of "Thu, 27 Apr 2006 11:02:53 -0600") References: <44507A9D.8070902@ieee.org> <4450F93D.9050905@colorado.edu> Message-ID: Fernando Perez writes: > Sasha wrote: >> On 4/27/06, Travis Oliphant wrote: >> >>>[...] >>>The function (or macro) needs to implement the operation on the basic >>>data-type and if necessary set an error-flag in the floating-point >>>registers. >>> >>>If anybody has time to help implement these basic operations, it would >>>be greatly appreciated. >> I can help. To make sure we don't duplicate our effort, let's do >> the following: >> 1. I will add place-holders for all the necessary functions to make >> them return "NotImplemented". > > just a minor reminder: > > raise NotImplementedError > > is the standard idiom for this. Just a note: For __xxx__ methods, "return NotImplemented" is the standard idiom. See section 3.3.8 (Coercion rules) of the Python 2.4 language manual: For most intents and purposes, an operator that returns NotImplemented is treated the same as one that is not implemented at all. I believe the idea is that it's not actually an error for an __xxx__ method to not be implemented, as there are fallbacks. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. 
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From ndarray at mac.com Thu Apr 27 11:32:08 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 27 11:32:08 2006 Subject: [Numpy-discussion] Warnings in NumPy SVN In-Reply-To: <44510A6E.4090906@ee.byu.edu> References: <44507A9D.8070902@ieee.org> <44510A6E.4090906@ee.byu.edu> Message-ID: On 4/27/06, Travis Oliphant wrote: > [ ... ] > > The Python-object-returning functions are already there. All that is > missing is the ctype functions to actually do the computation. So, I'm > not sure what you mean. > I did not realize that. However, it is still reasonable to add non-working prototypes to kill the warnings first marked by /* XXX */. I will do that before the end of the day. > >2. I will then follow up with the list of functions that need to be > >filled out and we can then split the work. > > > > > This would be good to get a list. See attached. -------------- next part -------------- byte_ctype_multiply ubyte_ctype_multiply short_ctype_multiply ushort_ctype_multiply int_ctype_multiply uint_ctype_multiply long_ctype_multiply ulong_ctype_multiply longlong_ctype_multiply ulonglong_ctype_multiply byte_ctype_divide ubyte_ctype_divide short_ctype_divide ushort_ctype_divide int_ctype_divide uint_ctype_divide long_ctype_divide ulong_ctype_divide longlong_ctype_divide ulonglong_ctype_divide byte_ctype_remainder ubyte_ctype_remainder short_ctype_remainder ushort_ctype_remainder int_ctype_remainder uint_ctype_remainder long_ctype_remainder ulong_ctype_remainder longlong_ctype_remainder ulonglong_ctype_remainder byte_ctype_divmod ubyte_ctype_divmod short_ctype_divmod ushort_ctype_divmod int_ctype_divmod uint_ctype_divmod long_ctype_divmod ulong_ctype_divmod longlong_ctype_divmod ulonglong_ctype_divmod byte_ctype_power ubyte_ctype_power short_ctype_power ushort_ctype_power int_ctype_power uint_ctype_power long_ctype_power ulong_ctype_power longlong_ctype_power ulonglong_ctype_power byte_ctype_floor_divide ubyte_ctype_floor_divide short_ctype_floor_divide ushort_ctype_floor_divide int_ctype_floor_divide uint_ctype_floor_divide long_ctype_floor_divide ulong_ctype_floor_divide longlong_ctype_floor_divide ulonglong_ctype_floor_divide byte_ctype_true_divide ubyte_ctype_true_divide short_ctype_true_divide ushort_ctype_true_divide int_ctype_true_divide uint_ctype_true_divide long_ctype_true_divide ulong_ctype_true_divide longlong_ctype_true_divide ulonglong_ctype_true_divide byte_ctype_lshift ubyte_ctype_lshift short_ctype_lshift ushort_ctype_lshift int_ctype_lshift uint_ctype_lshift long_ctype_lshift ulong_ctype_lshift longlong_ctype_lshift ulonglong_ctype_lshift byte_ctype_rshift ubyte_ctype_rshift short_ctype_rshift ushort_ctype_rshift int_ctype_rshift uint_ctype_rshift long_ctype_rshift ulong_ctype_rshift longlong_ctype_rshift ulonglong_ctype_rshift byte_ctype_and ubyte_ctype_and short_ctype_and ushort_ctype_and int_ctype_and uint_ctype_and long_ctype_and ulong_ctype_and longlong_ctype_and ulonglong_ctype_and byte_ctype_or ubyte_ctype_or short_ctype_or ushort_ctype_or int_ctype_or uint_ctype_or long_ctype_or ulong_ctype_or longlong_ctype_or ulonglong_ctype_or byte_ctype_xor ubyte_ctype_xor short_ctype_xor ushort_ctype_xor int_ctype_xor uint_ctype_xor long_ctype_xor ulong_ctype_xor longlong_ctype_xor ulonglong_ctype_xor float_ctype_remainder double_ctype_remainder longdouble_ctype_remainder cfloat_ctype_remainder cdouble_ctype_remainder clongdouble_ctype_remainder float_ctype_divmod double_ctype_divmod longdouble_ctype_divmod 
cfloat_ctype_divmod cdouble_ctype_divmod clongdouble_ctype_divmod float_ctype_power double_ctype_power longdouble_ctype_power cfloat_ctype_power cdouble_ctype_power clongdouble_ctype_power cfloat_cfloat_divide cdouble_cfloat_divide clongdouble_cfloat_divide byte_ctype_negative ubyte_ctype_negative short_ctype_negative ushort_ctype_negative int_ctype_negative uint_ctype_negative long_ctype_negative ulong_ctype_negative longlong_ctype_negative ulonglong_ctype_negative float_ctype_negative double_ctype_negative longdouble_ctype_negative cfloat_ctype_negative cdouble_ctype_negative clongdouble_ctype_negative byte_ctype_positive ubyte_ctype_positive short_ctype_positive ushort_ctype_positive int_ctype_positive uint_ctype_positive long_ctype_positive ulong_ctype_positive longlong_ctype_positive ulonglong_ctype_positive float_ctype_positive double_ctype_positive longdouble_ctype_positive cfloat_ctype_positive cdouble_ctype_positive clongdouble_ctype_positive byte_ctype_absolute ubyte_ctype_absolute short_ctype_absolute ushort_ctype_absolute int_ctype_absolute uint_ctype_absolute long_ctype_absolute ulong_ctype_absolute longlong_ctype_absolute ulonglong_ctype_absolute float_ctype_absolute double_ctype_absolute longdouble_ctype_absolute cfloat_ctype_absolute cdouble_ctype_absolute clongdouble_ctype_absolute byte_ctype_nonzero ubyte_ctype_nonzero short_ctype_nonzero ushort_ctype_nonzero int_ctype_nonzero uint_ctype_nonzero long_ctype_nonzero ulong_ctype_nonzero longlong_ctype_nonzero ulonglong_ctype_nonzero float_ctype_nonzero double_ctype_nonzero longdouble_ctype_nonzero cfloat_ctype_nonzero cdouble_ctype_nonzero clongdouble_ctype_nonzero byte_ctype_invert ubyte_ctype_invert short_ctype_invert ushort_ctype_invert int_ctype_invert uint_ctype_invert long_ctype_invert ulong_ctype_invert longlong_ctype_invert ulonglong_ctype_invert float_ctype_invert double_ctype_invert longdouble_ctype_invert cfloat_ctype_invert cdouble_ctype_invert clongdouble_ctype_invert From cookedm at physics.mcmaster.ca Thu Apr 27 11:32:11 2006 From: cookedm at physics.mcmaster.ca (David M. Cooke) Date: Thu Apr 27 11:32:11 2006 Subject: [Numpy-discussion] newbie for writing numpy/scipy extensions In-Reply-To: <228EDE46-B760-44BA-A987-273F2ADC9B81@cortechs.net> (Gennan Chen's message of "Thu, 27 Apr 2006 09:55:42 -0700") References: <228EDE46-B760-44BA-A987-273F2ADC9B81@cortechs.net> Message-ID: Gennan Chen writes: > Hi! All, > > I just start writing my own python extension based on numpy. Couple > of questions here: > > 1. I have some utility functions, such as wrappers for > PyArray_GETPTR* needed be access by different extension modules. So, > I put them in utlis.h and utlis.c. In utils.h, I need to include > "numpy/arrayobject.h". But the compilation failed when I include it > again in my extension module function, wrap.c: > > #include "numpy/arrayobject.h" > #include "utils.h" > > When I remove it and use > > #include "utils.h" > > the compilation works. So, is it true that I can only include > arrayobject.h once? What is the compiler error message? > 2. which import I should use in my initial function: > > import_array() This one. It's the one to use for Numeric, numarray, and numpy. > or > import_libnumarray() This is for numarray, the other Numeric derivative. It pulls in the numarray-specific stuff IIRC. -- |>|\/|< /--------------------------------------------------------------------------\ |David M. 
Cooke http://arbutus.physics.mcmaster.ca/dmc/ |cookedm at physics.mcmaster.ca From oliphant at ee.byu.edu Thu Apr 27 11:36:06 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 27 11:36:06 2006 Subject: [Numpy-discussion] newbie for writing numpy/scipy extensions In-Reply-To: <228EDE46-B760-44BA-A987-273F2ADC9B81@cortechs.net> References: <228EDE46-B760-44BA-A987-273F2ADC9B81@cortechs.net> Message-ID: <44510F04.3020806@ee.byu.edu> Gennan Chen wrote: > Hi! All, > > I just start writing my own python extension based on numpy. Couple > of questions here: > > 1. I have some utility functions, such as wrappers for > PyArray_GETPTR* needed be access by different extension modules. So, > I put them in utlis.h and utlis.c. In utils.h, I need to include > "numpy/arrayobject.h". But the compilation failed when I include it > again in my extension module function, wrap.c: > > #include "numpy/arrayobject.h" > #include "utils.h" > > When I remove it and use > > #include "utils.h" > > the compilation works. So, is it true that I can only include > arrayobject.h once? No, you can include arrayobject.h more than once. However, if you make use of C-API functions (not just macros that access elements of the array) in more than one file for the same extension module, you need to do a couple of things to make it work. In the original file you must define PY_ARRAY_UNIQUE_SYMBOL to something unique to your extension module before you include the arrayobject.h file. In the helper c file you must define PY_ARRAY_UNIQUE_SYMBOL and define NO_IMPORT_ARRAY prior to including the arrayobject.h Thus, in wrap.c you do (feel free to change the name from _chen_extension to something else) #define PY_ARRAY_UNIQUE_SYMBOL _chen_extension #include "numpy/arrayobject.h" and in utils.c you do #define PY_ARRAY_UNIQUE_SYMBOL _chen_extension #define NO_IMPORT_ARRAY #include "numpy/arrayobject.h" > > 2. which import I should use in my initial function: > > import_array() import_array() -Travis From oliphant at ee.byu.edu Thu Apr 27 11:40:10 2006 From: oliphant at ee.byu.edu (Travis Oliphant) Date: Thu Apr 27 11:40:10 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <4451090C.5020901@noaa.gov> References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> <44506BE6.10301@noaa.gov> <4451090C.5020901@noaa.gov> Message-ID: <44510FD2.1090502@ee.byu.edu> Christopher Barker wrote: > Keith Goodman wrote: > >> Exactly. That's why the OP doesn't need to write a special function in >> Matlab called SumColumns. > > > "Didn't". I haven't used MATLAB for much in years. Back in the day, > that feature didn't exist. Or at least was poorly enough documented > that i didn't think it existed. Matlab didn't used to only support 2-d > arrays as well. > > Anyway, the point was that a (n,) array and a (n,1) array and a (1,n) > array are all different, and that difference should be preserved. > > I'm still confused as to what behavior Sasha wants that doesn't exist. I'm not exactly sure. But, one of the things I think he has suggested (please tell me if my understanding is wrong) is to allow a 2x3 array to be "broadcast" to a (2n)x(3m) array by repeated copying as needed. 
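For concreteness, that kind of repetition can already be spelled today with a reshaped view, along the lines of the reshape example further down the thread -- a sketch with n = m = 2 hard-coded:

import numpy as N

a = N.arange(6).reshape((2, 3))   # the 2x3 array to be repeated
big = N.empty((4, 6), a.dtype)    # the (2n) x (3m) target, here n = m = 2
# View big as (block-row, row-in-block, block-col, col-in-block); the
# (2, 1, 3)-shaped right-hand side then broadcasts across both block axes.
big.reshape((2, 2, 2, 3))[...] = a[:, N.newaxis, :]
# big now holds [[a a], [a a]]

The proposal would let the ufunc machinery do this copying implicitly instead of requiring the explicit blocked view.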
-Travis From gnchen at cortechs.net Thu Apr 27 12:24:38 2006 From: gnchen at cortechs.net (Gennan Chen) Date: Thu Apr 27 12:24:38 2006 Subject: [Numpy-discussion] newbie for writing numpy/scipy extensions In-Reply-To: <44510F04.3020806@ee.byu.edu> References: <228EDE46-B760-44BA-A987-273F2ADC9B81@cortechs.net> <44510F04.3020806@ee.byu.edu> Message-ID: Thanks! That solve the problem. May I ask what does those #define really means?? Gen On Apr 27, 2006, at 11:35 AM, Travis Oliphant wrote: > Gennan Chen wrote: > >> Hi! All, >> >> I just start writing my own python extension based on numpy. >> Couple of questions here: >> >> 1. I have some utility functions, such as wrappers for >> PyArray_GETPTR* needed be access by different extension modules. >> So, I put them in utlis.h and utlis.c. In utils.h, I need to >> include "numpy/arrayobject.h". But the compilation failed when I >> include it again in my extension module function, wrap.c: >> >> #include "numpy/arrayobject.h" >> #include "utils.h" >> >> When I remove it and use >> >> #include "utils.h" >> >> the compilation works. So, is it true that I can only include >> arrayobject.h once? > > > No, you can include arrayobject.h more than once. However, if you > make use of C-API functions (not just macros that access elements > of the array) in more than one file for the same extension module, > you need to do a couple of things to make it work. > > In the original file you must define PY_ARRAY_UNIQUE_SYMBOL to > something unique to your extension module before you include the > arrayobject.h file. > > In the helper c file you must define PY_ARRAY_UNIQUE_SYMBOL and > define NO_IMPORT_ARRAY prior to including the arrayobject.h > > Thus, in wrap.c you do (feel free to change the name from > _chen_extension to something else) > > #define PY_ARRAY_UNIQUE_SYMBOL _chen_extension #include "numpy/ > arrayobject.h" > > and in > > utils.c you do > > #define PY_ARRAY_UNIQUE_SYMBOL _chen_extension #define > NO_IMPORT_ARRAY > #include "numpy/arrayobject.h" > > >> >> 2. which import I should use in my initial function: >> >> import_array() > > > import_array() > > -Travis > > From gnchen at cortechs.net Thu Apr 27 12:24:41 2006 From: gnchen at cortechs.net (Gennan Chen) Date: Thu Apr 27 12:24:41 2006 Subject: [Numpy-discussion] newbie for writing numpy/scipy extensions In-Reply-To: References: <228EDE46-B760-44BA-A987-273F2ADC9B81@cortechs.net> Message-ID: <8CD47186-A354-4C8A-B5AF-8BEC2CE82D2E@cortechs.net> Got it. Looks like ndimage still used the old one. Gen-Nan Chen, PhD Chief Scientist Research and Development Group CorTechs Labs Inc (www.cortechs.net) 1020 Prospect St., #304, La Jolla, CA, 92037 Tel: 1-858-459-9700 ext 16 Fax: 1-858-459-9705 Email: gnchen at cortechs.net On Apr 27, 2006, at 11:31 AM, David M. Cooke wrote: > Gennan Chen writes: > >> Hi! All, >> >> I just start writing my own python extension based on numpy. Couple >> of questions here: >> >> 1. I have some utility functions, such as wrappers for >> PyArray_GETPTR* needed be access by different extension modules. So, >> I put them in utlis.h and utlis.c. In utils.h, I need to include >> "numpy/arrayobject.h". But the compilation failed when I include it >> again in my extension module function, wrap.c: >> >> #include "numpy/arrayobject.h" >> #include "utils.h" >> >> When I remove it and use >> >> #include "utils.h" >> >> the compilation works. So, is it true that I can only include >> arrayobject.h once? > > What is the compiler error message? > >> 2. 
which import I should use in my initial function: >> >> import_array() > > This one. It's the one to use for Numeric, numarray, and numpy. > >> or >> import_libnumarray() > > This is for numarray, the other Numeric derivative. It pulls in the > numarray-specific stuff IIRC. > > -- > |>|\/|< > /--------------------------------------------------------------------- > -----\ > |David M. Cooke http:// > arbutus.physics.mcmaster.ca/dmc/ > |cookedm at physics.mcmaster.ca > From ndarray at mac.com Thu Apr 27 12:29:03 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 27 12:29:03 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: <44510FD2.1090502@ee.byu.edu> References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> <44506BE6.10301@noaa.gov> <4451090C.5020901@noaa.gov> <44510FD2.1090502@ee.byu.edu> Message-ID: On 4/27/06, Travis Oliphant wrote: > [...] > > I'm still confused as to what behavior Sasha wants that doesn't exist. > > > I'm not exactly sure. But, one of the things I think he has suggested > (please tell me if my understanding is wrong) is to allow a 2x3 array to > be "broadcast" to a (2n)x(3m) array by repeated copying as needed. Yes, this is the only new feature that I've suggested. I was also hoping that the same code that allows shape=(3,) being broadcast to shape (2,3) can be reused to broadcast (3,) to (6,). The idea is that since in terms of memory operations broadcasting and repetition are the same, the code can be reused. And since repetition can be achieved using broadcasting: >>> x = zeros(6) >>> x.reshape((2,3)) += arange(3) >>> x array([0, 1, 2, 0, 1, 2]) if we allow x += arange(3), it can use the same code as broadcasting internally. From ndarray at mac.com Thu Apr 27 12:30:05 2006 From: ndarray at mac.com (Sasha) Date: Thu Apr 27 12:30:05 2006 Subject: [Numpy-discussion] Broadcasting rules (Ticket 76). In-Reply-To: References: <4050236.1146000212838.JavaMail.root@fed1wml05.mgt.cox.net> <444F0420.9000500@ieee.org> <44506BE6.10301@noaa.gov> <4451090C.5020901@noaa.gov> <44510FD2.1090502@ee.byu.edu> Message-ID: On 4/27/06, Sasha wrote: > >>> x.reshape((2,3)) += arange(3) Oops, that should have been >>> x.reshape((2,3))[...] += arange(3) From Fernando.Perez at colorado.edu Thu Apr 27 12:58:02 2006 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Thu Apr 27 12:58:02 2006 Subject: [Numpy-discussion] Warnings in NumPy SVN In-Reply-To: References: <44507A9D.8070902@ieee.org> <4450F93D.9050905@colorado.edu> Message-ID: <44512213.9090902@colorado.edu> David M. Cooke wrote: > Fernando Perez writes: > > >>Sasha wrote: >> >>>On 4/27/06, Travis Oliphant wrote: >>> >>> >>>>[...] >>>>The function (or macro) needs to implement the operation on the basic >>>>data-type and if necessary set an error-flag in the floating-point >>>>registers. >>>> >>>>If anybody has time to help implement these basic operations, it would >>>>be greatly appreciated. >>> >>>I can help. To make sure we don't duplicate our effort, let's do >>>the following: >>>1. I will add place-holders for all the necessary functions to make >>>them return "NotImplemented". >> >>just a minor reminder: >> >> raise NotImplementedError >> >>is the standard idiom for this. > > > Just a note: For __xxx__ methods, "return NotImplemented" is the > standard idiom. 
See section 3.3.8 (Coercion rules) of the Python 2.4 > language manual: > > For most intents and purposes, an operator that returns > NotImplemented is treated the same as one that is not implemented > at all. > > I believe the idea is that it's not actually an error for an __xxx__ > method to not be implemented, as there are fallbacks. You are right. It's worth remembering that the actual syntaxes are return NotImplemented and raise NotImplementedError /without/ quotes (as per the original msg), since these are actual python builtins, not strings. That way they can be properly handled by their return value or proper exception handling. Cheers, f From nvf at MIT.EDU Thu Apr 27 21:02:03 2006 From: nvf at MIT.EDU (Nick Fotopoulos) Date: Thu Apr 27 21:02:03 2006 Subject: [Numpy-discussion] Freeing memory allocated in C Message-ID: <1C119933-F93B-47C6-AADA-4A61DF16B745@mit.edu> Dear numpy-discussion, I have written a python module in C which wraps a C library (FrameL) in order to read data from specially formatted files into Python arrays. It works, but I think I have a memory leak, and I can't see what I might be doing wrong. This Python wrapper is almost identical to a Matlab wrapper, but the Matlab version doesn't leak. Perhaps someone here can help me out? I have read in many places that to return an array, one should wrap with PyArray_FromDimsAndData (or more modern versions) and then return it without freeing the memory. Does the same principle hold for strings? Are the following example snippets correct? // output2 = x-axis values relative to first data point. data = malloc(nData*sizeof(double)); for(i=0; i<nData; i++) { data[i] = vect->startX[0]+(double)i*dt; } shape[0] = nData; out2 = (PyArrayObject *) PyArray_FromDimsAndData(1,shape,PyArray_DOUBLE,(char *)data); //snip // output5 = gps start time as a string utc = vect->GTime - vect->ULeapS + FRGPSTAI; out5 = malloc(200*sizeof(char)); sprintf(out5,"Starting GPS time:%.1f UTC=%s", vect->GTime,FrStrGTime(utc)); //snip -- Free all memory not assigned to a return object return Py_BuildValue("(OOOdsss)",out1,out2,out3,out4,out5,out6,out7); I see in the Numpy book that I should modernize PyArray_FromDimsAndData, but will it be incompatible with users who have only Numeric? If the code above should not leak under your inspection, are there any other common places that python C modules often leak that I should check? As a side note, here is how I have been defining "leak". I have been measuring memory usage by opening a pipe to ps to check rss between reading in frames and invoking del on them. Memory usage increases, but does not decrease. In contrast, if I commit the same data in an array to a pickle file and read that in, invoking del reduces memory usage. 
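A minimal version of that ps-based check might look like the following (Linux/Unix only, and the ps field names vary a little between systems):

import os

def rss_kb():
    # resident set size of the current process, in kilobytes
    return int(os.popen('ps -o rss= -p %d' % os.getpid()).read())

before = rss_kb()
# ... read a frame and del it here ...
after = rss_kb()
print 'rss changed by %d kB' % (after - before)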
Many thanks, Nick From robert.kern at gmail.com Thu Apr 27 21:14:02 2006 From: robert.kern at gmail.com (Robert Kern) Date: Thu Apr 27 21:14:02 2006 Subject: [Numpy-discussion] Re: Freeing memory allocated in C In-Reply-To: <1C119933-F93B-47C6-AADA-4A61DF16B745@mit.edu> References: <1C119933-F93B-47C6-AADA-4A61DF16B745@mit.edu> Message-ID: Nick Fotopoulos wrote: > Dear numpy-discussion, > > I have written a python module in C which wraps a C library (FrameL) in > order to read data from specially formatted files into Python arrays. > It works, but I think have a memory leak, and I can't see what I might > be doing wrong. This Python wrapper is almost identical to a Matlab > wrapper, but the Matlab version doesn't leak. Perhaps someone here can > help me out? > > I have read in many places that to return an array, one should wrap > with PyArray_FromDimsAndData (or more modern versions) and then return > it without freeing the memory. Does the same principle hold for > strings? Are the following example snippets correct? > > // output2 = x-axis values relative to first data point. > data = malloc(nData*sizeof(double)); > for(i=0; i<nData; i++) { > data[i] = vect->startX[0]+(double)i*dt; > } > shape[0] = nData; > out2 = (PyArrayObject *) > PyArray_FromDimsAndData(1,shape,PyArray_DOUBLE,(char *)data); I wouldn't rely on PyArray_FromDimsAndData doing the right thing. Instead of malloc'ing a block of memory, why don't you create an empty array of the right size, use its data pointer to fill it with that for-loop, and then return that array object? > //snip > > // output5 = gps start time as a string > utc = vect->GTime - vect->ULeapS + FRGPSTAI; > out5 = malloc(200*sizeof(char)); > sprintf(out5,"Starting GPS time:%.1f UTC=%s", > vect->GTime,FrStrGTime(utc)); > > //snip -- Free all memory not assigned to a return object > > return Py_BuildValue("(OOOdsss)",out1,out2,out3,out4,out5,out6,out7); > > I see in the Numpy book that I should modernize > PyArray_FromDimsAndData, but will it be incompatible with users who > have only Numeric? Yes. However, I would suggest that new code should probably just use numpy fully, especially if the restrictions of the old Numeric API are causing you pain. The longer people support both, the longer people will *have* to support both. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From oliphant.travis at ieee.org Thu Apr 27 21:40:04 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Thu Apr 27 21:40:04 2006 Subject: [Numpy-discussion] Freeing memory allocated in C In-Reply-To: <1C119933-F93B-47C6-AADA-4A61DF16B745@mit.edu> References: <1C119933-F93B-47C6-AADA-4A61DF16B745@mit.edu> Message-ID: <44519C6E.80006@ieee.org> Nick Fotopoulos wrote: > Dear numpy-discussion, > > I have written a python module in C which wraps a C library (FrameL) > in order to read data from specially formatted files into Python > arrays. It works, but I think have a memory leak, and I can't see > what I might be doing wrong. This Python wrapper is almost identical > to a Matlab wrapper, but the Matlab version doesn't leak. Perhaps > someone here can help me out? > > I have read in many places that to return an array, one should wrap > with PyArray_FromDimsAndData (or more modern versions) and then return > it without freeing the memory. Does the same principle hold for > strings? 
Are the following example snippets correct? Why don't you just use PyArray_FromDims and let NumPy manage the memory? FromDimsAndData is only for situations where you can't manage the memory with Python. Therefore the memory is never freed. If you do want to have NumPy deallocate the memory when you are done, then you have to 1) Make sure you are using the same allocator as NumPy is... _pya_malloc is defined in arrayobject.h (in NumPy but not in Numeric) 2) Reset the array flag so that OWN_DATA is set out2->flags |= OWN_DATA As long as you are using the same memory allocator, this should work. The OWN_DATA flag instructs the deallocator to free the data. But, I would strongly suggest just using PyArray_FromDims and let NumPy allocate the new array for you. > > // output2 = x-axis values relative to first data point. > data = malloc(nData*sizeof(double)); > for(i=0; i<nData; i++) { > data[i] = vect->startX[0]+(double)i*dt; > } > shape[0] = nData; > out2 = (PyArrayObject *) > PyArray_FromDimsAndData(1,shape,PyArray_DOUBLE,(char *)data); > > //snip > > // output5 = gps start time as a string > utc = vect->GTime - vect->ULeapS + FRGPSTAI; > out5 = malloc(200*sizeof(char)); > sprintf(out5,"Starting GPS time:%.1f UTC=%s", > vect->GTime,FrStrGTime(utc)); > > //snip -- Free all memory not assigned to a return object > > return Py_BuildValue("(OOOdsss)",out1,out2,out3,out4,out5,out6,out7); > > > I see in the Numpy book that I should modernize > PyArray_FromDimsAndData, but will it be incompatible with users who > have only Numeric? Yes, the only issue, however, is that PyArray_FromDims and friends will only allow int-length sizes which on 64-bit computers is not as large as intp-length sizes. So, if you don't care about allowing large sizes then you can use the old Numeric C-API. > > If the code above should not leak under your inspection, are there any > other common places that python C modules often leak that I should check? All of the malloc calls in your code leak. In general you should not assume that Python will deallocate memory you have allocated. Python uses its own memory manager so even if you manage to arrange things so that Python will free your memory (and you really have to hack things to do that), then you can run into trouble if you try mixing system malloc calls with Python's deallocation. 
It needs lots of testing to be sure that it is doing the "right" thing. To enable scalarmath you need to import numpy.core.scalarmath You cannot disable it once it's enabled except by restarting Python. If we need that feature we can add it. The array scalars respond to the error modes of ufuncs. There is an experimental function called alter_scalars that replaces the Python int, float, and complex number tables with the array scalar equivalents. Thus, to amaze (or seriously annoy) your Python friends you can do import numpy.core.scalarmath as ncs ncs.alter_scalars(int) 1 / 0 This will return 0 unless you change the error modes... ncs.retore_scalars(int) Will put things back the way Guido intended.... Please try it out and send us error reports. Many thanks to Sasha for his help in getting all the code so it at least compiles and loads. All bugs should be blamed on me, though... Best, -Travis From arnd.baecker at web.de Fri Apr 28 00:48:04 2006 From: arnd.baecker at web.de (Arnd Baecker) Date: Fri Apr 28 00:48:04 2006 Subject: [Numpy-discussion] Scalar math module is ready for testing In-Reply-To: <4451C076.40608@ieee.org> References: <4451C076.40608@ieee.org> Message-ID: Hi Travis, On Fri, 28 Apr 2006, Travis Oliphant wrote: > > The scalar math module is complete and ready to be tested. It should > speed up code that relies heavily on scalar arithmetic by by-passing the > ufunc machinery. > > It needs lots of testing to be sure that it is doing the "right" > thing. To enable scalarmath you need to > > import numpy.core.scalarmath > > You cannot disable it once it's enabled except by restarting Python. If > we need that feature we can add it. The array scalars respond to the > error modes of ufuncs. > > There is an experimental function called alter_scalars that replaces the > Python int, float, and complex number tables with the array scalar > equivalents. Thus, to amaze (or seriously annoy) your Python friends LOL ;-) > you can do > > import numpy.core.scalarmath as ncs > > ncs.alter_scalars(int) > > 1 / 0 > > This will return 0 unless you change the error modes... > > ncs.retore_scalars(int) > > Will put things back the way Guido intended.... > > > Please try it out and send us error reports. Many thanks to Sasha for > his help in getting all the code so it at least compiles and loads. All > bugs should be blamed on me, though... 
Well, it does not compile for me (64 Bit opteron, as usual;-): gcc options: '-pthread -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC' compile options: '-Inumpy/core/include -Ibuild/src.linux-x86_64-2.4/numpy/core -Inumpy/core/src -Inumpy/core/include -I/scr/python/include/python2.4 -c' gcc: build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.c build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.c:472: error: redefinition of 'ulong_ctype_multiply' build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.c:421: error: previous definition of 'ulong_ctype_multiply' was here build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.c:421: warning: 'ulong_ctype_multiply' defined but not used build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.c:472: error: redefinition of 'ulong_ctype_multiply' build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.c:421: error: previous definition of 'ulong_ctype_multiply' was here build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.c:421: warning: 'ulong_ctype_multiply' defined but not used error: Command "gcc -pthread -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -Inumpy/core/include -Ibuild/src.linux-x86_64-2.4/numpy/core -Inumpy/core/src -Inumpy/core/include -I/scr/python/include/python2.4 -c build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.c -o build/temp.linux-x86_64-2.4/build/src.linux-x86_64-2.4/numpy/core/src/scalarmathmodule.o" failed with exit status 1 (I can't look into this now - meeting in -2 minutes ;-) Best, Arnd From schofield at ftw.at Fri Apr 28 01:32:00 2006 From: schofield at ftw.at (Ed Schofield) Date: Fri Apr 28 01:32:00 2006 Subject: [Numpy-discussion] Scalar math module is ready for testing In-Reply-To: <4451C076.40608@ieee.org> References: <4451C076.40608@ieee.org> Message-ID: <4451D3F0.7080408@ftw.at> Travis Oliphant wrote: > > The scalar math module is complete and ready to be tested. It should > speed up code that relies heavily on scalar arithmetic by by-passing > the ufunc machinery. Excellent! > It needs lots of testing to be sure that it is doing the "right" thing. With revision 2454 I get a segfault in numpy.test() after importing numpy.core.scalarmath: check_1 (numpy.distutils.tests.test_misc_util.test_appendpath) ... ok check_2 (numpy.distutils.tests.test_misc_util.test_appendpath) ... ok check_3 (numpy.distutils.tests.test_misc_util.test_appendpath) ... ok check_gpaths (numpy.distutils.tests.test_misc_util.test_gpaths) ... ok check_1 (numpy.distutils.tests.test_misc_util.test_minrelpath) ... ok check_singleton (numpy.lib.tests.test_getlimits.test_double) Program received signal SIGSEGV, Segmentation fault. [Switching to Thread -1208403744 (LWP 11232)] 0xb7142cf7 in int_richcompare (self=0x81c0ab8, other=0x8141dbc, cmp_op=3) at build/src.linux-i686-2.4/numpy/core/src/scalarmathmodule.c:19120 19120 PyArrayScalar_RETURN_TRUE; (gdb) bt #0 0xb7142cf7 in int_richcompare (self=0x81c0ab8, other=0x8141dbc, cmp_op=3) at build/src.linux-i686-2.4/numpy/core/src/scalarmathmodule.c:19120 #1 0x0807ce1f in PyObject_Print () #2 0x0807e451 in PyObject_RichCompare () Is this helpful? 
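For reference, the trace came from nothing fancier than running the test suite under gdb, roughly:

$ gdb python
(gdb) run -c "import numpy; import numpy.core.scalarmath; numpy.test()"
... Segmentation fault ...
(gdb) bt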
-- Ed From steffen.loeck at gmx.de Fri Apr 28 01:34:07 2006 From: steffen.loeck at gmx.de (Steffen Loeck) Date: Fri Apr 28 01:34:07 2006 Subject: [Numpy-discussion] Scalar math module is ready for testing In-Reply-To: <4451C076.40608@ieee.org> References: <4451C076.40608@ieee.org> Message-ID: <200604281033.19781.steffen.loeck@gmx.de> On Friday 28 April 2006 09:12 am, Travis Oliphant wrote: > Please try it out and send us error reports. Many thanks to Sasha for > his help in getting all the code so it at least compiles and loads. All > bugs should be blamed on me, though... Running the tests with numpy.test(10) i get: /test/lib/python2.3/site-packages/numpy/testing/numpytest.py:179: DeprecationWarning: Non-ASCII character '\xf2' in file/test/lib/python2.3/site-packages/numpy/lib/tests/test_ufunclike.pyc on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details m = imp.load_module(name, open(filename), filename,('.py','U',1)) E................................../test/lib/python2.3/site-packages/numpy/testing/numpytest.py:179: DeprecationWarning: Non-ASCII character '\xf2' in file test/lib/python2.3/site-packages/numpy/lib/tests/test_polynomial.pyc on line 1, but no encoding declared; see http://www.python.org/peps/pep-0263.html for details m = imp.load_module(name, open(filename), filename,('.py','U',1)) E........................................................................... ====================================================================== ERROR: check_doctests (numpy.lib.tests.test_ufunclike.test_docs) ---------------------------------------------------------------------- Traceback (most recent call last): File "/test/lib/python2.3/site-packages/numpy/lib/tests/test_ufunclike.py", line 59, in check_doctests def check_doctests(self): return self.rundocs() File "/test//lib/python2.3/site-packages/numpy/testing/numpytest.py", line 179, in rundocs m = imp.load_module(name, open(filename), filename,('.py','U',1)) File "test/lib/python2.3/site-packages/numpy/lib/tests/test_ufunclike.pyc", line 1 ;? ^ SyntaxError: invalid syntax ====================================================================== ERROR: check_doctests (numpy.lib.tests.test_polynomial.test_docs) ---------------------------------------------------------------------- Traceback (most recent call last): File "/test/lib/python2.3/site-packages/numpy/lib/tests/test_polynomial.py", line 79, in check_doctests def check_doctests(self): return self.rundocs() File "/test//lib/python2.3/site-packages/numpy/testing/numpytest.py", line 179, in rundocs m = imp.load_module(name, open(filename), filename,('.py','U',1)) File "/test/lib/python2.3/site-packages/numpy/lib/tests/test_polynomial.pyc", line 1 ;? ^ SyntaxError: invalid syntax I have no idea, where this comes from. Regards, Steffen From fullung at gmail.com Fri Apr 28 02:39:03 2006 From: fullung at gmail.com (Albert Strasheim) Date: Fri Apr 28 02:39:03 2006 Subject: [Numpy-discussion] newbie for writing numpy/scipy extensions In-Reply-To: <8CD47186-A354-4C8A-B5AF-8BEC2CE82D2E@cortechs.net> Message-ID: <018c01c66aa7$77764480$0a84a8c0@dsp.sun.ac.za> Hello all I've collected the information from this thread along with links to some recent threads on writing C extensions on the wiki at: http://www.scipy.org/Cookbook/C_Extensions Feel free to contribute! 
Regards, Albert > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy- > discussion-admin at lists.sourceforge.net] On Behalf Of Gennan Chen > Sent: 27 April 2006 21:23 > To: David M.Cooke > Cc: Numpy-discussion at lists.sourceforge.net > Subject: Re: [Numpy-discussion] newbie for writing numpy/scipy extensions > > Got it. Looks like ndimage still used the old one. > > Gen-Nan Chen, PhD > Chief Scientist > Research and Development Group > CorTechs Labs Inc (www.cortechs.net) > 1020 Prospect St., #304, La Jolla, CA, 92037 > Tel: 1-858-459-9700 ext 16 > Fax: 1-858-459-9705 > Email: gnchen at cortechs.net > > > On Apr 27, 2006, at 11:31 AM, David M. Cooke wrote: > > > Gennan Chen writes: > > > >> Hi! All, > >> > >> I just start writing my own python extension based on numpy. Couple > >> of questions here: > >> > >> 1. I have some utility functions, such as wrappers for > >> PyArray_GETPTR* needed be access by different extension modules. So, > >> I put them in utlis.h and utlis.c. In utils.h, I need to include > >> "numpy/arrayobject.h". But the compilation failed when I include it > >> again in my extension module function, wrap.c: > >> > >> #include "numpy/arrayobject.h" > >> #include "utils.h" > >> > >> When I remove it and use > >> > >> #include "utils.h" > >> > >> the compilation works. So, is it true that I can only include > >> arrayobject.h once? > > > > What is the compiler error message? > > > >> 2. which import I should use in my initial function: > >> > >> import_array() > > > > This one. It's the one to use for Numeric, numarray, and numpy. > > > >> or > >> import_libnumarray() > > > > This is for numarray, the other Numeric derivative. It pulls in the > > numarray-specific stuff IIRC. > > > > -- > > |>|\/|< > > /--------------------------------------------------------------------- > > -----\ > > |David M. Cooke http:// > > arbutus.physics.mcmaster.ca/dmc/ > > |cookedm at physics.mcmaster.ca From lcordier at point45.com Fri Apr 28 06:36:10 2006 From: lcordier at point45.com (Louis Cordier) Date: Fri Apr 28 06:36:10 2006 Subject: [Numpy-discussion] Bug Message-ID: Hi, I am not sure if this is the proper place to do a bug post. I looked at the active tickets on http://projects.scipy.org/scipy/numpy/ but didn't feel confident to go and create a new one. ;) Anyway the current release version 0.9.6 have some broken behavior. I guess some example code would illustrate it best. ---8<---------------- >>> z = numpy.zeros((10,10), 'O') >>> z.fill(None) >>> z.fill([]) Segmentation fault (core dumped) This happens on both Linux and FreeBSD machines. (both builds use *_lite versions of Lapack) Linux bellagio 2.6.11-1.1369_FC4 #1 Thu Jun 2 22:55:56 EDT 2005 i686 i686 i386 GNU/Linux Python 2.4.1 gcc version 4.0.0 20050519 (Red Hat 4.0.0-8) FreeBSD cerberus.intranet 5.4-RELEASE-p12 FreeBSD 5.4-RELEASE-p12 #0: Wed Mar 15 16:06:48 UTC 2006 Python 2.4.2 gcc version 3.4.2 [FreeBSD] 20040728 I assume fill() will need to make a copy, of the object for each coordinate in the matix. ---8<---------------- While, >>> import numpy >>> z = numpy.zeros((2,2), 'O') >>> z array([[0, 0], [0, 0]], dtype=object) >>> z.fill([1]) >>> z array([[1, 1], [1, 1]], dtype=object) and >>> z.fill([1,2,3]) >>> z array([[1, 1], [1, 1]], dtype=object) I would have expected, >>> z array([[[1], [1]], [[1], [1]]], dtype=object) and >>> z array([[[1, 2, 3], [1, 2, 3]], [[1, 2, 3], [1, 2, 3]]], dtype=object) Regards, Louis. 
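P.S. In case someone else trips over this before it is fixed, an explicit loop avoids the crash and gives the per-element behavior I expected (quick sketch; the list literal is re-evaluated each pass, so each cell gets its own copy):

>>> z = numpy.zeros((2,2), 'O')
>>> for i in range(2):
...     for j in range(2):
...         z[i,j] = [1,2,3]
...
>>> z[0,0]
[1, 2, 3]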
-- Louis Cordier cell: +27721472305 Point45 Entertainment (Pty) Ltd. http://www.point45.org From ndarray at mac.com Fri Apr 28 09:04:09 2006 From: ndarray at mac.com (Sasha) Date: Fri Apr 28 09:04:09 2006 Subject: [Numpy-discussion] Bug In-Reply-To: References: Message-ID: The core dump is definitely a bug. I reproduced it on my Linux system. Please create a ticket. I am not sure whether fill should copy objects or not. When you populate an array with immutable objects, creating multiple copies is a waste. On 4/28/06, Louis Cordier wrote: > > Hi, I am not sure if this is the proper place to do a bug post. > I looked at the active tickets on http://projects.scipy.org/scipy/numpy/ > but didn't feel confident to go and create a new one. ;) > > Anyway the current release version 0.9.6 have some broken behavior. > I guess some example code would illustrate it best. > > ---8<---------------- > > >>> z = numpy.zeros((10,10), 'O') > >>> z.fill(None) > >>> z.fill([]) > Segmentation fault (core dumped) > > This happens on both Linux and FreeBSD machines. > (both builds use *_lite versions of Lapack) > > Linux bellagio 2.6.11-1.1369_FC4 #1 Thu Jun 2 22:55:56 EDT 2005 i686 i686 > i386 GNU/Linux > Python 2.4.1 > gcc version 4.0.0 20050519 (Red Hat 4.0.0-8) > > FreeBSD cerberus.intranet 5.4-RELEASE-p12 FreeBSD 5.4-RELEASE-p12 #0: Wed > Mar 15 16:06:48 UTC 2006 > Python 2.4.2 > gcc version 3.4.2 [FreeBSD] 20040728 > > I assume fill() will need to make a copy, of the object > for each coordinate in the matix. > > ---8<---------------- > > While, > > >>> import numpy > >>> z = numpy.zeros((2,2), 'O') > >>> z > array([[0, 0], > [0, 0]], dtype=object) > >>> z.fill([1]) > >>> z > array([[1, 1], > [1, 1]], dtype=object) > > and > > >>> z.fill([1,2,3]) > >>> z > array([[1, 1], > [1, 1]], dtype=object) > > > I would have expected, > > >>> z > array([[[1], [1]], > [[1], [1]]], dtype=object) > > and > > >>> z > array([[[1, 2, 3], [1, 2, 3]], > [[1, 2, 3], [1, 2, 3]]], dtype=object) > > > Regards, Louis. > > -- > Louis Cordier cell: +27721472305 > Point45 Entertainment (Pty) Ltd. http://www.point45.org > > > > ------------------------------------------------------- > Using Tomcat but need to do more? Need to support web services, security? > Get stuff done quickly with pre-integrated technology to make your job easier > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > From ndarray at mac.com Fri Apr 28 10:04:08 2006 From: ndarray at mac.com (Sasha) Date: Fri Apr 28 10:04:08 2006 Subject: [Numpy-discussion] Bug In-Reply-To: References: Message-ID: See . On 4/28/06, Sasha wrote: > The core dump is definitely a bug. I reproduced it on my Linux > system. Please create a ticket. I am not sure whether fill should > copy objects or not. When you populate an array with immutable > objects, creating multiple copies is a waste. > > On 4/28/06, Louis Cordier wrote: > > > > Hi, I am not sure if this is the proper place to do a bug post. > > I looked at the active tickets on http://projects.scipy.org/scipy/numpy/ > > but didn't feel confident to go and create a new one. ;) > > > > Anyway the current release version 0.9.6 have some broken behavior. > > I guess some example code would illustrate it best. 
> > > > ---8<---------------- > > > > >>> z = numpy.zeros((10,10), 'O') > > >>> z.fill(None) > > >>> z.fill([]) > > Segmentation fault (core dumped) > > > > This happens on both Linux and FreeBSD machines. > > (both builds use *_lite versions of Lapack) > > > > Linux bellagio 2.6.11-1.1369_FC4 #1 Thu Jun 2 22:55:56 EDT 2005 i686 i686 > > i386 GNU/Linux > > Python 2.4.1 > > gcc version 4.0.0 20050519 (Red Hat 4.0.0-8) > > > > FreeBSD cerberus.intranet 5.4-RELEASE-p12 FreeBSD 5.4-RELEASE-p12 #0: Wed > > Mar 15 16:06:48 UTC 2006 > > Python 2.4.2 > > gcc version 3.4.2 [FreeBSD] 20040728 > > > > I assume fill() will need to make a copy, of the object > > for each coordinate in the matix. > > > > ---8<---------------- > > > > While, > > > > >>> import numpy > > >>> z = numpy.zeros((2,2), 'O') > > >>> z > > array([[0, 0], > > [0, 0]], dtype=object) > > >>> z.fill([1]) > > >>> z > > array([[1, 1], > > [1, 1]], dtype=object) > > > > and > > > > >>> z.fill([1,2,3]) > > >>> z > > array([[1, 1], > > [1, 1]], dtype=object) > > > > > > I would have expected, > > > > >>> z > > array([[[1], [1]], > > [[1], [1]]], dtype=object) > > > > and > > > > >>> z > > array([[[1, 2, 3], [1, 2, 3]], > > [[1, 2, 3], [1, 2, 3]]], dtype=object) > > > > > > Regards, Louis. > > > > -- > > Louis Cordier cell: +27721472305 > > Point45 Entertainment (Pty) Ltd. http://www.point45.org > > > > > > > > ------------------------------------------------------- > > Using Tomcat but need to do more? Need to support web services, security? > > Get stuff done quickly with pre-integrated technology to make your job easier > > Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo > > http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at lists.sourceforge.net > > https://lists.sourceforge.net/lists/listinfo/numpy-discussion > > > From lcordier at point45.com Fri Apr 28 10:24:04 2006 From: lcordier at point45.com (Louis Cordier) Date: Fri Apr 28 10:24:04 2006 Subject: [Numpy-discussion] Bug In-Reply-To: References: Message-ID: > See . >> > >>> z.fill([1,2,3]) >> > >>> z >> > array([[1, 1], >> > [1, 1]], dtype=object) >> > >> > I would have expected, >> > >> > >>> z >> > array([[[1, 2, 3], [1, 2, 3]], >> > [[1, 2, 3], [1, 2, 3]]], dtype=object) Souldn't the second example be a ticket ? Or is it part of #86 ? Regards, Louis. -- Louis Cordier cell: +27721472305 Point45 Entertainment (Pty) Ltd. http://www.point45.org From ndarray at mac.com Fri Apr 28 10:49:02 2006 From: ndarray at mac.com (Sasha) Date: Fri Apr 28 10:49:02 2006 Subject: [Numpy-discussion] Bug In-Reply-To: References: Message-ID: On 4/28/06, Louis Cordier wrote: > Souldn't the second example be a ticket ? > Or is it part of #86 ? I think all your examples are different signs of the same problem. You can help by converting your examples into unit tests to be added to say test_multiarray.py and attaching a patch to the ticket. A brief comment for the developers: the problem that Louis reported is caused by the fact that x.fill([]) creates an empty array internally instead of a scalar object array containing an empty list. 
Note that numpy does not even have a good notation for the required object: >>> from numpy import * >>> x = zeros(1,'O') >>> x.shape=() >>> x[()] = [] >>> x array([], dtype=object) >>> x.shape () but >>> array([], dtype=object).shape (0,) From fullung at gmail.com Fri Apr 28 15:32:13 2006 From: fullung at gmail.com (Albert Strasheim) Date: Fri Apr 28 15:32:13 2006 Subject: [Numpy-discussion] Scalar math module is ready for testing In-Reply-To: <4451C076.40608@ieee.org> Message-ID: <007701c66b13$8365df00$0a84a8c0@dsp.sun.ac.za> Hello Travis I'm having some problems compiling the scalarmath code with the Visual Studio .NET 2003 compiler. Specifically, the compiler is failing to link in the llabs, fabsf and sqrtf functions. The reason it is not finding these symbols could be explained by the following errors I get when building the object file by hand using the parameters distutils passes to the compiler (for some reason distutils is suppressing compiler output -- this is pretty, but it makes debugging build failures hard): build\src.win32-2.4\numpy\core\src\scalarmathmodule.c(1737) : warning C4013: 'llabs' undefined; assuming extern returning int build\src.win32-2.4\numpy\core\src\scalarmathmodule.c(1751) : warning C4013: 'fabsf' undefined; assuming extern returning int build\src.win32-2.4\numpy\core\src\scalarmathmodule.c(1773) : warning C4013: 'sqrtf' undefined; assuming extern returning int In c:\Program Files\Microsoft Visual Studio .NET 2003\vc7\crt\src\math.h I have the following (extra code stripped): ... #ifndef __cplusplus #define acosl(x) ((long double)acos((double)(x))) #define asinl(x) ((long double)asin((double)(x))) #define atanl(x) ((long double)atan((double)(x))) ... /* NOTE! no sqrtf or fabsf is defined in this block */ #else /* __cplusplus */ ... #if !defined (_M_MRX000) && !defined (_M_ALPHA) && !defined (_M_IA64) /* NOTE! none of the above are defined on x86 */ ... inline float fabsf(float _X) {return ((float)fabs((double)_X)); } ... inline float sqrtf(float _X) {return ((float)sqrt((double)_X)); } ... #endif /* !defined (_M_MRX000) && !defined (_M_ALPHA) && !defined (_M_IA64) */ #endif /* __cplusplus */ >From this it would seem that Microsoft doesn't consider sqrtf and fabsf to be part of the C language? However, the C++ code provides a clue for how they implemented it. Also, llabs isn't defined anywhere. From reading the MSDN docs, I suspect it is called _abs64 on Windows. Regards, Albert > -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy- > discussion-admin at lists.sourceforge.net] On Behalf Of Travis Oliphant > Sent: 28 April 2006 09:13 > To: numpy-discussion > Subject: [Numpy-discussion] Scalar math module is ready for testing > > > The scalar math module is complete and ready to be tested. It should > speed up code that relies heavily on scalar arithmetic by by-passing the > ufunc machinery. > > It needs lots of testing to be sure that it is doing the "right" > thing. To enable scalarmath you need to > > import numpy.core.scalarmath > > You cannot disable it once it's enabled except by restarting Python. If > we need that feature we can add it. The array scalars respond to the > error modes of ufuncs. > > There is an experimental function called alter_scalars that replaces the > Python int, float, and complex number tables with the array scalar > equivalents. 
Thus, to amaze (or seriously annoy) your Python friends > you can do > > import numpy.core.scalarmath as ncs > > ncs.alter_scalars(int) > > 1 / 0 > > This will return 0 unless you change the error modes... > > ncs.retore_scalars(int) > > Will put things back the way Guido intended.... > > > Please try it out and send us error reports. Many thanks to Sasha for > his help in getting all the code so it at least compiles and loads. All > bugs should be blamed on me, though... > > > Best, > > -Travis From jonathan.taylor at stanford.edu Fri Apr 28 16:21:15 2006 From: jonathan.taylor at stanford.edu (Jonathan Taylor) Date: Fri Apr 28 16:21:15 2006 Subject: [Numpy-discussion] confusing recarray behaviour Message-ID: <44528318.6010604@stanford.edu> I'm new to recarrays and have been struggling with them. I keep getting an exception TypeError: expected a readable buffer object with no informative traceback. What I pass to N.array seems to agree with the examples in numpybook. Below is an example that does work for me (excuse the longish example but it was just cut and paste to make my life easier). In my code, funny things happen (see ipython excerpt below this). In particular, I have a list v with v[0:2] = V and with the same dtype "ddesc" I get this exception when I change V to v[0:2]. Any help would be appreciated. --------------------------------------------------------------------------------------- import numpy as N timedesc = N.dtype({'names':['tm_year', 'tm_mon', 'tm_mday', 'tm_hour', 'tm_min', 'tm_sec', 'tm_wday', 'tm_yday', 'tm_isdst'], 'formats':['i2']*9}) ddesc = N.dtype({'names': ('Week', 'Date', 'Institution', 'SeqNo', 'HeightDone', 'Height', 'UnitsH', 'WeightDone', 'Weight', 'Units', 'PulseDone', 'Pulse', 'BPdone', 'BPSys', 'BPDia', 'PID', 'RN'), 'formats': ['f4', timedesc] + ['f4']*15}) V = [(12.0, (2005, 4, 22, 0, 0, 0, 4, 112, -1), 501.0, 1.0, 2.0, 0.0, 0, 1.0, 91.5, 1.0, 1.0, 87.0, 1.0, 129.0, 76.0, 107.0, 11.0), (24.0, (2005, 2, 1, 0, 0, 0, 1, 32, -1), 504.0, 1.0, 2.0, 0.0, 0, 1.0, 166.0, 2.0, 1.0, 84.0, 1.0, 128.0, 78.0, 401.0, 7.0) ] w=N.array(V, dtype=ddesc) -------------------------------------------------------------------------------------------------- In [97]:v[0:2] == V Out[97]:True In [98]:N.array(V, ddesc) Out[98]: array([ (12.0, (2005, 4, 22, 0, 0, 0, 4, 112, -1), 501.0, 1.0, 2.0, 0.0, 0.0, 1.0, 91.5, 1.0, 1.0, 87.0, 1.0, 129.0, 76.0, 107.0, 11.0), (24.0, (2005, 2, 1, 0, 0, 0, 1, 32, -1), 504.0, 1.0, 2.0, 0.0, 0.0, 1.0, 166.0, 2.0, 1.0, 84.0, 1.0, 128.0, 78.0, 401.0, 7.0)], dtype=[('Week', ' TypeError: expected a readable buffer object -- ------------------------------------------------------------------------ I'm part of the Team in Training: please support our efforts for the Leukemia and Lymphoma Society! http://www.active.com/donate/tntsvmb/tntsvmbJTaylor GO TEAM !!! ------------------------------------------------------------------------ Jonathan Taylor Tel: 650.723.9230 Dept. of Statistics Fax: 650.725.8977 Sequoia Hall, 137 www-stat.stanford.edu/~jtaylo 390 Serra Mall Stanford, CA 94305 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: jonathan.taylor.vcf
Type: text/x-vcard
Size: 329 bytes
Desc: not available
URL:

From Fernando.Perez at colorado.edu Fri Apr 28 16:21:17 2006
From: Fernando.Perez at colorado.edu (Fernando Perez)
Date: Fri Apr 28 16:21:17 2006
Subject: [Numpy-discussion] [OT] A weekend floating point/compiler question
Message-ID: <44528F49.3080005@colorado.edu>

Hi all,

this is somewhat off-topic, since it's really a gcc/g77 question. Yet for us here (my group) it may lead to the decision to stop using g77 for all fortran code and switch to another compiler for our python-wrapped libraries. So it did arise in the context of python usage of in-house code, and I'm appealing to anyone who may want to play a little with the question and help. Feel free to reply off-list to keep the noise down on the list.

The problem arose in some in-house library, but can be boiled down to this:

planck[f77bug]> cat testbug.f
      program testbug
c
      implicit real *8 (a-h,o-z)
c
      half = 0.5d0
      x = 0.49d0
      nnx = 100
      iax = (x+half)*nnx

      print *, 'Should be 99:',iax

      stop
      end
c EOF
planck[f77bug]> g77 -o testbug.g77 testbug.f
planck[f77bug]> ./testbug.g77
 Should be 99: 98

This can be seen as computing (x/n+1/2)*n and comparing it to x+n/2. Yes, I know about the dangers of floating point roundoff error (I didn't write the original code), but a variation of this is used inside a library that began crashing for certain inputs. The point is that this same code works fine with the Intel and Lahey compilers, but not with g77.

Now, to add a bit of mystery to the question, I wrote the following C code:

planck[f77bug]> cat scanbug.c
#include <stdio.h>

int main(int argc, char* argv[])
{
    double x;
    double eps = 1e-2;
    double x0 = 0.0;
    double xmax = 1.0;
    int nnx = 100;
    int i = 0;
    double dax;
    int iax, iax_direct;

    /* loop body reconstructed from the declarations above and the
       output quoted in the replies: compare truncation through a
       double temporary against direct truncation */
    x = x0;
    while (x <= xmax) {
        dax = (x + 0.5)*nnx;
        iax = dax;
        iax_direct = (x + 0.5)*nnx;
        if (iax != iax_direct)
            printf("ERROR at x=%e!\n", x);
        i++;
        x = x0 + i*eps;
    }
    return 0;
}

Any ideas/comments? Shouldn't the result be independent of the intermediate double var? It is for icc, can this be considered a gcc bug?

From: Nick Fotopoulos
Date: Fri Apr 28 2006
Subject: [Numpy-discussion] Re: Freeing memory allocated in C
In-Reply-To: <44519C6E.80006@ieee.org>
References: <1C119933-F93B-47C6-AADA-4A61DF16B745@mit.edu> <44519C6E.80006@ieee.org>
Message-ID:

Many thanks, with your help, I got it working without any leaks. I need to run on ~10 TB of data, so fixing this leak sure helps my program scale.

One error in the code below is that PyString_FromFormat does not accept %f, so I created a regular string and created the PyString with PyString_FromString (it seems to copy data), then freed the regular string. Is there any better way to do that?

I'm curious why I didn't see any explanation of PyArray_DATA in the NumPy book. It seems really important, especially if you're touting it as the Proper Strategy.

Finally, Robert encouraged me to stop using the legacy interface. I'm happy to do so, but I have to cater to my users. Approximately how old a version of Numeric (and Numarray) will still work with PyArray_SimpleNew?

Thanks,
Nick

On Apr 28, 2006, at 12:39 AM, Travis Oliphant wrote:

> The proper strategy for your arrays is to use PyArray_SimpleNew and
> then get the data-pointer to fill using PyArray_DATA(...). The
> proper way to handle strings is to create a new string (say using
> PyString_FromFormat) and then return everything as objects.
>
> /* make sure shape is defined as intp unless you don't care about
> 64-bit */
> obj2 = PyArray_SimpleNew(1, shape, PyArray_DOUBLE);
> data = (double *)PyArray_DATA(obj2)
> [snip...]
> out5 = PyString_FromFormat("Starting GPS time:%.1f UTC=%s",
> vect->GTime,FrStrGTime(utc));
>
> return Py_BuildValue("(NNNdNNN)",out1,out2,out3,out4,out5,out6,out7);
>
> Make sure you use the 'N' tag so that another reference count isn't
> generated.
The 'O' tag will increase the reference count of your > objects by one which is is not necessarily what you want (but > sometimes you do). From robert.kern at gmail.com Fri Apr 28 16:43:18 2006 From: robert.kern at gmail.com (Robert Kern) Date: Fri Apr 28 16:43:18 2006 Subject: [Numpy-discussion] Re: Freeing memory allocated in C In-Reply-To: References: <1C119933-F93B-47C6-AADA-4A61DF16B745@mit.edu> <44519C6E.80006@ieee.org> Message-ID: Nick Fotopoulos wrote: > I'm curious why I didn't see any explanation of PyArray_DATA in the > NumPy book. It seems really important, especially if you're touting it > as the Proper Strategy. Section 13.3 talks about PyArray_DATA. > Finally, Robert encouraged me to stop using the legacy interface. I'm > happy to do so, but I have to cater to my users. Approximately old a > version of Numeric (and Numarray) will still work with PyArray_SimpleNew? None. It is new to Numpy. The old way would be to use PyArray_FromDims. -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Fernando.Perez at colorado.edu Fri Apr 28 16:55:02 2006 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Fri Apr 28 16:55:02 2006 Subject: [Numpy-discussion] A weekend floating point/compiler question Message-ID: <4452AB3F.8090700@colorado.edu> Hi Robert and George, We found a bug in g77 v. 3.4.4 as well as in gcc, which manifests itself in the following little snippet: planck[f77bug]> cat testbug.f program testbug c implicit real *8 (a-h,o-z) c half = 0.5d0 x = 0.49d0 nnx = 100 iax = (x+half)*nnx print *, 'Should be 99:',iax stop end c EOF planck[f77bug]> g77 -o testbug.g77 testbug.f planck[f77bug]> ./testbug.g77 Should be 99: 98 This can be seen as computing (x/n+1/2)*n and comparing it to x+n/2. Greg is using this in a number of places inside a library, which had never given trouble before when built with other compilers, like the sun, IBM, Intel and Lahey ones. Now with g77 it gives the result above. Questions: 1. Have you seen similar behavior in the past? 2. If we switch away from g77, what do you suggest moving towards? We ran paranoia on ifort, lahey and g77, and lahey was the best performing of all. The intel one has the advantage of being free. On the other hand, paranoia did complain about arithmetic issues with it (though the above code works fine with intel). Any ideas you can give us would be very appreciated. Cheers, Fernando and Greg. ps. Apparently g77 v 3.3.2 does NOT have this problem. From robert.kern at gmail.com Fri Apr 28 16:58:15 2006 From: robert.kern at gmail.com (Robert Kern) Date: Fri Apr 28 16:58:15 2006 Subject: [Numpy-discussion] Re: [OT] A weekend floating point/compiler question In-Reply-To: <44528F49.3080005@colorado.edu> References: <44528F49.3080005@colorado.edu> Message-ID: <4452ABFE.2040307@gmail.com> Fernando Perez wrote: > Any ideas/comments? Shouldn't the result be independent of the > intermediate double var? It is for icc, can this be considered a gcc bug? It seems like it might be processor-specific. On my G4 Powerbook (g77 3.4.4, gcc 3.3) and AMD64 Linux desktop (g77 3.4.5, gcc 4.0.2), both programs give the expected results. Specifically, the Intel 80-bit FPU thingy is probably a factor. It might be worth filing a bug report against gcc. If nothing else, you might get a better explanation of what's going on. 
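As a quick sanity check that plain double arithmetic is not the culprit: CPython stores every intermediate result as a C double, and there the product rounds back to exactly 99.0 (interpreter sketch):

>>> 0.49 + 0.5
0.98999999999999999
>>> (0.49 + 0.5) * 100
99.0
>>> int((0.49 + 0.5) * 100)
99

So the 98 really does come from the extra bits of the 80-bit intermediate.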
-- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Fernando.Perez at colorado.edu Fri Apr 28 17:13:16 2006 From: Fernando.Perez at colorado.edu (Fernando Perez) Date: Fri Apr 28 17:13:16 2006 Subject: [Numpy-discussion] A weekend floating point/compiler question In-Reply-To: <4452AB3F.8090700@colorado.edu> References: <4452AB3F.8090700@colorado.edu> Message-ID: <4452AF7D.6040008@colorado.edu> Fernando Perez wrote: > Hi Robert and George, Sorry! I was writing the same question to two colleagues and forgot to change the TO line. My apology. Cheers, f From gnchen at cortechs.net Fri Apr 28 18:08:03 2006 From: gnchen at cortechs.net (Gennan Chen) Date: Fri Apr 28 18:08:03 2006 Subject: [Numpy-discussion] Guide to Numpy book Message-ID: <3FA6601C-819F-4F15-A670-829FC428F47B@cortechs.net> Hi! What is the newest version of Guide to numpy? The recent one I got is dated at Jan 9 2005 on the cover. Gen-Nan Chen, PhD Chief Scientist Research and Development Group CorTechs Labs Inc (www.cortechs.net) 1020 Prospect St., #304, La Jolla, CA, 92037 Tel: 1-858-459-9700 ext 16 Fax: 1-858-459-9705 Email: gnchen at cortechs.net From luis at geodynamics.org Fri Apr 28 18:29:03 2006 From: luis at geodynamics.org (Luis Armendariz) Date: Fri Apr 28 18:29:03 2006 Subject: [Numpy-discussion] Guide to Numpy book In-Reply-To: <3FA6601C-819F-4F15-A670-829FC428F47B@cortechs.net> References: <3FA6601C-819F-4F15-A670-829FC428F47B@cortechs.net> Message-ID: <4452C145.8050803@geodynamics.org> Gennan Chen wrote: > Hi! > > What is the newest version of Guide to numpy? The recent one I got is > dated at Jan 9 2005 on the cover. > The one I got yesterday is dated March 15, 2006. -Luis From robert.kern at gmail.com Sat Apr 29 00:31:22 2006 From: robert.kern at gmail.com (Robert Kern) Date: Sat Apr 29 00:31:22 2006 Subject: [Numpy-discussion] Re: A python interface for loess ? In-Reply-To: <200604260329.17115.pgmdevlist@mailcan.com> References: <200604260329.17115.pgmdevlist@mailcan.com> Message-ID: <4453162E.1040901@gmail.com> Pierre GM wrote: > Folks, > Would any of you be aware of a Python interface to the loess routines ? > http://netlib.bell-labs.com/netlib/a/dloess.gz Not specifically this code, but there is a pure Python+old Numeric implementation of lowess in BioPython, specifically in the Bio.Statistics subpackage. It's short and could be easily ported to use numpy. http://www.biopython.org -- Robert Kern robert.kern at gmail.com "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From chris at pseudogreen.org Sat Apr 29 09:09:11 2006 From: chris at pseudogreen.org (Christopher Stawarz) Date: Sat Apr 29 09:09:11 2006 Subject: [Numpy-discussion] Re: A weekend floating point/compiler question Message-ID: <01fa3363e635409f488757070c5f8268@pseudogreen.org> Hi, I don't think this is a GCC bug, but it does seem to be related to Intel's 80-bit floating-point architecture. As of the Pentium 3, Intel and compatible processors have two sets of instructions for performing floating-point operations: the original 8087 set, which do all computations at 80-bit precision, and SSE (and their extension SSE2), which don't use extended precision. GCC allows you to select either instruction set. 
Unfortunately, in the absence of an explicit choice, it uses a default target that varies by platform: The i386 version defaults to 8087 instructions, while the x86-64 version defaults to SSE. See http://gcc.gnu.org/onlinedocs/gcc-4.1.0/gcc/i386-and-x86_002d64- Options.html for details. I can make your test programs behave correctly on a Pentium 4 by selecting SSE2: devel12-35: g77 testbug.f devel12-36: ./a.out Should be 99: 98 devel12-37: g77 -msse2 -mfpmath=sse testbug.f devel12-38: ./a.out Should be 99: 99 devel12-39: gcc scanbug.c devel12-40: ./a.out | head -1 ERROR at x=3.000000e-02! devel12-41: gcc -msse2 -mfpmath=sse scanbug.c devel12-42: ./a.out devel12-43: Interestingly, I expected to be able to induce incorrect results on an Opteron by using 8087, but that wasn't the case (both instruction sets produced the correct result). I'll have to think about why that's happening -- maybe casting between ints and doubles differs between 32 and 64-bit architectures? I've never used the Intel or Lahey Fortran compilers, but I suspect they must be generating SSE instructions by default. Actually, it's interesting that the 80-bit computations are causing problems here, since it's easy to come up with examples where they give you better results than computations done without the extra bits. Hope that helps, Chris From charlesr.harris at gmail.com Sat Apr 29 10:25:01 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat Apr 29 10:25:01 2006 Subject: [Numpy-discussion] A weekend floating point/compiler question In-Reply-To: <4452AB3F.8090700@colorado.edu> References: <4452AB3F.8090700@colorado.edu> Message-ID: On 4/28/06, Fernando Perez wrote: > > Hi Robert and George, > > We found a bug in g77 v. 3.4.4 as well as in gcc, which manifests itself > in > the following little snippet: > > planck[f77bug]> cat testbug.f > program testbug > c > implicit real *8 (a-h,o-z) > c > half = 0.5d0 > x = 0.49d0 > nnx = 100 > iax = (x+half)*nnx > > print *, 'Should be 99:',iax > > stop > end > > c EOF I don't see why the answer should be 99. The number .99 can not be exactly represented in IEEE floating point, in fact it is ~ 0.9899999999999999911182. So as you can see the result is perfectly correct given the standard conversion to int by truncation. IMHO, this is programmer error, not a compiler problem and should be fixed in the code. Now you may get slightly different results depending on roundoff error if you indulge in such things as (.5 + .49)*100 vs (.33 + .17 + .49)*100, and since these numbers are constants they may also be precomputed by the compiler and the results will depend on the accuracy of the compiler's computation. The whole construction is ambiguous. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Apr 29 10:43:08 2006 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat Apr 29 10:43:08 2006 Subject: [Numpy-discussion] A weekend floating point/compiler question In-Reply-To: References: <4452AB3F.8090700@colorado.edu> Message-ID: On 4/29/06, Charles R Harris wrote: > > > > On 4/28/06, Fernando Perez wrote: > > > > Hi Robert and George, > > > > We found a bug in g77 v. 
3.4.4 as well as in gcc, which manifests itself > > in > > the following little snippet: > > > > planck[f77bug]> cat testbug.f > > program testbug > > c > > implicit real *8 (a-h,o-z) > > c > > half = 0.5d0 > > x = 0.49d0 > > nnx = 100 > > iax = (x+half)*nnx > > > > print *, 'Should be 99:',iax > > > > stop > > end > > > > c EOF > > > I don't see why the answer should be 99. The number .99 can not be exactly > represented in IEEE floating point, in fact it is ~ > 0.9899999999999999911182. So as you can see the result is perfectly > correct given the standard conversion to int by truncation. IMHO, this is > programmer error, not a compiler problem and should be fixed in the code. > Now you may get slightly different results depending on roundoff error if > you indulge in such things as (.5 + .49)*100 vs (.33 + .17 + .49)*100, and > since these numbers are constants they may also be precomputed by the > compiler and the results will depend on the accuracy of the compiler's > computation. The whole construction is ambiguous. > > Chuck > As an example: #include int main(int argc, char** argv) { int x = 100; long double y = .49; long double z = .50; printf("%25.22Lf\n", (y + z)*x); return 0; } prints 98.9999999999999991118216 whereas the same code with doubles instead of long doubles prints 99.0000000000000000000000. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant.travis at ieee.org Sat Apr 29 13:13:05 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sat Apr 29 13:13:05 2006 Subject: [Numpy-discussion] confusing recarray behaviour In-Reply-To: <44528318.6010604@stanford.edu> References: <44528318.6010604@stanford.edu> Message-ID: <4453C8B7.8040000@ieee.org> Jonathan Taylor wrote: > > What I pass to N.array seems to agree with the examples in numpybook. > > Below is an example that does work for me (excuse the longish example > but it was just cut and paste to make my life easier). In my code, > funny things happen > (see ipython excerpt below this). In particular, I have a list v with > v[0:2] = V and with the > same dtype "ddesc" I get this exception when I change V to v[0:2]. Please show us what v is. If I run v = V[:] and then try N.array(v[0:2],ddesc) I don't get any error. So something else must be going on. Which version are you running? -Travis From fullung at gmail.com Sat Apr 29 14:30:10 2006 From: fullung at gmail.com (Albert Strasheim) Date: Sat Apr 29 14:30:10 2006 Subject: [Numpy-discussion] Array data and struct alignment Message-ID: <001601c66bd4$0a37ddb0$0a84a8c0@dsp.sun.ac.za> Hello all I'm busy wrapping a C library with NumPy. Some of the functions operate on a buffer containing structs that look like this: struct node { int index; double value; }; On the Python side, I do the following to set up my data. examples is a list containing lists or dicts. nodes = [] for example in examples: if type(example) is dict: nodes.append(example.items()) else: nodes.append(zip(range(1, len(example)+1), example)) descr = [('index','intc',1),('value','f8',1)] self.nodes = map(lambda x: array(x, dtype=descr), nodes) Assume example = [[1.0, 2.0, 3.0], {4: 4.0}]. 
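Concretely, the two branches produce index/value pairs like this (interpreter sketch, using the example above):

>>> zip(range(1, 4), [1.0, 2.0, 3.0])
[(1, 1.0), (2, 2.0), (3, 3.0)]
>>> {4: 4.0}.items()
[(4, 4.0)]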
The nodes array can now be accessed in various useful ways: nodes[0][0] -> (1, 1.0) nodes[1][0] -> (4, 4.0)) nodes[0]['index'] -> [1,2,3] nodes[0]['value'] -> [1.0,2.0,3.0]) nodes[1]['index'] -> [4] nodes[1]['value'] -> [4.0] On the C side I can now do the following: PyObject* Svm_GetStructNode(PyObject* obj, PyObject* args) { PyObject* op1; struct node* node; if(!PyArg_ParseTuple(args, "O", &op1)) { return NULL; } node = (struct node*) PyArray_DATA(op1); return Py_BuildValue("(id)", node->index, node->value); } However, this only works if struct node is tightly packed (#pragma pack(1) with the Visual C compiler). I don't know how feasible this is, but it would be useful if NumPy could be told to pack its data on n-byte boundaries or on "same as the compiler" boundaries. I realise that there can be problems when mixing code compiled by more than one compiler, etc., etc., but a simple unit test can check for this. Any thoughts? Regards, Albert From oliphant.travis at ieee.org Sat Apr 29 14:58:01 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sat Apr 29 14:58:01 2006 Subject: [Numpy-discussion] Array data and struct alignment In-Reply-To: <001601c66bd4$0a37ddb0$0a84a8c0@dsp.sun.ac.za> References: <001601c66bd4$0a37ddb0$0a84a8c0@dsp.sun.ac.za> Message-ID: <4453E10E.5090108@ieee.org> Albert Strasheim wrote: > Hello all > > I'm busy wrapping a C library with NumPy. Some of the functions operate on a > buffer containing structs that look like this: > > struct node { > int index; > double value; > }; > > [snip] > However, this only works if struct node is tightly packed (#pragma pack(1) > with the Visual C compiler). > > I don't know how feasible this is, but it would be useful if NumPy could be > told to pack its data on n-byte boundaries or on "same as the compiler" > boundaries. I realise that there can be problems when mixing code compiled > by more than one compiler, etc., etc., but a simple unit test can check for > this. > When you create a data-type using the dtype(...) syntax there is an align keyword that will "align" the data according to how the compiler does it. I'm not sure if it always works right so please test it out. So, in your case you should be able to say. descr = dtype([('index',intc),('value','f8')], align=1) Note, I've eliminated some unnecessary verbage in your description. Currently this is giving me an error that I will look into. -Travis From oliphant.travis at ieee.org Sat Apr 29 15:04:10 2006 From: oliphant.travis at ieee.org (Travis Oliphant) Date: Sat Apr 29 15:04:10 2006 Subject: [Numpy-discussion] Array data and struct alignment In-Reply-To: <001601c66bd4$0a37ddb0$0a84a8c0@dsp.sun.ac.za> References: <001601c66bd4$0a37ddb0$0a84a8c0@dsp.sun.ac.za> Message-ID: <4453E293.7080502@ieee.org> Albert Strasheim wrote: > Hello all > > I'm busy wrapping a C library with NumPy. Some of the functions operate on a > buffer containing structs that look like this: > > struct node { > int index; > double value; > }; > > In my previous discussion I was wrong. You cannot use the array_descriptor format for a data-type and the align keyword at the same time. You need to use a different method to specify fields. This, for example: descr = dtype({'names':['index', 'value'], 'formats':[intc,'f8']},align=1) On my (32-bit) system it doesn't produce any difference from align=0. 
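A quick way to check what align=1 actually does on a given platform is to compare itemsizes (illustrative sketch; on this box both come out 12, but expect 12 vs. 16 anywhere the compiler gives doubles 8-byte alignment):

>>> from numpy import dtype, intc
>>> d0 = dtype({'names':['index','value'], 'formats':[intc,'f8']}, align=0)
>>> d1 = dtype({'names':['index','value'], 'formats':[intc,'f8']}, align=1)
>>> d0.itemsize, d1.itemsize
(12, 12)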
-Travis

From oliphant.travis at ieee.org Sat Apr 29 15:11:07 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sat Apr 29 15:11:07 2006
Subject: [Numpy-discussion] Array data and struct alignment
In-Reply-To: <4453E293.7080502@ieee.org>
References: <001601c66bd4$0a37ddb0$0a84a8c0@dsp.sun.ac.za> <4453E293.7080502@ieee.org>
Message-ID: <4453E449.20407@ieee.org>

Travis Oliphant wrote:
> Albert Strasheim wrote:
>> Hello all
>>
>> I'm busy wrapping a C library with NumPy. Some of the functions
>> operate on a buffer containing structs that look like this:
>>
>> struct node {
>>     int index;
>>     double value;
>> };
>>
> In my previous discussion I was wrong. You cannot use the
> array_descriptor format for a data-type and the align keyword at the
> same time. You need to use a different method to specify fields.
>
> This, for example:
>
> descr = dtype({'names':['index', 'value'],
>     'formats':[intc,'f8']},align=1)
>
> On my (32-bit) system it doesn't produce any difference from align=0.
>
> -Travis

However notice the difference with

>>> dtype({'names':['index', 'value'], 'formats':[short,'f8']},align=1)
dtype([('index', '<i2'), ('', '|V6'), ('value', '<f8')])

>>> dtype({'names':['index', 'value'], 'formats':[short,'f8']},align=0)
dtype([('index', '<i2'), ('value', '<f8')])

There is padding inserted in the first case. This corresponds to how the compiler packs a short; double struct on my system. The default is align=0. You need to use the dtype() constructor to change the default. The auto-constructor used in dtype= keyword calls will not change the alignment from align=0.

-Travis

From Fernando.Perez at colorado.edu Sat Apr 29 2006
From: Fernando.Perez at colorado.edu (Fernando Perez)
Date: Sat Apr 29 2006
Subject: [Numpy-discussion] A weekend floating point/compiler question
In-Reply-To: References: <4452AB3F.8090700@colorado.edu>
Message-ID: <4453F3A6.9030309@colorado.edu>

Charles R Harris wrote:
>>I don't see why the answer should be 99. The number .99 can not be exactly
>>represented in IEEE floating point, in fact it is ~
>>0.9899999999999999911182. So as you can see the result is perfectly
>>correct given the standard conversion to int by truncation. IMHO, this is
>>programmer error, not a compiler problem and should be fixed in the code.
>>Now you may get slightly different results depending on roundoff error if
>>you indulge in such things as (.5 + .49)*100 vs (.33 + .17 + .49)*100, and
>>since these numbers are constants they may also be precomputed by the
>>compiler and the results will depend on the accuracy of the compiler's
>>computation. The whole construction is ambiguous.
>>
>>Chuck
>
> As an example: [...]

Thanks to yours and the other replies. I did try resetting the FPU control word as suggested to only 64 bits, and in fact the 'problem' does disappear, and I suspect that's also why Robert sees differences in CPUs without the extra 16 internal FPU bits.

I do agree that I don't like code like this, but unfortunately this one is outside of my control. For the sake of completeness (since this thread has some educational value on the vagaries of FP arithmetic), I've slightly extended your example to:

abdul[f77bug]> cat print99.c
#include <stdio.h>

int main(int argc, char** argv)
{
    int x = 100;

    float fy = .49;
    float fz = .50;
    float fw = (fy + fz)*x;
    int ifw = fw;

    double y = .49;
    double z = .50;
    double w = (y + z)*x;
    int iw = w;

    long double ly = .49;
    long double lz = .50;
    long double lw = (ly + lz)*x;
    int ilw = lw;

    printf("floats:\n");
    printf("w=%25.22f, iw=%d\n", fw,ifw);
    printf("doubles:\n");
    printf("w=%25.22f, iw=%d\n", w,iw);
    printf("long doubles:\n");
    printf("w=%25.22Lf, iw=%d\n", lw,ilw);

    return 0;
}
// EOF

which gives on my box (AMD chip, running 32-bit fedora3):

abdul[f77bug]> ./print99.gcc
floats:
w=99.0000000000000000000000, iw=99
doubles:
w=99.0000000000000000000000, iw=99
long doubles:
w=98.9999999999999991118216, iw=98

This is consistent with the calculations done in 80 bits also giving different results. One of the nice things about this community is precisely this kind of friendly expertise. Many thanks to all.
Cheers, f From fullung at gmail.com Sat Apr 29 17:27:15 2006 From: fullung at gmail.com (Albert Strasheim) Date: Sat Apr 29 17:27:15 2006 Subject: [Numpy-discussion] Array data and struct alignment In-Reply-To: <4453E449.20407@ieee.org> Message-ID: <001d01c66bec$c556ece0$0a84a8c0@dsp.sun.ac.za> Thanks Travis, this works like a charm. For the curious, here's a quick way to see if your system is doing the right thing: In [87]: descr = dtype({'names':['a', 'b'], 'formats':[byte,'f8']},align=1) In [88]: descr Out[88]: dtype([('a', '|i1'), ('', '|V7'), ('b', ' -----Original Message----- > From: numpy-discussion-admin at lists.sourceforge.net [mailto:numpy- > discussion-admin at lists.sourceforge.net] On Behalf Of Travis Oliphant > Sent: 30 April 2006 00:10 > To: numpy-discussion > Subject: Re: [Numpy-discussion] Array data and struct alignment > > Travis Oliphant wrote: > > Albert Strasheim wrote: > >> Hello all > >> > >> I'm busy wrapping a C library with NumPy. Some of the functions > >> operate on a > >> buffer containing structs that look like this: > >> > >> struct node { > >> int index; > >> double value; > >> }; > >> > >> > > > > In my previous discussion I was wrong. You cannot use the > > array_descriptor format for a data-type and the align keyword at the > > same time. You need to use a different method to specify fields. > > > > This, for example: > > > > descr = dtype({'names':['index', 'value'], > > 'formats':[intc,'f8']},align=1) > > > > On my (32-bit) system it doesn't produce any difference from align=0. > > > > -Travis > > > > > > However notice the difference with > > >>> dtype({'names':['index', 'value'], 'formats':[short,'f8']},align=1) > dtype([('index', ' > >>> dtype({'names':['index', 'value'], 'formats':[short,'f8']},align=0) > dtype([('index', ' > > There is padding inserted in the first-case. This corresponds to how > the compiler packs a short; double struct on my system. The default is > align=0. You need to use the dtype() constructor to change the > default. The auto-constructor used in dtype= keyword calls will not > change the alignment from align=0. > > > -Travis From jonathan.taylor at stanford.edu Sat Apr 29 19:56:03 2006 From: jonathan.taylor at stanford.edu (Jonathan Taylor) Date: Sat Apr 29 19:56:03 2006 Subject: [Numpy-discussion] confusing recarray behaviour In-Reply-To: <4453C8B7.8040000@ieee.org> References: <44528318.6010604@stanford.edu> <4453C8B7.8040000@ieee.org> Message-ID: <44542730.4050609@stanford.edu> Here is a pickle file with v and desc, v is just a list of tuples with integer and string entries. My point with my example is that when I had two identical lists (i.e. v[0:2] == V) one time I got an error, the other time I didn't and the traceback had no information, i.e. I couldn't get anywhere with pdb. I am using svn revision 2456. Jonathan Travis Oliphant wrote: > Jonathan Taylor wrote: > >> >> What I pass to N.array seems to agree with the examples in numpybook. >> >> Below is an example that does work for me (excuse the longish example >> but it was just cut and paste to make my life easier). In my code, >> funny things happen >> (see ipython excerpt below this). In particular, I have a list v with >> v[0:2] = V and with the >> same dtype "ddesc" I get this exception when I change V to v[0:2]. > > Please show us what v is. > > If I run v = V[:] and then try N.array(v[0:2],ddesc) I don't get any > error. So something else must be going on. > > Which version are you running? 
> > -Travis
> >
> -------------------------------------------------------
> Using Tomcat but need to do more? Need to support web services, security?
> Get stuff done quickly with pre-integrated technology to make your job
> easier
> Download IBM WebSphere Application Server v.1.0.1 based on Apache
> Geronimo
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/numpy-discussion

--
------------------------------------------------------------------------
I'm part of the Team in Training: please support our efforts for the
Leukemia and Lymphoma Society!

http://www.active.com/donate/tntsvmb/tntsvmbJTaylor

GO TEAM !!!
------------------------------------------------------------------------
Jonathan Taylor                        Tel: 650.723.9230
Dept. of Statistics                    Fax: 650.725.8977
Sequoia Hall, 137                      www-stat.stanford.edu/~jtaylo
390 Serra Mall
Stanford, CA 94305

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: dump.pickle
URL:

From ndarray at mac.com Sun Apr 30 10:12:06 2006
From: ndarray at mac.com (Sasha)
Date: Sun Apr 30 10:12:06 2006
Subject: [Numpy-discussion] [Numeric] "put" into object array corrupts memory
In-Reply-To: References: Message-ID:

I know that Numeric is no longer maintained, but since this bug cost me two sleepless nights, I think it is appropriate to announce the bug and the fix to the list.

---------- Forwarded message ----------
From: SourceForge.net
Date: Apr 30, 2006 12:58 PM
Subject: [ numpy-Bugs-1479376 ] [Numeric] "put" into object array corrupts memory
To: noreply at sourceforge.net

Bugs item #1479376, was opened at 2006-04-30 12:46
Message generated for change (Comment added) made by belopolsky
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=101369&aid=1479376&group_id=1369

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: Fatal Error
Group: Normal bug
Status: Open
Priority: 5
Submitted By: Alexander Belopolsky (belopolsky)
Assigned to: Nobody/Anonymous (nobody)
Summary: [Numeric] "put" into object array corrupts memory

Initial Comment:
This is one of those bugs that are easier to fix than to reproduce:

$ cat test-put.py
class A(object):
    def __del__(self):
        print "deleting %r" % self

a = A()

from Numeric import *
x = array([None], 'O')
y = array([a], 'O')
put(x,[0],y)
del a,y
print "exiting"

$ python test-put.py
deleting <__main__.A object at 0xf7e4d24c>
exiting
Fatal Python error: deletion of interned string failed
Aborted (core dumped)

Numeric version: 24.2

----------------------------------------------------------------------

>Comment By: Alexander Belopolsky (belopolsky)
Date: 2006-04-30 12:58

Message:
Logged In: YES
user_id=835142

Attached patch fixes the bug.
---------------------------------------------------------------------- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=101369&aid=1479376&group_id=1369 From vidar+list at 37mm.no Sun Apr 30 16:27:00 2006 From: vidar+list at 37mm.no (Vidar Gundersen) Date: Sun Apr 30 16:27:00 2006 Subject: [Numpy-discussion] Guide to Numpy book In-Reply-To: <4452C145.8050803@geodynamics.org> (Luis Armendariz's message of "Fri, 28 Apr 2006 18:28:37 -0700") References: <3FA6601C-819F-4F15-A670-829FC428F47B@cortechs.net> <4452C145.8050803@geodynamics.org> Message-ID: ===== Original message from Luis Armendariz | 29 Apr 2006: >> What is the newest version of Guide to numpy? The recent one I got is >> dated at Jan 9 2005 on the cover. > The one I got yesterday is dated March 15, 2006. aren't the updates supposed to be sent out to customers when available? From ted.horst at earthlink.net Sun Apr 30 16:50:08 2006 From: ted.horst at earthlink.net (Ted Horst) Date: Sun Apr 30 16:50:08 2006 Subject: [Numpy-discussion] Scalar math module is ready for testing In-Reply-To: <4451C076.40608@ieee.org> References: <4451C076.40608@ieee.org> Message-ID: <3856FA57-539D-47DE-8427-2A6BB508F917@earthlink.net> Here is an issue I am having with scalarmath: >>> import numpy >>> numpy.__version__ '0.9.7.2462' >>> import numpy.core.scalarmath >>> a = numpy.array([1], 'h') >>> 1*a array([1], dtype=int16) >>> 1*a[0] Traceback (most recent call last): File "", line 1, in ? TypeError: unsupported operand type(s) for *: 'int' and 'int16scalar' This happens because PyArray_CanCastSafely returns false for casting from int to short. alter_scalars(int) fixes this, but I have lots of non-numpy code that I don't want to behave differently. Ted On Apr 28, 2006, at 02:12, Travis Oliphant wrote: > The scalar math module is complete and ready to be tested. It > should speed up code that relies heavily on scalar arithmetic by by- > passing the ufunc machinery. From fullung at gmail.com Sun Apr 30 17:11:05 2006 From: fullung at gmail.com (Albert Strasheim) Date: Sun Apr 30 17:11:05 2006 Subject: [Numpy-discussion] Creating a descr with aligned=1 using the C API Message-ID: <000601c66cb3$b762a940$0a84a8c0@dsp.sun.ac.za> Hello all I was wondering what the best way would be to create the following descr using the C API: descr = dtype({'names' : ['index', 'value'], 'formats' : [intc, 'f8']}, align=1) One could use PyArray_DescrConverter in multiarraymodule.c, but there doesn't seem to be a way to specify aligned=1 and one would have to build the dict object before being able to pass it on for conversion. Unless there's another easy way I'm missing, the API could possibly do with a function like PyArray_DescrFromCommaString(const char*, int align) which calls _convert_from_commastring. By the way, what is the general format of these commastrings? Comments appreciated. Regards, Albert From tim.hochberg at cox.net Sun Apr 30 19:33:03 2006 From: tim.hochberg at cox.net (Tim Hochberg) Date: Sun Apr 30 19:33:03 2006 Subject: [Numpy-discussion] basearray lives! Message-ID: <445573B0.6020408@cox.net> After a fashion anyway. I implemented the simplest thing that could possibly work and I've left out some stuff that even I think we need (docstring, repr and str). Still it exists, ndarray inherits from it and some stuff seems to work automagically. 
From tim.hochberg at cox.net Sun Apr 30 19:33:03 2006
From: tim.hochberg at cox.net (Tim Hochberg)
Date: Sun Apr 30 19:33:03 2006
Subject: [Numpy-discussion] basearray lives!
Message-ID: <445573B0.6020408@cox.net>

After a fashion anyway. I implemented the simplest thing that could
possibly work and I've left out some stuff that even I think we need
(docstring, repr and str). Still it exists, ndarray inherits from it and
some stuff seems to work automagically.

>>> import numpy as n
>>> ba = n.basearray([3,3], int, n.arange(9))
>>> ba
<numpy.basearray object at 0x...>
>>> a = asarray(ba)
>>> a
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
>>> a + ba
array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])
>>> isinstance(a, n.basearray)
True
>>> type(ba)
<type 'numpy.basearray'>
>>> type(a)
<type 'numpy.ndarray'>
>>> len(dir(ba))
19
>>> len(dir(a))
156

Travis: should I go ahead and check this into the trunk? It shouldn't
interfere with anything. The only change to ndarray is the tp_base,
which sets up the inheritance.

-tim

From ndarray at mac.com Sun Apr 30 20:27:09 2006
From: ndarray at mac.com (Sasha)
Date: Sun Apr 30 20:27:09 2006
Subject: [Numpy-discussion] basearray lives!
In-Reply-To: <445573B0.6020408@cox.net>
References: <445573B0.6020408@cox.net>
Message-ID:

Let me add my $.02.

I am very much in favor of a basic array object. I would probably go much
further than Tim in simplifying it. No need for repr/str. No number
protocol. No sequence/mapping protocol either. Maybe even no
dimensions/striding etc. What is left? Not much on top of the buffer
protocol: the type description.

I've expressed this opinion several times before (and was criticised for
not supporting it :-)): I don't think a basearray should be a base class.
The main reason is that in most cases subclasses will need to adapt all
the array methods. In many cases (speaking from ma experience, but
probably matrix folks can relate) the adaptation is not automatic and has
to be done on a method-by-method basis. Exposure of the base class methods
without adaptation or with wrong adaptation leads to errors. Unless the
base array is truly minimalistic and stays this way, methods that are
added to the base class in the future will likely not work unadapted.

The only inheritance-based implementation that I would like is something
similar to python's object type: rich C API and no Python API.

Would you consider checking your implementation in without modifying
ndarray's tp_base?

On 4/30/06, Tim Hochberg wrote:
>
> After a fashion anyway. I implemented the simplest thing that could
> possibly work and I've left out some stuff that even I think we need
> (docstring, repr and str). Still it exists, ndarray inherits from it and
> some stuff seems to work automagically.
>
> >>> import numpy as n
> >>> ba = n.basearray([3,3], int, n.arange(9))
> >>> ba
> <numpy.basearray object at 0x...>
> >>> a = asarray(ba)
> >>> a
> array([[0, 1, 2],
>        [3, 4, 5],
>        [6, 7, 8]])
> >>> a + ba
> array([[ 0,  2,  4],
>        [ 6,  8, 10],
>        [12, 14, 16]])
> >>> isinstance(a, n.basearray)
> True
> >>> type(ba)
> <type 'numpy.basearray'>
> >>> type(a)
> <type 'numpy.ndarray'>
> >>> len(dir(ba))
> 19
> >>> len(dir(a))
> 156
>
> Travis: should I go ahead and check this into the trunk? It shouldn't
> interfere with anything. The only change to ndarray is the tp_base,
> which sets up the inheritance.
>
> -tim
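Sasha's adaptation worry is easy to see in miniature. A pure-Python sketch
only, not numpy code; BaseArray and MaskedArray here are hypothetical
stand-ins showing how an unadapted base-class method silently ignores
subclass state:

# Illustration of the method-adaptation pitfall: a base-class method that
# knows nothing about subclass invariants gives wrong answers when
# inherited unadapted.

class BaseArray(object):
    def __init__(self, data):
        self.data = list(data)
    def sum(self):                     # knows nothing about masks
        return sum(self.data)

class MaskedArray(BaseArray):
    def __init__(self, data, mask):
        BaseArray.__init__(self, data)
        self.mask = list(mask)
    # sum() is inherited unadapted, so masked entries are counted

m = MaskedArray([1, 2, 999], mask=[False, False, True])
print m.sum()   # 1002 -- the masked 999 leaks in

class AdaptedMaskedArray(MaskedArray):
    def sum(self):                     # the per-method adaptation Sasha means
        return sum(d for d, mk in zip(self.data, self.mask) if not mk)

print AdaptedMaskedArray([1, 2, 999], [False, False, True]).sum()   # 3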
From oliphant.travis at ieee.org Sun Apr 30 21:45:05 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sun Apr 30 21:45:05 2006
Subject: [Numpy-discussion] Creating a descr with aligned=1 using the C API
In-Reply-To: <000601c66cb3$b762a940$0a84a8c0@dsp.sun.ac.za>
References: <000601c66cb3$b762a940$0a84a8c0@dsp.sun.ac.za>
Message-ID: <44559204.3020902@ieee.org>

Albert Strasheim wrote:
> Hello all
>
> I was wondering what the best way would be to create the following descr
> using the C API:
>

You can use the "new" method:

PyArray_Descr *dtype;
PyObject *dict;
PyObject *args;

args = Py_BuildValue("(Oi)", dict, 1);
dtype = (PyArray_Descr *)PyArrayDescr_Type.tp_new(&PyArrayDescr_Type,
                                                  args, NULL);
Py_DECREF(args);

where the dict is the one you give.

Yes, this could be an easier-to-use API.

> descr = dtype({'names' : ['index', 'value'], 'formats' : [intc, 'f8']},
>               align=1)
>
> One could use PyArray_DescrConverter in multiarraymodule.c, but there
> doesn't seem to be a way to specify aligned=1, and one would have to build
> the dict object before being able to pass it on for conversion.
>
> Unless there's another easy way I'm missing, the API could possibly do with
> a function like PyArray_DescrFromCommaString(const char*, int align) which
> calls _convert_from_commastring. By the way, what is the general format of
> these commastrings?
>

It's in the NumPy book and it's also documented by numarray...

-Travis

From oliphant.travis at ieee.org Sun Apr 30 21:49:02 2006
From: oliphant.travis at ieee.org (Travis Oliphant)
Date: Sun Apr 30 21:49:02 2006
Subject: [Numpy-discussion] basearray lives!
In-Reply-To: <445573B0.6020408@cox.net>
References: <445573B0.6020408@cox.net>
Message-ID: <445592EB.1000406@ieee.org>

Tim Hochberg wrote:
>
> After a fashion anyway. I implemented the simplest thing that could
> possibly work and I've left out some stuff that even I think we need
> (docstring, repr and str). Still it exists, ndarray inherits from it
> and some stuff seems to work automagically.
>
> >>> import numpy as n
> >>> ba = n.basearray([3,3], int, n.arange(9))
> >>> ba
> <numpy.basearray object at 0x...>
> >>> a = asarray(ba)
> >>> a
> array([[0, 1, 2],
>        [3, 4, 5],
>        [6, 7, 8]])
> >>> a + ba
> array([[ 0,  2,  4],
>        [ 6,  8, 10],
>        [12, 14, 16]])
> >>> isinstance(a, n.basearray)
> True
> >>> type(ba)
> <type 'numpy.basearray'>
> >>> type(a)
> <type 'numpy.ndarray'>
> >>> len(dir(ba))
> 19
> >>> len(dir(a))
> 156
>
> Travis: should I go ahead and check this into the trunk? It shouldn't
> interfere with anything. The only change to ndarray is the tp_base,
> which sets up the inheritance.
>

I say go ahead. We can then all deal with it there and improve upon it.
The ndarray used to inherit from another array and things worked.
Python's inheritance in C is actually quite slick, especially for
structural issues.

I agree that the basearray should have minimal operations (I would not
even define several of the protocols for it). I'd probably keep only the
buffer and mapping protocols, and even then only a simple mapping
protocol (i.e. no fancy indexing) that then gets enhanced by the ndarray.

Thanks for the work.

-Travis
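To make the division of labor Travis describes concrete, here is a
pure-Python analogy, a sketch of the design idea only: the real
implementation is C, and these class bodies are hypothetical. The base
type carries only structure and a simple mapping protocol, the subclass
layers the rich behavior on top, and isinstance works across the split
for free:

# Pure-Python analogy of a minimal basearray with a rich ndarray subclass.

class basearray(object):
    """Structure only: a type description, a shape, and raw data."""
    def __init__(self, shape, typecode, data):
        self.shape = tuple(shape)
        self.typecode = typecode
        self.data = list(data)
    def __getitem__(self, i):
        # simple mapping protocol only -- no fancy indexing here
        return self.data[i]

class ndarray(basearray):
    """The rich type adds the number protocol, fancy indexing, etc."""
    def __add__(self, other):
        return ndarray(self.shape, self.typecode,
                       [x + y for x, y in zip(self.data, other.data)])

a = ndarray((3,), 'i', [1, 2, 3])
b = ndarray((3,), 'i', [10, 20, 30])
print (a + b).data               # [11, 22, 33]
print isinstance(a, basearray)   # True -- the cheap win from tp_base
print hasattr(basearray((3,), 'i', [0]), '__add__')   # False: base stays lean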