From Nicolas.Rougier at inria.fr Fri May 1 03:49:50 2015
From: Nicolas.Rougier at inria.fr (Nicolas P. Rougier)
Date: Fri, 1 May 2015 09:49:50 +0200
Subject: [Numpy-discussion] EuroScipy 2015: Extended deadline (15/05/2015)
Message-ID: <763F7E6D-7850-403E-ADEC-79167A31FD41@inria.fr>

--------------------------------
Extended deadline: 15th May 2015
--------------------------------

EuroScipy 2015, the annual conference on Python in science, will take place in Cambridge, UK on 26-30 August 2015. The conference features two days of tutorials followed by two days of scientific talks & posters and an extra day dedicated to developer sprints. It is the major event in Europe in the field of technical/scientific computing within the Python ecosystem. Data scientists, analysts, quants, PhDs, scientists and students from more than 20 countries attended the conference last year.

The topics presented at EuroSciPy are very diverse, with a focus on advanced software engineering and original uses of Python and its scientific libraries, either in theoretical or experimental research, from both academia and industry. Submissions for posters, talks & tutorials (beginner and advanced) are welcome on our website at http://www.euroscipy.org/2015/

Sprint proposals should be addressed directly to the organisation at euroscipy-org at python.org

Important dates
===============
Mar 24, 2015      Call for talks, posters & tutorials
Apr 30, 2015      Talk and tutorials submission deadline
May 15, 2015      EXTENDED DEADLINE
May 1, 2015       Registration opens
May 30, 2015      Final program announced
Jun 15, 2015      Early-bird registration ends
Aug 26-27, 2015   Tutorials
Aug 28-29, 2015   Main conference
Aug 30, 2015      Sprints

We look forward to an exciting conference and hope to see you in Cambridge.

The EuroSciPy 2015 Team - http://www.euroscipy.org/2015/

From faltet at gmail.com Fri May 1 05:26:50 2015
From: faltet at gmail.com (Francesc Alted)
Date: Fri, 01 May 2015 11:26:50 +0200
Subject: [Numpy-discussion] ANN: PyTables 3.2.0 RC2 is out
Message-ID: <554346DA.5000309@gmail.com>

============================
Announcing PyTables 3.2.0rc2
============================

We are happy to announce PyTables 3.2.0rc2.

*******************************
IMPORTANT NOTICE:

If you are a user of PyTables, it needs your help to keep going. Please read the following thread, as it contains important information about the future (or lack of it) of the project:

https://groups.google.com/forum/#!topic/pytables-users/yY2aUa4H7W4

Thanks!
*******************************

What's new
==========

This is a major release of PyTables, the result of more than a year of accumulated patches; most importantly, it fixes a couple of nasty problems with indexed queries not returning the correct results in some scenarios (mainly affecting pandas users). There are many usability and performance improvements too.

In case you want to know in more detail what has changed in this version, please refer to: http://www.pytables.org/release_notes.html

You can install it via pip or download a source package with generated PDF and HTML docs from: http://sourceforge.net/projects/pytables/files/pytables/3.2.0rc2

For an online version of the manual, visit: http://www.pytables.org/usersguide/index.html

What is it?
===========

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing.
PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use. PyTables includes OPSI, a new indexing technology that allows data lookups in tables exceeding 10 gigarows (10**10 rows) to be performed in less than a tenth of a second.

Resources
=========

About PyTables: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most especially, a lot of kudos go to the HDF5 and NumPy makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

----

**Enjoy data!**

-- The PyTables Developers

From aymeric.rateau at gmail.com Mon May 4 16:17:42 2015
From: aymeric.rateau at gmail.com (Gmail)
Date: Mon, 04 May 2015 22:17:42 +0200
Subject: [Numpy-discussion] read not byte aligned records
Message-ID: <5547D3E6.9080400@gmail.com>

Hi,

I am developing code to read binary files (MDF, Measurement Data File). In the previous version 3, data was always byte aligned. I made wide use of the numpy.core.records module (fromstring, fromfile), which showed good performance for reading and unpacking data on the fly.

However, the latest version 4 allows data that is not byte aligned. This reduces file size, especially when the raw data does not fill whole bytes, like 10 bits from an analog converter. For instance, a record structure could be: uint64, float32, uint8, uint10, padding 6 bits, uint9, padding 7 bits, uint24, uint24, uint24, etc.

I found a way to read these unaligned records using the bitstring module instead of numpy.core.records, but performance is much worse (around 10x slower in pure Python; I did not try the Cython implementation, though). Is there a pure numpy way to do this?

Regards
Aymeric

From Jerome.Kieffer at esrf.fr Tue May 5 01:21:24 2015
From: Jerome.Kieffer at esrf.fr (Jerome Kieffer)
Date: Tue, 5 May 2015 07:21:24 +0200
Subject: [Numpy-discussion] read not byte aligned records
In-Reply-To: <5547D3E6.9080400@gmail.com>
References: <5547D3E6.9080400@gmail.com>
Message-ID: <20150505072124.aa8746c35d26992bb5f16ec2@esrf.fr>

Hi,

If you want to play with 10-bit data blocks, read 5 bytes and work with 4 entries at a time...

-- Jérôme Kieffer, Data analysis unit - ESRF

From njs at pobox.com Tue May 5 02:15:46 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 4 May 2015 23:15:46 -0700
Subject: [Numpy-discussion] read not byte aligned records
In-Reply-To: <20150505072124.aa8746c35d26992bb5f16ec2@esrf.fr>
References: <5547D3E6.9080400@gmail.com> <20150505072124.aa8746c35d26992bb5f16ec2@esrf.fr>

On Mon, May 4, 2015 at 10:21 PM, Jerome Kieffer wrote:
> Hi,
> If you want to play with 10-bit data blocks, read 5 bytes and work with 4 entries at a time...

NumPy arrays don't have any support for sub-byte alignment. So if you want to handle such data, you either need to write some manual packing/unpacking code (using bitshift operators, or perhaps np.unpackbits, or whatever), or use another library designed for doing this. You may find Cython useful to write the core packing/unpacking, since bit-by-bit processing in a for loop is not something that CPython is super well suited to.

Good luck,
-n

-- Nathaniel J. Smith -- http://vorpus.org
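To make the bitshift approach above concrete, here is a minimal sketch of extracting one unaligned field with plain numpy. The file name, record size and field offset are invented for illustration, and little-endian bit packing is assumed:

import numpy as np

RECORD_BYTES = 8   # assumed fixed record size
BIT_OFFSET = 4     # assumed start bit of the field within each record
WIDTH = 10         # field width in bits

raw = np.fromfile("data.bin", dtype=np.uint8).reshape(-1, RECORD_BYTES)

# Widen the two bytes containing the field, combine them little-endian,
# then shift and mask. This works whenever BIT_OFFSET % 8 + WIDTH <= 16.
lo = raw[:, BIT_OFFSET // 8].astype(np.uint16)
hi = raw[:, BIT_OFFSET // 8 + 1].astype(np.uint16)
word = lo | (hi << 8)
field = (word >> (BIT_OFFSET % 8)) & ((1 << WIDTH) - 1)

The shifts and masks run vectorized over all records at once, which is the main advantage over per-record bit handling in a Python loop.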
From aymeric.rateau at gmail.com Tue May 5 07:07:42 2015
From: aymeric.rateau at gmail.com (aymeric.rateau at gmail.com)
Date: Tue, 05 May 2015 11:07:42 +0000
Subject: [Numpy-discussion] read not byte aligned records
In-Reply-To: References: <5547D3E6.9080400@gmail.com> <20150505072124.aa8746c35d26992bb5f16ec2@esrf.fr>
Message-ID: <080ff0d475a9941f5f752078524158b7@ratal.org>

Hi,

To answer Jerome (I hope): data is sometimes spread over bytes shared with other data in the record. 10 bits was just an example; sometimes it is 24, 2, 8, 7, etc., all combined, including some padding between them. I am not sure I have understood...

To Nathaniel: yes, indeed, I could read the records as big/long byte words and apply the right_shift and bitwise_and functions to extract each channel. I am a bit afraid of the performance, though. I am currently using the bitstring module, which does exactly this bit handling. It is implemented in both pure Python and Cython. I tried the pure Python one, and the performance penalty compared to byte-aligned data is around 2-3x for similar file sizes.

--> I will try bitstring's Cython implementation.
--> I will also try the approach using right_shift and bitwise_and.

The best one will win, but at least I know from your answers that I am not missing any trick or optimisation and that I am heading in the right direction. Thanks!

Regards
Aymeric

On 5 May 2015 at 08:15, "Nathaniel Smith" wrote:
> [...]

From ben.root at ou.edu Tue May 5 09:39:45 2015
From: ben.root at ou.edu (Benjamin Root)
Date: Tue, 5 May 2015 09:39:45 -0400
Subject: [Numpy-discussion] read not byte aligned records
In-Reply-To: <080ff0d475a9941f5f752078524158b7@ratal.org>
References: <5547D3E6.9080400@gmail.com> <20150505072124.aa8746c35d26992bb5f16ec2@esrf.fr> <080ff0d475a9941f5f752078524158b7@ratal.org>

I have been very happy with the bitarray package. I don't know if it is faster than bitstring, but it is worth a mention. Just watch out for any hashing operations on its objects; it doesn't seem to do them right (set(), dict(), etc.), but comparison operations work just fine.

Ben Root

From allanhaldane at gmail.com Tue May 5 11:13:22 2015
From: allanhaldane at gmail.com (Allan Haldane)
Date: Tue, 05 May 2015 11:13:22 -0400
Subject: [Numpy-discussion] Should ndarray subclasses support the keepdims arg?
Message-ID: <5548DE12.2060606@gmail.com>

Hello all,

A question:

Many ndarray methods (e.g. sum, mean, any, min) have a "keepdims" keyword argument, but ndarray subclass methods sometimes don't.
The 'matrix' subclass doesn't, and numpy functions like 'np.sum' intentionally drop/ignore the keepdims argument when called with an ndarray subclass as first argument.

This means you can't always use ndarray subclasses as a 'drop-in' replacement for ndarrays if the code uses keepdims (even indirectly), and it means code that deals with keepdims (e.g. np.sum and more) has to detect ndarray subclasses and drop keepdims even if the subclass supports it (since there is no good way to detect support). It seems to me that if we are going to use inheritance, subclass methods should keep the signature of the parent class methods. What does the list think?

---- Details: ----

This problem comes up in a PR I'm working on (#5706) to add the keepdims arg to masked array methods. In order to support masked matrices (which a lot of unit tests check), I would have to detect and drop the keepdims arg to avoid an exception. This would be solved if the matrix class supported keepdims (plus an update to np.sum). Similarly, `np.sum(mymaskedarray, keepdims=True)` does not respect keepdims, but it could work if all subclasses supported keepdims.

I do not foresee immediate problems with adding keepdims to the matrix methods, except that it would be an unused argument. Modifying `np.sum` to always pass on the keepdims arg is trickier, since it would break any code that tried to np.sum a subclass that doesn't support keepdims, e.g. pandas.DataFrame. **kwargs tricks might work. But if it's permissible, I think it would be better to require subclasses to support all the keyword args ndarray supports.

Allan

From njs at pobox.com Tue May 5 13:55:07 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 5 May 2015 10:55:07 -0700
Subject: [Numpy-discussion] Should ndarray subclasses support the keepdims arg?
In-Reply-To: <5548DE12.2060606@gmail.com>
References: <5548DE12.2060606@gmail.com>

AFAICT the only real solution here is for np.sum and friends to propagate the keepdims argument if and only if it was explicitly passed to them (or, under the slightly different rule, if and only if it has a non-default value). If we just started requiring code to handle it and passing it unconditionally, then as soon as someone upgraded numpy all their existing code might break for no good reason.

On May 5, 2015 8:13 AM, "Allan Haldane" wrote:
> [...]
From mistersheik at gmail.com Tue May 5 14:05:08 2015
From: mistersheik at gmail.com (Neil Girdhar)
Date: Tue, 5 May 2015 14:05:08 -0400
Subject: [Numpy-discussion] Should ndarray subclasses support the keepdims arg?
In-Reply-To: References: <5548DE12.2060606@gmail.com>

Maybe they should have written their code with **kwargs that consumes all keyword arguments, rather than assuming that no keyword arguments would be added? The problem with this approach in general is that it makes writing code unnecessarily convoluted.

On Tue, May 5, 2015 at 1:55 PM, Nathaniel Smith wrote:
> AFAICT the only real solution here is for np.sum and friends to propagate
> the keepdims argument if and only if it was explicitly passed to them (or,
> under the slightly different rule, if and only if it has a non-default value).
> If we just started requiring code to handle it and passing it
> unconditionally, then as soon as someone upgraded numpy all their existing
> code might break for no good reason.
> On May 5, 2015 8:13 AM, "Allan Haldane" wrote:
>> [...]
From sebastian at sipsolutions.net Tue May 5 13:41:51 2015
From: sebastian at sipsolutions.net (Sebastian Berg)
Date: Tue, 05 May 2015 19:41:51 +0200
Subject: [Numpy-discussion] Should ndarray subclasses support the keepdims arg?
In-Reply-To: <5548DE12.2060606@gmail.com>
References: <5548DE12.2060606@gmail.com>
Message-ID: <1430847711.2930.4.camel@sipsolutions.net>

On Tue, 2015-05-05 at 11:13 -0400, Allan Haldane wrote:
> [...]

What is the advantage over having an error raised due to the invalid **kwargs trick when the subclass does not support it? At first sight, a hard requirement seems like a long shot. The transition period alone seems hard, unless we add magic to test the subclass upon creation, and I am not sure that is easy to do (something like an ABC conformance test).

- Sebastian
From alexander.brezinov at mmbresearch.com Tue May 5 15:52:44 2015
From: alexander.brezinov at mmbresearch.com (Alexander Brezinov)
Date: Tue, 5 May 2015 15:52:44 -0400
Subject: [Numpy-discussion] import scipy.linalg is hanging on Marvell armada 370

Hello

The import of scipy.linalg hangs in the DOUBLE_multiply function (BINARY_LOOP) in umath.so. After attaching gdb and dumping the local variables, the args are empty strings. Could you please advise whether this is a known issue? I searched the mailing list and could not find any solution to the problem.

I am running:

kernel 3.2.36 + Debian wheezy on ARMv7l armhf
CPU Armada 370 Marvell
python 2.7.3
scipy 0.15.1
numpy 1.9.2

The problem can be reproduced by launching python and importing scipy.linalg (import scipy.linalg).

I also ran the same OS on qemu and was not able to reproduce the issue. A similar architecture such as the Raspberry Pi (ARMv7 armhf) is fine. Also, using software floating point instead of hardware floating point on the same Armada 370 (ARMv7) works just fine.

Thank you for any comments or suggestions in advance,
Alex

From njs at pobox.com Tue May 5 16:39:13 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Tue, 5 May 2015 13:39:13 -0700
Subject: [Numpy-discussion] Should ndarray subclasses support the keepdims arg?
In-Reply-To: References: <5548DE12.2060606@gmail.com>

On May 5, 2015 11:05 AM, "Neil Girdhar" wrote:
>
> Maybe they should have written their code with **kwargs that consumes all keyword arguments, rather than assuming that no keyword arguments would be added? The problem with this approach in general is that it makes writing code unnecessarily convoluted.

If the user asked for keepdims=True, then silently ignoring this is worse than raising an error. And I guess I would call this making code necessarily convoluted :-).

There are not that many options for evolving an interface shared by multiple unrelated libraries.

-n

From allanhaldane at gmail.com Tue May 5 18:46:24 2015
From: allanhaldane at gmail.com (Allan Haldane)
Date: Tue, 05 May 2015 18:46:24 -0400
Subject: [Numpy-discussion] Should ndarray subclasses support the keepdims arg?
In-Reply-To: References: <5548DE12.2060606@gmail.com>
Message-ID: <55494840.4060705@gmail.com>

That makes sense. I think it's the way to go, thanks. The downside is that using **kwargs instead of an explicit keepdims arg gives a more obscure signature, but using the new __signature__ attribute this could be hidden for Python 3 users and Python 2 users on IPython 3+.

On 05/05/2015 01:55 PM, Nathaniel Smith wrote:
> AFAICT the only real solution here is for np.sum and friends to
> propagate the keepdims argument if and only if it was explicitly passed
> to them (or, under the slightly different rule, if and only if it has a
> non-default value). If we just started requiring code to handle it and
> passing it unconditionally, then as soon as someone upgraded numpy all
> their existing code might break for no good reason.
> On May 5, 2015 8:13 AM, "Allan Haldane" wrote:
> > [...]

From charlesr.harris at gmail.com Tue May 5 20:18:02 2015
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Tue, 5 May 2015 18:18:02 -0600
Subject: [Numpy-discussion] import scipy.linalg is hanging on Marvell armada 370

On Tue, May 5, 2015 at 1:52 PM, Alexander Brezinov <alexander.brezinov at mmbresearch.com> wrote:
> Hello
>
> The import of scipy.linalg hangs in the DOUBLE_multiply function
> (BINARY_LOOP) in umath.so. After attaching gdb and dumping the local
> variables, the args are empty strings. Could you please advise whether
> this is a known issue?
> [...]
This almost sounds like a compiler problem; are you using a correctly compiled version of umath.so? Not that there couldn't be other sources of the problem...

Chuck

From njs at pobox.com Wed May 6 05:59:16 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Wed, 6 May 2015 02:59:16 -0700
Subject: [Numpy-discussion] Dispatch rules for binary operations on ndarrays

I just wanted to draw the list's attention to a discussion happening on the tracker, about the details of how methods like ndarray.__add__ are implemented, and how this interacts with the new __numpy_ufunc__ method that will make it possible for third party libraries to override arbitrary ufuncs starting in (hopefully) 1.10:

https://github.com/numpy/numpy/issues/5844

The details are somewhat arcane, but very important for anyone who implements ndarray-like objects or (to a lesser extent) anyone who subclasses ndarray. So feedback is very welcome.

-n

From faltet at gmail.com Wed May 6 06:11:10 2015
From: faltet at gmail.com (Francesc Alted)
Date: Wed, 6 May 2015 12:11:10 +0200
Subject: [Numpy-discussion] ANN: python-blosc 1.2.7 released

=============================
Announcing python-blosc 1.2.7
=============================

What is new?
============

Updated to use c-blosc v1.6.1. Although c-blosc now supports AVX2, it is not enabled in python-blosc because we still need to devise a way to detect AVX2 on the underlying platform. At any rate, c-blosc 1.6.1 fixed an important bug in the blosclz codec, so a new release was deemed important.

For more info, you can have a look at the release notes in:

https://github.com/Blosc/python-blosc/wiki/Release-notes

More docs and examples are available in the documentation site: http://python-blosc.blosc.org

What is it?
===========

Blosc (http://www.blosc.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() call. Blosc is the first compressor that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate object manipulations that are memory-bound (http://www.blosc.org/docs/StarvingCPUs.pdf). See http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on how much speed it can achieve on some datasets.

Blosc works well for compressing numerical arrays that contain data with relatively low entropy, like sparse data, time series, grids with regularly-spaced values, etc.

python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for the Blosc compression library.

There is also a handy tool built on Blosc called Bloscpack (https://github.com/Blosc/bloscpack). It features a command line interface that allows you to compress large binary datafiles on-disk. It also comes with a Python API that has built-in support for serializing and deserializing NumPy arrays both on-disk and in-memory at speeds that are competitive with the regular Pickle/cPickle machinery.
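As a quick sketch of the in-memory Python API (assuming the compress/decompress and pack_array/unpack_array helpers of current python-blosc):

import numpy as np
import blosc

a = np.linspace(0, 100, 10**6)

# Compress the raw buffer; typesize tells the shuffle filter the item width.
packed = blosc.compress(a.tobytes(), typesize=a.dtype.itemsize)
restored = np.frombuffer(blosc.decompress(packed), dtype=a.dtype)
assert np.array_equal(a, restored)

# pack_array/unpack_array also round-trip dtype and shape metadata.
assert np.array_equal(a, blosc.unpack_array(blosc.pack_array(a)))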
Installing
==========

python-blosc is in the PyPI repository, so installing it is easy:

$ pip install -U blosc  # yes, you should omit the python- prefix

Download sources
================

The sources are managed through github services at: http://github.com/Blosc/python-blosc

Documentation
=============

There is a Sphinx-based documentation site at: http://python-blosc.blosc.org/

Mailing list
============

There is an official mailing list for Blosc at: blosc at googlegroups.com http://groups.google.es/group/blosc

Licenses
========

Both Blosc and its Python wrapper are distributed using the MIT license. See: https://github.com/Blosc/python-blosc/blob/master/LICENSES for more details.

----

**Enjoy data!**

-- Francesc Alted

From faltet at gmail.com Wed May 6 12:37:24 2015
From: faltet at gmail.com (Francesc Alted)
Date: Wed, 6 May 2015 18:37:24 +0200
Subject: [Numpy-discussion] ANN: PyTables 3.2.0 (final) released!

=========================
Announcing PyTables 3.2.0
=========================

We are happy to announce PyTables 3.2.0.

*******************************
IMPORTANT NOTICE:

If you are a user of PyTables, it needs your help to keep going. Please read the following thread, as it contains important information about the future (or the lack of it) of the project:

https://groups.google.com/forum/#!topic/pytables-users/yY2aUa4H7W4

Thanks!
*******************************

What's new
==========

This is a major release of PyTables, the result of more than a year of accumulated patches; most importantly, it fixes a couple of nasty problems with indexed queries not returning the correct results in some scenarios. There are many usability and performance improvements too.

In case you want to know in more detail what has changed in this version, please refer to: http://www.pytables.org/release_notes.html

You can install it via pip or download a source package with generated PDF and HTML docs from: http://sourceforge.net/projects/pytables/files/pytables/3.2.0

For an online version of the manual, visit: http://www.pytables.org/usersguide/index.html

What is it?
===========

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package to achieve maximum throughput and convenient use. PyTables includes OPSI, a new indexing technology that allows data lookups in tables exceeding 10 gigarows (10**10 rows) to be performed in less than a tenth of a second.

Resources
=========

About PyTables: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to the many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most especially, a lot of kudos go to the HDF5 and NumPy makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

----

**Enjoy data!**

-- The PyTables Developers
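To illustrate the kind of indexed query described above, a minimal sketch; the file name and column layout are invented, and create_csindex builds a completely sorted index on the column:

import tables as tb

class Reading(tb.IsDescription):
    time = tb.Float64Col()
    value = tb.Int32Col()

with tb.open_file("readings.h5", "w") as f:
    table = f.create_table("/", "readings", Reading)
    table.append([(float(i), i % 100) for i in range(100000)])
    table.cols.value.create_csindex()        # index the 'value' column
    hits = table.read_where("value > 97")    # the query uses the index
    print(len(hits))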
From damilarefagbemi at gmail.com Wed May 6 20:26:02 2015
From: damilarefagbemi at gmail.com (Dammy)
Date: Wed, 6 May 2015 17:26:02 -0700 (MST)
Subject: [Numpy-discussion] Using genfromtxt to import a csv with a string class label and hundreds of integer features
Message-ID: <1430958362815-40319.post@n7.nabble.com>

Hi,

I am trying to use numpy.genfromtxt to import a csv for classification using scikit-learn. The first column in the csv is a string-type class label, while the 200+ remaining columns are integer features. I wish to find out how I can use the genfromtxt function to specify a dtype of string for the first column while specifying an int type for all other columns.
> > I have tried using "dtype=None" as shown below, but when I print > dataset.shape, I get (number_or_rows,) i.e no columns are read in: > dataset = np.genfromtxt(file,delimiter=',', skip_header=True) > > I also tried setting the dtypes as shown in the examples below, but I get > the same error as dtype=None: these dtypes will create structured arrays: http://docs.scipy.org/doc/numpy/user/basics.rec.html so it is expected that the shape is the number of rows, the colums are part of the dtype and can be accessed like a dictionary: In [21]: d = np.ones(3, dtype='S2, int8') In [22]: d Out[22]: array([('1', 1), ('1', 1), ('1', 1)], dtype=[('f0', 'S2'), ('f1', 'i1')]) In [23]: d.shape Out[23]: (3,) In [24]: d.dtype.names Out[24]: ('f0', 'f1') In [25]: d[0] Out[25]: ('1', 1) In [26]: d['f0'] Out[26]: array(['1', '1', '1'], dtype='|S2') In [27]: d['f1'] Out[27]: array([1, 1, 1], dtype=int8) From jaime.frio at gmail.com Sat May 9 13:48:46 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Sat, 9 May 2015 10:48:46 -0700 Subject: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses? Message-ID: There is a reported bug (issue #5837 ) regarding different returns from np.nonzero with 1-D vs higher dimensional arrays. A full summary of the differences can be seen from the following output: >>> class C(np.ndarray): pass ... >>> a = np.arange(6).view(C) >>> b = np.arange(6).reshape(2, 3).view(C) >>> anz = a.nonzero() >>> bnz = b.nonzero() >>> type(anz[0]) >>> anz[0].flags C_CONTIGUOUS : True F_CONTIGUOUS : True OWNDATA : True WRITEABLE : True ALIGNED : True UPDATEIFCOPY : False >>> anz[0].base >>> type(bnz[0]) >>> bnz[0].flags C_CONTIGUOUS : False F_CONTIGUOUS : False OWNDATA : False WRITEABLE : False ALIGNED : True UPDATEIFCOPY : False >>> bnz[0].base array([[0, 1], [0, 2], [1, 0], [1, 1], [1, 2]]) The original bug report was only concerned with the non-writeability of higher dimensional array returns, but there are more differences: 1-D always returns an ndarray that owns its memory and is writeable, but higher dimensional arrays return views, of the type of the original array, that are non-writeable. I have a branch that attempts to fix this by making both 1-D and n-D arrays: 1. return a view, never the base array, 2. return an ndarray, never a subclass, and 3. return a writeable view. I guess the most controversial choice is #2, and in fact making that change breaks a few tests. I nevertheless think that all of the index returning functions (nonzero, argsort, argmin, argmax, argpartition) should always return a bare ndarray, not a subclass. I'd be happy to be corrected, but I can't think of any situation in which preserving the subclass would be needed for these functions. Since we are changing the returns of a few other functions in 1.10 (diagonal, diag, ravel), it may be a good moment to revisit the behavior for these other functions. Any thoughts? Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat May 9 14:42:50 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 9 May 2015 11:42:50 -0700 Subject: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses? 
From jaime.frio at gmail.com Sat May 9 13:48:46 2015
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Sat, 9 May 2015 10:48:46 -0700
Subject: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

There is a reported bug (issue #5837) regarding different returns from np.nonzero with 1-D vs higher dimensional arrays. A full summary of the differences can be seen from the following output:

>>> class C(np.ndarray): pass
...
>>> a = np.arange(6).view(C)
>>> b = np.arange(6).reshape(2, 3).view(C)
>>> anz = a.nonzero()
>>> bnz = b.nonzero()
>>> type(anz[0])
<type 'numpy.ndarray'>
>>> anz[0].flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> anz[0].base
>>> type(bnz[0])
<class '__main__.C'>
>>> bnz[0].flags
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : False
  ALIGNED : True
  UPDATEIFCOPY : False
>>> bnz[0].base
array([[0, 1],
       [0, 2],
       [1, 0],
       [1, 1],
       [1, 2]])

The original bug report was only concerned with the non-writeability of higher dimensional array returns, but there are more differences: 1-D always returns an ndarray that owns its memory and is writeable, but higher dimensional arrays return views, of the type of the original array, that are non-writeable.

I have a branch that attempts to fix this by making both 1-D and n-D arrays:

1. return a view, never the base array,
2. return an ndarray, never a subclass, and
3. return a writeable view.

I guess the most controversial choice is #2, and in fact making that change breaks a few tests. I nevertheless think that all of the index returning functions (nonzero, argsort, argmin, argmax, argpartition) should always return a bare ndarray, not a subclass. I'd be happy to be corrected, but I can't think of any situation in which preserving the subclass would be needed for these functions.

Since we are changing the returns of a few other functions in 1.10 (diagonal, diag, ravel), it may be a good moment to revisit the behavior for these other functions.

Any thoughts?

Jaime

-- (\__/) ( O.o) ( > <) This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.

From njs at pobox.com Sat May 9 14:42:50 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 9 May 2015 11:42:50 -0700
Subject: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

On May 9, 2015 10:48 AM, "Jaime Fernández del Río" wrote:
> There is a reported bug (issue #5837) regarding different returns from np.nonzero with 1-D vs higher dimensional arrays.
> [...]
> I have a branch that attempts to fix this by making both 1-D and n-D arrays:
> return a view, never the base array,

This doesn't matter, does it? "View" isn't a thing, only "view of" is meaningful. And in this case, none of the returned arrays share any memory with any other arrays that the user has access to... so whether they were created as a view or not should be an implementation detail that's transparent to the user?

> return an ndarray, never a subclass, and
> return a writeable view.
> I guess the most controversial choice is #2, and in fact making that change breaks a few tests. I nevertheless think that all of the index returning functions (nonzero, argsort, argmin, argmax, argpartition) should always return a bare ndarray, not a subclass. I'd be happy to be corrected, but I can't think of any situation in which preserving the subclass would be needed for these functions.

I also can't see any logical reason why the return type of these functions has anything to do with the type of the inputs. You can index me with my phone number, but my phone number is not a person. OTOH logic and ndarray subclassing don't have much to do with each other; the practical effect is probably more important. Looking at the subclasses I know about (masked arrays, np.matrix, and astropy quantities), though, I also can't see much benefit in copying the subclass of the input, and the fact that we were never consistent about this suggests that people probably aren't depending on it too much.

So in summary my feeling is: +1 to making them writable, no objection to the view thing (though I don't see how it matters), and provisional +1 to consistently returning ndarray (to be revised if the people who use the subclassing functionality disagree).

-n

From ben.root at ou.edu Sat May 9 15:53:31 2015
From: ben.root at ou.edu (Benjamin Root)
Date: Sat, 9 May 2015 15:53:31 -0400
Subject: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

Absolutely, it should be writable. As for subclassing, that might be messy.
Consider the following:

inds = np.where(data > 5)

In that case, I'd expect a normal, bog-standard ndarray, because that is what you use for indexing (although pandas might have a good argument for having it return one of their special indexing types if "data" was a pandas array...). Next:

foobar = np.where(data > 5, 1, 2)

Again, I'd expect a normal, bog-standard ndarray, because the scalar elements are very simple. This question gets very complicated when considering array arguments. Consider:

merged_data = np.where(data > 5, data, data2)

So, what should "merged_data" be? If both "data" and "data2" are of the same type, then it would be reasonable to return that type, if possible. But what if they aren't the same? Maybe use array_priority to determine the return type? Or perhaps it does make sense to say "sod it all" and always return an ndarray?

I don't know the answer. I do find it interesting that the result from a multi-dimensional array is not writable. I don't know why I have never encountered that.

Ben Root

On Sat, May 9, 2015 at 2:42 PM, Nathaniel Smith wrote:
> [...]
From njs at pobox.com Sat May 9 16:03:07 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 9 May 2015 13:03:07 -0700
Subject: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

On May 9, 2015 12:54 PM, "Benjamin Root" wrote:
>
> Absolutely, it should be writable. As for subclassing, that might be messy. Consider the following:
>
> inds = np.where(data > 5)
>
> In that case, I'd expect a normal, bog-standard ndarray, because that is what you use for indexing (although pandas might have a good argument for having it return one of their special indexing types if "data" was a pandas array...).

Pandas doesn't subclass ndarray (anymore), so they're irrelevant to this particular discussion :-). Of course they're an argument for having a cleaner, more general way of allowing non-ndarray array-like objects, but the legacy subclassing system will never be that.

> Next:
>
> foobar = np.where(data > 5, 1, 2)
>
> [...]
>
> Or perhaps it does make sense to say "sod it all" and always return an ndarray?

Not sure what this has to do with Jaime's post about nonzero? There is indeed a potential question about what 3-argument where() should do with subclasses, but that's effectively a different operation entirely, and to discuss it we'd need to know things like what it has historically done and why that was causing problems.

-n

From njs at pobox.com Sat May 9 16:26:58 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 9 May 2015 13:26:58 -0700
Subject: [Numpy-discussion] Proposed deprecations for 1.10: dot corner cases

Hi all,

I'd like to suggest that we go ahead and add deprecation warnings to the following operations. This doesn't commit us to changing anything on any particular time scale, but it gives us more options later.

1) dot(A, B) where A and B *both* have *3 or more dimensions*: currently, this does a weird "outer product" thing, where it computes all pairwise matrix products.
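To see the two behaviors concretely, a quick shape-level sketch (the @ result is what PEP 465 broadcasting semantics would give, shown as a comment since @ is not yet available):

import numpy as np

A = np.ones((2, 5, 3, 4))
B = np.ones((2, 5, 4, 6))

np.dot(A, B).shape  # (2, 5, 3, 2, 5, 6): all pairwise matrix products
# Under PEP 465, A @ B would instead broadcast over the leading
# dimensions and return shape (2, 5, 3, 6).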
We've had numerous discussions about why this is suboptimal, and it contradicts the PEP 465 semantics for @, which broadcast + vectorize over extra dimensions. (If you have the vectorized version, then the outer product one is easy to derive; if you have only the outer product version, there is no good way to recover the vectorized one.) While dot() is widely used in general, this particular variant is very, very rarely used. I propose we issue a FutureWarning here, so as to lay the groundwork for someday eventually making dot() and @ the same.

2) dot(A, B) where one of the arguments is a scalar: currently, this does scalar multiplication. There is no logically consistent motivation for this, it violates TOOWTDI, and again it is inconsistent with the PEP semantics for @ (which are that this case should be an error). (NB for those still using np.matrix: scalar * np.matrix will still be supported regardless; this would only affect expressions where you actually call the dot() function.) I propose to make this a DeprecationWarning.

-- Nathaniel J. Smith -- http://vorpus.org

From ben.root at ou.edu Sat May 9 16:27:03 2015
From: ben.root at ou.edu (Benjamin Root)
Date: Sat, 9 May 2015 16:27:03 -0400
Subject: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

On Sat, May 9, 2015 at 4:03 PM, Nathaniel Smith wrote:
> Not sure what this has to do with Jaime's post about nonzero? There is
> indeed a potential question about what 3-argument where() should do with
> subclasses, but that's effectively a different operation entirely, and to
> discuss it we'd need to know things like what it has historically done and
> why that was causing problems.

Because my train of thought started at np.nonzero(), which I have always just mentally mapped to np.where(), and then... squirrel!

Indeed, np.where() has no bearing here.

Ben Root

From njs at pobox.com Sat May 9 16:56:24 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Sat, 9 May 2015 13:56:24 -0700
Subject: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?

On Sat, May 9, 2015 at 1:27 PM, Benjamin Root wrote:
> Because my train of thought started at np.nonzero(), which I have always
> just mentally mapped to np.where(), and then... squirrel!
>
> Indeed, np.where() has no bearing here.

Ah, gotcha :-). There is an argument that we should try to reduce this confusion by nudging people to use np.nonzero() consistently instead of np.where(), via the documentation and/or a warning message...

-- Nathaniel J. Smith -- http://vorpus.org
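For readers following along, the equivalence in question: with a single argument, where() returns exactly what nonzero() returns, while the three-argument form is a different operation altogether:

import numpy as np

cond = np.array([[0, 3], [5, 0]]) > 2

np.nonzero(cond)       # (array([0, 1]), array([1, 0]))
np.where(cond)         # same result: 1-argument where is nonzero
np.where(cond, 1, -1)  # 3-argument form: elementwise selection instead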
From shoyer at gmail.com Sat May 9 21:53:42 2015
From: shoyer at gmail.com (Stephan Hoyer)
Date: Sat, 09 May 2015 18:53:42 -0700 (PDT)
Subject: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses?
Message-ID: <1431222821751.71592119@Nodemailer>

With regards to np.where -- shouldn't where be a ufunc, so subclasses or other array-likes can control its behavior with __numpy_ufunc__?

As for the other indexing functions, I don't have a strong opinion about how they should handle subclasses. But it is certainly tricky to attempt to handle arbitrary subclasses. I would agree that the least error-prone thing to do is usually to return base ndarrays. Better to force subclasses to override methods explicitly.

From stefan.otte at gmail.com Sun May 10 06:33:30 2015
From: stefan.otte at gmail.com (Stefan Otte)
Date: Sun, 10 May 2015 10:33:30 +0000
Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab
In-Reply-To: References: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org>

Hey,

Just a quick update. I updated the pull request and renamed `stack` to `block`. Have a look: https://github.com/numpy/numpy/pull/5057

I'm sticking with the simple initial implementation because it's simple and does what you think it does.

Cheers,
Stefan

On Fri, Oct 31, 2014 at 2:13 PM Stefan Otte wrote:
> To make the last point more concrete, the implementation could look
> something like this (note that I didn't test it and that it still
> takes some work):
>
> def bmat(obj, ldict=None, gdict=None):
>     return matrix(stack(obj, ldict, gdict))
>
> > > > > > Best, > > Stefan > > > > > > > > On Tue, Oct 28, 2014 at 7:46 PM, Nathaniel Smith wrote: > >> On 28 Oct 2014 18:34, "Stefan Otte" wrote: > >>> > >>> Hey, > >>> > >>> In the last weeks I tested `np.asarray(np.bmat(....))` as `stack` > >>> function and it works quite well. So the question persits: If `bmat` > >>> already offers something like `stack` should we even bother > >>> implementing `stack`? More code leads to more > >>> bugs and maintenance work. (However, the current implementation is > >>> only 5 lines and by using `bmat` which would reduce that even more.) > >> > >> In the long run we're trying to reduce usage of np.matrix and ideally > >> deprecate it entirely. So yes, providing ndarray equivalents of matrix > >> functionality (like bmat) is valuable. > >> > >> -n > >> > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan.otte at gmail.com Sun May 10 07:40:52 2015 From: stefan.otte at gmail.com (Stefan Otte) Date: Sun, 10 May 2015 11:40:52 +0000 Subject: [Numpy-discussion] Create a n-D grid; meshgrid alternative Message-ID: Hey, quite often I want to evaluate a function on a grid in a n-D space. What I end up doing (and what I really dislike) looks something like this: x = np.linspace(0, 5, 20) M1, M2 = np.meshgrid(x, x) X = np.column_stack([M1.flatten(), M2.flatten()]) X.shape # (400, 2) fancy_function(X) I don't think I ever used `meshgrid` in any other way. Is there a better way to create such a grid space? I wrote myself a little helper function: def gridspace(linspaces): return np.column_stack([space.flatten() for space in np.meshgrid(*linspaces)]) But maybe something like this should be part of numpy? Best, Stefan -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan.otte at gmail.com Sun May 10 10:05:02 2015 From: stefan.otte at gmail.com (Stefan Otte) Date: Sun, 10 May 2015 16:05:02 +0200 Subject: [Numpy-discussion] Create a n-D grid; meshgrid alternative In-Reply-To: References: Message-ID: I just drafted different versions of the `gridspace` function: https://tmp23.tmpnb.org/user/1waoqQ8PJBJ7/notebooks/2015-05%20gridspace.ipynb Beste Gr??e, Stefan On Sun, May 10, 2015 at 1:40 PM, Stefan Otte wrote: > Hey, > > quite often I want to evaluate a function on a grid in a n-D space. > What I end up doing (and what I really dislike) looks something like this: > > x = np.linspace(0, 5, 20) > M1, M2 = np.meshgrid(x, x) > X = np.column_stack([M1.flatten(), M2.flatten()]) > X.shape # (400, 2) > > fancy_function(X) > > I don't think I ever used `meshgrid` in any other way. > Is there a better way to create such a grid space? > > I wrote myself a little helper function: > > def gridspace(linspaces): > return np.column_stack([space.flatten() > for space in np.meshgrid(*linspaces)]) > > But maybe something like this should be part of numpy? 
> >
> Best,
> Stefan
>

From jaime.frio at gmail.com Sun May 10 12:22:51 2015
From: jaime.frio at gmail.com (Jaime Fernández del Río)
Date: Sun, 10 May 2015 09:22:51 -0700
Subject: [Numpy-discussion] Create a n-D grid; meshgrid alternative
In-Reply-To: References: Message-ID: 

On Sun, May 10, 2015 at 7:05 AM, Stefan Otte wrote:

> I just drafted different versions of the `gridspace` function:
>
> https://tmp23.tmpnb.org/user/1waoqQ8PJBJ7/notebooks/2015-05%20gridspace.ipynb

The link seems to be broken...

Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From aymeric.rateau at gmail.com Sun May 10 15:11:29 2015
From: aymeric.rateau at gmail.com (Gmail)
Date: Sun, 10 May 2015 21:11:29 +0200
Subject: [Numpy-discussion] read not byte aligned records
In-Reply-To: References: <5547D3E6.9080400@gmail.com> <20150505072124.aa8746c35d26992bb5f16ec2@esrf.fr> <080ff0d475a9941f5f752078524158b7@ratal.org> Message-ID: <554FAD61.9000809@gmail.com>

For the archive, I tried bitarray instead of bitstring, and parsing of
the same file went from 180 ms to 60 ms. The code ended up shorter and
simpler, but it is less easy to jump into (the documentation is thin).
Performance is still far from fromstring or fromfile, which take about
5 ms for a similar file size when the data is byte aligned.

Aymeric

My code is below:

# assumes: from numpy import recarray, asarray

def readBitarray(self, bita, channelList=None):
    """ reads a stream of record bytes with the bitarray module;
    needed for data that is not byte aligned

    Parameters
    ----------
    bita : bytes
        stream of record bytes
    channelList : list of str, optional

    Returns
    -------
    rec : numpy recarray
        contains a matrix of raw data in a recarray (attributes
        corresponding to channel name)
    """
    from bitarray import bitarray
    B = bitarray(endian="little")  # little endian by default
    B.frombytes(bytes(bita))
    # initialise data structure
    if channelList is None:
        channelList = self.channelNames
    format = []
    for channel in self:
        if channel.name in channelList:
            format.append(channel.RecordFormat)
    buf = recarray(self.numberOfRecords, format)
    # read data
    for chan in range(len(self)):
        if self[chan].name in channelList:
            record_bit_size = self.CGrecordLength * 8
            temp = [B[self[chan].posBitBeg + record_bit_size * i:
                      self[chan].posBitEnd + record_bit_size * i]
                    for i in range(self.numberOfRecords)]
            nbytes = len(temp[0].tobytes())
            if nbytes != self[chan].nBytes and \
                    self[chan].signalDataType not in (6, 7, 8, 9, 10, 11, 12):
                # not Ctype byte length
                byte = 8 * (self[chan].nBytes - nbytes) * bitarray([False])
                for i in range(self.numberOfRecords):
                    # pad with zero bits to match numpy's byte-length
                    # requirement (extend, not append: we add many bits)
                    temp[i].extend(byte)
            temp = [self[chan].CFormat.unpack(temp[i].tobytes())[0]
                    for i in range(self.numberOfRecords)]
            buf[self[chan].name] = asarray(temp)
    return buf

On 05/05/15 15:39, Benjamin Root wrote:
> I have been very happy with the bitarray package. I don't know if it
> is faster than bitstring, but it is worth a mention. Just watch out
> for any hashing operations on its objects, it doesn't seem to do them
> right (set(), dict(), etc...), but comparison operations work just fine.
>
> Ben Root
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
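For fields that occupy a fixed, small span of bytes within each record,
a pure-numpy alternative to the bitarray loop above is to load the raw
bytes and recover the field with vectorized shifts and masks. A minimal
sketch, assuming 3-byte records whose first two little-endian bytes
carry an unsigned 10-bit field in their low bits; the record layout and
the file name "data.bin" are made up for illustration:

import numpy as np

raw = np.fromfile("data.bin", dtype=np.uint8)
rec = raw.reshape(-1, 3)              # one row per 3-byte record
lo = rec[:, 0].astype(np.uint16)
hi = rec[:, 1].astype(np.uint16)
value = (lo | (hi << 8)) & 0x3FF      # keep the low 10 bits

This stays vectorized end to end, which is exactly what the per-record
Python loops in the bitstring/bitarray versions give up; the cost is
that the shift/mask bookkeeping has to be worked out per channel.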
URL: From jaime.frio at gmail.com Sun May 10 17:46:12 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Sun, 10 May 2015 14:46:12 -0700 Subject: [Numpy-discussion] Create a n-D grid; meshgrid alternative In-Reply-To: References: Message-ID: On Sun, May 10, 2015 at 4:40 AM, Stefan Otte wrote: > Hey, > > quite often I want to evaluate a function on a grid in a n-D space. > What I end up doing (and what I really dislike) looks something like this: > > x = np.linspace(0, 5, 20) > M1, M2 = np.meshgrid(x, x) > X = np.column_stack([M1.flatten(), M2.flatten()]) > X.shape # (400, 2) > > fancy_function(X) > > I don't think I ever used `meshgrid` in any other way. > Is there a better way to create such a grid space? > > I wrote myself a little helper function: > > def gridspace(linspaces): > return np.column_stack([space.flatten() > for space in np.meshgrid(*linspaces)]) > > But maybe something like this should be part of numpy? > Isn't what you are trying to build a cartesian product function? There is a neat, efficient implementation of such a function in StackOverflow, by our own pv.: http://stackoverflow.com/questions/1208118/using-numpy-to-build-an-array-of-all-combinations-of-two-arrays/1235363#1235363 Perhaps we could make this part of numpy.lib.arraysetops? Isthere room for other combinatoric generators, i.e. combinations, permutations... as in itertools? Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun May 10 20:44:33 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 10 May 2015 17:44:33 -0700 Subject: [Numpy-discussion] Create a n-D grid; meshgrid alternative In-Reply-To: References: Message-ID: On Sun, May 10, 2015 at 4:40 AM, Stefan Otte wrote: > Hey, > > quite often I want to evaluate a function on a grid in a n-D space. > What I end up doing (and what I really dislike) looks something like this: > > x = np.linspace(0, 5, 20) > M1, M2 = np.meshgrid(x, x) > X = np.column_stack([M1.flatten(), M2.flatten()]) > X.shape # (400, 2) > > fancy_function(X) > > I don't think I ever used `meshgrid` in any other way. > Is there a better way to create such a grid space? I feel like our "house style" has moved away from automatic flattening, and would maybe we should be nudging people towards something more like # using proposed np.stack from pull request #5605 X = np.stack(np.meshgrid(x, x), axis=-1) assert X.shape == (20, 20, 2) fancy_function(X) # vectorized to accept any array with shape (..., 2) -n -- Nathaniel J. Smith -- http://vorpus.org From stefanv at berkeley.edu Sun May 10 21:07:09 2015 From: stefanv at berkeley.edu (Stefan van der Walt) Date: Sun, 10 May 2015 18:07:09 -0700 Subject: [Numpy-discussion] Create a n-D grid; meshgrid alternative In-Reply-To: References: Message-ID: <871tinorb6.fsf@berkeley.edu> On 2015-05-10 14:46:12, Jaime Fern?ndez del R?o wrote: > Isn't what you are trying to build a cartesian product function? 
> There is a neat, efficient implementation of such a function in > StackOverflow, by our own pv.: > > http://stackoverflow.com/questions/1208118/using-numpy-to-build-an-array-of-all-combinations-of-two-arrays/1235363#1235363 And a slightly faster version just down that page ;) St?fan From jeffreback at gmail.com Mon May 11 11:42:11 2015 From: jeffreback at gmail.com (Jeff Reback) Date: Mon, 11 May 2015 11:42:11 -0400 Subject: [Numpy-discussion] ANN: pandas 0.16.1 released Message-ID: Hello, We are proud to announce v0.16.1 of pandas, a minor release from 0.16.0. This release includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. This was a release of 7 weeks with 222 commits by 57 authors encompassing 85 issues. We recommend that all users upgrade to this version. *What is it:* *pandas* is a Python package providing fast, flexible, and expressive data structures designed to make working with ?relational? or ?labeled? data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. Highlights of this release include: - Support for *CategoricalIndex*, a category based index, see here - New section on how-to-contribute to *pandas*, see here - Revised "Merge, join, and concatenate" documentation, including graphical examples to make it easier to understand each operations, see here - New method *sample* for drawing random samples from Series, DataFrames and Panels. See here - The default *Index* printing has changed to a more uniform format, see here - *BusinessHour* datetime-offset is now supported, see here - Further enhancement to the *.str* accessor to make string operations easier, see here See the Whatsnew in v0.16.1 Documentation: http://pandas.pydata.org/pandas-docs/stable/ Source tarballs, windows binaries are available on PyPI: https://pypi.python.org/pypi/pandas windows binaries are courtesy of Christoph Gohlke and are built on Numpy 1.8 macosx wheels are courtesy of Matthew Brett Please report any issues here: https://github.com/pydata/pandas/issues Thanks The Pandas Development Team Contributors to the 0.16.1 release - - Alfonso MHC - Andy Hayden - Artemy Kolchinsky - Chris Gilmer - Chris Grinolds - Dan Birken - David BROCHART - David Hirschfeld - David Stephens - Dr. Leo - Evan Wright - Frans van Dunn? - Hatem Nassrat - Henning Sperr - Hugo Herter - Jan Schulz - Jeff Blackburne - Jeff Reback - Jim Crist - Jonas Abernot - Joris Van den Bossche - Kerby Shedden - Leo Razoumov - Manuel Riel - Mortada Mehyar - Nick Burns - Nick Eubank - Olivier Grisel - Phillip Cloud - Pietro Battiston - Roy Hyunjin Han - Sam Zhang - Scott Sanderson - Stephan Hoyer - Tiago Antao - Tom Ajamian - Tom Augspurger - Tomaz Berisa - Vikram Shirgur - Vladimir Filimonov - William Hogman - Yasin A - Younggun Kim - behzad nouri - dsm054 - floydsoft - flying-sheep - gfr - jnmclarty - jreback - ksanghai - lucas - mschmohl - ptype - rockg - scls19fr - sinhrks -------------- next part -------------- An HTML attachment was scrubbed... 
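One of the headline items in the announcement above, the new sample
method, in action -- a minimal sketch (the frame here is made up; the
calls follow the 0.16.1 feature list):

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": np.arange(10), "b": np.arange(10) * 2.0})

df.sample(n=3)       # three rows drawn at random
df.sample(frac=0.5)  # a random half of the rows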
URL: From alan.isaac at gmail.com Mon May 11 15:43:51 2015 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 11 May 2015 15:43:51 -0400 Subject: [Numpy-discussion] Proposed deprecations for 1.10: dot corner cases In-Reply-To: References: Message-ID: <55510677.40804@gmail.com> On 5/9/2015 4:26 PM, Nathaniel Smith wrote: > dot(A, B) where one of the argument is a scalar: currently, this > does scalar multiplication. There is no logically consistent > motivation for this, it violates TOOWTDI, and again it is inconsistent > with the PEP semantics for @ (which are that this case should be an > error). Do I recall incorrectly: I thought that reconciliation of `@` and `dot` was explicitly not part of the project on getting a `@` operator? I do not mean this to speak for or against the change above, which I only moderately oppose, but rather to the argument offered. As for the "logic" of the current behavior, can it not be given a tensor product motivation? (Otoh, it conflicts with the current behavior of `vdot`.) Alan From njs at pobox.com Mon May 11 15:52:55 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 11 May 2015 12:52:55 -0700 Subject: [Numpy-discussion] Proposed deprecations for 1.10: dot corner cases In-Reply-To: <55510677.40804@gmail.com> References: <55510677.40804@gmail.com> Message-ID: On May 11, 2015 12:44 PM, "Alan G Isaac" wrote: > > On 5/9/2015 4:26 PM, Nathaniel Smith wrote: > > dot(A, B) where one of the argument is a scalar: currently, this > > does scalar multiplication. There is no logically consistent > > motivation for this, it violates TOOWTDI, and again it is inconsistent > > with the PEP semantics for @ (which are that this case should be an > > error). > > Do I recall incorrectly: I thought that reconciliation of `@` and `dot` > was explicitly not part of the project on getting a `@` operator? > > I do not mean this to speak for or against the change above, which I only > moderately oppose, but rather to the argument offered. Not sure what you mean. It's true that PEP 465 doesn't say anything about np.dot, because it's out of scope. The argument here, though, is not "PEP 465 says we have to do this". It's that it's confusing to have two different subtly different sets of semantics, and the PEP semantics are better (that's why we chose them), so we should at a minimum warn people who are getting the old behavior > As for the "logic" of the current behavior, can it not be given a > tensor product motivation? (Otoh, it conflicts with the current > behavior of `vdot`.) Maybe? I don't know of any motivation that doesn't require treating it as a special case added only to duplicate existing behavior, but that doesn't mean one doesnt exist... -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Mon May 11 16:07:51 2015 From: shoyer at gmail.com (Stephan Hoyer) Date: Mon, 11 May 2015 13:07:51 -0700 Subject: [Numpy-discussion] Proposed deprecations for 1.10: dot corner cases In-Reply-To: References: Message-ID: On Sat, May 9, 2015 at 1:26 PM, Nathaniel Smith wrote: > I'd like to suggest that we go ahead and add deprecation warnings to > the following operations. This doesn't commit us to changing anything > on any particular time scale, but it gives us more options later. > These both get a strong +1 from me. How long has the "outer product" behavior for np.dot been around? -------------- next part -------------- An HTML attachment was scrubbed... 
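To make the two sets of semantics in this thread concrete, a small
sketch contrasting legacy np.dot with the PEP 465 broadcasting rules --
np.matmul stands in for @ here, and the sketch assumes a numpy new
enough to provide it:

import numpy as np

A = np.ones((2, 3, 4))
B = np.ones((2, 4, 5))

# legacy dot: an "outer product" over the stacked dimensions --
# every matrix in A is paired with every matrix in B
np.dot(A, B).shape     # (2, 3, 2, 5)

# PEP 465 semantics: broadcast over the leading dimensions --
# matrices are paired up elementwise
np.matmul(A, B).shape  # (2, 3, 5)

# the outer-product variant is recoverable from the broadcasting one
# by inserting the broadcast axes explicitly (up to an axis transpose
# of np.dot's output layout)
np.matmul(A[:, None], B[None]).shape  # (2, 2, 3, 5)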
URL: 

From alan.isaac at gmail.com Mon May 11 17:53:46 2015
From: alan.isaac at gmail.com (Alan G Isaac)
Date: Mon, 11 May 2015 17:53:46 -0400
Subject: [Numpy-discussion] Proposed deprecations for 1.10: dot corner cases
In-Reply-To: References: <55510677.40804@gmail.com> Message-ID: <555124EA.5000505@gmail.com>

On 5/11/2015 3:52 PM, Nathaniel Smith wrote:
> Not sure what you mean. It's true that PEP 465 doesn't say anything about np.dot, because it's out of scope. The argument here, though, is not "PEP
> 465 says we have to do this". It's that it's confusing to have two different subtly different sets of semantics, and the PEP semantics are better
> (that's why we chose them), so we should at a minimum warn people who are getting the old behavior

I would have to dig around, but I am pretty sure there were explicit
statements that `@` would neither be bound by the behavior of `dot`
nor expected to be reconciled with it.

I agree that where `@` and `dot` differ in behavior, this should be
clearly documented.
I would hope that the behavior of `dot` would not change.

Alan

From shoyer at gmail.com Mon May 11 23:13:24 2015
From: shoyer at gmail.com (Stephan Hoyer)
Date: Mon, 11 May 2015 20:13:24 -0700
Subject: [Numpy-discussion] Proposed deprecations for 1.10: dot corner cases
In-Reply-To: <555124EA.5000505@gmail.com> References: <55510677.40804@gmail.com> <555124EA.5000505@gmail.com> Message-ID: 

On Mon, May 11, 2015 at 2:53 PM, Alan G Isaac wrote:

> I agree that where `@` and `dot` differ in behavior, this should be
> clearly documented.
> I would hope that the behavior of `dot` would not change.

Even if np.dot never changes (and indeed, perhaps it should not), issuing
these warnings seems like a good idea to me, once we have @ implemented
with the new behavior (and the @ operator backported from Python <3.5 as a
numpy function). I expect that this warning would serve the useful purpose
of reminding users writing code intended to be used on earlier versions of
numpy/python that @ and np.dot don't work exactly the same way.

As Nathaniel already mentioned, it is quite straightforward to implement
the "outer product" behavior using the new @ behavior, so it will not be
much of a hassle to update code to remove the warning.

Stephan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From stefan.otte at gmail.com Tue May 12 04:17:12 2015
From: stefan.otte at gmail.com (Stefan Otte)
Date: Tue, 12 May 2015 08:17:12 +0000
Subject: [Numpy-discussion] Create a n-D grid; meshgrid alternative
In-Reply-To: References: Message-ID: 

Hello,

indeed I was looking for the cartesian product.

I timed the two stackoverflow answers and the winner is not quite as clear:

n_elements: 10 cartesian 0.00427 cartesian2 0.00172
n_elements: 100 cartesian 0.02758 cartesian2 0.01044
n_elements: 1000 cartesian 0.97628 cartesian2 1.12145
n_elements: 5000 cartesian 17.14133 cartesian2 31.12241

(This is for two arrays as parameters: np.linspace(0, 1, n_elements))
cartesian2 seems to be slower for bigger inputs.

I'd really appreciate it if this were part of numpy. Should I create a
pull request?

Regarding combinations and permutations: they could be convenient to have
as well.

Cheers,
Stefan
-------------- next part --------------
An HTML attachment was scrubbed...
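For readers without the StackOverflow page at hand, a compact reference
implementation of the behaviour being timed above -- this shows just the
semantics, not one of the optimized variants:

import numpy as np

def cartesian(arrays):
    # cartesian product of 1-D arrays: one combination per row
    arrays = [np.asarray(a) for a in arrays]
    grids = np.meshgrid(*arrays, indexing='ij')
    return np.column_stack([g.ravel() for g in grids])

cartesian([np.array([1, 2, 3]), np.array([4, 5])])
# array([[1, 4],
#        [1, 5],
#        [2, 4],
#        [2, 5],
#        [3, 4],
#        [3, 5]])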
URL: From johannes.kulick at ipvs.uni-stuttgart.de Tue May 12 04:57:28 2015 From: johannes.kulick at ipvs.uni-stuttgart.de (Johannes Kulick) Date: Tue, 12 May 2015 10:57:28 +0200 Subject: [Numpy-discussion] Create a n-D grid; meshgrid alternative In-Reply-To: References: Message-ID: <20150512085728.24178.24849@quirm.robotics.tu-berlin.de> I'm totally in favor of the 'gridspace(linspaces)' version, as you probably end up wanting to create grids of other things than linspaces (e.g. a logspace grid, or a grid of random points etc.). It should be called somewhat different though. Maybe 'cartesian(arrays)'? Best, Johannes Quoting Stefan Otte (2015-05-10 16:05:02) > I just drafted different versions of the `gridspace` function: > https://tmp23.tmpnb.org/user/1waoqQ8PJBJ7/notebooks/2015-05%20gridspace.ipynb > > > Beste Gr??e, > Stefan > > > > On Sun, May 10, 2015 at 1:40 PM, Stefan Otte wrote: > > Hey, > > > > quite often I want to evaluate a function on a grid in a n-D space. > > What I end up doing (and what I really dislike) looks something like this: > > > > x = np.linspace(0, 5, 20) > > M1, M2 = np.meshgrid(x, x) > > X = np.column_stack([M1.flatten(), M2.flatten()]) > > X.shape # (400, 2) > > > > fancy_function(X) > > > > I don't think I ever used `meshgrid` in any other way. > > Is there a better way to create such a grid space? > > > > I wrote myself a little helper function: > > > > def gridspace(linspaces): > > return np.column_stack([space.flatten() > > for space in np.meshgrid(*linspaces)]) > > > > But maybe something like this should be part of numpy? > > > > > > Best, > > Stefan > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Question: What is the weird attachment to all my emails? Answer: http://en.wikipedia.org/wiki/Digital_signature From jaime.frio at gmail.com Tue May 12 08:01:26 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Tue, 12 May 2015 05:01:26 -0700 Subject: [Numpy-discussion] Create a n-D grid; meshgrid alternative In-Reply-To: References: Message-ID: On Tue, May 12, 2015 at 1:17 AM, Stefan Otte wrote: > Hello, > > indeed I was looking for the cartesian product. > > I timed the two stackoverflow answers and the winner is not quite as clear: > > n_elements: 10 cartesian 0.00427 cartesian2 0.00172 > n_elements: 100 cartesian 0.02758 cartesian2 0.01044 > n_elements: 1000 cartesian 0.97628 cartesian2 1.12145 > n_elements: 5000 cartesian 17.14133 cartesian2 31.12241 > > (This is for two arrays as parameters: np.linspace(0, 1, n_elements)) > cartesian2 seems to be slower for bigger. > On my system, the following variation on Pauli's answer is 2-4x faster than his for your test cases: def cartesian4(arrays, out=None): arrays = [np.asarray(x).ravel() for x in arrays] dtype = np.result_type(*arrays) n = np.prod([arr.size for arr in arrays]) if out is None: out = np.empty((len(arrays), n), dtype=dtype) else: out = out.T for j, arr in enumerate(arrays): n /= arr.size out.shape = (len(arrays), -1, arr.size, n) out[j] = arr[np.newaxis, :, np.newaxis] out.shape = (len(arrays), -1) return out.T > I'd really appreciate if this was be part of numpy. Should I create a pull > request? > There hasn't been any opposition, quite the contrary, so yes, I would go ahead an create that PR. I somehow feel this belongs with the set operations, rather than with the indexing ones. Other thoughts? 
Also for consideration: should it work on flattened arrays? or should we give it an axis argument, and then "broadcast on the rest", a la generalized ufunc? Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan.otte at gmail.com Tue May 12 09:29:10 2015 From: stefan.otte at gmail.com (Stefan Otte) Date: Tue, 12 May 2015 13:29:10 +0000 Subject: [Numpy-discussion] Create a n-D grid; meshgrid alternative In-Reply-To: References: Message-ID: Hey, here is an ipython notebook with benchmarks of all implementations (scroll to the bottom for plots): https://github.com/sotte/ipynb_snippets/blob/master/2015-05%20gridspace%20-%20cartesian.ipynb Overall, Jaime's version is the fastest. On Tue, May 12, 2015 at 2:01 PM Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Tue, May 12, 2015 at 1:17 AM, Stefan Otte > wrote: > >> Hello, >> >> indeed I was looking for the cartesian product. >> >> I timed the two stackoverflow answers and the winner is not quite as >> clear: >> >> n_elements: 10 cartesian 0.00427 cartesian2 0.00172 >> n_elements: 100 cartesian 0.02758 cartesian2 0.01044 >> n_elements: 1000 cartesian 0.97628 cartesian2 1.12145 >> n_elements: 5000 cartesian 17.14133 cartesian2 31.12241 >> >> (This is for two arrays as parameters: np.linspace(0, 1, n_elements)) >> cartesian2 seems to be slower for bigger. >> > > On my system, the following variation on Pauli's answer is 2-4x faster > than his for your test cases: > > def cartesian4(arrays, out=None): > arrays = [np.asarray(x).ravel() for x in arrays] > dtype = np.result_type(*arrays) > > n = np.prod([arr.size for arr in arrays]) > if out is None: > out = np.empty((len(arrays), n), dtype=dtype) > else: > out = out.T > > for j, arr in enumerate(arrays): > n /= arr.size > out.shape = (len(arrays), -1, arr.size, n) > out[j] = arr[np.newaxis, :, np.newaxis] > out.shape = (len(arrays), -1) > > return out.T > > >> I'd really appreciate if this was be part of numpy. Should I create a >> pull request? >> > > There hasn't been any opposition, quite the contrary, so yes, I would go > ahead an create that PR. I somehow feel this belongs with the set > operations, rather than with the indexing ones. Other thoughts? > > Also for consideration: should it work on flattened arrays? or should we > give it an axis argument, and then "broadcast on the rest", a la > generalized ufunc? > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From m.h.vankerkwijk at gmail.com Tue May 12 11:49:07 2015 From: m.h.vankerkwijk at gmail.com (Marten van Kerkwijk) Date: Tue, 12 May 2015 11:49:07 -0400 Subject: [Numpy-discussion] Bug in np.nonzero / Should index returning functions return ndarray subclasses? In-Reply-To: <1431222821751.71592119@Nodemailer> References: <1431222821751.71592119@Nodemailer> Message-ID: Agreed that indexing functions should return bare `ndarray`. Note that in Jaime's PR one can override it anyway by defining __nonzero__. 
-- Marten

On Sat, May 9, 2015 at 9:53 PM, Stephan Hoyer wrote:

> With regards to np.where -- shouldn't where be a ufunc, so subclasses or
> other array-likes can control its behavior with __numpy_ufunc__?
>
> As for the other indexing functions, I don't have a strong opinion about
> how they should handle subclasses. But it is certainly tricky to attempt
> to handle arbitrary subclasses. I would agree that the least error
> prone thing to do is usually to return base ndarrays. Better to force
> subclasses to override methods explicitly.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ellisonbg at gmail.com Tue May 12 12:20:15 2015
From: ellisonbg at gmail.com (Brian Granger)
Date: Tue, 12 May 2015 09:20:15 -0700
Subject: [Numpy-discussion] [JOB] Work full time on Project Jupyter/IPython
Message-ID: 

Hi all,

I wanted to let the community know that we are currently hiring 3 full
time software engineers to work full time on Project Jupyter/IPython.
These positions will be in my group at Cal Poly in San Luis Obispo, CA. We
are looking for frontend and backend software engineers with lots of
Python/JavaScript experience and a passion for open source software. The
details can be found here:

https://www.calpolycorporationjobs.org/postings/736

This is an unusual opportunity in a couple of respects:

* These positions will allow you to work on open source software full time
- not as a X% side project (aka weekends and evenings).
* These are fully benefited positions (CA state retirement, health care,
etc.)
* You will get to work and live in San Luis Obispo, one of the nicest
places on earth. We are minutes from the beach, have perfect year-round
weather and are close to both the Bay Area and So Cal.

I am more than willing to talk to anyone who is interested in these
positions.

Cheers,

Brian

--
Brian E. Granger
Cal Poly State University, San Luis Obispo
@ellisonbg on Twitter and GitHub
bgranger at calpoly.edu and ellisonbg at gmail.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ndbecker2 at gmail.com Tue May 12 14:18:40 2015
From: ndbecker2 at gmail.com (Neal Becker)
Date: Tue, 12 May 2015 14:18:40 -0400
Subject: [Numpy-discussion] python is cool
Message-ID: 

In order to make sure all my random number generators have good
independence, it is a good practice to use a single shared instance
(because it is already known to have good properties). A less-desirable
alternative is to use rng's seeded with different starting states - in
this case the independence properties are not generally known.

So I have some fairly deeply nested data structures (classes) that
somewhere contain a reference to a RandomState object.

I need to be able to clone these data structures, producing new independent
copies, but I want the RandomState part to be the shared, singleton rs
object.

In python, no problem:

---
from numpy.random import RandomState

class shared_random_state (RandomState):
    def __init__ (self, seed):
        RandomState.__init__(self, seed)

    def __deepcopy__ (self, memo):
        return self  # always hand back the shared instance
---

Now I can copy.deepcopy the data structures, but the RandomState part is
shared. I just use

rs = shared_random_state(0)

and provide this rs to all my other objects. Pretty nice!
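A quick check of the sharing behaviour described above -- a sketch in
which Simulation is a made-up stand-in for the nested data structures:

import copy
from numpy.random import RandomState

class shared_random_state(RandomState):
    def __deepcopy__(self, memo):
        return self  # deepcopy hands the singleton back unchanged

class Simulation(object):
    def __init__(self, rs):
        self.rs = rs
        self.params = [1.0, 2.0]

rs = shared_random_state(0)
sim1 = Simulation(rs)
sim2 = copy.deepcopy(sim1)

assert sim2.params is not sim1.params  # nested data really is copied
assert sim2.rs is sim1.rs              # ...but the RandomState is shared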
-- Those who fail to understand recursion are doomed to repeat it

From ocp at gatech.edu Tue May 12 14:41:33 2015
From: ocp at gatech.edu (Pierson, Oliver C)
Date: Tue, 12 May 2015 18:41:33 +0000
Subject: [Numpy-discussion] Integral Equation Solver
Message-ID: <1431456093009.7894@gatech.edu>

Hi All,

A while back I wrote some code to solve Volterra integral equations
(integral equations where one of the integration bounds is a variable).
The code is available on Github (https://github.com/oliverpierson/volterra).
Just curious if there'd be any interest in adding this to Numpy? I still
have some work to do on the code. However, before I invest too much time,
I was trying to get a feel for the interest in this functionality.

Please let me know if you have any questions.

Thanks,
Oliver
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From roland at utk.edu Tue May 12 14:56:02 2015
From: roland at utk.edu (Roland Schulz)
Date: Tue, 12 May 2015 14:56:02 -0400
Subject: [Numpy-discussion] python is cool
In-Reply-To: References: Message-ID: 

Hi,

I think the best way to solve this issue is to not use a state at all. It
is fast, reproducible even in parallel (if wanted), and doesn't suffer
from the shared-state issue. It would be nice if numpy provided such a
stateless RNG, as implemented in Random123:
www.deshawresearch.com/resources_random123.html

Roland

On Tue, May 12, 2015 at 2:18 PM, Neal Becker wrote:

> In order to make sure all my random number generators have good
> independence, it is a good practice to use a single shared instance
> (because it is already known to have good properties). A less-desirable
> alternative is to use rng's seeded with different starting states - in
> this case the independence properties are not generally known.
>
> So I have some fairly deeply nested data structures (classes) that
> somewhere contain a reference to a RandomState object.
>
> I need to be able to clone these data structures, producing new independent
> copies, but I want the RandomState part to be the shared, singleton rs
> object.
>
> In python, no problem:
>
> ---
> from numpy.random import RandomState
>
> class shared_random_state (RandomState):
>     def __init__ (self, seed):
>         RandomState.__init__(self, seed)
>
>     def __deepcopy__ (self, memo):
>         return self
> ---
>
> Now I can copy.deepcopy the data structures, but the RandomState part is
> shared. I just use
>
> rs = shared_random_state(0)
>
> and provide this rs to all my other objects. Pretty nice!
>
> --
> Those who fail to understand recursion are doomed to repeat it
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov
865-241-1537, ORNL PO BOX 2008 MS6309
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ndbecker2 at gmail.com Tue May 12 15:00:42 2015
From: ndbecker2 at gmail.com (Neal Becker)
Date: Tue, 12 May 2015 15:00:42 -0400
Subject: [Numpy-discussion] python is cool
References: Message-ID: 

Roland Schulz wrote:

> Hi,
>
> I think the best way to solve this issue is to not use a state at all. It
> is fast, reproducible even in parallel (if wanted), and doesn't suffer
> from the shared-state issue. It would be nice if numpy provided such a
> stateless RNG, as implemented in Random123:
> www.deshawresearch.com/resources_random123.html
>
> Roland

That is interesting.
I think np.random needs to be refactored, so it can accept a pluggable rng - then we could switch the underlying rng. From charlesr.harris at gmail.com Tue May 12 15:34:58 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 12 May 2015 13:34:58 -0600 Subject: [Numpy-discussion] Integral Equation Solver In-Reply-To: <1431456093009.7894@gatech.edu> References: <1431456093009.7894@gatech.edu> Message-ID: On Tue, May 12, 2015 at 12:41 PM, Pierson, Oliver C wrote: > Hi All, > > Awhile back I had written some code to solve Volterra integral equations > (integral equations where one of the integration bounds is a variable). > The code is available on Github (https://github.com/oliverpierson/volterra). > Just curious if there'd be any interest in adding this to Numpy? I still > have some work to do on the code. However, before I invest too much time, > I was trying to get a feel for the interest in this functionality. > Could be useful. The best place for something like this would be scipy ( scipy-dev at scipy.org).. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Tue May 12 17:54:55 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 12 May 2015 23:54:55 +0200 Subject: [Numpy-discussion] ANN: Scipy 0.16.0 beta 1 release Message-ID: Hi all, I'm pleased to announce the availability of the first beta release of Scipy 0.16.0. Please try this beta and report any issues on the Github issue tracker or on the scipy-dev mailing list. This first beta is a source-only release; binary installers will follow (probably next week). Source tarballs and the full release notes can be found at https://sourceforge.net/projects/scipy/files/scipy/0.16.0b1/. Part of the release notes copied below. Thanks to everyone who contributed to this release! Ralf ========================== SciPy 0.16.0 Release Notes ========================== .. note:: Scipy 0.16.0 is not released yet! SciPy 0.16.0 is the culmination of 6 months of hard work. It contains many new features, numerous bug-fixes, improved test coverage and better documentation. There have been a number of deprecations and API changes in this release, which are documented below. All users are encouraged to upgrade to this release, as there are a large number of bug-fixes and optimizations. Moreover, our development attention will now shift to bug-fix releases on the 0.15.x branch, and on adding new features on the master branch. This release requires Python 2.6, 2.7 or 3.2-3.4 and NumPy 1.6.2 or greater. Highlights of this release include: - A Cython API for BLAS/LAPACK in `scipy.linalg` - A new benchmark suite. It's now straightforward to add new benchmarks, and they're routinely included with performance enhancement PRs. - Support for the second order sections (SOS) format in `scipy.signal`. New features ============ Benchmark suite --------------- The benchmark suite has switched to using `Airspeed Velocity `__ for benchmarking. You can run the suite locally via ``python runtests.py --bench``. For more details, see ``benchmarks/README.rst``. `scipy.linalg` improvements --------------------------- A full set of Cython wrappers for BLAS and LAPACK has been added in the modules `scipy.linalg.cython_blas` and `scipy.linalg.cython_lapack`. In Cython, these wrappers can now be cimported from their corresponding modules and used without linking directly against BLAS or LAPACK. 
The functions `scipy.linalg.qr_delete`, `scipy.linalg.qr_insert` and `scipy.linalg.qr_update` for updating QR decompositions were added. The function `scipy.linalg.solve_circulant` solves a linear system with a circulant coefficient matrix. The function `scipy.linalg.invpascal` computes the inverse of a Pascal matrix. The function `scipy.linalg.solve_toeplitz`, a Levinson-Durbin Toeplitz solver, was added. Added wrapper for potentially useful LAPACK function ``*lasd4``. It computes the square root of the i-th updated eigenvalue of a positive symmetric rank-one modification to a positive diagonal matrix. See its LAPACK documentation and unit tests for it to get more info. Added two extra wrappers for LAPACK least-square solvers. Namely, they are ``*gelsd`` and ``*gelsy``. Wrappers for the LAPACK ``*lange`` functions, which calculate various matrix norms, were added. Wrappers for ``*gtsv`` and ``*ptsv``, which solve ``A*X = B`` for tri-diagonal matrix ``A``, were added. `scipy.signal` improvements --------------------------- Support for second order sections (SOS) as a format for IIR filters was added. The new functions are: * `scipy.signal.sosfilt` * `scipy.signal.sosfilt_zi`, * `scipy.signal.sos2tf` * `scipy.signal.sos2zpk` * `scipy.signal.tf2sos` * `scipy.signal.zpk2sos`. Additionally, the filter design functions `iirdesign`, `iirfilter`, `butter`, `cheby1`, `cheby2`, `ellip`, and `bessel` can return the filter in the SOS format. The function `scipy.signal.place_poles`, which provides two methods to place poles for linear systems, was added. The option to use Gustafsson's method for choosing the initial conditions of the forward and backward passes was added to `scipy.signal.filtfilt`. New classes ``TransferFunction``, ``StateSpace`` and ``ZerosPolesGain`` were added. These classes are now returned when instantiating `scipy.signal.lti`. Conversion between those classes can be done explicitly now. An exponential (Poisson) window was added as `scipy.signal.exponential`, and a Tukey window was added as `scipy.signal.tukey`. The function for computing digital filter group delay was added as `scipy.signal.group_delay`. The functionality for spectral analysis and spectral density estimation has been significantly improved: `scipy.signal.welch` became ~8x faster and the functions `scipy.signal.spectrogram`, `scipy.signal.coherence` and `scipy.signal.csd` (cross-spectral density) were added. `scipy.signal.lsim` was rewritten - all known issues are fixed, so this function can now be used instead of ``lsim2``; ``lsim`` is orders of magnitude faster than ``lsim2`` in most cases. `scipy.sparse` improvements --------------------------- The function `scipy.sparse.norm`, which computes sparse matrix norms, was added. The function `scipy.sparse.random`, which allows to draw random variates from an arbitrary distribution, was added. `scipy.spatial` improvements ---------------------------- `scipy.spatial.cKDTree` has seen a major rewrite, which improved the performance of the ``query`` method significantly, added support for parallel queries, pickling, and options that affect the tree layout. See pull request 4374 for more details. The function `scipy.spatial.procrustes` for Procrustes analysis (statistical shape analysis) was added. `scipy.stats` improvements -------------------------- The Wishart distribution and its inverse have been added, as `scipy.stats.wishart` and `scipy.stats.invwishart`. The Exponentially Modified Normal distribution has been added as `scipy.stats.exponnorm`. 
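As a concrete taste of the second order sections support described in
the `scipy.signal` section above (a sketch; it assumes the 0.16
interfaces exactly as announced):

import numpy as np
from scipy import signal

# design an order-8 lowpass Butterworth filter directly in SOS form
# (new output option in 0.16), then apply it to some noise
sos = signal.butter(8, 0.125, output='sos')
x = np.random.randn(1000)
y = signal.sosfilt(sos, x)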
The Generalized Normal distribution has been added as `scipy.stats.gennorm`. All distributions now contain a ``random_state`` property and allow specifying a specific ``numpy.random.RandomState`` random number generator when generating random variates. Many statistical tests and other `scipy.stats` functions that have multiple return values now return ``namedtuples``. See pull request 4709 for details. `scipy.optimize` improvements ----------------------------- A new derivative-free method DF-SANE has been added to the nonlinear equation system solving function `scipy.optimize.root`. Deprecated features =================== ``scipy.stats.pdf_fromgamma`` is deprecated. This function was undocumented, untested and rarely used. Statsmodels provides equivalent functionality with ``statsmodels.distributions.ExpandedNormal``. ``scipy.stats.fastsort`` is deprecated. This function is unnecessary, ``numpy.argsort`` can be used instead. ``scipy.stats.signaltonoise`` and ``scipy.stats.mstats.signaltonoise`` are deprecated. These functions did not belong in ``scipy.stats`` and are rarely used. See issue #609 for details. ``scipy.stats.histogram2`` is deprecated. This function is unnecessary, ``numpy.histogram2d`` can be used instead. Backwards incompatible changes ============================== The deprecated global optimizer ``scipy.optimize.anneal`` was removed. The following deprecated modules have been removed: ``scipy.lib.blas``, ``scipy.lib.lapack``, ``scipy.linalg.cblas``, ``scipy.linalg.fblas``, ``scipy.linalg.clapack``, ``scipy.linalg.flapack``. They had been deprecated since Scipy 0.12.0, the functionality should be accessed as `scipy.linalg.blas` and `scipy.linalg.lapack`. The deprecated function ``scipy.special.all_mat`` has been removed. The deprecated functions ``fprob``, ``ksprob``, ``zprob``, ``randwcdf`` and ``randwppf`` have been removed from `scipy.stats`. Other changes ============= The version numbering for development builds has been updated to comply with PEP 440. Building with ``python setup.py develop`` is now supported. Authors ======= * @axiru + * @endolith * Elliott Sales de Andrade + * Anne Archibald * Yoshiki V?zquez Baeza + * Sylvain Bellemare * Felix Berkenkamp + * Raoul Bourquin + * Matthew Brett * Per Brodtkorb * Christian Brueffer * Lars Buitinck * Evgeni Burovski * Steven Byrnes * CJ Carey * George Castillo + * Alex Conley + * Liam Damewood + * Rupak Das + * Abraham Escalante + * Matthias Feurer + * Eric Firing + * Clark Fitzgerald * Chad Fulton * Andr? Gaul * Andreea Georgescu + * Christoph Gohlke * Andrey Golovizin + * Ralf Gommers * J.J. Green + * Alex Griffing * Alexander Grigorievskiy + * Hans Moritz Gunther + * Jonas Hahnfeld + * Charles Harris * Ian Henriksen * Andreas Hilboll * ?smund Hjulstad + * Jan Schl?ter + * Janko Slavi? + * Daniel Jensen + * Johannes Ball? + * Terry Jones + * Amato Kasahara + * Eric Larson * Denis Laxalde * Antony Lee * Gregory R. 
Lee * Perry Lee + * Lo?c Est?ve * Martin Manns + * Eric Martin + * Mat?j Koci?n + * Andreas Mayer + * Nikolay Mayorov + * Robert McGibbon + * Sturla Molden * Nicola Montecchio + * Eric Moore * Jamie Morton + * Nikolas Moya + * Maniteja Nandana + * Andrew Nelson * Joel Nothman * Aldrian Obaja * Regina Ongowarsito + * Paul Ortyl + * Pedro L?pez-Adeva Fern?ndez-Layos + * Stefan Peterson + * Irvin Probst + * Eric Quintero + * John David Reaver + * Juha Remes + * Thomas Robitaille * Clancy Rowley + * Tobias Schmidt + * Skipper Seabold * Aman Singh + * Eric Soroos * Valentine Svensson + * Julian Taylor * Aman Thakral + * Helmut Toplitzer + * Fukumu Tsutsumi + * Anastasiia Tsyplia + * Jacob Vanderplas * Pauli Virtanen * Matteo Visconti + * Warren Weckesser * Florian Wilhelm + * Nathan Woods * Haochen Wu + * Daan Wynen + A total of 93 people contributed to this release. People with a "+" by their names contributed a patch for the first time. This list of names is automatically generated, and may not be fully complete. -------------- next part -------------- An HTML attachment was scrubbed... URL: From vincent at vincentdavis.net Wed May 13 17:14:39 2015 From: vincent at vincentdavis.net (Vincent Davis) Date: Wed, 13 May 2015 15:14:39 -0600 Subject: [Numpy-discussion] Help loading data into pandas Message-ID: ?I have a large (~400mb) csv file I am trying to open in Pandas. When I don't specify the dtype and open it with the following command It appears to work. df = pd.io.parsers.read_csv(CSVFILECLEAN2013, quotechar='"', low_memory=False, na_values='') If I try to specify the dtype for each field I get an error but no hint as to where I should look. I have "cleaned" the csv by checking that all values that should be an int for a float are either blank or can be cast as a float or a int. I guess my question is, can I get a more useful error message or is there a hint as to where the problem is that I am not seeing. 
Exception Traceback (most recent call last) in () 3 import load_data 4 import numpy as np ----> 5 df2 = load_data.load('jeffco_2013') /Users/vmd/GitHub/Jeffco-Properties/tools/load_data.py in load(data) 47 def load(data): 48 files = dict(jeffco_2013 = '/Users/vmd/GitHub/Jeffco-Properties/Data/JeffersonCo/Datasets/2013_clean_Jeffco_ATSDTA_ATSP600.csv') ---> 49 return pd.io.parsers.read_csv(files[data], quotechar='"', low_memory=False, na_values='', dtype=DATASHAPE) /Users/vmd/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, na_fvalues, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines) 468 skip_blank_lines=skip_blank_lines) 469 --> 470 return _read(filepath_or_buffer, kwds) 471 472 parser_f.__name__ = name /Users/vmd/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds) 254 return parser 255 --> 256 return parser.read() 257 258 _parser_defaults = { /Users/vmd/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py in read(self, nrows) 713 raise ValueError('skip_footer not supported for iteration') 714 --> 715 ret = self._engine.read(nrows) 716 717 if self.options.get('as_recarray'): /Users/vmd/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py in read(self, nrows) 1162 1163 try: -> 1164 data = self._reader.read(nrows) 1165 except StopIteration: 1166 if nrows is None: pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:7426)() pandas/parser.pyx in pandas.parser.TextReader._read_rows (pandas/parser.c:8484)() pandas/parser.pyx in pandas.parser.TextReader._convert_column_data (pandas/parser.c:9795)() pandas/parser.pyx in pandas.parser.TextReader._convert_tokens (pandas/parser.c:10403)() pandas/parser.pyx in pandas.parser.TextReader._convert_with_dtype (pandas/parser.c:11257)() Exception: Integer column has NA values Vincent Davis 720-301-3003 -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed May 13 17:27:39 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 13 May 2015 14:27:39 -0700 Subject: [Numpy-discussion] Help loading data into pandas In-Reply-To: References: Message-ID: I don't think pandas allows blank values in integer columns? You might get better results asking on the pandas list, though -- see http://pandas.pydata.org/community.html -n On May 13, 2015 2:17 PM, "Vincent Davis" wrote: > ?I have a large (~400mb) csv file I am trying to open in Pandas. When I > don't specify the dtype and open it with the following command It appears > to work. > > df = pd.io.parsers.read_csv(CSVFILECLEAN2013, quotechar='"', > low_memory=False, na_values='') > > If I try to specify the dtype for each field I get an error but no hint as > to where I should look. I have "cleaned" the csv by checking that all > values that should be an int for a float are either blank or can be cast as > a float or a int. 
I guess my question is, can I get a more useful error > message or is there a hint as to where the problem is that I am not seeing. > > Exception Traceback (most recent call last) > in () > 3 import load_data > 4 import numpy as np > ----> 5 df2 = load_data.load('jeffco_2013') > > /Users/vmd/GitHub/Jeffco-Properties/tools/load_data.py in load(data) > 47 def load(data): > 48 files = dict(jeffco_2013 = > '/Users/vmd/GitHub/Jeffco-Properties/Data/JeffersonCo/Datasets/2013_clean_Jeffco_ATSDTA_ATSP600.csv') > ---> 49 return pd.io.parsers.read_csv(files[data], quotechar='"', > low_memory=False, na_values='', dtype=DATASHAPE) > > /Users/vmd/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py > in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, > escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, > index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, > na_fvalues, true_values, false_values, delimiter, converters, dtype, > usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, > use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, > keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, > dayfirst, date_parser, memory_map, float_precision, nrows, iterator, > chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, > infer_datetime_format, skip_blank_lines) > 468 skip_blank_lines=skip_blank_lines) > 469 > --> 470 return _read(filepath_or_buffer, kwds) > 471 > 472 parser_f.__name__ = name > > /Users/vmd/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py > in _read(filepath_or_buffer, kwds) > 254 return parser > 255 > --> 256 return parser.read() > 257 > 258 _parser_defaults = { > > /Users/vmd/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py > in read(self, nrows) > 713 raise ValueError('skip_footer not supported for > iteration') > 714 > --> 715 ret = self._engine.read(nrows) > 716 > 717 if self.options.get('as_recarray'): > > /Users/vmd/anaconda/envs/py34/lib/python3.4/site-packages/pandas/io/parsers.py > in read(self, nrows) > 1162 > 1163 try: > -> 1164 data = self._reader.read(nrows) > 1165 except StopIteration: > 1166 if nrows is None: > > pandas/parser.pyx in pandas.parser.TextReader.read (pandas/parser.c:7426)() > > pandas/parser.pyx in pandas.parser.TextReader._read_rows > (pandas/parser.c:8484)() > > pandas/parser.pyx in pandas.parser.TextReader._convert_column_data > (pandas/parser.c:9795)() > > pandas/parser.pyx in pandas.parser.TextReader._convert_tokens > (pandas/parser.c:10403)() > > pandas/parser.pyx in pandas.parser.TextReader._convert_with_dtype > (pandas/parser.c:11257)() > > Exception: Integer column has NA values > > > > > > Vincent Davis > 720-301-3003 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgodshall at enthought.com Wed May 13 20:16:21 2015 From: cgodshall at enthought.com (Courtenay Godshall (Enthought)) Date: Wed, 13 May 2015 19:16:21 -0500 Subject: [Numpy-discussion] ANN: SciPy 2015 Talk & Poster Selections Announced Today, Early Bird Deadline 5/22 Message-ID: <008e01d08ddb$35f11290$a1d337b0$@enthought.com> The talks & posters for the 2015 SciPy Conference were announced today: http://scipy2015.scipy.org/ehome/115969/292868/? &. 
Early bird registration deadline was extended (final) to 5/22 - hope we'll see you this year! -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed May 13 20:23:10 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 13 May 2015 17:23:10 -0700 Subject: [Numpy-discussion] ANN: NumPy Developer Meeting: July 7th @ SciPy 2015 in Austin Message-ID: Hi all, I wanted to announce that the numpy core team will be organizing a whole-day face-to-face developer meeting on July 7 this year at the SciPy conference in Austin, TX. (This is the second day of the tutorials and the day before the conference proper starts.) This will be a working meeting to discuss and address numpy-related issues, particularly ones that are too big to fit in a github issue, like governance and release management and where we want to be in five years. (We'll be talking more between now and then about more detailed logistics and agenda and so forth, but I wanted to get this out now.) If you're reading this and interested in these issues, then you're invited :-). -n -- Nathaniel J. Smith -- http://vorpus.org From vincent at vincentdavis.net Wed May 13 22:30:28 2015 From: vincent at vincentdavis.net (Vincent Davis) Date: Wed, 13 May 2015 20:30:28 -0600 Subject: [Numpy-discussion] Help loading data into pandas In-Reply-To: References: Message-ID: On Wed, May 13, 2015 at 3:27 PM, Nathaniel Smith wrote: > I don't think pandas allows blank values in integer columns? You might get > better results asking on the pandas list, though -- see > http://pandas.pydata.org/community.html > ?"integer columns" seems to be the key. they have to be float. Thanks? Vincent Davis -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Thu May 14 01:47:08 2015 From: ellisonbg at gmail.com (Brian Granger) Date: Wed, 13 May 2015 22:47:08 -0700 Subject: [Numpy-discussion] ANN: NumPy Developer Meeting: July 7th @ SciPy 2015 in Austin In-Reply-To: References: Message-ID: Great! On Wed, May 13, 2015 at 5:23 PM, Nathaniel Smith wrote: > Hi all, > > I wanted to announce that the numpy core team will be organizing a > whole-day face-to-face developer meeting on July 7 this year at the > SciPy conference in Austin, TX. (This is the second day of the > tutorials and the day before the conference proper starts.) This will > be a working meeting to discuss and address numpy-related issues, > particularly ones that are too big to fit in a github issue, like > governance and release management and where we want to be in five > years. (We'll be talking more between now and then about more detailed > logistics and agenda and so forth, but I wanted to get this out now.) > > If you're reading this and interested in these issues, then you're invited > :-). > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Brian E. Granger Cal Poly State University, San Luis Obispo @ellisonbg on Twitter and GitHub bgranger at calpoly.edu and ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
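Returning to the read_csv thread above, a minimal sketch of the
workaround Vincent arrived at -- the column names and data are made up,
and the error text matches the traceback earlier in the thread:

import pandas as pd
from io import StringIO

csv = StringIO(u"parcel_id,value\n101,250000\n102,\n")

# an integer dtype chokes on the blank field...
try:
    pd.read_csv(csv, dtype={"parcel_id": "int64", "value": "int64"})
except Exception as err:
    print(err)  # "Integer column has NA values"

# ...so columns that can contain NA have to be read as float
csv.seek(0)
df = pd.read_csv(csv, dtype={"parcel_id": "int64", "value": "float64"})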
URL: From stefan.otte at gmail.com Thu May 14 07:31:17 2015 From: stefan.otte at gmail.com (Stefan Otte) Date: Thu, 14 May 2015 11:31:17 +0000 Subject: [Numpy-discussion] Create a n-D grid; meshgrid alternative In-Reply-To: References: Message-ID: Hey, I just created a pull request: https://github.com/numpy/numpy/pull/5874 Best, Stefan On Tue, May 12, 2015 at 3:29 PM Stefan Otte wrote: > Hey, > > here is an ipython notebook with benchmarks of all implementations (scroll > to the bottom for plots): > > https://github.com/sotte/ipynb_snippets/blob/master/2015-05%20gridspace%20-%20cartesian.ipynb > > Overall, Jaime's version is the fastest. > > > > > > > > On Tue, May 12, 2015 at 2:01 PM Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> On Tue, May 12, 2015 at 1:17 AM, Stefan Otte >> wrote: >> >>> Hello, >>> >>> indeed I was looking for the cartesian product. >>> >>> I timed the two stackoverflow answers and the winner is not quite as >>> clear: >>> >>> n_elements: 10 cartesian 0.00427 cartesian2 0.00172 >>> n_elements: 100 cartesian 0.02758 cartesian2 0.01044 >>> n_elements: 1000 cartesian 0.97628 cartesian2 1.12145 >>> n_elements: 5000 cartesian 17.14133 cartesian2 31.12241 >>> >>> (This is for two arrays as parameters: np.linspace(0, 1, n_elements)) >>> cartesian2 seems to be slower for bigger. >>> >> >> On my system, the following variation on Pauli's answer is 2-4x faster >> than his for your test cases: >> >> def cartesian4(arrays, out=None): >> arrays = [np.asarray(x).ravel() for x in arrays] >> dtype = np.result_type(*arrays) >> >> n = np.prod([arr.size for arr in arrays]) >> if out is None: >> out = np.empty((len(arrays), n), dtype=dtype) >> else: >> out = out.T >> >> for j, arr in enumerate(arrays): >> n /= arr.size >> out.shape = (len(arrays), -1, arr.size, n) >> out[j] = arr[np.newaxis, :, np.newaxis] >> out.shape = (len(arrays), -1) >> >> return out.T >> >> >>> I'd really appreciate if this was be part of numpy. Should I create a >>> pull request? >>> >> >> There hasn't been any opposition, quite the contrary, so yes, I would go >> ahead an create that PR. I somehow feel this belongs with the set >> operations, rather than with the indexing ones. Other thoughts? >> >> Also for consideration: should it work on flattened arrays? or should we >> give it an axis argument, and then "broadcast on the rest", a la >> generalized ufunc? >> >> Jaime >> >> -- >> (\__/) >> ( O.o) >> ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes >> de dominaci?n mundial. >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri May 15 16:07:15 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 15 May 2015 13:07:15 -0700 Subject: [Numpy-discussion] binary wheels for numpy? Message-ID: Hi folks., I did a little "intro to scipy" session as part of a larger Python class the other day, and was dismayed to find that "pip install numpy" still dosn't work on Windows. Thanks mostly to Matthew Brett's work, the whole scipy stack is pip-installable on OS-X, it would be really nice if we had that for Windows. And no, saying "you should go get Python(x,y) or Anaconda, or Canopy, or...) is really not a good solution. 
From chris.barker at noaa.gov  Fri May 15 16:07:15 2015
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 15 May 2015 13:07:15 -0700
Subject: [Numpy-discussion] binary wheels for numpy?
Message-ID:

Hi folks,

I did a little "intro to scipy" session as part of a larger Python class the other day, and was dismayed to find that "pip install numpy" still doesn't work on Windows.

Thanks mostly to Matthew Brett's work, the whole scipy stack is pip-installable on OS-X; it would be really nice if we had that for Windows.

And no, saying "you should go get Python(x,y), or Anaconda, or Canopy, or..." is really not a good solution. That is indeed the way to go if someone is primarily focusing on computational programming, but if you have a web developer, or someone new to Python for general use, they really should be able to just grab numpy and play around with it a bit without having to start all over again.

My solution was to point folks to Christoph Gohlke's site -- which is a fabulous resource --

THANK YOU CHRISTOPH!

But I still think that we should have the basic scipy stack on PyPi as Windows wheels...

IIRC, the last run through on this discussion got stuck on the "what hardware should it support" -- wheels do not allow a selection at install time, so we'd have to decide what instruction set to support, and just stick with that. Which would mean that:

some folks would get a numpy/scipy that would run a bit slower than it might, and
some folks would get one that wouldn't run at all on their machine.

But I don't see any reason that we can't find a compromise here -- do a build that supports most machines, and be done with it. Even now, people have to go get (one way or another) an MKL-based build to get optimum performance anyway -- so if we pick an instruction set supported by, say, (an arbitrary, and impossible to determine) 95% of machines out there -- we're good to go.

I take it there are licensing issues that prevent us from putting Christoph's binaries up on PyPi?

But are there technical issues I'm forgetting here, or do we just need to come to a consensus as to which hardware version to support and do it?

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From matthew.brett at gmail.com  Fri May 15 16:35:36 2015
From: matthew.brett at gmail.com (Matthew Brett)
Date: Fri, 15 May 2015 13:35:36 -0700
Subject: [Numpy-discussion] binary wheels for numpy?
In-Reply-To:
References:
Message-ID:

Hi,

On Fri, May 15, 2015 at 1:07 PM, Chris Barker wrote:
> Hi folks,
>
> I did a little "intro to scipy" session as part of a larger Python class the
> other day, and was dismayed to find that "pip install numpy" still doesn't
> work on Windows.
>
> Thanks mostly to Matthew Brett's work, the whole scipy stack is
> pip-installable on OS-X, it would be really nice if we had that for Windows.
>
> And no, saying "you should go get Python(x,y), or Anaconda, or Canopy, or..."
> is really not a good solution. That is indeed the way to go if someone is
> primarily focusing on computational programming, but if you have a web
> developer, or someone new to Python for general use, they really should be
> able to just grab numpy and play around with it a bit without having to
> start all over again.
>
> My solution was to point folks to Christoph Gohlke's site -- which is a fabulous
> resource --
>
> THANK YOU CHRISTOPH!
>
> But I still think that we should have the basic scipy stack on PyPi as
> Windows wheels...
>
> IIRC, the last run through on this discussion got stuck on the "what
> hardware should it support" -- wheels do not allow a selection at install
> time, so we'd have to decide what instruction set to support, and just stick
> with that. Which would mean that:
>
> some folks would get a numpy/scipy that would run a bit slower than it might
> and
> some folks would get one that wouldn't run at all on their machine.
> > But I don't see any reason that we can't find a compromise here -- do a > build that supports most machines, and be done with it. Even now, people > have to go get (one way or another) a MKL-based build to get optimum > performance anyway -- so if we pick an instruction set support by, say (an > arbitrary, and impossible to determine) 95% of machines out there -- we're > good to go. > > I take it there are licensing issues that prevent us from putting Chris' > Binaries up on PyPi? Yes, unfortunately we can't put MKL binaries on pypi because of the MKL license - see https://github.com/numpy/numpy/wiki/Numerical-software-on-Windows#blas--lapack-libraries. Also see discussion in the containing thread of http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069701.html . > But are there technical issues I'm forgetting here, or do we just need to > come to a consensus as to hardware version to support and do it? There has been some progress on this - see https://github.com/scipy/scipy/issues/4829 I think there's a move afoot to have a Google hangout or similar on this exact topic : https://github.com/scipy/scipy/issues/2829#issuecomment-101303078 - maybe we could hammer out a policy there? Once we have got numpy and scipy built in a reasonable way, I think we will be most of the way there... Cheers, Matthew From chris.barker at noaa.gov Fri May 15 19:26:32 2015 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Fri, 15 May 2015 16:26:32 -0700 Subject: [Numpy-discussion] binary wheels for numpy? In-Reply-To: References: Message-ID: <309834013734674372@unknownmsgid> Thanks for the update Matthew, it's great to see so much activity on this issue. Looks like we are headed in the right direction --and getting close. Thanks to all that are putting time into this. -Chris > On May 15, 2015, at 1:37 PM, Matthew Brett wrote: > > Hi, > >> On Fri, May 15, 2015 at 1:07 PM, Chris Barker wrote: >> Hi folks., >> >> I did a little "intro to scipy" session as part of a larger Python class the >> other day, and was dismayed to find that "pip install numpy" still dosn't >> work on Windows. >> >> Thanks mostly to Matthew Brett's work, the whole scipy stack is >> pip-installable on OS-X, it would be really nice if we had that for Windows. >> >> And no, saying "you should go get Python(x,y) or Anaconda, or Canopy, or...) >> is really not a good solution. That is indeed the way to go if someone is >> primarily focusing on computational programming, but if you have a web >> developer, or someone new to Python for general use, they really should be >> able to just grab numpy and play around with it a bit without having to >> start all over again. >> >> >> My solution was to point folks to Chris Gohlke's site -- which is a Fabulous >> resource -- >> >> THANK YOU CHRISTOPH! >> >> But I still think that we should have the basic scipy stack on PyPi as >> Windows Wheels... >> >> IIRC, the last run through on this discussion got stuck on the "what >> hardware should it support" -- wheels do not allow a selection at installc >> time, so we'd have to decide what instruction set to support, and just stick >> with that. Which would mean that: >> >> some folks would get a numpy/scipy that would run a bit slower than it might >> and >> some folks would get one that wouldn't run at all on their machine. >> >> But I don't see any reason that we can't find a compromise here -- do a >> build that supports most machines, and be done with it. 
Even now, people >> have to go get (one way or another) a MKL-based build to get optimum >> performance anyway -- so if we pick an instruction set support by, say (an >> arbitrary, and impossible to determine) 95% of machines out there -- we're >> good to go. >> >> I take it there are licensing issues that prevent us from putting Chris' >> Binaries up on PyPi? > > Yes, unfortunately we can't put MKL binaries on pypi because of the > MKL license - see > https://github.com/numpy/numpy/wiki/Numerical-software-on-Windows#blas--lapack-libraries. > Also see discussion in the containing thread of > http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069701.html > . > >> But are there technical issues I'm forgetting here, or do we just need to >> come to a consensus as to hardware version to support and do it? > > There has been some progress on this - see > > https://github.com/scipy/scipy/issues/4829 > > I think there's a move afoot to have a Google hangout or similar on > this exact topic : > https://github.com/scipy/scipy/issues/2829#issuecomment-101303078 - > maybe we could hammer out a policy there? Once we have got numpy and > scipy built in a reasonable way, I think we will be most of the way > there... > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Fri May 15 21:56:17 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 15 May 2015 21:56:17 -0400 Subject: [Numpy-discussion] binary wheels for numpy? In-Reply-To: References: Message-ID: On Fri, May 15, 2015 at 4:07 PM, Chris Barker wrote: > Hi folks., > > I did a little "intro to scipy" session as part of a larger Python class > the other day, and was dismayed to find that "pip install numpy" still > dosn't work on Windows. > > Thanks mostly to Matthew Brett's work, the whole scipy stack is > pip-installable on OS-X, it would be really nice if we had that for Windows. > > And no, saying "you should go get Python(x,y) or Anaconda, or Canopy, > or...) is really not a good solution. That is indeed the way to go if > someone is primarily focusing on computational programming, but if you have > a web developer, or someone new to Python for general use, they really > should be able to just grab numpy and play around with it a bit without > having to start all over again. > Unrelated to the pip/wheel discussion. In my experience by far the easiest to get something running to play with is using Winpython. Download and unzip (and maybe add to system path) and most of the data analysis stack is available. I haven't even bothered yet to properly install a full "system python" on my Windows machine. I'm just working with 3 winpython. (One even has Julia and IJulia included after following the installation instructions for a short time.) Josef > > > My solution was to point folks to Chris Gohlke's site -- which is a > Fabulous resource -- > > THANK YOU CHRISTOPH! > > But I still think that we should have the basic scipy stack on PyPi as > Windows Wheels... > > IIRC, the last run through on this discussion got stuck on the "what > hardware should it support" -- wheels do not allow a selection at install > time, so we'd have to decide what instruction set to support, and just > stick with that. 
Which would mean that:
>
> some folks would get a numpy/scipy that would run a bit slower than it
> might
> and
> some folks would get one that wouldn't run at all on their machine.
>
> But I don't see any reason that we can't find a compromise here -- do a
> build that supports most machines, and be done with it. Even now, people
> have to go get (one way or another) an MKL-based build to get optimum
> performance anyway -- so if we pick an instruction set supported by, say (an
> arbitrary, and impossible to determine) 95% of machines out there -- we're
> good to go.
>
> I take it there are licensing issues that prevent us from putting Christoph's
> binaries up on PyPi?
>
> But are there technical issues I'm forgetting here, or do we just need to
> come to a consensus as to which hardware version to support and do it?
>
> -Chris
>
> --
> Christopher Barker, Ph.D.
> Oceanographer
>
> Emergency Response Division
> NOAA/NOS/OR&R            (206) 526-6959   voice
> 7600 Sand Point Way NE   (206) 526-6329   fax
> Seattle, WA  98115       (206) 526-6317   main reception
>
> Chris.Barker at noaa.gov
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jaime.frio at gmail.com  Fri May 15 23:49:46 2015
From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=)
Date: Fri, 15 May 2015 20:49:46 -0700
Subject: [Numpy-discussion] binary wheels for numpy?
In-Reply-To:
References:
Message-ID:

On Fri, May 15, 2015 at 6:56 PM, wrote:
>
> On Fri, May 15, 2015 at 4:07 PM, Chris Barker wrote:
>
>> Hi folks,
>>
>> I did a little "intro to scipy" session as part of a larger Python class
>> the other day, and was dismayed to find that "pip install numpy" still
>> doesn't work on Windows.
>>
>> Thanks mostly to Matthew Brett's work, the whole scipy stack is
>> pip-installable on OS-X, it would be really nice if we had that for Windows.
>>
>> And no, saying "you should go get Python(x,y), or Anaconda, or Canopy,
>> or..." is really not a good solution. That is indeed the way to go if
>> someone is primarily focusing on computational programming, but if you have
>> a web developer, or someone new to Python for general use, they really
>> should be able to just grab numpy and play around with it a bit without
>> having to start all over again.
>
> Unrelated to the pip/wheel discussion.
>
> In my experience by far the easiest to get something running to play with
> is using Winpython. Download and unzip (and maybe add to system path) and
> most of the data analysis stack is available.
>
> I haven't even bothered yet to properly install a full "system python" on
> my Windows machine. I'm just working with 3 winpython. (One even has Julia
> and IJulia included after following the installation instructions for a
> short time.)

+1 on WinPython. I have half a dozen "installations" of it, none registered with Windows.

Jaime

--
(\__/)
( O.o)
( > <) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From chris.barker at noaa.gov  Sat May 16 01:26:20 2015
From: chris.barker at noaa.gov (Chris Barker)
Date: Fri, 15 May 2015 22:26:20 -0700
Subject: [Numpy-discussion] binary wheels for numpy?
In-Reply-To: References: Message-ID: On Fri, May 15, 2015 at 6:56 PM, wrote: > Unrelated to the pip/wheel discussion. > > In my experience by far the easiest to get something running to play with > is using Winpython. Download and unzip (and maybe add to system path) and > most of the data analysis stack is available. > Sure -- if someone comes to me wanting to use python for scientific/computational computing, I point them to one of the distributions -- maybe I'll add WinPython to that list now. But if someone is already using python for, say web development, then they already have an installation up and running, and I want to give them an easy option to add numpy (and secondarily scipy) to what they have easily. And it looks like we are almost there, thanks to a lot of work by a few key folks -- thanks! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sun May 17 06:06:11 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 17 May 2015 12:06:11 +0200 Subject: [Numpy-discussion] binary wheels for numpy? In-Reply-To: References: Message-ID: On Fri, May 15, 2015 at 10:35 PM, Matthew Brett wrote: > Hi, > > On Fri, May 15, 2015 at 1:07 PM, Chris Barker > wrote: > > Hi folks., > > > > I did a little "intro to scipy" session as part of a larger Python class > the > > other day, and was dismayed to find that "pip install numpy" still dosn't > > work on Windows. > > > > Thanks mostly to Matthew Brett's work, the whole scipy stack is > > pip-installable on OS-X, it would be really nice if we had that for > Windows. > > > > And no, saying "you should go get Python(x,y) or Anaconda, or Canopy, > or...) > > is really not a good solution. That is indeed the way to go if someone is > > primarily focusing on computational programming, but if you have a web > > developer, or someone new to Python for general use, they really should > be > > able to just grab numpy and play around with it a bit without having to > > start all over again. > > > > > > My solution was to point folks to Chris Gohlke's site -- which is a > Fabulous > > resource -- > > > > THANK YOU CHRISTOPH! > > > > But I still think that we should have the basic scipy stack on PyPi as > > Windows Wheels... > > > > IIRC, the last run through on this discussion got stuck on the "what > > hardware should it support" -- wheels do not allow a selection at > installc > > time, so we'd have to decide what instruction set to support, and just > stick > > with that. Which would mean that: > > > > some folks would get a numpy/scipy that would run a bit slower than it > might > > and > > some folks would get one that wouldn't run at all on their machine. > > > > But I don't see any reason that we can't find a compromise here -- do a > > build that supports most machines, and be done with it. Even now, people > > have to go get (one way or another) a MKL-based build to get optimum > > performance anyway -- so if we pick an instruction set support by, say > (an > > arbitrary, and impossible to determine) 95% of machines out there -- > we're > > good to go. > > > > I take it there are licensing issues that prevent us from putting Chris' > > Binaries up on PyPi? 
> Yes, unfortunately we can't put MKL binaries on pypi because of the
> MKL license - see
> https://github.com/numpy/numpy/wiki/Numerical-software-on-Windows#blas--lapack-libraries .
> Also see discussion in the containing thread of
> http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069701.html .
>
> > But are there technical issues I'm forgetting here, or do we just need to
> > come to a consensus as to which hardware version to support and do it?

There's the switch to OpenBLAS and building the right selection mechanism for which arch to use: http://article.gmane.org/gmane.comp.python.distutils.devel/20350. That seems now feasible to complete on a reasonable time-scale, and the problems with OpenBLAS seem to be mostly solved. Binaries which crash for ~1% of users (which ATLAS-SSE2 would result in) are still not acceptable I think.

Ralf

> There has been some progress on this - see
>
> https://github.com/scipy/scipy/issues/4829
>
> I think there's a move afoot to have a Google hangout or similar on
> this exact topic :
> https://github.com/scipy/scipy/issues/2829#issuecomment-101303078 -
> maybe we could hammer out a policy there? Once we have got numpy and
> scipy built in a reasonable way, I think we will be most of the way
> there...
>
> Cheers,
>
> Matthew
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sturla.molden at gmail.com  Sun May 17 11:22:09 2015
From: sturla.molden at gmail.com (Sturla Molden)
Date: Sun, 17 May 2015 15:22:09 +0000 (UTC)
Subject: [Numpy-discussion] binary wheels for numpy?
References:
Message-ID: <1250663829453568496.156738sturla.molden-gmail.com@news.gmane.org>

Matthew Brett wrote:

> Yes, unfortunately we can't put MKL binaries on pypi because of the
> MKL license - see

I believe we can, because we asked Intel for permission. From what I heard the response was positive. But it doesn't mean we should. :-)

Sturla

From valentin at haenel.co  Sun May 17 12:15:09 2015
From: valentin at haenel.co (Valentin Haenel)
Date: Sun, 17 May 2015 18:15:09 +0200
Subject: [Numpy-discussion] [ANN] bcolz v0.9.0
Message-ID: <20150517161509.GA6197@kudu.in-berlin.de>

======================
Announcing bcolz 0.9.0
======================

What's new
==========

This is mostly a smallish feature and bugfix release. One large topic was implementing 'addcol' and 'delcol' to properly handle on-disk tables. 'addcol' now has a new keyword argument 'move' that allows you to specify if you want to move or copy the data. 'delcol' has a new keyword argument 'keep' which allows you to preserve the data on disk when removing a column. Additionally, ctable now supports an 'auto_flush' keyword that makes it flush to disk automatically after any methods that may write data.

Another important aspect is handling the GIL. In this release, we do keep the GIL while calling Blosc compress and decompress in order to support lock-free operation of newer Blosc versions (1.5.x and beyond) that no longer have a global state.

Furthermore we now distribute the 'carray_ext.pxd' as part of the package via PyPi to ease building applications on bcolz, for example *bquery*.

Finally, the Sphinx-based API documentation is now autogenerated from the docstrings in the Python sources.
For the full list, please check the release notes at: https://github.com/Blosc/bcolz/blob/v0.9.0/RELEASE_NOTES.rst

What it is
==========

*bcolz* provides columnar and compressed data containers that can live either on-disk or in-memory. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of columns. In addition, bcolz objects are compressed by default for reducing memory/disk I/O needs. The compression process is carried out internally by Blosc, an extremely fast meta-compressor that is optimized for binary data. Lastly, high-performance iterators (like ``iter()``, ``where()``) for querying the objects are provided.

bcolz can use numexpr internally so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes the memory usage and uses several cores for doing the computations, so it is blazing fast. Moreover, since the carray/ctable containers can be disk-based, it is possible to use them for seamlessly performing out-of-memory computations.

bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems.

Together, bcolz and the Blosc compressor are finally fulfilling the promise of accelerating memory I/O, at least for some real scenarios: http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots

Other users of bcolz are Visualfabriq (http://www.visualfabriq.com/), the Blaze project (http://blaze.pydata.org/), Quantopian (https://www.quantopian.com/) and Scikit-Allel (https://github.com/cggh/scikit-allel), which you can read more about by pointing your browser at the links below.

* Visualfabriq:
  * *bquery*, a query and aggregation framework for bcolz:
  * https://github.com/visualfabriq/bquery

* Blaze:
  * Notebooks showing Blaze + Pandas + BColz interaction:
  * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-csv.ipynb
  * http://nbviewer.ipython.org/url/blaze.pydata.org/notebooks/timings-bcolz.ipynb

* Quantopian:
  * Using compressed data containers for faster backtesting at scale:
  * https://quantopian.github.io/talks/NeedForSpeed/slides.html

* Scikit-Allel:
  * Provides an alternative backend to work with compressed arrays:
  * https://scikit-allel.readthedocs.org/en/latest/bcolz.html

Installing
==========

bcolz is in the PyPI repository, so installing it is easy::

    $ pip install -U bcolz

Resources
=========

Visit the main bcolz site repository at: http://github.com/Blosc/bcolz

Manual: http://bcolz.blosc.org

Home of Blosc compressor: http://blosc.org

User's mail list: bcolz at googlegroups.com http://groups.google.com/group/bcolz

License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt

Release notes can be found in the Git repository: https://github.com/Blosc/bcolz/blob/master/RELEASE_NOTES.rst

----

**Enjoy data!**
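[Editor's note: a minimal usage sketch to accompany the announcement, using only the public API named above (carray, ctable, where). The column names and values are invented for illustration, and the exact constructor signatures should be checked against the bcolz manual:]

import numpy as np
import bcolz

# A compressed, chunked container; behaves much like an ndarray.
c = bcolz.carray(np.arange(1e7))
print(c.sum())

# A compressed table with two columns; where() iterates over matching
# rows lazily instead of materializing a boolean mask.
ct = bcolz.ctable((np.arange(5), np.linspace(0, 1, 5)), names=['i', 'x'])
for row in ct.where('i > 2'):
    print(row.i, row.x)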
From matthew.brett at gmail.com  Sun May 17 14:50:14 2015
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sun, 17 May 2015 11:50:14 -0700
Subject: [Numpy-discussion] binary wheels for numpy?
In-Reply-To: <1250663829453568496.156738sturla.molden-gmail.com@news.gmane.org>
References: <1250663829453568496.156738sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Sun, May 17, 2015 at 8:22 AM, Sturla Molden wrote:
> Matthew Brett wrote:
>
>> Yes, unfortunately we can't put MKL binaries on pypi because of the
>> MKL license - see
>
> I believe we can, because we asked Intel for permission. From what I heard
> the response was positive.

We would need something formal from Intel saying that they do not require us to hold our users to their standard redistribution terms and that they waive the requirement that we be responsible for any damage to Intel that happens as a result of people using our binaries.

I'm guessing we don't have this, but I'm happy to be corrected,

Cheers,

Matthew

From ralf.gommers at gmail.com  Sun May 17 14:54:48 2015
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 17 May 2015 20:54:48 +0200
Subject: [Numpy-discussion] binary wheels for numpy?
In-Reply-To:
References: <1250663829453568496.156738sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Sun, May 17, 2015 at 8:50 PM, Matthew Brett wrote:
> On Sun, May 17, 2015 at 8:22 AM, Sturla Molden wrote:
> > Matthew Brett wrote:
> >
> >> Yes, unfortunately we can't put MKL binaries on pypi because of the
> >> MKL license - see
> >
> > I believe we can, because we asked Intel for permission. From what I heard
> > the response was positive.
>
> We would need something formal from Intel saying that they do not
> require us to hold our users to their standard redistribution terms
> and that they waive the requirement that we be responsible for any
> damage to Intel that happens as a result of people using our binaries.
>
> I'm guessing we don't have this, but I'm happy to be corrected,

We only have an email, probably not enough. I'd rather not go to the trouble of discussing something more formal unless we are really sure that we actually want to distribute MKL binaries. Which isn't too likely I suspect; OpenBLAS seems like the way to go (?).

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From robert.kern at gmail.com  Sun May 17 15:11:25 2015
From: robert.kern at gmail.com (Robert Kern)
Date: Sun, 17 May 2015 20:11:25 +0100
Subject: [Numpy-discussion] binary wheels for numpy?
In-Reply-To:
References: <1250663829453568496.156738sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Sun, May 17, 2015 at 7:50 PM, Matthew Brett wrote:
>
> On Sun, May 17, 2015 at 8:22 AM, Sturla Molden wrote:
> > Matthew Brett wrote:
> >
> >> Yes, unfortunately we can't put MKL binaries on pypi because of the
> >> MKL license - see
> >
> > I believe we can, because we asked Intel for permission. From what I heard
> > the response was positive.
>
> We would need something formal from Intel saying that they do not
> require us to hold our users to their standard redistribution terms
> and that they waive the requirement that we be responsible for any
> damage to Intel that happens as a result of people using our binaries.
>
> I'm guessing we don't have this, but I'm happy to be corrected,

I don't think permission from Intel is the blocking issue for putting these binaries up on PyPI. Even with Intel's permission, we would be putting up proprietary binaries on a page that is explicitly claiming that the files linked therein are BSD-licensed. The binaries could not be redistributed with any GPLed module, say, pygsl.
We could host them on numpy.org on their own page that clearly explained the license of those files, but I think PyPI is out. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sun May 17 17:09:56 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 17 May 2015 23:09:56 +0200 Subject: [Numpy-discussion] binary wheels for numpy? In-Reply-To: References: <1250663829453568496.156738sturla.molden-gmail.com@news.gmane.org> Message-ID: On 17/05/15 20:54, Ralf Gommers wrote: > I suspect; OpenBLAS seems like the way to go (?). I think OpenBLAS is currently the most promising candidate to replace ATLAS. But we need to build OpenBLAS with MinGW gcc, due to AT&T syntax in the assembly code. I am not sure if the old toolchain is good enough, or if we will need Carl Kleffner's binaries. Sturla From njs at pobox.com Sun May 17 17:18:58 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 17 May 2015 14:18:58 -0700 Subject: [Numpy-discussion] binary wheels for numpy? In-Reply-To: References: <1250663829453568496.156738sturla.molden-gmail.com@news.gmane.org> Message-ID: On Sun, May 17, 2015 at 2:09 PM, Sturla Molden wrote: > On 17/05/15 20:54, Ralf Gommers wrote: > >> I suspect; OpenBLAS seems like the way to go (?). > > I think OpenBLAS is currently the most promising candidate to replace > ATLAS. But we need to build OpenBLAS with MinGW gcc, due to AT&T syntax > in the assembly code. I am not sure if the old toolchain is good enough, > or if we will need Carl Kleffner's binaries. The old toolchain is 32-bit only, so it certainly won't be a general solution. -- Nathaniel J. Smith -- http://vorpus.org From njs at pobox.com Sun May 17 20:39:58 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 17 May 2015 17:39:58 -0700 Subject: [Numpy-discussion] ANN: NumPy Developer Meeting: July 7th @ SciPy 2015 in Austin In-Reply-To: References: Message-ID: Hi all, I just made a wiki page to start collecting agenda items and doing planning for this: https://github.com/numpy/numpy/wiki/SciPy-2015-developer-meeting -n On Wed, May 13, 2015 at 5:23 PM, Nathaniel Smith wrote: > Hi all, > > I wanted to announce that the numpy core team will be organizing a > whole-day face-to-face developer meeting on July 7 this year at the > SciPy conference in Austin, TX. (This is the second day of the > tutorials and the day before the conference proper starts.) This will > be a working meeting to discuss and address numpy-related issues, > particularly ones that are too big to fit in a github issue, like > governance and release management and where we want to be in five > years. (We'll be talking more between now and then about more detailed > logistics and agenda and so forth, but I wanted to get this out now.) > > If you're reading this and interested in these issues, then you're invited :-). > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org -- Nathaniel J. Smith -- http://vorpus.org From chris.barker at noaa.gov Mon May 18 00:09:12 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Sun, 17 May 2015 21:09:12 -0700 Subject: [Numpy-discussion] binary wheels for numpy? In-Reply-To: References: <1250663829453568496.156738sturla.molden-gmail.com@news.gmane.org> Message-ID: On Sun, May 17, 2015 at 12:11 PM, Robert Kern wrote: > I don't think permission from Intel is the blocking issue for putting > these binaries up on PyPI. 
Even with Intel's permission, we would be > putting up proprietary binaries on a page that is explicitly claiming that > the files linked therein are BSD-licensed. The binaries could not be > redistributed with any GPLed module, say, pygsl. > > We could host them on numpy.org on their own page that clearly explained > the license of those files, but I think PyPI is out. > > Can't PyPi re-direct -- so they can actualy be hosted somewhere else, but "pip install numpy" would still work? IIUC, The Intel libs have the great advantage of run-time selection of hardware specific code -- yes? So they would both work and give high performance on most machines (all?). Much as I am a fan of open source, there doesn't appear to be anything as good out there. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon May 18 00:14:56 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Sun, 17 May 2015 21:14:56 -0700 Subject: [Numpy-discussion] binary wheels for numpy? In-Reply-To: References: Message-ID: On Sun, May 17, 2015 at 3:06 AM, Ralf Gommers wrote: > Binaries which crash for ~1% of users (which ATLAS-SSE2 would result in) > are still not acceptable I think. > what instruction set would an OpenBLAS build support? wouldn't we still need to select a lowest common denominator instructions set to support? And SEE2 was introduced with the Pentium 4in 2001 -- that is a very long time ago! I think the 1% number came from a survey of firefox downloads -- that may well not be representative of the numpy-using population. and depending on HOW it failed, 1% might be OK if we could give a reasonable error message (which maybe we can't...) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon May 18 00:23:37 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 17 May 2015 21:23:37 -0700 Subject: [Numpy-discussion] binary wheels for numpy? In-Reply-To: References: Message-ID: On Sun, May 17, 2015 at 9:14 PM, Chris Barker wrote: > On Sun, May 17, 2015 at 3:06 AM, Ralf Gommers > wrote: >> >> Binaries which crash for ~1% of users (which ATLAS-SSE2 would result in) >> are still not acceptable I think. > > > what instruction set would an OpenBLAS build support? wouldn't we still need > to select a lowest common denominator instructions set to support? I believe OpenBLAS does run-time selection too. > And SEE2 was introduced with the Pentium 4in 2001 -- that is a very long > time ago! > > I think the 1% number came from a survey of firefox downloads -- that may > well not be representative of the numpy-using population. > > and depending on HOW it failed, 1% might be OK if we could give a reasonable > error message (which maybe we can't...) I think we discussed before having a check and error clause in __init__.py saying something like "You have a really old computer, you can't use this binary, please go to sourceforge and download the exe installer...". 
From njs at pobox.com  Mon May 18 00:27:57 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 17 May 2015 21:27:57 -0700
Subject: [Numpy-discussion] binary wheels for numpy?
In-Reply-To:
References: <1250663829453568496.156738sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Sun, May 17, 2015 at 9:09 PM, Chris Barker wrote:
> On Sun, May 17, 2015 at 12:11 PM, Robert Kern wrote:
>>
>> I don't think permission from Intel is the blocking issue for putting
>> these binaries up on PyPI. Even with Intel's permission, we would be putting
>> up proprietary binaries on a page that is explicitly claiming that the files
>> linked therein are BSD-licensed. The binaries could not be redistributed
>> with any GPLed module, say, pygsl.
>>
>> We could host them on numpy.org on their own page that clearly explained
>> the license of those files, but I think PyPI is out.
>
> Can't PyPi re-direct -- so they can actually be hosted somewhere else, but
> "pip install numpy" would still work?

There's two issues here: (1) we can't actually use the Intel stuff (MKL, icc) under its regular license without having our release managers accepting personal liability. Which isn't going to happen. (2) The problem isn't whether they're hosted on PyPI, it's whether the people downloading them get warned about what they're downloading. The whole point is that we *don't* want 'pip install numpy' to work in this case, because it's too seamless.

-n

--
Nathaniel J. Smith -- http://vorpus.org

From matthew.brett at gmail.com  Mon May 18 00:32:05 2015
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sun, 17 May 2015 21:32:05 -0700
Subject: [Numpy-discussion] binary wheels for numpy?
In-Reply-To:
References: <1250663829453568496.156738sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Sun, May 17, 2015 at 9:27 PM, Nathaniel Smith wrote:
> On Sun, May 17, 2015 at 9:09 PM, Chris Barker wrote:
>> On Sun, May 17, 2015 at 12:11 PM, Robert Kern wrote:
>>>
>>> I don't think permission from Intel is the blocking issue for putting
>>> these binaries up on PyPI. Even with Intel's permission, we would be putting
>>> up proprietary binaries on a page that is explicitly claiming that the files
>>> linked therein are BSD-licensed. The binaries could not be redistributed
>>> with any GPLed module, say, pygsl.
>>>
>>> We could host them on numpy.org on their own page that clearly explained
>>> the license of those files, but I think PyPI is out.
>>
>> Can't PyPi re-direct -- so they can actually be hosted somewhere else, but
>> "pip install numpy" would still work?
>
> There's two issues here: (1) we can't actually use the Intel stuff
> (MKL, icc) under its regular license without having our release
> managers accepting personal liability. Which isn't going to happen.
> (2) The problem isn't whether they're hosted on PyPI, it's whether the
> people downloading them get warned about what they're downloading. The
> whole point is that we *don't* want 'pip install numpy' to work in
> this case, because it's too seamless.

I'd add Robert's point - we will have made the default install something that is not compatible with GPL libraries,

Matthew

From njs at pobox.com  Mon May 18 00:34:29 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 17 May 2015 21:34:29 -0700
Subject: [Numpy-discussion] binary wheels for numpy?
In-Reply-To:
References:
Message-ID:

On Sun, May 17, 2015 at 3:06 AM, Ralf Gommers wrote:
> There's the switch to OpenBLAS and building the right selection mechanism
> for which arch to use:
> http://article.gmane.org/gmane.comp.python.distutils.devel/20350. That seems
> now feasible to complete on a reasonable time-scale, and the problems with
> OpenBLAS seem to be mostly solved. Binaries which crash for ~1% of users
> (which ATLAS-SSE2 would result in) are still not acceptable I think.

Where are you getting this SSE2 number from btw? The most detailed public survey source for consumer hardware that I know is the Steam hardware survey:

http://store.steampowered.com/hwsurvey

It's somewhat biased towards higher-end hardware b/c it targets gamers, but there is plenty of less-high-end hardware on there as well -- notice that 20% of the surveyed computers are using Intel graphics. And they're reporting that 99.92% of surveyed computers have SSE*3* support, and 100.00% have SSE2. So assuming the significant digits are accurate, this puts the upper bound on SSE2 failure on these systems at ~0.05%. Even if gamers are 10x likelier to have new hardware than the rest of the world, 1% still seems to be at least an order of magnitude too high?

-n

--
Nathaniel J. Smith -- http://vorpus.org

From ralf.gommers at gmail.com  Mon May 18 00:45:30 2015
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Mon, 18 May 2015 06:45:30 +0200
Subject: [Numpy-discussion] binary wheels for numpy?
In-Reply-To:
References:
Message-ID:

On Mon, May 18, 2015 at 6:34 AM, Nathaniel Smith wrote:
> On Sun, May 17, 2015 at 3:06 AM, Ralf Gommers wrote:
> > There's the switch to OpenBLAS and building the right selection mechanism
> > for which arch to use:
> > http://article.gmane.org/gmane.comp.python.distutils.devel/20350. That seems
> > now feasible to complete on a reasonable time-scale, and the problems with
> > OpenBLAS seem to be mostly solved. Binaries which crash for ~1% of users
> > (which ATLAS-SSE2 would result in) are still not acceptable I think.
>
> Where are you getting this SSE2 number from btw?

This is info Matthew just collected from Firefox crash reports: https://github.com/scipy/scipy/issues/4829#issuecomment-100354752

> The most detailed
> public survey source for consumer hardware that I know is the Steam
> hardware survey:
>
> http://store.steampowered.com/hwsurvey
>
> It's somewhat biased towards higher-end hardware b/c it targets
> gamers, but there is plenty of less-high-end hardware on there as well
> -- notice that 20% of the surveyed computers are using Intel graphics.
> And they're reporting that 99.92% of surveyed computers have SSE*3*
> support, and 100.00% have SSE2. So assuming the significant digits are
> accurate, this puts the upper bound on SSE2 failure on these systems
> at ~0.05%. Even if gamers are 10x likelier to have new hardware than
> the rest of the world, 1% still seems to be at least an order of
> magnitude too high?

That would make life easier.....

Ralf
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From njs at pobox.com  Mon May 18 00:56:47 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 17 May 2015 21:56:47 -0700
Subject: [Numpy-discussion] binary wheels for numpy?
In-Reply-To: References: Message-ID: On Sun, May 17, 2015 at 9:45 PM, Ralf Gommers wrote: > > On Mon, May 18, 2015 at 6:34 AM, Nathaniel Smith wrote: >> >> On Sun, May 17, 2015 at 3:06 AM, Ralf Gommers >> wrote: >> > There's the switch to OpenBLAS and building the right selection >> > mechanism >> > for which arch to use: >> > http://article.gmane.org/gmane.comp.python.distutils.devel/20350. That >> > seems >> > now feasible to complete on a reasonable time-scale, and the problems >> > with >> > OpenBLAS seem to be mostly solved. Binaries which crash for ~1% of users >> > (which ATLAS-SSE2 would result in) are still not acceptable I think. >> >> Where are you getting this SSE2 number from btw? > > This is info Matthew just collected from Firefox crash reports: > https://github.com/scipy/scipy/issues/4829#issuecomment-100354752 Ah, hmm. I guess it's possible that decade-old machines are less reliable and overrepresented in crash reports, but who knows :-) It might become reasonable at some point to just go ahead and put up binaries (ideally with some check so that they fail in a human-readable way), and see how many people email us. If it's too many we can always take the wheels down again. -n -- Nathaniel J. Smith -- http://vorpus.org From ralf.gommers at gmail.com Mon May 18 01:08:10 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 18 May 2015 07:08:10 +0200 Subject: [Numpy-discussion] binary wheels for numpy? In-Reply-To: References: Message-ID: On Mon, May 18, 2015 at 6:56 AM, Nathaniel Smith wrote: > On Sun, May 17, 2015 at 9:45 PM, Ralf Gommers > wrote: > > > > On Mon, May 18, 2015 at 6:34 AM, Nathaniel Smith wrote: > >> > >> On Sun, May 17, 2015 at 3:06 AM, Ralf Gommers > >> wrote: > >> > There's the switch to OpenBLAS and building the right selection > >> > mechanism > >> > for which arch to use: > >> > http://article.gmane.org/gmane.comp.python.distutils.devel/20350. > That > >> > seems > >> > now feasible to complete on a reasonable time-scale, and the problems > >> > with > >> > OpenBLAS seem to be mostly solved. Binaries which crash for ~1% of > users > >> > (which ATLAS-SSE2 would result in) are still not acceptable I think. > >> > >> Where are you getting this SSE2 number from btw? > > > > This is info Matthew just collected from Firefox crash reports: > > https://github.com/scipy/scipy/issues/4829#issuecomment-100354752 > > Ah, hmm. I guess it's possible that decade-old machines are less > reliable and overrepresented in crash reports, but who knows :-) > > It might become reasonable at some point to just go ahead and put up > binaries (ideally with some check so that they fail in a > human-readable way), and see how many people email us. If it's too > many we can always take the wheels down again. > We should probably do that for the next release, if and only if we cannot make the switch to OpenBLAS in time. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon May 18 07:47:53 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 18 May 2015 13:47:53 +0200 Subject: [Numpy-discussion] binary wheels for numpy? In-Reply-To: References: <1250663829453568496.156738sturla.molden-gmail.com@news.gmane.org> Message-ID: On 18/05/15 06:09, Chris Barker wrote: > IIUC, The Intel libs have the great advantage of run-time selection of > hardware specific code -- yes? So they would both work and give high > performance on most machines (all?). 
OpenBLAS can also be built for dynamic architecture with hardware auto-detection. IIRC you build with DYNAMIC_ARCH=1 instead of specifying TARGET. Apple Accelerate Framework does this as well.

Sturla

From fomcl at yahoo.com  Mon May 18 13:08:57 2015
From: fomcl at yahoo.com (Albert-Jan Roskam)
Date: Mon, 18 May 2015 17:08:57 +0000 (UTC)
Subject: [Numpy-discussion] binary wheels for numpy?
In-Reply-To:
References:
Message-ID: <1302776353.586284.1431968937387.JavaMail.yahoo@mail.yahoo.com>

----- Original Message -----
> From: Matthew Brett
> To: Discussion of Numerical Python
> Cc:
> Sent: Monday, May 18, 2015 6:32 AM
> Subject: Re: [Numpy-discussion] binary wheels for numpy?
>
> On Sun, May 17, 2015 at 9:27 PM, Nathaniel Smith wrote:
>> On Sun, May 17, 2015 at 9:09 PM, Chris Barker wrote:
>>> On Sun, May 17, 2015 at 12:11 PM, Robert Kern wrote:
>>>>
>>>> I don't think permission from Intel is the blocking issue for putting
>>>> these binaries up on PyPI. Even with Intel's permission, we would be putting
>>>> up proprietary binaries on a page that is explicitly claiming that the files
>>>> linked therein are BSD-licensed. The binaries could not be redistributed
>>>> with any GPLed module, say, pygsl.
>>>>
>>>> We could host them on numpy.org on their own page that clearly explained
>>>> the license of those files, but I think PyPI is out.
>>>
>>> Can't PyPi re-direct -- so they can actually be hosted somewhere else, but
>>> "pip install numpy" would still work?
>>
>> There's two issues here: (1) we can't actually use the Intel stuff
>> (MKL, icc) under its regular license without having our release
>> managers accepting personal liability. Which isn't going to happen.
>> (2) The problem isn't whether they're hosted on PyPI, it's whether the
>> people downloading them get warned about what they're downloading. The
>> whole point is that we *don't* want 'pip install numpy' to work in
>> this case, because it's too seamless.

But you could use allow-external or allow-all-external:

--allow-external
    Allow the installation of a package even if it is externally hosted
--allow-all-external
    Allow the installation of all packages that are externally hosted

https://pip.pypa.io/en/latest/reference/pip_wheel.html#allow-external

From jgoutin at users.sourceforge.net  Mon May 18 13:49:58 2015
From: jgoutin at users.sourceforge.net (J.Goutin)
Date: Mon, 18 May 2015 17:49:58 +0000 (UTC)
Subject: [Numpy-discussion] MaskedArray compatibility decorators
Message-ID:

Hello,

I created 2 decorators to improve compatibility of "numpy.ma.MaskedArray" with functions that don't support them. These simply convert the MaskedArray to a classical ndarray with masked values converted to NaN, call the function, and reconvert the result to a MaskedArray.

@MaArrayToNaNKeepMask : Re-use the source mask on the output.
@MaArrayToNaNFixInvalid : Replace invalid values by mask on the output.

Source:

import numpy as np

def MaArrayToNaNKeepMask(func):
    """
    MaArray to ndArray with nan decorator.
    Keep mask from original ndArray.
    """
    def wrapper(MaArray, *args, **kwargs):
        try:
            Mask = MaArray.mask
            fill = MaArray.fill_value
            return np.ma.masked_array(
                func(MaArray.filled(np.NaN), *args, **kwargs),
                mask=Mask, fill_value=fill)
        except:
            return func(MaArray, *args, **kwargs)
    return wrapper

def MaArrayToNaNFixInvalid(func):
    """
    MaArray to ndArray with nan decorator.
    Recreate mask from invalid points.
    """
    def wrapper(MaArray, *args, **kwargs):
        try:
            fill = MaArray.fill_value
            return np.ma.fix_invalid(
                func(MaArray.filled(np.NaN), *args, **kwargs),
                fill_value=fill)
        except:
            return func(MaArray, *args, **kwargs)
    return wrapper

Example:

import skimage.transform

@MaArrayToNaNFixInvalid
def maresize(image, output_shape, order=1, mode='constant', cval=0,
             clip=True, preserve_range=True):
    return skimage.transform.resize(image, output_shape, order, mode,
                                    cval, clip, preserve_range)

I think it may be useful to include this directly in numpy. There are a lot of functions that don't work directly with MaskedArray.
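[Editor's note: a short sketch of how the decorators above might be exercised with a plain NumPy function; the function and values are invented for illustration and assume the decorators from the message above are in scope:]

import numpy as np

# The masked entry goes into the wrapped function as NaN and comes back
# out masked again, via np.ma.fix_invalid.
@MaArrayToNaNFixInvalid
def double(arr):
    return arr * 2.0

masked = np.ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
result = double(masked)
print(result)       # [2.0 -- 6.0]
print(result.mask)  # [False  True False]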
""" def wrapper(MaArray, *args, **kwargs): try: fill = MaArray.fill_value return np.ma.fix_invalid(func(MaArray.filled(np.NaN), *args, **kwargs), fill_value=fill) except: return func(MaArray, *args, **kwargs) return wrapper Exemple: import skimage.transform @MaArrayToNaNFixInvalid def maresize(image, output_shape, order=1, mode='constant', cval=0, clip=True, preserve_range=True): return skimage.transform.resize(image, output_shape, order, mode, cval, clip, preserve_range) I think it may be usefull to include this directly on numpy. There is a lot of functions that don't work directly with MaskedArray. From chris.barker at noaa.gov Mon May 18 15:57:41 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 18 May 2015 12:57:41 -0700 Subject: [Numpy-discussion] binary wheels for numpy? In-Reply-To: References: Message-ID: On Sun, May 17, 2015 at 9:23 PM, Matthew Brett wrote: > I believe OpenBLAS does run-time selection too. very cool! then an excellent option if we can get it to work (make that you can get it to work, I'm not doing squat in this effort other than nudging...) I think we discussed before having a check and error clause in > __init__.py saying something like "You have a really old computer, you > can't use this binary, please go to sourceforge and download the exe > installer...". If we can to that, then there is NO reason not to put up binaries that _may_ not support some tiny percentage of users. though maybe with OpenBLAS we don't need to anyway. Thanks again to y'all for working on this. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon May 18 17:28:06 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 18 May 2015 23:28:06 +0200 Subject: [Numpy-discussion] binary wheels for numpy? In-Reply-To: References: Message-ID: On 18/05/15 21:57, Chris Barker wrote: > On Sun, May 17, 2015 at 9:23 PM, Matthew Brett > wrote: > > I believe OpenBLAS does run-time selection too. > > > very cool! then an excellent option if we can get it to work (make that > you can get it to work, I'm not doing squat in this effort other than > nudging...) Carl Kleffner has built binary wheels for NumPy and SciPy with OpenBLAS configured for run-time hardware detection. I don't remember at the top of my head where you can download them for testing. IIRC there remaining test failures were not related to OpenBLAS. Sturla From cmkleffner at gmail.com Tue May 19 10:04:50 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Tue, 19 May 2015 16:04:50 +0200 Subject: [Numpy-discussion] binary wheels for numpy? In-Reply-To: References: Message-ID: numpy and scipy wheels for python2.6-3.4 have been uploaded on binstar last month and are installable with pip: https://binstar.org/carlkl/numpy https://binstar.org/carlkl/scipy The toolchains can be downloaded from https://bitbucket.org/carlkl/mingw-w64-for-python/downloads with some explanations given in https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/mingwpy-2015-04-readme.pdf Carl 2015-05-18 23:28 GMT+02:00 Sturla Molden : > On 18/05/15 21:57, Chris Barker wrote: > > On Sun, May 17, 2015 at 9:23 PM, Matthew Brett > > wrote: > > > > I believe OpenBLAS does run-time selection too. > > > > > > very cool! 
then an excellent option if we can get it to work (make that you can get it to work, I'm not doing squat in this effort other than nudging...)

> I think we discussed before having a check and error clause in
> __init__.py saying something like "You have a really old computer, you
> can't use this binary, please go to sourceforge and download the exe
> installer...".

If we can do that, then there is NO reason not to put up binaries that _may_ not support some tiny percentage of users. Though maybe with OpenBLAS we don't need to anyway.

Thanks again to y'all for working on this.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker at noaa.gov
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sturla.molden at gmail.com  Mon May 18 17:28:06 2015
From: sturla.molden at gmail.com (Sturla Molden)
Date: Mon, 18 May 2015 23:28:06 +0200
Subject: [Numpy-discussion] binary wheels for numpy?
In-Reply-To:
References:
Message-ID:

On 18/05/15 21:57, Chris Barker wrote:
> On Sun, May 17, 2015 at 9:23 PM, Matthew Brett wrote:
>
>     I believe OpenBLAS does run-time selection too.
>
> very cool! then an excellent option if we can get it to work (make that
> you can get it to work, I'm not doing squat in this effort other than
> nudging...)

Carl Kleffner has built binary wheels for NumPy and SciPy with OpenBLAS configured for run-time hardware detection. I don't remember off the top of my head where you can download them for testing. IIRC the remaining test failures were not related to OpenBLAS.

Sturla

From cmkleffner at gmail.com  Tue May 19 10:04:50 2015
From: cmkleffner at gmail.com (Carl Kleffner)
Date: Tue, 19 May 2015 16:04:50 +0200
Subject: [Numpy-discussion] binary wheels for numpy?
In-Reply-To:
References:
Message-ID:

numpy and scipy wheels for python2.6-3.4 have been uploaded to binstar last month and are installable with pip:

https://binstar.org/carlkl/numpy
https://binstar.org/carlkl/scipy

The toolchains can be downloaded from https://bitbucket.org/carlkl/mingw-w64-for-python/downloads with some explanations given in https://bitbucket.org/carlkl/mingw-w64-for-python/downloads/mingwpy-2015-04-readme.pdf

Carl

2015-05-18 23:28 GMT+02:00 Sturla Molden :
> On 18/05/15 21:57, Chris Barker wrote:
> > On Sun, May 17, 2015 at 9:23 PM, Matthew Brett wrote:
> >
> >     I believe OpenBLAS does run-time selection too.
> >
> > very cool! then an excellent option if we can get it to work (make that
> > you can get it to work, I'm not doing squat in this effort other than
> > nudging...)
>
> Carl Kleffner has built binary wheels for NumPy and SciPy with OpenBLAS
> configured for run-time hardware detection. I don't remember off the top
> of my head where you can download them for testing. IIRC the remaining
> test failures were not related to OpenBLAS.
>
> Sturla
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ndarray at mac.com  Thu May 21 21:06:46 2015
From: ndarray at mac.com (Alexander Belopolsky)
Date: Thu, 21 May 2015 21:06:46 -0400
Subject: [Numpy-discussion] Two questions about PEP 465 dot product
Message-ID:

1. Is there a simple expression using existing numpy functions that implements PEP 465 semantics for @?

2. Suppose I have a function that takes two vectors x and y, and a matrix M and returns x.dot(M.dot(y)). I would like to "vectorize" this function so that it works with x and y of any ndim >= 1 and M of any ndim >= 2 treating multi-dimensional x and y as arrays of vectors and M as an array of matrices (broadcasting as necessary). The result should be an array of xMy products. How would I achieve that using PEP 465's @?
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From njs at pobox.com  Thu May 21 21:37:04 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Thu, 21 May 2015 18:37:04 -0700
Subject: [Numpy-discussion] Two questions about PEP 465 dot product
In-Reply-To:
References:
Message-ID:

On Thu, May 21, 2015 at 6:06 PM, Alexander Belopolsky wrote:
> 1. Is there a simple expression using existing numpy functions that
> implements PEP 465 semantics for @?

Not yet.

> 2. Suppose I have a function that takes two vectors x and y, and a matrix M
> and returns x.dot(M.dot(y)). I would like to "vectorize" this function so
> that it works with x and y of any ndim >= 1 and M of any ndim >= 2 treating
> multi-dimensional x and y as arrays of vectors and M as an array of matrices
> (broadcasting as necessary). The result should be an array of xMy products.
> How would I achieve that using PEP 465's @?

(x[..., np.newaxis, :] @ M @ y[..., :, np.newaxis])[..., 0, 0]

Alternatively, you might prefer something like this (though it won't yet take advantage of BLAS):

np.einsum("...i,...ij,...j", x, M, y)

Alternatively, there's been some discussion of the possibility of adding specialized gufuncs for broadcasted vector-vector, vector-matrix, matrix-vector multiplication, which wouldn't do the magic vector promotion that dot and @ do.

-n

--
Nathaniel J. Smith -- http://vorpus.org
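[Editor's note: a quick numerical sanity check of the einsum form above, for readers who want to verify the broadcasting semantics against a plain Python loop; the shapes are arbitrary:]

import numpy as np

# A stack of 4 vectors x, 4 matrices M, and 4 vectors y: the einsum form
# computes one x.dot(M.dot(y)) scalar per stack element.
x = np.random.rand(4, 3)
M = np.random.rand(4, 3, 3)
y = np.random.rand(4, 3)

vectorized = np.einsum("...i,...ij,...j", x, M, y)
looped = np.array([x[k].dot(M[k].dot(y[k])) for k in range(4)])
print(np.allclose(vectorized, looped))  # True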
I would like to "vectorize" this function > so that it works with x and y of any ndim >= 1 and M of any ndim >= 2 > treating multi-dimensional x and y as arrays of vectors and M as an array > of matrices (broadcasting as necessary). The result should be an array of > xMy products. How would I achieve that using PEP 465's @? > > If you are willing to run Python 3.5 (use 3.6.0a3, a4 crawls with the bugs), you can use gh-5878 . The override mechanisms are still in process in Nathaniel's PR, so that may change. I'd welcome any feedback. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri May 22 02:03:55 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 22 May 2015 00:03:55 -0600 Subject: [Numpy-discussion] Two questions about PEP 465 dot product In-Reply-To: References: Message-ID: On Fri, May 22, 2015 at 12:02 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Thu, May 21, 2015 at 7:06 PM, Alexander Belopolsky > wrote: > >> 1. Is there a simple expression using existing numpy functions that >> implements PEP 465 semantics for @? >> >> 2. Suppose I have a function that takes two vectors x and y, and a matrix >> M and returns x.dot(M.dot(y)). I would like to "vectorize" this function >> so that it works with x and y of any ndim >= 1 and M of any ndim >= 2 >> treating multi-dimensional x and y as arrays of vectors and M as an array >> of matrices (broadcasting as necessary). The result should be an array of >> xMy products. How would I achieve that using PEP 465's @? >> >> > If you are willing to run Python 3.5 (use 3.6.0a3, a4 crawls with the > bugs), you can use gh-5878 . > The override mechanisms are still in process in Nathaniel's PR, so that may > change. I'd welcome any feedback. > > Oops, make the 3.5.0a3. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mathieu at mblondel.org Fri May 22 04:39:00 2015 From: mathieu at mblondel.org (Mathieu Blondel) Date: Fri, 22 May 2015 17:39:00 +0900 Subject: [Numpy-discussion] np.diag(np.dot(A, B)) Message-ID: Hi, I often need to compute the equivalent of np.diag(np.dot(A, B)). Computing np.dot(A, B) is highly inefficient if you only need the diagonal entries. Two more efficient ways of computing the same thing are np.sum(A * B.T, axis=1) and np.einsum("ij,ji->i", A, B). The first can allocate quite a lot of temporary memory. The second can be quite cryptic for someone not familiar with einsum. I assume that einsum does not compute np.dot(A, B), but I haven't verified. Since this is is quite a recurrent pattern, I was wondering if it would be worth adding a dedicated function to NumPy and SciPy's sparse module. A possible name would be "diagdot". The best performance would be obtained when A is C-style and B fortran-style. Best, Mathieu -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Fri May 22 04:58:15 2015 From: cournape at gmail.com (David Cournapeau) Date: Fri, 22 May 2015 17:58:15 +0900 Subject: [Numpy-discussion] np.diag(np.dot(A, B)) In-Reply-To: References: Message-ID: On Fri, May 22, 2015 at 5:39 PM, Mathieu Blondel wrote: > Hi, > > I often need to compute the equivalent of > > np.diag(np.dot(A, B)). > > Computing np.dot(A, B) is highly inefficient if you only need the diagonal > entries. Two more efficient ways of computing the same thing are > > np.sum(A * B.T, axis=1) > > and > > np.einsum("ij,ji->i", A, B). 
From cournape at gmail.com  Fri May 22 04:58:15 2015
From: cournape at gmail.com (David Cournapeau)
Date: Fri, 22 May 2015 17:58:15 +0900
Subject: Re: [Numpy-discussion] np.diag(np.dot(A, B))

On Fri, May 22, 2015 at 5:39 PM, Mathieu Blondel wrote:
> Since this is quite a recurrent pattern, I was wondering if it would
> be worth adding a dedicated function to NumPy and SciPy's sparse
> module. A possible name would be "diagdot".

Does your implementation use BLAS, or is it just a wrapper around
einsum?

David


From nadavh at visionsense.com  Fri May 22 05:50:46 2015
From: nadavh at visionsense.com (Nadav Horesh)
Date: Fri, 22 May 2015 09:50:46 +0000
Subject: Re: [Numpy-discussion] np.diag(np.dot(A, B))

There was an idea on this list to provide a function that runs multiple
dot products on several vectors/matrices. This seems to be a particular
case of that proposed function.

Nadav.


From mathieu at mblondel.org  Fri May 22 06:15:10 2015
From: mathieu at mblondel.org (Mathieu Blondel)
Date: Fri, 22 May 2015 19:15:10 +0900
Subject: Re: [Numpy-discussion] np.diag(np.dot(A, B))

Right now I am using np.sum(A * B.T, axis=1) for dense data, and I have
implemented a Cython routine for sparse data.

I haven't benched np.sum(A * B.T, axis=1) vs. np.einsum("ij,ji->i", A, B)
yet, since I am mostly interested in the sparse case right now.

When A is C-style and B is Fortran-style, the optimal algorithm should
compute the inner products along the diagonal using BLAS. If not, I
guess this will need some benchmarking.

Another use for this is to compute the row-wise L2 norms:
np.diagdot(A, A.T).

Mathieu
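As an aside, the row-norm use case is already covered today by the
einsum spelling, with no new function needed ("diagdot" itself remains
hypothetical). A sketch:

import numpy as np

A = np.random.RandomState(0).randn(5, 3)

# Squared row-wise L2 norms, without forming the full A.dot(A.T):
row_sq_norms = np.einsum("ij,ij->i", A, A)
assert np.allclose(row_sq_norms, (A * A).sum(axis=1))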
From davidmenhur at gmail.com  Fri May 22 06:53:26 2015
From: davidmenhur at gmail.com (Daπid)
Date: Fri, 22 May 2015 12:53:26 +0200
Subject: Re: [Numpy-discussion] np.diag(np.dot(A, B))

On 22 May 2015 at 12:15, Mathieu Blondel wrote:
> I haven't benched np.sum(A * B.T, axis=1) vs.
> np.einsum("ij,ji->i", A, B) yet, since I am mostly interested in the
> sparse case right now.

In my system, einsum seems to be faster.

In [3]: N = 256
In [4]: A = np.random.random((N, N))
In [5]: B = np.random.random((N, N))
In [6]: %timeit np.sum(A * B.T, axis=1)
1000 loops, best of 3: 260 µs per loop
In [7]: %timeit np.einsum("ij,ji->i", A, B)
10000 loops, best of 3: 147 µs per loop

In [9]: N = 1023
In [10]: A = np.random.random((N, N))
In [11]: B = np.random.random((N, N))
In [12]: %timeit np.sum(A * B.T, axis=1)
100 loops, best of 3: 14 ms per loop
In [13]: %timeit np.einsum("ij,ji->i", A, B)
100 loops, best of 3: 10.7 ms per loop

I have ATLAS installed from the Fedora repos, so not tuned; but einsum
is only using one thread anyway, so it is probably not using it (and
definitely not computing the full dot, because that alone already takes
200 ms).

If B is in FORTRAN order, it is much faster (for N=5000).

In [25]: Bf = B.copy(order='F')
In [26]: %timeit np.einsum("ij,ji->i", A, Bf)
10 loops, best of 3: 25.7 ms per loop
In [27]: %timeit np.einsum("ij,ji->i", A, B)
1 loops, best of 3: 404 ms per loop
In [29]: %timeit np.sum(A * Bf.T, axis=1)
10 loops, best of 3: 118 ms per loop
In [30]: %timeit np.sum(A * B.T, axis=1)
1 loops, best of 3: 517 ms per loop

But the copy is not worth it:

In [31]: %timeit Bf = B.copy(order='F')
1 loops, best of 3: 463 ms per loop

/David.


From ndarray at mac.com  Fri May 22 13:57:17 2015
From: ndarray at mac.com (Alexander Belopolsky)
Date: Fri, 22 May 2015 13:57:17 -0400
Subject: Re: [Numpy-discussion] Two questions about PEP 465 dot product

On Thu, May 21, 2015 at 9:37 PM, Nathaniel Smith wrote:
> .. there's been some discussion of the possibility of adding
> specialized gufuncs for broadcasted vector-vector, vector-matrix,
> matrix-vector multiplication, which wouldn't do the magic vector
> promotion that dot and @ do.

This would be nice. What I would like to see is some consistency
between multi-matrix support in linalg methods and dot.

For example, when A is a matrix and b is a vector, and

a = linalg.solve(A, b)

then dot(A, a) returns b -- but if either or both of A and b are
stacks, this invariant does not hold. I would like to see a function
(say xdot) that I can use instead of dot and have xdot(A, a) return b
whenever a = linalg.solve(A, b).

Similarly, if w, v = linalg.eig(A), then dot(A, v) returns w * v, but
only if A is 2d.
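The stacked version of the solve invariant is easy to check with the
gufunc-style tools that already exist (a sketch: np.linalg.solve
broadcasts over leading dimensions since numpy 1.8, and np.matmul from
the gh-5878 work plays the role of @):

import numpy as np

rng = np.random.RandomState(0)
A = rng.randn(4, 3, 3)   # a stack of four 3x3 matrices
b = rng.randn(4, 3)      # a matching stack of 3-vectors

a = np.linalg.solve(A, b)                             # shape (4, 3)
recovered = np.matmul(A, a[..., np.newaxis])[..., 0]  # stacked matvec
assert np.allclose(recovered, b)                      # invariant holds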
From njs at pobox.com  Fri May 22 14:23:02 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 22 May 2015 11:23:02 -0700
Subject: Re: [Numpy-discussion] Two questions about PEP 465 dot product

On May 22, 2015 11:00 AM, "Alexander Belopolsky" wrote:
> I would like to see a function (say xdot) that I can use instead of
> dot and have xdot(A, a) return b whenever a = linalg.solve(A, b).

I believe this equivalence holds if xdot(x, y) = x @ y, because solve()
does follow the pep 465 semantics for shape handling. Or at least, it's
intended to. Of course we will also expose pep 465 matmul semantics
under some name that doesn't require the new syntax (probably not
"xdot" though ;-)).

> Similarly, if w, v = linalg.eig(A), then dot(A, v) returns w * v, but
> only if A is 2d.

Again, A @ v I believe does the right thing, though I'm not positive --
you might need a swapaxes or matvec or something. Let us know if you
work it out :-).

Note that it still won't be equivalent to w * v, because w * v doesn't
broadcast the way you want :-). You need w[..., np.newaxis, :] * v, I
think.

-n
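Nathaniel's guess checks out numerically (a sketch; np.linalg.eig also
broadcasts over leading dimensions since numpy 1.8):

import numpy as np

rng = np.random.RandomState(0)
A = rng.randn(4, 3, 3)

w, v = np.linalg.eig(A)           # w: (4, 3), v: (4, 3, 3)
lhs = np.matmul(A, v)             # stacked matrix product, i.e. A @ v
rhs = w[..., np.newaxis, :] * v   # scales column j of each v by w[..., j]
assert np.allclose(lhs, rhs)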
From ben.root at ou.edu  Fri May 22 14:30:42 2015
From: ben.root at ou.edu (Benjamin Root)
Date: Fri, 22 May 2015 14:30:42 -0400
Subject: Re: [Numpy-discussion] Two questions about PEP 465 dot product

At some point, someone is going to make a single documentation page
describing all of this, right? Tables, mathtex, and such? I get woozy
whenever I see this discussion go on.

Ben Root


From njs at pobox.com  Fri May 22 16:05:08 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 22 May 2015 13:05:08 -0700
Subject: Re: [Numpy-discussion] Two questions about PEP 465 dot product

On May 22, 2015 11:34 AM, "Benjamin Root" wrote:
> At some point, someone is going to make a single documentation page
> describing all of this, right? Tables, mathtex, and such? I get woozy
> whenever I see this discussion go on.

That does seem like a good idea, doesn't it. Following the principle
that recently-confused users write the best docs, any interest in
taking a shot at writing such a thing?

-n


From ben.root at ou.edu  Fri May 22 16:22:46 2015
From: ben.root at ou.edu (Benjamin Root)
Date: Fri, 22 May 2015 16:22:46 -0400
Subject: Re: [Numpy-discussion] Two questions about PEP 465 dot product

That assumes that the said recently-confused ever get to the point of
understanding it... and I personally don't do much matrix math work, so
I don't have the proper mental context. I just know that coworkers are
going to be coming to me asking questions because I am the de facto
"python guy". So, having a page I can point them to would be extremely
valuable.

Ben Root
From njs at pobox.com  Fri May 22 16:58:45 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 22 May 2015 13:58:45 -0700
Subject: Re: [Numpy-discussion] Two questions about PEP 465 dot product

On May 22, 2015 1:26 PM, "Benjamin Root" wrote:
> That assumes that the said recently-confused ever get to the point of
> understanding it...

Well, I don't think it's that complicated really. For whatever that's
worth :-). My best attempt is here, anyway:

https://www.python.org/dev/peps/pep-0465/#semantics

The short version is: for 1d and 2d inputs it acts just like dot(). For
higher-dimension inputs like (i, j, n, m) it acts like any other gufunc
(e.g., everything in np.linalg) -- it treats this as an i-by-j stack of
n-by-m matrices and is vectorized over the i, j dimensions. And 0d
inputs are an error.

-n


From ben.root at ou.edu  Fri May 22 17:37:11 2015
From: ben.root at ou.edu (Benjamin Root)
Date: Fri, 22 May 2015 17:37:11 -0400
Subject: Re: [Numpy-discussion] Two questions about PEP 465 dot product

Then add in broadcasting behavior...


From ndarray at mac.com  Fri May 22 17:40:09 2015
From: ndarray at mac.com (Alexander Belopolsky)
Date: Fri, 22 May 2015 17:40:09 -0400
Subject: Re: [Numpy-discussion] Two questions about PEP 465 dot product

On Fri, May 22, 2015 at 4:58 PM, Nathaniel Smith wrote:
> For higher-dimension inputs like (i, j, n, m) it acts like any other
> gufunc (e.g., everything in np.linalg)

Unfortunately, not everything in linalg acts the same way. For example,
matrix_rank and lstsq don't.


From njs at pobox.com  Fri May 22 17:47:54 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Fri, 22 May 2015 14:47:54 -0700
Subject: Re: [Numpy-discussion] Two questions about PEP 465 dot product

On May 22, 2015 2:40 PM, "Benjamin Root" wrote:
> Then add in broadcasting behavior...

Vectorized functions broadcast over the vectorized dimensions; there's
nothing special about @ in this regard.

-n
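The shape rules are easy to see interactively (a sketch, again using
np.matmul as a stand-in for @; the shapes are arbitrary):

import numpy as np

v = np.ones(3)
M = np.ones((3, 3))
S = np.ones((2, 4, 3, 3))   # a 2x4 stack of 3x3 matrices

np.matmul(v, M).shape   # (3,)          1d @ 2d: promote, multiply, squeeze
np.matmul(M, v).shape   # (3,)
np.matmul(S, M).shape   # (2, 4, 3, 3)  M broadcasts against the stack
np.matmul(S, v).shape   # (2, 4, 3)     vector promotion vectorizes too
# np.matmul(2, M) raises an error: 0d inputs are not allowed.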
From antony.lee at berkeley.edu  Sun May 24 04:22:21 2015
From: antony.lee at berkeley.edu (Antony Lee)
Date: Sun, 24 May 2015 01:22:21 -0700
Subject: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

Hi,

As mentioned in

#1450: Patch with Ziggurat method for Normal distribution
#5158: ENH: More efficient algorithm for unweighted random choice
without replacement
#5299: using `random.choice` to sample integers in a large range
#5851: Bug in np.random.dirichlet for small alpha parameters

some methods on np.random.RandomState are implemented either
non-optimally (#1450, #5158, #5299) or have outright bugs (#5851), but
cannot easily be changed due to backwards compatibility concerns. While
some have suggested new methods deprecating the old ones (see e.g.
#5872), some consensus has formed around the following ideas (see #5299
for the original discussion, followed by private discussions with
@njsmith):

- Backwards compatibility should only be provided to those who were
explicitly instantiating a seeded RandomState object, or reseeding a
RandomState object to a given value, and drawing variates from it:
using the global methods (or a None-seeded RandomState) was already
non-reproducible anyway, as e.g. other libraries could be drawing
variates from the global RandomState (of which the free functions in
np.random are actually methods). Thus, the global RandomState object
should use the latest implementation of the methods.

- "RandomState(seed)" and "r = RandomState(...); r.seed(seed)" should
offer backwards-compatibility guarantees (see e.g.
https://docs.python.org/3.4/library/random.html#notes-on-reproducibility).

As such, we propose the following improvements to the API:

- RandomState gains a (keyword-only) parameter, "version", also
accessible as a read-only attribute. This indicates the version of the
methods on the object. The current version of RandomState is
retroactively assigned version 0. The latest available version is
available as np.random.LATEST_VERSION. Backwards-incompatible
improvements to RandomState methods can be introduced, but increment
LATEST_VERSION.

- The global RandomState is instantiated as
RandomState(version=LATEST_VERSION).

- RandomState() and rs.seed() set the version to LATEST_VERSION.

- RandomState(seed[!=None]) and rs.seed(seed[!=None]) set the version
to 0.

A proof-of-concept implementation, still missing tests, is tracked as
#5911. It includes the patch proposed in #5158 as an example of how to
include an improved version of random.choice.

Comments, and help for writing tests (in particular to make sure
backwards compatibility is maintained) are welcome.

Antony Lee
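To make the proposed API concrete, a hypothetical sketch of how it
would be used. None of this exists in released NumPy: the "version"
keyword, the "version" attribute, and np.random.LATEST_VERSION are the
names from the proposal above, shown purely for illustration:

import numpy as np

# Seeding without a version pins version 0, i.e. bit-for-bit
# backwards-compatible streams:
rs = np.random.RandomState(1234)
assert rs.version == 0

# Opting in to the improved implementations is explicit:
rs_new = np.random.RandomState(1234, version=np.random.LATEST_VERSION)

# Unseeded objects (and hence the global np.random functions, which
# wrap a hidden unseeded RandomState) track the latest version:
assert np.random.RandomState().version == np.random.LATEST_VERSION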
From ralf.gommers at gmail.com  Sun May 24 04:59:49 2015
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 24 May 2015 10:59:49 +0200
Subject: Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

On Sun, May 24, 2015 at 10:22 AM, Antony Lee wrote:
> - Backwards compatibility should only be provided to those who were
> explicitly instantiating a seeded RandomState object, or reseeding a
> RandomState object to a given value, and drawing variates from it
> [...] Thus, the global RandomState object should use the latest
> implementation of the methods.

The rest of the proposal looks good to me, but the reasoning on this
point is shaky. np.random.seed() is *very* widely used, and works fine
for a test suite where each test that needs random numbers calls
seed(...) and is run with nose. Can you explain why you need to touch
the behavior of the global methods in order to make
RandomState(version=) work?

Ralf
From njs at pobox.com  Sun May 24 05:30:31 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 24 May 2015 02:30:31 -0700
Subject: Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

On May 24, 2015 2:03 AM, "Ralf Gommers" wrote:
> Can you explain why you need to touch the behavior of the global
> methods in order to make RandomState(version=) work?

You're absolutely right about it being important to preserve the
behavior of the global functions when seeded, but I think this is just
a bug in the description of the proposal here, not in the proposal
itself :-). If you look at the PR, there's no change to how the global
functions work -- they're still just a transparently thin wrapper
around a hidden, global RandomState object, and thus IIUC changes to
RandomState will automatically apply to the global functions as well.

So with this proposal, an unseeded RandomState uses the latest version
-> therefore the global functions, which start out unseeded, start out
using the latest version. If you call .seed() on an existing
RandomState object and pass in a seed but no version= argument, the
version gets reset to 0 -> therefore if you call the global seed()
function and pass in a seed but no version= argument, the global
RandomState gets reset to version 0 (at least until the next time
seed() is called), and backcompat is preserved.

-n
From ralf.gommers at gmail.com  Sun May 24 05:54:24 2015
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 24 May 2015 11:54:24 +0200
Subject: Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

On Sun, May 24, 2015 at 11:30 AM, Nathaniel Smith wrote:
> So with this proposal, an unseeded RandomState uses the latest
> version -> therefore the global functions, which start out unseeded,
> start out using the latest version. [...] and backcompat is
> preserved.

Thanks for the clarification. Then +1 from me for this proposal.

Ralf
From alan.isaac at gmail.com  Sun May 24 08:41:18 2015
From: alan.isaac at gmail.com (Alan G Isaac)
Date: Sun, 24 May 2015 08:41:18 -0400
Subject: Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

I echo Ralf's question. For those who need replicability, the proposed
upgrade path seems quite radical.

Also, I would prefer to have the new functionality introduced beside
the existing implementation of RandomState, with an announcement that
RandomState will change in the next major numpy version. This would
allow everyone who wants to change now to do so, without requiring that
users attend to minor numpy version numbers if they want replicability.

I think this is what is required by semantic versioning.

Alan Isaac


From ralf.gommers at gmail.com  Sun May 24 08:47:34 2015
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 24 May 2015 14:47:34 +0200
Subject: Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

On Sun, May 24, 2015 at 2:41 PM, Alan G Isaac wrote:
> I echo Ralf's question. For those who need replicability, the
> proposed upgrade path seems quite radical.

It's not radical, and my question was already answered. Nothing changes
if you are doing:

np.random.seed(1234)
np.random.any_random_sample_generator_func()

Values only change if you leave out the call to seed(), which you
should never do if you care about replicability.

Ralf
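That guarantee is easy to state as runnable code (a sketch; these
function names are real, current NumPy):

import numpy as np

np.random.seed(1234)
first = np.random.random(3)

np.random.seed(1234)
second = np.random.random(3)

# An explicit seed pins the stream exactly; under the proposal this
# stays true across numpy versions, because seeding selects version 0.
assert (first == second).all()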
From alan.isaac at gmail.com  Sun May 24 09:08:12 2015
From: alan.isaac at gmail.com (Alan G Isaac)
Date: Sun, 24 May 2015 09:08:12 -0400
Subject: Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

On 5/24/2015 8:47 AM, Ralf Gommers wrote:
> Values only change if you leave out the call to seed()

OK, but this claim seems to conflict with the following language: "the
global RandomState object should use the latest implementation of the
methods". I take it that this is what Nathaniel meant by "I think this
is just a bug in the description of the proposal here, not in the
proposal itself".

So, is the correct phrasing "the global RandomState object should use
the latest implementation of the methods, unless explicitly seeded"?

Thanks,
Alan


From ralf.gommers at gmail.com  Sun May 24 10:55:31 2015
From: ralf.gommers at gmail.com (Ralf Gommers)
Date: Sun, 24 May 2015 16:55:31 +0200
Subject: [Numpy-discussion] ANN: Scipy 0.16.0 beta release 2

Hi all,

The second beta for Scipy 0.16.0 is now available. After beta 1 a
couple of critical issues on Windows were solved, and there are now
also 32-bit Windows binaries (along with the sources and release notes)
available on
https://sourceforge.net/projects/scipy/files/scipy/0.16.0b2/.

Please try this release and report any issues on the scipy-dev mailing
list.

Cheers,
Ralf


From josef.pktd at gmail.com  Sun May 24 11:04:11 2015
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 24 May 2015 11:04:11 -0400
Subject: Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

On Sun, May 24, 2015 at 9:08 AM, Alan G Isaac wrote:
> So, is the correct phrasing "the global RandomState object should use
> the latest implementation of the methods, unless explicitly seeded"?

That's how I understand it.

I don't see any problems with the clarified proposal for the use cases
that I know of.

Can we choose the version also for the global random state, for
example to fix both version and seed in unit tests, with version > 0?

BTW: I would expect that bug fixes are still exempt from backwards
compatibility. Fixing #5851 should be independent of the version
(without having looked at the issue). If you need to replicate bugs,
then use an old version of the package.

Josef
From archibald at astron.nl  Sun May 24 11:13:49 2015
From: archibald at astron.nl (Anne Archibald)
Date: Sun, 24 May 2015 15:13:49 +0000
Subject: Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

Do we want a deprecation-like approach, so that eventually people who
want replicability will specify versions, and everyone else gets bug
fixes and improvements? This would presumably take several major
versions, but it might avoid people getting unintentionally trapped on
this version.

Incidentally, bug fixes are complicated: if a bug fix uses more or
fewer raw random numbers, it breaks repeatability not just for the call
that got fixed but for all successive random number generations.

Anne
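Anne's point can be illustrated with a toy example (a sketch: the two
"implementations" below are made-up stand-ins, not actual NumPy code).
If a fix changes how many raw draws a method consumes, every draw that
comes after it shifts, even under a fixed seed:

import numpy as np

def variate_v0(rs):
    # old implementation: consumes two raw uniforms (say, by rejection)
    u1, u2 = rs.random_sample(2)
    return u1

def variate_v1(rs):
    # "fixed" implementation: consumes only one raw uniform
    return rs.random_sample()

for impl in (variate_v0, variate_v1):
    rs = np.random.RandomState(1234)
    impl(rs)                   # the call whose implementation changed
    print(rs.random_sample())  # an unrelated later draw -- it differs,
                               # because the stream position differs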
From josef.pktd at gmail.com  Sun May 24 11:40:06 2015
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 24 May 2015 11:40:06 -0400
Subject: Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

On Sun, May 24, 2015 at 11:13 AM, Anne Archibald wrote:
> Do we want a deprecation-like approach, so that eventually people who
> want replicability will specify versions, and everyone else gets bug
> fixes and improvements?

Reminder: we are bottom or inline posting.

> > fixing #5851 should be independent of the version (without having
> > looked at the issue)

I skimmed the issue. In a strict sense it's not really a bug: the user
doesn't get wrong numbers, he or she gets Not A Number (NaN). So there
are no current usages that use the function in that range.

Josef


From njs at pobox.com  Sun May 24 13:49:22 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 24 May 2015 10:49:22 -0700
Subject: Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

On May 24, 2015 8:43 AM, wrote:
> Reminder: we are bottom or inline posting

Can we stop hassling people about this? Inline replies are a great tool
to have in your toolkit for complicated technical discussions, but I
feel like our weird insistence on them has turned into a pointless and
exclusionary thing. It's not like bottom replying is even any better --
the traditional mailing list rule is that you trim quotes to just the
part you're replying to (like this message); quoting the whole thing
and replying underneath, just to give people a bit of exercise for
their scrolling finger, would totally have gotten you flamed too.

But email etiquette has moved on since the 90s; even regular posters to
this list violate this "rule" all the time. It's time to let it go.

-n
From josef.pktd at gmail.com  Sun May 24 14:01:28 2015
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 24 May 2015 14:01:28 -0400
Subject: Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

On Sun, May 24, 2015 at 1:49 PM, Nathaniel Smith wrote:
> But email etiquette has moved on since the 90s; even regular posters
> to this list violate this "rule" all the time. It's time to let it go.

It's not a 90s thing; I learned about it around 2009 when I started
here. I find it very annoying trying to catch up with a longer thread
when the replies are all over the place.

Anne is a few years older than I am in terms of numpy and scipy
participation, and this was just intended to be a friendly reminder.

And BTW: I'm glad Anne is back with scipy.

Josef


From njs at pobox.com  Sun May 24 14:04:39 2015
From: njs at pobox.com (Nathaniel Smith)
Date: Sun, 24 May 2015 11:04:39 -0700
Subject: Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

On May 24, 2015 8:15 AM, "Anne Archibald" wrote:
> Do we want a deprecation-like approach, so that eventually people who
> want replicability will specify versions, and everyone else gets bug
> fixes and improvements?

I'm not sure what you're envisioning as needing a deprecation cycle?
The neat thing about random is that we already have a way for users to
say that they want replicability -- the use of an explicit seed -- so
we can just immediately go to the world you describe, where people who
seed get to pick their version (or default to version 0 for
backcompat), and everyone else gets the improvements automatically. Or
is this different from what you meant somehow?

Fortunately we haven't yet run into any really serious bugs in random,
like "oops we're sampling from the wrong distribution" type bugs.
Mostly it's more like "oops this is really inefficient" or "oops this
crashes in this edge case", so there's no real harm in letting people
use old versions.
If we did run into a case where we were giving flat out wrong results,
then I guess we'd still want to keep the code around, because
reproducibility is still important, but perhaps with a requirement that
you pass an extra argument like I_know_its_broken=True, so that people
couldn't end up running the broken code accidentally. I guess we'll
cross that bridge when we come to it.

> Incidentally, bug fixes are complicated: if a bug fix uses more or
> fewer raw random numbers, it breaks repeatability not just for the
> call that got fixed but for all successive random number generations.

Yep. This is why we mostly haven't been able to change behavior at
*all*, except in cases where there was a clear error, so we know no-one
was using something.

-n


From sturla.molden at gmail.com  Sun May 24 14:46:50 2015
From: sturla.molden at gmail.com (Sturla Molden)
Date: Sun, 24 May 2015 20:46:50 +0200
Subject: Re: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState

On 24/05/15 17:13, Anne Archibald wrote:
> Incidentally, bug fixes are complicated: if a bug fix uses more or
> fewer raw random numbers, it breaks repeatability not just for the
> call that got fixed but for all successive random number generations.

If a function has a bug, changing it will change the output of the
function. This is not special to random numbers. If not retaining the
old erroneous output means we break backwards compatibility, then no
bugs can ever be fixed, anywhere in NumPy. I think we need to clarify
what we mean by backwards compatibility for random numbers. What
guarantees should we make from one version to another?

Sturla
Sturla From robert.kern at gmail.com Sun May 24 15:22:32 2015 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 24 May 2015 20:22:32 +0100 Subject: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState In-Reply-To: References: <5561C6EE.1080000@gmail.com> <5561CD3C.8020401@gmail.com> Message-ID: On Sun, May 24, 2015 at 7:46 PM, Sturla Molden wrote: > > On 24/05/15 17:13, Anne Archibald wrote: > > Do we want a deprecation-like approach, so that eventually people who > > want replicability will specify versions, and everyone else gets bug > > fixes and improvements? This would presumably take several major > > versions, but it might avoid people getting unintentionally trapped on > > this version. > > > > Incidentally, bug fixes are complicated: if a bug fix uses more or fewer > > raw random numbers, it breaks repeatability not just for the call that > > got fixed but for all successive random number generations. > > If a function has a bug, changing it will change the output of the > function. This is not special for random numbers. If not retaining the > old erroneous output means we break-backwards compatibility, then no > bugs can ever be fixed, anywhere in NumPy. I think we need to clarify > what we mean by backwards compatibility for random numbers. What > guarantees should we make from one version to another? The policy thus far has been that we will fix bugs in the distributions and make changes that allow a strictly wider domain of distribution parameters (e.g. allowing b==0 where before we only allowed b>0), but we will not make other enhancements that would change existing good output. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun May 24 15:25:39 2015 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 24 May 2015 20:25:39 +0100 Subject: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState In-Reply-To: References: <5561C6EE.1080000@gmail.com> <5561CD3C.8020401@gmail.com> Message-ID: On Sun, May 24, 2015 at 7:56 PM, Sturla Molden wrote: > > On 24/05/15 20:04, Nathaniel Smith wrote: > > > I'm not sure what you're envisioning as needing a deprecation cycle? The > > neat thing about random is that we already have a way for users to say > > that they want replicability -- the use of an explicit seed -- > > No, this is not sufficient for random numbers. Random sampling and > ziggurat generators are examples. If we introduce a change (e.g. a > bugfix) that will affect the number of calls to the entropy source, just > setting the seed will in general not be enough to ensure backwards > compatibility. That is e.g. the case with using ziggurat samplers > instead of the current transcendental transforms for normal, exponential > and gamma distributions. While ziggurat is faster (and to my knowledge) > more accurate, it will also make a different number of calls to the > entropy source, and hence the whole sequence will be affected, even if > you do set a random seed. Please reread the proposal at the top of the thread. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From antony.lee at berkeley.edu Sun May 24 16:15:04 2015 From: antony.lee at berkeley.edu (Antony Lee) Date: Sun, 24 May 2015 13:15:04 -0700 Subject: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState In-Reply-To: References: <5561C6EE.1080000@gmail.com> <5561CD3C.8020401@gmail.com> Message-ID: Thanks to Nathaniel who has indeed clarified my intent, i.e. "the global RandomState should use the latest implementation, unless explicitly seeded". More generally, the `RandomState` constructor is just a thin wrapper around `seed` with the same signature, so one can swap the version of the global functions with a call to `np.random.seed(version=...)`. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sun May 24 16:30:59 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 24 May 2015 22:30:59 +0200 Subject: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState In-Reply-To: References: Message-ID: On 24/05/15 10:22, Antony Lee wrote: > Comments, and help for writing tests (in particular to make sure > backwards compatibility is maintained) are welcome. I have one comment, and that is what makes random numbers so special? This applies to the rest of NumPy too, fixing a bug can sometimes change the output of a function. Personally I think we should only make guarantees about the data types, array shapes, and things like that, but not about the values. Those who need a particular version of NumPy for exact reproducibility should install the version of Python and NumPy they need. That is why virtual environments exist. I am sure a lot will disagree with me on this. So please don't take this as flamebait. Sturla From njs at pobox.com Sun May 24 16:39:52 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 24 May 2015 13:39:52 -0700 Subject: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState In-Reply-To: References: <5561C6EE.1080000@gmail.com> <5561CD3C.8020401@gmail.com> Message-ID: On May 24, 2015 11:04 AM, wrote: > > On Sun, May 24, 2015 at 1:49 PM, Nathaniel Smith wrote: >> >> On May 24, 2015 8:43 AM, wrote: >> > >> > Reminder: we are bottom or inline posting >> >> Can we stop hassling people about this? Inline replies are a great tool to have in your toolkit for complicated technical discussions, but I feel like our weird insistence on them has turned into a pointless and exclusionary thing. It's not like bottom replying is even any better -- the traditional mailing list rule is you trim quotes to just the part you're replying to (like this message); quoting the whole thing and replying underneath just to give people a bit of exercise for their scrolling finger would totally have gotten you flamed too. >> >> But email etiquette has moved on since the 90s, even regular posters to this list violate this "rule" all the time, it's time to let it go. > > > It's not a 90's thing and I learned about it around 2009 when I started in here. > I find it very annoying trying to catch up with a longer thread and the replies are all over the place. > > > Anne is a few years older than I in terms of numpy and scipy participation and this was just intended to be a friendly reminder. 
And while I know you didn't mean it this way, I'm guessing that being immediately greeted by criticism for failing to follow some arbitrary and inconsistently-applied rule was indeed a strong reminder of what a unpleasant place FOSS mailing lists can sometimes be, and why someone might disappear from them for a few years. I think we can do better. This is pretty off-topic for this thread, though, see so let's let it lie here. If anyone desperately needs to comment further please email me off-list. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From antony.lee at berkeley.edu Sun May 24 17:09:43 2015 From: antony.lee at berkeley.edu (Antony Lee) Date: Sun, 24 May 2015 14:09:43 -0700 Subject: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState In-Reply-To: References: Message-ID: 2015-05-24 13:30 GMT-07:00 Sturla Molden : > On 24/05/15 10:22, Antony Lee wrote: > > > Comments, and help for writing tests (in particular to make sure > > backwards compatibility is maintained) are welcome. > > I have one comment, and that is what makes random numbers so special? > This applies to the rest of NumPy too, fixing a bug can sometimes change > the output of a function. > > Personally I think we should only make guarantees about the data types, > array shapes, and things like that, but not about the values. Those who > need a particular version of NumPy for exact reproducibility should > install the version of Python and NumPy they need. That is why virtual > environments exist. I personally agree with this point of view (see original discussion in #5299, for example); if it was only up to me at least I'd make RandomState(seed) default to the latest version rather than the original one (whether to keep the old versions around is another question). On the other hand, I see that this long-standing debate has prevented obvious improvements from being added sometimes for years (e.g. a patch for Ziggurat normal variates has been lying around since 2010), or led to potential API duplication in order to fix some clearly undesirable behavior (dirichlet returning "nan" being described as "in a strict sense not really a bug"(!)), so I'm willing to compromise to get this moving forward. Antony -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun May 24 17:49:17 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 24 May 2015 17:49:17 -0400 Subject: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState In-Reply-To: References: Message-ID: On Sun, May 24, 2015 at 5:09 PM, Antony Lee wrote: > 2015-05-24 13:30 GMT-07:00 Sturla Molden : > >> On 24/05/15 10:22, Antony Lee wrote: >> >> > Comments, and help for writing tests (in particular to make sure >> > backwards compatibility is maintained) are welcome. >> >> I have one comment, and that is what makes random numbers so special? >> This applies to the rest of NumPy too, fixing a bug can sometimes change >> the output of a function. >> >> Personally I think we should only make guarantees about the data types, >> array shapes, and things like that, but not about the values. Those who >> need a particular version of NumPy for exact reproducibility should >> install the version of Python and NumPy they need. That is why virtual >> environments exist. 
> > > I personally agree with this point of view (see original discussion in > #5299, for example); if it were only up to me, at least, I'd make > RandomState(seed) default to the latest version rather than the original > one (whether to keep the old versions around is another question). On the > other hand, I see that this long-standing debate has prevented obvious > improvements from being added sometimes for years (e.g. a patch for > Ziggurat normal variates has been lying around since 2010), or led to > potential API duplication in order to fix some clearly undesirable behavior > (dirichlet returning "nan" being described as "in a strict sense not really > a bug"(!)), so I'm willing to compromise to get this moving forward. > It's clearly a different kind of "bug" than some of the ones we fixed in the past without any backwards-compatibility discussion, where the distribution itself was wrong, i.e. some values were shifted so that parts had more weight and parts had less. As I mentioned, I don't see any real problem with the proposal. Josef > > Antony > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andyfaff at gmail.com Mon May 25 07:02:42 2015 From: andyfaff at gmail.com (Andrew Nelson) Date: Mon, 25 May 2015 21:02:42 +1000 Subject: [Numpy-discussion] Chaining apply_over_axis for multiple axes. Message-ID: I have a function that operates over a 1D array, to return an array of a similar size. To use it in a 2D fashion I would have to do something like the following:

for row in range(np.size(arr, 0)):
    arr_out[row] = func(arr[row])
for col in range(np.size(arr, 1)):
    arr_out[:, col] = func(arr[:, col])

I would like to generalise this to N dimensions. Does anyone have any suggestions of how to achieve this? Presumably what I need to do is build an iterator, and then remove an axis:

# arr.shape = (2, 3, 4)
it = np.nditer(arr, flags=['multi_index'])
it.remove_axis(2)
while not it.finished:
    arr_out[it.multi_index] = func(arr[it.multi_index])
    it.iternext()

If I have an array with shape (2, 3, 4) this would allow me to iterate over the 6 1D arrays that are 4 elements long. However, how do I then construct the iterator for the preceding axes? -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Mon May 25 07:14:08 2015 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Mon, 25 May 2015 13:14:08 +0200 Subject: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState In-Reply-To: References: Message-ID: On 24 May 2015 at 22:30, Sturla Molden wrote: > Personally I think we should only make guarantees about the data types, > array shapes, and things like that, but not about the values. Those who > need a particular version of NumPy for exact reproducibility should > install the version of Python and NumPy they need. That is why virtual > environments exist. > But there is a lot of legacy code out there that doesn't specify the version required; and in most cases the original author cannot even be asked. Tests are a particularly annoying case. For example, when testing an algorithm, it is usually good practice to record the number of iterations as well as the result; consider it an early warning that we have changed something we possibly didn't mean to, even if the result is correct.
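A minimal sketch of such an iteration-count regression test; the bisection routine, the tolerances, and the recorded count of 41 are all invented purely for illustration:

import numpy as np

def bisect_root(f, lo, hi, tol=1e-12):
    # stand-in for an iterative algorithm whose behaviour we want to pin down;
    # assumes f(lo) > 0 > f(hi)
    n = 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            hi = mid
        else:
            lo = mid
        n += 1
    return 0.5 * (lo + hi), n

root, n_iter = bisect_root(np.sin, 2.0, 4.0)
assert abs(root - np.pi) < 1e-9
assert n_iter == 41  # recorded once when the test was written; a change is the early warning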
If we want to support several NumPy versions, and the algorithm has any randomness, the tests would either have to be duplicated, or we would have to find a seed that gives the exact same results. Thus, keeping different versions lets us compare the results against the old API, without needing to duplicate the tests. A lot fewer people will get annoyed. /David. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Mon May 25 11:21:29 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 25 May 2015 09:21:29 -0600 Subject: [Numpy-discussion] Chaining apply_over_axis for multiple axes. In-Reply-To: References: Message-ID: <1432567289.2764.7.camel@sipsolutions.net> On Mo, 2015-05-25 at 21:02 +1000, Andrew Nelson wrote: > I have a function that operates over a 1D array, to return an array of > a similar size. To use it in a 2D fashion I would have to do > something like the following: > > > for row in range(np.size(arr, 0)): > arr_out[row] = func(arr[row]) > for col in range(np.size(arr, 1)): > arr_out[:, col] = func(arr[:, col]) > > > I would like to generalise this to N dimensions. Does anyone have any > suggestions of how to achieve this? Presumably what I need to do is > build an iterator, and then remove an axis: > > > # arr.shape=(2, 3, 4) > it = np.nditer(arr, flags=['multi_index']) > it.remove_axis(2) > while not it.finished: > arr_out[it.multi_index] = func(arr[it.multi_index]) > it.iternext() > Just a warning that nditer is pretty low level (i.e. it can be a bit mind-boggling, since it is close to the C side of things). Anyway, you can of course do this by just iterating like that. Since you have no buffering, etc., this should work fine. There is also `np.nested_iters`, but since I am a bit too lazy to look it up, you would probably have to check some examples in the numpy tests to see how it works. - Sebastian > > If I have an array with shape (2, 3, 4) this would allow me to iterate > over the 6 1D arrays that are 4 elements long. However, how do I then > construct the iterator for the preceding axes? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From njs at pobox.com Mon May 25 11:38:26 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 25 May 2015 08:38:26 -0700 Subject: [Numpy-discussion] Chaining apply_over_axis for multiple axes. In-Reply-To: References: Message-ID: On May 25, 2015 4:05 AM, "Andrew Nelson" wrote: > > I have a function that operates over a 1D array, to return an array of a similar size. To use it in a 2D fashion I would have to do something like the following: > > for row in range(np.size(arr, 0)): > arr_out[row] = func(arr[row]) > for col in range(np.size(arr, 1)): > arr_out[:, col] = func(arr[:, col]) > > I would like to generalise this to N dimensions. Does anyone have any suggestions of how to achieve this? The crude but effective way is

tmp_in = arr.reshape((-1, arr.shape[-1]))
tmp_out = np.empty(tmp_in.shape)
for i in range(tmp_in.shape[0]):
    tmp_out[i, :] = func(tmp_in[i, :])
out = tmp_out.reshape(arr.shape)

This won't produce any unnecessary copies if your input array is contiguous. This also assumes you want to apply the function on the last axis.
If not you can do something like arr = arr.swapaxes(axis, -1) ... call the code above ... out = out.swapaxes(axis, -1) This will result in an extra copy of the input array though if it's >2d and the requested axis is not the last one. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue May 26 10:56:43 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 26 May 2015 10:56:43 -0400 Subject: [Numpy-discussion] Strategy for OpenBLAS Message-ID: Hi, This morning I was wondering whether we ought to plan to devote some resources to collaborating with the OpenBLAS team. Summary: we should explore ways of setting up numpy as a test engine for OpenBLAS development. Detail: I am getting the impression that OpenBLAS is looking like the most likely medium term solution for open-source stack builds of numpy and scipy on Linux and Windows at least. ATLAS has been our choice for this up until now, but it is designed for optimizing to a particular CPU configuration, which will likely make it slow on some or even most of the machines a general installer gets installed on. This is only likely to get more severe over time, because current ATLAS development is on multi-core optimization, where the number of cores may need to be set at compile time. The worry about OpenBLAS has always been that it is hard to maintain, and fixes don't always have tests. There might be other alternatives that are a better bet technically, but don't currently have OpenBLAS' dynamic selection features or CPU support. It is relatively easy to add tests using Python / numpy. We like tests. Why don't we propose a collaboration with OpenBLAS where we build and test numpy with every / most / some commits of OpenBLAS, and try to make it easy for the OpenBLAS team to add tests. Maybe we can use and add to the list of machines on which OpenBLAS is tested [1]? We Berkeley Pythonistas can certainly add the machines at our buildbot farm [2]. Maybe the Julia / R developers would be interested to help too? Cheers, Matthew [1] https://github.com/xianyi/OpenBLAS/wiki/Machine-List [2] http://nipy.bic.berkeley.edu/buildslaves From thomas.p.krauss at gmail.com Tue May 26 10:59:25 2015 From: thomas.p.krauss at gmail.com (Tom Krauss) Date: Tue, 26 May 2015 09:59:25 -0500 Subject: [Numpy-discussion] addition to numpy.i Message-ID: Hi folks, After some discussion with Bill Spotz I decided to try to submit my new typemap to numpy.i that allows in-place arrays of an arbitrary number of dimensions to be passed in as a "flat" array with a single "size". To that end I created my first pull request https://github.com/numpy/numpy/pull/5914 sorry if I missed any steps or procedures - I noticed only after I did the commit and pull request that I should have created a new feature branch, sorry about that. Anyway I noticed the pull request initiated a series of tests and one of them failed. How do I go about debugging and resolving the failure? Thanks for your help, Tom Krauss -------------- next part -------------- An HTML attachment was scrubbed... 
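Returning to the apply-over-axes thread above: a self-contained sketch of the reshape/swapaxes recipe. The helper name apply_to_axis is invented here, func1d is assumed to map a 1-d array to a same-length 1-d array, and numpy's own np.apply_along_axis covers much of the same ground:

import numpy as np

def apply_to_axis(func1d, arr, axis=-1):
    # move the target axis last, collapse the others, loop, then undo
    arr = np.asarray(arr).swapaxes(axis, -1)
    flat = arr.reshape(-1, arr.shape[-1])
    out = np.empty_like(flat, dtype=np.result_type(flat, np.float64))
    for i in range(flat.shape[0]):
        out[i] = func1d(flat[i])
    return out.reshape(arr.shape).swapaxes(axis, -1)

a = np.arange(24.).reshape(2, 3, 4)
assert np.allclose(apply_to_axis(np.cumsum, a, axis=1),
                   np.apply_along_axis(np.cumsum, 1, a))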
URL: From jtaylor.debian at googlemail.com Tue May 26 12:53:08 2015 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 26 May 2015 18:53:08 +0200 Subject: [Numpy-discussion] Strategy for OpenBLAS In-Reply-To: References: Message-ID: <5564A4F4.8050803@googlemail.com> On 05/26/2015 04:56 PM, Matthew Brett wrote: > Hi, > > This morning I was wondering whether we ought to plan to devote some > resources to collaborating with the OpenBLAS team. > > > > It is relatively easy to add tests using Python / numpy. We like > tests. Why don't we propose a collaboration with OpenBLAS where we > build and test numpy with every / most / some commits of OpenBLAS, and > try to make it easy for the OpenBLAS team to add tests. Maybe we > can use and add to the list of machines on which OpenBLAS is tested > [1]? We Berkeley Pythonistas can certainly add the machines at our > buildbot farm [2]. Maybe the Julia / R developers would be interested > to help too? > Technically we only need a single machine with the newest instruction set available. All other cases could then be tested via a virtual machine that only exposes specific instruction sets (e.g. qemu, which could technically also emulate stuff the host does not have). Concerning test generation, there is a huge parameter space that needs testing with openblas; at least some of it would need to be automated/fuzzed. We also need specific preconditioning of memory to test failure cases openblas had in the past, e.g. filling memory around the matrices with nans and also somehow filling openblas's own temporary buffers with some signaling values (might require a specially built openblas if MALLOC_PERTURB_ does not work). Maybe it would be feasible to write a hypothesis [0] strategy for some of the blas stuff to automate the parameter exploration. And then we'd need to run everything under valgrind since, due to the assembler implementation of openblas, we can't use the faster address sanitizers gcc and clang now provide. [0] https://hypothesis.readthedocs.org/en/latest/ From sturla.molden at gmail.com Tue May 26 13:02:33 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 26 May 2015 17:02:33 +0000 (UTC) Subject: [Numpy-discussion] Strategy for OpenBLAS References: Message-ID: <1788986294454351015.246752sturla.molden-gmail.com@news.gmane.org> Matthew Brett wrote: > I am getting the impression that OpenBLAS is looking like the most > likely medium term solution for open-source stack builds of numpy and > scipy on Linux and Windows at least. I think you are right. OpenBLAS might even be a long-term solution. We should also consider that GotoBLAS (and GotoBLAS2) powered some of the world's most expensive supercomputers for a decade. It is not like this is untested software. The remaining test errors on Windows are also due to MSVC and MinGW-w64 differences, not due to OpenBLAS itself, and those are not relevant on Linux. On Apple, I am not sure which is better. Accelerate is faster in some corner cases (level-1 BLAS with AVX, operations on very small matrices), but it has issues with multiprocessing (GCD's threadpool is not fork-safe). Apart from that, OpenBLAS and Accelerate are about equivalent in performance. I have built OpenBLAS on OSX with clang and gfortran; it works like a charm. So it might be worth considering for binary wheels on OSX as well.
Sturla From charlesr.harris at gmail.com Tue May 26 13:16:52 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 26 May 2015 11:16:52 -0600 Subject: [Numpy-discussion] addition to numpy.i In-Reply-To: References: Message-ID: On Tue, May 26, 2015 at 8:59 AM, Tom Krauss wrote: > Hi folks, > > After some discussion with Bill Spotz I decided to try to submit my new > typemap to numpy.i that allows in-place arrays of an arbitrary number of > dimensions to be passed in as a "flat" array with a single "size". > > To that end I created my first pull request > https://github.com/numpy/numpy/pull/5914 > sorry if I missed any steps or procedures - I noticed only after I did the > commit and pull request that I should have created a new feature branch, > sorry about that. > > Anyway I noticed the pull request initiated a series of tests and one of > them failed. How do I go about debugging and resolving the failure? > Looks like it passed the tests, in fact, I don't think travis tests numpy.i, but I could be wrong about that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Tue May 26 14:59:54 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 26 May 2015 14:59:54 -0400 Subject: [Numpy-discussion] Strategy for OpenBLAS In-Reply-To: <5564A4F4.8050803@googlemail.com> References: <5564A4F4.8050803@googlemail.com> Message-ID: Hi, On Tue, May 26, 2015 at 12:53 PM, Julian Taylor wrote: > On 05/26/2015 04:56 PM, Matthew Brett wrote: >> Hi, >> >> This morning I was wondering whether we ought to plan to devote some >> resources to collaborating with the OpenBLAS team. >> >> >> >> It is relatively easy to add tests using Python / numpy. We like >> tests. Why don't we propose a collaboration with OpenBLAS where we >> build and test numpy with every / most / some commits of OpenBLAS, and >> try to make it easy for the OpenBLAS team to add tests. Maybe we >> can use and add to the list of machines on which OpenBLAS is tested >> [1]? We Berkeley Pythonistas can certainly add the machines at our >> buildbot farm [2]. Maybe the Julia / R developers would be interested >> to help too? >> > > Technically we only need a single machine with the newest instruction > set available. All other cases could then be tested via a virtual > machine that only exposes specific instruction sets (e.g. qemu which > could technically also emulate stuff the host does not have). > > Concerning test generation there is a huge parameter space that needs > testing due with openblas, at least some of it would need to be > automated/fuzzed. We also need specific preconditioning of memory to > test failure cases openblas had in the past, E.g. filling memory around > the matrices with nans and also somehow filling openblas own temporary > buffers with some signaling values (might require special built openblas > if _MALLOC_PERTURB does not work). > > Maybe it would be feasible to write a hypothesis [0] strategy for some > of the blas stuff to automate the parameter exploration. > > And then we'd need to run everything under valgrind as due to the > assembler implementation of openblas we can't use the faster address > sanitizers gcc and clang now provide. > > [0] https://hypothesis.readthedocs.org/en/latest/ All this sounds extremely useful. What do you think we should do next? How feasible is it to start to set this kind of thing up for our own use, and then offer to integrate with OpenBLAS? 
Is there anyone out there who knows the Julia and / or R community well enough to know if they would be interested to help? What kind of help do you think we need? Money for a machine? Cheers, Matthew From njs at pobox.com Wed May 27 04:13:04 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 27 May 2015 01:13:04 -0700 Subject: [Numpy-discussion] Strategy for OpenBLAS In-Reply-To: <5564A4F4.8050803@googlemail.com> References: <5564A4F4.8050803@googlemail.com> Message-ID: On Tue, May 26, 2015 at 9:53 AM, Julian Taylor wrote: > On 05/26/2015 04:56 PM, Matthew Brett wrote: >> Hi, >> >> This morning I was wondering whether we ought to plan to devote some >> resources to collaborating with the OpenBLAS team. >> >> >> >> It is relatively easy to add tests using Python / numpy. We like >> tests. Why don't we propose a collaboration with OpenBLAS where we >> build and test numpy with every / most / some commits of OpenBLAS, and >> try to make it easy for the OpenBLAS team to add tests. Maybe we >> can use and add to the list of machines on which OpenBLAS is tested >> [1]? We Berkeley Pythonistas can certainly add the machines at our >> buildbot farm [2]. Maybe the Julia / R developers would be interested >> to help too? >> > > Technically we only need a single machine with the newest instruction > set available. All other cases could then be tested via a virtual > machine that only exposes specific instruction sets (e.g. qemu which > could technically also emulate stuff the host does not have). > > Concerning test generation there is a huge parameter space that needs > testing due with openblas, at least some of it would need to be > automated/fuzzed. We also need specific preconditioning of memory to > test failure cases openblas had in the past, E.g. filling memory around > the matrices with nans and also somehow filling openblas own temporary > buffers with some signaling values (might require special built openblas > if _MALLOC_PERTURB does not work). A lot of this stuff is easier if we take a white-box instead of black-box approach -- adding hooks in OpenBLAS to override the CPU-based kernel-autoselection sounds a lot easier than creating unnatural machines in qemu, and similarly for initializing temporary buffers. (I would be really unsurprised if OpenBLAS re-uses temporary buffers across calls instead of doing a free/re-malloc, for example.) > Maybe it would be feasible to write a hypothesis [0] strategy for some > of the blas stuff to automate the parameter exploration. Or if this is daunting, you can get pretty far just sitting down and writing some for loops... I think this is a case where something is a lot better than nothing :-). -n -- Nathaniel J. Smith -- http://vorpus.org From njs at pobox.com Wed May 27 04:25:50 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 27 May 2015 01:25:50 -0700 Subject: [Numpy-discussion] Strategy for OpenBLAS In-Reply-To: References: Message-ID: On Tue, May 26, 2015 at 7:56 AM, Matthew Brett wrote: > Hi, > > This morning I was wondering whether we ought to plan to devote some > resources to collaborating with the OpenBLAS team. Sounds like a great idea to me. Even a bit familiar :-) http://thread.gmane.org/gmane.comp.python.numeric.general/57498 The lead developers of both OpenBLAS and BLIS are currently at UT Austin: http://shpc.ices.utexas.edu/people.html ...and it turns out that this is also where SciPy will be held in July. 
Might be a good opportunity for numpy/scipy folks interested in these matters to sit down in the same room as them and hash out some kind of shared plan of action. (NB: I'm told that BLIS now has full multi-threading support, and that they are working on runtime CPU detection and kernel auto-selection right now.) -n -- Nathaniel J. Smith -- http://vorpus.org From cmkleffner at gmail.com Wed May 27 04:26:07 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Wed, 27 May 2015 10:26:07 +0200 Subject: [Numpy-discussion] Strategy for OpenBLAS In-Reply-To: References: <5564A4F4.8050803@googlemail.com> Message-ID: 2015-05-27 10:13 GMT+02:00 Nathaniel Smith : > On Tue, May 26, 2015 at 9:53 AM, Julian Taylor > wrote: > > On 05/26/2015 04:56 PM, Matthew Brett wrote: > >> Hi, > >> > >> This morning I was wondering whether we ought to plan to devote some > >> resources to collaborating with the OpenBLAS team. > >> > >> > >> > >> It is relatively easy to add tests using Python / numpy. We like > >> tests. Why don't we propose a collaboration with OpenBLAS where we > >> build and test numpy with every / most / some commits of OpenBLAS, and > >> try to make it easy for the OpenBLAS team to add tests. Maybe we > >> can use and add to the list of machines on which OpenBLAS is tested > >> [1]? We Berkeley Pythonistas can certainly add the machines at our > >> buildbot farm [2]. Maybe the Julia / R developers would be interested > >> to help too? > >> > > > > Technically we only need a single machine with the newest instruction > > set available. All other cases could then be tested via a virtual > > machine that only exposes specific instruction sets (e.g. qemu which > > could technically also emulate stuff the host does not have). > > > > Concerning test generation there is a huge parameter space that needs > > testing due with openblas, at least some of it would need to be > > automated/fuzzed. We also need specific preconditioning of memory to > > test failure cases openblas had in the past, E.g. filling memory around > > the matrices with nans and also somehow filling openblas own temporary > > buffers with some signaling values (might require special built openblas > > if _MALLOC_PERTURB does not work). > > A lot of this stuff is easier if we take a white-box instead of > black-box approach -- adding hooks in OpenBLAS to override the > CPU-based kernel-autoselection sounds a lot easier than creating > unnatural machines in qemu, and similarly for initializing temporary > buffers. (I would be really unsurprised if OpenBLAS re-uses temporary > buffers across calls instead of doing a free/re-malloc, for example.) > > Manually overwriting the OpenBLAS CPU autoselection can easily be done by setting the OPENBLAS_CORETYPE environment variable, i.e. export OPENBLAS_CORETYPE=Nehalem > > Maybe it would be feasible to write a hypothesis [0] strategy for some > > of the blas stuff to automate the parameter exploration. > > Or if this is daunting, you can get pretty far just sitting down and > writing some for loops... I think this is a case where something is a > lot better than nothing :-). > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
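A rough sketch of the property-based testing idea from this thread, using a hypothesis strategy to explore matrix shapes and checking a BLAS-backed np.dot against an einsum reference. The shape bounds, test data, and tolerances are illustrative only:

import numpy as np
from hypothesis import given, strategies as st

@given(st.integers(1, 32), st.integers(1, 32), st.integers(1, 32))
def test_dot_matches_naive_reference(m, k, n):
    a = np.linspace(-1.0, 1.0, m * k).reshape(m, k)
    b = np.linspace(-2.0, 2.0, k * n).reshape(k, n)
    # einsum sums in plain C, independently of the gemm that backs
    # np.dot for 2-d float arrays, so it serves as a slow reference
    np.testing.assert_allclose(np.dot(a, b),
                               np.einsum('ik,kj->ij', a, b),
                               rtol=1e-10, atol=1e-12)

test_dot_matches_naive_reference()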
URL: From cmkleffner at gmail.com Wed May 27 04:41:15 2015 From: cmkleffner at gmail.com (Carl Kleffner) Date: Wed, 27 May 2015 10:41:15 +0200 Subject: [Numpy-discussion] Strategy for OpenBLAS In-Reply-To: References: <5564A4F4.8050803@googlemail.com> Message-ID: 2015-05-27 10:26 GMT+02:00 Carl Kleffner : > > > 2015-05-27 10:13 GMT+02:00 Nathaniel Smith : > >> On Tue, May 26, 2015 at 9:53 AM, Julian Taylor >> wrote: >> > On 05/26/2015 04:56 PM, Matthew Brett wrote: >> >> Hi, >> >> >> >> This morning I was wondering whether we ought to plan to devote some >> >> resources to collaborating with the OpenBLAS team. >> >> >> >> >> >> >> >> It is relatively easy to add tests using Python / numpy. We like >> >> tests. Why don't we propose a collaboration with OpenBLAS where we >> >> build and test numpy with every / most / some commits of OpenBLAS, and >> >> try to make it easy for the OpenBLAS team to add tests. Maybe we >> >> can use and add to the list of machines on which OpenBLAS is tested >> >> [1]? We Berkeley Pythonistas can certainly add the machines at our >> >> buildbot farm [2]. Maybe the Julia / R developers would be interested >> >> to help too? >> >> >> > >> Some benchmark results made by @wernsaar can be found at http://sourceforge.net/p/slurm-roll/code/HEAD/tree/branches/benchmark/. I guess this was made on Linux, so it cannot be directly applied to Windows. See e.g. https://github.com/xianyi/OpenBLAS/issues/532. In general the OpenBLAS development trunk runs smoothly on Windows now. > > Technically we only need a single machine with the newest instruction >> > set available. All other cases could then be tested via a virtual >> > machine that only exposes specific instruction sets (e.g. qemu which >> > could technically also emulate stuff the host does not have). >> > >> > Concerning test generation, there is a huge parameter space that needs >> > testing with openblas; at least some of it would need to be >> > automated/fuzzed. We also need specific preconditioning of memory to >> > test failure cases openblas had in the past, e.g. filling memory around >> > the matrices with nans and also somehow filling openblas's own temporary >> > buffers with some signaling values (might require a specially built openblas >> > if MALLOC_PERTURB_ does not work). >> >> A lot of this stuff is easier if we take a white-box instead of >> black-box approach -- adding hooks in OpenBLAS to override the >> CPU-based kernel-autoselection sounds a lot easier than creating >> unnatural machines in qemu, and similarly for initializing temporary >> buffers. (I would be really unsurprised if OpenBLAS re-uses temporary >> buffers across calls instead of doing a free/re-malloc, for example.) >> >> Manually overwriting the OpenBLAS CPU autoselection can easily be done by > setting the OPENBLAS_CORETYPE environment variable, i.e. > export OPENBLAS_CORETYPE=Nehalem > > >> > Maybe it would be feasible to write a hypothesis [0] strategy for some >> > of the blas stuff to automate the parameter exploration. >> >> Or if this is daunting, you can get pretty far just sitting down and >> writing some for loops... I think this is a case where something is a >> lot better than nothing :-). >> >> -n >> >> -- >> Nathaniel J. Smith -- http://vorpus.org >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed...
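The OPENBLAS_CORETYPE override mentioned above already allows a crude sweep over the kernel-selection part of the test matrix. A sketch, assuming a numpy linked against OpenBLAS; the core names depend on the targets compiled into the particular build:

import os
import subprocess
import sys

for core in ["Nehalem", "Sandybridge", "Haswell"]:  # illustrative names only
    env = dict(os.environ, OPENBLAS_CORETYPE=core)
    print("running numpy tests with OPENBLAS_CORETYPE=%s" % core)
    subprocess.check_call(
        [sys.executable, "-c",
         "import sys, numpy; sys.exit(not numpy.test().wasSuccessful())"],
        env=env)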
URL: From mailinglists at xgm.de Wed May 27 10:15:55 2015 From: mailinglists at xgm.de (Florian Lindner) Date: Wed, 27 May 2015 16:15:55 +0200 Subject: [Numpy-discussion] MPI: Sendrecv blocks Message-ID: Hello, I have this piece of code:

comm = MPI.COMM_WORLD
temp = np.zeros(blockSize*blockSize)

PrintNB("Communicate A to", get_left_rank())
comm.Sendrecv(sendbuf=np.ascontiguousarray(lA), dest=get_left_rank(), recvbuf=temp)
lA = np.reshape(temp, (blockSize, blockSize))
PrintNB("Finished sending")

lA being a numpy array. Output is:

[0] Communicate A to 1
[2] Communicate A to 3
[3] Communicate A to 2
[1] Communicate A to 0
[1] Finished sending   # here it blocks

[n] is the rank. I have a circular send: 0>1, 1>0 and 2>3, 3>2. I understood Sendrecv to be made specifically for these cases, but still it blocks. What is the problem here? Thanks! Florian From matthew.brett at gmail.com Wed May 27 14:27:40 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 27 May 2015 14:27:40 -0400 Subject: [Numpy-discussion] Strategy for OpenBLAS In-Reply-To: References: Message-ID: Hi, On 5/27/15, Nathaniel Smith wrote: > On Tue, May 26, 2015 at 7:56 AM, Matthew Brett > wrote: >> Hi, >> >> This morning I was wondering whether we ought to plan to devote some >> resources to collaborating with the OpenBLAS team. > > Sounds like a great idea to me. Even a bit familiar :-) > http://thread.gmane.org/gmane.comp.python.numeric.general/57498 I had forgotten that thread, thanks for reminding me. I guess my idea arose from my forgotten memory of that thread, but I was thinking that it may be less of a burden, and allow more sharing of work, if we concentrate on testing. For example, if we have a testing repo on the OpenBLAS org, I can imagine Julia / R developers finding bugs, and adding tests to the Python repo, because the machinery to do that is already built and documented (in my perfect world). > The lead developers of both OpenBLAS and BLIS are currently at UT > Austin: http://shpc.ices.utexas.edu/people.html > ...and it turns out that this is also where SciPy will be held in > July. Might be a good opportunity for numpy/scipy folks interested in > these matters to sit down in the same room as them and hash out some > kind of shared plan of action. I'm afraid I'm not going to Scipy this year. Nathaniel - would you consider organizing something like this, with able help from those of us, going and not going, who can contribute some time? > (NB: I'm told that BLIS now has full multi-threading support, and that > they are working on runtime CPU detection and kernel auto-selection > right now.) I can well imagine that BLIS will be a good option at some point, but I'm guessing that it is unlikely we will be able to use BLIS for our default BLAS / LAPACK library on Linux / Windows / Mac in the near future. Is that right? Cheers, Matthew From thomas.p.krauss at gmail.com Wed May 27 14:37:23 2015 From: thomas.p.krauss at gmail.com (Tom Krauss) Date: Wed, 27 May 2015 13:37:23 -0500 Subject: [Numpy-discussion] addition to numpy.i In-Reply-To: References: Message-ID: Thanks for merging! After the merge, it's kind of a moot point now, but there was a failed build at one point. The Python 2.7 test timed out. Then mid-day yesterday the tests got run again and all passed. Not sure what's going on. I'm not seeing a record of the failed build now.
On Tue, May 26, 2015 at 12:16 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Tue, May 26, 2015 at 8:59 AM, Tom Krauss > wrote: > >> Hi folks, >> >> After some discussion with Bill Spotz I decided to try to submit my new >> typemap to numpy.i that allows in-place arrays of an arbitrary number of >> dimensions to be passed in as a "flat" array with a single "size". >> >> To that end I created my first pull request >> https://github.com/numpy/numpy/pull/5914 >> sorry if I missed any steps or procedures - I noticed only after I did >> the commit and pull request that I should have created a new feature >> branch, sorry about that. >> >> Anyway I noticed the pull request initiated a series of tests and one of >> them failed. How do I go about debugging and resolving the failure? >> > > Looks like it passed the tests, in fact, I don't think travis tests > numpy.i, but I could be wrong about that. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Thu May 28 08:46:00 2015 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 28 May 2015 14:46:00 +0200 Subject: [Numpy-discussion] Verify your sourceforge windows installer downloads Message-ID: hi, It has been reported that sourceforge has taken over the unofficial gimp windows download page and temporarily bundled the installer with unauthorized adware: https://plus.google.com/+gimp/posts/cxhB1PScFpe As NumPy is also distributing windows installers via sourceforge, I recommend that when you download the files you verify the downloads via the checksums in the README.txt before using them. The README.txt is clearsigned with my gpg key so it should be safe from tampering. Unfortunately as I don't use windows I cannot give any advice on how to do the verification on these platforms.
Maybe someone familiar with available tools can chime in. I have checked the numpy downloads and they still match what I uploaded, but as sourceforge does redirect based on OS and geolocation this may not mean much. Cheers, Julian Taylor _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From cournape at gmail.com Thu May 28 09:35:55 2015 From: cournape at gmail.com (David Cournapeau) Date: Thu, 28 May 2015 22:35:55 +0900 Subject: [Numpy-discussion] Verify your sourceforge windows installer downloads In-Reply-To: References: Message-ID: IMO, this really raises the question of whether we still want to use sourceforge at all. At this point I just don't trust the service at all anymore. Could we use some resources (e.g. rackspace?) to host those files? Do we know how much traffic they get, so we can estimate the cost? David On Thu, May 28, 2015 at 9:46 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > hi, > It has been reported that sourceforge has taken over the unofficial gimp > windows download page and temporarily bundled the > installer with unauthorized adware: > https://plus.google.com/+gimp/posts/cxhB1PScFpe > > As NumPy is also distributing windows installers via sourceforge, I > recommend that when you download the files you verify the downloads > via the checksums in the README.txt before using them. The README.txt > is clearsigned with my gpg key so it should be safe from tampering. > Unfortunately as I don't use windows I cannot give any advice on how > to do the verification on these platforms. Maybe someone familiar with > available tools can chime in. > > I have checked the numpy downloads and they still match what I > uploaded, but as sourceforge does redirect based on OS and geolocation > this may not mean much.
>> >> Cheers, >> Julian Taylor >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sturla.molden at gmail.com Thu May 28 10:07:37 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 28 May 2015 14:07:37 +0000 (UTC) Subject: [Numpy-discussion] Verify your sourceforge windows installer downloads References: Message-ID: <31698217454514411.075227sturla.molden-gmail.com@news.gmane.org> David Cournapeau wrote: > IMO, this really begs the question on whether we still want to use > sourceforge at all. At this point I just don't trust the service at all > anymore. Here is their lame excuse: https://sourceforge.net/blog/gimp-win-project-wasnt-hijacked-just-abandoned/ It probably means this: If NumPy installers are moved away from Sourceforge, they will set up a mirror and load the mirrored installers with all sorts of crapware. It is some sort of racket the mob couldn't do better. Sturla From andrew.collette at gmail.com Thu May 28 13:00:08 2015 From: andrew.collette at gmail.com (Andrew Collette) Date: Thu, 28 May 2015 11:00:08 -0600 Subject: [Numpy-discussion] Verify your sourceforge windows installer downloads In-Reply-To: <31698217454514411.075227sturla.molden-gmail.com@news.gmane.org> References: <31698217454514411.075227sturla.molden-gmail.com@news.gmane.org> Message-ID: > Here is their lame excuse: > > https://sourceforge.net/blog/gimp-win-project-wasnt-hijacked-just-abandoned/ > > It probably means this: > > If NumPy installers are moved away from Sourceforge, they will set up a > mirror and load the mirrored installers with all sorts of crapware. It is > some sort of racket the mob couldn't do better. I noticed that like most BSD-licensed software, NumPy's license includes this clause: "Neither the name of the NumPy Developers nor the names of any contributors may be used to endorse or promote products derived from this software without specific prior written permission." There's an argument to be made that SF isn't legally permitted to distribute poisoned installers under the name "NumPy" without permission. I recall a similar dust-up a while ago about "Standard Markdown" using the name "Markdown"; the original author (John Gruber) took action and got them to change the name. In any case I've always been surprised that NumPy is distributed through SourceForge, which has been sketchy for years now. Could it simply be hosted on PyPI? Andrew From cournape at gmail.com Thu May 28 13:05:57 2015 From: cournape at gmail.com (David Cournapeau) Date: Fri, 29 May 2015 02:05:57 +0900 Subject: [Numpy-discussion] Verify your sourceforge windows installer downloads In-Reply-To: References: <31698217454514411.075227sturla.molden-gmail.com@news.gmane.org> Message-ID: On Fri, May 29, 2015 at 2:00 AM, Andrew Collette wrote: > > Here is their lame excuse: > > > > > https://sourceforge.net/blog/gimp-win-project-wasnt-hijacked-just-abandoned/ > > > > It probably means this: > > > > If NumPy installers are moved away from Sourceforge, they will set up a > > mirror and load the mirrored installers with all sorts of crapware. It is > > some sort of racket the mob couldn't do better. 
> > I noticed that like most BSD-licensed software, NumPy's license > includes this clause: > > "Neither the name of the NumPy Developers nor the names of any > contributors may be used to endorse or promote products derived from > this software without specific prior written permission." > > There's an argument to be made that SF isn't legally permitted to > distribute poisoned installers under the name "NumPy" without > permission. I recall a similar dust-up a while ago about "Standard > Markdown" using the name "Markdown"; the original author (John Gruber) > took action and got them to change the name. > > In any case I've always been surprised that NumPy is distributed > through SourceForge, which has been sketchy for years now. Could it > simply be hosted on PyPI? > They don't accept arbitrary binaries like SF does, and some of our installer formats can't be uploaded there. David > > Andrew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu May 28 13:20:27 2015 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 28 May 2015 20:20:27 +0300 Subject: [Numpy-discussion] Verify your sourceforge windows installer downloads In-Reply-To: References: <31698217454514411.075227sturla.molden-gmail.com@news.gmane.org> Message-ID: 28.05.2015, 20:05, David Cournapeau kirjoitti: [clip] >> In any case I've always been surprised that NumPy is distributed >> through SourceForge, which has been sketchy for years now. Could it >> simply be hosted on PyPI? >> > > They don't accept arbitrary binaries like SF does, and some of our > installer formats can't be uploaded there. Is it possible to host them on github? I think there's an option to add release notes and (apparently) to upload binaries if you go to the "Releases" section --- there's one for each tag. Pauli From sturla.molden at gmail.com Thu May 28 13:35:25 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 28 May 2015 17:35:25 +0000 (UTC) Subject: [Numpy-discussion] Verify your sourceforge windows installer downloads References: <31698217454514411.075227sturla.molden-gmail.com@news.gmane.org> Message-ID: <1872293233454527203.780948sturla.molden-gmail.com@news.gmane.org> Pauli Virtanen wrote: > Is it possible to host them on github? I think there's an option to add > release notes and (apparently) to upload binaries if you go to the > "Releases" section --- there's one for each tag. And then Sourceforge will put up tainted installers "for the benefit of NumPy users". :) From pav at iki.fi Thu May 28 13:46:29 2015 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 28 May 2015 20:46:29 +0300 Subject: [Numpy-discussion] Verify your sourceforge windows installer downloads In-Reply-To: <1872293233454527203.780948sturla.molden-gmail.com@news.gmane.org> References: <31698217454514411.075227sturla.molden-gmail.com@news.gmane.org> <1872293233454527203.780948sturla.molden-gmail.com@news.gmane.org> Message-ID: 28.05.2015, 20:35, Sturla Molden kirjoitti: > Pauli Virtanen wrote: > >> Is it possible to host them on github? I think there's an option to add >> release notes and (apparently) to upload binaries if you go to the >> "Releases" section --- there's one for each tag. > > And then Sourceforge will put up tainted installers "for the benefit of > NumPy users". :) Well, let them. They may already be tainted, who knows. 
It's phishing and malware distribution at that point, and there are some ways to deal with that (safe browsing, AV etc). From jtaylor.debian at googlemail.com Thu May 28 14:52:01 2015 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 28 May 2015 20:52:01 +0200 Subject: [Numpy-discussion] Verify your sourceforge windows installer downloads In-Reply-To: References: <31698217454514411.075227sturla.molden-gmail.com@news.gmane.org> <1872293233454527203.780948sturla.molden-gmail.com@news.gmane.org> Message-ID: <556763D1.9060806@googlemail.com> On 28.05.2015 19:46, Pauli Virtanen wrote: > 28.05.2015, 20:35, Sturla Molden kirjoitti: >> Pauli Virtanen wrote: >> >>> Is it possible to host them on github? I think there's an option to add >>> release notes and (apparently) to upload binaries if you go to the >>> "Releases" section --- there's one for each tag. >> >> And then Sourceforge will put up tainted installers "for the benefit of >> NumPy users". :) > > Well, let them. They may already be tainted, who knows. It's phishing > and malware distribution at that point, and there are some ways to deal > with that (safe browsing, AV etc). > > there is no guarantee that github will not do this stuff in the future too, and PyPI or self-hosting do not necessarily help either, as those resources can also be compromised. The main thing that should be learned from this and the many similar incidents in the past is that binaries from the internet need to be verified not to have been modified from their original state; otherwise they cannot be trusted. With my mail I wanted to bring to your attention that both numpy (since 1.7.2) and scipy (since 0.14.1) allow users to do so via the signed README.txt containing checksums. From pav at iki.fi Thu May 28 15:05:29 2015 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 28 May 2015 22:05:29 +0300 Subject: [Numpy-discussion] Verify your sourceforge windows installer downloads In-Reply-To: <556763D1.9060806@googlemail.com> References: <31698217454514411.075227sturla.molden-gmail.com@news.gmane.org> <1872293233454527203.780948sturla.molden-gmail.com@news.gmane.org> <556763D1.9060806@googlemail.com> Message-ID: 28.05.2015, 21:52, Julian Taylor kirjoitti: > there is no guarantee that github will not do this stuff in the future too, > and PyPI or self-hosting do not necessarily help either, as those resources > can also be compromised. > The main thing that should be learned from this and the many similar > incidents in the past is that binaries from the internet need to be > verified not to have been modified from their original state; otherwise > they cannot be trusted. Indeed, but on the other hand, there's no reason for us to continue cooperating with shady partners, especially when there are easy alternatives. We can just quietly change the main binary distribution channel and be done with it. From toddrjen at gmail.com Fri May 29 01:43:34 2015 From: toddrjen at gmail.com (Todd) Date: Fri, 29 May 2015 07:43:34 +0200 Subject: [Numpy-discussion] Verify your sourceforge windows installer downloads In-Reply-To: References: <31698217454514411.075227sturla.molden-gmail.com@news.gmane.org> Message-ID: On May 28, 2015 7:06 PM, "David Cournapeau" wrote: > On Fri, May 29, 2015 at 2:00 AM, Andrew Collette < andrew.collette at gmail.com> wrote: >> >> In any case I've always been surprised that NumPy is distributed >> through SourceForge, which has been sketchy for years now. Could it >> simply be hosted on PyPI?
> > > They don't accept arbitrary binaries like SF does, and some of our > installer formats can't be uploaded there. > > > > David Is that something that could be fixed? Has anyone asked the pypi maintainers whether they could change those rules, either in general or by granting exceptions on a case-by-case basis to projects that have proven track records and importance? It would seem to me that if the rules on pypi are forcing critical projects like numpy to host elsewhere, then the rules are flawed and are preventing pypi from serving its intended purpose. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cimrman3 at ntc.zcu.cz Fri May 29 11:24:22 2015 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Fri, 29 May 2015 17:24:22 +0200 Subject: [Numpy-discussion] ANN: SfePy 2015.2 Message-ID: <556884A6.1010600@ntc.zcu.cz> I am pleased to announce release 2015.2 of SfePy. Description ----------- SfePy (simple finite elements in Python) is software for solving systems of coupled partial differential equations by the finite element method or by isogeometric analysis (preliminary support). It is distributed under the new BSD license. Home page: http://sfepy.org Mailing list: http://groups.google.com/group/sfepy-devel Git (source) repository, issue tracker, wiki: http://github.com/sfepy Highlights of this release -------------------------- - major code simplification (removed element groups) - time stepping solvers updated for interactive use - improved finding of reference element coordinates of physical points - reorganized examples - reorganized installation on POSIX systems (sfepy-run script) For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1 (rather long and technical). Best regards, Robert Cimrman and Contributors (*) (*) Contributors to this release (alphabetical order): Lubos Kejzlar, Vladimir Lukes, Anton Gladky, Matyas Novak From ben.root at ou.edu Fri May 29 13:28:05 2015 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 29 May 2015 13:28:05 -0400 Subject: [Numpy-discussion] Verify your sourceforge windows installer downloads In-Reply-To: References: <31698217454514411.075227sturla.molden-gmail.com@news.gmane.org> Message-ID: Speaking from the matplotlib project, our binaries are substantial due to our suite of test images. PyPI worked with us on relaxing size constraints. Also, I think the new cheese shop/warehouse server they are using scales better, so size is not nearly the same concern as before. Ben Root On May 29, 2015 1:43 AM, "Todd" wrote: > On May 28, 2015 7:06 PM, "David Cournapeau" wrote: > > On Fri, May 29, 2015 at 2:00 AM, Andrew Collette < > andrew.collette at gmail.com> wrote: > >> > >> In any case I've always been surprised that NumPy is distributed > >> through SourceForge, which has been sketchy for years now. Could it > >> simply be hosted on PyPI? > > > > > > They don't accept arbitrary binaries like SF does, and some of our > installer formats can't be uploaded there. > > > > David > > Is that something that could be fixed? Has anyone asked the pypi > maintainers whether they could change those rules, either in general or by > granting exceptions on a case-by-case basis to projects that have proven > track records and importance? > > It would seem to me that if the rules on pypi are forcing critical > projects like numpy to host elsewhere, then the rules are flawed and are > preventing pypi from serving its intended purpose.
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From saketkc at gmail.com Fri May 29 13:52:26 2015 From: saketkc at gmail.com (Saket Choudhary) Date: Fri, 29 May 2015 10:52:26 -0700 Subject: [Numpy-discussion] Verify your sourceforge windows installer downloads In-Reply-To: References: <31698217454514411.075227sturla.molden-gmail.com@news.gmane.org> Message-ID: On 28 May 2015 at 10:05, David Cournapeau wrote: > > > On Fri, May 29, 2015 at 2:00 AM, Andrew Collette > wrote: >> >> > Here is their lame excuse: >> > >> > >> > https://sourceforge.net/blog/gimp-win-project-wasnt-hijacked-just-abandoned/ >> > >> > It probably means this: >> > >> > If NumPy installers are moved away from Sourceforge, they will set up a >> > mirror and load the mirrored installers with all sorts of crapware. It >> > is >> > some sort of racket the mob couldn't do better. >> >> I noticed that like most BSD-licensed software, NumPy's license >> includes this clause: >> >> "Neither the name of the NumPy Developers nor the names of any >> contributors may be used to endorse or promote products derived from >> this software without specific prior written permission." >> >> There's an argument to be made that SF isn't legally permitted to >> distribute poisoned installers under the name "NumPy" without >> permission. I recall a similar dust-up a while ago about "Standard >> Markdown" using the name "Markdown"; the original author (John Gruber) >> took action and got them to change the name. >> >> In any case I've always been surprised that NumPy is distributed >> through SourceForge, which has been sketchy for years now. Could it >> simply be hosted on PyPI? > > > They don't accept arbitrary binaries like SF does, and some of our installer > formats can't be uploaded there. > Bintray [1] has been providing a free service for hosting 'bottles'(compiled binaries) for the Homebrew project [2]. Probably an option to look at. [1] https://bintray.com/ [2] http://brew.sh/ > David > >> >> >> Andrew >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From antony.lee at berkeley.edu Fri May 29 17:06:39 2015 From: antony.lee at berkeley.edu (Antony Lee) Date: Fri, 29 May 2015 14:06:39 -0700 Subject: [Numpy-discussion] Backwards-incompatible improvements to numpy.random.RandomState In-Reply-To: References: Message-ID: > > A proof-of-concept implementation, still missing tests, is tracked as > #5911. It includes the patch proposed in #5158 as an example of how to > include an improved version of random.choice. > Tests are in now (whether we should bundle in pickles of old versions to make sure they are still unpickled correctly and outputs of old random streams to make sure they are still reproduced is a good question, though). Comments welcome. Antony -------------- next part -------------- An HTML attachment was scrubbed... 
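For concreteness, the kind of pinned-stream test at stake in the RandomState thread might look as follows. This is a sketch: the literal 0.5488135039273248 is the well-known first MT19937 draw for seed 0, and the version=... selector from the pull request is proposed API, not part of any released numpy; it would decide which stream such pinned values belong to:

import numpy as np

# re-seeding must reproduce the identical stream within one numpy version
rs = np.random.RandomState(12345)
expected = rs.random_sample(4)
assert np.array_equal(np.random.RandomState(12345).random_sample(4), expected)

# cross-version guarantees instead pin the values themselves
assert abs(np.random.RandomState(0).random_sample() - 0.5488135039273248) < 1e-15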
URL: From ralf.gommers at gmail.com Sun May 31 21:43:11 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 1 Jun 2015 03:43:11 +0200 Subject: [Numpy-discussion] Verify your sourceforge windows installer downloads In-Reply-To: References: <31698217454514411.075227sturla.molden-gmail.com@news.gmane.org> Message-ID: On Fri, May 29, 2015 at 7:28 PM, Benjamin Root wrote: > Speaking from the matplotlib project, our binaries are substantial due to > our suite of test images. PyPI worked with us on relaxing size constraints. > Also, I think the new cheese shop/warehouse server they are using scales > better, so size is not nearly the same concern as before. > > Ben Root > On May 29, 2015 1:43 AM, "Todd" wrote: > >> On May 28, 2015 7:06 PM, "David Cournapeau" wrote: >> > On Fri, May 29, 2015 at 2:00 AM, Andrew Collette < >> andrew.collette at gmail.com> wrote: >> >> >> >> In any case I've always been surprised that NumPy is distributed >> >> through SourceForge, which has been sketchy for years now. Could it >> >> simply be hosted on PyPI? >> > >> > >> > They don't accept arbitrary binaries like SF does, and some of our >> installer formats can't be uploaded there. >> > >> > David >> >> Is that something that could be fixed? >> > For the current .exe installers that cannot be fixed, because neither pip nor easy_install can handle those. We actually have to ensure that we don't link from pypi directly to the sourceforge folder with the latest release, because then easy_install will follow the link, download the .exe and fail. Dmgs were another unsupported format, but we'll stop using those. So if/when it's SSE2 .exe installers only (made with bdist_wininst and no NSIS) then PyPI works. Size constraints are not an issue for NumPy, I think. Ralf Has anyone asked the pypi maintainers whether they could change those >> rules, either in general or by granting exceptions on a case-by-case basis >> to projects that have proven track records and importance? >> >> It would seem to me that if the rules on pypi are forcing critical >> projects like numpy to host elsewhere, then the rules are flawed and are >> preventing pypi from serving its intended purpose. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat May 30 18:23:47 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 30 May 2015 16:23:47 -0600 Subject: [Numpy-discussion] matmul needs some clarification. Message-ID: Hi All, The problem arises when multiplying a stack of matrices by a vector. PEP 465 defines this as appending a '1' to the dimensions of the vector and doing the defined stacked matrix multiply, then removing the last dimension from the result. Note that in the middle step we have a stack of matrices, and after removing the last dimension we will still have a stack of matrices. What we want is a stack of vectors, but we can't have those with our conventions. This makes the result somewhat unexpected. How should we resolve this? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL:
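To make the shapes in that question concrete, here is a sketch of the PEP 465 rule as later implemented in np.matmul (numpy 1.10); np.matmul did not exist in any released numpy at the time of this thread, so this encodes the behaviour the PEP specifies:

import numpy as np

A = np.ones((2, 3, 4, 4))  # a 2x3 stack of 4x4 matrices
v = np.ones(4)             # a plain 1-d vector

# v is promoted to (4, 1), the broadcast matrix multiply gives
# (2, 3, 4, 1), and the appended dimension is removed again:
print(np.matmul(A, v).shape)               # -> (2, 3, 4)

# nothing in that result distinguishes a 2x3 stack of length-4 vectors
# from a 2-stack of 3x4 matrices; an explicit column-vector stack
# keeps the trailing 1 instead:
print(np.matmul(A, v.reshape(4, 1)).shape) # -> (2, 3, 4, 1)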