From dileepkunjaai at gmail.com Mon Aug 1 05:31:13 2011 From: dileepkunjaai at gmail.com (dileep kunjaai) Date: Mon, 1 Aug 2011 15:01:13 +0530 Subject: [Numpy-discussion] Fill a particular value in the place of number satisfying certain condition by another number in an array. Message-ID: Dear sir, How can we fill a particular value in the place of number satisfying certain condition by another number in an array. Example: A=[[[ 9.42233087e-42 - 4.71116544e-42 0.00000000e+00 ..., 1.48303127e+01 1.31524124e+01 1.14745111e+01] [ 3.91788793e+00 1.95894396e+00 0.00000000e+00 ..., 1.78252487e+01 1.28667984e+01 7.90834856e+00] [ 7.83592510e+00 -3.91796255e+00 0.00000000e+00 ..., 2.08202991e+01 1.25811749e+01 4.34205008e+00] ..., [ -8.51249974e-03 7.00901222e+00 -1.40095119e+01 ..., 0.00000000e+00 0.00000000e+00 0.00000000e+00] [ 4.26390441e-03 3.51080871e+00 -7.01735353e+00 ..., 0.00000000e+00 0.00000000e+00 0.00000000e+00] [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 ..., 0.00000000e+00 0.00000000e+00 0.00000000e+00]] [[ 9.42233087e-42 -4.71116544e-42 0.00000000e+00 ..., 8.48242474e+00 7.97146845e+00 7.46051216e+00] [ 5.16325808e+00 2.58162904e+00 0.00000000e+00 ..., 8.47719383e+00 8.28024673e+00 8.08330059e+00] [ 1.03267126e+01 5.16335630e+00 0.00000000e+00 ..., 8.47196198e+00 8.58903694e+00 8.70611191e+00] ..., [ 0.00000000e+00 2.74500012e-01 5.49000025e-01 ..., 0.00000000e+00 0.00000000e+00 0.00000000e+00] [ 0.00000000e+00 1.37496844e-01 -2.74993688e-01 ..., 0.00000000e+00 0.00000000e+00 0.00000000e+00] [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 ..., 0.00000000e+00 0.00000000e+00 0.00000000e+00]] [[ 9.42233087e-42 4.71116544e-42 0.00000000e+00 ..., 1.18437748e+01 9.72778034e+00 7.61178637e+00] [ 2.96431869e-01 1.48215935e-01 0.00000000e+00 ..., 1.64031239e+01 1.32768812e+01 1.01506386e+01] [ 5.92875004e-01 2.96437502e-01 0.00000000e+00 ..., 2.09626484e+01 1.68261185e+01 1.26895866e+01] ..., [ 1.78188753e+00 -8.90943766e-01 0.00000000e+00 ..., 0.00000000e+00 1.27500005e-03 2.55000009e-03] [ 9.34620261e-01 -4.67310131e-01 0.00000000e+00 ..., 0.00000000e+00 6.38646539e-04 1.27729308e-03] [ 8.43000039e-02 4.21500020e-02 0.00000000e+00 ..., 0.00000000e+00 0.00000000e+00 0.00000000e+00]]] A contain some negative value i want to change the negative numbers to '0'. I used 'masked_where', command but I failed. Please help me -- DILEEPKUMAR. R J R F, IIT DELHI -------------- next part -------------- An HTML attachment was scrubbed... URL: From mlist at re-factory.de Mon Aug 1 05:37:09 2011 From: mlist at re-factory.de (Robert Elsner) Date: Mon, 01 Aug 2011 11:37:09 +0200 Subject: [Numpy-discussion] C api doc shortcomings Message-ID: <4E3673C5.2020600@re-factory.de> Hey Everybody, I noticed that the c-api docs (2.0.dev-72ab385) lack a clear statement what the preferred entry point into the c-api is (from a users point of view). Normally I would expect a sentence or two stating that the api entry point is arrayobject.h (or whatever). Instead the docs ponder about reading the c sources but do not give any hints where to start. I suggest something akin to the official Python docs in a prominent place: All function, type and macro definitions needed to use the Python/C API are included in your code by the following line: #include "Python.h" This implies inclusion of the following standard headers: , , , , and (if available). modified for Numpy. 
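As an aside, the directory that holds numpy's C headers (arrayobject.h among them) can at least be located from Python itself; a rough sketch, nothing more:

import numpy
# prints the include directory containing numpy/arrayobject.h; pass it as an
# -I flag (or include_dirs entry) when compiling an extension against the C API
print(numpy.get_include())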
cheers Robert From miguel.deval at gmail.com Mon Aug 1 05:41:52 2011 From: miguel.deval at gmail.com (Miguel de Val-Borro) Date: Mon, 1 Aug 2011 11:41:52 +0200 Subject: [Numpy-discussion] Fill a particular value in the place of number satisfying certain condition by another number in an array. In-Reply-To: References: Message-ID: <20110801094152.GE30796@poincare.pc.linmpi.mpg.de> Dear Dileep, the numpy.where function returns the elements from A or 0 depending if the condition in the first argument is satisfied: B = np.where(A >= 0, A, 0) Miguel On Mon, Aug 01, 2011 at 03:01:13PM +0530, dileep kunjaai wrote: > Dear sir, > How can we fill a particular value in the place of number satisfying > certain condition by another number in an array. > > > Example: > A=[[[ 9.42233087e-42 - 4.71116544e-42 0.00000000e+00 ..., > 1.48303127e+01 > 1.31524124e+01 1.14745111e+01] > [ 3.91788793e+00 1.95894396e+00 0.00000000e+00 ..., 1.78252487e+01 > 1.28667984e+01 7.90834856e+00] > [ 7.83592510e+00 -3.91796255e+00 0.00000000e+00 ..., 2.08202991e+01 > 1.25811749e+01 4.34205008e+00] > ..., > [ -8.51249974e-03 7.00901222e+00 -1.40095119e+01 ..., > 0.00000000e+00 > 0.00000000e+00 0.00000000e+00] > [ 4.26390441e-03 3.51080871e+00 -7.01735353e+00 ..., 0.00000000e+00 > 0.00000000e+00 0.00000000e+00] > [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 ..., 0.00000000e+00 > 0.00000000e+00 0.00000000e+00]] > > [[ 9.42233087e-42 -4.71116544e-42 0.00000000e+00 ..., 8.48242474e+00 > 7.97146845e+00 7.46051216e+00] > [ 5.16325808e+00 2.58162904e+00 0.00000000e+00 ..., 8.47719383e+00 > 8.28024673e+00 8.08330059e+00] > [ 1.03267126e+01 5.16335630e+00 0.00000000e+00 ..., 8.47196198e+00 > 8.58903694e+00 8.70611191e+00] > ..., > [ 0.00000000e+00 2.74500012e-01 5.49000025e-01 ..., 0.00000000e+00 > 0.00000000e+00 0.00000000e+00] > [ 0.00000000e+00 1.37496844e-01 -2.74993688e-01 ..., 0.00000000e+00 > 0.00000000e+00 0.00000000e+00] > [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 ..., 0.00000000e+00 > 0.00000000e+00 0.00000000e+00]] > > [[ 9.42233087e-42 4.71116544e-42 0.00000000e+00 ..., 1.18437748e+01 > 9.72778034e+00 7.61178637e+00] > [ 2.96431869e-01 1.48215935e-01 0.00000000e+00 ..., 1.64031239e+01 > 1.32768812e+01 1.01506386e+01] > [ 5.92875004e-01 2.96437502e-01 0.00000000e+00 ..., 2.09626484e+01 > 1.68261185e+01 1.26895866e+01] > ..., > [ 1.78188753e+00 -8.90943766e-01 0.00000000e+00 ..., 0.00000000e+00 > 1.27500005e-03 2.55000009e-03] > [ 9.34620261e-01 -4.67310131e-01 0.00000000e+00 ..., 0.00000000e+00 > 6.38646539e-04 1.27729308e-03] > [ 8.43000039e-02 4.21500020e-02 0.00000000e+00 ..., 0.00000000e+00 > 0.00000000e+00 0.00000000e+00]]] > A contain some negative value i want to change the negative numbers to > '0'. > I used 'masked_where', command but I failed. > > > > Please help me > > -- > DILEEPKUMAR. R > J R F, IIT DELHI > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From silva at lma.cnrs-mrs.fr Mon Aug 1 05:43:13 2011 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Mon, 01 Aug 2011 11:43:13 +0200 Subject: [Numpy-discussion] Fill a particular value in the place of number satisfying certain condition by another number in an array. In-Reply-To: References: Message-ID: <1312191793.5117.7.camel@lma-98.cnrs-mrs.fr> Le lundi 01 ao?t 2011 ? 
15:01 +0530, dileep kunjaai a ?crit : > Dear sir, > How can we fill a particular value in the place of number satisfying > certain condition by another number in an array. > A contain some negative value i want to change the negative numbers to > '0'. I used 'masked_where', command but I failed. Does np.clip fulfill your requirements ? http://docs.scipy.org/doc/numpy/reference/generated/numpy.clip.html Be aware that it needs an upper limit (which can be np.inf). Another option A[A<0] = 0. -- Fabrice Silva From jeffspencerd at gmail.com Mon Aug 1 08:14:51 2011 From: jeffspencerd at gmail.com (Jeffrey Spencer) Date: Mon, 01 Aug 2011 22:14:51 +1000 Subject: [Numpy-discussion] Fill a particular value in the place of number satisfying certain condition by another number in an array. In-Reply-To: References: Message-ID: <4E3698BB.5000002@gmail.com> Depends where it is contained but another option is and I find it to typically be faster: B = zeros(A.shape) maximum(A,B,A) On 08/01/2011 07:31 PM, dileep kunjaai wrote: > Dear sir, > How can we fill a particular value in the place of number > satisfying certain condition by another number in an array. > > > Example: > A=[[[ 9.42233087e-42 - 4.71116544e-42 0.00000000e+00 ..., > 1.48303127e+01 > 1.31524124e+01 1.14745111e+01] > [ 3.91788793e+00 1.95894396e+00 0.00000000e+00 ..., > 1.78252487e+01 > 1.28667984e+01 7.90834856e+00] > [ 7.83592510e+00 -3.91796255e+00 0.00000000e+00 ..., > 2.08202991e+01 > 1.25811749e+01 4.34205008e+00] > ..., > [ -8.51249974e-03 7.00901222e+00 -1.40095119e+01 ..., > 0.00000000e+00 > 0.00000000e+00 0.00000000e+00] > [ 4.26390441e-03 3.51080871e+00 -7.01735353e+00 ..., > 0.00000000e+00 > 0.00000000e+00 0.00000000e+00] > [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 ..., > 0.00000000e+00 > 0.00000000e+00 0.00000000e+00]] > > [[ 9.42233087e-42 -4.71116544e-42 0.00000000e+00 ..., > 8.48242474e+00 > 7.97146845e+00 7.46051216e+00] > [ 5.16325808e+00 2.58162904e+00 0.00000000e+00 ..., > 8.47719383e+00 > 8.28024673e+00 8.08330059e+00] > [ 1.03267126e+01 5.16335630e+00 0.00000000e+00 ..., > 8.47196198e+00 > 8.58903694e+00 8.70611191e+00] > ..., > [ 0.00000000e+00 2.74500012e-01 5.49000025e-01 ..., > 0.00000000e+00 > 0.00000000e+00 0.00000000e+00] > [ 0.00000000e+00 1.37496844e-01 -2.74993688e-01 ..., > 0.00000000e+00 > 0.00000000e+00 0.00000000e+00] > [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 ..., > 0.00000000e+00 > 0.00000000e+00 0.00000000e+00]] > > [[ 9.42233087e-42 4.71116544e-42 0.00000000e+00 ..., > 1.18437748e+01 > 9.72778034e+00 7.61178637e+00] > [ 2.96431869e-01 1.48215935e-01 0.00000000e+00 ..., > 1.64031239e+01 > 1.32768812e+01 1.01506386e+01] > [ 5.92875004e-01 2.96437502e-01 0.00000000e+00 ..., > 2.09626484e+01 > 1.68261185e+01 1.26895866e+01] > ..., > [ 1.78188753e+00 -8.90943766e-01 0.00000000e+00 ..., > 0.00000000e+00 > 1.27500005e-03 2.55000009e-03] > [ 9.34620261e-01 -4.67310131e-01 0.00000000e+00 ..., > 0.00000000e+00 > 6.38646539e-04 1.27729308e-03] > [ 8.43000039e-02 4.21500020e-02 0.00000000e+00 ..., > 0.00000000e+00 > 0.00000000e+00 0.00000000e+00]]] > A contain some negative value i want to change the negative numbers > to '0'. > I used 'masked_where', command but I failed. > > > > Please help me > > -- > DILEEPKUMAR. R > J R F, IIT DELHI > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From brett.olsen at gmail.com Mon Aug 1 10:34:16 2011 From: brett.olsen at gmail.com (Brett Olsen) Date: Mon, 1 Aug 2011 09:34:16 -0500 Subject: [Numpy-discussion] Fill a particular value in the place of number satisfying certain condition by another number in an array. In-Reply-To: References: Message-ID: This method is probably simpler: In [1]: import numpy as N In [2]: A = N.random.random_integers(-10, 10, 25).reshape((5, 5)) In [3]: A Out[3]: array([[ -5, 9, 1, 9, -2], [ -8, 0, 9, 7, -10], [ 2, -3, -1, 5, -7], [ 0, -2, -2, 9, 1], [ -7, -9, -4, -1, 6]]) In [4]: A[A < 0] = 0 In [5]: A Out[5]: array([[0, 9, 1, 9, 0], [0, 0, 9, 7, 0], [2, 0, 0, 5, 0], [0, 0, 0, 9, 1], [0, 0, 0, 0, 6]]) ~Brett On Mon, Aug 1, 2011 at 4:31 AM, dileep kunjaai wrote: > Dear sir, > ?? How can we fill a particular value in the place of number satisfying > certain condition by another number in an array. > > > Example: > ?A=[[[? 9.42233087e-42? - 4.71116544e-42?? 0.00000000e+00 ..., > 1.48303127e+01 > ???? 1.31524124e+01?? 1.14745111e+01] > ? [? 3.91788793e+00?? 1.95894396e+00?? 0.00000000e+00 ...,?? 1.78252487e+01 > ???? 1.28667984e+01?? 7.90834856e+00] > ? [? 7.83592510e+00?? -3.91796255e+00?? 0.00000000e+00 ...,?? 2.08202991e+01 > ???? 1.25811749e+01?? 4.34205008e+00] > ? ..., > ? [? -8.51249974e-03?? 7.00901222e+00?? -1.40095119e+01 ..., > 0.00000000e+00 > ???? 0.00000000e+00?? 0.00000000e+00] > ? [? 4.26390441e-03?? 3.51080871e+00?? -7.01735353e+00 ...,?? 0.00000000e+00 > ???? 0.00000000e+00?? 0.00000000e+00] > ? [? 0.00000000e+00?? 0.00000000e+00?? 0.00000000e+00 ...,?? 0.00000000e+00 > ???? 0.00000000e+00?? 0.00000000e+00]] > > ?[[? 9.42233087e-42?? -4.71116544e-42?? 0.00000000e+00 ...,?? 8.48242474e+00 > ???? 7.97146845e+00?? 7.46051216e+00] > ? [? 5.16325808e+00?? 2.58162904e+00?? 0.00000000e+00 ...,?? 8.47719383e+00 > ???? 8.28024673e+00?? 8.08330059e+00] > ? [? 1.03267126e+01?? 5.16335630e+00?? 0.00000000e+00 ...,?? 8.47196198e+00 > ???? 8.58903694e+00?? 8.70611191e+00] > ? ..., > ? [? 0.00000000e+00?? 2.74500012e-01?? 5.49000025e-01 ...,?? 0.00000000e+00 > ???? 0.00000000e+00?? 0.00000000e+00] > ? [? 0.00000000e+00?? 1.37496844e-01?? -2.74993688e-01 ...,?? 0.00000000e+00 > ???? 0.00000000e+00?? 0.00000000e+00] > ? [? 0.00000000e+00?? 0.00000000e+00?? 0.00000000e+00 ...,?? 0.00000000e+00 > ???? 0.00000000e+00?? 0.00000000e+00]] > > ?[[? 9.42233087e-42?? 4.71116544e-42?? 0.00000000e+00 ...,?? 1.18437748e+01 > ???? 9.72778034e+00?? 7.61178637e+00] > ? [? 2.96431869e-01?? 1.48215935e-01?? 0.00000000e+00 ...,?? 1.64031239e+01 > ???? 1.32768812e+01?? 1.01506386e+01] > ? [? 5.92875004e-01?? 2.96437502e-01?? 0.00000000e+00 ...,?? 2.09626484e+01 > ???? 1.68261185e+01?? 1.26895866e+01] > ? ..., > ? [? 1.78188753e+00?? -8.90943766e-01?? 0.00000000e+00 ...,?? 0.00000000e+00 > ???? 1.27500005e-03?? 2.55000009e-03] > ? [? 9.34620261e-01?? -4.67310131e-01?? 0.00000000e+00 ...,?? 0.00000000e+00 > ???? 6.38646539e-04?? 1.27729308e-03] > ? [? 8.43000039e-02?? 4.21500020e-02?? 0.00000000e+00 ...,?? 0.00000000e+00 > ???? 0.00000000e+00?? 0.00000000e+00]]] > ? A contain some negative value i want to change the negative numbers to > '0'. > I used 'masked_where', command but I failed. > > > > Please help me > > -- > DILEEPKUMAR. 
R > J R F, IIT DELHI > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From e.antero.tammi at gmail.com Mon Aug 1 12:23:04 2011 From: e.antero.tammi at gmail.com (eat) Date: Mon, 1 Aug 2011 19:23:04 +0300 Subject: [Numpy-discussion] Fill a particular value in the place of number satisfying certain condition by another number in an array. In-Reply-To: <4E3698BB.5000002@gmail.com> References: <4E3698BB.5000002@gmail.com> Message-ID: Hi On Mon, Aug 1, 2011 at 3:14 PM, Jeffrey Spencer wrote: > Depends where it is contained but another option is and I find it to > typically be faster: > > B = zeros(A.shape) > maximum(A,B,A) > Since maximum(.) can handle broadcasting maximum(A, 0, A) will be even faster. -eat > > > On 08/01/2011 07:31 PM, dileep kunjaai wrote: > > Dear sir, > How can we fill a particular value in the place of number satisfying > certain condition by another number in an array. > > > Example: > A=[[[ 9.42233087e-42 - 4.71116544e-42 0.00000000e+00 ..., > 1.48303127e+01 > 1.31524124e+01 1.14745111e+01] > [ 3.91788793e+00 1.95894396e+00 0.00000000e+00 ..., 1.78252487e+01 > 1.28667984e+01 7.90834856e+00] > [ 7.83592510e+00 -3.91796255e+00 0.00000000e+00 ..., > 2.08202991e+01 > 1.25811749e+01 4.34205008e+00] > ..., > [ -8.51249974e-03 7.00901222e+00 -1.40095119e+01 ..., > 0.00000000e+00 > 0.00000000e+00 0.00000000e+00] > [ 4.26390441e-03 3.51080871e+00 -7.01735353e+00 ..., > 0.00000000e+00 > 0.00000000e+00 0.00000000e+00] > [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 ..., 0.00000000e+00 > 0.00000000e+00 0.00000000e+00]] > > [[ 9.42233087e-42 -4.71116544e-42 0.00000000e+00 ..., > 8.48242474e+00 > 7.97146845e+00 7.46051216e+00] > [ 5.16325808e+00 2.58162904e+00 0.00000000e+00 ..., 8.47719383e+00 > 8.28024673e+00 8.08330059e+00] > [ 1.03267126e+01 5.16335630e+00 0.00000000e+00 ..., 8.47196198e+00 > 8.58903694e+00 8.70611191e+00] > ..., > [ 0.00000000e+00 2.74500012e-01 5.49000025e-01 ..., 0.00000000e+00 > 0.00000000e+00 0.00000000e+00] > [ 0.00000000e+00 1.37496844e-01 -2.74993688e-01 ..., > 0.00000000e+00 > 0.00000000e+00 0.00000000e+00] > [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 ..., 0.00000000e+00 > 0.00000000e+00 0.00000000e+00]] > > [[ 9.42233087e-42 4.71116544e-42 0.00000000e+00 ..., 1.18437748e+01 > 9.72778034e+00 7.61178637e+00] > [ 2.96431869e-01 1.48215935e-01 0.00000000e+00 ..., 1.64031239e+01 > 1.32768812e+01 1.01506386e+01] > [ 5.92875004e-01 2.96437502e-01 0.00000000e+00 ..., 2.09626484e+01 > 1.68261185e+01 1.26895866e+01] > ..., > [ 1.78188753e+00 -8.90943766e-01 0.00000000e+00 ..., > 0.00000000e+00 > 1.27500005e-03 2.55000009e-03] > [ 9.34620261e-01 -4.67310131e-01 0.00000000e+00 ..., > 0.00000000e+00 > 6.38646539e-04 1.27729308e-03] > [ 8.43000039e-02 4.21500020e-02 0.00000000e+00 ..., 0.00000000e+00 > 0.00000000e+00 0.00000000e+00]]] > A contain some negative value i want to change the negative numbers to > '0'. > I used 'masked_where', command but I failed. > > > > Please help me > > -- > DILEEPKUMAR. 
R > J R F, IIT DELHI > > > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Mon Aug 1 12:52:34 2011 From: shish at keba.be (Olivier Delalleau) Date: Mon, 1 Aug 2011 12:52:34 -0400 Subject: [Numpy-discussion] recommendation for saving data In-Reply-To: <8807AC87-DA23-49BE-9D6D-74FE528DBBAC@bryant.edu> References: <8807AC87-DA23-49BE-9D6D-74FE528DBBAC@bryant.edu> Message-ID: I personally use pickle, which does exactly what you are asking for (and can be customized with __getstate__ and __setstate__ if needed). What are your issues with pickle? -=- Olivier 2011/7/31 Brian Blais > Hello, > > I was wondering if there are any recommendations for formats for saving > scientific data. I am running a simulation, which has many > somewhat-indepedent parts which have their own internal state and > parameters. I've been using pickle (gzipped) to save the entire object > (which contains subobjects, etc...), but it is getting too unwieldy and I > think it is time to look for a more robust solution. Ideally I'd like to > have something where I can call a save method on the simulation object, and > it will call the save methods on all the children, on down the line all > saving into one file. It'd also be nice if it were cross-platform, and I > could depend on the files being readable into the future for a while. > > Are there any good standards for this? What do you use for saving > scientific data? > > > thank you, > > Brian Blais > > > > -- > Brian Blais > bblais at bryant.edu > http://web.bryant.edu/~bblais > http://bblais.blogspot.com/ > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Mon Aug 1 13:08:21 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 01 Aug 2011 10:08:21 -0700 Subject: [Numpy-discussion] recommendation for saving data In-Reply-To: <8807AC87-DA23-49BE-9D6D-74FE528DBBAC@bryant.edu> References: <8807AC87-DA23-49BE-9D6D-74FE528DBBAC@bryant.edu> Message-ID: <4E36DD85.1040801@noaa.gov> On 7/31/11 5:48 AM, Brian Blais wrote: > I was wondering if there are any recommendations for formats for saving scientific data. every field has it's own standards -- I'd try to find one that is likely to be used by folks that may care about your results. For Oceanographic and Atmospheric modeling data, netcdf is a good option. I like the NetCDF4 python lib: http://code.google.com/p/netcdf4-python/ (there are others) For broader use, and a bit more flexibility, HDF is a good option. There are at least two ways to use it with numpy: PyTables: http://www.pytables.org (Nice higher-level interface) hf5py: http://alfven.org/wp/hdf5-for-python/ (a more raw HDF5 wrapper) There is also the npz format, built in to numpy, if you are happy with requiring python to read the data. -Chris I am running a simulation, which has many somewhat-indepedent parts which have their own internal state and parameters. 
I've been using pickle (gzipped) to save the entire object (which contains subobjects, etc...), but it is getting too unwieldy and I think it is time to look for a more robust solution. Ideally I'd like to have something where I can call a save method on the simulation object, and it will call the save methods on all the children, on down the line all saving into one file. It'd also be nice if it were cross-platform, and I could depend on the files being readable into the future for a while. > > Are there any good standards for this? What do you use for saving scientific data? > > > thank you, > > Brian Blais > > > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From aarchiba at physics.mcgill.ca Mon Aug 1 13:17:33 2011 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Mon, 1 Aug 2011 13:17:33 -0400 Subject: [Numpy-discussion] [SciPy-User] recommendation for saving data In-Reply-To: <4E36DD85.1040801@noaa.gov> References: <8807AC87-DA23-49BE-9D6D-74FE528DBBAC@bryant.edu> <4E36DD85.1040801@noaa.gov> Message-ID: In astronomy we tend to use FITS, which is well-supported by pyfits, but a little limited. Some new instruments are beginning to use HDF5. All these generic formats allow very general data storage, so you will need to come up with a standrdized way to represent your own data. Used well, these formats can be self-describing enough that generic tools can be very useful (e.g. display images, build histograms) but it takes some thought when designing files. Anne On 8/1/11, Christopher Barker wrote: > On 7/31/11 5:48 AM, Brian Blais wrote: >> I was wondering if there are any recommendations for formats for saving >> scientific data. > > every field has it's own standards -- I'd try to find one that is likely > to be used by folks that may care about your results. > > For Oceanographic and Atmospheric modeling data, netcdf is a good > option. I like the NetCDF4 python lib: > > http://code.google.com/p/netcdf4-python/ > > (there are others) > > For broader use, and a bit more flexibility, HDF is a good option. There > are at least two ways to use it with numpy: > > PyTables: http://www.pytables.org > > (Nice higher-level interface) > > hf5py: > http://alfven.org/wp/hdf5-for-python/ > > (a more raw HDF5 wrapper) > > There is also the npz format, built in to numpy, if you are happy with > requiring python to read the data. > > -Chris > > > I am running a simulation, which has many somewhat-indepedent parts > which have their own internal state and parameters. I've been using > pickle (gzipped) to save the entire object (which contains subobjects, > etc...), but it is getting too unwieldy and I think it is time to look > for a more robust solution. Ideally I'd like to have something where I > can call a save method on the simulation object, and it will call the > save methods on all the children, on down the line all saving into one > file. It'd also be nice if it were cross-platform, and I could depend > on the files being readable into the future for a while. >> >> Are there any good standards for this? What do you use for saving >> scientific data? >> >> >> thank you, >> >> Brian Blais >> >> >> > > > -- > Christopher Barker, Ph.D. 
> Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Sent from my mobile device From tkluck at infty.nl Mon Aug 1 16:33:28 2011 From: tkluck at infty.nl (Timo Kluck) Date: Mon, 1 Aug 2011 22:33:28 +0200 Subject: [Numpy-discussion] numpy.interp running time In-Reply-To: References: <4E3452F1.7010607@hawaii.edu> Message-ID: 2011/8/1 Timo Kluck > 2011/7/30 Eric Firing >> Maybe the thing to do is to pre-calculate if len(xp) <= len(x), or some >> such guess as to which method would be more efficient. >> > What you're suggesting is reasonable. The cutoff at len(xp) <= len(x) can distinguish between the 'refinement' case > and the 'just one value' case. I'll implement it for a start. I just submitted a patch at http://projects.scipy.org/numpy/ticket/1920 . It implements Eric's suggestion. Please review, I'll be happy to adapt it to any of your feedback. Timo From craigyk at me.com Mon Aug 1 19:20:50 2011 From: craigyk at me.com (Craig Yoshioka) Date: Mon, 01 Aug 2011 16:20:50 -0700 Subject: [Numpy-discussion] limit to number of fields in recarray Message-ID: <3D27C63E-6C4A-4908-B2E5-0CA01EDB53A2@me.com> Is there a limit to the number of fields a numpy recarray can have? I was getting a strange error about a duplicate column name, but it wasn't a duplicate. From bevan07 at gmail.com Mon Aug 1 20:43:28 2011 From: bevan07 at gmail.com (Bevan Jenkins) Date: Tue, 2 Aug 2011 00:43:28 +0000 (UTC) Subject: [Numpy-discussion] hold parameters Message-ID: Hello, I have a function that I fitting to a curve via scipy.optimize.leastsq. The function has 4 parameters and this is all working fine. For a site, I have a number of curves (n=10 in the example below). I would like to some of the parameters to be the best fit across all curves (best fit for a site) while letting the other parameters vary for each curve. I have this working as well. The issue I have is like to be able to vary this for a run. That is do a run where parameter1 is best fit for entire site, whith the remaining three varying per curve. Then on the next run, have two parameters being held or fitted for all curves at one. Or be able to do a run where all 4 parameters are fit for each individual curve. Using my e.g. below, if I change the 'fix' dict, so that 'a','b', and 'c' are True, with 'd' False, then I will have to change the zip to for a,b,c in zip(a,b,c): solve(a,b,c,d) I would prefer to find a way to do this via code. I hope this example makes sense. The code below is all within my objective function that is being called by scipy.optimize.leastsq. 
import numpy as np def solve(a,b,c,d): print a,b,c,d #return x*a*b*c*d fix = {"a":True,"b":True,"c":False,"d":False} n=10 params = np.array([0,1,2,3]*n) params = params.reshape(-1,4) if fix["a"] is True: a = params[0,0] else: a = params[:,0] if fix["b"] is True: b = params[0,1] else: b = params[:,1] if fix["c"] is True: c = params[0,2] else: c = params[:,2] if fix["d"] is True: d = params[0,3] else: d = params[:,3] res=[] for c,d in zip(c,d): res = solve(a,b,c,d) #res = solve(a,b,c,d)-self.orig #return np.hstack(res)**2 From pgmdevlist at gmail.com Tue Aug 2 02:18:53 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 2 Aug 2011 08:18:53 +0200 Subject: [Numpy-discussion] limit to number of fields in recarray In-Reply-To: <3D27C63E-6C4A-4908-B2E5-0CA01EDB53A2@me.com> References: <3D27C63E-6C4A-4908-B2E5-0CA01EDB53A2@me.com> Message-ID: <59DEC051-5161-49B2-9577-8C873A89CB3C@gmail.com> On Aug 2, 2011, at 1:20 AM, Craig Yoshioka wrote: > Is there a limit to the number of fields a numpy recarray can have? I was getting a strange error about a duplicate column name, but it wasn't a duplicate. And the error was? ? From josef.pktd at gmail.com Tue Aug 2 05:07:20 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 2 Aug 2011 05:07:20 -0400 Subject: [Numpy-discussion] hold parameters In-Reply-To: References: Message-ID: On Mon, Aug 1, 2011 at 8:43 PM, Bevan Jenkins wrote: > Hello, > > I have a function that I fitting to a curve via scipy.optimize.leastsq. ?The > function has 4 parameters and this is all working fine. > > For a site, I have a number of curves (n=10 in the example below). ?I would > like to some of the parameters to be the best fit across all curves (best fit > for a site) while letting the other parameters vary for each curve. ?I have > this working as well. > > The issue I have is like to be able to vary this for a run. ?That is do a run > where parameter1 is best fit for entire site, whith the remaining three > varying per curve. Then on the next run, have two parameters being held or > fitted for all curves at one. ?Or be able to do a run where all 4 parameters > are fit for each individual curve. > > Using my e.g. below, if I change the 'fix' dict, so that 'a','b', and 'c' are > True, with 'd' False, then I will have to change the zip to > for a,b,c in zip(a,b,c): > ? ?solve(a,b,c,d) > > I would prefer to find a way to do this via code. ?I hope this example makes > sense. ?The code below is all within my objective function that is being > called by scipy.optimize.leastsq. > import numpy as np > > def solve(a,b,c,d): > ? ?print a,b,c,d > ? ?#return x*a*b*c*d > > > > fix = {"a":True,"b":True,"c":False,"d":False} > > n=10 > params = np.array([0,1,2,3]*n) > params = params.reshape(-1,4) > > if fix["a"] is True: > ? ?a = params[0,0] > else: > ? ?a = params[:,0] > if fix["b"] is True: > ? ?b = params[0,1] > else: > ? ?b = params[:,1] > if fix["c"] is True: > ? ?c = params[0,2] > else: > ? ?c = params[:,2] > if fix["d"] is True: > ? ?d = params[0,3] > else: > ? ?d = params[:,3] > > res=[] > for c,d in zip(c,d): > ? ?res = solve(a,b,c,d) > ? ?#res = solve(a,b,c,d)-self.orig > #return np.hstack(res)**2 I'm not a fan of named individual parameters for function arguments when the number of arguments varies, *args What I'm using is a full parameter array with nan's fixed = np.array([nan, nan, c, d]) #fix c,d def func(args): fixed[np.isnan(fixed)] = args a,b,c,d = fixed ... 
to set starting values allstartvals = np.array([a0, b0, c0, d0]) startvals = allstartvals[np.isnan(fixed) optimize.leastsq(func, startvals, other_args) or something like this. I find it easier to keep track of the parameters, if I just have an array or tuple. for an alternative, Travis used a different way in the scipy.stats implementation of partially fixed parameters for distributions fit with named arguments. (I don't remember the details) Josef > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From bsouthey at gmail.com Tue Aug 2 09:23:43 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 02 Aug 2011 08:23:43 -0500 Subject: [Numpy-discussion] hold parameters In-Reply-To: References: Message-ID: <4E37FA5F.6090900@gmail.com> On 08/01/2011 07:43 PM, Bevan Jenkins wrote: > Hello, > > I have a function that I fitting to a curve via scipy.optimize.leastsq. The > function has 4 parameters and this is all working fine. > > For a site, I have a number of curves (n=10 in the example below). I would > like to some of the parameters to be the best fit across all curves (best fit > for a site) while letting the other parameters vary for each curve. I have > this working as well. > > The issue I have is like to be able to vary this for a run. That is do a run > where parameter1 is best fit for entire site, whith the remaining three > varying per curve. Then on the next run, have two parameters being held or > fitted for all curves at one. Or be able to do a run where all 4 parameters > are fit for each individual curve. It would really help to know what you mean by 'entire site' and 'run'. If the runs are not independent then what you are doing is incorrect. > Using my e.g. below, if I change the 'fix' dict, so that 'a','b', and 'c' are > True, with 'd' False, then I will have to change the zip to > for a,b,c in zip(a,b,c): > solve(a,b,c,d) > > I would prefer to find a way to do this via code. I hope this example makes > sense. The code below is all within my objective function that is being > called by scipy.optimize.leastsq. > import numpy as np > > def solve(a,b,c,d): > print a,b,c,d > #return x*a*b*c*d > > > > fix = {"a":True,"b":True,"c":False,"d":False} > > n=10 > params = np.array([0,1,2,3]*n) > params = params.reshape(-1,4) > > if fix["a"] is True: > a = params[0,0] > else: > a = params[:,0] > if fix["b"] is True: > b = params[0,1] > else: > b = params[:,1] > if fix["c"] is True: > c = params[0,2] > else: > c = params[:,2] > if fix["d"] is True: > d = params[0,3] > else: > d = params[:,3] > > res=[] > for c,d in zip(c,d): > res = solve(a,b,c,d) > #res = solve(a,b,c,d)-self.orig > #return np.hstack(res)**2 > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Basically this code seems to be trying to do what an analysis of covariance would do. Analysis of covariance type of approach provides a statistical framework where you have a 'global' parameter and condition specific parameters that modify that parameter. Here is one example under SAS that fits a common slope but different intercepts due to drug level. http://support.sas.com/documentation/cdl/en/statug/63347/HTML/default/viewer.htm#statug_glm_sect050.htm That model can be extended allow for different slopes due different drug levels by fitting the interaction between both variables. 
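For instance, a rough and untested sketch (with made-up toy data) of a design matrix that fits one common slope but a separate intercept per curve via ordinary least squares:

import numpy as np

# toy data: n_curves curves, each observed at the same x values
n_curves, n_obs = 3, 20
x = np.tile(np.linspace(0.0, 1.0, n_obs), n_curves)
curve = np.repeat(np.arange(n_curves), n_obs)
y = 2.0 * x + curve + 0.1 * np.random.randn(x.size)  # common slope 2, intercepts 0, 1, 2

# design matrix: the first column carries the common slope, the remaining 0/1
# indicator columns give each curve its own intercept
X = np.column_stack([x] + [(curve == i).astype(float) for i in range(n_curves)])

coef, resid, rank, sv = np.linalg.lstsq(X, y)
print(coef)  # [common slope, intercept_0, intercept_1, intercept_2]

Per-curve slopes would then just mean adding the x-by-indicator interaction columns as well.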
You can do that easily in numpy/scipy by creating the correct 'design matrix'. The real issue is that you need a statistical measure of the model fit as well as comparison between models (or restrictions). For linear models usually likelihood (or similar measure like Bayesian information criterion) and extra-sums of squares tests are used. But these measures get more interesting in nonlinear cases. Bruce From ralf.gommers at googlemail.com Tue Aug 2 10:07:33 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 2 Aug 2011 16:07:33 +0200 Subject: [Numpy-discussion] numpy.sqrt behaving differently on MacOS Lion In-Reply-To: References: <4E3073C8.8060601@noaa.gov> Message-ID: On Wed, Jul 27, 2011 at 10:33 PM, Ilan Schnell wrote: > > Please don't distribute a different numpy binary for each version of > > MacOS X. > +1 > > Maybe I should mention that I just finished testing all Python > packages in EPD under 10.7, and everything (execpt numpy.sqr > for weird complex values such as inf/nan) works fine! > In particular building C and Fortran extensions with the new LLVM > based gcc and importing them into Python (both 32 and 64-bit). > There are two MacOS builds of EPD (one 32-bit and 64-bit), they > are compiled on 10.5 using gcc 4.0.1 and then tested on 10.5, 10.6 > and 10.7. > > Good to know Ilan. It seems that the problems that so many people experienced with scipy are now solved by the new gfortran binary available from http://r.research.att.com/tools/. So it should be fine to just skip the failing tests. Apple says my computer is too old for Lion, so I need a little help here. Could you either open a ticket with the full output of numpy.test() and assign it to me, or produce a patch? Thanks, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlconlin at gmail.com Tue Aug 2 10:44:26 2011 From: jlconlin at gmail.com (Jeremy Conlin) Date: Tue, 2 Aug 2011 08:44:26 -0600 Subject: [Numpy-discussion] Finding many ways to incorrectly create a numpy array. Please advice Message-ID: I am trying to create a numpy array from some text I'm reading from a file. Ideally, I'd like to create a structured array with the first element as an int and the remaining as floats. I'm currently unsuccessful in my attempts. I've copied a simple script below that shows what I've done and the wrong output. Can someone please show me what is happening? I'm using numpy version 1.5.1 under Python 2.7.1 on a Mac running Snow Leopard. 
Thanks, Jeremy import numpy l = ' 32000 7.89131E-01 8.05999E-03 3.88222E+03' tfc_dtype = numpy.dtype([('nps', 'u8'), ('t', 'f8'), ('e', 'f8'), ('fom', 'f8')]) m = numpy.fromstring(l, sep=' ') print("\nm: {}".format(m)) # Next line gives: # ValueError: don't know how to read character strings with that array type #n = numpy.fromstring(l, dtype=tfc_dtype, sep=' ') #print("\nn: {}".format(n)) words = l.split() o = numpy.array(words, dtype='f8') print("\no: {}".format(o)) # Next line(s) gives bad answer p = numpy.array(words, dtype=tfc_dtype) print("\np: {}".format(p)) nps = int(words[0]) t = float(words[1]) e = float(words[2]) fom = float(words[3]) a = [nps, t, e, fom] # Next line(s) converts int to float in first element r = numpy.array(a) print("\nr: {}".format(r)) # Next line gives: # TypeError: expected a readable buffer object # s = numpy.array(a, dtype=tfc_dtype) # print("\ns: {}".format(s)) From brett.olsen at gmail.com Tue Aug 2 11:09:18 2011 From: brett.olsen at gmail.com (Brett Olsen) Date: Tue, 2 Aug 2011 10:09:18 -0500 Subject: [Numpy-discussion] Finding many ways to incorrectly create a numpy array. Please advice In-Reply-To: References: Message-ID: On Tue, Aug 2, 2011 at 9:44 AM, Jeremy Conlin wrote: > I am trying to create a numpy array from some text I'm reading from a > file. Ideally, I'd like to create a structured array with the first > element as an int and the remaining as floats. I'm currently > unsuccessful in my attempts. I've copied a simple script below that > shows what I've done and the wrong output. Can someone please show me > what is happening? > > I'm using numpy version 1.5.1 under Python 2.7.1 on a Mac running Snow Leopard. > > Thanks, > Jeremy I'd use numpy.loadtxt: In [1]: import numpy, StringIO In [2]: l = ' 32000 7.89131E-01 8.05999E-03 3.88222E+03' In [3]: tfc_dtype = numpy.dtype([('nps', 'u8'), ('t', 'f8'), ('e', 'f8'), ('fom', 'f8')]) In [4]: input = StringIO.StringIO(l) In [5]: numpy.loadtxt(input, dtype=tfc_dtype) Out[5]: array((32000L, 0.78913100000000003, 0.0080599899999999995, 3882.2199999999998), dtype=[('nps', ' References: Message-ID: On Tue, Aug 2, 2011 at 9:09 AM, Brett Olsen wrote: > On Tue, Aug 2, 2011 at 9:44 AM, Jeremy Conlin wrote: >> I am trying to create a numpy array from some text I'm reading from a >> file. Ideally, I'd like to create a structured array with the first >> element as an int and the remaining as floats. I'm currently >> unsuccessful in my attempts. I've copied a simple script below that >> shows what I've done and the wrong output. Can someone please show me >> what is happening? >> >> I'm using numpy version 1.5.1 under Python 2.7.1 on a Mac running Snow Leopard. >> >> Thanks, >> Jeremy > > I'd use numpy.loadtxt: > > In [1]: import numpy, StringIO > > In [2]: l = ' ? ? ?32000 ?7.89131E-01 ?8.05999E-03 ?3.88222E+03' > > In [3]: tfc_dtype = numpy.dtype([('nps', 'u8'), ('t', 'f8'), ('e', > 'f8'), ('fom', 'f8')]) > > In [4]: input = StringIO.StringIO(l) > > In [5]: numpy.loadtxt(input, dtype=tfc_dtype) > Out[5]: > array((32000L, 0.78913100000000003, 0.0080599899999999995, 3882.2199999999998), > ? ? ?dtype=[('nps', ' > In [6]: input.close() > > In [7]: input = StringIO.StringIO(l) > > In [8]: numpy.loadtxt(input) > Out[8]: > array([ ?3.20000000e+04, ? 7.89131000e-01, ? 8.05999000e-03, > ? ? ? ? 3.88222000e+03]) > > In [9]: input.close() > > If you're reading from a file you can replace the StringIO objects > with file objects. Thanks, Brett. Using StringIO and numpy.loadtxt worked great. 
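For the record, reading straight from the file works the same way; a rough sketch (the file name is made up):

import numpy

tfc_dtype = numpy.dtype([('nps', 'u8'), ('t', 'f8'), ('e', 'f8'), ('fom', 'f8')])
# 'tally.txt' is a hypothetical file with one nps/t/e/fom record per line
with open('tally.txt') as fh:
    tfc = numpy.loadtxt(fh, dtype=tfc_dtype)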
I'm still curious why what I was doing didn't work. Everything I can see indicates it should work. Jeremy From thomasmarkovich at gmail.com Tue Aug 2 11:50:16 2011 From: thomasmarkovich at gmail.com (Thomas Markovich) Date: Tue, 2 Aug 2011 10:50:16 -0500 Subject: [Numpy-discussion] Segmentation Fault in Numpy.test() Message-ID: Hi All, I installed numpy from the scipy superpack on Snow Leopard with python 2.7 and it all appears to work but when I do the following, I get a segmentation fault. >>> import numpy >>> print numpy.__version__, numpy.__file__ 2.0.0.dev-b5cdaee /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/__init__.pyc >>> numpy.test() Running unit tests for numpy NumPy version 2.0.0.dev-b5cdaee NumPy is installed in /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy Python version 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] nose version 1.1.2 ............................................................................................................................................................................................................................................................................................................................Segmentation fault thomasmarkovich:~ Thomas$ What is the best way to trouble shoot this? Do you guys have any suggestions? I have also included the core dump in this email as a pastie link. http://pastie.org/2309652 Best, Thomas -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Tue Aug 2 12:08:35 2011 From: shish at keba.be (Olivier Delalleau) Date: Tue, 2 Aug 2011 12:08:35 -0400 Subject: [Numpy-discussion] Segmentation Fault in Numpy.test() In-Reply-To: References: Message-ID: It's a wild guess, but in the past I've had seg faults issues on Mac due to conflicting versions of Python. Do you have multiple Python installs on your Mac? -=- Olivier 2011/8/2 Thomas Markovich > Hi All, > > I installed numpy from the scipy superpack on Snow Leopard with python 2.7 > and it all appears to work but when I do the following, I get a segmentation > fault. > > >>> import numpy > >>> print numpy.__version__, numpy.__file__ > 2.0.0.dev-b5cdaee > /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/__init__.pyc > >>> numpy.test() > Running unit tests for numpy > NumPy version 2.0.0.dev-b5cdaee > NumPy is installed in > /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy > Python version 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) [GCC > 4.2.1 (Apple Inc. build 5666) (dot 3)] > nose version 1.1.2 > ............................................................................................................................................................................................................................................................................................................................Segmentation > fault > thomasmarkovich:~ Thomas$ > > What is the best way to trouble shoot this? Do you guys have any > suggestions? I have also included the core dump in this email as a pastie > link. 
> > http://pastie.org/2309652 > > Best, > > Thomas > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomasmarkovich at gmail.com Tue Aug 2 12:14:15 2011 From: thomasmarkovich at gmail.com (Thomas Markovich) Date: Tue, 2 Aug 2011 11:14:15 -0500 Subject: [Numpy-discussion] Segmentation Fault in Numpy.test() In-Reply-To: References: Message-ID: I just have the default "apple" version of python that comes with Snow Leopard (Python 2.6.1 (r261:67515, Aug 2 2010, 20:10:18)) and python 2.7 (Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) ) installed. Should I just remove 2.7 and reinstall everything with the standard apple python? On Tue, Aug 2, 2011 at 11:08 AM, Olivier Delalleau wrote: > It's a wild guess, but in the past I've had seg faults issues on Mac due to > conflicting versions of Python. Do you have multiple Python installs on your > Mac? > > -=- Olivier > > > 2011/8/2 Thomas Markovich > >> Hi All, >> >> I installed numpy from the scipy superpack on Snow Leopard with python 2.7 >> and it all appears to work but when I do the following, I get a segmentation >> fault. >> >> >>> import numpy >> >>> print numpy.__version__, numpy.__file__ >> 2.0.0.dev-b5cdaee >> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/__init__.pyc >> >>> numpy.test() >> Running unit tests for numpy >> NumPy version 2.0.0.dev-b5cdaee >> NumPy is installed in >> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy >> Python version 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) [GCC >> 4.2.1 (Apple Inc. build 5666) (dot 3)] >> nose version 1.1.2 >> ............................................................................................................................................................................................................................................................................................................................Segmentation >> fault >> thomasmarkovich:~ Thomas$ >> >> What is the best way to trouble shoot this? Do you guys have any >> suggestions? I have also included the core dump in this email as a pastie >> link. >> >> http://pastie.org/2309652 >> >> Best, >> >> Thomas >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Tue Aug 2 12:27:40 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 02 Aug 2011 11:27:40 -0500 Subject: [Numpy-discussion] Segmentation Fault in Numpy.test() In-Reply-To: References: Message-ID: <4E38257C.5080802@gmail.com> On 08/02/2011 11:14 AM, Thomas Markovich wrote: > I just have the default "apple" version of python that comes with Snow > Leopard (Python 2.6.1 (r261:67515, Aug 2 2010, 20:10:18)) and python > 2.7 (Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) ) > installed. 
> > Should I just remove 2.7 and reinstall everything with the standard > apple python? > > On Tue, Aug 2, 2011 at 11:08 AM, Olivier Delalleau > wrote: > > It's a wild guess, but in the past I've had seg faults issues on > Mac due to conflicting versions of Python. Do you have multiple > Python installs on your Mac? > > -=- Olivier > > > 2011/8/2 Thomas Markovich > > > Hi All, > > I installed numpy from the scipy superpack on Snow Leopard > with python 2.7 and it all appears to work but when I do the > following, I get a segmentation fault. > > >>> import numpy > >>> print numpy.__version__, numpy.__file__ > 2.0.0.dev-b5cdaee > /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/__init__.pyc > >>> numpy.test() > Running unit tests for numpy > NumPy version 2.0.0.dev-b5cdaee > NumPy is installed in > /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy > Python version 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, > 15:22:34) [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] > nose version 1.1.2 > ............................................................................................................................................................................................................................................................................................................................Segmentation > fault > thomasmarkovich:~ Thomas$ > > What is the best way to trouble shoot this? Do you guys have > any suggestions? I have also included the core dump in this > email as a pastie link. > > http://pastie.org/2309652 > > Best, > > Thomas > > Use the numpy test verbose argument ie numpy.test(verbose=10) to find which test it is causing the crash. I have no idea of the Mac but I am curious why there is a 'py2.6' in your numpy version with Python2.7. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Tue Aug 2 12:28:52 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 2 Aug 2011 18:28:52 +0200 Subject: [Numpy-discussion] Segmentation Fault in Numpy.test() In-Reply-To: References: Message-ID: On Tue, Aug 2, 2011 at 6:14 PM, Thomas Markovich wrote: > I just have the default "apple" version of python that comes with Snow > Leopard (Python 2.6.1 (r261:67515, Aug 2 2010, 20:10:18)) and python 2.7 > (Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) ) installed. > > Should I just remove 2.7 and reinstall everything with the standard apple > python? > > Did you get it from http://stronginference.com/scipy-superpack/? The info on the 10.6 installer has disappeared, but the 10.7 one is built against Apple's Python. So conflicting Pythons makes sense. Even if you find the right one, it may be worth emailing Chris to ask him to put back the info for the 10.6 installer. Ralf On Tue, Aug 2, 2011 at 11:08 AM, Olivier Delalleau wrote: > >> It's a wild guess, but in the past I've had seg faults issues on Mac due >> to conflicting versions of Python. Do you have multiple Python installs on >> your Mac? >> >> -=- Olivier >> >> >> 2011/8/2 Thomas Markovich >> >>> Hi All, >>> >>> I installed numpy from the scipy superpack on Snow Leopard with python >>> 2.7 and it all appears to work but when I do the following, I get a >>> segmentation fault. 
>>> >>> >>> import numpy >>> >>> print numpy.__version__, numpy.__file__ >>> 2.0.0.dev-b5cdaee >>> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/__init__.pyc >>> >>> numpy.test() >>> Running unit tests for numpy >>> NumPy version 2.0.0.dev-b5cdaee >>> NumPy is installed in >>> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy >>> Python version 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) [GCC >>> 4.2.1 (Apple Inc. build 5666) (dot 3)] >>> nose version 1.1.2 >>> ............................................................................................................................................................................................................................................................................................................................Segmentation >>> fault >>> thomasmarkovich:~ Thomas$ >>> >>> What is the best way to trouble shoot this? Do you guys have any >>> suggestions? I have also included the core dump in this email as a pastie >>> link. >>> >>> http://pastie.org/2309652 >>> >>> Best, >>> >>> Thomas >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomasmarkovich at gmail.com Tue Aug 2 12:57:21 2011 From: thomasmarkovich at gmail.com (Thomas Markovich) Date: Tue, 2 Aug 2011 11:57:21 -0500 Subject: [Numpy-discussion] Segmentation Fault in Numpy.test() In-Reply-To: References: Message-ID: It appears that uninstalling python 2.7 and installing the scipy superpack with the apple standard python removes the segfaulting behavior from numpy. Now it appears that just scipy is segfaulting at test "test_arpack.test_hermitian_modes(True, , 'F', 2, 'SM', None, 0.5, ) ... Segmentation fault" Thomas On Tue, Aug 2, 2011 at 11:28 AM, Ralf Gommers wrote: > > > On Tue, Aug 2, 2011 at 6:14 PM, Thomas Markovich < > thomasmarkovich at gmail.com> wrote: > >> I just have the default "apple" version of python that comes with Snow >> Leopard (Python 2.6.1 (r261:67515, Aug 2 2010, 20:10:18)) and python 2.7 >> (Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) ) installed. >> >> Should I just remove 2.7 and reinstall everything with the standard apple >> python? >> >> Did you get it from http://stronginference.com/scipy-superpack/? The info > on the 10.6 installer has disappeared, but the 10.7 one is built against > Apple's Python. So conflicting Pythons makes sense. Even if you find the > right one, it may be worth emailing Chris to ask him to put back the info > for the 10.6 installer. > > Ralf > > > On Tue, Aug 2, 2011 at 11:08 AM, Olivier Delalleau wrote: >> >>> It's a wild guess, but in the past I've had seg faults issues on Mac due >>> to conflicting versions of Python. Do you have multiple Python installs on >>> your Mac? 
>>> >>> -=- Olivier >>> >>> >>> 2011/8/2 Thomas Markovich >>> >>>> Hi All, >>>> >>>> I installed numpy from the scipy superpack on Snow Leopard with python >>>> 2.7 and it all appears to work but when I do the following, I get a >>>> segmentation fault. >>>> >>>> >>> import numpy >>>> >>> print numpy.__version__, numpy.__file__ >>>> 2.0.0.dev-b5cdaee >>>> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/__init__.pyc >>>> >>> numpy.test() >>>> Running unit tests for numpy >>>> NumPy version 2.0.0.dev-b5cdaee >>>> NumPy is installed in >>>> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy >>>> Python version 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) [GCC >>>> 4.2.1 (Apple Inc. build 5666) (dot 3)] >>>> nose version 1.1.2 >>>> ............................................................................................................................................................................................................................................................................................................................Segmentation >>>> fault >>>> thomasmarkovich:~ Thomas$ >>>> >>>> What is the best way to trouble shoot this? Do you guys have any >>>> suggestions? I have also included the core dump in this email as a pastie >>>> link. >>>> >>>> http://pastie.org/2309652 >>>> >>>> Best, >>>> >>>> Thomas >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Tue Aug 2 13:06:37 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 2 Aug 2011 19:06:37 +0200 Subject: [Numpy-discussion] Segmentation Fault in Numpy.test() In-Reply-To: References: Message-ID: On Tue, Aug 2, 2011 at 6:57 PM, Thomas Markovich wrote: > It appears that uninstalling python 2.7 and installing the scipy superpack > with the apple standard python removes the segfaulting behavior from numpy. > Now it appears that just scipy is segfaulting at test > "test_arpack.test_hermitian_modes(True, , 'F', 2, 'SM', None, > 0.5, ) ... Segmentation fault" > > That is a known problem (unfortunately hard to fix), see http://projects.scipy.org/scipy/ticket/1472 Everything else besides arpack should work fine for you. 
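If the crash gets in the way of running the rest of the suite, one possible workaround (untested here) is to let nose skip the offending module, something like:

import scipy
# pass nose's --exclude option through the test runner so the arpack tests,
# which are known to segfault on this setup, are not collected
scipy.test(extra_argv=['--exclude=test_arpack'])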
Cheers, Ralf > > > > On Tue, Aug 2, 2011 at 11:28 AM, Ralf Gommers > wrote: > >> >> >> On Tue, Aug 2, 2011 at 6:14 PM, Thomas Markovich < >> thomasmarkovich at gmail.com> wrote: >> >>> I just have the default "apple" version of python that comes with Snow >>> Leopard (Python 2.6.1 (r261:67515, Aug 2 2010, 20:10:18)) and python 2.7 >>> (Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) ) installed. >>> >>> Should I just remove 2.7 and reinstall everything with the standard apple >>> python? >>> >>> Did you get it from http://stronginference.com/scipy-superpack/? The >> info on the 10.6 installer has disappeared, but the 10.7 one is built >> against Apple's Python. So conflicting Pythons makes sense. Even if you find >> the right one, it may be worth emailing Chris to ask him to put back the >> info for the 10.6 installer. >> >> Ralf >> >> >> On Tue, Aug 2, 2011 at 11:08 AM, Olivier Delalleau wrote: >>> >>>> It's a wild guess, but in the past I've had seg faults issues on Mac due >>>> to conflicting versions of Python. Do you have multiple Python installs on >>>> your Mac? >>>> >>>> -=- Olivier >>>> >>>> >>>> 2011/8/2 Thomas Markovich >>>> >>>>> Hi All, >>>>> >>>>> I installed numpy from the scipy superpack on Snow Leopard with python >>>>> 2.7 and it all appears to work but when I do the following, I get a >>>>> segmentation fault. >>>>> >>>>> >>> import numpy >>>>> >>> print numpy.__version__, numpy.__file__ >>>>> 2.0.0.dev-b5cdaee >>>>> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/__init__.pyc >>>>> >>> numpy.test() >>>>> Running unit tests for numpy >>>>> NumPy version 2.0.0.dev-b5cdaee >>>>> NumPy is installed in >>>>> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy >>>>> Python version 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) [GCC >>>>> 4.2.1 (Apple Inc. build 5666) (dot 3)] >>>>> nose version 1.1.2 >>>>> ............................................................................................................................................................................................................................................................................................................................Segmentation >>>>> fault >>>>> thomasmarkovich:~ Thomas$ >>>>> >>>>> What is the best way to trouble shoot this? Do you guys have any >>>>> suggestions? I have also included the core dump in this email as a pastie >>>>> link. 
>>>>> >>>>> http://pastie.org/2309652 >>>>> >>>>> Best, >>>>> >>>>> Thomas >>>>> >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From derek at astro.physik.uni-goettingen.de Tue Aug 2 13:12:12 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Tue, 2 Aug 2011 19:12:12 +0200 Subject: [Numpy-discussion] Segmentation Fault in Numpy.test() In-Reply-To: References: Message-ID: On 2 Aug 2011, at 18:57, Thomas Markovich wrote: > It appears that uninstalling python 2.7 and installing the scipy > superpack with the apple standard python removes the Did the superpack installer automatically install numpy to the python2.7 directory when present? Even if so, I reckon you could simply reinstall python2.7 after the numpy installation (still calling python2.6 to use numpy of course...). > segfaulting behavior from numpy. Now it appears that just scipy is > segfaulting at test "test_arpack.test_hermitian_modes(True, hermitian>, 'F', 2, 'SM', None, 0.5, 0x1043b1848>) ... Segmentation fault" Which architecture is this? Being on Snow Leopard, probably x86_46... I remember encountering similar problems on PPC, which I suspect are related to stability issues with Apple's Accelerate framework. Cheers, Derek From thomasmarkovich at gmail.com Tue Aug 2 13:14:02 2011 From: thomasmarkovich at gmail.com (Thomas Markovich) Date: Tue, 2 Aug 2011 12:14:02 -0500 Subject: [Numpy-discussion] Segmentation Fault in Numpy.test() In-Reply-To: References: Message-ID: Oh okay, that's unfortunate but I guess not unexpected. Regardless, thank you so much for all your help Ralf, Bruce, and Oliver! You guys are great. Just to recap, the issue appears to stem from using the scipy superpack with python 2.7 from python.org. This was solved by using the apple python along with the scipy superpack. Thomas On Tue, Aug 2, 2011 at 12:06 PM, Ralf Gommers wrote: > > > On Tue, Aug 2, 2011 at 6:57 PM, Thomas Markovich < > thomasmarkovich at gmail.com> wrote: > >> It appears that uninstalling python 2.7 and installing the scipy superpack >> with the apple standard python removes the segfaulting behavior from numpy. >> Now it appears that just scipy is segfaulting at test >> "test_arpack.test_hermitian_modes(True, , 'F', 2, 'SM', None, >> 0.5, ) ... Segmentation fault" >> >> That is a known problem (unfortunately hard to fix), see > http://projects.scipy.org/scipy/ticket/1472 > Everything else besides arpack should work fine for you. 
> > Cheers, > Ralf > > >> >> >> >> On Tue, Aug 2, 2011 at 11:28 AM, Ralf Gommers < >> ralf.gommers at googlemail.com> wrote: >> >>> >>> >>> On Tue, Aug 2, 2011 at 6:14 PM, Thomas Markovich < >>> thomasmarkovich at gmail.com> wrote: >>> >>>> I just have the default "apple" version of python that comes with Snow >>>> Leopard (Python 2.6.1 (r261:67515, Aug 2 2010, 20:10:18)) and python 2.7 >>>> (Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) ) installed. >>>> >>>> Should I just remove 2.7 and reinstall everything with the standard >>>> apple python? >>>> >>>> Did you get it from http://stronginference.com/scipy-superpack/? The >>> info on the 10.6 installer has disappeared, but the 10.7 one is built >>> against Apple's Python. So conflicting Pythons makes sense. Even if you find >>> the right one, it may be worth emailing Chris to ask him to put back the >>> info for the 10.6 installer. >>> >>> Ralf >>> >>> >>> On Tue, Aug 2, 2011 at 11:08 AM, Olivier Delalleau wrote: >>>> >>>>> It's a wild guess, but in the past I've had seg faults issues on Mac >>>>> due to conflicting versions of Python. Do you have multiple Python installs >>>>> on your Mac? >>>>> >>>>> -=- Olivier >>>>> >>>>> >>>>> 2011/8/2 Thomas Markovich >>>>> >>>>>> Hi All, >>>>>> >>>>>> I installed numpy from the scipy superpack on Snow Leopard with python >>>>>> 2.7 and it all appears to work but when I do the following, I get a >>>>>> segmentation fault. >>>>>> >>>>>> >>> import numpy >>>>>> >>> print numpy.__version__, numpy.__file__ >>>>>> 2.0.0.dev-b5cdaee >>>>>> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/__init__.pyc >>>>>> >>> numpy.test() >>>>>> Running unit tests for numpy >>>>>> NumPy version 2.0.0.dev-b5cdaee >>>>>> NumPy is installed in >>>>>> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy >>>>>> Python version 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) [GCC >>>>>> 4.2.1 (Apple Inc. build 5666) (dot 3)] >>>>>> nose version 1.1.2 >>>>>> ............................................................................................................................................................................................................................................................................................................................Segmentation >>>>>> fault >>>>>> thomasmarkovich:~ Thomas$ >>>>>> >>>>>> What is the best way to trouble shoot this? Do you guys have any >>>>>> suggestions? I have also included the core dump in this email as a pastie >>>>>> link. 
>>>>>> >>>>>> http://pastie.org/2309652 >>>>>> >>>>>> Best, >>>>>> >>>>>> Thomas >>>>>> >>>>>> _______________________________________________ >>>>>> NumPy-Discussion mailing list >>>>>> NumPy-Discussion at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Tue Aug 2 13:15:50 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 02 Aug 2011 10:15:50 -0700 Subject: [Numpy-discussion] Finding many ways to incorrectly create a numpy array. Please advice In-Reply-To: References: Message-ID: <4E3830C6.1060902@noaa.gov> On 8/2/11 8:38 AM, Jeremy Conlin wrote: > Thanks, Brett. Using StringIO and numpy.loadtxt worked great. I'm > still curious why what I was doing didn't work. Everything I can see > indicates it should work. In [11]: tfc_dtype Out[11]: dtype([('nps', '>u8'), ('t', '>f8'), ('e', '>f8'), ('fom', '>f8')]) In [15]: n = numpy.fromstring(l, dtype=tfc_dtype, sep=' ') --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /Users/cbarker/ in () ValueError: don't know how to read character strings with that array type means just what it says. In theory, numpy.fromstring() (and fromfile() ) provides a way to quickly and efficiently generate arrays from text, but it practice, the code is quite limited (and has a bug or two). I don't think anyone has gotten around to writing the code to use structured dtypes with it -- so it can't do what you want (rational though that expectation is) In [21]: words Out[21]: ['32000', '7.89131E-01', '8.05999E-03', '3.88222E+03'] In [22]: p = Display all 249 possibilities? (y or n) In [22]: p = numpy.array(words, dtype=tfc_dtype) In [23]: p Out[23]: array([(3689064028291727360L, 0.0, 0.0, 0.0), (3976177339304456517L, 4.967820413490985e-91, 0.0, 0.0), (4048226120204106053L, 4.970217431784588e-91, 0.0, 0.0), (3687946958874489413L, 1.1572189237420885e-100, 0.0, 0.0)], dtype=[('nps', '>u8'), ('t', '>f8'), ('e', '>f8'), ('fom', '>f8')]) similarly here -- converting from text to structured dtypes is not fully supported In [29]: a Out[29]: [32000, 0.789131, 0.00805999, 3882.22] In [30]: r = numpy.array(a) In [31]: r Out[31]: array([ 3.20000000e+04, 7.89131000e-01, 8.05999000e-03, 3.88222000e+03]) sure -- numpy's default behavior is to find a dtype that will hold all the input array -- this pre-dates structured dtypes, and probably what you would want b default anyway. 
In [32]: s = numpy.array(a, dtype=tfc_dtype) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /Users/cbarker/ in () TypeError: expected a readable buffer object OK -- I can see why you'd expect that to work. However, the trick with structured dtypes is that the dimensionality of the inputs can be less than obvious -- you are passing in a 1-d list of 4 numbers -- do you want a 1-d array? or ? -- in this case, it's pretty obvious (as a human) what you would want -- you have a dtype with four fields, and you're passing in four numbers, but there are so many possible combinations that numpy doesn't try to be "smart" about it. So as a rule, you need to be quite specific when working with structured dtypes. However, the default is for numpy to map tuples to dtypes, so if you pass in a tuple instead, it works: In [34]: t = tuple(a) In [35]: s = numpy.array(t, dtype=tfc_dtype) In [36]: s Out[36]: array((32000L, 0.789131, 0.00805999, 3882.22), dtype=[('nps', '>u8'), ('t', '>f8'), ('e', '>f8'), ('fom', '>f8')]) you were THIS close! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From shish at keba.be Tue Aug 2 13:21:59 2011 From: shish at keba.be (Olivier Delalleau) Date: Tue, 2 Aug 2011 13:21:59 -0400 Subject: [Numpy-discussion] Segmentation Fault in Numpy.test() In-Reply-To: References: Message-ID: Maybe specify which scipy superpack. Your issue was probably because the superpack you installed was not meant to be used with Python 2.7. -=- Olivier 2011/8/2 Thomas Markovich > Oh okay, that's unfortunate but I guess not unexpected. Regardless, thank > you so much for all your help Ralf, Bruce, and Oliver! You guys are great. > > Just to recap, the issue appears to stem from using the scipy superpack > with python 2.7 from python.org. This was solved by using the apple python > along with the scipy superpack. > > Thomas > > > On Tue, Aug 2, 2011 at 12:06 PM, Ralf Gommers > wrote: > >> >> >> On Tue, Aug 2, 2011 at 6:57 PM, Thomas Markovich < >> thomasmarkovich at gmail.com> wrote: >> >>> It appears that uninstalling python 2.7 and installing the scipy >>> superpack with the apple standard python removes the segfaulting behavior >>> from numpy. Now it appears that just scipy is segfaulting at test >>> "test_arpack.test_hermitian_modes(True, , 'F', 2, 'SM', None, >>> 0.5, ) ... Segmentation fault" >>> >>> That is a known problem (unfortunately hard to fix), see >> http://projects.scipy.org/scipy/ticket/1472 >> Everything else besides arpack should work fine for you. >> >> Cheers, >> Ralf >> >> >>> >>> >>> >>> On Tue, Aug 2, 2011 at 11:28 AM, Ralf Gommers < >>> ralf.gommers at googlemail.com> wrote: >>> >>>> >>>> >>>> On Tue, Aug 2, 2011 at 6:14 PM, Thomas Markovich < >>>> thomasmarkovich at gmail.com> wrote: >>>> >>>>> I just have the default "apple" version of python that comes with Snow >>>>> Leopard (Python 2.6.1 (r261:67515, Aug 2 2010, 20:10:18)) and python 2.7 >>>>> (Python 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) ) installed. >>>>> >>>>> Should I just remove 2.7 and reinstall everything with the standard >>>>> apple python? >>>>> >>>>> Did you get it from http://stronginference.com/scipy-superpack/? The >>>> info on the 10.6 installer has disappeared, but the 10.7 one is built >>>> against Apple's Python. So conflicting Pythons makes sense. 
Even if you find >>>> the right one, it may be worth emailing Chris to ask him to put back the >>>> info for the 10.6 installer. >>>> >>>> Ralf >>>> >>>> >>>> On Tue, Aug 2, 2011 at 11:08 AM, Olivier Delalleau wrote: >>>>> >>>>>> It's a wild guess, but in the past I've had seg faults issues on Mac >>>>>> due to conflicting versions of Python. Do you have multiple Python installs >>>>>> on your Mac? >>>>>> >>>>>> -=- Olivier >>>>>> >>>>>> >>>>>> 2011/8/2 Thomas Markovich >>>>>> >>>>>>> Hi All, >>>>>>> >>>>>>> I installed numpy from the scipy superpack on Snow Leopard with >>>>>>> python 2.7 and it all appears to work but when I do the following, I get a >>>>>>> segmentation fault. >>>>>>> >>>>>>> >>> import numpy >>>>>>> >>> print numpy.__version__, numpy.__file__ >>>>>>> 2.0.0.dev-b5cdaee >>>>>>> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy/__init__.pyc >>>>>>> >>> numpy.test() >>>>>>> Running unit tests for numpy >>>>>>> NumPy version 2.0.0.dev-b5cdaee >>>>>>> NumPy is installed in >>>>>>> /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy-2.0.0.dev_b5cdaee_20110710-py2.6-macosx-10.6-universal.egg/numpy >>>>>>> Python version 2.7.2 (v2.7.2:8527427914a2, Jun 11 2011, 15:22:34) >>>>>>> [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] >>>>>>> nose version 1.1.2 >>>>>>> ............................................................................................................................................................................................................................................................................................................................Segmentation >>>>>>> fault >>>>>>> thomasmarkovich:~ Thomas$ >>>>>>> >>>>>>> What is the best way to trouble shoot this? Do you guys have any >>>>>>> suggestions? I have also included the core dump in this email as a pastie >>>>>>> link. >>>>>>> >>>>>>> http://pastie.org/2309652 >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> Thomas >>>>>>> >>>>>>> _______________________________________________ >>>>>>> NumPy-Discussion mailing list >>>>>>> NumPy-Discussion at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> NumPy-Discussion mailing list >>>>>> NumPy-Discussion at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
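One quick way to see which interpreter and which numpy build are actually being picked up -- a minimal check, handy whenever several Pythons live side by side on one machine:

import sys
import numpy

print(sys.executable)      # path of the interpreter that is actually running
print(sys.version)         # e.g. Apple's 2.6.1 vs. python.org's 2.7.2
print(numpy.__version__)
print(numpy.__file__)      # a "...py2.6..." egg sitting under a Versions/2.7 tree is the mismatch to look for

In the report quoted earlier in this thread, numpy.__file__ pointed at exactly such a py2.6 egg installed under the 2.7 framework.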
URL: From Chris.Barker at noaa.gov Tue Aug 2 13:24:15 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 02 Aug 2011 10:24:15 -0700 Subject: [Numpy-discussion] Segmentation Fault in Numpy.test() In-Reply-To: References: Message-ID: <4E3832BF.1030300@noaa.gov> On 8/2/11 10:14 AM, Thomas Markovich wrote: > Just to recap, the issue appears to stem from using the scipy superpack > with python 2.7 from python.org . This was solved by > using the apple python along with the scipy superpack. This sure sounds like a bug in the sciy superpack installer -- if it was build for the system python2.6, it should not get installed into 2.7. Unless you did something to force that. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From derek at astro.physik.uni-goettingen.de Tue Aug 2 14:19:31 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Tue, 2 Aug 2011 20:19:31 +0200 Subject: [Numpy-discussion] Finding many ways to incorrectly create a numpy array. Please advice In-Reply-To: <4E3830C6.1060902@noaa.gov> References: <4E3830C6.1060902@noaa.gov> Message-ID: On 2 Aug 2011, at 19:15, Christopher Barker wrote: > In [32]: s = numpy.array(a, dtype=tfc_dtype) > --------------------------------------------------------------------------- > TypeError Traceback (most recent > call last) > > /Users/cbarker/ in () > > TypeError: expected a readable buffer object > > OK -- I can see why you'd expect that to work. However, the trick with > structured dtypes is that the dimensionality of the inputs can be less > than obvious -- you are passing in a 1-d list of 4 numbers -- do you > want a 1-d array? or ? -- in this case, it's pretty obvious (as a > human) > what you would want -- you have a dtype with four fields, and you're > passing in four numbers, but there are so many possible combinations > that numpy doesn't try to be "smart" about it. So as a rule, you > need to > be quite specific when working with structured dtypes. > > However, the default is for numpy to map tuples to dtypes, so if you > pass in a tuple instead, it works: > > In [34]: t = tuple(a) > > In [35]: s = numpy.array(t, dtype=tfc_dtype) > > In [36]: s > Out[36]: > array((32000L, 0.789131, 0.00805999, 3882.22), > dtype=[('nps', '>u8'), ('t', '>f8'), ('e', '>f8'), ('fom', > '>f8')]) > > you were THIS close! Thanks for the detailed discussion! BTW this works also without explicitly converting the words one by one: In [1]: l = ' 32000 7.89131E-01 8.05999E-03 3.88222E+03' In [2]: tfc_dtype = numpy.dtype([('nps', 'u8'), ('t', 'f8'), ('e', 'f8'),('fom', 'f8')]) In [3]: numpy.array(tuple(l.split()), dtype=tfc_dtype) Out[3]: array((32000L, 0.789131, 0.00805999, 3882.22), dtype=[('nps', ' References: <3D27C63E-6C4A-4908-B2E5-0CA01EDB53A2@me.com> <59DEC051-5161-49B2-9577-8C873A89CB3C@gmail.com> Message-ID: <58F1BDB2-4767-4E4B-9E33-0E04F80C1D52@me.com> duplicate column in dtype? I just consolidated some of the columns and the error went away... none had duplicate field names... hence the question. On Aug 1, 2011, at 11:18 PM, Pierre GM wrote: > > On Aug 2, 2011, at 1:20 AM, Craig Yoshioka wrote: > >> Is there a limit to the number of fields a numpy recarray can have? I was getting a strange error about a duplicate column name, but it wasn't a duplicate. > > And the error was? ? 
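For what it's worth, a dtype with a genuinely repeated field name is rejected up front, and the field count by itself is not the limiting factor. A quick sketch (the field names here are made up):

import numpy as np

# A repeated name fails immediately, whatever the total number of fields.
try:
    np.dtype([('a', 'f8'), ('a', 'i4')])
except ValueError as err:
    print(err)

# Several hundred distinct fields build without complaint.
many = np.dtype([('f%03d' % i, 'f8') for i in range(500)])
print(len(many.names))    # 500

So a duplicate-name complaint points at the names themselves (or at automatic name cleanup somewhere upstream) rather than at the size of the record.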
> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jsseabold at gmail.com Tue Aug 2 15:31:06 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 2 Aug 2011 15:31:06 -0400 Subject: [Numpy-discussion] limit to number of fields in recarray In-Reply-To: <58F1BDB2-4767-4E4B-9E33-0E04F80C1D52@me.com> References: <3D27C63E-6C4A-4908-B2E5-0CA01EDB53A2@me.com> <59DEC051-5161-49B2-9577-8C873A89CB3C@gmail.com> <58F1BDB2-4767-4E4B-9E33-0E04F80C1D52@me.com> Message-ID: On Tue, Aug 2, 2011 at 3:19 PM, Craig Yoshioka wrote: > duplicate column in dtype? > "Duplicate field names given."? Can you post code to replicate? > I just consolidated some of the columns and the error went away... none had duplicate field names... hence the question. > I don't think this would be raised unless there are duplicates. There is some name changing for invalid field names that could result in a name collision. I think I've run into this before. Skipper From craigyk at me.com Tue Aug 2 16:09:59 2011 From: craigyk at me.com (Craig Yoshioka) Date: Tue, 02 Aug 2011 13:09:59 -0700 Subject: [Numpy-discussion] limit to number of fields in recarray In-Reply-To: References: <3D27C63E-6C4A-4908-B2E5-0CA01EDB53A2@me.com> <59DEC051-5161-49B2-9577-8C873A89CB3C@gmail.com> <58F1BDB2-4767-4E4B-9E33-0E04F80C1D52@me.com> Message-ID: <04693AFD-A761-45DC-8E20-074064B82F36@me.com> yup, duplicate field names given. I didn't commit the non-working version and I didn't want to mess up my working code so I tried duplicating the dtype in a new file and couldn't recreate the error. I suppose the answer to my question is, there is no limit to the number of records? Must have been an invalid name, or a different error on my part. Out of curiosity, what does recarray consider an invalid field name? On Aug 2, 2011, at 12:31 PM, Skipper Seabold wrote: > On Tue, Aug 2, 2011 at 3:19 PM, Craig Yoshioka wrote: >> duplicate column in dtype? >> > > "Duplicate field names given."? Can you post code to replicate? > >> I just consolidated some of the columns and the error went away... none had duplicate field names... hence the question. >> > > I don't think this would be raised unless there are duplicates. There > is some name changing for invalid field names that could result in a > name collision. I think I've run into this before. > > Skipper > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jsseabold at gmail.com Tue Aug 2 16:18:01 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 2 Aug 2011 16:18:01 -0400 Subject: [Numpy-discussion] limit to number of fields in recarray In-Reply-To: <04693AFD-A761-45DC-8E20-074064B82F36@me.com> References: <3D27C63E-6C4A-4908-B2E5-0CA01EDB53A2@me.com> <59DEC051-5161-49B2-9577-8C873A89CB3C@gmail.com> <58F1BDB2-4767-4E4B-9E33-0E04F80C1D52@me.com> <04693AFD-A761-45DC-8E20-074064B82F36@me.com> Message-ID: On Tue, Aug 2, 2011 at 4:09 PM, Craig Yoshioka wrote: > yup, duplicate field names given. ?I didn't commit the non-working version and I didn't want to mess up my working code so I tried duplicating the dtype in a new file and couldn't recreate the error. ? I suppose the answer to my question is, there is no limit to the number of records? ?Must have been an invalid name, or a different error on my part. 
?Out of curiosity, what does recarray consider an invalid field name? I guess this checking is only done in genfromtxt and that's where I recall coming across it. http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html#validating-names Skipper From jlconlin at gmail.com Tue Aug 2 16:40:01 2011 From: jlconlin at gmail.com (Jeremy Conlin) Date: Tue, 2 Aug 2011 14:40:01 -0600 Subject: [Numpy-discussion] Finding many ways to incorrectly create a numpy array. Please advice In-Reply-To: <4E3830C6.1060902@noaa.gov> References: <4E3830C6.1060902@noaa.gov> Message-ID: On Tue, Aug 2, 2011 at 11:15 AM, Christopher Barker wrote: > On 8/2/11 8:38 AM, Jeremy Conlin wrote: >> Thanks, Brett. Using StringIO and numpy.loadtxt worked great. I'm >> still curious why what I was doing didn't work. Everything I can see >> indicates it should work. > > In [11]: tfc_dtype > Out[11]: dtype([('nps', '>u8'), ('t', '>f8'), ('e', '>f8'), ('fom', '>f8')]) > > In [15]: n = numpy.fromstring(l, dtype=tfc_dtype, sep=' ') > --------------------------------------------------------------------------- > ValueError ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?Traceback (most recent call last) > > /Users/cbarker/ in () > > ValueError: don't know how to read character strings with that array type > > means just what it says. In theory, numpy.fromstring() (and fromfile() ) > provides a way to quickly and efficiently generate arrays from text, but > it practice, the code is quite limited (and has a bug or two). I don't > think anyone has gotten around to writing the code to use structured > dtypes with it -- so it can't do what you want (rational though that > expectation is) > > In [21]: words > Out[21]: ['32000', '7.89131E-01', '8.05999E-03', '3.88222E+03'] > > In [22]: p = > Display all 249 possibilities? (y or n) > > In [22]: p = numpy.array(words, dtype=tfc_dtype) > > In [23]: p > Out[23]: > array([(3689064028291727360L, 0.0, 0.0, 0.0), > ? ? ? ?(3976177339304456517L, 4.967820413490985e-91, 0.0, 0.0), > ? ? ? ?(4048226120204106053L, 4.970217431784588e-91, 0.0, 0.0), > ? ? ? ?(3687946958874489413L, 1.1572189237420885e-100, 0.0, 0.0)], > ? ? ? dtype=[('nps', '>u8'), ('t', '>f8'), ('e', '>f8'), ('fom', '>f8')]) > > similarly here -- converting from text to structured dtypes is not fully > supported > > In [29]: a > Out[29]: [32000, 0.789131, 0.00805999, 3882.22] > > In [30]: r = numpy.array(a) > > In [31]: r > Out[31]: > array([ ?3.20000000e+04, ? 7.89131000e-01, ? 8.05999000e-03, > ? ? ? ? ?3.88222000e+03]) > > sure -- numpy's default behavior is to find a dtype that will hold all > the input array -- this pre-dates structured dtypes, and probably what > you would want b default anyway. > > In [32]: s = numpy.array(a, dtype=tfc_dtype) > --------------------------------------------------------------------------- > TypeError ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Traceback (most recent call last) > > /Users/cbarker/ in () > > TypeError: expected a readable buffer object > > OK -- I can see why you'd expect that to work. However, the trick with > structured dtypes is that the dimensionality of the inputs can be less > than obvious -- you are passing in a 1-d list of 4 numbers -- do you > want a 1-d array? or ? -- in this case, it's pretty obvious (as a human) > what you would want -- you have a dtype with four fields, and you're > passing in four numbers, but there are so many possible combinations > that numpy doesn't try to be "smart" about it. So as a rule, you need to > be quite specific when working with structured dtypes. 
> > However, the default is for numpy to map tuples to dtypes, so if you > pass in a tuple instead, it works: > > In [34]: t = tuple(a) > > In [35]: s = numpy.array(t, dtype=tfc_dtype) > > In [36]: s > Out[36]: > array((32000L, 0.789131, 0.00805999, 3882.22), > ? ? ? dtype=[('nps', '>u8'), ('t', '>f8'), ('e', '>f8'), ('fom', '>f8')]) > > you were THIS close! > > -Chris > > > > > > > -- > Christopher Barker, Ph.D. > Oceanographer Chris, Thanks for that information. It helps greatly in understanding what is happening. Next time I'll put my data into tuples. Jeremy From fperez.net at gmail.com Wed Aug 3 03:40:46 2011 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 3 Aug 2011 00:40:46 -0700 Subject: [Numpy-discussion] [ANN] IPython 0.11 is officially out In-Reply-To: References: Message-ID: On Sun, Jul 31, 2011 at 10:19 AM, Fernando Perez wrote: > Please see our release notes for the full details on everything about > this release: https://github.com/ipython/ipython/zipball/rel-0.11 And embarrassingly, that URL was for a zip download instead (copy/paste error), the detailed release notes are here: http://ipython.org/ipython-doc/rel-0.11/whatsnew/version0.11.html Sorry about the mistake... Cheers, f From Chris.Barker at noaa.gov Wed Aug 3 11:50:11 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 03 Aug 2011 08:50:11 -0700 Subject: [Numpy-discussion] Finding many ways to incorrectly create a numpy array. Please advice In-Reply-To: References: <4E3830C6.1060902@noaa.gov> Message-ID: <4E396E33.4070503@noaa.gov> On 8/2/11 1:40 PM, Jeremy Conlin wrote: > Thanks for that information. It helps greatly in understanding what is > happening. Next time I'll put my data into tuples. I don't remember where they all are, but there are a few places in numpy where tuples and lists are interpreted differently (fancy indexing?). It kind of breaks python "duck typing" (a sequence is a sequence), but it's useful, too. So when a list fails to do what you want, try a tuple. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From kikocorreoso at gmail.com Wed Aug 3 12:30:29 2011 From: kikocorreoso at gmail.com (Kiko) Date: Wed, 3 Aug 2011 18:30:29 +0200 Subject: [Numpy-discussion] Reading a big netcdf file Message-ID: Hi. I'm trying to read a big netcdf file (445 Mb) using netcdf4-python. The data are described as: *The GEBCO gridded data set is stored in NetCDF as a one dimensional array of 2-byte signed integers that represent integer elevations in metres. The complete data set gives global coverage. It consists of 21601 x 10801 data values, one for each one minute of latitude and longitude for 233312401 points. The data start at position 90?N, 180?W and are arranged in bands of 360 degrees x 60 points/degree + 1 = 21601 values. The data range eastward from 180?W longitude to 180?E longitude, i.e. the 180? value is repeated.* The problem is that it is very slow (or I am quite newbie). Anyone has a suggestion to get these data in a numpy array in a faster way? Thanks in advance. -------------- next part -------------- An HTML attachment was scrubbed... 
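For concreteness, the straightforward read looks roughly like this -- a sketch only; 'GridOne.grd' is the file name used later in this thread, and the reshape just follows the band layout described above (one band of 21601 longitude values per minute of latitude, starting at 90N):

import netCDF4

nc = netCDF4.Dataset('GridOne.grd', 'r')
z = nc.variables['z'][:]            # pulls the whole 1-D variable into memory
grid = z.reshape(10801, 21601)      # row 0 is the 90N band, column 0 is 180W

At two bytes per value that is roughly 445 MB as int16; whether it actually comes back as int16 or gets promoted to a float array is part of what the rest of the thread turns up.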
URL: From gokhansever at gmail.com Wed Aug 3 12:46:18 2011 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 3 Aug 2011 10:46:18 -0600 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: References: Message-ID: Here are my values for your comparison: test.nc file is about 715 MB. The details are below: In [21]: netCDF4.__version__ Out[21]: '0.9.4' In [22]: np.__version__ Out[22]: '2.0.0.dev-b233716' In [23]: from netCDF4 import Dataset In [24]: f = Dataset("test.nc") In [25]: f.variables['reflectivity'].shape Out[25]: (6, 18909, 506) In [26]: f.variables['reflectivity'].size Out[26]: 57407724 In [27]: f.variables['reflectivity'][:].dtype Out[27]: dtype('float32') In [28]: timeit z = f.variables['reflectivity'][:] 1 loops, best of 3: 731 ms per loop How long it takes in your side to read that big array? On Wed, Aug 3, 2011 at 10:30 AM, Kiko wrote: > Hi. > > I'm trying to read a big netcdf file (445 Mb) using netcdf4-python. > > The data are described as: > *The GEBCO gridded data set is stored in NetCDF as a one dimensional array > of 2-byte signed integers that represent integer elevations in metres. > The complete data set gives global coverage. It consists of 21601 x 10801 > data values, one for each one minute of latitude and longitude for 233312401 > points. > The data start at position 90?N, 180?W and are arranged in bands of 360 > degrees x 60 points/degree + 1 = 21601 values. The data range eastward from > 180?W longitude to 180?E longitude, i.e. the 180? value is repeated.* > > The problem is that it is very slow (or I am quite newbie). > > Anyone has a suggestion to get these data in a numpy array in a faster way? > > Thanks in advance. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Wed Aug 3 12:50:50 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 03 Aug 2011 09:50:50 -0700 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: References: Message-ID: <4E397C6A.8010101@noaa.gov> On 8/3/11 9:30 AM, Kiko wrote: > I'm trying to read a big netcdf file (445 Mb) using netcdf4-python. I've never noticed that netCDF4 was particularly slow for reading (writing can be pretty slow some times). How slow is slow? > The data are described as: please post the results of: ncdump -h the_file_name.nc So we can see if there is anything odd in the structure (though I don't know what it might be) Post your code (in the simnd pplest form you can). and post your timings and machine type Is the file netcdf4 or 3 format? (the python lib will read either) As a reference, reading that much data in from a raw file into a numpy array takes 2.57 on my machine (a rather old Mac, but disks haven't gotten much faster). YOu can test that like this: a = np.zeros((21601, 10801), dtype=np.uint16) a.tofile('temp.npa') del a timeit a = np.fromfile('temp.npa', dtype=np.uint16) (using ipython's timeit) -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From gokhansever at gmail.com Wed Aug 3 13:01:24 2011 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 3 Aug 2011 11:01:24 -0600 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: References: Message-ID: Just a few extra tests on my side pushing the limits of my system memory: In [34]: k = np.zeros((21601, 10801, 3), dtype='int16') k ndarray 21601x10801x3: 699937203 elems, type `int16`, 1399874406 bytes (1335 Mb) And for the first time my memory explodes with a hard kernel crash: In [36]: k = np.zeros((21601, 10801, 13), dtype='int16') Message from syslogd at ccn at Aug 3 10:51:43 ... kernel:[48715.531155] ------------[ cut here ]------------ Message from syslogd at ccn at Aug 3 10:51:43 ... kernel:[48715.531163] invalid opcode: 0000 [#1] SMP Message from syslogd at ccn at Aug 3 10:51:43 ... kernel:[48715.531166] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map Message from syslogd at ccn at Aug 3 10:51:43 ... kernel:[48715.531253] Stack: Message from syslogd at ccn at Aug 3 10:51:43 ... kernel:[48715.531265] Call Trace: Message from syslogd at ccn at Aug 3 10:51:43 ... kernel:[48715.531332] Code: be 33 01 00 00 48 89 fb 48 c7 c7 67 31 7a 81 e8 b0 2d f1 ff e8 90 f2 33 00 48 89 df e8 86 db 00 00 48 83 bb 60 01 00 00 00 74 02 <0f> 0b 48 8b 83 10 02 00 00 a8 20 75 02 0f 0b a8 40 74 02 0f 0b On Wed, Aug 3, 2011 at 10:46 AM, G?khan Sever wrote: > Here are my values for your comparison: > > test.nc file is about 715 MB. The details are below: > > In [21]: netCDF4.__version__ > Out[21]: '0.9.4' > > In [22]: np.__version__ > Out[22]: '2.0.0.dev-b233716' > > In [23]: from netCDF4 import Dataset > > In [24]: f = Dataset("test.nc") > > In [25]: f.variables['reflectivity'].shape > Out[25]: (6, 18909, 506) > > In [26]: f.variables['reflectivity'].size > Out[26]: 57407724 > > In [27]: f.variables['reflectivity'][:].dtype > Out[27]: dtype('float32') > > In [28]: timeit z = f.variables['reflectivity'][:] > 1 loops, best of 3: 731 ms per loop > > How long it takes in your side to read that big array? > > On Wed, Aug 3, 2011 at 10:30 AM, Kiko wrote: > >> Hi. >> >> I'm trying to read a big netcdf file (445 Mb) using netcdf4-python. >> >> The data are described as: >> *The GEBCO gridded data set is stored in NetCDF as a one dimensional >> array of 2-byte signed integers that represent integer elevations in metres. >> >> The complete data set gives global coverage. It consists of 21601 x 10801 >> data values, one for each one minute of latitude and longitude for 233312401 >> points. >> The data start at position 90?N, 180?W and are arranged in bands of 360 >> degrees x 60 points/degree + 1 = 21601 values. The data range eastward from >> 180?W longitude to 180?E longitude, i.e. the 180? value is repeated.* >> >> The problem is that it is very slow (or I am quite newbie). >> >> Anyone has a suggestion to get these data in a numpy array in a faster >> way? >> >> Thanks in advance. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > G?khan > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
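A cheap sanity check before an allocation like that is to work out the byte count first instead of letting np.zeros discover it the hard way -- a small sketch:

import numpy as np

shape = (21601, 10801, 13)
nbytes = np.prod(shape, dtype=np.int64) * np.dtype('int16').itemsize
print('%.1f GB' % (nbytes / 1e9))   # ~6.1 GB for the case that crashed above

How gracefully the system then refuses an oversized request is another matter, as the syslog messages above show.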
URL: From ijstokes at hkl.hms.harvard.edu Wed Aug 3 14:09:07 2011 From: ijstokes at hkl.hms.harvard.edu (Ian Stokes-Rees) Date: Wed, 03 Aug 2011 14:09:07 -0400 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: <4E397C6A.8010101@noaa.gov> References: <4E397C6A.8010101@noaa.gov> Message-ID: <4E398EC3.4080800@hkl.hms.harvard.edu> On 8/3/11 12:50 PM, Christopher Barker wrote: > As a reference, reading that much data in from a raw file into a numpy > array takes 2.57 on my machine (a rather old Mac, but disks haven't > gotten much faster). 2.57 seconds? or minutes? If seconds, does it actually read the whole thing into memory in that time, or is there some kind of delayed read going on? Ian -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: ijstokes.vcf Type: text/x-vcard Size: 380 bytes Desc: not available URL: From Chris.Barker at noaa.gov Wed Aug 3 14:38:08 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 03 Aug 2011 11:38:08 -0700 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: <4E398EC3.4080800@hkl.hms.harvard.edu> References: <4E397C6A.8010101@noaa.gov> <4E398EC3.4080800@hkl.hms.harvard.edu> Message-ID: <4E399590.3020203@noaa.gov> On 8/3/11 11:09 AM, Ian Stokes-Rees wrote: > On 8/3/11 12:50 PM, Christopher Barker wrote: >> As a reference, reading that much data in from a raw file into a numpy >> array takes 2.57 on my machine (a rather old Mac, but disks haven't >> gotten much faster). > > 2.57 seconds? or minutes? sorry -- seconds. >If seconds, does it actually read the whole > thing into memory in that time, or is there some kind of delayed read > going on? I think it reads it all in. However, now that you bring it up, I think "timeit" does it a few times, and after the first time, there may well be disk cache that speeds things up. In fact, as I recently wrote the file, there may be disk cache issues even on the first read. I'm no timing expert, but there must be ways to get a clean time. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Wed Aug 3 14:40:08 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 03 Aug 2011 11:40:08 -0700 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: References: Message-ID: <4E399608.6000005@noaa.gov> On 8/3/11 9:46 AM, G?khan Sever wrote: > In [23]: from netCDF4 import Dataset > > In [24]: f = Dataset("test.nc ") > > In [25]: f.variables['reflectivity'].shape > Out[25]: (6, 18909, 506) > > In [26]: f.variables['reflectivity'].size > Out[26]: 57407724 > > In [27]: f.variables['reflectivity'][:].dtype > Out[27]: dtype('float32') > > In [28]: timeit z = f.variables['reflectivity'][:] > 1 loops, best of 3: 731 ms per loop that seems pretty fast, actually -- are you sure that [:] forces the full data read? It probably does, but I'm not totally sure. is "z" a numpy array object at that point? -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From gokhansever at gmail.com Wed Aug 3 16:57:19 2011 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 3 Aug 2011 14:57:19 -0600 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: <4E397C6A.8010101@noaa.gov> References: <4E397C6A.8010101@noaa.gov> Message-ID: This is what I get here: In [1]: a = np.zeros((21601, 10801), dtype=np.uint16) In [2]: a.tofile('temp.npa') In [3]: del a In [4]: timeit a = np.fromfile('temp.npa', dtype=np.uint16) 1 loops, best of 3: 251 ms per loop On Wed, Aug 3, 2011 at 10:50 AM, Christopher Barker wrote: > On 8/3/11 9:30 AM, Kiko wrote: > > I'm trying to read a big netcdf file (445 Mb) using netcdf4-python. > > I've never noticed that netCDF4 was particularly slow for reading > (writing can be pretty slow some times). How slow is slow? > > > The data are described as: > > please post the results of: > > ncdump -h the_file_name.nc > > So we can see if there is anything odd in the structure (though I don't > know what it might be) > > Post your code (in the simnd pplest form you can). > > and post your timings and machine type > > Is the file netcdf4 or 3 format? (the python lib will read either) > > As a reference, reading that much data in from a raw file into a numpy > array takes 2.57 on my machine (a rather old Mac, but disks haven't > gotten much faster). YOu can test that like this: > > a = np.zeros((21601, 10801), dtype=np.uint16) > > a.tofile('temp.npa') > > del a > > timeit a = np.fromfile('temp.npa', dtype=np.uint16) > > (using ipython's timeit) > > -Chris > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From gokhansever at gmail.com Wed Aug 3 17:02:28 2011 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 3 Aug 2011 15:02:28 -0600 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: <4E399608.6000005@noaa.gov> References: <4E399608.6000005@noaa.gov> Message-ID: I think these answer your questions. 
In [3]: type f.variables['reflectivity'] ------> type(f.variables['reflectivity']) Out[3]: In [4]: type f.variables['reflectivity'][:] ------> type(f.variables['reflectivity'][:]) Out[4]: In [5]: z = f.variables['reflectivity'][:] In [6]: type z ------> type(z) Out[6]: In [10]: id f.variables['reflectivity'][:] -------> id(f.variables['reflectivity'][:]) Out[10]: 37895488 In [11]: id z -------> id(z) Out[11]: 37901440 On Wed, Aug 3, 2011 at 12:40 PM, Christopher Barker wrote: > On 8/3/11 9:46 AM, G?khan Sever wrote: > > In [23]: from netCDF4 import Dataset > > > > In [24]: f = Dataset("test.nc ") > > > > In [25]: f.variables['reflectivity'].shape > > Out[25]: (6, 18909, 506) > > > > In [26]: f.variables['reflectivity'].size > > Out[26]: 57407724 > > > > In [27]: f.variables['reflectivity'][:].dtype > > Out[27]: dtype('float32') > > > > In [28]: timeit z = f.variables['reflectivity'][:] > > 1 loops, best of 3: 731 ms per loop > > that seems pretty fast, actually -- are you sure that [:] forces the > full data read? It probably does, but I'm not totally sure. > > is "z" a numpy array object at that point? > > -Chris > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Wed Aug 3 17:15:06 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 03 Aug 2011 14:15:06 -0700 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: References: <4E397C6A.8010101@noaa.gov> Message-ID: <4E39BA5A.5070806@noaa.gov> On 8/3/11 1:57 PM, G?khan Sever wrote: > This is what I get here: > > In [1]: a = np.zeros((21601, 10801), dtype=np.uint16) > > In [2]: a.tofile('temp.npa') > > In [3]: del a > > In [4]: timeit a = np.fromfile('temp.npa', dtype=np.uint16) > 1 loops, best of 3: 251 ms per loop so that's about 10 times faster than my machine. I didn't think disks had gotten much faster -- they are still generally 7200 rpm (or slower in laptops). So I've either got a really slow disk, or you have a really fast one (or both), or maybe you're getting cache effect, as you wrote the file just before reading it. repeating, doing just what you did: In [8]: timeit a = np.fromfile('temp.npa', dtype=np.uint16) 1 loops, best of 3: 2.53 s per loop then I wrote a bunch of others to disk, and tried again: In [17]: timeit a = np.fromfile('temp.npa', dtype=np.uint16) 1 loops, best of 3: 2.45 s per loop so ti seems I'm not seeing cache effects, but maybe you are. Anyway, we haven't heard from the OP -- I'm not sure what s/he thought was slow. -Chris > > On Wed, Aug 3, 2011 at 10:50 AM, Christopher Barker > > wrote: > > On 8/3/11 9:30 AM, Kiko wrote: > > I'm trying to read a big netcdf file (445 Mb) using netcdf4-python. > > I've never noticed that netCDF4 was particularly slow for reading > (writing can be pretty slow some times). How slow is slow? > > > The data are described as: > > please post the results of: > > ncdump -h the_file_name.nc > > So we can see if there is anything odd in the structure (though I don't > know what it might be) > > Post your code (in the simnd pplest form you can). 
> > and post your timings and machine type > > Is the file netcdf4 or 3 format? (the python lib will read either) > > As a reference, reading that much data in from a raw file into a numpy > array takes 2.57 on my machine (a rather old Mac, but disks haven't > gotten much faster). YOu can test that like this: > > a = np.zeros((21601, 10801), dtype=np.uint16) > > a.tofile('temp.npa') > > del a > > timeit a = np.fromfile('temp.npa', dtype=np.uint16) > > (using ipython's timeit) > > -Chris > > > > -- > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main > reception > > Chris.Barker at noaa.gov > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > -- > G?khan > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From gokhansever at gmail.com Wed Aug 3 17:24:33 2011 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 3 Aug 2011 15:24:33 -0600 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: <4E39BA5A.5070806@noaa.gov> References: <4E397C6A.8010101@noaa.gov> <4E39BA5A.5070806@noaa.gov> Message-ID: On Wed, Aug 3, 2011 at 3:15 PM, Christopher Barker wrote: > On 8/3/11 1:57 PM, G?khan Sever wrote: > > This is what I get here: > > > > In [1]: a = np.zeros((21601, 10801), dtype=np.uint16) > > > > In [2]: a.tofile('temp.npa') > > > > In [3]: del a > > > > In [4]: timeit a = np.fromfile('temp.npa', dtype=np.uint16) > > 1 loops, best of 3: 251 ms per loop > > so that's about 10 times faster than my machine. I didn't think disks > had gotten much faster -- they are still generally 7200 rpm (or slower > in laptops). > > So I've either got a really slow disk, or you have a really fast one (or > both), or maybe you're getting cache effect, as you wrote the file just > before reading it. > > repeating, doing just what you did: > > In [8]: timeit a = np.fromfile('temp.npa', dtype=np.uint16) > 1 loops, best of 3: 2.53 s per loop > > then I wrote a bunch of others to disk, and tried again: > > In [17]: timeit a = np.fromfile('temp.npa', dtype=np.uint16) > 1 loops, best of 3: 2.45 s per loop > > so ti seems I'm not seeing cache effects, but maybe you are. > > Anyway, we haven't heard from the OP -- I'm not sure what s/he thought > was slow. > > -Chris In [11]: a = np.zeros((21601, 10801), dtype=np.uint16) In [12]: a.tofile('temp.npa') In [13]: del a Quitting here and restarting IPython. (this should cut the caching effect isn't it?) I[1]: timeit a = np.fromfile('temp.npa', dtype=np.uint16) 1 loops, best of 3: 263 ms per loop #More information about my system: hdparm -I /dev/sda | grep Rotation Nominal Media Rotation Rate: 7200 uname -a #64-bit Fedora 14 Linux ccn 2.6.35.13-92.fc14.x86_64 #1 Filesystem(s) ext4 -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... 
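Restarting the interpreter does not change much here, by the way: the fast re-reads come from the operating system's page cache, which lives in the kernel, not in the Python process. A single-shot timing helper, as a sketch -- the cache still has to be dropped between runs to get a true cold-read number:

import time
import numpy as np

def timed_read(path, dtype=np.uint16):
    # For a genuinely cold read, drop the page cache first (Linux, as root):
    #   sync; echo 3 > /proc/sys/vm/drop_caches
    t0 = time.time()
    data = np.fromfile(path, dtype=dtype)
    print('%.0f MB in %.2f s' % (data.nbytes / 1e6, time.time() - t0))
    return data

# data = timed_read('temp.npa')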
URL: From warren.weckesser at enthought.com Wed Aug 3 18:02:16 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Wed, 3 Aug 2011 17:02:16 -0500 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: References: <4E397C6A.8010101@noaa.gov> <4E39BA5A.5070806@noaa.gov> Message-ID: On Wed, Aug 3, 2011 at 4:24 PM, G?khan Sever wrote: > > > On Wed, Aug 3, 2011 at 3:15 PM, Christopher Barker > wrote: >> >> On 8/3/11 1:57 PM, G?khan Sever wrote: >> > This is what I get here: >> > >> > In [1]: a = np.zeros((21601, 10801), dtype=np.uint16) >> > >> > In [2]: a.tofile('temp.npa') >> > >> > In [3]: del a >> > >> > In [4]: timeit a = np.fromfile('temp.npa', dtype=np.uint16) >> > 1 loops, best of 3: 251 ms per loop >> >> so that's about 10 times faster than my machine. I didn't think disks >> had gotten much faster -- they are still generally 7200 rpm (or slower >> in laptops). >> >> So I've either got a really slow disk, or you have a really fast one (or >> both), or maybe you're getting cache effect, as you wrote the file just >> before reading it. >> >> repeating, doing just what you did: >> >> In [8]: timeit a = np.fromfile('temp.npa', dtype=np.uint16) >> 1 loops, best of 3: 2.53 s per loop >> >> then I wrote a bunch of others to disk, and tried again: >> >> In [17]: timeit a = np.fromfile('temp.npa', dtype=np.uint16) >> 1 loops, best of 3: 2.45 s per loop >> >> so ti seems I'm not seeing cache effects, but maybe you are. >> >> Anyway, we haven't heard from the OP -- I'm not sure what s/he thought >> was slow. >> >> -Chris > > In [11]: a = np.zeros((21601, 10801), dtype=np.uint16) > In [12]: a.tofile('temp.npa') > In [13]: del a > Quitting here and restarting IPython. (this should cut the caching effect > isn't it?) Not necessarily. In Linux, this should do it: $ sync; echo 3 > /proc/sys/vm/drop_caches (Run as root, or use sudo.) Google for something like "linux reset disk cache" to find other variations. Warren > I[1]: timeit a = np.fromfile('temp.npa', dtype=np.uint16) > 1 loops, best of 3: 263 ms per loop > #More information about my system: > hdparm -I /dev/sda | grep Rotation > Nominal Media Rotation Rate: 7200 > uname -a ?#64-bit Fedora 14 > Linux ccn 2.6.35.13-92.fc14.x86_64 #1 > Filesystem(s) ext4 > -- > G?khan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From efiring at hawaii.edu Wed Aug 3 18:52:12 2011 From: efiring at hawaii.edu (Eric Firing) Date: Wed, 03 Aug 2011 12:52:12 -1000 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: References: <4E397C6A.8010101@noaa.gov> <4E39BA5A.5070806@noaa.gov> Message-ID: <4E39D11C.7010706@hawaii.edu> On 08/03/2011 11:24 AM, G?khan Sever wrote: > I[1]: timeit a = np.fromfile('temp.npa', dtype=np.uint16) > 1 loops, best of 3: 263 ms per loop You need to clear your cache and then run timeit with options "-n1 -r1". Eric From gokhansever at gmail.com Wed Aug 3 18:56:04 2011 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Wed, 3 Aug 2011 16:56:04 -0600 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: <4E39D11C.7010706@hawaii.edu> References: <4E397C6A.8010101@noaa.gov> <4E39BA5A.5070806@noaa.gov> <4E39D11C.7010706@hawaii.edu> Message-ID: Back to the reality. 
After clearing the cache using Warren's suggestion: In [1]: timeit -n1 -r1 a = np.fromfile('temp.npa', dtype=np.uint16) 1 loops, best of 1: 7.23 s per loop On Wed, Aug 3, 2011 at 4:52 PM, Eric Firing wrote: > On 08/03/2011 11:24 AM, G?khan Sever wrote: > > > I[1]: timeit a = np.fromfile('temp.npa', dtype=np.uint16) > > 1 loops, best of 3: 263 ms per loop > > You need to clear your cache and then run timeit with options "-n1 -r1". > > Eric > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From kikocorreoso at gmail.com Thu Aug 4 06:46:55 2011 From: kikocorreoso at gmail.com (Kiko) Date: Thu, 4 Aug 2011 12:46:55 +0200 Subject: [Numpy-discussion] Reading a big netcdf file Message-ID: Hi, all. Thank you very much for your replies. I am obtaining some issues. If I use netcdf4-python or scipy.io.netcdf libraries: In [4]: import netCDF4 as n4 In [5]: from scipy.io import netcdf as nS In [6]: import numpy as np In [7]: gebco4 = n4.Dataset('GridOne.grd', 'r') In [8]: gebcoS = nS.netcdf_file('GridOne.grd', 'r') Now, if a do: In [9]: z4 = gebco4.variables['z'] I got no problems and I have: In [14]: type(z4); z4.shape; z4.size Out[14]: Out[14]: (233312401,) Out[14]: 233312401 But if I do: In [15]: z4 = gebco4.variables['z'][:] ------------------------------------------------------------ Traceback (most recent call last): File "", line 1, in File "netCDF4.pyx", line 2466, in netCDF4.Variable.__getitem__ (netCDF4.c:22943) File "C:\Python26\lib\site-packages\netCDF4_utils.py", line 278, in _StartCountStride n = len(range(beg,end,inc)) MemoryError I got a memory error. But if a select a smaller array I've got: In [16]: z4 = gebco4.variables['z'][:10000000] In [17]: type(z4); z4.shape; z4.size Out[17]: Out[17]: (10000000,) Out[17]: 10000000 What's the difference between z4 as a netCDF4.Variable and as a numpy.ndarray? Now, if I use scipy.io.netcdf: In [18]: zS = gebcoS.variables['z'] In [20]: type(zS); zS.shape Out[20]: Out[20]: (233312401,) In [21]: zS = gebcoS.variables['z'][:] In [22]: type(zS); zS.shape Out[22]: Out[22]: (233312401,) What's the difference between zS as a scipy.io.netcdf.netcdf_variable and as a numpy.ndarray? Why with scipy.io.netcdf I do not have a MemoryError? Finally, if I do the following (maybe it's a silly thing do this) using Eric suggestions to clear the cache: In [32]: zS = gebcoS.variables['z'] In [38]: timeit -n1 -r1 zSS = np.array(zS[:100000000]) # 100.000.000 out of 233.312.401 because I've got a MemoryError 1 loops, best of 1: 73.1 s per loop (If I use a copy, timeit -n1 -r1 zSS = np.array(zS[:100000000], copy=True), I get a MemoryError and I have to set the size to 50.000.000 but it's quite fast). Than you very much for your replies and excuse me if some questions are very basic. Best regards. *********************************************************************** The results of ncdump -h netcdf GridOne { dimensions: side = 2 ; xysize = 233312401 ; variables: double x_range(side) ; x_range:units = "user_x_unit" ; double y_range(side) ; y_range:units = "user_y_unit" ; short z_range(side) ; z_range:units = "user_z_unit" ; double spacing(side) ; short dimension(side) ; short z(xysize) ; z:scale_factor = 1. ; z:add_offset = 0. 
; z:node_offset = 0 ; // global attributes: :title = "GEBCO One Minute Grid" ; :source = "1.02" ; } The file is publicly available from: http://www.gebco.net/data_and_products/gridded_bathymetry_data/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jswhit at fastmail.fm Thu Aug 4 11:53:03 2011 From: jswhit at fastmail.fm (Jeff Whitaker) Date: Thu, 04 Aug 2011 09:53:03 -0600 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: References: Message-ID: <4E3AC05F.2040006@fastmail.fm> On 8/4/11 4:46 AM, Kiko wrote: > Hi, all. > > Thank you very much for your replies. > > I am obtaining some issues. If I use netcdf4-python or scipy.io.netcdf > libraries: > > In [4]: import netCDF4 as n4 > In [5]: from scipy.io import netcdf as nS > In [6]: import numpy as np > In [7]: gebco4 = n4.Dataset('GridOne.grd', 'r') > In [8]: gebcoS = nS.netcdf_file('GridOne.grd', 'r') > > Now, if a do: > > In [9]: z4 = gebco4.variables['z'] > > I got no problems and I have: > > In [14]: type(z4); z4.shape; z4.size > Out[14]: > Out[14]: (233312401,) > Out[14]: 233312401 > > But if I do: > > In [15]: z4 = gebco4.variables['z'][:] > ------------------------------------------------------------ > Traceback (most recent call last): > File "", line 1, in > File "netCDF4.pyx", line 2466, in netCDF4.Variable.__getitem__ > (netCDF4.c:22943) > File "C:\Python26\lib\site-packages\netCDF4_utils.py", line 278, in > _StartCountStride > n = len(range(beg,end,inc)) > MemoryError > > I got a memory error. Kiko: I think the difference may be that when you read the data with netcdf4-python, it tries to unpack the short integers to a float32 array, thereby using much more memory (more than you have available). scipy.io.netcdf is just returning you a numpy array of short integers. I bet if you do gebco4.set_automaskandscale(False) before reading the data from the getco4 variable, it will work, since this turns off the auto conversion to float32. You'll have to do the conversion manually then, at which point you will may run out of memory anyway. > But if a select a smaller array I've got: > > In [16]: z4 = gebco4.variables['z'][:10000000] > In [17]: type(z4); z4.shape; z4.size > Out[17]: > Out[17]: (10000000,) > Out[17]: 10000000 > > What's the difference between z4 as a netCDF4.Variable and as a > numpy.ndarray? the netcdf variable object just refers to the data in the file - only when you slice the object is the data read in and converted to a numpy array. -Jeff > > Now, if I use scipy.io.netcdf: > > In [18]: zS = gebcoS.variables['z'] > In [20]: type(zS); zS.shape > Out[20]: > Out[20]: (233312401,) > > In [21]: zS = gebcoS.variables['z'][:] > In [22]: type(zS); zS.shape > Out[22]: > Out[22]: (233312401,) > > What's the difference between zS as a scipy.io.netcdf.netcdf_variable > and as a numpy.ndarray? > Why with scipy.io.netcdf I do not have a MemoryError? > > Finally, if I do the following (maybe it's a silly thing do this) > using Eric suggestions to clear the cache: > > In [32]: zS = gebcoS.variables['z'] > In [38]: timeit -n1 -r1 zSS = np.array(zS[:100000000]) # 100.000.000 > out of 233.312.401 because I've got a MemoryError > 1 loops, best of 1: 73.1 s per loop > > (If I use a copy, timeit -n1 -r1 zSS = np.array(zS[:100000000], > copy=True), I get a MemoryError and I have to set the size to > 50.000.000 but it's quite fast). > > Than you very much for your replies and excuse me if some questions > are very basic. > > Best regards. 
> > *********************************************************************** > The results of ncdump -h > netcdf GridOne { > dimensions: > side = 2 ; > xysize = 233312401 ; > variables: > double x_range(side) ; > x_range:units = "user_x_unit" ; > double y_range(side) ; > y_range:units = "user_y_unit" ; > short z_range(side) ; > z_range:units = "user_z_unit" ; > double spacing(side) ; > short dimension(side) ; > short z(xysize) ; > z:scale_factor = 1. ; > z:add_offset = 0. ; > z:node_offset = 0 ; > > // global attributes: > :title = "GEBCO One Minute Grid" ; > :source = "1.02" ; > } > > The file is publicly available from: > http://www.gebco.net/data_and_products/gridded_bathymetry_data/ > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Thu Aug 4 13:02:19 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 04 Aug 2011 10:02:19 -0700 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: References: Message-ID: <4E3AD09B.6020406@noaa.gov> On 8/4/11 3:46 AM, Kiko wrote: > In [9]: z4 = gebco4.variables['z'] > > I got no problems and I have: > > In [14]: type(z4); z4.shape; z4.size > Out[14]: > Out[14]: (233312401,) > Out[14]: 233312401 > > But if I do: > > In [15]: z4 = gebco4.variables['z'][:] > MemoryError > What's the difference between z4 as a netCDF4.Variable and as a > numpy.ndarray? a netCDF4.Variable is an object that holds the properties of the variable, but does not actually load the dat from the file into memory until it is needed, so, it doesn't matter how big the data is at this point. > The results of ncdump -h ... > short z_range(side) ; > z_range:units = "user_z_unit" ; On 8/4/11 8:53 AM, Jeff Whitaker wrote: > Kiko: I think the difference may be that when you read the data with > netcdf4-python, it tries to unpack the short integers to a float32 > array. Jeff, why is that? is it an netcdf4 convention? I always thought that the netcdf data model matched numpy's quite well, including the clear choice and specification of data type. I guess I've mostly used float data anyway, so hadn't noticed this, but ti comes as a surprise to me! > gebco4.set_automaskandscale(False) > before reading the data from the getco4 variable, it will work, since > this turns off the auto conversion to float32. Thanks -- I'll have to remember that. > You'll have to do the conversion manually then, at which point you will > may run out of memory anyway. why would you have to do the conversion at all? (OK, you may, depending on your use case, but for the most part, data stored in a file as an integer type would be suitable for use in an integer array) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Thu Aug 4 13:04:16 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 04 Aug 2011 10:04:16 -0700 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: References: <4E397C6A.8010101@noaa.gov> <4E39BA5A.5070806@noaa.gov> <4E39D11C.7010706@hawaii.edu> Message-ID: <4E3AD110.7000608@noaa.gov> On 8/3/11 3:56 PM, G?khan Sever wrote: > Back to the reality. 
After clearing the cache using Warren's suggestion: > > In [1]: timeit -n1 -r1 a = np.fromfile('temp.npa', dtype=np.uint16) > 1 loops, best of 1: 7.23 s per loop yup -- that cache sure can be handy! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Thu Aug 4 15:26:29 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 04 Aug 2011 12:26:29 -0700 Subject: [Numpy-discussion] Reading a big netcdf file In-Reply-To: <4E3AD09B.6020406@noaa.gov> References: <4E3AD09B.6020406@noaa.gov> Message-ID: <4E3AF265.9020802@noaa.gov> On 8/4/11 10:02 AM, Christopher Barker wrote: > On 8/4/11 8:53 AM, Jeff Whitaker wrote: >> Kiko: I think the difference may be that when you read the data with >> netcdf4-python, it tries to unpack the short integers to a float32 >> array. > > Jeff, why is that? is it an netcdf4 convention? I always thought that > the netcdf data model matched numpy's quite well, including the clear > choice and specification of data type. I guess I've mostly used float > data anyway, so hadn't noticed this, but ti comes as a surprise to me! > > > gebco4.set_automaskandscale(False) OK -- looked at this a bit more, and see in the OP's ncdump: variables: short z(xysize) ; z:scale_factor = 1. ; z:add_offset = 0. ; z:node_offset = 0 ; so I presume netCDF4 is seeing the scale_factor and offsets, and thus converting to float. In this case, the scale factor is 1.0, and the offsets are 0.0, so there isn't any need to convert, but that may be too smart! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From derek at astro.physik.uni-goettingen.de Thu Aug 4 19:08:51 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Fri, 5 Aug 2011 01:08:51 +0200 Subject: [Numpy-discussion] longlong format error with Python <= 2.6 in scalartypes.c Message-ID: <8D5A8864-6827-4164-B8F6-198000B7491D@astro.physik.uni-goettingen.de> Hi, commits c15a807e and c135371e (thus most immediately addressed to Mark, but I am sending this to the list hoping for more insight on the issue) introduce a test failure with Python 2.5+2.6 on Mac: FAIL: test_timedelta_scalar_construction (test_datetime.TestDateTime) ---------------------------------------------------------------------- Traceback (most recent call last): File "/Users/derek/lib/python2.6/site-packages/numpy/core/tests/test_datetime.py", line 219, in test_timedelta_scalar_construction assert_equal(str(np.timedelta64(3, 's')), '3 seconds') File "/Users/derek/lib/python2.6/site-packages/numpy/testing/utils.py", line 313, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: ACTUAL: '%lld seconds' DESIRED: '3 seconds' due to the "lld" format passed to PyUString_FromFormat in scalartypes.c. In the current npy_common.h I found the comment * in Python 2.6 the %lld formatter is not supported. In this * case we work around the problem by using the %zd formatter. though I did not notice that problem when I cleaned up the NPY_LONGLONG_FMT definitions in that file (and it is not entirely clear whether the comment only pertains to Windows...). 
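A quick, purely illustrative way to exercise the failure on an affected
interpreter is below -- the extra values and units are arbitrary, and it
assumes a current dev build where timedelta64 accepts a unit argument:

import numpy as np
# On an affected Python <= 2.6 build the raw "%lld" specifier leaks
# through into the string instead of the formatted number.
for n, unit in [(3, 's'), (10, 'm'), (7, 'D')]:
    s = str(np.timedelta64(n, unit))
    assert '%' not in s, "format specifier leaked into the repr: %r" % s
    print(s)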
Anyway changing the formatters in scalartypes.c to "zd" as well removes the failure and still works with Python 2.7 and 3.2 (at least on Mac OS). However I am wondering if a) NPY_[U]LONGLONG_FMT should also be defined conditional to the Python version (and if "%zu" is a valid formatter), and b) scalartypes.c should use NPY_LONGLONG_FMT from npy_common.h I am attaching a patch implementing a), but only the quick and dirty solution to b). Cheers, Derek -------------- next part -------------- A non-text attachment was scrubbed... Name: npy_longlong_fmt.patch Type: application/octet-stream Size: 2151 bytes Desc: not available URL: From morph at debian.org Fri Aug 5 19:37:56 2011 From: morph at debian.org (Sandro Tosi) Date: Sat, 6 Aug 2011 01:37:56 +0200 Subject: [Numpy-discussion] Error building numpy (1.5.1 and 1.6.1rc3) with python2.7 debug In-Reply-To: References: Message-ID: Hello, On Sat, Jul 16, 2011 at 22:45, Bruce Southey wrote: > On Sat, Jul 16, 2011 at 4:34 AM, Sandro Tosi wrote: >> Hello, >> while preparing a test upload for 1.6.1rc3 in Debian, I noticed that >> it gets an error when building blas with python 2.7 in the debug >> flavor, the build log is at [1]. It's also been confirmed it fails >> also with 1.5.1 [2] >> >> [1] http://people.debian.org/~morph/python-numpy_1.6.1~rc3-1_amd64.build >> [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=634012 >> >> I think it might be a toolchain change in Debian (since 1.5.1 was >> built successfully and now it fails), but could you please give me a >> hand in debugging the issue? >> >> Thanks in advance, >> -- >> Sandro Tosi (aka morph, morpheus, matrixhasu) >> My website: http://matrixhasu.altervista.org/ >> Me at Debian: http://wiki.debian.org/SandroTosi >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > Hi, > What do you mean by 'python2.7 debug'? > > Numpy 1.6.1rc's and earlier build and install with Python 2.7 build in > debug mode ($ ./configure --with-pydebug > ) on 64-bit Fedora 14 and 15. But, if I can follow you build process > (should be the plain 'python setup.py build' to be useful) I think > numpy is not finding the correct blas/lapack/atlas libraries so either > you may need a site.cfg for that system or install those in the Linux > standard locations such as /usr/lib64. > > You should probably try building without blas, lapack and atlas etc.: > BLAS=None LAPACK=None ATLAS=None python setup.py build It's not a matter of not finding the headers: the same build process succeeds if run using gfortran-4.5 while fails if run with gfortran-4.6 , it's likely that gcc is more strict now and something needs to be adapted in numpy. Has someone successfully built numpy with gcc 4.6 ? 
Cheers, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From charlesr.harris at gmail.com Sat Aug 6 00:25:04 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 5 Aug 2011 22:25:04 -0600 Subject: [Numpy-discussion] Error building numpy (1.5.1 and 1.6.1rc3) with python2.7 debug In-Reply-To: References: Message-ID: On Fri, Aug 5, 2011 at 5:37 PM, Sandro Tosi wrote: > Hello, > > On Sat, Jul 16, 2011 at 22:45, Bruce Southey wrote: > > On Sat, Jul 16, 2011 at 4:34 AM, Sandro Tosi wrote: > >> Hello, > >> while preparing a test upload for 1.6.1rc3 in Debian, I noticed that > >> it gets an error when building blas with python 2.7 in the debug > >> flavor, the build log is at [1]. It's also been confirmed it fails > >> also with 1.5.1 [2] > >> > >> [1] > http://people.debian.org/~morph/python-numpy_1.6.1~rc3-1_amd64.build > >> [2] http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=634012 > >> > >> I think it might be a toolchain change in Debian (since 1.5.1 was > >> built successfully and now it fails), but could you please give me a > >> hand in debugging the issue? > >> > >> Thanks in advance, > >> -- > >> Sandro Tosi (aka morph, morpheus, matrixhasu) > >> My website: http://matrixhasu.altervista.org/ > >> Me at Debian: http://wiki.debian.org/SandroTosi > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > Hi, > > What do you mean by 'python2.7 debug'? > > > > Numpy 1.6.1rc's and earlier build and install with Python 2.7 build in > > debug mode ($ ./configure --with-pydebug > > ) on 64-bit Fedora 14 and 15. But, if I can follow you build process > > (should be the plain 'python setup.py build' to be useful) I think > > numpy is not finding the correct blas/lapack/atlas libraries so either > > you may need a site.cfg for that system or install those in the Linux > > standard locations such as /usr/lib64. > > > > You should probably try building without blas, lapack and atlas etc.: > > BLAS=None LAPACK=None ATLAS=None python setup.py build > > It's not a matter of not finding the headers: the same build process > succeeds if run using gfortran-4.5 while fails if run with > gfortran-4.6 , it's likely that gcc is more strict now and something > needs to be adapted in numpy. > > Has someone successfully built numpy with gcc 4.6 ? > > Yes, all the time ;) gcc version 4.6.0 20110603 (Red Hat 4.6.0-10) (GCC) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.s.seljebotn at astro.uio.no Sat Aug 6 05:18:53 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sat, 06 Aug 2011 11:18:53 +0200 Subject: [Numpy-discussion] [ANN] Cython 0.15 Message-ID: <4E3D06FD.9030701@astro.uio.no> We are excited to announce the release of Cython 0.15, which is a huge step forward in achieving full Python language coverage as well as many new features, optimizations, and bugfixes. Download: http://cython.org/ or http://pypi.python.org/pypi/Cython == Major Features == * Generators (yield) - Cython has full support for generators, generator expressions and coroutines. http://www.python.org/dev/peps/pep-0342/ * The nonlocal keyword is supported. * Re-acquiring the gil: with gil - works as expected within a nogil context. 
* OpenMP support: http://docs.cython.org/0.15/src/userguide/parallelism.html * Control flow analysis prunes dead code and emits warnings and errors about uninitialised variables. * Debugger command cy set to assign values of expressions to Cython variables and cy exec counterpart cy_eval(). * Exception chaining http://www.python.org/dev/peps/pep-3134/ * Relative imports http://www.python.org/dev/peps/pep-0328/ * The with statement has its own dedicated and faster C implementation. * Improved pure syntax including cython.cclass, cython.cfunc, and cython.ccall. http://docs.cython.org/0.15/src/tutorial/pure.html * Support for del. * Boundschecking directives implemented for builtin Python sequence types. * Several updates and additions to the shipped standard library pxd files https://github.com/cython/cython/tree/master/Cython/Includes * Forward declaration of types is no longer required for circular references. Note: this will be the last release to support Python 2.3; Python 2.4 will be supported for at least one more release. == General improvements and bug fixes == This release contains over a thousand commits including hundreds of bugfixes and optimizations. The bug tracker has not been as heavily used this release cycle, but is still useful http://trac.cython.org/cython_trac/query?status=closed&group=component&order=id&col=id&col=summary&col=milestone&col=status&col=type&col=priority&col=owner&col=component&milestone=0.15&desc=1 == Incompatible changes == * Uninitialized variables are no longer initialized to None and accessing them has the same semantics as standard Python. * globals() now returns a read-only dict of the Cython module's globals, rather than the globals of the first non-Cython module in the stack * Many C++ exceptions are now special cases to give closer Python counterparts. This means that except+ functions that formally raised generic RuntimeErrors may raise something else such as ArithmaticError. == Known regressions == * The inlined generator expressions (introduced in Cython 0.13) were disabled in favour of full generator expression support. This induces a performance regression for cases that were previously inlined. == Contributors == Many thanks to: Francesc Alted, Haoyu Bai, Stefan Behnel, Robert Bradshaw, Lars Buitinck, Lisandro Dalcin, John Ehresman, Mark Florisson, Christoph Gohlke, Jason Grout, Chris Lasher, Vitja Makarov, Brent Pedersen, Dag Sverre Seljebotn, Nathaniel Smith, and Pauli Virtanen From morph at debian.org Sat Aug 6 18:43:41 2011 From: morph at debian.org (Sandro Tosi) Date: Sun, 7 Aug 2011 00:43:41 +0200 Subject: [Numpy-discussion] Error building numpy (1.5.1 and 1.6.1rc3) with python2.7 debug In-Reply-To: References: Message-ID: On Sat, Aug 6, 2011 at 06:25, Charles R Harris wrote: > Yes, all the time ;) > > gcc version 4.6.0 20110603 (Red Hat 4.6.0-10) (GCC) Great, in fact it turned out it was a debian tool that went nuts :) (I was able to build _doblas by hand, so it's just a matter of configuration) The situation is this: - until recently, we had this command in our makefile: 'unexport LDFLAGS' that removes any presence of the variable LDFLAGS from the environment. 
- recently, a Debian-specific tool, started adding LDFLAGS (and other build variables) to the env, in a way no more controllable by the makefile - with that variable set, gfortran misses a '-shared' option and it generates the error I mentioned in the original email - I'm following the path to ask for that tool to be made more flexible, so to allow to "unset" those variables, but maybe I can workaround it patching the code (I know, I hate to diverge from upstream, but in extreme situations...), so I'd like to ask your guidance in thise :) it's probably something numpy/distutils/fcompiler/ but additional clues would be awesome :) Cheers, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From sturla at molden.no Sat Aug 6 22:09:01 2011 From: sturla at molden.no (Sturla Molden) Date: Sun, 07 Aug 2011 04:09:01 +0200 Subject: [Numpy-discussion] [ANN] Cython 0.15 In-Reply-To: <4E3D06FD.9030701@astro.uio.no> References: <4E3D06FD.9030701@astro.uio.no> Message-ID: <4E3DF3BD.5000703@molden.no> Den 06.08.2011 11:18, skrev Dag Sverre Seljebotn: > We are excited to announce the release of Cython 0.15, which is a huge > step forward in achieving full Python language coverage as well as > many new features, optimizations, and bugfixes. > > This is really great. With Cython progressing like this, I might soon have written my last line of Fortran. :-) I'm finally getting over the post-traumatic stress from writing Matlab MEX files ;-) Sturla From derek at astro.physik.uni-goettingen.de Sun Aug 7 15:58:27 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Sun, 7 Aug 2011 21:58:27 +0200 Subject: [Numpy-discussion] [ANN] Cython 0.15 In-Reply-To: <4E3DF3BD.5000703@molden.no> References: <4E3D06FD.9030701@astro.uio.no> <4E3DF3BD.5000703@molden.no> Message-ID: <3A86B0A1-B6E6-4146-A5B1-626112AD7E47@astro.physik.uni-goettingen.de> On 7 Aug 2011, at 04:09, Sturla Molden wrote: > Den 06.08.2011 11:18, skrev Dag Sverre Seljebotn: >> We are excited to announce the release of Cython 0.15, which is a huge >> step forward in achieving full Python language coverage as well as >> many new features, optimizations, and bugfixes. >> >> > > This is really great. With Cython progressing like this, I might soon > have written my last line of Fortran. :-) +1 (except the bit about writing Fortran, probably ;-) I am only getting 4 errors with Python 3.1 + 3.2 (Mac OS X 10.6/x86_64): compiling (cpp) and running numpy_bufacc_T155, numpy_cimport, numpy_parallel, numpy_test... I could not find much documentation about the runtests.py script (like how to figure out the exact gcc command used), but I am happy to send more details wherever requested. Adding a '-v' flag prints the following additional info: numpy_bufacc_T155.c: In function ?PyInit_numpy_bufacc_T155?: numpy_bufacc_T155.c:3652: warning: ?return? with no value, in function returning non-void .numpy_bufacc_T155.cpp: In function ?PyObject* PyInit_numpy_bufacc_T155()?: numpy_bufacc_T155.cpp:3652: error: return-statement with no value, in function returning ?PyObject*? Enumpy_cimport.c: In function ?PyInit_numpy_cimport?: numpy_cimport.c:3327: warning: ?return? with no value, in function returning non-void .numpy_cimport.cpp: In function ?PyObject* PyInit_numpy_cimport()?: numpy_cimport.cpp:3327: error: return-statement with no value, in function returning ?PyObject*? Enumpy_parallel.c: In function ?PyInit_numpy_parallel?: numpy_parallel.c:3824: warning: ?return? 
with no value, in function returning non-void .numpy_parallel.cpp: In function ?PyObject* PyInit_numpy_parallel()?: numpy_parallel.cpp:3824: error: return-statement with no value, in function returning ?PyObject*? Enumpy_test.c: In function ?PyInit_numpy_test?: numpy_test.c:11611: warning: ?return? with no value, in function returning non-void .numpy_test.cpp: In function ?PyObject* PyInit_numpy_test()?: numpy_test.cpp:11611: error: return-statement with no value, in function returning ?PyObject*? This happens with numpy 1.5.1, 1.6.0, 1.6.1 or git master installed, With Python 2.5-2.7 all 5536 tests are passing! Cheers, Derek From paul.anton.letnes at gmail.com Sun Aug 7 16:11:38 2011 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Sun, 7 Aug 2011 22:11:38 +0200 Subject: [Numpy-discussion] numpy.savetxt Ticket 1573 - suggested fix Message-ID: <0F3DE703-0C85-49BF-9FA9-6C4E33F377C3@gmail.com> (A pull request has been submitted on github, but I'm posting here so people can discuss the user interface issues.) As of now, the fmt= kwarg kan be (for complex dtype): a) a single specifier, fmt='%.4e', resulting in numbers formatted like ' (%s+%sj)' % (fmt, fmt) b) a full string specifying every real and imaginary part, e.g. ' %.4e %+.4j' * 3 for 3 columns c) a list of specifiers, one per column - in this case, the real and imaginary part must have separate specifiers, e.g. ['%.3e + %.3ej', '(%.15e%+.15ej)'] It would be good if people could air their opinion as to whether this is what they would expect from savetxt behavior for real (float) numbers. Ticket link: http://projects.scipy.org/numpy/ticket/1573 Cheers, Paul From paul.anton.letnes at gmail.com Sun Aug 7 16:31:27 2011 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Sun, 7 Aug 2011 22:31:27 +0200 Subject: [Numpy-discussion] [ANN] Cython 0.15 In-Reply-To: <4E3D06FD.9030701@astro.uio.no> References: <4E3D06FD.9030701@astro.uio.no> Message-ID: Looks like you have done some great work! I've been using f2py in the past, but I always liked the idea of cython - gradually wrapping more and more code as the need arises. I read somewhere that fortran wrapping with cython was coming - dare I ask what the status on this is? Is it a goal for cython to support easy fortran wrapping at all? Keep up the good work! Paul On 6. aug. 2011, at 11.18, Dag Sverre Seljebotn wrote: > We are excited to announce the release of Cython 0.15, which is a huge > step forward in achieving full Python language coverage as well as > many new features, optimizations, and bugfixes. > > Download: http://cython.org/ or http://pypi.python.org/pypi/Cython > > == Major Features == > > * Generators (yield) - Cython has full support for generators, > generator expressions and coroutines. > http://www.python.org/dev/peps/pep-0342/ > > * The nonlocal keyword is supported. > > * Re-acquiring the gil: with gil - works as expected within a nogil > context. > > * OpenMP support: > http://docs.cython.org/0.15/src/userguide/parallelism.html > > * Control flow analysis prunes dead code and emits warnings and > errors about uninitialised variables. > > * Debugger command cy set to assign values of expressions to Cython > variables and cy exec counterpart cy_eval(). > > * Exception chaining http://www.python.org/dev/peps/pep-3134/ > > * Relative imports http://www.python.org/dev/peps/pep-0328/ > > * The with statement has its own dedicated and faster C > implementation. > > * Improved pure syntax including cython.cclass, cython.cfunc, and > cython.ccall. 
http://docs.cython.org/0.15/src/tutorial/pure.html > > * Support for del. > > * Boundschecking directives implemented for builtin Python sequence > types. > > * Several updates and additions to the shipped standard library pxd > files https://github.com/cython/cython/tree/master/Cython/Includes > > * Forward declaration of types is no longer required for circular > references. > > Note: this will be the last release to support Python 2.3; Python 2.4 > will be supported for at least one more release. > > == General improvements and bug fixes == > > This release contains over a thousand commits including hundreds of > bugfixes and optimizations. The bug tracker has not been as heavily > used this release cycle, but is still useful > http://trac.cython.org/cython_trac/query?status=closed&group=component&order=id&col=id&col=summary&col=milestone&col=status&col=type&col=priority&col=owner&col=component&milestone=0.15&desc=1 > > == Incompatible changes == > > * Uninitialized variables are no longer initialized to None and > accessing them has the same semantics as standard Python. > > * globals() now returns a read-only dict of the Cython module's > globals, rather than the globals > of the first non-Cython module in the stack > > * Many C++ exceptions are now special cases to give closer Python > counterparts. This means that except+ functions that formally raised > generic RuntimeErrors may raise something else such as > ArithmaticError. > > == Known regressions == > > * The inlined generator expressions (introduced in Cython 0.13) were > disabled in favour of full generator expression support. This induces > a performance regression for cases that were previously inlined. > > == Contributors == > > Many thanks to: > > Francesc Alted, > Haoyu Bai, > Stefan Behnel, > Robert Bradshaw, > Lars Buitinck, > Lisandro Dalcin, > John Ehresman, > Mark Florisson, > Christoph Gohlke, > Jason Grout, > Chris Lasher, > Vitja Makarov, > Brent Pedersen, > Dag Sverre Seljebotn, > Nathaniel Smith, > and Pauli Virtanen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From d.s.seljebotn at astro.uio.no Sun Aug 7 17:24:42 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 07 Aug 2011 23:24:42 +0200 Subject: [Numpy-discussion] [ANN] Cython 0.15 In-Reply-To: References: <4E3D06FD.9030701@astro.uio.no> Message-ID: <4E3F029A.9010201@astro.uio.no> On 08/07/2011 10:31 PM, Paul Anton Letnes wrote: > Looks like you have done some great work! I've been using f2py in the past, but I always liked the idea of cython - gradually wrapping more and more code as the need arises. I read somewhere that fortran wrapping with cython was coming - dare I ask what the status on this is? Is it a goal for cython to support easy fortran wrapping at all? Fwrap scans Fortran sources and generate a Cython wrapper around a iso_c_binding Fortran 2003 wrapper around your Fortran code. Which is a bit more portable than f2py in theory, although it's pretty much the same in practice currently. It doesn't work for all Fortran code, but I think it works for what f2py does and then some more. The big difference is that it allows you to sidestep Python boxing of arguments when calling from Cython. In addition to the main website (use Google) there's been quite a lot more work on it my Github: https://github.com/dagss/fwrap that's not released. 
I'd like to continue on Fwrap but there's always 2-3 items higher on the priority list. I can't tell you yet whether the project will survive. But anyway, this is the way Fortran+Cython is supported. Dag Sverre From derek at astro.physik.uni-goettingen.de Sun Aug 7 17:26:57 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Sun, 7 Aug 2011 23:26:57 +0200 Subject: [Numpy-discussion] [ANN] Cython 0.15 In-Reply-To: References: <4E3D06FD.9030701@astro.uio.no> Message-ID: <2923CFF4-306F-4E15-9DD9-EBA9706BB598@astro.physik.uni-goettingen.de> On 7 Aug 2011, at 22:31, Paul Anton Letnes wrote: > Looks like you have done some great work! I've been using f2py in the past, but I always liked the idea of cython - gradually wrapping more and more code as the need arises. I read somewhere that fortran wrapping with cython was coming - dare I ask what the status on this is? Is it a goal for cython to support easy fortran wrapping at all? > Don't know if there is one besides fwrap, but http://pypi.python.org/pypi/fwrap/0.1.1 builds and tests OK on python 2.[5-7]. So I am bound to continue my Fortran writing... > Keep up the good work! Absolutely agreed! Derek From d.s.seljebotn at astro.uio.no Sun Aug 7 17:27:05 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Sun, 07 Aug 2011 23:27:05 +0200 Subject: [Numpy-discussion] [ANN] Cython 0.15 In-Reply-To: <3A86B0A1-B6E6-4146-A5B1-626112AD7E47@astro.physik.uni-goettingen.de> References: <4E3D06FD.9030701@astro.uio.no> <4E3DF3BD.5000703@molden.no> <3A86B0A1-B6E6-4146-A5B1-626112AD7E47@astro.physik.uni-goettingen.de> Message-ID: <4E3F0329.9090606@astro.uio.no> On 08/07/2011 09:58 PM, Derek Homeier wrote: > On 7 Aug 2011, at 04:09, Sturla Molden wrote: > >> Den 06.08.2011 11:18, skrev Dag Sverre Seljebotn: >>> We are excited to announce the release of Cython 0.15, which is a huge >>> step forward in achieving full Python language coverage as well as >>> many new features, optimizations, and bugfixes. >>> >>> >> >> This is really great. With Cython progressing like this, I might soon >> have written my last line of Fortran. :-) > > +1 (except the bit about writing Fortran, probably ;-) > > I am only getting 4 errors with Python 3.1 + 3.2 (Mac OS X 10.6/x86_64): > compiling (cpp) and running numpy_bufacc_T155, numpy_cimport, numpy_parallel, numpy_test... > I could not find much documentation about the runtests.py script (like how to figure out the exact gcc command used), but I am happy to send more details wherever requested. Adding a '-v' flag prints the following additional info: > > numpy_bufacc_T155.c: In function ?PyInit_numpy_bufacc_T155?: > numpy_bufacc_T155.c:3652: warning: ?return? with no value, in function returning non-void > .numpy_bufacc_T155.cpp: In function ?PyObject* PyInit_numpy_bufacc_T155()?: > numpy_bufacc_T155.cpp:3652: error: return-statement with no value, in function returning ?PyObject*? > Enumpy_cimport.c: In function ?PyInit_numpy_cimport?: > numpy_cimport.c:3327: warning: ?return? with no value, in function returning non-void > .numpy_cimport.cpp: In function ?PyObject* PyInit_numpy_cimport()?: > numpy_cimport.cpp:3327: error: return-statement with no value, in function returning ?PyObject*? > Enumpy_parallel.c: In function ?PyInit_numpy_parallel?: > numpy_parallel.c:3824: warning: ?return? 
with no value, in function returning non-void > .numpy_parallel.cpp: In function ?PyObject* PyInit_numpy_parallel()?: > numpy_parallel.cpp:3824: error: return-statement with no value, in function returning ?PyObject*? > Enumpy_test.c: In function ?PyInit_numpy_test?: > numpy_test.c:11611: warning: ?return? with no value, in function returning non-void > .numpy_test.cpp: In function ?PyObject* PyInit_numpy_test()?: > numpy_test.cpp:11611: error: return-statement with no value, in function returning ?PyObject*? > > This happens with numpy 1.5.1, 1.6.0, 1.6.1 or git master installed, > With Python 2.5-2.7 all 5536 tests are passing! I believe this is http://projects.scipy.org/numpy/ticket/1919 Can you confirm? I don't think there's anything we can do on the Cython end to fix this, if the report is correct. Dag Sverre From derek at astro.physik.uni-goettingen.de Sun Aug 7 19:35:26 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Mon, 8 Aug 2011 01:35:26 +0200 Subject: [Numpy-discussion] [ANN] Cython 0.15 In-Reply-To: <4E3F0329.9090606@astro.uio.no> References: <4E3D06FD.9030701@astro.uio.no> <4E3DF3BD.5000703@molden.no> <3A86B0A1-B6E6-4146-A5B1-626112AD7E47@astro.physik.uni-goettingen.de> <4E3F0329.9090606@astro.uio.no> Message-ID: <0EDEE61F-FA30-4E74-BCDA-5FAB6196FFF4@astro.physik.uni-goettingen.de> On 7 Aug 2011, at 23:27, Dag Sverre Seljebotn wrote: >> Enumpy_test.c: In function ?PyInit_numpy_test?: >> numpy_test.c:11611: warning: ?return? with no value, in function returning non-void >> .numpy_test.cpp: In function ?PyObject* PyInit_numpy_test()?: >> numpy_test.cpp:11611: error: return-statement with no value, in function returning ?PyObject*? >> >> This happens with numpy 1.5.1, 1.6.0, 1.6.1 or git master installed, >> With Python 2.5-2.7 all 5536 tests are passing! > > I believe this is http://projects.scipy.org/numpy/ticket/1919 > > Can you confirm? > > I don't think there's anything we can do on the Cython end to fix this, > if the report is correct. Yes, the proposed patch fixes the errors! I have added a comment to the ticket, hopefully this can be merged soon. Cheers, Derek From seb.haase at gmail.com Mon Aug 8 04:21:57 2011 From: seb.haase at gmail.com (Sebastian Haase) Date: Mon, 8 Aug 2011 10:21:57 +0200 Subject: [Numpy-discussion] [ANN] Cython 0.15 In-Reply-To: <4E3F029A.9010201@astro.uio.no> References: <4E3D06FD.9030701@astro.uio.no> <4E3F029A.9010201@astro.uio.no> Message-ID: On Sun, Aug 7, 2011 at 11:24 PM, Dag Sverre Seljebotn wrote: > On 08/07/2011 10:31 PM, Paul Anton Letnes wrote: >> Looks like you have done some great work! I've been using f2py in the past, but I always liked the idea of cython - gradually wrapping more and more code as the need arises. I read somewhere that fortran wrapping with cython was coming - dare I ask what the status on this is? Is it a goal for cython to support easy fortran wrapping at all? > > Fwrap scans Fortran sources and generate a Cython wrapper around a > iso_c_binding Fortran 2003 wrapper around your Fortran code. Which is a > bit more portable than f2py in theory, although it's pretty much the > same in practice currently. > > It doesn't work for all Fortran code, but I think it works for what f2py > does and then some more. > > The big difference is that it allows you to sidestep Python boxing of > arguments when calling from Cython. > > In addition to the main website (use Google) there's been quite a lot > more work on it my Github: > > https://github.com/dagss/fwrap > > that's not released. 
I'd like to continue on Fwrap but there's always > 2-3 items higher on the priority list. I can't tell you yet whether the > project will survive. > > But anyway, this is the way Fortran+Cython is supported. > > Dag Sverre Hi, Not to hijack this thread .... First, also my congratulations to making such great progress with such a great project ! a) Is there anything that would parse a C/C++ header file and generate Cython wrapper code for it ? b) What is the status of supporting multi-type Cython functions -- ala C++ templates ? This would be one of my top ranked favorites, since I like writing simple algorithms (like computing certain statistics over a numpy array), and have this support all of e.g. unit8, int32, unit16, float32 and float64... (I'm using some macro-enhanced SWIG for this so far) Thanks, Sebastian Haase From d.s.seljebotn at astro.uio.no Mon Aug 8 05:47:26 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Mon, 08 Aug 2011 11:47:26 +0200 Subject: [Numpy-discussion] [ANN] Cython 0.15 In-Reply-To: References: <4E3D06FD.9030701@astro.uio.no> <4E3F029A.9010201@astro.uio.no> Message-ID: <4E3FB0AE.6010504@astro.uio.no> On 08/08/2011 10:21 AM, Sebastian Haase wrote: > On Sun, Aug 7, 2011 at 11:24 PM, Dag Sverre Seljebotn > wrote: >> On 08/07/2011 10:31 PM, Paul Anton Letnes wrote: >>> Looks like you have done some great work! I've been using f2py in the past, but I always liked the idea of cython - gradually wrapping more and more code as the need arises. I read somewhere that fortran wrapping with cython was coming - dare I ask what the status on this is? Is it a goal for cython to support easy fortran wrapping at all? >> >> Fwrap scans Fortran sources and generate a Cython wrapper around a >> iso_c_binding Fortran 2003 wrapper around your Fortran code. Which is a >> bit more portable than f2py in theory, although it's pretty much the >> same in practice currently. >> >> It doesn't work for all Fortran code, but I think it works for what f2py >> does and then some more. >> >> The big difference is that it allows you to sidestep Python boxing of >> arguments when calling from Cython. >> >> In addition to the main website (use Google) there's been quite a lot >> more work on it my Github: >> >> https://github.com/dagss/fwrap >> >> that's not released. I'd like to continue on Fwrap but there's always >> 2-3 items higher on the priority list. I can't tell you yet whether the >> project will survive. >> >> But anyway, this is the way Fortran+Cython is supported. >> >> Dag Sverre > > Hi, > Not to hijack this thread .... > First, also my congratulations to making such great progress with such > a great project ! > > a) Is there anything that would parse a C/C++ header file and > generate Cython wrapper code for it ? This come up now and again and I believe there's several half-baked/started solutions out there by Cython users, but nothing that is standard or that I know is carried out to completion. I.e., you should ask on the cython-users list. It'd be good if somebody would compile a list of the efforts so far on the wiki as well... > b) What is the status of supporting multi-type Cython functions -- ala > C++ templates ? > This would be one of my top ranked favorites, since I like writing > simple algorithms (like computing certain statistics over a numpy > array), and have this support all of e.g. unit8, int32, unit16, > float32 and float64... 
(I'm using some macro-enhanced SWIG for this > so far) It's been implemented as part of Mark Florisson's GSoC (he also did the OpenMP support!), currently waiting for review AFAIK. We take an approach different to C++ though. http://wiki.cython.org/enhancements/fusedtypes Dag Sverre From amcmorl at gmail.com Mon Aug 8 11:27:14 2011 From: amcmorl at gmail.com (Angus McMorland) Date: Mon, 8 Aug 2011 11:27:14 -0400 Subject: [Numpy-discussion] PEP 3118 array size check Message-ID: Hi all, I've just upgraded to the latest numpy from git along with upgrading Ubuntu to natty. Now some of my code, which relies on ctypes-wrapping of data structures from a messaging system, fails with the error message: "RuntimeWarning: Item size computed from the PEP 3118 buffer format string does not match the actual item size." Can anyone tell me if this was a change that has been added into the git version recently, in which case I can checkout a previous version of numpy, or if I've got to try downgrading the whole system (ergh.) Thanks, Angus -- AJC McMorland Post-doctoral research fellow Neurobiology, University of Pittsburgh From sturla at molden.no Mon Aug 8 11:29:38 2011 From: sturla at molden.no (Sturla Molden) Date: Mon, 08 Aug 2011 17:29:38 +0200 Subject: [Numpy-discussion] [ANN] Cython 0.15 In-Reply-To: <4E3FB0AE.6010504@astro.uio.no> References: <4E3D06FD.9030701@astro.uio.no> <4E3F029A.9010201@astro.uio.no> <4E3FB0AE.6010504@astro.uio.no> Message-ID: <4E4000E2.9030304@molden.no> Den 08.08.2011 11:47, skrev Dag Sverre Seljebotn: > This come up now and again and I believe there's several > half-baked/started solutions out there by Cython users, but nothing > that is standard or that I know is carried out to completion. I.e., > you should ask on the cython-users list. It'd be good if somebody > would compile a list of the efforts so far on the wiki as well... I wrote a mock-up pxd-generator for the OpenGL headers. It only worked with a particular set of OpenGL header files, and the output still required a few cases of manual editing. But this still saved me a lot of time re-declaring OpenGL to Cython, and an important benefit is correctness (i.e. almost no manual code to proof) :) The script is so bad, though, I am not sure I want to show it to anyone ;-) Writing a general "headerfile2pxd.py" script is a huge undertaking. The C preprocessor makes this particularly annoying, because some symbols might be defined at compile-time. To make things worse, C headers can also be recursive. Running the preprocessor in advance is not an option either, because some C APIs rely heavily on defined symbols. These will not survive through the preprocessor. Sometimes we want to fool Cython into thinking that a defined constant is an external variable or function with some C types, which also complicates this effort. It's not that bad if we write a generator for a particular set of header files, but a PITA when we write one for general use. Sturla From Chris.Barker at noaa.gov Mon Aug 8 12:46:06 2011 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 08 Aug 2011 09:46:06 -0700 Subject: [Numpy-discussion] [ANN] Cython 0.15 In-Reply-To: References: <4E3D06FD.9030701@astro.uio.no> <4E3F029A.9010201@astro.uio.no> Message-ID: <4E4012CE.2020908@noaa.gov> On 8/8/11 1:21 AM, Sebastian Haase wrote: > b) What is the status of supporting multi-type Cython functions -- ala > C++ templates ? 
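A toy sketch of the kind of script being talked about here -- purely
illustrative, it only recognises plain 'ret name(args);' prototypes and
ignores the preprocessor, typedefs and everything else that makes real
headers hard:

import re

# Toy prototype matcher: a return type, a name, and an argument list.
# Real headers need a real C parser (preprocessor, typedefs, macros, ...).
PROTO = re.compile(r'^\s*([A-Za-z_][\w ]*\*?)\s*([A-Za-z_]\w*)\s*\(([^)]*)\)\s*;')

def header_to_pxd(header_text, header_name):
    out = ['cdef extern from "%s":' % header_name]
    for line in header_text.splitlines():
        m = PROTO.match(line)
        if m:
            ret, name, args = (s.strip() for s in m.groups())
            out.append('    %s %s(%s)' % (ret, name, args))
    return '\n'.join(out)

print(header_to_pxd("double cos(double x);\nint abs(int x);", "math.h"))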
You might want to take a look at what Keith Goodman has done with the "Bottleneck" project -- I think he used a generic template tool to generate Cython code for a variety of types from a single definition. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From shish at keba.be Mon Aug 8 12:54:17 2011 From: shish at keba.be (Olivier Delalleau) Date: Mon, 8 Aug 2011 12:54:17 -0400 Subject: [Numpy-discussion] Weird upcast behavior with 1.6.x, working as intended? Message-ID: Hi, This is with numpy 1.6.1 under Linux x86_64, testing the upcast mechanism of "scalar + array": >>> import numpy; print (numpy.array(3, dtype=numpy.complex128) + numpy.ones(3, dtype=numpy.float32)).dtype complex64 Since it has to upcast my array (float32 is not "compatible enough" with complex128), why does it upcast it to complex64 instead of complex128? As far as I can tell 1.4.x and 1.5.x versions of numpy are indeed upcasting to complex128. Thanks, -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Aug 8 13:24:13 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 8 Aug 2011 11:24:13 -0600 Subject: [Numpy-discussion] Weird upcast behavior with 1.6.x, working as intended? In-Reply-To: References: Message-ID: On Mon, Aug 8, 2011 at 10:54 AM, Olivier Delalleau wrote: > Hi, > > This is with numpy 1.6.1 under Linux x86_64, testing the upcast mechanism > of "scalar + array": > > >>> import numpy; print (numpy.array(3, dtype=numpy.complex128) + > numpy.ones(3, dtype=numpy.float32)).dtype > complex64 > > Since it has to upcast my array (float32 is not "compatible enough" with > complex128), why does it upcast it to complex64 instead of complex128? > As far as I can tell 1.4.x and 1.5.x versions of numpy are indeed upcasting > to complex128. > > The 0 dimensional array is being treated as a scalar, hence is cast to the type of the 1d array. This seems more consistent with the idea that 0 dimensional arrays act like scalars, but I suppose that is open to discussion. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Mon Aug 8 15:38:05 2011 From: shish at keba.be (Olivier Delalleau) Date: Mon, 8 Aug 2011 15:38:05 -0400 Subject: [Numpy-discussion] Weird upcast behavior with 1.6.x, working as intended? In-Reply-To: References: Message-ID: 2011/8/8 Charles R Harris > > > On Mon, Aug 8, 2011 at 10:54 AM, Olivier Delalleau wrote: > >> Hi, >> >> This is with numpy 1.6.1 under Linux x86_64, testing the upcast mechanism >> of "scalar + array": >> >> >>> import numpy; print (numpy.array(3, dtype=numpy.complex128) + >> numpy.ones(3, dtype=numpy.float32)).dtype >> complex64 >> >> Since it has to upcast my array (float32 is not "compatible enough" with >> complex128), why does it upcast it to complex64 instead of complex128? >> As far as I can tell 1.4.x and 1.5.x versions of numpy are indeed >> upcasting to complex128. >> >> > The 0 dimensional array is being treated as a scalar, hence is cast to the > type of the 1d array. This seems more consistent with the idea that 0 > dimensional arrays act like scalars, but I suppose that is open to > discussion. > > Chuck > I'm afraid I don't understand your reply. 
I know that the 0d array is a scalar, and thus should not lead to an upcast "unless the scalar is of a fundamentally different kind of data (*i.e.*, under a different hierarchy in the data-type hierarchy) than the array" (quoted from http://docs.scipy.org/doc/numpy/reference/ufuncs.html). This is one case where it is under a different hierarchy and thus should trigger an upcast. What I don't understand it why it upcasts to complex64 instead of complex128. Note that: 1. When replacing "numpy.ones" with "numpy.array" it yields complex128 (expected upcast of scalar addition of complex128 with float32) 2. The behavior is similar if instead of "3" I use a number which cannot be represented exactly with a complex64 (so it's not a rule about picking the smallest data type able to exactly represent the result) -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Mon Aug 8 19:01:28 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 08 Aug 2011 19:01:28 -0400 Subject: [Numpy-discussion] Warning: invalid value encountered in divide Message-ID: Warning: invalid value encountered in divide No traceback. How can I get more info on this? Can this warning be converted to an exception so I can get a trace? From wesmckinn at gmail.com Mon Aug 8 19:25:32 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Mon, 8 Aug 2011 19:25:32 -0400 Subject: [Numpy-discussion] Warning: invalid value encountered in divide In-Reply-To: References: Message-ID: On Mon, Aug 8, 2011 at 7:01 PM, Neal Becker wrote: > Warning: invalid value encountered in divide > > No traceback. ?How can I get more info on this? ?Can this warning be converted > to an exception so I can get a trace? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Try calling np.seterr(divide='raise') or np.seterr(all='raise') From charlesr.harris at gmail.com Tue Aug 9 00:56:09 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 8 Aug 2011 22:56:09 -0600 Subject: [Numpy-discussion] Static analysis of python c extensions. Message-ID: Thought some might find this of interest. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robertwb at math.washington.edu Tue Aug 9 12:25:17 2011 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 9 Aug 2011 09:25:17 -0700 Subject: [Numpy-discussion] [ANN] Cython 0.15 In-Reply-To: <4E4012CE.2020908@noaa.gov> References: <4E3D06FD.9030701@astro.uio.no> <4E3F029A.9010201@astro.uio.no> <4E4012CE.2020908@noaa.gov> Message-ID: On Mon, Aug 8, 2011 at 9:46 AM, Christopher Barker wrote: > On 8/8/11 1:21 AM, Sebastian Haase wrote: > >> b) What is the status of supporting multi-type Cython functions -- ala >> C++ templates ? > > You might want to take a look at what Keith Goodman has done with the > "Bottleneck" project -- I think he used a generic template tool to > generate Cython code for a variety of types from a single definition. Templating and type parameterization is a really tricky issue to get right, especially when grafting into a "statically typeless" language like Python. The consensus that we had over Cython days was, at least for the medium term, was: (1) Implement http://wiki.cython.org/enhancements/fusedtypes for the most common usecases. Mark has nearly finished this as part of his GSoC project. 
(2) Better support for pre-processing Cython with a templating language, which would provide users with a high level of flexibility to do anything sophisticated. (Not implemented, but users are already doing this.) (3) While would like to support template C++ functions better in C++ mode, for many reasons this is not the model we want to follow for Cython type parameterization. - Robert From alex.flint at gmail.com Tue Aug 9 17:23:36 2011 From: alex.flint at gmail.com (Alex Flint) Date: Tue, 9 Aug 2011 17:23:36 -0400 Subject: [Numpy-discussion] dealing with RGB images Message-ID: Until now, I've been representing RGB images in numpy using MxNx3 arrays, as returned by scipy.misc.imread and friends. However, when performing image transformations, the color dimension is semantically different from the spatial dimensions. For example, I would like to write an image scaling function that works for both grayscale array (MxN) and RGB images (MxNx3): def imscale(image, scale): return scipy.ndimage.zoom(imscale, scale) But this will apply scaling along the color dimension, resulting in an image with more/less image channels. So I do: def imscale(image, scale) if image.ndim == 2: return scipy.ndimage.zoom(imscale, scale) else: return scipy.ndimage.zoom(imscale, (scale[0], scale[1], 1)) But now this fails if the scale argument is a scalar. It is possible to cover all cases but all my functions are become case nighmares as the combinations of RGB and scalar images multiply. I am thinking of writing an RGB class with overrides for all the math operations that make sense (addition, scalar multiplication), and then creating arrays with dtype=RGB. This will mean that color images always have ndim=2. Does this make sense? Is there a neater way to achieve this within numpy? Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Aug 9 19:06:56 2011 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 9 Aug 2011 16:06:56 -0700 Subject: [Numpy-discussion] dealing with RGB images In-Reply-To: References: Message-ID: 1) Have you considered using MxNx1 arrays for greyscale images, so all images have the same dimensionality? 2) Instead of defining an RGB class from scratch, would a structured dtype do what you want? - Nathaniel On Aug 9, 2011 2:23 PM, "Alex Flint" wrote: > Until now, I've been representing RGB images in numpy using MxNx3 arrays, as > returned by scipy.misc.imread and friends. However, when performing image > transformations, the color dimension is semantically different from the > spatial dimensions. For example, I would like to write an image scaling > function that works for both grayscale array (MxN) and RGB images (MxNx3): > > def imscale(image, scale): > return scipy.ndimage.zoom(imscale, scale) > > But this will apply scaling along the color dimension, resulting in an image > with more/less image channels. So I do: > > def imscale(image, scale) > if image.ndim == 2: > return scipy.ndimage.zoom(imscale, scale) > else: > return scipy.ndimage.zoom(imscale, (scale[0], scale[1], 1)) > > But now this fails if the scale argument is a scalar. It is possible to > cover all cases but all my functions are become case nighmares as the > combinations of RGB and scalar images multiply. > > I am thinking of writing an RGB class with overrides for all the math > operations that make sense (addition, scalar multiplication), and then > creating arrays with dtype=RGB. This will mean that color images always have > ndim=2. Does this make sense? 
Is there a neater way to achieve this within > numpy? > > Alex -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Wed Aug 10 04:01:59 2011 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 10 Aug 2011 08:01:59 +0000 (UTC) Subject: [Numpy-discussion] PEP 3118 array size check References: Message-ID: Mon, 08 Aug 2011 11:27:14 -0400, Angus McMorland wrote: > I've just upgraded to the latest numpy from git along with upgrading > Ubuntu to natty. Now some of my code, which relies on ctypes-wrapping of > data structures from a messaging system, fails with the error message: > > "RuntimeWarning: Item size computed from the PEP 3118 buffer format > string does not match the actual item size." > > Can anyone tell me if this was a change that has been added into the git > version recently, in which case I can checkout a previous version of > numpy, or if I've got to try downgrading the whole system (ergh.) Python's ctypes module implements its PEP 3118 support incorrectly in recent Python versions. There's a patch in waiting: http://bugs.python.org/issue10744 In the meantime, you can just silence the warnings using the warnings module, warnings.simplefilter("ignore", RuntimeWarning) -- Pauli Virtanen From gnurser at gmail.com Wed Aug 10 06:45:52 2011 From: gnurser at gmail.com (George Nurser) Date: Wed, 10 Aug 2011 11:45:52 +0100 Subject: [Numpy-discussion] problems with multiple outputs with numpy.nditer Message-ID: Hi, I'm running numpy 1.6.1rc2 + python 2.7.1 64-bit from python.org on OSX 10.6.8. I have a f2py'd fortran routine that inputs and outputs fortran real*8 scalars, and I normally call it like tu,tv,E,El,IF,HF,HFI = LW.rotate2u(u,v,NN,ff,0) I now want to call it over 2D arrays UT,VT,N,f Using steam-age indexing works fine: mflux_east,mflux_north,IWE,IWE_lin,InvFr,HFroude = np.empty([6,ny-1,nx],dtype=np.float64) for j in range(ny-1): for i in range(nx): u,v,NN,ff = [x[j,i] for x in UT,VT,N,f] mflux_east[j,i],mflux_north[j,i],IWE[j,i],IWE_lin[j,i],InvFr[j,i],HFroude[j,i],HFI = LW.rotate2u(u,v,NN,ff,0) I decided to try the new nditer option, with it = np.nditer([UT,VT,N,f,None,None,None,None,None,None,None] ,op_flags=4*[['readonly']]+7*[['writeonly','allocate']] ,op_dtypes=np.float64) for (u,v,NN,ff,tu,tv,E,El,IF,HF,HFI) in it: tu,tv,E,El,IF,HF,HFI = LW.rotate2u(u,v,NN,ff,0) Unfortunately this doesn't seem to work. Writing aa,bb,cc,dd,ee,ff,gg = it.operands[4:] aa seems to contain the contents of UT (bizarrely rescaled to lie between 0 and 1), while bb,cc etc are all zero. I'm not sure whether I've just called it incorrectly, or whether perhaps it's only supposed to work with one output array. --George Nurser. From mwwiebe at gmail.com Wed Aug 10 12:15:48 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 10 Aug 2011 09:15:48 -0700 Subject: [Numpy-discussion] problems with multiple outputs with numpy.nditer In-Reply-To: References: Message-ID: On Wed, Aug 10, 2011 at 3:45 AM, George Nurser wrote: > Hi, > I'm running numpy 1.6.1rc2 + python 2.7.1 64-bit from python.org on OSX > 10.6.8. 
> > I have a f2py'd fortran routine that inputs and outputs fortran real*8 > scalars, and I normally call it like > > tu,tv,E,El,IF,HF,HFI = LW.rotate2u(u,v,NN,ff,0) > > I now want to call it over 2D arrays UT,VT,N,f > > Using steam-age indexing works fine: > > mflux_east,mflux_north,IWE,IWE_lin,InvFr,HFroude = > np.empty([6,ny-1,nx],dtype=np.float64) > for j in range(ny-1): > for i in range(nx): > u,v,NN,ff = [x[j,i] for x in UT,VT,N,f] > > mflux_east[j,i],mflux_north[j,i],IWE[j,i],IWE_lin[j,i],InvFr[j,i],HFroude[j,i],HFI > = LW.rotate2u(u,v,NN,ff,0) > > > > I decided to try the new nditer option, with > > it = np.nditer([UT,VT,N,f,None,None,None,None,None,None,None] > ,op_flags=4*[['readonly']]+7*[['writeonly','allocate']] > ,op_dtypes=np.float64) > for (u,v,NN,ff,tu,tv,E,El,IF,HF,HFI) in it: > tu,tv,E,El,IF,HF,HFI = LW.rotate2u(u,v,NN,ff,0) > > > Unfortunately this doesn't seem to work. Writing > aa,bb,cc,dd,ee,ff,gg = it.operands[4:] > One problem here is that the assignment needs to assign into the view the iterator gives, something a direct assignment doesn't actually do. Instead of a, b = f(c,d) you need to write it like a[...], b[...] = f(c,d) so that the actual values being iterated get modified. Here's what I get: In [7]: a = np.arange(5.) In [8]: b, c, d = a + 1, a + 2, a + 3 In [9]: it = np.nditer([a,b,c,d] + [None]*7, ...: op_flags=4*[['readonly']]+7*[['writeonly','allocate']], ...: op_dtypes=np.float64) In [10]: for (x,y,z,w,A,B,C,D,E,F,G) in it: ....: A[...], B[...], C[...], D[...], E[...], F[...], G[...] = x, y, z, w, x+y, y+z, z+w ....: In [11]: it.operands[4] Out[11]: array([ 0., 1., 2., 3., 4.]) In [12]: it.operands[5] Out[12]: array([ 1., 2., 3., 4., 5.]) In [13]: it.operands[6] Out[13]: array([ 2., 3., 4., 5., 6.]) In [14]: it.operands[7] Out[14]: array([ 3., 4., 5., 6., 7.]) In [15]: it.operands[8] Out[15]: array([ 1., 3., 5., 7., 9.]) In [16]: it.operands[9] Out[16]: array([ 3., 5., 7., 9., 11.]) In [17]: it.operands[10] Out[17]: array([ 5., 7., 9., 11., 13.]) -Mark > > aa seems to contain the contents of UT (bizarrely rescaled to lie > between 0 and 1), while bb,cc etc are all zero. > > > I'm not sure whether I've just called it incorrectly, or whether > perhaps it's only supposed to work with one output array. > > > --George Nurser. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From amcmorl at gmail.com Wed Aug 10 12:50:39 2011 From: amcmorl at gmail.com (Angus McMorland) Date: Wed, 10 Aug 2011 12:50:39 -0400 Subject: [Numpy-discussion] numpy/ctypes segfault [was: PEP 3118 array size check] Message-ID: On 10 August 2011 04:01, Pauli Virtanen wrote: > Mon, 08 Aug 2011 11:27:14 -0400, Angus McMorland wrote: >> I've just upgraded to the latest numpy from git along with upgrading >> Ubuntu to natty. Now some of my code, which relies on ctypes-wrapping of >> data structures from a messaging system, fails with the error message: >> >> "RuntimeWarning: Item size computed from the PEP 3118 buffer format >> string does not match the actual item size." >> >> Can anyone tell me if this was a change that has been added into the git >> version recently, in which case I can checkout a previous version of >> numpy, or if I've got to try downgrading the whole system (ergh.) > > Python's ctypes module implements its PEP 3118 support incorrectly > in recent Python versions. 
There's a patch in waiting: > > ? ? ? ?http://bugs.python.org/issue10744 > > In the meantime, you can just silence the warnings using the warnings > module, > > ? ? ? ?warnings.simplefilter("ignore", RuntimeWarning) Thanks Pauli. I was seeing a segfault everytime I saw the error message, and since both started happening at the same time, I was guilty of mixing correlation and causation. After rebuilding numpy about 10 times, I have identified the first git commit after which the segfault appears (feb8079070b8a659d7ee) , and a small piece of code that triggers it: from ctypes import Structure, c_double #-- copied out of an xml2py generated file class S(Structure): pass S._pack_ = 4 S._fields_ = [ ('field', c_double * 2), ] #-- import numpy as np print np.version.version s = S() print "S", np.asarray(s.field) Can anyone confirm this, in which case it's probably a bug? Thanks, Angus -- AJC McMorland Post-doctoral research fellow Neurobiology, University of Pittsburgh From gnurser at gmail.com Wed Aug 10 12:55:59 2011 From: gnurser at gmail.com (George Nurser) Date: Wed, 10 Aug 2011 17:55:59 +0100 Subject: [Numpy-discussion] problems with multiple outputs with numpy.nditer In-Reply-To: References: Message-ID: Works fine with the [...]s. Thanks very much. --George On 10 August 2011 17:15, Mark Wiebe wrote: > On Wed, Aug 10, 2011 at 3:45 AM, George Nurser wrote: >> >> Hi, >> I'm running numpy 1.6.1rc2 + python 2.7.1 64-bit from python.org on OSX >> 10.6.8. >> >> I have a f2py'd fortran routine that inputs and outputs fortran real*8 >> scalars, and I normally call it like >> >> tu,tv,E,El,IF,HF,HFI = LW.rotate2u(u,v,NN,ff,0) >> >> I now want to call it over 2D arrays UT,VT,N,f >> >> Using steam-age indexing works fine: >> >> mflux_east,mflux_north,IWE,IWE_lin,InvFr,HFroude = >> np.empty([6,ny-1,nx],dtype=np.float64) >> for j in range(ny-1): >> ? for i in range(nx): >> ? ? ? u,v,NN,ff = [x[j,i] for x in UT,VT,N,f] >> >> mflux_east[j,i],mflux_north[j,i],IWE[j,i],IWE_lin[j,i],InvFr[j,i],HFroude[j,i],HFI >> = LW.rotate2u(u,v,NN,ff,0) >> >> >> >> I decided to try the new nditer option, with >> >> it = np.nditer([UT,VT,N,f,None,None,None,None,None,None,None] >> ? ? ? ? ? ? ?,op_flags=4*[['readonly']]+7*[['writeonly','allocate']] >> ? ? ? ? ? ? ?,op_dtypes=np.float64) >> for (u,v,NN,ff,tu,tv,E,El,IF,HF,HFI) in it: >> ? tu,tv,E,El,IF,HF,HFI = LW.rotate2u(u,v,NN,ff,0) >> >> >> Unfortunately this doesn't seem to work. Writing >> aa,bb,cc,dd,ee,ff,gg = it.operands[4:] > > One problem here is that the assignment needs to assign into the view the > iterator gives, something a direct assignment doesn't actually do. Instead > of > a, b = f(c,d) > you need to write it like > a[...], b[...] = f(c,d) > so that the actual values being iterated get modified. Here's what I get: > In [7]: a = np.arange(5.) > In [8]: b, c, d = a + 1, a + 2, a + 3 > In [9]: it = np.nditer([a,b,c,d] + [None]*7, > ? ?...: ? ? ? ? op_flags=4*[['readonly']]+7*[['writeonly','allocate']], > ? ?...: ? ? ? ? op_dtypes=np.float64) > In [10]: for (x,y,z,w,A,B,C,D,E,F,G) in it: > ? ?....: ? ? A[...], B[...], C[...], D[...], E[...], F[...], G[...] = x, y, > z, w, x+y, y+z, z+w > ? 
?....: > In [11]: it.operands[4] > Out[11]: array([ 0., ?1., ?2., ?3., ?4.]) > In [12]: it.operands[5] > Out[12]: array([ 1., ?2., ?3., ?4., ?5.]) > In [13]: it.operands[6] > Out[13]: array([ 2., ?3., ?4., ?5., ?6.]) > In [14]: it.operands[7] > Out[14]: array([ 3., ?4., ?5., ?6., ?7.]) > In [15]: it.operands[8] > Out[15]: array([ 1., ?3., ?5., ?7., ?9.]) > In [16]: it.operands[9] > Out[16]: array([ ?3., ? 5., ? 7., ? 9., ?11.]) > In [17]: it.operands[10] > Out[17]: array([ ?5., ? 7., ? 9., ?11., ?13.]) > > -Mark > >> >> aa seems to contain the contents of UT (bizarrely rescaled to lie >> between 0 and 1), while bb,cc etc are all zero. >> >> >> I'm not sure whether I've just called it incorrectly, or whether >> perhaps it's only supposed to work with one output array. >> >> >> --George Nurser. >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From rowen at uw.edu Wed Aug 10 13:22:14 2011 From: rowen at uw.edu (Russell E. Owen) Date: Wed, 10 Aug 2011 10:22:14 -0700 Subject: [Numpy-discussion] Efficient way to load a 1Gb file? Message-ID: A coworker is trying to load a 1Gb text data file into a numpy array using numpy.loadtxt, but he says it is using up all of his machine's 6Gb of RAM. Is there a more efficient way to read such text data files? -- Russell From matthew.brett at gmail.com Wed Aug 10 15:28:53 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 10 Aug 2011 12:28:53 -0700 Subject: [Numpy-discussion] numpydoc - latex longtables error Message-ID: Hi, I think this one might be for Pauli. I've run into an odd problem that seems to be an interaction of numpydoc and autosummary and large classes. In summary, large classes and numpydoc lead to large tables of class methods, and there seems to be an error in the creation of the large tables in latex. Specifically, if I run 'make latexpdf' with the attached minimal sphinx setup, I get a pdflatex error ending thus: ... l.118 \begin{longtable}{LL} and this is because longtable does not accept LL as an argument, but needs '|l|l|' (bar - el - bar - el - bar). I see in sphinx.writers.latex.py, around line 657, that sphinx knows about this in general, and long tables in standard ReST work fine with the el-bar arguments passed to longtable. if self.table.colspec: self.body.append(self.table.colspec) else: if self.table.has_problematic: colwidth = 0.95 / self.table.colcount colspec = ('p{%.3f\\linewidth}|' % colwidth) * \ self.table.colcount self.body.append('{|' + colspec + '}\n') elif self.table.longtable: self.body.append('{|' + ('l|' * self.table.colcount) + '}\n') else: self.body.append('{|' + ('L|' * self.table.colcount) + '}\n') However, using numpydoc and autosummary (see the conf.py file), what seems to happen is that, when we reach the self.table.colspec test at the beginning of the snippet above, 'self.table.colspec' is defined: In [1]: self.table.colspec Out[1]: '{LL}\n' and thus the LL gets written as the arg to longtable: \begin{longtable}{LL} and the pdf build breaks. I'm using the numpydoc out of the current numpy source tree. At that point I wasn't sure how to proceed with debugging. Can you give any hints? Thanks a lot, Matthew -------------- next part -------------- A non-text attachment was scrubbed... 
Name: long_test.tgz Type: application/x-gzip Size: 11907 bytes Desc: not available URL: From jsseabold at gmail.com Wed Aug 10 15:38:24 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Wed, 10 Aug 2011 15:38:24 -0400 Subject: [Numpy-discussion] numpydoc - latex longtables error In-Reply-To: References: Message-ID: On Wed, Aug 10, 2011 at 3:28 PM, Matthew Brett wrote: > Hi, > > I think this one might be for Pauli. > > I've run into an odd problem that seems to be an interaction of > numpydoc and autosummary and large classes. > > In summary, large classes and numpydoc lead to large tables of class > methods, and there seems to be an error in the creation of the large > tables in latex. > > Specifically, if I run 'make latexpdf' with the attached minimal > sphinx setup, I get a pdflatex error ending thus: > > ... > l.118 \begin{longtable}{LL} > > and this is because longtable does not accept LL as an argument, but > needs '|l|l|' (bar - el - bar - el - bar). > > I see in sphinx.writers.latex.py, around line 657, that sphinx knows > about this in general, and long tables in standard ReST work fine with > the el-bar arguments passed to longtable. > > ? ? ? ?if self.table.colspec: > ? ? ? ? ? ?self.body.append(self.table.colspec) > ? ? ? ?else: > ? ? ? ? ? ?if self.table.has_problematic: > ? ? ? ? ? ? ? ?colwidth = 0.95 / self.table.colcount > ? ? ? ? ? ? ? ?colspec = ('p{%.3f\\linewidth}|' % colwidth) * \ > ? ? ? ? ? ? ? ? ? ? ? ? ?self.table.colcount > ? ? ? ? ? ? ? ?self.body.append('{|' + colspec + '}\n') > ? ? ? ? ? ?elif self.table.longtable: > ? ? ? ? ? ? ? ?self.body.append('{|' + ('l|' * self.table.colcount) + '}\n') > ? ? ? ? ? ?else: > ? ? ? ? ? ? ? ?self.body.append('{|' + ('L|' * self.table.colcount) + '}\n') > > However, using numpydoc and autosummary (see the conf.py file), what > seems to happen is that, when we reach the self.table.colspec test at > the beginning of the snippet above, 'self.table.colspec' is defined: > > In [1]: self.table.colspec > Out[1]: '{LL}\n' > > and thus the LL gets written as the arg to longtable: > > \begin{longtable}{LL} > > and the pdf build breaks. > > I'm using the numpydoc out of the current numpy source tree. > > At that point I wasn't sure how to proceed with debugging. ?Can you > give any hints? > It's not a proper fix, but our workaround is to edit the Makefile for latex (and latexpdf) to https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/Makefile#L94 https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/make.bat#L121 to call the script to replace the longtable arguments https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/fix_longtable.py The workaround itself probably isn't optimal, and I'd be happy to hear of a proper fix. Cheers, Skipper From derek at astro.physik.uni-goettingen.de Wed Aug 10 15:43:47 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Wed, 10 Aug 2011 21:43:47 +0200 Subject: [Numpy-discussion] Efficient way to load a 1Gb file? In-Reply-To: References: Message-ID: On 10 Aug 2011, at 19:22, Russell E. Owen wrote: > A coworker is trying to load a 1Gb text data file into a numpy array > using numpy.loadtxt, but he says it is using up all of his machine's 6Gb > of RAM. Is there a more efficient way to read such text data files? 
The npyio routines (loadtxt as well as genfromtxt) first read in the entire data as lists, which creates of course significant overhead, but is not easy to circumvent, since numpy arrays are immutable - so you have to first store the numbers in some kind of mutable object. One could write a custom parser that tries to be somewhat more efficient, e.g. first reading in sub-arrays from a smaller buffer. Concatenating those sub-arrays would still require about twice the memory of the final array. I don't know if using the array.array type (which is mutable) is much more efficient than a list... To really avoid any excess memory usage you'd have to know the total data size in advance - either by reading in the file in a first pass to count the rows, or explicitly specifying it to a custom reader. Basically, assuming a completely regular file without missing values etc., you could then read in the data like X = np.zeros((n_lines, n_columns), dtype=float) delimiter = ' ' for n, line in enumerate(file(fname, 'r')): X[n] = np.array(line.split(delimiter), dtype=float) (adjust delimiter and dtype as needed...) HTH, Derek From aarchiba at physics.mcgill.ca Wed Aug 10 16:01:37 2011 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Wed, 10 Aug 2011 16:01:37 -0400 Subject: [Numpy-discussion] Efficient way to load a 1Gb file? In-Reply-To: References: Message-ID: There was also some work on a semi-mutable array type that allowed appending along one axis, then 'freezing' to yield a normal numpy array (unfortunately I'm not sure how to find it in the mailing list archives). One could write such a setup by hand, using mmap() or realloc(), but I'd be inclined to simply write a filter that converted the text file to some sort of binary file on the fly, value by value. Then the file can be loaded in or mmap()ed. A 1 Gb text file is a miserable object anyway, so it might be desirable to convert to (say) HDF5 and then throw away the text file. Anne On 10 August 2011 15:43, Derek Homeier wrote: > On 10 Aug 2011, at 19:22, Russell E. Owen wrote: > >> A coworker is trying to load a 1Gb text data file into a numpy array >> using numpy.loadtxt, but he says it is using up all of his machine's 6Gb >> of RAM. Is there a more efficient way to read such text data files? > > The npyio routines (loadtxt as well as genfromtxt) first read in the entire data as lists, which creates of course significant overhead, but is not easy to circumvent, since numpy arrays are immutable - so you have to first store the numbers in some kind of mutable object. One could write a custom parser that tries to be somewhat more efficient, e.g. first reading in sub-arrays from a smaller buffer. Concatenating those sub-arrays would still require about twice the memory of the final array. I don't know if using the array.array type (which is mutable) is much more efficient than a list... > To really avoid any excess memory usage you'd have to know the total data size in advance - either by reading in the file in a first pass to count the rows, or explicitly specifying it to a custom reader. Basically, assuming a completely regular file without missing values etc., you could then read in the data like > > X = np.zeros((n_lines, n_columns), dtype=float) > delimiter = ' ' > for n, line in enumerate(file(fname, 'r')): > ? ?X[n] = np.array(line.split(delimiter), dtype=float) > > (adjust delimiter and dtype as needed...) > > HTH, > ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? 
?Derek > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From gael.varoquaux at normalesup.org Wed Aug 10 16:03:03 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Wed, 10 Aug 2011 22:03:03 +0200 Subject: [Numpy-discussion] Efficient way to load a 1Gb file? In-Reply-To: References: Message-ID: <20110810200303.GA24720@phare.normalesup.org> On Wed, Aug 10, 2011 at 04:01:37PM -0400, Anne Archibald wrote: > A 1 Gb text file is a miserable object anyway, so it might be desirable > to convert to (say) HDF5 and then throw away the text file. +1 G From derek at astro.physik.uni-goettingen.de Wed Aug 10 16:12:37 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Wed, 10 Aug 2011 22:12:37 +0200 Subject: [Numpy-discussion] Efficient way to load a 1Gb file? In-Reply-To: <20110810200303.GA24720@phare.normalesup.org> References: <20110810200303.GA24720@phare.normalesup.org> Message-ID: <28298416-05FF-446F-8841-039DE31AD77A@astro.physik.uni-goettingen.de> On 10 Aug 2011, at 22:03, Gael Varoquaux wrote: > On Wed, Aug 10, 2011 at 04:01:37PM -0400, Anne Archibald wrote: >> A 1 Gb text file is a miserable object anyway, so it might be desirable >> to convert to (say) HDF5 and then throw away the text file. > > +1 There might be concerns about ensuring data accessibility that argue against throwing the text file away, but converting to HDF5 would be an elegant way of reading it in without the memory issues, too (I must confess though, I've regularly read ~ 1GB ASCII files into memory - with decent virtual memory management it did not turn out too bad...) Cheers, Derek From paul.anton.letnes at gmail.com Wed Aug 10 16:23:06 2011 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Wed, 10 Aug 2011 21:23:06 +0100 Subject: [Numpy-discussion] Efficient way to load a 1Gb file? In-Reply-To: <20110810200303.GA24720@phare.normalesup.org> References: <20110810200303.GA24720@phare.normalesup.org> Message-ID: <0F0B6E30-34C3-429E-9098-AD512532F4D0@gmail.com> On 10. aug. 2011, at 21.03, Gael Varoquaux wrote: > On Wed, Aug 10, 2011 at 04:01:37PM -0400, Anne Archibald wrote: >> A 1 Gb text file is a miserable object anyway, so it might be desirable >> to convert to (say) HDF5 and then throw away the text file. > > +1 > > G +1 and a very warm recommendation of h5py. Paul From ben.root at ou.edu Wed Aug 10 16:55:37 2011 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 10 Aug 2011 15:55:37 -0500 Subject: [Numpy-discussion] bug with assignment into an indexed array? Message-ID: Came across this today when trying to determine what was wrong with my code: import numpy as np matched_to = np.array([-1] * 5) in_ellipse = np.array([False, True, True, True, False]) match = np.array([False, True, True]) matched_to[in_ellipse][match] = 3 I would expect matched_to to now be "array([-1, -1, 3, 3, -1])", but instead, it is still all -1. It would seem that unless the view was created by a slice, then the assignment into the indexed view would not work as expected. This works: >>> matched_to[:3][match] = 3 but not: >>> matched_to[np.array([0, 1, 2])][match] = 3 Note that the following does work: >>> matched_to[np.array([0, 1, 2])] = 3 Is this a bug, or was I wrong to expect this to work this way? Thanks, Ben Root -------------- next part -------------- An HTML attachment was scrubbed...
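A quick check makes the copy-versus-view behaviour visible - a minimal sketch using nothing beyond plain numpy:

import numpy as np

matched_to = np.array([-1] * 5)
in_ellipse = np.array([False, True, True, True, False])

print np.may_share_memory(matched_to, matched_to[:3])          # True: a slice is a view
print np.may_share_memory(matched_to, matched_to[in_ellipse])  # False: boolean indexing copies

Since the boolean-indexed result is a fresh copy, a later assignment into it can never reach matched_to.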
URL: From matthew.brett at gmail.com Wed Aug 10 18:17:26 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 10 Aug 2011 15:17:26 -0700 Subject: [Numpy-discussion] numpydoc - latex longtables error In-Reply-To: References: Message-ID: Hi, On Wed, Aug 10, 2011 at 12:38 PM, Skipper Seabold wrote: > On Wed, Aug 10, 2011 at 3:28 PM, Matthew Brett wrote: >> Hi, >> >> I think this one might be for Pauli. >> >> I've run into an odd problem that seems to be an interaction of >> numpydoc and autosummary and large classes. >> >> In summary, large classes and numpydoc lead to large tables of class >> methods, and there seems to be an error in the creation of the large >> tables in latex. >> >> Specifically, if I run 'make latexpdf' with the attached minimal >> sphinx setup, I get a pdflatex error ending thus: >> >> ... >> l.118 \begin{longtable}{LL} >> >> and this is because longtable does not accept LL as an argument, but >> needs '|l|l|' (bar - el - bar - el - bar). >> >> I see in sphinx.writers.latex.py, around line 657, that sphinx knows >> about this in general, and long tables in standard ReST work fine with >> the el-bar arguments passed to longtable. >> >> ? ? ? ?if self.table.colspec: >> ? ? ? ? ? ?self.body.append(self.table.colspec) >> ? ? ? ?else: >> ? ? ? ? ? ?if self.table.has_problematic: >> ? ? ? ? ? ? ? ?colwidth = 0.95 / self.table.colcount >> ? ? ? ? ? ? ? ?colspec = ('p{%.3f\\linewidth}|' % colwidth) * \ >> ? ? ? ? ? ? ? ? ? ? ? ? ?self.table.colcount >> ? ? ? ? ? ? ? ?self.body.append('{|' + colspec + '}\n') >> ? ? ? ? ? ?elif self.table.longtable: >> ? ? ? ? ? ? ? ?self.body.append('{|' + ('l|' * self.table.colcount) + '}\n') >> ? ? ? ? ? ?else: >> ? ? ? ? ? ? ? ?self.body.append('{|' + ('L|' * self.table.colcount) + '}\n') >> >> However, using numpydoc and autosummary (see the conf.py file), what >> seems to happen is that, when we reach the self.table.colspec test at >> the beginning of the snippet above, 'self.table.colspec' is defined: >> >> In [1]: self.table.colspec >> Out[1]: '{LL}\n' >> >> and thus the LL gets written as the arg to longtable: >> >> \begin{longtable}{LL} >> >> and the pdf build breaks. >> >> I'm using the numpydoc out of the current numpy source tree. >> >> At that point I wasn't sure how to proceed with debugging. ?Can you >> give any hints? >> > > It's not a proper fix, but our workaround is to edit the Makefile for > latex (and latexpdf) to > > https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/Makefile#L94 > https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/make.bat#L121 > > to call the script to replace the longtable arguments > > https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/fix_longtable.py > > The workaround itself probably isn't optimal, and I'd be happy to hear > of a proper fix. Thanks - yes - I found your workaround in my explorations, I put in a version in our tree too: https://github.com/matthew-brett/nipy/blob/latex_build_fixes/tools/fix_longtable.py - but I agree it seems much better to get to the root cause. 
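For anyone who just needs the build to go through, the post-processing boils down to rewriting the bad column spec in the generated .tex files before pdflatex runs. A minimal stand-alone sketch (not the linked script itself; it only handles the two-column {LL} case seen here, and the build directory name is an assumption):

import glob

def fix_longtables(latex_dir='build/latex'):
    # rewrite the column spec that numpydoc/autosummary leave behind,
    # e.g. \begin{longtable}{LL} -> \begin{longtable}{|l|l|}
    for name in glob.glob(latex_dir + '/*.tex'):
        tex = open(name).read()
        tex = tex.replace(r'\begin{longtable}{LL}',
                          r'\begin{longtable}{|l|l|}')
        open(name, 'w').write(tex)

fix_longtables()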
See you, Matthew From josef.pktd at gmail.com Wed Aug 10 20:03:53 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 10 Aug 2011 20:03:53 -0400 Subject: [Numpy-discussion] numpydoc - latex longtables error In-Reply-To: References: Message-ID: On Wed, Aug 10, 2011 at 6:17 PM, Matthew Brett wrote: > Hi, > > On Wed, Aug 10, 2011 at 12:38 PM, Skipper Seabold wrote: >> On Wed, Aug 10, 2011 at 3:28 PM, Matthew Brett wrote: >>> Hi, >>> >>> I think this one might be for Pauli. >>> >>> I've run into an odd problem that seems to be an interaction of >>> numpydoc and autosummary and large classes. >>> >>> In summary, large classes and numpydoc lead to large tables of class >>> methods, and there seems to be an error in the creation of the large >>> tables in latex. >>> >>> Specifically, if I run 'make latexpdf' with the attached minimal >>> sphinx setup, I get a pdflatex error ending thus: >>> >>> ... >>> l.118 \begin{longtable}{LL} >>> >>> and this is because longtable does not accept LL as an argument, but >>> needs '|l|l|' (bar - el - bar - el - bar). >>> >>> I see in sphinx.writers.latex.py, around line 657, that sphinx knows >>> about this in general, and long tables in standard ReST work fine with >>> the el-bar arguments passed to longtable. >>> >>> ? ? ? ?if self.table.colspec: >>> ? ? ? ? ? ?self.body.append(self.table.colspec) >>> ? ? ? ?else: >>> ? ? ? ? ? ?if self.table.has_problematic: >>> ? ? ? ? ? ? ? ?colwidth = 0.95 / self.table.colcount >>> ? ? ? ? ? ? ? ?colspec = ('p{%.3f\\linewidth}|' % colwidth) * \ >>> ? ? ? ? ? ? ? ? ? ? ? ? ?self.table.colcount >>> ? ? ? ? ? ? ? ?self.body.append('{|' + colspec + '}\n') >>> ? ? ? ? ? ?elif self.table.longtable: >>> ? ? ? ? ? ? ? ?self.body.append('{|' + ('l|' * self.table.colcount) + '}\n') >>> ? ? ? ? ? ?else: >>> ? ? ? ? ? ? ? ?self.body.append('{|' + ('L|' * self.table.colcount) + '}\n') >>> >>> However, using numpydoc and autosummary (see the conf.py file), what >>> seems to happen is that, when we reach the self.table.colspec test at >>> the beginning of the snippet above, 'self.table.colspec' is defined: >>> >>> In [1]: self.table.colspec >>> Out[1]: '{LL}\n' >>> >>> and thus the LL gets written as the arg to longtable: >>> >>> \begin{longtable}{LL} >>> >>> and the pdf build breaks. >>> >>> I'm using the numpydoc out of the current numpy source tree. >>> >>> At that point I wasn't sure how to proceed with debugging. ?Can you >>> give any hints? >>> >> >> It's not a proper fix, but our workaround is to edit the Makefile for >> latex (and latexpdf) to >> >> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/Makefile#L94 >> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/make.bat#L121 >> >> to call the script to replace the longtable arguments >> >> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/fix_longtable.py >> >> The workaround itself probably isn't optimal, and I'd be happy to hear >> of a proper fix. > > Thanks - yes - I found your workaround in my explorations, I put in a > version in our tree too: > > https://github.com/matthew-brett/nipy/blob/latex_build_fixes/tools/fix_longtable.py > > ?- but I agree it seems much better to get to the root cause. When I tried to figure this out, I never found out why the correct sphinx longtable code path never gets reached, or which code (numpydoc, autosummary or sphinx) is filling in the colspec. 
Josef > > See you, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From matthew.brett at gmail.com Wed Aug 10 20:17:26 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 10 Aug 2011 17:17:26 -0700 Subject: [Numpy-discussion] numpydoc - latex longtables error In-Reply-To: References: Message-ID: Hi, On Wed, Aug 10, 2011 at 5:03 PM, wrote: > On Wed, Aug 10, 2011 at 6:17 PM, Matthew Brett wrote: >> Hi, >> >> On Wed, Aug 10, 2011 at 12:38 PM, Skipper Seabold wrote: >>> On Wed, Aug 10, 2011 at 3:28 PM, Matthew Brett wrote: >>>> Hi, >>>> >>>> I think this one might be for Pauli. >>>> >>>> I've run into an odd problem that seems to be an interaction of >>>> numpydoc and autosummary and large classes. >>>> >>>> In summary, large classes and numpydoc lead to large tables of class >>>> methods, and there seems to be an error in the creation of the large >>>> tables in latex. >>>> >>>> Specifically, if I run 'make latexpdf' with the attached minimal >>>> sphinx setup, I get a pdflatex error ending thus: >>>> >>>> ... >>>> l.118 \begin{longtable}{LL} >>>> >>>> and this is because longtable does not accept LL as an argument, but >>>> needs '|l|l|' (bar - el - bar - el - bar). >>>> >>>> I see in sphinx.writers.latex.py, around line 657, that sphinx knows >>>> about this in general, and long tables in standard ReST work fine with >>>> the el-bar arguments passed to longtable. >>>> >>>> ? ? ? ?if self.table.colspec: >>>> ? ? ? ? ? ?self.body.append(self.table.colspec) >>>> ? ? ? ?else: >>>> ? ? ? ? ? ?if self.table.has_problematic: >>>> ? ? ? ? ? ? ? ?colwidth = 0.95 / self.table.colcount >>>> ? ? ? ? ? ? ? ?colspec = ('p{%.3f\\linewidth}|' % colwidth) * \ >>>> ? ? ? ? ? ? ? ? ? ? ? ? ?self.table.colcount >>>> ? ? ? ? ? ? ? ?self.body.append('{|' + colspec + '}\n') >>>> ? ? ? ? ? ?elif self.table.longtable: >>>> ? ? ? ? ? ? ? ?self.body.append('{|' + ('l|' * self.table.colcount) + '}\n') >>>> ? ? ? ? ? ?else: >>>> ? ? ? ? ? ? ? ?self.body.append('{|' + ('L|' * self.table.colcount) + '}\n') >>>> >>>> However, using numpydoc and autosummary (see the conf.py file), what >>>> seems to happen is that, when we reach the self.table.colspec test at >>>> the beginning of the snippet above, 'self.table.colspec' is defined: >>>> >>>> In [1]: self.table.colspec >>>> Out[1]: '{LL}\n' >>>> >>>> and thus the LL gets written as the arg to longtable: >>>> >>>> \begin{longtable}{LL} >>>> >>>> and the pdf build breaks. >>>> >>>> I'm using the numpydoc out of the current numpy source tree. >>>> >>>> At that point I wasn't sure how to proceed with debugging. ?Can you >>>> give any hints? >>>> >>> >>> It's not a proper fix, but our workaround is to edit the Makefile for >>> latex (and latexpdf) to >>> >>> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/Makefile#L94 >>> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/make.bat#L121 >>> >>> to call the script to replace the longtable arguments >>> >>> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/fix_longtable.py >>> >>> The workaround itself probably isn't optimal, and I'd be happy to hear >>> of a proper fix. 
>> >> Thanks - yes - I found your workaround in my explorations, I put in a >> version in our tree too: >> >> https://github.com/matthew-brett/nipy/blob/latex_build_fixes/tools/fix_longtable.py >> >> ?- but I agree it seems much better to get to the root cause. > > When I tried to figure this out, I never found out why the correct > sphinx longtable code path never gets reached, or which code > (numpydoc, autosummary or sphinx) is filling in the colspec. No - it looked hard to debug. I established that it required numpydoc and autosummary to be enabled. See you, Matthew From yoyoq at yahoo.com Wed Aug 10 20:50:00 2011 From: yoyoq at yahoo.com (jp d) Date: Wed, 10 Aug 2011 17:50:00 -0700 (PDT) Subject: [Numpy-discussion] matrix inversion Message-ID: <1313023800.18375.YahooMailNeo@web130115.mail.mud.yahoo.com> hi, i am trying to invert matrices like this: [[ 0.01643777 -0.13539939? 0.11946689] ?[ 0.12479926? 0.01210898 -0.09217618] ?[-0.13050087? 0.07575163? 0.01144993]] in perl using Math::MatrixReal; and in various online calculators i get [? 2.472715991745? 3.680743681735 -3.831392002314 ] [ -4.673105249083 -5.348238625096 -5.703193038649 ] [? 2.733966489601 -6.567940452290 -5.936617926811 ] using python , numpy and linalg.inv (or linalg.pinv) i get? a divergent answer [[? 6.79611151e+07?? 1.01163031e+08?? 1.05303510e+08] ?[? 1.01163057e+08?? 1.50585545e+08?? 1.56748838e+08] ?[? 1.05303548e+08?? 1.56748831e+08?? 1.63164381e+08]] any suggestions? thanks jpd -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Wed Aug 10 21:06:41 2011 From: alan.isaac at gmail.com (Alan G Isaac) Date: Wed, 10 Aug 2011 21:06:41 -0400 Subject: [Numpy-discussion] matrix inversion In-Reply-To: <1313023800.18375.YahooMailNeo@web130115.mail.mud.yahoo.com> References: <1313023800.18375.YahooMailNeo@web130115.mail.mud.yahoo.com> Message-ID: <4E432B21.607@gmail.com> On 8/10/2011 8:50 PM, jp d wrote: > i am trying to invert matrices like this: > [[ 0.01643777 -0.13539939 0.11946689] > [ 0.12479926 0.01210898 -0.09217618] > [-0.13050087 0.07575163 0.01144993]] > in perl using Math::MatrixReal; > and in various online calculators i get > [ 2.472715991745 3.680743681735 -3.831392002314 ] > [ -4.673105249083 -5.348238625096 -5.703193038649 ] > [ 2.733966489601 -6.567940452290 -5.936617926811 ] > using python , numpy and linalg.inv (or linalg.pinv) i get a divergent answer > [[ 6.79611151e+07 1.01163031e+08 1.05303510e+08] > [ 1.01163057e+08 1.50585545e+08 1.56748838e+08] > [ 1.05303548e+08 1.56748831e+08 1.63164381e+08]] Please demonstrate with code:: >>> m = np.mat([[ 0.01643777,-0.13539939, 0.11946689],[ 0.12479926, 0.01210898,-0.09217618 ],[-0.13050087, 0.07575163, 0.01144993]]) >>> m.I matrix([[ -2.60023901e+08, -3.87056678e+08, -4.02898472e+08], [ -3.87056814e+08, -5.76150592e+08, -5.99731775e+08], [ -4.02898597e+08, -5.99731775e+08, -6.24278108e+08]]) Thank you, Alan Isaac From nadavh at visionsense.com Thu Aug 11 00:21:34 2011 From: nadavh at visionsense.com (Nadav Horesh) Date: Wed, 10 Aug 2011 21:21:34 -0700 Subject: [Numpy-discussion] matrix inversion In-Reply-To: <1313023800.18375.YahooMailNeo@web130115.mail.mud.yahoo.com> References: <1313023800.18375.YahooMailNeo@web130115.mail.mud.yahoo.com> Message-ID: <26FC23E7C398A64083C980D16001012D246DFC5F87@VA3DIAXVS361.RED001.local> The matrix in singular, so you can not expect a stable inverse. Nadav. 
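A small check with plain numpy makes the near-singularity visible (numbers approximate):

import numpy as np

a = np.array([[ 0.01643777, -0.13539939,  0.11946689],
              [ 0.12479926,  0.01210898, -0.09217618],
              [-0.13050087,  0.07575163,  0.01144993]])

print np.linalg.svd(a, compute_uv=False)  # smallest singular value is ~7e-10
print np.linalg.cond(a)                   # ~3e8, so inv() mostly amplifies rounding noise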
________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of jp d [yoyoq at yahoo.com] Sent: 11 August 2011 03:50 To: numpy-discussion at scipy.org Subject: [Numpy-discussion] matrix inversion hi, i am trying to invert matrices like this: [[ 0.01643777 -0.13539939 0.11946689] [ 0.12479926 0.01210898 -0.09217618] [-0.13050087 0.07575163 0.01144993]] in perl using Math::MatrixReal; and in various online calculators i get [ 2.472715991745 3.680743681735 -3.831392002314 ] [ -4.673105249083 -5.348238625096 -5.703193038649 ] [ 2.733966489601 -6.567940452290 -5.936617926811 ] using python , numpy and linalg.inv (or linalg.pinv) i get a divergent answer [[ 6.79611151e+07 1.01163031e+08 1.05303510e+08] [ 1.01163057e+08 1.50585545e+08 1.56748838e+08] [ 1.05303548e+08 1.56748831e+08 1.63164381e+08]] any suggestions? thanks jpd -------------- next part -------------- An HTML attachment was scrubbed... URL: From focke at slac.stanford.edu Thu Aug 11 00:42:33 2011 From: focke at slac.stanford.edu (Warren Focke) Date: Wed, 10 Aug 2011 21:42:33 -0700 (PDT) Subject: [Numpy-discussion] matrix inversion In-Reply-To: <26FC23E7C398A64083C980D16001012D246DFC5F87@VA3DIAXVS361.RED001.local> References: <1313023800.18375.YahooMailNeo@web130115.mail.mud.yahoo.com> <26FC23E7C398A64083C980D16001012D246DFC5F87@VA3DIAXVS361.RED001.local> Message-ID: The svs are 1.99991695e-01, 1.99991682e-01, 6.84719250e-10 so if you try >>> np.linalg.pinv(a,1e-5) array([[ 0.41097834, 3.12024106, -3.26279309], [-3.38526587, 0.30274957, 1.89394811], [ 2.98692033, -2.30459609, 0.28627222]]) you at least get an answer that's not near-random. w On Wed, 10 Aug 2011, Nadav Horesh wrote: > The matrix in singular, so you can not expect a stable inverse. > > Nadav. > > ________________________________ > From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of jp d [yoyoq at yahoo.com] > Sent: 11 August 2011 03:50 > To: numpy-discussion at scipy.org > Subject: [Numpy-discussion] matrix inversion > > hi, > i am trying to invert matrices like this: > [[ 0.01643777 -0.13539939 0.11946689] > [ 0.12479926 0.01210898 -0.09217618] > [-0.13050087 0.07575163 0.01144993]] > > in perl using Math::MatrixReal; > and in various online calculators i get > [ 2.472715991745 3.680743681735 -3.831392002314 ] > [ -4.673105249083 -5.348238625096 -5.703193038649 ] > [ 2.733966489601 -6.567940452290 -5.936617926811 ] > > using python , numpy and linalg.inv (or linalg.pinv) i get a divergent answer > [[ 6.79611151e+07 1.01163031e+08 1.05303510e+08] > [ 1.01163057e+08 1.50585545e+08 1.56748838e+08] > [ 1.05303548e+08 1.56748831e+08 1.63164381e+08]] > > any suggestions? > > thanks > jpd > From lkb.teichmann at gmail.com Thu Aug 11 02:41:06 2011 From: lkb.teichmann at gmail.com (Martin Teichmann) Date: Thu, 11 Aug 2011 08:41:06 +0200 Subject: [Numpy-discussion] matrix inversion In-Reply-To: <1313023800.18375.YahooMailNeo@web130115.mail.mud.yahoo.com> References: <1313023800.18375.YahooMailNeo@web130115.mail.mud.yahoo.com> Message-ID: Hi, > i am trying to invert matrices like this: > [[ 0.01643777 -0.13539939? 0.11946689] > ?[ 0.12479926? 0.01210898 -0.09217618] > ?[-0.13050087? 0.07575163? 0.01144993]] > > in perl using Math::MatrixReal; > and in various online calculators i get > [? 2.472715991745? 3.680743681735 -3.831392002314 ] > [ -4.673105249083 -5.348238625096 -5.703193038649 ] > [? 
2.733966489601 -6.567940452290 -5.936617926811 ] well, inverting the latter matrix, I get >>> n=np.mat([[ 2.472715991745 , 3.680743681735 ,-3.831392002314 ], [ -4.673105249083, -5.348238625096, -5.703193038649 ], [ 2.733966489601, -6.567940452290, -5.936617926811 ]]) >>> n.I matrix([[ 0.01643777, -0.13539939, 0.11946689], [ 0.12479926, 0.01210898, -0.09217618], [-0.13050087, -0.07575163, -0.01144993]]) Which is nearly the same matrix as the one you started with, but not quite: there are some extra minus signs in the last two values... are you sure you didn't drop them? Adding them by hand gives nearly the same inverse as your Perl result - better, in fact: the residuals on the off-diagonals are significantly lower. Greetings Martin From keith.hughitt at gmail.com Thu Aug 11 08:59:57 2011 From: keith.hughitt at gmail.com (Keith Hughitt) Date: Thu, 11 Aug 2011 08:59:57 -0400 Subject: [Numpy-discussion] Returning ndimage subclass instances from scipy methods? Message-ID: Hi all, Does anyone know if it is possible to have SciPy methods which work on/return ndarray instances return subclass instances instead? For example, I can pass in an instance of an ndarray subclass to methods in scipy.ndimage, but a normal ndarray is returned instead of a new subclass instance. Wrapping the result in __array_wrap__ and __array_finalize__ has not seemed to help. I also tried asking on the scipy-users list, but so far have not gotten any response. Any suggestions? Keith -------------- next part -------------- An HTML attachment was scrubbed... URL: From dhanjal at telecom-paristech.fr Thu Aug 11 09:23:22 2011 From: dhanjal at telecom-paristech.fr (dhanjal at telecom-paristech.fr) Date: Thu, 11 Aug 2011 15:23:22 +0200 Subject: [Numpy-discussion] SVD does not converge on "clean" matrix Message-ID: <203aa1b32d794c238d32cb8d29036cc2.squirrel@webmail1.telecom-paristech.fr> Hi all, I get an error message "numpy.linalg.linalg.LinAlgError: SVD did not converge" when calling numpy.linalg.svd on a "clean" matrix of size (1952, 895). The matrix is clean in the sense that it contains no NaN or Inf values. The corresponding npz file is available here: https://docs.google.com/leaf?id=0Bw0NXKxxc40jMWEyNTljMWUtMzBmNS00NGZmLThhZWUtY2I2MWU2MGZiNDgx&hl=fr Here is some information about my setup: I use Python 2.7.1 on Ubuntu 11.04 with numpy 1.6.1. Furthermore, I thought the problem might be solved by recompiling numpy with my local ATLAS library (version 3.8.3), and this didn't seem to help. On another machine with Python 2.7.1 and numpy 1.5.1 the SVD does converge; however, it contains 1 NaN singular value and 3 negative singular values of the order -10^-1 (singular values should always be non-negative). I also tried computing the SVD of the matrix using Octave 3.2.4 and Matlab 7.10.0.499 (R2010a) 64-bit (glnxa64) and there were no problems. Any help is greatly appreciated. Thanks in advance, Charanpal From shish at keba.be Thu Aug 11 09:37:45 2011 From: shish at keba.be (Olivier Delalleau) Date: Thu, 11 Aug 2011 09:37:45 -0400 Subject: [Numpy-discussion] bug with assignment into an indexed array? In-Reply-To: References: Message-ID: Maybe confusing, but working as expected.
When you write: matched_to[:3][match] = 3 it first calls __getitem__ with the slice as argument, which returns a view of your array, then it calls __setitem__ on this view, and it fills your matched_to array at the same time. But when you write: matched_to[np.array([0, 1, 2])][match] = 3 it first calls __getitem__ with the array as argument, which retunrs a *copy* of your array, so that calling __setitem__ on this copy has no effect on your original array. -=- Olivier 2011/8/10 Benjamin Root > Came across this today when trying to determine what was wrong with my > code: > > import numpy as np > matched_to = np.array([-1] * 5) > in_ellipse = np.array([False, True, True, True, False]) > match = np.array([False, True, True]) > matched_to[in_ellipse][match] = 3 > > I would expect matched_to to now be "array([-1, -1, 3, 3, -1])", but > instead, it is still all -1. > > It would seem that unless the view was created by a slice, then the > assignment into the indexed view would not work as expected. This works: > > >>> matched_to[:3][match] = 3 > > but not: > > >>> matched_to[np.array([0, 1, 2])][match] = 3 > > Note that the following does work: > > >>> matched_to[np.array([0, 1, 2])] = 3 > > Is this a bug, or was I wrong to expect this to work this way? > > Thanks, > Ben Root > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nadavh at visionsense.com Thu Aug 11 10:21:09 2011 From: nadavh at visionsense.com (Nadav Horesh) Date: Thu, 11 Aug 2011 07:21:09 -0700 Subject: [Numpy-discussion] SVD does not converge on "clean" matrix In-Reply-To: <203aa1b32d794c238d32cb8d29036cc2.squirrel@webmail1.telecom-paristech.fr> References: <203aa1b32d794c238d32cb8d29036cc2.squirrel@webmail1.telecom-paristech.fr> Message-ID: <26FC23E7C398A64083C980D16001012D246DFC5F90@VA3DIAXVS361.RED001.local> Had no problem on a gentoo 64 bit machine using atlas 3.8.0 (Core I7, python 2.7.2, numpy versions1.60 and 1.6.1) Nadav ________________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of dhanjal at telecom-paristech.fr [dhanjal at telecom-paristech.fr] Sent: 11 August 2011 16:23 To: numpy-discussion at scipy.org Subject: [Numpy-discussion] SVD does not converge on "clean" matrix Hi all, I get an error message "numpy.linalg.linalg.LinAlgError: SVD did not converge" when calling numpy.linalg.svd on a "clean" matrix of size (1952, 895). The matrix is clean in the sense that it contains no NaN or Inf values. The corresponding npz file is available here: https://docs.google.com/leaf?id=0Bw0NXKxxc40jMWEyNTljMWUtMzBmNS00NGZmLThhZWUtY2I2MWU2MGZiNDgx&hl=fr Here is some information about my setup: I use Python 2.7.1 on Ubuntu 11.04 with numpy 1.6.1. Furthermore, I thought the problem might be solved by recompiling numpy with my local ATLAS library (version 3.8.3), and this didn't seem to help. On another machine with Python 2.7.1 and numpy 1.5.1 the SVD does converge however it contains 1 NaN singular value and 3 negative singular values of the order -10^-1 (singular values should always be non-negative). I also tried computing the SVD of the matrix using Octave 3.2.4 and Matlab 7.10.0.499 (R2010a) 64-bit (glnxa64) and there were no problems. Any help is greatly appreciated. 
Thanks in advance, Charanpal _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From ben.root at ou.edu Thu Aug 11 11:16:37 2011 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 11 Aug 2011 10:16:37 -0500 Subject: [Numpy-discussion] bug with assignment into an indexed array? In-Reply-To: References: Message-ID: On Thu, Aug 11, 2011 at 8:37 AM, Olivier Delalleau wrote: > Maybe confusing, but working as expected. > > > When you write: > matched_to[np.array([0, 1, 2])] = 3 > it calls __setitem__ on matched_to, with arguments (np.array([0, 1, 2]), > 3). So numpy understand you want to write 3 at these indices. > > > When you write: > matched_to[:3][match] = 3 > it first calls __getitem__ with the slice as argument, which returns a view > of your array, then it calls __setitem__ on this view, and it fills your > matched_to array at the same time. > > > But when you write: > matched_to[np.array([0, 1, 2])][match] = 3 > it first calls __getitem__ with the array as argument, which retunrs a > *copy* of your array, so that calling __setitem__ on this copy has no effect > on your original array. > > -=- Olivier > > Right, but I guess my question is does it *have* to be that way? I guess it makes some sense with respect to indexing with a numpy array like I did with the last example, because an element could be referred to multiple times (which explains the common surprise with '+='), but with boolean indexing, we are guaranteed that each element of the view will appear at most once. Therefore, shouldn't boolean indexing always return a view, not a copy? Is the general case of arbitrary array selection inherently impossible to encode in a view versus a slice with a regular spacing? Thanks, Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Thu Aug 11 11:33:41 2011 From: shish at keba.be (Olivier Delalleau) Date: Thu, 11 Aug 2011 11:33:41 -0400 Subject: [Numpy-discussion] bug with assignment into an indexed array? In-Reply-To: References: Message-ID: 2011/8/11 Benjamin Root > > > On Thu, Aug 11, 2011 at 8:37 AM, Olivier Delalleau wrote: > >> Maybe confusing, but working as expected. >> >> >> When you write: >> matched_to[np.array([0, 1, 2])] = 3 >> it calls __setitem__ on matched_to, with arguments (np.array([0, 1, 2]), >> 3). So numpy understand you want to write 3 at these indices. >> >> >> When you write: >> matched_to[:3][match] = 3 >> it first calls __getitem__ with the slice as argument, which returns a >> view of your array, then it calls __setitem__ on this view, and it fills >> your matched_to array at the same time. >> >> >> But when you write: >> matched_to[np.array([0, 1, 2])][match] = 3 >> it first calls __getitem__ with the array as argument, which retunrs a >> *copy* of your array, so that calling __setitem__ on this copy has no effect >> on your original array. >> >> -=- Olivier >> >> > Right, but I guess my question is does it *have* to be that way? I guess > it makes some sense with respect to indexing with a numpy array like I did > with the last example, because an element could be referred to multiple > times (which explains the common surprise with '+='), but with boolean > indexing, we are guaranteed that each element of the view will appear at > most once. Therefore, shouldn't boolean indexing always return a view, not > a copy? 
Is the general case of arbitrary array selection inherently > impossible to encode in a view versus a slice with a regular spacing? > Yes, due to the fact the array interface only supports regular spacing (otherwise it is more difficult to get efficient access to arbitrary array positions). -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From tmp50 at ukr.net Thu Aug 11 12:02:00 2011 From: tmp50 at ukr.net (Dmitrey) Date: Thu, 11 Aug 2011 19:02:00 +0300 Subject: [Numpy-discussion] bug with latest numpy git snapshot build with Python3 Message-ID: bug in KUBUNTU 11.04, latest numpy git snapshot build with Python3 >>> import numpy Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python3.2/dist-packages/numpy/__init__.py", line 137, in from . import add_newdocs File "/usr/local/lib/python3.2/dist-packages/numpy/add_newdocs.py", line 9, in from numpy.lib import add_newdoc File "/usr/local/lib/python3.2/dist-packages/numpy/lib/__init__.py", line 4, in from .type_check import * File "/usr/local/lib/python3.2/dist-packages/numpy/lib/type_check.py", line 8, in import numpy.core.numeric as _nx File "/usr/local/lib/python3.2/dist-packages/numpy/core/__init__.py", line 10, in from .numeric import * File "/usr/local/lib/python3.2/dist-packages/numpy/core/numeric.py", line 27, in import multiarray ImportError: No module named multiarray -------------- next part -------------- An HTML attachment was scrubbed... URL: From wesmckinn at gmail.com Thu Aug 11 14:25:07 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 11 Aug 2011 14:25:07 -0400 Subject: [Numpy-discussion] Questionable reduceat behavior Message-ID: I'm a little perplexed why reduceat was made to behave like this: In [26]: arr = np.ones((10, 4), dtype=bool) In [27]: arr Out[27]: array([[ True, True, True, True], [ True, True, True, True], [ True, True, True, True], [ True, True, True, True], [ True, True, True, True], [ True, True, True, True], [ True, True, True, True], [ True, True, True, True], [ True, True, True, True], [ True, True, True, True]], dtype=bool) In [30]: np.add.reduceat(arr, [0, 3, 3, 7, 9], axis=0) Out[30]: array([[3, 3, 3, 3], [1, 1, 1, 1], [4, 4, 4, 4], [2, 2, 2, 2], [1, 1, 1, 1]]) this does not seem intuitively correct. Since we have: In [33]: arr[3:3].sum(0) Out[33]: array([0, 0, 0, 0]) I would expect array([[3, 3, 3, 3], [0, 0, 0, 0], [4, 4, 4, 4], [2, 2, 2, 2], [1, 1, 1, 1]]) Obviously I can RTFM and see why it does this ("if ``indices[i] >= indices[i + 1]``, the i-th generalized "row" is simply ``a[indices[i]]``"), but it doesn't make much sense to me, and I need work around it. Suggestions? From rowen at uw.edu Thu Aug 11 14:50:14 2011 From: rowen at uw.edu (Russell E. Owen) Date: Thu, 11 Aug 2011 11:50:14 -0700 Subject: [Numpy-discussion] Efficient way to load a 1Gb file? References: Message-ID: In article , Anne Archibald wrote: > There was also some work on a semi-mutable array type that allowed > appending along one axis, then 'freezing' to yield a normal numpy > array (unfortunately I'm not sure how to find it in the mailing list > archives). One could write such a setup by hand, using mmap() or > realloc(), but I'd be inclined to simply write a filter that converted > the text file to some sort of binary file on the fly, value by value. > Then the file can be loaded in or mmap()ed. A 1 Gb text file is a > miserable object anyway, so it might be desirable to convert to (say) > HDF5 and then throw away the text file. 
Thank you and the others for your help. It seems a shame that loadtxt has no argument for predicted length, which would allow preallocation and less appending/copying data. And yes...reading the whole file first to figure out how many elements it has seems sensible to me -- at least as a switchable behavior, and preferably the default. 1Gb isn't that large in modern systems, but loadtxt is filing up all 6Gb of RAM reading it! I'll suggest the HDF5 solution to my colleague. Meanwhile I think he's hacked around the problem by reading the file through once to figure out the array length, allocating that, and reading the data in with a Python loop. Sounds slow, but it's working. -- Russell From ben.root at ou.edu Thu Aug 11 16:37:26 2011 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 11 Aug 2011 15:37:26 -0500 Subject: [Numpy-discussion] bug with assignment into an indexed array? In-Reply-To: References: Message-ID: On Thu, Aug 11, 2011 at 10:33 AM, Olivier Delalleau wrote: > 2011/8/11 Benjamin Root > >> >> >> On Thu, Aug 11, 2011 at 8:37 AM, Olivier Delalleau wrote: >> >>> Maybe confusing, but working as expected. >>> >>> >>> When you write: >>> matched_to[np.array([0, 1, 2])] = 3 >>> it calls __setitem__ on matched_to, with arguments (np.array([0, 1, 2]), >>> 3). So numpy understand you want to write 3 at these indices. >>> >>> >>> When you write: >>> matched_to[:3][match] = 3 >>> it first calls __getitem__ with the slice as argument, which returns a >>> view of your array, then it calls __setitem__ on this view, and it fills >>> your matched_to array at the same time. >>> >>> >>> But when you write: >>> matched_to[np.array([0, 1, 2])][match] = 3 >>> it first calls __getitem__ with the array as argument, which retunrs a >>> *copy* of your array, so that calling __setitem__ on this copy has no effect >>> on your original array. >>> >>> -=- Olivier >>> >>> >> Right, but I guess my question is does it *have* to be that way? I guess >> it makes some sense with respect to indexing with a numpy array like I did >> with the last example, because an element could be referred to multiple >> times (which explains the common surprise with '+='), but with boolean >> indexing, we are guaranteed that each element of the view will appear at >> most once. Therefore, shouldn't boolean indexing always return a view, not >> a copy? Is the general case of arbitrary array selection inherently >> impossible to encode in a view versus a slice with a regular spacing? >> > > Yes, due to the fact the array interface only supports regular spacing > (otherwise it is more difficult to get efficient access to arbitrary array > positions). > > -=- Olivier > > This still bothers me, though. I imagine that it is next to impossible to detect this situation from numpy's perspective, so it can't even emit a warning or error. Furthermore, for someone who makes a general function to modify the contents of some externally provided array, there is a possibility that the provided array is actually a copy not a view. Although, I guess it is the responsibility of the user to know the difference. I guess that is the key problem. The key advantage we are taught about numpy arrays is the use of views for efficient access. It would seem that most access operations would use it, but in reality, only sliced access do. Everything else is a copy (unless you are doing fancy indexing with assignment). 
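For what it is worth, the assignment itself can always be collapsed into a single indexing operation, which does hit the original array - a rough sketch for the example that started this thread:

import numpy as np

matched_to = np.array([-1] * 5)
in_ellipse = np.array([False, True, True, True, False])
match = np.array([False, True, True])

# turn the two chained selections into one set of absolute indices,
# then assign through a single __setitem__ call
idx = np.flatnonzero(in_ellipse)[match]
matched_to[idx] = 3
print matched_to   # [-1 -1  3  3 -1]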
Maybe with some of the forthcoming changes that have been done with respect to nditer and ufuncs (in particular, I am thinking of the "where" kwarg), maybe we could consider an enhancement allowing fancy indexing (or at least boolean indexing) to produce a view? Even if it is less efficient than a view from slicing, it would bring better consistency in behavior between the different forms of indexing. Just my 2 cents, Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From borreguero at gmail.com Thu Aug 11 19:43:12 2011 From: borreguero at gmail.com (Jose Borreguero) Date: Thu, 11 Aug 2011 19:43:12 -0400 Subject: [Numpy-discussion] how to create a block diagonal matrix by repeating the block? Message-ID: Dear numpy users, I have a 3x3 matrix which I want to repeat 50 times along a diagonal, thus creating a 150x150 block diagonal matrix. I know of a method usin scipy.linalg.block_diag, but I don't know if this is the best one: a = random.randn(3,3) b = a.reshape(1,3,3).repeat(50,axis=0) scipy.linalg.block_diag( *b ) Jose -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Thu Aug 11 20:15:21 2011 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 11 Aug 2011 17:15:21 -0700 Subject: [Numpy-discussion] how to create a block diagonal matrix by repeating the block? In-Reply-To: References: Message-ID: On Thu, Aug 11, 2011 at 4:43 PM, Jose Borreguero wrote: > a = random.randn(3,3) > b = a.reshape(1,3,3).repeat(50,axis=0) > scipy.linalg.block_diag( *b ) > slightly simpler, but equivalent, code: b = [a]*50 scipy.linalg.block_diag( *b) Cheers, f From warren.weckesser at enthought.com Thu Aug 11 22:01:03 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Thu, 11 Aug 2011 21:01:03 -0500 Subject: [Numpy-discussion] how to create a block diagonal matrix by repeating the block? In-Reply-To: References: Message-ID: On Thu, Aug 11, 2011 at 7:15 PM, Fernando Perez wrote: > On Thu, Aug 11, 2011 at 4:43 PM, Jose Borreguero > wrote: > > a = random.randn(3,3) > > b = a.reshape(1,3,3).repeat(50,axis=0) > > scipy.linalg.block_diag( *b ) > > > > slightly simpler, but equivalent, code: > > b = [a]*50 > scipy.linalg.block_diag( *b) > > The following is unnecessarily complicated--using block_diag is fine--but it can be fun to stretch out into the fourth dimension with stride tricks: from numpy import array, zeros from numpy.lib.stride_tricks import as_strided # N is the number of 3x3 blocks. # N = 50 N = 4 a = array([[1,2,3],[4,5,6],[7,8,9]]) # b will be the block-diagonal array. b = zeros((3*N, 3*N), dtype=a.dtype) bstr = b.strides c = as_strided(b, shape=(N,N,3,3), strides=(3*bstr[0], 3*bstr[1], bstr[0], bstr[1])) # Assign a to the diagonal blocks. c[range(N), range(N)] = a print b Output: [[1 2 3 0 0 0 0 0 0 0 0 0] [4 5 6 0 0 0 0 0 0 0 0 0] [7 8 9 0 0 0 0 0 0 0 0 0] [0 0 0 1 2 3 0 0 0 0 0 0] [0 0 0 4 5 6 0 0 0 0 0 0] [0 0 0 7 8 9 0 0 0 0 0 0] [0 0 0 0 0 0 1 2 3 0 0 0] [0 0 0 0 0 0 4 5 6 0 0 0] [0 0 0 0 0 0 7 8 9 0 0 0] [0 0 0 0 0 0 0 0 0 1 2 3] [0 0 0 0 0 0 0 0 0 4 5 6] [0 0 0 0 0 0 0 0 0 7 8 9]] Warren Cheers, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
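Since every block is identical in this example, yet another option is a Kronecker product against an identity matrix - a short sketch:

import numpy as np

a = np.random.randn(3, 3)
N = 50
# kron places a copy of `a` wherever eye(N) has a 1, i.e. along the diagonal
big = np.kron(np.eye(N), a)
print big.shape   # (150, 150)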
URL: From Chris.Barker at noaa.gov Fri Aug 12 00:49:18 2011 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu, 11 Aug 2011 21:49:18 -0700 Subject: [Numpy-discussion] Efficient way to load a 1Gb file? In-Reply-To: References: Message-ID: <4E44B0CE.4070402@noaa.gov> On 8/10/2011 1:01 PM, Anne Archibald wrote: > There was also some work on a semi-mutable array type that allowed > appending along one axis, then 'freezing' to yield a normal numpy > array (unfortunately I'm not sure how to find it in the mailing list > archives). That was me, and here is the thread -- however, I'm on vacation, and don't have the test code, etc with me, but I found the core class. It's enclosed. >> The npyio routines (loadtxt as well as genfromtxt) first read in the entire data as lists, which creates of course significant overhead, but is not easy to circumvent, since numpy arrays are immutable - so you have to first store the numbers in some kind of mutable object. One could write a custom parser that tries to be somewhat more efficient, e.g. first reading in sub-arrays from a smaller buffer. Concatenating those sub-arrays would still require about twice the memory of the final array. I don't know if using the array.array type (which is mutable) is much more efficient than a list... Indeed, and are holding all the text as well, which is generally going to be bigger than the resulting numbers. Interesting, when I wrote accumulator, I found that it didn't, for the most part, have any performance advantage over accumlating on lists, then converting to arrays -- but there is a memory advantage, so this may be a good use case. you could do something like (untested): If your rows are all one dtype: X = accumulator(dtype=np.float32, block_shape = (num_cols,)) if they are not, then build a custon dtype to hold the rows, and use that: dt = np.dtype('%id'%num_columns) # create a dtype that holds a row #num_columns doubles in this case. # create an accumulator for that dtype X = accumulator(dtype=dt) # loop through the file to build the array: delimiter = ' ' for line in file(fname, 'r'): X.append ( np.array(line.split(delimiter), dtype=float) ) X = np.array(X) # gives a regular old array as a copy I note that converting to a regular array requires a data copy, which, if memoery is tight, might not be good. The solution would be to have a way to make a view, so you'd get a regular array from the same data (with maybe the extra buffer space) I'd like to see this calss get more mature, robust, and better performing, but so far it's worked for my use cases. Contributions welcome. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: accumulator.py URL: From Chris.Barker at noaa.gov Fri Aug 12 00:51:37 2011 From: Chris.Barker at noaa.gov (Chris Barker) Date: Thu, 11 Aug 2011 21:51:37 -0700 Subject: [Numpy-discussion] Efficient way to load a 1Gb file? In-Reply-To: <4E44B0CE.4070402@noaa.gov> References: <4E44B0CE.4070402@noaa.gov> Message-ID: <4E44B159.1080505@noaa.gov> aarrgg! I cleaned up the doc string a bit, but didn't save before sending -- here it is again, Sorry about that. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: accumulator.py URL: From dhanjal at telecom-paristech.fr Fri Aug 12 05:03:30 2011 From: dhanjal at telecom-paristech.fr (Charanpal Dhanjal) Date: Fri, 12 Aug 2011 11:03:30 +0200 Subject: [Numpy-discussion] SVD does not converge on "clean" matrix In-Reply-To: <203aa1b32d794c238d32cb8d29036cc2.squirrel@webmail1.telecom-paristech.fr> References: <203aa1b32d794c238d32cb8d29036cc2.squirrel@webmail1.telecom-paristech.fr> Message-ID: <06f59405fc0dfce9e04f83d001963a23@telecom-paristech.fr> Thank Nadav for testing out the matrix. I wonder if you had a chance to check if the resulting decomposition contained NaN or Inf values? As far I understood, numpy.linalg.svd uses routines in LAPACK and ATLAS (if available) to compute the corresponding SVD. I did some complementary tests on Debian Squeeze on an Intel Xeon W3550 CPU and the call to numpy.linalg.svd results in the LinAlgError "SVD did not converge", however the test leading to results containing NaN values ran on Debian Lenny on an Intel Core 2 Quad. In both of these situations we use Python 2.7.1 and numpy 1.5.1 (without ATLAS), and so the reasons for the differences seem to be OS or processor dependent. Any ideas? Charanpal Date: Thu, 11 Aug 2011 07:21:09 -0700 From: Nadav Horesh Subject: Re: [Numpy-discussion] SVD does not converge on "clean" matrix To: Discussion of Numerical Python Message-ID: <26FC23E7C398A64083C980D16001012D246DFC5F90 at VA3DIAXVS361.RED001.local> Content-Type: text/plain; charset="us-ascii" > Had no problem on a gentoo 64 bit machine using atlas 3.8.0 (Core I7, > python 2.7.2, numpy versions1.60 and 1.6.1) > > Nadav >On Thu, 11 Aug 2011 15:23:22 +0200, dhanjal at telecom-paristech.fr > wrote: >> Hi all, >> >> I get an error message "numpy.linalg.linalg.LinAlgError: SVD did not >> converge" when calling numpy.linalg.svd on a "clean" matrix of size >> (1952, >> 895). The matrix is clean in the sense that it contains no NaN or >> Inf >> values. The corresponding npz file is available here: >> >> https://docs.google.com/leaf?id=0Bw0NXKxxc40jMWEyNTljMWUtMzBmNS00NGZmLThhZWUtY2I2MWU2MGZiNDgx&hl=fr >> >> Here is some information about my setup: I use Python 2.7.1 on >> Ubuntu >> 11.04 with numpy 1.6.1. Furthermore, I thought the problem might be >> solved >> by recompiling numpy with my local ATLAS library (version 3.8.3), >> and this >> didn't seem to help. On another machine with Python 2.7.1 and numpy >> 1.5.1 >> the SVD does converge however it contains 1 NaN singular value and 3 >> negative singular values of the order -10^-1 (singular values should >> always be non-negative). >> >> I also tried computing the SVD of the matrix using Octave 3.2.4 and >> Matlab >> 7.10.0.499 (R2010a) 64-bit (glnxa64) and there were no problems. Any >> help >> is greatly appreciated. >> >> Thanks in advance, >> Charanpal From borreguero at gmail.com Fri Aug 12 07:53:29 2011 From: borreguero at gmail.com (Jose Borreguero) Date: Fri, 12 Aug 2011 07:53:29 -0400 Subject: [Numpy-discussion] how to create a block diagonal matrix by repeating the block? In-Reply-To: References: Message-ID: Thanks! 
Jose On Thu, Aug 11, 2011 at 8:15 PM, Fernando Perez wrote: > On Thu, Aug 11, 2011 at 4:43 PM, Jose Borreguero > wrote: > > a = random.randn(3,3) > > b = a.reshape(1,3,3).repeat(50,axis=0) > > scipy.linalg.block_diag( *b ) > > > > slightly simpler, but equivalent, code: > > b = [a]*50 > scipy.linalg.block_diag( *b) > > Cheers, > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrea.gavana at gmail.com Fri Aug 12 09:32:05 2011 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Fri, 12 Aug 2011 15:32:05 +0200 Subject: [Numpy-discussion] Statistical distributions on samples Message-ID: Hi All, I am working on something that appeared to be a no-brainer issue (at the beginning), by my complete ignorance in statistics is overwhelming and I got stuck. What I am trying to do can be summarized as follows Let's assume that I have to generate a sample of a 1,000 values for a variable (let's say, "velocity") using a normal distribution (but later I will have to do it with log-normal, triangular and a couple of others). The only thing I know about this velocity sample is the minimum and maximum values (let's say 50 and 200 respectively) and, obviously for the normal distribution (but not so for the other distributions), the mean value (125 in this case). Now, I would like to generate this sample of 1,000 points, in which none of the point has velocity smaller than 50 or bigger than 200, and the number of samples close to the mean (125) should be higher than the number of samples close to the minimum and the maximum, following some kind of normal distribution. What I have tried up to now is summarized in the code below, but as you can easily see, I don't really know what I am doing. I am open to every suggestion, and I apologize for the dumbness of my question. import numpy from scipy import stats import matplotlib.pyplot as plt minval, maxval = 50.0, 250.0 x = numpy.linspace(minval, maxval, 500) samp = stats.norm.rvs(size=len(x)) pdf = stats.norm.pdf(x) cdf = stats.norm.cdf(x) ppf = stats.norm.ppf(x) ax1 = plt.subplot(2, 2, 1) ax1.plot(range(len(x)), samp) ax2 = plt.subplot(2, 2, 2) ax2.plot(x, pdf) ax3 = plt.subplot(2, 2, 3) ax3.plot(x, cdf) ax4 = plt.subplot(2, 2, 4) ax4.plot(x, ppf) plt.show() Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Fri Aug 12 09:33:46 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Fri, 12 Aug 2011 08:33:46 -0500 Subject: [Numpy-discussion] SVD does not converge on "clean" matrix In-Reply-To: <06f59405fc0dfce9e04f83d001963a23@telecom-paristech.fr> References: <203aa1b32d794c238d32cb8d29036cc2.squirrel@webmail1.telecom-paristech.fr> <06f59405fc0dfce9e04f83d001963a23@telecom-paristech.fr> Message-ID: On Fri, Aug 12, 2011 at 4:03 AM, Charanpal Dhanjal < dhanjal at telecom-paristech.fr> wrote: > Thank Nadav for testing out the matrix. I wonder if you had a chance to > check if the resulting decomposition contained NaN or Inf values? > > As far I understood, numpy.linalg.svd uses routines in LAPACK and ATLAS > (if available) to compute the corresponding SVD. 
I did some > complementary tests on Debian Squeeze on an Intel Xeon W3550 CPU and the > call to numpy.linalg.svd results in the LinAlgError "SVD did not > converge", however the test leading to results containing NaN values ran > on Debian Lenny on an Intel Core 2 Quad. In both of these situations we > use Python 2.7.1 and numpy 1.5.1 (without ATLAS), and so the reasons for > the differences seem to be OS or processor dependent. Any ideas? > > Charanpal > > Date: Thu, 11 Aug 2011 07:21:09 -0700 > From: Nadav Horesh > Subject: Re: [Numpy-discussion] SVD does not converge on "clean" > matrix > To: Discussion of Numerical Python > Message-ID: > > <26FC23E7C398A64083C980D16001012D246DFC5F90 at VA3DIAXVS361.RED001.local> > Content-Type: text/plain; charset="us-ascii" > > > > Had no problem on a gentoo 64 bit machine using atlas 3.8.0 (Core I7, > > python 2.7.2, numpy versions1.60 and 1.6.1) > Another data point: on Mac OS X, with Python 2.7.2 and numpy 1.6.0 (using EPD 7.1), I get the error: $ ipython --pylab Enthought Python Distribution -- www.enthought.com Python 2.7.2 |EPD 7.1-1 (32-bit)| (default, Jul 3 2011, 15:40:35) Type "copyright", "credits" or "license" for more information. IPython 0.11.rc1 -- An enhanced Interactive Python. ? -> Introduction and overview of IPython's features. %quickref -> Quick reference. help -> Python's own help system. object? -> Details about 'object', use 'object??' for extra details. Welcome to pylab, a matplotlib-based Python environment [backend: WXAgg]. For more information, type 'help(pylab)'. In [1]: numpy.__version__ Out[1]: '1.6.0' In [2]: arr = load('matrix_leading_to_bad_SVD.npz')['arr_0'] In [3]: np.linalg.svd(arr) --------------------------------------------------------------------------- LinAlgError Traceback (most recent call last) /Users/warren/tmp/ in () ----> 1 np.linalg.svd(arr) /Library/Frameworks/Python.framework/Versions/7.1/lib/python2.7/site-packages/numpy/linalg/linalg.py in svd(a, full_matrices, compute_uv) 1319 work, lwork, iwork, 0) 1320 if results['info'] > 0: -> 1321 raise LinAlgError, 'SVD did not converge' 1322 s = s.astype(_realType(result_t)) 1323 if compute_uv: LinAlgError: SVD did not converge Warren > > > > Nadav > > >On Thu, 11 Aug 2011 15:23:22 +0200, dhanjal at telecom-paristech.fr > > wrote: > >> Hi all, > >> > >> I get an error message "numpy.linalg.linalg.LinAlgError: SVD did not > >> converge" when calling numpy.linalg.svd on a "clean" matrix of size > >> (1952, > >> 895). The matrix is clean in the sense that it contains no NaN or > >> Inf > >> values. The corresponding npz file is available here: > >> > >> > https://docs.google.com/leaf?id=0Bw0NXKxxc40jMWEyNTljMWUtMzBmNS00NGZmLThhZWUtY2I2MWU2MGZiNDgx&hl=fr > >> > >> Here is some information about my setup: I use Python 2.7.1 on > >> Ubuntu > >> 11.04 with numpy 1.6.1. Furthermore, I thought the problem might be > >> solved > >> by recompiling numpy with my local ATLAS library (version 3.8.3), > >> and this > >> didn't seem to help. On another machine with Python 2.7.1 and numpy > >> 1.5.1 > >> the SVD does converge however it contains 1 NaN singular value and 3 > >> negative singular values of the order -10^-1 (singular values should > >> always be non-negative). > >> > >> I also tried computing the SVD of the matrix using Octave 3.2.4 and > >> Matlab > >> 7.10.0.499 (R2010a) 64-bit (glnxa64) and there were no problems. Any > >> help > >> is greatly appreciated. 
> >> > >> Thanks in advance, > >> Charanpal > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Fri Aug 12 10:30:00 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 12 Aug 2011 10:30 -0400 Subject: [Numpy-discussion] nditer confusion Message-ID: There'a a boatload of options for nditer. I need a simple explanation, maybe a few simple examples. Is there anything that might help? From cjordan1 at uw.edu Fri Aug 12 10:53:12 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Fri, 12 Aug 2011 09:53:12 -0500 Subject: [Numpy-discussion] Statistical distributions on samples In-Reply-To: References: Message-ID: Hi Andrea--An easy way to get something like this would be import numpy as np import scipy.stats as stats sigma = #some reasonable standard deviation for your application x = stats.norm.rvs(size=1000, loc=125, scale=sigma) x = x[x>50] x = x[x<200] That will give a roughly normal distribution to your velocities, as long as, say, sigma<25. (I'm using the rule of thumb for the normal distribution that normal random samples lie 3 standard deviations away from the mean about 1 out of 350 times.) Though you won't be able to get exactly normal errors about your mean since normal random samples can theoretically be of any size. You can use this same process for any other distribution, as long as you've chosen a scale variable so that the probability of samples being outside your desired interval is really small. Of course, once again your random errors won't be exactly from the distribution you get your original samples from. -Chris JS On Fri, Aug 12, 2011 at 8:32 AM, Andrea Gavana wrote: > Hi All, > > I am working on something that appeared to be a no-brainer issue (at > the beginning), by my complete ignorance in statistics is overwhelming and I > got stuck. > > What I am trying to do can be summarized as follows > > Let's assume that I have to generate a sample of a 1,000 values for a > variable (let's say, "velocity") using a normal distribution (but later I > will have to do it with log-normal, triangular and a couple of others). The > only thing I know about this velocity sample is the minimum and maximum > values (let's say 50 and 200 respectively) and, obviously for the normal > distribution (but not so for the other distributions), the mean value (125 > in this case). > > Now, I would like to generate this sample of 1,000 points, in which none of > the point has velocity smaller than 50 or bigger than 200, and the number of > samples close to the mean (125) should be higher than the number of samples > close to the minimum and the maximum, following some kind of normal > distribution. > > What I have tried up to now is summarized in the code below, but as you can > easily see, I don't really know what I am doing. I am open to every > suggestion, and I apologize for the dumbness of my question. 
> > import numpy > > from scipy import stats > import matplotlib.pyplot as plt > > minval, maxval = 50.0, 250.0 > x = numpy.linspace(minval, maxval, 500) > > samp = stats.norm.rvs(size=len(x)) > pdf = stats.norm.pdf(x) > cdf = stats.norm.cdf(x) > ppf = stats.norm.ppf(x) > > ax1 = plt.subplot(2, 2, 1) > ax1.plot(range(len(x)), samp) > > ax2 = plt.subplot(2, 2, 2) > ax2.plot(x, pdf) > > ax3 = plt.subplot(2, 2, 3) > ax3.plot(x, cdf) > > ax4 = plt.subplot(2, 2, 4) > ax4.plot(x, ppf) > > plt.show() > > > Andrea. > > "Imagination Is The Only Weapon In The War Against Reality." > http://xoomer.alice.it/infinity77/ > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nadavh at visionsense.com Fri Aug 12 13:23:53 2011 From: nadavh at visionsense.com (Nadav Horesh) Date: Fri, 12 Aug 2011 10:23:53 -0700 Subject: [Numpy-discussion] SVD does not converge on "clean" matrix In-Reply-To: References: <203aa1b32d794c238d32cb8d29036cc2.squirrel@webmail1.telecom-paristech.fr> <06f59405fc0dfce9e04f83d001963a23@telecom-paristech.fr>, Message-ID: <26FC23E7C398A64083C980D16001012D246DFC5F95@VA3DIAXVS361.RED001.local> I tested all the the result 3 matrices with alltrue(infinite(mat)) and got True answer for all of them. Nadav ________________________________ From: numpy-discussion-bounces at scipy.org [numpy-discussion-bounces at scipy.org] On Behalf Of Warren Weckesser [warren.weckesser at enthought.com] Sent: 12 August 2011 16:33 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] SVD does not converge on "clean" matrix On Fri, Aug 12, 2011 at 4:03 AM, Charanpal Dhanjal > wrote: Thank Nadav for testing out the matrix. I wonder if you had a chance to check if the resulting decomposition contained NaN or Inf values? As far I understood, numpy.linalg.svd uses routines in LAPACK and ATLAS (if available) to compute the corresponding SVD. I did some complementary tests on Debian Squeeze on an Intel Xeon W3550 CPU and the call to numpy.linalg.svd results in the LinAlgError "SVD did not converge", however the test leading to results containing NaN values ran on Debian Lenny on an Intel Core 2 Quad. In both of these situations we use Python 2.7.1 and numpy 1.5.1 (without ATLAS), and so the reasons for the differences seem to be OS or processor dependent. Any ideas? Charanpal Date: Thu, 11 Aug 2011 07:21:09 -0700 From: Nadav Horesh > Subject: Re: [Numpy-discussion] SVD does not converge on "clean" matrix To: Discussion of Numerical Python > Message-ID: <26FC23E7C398A64083C980D16001012D246DFC5F90 at VA3DIAXVS361.RED001.local> Content-Type: text/plain; charset="us-ascii" > Had no problem on a gentoo 64 bit machine using atlas 3.8.0 (Core I7, > python 2.7.2, numpy versions1.60 and 1.6.1) Another data point: on Mac OS X, with Python 2.7.2 and numpy 1.6.0 (using EPD 7.1), I get the error: $ ipython --pylab Enthought Python Distribution -- www.enthought.com Python 2.7.2 |EPD 7.1-1 (32-bit)| (default, Jul 3 2011, 15:40:35) Type "copyright", "credits" or "license" for more information. IPython 0.11.rc1 -- An enhanced Interactive Python. ? -> Introduction and overview of IPython's features. %quickref -> Quick reference. help -> Python's own help system. object? -> Details about 'object', use 'object??' for extra details. Welcome to pylab, a matplotlib-based Python environment [backend: WXAgg]. 
For more information, type 'help(pylab)'. In [1]: numpy.__version__ Out[1]: '1.6.0' In [2]: arr = load('matrix_leading_to_bad_SVD.npz')['arr_0'] In [3]: np.linalg.svd(arr) --------------------------------------------------------------------------- LinAlgError Traceback (most recent call last) /Users/warren/tmp/ in () ----> 1 np.linalg.svd(arr) /Library/Frameworks/Python.framework/Versions/7.1/lib/python2.7/site-packages/numpy/linalg/linalg.py in svd(a, full_matrices, compute_uv) 1319 work, lwork, iwork, 0) 1320 if results['info'] > 0: -> 1321 raise LinAlgError, 'SVD did not converge' 1322 s = s.astype(_realType(result_t)) 1323 if compute_uv: LinAlgError: SVD did not converge Warren > > Nadav >On Thu, 11 Aug 2011 15:23:22 +0200, dhanjal at telecom-paristech.fr > wrote: >> Hi all, >> >> I get an error message "numpy.linalg.linalg.LinAlgError: SVD did not >> converge" when calling numpy.linalg.svd on a "clean" matrix of size >> (1952, >> 895). The matrix is clean in the sense that it contains no NaN or >> Inf >> values. The corresponding npz file is available here: >> >> https://docs.google.com/leaf?id=0Bw0NXKxxc40jMWEyNTljMWUtMzBmNS00NGZmLThhZWUtY2I2MWU2MGZiNDgx&hl=fr >> >> Here is some information about my setup: I use Python 2.7.1 on >> Ubuntu >> 11.04 with numpy 1.6.1. Furthermore, I thought the problem might be >> solved >> by recompiling numpy with my local ATLAS library (version 3.8.3), >> and this >> didn't seem to help. On another machine with Python 2.7.1 and numpy >> 1.5.1 >> the SVD does converge however it contains 1 NaN singular value and 3 >> negative singular values of the order -10^-1 (singular values should >> always be non-negative). >> >> I also tried computing the SVD of the matrix using Octave 3.2.4 and >> Matlab >> 7.10.0.499 (R2010a) 64-bit (glnxa64) and there were no problems. Any >> help >> is greatly appreciated. >> >> Thanks in advance, >> Charanpal _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Aug 12 14:35:13 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 12 Aug 2011 11:35:13 -0700 Subject: [Numpy-discussion] nditer confusion In-Reply-To: References: Message-ID: I'll write up some more introductory-style documentation, you're right that the examples I put in the reference page aren't a nice simple starting point. Will post back here for feedback when I have a draft for you to review. Cheers, Mark On Fri, Aug 12, 2011 at 7:30 AM, Neal Becker wrote: > There'a a boatload of options for nditer. I need a simple explanation, > maybe a > few simple examples. Is there anything that might help? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Fri Aug 12 16:06:49 2011 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Fri, 12 Aug 2011 16:06:49 -0400 Subject: [Numpy-discussion] Theano 0.4.1 released Message-ID: =========================== Announcing Theano 0.4.1 =========================== This is an important release, with lots of new features, bug fixes and some deprecation warning. The upgrade is recommended for everybody. 
For those using the bleeding edge version in the mercurial repository, we encourage you to update to the `0.4.1` tag. What's New ---------- New features: * `R_op `_ macro like theano.tensor.grad * Not all tests are done yet (TODO) * Added alias theano.tensor.bitwise_{and,or,xor,not}. They are the numpy names. * Updates returned by Scan (you need to pass them to the theano.function) are now a new Updates class. That allow more check and easier work with them. The Updates class is a subclass of dict * Scan can now work in a "do while" loop style. * We scan until a condition is met. * There is a minimum of 1 iteration(can't do "while do" style loop) * The "Interactive Debugger" (compute_test_value theano flags) * Now should work with all ops (even the one with only C code) * In the past some errors were caught and re-raised as unrelated errors (ShapeMismatch replaced with NotImplemented). We don't do that anymore. * The new Op.make_thunk function(introduced in 0.4.0) is now used by constant_folding and DebugMode * Added A_TENSOR_VARIABLE.astype() as a way to cast. NumPy allows this syntax. * New BLAS GER implementation. * Insert GEMV more frequently. * Added new ifelse(scalar condition, rval_if_true, rval_if_false) Op. * This is a subset of the elemwise switch (tensor condition, rval_if_true, rval_if_false). * With the new feature in the sandbox, only one of rval_if_true or rval_if_false will be evaluated. Optimizations: * Subtensor has C code * {Inc,Set}Subtensor has C code * ScalarFromTensor has C code * dot(zeros,x) and dot(x,zeros) * IncSubtensor(x, zeros, idx) -> x * SetSubtensor(x, x[idx], idx) -> x (when x is a constant) * subtensor(alloc,...) -> alloc * Many new scan optimization * Lower scan execution overhead with a Cython implementation * Removed scan double compilation (by using the new Op.make_thunk mechanism) * Certain computations from the inner graph are now Pushed out into the outer graph. This means they are not re-comptued at every step of scan. * Different scan ops get merged now into a single op (if possible), reducing the overhead and sharing computations between the two instances GPU: * PyCUDA/CUDAMat/Gnumpy/Theano bridge and `documentation `_. * New function to easily convert pycuda GPUArray object to and from CudaNdarray object * Fixed a bug if you crated a view of a manually created CudaNdarray that are view of GPUArray. * Removed a warning when nvcc is not available and the user did not requested it. * renamed config option cuda.nvccflags -> nvcc.flags * Allow GpuSoftmax and GpuSoftmaxWithBias to work with bigger input. Bugs fixed: * In one case an AdvancedSubtensor1 could be converted to a GpuAdvancedIncSubtensor1 insted of GpuAdvancedSubtensor1. It probably didn't happen due to the order of optimizations, but that order is not guaranteed to be the same on all computers. * Derivative of set_subtensor was wrong. * Derivative of Alloc was wrong. Crash fixed: * On an unusual Python 2.4.4 on Windows * When using a C cache copied from another location * On Windows 32 bits when setting a complex64 to 0. * Compilation crash with CUDA 4 * When wanting to copy the compilation cache from a computer to another * This can be useful for using Theano on a computer without a compiler. * GPU: * Compilation crash fixed under Ubuntu 11.04 * Compilation crash fixed with CUDA 4.0 Know bug: * CAReduce with nan in inputs don't return the good output (`Ticket `_). * This is used in tensor.{max,mean,prod,sum} and in the grad of PermuteRowElements. 
* This is not a new bug, just a bug discovered since the last release that we didn't had time to fix. Deprecation (will be removed in Theano 0.5, warning generated if you use them): * The string mode (accepted only by theano.function()) FAST_RUN_NOGC. Use Mode(linker='c|py_nogc') instead. * The string mode (accepted only by theano.function()) STABILIZE. Use Mode(optimizer='stabilize') instead. * scan interface change: * The use of `return_steps` for specifying how many entries of the output scan has been depricated * The same thing can be done by applying a subtensor on the output return by scan to select a certain slice * The inner function (that scan receives) should return its outputs and updates following this order: [outputs], [updates], [condition]. One can skip any of the three if not used, but the order has to stay unchanged. * tensor.grad(cost, wrt) will return an object of the "same type" as wrt (list/tuple/TensorVariable). * Currently tensor.grad return a type list when the wrt is a list/tuple of more then 1 element. Sandbox: * MRG random generator now implements the same casting behavior as the regular random generator. Sandbox New features(not enabled by default): * New Linkers (theano flags linker={vm,cvm}) * The new linker allows lazy evaluation of the new ifelse op, meaning we compute only the true or false branch depending of the condition. This can speed up some types of computation. * Uses a new profiling system (that currently tracks less stuff) * The cvm is implemented in C, so it lowers Theano's overhead. * The vm is implemented in python. So it can help debugging in some cases. * In the future, the default will be the cvm. * Some new not yet well tested sparse ops: theano.sparse.sandbox.{SpSum, Diag, SquareDiagonal, ColScaleCSC, RowScaleCSC, Remove0, EnsureSortedIndices, ConvolutionIndices} Documentation: * How to compute the `Jacobian, Hessian, Jacobian times a vector, Hessian times a vector `_. * Slide for a 3 hours class with exercises that was done at the HPCS2011 Conference in Montreal. Others: * Logger name renamed to be consistent. * Logger function simplified and made more consistent. * Fixed transformation of error by other not related error with the compute_test_value Theano flag. * Compilation cache enhancements. * Made compatible with NumPy 1.6 and SciPy 0.9 * Fix tests when there was new dtype in NumPy that is not supported by Theano. * Fixed some tests when SciPy is not available. * Don't compile anything when Theano is imported. Compile support code when we compile the first C code. * Python 2.4 fix: * Fix the file theano/misc/check_blas.py * For python 2.4.4 on Windows, replaced float("inf") with numpy.inf. * Removes useless inputs to a scan node * Beautification mostly, making the graph more visible. Such inputs would appear as a consequence of other optimizations Core: * there is a new mechanism that lets an Op permit that one of its inputs to be aliased to another destroyed input. This will generally result in incorrect calculation, so it should be used with care! The right way to use it is when the caller can guarantee that even if these two inputs look aliased, they actually will never overlap. This mechanism can be used, for example, by a new alternative approach to implementing Scan. If an op has an attribute called "destroyhandler_tolerate_aliased" then this is what's going on. IncSubtensor is thus far the only Op to use this mechanism.Mechanism Download -------- You can download Theano from http://pypi.python.org/pypi/Theano. 
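For readers who have not used Theano before, here is a minimal sketch of the define/compile/evaluate workflow the announcement describes (this assumes the 0.4.x tensor/function API and is an illustration, not text from the release notes):

import theano
import theano.tensor as T

x = T.dvector('x')
y = T.sum(x ** 2)
gy = T.grad(y, x)                  # symbolic differentiation
f = theano.function([x], [y, gy])  # compile the expression graph

print(f([1.0, 2.0, 3.0]))          # [array(14.0), array([ 2.,  4.,  6.])]
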
Description ----------- Theano is a Python library that allows you to define, optimize, and efficiently evaluate mathematical expressions involving multi-dimensional arrays. It is built on top of NumPy. Theano features: * tight integration with NumPy: a similar interface to NumPy's. numpy.ndarrays are also used internally in Theano-compiled functions. * transparent use of a GPU: perform data-intensive computations up to 140x faster than on a CPU (support for float32 only). * efficient symbolic differentiation: Theano can compute derivatives for functions of one or many inputs. * speed and stability optimizations: avoid nasty bugs when computing expressions such as log(1+ exp(x)) for large values of x. * dynamic C code generation: evaluate expressions faster. * extensive unit-testing and self-verification: includes tools for detecting and diagnosing bugs and/or potential problems. Theano has been powering large-scale computationally intensive scientific research since 2007, but it is also approachable enough to be used in the classroom (IFT6266 at the University of Montreal). Resources --------- About Theano: http://deeplearning.net/software/theano/ About NumPy: http://numpy.scipy.org/ About SciPy: http://www.scipy.org/ Machine Learning Tutorial with Theano on Deep Architectures: http://deeplearning.net/tutorial/ Acknowledgments --------------- I would like to thank all contributors of Theano. For this particular release, here is the people that contributed code and/or documentation: (in alphabetical order) Frederic Bastien, James Bergstra, Olivier Delalleau, Xavier Glorot, Ian Goodfellow, Pascal Lamblin, Gr?goire Mesnil, Razvan Pascanu, Ilya Sutskever and David Warde-Farley Also, thank you to all NumPy and Scipy developers as Theano builds on its strength. All questions/comments are always welcome on the Theano mailing-lists ( http://deeplearning.net/software/theano/ ) From ralf.gommers at googlemail.com Sat Aug 13 11:58:41 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 13 Aug 2011 17:58:41 +0200 Subject: [Numpy-discussion] disabling SVN (was: Trouble installing scipy after upgrading to Mac OS X 10.7 aka Lion) Message-ID: On Thu, Aug 11, 2011 at 8:19 PM, Jonathan Guyer wrote: > > On Aug 10, 2011, at 5:16 PM, Ralf Gommers wrote: > > > Ah, with "svn" you actually meant svn:) I thought that was supposed to > not even work anymore. > > It does work and it's confusing. I had not been following the transition > closely and so was under the impression that the svn repository was being > mirrored from git. It's not. It's just old. > > Who can disable SVN access for numpy and scipy? There are still plenty of links to http://svn.scipy.org/svn/numpy/trunk/ floating around that can confuse users. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sat Aug 13 12:14:11 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 13 Aug 2011 18:14:11 +0200 Subject: [Numpy-discussion] [SciPy-User] disabling SVN (was: Trouble installing scipy after upgrading to Mac OS X 10.7 aka Lion) In-Reply-To: References: Message-ID: On Sat, Aug 13, 2011 at 6:00 PM, Ognen Duzlevski wrote: > On Sat, Aug 13, 2011 at 11:58 AM, Ralf Gommers < > ralf.gommers at googlemail.com> wrote: > >> >> >> On Thu, Aug 11, 2011 at 8:19 PM, Jonathan Guyer wrote: >> >>> >>> On Aug 10, 2011, at 5:16 PM, Ralf Gommers wrote: >>> >>> > Ah, with "svn" you actually meant svn:) I thought that was supposed to >>> not even work anymore. 
>>> >>> It does work and it's confusing. I had not been following the transition >>> closely and so was under the impression that the svn repository was being >>> mirrored from git. It's not. It's just old. >>> >>> Who can disable SVN access for numpy and scipy? There are still plenty of >> links to http://svn.scipy.org/svn/numpy/trunk/ floating around that can >> confuse users. >> >> Ralf >> > > Hi Ognen, > Ralf, > > I am the new Enthought sys admin. Is there anything I can do to help? > > We should check if there's still any code in SVN branches that is useful. If so the people who are interested in it should move it somewhere else. Anyone? After that I think you can pull the plug on http://svn.scipy.org/svn/numpy/and http://svn.scipy.org/svn/scipy/. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Aug 13 15:13:25 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 13 Aug 2011 13:13:25 -0600 Subject: [Numpy-discussion] SVD does not converge on "clean" matrix In-Reply-To: <203aa1b32d794c238d32cb8d29036cc2.squirrel@webmail1.telecom-paristech.fr> References: <203aa1b32d794c238d32cb8d29036cc2.squirrel@webmail1.telecom-paristech.fr> Message-ID: On Thu, Aug 11, 2011 at 7:23 AM, wrote: > Hi all, > > I get an error message "numpy.linalg.linalg.LinAlgError: SVD did not > converge" when calling numpy.linalg.svd on a "clean" matrix of size (1952, > 895). The matrix is clean in the sense that it contains no NaN or Inf > values. The corresponding npz file is available here: > > https://docs.google.com/leaf?id=0Bw0NXKxxc40jMWEyNTljMWUtMzBmNS00NGZmLThhZWUtY2I2MWU2MGZiNDgx&hl=fr > > Here is some information about my setup: I use Python 2.7.1 on Ubuntu > 11.04 with numpy 1.6.1. Furthermore, I thought the problem might be solved > by recompiling numpy with my local ATLAS library (version 3.8.3), and this > didn't seem to help. On another machine with Python 2.7.1 and numpy 1.5.1 > the SVD does converge however it contains 1 NaN singular value and 3 > negative singular values of the order -10^-1 (singular values should > always be non-negative). > > I also tried computing the SVD of the matrix using Octave 3.2.4 and Matlab > 7.10.0.499 (R2010a) 64-bit (glnxa64) and there were no problems. Any help > is greatly appreciated. > > Thanks in advance, > Charanpal > > > Fails here also, fedora 15 64 bits AMD 940. There should be a maximum iterations argument somewhere... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Aug 13 15:42:19 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 13 Aug 2011 13:42:19 -0600 Subject: [Numpy-discussion] bug with latest numpy git snapshot build with Python3 In-Reply-To: References: Message-ID: 2011/8/11 Dmitrey > bug in KUBUNTU 11.04, latest numpy git snapshot build with Python3 > >>> import numpy > Traceback (most recent call last): > File "", line 1, in > File "/usr/local/lib/python3.2/dist-packages/numpy/__init__.py", line > 137, in > from . 
import add_newdocs > File "/usr/local/lib/python3.2/dist-packages/numpy/add_newdocs.py", line > 9, in > from numpy.lib import add_newdoc > File "/usr/local/lib/python3.2/dist-packages/numpy/lib/__init__.py", line > 4, in > from .type_check import * > File "/usr/local/lib/python3.2/dist-packages/numpy/lib/type_check.py", > line 8, in > import numpy.core.numeric as _nx > File "/usr/local/lib/python3.2/dist-packages/numpy/core/__init__.py", > line 10, in > from .numeric import > * > File "/usr/local/lib/python3.2/dist-packages/numpy/core/numeric.py", line > 27, in > import > multiarray > ImportError: No module named multiarray > > I don't see this. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sat Aug 13 15:45:23 2011 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 13 Aug 2011 19:45:23 +0000 (UTC) Subject: [Numpy-discussion] [SciPy-User] disabling SVN (was: Trouble installing scipy after upgrading to Mac OS X 10.7 aka Lion) References: Message-ID: Sat, 13 Aug 2011 18:14:11 +0200, Ralf Gommers wrote: [clip] > We should check if there's still any code in SVN branches that is > useful. > If so the people who are interested in it should move it somewhere else. > Anyone? All the SVN branches are available in Git, though some are hidden. Do git fetch upstream +refs/*:refs/remotes/upstream/everything/* and you shall receive (also some Github's internal branches named pull/*). However, AFAIK, there's not so much useful in there. In any case, as far as I'm aware, the SVN can be safely be turned off, both for Numpy and Scipy. The admins can access the original repository on the server, so if something turns out to be missed, it can be brought back. Pauli From paul.anton.letnes at gmail.com Sat Aug 13 16:00:57 2011 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Sat, 13 Aug 2011 21:00:57 +0100 Subject: [Numpy-discussion] bug with latest numpy git snapshot build with Python3 In-Reply-To: References: Message-ID: On 13. aug. 2011, at 20.42, Charles R Harris wrote: > > > 2011/8/11 Dmitrey > bug in KUBUNTU 11.04, latest numpy git snapshot build with Python3 > >>> import numpy > Traceback (most recent call last): > File "", line 1, in > File "/usr/local/lib/python3.2/dist-packages/numpy/__init__.py", line 137, in > from . import add_newdocs > File "/usr/local/lib/python3.2/dist-packages/numpy/add_newdocs.py", line 9, in > from numpy.lib import add_newdoc > File "/usr/local/lib/python3.2/dist-packages/numpy/lib/__init__.py", line 4, in > from .type_check import * > File "/usr/local/lib/python3.2/dist-packages/numpy/lib/type_check.py", line 8, in > import numpy.core.numeric as _nx > File "/usr/local/lib/python3.2/dist-packages/numpy/core/__init__.py", line 10, in > from .numeric import * > File "/usr/local/lib/python3.2/dist-packages/numpy/core/numeric.py", line 27, in > import multiarray > ImportError: No module named multiarray > > > I don't see this. > > Chuck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Mac OS X 10.6.8, python3.2, I don't see this either. "import multiarray" does not work, but "import numpy" works beautifully. 
Paul From mwwiebe at gmail.com Sat Aug 13 18:00:41 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 13 Aug 2011 15:00:41 -0700 Subject: [Numpy-discussion] nditer confusion In-Reply-To: References: Message-ID: I've made a pull request with some fairly extensive introductory material. It's available here: https://github.com/numpy/numpy/pull/138 It walks through nditer usage starting with basic iteration of one array, through broadcasting and iterator-allocated outputs, and finally covers accelerating the inner loop with Cython. Please read and review! Thanks, Mark On Fri, Aug 12, 2011 at 7:30 AM, Neal Becker wrote: > There'a a boatload of options for nditer. I need a simple explanation, > maybe a > few simple examples. Is there anything that might help? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sat Aug 13 20:06:59 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 13 Aug 2011 17:06:59 -0700 Subject: [Numpy-discussion] Questionable reduceat behavior In-Reply-To: References: Message-ID: Looks like this is the second-oldest open bug in the bug tracker. http://projects.scipy.org/numpy/ticket/236 For what it's worth, I'm in favour of changing this behavior to be more consistent as proposed in that ticket. -Mark On Thu, Aug 11, 2011 at 11:25 AM, Wes McKinney wrote: > I'm a little perplexed why reduceat was made to behave like this: > > In [26]: arr = np.ones((10, 4), dtype=bool) > > In [27]: arr > Out[27]: > array([[ True, True, True, True], > [ True, True, True, True], > [ True, True, True, True], > [ True, True, True, True], > [ True, True, True, True], > [ True, True, True, True], > [ True, True, True, True], > [ True, True, True, True], > [ True, True, True, True], > [ True, True, True, True]], dtype=bool) > > > In [30]: np.add.reduceat(arr, [0, 3, 3, 7, 9], axis=0) > Out[30]: > array([[3, 3, 3, 3], > [1, 1, 1, 1], > [4, 4, 4, 4], > [2, 2, 2, 2], > [1, 1, 1, 1]]) > > this does not seem intuitively correct. Since we have: > > In [33]: arr[3:3].sum(0) > Out[33]: array([0, 0, 0, 0]) > > I would expect > > array([[3, 3, 3, 3], > [0, 0, 0, 0], > [4, 4, 4, 4], > [2, 2, 2, 2], > [1, 1, 1, 1]]) > > Obviously I can RTFM and see why it does this ("if ``indices[i] >= > indices[i + 1]``, the i-th generalized "row" is simply > ``a[indices[i]]``"), but it doesn't make much sense to me, and I need > work around it. Suggestions? > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sat Aug 13 20:17:32 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 13 Aug 2011 17:17:32 -0700 Subject: [Numpy-discussion] bug with assignment into an indexed array? In-Reply-To: References: Message-ID: On Thu, Aug 11, 2011 at 1:37 PM, Benjamin Root wrote: > On Thu, Aug 11, 2011 at 10:33 AM, Olivier Delalleau wrote: > >> 2011/8/11 Benjamin Root >> >>> >>> >>> On Thu, Aug 11, 2011 at 8:37 AM, Olivier Delalleau wrote: >>> >>>> Maybe confusing, but working as expected. >>>> >>>> >>>> When you write: >>>> matched_to[np.array([0, 1, 2])] = 3 >>>> it calls __setitem__ on matched_to, with arguments (np.array([0, 1, 2]), >>>> 3). 
So numpy understand you want to write 3 at these indices. >>>> >>>> >>>> When you write: >>>> matched_to[:3][match] = 3 >>>> it first calls __getitem__ with the slice as argument, which returns a >>>> view of your array, then it calls __setitem__ on this view, and it fills >>>> your matched_to array at the same time. >>>> >>>> >>>> But when you write: >>>> matched_to[np.array([0, 1, 2])][match] = 3 >>>> it first calls __getitem__ with the array as argument, which retunrs a >>>> *copy* of your array, so that calling __setitem__ on this copy has no effect >>>> on your original array. >>>> >>>> -=- Olivier >>>> >>>> >>> Right, but I guess my question is does it *have* to be that way? I guess >>> it makes some sense with respect to indexing with a numpy array like I did >>> with the last example, because an element could be referred to multiple >>> times (which explains the common surprise with '+='), but with boolean >>> indexing, we are guaranteed that each element of the view will appear at >>> most once. Therefore, shouldn't boolean indexing always return a view, not >>> a copy? Is the general case of arbitrary array selection inherently >>> impossible to encode in a view versus a slice with a regular spacing? >>> >> >> Yes, due to the fact the array interface only supports regular spacing >> (otherwise it is more difficult to get efficient access to arbitrary array >> positions). >> >> -=- Olivier >> >> > This still bothers me, though. I imagine that it is next to impossible to > detect this situation from numpy's perspective, so it can't even emit a > warning or error. Furthermore, for someone who makes a general function to > modify the contents of some externally provided array, there is a > possibility that the provided array is actually a copy not a view. > Although, I guess it is the responsibility of the user to know the > difference. > > I guess that is the key problem. The key advantage we are taught about > numpy arrays is the use of views for efficient access. It would seem that > most access operations would use it, but in reality, only sliced access do. > Everything else is a copy (unless you are doing fancy indexing with > assignment). Maybe with some of the forthcoming changes that have been done > with respect to nditer and ufuncs (in particular, I am thinking of the > "where" kwarg), maybe we could consider an enhancement allowing fancy > indexing (or at least boolean indexing) to produce a view? Even if it is > less efficient than a view from slicing, it would bring better consistency > in behavior between the different forms of indexing. > > Just my 2 cents, > Ben Root > I think it would be nice to evolve the NumPy indexing and array representation towards the goal of indexing returning a view in all cases with no exceptions. This would provide a much nicer mental model to program with. Accomplishing such a transition will take a fair bit of time, though. -Mark > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
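To make the view-versus-copy distinction in the thread above concrete, a small sketch reusing the matched_to/match names from earlier messages (the printed values are what plain NumPy indexing rules predict):

import numpy as np

matched_to = np.zeros(5)
match = np.array([True, False, True])

# A slice returns a view, so assigning through it writes back.
matched_to[:3][match] = 3
print(matched_to)                  # [ 3.  0.  3.  0.  0.]

matched_to = np.zeros(5)
# Fancy indexing returns a copy, so this assignment is silently lost.
matched_to[np.array([0, 1, 2])][match] = 3
print(matched_to)                  # [ 0.  0.  0.  0.  0.]

# Folding both selections into a single __setitem__ call works again.
matched_to[np.array([0, 1, 2])[match]] = 3
print(matched_to)                  # [ 3.  0.  3.  0.  0.]
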
URL: From josef.pktd at gmail.com Sat Aug 13 22:00:33 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 13 Aug 2011 22:00:33 -0400 Subject: [Numpy-discussion] [SciPy-User] disabling SVN (was: Trouble installing scipy after upgrading to Mac OS X 10.7 aka Lion) In-Reply-To: References: Message-ID: On Sat, Aug 13, 2011 at 3:45 PM, Pauli Virtanen wrote: > Sat, 13 Aug 2011 18:14:11 +0200, Ralf Gommers wrote: > [clip] >> We should check if there's still any code in SVN branches that is >> useful. >> If so the people who are interested in it should move it somewhere else. >> Anyone? > > All the SVN branches are available in Git, though some are hidden. Do > > ? ? ? ?git fetch upstream +refs/*:refs/remotes/upstream/everything/* > > and you shall receive (also some Github's internal branches named pull/*). > However, AFAIK, there's not so much useful in there. > > In any case, as far as I'm aware, the SVN can be safely be turned off, > both for Numpy and Scipy. The admins can access the original repository > on the server, so if something turns out to be missed, it can be brought > back. > > ? ? ? ?Pauli Does Trac require svn access to dig out old information? for example links to old changesets, annotate/blame, ... ? Josef > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From dhanjal at telecom-paristech.fr Sun Aug 14 08:22:07 2011 From: dhanjal at telecom-paristech.fr (Charanpal Dhanjal) Date: Sun, 14 Aug 2011 14:22:07 +0200 Subject: [Numpy-discussion] SVD does not converge on "clean" matrix In-Reply-To: References: <203aa1b32d794c238d32cb8d29036cc2.squirrel@webmail1.telecom-paristech.fr> Message-ID: I had a quick look at the code (https://github.com/numpy/numpy/blob/master/numpy/linalg/linalg.py) and the numpy.linalg.svd function calls lapack_lite.dgesdd (for real matrices) so I guess the non-convergence occurs in this function. As I understood lapack_lite is used by default unless numpy is installed with ATLAS/MKL etc. I wonder why svd works for Nadav and not for anyone else? Any ideas anyone? Charanpal On Sat, 13 Aug 2011 13:13:25 -0600, Charles R Harris wrote: > On Thu, Aug 11, 2011 at 7:23 AM, wrote: > >> Hi all, >> >> I get an error message "numpy.linalg.linalg.LinAlgError: SVD did >> not >> converge" when calling numpy.linalg.svd on a "clean" matrix of size >> (1952, >> 895). The matrix is clean in the sense that it contains no NaN or >> Inf >> values. The corresponding npz file is available here: >> > > https://docs.google.com/leaf?id=0Bw0NXKxxc40jMWEyNTljMWUtMzBmNS00NGZmLThhZWUtY2I2MWU2MGZiNDgx&hl=fr >> [1] >> >> Here is some information about my setup: I use Python 2.7.1 on >> Ubuntu >> 11.04 with numpy 1.6.1. Furthermore, I thought the problem might be >> solved >> by recompiling numpy with my local ATLAS library (version 3.8.3), >> and this >> didnt seem to help. On another machine with Python 2.7.1 and numpy >> 1.5.1 >> the SVD does converge however it contains 1 NaN singular value and >> 3 >> negative singular values of the order -10^-1 (singular values >> should >> always be non-negative). >> >> I also tried computing the SVD of the matrix using Octave 3.2.4 and >> Matlab >> 7.10.0.499 (R2010a) 64-bit (glnxa64) and there were no problems. >> Any help >> is greatly appreciated. >> >> Thanks in advance, >> Charanpal > > Fails here also, fedora 15 64 bits AMD 940. There should be a maximum > iterations argument somewhere... 
> > Chuck > > > > Links: > ------ > [1] > > https://docs.google.com/leaf?id=0Bw0NXKxxc40jMWEyNTljMWUtMzBmNS00NGZmLThhZWUtY2I2MWU2MGZiNDgx|+|amp|+|hl=fr > [2] mailto:dhanjal at telecom-paristech.fr From lou_boog2000 at yahoo.com Sun Aug 14 10:27:06 2011 From: lou_boog2000 at yahoo.com (Lou Pecora) Date: Sun, 14 Aug 2011 07:27:06 -0700 (PDT) Subject: [Numpy-discussion] SVD does not converge on "clean" matrix In-Reply-To: References: <203aa1b32d794c238d32cb8d29036cc2.squirrel@webmail1.telecom-paristech.fr> Message-ID: <1313332026.61861.YahooMailNeo@web34404.mail.mud.yahoo.com> Chuck wrote: ________________________________ Fails here also, fedora 15 64 bits AMD 940. There should be a maximum iterations argument somewhere... Chuck --------------------------------------------------- ?? *** ?Here's the "FIX": Chuck is right. ?There is a max iterations. ?Here is a reply from a thread of mine in this group several years ago about this problem and some comments that might help you. ---- From Mr.?Damian Menscher who was kind enough to find the iteration location and provide some insight: Ok, so after several hours of trying to read that code, I found the parameter that needs to be tuned. ?In case anyone has this problem and finds this thread a year from now, here's your hint: File: Src/dlapack_lite.c Subroutine: dlasd4_ Line: 22562 There's a for loop there that limits the number of iterations to 20. ?Increasing this value to 50 allows my matrix to converge. I have not bothered to test what the "best" value for this number is, though. ?In any case, it appears the number just exists to prevent infinite loops, and 50 isn't really that much closer to infinity than 20.... ?(Actually, I'm just going to set it to 100 so I don't have to think about it ever again.) Damian Menscher --? -=#| Physics Grad Student & SysAdmin @ U Illinois Urbana-Champaign |#=- -=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 Ofc:(217)333-0038 |#=- -=#| 1412 DCL, Workstation Services Group, CITES Ofc:(217)244-3862 |#=- -=#| www.uiuc.edu/~menscher/ Fax:(217)333-9819 |#=- ---- My reply and a "fix" of sorts without changing the hard coded iteration max: I have looked in Src/dlapack_lite.c and line 22562 is no longer a line that sets a max. iterations parameter. ?There are several set in the file, but that code is hard to figure (sort of a Fortran-in-C hybrid). ? Here's one, for example: ?? ?maxit = *n * 6 * *n; ? // Line 887 I have no idea which parameter to tweak. ?Apparently this error is still in numpy (at least to my version). Does anyone have a fix? ?Should I start a ticket (I think this is what people do)? ?Any help appreciated. I'm using a Mac Book Pro (Intel chip), system 10.4.11, Python 2.4.4. ============ Possible try/except ===========================? # ?A is the original matrix try: ?? ?U,W,VT=linalg.svd(A) except linalg.linalg.LinAlgError: ?# "Square" the matrix and do SVD ?? ?print "Got svd except, trying square of A." ?? ?A2=dot(conj(A.T),A) ?? ?U,W2,VT=linalg.svd(A2) This works so far. --------------------------------------------------------------------------------------- I've been using that simple "fix" of "squaring" the original matrix for several years and it's worked every time. ?I'm not sure why. ?It was just a test and it worked. ? You could also change the underlying C or Fortran code, but you then have to recompile everything in numpy. ?I wasn't that brave. -- Lou Pecora, my views are my own. -------------- next part -------------- An HTML attachment was scrubbed... 
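One caveat about the "squaring" workaround quoted above: the SVD of conj(A.T) dot A only yields the right singular vectors and the squares of the singular values, so U still has to be rebuilt from A, and squaring the matrix also squares its condition number. A sketch of one way to package the fallback (an illustration, not code from the thread):

import numpy as np

def svd_with_gram_fallback(A):
    try:
        return np.linalg.svd(A, full_matrices=False)
    except np.linalg.LinAlgError:
        # Hermitian problem: conj(A.T) A = V diag(s**2) conj(V.T)
        A2 = np.dot(np.conj(A.T), A)
        _, w2, vt = np.linalg.svd(A2)
        s = np.sqrt(w2)
        # Drop singular values that are numerically zero before dividing.
        keep = s > s[0] * np.finfo(np.float64).eps * max(A.shape)
        s, vt = s[keep], vt[keep]
        u = np.dot(A, vt.conj().T) / s   # columns are A v_i / s_i
        return u, s, vt
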
URL: From torgil.svensson at gmail.com Sun Aug 14 11:31:24 2011 From: torgil.svensson at gmail.com (Torgil Svensson) Date: Sun, 14 Aug 2011 17:31:24 +0200 Subject: [Numpy-discussion] Efficient way to load a 1Gb file? In-Reply-To: References: Message-ID: Try the fromiter function, that will allow you to pass an iterator which can read the file line by line and not preload the whole file. file_iterator = iter(open('filename.txt') line_parser = lambda x: map(float,x.split('\t')) a=np.fromiter(itertools.imap(line_parser,file_iterator),dtype=float) You have also the option to iterate the file twice and pass the "count" argument. //Torgil On Wed, Aug 10, 2011 at 7:22 PM, Russell E. Owen wrote: > A coworker is trying to load a 1Gb text data file into a numpy array > using numpy.loadtxt, but he says it is using up all of his machine's 6Gb > of RAM. Is there a more efficient way to read such text data files? > > -- Russell > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From wesmckinn at gmail.com Sun Aug 14 11:58:30 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Sun, 14 Aug 2011 11:58:30 -0400 Subject: [Numpy-discussion] Questionable reduceat behavior In-Reply-To: References: Message-ID: On Sat, Aug 13, 2011 at 8:06 PM, Mark Wiebe wrote: > Looks like this is the second-oldest open bug in the bug tracker. > http://projects.scipy.org/numpy/ticket/236 > For what it's worth, I'm in favour of changing this behavior to be more > consistent as proposed in that ticket. > -Mark > > On Thu, Aug 11, 2011 at 11:25 AM, Wes McKinney wrote: >> >> I'm a little perplexed why reduceat was made to behave like this: >> >> In [26]: arr = np.ones((10, 4), dtype=bool) >> >> In [27]: arr >> Out[27]: >> array([[ True, ?True, ?True, ?True], >> ? ? ? [ True, ?True, ?True, ?True], >> ? ? ? [ True, ?True, ?True, ?True], >> ? ? ? [ True, ?True, ?True, ?True], >> ? ? ? [ True, ?True, ?True, ?True], >> ? ? ? [ True, ?True, ?True, ?True], >> ? ? ? [ True, ?True, ?True, ?True], >> ? ? ? [ True, ?True, ?True, ?True], >> ? ? ? [ True, ?True, ?True, ?True], >> ? ? ? [ True, ?True, ?True, ?True]], dtype=bool) >> >> >> In [30]: np.add.reduceat(arr, [0, 3, 3, 7, 9], axis=0) >> Out[30]: >> array([[3, 3, 3, 3], >> ? ? ? [1, 1, 1, 1], >> ? ? ? [4, 4, 4, 4], >> ? ? ? [2, 2, 2, 2], >> ? ? ? [1, 1, 1, 1]]) >> >> this does not seem intuitively correct. Since we have: >> >> In [33]: arr[3:3].sum(0) >> Out[33]: array([0, 0, 0, 0]) >> >> I would expect >> >> array([[3, 3, 3, 3], >> ? ? ? [0, 0, 0, 0], >> ? ? ? [4, 4, 4, 4], >> ? ? ? [2, 2, 2, 2], >> ? ? ? [1, 1, 1, 1]]) >> >> Obviously I can RTFM and see why it does this ("if ``indices[i] >= >> indices[i + 1]``, the i-th generalized "row" is simply >> ``a[indices[i]]``"), but it doesn't make much sense to me, and I need >> work around it. Suggestions? >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > Well, I certainly hope it doesn't get forgotten about for another 5 years. I think having more consistent behavior would be better rather than conforming to a seemingly arbitrary decision made ages ago in Numeric. 
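A possible workaround for the empty-segment behaviour, pending any change to reduceat itself (a sketch assuming segments defined by `indices` along axis 0, as in the example above):

import numpy as np

arr = np.ones((10, 4), dtype=bool)
indices = np.array([0, 3, 3, 7, 9])

out = np.add.reduceat(arr, indices, axis=0)

# Segment lengths; the last segment runs to the end of the axis.
lengths = np.diff(np.append(indices, arr.shape[0]))
# Empty segments currently hold arr[indices[i]]; overwrite them with the
# identity of the reduction (0 for add).
out[lengths <= 0] = 0

print(out)
# [[3 3 3 3]
#  [0 0 0 0]
#  [4 4 4 4]
#  [2 2 2 2]
#  [1 1 1 1]]
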
- Wes From alan at ajackson.org Sun Aug 14 13:43:06 2011 From: alan at ajackson.org (alan at ajackson.org) Date: Sun, 14 Aug 2011 12:43:06 -0500 Subject: [Numpy-discussion] help translating matlab to numpy Message-ID: <20110814124306.51a1aba1@ajackson.org> I'm translating some code from Matlab to numpy, and struggling a bit since I have very little knowledge of Matlab. My question is this - the arg function in Matlab (which seems to be deprecated, they don't show it in their current documentation) is exactly equivalent to what in Numpy? I know it is angle(x, deg=1) to get degrees instead of radians, but what is the output range for the Matlab function -pi to pi, or 0 to 2*pi ? Thanks! -- ----------------------------------------------------------------------- | Alan K. Jackson | To see a World in a Grain of Sand | | alan at ajackson.org | And a Heaven in a Wild Flower, | | www.ajackson.org | Hold Infinity in the palm of your hand | | Houston, Texas | And Eternity in an hour. - Blake | ----------------------------------------------------------------------- From silva at lma.cnrs-mrs.fr Sun Aug 14 13:55:12 2011 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Sun, 14 Aug 2011 19:55:12 +0200 Subject: [Numpy-discussion] help translating matlab to numpy In-Reply-To: <20110814124306.51a1aba1@ajackson.org> References: <20110814124306.51a1aba1@ajackson.org> Message-ID: <1313344512.2432.1.camel@amilo.coursju> Le dimanche 14 ao?t 2011 ? 12:43 -0500, alan at ajackson.org a ?crit : > I'm translating some code from Matlab to numpy, and struggling a bit > since I have very little knowledge of Matlab. > > My question is this - the arg function in Matlab (which seems to be deprecated, > they don't show it in their current documentation) is exactly equivalent to > what in Numpy? I know it is angle(x, deg=1) to get degrees instead of radians, > but what is the output range for the Matlab function -pi to pi, or 0 to 2*pi ? Can you tell from which toolbox your arg function comes from ? Using help (or which ?) for instance... It could help! -- Fabrice From alan at ajackson.org Sun Aug 14 13:56:43 2011 From: alan at ajackson.org (alan at ajackson.org) Date: Sun, 14 Aug 2011 12:56:43 -0500 Subject: [Numpy-discussion] help translating matlab to numpy In-Reply-To: <20110814124306.51a1aba1@ajackson.org> References: <20110814124306.51a1aba1@ajackson.org> Message-ID: <20110814125643.2f9311d4@ajackson.org> Never mind, I've been digging through too much stuff and got confused... I think trying to read Matlab code can do that to you. 8-) >I'm translating some code from Matlab to numpy, and struggling a bit >since I have very little knowledge of Matlab. > >My question is this - the arg function in Matlab (which seems to be deprecated, >they don't show it in their current documentation) is exactly equivalent to >what in Numpy? I know it is angle(x, deg=1) to get degrees instead of radians, >but what is the output range for the Matlab function -pi to pi, or 0 to 2*pi ? > >Thanks! > >-- >----------------------------------------------------------------------- >| Alan K. Jackson | To see a World in a Grain of Sand | >| alan at ajackson.org | And a Heaven in a Wild Flower, | >| www.ajackson.org | Hold Infinity in the palm of your hand | >| Houston, Texas | And Eternity in an hour. 
- Blake | >----------------------------------------------------------------------- >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion -- ----------------------------------------------------------------------- | Alan K. Jackson | To see a World in a Grain of Sand | | alan at ajackson.org | And a Heaven in a Wild Flower, | | www.ajackson.org | Hold Infinity in the palm of your hand | | Houston, Texas | And Eternity in an hour. - Blake | ----------------------------------------------------------------------- From dhanjal at telecom-paristech.fr Sun Aug 14 15:15:35 2011 From: dhanjal at telecom-paristech.fr (Charanpal Dhanjal) Date: Sun, 14 Aug 2011 21:15:35 +0200 Subject: [Numpy-discussion] SVD does not converge on "clean" matrix In-Reply-To: <1313332026.61861.YahooMailNeo@web34404.mail.mud.yahoo.com> References: <203aa1b32d794c238d32cb8d29036cc2.squirrel@webmail1.telecom-paristech.fr> <1313332026.61861.YahooMailNeo@web34404.mail.mud.yahoo.com> Message-ID: <6d45c5e06b9e78cd9f56cf3ff2d604a5@telecom-paristech.fr> Thanks very much Lou for the information. I tried delving into the C code and found a line in the dlasd4_ routine which reads: for (niter = iter; niter <= MAXITERLOOPS; ++niter) { This is apparently the main loop for this subroutine and the value of MAXITERLOOPS = 100. All I did was increase the maximum number of iterations to 200, and this seemed to solve the problem for the matrix in question. Let this matrix be called A, then >>> P0, o0, Q0 = numpy.linalg.svd(A, full_matrices=False) >>> numpy.linalg.norm((P0*o0).dot(Q0)- A) 1.8558089412794851 >>> numpy.linalg.norm(A) 4.558649005154054 >>> A.shape (1952, 895) It seems A has quite a small norm given its dimension, and perhaps this explains the error in the SVD (the numpy.linalg.norm((P0*o0).dot(Q0)- A) bit). To investigate a little further I tried finding the SVD of A*1000: >>> P0, o0, Q0 = numpy.linalg.svd(A*1000, full_matrices=False) >>> numpy.isfinite(Q0).all() False >>> numpy.isfinite(P0).all() False >>> numpy.isfinite(o0).all() False and hence increasing the number of iterations does not solve the problem in this case. That was about as far as I felt I could go with investigating the C code. In the meanwhile I will try the squaring the matrix solution. Incidentally, I am confused as to why numpy calls the lapack lite routines - when I call numpy.show_config() it seems to have detected my ATLAS libraries and I would have expected it to use those. Charanpal On Sun, 14 Aug 2011 07:27:06 -0700 (PDT), Lou Pecora wrote: > Chuck wrote: > > ------------------------- > > Fails here also, fedora 15 64 bits AMD 940. There should be a maximum > iterations argument somewhere... > > Chuck > > --------------------------------------------------- > > *** Here's the "FIX": > > Chuck is right. There is a max iterations. Here is a reply from a > thread of mine in this group several years ago about this problem and > some comments that might help you. > > ---- From Mr. Damian Menscher who was kind enough to find the > iteration location and provide some insight: > > Ok, so after several hours of trying to read that code, I found > the parameter that needs to be tuned. In case anyone has this > problem and finds this thread a year from now, here's your hint: > > File: Src/dlapack_lite.c > Subroutine: dlasd4_ > Line: 22562 > > There's a for loop there that limits the number of iterations to > 20. 
Increasing this value to 50 allows my matrix to converge. > I have not bothered to test what the "best" value for this number > is, though. In any case, it appears the number just exists to > prevent infinite loops, and 50 isn't really that much closer to > infinity than 20.... (Actually, I'm just going to set it to 100 > so I don't have to think about it ever again.) > > Damian Menscher > -- > -=#| Physics Grad Student & SysAdmin @ U Illinois Urbana-Champaign > |#=- > -=#| 488 LLP, 1110 W. Green St, Urbana, IL 61801 Ofc:(217)333-0038 > |#=- > -=#| 1412 DCL, Workstation Services Group, CITES Ofc:(217)244-3862 > |#=- > -=#| www.uiuc.edu/~menscher/ Fax:(217)333-9819 |#=- > > ---- My reply and a "fix" of sorts without changing the hard coded > iteration max: > > I have looked in Src/dlapack_lite.c and line 22562 is no longer a > line > that sets a max. iterations parameter. There are several set in the > file, but that code is hard to figure (sort of a Fortran-in-C > hybrid). > > > Here's one, for example: > > maxit = *n * 6 * *n; // Line 887 > > I have no idea which parameter to tweak. Apparently this error is > still in numpy (at least to my version). Does anyone have a fix? > Should I start a ticket (I think this is what people do)? Any help > appreciated. > > I'm using a Mac Book Pro (Intel chip), system 10.4.11, Python 2.4.4. > > ============ Possible try/except =========================== > > # A is the original matrix > try: > U,W,VT=linalg.svd(A) > except linalg.linalg.LinAlgError: # "Square" the matrix and do SVD > > print "Got svd except, trying square of A." > A2=dot(conj(A.T),A) > U,W2,VT=linalg.svd(A2) > > This works so far. > > > --------------------------------------------------------------------------------------- > > I've been using that simple "fix" of "squaring" the original matrix > for several years and it's worked every time. I'm not sure why. It > was > just a test and it worked. > > You could also change the underlying C or Fortran code, but you then > have to recompile everything in numpy. I wasn't that brave. > > -- Lou Pecora, my views are my own. From brennan.williams at visualreservoir.com Sun Aug 14 18:59:27 2011 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Mon, 15 Aug 2011 10:59:27 +1200 Subject: [Numpy-discussion] Statistical distributions on samples In-Reply-To: References: Message-ID: <4E48534F.8010008@visualreservoir.com> You can use scipy.stats.truncnorm, can't you? Unless I misread, you want to sample a normal distribution but with generated values only being within a specified range? However you also say you want to do this with triangular and log normal and for these I presume the easiest way is to sample and then accept/reject. Brennan On 13/08/2011 2:53 a.m., Christopher Jordan-Squire wrote: > Hi Andrea--An easy way to get something like this would be > > import numpy as np > import scipy.stats as stats > > sigma = #some reasonable standard deviation for your application > x = stats.norm.rvs(size=1000, loc=125, scale=sigma) > x = x[x>50] > x = x[x<200] > > That will give a roughly normal distribution to your velocities, as > long as, say, sigma<25. (I'm using the rule of thumb for the normal > distribution that normal random samples lie 3 standard deviations away > from the mean about 1 out of 350 times.) Though you won't be able to > get exactly normal errors about your mean since normal random samples > can theoretically be of any size. 
> > You can use this same process for any other distribution, as long as > you've chosen a scale variable so that the probability of samples > being outside your desired interval is really small. Of course, once > again your random errors won't be exactly from the distribution you > get your original samples from. > > -Chris JS > > On Fri, Aug 12, 2011 at 8:32 AM, Andrea Gavana > > wrote: > > Hi All, > > I am working on something that appeared to be a no-brainer > issue (at the beginning), by my complete ignorance in statistics > is overwhelming and I got stuck. > > What I am trying to do can be summarized as follows > > Let's assume that I have to generate a sample of a 1,000 values > for a variable (let's say, "velocity") using a normal distribution > (but later I will have to do it with log-normal, triangular and a > couple of others). The only thing I know about this velocity > sample is the minimum and maximum values (let's say 50 and 200 > respectively) and, obviously for the normal distribution (but not > so for the other distributions), the mean value (125 in this case). > > Now, I would like to generate this sample of 1,000 points, in > which none of the point has velocity smaller than 50 or bigger > than 200, and the number of samples close to the mean (125) should > be higher than the number of samples close to the minimum and the > maximum, following some kind of normal distribution. > > What I have tried up to now is summarized in the code below, but > as you can easily see, I don't really know what I am doing. I am > open to every suggestion, and I apologize for the dumbness of my > question. > > import numpy > > from scipy import stats > import matplotlib.pyplot as plt > > minval, maxval = 50.0, 250.0 > x = numpy.linspace(minval, maxval, 500) > > samp = stats.norm.rvs(size=len(x)) > pdf = stats.norm.pdf(x) > cdf = stats.norm.cdf(x) > ppf = stats.norm.ppf(x) > > ax1 = plt.subplot(2, 2, 1) > ax1.plot(range(len(x)), samp) > > ax2 = plt.subplot(2, 2, 2) > ax2.plot(x, pdf) > > ax3 = plt.subplot(2, 2, 3) > ax3.plot(x, cdf) > > ax4 = plt.subplot(2, 2, 4) > ax4.plot(x, ppf) > > plt.show() > > > Andrea. > > "Imagination Is The Only Weapon In The War Against Reality." > http://xoomer.alice.it/infinity77/ > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From aronne.merrelli at gmail.com Sun Aug 14 19:51:10 2011 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Sun, 14 Aug 2011 18:51:10 -0500 Subject: [Numpy-discussion] __array_wrap__ / __array_finalize__ in NumPy v1.4+ Message-ID: Hello, I'm attempting to implement a subclass of ndarray, and becoming confused about the way __array_wrap__ and __array_finalize__ operate. I boiled it down to a short subclass, which is the example on the website at http://docs.scipy.org/doc/numpy-1.6.0/user/basics.subclassing.html, with one added attribute that is a copy of the self array multiplied by 2. The doubled copy is stored in a "plain" ndarray. The attachment has the python code. The output below is from NumPy 1.3 and 1.6 (1.4 has the same output as 1.6). The output from 1.3 matches the documentation on the website. 
In 1.6, __array_wrap__ and __array_finalize__ are invoked in the opposite order, __array_finalize__ appears to be getting an "empty" array, and array_wrap's argument is no longer an ndarray but rather an instance of the subclass. This doesn't match the documentation so I am not sure if this is the correct behavior in newer NumPy. Is this a bug, or the expected behavior in newer NumPy versions? Am I just missing something simple? The actual code I am trying to write uses essentially the same idea - keeping another array, related to the self array through some calculation, as another object attribute. Is there a better way to accomplish this? Thanks, Aronne NumPy version: 1.3.0 object creation In __array_finalize__: self type , values TestClass([0, 1]) obj type , values array([0, 1]) object + ndarray In __array_wrap__: self type , values TestClass([0, 1]) arr type , values array([2, 3]) In __array_finalize__: self type , values TestClass([2, 3]) obj type , values TestClass([0, 1]) obj= [0 1] [0 2] ret= [2 3] [4 6] NumPy version: 1.6.0 object creation In __array_finalize__: self type , values TestClass([0, 1]) obj type , values array([0, 1]) object + ndarray In __array_finalize__: self type , values TestClass([ 15, 22033837]) obj type , values TestClass([0, 1]) In __array_wrap__: self type , values TestClass([0, 1]) arr type , values TestClass([2, 3]) obj= [0 1] [0 2] ret= [2 3] [ 30 44067674] -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: array_wrap_test.py Type: application/octet-stream Size: 1033 bytes Desc: not available URL: From andrea.gavana at gmail.com Mon Aug 15 05:10:37 2011 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Mon, 15 Aug 2011 11:10:37 +0200 Subject: [Numpy-discussion] Statistical distributions on samples In-Reply-To: <4E48534F.8010008@visualreservoir.com> References: <4E48534F.8010008@visualreservoir.com> Message-ID: Hi Chris & Brennan, On 15 August 2011 00:59, Brennan Williams wrote: > You can use scipy.stats.truncnorm, can't you? Unless I misread, you want to > sample a normal distribution but with generated values only being within a > specified range? However you also say you want to do this with triangular > and log normal and for these I presume the easiest way is to sample and then > accept/reject. > > Brennan > > On 13/08/2011 2:53 a.m., Christopher Jordan-Squire wrote: > > Hi Andrea--An easy way to get something like this would be > > import numpy as np > import scipy.stats as stats > > sigma = #some reasonable standard deviation for your application > x = stats.norm.rvs(size=1000, loc=125, scale=sigma) > x = x[x>50] > x = x[x<200] > > That will give a roughly normal distribution to your velocities, as long as, > say, sigma<25. (I'm using the rule of thumb for the normal distribution that > normal random samples lie 3 standard deviations away from the mean about 1 > out of 350 times.) Though you won't be able to get exactly normal errors > about your mean since normal random samples can theoretically be of any > size. > > You can use this same process for any other distribution, as long as you've > chosen a scale variable so that the probability of samples being outside > your desired interval is really small. Of course, once again your random > errors won't be exactly from the distribution you get your original samples > from. Thank you for your answer. 
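Just to check I have understood the suggestion, this is the kind of sketch I have in mind for the normal case (assuming truncnorm's a and b are given in standard deviations from loc, i.e. (bound - loc) / scale):

import scipy.stats as stats

lower, upper, mu, sigma = 50.0, 200.0, 125.0, 25.0
# truncnorm takes its bounds in standardized units
a, b = (lower - mu) / sigma, (upper - mu) / sigma
velocities = stats.truncnorm.rvs(a, b, loc=mu, scale=sigma, size=1000)
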
Indeed, it appears that a truncated distribution implementation exists only for the normal distribution (in the subset of distributions I need to use). I haven't checked yet what the code for truncnorm does but maybe it might be possible to apply the same approach for other distributions. In any case the sampling/reject/accept approach is the best approach for me, due to my ignorance about statistical things :-) Thank you again. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/ >>> import PyQt4.QtGui Traceback (most recent call last): ? File "", line 1, in ImportError: No module named PyQt4.QtGui >>> >>> import pygtk Traceback (most recent call last): ? File "", line 1, in ImportError: No module named pygtk >>> >>> import wx >>> >>> From pearu.peterson at gmail.com Mon Aug 15 08:50:40 2011 From: pearu.peterson at gmail.com (Pearu Peterson) Date: Mon, 15 Aug 2011 15:50:40 +0300 Subject: [Numpy-discussion] ULONG not in UINT16, UINT32, UINT64 under 64-bit windows, is this possible? Message-ID: Hi, A student of mine using 32-bit numpy 1.5 under 64-bit Windows 7 noticed that giving a numpy array with dtype=uint32 to an extension module the following codelet would fail: switch(PyArray_TYPE(ARR)) { case PyArray_UINT16: /* do smth */ break; case PyArray_UINT32: /* do smth */ break; case PyArray_UINT64: /* do smth */ break; default: /* raise type error exception */ } The same test worked fine under Linux. Checking the value of PyArray_TYPE(ARR) (=8) showed that it corresponds to NPY_ULONG (when counting the items in the enum definition). Is this situation possible where NPY_ULONG does not correspond to a 16 or 32 or 64 bit integer? Or does this indicate a bug somewhere for this particular platform? Thanks, Pearu -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Mon Aug 15 09:25:22 2011 From: shish at keba.be (Olivier Delalleau) Date: Mon, 15 Aug 2011 09:25:22 -0400 Subject: [Numpy-discussion] ULONG not in UINT16, UINT32, UINT64 under 64-bit windows, is this possible? In-Reply-To: References: Message-ID: The reason is there can be multiple dtypes (i.e. with different .num) representing the same kind of data. Usually in Python this goes unnoticed, because you do not test a dtype through its .num, instead you use for instance "== 'uint32'", and all works fine. However, it can indeed confuse C code in situations like the one you describe, because of direct comparison of .num. I guess you have a few options: - Do not compare .num (I'm not sure what would be the equivalent to "== 'utin32' in C though) => probably slower - Re-cast your array in the exact dtype you need (in Python you can do this with .view) => probably cumbersome - Write a customized comparison function that figures out at initialization time all dtypes that represent the same data, and then is able to do a fast comparison based on .num => probably best, but requires more work Here's some Python code that lists the various scalar dtypes associated to unique .num in numpy (excerpt slightly modified from code found in Theano -- http://deeplearning.net/software/theano -- BSD license). Call the "get_numeric_types()" function, and print both the string representation of the resulting dtypes as well as their .num. def get_numeric_subclasses(cls=numpy.number, ignore=None): """ Return subclasses of `cls` in the numpy scalar hierarchy. We only return subclasses that correspond to unique data types. 
The hierarchy can be seen here: http://docs.scipy.org/doc/numpy/reference/arrays.scalars.html """ if ignore is None: ignore = [] rval = [] dtype = numpy.dtype(cls) dtype_num = dtype.num if dtype_num not in ignore: # Safety check: we should be able to represent 0 with this data type. numpy.array(0, dtype=dtype) rval.append(cls) ignore.append(dtype_num) for sub in cls.__subclasses__(): rval += [c for c in get_numeric_subclasses(sub, ignore=ignore)] return rval def get_numeric_types(): """ Return numpy numeric data types. :returns: A list of unique data type objects. Note that multiple data types may share the same string representation, but can be differentiated through their `num` attribute. """ rval = [] def is_within(cls1, cls2): # Return True if scalars defined from `cls1` are within the hierarchy # starting from `cls2`. # The third test below is to catch for instance the fact that # one can use ``dtype=numpy.number`` and obtain a float64 scalar, even # though `numpy.number` is not under `numpy.floating` in the class # hierarchy. return (cls1 is cls2 or issubclass(cls1, cls2) or isinstance(numpy.array([0], dtype=cls1)[0], cls2)) for cls in get_numeric_subclasses(): dtype = numpy.dtype(cls) rval.append([str(dtype), dtype, dtype.num]) # We sort it to be deterministic, then remove the string and num elements. return [x[1] for x in sorted(rval, key=str)] 2011/8/15 Pearu Peterson > > Hi, > > A student of mine using 32-bit numpy 1.5 under 64-bit Windows 7 noticed > that > giving a numpy array with dtype=uint32 to an extension module the > following codelet would fail: > > switch(PyArray_TYPE(ARR)) { > case PyArray_UINT16: /* do smth */ break; > case PyArray_UINT32: /* do smth */ break; > case PyArray_UINT64: /* do smth */ break; > default: /* raise type error exception */ > } > > The same test worked fine under Linux. > > Checking the value of PyArray_TYPE(ARR) (=8) showed that it corresponds to > NPY_ULONG (when counting the items in the enum definition). > > Is this situation possible where NPY_ULONG does not correspond to a 16 or > 32 or 64 bit integer? > Or does this indicate a bug somewhere for this particular platform? > > Thanks, > Pearu > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrea.gavana at gmail.com Mon Aug 15 09:53:17 2011 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Mon, 15 Aug 2011 15:53:17 +0200 Subject: [Numpy-discussion] Statistical distributions on samples In-Reply-To: References: Message-ID: Hi Chris and All, On 12 August 2011 16:53, Christopher Jordan-Squire wrote: > Hi Andrea--An easy way to get something like this would be > > import numpy as np > import scipy.stats as stats > > sigma = #some reasonable standard deviation for your application > x = stats.norm.rvs(size=1000, loc=125, scale=sigma) > x = x[x>50] > x = x[x<200] > > That will give a roughly normal distribution to your velocities, as long as, > say, sigma<25. (I'm using the rule of thumb for the normal distribution that > normal random samples lie 3 standard deviations away from the mean about 1 > out of 350 times.) Though you won't be able to get exactly normal errors > about your mean since normal random samples can theoretically be of any > size. 
> > You can use this same process for any other distribution, as long as you've > chosen a scale variable so that the probability of samples being outside > your desired interval is really small. Of course, once again your random > errors won't be exactly from the distribution you get your original samples > from. Thank you for your suggestion. There are a couple of things I am not clear with, however. The first one (the easy one), is: let's suppose I need 200 values, and the accept/discard procedure removes 5 of them from the list. Is there any way to draw these 200 values from a bigger sample so that the accept/reject procedure will not interfere too much? And how do I get 200 values out of the bigger sample so that these values are still representative? Another thing, possibly completely unrelated. I am trying to design a toy Latin Hypercube script (just for my own understanding). I found this piece of code on the web (and I modified it slightly): def lhs(dist, size=100): ''' Latin Hypercube sampling of any distrbution. dist is is a scipy.stats random number generator such as stats.norm, stats.beta, etc parms is a tuple with the parameters needed for the specified distribution. :Parameters: - `dist`: random number generator from scipy.stats module. - `size` :size for the output sample ''' n = size perc = numpy.arange(0.0, 1.0, 1.0/n) numpy.random.shuffle(perc) smp = [stats.uniform(i,1.0/n).rvs() for i in perc] v = dist.ppf(smp) return v Now, I am not 100% clear of what the percent point function is (I have read around the web, but please keep in mind that my statistical skills are close to minus infinity). From this page: http://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm I gather that, if you plot the results of the ppf, with the horizontal axis as probability, the vertical axis goes from the smallest to the largest value of the cumulative distribution function. If i do this: numpy.random.seed(123456) distribution = stats.norm(loc=125, scale=25) my_lhs = lhs(distribution, 50) Will my_lhs always contain valid values (i.e., included between 50 and 200)? I assume the answer is no... but even if this was the case, is this my_lhs array ready to be used to setup a LHS experiment when I have multi-dimensional problems (in which all the variables are completely independent from each other - no correlation)? My apologies for the idiocy of the questions. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/ >>> import PyQt4.QtGui Traceback (most recent call last): ? File "", line 1, in ImportError: No module named PyQt4.QtGui >>> >>> import pygtk Traceback (most recent call last): ? File "", line 1, in ImportError: No module named pygtk >>> >>> import wx >>> >>> From tmp50 at ukr.net Mon Aug 15 15:21:30 2011 From: tmp50 at ukr.net (Dmitrey) Date: Mon, 15 Aug 2011 22:21:30 +0300 Subject: [Numpy-discussion] [ANN] Constrained optimization solver with guaranteed precision Message-ID: Hi all, I'm glad to inform you that general constraints handling for interalg (free solver with guaranteed user-defined precision) now is available. Despite it is very premature and requires lots of improvements, it is already capable of outperforming commercial BARON (example: http://openopt.org/interalg_bench#Test_4) and thus you could be interested in trying it right now (next OpenOpt release will be no sooner than 1 month). 
interalg can be especially more effective than BARON (and some other competitors) on problems with huge or absent Lipschitz constant, for example on funcs like sqrt(x), log(x), 1/x, x**alpha, alpha<1, when domain of x is something like [small_positive_value, another_value]. Let me also remember you that interalg can search for all solutions of nonlinear equations / systems of them where local solvers like scipy.optimize fsolve cannot find anyone, and search single/multiple integral with guaranteed user-defined precision (speed of integration is intended to be enhanced in future). However, only FuncDesigner models are handled (read interalg webpage for more details). Regards, D. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjordan1 at uw.edu Mon Aug 15 15:40:16 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Mon, 15 Aug 2011 14:40:16 -0500 Subject: [Numpy-discussion] Statistical distributions on samples In-Reply-To: References: Message-ID: On Mon, Aug 15, 2011 at 8:53 AM, Andrea Gavana wrote: > Hi Chris and All, > > On 12 August 2011 16:53, Christopher Jordan-Squire wrote: > > Hi Andrea--An easy way to get something like this would be > > > > import numpy as np > > import scipy.stats as stats > > > > sigma = #some reasonable standard deviation for your application > > x = stats.norm.rvs(size=1000, loc=125, scale=sigma) > > x = x[x>50] > > x = x[x<200] > > > > That will give a roughly normal distribution to your velocities, as long > as, > > say, sigma<25. (I'm using the rule of thumb for the normal distribution > that > > normal random samples lie 3 standard deviations away from the mean about > 1 > > out of 350 times.) Though you won't be able to get exactly normal errors > > about your mean since normal random samples can theoretically be of any > > size. > > > > You can use this same process for any other distribution, as long as > you've > > chosen a scale variable so that the probability of samples being outside > > your desired interval is really small. Of course, once again your random > > errors won't be exactly from the distribution you get your original > samples > > from. > > Thank you for your suggestion. There are a couple of things I am not > clear with, however. The first one (the easy one), is: let's suppose I > need 200 values, and the accept/discard procedure removes 5 of them > from the list. Is there any way to draw these 200 values from a bigger > sample so that the accept/reject procedure will not interfere too > much? And how do I get 200 values out of the bigger sample so that > these values are still representative? > FWIW, I'm not really advocating a truncated normal so much as making the standard deviation small enough so that there's no real difference between a true normal distribution and a truncated normal. If you're worried about getting exactly 200 samples, then you could sample N with N>200 and such that after throwing out the ones that lie outside your desired region you're left with M>200. Then just randomly pick 200 from those M. That shouldn't bias anything as long as you randomly pick them. (Or just pick the first 200, if you haven't done anything to impose any order on the samples, such as sorting them by size.) But I'm not sure why you'd want exactly 200 samples instead of some number of samples close to 200. > > Another thing, possibly completely unrelated. I am trying to design a > toy Latin Hypercube script (just for my own understanding). 
I found > this piece of code on the web (and I modified it slightly): > > def lhs(dist, size=100): > ''' > Latin Hypercube sampling of any distrbution. > dist is is a scipy.stats random number generator > such as stats.norm, stats.beta, etc > parms is a tuple with the parameters needed for > the specified distribution. > > :Parameters: > - `dist`: random number generator from scipy.stats module. > - `size` :size for the output sample > ''' > > n = size > > perc = numpy.arange(0.0, 1.0, 1.0/n) > numpy.random.shuffle(perc) > > smp = [stats.uniform(i,1.0/n).rvs() for i in perc] > > v = dist.ppf(smp) > > return v > > > Now, I am not 100% clear of what the percent point function is (I have > read around the web, but please keep in mind that my statistical > skills are close to minus infinity). From this page: > > http://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm > > The ppf is what's called the quantile function elsewhere. I do not know why scipy calls it the ppf/percent point function. The quantile function is the inverse of the cumulative density function (cdf). So dist.ppf(z) is the x such that P(dist <= x) = z. Roughly. (Things get slightly more finicky if you think about discrete distributions because then you have to pick what happens at the jumps in the cdf.) So dist.ppf(0.5) gives the median of dist, and dist.ppf(0.25) gives the lower/first quartile of dist. > I gather that, if you plot the results of the ppf, with the horizontal > axis as probability, the vertical axis goes from the smallest to the > largest value of the cumulative distribution function. If i do this: > > numpy.random.seed(123456) > > distribution = stats.norm(loc=125, scale=25) > > my_lhs = lhs(distribution, 50) > > Will my_lhs always contain valid values (i.e., included between 50 and > 200)? I assume the answer is no... but even if this was the case, is > this my_lhs array ready to be used to setup a LHS experiment when I > have multi-dimensional problems (in which all the variables are > completely independent from each other - no correlation)? > > I'm not really sure if the above function is doing the lhs you want. To answer your question, it won't always generate values within [50,200]. If size is large enough then you're dividing up the probability space evenly. So even with the random perturbations (whose use I don't really understand), you'll ensure that some of the samples you get when you apply the ppf will correspond to the extremely low probability samples that are <50 or >200. -Chris JS My apologies for the idiocy of the questions. > > Andrea. > > "Imagination Is The Only Weapon In The War Against Reality." > http://xoomer.alice.it/infinity77/ > > >>> import PyQt4.QtGui > Traceback (most recent call last): > File "", line 1, in > ImportError: No module named PyQt4.QtGui > >>> > >>> import pygtk > Traceback (most recent call last): > File "", line 1, in > ImportError: No module named pygtk > >>> > >>> import wx > >>> > >>> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
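As a concrete illustration of the ppf-as-inverse-cdf point above (a quick sketch; the numbers are approximate):

import scipy.stats as stats

dist = stats.norm(loc=125, scale=25)
dist.ppf(0.5)              # 125.0, the median
dist.cdf(dist.ppf(0.975))  # ~0.975, since ppf inverts cdf
dist.ppf([0.001, 0.999])   # roughly [47.7, 202.3], i.e. tails outside [50, 200]
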
URL: From andrea.gavana at gmail.com Mon Aug 15 16:01:05 2011 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Mon, 15 Aug 2011 23:01:05 +0300 Subject: [Numpy-discussion] [ANN] Constrained optimization solver with guaranteed precision In-Reply-To: References: Message-ID: Hi Dmitrey, 2011/8/15 Dmitrey : > Hi all, > I'm glad to inform you that general constraints handling for interalg (free > solver with guaranteed user-defined precision) now is available. Despite it > is very premature and requires lots of improvements, it is already capable > of outperforming commercial BARON (example: > http://openopt.org/interalg_bench#Test_4)? and thus you could be interested > in trying it right now (next OpenOpt release will be no sooner than 1 > month). > > interalg can be especially more effective than BARON (and some other > competitors) on problems with huge or absent Lipschitz constant, for example > on funcs like sqrt(x), log(x), 1/x, x**alpha, alpha<1, when domain of x is > something like [small_positive_value, another_value]. > > Let me also remember you that interalg can search for all solutions of > nonlinear equations / systems of them where local solvers like > scipy.optimize fsolve cannot find anyone, and search single/multiple > integral with guaranteed user-defined precision (speed of integration is > intended to be enhanced in future). > However, only FuncDesigner models are handled (read interalg webpage for > more details). Thank you for this new improvements. I am one of those who use OpenOpt in real life problems, and if I can advance a suggestion (for the second time), when you post a benchmark of various optimization methods, please do not consider the "elapsed time" only as a meaningful variable to measure a success/failure of an algorithm. Some (most?) of real life problems require intensive and time consuming simulations for every *function evaluation*; the time spent by the solver itself doing its calculations simply disappears in front of the real process simulation. I know it because our simulations take between 2 and 48 hours to run, so what's 300 seconds more or less in the solver calculations? If you talk about synthetic problems (such as the ones defined by a formula), I can see your point. For everything else, I believe the number of function evaluations is a more direct way to assess the quality of an optimization algorithm. Just my 2c. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/ >>> import PyQt4.QtGui Traceback (most recent call last): ? File "", line 1, in ImportError: No module named PyQt4.QtGui >>> >>> import pygtk Traceback (most recent call last): ? File "", line 1, in ImportError: No module named pygtk >>> >>> import wx >>> >>> From tmp50 at ukr.net Mon Aug 15 16:09:37 2011 From: tmp50 at ukr.net (Dmitrey) Date: Mon, 15 Aug 2011 23:09:37 +0300 Subject: [Numpy-discussion] [ANN] Constrained optimization solver with guaranteed precision In-Reply-To: References: Message-ID: Hi Andrea, I believe benchmarks should be like Hans Mittelman do ( http://plato.asu.edu/bench.html ) and of course number of funcs evaluations matters when slow Python code vs compiled is tested, but my current work doesn't allow me to spend so much time for OpenOpt development, so, moreover, for auxiliary work such as benchmarking (and making it properly like that). Also, benchmarks of someone's own soft usually are not very trustful, moreover, on his own probs. 
BTW, please don't reply on my posts in scipy mail lists - I use them only to post the announcements like this and can miss a reply. Regards, D. --- ???????? ????????? --- ?? ????: " Andrea Gavana" ????: " Discussion of Numerical Python" ????: 15 ??????? 2011, 23:01:05 ????: Re: [Numpy-discussion] [ANN] Constrained optimization solver with guaranteed precision Hi Dmitrey, 2011/8/15 Dmitrey < tmp50 at ukr.net >: > Hi all, > I'm glad to inform you that general constraints handling for interalg (free > solver with guaranteed user-defined precision) now is available. Despite it > is very premature and requires lots of improvements, it is already capable > of outperforming commercial BARON (example: > http://openopt.org/interalg_bench#Test_4 ) and thus you could be interested > in trying it right now (next OpenOpt release will be no sooner than 1 > month). > > interalg can be especially more effective than BARON (and some other > competitors) on problems with huge or absent Lipschitz constant, for example > on funcs like sqrt(x), log(x), 1/x, x**alpha, alpha<1, when domain of x is > something like [small_positive_value, another_value]. > > Let me also remember you that interalg can search for all solutions of > nonlinear equations / systems of them where local solvers like > scipy.optimize fsolve cannot find anyone, and search single/multiple > integral with guaranteed user-defined precision (speed of integration is > intended to be enhanced in future). > However, only FuncDesigner models are handled (read interalg webpage for > more details). Thank you for this new improvements. I am one of those who use OpenOpt in real life problems, and if I can advance a suggestion (for the second time), when you post a benchmark of various optimization methods, please do not consider the "elapsed time" only as a meaningful variable to measure a success/failure of an algorithm. Some (most?) of real life problems require intensive and time consuming simulations for every *function evaluation*; the time spent by the solver itself doing its calculations simply disappears in front of the real process simulation. I know it because our simulations take between 2 and 48 hours to run, so what's 300 seconds more or less in the solver calculations? If you talk about synthetic problems (such as the ones defined by a formula), I can see your point. For everything else, I believe the number of function evaluations is a more direct way to assess the quality of an optimization algorithm. Just my 2c. Andrea. -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniel.wheeler2 at gmail.com Mon Aug 15 16:11:24 2011 From: daniel.wheeler2 at gmail.com (Daniel Wheeler) Date: Mon, 15 Aug 2011 16:11:24 -0400 Subject: [Numpy-discussion] inverting and calculating eigenvalues for many small matrices In-Reply-To: <4E1BFCE6.5030702@astro.uio.no> References: <4E1BFCE6.5030702@astro.uio.no> Message-ID: Hi, I put together a set of tools for inverting, multiplying and finding eigenvalues for many small matrices (arrays of shape (N, M, M) where MxM is the size of each matrix). Thanks to the posoter who suggested using the Tokyo package. Although not used directly, it helped with figuring the correct arguments to pass to the lapack routines and getting stated with cython. I put the code up here if anyone happens to be interested. It consists of three files, smallMatrixTools.py, smt.pyx amd smt.pxd. The speed tests comparing with numpy came out something like this... 
N, M, M: 10000, 2, 2 mulinv speed up: 65.9, eigvec speed up: 11.2 N, M, M: 10000, 3, 3 mulinv speed up: 32.3, eigvec speed up: 7.2 N, M, M: 10000, 4, 4 mulinv speed up: 24.1, eigvec speed up: 5.9 N, M, M: 10000, 5, 5 mulinv speed up: 17.0, eigvec speed up: 5.2 for random matrices. Not bad speed ups, but not out of this world. I'm new to cython so there may be some costly mistakes in the implementation. I profiled and it seems that most of the time is now being spent in the lapack routines, but still not completely convinced by the profiling results. One thing that I know I'm doing wrong is reassigning every sub-matrix to a new array. This may not be that costly, but it seems fairly ugly. I wasn't sure how to pass the address of the submatrix to the lapack routines so I'm assigning to a new array and passing that instead. I tested with and speed tests were done using . Cheers On Tue, Jul 12, 2011 at 3:51 AM, Dag Sverre Seljebotn wrote: > On 07/11/2011 11:01 PM, Daniel Wheeler wrote: >> Hi, I am trying to find the eigenvalues and eigenvectors as well as >> the inverse for a large number of small matrices. The matrix size >> (MxM) will typically range from 2x2 to 8x8 at most. The number of >> matrices (N) can be from 100 up to a million or more. My current >> solution is to define "eig" and "inv" to be, >> >> def inv(A): >> ? ? ?""" >> ? ? ?Inverts N MxM matrices, A.shape = (M, M, N), inv(A).shape = (M, M, N). >> ? ? ?""" >> ? ? ?return np.array(map(np.linalg.inv, A.transpose(2, 0, 1))).transpose(1, 2, 0) >> >> def eig(A): >> ? ? ?""" >> ? ? ?Calculate the eigenvalues and eigenvectors of N MxM matrices, >> A.shape = (M, M, N), eig(A)[0].shape = (M, N), eig(A)[1].shape = (M, >> M, N) >> ? ? ?""" >> ? ? ?tmp = zip(*map(np.linalg.eig, A.transpose(2, 0, 1))) >> ? ? ?return (np.array(tmp[0]).swapaxes(0,1), np.array(tmp[1]).transpose(1,2,0)) >> >> The above uses "map" to fake a vector solution, but this is heinously >> slow. Are there any better ways to do this without resorting to cython >> or weave (would it even be faster (or possible) to use "np.linalg.eig" >> and "np.linalg.inv" within cython)? I could write specialized versions > > If you want to go the Cython route, here's a start: > > http://www.vetta.org/2009/09/tokyo-a-cython-blas-wrapper-for-fast-matrix-math/ > > Dag Sverre > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Daniel Wheeler From rowen at uw.edu Mon Aug 15 16:25:50 2011 From: rowen at uw.edu (Russell E. Owen) Date: Mon, 15 Aug 2011 13:25:50 -0700 Subject: [Numpy-discussion] Efficient way to load a 1Gb file? References: Message-ID: In article , Torgil Svensson wrote: > Try the fromiter function, that will allow you to pass an iterator > which can read the file line by line and not preload the whole file. > > file_iterator = iter(open('filename.txt') > line_parser = lambda x: map(float,x.split('\t')) > a=np.fromiter(itertools.imap(line_parser,file_iterator),dtype=float) > > You have also the option to iterate the file twice and pass the > "count" argument. Thanks. That sounds great! 
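For the archive, a self-contained variant of that recipe (a sketch with an assumed column count; note the snippet above is missing a closing parenthesis after open(), and fromiter with a plain float dtype wants a flat stream of scalars, so the rows are flattened first and reshaped at the end):

import itertools
import numpy as np

ncols = 4   # assumed number of tab-separated columns per line

def parse_line(line):
    return [float(v) for v in line.split('\t')]

with open('filename.txt') as f:
    # the file is read line by line; only the parsed numbers are kept in memory
    flat = np.fromiter(itertools.chain.from_iterable(parse_line(line) for line in f),
                       dtype=float)
data = flat.reshape(-1, ncols)   # one row per input line
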
-- RUssell From matthew.brett at gmail.com Mon Aug 15 17:53:13 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 15 Aug 2011 14:53:13 -0700 Subject: [Numpy-discussion] Segfault for np.lookfor Message-ID: Hi, On current trunk, all tests pass but running the (forgive my language) doctests, I found this: In [1]: import numpy as np In [2]: np.__version__ Out[2]: '2.0.0.dev-730b861' In [3]: np.lookfor('cos') Segmentation fault on: Linux angela 2.6.38-10-generic #46-Ubuntu SMP Tue Jun 28 15:07:17 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux Ubuntu Natty Python 2.7.1+ Best, Matthew From matthew.brett at gmail.com Mon Aug 15 18:29:08 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 15 Aug 2011 15:29:08 -0700 Subject: [Numpy-discussion] numpydoc - latex longtables error In-Reply-To: References: Message-ID: Hi, On Wed, Aug 10, 2011 at 5:17 PM, Matthew Brett wrote: > Hi, > > On Wed, Aug 10, 2011 at 5:03 PM, ? wrote: >> On Wed, Aug 10, 2011 at 6:17 PM, Matthew Brett wrote: >>> Hi, >>> >>> On Wed, Aug 10, 2011 at 12:38 PM, Skipper Seabold wrote: >>>> On Wed, Aug 10, 2011 at 3:28 PM, Matthew Brett wrote: >>>>> Hi, >>>>> >>>>> I think this one might be for Pauli. >>>>> >>>>> I've run into an odd problem that seems to be an interaction of >>>>> numpydoc and autosummary and large classes. >>>>> >>>>> In summary, large classes and numpydoc lead to large tables of class >>>>> methods, and there seems to be an error in the creation of the large >>>>> tables in latex. >>>>> >>>>> Specifically, if I run 'make latexpdf' with the attached minimal >>>>> sphinx setup, I get a pdflatex error ending thus: >>>>> >>>>> ... >>>>> l.118 \begin{longtable}{LL} >>>>> >>>>> and this is because longtable does not accept LL as an argument, but >>>>> needs '|l|l|' (bar - el - bar - el - bar). >>>>> >>>>> I see in sphinx.writers.latex.py, around line 657, that sphinx knows >>>>> about this in general, and long tables in standard ReST work fine with >>>>> the el-bar arguments passed to longtable. >>>>> >>>>> ? ? ? ?if self.table.colspec: >>>>> ? ? ? ? ? ?self.body.append(self.table.colspec) >>>>> ? ? ? ?else: >>>>> ? ? ? ? ? ?if self.table.has_problematic: >>>>> ? ? ? ? ? ? ? ?colwidth = 0.95 / self.table.colcount >>>>> ? ? ? ? ? ? ? ?colspec = ('p{%.3f\\linewidth}|' % colwidth) * \ >>>>> ? ? ? ? ? ? ? ? ? ? ? ? ?self.table.colcount >>>>> ? ? ? ? ? ? ? ?self.body.append('{|' + colspec + '}\n') >>>>> ? ? ? ? ? ?elif self.table.longtable: >>>>> ? ? ? ? ? ? ? ?self.body.append('{|' + ('l|' * self.table.colcount) + '}\n') >>>>> ? ? ? ? ? ?else: >>>>> ? ? ? ? ? ? ? ?self.body.append('{|' + ('L|' * self.table.colcount) + '}\n') >>>>> >>>>> However, using numpydoc and autosummary (see the conf.py file), what >>>>> seems to happen is that, when we reach the self.table.colspec test at >>>>> the beginning of the snippet above, 'self.table.colspec' is defined: >>>>> >>>>> In [1]: self.table.colspec >>>>> Out[1]: '{LL}\n' >>>>> >>>>> and thus the LL gets written as the arg to longtable: >>>>> >>>>> \begin{longtable}{LL} >>>>> >>>>> and the pdf build breaks. >>>>> >>>>> I'm using the numpydoc out of the current numpy source tree. >>>>> >>>>> At that point I wasn't sure how to proceed with debugging. ?Can you >>>>> give any hints? 
>>>>> >>>> >>>> It's not a proper fix, but our workaround is to edit the Makefile for >>>> latex (and latexpdf) to >>>> >>>> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/Makefile#L94 >>>> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/make.bat#L121 >>>> >>>> to call the script to replace the longtable arguments >>>> >>>> https://github.com/statsmodels/statsmodels/blob/master/scikits/statsmodels/docs/fix_longtable.py >>>> >>>> The workaround itself probably isn't optimal, and I'd be happy to hear >>>> of a proper fix. >>> >>> Thanks - yes - I found your workaround in my explorations, I put in a >>> version in our tree too: >>> >>> https://github.com/matthew-brett/nipy/blob/latex_build_fixes/tools/fix_longtable.py >>> >>> ?- but I agree it seems much better to get to the root cause. >> >> When I tried to figure this out, I never found out why the correct >> sphinx longtable code path never gets reached, or which code >> (numpydoc, autosummary or sphinx) is filling in the colspec. > > No - it looked hard to debug. ?I established that it required numpydoc > and autosummary to be enabled. It looks like this conversation dried up, so I've moved it to a ticket: http://projects.scipy.org/numpy/ticket/1935 Best, Matthew From charlesr.harris at gmail.com Mon Aug 15 20:56:12 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 15 Aug 2011 18:56:12 -0600 Subject: [Numpy-discussion] Segfault for np.lookfor In-Reply-To: References: Message-ID: On Mon, Aug 15, 2011 at 3:53 PM, Matthew Brett wrote: > Hi, > > On current trunk, all tests pass but running the (forgive my language) > doctests, I found this: > > In [1]: import numpy as np > > In [2]: np.__version__ > Out[2]: '2.0.0.dev-730b861' > > In [3]: np.lookfor('cos') > Segmentation fault > > on: > > Linux angela 2.6.38-10-generic #46-Ubuntu SMP Tue Jun 28 15:07:17 UTC > 2011 x86_64 x86_64 x86_64 GNU/Linux > Ubuntu Natty Python 2.7.1+ > > The problem is somewhere in print_coercion_tables.py Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Aug 15 21:09:11 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 15 Aug 2011 19:09:11 -0600 Subject: [Numpy-discussion] Segfault for np.lookfor In-Reply-To: References: Message-ID: On Mon, Aug 15, 2011 at 6:56 PM, Charles R Harris wrote: > > > On Mon, Aug 15, 2011 at 3:53 PM, Matthew Brett wrote: > >> Hi, >> >> On current trunk, all tests pass but running the (forgive my language) >> doctests, I found this: >> >> In [1]: import numpy as np >> >> In [2]: np.__version__ >> Out[2]: '2.0.0.dev-730b861' >> >> In [3]: np.lookfor('cos') >> Segmentation fault >> >> on: >> >> Linux angela 2.6.38-10-generic #46-Ubuntu SMP Tue Jun 28 15:07:17 UTC >> 2011 x86_64 x86_64 x86_64 GNU/Linux >> Ubuntu Natty Python 2.7.1+ >> >> > The problem is somewhere in print_coercion_tables.py > > Or more precisely, it triggered by importing print_coercion_tables.py. I don't think lookfor should be doing that, but in any case: array + scalar + ? b h i l q p B H I L Q P e f d g F D G S U V O M m ? ? b h i l l l B H I L L L e f d g F D G O O # O ! m b b b b b b b b b b b b b b e f d g F D G O O # O ! m h h h h h h h h h h h h h h f f d g F D G O O # O ! m i i i i i i i i i i i i i i d d d g D D G O O # O ! m l l l l l l l l l l l l l l d d d g D D G O O # O ! m q l l l l l l l l l l l l l d d d g D D G O O # O ! m p l l l l l l l l l l l l l d d d g D D G O O # O ! 
m B B B B B B B B B B B B B B e f d g F D G O O # O ! m H H H H H H H H H H H H H H f f d g F D G O O # O ! m I I I I I I I I I I I I I I d d d g D D G O O # O ! m L L L L L L L L L L L L L L d d d g D D G O O # O ! m Q L L L L L L L L L L L L L d d d g D D G O O # O ! m P L L L L L L L L L L L L L d d d g D D G O O # O ! m e e e e e e e e e e e e e e e e e e F F F O O # O ! # f f f f f f f f f f f f f f f f f f F F F O O # O ! # d d d d d d d d d d d d d d d d d d D D D O O # O ! # g g g g g g g g g g g g g g g g g g G G G O O # O ! # F F F F F F F F F F F F F F F F F F F F F O O # O ! # D D D D D D D D D D D D D D D D D D D D D O O # O ! # G G G G G G G G G G G G G G G G G G G G G O O # O ! # S O O O O O O O O O O O O O O O O O O O O O O # O ! O U O O O O O O O O O O O O O O O O O O O O O O # O ! O Segmentation fault (core dumped) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Aug 15 21:43:57 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 15 Aug 2011 19:43:57 -0600 Subject: [Numpy-discussion] Segfault for np.lookfor In-Reply-To: References: Message-ID: On Mon, Aug 15, 2011 at 7:09 PM, Charles R Harris wrote: > > > On Mon, Aug 15, 2011 at 6:56 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Mon, Aug 15, 2011 at 3:53 PM, Matthew Brett wrote: >> >>> Hi, >>> >>> On current trunk, all tests pass but running the (forgive my language) >>> doctests, I found this: >>> >>> In [1]: import numpy as np >>> >>> In [2]: np.__version__ >>> Out[2]: '2.0.0.dev-730b861' >>> >>> In [3]: np.lookfor('cos') >>> Segmentation fault >>> >>> on: >>> >>> Linux angela 2.6.38-10-generic #46-Ubuntu SMP Tue Jun 28 15:07:17 UTC >>> 2011 x86_64 x86_64 x86_64 GNU/Linux >>> Ubuntu Natty Python 2.7.1+ >>> >>> >> The problem is somewhere in print_coercion_tables.py >> >> > Or more precisely, it triggered by importing print_coercion_tables.py. I > don't think lookfor should be doing that, but in any case: > > array + scalar > + ? b h i l q p B H I L Q P e f d g F D G S U V O M m > ? ? b h i l l l B H I L L L e f d g F D G O O # O ! m > b b b b b b b b b b b b b b e f d g F D G O O # O ! m > h h h h h h h h h h h h h h f f d g F D G O O # O ! m > i i i i i i i i i i i i i i d d d g D D G O O # O ! m > l l l l l l l l l l l l l l d d d g D D G O O # O ! m > q l l l l l l l l l l l l l d d d g D D G O O # O ! m > p l l l l l l l l l l l l l d d d g D D G O O # O ! m > B B B B B B B B B B B B B B e f d g F D G O O # O ! m > H H H H H H H H H H H H H H f f d g F D G O O # O ! m > I I I I I I I I I I I I I I d d d g D D G O O # O ! m > L L L L L L L L L L L L L L d d d g D D G O O # O ! m > Q L L L L L L L L L L L L L d d d g D D G O O # O ! m > P L L L L L L L L L L L L L d d d g D D G O O # O ! m > e e e e e e e e e e e e e e e e e e F F F O O # O ! # > f f f f f f f f f f f f f f f f f f F F F O O # O ! # > d d d d d d d d d d d d d d d d d d D D D O O # O ! # > g g g g g g g g g g g g g g g g g g G G G O O # O ! # > F F F F F F F F F F F F F F F F F F F F F O O # O ! # > D D D D D D D D D D D D D D D D D D D D D O O # O ! # > G G G G G G G G G G G G G G G G G G G G G O O # O ! # > S O O O O O O O O O O O O O O O O O O O O O O # O ! O > U O O O O O O O O O O O O O O O O O O O O O O # O ! O > Segmentation fault (core dumped) > A quick fix is to put the print statements in a function. 
diff --git a/numpy/testing/print_coercion_tables.py b/numpy/testing/print_coercion_tables.p index d875449..3bc9253 100755 --- a/numpy/testing/print_coercion_tables.py +++ b/numpy/testing/print_coercion_tables.py @@ -65,22 +65,23 @@ def print_coercion_table(ntypes, inputfirstvalue, inputsecondvalue, fir print char, print -print "can cast" -print_cancast_table(np.typecodes['All']) -print -print "In these tables, ValueError is '!', OverflowError is '@', TypeError is '#'" -print -print "scalar + scalar" -print_coercion_table(np.typecodes['All'], 0, 0, False) -print -print "scalar + neg scalar" -print_coercion_table(np.typecodes['All'], 0, -1, False) -print -print "array + scalar" -print_coercion_table(np.typecodes['All'], 0, 0, True) -print -print "array + neg scalar" -print_coercion_table(np.typecodes['All'], 0, -1, True) -print -print "promote_types" -print_coercion_table(np.typecodes['All'], 0, 0, False, True) +def printem(): + print "can cast" + print_cancast_table(np.typecodes['All']) + print + print "In these tables, ValueError is '!', OverflowError is '@', TypeError is '#'" + print + print "scalar + scalar" + print_coercion_table(np.typecodes['All'], 0, 0, False) + print + print "scalar + neg scalar" + print_coercion_table(np.typecodes['All'], 0, -1, False) + print + print "array + scalar" + print_coercion_table(np.typecodes['All'], 0, 0, True) + print + print "array + neg scalar" + print_coercion_table(np.typecodes['All'], 0, -1, True) + print + print "promote_types" + print_coercion_table(np.typecodes['All'], 0, 0, False, True) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From lpc at cmu.edu Mon Aug 15 22:42:43 2011 From: lpc at cmu.edu (Luis Pedro Coelho) Date: Mon, 15 Aug 2011 22:42:43 -0400 Subject: [Numpy-discussion] As any array, really any array Message-ID: <201108152242.48346.lpc@cmu.edu> Hello all, I often find myself writing the following code: try: features = np.asanyarray(features) except: features = np.asanyarray(features, dtype=object) I basically want to be able to use fany indexing on features and, in most cases, it will be a numpy floating point array. Otherwise, default to having it be an array of dtype=object. Is there a more elegant way to do it with numpy? Thank you, -- Luis Pedro Coelho | Carnegie Mellon University | http://luispedro.org -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part. URL: From wardefar at iro.umontreal.ca Tue Aug 16 05:53:59 2011 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Tue, 16 Aug 2011 05:53:59 -0400 Subject: [Numpy-discussion] inverting and calculating eigenvalues for many small matrices In-Reply-To: References: <4E1BFCE6.5030702@astro.uio.no> Message-ID: <97108383-605D-472E-B97C-75D2150B88D8@iro.umontreal.ca> On 2011-08-15, at 4:11 PM, Daniel Wheeler wrote: > One thing that I know I'm doing wrong is > reassigning every sub-matrix to a new array. This may not be that > costly, but it seems fairly ugly. I wasn't sure how to pass the > address of the submatrix to the lapack routines so I'm assigning to a > new array and passing that instead. It looks like the arrays you're passing are C contiguous. Am I right about this? (I ask because I was under the impression that BLAS/LAPACK routines typically want Fortran-ordered input arrays). If your 3D array is also C-contiguous, you should be able to do pointer arithmetic with A.data and B.data. 
foo.strides[0] will tell you how many bytes you need to move to get to the next element along that axis. If the 3D array is anything but C contiguous, then I believe the copy is necessary. You should check for that in your Python-visible "solve" wrapper, and make a copy of it that is C contiguous if necessary (check foo.flags.c_contiguous), as this will be likely faster than copying to the same buffer each time in the loop. David From J.Lee at bom.gov.au Tue Aug 16 07:32:34 2011 From: J.Lee at bom.gov.au (Jin Lee) Date: Tue, 16 Aug 2011 21:32:34 +1000 Subject: [Numpy-discussion] f2py - undefined symbol: _intel_fast_memset [SEC=UNCLASSIFIED] In-Reply-To: <0E3686EB9FA8AA409AFA0A25468DCE43017138F5542C@BOM-VMBX-HO.bom.gov.au> References: <0E3686EB9FA8AA409AFA0A25468DCE43017138F5542C@BOM-VMBX-HO.bom.gov.au> Message-ID: <0E3686EB9FA8AA409AFA0A25468DCE43017138F5542E@BOM-VMBX-HO.bom.gov.au> Hello, This is my very first attempt at using f2py but I have come across a problem. If anyone can assist me I would appreciate it very much. I have a very simple test Fortran source, sub.f90 which is: subroutine sub1(x,y) implicit none integer, intent(in) :: x integer, intent(out) :: y ! start y = x end subroutine sub1 I then used f2py to produce an object file, sub.so: f2py -c -m sub sub.f90 --fcompiler='gfortran' After starting a Python interactive session I tried to import the Fortran-derived Python module but I get an error message: >>> import sub Traceback (most recent call last): File "", line 1, in ImportError: ./sub.so: undefined symbol: _intel_fast_memset Can anyone suggest what this error message means and how I can overcome it, please? Regards, Jin From pearu.peterson at gmail.com Tue Aug 16 07:45:20 2011 From: pearu.peterson at gmail.com (Pearu Peterson) Date: Tue, 16 Aug 2011 14:45:20 +0300 Subject: [Numpy-discussion] f2py - undefined symbol: _intel_fast_memset [SEC=UNCLASSIFIED] In-Reply-To: <0E3686EB9FA8AA409AFA0A25468DCE43017138F5542E@BOM-VMBX-HO.bom.gov.au> References: <0E3686EB9FA8AA409AFA0A25468DCE43017138F5542C@BOM-VMBX-HO.bom.gov.au> <0E3686EB9FA8AA409AFA0A25468DCE43017138F5542E@BOM-VMBX-HO.bom.gov.au> Message-ID: <4E4A5850.3020901@cens.ioc.ee> On 08/16/2011 02:32 PM, Jin Lee wrote: > Hello, > > This is my very first attempt at using f2py but I have come across a problem. If anyone can assist me I would appreciate it very much. > > I have a very simple test Fortran source, sub.f90 which is: > > subroutine sub1(x,y) > implicit none > > integer, intent(in) :: x > integer, intent(out) :: y > > ! start > y = x > > end subroutine sub1 > > > I then used f2py to produce an object file, sub.so: > > f2py -c -m sub sub.f90 --fcompiler='gfortran' > > After starting a Python interactive session I tried to import the Fortran-derived Python module but I get an error message: > >>>> import sub > Traceback (most recent call last): > File "", line 1, in > ImportError: ./sub.so: undefined symbol: _intel_fast_memset > > > Can anyone suggest what this error message means and how I can overcome it, please? Try f2py -c -m sub sub.f90 --fcompiler=gnu95 HTH, Pearu From pav at iki.fi Tue Aug 16 09:01:56 2011 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 16 Aug 2011 13:01:56 +0000 (UTC) Subject: [Numpy-discussion] [SciPy-User] disabling SVN (was: Trouble installing scipy after upgrading to Mac OS X 10.7 aka Lion) References: Message-ID: Sat, 13 Aug 2011 22:00:33 -0400, josef.pktd wrote: [clip] > Does Trac require svn access to dig out old information? for example > links to old changesets, annotate/blame, ... ? 
It does not require HTTP access to SVN, as it looks directly at the SVN repo on the local disk. It also probably doesn't use the old SVN repo for anything in reality, as there's a simple Git plugin installed that just grabs the Git history to the timeline, and redirects source browsing etc to Github. However, I don't know whether the timeline views etc continue to function even without the local SVN repo, so I'd just disable the HTTP access and leave the local repo as it is as a backup. Pauli From tkluck at infty.nl Tue Aug 16 10:22:40 2011 From: tkluck at infty.nl (Timo Kluck) Date: Tue, 16 Aug 2011 16:22:40 +0200 Subject: [Numpy-discussion] numpy.interp running time In-Reply-To: References: <4E3452F1.7010607@hawaii.edu> Message-ID: 2011/8/1 Timo Kluck : > I just submitted a patch at > http://projects.scipy.org/numpy/ticket/1920 . It implements Eric's > suggestion. Please review, I'll be happy to adapt it to any of your > feedback. > I submitted a minor patch a while ago. It hasn't been reviewed yet, but I don't know whether that's just because the reviewers just haven't had time yet, or whether some extra action is required on my part. Perhaps the ticket should be 'tagged' for review, or similar? Let me know if there's anything more that I should do. Timo From daniel.wheeler2 at gmail.com Tue Aug 16 10:28:20 2011 From: daniel.wheeler2 at gmail.com (Daniel Wheeler) Date: Tue, 16 Aug 2011 10:28:20 -0400 Subject: [Numpy-discussion] inverting and calculating eigenvalues for many small matrices In-Reply-To: <97108383-605D-472E-B97C-75D2150B88D8@iro.umontreal.ca> References: <4E1BFCE6.5030702@astro.uio.no> <97108383-605D-472E-B97C-75D2150B88D8@iro.umontreal.ca> Message-ID: On Tue, Aug 16, 2011 at 5:53 AM, David Warde-Farley wrote: > On 2011-08-15, at 4:11 PM, Daniel Wheeler wrote: > >> One thing that I know I'm doing wrong is >> reassigning every sub-matrix to a new array. This may not be that >> costly, but it seems fairly ugly. I wasn't sure how to pass the >> address of the submatrix to the lapack routines so I'm assigning to a >> new array and passing that instead. > > It looks like the arrays you're passing are C contiguous. Am I right about this? (I ask because I was under the impression that BLAS/LAPACK routines typically want Fortran-ordered input arrays). Are you saying that fortran ordered arrays should be passed? The tests pass when compared against doing numpy equivalents so I don't believe that its currently broken (maybe suboptimal). There is a transpose and copy here and here . I believe that reorders correctly. Maybe I should cast the arrays to explicit fortran ordering rather than doing that (not sure how)? However, the transpose and copy doesn't seem to be expensive compared with the actual lapack routines. > If your 3D array is also C-contiguous, you should be able to do pointer arithmetic with A.data and B.data. foo.strides[0] will tell you how many bytes you need to move to get to the next element along that axis. Sounds complicated, but I'll try and figure it out. Thanks for the idea. > If the 3D array is anything but C contiguous, then I believe the copy is necessary. You should check for that in your Python-visible "solve" wrapper, and make a copy of it that is C contiguous if necessary (check foo.flags.c_contiguous), as this will be likely faster than copying to the same buffer each time in the loop. The copy is required after the transpose (which is required for fortran ordering). I'll look into the pointer arithmetic stuff and see if that helps any. Thanks. 
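As an aside on the "explicit fortran ordering" question above, numpy.asfortranarray does exactly that and only copies when the input is not already Fortran-ordered; a small sketch with a made-up array:

import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)   # C-ordered

f = np.asfortranarray(a)        # same effect as np.asarray(a, order='F')
print(a.flags.f_contiguous)     # False
print(f.flags.f_contiguous)     # True
# The memory layout matches a.T.copy().T, but the intent is stated directly,
# and no copy is made if 'a' is already Fortran-contiguous.
np.testing.assert_array_equal(f, a)
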
-- Daniel Wheeler From wesmckinn at gmail.com Tue Aug 16 11:02:06 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 16 Aug 2011 11:02:06 -0400 Subject: [Numpy-discussion] Questionable reduceat behavior In-Reply-To: References: Message-ID: On Sun, Aug 14, 2011 at 11:58 AM, Wes McKinney wrote: > On Sat, Aug 13, 2011 at 8:06 PM, Mark Wiebe wrote: >> Looks like this is the second-oldest open bug in the bug tracker. >> http://projects.scipy.org/numpy/ticket/236 >> For what it's worth, I'm in favour of changing this behavior to be more >> consistent as proposed in that ticket. >> -Mark >> >> On Thu, Aug 11, 2011 at 11:25 AM, Wes McKinney wrote: >>> >>> I'm a little perplexed why reduceat was made to behave like this: >>> >>> In [26]: arr = np.ones((10, 4), dtype=bool) >>> >>> In [27]: arr >>> Out[27]: >>> array([[ True, ?True, ?True, ?True], >>> ? ? ? [ True, ?True, ?True, ?True], >>> ? ? ? [ True, ?True, ?True, ?True], >>> ? ? ? [ True, ?True, ?True, ?True], >>> ? ? ? [ True, ?True, ?True, ?True], >>> ? ? ? [ True, ?True, ?True, ?True], >>> ? ? ? [ True, ?True, ?True, ?True], >>> ? ? ? [ True, ?True, ?True, ?True], >>> ? ? ? [ True, ?True, ?True, ?True], >>> ? ? ? [ True, ?True, ?True, ?True]], dtype=bool) >>> >>> >>> In [30]: np.add.reduceat(arr, [0, 3, 3, 7, 9], axis=0) >>> Out[30]: >>> array([[3, 3, 3, 3], >>> ? ? ? [1, 1, 1, 1], >>> ? ? ? [4, 4, 4, 4], >>> ? ? ? [2, 2, 2, 2], >>> ? ? ? [1, 1, 1, 1]]) >>> >>> this does not seem intuitively correct. Since we have: >>> >>> In [33]: arr[3:3].sum(0) >>> Out[33]: array([0, 0, 0, 0]) >>> >>> I would expect >>> >>> array([[3, 3, 3, 3], >>> ? ? ? [0, 0, 0, 0], >>> ? ? ? [4, 4, 4, 4], >>> ? ? ? [2, 2, 2, 2], >>> ? ? ? [1, 1, 1, 1]]) >>> >>> Obviously I can RTFM and see why it does this ("if ``indices[i] >= >>> indices[i + 1]``, the i-th generalized "row" is simply >>> ``a[indices[i]]``"), but it doesn't make much sense to me, and I need >>> work around it. Suggestions? >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > Well, I certainly hope it doesn't get forgotten about for another 5 > years. I think having more consistent behavior would be better rather > than conforming to a seemingly arbitrary decision made ages ago in > Numeric. > > - Wes > just a manual hack for now where I needed it... https://github.com/wesm/pandas/blob/master/pandas/core/frame.py#L2155 From jgomezdans at gmail.com Tue Aug 16 12:50:25 2011 From: jgomezdans at gmail.com (Jose Gomez-Dans) Date: Tue, 16 Aug 2011 17:50:25 +0100 Subject: [Numpy-discussion] [f2py] How to specify compile options in setup.py Message-ID: Hi, Up to now, I have managed to build Fortran extensions with f2py by ussing the following command: $ python setup.py config_fc --fcompiler=gnu95 --f77flags='-fmy_flags' --f90flags='-fmy_flags' build I think that these options should be able to go in a setup.py file, and use the f2py_options file. One way of doing this is to extend sys.argv with the required command line options: import sys sys.argv.extend ( ['config_fc', '--fcompiler=gnu95', '--f77flags="-fmy_flags"', "--f90flags='-fmy_flags"] ) This works well if all the extensions require the same flags. 
In my case, however, One of the extensions requires a different set of flags (in particular, it requires that flag -fdefault-real-8 isn't set, which is required by the extensions). I tried setting the f2py_options in the add_extension method call: config.add_extension( 'my_extension', sources = my_sources, f2py_options=['f77flags="-ffixed-line-length-0" -fdefault-real-8', 'f90flags="-fdefault-real-8"'] ) This compiles the extensions (using the two dashes in front of the f2py option eg --f77flags results in an unrecognised option), but the f2p_options goes unheeded. Here's the relevant bit of the output from python setup.py build: compiling Fortran sources Fortran f77 compiler: /usr/bin/gfortran -ffixed-line-length-0 -fPIC -O3 -march=native Fortran f90 compiler: /usr/bin/gfortran -ffixed-line-length-0 -fPIC -O3 -march=native Fortran fix compiler: /usr/bin/gfortran -Wall -ffixed-form -fno-second-underscore -ffixed-line-length-0 -fPIC -O3 -march=native compile options: '-Ibuild/src.linux-i686-2.7 -I/usr/lib/pymodules/python2.7/numpy/core/include -I/usr/include/python2.7 -c' extra options: '-Jbuild/temp.linux-i686-2.7/my_dir -Ibuild/temp.linux-i686-2.7/my_dir' How can I disable (or enable) one option for compiling one particular extension? Thanks! Jose -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Aug 16 13:05:33 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 16 Aug 2011 11:05:33 -0600 Subject: [Numpy-discussion] Segfault for np.lookfor In-Reply-To: References: Message-ID: On Mon, Aug 15, 2011 at 7:43 PM, Charles R Harris wrote: > > > On Mon, Aug 15, 2011 at 7:09 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Mon, Aug 15, 2011 at 6:56 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> >>> >>> On Mon, Aug 15, 2011 at 3:53 PM, Matthew Brett wrote: >>> >>>> Hi, >>>> >>>> On current trunk, all tests pass but running the (forgive my language) >>>> doctests, I found this: >>>> >>>> In [1]: import numpy as np >>>> >>>> In [2]: np.__version__ >>>> Out[2]: '2.0.0.dev-730b861' >>>> >>>> In [3]: np.lookfor('cos') >>>> Segmentation fault >>>> >>>> on: >>>> >>>> Linux angela 2.6.38-10-generic #46-Ubuntu SMP Tue Jun 28 15:07:17 UTC >>>> 2011 x86_64 x86_64 x86_64 GNU/Linux >>>> Ubuntu Natty Python 2.7.1+ >>>> >>>> >>> The problem is somewhere in print_coercion_tables.py >>> >>> >> Or more precisely, it triggered by importing print_coercion_tables.py. I >> don't think lookfor should be doing that, but in any case: >> >> array + scalar >> + ? b h i l q p B H I L Q P e f d g F D G S U V O M m >> ? ? b h i l l l B H I L L L e f d g F D G O O # O ! m >> b b b b b b b b b b b b b b e f d g F D G O O # O ! m >> h h h h h h h h h h h h h h f f d g F D G O O # O ! m >> i i i i i i i i i i i i i i d d d g D D G O O # O ! m >> l l l l l l l l l l l l l l d d d g D D G O O # O ! m >> q l l l l l l l l l l l l l d d d g D D G O O # O ! m >> p l l l l l l l l l l l l l d d d g D D G O O # O ! m >> B B B B B B B B B B B B B B e f d g F D G O O # O ! m >> H H H H H H H H H H H H H H f f d g F D G O O # O ! m >> I I I I I I I I I I I I I I d d d g D D G O O # O ! m >> L L L L L L L L L L L L L L d d d g D D G O O # O ! m >> Q L L L L L L L L L L L L L d d d g D D G O O # O ! m >> P L L L L L L L L L L L L L d d d g D D G O O # O ! m >> e e e e e e e e e e e e e e e e e e F F F O O # O ! # >> f f f f f f f f f f f f f f f f f f F F F O O # O ! 
# >> d d d d d d d d d d d d d d d d d d D D D O O # O ! # >> g g g g g g g g g g g g g g g g g g G G G O O # O ! # >> F F F F F F F F F F F F F F F F F F F F F O O # O ! # >> D D D D D D D D D D D D D D D D D D D D D O O # O ! # >> G G G G G G G G G G G G G G G G G G G G G O O # O ! # >> S O O O O O O O O O O O O O O O O O O O O O O # O ! O >> U O O O O O O O O O O O O O O O O O O O O O O # O ! O >> Segmentation fault (core dumped) >> > > A quick fix is to put the print statements in a function. > > diff --git a/numpy/testing/print_coercion_tables.py > b/numpy/testing/print_coercion_tables.p > index d875449..3bc9253 100755 > --- a/numpy/testing/print_coercion_tables.py > +++ b/numpy/testing/print_coercion_tables.py > @@ -65,22 +65,23 @@ def print_coercion_table(ntypes, inputfirstvalue, > inputsecondvalue, fir > print char, > print > > -print "can cast" > -print_cancast_table(np.typecodes['All']) > -print > -print "In these tables, ValueError is '!', OverflowError is '@', TypeError > is '#'" > -print > -print "scalar + scalar" > -print_coercion_table(np.typecodes['All'], 0, 0, False) > -print > -print "scalar + neg scalar" > -print_coercion_table(np.typecodes['All'], 0, -1, False) > -print > -print "array + scalar" > -print_coercion_table(np.typecodes['All'], 0, 0, True) > -print > -print "array + neg scalar" > -print_coercion_table(np.typecodes['All'], 0, -1, True) > -print > -print "promote_types" > -print_coercion_table(np.typecodes['All'], 0, 0, False, True) > +def printem(): > + print "can cast" > + print_cancast_table(np.typecodes['All']) > + print > + print "In these tables, ValueError is '!', OverflowError is '@', > TypeError is '#'" > + print > + print "scalar + scalar" > + print_coercion_table(np.typecodes['All'], 0, 0, False) > + print > + print "scalar + neg scalar" > + print_coercion_table(np.typecodes['All'], 0, -1, False) > + print > + print "array + scalar" > + print_coercion_table(np.typecodes['All'], 0, 0, True) > + print > + print "array + neg scalar" > + print_coercion_table(np.typecodes['All'], 0, -1, True) > + print > + print "promote_types" > + print_coercion_table(np.typecodes['All'], 0, 0, False, True) > > I opened ticket #1937 for this Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Tue Aug 16 13:14:35 2011 From: efiring at hawaii.edu (Eric Firing) Date: Tue, 16 Aug 2011 07:14:35 -1000 Subject: [Numpy-discussion] numpy.interp running time In-Reply-To: References: <4E3452F1.7010607@hawaii.edu> Message-ID: <4E4AA57B.8070506@hawaii.edu> On 08/16/2011 04:22 AM, Timo Kluck wrote: > 2011/8/1 Timo Kluck: >> I just submitted a patch at >> http://projects.scipy.org/numpy/ticket/1920 . It implements Eric's >> suggestion. Please review, I'll be happy to adapt it to any of your >> feedback. >> > I submitted a minor patch a while ago. It hasn't been reviewed yet, > but I don't know whether that's just because the reviewers just > haven't had time yet, or whether some extra action is required on my > part. Perhaps the ticket should be 'tagged' for review, or similar? > Let me know if there's anything more that I should do. > > Timo Timo, I suspect the one thing that would improve the likelihood of review would be if you were to supply the patch via a github pull request. In addition, posting a timing test (code and results) might help. 
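For reference, a timing test of the kind suggested could be as small as the sketch below (the problem sizes are arbitrary, chosen only for illustration):

import timeit

setup = """
import numpy as np
xp = np.linspace(0.0, 1.0, 10000)
fp = np.sin(xp)
x = np.random.rand(1000)
"""
times = timeit.repeat("np.interp(x, xp, fp)", setup=setup,
                      repeat=3, number=1000)
print("np.interp, 1000 points into 10000 knots: %.3f ms per call"
      % (min(times) / 1000 * 1e3))
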
Eric From matthew.brett at gmail.com Tue Aug 16 15:15:22 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 16 Aug 2011 12:15:22 -0700 Subject: [Numpy-discussion] Segfault for np.lookfor In-Reply-To: References: Message-ID: Hi, On Tue, Aug 16, 2011 at 10:05 AM, Charles R Harris wrote: > > > On Mon, Aug 15, 2011 at 7:43 PM, Charles R Harris > wrote: >> >> >> On Mon, Aug 15, 2011 at 7:09 PM, Charles R Harris >> wrote: >>> >>> >>> On Mon, Aug 15, 2011 at 6:56 PM, Charles R Harris >>> wrote: >>>> >>>> >>>> On Mon, Aug 15, 2011 at 3:53 PM, Matthew Brett >>>> wrote: >>>>> >>>>> Hi, >>>>> >>>>> On current trunk, all tests pass but running the (forgive my language) >>>>> doctests, I found this: >>>>> >>>>> In [1]: import numpy as np >>>>> >>>>> In [2]: np.__version__ >>>>> Out[2]: '2.0.0.dev-730b861' >>>>> >>>>> In [3]: np.lookfor('cos') >>>>> Segmentation fault >>>>> >>>>> on: >>>>> >>>>> Linux angela 2.6.38-10-generic #46-Ubuntu SMP Tue Jun 28 15:07:17 UTC >>>>> 2011 x86_64 x86_64 x86_64 GNU/Linux >>>>> Ubuntu Natty Python 2.7.1+ >>>>> >>>> >>>> The problem is somewhere in print_coercion_tables.py >>>> >>> >>> Or more precisely, it triggered by importing? print_coercion_tables.py. I >>> don't think lookfor should be doing that, but in any case: >>> >>> array + scalar >>> + ? b h i l q p B H I L Q P e f d g F D G S U V O M m >>> ? ? b h i l l l B H I L L L e f d g F D G O O # O ! m >>> b b b b b b b b b b b b b b e f d g F D G O O # O ! m >>> h h h h h h h h h h h h h h f f d g F D G O O # O ! m >>> i i i i i i i i i i i i i i d d d g D D G O O # O ! m >>> l l l l l l l l l l l l l l d d d g D D G O O # O ! m >>> q l l l l l l l l l l l l l d d d g D D G O O # O ! m >>> p l l l l l l l l l l l l l d d d g D D G O O # O ! m >>> B B B B B B B B B B B B B B e f d g F D G O O # O ! m >>> H H H H H H H H H H H H H H f f d g F D G O O # O ! m >>> I I I I I I I I I I I I I I d d d g D D G O O # O ! m >>> L L L L L L L L L L L L L L d d d g D D G O O # O ! m >>> Q L L L L L L L L L L L L L d d d g D D G O O # O ! m >>> P L L L L L L L L L L L L L d d d g D D G O O # O ! m >>> e e e e e e e e e e e e e e e e e e F F F O O # O ! # >>> f f f f f f f f f f f f f f f f f f F F F O O # O ! # >>> d d d d d d d d d d d d d d d d d d D D D O O # O ! # >>> g g g g g g g g g g g g g g g g g g G G G O O # O ! # >>> F F F F F F F F F F F F F F F F F F F F F O O # O ! # >>> D D D D D D D D D D D D D D D D D D D D D O O # O ! # >>> G G G G G G G G G G G G G G G G G G G G G O O # O ! # >>> S O O O O O O O O O O O O O O O O O O O O O O # O ! O >>> U O O O O O O O O O O O O O O O O O O O O O O # O ! O >>> Segmentation fault (core dumped) >> >> A quick fix is to put the print statements in a function. >> >> diff --git a/numpy/testing/print_coercion_tables.py >> b/numpy/testing/print_coercion_tables.p >> index d875449..3bc9253 100755 >> --- a/numpy/testing/print_coercion_tables.py >> +++ b/numpy/testing/print_coercion_tables.py >> @@ -65,22 +65,23 @@ def print_coercion_table(ntypes, inputfirstvalue, >> inputsecondvalue, fir >> ???????????? print char, >> ???????? 
print >> >> -print "can cast" >> -print_cancast_table(np.typecodes['All']) >> -print >> -print "In these tables, ValueError is '!', OverflowError is '@', >> TypeError is '#'" >> -print >> -print "scalar + scalar" >> -print_coercion_table(np.typecodes['All'], 0, 0, False) >> -print >> -print "scalar + neg scalar" >> -print_coercion_table(np.typecodes['All'], 0, -1, False) >> -print >> -print "array + scalar" >> -print_coercion_table(np.typecodes['All'], 0, 0, True) >> -print >> -print "array + neg scalar" >> -print_coercion_table(np.typecodes['All'], 0, -1, True) >> -print >> -print "promote_types" >> -print_coercion_table(np.typecodes['All'], 0, 0, False, True) >> +def printem(): >> +??? print "can cast" >> +??? print_cancast_table(np.typecodes['All']) >> +??? print >> +??? print "In these tables, ValueError is '!', OverflowError is '@', >> TypeError is '#'" >> +??? print >> +??? print "scalar + scalar" >> +??? print_coercion_table(np.typecodes['All'], 0, 0, False) >> +??? print >> +??? print "scalar + neg scalar" >> +??? print_coercion_table(np.typecodes['All'], 0, -1, False) >> +??? print >> +??? print "array + scalar" >> +??? print_coercion_table(np.typecodes['All'], 0, 0, True) >> +??? print >> +??? print "array + neg scalar" >> +??? print_coercion_table(np.typecodes['All'], 0, -1, True) >> +??? print >> +??? print "promote_types" >> +??? print_coercion_table(np.typecodes['All'], 0, 0, False, True) >> > > I opened ticket #1937 for this >From git-bisect it looks like the culprit is: feb8079070b8a659d7eee1b4acbddf470fd8a81d is the first bad commit commit feb8079070b8a659d7eee1b4acbddf470fd8a81d Author: Ben Walsh Date: Sun Jul 10 12:52:52 2011 +0100 BUT: Stop _array_find_type trying to make every list element a subtype of bool. Just to remind me, my procedure was: <~/tmp/testfor.py> #!/usr/bin/env python import sys from functools import partial from subprocess import check_call, Popen, PIPE, CalledProcessError caller = partial(check_call, shell=True) popener = partial(Popen, stdout=PIPE, stderr=PIPE, shell=True) try: caller('git clean -fxd') caller('python setup.py build_ext -i') except CalledProcessError: sys.exit(125) # untestable proc = popener('python -c "%s"' % """import sys import numpy as np np.lookfor('cos', output=sys.stdout) """) stdout, stderr = proc.communicate() if 'Segmentation fault' in stderr: sys.exit(1) # bad sys.exit(0) # good Then, I established the v1.6.1 did not have the segfault, and (man git-bisect): git co main-master # current upstream master git bisect start HEAD v1.6.1 -- git bisect run ~/tmp/testfor.py See y'all, Matthew From pearu.peterson at gmail.com Tue Aug 16 16:51:24 2011 From: pearu.peterson at gmail.com (Pearu Peterson) Date: Tue, 16 Aug 2011 23:51:24 +0300 Subject: [Numpy-discussion] [f2py] How to specify compile options in setup.py In-Reply-To: References: Message-ID: , On Tue, Aug 16, 2011 at 7:50 PM, Jose Gomez-Dans wrote: > Hi, > > Up to now, I have managed to build Fortran extensions with f2py by ussing > the following command: > $ python setup.py config_fc --fcompiler=gnu95 > --f77flags='-fmy_flags' --f90flags='-fmy_flags' build > > I think that these options should be able to go in a setup.py file, and use > the f2py_options file. One way of doing this is to extend sys.argv with the > required command line options: > import sys > sys.argv.extend ( ['config_fc', '--fcompiler=gnu95', > '--f77flags="-fmy_flags"', "--f90flags='-fmy_flags"] ) > > This works well if all the extensions require the same flags. 
In my case, > however, One of the extensions requires a different set of flags (in > particular, it requires that flag -fdefault-real-8 isn't set, which is > required by the extensions). I tried setting the f2py_options in the > add_extension method call: > > config.add_extension( 'my_extension', sources = my_sources, > f2py_options=['f77flags="-ffixed-line-length-0" -fdefault-real-8', > 'f90flags="-fdefault-real-8"'] ) > > This compiles the extensions (using the two dashes in front of the f2py > option eg --f77flags results in an unrecognised option), but the f2p_options > goes unheeded. Here's the relevant bit of the output from python setup.py > build: > > compiling Fortran sources > Fortran f77 compiler: /usr/bin/gfortran -ffixed-line-length-0 -fPIC -O3 > -march=native > Fortran f90 compiler: /usr/bin/gfortran -ffixed-line-length-0 -fPIC -O3 > -march=native > Fortran fix compiler: /usr/bin/gfortran -Wall -ffixed-form > -fno-second-underscore -ffixed-line-length-0 -fPIC -O3 -march=native > compile options: '-Ibuild/src.linux-i686-2.7 > -I/usr/lib/pymodules/python2.7/numpy/core/include -I/usr/include/python2.7 > -c' > extra options: '-Jbuild/temp.linux-i686-2.7/my_dir > -Ibuild/temp.linux-i686-2.7/my_dir' > > How can I disable (or enable) one option for compiling one particular > extension? > > You cannot do it unless you update numpy from git repo. I just implemented the support for extra_f77_compile_args and extra_f90_compile_args options that can be used in config.add_extension as well as in config.add_library. See https://github.com/numpy/numpy/commit/43862759 So, with recent numpy the following will work config.add_extension( 'my_extension', sources = my_sources, extra_f77_compile_args = ["-ffixed-line-length-0", "-fdefault-real-8"], extra_f90_compile_args = ["-fdefault-real-8"], ) HTH, Pearu -------------- next part -------------- An HTML attachment was scrubbed... URL: From hongchunjin at gmail.com Tue Aug 16 17:19:07 2011 From: hongchunjin at gmail.com (Hongchun Jin) Date: Tue, 16 Aug 2011 16:19:07 -0500 Subject: [Numpy-discussion] Trim a numpy array in numpy. Message-ID: *Hi there, * * * *I have a question regarding how to trim a string array in numpy. * * * *>>> import numpy as np* *>>> x = np.array(['aaa.hdf', 'bbb.hdf', 'ccc.hdf', 'ddd.hdf'])* * * *I expect to trim a certain part of each element in the array, for example '.hdf', giving me ['aaa', 'bbb', 'ccc', 'ddd']. Of course, I can do a loop thing. However, in my actual dataset, I have more than one million elements in such an array. So I am wondering is there a faster and better way to do it, like STRMID function in IDL? I try to google it, but it turns out that I can not find any discussion about it. Thanks. * * Hongchun* -------------- next part -------------- An HTML attachment was scrubbed... URL: From derek at astro.physik.uni-goettingen.de Tue Aug 16 17:39:26 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Tue, 16 Aug 2011 23:39:26 +0200 Subject: [Numpy-discussion] Trim a numpy array in numpy. In-Reply-To: References: Message-ID: Hi Hongchun, On 16 Aug 2011, at 23:19, Hongchun Jin wrote: > I have a question regarding how to trim a string array in numpy. > > >>> import numpy as np > >>> x = np.array(['aaa.hdf', 'bbb.hdf', 'ccc.hdf', 'ddd.hdf']) > > I expect to trim a certain part of each element in the array, for example '.hdf', giving me ['aaa', 'bbb', 'ccc', 'ddd']. Of course, I can do a loop thing. However, in my actual dataset, I have more than one million elements in such an array. 
So I am wondering is there a faster and better way to do it, like STRMID function in IDL? I try to google it, but it turns out that I can not find any discussion about it. Thanks. > For a case like above, if you really have all constant length strings and want to truncate to a fixed length, you could simply do x.astype('|S3') For more complex cases like trimming regex patterns I can't think of a numpy solution right now, coding the loop in cython might be a better bet there... Cheers, Derek From hongchunjin at gmail.com Tue Aug 16 17:51:49 2011 From: hongchunjin at gmail.com (Hongchun Jin) Date: Tue, 16 Aug 2011 16:51:49 -0500 Subject: [Numpy-discussion] Trim a numpy array in numpy. In-Reply-To: References: Message-ID: *Thanks Derek for the quick reply. But **I am sorry, I did not make it clear in my last email. Assume I have an array like * * ['CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' ..., 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf' 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf' 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf'] I need to get the sub-string for date and time, for example, ** '2008-01-31T23-56-35ZD' in the middle of each element. In more general cases, the sub-string could be any part of the string in such an array. I hope to assign the start and stop of the sub-string when I am subsetting it. * *Best, Hongchun * On Tue, Aug 16, 2011 at 4:39 PM, Derek Homeier < derek at astro.physik.uni-goettingen.de> wrote: > x.astype('|S3') -------------- next part -------------- An HTML attachment was scrubbed... URL: From derek at astro.physik.uni-goettingen.de Tue Aug 16 18:43:50 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Wed, 17 Aug 2011 00:43:50 +0200 Subject: [Numpy-discussion] Trim a numpy array in numpy. In-Reply-To: References: Message-ID: On 16 Aug 2011, at 23:51, Hongchun Jin wrote: > Thanks Derek for the quick reply. But I am sorry, I did not make it clear in my last email. Assume I have an array like > ['CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' > > 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' > > 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' ..., > > 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf' > > 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf' > > 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf'] > > I need to get the sub-string for date and time, for example, > > '2008-01-31T23-56-35ZD' in the middle of each element. In more general cases, the sub-string could be any part of the string in such an array. I hope to assign the start and stop of the sub-string when I am subsetting it. > Well, maybe I was a bit too quick in my reply - see the documentation for np.char for some vectorized array operations that might be of use. Unfortunately, operations like 'lstrip' and 'rstrip' don't do exactly what you might them expect to, but you could use for example np.char.split(x,'.') to create an array of lists of substrings and then deal with them; something like removing the '.hdf' suffix would already require a somewhat lengthy recursion: np.char.rstrip(np.char.rstrip(np.char.rstrip(np.char.rstrip(x, 'f'), 'd'), 'h'), '.') To also remove the leading substring in your case clearly would lead to a very clumsy expression... 
It turns out however, something like the above for a similar test case with a length 100000 array takes about 3 times longer than the np.char.split() way; but even that is slower than a direct loop over string functions: In [6]: %timeit -n 10 y = np.char.split(x, '.') 10 loops, best of 3: 188 ms per loop In [7]: %timeit -n 10 y = np.char.split(x, '.'); z = np.fromiter( (l[1] for l in y), dtype='|S3', count=x.shape[0]) 10 loops, best of 3: 218 ms per loop In [8]: %timeit -n 10 z = np.fromiter( (l.split('.')[1] for l in x), dtype='|S3', count=x.shape[0]) 10 loops, best of 3: 143 ms per loop So it seems all of the vectorization in np.char is not that great after all (and the direct loop might still be acceptable for 1.e6 elements...)! Cheers, Derek From warren.weckesser at enthought.com Tue Aug 16 19:44:10 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 16 Aug 2011 18:44:10 -0500 Subject: [Numpy-discussion] Trim a numpy array in numpy. In-Reply-To: References: Message-ID: On Tue, Aug 16, 2011 at 4:51 PM, Hongchun Jin wrote: > *Thanks Derek for the quick reply. But **I am sorry, I did not make it > clear in my last email. Assume I have an array like * > * > > ['CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' > > 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' > > 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' ..., > > 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf' > > 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf' > > 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf'] > > I need to get the sub-string for date and time, for example, > ** > > '2008-01-31T23-56-35ZD' in the middle of each element. In more general > cases, the sub-string could be any part of the string in such an array. I > hope to assign the start and stop of the sub-string when I am subsetting it. > > * > Here's one way: ----- import numpy as np def strslice(x, start=None, stop=None, step=None): """ Given a contiguous 1-d numpy array `x` of strings, return a new numpy array `y` of strings so that y[k] = x[k][start:stop:step]. `y` contains a copy of the data, not a view. 
""" slc = slice(start, stop, step) x2d = x.view(np.byte).reshape(-1, x.itemsize) y2d = x2d[:, slc].copy() y = y2d.view('S' + str(y2d.shape[-1])).ravel() return y if __name__ == "__main__": x = np.array(['CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf', 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf', 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf', 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf', 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf', 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf']) print "x:\n", x y = strslice(x, start=31, stop=52) print "y:\n", y ----- Output: ----- x: ['CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-01T00-37-48ZD.hdf' 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf' 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf' 'CAL_LID_L2_05kmCLay-Prov-V3-01.2008-01-31T23-56-35ZD.hdf'] y: ['2008-01-01T00-37-48ZD' '2008-01-01T00-37-48ZD' '2008-01-01T00-37-48ZD' '2008-01-31T23-56-35ZD' '2008-01-31T23-56-35ZD' '2008-01-31T23-56-35ZD'] ----- Warren * > > > * > *Best, > > Hongchun > * > > > On Tue, Aug 16, 2011 at 4:39 PM, Derek Homeier < > derek at astro.physik.uni-goettingen.de> wrote: > >> x.astype('|S3') > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From J.Lee at bom.gov.au Tue Aug 16 22:16:41 2011 From: J.Lee at bom.gov.au (Jin Lee) Date: Wed, 17 Aug 2011 12:16:41 +1000 Subject: [Numpy-discussion] f2py - undefined symbol: _intel_fast_memset [SEC=UNCLASSIFIED] In-Reply-To: <4E4A5850.3020901@cens.ioc.ee> Message-ID: <0E3686EB9FA8AA409AFA0A25468DCE43017138E5BD75@BOM-VMBX-HO.bom.gov.au> Hello Pearu, Thank you for your reply. It turned out that I was using Intel C/C++ compiler (icc) as my environment was set up for that compiler. I changed my compile environment to gcc and f2py worked. BTW for the '--fcompiler' switch both 'gnu95' and 'gfortran' seem to work fine. Many thanks for your prompt reply. Regards, Jin > -----Original Message----- > From: numpy-discussion-bounces at scipy.org > [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of > Pearu Peterson > Sent: Tuesday, 16 August 2011 21:45 > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] f2py - undefined symbol: > _intel_fast_memset [SEC=UNCLASSIFIED] > > > > On 08/16/2011 02:32 PM, Jin Lee wrote: > > Hello, > > > > This is my very first attempt at using f2py but I have come > across a problem. If anyone can assist me I would appreciate > it very much. > > > > I have a very simple test Fortran source, sub.f90 which is: > > > > subroutine sub1(x,y) > > implicit none > > > > integer, intent(in) :: x > > integer, intent(out) :: y > > > > ! start > > y = x > > > > end subroutine sub1 > > > > > > I then used f2py to produce an object file, sub.so: > > > > f2py -c -m sub sub.f90 --fcompiler='gfortran' > > > > After starting a Python interactive session I tried to > import the Fortran-derived Python module but I get an error message: > > > >>>> import sub > > Traceback (most recent call last): > > File "", line 1, in > > ImportError: ./sub.so: undefined symbol: _intel_fast_memset > > > > > > Can anyone suggest what this error message means and how I > can overcome it, please? 
> > Try > f2py -c -m sub sub.f90 --fcompiler=gnu95 > > HTH, > Pearu > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jdh2358 at gmail.com Wed Aug 17 09:01:53 2011 From: jdh2358 at gmail.com (John Hunter) Date: Wed, 17 Aug 2011 08:01:53 -0500 Subject: [Numpy-discussion] segfault on complex array on solaris x86 In-Reply-To: References: Message-ID: On Wed, Apr 13, 2011 at 8:50 AM, John Hunter wrote: > On Sat, Jan 15, 2011 at 7:28 AM, Ralf Gommers > wrote: >> I've opened http://projects.scipy.org/numpy/ticket/1713 so this doesn't get >> lost. > > Just wanted to bump this -- bug still exists in numpy HEAD 2.0.0.dev-fe3852f Just wanted to mention that this segfault still exists in 2.0.0.dev-4386275 and I updated the ticket at http://projects.scipy.org/numpy/ticket/1713 with a much simpler test script. Basically:: import numpy as np xn = np.exp(2j) is causing a segfault on my solaris platform From keith.hughitt at gmail.com Wed Aug 17 10:04:10 2011 From: keith.hughitt at gmail.com (Keith Hughitt) Date: Wed, 17 Aug 2011 10:04:10 -0400 Subject: [Numpy-discussion] Best way to construct/slice 3-dimensional ndarray from multiple 2d ndarrays? Message-ID: Hi all, I have a method which builds a single 3d ndarray from several equal-dimension 2d ndarrays, and another method which extracts the original 2d ndarrays back out from the 3d one. The way I'm doing this right now is pretty simple, e.g.: cube = np.asarray([arr1, arr2,...]) ... x = cube[0] I believe the way this is currently handled, is to use new memory locations first for the 3d array, and then later for the 2d slices. Does anyone know if there is a better way to handle this? Ideally, I would like to reuse the same memory locations instead of copying it anew each time. Also, when subclassing ndarray and calling obj = data.view(cls) for an ndarray "data", does this copy the data into the new object by value or reference? The method which extracts the 2d slice actually returns a subclass of ndarray created using the extracted data, so this is why I ask. Any insight or suggestions would be appreciated. Thanks! Keith -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Wed Aug 17 13:00:19 2011 From: shish at keba.be (Olivier Delalleau) Date: Wed, 17 Aug 2011 13:00:19 -0400 Subject: [Numpy-discussion] Best way to construct/slice 3-dimensional ndarray from multiple 2d ndarrays? In-Reply-To: References: Message-ID: Right now you allocate new memory only when creating your 3d array. When you do "x = cube[0]" this creates a view that does not allocate more memory. If your 2d arrays were created independently, I don't think you can avoid this. If you have some control on the way your original 2D arrays are created, you can first initialize the 3d array with correct shape (or an upper bound on the number of 2d arrays), then use views on this 3d array ("x_i = cube[i]") to fill your 2D arrays in the same memory space. I can't help with your second question, sorry. -=- Olivier 2011/8/17 Keith Hughitt > Hi all, > > I have a method which builds a single 3d ndarray from several > equal-dimension 2d ndarrays, and another method which extracts the original > 2d ndarrays back out from the 3d one. > > The way I'm doing this right now is pretty simple, e.g.: > > cube = np.asarray([arr1, arr2,...]) > ... 
> x = cube[0] > > I believe the way this is currently handled, is to use new memory locations > first for the 3d array, and then later for the 2d slices. > > Does anyone know if there is a better way to handle this? Ideally, I would > like to reuse the same memory locations instead of copying it anew each > time. > > Also, when subclassing ndarray and calling obj = data.view(cls) for an > ndarray "data", does this copy the data into the new object by value or > reference? The method which extracts the 2d slice actually returns a > subclass of ndarray created using the extracted data, so this is why I ask. > > Any insight or suggestions would be appreciated. > > Thanks! > Keith > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From keith.hughitt at gmail.com Wed Aug 17 13:43:55 2011 From: keith.hughitt at gmail.com (Keith Hughitt) Date: Wed, 17 Aug 2011 13:43:55 -0400 Subject: [Numpy-discussion] Best way to construct/slice 3-dimensional ndarray from multiple 2d ndarrays? In-Reply-To: References: Message-ID: The 2d arrays are read in using another library (PyFITS), so I probably won't be able to control that too much, otherwise that sounds like exactly what I need. I'm actually overriding the indexing operation so that the user gets back an ndarray subclass when they do "cube[0]": def __getitem__(self, key): """Overiding indexing operation""" if isinstance(key, int): data = np.ndarray.__getitem__(self, key) header = self._headers[key] for cls in BaseMap.__subclasses__(): if cls.is_datasource_for(header): return cls(data, header) raise UnrecognizedDataSouceError else: return np.ndarray.__getitem__(self, key) Which relates to the second part of the question I had about how the ndarray is handled when an instance of a ndarray subclass is created. Thanks for the suggestions! Keith On Wed, Aug 17, 2011 at 1:00 PM, Olivier Delalleau wrote: > Right now you allocate new memory only when creating your 3d array. When > you do "x = cube[0]" this creates a view that does not allocate more memory. > > If your 2d arrays were created independently, I don't think you can avoid > this. > If you have some control on the way your original 2D arrays are created, > you can first initialize the 3d array with correct shape (or an upper bound > on the number of 2d arrays), then use views on this 3d array ("x_i = > cube[i]") to fill your 2D arrays in the same memory space. > > I can't help with your second question, sorry. > > -=- Olivier > > 2011/8/17 Keith Hughitt > >> Hi all, >> >> I have a method which builds a single 3d ndarray from several >> equal-dimension 2d ndarrays, and another method which extracts the original >> 2d ndarrays back out from the 3d one. >> >> The way I'm doing this right now is pretty simple, e.g.: >> >> cube = np.asarray([arr1, arr2,...]) >> ... >> x = cube[0] >> >> I believe the way this is currently handled, is to use new memory >> locations first for the 3d array, and then later for the 2d slices. >> >> Does anyone know if there is a better way to handle this? Ideally, I would >> like to reuse the same memory locations instead of copying it anew each >> time. >> >> Also, when subclassing ndarray and calling obj = data.view(cls) for an >> ndarray "data", does this copy the data into the new object by value or >> reference? 
The method which extracts the 2d slice actually returns a >> subclass of ndarray created using the extracted data, so this is why I ask. >> >> Any insight or suggestions would be appreciated. >> >> Thanks! >> Keith >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aronne.merrelli at gmail.com Wed Aug 17 13:46:12 2011 From: aronne.merrelli at gmail.com (Aronne Merrelli) Date: Wed, 17 Aug 2011 12:46:12 -0500 Subject: [Numpy-discussion] Best way to construct/slice 3-dimensional ndarray from multiple 2d ndarrays? In-Reply-To: References: Message-ID: On Wed, Aug 17, 2011 at 9:04 AM, Keith Hughitt wrote: > > Also, when subclassing ndarray and calling obj = data.view(cls) for an > ndarray "data", does this copy the data into the new object by value or > reference? The method which extracts the 2d slice actually returns a > subclass of ndarray created using the extracted data, so this is why I ask. > > > I think it should pass a reference - the following code suggests the subclass is sharing the same fundamental array object. You can use the .base attribute of the ndarray object to see if it is a view back to another ndarray object: import numpy as np class TestClass(np.ndarray): def __new__(cls, inp_array): return inp_array.view(cls) In [2]: x = np.ones(5) In [3]: obj = TestClass(x) In [4]: id(x), id(obj), id(obj.base) Out[4]: (23517648, 19708080, 23517648) In [5]: print x, obj [ 1. 1. 1. 1. 1.] [ 1. 1. 1. 1. 1.] In [6]: x[2] = 2 In [7]: print x, obj [ 1. 1. 2. 1. 1.] [ 1. 1. 2. 1. 1.] If you change the TestClass.__new__() to: "return np.array(inp_array).view(cls)" then you will make a copy of the input array instead, if that is needed. In that case, it looks like the .base attribute is a new ndarray, copied from the input array. Aronne [PS - also note that .base is set to None, if the ndarray is not a view into another ndarray; it turns out that None has a valid object number, which confused me at first - see id(None).] -------------- next part -------------- An HTML attachment was scrubbed... URL: From keith.hughitt at gmail.com Wed Aug 17 14:11:33 2011 From: keith.hughitt at gmail.com (Keith Hughitt) Date: Wed, 17 Aug 2011 14:11:33 -0400 Subject: [Numpy-discussion] Best way to construct/slice 3-dimensional ndarray from multiple 2d ndarrays? In-Reply-To: References: Message-ID: Great! It looks like it is in fact working as desired: In [4]: cube.shape Out[4]: (5, 4096, 4096) In [5]: slice = cube[0] In [6]: cube[0,1000,1000] Out[6]: 618 In [7]: slice[1000,1000] Out[7]: 618 In [8]: slice[1000,1000] = 123 In [9]: cube[0, 1000,1000] Out[9]: 123 I didn't know about the .base attribute; that is really useful. Thank you both for the feedback. Keith On Wed, Aug 17, 2011 at 1:46 PM, Aronne Merrelli wrote: > > > On Wed, Aug 17, 2011 at 9:04 AM, Keith Hughitt wrote: > >> >> Also, when subclassing ndarray and calling obj = data.view(cls) for an >> ndarray "data", does this copy the data into the new object by value or >> reference? The method which extracts the 2d slice actually returns a >> subclass of ndarray created using the extracted data, so this is why I ask. 
>> >> >> > I think it should pass a reference - the following code suggests the > subclass is sharing the same fundamental array object. You can use the .base > attribute of the ndarray object to see if it is a view back to another > ndarray object: > > import numpy as np > class TestClass(np.ndarray): > def __new__(cls, inp_array): > return inp_array.view(cls) > > In [2]: x = np.ones(5) > In [3]: obj = TestClass(x) > In [4]: id(x), id(obj), id(obj.base) > Out[4]: (23517648, 19708080, 23517648) > In [5]: print x, obj > [ 1. 1. 1. 1. 1.] [ 1. 1. 1. 1. 1.] > In [6]: x[2] = 2 > In [7]: print x, obj > [ 1. 1. 2. 1. 1.] [ 1. 1. 2. 1. 1.] > > > If you change the TestClass.__new__() to: "return > np.array(inp_array).view(cls)" then you will make a copy of the input array > instead, if that is needed. In that case, it looks like the .base attribute > is a new ndarray, copied from the input array. > > > Aronne > > [PS - also note that .base is set to None, if the ndarray is not a view > into another ndarray; it turns out that None has a valid object number, > which confused me at first - see id(None).] > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Aug 17 14:54:06 2011 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 17 Aug 2011 13:54:06 -0500 Subject: [Numpy-discussion] bug with assignment into an indexed array? In-Reply-To: References: Message-ID: On Sat, Aug 13, 2011 at 7:17 PM, Mark Wiebe wrote: > On Thu, Aug 11, 2011 at 1:37 PM, Benjamin Root wrote: > >> On Thu, Aug 11, 2011 at 10:33 AM, Olivier Delalleau wrote: >> >>> 2011/8/11 Benjamin Root >>> >>>> >>>> >>>> On Thu, Aug 11, 2011 at 8:37 AM, Olivier Delalleau wrote: >>>> >>>>> Maybe confusing, but working as expected. >>>>> >>>>> >>>>> When you write: >>>>> matched_to[np.array([0, 1, 2])] = 3 >>>>> it calls __setitem__ on matched_to, with arguments (np.array([0, 1, >>>>> 2]), 3). So numpy understand you want to write 3 at these indices. >>>>> >>>>> >>>>> When you write: >>>>> matched_to[:3][match] = 3 >>>>> it first calls __getitem__ with the slice as argument, which returns a >>>>> view of your array, then it calls __setitem__ on this view, and it fills >>>>> your matched_to array at the same time. >>>>> >>>>> >>>>> But when you write: >>>>> matched_to[np.array([0, 1, 2])][match] = 3 >>>>> it first calls __getitem__ with the array as argument, which retunrs a >>>>> *copy* of your array, so that calling __setitem__ on this copy has no effect >>>>> on your original array. >>>>> >>>>> -=- Olivier >>>>> >>>>> >>>> Right, but I guess my question is does it *have* to be that way? I >>>> guess it makes some sense with respect to indexing with a numpy array like I >>>> did with the last example, because an element could be referred to multiple >>>> times (which explains the common surprise with '+='), but with boolean >>>> indexing, we are guaranteed that each element of the view will appear at >>>> most once. Therefore, shouldn't boolean indexing always return a view, not >>>> a copy? Is the general case of arbitrary array selection inherently >>>> impossible to encode in a view versus a slice with a regular spacing? >>>> >>> >>> Yes, due to the fact the array interface only supports regular spacing >>> (otherwise it is more difficult to get efficient access to arbitrary array >>> positions). 
>>> >>> -=- Olivier >>> >>> >> This still bothers me, though. I imagine that it is next to impossible to >> detect this situation from numpy's perspective, so it can't even emit a >> warning or error. Furthermore, for someone who makes a general function to >> modify the contents of some externally provided array, there is a >> possibility that the provided array is actually a copy not a view. >> Although, I guess it is the responsibility of the user to know the >> difference. >> >> I guess that is the key problem. The key advantage we are taught about >> numpy arrays is the use of views for efficient access. It would seem that >> most access operations would use it, but in reality, only sliced access do. >> Everything else is a copy (unless you are doing fancy indexing with >> assignment). Maybe with some of the forthcoming changes that have been done >> with respect to nditer and ufuncs (in particular, I am thinking of the >> "where" kwarg), maybe we could consider an enhancement allowing fancy >> indexing (or at least boolean indexing) to produce a view? Even if it is >> less efficient than a view from slicing, it would bring better consistency >> in behavior between the different forms of indexing. >> >> Just my 2 cents, >> Ben Root >> > > I think it would be nice to evolve the NumPy indexing and array > representation towards the goal of indexing returning a view in all cases > with no exceptions. This would provide a much nicer mental model to program > with. Accomplishing such a transition will take a fair bit of time, though. > > -Mark > > Mark, It is good to know that there is a chance to make this possible, eventually. However, I just thought of a possible barrier that might have to be overcome before achieving this. Because it has always been very clear that non-slicing produces copies, I can easily imagine situations where developers have come to depend on this copying behavior. While I think most copies are unintended (but unnoticed because it was read-only), it is quite possible that there are situations where this copy behavior is entirely intended. Therefore, changing this behavior may break code in subtle ways. I am not saying that it shouldn't be done (clarity and simplicity should be paramount), but one should tread carefully here. My 2 cents, Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Aug 17 15:12:28 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 17 Aug 2011 12:12:28 -0700 Subject: [Numpy-discussion] bug with assignment into an indexed array? In-Reply-To: References: Message-ID: On Wed, Aug 17, 2011 at 11:54 AM, Benjamin Root wrote: > On Sat, Aug 13, 2011 at 7:17 PM, Mark Wiebe wrote: > >> On Thu, Aug 11, 2011 at 1:37 PM, Benjamin Root wrote: >> >>> On Thu, Aug 11, 2011 at 10:33 AM, Olivier Delalleau wrote: >>> >>>> 2011/8/11 Benjamin Root >>>> >>>>> >>>>> >>>>> On Thu, Aug 11, 2011 at 8:37 AM, Olivier Delalleau wrote: >>>>> >>>>>> Maybe confusing, but working as expected. >>>>>> >>>>>> >>>>>> When you write: >>>>>> matched_to[np.array([0, 1, 2])] = 3 >>>>>> it calls __setitem__ on matched_to, with arguments (np.array([0, 1, >>>>>> 2]), 3). So numpy understand you want to write 3 at these indices. >>>>>> >>>>>> >>>>>> When you write: >>>>>> matched_to[:3][match] = 3 >>>>>> it first calls __getitem__ with the slice as argument, which returns a >>>>>> view of your array, then it calls __setitem__ on this view, and it fills >>>>>> your matched_to array at the same time. 
>>>>>> >>>>>> >>>>>> But when you write: >>>>>> matched_to[np.array([0, 1, 2])][match] = 3 >>>>>> it first calls __getitem__ with the array as argument, which retunrs a >>>>>> *copy* of your array, so that calling __setitem__ on this copy has no effect >>>>>> on your original array. >>>>>> >>>>>> -=- Olivier >>>>>> >>>>>> >>>>> Right, but I guess my question is does it *have* to be that way? I >>>>> guess it makes some sense with respect to indexing with a numpy array like I >>>>> did with the last example, because an element could be referred to multiple >>>>> times (which explains the common surprise with '+='), but with boolean >>>>> indexing, we are guaranteed that each element of the view will appear at >>>>> most once. Therefore, shouldn't boolean indexing always return a view, not >>>>> a copy? Is the general case of arbitrary array selection inherently >>>>> impossible to encode in a view versus a slice with a regular spacing? >>>>> >>>> >>>> Yes, due to the fact the array interface only supports regular spacing >>>> (otherwise it is more difficult to get efficient access to arbitrary array >>>> positions). >>>> >>>> -=- Olivier >>>> >>>> >>> This still bothers me, though. I imagine that it is next to impossible >>> to detect this situation from numpy's perspective, so it can't even emit a >>> warning or error. Furthermore, for someone who makes a general function to >>> modify the contents of some externally provided array, there is a >>> possibility that the provided array is actually a copy not a view. >>> Although, I guess it is the responsibility of the user to know the >>> difference. >>> >>> I guess that is the key problem. The key advantage we are taught about >>> numpy arrays is the use of views for efficient access. It would seem that >>> most access operations would use it, but in reality, only sliced access do. >>> Everything else is a copy (unless you are doing fancy indexing with >>> assignment). Maybe with some of the forthcoming changes that have been done >>> with respect to nditer and ufuncs (in particular, I am thinking of the >>> "where" kwarg), maybe we could consider an enhancement allowing fancy >>> indexing (or at least boolean indexing) to produce a view? Even if it is >>> less efficient than a view from slicing, it would bring better consistency >>> in behavior between the different forms of indexing. >>> >>> Just my 2 cents, >>> Ben Root >>> >> >> I think it would be nice to evolve the NumPy indexing and array >> representation towards the goal of indexing returning a view in all cases >> with no exceptions. This would provide a much nicer mental model to program >> with. Accomplishing such a transition will take a fair bit of time, though. >> >> -Mark >> >> > > Mark, > > It is good to know that there is a chance to make this possible, > eventually. However, I just thought of a possible barrier that might have > to be overcome before achieving this. Because it has always been very clear > that non-slicing produces copies, I can easily imagine situations where > developers have come to depend on this copying behavior. While I think most > copies are unintended (but unnoticed because it was read-only), it is quite > possible that there are situations where this copy behavior is entirely > intended. Therefore, changing this behavior may break code in subtle ways. > > I am not saying that it shouldn't be done (clarity and simplicity should be > paramount), but one should tread carefully here. > Absolutely. 
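To make the behaviour in question concrete, a short illustration (variable names are made up):

import numpy as np

a = np.arange(6)
view = a[1:4]            # basic slice -> view
copy = a[[1, 2, 3]]      # integer fancy indexing -> copy
mask = a > 3
bcopy = a[mask]          # boolean indexing -> also a copy

view[0] = 100            # visible in a
copy[0] = -1             # not visible in a
bcopy[:] = -1            # not visible in a
print(a)                 # [  0 100   2   3   4   5]

# Writing *through* the index is a single __setitem__, so it does modify a:
a[mask] = 0
print(a)                 # [  0 100   2   3   0   0]
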
It would necessarily be very long term and the specifics of how it could be done are nontrivial, but I figured it was worth mentioning the idea. -Mark > > My 2 cents, > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From keith.hughitt at gmail.com Wed Aug 17 15:25:33 2011 From: keith.hughitt at gmail.com (Keith Hughitt) Date: Wed, 17 Aug 2011 15:25:33 -0400 Subject: [Numpy-discussion] Overriding numpy.ndarray.__getitem__? Message-ID: Hi all, I have a subclass of ndarray which is built using using a stack of images. Rather than store the image header information separately, I overrode __getitem__ so that when the user indexes into the image cube a single image a different object type (which includes the header information) is returned : class ImageCube(np.ndarray): . . . def __getitem__(self, key): """Overiding indexing operation""" if isinstance(key, int): data = np.ndarray.__getitem__(self, key) header = self._headers[key] return SingleImage(data, header) else: return np.ndarray.__getitem__(self, key) Everything seems to work well, however, now when I try to combine that with indexing into the other dimensions of a single image, errors relating to numpy's array printing arise, e.g.: >>> print imagecube[0,0:256,0:256] . . . /usr/lib/pymodules/python2.7/numpy/core/arrayprint.pyc in _formatArray(a, format_function, rank, max_line_len, next_line_prefix, separator, edge_items, summary_insert) 371 if leading_items or i != trailing_items: 372 s += next_line_prefix --> 373 s += _formatArray(a[-i], format_function, rank-1, max_line_len, 374 " " + next_line_prefix, separator, edge_items, 375 summary_insert) I think the problem has to do with how I am overriding __getitem__: I check to see if the input is a single integer, and if it is, I return the new object instance. This should only occur when something like "imagecube[n]" is called, however, array2str ends up calling imagecube[x], even if the original thing you are trying to print is something like imagecube[0,1:256,1:256]. Any ideas? I apologize if the explanation is not very clear; I'm still trying to figure out exactly what is going on. Thanks, Keith -------------- next part -------------- An HTML attachment was scrubbed... URL: From keith.hughitt at gmail.com Wed Aug 17 16:22:30 2011 From: keith.hughitt at gmail.com (Keith Hughitt) Date: Wed, 17 Aug 2011 16:22:30 -0400 Subject: [Numpy-discussion] Overriding numpy.ndarray.__getitem__? In-Reply-To: References: Message-ID: Okay, I found something that seems to do the trick for this particular problem. Instead of just checking whether the input to __getitem__ is an int, I also check the number of dimensions to make sure we are indexing within the full cube, and not some sub-index of the cube: if self.ndim is 3 and isinstance(key, int): ... I think what was happening is that when repr() is called on the map, it recursively walks through displaying one dimension at a time, and this is what was causing my code to choke; the instantiation of a subclass only makes sense for one of the three dimensions. Keith On Wed, Aug 17, 2011 at 3:25 PM, Keith Hughitt wrote: > Hi all, > > I have a subclass of ndarray which is built using using a stack of images. 
> Rather than store the image header information separately, I overrode > __getitem__ so that when the user indexes into the image cube a single image > a different object type (which includes the header information) is returned > : > > class ImageCube(np.ndarray): > . > . > . > def __getitem__(self, key): > """Overiding indexing operation""" > if isinstance(key, int): > data = np.ndarray.__getitem__(self, key) > header = self._headers[key] > return SingleImage(data, header) > else: > return np.ndarray.__getitem__(self, key) > > > Everything seems to work well, however, now when I try to combine that with > indexing into the other dimensions of a single image, errors relating to > numpy's array printing arise, e.g.: > > > >>> print imagecube[0,0:256,0:256] > . > . > . > > /usr/lib/pymodules/python2.7/numpy/core/arrayprint.pyc in _formatArray(a, > format_function, rank, max_line_len, next_line_prefix, separator, > edge_items, summary_insert) > 371 if leading_items or i != trailing_items: > 372 s += next_line_prefix > --> 373 s += _formatArray(a[-i], format_function, rank-1, > max_line_len, > 374 " " + next_line_prefix, separator, > edge_items, > 375 summary_insert) > > > I think the problem has to do with how I am overriding __getitem__: I check > to see if the input is a single integer, and if it is, I return the new > object instance. This should only occur when something like "imagecube[n]" > is called, however, array2str ends up calling imagecube[x], even if the > original thing you are trying to print is something like > imagecube[0,1:256,1:256]. > > Any ideas? I apologize if the explanation is not very clear; I'm still > trying to figure out exactly what is going on. > > Thanks, > Keith > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris at simplistix.co.uk Thu Aug 18 10:19:06 2011 From: chris at simplistix.co.uk (Chris Withers) Date: Thu, 18 Aug 2011 07:19:06 -0700 Subject: [Numpy-discussion] summing an array Message-ID: <4E4D1F5A.2000205@simplistix.co.uk> Hi All, Hopefully a simple newbie question, if I have an array such as : array([0, 1, 2, 3, 4]) ...what's the best way to cummulatively sum it so that I end up with: array([0, 1, 3, 6, 10]) How would I do this both in-place and to create a new array? cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From jsseabold at gmail.com Thu Aug 18 10:22:51 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 18 Aug 2011 10:22:51 -0400 Subject: [Numpy-discussion] summing an array In-Reply-To: <4E4D1F5A.2000205@simplistix.co.uk> References: <4E4D1F5A.2000205@simplistix.co.uk> Message-ID: On Thu, Aug 18, 2011 at 10:19 AM, Chris Withers wrote: > Hi All, > > Hopefully a simple newbie question, if I have an array such as : > > array([0, 1, 2, 3, 4]) > > ...what's the best way to cummulatively sum it so that I end up with: > > array([0, 1, 3, 6, 10]) > > How would I do this both in-place and to create a new array? 
> [~/] [1]: a = np.arange(5) [~/] [2]: a [2]: array([0, 1, 2, 3, 4]) [~/] [3]: np.cumsum(a) [3]: array([ 0, 1, 3, 6, 10]) [~/] [4]: np.cumsum(a,out=a) [4]: array([ 0, 1, 3, 6, 10]) [~/] [5]: a [5]: array([ 0, 1, 3, 6, 10]) Skipper From rjd4+numpy at cam.ac.uk Thu Aug 18 10:58:25 2011 From: rjd4+numpy at cam.ac.uk (Bob Dowling) Date: Thu, 18 Aug 2011 15:58:25 +0100 Subject: [Numpy-discussion] summing an array In-Reply-To: <4E4D1F5A.2000205@simplistix.co.uk> References: <4E4D1F5A.2000205@simplistix.co.uk> Message-ID: <4E4D2891.4@cam.ac.uk> On 18/08/11 15:19, Chris Withers wrote: > Hopefully a simple newbie question, if I have an array such as : > > array([0, 1, 2, 3, 4]) > > ...what's the best way to cummulatively sum it so that I end up with: > > array([0, 1, 3, 6, 10]) > > How would I do this both in-place and to create a new array? >>> a = numpy.arange(0,5) >>> a array([0, 1, 2, 3, 4]) >>> numpy.add.accumulate(a) array([ 0, 1, 3, 6, 10]) >>> numpy.add.accumulate(a, out=a) array([ 0, 1, 3, 6, 10]) >>> a array([ 0, 1, 3, 6, 10]) >>> And similarly with numpy.multiply for products etc. From mwwiebe at gmail.com Thu Aug 18 17:43:17 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 18 Aug 2011 14:43:17 -0700 Subject: [Numpy-discussion] NA masks for NumPy are ready to test Message-ID: It's taken a lot of changes to get the NA mask support to its current point, but the code ready for some testing now. You can read the work-in-progress release notes here: https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst To try it out, check out the missingdata branch from my github account, here, and build in the standard way: https://github.com/m-paradox/numpy The things most important to test are: * Confirm that existing code still works correctly. I've tested against SciPy and matplotlib. * Confirm that the performance of code not using NA masks is the same or better. * Try to do computations with the NA values, find places they don't work yet, and nominate unimplemented functionality important to you to be next on the development list. The release notes have a preliminary list of implemented/unimplemented functions. * Report any crashes, build problems, or unexpected behaviors. In addition to adding the NA mask, I've also added features and done a few performance changes here and there, like letting reductions like sum take lists of axes instead of being a single axis or all of them. These changes affect various bugs like http://projects.scipy.org/numpy/ticket/1143 and http://projects.scipy.org/numpy/ticket/533. Thanks! 
Mark Here's a small example run using NAs: >>> import numpy as np >>> np.__version__ '2.0.0.dev-8a5e2a1' >>> a = np.random.rand(3,3,3) >>> a.flags.maskna = True >>> a[np.random.rand(3,3,3) < 0.5] = np.NA >>> a array([[[NA, NA, 0.11511708], [ 0.46661454, 0.47565512, NA], [NA, NA, NA]], [[NA, 0.57860351, NA], [NA, NA, 0.72012669], [ 0.36582123, NA, 0.76289794]], [[ 0.65322748, 0.92794386, NA], [ 0.53745165, 0.97520989, 0.17515083], [ 0.71219688, 0.5184328 , 0.75802805]]]) >>> np.mean(a, axis=-1) array([[NA, NA, NA], [NA, NA, NA], [NA, 0.56260412, 0.66288591]]) >>> np.std(a, axis=-1) array([[NA, NA, NA], [NA, NA, NA], [NA, 0.32710662, 0.10384331]]) >>> np.mean(a, axis=-1, skipna=True) /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474: RuntimeWarning: invalid value encountered in true_divide um.true_divide(ret, rcount, out=ret, casting='unsafe') array([[ 0.11511708, 0.47113483, nan], [ 0.57860351, 0.72012669, 0.56435958], [ 0.79058567, 0.56260412, 0.66288591]]) >>> np.std(a, axis=-1, skipna=True) /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707: RuntimeWarning: invalid value encountered in true_divide um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe') /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730: RuntimeWarning: invalid value encountered in true_divide um.true_divide(ret, rcount, out=ret, casting='unsafe') array([[ 0. , 0.00452029, nan], [ 0. , 0. , 0.19853835], [ 0.13735819, 0.32710662, 0.10384331]]) >>> np.std(a, axis=(1,2), skipna=True) array([ 0.16786895, 0.15498008, 0.23811937]) -------------- next part -------------- An HTML attachment was scrubbed... URL: From rblove_lists at comcast.net Thu Aug 18 22:24:01 2011 From: rblove_lists at comcast.net (Robert Love) Date: Thu, 18 Aug 2011 21:24:01 -0500 Subject: [Numpy-discussion] dtype and shape for 1.6.1 seems broken? Message-ID: <506D2909-E407-4BE5-9F82-48D2E5D88E9D@comcast.net> This works under 1.5.1 and 1.6.0 but gives me errors in 1.6.1 import numpy as np def main(): print"numpy version: "+ np.__version__ zdt = np.dtype([('et','i4'),('r','f8',3)]) zdata = np.loadtxt('zdum.txt', zdt) In 1.6.1 I get this error: ValueError: setting an array element with a sequence. Is this a known problem? -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Thu Aug 18 23:44:50 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 18 Aug 2011 20:44:50 -0700 Subject: [Numpy-discussion] dtype and shape for 1.6.1 seems broken? In-Reply-To: <506D2909-E407-4BE5-9F82-48D2E5D88E9D@comcast.net> References: <506D2909-E407-4BE5-9F82-48D2E5D88E9D@comcast.net> Message-ID: This could be related to ticket #1936: http://projects.scipy.org/numpy/ticket/1936 for which there's a pull request against master here: https://github.com/numpy/numpy/pull/140 -Mark On Thu, Aug 18, 2011 at 7:24 PM, Robert Love wrote: > > This works under 1.5.1 and 1.6.0 but gives me errors in 1.6.1 > > import numpy as np > > def main(): > > print"numpy version: "+ np.__version__ > > zdt = np.dtype([('et','i4'),('r','f8',3)]) > > zdata = np.loadtxt('zdum.txt', zdt) > > In 1.6.1 I get this error: > > ValueError: setting an array element with a sequence. Is this a known > problem? 
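For reference, a self-contained sketch of the same call (the layout of zdum.txt is an assumption here: one et integer followed by the three r floats per record; StringIO stands in for the real file). This is the usage reported to work on 1.5.1/1.6.0 and to fail on 1.6.1:

import numpy as np
from StringIO import StringIO  # Python 2, as used in the report

zdt = np.dtype([('et', 'i4'), ('r', 'f8', 3)])

# Assumed file layout: four whitespace-separated columns per line.
text = StringIO("1 0.1 0.2 0.3\n2 1.1 1.2 1.3\n")

zdata = np.loadtxt(text, dtype=zdt)
# zdata['et'] -> array([1, 2], dtype=int32)
# zdata['r']  -> array([[ 0.1,  0.2,  0.3],
#                       [ 1.1,  1.2,  1.3]])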
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgohlke at uci.edu Thu Aug 18 23:45:09 2011 From: cgohlke at uci.edu (Christoph Gohlke) Date: Thu, 18 Aug 2011 20:45:09 -0700 Subject: [Numpy-discussion] dtype and shape for 1.6.1 seems broken? In-Reply-To: <506D2909-E407-4BE5-9F82-48D2E5D88E9D@comcast.net> References: <506D2909-E407-4BE5-9F82-48D2E5D88E9D@comcast.net> Message-ID: <4E4DDC45.3010303@uci.edu> On 8/18/2011 7:24 PM, Robert Love wrote: > > This works under 1.5.1 and 1.6.0 but gives me errors in 1.6.1 > > import numpy as np > > def main(): > > print"numpy version: "+ np.__version__ > > zdt = np.dtype([('et','i4'),('r','f8',3)]) > > zdata = np.loadtxt('zdum.txt', zdt) > > In 1.6.1 I get this error: > > ValueError: setting an array element with a sequence. Is this a known > problem? > This looks like The ValueError is raised in "numpy\lib\npyio.py", line 804, in loadtxt. Npyio.py is identical for numpy 1.6.0 and 1.6.1. This is an actual function call from line 804, which works in numpy 1.6.0 but fails with 1.6.1: >>> np.array([(0, ((0., 0., 0.),))], dtype=[('et', ' References: <506D2909-E407-4BE5-9F82-48D2E5D88E9D@comcast.net> Message-ID: <28A70610-5F10-4D9C-8F6B-FFE17C4F5A1C@iro.umontreal.ca> On 2011-08-18, at 10:24 PM, Robert Love wrote: > In 1.6.1 I get this error: > > ValueError: setting an array element with a sequence. Is this a known problem? You'll have to post a traceback if we're to figure out what the problem is. A few lines of zdum.txt would also be nice. Suffice it to say the dtype line runs fine in 1.6.1, so the problem is either in loadtxt or the data it's being asked to process. From mwwiebe at gmail.com Fri Aug 19 00:01:42 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 18 Aug 2011 21:01:42 -0700 Subject: [Numpy-discussion] longlong format error with Python <= 2.6 in scalartypes.c In-Reply-To: <8D5A8864-6827-4164-B8F6-198000B7491D@astro.physik.uni-goettingen.de> References: <8D5A8864-6827-4164-B8F6-198000B7491D@astro.physik.uni-goettingen.de> Message-ID: On Thu, Aug 4, 2011 at 4:08 PM, Derek Homeier < derek at astro.physik.uni-goettingen.de> wrote: > Hi, > > commits c15a807e and c135371e (thus most immediately addressed to Mark, but > I am sending this to the list hoping for more insight on the issue) > introduce a test failure with Python 2.5+2.6 on Mac: > > FAIL: test_timedelta_scalar_construction (test_datetime.TestDateTime) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/Users/derek/lib/python2.6/site-packages/numpy/core/tests/test_datetime.py", > line 219, in test_timedelta_scalar_construction > assert_equal(str(np.timedelta64(3, 's')), '3 seconds') > File "/Users/derek/lib/python2.6/site-packages/numpy/testing/utils.py", > line 313, in assert_equal > raise AssertionError(msg) > AssertionError: > Items are not equal: > ACTUAL: '%lld seconds' > DESIRED: '3 seconds' > > due to the "lld" format passed to PyUString_FromFormat in scalartypes.c. > In the current npy_common.h I found the comment > * in Python 2.6 the %lld formatter is not supported. In this > * case we work around the problem by using the %zd formatter. 
> though I did not notice that problem when I cleaned up the NPY_LONGLONG_FMT > definitions in that file (and it is not entirely clear whether the comment > only pertains to Windows...). Anyway changing the formatters in > scalartypes.c to "zd" as well removes the failure and still works with > Python 2.7 and 3.2 (at least on Mac OS). However I am wondering if > a) NPY_[U]LONGLONG_FMT should also be defined conditional to the Python > version (and if "%zu" is a valid formatter), and > b) scalartypes.c should use NPY_LONGLONG_FMT from npy_common.h > > I am attaching a patch implementing a), but only the quick and dirty > solution to b). > I've touched this stuff as little as possible, because I rather dislike the way the *_FMT macros are set up right now. I added a comment about NPY_INTP_FMT in npy_common.h which I see you read. If you're going to try to fix this, I hope you fix it deeper than this patch so it's not error-prone anymore. NPY_INTP_FMT is used together with PyErr_Format/PyString_FromFormat, whereas the other *_FMT are used with the *printf functions from the C libraries. These are not compatible, and the %zd hack was put in place because it exists even in Python 2.4, and Py_ssize_t seems matches the pointer size in all CPython versions. Switching the timedelta64 format in scalartypes.c.src to "%zd" won't help on 32-bit platforms, because it won't be a 64-bit type there, unlike how it works ok for the NPY_INTP_FMT. In summary: * There need to be changes to create a clear distinction between the *_FMT for PyString_FromFormat vs the *_FMT for C library *printf functions * I suspect we're out of luck for 32-bit older versions of CPython with PyString_FromFormat Cheers, -Mark > > Cheers, > Derek > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Aug 19 00:32:44 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 18 Aug 2011 22:32:44 -0600 Subject: [Numpy-discussion] dtype and shape for 1.6.1 seems broken? In-Reply-To: <4E4DDC45.3010303@uci.edu> References: <506D2909-E407-4BE5-9F82-48D2E5D88E9D@comcast.net> <4E4DDC45.3010303@uci.edu> Message-ID: On Thu, Aug 18, 2011 at 9:45 PM, Christoph Gohlke wrote: > > > On 8/18/2011 7:24 PM, Robert Love wrote: > > > > This works under 1.5.1 and 1.6.0 but gives me errors in 1.6.1 > > > > import numpy as np > > > > def main(): > > > > print"numpy version: "+ np.__version__ > > > > zdt = np.dtype([('et','i4'),('r','f8',3)]) > > > > zdata = np.loadtxt('zdum.txt', zdt) > > > > In 1.6.1 I get this error: > > > > ValueError: setting an array element with a sequence. Is this a known > > problem? > > > > This looks like > > The ValueError is raised in "numpy\lib\npyio.py", line 804, in loadtxt. > > Npyio.py is identical for numpy 1.6.0 and 1.6.1. 
> > This is an actual function call from line 804, which works in numpy > 1.6.0 but fails with 1.6.1: > > >>> np.array([(0, ((0., 0., 0.),))], dtype=[('et', ' (3,))]) > > Looks malformed, shouldn't that be In [16]: np.array((0, (0., 0., 0.)), dtype=[('et', ' From ralf.gommers at googlemail.com Fri Aug 19 06:48:29 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 19 Aug 2011 12:48:29 +0200 Subject: [Numpy-discussion] [SciPy-User] disabling SVN (was: Trouble installing scipy after upgrading to Mac OS X 10.7 aka Lion) In-Reply-To: References: Message-ID: On Tue, Aug 16, 2011 at 3:01 PM, Pauli Virtanen wrote: > Sat, 13 Aug 2011 22:00:33 -0400, josef.pktd wrote: > [clip] > > Does Trac require svn access to dig out old information? for example > > links to old changesets, annotate/blame, ... ? > > It does not require HTTP access to SVN, as it looks directly at the > SVN repo on the local disk. > > It also probably doesn't use the old SVN repo for anything in reality, > as there's a simple Git plugin installed that just grabs the Git history > to the timeline, and redirects source browsing etc to Github. > However, I don't know whether the timeline views etc continue to > function even without the local SVN repo, so I'd just disable the HTTP > access and leave the local repo as it is as a backup. > > Hi Ognen, Could you please disable http access to numpy and scipy svn? Thanks a lot, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From dirk.ullrich at googlemail.com Fri Aug 19 07:26:16 2011 From: dirk.ullrich at googlemail.com (Dirk Ullrich) Date: Fri, 19 Aug 2011 13:26:16 +0200 Subject: [Numpy-discussion] Build of current Git HEAD for NumPy fails Message-ID: Hi, when trying to build current Git HAED of NumPy with - both for $PYTHON=python2 or $PYTHON=python3: $PYTHON setup.py config_fc --fcompiler=gnu95 install --prefix=$WHATEVER I get the following error - here for PYTHON=python3.2 running build_clib customize UnixCCompiler customize UnixCCompiler using build_clib building 'npymath' library Traceback (most recent call last): File "setup.py", line 214, in setup_package() File "setup.py", line 207, in setup_package configuration=configuration ) File "/common/packages/build/makepkg-du/python-numpy-git/src/numpy-build/build/py3k/numpy/distutils/core.py", line 186, in setup return old_setup(**new_attr) File "/usr/lib/python3.2/distutils/core.py", line 150, in setup dist.run_commands() File "/usr/lib/python3.2/distutils/dist.py", line 919, in run_commands self.run_command(cmd) File "/usr/lib/python3.2/distutils/dist.py", line 938, in run_command cmd_obj.run() File "/common/packages/build/makepkg-du/python-numpy-git/src/numpy-build/build/py3k/numpy/distutils/command/build.py", line 37, in run old_build.run(self) File "/usr/lib/python3.2/distutils/command/build.py", line 128, in run self.run_command(cmd_name) File "/usr/lib/python3.2/distutils/cmd.py", line 315, in run_command self.distribution.run_command(command) File "/usr/lib/python3.2/distutils/dist.py", line 938, in run_command cmd_obj.run() File "/common/packages/build/makepkg-du/python-numpy-git/src/numpy-build/build/py3k/numpy/distutils/command/build_clib.py", line 100, in run self.build_libraries(self.libraries) File "/common/packages/build/makepkg-du/python-numpy-git/src/numpy-build/build/py3k/numpy/distutils/command/build_clib.py", line 119, in build_libraries self.build_a_library(build_info, lib_name, libraries) File 
"/common/packages/build/makepkg-du/python-numpy-git/src/numpy-build/build/py3k/numpy/distutils/command/build_clib.py", line 179, in build_a_library fcompiler.extra_f77_compile_args = build_info.get('extra_f77_compile_args') or [] AttributeError: 'str' object has no attribute 'extra_f77_compile_args' It seems that `fcompiler's value in line 179 of `numpy/distutils/command/build_clib.py' is not properly initialized as an appropriate `fcompiler' object. Dirk From pearu.peterson at gmail.com Fri Aug 19 07:59:58 2011 From: pearu.peterson at gmail.com (Pearu Peterson) Date: Fri, 19 Aug 2011 14:59:58 +0300 Subject: [Numpy-discussion] Build of current Git HEAD for NumPy fails In-Reply-To: References: Message-ID: <4E4E503E.4060408@cens.ioc.ee> On 08/19/2011 02:26 PM, Dirk Ullrich wrote: > Hi, > > when trying to build current Git HAED of NumPy with - both for > $PYTHON=python2 or $PYTHON=python3: > > $PYTHON setup.py config_fc --fcompiler=gnu95 install --prefix=$WHATEVER > > I get the following error - here for PYTHON=python3.2 The command works fine here with Numpy HEAD and Python 2.7. Btw, why do you specify --fcompiler=gnu95 for numpy? Numpy has no Fortran sources. So, fortran compiler is not needed for building Numpy (unless you use Fortran libraries for numpy.linalg). > running build_clib ... > File "/common/packages/build/makepkg-du/python-numpy-git/src/numpy-build/build/py3k/numpy/distutils/command/build_clib.py", > line 179, in build_a_library > fcompiler.extra_f77_compile_args = > build_info.get('extra_f77_compile_args') or [] > AttributeError: 'str' object has no attribute 'extra_f77_compile_args' Reading the code, I don't see how this can happen. Very strange. Anyway, I cleaned up build_clib to follow similar coding convention as in build_ext. Could you try numpy head again? Regards, Pearu From pav at iki.fi Fri Aug 19 08:48:01 2011 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 19 Aug 2011 12:48:01 +0000 (UTC) Subject: [Numpy-discussion] [SciPy-User] disabling SVN (was: Trouble installing scipy after upgrading to Mac OS X 10.7 aka Lion) References: Message-ID: Fri, 19 Aug 2011 12:48:29 +0200, Ralf Gommers wrote: [clip] > Hi Ognen, > > Could you please disable http access to numpy and scipy svn? Turns out also I had enough permissions to disable this. Now: $ svn co http://svn.scipy.org/svn/numpy/trunk numpy svn: Repository moved permanently to 'http://github.com/numpy/numpy/'; please relocate From dirk.ullrich at googlemail.com Fri Aug 19 08:50:18 2011 From: dirk.ullrich at googlemail.com (Dirk Ullrich) Date: Fri, 19 Aug 2011 14:50:18 +0200 Subject: [Numpy-discussion] Build of current Git HEAD for NumPy fails In-Reply-To: <4E4E503E.4060408@cens.ioc.ee> References: <4E4E503E.4060408@cens.ioc.ee> Message-ID: Hi Paeru, 2011/8/19 Pearu Peterson : > > > On 08/19/2011 02:26 PM, Dirk Ullrich wrote: >> Hi, >> >> when trying to build current Git HAED of NumPy with - both for >> $PYTHON=python2 or $PYTHON=python3: >> >> $PYTHON setup.py config_fc --fcompiler=gnu95 install --prefix=$WHATEVER >> >> I get the following error - here for PYTHON=python3.2 > > The command works fine here with Numpy HEAD and Python 2.7. > Btw, why do you specify --fcompiler=gnu95 for numpy? Numpy > has no Fortran sources. So, fortran compiler is not needed > for building Numpy (unless you use Fortran libraries > for numpy.linalg). > I do use Lapack. Sorry for not mentioning it. >> running build_clib > ... >> ? 
?File "/common/packages/build/makepkg-du/python-numpy-git/src/numpy-build/build/py3k/numpy/distutils/command/build_clib.py", >> line 179, in build_a_library >> ? ? ?fcompiler.extra_f77_compile_args = >> build_info.get('extra_f77_compile_args') or [] >> AttributeError: 'str' object has no attribute 'extra_f77_compile_args' > > Reading the code, I don't see how this can happen. Very strange. > Anyway, I cleaned up build_clib to follow similar coding convention > as in build_ext. Could you try numpy head again? >[...] Now it seems to work for for Python 3.2 and 2.7. Thank you very much, Pearu! Dirk From jlconlin at gmail.com Fri Aug 19 09:00:31 2011 From: jlconlin at gmail.com (Jeremy Conlin) Date: Fri, 19 Aug 2011 07:00:31 -0600 Subject: [Numpy-discussion] How to start at line # x when using numpy.memmap Message-ID: I would like to use numpy's memmap on some data files I have. The first 12 or so lines of the files contain text (header information) and the remainder has the numerical data. Is there a way I can tell memmap to skip a specified number of lines instead of a number of bytes? Thanks, Jeremy From pav at iki.fi Fri Aug 19 09:19:24 2011 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 19 Aug 2011 13:19:24 +0000 (UTC) Subject: [Numpy-discussion] How to start at line # x when using numpy.memmap References: Message-ID: Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote: > I would like to use numpy's memmap on some data files I have. The first > 12 or so lines of the files contain text (header information) and the > remainder has the numerical data. Is there a way I can tell memmap to > skip a specified number of lines instead of a number of bytes? First use standard Python I/O functions to determine the number of bytes to skip at the beginning and the number of data items. Then pass in `offset` and `shape` parameters to numpy.memmap. -- Pauli Virtanen From jlconlin at gmail.com Fri Aug 19 09:29:44 2011 From: jlconlin at gmail.com (Jeremy Conlin) Date: Fri, 19 Aug 2011 07:29:44 -0600 Subject: [Numpy-discussion] How to start at line # x when using numpy.memmap In-Reply-To: References: Message-ID: On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen wrote: > Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote: >> I would like to use numpy's memmap on some data files I have. The first >> 12 or so lines of the files contain text (header information) and the >> remainder has the numerical data. Is there a way I can tell memmap to >> skip a specified number of lines instead of a number of bytes? > > First use standard Python I/O functions to determine the number of > bytes to skip at the beginning and the number of data items. Then pass > in `offset` and `shape` parameters to numpy.memmap. Thanks for that suggestion. However, I'm unfamiliar with the I/O functions you are referring to. Can you point me to do the documentation? Thanks again, Jeremy From ralf.gommers at googlemail.com Fri Aug 19 09:48:51 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 19 Aug 2011 15:48:51 +0200 Subject: [Numpy-discussion] [SciPy-User] disabling SVN (was: Trouble installing scipy after upgrading to Mac OS X 10.7 aka Lion) In-Reply-To: References: Message-ID: On Fri, Aug 19, 2011 at 2:48 PM, Pauli Virtanen wrote: > Fri, 19 Aug 2011 12:48:29 +0200, Ralf Gommers wrote: > [clip] > > Hi Ognen, > > > > Could you please disable http access to numpy and scipy svn? > > Turns out also I had enough permissions to disable this. 
Now: > > $ svn co http://svn.scipy.org/svn/numpy/trunk numpy > svn: Repository moved permanently to 'http://github.com/numpy/numpy/'; > please relocate > > A helpful message even, nice touch. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bpederse at gmail.com Fri Aug 19 10:01:06 2011 From: bpederse at gmail.com (Brent Pedersen) Date: Fri, 19 Aug 2011 08:01:06 -0600 Subject: [Numpy-discussion] How to start at line # x when using numpy.memmap In-Reply-To: References: Message-ID: On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin wrote: > On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen wrote: >> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote: >>> I would like to use numpy's memmap on some data files I have. The first >>> 12 or so lines of the files contain text (header information) and the >>> remainder has the numerical data. Is there a way I can tell memmap to >>> skip a specified number of lines instead of a number of bytes? >> >> First use standard Python I/O functions to determine the number of >> bytes to skip at the beginning and the number of data items. Then pass >> in `offset` and `shape` parameters to numpy.memmap. > > Thanks for that suggestion. However, I'm unfamiliar with the I/O > functions you are referring to. Can you point me to do the > documentation? > > Thanks again, > Jeremy > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > this might get you started: import numpy as np # make some fake data with 12 header lines. with open('test.mm', 'w') as fhw: print >> fhw, "\n".join('header' for i in range(12)) np.arange(100, dtype=np.uint).tofile(fhw) # use normal python io to determine of offset after 12 lines. with open('test.mm') as fhr: for i in range(12): fhr.readline() offset = fhr.tell() # use the offset in your call to np.memmap. a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset) assert all(a == np.arange(100)) From pearu.peterson at gmail.com Fri Aug 19 10:07:54 2011 From: pearu.peterson at gmail.com (Pearu Peterson) Date: Fri, 19 Aug 2011 17:07:54 +0300 Subject: [Numpy-discussion] How to start at line # x when using numpy.memmap In-Reply-To: References: Message-ID: <4E4E6E3A.5010908@cens.ioc.ee> On 08/19/2011 05:01 PM, Brent Pedersen wrote: > On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin wrote: >> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen wrote: >>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote: >>>> I would like to use numpy's memmap on some data files I have. The first >>>> 12 or so lines of the files contain text (header information) and the >>>> remainder has the numerical data. Is there a way I can tell memmap to >>>> skip a specified number of lines instead of a number of bytes? >>> >>> First use standard Python I/O functions to determine the number of >>> bytes to skip at the beginning and the number of data items. Then pass >>> in `offset` and `shape` parameters to numpy.memmap. >> >> Thanks for that suggestion. However, I'm unfamiliar with the I/O >> functions you are referring to. Can you point me to do the >> documentation? >> >> Thanks again, >> Jeremy >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > this might get you started: > > > import numpy as np > > # make some fake data with 12 header lines. 
> with open('test.mm', 'w') as fhw: > print>> fhw, "\n".join('header' for i in range(12)) > np.arange(100, dtype=np.uint).tofile(fhw) > > # use normal python io to determine of offset after 12 lines. > with open('test.mm') as fhr: > for i in range(12): fhr.readline() > offset = fhr.tell() I think that before reading a line the program should check whether the line starts with "#". Otherwise fhr.readline() may return a very large junk of data (may be the rest of the file content) that ought to be read only via memmap. HTH, Pearu From bsouthey at gmail.com Fri Aug 19 10:15:56 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 19 Aug 2011 09:15:56 -0500 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: Message-ID: <4E4E701C.1030305@gmail.com> On 08/18/2011 04:43 PM, Mark Wiebe wrote: > It's taken a lot of changes to get the NA mask support to its current > point, but the code ready for some testing now. You can read the > work-in-progress release notes here: > > https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst > > To try it out, check out the missingdata branch from my github > account, here, and build in the standard way: > > https://github.com/m-paradox/numpy > > The things most important to test are: > > * Confirm that existing code still works correctly. I've tested > against SciPy and matplotlib. > * Confirm that the performance of code not using NA masks is the same > or better. > * Try to do computations with the NA values, find places they don't > work yet, and nominate unimplemented functionality important to you to > be next on the development list. The release notes have a preliminary > list of implemented/unimplemented functions. > * Report any crashes, build problems, or unexpected behaviors. > > In addition to adding the NA mask, I've also added features and done a > few performance changes here and there, like letting reductions like > sum take lists of axes instead of being a single axis or all of them. > These changes affect various bugs like > http://projects.scipy.org/numpy/ticket/1143 and > http://projects.scipy.org/numpy/ticket/533. > > Thanks! 
> Mark > > Here's a small example run using NAs: > > >>> import numpy as np > >>> np.__version__ > '2.0.0.dev-8a5e2a1' > >>> a = np.random.rand(3,3,3) > >>> a.flags.maskna = True > >>> a[np.random.rand(3,3,3) < 0.5] = np.NA > >>> a > array([[[NA, NA, 0.11511708], > [ 0.46661454, 0.47565512, NA], > [NA, NA, NA]], > > [[NA, 0.57860351, NA], > [NA, NA, 0.72012669], > [ 0.36582123, NA, 0.76289794]], > > [[ 0.65322748, 0.92794386, NA], > [ 0.53745165, 0.97520989, 0.17515083], > [ 0.71219688, 0.5184328 , 0.75802805]]]) > >>> np.mean(a, axis=-1) > array([[NA, NA, NA], > [NA, NA, NA], > [NA, 0.56260412, 0.66288591]]) > >>> np.std(a, axis=-1) > array([[NA, NA, NA], > [NA, NA, NA], > [NA, 0.32710662, 0.10384331]]) > >>> np.mean(a, axis=-1, skipna=True) > /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474: > RuntimeWarning: invalid value encountered in true_divide > um.true_divide(ret, rcount, out=ret, casting='unsafe') > array([[ 0.11511708, 0.47113483, nan], > [ 0.57860351, 0.72012669, 0.56435958], > [ 0.79058567, 0.56260412, 0.66288591]]) > >>> np.std(a, axis=-1, skipna=True) > /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707: > RuntimeWarning: invalid value encountered in true_divide > um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe') > /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730: > RuntimeWarning: invalid value encountered in true_divide > um.true_divide(ret, rcount, out=ret, casting='unsafe') > array([[ 0. , 0.00452029, nan], > [ 0. , 0. , 0.19853835], > [ 0.13735819, 0.32710662, 0.10384331]]) > >>> np.std(a, axis=(1,2), skipna=True) > array([ 0.16786895, 0.15498008, 0.23811937]) > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Hi, That is great news! (Python2.x will be another email.) Python3.1 and Python3.2 failed with building 'multiarraymodule_onefile.o' but I could not see any obvious reason. I had removed my build directory and then 'python3 setup.py build' but I saw this message: Running from numpy source directory. numpy/core/setup_common.py:86: MismatchCAPIWarning: API mismatch detected, the C API version numbers have to be updated. Current C api version is 6, with checksum ef5688af03ffa23dd8e11734f5b69313, but recorded checksum for C API version 6 in codegen_dir/cversions.txt is e61d5dc51fa1c6459328266e215d6987. If functions were added in the C API, you have to update C_API_VERSION in numpy/core/setup_common.py. MismatchCAPIWarning) Upstream of the build log is below. Bruce In file included from numpy/core/src/multiarray/multiarraymodule_onefile.c:53:0: numpy/core/src/multiarray/na_singleton.c: At top level: numpy/core/src/multiarray/na_singleton.c:708:25: error: ?Py_TPFLAGS_CHECKTYPES? undeclared here (not in a function) numpy/core/src/multiarray/common.c:48:1: warning: ?_use_default_type? defined but not used numpy/core/src/multiarray/ctors.h:93:1: warning: ?_arrays_overlap? declared ?static? but never defined numpy/core/src/multiarray/scalartypes.c.src:2251:1: warning: ?gentype_getsegcount? defined but not used numpy/core/src/multiarray/scalartypes.c.src:2269:1: warning: ?gentype_getcharbuf? defined but not used numpy/core/src/multiarray/mapping.c:110:1: warning: ?_array_ass_item? defined but not used numpy/core/src/multiarray/number.c:266:1: warning: ?array_divide? 
defined but not used numpy/core/src/multiarray/number.c:464:1: warning: ?array_inplace_divide? defined but not used numpy/core/src/multiarray/buffer.c:25:1: warning: ?array_getsegcount? defined but not used numpy/core/src/multiarray/buffer.c:58:1: warning: ?array_getwritebuf? defined but not used numpy/core/src/multiarray/buffer.c:71:1: warning: ?array_getcharbuf? defined but not used numpy/core/src/multiarray/na_mask.c:681:1: warning: ?PyArray_GetMaskInversionFunction? defined but not used In file included from numpy/core/src/multiarray/scalartypes.c.src:25:0, from numpy/core/src/multiarray/multiarraymodule_onefile.c:10: numpy/core/src/multiarray/_datetime.h:9:1: warning: function declaration isn?t a prototype In file included from numpy/core/src/multiarray/multiarraymodule_onefile.c:13:0: numpy/core/src/multiarray/datetime.c:33:1: warning: function declaration isn?t a prototype In file included from numpy/core/src/multiarray/multiarraymodule_onefile.c:17:0: numpy/core/src/multiarray/arraytypes.c.src: In function ?VOID_getitem?: numpy/core/src/multiarray/arraytypes.c.src:643:9: warning: passing argument 2 of ?PyArray_SetBaseObject? from incompatible pointer type build/src.linux-x86_64-3.2/numpy/core/include/numpy/__multiarray_api.h:763:12: note: expected ?struct PyObject *? but argument is of type ?struct PyArrayObject *? In file included from numpy/core/src/multiarray/multiarraymodule_onefile.c:44:0: numpy/core/src/multiarray/nditer_pywrap.c: In function ?npyiter_subscript?: numpy/core/src/multiarray/nditer_pywrap.c:2395:29: warning: passing argument 1 of ?PySlice_GetIndices? from incompatible pointer type /usr/local/include/python3.2m/sliceobject.h:38:5: note: expected ?struct PyObject *? but argument is of type ?struct PySliceObject *? numpy/core/src/multiarray/nditer_pywrap.c: In function ?npyiter_ass_subscript?: numpy/core/src/multiarray/nditer_pywrap.c:2440:29: warning: passing argument 1 of ?PySlice_GetIndices? from incompatible pointer type /usr/local/include/python3.2m/sliceobject.h:38:5: note: expected ?struct PyObject *? but argument is of type ?struct PySliceObject *? In file included from numpy/core/src/multiarray/multiarraymodule_onefile.c:53:0: numpy/core/src/multiarray/na_singleton.c: At top level: numpy/core/src/multiarray/na_singleton.c:708:25: error: ?Py_TPFLAGS_CHECKTYPES? undeclared here (not in a function) numpy/core/src/multiarray/common.c:48:1: warning: ?_use_default_type? defined but not used numpy/core/src/multiarray/ctors.h:93:1: warning: ?_arrays_overlap? declared ?static? but never defined numpy/core/src/multiarray/scalartypes.c.src:2251:1: warning: ?gentype_getsegcount? defined but not used numpy/core/src/multiarray/scalartypes.c.src:2269:1: warning: ?gentype_getcharbuf? defined but not used numpy/core/src/multiarray/mapping.c:110:1: warning: ?_array_ass_item? defined but not used numpy/core/src/multiarray/number.c:266:1: warning: ?array_divide? defined but not used numpy/core/src/multiarray/number.c:464:1: warning: ?array_inplace_divide? defined but not used numpy/core/src/multiarray/buffer.c:25:1: warning: ?array_getsegcount? defined but not used numpy/core/src/multiarray/buffer.c:58:1: warning: ?array_getwritebuf? defined but not used numpy/core/src/multiarray/buffer.c:71:1: warning: ?array_getcharbuf? defined but not used numpy/core/src/multiarray/na_mask.c:681:1: warning: ?PyArray_GetMaskInversionFunction? 
defined but not used error: Command "gcc -pthread -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -Inumpy/core/include -Ibuild/src.linux-x86_64-3.2/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/local/include/python3.2m -Ibuild/src.linux-x86_64-3.2/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-3.2/numpy/core/src/umath -c numpy/core/src/multiarray/multiarraymodule_onefile.c -o build/temp.linux-x86_64-3.2/numpy/core/src/multiarray/multiarraymodule_onefile.o" failed with exit status 1 -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris at simplistix.co.uk Fri Aug 19 10:49:33 2011 From: chris at simplistix.co.uk (Chris Withers) Date: Fri, 19 Aug 2011 07:49:33 -0700 Subject: [Numpy-discussion] summing an array In-Reply-To: <4E4D2891.4@cam.ac.uk> References: <4E4D1F5A.2000205@simplistix.co.uk> <4E4D2891.4@cam.ac.uk> Message-ID: <4E4E77FD.8070107@simplistix.co.uk> On 18/08/2011 07:58, Bob Dowling wrote: > > >>> numpy.add.accumulate(a) > array([ 0, 1, 3, 6, 10]) > > >>> numpy.add.accumulate(a, out=a) > array([ 0, 1, 3, 6, 10]) What's the difference between numpy.cumsum and numpy.add.accumulate? Where can I find the reference docs for these? cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From bsouthey at gmail.com Fri Aug 19 10:55:46 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 19 Aug 2011 09:55:46 -0500 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: Message-ID: <4E4E7972.9060807@gmail.com> On 08/18/2011 04:43 PM, Mark Wiebe wrote: > It's taken a lot of changes to get the NA mask support to its current > point, but the code ready for some testing now. You can read the > work-in-progress release notes here: > > https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst > > To try it out, check out the missingdata branch from my github > account, here, and build in the standard way: > > https://github.com/m-paradox/numpy > > The things most important to test are: > > * Confirm that existing code still works correctly. I've tested > against SciPy and matplotlib. > * Confirm that the performance of code not using NA masks is the same > or better. > * Try to do computations with the NA values, find places they don't > work yet, and nominate unimplemented functionality important to you to > be next on the development list. The release notes have a preliminary > list of implemented/unimplemented functions. > * Report any crashes, build problems, or unexpected behaviors. > > In addition to adding the NA mask, I've also added features and done a > few performance changes here and there, like letting reductions like > sum take lists of axes instead of being a single axis or all of them. > These changes affect various bugs like > http://projects.scipy.org/numpy/ticket/1143 and > http://projects.scipy.org/numpy/ticket/533. > > Thanks! 
> Mark > > Here's a small example run using NAs: > > >>> import numpy as np > >>> np.__version__ > '2.0.0.dev-8a5e2a1' > >>> a = np.random.rand(3,3,3) > >>> a.flags.maskna = True > >>> a[np.random.rand(3,3,3) < 0.5] = np.NA > >>> a > array([[[NA, NA, 0.11511708], > [ 0.46661454, 0.47565512, NA], > [NA, NA, NA]], > > [[NA, 0.57860351, NA], > [NA, NA, 0.72012669], > [ 0.36582123, NA, 0.76289794]], > > [[ 0.65322748, 0.92794386, NA], > [ 0.53745165, 0.97520989, 0.17515083], > [ 0.71219688, 0.5184328 , 0.75802805]]]) > >>> np.mean(a, axis=-1) > array([[NA, NA, NA], > [NA, NA, NA], > [NA, 0.56260412, 0.66288591]]) > >>> np.std(a, axis=-1) > array([[NA, NA, NA], > [NA, NA, NA], > [NA, 0.32710662, 0.10384331]]) > >>> np.mean(a, axis=-1, skipna=True) > /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474: > RuntimeWarning: invalid value encountered in true_divide > um.true_divide(ret, rcount, out=ret, casting='unsafe') > array([[ 0.11511708, 0.47113483, nan], > [ 0.57860351, 0.72012669, 0.56435958], > [ 0.79058567, 0.56260412, 0.66288591]]) > >>> np.std(a, axis=-1, skipna=True) > /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707: > RuntimeWarning: invalid value encountered in true_divide > um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe') > /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730: > RuntimeWarning: invalid value encountered in true_divide > um.true_divide(ret, rcount, out=ret, casting='unsafe') > array([[ 0. , 0.00452029, nan], > [ 0. , 0. , 0.19853835], > [ 0.13735819, 0.32710662, 0.10384331]]) > >>> np.std(a, axis=(1,2), skipna=True) > array([ 0.16786895, 0.15498008, 0.23811937]) > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Hi, I had to rebuild my Python2.6 as a 'normal' version. Anyhow, Python2.4, 2.5, 2.6 and 2.7 all build and pass the numpy tests. Curiously, only tests in Python2.7 give almost no warnings but all the other Python2.x give lots of warnings - Python2.6 and Python2.7 are below. My expectation is that all versions should behave the same regarding printing messages. Also the message 'Need pytz library to test datetime timezones' means that there are invalid tests that have to be rewritten (ticket 1939: http://projects.scipy.org/numpy/ticket/1939 ). Bruce $ python2.6 -c "import numpy; numpy.test()" Running unit tests for numpy NumPy version 2.0.0.dev-93236a2 NumPy is installed in /usr/local/lib/python2.6/site-packages/numpy Python version 2.6.6 (r266:84292, Aug 19 2011, 09:21:38) [GCC 4.5.1 20100924 (Red Hat 4.5.1-4)] nose version 1.0.0 ......................../usr/local/lib/python2.6/site-packages/numpy/core/tests/test_datetime.py:1313: UserWarning: Need pytz library to test datetime timezones warnings.warn("Need pytz library to test datetime timezones") .........................................................................................................................../usr/local/lib/python2.6/unittest.py:336: DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because they are platform specific. 
Use 'O' instead callableObj(*args, **kwargs) ............................................................................................................................................................................................................./usr/local/lib/python2.6/site-packages/numpy/core/_internal.py:555: DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will become immutable in a future version value.names = tuple(names) ...../usr/local/lib/python2.6/site-packages/numpy/core/tests/test_multiarray.py:1912: DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will become immutable in a future version dt.names = tuple(names) ...../usr/local/lib/python2.6/site-packages/numpy/core/tests/test_multiarray.py:804: DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because they are platform specific. Use 'O' instead return loads(obj) ..../usr/local/lib/python2.6/site-packages/numpy/core/tests/test_multiarray.py:1046: DeprecationWarning: putmask has been deprecated. Use copyto with 'where' as the mask instead np.putmask(x,[True,False,True],-1) ../usr/local/lib/python2.6/site-packages/numpy/core/tests/test_multiarray.py:1025: DeprecationWarning: putmask has been deprecated. Use copyto with 'where' as the mask instead np.putmask(x, mask, val) ................................................/usr/local/lib/python2.6/unittest.py:336: DeprecationWarning: putmask has been deprecated. Use copyto with 'where' as the mask instead callableObj(*args, **kwargs) ../usr/local/lib/python2.6/site-packages/numpy/core/tests/test_multiarray.py:1057: DeprecationWarning: putmask has been deprecated. Use copyto with 'where' as the mask instead np.putmask(rec['x'],[True,False],10) /usr/local/lib/python2.6/site-packages/numpy/core/tests/test_multiarray.py:1061: DeprecationWarning: putmask has been deprecated. Use copyto with 'where' as the mask instead np.putmask(rec['y'],[True,False],11) .S/usr/local/lib/python2.6/site-packages/numpy/core/tests/test_multiarray.py:1395: DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will become immutable in a future version dt.names = ['p','q'] ..................................................................................................................................................................................................................................................................................................................................................................................../usr/local/lib/python2.6/site-packages/numpy/core/records.py:157: DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because they are platform specific. 
Use 'O' instead dtype = sb.dtype(formats, aligned) ........................................................./usr/local/lib/python2.6/site-packages/numpy/core/tests/test_regression.py:1426: DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will become immutable in a future version ra.dtype.names = ('f1', 'f2') /usr/local/lib/python2.6/unittest.py:336: DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will become immutable in a future version callableObj(*args, **kwargs) ............../usr/local/lib/python2.6/site-packages/numpy/core/tests/test_regression.py:1017: DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will become immutable in a future version a.dtype.names = b ......................................................................................................................./usr/local/lib/python2.6/pickle.py:1133: DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because they are platform specific. Use 'O' instead value = func(*args) ..........................................................................................K..................................................................................................K......................K..........................................................................................................S...................................../usr/local/lib/python2.6/site-packages/numpy/lib/_iotools.py:857: DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will become immutable in a future version ndtype.names = validate(ndtype.names, defaultfmt=defaultfmt) /usr/local/lib/python2.6/site-packages/numpy/lib/_iotools.py:854: DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will become immutable in a future version ndtype.names = validate([''] * nbtypes, defaultfmt=defaultfmt) /usr/local/lib/python2.6/site-packages/numpy/lib/_iotools.py:847: DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will become immutable in a future version defaultfmt=defaultfmt) ......................................................................................................................................................................................./usr/local/lib/python2.6/site-packages/numpy/lib/format.py:358: DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because they are platform specific. Use 'O' instead dtype = numpy.dtype(d['descr']) /usr/local/lib/python2.6/site-packages/numpy/lib/format.py:449: DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because they are platform specific. Use 'O' instead array = cPickle.load(fp) .............................................................................................................................................................................................................................................................................................................................../usr/local/lib/python2.6/site-packages/numpy/ma/core.py:366: DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because they are platform specific. 
Use 'O' instead deflist.append(default_fill_value(np.dtype(currenttype))) ................/usr/local/lib/python2.6/site-packages/numpy/lib/npyio.py:1640: DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will become immutable in a future version dtype.names = names ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................ ---------------------------------------------------------------------- Ran 3064 tests in 22.795s OK (KNOWNFAIL=3, SKIP=2) $ python -c "import numpy; numpy.test()" Running unit tests for numpy NumPy version 2.0.0.dev-93236a2 NumPy is installed in /usr/lib64/python2.7/site-packages/numpy Python version 2.7 (r27:82500, Sep 16 2010, 18:02:00) [GCC 4.5.1 20100907 (Red Hat 4.5.1-3)] nose version 1.0.0 ......................../usr/lib64/python2.7/site-packages/numpy/core/tests/test_datetime.py:1313: UserWarning: Need pytz library to test datetime timezones warnings.warn("Need pytz library to test datetime timezones") 
...........................................................................................................................................................................................................................................................................................................................................................................................................S............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................K..................................................................................................K......................K..........................................................................................................S................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... ---------------------------------------------------------------------- Ran 3064 tests in 23.180s OK (KNOWNFAIL=3, SKIP=2) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at googlemail.com Fri Aug 19 11:04:38 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 19 Aug 2011 17:04:38 +0200 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: <4E4E7972.9060807@gmail.com> References: <4E4E7972.9060807@gmail.com> Message-ID: On Fri, Aug 19, 2011 at 4:55 PM, Bruce Southey wrote: > ** > > Hi, > I had to rebuild my Python2.6 as a 'normal' version. > > Anyhow, Python2.4, 2.5, 2.6 and 2.7 all build and pass the numpy tests. > > Curiously, only tests in Python2.7 give almost no warnings but all the > other Python2.x give lots of warnings - Python2.6 and Python2.7 are below. > My expectation is that all versions should behave the same regarding > printing messages. > This is due to a change in Python 2.7 itself - deprecation warnings are not shown anymore by default. Furthermore, all those messages are unrelated to Mark's missing data commits. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlconlin at gmail.com Fri Aug 19 11:09:26 2011 From: jlconlin at gmail.com (Jeremy Conlin) Date: Fri, 19 Aug 2011 09:09:26 -0600 Subject: [Numpy-discussion] How to start at line # x when using numpy.memmap In-Reply-To: References: Message-ID: On Fri, Aug 19, 2011 at 8:01 AM, Brent Pedersen wrote: > On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin wrote: >> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen wrote: >>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote: >>>> I would like to use numpy's memmap on some data files I have. The first >>>> 12 or so lines of the files contain text (header information) and the >>>> remainder has the numerical data. Is there a way I can tell memmap to >>>> skip a specified number of lines instead of a number of bytes? >>> >>> First use standard Python I/O functions to determine the number of >>> bytes to skip at the beginning and the number of data items. Then pass >>> in `offset` and `shape` parameters to numpy.memmap. >> >> Thanks for that suggestion. However, I'm unfamiliar with the I/O >> functions you are referring to. Can you point me to do the >> documentation? >> >> Thanks again, >> Jeremy >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > this might get you started: > > > import numpy as np > > # make some fake data with 12 header lines. > with open('test.mm', 'w') as fhw: > ? ?print >> fhw, "\n".join('header' for i in range(12)) > ? ?np.arange(100, dtype=np.uint).tofile(fhw) > > # use normal python io to determine of offset after 12 lines. > with open('test.mm') as fhr: > ? ?for i in range(12): fhr.readline() > ? ?offset = fhr.tell() > > # use the offset in your call to np.memmap. > a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset) Thanks, that looks good. I tried it, but it doesn't get the correct data. I really don't understand what is going on. A simple code and sample data is attached if anyone has a chance to look at it. Thanks, Jeremy -------------- next part -------------- A non-text attachment was scrubbed... Name: tmp.dat Type: application/octet-stream Size: 1668 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: tmp.py Type: application/octet-stream Size: 429 bytes Desc: not available URL: From mwwiebe at gmail.com Fri Aug 19 11:14:39 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 19 Aug 2011 08:14:39 -0700 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: <4E4E7972.9060807@gmail.com> References: <4E4E7972.9060807@gmail.com> Message-ID: On Fri, Aug 19, 2011 at 7:55 AM, Bruce Southey wrote: > ** > On 08/18/2011 04:43 PM, Mark Wiebe wrote: > > It's taken a lot of changes to get the NA mask support to its current > point, but the code ready for some testing now. You can read the > work-in-progress release notes here: > > > https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst > > To try it out, check out the missingdata branch from my github account, > here, and build in the standard way: > > https://github.com/m-paradox/numpy > > The things most important to test are: > > * Confirm that existing code still works correctly. I've tested against > SciPy and matplotlib. > * Confirm that the performance of code not using NA masks is the same or > better. > * Try to do computations with the NA values, find places they don't work > yet, and nominate unimplemented functionality important to you to be next on > the development list. The release notes have a preliminary list of > implemented/unimplemented functions. > * Report any crashes, build problems, or unexpected behaviors. > > In addition to adding the NA mask, I've also added features and done a > few performance changes here and there, like letting reductions like sum > take lists of axes instead of being a single axis or all of them. These > changes affect various bugs like > http://projects.scipy.org/numpy/ticket/1143 and > http://projects.scipy.org/numpy/ticket/533. > > Thanks! > Mark > > Here's a small example run using NAs: > > >>> import numpy as np > >>> np.__version__ > '2.0.0.dev-8a5e2a1' > >>> a = np.random.rand(3,3,3) > >>> a.flags.maskna = True > >>> a[np.random.rand(3,3,3) < 0.5] = np.NA > >>> a > array([[[NA, NA, 0.11511708], > [ 0.46661454, 0.47565512, NA], > [NA, NA, NA]], > > [[NA, 0.57860351, NA], > [NA, NA, 0.72012669], > [ 0.36582123, NA, 0.76289794]], > > [[ 0.65322748, 0.92794386, NA], > [ 0.53745165, 0.97520989, 0.17515083], > [ 0.71219688, 0.5184328 , 0.75802805]]]) > >>> np.mean(a, axis=-1) > array([[NA, NA, NA], > [NA, NA, NA], > [NA, 0.56260412, 0.66288591]]) > >>> np.std(a, axis=-1) > array([[NA, NA, NA], > [NA, NA, NA], > [NA, 0.32710662, 0.10384331]]) > >>> np.mean(a, axis=-1, skipna=True) > /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474: > RuntimeWarning: invalid value encountered in true_divide > um.true_divide(ret, rcount, out=ret, casting='unsafe') > array([[ 0.11511708, 0.47113483, nan], > [ 0.57860351, 0.72012669, 0.56435958], > [ 0.79058567, 0.56260412, 0.66288591]]) > >>> np.std(a, axis=-1, skipna=True) > /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707: > RuntimeWarning: invalid value encountered in true_divide > um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe') > /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730: > RuntimeWarning: invalid value encountered in true_divide > um.true_divide(ret, rcount, out=ret, casting='unsafe') > array([[ 0. , 0.00452029, nan], > [ 0. , 0. 
, 0.19853835], > [ 0.13735819, 0.32710662, 0.10384331]]) > >>> np.std(a, axis=(1,2), skipna=True) > array([ 0.16786895, 0.15498008, 0.23811937]) > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > Hi, > I had to rebuild my Python2.6 as a 'normal' version. > > Anyhow, Python2.4, 2.5, 2.6 and 2.7 all build and pass the numpy tests. > Thanks for running the tests! > > Curiously, only tests in Python2.7 give almost no warnings but all the > other Python2.x give lots of warnings - Python2.6 and Python2.7 are below. > My expectation is that all versions should behave the same regarding > printing messages. > The lack of deprecation warnings is because you need to add -Wd explicitly when you run under 2.7. There was an idea to make this the default from within the test suite execution code, but no one has stepped up and implemented that. See here: http://projects.scipy.org/numpy/ticket/1894 > Also the message 'Need pytz library to test datetime timezones' means that > there are invalid tests that have to be rewritten (ticket 1939: > http://projects.scipy.org/numpy/ticket/1939 ). > I did it this way because Python has no timezone objects built in, just provides the interface. If someone is willing to copy or write timezone instances into the testsuite to fix this I would be very grateful! I think all these policies I keep breaking should be written down somewhere. I don't think it's reasonable to call something a community/project policy unless a particular wording of it in an easily discoverable official document has been agreed upon by the community. I nominate this as a new policy. ;) Thanks, Mark > > Bruce > > $ python2.6 -c "import numpy; numpy.test()" > Running unit tests for numpy > NumPy version 2.0.0.dev-93236a2 > NumPy is installed in /usr/local/lib/python2.6/site-packages/numpy > Python version 2.6.6 (r266:84292, Aug 19 2011, 09:21:38) [GCC 4.5.1 > 20100924 (Red Hat 4.5.1-4)] > nose version 1.0.0 > ......................../usr/local/lib/python2.6/site-packages/numpy/core/tests/test_datetime.py:1313: > UserWarning: Need pytz library to test datetime timezones > warnings.warn("Need pytz library to test datetime timezones") > .........................................................................................................................../usr/local/lib/python2.6/unittest.py:336: > DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because they > are platform specific. Use 'O' instead > callableObj(*args, **kwargs) > ............................................................................................................................................................................................................./usr/local/lib/python2.6/site-packages/numpy/core/_internal.py:555: > DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will > become immutable in a future version > value.names = tuple(names) > ...../usr/local/lib/python2.6/site-packages/numpy/core/tests/test_multiarray.py:1912: > DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will > become immutable in a future version > dt.names = tuple(names) > ...../usr/local/lib/python2.6/site-packages/numpy/core/tests/test_multiarray.py:804: > DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because they > are platform specific. 
Use 'O' instead > return loads(obj) > ..../usr/local/lib/python2.6/site-packages/numpy/core/tests/test_multiarray.py:1046: > DeprecationWarning: putmask has been deprecated. Use copyto with 'where' as > the mask instead > np.putmask(x,[True,False,True],-1) > ../usr/local/lib/python2.6/site-packages/numpy/core/tests/test_multiarray.py:1025: > DeprecationWarning: putmask has been deprecated. Use copyto with 'where' as > the mask instead > np.putmask(x, mask, val) > ................................................/usr/local/lib/python2.6/unittest.py:336: > DeprecationWarning: putmask has been deprecated. Use copyto with 'where' as > the mask instead > callableObj(*args, **kwargs) > ../usr/local/lib/python2.6/site-packages/numpy/core/tests/test_multiarray.py:1057: > DeprecationWarning: putmask has been deprecated. Use copyto with 'where' as > the mask instead > np.putmask(rec['x'],[True,False],10) > /usr/local/lib/python2.6/site-packages/numpy/core/tests/test_multiarray.py:1061: > DeprecationWarning: putmask has been deprecated. Use copyto with 'where' as > the mask instead > np.putmask(rec['y'],[True,False],11) > .S/usr/local/lib/python2.6/site-packages/numpy/core/tests/test_multiarray.py:1395: > DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will > become immutable in a future version > dt.names = ['p','q'] > ..................................................................................................................................................................................................................................................................................................................................................................................../usr/local/lib/python2.6/site-packages/numpy/core/records.py:157: > DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because they > are platform specific. Use 'O' instead > dtype = sb.dtype(formats, aligned) > ........................................................./usr/local/lib/python2.6/site-packages/numpy/core/tests/test_regression.py:1426: > DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will > become immutable in a future version > ra.dtype.names = ('f1', 'f2') > /usr/local/lib/python2.6/unittest.py:336: DeprecationWarning: Setting NumPy > dtype names is deprecated, the dtype will become immutable in a future > version > callableObj(*args, **kwargs) > ............../usr/local/lib/python2.6/site-packages/numpy/core/tests/test_regression.py:1017: > DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will > become immutable in a future version > a.dtype.names = b > ......................................................................................................................./usr/local/lib/python2.6/pickle.py:1133: > DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because they > are platform specific. 
Use 'O' instead > value = func(*args) > ..........................................................................................K..................................................................................................K......................K..........................................................................................................S...................................../usr/local/lib/python2.6/site-packages/numpy/lib/_iotools.py:857: > DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will > become immutable in a future version > ndtype.names = validate(ndtype.names, defaultfmt=defaultfmt) > /usr/local/lib/python2.6/site-packages/numpy/lib/_iotools.py:854: > DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will > become immutable in a future version > ndtype.names = validate([''] * nbtypes, defaultfmt=defaultfmt) > /usr/local/lib/python2.6/site-packages/numpy/lib/_iotools.py:847: > DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will > become immutable in a future version > defaultfmt=defaultfmt) > ......................................................................................................................................................................................./usr/local/lib/python2.6/site-packages/numpy/lib/format.py:358: > DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because they > are platform specific. Use 'O' instead > dtype = numpy.dtype(d['descr']) > /usr/local/lib/python2.6/site-packages/numpy/lib/format.py:449: > DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because they > are platform specific. Use 'O' instead > array = cPickle.load(fp) > .............................................................................................................................................................................................................................................................................................................................../usr/local/lib/python2.6/site-packages/numpy/ma/core.py:366: > DeprecationWarning: DType strings 'O4' and 'O8' are deprecated because they > are platform specific. Use 'O' instead > deflist.append(default_fill_value(np.dtype(currenttype))) > ................/usr/local/lib/python2.6/site-packages/numpy/lib/npyio.py:1640: > DeprecationWarning: Setting NumPy dtype names is deprecated, the dtype will > become immutable in a future version > dtype.names = names > .............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................. 
> .......................................................................................................................................................................................................................... > ---------------------------------------------------------------------- > Ran 3064 tests in 22.795s > > OK (KNOWNFAIL=3, SKIP=2) > $ python -c "import numpy; numpy.test()" > Running unit tests for numpy > NumPy version 2.0.0.dev-93236a2 > NumPy is installed in /usr/lib64/python2.7/site-packages/numpy > Python version 2.7 (r27:82500, Sep 16 2010, 18:02:00) [GCC 4.5.1 20100907 > (Red Hat 4.5.1-3)] > nose version 1.0.0 > ......................../usr/lib64/python2.7/site-packages/numpy/core/tests/test_datetime.py:1313: > UserWarning: Need pytz library to test datetime timezones > warnings.warn("Need pytz library to test datetime timezones") > ...........................................................................................................................................................................................................................................................................................................................................................................................................S.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................. > ..........................................................K..................................................................................................K......................K..........................................................................................................S.............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................. 
> .............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................. > ....................................................................... > ---------------------------------------------------------------------- > Ran 3064 tests in 23.180s > > OK (KNOWNFAIL=3, SKIP=2) > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bpederse at gmail.com Fri Aug 19 11:18:12 2011 From: bpederse at gmail.com (Brent Pedersen) Date: Fri, 19 Aug 2011 09:18:12 -0600 Subject: [Numpy-discussion] How to start at line # x when using numpy.memmap In-Reply-To: References: Message-ID: On Fri, Aug 19, 2011 at 9:09 AM, Jeremy Conlin wrote: > On Fri, Aug 19, 2011 at 8:01 AM, Brent Pedersen wrote: >> On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin wrote: >>> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen wrote: >>>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote: >>>>> I would like to use numpy's memmap on some data files I have. The first >>>>> 12 or so lines of the files contain text (header information) and the >>>>> remainder has the numerical data. Is there a way I can tell memmap to >>>>> skip a specified number of lines instead of a number of bytes? >>>> >>>> First use standard Python I/O functions to determine the number of >>>> bytes to skip at the beginning and the number of data items. Then pass >>>> in `offset` and `shape` parameters to numpy.memmap. >>> >>> Thanks for that suggestion. However, I'm unfamiliar with the I/O >>> functions you are referring to. Can you point me to do the >>> documentation? >>> >>> Thanks again, >>> Jeremy >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> this might get you started: >> >> >> import numpy as np >> >> # make some fake data with 12 header lines. >> with open('test.mm', 'w') as fhw: >> ? ?print >> fhw, "\n".join('header' for i in range(12)) >> ? ?np.arange(100, dtype=np.uint).tofile(fhw) >> >> # use normal python io to determine of offset after 12 lines. >> with open('test.mm') as fhr: >> ? ?for i in range(12): fhr.readline() >> ? ?offset = fhr.tell() >> >> # use the offset in your call to np.memmap. >> a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset) > > Thanks, that looks good. I tried it, but it doesn't get the correct > data. I really don't understand what is going on. 
A simple code and > sample data is attached if anyone has a chance to look at it. > > Thanks, > Jeremy > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > in that case, i would use: np.loadtxt('tmp.dat', skiprows=12) From bsouthey at gmail.com Fri Aug 19 11:23:37 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 19 Aug 2011 10:23:37 -0500 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: <4E4E7972.9060807@gmail.com> Message-ID: <4E4E7FF9.6030006@gmail.com> On 08/19/2011 10:04 AM, Ralf Gommers wrote: > > > On Fri, Aug 19, 2011 at 4:55 PM, Bruce Southey > wrote: > > Hi, > I had to rebuild my Python2.6 as a 'normal' version. > > Anyhow, Python2.4, 2.5, 2.6 and 2.7 all build and pass the numpy > tests. > > Curiously, only tests in Python2.7 give almost no warnings but all > the other Python2.x give lots of warnings - Python2.6 and > Python2.7 are below. My expectation is that all versions should > behave the same regarding printing messages. > > > This is due to a change in Python 2.7 itself - deprecation warnings > are not shown anymore by default. Furthermore, all those messages are > unrelated to Mark's missing data commits. > > Cheers, > Ralf > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Yet: $ python2.6 -c "import numpy; numpy.test()" Running unit tests for numpy NumPy version 1.6.1 NumPy is installed in /usr/local/lib/python2.6/site-packages/numpy Python version 2.6.6 (r266:84292, Aug 19 2011, 09:21:38) [GCC 4.5.1 20100924 (Red Hat 4.5.1-4)] nose version 1.0.0 
......................................................................................................K...................................................K......................K......................................................................................................
---------------------------------------------------------------------- Ran 3533 tests in 22.062s OK (KNOWNFAIL=3) Hence why I was curious about all the messages having not seen them. Is there some plan to cleanup these tests rather than 'hide' them? Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at enthought.com Fri Aug 19 11:23:34 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Fri, 19 Aug 2011 10:23:34 -0500 Subject: [Numpy-discussion] How to start at line # x when using numpy.memmap In-Reply-To: References: Message-ID: On Fri, Aug 19, 2011 at 10:09 AM, Jeremy Conlin wrote: > On Fri, Aug 19, 2011 at 8:01 AM, Brent Pedersen > wrote: > > On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin > wrote: > >> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen wrote: > >>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote: > >>>> I would like to use numpy's memmap on some data files I have. The > first > >>>> 12 or so lines of the files contain text (header information) and the > >>>> remainder has the numerical data. Is there a way I can tell memmap to > >>>> skip a specified number of lines instead of a number of bytes? > >>> > >>> First use standard Python I/O functions to determine the number of > >>> bytes to skip at the beginning and the number of data items. Then pass > >>> in `offset` and `shape` parameters to numpy.memmap. > >> > >> Thanks for that suggestion. However, I'm unfamiliar with the I/O > >> functions you are referring to. Can you point me to do the > >> documentation? > >> > >> Thanks again, > >> Jeremy > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > > > this might get you started: > > > > > > import numpy as np > > > > # make some fake data with 12 header lines. > > with open('test.mm', 'w') as fhw: > > print >> fhw, "\n".join('header' for i in range(12)) > > np.arange(100, dtype=np.uint).tofile(fhw) > > > > # use normal python io to determine of offset after 12 lines. > > with open('test.mm') as fhr: > > for i in range(12): fhr.readline() > > offset = fhr.tell() > > > > # use the offset in your call to np.memmap. > > a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset) > > Thanks, that looks good. I tried it, but it doesn't get the correct > data. I really don't understand what is going on. A simple code and > sample data is attached if anyone has a chance to look at it. > Your data file is all text. memmap is generally for binary data; it won't work with this file. Warren > > Thanks, > Jeremy > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jlconlin at gmail.com Fri Aug 19 11:26:02 2011 From: jlconlin at gmail.com (Jeremy Conlin) Date: Fri, 19 Aug 2011 09:26:02 -0600 Subject: [Numpy-discussion] How to start at line # x when using numpy.memmap In-Reply-To: References: Message-ID: On Fri, Aug 19, 2011 at 9:23 AM, Warren Weckesser wrote: > > > On Fri, Aug 19, 2011 at 10:09 AM, Jeremy Conlin wrote: >> >> On Fri, Aug 19, 2011 at 8:01 AM, Brent Pedersen >> wrote: >> > On Fri, Aug 19, 2011 at 7:29 AM, Jeremy Conlin >> > wrote: >> >> On Fri, Aug 19, 2011 at 7:19 AM, Pauli Virtanen wrote: >> >>> Fri, 19 Aug 2011 07:00:31 -0600, Jeremy Conlin wrote: >> >>>> I would like to use numpy's memmap on some data files I have. The >> >>>> first >> >>>> 12 or so lines of the files contain text (header information) and the >> >>>> remainder has the numerical data. Is there a way I can tell memmap to >> >>>> skip a specified number of lines instead of a number of bytes? >> >>> >> >>> First use standard Python I/O functions to determine the number of >> >>> bytes to skip at the beginning and the number of data items. Then pass >> >>> in `offset` and `shape` parameters to numpy.memmap. >> >> >> >> Thanks for that suggestion. However, I'm unfamiliar with the I/O >> >> functions you are referring to. Can you point me to do the >> >> documentation? >> >> >> >> Thanks again, >> >> Jeremy >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> > >> > this might get you started: >> > >> > >> > import numpy as np >> > >> > # make some fake data with 12 header lines. >> > with open('test.mm', 'w') as fhw: >> > ? ?print >> fhw, "\n".join('header' for i in range(12)) >> > ? ?np.arange(100, dtype=np.uint).tofile(fhw) >> > >> > # use normal python io to determine of offset after 12 lines. >> > with open('test.mm') as fhr: >> > ? ?for i in range(12): fhr.readline() >> > ? ?offset = fhr.tell() >> > >> > # use the offset in your call to np.memmap. >> > a = np.memmap('test.mm', mode='r', dtype=np.uint, offset=offset) >> >> Thanks, that looks good. I tried it, but it doesn't get the correct >> data. I really don't understand what is going on. A simple code and >> sample data is attached if anyone has a chance to look at it. > > > Your data file is all text.? memmap is generally for binary data; it won't > work with this file. > > Warren Yikes! I missed the "binary" in the first line of the documentation. Sorry! Jeremy From ralf.gommers at googlemail.com Fri Aug 19 11:27:43 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 19 Aug 2011 17:27:43 +0200 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: <4E4E7FF9.6030006@gmail.com> References: <4E4E7972.9060807@gmail.com> <4E4E7FF9.6030006@gmail.com> Message-ID: On Fri, Aug 19, 2011 at 5:23 PM, Bruce Southey wrote: > ** > On 08/19/2011 10:04 AM, Ralf Gommers wrote: > > > > On Fri, Aug 19, 2011 at 4:55 PM, Bruce Southey wrote: > >> Hi, >> I had to rebuild my Python2.6 as a 'normal' version. >> >> Anyhow, Python2.4, 2.5, 2.6 and 2.7 all build and pass the numpy tests. >> >> Curiously, only tests in Python2.7 give almost no warnings but all the >> other Python2.x give lots of warnings - Python2.6 and Python2.7 are below. >> My expectation is that all versions should behave the same regarding >> printing messages. >> > > This is due to a change in Python 2.7 itself - deprecation warnings are not > shown anymore by default. 
Furthermore, all those messages are unrelated to > Mark's missing data commits. > > Cheers, > Ralf > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > Yet: > > $ python2.6 -c "import numpy; numpy.test()" > Running unit tests for numpy > NumPy version 1.6.1 > > NumPy is installed in /usr/local/lib/python2.6/site-packages/numpy > Python version 2.6.6 (r266:84292, Aug 19 2011, 09:21:38) [GCC 4.5.1 > 20100924 (Red Hat 4.5.1-4)] > nose version 1.0.0 > ..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................K............................... > ..................................................................K......................K.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... > .............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................. 
> ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... > ---------------------------------------------------------------------- > Ran 3533 tests in 22.062s > > OK (KNOWNFAIL=3) > > Hence why I was curious about all the messages having not seen them. > > Is there some plan to cleanup these tests rather than 'hide' them? > > Yes, that happens before every release. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From alok at merfinllc.com Fri Aug 19 11:41:59 2011 From: alok at merfinllc.com (Alok Singhal) Date: Fri, 19 Aug 2011 08:41:59 -0700 Subject: [Numpy-discussion] longlong format error with Python <= 2.6 in scalartypes.c In-Reply-To: References: <8D5A8864-6827-4164-B8F6-198000B7491D@astro.physik.uni-goettingen.de> Message-ID: On Thu, Aug 18, 2011 at 9:01 PM, Mark Wiebe wrote: > On Thu, Aug 4, 2011 at 4:08 PM, Derek Homeier > wrote: >> >> Hi, >> >> commits c15a807e and c135371e (thus most immediately addressed to Mark, >> but I am sending this to the list hoping for more insight on the issue) >> introduce a test failure with Python 2.5+2.6 on Mac: >> >> FAIL: test_timedelta_scalar_construction (test_datetime.TestDateTime) >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> ?File >> "/Users/derek/lib/python2.6/site-packages/numpy/core/tests/test_datetime.py", >> line 219, in test_timedelta_scalar_construction >> ? ?assert_equal(str(np.timedelta64(3, 's')), '3 seconds') >> ?File "/Users/derek/lib/python2.6/site-packages/numpy/testing/utils.py", >> line 313, in assert_equal >> ? ?raise AssertionError(msg) >> AssertionError: >> Items are not equal: >> ?ACTUAL: '%lld seconds' >> ?DESIRED: '3 seconds' >> >> due to the "lld" format passed to PyUString_FromFormat in scalartypes.c. >> In the current npy_common.h I found the comment >> ?* ? ? ?in Python 2.6 the %lld formatter is not supported. In this >> ?* ? ? ?case we work around the problem by using the %zd formatter. >> though I did not notice that problem when I cleaned up the >> NPY_LONGLONG_FMT definitions in that file (and it is not entirely clear >> whether the comment only pertains to Windows...). Anyway changing the >> formatters in scalartypes.c to "zd" as well removes the failure and still >> works with Python 2.7 and 3.2 (at least on Mac OS). However I am wondering >> if >> a) NPY_[U]LONGLONG_FMT should also be defined conditional to the Python >> version (and if "%zu" is a valid formatter), and >> b) scalartypes.c should use NPY_LONGLONG_FMT from npy_common.h >> >> I am attaching a patch implementing a), but only the quick and dirty >> solution to b). > > I've touched this stuff as little as possible, because I rather dislike the > way the *_FMT macros are set up right now. I added a comment about > NPY_INTP_FMT in npy_common.h which I see you read. If you're going to try to > fix this, I hope you fix it deeper than this patch so it's not error-prone > anymore. 
> NPY_INTP_FMT is used together with PyErr_Format/PyString_FromFormat, whereas > the other *_FMT are used with the *printf functions from the C libraries. > These are not compatible, and the %zd hack was put in place because it > exists even in Python 2.4, and Py_ssize_t seems matches the ?pointer size in > all CPython versions. > Switching the timedelta64 format in scalartypes.c.src to "%zd" won't help on > 32-bit platforms, because it won't be a 64-bit type there, unlike how it > works ok for the NPY_INTP_FMT. In summary: > * There need to be changes to create a clear distinction between the *_FMT > for PyString_FromFormat vs the *_FMT for C library *printf functions > * I suspect we're out of luck for 32-bit older versions of CPython with > PyString_FromFormat > Cheers, > -Mark By the way, the above bug is fixed in the current master (see https://github.com/numpy/numpy/commit/730b861120094b1ab38670b9a8895a36c19296a7). I fixed it in the most direct way possible, because "the correct" way would require changes to a lot of places. From mwwiebe at gmail.com Fri Aug 19 11:48:54 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 19 Aug 2011 08:48:54 -0700 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: <4E4E701C.1030305@gmail.com> References: <4E4E701C.1030305@gmail.com> Message-ID: On Fri, Aug 19, 2011 at 7:15 AM, Bruce Southey wrote: > ** > On 08/18/2011 04:43 PM, Mark Wiebe wrote: > > It's taken a lot of changes to get the NA mask support to its current > point, but the code ready for some testing now. You can read the > work-in-progress release notes here: > > > https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst > > To try it out, check out the missingdata branch from my github account, > here, and build in the standard way: > > https://github.com/m-paradox/numpy > > The things most important to test are: > > * Confirm that existing code still works correctly. I've tested against > SciPy and matplotlib. > * Confirm that the performance of code not using NA masks is the same or > better. > * Try to do computations with the NA values, find places they don't work > yet, and nominate unimplemented functionality important to you to be next on > the development list. The release notes have a preliminary list of > implemented/unimplemented functions. > * Report any crashes, build problems, or unexpected behaviors. > > In addition to adding the NA mask, I've also added features and done a > few performance changes here and there, like letting reductions like sum > take lists of axes instead of being a single axis or all of them. These > changes affect various bugs like > http://projects.scipy.org/numpy/ticket/1143 and > http://projects.scipy.org/numpy/ticket/533. > > Thanks! 
> Mark > > Here's a small example run using NAs: > > >>> import numpy as np > >>> np.__version__ > '2.0.0.dev-8a5e2a1' > >>> a = np.random.rand(3,3,3) > >>> a.flags.maskna = True > >>> a[np.random.rand(3,3,3) < 0.5] = np.NA > >>> a > array([[[NA, NA, 0.11511708], > [ 0.46661454, 0.47565512, NA], > [NA, NA, NA]], > > [[NA, 0.57860351, NA], > [NA, NA, 0.72012669], > [ 0.36582123, NA, 0.76289794]], > > [[ 0.65322748, 0.92794386, NA], > [ 0.53745165, 0.97520989, 0.17515083], > [ 0.71219688, 0.5184328 , 0.75802805]]]) > >>> np.mean(a, axis=-1) > array([[NA, NA, NA], > [NA, NA, NA], > [NA, 0.56260412, 0.66288591]]) > >>> np.std(a, axis=-1) > array([[NA, NA, NA], > [NA, NA, NA], > [NA, 0.32710662, 0.10384331]]) > >>> np.mean(a, axis=-1, skipna=True) > /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474: > RuntimeWarning: invalid value encountered in true_divide > um.true_divide(ret, rcount, out=ret, casting='unsafe') > array([[ 0.11511708, 0.47113483, nan], > [ 0.57860351, 0.72012669, 0.56435958], > [ 0.79058567, 0.56260412, 0.66288591]]) > >>> np.std(a, axis=-1, skipna=True) > /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707: > RuntimeWarning: invalid value encountered in true_divide > um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe') > /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730: > RuntimeWarning: invalid value encountered in true_divide > um.true_divide(ret, rcount, out=ret, casting='unsafe') > array([[ 0. , 0.00452029, nan], > [ 0. , 0. , 0.19853835], > [ 0.13735819, 0.32710662, 0.10384331]]) > >>> np.std(a, axis=(1,2), skipna=True) > array([ 0.16786895, 0.15498008, 0.23811937]) > > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > Hi, > That is great news! > (Python2.x will be another email.) > > Python3.1 and Python3.2 failed with building 'multiarraymodule_onefile.o' > but I could not see any obvious reason. > I've pushed a change to fix the Python 3 build, it was a use of Py_TPFLAGS_CHECKTYPES, which is no longer in Python3 but is always default now. Tested with 3.2. Thanks! Mark > > I had removed my build directory and then 'python3 setup.py build' but I > saw this message: > Running from numpy source directory. > numpy/core/setup_common.py:86: MismatchCAPIWarning: API mismatch detected, > the C API version numbers have to be updated. Current C api version is 6, > with checksum ef5688af03ffa23dd8e11734f5b69313, but recorded checksum for C > API version 6 in codegen_dir/cversions.txt is > e61d5dc51fa1c6459328266e215d6987. If functions were added in the C API, you > have to update C_API_VERSION in numpy/core/setup_common.py. > MismatchCAPIWarning) > > Upstream of the build log is below. > > Bruce > > In file included from > numpy/core/src/multiarray/multiarraymodule_onefile.c:53:0: > numpy/core/src/multiarray/na_singleton.c: At top level: > numpy/core/src/multiarray/na_singleton.c:708:25: error: > ?Py_TPFLAGS_CHECKTYPES? undeclared here (not in a function) > numpy/core/src/multiarray/common.c:48:1: warning: ?_use_default_type? > defined but not used > numpy/core/src/multiarray/ctors.h:93:1: warning: ?_arrays_overlap? declared > ?static? but never defined > numpy/core/src/multiarray/scalartypes.c.src:2251:1: warning: > ?gentype_getsegcount? defined but not used > numpy/core/src/multiarray/scalartypes.c.src:2269:1: warning: > ?gentype_getcharbuf? 
defined but not used > numpy/core/src/multiarray/mapping.c:110:1: warning: ?_array_ass_item? > defined but not used > numpy/core/src/multiarray/number.c:266:1: warning: ?array_divide? defined > but not used > numpy/core/src/multiarray/number.c:464:1: warning: ?array_inplace_divide? > defined but not used > numpy/core/src/multiarray/buffer.c:25:1: warning: ?array_getsegcount? > defined but not used > numpy/core/src/multiarray/buffer.c:58:1: warning: ?array_getwritebuf? > defined but not used > numpy/core/src/multiarray/buffer.c:71:1: warning: ?array_getcharbuf? > defined but not used > numpy/core/src/multiarray/na_mask.c:681:1: warning: > ?PyArray_GetMaskInversionFunction? defined but not used > In file included from numpy/core/src/multiarray/scalartypes.c.src:25:0, > from > numpy/core/src/multiarray/multiarraymodule_onefile.c:10: > numpy/core/src/multiarray/_datetime.h:9:1: warning: function declaration > isn?t a prototype > In file included from > numpy/core/src/multiarray/multiarraymodule_onefile.c:13:0: > numpy/core/src/multiarray/datetime.c:33:1: warning: function declaration > isn?t a prototype > In file included from > numpy/core/src/multiarray/multiarraymodule_onefile.c:17:0: > numpy/core/src/multiarray/arraytypes.c.src: In function ?VOID_getitem?: > numpy/core/src/multiarray/arraytypes.c.src:643:9: warning: passing argument > 2 of ?PyArray_SetBaseObject? from incompatible pointer type > build/src.linux-x86_64-3.2/numpy/core/include/numpy/__multiarray_api.h:763:12: > note: expected ?struct PyObject *? but argument is of type ?struct > PyArrayObject *? > In file included from > numpy/core/src/multiarray/multiarraymodule_onefile.c:44:0: > numpy/core/src/multiarray/nditer_pywrap.c: In function ?npyiter_subscript?: > numpy/core/src/multiarray/nditer_pywrap.c:2395:29: warning: passing > argument 1 of ?PySlice_GetIndices? from incompatible pointer type > /usr/local/include/python3.2m/sliceobject.h:38:5: note: expected ?struct > PyObject *? but argument is of type ?struct PySliceObject *? > numpy/core/src/multiarray/nditer_pywrap.c: In function > ?npyiter_ass_subscript?: > numpy/core/src/multiarray/nditer_pywrap.c:2440:29: warning: passing > argument 1 of ?PySlice_GetIndices? from incompatible pointer type > /usr/local/include/python3.2m/sliceobject.h:38:5: note: expected ?struct > PyObject *? but argument is of type ?struct PySliceObject *? > In file included from > numpy/core/src/multiarray/multiarraymodule_onefile.c:53:0: > numpy/core/src/multiarray/na_singleton.c: At top level: > numpy/core/src/multiarray/na_singleton.c:708:25: error: > ?Py_TPFLAGS_CHECKTYPES? undeclared here (not in a function) > numpy/core/src/multiarray/common.c:48:1: warning: ?_use_default_type? > defined but not used > numpy/core/src/multiarray/ctors.h:93:1: warning: ?_arrays_overlap? declared > ?static? but never defined > numpy/core/src/multiarray/scalartypes.c.src:2251:1: warning: > ?gentype_getsegcount? defined but not used > numpy/core/src/multiarray/scalartypes.c.src:2269:1: warning: > ?gentype_getcharbuf? defined but not used > numpy/core/src/multiarray/mapping.c:110:1: warning: ?_array_ass_item? > defined but not used > numpy/core/src/multiarray/number.c:266:1: warning: ?array_divide? defined > but not used > numpy/core/src/multiarray/number.c:464:1: warning: ?array_inplace_divide? > defined but not used > numpy/core/src/multiarray/buffer.c:25:1: warning: ?array_getsegcount? > defined but not used > numpy/core/src/multiarray/buffer.c:58:1: warning: ?array_getwritebuf? 
> defined but not used > numpy/core/src/multiarray/buffer.c:71:1: warning: ?array_getcharbuf? > defined but not used > numpy/core/src/multiarray/na_mask.c:681:1: warning: > ?PyArray_GetMaskInversionFunction? defined but not used > error: Command "gcc -pthread -DNDEBUG -g -fwrapv -O3 -Wall > -Wstrict-prototypes -fPIC -Inumpy/core/include > -Ibuild/src.linux-x86_64-3.2/numpy/core/include/numpy > -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core > -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath > -Inumpy/core/src/npysort -Inumpy/core/include > -I/usr/local/include/python3.2m > -Ibuild/src.linux-x86_64-3.2/numpy/core/src/multiarray > -Ibuild/src.linux-x86_64-3.2/numpy/core/src/umath -c > numpy/core/src/multiarray/multiarraymodule_onefile.c -o > build/temp.linux-x86_64-3.2/numpy/core/src/multiarray/multiarraymodule_onefile.o" > failed with exit status 1 > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Fri Aug 19 11:50:50 2011 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 19 Aug 2011 10:50:50 -0500 Subject: [Numpy-discussion] Can't mix np.newaxis with boolean indexing Message-ID: I could have sworn that this use to work: import numpy as np a = np.random.random((100,)) b = (a > 0.5) print a[b, np.newaxis] But instead, I get this error on the latest master: Traceback (most recent call last): File "", line 1, in TypeError: long() argument must be a string or a number, not 'NoneType' Note, the simple work-around would be "a[b][:, np.newaxis]", but I can't imagine why the intuitive syntax would not be valid. Thanks, Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From rjd4+numpy at cam.ac.uk Fri Aug 19 12:14:15 2011 From: rjd4+numpy at cam.ac.uk (Bob Dowling) Date: Fri, 19 Aug 2011 17:14:15 +0100 Subject: [Numpy-discussion] summing an array In-Reply-To: <4E4E77FD.8070107@simplistix.co.uk> References: <4E4D1F5A.2000205@simplistix.co.uk> <4E4D2891.4@cam.ac.uk> <4E4E77FD.8070107@simplistix.co.uk> Message-ID: <4E4E8BD7.8020201@cam.ac.uk> On 19/08/11 15:49, Chris Withers wrote: > On 18/08/2011 07:58, Bob Dowling wrote: >> >> >>> numpy.add.accumulate(a) >> array([ 0, 1, 3, 6, 10]) >> >> >>> numpy.add.accumulate(a, out=a) >> array([ 0, 1, 3, 6, 10]) > > What's the difference between numpy.cumsum and numpy.add.accumulate? I think they're equivalent, with numpy.cumprod() serving for numpy.multiply.accumulate() I have a prefeence for general procedures rather than special short cuts. The numpy..accumulate works for any of the binary ufuncs I think. The cumsum() and cumprod() functions only exist for add and multiply. e.g. >>> a = numpy.arange(2,5) >>> a array([2, 3, 4]) >>> numpy.power.accumulate(a) array([ 2, 8, 4096]) > Where can I find the reference docs for these? help(numpy.ufunc) help(numpy.ufunc.accumulate) is where I started. 
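A minimal interactive sketch of the equivalence described above (assuming a recent NumPy; the running-maximum line is an added illustration of accumulate working with any binary ufunc, not an example taken from the thread):

>>> import numpy as np
>>> a = np.arange(5)
>>> np.cumsum(a)                 # convenience spelling
array([ 0,  1,  3,  6, 10])
>>> np.add.accumulate(a)         # general ufunc spelling, same result
array([ 0,  1,  3,  6, 10])
>>> np.maximum.accumulate(np.array([1, 5, 3, 7, 2]))   # running maximum
array([1, 5, 5, 7, 7])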
From bsouthey at gmail.com Fri Aug 19 13:55:13 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 19 Aug 2011 12:55:13 -0500 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: <4E4E701C.1030305@gmail.com> Message-ID: On Fri, Aug 19, 2011 at 10:48 AM, Mark Wiebe wrote: > On Fri, Aug 19, 2011 at 7:15 AM, Bruce Southey wrote: >> >> On 08/18/2011 04:43 PM, Mark Wiebe wrote: >> >> It's taken a lot of changes to get the NA mask support to its current >> point, but the code ready for some testing now. You can read the >> work-in-progress release notes here: >> >> https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst >> To try it out, check out the missingdata branch from my github account, >> here, and build in the standard way: >> https://github.com/m-paradox/numpy >> The things most important to test are: >> * Confirm that existing code still works correctly. I've tested against >> SciPy and matplotlib. >> * Confirm that the performance of code not using NA masks is the same or >> better. >> * Try to do computations with the NA values, find places they don't work >> yet, and nominate unimplemented functionality important to you to be next on >> the development list. The release notes have a preliminary list of >> implemented/unimplemented functions. >> * Report any crashes, build problems, or unexpected behaviors. >> In addition to adding the NA mask, I've also added features and done a few >> performance changes here and there, like letting reductions like sum take >> lists of axes instead of being a single axis or all of them. These changes >> affect various bugs >> like?http://projects.scipy.org/numpy/ticket/1143?and?http://projects.scipy.org/numpy/ticket/533. >> Thanks! >> Mark >> Here's a small example run using NAs: >> >>> import numpy as np >> >>> np.__version__ >> '2.0.0.dev-8a5e2a1' >> >>> a = np.random.rand(3,3,3) >> >>> a.flags.maskna = True >> >>> a[np.random.rand(3,3,3) < 0.5] = np.NA >> >>> a >> array([[[NA, NA, ?0.11511708], >> ? ? ? ? [ 0.46661454, ?0.47565512, NA], >> ? ? ? ? [NA, NA, NA]], >> ? ? ? ?[[NA, ?0.57860351, NA], >> ? ? ? ? [NA, NA, ?0.72012669], >> ? ? ? ? [ 0.36582123, NA, ?0.76289794]], >> ? ? ? ?[[ 0.65322748, ?0.92794386, NA], >> ? ? ? ? [ 0.53745165, ?0.97520989, ?0.17515083], >> ? ? ? ? [ 0.71219688, ?0.5184328 , ?0.75802805]]]) >> >>> np.mean(a, axis=-1) >> array([[NA, NA, NA], >> ? ? ? ?[NA, NA, NA], >> ? ? ? ?[NA, ?0.56260412, ?0.66288591]]) >> >>> np.std(a, axis=-1) >> array([[NA, NA, NA], >> ? ? ? ?[NA, NA, NA], >> ? ? ? ?[NA, ?0.32710662, ?0.10384331]]) >> >>> np.mean(a, axis=-1, skipna=True) >> >> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474: >> RuntimeWarning: invalid value encountered in true_divide >> ? um.true_divide(ret, rcount, out=ret, casting='unsafe') >> array([[ 0.11511708, ?0.47113483, ? ? ? ? nan], >> ? ? ? ?[ 0.57860351, ?0.72012669, ?0.56435958], >> ? ? ? ?[ 0.79058567, ?0.56260412, ?0.66288591]]) >> >>> np.std(a, axis=-1, skipna=True) >> >> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707: >> RuntimeWarning: invalid value encountered in true_divide >> ? um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe') >> >> /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730: >> RuntimeWarning: invalid value encountered in true_divide >> ? um.true_divide(ret, rcount, out=ret, casting='unsafe') >> array([[ 0. ? ? ? ?, ?0.00452029, ? ? ? ? nan], >> ? ? ? ?[ 0. 
? ? ? ?, ?0. ? ? ? ?, ?0.19853835], >> ? ? ? ?[ 0.13735819, ?0.32710662, ?0.10384331]]) >> >>> np.std(a, axis=(1,2), skipna=True) >> array([ 0.16786895, ?0.15498008, ?0.23811937]) >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> Hi, >> That is great news! >> (Python2.x will be another email.) >> >> Python3.1 and Python3.2 failed with building 'multiarraymodule_onefile.o' >> but I could not see any obvious reason. > > I've pushed a change to fix the Python 3 build, it was a use > of?Py_TPFLAGS_CHECKTYPES, which is no longer in Python3 but is always > default now. Tested with 3.2. > Thanks! > Mark > >> >> I had removed my build directory and then 'python3 setup.py build' but I >> saw this message: >> Running from numpy source directory. >> numpy/core/setup_common.py:86: MismatchCAPIWarning: API mismatch detected, >> the C API version numbers have to be updated. Current C api version is 6, >> with checksum ef5688af03ffa23dd8e11734f5b69313, but recorded checksum for C >> API version 6 in codegen_dir/cversions.txt is >> e61d5dc51fa1c6459328266e215d6987. If functions were added in the C API, you >> have to update C_API_VERSION? in numpy/core/setup_common.py. >> ? MismatchCAPIWarning) >> >> Upstream of the build log is below. >> >> Bruce >> >> In file included from >> numpy/core/src/multiarray/multiarraymodule_onefile.c:53:0: >> numpy/core/src/multiarray/na_singleton.c: At top level: >> numpy/core/src/multiarray/na_singleton.c:708:25: error: >> ?Py_TPFLAGS_CHECKTYPES? undeclared here (not in a function) >> numpy/core/src/multiarray/common.c:48:1: warning: ?_use_default_type? >> defined but not used >> numpy/core/src/multiarray/ctors.h:93:1: warning: ?_arrays_overlap? >> declared ?static? but never defined >> numpy/core/src/multiarray/scalartypes.c.src:2251:1: warning: >> ?gentype_getsegcount? defined but not used >> numpy/core/src/multiarray/scalartypes.c.src:2269:1: warning: >> ?gentype_getcharbuf? defined but not used >> numpy/core/src/multiarray/mapping.c:110:1: warning: ?_array_ass_item? >> defined but not used >> numpy/core/src/multiarray/number.c:266:1: warning: ?array_divide? defined >> but not used >> numpy/core/src/multiarray/number.c:464:1: warning: ?array_inplace_divide? >> defined but not used >> numpy/core/src/multiarray/buffer.c:25:1: warning: ?array_getsegcount? >> defined but not used >> numpy/core/src/multiarray/buffer.c:58:1: warning: ?array_getwritebuf? >> defined but not used >> numpy/core/src/multiarray/buffer.c:71:1: warning: ?array_getcharbuf? >> defined but not used >> numpy/core/src/multiarray/na_mask.c:681:1: warning: >> ?PyArray_GetMaskInversionFunction? defined but not used >> In file included from numpy/core/src/multiarray/scalartypes.c.src:25:0, >> ???????????????? from >> numpy/core/src/multiarray/multiarraymodule_onefile.c:10: >> numpy/core/src/multiarray/_datetime.h:9:1: warning: function declaration >> isn?t a prototype >> In file included from >> numpy/core/src/multiarray/multiarraymodule_onefile.c:13:0: >> numpy/core/src/multiarray/datetime.c:33:1: warning: function declaration >> isn?t a prototype >> In file included from >> numpy/core/src/multiarray/multiarraymodule_onefile.c:17:0: >> numpy/core/src/multiarray/arraytypes.c.src: In function ?VOID_getitem?: >> numpy/core/src/multiarray/arraytypes.c.src:643:9: warning: passing >> argument 2 of ?PyArray_SetBaseObject? 
from incompatible pointer type >> >> build/src.linux-x86_64-3.2/numpy/core/include/numpy/__multiarray_api.h:763:12: >> note: expected ?struct PyObject *? but argument is of type ?struct >> PyArrayObject *? >> In file included from >> numpy/core/src/multiarray/multiarraymodule_onefile.c:44:0: >> numpy/core/src/multiarray/nditer_pywrap.c: In function >> ?npyiter_subscript?: >> numpy/core/src/multiarray/nditer_pywrap.c:2395:29: warning: passing >> argument 1 of ?PySlice_GetIndices? from incompatible pointer type >> /usr/local/include/python3.2m/sliceobject.h:38:5: note: expected ?struct >> PyObject *? but argument is of type ?struct PySliceObject *? >> numpy/core/src/multiarray/nditer_pywrap.c: In function >> ?npyiter_ass_subscript?: >> numpy/core/src/multiarray/nditer_pywrap.c:2440:29: warning: passing >> argument 1 of ?PySlice_GetIndices? from incompatible pointer type >> /usr/local/include/python3.2m/sliceobject.h:38:5: note: expected ?struct >> PyObject *? but argument is of type ?struct PySliceObject *? >> In file included from >> numpy/core/src/multiarray/multiarraymodule_onefile.c:53:0: >> numpy/core/src/multiarray/na_singleton.c: At top level: >> numpy/core/src/multiarray/na_singleton.c:708:25: error: >> ?Py_TPFLAGS_CHECKTYPES? undeclared here (not in a function) >> numpy/core/src/multiarray/common.c:48:1: warning: ?_use_default_type? >> defined but not used >> numpy/core/src/multiarray/ctors.h:93:1: warning: ?_arrays_overlap? >> declared ?static? but never defined >> numpy/core/src/multiarray/scalartypes.c.src:2251:1: warning: >> ?gentype_getsegcount? defined but not used >> numpy/core/src/multiarray/scalartypes.c.src:2269:1: warning: >> ?gentype_getcharbuf? defined but not used >> numpy/core/src/multiarray/mapping.c:110:1: warning: ?_array_ass_item? >> defined but not used >> numpy/core/src/multiarray/number.c:266:1: warning: ?array_divide? defined >> but not used >> numpy/core/src/multiarray/number.c:464:1: warning: ?array_inplace_divide? >> defined but not used >> numpy/core/src/multiarray/buffer.c:25:1: warning: ?array_getsegcount? >> defined but not used >> numpy/core/src/multiarray/buffer.c:58:1: warning: ?array_getwritebuf? >> defined but not used >> numpy/core/src/multiarray/buffer.c:71:1: warning: ?array_getcharbuf? >> defined but not used >> numpy/core/src/multiarray/na_mask.c:681:1: warning: >> ?PyArray_GetMaskInversionFunction? defined but not used >> error: Command "gcc -pthread -DNDEBUG -g -fwrapv -O3 -Wall >> -Wstrict-prototypes -fPIC -Inumpy/core/include >> -Ibuild/src.linux-x86_64-3.2/numpy/core/include/numpy >> -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core >> -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath >> -Inumpy/core/src/npysort -Inumpy/core/include >> -I/usr/local/include/python3.2m >> -Ibuild/src.linux-x86_64-3.2/numpy/core/src/multiarray >> -Ibuild/src.linux-x86_64-3.2/numpy/core/src/umath -c >> numpy/core/src/multiarray/multiarraymodule_onefile.c -o >> build/temp.linux-x86_64-3.2/numpy/core/src/multiarray/multiarraymodule_onefile.o" >> failed with exit status 1 >> >> >> >> Thanks for the prompt responses. That fixes the build problem for both Python3.1 and Python3.2. I got some test errors below but I guess you are working on those. 
Bruce $ python3 -c "import numpy; numpy.test()" Running unit tests for numpy NumPy version 2.0.0.dev-965a5c6 NumPy is installed in /usr/lib64/python3.2/site-packages/numpy Python version 3.2 (r32:88445, Feb 21 2011, 21:11:06) [GCC 4.6.0 20110212 (Red Hat 4.6.0-0.7)] nose version 1.0.0 ..............S.......EFF.....E............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................K...................................................................................................................................................................................................K..................................................................................................K......................K..........................................................................................................S......................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................./usr/lib64/python3.2/site-packages/numpy/lib/format.py:575: ResourceWarning: unclosed file <_io.BufferedReader name='/tmp/tmpfmmo7x'> mode=mode, offset=offset) ......................................................................................................................................................................................................................../usr/lib64/python3.2/subprocess.py:460: ResourceWarning: unclosed file <_io.BufferedReader name=3> return Popen(*popenargs, **kwargs).wait() /usr/lib64/python3.2/subprocess.py:460: ResourceWarning: unclosed file <_io.BufferedReader name=8> return Popen(*popenargs, **kwargs).wait() 
.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... ====================================================================== ERROR: test_datetime_array_str (test_datetime.TestDateTime) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/site-packages/numpy/core/tests/test_datetime.py", line 510, in test_datetime_array_str assert_equal(str(a), "['2011-03-16' '1920-01-01' '2013-05-19']") File "/usr/lib64/python3.2/site-packages/numpy/core/numeric.py", line 1400, in array_str return array2string(a, max_line_width, precision, suppress_small, ' ', "", str) File "/usr/lib64/python3.2/site-packages/numpy/core/arrayprint.py", line 459, in array2string separator, prefix, formatter=formatter) File "/usr/lib64/python3.2/site-packages/numpy/core/arrayprint.py", line 331, in _array2string _summaryEdgeItems, summary_insert)[:-1] File "/usr/lib64/python3.2/site-packages/numpy/core/arrayprint.py", line 502, in _formatArray word = format_function(a[-i]) + separator File "/usr/lib64/python3.2/site-packages/numpy/core/arrayprint.py", line 770, in __call__ casting=self.casting) TypeError: Cannot create a local timezone-based date string from a NumPy datetime without forcing 'unsafe' casting ====================================================================== ERROR: test_datetime_divide (test_datetime.TestDateTime) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/site-packages/numpy/core/tests/test_datetime.py", line 926, in test_datetime_divide assert_equal(tda / tdb, 6.0 / 9.0) TypeError: internal error: could not find appropriate datetime inner loop in true_divide ufunc ====================================================================== FAIL: test_datetime_as_string (test_datetime.TestDateTime) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/site-packages/numpy/core/tests/test_datetime.py", line 1166, in test_datetime_as_string '1959') File "/usr/lib64/python3.2/site-packages/numpy/testing/utils.py", line 313, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: ACTUAL: b'1959' DESIRED: '1959' ====================================================================== FAIL: test_datetime_as_string_timezone (test_datetime.TestDateTime) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib64/python3.2/site-packages/numpy/core/tests/test_datetime.py", line 1277, in test_datetime_as_string_timezone 
'2010-03-15T06:30Z') File "/usr/lib64/python3.2/site-packages/numpy/testing/utils.py", line 313, in assert_equal raise AssertionError(msg) AssertionError: Items are not equal: ACTUAL: b'2010-03-15T06:30Z' DESIRED: '2010-03-15T06:30Z' ---------------------------------------------------------------------- Ran 3063 tests in 37.701s FAILED (KNOWNFAIL=4, SKIP=2, errors=2, failures=2) From bsouthey at gmail.com Fri Aug 19 13:55:55 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 19 Aug 2011 12:55:55 -0500 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: <4E4E7972.9060807@gmail.com> <4E4E7FF9.6030006@gmail.com> Message-ID: On Fri, Aug 19, 2011 at 10:27 AM, Ralf Gommers wrote: > > > On Fri, Aug 19, 2011 at 5:23 PM, Bruce Southey wrote: >> >> On 08/19/2011 10:04 AM, Ralf Gommers wrote: >> >> On Fri, Aug 19, 2011 at 4:55 PM, Bruce Southey wrote: >>> >>> Hi, >>> I had to rebuild my Python2.6 as a 'normal' version. >>> >>> Anyhow, Python2.4, 2.5, 2.6 and 2.7 all build and pass the numpy tests. >>> >>> Curiously, only tests in Python2.7 give almost no warnings but all the >>> other Python2.x give lots of warnings - Python2.6 and Python2.7 are below. >>> My expectation is that all versions should behave the same regarding >>> printing messages. >> >> This is due to a change in Python 2.7 itself - deprecation warnings are >> not shown anymore by default. Furthermore, all those messages are unrelated >> to Mark's missing data commits. >> >> Cheers, >> Ralf >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> Yet: >> $ python2.6 -c "import numpy; numpy.test()" >> Running unit tests for numpy >> NumPy version 1.6.1 >> NumPy is installed in /usr/local/lib/python2.6/site-packages/numpy >> Python version 2.6.6 (r266:84292, Aug 19 2011, 09:21:38) [GCC 4.5.1 >> 20100924 (Red Hat 4.5.1-4)] >> nose version 1.0.0 >> >> ..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................K............................... 
>> ..................................................................K......................K.................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... >> .............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................. >> ................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................... >> ---------------------------------------------------------------------- >> Ran 3533 tests in 22.062s >> >> OK (KNOWNFAIL=3) >> >> Hence why I was curious about all the messages having not seen them. >> >> Is there some plan to cleanup these tests rather than 'hide' them? >> > Yes, that happens before every release. > > Ralf > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > Many thanks for the clarification! 
Bruce

From charlesr.harris at gmail.com  Fri Aug 19 14:07:45 2011
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Fri, 19 Aug 2011 12:07:45 -0600
Subject: [Numpy-discussion] NA masks for NumPy are ready to test
In-Reply-To: References: <4E4E701C.1030305@gmail.com>
Message-ID: 

On Fri, Aug 19, 2011 at 11:55 AM, Bruce Southey wrote:
> That fixes the build problem for both Python3.1 and Python3.2.
> I got some test errors below but I guess you are working on those.
> Ran 3063 tests in 37.701s
> FAILED (KNOWNFAIL=4, SKIP=2, errors=2, failures=2)

The 3.2 test errors aren't new. I'd fix the tests except I'm not sure if
Mark wants to modify the datetime stuff instead.

Chuck
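For reference, the datetime string failures quoted above come down to the 'S'
dtype holding byte strings, which never compare equal to str on Python 3. A
minimal illustration, separate from the datetime code (the example value and
the 'S4'/'U4' sizes are made up):

>>> import numpy as np
>>> a = np.array(['1959'], dtype='S4')   # 'S' is a byte-string dtype
>>> a[0]
b'1959'
>>> a[0] == '1959'                       # bytes vs. str is always unequal on Python 3
False
>>> a.astype('U4')[0]                    # casting to the unicode 'U' dtype gives str back
'1959'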
From mwwiebe at gmail.com  Fri Aug 19 14:12:10 2011
From: mwwiebe at gmail.com (Mark Wiebe)
Date: Fri, 19 Aug 2011 11:12:10 -0700
Subject: [Numpy-discussion] NA masks for NumPy are ready to test
In-Reply-To: References: <4E4E701C.1030305@gmail.com>
Message-ID: 

On Fri, Aug 19, 2011 at 11:07 AM, Charles R Harris <charlesr.harris at gmail.com> wrote:
> The 3.2 test errors aren't new. I'd fix the tests except I'm not sure if
> Mark wants to modify the datetime stuff instead.

I left them largely untouched because I found it weird that the 'S' data
type doesn't return strings in Python 3... I guess maybe the
datetime_as_string function should convert to the 'U' data type on Python 3
after building the 'S' array to work around this design choice. I'll look at
it after the NA stuff is wrapped up.

-Mark

From bsouthey at gmail.com  Fri Aug 19 14:37:28 2011
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 19 Aug 2011 13:37:28 -0500
Subject: [Numpy-discussion] NA masks for NumPy are ready to test
In-Reply-To: References: 
Message-ID: 

Hi,
Just some immediate minor observations that are really about trying to
be consistent:

1) Could you keep the display of the NA dtype the same as the array?
For example, NA dtype is displayed as '<f8' not 'float64' as that is
the array dtype.
>>> a = np.array([[1, 2, 3, np.NA], [3, 4, np.nan, 5]])
>>> a
array([[  1.,   2.,   3.,  NA],
       [  3.,   4.,  nan,   5.]])
>>> a.dtype
dtype('float64')
>>> a.sum()
NA(dtype='<f8')

2) Can the 'skipna' flag be added to the methods?
>>> a.sum(skipna=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'skipna' is an invalid keyword argument for this function
>>> np.sum(a, skipna=True)
nan

3) Can the skipna flag be extended to exclude other non-finite cases
like NaN?

4) Assigning np.NA needs a better error message, but the integer array
case is more informative:
>>> b = np.array([1, 2, 3, 4], dtype=np.float128)
>>> b[0] = np.NA
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: float() argument must be a string or a number
>>> j = np.array([1, 2, 3])
>>> j
array([1, 2, 3])
>>> j[0] = ina
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: int() argument must be a string or a number, not 'numpy.NAType'

But it is nice that np.NA 'adjusts' to the insertion array:
>>> b.flags.maskna = True
>>> ana
NA(dtype='
>>> b[0] = ana
>>> b[0]
NA(dtype='

5) Different display depending on masked state. That is, I think that
'maskna=True' should be displayed always when flags.maskna is True:
>>> j = np.array([1, 2, 3], dtype=np.int8)
>>> j
array([1, 2, 3], dtype=int8)
>>> j.flags.maskna = True
>>> j
array([1, 2, 3], maskna=True, dtype=int8)
>>> j[0] = np.NA
>>> j
array([NA, 2, 3], dtype=int8)  # I think it should still display 'maskna=True'.

Bruce

From youknowho2000 at yahoo.com  Fri Aug 19 14:38:23 2011
From: youknowho2000 at yahoo.com (Ian)
Date: Fri, 19 Aug 2011 11:38:23 -0700 (PDT)
Subject: [Numpy-discussion] Reconstruct multidimensional array from buffer without shape
Message-ID: <1313779103.55122.YahooMailNeo@web39408.mail.mud.yahoo.com>

Hello list,

I am storing a multidimensional array as binary in a Postgres 9.04
database. For retrieval of this array from the database I thought
frombuffer() was my solution, however I see that this constructs a
one-dimensional array.
I read in the documentation about the buffer parameter in the ndarray() constructor, but that requires the shape of the array. Is there a way to re-construct a multidimensional array from a buffer without knowing its shape? Thanks. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Fri Aug 19 14:44:02 2011 From: shish at keba.be (Olivier Delalleau) Date: Fri, 19 Aug 2011 14:44:02 -0400 Subject: [Numpy-discussion] Reconstruct multidimensional array from buffer without shape In-Reply-To: <1313779103.55122.YahooMailNeo@web39408.mail.mud.yahoo.com> References: <1313779103.55122.YahooMailNeo@web39408.mail.mud.yahoo.com> Message-ID: How could it be possible? If you only have the buffer data, there could be many different valid shapes associated to this data. -=- Olivier 2011/8/19 Ian > Hello list, > > I am storing a multidimensional array as binary in a Postgres 9.04 > database. For retrieval of this array from the database I thought > frombuffer() was my solution, however I see that this constructs a > one-dimensional array. I read in the documentation about the buffer > parameter in the ndarray() constructor, but that requires the shape of the > array. > > Is there a way to re-construct a multidimensional array from a buffer > without knowing its shape? > > Thanks. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Aug 19 14:44:18 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 19 Aug 2011 12:44:18 -0600 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: Message-ID: On Fri, Aug 19, 2011 at 12:37 PM, Bruce Southey wrote: > Hi, > Just some immediate minor observations that are really about trying to > be consistent: > > 1) Could you keep the display of the NA dtype be the same as the array? > For example, NA dtype is displayed as ' 'float64' as that is the array dtype. > >>> a=np.array([[1,2,3,np.NA], [3,4,np.nan,5]]) > >>> a > array([[ 1., 2., 3., NA], > [ 3., 4., nan, 5.]]) > >>> a.dtype > dtype('float64') > >>> a.sum() > NA(dtype=' > 2) Can the 'skipna' flag be added to the methods? > >>> a.sum(skipna=True) > Traceback (most recent call last): > File "", line 1, in > TypeError: 'skipna' is an invalid keyword argument for this function > >>> np.sum(a,skipna=True) > nan > > 3) Can the skipna flag be extended to exclude other non-finite cases like > NaN? > > 4) Assigning a np.NA needs a better error message but the Integer > array case is more informative: > >>> b=np.array([1,2,3,4], dtype=np.float128) > >>> b[0]=np.NA > Traceback (most recent call last): > File "", line 1, in > TypeError: float() argument must be a string or a number > > >>> j=np.array([1,2,3]) > >>> j > array([1, 2, 3]) > >>> j[0]=ina > Traceback (most recent call last): > File "", line 1, in > TypeError: int() argument must be a string or a number, not 'numpy.NAType' > > But it is nice that np.NA 'adjusts' to the insertion array: > >>> b.flags.maskna = True > >>> ana > NA(dtype=' >>> b[0]=ana > >>> b[0] > NA(dtype=' > 5) Different display depending on masked state. 
That is I think that > 'maskna=True' should be displayed always when flags.maskna is True : > >>> j=np.array([1,2,3], dtype=np.int8) > >>> j > array([1, 2, 3], dtype=int8) > >>> j.flags.maskna=True > >>> j > array([1, 2, 3], maskna=True, dtype=int8) > >>> j[0]=np.NA > >>> j > array([NA, 2, 3], dtype=int8) # Ithink it should still display > 'maskna=True'. > > My main peeve is that NA is upper case ;) I suppose that could use some discussion. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From youknowho2000 at yahoo.com Fri Aug 19 14:57:49 2011 From: youknowho2000 at yahoo.com (Ian) Date: Fri, 19 Aug 2011 11:57:49 -0700 (PDT) Subject: [Numpy-discussion] Reconstruct multidimensional array from buffer without shape In-Reply-To: References: <1313779103.55122.YahooMailNeo@web39408.mail.mud.yahoo.com> Message-ID: <1313780269.53187.YahooMailNeo@web39414.mail.mud.yahoo.com> Right. I'm new to NumPy so I figured I'd check if there was some nifty way of preserving the shape without storing it in the database that I hadn't discovered yet. No worries, I'll store the shape alongside the array. Thanks for the reply. Ian >________________________________ >From: Olivier Delalleau >To: Discussion of Numerical Python >Sent: Friday, August 19, 2011 11:44 AM >Subject: Re: [Numpy-discussion] Reconstruct multidimensional array from buffer without shape > > >How could it be possible? If you only have the buffer data, there could be many different valid shapes associated to this data. > >-=- Olivier > > >2011/8/19 Ian > >Hello list, >> >> >>I am storing a multidimensional array as binary in a Postgres 9.04 database. For retrieval of this array from the database I thought frombuffer() was my solution, however I see that this constructs a one-dimensional array. I read in the documentation about the buffer parameter in the ndarray() constructor, but that requires the shape of the array. >> >> >>Is there a way to re-construct a multidimensional array from a buffer without knowing its shape? >> >> >>Thanks. >>_______________________________________________ >>NumPy-Discussion mailing list >>NumPy-Discussion at scipy.org >>http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > >_______________________________________________ >NumPy-Discussion mailing list >NumPy-Discussion at scipy.org >http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.anton.letnes at gmail.com Fri Aug 19 15:13:51 2011 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Fri, 19 Aug 2011 20:13:51 +0100 Subject: [Numpy-discussion] Reconstruct multidimensional array from buffer without shape In-Reply-To: <1313780269.53187.YahooMailNeo@web39414.mail.mud.yahoo.com> References: <1313779103.55122.YahooMailNeo@web39408.mail.mud.yahoo.com> <1313780269.53187.YahooMailNeo@web39414.mail.mud.yahoo.com> Message-ID: <3F5EE6DA-BC39-4B47-81D8-C8A20DA90F58@gmail.com> On 19. aug. 2011, at 19.57, Ian wrote: > Right. I'm new to NumPy so I figured I'd check if there was some nifty way of preserving the shape without storing it in the database that I hadn't discovered yet. No worries, I'll store the shape alongside the array. Thanks for the reply. > I love the h5py package so I keep recommending it (and pytables is supposed to be good, I think?). h5py stores files in hdf5, which is readable from C,C++,fortran,java,python... It also keeps track of shape and you can store other metadata (e.g. strings) as desired. 
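If the raw-buffer route is kept instead, a minimal sketch (the database column layout is left open here) of storing the shape and dtype string next to the bytes and rebuilding the array with frombuffer:

import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)

# what goes into the database: the raw bytes plus enough metadata to rebuild
blob = a.tostring()                 # raw buffer
meta = (a.dtype.str, a.shape)       # e.g. ('<f8', (3, 4))

# reconstruction on the way back out
b = np.frombuffer(blob, dtype=meta[0]).reshape(meta[1])
assert (a == b).all()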
Also I believe the numpy format (see e.g. http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html#numpy.savez) can do the same, although I don't think performance scales as well for huge arrays, and it's not language-neutral (to my knowledge). Cheers Paul From mwwiebe at gmail.com Fri Aug 19 15:15:05 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 19 Aug 2011 12:15:05 -0700 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: Message-ID: On Fri, Aug 19, 2011 at 11:37 AM, Bruce Southey wrote: > Hi, > Just some immediate minor observations that are really about trying to > be consistent: > > 1) Could you keep the display of the NA dtype be the same as the array? > For example, NA dtype is displayed as ' 'float64' as that is the array dtype. > >>> a=np.array([[1,2,3,np.NA], [3,4,np.nan,5]]) > >>> a > array([[ 1., 2., 3., NA], > [ 3., 4., nan, 5.]]) > >>> a.dtype > dtype('float64') > >>> a.sum() > NA(dtype=' I suppose I can do it that way, sure. I think it would be good to change the 'float64' into ' 2) Can the 'skipna' flag be added to the methods? > >>> a.sum(skipna=True) > Traceback (most recent call last): > File "", line 1, in > TypeError: 'skipna' is an invalid keyword argument for this function > >>> np.sum(a,skipna=True) > nan > Yeah, but I think this is low priority compared to a lot of other things that need doing. The methods are written in C with a particular hardcoded implementation pattern, whereas with the functions in the numpy namespace I was able to adjust to call the ufunc reduce methods without much menial effort. 3) Can the skipna flag be extended to exclude other non-finite cases like > NaN? > That wasn't really within the scope of the original design, except for one particular case of the NA-bitpattern dtypes. It's possible to make a new mask and assign NA to the NaN values like this: a = [array with NaNs] aview = a.view(ownmaskna=True) aview[np.isnan(aview)] = np.NA np.sum(aview, skipna=True) 4) Assigning a np.NA needs a better error message but the Integer > array case is more informative: > >>> b=np.array([1,2,3,4], dtype=np.float128) > >>> b[0]=np.NA > Traceback (most recent call last): > File "", line 1, in > TypeError: float() argument must be a string or a number > > >>> j=np.array([1,2,3]) > >>> j > array([1, 2, 3]) > >>> j[0]=ina > Traceback (most recent call last): > File "", line 1, in > TypeError: int() argument must be a string or a number, not 'numpy.NAType' > I coded this up the way I did to ease the future transition to NA-bitpattern dtypes, which would handle this conversion from the NA object. The error message is being produced by CPython in both of these cases, so it looks like they didn't make their messages consistent. This could be changed to match the error message like this: >>> a = np.array([np.NA, 3]) >>> b = np.array([3,4]) >>> b[...] = a Traceback (most recent call last): File "", line 1, in ValueError: Cannot assign NA value to an array which does not support NAs > But it is nice that np.NA 'adjusts' to the insertion array: > >>> b.flags.maskna = True > >>> ana > NA(dtype=' >>> b[0]=ana > >>> b[0] > NA(dtype=' It should generally follow the NumPy type promotion rules, but may be a bit more liberal in places. > 5) Different display depending on masked state. 
That is I think that > 'maskna=True' should be displayed always when flags.maskna is True : > >>> j=np.array([1,2,3], dtype=np.int8) > >>> j > array([1, 2, 3], dtype=int8) > >>> j.flags.maskna=True > >>> j > array([1, 2, 3], maskna=True, dtype=int8) > >>> j[0]=np.NA > >>> j > array([NA, 2, 3], dtype=int8) # Ithink it should still display > 'maskna=True'. > This is just like how NumPy hides the dtype in some cases, it's hiding the maskna=True whenever it would be automatically detected from the input list. >>> np.array([1.0, 2.0]) array([ 1., 2.]) >>> np.array([1.0, 2.0], dtype=np.float32) array([ 1., 2.], dtype=float32) Cheers, Mark > > Bruce > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan at ajackson.org Fri Aug 19 16:01:25 2011 From: alan at ajackson.org (alan at ajackson.org) Date: Fri, 19 Aug 2011 15:01:25 -0500 Subject: [Numpy-discussion] Statistical distributions on samples In-Reply-To: References: Message-ID: <20110819150125.1649d49e@ajackson.org> I have applied the update to the documentation (although that function needs a general rewrite - later...) >On Mon, Aug 15, 2011 at 8:53 AM, Andrea Gavana wrote: > >> Hi Chris and All, >> >> On 12 August 2011 16:53, Christopher Jordan-Squire wrote: >> > Hi Andrea--An easy way to get something like this would be >> > >> > import numpy as np >> > import scipy.stats as stats >> > >> > sigma = #some reasonable standard deviation for your application >> > x = stats.norm.rvs(size=1000, loc=125, scale=sigma) >> > x = x[x>50] >> > x = x[x<200] >> > >> > That will give a roughly normal distribution to your velocities, as long >> as, >> > say, sigma<25. (I'm using the rule of thumb for the normal distribution >> that >> > normal random samples lie 3 standard deviations away from the mean about >> 1 >> > out of 350 times.) Though you won't be able to get exactly normal errors >> > about your mean since normal random samples can theoretically be of any >> > size. >> > >> > You can use this same process for any other distribution, as long as >> you've >> > chosen a scale variable so that the probability of samples being outside >> > your desired interval is really small. Of course, once again your random >> > errors won't be exactly from the distribution you get your original >> samples >> > from. >> >> Thank you for your suggestion. There are a couple of things I am not >> clear with, however. The first one (the easy one), is: let's suppose I >> need 200 values, and the accept/discard procedure removes 5 of them >> from the list. Is there any way to draw these 200 values from a bigger >> sample so that the accept/reject procedure will not interfere too >> much? And how do I get 200 values out of the bigger sample so that >> these values are still representative? >> > >FWIW, I'm not really advocating a truncated normal so much as making the >standard deviation small enough so that there's no real difference between a >true normal distribution and a truncated normal. > >If you're worried about getting exactly 200 samples, then you could sample N >with N>200 and such that after throwing out the ones that lie outside your >desired region you're left with M>200. Then just randomly pick 200 from >those M. That shouldn't bias anything as long as you randomly pick them. 
(Or >just pick the first 200, if you haven't done anything to impose any order on >the samples, such as sorting them by size.) But I'm not sure why you'd want >exactly 200 samples instead of some number of samples close to 200. > > >> >> Another thing, possibly completely unrelated. I am trying to design a >> toy Latin Hypercube script (just for my own understanding). I found >> this piece of code on the web (and I modified it slightly): >> >> def lhs(dist, size=100): >> ''' >> Latin Hypercube sampling of any distrbution. >> dist is is a scipy.stats random number generator >> such as stats.norm, stats.beta, etc >> parms is a tuple with the parameters needed for >> the specified distribution. >> >> :Parameters: >> - `dist`: random number generator from scipy.stats module. >> - `size` :size for the output sample >> ''' >> >> n = size >> >> perc = numpy.arange(0.0, 1.0, 1.0/n) >> numpy.random.shuffle(perc) >> >> smp = [stats.uniform(i,1.0/n).rvs() for i in perc] >> >> v = dist.ppf(smp) >> >> return v >> >> >> Now, I am not 100% clear of what the percent point function is (I have >> read around the web, but please keep in mind that my statistical >> skills are close to minus infinity). From this page: >> >> http://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm >> >> >The ppf is what's called the quantile function elsewhere. I do not know why >scipy calls it the ppf/percent point function. > >The quantile function is the inverse of the cumulative density function >(cdf). So dist.ppf(z) is the x such that P(dist <= x) = z. Roughly. (Things >get slightly more finicky if you think about discrete distributions because >then you have to pick what happens at the jumps in the cdf.) So >dist.ppf(0.5) gives the median of dist, and dist.ppf(0.25) gives the >lower/first quartile of dist. > > >> I gather that, if you plot the results of the ppf, with the horizontal >> axis as probability, the vertical axis goes from the smallest to the >> largest value of the cumulative distribution function. If i do this: >> >> numpy.random.seed(123456) >> >> distribution = stats.norm(loc=125, scale=25) >> >> my_lhs = lhs(distribution, 50) >> >> Will my_lhs always contain valid values (i.e., included between 50 and >> 200)? I assume the answer is no... but even if this was the case, is >> this my_lhs array ready to be used to setup a LHS experiment when I >> have multi-dimensional problems (in which all the variables are >> completely independent from each other - no correlation)? >> >> >I'm not really sure if the above function is doing the lhs you want. To >answer your question, it won't always generate values within [50,200]. If >size is large enough then you're dividing up the probability space evenly. >So even with the random perturbations (whose use I don't really understand), >you'll ensure that some of the samples you get when you apply the ppf will >correspond to the extremely low probability samples that are <50 or >200. > >-Chris JS > >My apologies for the idiocy of the questions. >> >> Andrea. >> >> "Imagination Is The Only Weapon In The War Against Reality." 
>> http://xoomer.alice.it/infinity77/ >> >> >>> import PyQt4.QtGui >> Traceback (most recent call last): >> File "", line 1, in >> ImportError: No module named PyQt4.QtGui >> >>> >> >>> import pygtk >> Traceback (most recent call last): >> File "", line 1, in >> ImportError: No module named pygtk >> >>> >> >>> import wx >> >>> >> >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> -- ----------------------------------------------------------------------- | Alan K. Jackson | To see a World in a Grain of Sand | | alan at ajackson.org | And a Heaven in a Wild Flower, | | www.ajackson.org | Hold Infinity in the palm of your hand | | Houston, Texas | And Eternity in an hour. - Blake | ----------------------------------------------------------------------- From ralf.gommers at googlemail.com Fri Aug 19 16:04:15 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 19 Aug 2011 22:04:15 +0200 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: Message-ID: On Fri, Aug 19, 2011 at 9:15 PM, Mark Wiebe wrote: > On Fri, Aug 19, 2011 at 11:37 AM, Bruce Southey wrote: > >> Hi, >> Just some immediate minor observations that are really about trying to >> be consistent: >> >> 1) Could you keep the display of the NA dtype be the same as the array? >> For example, NA dtype is displayed as '> 'float64' as that is the array dtype. >> >>> a=np.array([[1,2,3,np.NA], [3,4,np.nan,5]]) >> >>> a >> array([[ 1., 2., 3., NA], >> [ 3., 4., nan, 5.]]) >> >>> a.dtype >> dtype('float64') >> >>> a.sum() >> NA(dtype='> > > I suppose I can do it that way, sure. I think it would be good to change > the 'float64' into ' > I don't think that looks better. It would also screws up people's doctests again. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From amcmorl at gmail.com Fri Aug 19 16:04:03 2011 From: amcmorl at gmail.com (Angus McMorland) Date: Fri, 19 Aug 2011 16:04:03 -0400 Subject: [Numpy-discussion] numpy segfaults with ctypes Message-ID: Hi all, I'm giving this email a new subject, in case that helps it catch the attention of someone who can fix my problem. I currently cannot upgrade numpy from git to any date more recent than 10 July. Git commit feb8079070b8a659d7ee is the first that causes the problem (according to github, the commit was authored by walshb and committed by m-paradox, in case that jogs anyone's memory). I've tried taking a look at the code diff, but I'm afraid I'm just a user, rather than a developer, and it didn't make much sense. My problem is that python segfaults when I run it with the following code: > from ctypes import Structure, c_double > > #-- copied out of an xml2py generated file > class S(Structure): > ? ?pass > S._pack_ = 4 > S._fields_ = [ > ? ?('field', c_double * 2), > ? 
] > #-- > > import numpy as np > print np.version.version > s = S() > print "S", np.asarray(s.field) Thanks, Angus -- AJC McMorland Post-doctoral research fellow Neurobiology, University of Pittsburgh From mwwiebe at gmail.com Fri Aug 19 16:05:23 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 19 Aug 2011 13:05:23 -0700 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: Message-ID: On Fri, Aug 19, 2011 at 11:44 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Fri, Aug 19, 2011 at 12:37 PM, Bruce Southey wrote: > >> Hi, >> Just some immediate minor observations that are really about trying to >> be consistent: >> >> 1) Could you keep the display of the NA dtype be the same as the array? >> For example, NA dtype is displayed as '> 'float64' as that is the array dtype. >> >>> a=np.array([[1,2,3,np.NA], [3,4,np.nan,5]]) >> >>> a >> array([[ 1., 2., 3., NA], >> [ 3., 4., nan, 5.]]) >> >>> a.dtype >> dtype('float64') >> >>> a.sum() >> NA(dtype='> >> 2) Can the 'skipna' flag be added to the methods? >> >>> a.sum(skipna=True) >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: 'skipna' is an invalid keyword argument for this function >> >>> np.sum(a,skipna=True) >> nan >> >> 3) Can the skipna flag be extended to exclude other non-finite cases like >> NaN? >> >> 4) Assigning a np.NA needs a better error message but the Integer >> array case is more informative: >> >>> b=np.array([1,2,3,4], dtype=np.float128) >> >>> b[0]=np.NA >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: float() argument must be a string or a number >> >> >>> j=np.array([1,2,3]) >> >>> j >> array([1, 2, 3]) >> >>> j[0]=ina >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: int() argument must be a string or a number, not 'numpy.NAType' >> >> But it is nice that np.NA 'adjusts' to the insertion array: >> >>> b.flags.maskna = True >> >>> ana >> NA(dtype='> >>> b[0]=ana >> >>> b[0] >> NA(dtype='> >> 5) Different display depending on masked state. That is I think that >> 'maskna=True' should be displayed always when flags.maskna is True : >> >>> j=np.array([1,2,3], dtype=np.int8) >> >>> j >> array([1, 2, 3], dtype=int8) >> >>> j.flags.maskna=True >> >>> j >> array([1, 2, 3], maskna=True, dtype=int8) >> >>> j[0]=np.NA >> >>> j >> array([NA, 2, 3], dtype=int8) # Ithink it should still display >> 'maskna=True'. >> >> > My main peeve is that NA is upper case ;) I suppose that could use some > discussion. > There is some proliferation of cases in the NaN case: >>> np.nan nan >>> np.NAN nan >>> np.NaN nan The pros I see for NA over na are: * less confusion of NA vs nan (should this carry over to the np.isna function, should it be np.isNA according to this point?) * more comfortable for switching between NumPy and R when people have to use both at the same time The main con is: * Inconsistent with current nan, inf printing. Here's a hackish workaround: >>> np.na = np.NA >>> np.set_printoptions(nastr='na') >>> np.array([np.na, 2.0]) array([na, 2.]) What's your list of pros and cons? -Mark > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Fri Aug 19 16:11:40 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 19 Aug 2011 13:11:40 -0700 Subject: [Numpy-discussion] numpy segfaults with ctypes In-Reply-To: References: Message-ID: Hi, On Fri, Aug 19, 2011 at 1:04 PM, Angus McMorland wrote: > Hi all, > > I'm giving this email a new subject, in case that helps it catch the > attention of someone who can fix my problem. I currently cannot > upgrade numpy from git to any date more recent than 10 July. Git > commit feb8079070b8a659d7ee is the first that causes the problem > (according to github, the commit was authored by walshb and committed > by m-paradox, in case that jogs anyone's memory). I've tried taking a > look at the code diff, but I'm afraid I'm just a user, rather than a > developer, and it didn't make much sense. > > My problem is that python segfaults when I run it with the following code: > >> from ctypes import Structure, c_double >> >> #-- copied out of an xml2py generated file >> class S(Structure): >> ? ?pass >> S._pack_ = 4 >> S._fields_ = [ >> ? ?('field', c_double * 2), >> ? ] >> #-- >> >> import numpy as np >> print np.version.version >> s = S() >> print "S", np.asarray(s.field) Just to say, that that commit is also the commit that causes a segfault for np.lookfor: http://www.mail-archive.com/numpy-discussion at scipy.org/msg33114.html http://projects.scipy.org/numpy/ticket/1937 The latter ticket is closed because Mark's missing-data development branch does not have the segfault. I guess you could try that branch and see whether it fixes the problem? I guess also that means we'll have to merge in the missing data branch in order to fix the problem. See you, matthew From mwwiebe at gmail.com Fri Aug 19 18:46:50 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 19 Aug 2011 15:46:50 -0700 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: Message-ID: On Thu, Aug 18, 2011 at 2:43 PM, Mark Wiebe wrote: > It's taken a lot of changes to get the NA mask support to its current > point, but the code ready for some testing now. You can read the > work-in-progress release notes here: > > > https://github.com/m-paradox/numpy/blob/missingdata/doc/release/2.0.0-notes.rst > > To try it out, check out the missingdata branch from my github account, > here, and build in the standard way: > > https://github.com/m-paradox/numpy > > The things most important to test are: > > * Confirm that existing code still works correctly. I've tested against > SciPy and matplotlib. > * Confirm that the performance of code not using NA masks is the same or > better. > * Try to do computations with the NA values, find places they don't work > yet, and nominate unimplemented functionality important to you to be next on > the development list. The release notes have a preliminary list of > implemented/unimplemented functions. > * Report any crashes, build problems, or unexpected behaviors. > > In addition to adding the NA mask, I've also added features and done a few > performance changes here and there, like letting reductions like sum take > lists of axes instead of being a single axis or all of them. These changes > affect various bugs like http://projects.scipy.org/numpy/ticket/1143 and > http://projects.scipy.org/numpy/ticket/533. > With a new fix to the unitless reduction logic I just committed, the situation for bug http://projects.scipy.org/numpy/ticket/450 is also improved. Cheers, Mark > Thanks! 
> Mark > > Here's a small example run using NAs: > > >>> import numpy as np > >>> np.__version__ > '2.0.0.dev-8a5e2a1' > >>> a = np.random.rand(3,3,3) > >>> a.flags.maskna = True > >>> a[np.random.rand(3,3,3) < 0.5] = np.NA > >>> a > array([[[NA, NA, 0.11511708], > [ 0.46661454, 0.47565512, NA], > [NA, NA, NA]], > > [[NA, 0.57860351, NA], > [NA, NA, 0.72012669], > [ 0.36582123, NA, 0.76289794]], > > [[ 0.65322748, 0.92794386, NA], > [ 0.53745165, 0.97520989, 0.17515083], > [ 0.71219688, 0.5184328 , 0.75802805]]]) > >>> np.mean(a, axis=-1) > array([[NA, NA, NA], > [NA, NA, NA], > [NA, 0.56260412, 0.66288591]]) > >>> np.std(a, axis=-1) > array([[NA, NA, NA], > [NA, NA, NA], > [NA, 0.32710662, 0.10384331]]) > >>> np.mean(a, axis=-1, skipna=True) > /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2474: > RuntimeWarning: invalid value encountered in true_divide > um.true_divide(ret, rcount, out=ret, casting='unsafe') > array([[ 0.11511708, 0.47113483, nan], > [ 0.57860351, 0.72012669, 0.56435958], > [ 0.79058567, 0.56260412, 0.66288591]]) > >>> np.std(a, axis=-1, skipna=True) > /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2707: > RuntimeWarning: invalid value encountered in true_divide > um.true_divide(arrmean, rcount, out=arrmean, casting='unsafe') > /home/mwiebe/installtest/lib64/python2.7/site-packages/numpy/core/fromnumeric.py:2730: > RuntimeWarning: invalid value encountered in true_divide > um.true_divide(ret, rcount, out=ret, casting='unsafe') > array([[ 0. , 0.00452029, nan], > [ 0. , 0. , 0.19853835], > [ 0.13735819, 0.32710662, 0.10384331]]) > >>> np.std(a, axis=(1,2), skipna=True) > array([ 0.16786895, 0.15498008, 0.23811937]) > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dominique.orban at gmail.com Fri Aug 19 19:00:04 2011 From: dominique.orban at gmail.com (Dominique Orban) Date: Fri, 19 Aug 2011 23:00:04 +0000 Subject: [Numpy-discussion] ImportError: dynamic module does not define init function (initmultiarray) Message-ID: Dear list, I'm embedding Python inside a C program to pull functions from user-supplied Python modules. All is well except when the user-supplied module imports numpy. Requesting a stack trace when an exception occurs reveals the following: --- Traceback (most recent call last): File "/Users/dpo/.virtualenvs/matrox/matrox/curve.py", line 3, in import numpy as np File "/Users/dpo/.virtualenvs/matrox/lib/python2.7/site-packages/numpy/__init__.py", line 137, in import add_newdocs File "/Users/dpo/.virtualenvs/matrox/lib/python2.7/site-packages/numpy/add_newdocs.py", line 9, in from numpy.lib import add_newdoc File "/Users/dpo/.virtualenvs/matrox/lib/python2.7/site-packages/numpy/lib/__init__.py", line 4, in from type_check import * File "/Users/dpo/.virtualenvs/matrox/lib/python2.7/site-packages/numpy/lib/type_check.py", line 8, in import numpy.core.numeric as _nx File "/Users/dpo/.virtualenvs/matrox/lib/python2.7/site-packages/numpy/core/__init__.py", line 5, in import multiarray ImportError: dynamic module does not define init function (initmultiarray) --- (here, "curve.py" is the user-supplied module in question.) The symbol initmultiarray *is* defined in multiarray.so so I'm wondering if anybody has suggestions as to what the problem may be here. 
A bit of Googling reveals the following: * The 3rd example of Section 31.2.5 of http://www.swig.org/Doc1.3/Python.html says "This error is almost always caused when a bad name is given to the shared object file. For example, if you created a file example.so instead of _example.so you would get this error." * Item #2 in the FAQ at http://biggles.sourceforge.net/doc/1.5/faq says "This is a problem with your module search path. Python is loading [multiarray].so as a module instead of [multiarray].py" But I don't have any multiarray.py. I have other multiarray.so's, but they're not in my search path. And I'm not finding any _multiarray.so with a leading underscore. So I am lead to ask: should multiarray.so really be called _multiarray.so? If not, any idea what the problem is? I'm using Python 2.7.2 compiled as a framework using Homebrew on OSX 10.6.8 and Numpy 1.6.1 installed from PyPi a day or two ago. Thanks much in advance! -- Dominique From bsouthey at gmail.com Fri Aug 19 19:52:53 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Fri, 19 Aug 2011 18:52:53 -0500 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: Message-ID: On Fri, Aug 19, 2011 at 3:05 PM, Mark Wiebe wrote: > On Fri, Aug 19, 2011 at 11:44 AM, Charles R Harris > wrote: >> >> >> On Fri, Aug 19, 2011 at 12:37 PM, Bruce Southey >> wrote: >>> >>> Hi, >>> Just some immediate minor observations that are really about trying to >>> be consistent: >>> >>> 1) Could you keep the display of the NA dtype be the same as the array? >>> For example, NA dtype is displayed as '>> 'float64' as that is the array dtype. >>> ?>>> a=np.array([[1,2,3,np.NA], [3,4,np.nan,5]]) >>> >>> a >>> array([[ ?1., ? 2., ? 3., NA], >>> ? ? ? [ ?3., ? 4., ?nan, ? 5.]]) >>> >>> a.dtype >>> dtype('float64') >>> >>> a.sum() >>> NA(dtype='>> >>> 2) Can the 'skipna' flag be added to the methods? >>> >>> a.sum(skipna=True) >>> Traceback (most recent call last): >>> ?File "", line 1, in >>> TypeError: 'skipna' is an invalid keyword argument for this function >>> >>> np.sum(a,skipna=True) >>> nan >>> >>> 3) Can the skipna flag be extended to exclude other non-finite cases like >>> NaN? >>> >>> 4) Assigning a np.NA needs a better error message but the Integer >>> array case is more informative: >>> >>> b=np.array([1,2,3,4], dtype=np.float128) >>> >>> b[0]=np.NA >>> Traceback (most recent call last): >>> ?File "", line 1, in >>> TypeError: float() argument must be a string or a number >>> >>> >>> j=np.array([1,2,3]) >>> >>> j >>> array([1, 2, 3]) >>> >>> j[0]=ina >>> Traceback (most recent call last): >>> ?File "", line 1, in >>> TypeError: int() argument must be a string or a number, not >>> 'numpy.NAType' >>> >>> But it is nice that np.NA 'adjusts' to the insertion array: >>> >>> b.flags.maskna = True >>> >>> ana >>> NA(dtype='>> >>> b[0]=ana >>> >>> b[0] >>> NA(dtype='>> >>> 5) Different display depending on masked state. That is I think that >>> 'maskna=True' should be displayed always when flags.maskna is True : >>> >>> j=np.array([1,2,3], dtype=np.int8) >>> >>> j >>> array([1, 2, 3], dtype=int8) >>> >>> j.flags.maskna=True >>> >>> j >>> array([1, 2, 3], maskna=True, dtype=int8) >>> >>> j[0]=np.NA >>> >>> j >>> array([NA, 2, 3], dtype=int8) # Ithink it should still display >>> 'maskna=True'. >>> >> >> My main peeve is that NA is upper case ;) I suppose that could use some >> discussion. 
> > There is some proliferation of cases in the NaN case: >>>> np.nan > nan >>>> np.NAN > nan >>>> np.NaN > nan > The pros I see for NA over na are: > * less confusion of NA vs nan (should this carry over to the np.isna > function, should it be np.isNA according to this point?) > * more comfortable for switching between NumPy and R when people have to use > both at the same time > The main con is: > * Inconsistent with current nan, inf printing. Here's a hackish workaround: >>>> np.na = np.NA >>>> np.set_printoptions(nastr='na') >>>> np.array([np.na, 2.0]) > array([na, ?2.]) > What's your list of pros and cons? > -Mark > >> >> Chuck >> In part I sort of like to have NA and nan since poor eyesight/typing/editing avoiding problems dropping the last 'n'. Regarding nan/NAN, do you mean something like my ticket 1051? http://projects.scipy.org/numpy/ticket/1051 I do not care that much about the case (mixed case is not good) provided that there is only one to specify these. Also should np.isfinite() return False for np.NA? >>> np.isfinite([1,2,np.NA,4]) array([ True, True, NA, True], dtype=bool) Anyhow, many thanks for the replies to my observations and your amazing effect in getting this done. Bruce From josef.pktd at gmail.com Fri Aug 19 23:19:01 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 19 Aug 2011 23:19:01 -0400 Subject: [Numpy-discussion] life expectancy of scipy.stats nan statistics Message-ID: I'm just looking at http://projects.scipy.org/scipy/ticket/1200 I agree with Ralf that the bias keyword should be changed to ddof as in the numpy functions. For functions in scipy.stats, and statistics in general, I prefer the usual axis=0 default. However, I think these functions, like scipy.stats.nanstd, should be replaced by corresponding numpy functions, which might happen relatively soon. But how soon? Is it worth deprecating bias in scipy 0.10, and then deprecate again for removal in 0.11 or 0.12? Josef From zelbier at gmail.com Sat Aug 20 03:47:10 2011 From: zelbier at gmail.com (Olivier Verdier) Date: Sat, 20 Aug 2011 09:47:10 +0200 Subject: [Numpy-discussion] Can't mix np.newaxis with boolean indexing In-Reply-To: References: Message-ID: Your syntax is not as intuitive as you may think. Suppose I take a matrix instead a = np.array([1,2,3,4]).reshape(2,2) b = (a>1) # np.array([[False,True],[True,True]]) How would a[b,np.newaxis] be supposed to work? Note that other (simple) slices work perfectly with newaxis, such as a[:1,np.newaxis] == Olivier On 19 August 2011 17:50, Benjamin Root wrote: > I could have sworn that this use to work: > > import numpy as np > a = np.random.random((100,)) > b = (a > 0.5) > print a[b, np.newaxis] > > But instead, I get this error on the latest master: > > Traceback (most recent call last): > ? File "", line 1, in > TypeError: long() argument must be a string or a number, not 'NoneType' > > Note, the simple work-around would be "a[b][:, np.newaxis]", but I can't > imagine why the intuitive syntax would not be valid. > > Thanks, > Ben Root > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From heshiming at gmail.com Sat Aug 20 04:01:42 2011 From: heshiming at gmail.com (He Shiming) Date: Sat, 20 Aug 2011 16:01:42 +0800 Subject: [Numpy-discussion] RGB <-> HSV in numpy? Message-ID: Hi, I'm wondering how to do RGB <-> HSV conversion in numpy. 
I found a couple solutions through stackoverflow, but somehow they can't be used in my array format. I understand the concept of conversion, but I'm not that familiar with numpy. My source buffer format is 'RGBA' sequence. I can take it into numpy via: numpy.fromstring(data, 'B').astype('I'). So that nd[0::4] becomes the array for the red channel. After color manipulation, I'll convert it back by nd.astype('B').tostring(). How do I run RGB <-> HSV conversion on the nd array? I'd like to keep SV values in the range of 0-1, and H in 0-360. Thank you. -- Best regards, He Shiming From wardefar at iro.umontreal.ca Sat Aug 20 04:17:26 2011 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Sat, 20 Aug 2011 04:17:26 -0400 Subject: [Numpy-discussion] RGB <-> HSV in numpy? In-Reply-To: References: Message-ID: On 2011-08-20, at 4:01 AM, He Shiming wrote: > Hi, > > I'm wondering how to do RGB <-> HSV conversion in numpy. I found a > couple solutions through stackoverflow, but somehow they can't be used > in my array format. I understand the concept of conversion, but I'm > not that familiar with numpy. > > My source buffer format is 'RGBA' sequence. I can take it into numpy > via: numpy.fromstring(data, 'B').astype('I'). So that nd[0::4] becomes > the array for the red channel. After color manipulation, I'll convert > it back by nd.astype('B').tostring(). There are functions for this available in scikits.image: http://stefanv.github.com/scikits.image/api/scikits.image.color.html Although you may need to reshape it with reshape(arr, (width, height, 4)) or something similar first. David From gael.varoquaux at normalesup.org Sat Aug 20 04:49:56 2011 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 20 Aug 2011 10:49:56 +0200 Subject: [Numpy-discussion] SVD does not converge on "clean" matrix In-Reply-To: <6d45c5e06b9e78cd9f56cf3ff2d604a5@telecom-paristech.fr> References: <203aa1b32d794c238d32cb8d29036cc2.squirrel@webmail1.telecom-paristech.fr> <1313332026.61861.YahooMailNeo@web34404.mail.mud.yahoo.com> <6d45c5e06b9e78cd9f56cf3ff2d604a5@telecom-paristech.fr> Message-ID: <20110820084956.GA16846@phare.normalesup.org> On Sun, Aug 14, 2011 at 09:15:35PM +0200, Charanpal Dhanjal wrote: > Incidentally, I am confused as to why numpy calls the lapack lite > routines - when I call numpy.show_config() it seems to have detected my > ATLAS libraries and I would have expected it to use those. My rule of thumb is to never use numpy for linear algebra, but only scipy. It avoids such confusions that I have seen so often, including with my colleagues. My 2 cents, Gael From mwwiebe at gmail.com Sat Aug 20 12:26:45 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 20 Aug 2011 09:26:45 -0700 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: Message-ID: On Fri, Aug 19, 2011 at 11:37 AM, Bruce Southey wrote: > Hi, > Just some immediate minor observations that are really about trying to > be consistent: > > 1) Could you keep the display of the NA dtype be the same as the array? > For example, NA dtype is displayed as ' 'float64' as that is the array dtype. > >>> a=np.array([[1,2,3,np.NA], [3,4,np.nan,5]]) > >>> a > array([[ 1., 2., 3., NA], > [ 3., 4., nan, 5.]]) > >>> a.dtype > dtype('float64') > >>> a.sum() > NA(dtype=' I've implemented this: >>> a=np.array([[1,2,3,np.NA], [3,4,np.nan,5]]) >>> a array([[ 1., 2., 3., NA], [ 3., 4., nan, 5.]]) >>> a.dtype dtype('float64') >>> a.sum() NA(dtype='float64') > 2) Can the 'skipna' flag be added to the methods? 
> >>> a.sum(skipna=True) > Traceback (most recent call last): > File "", line 1, in > TypeError: 'skipna' is an invalid keyword argument for this function > >>> np.sum(a,skipna=True) > nan > > 3) Can the skipna flag be extended to exclude other non-finite cases like > NaN? > > 4) Assigning a np.NA needs a better error message but the Integer > array case is more informative: > >>> b=np.array([1,2,3,4], dtype=np.float128) > >>> b[0]=np.NA > Traceback (most recent call last): > File "", line 1, in > TypeError: float() argument must be a string or a number > > >>> j=np.array([1,2,3]) > >>> j > array([1, 2, 3]) > >>> j[0]=ina > Traceback (most recent call last): > File "", line 1, in > TypeError: int() argument must be a string or a number, not 'numpy.NAType' > Here are the new error messages in these cases: >>> b=np.array([1,2,3,4], dtype=np.float128) >>> b[0]=np.NA Traceback (most recent call last): File "", line 1, in ValueError: Cannot assign NA to an array which does not support NAs >>> j=np.array([1,2,3]) >>> j[0] = np.NA Traceback (most recent call last): File "", line 1, in ValueError: Cannot assign NA to an array which does not support NAs Cheers, Mark > > But it is nice that np.NA 'adjusts' to the insertion array: > >>> b.flags.maskna = True > >>> ana > NA(dtype=' >>> b[0]=ana > >>> b[0] > NA(dtype=' > 5) Different display depending on masked state. That is I think that > 'maskna=True' should be displayed always when flags.maskna is True : > >>> j=np.array([1,2,3], dtype=np.int8) > >>> j > array([1, 2, 3], dtype=int8) > >>> j.flags.maskna=True > >>> j > array([1, 2, 3], maskna=True, dtype=int8) > >>> j[0]=np.NA > >>> j > array([NA, 2, 3], dtype=int8) # Ithink it should still display > 'maskna=True'. > > Bruce > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sat Aug 20 12:32:40 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 20 Aug 2011 09:32:40 -0700 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: Message-ID: On Fri, Aug 19, 2011 at 4:52 PM, Bruce Southey wrote: > On Fri, Aug 19, 2011 at 3:05 PM, Mark Wiebe wrote: > > On Fri, Aug 19, 2011 at 11:44 AM, Charles R Harris > > wrote: > >> > >> > >> > >> > >> My main peeve is that NA is upper case ;) I suppose that could use some > >> discussion. > > > > There is some proliferation of cases in the NaN case: > >>>> np.nan > > nan > >>>> np.NAN > > nan > >>>> np.NaN > > nan > > The pros I see for NA over na are: > > * less confusion of NA vs nan (should this carry over to the np.isna > > function, should it be np.isNA according to this point?) > > * more comfortable for switching between NumPy and R when people have to > use > > both at the same time > > The main con is: > > * Inconsistent with current nan, inf printing. Here's a hackish > workaround: > >>>> np.na = np.NA > >>>> np.set_printoptions(nastr='na') > >>>> np.array([np.na, 2.0]) > > array([na, 2.]) > > What's your list of pros and cons? > > -Mark > > > >> > >> Chuck > >> > > In part I sort of like to have NA and nan since poor > eyesight/typing/editing avoiding problems dropping the last 'n'. > > Regarding nan/NAN, do you mean something like my ticket 1051? 
> http://projects.scipy.org/numpy/ticket/1051 > I do not care that much about the case (mixed case is not good) > provided that there is only one to specify these. > > Also should np.isfinite() return False for np.NA? > >>> np.isfinite([1,2,np.NA,4]) > array([ True, True, NA, True], dtype=bool) > This is correct according to the NA computational model in the NEP. An NA represents a value which exists but is unknown, and could be anything representable by the type. Thus, it could the a finite number or it could be inf, meaning the answer to isfinite could be True or it could be False, and the answer must be NA. > Anyhow, many thanks for the replies to my observations and your > amazing effect in getting this done. > Thanks for taking the time to take the software for a spin, I appreciate your feedback! -Mark > > Bruce > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Sat Aug 20 16:17:02 2011 From: ben.root at ou.edu (Benjamin Root) Date: Sat, 20 Aug 2011 15:17:02 -0500 Subject: [Numpy-discussion] Can't mix np.newaxis with boolean indexing In-Reply-To: References: Message-ID: On Sat, Aug 20, 2011 at 2:47 AM, Olivier Verdier wrote: > Your syntax is not as intuitive as you may think. > > Suppose I take a matrix instead > > a = np.array([1,2,3,4]).reshape(2,2) > b = (a>1) # np.array([[False,True],[True,True]]) > > How would a[b,np.newaxis] be supposed to work? > > Note that other (simple) slices work perfectly with newaxis, such as > a[:1,np.newaxis] > > == Olivier > > Personally, I would have expected it to flatten the results and added a dimension: a[b, np.newaxis] array([[2], [3], [4]]) or a[np.newaxis, b] array([[2, 3, 4]]) I mean, it flattens the results anyway when doing boolean indexing for multi-dimensional arrays, so someone doing that should expect that anyway. At the very least, I think maybe we could have a better error message than just saying that long() can't take a NoneType? Thanks, Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris at simplistix.co.uk Sat Aug 20 18:37:19 2011 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 20 Aug 2011 15:37:19 -0700 Subject: [Numpy-discussion] Decimal arrays? Message-ID: <4E50371F.1070105@simplistix.co.uk> Hi All, What's the best type of array to use for decimal values? (ie: where I care about precision and want to avoid any possible rounding errors) cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From robert.kern at gmail.com Sat Aug 20 18:38:20 2011 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 20 Aug 2011 17:38:20 -0500 Subject: [Numpy-discussion] Decimal arrays? In-Reply-To: <4E50371F.1070105@simplistix.co.uk> References: <4E50371F.1070105@simplistix.co.uk> Message-ID: On Sat, Aug 20, 2011 at 17:37, Chris Withers wrote: > Hi All, > > What's the best type of array to use for decimal values? > (ie: where I care about precision and want to avoid any possible > rounding errors) dtype=object -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From chris at simplistix.co.uk Sat Aug 20 18:49:40 2011 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 20 Aug 2011 15:49:40 -0700 Subject: [Numpy-discussion] Decimal arrays? In-Reply-To: References: <4E50371F.1070105@simplistix.co.uk> Message-ID: <4E503A04.8070600@simplistix.co.uk> On 20/08/2011 15:38, Robert Kern wrote: > On Sat, Aug 20, 2011 at 17:37, Chris Withers wrote: >> Hi All, >> >> What's the best type of array to use for decimal values? >> (ie: where I care about precision and want to avoid any possible >> rounding errors) > > dtype=object Thanks! What are the performance implications, if any, of this array type? Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From chris at simplistix.co.uk Sat Aug 20 19:18:55 2011 From: chris at simplistix.co.uk (Chris Withers) Date: Sat, 20 Aug 2011 16:18:55 -0700 Subject: [Numpy-discussion] saving groups of numpy arrays to disk Message-ID: <4E5040DF.9090303@simplistix.co.uk> Hi All, I've got a tree of nested dicts that at their leaves end in numpy arrays of identical sizes. What's the easiest way to persist these to disk so that I can pick up with them where I left off? What's the most "correct" way to do so? I'm using IPython if that makes things easier... I had wondered about PyTables, but that seems a bit too heavyweight for this, unless I'm missing something? cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From charlesr.harris at gmail.com Sat Aug 20 19:41:49 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 20 Aug 2011 17:41:49 -0600 Subject: [Numpy-discussion] Decimal arrays? In-Reply-To: <4E503A04.8070600@simplistix.co.uk> References: <4E50371F.1070105@simplistix.co.uk> <4E503A04.8070600@simplistix.co.uk> Message-ID: On Sat, Aug 20, 2011 at 4:49 PM, Chris Withers wrote: > On 20/08/2011 15:38, Robert Kern wrote: > > On Sat, Aug 20, 2011 at 17:37, Chris Withers > wrote: > >> Hi All, > >> > >> What's the best type of array to use for decimal values? > >> (ie: where I care about precision and want to avoid any possible > >> rounding errors) > > > > dtype=object > > Thanks! > > What are the performance implications, if any, of this array type? > > It will be slower, the effect depends on how much data/computation you have. You need to look into using the decimal objects in decimal module, i.e., import decimal. Note that 1/7 still isn't going to be exact in decimals. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sat Aug 20 20:08:04 2011 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 20 Aug 2011 19:08:04 -0500 Subject: [Numpy-discussion] Decimal arrays? In-Reply-To: <4E503A04.8070600@simplistix.co.uk> References: <4E50371F.1070105@simplistix.co.uk> <4E503A04.8070600@simplistix.co.uk> Message-ID: On Sat, Aug 20, 2011 at 17:49, Chris Withers wrote: > On 20/08/2011 15:38, Robert Kern wrote: >> On Sat, Aug 20, 2011 at 17:37, Chris Withers ?wrote: >>> Hi All, >>> >>> What's the best type of array to use for decimal values? >>> (ie: where I care about precision and want to avoid any possible >>> rounding errors) >> >> dtype=object > > Thanks! > > What are the performance implications, if any, of this array type? It will be slower than floats, obviously, because there will be several C function calls and plenty of extra instructions for each operation on each element. 
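For instance, a small sketch of what the object-dtype route looks like in practice (exact reprs may vary between versions):

>>> import numpy as np
>>> from decimal import Decimal
>>> a = np.array([Decimal('0.1'), Decimal('0.2'), Decimal('0.3')], dtype=object)
>>> a.sum()          # arithmetic dispatches to Decimal, so this stays exact
Decimal('0.6')
>>> a * 2
array([Decimal('0.2'), Decimal('0.4'), Decimal('0.6')], dtype=object)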
But it will be somewhat faster than looping in Python. Note that decimal.Decimal objects are implemented in pure Python, so you will also be paying for Python function call overhead and other costs going through ceval.c several times over. You may want to try the cdecimal package: http://pypi.python.org/pypi/cdecimal/ This will provide an extension module defining an extension type implemented in C. You can avoid the ceval.c overhead entirely during the array operation. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From heshiming at gmail.com Sun Aug 21 00:52:12 2011 From: heshiming at gmail.com (He Shiming) Date: Sun, 21 Aug 2011 12:52:12 +0800 Subject: [Numpy-discussion] RGB <-> HSV in numpy? In-Reply-To: References: Message-ID: On Sat, Aug 20, 2011 at 4:17 PM, David Warde-Farley wrote: > > There are functions for this available in scikits.image: > > http://stefanv.github.com/scikits.image/api/scikits.image.color.html > > Although you may need to reshape it with reshape(arr, (width, height, 4)) or something similar first. > > David Thanks, I'll check it out. -- Best regards, He Shiming From paul.anton.letnes at gmail.com Sun Aug 21 03:19:30 2011 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Sun, 21 Aug 2011 08:19:30 +0100 Subject: [Numpy-discussion] saving groups of numpy arrays to disk In-Reply-To: <4E5040DF.9090303@simplistix.co.uk> References: <4E5040DF.9090303@simplistix.co.uk> Message-ID: <83268EE4-194B-4EF0-B370-8F3D1EA6EA6D@gmail.com> Hi! On 21. aug. 2011, at 00.18, Chris Withers wrote: > Hi All, > > I've got a tree of nested dicts that at their leaves end in numpy arrays > of identical sizes. > > What's the easiest way to persist these to disk so that I can pick up > with them where I left off? Probably giving them names like trunk_branch_leaf.txt with numpy.savetxt, if you want it quick and dirty. Or possibly, use numpy.savez directly on your dict. > What's the most "correct" way to do so? > > I'm using IPython if that makes things easier... > > I had wondered about PyTables, but that seems a bit too heavyweight for > this, unless I'm missing something? In my (perhaps limited) experience, hdf5 is great for this. I personally use h5py, I believe it is a little lighter. You get the "tree structure" for free in something like a directory structure: /branch/leaf /trunk/branch/leaf etc. Cheers Paul From ben_w_123 at yahoo.co.uk Sun Aug 21 04:53:05 2011 From: ben_w_123 at yahoo.co.uk (Ben Walsh) Date: Sun, 21 Aug 2011 09:53:05 +0100 (BST) Subject: [Numpy-discussion] Segfault for np.lookfor In-Reply-To: References: Message-ID: Hi My bad. Very sorry about that, guys. There's a patch for this here: https://github.com/walshb/numpy/tree/fix_np_lookfor_segv And I submitted a pull request. I'll add something to the tests too when I have a little more time. 
Cheers Ben > ------------------------------ > > Message: 3 > Date: Tue, 16 Aug 2011 12:15:22 -0700 > From: Matthew Brett > Subject: Re: [Numpy-discussion] Segfault for np.lookfor > To: Discussion of Numerical Python > Message-ID: > > Content-Type: text/plain; charset=ISO-8859-1 > >> >> I opened ticket #1937 for this > >> From git-bisect it looks like the culprit is: > > feb8079070b8a659d7eee1b4acbddf470fd8a81d is the first bad commit > commit feb8079070b8a659d7eee1b4acbddf470fd8a81d > Author: Ben Walsh > Date: Sun Jul 10 12:52:52 2011 +0100 > > BUT: Stop _array_find_type trying to make every list element a > subtype of bool. > > Just to remind me, my procedure was: > > <~/tmp/testfor.py> > #!/usr/bin/env python > import sys > from functools import partial > from subprocess import check_call, Popen, PIPE, CalledProcessError > > caller = partial(check_call, shell=True) > popener = partial(Popen, stdout=PIPE, stderr=PIPE, shell=True) > > try: > caller('git clean -fxd') > caller('python setup.py build_ext -i') > except CalledProcessError: > sys.exit(125) # untestable > > proc = popener('python -c "%s"' % > """import sys > import numpy as np > np.lookfor('cos', output=sys.stdout) > """) > > stdout, stderr = proc.communicate() > if 'Segmentation fault' in stderr: > sys.exit(1) # bad > sys.exit(0) # good > > > Then, I established the v1.6.1 did not have the segfault, and (man git-bisect): > > git co main-master # current upstream master > git bisect start HEAD v1.6.1 -- > git bisect run ~/tmp/testfor.py > > See y'all, > > Matthew > From pav at iki.fi Sun Aug 21 08:24:46 2011 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 21 Aug 2011 12:24:46 +0000 (UTC) Subject: [Numpy-discussion] saving groups of numpy arrays to disk References: <4E5040DF.9090303@simplistix.co.uk> Message-ID: On Sat, 20 Aug 2011 16:18:55 -0700, Chris Withers wrote: > I've got a tree of nested dicts that at their leaves end in numpy arrays > of identical sizes. > > What's the easiest way to persist these to disk so that I can pick up > with them where I left off? Depends on your requirements. You can use Python pickling, if you do *not* have a requirement for: - real persistence, i.e., being able to easily read the data years later - a standard data format - access from non-Python programs - safety against malicious parties (unpickling can execute some code in the input -- although this is possible to control) then you can use Python pickling: import pickle file = open('out.pck', 'wb') pickle.dump(file, tree, protocol=pickle.HIGHEST_PROTOCOL) file.close() file = open('out.pck', 'rb') tree = pickle.load(file) file.close() This should just work (TM) directly with your tree-of-dicts-and-arrays. > What's the most "correct" way to do so? > > I'm using IPython if that makes things easier... > > I had wondered about PyTables, but that seems a bit too heavyweight for > this, unless I'm missing something? If I had one or more of the requirements listed above, I'd use the HDF5 format, via either PyTables or h5py. If I'd just need to cache the trees, then I'd use pickling. I think the only reason to consider heavy-weighedness is distribution: does your target audience have these libraries already installed (they are pre-installed in several Python-for-science distributions), and how difficult would it be for you to ship them with your stuff, or to require the users to install them. 
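For the HDF5 route, a rough sketch (assuming h5py is installed; the group/dataset layout shown is just one possible mapping) of walking a tree of nested dicts into groups and datasets and back:

import h5py
import numpy as np

def save_tree(group, tree):
    # dicts become HDF5 groups, array leaves become datasets
    for key, value in tree.items():
        if isinstance(value, dict):
            save_tree(group.create_group(key), value)
        else:
            group.create_dataset(key, data=np.asarray(value))

def load_tree(group):
    out = {}
    for key, value in group.items():
        if isinstance(value, h5py.Group):
            out[key] = load_tree(value)
        else:
            out[key] = value[...]   # read the dataset back into memory
    return out

tree = {'run1': {'pos': np.zeros((4, 3)), 'vel': np.ones((4, 3))},
        'run2': {'pos': np.arange(12.0).reshape(4, 3)}}

f = h5py.File('tree.h5', 'w')
save_tree(f, tree)
f.close()

f = h5py.File('tree.h5', 'r')
restored = load_tree(f)
f.close()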
-- Pauli Virtanen From torgil.svensson at gmail.com Sun Aug 21 16:07:16 2011 From: torgil.svensson at gmail.com (Torgil Svensson) Date: Sun, 21 Aug 2011 22:07:16 +0200 Subject: [Numpy-discussion] Can't mix np.newaxis with boolean indexing In-Reply-To: References: Message-ID: Since the result is one-dimensional after using boolean indexing you can always do: a[b][:, np.newaxis] array([[2], [3], [4]]) a[b][np.newaxis, :] array([[2, 3, 4]]) //Torgil On Sat, Aug 20, 2011 at 10:17 PM, Benjamin Root wrote: > On Sat, Aug 20, 2011 at 2:47 AM, Olivier Verdier wrote: >> >> Your syntax is not as intuitive as you may think. >> >> Suppose I take a matrix instead >> >> a = np.array([1,2,3,4]).reshape(2,2) >> b = (a>1) # np.array([[False,True],[True,True]]) >> >> How would a[b,np.newaxis] be supposed to work? >> >> Note that other (simple) slices work perfectly with newaxis, such as >> a[:1,np.newaxis] >> >> == Olivier >> > > Personally, I would have expected it to flatten the results and added a > dimension: > > a[b, np.newaxis] > array([[2], > ???????? [3], > ???????? [4]]) > > or > > a[np.newaxis, b] > array([[2, 3, 4]]) > > I mean, it flattens the results anyway when doing boolean indexing for > multi-dimensional arrays, so someone doing that should expect that anyway. > > At the very least, I think maybe we could have a better error message than > just saying that long() can't take a NoneType? > > Thanks, > Ben Root > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From ben.root at ou.edu Sun Aug 21 16:39:02 2011 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 21 Aug 2011 15:39:02 -0500 Subject: [Numpy-discussion] Can't mix np.newaxis with boolean indexing In-Reply-To: References: Message-ID: On Sunday, August 21, 2011, Torgil Svensson wrote: > Since the result is one-dimensional after using boolean indexing you > can always do: > > a[b][:, np.newaxis] > array([[2], > [3], > [4]]) > > a[b][np.newaxis, :] > array([[2, 3, 4]]) > > //Torgil Correct, which I already noted as a workaround in my first email. The point I am making is that that shouldn't be necessary because of generic programming concepts, or a better error message should be emitted in case a developer didn't know that he was doing Boolean indexing. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From heshiming at gmail.com Sun Aug 21 23:39:15 2011 From: heshiming at gmail.com (He Shiming) Date: Mon, 22 Aug 2011 11:39:15 +0800 Subject: [Numpy-discussion] RGB <-> HSV in numpy? In-Reply-To: References: Message-ID: > On Sat, Aug 20, 2011 at 4:17 PM, David Warde-Farley > wrote: > > Thanks, I'll check it out. > > -- > Best regards, > He Shiming > Hi again. Project scikits.image appeared to be difficult to install under ubuntu. It complains about something related to OpenCV, and I didn't see any option to compile without it. I'm wondering if there are any simpler solutions, without using scikits.image or scipy, just numpy plus calculations. All I'm trying to do is to convert this algorithm: http://code.activestate.com/recipes/576919-python-rgb-and-hsv-conversion/ to numpy flavor. -- Best regards, He Shiming From ralf.gommers at googlemail.com Mon Aug 22 02:21:33 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Mon, 22 Aug 2011 08:21:33 +0200 Subject: [Numpy-discussion] RGB <-> HSV in numpy? 
In-Reply-To: References: Message-ID: On Mon, Aug 22, 2011 at 5:39 AM, He Shiming wrote: > > On Sat, Aug 20, 2011 at 4:17 PM, David Warde-Farley > > wrote: > > > > Thanks, I'll check it out. > > > > -- > > Best regards, > > He Shiming > > > > Hi again. Project scikits.image appeared to be difficult to install > under ubuntu. It complains about something related to OpenCV, and I > didn't see any option to compile without it. I'm wondering if there > are any simpler solutions, without using scikits.image or scipy, just > numpy plus calculations. All I'm trying to do is to convert this > algorithm: > http://code.activestate.com/recipes/576919-python-rgb-and-hsv-conversion/ > to numpy flavor. > > You can use this file standalone without installing scikits.image: https://github.com/stefanv/scikits.image/blob/master/scikits/image/color/colorconv.py Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From heshiming at gmail.com Mon Aug 22 02:29:29 2011 From: heshiming at gmail.com (He Shiming) Date: Mon, 22 Aug 2011 14:29:29 +0800 Subject: [Numpy-discussion] RGB <-> HSV in numpy? In-Reply-To: References: Message-ID: On Mon, Aug 22, 2011 at 2:21 PM, Ralf Gommers wrote: > > > You can use this file standalone without installing scikits.image: > https://github.com/stefanv/scikits.image/blob/master/scikits/image/color/colorconv.py > > Ralf > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > Thanks Ralf. I've managed to extract the conversion routines from this colorconv.py, and adapted it on RGBA sequence. -- Best regards, He Shiming From mdickinson at enthought.com Mon Aug 22 03:18:57 2011 From: mdickinson at enthought.com (Mark Dickinson) Date: Mon, 22 Aug 2011 08:18:57 +0100 Subject: [Numpy-discussion] Decimal arrays? In-Reply-To: References: <4E50371F.1070105@simplistix.co.uk> <4E503A04.8070600@simplistix.co.uk> Message-ID: On Sun, Aug 21, 2011 at 1:08 AM, Robert Kern wrote: > You may want to try the cdecimal package: > > ?http://pypi.python.org/pypi/cdecimal/ I'll second this suggestion. cdecimal is an extraordinarily carefully written and well-tested (almost) drop-in replacement for the decimal module, and well worth a try. It would probably be in the Python standard library by now if anyone had had proper time to review it... Mark From konrad.hinsen at fastmail.net Mon Aug 22 03:36:15 2011 From: konrad.hinsen at fastmail.net (Konrad Hinsen) Date: Mon, 22 Aug 2011 09:36:15 +0200 Subject: [Numpy-discussion] Bug or feature? Message-ID: <00EE14FD-C521-4197-8C94-5B7E53EAA246@fastmail.net> Hi everyone, I just stumbled on a behavior in NumPy for which I can't find an explanation in the documentation. I wonder whether this is a bug or an undocumented (or badly documented) feature: -------------------------------------------------------------------------------------- import numpy t = numpy.dtype([("rotation", numpy.float64, (3, 3)), ("translation", numpy.float64, (3,))]) # works a1 = numpy.array([], dtype=t) # doesn't work a2 = numpy.array((), dtype=t) # -> ValueError: size of tuple must match number of fields. -------------------------------------------------------------------------------------- According to my understanding of how numpy.array should work, it shouldn't make a difference if the first argument is a list or a tuple, but in this case there is a difference. Konrad. 
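For reference, a tuple whose length matches the number of fields is accepted, because it is then read as a single record -- a small, untested illustration reusing the dtype t from above:

# hypothetical follow-up, using the dtype t defined above
rec = numpy.array((numpy.eye(3), numpy.zeros(3)), dtype=t)       # 0-d array, one record
a3 = numpy.array([(numpy.eye(3), numpy.zeros(3))] * 2, dtype=t)  # shape (2,), list of records

The empty tuple fails only because its size (0) does not match the number of fields (2), which is what the ValueError above is complaining about.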
From stefan-usenet at bytereef.org Mon Aug 22 08:30:34 2011 From: stefan-usenet at bytereef.org (Stefan Krah) Date: Mon, 22 Aug 2011 14:30:34 +0200 Subject: [Numpy-discussion] memoryview shape/strides representation for ndim = 0 Message-ID: <20110822123034.GA7743@sleipnir.bytereef.org> Hello, Numpy arrays and memoryview currently have different representations for shape and strides if ndim = 0: >>> from numpy import * >>> x = array(9, int32) >>> x.ndim 0 >>> x.shape () >>> x.strides () >>> m = memoryview(x) >>> m.ndim 0L >>> m.shape is None True >>> m.strides is None True I think the Numpy representation is nicer. Also, I think that memoryviews should attempt to mimic the underlying object as closely as possible. Since the ndim = 0 case probably only occurs in Numpy, it might be possible to change the representation in memoryview. Travis, was the "shape is None" representation used for compatibility with ctypes? Would it be possible or advisable to use the Numpy representation? Stefan Krah From mdickinson at enthought.com Mon Aug 22 08:35:56 2011 From: mdickinson at enthought.com (Mark Dickinson) Date: Mon, 22 Aug 2011 13:35:56 +0100 Subject: [Numpy-discussion] memoryview shape/strides representation for ndim = 0 In-Reply-To: <20110822123034.GA7743@sleipnir.bytereef.org> References: <20110822123034.GA7743@sleipnir.bytereef.org> Message-ID: On Mon, Aug 22, 2011 at 1:30 PM, Stefan Krah wrote: > Numpy arrays and memoryview currently have different representations > for shape and strides if ndim = 0: > >>>> from numpy import * >>>> x = array(9, int32) >>>> x.ndim > 0 >>>> x.shape > () >>>> x.strides > () >>>> m = memoryview(x) >>>> m.ndim > 0L >>>> m.shape is None > True >>>> m.strides is None > True > > > I think the Numpy representation is nicer. Also, I think that memoryviews > should attempt to mimic the underlying object as closely as possible. Agreed on both points. If there's no good reason for m.shape and m.strides to be None, I think it should be changed. Mark From teoliphant at gmail.com Mon Aug 22 08:50:06 2011 From: teoliphant at gmail.com (Travis Oliphant) Date: Mon, 22 Aug 2011 07:50:06 -0500 Subject: [Numpy-discussion] Bug or feature? In-Reply-To: <00EE14FD-C521-4197-8C94-5B7E53EAA246@fastmail.net> References: <00EE14FD-C521-4197-8C94-5B7E53EAA246@fastmail.net> Message-ID: <32C3323B-4398-41A3-9A0D-3C8006DB14E6@enthought.com> This goes into the category of "feature". Structured arrays use tuples to indicate a record. So, (only) when using structured arrays as a dtype, there is a difference between lists and tuples. In this case, array sees the tuple and expects it to have 2 elements to match the number of fields in 2. Best, -Travis On Aug 22, 2011, at 2:36 AM, Konrad Hinsen wrote: > Hi everyone, > > I just stumbled on a behavior in NumPy for which I can't find an > explanation in the documentation. I wonder whether this is a bug or an > undocumented (or badly documented) feature: > > -------------------------------------------------------------------------------------- > import numpy > > t = numpy.dtype([("rotation", numpy.float64, (3, 3)), > ("translation", numpy.float64, (3,))]) > > # works > a1 = numpy.array([], dtype=t) > > # doesn't work > a2 = numpy.array((), dtype=t) > # -> ValueError: size of tuple must match number of fields. 
> -------------------------------------------------------------------------------------- > > According to my understanding of how numpy.array should work, it > shouldn't make a difference if the first argument is a list or a > tuple, but in this case there is a difference. > > Konrad. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com From chris at simplistix.co.uk Mon Aug 22 11:07:11 2011 From: chris at simplistix.co.uk (Chris Withers) Date: Mon, 22 Aug 2011 08:07:11 -0700 Subject: [Numpy-discussion] Decimal arrays? In-Reply-To: References: <4E50371F.1070105@simplistix.co.uk> <4E503A04.8070600@simplistix.co.uk> Message-ID: <4E52709F.4050300@simplistix.co.uk> On 22/08/2011 00:18, Mark Dickinson wrote: > On Sun, Aug 21, 2011 at 1:08 AM, Robert Kern wrote: >> You may want to try the cdecimal package: >> >> http://pypi.python.org/pypi/cdecimal/ > > I'll second this suggestion. cdecimal is an extraordinarily carefully > written and well-tested (almost) drop-in replacement for the decimal > module, and well worth a try. It would probably be in the Python > standard library by now if anyone had had proper time to review it... Who would need to review it? I'm surprised this isn't in EPD... any ideas why? cheers, Chris -- Simplistix - Content Management, Batch Processing & Python Consulting - http://www.simplistix.co.uk From mdickinson at enthought.com Mon Aug 22 11:10:23 2011 From: mdickinson at enthought.com (Mark Dickinson) Date: Mon, 22 Aug 2011 16:10:23 +0100 Subject: [Numpy-discussion] Decimal arrays? In-Reply-To: <4E52709F.4050300@simplistix.co.uk> References: <4E50371F.1070105@simplistix.co.uk> <4E503A04.8070600@simplistix.co.uk> <4E52709F.4050300@simplistix.co.uk> Message-ID: On Mon, Aug 22, 2011 at 4:07 PM, Chris Withers wrote: > On 22/08/2011 00:18, Mark Dickinson wrote: >> >> On Sun, Aug 21, 2011 at 1:08 AM, Robert Kern >> ?wrote: >>> >>> You may want to try the cdecimal package: >>> >>> ?http://pypi.python.org/pypi/cdecimal/ >> >> I'll second this suggestion. ?cdecimal is an extraordinarily carefully >> written and well-tested (almost) drop-in replacement for the decimal >> module, and well worth a try. ?It would probably be in the Python >> standard library by now if anyone had had proper time to review it... > > Who would need to review it? Well, anyone who has the time and understands the domain, really; it's just useful to have a second pair of eyes going through the code. Putting several thousands of lines of unreviewed C code into the Python standard library is a bit of a no-no. Mark From amcmorl at gmail.com Mon Aug 22 12:23:00 2011 From: amcmorl at gmail.com (Angus McMorland) Date: Mon, 22 Aug 2011 12:23:00 -0400 Subject: [Numpy-discussion] numpy segfaults with ctypes In-Reply-To: References: Message-ID: On 19 August 2011 16:11, Matthew Brett wrote: > Hi, > > On Fri, Aug 19, 2011 at 1:04 PM, Angus McMorland wrote: >> Hi all, >> >> I'm giving this email a new subject, in case that helps it catch the >> attention of someone who can fix my problem. I currently cannot >> upgrade numpy from git to any date more recent than 10 July. Git >> commit feb8079070b8a659d7ee is the first that causes the problem >> (according to github, the commit was authored by walshb and committed >> by m-paradox, in case that jogs anyone's memory). 
I've tried taking a >> look at the code diff, but I'm afraid I'm just a user, rather than a >> developer, and it didn't make much sense. >> >> My problem is that python segfaults when I run it with the following code: >> >>> from ctypes import Structure, c_double >>> >>> #-- copied out of an xml2py generated file >>> class S(Structure): >>> ? ?pass >>> S._pack_ = 4 >>> S._fields_ = [ >>> ? ?('field', c_double * 2), >>> ? ] >>> #-- >>> >>> import numpy as np >>> print np.version.version >>> s = S() >>> print "S", np.asarray(s.field) > > Just to say, that that commit is also the commit that causes a > segfault for np.lookfor: > > http://www.mail-archive.com/numpy-discussion at scipy.org/msg33114.html > http://projects.scipy.org/numpy/ticket/1937 > > The latter ticket is closed because Mark's missing-data development > branch does not have the segfault. > > I guess you could try that branch and see whether it fixes the problem? > > I guess also that means we'll have to merge in the missing data branch > in order to fix the problem. Thanks for the reply Matthew. The latest commit d7b12a3 fixes the problem. Angus. > See you, > > matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- AJC McMorland Post-doctoral research fellow Neurobiology, University of Pittsburgh From matthew.brett at gmail.com Mon Aug 22 12:59:02 2011 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 22 Aug 2011 09:59:02 -0700 Subject: [Numpy-discussion] Segfault for np.lookfor In-Reply-To: References: Message-ID: Hi, On Sun, Aug 21, 2011 at 1:53 AM, Ben Walsh wrote: > > Hi > > My bad. Very sorry about that, guys. > > There's a patch for this here: > > https://github.com/walshb/numpy/tree/fix_np_lookfor_segv > > And I submitted a pull request. I'll add something to the tests too when I > have a little more time. Thanks a lot - no criticism intended - just life in the wilds of tracking trunk... Cheers, Matthew From robert.kern at gmail.com Mon Aug 22 16:51:03 2011 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 22 Aug 2011 15:51:03 -0500 Subject: [Numpy-discussion] Decimal arrays? In-Reply-To: <4E52709F.4050300@simplistix.co.uk> References: <4E50371F.1070105@simplistix.co.uk> <4E503A04.8070600@simplistix.co.uk> <4E52709F.4050300@simplistix.co.uk> Message-ID: On Mon, Aug 22, 2011 at 10:07, Chris Withers wrote: > On 22/08/2011 00:18, Mark Dickinson wrote: >> On Sun, Aug 21, 2011 at 1:08 AM, Robert Kern ?wrote: >>> You may want to try the cdecimal package: >>> >>> ? http://pypi.python.org/pypi/cdecimal/ >> >> I'll second this suggestion. ?cdecimal is an extraordinarily carefully >> written and well-tested (almost) drop-in replacement for the decimal >> module, and well worth a try. ?It would probably be in the Python >> standard library by now if anyone had had proper time to review it... > > Who would need to review it? > > I'm surprised this isn't in EPD... any ideas why? No one has asked for it, to my knowledge. We do provide it in our PyPI repository, so $ enpkg cdecimal should install it if you are an EPD subscriber. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From konrad.hinsen at fastmail.net Tue Aug 23 02:28:27 2011 From: konrad.hinsen at fastmail.net (Konrad Hinsen) Date: Tue, 23 Aug 2011 08:28:27 +0200 Subject: [Numpy-discussion] Bug or feature? In-Reply-To: <32C3323B-4398-41A3-9A0D-3C8006DB14E6@enthought.com> References: <00EE14FD-C521-4197-8C94-5B7E53EAA246@fastmail.net> <32C3323B-4398-41A3-9A0D-3C8006DB14E6@enthought.com> Message-ID: <00A2AF9E-5E2B-4042-816E-80D4934675C9@fastmail.net> On 22 Aug 2011, at 14:50, Travis Oliphant wrote: > This goes into the category of "feature". Structured arrays use > tuples to indicate a record. So, (only) when using structured > arrays as a dtype, there is a difference between lists and > tuples. In this case, array sees the tuple and expects it to have > 2 elements to match the number of fields in 2. Thanks, that sounds reasonable. But is this role of tuples in the creation of structured arrays documented anywhere? The documentation on structured arrays concentrates on specifying the dtype. All I could find about array construction is a few examples. Konrad. From stefan-usenet at bytereef.org Tue Aug 23 08:10:48 2011 From: stefan-usenet at bytereef.org (Stefan Krah) Date: Tue, 23 Aug 2011 14:10:48 +0200 Subject: [Numpy-discussion] PyBUF_SIMPLE/PyBUF_FORMAT: casts to unsigned bytes Message-ID: <20110823121048.GA14594@sleipnir.bytereef.org> Hello, PEP-3118 presumably intended that a PyBUF_SIMPLE request should cast the original buffer's data type to 'B' (unsigned bytes). Here is a one-dimensional example that currently occurs in Lib/test/test_multiprocessing: >>> import array, io >>> a = array.array('i', [1,2,3,4,5]) >>> m = memoryview(a) >>> m.format 'i' >>> buf = io.BytesIO(bytearray(5*8)) >>> buf.readinto(m) buf.readinto() calls PyObject_AsWriteBuffer(), which requests a simple buffer from the memoryview, thus casting the 'i' data type to the implied type 'B'. The consumer can see that a cast has occurred because the new buffer's format field is NULL. This seems fine for the one-dimensional case. Numpy currently also allows such casts for multidimensional contiguous and non-contiguous arrays. See below for the examples; I don't want to distract from the main point of the post, which is this: I'm seeking a clear specification for the Python documentation that determines under what circumstances casts to 'B' should succeed. I'll formulate the points as statements for clarity, but in fact they are also questions: 1) An exporter of a C-contiguous array with ndim <= 1 MUST honor a PyBUF_SIMPLE request, setting format, shape and strides to NULL and itemsize to 1. As a corner case, an array with ndim = 0, format = "L" (or other) would also morph into a buffer of unsigned bytes. test_ctypes currently makes use of this. 2) An exporter of a C-contiguous buffer with ndim > 1 MUST honor a PyBUF_SIMPLE request, setting format, shape, and strides to NULL and itemsize to 1. 3) An exporter of a buffer that is not C-contiguous MUST raise BufferError in response to a PyBUF_SIMPLE request. Why am I looking for such rigid rules? The problem with memoryview is that it has to act as a re-exporter itself. For several reasons (performance of chained memoryviews, garbage collection, early release, etc.) it has been decided that the new memoryview object has a managed buffer that takes a snapshot of the original exporter's buffer (See: http://bugs.python.org/issue10181). Now, since getbuffer requests to the memoryview object cannot be redirected to the original object, strict rules are needed for memory_getbuf(). 
Could you agree with these rules? Point 2) isn't clear from the PEP itself. I assumed it because Numpy currently allows it, and it appears harmless. Stefan Krah Examples: ========= Cast a multidimensional contiguous array: ----------------------------------------- I think itemsize in the result should be 1. [_testbuffer.ndarray is from http://hg.python.org/features/pep-3118#memoryview] >>> from _testbuffer import * >>> from numpy import * >>> from _testbuffer import ndarray as pyarray >>> >>> exporter = ndarray(shape=[3,4], dtype="L") # Issue a PyBUF_SIMPLE request to 'exporter' and act as a re-exporter: >>> x = pyarray(exporter, getbuf=PyBUF_SIMPLE) >>> x.len 96 >>> x.shape () >>> x.strides () >>> x.format '' >>> x.itemsize # I think this should be 1, not 8. 8 Cast a multidimensional non-contiguous array: --------------------------------------------- This is clearly not right, since y.buf points to a location that the consumer cannot handle without shape and strides. >>> nd = ndarray(buffer=bytearray(96), shape=[3,4], dtype="L") [182658 refs] >>> exporter = nd[::-1, ::-2] [182661 refs] >>> exporter array([[0, 0], [0, 0], [0, 0]], dtype=uint64) [182659 refs] >>> y = pyarray(exporter, getbuf=PyBUF_SIMPLE) [182665 refs] >>> y.len 48 [182666 refs] >>> y.strides () [182666 refs] >>> y.shape () [182666 refs] >>> y.format '' [182666 refs] >>> y.itemsize 8 [182666 refs] From stefan-usenet at bytereef.org Tue Aug 23 08:16:40 2011 From: stefan-usenet at bytereef.org (Stefan Krah) Date: Tue, 23 Aug 2011 14:16:40 +0200 Subject: [Numpy-discussion] memoryview shape/strides representation for ndim = 0 In-Reply-To: References: <20110822123034.GA7743@sleipnir.bytereef.org> Message-ID: <20110823121640.GB14594@sleipnir.bytereef.org> Mark Dickinson wrote: > On Mon, Aug 22, 2011 at 1:30 PM, Stefan Krah wrote: > > Numpy arrays and memoryview currently have different representations > > for shape and strides if ndim = 0: [...] > > I think the Numpy representation is nicer. Also, I think that memoryviews > > should attempt to mimic the underlying object as closely as possible. > > Agreed on both points. If there's no good reason for m.shape and > m.strides to be None, I think it should be changed. Excellent, I'll go ahead with it then (in the feature repo). Stefan Krah From nadavh at visionsense.com Tue Aug 23 10:33:16 2011 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 23 Aug 2011 07:33:16 -0700 Subject: [Numpy-discussion] Wrong treatment of byte order? Message-ID: <26FC23E7C398A64083C980D16001012D246DFC5FB5@VA3DIAXVS361.RED001.local> My system is a 64 bit gentoo linux on core i7 machine. Numpy version 1.6.1 and pyton(s) 2.7.2 and 3.2.1 Problem summary: I tried t invert a matrix of explicit little endian byte-order and got an error. The inversion run properly with a native byte order, and I get a wrong answer with not error message when the matrix is set to big-endian. 
mat is a 3x3 float64 array >> import numpy as N >>> mat.dtype.byteorder '<' >>> N.linalg.inv(mat) # Refuse to ibvert Traceback (most recent call last): File "", line 1, in N.linalg.inv(mat) File "/usr/lib64/python2.7/site-packages/numpy/linalg/linalg.py", line 445, in inv return wrap(solve(a, identity(a.shape[0], dtype=a.dtype))) File "/usr/lib64/python2.7/site-packages/numpy/linalg/linalg.py", line 326, in solve results = lapack_routine(n_eq, n_rhs, a, n_eq, pivots, b, n_eq, 0) LapackError: Parameter a has non-native byte order in lapack_lite.dgesv >>> N.linalg.inv(mat.newbyteorder('=')) # OK array([[ 0.09234453, 0.46163744, 0.2713108 ], [ 0.48886135, 0.51230859, 0.2277598 ], [ 0.48303131, 0.82571266, 0.17551993]]) >>> N.linalg.inv(mat.newbyteorder('>')) # WRONG !!! array([[ 2.39051169e-159, -7.70643158e-157, 5.34087235e-160], [ 2.11823992e+305, 2.37224043e+307, -4.31607382e+304], [ -1.26608299e+304, -1.43225563e+306, 7.22233688e+303]]) Nadav -------------- next part -------------- An HTML attachment was scrubbed... URL: From derek at astro.physik.uni-goettingen.de Tue Aug 23 12:07:08 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Tue, 23 Aug 2011 18:07:08 +0200 Subject: [Numpy-discussion] Efficient way to load a 1Gb file? In-Reply-To: References: Message-ID: <781AF0C6-B761-4ABB-9798-9385582536E5@astro.physik.uni-goettingen.de> On 11.08.2011, at 8:50PM, Russell E. Owen wrote: > It seems a shame that loadtxt has no argument for predicted length, > which would allow preallocation and less appending/copying data. > > And yes...reading the whole file first to figure out how many elements > it has seems sensible to me -- at least as a switchable behavior, and > preferably the default. 1Gb isn't that large in modern systems, but > loadtxt is filing up all 6Gb of RAM reading it! 1 GB is indeed not much in terms of disk space these days, but using text files for such data amounts is nonetheless very much non-state-of-the-art ;-) That said, of course there is no justification to use excessive amounts of memory where it could be avoided! Implementing the above scheme for npyio is not quite as straightforward as in the example I gave before, mainly for the following reasons: loadtxt also has to deal with more complex data like structured arrays, plus comments, empty lines etc., meaning it has to find and count the actual valid data lines. Ideally, genfromtxt, which offers yet more functionality to deal with missing data, should offer the same options, but they would be certainly more difficult to implement there. More than 6 GB is still remarkable - from what info I found in the web, lists seem to consume ~24 Bytes/element, i.e. 3 times more than a final float64 array. The text representation would typically take 10-20 char's for one float (though with <12 digits, they could usually be read as float32 without loss of precision). Thus a factor >6 seems quite extreme, unless the file is full of (relatively) short integers... But this also means copying of the final array would still have a relatively low memory footprint compared to the buffer list, thus using some kind of mutable array type for reading should be a reasonable solution as well. Unfortunately fromiter is not of that much use here since it only reads 1D-arrays. I haven't tried to use Chris' accumulator class yet, so for now I did go the 2x read approach with loadtxt, it turned out to add only ~10% to the read-in time. 
For compressed files this goes up to 30-50%, but once physical memory is exhausted it should probably actually become faster. I've made a pull request https://github.com/numpy/numpy/pull/144 implementing that option as a switch 'prescan'; could you review it in particular regarding the following: Is the option reasonably named and documented? In the case the allocated array does not match the input data (which really should never happen), right now just a warning is issued, filling any excess buffer with zeros or discarding remaining input data - should this rather raise an IndexError? No prediction if/when I might be able to provide this for genfromtxt, sorry! Cheers, Derek From fperez.net at gmail.com Tue Aug 23 16:13:04 2011 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 23 Aug 2011 13:13:04 -0700 (PDT) Subject: [Numpy-discussion] Bug or feature? In-Reply-To: <00A2AF9E-5E2B-4042-816E-80D4934675C9@fastmail.net> References: <00EE14FD-C521-4197-8C94-5B7E53EAA246@fastmail.net> <32C3323B-4398-41A3-9A0D-3C8006DB14E6@enthought.com> <00A2AF9E-5E2B-4042-816E-80D4934675C9@fastmail.net> Message-ID: <4e5409d0.0295e50a.4548.0bd7@mx.google.com> On Mon, Aug 22, 2011 at 11:28 PM, Konrad Hinsen wrote: > Thanks, that sounds reasonable. But is this role of tuples in the > creation of structured arrays documented anywhere? The documentation > on structured arrays concentrates on specifying the dtype. All I could > find about array construction is a few examples. Note from the peanut gallery: this is one area of the docs that could really use a separate .. warning:: (or at least .. note::) with the info Travis gave, because it's not obvious at all, and deciphering the resulting error message is pretty tricky if you've never seen it before. I know it's bitten me more than once and I always scratch my head for a few minutes... It just occurred to me that it would be very cool to have in the docs a few standalone HowTo documents on selected topics. Over the years I've found some of the Python howtos extremely useful, and I was very happy when I saw they started including them in the bundled docs. They complement very nicely the reference/api and explain certain key topics in a more tutorial fashion. Off the top of my head, here are a few ideas for enterprising souls to make a very useful contribution with a howto on each of these topics: - dtype/structured arrays and record arrays - fancy indexing, broadcasting, lib.index_tricks (if Anne could find the time to write this one, we'd be eternally grateful) - ctypes and cython for C interfacing and optimization - missing data/masked arrays (including the new goodies). These could be written by a small team, perhaps pairing an experienced numpy contributor with a new member who can provide the balance of perspective of a newcomer (very important in tutorial documentation) and simultaneously gain in-depth experience with important topics. OK, back to the comfort of my chair up here in the gallery... Cheers, f From stefan at sun.ac.za Tue Aug 23 17:47:03 2011 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 23 Aug 2011 14:47:03 -0700 Subject: [Numpy-discussion] Bug or feature? 
In-Reply-To: <4e5409d0.0295e50a.4548.0bd7@mx.google.com> References: <00EE14FD-C521-4197-8C94-5B7E53EAA246@fastmail.net> <32C3323B-4398-41A3-9A0D-3C8006DB14E6@enthought.com> <00A2AF9E-5E2B-4042-816E-80D4934675C9@fastmail.net> <4e5409d0.0295e50a.4548.0bd7@mx.google.com> Message-ID: On Tue, Aug 23, 2011 at 1:13 PM, Fernando Perez wrote: > Off the top of my head, here are a few ideas for enterprising souls to make a very useful contribution with a howto on each of these topics: > > - dtype/structured arrays and record arrays > - fancy indexing, broadcasting, lib.index_tricks (if Anne could find the time to write this one, we'd be eternally grateful) > - ctypes and cython for C interfacing and optimization > - missing data/masked arrays (including the new goodies). Some of these are included in numpy: import numpy.doc as doc doc.structured_arrays doc.indexing doc.performance <-- currently empty and IIRC they are elso edited via the docs editor. > These could be written by a small team, perhaps pairing an experienced numpy contributor with a new member who can provide the balance of perspective of a newcomer (very important in tutorial documentation) and simultaneously gain in-depth experience with important topics. I agree fully. Cheers St?fan From d.s.seljebotn at astro.uio.no Wed Aug 24 05:49:31 2011 From: d.s.seljebotn at astro.uio.no (Dag Sverre Seljebotn) Date: Wed, 24 Aug 2011 11:49:31 +0200 Subject: [Numpy-discussion] PyBUF_SIMPLE/PyBUF_FORMAT: casts to unsigned bytes In-Reply-To: <20110823121048.GA14594@sleipnir.bytereef.org> References: <20110823121048.GA14594@sleipnir.bytereef.org> Message-ID: <9d39119a-723a-4018-8ba2-149416f59658@email.android.com> (sorry for the top-post, no way around it) Under 2), would it make sense to also export the contents of a Fortran-contiguous buffer as a raw byte stream? I was just the other week writing code to serialize an array in Fortran order to a binary stream. OTOH I could easily serialize its transpose for the same effect. Just something to think about. -- Sent from my Android phone with K-9 Mail. Please excuse my brevity. Stefan Krah wrote: Hello, PEP-3118 presumably intended that a PyBUF_SIMPLE request should cast the original buffer's data type to 'B' (unsigned bytes). Here is a one-dimensional example that currently occurs in Lib/test/test_multiprocessing: >>> import array, io >>> a = array.array('i', [1,2,3,4,5]) >>> m = memoryview(a) >>> m.format 'i' >>> buf = io.BytesIO(bytearray(5*8)) >>> buf.readinto(m) buf.readinto() calls PyObject_AsWriteBuffer(), which requests a simple buffer from the memoryview, thus casting the 'i' data type to the implied type 'B'. The consumer can see that a cast has occurred because the new buffer's format field is NULL. This seems fine for the one-dimensional case. Numpy currently also allows such casts for multidimensional contiguous and non-contiguous arrays. See below for the examples; I don't want to distract from the main point of the post, which is this: I'm seeking a clear specification for the Python documentation that determines under what circumstances casts to 'B' should succeed. I'll formulate the points as statements for clarity, but in fact they are also questions: 1) An exporter of a C-contiguous array with ndim <= 1 MUST honor a PyBUF_SIMPLE request, setting format, shape and strides to NULL and itemsize to 1. As a corner case, an array with ndim = 0, format = "L" (or other) would also morph into a buffer of unsigned bytes. test_ctypes currently makes use of this. 
2) An exporter of a C-contiguous buffer with ndim > 1 MUST honor a PyBUF_SIMPLE request, setting format, shape, and strides to NULL and itemsize to 1. 3) An exporter of a buffer that is not C-contiguous MUST raise BufferError in response to a PyBUF_SIMPLE request. Why am I looking for such rigid rules? The problem with memoryview is that it has to act as a re-exporter itself. For several reasons (performance of chained memoryviews, garbage collection, early release, etc.) it has been decided that the new memoryview object has a managed buffer that takes a snapshot of the original exporter's buffer (See: http://bugs.python.org/issue10181). Now, since getbuffer requests to the memoryview object cannot be redirected to the original object, strict rules are needed for memory_getbuf(). Could you agree with these rules? Point 2) isn't clear from the PEP itself. I assumed it because Numpy currently allows it, and it appears harmless. Stefan Krah Examples: ========= Cast a multidimensional contiguous array:_____________________________________________ I think itemsize in the result should be 1. [_testbuffer.ndarray is from http://hg.python.org/features/pep-3118#memoryview] >>> from _testbuffer import * >>> from numpy import * >>> from _testbuffer import ndarray as pyarray >>> >>> exporter = ndarray(shape=[3,4], dtype="L") # Issue a PyBUF_SIMPLE request to 'exporter' and act as a re-exporter: >>> x = pyarray(exporter, getbuf=PyBUF_SIMPLE) >>> x.len 96 >>> x.shape () >>> x.strides () >>> x.format '' >>> x.itemsize # I think this should be 1, not 8. 8 Cast a multidimensional non-contiguous array:_____________________________________________ This is clearly not right, since y.buf points to a location that the consumer cannot handle without shape and strides. >>> nd = ndarray(buffer=bytearray(96), shape=[3,4], dtype="L") [182658 refs] >>> exporter = nd[::-1, ::-2] [182661 refs] >>> exporter array([[0, 0], [0, 0], [0, 0]], dtype=uint64) [182659 refs] >>> y = pyarray(exporter, getbuf=PyBUF_SIMPLE) [182665 refs] >>> y.len 48 [182666 refs] >>> y.strides () [182666 refs] >>> y.shape () [182666 refs] >>> y.format '' [182666 refs] >>> y.itemsize 8 [182666 refs]_____________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From scopatz at gmail.com Wed Aug 24 12:22:10 2011 From: scopatz at gmail.com (Anthony Scopatz) Date: Wed, 24 Aug 2011 11:22:10 -0500 Subject: [Numpy-discussion] saving groups of numpy arrays to disk In-Reply-To: References: <4E5040DF.9090303@simplistix.co.uk> Message-ID: On Sun, Aug 21, 2011 at 7:24 AM, Pauli Virtanen wrote: > On Sat, 20 Aug 2011 16:18:55 -0700, Chris Withers wrote: > > I've got a tree of nested dicts that at their leaves end in numpy arrays > > of identical sizes. > > > > What's the easiest way to persist these to disk so that I can pick up > > with them where I left off? > > Depends on your requirements. 
> > You can use Python pickling, if you do *not* have a requirement for: > > - real persistence, i.e., being able to easily read the data years later > - a standard data format > - access from non-Python programs > - safety against malicious parties (unpickling can execute some code > in the input -- although this is possible to control) > > then you can use Python pickling: > > import pickle > > file = open('out.pck', 'wb') > pickle.dump(file, tree, protocol=pickle.HIGHEST_PROTOCOL) > file.close() > > file = open('out.pck', 'rb') > tree = pickle.load(file) > file.close() > > This should just work (TM) directly with your tree-of-dicts-and-arrays. > > > What's the most "correct" way to do so? > > > > I'm using IPython if that makes things easier... > > > > I had wondered about PyTables, but that seems a bit too heavyweight for > > this, unless I'm missing something? > > If I had one or more of the requirements listed above, I'd use the HDF5 > format, via either PyTables or h5py. If I'd just need to cache the trees, > then I'd use pickling. > > I think the only reason to consider heavy-weighedness is distribution: > does your target audience have these libraries already installed > (they are pre-installed in several Python-for-science distributions), > and how difficult would it be for you to ship them with your stuff, > or to require the users to install them. > +1 to PyTables or h5py. > > -- > Pauli Virtanen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srean.list at gmail.com Wed Aug 24 19:53:09 2011 From: srean.list at gmail.com (srean) Date: Wed, 24 Aug 2011 18:53:09 -0500 Subject: [Numpy-discussion] c-info.ufunc-tutorial.rst Message-ID: Hi, I was reading this document, https://github.com/numpy/numpy/blob/master/doc/source/user/c-info.ufunc-tutorial.rst its well written and there is a good build up to exciting code examples that are coming, but I do not see the actual examples, only how they may be used. Is it located somewhere else and not linked? or is it that the c-info.ufunc-tutorial.rst document is incomplete and the examples have not been written. I suspect the former. In that case could anyone point to the code examples and may be also update the c-info.ufunc-tutorial.rst document. Thanks -- srean -------------- next part -------------- An HTML attachment was scrubbed... URL: From srean.list at gmail.com Wed Aug 24 20:05:38 2011 From: srean.list at gmail.com (srean) Date: Wed, 24 Aug 2011 19:05:38 -0500 Subject: [Numpy-discussion] c-info.ufunc-tutorial.rst In-Reply-To: References: Message-ID: Following up on my own question: I can see the code in the commit. So it appears that code-block:: Are not being rendered correctly. Could anyone confirm ? In case it is my browser alone, though I did try after disabling no-script. On Wed, Aug 24, 2011 at 6:53 PM, srean wrote: > Hi, > > I was reading this document, > https://github.com/numpy/numpy/blob/master/doc/source/user/c-info.ufunc-tutorial.rst > > its well written and there is a good build up to exciting code examples > that are coming, but I do not see the actual examples, only how they may be > used. Is it located somewhere else and not linked? or is it that the > c-info.ufunc-tutorial.rst document is incomplete and the examples have not > been written. I suspect the former. 
In that case could anyone point to the > code examples and may be also update the c-info.ufunc-tutorial.rst document. > > Thanks > > -- srean > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Aug 24 20:08:59 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 24 Aug 2011 17:08:59 -0700 Subject: [Numpy-discussion] NA mask C-API documentation Message-ID: I've added C-API documentation to the missingdata branch. The .rst file (beware of the github rst parser though, it drops some of the content) is here: https://github.com/m-paradox/numpy/blob/missingdata/doc/source/reference/c-api.maskna.rst and I made a small example module which goes with it here: https://github.com/m-paradox/spdiv Cheers, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Aug 24 20:10:34 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 24 Aug 2011 17:10:34 -0700 Subject: [Numpy-discussion] c-info.ufunc-tutorial.rst In-Reply-To: References: Message-ID: On Wed, Aug 24, 2011 at 5:05 PM, srean wrote: > Following up on my own question: I can see the code in the commit. So it > appears that > > code-block:: > > Are not being rendered correctly. Could anyone confirm ? In case it is my > browser alone, though I did try after disabling no-script. I believe this is because of github's .rst processor which simply drops blocks it can't understand. When building NumPy documentation, many more extensions and context exists. I'm getting the same thing in the C-API NA-mask documentation I just posted. -Mark > > > On Wed, Aug 24, 2011 at 6:53 PM, srean wrote: > >> Hi, >> >> I was reading this document, >> https://github.com/numpy/numpy/blob/master/doc/source/user/c-info.ufunc-tutorial.rst >> >> its well written and there is a good build up to exciting code examples >> that are coming, but I do not see the actual examples, only how they may be >> used. Is it located somewhere else and not linked? or is it that the >> c-info.ufunc-tutorial.rst document is incomplete and the examples have not >> been written. I suspect the former. In that case could anyone point to the >> code examples and may be also update the c-info.ufunc-tutorial.rst document. >> >> Thanks >> >> -- srean >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scopatz at gmail.com Wed Aug 24 20:19:13 2011 From: scopatz at gmail.com (Anthony Scopatz) Date: Wed, 24 Aug 2011 19:19:13 -0500 Subject: [Numpy-discussion] c-info.ufunc-tutorial.rst In-Reply-To: References: Message-ID: code-block:: is a directive that I think might be specific to sphinx. Naturally, github's renderer will drop it. On Wed, Aug 24, 2011 at 7:10 PM, Mark Wiebe wrote: > On Wed, Aug 24, 2011 at 5:05 PM, srean wrote: > >> Following up on my own question: I can see the code in the commit. So it >> appears that >> >> code-block:: >> >> Are not being rendered correctly. Could anyone confirm ? In case it is my >> browser alone, though I did try after disabling no-script. > > > I believe this is because of github's .rst processor which simply drops > blocks it can't understand. When building NumPy documentation, many more > extensions and context exists. I'm getting the same thing in the C-API > NA-mask documentation I just posted. 
> > -Mark > > >> >> >> On Wed, Aug 24, 2011 at 6:53 PM, srean wrote: >> >>> Hi, >>> >>> I was reading this document, >>> https://github.com/numpy/numpy/blob/master/doc/source/user/c-info.ufunc-tutorial.rst >>> >>> its well written and there is a good build up to exciting code examples >>> that are coming, but I do not see the actual examples, only how they may be >>> used. Is it located somewhere else and not linked? or is it that the >>> c-info.ufunc-tutorial.rst document is incomplete and the examples have not >>> been written. I suspect the former. In that case could anyone point to the >>> code examples and may be also update the c-info.ufunc-tutorial.rst document. >>> >>> Thanks >>> >>> -- srean >>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Aug 24 20:19:58 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 24 Aug 2011 17:19:58 -0700 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: Message-ID: On Fri, Aug 19, 2011 at 11:37 AM, Bruce Southey wrote: > Hi, > > > 2) Can the 'skipna' flag be added to the methods? > >>> a.sum(skipna=True) > Traceback (most recent call last): > File "", line 1, in > TypeError: 'skipna' is an invalid keyword argument for this function > >>> np.sum(a,skipna=True) > nan > I've added this now, as well. I think that finishes up the changes you suggested in this email which felt right to me. Cheers, Mark > -------------- next part -------------- An HTML attachment was scrubbed... URL: From srean.list at gmail.com Wed Aug 24 20:34:28 2011 From: srean.list at gmail.com (srean) Date: Wed, 24 Aug 2011 19:34:28 -0500 Subject: [Numpy-discussion] c-info.ufunc-tutorial.rst In-Reply-To: References: Message-ID: Thanks Anthony and Mark, this is good to know. So what would be the advised way of looking at freshly baked documentation ? Just look at the raw files ? or is there some place else where the correct sphinx rendered docs are hosted. On Wed, Aug 24, 2011 at 7:19 PM, Anthony Scopatz wrote: > code-block:: is a directive that I think might be specific to sphinx. > Naturally, github's renderer will drop it. > > On Wed, Aug 24, 2011 at 7:10 PM, Mark Wiebe wrote: > >> >> I believe this is because of github's .rst processor which simply drops >> blocks it can't understand. When building NumPy documentation, many more >> extensions and context exists. I'm getting the same thing in the C-API >> NA-mask documentation I just posted. >> >> -Mark >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From scopatz at gmail.com Wed Aug 24 20:43:45 2011 From: scopatz at gmail.com (Anthony Scopatz) Date: Wed, 24 Aug 2011 19:43:45 -0500 Subject: [Numpy-discussion] c-info.ufunc-tutorial.rst In-Reply-To: References: Message-ID: On Wed, Aug 24, 2011 at 7:34 PM, srean wrote: > Thanks Anthony and Mark, this is good to know. > > So what would be the advised way of looking at freshly baked documentation > ? Just look at the raw files ? or is there some place else where the correct > sphinx rendered docs are hosted. > Building the docs yourself is probably the safest bet. 
However, someone should probably hook up the numpy and scipy repos to readthedocs.org. That would solve this problem... > > On Wed, Aug 24, 2011 at 7:19 PM, Anthony Scopatz wrote: > >> code-block:: is a directive that I think might be specific to sphinx. >> Naturally, github's renderer will drop it. >> >> On Wed, Aug 24, 2011 at 7:10 PM, Mark Wiebe wrote: >> >>> >>> I believe this is because of github's .rst processor which simply drops >>> blocks it can't understand. When building NumPy documentation, many more >>> extensions and context exists. I'm getting the same thing in the C-API >>> NA-mask documentation I just posted. >>> >>> -Mark >>> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dominique.orban at gmail.com Wed Aug 24 21:07:33 2011 From: dominique.orban at gmail.com (dpo) Date: Wed, 24 Aug 2011 18:07:33 -0700 (PDT) Subject: [Numpy-discussion] ImportError: dynamic module does not define init function (initmultiarray) In-Reply-To: References: Message-ID: <32330873.post@talk.nabble.com> dpo wrote: > > --- > Traceback (most recent call last): > File "/Users/dpo/.virtualenvs/matrox/matrox/curve.py", line 3, in > > import numpy as np > File > "/Users/dpo/.virtualenvs/matrox/lib/python2.7/site-packages/numpy/__init__.py", > line 137, in > import add_newdocs > File > "/Users/dpo/.virtualenvs/matrox/lib/python2.7/site-packages/numpy/add_newdocs.py", > line 9, in > from numpy.lib import add_newdoc > File > "/Users/dpo/.virtualenvs/matrox/lib/python2.7/site-packages/numpy/lib/__init__.py", > line 4, in > from type_check import * > File > "/Users/dpo/.virtualenvs/matrox/lib/python2.7/site-packages/numpy/lib/type_check.py", > line 8, in > import numpy.core.numeric as _nx > File > "/Users/dpo/.virtualenvs/matrox/lib/python2.7/site-packages/numpy/core/__init__.py", > line 5, in > import multiarray > ImportError: dynamic module does not define init function (initmultiarray) > --- > > So I am lead to ask: should multiarray.so really be called > _multiarray.so? If not, any idea what the problem is? > If I may answer my own question, the answer is no. The issue here is that numpy was compiled for the x86_64 architecture only, while other libraries I need to link with are i386 only. Changing CFLAGS and LDFLAGS to "-arch i386 -arch x86_64" resolved the issue. Sorry for the noise. Dominique -- View this message in context: http://old.nabble.com/ImportError%3A-dynamic-module-does-not-define-init-function-%28initmultiarray%29-tp32299073p32330873.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From wesmckinn at gmail.com Wed Aug 24 21:09:35 2011 From: wesmckinn at gmail.com (Wes McKinney) Date: Wed, 24 Aug 2011 21:09:35 -0400 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: Message-ID: On Wed, Aug 24, 2011 at 8:19 PM, Mark Wiebe wrote: > On Fri, Aug 19, 2011 at 11:37 AM, Bruce Southey wrote: >> >> Hi, >> >> >> 2) Can the 'skipna' flag be added to the methods? >> >>> a.sum(skipna=True) >> Traceback (most recent call last): >> ?File "", line 1, in >> TypeError: 'skipna' is an invalid keyword argument for this function >> >>> np.sum(a,skipna=True) >> nan > > I've added this now, as well. I think that finishes up the changes you > suggested in this email which felt right to me. 
> Cheers, > Mark > >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > Sorry I haven't had a chance to have a tinker yet. My initial observations: - I haven't decided whether this is a problem: In [50]: arr = np.arange(100) In [51]: arr[5:10] = np.NA --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /home/wesm/ in () ----> 1 arr[5:10] = np.NA ValueError: Cannot set NumPy array values to NA values without first enabling NA support in the array I assume when you flip the maskna switch that a mask is created? - Performance with skipna is a bit disappointing: In [52]: arr = np.random.randn(1e6) In [54]: arr.flags.maskna = True In [56]: arr[::2] = np.NA In [58]: timeit arr.sum(skipna=True) 100 loops, best of 3: 7.31 ms per loop this goes down to 2.12 ms if there are no NAs present. but: In [59]: import bottleneck as bn In [60]: arr = np.random.randn(1e6) In [61]: arr[::2] = np.nan In [62]: timeit bn.nansum(arr) 1000 loops, best of 3: 1.17 ms per loop do you have a sense if this gap can be closed? I assume you've been, as you should, focused on a correct implementation as opposed with squeezing out performance. best, Wes From mwwiebe at gmail.com Wed Aug 24 21:35:50 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 24 Aug 2011 18:35:50 -0700 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: Message-ID: On Wed, Aug 24, 2011 at 6:09 PM, Wes McKinney wrote: > On Wed, Aug 24, 2011 at 8:19 PM, Mark Wiebe wrote: > > On Fri, Aug 19, 2011 at 11:37 AM, Bruce Southey > wrote: > >> > >> Hi, > >> > >> > >> 2) Can the 'skipna' flag be added to the methods? > >> >>> a.sum(skipna=True) > >> Traceback (most recent call last): > >> File "", line 1, in > >> TypeError: 'skipna' is an invalid keyword argument for this function > >> >>> np.sum(a,skipna=True) > >> nan > > > > I've added this now, as well. I think that finishes up the changes you > > suggested in this email which felt right to me. > > Cheers, > > Mark > > > >> > >> > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > Sorry I haven't had a chance to have a tinker yet. My initial observations: > > - I haven't decided whether this is a problem: > > In [50]: arr = np.arange(100) > > In [51]: arr[5:10] = np.NA > --------------------------------------------------------------------------- > ValueError Traceback (most recent call last) > /home/wesm/ in () > ----> 1 arr[5:10] = np.NA > > ValueError: Cannot set NumPy array values to NA values without first > enabling NA support in the array > > I assume when you flip the maskna switch that a mask is created? > That's correct, it creates a fully exposed mask when you set the flag. The thought was that having an assignment automatically add a mask to an array would be a bad idea ("explicit vs implicit"). > > - Performance with skipna is a bit disappointing: > > In [52]: arr = np.random.randn(1e6) > In [54]: arr.flags.maskna = True > In [56]: arr[::2] = np.NA > In [58]: timeit arr.sum(skipna=True) > 100 loops, best of 3: 7.31 ms per loop > > this goes down to 2.12 ms if there are no NAs present. > The alternating case is going to get the worst possible performance currently. 
The masked loop has no specialization to the operation or data type whatsoever yet, it simply calls the regular inner loop on the appropriate runs of data. > but: > > In [59]: import bottleneck as bn > In [60]: arr = np.random.randn(1e6) > In [61]: arr[::2] = np.nan > In [62]: timeit bn.nansum(arr) > 1000 loops, best of 3: 1.17 ms per loop > > do you have a sense if this gap can be closed? I assume you've been, > as you should, focused on a correct implementation as opposed with > squeezing out performance. > I've been focusing on a correct implementation while installing hooks in the right places so that the performance can be improved later. For the straightforward masked copying code, I previously created a ticket describing what needs to be done: http://projects.scipy.org/numpy/ticket/1901 For element-wise ufuncs, the changes needed are similar, creating inner loops specialized for masks. In doing these changes, I also figured out a way to add the ability to more properly specialize the inner loops along the lines of einsum without breaking ABI compatibility, so I set up the API as required for this. Thanks for taking a look, Mark > > best, > Wes > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Aug 24 22:29:44 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 24 Aug 2011 19:29:44 -0700 Subject: [Numpy-discussion] NA masks for NumPy are ready to test In-Reply-To: References: Message-ID: On Wed, Aug 24, 2011 at 6:09 PM, Wes McKinney wrote: > > - Performance with skipna is a bit disappointing: > > In [52]: arr = np.random.randn(1e6) > In [54]: arr.flags.maskna = True > In [56]: arr[::2] = np.NA > In [58]: timeit arr.sum(skipna=True) > 100 loops, best of 3: 7.31 ms per loop > > this goes down to 2.12 ms if there are no NAs present. > > but: > > In [59]: import bottleneck as bn > In [60]: arr = np.random.randn(1e6) > In [61]: arr[::2] = np.nan > In [62]: timeit bn.nansum(arr) > 1000 loops, best of 3: 1.17 ms per loop > > do you have a sense if this gap can be closed? I assume you've been, > as you should, focused on a correct implementation as opposed with > squeezing out performance. > It looks like the spdiv example module I created for the C-API documentation can give a bit of an idea for some performance expectations. The example has no specialization for strides, and it operates exactly like np.divide except it converts the output to NA instead of dividing by zero. It *always* creates an NA mask for the output, and does a masked loop. 
Here's a link to the example module: https://github.com/m-paradox/spdiv In [1]: from spdiv_mod import spdiv In [2]: arr = np.random.randn(1e6) Since spdiv always creates an NA mask, this is comparing an NA-masked divide with a regular NumPy divide: In [3]: timeit spdiv(arr, 3.1) 100 loops, best of 3: 13.8 ms per loop In [4]: timeit arr / 3.1 10 loops, best of 3: 11.4 ms per loop Here, the divide is causing an NA mask to be created in the output, just like in spdiv: In [5]: timeit spdiv(arr, np.NA) 100 loops, best of 3: 4.72 ms per loop In [6]: timeit arr / np.NA 100 loops, best of 3: 8.71 ms per loop Here are the same tests, but after giving 'arr' an NA mask: In [7]: arr.flags.maskna = True In [8]: timeit spdiv(arr, 3.1) 100 loops, best of 3: 14.2 ms per loop In [9]: timeit arr / 3.1 10 loops, best of 3: 20.1 ms per loop In [10]: timeit spdiv(arr, np.NA) 100 loops, best of 3: 4.02 ms per loop In [11]: timeit arr / np.NA 100 loops, best of 3: 8.69 ms per loop Another thought is to compare sum to count_nonzero, which is implemented in a straightforward fashion without the masked wrapping mechanism that's in the ufuncs. n [12]: arr[::2] = np.NA In [13]: np.count_nonzero(arr) Out[13]: NA(dtype='int64') In [14]: np.count_nonzero(arr, skipna=True) Out[14]: 500000 In [15]: timeit np.count_nonzero(arr, skipna=True) 100 loops, best of 3: 5.86 ms per loop In [16]: timeit np.sum(arr, skipna=True) 10 loops, best of 3: 16.1 ms per loop In [17]: timeit np.count_nonzero(arr, skipna=False) 100 loops, best of 3: 1.85 ms per loop In [18]: timeit np.sum(arr, skipna=False) 100 loops, best of 3: 1.86 ms per loop Cheers, Mark > > best, > Wes > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From teoliphant at gmail.com Thu Aug 25 00:39:15 2011 From: teoliphant at gmail.com (Travis Oliphant) Date: Wed, 24 Aug 2011 23:39:15 -0500 Subject: [Numpy-discussion] Decimal arrays? In-Reply-To: References: <4E50371F.1070105@simplistix.co.uk> <4E503A04.8070600@simplistix.co.uk> <4E52709F.4050300@simplistix.co.uk> Message-ID: <4CFFFC38-D955-4AA2-9B52-F34DB941387E@enthought.com> On Aug 22, 2011, at 3:51 PM, Robert Kern wrote: > On Mon, Aug 22, 2011 at 10:07, Chris Withers wrote: >> On 22/08/2011 00:18, Mark Dickinson wrote: >>> On Sun, Aug 21, 2011 at 1:08 AM, Robert Kern wrote: >>>> You may want to try the cdecimal package: >>>> >>>> http://pypi.python.org/pypi/cdecimal/ >>> >>> I'll second this suggestion. cdecimal is an extraordinarily carefully >>> written and well-tested (almost) drop-in replacement for the decimal >>> module, and well worth a try. It would probably be in the Python >>> standard library by now if anyone had had proper time to review it... >> >> Who would need to review it? >> >> I'm surprised this isn't in EPD... any ideas why? > > No one has asked for it, to my knowledge. We do provide it in our PyPI > repository, so > > $ enpkg cdecimal > > should install it if you are an EPD subscriber. This should work with EPDFree even if you aren't an EPD subscriber as well. The fact that cDecimal isn't in EPD is a good thing for you in this case. Automatically built PyPI packages are provided to everybody as long as the package itself is not in EPD proper (to avoid our EPD customers getting "automatic" builds of core packages instead of our tested and verified builds). 
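For what it's worth, once either module is installed the numpy side is the same: Decimal objects can be held in an object array. A small, untested sketch (cdecimal is intended as a drop-in replacement for the stdlib decimal module):

from cdecimal import Decimal   # or: from decimal import Decimal
import numpy as np

a = np.array([Decimal('1.10'), Decimal('2.20'), Decimal('3.30')], dtype=object)
total = a.sum()            # exact decimal arithmetic, no binary-float rounding
scaled = a * Decimal(2)    # element-wise, each element stays a Decimal

Object arrays give up the speed of a float64 array, but the values stay true Decimals.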
-Travis > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com From teoliphant at gmail.com Thu Aug 25 00:42:06 2011 From: teoliphant at gmail.com (Travis Oliphant) Date: Wed, 24 Aug 2011 23:42:06 -0500 Subject: [Numpy-discussion] memoryview shape/strides representation for ndim = 0 In-Reply-To: References: <20110822123034.GA7743@sleipnir.bytereef.org> Message-ID: On Aug 22, 2011, at 7:35 AM, Mark Dickinson wrote: > On Mon, Aug 22, 2011 at 1:30 PM, Stefan Krah wrote: >> Numpy arrays and memoryview currently have different representations >> for shape and strides if ndim = 0: >> >>>>> from numpy import * >>>>> x = array(9, int32) >>>>> x.ndim >> 0 >>>>> x.shape >> () >>>>> x.strides >> () >>>>> m = memoryview(x) >>>>> m.ndim >> 0L >>>>> m.shape is None >> True >>>>> m.strides is None >> True >> >> >> I think the Numpy representation is nicer. Also, I think that memoryviews >> should attempt to mimic the underlying object as closely as possible. > > Agreed on both points. If there's no good reason for m.shape and > m.strides to be None, I think it should be changed. I can't think of any good reason not to change it to use the NumPy defaults. This sounds right to me. -Travis > > Mark > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com From josef.pktd at gmail.com Thu Aug 25 01:04:03 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 25 Aug 2011 01:04:03 -0400 Subject: [Numpy-discussion] c-info.ufunc-tutorial.rst In-Reply-To: References: Message-ID: On Wed, Aug 24, 2011 at 8:43 PM, Anthony Scopatz wrote: > > > On Wed, Aug 24, 2011 at 7:34 PM, srean wrote: >> >> Thanks Anthony and Mark, this is good to know. >> >> So what would be the advised way of looking at freshly baked documentation >> ? Just look at the raw files ? or is there some place else where the correct >> sphinx rendered docs are hosted. > > Building the docs yourself is probably the safest bet. ?However, someone > should probably hook up the numpy and scipy repos to readthedocs.org. ?That > would solve this problem... Maybe someone just needs to add it here http://docs.scipy.org/numpy/docs/numpy-docs/user/c-info.rst/#c-info and it would show up in numpy's own docs, which are hooked up to the repo, as far as I know. Josef > >> >> On Wed, Aug 24, 2011 at 7:19 PM, Anthony Scopatz >> wrote: >>> >>> code-block:: is a directive that I think might be specific to sphinx. >>> ?Naturally, github's renderer will drop it. >>> On Wed, Aug 24, 2011 at 7:10 PM, Mark Wiebe wrote: >>>> >>>> I believe this is because of github's .rst processor which simply drops >>>> blocks it can't understand. When building NumPy documentation, many more >>>> extensions and context exists. I'm getting the same thing in the C-API >>>> NA-mask documentation I just posted. 
>>>> -Mark >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From srean.list at gmail.com Thu Aug 25 08:23:40 2011 From: srean.list at gmail.com (srean) Date: Thu, 25 Aug 2011 07:23:40 -0500 Subject: [Numpy-discussion] the build and installation process Message-ID: Hi, I would like to know a bit about how the installation process works. Could you point me to a resource. In particular I want to know how the site.cfg configuration works. Is it numpy/scipy specific or is it standard with distutils. I googled for site.cfg and distutils but did not find any authoritative document. I believe many new users trip up on the installation process, especially in trying to substitute their favourite library in place os the standard. So a canonical document explaining the process will be very helpful. http://docs.scipy.org/doc/numpy/user/install.html does cover some of the important points but its a bit sketchy, and has a "this is all that you need to know" flavor. Doesnt quite enable the reader to fix his own problems. So a resource that is somewhere in between reading up all the sources that get invoked during the installation and building, and the current install document will be very welcome. English is not my native language, but if there is anyway I can help, I would do so gladly. -- srean -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Thu Aug 25 13:55:49 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Thu, 25 Aug 2011 10:55:49 -0700 Subject: [Numpy-discussion] NA-mask introductory documentation Message-ID: I've written some introductory documentation for the NA-masked arrays. The patch is here: https://github.com/m-paradox/numpy/commit/227e39c34b0e5d9dfde2bbce054b5a8ac088fd64 This is approaching the end of what I will implement for NA masks at the moment. I think the system is quite usable as is, though it is missing a number of major pieces like support for struct-NA, file I/O, and other things mentioned in the release notes. On the other hand, the C API for working with NA-masked arrays is solid and designed for future expansion to multi-NA, and many things can be done already with the implementation. It's also very stable and does not break ABI compatibility, so a NumPy release with NA masks in its current state should be perfectly reasonable. Cheers, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Thu Aug 25 14:42:04 2011 From: Chris.Barker at noaa.gov (Chris.Barker) Date: Thu, 25 Aug 2011 11:42:04 -0700 Subject: [Numpy-discussion] saving groups of numpy arrays to disk In-Reply-To: References: <4E5040DF.9090303@simplistix.co.uk> Message-ID: <4E56977C.3000107@noaa.gov> On 8/24/11 9:22 AM, Anthony Scopatz wrote: > You can use Python pickling, if you do *not* have a requirement for: I can't recall why, but it seem pickling of numpy arrays has been fragile and not very performant. 
I like the npy / npz format, built in to numpy, if you don't need: > - access from non-Python programs it's quick and easy to use: In [5]: a Out[5]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) In [6]: b Out[6]: array([ 0., 1., 2., 3., 4.]) In [7]: filename = "test.npz" In [8]: np.savez(filename, a=a, b=b) In [9]: del a, b In [10]: # now reload: In [11]: data = np.load(filename) In [14]: data['a'] Out[14]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) In [15]: data['b'] Out[15]: array([ 0., 1., 2., 3., 4.]) I'd go with hdf5 or netcdf if you want a standard format that can be read by non-python software. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From paulepanter at users.sourceforge.net Thu Aug 25 15:10:34 2011 From: paulepanter at users.sourceforge.net (Paul Menzel) Date: Thu, 25 Aug 2011 21:10:34 +0200 Subject: [Numpy-discussion] How to output array with indexes to a text file? Message-ID: <1314299436.18748.19.camel@mattotaupa> Dear NumPy folks, is there an easy way to also save the indexes of an array (columns, rows or both) when outputting it to a text file. For saving an array to a file I only found `savetxt()` [1] which does not seem to have such an option. Adding indexes manually is doable but I would like to avoid that. --- minimal example (also attached) --- from numpy import * a = zeros([2, 3], int) print(a) savetxt("/tmp/test1.txt", a, fmt='%8i') # Work around for adding the indexes for the columns. a[0] = range(3) print(a) savetxt("/tmp/test2.txt", a, fmt='%8i') --- minimal example --- The output is the following. $ python output-array.py [[0 0 0] [0 0 0]] [[0 1 2] [0 0 0]] $ more /tmp/test* :::::::::::::: /tmp/test1.txt :::::::::::::: 0 0 0 0 0 0 :::::::::::::: /tmp/test2.txt :::::::::::::: 0 1 2 0 0 0 Is there a way to accomplish that task without reserving the 0th row or column to store the indexes? I want to process these text files to produce graphs and MetaPost?s [2] graph package needs these indexes. (I know about Matplotlib [3], but I would like to use MetaPost.) Thanks, Paul [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html [2] http://wiki.contextgarden.net/MetaPost [3] http://matplotlib.sourceforge.net/ -------------- next part -------------- A non-text attachment was scrubbed... Name: output-array.py Type: text/x-python Size: 209 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: This is a digitally signed message part URL: From jjhelmus at gmail.com Thu Aug 25 15:36:57 2011 From: jjhelmus at gmail.com (Jonathan Helmus) Date: Thu, 25 Aug 2011 15:36:57 -0400 Subject: [Numpy-discussion] How to output array with indexes to a text file? In-Reply-To: <1314299436.18748.19.camel@mattotaupa> References: <1314299436.18748.19.camel@mattotaupa> Message-ID: <4E56A459.4010601@gmail.com> Paul Menzel wrote: > Dear NumPy folks, > > > is there an easy way to also save the indexes of an array (columns, rows > or both) when outputting it to a text file. For saving an array to a > file I only found `savetxt()` [1] which does not seem to have such an > option. Adding indexes manually is doable but I would like to avoid > that. 
> > --- minimal example (also attached) --- > from numpy import * > > a = zeros([2, 3], int) > print(a) > > savetxt("/tmp/test1.txt", a, fmt='%8i') > > # Work around for adding the indexes for the columns. > a[0] = range(3) > print(a) > > savetxt("/tmp/test2.txt", a, fmt='%8i') > --- minimal example --- > > The output is the following. > > $ python output-array.py > [[0 0 0] > [0 0 0]] > [[0 1 2] > [0 0 0]] > $ more /tmp/test* > :::::::::::::: > /tmp/test1.txt > :::::::::::::: > 0 0 0 > 0 0 0 > :::::::::::::: > /tmp/test2.txt > :::::::::::::: > 0 1 2 > 0 0 0 > > Is there a way to accomplish that task without reserving the 0th row or > column to store the indexes? > > I want to process these text files to produce graphs and MetaPost?s [2] > graph package needs these indexes. (I know about Matplotlib [3], but I > would like to use MetaPost.) > > > Thanks, > > Paul > > > [1] http://docs.scipy.org/doc/numpy/reference/generated/numpy.savetxt.html > [2] http://wiki.contextgarden.net/MetaPost > [3] http://matplotlib.sourceforge.net/ > > ------------------------------------------------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Paul, I don't know of any numpy function which will output the array indexes but with numpy's ndindex this can be accomplished with a for loop. import numpy as np a = np.arange(12).reshape(3,4) f = open("test.txt",'w') for i in np.ndindex(a.shape): print >> f," ".join([str[s] for s in i]),a[i] f.close() cat test.txt 0 0 0 0 1 1 0 2 2 ... From wardefar at iro.umontreal.ca Thu Aug 25 18:49:16 2011 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Thu, 25 Aug 2011 18:49:16 -0400 Subject: [Numpy-discussion] saving groups of numpy arrays to disk In-Reply-To: <4E56977C.3000107@noaa.gov> References: <4E5040DF.9090303@simplistix.co.uk> <4E56977C.3000107@noaa.gov> Message-ID: On 2011-08-25, at 2:42 PM, Chris.Barker wrote: > On 8/24/11 9:22 AM, Anthony Scopatz wrote: >> You can use Python pickling, if you do *not* have a requirement for: > > I can't recall why, but it seem pickling of numpy arrays has been > fragile and not very performant. > > I like the npy / npz format, built in to numpy, if you don't need: > >> - access from non-Python programs While I'm not aware of reader implementations for any other language, NPY is a dirt-simple and well-documented format designed by Robert Kern, and should be readable without too much trouble from any language that supports binary I/O. The full spec is at https://github.com/numpy/numpy/blob/master/doc/neps/npy-format.txt It should be especially trivial to read arrays of simple scalar numeric dtypes, but reading compound dtypes is also doable. For NPZ, use a standard zip file reading library to access individual files in the archive, which are in .npy format (or just unzip it by hand first -- it's a normal .zip file with a special extension). David From kk1674 at nyu.edu Fri Aug 26 00:27:31 2011 From: kk1674 at nyu.edu (Kibeom Kim) Date: Fri, 26 Aug 2011 00:27:31 -0400 Subject: [Numpy-discussion] lazy loading ndarray? (not from file, but from user function) Message-ID: Hello, Q1. Is lazy loading ndarray from user defined data supplying function possible? Q2. If possible, how can I implement it? The closest method I can think of is, (which requires c++ posix) 1. create a memory region using mmap and protect read operation by mprotect. 2. 
add SIGSEGV signal handler to trap read operation on the memory region, and the handler will provide appropriate user data and recover from SIGSEGV. 3. slightly modify memmap class to use the above mmap (memmap is already using mmap internally, so it's not a big deal) but obviously, recovering from SIGSEGV requires removing mprotect (see http://stackoverflow.com/questions/2663456/write-a-signal-handler-to-catch-sigsegv) and it's impossible to know when to lock the region by mprotect again. Thanks, -Kibeom Kim From robert.kern at gmail.com Fri Aug 26 00:32:51 2011 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 25 Aug 2011 23:32:51 -0500 Subject: [Numpy-discussion] lazy loading ndarray? (not from file, but from user function) In-Reply-To: References: Message-ID: On Thu, Aug 25, 2011 at 23:27, Kibeom Kim wrote: > Hello, > > Q1. Is lazy loading ndarray from user defined data supplying function possible? No, not really. > Q2. If possible, how can I implement it? > > > The closest method I can think of is, (which requires c++ posix) > > 1. create a memory region using mmap and protect read operation by mprotect. > 2. add SIGSEGV signal handler to trap read operation on the memory > region, and the handler will provide appropriate user data and recover > from SIGSEGV. > 3. slightly modify memmap class to use the above mmap (memmap is > already using mmap internally, so it's not a big deal) > > but obviously, recovering from SIGSEGV requires removing mprotect (see > http://stackoverflow.com/questions/2663456/write-a-signal-handler-to-catch-sigsegv) > and it's impossible to know when to lock the region by mprotect again. Well, if you're willing to go *that* far, you might was well make a userspace file system with fuse and mmap a file within that. http://fuse.sourceforge.net/ You can even implement it in Python! http://pypi.python.org/pypi/fuse-python http://code.google.com/p/fusepy/ -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From paul.anton.letnes at gmail.com Fri Aug 26 05:22:56 2011 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Fri, 26 Aug 2011 10:22:56 +0100 Subject: [Numpy-discussion] saving groups of numpy arrays to disk In-Reply-To: References: <4E5040DF.9090303@simplistix.co.uk> <4E56977C.3000107@noaa.gov> Message-ID: On 25. aug. 2011, at 23.49, David Warde-Farley wrote: > On 2011-08-25, at 2:42 PM, Chris.Barker wrote: > >> On 8/24/11 9:22 AM, Anthony Scopatz wrote: >>> You can use Python pickling, if you do *not* have a requirement for: >> >> I can't recall why, but it seem pickling of numpy arrays has been >> fragile and not very performant. >> >> I like the npy / npz format, built in to numpy, if you don't need: >> >>> - access from non-Python programs > > While I'm not aware of reader implementations for any other language, NPY is a dirt-simple and well-documented format designed by Robert Kern, and should be readable without too much trouble from any language that supports binary I/O. The full spec is at > > https://github.com/numpy/numpy/blob/master/doc/neps/npy-format.txt > > It should be especially trivial to read arrays of simple scalar numeric dtypes, but reading compound dtypes is also doable. 
> > For NPZ, use a standard zip file reading library to access individual files in the archive, which are in .npy format (or just unzip it by hand first -- it's a normal .zip file with a special extension). > > David Out of curiosity: is the .npy format guaranteed to be independent of architecture (endianness and similar issues)? Paul From derek at astro.physik.uni-goettingen.de Fri Aug 26 08:04:20 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Fri, 26 Aug 2011 14:04:20 +0200 Subject: [Numpy-discussion] saving groups of numpy arrays to disk In-Reply-To: <4E56977C.3000107@noaa.gov> References: <4E5040DF.9090303@simplistix.co.uk> <4E56977C.3000107@noaa.gov> Message-ID: <1D5F9D60-05A9-476F-80D7-DB5E0949BE22@astro.physik.uni-goettingen.de> On 25.08.2011, at 8:42PM, Chris.Barker wrote: > On 8/24/11 9:22 AM, Anthony Scopatz wrote: >> You can use Python pickling, if you do *not* have a requirement for: > > I can't recall why, but it seem pickling of numpy arrays has been > fragile and not very performant. > Hmm, the pure Python version might be, but, I've used cPickle for a long time and never noted any stability problems. And it is still noticeably faster than pytables, in my experience. Still, for the sake of a standardised format I'd go with HDF5 any time now (and usually prefer h5py now when starting anything new - my pytables implementation mentioned above likely is not the most efficient compared to cPickle). But with the usual disclaimers, you should be able to simply use cPickle as a drop-in replacement in the example below. Cheers, Derek On 21.08.2011, at 2:24PM, Pauli Virtanen wrote: > You can use Python pickling, if you do *not* have a requirement for: > > - real persistence, i.e., being able to easily read the data years later > - a standard data format > - access from non-Python programs > - safety against malicious parties (unpickling can execute some code > in the input -- although this is possible to control) > > then you can use Python pickling: > > import pickle > > file = open('out.pck', 'wb') > pickle.dump(file, tree, protocol=pickle.HIGHEST_PROTOCOL) > file.close() > > file = open('out.pck', 'rb') > tree = pickle.load(file) > file.close() From derek at astro.physik.uni-goettingen.de Fri Aug 26 10:09:53 2011 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Fri, 26 Aug 2011 16:09:53 +0200 Subject: [Numpy-discussion] array_equal and array_equiv comparison functions for structured arrays Message-ID: <8CD90113-73B8-429D-9EE4-C23C91823CD2@astro.physik.uni-goettingen.de> Hi, as the subject says, the array_* comparison functions currently do not operate on structured/record arrays. Pull request https://github.com/numpy/numpy/pull/146 implements these comparisons. There are two commits, differing in their interpretation whether two arrays with different field names, but identical data, are equivalent; i.e. res = array_equiv(array((1,2), dtype=[('i','i4'),('v','f8')]), array((1,2), dtype=[('n','i4'),('f','f8')])) is True in the current HEAD, but False in its parent. Feedback and additional comments are invited. 
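For readers who want to experiment before the pull request lands, a rough field-by-field comparison that ignores field names reproduces the current-HEAD behaviour for that example. This is only a sketch, not the code from the PR, and the helper name struct_equiv is made up here:

    import numpy as np

    a = np.array((1, 2.0), dtype=[('i', 'i4'), ('v', 'f8')])
    b = np.array((1, 2.0), dtype=[('n', 'i4'), ('f', 'f8')])

    def struct_equiv(x, y):
        # compare structured arrays field by field, ignoring the field names
        if x.dtype.names is None or y.dtype.names is None:
            return np.array_equiv(x, y)
        if len(x.dtype.names) != len(y.dtype.names):
            return False
        return all(np.array_equiv(x[fx], y[fy])
                   for fx, fy in zip(x.dtype.names, y.dtype.names))

    print(struct_equiv(a, b))        # True: identical data, different field names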
Cheers, Derek From Chris.Barker at noaa.gov Fri Aug 26 11:51:27 2011 From: Chris.Barker at noaa.gov (Chris.Barker) Date: Fri, 26 Aug 2011 08:51:27 -0700 Subject: [Numpy-discussion] saving groups of numpy arrays to disk In-Reply-To: <1D5F9D60-05A9-476F-80D7-DB5E0949BE22@astro.physik.uni-goettingen.de> References: <4E5040DF.9090303@simplistix.co.uk> <4E56977C.3000107@noaa.gov> <1D5F9D60-05A9-476F-80D7-DB5E0949BE22@astro.physik.uni-goettingen.de> Message-ID: <4E57C0FF.8070908@noaa.gov> On 8/26/11 5:04 AM, Derek Homeier wrote: > Hmm, the pure Python version might be, but, I've used cPickle for a long time > and never noted any stability problems. well, here is the NEP: https://github.com/numpy/numpy/blob/master/doc/neps/npy-format.txt It addresses the why's and hows of the format. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From robert.kern at gmail.com Fri Aug 26 12:05:19 2011 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 26 Aug 2011 11:05:19 -0500 Subject: [Numpy-discussion] saving groups of numpy arrays to disk In-Reply-To: <1D5F9D60-05A9-476F-80D7-DB5E0949BE22@astro.physik.uni-goettingen.de> References: <4E5040DF.9090303@simplistix.co.uk> <4E56977C.3000107@noaa.gov> <1D5F9D60-05A9-476F-80D7-DB5E0949BE22@astro.physik.uni-goettingen.de> Message-ID: On Fri, Aug 26, 2011 at 07:04, Derek Homeier wrote: > On 25.08.2011, at 8:42PM, Chris.Barker wrote: > >> On 8/24/11 9:22 AM, Anthony Scopatz wrote: >>> ? ?You can use Python pickling, if you do *not* have a requirement for: >> >> I can't recall why, but it seem pickling of numpy arrays has been >> fragile and not very performant. >> > Hmm, the pure Python version might be, but, I've used cPickle for a long time > and never noted any stability problems. IIRC, there have been one or two releases where we accidentally broke the ability to load some old pickles. I think that's the kind of fragility Chris meant. As for the other kind of stability, we have had, at times, problems passing unpickled arrays to linear algebra functions. This is because the SSE instructions used by the optimized linear algebra package required aligned memory, but the unpickling machinery did not give us such an option. We do some nasty hacks to make unpickling performant. The unpickling machinery reads the actual byte data in as a str object, then passes that to a numpy function to reconstruct the array object. We simply reuse the memory underlying the str object. This is a hack, but it's the only way to avoid copying potentially large amounts of data. This is the cause the unaligned memory. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From mjanikas at esri.com Fri Aug 26 13:10:39 2011 From: mjanikas at esri.com (Mark Janikas) Date: Fri, 26 Aug 2011 10:10:39 -0700 Subject: [Numpy-discussion] Identifying Colinear Columns of a Matrix Message-ID: Hello All, I am trying to identify columns of a matrix that are perfectly collinear. It is not that difficult to identify when two columns are identical are have zero variance, but I do not know how to ID when the culprit is of a higher order. i.e. columns 1 + 2 + 3 = column 4. 
NUM.corrcoef(matrix.T) will return NaNs when the matrix is singular, and LA.cond(matrix.T) will provide a very large condition number.... But they do not tell me which columns are causing the problem. For example: zt = numpy. array([[ 1. , 1. , 1. , 1. , 1. ], [ 0.25, 0.1 , 0.2 , 0.25, 0.5 ], [ 0.75, 0.9 , 0.8 , 0.75, 0.5 ], [ 3. , 8. , 0. , 5. , 0. ]]) How can I identify that columns 0,1,2 are the issue because: column 1 + column 2 = column 0? Any input would be greatly appreciated. Thanks much, MJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From mjanikas at esri.com Fri Aug 26 13:12:20 2011 From: mjanikas at esri.com (Mark Janikas) Date: Fri, 26 Aug 2011 10:12:20 -0700 Subject: [Numpy-discussion] Identifying Colinear Columns of a Matrix In-Reply-To: References: Message-ID: As you will note, since most of the functions work on rows, the matrix in question has been transposed. From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Mark Janikas Sent: Friday, August 26, 2011 10:11 AM To: 'Discussion of Numerical Python' Subject: [Numpy-discussion] Identifying Colinear Columns of a Matrix Hello All, I am trying to identify columns of a matrix that are perfectly collinear. It is not that difficult to identify when two columns are identical are have zero variance, but I do not know how to ID when the culprit is of a higher order. i.e. columns 1 + 2 + 3 = column 4. NUM.corrcoef(matrix.T) will return NaNs when the matrix is singular, and LA.cond(matrix.T) will provide a very large condition number.... But they do not tell me which columns are causing the problem. For example: zt = numpy. array([[ 1. , 1. , 1. , 1. , 1. ], [ 0.25, 0.1 , 0.2 , 0.25, 0.5 ], [ 0.75, 0.9 , 0.8 , 0.75, 0.5 ], [ 3. , 8. , 0. , 5. , 0. ]]) How can I identify that columns 0,1,2 are the issue because: column 1 + column 2 = column 0? Any input would be greatly appreciated. Thanks much, MJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Fri Aug 26 13:27:38 2011 From: jsseabold at gmail.com (Skipper Seabold) Date: Fri, 26 Aug 2011 13:27:38 -0400 Subject: [Numpy-discussion] Identifying Colinear Columns of a Matrix In-Reply-To: References: Message-ID: On Fri, Aug 26, 2011 at 1:10 PM, Mark Janikas wrote: > Hello All, > > > > I am trying to identify columns of a matrix that are perfectly collinear. > It is not that difficult to identify when two columns are identical are have > zero variance, but I do not know how to ID when the culprit is of a higher > order. i.e. columns 1 + 2 + 3 = column 4.? NUM.corrcoef(matrix.T) will > return NaNs when the matrix is singular, and LA.cond(matrix.T) will provide > a very large condition number?. But they do not tell me which columns are > causing the problem. ??For example: > > > > zt = numpy. array([[ 1.? ,? 1.? ,? 1.? ,? 1.? ,? 1.? ], > > ?????? ????????????????????[ 0.25,? 0.1 ,? 0.2 ,? 0.25,? 0.5 ], > > ?????? ????????????????????[ 0.75,? 0.9 ,? 0.8 ,? 0.75,? 0.5 ], > > ?????? ????????????????????[ 3.? ,? 8.? ,? 0.? ,? 5.? ,? 0.? ]]) > > > > How can I identify that columns 0,1,2 are the issue because: column 1 + > column 2 = column 0? > > > > Any input would be greatly appreciated.? Thanks much, > The way that I know to do this in a regression context for (near perfect) multicollinearity is VIF. It's long been on my todo list for statsmodels. http://en.wikipedia.org/wiki/Variance_inflation_factor Maybe there are other ways with decompositions. 
I'd be happy to hear about them. Please post back if you write any code to do this. Skipper From mjanikas at esri.com Fri Aug 26 13:34:33 2011 From: mjanikas at esri.com (Mark Janikas) Date: Fri, 26 Aug 2011 10:34:33 -0700 Subject: [Numpy-discussion] Identifying Colinear Columns of a Matrix In-Reply-To: References: Message-ID: I actually use the VIF when the design matrix can be inverted.... I do it the quick and dirty way as opposed to the step regression: 1. Calc the correlation coefficient of the matrix (w/o the intercept) 2. Return the diagonal of the inversion of the correlation matrix in step 1. Again, the problem lies in the multiple column relationship... I wouldn't be able to run sub regressions at all when the columns are perfectly collinear. MJ -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Skipper Seabold Sent: Friday, August 26, 2011 10:28 AM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix On Fri, Aug 26, 2011 at 1:10 PM, Mark Janikas wrote: > Hello All, > > > > I am trying to identify columns of a matrix that are perfectly collinear. > It is not that difficult to identify when two columns are identical are have > zero variance, but I do not know how to ID when the culprit is of a higher > order. i.e. columns 1 + 2 + 3 = column 4.? NUM.corrcoef(matrix.T) will > return NaNs when the matrix is singular, and LA.cond(matrix.T) will provide > a very large condition number.. But they do not tell me which columns are > causing the problem. ??For example: > > > > zt = numpy. array([[ 1.? ,? 1.? ,? 1.? ,? 1.? ,? 1.? ], > > ?????? ????????????????????[ 0.25,? 0.1 ,? 0.2 ,? 0.25,? 0.5 ], > > ?????? ????????????????????[ 0.75,? 0.9 ,? 0.8 ,? 0.75,? 0.5 ], > > ?????? ????????????????????[ 3.? ,? 8.? ,? 0.? ,? 5.? ,? 0.? ]]) > > > > How can I identify that columns 0,1,2 are the issue because: column 1 + > column 2 = column 0? > > > > Any input would be greatly appreciated.? Thanks much, > The way that I know to do this in a regression context for (near perfect) multicollinearity is VIF. It's long been on my todo list for statsmodels. http://en.wikipedia.org/wiki/Variance_inflation_factor Maybe there are other ways with decompositions. I'd be happy to hear about them. Please post back if you write any code to do this. Skipper _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From mjanikas at esri.com Fri Aug 26 13:41:35 2011 From: mjanikas at esri.com (Mark Janikas) Date: Fri, 26 Aug 2011 10:41:35 -0700 Subject: [Numpy-discussion] Identifying Colinear Columns of a Matrix In-Reply-To: References: Message-ID: I wonder if my last statement is essentially the only answer... which I wanted to avoid... Should I just use combinations of the columns and try and construct the corrcoef() (then ID whether NaNs are present), or use the condition number to ID the singularity? I just wanted to avoid the whole k! algorithm. MJ -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Mark Janikas Sent: Friday, August 26, 2011 10:35 AM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix I actually use the VIF when the design matrix can be inverted.... I do it the quick and dirty way as opposed to the step regression: 1. 
Calc the correlation coefficient of the matrix (w/o the intercept) 2. Return the diagonal of the inversion of the correlation matrix in step 1. Again, the problem lies in the multiple column relationship... I wouldn't be able to run sub regressions at all when the columns are perfectly collinear. MJ -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Skipper Seabold Sent: Friday, August 26, 2011 10:28 AM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix On Fri, Aug 26, 2011 at 1:10 PM, Mark Janikas wrote: > Hello All, > > > > I am trying to identify columns of a matrix that are perfectly collinear. > It is not that difficult to identify when two columns are identical are have > zero variance, but I do not know how to ID when the culprit is of a higher > order. i.e. columns 1 + 2 + 3 = column 4.? NUM.corrcoef(matrix.T) will > return NaNs when the matrix is singular, and LA.cond(matrix.T) will provide > a very large condition number.. But they do not tell me which columns are > causing the problem. ??For example: > > > > zt = numpy. array([[ 1.? ,? 1.? ,? 1.? ,? 1.? ,? 1.? ], > > ?????? ????????????????????[ 0.25,? 0.1 ,? 0.2 ,? 0.25,? 0.5 ], > > ?????? ????????????????????[ 0.75,? 0.9 ,? 0.8 ,? 0.75,? 0.5 ], > > ?????? ????????????????????[ 3.? ,? 8.? ,? 0.? ,? 5.? ,? 0.? ]]) > > > > How can I identify that columns 0,1,2 are the issue because: column 1 + > column 2 = column 0? > > > > Any input would be greatly appreciated.? Thanks much, > The way that I know to do this in a regression context for (near perfect) multicollinearity is VIF. It's long been on my todo list for statsmodels. http://en.wikipedia.org/wiki/Variance_inflation_factor Maybe there are other ways with decompositions. I'd be happy to hear about them. Please post back if you write any code to do this. Skipper _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From fperez.net at gmail.com Fri Aug 26 14:01:46 2011 From: fperez.net at gmail.com (Fernando Perez) Date: Fri, 26 Aug 2011 20:01:46 +0200 Subject: [Numpy-discussion] Identifying Colinear Columns of a Matrix In-Reply-To: References: Message-ID: On Fri, Aug 26, 2011 at 7:41 PM, Mark Janikas wrote: > I wonder if my last statement is essentially the only answer... which I wanted to avoid... > > Should I just use combinations of the columns and try and construct the corrcoef() (then ID whether NaNs are present), or use the condition number to ID the singularity? ?I just wanted to avoid the whole k! algorithm. > This is a completely naive, off-the-top of my head reply, so most likely completely wrong. But wouldn't a Gram-Schmidt type process let you identify things here? You're effectively looking for n vectors that belong to an m-dimensional subspace with n>m. As you walk through the G-S process you could probably track the projections and identify when one of the vectors in the m-n set is 'emptied out' by the G-S projections, and would have the info of what it projected into. I don't remember the details of G-S so perhaps there's a really obvious reason why the above is dumb and doesn't work. But just in case it gets you thinking in the right direction... 
(and I'll learn something from the corrections) Cheers, f From charlesr.harris at gmail.com Fri Aug 26 14:04:07 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 26 Aug 2011 12:04:07 -0600 Subject: [Numpy-discussion] Identifying Colinear Columns of a Matrix In-Reply-To: References: Message-ID: On Fri, Aug 26, 2011 at 11:41 AM, Mark Janikas wrote: > I wonder if my last statement is essentially the only answer... which I > wanted to avoid... > > Should I just use combinations of the columns and try and construct the > corrcoef() (then ID whether NaNs are present), or use the condition number > to ID the singularity? I just wanted to avoid the whole k! algorithm. > > MJ > > -----Original Message----- > From: numpy-discussion-bounces at scipy.org [mailto: > numpy-discussion-bounces at scipy.org] On Behalf Of Mark Janikas > Sent: Friday, August 26, 2011 10:35 AM > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix > > I actually use the VIF when the design matrix can be inverted.... I do it > the quick and dirty way as opposed to the step regression: > > 1. Calc the correlation coefficient of the matrix (w/o the intercept) > 2. Return the diagonal of the inversion of the correlation matrix in step > 1. > > Again, the problem lies in the multiple column relationship... I wouldn't > be able to run sub regressions at all when the columns are perfectly > collinear. > > MJ > > -----Original Message----- > From: numpy-discussion-bounces at scipy.org [mailto: > numpy-discussion-bounces at scipy.org] On Behalf Of Skipper Seabold > Sent: Friday, August 26, 2011 10:28 AM > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix > > On Fri, Aug 26, 2011 at 1:10 PM, Mark Janikas wrote: > > Hello All, > > > > > > > > I am trying to identify columns of a matrix that are perfectly collinear. > > It is not that difficult to identify when two columns are identical are > have > > zero variance, but I do not know how to ID when the culprit is of a > higher > > order. i.e. columns 1 + 2 + 3 = column 4. NUM.corrcoef(matrix.T) will > > return NaNs when the matrix is singular, and LA.cond(matrix.T) will > provide > > a very large condition number.. But they do not tell me which columns are > > causing the problem. For example: > > > > > > > > zt = numpy. array([[ 1. , 1. , 1. , 1. , 1. ], > > > > [ 0.25, 0.1 , 0.2 , 0.25, 0.5 ], > > > > [ 0.75, 0.9 , 0.8 , 0.75, 0.5 ], > > > > [ 3. , 8. , 0. , 5. , 0. ]]) > > > > > > > > How can I identify that columns 0,1,2 are the issue because: column 1 + > > column 2 = column 0? > > > > > > > > Any input would be greatly appreciated. Thanks much, > > > > The way that I know to do this in a regression context for (near > perfect) multicollinearity is VIF. It's long been on my todo list for > statsmodels. > > http://en.wikipedia.org/wiki/Variance_inflation_factor > > Maybe there are other ways with decompositions. I'd be happy to hear about > them. > > Please post back if you write any code to do this. > > Why not svd? In [13]: u,d,v = svd(zt) In [14]: d Out[14]: array([ 1.01307066e+01, 1.87795095e+00, 3.03454566e-01, 3.29253945e-16]) In [15]: u[:,3] Out[15]: array([ 0.57735027, -0.57735027, -0.57735027, 0. ]) In [16]: dot(u[:,3], zt) Out[16]: array([ -7.77156117e-16, -6.66133815e-16, -7.21644966e-16, -7.77156117e-16, -8.88178420e-16]) Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Fri Aug 26 14:13:43 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 26 Aug 2011 14:13:43 -0400 Subject: [Numpy-discussion] Identifying Colinear Columns of a Matrix In-Reply-To: References: Message-ID: On Fri, Aug 26, 2011 at 1:41 PM, Mark Janikas wrote: > I wonder if my last statement is essentially the only answer... which I wanted to avoid... > > Should I just use combinations of the columns and try and construct the corrcoef() (then ID whether NaNs are present), or use the condition number to ID the singularity? ?I just wanted to avoid the whole k! algorithm. > > MJ > > -----Original Message----- > From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Mark Janikas > Sent: Friday, August 26, 2011 10:35 AM > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix > > I actually use the VIF when the design matrix can be inverted.... I do it the quick and dirty way as opposed to the step regression: > > 1. Calc the correlation coefficient of the matrix (w/o the intercept) > 2. Return the diagonal of the inversion of the correlation matrix in step 1. > > Again, the problem lies in the multiple column relationship... I wouldn't be able to run sub regressions at all when the columns are perfectly collinear. > > MJ > > -----Original Message----- > From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Skipper Seabold > Sent: Friday, August 26, 2011 10:28 AM > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix > > On Fri, Aug 26, 2011 at 1:10 PM, Mark Janikas wrote: >> Hello All, >> >> >> >> I am trying to identify columns of a matrix that are perfectly collinear. >> It is not that difficult to identify when two columns are identical are have >> zero variance, but I do not know how to ID when the culprit is of a higher >> order. i.e. columns 1 + 2 + 3 = column 4.? NUM.corrcoef(matrix.T) will >> return NaNs when the matrix is singular, and LA.cond(matrix.T) will provide >> a very large condition number.. But they do not tell me which columns are >> causing the problem. ??For example: >> >> >> >> zt = numpy. array([[ 1.? ,? 1.? ,? 1.? ,? 1.? ,? 1.? ], >> >> ?????? ????????????????????[ 0.25,? 0.1 ,? 0.2 ,? 0.25,? 0.5 ], >> >> ?????? ????????????????????[ 0.75,? 0.9 ,? 0.8 ,? 0.75,? 0.5 ], >> >> ?????? ????????????????????[ 3.? ,? 8.? ,? 0.? ,? 5.? ,? 0.? ]]) >> >> >> >> How can I identify that columns 0,1,2 are the issue because: column 1 + >> column 2 = column 0? >> >> >> >> Any input would be greatly appreciated.? Thanks much, >> > > The way that I know to do this in a regression context for (near > perfect) multicollinearity is VIF. It's long been on my todo list for > statsmodels. > > http://en.wikipedia.org/wiki/Variance_inflation_factor > > Maybe there are other ways with decompositions. I'd be happy to hear about them. > > Please post back if you write any code to do this. Partial answer in a different context. I have written a function that only adds columns if they maintain invertibility, using brute force: add each column sequentially, check whether the matrix is singular. Don't add the columns that already included as linear combination. (But this doesn't tell which columns are in the colinear vector.) I did this for categorical variables, so sequence was predefined. 
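A minimal sketch of that brute-force idea (only an illustration, not the actual function referred to above; the helper name independent_columns is made up, and it uses numpy.linalg.matrix_rank rather than an explicit invertibility check):

    import numpy as np

    def independent_columns(X, tol=1e-10):
        # keep a column only if it does not make the selected set rank-deficient
        keep = []
        for j in range(X.shape[1]):
            if np.linalg.matrix_rank(X[:, keep + [j]], tol=tol) == len(keep) + 1:
                keep.append(j)
        return keep

    zt = np.array([[1.,   1.,   1.,   1.,   1.  ],
                   [0.25, 0.1,  0.2,  0.25, 0.5 ],
                   [0.75, 0.9,  0.8,  0.75, 0.5 ],
                   [3.,   8.,   0.,   5.,   0.  ]])
    print(independent_columns(zt.T))     # [0, 1, 3]

The rejected index (2 here) is the redundant column, although this still does not say which combination of the kept columns reproduces it.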
Just finding a non-singular subspace would be easier, PCA, SVD, or scikits.learn matrix decomposition (?). (factor models and Johansen's cointegration tests are also just doing matrix decomposition that identify subspaces) Maybe rotation in Factor Analysis is able to identify the vectors, but I don't have much idea about that. Josef > > Skipper > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From mjanikas at esri.com Fri Aug 26 14:38:28 2011 From: mjanikas at esri.com (Mark Janikas) Date: Fri, 26 Aug 2011 11:38:28 -0700 Subject: [Numpy-discussion] Identifying Colinear Columns of a Matrix In-Reply-To: References: Message-ID: Charles! That looks like it could be a winner! It looks like you always choose the last column of the U matrix and ID the columns that have the same values? It works when I add extra columns as well! BTW, sorry for my lack of knowledge... but what was the point of the dot multiply at the end? That they add up to essentially zero, indicating singularity? Thanks so much! MJ From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Charles R Harris Sent: Friday, August 26, 2011 11:04 AM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix On Fri, Aug 26, 2011 at 11:41 AM, Mark Janikas > wrote: I wonder if my last statement is essentially the only answer... which I wanted to avoid... Should I just use combinations of the columns and try and construct the corrcoef() (then ID whether NaNs are present), or use the condition number to ID the singularity? I just wanted to avoid the whole k! algorithm. MJ -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Mark Janikas Sent: Friday, August 26, 2011 10:35 AM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix I actually use the VIF when the design matrix can be inverted.... I do it the quick and dirty way as opposed to the step regression: 1. Calc the correlation coefficient of the matrix (w/o the intercept) 2. Return the diagonal of the inversion of the correlation matrix in step 1. Again, the problem lies in the multiple column relationship... I wouldn't be able to run sub regressions at all when the columns are perfectly collinear. MJ -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Skipper Seabold Sent: Friday, August 26, 2011 10:28 AM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Identifying Colinear Columns of a Matrix On Fri, Aug 26, 2011 at 1:10 PM, Mark Janikas > wrote: > Hello All, > > > > I am trying to identify columns of a matrix that are perfectly collinear. > It is not that difficult to identify when two columns are identical are have > zero variance, but I do not know how to ID when the culprit is of a higher > order. i.e. columns 1 + 2 + 3 = column 4. 
NUM.corrcoef(matrix.T) will > return NaNs when the matrix is singular, and LA.cond(matrix.T) will provide > a very large condition number.. But they do not tell me which columns are > causing the problem. For example: > > > > zt = numpy. array([[ 1. , 1. , 1. , 1. , 1. ], > > [ 0.25, 0.1 , 0.2 , 0.25, 0.5 ], > > [ 0.75, 0.9 , 0.8 , 0.75, 0.5 ], > > [ 3. , 8. , 0. , 5. , 0. ]]) > > > > How can I identify that columns 0,1,2 are the issue because: column 1 + > column 2 = column 0? > > > > Any input would be greatly appreciated. Thanks much, > The way that I know to do this in a regression context for (near perfect) multicollinearity is VIF. It's long been on my todo list for statsmodels. http://en.wikipedia.org/wiki/Variance_inflation_factor Maybe there are other ways with decompositions. I'd be happy to hear about them. Please post back if you write any code to do this. Why not svd? In [13]: u,d,v = svd(zt) In [14]: d Out[14]: array([ 1.01307066e+01, 1.87795095e+00, 3.03454566e-01, 3.29253945e-16]) In [15]: u[:,3] Out[15]: array([ 0.57735027, -0.57735027, -0.57735027, 0. ]) In [16]: dot(u[:,3], zt) Out[16]: array([ -7.77156117e-16, -6.66133815e-16, -7.21644966e-16, -7.77156117e-16, -8.88178420e-16]) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjordan1 at uw.edu Fri Aug 26 14:47:59 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Fri, 26 Aug 2011 13:47:59 -0500 Subject: [Numpy-discussion] NA mask C-API documentation In-Reply-To: References: Message-ID: Regarding ufuncs and NA's, all the mechanics of handling NA from a ufunc are in the PyUFunc_FromFuncAndData function, right? So the ufunc creation docs don't have to be updated to include NA's? -Chris JS On Wed, Aug 24, 2011 at 7:08 PM, Mark Wiebe wrote: > I've added C-API documentation to the missingdata branch. The .rst file > (beware of the github rst parser though, it drops some of the content) is > here: > https://github.com/m-paradox/numpy/blob/missingdata/doc/source/reference/c-api.maskna.rst > and I made a small example module which goes with it here: > https://github.com/m-paradox/spdiv > Cheers, > Mark > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From charlesr.harris at gmail.com Fri Aug 26 14:57:26 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 26 Aug 2011 12:57:26 -0600 Subject: [Numpy-discussion] Identifying Colinear Columns of a Matrix In-Reply-To: References: Message-ID: On Fri, Aug 26, 2011 at 12:38 PM, Mark Janikas wrote: > Charles! That looks like it could be a winner! It looks like you always > choose the last column of the U matrix and ID the columns that have the same > values? It works when I add extra columns as well! BTW, sorry for my lack > of knowledge? but what was the point of the dot multiply at the end? That > they add up to essentially zero, indicating singularity? Thanks so much! > The indicator of collinearity is the singular value in d, the corresponding column in u represent the linear combination of rows that are ~0, the corresponding row in v represents the linear combination of columns that are ~0. If you have several combinations that are ~0, of course you can add them together and get another. 
Basically, if you take the rows in v corresponding to small singular values, you get a basis for the for the null space of the matrix, the corresponding columns in u are a basis for the orthogonal complement of the range of the matrix. If that is getting a bit technical you can just play around with things. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Fri Aug 26 15:58:42 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 26 Aug 2011 12:58:42 -0700 Subject: [Numpy-discussion] NA mask C-API documentation In-Reply-To: References: Message-ID: On Fri, Aug 26, 2011 at 11:47 AM, Christopher Jordan-Squire wrote: > Regarding ufuncs and NA's, all the mechanics of handling NA from a > ufunc are in the PyUFunc_FromFuncAndData function, right? So the ufunc > creation docs don't have to be updated to include NA's? > That's correct, any ufunc will automatically support NAs with a propagation approach. It's probably worth mentioning this in the ufunc docs. I've added some additional type resolution and loop selection functions, but I'd rather keep them private in NumPy for a version or two so improvements can be made as experience is gained with them. Unfortunately some aspects of this are in public headers because of how the API is designed, ideally more of the classes struct layouts should be hidden from the ABI just as I've done in deprecating that access for PyArrayObject. -Mark > > -Chris JS > > On Wed, Aug 24, 2011 at 7:08 PM, Mark Wiebe wrote: > > I've added C-API documentation to the missingdata branch. The .rst file > > (beware of the github rst parser though, it drops some of the content) is > > here: > > > https://github.com/m-paradox/numpy/blob/missingdata/doc/source/reference/c-api.maskna.rst > > and I made a small example module which goes with it here: > > https://github.com/m-paradox/spdiv > > Cheers, > > Mark > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Aug 26 16:03:21 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 26 Aug 2011 16:03:21 -0400 Subject: [Numpy-discussion] Identifying Colinear Columns of a Matrix In-Reply-To: References: Message-ID: On Fri, Aug 26, 2011 at 2:57 PM, Charles R Harris wrote: > > > On Fri, Aug 26, 2011 at 12:38 PM, Mark Janikas wrote: >> >> Charles!? That looks like it could be a winner!? It looks like you always >> choose the last column of the U matrix and ID the columns that have the same >> values?? It works when I add extra columns as well!? BTW, sorry for my lack >> of knowledge? but what was the point of the dot multiply at the end?? That >> they add up to essentially zero, indicating singularity?? Thanks so much! > > The indicator of collinearity is the singular value in d, the corresponding > column in u represent the linear combination of rows that are ~0, the > corresponding row in v represents the linear combination of columns that are > ~0. If you have several combinations that are ~0, of course you can add them > together and get another. 
Basically, if you take the rows in v corresponding > to small singular values, you get a basis for the for the null space of the > matrix, the corresponding columns in u are a basis for the orthogonal > complement of the range of the matrix. If that is getting a bit technical > you can just play around with things. Interpretation is a bit difficult if there are more than one zero eigenvalues >>> zt2 = np.vstack((zt, zt[2,:] + zt[3,:])) >>> zt2 array([[ 1. , 1. , 1. , 1. , 1. ], [ 0.25, 0.1 , 0.2 , 0.25, 0.5 ], [ 0.75, 0.9 , 0.8 , 0.75, 0.5 ], [ 3. , 8. , 0. , 5. , 0. ], [ 3.75, 8.9 , 0.8 , 5.75, 0.5 ]]) >>> u,d,v = np.linalg.svd(zt2) >>> d array([ 1.51561431e+01, 1.91327688e+00, 3.25113875e-01, 1.05664844e-15, 5.29054218e-16]) >>> u[:,-2:] array([[ 0.59948553, -0.12496837], [-0.59948553, 0.12496837], [-0.51747833, -0.48188813], [ 0.0820072 , -0.60685651], [-0.0820072 , 0.60685651]]) Josef > > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From brett.olsen at gmail.com Fri Aug 26 16:20:20 2011 From: brett.olsen at gmail.com (Brett Olsen) Date: Fri, 26 Aug 2011 15:20:20 -0500 Subject: [Numpy-discussion] How to output array with indexes to a text file? In-Reply-To: <1314299436.18748.19.camel@mattotaupa> References: <1314299436.18748.19.camel@mattotaupa> Message-ID: On Thu, Aug 25, 2011 at 2:10 PM, Paul Menzel wrote: > is there an easy way to also save the indexes of an array (columns, rows > or both) when outputting it to a text file. For saving an array to a > file I only found `savetxt()` [1] which does not seem to have such an > option. Adding indexes manually is doable but I would like to avoid > that. > Is there a way to accomplish that task without reserving the 0th row or > column to store the indexes? > > I want to process these text files to produce graphs and MetaPost?s [2] > graph package needs these indexes. (I know about Matplotlib [3], but I > would like to use MetaPost.) > > > Thanks, > > Paul Why don't you just write a wrapper for numpy.savetxt that adds the indices? E.g.: In [1]: import numpy as N In [2]: a = N.arange(6,12).reshape((2,3)) In [3]: a Out[3]: array([[ 6, 7, 8], [ 9, 10, 11]]) In [4]: def save_with_indices(filename, output): ...: (rows, cols) = output.shape ...: tmp = N.hstack((N.arange(1,rows+1).reshape((rows,1)), output)) ...: tmp = N.vstack((N.arange(cols+1).reshape((1,cols+1)), tmp)) ...: N.savetxt(filename, tmp, fmt='%8i') ...: In [5]: N.savetxt('noidx.txt', a, fmt='%8i') In [6]: save_with_indices('idx.txt', a) 'noidx.txt' looks like: 6 7 8 9 10 11 'idx.txt' looks like: 0 1 2 3 1 6 7 8 2 9 10 11 ~Brett From ralf.gommers at googlemail.com Fri Aug 26 17:52:52 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 26 Aug 2011 23:52:52 +0200 Subject: [Numpy-discussion] the build and installation process In-Reply-To: References: Message-ID: On Thu, Aug 25, 2011 at 2:23 PM, srean wrote: > Hi, > > I would like to know a bit about how the installation process works. Could > you point me to a resource. In particular I want to know how the site.cfg > configuration works. Is it numpy/scipy specific or is it standard with > distutils. I googled for site.cfg and distutils but did not find any > authoritative document. There is not much more than what's described in the site.cfg.example file that's in the numpy source tree root dir. 
As far as I know the site.cfg name is numpy specific, but python distutils uses a distutils.cfg file in the same format. > > I believe many new users trip up on the installation process, especially in > trying to substitute their favourite library in place os the standard. So a > canonical document explaining the process will be very helpful. > > http://docs.scipy.org/doc/numpy/user/install.html > The most up-to-date descriptions for each OS can be found at http://www.scipy.org/Installing_SciPy > > does cover some of the important points but its a bit sketchy, and has a > "this is all that you need to know" flavor. Doesnt quite enable the reader > to fix his own problems. So a resource that is somewhere in between reading > up all the sources that get invoked during the installation and building, > and the current install document will be very welcome. > > English is not my native language, but if there is anyway I can help, I > would do so gladly. > If the above docs don't help as much as you'd want, please point out the most problematic points. The install instructions are a wiki so you can make changes yourself. Especially about things like linking to specific versions of MKL there's not enough or outdated info, any contributions there will be very useful. Cheers, Ralf > -- srean > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sat Aug 27 06:28:14 2011 From: cournape at gmail.com (David Cournapeau) Date: Sat, 27 Aug 2011 12:28:14 +0200 Subject: [Numpy-discussion] Removing numscons, adding bento scripts to main branch ? Message-ID: Hi there, I am finally at a stage where bento can do most of what numscons could do. I would rather avoid having 3 different set of build scripts (distutils+bento+numscons) to maintain in the long term, so I would favor removing numscons scripts from numpy and scipy. I was thinking about keeping maybe numscons scripts for one release for both numpy/scipy, with a warning about their deprecation, and then removing them one release later. Does that sound ok with everyone ? cheers, David From ralf.gommers at googlemail.com Sat Aug 27 07:31:17 2011 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 27 Aug 2011 13:31:17 +0200 Subject: [Numpy-discussion] Removing numscons, adding bento scripts to main branch ? In-Reply-To: References: Message-ID: On Sat, Aug 27, 2011 at 12:28 PM, David Cournapeau wrote: > Hi there, > > I am finally at a stage where bento can do most of what numscons could > do. I would rather avoid having 3 different set of build scripts > (distutils+bento+numscons) to maintain in the long term, so I would > favor removing numscons scripts from numpy and scipy. > > That's awesome! > I was thinking about keeping maybe numscons scripts for one release > for both numpy/scipy, with a warning about their deprecation, and then > removing them one release later. > > Does that sound ok with everyone ? > > Sounds like the right thing to do. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Aug 27 08:30:25 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 27 Aug 2011 06:30:25 -0600 Subject: [Numpy-discussion] Removing numscons, adding bento scripts to main branch ? 
In-Reply-To: References: Message-ID: On Sat, Aug 27, 2011 at 4:28 AM, David Cournapeau wrote: > Hi there, > > I am finally at a stage where bento can do most of what numscons could > do. I would rather avoid having 3 different set of build scripts > (distutils+bento+numscons) to maintain in the long term, so I would > favor removing numscons scripts from numpy and scipy. > > I was thinking about keeping maybe numscons scripts for one release > for both numpy/scipy, with a warning about their deprecation, and then > removing them one release later. > > Does that sound ok with everyone ? > > Sounds good. The numscons scripts don't work for python3 builds anyway. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sat Aug 27 12:13:29 2011 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 27 Aug 2011 09:13:29 -0700 Subject: [Numpy-discussion] Removing numscons, adding bento scripts to main branch ? In-Reply-To: References: Message-ID: On Sat, Aug 27, 2011 at 3:28 AM, David Cournapeau wrote: > Hi there, > > I am finally at a stage where bento can do most of what numscons could > do. I would rather avoid having 3 different set of build scripts > (distutils+bento+numscons) to maintain in the long term, so I would > favor removing numscons scripts from numpy and scipy. > > I was thinking about keeping maybe numscons scripts for one release > for both numpy/scipy, with a warning about their deprecation, and then > removing them one release later. > > Does that sound ok with everyone ? > Sounds great to me! -Mark > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From teoliphant at gmail.com Sat Aug 27 13:53:01 2011 From: teoliphant at gmail.com (Travis Oliphant) Date: Sat, 27 Aug 2011 12:53:01 -0500 Subject: [Numpy-discussion] Removing numscons, adding bento scripts to main branch ? In-Reply-To: References: Message-ID: <0D65F094-742E-4379-B953-5231BDD26EBE@enthought.com> Three cheers! -Travis On Aug 27, 2011, at 5:28 AM, David Cournapeau wrote: > Hi there, > > I am finally at a stage where bento can do most of what numscons could > do. I would rather avoid having 3 different set of build scripts > (distutils+bento+numscons) to maintain in the long term, so I would > favor removing numscons scripts from numpy and scipy. > > I was thinking about keeping maybe numscons scripts for one release > for both numpy/scipy, with a warning about their deprecation, and then > removing them one release later. > > Does that sound ok with everyone ? > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com From teoliphant at gmail.com Sat Aug 27 13:56:26 2011 From: teoliphant at gmail.com (Travis Oliphant) Date: Sat, 27 Aug 2011 12:56:26 -0500 Subject: [Numpy-discussion] Removing numscons, adding bento scripts to main branch ? In-Reply-To: References: Message-ID: <92EE6D63-023A-498F-AF4E-45453AF2B986@enthought.com> Three cheers! Thanks David, -Travis On Aug 27, 2011, at 5:28 AM, David Cournapeau wrote: > Hi there, > > I am finally at a stage where bento can do most of what numscons could > do. 
I would rather avoid having 3 different set of build scripts > (distutils+bento+numscons) to maintain in the long term, so I would > favor removing numscons scripts from numpy and scipy. > > I was thinking about keeping maybe numscons scripts for one release > for both numpy/scipy, with a warning about their deprecation, and then > removing them one release later. > > Does that sound ok with everyone ? > > cheers, > > David > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com From cjordan1 at uw.edu Sat Aug 27 14:08:25 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Sat, 27 Aug 2011 14:08:25 -0400 Subject: [Numpy-discussion] load from text files Pull Request Review Message-ID: Hi-- I've submitted a pull request for a new method for loading data from text files into a record array/masked record array. https://github.com/numpy/numpy/pull/143 Click on the link for more info, but the general idea is to create a regular expression for what entries should look like and loop over the file, updating the regular expression if it's wrong. Once the types are determined the file is loaded line by line into a pre-allocated numpy array. Compared to genfromtxt this function has several advantages/potential advantages. *More modular (genfromtxt is a rather large, nearly 500 line, monolithic function. In my pull request no individual method is longer than around 80 lines, and they're fairly self-contained.) *delimiters can be specified via regex's *missing data can be specified via regex's *it's bit simpler and has sensible defaults *it actually works on some (unfortunately proprietary) data that genfromtxt doesn't seem robust enough for *it supports datetimes *fairly extensible for the power user *makes two passes through the file, the first to determine types/sizes for strings and the second to read in the data, and pre-allocates the array for the second pass. So no giant memory bloating for reading large text files *fairly fast, though I think there is plenty of room for optimizations All that said, it's entirely possible that the innards which determine the type should be ripped out and submitted as a function on their own. I'd love suggestions for improvements, as well as suggestions for a better name. (Currently it's called loadtable, which I don't really like. It was just a working name.) -Chris Jordan-Squire From dominique.orban at gmail.com Sat Aug 27 16:09:11 2011 From: dominique.orban at gmail.com (Dominique Orban) Date: Sat, 27 Aug 2011 20:09:11 +0000 Subject: [Numpy-discussion] numpy.log does not raise exceptions Message-ID: Hi, I'm wondering why numpy.log doesn't raise a ValueError exception the way math.log does: 1< import numpy as np 2< np.log([-1]) Warning: invalid value encountered in log 2> array([ nan]) 3< import math 4< math.log(-1) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) It would make it a lot easier to trap domain errors than using isnan(). 
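For concreteness, the isnan()-based check I have in mind is something along these lines (just a sketch):

import numpy as np

x = np.array([2.0, -1.0, 0.5])
y = np.log(x)  # quietly yields nan for the negative entry
if np.any(np.isnan(y)):
    # the domain error has to be detected and handled by hand
    raise ValueError("np.log received a negative argument")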
Thanks, -- Dominique From robert.kern at gmail.com Sat Aug 27 16:37:14 2011 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 27 Aug 2011 15:37:14 -0500 Subject: [Numpy-discussion] numpy.log does not raise exceptions In-Reply-To: References: Message-ID: On Sat, Aug 27, 2011 at 15:09, Dominique Orban wrote: > Hi, > > I'm wondering why numpy.log doesn't raise a ValueError exception the > way math.log does: > > 1< import numpy as np > 2< np.log([-1]) > Warning: invalid value encountered in log > 2> array([ nan]) > > 3< import math > 4< math.log(-1) > --------------------------------------------------------------------------- > ValueError ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?Traceback (most recent call last) > > It would make it a lot easier to trap domain errors than using isnan(). The reason we don't raise exceptions by default is because when processing large arrays, you usually don't want to cancel the whole operation just because some values were out of the domain. You would rather get an array with NaNs in the elements that had invalid inputs so you can do something useful with the other elements and actually track down where the NaNs got their bad inputs. Always raising an exception destroys that information. That said, if you do want to raise an exception, this is entirely configurable. http://docs.scipy.org/doc/numpy/reference/generated/numpy.seterr.html [~] |1> import numpy as np [~] |2> np.log([-1]) Warning: invalid value encountered in log array([ nan]) [~] |3> np.seterr(invalid='raise') {'divide': 'print', 'invalid': 'print', 'over': 'print', 'under': 'ignore'} [~] |4> np.log([-1]) --------------------------------------------------------------------------- FloatingPointError Traceback (most recent call last) /Users/rkern/ in () ----> 1 np.log([-1]) FloatingPointError: invalid value encountered in log -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From pav at iki.fi Sat Aug 27 18:02:11 2011 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 27 Aug 2011 22:02:11 +0000 (UTC) Subject: [Numpy-discussion] Removing numscons, adding bento scripts to main branch ? References: Message-ID: Hey, Sat, 27 Aug 2011 12:28:14 +0200, David Cournapeau wrote: > I am finally at a stage where bento can do most of what numscons could > do. I would rather avoid having 3 different set of build scripts > (distutils+bento+numscons) to maintain in the long term, so I would > favor removing numscons scripts from numpy and scipy. > > I was thinking about keeping maybe numscons scripts for one release for > both numpy/scipy, with a warning about their deprecation, and then > removing them one release later. Definite +1 from me! Pauli From dominique.orban at gmail.com Sun Aug 28 12:36:27 2011 From: dominique.orban at gmail.com (dpo) Date: Sun, 28 Aug 2011 09:36:27 -0700 (PDT) Subject: [Numpy-discussion] numpy.log does not raise exceptions In-Reply-To: References: Message-ID: <32352209.post@talk.nabble.com> Robert Kern-2 wrote: > > The reason we don't raise exceptions by default is because when > processing large arrays, you usually don't want to cancel the whole > operation just because some values were out of the domain. You would > rather get an array with NaNs in the elements that had invalid inputs > so you can do something useful with the other elements and actually > track down where the NaNs got their bad inputs. 
Always raising an > exception destroys that information. > > That said, if you do want to raise an exception, this is entirely > configurable. > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.seterr.html > > [~] > |1> import numpy as np > > [~] > |2> np.log([-1]) > Warning: invalid value encountered in log > array([ nan]) > > [~] > |3> np.seterr(invalid='raise') > {'divide': 'print', 'invalid': 'print', 'over': 'print', 'under': > 'ignore'} > > [~] > |4> np.log([-1]) > --------------------------------------------------------------------------- > FloatingPointError Traceback (most recent call > last) > /Users/rkern/ in () > ----> 1 np.log([-1]) > > FloatingPointError: invalid value encountered in log > Excellent, thanks. I was hoping it would be configurable. Dominique -- View this message in context: http://old.nabble.com/numpy.log-does-not-raise-exceptions-tp32348907p32352209.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From rpmuller at gmail.com Mon Aug 29 10:56:31 2011 From: rpmuller at gmail.com (Rick Muller) Date: Mon, 29 Aug 2011 08:56:31 -0600 Subject: [Numpy-discussion] Eigenvalues did not converge Message-ID: I'm bumping into the old "Eigenvalues did not converge" error using numpy.linalg.eigh() on several different linux builds of numpy (1.4.1). The matrix is 166x166. I can compute the eigenvalues on a Macintosh build of numpy, and I can confirm that there aren't degenerate eigenvalues, and that the matrix appears to be negative definite. I've seen this before (though not for several years), and what I normally do is to build lapack with -O0. This trick did not work in the current instance. Does anyone have any tricks to getting eigh to work? Other weird things that I've noticed about this case: I can compute the eigenvalues using eigvals and eigvalsh, and can compute the eigenvals/vecs using eig(). The matrix is real symmetric, and I've tested that it's symmetric enough by forcibly symmetrizing it. Thanks in advance for any help you can offer. -- Rick Muller rpmuller at gmail.com 505-750-7557 -------------- next part -------------- An HTML attachment was scrubbed... URL: From dhanjal at telecom-paristech.fr Mon Aug 29 11:21:05 2011 From: dhanjal at telecom-paristech.fr (Charanpal Dhanjal) Date: Mon, 29 Aug 2011 16:21:05 +0100 Subject: [Numpy-discussion] Eigenvalues did not converge In-Reply-To: References: Message-ID: <9992db82607f9fe061235882402b64f7@telecom-paristech.fr> I posted a similar question about the non-convergence of numpy.linalg.svd a few weeks ago. I'm not sure I can help but I wonder if you compiled numpy with ATLAS/MKL support (try numpy.show_config()) and whether it made a difference? Also what is the condition number and Frobenius norm of the matrix in question? Charanpal On Mon, 29 Aug 2011 08:56:31 -0600, Rick Muller wrote: > Im bumping into the old "Eigenvalues did not converge" error using > numpy.linalg.eigh() on several different linux builds of numpy > (1.4.1). The matrix is 166x166. I can compute the eigenvalues on a > Macintosh build of numpy, and I can confirm that there arent > degenerate eigenvalues, and that the matrix appears to be negative > definite. > > Ive seen this before (though not for several years), and what I > normally do is to build lapack with -O0. This trick did not work in > the current instance. Does anyone have any tricks to getting eigh to > work? 
> > Other weird things that Ive noticed about this case: I can compute > the eigenvalues using eigvals and eigvalsh, and can compute the > eigenvals/vecs using eig(). The matrix is real symmetric, and Ive > tested that its symmetric enough by forcibly symmetrizing it. > > Thanks in advance for any help you can offer. From paul.anton.letnes at gmail.com Mon Aug 29 11:31:09 2011 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Mon, 29 Aug 2011 16:31:09 +0100 Subject: [Numpy-discussion] Eigenvalues did not converge In-Reply-To: <9992db82607f9fe061235882402b64f7@telecom-paristech.fr> References: <9992db82607f9fe061235882402b64f7@telecom-paristech.fr> Message-ID: I recently got into trouble with these calculations (although I used scipy). I actually got segfaults and "bus errors". The solution for me was to not link against ATLAS, but rather link against Apple's blas/lapack libraries. That got everything working again. I would suggest trying to install against something other than ATLAS and see if that helps (or, more generally, determining which blas/lapack you are linking against, and try something else). Paul On 29. aug. 2011, at 16.21, Charanpal Dhanjal wrote: > I posted a similar question about the non-convergence of > numpy.linalg.svd a few weeks ago. I'm not sure I can help but I wonder > if you compiled numpy with ATLAS/MKL support (try numpy.show_config()) > and whether it made a difference? Also what is the condition number and > Frobenius norm of the matrix in question? > > Charanpal > > On Mon, 29 Aug 2011 08:56:31 -0600, Rick Muller wrote: >> Im bumping into the old "Eigenvalues did not converge" error using >> numpy.linalg.eigh() on several different linux builds of numpy >> (1.4.1). The matrix is 166x166. I can compute the eigenvalues on a >> Macintosh build of numpy, and I can confirm that there arent >> degenerate eigenvalues, and that the matrix appears to be negative >> definite. >> >> Ive seen this before (though not for several years), and what I >> normally do is to build lapack with -O0. This trick did not work in >> the current instance. Does anyone have any tricks to getting eigh to >> work? >> >> Other weird things that Ive noticed about this case: I can compute >> the eigenvalues using eigvals and eigvalsh, and can compute the >> eigenvals/vecs using eig(). The matrix is real symmetric, and Ive >> tested that its symmetric enough by forcibly symmetrizing it. >> >> Thanks in advance for any help you can offer. 
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From marquett at iap.fr Tue Aug 30 04:46:24 2011 From: marquett at iap.fr (Marquette Jean-Baptiste) Date: Tue, 30 Aug 2011 10:46:24 +0200 Subject: [Numpy-discussion] A question about dtype syntax Message-ID: Hi all, I have this piece of code: Stats = [CatBase, round(stats.mean(Data.Ra), 5), round(stats.mean(Data.Dec), 5), len(Sep), round(stats.mean(Sep),4), round(stats.stdev(Sep),4)] print Stats if First: StatsAll = np.array(np.asarray(Stats), dtype=('a11, f8, f8, i4, f8, f8')) First = False else: StatsAll = np.vstack((StatsAll, np.asarray(Stats))) print len(StatsAll) This yields the error: ['bs3000k.cat', 280.60341, -7.09118, 9480, 0.2057, 0.14] Traceback (most recent call last): File "/Users/marquett/workspace/Distort/src/StatsSep.py", line 40, in StatsAll = np.array(np.asarray(Stats), dtype=('a11, f8, f8, i4, f8, f8')) ValueError: could not convert string to float: bs3000k.cat What's wrong ? Thanks for your help Cheers JB -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Tue Aug 30 05:50:48 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 30 Aug 2011 11:50:48 +0200 Subject: [Numpy-discussion] A question about dtype syntax In-Reply-To: References: Message-ID: <6AB4E3BA-C9B9-4D99-A470-259CD81589A7@gmail.com> On Aug 30, 2011, at 10:46 AM, Marquette Jean-Baptiste wrote: > Hi all, > > I have this piece of code: > > Stats = [CatBase, round(stats.mean(Data.Ra), 5), round(stats.mean(Data.Dec), 5), len(Sep), round(stats.mean(Sep),4), round(stats.stdev(Sep),4)] > print Stats > if First: > StatsAll = np.array(np.asarray(Stats), dtype=('a11, f8, f8, i4, f8, f8')) > First = False > else: > StatsAll = np.vstack((StatsAll, np.asarray(Stats))) > print len(StatsAll) > > This yields the error: > > ['bs3000k.cat', 280.60341, -7.09118, 9480, 0.2057, 0.14] > Traceback (most recent call last): > File "/Users/marquett/workspace/Distort/src/StatsSep.py", line 40, in > StatsAll = np.array(np.asarray(Stats), dtype=('a11, f8, f8, i4, f8, f8')) > ValueError: could not convert string to float: bs3000k.cat > > What's wrong ? My guess: Stats is a list of 5 elements, but you want a list of 1 5-element tuple to match the type. > Stats = [(CatBase, round(stats.mean(Data.Ra), 5), round(stats.mean(Data.Dec), 5), len(Sep), round(stats.mean(Sep),4), round(stats.stdev(Sep),4),)] From qisheng at multicorewareinc.com Tue Aug 30 05:51:28 2011 From: qisheng at multicorewareinc.com (Qisheng Yang) Date: Tue, 30 Aug 2011 17:51:28 +0800 Subject: [Numpy-discussion] Want to find a scientific app using NumPy to process large set of data, say more than 1000000 elements in ndarray. Message-ID: Hello, All As the subject say, I want to exercise *multiprocessing *module in NumPy in order to take advantage of multi-cores. A project which processing large set of data will be useful to compare single thread with multi-thread. I have reviewed some projects using NumPy/SciPy list on SciPy homepage. But I haven't yet found a project which using NumPy ufunc to process large set of data. Any suggestions would be greatly appreciated. Thanks much. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ndbecker2 at gmail.com Tue Aug 30 08:30:54 2011 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 30 Aug 2011 08:30:54 -0400 Subject: [Numpy-discussion] wierd numpy.void behavior Message-ID: I've encountered something weird about numpy.void. arr = np.empty ((len(results),), dtype=[('deltaf', float), ('quantize', [('int', int), ('frac', int)])]) for i,r in enumerate (results): arr[i] = (r[0]['deltaf'], tuple(r[0]['quantize_mf'])) from collections import defaultdict, namedtuple experiments = defaultdict(list) testcase = namedtuple ('testcase', ['quantize']) for e in arr: experiments[testcase(e['quantize'])].append (e) Now it seems that when e['quantize'] is used as a dictionary key, equal values are not compared as equal: In [36]: experiments Out[36]: defaultdict(, {testcase(quantize=(0, 0)): [(1.25, (0, 0))], testcase(quantize=(0, 0)): [(1.25, (0, 0))], testcase(quantize=(0, 0)): [(1.25, (0, 0))]}) See, there are 3 'testcases' inserted, all with keys quantize=(0,0). In [37]: e['quantize'] Out[37]: (0, 0) In [38]: type(e['quantize']) Out[38]: There's something weird here. If instead I do: for e in arr: experiments[testcase(tuple(e['quantize']))].append (e) that is, convert e['quantize'] to a tuple before using it as a key, I get the expected behavior: In [40]: experiments Out[40]: defaultdict(, {testcase(quantize=(0, 0)): [(1.25, (0, 0)), (1.25, (0, 0)), (1.25, (0, 0))]}) From shish at keba.be Tue Aug 30 09:09:50 2011 From: shish at keba.be (Olivier Delalleau) Date: Tue, 30 Aug 2011 09:09:50 -0400 Subject: [Numpy-discussion] wierd numpy.void behavior In-Reply-To: References: Message-ID: It looks like numpy.void does not properly implement __hash__: In [35]: arr[0]['quantize'] == arr[1]['quantize'] Out[35]: True In [34]: hash(arr[0]['quantize']) == hash(arr[1]['quantize']) Out[34]: False I'm not familiar enough with this kind of data type to tell you if you are using it as it should be used though. Maybe such data is not supposed to be hashed (but then shouldn'it it raise an exception?). -=- Olivier 2011/8/30 Neal Becker > I've encountered something weird about numpy.void. > > arr = np.empty ((len(results),), dtype=[('deltaf', float), > ('quantize', [('int', int), ('frac', > int)])]) > > for i,r in enumerate (results): > arr[i] = (r[0]['deltaf'], > tuple(r[0]['quantize_mf'])) > > > from collections import defaultdict, namedtuple > experiments = defaultdict(list) > > testcase = namedtuple ('testcase', ['quantize']) > > for e in arr: > experiments[testcase(e['quantize'])].append (e) > > Now it seems that when e['quantize'] is used as a dictionary key, equal > values > are not compared as equal: > > In [36]: experiments > Out[36]: defaultdict(, {testcase(quantize=(0, 0)): [(1.25, (0, > 0))], testcase(quantize=(0, 0)): [(1.25, (0, 0))], testcase(quantize=(0, > 0)): > [(1.25, (0, 0))]}) > > See, there are 3 'testcases' inserted, all with keys quantize=(0,0). > > In [37]: e['quantize'] > Out[37]: (0, 0) > > In [38]: type(e['quantize']) > Out[38]: > > There's something weird here. 
If instead I do: > > for e in arr: > experiments[testcase(tuple(e['quantize']))].append (e) > > that is, convert e['quantize'] to a tuple before using it as a key, I get > the > expected behavior: > > In [40]: experiments > Out[40]: defaultdict(, {testcase(quantize=(0, 0)): [(1.25, (0, > 0)), > (1.25, (0, 0)), (1.25, (0, 0))]}) > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdmoores at gmail.com Tue Aug 30 09:47:22 2011 From: rdmoores at gmail.com (Richard D. Moores) Date: Tue, 30 Aug 2011 06:47:22 -0700 Subject: [Numpy-discussion] numpy-1.6.1.win32-py3.2.exe (md5) won't install Message-ID: Python 3.2, 64-bit Win 7 When I try to install numpy-1.6.1.win32-py3.2.exe (md5) I get "Python version 3.2 required, which was not found in the registry". What to do? Thanks, Dick Moores From shish at keba.be Tue Aug 30 09:50:47 2011 From: shish at keba.be (Olivier Delalleau) Date: Tue, 30 Aug 2011 09:50:47 -0400 Subject: [Numpy-discussion] numpy-1.6.1.win32-py3.2.exe (md5) won't install In-Reply-To: References: Message-ID: win32 = 32 bit Python. That's probably the issue. -=- Olivier 2011/8/30 Richard D. Moores > Python 3.2, 64-bit Win 7 > > When I try to install numpy-1.6.1.win32-py3.2.exe (md5) I get "Python > version 3.2 required, which was not found in the registry". What to > do? > > Thanks, > > Dick Moores > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Aug 30 09:53:54 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 30 Aug 2011 07:53:54 -0600 Subject: [Numpy-discussion] numpy-1.6.1.win32-py3.2.exe (md5) won't install In-Reply-To: References: Message-ID: On Tue, Aug 30, 2011 at 7:47 AM, Richard D. Moores wrote: > Python 3.2, 64-bit Win 7 > > When I try to install numpy-1.6.1.win32-py3.2.exe (md5) I get "Python > version 3.2 required, which was not found in the registry". What to > do? > > Did you already install python from python.org ? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdmoores at gmail.com Tue Aug 30 09:56:27 2011 From: rdmoores at gmail.com (Richard D. Moores) Date: Tue, 30 Aug 2011 06:56:27 -0700 Subject: [Numpy-discussion] numpy-1.6.1.win32-py3.2.exe (md5) won't install In-Reply-To: References: Message-ID: On Tue, Aug 30, 2011 at 06:53, Charles R Harris wrote: > > > On Tue, Aug 30, 2011 at 7:47 AM, Richard D. Moores > wrote: >> >> Python 3.2, 64-bit Win 7 >> >> When I try to install numpy-1.6.1.win32-py3.2.exe (md5) I get "Python >> version 3.2 required, which was not found in the registry". What to >> do? >> > > Did you already install python from python.org? Yes. Dick From bsouthey at gmail.com Tue Aug 30 10:19:54 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 30 Aug 2011 09:19:54 -0500 Subject: [Numpy-discussion] numpy-1.6.1.win32-py3.2.exe (md5) won't install In-Reply-To: References: Message-ID: On Tue, Aug 30, 2011 at 8:56 AM, Richard D. Moores wrote: > On Tue, Aug 30, 2011 at 06:53, Charles R Harris > wrote: >> >> >> On Tue, Aug 30, 2011 at 7:47 AM, Richard D. 
Moores >> wrote: >>> >>> Python 3.2, 64-bit Win 7 >>> >>> When I try to install numpy-1.6.1.win32-py3.2.exe (md5) I get "Python >>> version 3.2 required, which was not found in the registry". What to >>> do? >>> >> >> Did you already install python from python.org? > > Yes. > > Dick > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Where did you get that file from? The official file is called: numpy-1.6.1-win32-superpack-python3.2.exe (http://sourceforge.net/projects/numpy/files/NumPy/1.6.1/) Nor does it seem to be one of Christoph's as those have names like 'numpy-unoptimized-1.6.1.win32-py3.2.?exe' http://www.lfd.uci.edu/~gohlke/pythonlibs/ As Olivier indicated, this is for a 32-bit install of Python 3.2 and you do not have a 32-bit version of Python installed. I just confirmed that under my 64-bit Windows 7 system: Python 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] on win32 Type "copyright", "credits" or "license()" for more information. >>> import numpy >>> numpy.test() Running unit tests for numpy NumPy version 1.6.1 NumPy is installed in C:\Python32\lib\site-packages\numpy Python version 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit (Intel)] nose version 1.0.0 ..... Bruce From johann.cohentanugi at gmail.com Tue Aug 30 10:33:05 2011 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Tue, 30 Aug 2011 16:33:05 +0200 Subject: [Numpy-discussion] numpy oddity Message-ID: <4E5CF4A1.7090505@gmail.com> I have numpy version 1.6.1 and I see the following behavior : In [380]: X Out[380]: 1.0476157527896641 In [381]: X.__class__ Out[381]: numpy.float64 In [382]: (2,3)*X Out[382]: (2, 3) In [383]: (2,3)/X Out[383]: array([ 1.90909691, 2.86364537]) In [384]: X=float(X) In [385]: (2,3)/X --------------------------------------------------------------------------- TypeError Traceback (most recent call last) /home/cohen/ in () ----> 1 (2,3)/X TypeError: unsupported operand type(s) for /: 'tuple' and 'float' So it appears that X being a numpy float allows numpy to play some trick on the tuple so that division becomes possible, which regular built-in float does not allow arithmetics with tuples. But why is multiplication with "*" not following the same prescription? best, Johann From charlesr.harris at gmail.com Tue Aug 30 10:52:09 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 30 Aug 2011 08:52:09 -0600 Subject: [Numpy-discussion] numpy oddity In-Reply-To: <4E5CF4A1.7090505@gmail.com> References: <4E5CF4A1.7090505@gmail.com> Message-ID: On Tue, Aug 30, 2011 at 8:33 AM, Johann Cohen-Tanugi < johann.cohentanugi at gmail.com> wrote: > I have numpy version 1.6.1 and I see the following behavior : > > In [380]: X > Out[380]: 1.0476157527896641 > > In [381]: X.__class__ > Out[381]: numpy.float64 > > In [382]: (2,3)*X > Out[382]: (2, 3) > > In [383]: (2,3)/X > Out[383]: array([ 1.90909691, 2.86364537]) > > In [384]: X=float(X) > > In [385]: (2,3)/X > --------------------------------------------------------------------------- > TypeError Traceback (most recent call last) > /home/cohen/ in () > ----> 1 (2,3)/X > > TypeError: unsupported operand type(s) for /: 'tuple' and 'float' > > > So it appears that X being a numpy float allows numpy to play some trick > on the tuple so that division becomes possible, which regular built-in > float does not allow arithmetics with tuples. 
> But why is multiplication with "*" not following the same prescription? > > That's strange. In [16]: x = float64(2.1) In [17]: (2,3)*x Out[17]: (2, 3, 2, 3) In [18]: (2,3)/x Out[18]: array([ 0.95238095, 1.42857143]) Note that in the first case x is treated like an integer. In the second the tuple is turned into an array. I think both of these cases should raise exceptions. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From shish at keba.be Tue Aug 30 11:01:53 2011 From: shish at keba.be (Olivier Delalleau) Date: Tue, 30 Aug 2011 11:01:53 -0400 Subject: [Numpy-discussion] numpy oddity In-Reply-To: References: <4E5CF4A1.7090505@gmail.com> Message-ID: 2011/8/30 Charles R Harris > > > On Tue, Aug 30, 2011 at 8:33 AM, Johann Cohen-Tanugi < > johann.cohentanugi at gmail.com> wrote: > >> I have numpy version 1.6.1 and I see the following behavior : >> >> In [380]: X >> Out[380]: 1.0476157527896641 >> >> In [381]: X.__class__ >> Out[381]: numpy.float64 >> >> In [382]: (2,3)*X >> Out[382]: (2, 3) >> >> In [383]: (2,3)/X >> Out[383]: array([ 1.90909691, 2.86364537]) >> >> In [384]: X=float(X) >> >> In [385]: (2,3)/X >> >> --------------------------------------------------------------------------- >> TypeError Traceback (most recent call >> last) >> /home/cohen/ in () >> ----> 1 (2,3)/X >> >> TypeError: unsupported operand type(s) for /: 'tuple' and 'float' >> >> >> So it appears that X being a numpy float allows numpy to play some trick >> on the tuple so that division becomes possible, which regular built-in >> float does not allow arithmetics with tuples. >> But why is multiplication with "*" not following the same prescription? >> >> > That's strange. > > In [16]: x = float64(2.1) > > In [17]: (2,3)*x > Out[17]: (2, 3, 2, 3) > > In [18]: (2,3)/x > Out[18]: array([ 0.95238095, 1.42857143]) > > Note that in the first case x is treated like an integer. In the second the > tuple is turned into an array. I think both of these cases should raise > exceptions. > > Chuck > > > The tuple does not know what to do with /, so Python asks the numpy float if it can do something when dividing a tuple, and numpy implements this (see http://docs.python.org/reference/datamodel.html?highlight=radd#object.__radd__for how reflected operands work). That part makes sense to me. The behavior with * doesn't though, it definitely seems wrong. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdmoores at gmail.com Tue Aug 30 11:48:42 2011 From: rdmoores at gmail.com (Richard D. Moores) Date: Tue, 30 Aug 2011 08:48:42 -0700 Subject: [Numpy-discussion] numpy-1.6.1.win32-py3.2.exe (md5) won't install In-Reply-To: References: Message-ID: On Tue, Aug 30, 2011 at 07:19, Bruce Southey wrote: > On Tue, Aug 30, 2011 at 8:56 AM, Richard D. Moores wrote: >> On Tue, Aug 30, 2011 at 06:53, Charles R Harris >> wrote: >>> >>> >>> On Tue, Aug 30, 2011 at 7:47 AM, Richard D. Moores >>> wrote: >>>> >>>> Python 3.2, 64-bit Win 7 >>>> >>>> When I try to install numpy-1.6.1.win32-py3.2.exe (md5) I get "Python >>>> version 3.2 required, which was not found in the registry". What to >>>> do? >> > Where did you get that file from? from , I believe, but right now the numpy link on that page times out. 
> > The official file is called: > numpy-1.6.1-win32-superpack-python3.2.exe > (http://sourceforge.net/projects/numpy/files/NumPy/1.6.1/) > Nor does it seem to be one of Christoph's as those have names like > 'numpy-unoptimized-1.6.1.win32-py3.2.exe' > http://www.lfd.uci.edu/~gohlke/pythonlibs/ > > As Olivier indicated, this is for a 32-bit install of Python 3.2 and > you do not have a 32-bit version of Python installed. I just confirmed > that under my 64-bit Windows 7 system: > Python 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 bit > (Intel)] on win32 > Type "copyright", "credits" or "license()" for more information. >>>> import numpy >>>> numpy.test() > Running unit tests for numpy > NumPy version 1.6.1 > NumPy is installed in C:\Python32\lib\site-packages\numpy > Python version 3.2 (r32:88445, Feb 20 2011, 21:29:02) [MSC v.1500 32 > bit (Intel)] > nose version 1.0.0 So there is no 64-bit 3.x numpy? Is it possible to install 32-bit Python 3.2 on 64-bit Win 7 (you seem to have done so), so I could use numpy? Dick From shish at keba.be Tue Aug 30 11:51:21 2011 From: shish at keba.be (Olivier Delalleau) Date: Tue, 30 Aug 2011 11:51:21 -0400 Subject: [Numpy-discussion] numpy-1.6.1.win32-py3.2.exe (md5) won't install In-Reply-To: References: Message-ID: 2011/8/30 Richard D. Moores > Is it possible to install 32-bit > Python 3.2 on 64-bit Win 7 (you seem to have done so), so I could use > numpy? > > Yes you can install Python 32 bit on 64 bit Windows. -=- Olivier -------------- next part -------------- An HTML attachment was scrubbed...
URL: From Chris.Barker at noaa.gov Tue Aug 30 12:21:00 2011 From: Chris.Barker at noaa.gov (Chris.Barker) Date: Tue, 30 Aug 2011 09:21:00 -0700 Subject: [Numpy-discussion] load from text files Pull Request Review In-Reply-To: References: Message-ID: <4E5D0DEC.2070507@noaa.gov> On 8/27/11 11:08 AM, Christopher Jordan-Squire wrote: > I've submitted a pull request for a new method for loading data from > text files into a record array/masked record array. > Click on the link for more info, but the general idea is to create a > regular expression for what entries should look like and loop over the > file, updating the regular expression if it's wrong. Once the types > are determined the file is loaded line by line into a pre-allocated > numpy array. nice stuff. Have you looked at my "accumulator" class, rather than pre-allocating? Less the class itself than that ideas behind it. It's easy enough to do, and would keep you from having to run through the file twice. The cost of memory re-allocation as the array grows is very small. I've posted the code recently, but let me know if you want it again. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From rdmoores at gmail.com Tue Aug 30 12:27:54 2011 From: rdmoores at gmail.com (Richard D. Moores) Date: Tue, 30 Aug 2011 09:27:54 -0700 Subject: [Numpy-discussion] numpy-1.6.1.win32-py3.2.exe (md5) won't install In-Reply-To: References: Message-ID: On Tue, Aug 30, 2011 at 09:09, Charles R Harris wrote: > > > On Tue, Aug 30, 2011 at 10:01 AM, Richard D. Moores > wrote: >> >> On Tue, Aug 30, 2011 at 08:51, Olivier Delalleau wrote: >> > >> > 2011/8/30 Richard D. Moores >> >> >> >> Is it possible to install 32-bit >> >> Python 3.2 on 64-bit Win 7 (you seem to have done so), so I could use >> >> numpy? >> >> >> > >> > Yes you can insteall Python 32 bit on 64 bit Windows. >> >> Thanks. Would doing so leave my 64-bit Python 3.2 intact, so I could >> switch to the 32-bit only to install and use numpy? >> > > You might want to try the win64 packages here. > > Chuck Thanks Chuck! I downloaded numpy-unoptimized-1.6.1.win-amd64-py3.2.exe. numpy is now installed for 64-bit Python 3.21 But what are the implications of "unoptimized"? Python 3.2.1 (default, Jul 10 2011, 20:02:51) [MSC v.1500 64 bit (AMD64)] Type "help", "copyright", "credits" or "license" for more information. >>> import numpy; help(numpy) Help on package numpy: I copy and pasted this to an RTF file dedicated to the numpy help. It has 86,363 lines! Wow! Dick From charlesr.harris at gmail.com Tue Aug 30 12:43:51 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 30 Aug 2011 10:43:51 -0600 Subject: [Numpy-discussion] numpy-1.6.1.win32-py3.2.exe (md5) won't install In-Reply-To: References: Message-ID: On Tue, Aug 30, 2011 at 10:27 AM, Richard D. Moores wrote: > On Tue, Aug 30, 2011 at 09:09, Charles R Harris > wrote: > > > > > > On Tue, Aug 30, 2011 at 10:01 AM, Richard D. Moores > > wrote: > >> > >> On Tue, Aug 30, 2011 at 08:51, Olivier Delalleau wrote: > >> > > >> > 2011/8/30 Richard D. Moores > >> >> > >> >> Is it possible to install 32-bit > >> >> Python 3.2 on 64-bit Win 7 (you seem to have done so), so I could use > >> >> numpy? > >> >> > >> > > >> > Yes you can insteall Python 32 bit on 64 bit Windows. > >> > >> Thanks. 
Would doing so leave my 64-bit Python 3.2 intact, so I could > >> switch to the 32-bit only to install and use numpy? > >> > > > > You might want to try the win64 packages here. > > > > Chuck > > Thanks Chuck! I downloaded > numpy-unoptimized-1.6.1.win-amd64-py3.2.exe. numpy is now installed > for 64-bit Python 3.21 > > But what are the implications of "unoptimized"? > > Array operations will be slower. The optimized versions will be faster because they are linked to the highly optimized and tuned Intel MKL library rather than the fallback code included in numpy. If you have a lot of big arrays the speed difference will be significant. For small arrays call overhead tends to dominate and there isn't that much difference. You might want to download ipython and matplotlib also so that you have the basic numpy stack. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdmoores at gmail.com Tue Aug 30 13:02:17 2011 From: rdmoores at gmail.com (Richard D. Moores) Date: Tue, 30 Aug 2011 10:02:17 -0700 Subject: [Numpy-discussion] numpy-1.6.1.win32-py3.2.exe (md5) won't install In-Reply-To: References: Message-ID: On Tue, Aug 30, 2011 at 09:43, Charles R Harris wrote: > You might want to download ipython and matplotlib also so that you have the > basic numpy stack. Good idea. I got matplotlib, but ipython for Python 3x isn't on http://www.lfd.uci.edu/~gohlke/pythonlibs/ . Dick From charlesr.harris at gmail.com Tue Aug 30 13:22:03 2011 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 30 Aug 2011 11:22:03 -0600 Subject: [Numpy-discussion] numpy-1.6.1.win32-py3.2.exe (md5) won't install In-Reply-To: References: Message-ID: On Tue, Aug 30, 2011 at 11:02 AM, Richard D. Moores wrote: > On Tue, Aug 30, 2011 at 09:43, Charles R Harris > wrote: > > > You might want to download ipython and matplotlib also so that you have > the > > basic numpy stack. > > Good idea. I got matplotlib, but ipython for Python 3x isn't on > http://www.lfd.uci.edu/~gohlke/pythonlibs/ . > Looks like python 3 support is still experimental: http://wiki.ipython.org/Python_3. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From rdmoores at gmail.com Tue Aug 30 14:00:29 2011 From: rdmoores at gmail.com (Richard D. Moores) Date: Tue, 30 Aug 2011 11:00:29 -0700 Subject: [Numpy-discussion] numpy-1.6.1.win32-py3.2.exe (md5) won't install In-Reply-To: References: Message-ID: On Tue, Aug 30, 2011 at 10:22, Charles R Harris wrote: > > > On Tue, Aug 30, 2011 at 11:02 AM, Richard D. Moores > wrote: >> >> On Tue, Aug 30, 2011 at 09:43, Charles R Harris >> wrote: >> >> > You might want to download ipython and matplotlib also so that you have >> > the >> > basic numpy stack. >> >> Good idea. I got matplotlib, but ipython for Python 3x isn't on >> http://www.lfd.uci.edu/~gohlke/pythonlibs/ . > > Looks like python 3 support is still experimental: > http://wiki.ipython.org/Python_3. > > Chuck Yes. Thanks again, Chuck. 
Dick From robert.kern at gmail.com Tue Aug 30 14:17:35 2011 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 30 Aug 2011 13:17:35 -0500 Subject: [Numpy-discussion] numpy oddity In-Reply-To: References: <4E5CF4A1.7090505@gmail.com> Message-ID: On Tue, Aug 30, 2011 at 09:52, Charles R Harris wrote: > > On Tue, Aug 30, 2011 at 8:33 AM, Johann Cohen-Tanugi > wrote: >> >> I have numpy version 1.6.1 and I see the following behavior : >> >> In [380]: X >> Out[380]: 1.0476157527896641 >> >> In [381]: X.__class__ >> Out[381]: numpy.float64 >> >> In [382]: (2,3)*X >> Out[382]: (2, 3) >> >> In [383]: (2,3)/X >> Out[383]: array([ 1.90909691, ?2.86364537]) >> >> In [384]: X=float(X) >> >> In [385]: (2,3)/X >> >> --------------------------------------------------------------------------- >> TypeError ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Traceback (most recent call >> last) >> /home/cohen/ in () >> ----> 1 (2,3)/X >> >> TypeError: unsupported operand type(s) for /: 'tuple' and 'float' >> >> >> So it appears that X being a numpy float allows numpy to play some trick >> on the tuple so that division becomes possible, which regular built-in >> float does not allow arithmetics with tuples. >> But why is multiplication with "*" not following the same prescription? >> > > That's strange. > > In [16]: x = float64(2.1) > > In [17]: (2,3)*x > Out[17]: (2, 3, 2, 3) > > In [18]: (2,3)/x > Out[18]: array([ 0.95238095,? 1.42857143]) > > Note that in the first case x is treated like an integer. In the second the > tuple is turned into an array. I think both of these cases should raise > exceptions. In scalartypes.c.src: tatic PyObject * gentype_multiply(PyObject *m1, PyObject *m2) { PyObject *ret = NULL; long repeat; if (!PyArray_IsScalar(m1, Generic) && ((Py_TYPE(m1)->tp_as_number == NULL) || (Py_TYPE(m1)->tp_as_number->nb_multiply == NULL))) { /* Try to convert m2 to an int and try sequence repeat */ repeat = PyInt_AsLong(m2); if (repeat == -1 && PyErr_Occurred()) { return NULL; } ret = PySequence_Repeat(m1, (int) repeat); } else if (!PyArray_IsScalar(m2, Generic) && ((Py_TYPE(m2)->tp_as_number == NULL) || (Py_TYPE(m2)->tp_as_number->nb_multiply == NULL))) { /* Try to convert m1 to an int and try sequence repeat */ repeat = PyInt_AsLong(m1); if (repeat == -1 && PyErr_Occurred()) { return NULL; } ret = PySequence_Repeat(m2, (int) repeat); } if (ret == NULL) { PyErr_Clear(); /* no effect if not set */ ret = PyArray_Type.tp_as_number->nb_multiply(m1, m2); } return ret; } The PyInt_AsLong() calls should be changed to check for __index__ability, instead. Not sure about the other operators. Some people *may* be relying on the coerce-sequences-to-ndarray behavior with numpy scalars just like they do so with ndarrays. On the other hand, the repeat behavior with * should have thrown a monkey wrench to them if they were, so the number of people who do this is probably small. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From johann.cohentanugi at gmail.com Tue Aug 30 14:58:04 2011 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Tue, 30 Aug 2011 20:58:04 +0200 Subject: [Numpy-discussion] numpy oddity In-Reply-To: References: <4E5CF4A1.7090505@gmail.com> Message-ID: <4E5D32BC.4040307@gmail.com> I am not sure I follow : is the problem the coerce-sequences-to-ndarrays behavior, or is it the fact that it applies to division and not multiplication? 
I thought the second situation is the more problematic. Anyway, you seem to take it as a bug, should I file a ticket somewhere? thanks, johann On 08/30/2011 08:17 PM, Robert Kern wrote: > On Tue, Aug 30, 2011 at 09:52, Charles R Harris > wrote: >> On Tue, Aug 30, 2011 at 8:33 AM, Johann Cohen-Tanugi >> wrote: >>> I have numpy version 1.6.1 and I see the following behavior : >>> >>> In [380]: X >>> Out[380]: 1.0476157527896641 >>> >>> In [381]: X.__class__ >>> Out[381]: numpy.float64 >>> >>> In [382]: (2,3)*X >>> Out[382]: (2, 3) >>> >>> In [383]: (2,3)/X >>> Out[383]: array([ 1.90909691, 2.86364537]) >>> >>> In [384]: X=float(X) >>> >>> In [385]: (2,3)/X >>> >>> --------------------------------------------------------------------------- >>> TypeError Traceback (most recent call >>> last) >>> /home/cohen/ in() >>> ----> 1 (2,3)/X >>> >>> TypeError: unsupported operand type(s) for /: 'tuple' and 'float' >>> >>> >>> So it appears that X being a numpy float allows numpy to play some trick >>> on the tuple so that division becomes possible, which regular built-in >>> float does not allow arithmetics with tuples. >>> But why is multiplication with "*" not following the same prescription? >>> >> That's strange. >> >> In [16]: x = float64(2.1) >> >> In [17]: (2,3)*x >> Out[17]: (2, 3, 2, 3) >> >> In [18]: (2,3)/x >> Out[18]: array([ 0.95238095, 1.42857143]) >> >> Note that in the first case x is treated like an integer. In the second the >> tuple is turned into an array. I think both of these cases should raise >> exceptions. > In scalartypes.c.src: > > tatic PyObject * > gentype_multiply(PyObject *m1, PyObject *m2) > { > PyObject *ret = NULL; > long repeat; > > if (!PyArray_IsScalar(m1, Generic)&& > ((Py_TYPE(m1)->tp_as_number == NULL) || > (Py_TYPE(m1)->tp_as_number->nb_multiply == NULL))) { > /* Try to convert m2 to an int and try sequence repeat */ > repeat = PyInt_AsLong(m2); > if (repeat == -1&& PyErr_Occurred()) { > return NULL; > } > ret = PySequence_Repeat(m1, (int) repeat); > } > else if (!PyArray_IsScalar(m2, Generic)&& > ((Py_TYPE(m2)->tp_as_number == NULL) || > (Py_TYPE(m2)->tp_as_number->nb_multiply == NULL))) { > /* Try to convert m1 to an int and try sequence repeat */ > repeat = PyInt_AsLong(m1); > if (repeat == -1&& PyErr_Occurred()) { > return NULL; > } > ret = PySequence_Repeat(m2, (int) repeat); > } > if (ret == NULL) { > PyErr_Clear(); /* no effect if not set */ > ret = PyArray_Type.tp_as_number->nb_multiply(m1, m2); > } > return ret; > } > > The PyInt_AsLong() calls should be changed to check for > __index__ability, instead. Not sure about the other operators. Some > people *may* be relying on the coerce-sequences-to-ndarray behavior > with numpy scalars just like they do so with ndarrays. On the other > hand, the repeat behavior with * should have thrown a monkey wrench to > them if they were, so the number of people who do this is probably > small. > From robert.kern at gmail.com Tue Aug 30 15:06:34 2011 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 30 Aug 2011 14:06:34 -0500 Subject: [Numpy-discussion] numpy oddity In-Reply-To: <4E5D32BC.4040307@gmail.com> References: <4E5CF4A1.7090505@gmail.com> <4E5D32BC.4040307@gmail.com> Message-ID: On Tue, Aug 30, 2011 at 13:58, Johann Cohen-Tanugi wrote: > I am not sure I follow : is the problem the coerce-sequences-to-ndarrays > behavior, or is it the fact that it applies to division and not > multiplication? > I thought the second situation is the more problematic. 
> Anyway, you seem to take it as a bug, should I file a ticket somewhere? * is the odd one out. /+- all behave the same: they coerce the sequence to an ndarray and broadcast the operation. Whether this is desirable is debatable, but there is at least a logic to it. Charles would rather have it raise an exception. (sequence * np.integer) is an interesting case. It should probably have the "repeat" semantics. However, this makes it an exception to the coerce-to-ndarray-and-broadcast rule with the other operations. This gives weight to Charles' preference to make the other operations raise an exception. What is an unambiguous bug is the behavior of * with a *float* scalar. It should never have the "repeat" semantics, no matter what. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From johann.cohentanugi at gmail.com Tue Aug 30 15:10:10 2011 From: johann.cohentanugi at gmail.com (Johann Cohen-Tanugi) Date: Tue, 30 Aug 2011 21:10:10 +0200 Subject: [Numpy-discussion] numpy oddity In-Reply-To: References: <4E5CF4A1.7090505@gmail.com> <4E5D32BC.4040307@gmail.com> Message-ID: <4E5D3592.7090009@gmail.com> ok thanks a lot. Safe code is often better than over-smart code, so I would line up with Charles here. There is too much potential for ambiguity in expected behavior. Johann On 08/30/2011 09:06 PM, Robert Kern wrote: > On Tue, Aug 30, 2011 at 13:58, Johann Cohen-Tanugi > wrote: >> I am not sure I follow : is the problem the coerce-sequences-to-ndarrays >> behavior, or is it the fact that it applies to division and not >> multiplication? >> I thought the second situation is the more problematic. >> Anyway, you seem to take it as a bug, should I file a ticket somewhere? > * is the odd one out. /+- all behave the same: they coerce the > sequence to an ndarray and broadcast the operation. Whether this is > desirable is debatable, but there is at least a logic to it. Charles > would rather have it raise an exception. > > (sequence * np.integer) is an interesting case. It should probably > have the "repeat" semantics. However, this makes it an exception to > the coerce-to-ndarray-and-broadcast rule with the other operations. > This gives weight to Charles' preference to make the other operations > raise an exception. > > What is an unambiguous bug is the behavior of * with a *float* scalar. > It should never have the "repeat" semantics, no matter what. > From bryce.ready at gmail.com Tue Aug 30 16:34:18 2011 From: bryce.ready at gmail.com (Bryce Ready) Date: Tue, 30 Aug 2011 14:34:18 -0600 Subject: [Numpy-discussion] converting standard array to record array Message-ID: Hello all, So i'm using numpy 1.6.0, and trying to convert a (4,4) numpy array of dtype 'f8' into a record array of this dtype: dt = np.dtype([('mat','(4,4)f8')]) > Here is the code snippet: In [21]: a = np.random.randn(4,4) > > In [22]: a.view(dt) > and the resulting error: ValueError: new type not compatible with array. > Can anyone shed some light for me on why this conversion is not possible? It is certainly technically possible, since the memory layout of the two arrays should be the same. Can anyone recommend a better way to do this conversion? Thanks in advance! -Bryce Ready -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mjanikas at esri.com Tue Aug 30 18:48:18 2011 From: mjanikas at esri.com (Mark Janikas) Date: Tue, 30 Aug 2011 15:48:18 -0700 Subject: [Numpy-discussion] Question on LinAlg Inverse Algorithm Message-ID: Hello All, Last week I posted a question involving the identification of linear dependent columns of a matrix... but now I am finding an interesting result based on the linalg.inv() function... sometime I am able to invert a matrix that has linear dependent columns and other times I get the LinAlgError()... this suggests that there is some kind of random component to the INV method. Is this normal? Thanks much ahead of time, MJ Mark Janikas Product Developer ESRI, Geoprocessing 380 New York St. Redlands, CA 92373 909-793-2853 (2563) mjanikas at esri.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Aug 30 18:54:43 2011 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 30 Aug 2011 17:54:43 -0500 Subject: [Numpy-discussion] Question on LinAlg Inverse Algorithm In-Reply-To: References: Message-ID: On Tue, Aug 30, 2011 at 17:48, Mark Janikas wrote: > Hello All, > > Last week I posted a question involving the identification of linear > dependent columns of a matrix? but now I am finding an interesting result > based on the linalg.inv() function? sometime I am able to invert a matrix > that has linear dependent columns and other times I get the LinAlgError()? > this suggests that there is some kind of random component to the INV > method.? Is this normal?? Thanks much ahead of time, With exactly the same input in the same process? Can you provide that input? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From cjordan1 at uw.edu Tue Aug 30 18:56:55 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Tue, 30 Aug 2011 17:56:55 -0500 Subject: [Numpy-discussion] Question on LinAlg Inverse Algorithm In-Reply-To: References: Message-ID: Can you give an example matrix? I'm not a numerical linear algebra expert, but I suspect that if your matrix is singular (or nearly so, in floating point) then any inverse given will look pretty wonky. Huge determinant, eigenvalues, operator norm, etc.. -Chris JS On Tue, Aug 30, 2011 at 5:48 PM, Mark Janikas wrote: > Hello All, > > > > Last week I posted a question involving the identification of linear > dependent columns of a matrix? but now I am finding an interesting result > based on the linalg.inv() function? sometime I am able to invert a matrix > that has linear dependent columns and other times I get the LinAlgError()? > this suggests that there is some kind of random component to the INV > method.? Is this normal?? Thanks much ahead of time, > > > > MJ > > > > Mark Janikas > > Product Developer > > ESRI, Geoprocessing > > 380 New York St. > > Redlands, CA 92373 > > 909-793-2853 (2563) > > mjanikas at esri.com > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From mjanikas at esri.com Tue Aug 30 19:01:55 2011 From: mjanikas at esri.com (Mark Janikas) Date: Tue, 30 Aug 2011 16:01:55 -0700 Subject: [Numpy-discussion] Question on LinAlg Inverse Algorithm In-Reply-To: References: Message-ID: Working on it... Give me a few minutes to get you the data. TY! 
MJ -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Christopher Jordan-Squire Sent: Tuesday, August 30, 2011 3:57 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm Can you give an example matrix? I'm not a numerical linear algebra expert, but I suspect that if your matrix is singular (or nearly so, in floating point) then any inverse given will look pretty wonky. Huge determinant, eigenvalues, operator norm, etc.. -Chris JS On Tue, Aug 30, 2011 at 5:48 PM, Mark Janikas wrote: > Hello All, > > > > Last week I posted a question involving the identification of linear > dependent columns of a matrix. but now I am finding an interesting result > based on the linalg.inv() function. sometime I am able to invert a matrix > that has linear dependent columns and other times I get the LinAlgError(). > this suggests that there is some kind of random component to the INV > method.? Is this normal?? Thanks much ahead of time, > > > > MJ > > > > Mark Janikas > > Product Developer > > ESRI, Geoprocessing > > 380 New York St. > > Redlands, CA 92373 > > 909-793-2853 (2563) > > mjanikas at esri.com > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From mjanikas at esri.com Tue Aug 30 19:34:10 2011 From: mjanikas at esri.com (Mark Janikas) Date: Tue, 30 Aug 2011 16:34:10 -0700 Subject: [Numpy-discussion] Question on LinAlg Inverse Algorithm In-Reply-To: References: Message-ID: When I export to ascii I am losing precision and it getting consistency... I will try a flat dump. More to come. TY MJ -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Mark Janikas Sent: Tuesday, August 30, 2011 4:02 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm Working on it... Give me a few minutes to get you the data. TY! MJ -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Christopher Jordan-Squire Sent: Tuesday, August 30, 2011 3:57 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm Can you give an example matrix? I'm not a numerical linear algebra expert, but I suspect that if your matrix is singular (or nearly so, in floating point) then any inverse given will look pretty wonky. Huge determinant, eigenvalues, operator norm, etc.. -Chris JS On Tue, Aug 30, 2011 at 5:48 PM, Mark Janikas wrote: > Hello All, > > > > Last week I posted a question involving the identification of linear > dependent columns of a matrix. but now I am finding an interesting result > based on the linalg.inv() function. sometime I am able to invert a matrix > that has linear dependent columns and other times I get the LinAlgError(). > this suggests that there is some kind of random component to the INV > method.? Is this normal?? Thanks much ahead of time, > > > > MJ > > > > Mark Janikas > > Product Developer > > ESRI, Geoprocessing > > 380 New York St. 
> > Redlands, CA 92373 > > 909-793-2853 (2563) > > mjanikas at esri.com > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Tue Aug 30 19:37:22 2011 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 30 Aug 2011 18:37:22 -0500 Subject: [Numpy-discussion] Question on LinAlg Inverse Algorithm In-Reply-To: References: Message-ID: On Tue, Aug 30, 2011 at 18:34, Mark Janikas wrote: > When I export to ascii I am losing precision and it getting consistency... I will try a flat dump. ?More to come. ?TY Might as well np.save() it to an .npy binary file and attach it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From robert.kern at gmail.com Tue Aug 30 19:42:26 2011 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 30 Aug 2011 18:42:26 -0500 Subject: [Numpy-discussion] Question on LinAlg Inverse Algorithm In-Reply-To: References: Message-ID: On Tue, Aug 30, 2011 at 17:48, Mark Janikas wrote: > Hello All, > > Last week I posted a question involving the identification of linear > dependent columns of a matrix? but now I am finding an interesting result > based on the linalg.inv() function? sometime I am able to invert a matrix > that has linear dependent columns and other times I get the LinAlgError()? > this suggests that there is some kind of random component to the INV > method.? Is this normal?? Thanks much ahead of time, We will also need to know the platform that you are on as well as the LAPACK library that you linked numpy against. It is the behavior of that LAPACK library that is controlling here. Standard LAPACK does sometimes use pseudorandom numbers in certain situations, but AFAICT it deterministically seeds the PRNG on every call, and I don't think it does this for any subroutine involved with inversion. But if you use an optimized LAPACK from some vendor, I don't know what they may be doing. Some optimized LAPACK/BLAS libraries may be threaded and may dynamically determine how to break up the problem based on load (I don't know of any that specifically do this, but it's a possibility). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From mjanikas at esri.com Tue Aug 30 20:38:59 2011 From: mjanikas at esri.com (Mark Janikas) Date: Tue, 30 Aug 2011 17:38:59 -0700 Subject: [Numpy-discussion] Question on LinAlg Inverse Algorithm In-Reply-To: References: Message-ID: OK... so I have been using checksums to compare and it looks like I am getting a different value when it fails as opposed to when it passes... I.e. the input is NOT the same. When I save them to npy files and run LA.inv() I get consistent results. Now I have to track down in my code why the inputs are different.... Sucks, because I keep having to dive deeper (more checksums... yeh!). 
But it is all linear algebra from the same input, so kinda weird that there is a diversion. Thanks for all of your help! And Ill post again when I find the culprit. (probably me :-)) MJ -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Robert Kern Sent: Tuesday, August 30, 2011 4:42 PM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm On Tue, Aug 30, 2011 at 17:48, Mark Janikas wrote: > Hello All, > > Last week I posted a question involving the identification of linear > dependent columns of a matrix? but now I am finding an interesting result > based on the linalg.inv() function? sometime I am able to invert a matrix > that has linear dependent columns and other times I get the LinAlgError()? > this suggests that there is some kind of random component to the INV > method.? Is this normal?? Thanks much ahead of time, We will also need to know the platform that you are on as well as the LAPACK library that you linked numpy against. It is the behavior of that LAPACK library that is controlling here. Standard LAPACK does sometimes use pseudorandom numbers in certain situations, but AFAICT it deterministically seeds the PRNG on every call, and I don't think it does this for any subroutine involved with inversion. But if you use an optimized LAPACK from some vendor, I don't know what they may be doing. Some optimized LAPACK/BLAS libraries may be threaded and may dynamically determine how to break up the problem based on load (I don't know of any that specifically do this, but it's a possibility). -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From thomas.robitaille at gmail.com Tue Aug 30 23:34:52 2011 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Tue, 30 Aug 2011 23:34:52 -0400 Subject: [Numpy-discussion] Issue with dtype and nx1 arrays Message-ID: Hello, Is the following behavior normal? In [1]: import numpy as np In [2]: np.dtype([('a',' Hi, this is probably my lack of understanding...when i set up some masks for 2 arrays and try to divide one by the other I get a runtime warning. Seemingly this is when I am asking python to divide one nan by the other, however I thought by masking the array numpy would then know to ignore these nans? For example import numpy as np a = np.array([4.5, 6.7, 8.0, 9.0, 0.00001]) b = np.array([0.0001, 6.7, 8.0, 9.0, 0.00001]) a = np.ma.where(np.logical_or(a<0.01, b<0.01), np.nan, a) b = np.ma.where(np.logical_or(a<0.01, b<0.01), np.nan, b) a/b will produce ?./numpy/ma/core.py:772: RuntimeWarning: invalid value encountered in absolute return umath.absolute(a) * self.tolerance >= umath.absolute(b) but of course give the correct result masked_array(data = [-- 1.0 1.0 1.0 --], mask = [ True False False False True], fill_value = 1e+20) But what is the correct way to do this array division such that I don't produce the warning? The only way I can see that you can do it is a bit convoluted and involves empty the array of the masked values, e.g. 
a = a[np.isnan(a) == False] b = b[np.isnan(b) == False] a/b thanks, Martin -- View this message in context: http://old.nabble.com/nan-division-warnings-tp32369310p32369310.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From warren.weckesser at enthought.com Tue Aug 30 23:42:27 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Tue, 30 Aug 2011 22:42:27 -0500 Subject: [Numpy-discussion] Issue with dtype and nx1 arrays In-Reply-To: References: Message-ID: On Tue, Aug 30, 2011 at 10:34 PM, Thomas Robitaille < thomas.robitaille at gmail.com> wrote: > Hello, > > Is the following behavior normal? > > In [1]: import numpy as np > > In [2]: np.dtype([('a',' Out[2]: dtype([('a', ' > In [3]: np.dtype([('a',' Out[3]: dtype([('a', ' > I.e. in the second case, the second dimension of the dtype (1) is > being ignored? Is there a way to avoid this? > Use a tuple to specify the dimension: In [11]: dtype([('a', ' > Thanks, > Thomas > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Aug 30 23:47:58 2011 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 30 Aug 2011 22:47:58 -0500 Subject: [Numpy-discussion] nan division warnings In-Reply-To: <32369310.post@talk.nabble.com> References: <32369310.post@talk.nabble.com> Message-ID: On Tue, Aug 30, 2011 at 22:39, mdekauwe wrote: > > Hi, > > this is probably my lack of understanding...when i set up some masks for 2 > arrays and try to divide one by the other I get a runtime warning. Seemingly > this is when I am asking python to divide one nan by the other, however I > thought by masking the array numpy would then know to ignore these nans? For > example > > import numpy as np > a = np.array([4.5, 6.7, 8.0, 9.0, 0.00001]) > b = np.array([0.0001, 6.7, 8.0, 9.0, 0.00001]) > a = np.ma.where(np.logical_or(a<0.01, b<0.01), np.nan, a) > b = np.ma.where(np.logical_or(a<0.01, b<0.01), np.nan, b) > a/b > > will produce > > ?./numpy/ma/core.py:772: RuntimeWarning: invalid value encountered in > absolute > return umath.absolute(a) * self.tolerance >= umath.absolute(b) > > but of course give the correct result > > masked_array(data = [-- 1.0 1.0 1.0 --], > ? ? ? ? ? ? mask = [ True False False False ?True], > ? ? ? fill_value = 1e+20) > > But what is the correct way to do this array division such that I don't > produce the warning? Just don't put NaNs in. [~] |10> a = np.array([4.5, 6.7, 8.0, 9.0, 0.00001]) [~] |11> b = np.array([0.0001, 6.7, 8.0, 9.0, 0.00001]) [~] |12> mask = (a < 0.01) | (b < 0.01) [~] |13> ma = np.ma.masked_array(a, mask=mask) [~] |14> mb = np.ma.masked_array(b, mask=mask) [~] |15> ma / mb masked_array(data = [-- 1.0 1.0 1.0 --], mask = [ True False False False True], fill_value = 1e+20) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From josef.pktd at gmail.com Tue Aug 30 23:49:54 2011 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 30 Aug 2011 23:49:54 -0400 Subject: [Numpy-discussion] converting standard array to record array In-Reply-To: References: Message-ID: On Tue, Aug 30, 2011 at 4:34 PM, Bryce Ready wrote: > Hello all, > > So i'm using numpy 1.6.0, and trying to convert a (4,4) numpy array of dtype > 'f8' into a record array of this dtype: > >> dt = np.dtype([('mat','(4,4)f8')]) > > Here is the code snippet: > >> In [21]: a = np.random.randn(4,4) >> >> In [22]: a.view(dt) > > and the resulting error: > >> ValueError: new type not compatible with array. > > Can anyone shed some light for me on why this conversion is not possible? > It is certainly technically possible, since the memory layout of the two > arrays should be the same. > > Can anyone recommend a better way to do this conversion? I guess it can only convert rows, each row needs the memory size of the dt >>> np.random.randn(4,4).ravel().view(dt).shape (1,) >>> np.random.randn(2,4,4).reshape(-1,16).view(dt) array([[ ([[1.7107996212005496, 0.64334162481360346, -2.1589367225479004, 1.2302260107072134], [0.90703092017458831, -1.0297890301610224, -0.095086304368665275, 0.35407366904038495], [-1.1083969421298907, 0.83307347286837752, 0.39886399402076494, 0.26313136034262563], [0.81860729029038914, -1.1443047382313905, 0.73326737255810859, 0.34482475392499168]],)], [ ([[0.69027418489768777, 0.25867753263599164, 1.0320990807184023, 0.21836691513066409], [0.45913017094388614, -0.89570247025515981, 0.76452726059163534, -2.2953009964941642], [0.60248580944596275, 1.0863090037733505, -0.10849220482850662, -0.19176089514256078], [-1.0700600508627109, -1.4743316703511105, 0.79193567523155062, 0.82243321942810521]],)]], dtype=[('mat', ' > Thanks in advance! > > -Bryce Ready > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From mdekauwe at gmail.com Wed Aug 31 01:00:30 2011 From: mdekauwe at gmail.com (mdekauwe) Date: Tue, 30 Aug 2011 22:00:30 -0700 (PDT) Subject: [Numpy-discussion] nan division warnings In-Reply-To: References: <32369310.post@talk.nabble.com> Message-ID: <32369517.post@talk.nabble.com> Perfect that works how I envisaged, I am an idiot, I clearly overcomplicated my solution. thanks. -- View this message in context: http://old.nabble.com/nan-division-warnings-tp32369310p32369517.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From nadavh at visionsense.com Wed Aug 31 01:37:19 2011 From: nadavh at visionsense.com (Nadav Horesh) Date: Tue, 30 Aug 2011 22:37:19 -0700 Subject: [Numpy-discussion] Wrong treatment of byte-order. Message-ID: <26FC23E7C398A64083C980D16001012D261844C0BC@VA3DIAXVS361.RED001.local> Hi, This is my second post on this problem I found in numpy 1.6.1, and recently it cam up in the latest git version (2.0.0.dev-f3e70d9). The problem is numpy treats the native byte order ('<') as illegal while the wrong one ('>') as the right one. The output of the attached script (bult for python 2.6 + ) is given below (my system is a 64 bit linux on core i7. 
64 bit python 2.7.2/3.2 , numpy uses ATLAS): $ python test_byte_order.py a = [[ 0.28596132 0.31658824 0.34929676] [ 0.48739246 0.68020533 0.39616588] [ 0.29310406 0.9584545 0.8120068 ]] a1 = [[ 0.28596132 0.31658824 0.34929676] [ 0.48739246 0.68020533 0.39616588] [ 0.29310406 0.9584545 0.8120068 ]] (Wrong byte order on Intel CPUs): a2 = [[ 8.97948198e-017 1.73406416e-025 -4.25909057e+014] [ 4.59443694e+090 7.91693101e-029 5.26959329e-135] [ 2.93240450e+060 -2.25898860e-051 -2.06126917e+302]] Invert a: OK Invert a2 (Wrong byte order!): OK invert a1: Traceback (most recent call last): File "test_byte_order.py", line 20, in b1 = N.linalg.inv(a1) File "/usr/lib64/python2.7/site-packages/numpy/linalg/linalg.py", line 445, in inv return wrap(solve(a, identity(a.shape[0], dtype=a.dtype))) File "/usr/lib64/python2.7/site-packages/numpy/linalg/linalg.py", line 326, in solve results = lapack_routine(n_eq, n_rhs, a, n_eq, pivots, b, n_eq, 0) lapack_lite.LapackError: Parameter a has non-native byte order in lapack_lite.dgesv -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: test_byte_order.py URL: From pav at iki.fi Wed Aug 31 04:59:44 2011 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 31 Aug 2011 08:59:44 +0000 (UTC) Subject: [Numpy-discussion] Question on LinAlg Inverse Algorithm References: Message-ID: On Tue, 30 Aug 2011 15:48:18 -0700, Mark Janikas wrote: > Last week I posted a question involving the identification of linear > dependent columns of a matrix... but now I am finding an interesting > result based on the linalg.inv() function... sometime I am able to > invert a matrix that has linear dependent columns and other times I get > the LinAlgError()... this suggests that there is some kind of random > component to the INV method. Is this normal? I suspect that this is a case of floating-point rounding errors. Floating-point arithmetic is inexact, so even if a certain matrix is singular in exact arithmetic, for a computer it may still be invertible (by a given algorithm). This type of things are not unusual in floating-point computations. The matrix condition number (`np.linalg.cond`) is a better measure of whether a matrix is invertible or not. -- Pauli Virtanen From marquett at iap.fr Wed Aug 31 06:20:01 2011 From: marquett at iap.fr (Jean-Baptiste Marquette) Date: Wed, 31 Aug 2011 12:20:01 +0200 Subject: [Numpy-discussion] A question about dtype syntax In-Reply-To: <6AB4E3BA-C9B9-4D99-A470-259CD81589A7@gmail.com> References: <6AB4E3BA-C9B9-4D99-A470-259CD81589A7@gmail.com> Message-ID: <8701E943-4F16-435B-9EA9-B39639E2754C@iap.fr> Hi Pierre, Thanks for the guess. Unfortunately, I got the same error: [('bs3000k.cat', 280.60341, -7.09118, 9480, 0.2057, 0.14)] Traceback (most recent call last): File "/Users/marquett/workspace/Distort/src/StatsSep.py", line 40, in StatsAll = np.array(np.asarray(Stats), dtype=('a15, f8, f8, i4, f8, f8')) ValueError: could not convert string to float: bs3000k.cat The code is Stats = [(CatBase, round(stats.mean(Data.Ra), 5), round(stats.mean(Data.Dec), 5), len(Sep), round(stats.mean(Sep),4), round(stats.stdev(Sep),4),)] print Stats if First: StatsAll = np.array(np.asarray(Stats), dtype=('a15, f8, f8, i4, f8, f8')) First = False else: StatsAll = np.vstack((StatsAll, np.asarray(Stats))) print len(StatsAll) I tried various syntaxes, without success. 
Cheers JB > > On Aug 30, 2011, at 10:46 AM, Marquette Jean-Baptiste wrote: > >> Hi all, >> >> I have this piece of code: >> >> Stats = [CatBase, round(stats.mean(Data.Ra), 5), round(stats.mean(Data.Dec), 5), len(Sep), round(stats.mean(Sep),4), round(stats.stdev(Sep),4)] >> print Stats >> if First: >> StatsAll = np.array(np.asarray(Stats), dtype=('a11, f8, f8, i4, f8, f8')) >> First = False >> else: >> StatsAll = np.vstack((StatsAll, np.asarray(Stats))) >> print len(StatsAll) >> >> This yields the error: >> >> ['bs3000k.cat', 280.60341, -7.09118, 9480, 0.2057, 0.14] >> Traceback (most recent call last): >> File "/Users/marquett/workspace/Distort/src/StatsSep.py", line 40, in >> StatsAll = np.array(np.asarray(Stats), dtype=('a11, f8, f8, i4, f8, f8')) >> ValueError: could not convert string to float: bs3000k.cat >> >> What's wrong ? > > My guess: > Stats is a list of 5 elements, but you want a list of 1 5-element tuple to match the type. > >> Stats = [(CatBase, round(stats.mean(Data.Ra), 5), round(stats.mean(Data.Dec), 5), len(Sep), round(stats.mean(Sep),4), round(stats.stdev(Sep),4),)] > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Wed Aug 31 06:42:21 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 31 Aug 2011 12:42:21 +0200 Subject: [Numpy-discussion] A question about dtype syntax In-Reply-To: <8701E943-4F16-435B-9EA9-B39639E2754C@iap.fr> References: <6AB4E3BA-C9B9-4D99-A470-259CD81589A7@gmail.com> <8701E943-4F16-435B-9EA9-B39639E2754C@iap.fr> Message-ID: <64CA5B4C-E000-4372-85FD-35767715DE3B@gmail.com> On Aug 31, 2011, at 12:20 PM, Jean-Baptiste Marquette wrote: > > Hi Pierre, > > Thanks for the guess. Unfortunately, I got the same error: > > [('bs3000k.cat', 280.60341, -7.09118, 9480, 0.2057, 0.14)] > Traceback (most recent call last): > File "/Users/marquett/workspace/Distort/src/StatsSep.py", line 40, in > StatsAll = np.array(np.asarray(Stats), dtype=('a15, f8, f8, i4, f8, f8')) > ValueError: could not convert string to float: bs3000k.cat Of course, silly me Your line 40 is actually >>> StatsAll = np.array(np.asarray(Stats), dtype=('a15, f8, f8, i4, f8, f8')) With np.asarray(Stats), you're trying to load Stats as an array using a dtype of float by default. Of course, np.asarray is choking on the first element. So, try to use instead >>> StatsAll = np.array(Stats, dtype=('a15, f8, f8, i4, f8, f8')) From dieter at uellue.de Wed Aug 31 06:58:50 2011 From: dieter at uellue.de (Dieter Weber) Date: Wed, 31 Aug 2011 12:58:50 +0200 Subject: [Numpy-discussion] Numpy performance boost Message-ID: <1314788330.2418.13.camel@media> Hi, just wanted to show an example of how python3 + numpy compares with just python3 and many other languages and language implementations: http://shootout.alioth.debian.org/u64q/performance.php?test=mandelbrot#about The python3 program using numpy is #6 and you find it with the "interesting alternative" programs on the bottom because it was disqualified for doing things differently. It is 6.3x slower than the fastest program and well ahead of all other interpreted languages. Thanks to all contributors for making numpy such a great piece of software! 
Greetings, Dieter From davide.lasagna at polito.it Wed Aug 31 09:30:37 2011 From: davide.lasagna at polito.it (Davide) Date: Wed, 31 Aug 2011 15:30:37 +0200 Subject: [Numpy-discussion] Model Predictive Control package Message-ID: <4E5E377D.8050109@polito.it> Dear List, Does anybody knows if there is a python package for simulating LTI dynamic systems controlled with a model predictive controller? I am writing some code which does the job, but the math is not super-easy and i would not like to reinvent the wheel and loose to much time. I will soon publish such codes somewhere, i.e Github, so anyone interested can pick it up. Cheers, Davide From marquett at iap.fr Wed Aug 31 09:40:55 2011 From: marquett at iap.fr (Jean-Baptiste Marquette) Date: Wed, 31 Aug 2011 15:40:55 +0200 Subject: [Numpy-discussion] A question about dtype syntax In-Reply-To: <64CA5B4C-E000-4372-85FD-35767715DE3B@gmail.com> References: <6AB4E3BA-C9B9-4D99-A470-259CD81589A7@gmail.com> <8701E943-4F16-435B-9EA9-B39639E2754C@iap.fr> <64CA5B4C-E000-4372-85FD-35767715DE3B@gmail.com> Message-ID: Hi Pierre, Bingo ! That works. I finally coded like: Stats = [(CatBase, round(stats.mean(Data.Ra), 5), round(stats.mean(Data.Dec), 5), len(Sep), round(stats.mean(Sep),4), round(stats.stdev(Sep),4),)] StatArray = np.array(Stats, dtype=([('Catalog', 'a15'), ('RaMean', 'f8'), ('DecMean', 'f8'), ('NStars', 'i4'), ('RMS', 'f8'), ('StdDev', 'f8')])) print StatArray if First: StatsAll = StatArray First = False else: StatsAll = np.vstack((StatsAll, StatArray)) My next problem deals with the writing of data to a file. I use the command: np.savetxt(Table, StatsAll, delimiter=' ', fmt=['%15s %.5f %.5f %5d %.4f %.4f']) which yields: Traceback (most recent call last): File "/Users/marquett/workspace/Distort/src/StatsSep.py", line 44, in np.savetxt(Table, StatsAll, delimiter=' ', fmt=['%15s %.5f %.5f %5d %.4f %.4f']) File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/npyio.py", line 979, in savetxt fh.write(asbytes(format % tuple(row) + newline)) TypeError: not enough arguments for format string I struggled with various unsuccessful fmt syntaxes, and the numpy doc is very discrete about that topic: fmt : string or sequence of strings A single format (%10.5f), a sequence of formats But I don't find this valid sequence nor an example... Cheers JB > > On Aug 31, 2011, at 12:20 PM, Jean-Baptiste Marquette wrote: > >> >> Hi Pierre, >> >> Thanks for the guess. Unfortunately, I got the same error: >> >> [('bs3000k.cat', 280.60341, -7.09118, 9480, 0.2057, 0.14)] >> Traceback (most recent call last): >> File "/Users/marquett/workspace/Distort/src/StatsSep.py", line 40, in >> StatsAll = np.array(np.asarray(Stats), dtype=('a15, f8, f8, i4, f8, f8')) >> ValueError: could not convert string to float: bs3000k.cat > > Of course, silly me > > Your line 40 is actually >>>> StatsAll = np.array(np.asarray(Stats), dtype=('a15, f8, f8, i4, f8, f8')) > > With np.asarray(Stats), you're trying to load Stats as an array using a dtype of float by default. Of course, np.asarray is choking on the first element. > > So, try to use instead >>>> StatsAll = np.array(Stats, dtype=('a15, f8, f8, i4, f8, f8')) > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From pgmdevlist at gmail.com Wed Aug 31 10:02:06 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 31 Aug 2011 16:02:06 +0200 Subject: [Numpy-discussion] A question about dtype syntax In-Reply-To: References: <6AB4E3BA-C9B9-4D99-A470-259CD81589A7@gmail.com> <8701E943-4F16-435B-9EA9-B39639E2754C@iap.fr> <64CA5B4C-E000-4372-85FD-35767715DE3B@gmail.com> Message-ID: <44057DAD-05DF-4513-A7D1-A80238702C21@gmail.com> On Aug 31, 2011, at 3:40 PM, Jean-Baptiste Marquette wrote: > Traceback (most recent call last): > File "/Users/marquett/workspace/Distort/src/StatsSep.py", line 44, in > np.savetxt(Table, StatsAll, delimiter=' ', fmt=['%15s %.5f %.5f %5d %.4f %.4f']) > File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/npyio.py", line 979, in savetxt > fh.write(asbytes(format % tuple(row) + newline)) > TypeError: not enough arguments for format string Without knowing StatsAll, it ain't easy? From the exception message, we could expect that one of rows is empty or as less than the 6 elements required by your format string. If you're using IPython, switch to debugger mode (pdb), then inspect row and format to find out the content of the offending line. > I struggled with various unsuccessful fmt syntaxes, and the numpy doc is very discrete about that topic: > > fmt : string or sequence of strings > > A single format (%10.5f), a sequence of formats Looks clear enough to me? But yes, a comment in the code shows that " `fmt` can be a string with multiple insertion points or a list of formats. E.g. '%10.5f\t%10d' or ('%10.5f', '$10d')" (so we should probably update the doc to this regard) From marquett at iap.fr Wed Aug 31 10:24:10 2011 From: marquett at iap.fr (Jean-Baptiste Marquette) Date: Wed, 31 Aug 2011 16:24:10 +0200 Subject: [Numpy-discussion] A question about dtype syntax In-Reply-To: <44057DAD-05DF-4513-A7D1-A80238702C21@gmail.com> References: <6AB4E3BA-C9B9-4D99-A470-259CD81589A7@gmail.com> <8701E943-4F16-435B-9EA9-B39639E2754C@iap.fr> <64CA5B4C-E000-4372-85FD-35767715DE3B@gmail.com> <44057DAD-05DF-4513-A7D1-A80238702C21@gmail.com> Message-ID: <725FB731-F708-4B56-B084-4C8E1F34CDF1@iap.fr> Hi Pierre, > > On Aug 31, 2011, at 3:40 PM, Jean-Baptiste Marquette wrote: >> Traceback (most recent call last): >> File "/Users/marquett/workspace/Distort/src/StatsSep.py", line 44, in >> np.savetxt(Table, StatsAll, delimiter=' ', fmt=['%15s %.5f %.5f %5d %.4f %.4f']) >> File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/npyio.py", line 979, in savetxt >> fh.write(asbytes(format % tuple(row) + newline)) >> TypeError: not enough arguments for format string > > Without knowing StatsAll, it ain't easy? From the exception message, we could expect that one of rows is empty or as less than the 6 elements required by your format string. > If you're using IPython, switch to debugger mode (pdb), then inspect row and format to find out the content of the offending line. 
Here is a (short) sample of StatsAll: [[('bs3000k.cat', 280.60341, -7.09118, 9480, 0.2057, 0.14)] [('bs3000l.cat', 280.61389, -7.24097, 11490, 0.1923, 0.0747)] [('bs3000m.cat', 280.77074, -7.08237, 13989, 0.2289, 0.1009)] [('bs3000n.cat', 280.77228, -7.23563, 15811, 0.1767, 0.1327)] [('bs3001k.cat', 280.95383, -7.10004, 7402, 0.2539, 0.0777)] [('bs3001l.cat', 280.95495, -7.23409, 13840, 0.1463, 0.1008)] [('bs3001m.cat', 281.1172, -7.08094, 9608, 0.2311, 0.1458)] [('bs3001n.cat', 281.12447, -7.23398, 14030, 0.2538, 0.1022)] [('bs3002k.cat', 280.62533, -7.47818, 593, 0.0291, 0.0237)] [('bs3002l.cat', 280.61508, -7.60359, 9122, 0.0518, 0.0205)] [('bs3002m.cat', 280.77209, -7.46262, 1510, 0.0415, 0.0302)] [('bs3002n.cat', 280.77578, -7.60117, 14177, 0.0807, 0.0327)] [('bs3003k.cat', 280.96463, -7.42967, 13506, 0.0305, 0.0225)] [('bs3003l.cat', 280.95638, -7.58462, 17903, 0.0458, 0.0298)] [('bs3003m.cat', 281.12729, -7.42516, 15676, 0.0879, 0.0446)] [('bs3003n.cat', 281.1354, -7.58497, 16015, 0.0685, 0.0376)] [('bs3004k.cat', 280.61148, -7.78976, 14794, 0.079, 0.0473)] [('bs3004l.cat', 280.61791, -7.94186, 15455, 0.0818, 0.0727)] [('bs3004m.cat', 280.78388, -7.78834, 14986, 0.0966, 0.0313)] [('bs3004n.cat', 280.78261, -7.93932, 18713, 0.0925, 0.0472)] [('bs3005k.cat', 280.9659, -7.78816, 14906, 0.0456, 0.022)] [('bs3005l.cat', 280.96811, -7.93894, 19744, 0.021, 0.0218)] [('bs3005m.cat', 281.1344, -7.78035, 15943, 0.0687, 0.0203)] [('bs3005n.cat', 281.13915, -7.93027, 18183, 0.1173, 0.0695)] [('bs3006k.cat', 280.61294, -8.14201, 13309, 0.143, 0.065)] [('bs3006l.cat', 280.65109, -8.29416, 405, 0.258, 0.1147)] [('bs3006m.cat', 280.78767, -8.13916, 14527, 0.1106, 0.0568)] [('bs3006n.cat', 280.80935, -8.28823, 818, 0.2382, 0.0764)] [('bs3007k.cat', 280.96614, -8.1401, 13251, 0.0946, 0.0415)] [('bs3007l.cat', 280.97158, -8.23797, 5807, 0.1758, 0.0636)] [('bs3007m.cat', 281.14129, -8.13799, 13886, 0.1524, 0.0517)] [('bs3007n.cat', 281.15309, -8.2476, 214, 0.1584, 0.0648)]] > >> I struggled with various unsuccessful fmt syntaxes, and the numpy doc is very discrete about that topic: >> >> fmt : string or sequence of strings >> >> A single format (%10.5f), a sequence of formats > > Looks clear enough to me? But yes, a comment in the code shows that " `fmt` can be a string with multiple insertion points or a list of formats. E.g. '%10.5f\t%10d' or ('%10.5f', '$10d')" (so we should probably update the doc to this regard) The command with parentheses: np.savetxt(Table, StatsAll, delimiter=' ', fmt=('%15s %.5f %.5f %5d %.4f %.4f')) fails as well, but with a different error: Traceback (most recent call last): File "/Users/marquett/workspace/Distort/src/StatsSep.py", line 44, in np.savetxt(Table, StatsAll, delimiter=' ', fmt=('%15s %.5f %.5f %5d %.4f %.4f')) File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/npyio.py", line 974, in savetxt % fmt) AttributeError: fmt has wrong number of % formats. %15s %.5f %.5f %5d %.4f %.4f Plus, this one: np.savetxt(Table, StatsAll, delimiter=' ', fmt=('%15s', '%.5f', '%.5f', '%5d', '%.4f', '%.4f')) yields: Traceback (most recent call last): File "/Users/marquett/workspace/Distort/src/StatsSep.py", line 44, in np.savetxt(Table, StatsAll, delimiter=' ', fmt=('%15s', '%.5f', '%.5f', '%5d', '%.4f', '%.4f')) File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/npyio.py", line 966, in savetxt raise AttributeError('fmt has wrong shape. %s' % str(fmt)) AttributeError: fmt has wrong shape. 
('%15s', '%.5f', '%.5f', '%5d', '%.4f', '%.4f') Quite puzzling... Should I switch to the I/O of asciitable package ? Anyway, thanks again for your help. JB -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Wed Aug 31 10:33:26 2011 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 31 Aug 2011 16:33:26 +0200 Subject: [Numpy-discussion] A question about dtype syntax In-Reply-To: <725FB731-F708-4B56-B084-4C8E1F34CDF1@iap.fr> References: <6AB4E3BA-C9B9-4D99-A470-259CD81589A7@gmail.com> <8701E943-4F16-435B-9EA9-B39639E2754C@iap.fr> <64CA5B4C-E000-4372-85FD-35767715DE3B@gmail.com> <44057DAD-05DF-4513-A7D1-A80238702C21@gmail.com> <725FB731-F708-4B56-B084-4C8E1F34CDF1@iap.fr> Message-ID: <57A95D9F-1AB2-4188-8970-12716CC82F2B@gmail.com> On Aug 31, 2011, at 4:24 PM, Jean-Baptiste Marquette wrote: > > Hi Pierre, > >> >> On Aug 31, 2011, at 3:40 PM, Jean-Baptiste Marquette wrote: >>> Traceback (most recent call last): >>> File "/Users/marquett/workspace/Distort/src/StatsSep.py", line 44, in >>> np.savetxt(Table, StatsAll, delimiter=' ', fmt=['%15s %.5f %.5f %5d %.4f %.4f']) >>> File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/npyio.py", line 979, in savetxt >>> fh.write(asbytes(format % tuple(row) + newline)) >>> TypeError: not enough arguments for format string >> >> Without knowing StatsAll, it ain't easy? From the exception message, we could expect that one of rows is empty or as less than the 6 elements required by your format string. >> If you're using IPython, switch to debugger mode (pdb), then inspect row and format to find out the content of the offending line. > > Here is a (short) sample of StatsAll: > > [[('bs3000k.cat', 280.60341, -7.09118, 9480, 0.2057, 0.14)] Have you tried the debugger as I suggested ? There must be a line somewhere that doesn't follow the format (the first one?) >> >>> I struggled with various unsuccessful fmt syntaxes, and the numpy doc is very discrete about that topic: >>> >>> fmt : string or sequence of strings >>> >>> A single format (%10.5f), a sequence of formats >> >> Looks clear enough to me? But yes, a comment in the code shows that " `fmt` can be a string with multiple insertion points or a list of formats. E.g. '%10.5f\t%10d' or ('%10.5f', '$10d')" (so we should probably update the doc to this regard) > > The command with parentheses: > > np.savetxt(Table, StatsAll, delimiter=' ', fmt=('%15s %.5f %.5f %5d %.4f %.4f')) > > fails as well, but with a different error: Well, either you use 1 string fmt="%15s %.5f %.5f %5d %.4f %.4f" or you use a list of strings fmt=("%15s", "%.5f", "%.5f", "%5d", "%.4f", "%.4f") > > Plus, this one: > > np.savetxt(Table, StatsAll, delimiter=' ', fmt=('%15s', '%.5f', '%.5f', '%5d', '%.4f', '%.4f')) > > yields: > > Traceback (most recent call last): > File "/Users/marquett/workspace/Distort/src/StatsSep.py", line 44, in > np.savetxt(Table, StatsAll, delimiter=' ', fmt=('%15s', '%.5f', '%.5f', '%5d', '%.4f', '%.4f')) > File "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/npyio.py", line 966, in savetxt > raise AttributeError('fmt has wrong shape. %s' % str(fmt)) > AttributeError: fmt has wrong shape. ('%15s', '%.5f', '%.5f', '%5d', '%.4f', '%.4f') try fmt=[('%15s', '%.5f', '%.5f', '%5d', '%.4f', '%.4f')] > Quite puzzling... > Should I switch to the I/O of asciitable package ? As you wish. 
The easiest might be to write the file yourself.: >>> fmt = "%15s %.5f %.5f %5d %.4f %.4f\n" >>> f=open(Table,'r') >>> for line in StatsAll: >>> f.write(fmt % line) >>> f.close() or something like that From warren.weckesser at enthought.com Wed Aug 31 10:49:45 2011 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Wed, 31 Aug 2011 09:49:45 -0500 Subject: [Numpy-discussion] A question about dtype syntax In-Reply-To: <725FB731-F708-4B56-B084-4C8E1F34CDF1@iap.fr> References: <6AB4E3BA-C9B9-4D99-A470-259CD81589A7@gmail.com> <8701E943-4F16-435B-9EA9-B39639E2754C@iap.fr> <64CA5B4C-E000-4372-85FD-35767715DE3B@gmail.com> <44057DAD-05DF-4513-A7D1-A80238702C21@gmail.com> <725FB731-F708-4B56-B084-4C8E1F34CDF1@iap.fr> Message-ID: On Wed, Aug 31, 2011 at 9:24 AM, Jean-Baptiste Marquette wrote: > > Hi Pierre, > > > On Aug 31, 2011, at 3:40 PM, Jean-Baptiste Marquette wrote: > > Traceback (most recent call last): > > File "/Users/marquett/workspace/Distort/src/StatsSep.py", line 44, in > > > np.savetxt(Table, StatsAll, delimiter=' ', fmt=['%15s %.5f %.5f %5d %.4f > %.4f']) > > File > "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/npyio.py", > line 979, in savetxt > > fh.write(asbytes(format % tuple(row) + newline)) > > TypeError: not enough arguments for format string > > > Without knowing StatsAll, it ain't easy? From the exception message, we > could expect that one of rows is empty or as less than the 6 elements > required by your format string. > If you're using IPython, switch to debugger mode (pdb), then inspect row > and format to find out the content of the offending line. > > > Here is a (short) sample of StatsAll: > > [[('bs3000k.cat', 280.60341, -7.09118, 9480, 0.2057, 0.14)] > [('bs3000l.cat', 280.61389, -7.24097, 11490, 0.1923, 0.0747)] > [('bs3000m.cat', 280.77074, -7.08237, 13989, 0.2289, 0.1009)] > [('bs3000n.cat', 280.77228, -7.23563, 15811, 0.1767, 0.1327)] > [('bs3001k.cat', 280.95383, -7.10004, 7402, 0.2539, 0.0777)] > [('bs3001l.cat', 280.95495, -7.23409, 13840, 0.1463, 0.1008)] > [('bs3001m.cat', 281.1172, -7.08094, 9608, 0.2311, 0.1458)] > [('bs3001n.cat', 281.12447, -7.23398, 14030, 0.2538, 0.1022)] > [('bs3002k.cat', 280.62533, -7.47818, 593, 0.0291, 0.0237)] > [('bs3002l.cat', 280.61508, -7.60359, 9122, 0.0518, 0.0205)] > [('bs3002m.cat', 280.77209, -7.46262, 1510, 0.0415, 0.0302)] > [('bs3002n.cat', 280.77578, -7.60117, 14177, 0.0807, 0.0327)] > [('bs3003k.cat', 280.96463, -7.42967, 13506, 0.0305, 0.0225)] > [('bs3003l.cat', 280.95638, -7.58462, 17903, 0.0458, 0.0298)] > [('bs3003m.cat', 281.12729, -7.42516, 15676, 0.0879, 0.0446)] > [('bs3003n.cat', 281.1354, -7.58497, 16015, 0.0685, 0.0376)] > [('bs3004k.cat', 280.61148, -7.78976, 14794, 0.079, 0.0473)] > [('bs3004l.cat', 280.61791, -7.94186, 15455, 0.0818, 0.0727)] > [('bs3004m.cat', 280.78388, -7.78834, 14986, 0.0966, 0.0313)] > [('bs3004n.cat', 280.78261, -7.93932, 18713, 0.0925, 0.0472)] > [('bs3005k.cat', 280.9659, -7.78816, 14906, 0.0456, 0.022)] > [('bs3005l.cat', 280.96811, -7.93894, 19744, 0.021, 0.0218)] > [('bs3005m.cat', 281.1344, -7.78035, 15943, 0.0687, 0.0203)] > [('bs3005n.cat', 281.13915, -7.93027, 18183, 0.1173, 0.0695)] > [('bs3006k.cat', 280.61294, -8.14201, 13309, 0.143, 0.065)] > [('bs3006l.cat', 280.65109, -8.29416, 405, 0.258, 0.1147)] > [('bs3006m.cat', 280.78767, -8.13916, 14527, 0.1106, 0.0568)] > [('bs3006n.cat', 280.80935, -8.28823, 818, 0.2382, 0.0764)] > [('bs3007k.cat', 280.96614, -8.1401, 13251, 0.0946, 0.0415)] > [('bs3007l.cat', 
280.97158, -8.23797, 5807, 0.1758, 0.0636)] > [('bs3007m.cat', 281.14129, -8.13799, 13886, 0.1524, 0.0517)] > [('bs3007n.cat', 281.15309, -8.2476, 214, 0.1584, 0.0648)]] > > Notice that your array is actually a 2D structured array with shape (n, 1). Try reshaping it to (n,) or apply np.squeeze before calling savetxt. Warren > > I struggled with various unsuccessful fmt syntaxes, and the numpy doc is > very discrete about that topic: > > > fmt : string or sequence of strings > > > A single format (%10.5f), a sequence of formats > > > Looks clear enough to me? But yes, a comment in the code shows that " > `fmt` can be a string with multiple insertion points or a list of formats. > E.g. '%10.5f\t%10d' or ('%10.5f', '$10d')" (so we should probably update > the doc to this regard) > > > The command with parentheses: > > np.savetxt(Table, StatsAll, delimiter=' ', fmt=('%15s %.5f %.5f > %5d %.4f %.4f')) > > fails as well, but with a different error: > > Traceback (most recent call last): > File "/Users/marquett/workspace/Distort/src/StatsSep.py", line 44, in > > np.savetxt(Table, StatsAll, delimiter=' ', fmt=('%15s %.5f %.5f %5d > %.4f %.4f')) > File > "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/npyio.py", > line 974, in savetxt > % fmt) > AttributeError: fmt has wrong number of % formats. %15s %.5f %.5f %5d %.4f > %.4f > > Plus, this one: > > np.savetxt(Table, StatsAll, delimiter=' ', fmt=('%15s', '%.5f', > '%.5f', '%5d', '%.4f', '%.4f')) > > yields: > > Traceback (most recent call last): > File "/Users/marquett/workspace/Distort/src/StatsSep.py", line 44, in > > np.savetxt(Table, StatsAll, delimiter=' ', fmt=('%15s', '%.5f', '%.5f', > '%5d', '%.4f', '%.4f')) > File > "/Library/Frameworks/EPD64.framework/Versions/7.1/lib/python2.7/site-packages/numpy/lib/npyio.py", > line 966, in savetxt > raise AttributeError('fmt has wrong shape. %s' % str(fmt)) > AttributeError: fmt has wrong shape. ('%15s', '%.5f', '%.5f', '%5d', > '%.4f', '%.4f') > > Quite puzzling... > Should I switch to the I/O of asciitable package ? > Anyway, thanks again for your help. > JB > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From igouy2 at yahoo.com Wed Aug 31 12:01:27 2011 From: igouy2 at yahoo.com (Isaac Gouy) Date: Wed, 31 Aug 2011 09:01:27 -0700 (PDT) Subject: [Numpy-discussion] Numpy performance boost In-Reply-To: <1314788330.2418.13.camel@media> References: <1314788330.2418.13.camel@media> Message-ID: <4eaad006-f5be-4ede-9584-ad0559debf35@p25g2000pri.googlegroups.com> Dieter, thank you for contributing a numpy mandelbrot program - but no thanks for your "disqualified for doing things differently" comment here. The benchmarks game has been showing a spectral-norm program based on numpy as an "interesting alternative" for the last couple of years - http://shootout.alioth.debian.org/u64q/program.php?test=spectralnorm&lang=python3&id=2 - simply because I thought numpy was interesting and wanted somehow to include a numpy program without taking on the chore of dealing with a whole bunch of numpy programs. The relevant point isn't that your numpy program is shown as "an interesting alternative". The relevant point is that your numpy program is shown at all. 
best wishes, Isaac On Aug 31, 3:58?am, Dieter Weber wrote: > Hi, > just wanted to show an example of how python3 + numpy compares with just > python3 and many other languages and language implementations:http://shootout.alioth.debian.org/u64q/performance.php?test=mandelbro... > > The python3 program using numpy is #6 and you find it with the > "interesting alternative" programs on the bottom because it was > disqualified for doing things differently. It is 6.3x slower than the > fastest program and well ahead of all other interpreted languages. > > Thanks to all contributors for making numpy such a great piece of > software! > > Greetings, > Dieter From Chris.Barker at noaa.gov Wed Aug 31 12:08:23 2011 From: Chris.Barker at noaa.gov (Chris.Barker) Date: Wed, 31 Aug 2011 09:08:23 -0700 Subject: [Numpy-discussion] Numpy performance boost In-Reply-To: <1314788330.2418.13.camel@media> References: <1314788330.2418.13.camel@media> Message-ID: <4E5E5C77.1090307@noaa.gov> On 8/31/11 3:58 AM, Dieter Weber wrote: > just wanted to show an example of how python3 + numpy compares with just > python3 and many other languages and language implementations: > http://shootout.alioth.debian.org/u64q/performance.php?test=mandelbrot#about hmmm - it would be interesting to see what PyPy does with this. Also Cython -- can you call that another language? Done right, it should be in the C ballpark. > The python3 program using numpy is #6 and you find it with the > "interesting alternative" programs on the bottom because it was > disqualified for doing things differently. I'm not sure what they mean by "differently" -- but if it's because numpy is not a standard part of the language -- who cares. It's too bad, though -- a lot of people do discount numpy for that reason, but as far as I'm concerned, doing numerics without numpy is like doing text processing without the string class (type?). Python would be essentially useless if string were implemented as lists or tuples of characters, and everything had to loop through them. So why isn't an ndarray considered a first class citizen in python? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From igouy2 at yahoo.com Wed Aug 31 12:53:12 2011 From: igouy2 at yahoo.com (Isaac Gouy) Date: Wed, 31 Aug 2011 09:53:12 -0700 (PDT) Subject: [Numpy-discussion] Numpy performance boost References: <1314788330.2418.13.camel@media> <4E5E5C77.1090307@noaa.gov> Message-ID: <1314809592.67472.YahooMailNeo@web65615.mail.ac4.yahoo.com> ----- Original Message ----- > From: Chris.Barker > To: numpy-discussion at scipy.org > Cc: > Sent: Wednesday, August 31, 2011 9:08 AM > Subject: Re: [Numpy-discussion] Numpy performance boost > > On 8/31/11 3:58 AM, Dieter Weber wrote: >>? just wanted to show an example of how python3 + numpy compares with just >>? python3 and many other languages and language implementations: >> > http://shootout.alioth.debian.org/u64q/performance.php?test=mandelbrot#about > > hmmm - it would be interesting to see what PyPy does with this. So do it! http://shootout.alioth.debian.org/help.php#languagex Here's the nightly snapshot with source code for all the programs - https://alioth.debian.org/frs/?group_id=30402 Have fun. 
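[Editorial aside, not part of the archived thread: for readers wondering what the "python3 + numpy" mandelbrot approach under discussion looks like in practice, below is a minimal vectorized sketch. It is not the benchmark submission itself; the grid bounds, resolution and iteration count are arbitrary choices made only for illustration.]

import numpy as np

def mandelbrot(width=200, height=200, max_iter=50):
    # Complex grid covering the usual view of the set.
    re = np.linspace(-2.0, 0.5, width)
    im = np.linspace(-1.25, 1.25, height)
    c = re[np.newaxis, :] + 1j * im[:, np.newaxis]
    z = np.zeros_like(c)
    inside = np.ones(c.shape, dtype=bool)   # points that have not escaped yet
    for _ in range(max_iter):
        z[inside] = z[inside] ** 2 + c[inside]
        inside &= np.abs(z) <= 2.0
    return inside

The whole-array updates keep the inner loop in compiled code rather than in the Python interpreter, which is what Dieter's timings illustrate.
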
From mjanikas at esri.com Wed Aug 31 13:56:28 2011 From: mjanikas at esri.com (Mark Janikas) Date: Wed, 31 Aug 2011 10:56:28 -0700 Subject: [Numpy-discussion] Question on LinAlg Inverse Algorithm In-Reply-To: References: Message-ID: Right indeed... I have spent a lot of time looking at this and it seems a waste of time as the results are garbage anyways when the columns are collinear. I am just going to set a threshold, check the condition number, continue is satisfied, return error/warning if not.... now, what is too large?.... Ill poke around. TY! MJ -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Pauli Virtanen Sent: Wednesday, August 31, 2011 2:00 AM To: numpy-discussion at scipy.org Subject: Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm On Tue, 30 Aug 2011 15:48:18 -0700, Mark Janikas wrote: > Last week I posted a question involving the identification of linear > dependent columns of a matrix... but now I am finding an interesting > result based on the linalg.inv() function... sometime I am able to > invert a matrix that has linear dependent columns and other times I get > the LinAlgError()... this suggests that there is some kind of random > component to the INV method. Is this normal? I suspect that this is a case of floating-point rounding errors. Floating-point arithmetic is inexact, so even if a certain matrix is singular in exact arithmetic, for a computer it may still be invertible (by a given algorithm). This type of things are not unusual in floating-point computations. The matrix condition number (`np.linalg.cond`) is a better measure of whether a matrix is invertible or not. -- Pauli Virtanen _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From bsouthey at gmail.com Wed Aug 31 14:10:56 2011 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 31 Aug 2011 13:10:56 -0500 Subject: [Numpy-discussion] Question on LinAlg Inverse Algorithm In-Reply-To: References: Message-ID: <4E5E7930.6060209@gmail.com> On 08/31/2011 12:56 PM, Mark Janikas wrote: > Right indeed... I have spent a lot of time looking at this and it seems a waste of time as the results are garbage anyways when the columns are collinear. I am just going to set a threshold, check the condition number, continue is satisfied, return error/warning if not.... now, what is too large?.... Ill poke around. TY! > > MJ The results are not 'garbage' as if you have collinear columns as these have very well-known and understandable meaning. But if you don't expect this then you really need to examine how you are modeling or measuring your data because that is where the problem lies. For example, if you are measuring two variables then it means that those measurements are not independent as you are assuming. Bruce > -----Original Message----- > From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Pauli Virtanen > Sent: Wednesday, August 31, 2011 2:00 AM > To: numpy-discussion at scipy.org > Subject: Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm > > On Tue, 30 Aug 2011 15:48:18 -0700, Mark Janikas wrote: >> Last week I posted a question involving the identification of linear >> dependent columns of a matrix... but now I am finding an interesting >> result based on the linalg.inv() function... 
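[Editorial sketch, not part of Bruce's message: the guard Mark describes -- compute the condition number first and only invert if it is acceptable -- might look like the following, assuming a square design matrix x. The 1e12 cutoff is an arbitrary illustrative value; the thread itself leaves "how large is too large" open.]

import numpy as np

def guarded_inv(x, cond_threshold=1e12):
    # cond_threshold is an illustrative cutoff, not a value recommended
    # anywhere in this thread; pick it to suit the application.
    cond = np.linalg.cond(x)
    if not np.isfinite(cond) or cond > cond_threshold:
        raise np.linalg.LinAlgError(
            "matrix looks singular or ill conditioned (cond=%g)" % cond)
    return np.linalg.inv(x)

Exact or near linear dependence among columns also shows up as a rank deficiency, e.g. np.linalg.matrix_rank(x) < min(x.shape), which can be reported to the user before any inversion is attempted.
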
sometime I am able to >> invert a matrix that has linear dependent columns and other times I get >> the LinAlgError()... this suggests that there is some kind of random >> component to the INV method. Is this normal? > I suspect that this is a case of floating-point rounding errors. > Floating-point arithmetic is inexact, so even if a certain matrix > is singular in exact arithmetic, for a computer it may still be > invertible (by a given algorithm). This type of things are not > unusual in floating-point computations. > > The matrix condition number (`np.linalg.cond`) is a better measure > of whether a matrix is invertible or not. > From mjanikas at esri.com Wed Aug 31 14:32:19 2011 From: mjanikas at esri.com (Mark Janikas) Date: Wed, 31 Aug 2011 11:32:19 -0700 Subject: [Numpy-discussion] Question on LinAlg Inverse Algorithm In-Reply-To: <4E5E7930.6060209@gmail.com> References: <4E5E7930.6060209@gmail.com> Message-ID: When I say garbage, I mean in the context of my hypothesis testing when in the presence of perfect multicollinearity. I advise the user of the combination that leads to the problem and move on.... -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Bruce Southey Sent: Wednesday, August 31, 2011 11:11 AM To: numpy-discussion at scipy.org Subject: Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm On 08/31/2011 12:56 PM, Mark Janikas wrote: > Right indeed... I have spent a lot of time looking at this and it seems a waste of time as the results are garbage anyways when the columns are collinear. I am just going to set a threshold, check the condition number, continue is satisfied, return error/warning if not.... now, what is too large?.... Ill poke around. TY! > > MJ The results are not 'garbage' as if you have collinear columns as these have very well-known and understandable meaning. But if you don't expect this then you really need to examine how you are modeling or measuring your data because that is where the problem lies. For example, if you are measuring two variables then it means that those measurements are not independent as you are assuming. Bruce > -----Original Message----- > From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Pauli Virtanen > Sent: Wednesday, August 31, 2011 2:00 AM > To: numpy-discussion at scipy.org > Subject: Re: [Numpy-discussion] Question on LinAlg Inverse Algorithm > > On Tue, 30 Aug 2011 15:48:18 -0700, Mark Janikas wrote: >> Last week I posted a question involving the identification of linear >> dependent columns of a matrix... but now I am finding an interesting >> result based on the linalg.inv() function... sometime I am able to >> invert a matrix that has linear dependent columns and other times I get >> the LinAlgError()... this suggests that there is some kind of random >> component to the INV method. Is this normal? > I suspect that this is a case of floating-point rounding errors. > Floating-point arithmetic is inexact, so even if a certain matrix > is singular in exact arithmetic, for a computer it may still be > invertible (by a given algorithm). This type of things are not > unusual in floating-point computations. > > The matrix condition number (`np.linalg.cond`) is a better measure > of whether a matrix is invertible or not. 
> _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From cjordan1 at uw.edu Wed Aug 31 14:51:13 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 31 Aug 2011 13:51:13 -0500 Subject: [Numpy-discussion] non-uniform discrete sampling with given probabilities (w/ and w/o replacement) Message-ID: In numpy, is there a way of generating a random integer in a specified range where the integers in that range have given probabilities? So, for example, generating a random integer between 1 and 3 with probabilities [0.1, 0.2, 0.7] for the three integers? I'd like to know how to do this without replacement, as well. If the probabilities are uniform, there are a number of ways, including just shuffling the data and taking the first however-many elements of the shuffle. But this doesn't apply with non-uniform probabilities. Similarly, one could try arbitrary-sampling-method X (such as inverse-cdf sampling) and then rejecting repeats. But that is clearly sub-optimal if the number of samples desired is near the same order of magnitude as the total population, or if the probabilities are very skewed. (E.g. a weighted sample of size 2 without replacement from [0,1,2] with probabilities [0.999,.00005, 0.00005] will take a long time if you just sample repeatedly until you have two distinct samples.) I know parts of what I want can be done in scipy.statistics using a discrete_rv or with the python standard library's random package. I would much prefer to do it only using numpy because the eventual application shouldn't have a scipy dependency and should use the same random seed as numpy.random. (For more background, what I want is to create a function like sample in R, where I can give it an array-like of doo-hickeys and another array-like of probabilities associated with each doo-hickey, and then generate a random sample of doo-hickeys with those probabilities. One step for that is generating ints, to use as indices, with the same probabilities. I'd like a version of this to be in numpy/scipy, but it doesn't really belong in scipy since it doesn't -Chris JS From shish at keba.be Wed Aug 31 15:07:01 2011 From: shish at keba.be (Olivier Delalleau) Date: Wed, 31 Aug 2011 15:07:01 -0400 Subject: [Numpy-discussion] non-uniform discrete sampling with given probabilities (w/ and w/o replacement) In-Reply-To: References: Message-ID: You can use: 1 + numpy.argmax(numpy.random.multinomial(1, [0.1, 0.2, 0.7])) For your "real" application you'll probably want to use a value >1 for the first parameter (equal to your sample size), instead of calling it multiple times. -=- Olivier 2011/8/31 Christopher Jordan-Squire > In numpy, is there a way of generating a random integer in a specified > range where the integers in that range have given probabilities? So, > for example, generating a random integer between 1 and 3 with > probabilities [0.1, 0.2, 0.7] for the three integers? > > I'd like to know how to do this without replacement, as well. If the > probabilities are uniform, there are a number of ways, including just > shuffling the data and taking the first however-many elements of the > shuffle. But this doesn't apply with non-uniform probabilities. > Similarly, one could try arbitrary-sampling-method X (such as > inverse-cdf sampling) and then rejecting repeats. 
But that is clearly > sub-optimal if the number of samples desired is near the same order of > magnitude as the total population, or if the probabilities are very > skewed. (E.g. a weighted sample of size 2 without replacement from > [0,1,2] with probabilities [0.999,.00005, 0.00005] will take a long > time if you just sample repeatedly until you have two distinct > samples.) > > I know parts of what I want can be done in scipy.statistics using a > discrete_rv or with the python standard library's random package. I > would much prefer to do it only using numpy because the eventual > application shouldn't have a scipy dependency and should use the same > random seed as numpy.random. > > (For more background, what I want is to create a function like sample > in R, where I can give it an array-like of doo-hickeys and another > array-like of probabilities associated with each doo-hickey, and then > generate a random sample of doo-hickeys with those probabilities. One > step for that is generating ints, to use as indices, with the same > probabilities. I'd like a version of this to be in numpy/scipy, but it > doesn't really belong in scipy since it doesn't > > -Chris JS > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjordan1 at uw.edu Wed Aug 31 15:17:04 2011 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Wed, 31 Aug 2011 14:17:04 -0500 Subject: [Numpy-discussion] non-uniform discrete sampling with given probabilities (w/ and w/o replacement) In-Reply-To: References: Message-ID: On Wed, Aug 31, 2011 at 2:07 PM, Olivier Delalleau wrote: > You can use: > 1 + numpy.argmax(numpy.random.multinomial(1, [0.1, 0.2, 0.7])) > > For your "real" application you'll probably want to use a value >1 for the > first parameter (equal to your sample size), instead of calling it multiple > times. > > -=- Olivier Thanks. Warren (Weckesser) mentioned this possibility to me yesterday and I forgot to put it in my post. I assume you mean something like x = np.arange(3) y = np.random.multinomial(30, [0.1,0.2,0.7]) z = np.repeat(x, y) np.random.shuffle(z) That look right? -Chris JS > > 2011/8/31 Christopher Jordan-Squire >> >> In numpy, is there a way of generating a random integer in a specified >> range where the integers in that range have given probabilities? So, >> for example, generating a random integer between 1 and 3 with >> probabilities [0.1, 0.2, 0.7] for the three integers? >> >> I'd like to know how to do this without replacement, as well. If the >> probabilities are uniform, there are a number of ways, including just >> shuffling the data and taking the first however-many elements of the >> shuffle. But this doesn't apply with non-uniform probabilities. >> Similarly, one could try arbitrary-sampling-method X (such as >> inverse-cdf sampling) and then rejecting repeats. But that is clearly >> sub-optimal if the number of samples desired is near the same order of >> magnitude as the total population, or if the probabilities are very >> skewed. (E.g. a weighted sample of size 2 without replacement from >> [0,1,2] with probabilities [0.999,.00005, 0.00005] will take a long >> time if you just sample repeatedly until you have two distinct >> samples.) >> >> I know parts of what I want can be done in scipy.statistics using a >> discrete_rv or with the python standard library's random package. 
>> would much prefer to do it only using numpy because the eventual
>> application shouldn't have a scipy dependency and should use the same
>> random seed as numpy.random.
>>
>> (For more background, what I want is to create a function like sample
>> in R, where I can give it an array-like of doo-hickeys and another
>> array-like of probabilities associated with each doo-hickey, and then
>> generate a random sample of doo-hickeys with those probabilities. One
>> step for that is generating ints, to use as indices, with the same
>> probabilities. I'd like a version of this to be in numpy/scipy, but it
>> doesn't really belong in scipy since it doesn't
>>
>> -Chris JS
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>

From shish at keba.be Wed Aug 31 15:22:43 2011
From: shish at keba.be (Olivier Delalleau)
Date: Wed, 31 Aug 2011 15:22:43 -0400
Subject: [Numpy-discussion] non-uniform discrete sampling with given probabilities (w/ and w/o replacement)
In-Reply-To: 
References: 
Message-ID: 

2011/8/31 Christopher Jordan-Squire
> On Wed, Aug 31, 2011 at 2:07 PM, Olivier Delalleau wrote:
> > You can use:
> > 1 + numpy.argmax(numpy.random.multinomial(1, [0.1, 0.2, 0.7]))
> >
> > For your "real" application you'll probably want to use a value >1 for
> the
> > first parameter (equal to your sample size), instead of calling it
> multiple
> > times.
> >
> > -=- Olivier
>
> Thanks. Warren (Weckesser) mentioned this possibility to me yesterday
> and I forgot to put it in my post. I assume you mean something like
>
> x = np.arange(3)
> y = np.random.multinomial(30, [0.1,0.2,0.7])
> z = np.repeat(x, y)
> np.random.shuffle(z)
>
> That look right?
>
> -Chris JS
>
>

Yes, exactly.

-=- Olivier
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From josef.pktd at gmail.com Wed Aug 31 16:34:17 2011
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Wed, 31 Aug 2011 16:34:17 -0400
Subject: [Numpy-discussion] non-uniform discrete sampling with given probabilities (w/ and w/o replacement)
In-Reply-To: 
References: 
Message-ID: 

On Wed, Aug 31, 2011 at 3:22 PM, Olivier Delalleau wrote:
> 2011/8/31 Christopher Jordan-Squire
>>
>> On Wed, Aug 31, 2011 at 2:07 PM, Olivier Delalleau wrote:
>> > You can use:
>> > 1 + numpy.argmax(numpy.random.multinomial(1, [0.1, 0.2, 0.7]))
>> >
>> > For your "real" application you'll probably want to use a value >1 for
>> > the
>> > first parameter (equal to your sample size), instead of calling it
>> > multiple
>> > times.
>> >
>> > -=- Olivier
>>
>> Thanks. Warren (Weckesser) mentioned this possibility to me yesterday
>> and I forgot to put it in my post. I assume you mean something like
>>
>> x = np.arange(3)
>> y = np.random.multinomial(30, [0.1,0.2,0.7])
>> z = np.repeat(x, y)
>> np.random.shuffle(z)
>>
>> That look right?
>>
>> -Chris JS
>>
>
> Yes, exactly.

Chuck's answer to the same question, when I asked on the list, used
searchsorted and is fast

cdfvalues.searchsorted(np.random.random(size))

my recent version of it for FiniteLatticeDistribution

    def rvs(self, size=1):
        '''draw random variables with shape given by size

        '''
        #w = self.pdfvalues
        #p = cumsum(w)/float(w.sum())
        #p.searchsorted(np.random.random(size))
        return self.support[self.cdfvalues.searchsorted(np.random.random(size))]

Josef

>
> -=- Olivier
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>

From cjordan1 at uw.edu Wed Aug 31 16:58:08 2011
From: cjordan1 at uw.edu (Christopher Jordan-Squire)
Date: Wed, 31 Aug 2011 15:58:08 -0500
Subject: [Numpy-discussion] non-uniform discrete sampling with given probabilities (w/ and w/o replacement)
In-Reply-To: 
References: 
Message-ID: 

On Wed, Aug 31, 2011 at 3:34 PM, wrote:
> On Wed, Aug 31, 2011 at 3:22 PM, Olivier Delalleau wrote:
>> 2011/8/31 Christopher Jordan-Squire
>>>
>>> On Wed, Aug 31, 2011 at 2:07 PM, Olivier Delalleau wrote:
>>> > You can use:
>>> > 1 + numpy.argmax(numpy.random.multinomial(1, [0.1, 0.2, 0.7]))
>>> >
>>> > For your "real" application you'll probably want to use a value >1 for
>>> > the
>>> > first parameter (equal to your sample size), instead of calling it
>>> > multiple
>>> > times.
>>> >
>>> > -=- Olivier
>>>
>>> Thanks. Warren (Weckesser) mentioned this possibility to me yesterday
>>> and I forgot to put it in my post. I assume you mean something like
>>>
>>> x = np.arange(3)
>>> y = np.random.multinomial(30, [0.1,0.2,0.7])
>>> z = np.repeat(x, y)
>>> np.random.shuffle(z)
>>>
>>> That look right?
>>>
>>> -Chris JS
>>>
>>
>> Yes, exactly.
>
> Chuck's answer to the same question, when I asked on the list, used
> searchsorted and is fast
>
> cdfvalues.searchsorted(np.random.random(size))
>
> my recent version of it for FiniteLatticeDistribution
>
>     def rvs(self, size=1):
>         '''draw random variables with shape given by size
>
>         '''
>         #w = self.pdfvalues
>         #p = cumsum(w)/float(w.sum())
>         #p.searchsorted(np.random.random(size))
>         return self.support[self.cdfvalues.searchsorted(np.random.random(size))]
>
> Josef
>

That's exactly what I needed. Thanks!

-Chris JS

>
>>
>> -=- Olivier
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
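
Pulling the thread's suggestions together: below is a minimal, numpy-only sketch
of the searchsorted (inverse-CDF) approach Josef describes for weighted sampling
with replacement, plus one simple, not especially fast, way to get weighted
sampling without replacement by zeroing out the drawn weight and renormalizing
before each subsequent draw. The helper names weighted_sample and
weighted_sample_norepl are illustrative only, not numpy API, and the code is a
sketch rather than a reference implementation. (Later numpy releases, 1.7 and up,
added numpy.random.choice, which covers both cases through its p= and replace=
arguments.)

import numpy as np

def weighted_sample(values, probs, size):
    """Weighted sampling with replacement via cumulative sum + searchsorted."""
    values = np.asarray(values)
    cdf = np.cumsum(probs, dtype=float)
    cdf /= cdf[-1]                      # renormalize in case probs don't sum to exactly 1
    u = np.random.random_sample(size)   # uniform draws in [0, 1)
    return values[cdf.searchsorted(u)]

def weighted_sample_norepl(values, probs, size):
    """Sequential weighted sampling without replacement (renormalize each step)."""
    values = np.asarray(values)
    probs = np.array(probs, dtype=float)
    if size > np.count_nonzero(probs):
        raise ValueError("not enough values with nonzero probability")
    picks = []
    for _ in range(size):
        cdf = np.cumsum(probs)
        cdf /= cdf[-1]
        idx = int(cdf.searchsorted(np.random.random_sample()))
        picks.append(values[idx])
        probs[idx] = 0.0                # this value cannot be drawn again
    return np.array(picks)

# Example with the skewed probabilities from the original post
# (the cumulative sum is renormalized, so they need not sum to exactly 1):
# weighted_sample([0, 1, 2], [0.999, 0.00005, 0.00005], size=10)
# weighted_sample_norepl([0, 1, 2], [0.999, 0.00005, 0.00005], size=2)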