[Numpy-discussion] numpy.power -> numpy.random.choice Probabilities don't sum to 1

Sebastian Berg sebastian at sipsolutions.net
Sat Dec 19 11:17:22 EST 2015


On Sa, 2015-12-19 at 06:55 -0600, Andy Ray Terrel wrote:
> A simple fix would certainly by pass the check in random.choice, but I
> don't know how to get that. So let's focus on the summation.
> 
> 
> I believe you are hitting an instability in summing small numbers as a
> power to 10th order would produce. Here is an example:
> 
> 
> mymatrix = np.random.rand(1024,1024).astype('float16')*1e-7
> row_normalized = mymatrix / np.sum(mymatrix, axis=1, keepdims=True)
> 
> sums = row_normalized.sum(axis=1)
> len(sums[sums != 1]) # ->  108
> 
> 
> One can use things like Kahan summation and you will need to collect
> the size of the error and truncate all numbers in mymatrix under that
> error.  I'm not quite sure how to quickly implement such a thing in
> numpy without a loop.

In fact, the code even seems to do kahan summation, however, I think it
always assumes double precision for the p keyword argument, so as a work
around at least, you have to sum to convert to and normalize it as
double.

- Sebastian

> 
> On Fri, Dec 18, 2015 at 7:00 PM, Nathaniel Smith <njs at pobox.com>
> wrote:
>         On Fri, Dec 18, 2015 at 1:25 PM, Ryan R. Rosario
>         <ryan at bytemining.com> wrote:
>         > Hi,
>         >
>         > I have a matrix whose entries I must raise to a certain
>         power and then normalize by row. After I do that, when I pass
>         some rows to numpy.random.choice, I get a ValueError:
>         probabilities do not sum to 1.
>         >
>         > I understand that floating point is not perfect, and my
>         matrix is so large that I cannot use np.longdouble because I
>         will run out of RAM.
>         >
>         > As an example on a smaller matrix:
>         >
>         > np.power(mymatrix, 10, out=mymatrix)
>         > row_normalized = np.apply_along_axis(lambda x: x /
>         np.sum(x), 1, mymatrix)
>         
>         I'm sorry I don't have a solution to your actual problem off
>         the top
>         of my head, but it's probably helpful in general to know that
>         a better
>         way to write this would be just
>         
>           row_normalized = mymatrix / np.sum(mymatrix, axis=1,
>         keepdims=True)
>         
>         apply_along_axis is slow and can almost always be replaced by
>         a
>         broadcasting expression like this.
>         
>         > sums = row_normalized.sum(axis=1)
>         > sums[np.where(sums != 1)]
>         
>         And here you can just write
>         
>           sums[sums != 1]
>         
>         i.e. the call to where() isn't doing anything useful.
>         
>         -n
>         
>         --
>         Nathaniel J. Smith -- http://vorpus.org
>         _______________________________________________
>         NumPy-Discussion mailing list
>         NumPy-Discussion at scipy.org
>         https://mail.scipy.org/mailman/listinfo/numpy-discussion
>         
> 
> 
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20151219/982232df/attachment.sig>


More information about the NumPy-Discussion mailing list