[Numpy-discussion] numpy.power -> numpy.random.choice Probabilities don't sum to 1
Sebastian Berg
sebastian at sipsolutions.net
Sat Dec 19 11:17:22 EST 2015
On Sa, 2015-12-19 at 06:55 -0600, Andy Ray Terrel wrote:
> A simple fix would certainly by pass the check in random.choice, but I
> don't know how to get that. So let's focus on the summation.
>
>
> I believe you are hitting an instability in summing small numbers as a
> power to 10th order would produce. Here is an example:
>
>
> mymatrix = np.random.rand(1024,1024).astype('float16')*1e-7
> row_normalized = mymatrix / np.sum(mymatrix, axis=1, keepdims=True)
>
> sums = row_normalized.sum(axis=1)
> len(sums[sums != 1]) # -> 108
>
>
> One can use things like Kahan summation and you will need to collect
> the size of the error and truncate all numbers in mymatrix under that
> error. I'm not quite sure how to quickly implement such a thing in
> numpy without a loop.
In fact, the code even seems to do kahan summation, however, I think it
always assumes double precision for the p keyword argument, so as a work
around at least, you have to sum to convert to and normalize it as
double.
- Sebastian
>
> On Fri, Dec 18, 2015 at 7:00 PM, Nathaniel Smith <njs at pobox.com>
> wrote:
> On Fri, Dec 18, 2015 at 1:25 PM, Ryan R. Rosario
> <ryan at bytemining.com> wrote:
> > Hi,
> >
> > I have a matrix whose entries I must raise to a certain
> power and then normalize by row. After I do that, when I pass
> some rows to numpy.random.choice, I get a ValueError:
> probabilities do not sum to 1.
> >
> > I understand that floating point is not perfect, and my
> matrix is so large that I cannot use np.longdouble because I
> will run out of RAM.
> >
> > As an example on a smaller matrix:
> >
> > np.power(mymatrix, 10, out=mymatrix)
> > row_normalized = np.apply_along_axis(lambda x: x /
> np.sum(x), 1, mymatrix)
>
> I'm sorry I don't have a solution to your actual problem off
> the top
> of my head, but it's probably helpful in general to know that
> a better
> way to write this would be just
>
> row_normalized = mymatrix / np.sum(mymatrix, axis=1,
> keepdims=True)
>
> apply_along_axis is slow and can almost always be replaced by
> a
> broadcasting expression like this.
>
> > sums = row_normalized.sum(axis=1)
> > sums[np.where(sums != 1)]
>
> And here you can just write
>
> sums[sums != 1]
>
> i.e. the call to where() isn't doing anything useful.
>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 819 bytes
Desc: This is a digitally signed message part
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20151219/982232df/attachment.sig>
More information about the NumPy-Discussion
mailing list