[SciPy-User] rvs and broadcasting

Thu Nov 24 15:21:52 EST 2011

On Thu, Nov 24, 2011 at 2:50 PM, Friedrich Romstedt
<friedrichromstedt at gmail.com> wrote:
> 2011/11/24  <josef.pktd at gmail.com>:
>> 1) broadcast shape parameters, loc and scale, if they are arrays
>> produce rvs in that shape, and, if in this case size is not the same
>> or 1, then raise a ValueError
>> essentially
>>    lower, upper, loc, scale = np.broadcast_arrays(lower, upper, loc, scale)
>>    if (np.size(lower) > 1) and (size != (1,)) and (lower.shape != size):
>>        raise ValueError('Do you really want this? Then do it yourself.')
>>
>> 2) broadcast shape parameters, loc and scale,  for each of these
>> create random variables given by size, the return shape is essentially
>> broadcasted shape concatenated with size, for example
>>
>> assert_equal(truncnorm_rvs(lower*np.arange(4)[:,None], upper,
>>                                loc=np.arange(5), scale=1, size=(2,3)).shape,
>>                     (4, 5, 2, 3))
>>
>> this version is attached.
>>
>> Any opinions about which version should be preferred?
>>
>> (As aside, truncnorm and other distribution with parameter dependent
>> support might also have other problems than just the broadcasting of
>> shape and scale.)
>
> I don't know the context but the first solution is *a lot* more
> readable.  I'm not even interested in understanding the cryptic second
> one.

The first case is almost what numpy random does. When it allows array
parameters, then size has to correspond to the parameter shape.
size is just an nuisance parameter in this case, and is only used for
double checking. The simple fix to scipy's rvs would then just to
raise an exception if size != broadcasted shape.

The second case is not so difficult to understand: for each parameter
vector generate random variables given by size.
size is not redundant, and I have vectorized random sampling

>>> rvs = truncnorm_rvs(lower, upper, loc=np.arange(5), scale=1, size=4)
>>> rvs
array([[-0.37710312,  0.66820212,  1.00771998, -0.15534072],
       [ 0.81969069,  0.05458267,  0.57364918,  2.87949887],
       [ 3.40269956,  1.59968417,  2.56405538,  0.78657868],
       [ 2.06382416,  4.3537613 ,  5.0337632 ,  2.37801841],
       [ 4.32319984,  0.6052688 ,  3.72383011,  4.11273451]])
>>> rvs.mean(-1)
array([ 0.28586957,  1.08185535,  2.08825445,  3.45734177,  3.19125831])

>
> I don't know if all of the checks done in the ``if`` scope are really
> necessary, but I just believe you.
>
> Alternatively, since you essentially want to check if things are
> broadcastable AISI, why not just broadcasting them with a result array
> of shape ``size`` you need to generate anyway along the way,
> broadcasting all three and catching the ``ValueError: shape mismatch``
> exception?

I would still have to broadcast twice, once to get the shape of the
parameters and once to include the additional "size" dimension.

to your PS: I don't know of a numpy function that just calculates the
broadcasted shape without actually doing the broadcasting.

>
> Broadcasting is cheap since it plays stride stricks IIRC:
>
>>>> x = numpy.asarray([1, 2])
>>>> x2, y = numpy.broadcast_arrays(x, [[1, 2], [3, 4]])
>>>> x2.strides
> (0, 4)
>
> Would maybe make the algorithm more general?

I don't see where it could be more general, but there might be ways to
make the shape handling more straight forward.

size is given as a shape tuple, so I cannot use broadcast_arrays for
it without actually creating the array.

thanks,

Josef

>
> Might be there are some side effects I'm overlooking atm.  I'm not
> paying too much attention, since I already made my point that I would
> prefer the first one, and my suggestion is not crucial since you'll
> test it anyway.
>
> Friedrich
>
> P.S.: Maybe there is a numpy function around to just test
> compatibility instead of testing + broadcasting at the same time in
> direct succession?
> _______________________________________________
> SciPy-User mailing list
> SciPy-User at scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>