Generate a sequence of random numbers that sum up to 1?

Robert Kern robert.kern at gmail.com
Sun Apr 23 18:31:13 EDT 2006


Alex Martelli wrote:
> fumanchu <fumanchu at amor.org> wrote:
> 
>>I'm surprised no one has pursued a course of subtraction rather than
>>division. Say you want 10 numbers:
>>
>> >>> s = 1.0
>> >>> n = []
>> >>> for x in xrange(9):
>> ...     value = random.random() * s
>> ...     n.append(value)
>> ...     s -= value
>> ...
>> >>> n.append(s)
>> >>> n
>> [0.7279111122901516, 0.082128708606867745, 0.0080516733577621798,
>>  0.12122060245902817, 0.0034460458833209676, 0.0021046234724371184,
>>  0.054109424914363845, 0.00035750970249204185, 0.00051175075536832372,
>>  0.00015854855820800087]
>> >>> sum(n)
>> 1.0
>>
>>Either:
>> 1) Just because they're *ordered* doesn't mean they're not *random*, or
>> 2) You all now know why I'm not a mathematician. ;)
>>
>>It seems to me that the only constraint on the randomness of my results
>>is the OP's constraint: that they sum to 1. I'd be fascinated to learn
>>if and why that wouldn't work.
> 
> n[0] is uniformly distributed between 0 and 1; n[1] is not -- not sure
> how to characterize its distribution, but it's vastly skewed to favor
> smaller values -- and further n[x] values for x>1 are progressively more
> and more skewed similarly.
> 
> Such total disuniformity, where the very distribution of each value is
> skewed by the preceding one, may still be "random" for some sufficiently
> vague meaning of "random", but my intuition suggests it's unlikely to
> prove satisfactory for the OP's purposes.
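
Alex's skew is easy to quantify: each draw takes, in expectation, half of
what remains, so the expected values run 1/2, 1/4, 1/8, ... down the list.
A quick empirical check (a sketch in current Python, hence range rather than
the original's xrange):

import random

def subtractive(n=10):
    # fumanchu's construction: each draw takes a uniform fraction of
    # what remains; the leftover becomes the last element.
    s = 1.0
    out = []
    for _ in range(n - 1):
        value = random.random() * s
        out.append(value)
        s -= value
    out.append(s)
    return out

trials = [subtractive() for _ in range(100000)]
means = [sum(t[k] for t in trials) / len(trials) for k in range(10)]
print(means)   # roughly [0.5, 0.25, 0.125, ...]: the positions are not
               # even identically distributed, let alone uniform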

[digression]

All of this discussion about whether the distribution of values is uniform or
not doesn't mean much until one has defined "uniformity" or, equivalently,
"distance" in the space we're talking about. In this case, we're talking about
the unit n-simplex space (SS^n), which has elements S = (s_1, s_2, ..., s_n)
where sum(S) = 1 and s_i >= 0. I favor the Aitchison distance, which compares
each component against the geometric mean g(.) of its vector:

  d(x, y) = sqrt( sum_i (ln(x_i / g(x)) - ln(y_i / g(y)))**2 )

import numpy as np

def aitchison_distance(x, y):
    """ Compute the Aitchison distance between two vectors in simplex space.
    """
    # Subtracting the mean log is the centered log-ratio (clr) transform:
    # log(x_i / g(x)), with g(x) the geometric mean of the components.
    lx = np.log(x)
    ly = np.log(y)
    lgx = np.mean(lx, axis=-1, keepdims=True)  # keepdims lets batches broadcast
    lgy = np.mean(ly, axis=-1, keepdims=True)
    diff = (lx - lgx) - (ly - lgy)
    return np.sqrt(np.sum(diff*diff, axis=-1))
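
For example (the distance depends only on the ratios between components, so
scaling either vector changes nothing):

x = np.array([0.7, 0.2, 0.1])
y = np.array([0.4, 0.4, 0.2])
print(aitchison_distance(x, y))   # ~1.02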

Note that zeros yield infinities, so the borders of the unit simplex are
infinitely far from points in the interior. Consequently, generating "uniform"
random samples from this space is as impractical as drawing "uniform" random
samples from the entire infinite real number line. It's also not very
interesting. However, one can transform SS^n into RR^(n-1) and back again, so
drawing numbers from a multivariate normal with whatever mean and covariance
you like will give you "nice" simplicial data, and quite possibly realistic
data too, depending on your model. I like using the isometric log-ratio
("ilr") transform for this.
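
For concreteness, here is one way to do that round trip (a sketch: this
particular orthonormal basis of the clr hyperplane is one standard choice,
and any other orthonormal basis would serve):

import numpy as np

def ilr_basis(n):
    # Orthonormal (n x n-1) basis of the hyperplane clr vectors live in:
    # column j has j entries of 1/j, then -1, then zeros, scaled to unit
    # length.
    V = np.zeros((n, n - 1))
    for j in range(1, n):
        V[:j, j - 1] = 1.0 / j
        V[j, j - 1] = -1.0
        V[:, j - 1] *= np.sqrt(j / (j + 1.0))
    return V

def ilr(x):
    # SS^n -> RR^(n-1): centered log-ratio, projected onto the basis.
    x = np.asarray(x, dtype=float)
    lx = np.log(x)
    clr = lx - lx.mean(axis=-1, keepdims=True)
    return clr @ ilr_basis(x.shape[-1])

def ilr_inv(z):
    # RR^(n-1) -> SS^n: undo the projection, exponentiate, renormalize.
    z = np.asarray(z, dtype=float)
    y = np.exp(z @ ilr_basis(z.shape[-1] + 1).T)
    return y / y.sum(axis=-1, keepdims=True)

rng = np.random.default_rng()
z = rng.multivariate_normal(np.zeros(3), np.eye(3), size=5)
samples = ilr_inv(z)   # five points in SS^4, each row summing to 1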

Good Google search terms: "compositional data", Aitchison

But of course, for the OP's purpose of creating synthetic Markov chain
transition tables, generating some random vectors uniformly on [0, 1)^n and
normalizing them to sum to 1 works a treat. Don't bother with anything else.
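
In code, with a hypothetical five-state chain:

import numpy as np

rng = np.random.default_rng()
n_states = 5   # arbitrary size for illustration
P = rng.random((n_states, n_states))
P /= P.sum(axis=1, keepdims=True)   # each row is now a probability vector
assert np.allclose(P.sum(axis=1), 1.0)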

-- 
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco



