[Numpy-discussion] Pull request review #3770: Trapezoidal distribution

Jeremy Hetzel jthetzel at gmail.com
Sat Sep 21 13:55:36 EDT 2013


I've added a trapezoidal distribution to numpy.random for consideration,
pull request 3770:
https://github.com/numpy/numpy/pull/3770

Similar to the triangular distribution, the trapezoidal distribution may be
used where the underlying distribution is not known, but some knowledge of
the limits and mode exists. The trapezoidal distribution generalizes the
triangular distribution by allowing the modal values to be expressed as a
range instead of a point estimate.
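To make the generalization concrete, here is a minimal NumPy sketch of the plain trapezoidal density, which is the default case of the generalized form described next. Here a, b, c, d stand for the lower limit, the two modes and the upper limit. `trapezoid_pdf` is a hypothetical helper for illustration, not a NumPy function. Setting the two modes equal collapses the flat top to a point, and the result is the triangular density.

```python
import numpy as np


def trapezoid_pdf(x, a, b, c, d):
    """Plain trapezoidal density on [a, d] with a flat top on [b, c].

    Assumes a <= b <= c <= d and a < d.
    """
    x = np.asarray(x, dtype=float)
    # Plateau height h that makes the total area equal to 1:
    # area = h * ((d - a) + (c - b)) / 2  =>  h = 2 / (d + c - b - a)
    h = 2.0 / (d + c - b - a)
    pdf = np.zeros_like(x)
    rising = (x >= a) & (x < b)
    flat = (x >= b) & (x <= c)
    falling = (x > c) & (x <= d)
    pdf[rising] = h * (x[rising] - a) / (b - a)
    pdf[flat] = h
    pdf[falling] = h * (d - x[falling]) / (d - c)
    return pdf


# With b == c (mode1 == mode2) this is the triangular density with mode b.
x = np.linspace(0.0, 1.0, 5)
print(trapezoid_pdf(x, 0.0, 0.5, 0.5, 1.0))  # [0. 1. 2. 1. 0.]
```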

The trapezoidal distribution implemented, known as the "generalized
trapezoidal distribution," has three additional parameters: growth, decay,
and boundary ratio. Adjusting these from their default values creates
trapezoidal-like distributions with non-linear behavior. Examples can be
seen in an R vignette (
http://cran.r-project.org/web/packages/trapezoid/vignettes/trapezoid.pdf ),
as well as these papers by J.R. van Dorp and colleagues:

1) van Dorp, J. R. and Kotz, S. (2003) Generalized trapezoidal
distributions. Metrika. 58(1):85–97. Preprint available:
http://www.seas.gwu.edu/~dorpjr/Publications/JournalPapers/Metrika2003VanDorp.pdf

2) van Dorp, J. R., Rambaud, S. C., Perez, J. G., and Pleguezuelo, R. H.
(2007) An elicitation procedure for the generalized trapezoidal
distribution with a uniform central stage. Decision Analysis Journal.
4:156–166. Preprint available:
http://www.seas.gwu.edu/~dorpjr/Publications/JournalPapers/DA2007.pdf
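The generalized density defined by van Dorp and Kotz (2003) can be evaluated directly. The following is a minimal NumPy sketch, assuming strictly ordered a < b < c < d (standing for left, mode1, mode2, right). `gtsp_pdf` is a hypothetical helper, not part of NumPy or of the pull request. The normalizing constant follows from the areas of the three stages, alpha*(b-a)/m, (alpha+1)*(c-b)/2 and (d-c)/n.

```python
import numpy as np


def gtsp_pdf(x, a, b, c, d, m=2.0, n=2.0, alpha=1.0):
    """Generalized trapezoidal density (van Dorp and Kotz, 2003).

    Assumes a < b < c < d and m, n, alpha > 0; degenerate stages
    (a == b, b == c or c == d) are not handled in this sketch.
    """
    x = np.asarray(x, dtype=float)
    # Normalizing constant C(Theta): the three unnormalized stages have
    # areas alpha*(b-a)/m, (alpha+1)*(c-b)/2 and (d-c)/n.
    const = 2.0 * m * n / (
        2.0 * alpha * (b - a) * n
        + (alpha + 1.0) * (c - b) * m * n
        + 2.0 * (d - c) * m
    )
    pdf = np.zeros_like(x)
    grow = (x >= a) & (x < b)
    central = (x >= b) & (x < c)
    decay = (x >= c) & (x <= d)
    # Growth stage: power-function term that reaches alpha at x = b.
    pdf[grow] = alpha * ((x[grow] - a) / (b - a)) ** (m - 1.0)
    # Central stage: linear in x, from alpha at x = b to 1 at x = c.
    pdf[central] = (1.0 - alpha) * (x[central] - b) / (c - b) + alpha
    # Decay stage: power-function term that falls from 1 at x = c.
    pdf[decay] = ((d - x[decay]) / (d - c)) ** (n - 1.0)
    return const * pdf


# Defaults m = n = 2, alpha = 1 give the plain trapezoid, whose plateau
# height is 2 / (d + c - b - a) = 0.4 here.
print(gtsp_pdf(np.array([1.5]), 0.0, 1.0, 2.0, 4.0))  # [0.4]
```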

The docstring for the proposed numpy.random.trapezoidal() is as follows:

"""
        trapezoidal(left, mode1, mode2, right, size=None, m=2, n=2, alpha=1)

        Draw samples from the generalized trapezoidal distribution.

        The trapezoidal distribution is defined by minimum (``left``),
        lower mode (``mode1``), upper mode (``mode2``), and maximum
        (``right``) parameters. The generalized trapezoidal distribution
        adds three more parameters: the growth rate (``m``), decay rate
        (``n``), and boundary ratio (``alpha``). The generalized trapezoidal
        distribution simplifies to the trapezoidal distribution when
        ``m = n = 2`` and ``alpha = 1``. It further simplifies to a
        triangular distribution when ``mode1 == mode2``.

        Parameters
        ----------
        left : scalar
            Lower limit.
        mode1 : scalar
            Lower mode: the value where the central (modal) stage of the
            distribution begins. The value should fulfill the condition
            ``left <= mode1 <= mode2``.
        mode2 : scalar
            Upper mode: the value where the central (modal) stage of the
            distribution ends. The value should fulfill the condition
            ``mode1 <= mode2 <= right``.
        right : scalar
            Upper limit, should be larger than or equal to `mode2`.
        size : int or tuple of ints, optional
            Output shape. Default is None, in which case a single value is
            returned.
        m : scalar, optional
            Growth parameter.
        n : scalar, optional
            Decay parameter.
        alpha : scalar, optional
            Boundary ratio parameter.

        Returns
        -------
        samples : ndarray or scalar
            The returned samples all lie in the interval [left, right].

        Notes
        -----
        With ``left``, ``mode1``, ``mode2``, ``right``, ``m``, ``n``, and
        ``alpha`` parametrized as
        :math:`a, b, c, d, m, n, \\text{ and } \\alpha`, respectively, the
        probability density function for the generalized trapezoidal
        distribution is

        .. math::
            f_X(x \\mid \\Theta) = \\mathcal{C}(\\Theta) \\times
                \\begin{cases}
                    \\alpha \\left(\\frac{x - a}{b - a}\\right)^{m - 1},
                        & \\text{for } a \\leq x < b \\\\
                    (1 - \\alpha) \\left(\\frac{x - b}{c - b}\\right) + \\alpha,
                        & \\text{for } b \\leq x < c \\\\
                    \\left(\\frac{d - x}{d - c}\\right)^{n - 1},
                        & \\text{for } c \\leq x \\leq d
                \\end{cases}

        with the normalizing constant :math:`\\mathcal{C}(\\Theta)` defined as

        .. math::
            \\mathcal{C}(\\Theta) =
                \\frac{2mn}
                {2 \\alpha \\left(b - a\\right) n +
                 \\left(\\alpha + 1\\right) \\left(c - b\\right) mn +
                 2 \\left(d - c\\right) m}

        and where the parameter vector is
        :math:`\\Theta = \\{a, b, c, d, m, n, \\alpha\\}`, with
        :math:`a \\leq b \\leq c \\leq d` and :math:`m, n, \\alpha > 0`.

        Similar to the triangular distribution, the trapezoidal distribution
        may be used where the underlying distribution is not known, but some
        knowledge of the limits and mode exists. The trapezoidal distribution
        generalizes the triangular distribution by allowing the modal values
        to be expressed as a range instead of a point estimate. The growth,
        decay, and boundary ratio parameters of the generalized trapezoidal
        distribution further allow non-linear behavior to be specified.

        References
        ----------
        .. [1] van Dorp, J. R. and Kotz, S. (2003) Generalized trapezoidal
               distributions. Metrika. 58(1):85–97.
               Preprint available:
               http://www.seas.gwu.edu/~dorpjr/Publications/JournalPapers/Metrika2003VanDorp.pdf
        .. [2] van Dorp, J. R., Rambaud, S. C., Perez, J. G., and
               Pleguezuelo, R. H. (2007) An elicitation procedure for the
               generalized trapezoidal distribution with a uniform central
               stage. Decision Analysis Journal. 4:156–166.
               Preprint available:
               http://www.seas.gwu.edu/~dorpjr/Publications/JournalPapers/DA2007.pdf

        Examples
        --------
        Draw values from the distribution and plot the histogram:

        >>> import matplotlib.pyplot as plt
        >>> h = plt.hist(np.random.trapezoidal(0, 0.25, 0.75, 1, 100000),
        ...              bins=200, density=True)
        >>> plt.show()

"""

I am unsure whether NumPy encourages adding new distributions to
numpy.random or to separate modules, but I found the exercise helpful
regardless.

Thanks,
Jeremy

