[Numpy-discussion] calculating weighted majority using two 3D arrays

Thu Mar 6 23:06:57 EST 2008

On Thu, Mar 6, 2008 at 11:37 AM, Gregory, Matthew <
matt.gregory at oregonstate.edu> wrote:

> Eads, Damian wrote:
> > You may need to be a bit more specific about what you mean by
> > weighted majority. What is the range of values for values
> > and weights, specifically? This sounds a lot like pixel
> > classification where each pixel is classified with a majority
> > vote over its weights and values. Is that what you're trying to do?
> >
> > Many numpy functions (e.g. mean, max, min, sum) have an axis
> > parameter, which specifies the axis along which the statistic
> > is computed. Omitting the axis parameter causes the statistic
> > to be computed over all values in the multidimensional array.
> >
> > Suppose the 'values' array contains floating point numbers in
> > the range -1 to 1 and a larger absolute value gives a larger
> > confidence. Also suppose the weights are floating point
> > numbers between 0 and 1. The weighted majority vote for pixel
> > i,j over 10 real-valued (confidenced) votes, each vote having
> > a separate weight, is computed by
> >
> >    w_vote = numpy.sign((values[:,i,j]*weights[:,i,j]).sum())
> >
> > This can be vectorized to give a weighted majority vote for
> > each pixel by doing
> >
> >    w_vote = numpy.sign((values*weights).sum(axis=0))
> >
> > The values*weights expression gives a weighted prediction.
> > This also works if the 'values' are just predictions from the
> > set {-1, 1}, i.e. there are ten classifiers, and each one
> > predicts either -1 or 1 on each pixel.
>
> Damian, thank you for the helpful response.  I should have been a bit
> more explicit about what I meant by weighted majority.  In my case, I
> need to find a discrete value (i.e. class) that occurs most often among
> ten observations where weighting is pre-determined by an
> inverse-distance calculation.  Ignoring for a moment the
> multidimensionality issue, my values and weights arrays might look like
> this:
>
> values = array([14, 32, 12, 50, 2, 8, 19, 12, 19, 10])
> weights = array([0.5, 0.1, 0.6, 0.1, 0.8, 0.3, 0.8, 0.4, 0.9, 0.2])
>
> My function to calculate the majority looks like this:
>
> def weightedMajority(a, b):
>
>        # Put all the samples into a dictionary with weights summed for
>        # duplicate values
>        wDict = {}
>        for i in xrange(len(a)):
>                (value, weight) = (a[i], b[i])
>
>                if wDict.has_key(value):
>                        wDict[value] += weight
>                else:
>                        wDict[value] = weight
>
>        # Create arrays of the values and weights
>        values = numpy.array(wDict.keys())
>        weights = numpy.array(wDict.values())
>
>        # Find the index of the maximum summed weight
>        index = numpy.argmax(weights)
>
>        # Return the majority value
>        return values[index]
>
> In the above example:
>
> >> maj = weightedMajority(values, weights)
> >> maj
> 19
>

[SNIP]

If your values are integers in a reasonably small range, then you might want
to use an array indexed by value, rather than a dictionary, to hold the summed
weights, as it makes things simpler and likely faster. For example:

    import numpy

    def weightedMajority2(a, b):
        # Accumulate the weight for each value; assumes integer values in [0, 255]
        wMap = numpy.zeros(256, float)
        for value, weight in zip(a, b):
            wMap[value] += weight
        # The index with the largest summed weight is the majority value
        return numpy.argmax(wMap)
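
The same accumulation can probably be done without the Python loop, and
extended to the two 3D arrays from the original question. What follows is only
a sketch under some assumptions: the values are non-negative integers below
n_classes, the arrays are laid out as (observations, rows, cols), and the names
weighted_majority_bincount and weighted_majority_3d are made up for
illustration. Ties go to the smallest value, since argmax returns the first
maximum.

```python
import numpy

def weighted_majority_bincount(a, b):
    # bincount sums the weights in b for each distinct non-negative integer in a
    return int(numpy.bincount(a, weights=b).argmax())

def weighted_majority_3d(values, weights, n_classes=256):
    # values, weights: assumed shape (n_obs, rows, cols); values in [0, n_classes)
    n_obs, rows, cols = values.shape
    scores = numpy.zeros((n_classes, rows, cols), dtype=float)
    r, c = numpy.indices((rows, cols))
    for layer in range(n_obs):
        # Within one layer every (r, c) pair occurs exactly once, so the
        # fancy-indexed += never hits the same element twice.
        scores[values[layer], r, c] += weights[layer]
    # For each pixel, pick the class with the largest summed weight
    return scores.argmax(axis=0)

values = numpy.array([14, 32, 12, 50, 2, 8, 19, 12, 19, 10])
weights = numpy.array([0.5, 0.1, 0.6, 0.1, 0.8, 0.3, 0.8, 0.4, 0.9, 0.2])
maj = weighted_majority_bincount(values, weights)
maj_grid = weighted_majority_3d(values.reshape(10, 1, 1),
                                weights.reshape(10, 1, 1))
```

The 3D version loops only over the ten observations rather than over every
pixel, but it allocates an n_classes x rows x cols array of floats, so it is
only reasonable when the number of classes is small relative to available
memory.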

Regards,

-- 
. __
. |-\
.
. tim.hochberg at ieee.org