Looking for feedback on weighted voting algorithm

Fri Apr 15 13:52:20 EDT 2016

On Thursday, April 14, 2016 at 1:48:40 PM UTC-7, Michael Selik wrote:
> On Thu, Apr 14, 2016, 7:37 PM justin walters <walters.justin01 at gmail.com>
> wrote:
> 
> > On Apr 14, 2016 9:41 AM, "Martin A. Brown" <martin at linux-ip.net> wrote:
> > >
> > >
> > > Greetings Justin,
> > >
> > > >    score = sum_of_votes/num_of_votes
> > >
> > > >votes = [(72, 4), (96, 3), (48, 2), (53, 1), (26, 4), (31, 3), (68, 2),
> > (91, 1)]
> > >
> > > >Specifically, I'm wondering if this is a good algorithm for
> > > >weighted voting. Essentially a vote is weighted by the number of
> > > >votes it counts as. I realize that this is an extremely simple
> > > >algorithm, but I was wondering if anyone had suggestions on how to
> > > >improve it.
> > >
> > > I snipped most of your code.  I don't see anything wrong with your
> > > overall approach.  I will make one suggestion: watch out for
> > > DivisionByZero.
> > >
> > >     try:
> > >         score = sum_of_votes / num_of_votes
> > >     except ZeroDivisionError:
> > >         score = float('nan')
> > >
> > > In your example data, all of the weights were integers, which means
> > > that a simple mean function would work, as well, if you expanded the
> > > votes to an alternate representation:
> > >
> > >   votes = [72, 72, 72, 72, 96, 96, 96, 48, 48, 53, 26, 26, 26, 26,
> > >            31, 31, 31, 68, 68, 91]
> > >
> > > But, don't bother!
> > >
> > > Your function can handle votes that have a float weight:
> > >
> > >   >>> weight([(4, 1.3), (1, 1),])
> > >   2.695652173913044
> > >
> > > Have fun!
> > >
> > > -Martin
> > >
> > > --
> > > Martin A. Brown
> > > http://linux-ip.net/
> >
> > Thanks Martin!
> >
> > I'll add the check for division by zero. Didn't think about that. I think
> > I'm going to sanitize input anyways, but always better to be safe than
> > sorry.
> >
> 
> I suggest not worrying about sanitizing inputs. If someone provides bad
> data, Python will do the right thing: stop the program and print an
> explanation of what went wrong, often a more helpful message than one you'd
> write. Use error handling mostly for when you want to do something *other*
> than stop the program.
> 
> I'm not sure I'd use NaN instead of raise division by zero error. NaNs can
> be troublesome for downstream code that might not notice until it gets
> confusing. A div-by-zero error is clear and easier to track down because of
> the traceback.
> 
> What do you think of using list comprehensions?
> 
>     weighted_sum = sum(rating * weight for rating, weight in votes)
>     total_weights = sum(weight for rating, weight in votes)
>     score = weighted_sum / total_weights
> 
> It's two loops as I wrote it, which is instinctively slower, but it might
> actually execute faster because of the built-in sum vs a regular for loop.
> Not sure.
> 
> >

I disagree with your notion of not sanitizing inputs.

If I'm the only person that will be using my program, then sure, I won't sanitize inputs.  It was my mistake and I should read the traceback and know what I did wrong.

But if ANY third party is going to be using it, especially as part of a web service of some kind, I'd much rather sanitize the inputs and tell the user they did something bad than to have the script crash and give a 500 Internal Server Error to the user.

Even worse, I definitely wouldn't want to give any hints of the source code organization by letting the user see the traceback.  I'll log it privately so I can view it myself, of course.