[Catalog-sig] Please turn off ratings

Wed Apr 6 19:06:42 CEST 2011

On Wed, Apr 6, 2011 at 10:33 AM, Brian Jones <bkjones at gmail.com> wrote:
> I, for one, am not a fan of the ratings system, but I don't agree that PyPI
> has no business having one.

I'd probably be less inclined to complain if the system actually did
something useful, but I maintain that having a rating system is not
part of the mission of a catalog, and that the existence of one,
however nifty, is exclusionary, biased, and detracts from the
community-resource nature of a catalog.

The vast majority of programming language package catalogs don't have
any sort of ratings system -- RubyGems, NPM, PEAR, Hackage, CRAN, ...
in fact, the only example I can find of a package catalog that *does*
have ratings is CPAN, and those ratings are on an external site
(cpanratings).

I have absolutely no problem with the idea of rating packages. Heck, I
built one myself once (https://github.com/jacobian/cheeserater) and I
promote http://djangopackages.com/ every chance I get. I don't believe
that these features are within the mission of the catalog itself, and
in fact I believe they're actively harmful when implicitly endorsed by
the catalogers.

All that being said, it's clear that I've lost this argument, so I
agree that if PyPI must have a ratings system, then it should at least
be a good one.

> Jacob, what would make you *want* those emails you're getting as a package
> owner? What would make users *want* to leave feedback that would be useful
> to maintainers & other users?

Well, I might be the wrong person to ask. I'm displaying a strong
bias, and I seem to be doing a great job acting like an asshole, so I
suspect any suggestions coming from me are going to be dismissed out
of hand at this point. That said, I'll try to engage with this
question dispassionately and give a constructive answer.

I'm a process nerd, so I tend to break this sort of problem down into
a series of "user stories" to describe the different stakeholders and
use cases a system like this should support. Here are the ones I can
come up with; I think I've got most of the important ones, but perhaps
I'm missing a few:

1. Amy is a Python user looking for a package to solve a specific
problem. When confronted with a list of alternatives, ratings could
help her decide which package she should try first. When confronted
with a specific package, a rating could help her decide whether the
package is worth her time.

For ratings to be useful to Amy, then, they'll need to be specific and
actionable -- that is, a rating should correspond to some indication
of how "useful" the package would be. Amy needs to be able to set a
threshold below which she won't consider a package. "3" doesn't help
Amy: how should she decide whether a "3" is worth her time or not?
"14% of users uninstalled this package within an hour" might be a more
useful sort of rating for Amy. "27 people liked this package" might
also be useful. A simple count of downloads is fairly useful as well.
Amy needs a context against which to evaluate these ratings.

Amy also needs ratings to cover a fairly broad spectrum of other users
like Amy. A single rating isn't helpful, nor is just a handful. She
probably needs about a dozen ratings to be fairly confident she's
getting a useful indication of utility.

2. Brian is a Python user who's downloaded a package and wants to
provide feedback on it. Perhaps he's found that the tool he downloaded
doesn't work and wants to complain. Perhaps he has a patch. Perhaps
he's happy and wants to give the developer kudos.

Like Amy, Brian needs ratings to be specific and actionable: he needs
to be able to know *why* he should give a package a rating of "fresh"
instead of "moldy", or what effect clicking "Like" has. Brian's been
in Amy's shoes, so it's really the same problem, though for Brian, a
score is more useful than a "like" since "like" won't give him any way
of expressing negative feedback or patches. So Brian would appreciate
other feedback mechanisms besides just a strict rating (but that's out
of scope right now).

3. Carol is a package maintainer who lists her work on PyPI. Carol
might want to use ratings to determine the future direction of her
work, find out what users like and dislike, or even just find out if
anyone's paying attention.

Like Amy and Brian, Carol needs ratings to be actionable, but in a
slightly different manner. Positive ratings are actually fairly
useless to Carol: knowing that 17 users like her code doesn't give her
anything beyond an ego boost. Instead, Carol needs feedback that gives
her a specific way to move forward: "fix X", "improve Y", "add Z". So
numeric ratings don't particularly help Carol unless they're tied to a
metric. If a score of 7.5 just represents an abstract "likeiness"
rating, that's useless to her, but if a score of "B-" indicates "works
as advertised but has a few bugs", that's more helpful.

Carol might appreciate feedback that comes on PyPI, but Carol also
might have different mechanisms for users to give her feedback -- a
ticket tracker, mailing list, feedback tool, etc. -- that she'd rather
use. So Carol probably wants the ability to redirect or cross-post
feedback into another system to avoid creating a support channel she
doesn't check often.

4. Dave is a maintainer of PyPI. Dave's main goal is to help people
find good code on PyPI, so Dave's interest in ratings is towards that
end. Dave's thus probably most invested in seeing as many ratings as
possible (along with the usual needs of making sure that the system
isn't gamed or abused). Dave wants to see a rating system that
provides as much information to users as possible.

Thus, Dave would probably be most happy with a rating that's automatic
or very easy to use. Obviousness is important to Dave: he wants users
to have as little confusion around leaving a rating as possible. An "I
like this" or "I use this" button is something Dave might really like:
it's clear, easy to use, and has a clear precedent (Facebook). He
might also be happy with systems that collect some form of "happiness"
statistic or proxy automatically from users.

In the end, however, Dave's main interest is just in seeing the system
used and used frequently.

5. Edith is a PSF member, or board member, or director. Her main
interest is similar to Dave's: she wants to be able to use PyPI as a
proxy to demonstrate a vibrant, active community. She wants to point
to ongoing traffic and activity on PyPI and show that Python's being
used frequently and usually liked. She wants PyPI to be a place with
as broad an appeal as possible -- fragmentation is her enemy.

Edith has very little investment in the specifics of a ratings
mechanism, though she's probably less interested in anything involving
negative feedback since that sort of thing can turn into a flamewar on
PyPI. But she'd probably go along with any mechanism that helps
demonstrate -- and reinforce! -- the vibrancy of the Python community.
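
As a rough illustration of story 1, here is a minimal sketch of a
rating summary that carries its own context and stays silent until
about a dozen ratings exist. Everything in it is hypothetical: the
PackageStats fields, the summarize() helper, and the MIN_RATINGS
threshold are assumptions made for this example, not anything PyPI
provides.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: "about a dozen" ratings, per story 1, before showing anything.
MIN_RATINGS = 12


@dataclass
class PackageStats:
    """Hypothetical per-package counters; not a real PyPI data structure."""
    name: str
    likes: int      # number of raters who liked the package
    ratings: int    # total number of raters
    downloads: int  # simple download count


def summarize(stats: PackageStats) -> Optional[str]:
    """Return a rating with context, or None when there are too few ratings."""
    if stats.ratings < MIN_RATINGS:
        return None
    share = round(100 * stats.likes / stats.ratings)
    return (
        f"{stats.likes} of {stats.ratings} raters liked {stats.name} "
        f"({share}%), {stats.downloads} downloads"
    )


print(summarize(PackageStats("example", likes=27, ratings=36, downloads=1500)))
print(summarize(PackageStats("tiny", likes=3, ratings=4, downloads=10)))
```

The point of the sketch is only that a bare number is replaced by a
count with a denominator, so a reader like Amy can pick her own
threshold.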

----

To me, the conclusions from the above write themselves. I think I'll
avoid drawing them, though, to try to keep this as dispassionate as
possible.

Jacob