Vote tallying...

Stefan Behnel stefan_ml at behnel.de
Fri Jan 18 03:47:43 EST 2013


Andrew Robinson, 18.01.2013 00:59:
> I have a problem which may fit in a mysql database

Everything fits in a MySQL database - not a reason to use it, though. Py2.5
and later ship with sqlite3 and if you go for an external database, why use
MySQL if you can have PostgreSQL for the same price?
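A minimal sketch of the sqlite3 route, written for Python 3 (the `with conn:` transaction form needs Python 2.6 or later). The table layout, with one row per (content, voter) pair and a composite primary key, is an assumption for illustration, as are the helper names. With that key, `INSERT OR REPLACE` changes a voter's existing vote and adds new voters in one batch, which is the "change a few, append the rest" update the original post describes.

```python
import sqlite3

def open_archive(path=":memory:"):
    """Open (or create) a vote archive. The schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS votes ("
        " content_id TEXT NOT NULL,"
        " voter_id TEXT NOT NULL,"
        " vote INTEGER NOT NULL CHECK (vote BETWEEN 1 AND 10),"
        " PRIMARY KEY (content_id, voter_id))"
    )
    return conn

def apply_batch(conn, content_id, batch):
    """Merge a {voter_id: vote} batch: existing voters are overwritten, new ones inserted."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany(
            "INSERT OR REPLACE INTO votes (content_id, voter_id, vote) VALUES (?, ?, ?)",
            [(content_id, voter, vote) for voter, vote in batch.items()],
        )

def tally(conn, content_id):
    """Return (number of votes, average vote) for one piece of content."""
    return conn.execute(
        "SELECT COUNT(*), AVG(vote) FROM votes WHERE content_id = ?",
        (content_id,),
    ).fetchone()

if __name__ == "__main__":
    conn = open_archive()
    apply_batch(conn, "12345A3", {"FF3424": 10, "F713A4": 1})
    # F713A4 changes its vote, 12312234 is a new voter
    apply_batch(conn, "12345A3", {"F713A4": 7, "12312234": 3})
    print(tally(conn, "12345A3"))
```

Passing a file path instead of `":memory:"` gives a persistent archive; the primary key also acts as the index for looking up a single voter.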


> but which I only have
> python as an alternate tool to solve... so I'd like to hear some opinions...
> 
> I'm building an experimental content management program on a standard Linux
> Web server.
> And I'm needing to keep track of archived votes and their voters -- for years.
> 
> Periodically, a python program could be given a batch of new votes removed
> from the database, and some associated comments, which are no longer
> real-time necessary;  and then a python script needs to take that batch of
> votes, and apply them to an appropriate archive file.  It's important to
> note that it won't just be appending new votes, it will be sorting through
> a list of 10's of thousands of votes, and changing a *few* of them, and
> appending the rest.
> 
> XML may not be the ideal solution, but I am easily able to see how it might
> work.  I imagine a file like the following might be inefficient, but
> capable of solving the problem:
> 
> <?xml version="1.0"?>
> <data>
> 
>    <identify>
>        <contentid>12345A3</contentid>
>        <authorid>FF734B5D</authorid>
>        <permissions>7FBED</permissions>
>        <chapter>The woodstock games</chapter>
>     </identify>
> 
>     <comments>
>         <comment id="FF53524" date="2013.01.12">I think you're on drugs,
> man.!</comment>
>         <comment id="unregistered" date="2013.01.12">It would have been
> better if they didn't wake up in the morning.</comment>
>     </comments>
> 
>     <votes>
>         <v id="FF3424">10</v>
>         <v id="F713A4">1</v>
>         <v id="12312234">3</v>
>     </votes>
> </data>
> 
> The questions I have are, is using XML for vote recording going to be slow
> compared to other stock solutions that Python may have to offer?  The voter
> IDs are unique, 32 bits long, and the votes are only from 1 to 10 (4
> bits). I'm free to use any import that comes with Python 2.5, so if
> there's something better than XML, I'm interested.
> 
> And secondly, how likely is this to still work once the vote count reaches
> 10 million?
> Is an XML file with millions of entries something someone has already tried
> successfully?

Sure. However, XML files are rather static and are meant to be
processed from start to end on each run. That cost adds up if the changes
are small and local while the file keeps growing. You seem to propose one
file per article, which might work: such a file is unlikely to grow too
large to process, and Python's cElementTree is a very fast XML processor.
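To make the per-article XML approach concrete, here is a sketch that merges a batch of votes into the `<votes>` element of the file format proposed above. It uses `xml.etree.ElementTree`, which in Python 3 uses the C accelerator automatically (on Python 2.5 the equivalent import is `xml.etree.cElementTree`). The `apply_votes` helper and the shape of the batch (a dict mapping voter ID to vote) are hypothetical. Note that the whole file is still parsed and rewritten on every run, which is the cost mentioned above.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the per-article format from the question.
SAMPLE = """<data>
  <votes>
    <v id="FF3424">10</v>
    <v id="F713A4">1</v>
  </votes>
</data>"""

def apply_votes(root, batch):
    """Hypothetical helper: update <v> elements of known voters, append new ones."""
    votes = root.find("votes")
    if votes is None:
        votes = ET.SubElement(root, "votes")
    # Index the existing votes by voter ID so each lookup is a dict access.
    existing = {v.get("id"): v for v in votes.findall("v")}
    for voter_id, vote in batch.items():
        elem = existing.get(voter_id)
        if elem is None:
            elem = ET.SubElement(votes, "v", id=voter_id)
            existing[voter_id] = elem
        elem.text = str(vote)
    return root

if __name__ == "__main__":
    # For a real archive file: tree = ET.parse(path);
    # apply_votes(tree.getroot(), batch); tree.write(path)
    root = ET.fromstring(SAMPLE)
    apply_votes(root, {"F713A4": 7, "12312234": 3})
    print(ET.tostring(root, encoding="unicode"))
```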

However, your problem sounds a lot like you could map it to one of the dbm
databases that Python ships with. They work like dicts, just on disk.
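A sketch of the dbm idea, assuming Python 3's `dbm` package (on Python 2.5 the corresponding modules were `anydbm`, `dbm`, `gdbm` and friends). The `content_id:voter_id` key scheme and the helper names are hypothetical. dbm stores keys and values as bytes, so the vote is written as a string and converted back with `int()` on read.

```python
import dbm
import os
import tempfile

def record_votes(db, content_id, batch):
    """Hypothetical helper: one record per (content, voter); re-voting overwrites."""
    for voter_id, vote in batch.items():
        # str keys and values are encoded to bytes automatically on write
        db[f"{content_id}:{voter_id}"] = str(vote)

def get_vote(db, content_id, voter_id):
    """Return the stored vote as an int, or None if this voter has not voted."""
    raw = db.get(f"{content_id}:{voter_id}")
    return None if raw is None else int(raw)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with dbm.open(os.path.join(tmp, "votes"), "c") as db:
            record_votes(db, "12345A3", {"FF3424": 10, "F713A4": 1})
            record_votes(db, "12345A3", {"F713A4": 7})
            print(get_vote(db, "12345A3", "F713A4"))
```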

IIUC, you want to keep track of comments and their associated votes, maybe
also keep a top-N list of the highest voted comments. So, keep each comment
and its votes in a dbm record, referenced by the comment's ID (which, I
assume, you keep a list of in the article that it comments on). You can use
pickle (see the shelve module) or JSON or whatever you like for storing
that record. Then, on each votes update, look up the comment, change its
votes and store it back. If you keep a top-N list for an article, update it
at the same time. Consider storing it either as part of the article or in
another record referenced by the article, depending on how you normally
access it. You can also store the votes independent of the comment (i.e. in
a separate record for each comment), in case you don't normally care about
the votes but read the comments frequently. It's just a matter of adding an
indirection for things that you use less frequently and/or that you use in
more than one place (not in your case, where comments and votes are unique
to an article).
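The comment-record layout described above might look like this with `shelve`, which pickles each value into a dbm file. The key prefixes, the record fields (`text`, `votes`, `comments`, `top`), scoring a comment by its average vote, and `TOP_N` are all assumptions for illustration. Because `shelve` hands out unpickled copies by default, every modified record is explicitly stored back, matching the look up, change, store back cycle.

```python
import heapq
import os
import shelve
import tempfile

TOP_N = 3  # hypothetical size of the per-article top list

def comment_score(record):
    """Assumed scoring rule: average vote, 0.0 for a comment without votes."""
    votes = record["votes"]
    return sum(votes.values()) / len(votes) if votes else 0.0

def update_votes(db, article_id, comment_id, batch):
    """Merge a {voter_id: vote} batch into a comment, then refresh the article's top-N."""
    key = "comment:" + comment_id
    record = db[key]               # an unpickled copy, not a live view
    record["votes"].update(batch)  # changed voters overwritten, new ones added
    db[key] = record               # store the modified copy back
    article = db["article:" + article_id]
    scored = [(comment_score(db["comment:" + cid]), cid) for cid in article["comments"]]
    article["top"] = [cid for _, cid in heapq.nlargest(TOP_N, scored)]
    db["article:" + article_id] = article

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with shelve.open(os.path.join(tmp, "archive")) as db:
            db["article:12345A3"] = {"comments": ["FF53524"], "top": []}
            db["comment:FF53524"] = {"text": "I think you're on drugs, man.!", "votes": {}}
            update_votes(db, "12345A3", "FF53524", {"FF3424": 10, "F713A4": 1})
            print(db["article:12345A3"]["top"])
```

Recomputing the top list re-reads every comment record of the article; for articles with many comments, caching each comment's score in the article record would avoid that.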

You see, lots of options, even just using the stdlib...

Stefan





More information about the Python-list mailing list