Creating a very simple revision system for photos in python

Dan Stromberg drsalists at gmail.com
Fri Mar 11 13:37:37 EST 2011


On Fri, Mar 11, 2011 at 6:56 AM, Thomas W <thomas.weholt at gmail.com> wrote:

> I'm thinking about creating a very simple revision system for photos
> in python, something like bazaar, mercurial or git, but for photos.
> The problem is that handling large binary files is quite different
> from handling plain text files. If anybody has done something like
> this or has any thoughts about it, I'd be very grateful. If something
> like mercurial or git could be used and/or extended/customized that
> would be even better.
>
> We are talking about large numbers of photos and some of them are
> large in size as well, but the functionality does not have to be a
> full-fledged revision system, just handle checking out, checking in,
> handling conflicts, rollbacks etc, preferably without storing
> complete copies of the files in question for every operation.
>
> Thanks for any input. :-)
>

Check out the rolling_checksum portion of backshift, and pyrabinf:
http://stromberg.dnsalias.org/svn/backshift/trunk/
http://stromberg.dnsalias.org/svn/pyrabinf/

You could probably use variable-length, shift-resistant blocking to chop
the inputs into binary chunks, and then make a checkin consist of a series
of "pointers" (pathnames in a filesystem trie or keys into something like
mongodb) to those chunks, to avoid duplication.  Actually, something like
this could probably be wrapped around Mercurial or SVN or whatever,
depending on what your needs are.
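
To make the "series of pointers" idea concrete, here is a minimal sketch.
It is not code from backshift or pyrabinf: check_in, check_out and the
plain dict used as the chunk store are hypothetical names made up for
illustration, and SHA-256 digests stand in for the pathnames or database
keys mentioned above.

```python
import hashlib


def check_in(chunks, store):
    """Save each chunk under its SHA-256 hex digest and return the manifest.

    `store` is any dict-like mapping (hypothetical stand-in for a
    filesystem trie or a key/value database).
    """
    manifest = []
    for chunk in chunks:
        key = hashlib.sha256(chunk).hexdigest()
        # A chunk that is already present is not stored a second time.
        store.setdefault(key, chunk)
        manifest.append(key)
    return manifest


def check_out(manifest, store):
    """Rebuild the original bytes by following the manifest's keys in order."""
    return b"".join(store[key] for key in manifest)
```

Two revisions that share most of their chunks then cost little more than
one: each checkin is only a list of keys, and only the chunks that changed
add new entries to the store.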

I originally set up pyrabinf as a wrapper for a preexisting C++ Rabin
Fingerprinting algorithm; this is probably the more traditional way of doing
such blocking.

However, I've been playing around with rolling my own algorithm in pure
Python (and also with Cython) using something that boils down to a rolling
(boxcar) sum of the bytes, so it'll work in PyPy.  So far, it seems to be
working fine.  Rabin Fingerprinting should be less prone to generating the
same blocking for a file that has two adjacent bytes swapped, but in my
project, and I suspect in yours, that doesn't really matter.
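
For a rough picture of what a rolling (boxcar) sum chunker can look like,
here is a sketch in pure Python.  It is not the backshift code; the
function name and the window, divisor, min_size and max_size parameters
are assumptions picked for illustration.  A boundary is declared wherever
the sum of the last `window` bytes hits a chosen remainder, so cut points
follow the content rather than fixed offsets.

```python
from collections import deque


def rolling_sum_chunks(data, window=48, divisor=1024,
                       min_size=256, max_size=8192):
    """Split `data` (bytes) into variable-length, content-defined chunks.

    All parameter values are illustrative assumptions, not tuned numbers.
    """
    chunks = []
    start = 0          # offset where the current chunk begins
    total = 0          # sum of the bytes currently inside the window
    recent = deque()   # the last `window` bytes seen
    for i, byte in enumerate(data):
        recent.append(byte)
        total += byte
        if len(recent) > window:
            total -= recent.popleft()
        size = i - start + 1
        at_boundary = size >= min_size and total % divisor == divisor - 1
        if at_boundary or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        # Whatever is left over becomes the final (possibly short) chunk.
        chunks.append(data[start:])
    return chunks
```

Since a plain sum ignores the order of the bytes inside the window, it
cannot tell two swapped adjacent bytes apart while both are in the window,
which is the weakness relative to Rabin Fingerprinting mentioned above.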

But also check out http://mercurial.selenic.com/wiki/BfilesExtension - this
might be less time consuming for you, better leveraging an existing tool.

HTH