hard disk activity

Paul Rubin http
Mon Feb 13 15:58:41 EST 2006


"VSmirk" <vania.smirk at gmail.com> writes:
> But the trick in my mind is figuring out which specific bytes have been
> written to disk.  That's why I was thinking device level.  Am I going
> to have to work in C++ or Assembler for something like this?

No, you can do it in Python.  The basic idea is: locally compute a
separate checksum for (say) each 1% chunk of the file.  Do the same
thing on the remote side.  So for a 1GB file, you compute 100
checksums at each end, each checksum covering 10 MB.  Then send the
100 checksums over the network, which is just a few kbytes.  Compare
the checksums and you know which 10MB chunks have changed.  For the
chunks that have changed, divide them into 100-kbyte sub-chunks,
checksum and compare those, and keep subdividing until the changed
pieces are small enough to just send.  The optimal number of chunks at
each level depends on network speed, how expensive the checksums are
to compute, and how scattered the changes are.  Anyway, this is
basically how rsync works.
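A minimal Python sketch of the hierarchical comparison described above. It assumes both copies can be opened in one process (in practice each side would compute its own digests and only the digest lists would cross the network), and that the file was modified in place, so both copies have the same length. The function names `chunk_digests` and `changed_ranges`, the use of MD5, and the `min_chunk` stopping size are illustrative assumptions, not part of any existing tool. Real rsync also uses a rolling weak checksum plus a strong hash so it can find blocks that have shifted position; this sketch only handles in-place changes.

```python
import hashlib
import io


def chunk_digests(f, offset, length, parts):
    """Split the byte range [offset, offset + length) of binary file `f` into
    about `parts` equal chunks; return (start, size, MD5 digest) per chunk."""
    size = max(1, -(-length // parts))  # ceiling division
    digests = []
    for start in range(offset, offset + length, size):
        n = min(size, offset + length - start)
        f.seek(start)
        digests.append((start, n, hashlib.md5(f.read(n)).digest()))
    return digests


def changed_ranges(local, remote, offset=0, length=None, parts=100, min_chunk=4096):
    """Return (start, size) ranges whose checksums differ between two files.

    Chunks that differ are subdivided again until they are <= min_chunk bytes.
    Assumes both files have the same length (changes were made in place).
    min_chunk is a hypothetical "small enough to just send" threshold.
    """
    if parts < 2:
        raise ValueError("parts must be at least 2 so chunks shrink each level")
    if length is None:
        length = local.seek(0, io.SEEK_END)  # seek() returns the new position
    result = []
    mine = chunk_digests(local, offset, length, parts)
    theirs = chunk_digests(remote, offset, length, parts)  # really: sent over the network
    for (start, size, a), (_, _, b) in zip(mine, theirs):
        if a == b:
            continue  # identical chunk, nothing to resend
        if size <= min_chunk:
            result.append((start, size))
        else:
            result.extend(changed_ranges(local, remote, start, size, parts, min_chunk))
    return result


if __name__ == "__main__":
    original = bytes(1_000_000)   # stand-in for the remote copy
    edited = bytearray(original)
    edited[500_123] = 1           # change one byte locally
    print(changed_ranges(io.BytesIO(bytes(edited)), io.BytesIO(original)))
    # prints [(500100, 100)]
```

With real files you would pass objects from `open(path, "rb")` instead of `io.BytesIO`. With `parts=100`, even a 1 GB file reaches chunks under 4096 bytes after only a few levels, so recursion depth is not a concern.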

Doing anything device level will be highly OS dependent.


