How to best update remote compressed, encrypted archives incrementally?

robert no-spam at no-spam-no-spam.com
Sat Mar 11 10:09:22 EST 2006


Steven D'Aprano wrote:


> Let me see if I understand you.
> 
> On the remote machine, you have one large file, which is compressed and
> encrypted. Call the large file "Archive". Archive is made up of a number
> of virtual files, call them A, B, ... Z. Think of Archive as a compressed
> and encrypted tar file.
> 
> On the local machine, you have some, but not all, of those smaller
> files, let's say B, C, D, and E. You want to modify those smaller files,
> compress them, encrypt them, transmit them to the remote machine, and
> insert them in Archive, replacing the existing B, C, D and E.
> 
> Is that correct?

Yes, that is it. In addition, a possibility for (fast) comparison of 
individual files would be optimal.

>>That's why I ask: how to get all these tasks into a cohesive encrypted 
>>backup solution without wasting disk space and network bandwidth?
> 
> What's your budget for developing this solution? $100? $1000? $10,000?
> Stop me when I get close. Remember, your time is money, and if you are a
> developer, every hour you spend on this is costing your employer anything
> from AUD$25 to AUD$150. (Of course, if you are working for yourself, you
> might value your time as Free.)
> 
> If you have an unlimited budget, you can probably create a solution to do
> this, keeping in mind that compressed/encrypted and modify-in-place
> *rarely* go together. 
> 
> If you have a lower budget, I'd suggest you drop the "single file"
> requirement. Hard disks are cheap, less than an Australian dollar a
> gigabyte, so don't get trapped into the false economy of spending $100 of
> developer time to save a gigabyte of data. Using multiple files makes it
> *much* simpler to modify-in-place: you simply replace the modified file.
> Of course the individual files can be compressed and encrypted, or you can
> use a compressed/encrypted file system. 
> 
> Lastly, have you considered that your attempted solution is completely the
> wrong way to solve the problem? If you explain _what_ you are wanting to
> do, rather than _how_ you want to do it, perhaps there is a better way.

So, there seems to be a big barrier for that task when encryption is 
applied to the whole archive. Complex block navigation within a block 
cipher would be required, and obviously no such (handy) code already 
exists. Or is there an encryption/decryption method which you can use 
like a file pipe _and_ which supports 'seek'?
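For what it's worth, a CTR-style construction does give exactly that 
seek property: the keystream depends only on the byte position, so any 
slice can be decrypted independently. A toy sketch (stdlib only; the 
SHA-256 counter keystream here is for illustration, NOT a vetted cipher 
-- a real implementation would use AES in CTR mode):

```python
import hashlib

BLOCK = 32  # SHA-256 digest size = one keystream block

def keystream_block(key, index):
    # Derive keystream block #index from the key (CTR-style counter).
    return hashlib.sha256(key + index.to_bytes(8, "big")).digest()

def xor_at(key, offset, data):
    # Encrypt/decrypt `data` as if it started at byte `offset` of the
    # stream: the keystream is recomputed from the position, so random
    # access ("seek") costs nothing. The same call decrypts.
    out = bytearray()
    for i, byte in enumerate(data):
        pos = offset + i
        block = keystream_block(key, pos // BLOCK)
        out.append(byte ^ block[pos % BLOCK])
    return bytes(out)

key = b"secret"
ct = xor_at(key, 0, b"hello world, this is a test payload")
# decrypt only bytes 6..16 without touching the rest:
middle = xor_at(key, 6, ct[6:17])
```

Wrapping this in a file-like object with read/seek would give the 
"encrypted pipe with seek" asked about above.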

Thus, a simple method would use a common threshold timestamp or 
archive bits and create multiple archive slices. (Unstable when the 
file set is dynamic and older files are copied into the tree.)
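The threshold-timestamp selection is a few lines of stdlib Python -- a 
minimal sketch (path and variable names are made up for illustration):

```python
import os

def files_newer_than(root, threshold):
    # Yield paths under `root` modified after `threshold` (epoch
    # seconds), i.e. the candidates for the next archive slice.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > threshold:
                yield path

# slice_files = list(files_newer_than("/data", last_backup_time))
```

This also shows the instability noted above: a file copied into the 
tree with a preserved old mtime is silently skipped.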

Two nearly optimal solutions which allow comparing individual files:

1st:
+ an (s)ftp(s)-to-zip/tar bridge seems to be possible. E.g. by hooking 
ZipFile to use a virtual self.fp
+ the files would be individually encrypted by a password
- an external tool like "gpg -c" is necessary (or is there good 
encryption in a native Python module? Is PGP (password-only) possible 
with a native Python module?)
- the filenames would be visible
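The ZipFile half of that bridge is easy to demonstrate, since ZipFile 
writes to any seekable file-like object; an in-memory buffer stands in 
here for the virtual self.fp (filenames and contents are invented):

```python
import io
import zipfile

# ZipFile accepts any file-like object with write/seek/tell, so the
# "virtual self.fp" could wrap an upload channel. BytesIO stands in:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("B.txt", b"new contents of B")
    zf.writestr("C.txt", b"new contents of C")

payload = buf.getvalue()  # bytes ready to pipe through e.g. "gpg -c"
```

The catch for a real (s)ftp(s) bridge: ZipFile needs seek/tell to 
rewrite the central directory, and a plain upload stream is not 
seekable -- which is exactly the complication the hook would have to 
solve.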

2nd:
+ manage a dummy file tree locally for speedy comparison (with 0-length 
files)
+ create encrypted archive slices for upload with iterated filenames
- an external tool like "gpg -c" is necessary
- extra file tree or file attribute database
- reconstructing the current state from multiple archive slices is arduous
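The dummy-tree idea from the 2nd solution can be sketched in a few 
stdlib calls (modern Python assumed for exist_ok/ns; function names are 
made up):

```python
import os

def mirror_tree(src, shadow):
    # Build/refresh a shadow tree of 0-length files carrying only the
    # names and mtimes of `src`, for cheap later comparison.
    for dirpath, dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target = os.path.join(shadow, rel)
        os.makedirs(target, exist_ok=True)
        for name in filenames:
            s = os.path.join(dirpath, name)
            t = os.path.join(target, name)
            open(t, "w").close()  # 0-length dummy
            st = os.stat(s)
            os.utime(t, ns=(st.st_atime_ns, st.st_mtime_ns))

def changed_since_mirror(src, shadow):
    # Files whose mtime differs from (or is missing in) the shadow.
    out = []
    for dirpath, dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        for name in filenames:
            s = os.path.join(dirpath, name)
            t = os.path.join(shadow, rel, name)
            if (not os.path.exists(t) or
                    os.stat(t).st_mtime_ns != os.stat(s).st_mtime_ns):
                out.append(s)
    return out
```

The changed list would then drive which files go into the next 
encrypted slice for upload.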

Robert


