[Borgbackup] Deduplication of tar files - doesn't seem to be giving good performance

Thomas Waldmann tw at waldmann-edv.de
Thu Apr 21 13:44:28 EDT 2016


>>> Background: Our VM tool kicks out a .tar file per container.

I was assuming it was kind of a disk image or pieces of a disk image, 
plus config file.

>> Please provide a tar listing so we can see how many / how big files
>> are in there. Without that, one can only speculate...
> I can't give you a listing, but I can tell you this, which should help:
> This tar is created (using the command below) against a brand-new Redhat
> OS install with no user data on it yet, and minimal services. It is
> approx 1GB, and is mostly small files of type OS.

OK, then the problem is as already analyzed. The chunker's default 
granularity of about 2MB does not match that kind of input.

If you feed a lot of single, small files into borg, the chunk boundaries 
are determined automatically: each file smaller than 512K becomes 
exactly 1 chunk.

But if you concatenate them all and intersperse them with (changing?) 
metadata, these per-file boundaries are never established, and likely 
there is always some change in the metadata.
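
To make the granularity mismatch concrete, here is a rough Python 
simulation. It is only a sketch and not borg's actual chunker: it cuts 
the tar stream into fixed-size pieces instead of using content-defined 
chunking, uses 64 KiB as a scaled-down stand-in for the ~2MB 
granularity, and the file names, sizes and the "every tenth file 
changed" pattern are made up for illustration.

```python
import hashlib
import io
import random
import tarfile

CHUNK_SIZE = 64 * 1024  # assumption: scaled-down stand-in for the ~2MB granularity


def random_bytes(rng, size):
    """Return `size` pseudo-random bytes from a seeded generator."""
    return rng.getrandbits(8 * size).to_bytes(size, "big")


def make_tar(files, mtimes):
    """Pack {name: bytes} into an in-memory tar, one header per file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtimes[name]
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def fixed_chunks(stream, size=CHUNK_SIZE):
    """Simplification: fixed-size cuts, not content-defined chunking."""
    return [stream[i:i + size] for i in range(0, len(stream), size)]


def chunk_id(data):
    return hashlib.sha256(data).digest()


def simulate(count=200):
    rng = random.Random(0)
    # First backup: many small files of 1-4 KiB (hypothetical names).
    old = {f"etc/file{i:03d}": random_bytes(rng, rng.randint(1024, 4096))
           for i in range(count)}
    old_mtimes = {name: 1000 for name in old}

    # Second backup: every tenth file gets new content (same size) and mtime.
    new, new_mtimes = dict(old), dict(old_mtimes)
    for i, name in enumerate(old):
        if i % 10 == 0:
            new[name] = random_bytes(rng, len(old[name]))
            new_mtimes[name] = 2000

    # Case 1: each small file is its own chunk.
    seen_files = {chunk_id(d) for d in old.values()}
    files_reused = sum(chunk_id(d) in seen_files for d in new.values())

    # Case 2: the same files, wrapped in tar and cut at coarse granularity.
    seen_stream = {chunk_id(c) for c in fixed_chunks(make_tar(old, old_mtimes))}
    new_chunks = fixed_chunks(make_tar(new, new_mtimes))
    stream_reused = sum(chunk_id(c) in seen_stream for c in new_chunks)

    return files_reused, len(new), stream_reused, len(new_chunks)


files_reused, n_files, stream_reused, n_chunks = simulate()
print(f"per-file chunks reused: {files_reused} of {n_files}")
print(f"tar-stream chunks reused: {stream_reused} of {n_chunks}")
```

In this toy setup the per-file case reuses every unchanged file, while 
nearly every coarse chunk of the tar stream contains at least one 
changed header or file and so has to be stored again.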

So it looks like you could drop the tar step completely and run borg 
directly on the files to get the behaviour you want.

> I apologize that this
> isn't exactly what you asked for, but I'm not permitted to give a
> specific listing of data due to some work policies, even though this is
> just a blank install.

No problem, I was assuming it was a different kind of listing, but it is 
clear enough now.

> Pretty much the only benefit this .tar file gives me, vs pointing borg
> against the mounted LVM snapshot itself, is that should a disaster
> occur, the recovery process relies on providing the VM server with the
> .tar file of each VM.

OK, so it's kind of an integration issue.

> A potential workaround to this would be to have borg work on the LVM
> mount itself, and then, during the restore process, I *might* (subject
> to testing) be able to run this tar command against the borg restore, in
> order to re-create the .tar file expected 'on demand' that can be
> consumed by the VM server. This feels a little wiggly, but I'll do some
> checking if all else fails.

There could be 2 other ways of solving this:
a) ask the VM/container software provider to integrate borg
b) we could have a reader/chunker that reads from tar files instead of 
the filesystem, as an alternative to our normal chunker.
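
As a rough illustration of idea b), here is a minimal Python sketch of 
a tar-aware reader. It is not existing borg code: the function names 
are hypothetical, and it simply treats every regular file inside the 
tar as one chunk while ignoring the tar headers. A real implementation 
would presumably still run the normal chunker on large members and 
store the member metadata separately.

```python
import hashlib
import io
import tarfile


def member_chunk_ids(tar_bytes):
    """Hash each regular file's content on its own, ignoring tar headers."""
    ids = {}
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue  # directories, links etc. carry no content to chunk
            data = tar.extractfile(member).read()
            ids[member.name] = hashlib.sha256(data).hexdigest()
    return ids


def build_tar(files, mtime):
    """Demo helper only: pack {name: bytes} with the given mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


# Same file contents, different metadata (made-up example data).
files = {"etc/hostname": b"vm01\n", "etc/motd": b"welcome\n"}
monday = build_tar(files, mtime=1000)
tuesday = build_tar(files, mtime=2000)

# The raw tar bytes differ because the headers differ ...
print("tar bytes identical:", monday == tuesday)
# ... but the per-member content ids are the same, so they would dedup.
print("member ids identical:",
      member_chunk_ids(monday) == member_chunk_ids(tuesday))
```

With boundaries aligned to the tar members like this, changing 
metadata no longer disturbs the content chunks.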

-- 

GPG Fingerprint: 6D5B EF9A DD20 7580 5747  B70F 9F88 FB52 FAF7 B393
Encrypted E-Mail is preferred / Verschluesselte E-Mail wird bevorzugt.

