[Borgbackup] Deduplication of tar files - doesn't seem to be giving good performance

Marcus Schopen lists at localguru.de
Thu Apr 21 05:50:29 EDT 2016


Hi,

On Thursday, 21.04.2016, at 05:45 +0000, William Gogan wrote:
> I'm trying borgbackup out, and so far it's performing really well in
> almost all tests.
> 
> 
> The one item where I'm seeing odd performance is for tar files. It
> appears not to be deduplicating except within the current archive.
> 
> 
> Background: Our VM tool kicks out a .tar file per container. It
> compresses (lzo) the .tar. For discussion purposes, let's pretend it's
> called vm.tar.lzo
> 
> 
> So, I call `lzop vm.tar.lzo -d --to-stdout |  borg create --verbose
> --stats --progress --chunker-params 19,23,21,4095 --compression
> lz4 /dir/borg/::2016-04-21-01-38 -` - I assumed lzo would wreck borg's
> dedupe, so I pipe in the decompressed version.
> 
> 
> Even if I generate a .tar file, then immediately generate a second one
> (within <30s of the first), and then feed them both to borgbackup, it
> shows about 80% of the blocks as non-duplicates despite 99% of the
> files not having changed on the disk (and so should not have changed
> in the .tar)
> 
> 
> I looked at the FAQ, and it does make specific mention of doing well
> at VM backups, so I'm wondering if I'm doing something wrong.
> 
> 
> What can I do to get better dedupe performance? I considered adding
> tar to the mix and untarring the file before piping it to borg, but
> that seems suboptimal.
> 
> 
> If anyone has any suggestions, I'd welcome them!
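
(A side note on the command quoted above: --chunker-params 19,23,21,4095
are, as far as I know, simply borg's defaults, i.e. a target chunk size of
about 2 MiB. With chunks that large, even small differences between two tar
runs (changed mtimes, reordered members) can make a large share of chunks
come out different. A rough, untested sketch with much finer chunking
follows - the exact numbers are only an example, please check the borg
docs on chunker params before relying on them:

  lzop -d --to-stdout vm.tar.lzo \
    | borg create --verbose --stats --progress \
        --chunker-params 10,23,16,4095 --compression lz4 \
        /dir/borg::2016-04-21-01-38 -

Smaller chunks mean a much larger chunk index and higher RAM usage, so this
is a trade-off rather than a guaranteed fix.)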


I have a similar deduplication problem with partclone images I'd like to
back up. Any ideas for another dumper (instead of raw dd)?
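
For reference, the raw-dd variant I mean looks roughly like this (device,
repo path and archive name are only placeholders):

  dd if=/dev/sdX bs=4M \
    | borg create --compression lz4 /dir/borg::sdX-2016-04-21 -

As far as I know borg can also read a block device directly when given
--read-special, but like dd that still reads all the free space that
partclone would skip.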

Ciao
Marcus




