[Borgbackup] Deduplication of tar files - doesn't seem to be giving good performance

William Gogan william at conveystudio.com
Thu Apr 21 01:45:26 EDT 2016


I'm trying borgbackup out, and so far it's performing really well in almost
all tests.

The one area where I'm seeing odd results is tar files: borg doesn't seem to
deduplicate them except within the current archive.

Background: our VM tool produces one .tar file per container and compresses
it with lzo. For discussion purposes, let's say the file is called
vm.tar.lzo.

So I run:

    lzop vm.tar.lzo -d --to-stdout | borg create --verbose --stats \
        --progress --chunker-params 19,23,21,4095 --compression lz4 \
        /dir/borg/::2016-04-21-01-38 -

I assumed the lzo compression would defeat borg's deduplication, so I pipe
in the decompressed stream instead.
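For reference, borg 1.0 documents `--chunker-params` as
CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE, and 19,23,21,4095
is its default. The short Python helper below (`decode_chunker_params` is a
hypothetical name, not part of borg) converts those exponents into byte sizes:
chunks between 512 KiB and 8 MiB, averaging roughly 2 MiB. With chunks that
large, a change to any small file inside a roughly 2 MiB region is enough to
make that whole chunk new.

```python
# Decode borg 1.0-style --chunker-params
# (CHUNK_MIN_EXP, CHUNK_MAX_EXP, HASH_MASK_BITS, HASH_WINDOW_SIZE) into bytes.
from typing import NamedTuple


class ChunkerParams(NamedTuple):
    min_size: int    # smallest chunk the chunker will cut, in bytes
    max_size: int    # largest chunk the chunker will cut, in bytes
    target_avg: int  # rough statistical average chunk size, in bytes
    window: int      # rolling-hash window size, in bytes


def decode_chunker_params(spec: str) -> ChunkerParams:
    """Turn a string like '19,23,21,4095' into byte sizes."""
    min_exp, max_exp, mask_bits, window = (int(x) for x in spec.split(","))
    return ChunkerParams(
        min_size=2 ** min_exp,
        max_size=2 ** max_exp,
        # A cut happens where the low `mask_bits` bits of the rolling hash
        # are all zero, i.e. roughly once every 2**mask_bits bytes.
        target_avg=2 ** mask_bits,
        window=window,
    )


if __name__ == "__main__":
    print(decode_chunker_params("19,23,21,4095"))
```

Smaller HASH_MASK_BITS values give smaller average chunks, which localize
changes better at the cost of more chunks to track.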

Even if I generate a .tar file, immediately generate a second one (less than
30 seconds later), and feed both to borgbackup, it reports about 80% of the
chunks as new, even though 99% of the files on disk have not changed (so
their data in the .tar should not have changed either).
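One way to see what the chunker is doing is to measure, offline, how many of
the second stream's bytes land in chunks already seen in the first. The
Python sketch below is a toy model, not borg's code: it swaps borg's buzhash
for a simple polynomial rolling hash, uses zlib in place of lzo (lzo is not in
the Python standard library), uses much smaller chunk sizes than
19,23,21,4095 so it runs quickly, and all function names are hypothetical. It
also illustrates the assumption that compressed input defeats deduplication: a
small insertion leaves most chunks of the plain stream intact but changes the
compressed stream from the edit point onward.

```python
"""Toy content-defined chunking (CDC) dedup check.

Assumption: this only approximates borg. A polynomial rolling hash stands in
for borg's buzhash, and zlib stands in for lzo.
"""
import hashlib
import random
import zlib

BASE = 1_000_003
MOD = (1 << 61) - 1  # Mersenne prime modulus for the rolling hash


def cdc_chunks(data: bytes, window: int = 48, mask_bits: int = 10,
               min_size: int = 256, max_size: int = 8192) -> list[bytes]:
    """Cut `data` wherever the hash of the last `window` bytes has its low
    `mask_bits` bits all zero, subject to min/max chunk sizes."""
    mask = (1 << mask_bits) - 1
    drop = pow(BASE, window, MOD)  # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= window:
            h = (h - data[i - window] * drop) % MOD
        size = i + 1 - start
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_fraction(old: bytes, new: bytes) -> float:
    """Fraction of `new`'s bytes that fall in chunks already seen in `old`."""
    known = {hashlib.sha256(c).digest() for c in cdc_chunks(old)}
    reused = sum(len(c) for c in cdc_chunks(new)
                 if hashlib.sha256(c).digest() in known)
    return reused / len(new)


def fake_tar_payload(seed: int = 1, words: int = 20_000) -> bytes:
    """Compressible pseudo-text standing in for an uncompressed .tar."""
    rng = random.Random(seed)
    letters = "abcdefghijklmnopqrstuvwxyz"
    vocab = ["".join(rng.choice(letters) for _ in range(rng.randint(2, 9)))
             for _ in range(500)]
    return " ".join(rng.choice(vocab) for _ in range(words)).encode()


if __name__ == "__main__":
    old = fake_tar_payload()
    new = old[:20_000] + b"one small change " + old[20_000:]  # tiny edit
    plain = shared_fraction(old, new)
    packed = shared_fraction(zlib.compress(old), zlib.compress(new))
    print(f"uncompressed: {plain:.0%} of bytes in already-seen chunks")
    print(f"compressed:   {packed:.0%} of bytes in already-seen chunks")
```

The chunk sizes are scaled down (about 1 KiB instead of roughly 2 MiB) only
to keep the toy fast; the way boundaries resynchronize after an edit does not
depend on the scale.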

I looked at the FAQ, and it specifically mentions that borg does well with VM
backups, so I'm wondering whether I'm doing something wrong.

What can I do to get better deduplication? I considered adding tar to the
pipeline, extracting the archive first and backing up the extracted files
with borg instead of piping in the tar stream, but that seems suboptimal.
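Before restructuring the pipeline, it may be worth checking whether the two
back-to-back archives really contain the same bytes in the same order:
per-file header changes (such as mtimes) or a different member order change
the tar stream even when file contents are identical. The Python sketch below
uses the standard `tarfile` module to compare two archives member by member;
`tar_manifest`, `compare_tars` and the demo helper `write_tar` are
hypothetical names, and the .tar.lzo files would need to be decompressed
first (e.g. with `lzop -d`), since `tarfile` does not read lzo.

```python
"""Compare two tar archives member by member (diagnostic sketch)."""
import hashlib
import io
import os
import tarfile
import tempfile


def tar_manifest(path: str) -> list[tuple[str, str, int]]:
    """(name, sha256 of contents, mtime) per regular file, in archive order."""
    manifest = []
    with tarfile.open(path) as tar:  # handles plain/gz/bz2/xz, but not lzo
        for member in tar:
            if not member.isfile():
                continue
            digest = hashlib.sha256()
            f = tar.extractfile(member)
            for block in iter(lambda: f.read(1 << 20), b""):  # stream 1 MiB
                digest.update(block)
            manifest.append((member.name, digest.hexdigest(),
                             int(member.mtime)))
    return manifest


def compare_tars(old_path: str, new_path: str) -> dict:
    """Report order changes, content changes and mtime-only changes."""
    old, new = tar_manifest(old_path), tar_manifest(new_path)
    old_map = {name: (dig, mt) for name, dig, mt in old}
    new_map = {name: (dig, mt) for name, dig, mt in new}
    common = old_map.keys() & new_map.keys()
    return {
        "same_order": [n for n, _, _ in old] == [n for n, _, _ in new],
        "content_changed": sorted(
            n for n in common if old_map[n][0] != new_map[n][0]),
        "mtime_only_changed": sorted(
            n for n in common
            if old_map[n][0] == new_map[n][0] and old_map[n][1] != new_map[n][1]),
        "added": sorted(new_map.keys() - old_map.keys()),
        "removed": sorted(old_map.keys() - new_map.keys()),
    }


def write_tar(path: str, files: dict[str, bytes], mtime: int) -> None:
    """Hypothetical demo helper: build a small uncompressed tar."""
    with tarfile.open(path, "w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    a, b = os.path.join(tmp, "a.tar"), os.path.join(tmp, "b.tar")
    write_tar(a, {"etc/hosts": b"127.0.0.1 localhost\n", "var/log/x": b"old"}, 1000)
    write_tar(b, {"var/log/x": b"new", "etc/hosts": b"127.0.0.1 localhost\n"}, 2000)
    print(compare_tars(a, b))
```

A large "mtime_only_changed" list or `same_order` being False would suggest
the tar stream itself differs between runs, independent of borg's settings.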

If anyone has any suggestions, I'd welcome them!

Thanks,
William.