[Borgbackup] Deduplication of tar files - doesn't seem to be giving good performance

William Gogan william at conveystudio.com
Thu Apr 21 10:40:28 EDT 2016



public at enkore.de wrote:
> For this specific use case I'd recommend using the old chunker params
> which should allow better deduplication; still: unchanged, small files
> with updated metadata won't deduplicate.
For the sake of testing, I re-ran the same experiment (3 .tar files of 
the same system, taken ~30 seconds apart, piped to borg) *without* any 
chunker params, so that the defaults apply. I was getting 10% 
deduplication with the explicit chunker params, and it's still right at 
10% with the default params.

However, note that the data is exactly as you predicted: the .tar file 
consists almost entirely of small files (it contains the / directory of 
a brand-new Red Hat system with minimal installed services, so all 
files are small). Total deduplication is running around 20%.

So, this test (sample size=3) proved your expectation about small-file 
behavior was accurate.
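
To illustrate the effect rather than borg itself, here is a minimal 
sketch in Python. Everything in it is an assumption made for the demo: 
the file names, sizes and mtimes are made up, and the chunker is a 
simple gear-style rolling hash standing in for borg's real one. It 
builds two in-memory tars of the same small files that differ only in 
mtime, then measures how many bytes of the second tar land in chunks 
already seen in the first, for a small and a large average chunk size.

```python
import hashlib
import io
import random
import tarfile


def make_tar(files, mtime):
    """Build an uncompressed tar in memory; every member gets the same mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def chunks(data, bits, table):
    """Toy content-defined chunker (NOT borg's algorithm).

    Cuts wherever the low `bits` bits of a rolling hash are zero, which
    gives an average chunk size of roughly 2**bits bytes.
    """
    mask = (1 << bits) - 1
    h, start = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + table[byte]) & 0xFFFFFFFF
        if (h & mask) == 0:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]


def dedup_ratio(first, second, bits, table):
    """Fraction of `second`'s bytes that sit in chunks already seen in `first`."""
    seen = {hashlib.sha256(c).digest() for c in chunks(first, bits, table)}
    shared = sum(len(c) for c in chunks(second, bits, table)
                 if hashlib.sha256(c).digest() in seen)
    return shared / len(second)


rng = random.Random(0)
table = [rng.getrandbits(32) for _ in range(256)]

# Hypothetical data set: 100 small files of 2000 random bytes each.
files = [(f"file_{i:04d}.bin", bytes(rng.getrandbits(8) for _ in range(2000)))
         for i in range(100)]

tar_a = make_tar(files, mtime=1461249600)
tar_b = make_tar(files, mtime=1461249630)  # same contents, metadata 30 s newer

small = dedup_ratio(tar_a, tar_b, bits=8, table=table)    # ~256 B chunks
large = dedup_ratio(tar_a, tar_b, bits=14, table=table)   # ~16 KiB chunks
print(f"avg chunk ~256 B:  {small:.0%} of bytes deduplicated")
print(f"avg chunk ~16 KiB: {large:.0%} of bytes deduplicated")
```

The point of the sketch: when the average chunk is larger than a 
typical file, nearly every chunk spans at least one changed tar header, 
so the unchanged file contents next to it cannot be matched.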

I'm now going to try mounting the tar, as suggested in another comment, 
and will report back on what I get out of that.

-- 
William Gogan
Convey Studio / Custom. Digital. Branding.
719.278.3736
conveystudio.com <http://www.conveystudio.com>
