[Borgbackup] Deduplication of tar files - doesn't seem to be giving good performance
William Gogan
william at conveystudio.com
Thu Apr 21 10:40:28 EDT 2016
public at enkore.de wrote:
> For this specific use case I'd recommend using the old chunker params
> which should allow better deduplication; still: unchanged, small files
> with updated metadata won't deduplicate.
For the sake of testing, I re-ran the same experiment (3 .tar files of
the same system, taken ~30 seconds apart, piped to borg) *without* any
chunker params, to let the defaults run. I was getting 10% deduplication
when using the explicit chunker params, and it's still right at 10%
using the default params.
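
For anyone wanting to repeat the comparison, here is a rough sketch of
one run, not the exact commands used above. It assumes borg 1.0-style
syntax, where "-" as the path makes `borg create` read from stdin, and
it assumes "old chunker params" means 10,23,16,4095 (the Attic-era
values listed in the Borg docs). The repo path, archive names, and
function names are hypothetical.

```python
import subprocess

# Assumption: the pre-1.0 (Attic-era) chunker values from the Borg docs.
OLD_CHUNKER_PARAMS = "10,23,16,4095"


def borg_create_argv(repo, archive, chunker_params=None):
    """Build the argv for a `borg create` that reads its data from stdin."""
    argv = ["borg", "create", "--stats"]
    if chunker_params:
        argv += ["--chunker-params", chunker_params]
    # "-" as the only path tells borg to store whatever arrives on stdin.
    argv += [f"{repo}::{archive}", "-"]
    return argv


def backup_tar(tar_path, repo, archive, chunker_params=None):
    """Feed an existing .tar file to borg on stdin (requires borg installed)."""
    with open(tar_path, "rb") as tar_stream:
        subprocess.run(
            borg_create_argv(repo, archive, chunker_params),
            stdin=tar_stream,
            check=True,
        )
```

Calling backup_tar() once per .tar with chunker_params=None and once
with OLD_CHUNKER_PARAMS (into separate repos) would give the two sets
of --stats output to compare.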
However, note that the data is exactly as you predicted - the .tar file
consists almost entirely of small files (it contains the / directory of
a brand-new Red Hat system with minimal installed services, so all files
are small). Total deduplication is running around 20%.
So this test (sample size = 3) supports your expectation about
small-file behavior.
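
To illustrate why small files inside a tar stream resist deduplication,
here is a minimal sketch. It is not Borg's chunker: it uses a toy
Gear-style rolling hash, hypothetical file names and sizes, and it
assumes every member's mtime changed between the two archives. With
chunks much larger than the files, nearly every chunk spans at least one
changed tar header; with much smaller chunks, the unchanged file bodies
can match again.

```python
import hashlib
import io
import random
import tarfile


def sample_files(count, size, seed=1):
    """Hypothetical stand-ins for the small files of a fresh system."""
    rng = random.Random(seed)
    return [
        (f"etc/file{i:04d}.conf", bytes(rng.getrandbits(8) for _ in range(size)))
        for i in range(count)
    ]


def make_tar(files, mtime):
    """Build an uncompressed tar in memory; every member gets the same mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def chunks(data, mask_bits, min_size=64):
    """Toy content-defined chunker; returns a list of (digest, length)."""
    rng = random.Random(0)
    table = [rng.getrandbits(32) for _ in range(256)]
    mask = (1 << mask_bits) - 1
    out, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Gear-style update: older bytes shift out of the 32-bit state.
        h = ((h << 1) + table[byte]) & 0xFFFFFFFF
        if i + 1 - start >= min_size and (h & mask) == 0:
            out.append((hashlib.sha256(data[start:i + 1]).digest(), i + 1 - start))
            start, h = i + 1, 0
    if start < len(data):
        out.append((hashlib.sha256(data[start:]).digest(), len(data) - start))
    return out


def dedup_fraction(first, second, mask_bits):
    """Fraction of the second stream's bytes in chunks already seen in the first."""
    seen = {digest for digest, _ in chunks(first, mask_bits)}
    shared = sum(n for digest, n in chunks(second, mask_bits) if digest in seen)
    return shared / len(second)


if __name__ == "__main__":
    files = sample_files(40, 1500)
    first = make_tar(files, 1461249600)
    second = make_tar(files, 1461249630)  # same data, mtime 30 s later
    for bits in (14, 6):
        frac = dedup_fraction(first, second, bits)
        print(f"mask_bits={bits}: {frac:.0%} of bytes deduplicated")
```

The absolute numbers from this toy mean nothing for a real repository;
only the gap between the large-chunk and small-chunk runs is the point.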
I am now going to try mounting the tar, as suggested in another comment,
and will report back on what I get out of that.
--
William Gogan
Convey Studio / Custom. Digital. Branding.
719.278.3736
conveystudio.com <http://www.conveystudio.com>