[Borgbackup] Deduplication of tar files - doesn't seem to be giving good performance
Antoine Beaupré
anarcat at debian.org
Thu Apr 21 10:42:28 EDT 2016
On 2016-04-21 10:40:28, William Gogan wrote:
> public at enkore.de wrote:
>> For this specific use case I'd recommend using the old chunker params
>> which should allow better deduplication; still: unchanged, small files
>> with updated metadata won't deduplicate.
> For the sake of testing, I re-ran the same experiment (3 .tar files of
> the same system, taken ~30 seconds apart, piped to borg) *without* any
> chunker params, to let the defaults run. I was getting 10% deduplication
> when using the explicit chunker params, and it's still right at 10%
> using the default params.
>
> However, note that the data is exactly as you predicted - the .tar file
> consists almost entirely of small files (the .tar file contains the /
> directory of a brand-new redhat system with minimal installed services;
> all files are small). Total deduplication is running around 20%.
>
> So, this test (sample size=3) proved your expectation about small-file
> behavior was accurate.
>
> I am going to now try mounting the tar as suggested in another comment,
> and will report back on what I get out of that.
It would be interesting to have unit / perf tests for this stuff.
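
A rough sketch of what such a test could look like. It uses a toy
content-defined chunker, NOT borg's actual chunker, and builds synthetic
tar streams in memory; the names chunk, make_tar and dedup_ratio are made
up for illustration. It compares two identical tar streams against two
that differ only in file mtime, which is the small-file case discussed
above.

```python
import hashlib
import io
import tarfile


def chunk(data, mask_bits=12, min_size=64):
    """Toy content-defined chunker (an assumption, not borg's chunker):
    cut wherever the low `mask_bits` bits of a simple shifting hash are zero."""
    mask = (1 << mask_bits) - 1
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF
        if i + 1 - start >= min_size and (h & mask) == 0:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]


def make_tar(mtime, n_files=200):
    """Build an in-memory tar of many small files sharing one integer mtime."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for i in range(n_files):
            # Deterministic 256-byte payload, identical across archives.
            payload = hashlib.sha256(b"file%d" % i).digest() * 8
            info = tarfile.TarInfo(name="etc/file%04d.conf" % i)
            info.size = len(payload)
            info.mtime = mtime
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def dedup_ratio(streams):
    """Fraction of bytes that did not need to be stored again."""
    seen, total, unique = set(), 0, 0
    for stream in streams:
        for piece in chunk(stream):
            total += len(piece)
            digest = hashlib.sha256(piece).digest()
            if digest not in seen:
                seen.add(digest)
                unique += len(piece)
    return 1 - unique / total


same = dedup_ratio([make_tar(1000), make_tar(1000)])
touched = dedup_ratio([make_tar(1000), make_tar(1030)])
print(f"identical archives: {same:.0%}, mtime-only change: {touched:.0%}")
```

A real test would pipe the streams through borg itself and read the
deduplicated size from its stats; the toy chunker is only there to show
the shape of the comparison.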
A.
--
If Christ were here there is one thing he would not be -- a Christian.
- Mark Twain