[Borgbackup] Deduplication of tar files - doesn't seem to be giving good performance

Antoine Beaupré anarcat at debian.org
Thu Apr 21 10:42:28 EDT 2016


On 2016-04-21 10:40:28, William Gogan wrote:
> public at enkore.de wrote:
>> For this specific use case I'd recommend using the old chunker params
>> which should allow better deduplication; still: unchanged, small files
>> with updated metadata won't deduplicate.
> For the sake of testing, I re-ran the same experiment (3 .tar files of 
> the same system, taken ~30 seconds apart, piped to borg) *without* any 
> chunker params, to let the defaults run. I was getting 10% deduplication 
> when using the explicit chunker params, and it's still right at 10% 
> using the default params.
>
> However, note that the data is exactly as you predicted - the .tar file 
> consists almost entirely of small files (the .tar file contains the / 
> directory of a brand-new Red Hat system with minimal installed services; 
> all files are small). Total deduplication is running around 20%.
>
> So, this test (sample size=3) proved your expectation about small-file 
> behavior was accurate.
>
> I am going to now try mounting the tar as suggested in another comment, 
> and will report back on what I get out of that.

It would be interesting to have unit / perf tests for this stuff.
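
A minimal sketch of what such a test could look like, in Python. It does
not use borg at all: it builds two in-memory tar archives of the same small
files that differ only in mtime, then compares how many chunks they share
at two chunk sizes. Note the swapped-in technique: this uses plain
fixed-size chunking, whereas borg's real chunker is content-defined
(buzhash-based), so the numbers are only illustrative. The names make_tar,
chunk_hashes and dedup_ratio, the file count, the file size and the two
timestamps are all assumptions made up for this example.

```python
import hashlib
import io
import tarfile


def make_tar(mtime, n_files=200, size=300):
    """Build an in-memory tar of small files; only mtime varies between calls."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for i in range(n_files):
            # Deterministic per-file payload, identical across archives.
            data = (hashlib.sha256(str(i).encode()).digest() * (size // 32 + 1))[:size]
            info = tarfile.TarInfo(name=f"etc/file{i:04d}.conf")
            info.size = len(data)
            info.mtime = mtime  # the only metadata that changes
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def chunk_hashes(blob, chunk_size):
    """Hash fixed-size chunks (a stand-in for a content-defined chunker)."""
    return [hashlib.sha256(blob[i:i + chunk_size]).digest()
            for i in range(0, len(blob), chunk_size)]


def dedup_ratio(old, new, chunk_size):
    """Fraction of chunks in `new` that are already present in `old`."""
    seen = set(chunk_hashes(old, chunk_size))
    new_chunks = chunk_hashes(new, chunk_size)
    return sum(h in seen for h in new_chunks) / len(new_chunks)


if __name__ == "__main__":
    first = make_tar(mtime=1461249628)
    second = make_tar(mtime=1461249658)  # same files, 30 seconds later
    for chunk_size in (512, 64 * 1024):
        ratio = dedup_ratio(first, second, chunk_size)
        print(f"chunk size {chunk_size}: {ratio:.0%} of chunks shared")
```

With 512-byte chunks only the tar header blocks miss, so the file payloads
still deduplicate. With 64 KiB chunks every chunk contains at least one
changed header, so nothing is shared, which is the small-file effect
described above.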

A.

-- 
If Christ were here there is one thing he would not be -- a Christian.
                        - Mark Twain

