[Borgbackup] deduplication understanding and best practice?

Thomas Waldmann tw at waldmann-edv.de
Tue Feb 28 08:29:27 EST 2017


>>> i) make an "all in one" Linux borg repository in which I back up
>>> several PCs,
>>> with the aim of benefiting from deduplication of identical blocks
>>> across many PCs
>>
>> Do this if you have time, but you want to save space (esp. if you have
>> considerable duplication amongst the machines).
> 
> that's not easy to know beforehand,
> but I understand:
> a single repository can save space (if we have considerable
> duplication), but not time

It's not just "not saving time": with several machines writing to one
repository, it will need quite some additional time for the chunks cache
resync on each client.

> I did not mention that in this test I was only backing up the users'
> /home/ directories,
> not the OS.
> 
> So indeed the files are not the same between the /home directories of
> these 2 PCs.
> 
> BUT, in my "imaginary" understanding, I was thinking that
> "statistically" there could be a lot of duplicated blocks even in
> the case of different files?

Well, if one looks with a ~2MiB granularity at your data, there aren't
many identical chunks.

With finer granularity, the chunker might discover increasingly more
duplicates, but all these chunks need to be managed. That is the reason
why we use a ~2MiB target chunk size (and not e.g. 64KiB any more, like
attic and early borg did - the management overhead was just too big).
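
To get a feeling for that overhead, here is a rough back-of-the-envelope
sketch in Python. It only counts how many chunks would have to be tracked
for a given amount of unique data. The 1 TiB figure is an arbitrary
example, and real chunk counts will differ because chunk sizes depend on
the content and only average out around the target size.

```python
# Rough estimate only: real chunk sizes vary around the target size.
KiB = 1024
MiB = 1024 * KiB
TiB = 1024 * 1024 * MiB


def approx_chunk_count(data_bytes, target_chunk_bytes):
    """Approximate number of chunks needed for data_bytes of unique data."""
    return data_bytes // target_chunk_bytes


data = 1 * TiB  # hypothetical amount of unique data in the repository
coarse = approx_chunk_count(data, 2 * MiB)   # ~2MiB target chunk size
fine = approx_chunk_count(data, 64 * KiB)    # ~64KiB target chunk size

print("chunks at ~2MiB :", coarse)
print("chunks at ~64KiB:", fine)
print("factor          :", fine // coarse)
```

Every one of these chunks needs an entry in the index and in the chunks
cache, so the finer granularity multiplies the bookkeeping accordingly.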

> I mean "n" different files (data files, netcdf, C, fortran, program
> files, pictures, etc...) surely have many identical block sequences, no?

No. Likely, the only widespread common block (in files of different
descent) is the all-zero block.

So, for small files (<512KiB), just assume that non-identical files
won't dedup at all.

For large files (>>2 MiB), assume there will be some dedup if they are
of common descent at least. E.g. same virtual machine disk file in
different states / ages.
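
A toy illustration of these rules of thumb, as a Python sketch. This is
NOT borg's chunker: it uses a simple rolling sum instead of borg's rolling
hash, and tiny made-up sizes (min_size, mask_bits and window are
hypothetical values chosen so the example runs quickly). It only shows the
principle: chunk boundaries depend on content, small files end up as a
single chunk, and only data of common descent shares chunks.

```python
import hashlib
import random


def chunkify(data, min_size=64, mask_bits=8, window=16):
    """Toy content-defined chunker (not borg's algorithm).

    Cuts after a byte where the sum of the last `window` bytes has its
    low `mask_bits` bits all zero, but never before `min_size` bytes.
    """
    chunks, start, rolling = [], 0, 0
    mask = (1 << mask_bits) - 1
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        if i - start + 1 >= min_size and (rolling & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # remaining tail, may be short
    return chunks


def chunk_ids(data):
    """Set of chunk ids (SHA-256 of each chunk) for some data."""
    return {hashlib.sha256(c).hexdigest() for c in chunkify(data)}


rng = random.Random(42)  # fixed seed, so runs are repeatable


def random_bytes(n):
    return bytes(rng.getrandbits(8) for _ in range(n))


# "Large" file in two states: same descent, a few bytes inserted.
image_v1 = random_bytes(20000)
image_v2 = image_v1[:10000] + b"changed!" + image_v1[10000:]

# Unrelated "large" file and two small files (below min_size).
image_other = random_bytes(20000)
small_a = random_bytes(50)
small_b = random_bytes(50)

shared_descent = chunk_ids(image_v1) & chunk_ids(image_v2)
shared_unrelated = chunk_ids(image_v1) & chunk_ids(image_other)
shared_small = chunk_ids(small_a) & chunk_ids(small_b)

print("common descent  :", len(shared_descent), "shared chunks")
print("unrelated files :", len(shared_unrelated), "shared chunks")
print("small files     :", len(shared_small), "shared chunks")
```

Random bytes stand in for real file content here, so the sketch says
nothing about how much dedup you will see on your own data.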

> Ah, that's a thing I didn't know. Where can I see and configure these
> chunker params? RTFM?

Yes, see --chunker-params (an option of borg create).

But just keep the default unless you have quite specific knowledge
that you need something different.
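
For reading such a parameter string, a small Python sketch may help. It
assumes the value has the form
CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE and that
19,23,21,4095 is the default; both are assumptions here, so please check
the docs of your borg version. describe_chunker_params is a hypothetical
helper for illustration, not part of borg.

```python
def describe_chunker_params(params):
    """Decode 'MIN_EXP,MAX_EXP,MASK_BITS,WINDOW' into byte sizes.

    Assumption: the first three values are exponents of 2, the last one
    is the rolling hash window size in bytes.
    """
    min_exp, max_exp, mask_bits, window = (int(p) for p in params.split(","))
    return {
        "min_chunk_bytes": 2 ** min_exp,
        "max_chunk_bytes": 2 ** max_exp,
        "target_chunk_bytes": 2 ** mask_bits,  # statistical average
        "hash_window_bytes": window,
    }


# Assumed default, matching the ~2MiB / 512KiB figures mentioned above.
default = describe_chunker_params("19,23,21,4095")
for name, value in default.items():
    print(f"{name:20} {value:>10}")
```

With these values, a file below the minimum chunk size always ends up as
exactly one chunk, which is why such files only dedup when they are
identical.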


-- 

GPG ID: 9F88FB52FAF7B393
GPG FP: 6D5B EF9A DD20 7580 5747 B70F 9F88 FB52 FAF7 B393


