[Borgbackup] Storage, CPU, RAM comparisons

MRob mrobti at insiberia.net
Tue May 5 17:24:19 EDT 2020


Thanks, Dmitry, for your time.

>> >> Maybe borg improved on attic, but that's not the point. Borg, attic,
>> >> and duplicacy, which are based on deduplication, use massively more
>> >> storage space than duplicity (and rdiff-backup?). I don't understand
>> >> why file-based delta is more storage efficient than deduplication,
>> >> which can consolidate chunks from all files in the repo. I expected
>> >> the opposite storage use ratio.
>> 
>> Yet I still want to understand whether it is true that deduplication
>> reduces the disk space requirement. Isn't that the purpose of
>> deduplication? Even if the compression choice was not fairly evaluated
>> in that comparison, why doesn't deduplication plus (worse) compression
>> come closer to duplicity?
> 
> Oh, but it is close. If you use zstd or zlib for compression then
> (using files from the benchmark we are discussing) the first borg
> backup will be 179 MB, which is roughly what duplicity has.

Thank you. But this seems to show that deduplication alone does not help 
reduce the storage size of a single set of files - very few duplicate 
blocks are detected between different files. In a small test I found only 
about 1% shared chunks after the first backup. Deduplication does find a 
large number of shared chunks between archives containing the same files, 
as we should expect (in my test the number of unique chunks grows very 
little, so the shared chunk % stays close to 100 / number of backups in 
the repo).
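
Here is a tiny Python sketch of what I mean (a toy model only, with 
assumptions on my side: it uses fixed-size chunks and random file 
contents, while borg really uses content-defined chunking on real data):

    import hashlib
    import os

    CHUNK = 64 * 1024  # arbitrary chunk size for this sketch

    def chunk_ids(data):
        """Return the set of chunk hashes for a blob of data."""
        return {hashlib.sha256(data[i:i + CHUNK]).hexdigest()
                for i in range(0, len(data), CHUNK)}

    # Two unrelated files: random contents, so no chunks in common.
    file_a = os.urandom(2 * 1024 * 1024)
    file_b = os.urandom(2 * 1024 * 1024)
    backup1 = chunk_ids(file_a) | chunk_ids(file_b)

    # Second backup: file_a unchanged, file_b changed a little at the end.
    file_b2 = file_b[:-1024] + os.urandom(1024)
    backup2 = chunk_ids(file_a) | chunk_ids(file_b2)

    print("chunks shared between the two files:  ",
          len(chunk_ids(file_a) & chunk_ids(file_b)))
    print("chunks shared between backup 1 and 2: ", len(backup1 & backup2))
    print("unique chunks stored for both backups:", len(backup1 | backup2))

Between the two unrelated files nothing is shared, but between the two 
backups almost every chunk is shared - which matches what I saw.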

I did expect there would be shared chunks between different files too, 
which is why I was expecting deduplication to have a larger storage 
advantage. As I see it now, deduplication looks similar to other forms of 
delta techniques in terms of storage requirements.

Yet I'm still surprised by, and do not understand, the part of that study 
where the deduplicating tools' storage grows much faster than the 
rdiff/duplicity-based tools'. Duplicacy and Borg sometimes grew 1% to 
30%+ per backup, while Duplicity never grew more than 9% and sometimes 
less than 1%.

Thinking about the question in relative terms like this, can you explain 
why deduplication storage balloons so much more than the rdiff/duplicity 
solutions?

You wrote that Borg's overhead for many small files, like the Linux 
kernel tree, may take ~5-10 MB; is that per backup archive? Is this part 
of the reason? Is that the cost of Borg's many advantages?
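
(Just to illustrate what I am asking: if that ~5-10 MB of metadata is 
paid per archive, then e.g. 20 archives would already add roughly 
100-200 MB on top of the actual data chunks, which could be part of the 
faster growth.)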


>> Compared to a hardlink setup like rsnapshot, where a changed file
>> causes an entire new copy to be kept, I expect the storage reduction
>> to be massive because common blocks are kept only once (is that
>> correct?)
> 
> Yes, this is correct - each unique block is stored at most once.

Do you think the conclusion is also true - that there is a massive 
storage reduction (compared to the hardlink technique) because of it?
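
(For example, if a small part of a 1 GB file changes in place, rsnapshot 
has to keep a whole second 1 GB copy, while chunked deduplication should 
only need to store the few changed chunks plus some metadata - if I 
understand it correctly.)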

