[Borgbackup] Storage, CPU, RAM comparisons

Dmitry Astapov dastapov at gmail.com
Tue May 5 18:26:41 EDT 2020


On Tue, May 5, 2020, 22:24 MRob <mrobti at insiberia.net> wrote:

> Thanks Dmitry for your time,
>
> >> >> Maybe borg improved on attic, but that's not the point. Borg, attic,
> >> >> and duplicacy are based on deduplication yet use massively more
> >> >> storage space than duplicity (and rdiff-backup?). I don't understand
> >> >> why file-based delta is more storage-efficient than deduplication,
> >> >> which can consolidate chunks from all files in the repo. I expected
> >> >> the opposite storage use ratio in that comparison.
> >>
> >> Yet I still want to understand whether it is true that deduplication
> >> reduces the disk space requirement. Isn't that the purpose of
> >> deduplication? Even if the compression choice was not fairly evaluated
> >> in that comparison, why doesn't deduplication plus (worse) compression
> >> come closer to duplicity?
> >
> > Oh, but it is close. If you use zstd or zlib for compression then
> > (using files from the benchmark we are discussing) the first borg
> > backup will be 179 MB, which is roughly what duplicity has.
>
> Thank you. But this seems to point out that deduplication alone does not
> reduce the storage size of a set of files much - that there are very few
> duplicate blocks detected between files. In a small test I found only 1%
> shared chunks after the first backup. Deduplication does find a large
> number of shared chunks between archives of the same files, as we should
> expect (in my test, unique chunks do not grow very much, so the shared
> chunks % stays close to 100/number of backups in the repo).
>
> I did expect there would be shared chunks between different files too,
> which is why I was expecting deduplication to have a larger storage
> advantage. As I see now, deduplication looks similar to other forms of
> delta techniques in terms of storage requirements.
>

That of course depends on your files. I found that for VM images/snapshots,
multiple copies of photos with different retouching applied, large text
files (logs), etc., there is a good deal of sharing.
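
To make that concrete, here is a minimal content-defined chunking sketch in
Python - a toy stand-in for borg's rolling buzhash chunker, with made-up
window/mask parameters - showing why a large file and a lightly modified
copy of it end up sharing almost all of their chunks:

import hashlib
import random

# Toy content-defined chunker. This is NOT borg's real chunker (borg uses a
# rolling buzhash with tunable parameters); window and mask are arbitrary.
def cut_chunks(data, window=16, mask=0x0FFF):
    """Cut wherever a hash of the trailing `window` bytes matches `mask`.
    Boundaries depend on content, not byte offsets, so they re-synchronise
    after an insertion or deletion."""
    start = 0
    for i in range(window, len(data)):
        h = int.from_bytes(hashlib.sha1(data[i - window:i]).digest()[:4], "big")
        if h & mask == 0:
            yield data[start:i]
            start = i
    yield data[start:]

def chunk_ids(data):
    return {hashlib.sha256(c).hexdigest() for c in cut_chunks(data)}

random.seed(0)
original = random.randbytes(128 * 1024)            # stand-in for a "VM image"
modified = original[:50_000] + b"small edit" + original[50_000:]

a, b = chunk_ids(original), chunk_ids(modified)
print(f"shared chunks: {len(a & b) / len(a | b):.0%}")   # nearly all reused

Between genuinely different files the overlap is, as you measured, usually
small; the big wins come from successive versions of the same data
(snapshots, retouched copies, growing logs) and from repeated backups of the
same tree.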



> Yet I'm surprised by, and do not understand, the part of that study where
> the deduplication-based software's storage grows much faster than the
> rdiff/duplicity-based software's. Duplicacy and Borg sometimes grew 1% to
> 30%+, but Duplicity never grew more than 9%, and sometimes less than 1%.
>

Assuming we are still talking about that github project, you are looking at
an archive of less than 1 GB, and it is only because of this small data
size that the metadata for 12 backups can occupy a sizeable portion of the
total size.

Thinking in percentages is misleading; those ratios will not hold for large
archives.
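
A back-of-the-envelope calculation makes the point. Taking ~10 MB of
per-archive metadata as an assumed constant (the upper end of the figure I
gave earlier for a many-small-files tree like the Linux kernel) and the 12
backups from that benchmark:

# Assumed numbers, purely illustrative: ~10 MB of metadata per archive.
per_archive_mb = 10
archives = 12
overhead_mb = per_archive_mb * archives   # 120 MB total

for data_gb in (1, 100):
    share = overhead_mb / (data_gb * 1024)
    print(f"{data_gb:>3} GB of data: {overhead_mb} MB of metadata = {share:.1%}")
# -> roughly 11.7% of a 1 GB repo, but only ~0.1% of a 100 GB one.

The same fixed per-archive cost that looks like double-digit growth on a
sub-1 GB test set disappears into the noise on a realistically sized
repository.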


> If you think of the question in relative terms like this, can you talk
> about why deduplication storage balloons a lot more than the
> rdiff/duplicity solutions?


Well, it does not, not for me.


> You wrote that Borg's overhead for many small files, like the Linux
> kernel, may take ~5-10 MB; is this per backup archive?


Yes, it would be.

> Is this part of the reason?


Yes, I suppose.

> Is that the cost of the many advantages of Borg?

Is this really a cost? I suppose it could be viewed as one only if you back
up megabytes, not hundreds of gigabytes.



>
> >> Compared to a hardlink setup like rsnapshot, where a changed file
> >> causes an entire new copy to be kept, I expect the storage reduction
> >> would be massive because common blocks are kept only once (is that
> >> correct?)
> >
> > Yes, this is correct - each unique block is stored at most once.
>
> Do you think the conclusion also true, massive storage reduction
> (compare to hardlink technique) because of it?
>


Sorry, I failed to parse this.

>

