[Borgbackup] Compression and De-duplication

Mon Aug 14 16:14:14 EDT 2017

> I'm interested to know at what point compression occurs within the
> backup process - does borg perform a compress on a file, then split into
> chunks to check for de-duplication, or does it split a file into chunks
> for deduplication, then compress those chunks when saving them to disk?

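borg create: read-from-file and chunk, id-hash, compress, encrypt,
add-auth, store-to-repo

borg extract: fetch-from-repo, check-auth, decrypt, decompress,
write-to-file

So the file is split into chunks first. Each chunk is id-hashed for
deduplication and only then compressed.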
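
A minimal sketch of that order in Python. This is not borg's actual
code. Assumptions for illustration: fixed-size chunks (borg uses a
content-defined chunker), plain SHA-256 as a stand-in for the id-hash,
zlib as the compressor, and a dict as the repo. The encrypt, add-auth,
check-auth and decrypt steps are left out.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks, only to keep the sketch short


def create(data: bytes, repo: dict) -> list:
    """Chunk, id-hash, compress, store. Returns the list of chunk ids."""
    ids = []
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        # The id is computed over the uncompressed chunk, so the dedup
        # decision does not depend on the compressor at all.
        chunk_id = hashlib.sha256(chunk).digest()
        if chunk_id not in repo:
            # Only chunks not seen before are compressed and stored.
            repo[chunk_id] = zlib.compress(chunk)
        ids.append(chunk_id)
    return ids


def extract(ids: list, repo: dict) -> bytes:
    """Fetch each chunk by id, decompress, and reassemble the file."""
    return b"".join(zlib.decompress(repo[chunk_id]) for chunk_id in ids)


repo = {}
data = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"A" * CHUNK_SIZE
ids = create(data, repo)
print(len(ids), "chunks referenced,", len(repo), "chunks stored")
```

The first and third chunk are identical, so the repo stores that chunk
only once even though it is compressed before being written.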

> The reason I ask is that I use hyperconverged storage at work (i.e.
> storage within our VMWare environment is de-duplicated real-time) - and
> one of the things pointed out by the vendor is that de-duplication is
> severely affected by compression - i.e. if you compress a file, then try
> to de-dup, the de-duplication can't locate similar blocks.

Yes, that's true for stream compression, like tar.gz.

Not so much for "rsyncable" compression.
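
To illustrate the stream compression case, here is a rough sketch,
assuming a hypothetical block-level dedup with 1024-byte blocks and
zlib as the stream compressor. The sample data and helper names are
made up for the example. Two files differ only in their first block;
the sketch counts how many distinct blocks they share before and after
compressing each file as one stream.

```python
import hashlib
import zlib

BLOCK = 1024  # hypothetical block size of a block-level deduplicating store
VOCAB = [b"alpha", b"bravo", b"charlie", b"delta",
         b"echo", b"foxtrot", b"golf", b"hotel"]


def sample_text(n_words: int) -> bytes:
    """Deterministic, compressible, non-periodic filler text."""
    words = []
    counter = 0
    while len(words) < n_words:
        for byte in hashlib.sha256(str(counter).encode()).digest():
            words.append(VOCAB[byte % len(VOCAB)])
        counter += 1
    return b" ".join(words[:n_words])


def blocks(data: bytes) -> set:
    """Distinct fixed-size blocks, as a block-level dedup would see them."""
    return {data[i:i + BLOCK] for i in range(0, len(data), BLOCK)}


original = sample_text(20000)
# Same length as the original, but the first block has different content.
modified = b"\0" * BLOCK + original[BLOCK:]

shared_raw = len(blocks(original) & blocks(modified))
shared_compressed = len(
    blocks(zlib.compress(original)) & blocks(zlib.compress(modified))
)

print("blocks shared, uncompressed:     ", shared_raw)
print("blocks shared, stream-compressed:", shared_compressed)
```

Uncompressed, every block after the first is still shared. After
stream compression the early difference shifts and changes the rest of
the output, so the two streams should have few or no blocks in common.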

Not sure what the problem with your storage and borg is, though.

If your source files are on that storage, it won't matter for borg.

If your repo is on that storage, your hyperconverged-dedup won't work
due to encryption. But it doesn't need to work because a borg repo
internally is already deduped (by borg).

So, no problem.

-- 

GPG ID: 9F88FB52FAF7B393
GPG FP: 6D5B EF9A DD20 7580 5747 B70F 9F88 FB52 FAF7 B393