[Borgbackup] Very large deltas when pruning a static backup

Thomas Waldmann tw at waldmann-edv.de
Wed Jan 5 10:58:20 EST 2022


> best wishes for 2022.

2u2!

> borg create ended with a deduplicated size of 2.65kB

Sounds like borg is deduplicating perfectly:
- it has deduped all the files' contents
- it has deduped even the metadata stream (containing all file metadata, 
like filenames, owner/group/mode, etc.)
- the few "new" kB remaining after this are some archive metadata
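
If you want to double-check those numbers, borg info prints the
original/compressed/deduplicated sizes (repo path and archive name below
are placeholders):

  # per-archive stats
  borg info /path/to/repo::archivename

  # whole-repo stats
  borg info /path/to/repo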

> borg prune ends with -24.60GB deduplicated size.

Likely because it removes 1 old archive after creating the new one to 
stay at the same archive count.
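
For reference, a prune call like this (the retention options are just an
example) lists what it keeps/removes and how much space that frees:

  borg prune --list --stats --keep-daily 7 --keep-weekly 4 /path/to/repo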

> But in the end I needed to replicate 32.764GB to my offsite mirror location.

That's way more than expected.

Ideas:

- offsite syncing broken somehow (not borg) - can you get a list of the 
files it transfers? What tool do you use? Does it detect changes based 
on mtime, ctime, or something else?
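
If the tool is rsync, for example, a dry run with itemized output shows
which files it would transfer and why (paths are placeholders; by
default rsync compares size and mtime, -c compares checksums instead):

  rsync -av --dry-run --itemize-changes /path/to/repo/ offsite:/path/to/repo/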

- borg's compact_segments() internal code runs after every write 
operation (create, delete, ...), shuffling data from old non-compact 
segment files into new compact ones.

In your case, borg prune deletes 1 old archive, removing that archive's 
metadata stream from the segment files (punching a hole, making them 
non-compact), and 1 new archive is created (writing its metadata stream 
to a new segment file). If you don't have too many files (and thus no 
large metadata streams), that should only touch a few segment files, 
though.
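
As a rough sanity check (assuming a local filesystem repo): the segment
files live below the repository's data/ directory, so you can see their
count and total size directly:

  find /path/to/repo/data -type f | wc -l
  du -sh /path/to/repo/data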

Maybe it is also worth changing the max segment file size (default 
500MB; for remote syncing it is better to go a bit smaller, like 
25..50MB). However, this is rather about optimizing from 1GB down to 
0.1GB (not from 32GB to ...).
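
In 1.1.x that is a per-repository setting. Assuming your version has
the borg config command (added within the 1.1 series), something like
this would do it (the value is in bytes and only affects newly written
segments):

  # show the current value
  borg config /path/to/repo max_segment_size
  # set ~50MB segments instead of the ~500MB default
  borg config /path/to/repo max_segment_size 52428800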

- if that does not help, create a full ls -lR listing of the repo before 
the borg operations and afterwards (directly before the big sync) and 
compare them.
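
Something along these lines (paths are placeholders):

  ls -lR /path/to/repo > repo-before.txt
  # ... borg create / borg prune ...
  ls -lR /path/to/repo > repo-after.txt
  diff -u repo-before.txt repo-after.txt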

> My expectation was that since no new data was written to the source 
> files, no extra data was added to the repo. That behaviour seems to happen.
> However I also expected that when pruning this repo data would be 
> deleted and that I only would be removing data in my offsite location, 
> not transferring so much.

Some overhead comes from compact_segments.

Without that, space usage would grow forever (like in append-only mode).

In borg 1.2, there will be a separate borg compact command and 
compact_segments() won't be called implicitly any more, so there is 
better control over when and how often that happens.
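
With 1.2 that will look roughly like this (--threshold skips segment
files with less than the given percentage of reclaimable space):

  borg prune --keep-daily 7 /path/to/repo   # no implicit compaction any more
  borg compact --threshold 10 /path/to/repo # reclaim space explicitly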

> I'm still using borgbackup 1.1.11 and upgrading involves red tape.

New stable releases within the same series (like 1.1.x) usually work 
quite well, if that is the concern.
