[Borgbackup] Very large deltas when pruning a static backup
Thomas Waldmann
tw at waldmann-edv.de
Wed Jan 5 10:58:20 EST 2022
> best wishes for 2022.
2u2!
> borg create ended with a deduplicated size of 2.65kB
Sounds like borg is deduplicating perfectly:
- it has deduped all the files' contents
- it has deduped even the metadata stream (containing all file metadata,
like filenames, owner/group/mode, etc.)
- the few "new" kB remaining after this are some archive metadata
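You can see these numbers yourself via the --stats flag, which prints the original, compressed and deduplicated sizes per archive (repo path and archive name below are placeholders, adjust to your setup):

```shell
# Hypothetical repo path and archive name pattern; adjust to your setup.
borg create --stats /path/to/repo::'{hostname}-{now}' /data
# "Deduplicated size" shows how much truly new data was stored;
# a few kB for an unchanged source tree is expected.
```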
> borg prune ends with -24.60GB deduplicated size.
Likely because it removes 1 old archive after creating the new one to
stay at the same archive count.
> But in the end I needed to replicate 32.764GB to my offsite mirror location.
That's way more than expected.
Ideas:
- offsite syncing broken somehow (not borg) - can you get a list of
files it actually transfers? what tool do you use? does it compare by
mtime, ctime, or something else?
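If the mirror is driven by rsync, for example, a dry run with itemized changes shows which files would be transferred and why (paths below are placeholders; by default rsync compares size + mtime):

```shell
# Dry run (-n): nothing is transferred, only listed.
# -a preserves metadata, -i itemizes why each file would be sent.
rsync -ain /path/to/borg-repo/ mirror-host:/path/to/mirror/
# Add -c to compare by checksum instead of size + mtime.
```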
- borg's compact_segments() internal code runs after every write
operation (create, delete, ...), shuffling data from old non-compact
segment files into new compact ones.
In your case, borg prune deletes 1 old archive, removing that old
archive's metadata stream from the segment files (punching a hole, making
them non-compact), and borg create writes 1 new archive (putting its
metadata stream into a new segment file). If you don't have too many
files (thus no large metadata streams), that should only touch a few
segment files, though.
It may also be worth changing the max segment file size (default 500MB;
for remote syncing it is better to go a bit smaller, like 25..50MB). But
that is rather about optimizing from 1GB to 0.1GB (not from 32GB to ...).
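Since borg 1.1 the segment size can be changed per repository with the borg config command; note it only affects segments written afterwards, existing ones are not rewritten (repo path is a placeholder):

```shell
# Value is in bytes; 52428800 = 50 MiB.
borg config /path/to/repo max_segment_size 52428800
# Read the current value back:
borg config /path/to/repo max_segment_size
```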
- if that does not help, create full ls -lR listing of the repo before
borg operations and afterwards (directly before the big sync) and
compare them.
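A minimal way to capture and compare those listings (repo path is a placeholder):

```shell
REPO=/path/to/borg-repo            # placeholder, adjust to your setup
ls -lR "$REPO" > listing-before.txt
# ... run borg create / borg prune here ...
ls -lR "$REPO" > listing-after.txt
# Shows which segment files appeared, disappeared, or changed size:
diff -u listing-before.txt listing-after.txt
```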
> My expectation was that since no new data was written to the source
> files, no extra data was added to the repo. That behaviour seems to happen.
> However I also expected that when pruning this repo data would be
> deleted and that I only would be removing data in my offsite location,
> not transferring so much.
Some overhead comes from compact_segments.
Without that, space usage would grow forever (like in append-only mode).
In borg 1.2, there will be a separate borg compact command and
compact_segments() won't be called implicitly any more, so there is
better control about when and how often that happens.
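With 1.2 you can then schedule compaction explicitly, e.g. once right before each offsite sync instead of after every write (repo path is a placeholder):

```shell
# borg >= 1.2 only. --threshold skips segments with less than the given
# percentage of reclaimable space (default: 10).
borg compact --threshold 10 /path/to/repo
```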
> I'm still using borgbackup 1.1.11 and upgrading involves red tape.
New stable releases within the same series (like 1.1.x) usually work
quite well, if the upgrade itself is the concern.