[Borgbackup] Fw: pve-zsync + borgbackup "changed size" discrepancy

Thomas Waldmann tw at waldmann-edv.de
Mon Mar 4 11:06:05 EST 2019


> shouldn't we need some "backup efficiency advisor" which should be able to give a better hint on "how to get the most out of borg" ?

feel free to code one. :)

just be aware that not everybody has the goal of "take as little repo
space as possible".

there are also:
- "do not make my machine run out of memory"
- "be as fast as possible"
- "be able to process a huge amount of data"

> what about a tool which analyses the borg repo and giving a hint for more space efficient usage ?

i guess if you want to determine good chunker params you would need to
backup the source data using different chunker params and compare the
results.

which is a rather expensive operation if you want to try a lot of
different combinations (and maybe your input data is also not small).

also, the chunker is seeded with a random value, so the results might
not be 100% reproducible.
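a scaled-down sketch of such a comparison, for illustration only. it does not use borg at all: `toy_chunks`, `new_bytes`, `mask_bits` and `seed` are hypothetical names, the rolling sum is a toy stand-in for borg's real chunker, and the data is random bytes with a few single-byte changes. it only shows the shape of the experiment: chunk both versions with several settings and count how many bytes land in chunks that are not stored yet.

```python
import hashlib
import random


def toy_chunks(data, mask_bits, seed=0, min_size=64, window=16):
    """Content-defined chunking with a toy rolling sum (NOT borg's chunker)."""
    # Hypothetical seeded byte table: a different seed moves the cut points.
    rng = random.Random(seed)
    table = [rng.getrandbits(32) for _ in range(256)]
    mask = (1 << mask_bits) - 1
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += table[byte]
        if i - start >= window:
            # Drop the byte that just left the sliding window.
            rolling -= table[data[i - window]]
        if i - start + 1 >= min_size and (rolling & mask) == 0:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def new_bytes(old, new, mask_bits, seed=0):
    """Bytes of `new` that fall into chunks not already known from `old`."""
    known = {hashlib.sha256(c).digest() for c in toy_chunks(old, mask_bits, seed)}
    return sum(len(c) for c in toy_chunks(new, mask_bits, seed)
               if hashlib.sha256(c).digest() not in known)


# Assumed test data: 200 kB of random bytes with 5 single-byte changes.
rng = random.Random(42)
old = bytes(rng.getrandbits(8) for _ in range(200_000))
changed = bytearray(old)
for pos in range(10_000, 200_000, 40_000):
    changed[pos] ^= 0xFF
new = bytes(changed)

for bits in (8, 10, 12):  # more mask bits means larger chunks on average
    count = len(toy_chunks(new, bits))
    print(f"mask_bits={bits}: {count} chunks, {new_bytes(old, new, bits)} new bytes")
```

running the loop again with a different `seed` moves the cut points, so the numbers can shift a bit from run to run. with real data you would have to repeat the whole backup per parameter set, which is where the cost comes from.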

> at least i would have found it useful to have such information at hands (in the manpage for example) what tuning knob we need to look at to make backups of large-files-with-tiny-changes much more space efficient...

if you can improve the docs, do a pull request.

> mind that even with "--chunker-params 10,23,16,4095" borg backup diff grows up to 7.98MB where xdelta3 reports a diff of only 158218 . so this is still not optimal, but a value i can live with....

compared to other tools, borg's goal is not finding the minimal diff
between two files, but rather:
- speed
- logically stable chunk cutting points (good if data gets inserted /
removed)
- produce a manageable number of chunks (with default chunker params)
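
to illustrate what "stable cutting points" buys you, here is a minimal sketch that compares fixed-size cutting with content-defined cutting after a few bytes get inserted. everything in it is an assumption made for the demo: `fixed_chunks`, `content_chunks` and `reused_bytes` are hypothetical helpers, and hashing the preceding 16 bytes with blake2b is a slow toy replacement for a rolling hash, not what borg does internally.

```python
import hashlib
import random


def fixed_chunks(data, size=256):
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_chunks(data, mask=0xFF, window=16):
    """Cut wherever a hash of the preceding `window` bytes has its low bits zero.

    Toy stand-in for a rolling hash; not borg's chunker.
    """
    chunks, start = [], 0
    for end in range(window, len(data)):
        digest = hashlib.blake2b(data[end - window:end], digest_size=4).digest()
        if end - start >= window and (int.from_bytes(digest, "big") & mask) == 0:
            chunks.append(data[start:end])
            start = end
    chunks.append(data[start:])
    return chunks


def reused_bytes(old_chunks, new_chunks):
    """Bytes of the new version that sit in chunks already present in the old one."""
    known = set(old_chunks)
    return sum(len(c) for c in new_chunks if c in known)


# Assumed test data: 50 kB of random bytes, then 8 bytes inserted at offset 5000.
rng = random.Random(1)
old = bytes(rng.getrandbits(8) for _ in range(50_000))
new = old[:5_000] + b"INSERTED" + old[5_000:]

for name, chunker in (("fixed-size", fixed_chunks), ("content-defined", content_chunks)):
    kept = reused_bytes(chunker(old), chunker(new))
    print(f"{name}: {kept} of {len(new)} bytes reuse existing chunks")
```

with fixed-size cutting, everything after the insertion is shifted and no longer matches. with content-defined cutting, the cut points follow the data, so the chunks after the insertion line up again.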

borg tries to cut chunks of roughly some given target size (default 2MB).

so if your file is 100MB, that is ~ 50 chunks.

you can totally spoil the dedup by changing 1 byte in each of the 50 chunks.

if you produce 500 chunks and have the same 50 changes, your dedup still
works in 90% of chunks. but you'll need 10x as much memory to manage the
chunks.
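
the arithmetic above, as a small sketch. assumptions: 1 MB is taken as 10^6 bytes, every change lands in a different chunk (worst case), the changes do not move any cut points, and memory for managing chunks scales with the number of chunks. `worst_case_dedup` is a hypothetical helper, not part of borg.

```python
def worst_case_dedup(file_size, chunk_size, changes):
    """Return (chunk count, fraction of chunks still deduplicated).

    Worst case: each change hits a different chunk and cut points stay put.
    """
    chunks = max(1, file_size // chunk_size)
    spoiled = min(changes, chunks)
    return chunks, 1 - spoiled / chunks


MB = 1000 * 1000
baseline, _ = worst_case_dedup(100 * MB, 2 * MB, changes=50)

for chunk_size in (2 * MB, 200_000):
    chunks, kept = worst_case_dedup(100 * MB, chunk_size, changes=50)
    print(f"{chunks} chunks: {kept:.0%} still deduplicated, "
          f"{chunks // baseline}x the chunks to manage")
```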

a specialised tool (i don't know xdelta3, but i assume this might be
one) can produce a very small diff by comparing the 2 files - but that
is not how borg works.


-- 


GPG ID: 9F88FB52FAF7B393
GPG FP: 6D5B EF9A DD20 7580 5747 B70F 9F88 FB52 FAF7 B393

