[Borgbackup] Fw: pve-zsync + borgbackup "changed size" discrepancy

devzero at web.de
Mon Mar 4 20:11:19 EST 2019


thanks for the great explanation!


> Sent: Monday, 04 March 2019 at 17:06
> From: "Thomas Waldmann" <tw at waldmann-edv.de>
> To: borgbackup at python.org
> Subject: Re: [Borgbackup] Fw: pve-zsync + borgbackup "changed size" discrepancy
>
> > shouldn't there be some "backup efficiency advisor" that could give a better hint on "how to get the most out of borg"?
> 
> feel free to code one. :)
> 
> just be aware that not everybody has the goal of "take as little repo
> space as possible".
> 
> there are also:
> - "do not make my machine run out of memory"
> - "be as fast as possible".
> - "be able to process a huge amount of data"
> 
> > what about a tool which analyses the borg repo and gives hints for more space-efficient usage?
> 
> i guess if you want to determine good chunker params, you would need
> to back up the source data using different chunker params and compare
> the results.
> 
> which is a rather expensive operation if you want to try a lot of
> different combinations (and maybe your input data is also not small).
> 
> also, the chunker is seeded with a random value, so the results might
> not be 100% reproducible.
> 
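
fwiw, a rough sketch of that kind of comparison (just an illustration:
it assumes borg 1.x on PATH, SOURCE is a placeholder path, the repos
are throwaway scratch repos, and as noted above the random chunker
seed means the resulting sizes are only approximate):

    # Back up the same source with several chunker params into scratch
    # repos and compare the resulting on-disk repo sizes.
    import os
    import subprocess
    import tempfile

    SOURCE = "/path/to/source"      # placeholder
    CANDIDATES = [
        "19,23,21,4095",            # borg's default
        "10,23,16,4095",            # params mentioned in this thread
    ]

    def repo_size(path):
        """Total on-disk size of a repo directory, in bytes."""
        return sum(
            os.path.getsize(os.path.join(root, name))
            for root, _, files in os.walk(path)
            for name in files
        )

    for params in CANDIDATES:
        repo = tempfile.mkdtemp(prefix="borg-chunker-test-")
        subprocess.run(["borg", "init", "--encryption=none", repo], check=True)
        subprocess.run(
            ["borg", "create", "--chunker-params", params,
             f"{repo}::trial", SOURCE],
            check=True,
        )
        print(params, repo_size(repo), "bytes")
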
> > at least i would have found it useful to have such information at hand (in the manpage, for example) about which tuning knobs to look at to make backups of large-files-with-tiny-changes much more space efficient...
> 
> if you can improve the docs, do a pull request.
> 
> > mind that even with "--chunker-params 10,23,16,4095" the borg backup diff grows to 7.98MB, whereas xdelta3 reports a diff of only 158218 bytes. so this is still not optimal, but a value i can live with...
> 
> compared to other tools, borg's goal is not finding the minimal diff
> between two files, but rather:
> - speed
> - logically stable chunk cutting points (good if data gets inserted /
> removed)
> - produce a manageable number of chunks (with default chunker params)
> 
> borg tries to cut chunks of roughly some given target size (default 2MB).
> 
> so if your file is 100MB, that is ~ 50 chunks.
> 
> you can totally spoil the dedup by changing 1 byte in each of the 50 chunks.
> 
> if you produce 500 chunks and have the same 50 changes, your dedup still
> works in 90% of chunks. but you'll need 10x as much memory to manage the
> chunks.
> 
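
spelling out that arithmetic (illustration only, assuming each of the
50 one-byte changes lands in a distinct chunk):

    CHANGES = 50
    for chunks in (50, 500):
        unchanged = max(chunks - CHANGES, 0)
        print(f"{chunks} chunks: {unchanged / chunks:.0%} still deduplicated")
    # 50 chunks:  0% deduplicated -> every chunk is re-stored
    # 500 chunks: 90% deduplicated -> only 50 chunks re-stored,
    #             but ~10x as many chunks to track in memory
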
> a specialised tool (i don't know xdelta3, but i assume this might be
> one) can produce a very small diff by comparing the 2 files - but that
> is not how borg works.
> 
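
for anyone else following along: per the borg docs, the four numbers in
--chunker-params are CHUNK_MIN_EXP, CHUNK_MAX_EXP, HASH_MASK_BITS and
HASH_WINDOW_SIZE, so the sizes they imply can be read off like this
(decode() below is just a throwaway helper):

    # Decode --chunker-params into the sizes they imply.
    def decode(params):
        min_exp, max_exp, mask_bits, window = (int(x) for x in params.split(","))
        return {
            "min chunk (bytes)": 2 ** min_exp,
            "max chunk (bytes)": 2 ** max_exp,
            "target chunk, approx. (bytes)": 2 ** mask_bits,
            "rolling hash window (bytes)": window,
        }

    print(decode("19,23,21,4095"))  # default: ~2 MiB target chunks
    print(decode("10,23,16,4095"))  # from this thread: ~64 KiB target chunks
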
> 
> -- 
> 
> 
> GPG ID: 9F88FB52FAF7B393
> GPG FP: 6D5B EF9A DD20 7580 5747 B70F 9F88 FB52 FAF7 B393
> _______________________________________________
> Borgbackup mailing list
> Borgbackup at python.org
> https://mail.python.org/mailman/listinfo/borgbackup
>

