[borgbackup] Borg speed tuning on large files

Alex Gorbachev ag at iss-integration.com
Sun Aug 30 23:27:29 EDT 2015


Hi Thomas,

On Sat, Aug 29, 2015 at 9:23 AM, Thomas Waldmann <tw at waldmann-edv.de> wrote:
>> Tool             Parameters        Data size (apparent)  Repo size   Hrs  Ratio  C Rat  C MB/s
>> gzip             c3                2308843696            560376600   22   24%    4.1    7
>> Attic First Run  default           2251760621            531964928   48   24%    4.2    3
>> Attic Next Run   default           2308843696            234398336   32   10%    9.9    2
>> Borg First Run   C0,19,23,21,4095  2330579192            2354907008  26   101%   1      25
>> Borg Next Run    C0,19,23,21,4095  2270686256            1341393408  18   59%    1.7    21
>> Borg First Run   C3,19,23,21,4095  2270686256            568351360   33   25%    4      5
>> Borg Next Run    C3,19,23,21,4095  2268472600            302165632   23   13%    7.5    4
>> Borg Next Run    C1,19,23,21,4095  2247244128            422037120   24   19%    5.3    5
>
> Nice to see confirmation that we are quite a bit faster than Attic. :)
>
> Hmm, should the last line read "Borg First Run ... C1"?

Yes, I switched the [now obsolete] compression parameter to level 1 for a "next run".

>
> In general, to evaluate the speed, it might be easier to only do "first
> runs", because then a well-defined amount of data (== all the input
> data) always gets processed.

But...in that case gzip beats all :).

>
> In "next run", the amount of data actually needing processing might vary
> widely, depending on how much change there is between first and next run.

Understood, though the point of dedup is to save space on
shared/unchanged data regions.  In my case the data is apparently not
as similar as hoped: at 59% with no compression, only 41% of the data
was found to be "the same", whereas I know that in these databases 10%
of change per day is high.  So maybe I need to go chunk-size hunting.
For others this will likely work more efficiently.
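
If I do go hunting, I would start from something like the line below
(same -C / --chunker-params syntax that Thomas shows further down; the
smaller chunker values are purely illustrative and would trade a bigger
chunk index and more memory for a better chance of matching small
changed regions of the database):

  borg create -C1 --chunker-params 15,19,17,4095 repo::archive data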

> BTW, note for other readers: the "Parameters" column can't be given that
> way to borg, it needs to be (e.g.):
> borg create -C1 --chunker-params 19,23,21,4095 repo::archive data
>
> Or in 0.25:
> borg create -C zlib,1 --chunker-params ....
>
>> Here is a picture in case the text does not come through well:
>
> Yeah, that looked better. :)
>
> BTW, what you currently have in the C MB/s column is how many compressed
> MB/s it actually writes to storage (and if that is a limiting factor, it
> would be your target storage, not borg).

Sorry, I should have commented: C is for "computed", i.e. the size
divided by the elapsed time.  I assume storage is not an issue, as
uncompressed data can be pumped here at 50+ MB/s.
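
For reference, the arithmetic behind that column, and the same
calculation against the uncompressed (apparent) size that Thomas asks
about below, looks like this (assuming the size columns are 1 KiB
blocks, which is what the table numbers are consistent with):

  # gzip row: repo size / elapsed time -> compressed MB/s (matches the ~7 above)
  echo "560376600 / (22 * 3600) / 1024" | bc -l     # ~6.9
  # same row against the apparent data size -> uncompressed MB/s
  echo "2308843696 / (22 * 3600) / 1024" | bc -l    # ~28.5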

>
> Maybe more interesting would be how much uncompressed data it can
> process per second.
>
>> Oddly, compression setting of 1 took longer than C3.
>
> Either there is a mistake in your table, or your CPU is so fast that
> higher compression saves more time by avoiding I/O than it spends on
> the better compression.

That makes sense, CPU on this box is quite powerful.

>
> With 0.25.0 you could try:
> - lz4 = superfast, but low compression
> - lzma = slow/expensive, but high compression
> - none - no compression, no overhead (this is not zlib,0 any more)

Started lz4 trials tonight, will update!
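
For anyone else following along, the trial commands look roughly like
this in 0.25 (modelled on Thomas's zlib example above; the lzma level
is just an illustrative pick):

  borg create -C lz4    --chunker-params 19,23,21,4095 repo::archive data
  borg create -C none   --chunker-params 19,23,21,4095 repo::archive data
  borg create -C lzma,6 --chunker-params 19,23,21,4095 repo::archive data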

>
>> C0 shows the actual dedup capability of this data.
>
> Doesn't seem to find significant amounts of "internal" duplication
> within a "first run". Historical dedup seems to work and help, though.
>
> Does that match your expectations considering the contents of your files?

It's a big mystery; this is a highly esoteric database (think MUMPS :),
but I know the overall change is unlikely to exceed 10% of "business
content" per day.  So I am probably not finding the right chunk size yet.

>
> In case you measure again, keep an eye on CPU load.

I see borg taking 99% of one core and a load average in the 3-4 range,
but other processes are running as well, so this may be a bit muddled;
I will observe again at idle times.
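
Next time I will sample just the borg process, something like the line
below (assumes the sysstat package is installed; the pgrep pattern is
only illustrative):

  pidstat -u -p "$(pgrep -f 'borg create' | head -n1)" 60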

>
>>  My business goal here is to get
>> the data in within a day, so about 12 hours or so.
>
> If you can partition your data set somehow into N pieces and use N
> separate repos, you could save some time by running N borgs in parallel
> (assuming your I/O isn't a bottleneck then).
>
> N ~= core count of your CPU
>
> At some time in the future, borg might be able to do a similar thing
> via internal multithreading, but that is not ready for production yet.

Understood, hard to do and make safe.  Thanks.
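
If I ever manage to partition the database files, a minimal sketch of
that setup could look like this (hypothetical repo and data paths; each
repo would have to be initialized with borg init first):

  # one borg per partition, all running in parallel into separate repos
  for part in part1 part2 part3 part4; do
      borg create -C1 --chunker-params 19,23,21,4095 \
          /backups/repo-$part::"$(date +%Y-%m-%d)" /data/$part &
  done
  wait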

>
> There are also some other optimizations possible in the code (using
> different hashes, different crypto modes, ...) - we'll try making it
> much faster.

Much appreciated; I have a good high-stress, real-life playground to test this in.

Alex

>
> --
>
>
> GPG ID: FAF7B393
> GPG FP: 6D5B EF9A DD20 7580 5747 B70F 9F88 FB52 FAF7 B393
>


