[borgbackup] Borg speed tuning on large files

Thomas Waldmann tw at waldmann-edv.de
Sat Aug 29 09:23:20 EDT 2015


> Tool             Parameters        Data size (apparent)  Repo size   Hrs  Ratio  C Rat  C MB/s
> gzip             c3                2308843696            560376600   22   24%    4.1    7
> Attic First Run  default           2251760621            531964928   48   24%    4.2    3
> Attic Next Run   default           2308843696            234398336   32   10%    9.9    2
> Borg First Run   C0,19,23,21,4095  2330579192            2354907008  26   101%   1      25
> Borg Next Run    C0,19,23,21,4095  2270686256            1341393408  18   59%    1.7    21
> Borg First Run   C3,19,23,21,4095  2270686256            568351360   33   25%    4      5
> Borg Next Run    C3,19,23,21,4095  2268472600            302165632   23   13%    7.5    4
> Borg Next Run    C1,19,23,21,4095  2247244128            422037120   24   19%    5.3    5

Nice to see confirmation that we are quite a bit faster than Attic. :)

Hmm, should the last line read "Borg First Run ... C1"?

In general, to evaluate speed, it might be easier to only compare "first
runs", because then a well-defined amount of data (== all the input
data) gets processed every time.

In "next run", the amount of data actually needing processing might vary
widely, depending on how much change there is between first and next run.

BTW, note for other readers: the "Parameters" column can't be given that
way to borg, it needs to be (e.g.):
borg create -C1 --chunker-params 19,23,21,4095 repo::archive data

Or in 0.25:
borg create -C zlib,1 --chunker-params ....

> Here is a picture in case the text does not come through well:

Yeah, that looked better. :)

BTW, what you currently have in the C MB/s column is how many MB/s of
compressed data actually get written to storage (and if that were the
limiting factor, it would be your target storage, not borg).

Maybe more interesting would be how much uncompressed data it can
process per second.
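
For example, a rough back-of-the-envelope sketch for the C3 first run -
it assumes the size columns in your table are KiB, which would be
consistent with your C MB/s numbers:

# uncompressed throughput of the C3 first run, sizes assumed to be KiB
awk 'BEGIN { printf "%.1f MiB/s\n", 2270686256 / 1024 / (33 * 3600) }'
# -> ~18.7 MiB/s processed, vs. the ~5 MB/s of compressed output written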

> Oddly, compression setting of 1 took longer than C3.

Either there is a mistake in your table, or your CPU is so fast that the
higher compression level saves more time (by reducing I/O) than the
extra compression work costs.

With 0.25.0 you could try:
- lz4 = superfast, but low compression
- lzma = slow/expensive, but high compression
- none = no compression, no overhead (this is not zlib,0 any more)
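
E.g. roughly like this (repo::archive and data are just placeholders,
chunker params taken from your table - adjust to your setup):

borg create -C lz4 --chunker-params 19,23,21,4095 repo::archive data
borg create -C lzma,6 --chunker-params 19,23,21,4095 repo::archive data
borg create -C none --chunker-params 19,23,21,4095 repo::archive data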

> C0 shows the actual dedup capability of this data.

Doesn't seem to find significant amounts of "internal" duplication
within a "first run". Historical dedup seems to work and help, though.

Does that match your expectations considering the contents of your files?

In case you measure again, keep an eye on CPU load.

>  My business goal here is to get
> the data in within a day, so about 12 hours or so.  

If you can partition your data set somehow into N pieces and use N
separate repos, you could save some time by running N borgs in parallel
(assuming your I/O isn't a bottleneck then).

N ~= core count of your CPU
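
Rough sketch of what I mean (the repo and data paths below are just
placeholders for however you split things up):

# one borg per data partition, each into its own repo, all in parallel
for i in 1 2 3 4; do
    borg create -C1 --chunker-params 19,23,21,4095 /backup/repo$i::archive /data/part$i &
done
wait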

At some time in the future, borg might be able to do a similar thing via
internal multithreading, but that is not ready for production yet.

There are also some other optimizations possible in the code (using
different hashes, different crypto modes, ...) - we'll try making it
much faster.

-- 


GPG ID: FAF7B393
GPG FP: 6D5B EF9A DD20 7580 5747 B70F 9F88 FB52 FAF7 B393



