[borgbackup] Chunker params for very large files

Alex Gorbachev ag at iss-integration.com
Sun Aug 23 22:26:01 EDT 2015


Hi Thomas,

On Fri, Aug 21, 2015 at 7:48 AM, Thomas Waldmann <tw at waldmann-edv.de> wrote:

> If you have enough space and you care more about good speed and low
> management overhead (but not so much about deduplicating with very
> fine-grained blocks), use a higher value for HASH_MASK_BITS, like 20
> or 21, so it creates larger chunks on statistical average. It sounds
> like this matches your case.
>
> If you care about very fine-grained deduplication, maybe don't have
> that much data, and can live with the management overhead, use a
> small chunk size (small HASH_MASK_BITS, like the default 16).
>
>> An existing recommendation of 19,23,21,4095 for huge files from
>> https://borgbackup.github.io/borgbackup/usage.html appears to
>> translate into:
>>
>> minimum chunk of 512 KiB
>> maximum chunk of 8 MiB
>> average (target) chunk of 2 MiB
>>
>> In a 100 GiB file we are looking at about 51,200 chunks.
>
> You need to take the total amount of your data (~2 TB) and compute
> the chunk count (~1,000,000). Then use the resource formula from the
> docs and compute the sizes of the index files (and the RAM needs).
>
> In your case this looks quite reasonable; you could also use 1 MiB
> chunks, but it is better not to use 64 KiB chunks.
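
To double-check that calculation for my ~2 TB of data, this is how I
read the numbers. The per-entry byte counts below are only my guess at
the constants behind the resource formula in the docs, so please
correct me if they are off:

    # Rough estimate for chunker params 19,23,21,4095 on ~2 TiB of data.
    # The 40/44 bytes per index entry are assumptions, not from the docs.
    CHUNK_MIN_EXP, CHUNK_MAX_EXP, HASH_MASK_BITS = 19, 23, 21

    min_chunk = 2 ** CHUNK_MIN_EXP         # 512 KiB minimum chunk size
    max_chunk = 2 ** CHUNK_MAX_EXP         # 8 MiB maximum chunk size
    avg_chunk = 2 ** HASH_MASK_BITS        # ~2 MiB statistical average

    total_data = 2 * 2 ** 40               # ~2 TiB of source data
    chunk_count = total_data // avg_chunk  # 2**20, i.e. ~1 million chunks

    repo_index = chunk_count * 40          # assumed bytes per index entry
    chunks_cache = chunk_count * 44        # assumed bytes per cache entry

    print("chunks:       %d" % chunk_count)
    print("repo index:   ~%d MiB" % (repo_index // 2 ** 20))
    print("chunks cache: ~%d MiB" % (chunks_cache // 2 ** 20))

If that is roughly right, I end up with about a million chunks and
indexes in the tens of MiB, which should be no problem on this machine.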

Thank you for the clarification.  Is the HASH_WINDOW_SIZE tunable in
any way or useful to change?
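
In the meantime I am planning to invoke it roughly like this (the
repository path and archive name below are just placeholders, and I am
assuming --chunker-params is the right option name from the usage
page):

    borg create --chunker-params 19,23,21,4095 \
        /mnt/backup/repo::bigfiles-2015-08-23 /data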

Best regards,
Alex

>
>> beneficial to raise these further?  The machine I have doing this has
>> plenty of RAM (32 GB) and 8 CPU cores at 2.3 GHz, so RAM/compute is
>> not a problem.
>
> Right. But if your index is rather big, it will need to copy around
> a lot of data (for transactions, and for resyncing the cache in case
> you back up multiple machines to the same repo).
>
>
> Cheers, Thomas
>
> ----
>
> GPG Fingerprint: 6D5B EF9A DD20 7580 5747  B70F 9F88 FB52 FAF7 B393
> Encrypted E-Mail is preferred / Verschluesselte E-Mail wird bevorzugt.


