[Borgbackup] Determine if a chunk should be added to repository

Marian Beermann public at enkore.de
Tue Nov 19 10:20:25 EST 2019


Hash collisions are always possible, but depending on the size of the
hash and the number of items they can be highly improbable. For a hash
with N possible output values and k inputs, the probability of at least
one collision is approximately k²/(2N), as long as k is much smaller
than sqrt(N). For example, with a 256 bit hash (N = 2**256) and 1e12
chunks, (1e12)**2 / 2**257 is ~4e-54.
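
As a rough sketch, the estimate can be reproduced in a few lines of
Python. The function name collision_probability and its parameters are
made up for this illustration and are not part of Borg; the code only
evaluates the k²/(2N) approximation for a hash with 2**hash_bits outputs.

```python
from fractions import Fraction


def collision_probability(k: int, hash_bits: int) -> float:
    """Birthday-bound approximation k^2 / (2N), with N = 2**hash_bits.

    Hypothetical helper for illustration only. The approximation is
    meaningful only while k is much smaller than sqrt(N).
    """
    n = 2 ** hash_bits
    # Use exact integer arithmetic first, convert to float at the end.
    return float(Fraction(k * k, 2 * n))


# 1e12 chunks hashed with a 256 bit hash
p = collision_probability(10**12, 256)
print(f"{p:.1e}")  # 4.3e-54
```

Plugging in a shorter hash, for example hash_bits=64 with the same k,
quickly shows why a 256 bit hash is used: the bound stops being small.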

-Marian

On 18.11.19 at 22:24, Joseph Hesse wrote:
> Thank you for your reply.
> Is the unique id for a content-addressed storage object a 256 bit hash?
> Collisions are always possible, so why doesn't Borg handle them? Or am I
> not understanding the idea behind content-addressed storage?
> Thank you, Joe
> 
> 
> On 11/17/19 11:27 AM, Marian Beermann wrote:
>> No. Borg uses content-addressed storage and does not handle hash
>> collisions. That's why Borg uses a 256 bit cryptographic hash
>> (HMAC-SHA2, BLAKE2).
>>
>> -Marian
>>
>> On 17.11.19 at 18:05, Joseph Hesse wrote:
>>> Hi,
>>>
>>> I've been reading all I can find about deduplication. The explanations
>>> say that hashes are used to determine if two chunks are identical or
>>> different. Even though it is very rare, two chunks can have the same
>>> hash and be different. So if chunk1's hash matches the hash of some
>>> chunk in the repository, it seems to me that a byte-by-byte comparison
>>> has to be done before deciding that chunk1 should not be added to the
>>> repository. Is this the case with Borg?
>>>
>>> Thank you,
>>>
>>> Joe
>>>
>>> _______________________________________________
>>> Borgbackup mailing list
>>> Borgbackup at python.org
>>> https://mail.python.org/mailman/listinfo/borgbackup
> 
> 


