[Borgbackup] hashtable performance issue explained (fix in 1.0.11*)

Thomas Waldmann tw at waldmann-edv.de
Wed Jul 5 09:47:05 EDT 2017


On 07/04/2017 10:26 PM, tmhikaru at gmail.com wrote:
> 	Hmm.  These fixes look very interesting, especially the hash table
> fixes and optimizations.  I honestly can't do it right now, but I think I'll
> try testing borg on my raspberry pi again when I get the chance, and see if
> these changes make a difference in its observed misbehavior.  Certainly the
> problems that these fixes are meant to solve are eerily similar to the
> stalls my rpi would get where it'd just sit there hashing at 100% cpu for
> days.

Yup, likely that was the root cause of the problem you observed. I also
observed it now and then, but in a less extreme way (due to faster hw).

Only counting the used buckets for the fill ratio of the hash table was
a mistake with major performance implications as it did not
rebuild/enlarge the hashtable even if there were no empty buckets left
(but everything was filled up with tombstones).

That's not a problem that leads to malfunction (as in crashing or
computing wrong data), but as only empty buckets terminate the search
for a nonexisting key early(*), the situation escalates rather quickly
and badly when the last empty buckets go away:

99 empty buckets -> has to scan ~1% of the table
9 empty buckets -> has to scan ~10% of the table
1 empty bucket -> has to scan ~50% of the table
0 empty buckets -> has to scan 100% of the table (*)

And once it reached 0 empty, it was working extremely slow until it
added / removed enough data to / from the hashtable to trigger a resize.

When the hashtable gets resized, all data (used buckets) is used to
build a new hashtable from scratch (that has no tombstones initially).

Fix: now it also rebuilds the HT if there are too few empty buckets.


(*) the other (unwanted, extremely slow) case for search termination is
when the code notices that it has scanned the whole table. borg's
hashtables are used to manage all the data/metadata chunks in the
repository, so they are large - megabytes to gigabytes size.

-- 

GPG ID: 9F88FB52FAF7B393
GPG FP: 6D5B EF9A DD20 7580 5747 B70F 9F88 FB52 FAF7 B393



More information about the Borgbackup mailing list