[Borgbackup] Baffling behavior on low cpu system with borg backup

tmhikaru at gmail.com tmhikaru at gmail.com
Thu Jun 30 00:34:11 EDT 2016


On Wed, Jun 29, 2016 at 02:32:30PM +0200, Thomas Waldmann wrote:
> > Now, the interesting thing is, this was not a permanent fix! Despite that it
> > was able to sync twice in a row, a few days later when I tried to run a full
> > backup using my script which is run the same way my test was, it got
> > completely jammed
> 
> If your chunks count in the repo grew significantly due to that, it
> maybe was using much more than 96MB then.
> 
> There is also a files cache eating some memory, see the docs for the
> formula.
> 
> The memory is needed on the machine running "borg create" (not: "borg
> serve", there it only needs to hold the repo index in memory).
Chunks count did not increase significantly; in fact, the cache size did not
change on the server side at all, not even by one byte.  Regardless, you
could be right about RAM becoming a problem in the future.
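
Out of curiosity I sanity-checked roughly how much RAM the chunks index
might want.  This is just back-of-the-envelope shell arithmetic; the
bytes-per-entry and load-factor figures below are my assumptions, not the
exact constants from the borg docs, and the chunk count is hypothetical:

```shell
#!/bin/sh
# Rough RAM estimate for the chunks index.  ASSUMPTIONS: ~44 bytes per
# hashtable entry and ~1.33x load-factor overhead; see the borg docs'
# "Indexes / Caches memory usage" section for the real formula.
chunk_count=2000000        # hypothetical repo with 2M chunks
bytes_per_entry=44         # assumed entry size (key + value)
overhead_num=4             # hashtables keep spare capacity;
overhead_den=3             # assume roughly 4/3 overhead
est=$(( chunk_count * bytes_per_entry * overhead_num / overhead_den ))
echo "estimated chunks index RAM: $(( est / 1024 / 1024 )) MiB"
```

With those assumed numbers a 2M-chunk repo would already want on the order
of a hundred MiB just for the chunks index, which is uncomfortably close to
the RPI's limits.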

For now, I did just think of something I hadn't before.  In every occurrence
of it getting stuck recently that I can recall, it has been AFTER a
different machine than the server modified the repo.  The third machine is
using an AMD Phenom II, an Intel-compatible 64-bit processor.  The Linux
distro it has installed does not have Python 3, so I have been using the
statically linked binary from the download page for borg 1.0.3.  It seems to
work fine...  but I think I should see what happens if I have the RPI try to
sync BEFORE that computer does its thing, and AFTER the server has.  It
might be nothing, but it is an easy test to try - if memory serves, since my
backup script has the server run LAST, the third machine had been the last
one to modify the repo when I was doing my tests and having them fail.

All I should need to do to test this theory is to change the order that the
script executes in.
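
For what it's worth, the reordered script would look something like the
sketch below.  The repo path and host names are placeholders for my actual
setup, and RUN=echo makes it a dry run (set RUN= to really execute):

```shell
#!/bin/sh
# Hypothetical sketch of the reordered backup script: the RPI now syncs
# BEFORE the third (Phenom II) machine, and the server still runs LAST.
# REPO, hosts and mountpoints are placeholders, not my real paths.
REPO=backupserver:/srv/borg/repo
RUN=echo   # dry run; set RUN= to actually run borg

$RUN borg create "$REPO::rpi-{now}"    /mnt/rpi-root      # RPI first now
$RUN borg create "$REPO::phenom-{now}" /mnt/phenom-root   # third machine next
$RUN borg create "$REPO::server-{now}" /                  # server still last
```

If the theory is right, the hang should follow the third machine's slot in
the ordering rather than the RPI's.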


> I assume you mean sshfs as a source of backup data.
Yes, I mean that I am having the server mount the remote machine's (the
RPI's) root over sshfs and then backing it up that way.  It is
significantly faster to do it this way, which is a bit weird and annoying,
since the docs for sshfs claim it is very CPU- and I/O-intensive on the
machine being mounted.
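
Concretely, the server-side pull looks roughly like this sketch.  The host
name, mountpoint, repo path and sshfs options are placeholders/suggestions
for my actual setup, and RUN=echo keeps it a dry run:

```shell
#!/bin/sh
# Hypothetical sketch: mount the RPI's root read-only over sshfs on the
# server, back it up with a local borg create, then unmount.  All names
# here are placeholders for my real configuration.
RPI=pi@rpi
MNT=/mnt/rpi-root
RUN=echo   # dry run; set RUN= to really execute

$RUN mkdir -p "$MNT"
$RUN sshfs -o ro,reconnect "$RPI:/" "$MNT"
$RUN borg create --one-file-system "/srv/borg/repo::rpi-{now}" "$MNT"
$RUN fusermount -u "$MNT"
```

This way the chunking and hashing happen on the server's CPU, and the RPI
only has to serve file data over sftp.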

> > I am still utterly clueless as to why this is happening. Ideas?
> 
> Besides memory issues, it could be also an instance of the suspected
> "hashtable performance breakdown" (see issue tracker) - this might
> depend on the specific values stored into the hashtable.

Hmm, yes, from an uneducated perspective that does look suspiciously similar
to what I'm seeing.  If this IS the problem, it would certainly match the
behavior I get.  Are there any tunables for tweaking this yet?  I'd love to
make it try to allocate more RAM just to see what it would do.  Even if it
OOMed itself, I'd probably learn something.

Thanks,
Tim McGrath


More information about the Borgbackup mailing list