[Borgbackup] Baffling behavior on low cpu system with borg backup

Fri Jul 8 19:20:34 EDT 2016

Thanks a lot for trying to debug this.

I've been looking at the code involved in index merging, and may have
found a bug there that could show up in low-memory situations (i.e., when
memory allocation fails). I'm not saying "that's the bug", but I can't rule
it out. I'll be preparing a patch. Maybe it helps.

Cheers, Marian

On 09.07.2016 00:21, tmhikaru at gmail.com wrote:
> On Thu, Jun 30, 2016 at 09:10:11PM -0700, tm at raspberrypi wrote:
>> On Thu, Jun 30, 2016 at 09:08:06AM +0200, Marian Beermann wrote:
>>> You can try enabling faulthandler. Set (export) the environment variable
>>> PYTHONFAULTHANDLER to something, say, foobar. When it gets stuck, you can
>>> send it SIGABRT and should get a proper stack trace showing where it is stuck.
>> Will do. Oddly enough, I may be on to something here: after changing the
>> order in which machines access the repo, it started working again. Could be
>> yet another fluke, so I'll do quite a few tests before I am satisfied.
>
> After making this change, I got through a full week of fully working full
> backups with nothing going wrong.  Data was being added and pruned with
> every cycle: each started with ~23 archives in the repo, which were pruned
> to ~20 before the remote work would begin on the RPI to sync.  Everything
> was working perfectly and I was having no trouble at all.  Yesterday I
> kicked off the fifth backup and went away for the day, assuming that when I
> came home it would be done and I could write to you about the workaround
> I'd found, even though it doesn't make any sense.  Instead, when I got home
> that night, more than 10 hours later, I found that it had gotten stuck
> almost instantly, *again* while merging chunks into the master index as it
> processed the locally cached archive data.  I killed it, ran break-lock,
> and ran it again the same way, but with debug instead of info output and
> with this Python variable set.  Until now, rerunning it without doing
> something like blowing away the local cache has made the program get stuck
> in exactly the same way every time, so I was hoping to get a useful trace
> showing what it was actually trying to do, rather than continuing to make
> educated and uneducated guesses.
>
> Maddeningly, it worked 100% perfectly and didn't get stuck at all; it
> processed in seconds the very local archive data it had been stuck on for
> ~10 hours.
>
> I give up. I cannot make this program work reliably the way I am trying to
> use it, or even diagnose what the actual problem is, given such hit-or-miss
> behavior.  If I ever have to use xattrs on a low-CPU/RAM system, I may wind
> up doing something like streaming tar data over the network to the server
> running borg.  Hopefully by the time I do need such a thing, sshfs will
> have evolved to support xattrs.  Certainly, borg runs very poorly as a
> client on a low-CPU, low-RAM machine that has to access a remote repo that
> holds a lot of data and is modified by other machines.  I cannot recommend
> emulating my test setup; it just doesn't work reliably.
>
> I was using borg in a way that isn't recommended; I have both read that and
> been told by helpful people here not to use it this way.  I apologize for
> being difficult.  Thank you all for trying to help.
>
> Tim McGrath
>
>
>
> _______________________________________________
> Borgbackup mailing list
> Borgbackup at python.org
> https://mail.python.org/mailman/listinfo/borgbackup
>
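For reference, the faulthandler approach suggested in the quoted thread can be tried on a dummy process before pointing it at a stuck borg run. This is a minimal sketch, assuming Python 3 on Linux or another POSIX system. The helper name dump_stack_of_hung_child and the sleeping child script are hypothetical stand-ins for a stuck borg process, not part of borg. Setting PYTHONFAULTHANDLER to any non-empty value makes Python call faulthandler.enable() at startup, which installs a handler that dumps the tracebacks of all threads to stderr when the process receives SIGABRT.

```python
import os
import signal
import subprocess
import sys

# Hypothetical stand-in for a hung process: it announces that it is running,
# then sleeps forever (borg itself is NOT involved here).
HUNG_CHILD = (
    "import time\n"
    "print('ready', flush=True)\n"
    "while True:\n"
    "    time.sleep(0.1)\n"
)


def dump_stack_of_hung_child():
    """Start a child with PYTHONFAULTHANDLER set, send it SIGABRT,
    and return (returncode, text the fault handler wrote to stderr)."""
    # Any non-empty value enables faulthandler at interpreter startup.
    env = dict(os.environ, PYTHONFAULTHANDLER="foobar")
    child = subprocess.Popen(
        [sys.executable, "-c", HUNG_CHILD],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Wait until the child is executing Python code, so the handler is installed.
    child.stdout.readline()
    # Same effect as `kill -ABRT <pid>` from another shell.
    child.send_signal(signal.SIGABRT)
    _, stderr = child.communicate()
    return child.returncode, stderr.decode(errors="replace")
```

Against a real stuck process, the equivalent would be to start borg with PYTHONFAULTHANDLER exported and then run `kill -ABRT <pid>` from another shell. The trace appears on borg's stderr, and the process is terminated afterwards, since the handler re-raises the signal once the traceback has been written.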