[Borgbackup] Prune and backup failure

Mon Oct 21 18:47:19 EDT 2019

On Tue, 22 Oct 2019 00:02:08 +0200
Thomas Waldmann <tw at waldmann-edv.de> wrote:

> > I had an issue with BB; I don't know if it comes from the server or
> > the client but it is not the first time, last time was a few months
> > ago and the only solution I found to cure that was to erase this
> > machine's repo and make another backup.
> 
> You could also try borg check [--repair].

IIRC it failed the first time. I was in a hurry (no time to ask the ML
for help), so I zapped the repo and launched a fresh backup, which
succeeded.

But now I have time and I just launched a repair.

> > 	FileNotFoundError: [Errno 2] No such file or directory:
> > 	'/BORG/data/0/510'
> > (/BORG is where the repo is NFS mounted)
> > 
> > Indeed, this file doesn't exist in the repo.
> 
> OK, so there can be multiple causes:

That is what I feared.

> - assuming the repo index is valid and the file should be there: you
> have lost (at least) 1 file somehow.
> 
> - the repo index is somehow invalid and pointing to a wrong segment
> file.

Noooo!? ;-)

> - your fs (NFS) is not working correctly / reliably

Nope, all other machines also use NFS and they never ever had a glitch.
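
To look at the first two causes by hand, one could compare the missing
segment number with what is actually on disk. This is only a rough
sketch: it assumes the borg 1.x layout data/<segment // segments_per_dir>/<segment>
with segments_per_dir = 1000 (the value should be readable in the
repo's "config" file), and the helper names are made up for the
illustration. Gaps in the numbering can be perfectly normal after
compaction, so the result is a hint, not a diagnosis.

```python
import os
import tempfile

# Assumption: borg 1.x default; check "segments_per_dir" in the repo's config file.
SEGMENTS_PER_DIR = 1000


def segment_path(repo, segment, segments_per_dir=SEGMENTS_PER_DIR):
    """Return the path where a segment file is expected to live."""
    subdir = str(segment // segments_per_dir)
    return os.path.join(repo, "data", subdir, str(segment))


def list_segments(repo):
    """Return the sorted segment numbers found under <repo>/data/*/."""
    found = []
    data_dir = os.path.join(repo, "data")
    for sub in os.listdir(data_dir):
        sub_path = os.path.join(data_dir, sub)
        if not (sub.isdigit() and os.path.isdir(sub_path)):
            continue
        for name in os.listdir(sub_path):
            if name.isdigit():
                found.append(int(name))
    return sorted(found)


def classify_missing(repo, segment):
    """Say where a segment number sits relative to the files on disk."""
    present = list_segments(repo)
    if segment in present:
        return "present"
    if not present or segment > present[-1]:
        # The newest writes may never have reached the disk.
        return "beyond the last segment on disk"
    # Either a file was lost or the index points at the wrong segment.
    return "missing inside the range on disk"


if __name__ == "__main__":
    # Demo on a throwaway directory, not on a real repository.
    with tempfile.TemporaryDirectory() as repo:
        os.makedirs(os.path.join(repo, "data", "0"))
        for n in (508, 509):
            open(segment_path(repo, n), "wb").close()
        print(segment_path("/BORG", 510))
        print(classify_missing(repo, 510))
```

On the real repo this would be called with "/BORG" and 510, read-only,
and preferably while no backup is running.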

> > Manifest.load(repository, compatibility)
> 
> BTW, the missing segment seems to have the manifest (directory of
> archives).
> 
> That's one of the last segment files produced in a backup run.
> 
> It's rather bad if that is missing as it has all the pointers to the
> archives, but borg check --repair should be able to rebuild the
> manifest (takes a while though). That would also rebuild the repo
> index.

OK, I just answered 'YES' to the repair prompt and it did not throw
anything at me, so now I'm waiting to see what happens.
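
For the record, the two invocations discussed here (a read-only check
first, then the repair) can be sketched as below. The wrapper function
is hypothetical and only builds the command line; "/BORG" is the repo
location mentioned earlier in this thread, and --repair still asks for
confirmation when the command is actually run.

```python
import shlex


def borg_check_command(repo, repair=False, borg="borg"):
    """Build the argument list for 'borg check', optionally with --repair."""
    cmd = [borg, "check", "--verbose"]
    if repair:
        cmd.append("--repair")
    cmd.append(repo)
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them; repair rewrites the repo.
    print(shlex.join(borg_check_command("/BORG")))
    print(shlex.join(borg_check_command("/BORG", repair=True)))
```

Running the plain check before the repair costs time, but it shows
what is wrong before anything gets modified.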

Once this is settled, I think I'll run a long memory test, just in
case (I've got another machine that blew a RAM module for no good
reason other than its age: 15+ years).

After all, BB is very good with my other machines (and very good,
period).

> > sys.argv: ['/usr/local/bin/borg', 'create', '--verbose',
> > '--exclude-caches', '--show-rc', '--filter', 'AME', '--list',
> > '--stats', '--checkpoint-interval', '600', '--compression',
> > 'auto,zlib,6', '--exclude-from',
> > '/usr/local/sbin/BORG_EXCLUSIONS.list', '::{hostname}-{utcnow}Z',
> > '/'] SSH_ORIGINAL_COMMAND: None
> 
> Unrelated to your problem, but just as a hint:
> 
> checkpoint-interval 600 is rather frequent. this is good for unstable
> connections, but might impact performance if you have big indexes.

At the moment, the machines I back up use very little disk space, and
I sometimes forget to do the maintenance before launching BB. I
shortened the interval precisely for that reason: to make sure BB
wouldn't restart from scratch if I have to kill it and relaunch it
after maintenance.

> zlib is not quite the fastest nor modern.

At home, my tests showed that lz4 compressed less, so since I'm
spending my own money here, I picked the best compromise for my kind
of data.

zstd compressed well, but it took longer (these machines are old,
slow, and mostly single-core).
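
The kind of test I mean can be sketched like this: compress a sample
of typical data with each candidate and note the size ratio and the
time taken. This is not the script I used; the function names are made
up, and it only covers zlib levels because lz4 and zstd are not in
Python's standard library (third-party bindings would have to be
plugged into the candidates table).

```python
import time
import zlib


def measure(name, compress, data):
    """Compress data once and report size ratio and elapsed seconds."""
    start = time.perf_counter()
    packed = compress(data)
    elapsed = time.perf_counter() - start
    return {"name": name, "ratio": len(packed) / len(data), "seconds": elapsed}


def compare(data):
    """Run every candidate compressor over the same sample."""
    candidates = {
        "zlib,1": lambda d: zlib.compress(d, 1),
        "zlib,6": lambda d: zlib.compress(d, 6),
        "zlib,9": lambda d: zlib.compress(d, 9),
    }
    return [measure(name, func, data) for name, func in candidates.items()]


if __name__ == "__main__":
    # Synthetic sample; a real test should read files from the machine itself.
    sample = b"a log line that repeats with small changes\n" * 20000
    for result in compare(sample):
        print("%-7s ratio=%.4f time=%.4fs" % (
            result["name"], result["ratio"], result["seconds"]))
```

The numbers only mean something when measured on the slow machines
themselves and with their real data, which is why I tested at home
rather than trusting generic benchmarks.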

> maybe you rather want to consider the more modern zstd or lz4.

Thanks for your quick answer. I'll keep you posted on the repair
results, but for now, I'm going to bed.

Jean-Yves