[Borgbackup] (no subject)

Mason Schmitt mason at ftlcomputing.com
Mon Jul 26 14:10:53 EDT 2021


Hello,

OS - CentOS 7.9.2009
Borg - 1.1.16 (from the EPEL repo)
Filesystem - XFS on LVM
VM images - raw and qcow2 images on the XFS filesystem
VM image sizes - OS images are between 15GB and 100GB.  Data images can be
upwards of 2TB.
Backup process - Freeze VM filesystems, create LVM snapshots, mount
snapshots, run Borg against the mount point, unmount, delete snapshots
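
For illustration only, the backup process above could look roughly like the
sketch below. The actual scripts are not shown in this message, so every name
here is a placeholder: the VM name, volume group, logical volume, snapshot
size, and repo path are hypothetical, and the use of `virsh domfsfreeze` for
the freeze step is an assumption. With DRY_RUN=1 (the default in the sketch)
the commands are only printed.

```shell
#!/bin/sh
# Sketch of a freeze -> snapshot -> mount -> borg -> cleanup cycle.
# All names and sizes are hypothetical placeholders, not the real setup.

: "${DRY_RUN:=1}"   # 1 = print commands instead of running them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

backup_cycle() {
    vm=$1 vg=$2 lv=$3 mnt=$4 repo=$5
    snap="${lv}-snap"

    # Assumption: the guest is frozen through the QEMU guest agent.
    run virsh domfsfreeze "$vm" || return 1
    run lvcreate --snapshot --size 10G --name "$snap" "/dev/$vg/$lv"
    rc=$?
    # Thaw right away, whether or not the snapshot was created.
    run virsh domfsthaw "$vm"
    [ "$rc" -eq 0 ] || return 1

    # nouuid: XFS will not mount a filesystem whose UUID is already
    # mounted, which is the case for a snapshot of a mounted origin.
    run mount -o ro,nouuid "/dev/$vg/$snap" "$mnt" || {
        run lvremove -f "/dev/$vg/$snap"
        return 1
    }

    run borg create --stats --compression zstd \
        "$repo::{now:%Y-%m-%d}-system" "$mnt"
    rc=$?

    # Always clean up, then report Borg's exit status.
    run umount "$mnt"
    run lvremove -f "/dev/$vg/$snap"
    return "$rc"
}
```

Example: `backup_cycle vm1 vg0 root /mnt/snapbak /mnt/borg/borg-backups/backup.borg`
prints the seven commands in order without touching the system.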


I have Borg running on over a dozen systems, all at different locations.  I
use it to back up VM images to locally attached USB drives.  Overall, Borg
has been very reliable.

However, I do have one site where I've been experiencing backup failures.
Each time it has happened, I have taken increasingly drastic measures to
get the backups working again.  I'll share what I've done to troubleshoot
the issue in the past and provide more details about the most recent
failure, which started last week.


May 3 - 1st occurrence
-----------------------------------
Saw this in the log:
<lots of python error messages>
borg.helpers.IntegrityError: Data integrity error: Segment entry checksum
mismatch [segment 11589, offset 267452375]

Ran:
borg check -p -v --repair /mnt/borg/borg-backups/backup.borg/

The backups seemed to be ok for about a week, but then when the weekly
`borg check` was run, it reported tons of errors.

I had seen a single badblock report for the USB drive, so I replaced the
USB backup drive and initialized a new borg repo and things seemed to be ok
for two weeks.

As an aside - I ran badblocks against the old USB drive and it didn't find
any issues with the drive.
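
A related check, sketched here as an illustration and not something that was
run as part of the troubleshooting above: if badblocks was used in its
default read-only mode, it only confirms that sectors can be read, not that
written data comes back unchanged. The hypothetical helper below writes
random data to a directory on the drive, reads it back, and compares SHA-256
checksums. The function name and the DROP_CACHES switch are made up for this
sketch, and it assumes GNU dd and sha256sum.

```shell
#!/bin/sh
# Hypothetical helper: write random data to DIR, read it back, and
# compare checksums. Returns 0 if they match, non-zero otherwise.
roundtrip_check() {
    dir=$1 size_mb=${2:-64}
    src=$(mktemp) || return 2
    dst="$dir/roundtrip.$$"

    # Random source data, checksummed before it goes to the drive.
    dd if=/dev/urandom of="$src" bs=1M count="$size_mb" 2>/dev/null
    want=$(sha256sum < "$src" | cut -d' ' -f1)

    cp "$src" "$dst" && sync

    # Optional (root only): drop the page cache so the read-back comes
    # from the device instead of RAM. Without this, the comparison may
    # be served from cache and is much weaker.
    if [ "${DROP_CACHES:-0}" = 1 ]; then
        { echo 3 > /proc/sys/vm/drop_caches; } 2>/dev/null || true
    fi

    got=$(sha256sum < "$dst" | cut -d' ' -f1)

    rm -f "$src" "$dst"
    [ "$want" = "$got" ]
}
```

Example: `DROP_CACHES=1 roundtrip_check /mnt/borg 1024 || echo "read-back mismatch"`.
A pass does not prove the hardware is good; a mismatch would point at the
drive, the USB path, or memory rather than at Borg.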


June 14 - 2nd occurrence
----------------------------------------------
Backups failed and I again saw tons of errors in the log, following a run
of `borg check`.

Decided to take the nuclear option:
- Deleted /root/.cache/borg
- Deleted /root/.config/borg
- Wiped the USB backup drive and re-initialized the repo

The backups again started to work properly and the weekly `borg check` also
seemed to be happy.


July 18 - 3rd occurrence
---------------------------------------------
Backups failed and continued to fail each day with error messages very
similar to those I saw on May 3.  See below for details.


What should I do?
-------------------------------
Given that all of my sites are very similar (very similar hardware, same OS,
same Borg version, same configuration, all backing up VM images) and that
I'm not having issues at any other site, I'm guessing my Borg and LVM
snapshot configuration is ok.  So, does anyone have insight into what might
be going on at this site and what steps I could take to troubleshoot it?



Sun Jul 18 22:24:26 PDT 2021 Starting backup
borgbackup version 1.1.16
Creating archive at
"/mnt/borg/borg-backups/backup.borg::2021-07-18-fqdn-25793-system"

.... several lines omitted .....


M /mnt/snapbak/var/lib/libvirt/images/vm_slow_data/fs1-DATA.qcow2
Data integrity error: Segment entry checksum mismatch [segment 2438, offset 9906907]
Traceback (most recent call last):
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 4690, in main
    exit_code = archiver.run(args)
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 4622, in run
    return set_ec(func(args))
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 177, in wrapper
    return method(self, args, repository=repository, **kwargs)
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 595, in do_create
    create_inner(archive, cache)
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 555, in create_inner
    read_special=args.read_special, dry_run=dry_run, st=st)
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 672, in _process
    read_special=read_special, dry_run=dry_run)
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 672, in _process
    read_special=read_special, dry_run=dry_run)
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 672, in _process
    read_special=read_special, dry_run=dry_run)
  [Previous line repeated 3 more times]
  File "/usr/lib64/python3.6/site-packages/borg/archiver.py", line 646, in _process
    status = archive.process_file(path, st, cache)
  File "/usr/lib64/python3.6/site-packages/borg/archive.py", line 1091, in process_file
    self.chunk_file(item, cache, self.stats, backup_io_iter(self.chunker.chunkify(fd, fh)))
  File "/usr/lib64/python3.6/site-packages/borg/archive.py", line 1012, in chunk_file
    from_chunk, part_number = self.write_part_file(item, from_chunk, part_number)
  File "/usr/lib64/python3.6/site-packages/borg/archive.py", line 990, in write_part_file
    self.write_checkpoint()
  File "/usr/lib64/python3.6/site-packages/borg/archive.py", line 483, in write_checkpoint
    self.save(self.checkpoint_name)
  File "/usr/lib64/python3.6/site-packages/borg/archive.py", line 530, in save
    self.repository.commit()
  File "/usr/lib64/python3.6/site-packages/borg/repository.py", line 475, in commit
    self.compact_segments()
  File "/usr/lib64/python3.6/site-packages/borg/repository.py", line 756, in compact_segments
    for tag, key, offset, data in self.io.iter_objects(segment, include_data=True):
  File "/usr/lib64/python3.6/site-packages/borg/repository.py", line 1437, in iter_objects
    read_data=read_data)
  File "/usr/lib64/python3.6/site-packages/borg/repository.py", line 1533, in _read
    segment, offset))
borg.helpers.IntegrityError: Data integrity error: Segment entry checksum mismatch [segment 2438, offset 9906907]

Platform: Linux fqdn 3.10.0-1160.31.1.el7.x86_64 #1 SMP Thu Jun 10 13:32:12 UTC 2021 x86_64
Linux: CentOS Linux 7.9.2009 Core
Borg: 1.1.16  Python: CPython 3.6.8 msgpack: 0.5.6
PID: 25826  CWD: /root
sys.argv: ['/bin/borg', 'create', '--verbose', '--stats', '--list',
'--filter', 'AME', '--show-rc', '--show-version', '--compression',
'zstd', '--exclude-caches', '--exclude', '/mnt/snapbak/root/.caches',
'--exclude', '/mnt/snapbak/home/\\*/.caches', '--exclude',
'/mnt/snapbak/var/cache/\\*', '--exclude', '/mnt/snapbak/var/tmp/\\*',
'--exclude', '/mnt/snapbak/tmp/\\*',
'/mnt/borg/borg-backups/backup.borg::2021-07-18-fqdn-25793-system',
'/mnt/snapbak']
SSH_ORIGINAL_COMMAND: None

terminating with error status, rc 2
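
To compare failures across runs, it can help to pull the segment number and
offset out of each log. The sketch below is a hypothetical helper, not part
of Borg. It reads log text on stdin, joins lines first so that an error
message hard-wrapped by a mail client or terminal still matches, and prints
each distinct segment/offset pair with how often it appears. It assumes a
grep that supports `-o`.

```shell
#!/bin/sh
# Hypothetical helper: summarize "Segment entry checksum mismatch"
# errors found in Borg log text read from stdin.
segment_errors() {
    # Join all lines so a message wrapped across lines still matches.
    tr '\n' ' ' |
        grep -o 'checksum mismatch \[segment [0-9]*, *offset [0-9]*]' |
        sed 's/.*segment \([0-9]*\), *offset \([0-9]*\).*/\1 \2/' |
        sort | uniq -c |
        awk '{ printf "segment=%s offset=%s count=%s\n", $2, $3, $1 }'
}
```

Example: `segment_errors < backup.log` (the log file name is a placeholder).
A segment/offset pair that repeats within one repo suggests a single damaged
segment on disk; different segments on each freshly initialized repo would
be more consistent with corruption happening while data is written.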




-- 

Mason