[Borgbackup] Isn't locking broken, because stale lock removal doesn't comply with the locking protocol?

Thomas Portmann thomas at portmann.org
Thu Jan 9 11:42:52 EST 2020


Hi,

I'm currently writing a shell script for locking a borg repo with a (selectable) shared or exclusive lock while not touching the repo. Borg's with-lock command locks only exclusively and changes the repo after the target command has finished and before the exclusive lock is removed (for reasons I don't know yet). This is not what I need.

So, in order to follow the protocol and to not damage my repos, I wanted to learn how exactly locking of a borg repo is accomplished.

Based on what I learned and observed, my preliminary conclusion is that borg locking has been broken since the introduction of stale lock removal.

The reason is as simple as already stated in the subject: The procedure of killing a stale exclusive lock violates the locking protocol as described in

https://borgbackup.readthedocs.io/en/stable/internals/data-structures.html#lock-files

"If the process can create the lock.exclusive directory for a resource, it has the lock for it. If creation fails (because the directory has already been created by some other process), lock acquisition fails."

and

https://borgbackup.readthedocs.io/en/stable/usage/general.html#file-systems

"mkdir(2) should be atomic, since it is used for locking."

I used inotify and strace to observe how stale exclusive locks are removed by borg. The behaviour of versions 1.1.7 and 1.1.10 seems to be the same and is as follows:

1. Try to create directory lock.exclusive.
2. If that fails, because it's already present, 
2.a (Look at the process's own lock indicator file...for whatever reason...in this situation, it doesn't exist.)
2.b Remove any stale lock indicator.
2.c Remove directory lock.exclusive.
# At this point, the actual lock acquisition happens:
3. Again, try to create directory lock.exclusive.
# If it was successful now...
4. Create the process's own lock indicator file.
... (work)
5. Remove it.
6. Remove directory lock.exclusive.
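
The observed sequence can be sketched in shell roughly as follows. This is only an illustration of the steps listed above, not borg's actual code: the function names, the indicator file name and the is_stale stub (which treats every foreign indicator as stale) are assumptions made for the example.

```shell
#!/bin/sh
# Sketch of the observed sequence; NOT borg's real implementation.
repo=$(mktemp -d)              # stand-in for a repository directory
lock="$repo/lock.exclusive"
me="host.$$-0"                 # hypothetical indicator file name

# Assumption for the sketch: every foreign indicator counts as stale.
is_stale() { return 0; }

acquire_observed() {
    if ! mkdir "$lock" 2>/dev/null; then            # step 1
        for f in "$lock"/*; do                      # step 2.b
            if [ -e "$f" ] && is_stale "$f"; then
                rm -f "$f"
            fi
        done
        rmdir "$lock" 2>/dev/null || :              # step 2.c
        mkdir "$lock" 2>/dev/null || return 1       # step 3
    fi
    : > "$lock/$me"                                 # step 4
}

release_observed() {
    rm -f "$lock/$me"                               # step 5
    rmdir "$lock"                                   # step 6
}
```

Note that when lock.exclusive exists but is still empty, steps 2.b and 2.c find nothing to check and simply remove the directory.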

Measured strictly against the ownership criterion (having successfully created the directory), the violation is that a process which does not own the lock may remove it, while the owning process has no safe way to detect the removal.

Let's assume that two borg processes A and B run on the same repo in parallel, for example, like this:

A.1/A.3 => lock.exclusive was created and is still empty,
B.1, B.2..B.3 => lock.exclusive has been removed and created again...
A.4, B.4 => BANG!! At least at this point, both A and B are thinking they own the lock.
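
The interleaving above can be replayed deterministically in a scratch directory. This is a minimal sketch that runs the steps of A and B one after another in a single shell instead of two real borg processes; the host names and PIDs in the indicator file names are made up.

```shell
#!/bin/sh
# Replay of the A/B interleaving with plain mkdir/rmdir; no borg involved.
repo=$(mktemp -d)
lock="$repo/lock.exclusive"

mkdir "$lock" && a_owns=yes              # A.1: succeeds, directory is still empty

if mkdir "$lock" 2>/dev/null; then       # B.1: fails, the directory exists
    b_owns=yes
else
    rmdir "$lock"                        # B.2.c: removes A's empty directory
    mkdir "$lock" && b_owns=yes          # B.3: succeeds
fi

: > "$lock/hostA.1001-0"                 # A.4 (hypothetical indicator name)
: > "$lock/hostB.1002-0"                 # B.4 (hypothetical indicator name)

set -- "$lock"/*                         # count the indicator files
holders=$#
echo "a_owns=$a_owns b_owns=$b_owns holders=$holders"
```

Both a_owns and b_owns end up set, and lock.exclusive contains two indicator files.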

My questions:
1.  Was any measure implemented to safely prevent this situation?
2.  If so, which one? Is there a secret protocol extension?

If not, what about making the locking protocol safe? For example like this:

AFAIK, on most reasonable OSes / local filesystems, not only mkdir(2) but also rename(2) is atomic. So instead of successful creation of the lock.exclusive directory being the criterion, one could define the criterion as successfully renaming a randomly named temporary directory, already prepared with the host/process identifier, to lock.exclusive. This way, there is no time gap between lock.exclusive coming into existence and the creation of the identifier in which another process could intervene. In a shell with GNU coreutils (mktemp -p and mv -T are GNU options) on a local repo, the following code would do the essence of this job:

tempdir=$(mktemp -d -p "$BORG_REPO")
touch "$tempdir/$BORG_HOST_ID.$$-0"
if mv -T "$tempdir" "$BORG_REPO/lock.exclusive"
then
      # I have the lock, so maintain lock.roster, remove lock.exclusive in case of shared locking, and do my work.
      :
else
      # remove stale exclusive lock, if any, and try again or tidy up the temp dir.
      :
fi
...

This works because mv -T calls rename(2), which succeeds if the source is a directory and the destination either doesn't exist or is an empty directory. The scheme would also be compatible with the current protocol: it would be safe when running concurrently with another process that follows the new protocol, and it would have the same problem as today when running concurrently with a process that follows the current protocol.
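
The three cases can be tried out in a scratch directory. The following is a minimal sketch assuming GNU mktemp and mv on a local filesystem; try_lock and the indicator names are hypothetical helpers for the example, not part of borg.

```shell
#!/bin/sh
# Rename-based acquisition tried against a scratch "repo" (GNU mktemp/mv assumed).
repo=$(mktemp -d)
lock="$repo/lock.exclusive"

# Hypothetical helper: prepare a temp dir with the indicator, then rename it.
try_lock() {
    tmp=$(mktemp -d -p "$repo") || return 1
    : > "$tmp/$1"
    if mv -T "$tmp" "$lock" 2>/dev/null; then
        return 0                  # rename succeeded: lock is held
    fi
    rm -rf "$tmp"                 # rename failed: tidy up the temp dir
    return 1
}

# Destination does not exist yet.
if try_lock hostA.1001-0; then first=won; else first=lost; fi

# Destination exists and is non-empty (held by A).
if try_lock hostB.1002-0; then second=won; else second=lost; fi

# Destination exists but is empty, as left by a process following the
# current protocol between its mkdir and the creation of its indicator.
rm -rf "$lock"
mkdir "$lock"
if try_lock hostC.1003-0; then third=won; else third=lost; fi

echo "first=$first second=$second third=$third"
```

The third case is the remaining weakness mentioned above: an empty lock.exclusive created via mkdir by a process following the current protocol is silently replaced by the rename.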

Cheers
Thomas




