[Borgbackup] Isn't locking broken, because stale lock removal doesn't comply with the locking protocol?

Thomas Portmann thomas at portmann.org
Thu Jan 9 12:17:28 EST 2020


Correction: in the shell snippet quoted below, it must read "$tempdir" instead of "$tempfile".

On 9 January 2020 17:42:52 CET, Thomas Portmann <thomas at portmann.org> wrote:
>Hi,
>
>I'm currently writing a shell script for locking a borg repo with a
>(selectable) shared or exclusive lock while not touching the repo.
>Borg's with-lock command locks only exclusively and changes the repo
>after the target command has finished and before the exclusive lock is
>removed (for reasons I don't know yet). This is not what I need.
>
>So, in order to follow the protocol and to not damage my repos, I
>wanted to learn how exactly locking of a borg repo is accomplished.
>
>Based on what I learned and observed, I have arrived at the preliminary
>assumption that borg locking has been broken since the introduction of
>stale lock removal.
>
>The reason is as simple as already stated in the subject: The procedure
>of killing a stale exclusive lock violates the locking protocol as
>described in
>
>https://borgbackup.readthedocs.io/en/stable/internals/data-structures.html#lock-files
>
>"If the process can create the lock.exclusive directory for a resource,
>it has the lock for it. If creation fails (because the directory has
>already been created by some other process), lock acquisition fails."
>
>and
>
>https://borgbackup.readthedocs.io/en/stable/usage/general.html#file-systems
>
>"mkdir(2) should be atomic, since it is used for locking."
>
>I used inotify and strace to observe how stale exclusive locks are
>removed by borg. The behaviour of version 1.1.7 and 1.1.10 seems to be
>the same and is as follows:
>
>1. Try to create directory lock.exclusive.
>2. If that fails because the directory is already present:
>2.a (Look at the process's own lock indicator file, for whatever
>reason; in this situation, it doesn't exist.)
>2.b Remove any stale lock indicator.
>2.c Remove directory lock.exclusive.
># At this point, the actual lock acquisition happens:
>3. Again, try to create directory lock.exclusive.
># If it was successful now...
>4. Create the process's own lock indicator file.
>... (work)
>5. Remove it.
>6. Remove directory lock.exclusive.
>
>The violation, judged strictly by the ownership criterion (having
>created the directory successfully), is that a process which does not
>own the lock may remove it, while the owning process cannot safely
>detect this removal.
>
>Let's assume that two borg processes A and B run on the same repo in
>parallel, for example, like this:
>
>A.1/A.3 => lock.exclusive was created and is still empty,
>B.1, B.2..B.3 => lock.exclusive has been removed and created again...
>A.4, B.4 => BANG!! At least at this point, both A and B are thinking
>they own the lock.
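
The interleaving above can be replayed deterministically as a minimal sketch. It runs in a scratch directory rather than a real borg repository, performs the steps of A and B one after another instead of in two real processes, and assumes (per the sequence reported above) that B treats the still-empty lock.exclusive as stale. The indicator file names hostA.1-0 and hostB.2-0 are hypothetical.

```shell
#!/bin/sh
# Replay of the A/B interleaving in a scratch directory (not a borg repo).
repo=$(mktemp -d)
lock="$repo/lock.exclusive"

# A.1/A.3: A creates the directory and now believes it owns the lock.
mkdir "$lock" && a_owns=yes

# B.1: B's mkdir fails because the directory already exists.
mkdir "$lock" 2>/dev/null || b_first_try=failed

# B.2: A has not written its indicator yet, so B (as assumed here)
# treats the lock as stale and removes the empty directory.
rmdir "$lock"

# B.3: B creates the directory again and now also believes it owns the lock.
mkdir "$lock" && b_owns=yes

# A.4, B.4: both write their (hypothetical) indicator files.
touch "$lock/hostA.1-0"
touch "$lock/hostB.2-0"

# Count the indicator files that ended up inside the one lock directory.
owners=$(ls "$lock" | wc -l | tr -d ' ')
echo "processes believing they hold the lock: $owners"
rm -rf "$repo"
```

Nothing in A's view changes between its mkdir and its touch, which is why A has no safe way to notice that the directory it created was replaced.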
>
>My questions:
>1.  Was any measure implemented to safely prevent this situation?
>2.  If so, which one? Is there a secret protocol extension?
>
>If not, what about making the locking protocol safe? For example like
>this:
>
>AFAIK, on most reasonable OSes / local filesystems, not only mkdir(2)
>is atomic, but also rename(2). So instead of successful creation of the
>lock.exclusive directory being the criterion, one could define
>successful renaming of a randomly named temporary directory already
>prepared with the host/process identifier to lock.exclusive being the
>criterion. This way, there is no time gap between lock.exclusive coming
>to existence and creation of the identifier, where any other process
>could intervene. In a POSIX shell on a local repo, the following code
>would do the essence of this job:
>
>tempdir=$(mktemp -d -p "$BORG_REPO")
>touch "$tempfile/$BORG_HOST_ID.$$-0"
>if mv -T "$tempfile" "$BORG_REPO/lock.exclusive"
>then
># I have the lock, so maintain lock.roster, remove lock.exclusive in
># case of shared locking, and do my work.
>else
># remove stale exclusive lock, if any, and try again or tidy up the
># temp dir.
>fi
>...
>
>This works because mv -T calls rename(2), which succeeds if the source is
>a directory and the destination is an empty directory or doesn't exist.
>It would also be compatible with the current protocol. It would be safe
>when running concurrently with another process with this protocol, and
>would have the same current problem when running concurrently with a
>process which follows the current protocol.
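
As a rough sketch of the rename-based acquisition quoted above, with "$tempdir" used consistently: the following assumes GNU coreutils (mktemp -p, mv -T) and a local filesystem, uses a scratch directory in place of $BORG_REPO, and leaves out stale-lock handling and lock.roster entirely. The function name try_lock, the host id "examplehost", and the numeric ids 101 and 202 are made up for illustration.

```shell
#!/bin/sh
# Scratch directory standing in for $BORG_REPO (assumption for this sketch).
repo=$(mktemp -d)
host_id="examplehost"   # hypothetical stand-in for $BORG_HOST_ID

# try_lock ID: attempt to take the exclusive lock on behalf of process ID.
try_lock() {
    # Prepare a private directory that already contains the identifier.
    tempdir=$(mktemp -d -p "$repo") || return 1
    touch "$tempdir/$host_id.$1-0"
    # rename(2) fails if lock.exclusive exists and is not empty, so a
    # lock that already carries an identifier cannot be replaced.
    if mv -T "$tempdir" "$repo/lock.exclusive" 2>/dev/null
    then
        return 0
    else
        rm -rf "$tempdir"   # tidy up the unused temp dir
        return 1
    fi
}

try_lock 101 && first=acquired || first=refused
try_lock 202 && second=acquired || second=refused
echo "first: $first, second: $second"

# The identifier inside lock.exclusive shows who holds the lock.
holder=$(ls "$repo/lock.exclusive")
rm -rf "$repo"
```

Because the identifier is in place before the directory becomes visible as lock.exclusive, there is no moment at which the lock exists but is empty.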
>
>Cheers
>Thomas

-- 
This message was sent from my Android mobile phone with K-9 Mail.