[Borgbackup] hundreds of 17-byte files (1744 files out of 1762)

Sitaram Chamarty sitaramc at gmail.com
Tue Aug 31 12:02:55 EDT 2021


On Sun, Aug 29, 2021 at 06:28:22PM +0200, Thomas Waldmann wrote:
> 
> > I'm trying to use `par2` on a borg repository that I am
> > archiving long term; par2 uses Reed-Solomon coding to produce
> > extra files that help recreate the data as long as it's not too
> > damaged.
> > 
> > My borg repo is 3646 MB. It has 1762 files, of which 1744 are
> > 17-byte files with exactly the same content. This seems to
> > completely throw off par2 in terms of performance.
> 
> If a few small files "throw off" par2 in terms of performance, what would be
> the case if you had a really big backup?

Large files work fine -- performance scales much more linearly
with the overall size of the data.  It's some block-size effect
that makes extremely small files a problem.

If I understand correctly, the two other popular tools that use
Reed-Solomon coding have similar issues, though their base speed
may be better; so this appears to be inherent to the algorithm,
not a fault in this particular tool.

(And I don't know if I'd call 1744 files "few".  Not when they
constitute almost 99% of the files in the repo; there are only
18 others.)

> 
> > I've asked the par2 folks also, but I thought it would be worth
> > asking here: why do all those files have exactly the same
> > content, and if so, why are there 1744 of them?
> 
> This is a (harmless) bug in borg 1.1.x, see the issue tracker.

I couldn't find an issue that is exactly about this.  #5679 and
#5315 seem closest -- is it one of those?

> 
> borg 1.2 will have a cleanup action to remove the superfluous commit files.

good to know.

meanwhile it's trivial for me to tar up all the 17-byte files,
then give `par2` only the files that are not 17 bytes long.  I
can always untar them if a repair is needed.
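That workaround can be sketched with standard `find` and `tar`.
This is just an illustration, not my actual commands: the `repo`
directory and the file names/contents below are made up for the
demo, and the `par2` invocation is left commented out since the
redundancy options you'd pick will vary:

```shell
# Set up a throwaway directory standing in for the borg repo
# (hypothetical names; a real repo has its own layout).
set -e
mkdir -p repo
printf 'seventeen bytes!\n' > repo/commit1   # exactly 17 bytes
printf 'seventeen bytes!\n' > repo/commit2   # exactly 17 bytes
head -c 4096 /dev/zero > repo/segment1       # a "real" larger file

# Archive every file that is exactly 17 bytes (-size 17c = exactly
# 17 bytes in GNU find), then drop them from the working set.
find repo -type f -size 17c -print0 | tar -cf seventeen.tar --null -T -
find repo -type f -size 17c -delete

# par2 would now only see the remaining, larger files, e.g.:
# par2 create -r10 repo.par2 $(find repo -type f)
find repo -type f
```

If a repair is ever needed, `tar -xf seventeen.tar` restores the
17-byte files to their original paths.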
