[Borgbackup] hundreds of 17-byte files (1744 files out of 1762)

Sitaram Chamarty sitaramc at gmail.com
Wed Sep 1 10:03:17 EDT 2021


On Wed, Sep 01, 2021 at 03:00:53PM +0200, Thomas Waldmann wrote:
> 
> > > > My borg repo is 3646 MB. It has 1762 files, of which 1744 are
> > > > 17-byte files with exactly the same content. This seems to
> > > > completely throw off par2 in terms of performance.
> > > If a few small files "throw off" par2 in terms of performance, what
> > > would be the case if you had a really big backup?
> > Large files work fine -- performance scales much more linearly
> > with overall size of data. It's some block size thing that
> > causes extremely small files to be a problem.
> > 
> > If I understand correctly, the two other popular tools that use
> > Reed Solomon coding have similar issues though their base speed
> > may be better, so this appears to be inherent to the algorithm,
> > not a fault in this particular tool.
> > 
> > (And I don't know if I'd call 1744 files "few". Not when they
> > constitute almost 99% of the files in the repo; there are only
> > 18 others).
> 
> Well, you could also have 2000 big files in your borg backend - your 3.5GB
> repo isn't very big yet.

I already said large files work fine -- performance is
commensurate with the overall data size, and scales as expected.

Small files are where par2 breaks down.

> So if you have a problem with the small files now, I guess you would have a
> bigger problem if they were bigger.

Not at all.  Or at least not in a way that would be
surprising or unexpected.

> With 500MB (default) large segment files that would already be 1TB, so
> relatively large overall.
> 
> But you have to try whether 500MB is the best segment size for you anyway,
> for some scenarios some other size between 10 and 500MB might be better.
> 
> > > This is a (harmless) bug in borg 1.1.x, see the issue tracker.
> > couldn't find an issue that is exactly about this. #5679 and
> > #5315 seem closest -- is it one of those?
> 
> https://github.com/borgbackup/borg/issues/2850

thanks.

> > meanwhile it's trivial for me to tar up all the 17-byte files,
> > then give `par2` only files that are not 17 bytes long. I can
> > always untar and extract them if a repair is needed.
> > 
> If you remove the commits there might be data loss if borg thinks all inside
> that repo is uncommitted crap.

who said I'm removing them?

Par2 protects against bit flips etc., on disk.  When I mount a
disk some months later, I'd run par2 verify.

Since I did not *create* par2 files for the 1744 17-byte files,
any errors in them won't be detected by the verify.

But there is a tar file that contains all of them -- if that tar
file gets verified (and repaired if needed), I untar it to
overwrite those 17-byte files with clean copies.  The ones on
disk may have been good, or may not; I simply overwrite them.

Then the repo is back to what it was.
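
A minimal Python sketch of the workflow described above, for
illustration only. The function names, the 17-byte size test, and
the tar location are assumptions made for this example, not
anything borg or par2 provides. The par2 command line is only
built, not run; the exact options (for instance a base path for
files in subdirectories) depend on the par2 version, so check
`par2 --help` before using it.

```python
import tarfile
from pathlib import Path

SMALL_SIZE = 17  # size in bytes of the tiny files discussed above


def split_by_size(repo: Path, small_size: int = SMALL_SIZE):
    """Return (small, other): regular files under repo, split by exact size."""
    small, other = [], []
    for path in sorted(repo.rglob("*")):
        if path.is_file():
            (small if path.stat().st_size == small_size else other).append(path)
    return small, other


def tar_small_files(repo: Path, tar_path: Path, small):
    """Store the small files in one tar, with paths relative to the repo."""
    with tarfile.open(tar_path, "w") as tar:
        for path in small:
            tar.add(path, arcname=path.relative_to(repo).as_posix())


def par2_create_command(par2_name: Path, files):
    """Build (but do not run) a 'par2 create' command covering the files."""
    return ["par2", "create", str(par2_name)] + [str(f) for f in files]


def restore_small_files(repo: Path, tar_path: Path):
    """Overwrite the small files in the repo with the copies from the tar.

    Call this only after the tar itself has been verified (and repaired
    if needed), since the on-disk small files are not covered by par2.
    """
    with tarfile.open(tar_path, "r") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            target = repo / member.name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(tar.extractfile(member).read())
```

Typical use would be: call split_by_size() on the repo directory,
write the small files to a tar kept outside the repo, then pass the
remaining files plus that tar to par2_create_command(). After a
later verify/repair, restore_small_files() puts the 17-byte files
back.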



