[Borgbackup] Deduplication of tar files - doesn't seem to be giving good performance

Dmitry Astapov dastapov at gmail.com
Thu Apr 21 07:42:38 EDT 2016


If there is a good fuse mounter for tar files, you can achieve better
results mounting them and archiving from there.
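
If no suitable FUSE mounter is at hand, a rough substitute is to unpack the tar file into a scratch directory and archive that directory, so that Borg sees individual files rather than one long stream. The sketch below uses only Python's standard `tarfile` module. The helper names `unpack_tar` and `borg_create_command` and all paths are made up for illustration, and unlike a mount this approach needs temporary disk space for the unpacked copy.

```python
import tarfile
import tempfile
from pathlib import Path


def unpack_tar(tar_path, dest_dir):
    """Extract tar_path into dest_dir and return the regular files found there.

    Hypothetical helper: only use it on tar files you trust, since
    extractall() writes whatever paths the archive contains.
    """
    dest = Path(dest_dir)
    with tarfile.open(tar_path) as tar:
        tar.extractall(dest)
    return sorted(p for p in dest.rglob("*") if p.is_file())


def borg_create_command(repo, archive_name, source_dir):
    """Build (but do not run) a 'borg create REPO::ARCHIVE PATH' command line."""
    return ["borg", "create", f"{repo}::{archive_name}", str(source_dir)]


# Usage sketch (paths are placeholders, nothing is executed here):
#
#   with tempfile.TemporaryDirectory() as scratch:
#       unpack_tar("/path/to/backup.tar", scratch)
#       cmd = borg_create_command("/path/to/repo", "backup-tar", scratch)
#       subprocess.run(cmd, check=True)   # needs 'import subprocess'
```

Because the scratch directory path ends up in the archive, it is probably worth using a fixed directory name between runs rather than a random temporary one.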

On Thu, Apr 21, 2016 at 12:41 PM, <public at enkore.de> wrote:

> On 21.04.2016 11:03, Sitaram Chamarty wrote:
> > On 04/21/2016 01:58 PM, public at enkore.de wrote:
> >> Since Borg doesn't know the structure of a tar file my guess is that
> >> changed metadata that's stored in-line with file data will make
> >> deduplication of the file data impossible for files that are smaller
> >> than 1-2 avg chunk sizes (>2 MB).
> >
> > Oh very nice; I had not thought of this but it makes perfect sense!
> >
> >> For this specific use case I'd recommend using the old chunker params
> >> which should allow better deduplication; still: unchanged, small files
> >> with updated metadata won't deduplicate.
> >>
> >> When deduplicating actual file systems this doesn't seem to be as
> >> troublesome; my guess here is that most file systems tend to put inodes
> >> (with the often-changing metadata) in one place and file data in
> >> another, hence metadata updates don't affect data deduplication as much.
> >
> > My guess would be that borg itself "knows" what is metadata and what is
> > file data, and has different storage/dedup mechanisms for them.
>
> My bad, I meant to write "deduplicating actual file system *images*".
>
> When Borg makes archives from a file system (not FS image) then the
> physical layout of the FS doesn't matter, it reads files/dirs with
> normal APIs like most programs would do.
>
> File contents directly go into chunks, metadata goes into the item
> (=files, dirs) stream, which is chunked with a different, very
> fine-grained chunker.
>
> Cheers, Marian
>
> >
> > regards
> > sitaram
> >
> >>
> >> Still, for optimal granularity you'll want Borg to be able to tell files
> >> apart.
> >>
> >> Cheers, Marian
> >>
> >> On 21.04.2016 09:11, heiko.helmle at horiba.com wrote:
> >>>> Borg isn't capable of handling duplicate pieces inside a file.
> >>>>
> >>>> Oops; my apologies.  I reacted too fast and did not realise that borg
> >>>> was getting an uncompressed file.
> >>>>
> >>>> I assume this means borg gets the file via STDIN?  If so, maybe it has
> >>>> something to do with STDIN being less amenable to dedup?
> >>>>
> >>>> sorry again for my previous (useless) mail!
> >>>
> >>> I'm seeing something similar here. I used attic (and many early borg
> >>> revisions) to back up a few work VMs here. A slightly bigger one (about
> >>> 100 Gigs) was backed up daily. This backup took about half an hour (with
> >>> -C lzma) and resulted in about 1-2 Gigs of new data (deduped and
> >>> compressed) each time.
> >>>
> >>> Now with recent borg, the amount of new data jumped to about 17-20 Gigs
> >>> per day and it took much longer (I had to scale back to use zlib as
> >>> compression to have the backup finish before the LVM snapshot filled
> >>> up). This indicates that the deduplication engine took a hit along the
> >>> way and feeds much more data to lzma, which makes the overall runtime
> >>> slower.
> >>>
> >>> This *might* coincide with the change in the default chunker params, but
> >>> I'm not sure. Unfortunately I didn't pay attention as to which release
> >>> actually started the drop in dedup performance. If I find the time, I
> >>> might start a trial run with the "classic" parameters (10,23,16,4095),
> >>> but not this week :)
> >>>
> >>> Best Regards
> >>>  Heiko
> >>>
> >>>
> >>> _______________________________________________
> >>> Borgbackup mailing list
> >>> Borgbackup at python.org
> >>> https://mail.python.org/mailman/listinfo/borgbackup
> >>>

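To put numbers on the chunker parameters discussed in the quoted text, here is a small sketch that decodes a parameter string. It assumes the field order documented for `--chunker-params`, namely CHUNK_MIN_EXP, CHUNK_MAX_EXP, HASH_MASK_BITS, HASH_WINDOW_SIZE, and treats 2^HASH_MASK_BITS as a rough target for the average chunk size. The "classic" values are the ones quoted above; `19,23,21,4095` is, as far as I recall, the newer default and does not come from this thread. The function name `describe_chunker_params` is made up for illustration.

```python
def describe_chunker_params(params):
    """Decode 'CHUNK_MIN_EXP,CHUNK_MAX_EXP,HASH_MASK_BITS,HASH_WINDOW_SIZE'.

    Returns sizes in bytes. The target average is only approximate:
    real averages depend on the data and on the min/max limits.
    """
    min_exp, max_exp, mask_bits, window = (int(x) for x in params.split(","))
    return {
        "min_chunk": 2 ** min_exp,
        "max_chunk": 2 ** max_exp,
        "target_avg_chunk": 2 ** mask_bits,
        "window": window,
    }


for label, params in (
    ("classic", "10,23,16,4095"),        # from the quoted mail
    ("newer default", "19,23,21,4095"),  # assumption, not from this thread
):
    info = describe_chunker_params(params)
    print(label, params, "target average chunk:",
          info["target_avg_chunk"] // 1024, "KiB")
```

With a target around 64 KiB instead of around 2 MiB, far more of the files inside a tar stream are large enough to span several chunks, so a changed header in front of a file spoils a smaller share of its data.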


-- 
Dmitry Astapov