[Borgbackup] Deduplication of tar files - doesn't seem to be giving good performance

William Gogan william at conveystudio.com
Thu Apr 21 02:52:33 EDT 2016



Sitaram Chamarty wrote:
> On 04/21/2016 11:15 AM, William Gogan wrote:
>> I'm trying borgbackup out, and so far it's performing really well in almost all tests.
>>
>> The one item where I'm seeing odd performance is for tar files. It appears not to be deduplicating except within the current archive.
>>
>> Background: Our VM tool kicks out a .tar file per container. It compresses (lzo) the .tar. For discussion purposes, let's pretend it's called vm.tar.lzo
>
> Compression changes the bytestream.  You may get lucky and the changes
> only happened to files at the end of a tar file, but that's unlikely.
> Depending on how many files changed, the probability that something changed
> at the beginning of the tar file is pretty high.
Just to confirm: as I mentioned, I'm piping the output of lzop -d 
--to-stdout vm.tar.lzo to borg (i.e. borg is not getting a compressed 
file; it is being piped the uncompressed .tar stream). Even so, it sounds 
like Borg isn't able to deduplicate repeated pieces inside a single file.

I'm probably wrong about this, but I had hoped that it would go 
something like: "borg is getting the uncompressed .tar, so it will see 
that 98% of the files in that tar didn't change, and it will deduplicate 
all of that".
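
The behaviour hoped for above can be sketched with a toy example. This is 
an illustration only, not Borg's actual chunker: the names GEAR, 
content_defined_chunks, fixed_size_chunks and new_chunk_count, the 1 KiB 
chunk sizes, and the random test data are all assumptions made up for the 
sketch. It compares cutting a stream at content-dependent positions with 
cutting it at fixed offsets, after a few bytes are inserted near the start.

```python
import hashlib
import random

# Toy parameters, chosen for illustration only (not Borg's real settings).
MASK = 0x3FF  # cut when the low 10 bits of the hash are zero (~1 KiB average)
rng = random.Random(42)
GEAR = [rng.getrandbits(32) for _ in range(256)]  # hypothetical per-byte table


def content_defined_chunks(data):
    """Cut wherever a rolling hash of the most recent bytes hits a pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Each shift pushes older bytes out of the masked low bits, so the
        # cut decision depends only on the last few bytes, not on the offset.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if h & MASK == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_size_chunks(data, size=1024):
    """Cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def new_chunk_count(chunker, old, new):
    """Count chunks of `new` whose SHA-256 was not already stored for `old`."""
    seen = {hashlib.sha256(c).digest() for c in chunker(old)}
    return sum(hashlib.sha256(c).digest() not in seen for c in chunker(new))


# Stand-in for yesterday's uncompressed tar stream, and today's stream with a
# small change near the beginning that shifts every later byte.
old_stream = bytes(rng.getrandbits(8) for _ in range(200_000))
new_stream = old_stream[:1000] + b"changed" + old_stream[1000:]

cdc_new = new_chunk_count(content_defined_chunks, old_stream, new_stream)
fixed_new = new_chunk_count(fixed_size_chunks, old_stream, new_stream)
print("content-defined: new chunks to store =", cdc_new)
print("fixed-size:      new chunks to store =", fixed_new)
```

With content-defined cuts, only the chunks around the insertion should 
come out as new, because the later cut points move along with the data. 
With fixed offsets, nearly every chunk after the insertion should look 
new. Whether Borg behaves like the first case on a real tar stream is 
exactly the question here.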

I think what you're telling me, though, is that inside a single big file 
like a tar, borg doesn't cope very well with small changes, even if that 
big file is uncompressed like a plain tar. Is that right? Would I be 
better off extracting the tar completely to a temporary disk and pointing 
borg at that each time?
>
> This is what I would guess is happening.

-- 
William Gogan
Convey Studio / Custom. Digital. Branding.
719.278.3736
conveystudio.com <http://www.conveystudio.com>
