[Borgbackup] A small compression test

William Kenworthy billk at iinet.net.au
Wed Mar 8 00:03:59 EST 2023


 From experience:

     1. Borg repos on a network filesystem (moosefs in my case) can be 
very slow.

     2. Borg has to read a complete VM image before it can calculate 
checksums. If you store the VMs on a network filesystem, it is time 
consuming just to read 500 MB of data in one image, let alone process it 
and then go on to do a number of other images.

     3. Consider whether you can avoid large VMs and use the OS files 
natively on a filesystem/partition, or back up the inside of the VM 
rather than the image. Borg skips files whose metadata shows they have 
not changed (but does do a safety recheck after a certain number of 
runs - see docs).  VM images by their nature have to be read in their 
entirety every time to figure out what has changed, even if it's just 
one byte of data.  I have found that reading a VM image's contents is a 
much faster operation after the first time.  If (as in my case) both 
the VMs and the repos are on a network filesystem, you will need to 
carefully consider where the work (reading files and calculating 
checksums) is to be done - reading multiple VM and storage images of 
many hundreds of megabytes will take time and can't be avoided.  The 
good news is borg is still faster than most other backup systems even 
in this scenario.

     4. Consider parallelising as much as possible - running borgbackup 
on multiple hosts pushing into individual repos at the same time takes 
only a little longer than doing one backup, e.g. doing it serially is 
1+1+1+1 etc., while parallel would be something like 1.5 in total.  Note 
that in my case, this is also leveraging the internal parallelisation of 
moosefs running on a number of separate hosts.
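
As a rough illustration of point 4, the sketch below starts one 
"borg create" per host at the same time and waits for all of them. It 
assumes a central machine that can ssh to every host; the host names, 
repo root and source path are made up for the example and are not taken 
from the setup described above, and run_backups / build_command are 
hypothetical helper names.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(host, repo_root, source):
    """Build one backup command per host (hypothetical layout: one repo per host)."""
    # "{now}" is left literal here; borg expands it to a timestamp in the archive name.
    return ["ssh", host, "borg", "create", f"{repo_root}/{host}::{{now}}", source]


def run_backups(commands, runner=subprocess.run, max_workers=None):
    """Run all commands concurrently and return their exit codes in input order."""
    if not commands:
        return []
    workers = max_workers or len(commands)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker thread blocks on one external process, so the total
        # wall time is roughly that of the slowest backup, not the sum.
        results = list(pool.map(runner, commands))
    return [result.returncode for result in results]


# Example use (assumed host names and paths):
# commands = [build_command(h, "/mnt/mfs/borg", "/home")
#             for h in ("host-a", "host-b", "host-c")]
# print(run_backups(commands))
```

Each host writes to its own repo here, so the backups do not wait on 
each other's repo locks.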

** I found I reached the limits of my moosefs filesystem storing decades 
of email, hundreds of thousands of photos, borg repos and other files, 
which it did quite well until I went too far for my hardware :(  Moving 
millions of smaller files into loopback-mounted images solved that 
problem, at the expense of blowing out a 15 minute backup sequence to 
many hours.  Backing up the files by reading inside the mounted image, 
rather than the image file itself, made quite a large time saving.
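
To show why the files inside an image back up so much faster than the 
image itself, here is a minimal sketch of metadata-based change 
detection. It is not borg's actual implementation: the cache is a plain 
dict, and the fingerprint (size, mtime, inode) is an assumption chosen 
for the example. Unchanged small files are skipped after a cheap stat 
call, whereas a single image file whose mtime has changed is flagged as 
a whole and has to be read in full.

```python
import os


def fingerprint(path):
    """Cheap metadata fingerprint; no file contents are read."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns, st.st_ino)


def files_to_read(paths, cache):
    """Return the paths whose metadata differs from the cached fingerprint.

    `cache` is a dict {path: fingerprint} and is updated in place, so a
    second call with nothing changed returns an empty list.
    """
    changed = []
    for path in paths:
        current = fingerprint(path)
        if cache.get(path) != current:
            changed.append(path)
            cache[path] = current
    return changed
```

On a first run every path is returned; on later runs only the files 
that were touched are, which is the case where millions of small files 
cost little more than a directory walk.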

BillK


On 8/3/23 02:00, Bzzzz wrote:
> On Tue, 7 Mar 2023 15:18:47 +0100
> Thorsten Schöning <tschoening at am-soft.de> wrote:
>
>> Hello Bzzzz,
>> on Tuesday, 7 March 2023 at 14:42 you wrote:
>>
>>> Normal : it is single threaded _and_ you have a lot more files to
>>> scan, to compare to what's in the repo and, eventually, compress.
>> The only change I'm aware of was lz4 to zstd and that doesn't
>> influence scan performance for changed files, that should be like
>> before. It only influences CPU load and compression time of changed
>> data.
> It does, as you have more compressed files in a BB file, so checksums
> are read faster than with lz4 because they're more concentrated.
>
>>> I meant think about only add changed VM chunks to the repo[...]
>> The changes per day to the VM images are larger than the changes to
>> the individually backed up files. So if X GiB are pretty fast for
>> VM-images and database dumps, I'm wondering why (X-Y) GiB of data is
>> that slow when backing up individual files. That doesn't make too much
>> sense.
> Let me rephrase to check that I understand correctly:
> * VM images & DB dumps are many GB of changed data and back up fast,
> * regular smaller files are not changed that often but back up slower.
>
> If I had to guess, I'd say that a VM image or DB dump needs only a few
> reads on both the client and the server to get everything required,
> whereas regular files may not sit in the same BB file and are spread
> over different areas of the client's HDD, so there are many more head
> movements (hence latency). On top of that, BB has to calculate many
> more checksums when files are small than when they are made of big
> chunks.
>
> Jean-Yves

