[Borgbackup] Deduplication not efficient on single file VM images

Mateusz Kijowski mateusz.kijowski at gmail.com
Mon Dec 4 07:39:12 EST 2017


Hi,


I am in the process of migrating the backups of my VM images from
zbackup[1] to borg, and deduplication is not behaving as I expected.
I am running borgbackup 1.0.9 from jessie-backports with lzma
compression, default chunker settings and repokey encryption (the
passphrase is provided via an environment variable).

The backup image files are created by another tool (so these are
proper backups, not live disk images), and my wrapper script pipes
them into borg's stdin. I also pass --timestamp to borg create so
that I can prune the backups nicely.
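The piping step of such a wrapper could look roughly like the Python sketch below. The function name `pipe_file_into` and the script name are hypothetical. The caller passes in the command. In the real wrapper that command would be a borg create invocation ending in "-", which makes borg read the archive data from stdin. The file object is handed to the child process directly, so the wrapper never has to load the multi-gigabyte image into memory.

```python
import subprocess
import sys


def pipe_file_into(cmd, path):
    """Stream the file at `path` into the stdin of `cmd`; return the exit code."""
    with open(path, "rb") as src:
        # The child process reads straight from the open file descriptor,
        # so the image is never buffered in this process's memory.
        result = subprocess.run(cmd, stdin=src)
    return result.returncode


if __name__ == "__main__":
    # Hypothetical usage: pipe_into.py IMAGE CMD [ARGS...]
    sys.exit(pipe_file_into(sys.argv[2:], sys.argv[1]))
```

For example: `python pipe_into.py vzdump-qemu-100-2017_11_29-01_00_02.vma borg create /mnt/zbackup/borg/vm-100::vzdump-qemu-100-2017_11_29-01_00_02.vma -`.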

I have a separate borg repository per VM so that I can load them in
parallel, which lets the job fit into my backup window. Both the
source files and the repositories are on a single machine (but on
different storage). Also, from my experiments, IOPS do not seem to be
the bottleneck.

The biggest problem right now is that Borg seems to fail to
deduplicate most of the data:

# du -sh {zbackup,borg}/vm-100
1,9G    zbackup/vm-100
8,0G    borg/vm-100

The repository of another, similar machine, which holds a single
archive, shows that the baseline is fine:

# du -sh {zbackup,borg}/vm-404
1,6G    zbackup/vm-404
1,6G    borg/vm-404

Borg stats output for the first, second and last borg create runs for vm-100:
 ------------------------------------------------------------------------------
 Archive name: vzdump-qemu-100-2017_11_20-15_52_32.vma
 Archive fingerprint: d73fcf2fc30807338336b3dbcfe831f7ee1a853a50b086071b4efeb2004d7dad
 Time (start): Mon, 2017-11-20 13:36:45
 Time (end):   Mon, 2017-11-20 15:52:32
 Duration: 2 hours 15 minutes 46.54 seconds
 Number of files: 1
 ------------------------------------------------------------------------------
                        Original size      Compressed size    Deduplicated size
 This archive:               14.49 GB              1.99 GB              1.96 GB
 All archives:               14.49 GB              1.99 GB              1.96 GB

                        Unique chunks         Total chunks
 Chunk index:                    4975                 5510
 ------------------------------------------------------------------------------
 ------------------------------------------------------------------------------
 Archive name: vzdump-qemu-100-2017_11_29-01_00_02.vma
 Archive fingerprint: 09f7303382e8669e05c030033dcd9c824da004b5e6ac93f7ebfb55589b17bff1
 Time (start): Tue, 2017-11-28 22:59:18
 Time (end):   Wed, 2017-11-29 01:00:02
 Duration: 2 hours 43.32 seconds
 Number of files: 1
 ------------------------------------------------------------------------------
                        Original size      Compressed size    Deduplicated size
 This archive:               14.50 GB              1.99 GB              1.72 GB
 All archives:               28.99 GB              3.98 GB              3.67 GB

                        Unique chunks         Total chunks
 Chunk index:                    8697                11023
 ------------------------------------------------------------------------------
 ------------------------------------------------------------------------------
 Archive name: vzdump-qemu-100-2017_12_02-02_34_28.vma
 Archive fingerprint: 828ab7dac873ff441f18864c16858d40c9eb34a0e26985c9b7e95508358c9d18
 Time (start): Sat, 2017-12-02 00:09:49
 Time (end):   Sat, 2017-12-02 02:34:28
 Duration: 2 hours 24 minutes 38.86 seconds
 Number of files: 1
 ------------------------------------------------------------------------------
                        Original size      Compressed size    Deduplicated size
 This archive:               14.51 GB              1.99 GB              1.63 GB
 All archives:               72.51 GB              9.95 GB              8.51 GB

                        Unique chunks         Total chunks
 Chunk index:                   19058                27600
 ------------------------------------------------------------------------------

The machine itself is a simple Shorewall-based router and the image
doesn't change much. The only content that changes is the logs, so I
am truly puzzled that deduplication performs so poorly.
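The following sketch measures how weak the deduplication is, using only the figures from the stats output above. It divides each archive's deduplicated size by its compressed size, which gives the share of the compressed data that had to be stored as new chunks. It also divides the first archive's original size by its 5510 chunks to get an average chunk size. The function names are hypothetical. GB and MB are treated as decimal units here, which is an assumption about how the stats are printed.

```python
# Figures copied from the `borg create --stats` output above.
ARCHIVES = [
    # (archive date, original GB, compressed GB, deduplicated GB)
    ("2017_11_20", 14.49, 1.99, 1.96),
    ("2017_11_29", 14.50, 1.99, 1.72),
    ("2017_12_02", 14.51, 1.99, 1.63),
]


def new_fraction(compressed_gb, dedup_gb):
    """Share of an archive's compressed data that was stored as new chunks."""
    return dedup_gb / compressed_gb


def avg_chunk_mb(original_gb, total_chunks):
    """Average chunk size in MB, treating GB and MB as decimal units."""
    return original_gb * 1000 / total_chunks


if __name__ == "__main__":
    for date, _original, compressed, dedup in ARCHIVES:
        print(f"{date}: {new_fraction(compressed, dedup):.0%} of the compressed data is new")
    # After the first archive the chunk index held 5510 chunks in total.
    print(f"average chunk size: {avg_chunk_mb(14.49, 5510):.2f} MB")
```

By this measure, the last archive still stored roughly 82% of its compressed size as new data. The average chunk is about 2.63 MB.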

I guess I could run zerofill on the VM images, but on the other hand
zbackup somehow managed to deduplicate most of the stuff, so I
wouldn't think that this is the issue.

Is there something in the documentation about tuning for my use
case that I am missing?

Since I have a bunch of existing backups, I am currently converting
them from zbackup to borg using parallel "zbackup restore ... | borg
create ... -" pipelines. Could multiple processes sharing the same
cache dir be the problem? Should the cache dir be separate for
different repos?
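One way to rule out a shared cache while testing is to give each parallel pipeline its own cache directory. borg reads the `BORG_CACHE_DIR` environment variable for this purpose. The sketch below only builds such an environment for one pipeline. It keeps everything else from the parent environment, including the passphrase variable. The function name, the naming scheme of one subdirectory per repository, and the cache root path are all hypothetical.

```python
import os


def pipeline_env(repo_path, cache_root, base_env=None):
    """Return an environment for one borg process with its own cache dir.

    Everything in `base_env` (default: the current environment) is kept,
    e.g. the passphrase variable; only BORG_CACHE_DIR is set or overridden.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Hypothetical scheme: one cache subdirectory per repository name.
    repo_name = os.path.basename(os.path.normpath(repo_path))
    env["BORG_CACHE_DIR"] = os.path.join(cache_root, repo_name)
    return env
```

The resulting dict can be passed as `env=` to `subprocess.run` for the borg half of each pipeline.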

Another problem is that the backup takes far longer: zbackup takes
around 8 minutes to process the non-initial 14 GB images, while borg
takes more than 2 hours every time. My assumption is that this
difference is due to zbackup using multiple threads for lzma
compression. I also understand that switching to lz4 would cut the
processing time considerably at the cost of disk space. I can live
with that, provided that deduplication works as expected.
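The sketch below compares the two speeds as average throughput, using the sizes and durations from the stats output and the "around 8 minutes" figure for zbackup. The durations are wall-clock times. For a pipeline they can also include time borg spends waiting on its input. The function names are hypothetical, and GB and MB are treated as decimal units.

```python
def to_seconds(hours, minutes, seconds):
    """Convert an h/m/s duration, as printed by borg --stats, to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def throughput_mb_s(size_gb, seconds):
    """Average throughput in MB/s, treating GB and MB as decimal units."""
    return size_gb * 1000 / seconds


if __name__ == "__main__":
    borg_runs = [
        (14.49, to_seconds(2, 15, 46.54)),
        (14.50, to_seconds(2, 0, 43.32)),
        (14.51, to_seconds(2, 24, 38.86)),
    ]
    for size_gb, secs in borg_runs:
        print(f"borg:    {throughput_mb_s(size_gb, secs):.1f} MB/s")
    # zbackup: "around 8 minutes" for a ~14 GB image, per the text above.
    print(f"zbackup: {throughput_mb_s(14, to_seconds(0, 8, 0)):.1f} MB/s")
```

This works out to roughly 1.7 to 2.0 MB/s for borg versus about 29 MB/s for zbackup, a gap of around 15 times.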

Example borg init args:
"init", "/mnt/zbackup/borg/vm-100"

Example borg create args:
"create", "--stats", "-v", "--timestamp", "2017-11-29T00:00:02",
"--compression", "lzma",
"/mnt/zbackup/borg/vm-100::vzdump-qemu-100-2017_11_29-01_00_02.vma",
"-"
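The sketch below shows how a wrapper might assemble the create arguments above, with the compression as a parameter so that lz4 can be swapped in for lzma. The function name `build_create_args` is hypothetical. The flags are exactly the ones in the example, and the trailing "-" makes borg create read the archive data from stdin.

```python
def build_create_args(repo, archive, timestamp, compression="lzma"):
    """Build the `borg create` argument list for data piped in on stdin."""
    return [
        "create", "--stats", "-v",
        "--timestamp", timestamp,
        "--compression", compression,
        f"{repo}::{archive}",
        "-",  # read the archive contents from stdin
    ]
```

With `compression="lz4"` the same call produces the faster, larger variant mentioned above. The rest of the argument list is unchanged.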

I would appreciate any hints; let me know if you need more data.


Mateusz



[1] http://zbackup.org/

