[Borgbackup] Reasons of remove --compression-from

Marcin Zajączkowski mszpak at wp.pl
Sun Mar 24 19:21:29 EDT 2019


Thanks for your reply.

On 2019-03-24 23:08, Thomas Waldmann wrote:
>> I wonder what was the reason and if Borg can by any chance do it in some
>> other way as of 1.1/1.2?
> 
> The reason was that "auto" is way easier from a user perspective and not
> expensive on the CPU.
> 
> The previously planned feature would have required to maintain a file
> with a list of file formats and advised compression algorithms, reliably
> detect file format (which is a non-trivial issue by itself) and then
> still having an issue with file formats that might be compressed or not
> internally.

Indeed, auto is much easier to use, but it is also much less flexible.
In particular, the ratio that determines whether data should be
compressed seems to be hardcoded to 0.97:
https://github.com/borgbackup/borg/blob/2b16fc9039660abba0ce6f5a25ae9c0f31ad48f5/src/borg/compress.pyx#L317
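
For readers who do not want to open the linked source, here is a minimal
sketch of what a ratio check of this kind looks like. It is not Borg's
code: zlib level 1 stands in for the fast trial compressor (an
assumption, chosen so the example needs only the standard library), and
the names choose_compression and compress_chunk are hypothetical.

```python
import lzma
import os
import zlib

THRESHOLD = 0.97  # the value reported as hardcoded in the linked source


def choose_compression(chunk: bytes, threshold: float = THRESHOLD) -> str:
    """Return "lzma" if a cheap trial compression shrinks the chunk enough."""
    if not chunk:
        return "none"
    # Assumption: zlib level 1 plays the role of the fast trial compressor.
    trial = zlib.compress(chunk, 1)
    ratio = len(trial) / len(chunk)
    # Only pay for the expensive compressor when the trial result is
    # smaller than threshold * original size.
    return "lzma" if ratio < threshold else "none"


def compress_chunk(chunk: bytes, threshold: float = THRESHOLD) -> bytes:
    """Compress with lzma when the trial says it is worthwhile, else store as-is."""
    if choose_compression(chunk, threshold) == "lzma":
        return lzma.compress(chunk)
    return chunk


if __name__ == "__main__":
    print(choose_compression(b"abc" * 10000))      # highly repetitive input
    print(choose_compression(os.urandom(65536)))   # incompressible input
```

With a threshold this close to 1, data that the trial shrinks by only a
few percent is still handed to the slow compressor.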

That value is not the best one in some situations. My test case: 760 MB
of JPG photos, backed up to two separate empty repos on /tmp/ (in
memory) with AES encryption.

Without compression:
> $ time -p borg create --progress --compression none --list --filter=AME --stats ...
> ...
> Duration: 15.18 seconds
> Number of files: 128
> Utilization of max. archive size: 0%
> ------------------------------------------------------------------------------
>                        Original size      Compressed size    Deduplicated size
> This archive:              761.66 MB            761.68 MB            761.68 MB
> All archives:              761.66 MB            761.68 MB            761.68 MB
> 
>                        Unique chunks         Total chunks
> Chunk index:                     411                  411
> ------------------------------------------------------------------------------
> real 16.20
> user 10.43
> sys 0.89


With auto,lzma:

> $ time -p borg create --progress --compression auto,lzma --list --filter=AME --stats ...
> ...
> Duration: 1 minutes 53.94 seconds
> Number of files: 128
> Utilization of max. archive size: 0%
> ------------------------------------------------------------------------------
>                        Original size      Compressed size    Deduplicated size
> This archive:              761.66 MB            757.23 MB            757.23 MB
> All archives:              761.66 MB            757.23 MB            757.23 MB
> 
>                        Unique chunks         Total chunks
> Chunk index:                     421                  421
> ------------------------------------------------------------------------------
> real 115.02
> user 102.09
> sys 3.36

16 vs 115 seconds is a noticeable difference, especially since I have
GBs of photos.
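
To put the trade-off in numbers, the short calculation below uses only
the figures from the two --stats outputs above (compressed size and
"real" time). The variable names are just for illustration.

```python
# Figures copied from the two runs above.
size_none_mb, time_none_s = 761.68, 16.20    # --compression none
size_lzma_mb, time_lzma_s = 757.23, 115.02   # --compression auto,lzma

saved_mb = size_none_mb - size_lzma_mb
saved_pct = 100 * saved_mb / size_none_mb
slowdown = time_lzma_s / time_none_s
extra_s_per_mb = (time_lzma_s - time_none_s) / saved_mb

print(f"space saved:      {saved_mb:.2f} MB ({saved_pct:.2f}%)")
print(f"slowdown:         {slowdown:.1f}x")
print(f"extra time spent: {extra_s_per_mb:.1f} s per MB saved")
```

In other words, well under 1% of space saved for roughly seven times
the wall-clock time.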

Why lzma? In the long run, a smaller backup (which lets me keep more
snapshots) matters more to me than backup duration (the backup can run
"in the background").

As a workaround, I could try to split my data into more groups, but
that is much less convenient to manage, especially since, for example,
7z archives can sit in the same directory structure as "normal" files.

I suspect the ratio is too high in my case. I would like to be able to
change it from the command line. Even better, however, would be the
ability to define at least a list of extensions that should be excluded
from compression (as a lighter version of the removed mechanism).
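
A rough sketch of what is being asked for, to make the proposal
concrete. Nothing here exists in Borg: the function pick_compressor, its
threshold parameter, and the SKIP_EXTENSIONS list are all hypothetical,
and zlib level 1 again stands in for the fast trial compressor.

```python
import os
import zlib

# Hypothetical user-supplied list of extensions never worth compressing.
SKIP_EXTENSIONS = frozenset({".jpg", ".jpeg", ".7z"})


def pick_compressor(path: str, chunk: bytes, threshold: float = 0.97,
                    skip_extensions: frozenset = SKIP_EXTENSIONS) -> str:
    """Return "lzma" or "none" for a chunk belonging to the file at path."""
    # 1. Extension list: skip the trial compression entirely.
    ext = os.path.splitext(path)[1].lower()
    if ext in skip_extensions or not chunk:
        return "none"
    # 2. Otherwise fall back to the ratio check, now with a tunable threshold.
    trial = zlib.compress(chunk, 1)
    ratio = len(trial) / len(chunk)
    return "lzma" if ratio < threshold else "none"


if __name__ == "__main__":
    text = b"some repetitive text " * 1000
    print(pick_compressor("photos/IMG_0001.JPG", text))         # skipped by extension
    print(pick_compressor("docs/notes.txt", text))              # ratio check applies
    print(pick_compressor("docs/notes.txt", text, threshold=0.0))  # never compress
```

Because the decision is made per file name, 7z archives and photos can
stay in the same directories as "normal" files without splitting the
backup into several groups.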

What do you think about that?

Marcin

-- 
https://blog.solidsoft.info/ - Working code is not enough

