[Borgbackup] Reasons of remove --compression-from

Thomas Waldmann tw at waldmann-edv.de
Mon Mar 25 09:07:52 EDT 2019


> In fact, auto is much easier to use, however, it is much less flexible.
> Especially the ratio determining if file should be compressed seems to
> be hardcoded in code to 0.97:
> https://github.com/borgbackup/borg/blob/2b16fc9039660abba0ce6f5a25ae9c0f31ad48f5/src/borg/compress.pyx#L317

That value could be:
- adjusted, or
- made an option, if no generally good value can be determined
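
To illustrate the kind of decision being discussed, here is a minimal Python sketch of a threshold-based heuristic. It is not borg's actual code: zlib level 1 stands in for the cheap probe compressor, the two-way outcome is a simplification, and the names choose_compressor, compress_auto and threshold are made up for this example. Only the 0.97 default comes from the value quoted above.

```python
import lzma
import os
import zlib


def choose_compressor(chunk: bytes, threshold: float = 0.97) -> str:
    """Decide whether a chunk is worth compressing with the expensive codec.

    A cheap probe compression is run first. If the probe shrinks the chunk
    below `threshold` (probe size / original size), the chunk is considered
    compressible and goes to lzma; otherwise it is stored as-is.
    """
    if not chunk:
        return "none"
    probe = zlib.compress(chunk, 1)  # assumption: stand-in for a fast probe codec
    ratio = len(probe) / len(chunk)
    return "lzma" if ratio < threshold else "none"


def compress_auto(chunk: bytes, threshold: float = 0.97):
    """Return (decision, payload) according to the heuristic above."""
    decision = choose_compressor(chunk, threshold)
    if decision == "lzma":
        return decision, lzma.compress(chunk)
    return decision, chunk


if __name__ == "__main__":
    print(choose_compressor(b"A" * 100_000))       # highly repetitive input
    print(choose_compressor(os.urandom(100_000)))  # random input
```

With such a scheme, lowering the threshold sends fewer chunks to the expensive compressor, which is the knob in question here.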

> It is not the best value in some situations. My test case - 760MB of JPG
> photos. Two different empty repos on /tmp/ (in memory) with AES encryption.
> 
> Without compression:
>> time -p borg create --progress --compression none --list --filter=AME --stats ...
>> ...
>> Duration: 15.18 seconds
>> Number of files: 128
>> Utilization of max. archive size: 0%
>> ------------------------------------------------------------------------------
>>                        Original size      Compressed size    Deduplicated size
>> This archive:              761.66 MB            761.68 MB            761.68 MB
>> All archives:              761.66 MB            761.68 MB            761.68 MB
>>
>>                        Unique chunks         Total chunks
>> Chunk index:                     411                  411
>> ------------------------------------------------------------------------------
>> real 16.20
>> user 10.43
>> sys 0.89

On unmodified code, could you also run plain lzma compression (without
auto) to see whether it takes about 1:53 for your data? That would
indicate that the current threshold makes the wrong decision for your data.

After that, can you adjust the value in the borg source and find the
highest one that gives close-to-uncompressed performance with auto,lzma?

Then, maybe verify the result by doing the same with a larger set of photos.
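
A rough way to get a feeling for candidate values before patching borg is to look at how probe ratios are distributed over the data. The Python sketch below does that under several assumptions: zlib level 1 stands in for the probe compressor, files are cut into fixed 2 MiB blocks (borg's real chunker is content-defined, so the numbers will only be approximate), and all function names and the list of candidate thresholds are hypothetical.

```python
import os
import zlib

# Hypothetical candidate values to compare against the current 0.97.
CANDIDATE_THRESHOLDS = (0.90, 0.95, 0.97, 0.99)


def probe_ratio(chunk: bytes) -> float:
    """Size after a cheap probe compression divided by the original size."""
    return len(zlib.compress(chunk, 1)) / len(chunk)


def read_chunks(path, size=2 * 1024 * 1024):
    """Yield fixed-size blocks of a file (a simplification of real chunking)."""
    with open(path, "rb") as f:
        while True:
            block = f.read(size)
            if not block:
                break
            yield block


def chunks_under(directory):
    """Yield blocks of every regular file below `directory`."""
    for root, _dirs, files in os.walk(directory):
        for name in files:
            yield from read_chunks(os.path.join(root, name))


def share_sent_to_heavy(chunks, thresholds=CANDIDATE_THRESHOLDS):
    """For each threshold, return the fraction of chunks whose probe ratio
    is below it, i.e. the chunks that would go to the expensive compressor."""
    ratios = [probe_ratio(c) for c in chunks if c]
    if not ratios:
        return {t: 0.0 for t in thresholds}
    return {t: sum(r < t for r in ratios) / len(ratios) for t in thresholds}
```

Calling share_sent_to_heavy(chunks_under("/path/to/photos")) on the test set shows which thresholds would still route most JPEG chunks to lzma. The timing runs with a patched borg remain the real test.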

I'm not sure whether the --debug option gives any more clues, but try it
to see if it produces useful output for your case.

> 16 vs 115 seconds is noticeable. Especially that I have GBs of photos.

Backing up lots of photos is quite a widespread use case, so we can adjust for that.

> Why lzma? Over time, it is better for me to have smaller size of the
> backup (to keep more snapshots) over backup duration (it can be done "in
> background").

zstd might be a more modern option.

> I suspect the ratio is too high in my case. I would like to have the
> ability to change it from the command line.

We can do that if no good value can be determined.
We should avoid adding lots of command-line options though, as that makes
borg harder to use and the documentation you need to read longer.

> However, even better would be
> an ability to define, at least, a list of extensions that should be
> ignored from compression (as a lighter version of the removed mechanism).

The same concern applies to that, plus the effort of having to maintain
yet another list.



-- 

GPG ID: 9F88FB52FAF7B393
GPG FP: 6D5B EF9A DD20 7580 5747 B70F 9F88 FB52 FAF7 B393

