[Borgbackup] faster / better deletion, for a bounty?

John Goerzen jgoerzen at complete.org
Thu Dec 22 09:44:57 EST 2016



On 12/22/2016 03:14 AM, Mario Emmenlauer wrote:
> What you say makes perfect sense for a single archive. But borg also reports
> numbers for "all archives", which I understood to be the numbers for the full
> repository. Am I on the wrong track there? Because "all archives" is not the
> sum of the individual archives, I assumed it's the repo. For the repo,
> however, I think the dedup size should be equal to the disk size (except for
> overheads like metadata, index, etc.). Therefore I was surprised to see that
> for me, it's approx. 50% of disk usage.
Ah, what you're saying there does seem to mesh with what's documented.  
You've got me then.

I wonder, what does du -sh over your repo show?  And is it any different 
if you add --apparent-size to du?
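
For anyone who wants to compare the two figures without du, here is a minimal
Python sketch. It assumes a POSIX system where os.stat reports st_blocks in
512-byte units; the function name repo_sizes is made up for illustration, and
directories' own blocks are ignored, so the totals will come out slightly
below what du prints.

```python
import os

def repo_sizes(root):
    """Return (apparent_bytes, allocated_bytes) for non-directory entries under root.

    apparent  ~ du --apparent-size (sum of st_size)
    allocated ~ plain du           (sum of st_blocks * 512)
    Hard-linked files are counted only once, as du does.
    """
    seen = set()            # (device, inode) pairs already counted
    apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue    # another hard link to a file we already counted
            seen.add(key)
            apparent += st.st_size
            allocated += st.st_blocks * 512   # assumption: 512-byte units (POSIX)
    return apparent, allocated
```

A large gap between the two numbers usually points at sparse files, block
rounding on many small files, or filesystem-level compression.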

John
>
> See here the output of borg list on one of my archives:
> Number of files: 1796064
>                         Original size      Compressed size    Deduplicated size
> This archive:               95.27 GB             70.53 GB            178.00 MB
> All archives:               78.26 TB             65.13 TB              1.82 TB
>                         Unique chunks         Total chunks
> Chunk index:                 9733154            414693364
>
> Cheers,
>
>      Mario
>
>
>
>> How you count up space is a funny business when you have deduplication going
>> on.  Same when you have hard links in your filesystem.  (du can say you've got
>> 50GB in a directory, but you might find that rm -r on it only frees up 50K if
>> there are a lot of hard links to other areas.)
>>
>> I think zfs might have a little clearer terminology on this: "referenced" is how
>> much data is pointed to by a given snapshot, and "used" is how much space would
>> be freed if only that one snapshot were deleted right now.  That's like borg's
>> archive size and dedup size.
>>
>> John
>>
>>
>>>
>>>>> (2) In the last months, my backup size went up quite a lot, even though
>>>>>      I did not change anything in borg. So I'd like to reverse engineer
>>>>>      which archives (or which files) contribute to the sudden increase in
>>>>>      size. I tried "borg list" on all archives, but only 7 have ~3 GB of
>>>>>      deduplicated space, and all others have less than 1 GB of dedup space!
>>>>>      I assumed 533 archives of ~1 GB dedup size = 533 GB total,
>>>> No, that is only the sum of the space used exclusively by each single archive.
>>>>
>>>> As soon as the same chunks are used by more than 1 archive, it does not
>>>> show up as "unique chunks" any more.
>>>>
>>>>>      How would I find the archives that free most space when deleted?
>>>> For a single archive deletion, that is the unique chunks space
>>>> ("deduplicated size") of that archive.
>>>>
>>>> For multiple archive deletion there is no easy way to see beforehand.
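
A toy model may make the two cases above easier to see. This is only a sketch:
it treats each archive as a set of chunk ids with known sizes, which is an
assumption for illustration and not how borg stores or reports anything. The
names archives, sizes, unique_size and freed_by_deleting are all hypothetical.

```python
def unique_size(archives, sizes, name):
    """Bytes in chunks referenced by `name` and by no other archive."""
    others = set().union(*(c for n, c in archives.items() if n != name))
    return sum(sizes[c] for c in archives[name] - others)

def freed_by_deleting(archives, sizes, doomed):
    """Bytes freed if every archive in `doomed` were deleted together."""
    kept = set().union(*(c for n, c in archives.items() if n not in doomed))
    gone = set().union(*(archives[n] for n in doomed))
    return sum(sizes[c] for c in gone - kept)

# Hypothetical data: one big chunk shared by two archives only.
archives = {
    "mon": {"a", "big"},
    "tue": {"a", "big", "b"},
    "wed": {"a", "c"},
}
sizes = {"a": 10, "b": 1, "c": 2, "big": 1000}

print(unique_size(archives, sizes, "mon"))                  # 0
print(unique_size(archives, sizes, "tue"))                  # 1
print(freed_by_deleting(archives, sizes, {"mon", "tue"}))   # 1001
```

Neither "mon" nor "tue" looks expensive on its own, yet deleting both frees the
big chunk. That is why the per-archive numbers do not add up to the total.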
>>> Would it be possible to somehow change this reporting in borg? I
>>> think I (possibly accidentally) backed up a few huge files for a few
>>> days, that now use up 50% of my archive space. Since the chunks are
>>> shared, I have no way of knowing which archives are the "bad guys".
>>> My only option seems to prune with a shotgun-approach until eventually
>>> I get lucky and free significant disk space. If I'm unlucky I can
>>> prune a lot before freeing any significant space...
>>>
>>> I think for example 'du' when used on hard links reports the shared
>>> disk usage on the first directory it encounters, and does not duplicate
>>> the size of hard links on subsequent directories. Would this be a sane
>>> behaviour for borg too? Or add a new field for "shared chunks size"?
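
The du-style accounting suggested above could look roughly like this. It is a
hypothetical sketch, not an existing borg feature: archives are modeled as sets
of chunk ids, and each chunk is charged to the first archive that references
it, in the order the archives are listed. The name first_seen_sizes and the
sample data are made up.

```python
def first_seen_sizes(archives, sizes):
    """Charge each chunk to the first archive (in dict order) that references it."""
    seen = set()
    charged = {}
    for name, chunks in archives.items():
        new = chunks - seen                       # chunks not charged to anyone yet
        charged[name] = sum(sizes[c] for c in new)
        seen |= new
    return charged

# Hypothetical data, oldest archive first (dicts keep insertion order).
archives = {
    "mon": {"a", "big"},
    "tue": {"a", "big", "b"},
    "wed": {"a", "c"},
}
sizes = {"a": 10, "b": 1, "c": 2, "big": 1000}

print(first_seen_sizes(archives, sizes))
# {'mon': 1010, 'tue': 1, 'wed': 2}
```

In this model the per-archive figures sum to the repository total, and the
archive where a large file first appeared stands out. The result depends on the
ordering, and deleting only that first archive would not free the shared chunks.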
>>>
>>>
>>> Thanks a lot for the help, and all the best,
>>>
>>>      Mario Emmenlauer
>>>
>>>
>>> _______________________________________________
>>> Borgbackup mailing list
>>> Borgbackup at python.org
>>> https://mail.python.org/mailman/listinfo/borgbackup
>
>
> Best regards,
>
>      Mario Emmenlauer
>
>
> --
> BioDataAnalysis GmbH, Mario Emmenlauer      Tel. Buero: +49-89-74677203
> Balanstr. 43                   mailto: memmenlauer * biodataanalysis.de
> D-81669 München                          http://www.biodataanalysis.de/


