[Borgbackup] how to back up 300 million files

Zsolt Ero zsolt.ero at gmail.com
Thu May 4 09:18:16 EDT 2017


I can confirm that it is not possible to create the initial backup even with 1.1.0:

borg create --stats --progress --one-file-system backup::2 imagetiles
Killed stale lock production at 207216302901725.27788-0.
Removed stale exclusive roster lock for pid 27788.
Removed stale exclusive roster lock for pid 27788.
Killed stale lock production at 207216302901725.27788-0.
Removed stale exclusive roster lock for pid 27788.
Removed stale exclusive roster lock for pid 27788.
KilledGB O 53.09 GB C 310.07 kB D 4259789 N
imagetiles/3502f5816ea3ac8b/TileGroup205/9-173-99.jpg


borg info -v backup
Killed stale lock production at 207216302901725.28349-0.
Removed stale exclusive roster lock for pid 28349.
Killed stale lock production at 207216302901725.28349-0.
Removed stale exclusive roster lock for pid 28349.
Removed stale exclusive roster lock for pid 28349.
Repository ID: e7ae835fb34055a6d6f1d2ef469a681eeb65e78a5e2074bd24edef7d975d1462
Location: ...
Encrypted: Yes (authenticated BLAKE2b)
Cache: /root/.cache/borg/e7ae835fb34055a6d6f1d2ef469a681eeb65e78a5e2074bd24edef7d975d1462
Security dir: /root/.config/borg/security/e7ae835fb34055a6d6f1d2ef469a681eeb65e78a5e2074bd24edef7d975d1462
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
All archives:               57.76 GB             56.09 GB             54.83 GB

                       Unique chunks         Total chunks
Chunk index:                 3815976              4460576

On 4 May 2017 at 13:22, Zsolt Ero <zsolt.ero at gmail.com> wrote:
> The files are not identical, and are not compressible (tiny jpg files
> for a map-based viewer), and they will never change.
>
> I think I'll write a custom script to tar them up into 500 tar files
> based on subdirs and just
> store those tars somewhere cheap (S3, etc.).
>
> Zsolt
>
>
>
> On 4 May 2017 at 12:59, Mario Emmenlauer <mario at emmenlauer.de> wrote:
>>
>> On 04.05.2017 12:47, Maurice Libes wrote:
>>> Le 04/05/2017 à 12:26, Marian Beermann a écrit :
>>>> As far as I can see, the information there is correct and complete.
>>>> MAX_OBJECT_SIZE is an internal constant.
>>>>
>>>>> ... limited in size to MAX_OBJECT_SIZE (20MiB).
>>>> Regarding 1.1.x beta compatibility with stable releases: there is no
>>>> intent to break it. Doing so would make the betas pointless, since no
>>>> one would test such an unstable release.
>>>>
>>>> Cheers, Marian
>>>>
>>>> On 04.05.2017 12:20, Zsolt Ero wrote:
>>>>> Also, is this page still not updated to reflect the 1.1.0 changes?
>>>>> http://borgbackup.readthedocs.io/en/latest/internals/data-structures.html#note-about-archive-limitations
>>>>>
>>>>> Is MAX_OBJECT_SIZE a constant, or can it be set via run-time parameters?
>>>>>
>>>>> On 4 May 2017 at 11:42, Zsolt Ero <zsolt.ero at gmail.com> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to solve the problem of backing up 300 million files,
>>>>>> preferably with borg. The files are small, totalling only 100 GB
>>>>>> altogether (12 kB on average).
>>> Another answer/question, from a neophyte's point of view:
>>>
>>> Is borg an appropriate solution in this case of very small files (12 kB),
>>> since borg will never split them into chunks?
>>> So don't we lose the benefit of deduplication, or am I wrong?
>>> I don't remember what the size limit is for a file to be split into chunks.
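For reference, borg's default chunker params are 19,23,21,4095, which put the minimum chunk size at 2**19 bytes (512 KiB). A quick back-of-the-envelope check (not borg code) shows why 12 kB files are never split:

```python
# With borg's default chunker params (19,23,21,4095), the minimum chunk
# size is 2**19 bytes. Any file smaller than that is stored as a single
# chunk, so ~12 kB JPEGs are never split and deduplication only helps
# for whole-file duplicates.
CHUNK_MIN_EXP = 19                  # borg 1.x default
min_chunk = 2 ** CHUNK_MIN_EXP      # 524288 bytes = 512 KiB
avg_file = 12 * 1024                # ~12 kB average file from the thread

print(min_chunk)                    # 524288
print(avg_file < min_chunk)         # True: each file is one chunk
```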
>>
>> If some of the files are identical, they would still be de-duplicated.
>> But I agree that it's not the standard use case for borg.
>>
>> Zsolt, do you have many duplicate files in your collection? If not, do
>> the files often change? Did you think about a simpler backup solution
>> like rsync with hard-links?
>>
>> Just my two cents.
>>
>> Cheers,
>>
>>     Mario
>>
>> _______________________________________________
>> Borgbackup mailing list
>> Borgbackup at python.org
>> https://mail.python.org/mailman/listinfo/borgbackup