[Borgbackup] Larger setups
Thomas Waldmann
tw at waldmann-edv.de
Mon Mar 7 12:16:48 EST 2016
> The company I work for wants to switch from tape-based backups to
> file-based backups. We're building a 140 TB NAS/SAN already for this
> purpose.
I am not aware of borg having been tested at that scale yet; maybe some
other reader can comment on it.
I did personal tests with ~7.5 TB of data.
And I'd love to get feedback from you after your tests. :)
> My question is, I've got about 100 Linux servers to backup. Does
> BorgBackup scale that large?
I think it should not be a problem if you do some careful planning and
scripting.
> Does it become a configuration/maintenance nightmare when it gets that big?
Aside from the commandline parameters and some env vars, there is no
configuration.
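In practice that usually means a small wrapper script that sets borg's
environment variables before running it. A minimal sketch (repository
address and key path are placeholder assumptions, not from this thread):

```shell
#!/bin/sh
# Placeholder values -- adjust for your site.
export BORG_REPO='backup@nas:/mnt/borg/set-web'   # default repository
export BORG_RSH='ssh -i /root/.ssh/backup_key'    # command used to reach it
# export BORG_PASSPHRASE='...'                    # only for encrypted repos
```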
> Can I do concurrent backups of remote clients?
Not to the same repository at the same time.
But considering that amount of servers and data, using multiple
repositories is a good idea anyway.
Also, assuming the backup data goes over your network, running too many
backups in parallel would likely need more bandwidth than your network
has available.
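One way to sketch such a layout (set names, hostnames and the NAS path
below are assumptions, not from this thread): map each server to the
repository of its set, so different sets can back up concurrently while
the servers within one set take turns on their shared repository.

```shell
#!/bin/sh
# Hypothetical grouping: db servers share one repo, web servers another,
# everything else goes into a catch-all set.
repo_for_host() {
    case "$1" in
        db*)  echo "backup@nas:/mnt/borg/set-db" ;;
        web*) echo "backup@nas:/mnt/borg/set-web" ;;
        *)    echo "backup@nas:/mnt/borg/set-misc" ;;
    esac
}

# Backups of different sets may then run in parallel, e.g.:
#   borg create "$(repo_for_host web1)::web1-{now}" /etc /srv &
#   borg create "$(repo_for_host db1)::db1-{now}"  /etc /var/lib &
```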
So, what I'd do for a test setup is:
- read the docs once or twice, there are useful / important hints in there
- divide all your servers into smaller sets.
Servers that share a lot of identical or similar files go into the same
set. The servers in one set will share one repository and will have to
run their backups one after another (you can tweak --lock-wait so each
waits until the previous one has finished)
- try to keep each set below 10-20 TB of data.
The size of the chunks index is proportional to the number of unique
data chunks stored in the repository (and that is usually proportional
to the size of your data, unless you have a lot of duplication inside
your files).
The index is stored on disk and loaded fully into RAM while borg is
running, so keeping that index at a reasonable size (i.e. below your
available RAM) is a good idea. The "internals" part of the docs has a
formula to estimate the index size.
Be aware that the chunks index will need regular rebuilding/resyncing if
you back up multiple servers into the same repository. That costs some
time and CPU, but if the content is similar, it will save you backup
space.
If you want to avoid the rebuild/resync, you'll need one repository per
machine (in that case there will be no inter-machine deduplication, but
usually still a lot of historical dedup).
- if a server has an unusually high number of files or amount of data,
use a separate repo just for that server.
- if a server has a small number of files and a small amount of data,
you can put it into a set with other such servers.
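The per-set sequencing described above could be scripted roughly like
this (repository address, hostnames and source paths are illustrative
assumptions; the function only echoes the commands so the sketch is safe
to run as-is -- drop the echo to execute them for real):

```shell
#!/bin/sh
# Run the servers of one set strictly one after another against their
# shared repository. --lock-wait 3600 makes a client wait up to an hour
# for the repository lock instead of failing while the previous backup
# is still holding it. (In a push setup, you would typically prefix the
# command with: ssh root@$host)
backup_set() {
    repo="$1"; shift
    for host in "$@"; do
        echo borg create --lock-wait 3600 --stats \
            "$repo::$host-{now}" /etc /home /srv
    done
}

backup_set "backup@nas:/mnt/borg/set-web" web1 web2 web3
```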
When you run the first backup of the server(s), it will transfer ALL
the data into the repositories. Depending on your setup, that might take
a while. The second and later backups will then run a lot faster and
only transfer changed / added data.
To tune the excludes, you may want to use -v --dry-run --list and review
the file list it outputs (checking that everything you want is included,
but no unneeded files/dirs).
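For example (repository, archive name, excludes and paths below are
placeholder assumptions), the command can be built once so it is easy to
review before being executed:

```shell
#!/bin/sh
# --dry-run changes nothing in the repository; -v --list prints each
# file/dir that would be backed up, so the output can be checked for
# missing or unneeded entries.
CMD="borg create -v --dry-run --list \
  --exclude /var/cache --exclude /tmp \
  backup@nas:/mnt/borg/set-web::web1-{now} /etc /home /srv"

echo "$CMD"        # review it first ...
# eval "$CMD"      # ... then run the preview for real
```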
If you need help, just ask.
--
GPG ID: FAF7B393
GPG FP: 6D5B EF9A DD20 7580 5747 B70F 9F88 FB52 FAF7 B393