[Borgbackup] borg basic understanding

Thomas Waldmann tw at waldmann-edv.de
Thu May 4 08:55:44 EDT 2017


> from client side :
> 
> - borg parses each file to back up

maybe say "discovers" not "parses".

> - checks if the file has been touched or modified

yes, mtime, inode number, and size are checked.
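to illustrate, a minimal sketch of that check (the cache layout here is
hypothetical, not borg's actual files cache format):

```python
import os

def file_unchanged(path, files_cache):
    """Sketch: skip a file if its stat data matches the last backup.

    files_cache maps path -> (mtime_ns, inode, size).  This layout is
    illustrative only -- borg's real files cache holds more (e.g. the
    chunk IDs, so unchanged files need not be re-read at all).
    """
    st = os.stat(path)
    return files_cache.get(path) == (st.st_mtime_ns, st.st_ino, st.st_size)
```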

> - If the file is new or has been modified :
>   - borg splits the file into several chunks

yes, except if the file is rather small and thus only results in 1 chunk.

important here is that it is "content defined" chunking, so the cutting
offsets are not a fixed raster, but determined by the content (the
rolling hash value). so insertions and deletions in a file do not
trigger a lot of new chunks just because of the shift.
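a toy chunker to show the idea (this is NOT borg's buzhash chunker, just
a simplified illustration of content-defined cutting; the `seed`
parameter mimics borg's per-key chunker seed):

```python
def chunk(data, window=8, mask=0x0F, seed=0):
    """Toy content-defined chunker (the idea only, NOT borg's buzhash).

    A hash over a sliding window of the last `window` bytes decides the
    cut points: cut wherever the low bits of the hash are zero, so the
    boundaries depend on content, not on fixed offsets.  `seed` mimics
    borg's per-key chunker seed: a different seed gives different cut
    points for the same data.  mask=0x0F gives tiny ~16-byte average
    chunks for demonstration; a real chunker also enforces min/max
    chunk sizes and updates the hash incrementally ("rolling") instead
    of recomputing it per window.
    """
    chunks, start = [], 0
    for i in range(window - 1, len(data)):
        h = seed & 0xFFFFFFFF
        for b in data[i + 1 - window:i + 1]:
            h = (h * 31 + b) & 0xFFFFFFFF
        if (h & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks
```

because the cut decision looks only at the window contents, inserting
bytes near the start of a file shifts the first boundary, but later
boundaries fall at the same content positions again, so most chunks
still deduplicate.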

>   - then checks in the local cache (on client side) if a chunk already
> "exists" (has already been backed up) or not?
> 
> - If the chunk is new (never seen before) :
>   - the chunk is compressed and then encrypted *before* sending it over
> the network

after encryption, the data is additionally authenticated (a MAC is
computed) to "protect" it against tampering. so it is "authenticated
encryption".

>   - sends it over the network to the server
> 
> [the main features of the backup process rely on the local cache, no?]

yes, the local chunks cache and files cache are responsible for the good
speed.

there are some more little details on the client side when encryption is
on (default):

- the chunk IDs are MACs (not just hashes over the plaintext, as that
would tell too much about your data). only someone who can use the key
can compute these MACs.
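computing a keyed chunk ID is conceptually just an HMAC over the
plaintext chunk (a sketch; the key/algorithm details of borg's actual
ID computation may differ):

```python
import hashlib
import hmac

def chunk_id(id_key, chunk_data):
    """Keyed chunk ID: HMAC-SHA256 over the *plaintext* chunk.

    With a plain (unkeyed) hash, anyone with repo access could test
    whether you store some known file; with an HMAC, only someone who
    can use the key can compute the IDs.
    """
    return hmac.new(id_key, chunk_data, hashlib.sha256).digest()
```

deduplication still works because the same key + same plaintext always
yields the same ID.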

- the chunker is seeded with a fixed, but per-key random value (so the
cutting offsets and lengths of the cut pieces differ from those of
another repo / another key chunking the same data). only someone who
can use the key can cut chunks in that specific way.

this is both to protect the privacy / confidentiality of your data and
to counter attacks on untrusted repo storage.

> on server side  :
> what are the main actions in the backup process on the borg server side?

it's basically a key/value store (with locking and an internal index).
it also computes CRC32 checksums over the stored entries, so a
server-side repo check for accidental corruption can be done.

as it only sees encrypted data and metadata, it cannot (and shall not)
do any high-level operations.
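conceptually, the repo side can be sketched as a dumb key/value store
with per-entry CRC32 (a strong simplification: the real repository also
has locking, an index, and transactional storage):

```python
import zlib

class ToyRepoStore:
    """Sketch of the server side: opaque keys/values plus CRC32.

    The server never sees plaintext -- keys are chunk IDs and values
    are encrypted+authenticated chunks, so all it can offer is storage
    and a low-level consistency check.
    """
    def __init__(self):
        self._entries = {}  # chunk ID -> (crc32, opaque encrypted bytes)

    def put(self, key, value):
        self._entries[key] = (zlib.crc32(value), value)

    def get(self, key):
        return self._entries[key][1]

    def check(self):
        """Detect accidental corruption by re-checking all CRC32s."""
        return all(zlib.crc32(v) == crc for crc, v in self._entries.values())
```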

> if interested i can give my presentation for french audience

if you publish the slides somewhere, we could link to them.


-- 

GPG ID: 9F88FB52FAF7B393
GPG FP: 6D5B EF9A DD20 7580 5747 B70F 9F88 FB52 FAF7 B393


