Most space-efficient way to store log entries

Chris Angelico rosuav at gmail.com
Thu Oct 29 06:52:16 EDT 2015


On Thu, Oct 29, 2015 at 9:35 PM, Marc Aymerich <glicerinu at gmail.com> wrote:
> 1) Each node on the cluster needs to keep track of *all* the changes
> that ever occurred. So far, each node is storing each change as
> individual lines on a file (the "historical state log" I was referring
> to; the concept is very similar to the Bitcoin blockchain)
>
> 2) The main communication channel is driven by a UDP gossip protocol.
> From the performance perspective, it makes a huge difference if the
> whole log entry fits into the UDP payload (512B), otherwise the log
> entry has to be transferred by other means. Because config files are
> mostly text, almost every single one of them can fit into a UDP
> packet, if properly compressed.
>
> After reading your replies I'm concluding that
>
> 1) I should use the most space-efficient encoding *only* for
> transferring the log entry, just lzma compress it.
> 2) I should use the most readable one for storing the block on the log
> file. Leave metadata as text and compress+base64 the "actual file
> content" so it fits in a space-less ASCII block, something like:

I agree with those conclusions. If anything goes wrong, you have the
tidy log in a form that's easily dug into, and then compression is
used for transmission only.
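
To make the split concrete, here is a minimal sketch of the two encodings.
The "key: value" metadata layout, the blank-line separator on the wire, and
all of the function names are assumptions made for illustration, not
anything from the actual project. The 512-byte limit is the figure quoted
above.

```python
import base64
import lzma

UDP_PAYLOAD_LIMIT = 512  # bytes, the payload size quoted above


def _header(metadata: dict) -> str:
    # Hypothetical layout: one "key: value" line per metadata item.
    return "\n".join(f"{key}: {value}" for key, value in metadata.items())


def format_for_log(metadata: dict, content: bytes) -> str:
    """Readable metadata, then the content as one space-free ASCII block."""
    blob = base64.b64encode(lzma.compress(content)).decode("ascii")
    return f"{_header(metadata)}\ncontent: {blob}\n"


def read_content_from_log(entry: str) -> bytes:
    """Recover the original file content from a stored log entry."""
    for line in entry.splitlines():
        if line.startswith("content: "):
            return lzma.decompress(base64.b64decode(line[len("content: "):]))
    raise ValueError("no content line in log entry")


def encode_for_wire(metadata: dict, content: bytes) -> bytes:
    """Compress the whole entry in one go, with no base64 overhead."""
    return lzma.compress(_header(metadata).encode("utf-8") + b"\n\n" + content)


def decode_from_wire(packet: bytes) -> tuple:
    """Split a received packet back into (header bytes, content bytes)."""
    header, content = lzma.decompress(packet).split(b"\n\n", 1)
    return header, content


def fits_in_udp(packet: bytes) -> bool:
    return len(packet) <= UDP_PAYLOAD_LIMIT
```

The default .xz container adds a fixed number of bytes per message, so with
a 512-byte budget it is worth measuring real entries before settling on a
container format.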

A couple of points of interest, though:

1) Conflicts - since you lack any concept of central authority,
there's the possibility that two peers will simultaneously make
incompatible changes, and then begin propagating them through the
farm. What happens when a node receives a change it can't apply?

2) UDP is unreliable. What happens if a node misses out on a change?
Can it figure out that it's missed something, and go ask?
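
To make those two questions concrete, here is a rough sketch of how a node
could tell the cases apart. It assumes, purely for illustration, that every
change names the hash of the change it was built on (in the spirit of the
blockchain comparison above); the Node class, GENESIS, and entry_id are
hypothetical names, and what to do about a conflict or a gap is left open.

```python
import hashlib

GENESIS = "0" * 64  # hypothetical id standing in for "no parent"


def entry_id(parent_id: str, payload: bytes) -> str:
    """An entry's id covers its parent's id, chaining the log together."""
    return hashlib.sha256(parent_id.encode("ascii") + payload).hexdigest()


class Node:
    def __init__(self):
        self.head = GENESIS  # id of the most recently applied entry
        self.entries = {}    # entry id -> (parent id, payload)

    def receive(self, parent_id: str, payload: bytes) -> str:
        new_id = entry_id(parent_id, payload)
        if new_id in self.entries:
            return "duplicate"  # gossip delivered it twice; ignore
        if parent_id == self.head:
            self.entries[new_id] = (parent_id, payload)
            self.head = new_id
            return "applied"
        if parent_id == GENESIS or parent_id in self.entries:
            # Built on an entry we have already moved past, so two peers
            # changed the same state at the same time.
            return "conflict"
        # Unknown parent: at least one change never reached us.
        return "gap"
```

A node that gets "gap" knows which parent id to go and ask a peer for; a
node that gets "conflict" still needs some rule for picking a winner.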

I'm assuming you've thought about these, and am curious as to how
you've solved them - might be useful in some of the things I've played
with.

ChrisA


