Most space-efficient way to store log entries

Marc Aymerich glicerinu at gmail.com
Thu Oct 29 06:35:02 EDT 2015


On Wed, Oct 28, 2015 at 11:30 PM, Marc Aymerich <glicerinu at gmail.com> wrote:
> Hi,
> I'm writing an application that saves historical state in a log file.
> I want to be really efficient in terms of used bytes.
>
> What I'm doing now is:
>
> 1) First use zlib.compress
> 2) And then remove all new lines using binascii.b2a_base64, so I have
> a log entry per line.
>
> but b2a_base64 is far from ideal: adds lots of bytes to the compressed
> log entry. So, I wonder if perhaps there is a better way to remove new
> lines from the zlib output? or maybe a different approach?
>
> Anyone?

[....]

Wow, lots of interesting replies here. Allow me to clarify my
situation and answer some of the questions.

I'm writing a toy project for my master's thesis, which is never going
into production.

What I'm building is a decentralized file system for configuration
management (without a centralized authority). This means:

1) Each node on the cluster needs to keep track of *all* the changes
that ever occurred. So far, each node stores each change as an
individual line in a file (the "historical state log" I was referring
to; the concept is very similar to the bitcoin blockchain).

2) The main communication channel is driven by a UDP gossip protocol.
From a performance perspective, it makes a huge difference if the
whole log entry fits into the UDP payload (512B), otherwise the log
entry has to be transferred by other means. Because config files are
mostly text, almost every single one of them can fit into a UDP
packet, if properly compressed.
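A minimal sketch of that size check, assuming only the standard-library zlib and lzma modules. The names UDP_PAYLOAD_LIMIT, smallest_payload and fits_in_datagram are made up for illustration, and picking the smaller of two codecs is my assumption, not something settled in this thread:

```python
import lzma
import zlib

UDP_PAYLOAD_LIMIT = 512  # bytes; the payload size mentioned above


def compress_candidates(data: bytes) -> dict:
    """Compress `data` with two stdlib codecs and return both results."""
    return {
        "zlib": zlib.compress(data, 9),
        # FORMAT_ALONE writes the legacy .lzma container, whose header is
        # smaller than the default .xz container.
        "lzma": lzma.compress(data, format=lzma.FORMAT_ALONE),
    }


def smallest_payload(data: bytes):
    """Return (codec_name, compressed_bytes) for the shortest result."""
    candidates = compress_candidates(data)
    name = min(candidates, key=lambda k: len(candidates[k]))
    return name, candidates[name]


def fits_in_datagram(payload: bytes, limit: int = UDP_PAYLOAD_LIMIT) -> bool:
    """True if the payload can travel in a single gossip datagram."""
    return len(payload) <= limit
```

The receiver would need to know which codec was used, for example through a one-byte tag in front of the payload; that detail is left out here.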

After reading your replies, I'm concluding that:

1) I should use the most space-efficient encoding *only* for
transferring the log entry: just lzma-compress it.
2) I should use the most readable encoding for storing the block in the
log file: leave the metadata as text and compress+base64 the "actual
file content" so it fits in a whitespace-free ASCII block, something like:

# $ cat log
# <parent_hash> <timestamp> <action> <path> <lzma+base64 content> <fingerprint> <signature>

a5438566b83b4383899500c6b70dcac1 1446054664 WRITE /.keys TUY4Q0FRRUVHQHNkl6MTNtZz09Cg== 2d:ce:6d:c5:95:54:cb:d2:fe:ba:68:ed:1d:8e:74:0f iPDxBYuUEjlZl99/xGCNzpbuDezJJfolr+eNLNrXEYAgG/0yme3bu9DCkPO9Gq7+

cb4f67a712964699a5c2d49a42e48946 1446054664 WRITE /.cluster MTcyLjE3LjLjEK 2d:ce:6d:c5:95:54:cb:d2:fe:ba:68:ed:1d:8e:74:0f /VKMeVG95MT9VdObRyhidzxIgiTef+7nl3flgQpqVAgRfhqrBGRB4XTgJFSelvCo

5041fba6b6534dfe92bf99ed5ead8fa6 1446055543 MKDIR /etc 2d:ce:6d:c5:95:54:cb:d2:fe:ba:68:ed:1d:8e:74:0f +CMeVp33FxXFSfczbmkoW4tnalu5ojuC1WprMkc7Kxp/WHlMsx9Os3Zal0Bi/uD8

80c47cd5a73e4881b7284eed465ab10a 1446055843 WRITE /etc/node.conf aG9sYQo= 2d:ce:6d:c5:95:54:cb:d2:fe:ba:68:ed:1d:8e:74:0f oQVF7UCAFRCC7cC0Ln8V16f8mnON465sdXoIEIGCKBUOWOBE5daFmJTu0thAkXVf
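A rough sketch of how one such line could be written and read back, assuming the field order from the header comment above and assuming paths never contain whitespace. The helpers format_entry and parse_entry are hypothetical names, and the sample values in the log above are not guaranteed to decode with this code:

```python
import base64
import lzma


def encode_content(raw: bytes) -> str:
    """lzma-compress, then base64-encode into one whitespace-free field."""
    compressed = lzma.compress(raw, format=lzma.FORMAT_ALONE)
    return base64.b64encode(compressed).decode("ascii")


def decode_content(field: str) -> bytes:
    """Inverse of encode_content."""
    return lzma.decompress(base64.b64decode(field))


def format_entry(parent_hash, timestamp, action, path,
                 fingerprint, signature, content=None) -> str:
    """Build one log line; actions without content (e.g. MKDIR) skip it."""
    fields = [parent_hash, str(timestamp), action, path]
    if content is not None:
        fields.append(encode_content(content))
    fields += [fingerprint, signature]
    return " ".join(fields)


def parse_entry(line: str) -> dict:
    """Split a log line back into its fields (6 or 7 of them)."""
    fields = line.split()
    if len(fields) not in (6, 7):
        raise ValueError("unexpected number of fields: %d" % len(fields))
    return {
        "parent_hash": fields[0],
        "timestamp": int(fields[1]),
        "action": fields[2],
        "path": fields[3],
        "content": decode_content(fields[4]) if len(fields) == 7 else None,
        "fingerprint": fields[-2],
        "signature": fields[-1],
    }
```

base64 emits 4 characters for every 3 input bytes, so the stored content field is about a third larger than the compressed bytes. That cost only applies to the log file, since the transfer path would send the raw compressed bytes.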



-- 
Marc


