Most space-efficient way to store log entries

Cameron Simpson cs at zip.com.au
Wed Oct 28 21:09:13 EDT 2015


On 29Oct2015 11:39, Chris Angelico <rosuav at gmail.com> wrote:
>> If it's only zipped, it's not opaque.  Just `zcat` or `zgrep` and
>> process away.  The whole base64+minus_newlines thing does opaquify
>> and doesn't really save all that much for the trouble.
>
>If you zip the whole file as a whole, yes. If you zip individual
>pieces, you can't zcat it (at least, I don't think so?).

If it is pure gzip, then yes you can. So this:

  gunzip < file1.gz; gunzip < file2.gz

and this:

  cat file1.gz file2.gz | gunzip

should produce the same output. I think this works at the record level too.
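
A quick way to check this from Python, as a minimal sketch: the two records 
below are made up for illustration, and each is compressed as its own gzip 
member before the members are joined, which is the in-memory equivalent of the 
`cat` above.

```python
import gzip

# Two hypothetical log records, each compressed as its own gzip member.
rec1 = b"2015-10-28 21:09:13 test_a PASS\n"
rec2 = b"2015-10-28 21:09:14 test_b FAIL\n"
member1 = gzip.compress(rec1)
member2 = gzip.compress(rec2)

# Joining the members is the equivalent of: cat file1.gz file2.gz
stream = member1 + member2

# gzip.decompress() reads every member in the stream, as gunzip does.
combined = gzip.decompress(stream)
separate = gzip.decompress(member1) + gzip.decompress(member2)
print(combined == separate)
```

Note that this only shows the members decompress back to back; it does not give 
you random access to a record unless you also track where each member starts.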

Of course, all bets are off once you wrap the records in some outer layer (I 
have a file format which is little records that may have the data section 
zipped).

>Conversely,
>zipping the whole file means you have no choice but to sequentially
>scan it - you can't pull up the last section of the file. It's still a
>binary blob to many tools - we as humans may have handy tools around,
>but it's still going to be an extra step for any tool that doesn't
>intrinsically support it.

Yes. But if you're keeping a lot of data or you're using a very constrained 
system you probably do want compression somewhere in there. Maybe the OP is 
optimising prematurely, but again, maybe not.

However, it sounds like the OP wants a text log encoding some test state, and 
is just compressing to gain a little room. I suspect that with the short records 
you might put on a line, the compression obtained will be small and the loss 
from any base64 post step will undo it all. He may be better off keeping 
conventional text logs and just rotating them and compressing the rotated 
copies.
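
To put rough numbers on that suspicion, here is a small sketch that compresses 
one short record and then base64 encodes the result. The record is invented for 
illustration (I do not know what the OP's lines look like), and I am using 
zlib rather than gzip framing on the assumption that you would pick the smaller 
header for per-line work.

```python
import base64
import zlib

# A made-up log line, roughly the size of a short test-state record.
record = b"2015-10-28 21:09:13 test_case_42 state=PASS duration=0.137s"

# Compress the single record at the highest level, then make it text-safe.
compressed = zlib.compress(record, 9)
encoded = base64.b64encode(compressed)

# Sizes in bytes: plain text, compressed, compressed + base64.
print(len(record), len(compressed), len(encoded))
```

For a record like this there is almost no repetition for the compressor to 
exploit, and base64 then turns every 3 bytes into 4, so the encoded form comes 
out longer than the plain line. Longer or more repetitive records will do 
better, so measure with real data before deciding.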

Cheers,
Cameron Simpson <cs at zip.com.au>

Hoping to shave precious seconds off the time it would take me to get through 
the checkout process and on my way home, I opted for the express line ("9 Items 
Or Less [sic]"  Why nine items?  Where do they come up with these rules, 
anyway?  It's the same way at most stores -- always some oddball number like 
that, instead of a more understandable multiple of five.  Like "five.")
- Geoff Miller, geoffm at purplehaze.Corp.Sun.COM


