Most space-efficient way to store log entries

Chris Angelico rosuav at gmail.com
Wed Oct 28 20:39:26 EDT 2015


On Thu, Oct 29, 2015 at 9:53 AM, Tim Chase
<python.list at tim.thechases.com> wrote:
> On 2015-10-29 09:38, Chris Angelico wrote:
>> On Thu, Oct 29, 2015 at 9:30 AM, Marc Aymerich
>> <glicerinu at gmail.com> wrote:
>> > I'm writing an application that saves historical state in a log
>> > file. I want to be really efficient in terms of used bytes.
>>
>> Why, exactly?
>>
>> By zipping the state, you make it utterly opaque.
>
> If it's only zipped, it's not opaque.  Just `zcat` or `zgrep` and
> process away.  The whole base64+minus_newlines thing does opaquify
> and doesn't really save all that much for the trouble.

If you zip the file as a whole, yes. If you zip individual pieces,
you can't zcat it (at least, I don't think so?). Conversely, zipping
the whole file means you have no choice but to scan it sequentially -
you can't pull up just the last section of the file. It's still a
binary blob to many tools - we as humans may have handy tools around,
but it's still going to be an extra step for any tool that doesn't
intrinsically support it.
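
As a rough illustration of the per-piece approach, here is a minimal
sketch. It assumes each entry is zlib-compressed and then base64-encoded
onto its own line; the function names are made up for the example and
are not taken from Marc's application. It shows that the last entry can
be decoded without decompressing the earlier ones, at the cost of the
file being opaque to zcat and zgrep.

```python
import base64
import zlib


def encode_entry(state: bytes) -> bytes:
    """Compress one entry and base64 it so it fits on a single line."""
    # b64encode never emits newlines, so one entry is exactly one line.
    return base64.b64encode(zlib.compress(state)) + b"\n"


def decode_entry(line: bytes) -> bytes:
    """Reverse encode_entry for a single line of the log."""
    return zlib.decompress(base64.b64decode(line.strip()))


def last_entry(log: bytes) -> bytes:
    """Decode only the final line; earlier entries are never decompressed."""
    last_line = log.rstrip(b"\n").rsplit(b"\n", 1)[-1]
    return decode_entry(last_line)


# Hypothetical log contents, purely for demonstration.
entries = [b"state one", b"state two", b"state three"]
log = b"".join(encode_entry(e) for e in entries)
print(last_entry(log))
```

A file compressed as a single stream would have to be decompressed from
the start to reach that same final entry.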

>> Disk space is not expensive. Even if you manage to cut your file by
>> a factor of four (75% compression, which is entirely possible if
>> your content is plain text, but far from guaranteed)
>
> Though one also has to consider the speed of reading it off the drive
> for processing.  If you have spinning-rust drives, it's pretty slow
> (and SSD is still not like accessing RAM), and reading zipped
> content can shovel a LOT more data at your CPU than if it is coming
> off the drive uncompressed.  Logs aren't much good if they aren't
> being monitored and processed for the information they contain.  If
> nobody is monitoring the logs, just write them to /dev/null for 100%
> compression. ;-)

Yeah. There are lots of considerations, but frankly, I don't think
disk _capacity_ is a big one. Sometimes you _might_ get some benefit
from compression (writing fewer sectors might save you time), but I
almost never fill up my hard drives, and when I do, it's usually with
already-compressed data (movies and stuff).
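
If you want to check what compression would actually buy you, a quick
measurement is easy enough. This is only a sketch: the sample log line
is invented, and random bytes stand in for already-compressed data
(an assumption, since real movie files aren't available here). The
helper name saved_fraction is likewise made up.

```python
import os
import zlib


def saved_fraction(data: bytes) -> float:
    """Fraction of the original size saved by zlib (0.75 means 4x smaller)."""
    return 1 - len(zlib.compress(data)) / len(data)


# Invented, highly repetitive log text: compresses very well.
text = b"2015-10-28 20:39:26 INFO state saved for user 42\n" * 1000

# Random bytes as a stand-in for already-compressed content.
noise = os.urandom(len(text))

print("repetitive text: %.1f%% saved" % (100 * saved_fraction(text)))
print("random bytes:    %.1f%% saved" % (100 * saved_fraction(noise)))
```

Real logs will land somewhere between those two extremes, so it's worth
running this on a sample of your own data before deciding.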

ChrisA


