fseek In Compressed Files

Chris Angelico rosuav at gmail.com
Thu Jan 30 08:49:03 EST 2014


On Fri, Jan 31, 2014 at 12:34 AM, Ayushi Dalmia
<ayushidalmia2604 at gmail.com> wrote:
> where temp.txt is the posting list file which is first written in a compressed format and then read  later.

Unless you specify otherwise, a compressed file is likely to have
sub-byte boundaries. It might not be possible to seek to a specific
line.

What you could do, though, is explicitly compress each line, then
write out separately-compressed blocks. You can then seek to any one
that you want, read it, and decompress it. But at this point, you're
probably going to do better with a database; PostgreSQL, for instance,
will automatically compress any content that it believes it's
worthwhile to compress (as long as it's in a VARCHAR field or similar
and the table hasn't been configured to prevent that, yada yada). All
you have to do is tell Postgres to store this, retrieve that, and
it'll worry about the details of compression and decompression. As an
added benefit, you can divide the text up and let it do the hard work
of indexing, filtering, sorting, etc. I suspect you'll find that
deploying a database is a much more efficient use of your development
time than recreating all of that.

ChrisA



More information about the Python-list mailing list