[SciPy-user] read/write compressed files
Dominik Szczerba
domi at vision.ee.ethz.ch
Thu Jun 21 06:57:02 EDT 2007
Hi,
I meant bz2 over zlib because of its higher compression ratio, despite the
slower performance. This common belief had generally matched my experience.
However, a simple test made this morning on fresh data clearly undermines
that thinking:
> du -hsc test9*.dat
428M total
> time gzip test9*.dat
real 0m31.663s
user 0m28.946s
sys 0m1.612s
> du -hsc test9*.dat.gz
215M total
> time gunzip test9*.dat.gz
real 0m7.447s
user 0m6.036s
sys 0m1.264s
> time bzip2 test9*.dat
real 2m1.696s
user 1m54.527s
sys 0m4.008s
> du -hsc test9*.dat.bz2
219M total
> time bunzip2 test9*.dat.bz2
real 0m43.252s
user 0m39.926s
sys 0m2.792s
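The same comparison can be mirrored from Python with the standard-library zlib and bz2 modules. The data below is synthetic and purely illustrative, so the absolute ratios and timings will differ from the .dat files above; treat it as a sketch, not a reproduction of the test:

```python
import bz2
import time
import zlib

# Synthetic, illustrative "numerical" payload; real results depend
# heavily on the data being compressed.
data = b"".join(str(x * 0.001).encode() for x in range(200000))

t0 = time.time()
gz = zlib.compress(data, 6)   # zlib at its default-ish level
t_gz = time.time() - t0

t0 = time.time()
bz = bz2.compress(data, 9)    # bzip2 at maximum compression
t_bz = time.time() - t0

print(f"original: {len(data)} bytes")
print(f"zlib:     {len(gz)} bytes in {t_gz:.3f}s")
print(f"bz2:      {len(bz)} bytes in {t_bz:.3f}s")
```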
I am surprised, as I clearly remember cases where I gained 20%. But
indeed, given the much slower performance, you have convinced me to use
zlib over bz2.
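For just reading a sequence of values back from a gzip-compressed file, Python's standard-library gzip module is enough. A minimal sketch follows; the file name and the float64 dtype are illustrative assumptions, not anything prescribed by the thread:

```python
import gzip
import os
import tempfile

import numpy as np

# Illustrative round trip: write a small array of float64 values to a
# gzip-compressed file, then read the sequence of values back.
values = np.arange(10, dtype=np.float64)
path = os.path.join(tempfile.mkdtemp(), "test9.dat.gz")  # hypothetical name

with gzip.open(path, "wb") as f:
    f.write(values.tobytes())

with gzip.open(path, "rb") as f:
    data = np.frombuffer(f.read(), dtype=np.float64)

print(data.tolist())  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
```

Swapping `gzip.open` for `bz2.open` (Python's bzip2 wrapper) leaves the rest of the code unchanged, which makes it easy to benchmark both on your own data.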
thanks for forcing me to do this test,
- Dominik
Francesc Altet wrote:
> On Wed, 20 Jun 2007 at 21:01 +0200, Dominik Szczerba
> wrote:
>> PyTables is great (and big) while I just need to read in a sequence of
>> values.
>
> Ok, that's fine. In any case, I'm interested in knowing the reasons
> why you are using bzip2 instead of zlib. Have you detected some data
> pattern where you get significantly more compression than with zlib,
> for example?
>
> I'm asking this because, in my experience with numerical data, I have
> been unable to detect significant differences in compression level
> between bzip2 and zlib. See:
>
> http://www.pytables.org/docs/manual/ch05.html#compressionIssues
>
> for some experiments in that regard.
>
> I'd appreciate any input on this subject (bzip2 vs zlib).
>
--
Dominik Szczerba, Ph.D.
Computer Vision Lab CH-8092 Zurich
http://www.vision.ee.ethz.ch/~domi