quick n' dirty measurement of compression and byte-compilation
Zooko
zooko at zooko.com
Tue Jan 1 12:08:21 EST 2002
[Please Cc: zooko at zooko.com in replies. Thank you!]
Dear Pythonismos and Pythonoreans:
I noticed that the PyXML build script byte-compiles its .py files. It seems
like this potentially introduces incompatibility if the version of Python used
to build differs from the version used to run the resulting package. Python
seems to have broken forward- and backward-compatibility for bytecode in almost
every release from 1.5 to 2.2.
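One way to see this incompatibility concretely: every byte-compiled file begins with a 4-byte magic number that identifies the bytecode format, and the interpreter ignores files whose magic number is not its own. The following is a minimal sketch, assuming a modern Python 3 (where the value is exposed as importlib.util.MAGIC_NUMBER) rather than the 1.5 to 2.2 interpreters discussed here. The helper name pyc_matches_interpreter is hypothetical.

```python
import importlib.util
import os
import py_compile
import tempfile


def pyc_matches_interpreter(pyc_path):
    """Return True if the .pyc header carries this interpreter's magic number."""
    with open(pyc_path, "rb") as f:
        header = f.read(4)
    return header == importlib.util.MAGIC_NUMBER


# Demonstration: compile a throwaway module and inspect its header.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo.py")
    with open(src, "w") as f:
        f.write("x = 1\n")
    pyc = py_compile.compile(src, cfile=os.path.join(tmp, "demo.pyc"))
    fresh_ok = pyc_matches_interpreter(pyc)

    # Simulate a file built by a different Python by overwriting the magic.
    with open(pyc, "r+b") as f:
        f.write(b"\x00\x00\x00\x00")
    stale_ok = pyc_matches_interpreter(pyc)
```

A packaging script could run such a check over shipped bytecode, although simply shipping the .py files avoids the question.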
I wondered about the savings in space and in load time that we get by byte-
compiling files before packaging them, so I collected all of the .py files in my
Mojo Nation[1] directory and byte-compiled and compressed them in various ways.
There were 3.5 MB worth of .py files. This includes complete copies of
PyXML v0.6.6 and pybsddb v3.3.0 and some source code borrowed from Zope as well
as the Mojo Nation source code.
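The measurement described above can be approximated in a few lines. This is only a sketch under stated assumptions: it uses a modern Python 3 standard library (tarfile, gzip, bz2) in place of the command-line tar, gzip and bzip2, so the byte counts will not match the command-line tools exactly. The function names tar_bytes and compressed_sizes are hypothetical.

```python
import bz2
import gzip
import io
import os
import tarfile
import tempfile


def tar_bytes(paths):
    """Pack the given files into an uncompressed in-memory tar archive."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for p in paths:
            tar.add(p, arcname=os.path.basename(p))
    return buf.getvalue()


def compressed_sizes(data):
    """Return the size of data uncompressed and under three compressors."""
    return {
        "tar": len(data),
        "gz3": len(gzip.compress(data, compresslevel=3)),
        "gz9": len(gzip.compress(data, compresslevel=9)),
        "bz2": len(bz2.compress(data, 9)),
    }


# Demonstration on a small generated source file (not the real data set).
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.py")
    with open(sample, "w") as f:
        f.write("def f(x):\n    return x + 1\n" * 200)
    sizes = compressed_sizes(tar_bytes([sample]))
```

To compare source against bytecode, the same two functions would be applied once to the list of .py files and once to the list of byte-compiled files.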
It took 14 seconds to byte-compile all of them, in either -OO mode (optimized,
no docstrings) or in normal mode (non-optimized, including docstrings), on my
Pentium III 450 MHz laptop. Therefore I estimate that it takes approximately
4 milliseconds per KB of source code to do byte-compilation on such a PC.
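The estimate follows from dividing the 14 seconds by the amount of source. The sketch below redoes that arithmetic, taking the size of py.tar (3512320 bytes) as the amount of source, which slightly overstates it because of tar overhead. It also shows one way the timing could be repeated on another machine, assuming a modern Python 3. The function names ms_per_kb and time_byte_compile are hypothetical.

```python
import os
import py_compile
import tempfile
import time


def ms_per_kb(seconds, source_bytes):
    """Milliseconds of compile time per KB (1024 bytes) of source."""
    return seconds * 1000.0 / (source_bytes / 1024.0)


def time_byte_compile(src_dir):
    """Byte-compile every .py under src_dir; return (seconds, source_bytes)."""
    total_bytes = 0
    elapsed = 0.0
    with tempfile.TemporaryDirectory() as out_dir:
        scratch = os.path.join(out_dir, "scratch.pyc")
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                if not name.endswith(".py"):
                    continue
                path = os.path.join(root, name)
                total_bytes += os.path.getsize(path)
                start = time.perf_counter()
                try:
                    py_compile.compile(path, cfile=scratch, doraise=True)
                except py_compile.PyCompileError:
                    pass  # files that fail to compile still count toward the time
                elapsed += time.perf_counter() - start
    return elapsed, total_bytes


# The figures quoted above: 14 seconds for roughly 3.5 MB of source.
estimate = ms_per_kb(14, 3512320)
```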
Here are the sizes of the resulting files:
Key:
py. == not byte-compiled
pyc. == byte-compiled in normal mode
pyo. == byte-compiled in -OO mode
.tar == uncompressed
.tar.gz3 == compressed with `gzip -3' (light gzip compression; gzip's default is -6)
.tar.gz9 == compressed with `gzip -9'
.tar.bz2 == compressed with `bzip2 -9'
files sorted by type:
3512320 Jan 1 07:46 py.tar
4003840 Jan 1 07:46 pyc.tar
3317760 Jan 1 07:46 pyo.tar
739409 Jan 1 07:43 py.tar.gz3
1122935 Jan 1 07:43 pyc.tar.gz3
808596 Jan 1 07:43 pyo.tar.gz3
732414 Jan 1 07:43 py.tar.gz9
1115887 Jan 1 07:43 pyc.tar.gz9
799386 Jan 1 07:43 pyo.tar.gz9
601511 Jan 1 07:44 py.tar.bz2
846736 Jan 1 07:44 pyc.tar.bz2
608945 Jan 1 07:44 pyo.tar.bz2
files sorted by size:
601511 Jan 1 07:44 py.tar.bz2
608945 Jan 1 07:44 pyo.tar.bz2
732414 Jan 1 07:43 py.tar.gz9
739409 Jan 1 07:43 py.tar.gz3
799386 Jan 1 07:43 pyo.tar.gz9
808596 Jan 1 07:43 pyo.tar.gz3
846736 Jan 1 07:44 pyc.tar.bz2
1115887 Jan 1 07:43 pyc.tar.gz9
1122935 Jan 1 07:43 pyc.tar.gz3
3317760 Jan 1 07:46 pyo.tar
3512320 Jan 1 07:46 py.tar
4003840 Jan 1 07:46 pyc.tar
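The surprising fact is that, for these source files, the original .py's compress
better than the .pyo's! Uncompressed, pyo.tar is smaller than py.tar, but with
gzip or bzip2 the .py archive comes out smaller than both the .pyo and the .pyc
archive in every case.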
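To put numbers on that comparison, the sketch below divides each archive size from the listing above by the size of the corresponding .py archive. The byte counts are copied from the listing; the dictionary layout and variable names are just for illustration.

```python
# Archive sizes in bytes, copied from the listing above.
sizes = {
    "tar": {"py": 3512320, "pyc": 4003840, "pyo": 3317760},
    "gz3": {"py": 739409, "pyc": 1122935, "pyo": 808596},
    "gz9": {"py": 732414, "pyc": 1115887, "pyo": 799386},
    "bz2": {"py": 601511, "pyc": 846736, "pyo": 608945},
}

# Size of each archive relative to the plain-source archive in the same format.
relative = {
    fmt: {kind: n / by_kind["py"] for kind, n in by_kind.items()}
    for fmt, by_kind in sizes.items()
}

for fmt, by_kind in relative.items():
    print(f"{fmt}: pyc {by_kind['pyc']:.2f}x, pyo {by_kind['pyo']:.2f}x the size of py")
```

A ratio below 1 means the byte-compiled archive is smaller than the source archive. For the .pyo files that holds only in the uncompressed row.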
My suggestions are:
1. For compatibility and small packages, distribute plain .py's, not byte-
compiled files.
2. If you want even smaller packages, use bzip2.
3. If you want faster start-up time, make sure that the byte-compiled files get
persistently cached on the end-user's computer (that is, by making the directory
that contains the .py's writable by the user, or by byte-compiling upon
installation).
4. If #3 is difficult, check whether the actual difference in start-up times is
sufficiently important to you. It appears to be negligible for many purposes.
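For suggestion 3, byte-compiling upon installation can be done with the standard compileall module. The following is a minimal sketch, assuming a modern Python 3, where the compiled files land in a __pycache__ subdirectory rather than next to the .py files as they did in the releases discussed here. The function name byte_compile_on_install and the demo module name are hypothetical.

```python
import compileall
import importlib.util
import os
import tempfile


def byte_compile_on_install(package_dir):
    """Byte-compile every .py under package_dir; return True if all succeeded."""
    # quiet=1 suppresses the per-file listing but still reports errors.
    return bool(compileall.compile_dir(package_dir, quiet=1))


# Demonstration: "install" one module into a temporary directory.
with tempfile.TemporaryDirectory() as install_dir:
    module = os.path.join(install_dir, "hello.py")
    with open(module, "w") as f:
        f.write("GREETING = 'hello'\n")
    ok = byte_compile_on_install(install_dir)
    # cache_from_source gives the path where this interpreter caches bytecode.
    cached_exists = os.path.exists(importlib.util.cache_from_source(module))
```

An installer would call this once, with whatever privileges it needs to write into the target directory, so that later runs by unprivileged users find the cached bytecode already in place.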
Regards,
Zooko
http://zooko.com/
Security and Distributed Systems Engineering
[1] http://mojonation.net/