quick n' dirty measurement of compression and byte-compilation

Zooko zooko at zooko.com
Tue Jan 1 12:08:21 EST 2002


[Please Cc: zooko at zooko.com in replies.  Thank you!]

Dear Pythonismos and Pythonoreans:

I noticed that the PyXML build script byte-compiles its .py files.  This can 
introduce an incompatibility if the version of Python used to build differs 
from the version used to run the resulting package.  Python seems to have 
broken forward- and backward-compatibility for bytecode in almost every 
release from 1.5 to 2.2.
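
A minimal sketch of how such a mismatch can be detected.  It assumes a modern 
Python 3 interpreter (importlib.util.MAGIC_NUMBER and py_compile as they exist 
today), not the 1.5 to 2.2 releases discussed here.  A byte-compiled file 
starts with a 4-byte magic number tied to the bytecode version; the helper and 
file names below are hypothetical:

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path


def pyc_matches_running_interpreter(pyc_path):
    """Return True if the file's first 4 bytes equal this interpreter's magic number."""
    with open(pyc_path, "rb") as f:
        header = f.read(4)
    return header == importlib.util.MAGIC_NUMBER


def demo():
    """Compile a throwaway module, then fake a stale .pyc by overwriting its magic."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "hello.py"
        src.write_text("x = 1\n")
        # py_compile.compile returns the path of the byte-compiled file.
        pyc = py_compile.compile(str(src), cfile=str(Path(tmp) / "hello.pyc"))
        fresh = pyc_matches_running_interpreter(pyc)

        # Simulate a file built by a different Python: same body, other magic.
        stale_path = Path(tmp) / "stale.pyc"
        stale_path.write_bytes(b"\x00\x00\r\n" + Path(pyc).read_bytes()[4:])
        stale = pyc_matches_running_interpreter(stale_path)
    return fresh, stale
```

A packaging script could run a check like this before shipping byte-compiled 
files, though shipping plain .py files avoids the question entirely.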

I wondered about the savings in space and in load time that we get by 
byte-compiling files before packaging them, so I collected all of the .py 
files in my Mojo Nation[1] directory and byte-compiled and compressed them in 
various ways.

There were 3.5 MB worth of .py files.  This includes complete copies of 
PyXML v0.6.6 and pybsddb v3.3.0 and some source code borrowed from Zope as well 
as the Mojo Nation source code.

It took 14 seconds to byte-compile all of them, in either -OO mode (optimized, 
no docstrings) or in normal mode (non-optimized, including docstrings), on my 
Pentium III 450 MHz laptop.  Therefore I estimate that it takes approximately 
4 milliseconds per KB of source code to do byte-compilation on such a PC.
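
The arithmetic behind that estimate, as a small sketch.  It assumes the size of 
py.tar (3512320 bytes, from the listing in this post) is a fair stand-in for 
the total size of the source files, which slightly overstates it because tar 
adds headers and padding.  The function name is hypothetical:

```python
def ms_per_kb(seconds, source_bytes, bytes_per_kb=1024):
    """Milliseconds of byte-compilation time per KB of source."""
    kilobytes = source_bytes / bytes_per_kb
    return seconds * 1000.0 / kilobytes


# 14 seconds over the 3512320 bytes of py.tar (assumed proxy for source size).
estimate = ms_per_kb(14, 3512320)
print(f"{estimate:.2f} ms per KB")
```

Using 1000-byte kilobytes instead of 1024 changes the result only in the 
second significant digit, so "approximately 4 milliseconds" holds either way.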

Here are the sizes of the resulting files:

Key:

py. == not byte-compiled
pyc. == byte-compiled in normal mode
pyo. == byte-compiled in -OO mode

.tar == uncompressed
.tar.gz3 == compressed with `gzip -3' (a fast setting; gzip's default is -6)
.tar.gz9 == compressed with `gzip -9'
.tar.bz2 == compressed with `bzip2 -9'
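
The three compression settings in the key can be approximated in-process with 
Python's standard gzip and bz2 modules.  This is only a sketch: the sizes will 
not match the command-line tools byte for byte, and the function name is 
hypothetical:

```python
import bz2
import gzip


def compressed_sizes(data):
    """Return the size of `data` raw and under each setting from the key."""
    return {
        "raw": len(data),
        "gz3": len(gzip.compress(data, compresslevel=3)),
        "gz9": len(gzip.compress(data, compresslevel=9)),
        "bz2": len(bz2.compress(data, compresslevel=9)),
    }


# Example input: a repetitive stand-in for the contents of a tar archive.
sample = b"def f(x):\n    return x + 1\n" * 2000
for name, size in compressed_sizes(sample).items():
    print(name, size)
```

To reproduce the measurement, `data` would be the bytes of py.tar, pyc.tar, or 
pyo.tar.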

files sorted by type:

 3512320 Jan  1 07:46 py.tar
 4003840 Jan  1 07:46 pyc.tar
 3317760 Jan  1 07:46 pyo.tar
  739409 Jan  1 07:43 py.tar.gz3
 1122935 Jan  1 07:43 pyc.tar.gz3
  808596 Jan  1 07:43 pyo.tar.gz3
  732414 Jan  1 07:43 py.tar.gz9
 1115887 Jan  1 07:43 pyc.tar.gz9
  799386 Jan  1 07:43 pyo.tar.gz9
  601511 Jan  1 07:44 py.tar.bz2
  846736 Jan  1 07:44 pyc.tar.bz2
  608945 Jan  1 07:44 pyo.tar.bz2

files sorted by size:

  601511 Jan  1 07:44 py.tar.bz2
  608945 Jan  1 07:44 pyo.tar.bz2
  732414 Jan  1 07:43 py.tar.gz9
  739409 Jan  1 07:43 py.tar.gz3
  799386 Jan  1 07:43 pyo.tar.gz9
  808596 Jan  1 07:43 pyo.tar.gz3
  846736 Jan  1 07:44 pyc.tar.bz2
 1115887 Jan  1 07:43 pyc.tar.gz9
 1122935 Jan  1 07:43 pyc.tar.gz3
 3317760 Jan  1 07:46 pyo.tar
 3512320 Jan  1 07:46 py.tar
 4003840 Jan  1 07:46 pyc.tar

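The surprising fact is that for these source files the original .py's compress 
better than the .pyo's!  pyo.tar starts out smaller than py.tar, yet under 
every compressor the .py archive ends up smaller than the .pyo archive.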
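
The comparison can be checked directly from the numbers in the listing.  The 
sketch below only restates those sizes and divides each compressed size by the 
matching uncompressed tar size; the names are hypothetical:

```python
# Sizes in bytes, copied from the file listing in this post.
SIZES = {
    "py":  {"tar": 3512320, "gz3": 739409,  "gz9": 732414,  "bz2": 601511},
    "pyc": {"tar": 4003840, "gz3": 1122935, "gz9": 1115887, "bz2": 846736},
    "pyo": {"tar": 3317760, "gz3": 808596,  "gz9": 799386,  "bz2": 608945},
}


def ratios(sizes):
    """Compressed size as a fraction of the uncompressed tar, per kind and format."""
    return {
        kind: {fmt: n / by_fmt["tar"] for fmt, n in by_fmt.items() if fmt != "tar"}
        for kind, by_fmt in sizes.items()
    }


for kind, by_fmt in ratios(SIZES).items():
    print(kind, {fmt: f"{100 * r:.1f}%" for fmt, r in by_fmt.items()})
```

Plain source shrinks to a smaller fraction of its original size than either 
kind of bytecode does, which is why it wins after compression despite losing 
to .pyo before it.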

My suggestions are:

1. For compatibility and small packages, distribute plain .py's, not 
byte-compiled files.

2. If you want even smaller packages, use bzip2.

3. If you want faster start-up time, make sure that the byte-compiled files get 
persistently cached on the end-user's computer (that is, by making the directory 
that contains the .py's writable by the user, or by byte-compiling upon 
installation).

4. If #3 is difficult, check whether the actual difference in start-up times is 
sufficiently important to you.  It appears to be negligible for many purposes.
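
The "byte-compiling upon installation" option in suggestion 3 could be 
sketched as follows.  This assumes a modern Python 3 with the standard 
compileall module; the function name and the idea of calling it from an 
installer's post-install step are illustrative assumptions, not something 
PyXML or Mojo Nation is known to do:

```python
import compileall
import tempfile
from pathlib import Path


def byte_compile_installed_tree(install_dir):
    """Byte-compile every .py under install_dir with the interpreter that will run it.

    Returns a true value if all files compiled without error.
    """
    # quiet=1 prints only errors; the bytecode is written next to the sources
    # (in __pycache__ on current Python versions).
    return compileall.compile_dir(str(install_dir), quiet=1)


# Example: pretend a temporary directory is the installation target.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "mod.py").write_text("VALUE = 42\n")
    ok = byte_compile_installed_tree(tmp)
    print("compiled cleanly:", bool(ok))
```

Because the compilation happens on the end-user's machine, the bytecode always 
matches the Python that will load it.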

Regards,

Zooko

http://zooko.com/
Security and Distributed Systems Engineering

[1] http://mojonation.net/




