[issue22789] Compress the marshalled data in PYC files

Marc-Andre Lemburg report at bugs.python.org
Sat Nov 8 11:08:17 CET 2014


Marc-Andre Lemburg added the comment:

On 08.11.2014 10:28, Serhiy Storchaka wrote:
> Compressing pyc files one by one wouldn't save much space because disk space is allocated in blocks (up to 32 KiB on FAT32). If a pyc file is smaller than the block size, we gain nothing. A ZIP file has the advantage of packing files more compactly. In addition, it can have lower access time due to less fragmentation. Unfortunately, it doesn't support LZ4 compression, but we can store LZ4-compressed files in a ZIP file without additional compression.
> 
> An uncompressed TAR file has the same advantages but needs a longer initialization time (for building the index).

The aim is to reduce file load time, not really to save disk space.
By having less data to read from the disk, it may be possible
to achieve a small startup speedup.

However, you're right in that using a single archive with many PYC files
would be more efficient, since it lowers the number of stat() calls.
The trick of storing LZ4-compressed data in a ZIP file would enable this.
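
A rough sketch of the ZIP idea, under loud assumptions: zlib stands in
for LZ4 (which is not in the standard library), and the helper names and
the ".pyc.z" member suffix are made up for illustration. The archive
itself uses ZIP_STORED, so the pre-compressed payloads are not
compressed a second time:

```python
import io
import marshal
import zipfile
import zlib  # assumption: stand-in for LZ4, which is not in the stdlib


def pack_modules(sources):
    """Return the bytes of a ZIP archive holding pre-compressed code objects.

    `sources` maps a (hypothetical) module name to its source text.
    """
    buf = io.BytesIO()
    # ZIP_STORED: the archive applies no compression of its own.
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
        for name, text in sources.items():
            code = compile(text, name + ".py", "exec")
            payload = zlib.compress(marshal.dumps(code))
            zf.writestr(name + ".pyc.z", payload)
    return buf.getvalue()


def load_module_code(archive_bytes, name):
    """Read one member, decompress it and unmarshal the code object."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        payload = zf.read(name + ".pyc.z")
    return marshal.loads(zlib.decompress(payload))
```

Opening the archive once gives access to all members, which is where
the saving in stat() calls would come from.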

BTW: We could add optional LZ4 compression to the marshal format to
make all this work transparently and without having to change the
import mechanism itself:

We'd just need to add a new flag or type code indicating that the rest
of the stream is LZ4 compressed. The PYC writer could then set this
flag or type code by default (or perhaps only when enabled via an env var
or command line flag), and everything would then just work with both
LZ4-compressed and uncompressed byte code.
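
To illustrate the flag idea, here is a minimal sketch, not a patch to
marshal itself. The one-byte markers FLAG_RAW and FLAG_COMPRESSED and
the wrapper functions are hypothetical, and zlib again stands in for
LZ4. The reader inspects the leading marker and handles both kinds of
stream:

```python
import marshal
import zlib  # assumption: stand-in for LZ4

# Hypothetical one-byte markers; NOT part of the real marshal format.
FLAG_RAW = b"\x00"
FLAG_COMPRESSED = b"\x01"


def dumps(obj, compress=True):
    """Marshal obj and prefix a marker saying whether the rest is compressed."""
    data = marshal.dumps(obj)
    if compress:
        return FLAG_COMPRESSED + zlib.compress(data)
    return FLAG_RAW + data


def loads(blob):
    """Unmarshal a stream written by dumps(), compressed or not."""
    flag, body = blob[:1], blob[1:]
    if flag == FLAG_COMPRESSED:
        body = zlib.decompress(body)
    elif flag != FLAG_RAW:
        raise ValueError("unknown stream flag: %r" % flag)
    return marshal.loads(body)
```

Since the decision is made per stream at load time, existing
uncompressed data would keep working next to compressed data.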

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue22789>
_______________________________________

