[Python-Dev] Proposal: .pyc file format change

Peter Funk pf@artcom-gmbh.de
Fri, 26 May 2000 13:50:02 +0200 (MEST)


[M.-A. Lemburg]:
> > Proposal:
> > The future format (Python 1.6 and newer) of a .pyc file should be as follows:
> > 
> > bytes 0-3   a new magic number, which should be definitely frozen in 1.6.
> > bytes 4-7   a version number (which should be == 1 in Python 1.6)
> > bytes 8-11  timestamp (mtime of .py file) (same as earlier)
> > bytes 12-*  marshalled code object (same as earlier)
> 
> This will break all tools relying on having the code object available
> in bytes[8:] and believe me: there are lots of those around ;-)

In some way, this is intentional:  If these tools (are there really
that many out there that munge .pyc byte code files?) simply use
'imp.get_magic()' and then silently assume a specific content of the
marshalled code object, they probably need changes anyway, since the
code needed to deal with the new unicode object is missing from them.
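
A minimal sketch of the kind of check such a tool would typically make
before it trusts the rest of the file.  It assumes a current Python, where
'importlib.util.MAGIC_NUMBER' holds the same four bytes that
'imp.get_magic()' returned; the function name is hypothetical.  Note that
matching the magic only says which interpreter wrote the file, not what
the marshalled code object contains.

```python
import importlib.util


def written_by_this_interpreter(pyc_bytes):
    """Return True if the first four bytes match this interpreter's magic."""
    # Only the magic is compared here; nothing is assumed about the
    # layout or content of the bytes that follow it.
    return pyc_bytes[:4] == importlib.util.MAGIC_NUMBER
```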

> You cannot really change the file header, only add things to the end
> of the PYC file...

Why?  Will this idea really cause such earth-shaking grumbling?
Please review this in the context of my proposal to change 'imp.get_magic()'
to return the old 1.5.2 MAGIC when called without a parameter.

> Hmm, or perhaps we should move the version number to the code object
> itself... after all, the changes we want to refer to
> using the version number are located in the code object and not the
> PYC file layout. Unmarshalling it would then raise the error.

Since the file layout is a very thin layer around the marshalled
code object, this really makes no big difference to me.  But it
will be harder to come up with reasonable entries for /etc/magic [1]
and similar mechanisms.
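
To illustrate why a version field near the start of the file helps here,
the following sketch mimics what an /etc/magic-style lookup does: it looks
only at the first few bytes.  The magic value b"PYC\n", the little-endian
byte order of the version field and the function name are assumptions made
for this example; the proposal fixes none of them.

```python
import struct

# Placeholder only: the proposal does not say what the new magic would be.
PROPOSED_MAGIC = b"PYC\n"


def describe(head):
    """Classify a file from its leading bytes, the way 'file' would."""
    if len(head) < 8 or head[:4] != PROPOSED_MAGIC:
        return "data"
    # Bytes 4-7 hold the version number (byte order assumed little-endian).
    (version,) = struct.unpack_from("<L", head, 4)
    return "Python compiled bytecode, format version %d" % version
```

If the version lived inside the marshalled code object or at the end of
the file instead, a lookup like this could no longer report it from a
fixed offset.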

Putting the version number at the end of the file is possible.
But such a solution is somewhat "dirty" and only gives the false
impression that the general file layout (pyc[8:] instead of pyc[12:])
is something you can rely on until the end of time.  Hardcoding the
size of an unpadded header (something like using buffer[8:]) is IMO
bad style anyway.
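
As a rough sketch of the alternative to hardcoding buffer[8:], the header
of the proposed format can be described once and its size derived from
that description.  The magic value, the little-endian byte order and the
helper names are assumptions for illustration, not part of the proposal.

```python
import marshal
import struct

# Placeholder only: the proposal does not say what the new magic would be.
PROPOSED_MAGIC = b"PYC\n"

# bytes 0-3 magic, bytes 4-7 version, bytes 8-11 mtime of the .py file.
# "<" selects little-endian with no padding, so HEADER.size is 12.
HEADER = struct.Struct("<4sLL")


def pack_pyc(code, version, mtime):
    """Build the bytes of a file in the proposed layout."""
    return HEADER.pack(PROPOSED_MAGIC, version, mtime) + marshal.dumps(code)


def unpack_pyc(data):
    """Split a file in the proposed layout into version, mtime and code."""
    magic, version, mtime = HEADER.unpack_from(data)
    if magic != PROPOSED_MAGIC:
        raise ValueError("not a .pyc file in the proposed format")
    # The offset comes from the header description, not from a literal 8 or 12.
    code = marshal.loads(data[HEADER.size:])
    return version, mtime, code
```

A tool written this way needs only the header description changed if the
layout grows another field.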

Regards, Peter

[1]: /etc/magic on Unices is a small textual database used by the 'file'
     command to identify the type of a file by looking at the first
     few bytes.  Unix file managers may either use /etc/magic directly
     or a similar scheme to associate files with mimetypes and/or default
     applications.