PEP 262: Database of Installed Python Packages

A.M. Kuchling akuchlin@mems-exchange.org
Wed, 27 Mar 2002 22:14:01 -0500


A package database is a necessary prerequisite for managing the Python
packages installed on a system.  PEP 262 lists the requirements for
such a database and specifies a storage format for it.

I'd like to get this into Python 2.3, hopefully with a
still-to-be-specified package management tool.  Assuming no one points
out some requirement or use case missing from this draft of the PEP,
my next step will be to write a proposed interface, post that draft,
and then implement the PEP and integrate it with the Distutils.

Comments can be posted to comp.lang.python or to the Distutils SIG.

-- 
A.M. Kuchling			http://www.amk.ca

PEP: 262
Title: A Database of Installed Python Packages
Version: $Revision: 1.5 $
Author: A.M. Kuchling <akuchlin@mems-exchange.org>
Type: Standards Track
Created: 08-Jul-2001
Status: Draft
Post-History: 27-Mar-2002

Introduction

    This PEP describes a format for a database of Python packages 
    installed on a system.


Requirements

    We need a way to figure out what packages, and what versions of
    those packages, are installed on a system.  We want to provide
    features similar to CPAN, APT, or RPM.  Required use cases that
    should be supported are:
 
        * Is package X on a system?  
        * What version of package X is installed?
        * Where can the new version of package X be found?  (This can
          be defined as either "a home page where the user can go and
          find a download link" or "a place where a program can find
          the newest version".  Both should probably be supported.)
        * What files did package X put on my system?
        * What package did the file x/y/z.py come from?
        * Has anyone modified x/y/z.py locally?


Database Location

    The database is stored as a set of files under
    <prefix>/lib/python<version>/install/.  This location will be
    called INSTALLDB throughout the remainder of this PEP.
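    As a rough sketch, assuming a Unix-style layout and that <version>
    is spelled "major.minor" (e.g. "2.3"), INSTALLDB could be computed
    from sys.prefix as follows.  The function name installdb_path is
    hypothetical, not part of any proposed API.

```python
import os
import sys


def installdb_path(prefix=sys.prefix, version_info=sys.version_info):
    """Return <prefix>/lib/python<version>/install as a path string.

    Assumes the Unix-style "lib/pythonX.Y" layout; spelling <version>
    as "X.Y" is an assumption, not something the PEP specifies.
    """
    version = "%d.%d" % (version_info[0], version_info[1])
    return os.path.join(prefix, "lib", "python" + version, "install")
```

    For example, installdb_path("/usr/local", (2, 3)) gives
    /usr/local/lib/python2.3/install on a POSIX system.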

    The structure of the database is deliberately kept simple; each
    file in this directory or its subdirectories (if any) describes a
    single package.
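    Because every file under INSTALLDB describes exactly one package, a
    reader can find all entries by walking the whole directory tree.  A
    minimal sketch, using the hypothetical helper iter_package_files:

```python
import os


def iter_package_files(installdb):
    """Yield the path of every file under INSTALLDB, including subdirectories.

    Each file describes one package, so every file found is yielded.
    Sorting keeps the traversal order deterministic.
    """
    for dirpath, dirnames, filenames in os.walk(installdb):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)
```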

    The rationale for scanning subdirectories is that we can move to a
    directory-based indexing scheme if INSTALLDB contains too many
    entries.  For example, this would let us transparently switch from
    INSTALLDB/Numeric to INSTALLDB/N/Nu/Numeric or some similar hashing
    scheme.
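    The INSTALLDB/N/Nu/Numeric example could be produced by a helper
    like the hypothetical hashed_entry_path below; it is only one
    possible scheme.  Readers that scan every subdirectory would keep
    working unchanged if writers switched to it.

```python
import os


def hashed_entry_path(installdb, package_name):
    """Map a package name to INSTALLDB/<1st char>/<first 2 chars>/<name>.

    Mirrors the INSTALLDB/N/Nu/Numeric example.  A one-character name
    simply reuses that character for both directory levels.
    """
    return os.path.join(installdb, package_name[:1], package_name[:2],
                        package_name)
```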


Database Contents

    Each file in INSTALLDB or its subdirectories describes a single
    package, and has the following contents:

        An initial line listing the sections in this file, separated
        by whitespace.  Currently this will always be 'PKG-INFO
        FILES'.  This is for future-proofing; if we add a new section,
        for example to list documentation files, we'd add a DOCS
        section and list it on this initial line.  Sections are always
        separated by blank lines.
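    A sketch of splitting a package file into its sections follows.  It
    assumes that the section bodies follow in the order named on the
    first line and that no blank line appears inside a section body;
    split_sections is a hypothetical name.

```python
def split_sections(text):
    """Split a package file into a {section name: section body} dict.

    The first line names the sections; the bodies that follow are
    separated by one or more blank lines.
    """
    lines = text.splitlines()
    names = lines[0].split()
    bodies = []
    current = []
    for line in lines[1:]:
        if line.strip():
            current.append(line)
        elif current:  # a blank line closes the current section
            bodies.append("\n".join(current))
            current = []
    if current:  # the last section may end without a blank line
        bodies.append("\n".join(current))
    if len(bodies) != len(names):
        raise ValueError("expected %d sections, found %d"
                         % (len(names), len(bodies)))
    return dict(zip(names, bodies))
```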

    PKG-INFO section

        An initial set of RFC-822 headers containing the package's
        metadata, as described in PEP 241, "Metadata for Python
        Software Packages".

        A blank line indicating the end of the PKG-INFO section.
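    Python's standard email package already parses RFC-822 headers, so
    a PKG-INFO section could be read as follows.  parse_pkg_info is a
    hypothetical wrapper around email.parser.Parser.

```python
from email.parser import Parser


def parse_pkg_info(section_text):
    """Parse a PKG-INFO section's RFC-822 headers into an email Message.

    Lookups on the result are case-insensitive: msg["Name"] and
    msg["name"] return the same value, and a missing header gives None.
    """
    return Parser().parsestr(section_text, headersonly=True)
```

    For the "where can the new version be found?" use case, a lookup
    such as msg["Home-page"] would return the package's home page if
    its metadata supplies one.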

    FILES section 
   
        An entry for each file installed by the package.  Generated
        files such as .pyc and .pyo files are listed as well as the
        original .py files installed by the package; the checksums of
        generated files won't be stored or checked, though.

    Each file's entry is a single tab-delimited line that contains
    the following fields: 

        * The file's full path, as installed on the system.

        * The file's size.

        * The file's permissions.  On Windows, this field will always be
          'unknown'.

        * The owner and group of the file, separated by a tab.
          On Windows, these fields will both be 'unknown'.

        * An MD5 digest of the file, encoded in hex.
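    Since owner and group occupy separate tab-delimited fields, each
    FILES line has six fields.  A minimal parser sketch follows.
    FileEntry and parse_file_entry are hypothetical names; permissions
    are kept as a raw string because their format isn't fixed here,
    and paths containing tab characters aren't handled.

```python
from collections import namedtuple

FileEntry = namedtuple("FileEntry", "path size permissions owner group md5")


def parse_file_entry(line):
    """Parse one tab-delimited FILES line into a FileEntry.

    Fields, in order: full path, size, permissions, owner, group, and
    the hex MD5 digest.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError("expected 6 tab-delimited fields, got %d"
                         % len(fields))
    path, size, permissions, owner, group, md5 = fields
    return FileEntry(path, int(size), permissions, owner, group, md5)
```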

    A package that uses the Distutils for installation should
    automatically update the database.  Packages that roll their own
    installation will have to use the database's API to manually add
    or update their own entry.  System package managers such as RPM or
    pkgadd can simply create a file named after the package in the
    INSTALLDB directory.
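    An installer that doesn't use the Distutils would have to produce
    FILES lines itself.  The sketch below makes several assumptions: it
    writes 'unknown' for permissions, owner, and group when the pwd and
    grp modules are missing (as on Windows), formats permissions as
    octal, falls back to numeric IDs when no user or group name exists,
    and leaves the checksum field empty for .pyc and .pyo files, since
    only the omission of their checksums is specified.  make_file_entry
    is a hypothetical name.

```python
import hashlib
import os
import stat

try:
    import grp
    import pwd
except ImportError:  # e.g. Windows, where these modules don't exist
    grp = pwd = None

GENERATED_SUFFIXES = (".pyc", ".pyo")  # checksums aren't stored for these


def md5_hex(path):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_file_entry(path):
    """Build one tab-delimited FILES line for an installed file."""
    full_path = os.path.abspath(path)
    st = os.stat(full_path)
    if pwd is None:
        # No pwd/grp modules (e.g. Windows): record these as 'unknown'.
        permissions = owner = group = "unknown"
    else:
        permissions = "%04o" % stat.S_IMODE(st.st_mode)  # assumed format
        try:
            owner = pwd.getpwuid(st.st_uid).pw_name
        except KeyError:  # uid with no name: fall back to the number
            owner = str(st.st_uid)
        try:
            group = grp.getgrgid(st.st_gid).gr_name
        except KeyError:  # gid with no name: fall back to the number
            group = str(st.st_gid)
    if full_path.endswith(GENERATED_SUFFIXES):
        checksum = ""  # assumption: empty field for generated files
    else:
        checksum = md5_hex(full_path)
    fields = [full_path, str(st.st_size), permissions, owner, group, checksum]
    return "\t".join(fields)
```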


Deliverables

    A description of the database API, to be added to this PEP.
  
    Patches to the Distutils that 1) implement an InstallationDatabase
    class, 2) update the database when a new package is installed, and
    3) provide a simple package management tool, whose features are to
    be added to this PEP (or to a separate PEP?).
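    To illustrate how the stored data answers the use cases listed
    under Requirements, here is a hypothetical InstallationDatabase
    sketch.  Its method names are illustrative only, since the real API
    is still to be specified.  It assumes each file holds exactly the
    PKG-INFO and FILES sections, with no blank line between the initial
    section list and the headers, and an empty checksum field for
    generated files.

```python
import hashlib
import os
from email.parser import Parser


class InstallationDatabase:
    """Hypothetical sketch of a read-only view of INSTALLDB."""

    def __init__(self, installdb):
        self.packages = {}  # package name -> parsed PKG-INFO headers
        self.files = {}     # installed path -> (package name, MD5 hex or "")
        for dirpath, dirnames, filenames in os.walk(installdb):
            for filename in filenames:
                with open(os.path.join(dirpath, filename)) as f:
                    self._load(f.read())

    def _load(self, text):
        # Skip the section-list line; this sketch assumes 'PKG-INFO FILES'.
        _section_list, _, rest = text.partition("\n")
        headers, _, remainder = rest.partition("\n\n")
        info = Parser().parsestr(headers, headersonly=True)
        name = info["Name"]
        self.packages[name] = info
        # FILES runs until the next blank line or the end of the file.
        files_block = remainder.split("\n\n", 1)[0]
        for line in files_block.splitlines():
            if line.strip():
                fields = line.split("\t")
                self.files[fields[0]] = (name, fields[5])

    def is_installed(self, name):
        """Is package X on the system?"""
        return name in self.packages

    def version(self, name):
        """What version of package X is installed?"""
        return self.packages[name]["Version"]

    def owner_of(self, path):
        """What package did this file come from?"""
        return self.files[path][0]

    def is_modified(self, path):
        """Has the file changed since installation? None if no checksum."""
        expected = self.files[path][1]
        if not expected:  # e.g. .pyc/.pyo entries carry no checksum
            return None
        with open(path, "rb") as f:
            return hashlib.md5(f.read()).hexdigest() != expected
```

    Lookups by path use the string exactly as stored, so a real
    implementation would probably normalize paths before comparing them.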


Rejected Suggestions

    Instead of using one text file per package, one large text file or
    an anydbm file could be used.  This has been rejected for a few
    reasons.  First, performance is probably not an extremely pressing
    concern, as the package database is only used when installing or
    removing packages, a relatively infrequent task.  Scalability also
    likely isn't a problem, as people may have hundreds of Python
    packages installed, but thousands seems unlikely.  Finally,
    individual text files are compatible with installers such as RPM
    or DPKG because a package can just drop the new database file into
    the database directory.  If one large text file or a binary file
    were used, the Python database would then have to be updated by 
    running a postinstall script.

    On Windows, the permissions and owner/group of a file aren't
    stored.  Windows does in fact support ownership and access
    permissions, but reading and setting them requires the win32all
    extensions, which aren't included in the basic Python installer
    for Windows.
  

Acknowledgements

    Ideas for this PEP originally came from postings by Greg Ward,
    Fred L. Drake Jr., Thomas Heller, Mats Wichmann, and others.

    Many changes and rewrites to this document were suggested by the
    readers of the Distutils SIG.   


Copyright

    This document has been placed in the public domain.