PEP262 - database of installed packages (Was: Re: Binary generation with distutils?)

Thomas Heller theller at python.net
Fri Mar 22 18:07:43 EST 2002


A.M. Kuchling wrote in message news:slrna9kb27.nme.akuchlin at ute.mems-exchange.org...
> In article <a7d8g8$khnnh$1 at ID-59885.news.dfncis.de>, Thomas Heller wrote:
> > Are you interested in my remarks to the XXX comments *now*?
>
> Of course!
>
> --amk
Ok, here we go.

I've written my comments into a checked-out copy of PEP 262 and ran
cvs diff -c8 on it; here is the result. (These are only comments;
I am not suggesting that the PEP be updated according to this diff.)

Thomas

Index: pep-0262.txt
===================================================================
RCS file: /cvsroot/python/python/nondist/peps/pep-0262.txt,v
retrieving revision 1.1
diff -c -8 -r1.1 pep-0262.txt
*** pep-0262.txt 9 Jul 2001 14:26:26 -0000 1.1
--- pep-0262.txt 22 Mar 2002 23:00:00 -0000
***************
*** 21,36 ****
--- 21,50 ----
      should be supported are:

          * Is package X on a system?
          * What version of package X is installed?
          * Where can the new version of package X be found?
            XXX Does this mean "a home page where the user can go and
            find a download link", or "a place where a program can find
            the newest version?"  Perhaps both...
+
+           THE (= Thomas Heller) thinks the first option. IMO a home
+           page for the user is much more useful until we get something
+           like CPAN. Source distributions are platform and version
+           independent, but binary distros may come in an awful lot of
+           different formats and filenames.
+
+           "a place where a program can find the newest version" could
+           or should IMO be implemented in a way where an extended
+           "package information file" would first be downloaded, which
+           would contain URLs for binary and source distributions
+           depending on platform and version and so on.
+
+
          * What files did package X put on my system?
          * What package did the file x/y/z.py come from?
          * Has anyone modified x/y/z.py locally?


  Database Location

      The database lives in a bunch of files under
***************
*** 54,69 ****
--- 68,91 ----

      XXX is the actual filename important?  Let's say the installation
      data for PIL is in the file INSTALLDB/Numeric.  Is this OK?  When
      we want to figure out if Numeric is installed, do we want to open
      a single file, or have to scan them all?  Note that for
      human-interface purposes, we'll often have to scan all the
      packages anyway, for a case-insensitive or keyword search.

+     THE: IMO typically a few dozen packages are installed, so we
+     probably have fewer than, say, one hundred files in the 'database',
+     which should reside in a single directory. Also it is not very
+     helpful to have the data for PIL in a file named Numeric. The
+     filename should be the package name.  Performance: I would guess
+     that reading the PKG-INFO section of these few dozen files would
+     typically take one or two seconds, so no database is needed.
+

  Database Contents

      Each file in INSTALLDB or its subdirectories describes a single
      package, and has the following contents:

          An initial line listing the sections in this file, separated
          by whitespace.  Currently this will always be 'PKG-INFO
***************
*** 76,126 ****
--- 98,184 ----
          containing the package information for a file, as described in
          PEP 241, "Metadata for Python Software Packages".

          A blank line indicating the end of the PKG-INFO section.

          An entry for each file installed by the package.
          XXX Are .pyc and .pyo files in this list?  What about compiled
          .so files?  AMK thinks "no" and "yes", respectively.
+
+         THE comments: The uninstall 'database' (a simple text file)
+         that bdist_wininst installers create contains entries for
+         *every* file to be removed; this includes .pyc and .pyo
+         files. On the other hand, MD5 digests or CRC checksums are
+         probably not so useful for .pyc/.pyo files.
+

      Each file's entry is a single tab-delimited line that contains the
      following fields:
      XXX should each file entry be all on one line and
      tab-delimited?  More RFC-822 headers?  AMK thinks tab-delimited
      seems sufficent.

+     THE: tab-delimited is fine.
+
          * The file's size

          * XXX do we need to store permissions?  The owner/group?
+         THE: no, not on Windows.

          * An MD5 digest of the file, written in hex.  (XXX All 16
            bytes of the digest seems unnecessary; first 8 bytes only,
            maybe?  Is a zlib.crc32() hash sufficient?)
+           THE: If the MD5 digest is calculated anyway, why throw
+           away part of it? A quick test shows that the MD5 digest
+           calculation takes about twice as long as zlib.crc32(),
+           so it could be used.

          * The file's full path, as installed on the system.  (XXX
            should it be relative to sys.prefix, or sys.prefix +
            '/lib/python<version>?'  If so, full paths are still needed;
            consider a package that installs a startup script such as
            /etc/init.d/zope)
+           THE: Only full paths should be in the database file.

          * XXX some sort of type indicator, to indicate whether this is
            a Python module, binary module, documentation file, config
            file?  Do we need this?
+           THE: No, we don't need this. There is currently no mechanism
+           for specifying it, and I cannot think of a use case where
+           it would be needed.

      A package that uses the Distutils for installation will
      automatically update the database.  Packages that roll their own
      installation

      XXX what's the relationship between this database and the RPM or
      DPKG database?  I'm tempted to make the Python database completely
      optional; a distributor can preserve the interface of the package
      management tool and replace it with their own wrapper on top of
      their own package manager.  (XXX but how would the Distutils know
      that, and not bother to update the Python database?)
+
+     THE: Maybe I don't understand what you are talking about here. I
+     have absolutely no idea how RPM or DPKG works, so I'm only
+     talking about bdist_wininst or maybe other Windows installers
+     here. win32all and wxPython are examples of very popular packages
+     *not* distributed with distutils, but with Windows installers
+     (WISE for win32all, InnoSetup for wxPython?).
+
+     1. The main purpose of the database discussed here is to determine
+     if a package is installed, and which version - so IMO it should
+     *not* be left optional to update this database, even for platform
+     specific installers like RPM or Windows installers.  My idea would
+     be to create the database file at build time, and simply copy it
+     into the database.
+
+     2. The existing python package managers I have seen so far (mainly
+     ciphon and pyppm) each have their own database (both have chosen
+     an XML based format), but they are certainly incompatible.


  Deliverables

      Patches to the Distutils that 1) implement a InstallationDatabase
      class, 2) Update the database when a new package is installed.  3)
      a simple package management tool, features to be added to this
      PEP.  (Or a separate PEP?)
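
To make the per-file entry discussed in the diff above a bit more concrete, here is a rough sketch (not part of the diff, and not a proposal for the PEP text). It assumes a current Python with hashlib, and it assumes the field order size, MD5 hex digest, full path; the PEP does not fix that order. The function names are made up for illustration.

```python
import hashlib
import os
import zlib


def file_entry(path):
    """Return one tab-delimited entry: size, MD5 hex digest, full path.

    The field order is an assumption made for this sketch.
    """
    full_path = os.path.abspath(path)
    with open(full_path, "rb") as f:
        data = f.read()
    digest = hashlib.md5(data).hexdigest()
    return "\t".join([str(len(data)), digest, full_path])


def parse_entry(line):
    """Split an entry back into (size, digest, path).

    The path comes last, so a tab inside the path does not break parsing.
    """
    size, digest, path = line.rstrip("\n").split("\t", 2)
    return int(size), digest, path


def is_modified(line):
    """Answer "has anyone modified this file locally?" for one entry."""
    size, digest, path = parse_entry(line)
    with open(path, "rb") as f:
        data = f.read()
    return len(data) != size or hashlib.md5(data).hexdigest() != digest


def crc32_hex(data):
    """The cheaper alternative mentioned in the XXX note: a 32-bit CRC in hex."""
    return "%08x" % (zlib.crc32(data) & 0xFFFFFFFF)
```

With full paths stored in each entry, is_modified() needs nothing besides the entry itself, which is one argument for keeping only full paths in the database file.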

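As a rough illustration of the "one file per package, named after the package" idea from the comments in the diff, the following sketch answers "is package X installed, and which version?" by opening a single file, and lists all packages by scanning the directory. It assumes the layout described in the PEP text quoted in the diff (an initial line listing the sections, then the PKG-INFO headers, then a blank line) and the Version header from PEP 241. The directory argument and the function names are hypothetical.

```python
import os
from email.parser import Parser


def installed_version(db_dir, package):
    """Return the Version header for `package`, or None if it is not installed."""
    path = os.path.join(db_dir, package)
    if not os.path.isfile(path):
        return None
    with open(path, encoding="utf-8") as f:
        f.readline()  # skip the initial line listing the sections
        headers = []
        for line in f:
            if not line.strip():
                break  # a blank line ends the PKG-INFO section
            headers.append(line)
    # PKG-INFO uses RFC-822 style headers, so the email parser can read it.
    return Parser().parsestr("".join(headers), headersonly=True)["Version"]


def list_installed(db_dir):
    """Scan every file in the directory and return {package name: version}."""
    return {name: installed_version(db_dir, name)
            for name in sorted(os.listdir(db_dir))
            if os.path.isfile(os.path.join(db_dir, name))}
```

Only the PKG-INFO section of each file is read, so the per-file entries never have to be parsed just to list what is installed.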





