[Python-Dev] PEP 376

Antoine Pitrou solipsis at pitrou.net
Mon Jun 22 16:59:24 CEST 2009


Hello,

Tarek Ziadé <ziade.tarek <at> gmail.com> writes:
> 
> so I am mailing it here again, for a new round of feedback, if needed.
> 
> - the pep : http://svn.python.org/projects/peps/trunk/pep-0376.txt

Some comments:

  - the **MD5** hash of the file, encoded in hex. Notice that `pyc` and `pyo`
    generated files will not have a hash.

Why the exception for pyc and pyo files?

  - `zlib` and `zlib-2.5.2.egg-info` are located in `site-packages` so the file
    paths are relative to it.

Is it a general rule? That is, the paths in RECORD are always relative to its
grandparent directory?

  The RECORD format
  -----------------

  The `RECORD` file is a CSV file, composed of records, one line per
  installed file. The ``csv`` module is used to read the file, with
  the `excel` dialect, which uses these options to read the file:

  - field delimiter : `,`
  - quoting char :  `"`.
  - line terminator : `\r\n`

Wouldn't it be better to use the native line terminator on the current
platform? (someone might want to edit or at least view the file)

What is the character encoding for non-ASCII filenames? UTF-8?

Are the RECORD file's contents somehow part of the DistributionMetadata?

  - ``DistributionDirectories``: manages ``EggInfoDirectory`` instances.

What is an EggInfoDirectory ?

A plural class name looks strange (I think it's the first time I see one in
the CPython codebase). How about another name? (DistributionPool,
DistributionMap, WorkingSet etc.).

  - ``get_egginfo_file(path, binary=False)`` -> file object

     Returns a file located under the `.egg-info` directory.

     Returns a ``file`` instance for the file pointed by ``path``.

Is it always a file instance or just a file-like object? (zipped distributions
come to mind). Is it opened read-only?

  - ``owner(path)`` -> ``Distribution`` instance or None

    If ``path`` is used by only one ``Distribution`` instance, returns it.
    Otherwise returns None.

This is a bit confusing. If it returns None, it doesn't distinguish between the
case where several Distributions refer to the path, and the case where no
distributions refer to it, does it?

Is there any reason to have this method while file_users(path) already exists?

  A new class called ``DistributionDirectories`` is created. It's a collection of
  ``DistributionDirectory`` and ``ZippedDistributionDirectory`` instances.
  The constructor takes one optional argument ``use_cache`` set to ``True`` by
  default.

You forgot to describe the constructor's signature and what it does exactly.

  ``EggInfoDirectories`` also provides the following methods besides the ones
  from ``dict``::

What is EggInfoDirectories?

  - ``append(path)``

    Creates an ``DistributionDirectory`` (or ``ZippedDistributionDirectory``)
    instance for ``path`` and adds it in the mapping.

  - ``load(paths)``

    Creates and adds ``DistributionDirectory`` (or
    ``ZippedDistributionDirectory``) instances corresponding to ``paths``.

Why are these methods named completely differently although they do almost the
same thing? Besides, append() makes it look like ordering matters. Does it?
(for a naming suggestion, e.g. load(path) and load_all(paths). Or,
even simpler, load(*paths) or load(paths))

  - ``get_file_users(path)`` -> Iterator of ``Distribution`` (or
    ``ZippedDistribution``) instances.

This method is named file_users in another class. Perhaps the naming should be
consistent?

  All these functions use the same global instance of ``DistributionDirectories`` 
  to use the cache.

Is the global instance available to users?

      >>> for path, hash, size in dist.get_installed_files()::
      ...     print '%s %s %d %s' % (path, hash, size)

There's one too many "%s" here.


Thanks for your work!

Antoine.




More information about the Python-Dev mailing list