[Python-Dev] PEP 376

Tarek Ziadé ziade.tarek at gmail.com
Mon Jun 22 17:42:24 CEST 2009


On Mon, Jun 22, 2009 at 4:59 PM, Antoine Pitrou <solipsis at pitrou.net> wrote:

>   - the **MD5** hash of the file, encoded in hex. Notice that `pyc` and
> `pyo`
>    generated files will not have a hash.
>
> Why the exception for pyc and pyo files?


As in PEP 262, since they are produced automatically from a py file,
checking the py file is enough
to decide if the file has changed.


>
>  - `zlib` and `zlib-2.5.2.egg-info` are located in `site-packages` so the
> file
>    paths are relative to it.
>
> Is it a general rule? That is, the paths in RECORD are always relative to
> its
> grandparent directory?



no, they can be located anywhere on the system. But when the paths are
located in the
same directory where the .egg-info directory is located, a relative path is
used. (see the section before this example)

I'll add an example that contains files located elswhere in the system (like
a script and a data file)



>
>
>  The RECORD format
>  -----------------
>
>  The `RECORD` file is a CSV file, composed of records, one line per
>  installed file. The ``csv`` module is used to read the file, with
>  the `excel` dialect, which uses these options to read the file:
>
>  - field delimiter : `,`
>  - quoting char :  `"`.
>  - line terminator : `\r\n`
>
> Wouldn't it be better to use the native line terminator on the current
> platform? (someone might want to edit or at least view the file)


Good idea, I'll change that,


>
>
> What is the character encoding for non-ASCII filenames? UTF-8?
>

Yes, there's a constant in Distutils, called PKG_INFO_ENCODING that will be
used for the file generation.



>
> Are the RECORD file's contents somehow part of the DistributionMetadata?



The DistributionMetadata correspond to the fields defined in PEP 345, e.g.
written in the PKG-INFO file,
which is mentioned in the RECORD file.

We are reworking PEP 345 as well, to add some fields. What did you have in
mind ?


>
>
>  - ``DistributionDirectories``: manages ``EggInfoDirectory`` instances.
>
> What is an EggInfoDirectory ?


Typo (old name), fixing this..


>
>
> A plural class name looks strange (I think it's the first time I see one in
> the CPython codebase). How about another name? (DistributionPool,
> DistributionMap, WorkingSet etc.).


Sure, WorkingSet is nice, it's the name used in setuptools,


>
>
>  - ``get_egginfo_file(path, binary=False)`` -> file object
>
>     Returns a file located under the `.egg-info` directory.
>
>     Returns a ``file`` instance for the file pointed by ``path``.
>
> Is it always a file instance or just a file-like object? (zipped
> distributions
> come to mind). Is it opened read-only?



It's in read-only mode, either "r" either "rb" and in case of a zip file,
it returns a file-like object using zipfile.ZipFile.open.


>
>  - ``owner(path)`` -> ``Distribution`` instance or None
>
>    If ``path`` is used by only one ``Distribution`` instance, returns it.
>    Otherwise returns None.
>
> This is a bit confusing. If it returns None, it doesn't distinguish between
> the
> case where several Distributions refer to the path, and the case where no
> distributions refer to it, does it?
>

The idea of this API is to find out of a distribution "owns" a file, e.g. is
the only distribution
that uses it, so it can be safely removed.


> Is there any reason to have this method while file_users(path) already
> exists?
>

Its just a helper for uninstallers, but file_users() could probably be
enough,
I can remove "owns" if people find it confusing,



>  A new class called ``DistributionDirectories`` is created. It's a
> collection of
>  ``DistributionDirectory`` and ``ZippedDistributionDirectory`` instances.
>  The constructor takes one optional argument ``use_cache`` set to ``True``
> by
>  default.
>
> You forgot to describe the constructor's signature and what it does
> exactly.
>

I'll add that,


>
>  ``EggInfoDirectories`` also provides the following methods besides the
> ones
>  from ``dict``::
>
> What is EggInfoDirectories?


This is a DistributionDirectories, one more instance I forgot to rename,
I'll fix that


>
>
>  - ``append(path)``
>
>    Creates an ``DistributionDirectory`` (or
> ``ZippedDistributionDirectory``)
>    instance for ``path`` and adds it in the mapping.
>
>  - ``load(paths)``
>
>    Creates and adds ``DistributionDirectory`` (or
>    ``ZippedDistributionDirectory``) instances corresponding to ``paths``.
>
> Why are these methods named completely differently although they do almost
> the
> same thing? Besides, append() makes it look like ordering matters. Does it?
> (for a naming suggestion, e.g. load(path) and load_all(paths). Or,
> even simpler, load(*paths) or load(paths))
>

Right, I'll fix that,



>
>  - ``get_file_users(path)`` -> Iterator of ``Distribution`` (or
>    ``ZippedDistribution``) instances.
>
> This method is named file_users in another class. Perhaps the naming should
> be
> consistent?


Right, it used to be get_*, that's a typo. I'll fix it,


>
>
>  All these functions use the same global instance of
> ``DistributionDirectories``
>  to use the cache.
>
> Is the global instance available to users?


No I didn't made it available to avoid concurrency problems,

>
>
>      >>> for path, hash, size in dist.get_installed_files()::
>      ...     print '%s %s %d %s' % (path, hash, size)
>
> There's one too many "%s" here.
>

Fixing it too,



>
>
> Thanks for your work!
>


Thanks for the feedback, I'll commit the fixes asap.

Tarek
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20090622/b2094aa6/attachment-0001.htm>


More information about the Python-Dev mailing list