[Python-Dev] zipimport.c broken with implicit namespace packages

Sun Jan 3 12:31:28 EST 2016

On Sat, 2 Jan 2016 at 21:31 <mike.romberg at comcast.net> wrote:

> >>>>> " " == Brett Cannon <brett at python.org> writes:
>
>      > I just wanted to quickly say that Guido's observation as to how
>      > a VFS is overkill is right. Imagine implementing a loader using
>      > sqlite and you quickly realize that doing a dull VFS is more
>      > than necessary to implement what import needs to function.
>
>   I fear I've made a poor choice in calling this abstract class a VFS
> (I'm terrible with names).  I'm not thinking of anything along the
> lines of a full file system that supports open(), seek(), read() and
> everything else.   That for sure would be overkill and way more
> complicated than it needs to be.
>
>   All I'm really thinking about is a simple abstract interface that is
> used by an importer to actually locate and retrieve the binary objects
> that will be loaded.  For the simple case I think just two methods
> would/could server a read only "blob/byte database":
>
>   listdir(path)  # returns an iterable container of "files"/"dirs" found
>                  # at path
>
>   get(path)      # returns a bytes object for the given path
>
>   I think with those two virtual calls a more high level import layer
> can locate and retrieve modules to be loaded without even knowing
> where they came from.
>
>   The higher level would know about things such as the difference
> between .py and .pyc "files" or the possible existance of __pycache__
> directories and what may be found in them.  Right now the zipimporter
> contains a list of file extensions to try and load (and in what
> order).  It also lacks any knowledge of __pycache__ directories (which
> is one of the outstanding bugs with it).   It just seems to me that
> this sorta logic would be better moved to a higher layer and the zip
> layer just translates paths into reads of byte blobs.
>
>   I mentioned write()/put() for two reasons:
>
>   1)  When I import a .py file then a .pyc file is created on my
>       filesystem.  I don't really know what piece of code created it.
>       But a write to the filesystem (assuming it is writeable and
>       permissions set etc) occurs.   It might be nice for other
>       storage systems (zip, sql, etc) could optionally support this.
>       They could if the code that crated the .pyc simply did a put()
>       to the object that pulled in the .py file.  The interface is
>       expanded by two calls (put() and delete()).
>
>   2)  Integration with package data.  I know there are
>       modules/packages out there that help a module try and locate
>       data files that may be associated with a package.  I think it
>       would be kinda cool for a module to instead be able to get a
>       handle to the abstract class that loaded it.  it could then use
>       the same listdir() get() and possibly write methods the importer
>       did.  The writing bit of this may or may not be a good idea :)
>
>
So it's possible that some abstraction might be possible, but up to this
point there hasn't been a motivating factor for importlib to do this as
zipimport is written in C and thus won't benefit from any abstraction that
importlib uses in its Python code (hence why zipimport needs to be
rewritten). Maybe after we have a zip-backed importer written in Python a
common API will fall through and we can abstract that out, but I wouldn't
want to guess until the code is written and someone can stare at the two
implementations.

It should also be said that there is nothing requiring that an importer
support any concept of a file. Going back to the sqlite database example,
you could make it nothing more than a table that maps module name to source
code and bytecode:

CREATE TABLE code (
    name TEXT PRIMARY KEY NOT NULL,
    source BLOB,
    bytecode BLOB
);
/* A trigger that re-generates `bytecode` when `source` is set would be
nice in this example, but that's beyond my SQL abilities. */

That's enough to implement an importer and there is no mention of file
paths anywhere. If you really wanted to could do a second table that acted
as a database-backed file system for reading package data files, but that
is not required to meet the minimum functionality -- and then some thanks
to bytecode support -- for an importer.

>
>   Anyway, hope I did not muddy the waters.  I was just thinking a bit
> out loud and none of this may live past my own experiments.   I was/am
> just trying to think of why the importers like the zipimporter don't
> work like a filesystem importer and how they would be cleaner if they
> just dealt with paths and byte blobs to store/get based on those paths.
>

It's a reasonable thing to consider, but it would be better to get
zipimport fixed for you, then rewritten, and then look at code for the
rewritten zipimport and what importlib.machinery.SourceFileLoader has to
see if there is a common API to abstract out and build upon.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20160103/ac7b0927/attachment.html>