[Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

Mon Aug 31 04:43:48 CEST 2009

On Sun, Aug 30, 2009 at 19:34, Guido van Rossum<guido at python.org> wrote:
> On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon<brett at python.org> wrote:
>> On Sun, Aug 30, 2009 at 17:24, Guido van Rossum<guido at python.org> wrote:
>>> On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon<brett at python.org> wrote:
>>>> I am going through and running the entire test suite using importlib
>>>> to ferret out incompatibilities. I have found a bunch, although all
>>>> rather minor (raising a different exception typically; not even sure
>>>> they are worth backporting as anyone reliant on the old exceptions
>>>> might get a nasty surprise in the next micro release), and now I am
>>>> down to my last failing test suite: test_import.
>>>>
>>>> Ignoring the execution bit problem (http://bugs.python.org/issue6526
>>>> but I have no clue why this is happening), I am bumping up against
>>>> TestPycRewriting.test_incorrect_code_name. Turns out that import
>>>> resets co_filename on a code object to __file__ before exec'ing it to
>>>> create a module's namespace in order to ignore the file name passed
>>>> into compile() for the filename argument. Now I can't change
>>>> co_filename from Python as it's a read-only attribute and thus can't
>>>> match this functionality in importlib w/o creating some custom code to
>>>> allow me to specify the co_filename somewhere (marshal.loads() or some
>>>> new function).
>>>>
>>>> My question is how important is this functionality? Do I really need
>>>> to go through and add an argument to marshal.loads or some new
>>>> function just to set co_filename to something that someone explicitly
>>>> set in a .pyc file? Or I can let this go and have this be the one
>>>> place where builtins.__import__ and importlib.__import__ differ and
>>>> just not worry about it?
>>>
>>> ISTR that Bill Janssen once mentioned a file replication mechanism
>>> whereby there were two names for each file: the "canonical" name on a
>>> replicated read-only filesystem, and the longer "writable" name on a
>>> unique master copy. He ended up with the filenames in the .pyc files
>>> being pretty bogus (since not everyone had access to the writable
>>> filesystem). So setting co_filename to match __file__ (i.e. the name
>>> under which the module is being imported) would be a nice service in
>>> this case.
>>>
>>> In general this would happen whenever you pre-compile a bunch of .py
>>> files to .pyc/.pyo and then copy the lot to a different location. Not
>>> a completely unlikely scenario.
>
>> Well, to get this level of compatibility I am going to need to add
>> some magical API somewhere then to overwrite a code object's "file"
>> location. Blah.
>
> Agreed, no fun. Unfortunately for core Python it really pays to go the
> extra mile...
>

Definitely, which is why I will do it, just not tonight as I am tired
of compatibility fixing for now. =)

>> I will either add an argument to marshal.loads to specify an
>> overriding file path or add an imp.exec that takes a file path
>> argument to override the code object with.
>
> Remember, there are many code objects created from one pyc file.
> Adding it to marshal.load*() makes sense because then it's usable for
> other purposes too, and that attacks the issue from the root.

That was my thinking.

> (in
> import.c it's done by update_compiled_module() right after
> read_compiled_module(), which is a thin wrapper around marshal.load())
> I'm not sure how imp.exec would make sure that introspection of the
> loaded code objects always gets the right thing.
>

Basically it would be imp.exec(module, code, path) and it would tweak
the code object before execution based on introspecting what the
module had set for __file__. But might as well add the support to
marshal.
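
For illustration only, here is a rough sketch of what such a rewrite has to
do, written against a much later Python than the one under discussion: it
assumes code.replace(), which only exists in Python 3.8 and newer. The helper
name fix_co_filename and both paths are made up for the example; this is not
the marshal change itself. The point it shows is that nested code objects
live in co_consts, so the fix has to recurse.

```python
import types


def fix_co_filename(code, new_filename):
    """Return a copy of `code` with co_filename set to `new_filename`.

    Hypothetical helper: nested code objects (functions, classes, lambdas)
    are stored in co_consts, so they are rewritten recursively as well.
    """
    new_consts = tuple(
        fix_co_filename(const, new_filename)
        if isinstance(const, types.CodeType)
        else const
        for const in code.co_consts
    )
    # Code objects are immutable; replace() builds a modified copy.
    return code.replace(co_filename=new_filename, co_consts=new_consts)


# Compile under one (made-up) path, then pretend the module is imported
# from a different location.
source = "def f():\n    def g():\n        pass\n    return g\n"
stale = compile(source, "/build/tmp/mod.py", "exec")
fixed = fix_co_filename(stale, "/opt/app/mod.py")

namespace = {"__file__": "/opt/app/mod.py"}
exec(fixed, namespace)
```

After this, both f and the inner g report the new path through
__code__.co_filename, while the original code object is left untouched.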

>>> (I was going to comment on the execution bit issue but I realized I'm
>>> not even sure if you're talking about import.c or not. :-)
>>
>> So it turns out a bunch of execution/write bit stuff has come up in
>> Python 2.7 and importlib has been ignoring it. =) Importlib has simply
>> been opening up the bytecode files with 'wb' and writing out the file.
>> But test_import tests that no execution bit gets set or that a write
>> bit gets added if the source file lacks it. I guess I can use
>> posix.chmod and posix.stat to copy the source file's read and write
>> bits and always mask out the execution bits. I hate this low-level
>> file permission stuff.
>
> It's no fun -- see the layers of #ifdefs in open_exclusive() in
> import.c. (Though I think you won't need to worry about VMS. :-) But
> it's somewhat important to get it right from a security POV. I would
> use os.open() and wrap an io.BufferedWriter around it.

I will have to see what of that is implemented in C or in Python. I
have always tried to keep all pure Python code out of importlib for
bootstrapping reasons in order to keep the possibility of using
importlib as the implementation of import. But maybe I should not be
worrying about that right at the moment and instead do what keeps the
code simple.
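
A minimal sketch of the os.open() approach, assuming the policy described
above (copy the source file's read and write bits, always mask out the
execution bits). The function name write_bytecode and the file names are
invented for the example, and the payload is a placeholder rather than real
bytecode. The process umask may clear further bits, so the result can end up
stricter than the source file's mode but never looser.

```python
import io
import os
import stat
import tempfile


def write_bytecode(source_path, bytecode_path, data):
    """Write `data` to `bytecode_path` with the source's read/write bits.

    Hypothetical helper: execute bits are never carried over.
    """
    mode = stat.S_IMODE(os.stat(source_path).st_mode) & 0o666
    try:
        # Remove any old file so the exclusive create below can succeed.
        os.unlink(bytecode_path)
    except FileNotFoundError:
        pass
    # O_BINARY only exists on Windows; fall back to 0 elsewhere.
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_BINARY", 0)
    fd = os.open(bytecode_path, flags, mode)
    # FileIO takes ownership of the descriptor and closes it on exit.
    with io.BufferedWriter(io.FileIO(fd, "w")) as f:
        f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "mod.py")
    pyc = os.path.join(tmp, "mod.pyc")
    with open(src, "w") as f:
        f.write("x = 1\n")
    os.chmod(src, 0o755)  # an executable source file
    write_bytecode(src, pyc, b"placeholder payload")
    pyc_mode = stat.S_IMODE(os.stat(pyc).st_mode)
    with open(pyc, "rb") as f:
        written = f.read()
```

The mode is passed at creation time instead of being fixed up with a later
chmod, so the file never exists with looser permissions than intended.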

-Brett
