[Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

Brett Cannon brett at python.org
Mon Aug 31 21:00:48 CEST 2009


On Mon, Aug 31, 2009 at 10:02, Guido van Rossum<guido at python.org> wrote:
> On Mon, Aug 31, 2009 at 9:57 AM, Brett Cannon<brett at python.org> wrote:
>> On Mon, Aug 31, 2009 at 09:33, Guido van Rossum<guido at python.org> wrote:
>>> Hm... I still wonder if there would be bad side effects of making
>>> co_filename writable, but I can't think of any, so maybe you can make
>>> this work... The next step would be to not write it out when
>>> marshalling a code object -- this might save a bit of space in pyc
>>> files too! (I guess for compatibility you might want to write it as an
>>> empty string.)
>>
>> I would only want to consider stripping out the filename from the
>> marshal format if a filename argument to marshal.load* was required to
>> guarantee that code objects always in some sensible state. Otherwise
>> everyone would end up with tracebacks that made no sense by default.
>> But adding a required argument to marshal.load* would be quite the
>> pain for compatibility.
>
> Well... It would be, but consider this: marshal.load() already takes a
> file argument; in most cases you can extract the name from the file
> easily. And for marshal.loads(), I'm not sure that the filename baked
> into the data is all that reliable anyways.
>
>>> Of course, tracking down all the code objects in the return value of
>>> marshal.load*() might be a bit tricky -- API-wise I still think that
>>> making it an argument to marshal.load*() might be simpler. Also it
>>> would preserve the purity of code objects.
>>>
>>> (Michael: it would be fine if *other* implementations of Python made
>>> co_filename writable, as long as you can't think of security issues
>>> with this.)
>>
>> OK, so what does co_filename get used for? I think it is referenced to
>> open files for use in printing out the traceback. Python won't be able
>> to open files that you can't as a user, so that shouldn't be a
>> security risk. All places where co_filename is referenced would need
>> to gain a check or start using some new C function/macro which
>> verified that co_filename was a string and not some number or
>> something else which wouldn't get null-terminated and thus lead to
>> buffer overflow.
>
> You could also do the validation on assignment.
>
>> A quick grep for co_filename turns up 17 uses in C
>> code, although having to add some check would ruin the purity Guido is
>> talking about and make a single attribute on code objects something
>> people have to be careful about instead of having a guarantee that all
>> attributes have some specific type of value.
>>
>> I'm with Guido; I would rather add an optional argument to
>> marshal.load*. It must be a string and, if present, is used to
>> override co_filename in the resulting code object. Once we have had
>> the argument around we can then potentially make it a required
>> argument and have file paths in the marshal data go away (or decide to
>> default to some string constant when people don't specify the path
>> argument).
>
> Actually that sounds like a fine transitional argument.

I will plan to take this approach then;
http://bugs.python.org/issue6811 will track all of this. Since this is
a 3.2 thing I am not going to rush to implement this.

-Brett


More information about the Python-Dev mailing list