[Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

Brett Cannon brett at python.org
Mon Aug 31 18:57:13 CEST 2009


On Mon, Aug 31, 2009 at 09:33, Guido van Rossum<guido at python.org> wrote:
> On Mon, Aug 31, 2009 at 9:27 AM, Brett Cannon<brett at python.org> wrote:
>> On Mon, Aug 31, 2009 at 08:10, Antoine Pitrou<solipsis at pitrou.net> wrote:
>>> Benjamin Peterson <benjamin <at> python.org> writes:
>>>>
>>>> > Why can't we simply make co_filename a writable attribute instead of
>>>> > inventing some complicated API?
>>>>
>>>> Because code objects are supposed to be immutable, hashable objects?
>>>
>>> Right, but co_filename is used neither in tp_hash nor in tp_richcompare.
>>
>> I didn't suggest this since I assumed co_filename was made read-only
>> for a reason back when the design decision was made. But if the
>> original safety concerns are not there then I am happy to simply
>> change the attribute to writable.
>
> Hm... I still wonder if there would be bad side effects of making
> co_filename writable, but I can't think of any, so maybe you can make
> this work... The next step would be to not write it out when
> marshalling a code object -- this might save a bit of space in pyc
> files too! (I guess for compatibility you might want to write it as an
> empty string.)

I would only want to consider stripping out the filename from the
marshal format if a filename argument to marshal.load* were required, to
guarantee that code objects are always in some sensible state. Otherwise
everyone would end up with tracebacks that made no sense by default.
But adding a required argument to marshal.load* would be quite the
pain for compatibility.
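
To make the current behaviour concrete, here is a minimal sketch showing
that the filename passed to compile() travels inside the marshal data and
comes back unchanged on load. The path and the source string are made up
for illustration.

```python
import marshal

# Illustrative source and path; neither needs to exist on disk.
source = "def f():\n    raise ValueError('boom')\n"
code = compile(source, "/original/path/mod.py", "exec")

# Marshalling stores co_filename as part of the code object.
data = marshal.dumps(code)

# Unmarshalling restores it, whatever location the bytes were read from.
restored = marshal.loads(data)
print(restored.co_filename)

# The path is present in the serialized bytes themselves.
path_in_payload = b"/original/path/mod.py" in data
print(path_in_payload)
```

If the path were dropped from the format and no replacement supplied, the
restored code object would have nothing useful to report in a traceback.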

>
> Of course, tracking down all the code objects in the return value of
> marshal.load*() might be a bit tricky -- API-wise I still think that
> making it an argument to marshal.load*() might be simpler. Also it
> would preserve the purity of code objects.
>
> (Michael: it would be fine if *other* implementations of Python made
> co_filename writable, as long as you can't think of security issues
> with this.)

OK, so what does co_filename get used for? I think it is used to
open the source file when printing out a traceback. Python can't open
files that you as a user can't open, so that shouldn't be a
security risk. But every place where co_filename is referenced would
need to gain a check, or start using some new C function/macro, which
verifies that co_filename is a string and not some number or
other object that isn't null-terminated and could thus lead to a
buffer overflow. A quick grep for co_filename turns up 17 uses in C
code. Having to add such a check would ruin the purity Guido is
talking about, and it would make a single attribute on code objects
something people have to be careful about instead of having a guarantee
that all attributes have some specific type of value.
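
As a rough illustration of both points (the attribute is read-only, and
tracebacks report whatever co_filename says), here is a small sketch. It
assumes CPython; the path is invented and does not need to exist.

```python
import traceback

# Compile with a made-up filename.
code = compile("1 / 0", "/no/such/file.py", "exec")

# Plain attribute assignment is rejected on CPython code objects.
try:
    code.co_filename = "/somewhere/else.py"
    writable = True
except (AttributeError, TypeError):
    writable = False

# The traceback entry for the executed code carries co_filename verbatim.
try:
    exec(code)
except ZeroDivisionError as exc:
    frames = traceback.extract_tb(exc.__traceback__)

reported = frames[-1].filename
print(writable, reported)
```

Because the attribute can't be assigned, consumers can rely on it being
whatever compile() or marshal put there.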

I'm with Guido; I would rather add an optional argument to
marshal.load*. It would have to be a string and, if present, would be
used to override co_filename in the resulting code object. Once the
argument has been around for a while we can then potentially make it a
required argument and have file paths in the marshal data go away (or
decide to default to some string constant when people don't specify the
path argument).
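
A pure-Python sketch of what such an optional argument might do is below.
The helper name loads_with_filename is hypothetical and not part of the
marshal module. It assumes a Python that has CodeType.replace() (3.8 or
later), and it builds new code objects rather than mutating existing
ones. It only handles the case where the unmarshalled value is itself a
code object.

```python
import marshal
import types


def _override(obj, filename):
    """Return obj with co_filename replaced, recursing into nested code."""
    if not isinstance(obj, types.CodeType):
        return obj
    # Nested functions and classes live in co_consts as code objects.
    new_consts = tuple(_override(c, filename) for c in obj.co_consts)
    return obj.replace(co_filename=filename, co_consts=new_consts)


def loads_with_filename(data, filename=None):
    """Hypothetical helper: unmarshal data, optionally overriding co_filename."""
    if filename is not None and not isinstance(filename, str):
        raise TypeError("filename must be a string")
    obj = marshal.loads(data)
    if filename is None:
        return obj  # keep whatever path was stored in the data
    return _override(obj, filename)


# Illustrative paths: compiled in one place, loaded from another.
source = "def outer():\n    def inner():\n        pass\n    return inner\n"
data = marshal.dumps(compile(source, "/build/tmp/mod.py", "exec"))
code = loads_with_filename(data, "/install/lib/mod.py")
print(code.co_filename)
```

Doing the override at load time means callers never have to track down
the nested code objects themselves.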

-Brett

