[Python-ideas] Prevent importing yourself?

Sat Jan 30 01:58:51 EST 2016

On Fri, Jan 29, 2016 at 04:16:46PM -0800, Andrew Barnert wrote:
> On Jan 29, 2016, at 15:42, Sjoerd Job Postmus <sjoerdjob at sjec.nl> wrote:
> > 
> > What I experienced was having collisions on the python-path, and modules
> > from my codebase colliding with libraries in the stdlib (or outside it).
> > For example, a library might import one of its dependencies which
> > coincidentally had the same name as one of the libraries I have.
> 
> Yes. The version of this I've seen most from novices is that they write a program named "json.py" that imports and uses requests, which tries to use the stdlib module json, which gives them an AttributeError on json.loads.
> 
> (One of my favorite questions on StackOverflow came from a really smart novice who'd written a program called "time.py", and he got an error about time.time on one machine, but not another. He figured out that obviously, requests wants him to define his own time function, which he was able to do by using the stuff in datetime. And he figured out the probable difference between the two machines--the working one had an older version of requests. He just wanted to know why requests didn't document this new requirement that they'd added. :))
> 
> > Maybe a suggestion would be to add the path of the module to the error
> > message?
> 
> That would probably help, but think about what it entails:
> 
> Most AttributeErrors aren't on module objects, they're on instances of user-defined classes with a typo, or on None because the user forgot a "return" somewhere, or on str because the user didn't realize the difference between the string representation of an object and the object itself, etc.

True. Most AttributeErrors are on user-defined classes with a typo. But
that's not the case we're discussing here. Here we are discussing how a
user should debug the effects of module name collisions, and the
resulting AttributeError.

I would expect it to be quite unlikely that two modules with the same
name each have a class with the same name, and you accidentally
initialize the wrong one.

More likely (in my experience) is that you get an AttributeError on a
module (in the case of module-name collisions).
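
To make the failure mode concrete, here is a minimal sketch that reproduces
it. The file names main.py and the nearly empty json.py are assumptions for
illustration only, and reproduce_shadowing is a hypothetical helper name.

```python
import os
import subprocess
import sys
import tempfile


def reproduce_shadowing():
    """Run a script that sits next to a local json.py and return its stderr."""
    with tempfile.TemporaryDirectory() as tmp:
        # A user file that happens to share its name with the stdlib module.
        with open(os.path.join(tmp, "json.py"), "w") as f:
            f.write("# my own json experiments\n")
        # The script's directory is normally first on sys.path, so this
        # import picks up the local json.py instead of the stdlib package.
        with open(os.path.join(tmp, "main.py"), "w") as f:
            f.write("import json\nprint(json.loads('[1, 2]'))\n")
        result = subprocess.run(
            [sys.executable, "main.py"],
            cwd=tmp,
            capture_output=True,
            text=True,
        )
        return result.stderr


if __name__ == "__main__":
    print(reproduce_shadowing())
```

The traceback ends in an AttributeError about 'loads' on the module, even
though the script itself never mentions a local json.py.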

> To make matters worse, AttributeError objects don't even carry the name of the object being attributed, so even if you wanted to make tracebacks do some magic if isinstance(obj, types.ModuleType), there's no way to do it.
> 
> So, that means you'd have to make ModuleType.__getattr__ do the special error message formatting. 

Yes, indeed. That's what I was thinking of. I decided to write up a quick hack that added the filename to the exception string.

    sjoerdjob$ ../python mod_a.py 
    Traceback (most recent call last):
      File "mod_a.py", line 4, in <module>
        print(parse(JSON_DATA))
      File "/home/sjoerdjob/dev/cpython/tmp/mod_b.py", line 4, in parse
        return json.loads(blob)
    AttributeError: module 'json' (loaded from /home/sjoerdjob/dev/cpython/tmp/json.py) has no attribute 'loads'

Here's the patch, in case anyone is interested.

    diff --git a/Objects/moduleobject.c b/Objects/moduleobject.c
    index 24c5f4c..5cc144a 100644
    --- a/Objects/moduleobject.c
    +++ b/Objects/moduleobject.c
    @@ -654,17 +654,25 @@ module_repr(PyModuleObject *m)
     static PyObject*
     module_getattro(PyModuleObject *m, PyObject *name)
     {
    -    PyObject *attr, *mod_name;
    +    PyObject *attr, *mod_name, *mod_file;
         attr = PyObject_GenericGetAttr((PyObject *)m, name);
         if (attr || !PyErr_ExceptionMatches(PyExc_AttributeError))
             return attr;
         PyErr_Clear();
         if (m->md_dict) {
             _Py_IDENTIFIER(__name__);
             mod_name = _PyDict_GetItemId(m->md_dict, &PyId___name__);
             if (mod_name) {
    -            PyErr_Format(PyExc_AttributeError,
    +            _Py_IDENTIFIER(__file__);
    +            mod_file = _PyDict_GetItemId(m->md_dict, &PyId___file__);
    +            if (mod_file && PyUnicode_Check(mod_file)) {
    +                PyErr_Format(PyExc_AttributeError,
    +                        "module '%U' (loaded from %U) has no attribute '%U'", mod_name, mod_file, name);
    +            } else {
    +                PyErr_Format(PyExc_AttributeError,
                             "module '%U' has no attribute '%U'", mod_name, name);
    +            }
                 return NULL;
             }
             else if (PyErr_Occurred()) {
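
For readers who would rather not read the C, roughly the same logic can be
sketched in Python. This is only an illustration, not how the patch is
implemented: PathReportingModule is a hypothetical ModuleType subclass,
whereas the real change lives in module_getattro and applies to every module.

```python
import types


class PathReportingModule(types.ModuleType):
    """Hypothetical stand-in for the patched module_getattro."""

    def __getattr__(self, name):
        # __getattr__ only runs after normal attribute lookup has failed,
        # which mirrors the patch: PyObject_GenericGetAttr is tried first.
        mod_name = self.__dict__.get("__name__")
        mod_file = self.__dict__.get("__file__")
        if isinstance(mod_file, str):
            raise AttributeError(
                f"module {mod_name!r} (loaded from {mod_file}) "
                f"has no attribute {name!r}"
            )
        # No usable __file__ (e.g. built-in modules): keep the old message.
        raise AttributeError(f"module {mod_name!r} has no attribute {name!r}")


if __name__ == "__main__":
    mod = PathReportingModule("json")
    mod.__file__ = "/home/sjoerdjob/dev/cpython/tmp/json.py"
    try:
        mod.loads
    except AttributeError as exc:
        print(exc)
```

As in the patch, the path is only looked up once the attribute lookup has
already failed, so successful lookups take the same route as before.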

Unfortunately, I do think the patch might add some overhead, but the
extra __file__ lookup only happens after the attribute lookup has already
failed, and I'd expect AttributeErrors on module objects to be rare to
begin with, except for typos and issues like these. (And of course there
is the case where you support older versions of Python by falling back to
a slower implementation, but those checks are usually done at module
level, so it would only affect load time and not run time.)

The added benefit would be quicker debugging once the problem has been
posted to a forum: "Ah, I see from the message that the path of the module
is not likely a standard-library path. Maybe you have a name collision?
Check for files or directories named '<module name here>(.py)' in your
working directory / project / ..."
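
The check a forum helper would do by eye can also be approximated in code.
A rough sketch, assuming that "standard-library path" means the directory
reported by sysconfig; looks_like_stdlib_path is a hypothetical helper name.
On many installs site-packages sits under that same directory, so
third-party modules will pass as well.

```python
import os
import sysconfig


def looks_like_stdlib_path(module_file):
    """Heuristic: is module_file located under the stdlib directory?"""
    stdlib_dir = os.path.normcase(
        os.path.realpath(sysconfig.get_paths()["stdlib"])
    )
    candidate = os.path.normcase(os.path.realpath(module_file))
    try:
        return os.path.commonpath([stdlib_dir, candidate]) == stdlib_dir
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


if __name__ == "__main__":
    import json

    print(json.__file__, looks_like_stdlib_path(json.__file__))
```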

> 
> > (Of course, another option would be to look for other modules of the
> > same name when you get an attribute-error on a module to aid debugging,
> > but I think that's too heavy-weight.)
> 
> If that could be done only when the exception escapes to top level and dumps a traceback, that might be reasonable. And it would _definitely_ be helpful. But I don't think it's possible without major changes.

No, indeed, that was also my expectation: helpful, but too big a hassle
to be worth it.
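
For reference, the heavier-weight option of looking for other modules of
the same name could be sketched roughly as follows. find_same_named_modules
is a hypothetical helper; it only consults the path-based finder, so
built-in and frozen modules are not reported.

```python
import importlib.machinery
import sys


def find_same_named_modules(name, search_path=None):
    """Return the origin of every module called `name`, in import order."""
    origins = []
    for entry in sys.path if search_path is None else search_path:
        # Ask the path finder about one entry at a time, so that a match
        # early on the path does not hide the later ones.
        spec = importlib.machinery.PathFinder.find_spec(name, [entry])
        if spec is None or spec.origin is None:
            continue
        if spec.origin not in origins:
            origins.append(spec.origin)
    return origins


if __name__ == "__main__":
    for origin in find_same_named_modules("json"):
        print(origin)
```

More than one result means something on the path shadows something else;
the first entry is the one a plain import would pick up.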