[Web-SIG] A more useful command-line wsgiref.simple_server?

Graham Dumpleton graham.dumpleton at gmail.com
Fri Mar 30 23:12:23 CEST 2012


On 31 March 2012 06:58, Geoffrey Spear <geoffspear at gmail.com> wrote:
> On Fri, Mar 30, 2012 at 3:20 PM, Masklinn <masklinn at masklinn.net> wrote:
>> 2. You seem to have asserted from the start that the default should be
>>   mounting modules, but I have seen no evidence or argument in favor of
>>   that so far.
>>
>>   Defaulting to scripts not only works with both local modules and
>>   arbitrary files and follow cpython's (and most tools's) own behavior,
>>   but would also allows using -mwsgiref.simple_server as a shebang
>>   line. I find this to have quite a lot of value.
>
> I may be dense, but is there actually a use case for using a WSGI
> application from a script? Presumably a script that defines a WSGI
> application would also run it.

Some history for you.

Seeing the file containing a WSGI application entry point as a file
rather than a module derives from how Apache works.

Take for example CGI under Apache, one can say for a directory context:

  AddHandler cgi-script .py

What this means is that any file with a .py extension is executed as
a CGI script. Thus, it would have to be an executable file and have an
appropriate #! line which can resolve the Python interpreter to use.
In this case the extension used is actually irrelevant.

When mod_python came along it allowed one instead to say:

  AddHandler python-script .py

The way mod_python then originally worked was that when it resolved a
URL to a directory containing the target .py file, it would add that
directory to sys.path and import the module based on the basename of
the target file. It would then execute the entry point callable within
the loaded module.

No #! line was needed, nor did the file need to be executable. The
first wasn't needed because mod_python dictated what Python version
was used.

The problem with what mod_python did was that the AddHandler can span
multiple directories. As target files in each directory were accessed,
each directory would get added into sys.path to be able to import it.

Because these are normal file system directories and treated as
separate module directories and not part of an overall package
structure, there was nothing to stop you having the same name file in
each directory. It was common for example to have:

  DirectoryIndex index.py

This means that if the directory itself was the target, it would use
the index.py in the directory as means of generating the directory
index.

If more than one directory was added to sys.path containing an
index.py file, you can only have one loaded as a module, not both.

Thus you ended up with the in-memory instance of the 'index' module
that was loaded first being used rather than the second one
encountered, or depending on sys.path ordering, you could import the
'index' module from the wrong directory. Basically, things were a bit
unpredictable if you ever used the same file name more than once.
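
As a rough sketch of that failure mode in plain Python (this is not
mod_python's actual code; the directory names and the load_handler
helper are made up for illustration):

```python
import importlib
import os
import sys
import tempfile

# Two hypothetical document directories, each with its own index.py.
root = tempfile.mkdtemp()
dirs = []
for site in ("site_a", "site_b"):
    directory = os.path.join(root, site)
    os.mkdir(directory)
    with open(os.path.join(directory, "index.py"), "w") as f:
        f.write("WHO = %r\n" % site)
    dirs.append(directory)


def load_handler(directory, basename):
    # Hypothetical stand-in for the early approach: put the directory
    # on sys.path, then import the module by its basename.
    if directory not in sys.path:
        sys.path.insert(0, directory)
    importlib.invalidate_caches()
    return importlib.import_module(basename)


first = load_handler(dirs[0], "index")
second = load_handler(dirs[1], "index")

# The second request gets the module cached in sys.modules under the
# name 'index', even though its own directory is now first on sys.path.
print(first.WHO, second.WHO)  # site_a site_a
print(first is second)        # True
```

Which 'index' you get depends on which directory was hit first in the
lifetime of the process, which is the unpredictability described
above.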

There were various other things that could go wrong as well.

In later versions of mod_python the whole module importing system was
rewritten to avoid adding directories into sys.path. Instead a custom
module importer was used with special lookup rules to find modules in
directories by itself.

Further, when modules were loaded, the __name__ of the module was not
just the basename of the file, but a magic string taking into account
the full file path name. By doing this, even though index.py may occur
in separate places, they would be distinct modules in memory.

The complexity of still allowing relative module imports from the same
directory, to simulate things as if the directory was in sys.path, was
frightening though. Add to that the fact that mod_python had a
reloading mechanism which looked not just at the immediate file, but
at all sub modules imported from the directories managed by the
mod_python custom module importer, so that a reload was also triggered
when one of the used modules was changed and not just the top level
one.

Now when doing mod_wsgi, a similar method of loading each file
separately with a __name__ based on file system path was used to
ensure each was distinct when the same file name was used in different
directories.
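
For anyone wondering what loading a file by path under a path-derived
__name__ can look like, here is a minimal sketch using the standard
importlib machinery. It is an illustration only, not what mod_wsgi
does internally: the load_script helper and the "_script_" + hash
naming scheme are assumptions invented for this example.

```python
import hashlib
import importlib.util
import os
import sys
import tempfile
from importlib.machinery import SourceFileLoader


def load_script(path):
    # Hypothetical helper: build a module from an arbitrary file path,
    # whatever its extension, without touching sys.path.
    path = os.path.abspath(path)
    # Made-up naming scheme: derive a unique __name__ from the full path.
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
    name = "_script_" + digest
    loader = SourceFileLoader(name, path)
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


# Two directories that both contain a file called index.wsgi.
root = tempfile.mkdtemp()
paths = []
for site in ("site_a", "site_b"):
    directory = os.path.join(root, site)
    os.mkdir(directory)
    path = os.path.join(directory, "index.wsgi")
    with open(path, "w") as f:
        f.write(
            "def application(environ, start_response):\n"
            "    start_response('200 OK', [('Content-Type', 'text/plain')])\n"
            "    return [%r]\n" % site.encode("ascii")
        )
    paths.append(path)

mod_a, mod_b = [load_script(p) for p in paths]

# Same file name, but two distinct modules with distinct names.
print(mod_a is mod_b)                    # False
print(mod_a.__name__ == mod_b.__name__)  # False
```

Because the name comes from the full path, the two index.wsgi files
never compete for the same slot in sys.modules.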

What mod_wsgi didn't do though was replicate the custom module
importer that mod_python had as that really was a nightmare.

This meant that relative module imports from the same directory would
not work. If someone really wanted that, they would need to add the
directory to sys.path themselves.

Once they did that though, because the target file as loaded by
mod_wsgi had a __name__ which didn't match the basename of the file,
if someone tried to import that module file back into something else,
they would end up with two copies in memory. The first being the magic
one mod_wsgi loaded as a file and the other loaded as a module.
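
The two-copies effect is easy to reproduce outside of Apache. The
following is only a sketch: the file name myapp_demo.py and the
synthetic name "_script_myapp_demo" are invented for the example and
are not the names mod_wsgi generates.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

directory = tempfile.mkdtemp()
path = os.path.join(directory, "myapp_demo.py")
with open(path, "w") as f:
    f.write("registry = []\n")

# 1. Load the file by path under a synthetic name, as a server might.
spec = importlib.util.spec_from_file_location("_script_myapp_demo", path)
as_script = importlib.util.module_from_spec(spec)
spec.loader.exec_module(as_script)

# 2. The user adds the directory to sys.path and imports it normally.
sys.path.insert(0, directory)
importlib.invalidate_caches()
as_module = importlib.import_module("myapp_demo")

# The file has been executed twice; each copy has its own globals.
as_script.registry.append("set via the script copy")
print(as_script is as_module)  # False
print(as_module.registry)      # []
```

Any module level state, such as caches, registries or database
connections, silently exists twice in this situation.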

To make it more obvious that they were treated a bit differently, and
to avoid people making this mistake, using a .wsgi extension for the
WSGI script file rather than .py was promoted. That way people would
not go inadvertently importing it a second time.

Further, because of the way that the .wsgi script file was loaded,
i.e., not as a regular module import, and with __name__ being special,
there were certain things you couldn't do in it.

One example was that you could not put class definitions in it for
objects which you then pickled instances of. This is because when
unpickling, Python would not know how to automatically import the
module containing the class definitions, because the __name__ would
not have any meaning to the Python process where the data was being
unpickled.
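
A small sketch of the pickle problem, again with invented names
(handler.wsgi, Session and "_script_handler_demo" are assumptions for
illustration). Removing the entry from sys.modules stands in for the
pickle being read by a different process that never loaded the script
file:

```python
import importlib.util
import os
import pickle
import sys
import tempfile
from importlib.machinery import SourceFileLoader

directory = tempfile.mkdtemp()
path = os.path.join(directory, "handler.wsgi")
with open(path, "w") as f:
    f.write(
        "class Session(object):\n"
        "    def __init__(self, user):\n"
        "        self.user = user\n"
    )

# Load the script file under a synthetic, non-importable name.
name = "_script_handler_demo"
loader = SourceFileLoader(name, path)
spec = importlib.util.spec_from_loader(name, loader)
script = importlib.util.module_from_spec(spec)
sys.modules[name] = script
loader.exec_module(script)

# Pickling works here: the class can be found via sys.modules.
data = pickle.dumps(script.Session("graham"))

# Simulate another process where the synthetic module was never loaded.
del sys.modules[name]

try:
    pickle.loads(data)
    outcome = "unpickled"
except ImportError as exc:
    # pickle tries a normal import of the recorded module name, which
    # cannot be resolved through sys.path.
    outcome = "failed: %s" % exc

print(outcome)
```

The pickle only records the module name and class name, so the reader
has to be able to import that name in the normal way.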

So that is the history of why there is a distinction between WSGI
script file and module containing a WSGI application in Apache at
least. Although it is called 'script file', it is really only 'file'
as it isn't executable itself in the sense of what people generally
talk about when they say 'script'.

FWIW, in the past when pushing the idea of a WSGI script file being
the lowest common denominator, part of the reason I found I couldn't
get it accepted is that some people simply didn't understand how in
Python to load an arbitrary file by path name and construct a module
for it in memory, with a magic __name__. They seemed to think that the
only way to import a code file was for it to have a .py extension and
for the directory to be in sys.path. So, ignorance of how to do it
meant I got push back from some people.
:-(

Graham

