[Web-SIG] Move to bless Graham's WSGI 1.1 as official spec

Thu Dec 3 11:55:51 CET 2009

James Y Knight wrote:
> I move to bless mod_wsgi's definition of WSGI 1.1 [1]
> [...]
> 
> [1] http://code.google.com/p/modwsgi/wiki/SupportForPython3X

Hi.

Just a few questions.

It is true that HTTP headers can be decoded assuming latin-1, and that
they can also be decoded using PEP 383.

However, what about the URI (that is, PATH_INFO and the like)?
For URIs (if I remember correctly) the suggested encoding is UTF-8, so
URLs should be decoded using

  url.decode('utf-8', 'surrogateescape')

Is this correct?
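A minimal sketch of what that would mean, assuming the raw path bytes are decoded with UTF-8 plus the PEP 383 `surrogateescape` handler (`decode_path` and `encode_path` are hypothetical helper names, not part of any WSGI API). Valid UTF-8 becomes ordinary text, while invalid bytes survive as lone surrogates and can be recovered exactly:

```python
def decode_path(raw: bytes) -> str:
    """Hypothetical helper: decode a raw URL path as UTF-8 with PEP 383 escapes."""
    return raw.decode('utf-8', 'surrogateescape')


def encode_path(path: str) -> bytes:
    """Hypothetical inverse: escaped surrogates U+DC80..U+DCFF map back to bytes."""
    return path.encode('utf-8', 'surrogateescape')


valid = b'/caf\xc3\xa9'   # 'café' encoded in UTF-8
invalid = b'/caf\xe9'     # 'café' encoded in latin-1, not valid UTF-8

assert decode_path(valid) == '/caf\u00e9'      # decoded as real text
assert decode_path(invalid) == '/caf\udce9'    # byte 0xE9 kept as a surrogate
assert encode_path(decode_path(invalid)) == invalid  # lossless round trip
```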

Now another question.
Let's consider the `wsgiref.util.application_uri` function:

def application_uri(environ):
    url = environ['wsgi.url_scheme']+'://'
    from urllib.parse import quote

    if environ.get('HTTP_HOST'):
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']

        if environ['wsgi.url_scheme'] == 'https':
            if environ['SERVER_PORT'] != '443':
                url += ':' + environ['SERVER_PORT']
        else:
            if environ['SERVER_PORT'] != '80':
                url += ':' + environ['SERVER_PORT']

    url += quote(environ.get('SCRIPT_NAME') or '/')
    return url

There is a potential problem here with the quote function, which does
the following:

def quote(string, safe='/', encoding=None, errors=None):
    if isinstance(string, str):
        if encoding is None:
            encoding = 'utf-8'
        if errors is None:
            errors = 'strict'
        string = string.encode(encoding, errors)
    # ... the resulting bytes are then percent-quoted

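This means that if we use surrogateescape, the information about the
original bytes is lost here: the string is re-encoded with
errors='strict', which cannot map the escaped surrogates back to the
original bytes.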
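The problem can be reproduced with a short sketch, assuming SCRIPT_NAME was decoded with UTF-8 and `surrogateescape` as suggested above. It was written against a current Python 3, where the UTF-8 codec rejects lone surrogates under 'strict'; the exact behavior of 3.1.x may differ:

```python
from urllib.parse import quote

# A SCRIPT_NAME whose raw bytes were not valid UTF-8, decoded with PEP 383.
script_name = b'/caf\xe9'.decode('utf-8', 'surrogateescape')  # '/caf\udce9'

# Default call: encoding='utf-8', errors='strict' cannot handle the surrogate.
strict_failed = False
try:
    quote(script_name)
except UnicodeEncodeError:
    strict_failed = True

# Passing the same error handler back restores the original byte 0xE9.
escaped = quote(script_name, errors='surrogateescape')
assert escaped == '/caf%E9'
```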

This can easily be fixed by changing the application_uri function, but
this also means that a WSGI application will not work correctly with
Python 3.1.x.
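One possible fix, sketched under the assumption that SCRIPT_NAME was decoded with UTF-8 and `surrogateescape`: pass the same codec and error handler to `quote`. `application_uri_fixed` is a hypothetical name used only for illustration; this is not the actual wsgiref code:

```python
from urllib.parse import quote


def application_uri_fixed(environ):
    """Hypothetical variant of wsgiref.util.application_uri.

    Assumes SCRIPT_NAME was decoded with bytes.decode('utf-8', 'surrogateescape'),
    so the same codec and error handler are used when percent-quoting it.
    """
    url = environ['wsgi.url_scheme'] + '://'

    if environ.get('HTTP_HOST'):
        url += environ['HTTP_HOST']
    else:
        url += environ['SERVER_NAME']
        # Append the port only when it differs from the scheme's default.
        default_port = '443' if environ['wsgi.url_scheme'] == 'https' else '80'
        if environ['SERVER_PORT'] != default_port:
            url += ':' + environ['SERVER_PORT']

    script_name = environ.get('SCRIPT_NAME') or '/'
    url += quote(script_name, encoding='utf-8', errors='surrogateescape')
    return url


environ = {
    'wsgi.url_scheme': 'http',
    'SERVER_NAME': 'example.org',
    'SERVER_PORT': '8080',
    'SCRIPT_NAME': b'/caf\xe9'.decode('utf-8', 'surrogateescape'),
}
assert application_uri_fixed(environ) == 'http://example.org:8080/caf%E9'
```

Because the original byte 0xE9 is restored before quoting, the resulting URI refers to the same resource the client requested.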

Finally, a question about cookies.
Cookie data SHOULD be transparent to the server/gateway; however, WSGI
is going to assume that the data is encoded in latin-1.

I don't know what the HTTP/Cookie specs say about this.
However, from a WSGI application's point of view, the cookie data may,
for example, contain some text encoded in UTF-8; this means that the
application must first encode the data back to bytes:

  cookie_bytes = cookie.encode('latin-1', 'surrogateescape')

and then decode it using UTF-8:

  my_cookie_data = cookie_bytes.decode('utf-8')
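The two steps can be wrapped in a small helper; `utf8_from_latin1` is a hypothetical name, and the sketch assumes the gateway decoded the Cookie header as latin-1. Since latin-1 decoding maps every byte 0x00-0xFF one-to-one onto U+0000-U+00FF and never produces surrogates, plain latin-1 encoding already recovers the exact bytes, so 'surrogateescape' is not needed on that step:

```python
def utf8_from_latin1(value: str, errors: str = 'strict') -> str:
    """Hypothetical helper: reinterpret a latin-1-decoded header value as UTF-8."""
    # latin-1 is a one-to-one byte mapping, so this recovers the original bytes.
    raw = value.encode('latin-1')
    return raw.decode('utf-8', errors)


# Simulate a gateway that decoded a UTF-8 Cookie header as latin-1.
header_bytes = 'session=caf\u00e9'.encode('utf-8')  # b'session=caf\xc3\xa9'
environ_value = header_bytes.decode('latin-1')       # 'session=caf\xc3\xa9' as text
assert utf8_from_latin1(environ_value) == 'session=caf\u00e9'
```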

This is a bit unreasonable, but I don't know whether this is common
practice (I do this myself; it is just an example).

Manlio Perillo