From davidgshi at yahoo.co.uk  Tue Nov  4 12:31:49 2008
From: davidgshi at yahoo.co.uk (David Shi)
Date: Tue, 4 Nov 2008 11:31:49 +0000 (GMT)
Subject: [Web-SIG] Seeking advice on user, session and folder management
Message-ID: <137261.95767.qm@web26305.mail.ukl.yahoo.com>

I am looking for a Python script to do the following.

Can anyone point me in the right direction for implementing automatic registration and authentication, similar to most modern web services? I would like to obtain a similar script to customise and develop further, adding automatic allocation of folders based on each user's log-in username, automatic setting of permissions on these folders, and a time limit (say 5 days) after which the contents of a folder are flushed and the folder is deleted if nobody has accessed it.

Sincerely,

David
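
The folder-management part of this request can be sketched with the standard library alone. This is a minimal illustration in Python 3, not a complete registration system: the function names, the username rule, and the use of modification times as the measure of "activity" are all assumptions made for the example, and os.chmod only has a limited effect on Windows.

```python
import os
import re
import shutil
import time

# Assumption: usernames are restricted to a conservative character set so
# that a name can never escape the base directory (no "..", no slashes).
SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def allocate_folder(base_dir, username):
    """Create (or reuse) a private folder named after the log-in username."""
    if not SAFE_NAME.fullmatch(username):
        raise ValueError("unsafe username: %r" % username)
    path = os.path.join(base_dir, username)
    os.makedirs(path, exist_ok=True)
    os.chmod(path, 0o700)  # owner-only on POSIX; largely ignored on Windows
    return path


def last_activity(path):
    """Newest modification time of the folder or anything inside it."""
    latest = os.path.getmtime(path)
    for dirpath, dirnames, filenames in os.walk(path):
        for name in dirnames + filenames:
            latest = max(latest, os.path.getmtime(os.path.join(dirpath, name)))
    return latest


def purge_dormant(base_dir, max_idle_days=5, now=None):
    """Delete every user folder that has been idle longer than the limit."""
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * 86400
    removed = []
    for name in sorted(os.listdir(base_dir)):
        path = os.path.join(base_dir, name)
        if os.path.isdir(path) and last_activity(path) < cutoff:
            shutil.rmtree(path)
            removed.append(name)
    return removed
```

A scheduled task could call purge_dormant once a day. Access times would match "dormant" more closely than modification times, but many filesystems do not update them reliably, which is why the sketch avoids them.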



From davidgshi at yahoo.co.uk  Wed Nov  5 16:01:50 2008
From: davidgshi at yahoo.co.uk (David Shi)
Date: Wed, 5 Nov 2008 15:01:50 +0000 (GMT)
Subject: [Web-SIG] Looking for Python script to upload large data files over
	the internet
Message-ID: <21746.66756.qm@web26304.mail.ukl.yahoo.com>

Can anyone help?

I am looking for good Python scripts for uploading large data files over the internet.

Regards.

David



From plynch1976 at hotmail.com  Wed Nov  5 20:49:19 2008
From: plynch1976 at hotmail.com (Pat Lynch)
Date: Wed, 5 Nov 2008 19:49:19 +0000
Subject: [Web-SIG] ZSI client to .NET server problem?
Message-ID: <BAY122-W5D92FE306A0165324E3CCC61F0@phx.gbl>


hey all,
I've created a webservice client using ZSI (-l -b -u options).  The function that I'm trying to access on the webserver takes one param (a complex type).  I had to use the -l option because the type actually has a member var which is of the same type (so I used to get the recursive error otherwise).

Anyway, when I call the .NET webservice, the param is received as NULL??  any ideas on what could be causing this??  I turned on debug on the client side & the xml seems to be well-formed (the only difference I can see between my xml & xml sent from a sample .net client is that the namespace is part of the element instead of the parent)..  I tried the approach mentioned here (http://article.gmane.org/gmane.comp.python.pywebsvcs.general/2211), but no improvement...

I've been tearing my hair out since Monday on this, so any help would be appreciated :)

thanks a million.

regards,
Pat.



From davidgshi at yahoo.co.uk  Thu Nov  6 18:53:32 2008
From: davidgshi at yahoo.co.uk (David Shi)
Date: Thu, 6 Nov 2008 17:53:32 +0000 (GMT)
Subject: [Web-SIG] Looking for a nitty-gritty Python Ajax middleware script
	to fire off a number of processors
Message-ID: <897945.80434.qm@web26305.mail.ukl.yahoo.com>

Dear All,

I am looking for a nitty-gritty Python Ajax script that fires off a number of processing programmes, periodically checks on their progress, and sends messages back to an HTML div, including links to the generated data files for end users to download.

I am using IIS 6.0 with .NET on Windows Server.

Regards.

David
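
The server side of such a script can be sketched as follows, under stated assumptions: the job names and commands are hypothetical, the helper names are invented for this example, and the JSON returned by status_json is meant to be fetched periodically by the page's Ajax code, which would then update the div. How the function is exposed over HTTP (CGI, WSGI, and so on) is left open.

```python
import json
import subprocess
import sys


def start_jobs(commands):
    """Launch each command without waiting; return {job name: process}."""
    return {name: subprocess.Popen(cmd) for name, cmd in commands.items()}


def job_status(jobs):
    """Poll every process once and describe its current state."""
    status = {}
    for name, proc in jobs.items():
        code = proc.poll()  # None while the process is still running
        if code is None:
            status[name] = "running"
        elif code == 0:
            status[name] = "done"
        else:
            status[name] = "failed (exit %d)" % code
    return status


def status_json(jobs):
    """Serialise the status so that an Ajax poll can consume it."""
    return json.dumps(job_status(jobs), sort_keys=True)


if __name__ == "__main__":
    # Hypothetical jobs: each one just runs a trivial Python command.
    demo = start_jobs({"report": [sys.executable, "-c", "pass"]})
    for proc in demo.values():
        proc.wait()
    print(status_json(demo))
```

Once a job reports "done", the response could also carry the link to its output file; that part depends on where the files are published and is not shown.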



From davidgshi at yahoo.co.uk  Tue Nov 11 21:29:11 2008
From: davidgshi at yahoo.co.uk (David Shi)
Date: Tue, 11 Nov 2008 20:29:11 +0000 (GMT)
Subject: [Web-SIG] Has anyone tried calling zip.py in feedback.py and print
	out an innerHTML to provide a download link?
Message-ID: <408436.82336.qm@web26307.mail.ukl.yahoo.com>

Hello, there.

Has anyone tried calling zip.py from feedback.py and printing out some innerHTML to provide a download link?

I find it difficult to make this work.

Sincerely,

David

#**********************************************************************
# Description:
#    Zips the contents of a folder.
# Parameters:
#   0 - Input folder.
#   1 - Output zip file. It is assumed that the user added the .zip 
#       extension.  
#**********************************************************************

# Import modules and create the geoprocessor
#
import sys, zipfile, arcgisscripting, os, traceback
gp = arcgisscripting.create()

# Function for zipping files.  If keep is true, the folder, along with 
#  all its contents, will be written to the zip file.  If false, only 
#  the contents of the input folder will be written to the zip file - 
#  the input folder name will not appear in the zip file.
#
def zipws(path, zip, keep):
    path = os.path.normpath(path)
    # os.walk visits every subdirectory, returning a 3-tuple
    #  of directory name, subdirectories in it, and filenames
    #  in it.
    #
    for (dirpath, dirnames, filenames) in os.walk(path):
        # Iterate over every filename
        #
        for file in filenames:
            # Ignore .lock files
            #
            if not file.endswith('.lock'):
                gp.AddMessage("Adding %s..." % os.path.join(dirpath, file))
                try:
                    if keep:
                        zip.write(os.path.join(dirpath, file),
                        os.path.join(os.path.basename(path), os.path.join(dirpath, file)[len(path)+len(os.sep):]))
                    else:
                        zip.write(os.path.join(dirpath, file),            
                        os.path.join(dirpath[len(path):], file)) 

                except Exception, e:
                    gp.AddWarning("    Error adding %s: %s" % (file, e))

    return None

if __name__ == '__main__':
    try:
        # Get the tool parameter values
        #
        infolder = gp.GetParameterAsText(0)
        outfile = gp.GetParameterAsText(1)      

        # Create the zip file for writing compressed data. In some rare
        #  instances, the ZIP_DEFLATED constant may be unavailable and
        #  the ZIP_STORED constant is used instead.  When ZIP_STORED is
        #  used, the zip file does not contain compressed data, resulting
        #  in large zip files. 
        #
        try:
            zip = zipfile.ZipFile(outfile, 'w', zipfile.ZIP_DEFLATED)
            zipws(infolder, zip, True)
            zip.close()
        except RuntimeError:
            # Delete zip file if exists
            #
            if os.path.exists(outfile):
                os.unlink(outfile)
            zip = zipfile.ZipFile(outfile, 'w', zipfile.ZIP_STORED)
            zipws(infolder, zip, True)
            zip.close()
            gp.AddWarning("    Unable to compress zip file contents.")

        gp.AddMessage("Zip file created successfully")

    except:
        # Return any python specific errors as well as any errors from the geoprocessor
        #
        exc_type, exc_value, tb = sys.exc_info()
        tbinfo = traceback.format_tb(tb)[0]
        pymsg = ("PYTHON ERRORS:\nTraceback Info:\n" + tbinfo +
                 "\nError Info:\n    " + str(exc_type) + ": " +
                 str(exc_value) + "\n")
        gp.AddError(pymsg)

        msgs = "GP ERRORS:\n" + gp.GetMessages(2) + "\n"
        gp.AddError(msgs)
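
For comparison, here is a minimal sketch of the same idea without the ArcGIS geoprocessor, written for Python 3 and the standard library only. The names zip_folder and download_link are invented for this example, and the base URL is an assumption about where the web server publishes the zip files. The calling script would print the returned HTML fragment so that the page can place it in a div via innerHTML.

```python
import html
import os
import zipfile
from urllib.parse import quote


def zip_folder(folder, zip_path, keep_root=True):
    """Zip the contents of folder, skipping .lock files.

    zip_path should live outside folder, otherwise the archive would
    try to include itself.
    """
    folder = os.path.normpath(folder)
    root_name = os.path.basename(folder)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for dirpath, dirnames, filenames in os.walk(folder):
            for name in filenames:
                if name.endswith(".lock"):
                    continue
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, folder)
                # keep_root mirrors the "keep" flag of the script above
                arcname = os.path.join(root_name, rel) if keep_root else rel
                archive.write(full, arcname)
    return zip_path


def download_link(zip_path, base_url):
    """Build an HTML anchor pointing at the zip file under base_url."""
    name = os.path.basename(zip_path)
    href = base_url.rstrip("/") + "/" + quote(name)
    return '<a href="%s">%s</a>' % (html.escape(href, quote=True),
                                    html.escape(name))
```

Escaping both the URL and the link text matters here because the file name may come from user input.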



From and-py at doxdesk.com  Wed Nov 12 20:22:38 2008
From: and-py at doxdesk.com (Andrew Clover)
Date: Wed, 12 Nov 2008 20:22:38 +0100
Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets
Message-ID: <491B2CFE.7060502@doxdesk.com>

It would be lovely if we could allow WSGI applications to reliably 
accept Unicode paths.

That is to say, allow WSGI apps to have beautiful URLs like Wikipedia's, 
without requiring URL-rewriting magic. (Which is so highly 
server-specific, potentially unavailable to non-admin webmasters, and 
makes WSGI app deployment more difficult than it already is.)


If we could reliably read the bytes the browser sends to us in the GET 
request that would be great, we could just decode those and be done with 
it. Unfortunately, that's not reliable, because:

1. thanks to an old wart in the CGI specification, %XX hex escapes are 
decoded before the character is put into the PATH_INFO environment variable;

2. the environment variables may be stored as Unicode.

(1) on its own gives us the problem of not being able to distinguish a 
path-separator slash from an encoded %2F; a long-known problem but not 
one that greatly affects most people.
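
To make point (1) concrete, here is a small Python 3 sketch. The paths are invented for illustration, and split_segments is a hypothetical helper showing what an application could do if it had access to the raw, still-escaped path.

```python
from urllib.parse import unquote

# Two different requests: one path segment named "a/b" versus two
# segments "a" and "b".
raw_one_segment = "/files/a%2Fb"
raw_two_segments = "/files/a/b"

# After CGI-style decoding both look the same, so the application can no
# longer tell which one the client sent.
assert unquote(raw_one_segment) == unquote(raw_two_segments) == "/files/a/b"


def split_segments(raw_path):
    """Split on literal slashes first, then decode each segment."""
    return [unquote(segment) for segment in raw_path.split("/")[1:]]
```

With the raw path, split_segments yields ['files', 'a/b'] for the first request and ['files', 'a', 'b'] for the second, which is exactly the distinction PATH_INFO loses.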

But combined with (2) that means some other component must choose how to 
decode the bytes into Unicode characters. No standard currently 
specifies what encoding to use, it is not typically configurable, and 
it's certainly not within reach of the WSGI application. My assumption 
is that most applications will want to end up with UTF-8-encoded URLs; 
other choices are certainly possible but as we move towards IRI they 
become less likely.


This situation previously affected only Windows users, because NT 
environment variables are native Unicode. However, Python 3.0 specifies 
all environment variable access is through a Unicode wrapper, and gives 
no way to control how that automatic decoding is done, leaving everyone 
in the same boat.

WSGI Amendments_1.0 includes a suggestion for Python 3.0 that environ 
should be "decoded from the headers using HTTP standard encodings (i.e. 
latin-1 + RFC 2047)", but unfortunately this doesn't quite work:

1. for many existing environments the decoding-from-headers charset is 
out of reach of the WSGI server/layer and may well not be ISO-8859-1. 
Even wsgiref doesn't currently use 8859-1 (see below).

2. RFC2047 is not applicable to HTTP headers, which are not really 
822-family headers even though they look just like them. The sub-headers 
in eg. a multipart/form-data chunk *are* (probably) proper 822 headers 
so RFC2047 could apply, but those headers are already dealt with by the 
application or framework, not WSGI. HTTP 1.1 (RFC2616) does refer to 
RFC2047 as an encoding mechanism for TEXT and quoted-string, but this 
makes no sense as 2047 itself requires embedding in atom-based parsing 
sequences which those productions are not (quoted-strings are explicitly 
disallowed by 2047 itself). In any case no existing browser attempts to 
support RFC2047 encoding rules for any possible interpretation of what 
2616 might mean.


Something like Luís Bruno's ORIGINAL_PATH_INFO proposal 
(http://mail.python.org/pipermail/web-sig/2008-January/003124.html) 
would be worth looking at for this IMO. It may be of questionable 
usefulness if the only character affected is the slash, but it also 
happens to solve the Unicode problem. Obviously whatever it was called 
it would have to be an optional additional value in the WSGI environ, as 
pure CGI servers wouldn't be able to supply it. Conceivably it might 
also be possible to have a standardised mod_rewrite rule to make the 
variable also available to Apache CGI scripts, but still this is far 
from global availability.

In the meantime I've been looking at how various combinations of servers 
deal with this issue, and in what circumstances an application or 
middleware can safely recover all possible Unicode input. 'Apache' 
refers to the (AFAICT-identical) behaviour of both mod_cgi and mod_wsgi; 
'IIS' refers to IIS with CGI.


*** Apache/Posix/Python2
OK.

No problem here, it's byte-based all the way through.


*** Apache/Posix/Python3:
Dependent on the default encoding.

Apache puts bytes into the envvars but Python takes them out as unicode. 
If the system default encoding happens to be the same as the encoding 
the WSGI application wanted we will be OK. Normally the app will want 
UTF-8; many Linux distributions do use UTF-8 as the default system 
encoding but there are plenty of distros (eg. Debian) and other Unixen 
that do not. In any case we are getting a nasty system dependency at 
deploy time that many webmasters will not be able to resolve.

It is sometimes possible to recover mangled characters despite the wrong 
decoding having been applied. For example if the system encoding was 
ISO-8859-1 or another encoding that maps every byte to a unique Unicode 
character, we can encode the Unicode string back to its original bytes, 
and thence apply the decoding we actually wanted! If, on the other hand, 
it's something like ISO-8859-3, where not all high bytes are mapped at 
all, we'll be losing random characters... not good.
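
The recovery trick described above can be sketched in Python 3 as follows. The helper names are invented for this example, and cp1252 is used only as a convenient stand-in for "a codec with unmapped bytes", because Python's cp1252 codec refuses to decode a handful of byte values.

```python
ALL_BYTES = bytes(range(256))


def recover(text, applied, wanted="utf-8"):
    """Undo a wrong decoding: back to the original bytes, then decode properly."""
    return text.encode(applied).decode(wanted)


def is_lossless(codec):
    """True if every byte value survives a decode/encode round trip."""
    try:
        return ALL_BYTES.decode(codec).encode(codec) == ALL_BYTES
    except UnicodeError:
        return False


# Bytes the browser sent (UTF-8), wrongly decoded by the system as ISO-8859-1.
sent = "/wiki/Zürich".encode("utf-8")
mangled = sent.decode("iso-8859-1")

assert recover(mangled, "iso-8859-1") == "/wiki/Zürich"
assert is_lossless("iso-8859-1")
assert not is_lossless("cp1252")  # some bytes have no mapping at all
```

The round trip only works because ISO-8859-1 assigns a distinct character to each of the 256 byte values; any codec failing is_lossless can silently drop or reject part of the path.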


*** Apache/NT/Python2
Always unrecoverable data loss.

Apache on Windows always uses ISO-8859-1 to decode the request path and 
put it in the Unicode envvars. This is OK so far, we have Unicode 
characters with the same codepoints as the original bytes. However, 
Python2 needs to make the envvars available as bytes. It uses the system 
default encoding; if that were ISO-8859-1, we'd be OK.

But it never is. Western European on NT is actually cp1252, whose 
characters in the range 0x80 to 0x9F differ from ISO-8859-1. And if the 
app wants UTF-8, chances are those characters are going to come up a 
lot. There is as far as I know no user-selectable Windows codepage that 
can map all the Unicode characters up to U+00FF.
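
A rough Python 3 model of this pipeline shows where the damage happens. It is a sketch under assumptions: Python's own cp1252 codec stands in for the Windows conversion, whose substitution rules may differ in detail, and the euro sign is just one example of a character whose UTF-8 form contains a byte in the 0x80 to 0x9F range.

```python
# The same byte means different things in the two charsets.
assert b"\x80".decode("iso-8859-1") == "\u0080"  # a C1 control character
assert b"\x80".decode("cp1252") == "\u20ac"      # the euro sign

# 1. The browser sends a euro sign as UTF-8 bytes.
sent = "\u20ac".encode("utf-8")          # b"\xe2\x82\xac"

# 2. Apache on NT decodes the path as ISO-8859-1 into a Unicode envvar.
envvar = sent.decode("iso-8859-1")       # "\xe2\x82\xac" as code points

# 3. Python 2 re-encodes the envvar with the system codepage (cp1252 here).
#    U+0082 has no cp1252 equivalent, so the middle byte is replaced.
seen_by_app = envvar.encode("cp1252", errors="replace")

assert seen_by_app == b"\xe2?\xac"
assert seen_by_app != sent               # the original bytes are gone
```

Once the middle byte has been replaced there is nothing left for the application to decode as UTF-8, which is what makes the loss unrecoverable.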


*** Apache/NT/Python3
Wrong, but always recoverable.

Python retrieves the bytes-encoded-into-Unicode-codepoints string 
directly from the envvars. If the encoding should have been UTF-8 or 
something else other than ISO-8859-1, we can recover the original bytes 
by re-encoding to 8859-1, then decoding using the real charset.


*** IIS/NT/Python2
Mostly unrecoverable data loss.

IIS decodes submitted bytes to Unicode using UTF-8 when it can. But if 
there is an invalid UTF-8 sequence in the bytes it will try again using 
the system codepage. Python will then re-encode the Unicode envvar using 
the system codepage.

If the app is expecting UTF-8 we can decode what Python gives us using 
the system codepage (ie. 'mbcs') and get back any of the submitted 
characters that happened to be in this server's system codepage. Other 
characters may be replaced by question marks or Windows's best attempts 
to give us something useful, which at best may be a character shorn of 
diacriticals and at worst something just completely wrong.

NT's system codepage is never UTF-8, it is not a user-selectable option 
never mind the default. We can improve our chances of getting more 
characters through by using a character set with a wide repertoire, such 
as cp932 (Shift-JIS). But it's still not really proper Unicode support.

If the app is expecting something non-UTF-8 there's not much hope. Even 
if it wanted the same character set as the system codepage, it can't be 
sure that the submitted bytes didn't happen to also be a valid UTF-8 
sequence, and thus get mangled by IIS decoding them that way.
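
The behaviour described for IIS can be modelled in a few lines of Python 3. This is only a model built from the description above, not IIS's actual code: the function names are invented, cp1252 is assumed as the system codepage, and Python's 'replace' error handler stands in for Windows's substitutions.

```python
def iis_like_decode(raw, system_codepage="cp1252"):
    """Try UTF-8 first; fall back to the system codepage if that fails."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(system_codepage, errors="replace")


def python2_view(envvar, system_codepage="cp1252"):
    """What a Python 2 app sees: the envvar re-encoded with the codepage."""
    return envvar.encode(system_codepage, errors="replace")


# A UTF-8 path whose characters exist in cp1252 comes through intact...
assert python2_view(iis_like_decode("/caf\u00e9".encode("utf-8"))) == b"/caf\xe9"

# ...but characters outside the codepage turn into question marks.
assert python2_view(iis_like_decode("/\u65e5\u672c\u8a9e".encode("utf-8"))) == b"/???"

# An app that wanted cp1252 is not safe either: these two bytes were meant
# as two cp1252 characters, but they also form a valid UTF-8 sequence.
assert iis_like_decode(b"/\xc3\xa9") == "/\u00e9"
```

The last assertion is the ambiguity mentioned above: the server cannot know which of the two readings the client intended.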


*** IIS/NT/Python3
OK, as long as the app wants UTF-8.

Incoming UTF-8 bytes are reliably converted to Unicode strings by IIS, 
and directly read by Python from the envvars.

If the application didn't want UTF-8 the situation is about as hopeless 
as with Python2.


*** wsgiref.simple_server/(any)/Python2
OK.

Bytes all the way through.


*** wsgiref.simple_server/(any)/Python3:
Probably will be OK, as long as the app wants UTF-8.

simple_server is currently broken in rc2. However judging by the code, 
it is using urllib.parse.unquote, which assumes UTF-8, so it'll be fine 
for apps that want UTF-8 and hopeless for those that don't.
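
The UTF-8 assumption is easy to see in isolation. The sketch below uses urllib.parse as it exists in current Python 3 releases; the exact behaviour of the 3.0rc2 build discussed here may differ, and the paths are invented for illustration.

```python
from urllib.parse import unquote, unquote_to_bytes

# A UTF-8 escaped path decodes cleanly with the defaults.
assert unquote("/caf%C3%A9") == "/caf\u00e9"

# The same word escaped as ISO-8859-1 is not valid UTF-8, so the default
# error handling substitutes U+FFFD and the original byte is lost.
assert unquote("/caf%E9") == "/caf\ufffd"

# An application that knows the charset can ask for it explicitly...
assert unquote("/caf%E9", encoding="iso-8859-1") == "/caf\u00e9"

# ...or keep the raw bytes and decide later.
assert unquote_to_bytes("/caf%E9") == b"/caf\xe9"
```

The last two calls are only available to code that still has the escaped path, which a WSGI application reading PATH_INFO does not.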


I'd be very interested to hear what other servers are doing in this 
situation - nginx? cherrypy's one? - and wonder if any particular 
behaviour should be 'blessed'.

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/

From ianb at colorstudy.com  Thu Nov 13 00:24:54 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed, 12 Nov 2008 17:24:54 -0600
Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets
In-Reply-To: <491B2CFE.7060502@doxdesk.com>
References: <491B2CFE.7060502@doxdesk.com>
Message-ID: <491B65C6.3020206@colorstudy.com>

Andrew Clover wrote:
> If we could reliably read the bytes the browser sends to us in the GET 
> request that would be great, we could just decode those and be done with 
> it. Unfortunately, that's not reliable, because:
> 
> 1. thanks to an old wart in the CGI specification, %XX hex escapes are 
> decoded before the character is put into the PATH_INFO environment 
> variable;

I don't see a problem with this?  At least not a problem with respect to 
encoding.  As it is (in Python 2), you should do something like 
environ['PATH_INFO'].decode('utf8') and it should work.  It doesn't seem 
like there's any distinction between %-encoded characters and plain 
characters in this situation.

> 2. the environment variables may be stored as Unicode.
> 
> (1) on its own gives us the problem of not being able to distinguish a 
> path-separator slash from an encoded %2F; a long-known problem but not 
> one that greatly affects most people.
> 
> But combined with (2) that means some other component must choose how to 
> decode the bytes into Unicode characters. No standard currently 
> specifies what encoding to use, it is not typically configurable, and 
> it's certainly not within reach of the WSGI application. My assumption 
> is that most applications will want to end up with UTF-8-encoded URLs; 
> other choices are certainly possible but as we move towards IRI they 
> become less likely.
> 
> 
> This situation previously affected only Windows users, because NT 
> environment variables are native Unicode. However, Python 3.0 specifies 
> all environment variable access is through a Unicode wrapper, and gives 
> no way to control how that automatic decoding is done, leaving everyone 
> in the same boat.
> 
> WSGI Amendments_1.0 includes a suggestion for Python 3.0 that environ 
> should be "decoded from the headers using HTTP standard encodings (i.e. 
> latin-1 + RFC 2047)", but unfortunately this doesn't quite work:

My understanding of this suggestion is that latin-1 is a way of 
representing bytes as unicode.  In other words, the values will be 
unicode, but that will simply be a lie.  So if you know you have UTF8 
paths, you'd do:

path_info = environ['PATH_INFO'].encode('latin-1').decode('utf8')

As far as I can tell this is simply to avoid having bytes in the 
environment, even though bytes are an accurate representation and 
unicode is not.
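
Put into a complete (if tiny) WSGI application, that approach might look like the sketch below. It assumes a Python 3 server that really does hand over PATH_INFO as latin-1-decoded bytes, which is the premise of the suggestion rather than something every server is known to do; the function names and the response text are invented for the example.

```python
def get_path_info(environ, charset="utf-8"):
    """Recover the original bytes via latin-1, then decode them properly."""
    return environ.get("PATH_INFO", "").encode("latin-1").decode(charset)


def app(environ, start_response):
    """Echo the decoded path back to the client."""
    path = get_path_info(environ)
    body = ("You asked for " + path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

A real application would also need to decide what to do when the bytes are not valid in the expected charset, for example by answering with a 400 response instead of letting the UnicodeDecodeError propagate.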

A lot of what you write about has to do with CGI, which is the only 
place WSGI interacts with os.environ.  CGI is really an aspect of the 
CGI to WSGI adapter (like wsgiref.handlers.CGIHandler), and not the WSGI 
spec itself.

Personally I'm more inclined to set up a policy on the WSGI server 
itself with respect to the encoding, and then use real unicode 
characters.  Unfortunately that's not as flexible as bytes, as it 
doesn't make it very easy to sniff out the encoding in 
application-specific ways, or support different encodings in different 
parts of the server (which would be useful if, for instance, you were to 
proxy applications with unknown encodings).  So... maybe that's not the 
most feasible option.  But if it's not, then I'd rather stick with bytes.


-- 
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

From graham.dumpleton at gmail.com  Thu Nov 13 00:44:53 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 13 Nov 2008 10:44:53 +1100
Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets
In-Reply-To: <491B2CFE.7060502@doxdesk.com>
References: <491B2CFE.7060502@doxdesk.com>
Message-ID: <88e286470811121544ue9c46a4l77e4e011acece623@mail.gmail.com>

FWIW, there was a past discussion on these issues on mod_wsgi list. I
can't really remember what the outcome of the discussion was. The
discussion is at:

  http://groups.google.com/group/modwsgi/browse_frm/thread/2471a1a71620629f

Graham


From davidgshi at yahoo.co.uk  Thu Nov 13 12:40:10 2008
From: davidgshi at yahoo.co.uk (David Shi)
Date: Thu, 13 Nov 2008 11:40:10 +0000 (GMT)
Subject: [Web-SIG] Looking for a Python Ajax Middleware script
In-Reply-To: <e5fff6640810310613g3bdff700v499fc42e3aad0877@mail.gmail.com>
Message-ID: <508252.63885.qm@web26305.mail.ukl.yahoo.com>

Dear Benji York,

Thank you very much for letting me know about this.

I am not a programmer, but I have a demonstration project to complete. How can I easily follow instructions to implement and test this?

Does it work with Ajax?

I am using Windows Server and IIS. I do not have a facility to un-gzip it.

Regards.

David

--- On Fri, 31/10/08, Benji York <benji at benjiyork.com> wrote:

From: Benji York <benji at benjiyork.com>
Subject: Re: [Web-SIG] Looking for a Python Ajax Middleware script
To: davidgshi at yahoo.co.uk
Cc: web-sig at python.org
Date: Friday, 31 October, 2008, 1:13 PM

2008/10/31 David Shi <davidgshi at yahoo.co.uk>:
>
> Has anyone tried the following with Python?

[snip]

It sounds like you could use zc.async: http://pypi.python.org/pypi/zc.async/

From the above page:

   The zc.async package provides an easy-to-use Python tool that
   schedules work persistently and reliably across multiple processes
   and machines.
-- 
Benji York



From benji at benjiyork.com  Thu Nov 13 14:26:10 2008
From: benji at benjiyork.com (Benji York)
Date: Thu, 13 Nov 2008 08:26:10 -0500
Subject: [Web-SIG] Looking for a Python Ajax Middleware script
In-Reply-To: <508252.63885.qm@web26305.mail.ukl.yahoo.com>
References: <e5fff6640810310613g3bdff700v499fc42e3aad0877@mail.gmail.com>
	<508252.63885.qm@web26305.mail.ukl.yahoo.com>
Message-ID: <e5fff6640811130526n45ae129fyd2619b96661753b2@mail.gmail.com>

On Thu, Nov 13, 2008 at 6:40 AM, David Shi <davidgshi at yahoo.co.uk> wrote:
> Dear Benji York,
>
> Thank you very much for letting me know about this.
>
> I am not a programmer, but I have a demonstration project to complete.  How can
> I easily follow instructions to implement and test this?

I doubt it; zc.async is well documented, but it is only a tool.  Therefore
you can use it to accomplish your goal, but you would have to do a
non-trivial amount of programming to address your particular need.

> Does it work with Ajax?

That question doesn't really apply.
-- 
Benji York
Senior Software Engineer
Zope Corporation

From and-py at doxdesk.com  Fri Nov 14 18:14:08 2008
From: and-py at doxdesk.com (Andrew Clover)
Date: Fri, 14 Nov 2008 18:14:08 +0100
Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets
In-Reply-To: <491B65C6.3020206@colorstudy.com>
References: <491B2CFE.7060502@doxdesk.com> <491B65C6.3020206@colorstudy.com>
Message-ID: <491DB1E0.3070701@doxdesk.com>

Ian Bicking wrote:

> As it is (in Python 2), you should do something like 
> environ['PATH_INFO'].decode('utf8') and it should work.

See the test cases in my original post: this doesn't work universally. 
On WinNT platforms PATH_INFO has already gone through a decode/encode 
cycle which almost always irretrievably mangles the value.

> My understanding of this suggestion is that latin-1 is a way of 
> representing bytes as unicode. In other words, the values will be 
> unicode, but that will simply be a lie.

Yes, that would be a sensible approach, but it is not what is actually 
happening in any WSGI environment I have tested. For example 
wsgiref.simple_server decodes using UTF-8 not 8859-1 -- or would do, if 
it were working. (It is currently broken in 3.0rc2; I put a hack in to 
get it running but I'm not really sure what the current status of 
simple_server in 3.0 is.)
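
The latin-1 idea quoted above can be sketched as follows, in Python 3
syntax. This is only an illustration: the helper name recover_text is
hypothetical, and it assumes the server really did decode the raw bytes
as ISO-8859-1, which (as noted) is not what every server does.

```python
def recover_text(environ_value, encoding="utf-8"):
    """Undo an ISO-8859-1 'bytes as unicode' decode, then decode properly."""
    # ISO-8859-1 maps code points 0-255 one-to-one onto bytes 0-255,
    # so this encode step gives back the original bytes without loss.
    raw = environ_value.encode("iso-8859-1")
    # errors="replace" avoids an exception on byte sequences that are
    # not valid in the target encoding.
    return raw.decode(encoding, "replace")


# b"/caf\xc3\xa9" is the UTF-8 encoding of "/caf" followed by e-acute.
# Decoded as ISO-8859-1 it becomes the mojibake string below.
path_info = "/caf\u00c3\u00a9"
print(recover_text(path_info))
```

The round trip only works because the first decode was lossless; a value
that has already been through the system codepage cannot be repaired
this way.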

> A lot of what you write about has to do with CGI, which is the only 
> place WSGI interacts with os.environ.  CGI is really an aspect of the 
> CGI to WSGI adapter (like wsgiref.handlers.CGIHandler), and not the WSGI 
> spec itself.

Indeed, but we naturally have to take into account implementability on 
CGI. If a WSGI spec *requires* PATH_INFO to have been obtained using 
8859-1 decoding -- or UTF-8, which is the other sensible option given 
that most URIs today are UTF-8 -- then there cannot be a fully-compliant 
CGI-to-WSGI wrapper. Perhaps it's not the big issue it was when WSGI was 
first getting off the ground, but IMO it's still important.

> Personally I'm more inclined to set up a policy on the WSGI server 
> itself with respect to the encoding, and then use real unicode 
> characters.

I think we are stuck with Unicode environ at this point, given the CGI 
issue. But applications do need to know about the encoding in use, 
because they will (typically) be generating their own links. So an 
optional way to get that information to the application would be 
advantageous.

I'm now of the opinion that the best way to do this is to standardise 
Apache's "REQUEST_URI" as an optional environ item. This header is 
pre-URI-decoding, containing only %-sequences and not real high bytes, 
so it can be decoded to Unicode using any old charset without worry.

An application wanting to support Unicode URIs (or encoded slashes in 
URIs*) could then sniff for REQUEST_URI and use it in preference to 
PATH_INFO where available. This is a bit more work for the application, 
but it should generally be handled transparently by a library/framework 
and supporting PATH_INFO in a portable fashion already has warts thanks 
to IIS's bugs, so the situation is not much worse than it already is.
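
A minimal sketch of the sniffing described above, in Python 3 syntax.
The function name is hypothetical, the default charset of UTF-8 is an
assumption, and the fallback assumes PATH_INFO was decoded as
ISO-8859-1. Note that the REQUEST_URI branch returns the whole path,
including the part that corresponds to SCRIPT_NAME.

```python
from urllib.parse import unquote


def request_path(environ, encoding="utf-8"):
    """Prefer the still-encoded REQUEST_URI over PATH_INFO when present."""
    request_uri = environ.get("REQUEST_URI")
    if request_uri is not None:
        # Drop the query string; what is left contains only %-sequences.
        raw_path = request_uri.split("?", 1)[0]
        return unquote(raw_path, encoding=encoding, errors="replace")
    # Fallback: assume the server decoded the raw bytes as ISO-8859-1.
    path_info = environ.get("PATH_INFO", "")
    return path_info.encode("iso-8859-1").decode(encoding, "replace")


print(request_path({"REQUEST_URI": "/app/caf%C3%A9?x=1"}))
```

An application that cares about encoded slashes would split raw_path on
"/" before unquoting, since unquote turns %2F into a real slash.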

And of course we get support through mod_cgi and mod_wsgi automatically, 
so Graham doesn't have to do anything. :-)

Graham Dumpleton wrote:

> I can't really remember what the outcome of the discussion was.

Not too much outcome really, unfortunately! You concluded:

> there possibly still is an open question there on how
> encoding of non ascii characters works in practice. We just need to
> do some actual tests to see what happens and whether there is a problem. 

...to which the answer is -- judging by the results posted -- probably 
"yes", I'm afraid!

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/

From ianb at colorstudy.com  Fri Nov 14 18:47:50 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Fri, 14 Nov 2008 11:47:50 -0600
Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets
In-Reply-To: <491DB1E0.3070701@doxdesk.com>
References: <491B2CFE.7060502@doxdesk.com> <491B65C6.3020206@colorstudy.com>
	<491DB1E0.3070701@doxdesk.com>
Message-ID: <491DB9C6.1070909@colorstudy.com>

Andrew Clover wrote:
> Ian Bicking wrote:
> 
>> As it is (in Python 2), you should do something like 
>> environ['PATH_INFO'].decode('utf8') and it should work.
> 
> See the test cases in my original post: this doesn't work universally. 
> On WinNT platforms PATH_INFO has already gone through a decode/encode 
> cycle which almost always irretrievably mangles the value.

This is something messed up with CGI on NT, and whatever server you are 
using, and perhaps the CGI adapter (maybe there's a way to get the raw 
environment without any encoding, for example?) -- it's mostly 
irrelevant to WSGI itself.

>> My understanding of this suggestion is that latin-1 is a way of 
>> representing bytes as unicode. In other words, the values will be 
>> unicode, but that will simply be a lie.
> 
> Yes, that would be a sensible approach, but it is not what is actually 
> happening in any WSGI environment I have tested. For example 
> wsgiref.simple_server decodes using UTF-8 not 8859-1 -- or would do, if 
> it were working. (It is currently broken in 3.0rc2; I put a hack in to 
> get it running but I'm not really sure what the current status of 
> simple_server in 3.0 is.)

As far as I know, PJE just made the suggestion about Latin-1, I don't 
know if anything has actually been done in wsgiref or elsewhere to 
implement that.  Honestly I don't know if anyone is doing anything with 
WSGI and Python 3.

>> A lot of what you write about has to do with CGI, which is the only 
>> place WSGI interacts with os.environ.  CGI is really an aspect of the 
>> CGI to WSGI adapter (like wsgiref.handlers.CGIHandler), and not the 
>> WSGI spec itself.
> 
> Indeed, but we naturally have to take into account implementability on 
> CGI. If a WSGI spec *requires* PATH_INFO to have been obtained using 
> 8859-1 decoding -- or UTF-8, which is the other sensible option given 
> that most URIs today are UTF-8 -- then there cannot be a fully-compliant 
> CGI-to-WSGI wrapper. Perhaps it's not the big issue it was when WSGI was 
> first getting off the ground, but IMO it's still important.

This will presumably require hacks that might be system-dependent. 
Probably the current CGI adapter will just have to be a bit more 
complicated.  Also, if Python is utf8-decoding the environment, we'll 
just have to shortcut that entirely, as you can't just undo utf8.  I 
assume there is some way to get at the bytes in the environment, if not 
then that is a Python 3 bug.
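
For what it is worth, later Python 3 releases (3.2 onward) expose the
environment as bytes through os.environb on platforms where
os.supports_bytes_environ is true, which excludes Windows. A minimal
sketch, with a hypothetical helper name:

```python
import os


def raw_environ_value(name):
    """Return an environment variable as bytes, or None if it is unset."""
    if os.supports_bytes_environ:
        # POSIX: the bytes exactly as the process received them.
        return os.environb.get(os.fsencode(name))
    # Elsewhere (Windows): only text is available, so this is a
    # re-encoding of that text and not the original bytes.
    value = os.environ.get(name)
    return None if value is None else os.fsencode(value)


print(raw_environ_value("PATH_INFO"))
```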

>> Personally I'm more inclined to set up a policy on the WSGI server 
>> itself with respect to the encoding, and then use real unicode 
>> characters.
> 
> I think we are stuck with Unicode environ at this point, given the CGI 
> issue. But applications do need to know about the encoding in use, 
> because they will (typically) be generating their own links. So an 
> optional way to get that information to the application would be 
> advantageous.

The encoding of the operating system (which presumably informs the 
encoding of os.environ) has nothing to do with the encoding of the web 
application.  For the CGI adapter we simply need to find a way to ignore 
the system encoding.

> I'm now of the opinion that the best way to do this is to standardise 
> Apache's "REQUEST_URI" as an optional environ item. This header is 
> pre-URI-decoding, containing only %-sequences and not real high bytes, 
> so it can be decoded to Unicode using any old charset without worry.

Unfortunately REQUEST_URI doesn't map directly to SCRIPT_NAME/PATH_INFO. 
  I think it might be feasible to support an encoded version of 
SCRIPT_NAME and PATH_INFO for WSGI 2.0 (creating entirely new key names, 
and I don't know of any particular standard to base those names on), 
moving from the two keys to a single REQUEST_URI is not feasible.

It's not that trivial to figure out where in REQUEST_URI the 
SCRIPT_NAME/PATH_INFO boundary really is, as there's many ways the 
unencoded values could be encoded.  I guess you'd probably count 
segments, try to catch %2f (where the segments won't match up), and then 
double check that the decoded REQUEST_URI matches SCRIPT_NAME+PATH_INFO.

> An application wanting to support Unicode URIs (or encoded slashes in 
> URIs*) could then sniff for REQUEST_URI and use it in preference to 
> PATH_INFO where available. This is a bit more work for the application, 
> but it should generally be handled transparently by a library/framework 
> and supporting PATH_INFO in a portable fashion already has warts thanks 
> to IIS's bugs, so the situation is not much worse than it already is.

I use the distinction between SCRIPT_NAME and PATH_INFO extensively. 
And frankly IIS is probably less relevant to most developers than CGI. 
Anyway, any of these bugs are things that need to be fixed in the WSGI 
adapter, we must not let them propagate into the specification or 
applications.  So if IIS has problems with PATH_INFO, the WSGI adapter 
(be it CGI or otherwise) should be configured to fix those problems up 
front.

> And of course we get support through mod_cgi and mod_wsgi automatically, 
> so Graham doesn't have to do anything. :-)
> 
> Graham Dumpleton wrote:
> 
>> I can't really remember what the outcome of the discussion was.
> 
> Not too much outcome really, unfortunately! You concluded:
> 
>> there possibly still is an open question there on how
>> encoding of non ascii characters works in practice. We just need to
>> do some actual tests to see what happens and whether there is a problem. 
> 
> ...to which the answer is -- judging by the results posted -- probably 
> "yes", I'm afraid!


-- 
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

From and-py at doxdesk.com  Fri Nov 14 22:23:35 2008
From: and-py at doxdesk.com (Andrew Clover)
Date: Fri, 14 Nov 2008 22:23:35 +0100
Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets
In-Reply-To: <491DB9C6.1070909@colorstudy.com>
References: <491B2CFE.7060502@doxdesk.com> <491B65C6.3020206@colorstudy.com>
	<491DB1E0.3070701@doxdesk.com> <491DB9C6.1070909@colorstudy.com>
Message-ID: <491DEC57.6080402@doxdesk.com>

Ian Bicking wrote:

> This is something messed up with CGI on NT, and whatever server you are 
> using, and perhaps the CGI adapter (maybe there's a way to get the raw 
> environment without any encoding, for example?)

Python decodes the environ to its own copy (wrapped in os.environ) at 
interpreter startup time; there's no way to query the real "live" 
environment that I know of. It'd require a C extension.

> Honestly I don't know if anyone is doing anything with 
> WSGI and Python 3.

I know Graham has done some work on mod_wsgi for 3.0, but no, I don't 
know anyone using it in anger.

Is it worth submitting patches to simple_server to make it run on 3.0? 
Is it too late to include at this stage anyway? Shipping 3.0 with a 
non-functional wsgiref is a bit embarrassing.

> I assume there is some way to get at the bytes in the environment, if not 
> then that is a Python 3 bug.

There is not, and this appears to be deliberate.

> I think it might be feasible to support an encoded version of 
> SCRIPT_NAME and PATH_INFO for WSGI 2.0 (creating entirely new key names, 
> and I don't know of any particular standard to base those names on),
> moving from the two keys to a single REQUEST_URI is not feasible.

That's certainly a possibility, but I feel it's easier to hitch a ride 
on the existing header, which despite being non-standard is still quite 
widely used.

> I guess you'd probably count segments, try to catch %2f (where the
> segments won't match up), and then double check that the decoded
> REQUEST_URI matches SCRIPT_NAME+PATH_INFO.

I'm currently testing with just the segment counting. It's only 
necessary that the segments from SCRIPT_NAME are matched and stripped, 
and those are extremely unlikely to contain "%2F" because:

   - there aren't many filesystems that can accept "/" as a filename
     character. RISC OS is the only one I can think of, and it by
     convention swaps "/" and "." to compensate as it is, so even
     there you couldn't use "%2F";
   - there aren't many webservers that can map a file or alias to a
     path containing "%2F";
   - no-one wants to mount a webapp alias at such a weird name -- it's
     only in the section corresponding to PATH_INFO that "%2F" might
     ever be of use in practice.

In the worst case, many applications already know and can strip the URL 
at which they're mounted, but unless there's a legitimate "%2F" in their 
SCRIPT_NAME it doesn't actually matter.
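
The segment counting can be sketched roughly like this, in Python 3
syntax. It is not the code under test, only an illustration: the
function name is hypothetical, and it assumes SCRIPT_NAME is plain
ASCII with no trailing slash. The final comparison is the double check
suggested earlier in the thread.

```python
from urllib.parse import unquote


def split_request_uri(request_uri, script_name):
    """Split the raw path of REQUEST_URI at the SCRIPT_NAME boundary."""
    raw_path = request_uri.split("?", 1)[0]
    segments = raw_path.split("/")
    # "/app/sub" has two segments; "" (application at the root) has none.
    depth = script_name.count("/")
    raw_script = "/".join(segments[:depth + 1])
    rest = segments[depth + 1:]
    raw_path_info = "/" + "/".join(rest) if rest else ""
    # Sanity check: the stripped prefix should decode to SCRIPT_NAME.
    if unquote(raw_script) != script_name:
        raise ValueError("REQUEST_URI does not start with SCRIPT_NAME")
    return raw_script, raw_path_info


raw_script, raw_path_info = split_request_uri("/app/a%2Fb/c?x=1", "/app")
# Decode each segment on its own so an encoded slash stays in its segment.
decoded = [unquote(s) for s in raw_path_info.split("/")[1:]]
print(raw_script, raw_path_info, decoded)
```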

> frankly IIS is probably less relevant to most developers than CGI. 

Er... really?

You and I may not favour it, but it's ~35% of the world out there, not 
something we can afford to ignore IMO.

> So if IIS has problems with PATH_INFO, the WSGI adapter 
> (be it CGI or otherwise) should be configured to fix those problems up 
> front.

What I'm saying is that neither Apache's nor IIS's behaviour can be 
considered clearly correct or wrong at this point, and there is no way a 
WSGI adapter living underneath them *can* fix up the differences.

(There is a problem with PATH_INFO that a WSGI adapter *could* clear 
up, which is that IIS makes PATH_INFO the entire path including 
SCRIPT_NAME. I'm not sure whether it's worth fixing that up in the 
adapter layer though... it's possible some frameworks are already 
dealing with it, and might even be relying on it!)

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/

From ianb at colorstudy.com  Sun Nov 16 04:16:41 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Sat, 15 Nov 2008 21:16:41 -0600
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
	specification
Message-ID: <491F9099.2090508@colorstudy.com>

We need to make a revision to the WSGI spec to say that 
environ['wsgi.input'].readline takes an optional size argument.  It 
always does in practice (except in wsgiref.validate.validator, rendering 
that validator useless), and is required to in practice, because 
everyone uses cgi.FieldStorage, and it passes in that argument.
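
To illustrate what the optional argument means for a server whose input
object only offers a no-argument readline(), here is a rough sketch in
Python 3 syntax. Both class names are hypothetical; NoArgStream exists
only to stand in for such an input object.

```python
import io


class NoArgStream:
    """Stand-in for an input object whose readline() takes no argument."""

    def __init__(self, data):
        self._stream = io.BytesIO(data)

    def read(self, size=-1):
        return self._stream.read(size)

    def readline(self):
        return self._stream.readline()


class SizedReadline:
    """Wrapper adding an optional size argument to readline()."""

    def __init__(self, stream):
        self._stream = stream
        self._buffer = b""  # rest of a line that was only partly returned

    def readline(self, size=-1):
        if not self._buffer:
            self._buffer = self._stream.readline()
        if size is None or size < 0:
            line, self._buffer = self._buffer, b""
        else:
            line = self._buffer[:size]
            self._buffer = self._buffer[size:]
        return line

    def read(self, size=-1):
        # Drain any buffered partial line before touching the stream.
        if size is None or size < 0:
            data, self._buffer = self._buffer + self._stream.read(), b""
            return data
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        if len(data) < size:
            data += self._stream.read(size - len(data))
        return data


wrapped = SizedReadline(NoArgStream(b"hello world\nnext\n"))
first = wrapped.readline(5)   # at most 5 bytes of the first line
second = wrapped.readline()   # the remainder of that line
```

Note that the wrapper still reads a whole line from the underlying
stream, so it does not bound memory use the way a native size argument
would.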

-- 
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

From graham.dumpleton at gmail.com  Sun Nov 16 06:22:39 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Sun, 16 Nov 2008 16:22:39 +1100
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
	specification
In-Reply-To: <491F9099.2090508@colorstudy.com>
References: <491F9099.2090508@colorstudy.com>
Message-ID: <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>

2008/11/16 Ian Bicking <ianb at colorstudy.com>:
> We need to make a revision to the WSGI spec to say that
> environ['wsgi.input'].readline takes an optional size argument.  It always
> does in practice (except in wsgiref.validate.validator, rendering that
> validator useless), and is required to in practice, because everyone uses
> cgi.FieldStorage, and it passes in that argument.

This has been brought up numerous times before. There are other things
about wsgi.input that really need to be changed as well to make it
more useful. When I have pushed for revised specification before I
could never get enough interest in it from the people that most would
perceive are the ones who oversee the PEP.

Graham

From stephan at transvection.de  Sun Nov 16 14:51:09 2008
From: stephan at transvection.de (Stephan Diehl)
Date: Sun, 16 Nov 2008 14:51:09 +0100
Subject: [Web-SIG] possible bug in cgi
Message-ID: <4920254D.4010609@transvection.de>

this is probably not the right place to ask, but I found some irritating
behaviour with the cgi module and am unsure if it's a bug (seen on
python2.5 and python2.6)
The problem is this:
>>> import cgi
>>> cgi.FieldStorage(environ={'QUERY_STRING':u'a=b'})
FieldStorage(None, None, [MiniFieldStorage('a\x00', '\x00b\x00')])
>>> cgi.FieldStorage(environ={'QUERY_STRING':'a=b'})
FieldStorage(None, None, [MiniFieldStorage('a', 'b')])

When creating a FieldStorage with an environment that contains a unicode
'QUERY_STRING' value, garbage is returned.
The underlying problem seems to be that the QUERY_STRING is converted to
a cStringIO object, which holds only the in-memory representation of
unicode strings.
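
The garbage in the first result is consistent with that explanation.
The sketch below (Python 3 syntax) assumes a little-endian machine and
a narrow, two-bytes-per-character unicode build, so that the in-memory
form of u'a=b' matches its UTF-16-LE encoding. Splitting those bytes on
'=' gives the same name and value as in the session above.

```python
# Stand-in for the raw memory of the unicode object u'a=b'.
memory_image = "a=b".encode("utf-16-le")

# Parsing those bytes as if they were a plain query string.
name, value = memory_image.split(b"=", 1)
print(repr(name), repr(value))

# A likely workaround is to hand the cgi module a byte string instead,
# for example by encoding the value before building the environ.
query_string = "a=b".encode("ascii")
```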

Regards, Stephan

From ianb at colorstudy.com  Sun Nov 16 19:06:15 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Sun, 16 Nov 2008 12:06:15 -0600
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
 specification
In-Reply-To: <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>
References: <491F9099.2090508@colorstudy.com>
	<88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>
Message-ID: <49206117.4020103@colorstudy.com>

Graham Dumpleton wrote:
> 2008/11/16 Ian Bicking <ianb at colorstudy.com>:
>> We need to make a revision to the WSGI spec to say that
>> environ['wsgi.input'].readline takes an optional size argument.  It always
>> does in practice (except in wsgiref.validate.validator, rendering that
>> validator useless), and is required to in practice, because everyone uses
>> cgi.FieldStorage, and it passes in that argument.
> 
> This has been brought up numerous times before. There are other things
> about wsgi.input that really need to be changed as well to make it
> more useful. When I have pushed for revised specification before I
> could never get enough interest in it from the people that most would
> perceive are the ones who oversee the PEP.

Yes, this has been passed over before.  To resolve this, let's just not 
pass it over this time?  This is a relatively small change to the WSGI 
spec, because it represents standard practice -- this change is simply 
getting the spec in line with implementations.

-- 
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

From fumanchu at aminus.org  Sun Nov 16 21:39:53 2008
From: fumanchu at aminus.org (Robert Brewer)
Date: Sun, 16 Nov 2008 12:39:53 -0800
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
	specification
In-Reply-To: <49206117.4020103@colorstudy.com>
References: <491F9099.2090508@colorstudy.com><88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>
	<49206117.4020103@colorstudy.com>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6405A40F07@ex10.hostedexchange.local>

+1

> -----Original Message-----
> From: web-sig-bounces+fumanchu=aminus.org at python.org [mailto:web-sig-
> bounces+fumanchu=aminus.org at python.org] On Behalf Of Ian Bicking
> Sent: Sunday, November 16, 2008 10:06 AM
> To: Graham Dumpleton
> Cc: Web SIG
> Subject: Re: [Web-SIG] Revising environ['wsgi.input'].readline in the
> WSGI specification
> 
> Graham Dumpleton wrote:
> > 2008/11/16 Ian Bicking <ianb at colorstudy.com>:
> >> We need to make a revision to the WSGI spec to say that
> >> environ['wsgi.input'].readline takes an optional size argument.  It
> >> always does in practice (except in wsgiref.validate.validator,
> >> rendering that validator useless), and is required to in practice,
> >> because everyone uses cgi.FieldStorage, and it passes in that argument.
> >
> > This has been brought up numerous times before. There are other
> > things about wsgi.input that really need to be changed as well to
> > make it more useful. When I have pushed for revised specification
> > before I could never get enough interest in it from the people that
> > most would perceive are the ones who oversee the PEP.
> 
> Yes, this has been passed over before.  To resolve this, let's just
> not pass it over this time?  This is a relatively small change to the
> WSGI spec, because it represents standard practice -- this change is
> simply getting the spec in line with implementations.
> 
> --
> Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

From mhammond at skippinet.com.au  Mon Nov 17 03:36:21 2008
From: mhammond at skippinet.com.au (Mark Hammond)
Date: Mon, 17 Nov 2008 13:36:21 +1100
Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets
In-Reply-To: <491DEC57.6080402@doxdesk.com>
References: <491B2CFE.7060502@doxdesk.com> <491B65C6.3020206@colorstudy.com>
	<491DB1E0.3070701@doxdesk.com> <491DB9C6.1070909@colorstudy.com>
	<491DEC57.6080402@doxdesk.com>
Message-ID: <000c01c9485d$49ff0d20$ddfd2760$@com.au>

> Python decodes the environ to its own copy (wrapped in os.environ) at
> interpreter startup time;

I don't think Python explicitly converts it - the CRT's ANSI version of environ is used, so the resulting strings should be encoded using the 'mbcs' encoding.  What mangling do you see?

> there's no way to query the real "live"
> environment that I know of. It'd require a C extension.

win32api and ctypes would both let you call the Windows API.

> What I'm saying is that neither Apache's nor IIS's behaviour can be
> considered clearly correct or wrong at this point, and there is no way
> a WSGI adapter living underneath them *can* fix up the differences.

What is IIS doing wrong here?  IIUC, ISAPI treats everything as bytes, so it is more likely to be the "higher-level" layers built on ISAPI (eg, ASP) which assume encodings.

Apologies if you have already answered any of these - I haven't been following that closely...

Cheers,

Mark


From and-py at doxdesk.com  Mon Nov 17 18:54:24 2008
From: and-py at doxdesk.com (Andrew Clover)
Date: Mon, 17 Nov 2008 18:54:24 +0100
Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets
In-Reply-To: <000c01c9485d$49ff0d20$ddfd2760$@com.au>
References: <491B2CFE.7060502@doxdesk.com> <491B65C6.3020206@colorstudy.com>
	<491DB1E0.3070701@doxdesk.com> <491DB9C6.1070909@colorstudy.com>
	<491DEC57.6080402@doxdesk.com>
	<000c01c9485d$49ff0d20$ddfd2760$@com.au>
Message-ID: <4921AFD0.7050506@doxdesk.com>

Mark Hammond wrote:

> I don't think Python explicitly converts it - the CRT's ANSI version
> of environ is used

Yes, it would be the CRT on Python 2.x. (Python 3.0 on non-NT does a 
conversion always using UTF-8, if I'm reading convertenviron right.)

> so the resulting strings should be encoded using the 'mbcs' encoding.
> What mangling do you see?

Correct, it's characters unencodable in mbcs that are lost*. mbcs is 
never equivalent to UTF-8 (which would allow us to recover characters on 
IIS) or ISO-8859 (which would allow us to recover characters on 
Apache-for-Windows) so there's always heavy lossage.

(* - replaced with ? or Windows's attempt to substitute something that 
looks vaguely like the original character.)

> win32api and ctypes would both let you call the Windows API.

Ah! I had considered the win32 extensions but it's a bit of a 
dependency... I'd forgotten that we get ctypes for free in 2.5.

So we'd be looking at:

     ctypes.windll.kernel32.GetEnvironmentVariableW(u'PATH_INFO', ...)

when CPython 2.5+/NT is detected, right? That increases the number of 
situations in which we can feasibly recover URIs that are valid UTF-8 
sequences (modulo the slash anyway). Doing the actual recovery still 
requires some server-sniffing though.
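
Spelled out a little further, the call might look like the sketch below
(Python 3 syntax). The helper name is hypothetical; the two-step call
relies on GetEnvironmentVariableW reporting the required buffer size
when the buffer is too small, and on it returning 0 when the variable
is not set. On anything other than Windows the helper just returns
None.

```python
import ctypes
import sys


def get_env_unicode(name):
    """Read an environment variable through the wide-character Win32 API."""
    if sys.platform != "win32":
        return None
    func = ctypes.windll.kernel32.GetEnvironmentVariableW
    func.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_uint32]
    func.restype = ctypes.c_uint32
    # First call with no buffer: the result is the size needed, counted
    # in characters and including the terminating null, or 0 if unset.
    size = func(name, None, 0)
    if size == 0:
        return None
    buffer = ctypes.create_unicode_buffer(size)
    func(name, buffer, size)
    return buffer.value


print(get_env_unicode("PATH_INFO"))
```

A variable that is set but empty also yields 0 from the first call, so
this sketch cannot tell the two cases apart without checking
GetLastError.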

> What is IIS doing wrong here?

It's not wrong as such. There are three reasonable choices for decoding 
header values before putting them in a Unicode environment, and the CGI 
spec, as it knows nothing about Unicode environment variables, fails to 
specify which:

     1. ISO-8859-1 (which ensures bytes can be recovered)
     2. UTF-8 (since most URIs are effectively UTF-8 today)
     3. Configured system codepage (mbcs)

Apache [with mod_cgi or mod_wsgi] decides on (1). IIS tries for (2), 
falling back to (3) on invalid sequences. The text concerning Python 3.0 
in the WSGI Amendments page could be read as blessing Apache's behaviour.
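
The difference between the two server policies can be sketched as
follows, in Python 3 syntax. The function names are hypothetical, and
"cp1252" merely stands in for the configured system codepage, since the
mbcs codec only exists on Windows.

```python
def decode_apache_style(raw):
    """Choice (1): ISO-8859-1, so the original bytes stay recoverable."""
    return raw.decode("iso-8859-1")


def decode_iis_style(raw, codepage="cp1252"):
    """Choice (2), falling back to choice (3) on invalid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(codepage, "replace")


utf8_bytes = b"/caf\xc3\xa9"   # UTF-8 for "/caf" plus e-acute
latin_bytes = b"/caf\xe9"      # the same text in ISO-8859-1

# Under choice (1) the bytes can always be recovered afterwards.
recovered = decode_apache_style(utf8_bytes).encode("iso-8859-1")
print(recovered == utf8_bytes)
print(decode_iis_style(utf8_bytes) == decode_iis_style(latin_bytes))
```

Under the second policy the application cannot tell afterwards which of
the two byte sequences the client actually sent.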

However wsgiref.simple_server currently also goes for (2), although that 
probably can't be considered canonical. I'd be interested to know what 
other WSGI servers do.

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/

From and-py at doxdesk.com  Mon Nov 17 18:55:48 2008
From: and-py at doxdesk.com (Andrew Clover)
Date: Mon, 17 Nov 2008 18:55:48 +0100
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
 specification
In-Reply-To: <49206117.4020103@colorstudy.com>
References: <491F9099.2090508@colorstudy.com>	<88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>
	<49206117.4020103@colorstudy.com>
Message-ID: <4921B024.90804@doxdesk.com>

Ian Bicking wrote:

> To resolve this, let's just not pass it over this time?

+1

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/

From mark.mchristensen at gmail.com  Mon Nov 17 19:43:48 2008
From: mark.mchristensen at gmail.com (Mark Ramm)
Date: Mon, 17 Nov 2008 13:43:48 -0500
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
	specification
In-Reply-To: <4921B024.90804@doxdesk.com>
References: <491F9099.2090508@colorstudy.com>
	<88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>
	<49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com>
Message-ID: <ec2d68db0811171043u5d3711e4g7bcd7b74cf61d4d2@mail.gmail.com>

On Mon, Nov 17, 2008 at 12:55 PM, Andrew Clover <and-py at doxdesk.com> wrote:
> Ian Bicking wrote:
>
>> To resolve this, let's just not pass it over this time?

Totally agreed.

What exactly needs to happen next?

From ianb at colorstudy.com  Mon Nov 17 19:51:04 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 17 Nov 2008 12:51:04 -0600
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
 specification
In-Reply-To: <ec2d68db0811171043u5d3711e4g7bcd7b74cf61d4d2@mail.gmail.com>
References: <491F9099.2090508@colorstudy.com>	<88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>	<49206117.4020103@colorstudy.com>
	<4921B024.90804@doxdesk.com>
	<ec2d68db0811171043u5d3711e4g7bcd7b74cf61d4d2@mail.gmail.com>
Message-ID: <4921BD18.8000908@colorstudy.com>

Mark Ramm wrote:
> On Mon, Nov 17, 2008 at 12:55 PM, Andrew Clover <and-py at doxdesk.com> wrote:
>> Ian Bicking wrote:
>>
>>> To resolve this, let's just not pass it over this time?
> 
> Totally agreed.
> 
> What exactly needs to happen next?

We need to propose a change to the WSGI specification.  I propose, in 
"Input and Error Streams" 
(http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we 
change it to have "readline(hint)" and expand Note 3 to include readline 
as well as readlines, removing Note 2.  Also I suppose some sort of 
change note in the specification?

Does this sound like a sufficient change to the spec, and are there any 
objections to the change?

-- 
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

From tseaver at palladion.com  Mon Nov 17 20:01:08 2008
From: tseaver at palladion.com (Tres Seaver)
Date: Mon, 17 Nov 2008 14:01:08 -0500
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
	specification
In-Reply-To: <4921BD18.8000908@colorstudy.com>
References: <491F9099.2090508@colorstudy.com>	<88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>	<49206117.4020103@colorstudy.com>	<4921B024.90804@doxdesk.com>	<ec2d68db0811171043u5d3711e4g7bcd7b74cf61d4d2@mail.gmail.com>
	<4921BD18.8000908@colorstudy.com>
Message-ID: <gfsf1l$5j1$1@ger.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ian Bicking wrote:
> Mark Ramm wrote:
>> On Mon, Nov 17, 2008 at 12:55 PM, Andrew Clover <and-py at doxdesk.com> wrote:
>>> Ian Bicking wrote:
>>>
>>>> To resolve this, let's just not pass it over this time?
>> Totally agreed.
>>
>> What exactly needs to happen next?
> 
> We need to propose a change to the WSGI specification.  I propose, in 
> "Input and Error Streams" 
> (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we 
> change it to have "readline(hint)" and expand Note 3 to include readline 
> as well as readlines, removing Note 2.  Also I suppose some sort of 
> change note in the specification?
>
> Does this sound like a sufficient change to the spec, and are there any 
> objections to the change?

+1 from me.


Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD4DBQFJIb90+gerLs4ltQ4RAt/5AJdkn2ObmgAN2SU3dd8E4KNXolz5AJwIgOJP
D9ZKBwF5jUunMrlQXaDbkA==
=hUNu
-----END PGP SIGNATURE-----


From manlio_perillo at libero.it  Mon Nov 17 20:49:16 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Mon, 17 Nov 2008 20:49:16 +0100
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
 specification
In-Reply-To: <4921BD18.8000908@colorstudy.com>
References: <491F9099.2090508@colorstudy.com>	<88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>	<49206117.4020103@colorstudy.com>	<4921B024.90804@doxdesk.com>	<ec2d68db0811171043u5d3711e4g7bcd7b74cf61d4d2@mail.gmail.com>
	<4921BD18.8000908@colorstudy.com>
Message-ID: <4921CABC.60904@libero.it>

Ian Bicking ha scritto:
> [...]
> We need to propose a change to the WSGI specification.  I propose, in 
> "Input and Error Streams" 
> (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we 
> change it to have "readline(hint)" and expand Note 3 to include readline 
> as well as readlines, removing Note 2.  Also I suppose some sort of 
> change note in the specification?
> 
> Does this sound like a sufficient change to the spec, and are there any 
> objections to the change?
> 

Fine for me, but of course we need to do this as:
1) Errata to WSGI 1.0
or
2) WSGI 1.1
or
3) WSGI 2.0

You can't just modify the current WSGI 1.0 spec.

I'm for 2), with the other clarifications about WSGI we have discussed 
in the past.


Regards  Manlio Perillo

From ianb at colorstudy.com  Mon Nov 17 21:23:13 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Mon, 17 Nov 2008 14:23:13 -0600
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
 specification
In-Reply-To: <4921CABC.60904@libero.it>
References: <491F9099.2090508@colorstudy.com>	<88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>	<49206117.4020103@colorstudy.com>	<4921B024.90804@doxdesk.com>	<ec2d68db0811171043u5d3711e4g7bcd7b74cf61d4d2@mail.gmail.com>
	<4921BD18.8000908@colorstudy.com> <4921CABC.60904@libero.it>
Message-ID: <4921D2B1.5060004@colorstudy.com>

Manlio Perillo wrote:
> Ian Bicking ha scritto:
>> [...]
>> We need to propose a change to the WSGI specification.  I propose, in 
>> "Input and Error Streams" 
>> (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we 
>> change it to have "readline(hint)" and expand Note 3 to include 
>> readline as well as readlines, removing Note 2.  Also I suppose some 
>> sort of change note in the specification?
>>
>> Does this sound like a sufficient change to the spec, and are there 
>> any objections to the change?
>>
> 
> Fine for me, but of course we need to do this as:
> 1) Errata to WSGI 1.0
> or
> 2) WSGI 1.1
> or
> 3) WSGI 2.0
> 
> You can't just modify the current WSGI 1.0 spec.
> 
> I'm for 2), with the other clarifications about WSGI we have discussed 
> in the past.

I'm for 1.  What other clarifications were you thinking of?


-- 
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

From pje at telecommunity.com  Mon Nov 17 21:25:41 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Mon, 17 Nov 2008 15:25:41 -0500
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
 specification
In-Reply-To: <4921CABC.60904@libero.it>
References: <491F9099.2090508@colorstudy.com>
	<88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>
	<49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com>
	<ec2d68db0811171043u5d3711e4g7bcd7b74cf61d4d2@mail.gmail.com>
	<4921BD18.8000908@colorstudy.com> <4921CABC.60904@libero.it>
Message-ID: <20081117202418.B603F3A4092@sparrow.telecommunity.com>

At 08:49 PM 11/17/2008 +0100, Manlio Perillo wrote:
>Ian Bicking ha scritto:
>>[...]
>>We need to propose a change to the WSGI specification.  I propose, 
>>in "Input and Error Streams" 
>>(http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) 
>>we change it to have "readline(hint)" and expand Note 3 to include 
>>readline as well as readlines, removing Note 2.  Also I suppose 
>>some sort of change note in the specification?
>>Does this sound like a sufficient change to the spec, and are there 
>>any objections to the change?
>
>Fine for me, but of course we need to do this as:
>1) Errata to WSGI 1.0
>or
>2) WSGI 1.1
>or
>3) WSGI 2.0
>
>You can't just modify the current WSGI 1.0 spec.
>
>I'm for 2), with the other clarifications about WSGI we have 
>discussed in the past.

I'm more inclined towards #1.  But in any event we need to get 
clearer about how the amendment or erratum will be phrased.


From fumanchu at aminus.org  Mon Nov 17 22:00:02 2008
From: fumanchu at aminus.org (Robert Brewer)
Date: Mon, 17 Nov 2008 13:00:02 -0800
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
	specification
In-Reply-To: <4921D2B1.5060004@colorstudy.com>
References: <491F9099.2090508@colorstudy.com>	<88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>	<49206117.4020103@colorstudy.com>	<4921B024.90804@doxdesk.com>	<ec2d68db0811171043u5d3711e4g7bcd7b74cf61d4d2@mail.gmail.com><4921BD18.8000908@colorstudy.com>
	<4921CABC.60904@libero.it> <4921D2B1.5060004@colorstudy.com>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6405A414EF@ex10.hostedexchange.local>

Ian Bicking wrote:
> Manlio Perillo wrote:
> > Ian Bicking ha scritto:
> >> [...]
> >> We need to propose a change to the WSGI specification.  I propose,
> in
> >> "Input and Error Streams"
> >> (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams)
> we
> >> change it to have "readline(hint)" and expand Note 3 to include
> >> readline as well as readlines, removing Note 2.  Also I suppose
some
> >> sort of change note in the specification?
> >>
> >> Does this sound like a sufficient change to the spec, and are there
> >> any objections to the change?
> >>
> >
> > Fine for me, but of course we need to do this as:
> > 1) Errata to WSGI 1.0
> > or
> > 2) WSGI 1.1
> > or
> > 3) WSGI 2.0
> >
> > You can't just modify the current WSGI 1.0 spec.
> >
> > I'm for 2), with the other clarifications about WSGI we have
> discussed
> > in the past.
> 
> I'm for 1.  What other clarifications were you thinking of?

PLEASE don't ask, don't tell. Let's not complicate this change by
conflating it with others yet again.


Robert Brewer
fumanchu at aminus.org


From manlio_perillo at libero.it  Mon Nov 17 22:13:05 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Mon, 17 Nov 2008 22:13:05 +0100
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
 specification
In-Reply-To: <20081117202418.B603F3A4092@sparrow.telecommunity.com>
References: <491F9099.2090508@colorstudy.com>
	<88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>
	<49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com>
	<ec2d68db0811171043u5d3711e4g7bcd7b74cf61d4d2@mail.gmail.com>
	<4921BD18.8000908@colorstudy.com> <4921CABC.60904@libero.it>
	<20081117202418.B603F3A4092@sparrow.telecommunity.com>
Message-ID: <4921DE61.8030204@libero.it>

Phillip J. Eby ha scritto:
> At 08:49 PM 11/17/2008 +0100, Manlio Perillo wrote:
>> Ian Bicking ha scritto:
>>> [...]
>>> We need to propose a change to the WSGI specification.  I propose, in 
>>> "Input and Error Streams" 
>>> (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we 
>>> change it to have "readline(hint)" and expand Note 3 to include 
>>> readline as well as readlines, removing Note 2.  Also I suppose some 
>>> sort of change note in the specification?
>>> Does this sound like a sufficient change to the spec, and are there 
>>> any objections to the change?
>>
>> Fine for me, but of course we need to do this as:
>> 1) Errata to WSGI 1.0
>> or
>> 2) WSGI 1.1
>> or
>> 3) WSGI 2.0
>>
>> You can't just modify the current WSGI 1.0 spec.
>>
>> I'm for 2), with the other clarifications about WSGI we have discussed 
>> in the past.
> 
> I'm more inclined towards #1.  

I'm not sure, since it is an API change; of course if there was an error 
in the API this should be an erratum, but there is a rationale behind the 
current API.

I'm fine, however, with an amendment.


 > [...]


Regards   Manlio Perillo

From manlio_perillo at libero.it  Mon Nov 17 22:29:18 2008
From: manlio_perillo at libero.it (Manlio Perillo)
Date: Mon, 17 Nov 2008 22:29:18 +0100
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
 specification
In-Reply-To: <4921D2B1.5060004@colorstudy.com>
References: <491F9099.2090508@colorstudy.com>	<88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>	<49206117.4020103@colorstudy.com>	<4921B024.90804@doxdesk.com>	<ec2d68db0811171043u5d3711e4g7bcd7b74cf61d4d2@mail.gmail.com>
	<4921BD18.8000908@colorstudy.com> <4921CABC.60904@libero.it>
	<4921D2B1.5060004@colorstudy.com>
Message-ID: <4921E22E.90001@libero.it>

Ian Bicking ha scritto:
> [...]
>> Fine for me, but of course we need to do this as:
>> 1) Errata to WSGI 1.0
>> or
>> 2) WSGI 1.1
>> or
>> 3) WSGI 2.0
>>
>> You can't just modify the current WSGI 1.0 spec.
>>
>> I'm for 2), with the other clarifications about WSGI we have discussed 
>> in the past.
> 
> I'm for 1.  What other clarifications were you thinking of?
> 

Here is a list of messages I have posted in the past.

- start_response and error checking
   25 September 2007
   http://mail.python.org/pipermail/web-sig/2007-September/002771.html
- hop-by-hop headers handling
   1 October 2007
   http://mail.python.org/pipermail/web-sig/2007-October/002775.html
- HTTP_CONTENT_TYPE and HTTP_CONTENT_LENGTH
   12 December 2007
   http://mail.python.org/pipermail/web-sig/2007-December/003014.html
- a possible error in the WSGI spec
   20 December 2007
   http://mail.python.org/pipermail/web-sig/2007-December/003064.html
- calling start_response and the write from a separate thread
   27 December 2007
   http://mail.python.org/pipermail/web-sig/2007-December/003104.html
- WSGI and PEP 325
   20 May 2008
   http://mail.python.org/pipermail/web-sig/2008-May/003438.html


I'm rather sure there were other threads about clarifications of WSGI 1.0.

One of these was about whether a WSGI gateway is allowed to skip the 
generation of the response body (assuming the WSGI application returns a 
generator) when it is not required (the client's cached copy of the 
requested entity is up to date and the server is going to return 304 Not 
Modified).


Regards   Manlio Perillo

From tseaver at palladion.com  Mon Nov 17 22:36:02 2008
From: tseaver at palladion.com (Tres Seaver)
Date: Mon, 17 Nov 2008 16:36:02 -0500
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
	specification
In-Reply-To: <4921DE61.8030204@libero.it>
References: <491F9099.2090508@colorstudy.com>	<88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>	<49206117.4020103@colorstudy.com>
	<4921B024.90804@doxdesk.com>	<ec2d68db0811171043u5d3711e4g7bcd7b74cf61d4d2@mail.gmail.com>	<4921BD18.8000908@colorstudy.com>
	<4921CABC.60904@libero.it>	<20081117202418.B603F3A4092@sparrow.telecommunity.com>
	<4921DE61.8030204@libero.it>
Message-ID: <gfso44$8c5$1@ger.gmane.org>

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Manlio Perillo wrote:
> Phillip J. Eby ha scritto:
>> At 08:49 PM 11/17/2008 +0100, Manlio Perillo wrote:
>>> Ian Bicking ha scritto:
>>>> [...]
>>>> We need to propose a change to the WSGI specification.  I propose, in 
>>>> "Input and Error Streams" 
>>>> (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we 
>>>> change it to have "readline(hint)" and expand Note 3 to include 
>>>> readline as well as readlines, removing Note 2.  Also I suppose some 
>>>> sort of change note in the specification?
>>>> Does this sound like a sufficient change to the spec, and are there 
>>>> any objections to the change?
>>> Fine for me, but of course we need to do this as:
>>> 1) Errata to WSGI 1.0
>>> or
>>> 2) WSGI 1.1
>>> or
>>> 3) WSGI 2.0
>>>
>>> You can't just modify the current WSGI 1.0 spec.
>>>
>>> I'm for 2), with the other clarifications about WSGI we have discussed 
>>> in the past.
>> I'm more inclined towards #1.  
> 
> I'm not sure, since it is an API change; of course if there was an error 
> in the API this should be an erratum, but there is a rationale behind the 
> current API.
> 
> I'm fine, however, with an amendment.

Isn't the rationale completely defeated by the equivalent, relaxed form
for 'readlines' (note #3)?  That was why I voted +1:  I couldn't see
that relaxing 'readline' to match 'readlines' would make life any harder
on server implementers.


Tres.
- --
===================================================================
Tres Seaver          +1 540-429-0999          tseaver at palladion.com
Palladion Software   "Excellence by Design"    http://palladion.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFJIePC+gerLs4ltQ4RAnsrAKCflurxZqxfJvjgX2YeU9XlXFDvPgCfQRcn
rHK7/cvRh9zm5x8PyTq3ZLE=
=c8v8
-----END PGP SIGNATURE-----


From graham.dumpleton at gmail.com  Mon Nov 17 23:30:50 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Tue, 18 Nov 2008 09:30:50 +1100
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
	specification
In-Reply-To: <gfso44$8c5$1@ger.gmane.org>
References: <491F9099.2090508@colorstudy.com>
	<88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>
	<49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com>
	<ec2d68db0811171043u5d3711e4g7bcd7b74cf61d4d2@mail.gmail.com>
	<4921BD18.8000908@colorstudy.com> <4921CABC.60904@libero.it>
	<20081117202418.B603F3A4092@sparrow.telecommunity.com>
	<4921DE61.8030204@libero.it> <gfso44$8c5$1@ger.gmane.org>
Message-ID: <88e286470811171430id838c44xbd06acb493c524f6@mail.gmail.com>

2008/11/18 Tres Seaver <tseaver at palladion.com>:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Manlio Perillo wrote:
>> Phillip J. Eby ha scritto:
>>> At 08:49 PM 11/17/2008 +0100, Manlio Perillo wrote:
>>>> Ian Bicking ha scritto:
>>>>> [...]
>>>>> We need to propose a change to the WSGI specification.  I propose, in
>>>>> "Input and Error Streams"
>>>>> (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we
>>>>> change it to have "readline(hint)" and expand Note 3 to include
>>>>> readline as well as readlines, removing Note 2.  Also I suppose some
>>>>> sort of change note in the specification?
>>>>> Does this sound like a sufficient change to the spec, and are there
>>>>> any objections to the change?
>>>> Fine for me, but of course we need to do this as:
>>>> 1) Errata to WSGI 1.0
>>>> or
>>>> 2) WSGI 1.1
>>>> or
>>>> 3) WSGI 2.0
>>>>
>>>> You can't just modify the current WSGI 1.0 spec.
>>>>
>>>> I'm for 2), with the other clarifications about WSGI we have discussed
>>>> in the past.
>>> I'm more inclined towards #1.
>>
>> I'm not sure, since it is an API change; of course if there was an error
>> in the API this should be an erratum, but there is a rationale behind the
>> current API.
>>
>> I'm fine, however, with an amendment.
>
> Isn't the rationale completely defeated by the equivalent, relaxed form
> for 'readlines' (note #3)?  That was why I voted +1:  I couldn't see
> that relaxing 'readline' to match 'readlines' would make life any harder
> on server implementers.

I would be for (1) errata or amendment as reality is that there is
probably no WSGI implementation that disallows an argument to
readline() given that certain Python code such as cgi.FieldStorage
wouldn't work otherwise.

For such a clarification on existing practice, I see no point in
having to change wsgi.version in environ as it would just cause
confusion.

I would also like to see other changes to the WSGI specification, but now
is not the time; let us at least get this obvious issue with the API
dealt with. After that we can perhaps have a discussion of the future
of the WSGI specification and whether there really is any interest in
future versions with more significant changes. Although, personally, I
will not be holding my breath for that to happen. :-)

Graham

From pywebsig at xhaus.com  Tue Nov 18 13:02:37 2008
From: pywebsig at xhaus.com (Alan Kennedy)
Date: Tue, 18 Nov 2008 12:02:37 +0000
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
	specification
In-Reply-To: <88e286470811171430id838c44xbd06acb493c524f6@mail.gmail.com>
References: <491F9099.2090508@colorstudy.com> <49206117.4020103@colorstudy.com>
	<4921B024.90804@doxdesk.com>
	<ec2d68db0811171043u5d3711e4g7bcd7b74cf61d4d2@mail.gmail.com>
	<4921BD18.8000908@colorstudy.com> <4921CABC.60904@libero.it>
	<20081117202418.B603F3A4092@sparrow.telecommunity.com>
	<4921DE61.8030204@libero.it> <gfso44$8c5$1@ger.gmane.org>
	<88e286470811171430id838c44xbd06acb493c524f6@mail.gmail.com>
Message-ID: <4a951aa00811180402g68e067a3m1e2a6dddb29d4e20@mail.gmail.com>

[Graham]
> I would be for (1) errata or amendment as reality is that there is
> probably no WSGI implementation that disallows an argument to
> readline() given that certain Python code such as cgi.FieldStorage
> wouldn't work otherwise.
>
> For such a clarification on existing practice, I see no point in
> having to change wsgi.version in environ as it would just cause
> confusion.

+1

[Graham]
> I would also like to see other changes to the WSGI specification, but now
> is not the time; let us at least get this obvious issue with the API
> dealt with. After that we can perhaps have a discussion of the future
> of the WSGI specification and whether there really is any interest in
> future versions with more significant changes.

+1

Alan.

From pje at telecommunity.com  Tue Nov 18 16:44:57 2008
From: pje at telecommunity.com (Phillip J. Eby)
Date: Tue, 18 Nov 2008 10:44:57 -0500
Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI
 specification
In-Reply-To: <88e286470811171430id838c44xbd06acb493c524f6@mail.gmail.com>
References: <491F9099.2090508@colorstudy.com>
	<88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com>
	<49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com>
	<ec2d68db0811171043u5d3711e4g7bcd7b74cf61d4d2@mail.gmail.com>
	<4921BD18.8000908@colorstudy.com> <4921CABC.60904@libero.it>
	<20081117202418.B603F3A4092@sparrow.telecommunity.com>
	<4921DE61.8030204@libero.it> <gfso44$8c5$1@ger.gmane.org>
	<88e286470811171430id838c44xbd06acb493c524f6@mail.gmail.com>
Message-ID: <20081118154328.088B23A411A@sparrow.telecommunity.com>

At 09:30 AM 11/18/2008 +1100, Graham Dumpleton wrote:
>I would be for (1) errata or amendment as reality is that there is
>probably no WSGI implementation that disallows an argument to
>readline() given that certain Python code such as cgi.FieldStorage
>wouldn't work otherwise.

Please note that that was a change in Python 2.5; older Pythons 
(including Jython until very recently) would not have needed a 
readline() argument, and so are less likely to have been tested that way.
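
For a server whose wsgi.input only implements the no-argument form, a
thin wrapper can supply the sized form that callers expect.  The
following is only a sketch under that assumption: the class name
SizedReadlineInput is hypothetical and appears nowhere in the
specification or in any server.

```python
import io


class SizedReadlineInput:
    """Give readline(size) semantics to a stream whose readline() takes no argument."""

    def __init__(self, stream):
        self._stream = stream
        self._pending = b""  # bytes already pulled from the stream, not yet returned

    def readline(self, size=-1):
        # Fetch a fresh line only when nothing is left over from a previous call.
        if not self._pending:
            self._pending = self._stream.readline()
        if size is None or size < 0:
            line, self._pending = self._pending, b""
        else:
            line, self._pending = self._pending[:size], self._pending[size:]
        return line

    def read(self, size=-1):
        # Serve leftover bytes first so that read() and readline() can be mixed.
        if size is None or size < 0:
            data, self._pending = self._pending + self._stream.read(), b""
            return data
        data, self._pending = self._pending[:size], self._pending[size:]
        if len(data) < size:
            data += self._stream.read(size - len(data))
        return data


body = SizedReadlineInput(io.BytesIO(b"hello world\nsecond line\n"))
first = body.readline(5)    # at most 5 bytes of the first line
rest = body.readline(100)   # remainder of that line, newline included
```

Note that this wrapper still pulls a whole line into memory before
slicing it, so it only fixes the call signature; it does not bound
memory use for very long lines the way a native readline(size) would.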


From and-py at doxdesk.com  Wed Nov 19 01:40:57 2008
From: and-py at doxdesk.com (Andrew Clover)
Date: Wed, 19 Nov 2008 01:40:57 +0100
Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets
In-Reply-To: <4921AFD0.7050506@doxdesk.com>
References: <491B2CFE.7060502@doxdesk.com>
	<491B65C6.3020206@colorstudy.com>	<491DB1E0.3070701@doxdesk.com>
	<491DB9C6.1070909@colorstudy.com>	<491DEC57.6080402@doxdesk.com>	<000c01c9485d$49ff0d20$ddfd2760$@com.au>
	<4921AFD0.7050506@doxdesk.com>
Message-ID: <49236099.7070604@doxdesk.com>

> ctypes.windll.kernel32.GetEnvironmentVariableW(u'PATH_INFO', ...)

Hmm... it turns out: no. IIS appears to be mangling characters that are 
not in mbcs even *before* it puts the decoded value into the envvars.

The same is true with isapi_wsgi, which is the only other WSGI adapter I 
know of for IIS. This gets the same mangled byte string from 
GetServerVariable as Python gets from the envvars, so it looks like this 
is a mistake IIS is making further up before it even hits the CGI 
handler. Maybe someone more familiar with ISAPI knows a better way to 
read PATH_INFO than GetServerVariable, but I can't see anything 
promising in MSDN.

So it would seem to be impossible at the moment to have Unicode paths 
work under IIS at all.

The ctypes approach could rescue bytes for the Apache/nt/Py2 combination 
(perhaps also from libc.getenv for Apache/posix/Py3), but then Apache 
already gives us REQUEST_URI which is a much easier workaround. There 
might be CGI servers for Windows where ctypes could serve some purpose, 
but I can't think of any currently in use other than the Big Two.

In summary, to get the original submitted byte strings for PATH_INFO:

Apache/nt/Py2
     process REQUEST_URI
Apache/posix/Py2
     use PATH_INFO directly
     (or process REQUEST_URI)
Apache/nt/Py3
     encode PATH_INFO to ISO-8859-1
     (or process REQUEST_URI)
Apache/posix/Py3
     process REQUEST_URI
IIS/nt/Py2
     decode PATH_INFO from mbcs, then encode to UTF-8
     FAIL for characters not in current mbcs
     FAIL for non-UTF-8 input
IIS/nt/Py3
     encode PATH_INFO to UTF-8
     FAIL for characters not in current mbcs
     FAIL for non-UTF-8 input
wsgiref.simple_server/Py2
     use PATH_INFO directly
wsgiref.simple_server/Py3
     remains to be seen, but at the moment encode PATH_INFO to UTF-8
     FAIL for non-UTF-8 input
cherrypy.wsgiserver/Py2
     use PATH_INFO directly
cherrypy.wsgiserver/Py3
     remains to be seen, but at the moment encode PATH_INFO to UTF-8
     FAIL for non-UTF-8 input
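
The "process REQUEST_URI" entries above can be sketched roughly as
follows.  This assumes the server exposes REQUEST_URI (Apache does; it
is not part of CGI or WSGI) as a still percent-encoded path plus query
string, and the helper name raw_path_bytes is made up for the example.
It does not strip SCRIPT_NAME or handle absolute-form request URIs.

```python
from urllib.parse import unquote_to_bytes


def raw_path_bytes(environ):
    """Return the request path as the bytes the client submitted, where recoverable."""
    uri = environ.get("REQUEST_URI")
    if uri is not None:
        # Drop the query string, then undo percent-encoding without guessing a charset.
        path = uri.split("?", 1)[0]
        return unquote_to_bytes(path)
    # Fallback (assumption): PATH_INFO was decoded as ISO-8859-1, so re-encoding
    # with the same codec returns the original bytes.
    return environ.get("PATH_INFO", "").encode("iso-8859-1")


example = raw_path_bytes({"REQUEST_URI": "/caf%C3%A9/menu?lang=fr"})
```

The fallback branch only round-trips correctly on the combinations
listed above where PATH_INFO is byte-transparent; for IIS the bytes
are already lost before this code runs.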

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/

From randy at rcs-comp.com  Sat Nov 22 06:50:45 2008
From: randy at rcs-comp.com (Randy Syring)
Date: Sat, 22 Nov 2008 00:50:45 -0500
Subject: [Web-SIG] Implementing File Upload Size Limits
Message-ID: <49279DB5.9090109@rcs-comp.com>

I am looking for opinions and thoughts on best practice for limiting 
file upload size.  I have a few considerations:

    * Ultimately, I would want my application with my method of handling
      forms to be able to give the user a message that the file size was
      too big.  That means that, however the size is limited, just
      blanking out wsgi.input and setting content-length to zero doesn't
      seem correct.  That would make it look like the form wasn't
      submitted with any data I believe.
    * Given the above, it seems that something would need to get put in
      the environment to tell middleware and the application that the
      file input was aborted, but what would be the best way for doing
      it?  Should it be some kind of standard, or just dependent on your
      server or middleware?
    * It seems best to implement this functionality as the very first
      middleware in the stack.  Since other middleware read and
      manipulate wsgi.input, handling the upload size at the application
      level wouldn't prevent middleware from wasting resources dealing
      with a very large file.
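
To make the considerations above concrete, here is a rough sketch of
such a first-in-the-stack middleware.  Everything named in it is an
assumption for illustration: the class name UploadLimitMiddleware and
the "upload_limit.*" environ keys are not any kind of standard.

```python
import io


class UploadLimitMiddleware:
    """Flag oversized uploads in the environ instead of silently dropping them."""

    def __init__(self, app, max_bytes):
        self.app = app
        self.max_bytes = max_bytes

    def __call__(self, environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0  # malformed header: treated as "no declared body" here
        if length > self.max_bytes:
            # Hypothetical keys telling later middleware and the app what happened.
            environ["upload_limit.exceeded"] = True
            environ["upload_limit.declared_length"] = length
            environ["upload_limit.max_bytes"] = self.max_bytes
            # Hide the body so nothing downstream spends resources parsing it.
            environ["wsgi.input"] = io.BytesIO(b"")
            environ["CONTENT_LENGTH"] = "0"
        return self.app(environ, start_response)
```

The application can then check environ.get("upload_limit.exceeded")
and render its own message.  This only looks at the declared
Content-Length, so bodies sent without that header are not covered,
and what the server does with the unread body is server specific.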

Is it possible to prevent the server from even accepting all the data 
(i.e. trying to save bandwidth and server resources) if the 
content-length is known to be too big?  Or is the server required to 
take all the client's data regardless, even if it ends up going in the 
bit bucket?  I realize some of this is server specific, not WSGI 
specific, but I would be interested in knowing how the most popular 
servers handle this or what the HTTP specs require if anyone knows.

Thanks in advance for any insight you might be able to provide.

-- 
--------------------------------------
Randy Syring
RCS Computers & Web Solutions
502-644-4776
http://www.rcs-comp.com

"Whether, then, you eat or drink or 
whatever you do, do all to the glory
of God." 1 Cor 10:31


From randy at rcs-comp.com  Sat Nov 22 10:07:53 2008
From: randy at rcs-comp.com (Randy Syring)
Date: Sat, 22 Nov 2008 04:07:53 -0500
Subject: [Web-SIG] Implementing File Upload Size Limits
In-Reply-To: <49279DB5.9090109@rcs-comp.com>
References: <49279DB5.9090109@rcs-comp.com>
Message-ID: <4927CBE9.7060609@rcs-comp.com>

I did find this:

http://wiki.pylonshq.com/display/pylonscookbook/A+Better+Way+To+Limit+File+Upload+Size

Which was good, but still leaves some unanswered questions:

    * What if one is not using the paste http server?
    * This method gives an unfriendly response.  What would be the best
      method to propagate this error condition down to the app so that a
      message could be given to the user in the context of the form they
      had previously submitted (i.e. an error message under the input
      field reminding them of the max upload size and even possibly
      telling them how big the file was they uploaded).

Thanks.

--------------------------------------
Randy Syring
RCS Computers & Web Solutions
502-644-4776
http://www.rcs-comp.com

"Whether, then, you eat or drink or 
whatever you do, do all to the glory
of God." 1 Cor 10:31


Randy Syring wrote:
> I am looking for opinions and thoughts on best practice for limiting 
> file upload size.  I have a few considerations:
>
>     * Ultimately, I would want my application with my method of
>       handling forms to be able to give the user a message that the
>       file size was too big.  That means that, however the size is
>       limited, just blanking out wsgi.input and setting content-length
>       to zero doesn't seem correct.  That would make it look like the
>       form wasn't submitted with any data I believe.
>     * Given the above, it seems that something would need to get put
>       in the environment to tell middleware and the application that
>       the file input was aborted, but what would be the best way for
>       doing it?  Should it be some kind of standard, or just dependent
>       on your server or middleware?
>     * It seems best to implement this functionality as the very first
>       middleware in the stack.  Since other middleware read and
>       manipulate wsgi.input, handling the upload size at the
>       application level wouldn't prevent middleware from wasting
>       resources dealing with a very large file.
>
> Is it possible to prevent the server from even accepting all the data 
> (i.e. trying to save bandwidth and server resources) if the 
> content-length is known to be too big?  Or is the server required to 
> take all the client's data regardless, even if it ends up going in the 
> bit bucket?  I realize some of this is server specific, not WSGI 
> specific, but I would be interested in knowing how the most popular 
> servers handle this or what the HTTP specs require if anyone knows.
>
> Thanks in advance for any insight you might be able to provide.
> -- 
> --------------------------------------
> Randy Syring
> RCS Computers & Web Solutions
> 502-644-4776
> http://www.rcs-comp.com
>
> "Whether, then, you eat or drink or 
> whatever you do, do all to the glory
> of God." 1 Cor 10:31
>   

From graham.dumpleton at gmail.com  Sat Nov 22 10:12:26 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Sat, 22 Nov 2008 20:12:26 +1100
Subject: [Web-SIG] Implementing File Upload Size Limits
In-Reply-To: <49279DB5.9090109@rcs-comp.com>
References: <49279DB5.9090109@rcs-comp.com>
Message-ID: <88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com>

2008/11/22 Randy Syring <randy at rcs-comp.com>:
> I am looking for opinions and thoughts on best practice for limiting file
> upload size.  I have a few considerations:
>
> Ultimately, I would want my application with my method of handling forms to
> be able to give the user a message that the file size was too big.  That
> means that, however the size is limited, just blanking out wsgi.input and
> setting content-length to zero doesn't seem correct.  That would make it
> look like the form wasn't submitted with any data I believe.
> Given the above, it seems that something would need to get put in the
> environment to tell middleware and the application that the file input was
> aborted, but what would be the best way for doing it?  Should it be some
> kind of standard, or just dependent on your server or middleware?
> It seems best to implement this functionality as the very first middleware
> in the stack.  Since other middleware read and manipulate wsgi.input,
> handling the upload size at the application level wouldn't prevent middleware
> from wasting resources dealing with a very large file.
>
> Is it possible to prevent the server from even accepting all the data (i.e.
> trying to save bandwidth and server resources) if the content-length is
> known to be too big?  Or is the server required to take all the client's
> data regardless, even if it ends up going in the bit bucket?  I realize some
> of this is server specific, not WSGI specific, but I would be interested in
> knowing how the most popular servers handle this or what the HTTP specs
> require if anyone knows.
>
> Thanks in advance for any insight you might be able to provide.

If you use Apache/mod_wsgi to host your WSGI application, the best way
of handling this is to use the Apache LimitRequestBody directive for the
appropriate context. This will result in Apache returning a
HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response to the client. If
you need a custom error document for that response type use Apache
ErrorDocument directive to specify URL of handler which would generate
it.
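
As a rough illustration of the two directives just mentioned, a
configuration fragment might look like the one below.  The /upload
location, the /upload-too-large handler URL and the 10 MiB figure are
all made-up values for the example, not recommendations.

```apache
# Reject request bodies larger than 10 MiB (value is in bytes) for this URL space.
<Location "/upload">
    LimitRequestBody 10485760
</Location>

# Serve a custom page for 413 responses; the target URL is hypothetical
# and could itself be handled by the WSGI application.
ErrorDocument 413 /upload-too-large
```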

Except for the custom error document if delegated to the WSGI
application, doing it this way results in it all being handled by
Apache/mod_wsgi and your WSGI application will not even be invoked.
The request body content would also not even be read by Apache at all.
Do note that whether this avoids the client sending the request body
input depends on whether the client was expecting a '100 Continue'
response before it sends the data. I believe most web browsers still
don't use the '100 Continue' response.

This would be the preferred solution for Apache/mod_wsgi as it is
handled at lowest levels and guaranteed that request content wouldn't
be read at that point. It is however taking control out of your
application.

For Apache/mod_wsgi, if you do not do it this way but instead validate
content length in the WSGI application and have the WSGI application
return HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response, then
whether the request content gets read depends on whether you are using
embedded mode or daemon mode of mod_wsgi.

If you use embedded mode, so long as your WSGI application doesn't
read the input and just returns the error response, the request
content wouldn't be read at all. If you are using daemon mode however,
then the request content would always be read by Apache child worker
process, even if client asked for '100 Continue' response. This is
because the Apache child worker process will always proxy request
content to the daemon process.
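
For the second approach (validating the content length inside the
WSGI application), a minimal sketch could look like this.  The wrapper
name limit_uploads and the response text are assumptions made for the
example; the important part is that wsgi.input is never touched before
the error response is returned.

```python
def limit_uploads(app, max_bytes):
    """Wrap a WSGI app so oversized requests get a 413 without reading the body."""

    def wrapper(environ, start_response):
        try:
            length = int(environ.get("CONTENT_LENGTH") or 0)
        except ValueError:
            length = 0  # malformed header: let the wrapped app decide what to do
        if length > max_bytes:
            body = b"Upload too large.\n"
            start_response(
                "413 Request Entity Too Large",
                [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
            )
            return [body]  # wsgi.input is deliberately left unread
        return app(environ, start_response)

    return wrapper
```

Whether leaving the input unread actually saves any bandwidth then
depends on the embedded versus daemon mode behaviour described above.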

Anyway, that is how things are for Apache/mod_wsgi.

Graham

From randy at rcs-comp.com  Sat Nov 22 19:06:15 2008
From: randy at rcs-comp.com (Randy Syring)
Date: Sat, 22 Nov 2008 13:06:15 -0500
Subject: [Web-SIG] Implementing File Upload Size Limits
In-Reply-To: <88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com>
References: <49279DB5.9090109@rcs-comp.com>
	<88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com>
Message-ID: <49284A17.2050801@rcs-comp.com>

[forgot to copy list]

Graham Dumpleton wrote:
> 2008/11/22 Randy Syring <randy at rcs-comp.com>:
>   
>> I am looking for opinions and thoughts on best practice for limiting file
>> upload size.  I have a few considerations:
>>
>> <snip>
>>     
> If you use Apache/mod_wsgi to host your WSGI application, the best way
> of handling this is to use the Apache LimitRequestBody directive for the
> appropriate context. This will result in Apache returning a
> HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response to the client. If
> you need a custom error document for that response type use Apache
> ErrorDocument directive to specify URL of handler which would generate
> it.
>   
Graham,

Thank you for your response.  What you noted above does seem to be the
lowest level solution possible if you are using apache.  I suppose using
an error document that is part of the application would at least allow
me to serve a specific page from my application that could detail the
error.  If I wanted to get fancy, each time a form with an input element
was sent to a user, I could save that path in a special variable in the
user's session.  My error page could then look for that value in the
user session and if present, load the correct form, giving the user an
error message noting that the file uploaded was too big.  The downfall
to that approach is that the form comes back empty.  It might be better
to just have the error page give them some details and encourage them to
use the back button, in which case the form's fields would hopefully
still be filled in.

> Except for the custom error document if delegated to the WSGI
> application, doing it this way results in it all being handled by
> Apache/mod_wsgi and your WSGI application will not even be invoked.
> The request body content would also not even be read by Apache at all.
> Do note that whether this avoids the client sending the request body
> input depends on whether the client was expecting a '100 Continue'
> response before it sends the data. I believe most web browsers still
> don't use the '100 Continue' response.
>
> This would be the preferred solution for Apache/mod_wsgi as it is
> handled at lowest levels and guaranteed that request content wouldn't
> be read at that point. It is however taking control out of your
> application.
>   
Hopefully you can clarify something for me.  Let's assume that the client
does not use '100 Continue' but sends data immediately after sending
the headers.  If the server never reads the request content, what does
that mean exactly?  Does the data get transferred over the wire but then
discarded or does the client not get to send the data until the server
reads the request body?  I.e. the client tries to "send" it, but the
content isn't actually transferred across the wire until the server
reads it.  I am just wondering if there is a buffer or queue or
something between the server and the client that allows data to be
transferred even if the server doesn't "read" the request body.  Or, is
it just like a straight pipe where one end (the client) can't push data
through until the other end (the server) reads it.
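
One way to probe this question empirically is a small loopback
experiment with plain sockets, sketched below.  It is not tied to any
web server; the function name and the 64 MiB safety cap are
assumptions for the example.  The "server" side accepts the
connection but never reads, and the client counts how many bytes the
operating system accepts anyway.

```python
import socket


def bytes_accepted_without_reader(chunk=b"x" * 4096, cap=64 * 1024 * 1024):
    """Count bytes a client can send while the accepting side never reads."""
    listener = socket.create_server(("127.0.0.1", 0))  # port 0: any free port
    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()  # connection accepted, but never read from
    client.setblocking(False)
    sent = 0
    try:
        while sent < cap:
            try:
                sent += client.send(chunk)
            except BlockingIOError:
                break  # kernel buffers are full; a blocking send would stall here
    finally:
        client.close()
        server_side.close()
        listener.close()
    return sent


buffered = bytes_accepted_without_reader()
```

The exact number depends on the operating system's socket buffer
sizes, so no particular value should be expected; the point is only
to see whether it is zero or not.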

I agree that it does take control out of the application.  From a
usability perspective, the best solution IMO would be for the user to
get the form back and have a red error message under the input field
indicating the file size uploaded was too big and giving them the max
file size allowed.  However, on second thought, that may not be true.
As noted above, because the entire request body was rejected, the form
loaded would have none of the information they submitted and most users
would probably think they have to fill out the whole form again.
Probably better to just give them a non-form error page and let them use
the back button (or even provide a link that uses javascript to go back)
and in so doing hopefully salvage the time they put into the form.

I suppose, though, that two different kinds of file size limits need to
be thought through.  The first limit would be an application wide limit
that is set for security/resource reasons.  That, I believe, is what we
have been discussing up to this point.  I am just realizing that it
would also be fine to limit upload sizes at the application level and
give more user-friendly error messages.  So I might decide on a 10MB
application-wide upload limit, but I might also restrict free accounts
and paid accounts to 256k and 5MB respectively.  As long as a user
uploads something less than 10MB but over their account limit, they get
a friendly in-line error message.  If they upload over 10MB, we handle
that at the Apache level and send them to a custom error page.
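
A minimal sketch of the two-tier check described above, in Python. The
names SERVER_LIMIT, ACCOUNT_LIMITS and check_upload are hypothetical; the
sizes are the ones mentioned in this message, and the 10MB cap is assumed
to be enforced separately by the web server before the application runs.

```python
# Hypothetical two-tier upload limit check (sizes taken from the message).
SERVER_LIMIT = 10 * 1024 * 1024          # hard cap, assumed enforced by the web server
ACCOUNT_LIMITS = {
    "free": 256 * 1024,                  # 256k for free accounts
    "paid": 5 * 1024 * 1024,             # 5MB for paid accounts
}


def check_upload(content_length, account_type):
    """Return (ok, message) for an upload of content_length bytes."""
    if content_length > SERVER_LIMIT:
        # Normally never reached: the server rejects these before the app runs.
        return False, "Upload exceeds the server-wide limit."
    limit = ACCOUNT_LIMITS[account_type]
    if content_length > limit:
        # Friendly, in-line message that names the limit for this account type.
        return False, "File too large: %s accounts may upload at most %d KB." % (
            account_type, limit // 1024)
    return True, ""
```

The first branch is only a safety net; the second is the one that can be
rendered next to the form field.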
> For Apache/mod_wsgi, if you do not do it this way but instead validate
> content length in the WSGI application and have the WSGI application
> return HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response, then
> whether the request content gets read depends on whether you are using
> embedded mode or daemon mode of mod_wsgi.
>
> If you use embedded mode, so long as your WSGI application doesn't
> read the input and just returns the error response, the request
> content wouldn't be read at all. If you are using daemon mode however,
> then the request content would always be read by Apache child worker
> process, even if client asked for '100 Continue' response. This is
> because the Apache child worker process will always proxy request
> content to the daemon process.
>
>   
That's good to know.  I think at this point I have talked myself into
thinking that there is no good reason to handle it at the application
level, but would appreciate any further feedback you might have.

One other thing, what would be a good upload size limit?  Should it
always be as low as possible?  What might be a good "middle-ground" for
the average web application uploading documents and pictures?

Thank you for taking the time to respond.

--------------------------------------
Randy Syring
RCS Computers & Web Solutions
502-644-4776
http://www.rcs-comp.com

"Whether, then, you eat or drink or
whatever you do, do all to the glory
of God." 1 Cor 10:31


From brian at briansmith.org  Tue Nov 25 18:03:22 2008
From: brian at briansmith.org (Brian Smith)
Date: Tue, 25 Nov 2008 11:03:22 -0600
Subject: [Web-SIG] Implementing File Upload Size Limits
In-Reply-To: <49284A17.2050801@rcs-comp.com>
References: <49279DB5.9090109@rcs-comp.com>	<88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com>
	<49284A17.2050801@rcs-comp.com>
Message-ID: <006501c94f1f$ba54a620$2efdf260$@org>

Randy Syring wrote:
> Hopefully you can clarify something for me.  Lets assume that the
> client does not use '100 Continue' but sends data immediately, after
> sending the headers.  If the server never reads the request content,
> what does that mean exactly?  Does the data get transferred over the
> wire but then discarded or does the client not get to send the data
> until the server reads the request body?  I.e. the client tries to
> "send" it, but the content isn't actually transferred across the
> wire until the server reads it.  I am just wondering if there
> is a buffer or queue or something between the server and the client
> that allows data to be transferred even if the server doesn't
> "read" the request body.  Or, is it just like a straight pipe
> where one end (the client) can't push data through until the other
> end (the server) reads it.

Under Apache CGI or mod_wsgi, in many situations you will get a deadlock in
this scenario. The input and the output are buffered separately, and both of
those buffers can fill up. Neither mod_wsgi nor mod_cgid implement the
non-blocking I/O logic needed to prevent deadlocks. I heard (but did not
verify) that mod_fastcgi does not have this deadlocking problem. The sizes
of the buffers determines the size of the inputs and outputs needed to cause
a deadlock. On some platforms (e.g. Mac OS X), they are only 8K by default. 

Therefore, for maximum portability, a WSGI application should ALWAYS consume
the *whole* request body if it wants to avoid the deadlock using the
reference WSGI adapter in PEP 333 or mod_wsgi. 
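
A rough sketch of what "consume the whole request body" could look like
in a WSGI application that rejects oversized uploads. MAX_BODY, CHUNK,
content_length and drain_input are hypothetical names, and the limit is
an arbitrary example value. The reads are bounded by CONTENT_LENGTH
because PEP 333 does not guarantee that wsgi.input signals end-of-file.

```python
MAX_BODY = 10 * 1024 * 1024   # hypothetical application limit, in bytes
CHUNK = 64 * 1024             # read size used while discarding


def content_length(environ):
    """Return CONTENT_LENGTH as an int, treating missing/invalid as 0."""
    try:
        return int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        return 0


def drain_input(environ):
    """Read and discard the request body; return the number of bytes dropped."""
    remaining = content_length(environ)
    stream = environ["wsgi.input"]
    discarded = 0
    while remaining > 0:
        data = stream.read(min(CHUNK, remaining))
        if not data:              # client sent less than it promised
            break
        remaining -= len(data)
        discarded += len(data)
    return discarded


def application(environ, start_response):
    if content_length(environ) > MAX_BODY:
        drain_input(environ)      # consume the body before answering
        body = b"Upload too large.\n"
        start_response("413 Request Entity Too Large",
                       [("Content-Type", "text/plain"),
                        ("Content-Length", str(len(body)))])
        return [body]
    body = b"OK\n"
    start_response("200 OK",
                   [("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body)))])
    return [body]
```

Note the trade-off: draining avoids the deadlock but still pulls the
whole upload over the wire, so it does not save any bandwidth.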

Probably other WSGI gateways have similar issues. It would be nice if there
was a standard entry in the WSGI environment (e.g.
"wsgi.may_ignore_request_body") that could be used to safely detect when we
can skip the request body. It would be even nicer if WSGI gateways were
updated to avoid this problem. However, that is easier said than done.

If you know C, it is relatively simple to modify mod_wsgi to use a different
Apache<->daemon communication protocol so that the daemon mode works as you
would expect (no deadlocks, proper 100-continue support, request body isn't
read unless your application asks for it). A long time ago I had a patch
that did this (among other things) but I don't think I have it any more. 

However, once you get to that point, you still run into problems. If your
goal is to avoid reading the request body, then you need to close the
connection in your error response; otherwise, if the request was a HTTP/1.1
request, you still need to read the entire request body in order to process
any requests that follow it in the request pipeline. Unfortunately, a WSGI
application doesn't have any way of signaling that the connection is to be
closed; the WSGI specification forbids the WSGI application from returning
the Connection header since it is hop-by-hop. And, even if there was such a
mechanism, a poorly-coded client is likely to still cause a deadlock if the
server doesn't read its full request. Make sure you test with all your
targeted browsers.

Consequently...

> > If you are using daemon mode however,
> > then the request content would always be read by Apache child worker
> > process, even if client asked for '100 Continue' response. This is
> > because the Apache child worker process will always proxy request
> > content to the daemon process.
> >
> Thats good to know.  I think at this point I have talked myself into
> thinking that there is no good reason to handle it at the application
> level, but would appreciate any further feedback you might have.

...if your users will often attempt to upload files that exceed your
limits, it is best to mitigate the problem on the client side. First,
document the file size limit clearly on the page where the upload happens.
Secondly, implement a flash-based and/or java-based file upload control that
can be used when the user has Flash installed (fall back to the regular
control otherwise). With such an uploader, you can check the file size on
the client and prevent these requests from even being made (in the typical
case). You will still have to implement the validation logic on the server
to prevent malicious use and/or disabled Javascript/Flash/Java. There are
additional benefits to this approach (better UI, multi-file selection,
compression, encryption, doesn't waste the user's time, saves bandwidth) but
it comes with all the drawbacks inherent with Flash/Java/Javascript.

Regards,
Brian


From and-py at doxdesk.com  Tue Nov 25 21:14:52 2008
From: and-py at doxdesk.com (Andrew Clover)
Date: Tue, 25 Nov 2008 21:14:52 +0100
Subject: [Web-SIG] Implementing File Upload Size Limits
In-Reply-To: <006501c94f1f$ba54a620$2efdf260$@org>
References: <49279DB5.9090109@rcs-comp.com>	<88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com>	<49284A17.2050801@rcs-comp.com>
	<006501c94f1f$ba54a620$2efdf260$@org>
Message-ID: <492C5CBC.30006@doxdesk.com>

Brian Smith wrote:

> Under Apache CGI or mod_wsgi, in many situations you will get a deadlock in
> this scenario.

Under IIS CGI it's considerably more likely. The output buffer you get 
is smaller than Apache/Linux (at least on Win2K3 it's only 2KB), so even 
a relatively small error page spat out before reading the whole input 
will result in a cheeky hang.

> Therefore, for maximum portability, a WSGI application should ALWAYS consume
> the *whole* request body if it wants to avoid the deadlock using the
> reference WSGI adapter in PEP 333 or mod_wsgi
(...in daemon mode)

yep.

-- 
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/


From graham.dumpleton at gmail.com  Tue Nov 25 23:59:10 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Wed, 26 Nov 2008 09:59:10 +1100
Subject: [Web-SIG] Implementing File Upload Size Limits
In-Reply-To: <006501c94f1f$ba54a620$2efdf260$@org>
References: <49279DB5.9090109@rcs-comp.com>
	<88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com>
	<49284A17.2050801@rcs-comp.com> <006501c94f1f$ba54a620$2efdf260$@org>
Message-ID: <88e286470811251459h6ed70717wbc22ba47009810d3@mail.gmail.com>

2008/11/26 Brian Smith <brian at briansmith.org>:
> Randy Syring wrote:
>> Hopefully you can clarify something for me.  Lets assume that the
>> client does not use '100 Continue' but sends data immediately, after
>> sending the headers.  If the server never reads the request content,
>> what does that mean exactly?  Does the data get transferred over the
>> wire but then discarded or does the client not get to send the data
>> until the server reads the request body?  I.e. the client tries to
>> "send" it, but the content isn't actually transferred across the
>> wire until the server reads it.  I am just wondering if there
>> is a buffer or queue or something between the server and the client
>> that allows data to be transferred even if the server doesn't
>> "read" the request body.  Or, is it just like a straight pipe
>> where one end (the client) can't push data through until the other
>> end (the server) reads it.
>
> Under Apache CGI or mod_wsgi, in many situations you will get a deadlock in
> this scenario. The input and the output are buffered separately both of
> those buffers can fill up.

It isn't 'many situations', it is a quite specific situation.

The issue applies only to mod_wsgi daemon mode and only occurs where
the size of the request content body size is larger than the UNIX
socket buffer size for that platform and the WSGI application doesn't
consume all the request body. At the same time, the WSGI application
would then have to return a set of response headers and response body
which combined are also larger than the UNIX socket buffer size for
that platform.
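
The conditions above can be restated as a small predicate. This is only
an illustration of the rule as described in this message, not mod_wsgi
code; the function and parameter names are hypothetical.

```python
def deadlock_possible(request_body_size, bytes_consumed_by_app,
                      response_size, socket_buffer_size):
    """Return True when all the conditions described above hold.

    response_size is response headers plus response body, in bytes.
    """
    body_exceeds_buffer = request_body_size > socket_buffer_size
    body_not_fully_read = bytes_consumed_by_app < request_body_size
    response_exceeds_buffer = response_size > socket_buffer_size
    return (body_exceeds_buffer and body_not_fully_read
            and response_exceeds_buffer)
```

For example, with the 8K buffer mentioned earlier in the thread, a 1MB
POST that the application ignores is only a problem if the response
itself is also larger than 8K.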

> Neither mod_wsgi nor mod_cgid implement the
> non-blocking I/O logic needed to prevent deadlocks.

Both mod_wsgi and mod_cgi do have timeouts so that a permanent
deadlock situation at least doesn't arise. This is based off the standard
Apache Timeout directive. AFAIK mod_cgid still has a bug in it whereby
it doesn't detect this situation, and so it is possibly an easy way to
DOS an Apache server.

As far as changing how mod_wsgi works, there exists the issue:

  http://code.google.com/p/modwsgi/issues/detail?id=56

It is low priority though as no one has been reporting it as a problem
in actual use. Scenarios where it technically might be triggered would
generally be SPAM bots trying to POST large amounts of data to
arbitrary URLs. If an application is functioning as intended, the
situation shouldn't really arise as POST requests should be getting
directed at URLs which will consume it.

That issue also references the IIS+CGI issue someone else mentioned:

  http://www.doxdesk.com/updates/2006.html#u20060416-cgi

FWIW, mod_scgi also has same problem and it doesn't implement timeouts
so can suffer permanent deadlock.

> I heard (but did not
> verify) that mod_fastcgi does not have this deadlocking problem. The sizes
> of the buffers determines the size of the inputs and outputs needed to cause
> a deadlock. On some platforms (e.g. Mac OS X), they are only 8K by default.

MacOS X is only system I know of that has small default UNIX socket
buffer sizes. This small buffer size only applies to UNIX socket
buffer sizes, for INET sockets it is much much larger. Since
mod_fastcgi predominantly uses INET sockets, if there is an issue it
may not be obvious as you would need to be returning very large
response. From what I remember when I looked at mod_fastcgi and
mod_proxy for certain types of operations they both try and force all
request content down the socket before trying to read response. Thus,
am not convinced that problem couldn't actually occur for both of
these as well, but since INET socket buffer size much much larger, not
generally triggered.

To work around UNIX socket buffer size on mod_wsgi, there are options
which can be supplied to WSGIDaemonProcess to change the UNIX socket
buffer sizes used to something more sensible.

> Therefore, for maximum portability, a WSGI application should ALWAYS consume
> the *whole* request body if it wants to avoid the deadlock using the
> reference WSGI adapter in PEP 333 or mod_wsgi.
>
> Probably other WSGI gateways have similar issues. It would be nice if there
> was a standard entry in the WSGI environment (e.g.
> "wsgi.may_ignore_request_body") that could be used to safely detect when we
> can skip the request body. It would be even nicer if WSGI gateways were
> updated to avoid this problem. However, that is easier said than done.
>
> If you know C, it is relatively simple to modify mod_wsgi to use a different
> Apache<->daemon communication protocol so that the daemon mode works as you
> would expect (no deadlocks, proper 100-continue support, request body isn't
> read unless your application asks for it). A long time ago I had a patch
> that did this (among other things) but I don't think I have it any more.

Depends on your definition of simple. It would be quite fiddly to do
and get right, or one would have to rewrite a large amount of code. I
wouldn't regard either as really that simple.

> However, once you get to that point, you still run into problems. If your
> goal is to avoid reading the request body, then you need to close the
> connection in your error response; Otherwise, if the request was a HTTP/1.1
> request, you still need to read the entire request body in order to process
> any requests that follow it in the request pipeline. Unfortunately, a WSGI
> application doesn't have any way of signaling that the connection is to be
> closed; the WSGI specification forbids the WSGI application from returning
> the Connection header since it is hop-by-hop. And, even if there was such a
> mechanism, a poorly-coded client is likely to still cause a deadlock if the
> server doesn't read its full request. Make sure you test with all your
> targeted browsers.

Apache, and I would expect any sensible web server, always closes a
client connection when error responses are returned. Thus it will only
allow request pipelining so long as 200 response is returned. Okay, it
isn't this simple as Apache looks at lots of other things as well, but
close enough.

The WSGI specification may forbid returning Connection header, but if
you do do it with mod_wsgi, then Apache will note it and close the
connection even if 200 response is returned.
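
A sketch of an application that relies on that behaviour. The function
name too_large is hypothetical. Sending a Connection header is outside
what PEP 333 allows, so this is specific to servers that tolerate it, as
described above for Apache/mod_wsgi; a validating gateway may reject it.

```python
def too_large(environ, start_response):
    """Reject the request and ask the server to close the connection."""
    body = b"Request entity too large.\n"
    start_response("413 Request Entity Too Large", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
        # Hop-by-hop header: forbidden by PEP 333, but per the message
        # above Apache/mod_wsgi notes it and closes the connection.
        ("Connection", "close"),
    ])
    return [body]
```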

Graham


From brian at briansmith.org  Wed Nov 26 16:01:43 2008
From: brian at briansmith.org (Brian Smith)
Date: Wed, 26 Nov 2008 09:01:43 -0600
Subject: [Web-SIG] Implementing File Upload Size Limits
In-Reply-To: <88e286470811251459h6ed70717wbc22ba47009810d3@mail.gmail.com>
References: <49279DB5.9090109@rcs-comp.com>	
	<88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com>	
	<49284A17.2050801@rcs-comp.com>
	<006501c94f1f$ba54a620$2efdf260$@org>
	<88e286470811251459h6ed70717wbc22ba47009810d3@mail.gmail.com>
Message-ID: <00c301c94fd7$e59acc20$b0d06460$@org>

Graham Dumpleton wrote:
> 2008/11/26 Brian Smith <brian at briansmith.org>:
> > Under Apache CGI or mod_wsgi, in many situations you will get a
> > deadlock in this scenario. 
> 
> It isn't 'many situations', it is a quite specific situation.

Right. I meant that it can happen quite often (every time that situation
occurs), depending on the characteristics of the application.
 
> > If you know C, it is relatively simple to modify mod_wsgi to use a
> > different Apache<->daemon communication protocol 
> 
> Depends on your definition of simple. It would be quite fiddly to do
> and get right, or one would have to rewrite a large amount of code. I
> wouldn't regard either as really that simple.

I did it by implementing the communication protocol that I had proposed on
the mod_wsgi mailing list a while ago. It is straightforward to do, but it
does take a lot of time to learn how mod_wsgi works in order to make the
change, especially if you have never written an Apache module before.

- Brian


From fumanchu at aminus.org  Thu Nov 27 18:07:31 2008
From: fumanchu at aminus.org (Robert Brewer)
Date: Thu, 27 Nov 2008 09:07:31 -0800
Subject: [Web-SIG] Implementing File Upload Size Limits
In-Reply-To: <006501c94f1f$ba54a620$2efdf260$@org>
References: <49279DB5.9090109@rcs-comp.com>	<88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com><49284A17.2050801@rcs-comp.com>
	<006501c94f1f$ba54a620$2efdf260$@org>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6405C7759F@ex10.hostedexchange.local>

Brian Smith wrote:
> Randy Syring wrote:
> > Hopefully you can clarify something for me.  Lets assume that the
> > client does not use '100 Continue' but sends data immediately, after
> > sending the headers.  If the server never reads the request content,
> > what does that mean exactly?  Does the data get transferred over the
> > wire but then discarded or does the client not get to send the data
> > until the server reads the request body?  I.e. the client tries to
> > "send" it, but the content isn't actually transferred across the
> > wire until the server reads it.  I am just wondering if there
> > is a buffer or queue or something between the server and the client
> > that allows data to be transferred even if the server doesn't
> > "read" the request body.  Or, is it just like a straight pipe
> > where one end (the client) can't push data through until the other
> > end (the server) reads it.
> 
> Under Apache CGI or mod_wsgi, in many situations you will get a
> deadlock in this scenario. The input and the output are buffered
> separately both of those buffers can fill up. Neither mod_wsgi nor
> mod_cgid implement the non-blocking I/O logic needed to prevent
> deadlocks. I heard (but did not verify) that mod_fastcgi does not
> have this deadlocking problem. The sizes of the buffers determines
> the size of the inputs and outputs needed to cause a deadlock. On
> some platforms (e.g. Mac OS X), they are only 8K by default.
>
> Therefore, for maximum portability, a WSGI application should ALWAYS
> consume the *whole* request body if it wants to avoid the deadlock
> using the reference WSGI adapter in PEP 333 or mod_wsgi.

Indeed. This is covered in RFC 2616 Section 8.2.3:

    If an origin server receives a request that does not include an
    Expect request-header field with the "100-continue" expectation,
    the request includes a request body, and the server responds
    with a final status code before reading the entire request body
    from the transport connection, then the server SHOULD NOT close
    the transport connection until it has read the entire request,
    or until the client closes the connection. Otherwise, the client
    might not reliably receive the response message. However, this
    requirement is not be construed as preventing a server from
    defending itself against denial-of-service attacks, or from
    badly broken client implementations.

CherryPy's wsgiserver will read any remaining request body (which the
application hasn't read) before sending response headers.


Robert Brewer
fumanchu at aminus.org


From graham.dumpleton at gmail.com  Fri Nov 28 00:15:17 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Fri, 28 Nov 2008 10:15:17 +1100
Subject: [Web-SIG] Implementing File Upload Size Limits
In-Reply-To: <F1962646D3B64642B7C9A06068EE1E6405C7759F@ex10.hostedexchange.local>
References: <49279DB5.9090109@rcs-comp.com>
	<88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com>
	<49284A17.2050801@rcs-comp.com> <006501c94f1f$ba54a620$2efdf260$@org>
	<F1962646D3B64642B7C9A06068EE1E6405C7759F@ex10.hostedexchange.local>
Message-ID: <88e286470811271515l4e60ab3br3ae9fc3bf56588ac@mail.gmail.com>

2008/11/28 Robert Brewer <fumanchu at aminus.org>:
> Brian Smith wrote:
>> Randy Syring wrote:
>> > Hopefully you can clarify something for me.  Lets assume that the
>> > client does not use '100 Continue' but sends data immediately, after
>> > sending the headers.  If the server never reads the request content,
>> > what does that mean exactly?  Does the data get transferred over the
>> > wire but then discarded or does the client not get to send the data
>> > until the server reads the request body?  I.e. the client tries to
>> > "send" it, but the content isn't actually transferred across the
>> > wire until the server reads it.  I am just wondering if there
>> > is a buffer or queue or something between the server and the client
>> > that allows data to be transferred even if the server doesn't
>> > "read" the request body.  Or, is it just like a straight pipe
>> > where one end (the client) can't push data through until the other
>> > end (the server) reads it.
>>
>> Under Apache CGI or mod_wsgi, in many situations you will get a
>> deadlock in this scenario. The input and the output are buffered
>> separately both of those buffers can fill up. Neither mod_wsgi nor
>> mod_cgid implement the non-blocking I/O logic needed to prevent
>> deadlocks. I heard (but did not verify) that mod_fastcgi does not
>> have this deadlocking problem. The sizes of the buffers determines
>> the size of the inputs and outputs needed to cause a deadlock. On
>> some platforms (e.g. Mac OS X), they are only 8K by default.
>>
>> Therefore, for maximum portability, a WSGI application should ALWAYS
>> consume the *whole* request body if it wants to avoid the deadlock
>> using the reference WSGI adapter in PEP 333 or mod_wsgi.
>
> Indeed. This is covered in RFC 2616 Section 8.2.3:
>
>    If an origin server receives a request that does not include an
>    Expect request-header field with the "100-continue" expectation,
>    the request includes a request body, and the server responds
>    with a final status code before reading the entire request body
>    from the transport connection, then the server SHOULD NOT close
>    the transport connection until it has read the entire request,
>    or until the client closes the connection. Otherwise, the client
>    might not reliably receive the response message. However, this
>    requirement is not be construed as preventing a server from
>    defending itself against denial-of-service attacks, or from
>    badly broken client implementations.
>
> CherryPy's wsgiserver will read any remaining request body (which the
> application hasn't read) before sending response headers.

A WSGI application could technically want to send response headers and
only then read remaining request content. I don't believe there is
anything in the WSGI specification which prevents that. If you are
discarding the request content as soon as response headers are
generated, that could technically be a problem for some use cases,
even if they may be obscure.

I can't tell from looking at the latest CherryPy WSGI server code, as it
has been changed since I last looked at it and I haven't yet had time to
grok it and run some tests, but previously, in respect of where the WSGI
specification says:

"""The server is not required to read past the client's specified
Content-Length, and is allowed to simulate an end-of-file condition if
the application attempts to read past that point."""

the CherryPy WSGI server code chose NOT to simulate an end-of-file
condition. This was the case as the amount of data read from
wsgi.input was never tracked. This meant that if an application did try
to read more content than was available while request pipelining was
occurring, the read would hang, as it would not get an empty string
returned as would be normal for an end-of-file condition on a file-like
object.

If the code is still behaving this way, then it wouldn't be possible
for it to discard remaining input as how much was read wasn't tracked.
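
To illustrate the tracking being discussed, here is a minimal sketch of
a wsgi.input wrapper that counts bytes read and simulates end-of-file at
CONTENT_LENGTH. LimitedInput and discard_remaining are hypothetical
names; this is not CherryPy's implementation.

```python
class LimitedInput(object):
    """Wrap a stream so reads never go past the declared content length."""

    def __init__(self, stream, content_length):
        self.stream = stream
        self.remaining = content_length   # bytes of this request still unread

    def read(self, size=-1):
        if self.remaining <= 0:
            return b""                    # simulated end-of-file
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining         # never read into the next request
        data = self.stream.read(size)
        self.remaining -= len(data)
        return data

    def discard_remaining(self, chunk_size=8192):
        """Read and drop whatever the application left unread."""
        discarded = 0
        while self.remaining > 0:
            data = self.read(chunk_size)
            if not data:                  # underlying stream ended early
                break
            discarded += len(data)
        return discarded
```

Because the wrapper knows how much is left, a server can both return an
empty string at the end of the body and discard unread input afterwards
without touching a pipelined request that follows.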

Looking at latest code I do note the presence of a wrapper around
socket used for wsgi.input, but haven't been able to work out yet
whether it returns a traditional empty string as end-of-file
condition, or whether it is going to instead raise your
MaxSizeExceeded exception and thus not be file-like in its behaviour.

Can you perhaps explain what is going to happen when an attempt is
made to read more content than what was available and whether it is
actually going to raise an exception rather than just return an empty
string like file-like objects would?

Personally I think that that part of WSGI specification should be
amended such that it is required that an end-of-file condition MUST be
indicated using an empty string just like with normal file like
objects. Just this one change would mean that one could call read()
with no arguments and have it return all input, whereas at the moment
the WSGI specification does not allow the argument to read() to be
optional.

This would actually negate the whole need for applications to even
check/use CONTENT_LENGTH except for situations where it mattered such
as 413 response or where how it decided to process it was dependent on
size. That is, to get all request content you would just call read()
with no argument. If you wanted to process it in chunks, then it would
just loop reading a set chunk size until empty string returned and it
wouldn't need to track how much it read and short read the last chunk.
If applications worked this way then one could handle mutating input
filters that changed amount of request content, ie., decompression of
data, plus could handle chunked transfer encoding on request content
in a reasonable way without having to read it all in and buffer it
just to work out CONTENT_LENGTH.
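
A minimal sketch of the chunked loop described above. The name iter_body
is hypothetical. It assumes the server returns an empty string at
end-of-file, which is exactly the guarantee being proposed here; on a
server that does not simulate end-of-file the final read could block.

```python
def iter_body(stream, chunk_size=65536):
    """Yield chunks from a file-like object until it reports end-of-file."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:        # empty string signals end-of-file
            break
        yield chunk
```

An application would call it as iter_body(environ["wsgi.input"]) and
would not need to consult CONTENT_LENGTH at all.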

Up till now, the only major WSGI server (ignoring wsgiref perhaps) I
knew of which didn't allow read() with no argument or which didn't
simulate end-of-file through empty string being returned was CherryPy
WSGI server. Now its code has been changed, but not sure if it still
does that or whether it has done something totally different to
everything else by raising an exception instead.

Graham

From fumanchu at aminus.org  Fri Nov 28 06:58:25 2008
From: fumanchu at aminus.org (Robert Brewer)
Date: Thu, 27 Nov 2008 21:58:25 -0800
Subject: [Web-SIG] Implementing File Upload Size Limits
In-Reply-To: <88e286470811271515l4e60ab3br3ae9fc3bf56588ac@mail.gmail.com>
References: <49279DB5.9090109@rcs-comp.com>
	<88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com>
	<49284A17.2050801@rcs-comp.com>
	<006501c94f1f$ba54a620$2efdf260$@org>
	<F1962646D3B64642B7C9A06068EE1E6405C7759F@ex10.hostedexchange.local>
	<88e286470811271515l4e60ab3br3ae9fc3bf56588ac@mail.gmail.com>
Message-ID: <F1962646D3B64642B7C9A06068EE1E6405C7763E@ex10.hostedexchange.local>

Graham Dumpleton wrote:
> 2008/11/28 Robert Brewer <fumanchu at aminus.org>:
> > CherryPy's wsgiserver will read any remaining request body (which
> > the application hasn't read) before sending response headers.
> 
> A WSGI application could technically want to send response headers and
> only then read remaining request content. I don't believe there is
> anything in the WSGI specification which prevents that. If you are
> discarding the request content as soon as response headers are
> generated, that could technically be a problem for some use cases,
> even if they may be obscure.

I'll look into that further.

> I cant tell from looking at latest CherryPy WSGI server code as has
> been changed since last I looked at it and haven't yet had time to
> grok it and run some tests, but previously in respect of where WSGI
> specification says:
> 
> """The server is not required to read past the client's specified
> Content-Length, and is allowed to simulate an end-of-file condition if
> the application attempts to read past that point."""
> 
> the CherryPy WSGI server code chose NOT to simulate an end-of-file
> condition. This was the case as the amount of data read from
> wsgi.input was never tracked. This meant that if application did try
> and read more content than available and request pipelining occurring
> then the read would hang as would not get an empty string returned as
> would be normal for end-of-file condition for file like object.
> 
> If the code is still behaving this way, then it wouldn't be possible
> for it to discard remaining input as how much was read wasn't tracked.
> 
> Looking at latest code I do note the presence of a wrapper around
> socket used for wsgi.input, but haven't been able to work out yet
> whether it returns a traditional empty string as end-of-file
> condition, or whether it is going to instead raise your
> MaxSizeExceeded exception and thus not be file like in it behaviour.

It still raises MaxSizeExceeded.

> Can you perhaps explain what is going to happen when an attempt is
> made to read more content than what was available and whether it is
> actually going to raise an exception rather than just return an empty
> string like file like objects would.
> 
> Personally I think that that part of WSGI specification should be
> amended such that it is required that an end-of-file condition MUST be
> indicated using an empty string just like with normal file like
> objects. Just this one change would mean that one could call read()
> with no arguments and have it return all input, whereas at the moment
> WSGI specification does allow argument to read() be optional.
> 
> This would actually negate the whole need for applications to even
> check/use CONTENT_LENGTH except for situations where it mattered such
> as 413 response or where how it decided to process it was dependent on
> size. That is, to get all request content you would just call read()
> with no argument. If you wanted to process it in chunks, then it would
> just loop reading a set chunk size until empty string returned and it
> wouldn't need to track how much it read and short read the last chunk.
> If applications worked this way then one could handle mutating input
> filters that changed amount of request content, ie., decompression of
> data, plus could handle chunked transfer encoding on request content
> in a reasonable way without having to read it all in and buffer it
> just to work out CONTENT_LENGTH.
> 
> Up till now, the only major WSGI server (ignoring wsgiref perhaps) I
> knew of which didn't allow read() with no argument or which didn't
> simulate end-of-file through empty string being returned was CherryPy
> WSGI server. Now its code has been changed, but not sure if it still
> does that or whether it has done something totally different to
> everything else by raising an exception instead.

I'd be open to changing it to EOF instead of error; amending the WSGI
spec would be nice too.
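
For illustration only, here is a minimal sketch of the end-of-file
behaviour discussed above. It is not the CherryPy code; the names
LimitedInput and read_in_chunks are hypothetical, and it assumes a
modern Python where wsgi.input yields bytes. The wrapper stops at the
declared content length and returns an empty string instead of raising,
so an application can loop on read() without tracking CONTENT_LENGTH.

```python
import io


class LimitedInput:
    """Hypothetical wrapper: file-like object that signals EOF with b''."""

    def __init__(self, stream, content_length):
        self.stream = stream
        self.remaining = content_length

    def read(self, size=-1):
        # Past the declared length: behave like a normal file at EOF.
        if self.remaining <= 0:
            return b""
        # read() with no argument (or an oversized one) is capped at
        # what is left of the request body.
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining
        data = self.stream.read(size)
        self.remaining -= len(data)
        if not data:
            # Underlying stream ended early; stay at EOF from now on.
            self.remaining = 0
        return data


def read_in_chunks(wsgi_input, chunk_size=8192):
    """Yield body chunks until the stream returns an empty string."""
    while True:
        chunk = wsgi_input.read(chunk_size)
        if not chunk:
            break
        yield chunk


# A 20000-byte body followed by bytes that belong to the next request.
body = LimitedInput(io.BytesIO(b"x" * 20000 + b"next request"), 20000)
chunks = list(read_in_chunks(body))
```

The loop never looks at CONTENT_LENGTH itself, which is the property
that would also let input filters or chunked request bodies work.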


Robert Brewer
fumanchu at aminus.org


From luca.tebaldi at unife.it  Fri Nov 28 17:18:51 2008
From: luca.tebaldi at unife.it (Luca Tebaldi)
Date: Fri, 28 Nov 2008 17:18:51 +0100
Subject: [Web-SIG] web services ssl client
Message-ID: <47eb0bab0811280818y376bb769pc3528297ff46b0be@mail.gmail.com>

Hi,
I need to build a client for web services that require authentication based
on a CA certificate (pem and crt). I'm trying to use SOAPpy, but it doesn't
work. Does anyone have any ideas, or can you tell me where to find a tutorial?

Thanks a lot!

Luca

-- 
skype:luca.tebaldi
bookmark: http://del.icio.us/lucatebaldi
foto: http://www.flickr.com/photos/teba/tags/
linkedin: http://www.linkedin.com/in/lucatebaldi

From graham.dumpleton at gmail.com  Fri Nov 28 23:28:33 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Sat, 29 Nov 2008 09:28:33 +1100
Subject: [Web-SIG] web services ssl client
In-Reply-To: <47eb0bab0811280818y376bb769pc3528297ff46b0be@mail.gmail.com>
References: <47eb0bab0811280818y376bb769pc3528297ff46b0be@mail.gmail.com>
Message-ID: <88e286470811281428l5b363e3he2a37662efb3e202@mail.gmail.com>

2008/11/29 Luca Tebaldi <luca.tebaldi at unife.it>:
> Hi,
> I need to build a client for web services that require authentication based
> on a CA certificate (pem and crt). I'm trying to use SOAPpy, but it doesn't
> work. Does anyone have any ideas, or can you tell me where to find a tutorial?

A more appropriate forum for questions about Python and SOAP services is:

  http://groups.google.com/group/pywebsvcs?lnk=

Graham