From davidgshi at yahoo.co.uk Tue Nov 4 12:31:49 2008
From: davidgshi at yahoo.co.uk (David Shi)
Date: Tue, 4 Nov 2008 11:31:49 +0000 (GMT)
Subject: [Web-SIG] Seeking advice on user, session and folder management
Message-ID: <137261.95767.qm@web26305.mail.ukl.yahoo.com>

I am looking for a Python script to do the following.

Can anyone point me in the right direction for implementing automatic
registration and authentication, similar to most modern web services?

I wish to obtain a similar script to customise and develop further, to
add automatic allocation of folders based on the log-in username,
automatic setting of permissions on these folders, and a time limit
(say 5 days) after which the content of a folder is flushed out and the
folder is deleted if nobody has accessed it.

Sincerely,

David

From davidgshi at yahoo.co.uk Wed Nov 5 16:01:50 2008
From: davidgshi at yahoo.co.uk (David Shi)
Date: Wed, 5 Nov 2008 15:01:50 +0000 (GMT)
Subject: [Web-SIG] Looking for Python script to upload large data files over the internet
Message-ID: <21746.66756.qm@web26304.mail.ukl.yahoo.com>

Can anyone help?

I am looking for excellent Python scripts to upload large data files
over the internet.

Regards.

David

From plynch1976 at hotmail.com Wed Nov 5 20:49:19 2008
From: plynch1976 at hotmail.com (Pat Lynch)
Date: Wed, 5 Nov 2008 19:49:19 +0000
Subject: [Web-SIG] ZSI client to .NET server problem?
Message-ID:

hey all,

I've created a webservice client using ZSI (-l -b -u options). The
function that I'm trying to access on the webserver takes one param (a
complex type). I had to use the -l option because the type actually has
a member var which is of the same type (so I used to get the recursive
error otherwise).

Anyway, when I call the .NET webservice, the param is received as NULL??
any ideas on what could be causing this??

I turned on debug on the client side & the xml seems to be well-formed
(the only difference I can see between my xml & xml sent from a sample
.net client is that the namespace is part of the element instead of the
parent)..

I tried the approach mentioned here
(http://article.gmane.org/gmane.comp.python.pywebsvcs.general/2211),
but no improvement...

I've been tearing my hair out since Monday on this, so any help would
be appreciated :)

thanks a million.

regards,
Pat.

From davidgshi at yahoo.co.uk Thu Nov 6 18:53:32 2008
From: davidgshi at yahoo.co.uk (David Shi)
Date: Thu, 6 Nov 2008 17:53:32 +0000 (GMT)
Subject: [Web-SIG] Looking for a nitty-gritty Python Ajax middleware script to fire off a number of processors
Message-ID: <897945.80434.qm@web26305.mail.ukl.yahoo.com>

Dear All,

I am looking for a nitty-gritty Python Ajax script that fires off a
number of processing programmes, periodically checks their progress,
and sends messages back to an HTML div, including links to the
generated data files for end users to download.

I am using .NET, IIS 6.0 and Windows Server.

Regards.

David

From davidgshi at yahoo.co.uk Tue Nov 11 21:29:11 2008
From: davidgshi at yahoo.co.uk (David Shi)
Date: Tue, 11 Nov 2008 20:29:11 +0000 (GMT)
Subject: [Web-SIG] Has anyone tried calling zip.py in feedback.py and print out an innerHTML to provide a download link?
Message-ID: <408436.82336.qm@web26307.mail.ukl.yahoo.com>

Hello, there.

Has anyone tried calling zip.py in feedback.py and printing out
innerHTML to provide a download link?

I find it difficult to make it work.

Sincerely,
David

#**********************************************************************
# Description:
#    Zips the contents of a folder.
# Parameters:
#   0 - Input folder.
#   1 - Output zip file. It is assumed that the user added the .zip
#       extension.
#**********************************************************************

# Import modules and create the geoprocessor
#
import sys, zipfile, arcgisscripting, os, traceback
gp = arcgisscripting.create()

# Function for zipping files. If keep is true, the folder, along with
# all its contents, will be written to the zip file. If false, only
# the contents of the input folder will be written to the zip file -
# the input folder name will not appear in the zip file.
#
def zipws(path, zip, keep):
    path = os.path.normpath(path)
    # os.walk visits every subdirectory, returning a 3-tuple
    # of directory name, subdirectories in it, and filenames
    # in it.
    #
    for (dirpath, dirnames, filenames) in os.walk(path):
        # Iterate over every filename
        #
        for file in filenames:
            # Ignore .lock files
            #
            if not file.endswith('.lock'):
                gp.AddMessage("Adding %s..." % os.path.join(path, dirpath, file))
                try:
                    if keep:
                        zip.write(os.path.join(dirpath, file),
                                  os.path.join(os.path.basename(path),
                                               os.path.join(dirpath, file)[len(path)+len(os.sep):]))
                    else:
                        zip.write(os.path.join(dirpath, file),
                                  os.path.join(dirpath[len(path):], file))
                except Exception, e:
                    gp.AddWarning(" Error adding %s: %s" % (file, e))

    return None

if __name__ == '__main__':
    try:
        # Get the tool parameter values
        #
        infolder = gp.GetParameterAsText(0)
        outfile = gp.GetParameterAsText(1)

        # Create the zip file for writing compressed data. In some rare
        # instances, the ZIP_DEFLATED constant may be unavailable and
        # the ZIP_STORED constant is used instead. When ZIP_STORED is
        # used, the zip file does not contain compressed data, resulting
        # in large zip files.
        #
        try:
            zip = zipfile.ZipFile(outfile, 'w', zipfile.ZIP_DEFLATED)
            zipws(infolder, zip, True)
            zip.close()
        except RuntimeError:
            # Delete zip file if exists
            #
            if os.path.exists(outfile):
                os.unlink(outfile)
            zip = zipfile.ZipFile(outfile, 'w', zipfile.ZIP_STORED)
            zipws(infolder, zip, True)
            zip.close()
            gp.AddWarning(" Unable to compress zip file contents.")

        gp.AddMessage("Zip file created successfully")

    except:
        # Return any python specific errors as well as any errors from
        # the geoprocessor
        #
        tb = sys.exc_info()[2]
        tbinfo = traceback.format_tb(tb)[0]
        pymsg = "PYTHON ERRORS:\nTraceback Info:\n" + tbinfo + "\nError Info:\n " + str(sys.exc_type)+ ": " + str(sys.exc_value) + "\n"
        gp.AddError(pymsg)

        msgs = "GP ERRORS:\n" + gp.GetMessages(2) + "\n"
        gp.AddError(msgs)

From and-py at doxdesk.com Wed Nov 12 20:22:38 2008
From: and-py at doxdesk.com (Andrew Clover)
Date: Wed, 12 Nov 2008 20:22:38 +0100
Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets
Message-ID: <491B2CFE.7060502@doxdesk.com>

It would be lovely if we could allow WSGI applications to reliably
accept Unicode paths.

That is to say, allow WSGI apps to have beautiful URLs like Wikipedia's,
without requiring URL-rewriting magic. (Which is so highly
server-specific, potentially unavailable to non-admin webmasters, and
makes WSGI app deployment more difficult than it already is.)


If we could reliably read the bytes the browser sends to us in the GET
request that would be great, we could just decode those and be done with
it. Unfortunately, that's not reliable, because:

1. thanks to an old wart in the CGI specification, %XX hex escapes are
decoded before the character is put into the PATH_INFO environment
variable;

2. the environment variables may be stored as Unicode.

(1) on its own gives us the problem of not being able to distinguish a
path-separator slash from an encoded %2F; a long-known problem but not
one that greatly affects most people.

But combined with (2) that means some other component must choose how to
decode the bytes into Unicode characters. No standard currently
specifies what encoding to use, it is not typically configurable, and
it's certainly not within reach of the WSGI application. My assumption
is that most applications will want to end up with UTF-8-encoded URLs;
other choices are certainly possible but as we move towards IRI they
become less likely.


This situation previously affected only Windows users, because NT
environment variables are native Unicode. However, Python 3.0 specifies
all environment variable access is through a Unicode wrapper, and gives
no way to control how that automatic decoding is done, leaving everyone
in the same boat.

WSGI Amendments_1.0 includes a suggestion for Python 3.0 that environ
should be "decoded from the headers using HTTP standard encodings (i.e.
latin-1 + RFC 2047)", but unfortunately this doesn't quite work:

1. for many existing environments the decoding-from-headers charset is
out of reach of the WSGI server/layer and may well not be ISO-8859-1.
Even wsgiref doesn't currently use 8859-1 (see below).

2. RFC2047 is not applicable to HTTP headers, which are not really
822-family headers even though they look just like them. The sub-headers
in eg. a multipart/form-data chunk *are* (probably) proper 822 headers
so RFC2047 could apply, but those headers are already dealt with by the
application or framework, not WSGI. HTTP 1.1 (RFC2616) does refer to
RFC2047 as an encoding mechanism for TEXT and quoted-string, but this
makes no sense as 2047 itself requires embedding in atom-based parsing
sequences which those productions are not (quoted-strings are explicitly
disallowed by 2047 itself).
In any case no existing browser attempts to support RFC2047 encoding
rules for any possible interpretation of what 2616 might mean.


Something like Luís Bruno's ORIGINAL_PATH_INFO proposal
(http://mail.python.org/pipermail/web-sig/2008-January/003124.html)
would be worth looking at for this IMO. It may be of questionable
usefulness if the only character affected is the slash, but it also
happens to solve the Unicode problem. Obviously whatever it was called
it would have to be an optional additional value in the WSGI environ, as
pure CGI servers wouldn't be able to supply it. Conceivably it might
also be possible to have a standardised mod_rewrite rule to make the
variable also available to Apache CGI scripts, but still this is far
from global availability.

In the meantime I've been looking at how various combinations of servers
deal with this issue, and in what circumstances an application or
middleware can safely recover all possible Unicode input. 'Apache'
refers to the (AFAICT-identical) behaviour of both mod_cgi and mod_wsgi;
'IIS' refers to IIS with CGI.


*** Apache/Posix/Python2
OK.

No problem here, it's byte-based all the way through.


*** Apache/Posix/Python3:
Dependent on the default encoding.

Apache puts bytes into the envvars but Python takes them out as unicode.
If the system default encoding happens to be the same as the encoding
the WSGI application wanted we will be OK. Normally the app will want
UTF-8; many Linux distributions do use UTF-8 as the default system
encoding but there are plenty of distros (eg. Debian) and other Unixen
that do not. In any case we are getting a nasty system dependency at
deploy time that many webmasters will not be able to resolve.

It is sometimes possible to recover mangled characters despite the wrong
decoding having been applied.
For example if the system encoding was ISO-8859-1 or another encoding
that maps every byte to a unique Unicode character, we can encode the
Unicode string back to its original bytes, and thence apply the decoding
we actually wanted! If, on the other hand, it's something like
ISO-8859-4, where not all high bytes are mapped at all, we'll be losing
random characters... not good.


*** Apache/NT/Python2
Always unrecoverable data loss.

Apache on Windows always uses ISO-8859-1 to decode the request path and
put it in the Unicode envvars. This is OK so far, we have Unicode
characters with the same codepoints as the original bytes. However,
Python2 needs to make the envvars available as bytes. It uses the system
default encoding; if that were ISO-8859-1, we'd be OK.

But it never is. Western European on NT is actually cp1252, whose
characters in the range 0x80 to 0x9F differ from ISO-8859-1. And if the
app wants UTF-8, chances are those characters are going to come up a
lot. There is as far as I know no user-selectable Windows codepage that
can map all the Unicode characters up to U+00FF.


*** Apache/NT/Python3
Wrong, but always recoverable.

Python retrieves the bytes-encoded-into-Unicode-codepoints string
directly from the envvars. If the encoding should have been UTF-8 or
something else other than ISO-8859-1, we can recover the original bytes
by re-encoding to 8859-1, then decoding using the real charset.


*** IIS/NT/Python2
Mostly unrecoverable data loss.

IIS decodes submitted bytes to Unicode using UTF-8 when it can. But if
there is an invalid UTF-8 sequence in the bytes it will try again using
the system codepage. Python will then re-encode the Unicode envvar using
the system codepage.

If the app is expecting UTF-8 we can decode what Python gives us using
the system codepage (ie. 'mbcs') and get back any of the submitted
characters that happened to be in this server's system codepage.
Other characters may be replaced by question marks or Windows's best
attempts to give us something useful, which at best may be a character
shorn of diacriticals and at worst something just completely wrong.

NT's system codepage is never UTF-8, it is not a user-selectable option
never mind the default. We can improve our chances of getting more
characters through by using a character set with a wide repertoire, such
as cp932 (Shift-JIS). But it's still not really proper Unicode support.

If the app is expecting something non-UTF-8 there's not much hope. Even
if it wanted the same character set as the system codepage, it can't be
sure that the submitted bytes didn't happen to also be a valid UTF-8
sequence, and thus get mangled by IIS decoding them that way.


*** IIS/NT/Python3
OK, as long as the app wants UTF-8.

Incoming UTF-8 bytes are reliably converted to Unicode strings by IIS,
and directly read by Python from the envvars.

If the application didn't want UTF-8 the situation is about as hopeless
as with Python2.


*** wsgiref.simple_server/(any)/Python2
OK.

Bytes all the way through.


*** wsgiref.simple_server/(any)/Python3:
Probably will be OK, as long as the app wants UTF-8.

simple_server is currently broken in rc2. However judging by the code,
it is using urllib.parse.unquote, which assumes UTF-8, so it'll be fine
for apps that want UTF-8 and hopeless for those that don't.


I'd be very interested to hear what other servers are doing in this
situation - nginx? cherrypy's one? - and wonder if any particular
behaviour should be 'blessed'.
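
The recoverability verdicts in the survey above hinge on whether the
decoding applied by the server maps every byte to a distinct character.
As a rough illustration only, the Python 3 sketch below checks that
property for a codec. The helper name round_trips is hypothetical, and
it exercises Python's own codec tables, which are an assumption here and
may not match the platform code pages described above in every detail.

```python
def round_trips(codec):
    """Return True if every byte value survives decode -> encode under codec."""
    for value in range(256):
        raw = bytes([value])
        try:
            if raw.decode(codec).encode(codec) != raw:
                return False
        except UnicodeError:
            # The codec has no mapping for this byte, so data that passed
            # through it cannot be turned back into the original bytes.
            return False
    return True


# ISO-8859-1 maps each byte to the code point with the same number.
print(round_trips("iso-8859-1"))
# Python's cp1252 table leaves a few bytes in 0x80-0x9F undefined.
print(round_trips("cp1252"))
```

A codec for which this returns True leaves the original bytes
recoverable after a wrong decode; one for which it returns False can
lose data.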

--
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/

From ianb at colorstudy.com Thu Nov 13 00:24:54 2008
From: ianb at colorstudy.com (Ian Bicking)
Date: Wed, 12 Nov 2008 17:24:54 -0600
Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets
In-Reply-To: <491B2CFE.7060502@doxdesk.com>
References: <491B2CFE.7060502@doxdesk.com>
Message-ID: <491B65C6.3020206@colorstudy.com>

Andrew Clover wrote:
> If we could reliably read the bytes the browser sends to us in the GET
> request that would be great, we could just decode those and be done with
> it. Unfortunately, that's not reliable, because:
>
> 1. thanks to an old wart in the CGI specification, %XX hex escapes are
> decoded before the character is put into the PATH_INFO environment
> variable;

I don't see a problem with this? At least not a problem with respect to
encoding. As it is (in Python 2), you should do something like
environ['PATH_INFO'].decode('utf8') and it should work. It doesn't seem
like there's any distinction between %-encoded characters and plain
characters in this situation.

> 2. the environment variables may be stored as Unicode.
>
> (1) on its own gives us the problem of not being able to distinguish a
> path-separator slash from an encoded %2F; a long-known problem but not
> one that greatly affects most people.
>
> But combined with (2) that means some other component must choose how to
> decode the bytes into Unicode characters. No standard currently
> specifies what encoding to use, it is not typically configurable, and
> it's certainly not within reach of the WSGI application. My assumption
> is that most applications will want to end up with UTF-8-encoded URLs;
> other choices are certainly possible but as we move towards IRI they
> become less likely.
>
>
> This situation previously affected only Windows users, because NT
> environment variables are native Unicode.
> However, Python 3.0 specifies all environment variable access is
> through a Unicode wrapper, and gives no way to control how that
> automatic decoding is done, leaving everyone in the same boat.
>
> WSGI Amendments_1.0 includes a suggestion for Python 3.0 that environ
> should be "decoded from the headers using HTTP standard encodings (i.e.
> latin-1 + RFC 2047)", but unfortunately this doesn't quite work:

My understanding of this suggestion is that latin-1 is a way of
representing bytes as unicode. In other words, the values will be
unicode, but that will simply be a lie. So if you know you have UTF8
paths, you'd do:

    path_info = environ['PATH_INFO'].encode('latin-1').decode('utf8')

As far as I can tell this is simply to avoid having bytes in the
environment, even though bytes are an accurate representation and
unicode is not.

A lot of what you write about has to do with CGI, which is the only
place WSGI interacts with os.environ. CGI is really an aspect of the CGI
to WSGI adapter (like wsgiref.handlers.CGIHandler), and not the WSGI
spec itself.

Personally I'm more inclined to set up a policy on the WSGI server
itself with respect to the encoding, and then use real unicode
characters. Unfortunately that's not as flexible as bytes, as it doesn't
make it very easy to sniff out the encoding in application-specific
ways, or support different encodings in different parts of the server
(which would be useful if, for instance, you were to proxy applications
with unknown encodings). So... maybe that's not the most feasible
option. But if it's not, then I'd rather stick with bytes.
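
The latin-1 round trip described in this message can be wrapped in a
small helper. This is only a sketch under the assumption that the server
really did expose the raw request bytes as latin-1 text; the function
name path_info_as and the sample environ are made up for illustration.

```python
def path_info_as(environ, charset="utf-8"):
    """Re-decode PATH_INFO, assuming the server stored raw bytes as latin-1."""
    raw = environ.get("PATH_INFO", "").encode("latin-1")
    return raw.decode(charset)


# Hypothetical environ: the two UTF-8 bytes of U+00E9 (0xC3 0xA9) show up
# as two separate latin-1 characters until they are re-decoded.
environ = {"PATH_INFO": "/caf\u00c3\u00a9"}
print(path_info_as(environ) == "/caf\u00e9")
```

If the server used some other decoding, the encode step can raise
UnicodeEncodeError or silently produce the wrong bytes, which is the
portability concern raised earlier in the thread.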

--
Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org

From graham.dumpleton at gmail.com Thu Nov 13 00:44:53 2008
From: graham.dumpleton at gmail.com (Graham Dumpleton)
Date: Thu, 13 Nov 2008 10:44:53 +1100
Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets
In-Reply-To: <491B2CFE.7060502@doxdesk.com>
References: <491B2CFE.7060502@doxdesk.com>
Message-ID: <88e286470811121544ue9c46a4l77e4e011acece623@mail.gmail.com>

FWIW, there was a past discussion on these issues on mod_wsgi list. I
can't really remember what the outcome of the discussion was. The
discussion is at:

http://groups.google.com/group/modwsgi/browse_frm/thread/2471a1a71620629f

Graham

2008/11/13 Andrew Clover :
> It would be lovely if we could allow WSGI applications to reliably accept
> Unicode paths.
> [snip]

From davidgshi at yahoo.co.uk Thu Nov 13 12:40:10 2008
From: davidgshi at yahoo.co.uk (David Shi)
Date: Thu, 13 Nov 2008 11:40:10 +0000 (GMT)
Subject: [Web-SIG] Looking for a Python Ajax Middleware script
In-Reply-To:
Message-ID: <508252.63885.qm@web26305.mail.ukl.yahoo.com>

Dear Benji York,

Thank you very much for letting me to know this.

I am not a programmer but has a demonstration project to complete. How
can I easily to follow instructions to implement and test this?

Does it work with Ajax?

I am using Windows Server and IIS.
I do not have the facility to un-gzip it.

Regards.

David

--- On Fri, 31/10/08, Benji York wrote:

From: Benji York
Subject: Re: [Web-SIG] Looking for a Python Ajax Middleware script
To: davidgshi at yahoo.co.uk
Cc: web-sig at python.org
Date: Friday, 31 October, 2008, 1:13 PM

2008/10/31 David Shi :
>
> Has anyone tried the following with Python?
[snip]

It sounds like you could use zc.async:
http://pypi.python.org/pypi/zc.async/

From the above page:

The zc.async package provides an easy-to-use Python tool that
schedules work persistently and reliably across multiple processes
and machines.
--
Benji York

From benji at benjiyork.com Thu Nov 13 14:26:10 2008
From: benji at benjiyork.com (Benji York)
Date: Thu, 13 Nov 2008 08:26:10 -0500
Subject: [Web-SIG] Looking for a Python Ajax Middleware script
In-Reply-To: <508252.63885.qm@web26305.mail.ukl.yahoo.com>
References: <508252.63885.qm@web26305.mail.ukl.yahoo.com>
Message-ID:

On Thu, Nov 13, 2008 at 6:40 AM, David Shi wrote:
> Dear Benji York,
>
> Thank you very much for letting me to know this.
>
> I am not a programmer but has a demonstration project to complete. How can
> I easily to follow instructions to implement and test this?

I doubt it; zc.async is well documented, but it is only a tool.
Therefore you can use it to accomplish your goal, but you would have to
do a non-trivial amount of programming to address your particular need.

> Does it work with Ajax?

That question doesn't really apply.
--
Benji York
Senior Software Engineer
Zope Corporation

From and-py at doxdesk.com Fri Nov 14 18:14:08 2008
From: and-py at doxdesk.com (Andrew Clover)
Date: Fri, 14 Nov 2008 18:14:08 +0100
Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets
In-Reply-To: <491B65C6.3020206@colorstudy.com>
References: <491B2CFE.7060502@doxdesk.com> <491B65C6.3020206@colorstudy.com>
Message-ID: <491DB1E0.3070701@doxdesk.com>

Ian Bicking wrote:

> As it is (in Python 2), you should do something like
> environ['PATH_INFO'].decode('utf8') and it should work.

See the test cases in my original post: this doesn't work universally.
On WinNT platforms PATH_INFO has already gone through a decode/encode
cycle which almost always irretrievably mangles the value.

> My understanding of this suggestion is that latin-1 is a way of
> representing bytes as unicode. In other words, the values will be
> unicode, but that will simply be a lie.

Yes, that would be a sensible approach, but it is not what is actually
happening in any WSGI environment I have tested. For example
wsgiref.simple_server decodes using UTF-8 not 8859-1 - or would do, if
it were working. (It is currently broken in 3.0rc2; I put a hack in to
get it running but I'm not really sure what the current status of
simple_server in 3.0 is.)

> A lot of what you write about has to do with CGI, which is the only
> place WSGI interacts with os.environ. CGI is really an aspect of the
> CGI to WSGI adapter (like wsgiref.handlers.CGIHandler), and not the WSGI
> spec itself.

Indeed, but we naturally have to take into account implementability on
CGI. If a WSGI spec *requires* PATH_INFO to have been obtained using
8859-1 decoding - or UTF-8, which is the other sensible option given
that most URIs today are UTF-8 - then there cannot be a fully-compliant
CGI-to-WSGI wrapper. Perhaps it's not the big issue it was when WSGI was
first getting off the ground, but IMO it's still important.

> Personally I'm more inclined to set up a policy on the WSGI server
> itself with respect to the encoding, and then use real unicode
> characters.

I think we are stuck with Unicode environ at this point, given the CGI
issue. But applications do need to know about the encoding in use,
because they will (typically) be generating their own links. So an
optional way to get that information to the application would be
advantageous.

I'm now of the opinion that the best way to do this is to standardise
Apache's 'REQUEST_URI' as an optional environ item. This header is
pre-URI-decoding, containing only %-sequences and not real high bytes,
so it can be decoded to Unicode using any old charset without worry.

An application wanting to support Unicode URIs (or encoded slashes in
URIs*) could then sniff for REQUEST_URI and use it in preference to
PATH_INFO where available.

This is a bit more work for the application, but it should generally be
handled transparently by a library/framework and supporting PATH_INFO in
a portable fashion already has warts thanks to IIS's bugs, so the
situation is not much worse than it already is.

And of course we get support through mod_cgi and mod_wsgi automatically,
so Graham doesn't have to do anything. :-)

Graham Dumpleton wrote:

> I can't really remember what the outcome of the discussion was.

Not too much outcome really, unfortunately! You concluded:

> there possibly still is an open question there on how
> encoding of non ascii characters works in practice. We just need to
> do some actual tests to see what happens and whether there is a problem.

...to which the answer is - judging by the results posted - probably
'yes', I'm afraid!
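
The "sniff for REQUEST_URI, fall back to PATH_INFO" idea above might
look roughly like the Python 3 sketch below. The helper name
request_segments is hypothetical, and the sketch assumes REQUEST_URI
holds an origin-form path with an optional query string and that
PATH_INFO, when used as the fallback, carries raw bytes exposed as
latin-1 text. It makes no attempt to separate SCRIPT_NAME from
PATH_INFO.

```python
from urllib.parse import unquote


def request_segments(environ, charset="utf-8"):
    """Return decoded path segments, preferring REQUEST_URI over PATH_INFO."""
    raw = environ.get("REQUEST_URI")
    if raw is None:
        # Fallback: PATH_INFO is already %-decoded, so an encoded slash can
        # no longer be told apart from a path separator.
        path = environ.get("PATH_INFO", "").encode("latin-1").decode(charset)
        return path.split("/")
    path = raw.split("?", 1)[0]  # drop the query string, if any
    # Split on literal slashes first, then %-decode each segment, so that
    # %2F stays inside its segment.
    return [unquote(part, encoding=charset) for part in path.split("/")]


print(request_segments({"REQUEST_URI": "/wiki/caf%C3%A9/a%2Fb?x=1"}))
```

Splitting before unquoting is what keeps an encoded %2F distinct from a
real separator, which the PATH_INFO fallback cannot do.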
-- And Clover mailto:and at doxdesk.com http://www.doxdesk.com/ From ianb at colorstudy.com Fri Nov 14 18:47:50 2008 From: ianb at colorstudy.com (Ian Bicking) Date: Fri, 14 Nov 2008 11:47:50 -0600 Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets In-Reply-To: <491DB1E0.3070701@doxdesk.com> References: <491B2CFE.7060502@doxdesk.com> <491B65C6.3020206@colorstudy.com> <491DB1E0.3070701@doxdesk.com> Message-ID: <491DB9C6.1070909@colorstudy.com> Andrew Clover wrote: > Ian Bicking wrote: > >> As it is (in Python 2), you should do something like >> environ['PATH_INFO'].decode('utf8') and it should work. > > See the test cases in my original post: this doesn't work universally. > On WinNT platforms PATH_INFO has already gone through a decode/encode > cycle which almost always irretrievably mangles the value. This is something messed up with CGI on NT, and whatever server you are using, and perhaps the CGI adapter (maybe there's a way to get the raw environment without any encoding, for example?) -- it's mostly irrelevant to WSGI itself. >> My understanding of this suggestion is that latin-1 is a way of >> representing bytes as unicode. In other words, the values will be >> unicode, but that will simply be a lie. > > Yes, that would be a sensible approach, but it is not what is actually > happening in any WSGI environment I have tested. For example > wsgiref.simple_server decodes using UTF-8 not 8859-1 -- or would do, if > it were working. (It is currently broken in 3.0rc2; I put a hack in to > get it running but I'm not really sure what the current status of > simple_server in 3.0 is.) As far as I know, PJE just made the suggestion about Latin-1, I don't know if anything has actually been done in wsgiref or elsewhere to implement that. Honestly I don't know if anyone is doing anything with WSGI and Python 3. >> A lot of what you write about has to do with CGI, which is the only >> place WSGI interacts with os.environ.
CGI is really an aspect of the >> CGI to WSGI adapter (like wsgiref.handlers.CGIHandler), and not the >> WSGI spec itself. > > Indeed, but we naturally have to take into account implementability on > CGI. If a WSGI spec *requires* PATH_INFO to have been obtained using > 8859-1 decoding -- or UTF-8, which is the other sensible option given > that most URIs today are UTF-8 -- then there cannot be a fully-compliant > CGI-to-WSGI wrapper. Perhaps it's not the big issue it was when WSGI was > first getting off the ground, but IMO it's still important. This will presumably require hacks that might be system-dependent. Probably the current CGI adapter will just have to be a bit more complicated. Also, if Python is utf8-decoding the environment, we'll just have to shortcut that entirely, as you can't just undo utf8. I assume there is some way to get at the bytes in the environment, if not then that is a Python 3 bug. >> Personally I'm more inclined to set up a policy on the WSGI server >> itself with respect to the encoding, and then use real unicode >> characters. > > I think we are stuck with Unicode environ at this point, given the CGI > issue. But applications do need to know about the encoding in use, > because they will (typically) be generating their own links. So an > optional way to get that information to the application would be > advantageous. The encoding of the operating system (which presumably informs the encoding of os.environ) has nothing to do with the encoding of the web application. For the CGI adapter we simply need to find a way to ignore the system encoding. > I'm now of the opinion that the best way to do this is to standardise > Apache's "REQUEST_URI" as an optional environ item. This header is > pre-URI-decoding, containing only %-sequences and not real high bytes, > so it can be decoded to Unicode using any old charset without worry. Unfortunately REQUEST_URI doesn't map directly to SCRIPT_NAME/PATH_INFO.
I think it might be feasible to support an encoded version of SCRIPT_NAME and PATH_INFO for WSGI 2.0 (creating entirely new key names, and I don't know of any particular standard to base those names on), moving from the two keys to a single REQUEST_URI is not feasible. It's not that trivial to figure out where in REQUEST_URI the SCRIPT_NAME/PATH_INFO boundary really is, as there's many ways the unencoded values could be encoded. I guess you'd probably count segments, try to catch %2f (where the segments won't match up), and then double check that the decoded REQUEST_URI matches SCRIPT_NAME+PATH_INFO. > An application wanting to support Unicode URIs (or encoded slashes in > URIs*) could then sniff for REQUEST_URI and use it in preference to > PATH_INFO where available. This is a bit more work for the application, > but it should generally be handled transparently by a library/framework > and supporting PATH_INFO in a portable fashion already has warts thanks > to IIS's bugs, so the situation is not much worse than it already is. I use the distinction between SCRIPT_NAME and PATH_INFO extensively. And frankly IIS is probably less relevant to most developers than CGI. Anyway, any of these bugs are things that need to be fixed in the WSGI adapter, we must not let them propagate into the specification or applications. So if IIS has problems with PATH_INFO, the WSGI adapter (be it CGI or otherwise) should be configured to fix those problems up front. > And of course we get support through mod_cgi and mod_wsgi automatically, > so Graham doesn't have to do anything. :-) > > Graham Dumpleton wrote: > >> I can't really remember what the outcome of the discussion was. > > Not too much outcome really, unfortunately! You concluded: > >> there possibly still is an open question there on how >> encoding of non ascii characters works in practice. We just need to >> do some actual tests to see what happens and whether there is a problem. 
> > ...to which the answer is -- judging by the results posted -- probably > "yes", I'm afraid! -- Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org From and-py at doxdesk.com Fri Nov 14 22:23:35 2008 From: and-py at doxdesk.com (Andrew Clover) Date: Fri, 14 Nov 2008 22:23:35 +0100 Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets In-Reply-To: <491DB9C6.1070909@colorstudy.com> References: <491B2CFE.7060502@doxdesk.com> <491B65C6.3020206@colorstudy.com> <491DB1E0.3070701@doxdesk.com> <491DB9C6.1070909@colorstudy.com> Message-ID: <491DEC57.6080402@doxdesk.com> Ian Bicking wrote: > This is something messed up with CGI on NT, and whatever server you are > using, and perhaps the CGI adapter (maybe there's a way to get the raw > environment without any encoding, for example?) Python decodes the environ to its own copy (wrapped in os.environ) at interpreter startup time; there's no way to query the real "live" environment that I know of. It'd require a C extension. > Honestly I don't know if anyone is doing anything with > WSGI and Python 3. I know Graham has done some work on mod_wsgi for 3.0, but no, I don't know anyone using it in anger. Is it worth submitting patches to simple_server to make it run on 3.0? Is it too late to include at this stage anyway? Shipping 3.0 with a non-functional wsgiref is a bit embarrassing. > I assume there is some way to get at the bytes in the environment, if not > then that is a Python 3 bug. There is not, and this appears to be deliberate. > I think it might be feasible to support an encoded version of > SCRIPT_NAME and PATH_INFO for WSGI 2.0 (creating entirely new key names, > and I don't know of any particular standard to base those names on), > moving from the two keys to a single REQUEST_URI is not feasible. That's certainly a possibility, but I feel it's easier to hitch a ride on the existing header, which despite being non-standard is still quite widely used.
> I guess you'd probably count segments, try to catch %2f (where the > segments won't match up), and then double check that the decoded > REQUEST_URI matches SCRIPT_NAME+PATH_INFO. I'm currently testing with just the segment counting. It's only necessary that the segments from SCRIPT_NAME are matched and stripped, and those are extremely unlikely to contain "%2F" because: - there aren't many filesystems that can accept "/" as a filename character. RISC OS is the only one I can think of, and it by convention swaps "/" and "." to compensate as it is, so even there you couldn't use "%2F"; - there aren't many webservers that can map a file or alias to a path containing "%2F"; - no-one wants to mount a webapp alias at such a weird name -- it's only in the section corresponding to PATH_INFO that "%2F" might ever be of use in practice. In the worst case, many applications already know and can strip the URL at which they're mounted, but unless there's a legitimate "%2F" in their SCRIPT_NAME it doesn't actually matter. > frankly IIS is probably less relevant to most developers than CGI. Er... really? You and I may not favour it, but it's ~35% of the world out there, not something we can afford to ignore IMO. > So if IIS has problems with PATH_INFO, the WSGI adapter > (be it CGI or otherwise) should be configured to fix those problems up > front. What I'm saying is that neither Apache's nor IIS's behaviour can be considered clearly correct or wrong at this point, and there is no way a WSGI adapter living underneath them *can* fix up the differences. (There is a problem with PATH_INFO that a WSGI adapter *could* clear up, which is that IIS makes PATH_INFO the entire path including SCRIPT_NAME. I'm not sure whether it's worth fixing that up in the adapter layer though... it's possible some frameworks are already dealing with it, and might even be relying on it!)
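The segment-counting approach described above might look something like the sketch below. It is not the code actually being tested in this thread: the function name `split_request_uri` is hypothetical, it assumes Python 3, it assumes REQUEST_URI is a path (not an absolute URI) and that SCRIPT_NAME is either empty or of the form "/a/b" with no trailing slash and no encoded slashes.

```python
from urllib.parse import unquote


def split_request_uri(environ, encoding="utf-8"):
    """Return (raw_path_info, decoded_segments), or None without REQUEST_URI.

    Hypothetical helper: strips as many leading segments from REQUEST_URI
    as SCRIPT_NAME contains, leaving the still-%-encoded PATH_INFO part.
    """
    request_uri = environ.get("REQUEST_URI")
    if request_uri is None:
        return None  # caller should fall back to PATH_INFO

    raw_path = request_uri.split("?", 1)[0]        # drop the query string
    # Each SCRIPT_NAME segment contributes exactly one leading slash.
    mount_depth = environ.get("SCRIPT_NAME", "").count("/")

    # "/app/x%2Fy/z".split("/") -> ["", "app", "x%2Fy", "z"]
    segments = raw_path.split("/")[1 + mount_depth:]
    raw_path_info = "".join("/" + s for s in segments)

    # Decode per segment so an encoded slash stays inside its segment.
    decoded = [unquote(s, encoding=encoding) for s in segments]
    return raw_path_info, decoded
```

Decoding segment by segment is the point of using the raw value: once "%2F" has been turned into "/" in a single string, it can no longer be told apart from a real path separator.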
-- And Clover mailto:and at doxdesk.com http://www.doxdesk.com/ From ianb at colorstudy.com Sun Nov 16 04:16:41 2008 From: ianb at colorstudy.com (Ian Bicking) Date: Sat, 15 Nov 2008 21:16:41 -0600 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification Message-ID: <491F9099.2090508@colorstudy.com> We need to make a revision to the WSGI spec to say that environ['wsgi.input'].readline takes an optional size argument. It always does in practice (except in wsgiref.validate.validator, rendering that validator useless), and is required to in practice, because everyone uses cgi.FieldStorage, and it passes in that argument. -- Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org From graham.dumpleton at gmail.com Sun Nov 16 06:22:39 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Sun, 16 Nov 2008 16:22:39 +1100 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: <491F9099.2090508@colorstudy.com> References: <491F9099.2090508@colorstudy.com> Message-ID: <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> 2008/11/16 Ian Bicking : > We need to make a revision to the WSGI spec to say that > environ['wsgi.input'].readline takes an optional size argument. It always > does in practice (except in wsgiref.validate.validator, rendering that > validator useless), and is required to in practice, because everyone uses > cgi.FieldStorage, and it passes in that argument. This has been brought up numerous times before. There are other things about wsgi.input that really need to be changed as well to make it more useful. When I have pushed for revised specification before I could never get enough interest in it from the people that most would perceive are the ones who oversee the PEP. 
Graham From stephan at transvection.de Sun Nov 16 14:51:09 2008 From: stephan at transvection.de (Stephan Diehl) Date: Sun, 16 Nov 2008 14:51:09 +0100 Subject: [Web-SIG] possible bug in cgi Message-ID: <4920254D.4010609@transvection.de> this is probably not the right place to ask, but I found some irritating behaviour with the cgi module and are unsure if it's a bug (seen on python2.5 and python2.6) The problem is this: >>> import cgi >>> cgi.FieldStorage(environ={'QUERY_STRING':u'a=b'}) FieldStorage(None, None, [MiniFieldStorage('a\x00', '\x00b\x00')]) >>> cgi.FieldStorage(environ={'QUERY_STRING':'a=b'}) FieldStorage(None, None, [MiniFieldStorage('a', 'b')]) When creating a FieldStorage with an environment that contains a unicode 'QUERY_STRING' value, garbage is returned. The ultimate problem seems to be, that the QUERY_STRING is converted to a cStringIO object which holds only the memory representation of unicode strings. Regards, Stephan From ianb at colorstudy.com Sun Nov 16 19:06:15 2008 From: ianb at colorstudy.com (Ian Bicking) Date: Sun, 16 Nov 2008 12:06:15 -0600 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> References: <491F9099.2090508@colorstudy.com> <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> Message-ID: <49206117.4020103@colorstudy.com> Graham Dumpleton wrote: > 2008/11/16 Ian Bicking : >> We need to make a revision to the WSGI spec to say that >> environ['wsgi.input'].readline takes an optional size argument. It always >> does in practice (except in wsgiref.validate.validator, rendering that >> validator useless), and is required to in practice, because everyone uses >> cgi.FieldStorage, and it passes in that argument. > > This has been brought up numerous times before. There are other things > about wsgi.input that really need to be changed as well to make it > more useful. 
When I have pushed for revised specification before I > could never get enough interest in it from the people that most would > perceive are the ones who oversee the PEP. Yes, this has been passed over before. To resolve this, let's just not pass it over this time? This is a relatively small change to the WSGI spec, because it represents standard practice -- this change is simply getting the spec in line with implementations. -- Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org From fumanchu at aminus.org Sun Nov 16 21:39:53 2008 From: fumanchu at aminus.org (Robert Brewer) Date: Sun, 16 Nov 2008 12:39:53 -0800 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: <49206117.4020103@colorstudy.com> References: <491F9099.2090508@colorstudy.com><88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> <49206117.4020103@colorstudy.com> Message-ID: +1 > -----Original Message----- > From: web-sig-bounces+fumanchu=aminus.org at python.org [mailto:web-sig- > bounces+fumanchu=aminus.org at python.org] On Behalf Of Ian Bicking > Sent: Sunday, November 16, 2008 10:06 AM > To: Graham Dumpleton > Cc: Web SIG > Subject: Re: [Web-SIG] Revising environ['wsgi.input'].readline in the > WSGI specification > > Graham Dumpleton wrote: > > 2008/11/16 Ian Bicking : > >> We need to make a revision to the WSGI spec to say that > >> environ['wsgi.input'].readline takes an optional size argument. It > always > >> does in practice (except in wsgiref.validate.validator, rendering > that > >> validator useless), and is required to in practice, because everyone > uses > >> cgi.FieldStorage, and it passes in that argument. > > > > This has been brought up numerous times before. There are other > things > > about wsgi.input that really need to be changed as well to make it > > more useful. 
When I have pushed for revised specification before I > > could never get enough interest in it from the people that most would > > perceive are the ones who oversee the PEP. > > Yes, this has been passed over before. To resolve this, let's just not > pass it over this time? This is a relatively small change to the WSGI > spec, because it represents standard practice -- this change is simply > getting the spec in line with implementations. > > -- > Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web- > sig/fumanchu%40aminus.org From mhammond at skippinet.com.au Mon Nov 17 03:36:21 2008 From: mhammond at skippinet.com.au (Mark Hammond) Date: Mon, 17 Nov 2008 13:36:21 +1100 Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets In-Reply-To: <491DEC57.6080402@doxdesk.com> References: <491B2CFE.7060502@doxdesk.com> <491B65C6.3020206@colorstudy.com> <491DB1E0.3070701@doxdesk.com> <491DB9C6.1070909@colorstudy.com> <491DEC57.6080402@doxdesk.com> Message-ID: <000c01c9485d$49ff0d20$ddfd2760$@com.au> > Python decodes the environ to its own copy (wrapped in os.environ) at > interpreter startup time; I don't think Python explicitly converts it - the CRT's ANSI version of environ is used, so the resulting strings should be encoded using the 'mbcs' encoding. What mangling do you see? > there's no way to query the real "live" > environment that I know of. It'd require a C extension. win32api and ctypes would both let you call the Windows API. > What I'm saying is that neither Apache's nor IIS's behaviour can be > considered clearly correct or wrong at this point, and there is no way > a WSGI adapter living underneath them *can* fix up the differences. What is IIS doing wrong here?
IIUC, ISAPI treats everything as bytes, so it is more likely to be the "higher-level" layers built on ISAPI (eg, ASP) which assume encodings. Apologies if you have already answered any of these - I haven't been following that closely... Cheers, Mark From and-py at doxdesk.com Mon Nov 17 18:54:24 2008 From: and-py at doxdesk.com (Andrew Clover) Date: Mon, 17 Nov 2008 18:54:24 +0100 Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets In-Reply-To: <000c01c9485d$49ff0d20$ddfd2760$@com.au> References: <491B2CFE.7060502@doxdesk.com> <491B65C6.3020206@colorstudy.com> <491DB1E0.3070701@doxdesk.com> <491DB9C6.1070909@colorstudy.com> <491DEC57.6080402@doxdesk.com> <000c01c9485d$49ff0d20$ddfd2760$@com.au> Message-ID: <4921AFD0.7050506@doxdesk.com> Mark Hammond wrote: > I don't think Python explicitly converts it - the CRT's ANSI version > of environ is used Yes, it would be the CRT on Python 2.x. (Python 3.0 on non-NT does a conversion always using UTF-8, if I'm reading convertenviron right.) > so the resulting strings should be encoded using the 'mbcs' encoding. > What mangling do you see? Correct, it's characters unencodable in mbcs that are lost*. mbcs is never equivalent to UTF-8 (which would allow us to recover characters on IIS) or ISO-8859 (which would allow us to recover characters on Apache-for-Windows) so there's always heavy lossage. (* - replaced with ? or Windows's attempt to substitute something that looks vaguely like the original character.) > win32api and ctypes would both let you call the Windows API. Ah! I had considered the win32 extensions but it's a bit of a dependency... I'd forgotten that we get ctypes for free in 2.5. So we'd be looking at: ctypes.windll.kernel32.GetEnvironmentVariableW(u'PATH_INFO', ...) when CPython 2.5+/NT is detected, right? That increases the number of situations in which we can feasibly recover URIs that are valid UTF-8 sequences (modulo the slash anyway).
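A fuller version of the ctypes call mentioned above could be sketched like this. The wrapper name `get_env_text` is hypothetical, and the fallback to os.environ on non-Windows platforms is an assumption made so the sketch runs anywhere; the only Windows call used is GetEnvironmentVariableW, which is first asked for the required buffer size and then for the value.

```python
import ctypes
import os
import sys


def get_env_text(name):
    """Read an environment variable as Unicode text.

    Hypothetical helper: on Windows it asks the wide-character API directly
    rather than using the ANSI (mbcs) copy; elsewhere it uses os.environ.
    Returns None when the variable is not set.
    """
    if sys.platform != "win32":
        return os.environ.get(name)

    get_var = ctypes.windll.kernel32.GetEnvironmentVariableW
    get_var.argtypes = [ctypes.c_wchar_p, ctypes.c_wchar_p, ctypes.c_ulong]
    get_var.restype = ctypes.c_ulong

    # With a zero-length buffer the call returns the size needed,
    # including the terminating NUL, or 0 if the variable is missing.
    needed = get_var(name, None, 0)
    if needed == 0:
        return None

    buf = ctypes.create_unicode_buffer(needed)
    get_var(name, buf, needed)
    return buf.value
```

A CGI-to-WSGI adapter could call something like `get_env_text("PATH_INFO")` when it detects Windows; deciding how to interpret the recovered characters still depends on which server produced them.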
Doing the actual recovery still requires some server-sniffing though. > What is IIS doing wrong here? It's not wrong as such. There are three reasonable choices for decoding header values before putting them in a Unicode environment, and the CGI spec, as it knows nothing about Unicode environment variables, fails to specify which: 1. ISO-8859-1 (which ensures bytes can be recovered) 2. UTF-8 (since most URIs are effectively UTF-8 today) 3. Configured system codepage (mbcs) Apache [with mod_cgi or mod_wsgi] decides on (1). IIS tries for (2), falling back to (3) on invalid sequences. The text concerning Python 3.0 in the WSGI Amendments page could be read as blessing Apache's behaviour. However wsgiref.simple_server currently also goes for (2), although that probably can't be considered canonical. I'd be interested to know what other WSGI servers do. -- And Clover mailto:and at doxdesk.com http://www.doxdesk.com/ From and-py at doxdesk.com Mon Nov 17 18:55:48 2008 From: and-py at doxdesk.com (Andrew Clover) Date: Mon, 17 Nov 2008 18:55:48 +0100 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: <49206117.4020103@colorstudy.com> References: <491F9099.2090508@colorstudy.com> <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> <49206117.4020103@colorstudy.com> Message-ID: <4921B024.90804@doxdesk.com> Ian Bicking wrote: > To resolve this, let's just not pass it over this time? 
+1 -- And Clover mailto:and at doxdesk.com http://www.doxdesk.com/ From mark.mchristensen at gmail.com Mon Nov 17 19:43:48 2008 From: mark.mchristensen at gmail.com (Mark Ramm) Date: Mon, 17 Nov 2008 13:43:48 -0500 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: <4921B024.90804@doxdesk.com> References: <491F9099.2090508@colorstudy.com> <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> <49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com> Message-ID: On Mon, Nov 17, 2008 at 12:55 PM, Andrew Clover wrote: > Ian Bicking wrote: > >> To resolve this, let's just not pass it over this time? Totally agreed. What exactly needs to happen next? From ianb at colorstudy.com Mon Nov 17 19:51:04 2008 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 17 Nov 2008 12:51:04 -0600 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: References: <491F9099.2090508@colorstudy.com> <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> <49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com> Message-ID: <4921BD18.8000908@colorstudy.com> Mark Ramm wrote: > On Mon, Nov 17, 2008 at 12:55 PM, Andrew Clover wrote: >> Ian Bicking wrote: >> >>> To resolve this, let's just not pass it over this time? > > Totally agreed. > > What exactly needs to happen next? We need to propose a change to the WSGI specification. I propose, in "Input and Error Streams" (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we change it to have "readline(hint)" and expand Note 3 to include readline as well as readlines, removing Note 2. Also I suppose some sort of change note in the specification? Does this sound like a sufficient change to the spec, and are there any objections to the change? 
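To make the proposed readline change concrete, here is a rough sketch of middleware-style wrapping for a server whose wsgi.input only offers a no-argument readline(). The class names are hypothetical, only read() and readline() are covered (readlines() and iteration are left out for brevity), and nothing here is taken from an existing server or from the PEP text.

```python
import io


class SizedReadlineInput:
    """Wrap an input stream so readline() accepts an optional size."""

    def __init__(self, stream):
        self._stream = stream
        self._buffer = b""   # tail of a line that was only partly returned

    def readline(self, size=-1):
        if not self._buffer:
            self._buffer = self._stream.readline()
        if size is None or size < 0:
            line, self._buffer = self._buffer, b""
        else:
            # Return at most `size` bytes and keep the rest for next time.
            line, self._buffer = self._buffer[:size], self._buffer[size:]
        return line

    def read(self, size=-1):
        if size is None or size < 0:
            data, self._buffer = self._buffer + self._stream.read(), b""
            return data
        data, self._buffer = self._buffer[:size], self._buffer[size:]
        if len(data) < size:
            data += self._stream.read(size - len(data))
        return data


class StrictInput:
    """Stand-in for a server stream that follows the letter of WSGI 1.0."""

    def __init__(self, data):
        self._stream = io.BytesIO(data)

    def readline(self):          # no size argument allowed
        return self._stream.readline()

    def read(self, size=-1):
        return self._stream.read(size)


wrapped = SizedReadlineInput(StrictInput(b"hello world\nsecond line\n"))
first_chunk = wrapped.readline(5)
```

Code such as cgi.FieldStorage, which passes a size to readline(), could then be handed the wrapped stream instead of the server's original one.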
-- Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org From tseaver at palladion.com Mon Nov 17 20:01:08 2008 From: tseaver at palladion.com (Tres Seaver) Date: Mon, 17 Nov 2008 14:01:08 -0500 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: <4921BD18.8000908@colorstudy.com> References: <491F9099.2090508@colorstudy.com> <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> <49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com> <4921BD18.8000908@colorstudy.com> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Ian Bicking wrote: > Mark Ramm wrote: >> On Mon, Nov 17, 2008 at 12:55 PM, Andrew Clover wrote: >>> Ian Bicking wrote: >>> >>>> To resolve this, let's just not pass it over this time? >> Totally agreed. >> >> What exactly needs to happen next? > > We need to propose a change to the WSGI specification. I propose, in > "Input and Error Streams" > (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we > change it to have "readline(hint)" and expand Note 3 to include readline > as well as readlines, removing Note 2. Also I suppose some sort of > change note in the specification? > > Does this sound like a sufficient change to the spec, and are there any > objections to the change? +1 from me. Tres. 
- -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD4DBQFJIb90+gerLs4ltQ4RAt/5AJdkn2ObmgAN2SU3dd8E4KNXolz5AJwIgOJP D9ZKBwF5jUunMrlQXaDbkA== =hUNu -----END PGP SIGNATURE----- From manlio_perillo at libero.it Mon Nov 17 20:49:16 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Mon, 17 Nov 2008 20:49:16 +0100 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: <4921BD18.8000908@colorstudy.com> References: <491F9099.2090508@colorstudy.com> <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> <49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com> <4921BD18.8000908@colorstudy.com> Message-ID: <4921CABC.60904@libero.it> Ian Bicking ha scritto: > [...] > We need to propose a change to the WSGI specification. I propose, in > "Input and Error Streams" > (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we > change it to have "readline(hint)" and expand Note 3 to include readline > as well as readlines, removing Note 2. Also I suppose some sort of > change note in the specification? > > Does this sound like a sufficient change to the spec, and are there any > objections to the change? > Fine for me, but of course we need to do this as: 1) Errata to WSGI 1.0 or 2) WSGI 1.1 or 3) WSGI 2.0 You can't just modify the current WSGI 1.0 spec. I'm for 2), with the other clarifications about WSGI we have discussed in the past. 
Regards Manlio Perillo From ianb at colorstudy.com Mon Nov 17 21:23:13 2008 From: ianb at colorstudy.com (Ian Bicking) Date: Mon, 17 Nov 2008 14:23:13 -0600 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: <4921CABC.60904@libero.it> References: <491F9099.2090508@colorstudy.com> <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> <49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com> <4921BD18.8000908@colorstudy.com> <4921CABC.60904@libero.it> Message-ID: <4921D2B1.5060004@colorstudy.com> Manlio Perillo wrote: > Ian Bicking ha scritto: >> [...] >> We need to propose a change to the WSGI specification. I propose, in >> "Input and Error Streams" >> (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we >> change it to have "readline(hint)" and expand Note 3 to include >> readline as well as readlines, removing Note 2. Also I suppose some >> sort of change note in the specification? >> >> Does this sound like a sufficient change to the spec, and are there >> any objections to the change? >> > > Fine for me, but of course we need to do this as: > 1) Errata to WSGI 1.0 > or > 2) WSGI 1.1 > or > 3) WSGI 2.0 > > You can't just modify the current WSGI 1.0 spec. > > I'm for 2), with the other clarifications about WSGI we have discussed > in the past. I'm for 1. What other clarifications were you thinking of? -- Ian Bicking : ianb at colorstudy.com : http://blog.ianbicking.org From pje at telecommunity.com Mon Nov 17 21:25:41 2008 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Mon, 17 Nov 2008 15:25:41 -0500 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: <4921CABC.60904@libero.it> References: <491F9099.2090508@colorstudy.com> <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> <49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com> <4921BD18.8000908@colorstudy.com> <4921CABC.60904@libero.it> Message-ID: <20081117202418.B603F3A4092@sparrow.telecommunity.com> At 08:49 PM 11/17/2008 +0100, Manlio Perillo wrote: >Ian Bicking ha scritto: >>[...] >>We need to propose a change to the WSGI specification. I propose, >>in "Input and Error Streams" >>(http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) >>we change it to have "readline(hint)" and expand Note 3 to include >>readline as well as readlines, removing Note 2. Also I suppose >>some sort of change note in the specification? >>Does this sound like a sufficient change to the spec, and are there >>any objections to the change? > >Fine for me, but of course we need to do this as: >1) Errata to WSGI 1.0 >or >2) WSGI 1.1 >or >3) WSGI 2.0 > >You can't just modify the current WSGI 1.0 spec. > >I'm for 2), with the other clarifications about WSGI we have >discussed in the past. I'm more inclined towards #1. But in any event we need to get clearer about how the amendment or erratum will be phrased. 
From fumanchu at aminus.org Mon Nov 17 22:00:02 2008 From: fumanchu at aminus.org (Robert Brewer) Date: Mon, 17 Nov 2008 13:00:02 -0800 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: <4921D2B1.5060004@colorstudy.com> References: <491F9099.2090508@colorstudy.com> <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> <49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com> <4921BD18.8000908@colorstudy.com> <4921CABC.60904@libero.it> <4921D2B1.5060004@colorstudy.com> Message-ID: Ian Bicking wrote: > Manlio Perillo wrote: > > Ian Bicking ha scritto: > >> [...] > >> We need to propose a change to the WSGI specification. I propose, > in > >> "Input and Error Streams" > >> (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) > we > >> change it to have "readline(hint)" and expand Note 3 to include > >> readline as well as readlines, removing Note 2. Also I suppose some > >> sort of change note in the specification? > >> > >> Does this sound like a sufficient change to the spec, and are there > >> any objections to the change? > >> > > > > Fine for me, but of course we need to do this as: > > 1) Errata to WSGI 1.0 > > or > > 2) WSGI 1.1 > > or > > 3) WSGI 2.0 > > > > You can't just modify the current WSGI 1.0 spec. > > > > I'm for 2), with the other clarifications about WSGI we have > discussed > > in the past. > > I'm for 1. What other clarifications were you thinking of? PLEASE don't ask, don't tell. Let's not complicate this change by conflating it with others yet again. 
Robert Brewer fumanchu at aminus.org From manlio_perillo at libero.it Mon Nov 17 22:13:05 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Mon, 17 Nov 2008 22:13:05 +0100 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: <20081117202418.B603F3A4092@sparrow.telecommunity.com> References: <491F9099.2090508@colorstudy.com> <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> <49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com> <4921BD18.8000908@colorstudy.com> <4921CABC.60904@libero.it> <20081117202418.B603F3A4092@sparrow.telecommunity.com> Message-ID: <4921DE61.8030204@libero.it> Phillip J. Eby ha scritto: > At 08:49 PM 11/17/2008 +0100, Manlio Perillo wrote: >> Ian Bicking ha scritto: >>> [...] >>> We need to propose a change to the WSGI specification. I propose, in >>> "Input and Error Streams" >>> (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we >>> change it to have "readline(hint)" and expand Note 3 to include >>> readline as well as readlines, removing Note 2. Also I suppose some >>> sort of change note in the specification? >>> Does this sound like a sufficient change to the spec, and are there >>> any objections to the change? >> >> Fine for me, but of course we need to do this as: >> 1) Errata to WSGI 1.0 >> or >> 2) WSGI 1.1 >> or >> 3) WSGI 2.0 >> >> You can't just modify the current WSGI 1.0 spec. >> >> I'm for 2), with the other clarifications about WSGI we have discussed >> in the past. > > I'm more inclined towards #1. I'm not sure, since it is an API change; of course if there was an error in the API this should be an errata, but there is a rationale behind the current API. I'm fine, however, with an amendment. > [...] 
Regards Manlio Perillo From manlio_perillo at libero.it Mon Nov 17 22:29:18 2008 From: manlio_perillo at libero.it (Manlio Perillo) Date: Mon, 17 Nov 2008 22:29:18 +0100 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: <4921D2B1.5060004@colorstudy.com> References: <491F9099.2090508@colorstudy.com> <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> <49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com> <4921BD18.8000908@colorstudy.com> <4921CABC.60904@libero.it> <4921D2B1.5060004@colorstudy.com> Message-ID: <4921E22E.90001@libero.it> Ian Bicking ha scritto: > [...] >> Fine for me, but of course we need to do this as: >> 1) Errata to WSGI 1.0 >> or >> 2) WSGI 1.1 >> or >> 3) WSGI 2.0 >> >> You can't just modify the current WSGI 1.0 spec. >> >> I'm for 2), with the other clarifications about WSGI we have discussed >> in the past. > > I'm for 1. What other clarifications were you thinking of? > Here is a list of messages I have posted in the past. - start_response and error checking 25 September 2007 http://mail.python.org/pipermail/web-sig/2007-September/002771.html - hop-by-hop headers handling 1 October 2007 http://mail.python.org/pipermail/web-sig/2007-October/002775.html - HTTP_CONTENT_TYPE and HTTP_CONTENT_LENGTH 12 December 2007 http://mail.python.org/pipermail/web-sig/2007-December/003014.html - a possible error in the WSGI spec 20 December 2007 http://mail.python.org/pipermail/web-sig/2007-December/003064.html - calling start_response and the write from a separate thread 27 December 2007 http://mail.python.org/pipermail/web-sig/2007-December/003104.html - WSGI and PEP 325 20 May 2008 http://mail.python.org/pipermail/web-sig/2008-May/003438.html I'm rather sure there were other threads about clarifications of WSGI 1.0. 
One of these was about whether a WSGI gateway is allowed to skip the generation of the response body (assuming the WSGI application returns a generator) if it is not required (the client's cached copy of the requested entity is up to date and the server is going to return 304 Not Modified). Regards Manlio Perillo From tseaver at palladion.com Mon Nov 17 22:36:02 2008 From: tseaver at palladion.com (Tres Seaver) Date: Mon, 17 Nov 2008 16:36:02 -0500 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: <4921DE61.8030204@libero.it> References: <491F9099.2090508@colorstudy.com> <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> <49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com> <4921BD18.8000908@colorstudy.com> <4921CABC.60904@libero.it> <20081117202418.B603F3A4092@sparrow.telecommunity.com> <4921DE61.8030204@libero.it> Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Manlio Perillo wrote: > Phillip J. Eby ha scritto: >> At 08:49 PM 11/17/2008 +0100, Manlio Perillo wrote: >>> Ian Bicking ha scritto: >>>> [...] >>>> We need to propose a change to the WSGI specification. I propose, in >>>> "Input and Error Streams" >>>> (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we >>>> change it to have "readline(hint)" and expand Note 3 to include >>>> readline as well as readlines, removing Note 2. Also I suppose some >>>> sort of change note in the specification? >>>> Does this sound like a sufficient change to the spec, and are there >>>> any objections to the change? >>> Fine for me, but of course we need to do this as: >>> 1) Errata to WSGI 1.0 >>> or >>> 2) WSGI 1.1 >>> or >>> 3) WSGI 2.0 >>> >>> You can't just modify the current WSGI 1.0 spec. >>> >>> I'm for 2), with the other clarifications about WSGI we have discussed >>> in the past. >> I'm more inclined towards #1. 
> > I'm not sure, since it is an API change; of course if there was an error > in the API this should be an errata, but there is a rationale behind the > current API. > > I'm fine, however, with an amendment. Isn't the rationale completely defeated by the equivalent, relaxed form for 'readlines' (note #3). That was why I voted +1: I couldn't see that relaxing 'readline' to match 'readlines' would make life any harder on server implementers. Tres. - -- =================================================================== Tres Seaver +1 540-429-0999 tseaver at palladion.com Palladion Software "Excellence by Design" http://palladion.com -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iD8DBQFJIePC+gerLs4ltQ4RAnsrAKCflurxZqxfJvjgX2YeU9XlXFDvPgCfQRcn rHK7/cvRh9zm5x8PyTq3ZLE= =c8v8 -----END PGP SIGNATURE----- From graham.dumpleton at gmail.com Mon Nov 17 23:30:50 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Tue, 18 Nov 2008 09:30:50 +1100 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: References: <491F9099.2090508@colorstudy.com> <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> <49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com> <4921BD18.8000908@colorstudy.com> <4921CABC.60904@libero.it> <20081117202418.B603F3A4092@sparrow.telecommunity.com> <4921DE61.8030204@libero.it> Message-ID: <88e286470811171430id838c44xbd06acb493c524f6@mail.gmail.com> 2008/11/18 Tres Seaver : > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Manlio Perillo wrote: >> Phillip J. Eby ha scritto: >>> At 08:49 PM 11/17/2008 +0100, Manlio Perillo wrote: >>>> Ian Bicking ha scritto: >>>>> [...] >>>>> We need to propose a change to the WSGI specification. 
I propose, in >>>>> "Input and Error Streams" >>>>> (http://www.python.org/dev/peps/pep-0333/#input-and-error-streams) we >>>>> change it to have "readline(hint)" and expand Note 3 to include >>>>> readline as well as readlines, removing Note 2. Also I suppose some >>>>> sort of change note in the specification? >>>>> Does this sound like a sufficient change to the spec, and are there >>>>> any objections to the change? >>>> Fine for me, but of course we need to do this as: >>>> 1) Errata to WSGI 1.0 >>>> or >>>> 2) WSGI 1.1 >>>> or >>>> 3) WSGI 2.0 >>>> >>>> You can't just modify the current WSGI 1.0 spec. >>>> >>>> I'm for 2), with the other clarifications about WSGI we have discussed >>>> in the past. >>> I'm more inclined towards #1. >> >> I'm not sure, since it is an API change; of course if there was an error >> in the API this should be an errata, but there is a rationale behind the >> current API. >> >> I'm fine, however, with an amendment. > > Isn't the rationale completely defeated by the equivalent, relaxed form > for 'readlines' (note #3). That was why I voted +1: I couldn't see > that relaxing 'readline' to match 'readlines' would make life any harder > on server implementers. I would be for (1) errata or amendment as reality is that there is probably no WSGI implementation that disallows an argument to readline() given that certain Python code such as cgi.FieldStorage wouldn't work otherwise. For such a clarification on existing practice, I see no point in having to change wsgi.version in environ as it would just cause confusion. I would also like to see other changes to WSGI specification but now is not the time, let us at least though get this obvious issue with API dealt with. After that we can then perhaps have a discussion of future of WSGI specification and whether there really is any interest in future versions with more significant changes. Although, personally I will not be holding my breath for that to happen. 
:-) Graham From pywebsig at xhaus.com Tue Nov 18 13:02:37 2008 From: pywebsig at xhaus.com (Alan Kennedy) Date: Tue, 18 Nov 2008 12:02:37 +0000 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: <88e286470811171430id838c44xbd06acb493c524f6@mail.gmail.com> References: <491F9099.2090508@colorstudy.com> <49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com> <4921BD18.8000908@colorstudy.com> <4921CABC.60904@libero.it> <20081117202418.B603F3A4092@sparrow.telecommunity.com> <4921DE61.8030204@libero.it> <88e286470811171430id838c44xbd06acb493c524f6@mail.gmail.com> Message-ID: <4a951aa00811180402g68e067a3m1e2a6dddb29d4e20@mail.gmail.com> [Graham] > I would be for (1) errata or amendment as reality is that there is > probably no WSGI implementation that disallows an argument to > readline() given that certain Python code such as cgi.FieldStorage > wouldn't work otherwise. > > For such a clarification on existing practice, I see no point in > having to change wsgi.version in environ as it would just cause > confusion. +1 [Graham] > I would also like to see other changes to WSGI specification but now > is not the time, let us at least though get this obvious issue with > API dealt with. After that we can then perhaps have a discussion of > future of WSGI specification and whether there really is any interest > in future versions with more significant changes. +1 Alan. From pje at telecommunity.com Tue Nov 18 16:44:57 2008 From: pje at telecommunity.com (Phillip J. 
Eby) Date: Tue, 18 Nov 2008 10:44:57 -0500 Subject: [Web-SIG] Revising environ['wsgi.input'].readline in the WSGI specification In-Reply-To: <88e286470811171430id838c44xbd06acb493c524f6@mail.gmail.com > References: <491F9099.2090508@colorstudy.com> <88e286470811152122n3e6c6a81ma40f8e55a704313d@mail.gmail.com> <49206117.4020103@colorstudy.com> <4921B024.90804@doxdesk.com> <4921BD18.8000908@colorstudy.com> <4921CABC.60904@libero.it> <20081117202418.B603F3A4092@sparrow.telecommunity.com> <4921DE61.8030204@libero.it> <88e286470811171430id838c44xbd06acb493c524f6@mail.gmail.com> Message-ID: <20081118154328.088B23A411A@sparrow.telecommunity.com> At 09:30 AM 11/18/2008 +1100, Graham Dumpleton wrote: >I would be for (1) errata or amendment as reality is that there is >probably no WSGI implementation that disallows an argument to >readline() given that certain Python code such as cgi.FieldStorage >wouldn't work otherwise. Please note that that was a change in Python 2.5; older Pythons (including Jython until very recently) would not have needed a readline() argument, and so are less likely to have been tested that way. From and-py at doxdesk.com Wed Nov 19 01:40:57 2008 From: and-py at doxdesk.com (Andrew Clover) Date: Wed, 19 Nov 2008 01:40:57 +0100 Subject: [Web-SIG] WSGI Amendments thoughts: the horror of charsets In-Reply-To: <4921AFD0.7050506@doxdesk.com> References: <491B2CFE.7060502@doxdesk.com> <491B65C6.3020206@colorstudy.com> <491DB1E0.3070701@doxdesk.com> <491DB9C6.1070909@colorstudy.com> <491DEC57.6080402@doxdesk.com> <000c01c9485d$49ff0d20$ddfd2760$@com.au> <4921AFD0.7050506@doxdesk.com> Message-ID: <49236099.7070604@doxdesk.com> > ctypes.windll.kernel32.GetEnvironmentVariableW(u'PATH_INFO', ...) Hmm... it turns out: no. IIS appears to be mangling characters that are not in mbcs even *before* it puts the decoded value into the envvars. The same is true with isapi_wsgi, which is the only other WSGI adapter I know of for IIS. 
This gets the same mangled byte string from GetServerVariable as Python gets from the envvars, so it looks like this is a mistake IIS is making further up before it even hits the CGI handler. Maybe someone more familiar with ISAPI knows a better way to read PATH_INFO than GetServerVariable, but I can't see anything promising in MSDN. So it would seem to be impossible at the moment to have Unicode paths work under IIS at all. The ctypes approach could rescue bytes for the Apache/nt/Py2 combination (perhaps also from libc.getenv for Apache/posix/Py3), but then Apache already gives us REQUEST_URI which is a much easier workaround. There might be CGI servers for Windows where ctypes could serve some purpose, but I can't think of any currently in use other than the Big Two.

In summary, to get the original submitted byte strings for PATH_INFO:

    Apache/nt/Py2
        process REQUEST_URI
    Apache/posix/Py2
        use PATH_INFO directly (or process REQUEST_URI)
    Apache/nt/Py3
        encode PATH_INFO to ISO-8859-1 (or process REQUEST_URI)
    Apache/posix/Py3
        process REQUEST_URI
    IIS/nt/Py2
        decode PATH_INFO from mbcs, then encode to UTF-8
        FAIL for characters not in current mbcs
        FAIL for non-UTF-8 input
    IIS/nt/Py3
        encode PATH_INFO to UTF-8
        FAIL for characters not in current mbcs
        FAIL for non-UTF-8 input
    wsgiref.simple_server/Py2
        use PATH_INFO directly
    wsgiref.simple_server/Py3
        remains to be seen, but at the moment encode PATH_INFO to UTF-8
        FAIL for non-UTF-8 input
    cherrypy.wsgiserver/Py2
        use PATH_INFO directly
    cherrypy.wsgiserver/Py3
        remains to be seen, but at the moment encode PATH_INFO to UTF-8
        FAIL for non-UTF-8 input

-- And Clover mailto:and at doxdesk.com http://www.doxdesk.com/ From randy at rcs-comp.com Sat Nov 22 06:50:45 2008 From: randy at rcs-comp.com (Randy Syring) Date: Sat, 22 Nov 2008 00:50:45 -0500 Subject: [Web-SIG] Implementing File Upload Size Limits Message-ID: <49279DB5.9090109@rcs-comp.com> I am looking for opinions and thoughts on best practice for limiting file upload size. 
I have a few considerations:

* Ultimately, I would want my application with my method of handling forms to be able to give the user a message that the file size was too big. That means that, however the size is limited, just blanking out wsgi.input and setting content-length to zero doesn't seem correct. That would make it look like the form wasn't submitted with any data, I believe.
* Given the above, it seems that something would need to get put in the environment to tell middleware and the application that the file input was aborted, but what would be the best way for doing it? Should it be some kind of standard, or just dependent on your server or middleware?
* It seems best to implement this functionality as the very first middleware in the stack. Since other middleware read and manipulate wsgi.input, handling the upload size at the application level wouldn't prevent middleware from wasting resources dealing with a very large file.

Is it possible to prevent the server from even accepting all the data (i.e. trying to save bandwidth and server resources) if the content-length is known to be too big? Or is the server required to take all the client's data regardless, even if it ends up going in the bit bucket? I realize some of this is server specific, not WSGI specific, but I would be interested in knowing how the most popular servers handle this or what the HTTP specs require if anyone knows.

Thanks in advance for any insight you might be able to provide.

-- -------------------------------------- Randy Syring RCS Computers & Web Solutions 502-644-4776 http://www.rcs-comp.com "Whether, then, you eat or drink or whatever you do, do all to the glory of God." 1 Cor 10:31 -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From randy at rcs-comp.com Sat Nov 22 10:07:53 2008 From: randy at rcs-comp.com (Randy Syring) Date: Sat, 22 Nov 2008 04:07:53 -0500 Subject: [Web-SIG] Implementing File Upload Size Limits In-Reply-To: <49279DB5.9090109@rcs-comp.com> References: <49279DB5.9090109@rcs-comp.com> Message-ID: <4927CBE9.7060609@rcs-comp.com> I did find this: http://wiki.pylonshq.com/display/pylonscookbook/A+Better+Way+To+Limit+File+Upload+Size Which was good, but still leaves some unanswered questions: * What if one is not using the paste http server? * This method gives an unfriendly response. What would be the best method to propagate this error condition down to the app so that a message could be given to the user in the context of the form they had previously submitted (i.e. an error message under the input field reminding them of the max upload size and even possibly telling them how big the file was they uploaded). Thanks. -------------------------------------- Randy Syring RCS Computers & Web Solutions 502-644-4776 http://www.rcs-comp.com "Whether, then, you eat or drink or whatever you do, do all to the glory of God." 1 Cor 10:31 Randy Syring wrote: > I am looking for opinions and thoughts on best practice for limiting > file upload size. I have a few considerations: > > * Ultimately, I would want my application with my method of > handling forms to be able to give the user a message that the > file size was too big. That means that however, the size is > limited, just blanking out wsgi.input and setting content-length > to zero doesn't seem correct. That would make it look like the > form wasn't submitted with any data I believe. > * Given the above, it seems that something would need to get put > in the environment to tell middleware and the application that > the file input was aborted, but what would be the best way for > doing it? Should it be some kind of standard, or just dependent > on your server or middleware? 
> * It seems best to implement this functionality as the very first > middleware in the stack. Since other middleware read and > manipulate wsgi.input, handling the upload size at the > application level wouldn't prevent middlware from wasting > resources dealing with a very large file. > > Is it possible to prevent the server from even accepting all the data > (i.e. trying to save bandwidth and server resources) if the > content-length is known to be too big? Or is the server required to > take all the client's data regardless, even if it ends up going in the > bit bucket? I realize some of this is server specific, not WSGI > specific, but I would be interested in knowing how the most popular > servers handle this or what the HTTP specs require if anyone knows. > > Thanks in advance for any insight you might be able to provide. > -- > -------------------------------------- > Randy Syring > RCS Computers & Web Solutions > 502-644-4776 > http://www.rcs-comp.com > > "Whether, then, you eat or drink or > whatever you do, do all to the glory > of God." 1 Cor 10:31 > > ------------------------------------------------------------------------ > > _______________________________________________ > Web-SIG mailing list > Web-SIG at python.org > Web SIG: http://www.python.org/sigs/web-sig > Unsubscribe: http://mail.python.org/mailman/options/web-sig/randy%40rcs-comp.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From graham.dumpleton at gmail.com Sat Nov 22 10:12:26 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Sat, 22 Nov 2008 20:12:26 +1100 Subject: [Web-SIG] Implementing File Upload Size Limits In-Reply-To: <49279DB5.9090109@rcs-comp.com> References: <49279DB5.9090109@rcs-comp.com> Message-ID: <88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com> 2008/11/22 Randy Syring : > I am looking for opinions and thoughts on best practice for limiting file > upload size. 
I have a few considerations: > > Ultimately, I would want my application with my method of handling forms to > be able to give the user a message that the file size was too big. That > means that however, the size is limited, just blanking out wsgi.input and > setting content-length to zero doesn't seem correct. That would make it > look like the form wasn't submitted with any data I believe. > Given the above, it seems that something would need to get put in the > environment to tell middleware and the application that the file input was > aborted, but what would be the best way for doing it? Should it be some > kind of standard, or just dependent on your server or middleware? > It seems best to implement this functionality as the very first middleware > in the stack. Since other middleware read and manipulate wsgi.input, > handling the upload size at the application level wouldn't prevent middlware > from wasting resources dealing with a very large file. > > Is it possible to prevent the server from even accepting all the data (i.e. > trying to save bandwidth and server resources) if the content-length is > known to be too big? Or is the server required to take all the client's > data regardless, even if it ends up going in the bit bucket? I realize some > of this is server specific, not WSGI specific, but I would be interested in > knowing how the most popular servers handle this or what the HTTP specs > require if anyone knows. > > Thanks in advance for any insight you might be able to provide. If you use Apache/mod_wsgi to host your WSGI application, the best way of handling this is to use the Apache LimitRequestBody directive in the appropriate context. This will result in Apache returning an HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response to the client. If you need a custom error document for that response type, use the Apache ErrorDocument directive to specify the URL of the handler that would generate it. 
Except for the custom error document, if that is delegated to the WSGI application, doing it this way results in it all being handled by Apache/mod_wsgi, and your WSGI application will not even be invoked. The request body content would also not be read by Apache at all. Do note that whether this avoids the client sending the request body depends on whether the client was expecting a '100 Continue' response before it sends the data. Most web browsers, I believe, still don't use the '100 Continue' response. This would be the preferred solution for Apache/mod_wsgi as it is handled at the lowest level and it is guaranteed that request content wouldn't be read at that point. It is however taking control out of your application. For Apache/mod_wsgi, if you do not do it this way but instead validate the content length in the WSGI application and have the WSGI application return an HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response, then whether the request content gets read depends on whether you are using embedded mode or daemon mode of mod_wsgi. If you use embedded mode, so long as your WSGI application doesn't read the input and just returns the error response, the request content wouldn't be read at all. If you are using daemon mode however, then the request content would always be read by the Apache child worker process, even if the client asked for a '100 Continue' response. This is because the Apache child worker process will always proxy request content to the daemon process. Anyway, that is how things are for Apache/mod_wsgi. 
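As a rough illustration of the second option described above (validating the content length inside the WSGI application rather than in Apache), a minimal sketch might look like the following. The names LimitUploadSize and max_bytes are made up for this example and are not part of mod_wsgi or any other package; the only things taken from the WSGI specification are the CONTENT_LENGTH environ key and the start_response call. It only looks at the declared length and never touches wsgi.input.

```python
class LimitUploadSize(object):
    """Hypothetical WSGI middleware (not from any real package): reject
    requests whose declared Content-Length exceeds max_bytes, without
    reading wsgi.input."""

    def __init__(self, app, max_bytes):
        self.app = app
        self.max_bytes = max_bytes

    def __call__(self, environ, start_response):
        try:
            length = int(environ.get('CONTENT_LENGTH') or 0)
        except ValueError:
            # Malformed header: treated as "no body" in this sketch.
            length = 0
        if length > self.max_bytes:
            body = b'Upload too large.\n'
            start_response('413 Request Entity Too Large', [
                ('Content-Type', 'text/plain'),
                ('Content-Length', str(len(body))),
            ])
            return [body]
        # Within the limit: hand the request to the wrapped application.
        return self.app(environ, start_response)
```

Whether the unread request body is then skipped or read anyway is up to the server, as described above for embedded mode and daemon mode.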
Graham From randy at rcs-comp.com Sat Nov 22 19:06:15 2008 From: randy at rcs-comp.com (Randy Syring) Date: Sat, 22 Nov 2008 13:06:15 -0500 Subject: [Web-SIG] Implementing File Upload Size Limits In-Reply-To: <88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com> References: <49279DB5.9090109@rcs-comp.com> <88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com> Message-ID: <49284A17.2050801@rcs-comp.com> [forgot to copy list] Graham Dumpleton wrote: > 2008/11/22 Randy Syring : > >> I am looking for opinions and thoughts on best practice for limiting file >> upload size. I have a few considerations: >> >> >> > If you use Apache/mod_wsgi to host your WSGI application, the best way > of handling this is use the Apache LimitRequestBody directive for > appropriate context. This will result in Apache returning a > HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response to the client. If > you need a custom error document for that response type use Apache > ErrorDocument directive to specify URL of handler which would generate > it. > Graham, Thank you for your response. What you noted above does seem to be the lowest-level solution possible if you are using Apache. I suppose using an error document that is part of the application would at least allow me to serve a specific page from my application that could detail the error. If I wanted to get fancy, each time a form with an input element was sent to a user, I could save that path in a special variable in the user's session. My error page could then look for that value in the user session and, if present, load the correct form, giving the user an error message noting that the file uploaded was too big. The downside of that approach is that the form comes back empty. It might be better to just have the error page give them some details and encourage them to use the back button, in which case the form's fields would hopefully still be filled in. 
> Except for the custom error document if delegated to the WSGI > application, doing it this way results in it all being handled by > Apache/mod_wsgi and your WSGI application will not even be invoked. > The request body content would also not even be read by Apache at all. > Do note that whether this avoids the client sending the request body > input depends on whether the client was expecting a '100 Continue' > response before it send the data. Most web browsers still I believe > don't use '100 Continue' response. > > This would be the preferred solution for Apache/mod_wsgi as it is > handled at lowest levels and guaranteed that request content wouldn't > be read at that point. It is however taking control out of your > application. > Hopefully you can clarify something for me. Lets assume that the client does not use '100 Continue' but sends data immediately, after sending the headers. If the server never reads the request content, what does that mean exactly? Does the data get transferred over the wire but then discarded or does the client not get to send the data until the server reads the request body? I.e. the client tries to "send" it, but the content isn't actually transferred across the wire until the server reads it. I am just wondering if there is a buffer or queue or something between the server and the client that allows data to be transferred even if the server doesn't "read" the request body. Or, is it just like a straight pipe where one end (the client) can't push data through until the other end (the server) reads it. I agree that it does take control out of the application. From a usability perspective, the best solution IMO would be for the user to get the form back and have a red error messsage under the input field indicating the file size uploaded was too big and giving them the max file size allowed. However, on second thought, that may not be true. 
As noted above, because the entire request body was rejected, the form loaded would have none of the information they submitted and most users would probably think they have to fill out the whole form again. Probably better to just give them a non-form error page and let them use the back button (or even provide a link that uses javascript to go back) and in so doing hopefully salvage the time they put into the form. I suppose, though, that two different kinds of file size limits need to be thought through. The first limit would be an application wide limit that is set for security/resource reasons. That, I believe, is what we have been discussing up to this point. I am just realizing that it would also be fine to limit upload sizes at the application level and give more user-friendly error messages. So I might decide on a 10MB application-wide upload limit, but I might also restrict free accounts and paid accounts to 256k and 5MB respectively. As long as a user uploads something less than 10MB, they get a friendly in-line error message. If they upload over 10MB, we handle that at the apache level and send them to a custom error page. > For Apache/mod_wsgi, if you do not do it this way but instead validate > content length in the WSGI application and have the WSGI application > return HTTP_REQUEST_ENTITY_TOO_LARGE (413) error response, then > whether the request content gets read depends on whether you are using > embedded mode or daemon mode of mod_wsgi. > > If you use embedded mode, so long as your WSGI application doesn't > read the input and just returns the error response, the request > content wouldn't be read at all. If you are using daemon mode however, > then the request content would always be read by Apache child worker > process, even if client asked for '100 Continue' response. This is > because the Apache child worker process will always proxy request > content to the daemon process. > > Thats good to know. 
I think at this point I have talked myself into thinking that there is no good reason to handle it at the application level, but would appreciate any further feedback you might have. One other thing, what would be a good upload size limit? Should it always be as low as possible? What might be a good "middle-ground" for the average web application uploading documents and pictures? Thank you for taking the time to respond. -------------------------------------- Randy Syring RCS Computers & Web Solutions 502-644-4776 http://www.rcs-comp.com "Whether, then, you eat or drink or whatever you do, do all to the glory of God." 1 Cor 10:31 From brian at briansmith.org Tue Nov 25 18:03:22 2008 From: brian at briansmith.org (Brian Smith) Date: Tue, 25 Nov 2008 11:03:22 -0600 Subject: [Web-SIG] Implementing File Upload Size Limits In-Reply-To: <49284A17.2050801@rcs-comp.com> References: <49279DB5.9090109@rcs-comp.com> <88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com> <49284A17.2050801@rcs-comp.com> Message-ID: <006501c94f1f$ba54a620$2efdf260$@org> Randy Syring wrote: > Hopefully you can clarify something for me. Lets assume that the > client does not use '100 Continue' but sends data immediately, after > sending the headers. If the server never reads the request content, > what does that mean exactly? Does the data get transferred over the > wire but then discarded or does the client not get to send the data > until the server reads the request body? I.e. the client tries to > "send" it, but the content isn't actually transferred across the > wire until the server reads it. I am just wondering if there > is a buffer or queue or something between the server and the client > that allows data to be transferred even if the server doesn't > "read" the request body. Or, is it just like a straight pipe > where one end (the client) can't push data through until the other > end (the server) reads it. 
Under Apache CGI or mod_wsgi, in many situations you will get a deadlock in this scenario. The input and the output are buffered separately, and both of those buffers can fill up. Neither mod_wsgi nor mod_cgid implements the non-blocking I/O logic needed to prevent deadlocks. I heard (but did not verify) that mod_fastcgi does not have this deadlocking problem. The sizes of the buffers determine the size of the inputs and outputs needed to cause a deadlock. On some platforms (e.g. Mac OS X), they are only 8K by default. Therefore, for maximum portability, a WSGI application should ALWAYS consume the *whole* request body if it wants to avoid the deadlock using the reference WSGI adapter in PEP 333 or mod_wsgi. Probably other WSGI gateways have similar issues. It would be nice if there was a standard entry in the WSGI environment (e.g. "wsgi.may_ignore_request_body") that could be used to safely detect when we can skip the request body. It would be even nicer if WSGI gateways were updated to avoid this problem. However, that is easier said than done. If you know C, it is relatively simple to modify mod_wsgi to use a different Apache<->daemon communication protocol so that the daemon mode works as you would expect (no deadlocks, proper 100-continue support, request body isn't read unless your application asks for it). A long time ago I had a patch that did this (among other things) but I don't think I have it any more. However, once you get to that point, you still run into problems. If your goal is to avoid reading the request body, then you need to close the connection in your error response; otherwise, if the request was an HTTP/1.1 request, you still need to read the entire request body in order to process any requests that follow it in the request pipeline. 
Unfortunately, a WSGI application doesn't have any way of signaling that the connection is to be closed; the WSGI specification forbids the WSGI application from returning the Connection header since it is hop-by-hop. And, even if there was such a mechanism, a poorly-coded client is likely to still cause a deadlock if the server doesn't read its full request. Make sure you test with all your targeted browsers. Consequently... > > If you are using daemon mode however, > > then the request content would always be read by Apache child worker > > process, even if client asked for '100 Continue' response. This is > > because the Apache child worker process will always proxy request > > content to the daemon process. > > > Thats good to know. I think at this point I have talked myself into > thinking that there is no good reason to handle it at the application > level, but would appreciate any further feedback you might have. ...if your users will often attempt to upload large files that exceed your limits, it is best to mitigate the problem on the client side. First, document the file size limit clearly on the page where the upload happens. Secondly, implement a Flash-based and/or Java-based file upload control that can be used when the user has Flash or Java installed (fall back to the regular control otherwise). With such an uploader, you can check the file size on the client and prevent these requests from even being made (in the typical case). You will still have to implement the validation logic on the server to cover malicious use and/or disabled Javascript/Flash/Java. There are additional benefits to this approach (better UI, multi-file selection, compression, encryption, doesn't waste the user's time, saves bandwidth) but it comes with all the drawbacks inherent in Flash/Java/Javascript. 
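For the server-side validation that still has to exist, the earlier advice in this message about consuming the whole request body could be followed with something like the sketch below. It is only an illustration under stated assumptions: drain_input, reject_too_large and the chunk size are invented for this example, and 'wsgi.may_ignore_request_body' is the hypothetical environ key floated earlier in this message, which no gateway is known to provide.

```python
def drain_input(environ, chunk_size=64 * 1024):
    """Read and discard the rest of the request body in bounded chunks.

    Returns the number of bytes discarded. Relies on CONTENT_LENGTH,
    since a WSGI 1.0 application should not read past the declared length.
    """
    try:
        remaining = int(environ.get('CONTENT_LENGTH') or 0)
    except ValueError:
        remaining = 0
    stream = environ['wsgi.input']
    discarded = 0
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # client sent less than it declared
        discarded += len(chunk)
        remaining -= len(chunk)
    return discarded


def reject_too_large(environ, start_response):
    """Send a 413 response, draining the body first unless the gateway
    says (via the hypothetical key) that skipping it is safe."""
    if not environ.get('wsgi.may_ignore_request_body'):
        drain_input(environ)
    body = b'Upload too large.\n'
    start_response('413 Request Entity Too Large', [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body))),
    ])
    return [body]
```

Draining costs time and bandwidth for a very large upload, so this trades resource use for avoiding the deadlock; it does nothing to stop the client from sending the data in the first place.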
Regards, Brian From and-py at doxdesk.com Tue Nov 25 21:14:52 2008 From: and-py at doxdesk.com (Andrew Clover) Date: Tue, 25 Nov 2008 21:14:52 +0100 Subject: [Web-SIG] Implementing File Upload Size Limits In-Reply-To: <006501c94f1f$ba54a620$2efdf260$@org> References: <49279DB5.9090109@rcs-comp.com> <88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com> <49284A17.2050801@rcs-comp.com> <006501c94f1f$ba54a620$2efdf260$@org> Message-ID: <492C5CBC.30006@doxdesk.com> Brian Smith wrote: > Under Apache CGI or mod_wsgi, in many situations you will get a deadlock in > this scenario. Under IIS CGI it's considerably more likely. The output buffer you get is smaller than Apache/Linux (at least on Win2K3 it's only 2KB), so even a relatively small error page spat out before reading the whole input will result in a cheeky hang. > Therefore, for maximum portability, a WSGI application should ALWAYS consume > the *whole* request body if it wants to avoid the deadlock using the > reference WSGI adapter in PEP 333 or mod_wsgi (...in daemon mode) yep. -- And Clover mailto:and at doxdesk.com http://www.doxdesk.com/ From graham.dumpleton at gmail.com Tue Nov 25 23:59:10 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Wed, 26 Nov 2008 09:59:10 +1100 Subject: [Web-SIG] Implementing File Upload Size Limits In-Reply-To: <006501c94f1f$ba54a620$2efdf260$@org> References: <49279DB5.9090109@rcs-comp.com> <88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com> <49284A17.2050801@rcs-comp.com> <006501c94f1f$ba54a620$2efdf260$@org> Message-ID: <88e286470811251459h6ed70717wbc22ba47009810d3@mail.gmail.com> 2008/11/26 Brian Smith : > Randy Syring wrote: >> Hopefully you can clarify something for me. Lets assume that the >> client does not use '100 Continue' but sends data immediately, after >> sending the headers. If the server never reads the request content, >> what does that mean exactly? 
Does the data get transferred over the >> wire but then discarded or does the client not get to send the data >> until the server reads the request body? I.e. the client tries to >> "send" it, but the content isn't actually transferred across the >> wire until the server reads it. I am just wondering if there >> is a buffer or queue or something between the server and the client >> that allows data to be transferred even if the server doesn't >> "read" the request body. Or, is it just like a straight pipe >> where one end (the client) can't push data through until the other >> end (the server) reads it. > > Under Apache CGI or mod_wsgi, in many situations you will get a deadlock in > this scenario. The input and the output are buffered separately both of > those buffers can fill up. It isn't 'many situations', it is a quite specific situation. The issue applies only to mod_wsgi daemon mode and only occurs where the size of the request content body size is larger than the UNIX socket buffer size for that platform and the WSGI application doesn't consume all the request body. At the same time, the WSGI application would then have to return a set of response headers and response body which combined are also larger than the UNIX socket buffer size for that platform. > Neither mod_wsgi nor mod_cgid implement the > non-blocking I/O logic needed to prevent deadlocks. Both mod_wsgi and mod_cgi do have timeouts so that a permanent deadlock situation at least doesn't arise. This is based off standard Apache Timeout directive. AFAIK I know mod_cgid still has bug in it whereby it doesn't detect it and so possibly easy way to DOS an Apache server. As far as changing how mod_wsgi works, there exists the issue: http://code.google.com/p/modwsgi/issues/detail?id=56 It is low priority though as no one has been reporting it as a problem in actual use. Scenarios where it technically might be triggered would generally be SPAM bots trying to POST large amounts of data to arbitrary URLs. 
If an application is function as intended, the situation shouldn't really arise as POST requests should be getting directed at URLs which will consume it. That issue also references the IIS+CGI issue someone else mentioned: http://www.doxdesk.com/updates/2006.html#u20060416-cgi FWIW, mod_scgi also has same problem and it doesn't implement timeouts so can suffer permanent deadlock. > I heard (but did not > verify) that mod_fastcgi does not have this deadlocking problem. The sizes > of the buffers determines the size of the inputs and outputs needed to cause > a deadlock. On some platforms (e.g. Mac OS X), they are only 8K by default. MacOS X is only system I know of that has small default UNIX socket buffer sizes. This small buffer size only applies to UNIX socket buffer sizes, for INET sockets it is much much larger. Since mod_fastcgi predominantly uses INET sockets, if there is an issue it may not be obvious as you would need to be returning very large response. From what I remember when I looked at mod_fastcgi and mod_proxy for certain types of operations they both try and force all request content down the socket before trying to read response. Thus, am not convinced that problem couldn't actually occur for both of these as well, but since INET socket buffer size much much larger, not generally triggered. To work around UNIX socket buffer size on mod_wsgi, there are options which can be supplied to WSGIDaemonProcess to change the UNIX socket buffer sizes used to something more sensible. > Therefore, for maximum portability, a WSGI application should ALWAYS consume > the *whole* request body if it wants to avoid the deadlock using the > reference WSGI adapter in PEP 333 or mod_wsgi. > > Probably other WSGI gateways have similar issues. It would be nice if there > was a standard entry in the WSGI environment (e.g. > "wsgi.may_ignore_request_body") that could be used to safely detect when we > can skip the request body. 
It would be even nicer if WSGI gateways were > updated to avoid this problem. However, that is easier said than done. > > If you know C, it is relatively simple to modify mod_wsgi to use a different > Apache<->daemon communication protocol so that the daemon mode works as you > would expect (no deadlocks, proper 100-continue support, request body isn't > read unless your application asks for it). A long time ago I had a patch > that did this (among other things) but I don't think I have it any more. Depends on your definition of simple. It would be quite fiddly to do and get right, or one would have to rewrite a large amount of code. I wouldn't regard either as really that simple. > However, once you get to that point, you still run into problems. If your > goal is to avoid reading the request body, then you need to close the > connection in your error response; Otherwise, if the request was a HTTP/1.1 > request, you still need to read the entire request body in order to process > any requests that follow it in the request pipeline. Unfortunately, a WSGI > application doesn't have any way of signaling that the connection is to be > closed; the WSGI specification forbids the WSGI application from returning > the Connection header since it is hop-by-hop. And, even if there was such a > mechanism, a poorly-coded client is likely to still cause a deadlock if the > server doesn't read its full request. Make sure you test with all your > targeted browsers. Apache, and I would expect any sensible web server, always closes a client connection when error responses are returned. Thus it will only allow request pipelining so long as 200 response is returned. Okay, it isn't this simple as Apache looks at lots of other things as well, but close enough. The WSGI specification may forbid returning Connection header, but if you do do it with mod_wsgi, then Apache will note it and close the connection even if 200 response is returned. Graham > Consequently... 
> >> > If you are using daemon mode however, >> > then the request content would always be read by Apache child worker >> > process, even if client asked for '100 Continue' response. This is >> > because the Apache child worker process will always proxy request >> > content to the daemon process. >> > >> Thats good to know. I think at this point I have talked myself into >> thinking that there is no good reason to handle it at the application >> level, but would appreciate any further feedback you might have. > > ...if your users will often attempt to upload large files exceed your > limits, is to best to mitigate the problem on the client-side. First, > document the file size limit clearly on the page where the upload happens. > Secondly, implement a flash-based and/or java-based file upload control that > can be used when the user has Flash installed (fall back to the regular > control otherwise). With such an uploader, you can check the file size on > the client and prevent these requests from even being made (in the typical > case). You will still have to implement the validation logic on the server > to prevent malicious use and/or disabled Javascript/Flash/Java. There are > additional benefits to this approach (better UI, multi-file selection, > compression, encryption, doesn't waste the user's time, saves bandwidth) but > it comes with all the drawbacks inherent with Flash/Java/Javascript. 
> > Regards, > Brian > > > From brian at briansmith.org Wed Nov 26 16:01:43 2008 From: brian at briansmith.org (Brian Smith) Date: Wed, 26 Nov 2008 09:01:43 -0600 Subject: [Web-SIG] Implementing File Upload Size Limits In-Reply-To: <88e286470811251459h6ed70717wbc22ba47009810d3@mail.gmail.com> References: <49279DB5.9090109@rcs-comp.com> <88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com> <49284A17.2050801@rcs-comp.com> <006501c94f1f$ba54a620$2efdf260$@org> <88e286470811251459h6ed70717wbc22ba47009810d3@mail.gmail.com> Message-ID: <00c301c94fd7$e59acc20$b0d06460$@org> Brian Smith wrote: > 2008/11/26 Brian Smith : > > Under Apache CGI or mod_wsgi, in many situations you will get a > > deadlock in this scenario. > > It isn't 'many situations', it is a quite specific situation. Right. I meant that it can happen quite often (every time) that situation occurs, depending on the characteristics of the application. > > If you know C, it is relatively simple to modify mod_wsgi to use a > > different Apache<->daemon communication protocol > > Depends on your definition of simple. It would be quite fiddly to do > and get right, or one would have to rewrite a large amount of code. I > wouldn't regard either as really that simple. I did it by implementing the communication protocol that I had proposed on the mod_wsgi mailing list a while ago. It is straightforward to do, but it does take a lot of time to learn how mod_wsgi works in order to make the change, especially if you have never written an Apache module before. 
- Brian From fumanchu at aminus.org Thu Nov 27 18:07:31 2008 From: fumanchu at aminus.org (Robert Brewer) Date: Thu, 27 Nov 2008 09:07:31 -0800 Subject: [Web-SIG] Implementing File Upload Size Limits In-Reply-To: <006501c94f1f$ba54a620$2efdf260$@org> References: <49279DB5.9090109@rcs-comp.com> <88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com><49284A17.2050801@rcs-comp.com> <006501c94f1f$ba54a620$2efdf260$@org> Message-ID: Brian Smith wrote: > Randy Syring wrote: > > Hopefully you can clarify something for me. Lets assume that the > > client does not use '100 Continue' but sends data immediately, after > > sending the headers. If the server never reads the request content, > > what does that mean exactly? Does the data get transferred over the > > wire but then discarded or does the client not get to send the data > > until the server reads the request body? I.e. the client tries to > > "send" it, but the content isn't actually transferred across the > > wire until the server reads it. I am just wondering if there > > is a buffer or queue or something between the server and the client > > that allows data to be transferred even if the server doesn't > > "read" the request body. Or, is it just like a straight pipe > > where one end (the client) can't push data through until the other > > end (the server) reads it. > > Under Apache CGI or mod_wsgi, in many situations you will get a > deadlock in > this scenario. The input and the output are buffered separately both of > those buffers can fill up. Neither mod_wsgi nor mod_cgid implement the > non-blocking I/O logic needed to prevent deadlocks. I heard (but did > not > verify) that mod_fastcgi does not have this deadlocking problem. The > sizes > of the buffers determines the size of the inputs and outputs needed to > cause > a deadlock. On some platforms (e.g. Mac OS X), they are only 8K by > default. 
> > Therefore, for maximum portability, a WSGI application should ALWAYS > consume > the *whole* request body if it wants to avoid the deadlock using the > reference WSGI adapter in PEP 333 or mod_wsgi. Indeed. This is covered in RFC 2616 Section 8.2.3: If an origin server receives a request that does not include an Expect request-header field with the "100-continue" expectation, the request includes a request body, and the server responds with a final status code before reading the entire request body from the transport connection, then the server SHOULD NOT close the transport connection until it has read the entire request, or until the client closes the connection. Otherwise, the client might not reliably receive the response message. However, this requirement is not be construed as preventing a server from defending itself against denial-of-service attacks, or from badly broken client implementations. CherryPy's wsgiserver will read any remaining request body (which the application hasn't read) before sending response headers. Robert Brewer fumanchu at aminus.org From graham.dumpleton at gmail.com Fri Nov 28 00:15:17 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Fri, 28 Nov 2008 10:15:17 +1100 Subject: [Web-SIG] Implementing File Upload Size Limits In-Reply-To: References: <49279DB5.9090109@rcs-comp.com> <88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com> <49284A17.2050801@rcs-comp.com> <006501c94f1f$ba54a620$2efdf260$@org> Message-ID: <88e286470811271515l4e60ab3br3ae9fc3bf56588ac@mail.gmail.com> 2008/11/28 Robert Brewer : > Brian Smith wrote: >> Randy Syring wrote: >> > Hopefully you can clarify something for me. Lets assume that the >> > client does not use '100 Continue' but sends data immediately, after >> > sending the headers. If the server never reads the request content, >> > what does that mean exactly? 
Does the data get transferred over the >> > wire but then discarded or does the client not get to send the data >> > until the server reads the request body? I.e. the client tries to >> > "send" it, but the content isn't actually transferred across the >> > wire until the server reads it. I am just wondering if there >> > is a buffer or queue or something between the server and the client >> > that allows data to be transferred even if the server doesn't >> > "read" the request body. Or, is it just like a straight pipe >> > where one end (the client) can't push data through until the other >> > end (the server) reads it. >> >> Under Apache CGI or mod_wsgi, in many situations you will get a >> deadlock in >> this scenario. The input and the output are buffered separately both > of >> those buffers can fill up. Neither mod_wsgi nor mod_cgid implement the >> non-blocking I/O logic needed to prevent deadlocks. I heard (but did >> not >> verify) that mod_fastcgi does not have this deadlocking problem. The >> sizes >> of the buffers determines the size of the inputs and outputs needed to >> cause >> a deadlock. On some platforms (e.g. Mac OS X), they are only 8K by >> default. >> >> Therefore, for maximum portability, a WSGI application should ALWAYS >> consume >> the *whole* request body if it wants to avoid the deadlock using the >> reference WSGI adapter in PEP 333 or mod_wsgi. > > Indeed. This is covered in RFC 2616 Section 8.2.3: > > If an origin server receives a request that does not include an > Expect request-header field with the "100-continue" expectation, > the request includes a request body, and the server responds > with a final status code before reading the entire request body > from the transport connection, then the server SHOULD NOT close > the transport connection until it has read the entire request, > or until the client closes the connection. Otherwise, the client > might not reliably receive the response message. 
However, this > requirement is not be construed as preventing a server from > defending itself against denial-of-service attacks, or from > badly broken client implementations. > > CherryPy's wsgiserver will read any remaining request body (which the > application hasn't read) before sending response headers. A WSGI application could technically want to send response headers and only then read remaining request content. I don't believe there is anything in the WSGI specification which prevents that. If you are discarding the request content as soon as response headers are generated, that could technically be a problem for some use cases, even if they may be obscure. I cant tell from looking at latest CherryPy WSGI server code as has been changed since last I looked at it and haven't yet had time to grok it and run some tests, but previously in respect of where WSGI specification says: """The server is not required to read past the client's specified Content-Length, and is allowed to simulate an end-of-file condition if the application attempts to read past that point.""" the CherryPy WSGI server code chose NOT to simulate an end-of-file condition. This was the case as the amount of data read from wsgi.input was never tracked. This meant that if application did try and read more content than available and request pipelining occurring then the read would hang as would not get an empty string returned as would be normal for end-of-file condition for file like object. If the code is still behaving this way, then it wouldn't be possible for it to discard remaining input as how much was read wasn't tracked. Looking at latest code I do note the presence of a wrapper around socket used for wsgi.input, but haven't been able to work out yet whether it returns a traditional empty string as end-of-file condition, or whether it is going to instead raise your MaxSizeExceeded exception and thus not be file like in it behaviour. 
Can you perhaps explain what is going to happen when an attempt is made to read more content than is available, and whether it is actually going to raise an exception rather than just return an empty string as file-like objects would?

Personally I think that part of the WSGI specification should be amended so that an end-of-file condition MUST be indicated using an empty string, just as with normal file-like objects. Just this one change would mean that one could call read() with no arguments and have it return all input, whereas at the moment the WSGI specification does not allow the argument to read() to be optional.

This would actually negate the whole need for applications to even check/use CONTENT_LENGTH, except for situations where it matters, such as a 413 response or where how the application decides to process the content depends on its size. That is, to get all request content you would just call read() with no argument. If you wanted to process it in chunks, then you would just loop reading a set chunk size until an empty string is returned, and you wouldn't need to track how much had been read and short-read the last chunk. If applications worked this way then one could handle mutating input filters that change the amount of request content, e.g. decompression of data, and could also handle chunked transfer encoding on request content in a reasonable way without having to read it all in and buffer it just to work out CONTENT_LENGTH.

Up till now, the only major WSGI server (ignoring wsgiref perhaps) I knew of which didn't allow read() with no argument, or which didn't simulate end-of-file by returning an empty string, was the CherryPy WSGI server. Now its code has been changed, but I am not sure if it still does that or whether it has done something totally different to everything else by raising an exception instead.
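The end-of-file behaviour discussed above can be sketched as a small wrapper around the socket stream. This is only an illustration under stated assumptions, not the code of CherryPy, mod_wsgi or any other server: the names LimitedInput and iter_chunks are hypothetical, and it is written for Python 3, where the stream yields bytes. The wrapper tracks how much of the declared Content-Length has been read and returns an empty byte string once that is exhausted, so read() with no argument and a simple chunk loop both terminate even when another pipelined request follows on the same connection.

```python
import io


class LimitedInput:
    """Wrap a stream so reads stop after `length` bytes, then return b''."""

    def __init__(self, stream, length):
        self._stream = stream
        self._remaining = max(0, length)

    def read(self, size=-1):
        if self._remaining <= 0:
            return b""  # simulated end-of-file, never touches the socket
        if size is None or size < 0 or size > self._remaining:
            size = self._remaining  # never read past the declared length
        data = self._stream.read(size)
        self._remaining -= len(data)
        return data


def iter_chunks(stream, chunk_size=8192):
    """Yield chunks until the stream signals end-of-file with an empty read."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Two request bodies back to back, standing in for a pipelined connection.
    raw = io.BytesIO(b"first-bodySECOND")
    body = LimitedInput(raw, 10)
    print(b"".join(iter_chunks(body, 4)))
    print(body.read())
    print(raw.read())
```

With a wrapper like this the application never needs to consult CONTENT_LENGTH just to know when to stop reading, and a server could discard whatever the application left unread by looping over iter_chunks itself.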
Graham From fumanchu at aminus.org Fri Nov 28 06:58:25 2008 From: fumanchu at aminus.org (Robert Brewer) Date: Thu, 27 Nov 2008 21:58:25 -0800 Subject: [Web-SIG] Implementing File Upload Size Limits In-Reply-To: <88e286470811271515l4e60ab3br3ae9fc3bf56588ac@mail.gmail.com> References: <49279DB5.9090109@rcs-comp.com> <88e286470811220112g42ad87b2r795d79938c64d9e2@mail.gmail.com> <49284A17.2050801@rcs-comp.com> <006501c94f1f$ba54a620$2efdf260$@org> <88e286470811271515l4e60ab3br3ae9fc3bf56588ac@mail.gmail.com> Message-ID: Graham Dumpleton wrote: > 2008/11/28 Robert Brewer : > > CherryPy's wsgiserver will read any remaining request body (which the > > application hasn't read) before sending response headers. > > A WSGI application could technically want to send response headers and > only then read remaining request content. I don't believe there is > anything in the WSGI specification which prevents that. If you are > discarding the request content as soon as response headers are > generated, that could technically be a problem for some use cases, > even if they may be obscure. I'll look into that further. > I cant tell from looking at latest CherryPy WSGI server code as has > been changed since last I looked at it and haven't yet had time to > grok it and run some tests, but previously in respect of where WSGI > specification says: > > """The server is not required to read past the client's specified > Content-Length, and is allowed to simulate an end-of-file condition if > the application attempts to read past that point.""" > > the CherryPy WSGI server code chose NOT to simulate an end-of-file > condition. This was the case as the amount of data read from > wsgi.input was never tracked. This meant that if application did try > and read more content than available and request pipelining occurring > then the read would hang as would not get an empty string returned as > would be normal for end-of-file condition for file like object. 
> > If the code is still behaving this way, then it wouldn't be possible > for it to discard remaining input as how much was read wasn't tracked. > > Looking at latest code I do note the presence of a wrapper around > socket used for wsgi.input, but haven't been able to work out yet > whether it returns a traditional empty string as end-of-file > condition, or whether it is going to instead raise your > MaxSizeExceeded exception and thus not be file like in it behaviour. It still raises MaxSizeExceeded. > Can you perhaps explain what is going to happen when an attempt is > made to read more content than what was available and whether it is > actually going to raise an exception rather than just return an empty > string like file like objects would. > > Personally I think that that part of WSGI specification should be > amended such that it is required that an end-of-file condition MUST be > indicated using an empty string just like with normal file like > objects. Just this one change would mean that one could call read() > with no arguments and have it return all input, whereas at the moment > WSGI specification does allow argument to read() be optional. > > This would actually negate the whole need for applications to even > check/use CONTENT_LENGTH except for situations where it mattered such > as 413 response or where how it decided to process it was dependent on > size. That is, to get all request content you would just call read() > with no argument. If you wanted to process it in chunks, then it would > just loop reading a set chunk size until empty string returned and it > wouldn't need to track how much it read and short read the last chunk. > If applications worked this way then one could handle mutating input > filters that changed amount of request content, ie., decompression of > data, plus could handle chunked transfer encoding on request content > in a reasonable way without having to read it all in and buffer it > just to work out CONTENT_LENGTH. 
> > Up till now, the only major WGSI server (ignoring wsgiref perhaps) I > knew of which didn't allow read() with no argument or which didn't > simulate end-of-file through empty string being returned was CherryPy > WSGI server. Now its code has been changed, but not sure if it still > does that or whether it has done something totally different to > everything else by raising an exception instead. I'd be open to changing it to EOF instead of error; amending the WSGI spec would be nice too. Robert Brewer fumanchu at aminus.org From luca.tebaldi at unife.it Fri Nov 28 17:18:51 2008 From: luca.tebaldi at unife.it (Luca Tebaldi) Date: Fri, 28 Nov 2008 17:18:51 +0100 Subject: [Web-SIG] web services ssl client Message-ID: <47eb0bab0811280818y376bb769pc3528297ff46b0be@mail.gmail.com> Hi, should I build a client for web services that require authentication based on a ca (pem and crt), I'm trying to use soappy but not work... someone have any idea or can tell me where to find a tutorial? tnx a lot! Luca -- skype:luca.tebaldi bookmark: http://del.icio.us/lucatebaldi foto: http://www.flickr.com/photos/teba/tags/ linkedin: http://www.linkedin.com/in/lucatebaldi -------------- next part -------------- An HTML attachment was scrubbed... URL: From graham.dumpleton at gmail.com Fri Nov 28 23:28:33 2008 From: graham.dumpleton at gmail.com (Graham Dumpleton) Date: Sat, 29 Nov 2008 09:28:33 +1100 Subject: [Web-SIG] web services ssl client In-Reply-To: <47eb0bab0811280818y376bb769pc3528297ff46b0be@mail.gmail.com> References: <47eb0bab0811280818y376bb769pc3528297ff46b0be@mail.gmail.com> Message-ID: <88e286470811281428l5b363e3he2a37662efb3e202@mail.gmail.com> 2008/11/29 Luca Tebaldi : > Hi, > should I build a client for web services that require authentication based > on a ca (pem and crt), I'm trying to use soappy but not work... someone have > any idea or can tell me where to find a tutorial? 
A more appropriate forum for questions about Python and SOAP services is: http://groups.google.com/group/pywebsvcs?lnk= Graham