From chris.dent at gmail.com Mon Oct 1 19:07:40 2012
From: chris.dent at gmail.com (chris.dent at gmail.com)
Date: Mon, 1 Oct 2012 18:07:40 +0100 (BST)
Subject: [Web-SIG] resources for porting wsgi apps from python 2 to 3
Message-ID:

I was at PyCon UK over the weekend and came away refreshed and wanting to hack. That, combined with the recent release of Python 3.3, had me deciding it was time to start porting TiddlyWeb[1] to Python 3.

I'm making progress along some lines and making a bit of a mess along others. The major holdback right now is dependencies that are not yet ported. I'd like to port them as well, but they are proving hard to port because they have test dependencies that are themselves not yet ported.

A medium-sized issue is how WSGI is supposed to behave in Python 3. TiddlyWeb is its own framework and doesn't use webob or werkzeug, etc. It does dispatch with selector, but other than that it handles headers and the request body as it gets them from the server. For tests it uses wsgi-intercept[2] to simulate a web server. I've volunteered to port that (having already done some minor work on it in the past), so I need to get clear on the disposition of bytes or strings in the headers and bodies of both requests and responses.

I have a few questions that I'm hoping people here will help me answer, or at least point me in the right direction on. I'll be happy to summarize the results after the discussion has tailed off.[3]

I've looked over PEP 3333 and done some other reading, but I don't feel fully confident. The question is mostly about which part of the stack should be strict. In what follows, when I say "bytes" and "str" I mean the Python 3 types.

* Should wsgi-intercept (which fakes a server), when giving request info to a "fake app":
  * Use bytes or str for environ keys?
  * Use bytes or str for environ values?
  * Are all environ values created equal, or would, for example, QUERY_STRING's value (prior to any parameter decoding) be handled differently from HTTP_COOKIE's?
  * If str, I see that ISO-8859-1 is the assumed encoding. How much hurt occurs in the world if I just assume UTF-8 when decoding to str[4]?
* When wsgi-intercept is accepting data from the WSGI app:
  * Should start_response only accept bytes (and raise an error if not), or should it also accept str and encode appropriately? To put it another way: be liberal or strict? If encoding, which encoding?
  * Should the returned iterable be rejected or encoded if not bytes?

What have I forgotten?

Thanks for any input, comments, etc. The thing at [3] has a few more details on some of the related issues and pieces of the puzzle.

[1] http://tiddlyweb.com/ https://github.com/tiddlyweb/tiddlyweb
[2] http://code.google.com/p/wsgi-intercept/
[3] I've started keeping notes on this project at http://tiddlyweb3.tiddlyspace.com/
[4] Which is what it should have been all along?

--
Chris Dent http://peermore.com/

--
Chris Dent http://burningchrome.com/
[...]

From and-py at doxdesk.com Tue Oct 2 14:38:27 2012
From: and-py at doxdesk.com (And Clover)
Date: Tue, 02 Oct 2012 13:38:27 +0100
Subject: [Web-SIG] resources for porting wsgi apps from python 2 to 3
In-Reply-To:
References:
Message-ID: <506AE043.30003@doxdesk.com>

On 01/10/12 18:07, chris.dent at gmail.com wrote:

> * Use bytes or str for environ keys?
> * Use bytes or str for environ values?

str, decoded from the request bytes using ISO-8859-1.

> * Are all environ values created equal or would, for example,
>   QUERY_STRING's value (prior to any parameter to decoding)
>   be handled differently from HTTP_COOKIE

All environ values are created equal (other than the CGI-mandated odd decoding behaviour of SCRIPT_NAME and PATH_INFO).

> * If str, I see that ISO-8859-1 is the assumed encoding. How much
>   hurt occurs in the world if I just assume utf-8 when decoding to
>   str[4]?
Immediately, all non-ASCII characters in the path would be interpreted incorrectly. The more general hurt to the world would be that we would continue the sad pre-PEP 3333 situation where every web server handles non-ASCII characters differently, and so no WSGI application can reliably use Unicode in path segments.

There is little impact on any header other than the path, because non-ASCII characters almost never appear in them. The query string remains %-encoded, so any non-ASCII characters are safe. The other places users can put non-ASCII characters are in cookies and HTTP Basic Authorisation headers, but browser support here is so variable/broken that Python's handling would be the least of your worries.

> [4] Which is what it should have been all along?

Not necessarily. Even if you decide that all web apps must use UTF-8 for text encoding, it's valid to have URL-encoded, non-text binary data in a path segment. This would be unrecoverable using straight UTF-8. (It would be recoverable if surrogateescape were used, but PEP 3333 has to encompass language versions that don't have surrogateescape, and it's also questionable whether it should be possible to smuggle non-UTF-8 data into strings that applications assume are safe.)

Plus, header values are less likely to be UTF-8, and HTTP specifies that they're ISO-8859-1 (even if that is not well observed by browsers).

Ideally, the interfaces would all be bytes, because HTTP is defined in terms of bytes. But that plays poorly with Python 3's default Unicode strs (for environ et al). So ISO-8859-1 was chosen as a str interface from which the original bytes can at least be recovered.

> * Should start_response only accept bytes (and error if not), or
>   should it also accept str and encode appropriately?

status and response_headers are, like the request headers, native str (to be ISO-8859-1 encoded). It's only the HTTP entity body that is always a bytestring.
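The ISO-8859-1 round trip described above can be sketched in a few lines of Python 3. This is a minimal illustration, not code from PEP 3333, wsgiref or wsgi-intercept: the function names and example paths are hypothetical, and it assumes the server %-decodes the path before decoding it to str, as the CGI-style PATH_INFO handling mentioned above implies. It also shows why straight UTF-8 cannot carry a binary path segment, and what surrogateescape does instead.

```python
from urllib.parse import unquote_to_bytes


def to_environ_str(raw_path: bytes) -> str:
    """Server side (assumed PATH_INFO handling): %-decode the raw path to
    bytes, then decode as ISO-8859-1, which maps each byte to exactly one
    code point in U+0000..U+00FF."""
    return unquote_to_bytes(raw_path).decode("iso-8859-1")


def recover_bytes(environ_value: str) -> bytes:
    """Because that mapping is one-to-one, re-encoding is lossless."""
    return environ_value.encode("iso-8859-1")


def app_text_path(environ_value: str) -> str:
    """An application that has chosen UTF-8 for its URLs re-decodes the
    recovered bytes itself."""
    return recover_bytes(environ_value).decode("utf-8")


# Hypothetical request paths, for illustration only.
text_path = to_environ_str(b"/tiddlers/caf%C3%A9")
binary_path = to_environ_str(b"/blobs/%FF%FE")

print(ascii(text_path))                 # '/tiddlers/caf\xc3\xa9' (mojibake until re-decoded)
print(ascii(app_text_path(text_path)))  # '/tiddlers/caf\xe9' (the intended text)
print(recover_bytes(binary_path))       # b'/blobs/\xff\xfe' (nothing lost)

# A server that decoded straight to UTF-8 could not represent the
# binary segment at all...
try:
    b"/blobs/\xff\xfe".decode("utf-8")
except UnicodeDecodeError as exc:
    print("utf-8 decode failed:", exc.reason)  # invalid start byte

# ...unless it used surrogateescape, which carries undecodable bytes
# through as lone surrogates that every application must then expect.
smuggled = b"/blobs/\xff\xfe".decode("utf-8", "surrogateescape")
print(ascii(smuggled))                              # '/blobs/\udcff\udcfe'
print(smuggled.encode("utf-8", "surrogateescape"))  # b'/blobs/\xff\xfe'
```

The point of the ISO-8859-1 choice is visible in `recover_bytes`: whatever the client sent, the application can get the exact bytes back and apply its own text encoding.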
> * Should the returned iterable be rejected or encoded if not bytes?

I don't think it's specified by the PEP, but wsgiref looks like it'll chuck TypeError when it tries to write str to the buffer/socket.

cheers,

--
And Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/
gtalk:chat?jid=bobince at gmail.com

From chris.dent at gmail.com Tue Oct 2 21:36:26 2012
From: chris.dent at gmail.com (chris.dent at gmail.com)
Date: Tue, 2 Oct 2012 20:36:26 +0100 (BST)
Subject: [Web-SIG] resources for porting wsgi apps from python 2 to 3
In-Reply-To: <506AE043.30003@doxdesk.com>
References: <506AE043.30003@doxdesk.com>
Message-ID:

On Tue, 2 Oct 2012, And Clover wrote:

[a bunch of useful stuff]

Thanks, that was nicely cogent and thus very useful.

I've got a first working version:

https://github.com/cdent/python3-wsgi-intercept

I need to codify some things in tests a bit more, and there's an issue with HTTPS requests and http.client, but I'm close.

--
Chris Dent http://burningchrome.com/
[...]

From me at rpatterson.net Tue Oct 30 19:13:07 2012
From: me at rpatterson.net (Ross Patterson)
Date: Tue, 30 Oct 2012 11:13:07 -0700
Subject: [Web-SIG] WSGI apps on IIS
Message-ID: <87ehkfygjw.fsf@rpatterson.net>

From my blog post: http://rpatterson.net/software/wsgi-apps-on-iis

The `iiswsgi`_ module implements a FastCGI to `WSGI`_ gateway that is compatible with `IIS`_'s variation of the `FastCGI protocol`_. It also provides `distutils`_ commands for building, distributing and installing `Microsoft Web Deploy`_ (MSDeploy) packages through the `Web Platform Installer`_ (WebPI).
The goals of the code in `iiswsgi`_ are to do the following for deploying WSGI apps on IIS:

* make it open source as far as possible, right up to IIS
* be Pythonic as far as possible, right up to the MSDeploy packaging
* re-use our existing tool-chain for distributing packages
* share the maintenance burden for a WSGI on Windows story across the community

For the `Plone`_ project, supporting a Windows deployment story has always been both a necessity and one of our biggest pain points. The Windows installers have always been very different from the other installers. They have had different layouts from our user and developer documentation, and even from each other. They have never been maintained or supported by more than one entity, either a company or an individual, and as a result they have often, and ultimately, languished. As for the poor individuals who have tackled the Windows installers, they have almost always burned out and can no longer provide any significant Windows support at all. This is not a healthy open source community dynamic. And yet there is wide consensus that *not* having a Windows deployment story is not an option.

My hope is that by generalizing the IIS deployment architecture as a `WSGI server`_ and `distutils commands`_, this work can be of use to the general Python `WSGI`_ world. I also hope that by doing things 'The Right Way', it will be clearer and easier to support and maintain. Together, those two might let us solve the burnout issue by distributing the maintenance load.

I'd very much appreciate any help to that end, particularly feedback on how to get there. I don't care where the code lives and would be happy to see some of it merged back into the packages it derives from or moved into larger packages. So please let me know if you'd like to coordinate moving things around with me.

Help Needed
-----------

Any contributions are very welcome.
Here are a few things I'm looking for in particular:

* addressing `Known Issues`_
* IIS app name and Python dist name conventions
* fostering community ownership
* writing tests

I'm particularly apologetic about the last one; I'm ashamed of the lack of tests. In my defense, this whole problem was such a fog for me when I started that I just needed to start writing things and poking around. Believe me, this is not my usual MO: I almost always do TDD, and I beg your forgiveness. :-)

I look forward to getting this going!

.. _`iiswsgi`: https://github.com/rpatterson/iiswsgi
.. _`WSGI`: http://wsgi.readthedocs.org/en/latest/
.. _`IIS`: http://www.iis.net
.. _`FastCGI protocol`: http://www.fastcgi.com/drupal/
.. _`distutils`: http://docs.python.org/distutils/
.. _`Microsoft Web Deploy`: http://www.iis.net/downloads/microsoft/web-deploy
.. _`Web Platform Installer`: http://www.microsoft.com/web/downloads/platform.aspx
.. _`Plone`: http://plone.org
.. _`WSGI server`: https://github.com/rpatterson/iiswsgi#iiswsgi-fcgi-gateway
.. _`distutils commands`: https://github.com/rpatterson/iiswsgi#build-msdeploy-package
.. _`Known Issues`: https://github.com/rpatterson/iiswsgi#known-issues