From andrew at aeracode.org Wed Mar 9 19:34:40 2016
From: andrew at aeracode.org (Andrew Godwin)
Date: Wed, 9 Mar 2016 16:34:40 -0800
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

Hi all,

As some of you may know, I've been working over the past few months to bring native WebSocket support to Django, via a project codenamed "Django Channels" - this is mostly the reason I've been involved in recent WSGI discussions.

I'm personally of the opinion that WSGI works well for HTTP, with a few improvements we can roll into a 1.1, but that we also need something else that can support WebSockets and other future web protocols (e.g. WebRTC components).

To that end, I did some work to make the underlying mechanism Django Channels uses into more of a standard, which I have codenamed ASGI; while initially I intended for it to be a Django documented API, as I've gone further with the project I've come to believe it could be useful to the Python community at large.

My intention would be for this spec to sit alongside WSGI, and be a second option for both servers and frameworks to support (if they wished) that supports both HTTP and WebSocket connections, as well as a reasonable way to extend it to future protocols. All current applications and servers could continue to work via adapter classes that transform ASGI to WSGI on either end of its HTTP path, which I think is an important migration consideration.

I'd love some feedback from this group on my proposed specification, and any major problems you foresee; there are a few issues I know about, mostly potential performance issues, but in most of those cases I believe the gains outweigh the loss. The major change is that servers and applications now run independently, either in separate threads or processes, and communicate bidirectionally over a "channel layer", rather than the server calling the application directly.
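To make the "channel layer" idea concrete, here is a toy, single-process sketch. It is not the asgiref in-memory layer linked below; the class name, the `send`/`receive_many` signatures and the channel names are assumptions modeled loosely on the names used in this thread. The point it shows is the decoupling: the server and the worker never call each other, they only put messages on named channels and poll for them, and `receive_many` can be either non-blocking or blocking.

```python
import collections
import threading
import time


class InMemoryChannelLayer:
    """Hypothetical toy channel layer: named FIFO queues in one process."""

    def __init__(self):
        self._queues = collections.defaultdict(collections.deque)
        self._cond = threading.Condition()

    def send(self, channel, message):
        """Append a message to a channel and wake any blocked receivers."""
        with self._cond:
            self._queues[channel].append(message)
            self._cond.notify_all()

    def receive_many(self, channels, block=False, timeout=None):
        """Return (channel, message) from the first non-empty channel.

        With block=False, return (None, None) at once if nothing is waiting.
        With block=True, wait up to `timeout` seconds (forever if None).
        """
        deadline = None if timeout is None else time.monotonic() + timeout
        with self._cond:
            while True:
                for channel in channels:
                    queue = self._queues.get(channel)  # .get() avoids creating empty queues
                    if queue:
                        return channel, queue.popleft()
                if not block:
                    return None, None
                remaining = None if deadline is None else deadline - time.monotonic()
                if remaining is not None and remaining <= 0:
                    return None, None
                self._cond.wait(remaining)


if __name__ == "__main__":
    layer = InMemoryChannelLayer()
    # Server side: hand an incoming request to whichever worker picks it up.
    layer.send("http.request", {"path": "/", "reply_channel": "http.response.client-1"})
    channel, message = layer.receive_many(["http.request"])
    # Worker side: answer on the reply channel named in the message.
    layer.send(message["reply_channel"], {"status": 200, "content": b"Hello"})
    print(layer.receive_many([message["reply_channel"]]))
    # prints ('http.response.client-1', {'status': 200, 'content': b'Hello'})
```

A real layer (Redis, database) would add cross-process transport, expiry and capacity limits; this sketch only illustrates the message-passing shape.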
I'm not yet angling to take this to a PEP, but that would be my eventual goal; right now, I want to get feedback from people on their major likes/dislikes, and how it works for various parts of the Python web ecosystem. The spec already has an application framework (Django), web/websocket server (Daphne [1]) and three channel layers [2] implemented, so I've ironed out some major problems it initially had from working on those, but I'm not as experienced in the rigours of serving HTTP as most of you are. I do encourage you, though, to take a look at the rest of the Channels docs if you want to get an idea of how it works and deploys in practice.

Spec is up here: http://channels.readthedocs.org/en/latest/asgi.html
Helpful quick Q&A: http://channels.readthedocs.org/en/latest/inshort.html

I do believe that making a clean break from WSGI to a new structure (and NOT calling it "WSGI 2") is the best thing we can do if we truly want to support more web protocols properly, and I believe that doing that in a way that still supports WSGI and provides a nice migration path is important - I believe ASGI provides both of these things, as well as a relatively simple core API (one of WSGI's strengths in my opinion) - but I welcome your opinions as well.

Andrew

[1] https://github.com/andrewgodwin/daphne
[2] https://github.com/andrewgodwin/asgi_redis, https://github.com/andrewgodwin/asgiref/blob/master/asgiref/inmemory.py, https://github.com/andrewgodwin/channels/blob/master/channels/database_layer.py
From cory at lukasa.co.uk Thu Mar 10 04:59:15 2016
From: cory at lukasa.co.uk (Cory Benfield)
Date: Thu, 10 Mar 2016 09:59:15 +0000
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

> On 10 Mar 2016, at 00:34, Andrew Godwin wrote:
>
> To that end, I did some work to make the underlying mechanism Django Channels uses into more of a standard, which I have codenamed ASGI; while initially I intended for it to be a Django documented API, as I've gone further with the project I've come to believe it could be useful to the Python community at large.

Andrew,

Thanks for this work! I've provided some proposed changes as pull requests against the channels repository. I'll ignore those for the rest of the email: we can discuss them on GitHub.

I also have a few more general notes. I didn't make PRs for these, mostly because they're too 'vague' as feedback goes to be concretely handled by me.

First, your HTTP section has request headers serialized to a dict and response headers serialized to a list of tuples. I'm not sure how I feel about that asymmetry: it might be cleaner just to use lists-of-tuples in both places and allow application frameworks to handle translation to dictionary if they require it.

Second, if it were me I'd remove the `status_text` field on the `Response` object. Custom status text is a terrible misfeature (especially as HTTP/2 doesn't support it), and in 99% of cases you're just wasting data by repeatedly sending the default phrase that the server already knows.

Third, you're currently sending header fields with unicode names and byte string values. That's understandable, but I wonder if it's worthwhile trying to limit the behaviour of compliant servers in encoding/decoding those header fields. For example, you could assert that the unicode header names will always use the Latin-1 codec when encoding/decoding.
This is mostly me being paranoid about poorly written apps/servers issuing bad bytes onto the network. I should note that RFC 7230 strictly limits header names to US-ASCII, but Latin-1 would be the defensive choice against already-badly-written apps.

Your section on server push is great, whoever wrote that is clearly a genius. ;)

You define web socket data frames with an incrementing counter from zero, but also note that the maximum integer size is Python's sys.maxint (you actually aren't that clear about it, which might be a good idea). While this is *probably* not a problem, you may want to note that really long running or active web socket connections are at risk of exhausting the 'order' counter, and define a behaviour if that happens.

Otherwise, this is an interesting specification. I'm certainly open to helping push it through the PEP process if you'd like assistance with that.

Cory

From ionel.mc at gmail.com Thu Mar 10 09:46:39 2016
From: ionel.mc at gmail.com (Ionel Maries Cristian)
Date: Thu, 10 Mar 2016 16:46:39 +0200
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

Hey,

On Thu, Mar 10, 2016 at 2:34 AM, Andrew Godwin wrote:

> Helpful quick Q&A: http://channels.readthedocs.org/en/latest/inshort.html

I have looked over that and it's not very clear what goes where. [1]

I'd be inclined to understand that the process type "that handles HTTP and WebSockets" would be some sort of specialized proxy service that does the websocket routing, proxying plain requests to the worker (for the regular views) and specific frontend protocol handling (upgrading to http2.0/websockets or whatever).
It would be clearer if the docs included some diagrams illustrating the data flow and how all the components connect together, and with what protocols.

Shouldn't the process type "that handles HTTP and WebSockets" have a more specific term? It's a bit long to type.

> Spec is up here: http://channels.readthedocs.org/en/latest/asgi.html

Is ASGI a wire protocol? I'd assume it is, if multiple processes communicate to each other with this protocol, but the docs don't have any details about the exact wire format.

Also, some comparison to existing solutions (like Meteor/SockJS/Crossbar/WAMP) would help clear up lots of questions.

[1] Sorry if it sounds harsh, certainly not the intention. I'm just a bit confused/overwhelmed.

Thanks,
-- Ionel Cristian Mărieș, http://blog.ionelmc.ro

From andrew at aeracode.org Thu Mar 10 13:36:54 2016
From: andrew at aeracode.org (Andrew Godwin)
Date: Thu, 10 Mar 2016 10:36:54 -0800
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

On Thu, Mar 10, 2016 at 1:59 AM, Cory Benfield wrote:

> > On 10 Mar 2016, at 00:34, Andrew Godwin wrote:
> >
> > To that end, I did some work to make the underlying mechanism Django Channels uses into more of a standard, which I have codenamed ASGI; while initially I intended for it to be a Django documented API, as I've gone further with the project I've come to believe it could be useful to the Python community at large.
>
> Andrew,
>
> Thanks for this work! I've provided some proposed changes as pull requests against the channels repository. I'll ignore those for the rest of the email: we can discuss them on GitHub.
>
> I also have a few more general notes. I didn't make PRs for these, mostly because they're too 'vague' as feedback goes to be concretely handled by me.
>
> First, your HTTP section has request headers serialized to a dict and response headers serialized to a list of tuples. I'm not sure how I feel about that asymmetry: it might be cleaner just to use lists-of-tuples in both places and allow application frameworks to handle translation to dictionary if they require it.

I think you're right, and I've just been stubbornly trying to use a dict as it's slightly "nicer". I honestly considered making both sides dict and cookies the separate thing as they're the only special case, but I suspect that multiple headers are one of those things that might turn out to be useful for some broken client/new feature someday.

> Second, if it were me I'd remove the `status_text` field on the `Response` object. Custom status text is a terrible misfeature (especially as HTTP/2 doesn't support it), and in 99% of cases you're just wasting data by repeatedly sending the default phrase that the server already knows.

Well, it IS optional; you only need to send it if you're changing it from the default or providing an unusual new value (e.g. 418). We could change the spec to say servers don't have to abide by it, too. I have done a project in the past with custom reason phrases, that's all :)

> Third, you're currently sending header fields with unicode names and byte string values. That's understandable, but I wonder if it's worthwhile trying to limit the behaviour of compliant servers in encoding/decoding those header fields. For example, you could assert that the unicode header names will always use the Latin-1 codec when encoding/decoding. This is mostly me being paranoid about poorly written apps/servers issuing bad bytes onto the network. I should note that RFC 7230 strictly limits header names to US-ASCII, but Latin-1 would be the defensive choice against already-badly-written apps.
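One way a server could pin down the header-name codec discussed here, sketched under our own assumptions (neither function is part of the ASGI spec or Daphne): validate outgoing names against the RFC 7230 `token` grammar, which is pure US-ASCII, and decode incoming names with Latin-1, which maps every byte 0x00-0xFF to exactly one code point and therefore never fails on bytes from badly written clients.

```python
import string

# RFC 7230 "token" characters: the only characters a header field name may contain.
_TCHAR = frozenset("!#$%&'*+-.^_`|~" + string.digits + string.ascii_letters)


def encode_header_name(name: str) -> bytes:
    """Hypothetical helper: strict on output.

    Every accepted character is ASCII, so Latin-1 encoding here yields
    exactly the same bytes as ASCII encoding would.
    """
    if not name or any(ch not in _TCHAR for ch in name):
        raise ValueError(f"invalid header field name: {name!r}")
    return name.encode("latin-1")


def decode_header_name(raw: bytes) -> str:
    """Hypothetical helper: lenient on input.

    Latin-1 decoding is total and reversible, so even malformed bytes
    survive a decode/encode round trip unchanged.
    """
    return raw.decode("latin-1")
```

The asymmetry is deliberate in this sketch: be liberal in what you accept, but never emit a name that RFC 7230 forbids.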
Yes, it's perhaps an unwritten understanding that they're meant to be encoded/decoded only to latin1, and I believe this is what Daphne does; they're unicode mostly as that makes keying into the header dictionary much nicer in py3/unicode_literals land, and because there's a clear encoding way to handle them.

> Your section on server push is great, whoever wrote that is clearly a genius. ;)
>
> You define web socket data frames with an incrementing counter from zero, but also note that the maximum integer size is Python's sys.maxint (you actually aren't that clear about it, which might be a good idea). While this is *probably* not a problem, you may want to note that really long running or active web socket connections are at risk of exhausting the 'order' counter, and define a behaviour if that happens.

Ah, good catch. I'll specify a very high maximum order number for any protocol and say it rolls over to 0 for the next one, and then I can modify channels' global_ordering to expect that - I think that's the most sensible approach here.

> Otherwise, this is an interesting specification. I'm certainly open to helping push it through the PEP process if you'd like assistance with that.

If we see some rough agreement on it, yes, I would love some help with that.

Andrew

From chris.dent at gmail.com Thu Mar 10 13:57:14 2016
From: chris.dent at gmail.com (chris.dent at gmail.com)
Date: Thu, 10 Mar 2016 18:57:14 +0000 (GMT)
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

On Thu, 10 Mar 2016, Andrew Godwin wrote:

> I think you're right, and I've just been stubbornly trying to use a dict as it's slightly "nicer".
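Andrew's reply earlier in this thread proposes a very high maximum `order` number that rolls over to 0. A minimal sketch of what that implies for a consumer such as global_ordering, assuming a hypothetical ceiling of 2**31 (the thread only says "very high"): once numbers wrap, a plain `a < b` comparison breaks, so this uses serial-number arithmetic in the style of RFC 1982.

```python
# Hypothetical ceiling: the thread only says "a very high maximum order number".
MAX_ORDER = 2 ** 31


def next_order(order: int) -> int:
    """Return the order number for the next frame, rolling over to 0."""
    return (order + 1) % MAX_ORDER


def order_precedes(a: int, b: int) -> bool:
    """Return True if frame number `a` comes before frame number `b`.

    `a` precedes `b` when stepping forward from `a` reaches `b` in fewer
    than half of all possible values (RFC 1982 style). This assumes a
    receiver never has MAX_ORDER // 2 or more frames outstanding at once.
    """
    return a != b and (b - a) % MAX_ORDER < MAX_ORDER // 2
```

Without the wraparound-aware comparison, the first frame after the rollover (order 0) would be treated as older than everything still in flight.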
> I honestly considered making both sides dict and cookies the separate thing as they're the only special case, but I suspect that multiple headers are one of those things that might turn out to be useful for some broken client/new feature someday.

It sounds like you consider multiple headers of the same name in request and response as some kind of bug or fault. It's not; it is perfectly legit and something I want to be able to do in my webby frameworks. Vary is the main one.

I know that I can join on ',' in a single header when it is represented in a dict but "meh".

I totally agree that dicts are much nicer to work with, so I'm not sure what the ideal solution is, but I just wanted to raise that point about multiple headers. As you were. Carry on. etc.

--
Chris Dent http://burningchrome.com/
[...]

From andrew at aeracode.org Thu Mar 10 14:32:46 2016
From: andrew at aeracode.org (Andrew Godwin)
Date: Thu, 10 Mar 2016 11:32:46 -0800
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

On Thu, Mar 10, 2016 at 10:57 AM, wrote:

> On Thu, 10 Mar 2016, Andrew Godwin wrote:
>
>> I think you're right, and I've just been stubbornly trying to use a dict as it's slightly "nicer". I honestly considered making both sides dict and cookies the separate thing as they're the only special case, but I suspect that multiple headers are one of those things that might turn out to be useful for some broken client/new feature someday.
>
> It sounds like you consider multiple headers of the same name in request and response as some kind of bug or fault. It's not; it is perfectly legit and something I want to be able to do in my webby frameworks. Vary is the main one.
>
> I know that I can join on ',' in a single header when it is represented in a dict but "meh".
Well, the protocol server would be the thing that's doing the joining if it sees multiple headers - you'd always see comma-joined headers from clients as an ASGI application, which I like as I like consistency.

> I totally agree that dicts are much nicer to work with, so I'm not sure what the ideal solution is, but I just wanted to raise that point about multiple headers. As you were. Carry on. etc.

Yeah, I find the whole comma thing a bit weird, and sort of wonder if it's actually a workable thing for all HTTP clients. I hope it is.

Andrew

From robertc at robertcollins.net Thu Mar 10 16:27:19 2016
From: robertc at robertcollins.net (Robert Collins)
Date: Fri, 11 Mar 2016 10:27:19 +1300
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

On 11 March 2016 at 08:32, Andrew Godwin wrote:

> Well, the protocol server would be the thing that's doing the joining if it sees multiple headers - you'd always see comma-joined headers from clients as an ASGI application, which I like as I like consistency.

For consistency, why not a dict unicode -> List[bytes]?

-Rob

--
Robert Collins
Distinguished Technologist
HP Converged Cloud

From robertc at robertcollins.net Thu Mar 10 16:30:01 2016
From: robertc at robertcollins.net (Robert Collins)
Date: Fri, 11 Mar 2016 10:30:01 +1300
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

On 10 March 2016 at 13:34, Andrew Godwin wrote:

> Hi all,
>
> As some of you may know, I've been working over the past few months to bring native WebSocket support to Django, via a project codenamed "Django Channels" - this is mostly the reason I've been involved in recent WSGI discussions.
>
> I'm personally of the opinion that WSGI works well for HTTP, with a few improvements we can roll into a 1.1, but that we also need something else that can support WebSockets and other future web protocols (e.g. WebRTC components).
>
> To that end, I did some work to make the underlying mechanism Django Channels uses into more of a standard, which I have codenamed ASGI; while initially I intended for it to be a Django documented API, as I've gone further with the project I've come to believe it could be useful to the Python community at large.

I realise this may sound bikesheddy, but it would be really good to not call it ASGI. From your docs: "Despite the name of the proposal, ASGI does not specify or design to any specific in-process async solution, such as asyncio, twisted, or gevent. Instead, the receive_many function can be switched between nonblocking or synchronous. This approach allows applications to choose what's best for their current runtime environment; further improvements may provide extensions where cooperative versions of receive_many are provided."

I'm worried that folk will assume a parallel between ASGI and asyncio, but there appears to be none... which is only a problem due to the room for confusion.

-Rob

--
Robert Collins
Distinguished Technologist
HP Converged Cloud

From andrew at aeracode.org Thu Mar 10 16:34:29 2016
From: andrew at aeracode.org (Andrew Godwin)
Date: Thu, 10 Mar 2016 13:34:29 -0800
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

> I realise this may sound bikesheddy, but it would be really good to not call it ASGI. From your docs: "Despite the name of the proposal, ASGI does not specify or design to any specific in-process async solution, such as asyncio, twisted, or gevent. Instead, the receive_many function can be switched between nonblocking or synchronous.
> This approach allows applications to choose what's best for their current runtime environment; further improvements may provide extensions where cooperative versions of receive_many are provided."
>
> I'm worried that folk will assume a parallel between ASGI and asyncio, but there appears to be none... which is only a problem due to the room for confusion.

Better names are welcome, but I quite like ASGI's similarity to WSGI, and the fact it's pronounceable as a single word. The "Asynchronous" part covers the way the whole system operates; async is already an overloaded term, and while there might be initial confusion, I think "async" also has strong associations with the sort of problems ASGI solves (like websockets), which I think is useful.

> For consistency, why not a dict unicode -> List[bytes]

I personally think this is worse than a list of tuples (which you can at least feed straight into dict()) - the only header that comes through as multiple, ever, is Set-Cookie, after all.

Andrew

From robertc at robertcollins.net Thu Mar 10 17:07:48 2016
From: robertc at robertcollins.net (Robert Collins)
Date: Fri, 11 Mar 2016 11:07:48 +1300
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

On 11 March 2016 at 10:34, Andrew Godwin wrote:

>> I realise this may sound bikesheddy, but it would be really good to not call it ASGI. From your docs: "Despite the name of the proposal, ASGI does not specify or design to any specific in-process async solution, such as asyncio, twisted, or gevent. Instead, the receive_many function can be switched between nonblocking or synchronous. This approach allows applications to choose what's best for their current runtime environment; further improvements may provide extensions where cooperative versions of receive_many are provided."
>>
>> I'm worried that folk will assume a parallel between ASGI and asyncio, but there appears to be none... which is only a problem due to the room for confusion.
>
> Better names are welcome, but I quite like ASGI's similarity to WSGI, and the fact it's pronounceable as a single word. The "Asynchronous" part covers the way the whole system operates; async is already an overloaded term, and while there might be initial confusion, I think "async" also has strong associations with the sort of problems ASGI solves (like websockets), which I think is useful.

Perhaps that's a particularly browser-centric view? There's nothing that strongly associates TCP with Python's slant on 'async' for me - interfaces on top of message passing can be sync or async - as in fact the switch you've got demonstrates :).

Other names? Quick thoughts...

WSGP (web services gateway protocol)
MuPGI (multiple protocol gateway interface)

>> For consistency, why not a dict unicode -> List[bytes]
>
> I personally think this is worse than a list of tuples (which you can at least feed straight into dict()) - the only header that comes through as multiple, ever, is Set-Cookie, after all.

I think you're wrong about that 'only header' statement.

RFC 7230 3.2.2 permits multiple header fields with the same field name for all field values defined as comma separated lists, and for Set-Cookie.

So you can't feed it straight into dict, unless you place a requirement on the server to always fold together multiple header fields with the same field name.... and clients to not use that either. Oh, and special case Set-cookie.
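Robert's point can be made concrete with a small sketch of the two representations under discussion. Both function names are hypothetical. `fold_headers` is the "server folds everything" approach: repeated fields are comma-joined as RFC 7230 section 3.2.2 allows, but Set-Cookie has to be kept apart because its values can themselves contain commas (for example in an `Expires` date). `group_headers` is Robert's dict of name to list of values, with names decoded to unicode as he suggested.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Header = Tuple[bytes, bytes]


def fold_headers(headers: List[Header]) -> Tuple[Dict[bytes, bytes], List[bytes]]:
    """Comma-join repeated fields, except Set-Cookie, which is returned separately.

    Names are lowercased because header field names are case-insensitive.
    """
    folded: Dict[bytes, bytes] = {}
    set_cookies: List[bytes] = []
    for name, value in headers:
        key = name.lower()
        if key == b"set-cookie":
            set_cookies.append(value)  # joining these with commas would be lossy
        elif key in folded:
            folded[key] += b", " + value
        else:
            folded[key] = value
    return folded, set_cookies


def group_headers(headers: List[Header]) -> Dict[str, List[bytes]]:
    """Robert's alternative: map each lowercased (unicode) name to all of its values."""
    grouped: Dict[str, List[bytes]] = defaultdict(list)
    for name, value in headers:
        grouped[name.decode("latin-1").lower()].append(value)
    return dict(grouped)
```

The folded form needs a special case; the grouped form does not, at the cost of every lookup returning a list.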
-Rob

--
Robert Collins
Distinguished Technologist
HP Converged Cloud

From andrew at aeracode.org Thu Mar 10 18:56:00 2016
From: andrew at aeracode.org (Andrew Godwin)
Date: Thu, 10 Mar 2016 15:56:00 -0800
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

On Thu, Mar 10, 2016 at 2:07 PM, Robert Collins wrote:

> On 11 March 2016 at 10:34, Andrew Godwin wrote:
> >> I realise this may sound bikesheddy, but it would be really good to not call it ASGI. From your docs: "Despite the name of the proposal, ASGI does not specify or design to any specific in-process async solution, such as asyncio, twisted, or gevent. Instead, the receive_many function can be switched between nonblocking or synchronous. This approach allows applications to choose what's best for their current runtime environment; further improvements may provide extensions where cooperative versions of receive_many are provided."
> >>
> >> I'm worried that folk will assume a parallel between ASGI and asyncio, but there appears to be none... which is only a problem due to the room for confusion.
> >
> > Better names are welcome, but I quite like ASGI's similarity to WSGI, and the fact it's pronounceable as a single word. The "Asynchronous" part covers the way the whole system operates; async is already an overloaded term, and while there might be initial confusion, I think "async" also has strong associations with the sort of problems ASGI solves (like websockets), which I think is useful.
>
> Perhaps that's a particularly browser-centric view? There's nothing that strongly associates TCP with Python's slant on 'async' for me - interfaces on top of message passing can be sync or async - as in fact the switch you've got demonstrates :).
>
> Other names? Quick thoughts...
>
> WSGP (web services gateway protocol)
> MuPGI (multiple protocol gateway interface)

Maybe, but this is specifically oriented as a web-based protocol - I'm not proposing to replace all network processing here - and in that context, "async" largely means "I can do things outside a normal request-response process". I guess it would take a lot for me to change the name at this point, as it's already so many places, but I do see your point.

> >> For consistency, why not a dict unicode -> List[bytes]
> >
> > I personally think this is worse than a list of tuples (which you can at least feed straight into dict()) - the only header that comes through as multiple, ever, is Set-Cookie, after all.
>
> I think you're wrong about that 'only header' statement.
>
> RFC 7230 3.2.2 permits multiple header fields with the same field name for all field values defined as comma separated lists, and for Set-Cookie.
>
> So you can't feed it straight into dict, unless you place a requirement on the server to always fold together multiple header fields with the same field name.... and clients to not use that either. Oh, and special case Set-cookie.

I would indeed want to require servers to always fold headers together into a comma-separated list, as that's what the RFC says, and it then means applications only have to deal with one kind of multi-header!

Set-cookie is the annoying thing here, though. That's why it's dict inbound and list of tuples outbound right now, and I just don't know if I want to make the inbound one a list of tuples too, given I do definitely want to force servers to concat headers together (unless I find any examples of that screwing things up)

Andrew
From cory at lukasa.co.uk Fri Mar 11 05:28:35 2016
From: cory at lukasa.co.uk (Cory Benfield)
Date: Fri, 11 Mar 2016 10:28:35 +0000
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

> On 10 Mar 2016, at 23:56, Andrew Godwin wrote:
>
> I would indeed want to require servers to always fold headers together into a comma-separated list, as that's what the RFC says, and it then means applications only have to deal with one kind of multi-header!

Wellllll... kinda...

The RFC says that multiple headers are *semantically equivalent* to the joined form, but does not in any sense require that it be done. (The normative language in RFC 7230 is MAY.)

I had this discussion recently with Brian Smith: while there is only one correct way to fold/unfold headers, anywhere on the spectrum between completely folded and completely unfolded is a perfectly valid representation of the HTTP header block. This means that there's no *rules* about how a server is supposed to do it, at least from the IETF. ASGI is of course totally allowed to add its own rules, and requiring that they be folded is not terrible.

FWIW, in my experience, I've found that 'list of tuples' is really the most likely to be correct way to represent a header block, because it provides some assurances to the user that the header block has not been aggressively transformed from how it was sent on the wire. While the *rules* are that the folded representation is supposed to be semantically equivalent to the unfolded representation, there is nonetheless some information implicit in those headers being separate.

My intuition when writing this kind of thing is to pass applications (like Django) the most meaningful representation I can, and then allow the application to make its own decisions about what meaning they're willing to lose. That's why I'd advocate for 'list of two-tuples of bytestrings' as the representation.
However, I don't think there's anything *wrong* with forcing the headers to be joined by the server where possible: it's just not how I'd do it. ;)

> Set-cookie is the annoying thing here, though. That's why it's dict inbound and list of tuples outbound right now, and I just don't know if I want to make the inbound one a list of tuples too, given I do definitely want to force servers to concat headers together (unless I find any examples of that screwing things up)

You could make the inbound one a list of tuples but still require that the servers concat headers. The rule then would be that it needs to be possible for an application to say `dict(headers)` without any loss of meaning.

Cory

From cory at lukasa.co.uk Fri Mar 11 05:28:36 2016
From: cory at lukasa.co.uk (Cory Benfield)
Date: Fri, 11 Mar 2016 10:28:36 +0000
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

> On 10 Mar 2016, at 18:36, Andrew Godwin wrote:
>
> > Second, if it were me I'd remove the `status_text` field on the `Response` object. Custom status text is a terrible misfeature (especially as HTTP/2 doesn't support it), and in 99% of cases you're just wasting data by repeatedly sending the default phrase that the server already knows.
>
> Well, it IS optional; you only need to send it if you're changing it from the default or providing an unusual new value (e.g. 418). We could change the spec to say servers don't have to abide by it, too. I have done a project in the past with custom reason phrases, that's all :)

You monster! ;)

For what it's worth, I object to the use of reason phrases because, as with all things in HTTP, they were far-too-broadly specified.
The rules for parsing the reason phrase are super broad (the reason phrase allows \t, space, and then all bytes from 0x21 to 0xFF *excluding* 0x7F (ASCII DEL)). This means that it's sometimes possible to encode a reason phrase containing non-ASCII/non-Latin-1 codepoints in UTF-8 (I've seen this happen), and then everything gets really terrible really fast.

IMO, I think almost nothing would be lost by just quietly removing it from the specification. The only loss is in setting 'unusual' values, and FWIW I think that's *also* unwise: if it can't be found here[0] then the unusual status code is nothing but vanity, because it's no more precise than the X00 version that already exists (no user agent can take action on it).

Again, just my 2¢.

Cory

[0]: https://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml

From cmawebsite at gmail.com Fri Mar 11 10:45:05 2016
From: cmawebsite at gmail.com (Collin Anderson)
Date: Fri, 11 Mar 2016 10:45:05 -0500
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

Just a thought from a non-wsgi developer: I think it might be smart to follow http2 when in doubt on a question:

- http2 preserves header order and allows duplicates in both directions. A list of tuples seems to be the best data structure IMHO.
- http2 ignores reason phrases, which makes me think discarding it wouldn't be a problem for the new standard.

On Fri, Mar 11, 2016 at 5:28 AM, Cory Benfield wrote:

> > On 10 Mar 2016, at 18:36, Andrew Godwin wrote:
> >
> > > Second, if it were me I'd remove the `status_text` field on the `Response` object.
> > > Custom status text is a terrible misfeature (especially as HTTP/2 doesn't support it), and in 99% of cases you're just wasting data by repeatedly sending the default phrase that the server already knows.
> >
> > Well, it IS optional; you only need to send it if you're changing it from the default or providing an unusual new value (e.g. 418). We could change the spec to say servers don't have to abide by it, too. I have done a project in the past with custom reason phrases, that's all :)
>
> You monster! ;)
>
> For what it's worth, I object to the use of reason phrases because, as with all things in HTTP, they were far-too-broadly specified. The rules for parsing the reason phrase are super broad (the reason phrase allows \t, space, and then all bytes from 0x21 to 0xFF *excluding* 0x7F (ASCII DEL)). This means that it's sometimes possible to encode a reason phrase containing non-ASCII/non-Latin-1 codepoints in UTF-8 (I've seen this happen), and then everything gets really terrible really fast.
>
> IMO, I think almost nothing would be lost by just quietly removing it from the specification. The only loss is in setting 'unusual' values, and FWIW I think that's *also* unwise: if it can't be found here[0] then the unusual status code is nothing but vanity, because it's no more precise than the X00 version that already exists (no user agent can take action on it).
>
> Again, just my 2¢.
>
> Cory
>
> [0]: https://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml
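To illustrate the reason-phrase discussion, here is a minimal sketch of how a server might treat an optional `status_text`, assuming (our choice, not the spec's) that it falls back to the standard phrase from Python's `http.HTTPStatus` and rejects bytes outside the RFC 7230 `reason-phrase` grammar. The function names are hypothetical. Note that UTF-8 text still passes the grammar check, which is exactly the ambiguity Cory describes.

```python
from http import HTTPStatus
from typing import Optional


def _is_reason_byte(b: int) -> bool:
    # RFC 7230 section 3.1.2: reason-phrase = *( HTAB / SP / VCHAR / obs-text ),
    # i.e. 0x09, 0x20, 0x21-0x7E and 0x80-0xFF. DEL (0x7F), CR, LF and the
    # other control bytes are rejected.
    return b in (0x09, 0x20) or 0x21 <= b <= 0x7E or b >= 0x80


def status_line(status: int, status_text: Optional[bytes] = None) -> bytes:
    """Build an HTTP/1.1 status line, using the standard phrase by default."""
    if status_text is None:
        try:
            status_text = HTTPStatus(status).phrase.encode("ascii")
        except ValueError:  # code not known to http.HTTPStatus, e.g. 599
            status_text = b""  # an empty reason phrase is allowed by the grammar
    if not all(_is_reason_byte(b) for b in status_text):
        raise ValueError(f"invalid reason phrase: {status_text!r}")
    # UTF-8 bytes are all >= 0x80, so they pass the check above.
    return b"HTTP/1.1 %d %s\r\n" % (status, status_text)
```

Rejecting CR and LF matters most here: without that check an application-supplied phrase could inject extra header lines.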
From andrew at aeracode.org Fri Mar 11 12:56:22 2016
From: andrew at aeracode.org (Andrew Godwin)
Date: Fri, 11 Mar 2016 09:56:22 -0800
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

On Fri, Mar 11, 2016 at 2:28 AM, Cory Benfield wrote:

> On 10 Mar 2016, at 23:56, Andrew Godwin wrote:
>
> > I would indeed want to require servers to always fold headers together into a comma-separated list, as that's what the RFC says, and it then means applications only have to deal with one kind of multi-header!
>
> Wellllll….kinda?
>
> The RFC says that multiple headers are *semantically equivalent* to the joined form, but does not in any sense require that it be done. (The normative language in RFC 7230 is MAY.)
>
> I had this discussion recently with Brian Smith: while there is only one correct way to fold/unfold headers, anywhere on the spectrum between completely folded and completely unfolded is a perfectly valid representation of the HTTP header block. This means that there's no *rules* about how a server is supposed to do it, at least from the IETF. ASGI is of course totally allowed to add its own rules, and requiring that they be folded is not terrible.
>
> FWIW, in my experience, I've found that "list of tuples" is really the most likely to be correct way to represent a header block, because it provides some assurances to the user that the header block has not been aggressively transformed from how it was sent on the wire. While the *rules* are that the folded representation is supposed to be semantically equivalent to the unfolded representation, there is nonetheless some information implicit in those headers being separate.
>
> My intuition when writing this kind of thing is to pass applications (like Django) the most meaningful representation I can, and then allow the application to make its own decisions about what meaning they're willing to lose.
> That's why I'd advocate for "list of two-tuples of bytestrings" as the representation. However, I don't think there's anything *wrong* with forcing the headers to be joined by the server where possible: it's just not how I'd do it. ;)
>
> > Set-cookie is the annoying thing here, though. That's why it's dict inbound and list of tuples outbound right now, and I just don't know if I want to make the inbound one a list of tuples too, given I do definitely want to force servers to concat headers together (unless I find any examples of that screwing things up)
>
> You could make the inbound one a list of tuples but still require that the servers concat headers. The rule then would be that it needs to be possible for an application to say `dict(headers)` without any loss of meaning.

Yes, I think this is a good argument - my worry has always been that the "no multiples" is more of a soft rule that some clients might break or some apps might rely on the ordering/multiplicity of things, so preserving it is _probably_ helpful (and as you say, it lets the header names go back to bytestrings).

I'll modify the spec and then update Daphne and Channels to match; I can leave Channels parsing both types for a bit, at least.

Collin's point about http2's handling of headers is on point, too - if the new spec is deliberately thinned down to that point but no further, it's probably wise to follow them since they know much more about it than I do.

Andrew

From andrew at aeracode.org Fri Mar 11 12:59:59 2016
From: andrew at aeracode.org (Andrew Godwin)
Date: Fri, 11 Mar 2016 09:59:59 -0800
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

One thing I did want to ask - is it worth still squashing everything down to the same case?
Daphne already clears out headers with _ in them to avoid that CVE about it, and header case is never semantic, or so I thought?

Andrew
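The thread converges on lowercased header names, server-side folding of repeated headers, and a list of two-tuples of bytestrings that an application can pass to `dict()` without losing meaning. The following is a rough sketch of that normalization for request headers under those assumptions; `normalize_request_headers` is a hypothetical helper, not Daphne's code or the ASGI spec's wording. Repeated values are joined with `, ` as RFC 7230 section 3.2.2 permits, except `cookie`, which RFC 7540 section 8.1.2.5 says to join with `; `.

```python
from typing import Dict, Iterable, List, Tuple

Header = Tuple[bytes, bytes]


def normalize_request_headers(raw: Iterable[Header]) -> List[Header]:
    """Lowercase names and fold repeated request headers into one entry each.

    Output order follows the first occurrence of each name (dicts keep
    insertion order on Python 3.7+). Values of repeated names are joined
    in arrival order: b"; " for cookie, b", " for everything else.
    """
    folded: Dict[bytes, bytes] = {}
    for name, value in raw:
        key = name.lower()  # header names are case-insensitive
        if key in folded:
            separator = b"; " if key == b"cookie" else b", "
            folded[key] += separator + value
        else:
            folded[key] = value
    return list(folded.items())


wire = [
    (b"Host", b"example.com"),
    (b"Accept", b"text/html"),
    (b"Cookie", b"a=1"),
    (b"accept", b"application/json"),
    (b"cookie", b"b=2"),
]
headers = normalize_request_headers(wire)
print(headers)
# [(b'host', b'example.com'), (b'accept', b'text/html, application/json'), (b'cookie', b'a=1; b=2')]

# Every name now appears once, so dict(headers) loses nothing.
assert len(dict(headers)) == len(headers)
```

This deliberately covers only inbound headers. On responses, `Set-Cookie` cannot be folded into one field, which is why outbound headers stay an unfolded list of tuples in the discussion above.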
From cmawebsite at gmail.com Fri Mar 11 13:03:35 2016
From: cmawebsite at gmail.com (Collin Anderson)
Date: Fri, 11 Mar 2016 13:03:35 -0500
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

http2 makes all header names lowercase

On Fri, Mar 11, 2016 at 12:59 PM, Andrew Godwin wrote:

> One thing I did want to ask - is it worth still squashing everything down to the same case? Daphne already clears out headers with _ in them to avoid that CVE about it, and header case is never semantic, or so I thought?

From andrew at aeracode.org Fri Mar 11 13:05:07 2016
From: andrew at aeracode.org (Andrew Godwin)
Date: Fri, 11 Mar 2016 10:05:07 -0800
Subject: [Web-SIG] Inviting feedback on my proposed "ASGI" spec

Yes, I thought that was the case. I think adding lowercase normalisation of header names to the spec would be sensible (daphne already does this, but I'd like applications to be able to rely on it)

Andrew

On Fri, Mar 11, 2016 at 10:03 AM, Collin Anderson wrote:

> http2 makes all header names lowercase

From jason.madden at nextthought.com Thu Mar 24 11:18:04 2016
From: jason.madden at nextthought.com (Jason Madden)
Date: Thu, 24 Mar 2016 10:18:04 -0500
Subject: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?

Hi all,

Is there any practical reason that the type of the `environ` object must be exactly `dict`, as specified in PEP 3333?
I'm asking because it was recently pointed out that gevent's WSGI server can sometimes print `environ` (on certain error cases), but that can lead to sensitive information being kept in the server's logs (e.g., HTTP_AUTHORIZATION, HTTP_COOKIE, maybe other things). The simplest and most flexible way to prevent this from happening, not just inadvertently within gevent itself but also for client applications, I thought, was to have `environ` be a subclass of `dict` with a customized `__repr__` (much like WebOb does for MultiDict, and repoze.who does for Identity, both for similar reasons).

Unfortunately, when I implemented that in [0], I discovered that `wsgiref.validate` asserts that type(environ) is dict. I looked up the PEP, and sure enough, PEP 3333 states that environ "must be a builtin Python dictionary (not a subclass, UserDict or other dictionary emulation)." [1]

Background/History
==================

That seemed overly restrictive to me, so I tried to trace the history of that language in hopes of discovering the rationale.

- It was present in the predecessor of PEP 3333, PEP 0333, in the first version committed to the repository in August 2004. [2]
- Prior to that, it was in both drafts of what would become PEP 0333 posted to this mailing list, again from August 2004: [3], [4].
- The ancestor of those drafts, the "Python Web Container Interface v1.0", was posted in December of 2003 with somewhat less restrictive language: "the environ object *must* be a Python dictionary....The rationale for requiring a dictionary is to maximize portability between containers" [5].

Now, the discussion on that earliest draft in [5] specifically brought up using other types that implement all the methods of a dictionary, like UserDict.DictMixin [6]. The last post on the subject in that thread seemed to be leaning towards accepting non-dict objects, at least if they were good enough [7].
By the time the draft became recognizable as the precursor to PEP 0333 in [3], the very strict language we have now was in place. That draft, however, specifically stated that it was intended to be compatible with Python 1.5.2. In Python 1.5.2, it wasn't possible to subclass the builtin dict, so imitations, like UserDict.DictMixin, were necessarily imprecise. The compatibility target was later changed to the much-maligned Python 2.2.2 release [8]; Python 2.2 added the ability to subclass dict, but the language wasn't changed.

Today
=====

Given that today we can subclass dict with full fidelity, is there still any practical reason not to be able to do so? I'm probably OK with gevent violating the letter of the spec in this regard, so long as there are no practical consequences. I was able to think of two possible objections, but both can be solved:

- Pickling the custom `environ` type and then loading it in another process might not work if the class is not available. I can imagine this coming up with Celery, for example. This is easily fixed by adding an appropriate `__reduce_ex__` implementation.

- Code somewhere relies on `if type(some_object) is dict:` (where `environ` became `some_object`, presumably through several levels of calls), instead of `isinstance(some_object, dict)` or `isinstance(some_object, collections.MutableMapping)`. The solution here is simply to not do that :) Pylint, among other linters, produces warnings if you do.

Can anyone think of any other practical reasons I've overlooked? Is this just a horrible idea for other reasons?

I appreciate any discussion!
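A minimal sketch of the subclass Jason proposes, assuming Python 3.7+ (for ordered dicts). The class name `SecureEnviron` appears later in the thread, but the masked keys, the `<masked>` placeholder, and the method bodies are illustrative assumptions rather than gevent's code. `__reduce_ex__` addresses his pickling objection by serializing the object as a plain `dict`; the final lines show the type-check and `dict(environ)` objections raised in the replies.

```python
import pickle

# Illustrative choice of keys to hide; a real server might mask more.
SENSITIVE_KEYS = frozenset({"HTTP_AUTHORIZATION", "HTTP_COOKIE"})


class SecureEnviron(dict):
    """A dict whose repr() (and therefore print() and %r/%s) hides secrets."""

    def __repr__(self):
        masked = {
            key: ("<masked>" if key in SENSITIVE_KEYS else value)
            for key, value in self.items()
        }
        return "%s(%r)" % (type(self).__name__, masked)

    def __reduce_ex__(self, protocol):
        # Pickle (and copy.copy) as a plain dict, so unpickling elsewhere
        # does not require this class to be importable.
        return (dict, (dict(self),))


environ = SecureEnviron(
    REQUEST_METHOD="GET",
    PATH_INFO="/",
    HTTP_COOKIE="session=secret",
)
print(environ)
# SecureEnviron({'REQUEST_METHOD': 'GET', 'PATH_INFO': '/', 'HTTP_COOKIE': '<masked>'})

restored = pickle.loads(pickle.dumps(environ))
print(type(restored) is dict, restored == environ)  # True True

# The objections raised later in the thread:
print(isinstance(environ, dict))  # True
print(type(environ) is dict)      # False: exact-type checks reject it
print(dict(environ))              # a plain copy; the masking is gone
```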
Thanks,
Jason

[0] https://github.com/gevent/gevent/compare/secure-environ
[1] https://www.python.org/dev/peps/pep-3333/#specification-details
[2] https://github.com/python/peps/commit/d5864f018f58a35fa787492e6763e382f98b923c#diff-ff370d50af3db062b015d1ef85935779
[3] https://mail.python.org/pipermail/web-sig/2004-August/000518.html
[4] https://mail.python.org/pipermail/web-sig/2004-August/000562.html
[5] https://mail.python.org/pipermail/web-sig/2003-December/000394.html
[7] https://mail.python.org/pipermail/web-sig/2003-December/000401.html
[8] https://mail.python.org/pipermail/web-sig/2004-August/000565.html

From alan at xhaus.com Thu Mar 24 12:09:20 2016
From: alan at xhaus.com (Alan Kennedy)
Date: Thu, 24 Mar 2016 16:09:20 +0000
Subject: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?

I don't see this relevant message in your references.

https://mail.python.org/pipermail/web-sig/2004-September/000749.html

Perhaps that, and following messages, might shed more light?
From jason.madden at nextthought.com Thu Mar 24 12:29:05 2016
From: jason.madden at nextthought.com (Jason Madden)
Date: Thu, 24 Mar 2016 11:29:05 -0500
Subject: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?

> On Mar 24, 2016, at 11:09, Alan Kennedy wrote:
>
> I don't see this relevant message in your references.
>
> https://mail.python.org/pipermail/web-sig/2004-September/000749.html
>
> Perhaps that, and following messages, might shed more light?

Yes, thank you, I did miss that thread. It does help shed some light on the issue. The two main arguments made seem to be that:

1) Creating subclasses of builtin objects is difficult and subject to breakage if you try to get too fancy. That's a fair point, and in the context of when it was written (Python 3.0 was still under discussion) it makes a lot of sense.

2) Middleware or the app can do dict(environ) and lose your subclass. Also true. But I think it's only particularly relevant if the WSGI implementation itself relies on the subclass to provide essential functionality that the PEP specifies (e.g., decoding bytes-to-str on key access).

It was also mentioned that practicality beats purity and no practical use for a subclass was known.

Well, here's a practical use :) And the two points above do not apply to this practical use, I think. (1) doesn't apply because `__repr__` is not going to change and isn't fancy. (2) doesn't apply because gevent keeps a reference to the environ it creates and passes to the app, so if middleware passes a new dict(environ) on to the app, gevent's own error handling is still secure; consider passing a SecureEnviron to the app a best-effort at secure-by-default---if the user configures their application such that this feature is disabled for part of the stack, that's on the application.
No feature of gevent will break, and it's better than not having the option at all IMHO.

Jason

From cory at lukasa.co.uk Fri Mar 25 06:01:51 2016
From: cory at lukasa.co.uk (Cory Benfield)
Date: Fri, 25 Mar 2016 10:01:51 +0000
Subject: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?

> On 24 Mar 2016, at 16:29, Jason Madden wrote:
> Well, here's a practical use :) And the two points above do not apply to this practical use, I think. (1) doesn't apply because `__repr__` is not going to change and isn't fancy. (2) doesn't apply because gevent keeps a reference to the environ it creates and passes to the app, so if middleware passes a new dict(environ) on to the app, gevent's own error handling is still secure; consider passing a SecureEnviron to the app a best-effort at secure-by-default---if the user configures their application such that this feature is disabled for part of the stack, that's on the application. No feature of gevent will break, and it's better than not having the option at all IMHO.

Given that gevent is keeping hold of its own reference to the environ, why does gevent not simply wrap the environ dict in a class that implements this functionality directly? In that manner, gevent can expose its own error handling behaviour as desired, and continue to follow PEP-3333.

In fact, I believe this is exactly what PJ was getting at. The ability to subclass the dictionary (in this case, to subclass it with one that hides some keys on printing) is only useful to the entity that does the subclassing, because there is no guarantee that the subclass will not be lost somewhere else in the WSGI stack.
However, if subclassing is only useful to you there is another alternative to the problem, which is to compose the environ dict into an object that applies the custom behaviour.

Because of that, I'm disinclined to want to widen the spec here. PJ's original analysis is right: allowing subclasses does not provide more utility than disallowing them, but it does allow more bugs to creep in due to inconsistent expectations. Better to have an object with a known set of behaviours and have applications/servers wrap it in custom functionality.

Cory

From jason.madden at nextthought.com Fri Mar 25 11:04:24 2016
From: jason.madden at nextthought.com (Jason Madden)
Date: Fri, 25 Mar 2016 10:04:24 -0500
Subject: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?

> On Mar 25, 2016, at 05:01, Cory Benfield wrote:
>
> Given that gevent is keeping hold of its own reference to the environ, why does gevent not simply wrap the environ dict in a class that implements this functionality directly? In that manner, gevent can expose its own error handling behaviour as desired, and continue to follow PEP-3333.

I did consider that, but didn't want to do that unless there were actual practical problems passing the same object that gevent references. Making a copy just to pass to the application adds additional time and memory requirements that are always nice to avoid in a server.

> In fact, I believe this is exactly what PJ was getting at.
> The ability to subclass the dictionary (in this case, to subclass it with one that hides some keys on printing) is only useful to the entity that does the subclassing, because there is no guarantee that the subclass will not be lost somewhere else in the WSGI stack.

I looked at most of the middleware listed on the WSGI homepage [1], as well as a decent sampling of the packages identified as middleware on PyPI [2]. I didn't find any that passed a new environ on to the next application; they all seem to simply pass on the environ object as given to them. Now that's just a sampling so obviously it doesn't mean that such copying doesn't happen. But doing so eliminates the ability for lower middlewares to communicate with upper middlewares through the environ if they are more than one layer separated---a real-world example is setting `paste.expected_exceptions`---so practically speaking, I imagine it's quite rare.

> Because of that, I'm disinclined to want to widen the spec here. PJ's original analysis is right: allowing subclasses does not provide more utility than disallowing them, but it does allow more bugs to creep in due to inconsistent expectations. Better to have an object with a known set of behaviours and have applications/servers wrap it in custom functionality.

I'm not sure I agree with that, but I can see the argument.

I started out by asking if there were any *practical* reasons not to pass a tiny dict subclass as environ, and when I was surveying existing middleware for this thread, I found a big reason: it turns out that WebOb's Request object *also* verifies that type(environ) is dict [3]. Given the popularity of WebOb and its derivatives like Pyramid, this is not a change gevent can make. We'll take a different approach.

Thanks again for all the great insights and discussion!
Jason

[1] http://wsgi.readthedocs.org/en/latest/libraries.html
[2] https://pypi.python.org/pypi?:action=browse&show=all&c=319&c=326&c=506&c=509
[3] https://github.com/Pylons/webob/blob/master/webob/request.py#L112

From cory at lukasa.co.uk Fri Mar 25 13:23:35 2016
From: cory at lukasa.co.uk (Cory Benfield)
Date: Fri, 25 Mar 2016 17:23:35 +0000
Subject: [Web-SIG] Any practical reason type(environ) must be dict (not subclass)?

> On 25 Mar 2016, at 15:04, Jason Madden wrote:
>
>> On Mar 25, 2016, at 05:01, Cory Benfield wrote:
>>
>> Given that gevent is keeping hold of its own reference to the environ, why does gevent not simply wrap the environ dict in a class that implements this functionality directly? In that manner, gevent can expose its own error handling behaviour as desired, and continue to follow PEP-3333.
>
> I did consider that, but didn't want to do that unless there were actual practical problems passing the same object that gevent references. Making a copy just to pass to the application adds additional time and memory requirements that are always nice to avoid in a server.

For what it's worth, I'm not advocating a copy. I'm advocating a class like this:

    import collections.abc

    class SecureDictWrapper(collections.abc.MutableMapping):
        def __init__(self, environ):
            self._environ = environ

That class would then implement the MutableMapping API and delegate its calls through to the dictionary itself. There would still only be one dictionary: the only new allocation is for the wrapper class. The overhead is small. =)

Cory
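A fuller version of the wrapper Cory outlines, as a sketch assuming Python 3: `collections.abc.MutableMapping` supplies `get`, `keys`, `items`, `update` and the rest once five methods delegate to the wrapped dict. The masked keys and the repr format are illustrative assumptions, not what gevent ended up shipping. The application still receives the original `environ` dict, so PEP 3333's exact-type requirement holds, and only the server's own logging formats the wrapper.

```python
import collections.abc

# Illustrative choice of keys to hide in log output.
SENSITIVE_KEYS = frozenset({"HTTP_AUTHORIZATION", "HTTP_COOKIE"})


class SecureDictWrapper(collections.abc.MutableMapping):
    """Delegates everything to the wrapped environ; only repr() differs."""

    def __init__(self, environ):
        self._environ = environ  # the very dict the application gets; no copy

    def __getitem__(self, key):
        return self._environ[key]

    def __setitem__(self, key, value):
        self._environ[key] = value

    def __delitem__(self, key):
        del self._environ[key]

    def __iter__(self):
        return iter(self._environ)

    def __len__(self):
        return len(self._environ)

    def __repr__(self):
        masked = {
            key: ("<masked>" if key in SENSITIVE_KEYS else value)
            for key, value in self._environ.items()
        }
        return "SecureDictWrapper(%r)" % (masked,)


environ = {"REQUEST_METHOD": "GET", "HTTP_AUTHORIZATION": "Basic c2VjcmV0"}
wrapper = SecureDictWrapper(environ)

print(wrapper)  # SecureDictWrapper({'REQUEST_METHOD': 'GET', 'HTTP_AUTHORIZATION': '<masked>'})
wrapper["wsgi.note"] = "seen"  # writes go straight through to the shared dict
print(environ["wsgi.note"])    # seen
print(type(environ) is dict)   # True: the app still receives a real dict
```

Because every operation delegates to the same dictionary, the only extra cost is one small wrapper object, which is the point Cory makes about avoiding a copy.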