[Web-SIG] REMOTE_ADDR and proxys

Tue Oct 14 05:21:32 CEST 2014

On 13/10/2014, at 11:26 PM, Benoit Chesneau <bchesneau at gmail.com> wrote:

> 
> 
> On Sun, Oct 12, 2014 at 11:38 PM, Robert Collins <robertc at robertcollins.net> wrote:
> On 30 September 2014 11:47, Alan Kennedy <alan at xhaus.com> wrote:
> 
> > [Robert]
> >> So it sounds like it should be the responsibility of a middleware to
> >> renormalize the environment?
> >
> > In order for that to be the case, you have to strictly define what
> > "normalization" means.
> 
> For a given deployment it's well defined. I agree that in general it's not.
> 
> > I believe that it is not possible to fully specify "normalization", and that
> > any attempt to do so is futile.
> >
> > If you want to attempt it for the specific scenarios that your particular
> > application has to deal with, then by all means code your version of
> > "normalization" into your application. Or write some middleware to do it.
> >
> > But trying to make "normalization" a part of a WSGI-style specification is
> > impossible.
> 
> I don't recall proposing that it should be in a WSGI-style spec.
> 
> -Rob
> 
> --
> Robert Collins <rbtcollins at hp.com>
> Distinguished Technologist
> HP Converged Cloud
> 
> 
> This whole issue looks like the problem raised (and not yet solved) recently in Gunicorn, when REMOTE_ADDR started being handled more strictly and we removed all the X-Forwarded-* header handling:
> 
> https://github.com/benoitc/gunicorn/issues/797
> 
> There is another case to take into consideration: when your server is answering on Unix sockets, you don't have any TCP address to present. For now we answer with an empty field.
> 
> Also, some application frameworks recently removed the middleware handling X-Forwarded-* headers. I wonder why.
> 
> 
> There is an RFC for forwarding headers: http://tools.ietf.org/html/rfc7239 . Rather than trying to change the strict behaviour of REMOTE_ADDR, I wonder if we shouldn't add a new field to the environ. Thoughts?

My prior thinking on this was that REMOTE_ADDR should be left alone.

If front end proxies support RFC 7239 and pass the Forwarded header through, you are all good.

If you are in a situation where a front end proxy doesn't support RFC 7239 but uses the prior convention of X-Forwarded-* headers, then one could take the older headers, construct the new RFC 7239 Forwarded header from them, and drop the old X-Forwarded-* headers.

In other words, converge on the new convention set by RFC-7239 by translating the old way of doing things to the new. This way a WSGI application can be coded up just to check for the new header and not have to deal with both.

The actual translation from old headers to new could be done by a WSGI middleware or an optionally enabled WSGI server feature. Either way it doesn't need to be part of the WSGI specification.
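
As a rough sketch of what such a translation middleware might look like (the names xff_to_forwarded and ForwardedTranslator are made up for illustration, not taken from any existing package): it converts only X-Forwarded-For, assumes the entries are bare IP addresses without ports, and leaves an existing Forwarded header alone. That last choice is a policy assumption, not a recommendation.

```python
def xff_to_forwarded(value):
    """Convert an X-Forwarded-For value into an RFC 7239 Forwarded value."""
    elements = []
    for addr in value.split(","):
        addr = addr.strip()
        if not addr:
            continue
        if ":" in addr:
            # Assumed to be an IPv6 literal; RFC 7239 wants it bracketed and quoted.
            addr = '"[%s]"' % addr.strip("[]")
        elements.append("for=" + addr)
    return ", ".join(elements)


class ForwardedTranslator:
    """Hypothetical WSGI middleware: replace X-Forwarded-For with Forwarded."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # Always drop the old header so the application only sees the new one.
        old = environ.pop("HTTP_X_FORWARDED_FOR", None)
        # Assumption: an existing Forwarded header wins over the old convention.
        if old and "HTTP_FORWARDED" not in environ:
            new = xff_to_forwarded(old)
            if new:
                environ["HTTP_FORWARDED"] = new
        return self.app(environ, start_response)
```

Wrapping an application is then just application = ForwardedTranslator(application). Note this does nothing about trust; it only converges the two conventions.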

As noted by others, though, the issue is how much you trust the information passed in by the headers, and whether it entirely captures the existence of multiple hops.

In the case of REMOTE_ADDR, it is set by the web server based on actual socket information, so there is no way a client can override it.

The X-Forwarded-* and Forwarded headers have the problem that a client can set them itself.

Now that there are multiple ways of denoting it, which takes precedence, and which do you trust? If your proxies use X-Forwarded-* but an HTTP client sets Forwarded, what do you do?

Ultimately, whether you use a WSGI middleware or a WSGI server which provides a built-in (optionally enabled) function for the typical case, it has to be configurable to the point that an administrator can say which headers are trusted. You may also want to be able to say which proxy IPs you trust, if practical. This must be something an administrator can do, and must not depend on developers embedding it within an application, which is why a built-in mechanism in a WSGI server may be preferred.

Anyway, this way a system administrator can say whether it is expected that a proxy only sets X-Forwarded-* and not Forwarded or vice versa and who to trust. You likely can't just have a default strategy if you want to be safe.
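
To make the "who to trust" part concrete, here is a minimal sketch of one possible policy, assuming the administrator supplies a list of trusted proxy networks. The function client_address and its parameters are hypothetical, not an API of any WSGI server. It only believes a forwarded chain when the socket peer itself is a trusted proxy, and then walks the chain from the nearest hop outwards until it reaches an address it does not trust.

```python
import ipaddress


def client_address(remote_addr, forwarded_for, trusted_proxies):
    """Pick the client address given REMOTE_ADDR and an X-Forwarded-For value.

    trusted_proxies is an administrator-supplied list of networks,
    e.g. ["10.0.0.0/8"].
    """
    networks = [ipaddress.ip_network(n) for n in trusted_proxies]

    def trusted(addr):
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            return False  # empty (e.g. Unix socket) or malformed address
        return any(ip in net for net in networks)

    # If the peer is not one of our proxies, the header is client-supplied.
    if not trusted(remote_addr):
        return remote_addr

    hops = [h.strip() for h in (forwarded_for or "").split(",") if h.strip()]
    client = remote_addr
    # The right-most entry was added by the proxy closest to us.
    for hop in reversed(hops):
        try:
            ipaddress.ip_address(hop)
        except ValueError:
            break  # malformed entry: stop, keep the last address we accepted
        client = hop
        if not trusted(hop):
            break  # first untrusted hop is the best answer we can vouch for
    return client
```

With trusted_proxies set to ["10.0.0.0/8"], a request arriving from 10.0.0.5 with "198.51.100.9, 203.0.113.7, 10.0.0.4" in the header resolves to 203.0.113.7; the left-most entry is ignored because it could have been supplied by the client.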

Another issue to consider is header spoofing, which not all WSGI servers protect against at the moment.

The spoofing problem arises from the CGI rule for how header names are converted (RFC 3875, section 4.1.18). That is:

   Meta-variables with names beginning with "HTTP_" contain values read
   from the client request header fields, if the protocol used is HTTP.
   The HTTP header field name is converted to upper case, has all
   occurrences of "-" replaced with "_" and has "HTTP_" prepended to
   give the meta-variable name.  The header data can be presented as
   sent by the client, or can be rewritten in ways which do not change
   its semantics.  If multiple header fields with the same field-name
   are received then the server MUST rewrite them as a single value
   having the same semantics.  Similarly, a header field that spans
   multiple lines MUST be merged onto a single line.  The server MUST,
   if necessary, change the representation of the data (for example, the
   character set) to be appropriate for a CGI meta-variable.

So this means that X-Forwarded-For is translated to HTTP_X_FORWARDED_FOR. The problem is that if a client itself sends X_Forwarded_For, it also maps to the same name.

By the rules above the two values would be concatenated if a proxy set one and the client sent the other, usually separating the values with a comma. If you are attempting to block certain clients based on this, then the header value could be poisoned and cause problems for such a scheme.
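
The collision is easy to reproduce with a few lines that mimic the CGI conversion. This is an illustrative sketch only, with made-up helper names (cgi_name, build_environ), and not how any particular server builds its environ:

```python
def cgi_name(header_name):
    """Apply the CGI rule: upper-case, '-' becomes '_', prefix with HTTP_."""
    return "HTTP_" + header_name.upper().replace("-", "_")


def build_environ(headers):
    """Fold (name, value) pairs into CGI-style variables, joining repeats."""
    environ = {}
    for name, value in headers:
        key = cgi_name(name)
        if key in environ:
            # Repeated field names are merged into one comma-separated value.
            environ[key] += ", " + value
        else:
            environ[key] = value
    return environ


request_headers = [
    ("X_Forwarded_For", "6.6.6.6"),      # sent by the client
    ("X-Forwarded-For", "203.0.113.7"),  # added by the proxy
]
environ = build_environ(request_headers)
```

Both headers end up under the single key HTTP_X_FORWARDED_FOR, and the application has no way of telling which part came from the proxy.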

Therefore, if you use a WSGI middleware, then depending on how the result is used you may also want to make sure the WSGI server deals with this form of header spoofing.

FWIW, the latest versions of mod_wsgi will only accept headers, and convert them using the above rule, when their names contain only alphanumerics and '-'. If any other characters are used, the header is thrown away.

This behaviour is by virtue of Apache 2.4 doing the blocking.
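
For servers that don't do this, the same rule can be approximated before the environ is built. The following is a sketch of the idea only, not mod_wsgi's or Apache's actual code, and safe_headers is a hypothetical name:

```python
import re

# Header names made up solely of ASCII alphanumerics and '-'.
_SAFE_NAME = re.compile(r"[A-Za-z0-9-]+")


def safe_headers(headers):
    """Drop any (name, value) pair whose name contains other characters."""
    return [(name, value) for name, value in headers
            if _SAFE_NAME.fullmatch(name)]
```

A client-supplied X_Forwarded_For is discarded by this filter because of the underscores, so it can never collide with the proxy's X-Forwarded-For.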

There was, however, a bug in mod_wsgi which meant that spoofed headers still got through in the environ passed to the mod_wsgi specific access/authentication/authorization hook extensions for Apache. This has been fixed in a recent release. At the same time it was decided to apply the stricter rules about what is allowed to the older Apache 2.2 as well, since Apache 2.2 doesn't do the blocking that Apache 2.4 does.

Unfortunately, because Linux distros ship out-of-date mod_wsgi versions, it can still be an issue there. I have been pondering turning the issue into a CERT just to force them to backport the fixes. :-)

Graham
