[Mailman-Developers] Virtual Host handling in listinfo.py
Richard Barrett
R.Barrett@ftel.co.uk
Fri, 6 Oct 2000 15:20:03 +0100
In what follows I am referring to code in the file
Mailman/Cgi/listinfo.py in the 2.0beta6 release of Mailman. I'm
working with Apache/1.3.12 (Unix) which may influence your judgement
about my arguments.
Sorry if what follows is too long but I found it useful to fully
analyse my own thinking on the topic. Red face for me if the issue is
well known to you all.
The code I'm concerned with is in the function
FormatListinfoOverview. When mm_cfg.VIRTUAL_HOST_OVERVIEW is true,
this function works out which advertised mailing lists should be
returned in response to the URI /mailman/listinfo/. The relevant
bits of the code are as follows:
def FormatListinfoOverview(error=None):
    ...
    <snip>
    ...
    http_host = os.environ.get('HTTP_HOST', os.environ.get('SERVER_NAME'))
    port = os.environ.get('SERVER_PORT')
    # strip off the port if there is one
    if port and http_host[-len(port)-1:] == ':'+port:
        http_host = http_host[:-len(port)-1]
    if mm_cfg.VIRTUAL_HOST_OVERVIEW and http_host:
        host_name = http_host
    else:
        host_name = mm_cfg.DEFAULT_HOST_NAME
    ...
    <snip>
    ...
    for n in names:
        if mlist.advertised:
            if mm_cfg.VIRTUAL_HOST_OVERVIEW and \
                   http_host and \
                   string.find(http_host, mlist.web_page_url) == -1 and \
                   string.find(mlist.web_page_url, http_host) == -1:
                # List is for different identity of this host - skip it.
                continue
            else:
                advertised.append(mlist)
    ...
    <snip>
There is a flaw in this code in the way that it strips the port
number from the http_host variable but I'll come on to that below.
As best I can judge, the purpose of considering the value of the
environment variable HTTP_HOST (if available) instead of just using
the SERVER_NAME value is to try to deduce a virtual host's server
name in circumstances when the web server has not. For instance:
1. Typically VirtualHost directives in httpd.conf will have been
defined using FQDN, for example:
    NameVirtualHost 192.168.1.1
    <VirtualHost bert.my.co.uk>
        ServerName bert.my.co.uk
    </VirtualHost>
    <VirtualHost fred.my.co.uk>
        ServerName fred.my.co.uk
    </VirtualHost>
2. The virtual hosts will have associated ServerName directives whose
values are used to set SERVER_NAME.
3. If a user on the local network uses a URL which does not fully
qualify the server's domain name, e.g. http://fred/mailman/listinfo/,
then the web server does not match the request to the fred.my.co.uk
VirtualHost directive, and SERVER_NAME is set not to fred.my.co.uk
but to some other value that depends on the type and order of the
VirtualHost directives in httpd.conf: bert.my.co.uk in this example.
4. In these circumstances, the cunning code above will ignore the
SERVER_NAME value and match the fred value in HTTP_HOST.
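To make the effect of steps 3 and 4 concrete, here is a minimal sketch of the overview filter written in present-day Python, not the 1.5-era string module. The list names and URLs are invented for illustration, and visible_lists is a hypothetical helper, not a Mailman function; only the two-way substring test mirrors the code quoted above.

```python
def visible_lists(http_host, lists):
    """Return the names of lists whose web_page_url matches http_host.

    Mirrors the two-way substring test in FormatListinfoOverview:
    a list is kept when either string contains the other.
    """
    kept = []
    for name, web_page_url in lists:
        if http_host and http_host not in web_page_url \
                and web_page_url not in http_host:
            continue  # list is for a different identity of this host
        kept.append(name)
    return kept


# Hypothetical lists, one per virtual host.
LISTS = [
    ("bert-users", "http://bert.my.co.uk/mailman/"),
    ("fred-users", "http://fred.my.co.uk/mailman/"),
]

# The browser asked for http://fred/..., so Apache fell back to bert.
env = {"HTTP_HOST": "fred", "SERVER_NAME": "bert.my.co.uk"}

by_http_host = visible_lists(env.get("HTTP_HOST", env.get("SERVER_NAME")), LISTS)
by_server_name = visible_lists(env.get("SERVER_NAME"), LISTS)
print(by_http_host)    # ['fred-users']
print(by_server_name)  # ['bert-users']
```

With HTTP_HOST consulted first, the short name fred still selects fred's list; with SERVER_NAME alone, the answer follows whatever virtual host the web server chose.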
I do not think this trick in the FormatListinfoOverview function is
the right way to overcome this problem. If you want to match both
partial and fully qualified domain names to a virtual host then two
VirtualHost directives should be used in httpd.conf, for example:
    NameVirtualHost 192.168.1.1
    <VirtualHost bert.my.co.uk>
        ServerName bert.my.co.uk
    </VirtualHost>
    <VirtualHost fred.my.co.uk>
        ServerName fred.my.co.uk
    </VirtualHost>
    <VirtualHost fred>
        ServerName fred.my.co.uk
    </VirtualHost>
By doing this both web server and listinfo.py reach the same
conclusion by the same route. I am saying that, in principle, the
problem, which is generic to the way the web server is operating,
should be solved by setting up the correct virtual host definitions
in httpd.conf not by second guessing the virtual host setup in
listinfo.py. listinfo.py should only consider the value of
SERVER_NAME and not even look at HTTP_HOST. My proposed changes to
the FormatListinfoOverview function are:
cut here-vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
diff -r -c2 mailman-2.0beta6.stock/Mailman/Cgi/listinfo.py mailman-2.0beta6.listinfo1/Mailman/Cgi/listinfo.py
*** mailman-2.0beta6.stock/Mailman/Cgi/listinfo.py  Wed Aug  2 00:10:41 2000
--- mailman-2.0beta6.listinfo1/Mailman/Cgi/listinfo.py  Fri Oct  6 14:22:44 2000
***************
*** 59,73 ****
      "Present a general welcome and itemize the (public) lists for this host."
!     # XXX We need a portable way to determine the host by which we are being
!     # visited! An absolute URL would do...
!     http_host = os.environ.get('HTTP_HOST', os.environ.get('SERVER_NAME'))
!     port = os.environ.get('SERVER_PORT')
!     # strip off the port if there is one
!     if port and http_host[-len(port)-1:] == ':'+port:
!         http_host = http_host[:-len(port)-1]
!     if mm_cfg.VIRTUAL_HOST_OVERVIEW and http_host:
!         host_name = http_host
!     else:
!         host_name = mm_cfg.DEFAULT_HOST_NAME
      doc = Document()
--- 59,63 ----
      "Present a general welcome and itemize the (public) lists for this host."
!     http_host = host_name = os.environ.get('SERVER_NAME')
      doc = Document()
cut here-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[Even if you do not agree with my first conclusion, please read on.]
Which brings me on to the problem that started me looking at this
whole issue; what happens when SSH's port forwarding is used to make
the browser/web connection? For example:
1. I'm dialling in to my favorite ISP from home using my laptop. In
order to then connect to our internal-access-only Mailman web server
through our corporate firewall, I have to use SSH's port forwarding.
2. I set up port forwarding so that any connection to local port 8081
on my laptop is forwarded to fred.my.co.uk:80 by the firewall machine.
3. I give my browser the URL http://localhost:8081/mailman/listinfo/.
4. The HTTP request is forwarded satisfactorily to fred.my.co.uk:80
with the URI being /mailman/listinfo/.
5. Because mm_cfg.VIRTUAL_HOST_OVERVIEW is true, listinfo.py proceeds
to tell me there are no advertised mail lists on host localhost:8081.
Well, I knew that.
This is because of two flaws in the FormatListinfoOverview function:
1. In trying to remove the port number from the end of the string
value of HTTP_HOST, the code assumes that the length of the port
number is equal to the length of the SERVER_PORT environment
variable's value. In the case of my example this assumption is
wrong: the port number at the end of HTTP_HOST is 4 characters
('8081') and the SERVER_PORT is 2 characters long ('80').
2. Even with this first flaw corrected, the code still fails to
recognise the circumstances because it is analysing HTTP_HOST and
extracting the value 'localhost'. But the ip number of this value
does not even match the SERVER_ADDR environment variable's value,
which is a dead giveaway.
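The first flaw is easy to reproduce outside Mailman. The sketch below copies the original port-stripping test into a hypothetical helper, strip_port_original, and compares it with simply splitting on the colon; the environment values are those of the port-forwarding example above. It is written for present-day Python and ignores IPv6 literals, which contain colons of their own.

```python
def strip_port_original(http_host, server_port):
    """Port stripping as done in the 2.0beta6 code quoted earlier."""
    if server_port and http_host[-len(server_port) - 1:] == ":" + server_port:
        http_host = http_host[:-len(server_port) - 1]
    return http_host


def strip_port_by_split(http_host):
    """Drop everything from the first colon on, whatever the port is."""
    return http_host.split(":")[0]


# What the CGI sees when the browser was given http://localhost:8081/...
env = {"HTTP_HOST": "localhost:8081", "SERVER_PORT": "80"}

original = strip_port_original(env["HTTP_HOST"], env["SERVER_PORT"])
by_split = strip_port_by_split(env["HTTP_HOST"])
print(original)  # localhost:8081 (the port is not removed)
print(by_split)  # localhost
```

The original test only succeeds when the port in the Host header happens to equal SERVER_PORT, which is exactly what port forwarding breaks.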
This problem disappears if virtual host definition in httpd.conf is
used instead of trickery involving HTTP_HOST in listinfo.py.
The same goes for a similar problem that occurs when the explicit ip
number of the server is used in the URL given to the browser instead
of the server's domain name, assuming no ip-based virtual host has
been defined in httpd.conf to map the ip number to an acceptable
ServerName.
OK, so you do not agree with my contention that listinfo.py should
not consider HTTP_HOST because it might break a bunch of existing
Mailman installations. In that case, the following changes to the
FormatListinfoOverview function avoid my problems. The position here
is that if:
  either - the ip number of the HTTP_HOST doesn't match the SERVER_ADDR,
  or     - the URL contains the server's ip number rather than its name,
then the code behaves as if the browser didn't supply an HTTP Host header:
cut here-vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
diff -r -c2 mailman-2.0beta6.stock/Mailman/Cgi/listinfo.py mailman-2.0beta6.listinfo2/Mailman/Cgi/listinfo.py
*** mailman-2.0beta6.stock/Mailman/Cgi/listinfo.py  Wed Aug  2 00:10:41 2000
--- mailman-2.0beta6.listinfo2/Mailman/Cgi/listinfo.py  Fri Oct  6 15:06:31 2000
***************
*** 22,25 ****
--- 22,26 ----
  import os
  import string
+ import socket
  from Mailman import mm_cfg
***************
*** 62,72 ****
      # visited! An absolute URL would do...
      http_host = os.environ.get('HTTP_HOST', os.environ.get('SERVER_NAME'))
-     port = os.environ.get('SERVER_PORT')
      # strip off the port if there is one
!     if port and http_host[-len(port)-1:] == ':'+port:
!         http_host = http_host[:-len(port)-1]
!     if mm_cfg.VIRTUAL_HOST_OVERVIEW and http_host:
          host_name = http_host
      else:
          host_name = mm_cfg.DEFAULT_HOST_NAME
--- 63,77 ----
      # visited! An absolute URL would do...
      http_host = os.environ.get('HTTP_HOST', os.environ.get('SERVER_NAME'))
      # strip off the port if there is one
!     if http_host:
!         http_host = string.split(http_host, ':')[0]
!         host_ip = socket.gethostbyname(http_host)
!         server_ip = os.environ.get('SERVER_ADDR')
!     if mm_cfg.VIRTUAL_HOST_OVERVIEW and http_host and \
!            host_ip == server_ip and \
!            host_ip != http_host:
          host_name = http_host
      else:
+         http_host = None
          host_name = mm_cfg.DEFAULT_HOST_NAME
cut here-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
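For anyone who prefers to see the rule outside diff form, the decision made by this second patch can be sketched as a standalone function. This is an illustration in present-day Python, not the patch itself: trusted_http_host and its resolve parameter are hypothetical names, the resolver is injectable so that the example runs without DNS, the addresses in FAKE_DNS are invented, and, unlike the patch, a failed lookup is treated as a mismatch and does not raise.

```python
import socket


def trusted_http_host(environ, virtual_overview=True,
                      resolve=socket.gethostbyname):
    """Return the host name to filter lists by, or None.

    None means "behave as if the browser sent no Host header".
    The host is trusted only if it resolves to SERVER_ADDR and is
    not itself a bare ip number.
    """
    http_host = environ.get("HTTP_HOST", environ.get("SERVER_NAME"))
    if not (virtual_overview and http_host):
        return None
    http_host = http_host.split(":")[0]  # strip off the port if any
    try:
        host_ip = resolve(http_host)
    except OSError:
        return None  # assumption: an unresolvable name is not trusted
    if host_ip == environ.get("SERVER_ADDR") and host_ip != http_host:
        return http_host
    return None


# Invented name-to-address table standing in for DNS.
FAKE_DNS = {
    "fred.my.co.uk": "192.168.1.1",
    "localhost": "127.0.0.1",
    "192.168.1.1": "192.168.1.1",
}


def fake_resolve(name):
    if name not in FAKE_DNS:
        raise OSError("unknown host: " + name)
    return FAKE_DNS[name]


def check(host_header):
    env = {"HTTP_HOST": host_header, "SERVER_ADDR": "192.168.1.1"}
    return trusted_http_host(env, resolve=fake_resolve)


direct = check("fred.my.co.uk")      # normal request
tunnelled = check("localhost:8081")  # SSH port forwarding
by_ip = check("192.168.1.1")         # ip number typed into the URL
print(direct, tunnelled, by_ip)      # fred.my.co.uk None None
```

In the two awkward cases the function returns None, so the caller would fall back to mm_cfg.DEFAULT_HOST_NAME and skip the per-host filtering, as the patch does by setting http_host to None.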
I have yet to post either of the above patches to SourceForge. I
would appreciate a sanity check on my thinking, and any rebuttal of
my arguments or constructive comments. RSVP
------------------------------------------------------------------
Richard Barrett, PostPoint 30, e-mail:r.barrett@ftel.co.uk
Fujitsu Telecommunications Europe Ltd, tel: (44) 121 717 6337
Solihull Parkway, Birmingham Business Park, B37 7YU, England
"Democracy is two wolves and a lamb voting on what to have for
lunch. Liberty is a well armed lamb contesting the vote."
Benjamin Franklin, 1759
------------------------------------------------------------------