[Mailman-Developers] Virtual Host handling in listinfo.py

Richard Barrett R.Barrett@ftel.co.uk
Fri, 6 Oct 2000 15:20:03 +0100


In what follows I am referring to code in the file 
Mailman/Cgi/listinfo.py in the 2.0beta6 release of Mailman. I'm 
working with Apache/1.3.12 (Unix) which may influence your judgement 
about my arguments.

Sorry if what follows is too long but I found it useful to fully 
analyse my own thinking on the topic. Red face for me if the issue is 
well known to you all.

The code I'm concerned with is in the function 
FormatListinfoOverview. It handles the case where 
mm_cfg.VIRTUAL_HOST_OVERVIEW is true and has to determine which 
advertised mail lists to return in response to the URI 
/mailman/listinfo/. The relevant bits of the code are as follows:

     def FormatListinfoOverview(error=None):
         ...
         <snip>
         ...
         http_host = os.environ.get('HTTP_HOST', os.environ.get('SERVER_NAME'))
         port = os.environ.get('SERVER_PORT')
         # strip off the port if there is one
         if port and http_host[-len(port)-1:] == ':'+port:
             http_host = http_host[:-len(port)-1]
         if mm_cfg.VIRTUAL_HOST_OVERVIEW and http_host:
             host_name = http_host
         else:
             host_name = mm_cfg.DEFAULT_HOST_NAME
         ...
         <snip>
         ...
         for n in names:
             if mlist.advertised:
                 if mm_cfg.VIRTUAL_HOST_OVERVIEW and \
                         http_host and \
                         string.find(http_host, mlist.web_page_url) == -1 and \
                         string.find(mlist.web_page_url, http_host) == -1:
                     # List is for different identity of this host - skip it.
                     continue

                 else:
                     advertised.append(mlist)
         ...
         <snip>


There is a flaw in this code in the way that it strips the port 
number from the http_host variable but I'll come on to that below.

As best I can judge, the purpose of considering the value of the 
environment variable HTTP_HOST (if available) instead of just using 
the SERVER_NAME value is to try to deduce a virtual host's server 
name in circumstances where the web server has not. For instance:

1. Typically VirtualHost directives in httpd.conf will have been 
defined using FQDN, for example:

     NameVirtualHost 192.168.1.1
     <VirtualHost bert.my.co.uk>
         ServerName bert.my.co.uk
     </VirtualHost>
     <VirtualHost fred.my.co.uk>
         ServerName fred.my.co.uk
     </VirtualHost>

2. The virtual hosts will have associated ServerName directives whose 
values are used to set SERVER_NAME.

3. If a user on the local network uses a URL which does not fully 
qualify the server's domain name, e.g. http://fred/mailman/listinfo/, 
then the web server does not match the request to the fred.my.co.uk 
VirtualHost directive, and SERVER_NAME is set not to fred.my.co.uk 
but to some other value depending on the type and order of the 
VirtualHost directives in httpd.conf: bert.my.co.uk in this example.

4. In these circumstances, the cunning code above will ignore the 
SERVER_NAME value and match the fred value in HTTP_HOST.
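
The matching in step 4 can be sketched as follows. This is only an 
illustration written with current Python idioms rather than the 
string module; the function name is_skipped and the example 
web_page_url value are my own assumptions, not part of listinfo.py.

```python
def is_skipped(http_host, web_page_url):
    """Return True if the overview loop would skip a list with this URL.

    Mirrors the two `string.find(...) == -1` tests in the quoted code:
    a list is skipped only when neither string contains the other.
    """
    return (bool(http_host)
            and web_page_url not in http_host
            and http_host not in web_page_url)


# Hypothetical web_page_url of a list served by the fred.my.co.uk host.
url = "http://fred.my.co.uk/mailman/"

print(is_skipped("fred", url))           # short name is a substring of the URL
print(is_skipped("bert.my.co.uk", url))  # a different virtual host
print(is_skipped("localhost", url))      # e.g. a port-forwarded request
```

Because the test is a plain substring match, the short name fred is 
accepted for any list whose URL contains it, which is how the code 
gets away with ignoring SERVER_NAME.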

I do not think this trick in the FormatListinfoOverview function is 
the right way to overcome this problem. If you want to match both 
partial and fully qualified domain names to a virtual host then two 
VirtualHost directives should be used in httpd.conf, for example:

     NameVirtualHost 192.168.1.1
     <VirtualHost bert.my.co.uk>
         ServerName bert.my.co.uk
     </VirtualHost>
     <VirtualHost fred.my.co.uk>
         ServerName fred.my.co.uk
     </VirtualHost>
     <VirtualHost fred>
         ServerName fred.my.co.uk
     </VirtualHost>

By doing this, both the web server and listinfo.py reach the same 
conclusion by the same route. I am saying that, in principle, the 
problem, which is generic to the way the web server is operating, 
should be solved by setting up the correct virtual host definitions 
in httpd.conf, not by second-guessing the virtual host setup in 
listinfo.py. listinfo.py should only consider the value of 
SERVER_NAME and not even look at HTTP_HOST. My proposed changes to 
the FormatListinfoOverview function are:

cut here-vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
diff -r -c2 mailman-2.0beta6.stock/Mailman/Cgi/listinfo.py 
mailman-2.0beta6.listinfo1/Mailman/Cgi/listinfo.py
*** mailman-2.0beta6.stock/Mailman/Cgi/listinfo.py	Wed Aug  2 
00:10:41 2000
--- mailman-2.0beta6.listinfo1/Mailman/Cgi/listinfo.py	Fri Oct  6 
14:22:44 2000
***************
*** 59,73 ****
       "Present a general welcome and itemize the (public) lists for this host."

!     # XXX We need a portable way to determine the host by which we are being
!     #     visited!  An absolute URL would do...
!     http_host = os.environ.get('HTTP_HOST', os.environ.get('SERVER_NAME'))
!     port = os.environ.get('SERVER_PORT')
!     # strip off the port if there is one
!     if port and http_host[-len(port)-1:] == ':'+port:
!         http_host = http_host[:-len(port)-1]
!     if mm_cfg.VIRTUAL_HOST_OVERVIEW and http_host:
! 	host_name = http_host
!     else:
! 	host_name = mm_cfg.DEFAULT_HOST_NAME

       doc = Document()
--- 59,63 ----
       "Present a general welcome and itemize the (public) lists for this host."

!     http_host = host_name = os.environ.get('SERVER_NAME')

       doc = Document()
cut here-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[Even if you do not agree with my first conclusion, please read on.]

Which brings me on to the problem that started me looking at this 
whole issue: what happens when SSH's port forwarding is used to make 
the browser/web connection? For example:

1. I'm dialling in to my favorite ISP from home using my laptop. In 
order to then connect to our internal-access-only Mailman web server 
through our corporate firewall, I have to use SSH's port forwarding.

2. I set up port forwarding so that any connection to local port 8081 
on my laptop is forwarded to fred.my.co.uk:80 by the firewall machine.

3. I give my browser the URL http://localhost:8081/mailman/listinfo/.

4. The HTTP request is forwarded satisfactorily to fred.my.co.uk:80 
with the URI being /mailman/listinfo/.

5. Because mm_cfg.VIRTUAL_HOST_OVERVIEW is true, listinfo.py proceeds 
to tell me there are no advertised mail lists on host localhost:8081. 
Well, I knew that.

This is because of two flaws in the FormatListinfoOverview function:

1. In trying to remove the port number from the end of the string 
value of HTTP_HOST, the code assumes that the port number there is 
the same as the value of the SERVER_PORT environment variable. In my 
example this assumption is wrong: the port number at the end of 
HTTP_HOST is 4 characters ('8081') and SERVER_PORT is 2 characters 
long ('80'), so the comparison fails and nothing is stripped.

2. Even with this first flaw corrected, the code still fails to 
recognise the circumstances because it analyses HTTP_HOST and 
extracts the value 'localhost'. But the ip number this name 
resolves to does not even match the SERVER_ADDR environment 
variable's value, which is a dead giveaway.
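
The first flaw is easy to reproduce outside Mailman. The sketch 
below is an illustration only: strip_port_stock copies the slicing 
logic of the 2.0beta6 code into a function of my own naming, and 
strip_port_any is one possible alternative that assumes the host 
part contains no colon (so no IPv6 literals).

```python
def strip_port_stock(http_host, server_port):
    """Port stripping as done in 2.0beta6: only ':' + SERVER_PORT is removed."""
    if server_port and http_host[-len(server_port) - 1:] == ':' + server_port:
        http_host = http_host[:-len(server_port) - 1]
    return http_host


def strip_port_any(http_host):
    """Drop whatever follows the first colon, regardless of SERVER_PORT."""
    return http_host.split(':')[0]


# Direct request: the Host header port equals SERVER_PORT, so it is stripped.
print(strip_port_stock('fred.my.co.uk:80', '80'))

# Port-forwarded request: the Host header says 8081 but SERVER_PORT is 80,
# so the stock logic leaves the port in place.
print(strip_port_stock('localhost:8081', '80'))

# Splitting on the colon removes the port in both cases.
print(strip_port_any('localhost:8081'))
```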

This problem disappears if virtual host definitions in httpd.conf are 
used instead of trickery involving HTTP_HOST in listinfo.py.

So does a similar problem, which occurs if the explicit ip number 
of the server is used in the URL given to the browser instead of the 
server's domain name, assuming no ip-based virtual host has been 
defined in httpd.conf to map the ip number to an acceptable 
ServerName.

OK, so you do not agree with my contention that listinfo.py should 
not consider HTTP_HOST because it might break a bunch of existing 
Mailman installations. In that case, the following changes to the 
FormatListinfoOverview function avoid my problems. The position here 
is that if:

either - The ip number of the HTTP_HOST doesn't match the SERVER_ADDR.

or - The URL contains the server's ip number rather than its name.

then the code behaves as if the browser didn't supply an HTTP Host header:

cut here-vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
diff -r -c2 mailman-2.0beta6.stock/Mailman/Cgi/listinfo.py 
mailman-2.0beta6.listinfo2/Mailman/Cgi/listinfo.py
*** mailman-2.0beta6.stock/Mailman/Cgi/listinfo.py	Wed Aug  2 
00:10:41 2000
--- mailman-2.0beta6.listinfo2/Mailman/Cgi/listinfo.py	Fri Oct  6 
15:06:31 2000
***************
*** 22,25 ****
--- 22,26 ----
   import os
   import string
+ import socket

   from Mailman import mm_cfg
***************
*** 62,72 ****
       #     visited!  An absolute URL would do...
       http_host = os.environ.get('HTTP_HOST', os.environ.get('SERVER_NAME'))
-     port = os.environ.get('SERVER_PORT')
       # strip off the port if there is one
!     if port and http_host[-len(port)-1:] == ':'+port:
!         http_host = http_host[:-len(port)-1]
!     if mm_cfg.VIRTUAL_HOST_OVERVIEW and http_host:
   	host_name = http_host
       else:
   	host_name = mm_cfg.DEFAULT_HOST_NAME

--- 63,77 ----
       #     visited!  An absolute URL would do...
       http_host = os.environ.get('HTTP_HOST', os.environ.get('SERVER_NAME'))
       # strip off the port if there is one
!     if http_host:
!         http_host = string.split(http_host, ':')[0]
!     host_ip = socket.gethostbyname(http_host)
!     server_ip = os.environ.get('SERVER_ADDR')
!     if mm_cfg.VIRTUAL_HOST_OVERVIEW and http_host and \
!             host_ip == server_ip and \
!             host_ip != http_host:
   	host_name = http_host
       else:
+ 	http_host = None
   	host_name = mm_cfg.DEFAULT_HOST_NAME
cut here-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
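
For anyone who wants to try the rule outside the CGI environment, 
here is a standalone sketch of the same decision. It is not the 
patch itself: the function name choose_host_name, the injectable 
resolve argument and the fake resolver table are assumptions added 
for illustration, and unlike the patch it treats a failed or 
impossible name lookup as "no usable Host header".

```python
import socket


def choose_host_name(environ, virtual_host_overview, default_host_name,
                     resolve=socket.gethostbyname):
    """Return (http_host, host_name) following the rule described above."""
    http_host = environ.get('HTTP_HOST', environ.get('SERVER_NAME'))
    host_ip = None
    if http_host:
        # Strip off the port, whatever its value.
        http_host = http_host.split(':')[0]
        try:
            host_ip = resolve(http_host)
        except socket.error:
            host_ip = None  # unresolvable name: treat as not usable
    server_ip = environ.get('SERVER_ADDR')
    if (virtual_host_overview and http_host
            and host_ip is not None
            and host_ip == server_ip      # name resolves to this server
            and host_ip != http_host):    # URL used a name, not an ip number
        return http_host, http_host
    return None, default_host_name


# Hypothetical name table standing in for DNS, so the example is repeatable.
FAKE_DNS = {'fred': '192.168.1.1', 'localhost': '127.0.0.1',
            '192.168.1.1': '192.168.1.1'}


def fake_resolve(name):
    try:
        return FAKE_DNS[name]
    except KeyError:
        raise socket.error(name)


env = {'SERVER_ADDR': '192.168.1.1', 'SERVER_NAME': 'bert.my.co.uk'}
for host in ('fred', 'localhost:8081', '192.168.1.1'):
    request_env = dict(env, HTTP_HOST=host)
    print(host, choose_host_name(request_env, True, 'bert.my.co.uk',
                                 fake_resolve))
```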

I have yet to post either of the above patches to SourceForge. I 
would appreciate a sanity check on my thinking and any rebuttal of my 
arguments or constructive comments. RSVP
------------------------------------------------------------------
Richard Barrett, PostPoint 30,         e-mail:r.barrett@ftel.co.uk
Fujitsu Telecommunications Europe Ltd,      tel: (44) 121 717 6337
Solihull Parkway, Birmingham Business Park, B37 7YU, England
"Democracy is two wolves and a lamb voting on what to have for
lunch. Liberty is a well armed lamb contesting the vote."
Benjamin Franklin, 1759
------------------------------------------------------------------