[issue5550] [urllib.request]: Comparison of HTTP headers should be insensitive to the case

karl report at bugs.python.org
Wed Mar 20 21:38:47 CET 2013


karl added the comment:

First, a sanity check to make sure I understand the current behavior.
==============================================
→ python3.3
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 01:25:11) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import urllib.request
>>> req = urllib.request.Request('http://www.python.org/')
>>> req.headers
{}
>>> req.unredirected_hdrs
{}
>>> r = urllib.request.urlopen(req)
>>> req.headers
{}
>>> req.unredirected_hdrs
{'User-agent': 'Python-urllib/3.3', 'Host': 'www.python.org'}
>>> req.header_items()
[('User-agent', 'Python-urllib/3.3'), ('Host', 'www.python.org')]
>>> req.has_header('host')
False
>>> req.has_header('Host')
True
>>> req.get_header('host')
>>> req.get_header('Host')
'www.python.org'
>>> 'host' in req.unredirected_hdrs
False
>>> 'Host' in req.unredirected_hdrs
True
>>> 'host' in req.header_items()
False
>>> 'Host' in req.header_items()
False
>>> req.unredirected_hdrs.items()
dict_items([('User-agent', 'Python-urllib/3.3'), ('Host', 'www.python.org')])
>>> req.headers.get('Host', req.unredirected_hdrs.get('Host',None))
'www.python.org'
>>> req.headers.get('Host')
>>> req.headers.get('host'.capitalize(), req.unredirected_hdrs.get('host'.capitalize(),None))
'www.python.org'
>>> req.headers.get('HOST'.capitalize(), req.unredirected_hdrs.get('HOST'.capitalize(),None))
'www.python.org'
>>> req.headers.get('host'.title(), req.unredirected_hdrs.get('host'.title(),None))
'www.python.org'
>>> 'host'.capitalize() in req.unredirected_hdrs
True
==============================================

OK. The two add methods force the capitalization through capitalize() (and not title(), which is an issue in itself).

http://hg.python.org/cpython/file/3.3/Lib/urllib/request.py#l359

    def add_header(self, key, val):
        # useful for something like authentication
        self.headers[key.capitalize()] = val

    def add_unredirected_header(self, key, val):
        # will not be added to a redirected request
        self.unredirected_hdrs[key.capitalize()] = val
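
To make the capitalize() versus title() point concrete, here is a small sketch (not part of the patch; the header names are only examples) showing how each str method normalizes a name:

```python
# Compare how the two str methods normalize a few example header names.
names = ["content-type", "USER-AGENT", "host"]

capitalized = [n.capitalize() for n in names]  # only the first letter is upper-cased
titled = [n.title() for n in names]            # each hyphen-separated word is upper-cased

for raw, cap, tit in zip(names, capitalized, titled):
    print(f"{raw!r}: capitalize() -> {cap!r}, title() -> {tit!r}")
```

With capitalize(), a name such as Content-Type is stored as Content-type, which is why a lookup with the conventional casing misses.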


HTTP header names are case insensitive, but the way get_header() and has_header() are currently designed, the lookup is case sensitive. We could also capitalize the name before looking it up.


So something like:

    def get_header(self, header_name, default=None):
        return self.headers.get(
            header_name.capitalize(),
            self.unredirected_hdrs.get(header_name.capitalize(), default))


    def has_header(self, header_name):
        return (header_name.capitalize() in self.headers or
                header_name.capitalize() in self.unredirected_hdrs)
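
The same idea can be tried without patching the module. This is only a sketch: CaseInsensitiveRequest is a hypothetical subclass name used for illustration, not something in urllib.request, and no network access is needed to run it.

```python
import urllib.request


class CaseInsensitiveRequest(urllib.request.Request):
    """Hypothetical subclass: normalizes the header name before the lookup."""

    def get_header(self, header_name, default=None):
        # Use the same normalization as add_header()/add_unredirected_header().
        key = header_name.capitalize()
        return self.headers.get(
            key, self.unredirected_hdrs.get(key, default))

    def has_header(self, header_name):
        key = header_name.capitalize()
        return key in self.headers or key in self.unredirected_hdrs


req = CaseInsensitiveRequest('http://www.python.org/')
req.add_header('foo-bar', 'booh')
req.add_unredirected_header('HOST', 'www.python.org')

print(req.has_header('FOO-BAR'))
print(req.get_header('host'))
```

This only works as long as every header is stored through the two add methods, since the lookup relies on the stored keys already being capitalized.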

Adding a header to a request works like this:

>>> req.add_header("foo-bar","booh")
>>> req.headers
{'Foo-bar': 'booh'}
>>> req.unredirected_hdrs
{'User-agent': 'Python-urllib/3.3', 'Host': 'www.python.org'}
>>> req.header_items()
[('Foo-bar', 'booh'), ('User-agent', 'Python-urllib/3.3'), ('Host', 'www.python.org')]


So if someone adds a header, it will be capitalized, and the lookup will be correct.

The issue is more with addheader(), which doesn't have the same constraint.
http://hg.python.org/cpython/file/3.3/Lib/urllib/request.py#l1624

Personally I would have made everything case insensitive ;)
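
As a rough illustration of what "everything case insensitive" could mean, here is a minimal sketch of a mapping that compares names case-insensitively. CaseInsensitiveHeaders is a hypothetical class written for this comment; it is not what the patch does and not an existing urllib.request API.

```python
from collections.abc import MutableMapping


class CaseInsensitiveHeaders(MutableMapping):
    """Hypothetical mapping: names compare case-insensitively, last casing is kept."""

    def __init__(self, items=()):
        self._store = {}  # lower-cased name -> (name as given, value)
        for name, value in dict(items).items():
            self[name] = value

    def __setitem__(self, name, value):
        self._store[name.lower()] = (name, value)

    def __getitem__(self, name):
        return self._store[name.lower()][1]

    def __delitem__(self, name):
        del self._store[name.lower()]

    def __iter__(self):
        return (original for original, _ in self._store.values())

    def __len__(self):
        return len(self._store)


headers = CaseInsensitiveHeaders({'Content-Type': 'text/html'})
headers['content-type'] = 'text/plain'  # replaces the entry instead of adding a second one

print(len(headers), headers['CONTENT-TYPE'])
```

With a structure like this, neither the add methods nor the lookups would need to agree on capitalize() or title().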

Also note that in this module the casing is not consistent when the header names are hardcoded: sometimes Content-Type, sometimes Content-type.

Anyway, I submitted a patch with the code modification and the test. This is the result of running the test suite:

→ ./python.exe Lib/test/test_urllib2net.py 
test_sni (__main__.HTTPSTests) ... skipped 'test disabled - test server needed'
test_custom_headers (__main__.OtherNetworkTests) ... ok
test_file (__main__.OtherNetworkTests) ... ok
test_ftp (__main__.OtherNetworkTests) ... ok
test_headers_case_sensitivity (__main__.OtherNetworkTests) ... ok
test_sites_no_connection_close (__main__.OtherNetworkTests) ... /Users/karl/Documents/2011/cpython/Lib/socket.py:370: ResourceWarning: unclosed <socket.socket object, fd=5, family=2, type=1, proto=6>
  self._sock = None
ok
test_urlwithfrag (__main__.OtherNetworkTests) ... ok
test_close (__main__.CloseSocketTest) ... ok
test_ftp_basic (__main__.TimeoutTest) ... ok
test_ftp_default_timeout (__main__.TimeoutTest) ... ok
test_ftp_no_timeout (__main__.TimeoutTest) ... ok
test_ftp_timeout (__main__.TimeoutTest) ... ok
test_http_basic (__main__.TimeoutTest) ... ok
test_http_default_timeout (__main__.TimeoutTest) ... ok
test_http_no_timeout (__main__.TimeoutTest) ... ok
test_http_timeout (__main__.TimeoutTest) ... ok

----------------------------------------------------------------------
Ran 16 tests in 15.259s

OK (skipped=1)
[137983 refs]

----------
Added file: http://bugs.python.org/file29511/issue-5550-2.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5550>
_______________________________________

