url validator in python

Tim Chase python.list at tim.thechases.com
Wed Mar 19 12:21:19 EDT 2008


> How can I check the validity of absolute urls with http scheme?
> example:
> "http://www.example.com/something.html" -> valid
> "http://www.google.com/ + Brite_AB_Iframe_URL + " -> invalid

You could try something like

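   # Python 3: quote() now lives in urllib.parse (it was urllib.quote in Python 2)
   from urllib.parse import quote

   tests = (
     ("http://www.google.com/ + Brite_AB_Iframe_URL + ", False),
     ("http://www.example.com/something.html", True),
     ("https://www.google.com/ + Brite_AB_Iframe_URL + ", False),
     ("https://www.example.com/something.html", True),
   )

   def no_method(url):
     # If a colon appears in the first 7 characters, treat everything
     # before it as the scheme ("http" or "https") and strip it off,
     # because quote() would otherwise escape that colon as %3A.
     if ':' in url[:7]:
       return url.split(':', 1)[1]
     return url

   def is_valid_url(url):
     # Valid means quote() leaves the scheme-less URL unchanged, i.e. it
     # contains no characters (spaces, '+', ...) that need escaping.
     url = no_method(url)
     return url == quote(url)

   for test_url, expected_result in tests:
     print("Testing %s\nagainst %s" % (
       no_method(test_url),
       quote(no_method(test_url))
       ))
     actual_result = is_valid_url(test_url)
     print('Pass: %s' % (actual_result == expected_result))
     print('=' * 70)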

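The idea is that a URL counts as valid if quote() leaves it unchanged.
The no_method() helper is needed because quote() would otherwise
escape the colon after the scheme, normalizing the URL to
"http%3A//...", so you have to strip off the "http:" part before
comparing.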
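The colon-escaping behaviour described above is easy to see directly in Python 3, where quote() lives in urllib.parse. The sketch below also shows a hypothetical variant, is_valid_http_url() (my name, not part of the post), that uses urlsplit() to find the scheme instead of looking for a colon in url[:7], and additionally requires an http or https scheme plus a host, since the question asked about absolute http URLs. Treat it as a rough filter under those assumptions, not a full RFC 3986 validator.

```python
from urllib.parse import quote, urlsplit

# quote() leaves '/' alone by default but percent-encodes the ':'
# after the scheme, which is why the scheme has to be stripped first.
print(quote("http://www.example.com/something.html"))
# prints http%3A//www.example.com/something.html


def is_valid_http_url(url):
    """Hypothetical variant of is_valid_url(): treat url as valid only if
    it is an absolute http/https URL whose text after "scheme:" is
    unchanged by quote()."""
    parts = urlsplit(url)
    # Require an http(s) scheme and a host, i.e. an absolute URL.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return False
    # Strip "http:" / "https:" (the job no_method() does by splitting).
    rest = url[len(parts.scheme) + 1:]
    # Like the original check, this also rejects explicit ports such as
    # ":8080", because quote() escapes that colon as well.
    return rest == quote(rest)


for candidate in ("http://www.example.com/something.html",
                  "http://www.google.com/ + Brite_AB_Iframe_URL + ",
                  "ftp://www.example.com/something.html"):
    print(candidate, "->", is_valid_http_url(candidate))
```

Using urlsplit() here mainly makes the "is this an absolute http URL" part explicit; the actual character check is still the same compare-against-quote() trick.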

-tkc