Sort by domain name?

Paul Rubin
Mon Oct 2 12:06:00 EDT 2006


"js " <ebgssth at gmail.com> writes:
> All I want to do is to sort out a list of url by companyname,
> like oreilly, ask, skype, amazon, google and so on, to find out
> how many company's url the list contain.

Here's a function I used to use.  It makes no attempt to be
exhaustive, but did a reasonable job on the domains I cared about at
the time:

    import re

    def host_domain(hostname):
        parts = hostname.split('.')
        if parts[-1] in ('au', 'uk', 'nz', 'za', 'jp', 'br'):
            # www.foobar.co.uk, etc: keep the last three labels
            host_len = 3
        elif len(parts) == 4 and re.match(r'^[\d.]+$', hostname):
            host_len = 4                # 2.3.4.5 numeric address
        else:
            host_len = 2                # www.foobar.com => foobar.com
        d = '.'.join(parts[-host_len:])
        # print 'host_domain:', hostname, '=>', d
        return d
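
To get from there to the per-company counts asked about, something along these lines might work. It is only a sketch: host_domain is repeated so the snippet runs on its own, count_domains is a name made up for this example, the sample URLs are invented, and it assumes Python 3 with every URL carrying a scheme such as http://.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

def host_domain(hostname):
    # Same heuristic as the function above, repeated so this runs standalone.
    parts = hostname.split('.')
    if parts[-1] in ('au', 'uk', 'nz', 'za', 'jp', 'br'):
        host_len = 3
    elif len(parts) == 4 and re.match(r'^[\d.]+$', hostname):
        host_len = 4
    else:
        host_len = 2
    return '.'.join(parts[-host_len:])

def count_domains(urls):
    # Hypothetical helper: tally how many URLs fall under each domain.
    counts = Counter()
    for url in urls:
        # urlsplit().hostname lowercases the host and drops any port.
        host = urlsplit(url).hostname
        if host:
            counts[host_domain(host)] += 1
    return counts

# Invented sample data, for illustration only.
urls = [
    'http://www.oreilly.com/catalog/',
    'http://safari.oreilly.com/',
    'http://www.amazon.co.uk/books',
    'http://www.google.com/search?q=python',
    'http://10.0.0.1:8080/status',
]

for domain, n in sorted(count_domains(urls).items()):
    print(n, domain)
```

Counting on the whole domain rather than just the first label keeps amazon.com and amazon.co.uk apart; if only the company name is wanted, taking the first label of each key would merge them.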


