How to strip the domain name in Python?

Marc 'BlackJack' Rintsch bj_666 at gmx.net
Sun Apr 15 12:57:35 EDT 2007


In <1176654669.737355.78300 at y5g2000hsa.googlegroups.com>, Marko.Cain.23
wrote:

> On Apr 14, 10:36 am, Marko.Cain... at gmail.com wrote:
>> On Apr 14, 12:02 am, Michael Bentley <mich... at jedimindworks.com>
>> wrote:
>>
>> > On Apr 13, 2007, at 11:49 PM, Marko.Cain... at gmail.com wrote:
>>
>> > > Hi,
>>
>> > > I have a list of url names like this, and I am trying to strip out the
>> > > domain name using the following code:
>>
>> > > http://www.cnn.com
>> > > www.yahoo.com
>> > > http://www.ebay.co.uk
>>
>> > > pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
>> > > match = re.findall(pattern, line)
>> > > if (match):
>> > >     s1, s2 = match[0]
>> > >     print s2
>>
>> > > but none of the sites matched. Can you please tell me what I am
>> > > missing?
>>
>> > change re.compile("http:\\\\(.*)\.(.*)", re.S) to
>> > re.compile("http:\/\/(.*)\.(.*)", re.S)
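Why the first pattern never matches, as a minimal Python 3 sketch. Raw strings are used here so that each backslash is interpreted only by the regex engine. The original string "http:\\\\" reaches re as http:\\, which matches one literal backslash, while the URLs contain "http://". Michael's "\/" version works because re reads \/ as a plain /, but forward slashes need no escaping at all. The names broken, fixed and line are only for illustration.

```python
import re

# Equivalent to the original "http:\\\\(.*)\.(.*)": after Python's own
# string escaping, the regex engine sees http:\\ which matches a literal
# backslash, so "http://..." can never match.
broken = re.compile(r"http:\\(.*)\.(.*)", re.S)

# Forward slashes are ordinary characters in a regex; no escaping needed.
fixed = re.compile(r"http://(.*)\.(.*)", re.S)

line = "http://www.cnn.com"
print(re.findall(broken, line))  # []
print(re.findall(fixed, line))   # [('www.cnn', 'com')]
```

With the fixed pattern, findall returns one (s1, s2) tuple per match, which is what the s1, s2 = match[0] unpacking expects.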
>>
>> Thanks. I tried this:
>>
>> but when the 'line' is http://www.cnn.com, I get 's2' = 'com',
>> but I want 'cnn.com' (everything after the first '.'). How can I do
>> that?
>>
>> pattern = re.compile("http:\/\/(.*)\.(.*)", re.S)
>> match = re.findall(pattern, line)
>> if (match):
>>     s1, s2 = match[0]
>>     print s2
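The greedy (.*) in the first group is why 's2' comes back as 'com': it grabs as much as it can, so \. ends up matching the last dot. A minimal Python 3 sketch of one alternative, assuming the goal is literally "everything after the first dot": replace the first group with [^.]*, which cannot run past a dot. The scheme is made optional so that a bare entry like 'www.yahoo.com' matches too. PATTERN and after_first_dot are hypothetical names. Be aware that a host without a 'www.'-style prefix, such as 'http://cnn.com', would come back as just 'com', and anything after the host (a path, say) is kept as well.

```python
import re

# Optional "http://" or "https://", then the first label (no dots), then a
# literal dot; group 1 captures everything after that first dot.
PATTERN = re.compile(r"^(?:https?://)?[^.]*\.(.*)$")


def after_first_dot(line):
    match = PATTERN.match(line)
    return match.group(1) if match else None


for line in ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]:
    print(after_first_dot(line))
# cnn.com
# yahoo.com
# ebay.co.uk
```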
> 
> Can anyone please help me with my problem?  I still can't solve it.
> 
> Basically, I want to extract the text after the first '.' in a URL
> address:
> 
> http://www.cnn.com -> cnn.com

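try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2

def get_domain(url):
    # netloc is the "www.cnn.com" part of "http://www.cnn.com"
    net_location = urlsplit(url).netloc
    # keep the last two dot-separated labels: "cnn.com"
    return '.'.join(net_location.rsplit('.', 2)[-2:])

def main():
    print(get_domain('http://www.cnn.com'))

if __name__ == '__main__':
    main()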

Ciao,
	Marc 'BlackJack' Rintsch
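Two limitations of get_domain on the original list are worth knowing. For the scheme-less 'www.yahoo.com' it returns '', because urlsplit only recognises a host after '//'. For 'http://www.ebay.co.uk' it returns 'co.uk' rather than 'ebay.co.uk'. The Python 3 sketch below compares the last-two-labels rule with the "everything after the first dot" rule asked for in the thread; host_of, last_two_labels and after_first_label are hypothetical helper names, and prefixing '//' is just one assumed way to handle bare host names. Neither rule is right for every domain; picking out the registrable part of an arbitrary host generally needs a list of public suffixes such as the Public Suffix List.

```python
from urllib.parse import urlsplit


def host_of(url):
    # urlsplit only fills in netloc when the host is introduced by "//"
    # (as in "http://..."), so a bare entry like "www.yahoo.com" gets a
    # "//" prefix first.
    if "//" not in url:
        url = "//" + url
    # .hostname is the lower-cased host without port or user info
    # (None if there is no host, hence the "or ''").
    return urlsplit(url).hostname or ""


def last_two_labels(url):
    # The rule used in get_domain: keep the last two dot-separated labels.
    return ".".join(host_of(url).rsplit(".", 2)[-2:])


def after_first_label(url):
    # The rule asked for in the thread: drop everything up to the first dot.
    head, dot, rest = host_of(url).partition(".")
    return rest if dot else head


for url in ["http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk"]:
    print(url, last_two_labels(url), after_first_label(url))
# http://www.cnn.com cnn.com cnn.com
# www.yahoo.com yahoo.com yahoo.com
# http://www.ebay.co.uk co.uk ebay.co.uk
```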



More information about the Python-list mailing list