how to strip the domain name in python?

Sun Apr 15 20:42:00 EDT 2007

Marko.Cain.23 at gmail.com wrote:
> On Apr 15, 11:57 am, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
>> In <1176654669.737355.78... at y5g2000hsa.googlegroups.com>, Marko.Cain.23
>> wrote:
>>
>>> On Apr 14, 10:36 am, Marko.Cain... at gmail.com wrote:
>>>> On Apr 14, 12:02 am, Michael Bentley <mich... at jedimindworks.com>
>>>> wrote:
>>>>> On Apr 13, 2007, at 11:49 PM, Marko.Cain... at gmail.com wrote:
>>>>>> Hi,
>>>>>> I have a list of url names like this, and I am trying to strip out the
>>>>>> domain name using the following code:
>>>>>> http://www.cnn.com
>>>>>> www.yahoo.com
>>>>>> http://www.ebay.co.uk
>>>>>> pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
>>>>>> match = re.findall(pattern, line)
>>>>>> if (match):
>>>>>>         s1, s2 = match[0]
>>>>>>         print s2
>>>>>> but none of the sites matched. Can you please tell me what I am
>>>>>> missing?
>>>>> change re.compile("http:\\\\(.*)\.(.*)", re.S) to
>>>>> re.compile("http:\/\/(.*)\.(.*)", re.S)
>>>> Thanks. I tried this, but when 'line' is http://www.cnn.com, 's2' is
>>>> 'com'. I want 'cnn.com' (everything after the first '.'). How can I do
>>>> that?
>>>> pattern = re.compile("http:\/\/(.*)\.(.*)", re.S)
>>>> match = re.findall(pattern, line)
>>>> if (match):
>>>>     s1, s2 = match[0]
>>>>     print s2
>>> Can anyone please help me with my problem?  I still can't solve it.
>>> Basically, I want to keep only the text after the first '.' in the URL:
>>> http://www.cnn.com -> cnn.com
>> from urlparse import urlsplit
>>
>> def get_domain(url):
>>     net_location = urlsplit(url)[1]
>>     return '.'.join(net_location.rsplit('.', 2)[-2:])
>>
>> def main():
>>     print get_domain('http://www.cnn.com')
>>
>> Ciao,
>>         Marc 'BlackJack' Rintsch
> 
> Thanks for your help.
> 
> But if the input string is "http://www.ebay.co.uk/", I only get
> "co.uk".
> 
> How can I change it so that it works for both www.ebay.co.uk and www.cnn.com?
> 
  >>> from urlparse import urlsplit
  >>> def get_domain(url):
  ...   net_location = urlsplit(url)[1]
  ...   return net_location.split(".", 1)[1]
  ...
  >>> print get_domain('http://www.cnn.com')
cnn.com
  >>> print get_domain('http://www.ebay.co.uk')
ebay.co.uk
  >>>
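
One caveat: urlsplit() only fills in the network location when the URL
has a scheme, so a bare name such as www.yahoo.com (from the original
list) yields an empty string there and the [1] index raises IndexError.
A minimal sketch of one way around that follows. It is written for
Python 3 (urllib.parse rather than urlparse), which is an assumption on
my part and not what the session above uses.

```python
from urllib.parse import urlsplit


def get_domain(url):
    """Return everything after the first '.' of the host name."""
    # urlsplit() only recognises a host when the URL has a scheme or
    # starts with '//', so prefix bare names such as 'www.yahoo.com'.
    if "://" not in url and not url.startswith("//"):
        url = "//" + url
    host = urlsplit(url).hostname or ""
    # partition() splits on the first '.' only; the tail is empty
    # when the host contains no dot at all.
    _, _, rest = host.partition(".")
    return rest or host


for url in ("http://www.cnn.com", "www.yahoo.com", "http://www.ebay.co.uk/"):
    print(get_domain(url))
```

Like the version above, this blindly drops the first label, so
http://cnn.com would come back as just "com". Telling registered
domains apart from prefixes like www in general needs a list of public
suffixes, which this sketch does not attempt.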

regards
  Steve
-- 
Steve Holden       +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd          http://www.holdenweb.com
Skype: holdenweb     http://del.icio.us/steve.holden
Recent Ramblings       http://holdenweb.blogspot.com