how to strip the domain name in python?

Marko.Cain.23 at gmail.com
Sun Apr 15 17:24:31 EDT 2007


On Apr 15, 11:57 am, Marc 'BlackJack' Rintsch <bj_... at gmx.net> wrote:
> In <1176654669.737355.78... at y5g2000hsa.googlegroups.com>, Marko.Cain.23
> wrote:
>
>
>
> > On Apr 14, 10:36 am, Marko.Cain... at gmail.com wrote:
> >> On Apr 14, 12:02 am, Michael Bentley <mich... at jedimindworks.com>
> >> wrote:
>
> >> > On Apr 13, 2007, at 11:49 PM, Marko.Cain... at gmail.com wrote:
>
> >> > > Hi,
>
> >> > > I have a list of url names like this, and I am trying to strip out the
> >> > > domain name using the following code:
>
> >> > > http://www.cnn.com
> >> > > www.yahoo.com
> >> > > http://www.ebay.co.uk
>
> >> > > pattern = re.compile("http:\\\\(.*)\.(.*)", re.S)
> >> > > match = re.findall(pattern, line)
>
> >> > > if (match):
> >> > >         s1, s2 = match[0]
>
> >> > >         print s2
>
> >> > > but none of the site matched, can you please tell me what am i
> >> > > missing?
>
> >> > change re.compile("http:\\\\(.*)\.(.*)", re.S) to
> >> > re.compile("http:\/\/(.*)\.(.*)", re.S)
>
> >> Thanks. I try this:
>
> >> but when the 'line' is http://www.cnn.com, I get 's2' com,
> >> but i want 'cnn.com' (everything after the first '.'), how can I do
> >> that?
>
> >> pattern = re.compile("http:\/\/(.*)\.(.*)", re.S)
>
> >>     match = re.findall(pattern, line)
>
> >>     if (match):
>
> >>         s1, s2 = match[0]
>
> >>         print s2
>
> > Can anyone please help me with my problem?  I still can't solve it.
>
> > Basically, I want to strip out the text after the first '.' in url
> > address:
>
> > http://www.cnn.com -> cnn.com
>
> from urlparse import urlsplit
>
> def get_domain(url):
>     net_location = urlsplit(url)[1]
>     return '.'.join(net_location.rsplit('.', 2)[-2:])
>
> def main():
>     print get_domain('http://www.cnn.com')
>
> Ciao,
>         Marc 'BlackJack' Rintsch

Thanks for your help.

But if the input string is "http://www.ebay.co.uk/", I only get
"co.uk" instead of "ebay.co.uk".

How can I change it so that it works for both www.ebay.co.uk and
www.cnn.com?
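
One possible direction, as a rough sketch only: keep three labels instead
of two when the host ends in a known two-label suffix such as "co.uk".
The TWO_LABEL_SUFFIXES set below is a hypothetical, incomplete sample made
up for illustration; a complete answer would need a maintained list such
as the Public Suffix List. The "//" prefix is added because urlsplit only
fills in the host when a scheme or "//" is present, which matters for
inputs like www.yahoo.com.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit      # Python 2, as used in the thread

# Assumption: a tiny hand-picked sample, NOT a complete suffix list.
TWO_LABEL_SUFFIXES = {'co.uk', 'org.uk', 'ac.uk', 'com.au', 'co.jp'}

def get_domain(url):
    # Without a scheme or leading '//', urlsplit treats the host as a path.
    if '//' not in url:
        url = '//' + url
    host = urlsplit(url).hostname or ''
    labels = host.split('.')
    # Keep one extra label when the last two labels form a known suffix.
    keep = 3 if '.'.join(labels[-2:]) in TWO_LABEL_SUFFIXES else 2
    return '.'.join(labels[-keep:])
```

With this sketch, get_domain('http://www.ebay.co.uk/') gives 'ebay.co.uk'
and get_domain('http://www.cnn.com') gives 'cnn.com'. Any suffix missing
from the sample set falls back to the last two labels.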



