help with link parsing?

Wed Dec 22 11:56:11 EST 2010

On Dec 22, 4:24 pm, "Colin J. Williams" <cjwilliam... at gmail.com>
wrote:
> On 21-Dec-10 12:22 PM, Jon Clements wrote:
>
> > import lxml.html
> > from urlparse import urlsplit
>
> > doc = lxml.html.parse('http://www.google.com')
> > print map(urlsplit, doc.xpath('//a/@href'))
>
> > [SplitResult(scheme='http', netloc='www.google.co.uk', path='/imghp',
> > query='hl=en&tab=wi', fragment=''), SplitResult(scheme='http',
> > netloc='video.google.co.uk', path='/', query='hl=en&tab=wv',
> > fragment=''), SplitResult(scheme='http', netloc='maps.google.co.uk',
> > path='/maps', query='hl=en&tab=wl', fragment=''),
> > SplitResult(scheme='http', netloc='news.google.co.uk', path='/nwshp',
> > query='hl=en&tab=wn', fragment=''), ...]
>
> Jon,
>
> What version of Python was used to run this?
>
> Colin W.

2.6.5. Note that lxml is not part of the standard library, though, and
needs to be installed separately.
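If installing lxml isn't an option, roughly the same idea can be done with
the standard library alone. The following is a minimal sketch, not a drop-in
replacement for the lxml version: it swaps in Python 3's html.parser.HTMLParser
to collect every <a href> (approximating the //a/@href XPath) and runs
urllib.parse.urlsplit over each one. It works on an HTML string rather than
fetching a URL, and the names HrefCollector, split_links and the sample markup
are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HrefCollector(HTMLParser):
    """Hypothetical helper: collect the href of every <a> tag (like //a/@href)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def split_links(html_text):
    """Return a urlsplit() result for each <a href> found in html_text."""
    parser = HrefCollector()
    parser.feed(html_text)
    parser.close()
    return [urlsplit(href) for href in parser.hrefs]


# Made-up sample: one real link plus an anchor without an href (skipped).
sample = ('<p><a href="http://maps.google.co.uk/maps?hl=en&amp;tab=wl">Maps</a> '
          '<a name="top">no link</a></p>')

for result in split_links(sample):
    print(result)
```

HTMLParser unescapes attribute values, so the &amp; in the markup comes back as
a plain & in the query. To work on a live page you could read it with
urllib.request.urlopen and decode it first; relative hrefs would also want
urllib.parse.urljoin against the page URL before splitting.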