help with link parsing?

Colin J. Williams cjwilliams43 at gmail.com
Wed Dec 22 11:24:03 EST 2010


On 21-Dec-10 12:22 PM, Jon Clements wrote:
> import lxml.html
> from urlparse import urlsplit
>
> doc = lxml.html.parse('http://www.google.com')
> print map(urlsplit, doc.xpath('//a/@href'))
>
> [SplitResult(scheme='http', netloc='www.google.co.uk', path='/imghp',
> query='hl=en&tab=wi', fragment=''), SplitResult(scheme='http',
> netloc='video.google.co.uk', path='/', query='hl=en&tab=wv',
> fragment=''), SplitResult(scheme='http', netloc='maps.google.co.uk',
> path='/maps', query='hl=en&tab=wl', fragment=''),
> SplitResult(scheme='http', netloc='news.google.co.uk', path='/nwshp',
> query='hl=en&tab=wn', fragment=''), ...]

Jon,

What version of Python was used to run this?
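
For reference, the quoted snippet uses Python 2 spellings: the `print` statement and the `urlparse` module, which became `urllib.parse` in Python 3. Below is a minimal stdlib-only sketch of the same idea for Python 3. It swaps lxml and its XPath `//a/@href` for `html.parser.HTMLParser`, and it parses a sample string instead of fetching a live page. The names `HrefCollector`, `split_links` and the sample markup are illustrative assumptions, not taken from the original thread.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (roughly //a/@href)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; entity references such as
        # &amp; in attribute values are already unescaped by the parser
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


def split_links(html_text):
    """Return one urlsplit() result per <a href=...> found in html_text."""
    parser = HrefCollector()
    parser.feed(html_text)
    parser.close()
    # map() is lazy in Python 3, so build a real list before printing
    return [urlsplit(href) for href in parser.hrefs]


if __name__ == "__main__":
    # Hypothetical sample markup standing in for a fetched page
    sample = (
        '<p><a href="http://www.google.co.uk/imghp?hl=en&amp;tab=wi">Images</a>'
        '<a href="http://maps.google.co.uk/maps?hl=en&amp;tab=wl">Maps</a></p>'
    )
    print(split_links(sample))
```

Unlike lxml, `html.parser` builds no tree and has no XPath support. That is enough here because only the `href` attributes of `<a>` tags are needed.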

Colin W.


