Extracting links!

Max M maxm at mxm.dk
Tue Feb 11 03:42:49 EST 2003


Muhammad wrote:
> I'm building a search engine for our internal sites in Perl.
> The engine uses an index (word->urls). I wrote a Perl indexing script
> that indexes all the pages on a site and started running it on our
> server, but it was so heavy that it took a long time and sometimes
> stalled the server.


The file I have attached here contains a module that might be of some help.

It parses an HTML file for links.

For example, you can fairly easily make it return all the links in a page
as absolute URLs:

lp = LinkParser(slashdot_source)
absolute_urls = lp.relative2abs('http://www.slashdot.org/').values()
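
The same idea can be sketched with only the standard library. This is not the attached LinkParser.py: the class name SimpleLinkParser and its absolute_links method are hypothetical, it only looks at href attributes on <a> tags, and it assumes Python 3's html.parser and urllib.parse modules.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SimpleLinkParser(HTMLParser):
    """Hypothetical stand-in for the attached module: collects <a href> values."""

    def __init__(self, source):
        super().__init__()
        self.links = []
        # Parse the whole document up front, so the links are ready after construction.
        self.feed(source)

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def absolute_links(self, base_url):
        # urljoin resolves relative links against base_url and
        # leaves links that are already absolute unchanged.
        return [urljoin(base_url, link) for link in self.links]


page = (
    '<a href="/about">About</a> '
    '<a href="faq.html">FAQ</a> '
    '<a href="http://example.org/">Elsewhere</a>'
)
parser = SimpleLinkParser(page)
absolute_urls = parser.absolute_links("http://www.slashdot.org/")
```

For an indexer, resolving every link to an absolute URL first makes it easy to keep a set of pages already visited, so no page is fetched twice.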

-- 

hilsen/regards Max M Rasmussen, Denmark

http://www.futureport.dk/
The future, science, skepticism and transhumanism
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: LinkParser.py
URL: <http://mail.python.org/pipermail/python-list/attachments/20030211/ed58b8ce/attachment.ksh>

