Extracting links!
Max M
maxm at mxm.dk
Tue Feb 11 03:42:49 EST 2003
Muhammad wrote:
> I'm building a search engine for our interior sites in Perl.
> The engine uses index (word->urls), I built an index perl script that
> indexes all the pages on a site, and start running it in our server,
> but it was so heavy, took a lot of time and stacked the server
> sometimes.
The file attached here contains a module that might be of some help.
It parses an HTML file for links.
For example, you can fairly easily make it return all links in a page as
absolute URLs:
lp = LinkParser(slashdot_source)
absolute_urls = lp.relative2abs('http://www.slashdot.org/').values()
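
The attached LinkParser.py is not reproduced in this message, so the following is only a rough sketch of the same idea using the modern standard library, not the module's actual code. The names SimpleLinkParser and extract_absolute_links are made up for illustration, and the sample HTML is invented. It assumes Python 3, with html.parser for the parsing and urllib.parse.urljoin for resolving relative links against a base URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class SimpleLinkParser(HTMLParser):
    """Hypothetical stand-in for LinkParser: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def extract_absolute_links(html_source, base_url):
    """Return every <a href> in html_source, resolved against base_url."""
    parser = SimpleLinkParser()
    parser.feed(html_source)
    parser.close()
    return [urljoin(base_url, href) for href in parser.links]


# Invented sample page with root-relative, absolute and relative links.
sample = (
    '<p><a href="/faq.shtml">FAQ</a> '
    '<a href="http://example.com/x">elsewhere</a> '
    '<a href="story/1.html">story</a></p>'
)
links = extract_absolute_links(sample, "http://www.slashdot.org/")
for url in links:
    print(url)
```

For an indexer, extracting links this way once per page and feeding them into a queue is usually much cheaper than re-scanning whole pages with regular expressions.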
--
hilsen/regards Max M Rasmussen, Denmark
http://www.futureport.dk/
The future, science, skepticism and transhumanism
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: LinkParser.py
URL: <http://mail.python.org/pipermail/python-list/attachments/20030211/ed58b8ce/attachment.ksh>