Open source web crawler with mysql integration

Support Desk mike at ipglobal.net
Fri Apr 10 10:28:28 EDT 2009


Sounds interesting. When it's done, would you care to share it?

Sincerely,
Michael H.
 
-----Original Message-----
From: Philip Semanchuk [mailto:philip at semanchuk.com] 
Sent: Thursday, April 09, 2009 9:46 PM
To: Python
Subject: Re: Open source web crawler with mysql integration


On Apr 9, 2009, at 7:37 PM, Daniel Fetchinson wrote:

>> I'm looking for a crawler that can spider my site and toss the results
>> into mysql so, in turn, that database can be indexed by Sphinx Search.
>>
>> Since I don't want to reinvent the wheel, is anyone aware of any open
>> source projects or code snippets that can already handle this?
>
> Have a look at http://nikitathespider.com/python/


As the author of Nikita, I can say that (a) she used Postgres and (b)  
the code wasn't open sourced except for a couple of small parts. The  
service is now defunct. It wasn't making money. Ideally I'd like to  
open source the code one day, but it would take a lot of documentation  
work to make it installable by others, and I won't have the time to do  
that for the foreseeable future.

At the URL provided there's a nice module for parsing robots.txt files  
(better than the one in the standard library IMHO) but that's about it.
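
For comparison, here is roughly how the standard library parser is used.
This is only a minimal sketch, assuming Python 3, where the module is
urllib.robotparser (in Python 2 it was the top-level robotparser module).
The robots.txt content, the "ExampleBot" user agent and the example.com
URLs are made up for illustration, and none of this reflects the API of
the module at the URL above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def make_parser(robots_text):
    """Build a parser from robots.txt text instead of fetching a URL."""
    parser = RobotFileParser()
    # parse() expects an iterable of lines.
    parser.parse(robots_text.splitlines())
    return parser


parser = make_parser(ROBOTS_TXT)

# Ask whether a (made-up) crawler may fetch each URL.
print(parser.can_fetch("ExampleBot", "http://example.com/index.html"))
print(parser.can_fetch("ExampleBot", "http://example.com/private/page.html"))
```

A crawler would normally call set_url() and read() to fetch the live
file; feeding the text to parse() just keeps the sketch self-contained.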

FYI, I wrote my spider in Python because I couldn't find a decent one  
written in Python. There's Nutch, but that's not Python (Java I think).

Good luck
Philip





