[BangPypers] Harvestman error

Anand Balachandran Pillai abpillai at gmail.com
Mon May 31 11:46:35 CEST 2010


On Sun, May 30, 2010 at 9:56 PM, JAGANADH G <jaganadhg at gmail.com> wrote:

> Dear All, I was trying to run Harvestman (a Python tool for web harvesting).
> I got the following error:
> http://pastebin.com/uPzUs0Xw
>
> My configuration file is http://pastebin.com/dfhiy2Q6
>
> Can anybody help me with this?
>
> I was trying to harvest my blog with a word filter 'Python'.
>

 There is no word filter anymore. You hit a bug where the old
 word-filter code still seems to be applied :)

 To filter on words or regular expressions in the page content,
 you can implement a custom crawler. It is pretty easy, and a sample
 already exists: look for "searchingcrawler.py" inside the apps/samples
 folder and modify the code to suit the keyword(s) you want to filter.
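
 As a rough illustration of the keyword check itself, here is a minimal
 standalone sketch. It does not use the Harvestman API and is not taken
 from the sample; the helper names build_filter and page_matches are
 made up for this example, and whole-word, case-insensitive matching is
 an assumption about what you want. You would call something like
 page_matches from whatever hook your custom crawler provides.

```python
import re


def build_filter(keywords, ignore_case=True):
    """Compile one whole-word pattern per keyword (hypothetical helper)."""
    flags = re.IGNORECASE if ignore_case else 0
    # re.escape keeps characters such as '+' or '.' in a keyword literal.
    return [re.compile(r"\b" + re.escape(word) + r"\b", flags)
            for word in keywords]


def page_matches(content, patterns):
    """Return True if any compiled pattern occurs in the page text."""
    return any(pattern.search(content) for pattern in patterns)


if __name__ == "__main__":
    patterns = build_filter(["Python"])
    for page in ("<p>Notes on python generators</p>",
                 "<p>Notes on Ruby blocks</p>"):
        verdict = "keep" if page_matches(page, patterns) else "skip"
        print(verdict, page)
```

 To filter on a regular expression instead of plain keywords, compile
 your own pattern with re.compile and pass it in the list.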


>
> --
> **********************************
> JAGANADH G
> http://jaganadhg.freeflux.net/blog



-- 
--Anand

