[melbourne-pug] Joblib question

paul sorenson news02 at metrak.com
Fri Mar 9 20:33:02 EST 2018


Mike,

Are there unique features of joblib that you need to use?
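
If joblib itself is the requirement, the TypeError in the quoted code appears to come from handing delayed() the whole generator expression. delayed() expects a single function, with the arguments supplied in a second call, and the generator is passed to the Parallel object instead. A minimal sketch of that pattern, under some assumptions: the two stand-in functions are placeholders so the example runs on its own (the real create_useful_link() also takes the Link class), and link_and_scrape() is a hypothetical helper that is not in the original code.

```python
from joblib import Parallel, delayed


# Placeholder for the real function: fills the gap in a URL template and
# returns None when no link can be built. (Assumed behaviour, for illustration.)
def create_useful_link(substance, db):
    return db.format(name=substance) if "{name}" in db else None


# Placeholder for the real scraper, which fetches the page and creates records.
def scrape_db(substance, link, db):
    return f"scraped {link}"


# Hypothetical helper: all the work for one database, run inside a worker.
def link_and_scrape(substance, db):
    link = create_useful_link(substance, db)
    return scrape_db(substance, link, db) if link else None


def make_links(substance, databases, n_jobs=-2):
    # delayed() wraps the function only; its arguments go in the second call.
    # The generator expression is the argument to Parallel(...), not delayed().
    return Parallel(n_jobs=n_jobs)(
        delayed(link_and_scrape)(substance, db) for db in databases
    )


if __name__ == "__main__":
    databases = [
        "https://example.org/a?q={name}",
        "https://example.org/b",
        "https://example.org/c?q={name}",
    ]
    print(make_links("benzene", databases))
```

n_jobs=-2 means "all CPUs but one". With process-based workers, everything passed to link_and_scrape() has to be picklable, so it may be simpler to pass a primary key than the instance itself.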

Scraping web pages is mostly waiting on the network, so it is often a
good candidate for asyncio-based models.
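
A rough sketch of what the asyncio shape could look like, using only the standard library. Everything here is illustrative: fake_fetch() is a placeholder that sleeps instead of making a request (a real version would need an async HTTP client), and scrape_one() and scrape_all() are hypothetical names. asyncio.run() needs Python 3.7 or later.

```python
import asyncio


async def fake_fetch(url):
    # Placeholder for a real non-blocking HTTP request.
    await asyncio.sleep(0.01)
    return f"<html>{url}</html>"


async def scrape_one(url, fetch, semaphore):
    async with semaphore:  # cap the number of requests in flight
        page = await fetch(url)
    return url, len(page)  # stand-in for the real parsing step


async def scrape_all(urls, fetch=fake_fetch, limit=5):
    semaphore = asyncio.Semaphore(limit)
    tasks = [scrape_one(url, fetch, semaphore) for url in urls]
    # gather() runs the coroutines concurrently and keeps the input order.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    urls = [f"https://example.org/db{i}" for i in range(3)]
    print(asyncio.run(scrape_all(urls)))
```

All of this runs on one core in one thread; the gain comes from overlapping the time spent waiting for each site to respond.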

cheers


On 03/08/2018 11:41 PM, Mike Dewhirst wrote:
> https://media.readthedocs.org/pdf/joblib/latest/joblib.pdf
>
> I'm trying to make the following code run in parallel on separate CPU
> cores but haven't had any success.
>
>     def make_links(self):
>         for db in databases:
>             link = create_useful_link(self, Link, db)
>             if link:
>                 scrape_db(self, link, db)
>
> This is a web scraper which is working nicely in a leisurely
> sequential manner.  databases is a list of urls with gaps to be filled
> by create_useful_link() which makes a link record from the Link class.
> The self instance is a source of attributes for filling the url gaps.
> self is a chemical substance and the link record url field when
> clicked in a browser will bring up that external website with the
> chemical substance selected for researching by the viewer. If
> successful, we then fetch the external page and scrape a bunch of
> interesting data from it and turn that into substance notes.
> scrape_db() doesn't return anything but it does create up to nine
> other records.
>
>         from joblib import Parallel, delayed
>
>         class Substance( etc ..
>             ...
>             def make_links(self):
>                 #Parallel(n_jobs=-2)(delayed(
>                 #    scrape_db(self, create_useful_link(self, Link, db), db) for db in databases
>                 #))
>
> I'm getting a TypeError from Parallel delayed() - can't pickle
> generator objects
>
> So my question is how to write the commented code properly? I suspect
> I haven't done enough comprehension.
>
> Thanks for any help
>
> Mike
>
>
> _______________________________________________
> melbourne-pug mailing list
> melbourne-pug at python.org
> https://mail.python.org/mailman/listinfo/melbourne-pug
