Creating objects in thread created with subclass

Nick Arnett narnett at mccmedia.com
Tue Apr 9 11:12:10 EDT 2002


Aahz & all,

I'm working from Aahz's example of a thread pool spider
(http://starship.python.net/crew/aahz/IPC9/ThreadPoolSpider.py) and I'm
uncertain about whether I should continue with the approach that I used in a
single-threaded version.  In my original, I create an object that knows how
to do the retrieval of pages and some processing (this spider is not
recursive in the usual sense of following links, but it is somewhat
similar).  After it retrieves a series of pages and extracts the data I
want, it returns a list of tuples that I pass to another object that either
directly inserts it into a MySQL database or writes a bulk insert file.  So
I already have a producer-consumer pattern of sorts.
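The division of labor described above can be sketched with the standard library's thread-safe queue, which does the locking between threads. This is a minimal Python 3 sketch, not code from the thread pool example. In the Python of the time the module was spelled `Queue`. `extract_rows`, `producer`, `consumer`, `run_pipeline`, and the in-memory `inserted` list are hypothetical. The list stands in for the MySQL insert or bulk-insert file.

```python
import queue
import threading

def extract_rows(url):
    # Hypothetical stand-in for retrieving a page and extracting data:
    # returns a list of tuples, as the retriever object does.
    return [(url, len(url))]

def producer(urls, out_q):
    # Each producer pushes one list of tuples per URL onto the shared queue.
    for url in urls:
        out_q.put(extract_rows(url))

def consumer(in_q, sink):
    # A single consumer owns the "database"; None is the shutdown sentinel.
    while True:
        rows = in_q.get()
        if rows is None:
            break
        sink.extend(rows)  # stand-in for a MySQL insert or bulk-insert file

def run_pipeline(url_batches):
    q = queue.Queue()
    inserted = []
    writer = threading.Thread(target=consumer, args=(q, inserted))
    writer.start()
    producers = [threading.Thread(target=producer, args=(batch, q))
                 for batch in url_batches]
    for t in producers:
        t.start()
    for t in producers:
        t.join()
    q.put(None)  # all producers are done; tell the consumer to stop
    writer.join()
    return inserted

if __name__ == "__main__":
    rows = run_pipeline([["http://a.example/1", "http://a.example/2"],
                         ["http://b.example/1"]])
    print(sorted(rows))
```

Only the consumer thread touches the "database" here. In a real spider, the producer threads would never need to share a connection.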

In my original, I'd create the retriever object, set its attributes, then call
the method that does the work, which returns the data to pass to the
database insertion object.  Here's my question -- in the "run" method of
Aahz's example, if I create my retriever object in an ordinary way, then all
of the threads will be sharing the same object, right?  And that most
definitely will not work.  So, I could create objects with 'eval' or 'exec'
to generate unique names for each thread -- but is that the right way to do
it?  Or should I stop using objects there and call the functions from my
classes directly?  I haven't looked at a whole lot of examples, but so far,
the latter seems to be the way people do it.
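The following sketch illustrates how Python scoping interacts with threads. An object created inside a thread's `run` method is bound to a local variable of that call, so each thread gets its own instance without any `eval`/`exec` naming. Only objects reachable from shared state are shared: module globals, class attributes, or a single instance handed to every thread. This is a minimal Python 3 sketch. `Retriever`, `Worker`, and `run_workers` are hypothetical and only roughly shaped like a pool of worker threads; they are not Aahz's actual code.

```python
import queue
import threading

class Retriever:
    # Hypothetical stand-in for the subclassed retrieval object.
    def __init__(self, url):
        self.url = url
        self.rows = []  # per-instance state

    def fetch(self):
        # Placeholder for page retrieval and extraction.
        self.rows.append((self.url, threading.current_thread().name))
        return self.rows

class Worker(threading.Thread):
    def __init__(self, in_q, out_q):
        super().__init__()
        self.in_q = in_q
        self.out_q = out_q

    def run(self):
        while True:
            url = self.in_q.get()
            if url is None:  # sentinel: no more work
                break
            # A new, local Retriever per task; no other thread can reach it.
            retriever = Retriever(url)
            self.out_q.put(retriever.fetch())

def run_workers(urls, n_threads=3):
    in_q, out_q = queue.Queue(), queue.Queue()
    workers = [Worker(in_q, out_q) for _ in range(n_threads)]
    for w in workers:
        w.start()
    for url in urls:
        in_q.put(url)
    for _ in workers:
        in_q.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    results = []
    while not out_q.empty():  # safe: every worker has finished
        results.append(out_q.get())
    return results
```

If one `Retriever` were created outside the workers and passed to all of them, its `rows` list would keep growing across tasks. With per-task instances, every result holds exactly one row.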

The retrieval object is a subclass of a subclass; the top-level class knows
a lot about generating URLs, analyzing data, etc.  Thus, I think it'll be a
bit of work to rearrange it into functions, and I'd lose some of the
elegance of the subclassing, which I'd rather not do.

If it helps, the reason for subclassing is that this spider knows how to
retrieve some specific types of Web pages.  The rules for recursing and
analyzing them vary by Web site, but they have a lot in common.
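A hierarchy like the one described above maps onto a template-method pattern. The base class drives URL generation, retrieval, and analysis, and each site-specific subclass overrides only the rules that differ. This is a minimal Python 3 sketch. All class and method names are hypothetical, and `fetch_page` is a stub rather than real HTTP. Each task builds its own instance, so the subclasses can stay as they are, provided they don't keep mutable state in class attributes or module globals.

```python
from concurrent.futures import ThreadPoolExecutor

class BaseSpider:
    """Shared logic: URL generation, retrieval loop, row extraction."""
    base_url = None  # set by each site-specific subclass (read-only)

    def __init__(self, pages):
        self.pages = pages  # per-instance state, never shared between threads

    def make_urls(self):
        return [f"{self.base_url}/page/{n}" for n in range(1, self.pages + 1)]

    def fetch_page(self, url):
        # Stub: a real spider would download the page here.
        return f"<html>{url}</html>"

    def parse(self, url, html):
        # Hook: subclasses supply the site-specific analysis rules.
        raise NotImplementedError

    def run(self):
        return [self.parse(url, self.fetch_page(url)) for url in self.make_urls()]

class ExampleSiteSpider(BaseSpider):
    base_url = "http://example.com"

    def parse(self, url, html):
        return (url, len(html))

class OtherSiteSpider(BaseSpider):
    base_url = "http://example.org"

    def make_urls(self):
        # This hypothetical site only needs its index page.
        return [f"{self.base_url}/index"]

    def parse(self, url, html):
        return (url, "other")

def crawl(spider_cls, pages):
    # Each task constructs its own spider instance.
    return spider_cls(pages).run()

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(crawl, ExampleSiteSpider, 2),
                   pool.submit(crawl, OtherSiteSpider, 5)]
        for f in futures:
            print(f.result())
```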

Thanks!

Nick
