threading - race condition?

skunkwerk skunkwerk at gmail.com
Mon May 12 15:29:16 EDT 2008


On May 11, 1:55 pm, Dennis Lee Bieber <wlfr... at ix.netcom.com> wrote:
> On Sun, 11 May 2008 09:16:25 -0700 (PDT), skunkwerk
> <skunkw... at gmail.com> declaimed the following in comp.lang.python:
>
>
>
> > the only issue I have now is that it takes a long time for 100 threads
> > to initialize that connection (>5 minutes) - and as I'm doing this on
> > a webserver, any time I update the code I have to restart all those
> > threads, which I'm doing right now in a for loop.  Is there any way I
> > can keep the thread stuff separate from the rest of the code for this
> > file, yet allow access?  It wouldn't help having a .pyc or using
> > Psyco, correct, as the time is being spent in the runtime?  Something
> > along the lines of 'start a new thread every minute until you get to
> > 100' without blocking the execution of the rest of the code in that
> > file?  Or maybe, any time I need to do a search, start a new thread if
> > the number of threads is < 100?
>
>         Is this running as part of the server process, or as a client
> accessing the server?
>
>         Alternative question: Have you tried measuring the performance using
> /fewer/ threads... 25 or less? I believe I'd mentioned prior that you
> seem to have a lot of overhead code for what may be a short query.
>
>         If the .get_item() code is doing a full sequence of: connect to
> database; format&submit query; fetch results; disconnect from
> database... I'd recommend putting the connect/disconnect outside of the
> thread while loop (though you may then need to put sentinel values into
> the feed queue -- one per thread -- so they can cleanly exit and
> disconnect rather than relying on daemonization for exit).
>
> thread:
>         dbcon = ...
>         while True:
>                 query = Q.get()
>                 if query == SENTINEL: break
>                 result = get_item(dbcon, query)
>                 ...
>         dbcon.close()
>
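[A runnable version of that sentinel pattern might look like the following; sqlite3 and the trivial get_item() here are stand-ins for the real per-thread connection and lookup code, not the poster's actual implementation:]

```python
import queue
import sqlite3
import threading

SENTINEL = object()  # unique marker; safer than comparing against a string

def get_item(dbcon, term):
    # placeholder query; the real get_item() would do the actual lookup
    cur = dbcon.execute("SELECT ?", (term,))
    return cur.fetchone()[0]

def worker(q, results):
    dbcon = sqlite3.connect(":memory:")   # connect once, outside the loop
    while True:
        query = q.get()
        if query is SENTINEL:             # clean shutdown signal
            break
        results.append(get_item(dbcon, query))
    dbcon.close()                         # disconnect once, on exit

q = queue.Queue()
results = []
threads = [threading.Thread(target=worker, args=(q, results)) for _ in range(4)]
for t in threads:
    t.start()
for term in ["a", "b", "c"]:
    q.put(term)
for _ in threads:
    q.put(SENTINEL)                       # one sentinel per thread, as noted above
for t in threads:
    t.join()
```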
>         Third alternative: Find some way to combine the database queries.
> Rather than 100 threads each doing a single lookup (from your code, it
> appears that only 1 result is expected per search term), run 10 threads
> each looking up 10 items at once...
>
> thread:
>         dbcon = ...
>         terms = []
>         terminate = False
>         while not terminate:
>                 while len(terms) < 10:
>                         try:
>                                 query = Q.get_nowait()
>                         except Queue.Empty:
>                                 break
>                         if query == SENTINEL:
>                                 terminate = True
>                                 break
>                         terms.append(query)
>                 if terms:
>                         results = get_item(dbcon, terms)
>                 terms = []
>                 #however you are returning items; match the query term to the
>                 #key item in the list of returned data?
>         dbcon.close()
>
> where the final select statement looks something like:
>
> SQL = """select key, title, scraped from ***
>                         where key in ( %s )""" % ", ".join("?" for x in terms)
>         #assumes database adapter uses ? for placeholder
> dbcur.execute(SQL, terms)
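[A runnable sketch of that batching idea, filling in the details: Queue.get_nowait() signals an empty queue by raising queue.Empty, and the IN clause gets one placeholder per term. The sqlite3 database, the items table, and batch_worker() are illustrative stand-ins, not the code under discussion:]

```python
import queue
import sqlite3

SENTINEL = None

def batch_worker(q, batch_size=10):
    dbcon = sqlite3.connect(":memory:")
    dbcon.execute("CREATE TABLE items (key TEXT, title TEXT)")
    dbcon.executemany("INSERT INTO items VALUES (?, ?)",
                      [("k1", "t1"), ("k2", "t2"), ("k3", "t3")])
    out = {}
    terminate = False
    while not terminate:
        terms = []
        while len(terms) < batch_size:
            try:
                query = q.get_nowait()
            except queue.Empty:      # queue drained; run what we have
                break
            if query is SENTINEL:
                terminate = True
                break
            terms.append(query)
        if terms:                    # skip the query when the batch is empty
            sql = ("SELECT key, title FROM items WHERE key IN (%s)"
                   % ", ".join("?" for _ in terms))
            for key, title in dbcon.execute(sql, terms):
                out[key] = title     # match results back to terms by key
    dbcon.close()
    return out

q = queue.Queue()
for term in ["k1", "k3"]:
    q.put(term)
q.put(SENTINEL)
found = batch_worker(q)
```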
> --
>         Wulfraed        Dennis Lee Bieber               KD6MOG
>         wlfr... at ix.netcom.com         wulfr... at bestiaria.com
>                 HTTP://wlfraed.home.netcom.com/
>         (Bestiaria Support Staff:               web-a... at bestiaria.com)
>                 HTTP://www.bestiaria.com/

Thanks again Dennis,
   I chose 100 threads so I could do 10 simultaneous searches (where
each search contains 10 terms - using 10 threads).  The .get_item()
code is not doing the database connection - rather, the connection is
made during each thread's initialization.  So basically, once a
thread starts, the database connection is persistent and .get_item()
queries are very fast.  This is running as a server process (using
Django).
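[For the startup-cost question in the quoted message - starting a new thread only when a search arrives and the pool is below its cap - one lazy-growth sketch might look like this. The names (worker, submit, MAX_THREADS) and the in-thread work are illustrative assumptions, not the original code:]

```python
import queue
import threading

MAX_THREADS = 100
work_q = queue.Queue()
pool = []

def worker():
    # the per-thread database connection would be opened here, once
    while True:
        term = work_q.get()
        if term is None:          # shutdown signal
            break
        # ... get_item() lookup on the persistent connection goes here ...

def submit(term):
    # grow the pool only on demand, so nothing pays the full
    # 100-thread startup cost up front
    if len(pool) < MAX_THREADS:
        t = threading.Thread(target=worker, daemon=True)
        t.start()
        pool.append(t)
    work_q.put(term)

for term in ["foo", "bar"]:
    submit(term)

# clean shutdown: one sentinel per worker, then wait for them
for _ in pool:
    work_q.put(None)
for t in pool:
    t.join()
```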

cheers


