Overcoming python performance penalty for multicore CPU

Steve Holden steve at holdenweb.com
Wed Feb 3 22:50:19 EST 2010


John Nagle wrote:
> Paul Rubin wrote:
>> John Nagle <nagle at animats.com> writes:
>>> Analysis of each domain is
>>> performed in a separate process, but each process uses multiple
>>> threads to read and process several web pages simultaneously.
>>>
>>>    Some of the threads go compute-bound for a second or two at a time as
>>> they parse web pages.  
>>
>> You're probably better off using separate processes for the different
>> pages.  If I remember, you were using BeautifulSoup, which while very
>> cool, is pretty doggone slow for use on large volumes of pages.  I don't
>> know if there's much that can be done about that without going off on a
>> fairly messy C or C++ coding adventure.  Maybe someday someone will do
>> that.
> 
>    I already use separate processes for different domains.  I could
> live with Python's GIL as long as moving to a multicore server
> doesn't make performance worse.  That's why I asked about CPU dedication
> for each process, to avoid thrashing at the GIL.
> 
I believe it's already been said that the GIL thrashing is mostly specific
to Mac OS X. You might also find something in the affinity module

  http://pypi.python.org/pypi/affinity/0.1.0

to ensure that each process in your pool runs on only one processor.
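
As a rough illustration of pinning each pool process to one processor,
here is a minimal sketch. It does not use the affinity module itself;
it assumes Linux and Python 3.3 or later, where the standard library
exposes os.sched_setaffinity. The helper names (pin_current_process,
_init_worker, make_pinned_pool) are made up for this example, and on
platforms without os.sched_setaffinity the pinning is silently skipped.

```python
import os
import multiprocessing as mp


def pin_current_process(cpu):
    """Restrict the calling process to one CPU and return the resulting CPU set.

    Returns None without doing anything where os.sched_setaffinity is
    unavailable (it is Linux-specific in the standard library).
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, {cpu})  # pid 0 means "the calling process"
    return os.sched_getaffinity(0)


def _init_worker(counter, cpus):
    # Runs once in each pool worker: take the next CPU, round-robin.
    with counter.get_lock():
        index = counter.value
        counter.value += 1
    pin_current_process(cpus[index % len(cpus)])


def make_pinned_pool(processes):
    """Create a Pool whose workers are each pinned to a single CPU.

    Hypothetical helper for illustration, not part of any library.
    """
    if hasattr(os, "sched_getaffinity"):
        cpus = sorted(os.sched_getaffinity(0))  # CPUs this process may use
    else:
        cpus = list(range(os.cpu_count() or 1))
    counter = mp.Value("i", 0)  # shared counter handed to each worker
    return mp.Pool(processes, initializer=_init_worker,
                   initargs=(counter, cpus))
```

A pool built with make_pinned_pool(4) is used like any other
multiprocessing.Pool (map, apply_async and so on). Whether pinning
actually helps depends on the workload and the OS scheduler, so it is
worth measuring before and after.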

regards
 Steve
-- 
Steve Holden           +1 571 484 6266   +1 800 494 3119
PyCon is coming! Atlanta, Feb 2010  http://us.pycon.org/
Holden Web LLC                 http://www.holdenweb.com/
UPCOMING EVENTS:        http://holdenweb.eventbrite.com/



