Overcoming python performance penalty for multicore CPU

Paul Rubin no.email at nospam.invalid
Wed Feb 3 20:51:36 EST 2010


John Nagle <nagle at animats.com> writes:
> Analysis of each domain is
> performed in a separate process, but each process uses multiple
> threads to read and process several web pages simultaneously.
>
>    Some of the threads go compute-bound for a second or two at a time as
> they parse web pages.  

You're probably better off using separate processes for the different
pages.  If I remember correctly, you were using BeautifulSoup, which,
while very cool, is pretty doggone slow on large volumes of pages.  I
don't know if there's much that can be done about that without going
off on a fairly messy C or C++ coding adventure.  Maybe someday
someone will do that.
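A minimal sketch of the per-process approach, using multiprocessing.Pool
so the compute-bound parsing runs outside the GIL.  The page data and the
TitleParser class here are hypothetical stand-ins; the stdlib
html.parser is used instead of BeautifulSoup just to keep the example
dependency-free -- in the real program the worker function would call
BeautifulSoup (or fetch the page first).

```python
# Sketch: fan CPU-bound page parsing out to worker processes.
# html.parser stands in for BeautifulSoup; pages are synthetic.
from html.parser import HTMLParser
from multiprocessing import Pool


class TitleParser(HTMLParser):
    """Toy parser that extracts the <title> text from a page."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


def parse_page(html):
    # Module-level so it can be pickled and shipped to worker processes.
    parser = TitleParser()
    parser.feed(html)
    return parser.title


if __name__ == "__main__":
    pages = ["<html><head><title>Page %d</title></head></html>" % i
             for i in range(4)]
    # Each page is parsed in its own worker process, so the parsing
    # runs on separate cores instead of contending for one GIL.
    with Pool(processes=4) as pool:
        titles = pool.map(parse_page, pages)
    print(titles)
```

Whether this wins over threads depends on the page sizes: each page
must be pickled across the process boundary, so the parse has to be
expensive enough to amortize that copy.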



More information about the Python-list mailing list