Python Thread Question

Jp Calderone exarkun at intarweb.us
Thu Apr 17 09:52:15 EDT 2003


On Thu, Apr 17, 2003 at 06:34:11AM -0700, Anand B Pillai wrote:
> Hi Pythonistas
> 
>  I have written this application which is a kind of intranet
>  web-spider. It crawls a given url, retrieves the files at that
>  url, and saves them to disk.
> 
>  Now when I do this using multiple threads (Python threads),
>  assigning each url to a thread I find that the download gets
>  completed faster than if it were in a single thread. I assume
>  that the reason for this must be simple, that when you use 
>  a single thread idiom, the app has to wait till a file is 
>  downloaded. Whereas if you use a thread for each download, 
>  the app can spawn other threads for other downloads, so no 
>  wait is needed. I am firing off a group of threads (limited
>  by a maxthread count) and pooling them in a threadgroup. 
>  Once the threads are fired for download, the app does not
>  try to control them until they finish, are killed, or a network
>  time-out occurs.
> 
>  Ideally speaking, multithreading need not improve the speed
>  of an application but in examples like this which involve
>  bottlenecks like network traffic, it does. My questions about
>  this are:

  Unless you use non-blocking sockets.  But that is another story.

> 
>  1. Do Python threads work only if the native platform supports
>     threading ?  i.e, is Python firing 'C' threads which in turn
>     fire the platform API threads (Win32 for windows/ pthreads for
>     linux etc)?

  Yes.  Python threads are native threads, created through the platform's
own thread API.
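
  One way to see this for yourself is to ask each thread for the ID the
operating system gave it.  The following is a minimal sketch, assuming
Python 3.8 or later (where threading.get_native_id() is available); the
helper name native_ids_of_workers is made up for this example.

```python
import threading


def native_ids_of_workers(count):
    """Start `count` threads and return the OS-level thread ID of each."""
    ids = [None] * count
    barrier = threading.Barrier(count)

    def worker(index):
        # get_native_id() returns the kernel-assigned ID of the calling thread.
        ids[index] = threading.get_native_id()
        # Keep every thread alive until all have reported, so the OS
        # cannot hand a finished thread's ID to a later one.
        barrier.wait()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return ids


if __name__ == "__main__":
    print("main thread:", threading.get_native_id())
    print("workers:    ", native_ids_of_workers(4))
```

  Each worker reports an ID different from the main thread's and from the
other workers', which is what you would expect if every Python thread is
backed by its own platform thread.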

>  2. Can a software API (Win32/pthreads) do multithreading even if
>     the CPU does not support multithreading ? (might seem like a
>     superfluous question when almost all CPUs do in this age, but
>     the question is still valid). Or is multithreading ultimately
>     related to how the CPU handles threads ?

  I am not aware of a modern CPU on which threading cannot be implemented. 
Perhaps a more useful piece of information to give you is this: threading
APIs can be implemented either in a kernel or in a userspace library; they
don't require hardware support.
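
  To make the "no hardware support needed" point concrete, here is a toy
sketch of a userspace scheduler written with plain Python generators.  It is
only an illustration under simplifying assumptions: the tasks switch
cooperatively (each one yields voluntarily), whereas real thread libraries
usually preempt.  The names task and run_round_robin are invented for this
example.

```python
from collections import deque


def task(name, steps, log):
    """A toy 'thread': do one unit of work, then give up control."""
    for i in range(steps):
        log.append((name, i))
        yield  # voluntarily hand control back to the scheduler


def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them finish."""
    ready = deque(tasks)
    while ready:
        current = ready.popleft()
        try:
            next(current)  # resume the task until its next yield
        except StopIteration:
            continue  # task finished; do not reschedule it
        ready.append(current)


if __name__ == "__main__":
    log = []
    run_round_robin([task("a", 2, log), task("b", 3, log)])
    print(log)
```

  The two tasks interleave their steps even though nothing here touches the
kernel's thread API, let alone any CPU feature.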

>  3. Is the apparent increase in speed in my program using multiple
>     threads attributable to the CPU or the platform API or python ?

  It is attributable to none of the above, as far as I can tell.  When you
perform blocking reads, your app ends up spending most of its time idle. 
When you launch 50 threads and do 50 blocking reads, your app spends
slightly less of its time idle, but is still mostly just waiting for those
reads to return something.  The difference is, data is probably coming in
for all 50 of those reads at the same time, maybe from different hosts - so
you get higher network throughput, and so your program seems faster.
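
  The effect is easy to reproduce without a network.  In the sketch below,
time.sleep() stands in for a blocking read (an assumption made so the example
is self-contained); fake_blocking_read, fetch_sequentially, and
fetch_with_threads are hypothetical names, not anything from your spider.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_blocking_read(delay):
    """Stand-in for a blocking network read: the thread simply waits."""
    time.sleep(delay)
    return delay


def fetch_sequentially(delays):
    """Do the reads one after another; return the elapsed seconds."""
    start = time.monotonic()
    for delay in delays:
        fake_blocking_read(delay)
    return time.monotonic() - start


def fetch_with_threads(delays, max_threads):
    """Do the reads in a pool of threads; return the elapsed seconds."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        list(pool.map(fake_blocking_read, delays))
    return time.monotonic() - start


if __name__ == "__main__":
    delays = [0.05] * 8
    print("sequential: %.2fs" % fetch_sequentially(delays))
    print("threaded:   %.2fs" % fetch_with_threads(delays, max_threads=8))
```

  Neither version does any more work than the other.  The threaded one
finishes sooner only because the waits overlap instead of lining up one
behind the other.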

> 
>  4. Can I safely say that multithreading will improve my application
>     performance if it has similar work to do on many resources at the
>     same time ? (e.g.: a web parser/ spider/ a disk-to-disk file copier/
>     directory synchronizer) Or does it depend upon the nature of the
>     task at hand ?

  Nope.  Threading costs you context switches.  A multi-threaded app using
blocking IO will appear "faster" than a single-threaded app doing the same,
but both take longer to run than a well-written single-threaded app using
non-blocking IO.

  Context switches are usually relatively cheap compared to what your app is
actually doing, though, so the difference between doing blocking IO in a
multi-thread app and non-blocking IO in a single-thread app isn't always
obvious.  (Other things associated with multi-threaded apps, such as
deadlocks and race conditions, are, though ;)
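
  For comparison, here is roughly what the single-threaded, non-blocking
approach looks like.  This is a sketch rather than a spider: it uses the
standard selectors module, and local socket pairs stand in for remote hosts
so that it runs without a network.  The function names read_all_nonblocking
and demo are made up for this example.

```python
import selectors
import socket


def read_all_nonblocking(sources):
    """Read from every socket in one thread until each peer closes.

    `sources` maps a label to a connected socket; the result maps each
    label to the bytes received on that socket.
    """
    selector = selectors.DefaultSelector()
    received = {}
    for name, sock in sources.items():
        sock.setblocking(False)
        selector.register(sock, selectors.EVENT_READ, data=name)
        received[name] = b""

    open_count = len(sources)
    while open_count:
        # Wait until at least one socket has data (or has been closed).
        for key, _events in selector.select():
            try:
                chunk = key.fileobj.recv(4096)
            except BlockingIOError:
                continue  # spurious wakeup; try again on the next pass
            if chunk:
                received[key.data] += chunk
            else:
                # An empty read means the peer closed the connection.
                selector.unregister(key.fileobj)
                key.fileobj.close()
                open_count -= 1
    selector.close()
    return received


def demo():
    """Feed three local socket pairs and read them all in one thread."""
    sources = {}
    for name in ("a", "b", "c"):
        reader, writer = socket.socketpair()
        writer.sendall(("payload from " + name).encode())
        writer.close()
        sources[name] = reader
    return read_all_nonblocking(sources)


if __name__ == "__main__":
    print(demo())
```

  One thread services all the sockets, and it only wakes up when some
socket actually has something to read, so there are no locks to get wrong.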

  Hope this helps,

  Jp

-- 
It is practically impossible to teach good programming style to
students that have had prior exposure to BASIC: as potential
programmers they are mentally mutilated beyond hope of
regeneration.        -- Dijkstra
-- 
 up 28 days, 9:02, 5 users, load average: 0.04, 0.03, 0.00




