[Python-checkins] r63759 - in peps/trunk: pep-0000.txt pep-0371.txt
georg.brandl
python-checkins at python.org
Wed May 28 17:28:09 CEST 2008
Author: georg.brandl
Date: Wed May 28 17:28:08 2008
New Revision: 63759
Log:
Add PEP 371, Addition of the processing module, by Jesse Noller/Richard Oudkerk.
Added:
peps/trunk/pep-0371.txt
Modified:
peps/trunk/pep-0000.txt
Modified: peps/trunk/pep-0000.txt
==============================================================================
--- peps/trunk/pep-0000.txt (original)
+++ peps/trunk/pep-0000.txt Wed May 28 17:28:08 2008
@@ -96,6 +96,7 @@
S 364 Transitioning to the Py3K Standard Library Warsaw
S 368 Standard image protocol and class Mastrodomenico
S 369 Post import hooks Heimes
+ S 371 Addition of the Processing module Noller, Oudkerk
S 3134 Exception Chaining and Embedded Tracebacks Yee
S 3135 New Super Spealman, Delaney
S 3138 String representation in Python 3000 Ishimoto
@@ -473,6 +474,7 @@
S 368 Standard image protocol and class Mastrodomenico
S 369 Post import hooks Heimes
SA 370 Per user site-packages directory Heimes
+ S 371 Addition of the Processing module Noller, Oudkerk
SR 666 Reject Foolish Indentation Creighton
SR 754 IEEE 754 Floating Point Special Values Warnes
P 3000 Python 3000 GvR
@@ -607,10 +609,12 @@
Meyer, Mike mwm at mired.org
Montanaro, Skip skip at pobox.com
Moore, Paul gustav at morpheus.demon.co.uk
+ Noller, Jesse jnoller at gmail.com
North, Ben ben at redfrontdoor.org
Norwitz, Neal nnorwitz at gmail.com
Oliphant, Travis oliphant at ee.byu.edu
Orendorff, Jason jason.orendorff at gmail.com
+ Oudkerk, Richard r.m.oudkerk at googlemail.com
Pedroni, Samuele pedronis at python.org
Pelletier, Michel michel at users.sourceforge.net
Peters, Tim tim at zope.com
Added: peps/trunk/pep-0371.txt
==============================================================================
--- (empty file)
+++ peps/trunk/pep-0371.txt Wed May 28 17:28:08 2008
@@ -0,0 +1,345 @@
+PEP: 371
+Title: Addition of the Processing module to standard library
+Version: $Revision: $
+Last-Modified: $Date: $
+Author: Jesse Noller <jnoller at gmail.com>
+ Richard Oudkerk <r.m.oudkerk at googlemail.com>
+Status: Draft
+Type: Standards Track
+Content-Type: text/plain
+Created: 06-May-2008
+Python-Version: 2.6 / 3.0
+Post-History:
+
+
+Abstract
+
+    This PEP proposes the inclusion of the pyProcessing [1] module into the
+    Python standard library.
+
+    The processing module mimics the API of the standard library threading
+    module, providing a process-based approach to "threaded programming" that
+    allows end-users to dispatch multiple tasks which effectively side-step
+    the global interpreter lock.
+
+    The module also provides server and client components for remote sharing
+    and management of objects and tasks, so that applications may not
+    only leverage multiple cores on the local machine, but also distribute
+    objects and tasks across a cluster of networked machines.
+
+ While the distributed capabilities of the module are beneficial, the primary
+ focus of this PEP is the core threading-like API and capabilities of the
+ module.
+
+Rationale
+
+ The current CPython interpreter implements the Global Interpreter Lock (GIL)
+ and barring work in Python 3000 or other versions currently planned [2], the
+ GIL will remain as-is within the CPython interpreter for the foreseeable
+    future.  While the GIL itself enables clean and easy-to-maintain C code
+    for the interpreter and its extension base, it is frequently an issue for
+    those Python programmers who are leveraging multi-core machines.
+
+ The GIL itself prevents more than a single thread from running within the
+    interpreter at any given point in time, effectively removing Python's
+    ability to take advantage of multi-processor systems.  While I/O-bound
+    applications do not suffer the same slow-down when using threading, they do
+ suffer some performance cost due to the GIL.
+
+    The Processing module offers a method to side-step the GIL, allowing
+    applications within CPython to take advantage of multi-core architectures
+    without asking users to completely change their programming paradigm
+    (i.e., dropping threaded programming for another "concurrent" approach
+    such as Twisted).
+
+ The Processing module offers CPython users a known API (that of the
+    threading module), with known semantics and easy scalability.  In the
+    future, the module might not be as relevant should the CPython interpreter
+    enable "true" threading; however, for some applications, forking an OS
+    process may sometimes be more desirable than using lightweight threads,
+ especially on those platforms where process creation is fast/optimized.
+
+ For example, a simple threaded application:
+
+ from threading import Thread as worker
+
+ def afunc(number):
+ print number * 3
+
+ t = worker(target=afunc, args=(4,))
+ t.start()
+ t.join()
+
+    The pyprocessing module mirrors the API so well that, with a simple change
+    of the import to:
+
+ from processing import Process as worker
+
+    the code now executes through the processing.Process class.  This type of
+    compatibility means that, with a minor (in most cases) change in code,
+    users' applications will be able to leverage all cores and processors on a
+    given machine for parallel execution.  In many cases the pyprocessing module
+    is even faster than the normal threading approach for I/O-bound programs.
+    This, of course, takes into account that the pyprocessing module is in
+    optimized C code, while the threading module is not.
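To make the swap concrete, here is a minimal sketch of the full process-based
version.  Since pyprocessing is not yet in the standard library, this sketch
uses the "multiprocessing" name proposed under Open Issues as a stand-in; the
one structural addition is the __main__ guard, which lets child processes
re-import the module safely.

```python
# Sketch of the example above under the process-based API.  Assumption:
# the stdlib "multiprocessing" name (the rename proposed under Open
# Issues) stands in for the external pyprocessing package.
from multiprocessing import Process as worker

def afunc(number):
    print(number * 3)

if __name__ == '__main__':
    # The guard lets child processes re-import this module safely
    # (required on platforms that spawn rather than fork).
    t = worker(target=afunc, args=(4,))
    t.start()
    t.join()
```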
+
+The "Distributed" Problem
+
+    In the discussion on Python-Dev about the inclusion of this module [3]
+    there was confusion about the intentions of this PEP, with some viewing
+    it as an attempt to solve the "Distributed" problem - frequently comparing
+    the functionality of this module with other solutions like MPI-based
+    communication [4], CORBA, or other distributed object approaches [5].
+
+ The "distributed" problem is large and varied. Each programmer working
+ within this domain has either very strong opinions about their favorite
+ module/method or a highly customized problem for which no existing solution
+ works.
+
+    The acceptance of this module does not preclude programmers working on
+    the "distributed" problem from examining other solutions for their
+    problem domain, nor does it recommend any particular one.  The intent of
+    including this module is to provide entry-level capabilities for local
+    concurrency and the basic support to spread that concurrency across a
+    network of machines - although the two are not tightly coupled, the
+    pyprocessing module could, in fact, be used in conjunction with any of
+    the other solutions, including MPI/etc.
+
+    If necessary, it is possible to completely decouple the local concurrency
+    abilities of the module from the network-capable/shared aspects of the
+    module.  Without serious concerns or cause, however, the author of this PEP
+    does not recommend that approach.
+
+Performance Comparison
+
+    As we all know - there are "lies, damned lies, and benchmarks".  These
+    speed comparisons, while aimed at showcasing the performance of the
+    pyprocessing module, are by no means comprehensive or applicable to all
+    possible use cases or environments - especially those platforms where
+    process forking is slow.
+
+ All benchmarks were run using the following:
+ * 4 Core Intel Xeon CPU @ 3.00GHz
+ * 16 GB of RAM
+ * Python 2.5.2 compiled on Gentoo Linux (kernel 2.6.18.6)
+ * pyProcessing 0.52
+
+ All of the code for this can be downloaded from:
+ http://jessenoller.com/code/bench-src.tgz
+
+ The basic method of execution for these benchmarks is in the
+ run_benchmarks.py script, which is simply a wrapper to execute a target
+ function through a single threaded (linear), multi-threaded (via threading),
+ and multi-process (via pyprocessing) function for a static number of
+ iterations with increasing numbers of execution loops and/or threads.
+
+ The run_benchmarks.py script executes each function 100 times, picking the
+ best run of that 100 iterations via the timeit module.
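The harness itself is not reproduced in this PEP, but its shape can be
sketched as follows - a hypothetical reduction, with the stdlib
multiprocessing name standing in for pyprocessing: three drivers run the
same target linearly, via threads, and via processes, and timeit keeps the
best of several runs.

```python
# Hypothetical reduction of the run_benchmarks.py harness described
# above: run a target linearly, via threads, and via processes
# ("multiprocessing" stands in for pyprocessing), timing each with
# timeit and keeping the best of several runs.
import threading
import timeit
from multiprocessing import Process

def non_threaded(target, iters):
    for _ in range(iters):
        target()

def threaded(target, iters):
    workers = [threading.Thread(target=target) for _ in range(iters)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

def processed(target, iters):
    workers = [Process(target=target) for _ in range(iters)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

def best_of(func, runs=3):
    # timeit returns one total time per run; report the fastest.
    return min(timeit.Timer(func).repeat(repeat=runs, number=1))
```

For example, best_of(lambda: threaded(some_func, 8)) corresponds to a
"threaded (8 threads)" row in the result tables.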
+
+    First, to identify the overhead of spawning the workers, we execute
+    a function whose body is simply a pass statement (empty):
+
+ cmd: python run_benchmarks.py empty_func.py
+ Importing empty_func
+ Starting tests ...
+ non_threaded (1 iters) 0.000001 seconds
+ threaded (1 threads) 0.000796 seconds
+ processes (1 procs) 0.000714 seconds
+
+ non_threaded (2 iters) 0.000002 seconds
+ threaded (2 threads) 0.001963 seconds
+ processes (2 procs) 0.001466 seconds
+
+ non_threaded (4 iters) 0.000002 seconds
+ threaded (4 threads) 0.003986 seconds
+ processes (4 procs) 0.002701 seconds
+
+ non_threaded (8 iters) 0.000003 seconds
+ threaded (8 threads) 0.007990 seconds
+ processes (8 procs) 0.005512 seconds
+
+    As you can see, forking processes via the pyprocessing module is faster
+    than building and then executing the threaded version of the code.
+
+    The second test calculates 50,000 Fibonacci numbers inside each worker
+    (isolated and sharing nothing):
+
+ cmd: python run_benchmarks.py fibonacci.py
+ Importing fibonacci
+ Starting tests ...
+ non_threaded (1 iters) 0.195548 seconds
+ threaded (1 threads) 0.197909 seconds
+ processes (1 procs) 0.201175 seconds
+
+ non_threaded (2 iters) 0.397540 seconds
+ threaded (2 threads) 0.397637 seconds
+ processes (2 procs) 0.204265 seconds
+
+ non_threaded (4 iters) 0.795333 seconds
+ threaded (4 threads) 0.797262 seconds
+ processes (4 procs) 0.206990 seconds
+
+ non_threaded (8 iters) 1.591680 seconds
+ threaded (8 threads) 1.596824 seconds
+ processes (8 procs) 0.417899 seconds
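The per-worker computation is not shown in the PEP; an illustrative
shared-nothing Fibonacci loop (a hypothetical sketch, not the benchmark's
actual code) would look like:

```python
# Hypothetical sketch of a shared-nothing Fibonacci worker: each
# worker iterates independently and shares no state with its siblings.
def fib_worker(count):
    a, b = 0, 1
    for _ in range(count):
        a, b = b, a + b
    return a
```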
+
+ The third test calculates the sum of all primes below 100000, again sharing
+ nothing.
+
+    cmd: python run_benchmarks.py crunch_primes.py
+ Importing crunch_primes
+ Starting tests ...
+ non_threaded (1 iters) 0.495157 seconds
+ threaded (1 threads) 0.522320 seconds
+ processes (1 procs) 0.523757 seconds
+
+ non_threaded (2 iters) 1.052048 seconds
+ threaded (2 threads) 1.154726 seconds
+ processes (2 procs) 0.524603 seconds
+
+ non_threaded (4 iters) 2.104733 seconds
+ threaded (4 threads) 2.455215 seconds
+ processes (4 procs) 0.530688 seconds
+
+ non_threaded (8 iters) 4.217455 seconds
+ threaded (8 threads) 5.109192 seconds
+ processes (8 procs) 1.077939 seconds
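As with the Fibonacci test, the worker body is not shown; a hypothetical
sieve-based sketch of "sum of all primes below 100000" is:

```python
# Hypothetical sketch of the crunch_primes worker: sum all primes
# below n using a sieve of Eratosthenes, sharing nothing.
def sum_primes_below(n):
    sieve = [True] * n
    sieve[0] = sieve[1] = False
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            # Mark every multiple of i starting at i*i as composite.
            sieve[i * i::i] = [False] * len(sieve[i * i::i])
    return sum(i for i, is_prime in enumerate(sieve) if is_prime)
```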
+
+
+    The reason tests two and three focus on pure numeric crunching is to
+    showcase how the current threading implementation hinders non-I/O-bound
+    applications.  Obviously, these tests could be improved to use a queue for
+    coordination of results and chunks of work, but that is not required to
+    show the performance of the module.
+
+ The next test is an I/O bound test. This is normally where we see a steep
+ improvement in the threading module approach versus a single-threaded
+    approach.  In this case, each worker opens a descriptor to lorem.txt,
+    randomly seeks within it, and writes lines to /dev/null:
+
+ cmd: python run_benchmarks.py file_io.py
+ Importing file_io
+ Starting tests ...
+ non_threaded (1 iters) 0.057750 seconds
+ threaded (1 threads) 0.089992 seconds
+ processes (1 procs) 0.090817 seconds
+
+ non_threaded (2 iters) 0.180256 seconds
+ threaded (2 threads) 0.329961 seconds
+ processes (2 procs) 0.096683 seconds
+
+ non_threaded (4 iters) 0.370841 seconds
+ threaded (4 threads) 1.103678 seconds
+ processes (4 procs) 0.101535 seconds
+
+ non_threaded (8 iters) 0.749571 seconds
+ threaded (8 threads) 2.437204 seconds
+ processes (8 procs) 0.203438 seconds
+
+    As you can see, pyprocessing is still faster on this I/O operation than
+    using multiple threads - and using multiple threads is itself slower than
+    single-threaded execution.
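A hedged sketch of what such a worker might look like (the benchmark's
lorem.txt is assumed to exist; os.devnull stands in for /dev/null portably,
and the worker name is illustrative):

```python
# Hypothetical sketch of the file_io worker: open a source file,
# seek to random offsets, and write the following line to the null
# device.  Returns the number of bytes written, for inspection.
import os
import random

def io_worker(path, seeks=100):
    written = 0
    size = os.path.getsize(path)
    with open(path, 'rb') as src, open(os.devnull, 'wb') as sink:
        for _ in range(seeks):
            src.seek(random.randrange(size))
            written += sink.write(src.readline())
    return written
```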
+
+    Finally, we will run a socket-based test to show network I/O performance.
+    This function fetches a URL 100 times from a server on the LAN; the page
+    is a simple Tomcat error page.  The network is otherwise idle, over a 10G
+    connection:
+
+ cmd: python run_benchmarks.py url_get.py
+ Importing url_get
+ Starting tests ...
+ non_threaded (1 iters) 0.124774 seconds
+ threaded (1 threads) 0.120478 seconds
+ processes (1 procs) 0.121404 seconds
+
+ non_threaded (2 iters) 0.239574 seconds
+ threaded (2 threads) 0.146138 seconds
+ processes (2 procs) 0.138366 seconds
+
+ non_threaded (4 iters) 0.479159 seconds
+ threaded (4 threads) 0.200985 seconds
+ processes (4 procs) 0.188847 seconds
+
+ non_threaded (8 iters) 0.960621 seconds
+ threaded (8 threads) 0.659298 seconds
+ processes (8 procs) 0.298625 seconds
+
+    Here we finally see threaded performance surpass that of single-threaded
+    execution, but the pyprocessing module is still faster as the number of
+    workers increases.  If you stay with one or two threads/workers, the
+    timing between threads and pyprocessing is fairly close.
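A hedged sketch of the url_get worker (the LAN Tomcat page is not
reproduced here, so any reachable URL stands in; written against the modern
urllib.request for runnability - the 2.x equivalent would be
urllib2.urlopen):

```python
# Hypothetical sketch of the url_get worker: fetch one URL a fixed
# number of times, discarding the body.  Returns the fetch count.
from urllib.request import urlopen

def url_worker(url, count=100):
    fetched = 0
    for _ in range(count):
        resp = urlopen(url)
        resp.read()
        resp.close()
        fetched += 1
    return fetched
```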
+
+ Additional benchmarks can be found in the pyprocessing module's source
+ distribution's examples/ directory.
+
+Maintenance
+
+    Richard M. Oudkerk - the author of the pyprocessing module - has agreed
+    to maintain the module within Python SVN.  Jesse Noller has volunteered
+    to help maintain, document, and test the module as well.
+
+Timing/Schedule
+
+    Some concerns have been raised about the timing/lateness of this PEP
+    for the 2.6 and 3.0 releases this year; however, both the authors
+    and others feel that the functionality this module offers outweighs
+    the risk of inclusion.
+
+    However, taking into account the desire not to destabilize python-core,
+    some refactoring of pyprocessing's code "into" python-core can be deferred
+    until the next 2.x/3.x releases.  This means that the actual risk to
+    python-core is minimal, and largely constrained to the module itself.
+
+Open Issues
+
+    * All existing tests for the module should be converted to unittest format.
+ * Existing documentation has to be moved to ReST formatting.
+ * Verify code coverage percentage of existing test suite.
+ * Identify any requirements to achieve a 1.0 milestone if required.
+ * Verify current source tree conforms to standard library practices.
+ * Rename top-level module from "pyprocessing" to "multiprocessing".
+ * Confirm no "default" remote connection capabilities, if needed enable the
+ remote security mechanisms by default for those classes which offer remote
+ capabilities.
+ * Some of the API (Queue methods qsize(), task_done() and join()) either
+ need to be added, or the reason for their exclusion needs to be identified
+ and documented clearly.
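    For reference, the threading-world coordination pattern those Queue
    methods support looks like this; it is the availability of the same
    calls on pyprocessing's Queue that the item above asks to confirm (the
    sketch below uses only the stdlib queue and threading modules):

```python
# The threading-world Queue pattern referenced above: task_done() and
# join() let a producer block until consumers have handled every item.
import queue
import threading

def consume(q, out):
    while True:
        item = q.get()
        if item is None:          # sentinel: stop consuming
            q.task_done()
            break
        out.append(item * 2)
        q.task_done()

q = queue.Queue()
results = []
t = threading.Thread(target=consume, args=(q, results))
t.start()
for i in range(5):
    q.put(i)
q.put(None)
q.join()                          # returns once every item is task_done()
t.join()
```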
+
+Closed Issues
+
+    * Reliance on ctypes: The pyprocessing module's reliance on ctypes
+      prevents the module from functioning on platforms where ctypes is not
+      supported.  This is not a restriction of this module, but rather of
+      ctypes itself.
+
+References
+
+ [1] PyProcessing home page
+ http://pyprocessing.berlios.de/
+
+ [2] See Adam Olsen's "safe threading" project
+ http://code.google.com/p/python-safethread/
+
+ [3] See: Addition of "pyprocessing" module to standard lib.
+ http://mail.python.org/pipermail/python-dev/2008-May/079417.html
+
+ [4] http://mpi4py.scipy.org/
+
+ [5] See "Cluster Computing"
+ http://wiki.python.org/moin/ParallelProcessing
+
+    [6] The original run_benchmark.py code was published in Python
+        Magazine in December 2007: "Python Threads and the Global Interpreter
+        Lock" by Jesse Noller.  It has been modified for this PEP.
+
+Copyright
+
+ This document has been placed in the public domain.
+
+
+
+Local Variables:
+mode: indented-text
+indent-tabs-mode: nil
+sentence-end-double-space: t
+fill-column: 70
+coding: utf-8
+End: