[Python-ideas] fork

Andrew Barnert abarnert at yahoo.com
Sun Aug 2 02:02:53 CEST 2015


On Aug 1, 2015, at 10:36, Sven R. Kunze <srkunze at mail.de> wrote:
> 
> Thanks everybody for the feedback on 'fork'.
> 
> Let me address the issues and specify it further:
> 
> 
> 1) Process vs. Thread vs. Coroutine
> 
> From my understanding, the main fallacy here is that the caller would be able to decide which type of pool is best suited.
> 
> Take create_thumbnail as an example. You do not know whether this is cpu-bound or io-bound; you can just make a guess or try it out.
> 
> But who knows then? I would say: the callee.
> 
> create_thumbnail is cpu-bound when doing the work itself on the machine.
> create_thumbnail is io-bound when delegating the work to, say, a web service.

There's a whole separate thread going on about making it easier to understand the distinctions between coroutine/thread/process, separate tasks/pools/executors, etc. There's really no way to take that away from the programmer, but Python (and, more importantly, the Python docs) could do a lot to make that easier.

Your idea of a single global "pool manager" object, to which you submit tasks that get handled differently depending on how they're marked, might have merit. But that's something you could build pretty easily on top of concurrent.futures (at least for threads vs. processes; you can add in coroutines later, because they're not quite as easy to integrate), upload to PyPI, and start getting experience with before trying to push it into the stdlib, much less the core language. (Notice that Greg Ewing had a proposal a few years ago that was very similar to the recent async/await change, but he couldn't sell anyone on it. Then, after extensive experience with the asyncio module--first as tulip on PyPI, then added to the stdlib--the need for the new syntax became obvious to everyone, and people, including Guido, who had rejected Greg's proposal out of hand enthusiastically supported the new one.)
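
As a rough sketch of what such a library might look like on top of concurrent.futures (every name here--cpu_bound, io_bound, fork, _POOLS--is made up, not an existing API, and the pool sizes are placeholder guesses):

    import os
    from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

    # One shared pool per kind of work; sizes are just placeholder guesses.
    _POOLS = {'cpu': ProcessPoolExecutor(max_workers=os.cpu_count()),
              'io': ThreadPoolExecutor(max_workers=16)}

    def cpu_bound(func):
        func._pool = 'cpu'   # mark the callee, as in your proposal
        return func

    def io_bound(func):
        func._pool = 'io'
        return func

    def fork(func, *args):
        # Dispatch to whichever pool the callee asked for; returns a Future.
        return _POOLS[getattr(func, '_pool', 'io')].submit(func, *args)

    @io_bound
    def create_thumbnail(image):
        ...  # e.g., delegate to a web service

    f = fork(create_thumbnail, 'cat.png')
    result = f.result()   # block until done

Once something like that has been banged on for a while, it would be much clearer whether any of it deserves syntax.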

> Same functionality, same name, same API, different pools required.
> 
> 
> This said, I would propose something like a marking solution:
> 
> @cpu_bound
> def create_thumbnail(image):
>     # impl
> 
> @io_bound
> def create_thumbnail(image):
>     # impl
> 
> (coroutines are already marked as such)
> 
> From this, the Python interpreter should be able to infer which type of pool is appropriate.
> 
> 
> 2) Pool size
> 
> Do lists have a fixed length? Do I need to define their lengths right from the start? Do I know them in advance?
> 
> I think the answers to these questions are obvious. I don't understand why it should be different for the size of the pools. They could grow and shrink depending on the workload and the available resources.

The available resources rarely change at runtime. If you're doing CPU-bound work, the number of cores is unlikely to change during a run. (In rare cases, you might want to sometimes count hyperthreads as separate cores and sometimes not, but that would depend on intimate knowledge of the execution characteristics of the tasks you're submitting in two different places.) Similarly, if you're doing threads, the ideal pool size usually depends more on what you're waiting for than on what you're doing--12 threads may be great for submitting URLs to arbitrary servers on the internet, 4 threads may be better for submitting to a specific web service that you've configured to match, 16 threads may be better for a simulation with 2^n bodies, and so on.

Sometimes these really do need to grow and shrink configurably--not during a run, but between deployments. In that case, you should store them in a config file rather than hard-coding them. Then your sysadmin/deploy manager/whatever can learn how to test and configure them. For a real-life example (although not in Python), I know Level3 configured their video servers to use 4 processes of 4 threads per machine, while Akamai used 1 process of 16 threads (actually 2 processes, but the second only for failover, not used live). Why? I have no idea, but presumably they each tested the software with their machines and their networks and came to different results, and it's a good thing their software allowed them to configure it so they could each save that 1.3% heat or whatever it was they were trying to optimize.
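
In Python, that kind of deploy-time knob takes only a few lines--e.g., with configparser (the app.ini file and the [pools] section are invented for illustration):

    import os
    from configparser import ConfigParser
    from concurrent.futures import ProcessPoolExecutor

    config = ConfigParser()
    config.read('app.ini')   # e.g. a [pools] section with "processes = 4"
    nworkers = config.getint('pools', 'processes', fallback=os.cpu_count())
    pool = ProcessPoolExecutor(max_workers=nworkers)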

> 3) Pool Management in General
> 
> There is a reason why I hesitate to explicitly manage pools. Our code runs on a plethora of platforms ranging from few to many hardware threads. We actually do not want to integrate platform-specific properties right into the source. The point of having parallelism and concurrency is to squeeze more out of the machines and get better response times. Anything else wouldn't be honest in my opinion (apart from researching and experimenting).

Which is exactly why some apps should expose these details to the sysadmin as configuration variables. Hiding the details inside the interpreter would make that harder, not easier.

> Thus, a practical solution needs to be simple and universal. Explicitly setting the size of the pool is not universal and definitely not easy.

If you want universal and easy, the default value is the number of CPUs, which is often the best value to use. When you don't need to manually configure things to squeeze out the last few %, just rely on the defaults. When you do need to, it should be as easy as possible. And that's the way things currently are.
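
Concretely, the no-configuration case is already the default in concurrent.futures--with no arguments, ProcessPoolExecutor sizes itself to the number of CPUs (square here is just a stand-in workload):

    from concurrent.futures import ProcessPoolExecutor

    def square(n):
        return n * n

    if __name__ == '__main__':
        # max_workers defaults to the number of CPUs on the machine
        with ProcessPoolExecutor() as pool:
            print(list(pool.map(square, range(10))))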

> It doesn't need to be perfect. Even if a first draft implementation would simply define pools having exactly 4 processes/threads/coroutines, that would be awesome. Even cutting execution time in half would be an amazing accomplishment.
> 
> Maybe, even 'fork' is too complicated. It could work without it given the decorators above. But then, we could not decide whether to run things in parallel or sequentially. I think I do not like that.
> 
> 
> 4) Keyword 'fork'
> 
> Well, first shot. If you have a better one, I am all in for it (4 letters or shorter only ;) )... Or maybe something like 'par' for parallel or 'con' for concurrent.
> 
> 
> 5) Awaiting the Completion of Something
> 
> As Andrew proposed, using the return value should result in blocking.
> 
> What if there is no result to wait for?
> That one is harder, but I think another keyword like 'wait' or 'await' should work fine here.
> 
> for image in images:
>     fork create_thumbnail(image)
> wait
> print(get_size_of_thumbnail_dir())

This only lets you wait for everything to finish, or for nothing at all. Very often, you want to wait on things in whatever order they finish. Or wait until just the first task has finished. Or wait on them in the order they were submitted (which still gets you some pipelining over waiting on all).

This is a well-known problem, and the standard solution across many languages is futures. The concurrent.futures module and the asyncio module are both designed around futures. You can explicitly wait on a future, or chain further operations onto a future--and, more importantly, you can compose futures into various kinds of group-waiting objects (wait for all, wait for any, wait for all or until first error, wait in any order, wait in specified order) that are themselves futures.
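
Concretely, each of those strategies is only a line or two with concurrent.futures (create_thumbnail, handle, and images are stand-ins based on your example):

    from concurrent.futures import (ThreadPoolExecutor, as_completed,
                                    wait, FIRST_COMPLETED)

    def create_thumbnail(image):   # stand-in for the real work
        return 'thumb-of-' + image

    def handle(result):            # stand-in consumer
        print(result)

    images = ['a.png', 'b.png', 'c.png']

    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(create_thumbnail, img) for img in images]

        for f in as_completed(futures):   # in whatever order they finish
            handle(f.result())

        # or: block until just the first one finishes
        done, pending = wait(futures, return_when=FIRST_COMPLETED)

        # or: in submission order, which still overlaps the work
        for f in futures:
            handle(f.result())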

If you want to try to collapse futures into syntax, you need something that still retains all of the power of futures. A single keyword isn't going to do that.

Also, note that await is already a keyword in Python; it's used to explicitly block until another coroutine is ready. In other words, it's a syntactic form of the very simplest way to use futures (and note that, because futures are composable, anything can ultimately be reduced to "block until this one future is ready").

The reason the thread/process futures don't have such a keyword is that they don't need one: just calling a function blocks on it, and, because threads and processes are preemptive rather than cooperative, that works without blocking any other tasks. So, instead of writing "await futures.wait(iterable_of_futures, return_when=FIRST_EXCEPTION)" you just write the same thing without "await" and it already does what you want.
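
Spelled out, today's blocking version of that is just this (futures below stands for any list of Future objects, like the one built above):

    from concurrent.futures import wait, FIRST_EXCEPTION

    # Block until everything finishes, or until the first exception.
    done, not_done = wait(futures, return_when=FIRST_EXCEPTION)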

> 6) Exceptions
> 
> As close to sequential execution as possible.
> 
> That is, when some function is forked out and raises an exception, it should behave as if it were a normal function call.
> 
> for image in images:
>     fork create_thumbnail(image) # I would like to see that in my stacktrace

Futures already take care of this. They automatically transport exceptions (with their tracebacks) across the boundary and reraise them wherever the result is waited for.
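
A self-contained demonstration (thumb_fail is a made-up function that always raises):

    from concurrent.futures import ThreadPoolExecutor

    def thumb_fail(image):
        raise ValueError('cannot thumbnail %r' % image)

    with ThreadPoolExecutor() as pool:
        f = pool.submit(thumb_fail, 'cat.png')
        try:
            f.result()   # the worker's ValueError is reraised here
        except ValueError:
            import traceback
            traceback.print_exc()   # traceback includes the worker frame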

> Also true for expressions. '+=' might raise an exception because, say, huge_calculation returns 'None'. Although the actual evaluation of the sum needs to take place only at the print statement, I would like to see the exception raised at the highlighted place:
> 
> end_result = 0
> for items in items_list:
>     end_result += fork huge_calculation(items) # stacktrace for '+=' should be here
> print(end_result) # not here

In this code, your += isn't inside a "fork", so there's no way the implementation could know that you want it delayed. What you're asking for here is either implicit lazy evaluation, contagious futures, or dataflow variables, all of which are much more radical changes to the language than just adding syntactic sugar for explicit futures.
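
What you can do today is keep the fork and the wait explicit, something like this sketch (huge_calculation and items_list come from your example; the calculation is stubbed out here):

    from concurrent.futures import ProcessPoolExecutor

    def huge_calculation(items):
        return sum(items)   # stub

    if __name__ == '__main__':
        items_list = [range(10), range(20), range(30)]
        with ProcessPoolExecutor() as pool:
            futures = [pool.submit(huge_calculation, items)
                       for items in items_list]
            end_result = 0
            for f in futures:
                end_result += f.result()   # exceptions reraise here
        print(end_result)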
