-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From fijall at gmail.com  Tue Jul 13 13:57:31 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Tue, 13 Jul 2010 13:57:31 +0200
Subject: [pypy-dev] Windows binaries
In-Reply-To: 
References: 
Message-ID: 

On Tue, Jul 13, 2010 at 1:09 PM, Dave Cross  wrote:
> Hi,
>
>
>
> Is there a likely delivery date for Windows binaries of PyPy 1.3?
>

Eh, sorry, my fault, will upload them today.

>
>
> Dave.
>
> **********************************************************************
> Please consider the environment - do you really need to print this email?
>
> This email is intended only for the person(s) named above and may contain
> private and confidential information. If it has come to you in error, please
> destroy and permanently delete any copy in your possession and contact us on
> +44 (0) 161 480 4420. The information in this email is copyright ? CDL Group
> Holdings Limited. We cannot accept any liability for any loss or damage
> sustained as a result of software viruses. It is your responsibility to
> carry out such virus checking as is necessary before opening any attachment.
> Cheshire Datasystems Limited uses software which automatically screens
> incoming emails for inappropriate content and attachments. If the software
> identifies such content or attachment, the email will be forwarded to our
> Technology Department for checking. You should be aware that any email which
> you send to Cheshire Datasystems Limited is subject to this procedure.
> Cheshire Datasystems Limited, Strata House, Kings Reach Road, Stockport SK4
> 2HD
> Registered in England and Wales with Company Number 3991057
> VAT registration: 727 1188 33
>
>
>
> _______________________________________________
> pypy-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/pypy-dev
>

From fijall at gmail.com  Sun Jul 18 18:36:24 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Sun, 18 Jul 2010 18:36:24 +0200
Subject: [pypy-dev] [pypy-svn] r76268 - pypy/branch/micronumpy/pypy/tool
In-Reply-To: <20100716224104.25C49282BD4@codespeak.net>
References: <20100716224104.25C49282BD4@codespeak.net>
Message-ID: 

Benchmarks generally should go to pypy/benchmarks directory in the
main source tree (that is svn+ssh://codespeak.net/svn/pypy/benchmarks)

On Sat, Jul 17, 2010 at 12:41 AM,   wrote:
> Author: dan
> Date: Sat Jul 17 00:41:02 2010
> New Revision: 76268
>
> Added:
> ? pypy/branch/micronumpy/pypy/tool/convolve.py
> Modified:
> ? pypy/branch/micronumpy/pypy/tool/numpybench.py
> Log:
> Oops, I forgot the most important part of the benchmark!
>
> Added: pypy/branch/micronumpy/pypy/tool/convolve.py
> ==============================================================================
> --- (empty file)
> +++ pypy/branch/micronumpy/pypy/tool/convolve.py ? ? ? ?Sat Jul 17 00:41:02 2010
> @@ -0,0 +1,43 @@
> +from __future__ import division
> +from __main__ import numpy as np
> +
> +def naive_convolve(f, g):
> + ? ?# f is an image and is indexed by (v, w)
> + ? ?# g is a filter kernel and is indexed by (s, t),
> + ? ?# ? it needs odd dimensions
> + ? ?# h is the output image and is indexed by (x, y),
> + ? ?# ? it is not cropped
> + ? ?if g.shape[0] % 2 != 1 or g.shape[1] % 2 != 1:
> + ? ? ? ?raise ValueError("Only odd dimensions on filter supported")
> + ? ?# smid and tmid are number of pixels between the center pixel
> + ? ?# and the edge, ie for a 5x5 filter they will be 2.
> + ? ?#
> + ? ?# The output size is calculated by adding smid, tmid to each
> + ? ?# side of the dimensions of the input image.
> + ? ?vmax = f.shape[0]
> + ? ?wmax = f.shape[1]
> + ? ?smax = g.shape[0]
> + ? ?tmax = g.shape[1]
> + ? ?smid = smax // 2
> + ? ?tmid = tmax // 2
> + ? ?xmax = vmax + 2*smid
> + ? ?ymax = wmax + 2*tmid
> + ? ?# Allocate result image.
> + ? ?h = np.zeros([xmax, ymax], dtype=f.dtype)
> + ? ?# Do convolution
> + ? ?for x in range(xmax):
> + ? ? ? ?for y in range(ymax):
> + ? ? ? ? ? ?# Calculate pixel value for h at (x,y). Sum one component
> + ? ? ? ? ? ?# for each pixel (s, t) of the filter g.
> + ? ? ? ? ? ?s_from = max(smid - x, -smid)
> + ? ? ? ? ? ?s_to = min((xmax - x) - smid, smid + 1)
> + ? ? ? ? ? ?t_from = max(tmid - y, -tmid)
> + ? ? ? ? ? ?t_to = min((ymax - y) - tmid, tmid + 1)
> + ? ? ? ? ? ?value = 0
> + ? ? ? ? ? ?for s in range(s_from, s_to):
> + ? ? ? ? ? ? ? ?for t in range(t_from, t_to):
> + ? ? ? ? ? ? ? ? ? ?v = x - smid + s
> + ? ? ? ? ? ? ? ? ? ?w = y - tmid + t
> + ? ? ? ? ? ? ? ? ? ?value += g[smid - s, tmid - t] * f[v, w]
> + ? ? ? ? ? ?h[x, y] = value
> + ? ?return h
>
> Modified: pypy/branch/micronumpy/pypy/tool/numpybench.py
> ==============================================================================
> --- pypy/branch/micronumpy/pypy/tool/numpybench.py ? ? ?(original)
> +++ pypy/branch/micronumpy/pypy/tool/numpybench.py ? ? ?Sat Jul 17 00:41:02 2010
> @@ -21,13 +21,29 @@
> ? ? return numpy.array(kernel)
>
> ?if __name__ == '__main__':
> - ? ?from sys import argv as args
> - ? ?width, height, kwidth, kheight = [int(x) for x in args[1:]]
> + ? ?from optparse import OptionParser
> +
> + ? ?option_parser = OptionParser()
> + ? ?option_parser.add_option('--kernel-size', dest='kernel', default='3x3',
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? help="The size of the convolution kernel, given as WxH. ie 3x3"
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?"Note that both dimensions must be odd.")
> + ? ?option_parser.add_option('--image-size', dest='image', default='256x256',
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? help="The size of the image, given as WxH. ie. 256x256")
> + ? ?option_parser.add_option('--runs', '--count', dest='count', default=1000,
> + ? ? ? ? ? ? ? ? ? ? ? ? ? ? help="The number of times to run the convolution filter")
> +
> + ? ?options, args = option_parser.parse_args()
> +
> + ? ?def parse_dimension(arg):
> + ? ? ? ?return [int(s.strip()) for s in arg.split('x')]
> +
> + ? ?width, height = parse_dimension(options.image)
> + ? ?kwidth, kheight = parse_dimension(options.kernel)
> + ? ?count = int(options.count)
>
> ? ? image = generate_image(width, height)
> ? ? kernel = generate_kernel(kwidth, kheight)
>
> ? ? from timeit import Timer
> ? ? convolve_timer = Timer('naive_convolve(image, kernel)', 'from convolve import naive_convolve; from __main__ import image, kernel; gc.enable()')
> - ? ?count = 100
> ? ? print "%.5f sec/pass" % (convolve_timer.timeit(number=count)/count)
> _______________________________________________
> pypy-svn mailing list
> pypy-svn at codespeak.net
> http://codespeak.net/mailman/listinfo/pypy-svn
>

From jcreigh at gmail.com  Thu Jul 22 15:34:55 2010
From: jcreigh at gmail.com (Jason Creighton)
Date: Thu, 22 Jul 2010 09:34:55 -0400
Subject: [pypy-dev] Building a shared library on x86-64 fails due to static
	linking of libffi
Message-ID: 

Hello,

While working on asmgcc-64, I ran into this issue. For some reason, PyPy
wants to link libffi statically on some platforms, Linux included. But when
compiling with the "shared" option (as is done in some asmgcroot tests), you
get link errors like:

/usr/bin/ld: /usr/lib/libffi.a(ffi64.o): relocation R_X86_64_32S against
`.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/lib/libffi.a: could not read symbols: Bad value

I interpret this to mean that since we building a shared library, the
resulting library must be position independent, so we can't link in non-PIC
such as is found in the static version of libffi on my system. (Ubuntu
10.04, x86-64). And indeed, if I switch to linking dynamically, the error
goes away and things seem to work.

However, I don't want to just blindly enable dynamic linking, because there
must be a reason it was configured to link statically in the first place.
What is that reason?

Also, what steps should I take here? I think I need to enable dynamic
linking of libffi on x86-64 Linux when building a shared library at the very
least, but to reduce the number of code paths, I'm somewhat inclined to link
dynamically whether we're building a library or not. What do you guys think?

Thanks,

Jason
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From amauryfa at gmail.com  Thu Jul 22 17:03:57 2010
From: amauryfa at gmail.com (Amaury Forgeot d'Arc)
Date: Thu, 22 Jul 2010 17:03:57 +0200
Subject: [pypy-dev] Building a shared library on x86-64 fails due to
	static linking of libffi
In-Reply-To: 
References: 
Message-ID: 

Hi,

2010/7/22 Jason Creighton :
> Hello,
>
> While working on asmgcc-64, I ran into this issue. For some reason, PyPy
> wants to link libffi statically on some platforms, Linux included. But when
> compiling with the "shared" option (as is done in some asmgcroot tests), you
> get link errors like:
>
> /usr/bin/ld: /usr/lib/libffi.a(ffi64.o): relocation R_X86_64_32S against
> `.rodata' can not be used when making a shared object; recompile with -fPIC
> /usr/lib/libffi.a: could not read symbols: Bad value
>
> I interpret this to mean that since we building a shared library, the
> resulting library must be position independent, so we can't link in non-PIC
> such as is found in the static version of libffi on my system. (Ubuntu
> 10.04, x86-64). And indeed, if I switch to linking dynamically, the error
> goes away and things seem to work.

Exactly

> However, I don't want to just blindly enable dynamic linking, because there
> must be a reason it was configured to link statically in the first place.
> What is that reason?
>
> Also, what steps should I take here? I think I need to enable dynamic
> linking of libffi on x86-64 Linux when building a shared library at the very
> least, but to reduce the number of code paths, I'm somewhat inclined to link
> dynamically whether we're building a library or not. What do you guys think?

The reason is actually in the code: pypy/rlib/libffi.py

    # On some platforms, we try to link statically libffi, which is small
    # anyway and avoids endless troubles for installing.  On other platforms
    # libffi.a is typically not there, so we link dynamically.

Probably static linking to libffi should be disabled on 64bit platform.
Or just skip the test: for what I know, --shared is not really needed
on Unix platforms.

-- 
Amaury Forgeot d'Arc

From ndbecker2 at gmail.com  Thu Jul 22 17:59:37 2010
From: ndbecker2 at gmail.com (Neal Becker)
Date: Thu, 22 Jul 2010 11:59:37 -0400
Subject: [pypy-dev] Building a shared library on x86-64 fails due to
	static linking of libffi
References: 

Message-ID: 

AFAIK, i386 is the only platform that allows building a shared lib linked 
with a static lib.

From bhartsho at yahoo.com  Fri Jul 23 06:49:53 2010
From: bhartsho at yahoo.com (Hart's Antler)
Date: Thu, 22 Jul 2010 21:49:53 -0700 (PDT)
Subject: [pypy-dev] rpython questions, **kw, __call__, __getattr__
Message-ID: <92687.49811.qm@web114018.mail.gq1.yahoo.com>

Looking through the pypy source code i see **kw, __call__ and __getattr__ are used, but when i try to write my own rpython code that uses these conventions, i get translation errors.  Do i need to borrow from "application space" in order to do this or can i just give hints to the annotator?
Thanks,
-brett

#this is allowed
def func(*args): print(args)

#but this is not?
def func(**kw): print(args)
#error call pattern too complex

#this class fails to translate, are we not allowed to define our own __call__ and __getattr__ in rpython?
class A(object):
  __call__(*args): print(args)
  __getattr__(self,name): print(name)

From fijall at gmail.com  Fri Jul 23 10:20:40 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Fri, 23 Jul 2010 10:20:40 +0200
Subject: [pypy-dev] rpython questions, **kw, __call__, __getattr__
In-Reply-To: <92687.49811.qm@web114018.mail.gq1.yahoo.com>
References: <92687.49811.qm@web114018.mail.gq1.yahoo.com>
Message-ID: 

Hello.

__call__ and __getattr__ won't work. You see it in pypy source code,
because not all of pypy source code is RPython (in fact, Python is a
metaprogramming language for RPython). Same goes to **kw.

On Fri, Jul 23, 2010 at 6:49 AM, Hart's Antler  wrote:
> Looking through the pypy source code i see **kw, __call__ and __getattr__ are used, but when i try to write my own rpython code that uses these conventions, i get translation errors. ?Do i need to borrow from "application space" in order to do this or can i just give hints to the annotator?
> Thanks,
> -brett
>
>
>
> #this is allowed
> def func(*args): print(args)
>
> #but this is not?
> def func(**kw): print(args)
> #error call pattern too complex
>
> #this class fails to translate, are we not allowed to define our own __call__ and __getattr__ in rpython?
> class A(object):
> ?__call__(*args): print(args)
> ?__getattr__(self,name): print(name)
>
>
>
>
>
> _______________________________________________
> pypy-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/pypy-dev
>

From cfbolz at gmx.de  Fri Jul 23 10:23:15 2010
From: cfbolz at gmx.de (Carl Friedrich Bolz)
Date: Fri, 23 Jul 2010 10:23:15 +0200
Subject: [pypy-dev] rpython questions, **kw, __call__, __getattr__
In-Reply-To: <92687.49811.qm@web114018.mail.gq1.yahoo.com>
References: <92687.49811.qm@web114018.mail.gq1.yahoo.com>
Message-ID: <4C495173.9070107@gmx.de>

On 07/23/2010 06:49 AM, Hart's Antler wrote:
> Looking through the pypy source code i see **kw, __call__ and
> __getattr__ are used,

Where exactly are they used? Not all of the code in PyPy is RPython.

> but when i try to write my own rpython code
> that uses these conventions, i get translation errors.  Do i need to
> borrow from "application space" in order to do this or can i just
> give hints to the annotator? Thanks, -brett
>
>
>
> #this is allowed
 > def func(*args):
 >     print(args)
>
> #but this is not?
 > def func(**kw):
 >     print(args)
 > #error call pattern too complex
>
> #this class fails to translate, are we not allowed to define our own
> __call__ and __getattr__ in rpython?

> class A(object):
 >     __call__(*args):
 >          print(args)
 >     __getattr__(self,name):
 >          print(name)

You cannot use any __xxx__ functions in RPython, only __init__ and 
__del__. Anyway, you cannot translate a class, so "fails to translate" 
has no meaning :-).

Cheers,

Carl Friedrich

From bhartsho at yahoo.com  Sat Jul 24 11:06:09 2010
From: bhartsho at yahoo.com (Hart's Antler)
Date: Sat, 24 Jul 2010 02:06:09 -0700 (PDT)
Subject: [pypy-dev] PyPyGTK v0.1
Message-ID: <114704.12351.qm@web114014.mail.gq1.yahoo.com>

http://pastebin.com/UhnEurqb

The above is a crude way to run pygtk from rpython (through CPython and talking on a pipe), but at least it partially works.  Callbacks are limited to quoted lambdas, but some simple return types back to rpython is possible - i'm going to try that next.  There is no support for dynamic attribute access, but most of pygtk involves function calls.  Where attribute access is required, i guess extra proxy functions could be written.

-brett 

From cfbolz at gmx.de  Sat Jul 24 11:21:28 2010
From: cfbolz at gmx.de (Carl Friedrich Bolz)
Date: Sat, 24 Jul 2010 11:21:28 +0200
Subject: [pypy-dev] PyPyGTK v0.1
In-Reply-To: <114704.12351.qm@web114014.mail.gq1.yahoo.com>
References: <114704.12351.qm@web114014.mail.gq1.yahoo.com>
Message-ID: <4C4AB098.1010306@gmx.de>

Hi Brett,

On 07/24/2010 11:06 AM, Hart's Antler wrote:
> http://pastebin.com/UhnEurqb

Nice. Did you also see this?: 
http://morepypy.blogspot.com/2009/11/using-cpython-extension-modules-with.html
I guess it could be used for GTK as well.

BTW, I guess if we ever wanted "real" GTK support, without proxying 
CPython, we should use the GObject-introspection features, which should 
make the wrapping rather simple.

Cheers,

Carl Friedrich

From bhartsho at yahoo.com  Mon Jul 26 07:47:02 2010
From: bhartsho at yahoo.com (Hart's Antler)
Date: Sun, 25 Jul 2010 22:47:02 -0700 (PDT)
Subject: [pypy-dev] PyPy Proxy
Message-ID: <345280.58625.qm@web114015.mail.gq1.yahoo.com>

The code from PyPyGTK has been generalized so that it can work with PyGame, and PyODE.  Function calls are improved so that different arg types can be accepted by defining custom wrappers per function.  Custom wrappers can also be made for the return values, so different types can be handled as well (it seems that rpython restricts what can be returned from a function to the same types).  Proxy objects can move back and forth from CPython to RPython.  The wrappers for pygtk, pyode, and pygame are by no means complete, but some basic tests are working.  Callbacks are limited to quoted lambdas, but it could be improved.

http://pastebin.com/rWEfgMSN

I had seen the other proxy method before, but found few examples, how does it work, from Rpython or the PyPy interpreter?
http://morepypy.blogspot.com/2009/11/using-cpython-extension-modules-with.html

From kevinar18 at hotmail.com  Tue Jul 27 04:09:16 2010
From: kevinar18 at hotmail.com (Kevin Ar18)
Date: Mon, 26 Jul 2010 22:09:16 -0400
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
	message passing?
Message-ID: 

Might as well warn you: This is going to be a rather long post.
I'm not sure if this is appropriate to post here or if would fit right in with the mailing list.? Sorry, if it is the wrong place to post about this.

I've looked through the documenation (http://codespeak.net/pypy/dist/pypy/doc/stackless.html) and didn't really see what I was looking for.? I've also investigated several options in the default CPython.

What I'm trying to accomplish:
I am trying to write a particular threading scenario that follows these rules.? It is partly an experiment and partly for actual production code.

1. Hundreds or thousands of micro-threads that are essentially small self-contained programs (not really, but you can think of them that way).
2. No shared state - data is passed around from one micro-thread to another; only one micro-thread has access to the data at a time. (although the programmer gets the impression there is no shared state, in reality, the underlying implementation uses shared memory / shared state for speed; the data does not move; you just pass around a reference/pointer to some shared memory)
3. The micro-threads can run in parallel on different cpu cores, get moved to a different core, etc....
4. The micro-threads are truly pre-emptive (uses hardware interrupt pre-emption).
5. It is my intention to write my own scheduler that will suspend the micro-threads, start them, control the sharing of data, assign them to different CPU cores etc....? In fact, for my purposes, I MUST write my own scheduler as I have very specific requirements on when they should and should not run.

Now, I have spent some time trying to find a way to achieve this ... and I can implement a rather poor version using default Python.? However, I don't see any way to implement my ideal version.? Maybe someone here might have some pointers for me.

Shared Memory between parallel processes
----------------------------------------
Quick Question: Do queues from the multiprocessing module use shared memory?? If the answer is YES, you can just skip this section, because that would solve this particular problem.

(For simplicity, let's assume a quad core CPU)
It is my intent to create 4 threads/processs (one per core) and use the scheduler to assign a micro-thread (of which there may be hundreds) to one of the 4 threads/processes.? However, the micro-threads need to exchange data quickly; to do that I need shared memory -- and that is where I'm having some trouble.
Normally, 4 threads would be the ideal solution -- as they can run in parallel and use shared memory.? However, because of the Python GIL, I can't use threads in this way; thus, I have to use 4 processes, which are not setup to share memory.

Question: How can I share Python Objects between processes USING SHARED MEMORY?? I do not want to have to copy or "pass" data back and forth between processes or have to use a proxy "server" process.? These are both too much of a performance hit for my needs; shared memory is what I need.

The multiprocessing module offers me 4 options: "queues", "pipes", "shared memory map", and a "server process".
"Shared memory map" won't work as it only handles C values and arrays (not Python objects or variables).
"Server Process" sounds like a bad idea.? Am I correct in that this option requires extra processing power and does not even use shared memory?? If so, that would be a very bad choice for me.
The big question then... do "queues" and "pipes" used shared memory or do they pass data back and forth between processes?? (if they used shared memory, then that would be perfect)

Does PyPy have any other options for me?

True Pre-emptive scheduling?

----------------------------

Any way to get pre-emptive micro-threads?? Stackless (the real 
Stackless, not the one in PyPy) has the ability to suspend them after a 
certain number of interpreter instructions; however, this is prone to 
problems because it can run much longer than expected.? Ideally, I would 
like to have true pre-emptive scheduling using 
hardware interrupts based on timing or CPU cycles (like the OS does for 
real threads).

I am currently not aware of any way to achieve this in CPython, PyPy, Unladen Swallow, Stackless, etc....

Are there detailed docs on why the Python GIL exists?
-----------------------------------------------------
I don't mean trivial statements like "because of C extensions" or "because the interpreter can't handle it".
It may be possible that my particular usage would not require the GIL.? However, I won't know this until I can understand what threading problems the Python interpreter has that the GIL was meant to protect against.? Is there detailed documentation about this anywhere that covers all the threading issues that the GIL was meant to solve?

Thanks,
Kevin

_________________________________________________________________
Hotmail has tools for the New Busy. Search, chat and e-mail from your inbox.
http://www.windowslive.com/campaign/thenewbusy?ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_1

From evan at theunixman.com  Tue Jul 27 08:27:03 2010
From: evan at theunixman.com (Evan Cofsky)
Date: Mon, 26 Jul 2010 23:27:03 -0700
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
 message passing?
In-Reply-To: 
References: 
Message-ID: <20100727062702.GE12699@tunixman.com>

On 07/26 22:09, Kevin Ar18 wrote:
> What I'm trying to accomplish:
>
> I am trying to write a particular threading scenario that follows these
> rules.? It is partly an experiment and partly for actual
> production code.

This is actually interesting to me as well. I can't count the number of
times I've had to implement something like this for projects. It would be
nice to be able to use a public module instead of writing it all
yet again.

> Now, I have spent some time trying to find a way to achieve this ... and
> I can implement a rather poor version using default Python.? However, I
> don't see any way to implement my ideal version.? Maybe someone here
> might have some pointers for me.

> Shared Memory between parallel processes

This is the way I usually implement it. I'm currently mulling over some
sort of byte-addressable abstraction that can use a buffer or any sequence
as a backing store, which would make it useful for mmap objects as well.
And I'm thinking about using the class definitions and inheritance to
handle nested structures in some way.

> Quick Question: Do queues from the multiprocessing module use shared
> memory?? If the answer is YES, you can just skip this section, because
> that would solve this particular problem.

I can't imagine it wouldn't, but I haven't checked the source yet.

> Question: How can I share Python Objects between processes USING SHARED
> MEMORY?? I do not want to have to copy or "pass" data back and forth
> between processes or have to use a proxy "server" process.? These are
> both too much of a performance hit for my needs; shared memory is what
> I need.

Anonymous memory-mapped regions would work, with a suitable data
abstraction. Or even memory-mapped files, which aren't really all that
different on systems anymore.

> The multiprocessing module offers me 4 options: "queues", "pipes", "shared memory map", and a "server process".
> "Shared memory map" won't work as it only handles C values and arrays (not Python objects or variables).

cPickle could help. But then there's a serialization/deserialization step
which wouldn't really be too fast. It's not slow, but the cost of copying
the data is far outweighed by the cost of the dumps/loads, and if you need
to share multiple copies you're really going to feel it.

> "Server Process" sounds like a bad idea.? Am I correct in that this
> option requires extra processing power and does not even use
> shared memory??

Not really. It depends on how you would implement it.

> The big question then... do "queues" and "pipes" used shared memory or
> do they pass data back and forth between processes?? (if they used
> shared memory, then that would be perfect)

Queues most likely do, pipes absolutely do not.

> Does PyPy have any other options for me?

I wonder if it could be done with an object space, or similarly done
"behind the scenes" in the PyPy interpreter, sort of the way ZODB works
semi-transparently. Only in this case completely transparently.

> True Pre-emptive scheduling?

This wouldn't really be difficult, although doing it efficiently might
very well be without some serious black magic. But PyPy may also be the
right tool for that since the black magic can be written in Python or
RPython instead of C.

> Any way to get pre-emptive micro-threads?? Stackless (the real
> Stackless, not the one in PyPy) has the ability to suspend them after a
> certain number of interpreter instructions; however, this is prone to
> problems because it can run much longer than expected.? Ideally, I would
> like to have true pre-emptive scheduling using hardware interrupts based
> on timing or CPU cycles (like the OS does for real threads).

By using a process for each thread, and some shared memory arena for the
bulk of the application data structures, this is probably quite possible
without reimplementing the OS in Python.

> I am currently not aware of any way to achieve this in CPython, PyPy,
> Unladen Swallow, Stackless, etc....

I've done this a number of times, both with threads and with processes.
Processes ironically give you finer control over scheduling since you
aren't stuck behind the GIL, but as you are finding, you need some way to
share data.

> Are there detailed docs on why the Python GIL exists?

Here is the page from the Python Wiki:

http://wiki.python.org/moin/GlobalInterpreterLock

And here is an interesting article on the GIL problem:

http://blog.ianbicking.org/gil-of-doom.html

-- 
Evan Cofsky "The UNIX Man" 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: Digital signature
URL: 

From fijall at gmail.com  Tue Jul 27 11:43:50 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Tue, 27 Jul 2010 11:43:50 +0200
Subject: [pypy-dev] PyPy Proxy
In-Reply-To: <345280.58625.qm@web114015.mail.gq1.yahoo.com>
References: <345280.58625.qm@web114015.mail.gq1.yahoo.com>
Message-ID: 

Hey.

Does it come with tests? Or how can I look how is it working?

On Mon, Jul 26, 2010 at 7:47 AM, Hart's Antler  wrote:
> The code from PyPyGTK has been generalized so that it can work with PyGame, and PyODE. ?Function calls are improved so that different arg types can be accepted by defining custom wrappers per function. ?Custom wrappers can also be made for the return values, so different types can be handled as well (it seems that rpython restricts what can be returned from a function to the same types). ?Proxy objects can move back and forth from CPython to RPython. ?The wrappers for pygtk, pyode, and pygame are by no means complete, but some basic tests are working. ?Callbacks are limited to quoted lambdas, but it could be improved.
>
> http://pastebin.com/rWEfgMSN
>
> I had seen the other proxy method before, but found few examples, how does it work, from Rpython or the PyPy interpreter?
> http://morepypy.blogspot.com/2009/11/using-cpython-extension-modules-with.html
>
>
>
> _______________________________________________
> pypy-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/pypy-dev
>

From fijall at gmail.com  Tue Jul 27 11:48:57 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Tue, 27 Jul 2010 11:48:57 +0200
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
	message passing?
In-Reply-To: 
References: 
Message-ID: 

On Tue, Jul 27, 2010 at 4:09 AM, Kevin Ar18  wrote:
>
> Might as well warn you: This is going to be a rather long post.
> I'm not sure if this is appropriate to post here or if would fit right in with the mailing list.? Sorry, if it is the wrong place to post about this.
>

This is a relevant list for some of questions below. I'll try to answer them.

> Quick Question: Do queues from the multiprocessing module use shared memory?? If the answer is YES, you can just skip this section, because that would solve this particular problem.

PyPy has no multiprocessing module so far (besides, I think it's an
ugly hack, but that's another issue).

>
> Does PyPy have any other options for me?
>

Right now, no. But there are ways in which you can experiment. Truly
concurrent threads (depends on implicit vs explicit shared memory)
might require a truly concurrent GC to achieve performance. This is
work (although not as big as removing refcounting from CPython for
example).

>
> True Pre-emptive scheduling?
>
> ----------------------------
>
> Any way to get pre-emptive micro-threads?? Stackless (the real
> Stackless, not the one in PyPy) has the ability to suspend them after a
> certain number of interpreter instructions; however, this is prone to
> problems because it can run much longer than expected.? Ideally, I would
> like to have true pre-emptive scheduling using
> hardware interrupts based on timing or CPU cycles (like the OS does for
> real threads).
>
> I am currently not aware of any way to achieve this in CPython, PyPy, Unladen Swallow, Stackless, etc....
>

Sounds relatively easy, but you would need to write this part in
RPython (however, that does not mean you get rid of GIL).

>
> Are there detailed docs on why the Python GIL exists?
> -----------------------------------------------------
> I don't mean trivial statements like "because of C extensions" or "because the interpreter can't handle it".
> It may be possible that my particular usage would not require the GIL.? However, I won't know this until I can understand what threading problems the Python interpreter has that the GIL was meant to protect against.? Is there detailed documentation about this anywhere that covers all the threading issues that the GIL was meant to solve?

The short answer is "yes". The long answer is that it's much easier to
write interpreter assuming GIL is around. For fine-grained locking to
work and be efficient, you would need:

* Some sort of concurrent GC (not specifically running in a separate
thread, but having different pools of memory to allocate from)
* Possibly a JIT optimization that would remove some locking.
* The forementioned locking, to ensure that it's not that easy to
screw things up.

So, in short, "work".

From bhartsho at yahoo.com  Tue Jul 27 14:43:47 2010
From: bhartsho at yahoo.com (Hart's Antler)
Date: Tue, 27 Jul 2010 05:43:47 -0700 (PDT)
Subject: [pypy-dev] PyPy Proxy
In-Reply-To: 
Message-ID: <929795.85917.qm@web114016.mail.gq1.yahoo.com>

Hi Maciej,

Yes it comes with its own test, just save the file from pastebin and run it.  You should two gtk windows popup and a pygame window that draws a circle.

I have a new version with proxy support for one module from Blender2.5 (bpy.ops); note that pygtk, ode, and pygame are broken in this version because i had to change some things so it can run in Blender's embedded python3.1

http://pastebin.com/TsYNqd8p

--- On Tue, 7/27/10, Maciej Fijalkowski  wrote:

> From: Maciej Fijalkowski 
> Subject: Re: [pypy-dev] PyPy Proxy
> To: "Hart's Antler" 
> Cc: pypy-dev at codespeak.net
> Date: Tuesday, 27 July, 2010, 2:43 AM
> Hey.
> 
> Does it come with tests? Or how can I look how is it
> working?
> 
> On Mon, Jul 26, 2010 at 7:47 AM, Hart's Antler 
> wrote:
> > The code from PyPyGTK has been generalized so that it
> can work with PyGame, and PyODE. ?Function calls are
> improved so that different arg types can be accepted by
> defining custom wrappers per function. ?Custom wrappers can
> also be made for the return values, so different types can
> be handled as well (it seems that rpython restricts what can
> be returned from a function to the same types). ?Proxy
> objects can move back and forth from CPython to RPython.
> ?The wrappers for pygtk, pyode, and pygame are by no means
> complete, but some basic tests are working. ?Callbacks are
> limited to quoted lambdas, but it could be improved.
> >
> > http://pastebin.com/rWEfgMSN
> >
> > I had seen the other proxy method before, but found
> few examples, how does it work, from Rpython or the PyPy
> interpreter?
> > http://morepypy.blogspot.com/2009/11/using-cpython-extension-modules-with.html
> >
> >
> >
> > _______________________________________________
> > pypy-dev at codespeak.net
> > http://codespeak.net/mailman/listinfo/pypy-dev
> >
> 

From p.giarrusso at gmail.com  Tue Jul 27 15:17:29 2010
From: p.giarrusso at gmail.com (Paolo Giarrusso)
Date: Tue, 27 Jul 2010 15:17:29 +0200
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
	message passing?
In-Reply-To: <20100727062702.GE12699@tunixman.com>
References: 
	<20100727062702.GE12699@tunixman.com>
Message-ID: 

On Tue, Jul 27, 2010 at 08:27, Evan Cofsky  wrote:
> On 07/26 22:09, Kevin Ar18 wrote:
>> Are there detailed docs on why the Python GIL exists?
>
> Here is the page from the Python Wiki:
>
> http://wiki.python.org/moin/GlobalInterpreterLock

To keep it short, CPython uses refcounting, and without the GIL the
refcount incs and decs would need to be atomic, with a huge
performance impact (that's discussed in the below links).

However, you can look at this answer from Guido van Rossum:
http://www.artima.com/weblogs/viewpost.jsp?thread=214235

And these two attempts to remove the GIL:
http://code.google.com/p/unladen-swallow/wiki/ProjectPlan#Global_Interpreter_Lock
http://code.google.com/p/python-safethread/

PyPy does not have this problem, but you still need to make
thread-safe the dictionaries holding members of each object. You don't
need to make lists thread-safe, I think, because the programmer is
supposed to lock them, but you want to allow a thread to add a member
to an object while another thread performs a method call.

Anyway, all this just explains why the GIL is still there, which is a
slightly different question from the original one. With
state-of-the-art technology, it is bad on every front, except
simplicity of implementation.

> And here is an interesting article on the GIL problem:
>
> http://blog.ianbicking.org/gil-of-doom.html

Given that processor frequencies aren't going to increase a lot in the
future as they used to do, while the number of cores is going to
increase much more, this article seems outdated nowadays - see also
http://atlee.ca/blog/2006/06/27/python-warts-2/.

This other link (http://poshmodule.sourceforge.net/) used to be
interesting for the problem you are discussing, but seems also dead -
there are other modules here:
http://wiki.python.org/moin/ParallelProcessing.

Best regards
-- 
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/

From fijall at gmail.com  Tue Jul 27 15:42:55 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Tue, 27 Jul 2010 15:42:55 +0200
Subject: [pypy-dev] rotting buildbot infrastructure
Message-ID: 

Hello.

According to current buildbot status, both osx and win machines are
offline. No clue how to get them back. Anyway, our OS X machine is
unable to translate pypy, so it's not exactly the best buildbot ever.
Can anyone contribute any machine for one of those buildbots?

Cheers
fijal

From p.giarrusso at gmail.com  Tue Jul 27 16:36:26 2010
From: p.giarrusso at gmail.com (Paolo Giarrusso)
Date: Tue, 27 Jul 2010 16:36:26 +0200
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
	message passing?
In-Reply-To: 
References: 

Message-ID: 

Hi all!

I am possibly interested in doing work on this, even if not in the
immediate future.

On Tue, Jul 27, 2010 at 11:48, Maciej Fijalkowski  wrote:
> On Tue, Jul 27, 2010 at 4:09 AM, Kevin Ar18  wrote:

> Truly
> concurrent threads (depends on implicit vs explicit shared memory)
> might require a truly concurrent GC to achieve performance. This is
> work (although not as big as removing refcounting from CPython for
> example).

>> Are there detailed docs on why the Python GIL exists?
>> -----------------------------------------------------
>> I don't mean trivial statements like "because of C extensions" or "because the interpreter can't handle it".
>> It may be possible that my particular usage would not require the GIL.? However, I won't know this until I can understand what threading problems the Python interpreter has that the GIL was meant to protect against.? Is there detailed documentation about this anywhere that covers all the threading issues that the GIL was meant to solve?

> The short answer is "yes". The long answer is that it's much easier to
> write interpreter assuming GIL is around. For fine-grained locking to
> work and be efficient, you would need:

> * The forementioned locking, to ensure that it's not that easy to
> screw things up.
I've wondered around the guarantees we need to offer to the
programmer, and my guess was that Jython's memory model is similar.
I've been concentrating on the dictionary of objects, on the
assumption that lists and most other built-in structures should be
locked by the programmer in case of concurrent modifications.

However, we don't want to require locking to support something like:
Thread 1:
obj.newmember=1;
Thread 2:
a = obj.oldmember;

Looking for Jython memory model on Google produces some garbage and
then this document from Unladen Swallow:
http://code.google.com/p/unladen-swallow/wiki/MemoryModel
It implicitly agrees on what's above (since Jython and IronPython both
use thread-safe dictionaries), and then delves into issues about
allowed reorderings.
However, it requires that even racy code does not make the interpreter crash.

> * Possibly a JIT optimization that would remove some locking.
Any more specific ideas on this?
> * Some sort of concurrent GC (not specifically running in a separate
> thread, but having different pools of memory to allocate from)

Among all points, this seems the easiest design-wise. Having
per-thread pools is nowadays standard, so it's _just_ work (as opposed
to 'complicated design'). Parallel GCs become important just when lots
of garbage must be reclaimed.
A GC is called concurrent, rather than parallel, when it runs
concurrently with the mutator, and this usually reduces both pause
times and throughput, so you probably don't want this as default (it
is useful for particular programs, such as heavily interactive
programs or videogames, I guess), do you?

More details are here:
http://www.ibm.com/developerworks/java/library/j-jtp11253/

The trick used in the (mostly) concurrent collector of Hotspot seems
interesting: it uses two short-stop-the-world phases and lets the
program run in between. I think I'll look for a paper on it.

Cheers,
-- 
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/

From fijall at gmail.com  Tue Jul 27 17:07:59 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Tue, 27 Jul 2010 17:07:59 +0200
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
	message passing?
In-Reply-To: 
References: 

Message-ID: 

On Tue, Jul 27, 2010 at 4:36 PM, Paolo Giarrusso  wrote:
> Hi all!
>
> I am possibly interested in doing work on this, even if not in the
> immediate future.

Well, talk is cheap. Would be great to see some work done of course.

Cheers,
fijal

From fijall at gmail.com  Tue Jul 27 17:11:43 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Tue, 27 Jul 2010 17:11:43 +0200
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
	message passing?
In-Reply-To: 
References: 

Message-ID: 

>
>> Truly
>> concurrent threads (depends on implicit vs explicit shared memory)
>> might require a truly concurrent GC to achieve performance. This is
>> work (although not as big as removing refcounting from CPython for
>> example).
>
>>> Are there detailed docs on why the Python GIL exists?
>>> -----------------------------------------------------
>>> I don't mean trivial statements like "because of C extensions" or "because the interpreter can't handle it".
>>> It may be possible that my particular usage would not require the GIL.? However, I won't know this until I can understand what threading problems the Python interpreter has that the GIL was meant to protect against.? Is there detailed documentation about this anywhere that covers all the threading issues that the GIL was meant to solve?
>
>> The short answer is "yes". The long answer is that it's much easier to
>> write interpreter assuming GIL is around. For fine-grained locking to
>> work and be efficient, you would need:
>
>> * The forementioned locking, to ensure that it's not that easy to
>> screw things up.
> I've wondered around the guarantees we need to offer to the
> programmer, and my guess was that Jython's memory model is similar.
> I've been concentrating on the dictionary of objects, on the
> assumption that lists and most other built-in structures should be
> locked by the programmer in case of concurrent modifications.
>
> However, we don't want to require locking to support something like:
> Thread 1:
> obj.newmember=1;
> Thread 2:
> a = obj.oldmember;
>
> Looking for Jython memory model on Google produces some garbage and
> then this document from Unladen Swallow:
> http://code.google.com/p/unladen-swallow/wiki/MemoryModel
> It implicitly agrees on what's above (since Jython and IronPython both
> use thread-safe dictionaries), and then delves into issues about
> allowed reorderings.
> However, it requires that even racy code does not make the interpreter crash.

I guess the main restraint is "interpreter should not crash" indeed.

>
>> * Possibly a JIT optimization that would remove some locking.
> Any more specific ideas on this?

Well, yes. Determining when object is local so you don't need to do
any locking, even though it escapes (this is also "just work", since
it has been done before).

>> * Some sort of concurrent GC (not specifically running in a separate
>> thread, but having different pools of memory to allocate from)
>
> Among all points, this seems the easiest design-wise. Having
> per-thread pools is nowadays standard, so it's _just_ work (as opposed
> to 'complicated design'). Parallel GCs become important just when lots
> of garbage must be reclaimed.
> A GC is called concurrent, rather than parallel, when it runs
> concurrently with the mutator, and this usually reduces both pause
> times and throughput, so you probably don't want this as default (it
> is useful for particular programs, such as heavily interactive
> programs or videogames, I guess), do you?

I guess I meant parallel then.

>
> More details are here:
> http://www.ibm.com/developerworks/java/library/j-jtp11253/
>
> The trick used in the (mostly) concurrent collector of Hotspot seems
> interesting: it uses two short-stop-the-world phases and lets the
> program run in between. I think I'll look for a paper on it.

Would be interested in that.

>
> Cheers,
> --
> Paolo Giarrusso - Ph.D. Student
> http://www.informatik.uni-marburg.de/~pgiarrusso/
>

From holger at merlinux.eu  Tue Jul 27 18:05:49 2010
From: holger at merlinux.eu (holger krekel)
Date: Tue, 27 Jul 2010 18:05:49 +0200
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared
	memory	message passing?
In-Reply-To: 
References: 

Message-ID: <20100727160548.GJ14601@trillke.net>

On Tue, Jul 27, 2010 at 17:07 +0200, Maciej Fijalkowski wrote:
> On Tue, Jul 27, 2010 at 4:36 PM, Paolo Giarrusso  wrote:
> > Hi all!
> >
> > I am possibly interested in doing work on this, even if not in the
> > immediate future.
> 
> Well, talk is cheap. Would be great to see some work done of course.

Well, I think it can be useful to state intentions and interest.  At least
for my projects i feel a difference if people express interest (even through 
negative feedback or broken code) or if they are indifferent, 
not saying or doing anything. 

best,
holger

From jbaker at zyasoft.com  Tue Jul 27 19:58:17 2010
From: jbaker at zyasoft.com (Jim Baker)
Date: Tue, 27 Jul 2010 11:58:17 -0600
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
	message passing?
In-Reply-To: <20100727160548.GJ14601@trillke.net>
References: 

	<20100727160548.GJ14601@trillke.net>
Message-ID: 

A much shorter version of the Jython memory model can be found in my book:
http://jythonpodcast.hostjava.net/jythonbook/en/1.0/Concurrency.html#python-memory-model

In general, I would think the coroutine mechanism being implemented by Lukas
Stadler for the MLVM version of the hotspot JVM might be a good option; you
can directly control the scheduling, although I don't think you change the
mapping from one hardware thread to another. (That's probably not
interesting.)

There are good results with JRuby, it would be nice to replicate with Jython
- and it should be really straightforward to do that. See
http://classparser.blogspot.com/

- Jim

On Tue, Jul 27, 2010 at 10:05 AM, holger krekel  wrote:

> On Tue, Jul 27, 2010 at 17:07 +0200, Maciej Fijalkowski wrote:
> > On Tue, Jul 27, 2010 at 4:36 PM, Paolo Giarrusso 
> wrote:
> > > Hi all!
> > >
> > > I am possibly interested in doing work on this, even if not in the
> > > immediate future.
> >
> > Well, talk is cheap. Would be great to see some work done of course.
>
> Well, I think it can be useful to state intentions and interest.  At least
> for my projects i feel a difference if people express interest (even
> through
> negative feedback or broken code) or if they are indifferent,
> not saying or doing anything.
>
> best,
> holger
> _______________________________________________
> pypy-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/pypy-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From kevinar18 at hotmail.com  Tue Jul 27 20:20:10 2010
From: kevinar18 at hotmail.com (Kevin Ar18)
Date: Tue, 27 Jul 2010 14:20:10 -0400
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
 message passing?
In-Reply-To: <20100727062702.GE12699@tunixman.com>
References: ,
	<20100727062702.GE12699@tunixman.com>
Message-ID: 

I won't even bother giving individual replies.? It's 
going to take me some time to go through all that information on the 
GIL, so I guess there's no much of a reply I can give anyways.? :)? Let me explain what this is all about in greater detail.

BTW, if there are more links on the GIL, feel free to post.

> Anonymous memory-mapped regions would work, with a suitable data
> abstraction. Or even memory-mapped files, which aren't really all that
> different on systems anymore.
I considered that... however, that would mean writing a significant library to convert Python data types to C/machine types and I wasn't looking forward to that prospect... although after some experimenting, maybe I will find that it won't be that big a deal for my particular situation.

-----------------------
What this is all about:
-----------------------
I am attempting to experiment with FBP - Flow Based Programming (http://www.jpaulmorrison.com/fbp/ and book: http://www.jpaulmorrison.com/fbp/book.pdf)? There is something very similar in Python: http://www.kamaelia.org/MiniAxon.html? Also, there are some similarities to Erlang - the share nothing memory model... and on some very broad levels, there are similarities that can be found in functional languages.

Consider p74 and p75 of the FBP book (http://www.jpaulmorrison.com/fbp/book.pdf).? Programs essentially consist of many "black boxes" connected together.? A box receives data, processes it and passes it along to another box, to output or drops/deletes it.? Each box, is like a mini-program written in a traditional programming language (like C++ or Python).

The process of connecting the boxes together was actually designed to be programmed visually, as you can see from the examples in the book (I have no idea if it works well, as I am merely starting to experiment with it).

Each box, being a self contained "program," the only data it has access to is 3 parts:
(1) it's own internal variables
(2) The "in ports" These are connections from other boxes allowing the box to receive data to be processed (very similar to the arguments in a function call)
(3) The "out ports" After processing the data, the box sends results to various "out ports" (which, in turn, go to anther box's "in port" or to system output).? There is no "return" like in functions... and a box can continually generate many pieces of data on the "out ports", unlike a function which only generates one return.

------------------------
At this point, my understanding of the FBP concept is extremely limited.? Unfortunately, the author does not have very detailed documentation on the implementation details.? So, I am going to try exploring the concept on my own and see if I can actually use it in some production code.

Implementation of FBP requires a custom scheduler for several reasons:
(1) A box can only run if it has actual data on the "in port(s)"? Thus, the scheduler would only schedule boxes to run when they can actually process some data.
(2) In theory, it may be possible to end up with hundreds or thousands of these light weight boxes.? Using heavy-weight OS threads or processes for every one is out of the question.

The Kamaelia website describes a simplistic single-threaded way to write a scheduler in Python that would work for the FBP concept (even though they never heard of FBP when they designed Kamaelia).? Based on that, it seems like writing a simple scheduler would be rather easy:

In a perfect world, here's what I might do:
* Assume a quad core cpu
(1) Spawn 1 process
(2) Spawn 4 threads & assign each thread to only 1 core -- in other words, don't let the OS handle moving threads around to different cores
(3) Inside each thread, have a mini scheduler that switches back and forth between the many micro-threads (or "boxes") -- note that the OS should not handle any of the switching between micro-threads/boxes as it does it all wrong (and to heavyweight) for this situation.
(4) Using a shared memory queue, each of the 4 schedulers can get the next box to run... or add more boxes to the schedule queue.

(5) Each box has access to its "in ports" and "out ports" only -- and nothing else.? These can be implemented as shared memory for speed.

Some notes:
Garbage Collection - I noticed that one of the issues mentioned about the GIL was garbage collection.? Within the FBP concept, this MIGHT be easily solved: (a) only 1 running piece of code (1 box) can access a piece of data at a time, so there is no worries about whether there are dangling pointers to the var/object somewhere, etc... (b) data must be manually "dropped" inside a box to get rid of it; thus, there is no need to go checking for data that is not used anymore

Threading protection - In theory, there is significantly less threading issues since: (a) only one box can control/access data at a time (b) the only place where there is contention is when you push/pop from the in/out ports ... and that is trivial to protect against.

Anyways, I appreciate the replies.? At this point, I guess I'll just go for a simplistic implementation to get a feel for how things work.? Then, maybe I can check on if something better can be done in PyPy.

_________________________________________________________________
Hotmail is redefining busy with tools for the New Busy. Get more from your inbox.
http://www.windowslive.com/campaign/thenewbusy?ocid=PID28326::T:WLMTAGL:ON:WL:en-US:WM_HMP:042010_2

From cfbolz at gmx.de  Tue Jul 27 23:56:26 2010
From: cfbolz at gmx.de (Carl Friedrich Bolz)
Date: Tue, 27 Jul 2010 23:56:26 +0200
Subject: [pypy-dev] rotting buildbot infrastructure
In-Reply-To: 
References: 
Message-ID: <4C4F560A.6080101@gmx.de>

On 07/27/2010 03:42 PM, Maciej Fijalkowski wrote:
> Hello.
>
> According to current buildbot status, both osx and win machines are
> offline. No clue how to get them back. Anyway, our OS X machine is
> unable to translate pypy, so it's not exactly the best buildbot ever.
> Can anyone contribute any machine for one of those buildbots?

Sorry, I will only be able to look at the OS X machine in August. Why 
can't it translate PyPy?

Carl Friedrich

From fijall at gmail.com  Wed Jul 28 08:42:22 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Wed, 28 Jul 2010 08:42:22 +0200
Subject: [pypy-dev] rotting buildbot infrastructure
In-Reply-To: <4C4F560A.6080101@gmx.de>
References:  
	<4C4F560A.6080101@gmx.de>
Message-ID: 

On Tue, Jul 27, 2010 at 11:56 PM, Carl Friedrich Bolz  wrote:
> On 07/27/2010 03:42 PM, Maciej Fijalkowski wrote:
>> Hello.
>>
>> According to current buildbot status, both osx and win machines are
>> offline. No clue how to get them back. Anyway, our OS X machine is
>> unable to translate pypy, so it's not exactly the best buildbot ever.
>> Can anyone contribute any machine for one of those buildbots?
>
> Sorry, I will only be able to look at the OS X machine in August. Why
> can't it translate PyPy?

There is not enough memory (the build timeout after like 4 or 5 hours).

>
> Carl Friedrich
> _______________________________________________
> pypy-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/pypy-dev
>

From stephen at thorne.id.au  Wed Jul 28 09:29:51 2010
From: stephen at thorne.id.au (Stephen Thorne)
Date: Wed, 28 Jul 2010 17:29:51 +1000
Subject: [pypy-dev] rotting buildbot infrastructure
In-Reply-To: 
References: 
	<4C4F560A.6080101@gmx.de>

Message-ID: <20100728072951.GF1338@thorne.id.au>

On 2010-07-28, Maciej Fijalkowski wrote:
> On Tue, Jul 27, 2010 at 11:56 PM, Carl Friedrich Bolz  wrote:
> > On 07/27/2010 03:42 PM, Maciej Fijalkowski wrote:
> >> Hello.
> >>
> >> According to current buildbot status, both osx and win machines are
> >> offline. No clue how to get them back. Anyway, our OS X machine is
> >> unable to translate pypy, so it's not exactly the best buildbot ever.
> >> Can anyone contribute any machine for one of those buildbots?
> >
> > Sorry, I will only be able to look at the OS X machine in August. Why
> > can't it translate PyPy?
> 
> There is not enough memory (the build timeout after like 4 or 5 hours).

I have a quad core ppc OSX (10.4) machine that isn't currently operating. It
only has a few of its RAM slots filled. If anyone wanted to fill it with RAM it
would make a reasonable build machine.

-- 
Regards,
Stephen Thorne
Development Engineer
Netbox Blue

From william.leslie.ttg at gmail.com  Wed Jul 28 14:54:39 2010
From: william.leslie.ttg at gmail.com (William Leslie)
Date: Wed, 28 Jul 2010 22:54:39 +1000
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
	message passing?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com>

Message-ID: 

On 28 July 2010 04:20, Kevin Ar18  wrote:
> I am attempting to experiment with FBP - Flow Based Programming (http://www.jpaulmorrison.com/fbp/ and book: http://www.jpaulmorrison.com/fbp/book.pdf)? There is something very similar in Python: http://www.kamaelia.org/MiniAxon.html? Also, there are some similarities to Erlang - the share nothing memory model... and on some very broad levels, there are similarities that can be found in functional languages.

Does anyone know if there is a central resource for incompatible
python memory model proposals? I know of Jython, Python-Safethread,
and Mont-E.

I do like the idea of MiniAxon, but let me mention a topic that has
slowly been bubbling to the front of my mind for the last few months.

Concurrency in the face of shared mutable state is hard. It makes it
trivial to introduce bugs all over the place. Nondeterminacy related
bugs are far harder to test, diagnose, and fix than anything else that
I would almost mandate static verification (via optional typing,
probably) of task noninterference if I was moving to a concurrent
environment with shared mutable state. There might be a reasonable
middle ground where, if a task attempts to violate the required static
semantics, it fails dynamically. At least then, latent bugs make
plenty of noise. An example for MiniAxon (as I understand it, which is
not very well) would be verification that a "task" (including
functions that the task calls) never closes over and yields the same
mutable objects, and never mutates globally reachable objects.

I wonder if you could close such tasks off with a clever subclass of
the proxy object space that detects and rejects such memory model
violations? With only semantics that make the program deterministic?

The moral equivalent would be cooperating processes with a large
global (or not) shared memory store for immutable objects, queues for
communication, and the additional semantic that objects in a queue are
either immutable or the queue holds their only reference. The trouble
is that it is so hard to work out what immutable really means.
Non-optional annotations would be not very pythonian.

-- 
William Leslie

From p.giarrusso at gmail.com  Wed Jul 28 15:12:40 2010
From: p.giarrusso at gmail.com (Paolo Giarrusso)
Date: Wed, 28 Jul 2010 15:12:40 +0200
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
	message passing?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com> 

Message-ID: 

On Tue, Jul 27, 2010 at 20:20, Kevin Ar18  wrote:
>
> I won't even bother giving individual replies.? It's
> going to take me some time to go through all that information on the
> GIL, so I guess there's no much of a reply I can give anyways.? :)? Let me explain what this is all about in greater detail.

> BTW, if there are more links on the GIL, feel free to post.
>
>> Anonymous memory-mapped regions would work, with a suitable data
>> abstraction. Or even memory-mapped files, which aren't really all that
>> different on systems anymore.
> I considered that... however, that would mean writing a significant library to convert Python data types to C/machine types and I wasn't looking forward to that prospect... although after some experimenting, maybe I will find that it won't be that big a deal for my particular situation.

> I am attempting to experiment with FBP - Flow Based Programming (http://www.jpaulmorrison.com/fbp/ and book: http://www.jpaulmorrison.com/fbp/book.pdf)? There is something very similar in Python: http://www.kamaelia.org/MiniAxon.html? Also, there are some similarities to Erlang - the share nothing memory model... and on some very broad levels, there are similarities that can be found in functional languages.
Except for the "visual programming" part, the general idea you
describe stems from CSP (Communicating Sequential Processes) and is
also found at least in the Scala actor library and in Google's Go with
goroutines.

In both languages you can easily pretend that no memory is shared by
avoiding to share any pointers (unlike C, even buggy code can't modify
a pointer which wasn't shared), and Go recommends programming this
way. A difference is that this is a convention.

For the "visual programming", it looks like a particular case of what
the Eclipse Modeling Framework is doing (they allow you to define
types of diagrams, called metamodels, and a way to convert them to
code, and generate a diagram editor and other support stuff. I'm not
an expert on that).
>From what you describe, FBP seems to give nothing new, except the
combination among "visual programming" with this idea. Disclaimer: I
did not read the book.

> Consider p74 and p75 of the FBP book (http://www.jpaulmorrison.com/fbp/book.pdf).? Programs essentially consist of many "black boxes" connected together.? A box receives data, processes it and passes it along to another box, to output or drops/deletes it.? Each box, is like a mini-program written in a traditional programming language (like C++ or Python).
>
> The process of connecting the boxes together was actually designed to be programmed visually, as you can see from the examples in the book (I have no idea if it works well, as I am merely starting to experiment with it).
>
> Each box, being a self contained "program," the only data it has access to is 3 parts:
> (1) it's own internal variables
> (2) The "in ports" These are connections from other boxes allowing the box to receive data to be processed (very similar to the arguments in a function call)
> (3) The "out ports" After processing the data, the box sends results to various "out ports" (which, in turn, go to anther box's "in port" or to system output).? There is no "return" like in functions... and a box can continually generate many pieces of data on the "out ports", unlike a function which only generates one return.
>
>
> ------------------------
> At this point, my understanding of the FBP concept is extremely limited.? Unfortunately, the author does not have very detailed documentation on the implementation details.? So, I am going to try exploring the concept on my own and see if I can actually use it in some production code.
>
>
> Implementation of FBP requires a custom scheduler for several reasons:
> (1) A box can only run if it has actual data on the "in port(s)"? Thus, the scheduler would only schedule boxes to run when they can actually process some data.
> (2) In theory, it may be possible to end up with hundreds or thousands of these light weight boxes.? Using heavy-weight OS threads or processes for every one is out of the question.
>
>
> The Kamaelia website describes a simplistic single-threaded way to write a scheduler in Python that would work for the FBP concept (even though they never heard of FBP when they designed Kamaelia).? Based on that, it seems like writing a simple scheduler would be rather easy:

> In a perfect world, here's what I might do:
> * Assume a quad core cpu
> (1) Spawn 1 process
> (2) Spawn 4 threads & assign each thread to only 1 core -- in other words, don't let the OS handle moving threads around to different cores
> (3) Inside each thread, have a mini scheduler that switches back and forth between the many micro-threads (or "boxes") -- note that the OS should not handle any of the switching between micro-threads/boxes as it does it all wrong (and to heavyweight) for this situation.
> (4) Using a shared memory queue, each of the 4 schedulers can get the next box to run... or add more boxes to the schedule queue.

Most of this is usual or standard - even if somebody possibly won't
set thread-CPU affinity, possibly because they don't know about the
syscalls to do it, i.e. sched_setaffinity. IIRC, this was not
mentioned in the paper I read about the Scala actor library.
Look for 'N:M threading library' (without quotes) on Google.

> (5) Each box has access to its "in ports" and "out ports" only -- and nothing else.? These can be implemented as shared memory for speed.

> Some notes:
> Garbage Collection - I noticed that one of the issues mentioned about the GIL was garbage collection.? Within the FBP concept, this MIGHT be easily solved: (a) only 1 running piece of code (1 box) can access a piece of data at a time, so there is no worries about whether there are dangling pointers to the var/object somewhere, etc...

> (b) data must be manually "dropped" inside a box to get rid of it; thus, there is no need to go checking for data that is not used anymore

A "piece of data" can point to other objects, and the pointer can be
modified. So you need GC anyway: having that, requiring data to be
dropped explicitly seems just an annoyance (there might be deeper
reasons, however).

> Threading protection - In theory, there is significantly less threading issues since: (a) only one box can control/access data at a time (b) the only place where there is contention is when you push/pop from the in/out ports ... and that is trivial to protect against.
Agreed.
-- 
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/

From p.giarrusso at gmail.com  Wed Jul 28 15:37:07 2010
From: p.giarrusso at gmail.com (Paolo Giarrusso)
Date: Wed, 28 Jul 2010 15:37:07 +0200
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
	message passing?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com> 

Message-ID: 

On Wed, Jul 28, 2010 at 14:54, William Leslie
 wrote:
> On 28 July 2010 04:20, Kevin Ar18  wrote:
>> I am attempting to experiment with FBP - Flow Based Programming (http://www.jpaulmorrison.com/fbp/ and book: http://www.jpaulmorrison.com/fbp/book.pdf)? There is something very similar in Python: http://www.kamaelia.org/MiniAxon.html? Also, there are some similarities to Erlang - the share nothing memory model... and on some very broad levels, there are similarities that can be found in functional languages.

> Does anyone know if there is a central resource for incompatible
> python memory model proposals? I know of Jython, Python-Safethread,
> and Mont-E.

Add Unladen Swallow to your list - the "Jython memory model" is undocumented.
I don't know of Mont-E, can't find its website through Google (!), and
there seems to be no such central resource.

> I do like the idea of MiniAxon, but let me mention a topic that has
> slowly been bubbling to the front of my mind for the last few months.

> Concurrency in the face of shared mutable state is hard. It makes it
> trivial to introduce bugs all over the place. Nondeterminacy related
> bugs are far harder to test, diagnose, and fix than anything else that
> I would almost mandate static verification (via optional typing,
> probably) of task noninterference if I was moving to a concurrent
> environment with shared mutable state.

This is a general issue with concurrency, and usually I try to solve
this using more pencil-and-paper design than usual.

> There might be a reasonable
> middle ground where, if a task attempts to violate the required static
> semantics, it fails dynamically. At least then, latent bugs make
> plenty of noise.

In general, I've seen lots of research on this, and something
implemented in Valgrind - see here for links:
http://blaisorbladeprog.blogspot.com/2010/07/automatic-race-detection.html.
Given the interest on this, the lack of complete tools might mean that
it is just too hard currently.

> An example for MiniAxon (as I understand it, which is
> not very well) would be verification that a "task" (including
> functions that the task calls) never closes over and yields the same
> mutable objects, and never mutates globally reachable objects.

I guess that 'close over' here means 'getting as input'.

> I wonder if you could close such tasks off with a clever subclass of
> the proxy object space that detects and rejects such memory model
> violations? With only semantics that make the program deterministic?

> The moral equivalent would be cooperating processes with a large
> global (or not) shared memory store for immutable objects, queues for
> communication, and the additional semantic that objects in a queue are
> either immutable or the queue holds their only reference.

In C++ auto_ptr do it, but that's hard in Python.

> The trouble
> is that it is so hard to work out what immutable really means.
> Non-optional annotations would be not very pythonian.

If you want static guarantees, you need a statically typed language.
The usual argument for dynamic languages is that instead of static
typing, you need to write unit tests, and since you must do that
anyway, dynamic languages are a win. We have two incomplete attempts
to make programs correct:
- Types give strong guarantees against a subclass of errors (you
_never_ get certain errors from a program which compiles)
- Testing gives weak guarantees (which go just as far as you test),
but covers all classes of errors
- The middle ground would be to require annotations to prove
properties. One would need (once and for all) to annotate even strings
as immutable!

Cheers,
-- 
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/

From william.leslie.ttg at gmail.com  Wed Jul 28 16:56:43 2010
From: william.leslie.ttg at gmail.com (William Leslie)
Date: Thu, 29 Jul 2010 00:56:43 +1000
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
	message passing?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com>

Message-ID: 

On 28 July 2010 23:37, Paolo Giarrusso  wrote:
> On Wed, Jul 28, 2010 at 14:54, William Leslie
>  wrote:
>> Does anyone know if there is a central resource for incompatible
>> python memory model proposals? I know of Jython, Python-Safethread,
>> and Mont-E.
>
> Add Unladen Swallow to your list - the "Jython memory model" is undocumented.
> I don't know of Mont-E, can't find its website through Google (!), and
> there seems to be no such central resource.

Mont-E was, for a long time, the hypothetical capability-secure subset
of python based on E and discussed on cap-talk. A handful of people
started work on it in earnest as a cpython fork fairly recently, but
it does seem to be pretty quiet, and documentation free. I did find a
repository and a presentation:
  http://bytebucket.org/habnabit/mont-e/overview
  https://docs.google.com/present/view?id=d9wrrrq_15ch78nq9n

> This is a general issue with concurrency, and usually I try to solve
> this using more pencil-and-paper design than usual.

I found the following paper pretty interesting. The motivating study
is some concurrency experts implementing software for proving the lack
of deadlock in Java. Even with the sort of dedication that only a
researcher with no life can provide, their deadlock inference software
itself deadlocked after many years of use.
www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-1.pdf

>> An example for MiniAxon (as I understand it, which is
>> not very well) would be verification that a "task" (including
>> functions that the task calls) never closes over and yields the same
>> mutable objects, and never mutates globally reachable objects.
>
> I guess that 'close over' here means 'getting as input'.

I mean that it keeps a reference to the objects between invocations.
Hence, sharing mutable state.

>> The trouble
>> is that it is so hard to work out what immutable really means.
>> Non-optional annotations would be not very pythonian.
>
> If you want static guarantees, you need a statically typed language.
> The usual argument for dynamic languages is that instead of static
> typing, you need to write unit tests, and since you must do that
> anyway, dynamic languages are a win.

One thing that many even very experienced hackers miss is that
(static) types (and typesystems) actually cover a broad range of
usages, and many of them are very different to the structural
typesafety systems you are used to in C# and Java. A typesystem can
prove anything that is statically computable, from the noninterference
of effects to program termination, the ability to stack allocate data
structures, and that privileged information can't be tainted.

It's important to realise that these are orthogonal to, not supersets
of, typesystems that validate structural safety. So it can be
reasonable, if yet a little more difficult, to apply them to dynamic
languages.

-- 
William Leslie

From glavoie at gmail.com  Wed Jul 28 21:32:38 2010
From: glavoie at gmail.com (Gabriel Lavoie)
Date: Wed, 28 Jul 2010 15:32:38 -0400
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
	message passing?
In-Reply-To: 
References: 
Message-ID: 

Hello Kevin,
     I don't know if it can be a solution to your problem but for my Master
Thesis I'm working on making Stackless Python distributed. What I did is
working but not complete and I'm right now in the process of writing the
thesis (in french unfortunately). My code currently works with PyPy's
"stackless" module onlyis and use some PyPy specific things. Here's what I
added to Stackless:

- Possibility to move tasklets easily (ref_tasklet.move(node_id)). A node is
an instance of an interpreter.
- Each tasklet has its global namespace (to avoid sharing of data). The
state is also easier to move to another interpreter this way.
- Distributed channels: All requests are known by all nodes using the
channel.
- Distributed objets: When a reference is sent to a remote node, the object
is not copied, a reference is created using PyPy's proxy object space.
- Automated dependency recovery when an object or a tasklet is loaded on
another interpreter

With a proper scheduler, many tasklets could be automatically spread in
multiple interpreters to use multiple cores or on multiple computers. A bit
like the N:M threading model where N lightweight threads/coroutines can be
executed on M threads.

The API is described here in french but it's pretty straightforward:
https://w3.mutehq.net/wiki/maitrise/API_DStackless

The code is available here (Just click on the Download link next to the
trunk folder):
https://w3.mutehq.net/websvn/wildchild/dstackless/trunk/

You need pypy-c built with --stackless. The code is a bit buggy right now
though...
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From kevinar18 at hotmail.com  Thu Jul 29 02:59:21 2010
From: kevinar18 at hotmail.com (Kevin Ar18)
Date: Wed, 28 Jul 2010 20:59:21 -0400
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
 message passing?
In-Reply-To: 
References: ,

Message-ID: 

> I don't know if it can be a solution to your problem but for my Master
> Thesis I'm working on making Stackless Python distributed.

It might be of use.  Thanks for the heads up.  I do have several questions:

1) Is it PyPy's stackless module or Stackless Python (stackless.com)?  Or are they the same module?
2) Do you have a non-https version of the site or one with a publically signed certificate?

P.S. You can send your reply over private email if you want, so as to not bother the list. :)

Date: Wed, 28 Jul 2010 15:32:38 -0400
Subject: Re: [pypy-dev] pre-emptive micro-threads utilizing shared memory 	message passing?
From: glavoie at gmail.com
To: kevinar18 at hotmail.com
CC: pypy-dev at codespeak.net

Hello Kevin,
     I don't know if it can be a solution to your problem but for my Master Thesis I'm working on making Stackless Python distributed. What I did is working but not complete and I'm right now in the process of writing the thesis (in french unfortunately). My code currently works with PyPy's "stackless" module onlyis and use some PyPy specific things. Here's what I added to Stackless:

- Possibility to move tasklets easily (ref_tasklet.move(node_id)). A node is an instance of an interpreter.- Each tasklet has its global namespace (to avoid sharing of data). The state is also easier to move to another interpreter this way. 
- Distributed channels: All requests are known by all nodes using the channel. - Distributed objets: When a reference is sent to a remote node, the object is not copied, a reference is created using PyPy's proxy object space.
- Automated dependency recovery when an object or a tasklet is loaded on another interpreter
With a proper scheduler, many tasklets could be automatically spread in multiple interpreters to use multiple cores or on multiple computers. A bit like the N:M threading model where N lightweight threads/coroutines can be executed on M threads. 

The API is described here in french but it's pretty straightforward:https://w3.mutehq.net/wiki/maitrise/API_DStackless

The code is available here (Just click on the Download link next to the trunk folder):https://w3.mutehq.net/websvn/wildchild/dstackless/trunk/

You need pypy-c built with --stackless. The code is a bit buggy right now though...

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From kevinar18 at hotmail.com  Thu Jul 29 03:33:04 2010
From: kevinar18 at hotmail.com (Kevin Ar18)
Date: Wed, 28 Jul 2010 21:33:04 -0400
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: ,
	<20100727062702.GE12699@tunixman.com>,
	,

Message-ID: 

As a followup to my earlier post:
"pre-emptive micro-threads utilizing shared memory message passing?"

I am actually finding that the biggest hurdle to accomplishing what I want is the lack of ANY type of shared memory -- even if it is limited.  I wonder if I might ask a question:

Would the following be a possible way to offer a limited type of shared memory:

Summary: create a system very, very similar to POSH, but with differences:

In detail, here's what I mean:
* unlike POSH, utilize OS threads and shared memory (not processes)
* Create a special shared memory location where you can place Python objects
* Each Python object you place into this location can only be accessed (modified) by 1 thread.
* You must manually assign ownership of an object to a particular thread.
* The thread that "owns" the object is the only one that can modify it.
* You can transfer ownership to another thread (but, as always only the owner can modify it).

* There is no GIL when a thread interacts with these special objects.  You can have true thread parallelism if your code uses a lot of these special objects.
* The GIL remains in place for all other data access.
* If your code has a mixture of access to the special objects and regular data, then once you hit a point where a thread starts to interact with data not in the special storage, then that thread must follow GIL rules.

Granted, there might be some difficulty with the GIL part... but I thought I might ask anyways. :)

> Date: Wed, 28 Jul 2010 22:54:39 +1000
> Subject: Re: [pypy-dev] pre-emptive micro-threads utilizing shared memory 	message passing?
> From: william.leslie.ttg at gmail.com
> To: kevinar18 at hotmail.com
> CC: pypy-dev at codespeak.net
> 
> On 28 July 2010 04:20, Kevin Ar18  wrote:
> > I am attempting to experiment with FBP - Flow Based Programming (http://www.jpaulmorrison.com/fbp/ and book: http://www.jpaulmorrison.com/fbp/book.pdf)  There is something very similar in Python: http://www.kamaelia.org/MiniAxon.html  Also, there are some similarities to Erlang - the share nothing memory model... and on some very broad levels, there are similarities that can be found in functional languages.
> 
> Does anyone know if there is a central resource for incompatible
> python memory model proposals? I know of Jython, Python-Safethread,
> and Mont-E.
> 
> I do like the idea of MiniAxon, but let me mention a topic that has
> slowly been bubbling to the front of my mind for the last few months.
> 
> Concurrency in the face of shared mutable state is hard. It makes it
> trivial to introduce bugs all over the place. Nondeterminacy related
> bugs are far harder to test, diagnose, and fix than anything else that
> I would almost mandate static verification (via optional typing,
> probably) of task noninterference if I was moving to a concurrent
> environment with shared mutable state. There might be a reasonable
> middle ground where, if a task attempts to violate the required static
> semantics, it fails dynamically. At least then, latent bugs make
> plenty of noise. An example for MiniAxon (as I understand it, which is
> not very well) would be verification that a "task" (including
> functions that the task calls) never closes over and yields the same
> mutable objects, and never mutates globally reachable objects.
> 
> I wonder if you could close such tasks off with a clever subclass of
> the proxy object space that detects and rejects such memory model
> violations? With only semantics that make the program deterministic?
> 
> The moral equivalent would be cooperating processes with a large
> global (or not) shared memory store for immutable objects, queues for
> communication, and the additional semantic that objects in a queue are
> either immutable or the queue holds their only reference. The trouble
> is that it is so hard to work out what immutable really means.
> Non-optional annotations would be not very pythonian.
> 
> -- 
> William Leslie

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From alex.gaynor at gmail.com  Thu Jul 29 03:44:23 2010
From: alex.gaynor at gmail.com (Alex Gaynor)
Date: Wed, 28 Jul 2010 20:44:23 -0500
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com>

Message-ID: 

On Wed, Jul 28, 2010 at 8:33 PM, Kevin Ar18  wrote:
> As a followup to my earlier post:
> "pre-emptive micro-threads utilizing shared memory message passing?"
>
> I am actually finding that the biggest hurdle to accomplishing what I want
> is the lack of ANY type of shared memory -- even if it is limited.??I wonder
> if I might ask a question:
>
> Would the following be a possible way to offer a limited type of shared
> memory:
>
> Summary: create a system very, very similar to POSH, but with differences:
>
> In detail, here's what I mean:
> * unlike POSH, utilize OS threads and shared memory (not processes)
> * Create a special shared memory location where you can place Python objects
> * Each Python object you place into this location can only be accessed
> (modified) by 1 thread.
> * You must manually assign ownership of an object to a particular thread.
> * The thread that "owns" the object is the only one that can modify it.
> * You can transfer ownership to another thread (but, as always only the
> owner can modify it).
>
> * There is no GIL when a thread interacts with these special objects.??You
> can have true thread parallelism if your code uses a lot of these special
> objects.
> * The GIL remains in place for all other data access.
> * If your code has a mixture of access to the special objects and regular
> data, then once you hit a point where a thread starts to interact with data
> not in the special storage, then that thread must follow GIL rules.
>
> Granted, there might be some difficulty with the GIL part... but I thought I
> might ask anyways. :)
>
>> Date: Wed, 28 Jul 2010 22:54:39 +1000
>> Subject: Re: [pypy-dev] pre-emptive micro-threads utilizing shared memory
>> message passing?
>> From: william.leslie.ttg at gmail.com
>> To: kevinar18 at hotmail.com
>> CC: pypy-dev at codespeak.net
>>
>> On 28 July 2010 04:20, Kevin Ar18  wrote:
>> > I am attempting to experiment with FBP - Flow Based Programming
>> > (http://www.jpaulmorrison.com/fbp/ and book:
>> > http://www.jpaulmorrison.com/fbp/book.pdf)? There is something very similar
>> > in Python: http://www.kamaelia.org/MiniAxon.html? Also, there are some
>> > similarities to Erlang - the share nothing memory model... and on some very
>> > broad levels, there are similarities that can be found in functional
>> > languages.
>>
>> Does anyone know if there is a central resource for incompatible
>> python memory model proposals? I know of Jython, Python-Safethread,
>> and Mont-E.
>>
>> I do like the idea of MiniAxon, but let me mention a topic that has
>> slowly been bubbling to the front of my mind for the last few months.
>>
>> Concurrency in the face of shared mutable state is hard. It makes it
>> trivial to introduce bugs all over the place. Nondeterminacy related
>> bugs are far harder to test, diagnose, and fix than anything else that
>> I would almost mandate static verification (via optional typing,
>> probably) of task noninterference if I was moving to a concurrent
>> environment with shared mutable state. There might be a reasonable
>> middle ground where, if a task attempts to violate the required static
>> semantics, it fails dynamically. At least then, latent bugs make
>> plenty of noise. An example for MiniAxon (as I understand it, which is
>> not very well) would be verification that a "task" (including
>> functions that the task calls) never closes over and yields the same
>> mutable objects, and never mutates globally reachable objects.
>>
>> I wonder if you could close such tasks off with a clever subclass of
>> the proxy object space that detects and rejects such memory model
>> violations? With only semantics that make the program deterministic?
>>
>> The moral equivalent would be cooperating processes with a large
>> global (or not) shared memory store for immutable objects, queues for
>> communication, and the additional semantic that objects in a queue are
>> either immutable or the queue holds their only reference. The trouble
>> is that it is so hard to work out what immutable really means.
>> Non-optional annotations would be not very pythonian.
>>
>> --
>> William Leslie
>
> _______________________________________________
> pypy-dev at codespeak.net
> http://codespeak.net/mailman/listinfo/pypy-dev
>

Honestly, that sounds really difficult, out and out removing the GIL
would probably be easier.

Alex

-- 
"I disapprove of what you say, but I will defend to the death your
right to say it." -- Voltaire
"The people's good is the highest law." -- Cicero
"Code can always be simpler than you think, but never as simple as you
want" -- Me

From kevinar18 at hotmail.com  Thu Jul 29 04:07:57 2010
From: kevinar18 at hotmail.com (Kevin Ar18)
Date: Wed, 28 Jul 2010 22:07:57 -0400
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: ,
	<20100727062702.GE12699@tunixman.com>,
	,
	,
	,

Message-ID: 

> Honestly, that sounds really difficult, out and out removing the GIL
> would probably be easier.
Based on the extremely limited info on the GIL, the big issue I noticed were two pieces of code trying to modify the same object at the same time because of the way they are stored internally in Python and because of garbage collection.
I figured, if you have special objects that cannot be simultaneously accessed that maybe that would be possible. 		 	   		  
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From william.leslie.ttg at gmail.com  Thu Jul 29 07:18:57 2010
From: william.leslie.ttg at gmail.com (William Leslie)
Date: Thu, 29 Jul 2010 15:18:57 +1000
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com>

Message-ID: 

On 29 July 2010 11:33, Kevin Ar18  wrote:
> In detail, here's what I mean:
> * unlike POSH, utilize OS threads and shared memory (not processes)
> * Create a special shared memory location where you can place Python objects
> * Each Python object you place into this location can only be accessed
> (modified) by 1 thread.
> * You must manually assign ownership of an object to a particular thread.
> * The thread that "owns" the object is the only one that can modify it.
> * You can transfer ownership to another thread (but, as always only the
> owner can modify it).

When an object is mutable, it must be visible to at most one thread.
This means it can participate in return values, arguments and queues,
but the sender cannot keep a reference to an object it sends, because
if the receiver mutates the object, this will need to be reflected in
the sender's thread to ensure internal consistency. Well, you could
ignore internal consistency, require explicit locking, and have it
segfault when the change to the length of your list has propogated but
not the element you have added, but that wouldn't be much fun. The
alternative, implicitly writing updates back to memory as soon as
possible and reading them out of memory every time, can be hundreds or
more times slower. So you really can't have two tasks sharing mutable
objects, ever.

-- 
William Leslie

From fijall at gmail.com  Thu Jul 29 09:27:05 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Thu, 29 Jul 2010 09:27:05 +0200
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com> 

Message-ID: 

On Thu, Jul 29, 2010 at 7:18 AM, William Leslie
 wrote:
> On 29 July 2010 11:33, Kevin Ar18  wrote:
>> In detail, here's what I mean:
>> * unlike POSH, utilize OS threads and shared memory (not processes)
>> * Create a special shared memory location where you can place Python objects
>> * Each Python object you place into this location can only be accessed
>> (modified) by 1 thread.
>> * You must manually assign ownership of an object to a particular thread.
>> * The thread that "owns" the object is the only one that can modify it.
>> * You can transfer ownership to another thread (but, as always only the
>> owner can modify it).
>
> When an object is mutable, it must be visible to at most one thread.
> This means it can participate in return values, arguments and queues,
> but the sender cannot keep a reference to an object it sends, because
> if the receiver mutates the object, this will need to be reflected in
> the sender's thread to ensure internal consistency. Well, you could
> ignore internal consistency, require explicit locking, and have it
> segfault when the change to the length of your list has propogated but
> not the element you have added, but that wouldn't be much fun. The
> alternative, implicitly writing updates back to memory as soon as
> possible and reading them out of memory every time, can be hundreds or
> more times slower. So you really can't have two tasks sharing mutable
> objects, ever.
>
> --
> William Leslie

Hi.

Do you have any data points supporting your claim?

Cheers,
fijal

From william.leslie.ttg at gmail.com  Thu Jul 29 09:32:57 2010
From: william.leslie.ttg at gmail.com (William Leslie)
Date: Thu, 29 Jul 2010 17:32:57 +1000
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com>

Message-ID: 

On 29 July 2010 17:27, Maciej Fijalkowski  wrote:
> On Thu, Jul 29, 2010 at 7:18 AM, William Leslie
>  wrote:
>> When an object is mutable, it must be visible to at most one thread.
>> This means it can participate in return values, arguments and queues,
>> but the sender cannot keep a reference to an object it sends, because
>> if the receiver mutates the object, this will need to be reflected in
>> the sender's thread to ensure internal consistency. Well, you could
>> ignore internal consistency, require explicit locking, and have it
>> segfault when the change to the length of your list has propogated but
>> not the element you have added, but that wouldn't be much fun. The
>> alternative, implicitly writing updates back to memory as soon as
>> possible and reading them out of memory every time, can be hundreds or
>> more times slower. So you really can't have two tasks sharing mutable
>> objects, ever.
>>
>> --
>> William Leslie
>
> Hi.
>
> Do you have any data points supporting your claim?

About the performance of programs that involve a cache miss on every
memory access, or internal consistency?

-- 
William Leslie

From fijall at gmail.com  Thu Jul 29 09:40:21 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Thu, 29 Jul 2010 09:40:21 +0200
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com> 

Message-ID: 

On Thu, Jul 29, 2010 at 9:32 AM, William Leslie
 wrote:
> On 29 July 2010 17:27, Maciej Fijalkowski  wrote:
>> On Thu, Jul 29, 2010 at 7:18 AM, William Leslie
>>  wrote:
>>> When an object is mutable, it must be visible to at most one thread.
>>> This means it can participate in return values, arguments and queues,
>>> but the sender cannot keep a reference to an object it sends, because
>>> if the receiver mutates the object, this will need to be reflected in
>>> the sender's thread to ensure internal consistency. Well, you could
>>> ignore internal consistency, require explicit locking, and have it
>>> segfault when the change to the length of your list has propogated but
>>> not the element you have added, but that wouldn't be much fun. The
>>> alternative, implicitly writing updates back to memory as soon as
>>> possible and reading them out of memory every time, can be hundreds or
>>> more times slower. So you really can't have two tasks sharing mutable
>>> objects, ever.
>>>
>>> --
>>> William Leslie
>>
>> Hi.
>>
>> Do you have any data points supporting your claim?
>
> About the performance of programs that involve a cache miss on every
> memory access, or internal consistency?
>

I think I lost some implication here. Did I get you right - you claim
that per-object locking in case threads share obejcts are very
expensive, is that correct? If not, I completely misunderstood you and
my question makes no sense, please explain. If yes, why does it mean a
cache miss on every read/write?

Cheers,
fijal

From william.leslie.ttg at gmail.com  Thu Jul 29 09:57:58 2010
From: william.leslie.ttg at gmail.com (William Leslie)
Date: Thu, 29 Jul 2010 17:57:58 +1000
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com>

Message-ID: 

On 29 July 2010 17:40, Maciej Fijalkowski  wrote:
> On Thu, Jul 29, 2010 at 9:32 AM, William Leslie
>  wrote:
>> On 29 July 2010 17:27, Maciej Fijalkowski  wrote:
>>> On Thu, Jul 29, 2010 at 7:18 AM, William Leslie
>>>  wrote:
>>>> When an object is mutable, it must be visible to at most one thread.
>>>> This means it can participate in return values, arguments and queues,
>>>> but the sender cannot keep a reference to an object it sends, because
>>>> if the receiver mutates the object, this will need to be reflected in
>>>> the sender's thread to ensure internal consistency. Well, you could
>>>> ignore internal consistency, require explicit locking, and have it
>>>> segfault when the change to the length of your list has propogated but
>>>> not the element you have added, but that wouldn't be much fun. The
>>>> alternative, implicitly writing updates back to memory as soon as
>>>> possible and reading them out of memory every time, can be hundreds or
>>>> more times slower. So you really can't have two tasks sharing mutable
>>>> objects, ever.
>>>>
>>>> --
>>>> William Leslie
>>>
>>> Hi.
>>>
>>> Do you have any data points supporting your claim?
>>
>> About the performance of programs that involve a cache miss on every
>> memory access, or internal consistency?
>>
>
> I think I lost some implication here. Did I get you right - you claim
> that per-object locking in case threads share obejcts are very
> expensive, is that correct? If not, I completely misunderstood you and
> my question makes no sense, please explain. If yes, why does it mean a
> cache miss on every read/write?

I claim that there are two alternatives in the face of one thread
mutating an object and the other observing:

0. You can give up consistency and do fine-grained locking, which is
reasonably fast but error prone, or
1. Expect python to handle all of this for you, effectively not making
a change to the memory model. You could do this with implicit
per-object locks which might be reasonably fast in the absence of
contention, but not when several threads are trying to use the object.

Queues already are in a sense your per-object-lock,
one-thread-mutating, but usually one thread has acquire semantics and
one has release semantics, and that combination actually works. It's
when you expect to have a full memory barrier that is the problem.

Come to think of it, you might be right Kevin: as long as only one
thread mutates the object, the mutating thread never /needs/ to
acquire, as it knows that it has the latest revision.

Have I missed something?

-- 
William Leslie

From fijall at gmail.com  Thu Jul 29 10:02:30 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Thu, 29 Jul 2010 10:02:30 +0200
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com> 

Message-ID: 

On Thu, Jul 29, 2010 at 9:57 AM, William Leslie
 wrote:
> On 29 July 2010 17:40, Maciej Fijalkowski  wrote:
>> On Thu, Jul 29, 2010 at 9:32 AM, William Leslie
>>  wrote:
>>> On 29 July 2010 17:27, Maciej Fijalkowski  wrote:
>>>> On Thu, Jul 29, 2010 at 7:18 AM, William Leslie
>>>>  wrote:
>>>>> When an object is mutable, it must be visible to at most one thread.
>>>>> This means it can participate in return values, arguments and queues,
>>>>> but the sender cannot keep a reference to an object it sends, because
>>>>> if the receiver mutates the object, this will need to be reflected in
>>>>> the sender's thread to ensure internal consistency. Well, you could
>>>>> ignore internal consistency, require explicit locking, and have it
>>>>> segfault when the change to the length of your list has propogated but
>>>>> not the element you have added, but that wouldn't be much fun. The
>>>>> alternative, implicitly writing updates back to memory as soon as
>>>>> possible and reading them out of memory every time, can be hundreds or
>>>>> more times slower. So you really can't have two tasks sharing mutable
>>>>> objects, ever.
>>>>>
>>>>> --
>>>>> William Leslie
>>>>
>>>> Hi.
>>>>
>>>> Do you have any data points supporting your claim?
>>>
>>> About the performance of programs that involve a cache miss on every
>>> memory access, or internal consistency?
>>>
>>
>> I think I lost some implication here. Did I get you right - you claim
>> that per-object locking in case threads share obejcts are very
>> expensive, is that correct? If not, I completely misunderstood you and
>> my question makes no sense, please explain. If yes, why does it mean a
>> cache miss on every read/write?
>
> I claim that there are two alternatives in the face of one thread
> mutating an object and the other observing:
>
> 0. You can give up consistency and do fine-grained locking, which is
> reasonably fast but error prone, or
> 1. Expect python to handle all of this for you, effectively not making
> a change to the memory model. You could do this with implicit
> per-object locks which might be reasonably fast in the absence of
> contention, but not when several threads are trying to use the object.
>
> Queues already are in a sense your per-object-lock,
> one-thread-mutating, but usually one thread has acquire semantics and
> one has release semantics, and that combination actually works. It's
> when you expect to have a full memory barrier that is the problem.
>
> Come to think of it, you might be right Kevin: as long as only one
> thread mutates the object, the mutating thread never /needs/ to
> acquire, as it knows that it has the latest revision.
>
> Have I missed something?
>
> --
> William Leslie
>

So my question is why you think 1. is really expensive (can you find
evidence). I don't see what is has to do with cache misses. Besides,
in python you cannot guarantee much about mutability of objects. So
you don't know if object passed in a queue is mutable or not, unless
you restrict yourself to some very simlpe types (in which case there
is no shared memory, since you only pass immutable objects).

Cheers,
fijal

From william.leslie.ttg at gmail.com  Thu Jul 29 10:50:52 2010
From: william.leslie.ttg at gmail.com (William Leslie)
Date: Thu, 29 Jul 2010 18:50:52 +1000
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com>

Message-ID: 

On 29 July 2010 18:02, Maciej Fijalkowski  wrote:
> On Thu, Jul 29, 2010 at 9:57 AM, William Leslie
>  wrote:
>> I claim that there are two alternatives in the face of one thread
>> mutating an object and the other observing:
>>
>> 0. You can give up consistency and do fine-grained locking, which is
>> reasonably fast but error prone, or
>> 1. Expect python to handle all of this for you, effectively not making
>> a change to the memory model. You could do this with implicit
>> per-object locks which might be reasonably fast in the absence of
>> contention, but not when several threads are trying to use the object.
>>
>> Queues already are in a sense your per-object-lock,
>> one-thread-mutating, but usually one thread has acquire semantics and
>> one has release semantics, and that combination actually works. It's
>> when you expect to have a full memory barrier that is the problem.
>>
>> Come to think of it, you might be right Kevin: as long as only one
>> thread mutates the object, the mutating thread never /needs/ to
>> acquire, as it knows that it has the latest revision.
>>
>> Have I missed something?
>>
>> --
>> William Leslie
>>
>
> So my question is why you think 1. is really expensive (can you find
> evidence). I don't see what is has to do with cache misses. Besides,
> in python you cannot guarantee much about mutability of objects. So
> you don't know if object passed in a queue is mutable or not, unless
> you restrict yourself to some very simlpe types (in which case there
> is no shared memory, since you only pass immutable objects).

If task X expects that task Y will mutate some object it has, it needs
to go back to the source for every read. This means that if you do use
mutation of some shared object for communication, it needs to be
synchronised before every access. What this means for us is that every
read from a possibly mutable object requires an acquire, and every
write requires a release. It's as if every reference in the program is
implemented with a volatile pointer. Even if the object is never
mutated, there can be a lot of unnecessary bus chatter waiting for
MESI to tell us so.

-- 
William Leslie

From fijall at gmail.com  Thu Jul 29 10:55:25 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Thu, 29 Jul 2010 10:55:25 +0200
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com> 

Message-ID: 

On Thu, Jul 29, 2010 at 10:50 AM, William Leslie
 wrote:
> On 29 July 2010 18:02, Maciej Fijalkowski  wrote:
>> On Thu, Jul 29, 2010 at 9:57 AM, William Leslie
>>  wrote:
>>> I claim that there are two alternatives in the face of one thread
>>> mutating an object and the other observing:
>>>
>>> 0. You can give up consistency and do fine-grained locking, which is
>>> reasonably fast but error prone, or
>>> 1. Expect python to handle all of this for you, effectively not making
>>> a change to the memory model. You could do this with implicit
>>> per-object locks which might be reasonably fast in the absence of
>>> contention, but not when several threads are trying to use the object.
>>>
>>> Queues already are in a sense your per-object-lock,
>>> one-thread-mutating, but usually one thread has acquire semantics and
>>> one has release semantics, and that combination actually works. It's
>>> when you expect to have a full memory barrier that is the problem.
>>>
>>> Come to think of it, you might be right Kevin: as long as only one
>>> thread mutates the object, the mutating thread never /needs/ to
>>> acquire, as it knows that it has the latest revision.
>>>
>>> Have I missed something?
>>>
>>> --
>>> William Leslie
>>>
>>
>> So my question is why you think 1. is really expensive (can you find
>> evidence). I don't see what is has to do with cache misses. Besides,
>> in python you cannot guarantee much about mutability of objects. So
>> you don't know if object passed in a queue is mutable or not, unless
>> you restrict yourself to some very simlpe types (in which case there
>> is no shared memory, since you only pass immutable objects).
>
> If task X expects that task Y will mutate some object it has, it needs
> to go back to the source for every read. This means that if you do use
> mutation of some shared object for communication, it needs to be
> synchronised before every access. What this means for us is that every
> read from a possibly mutable object requires an acquire, and every
> write requires a release. It's as if every reference in the program is
> implemented with a volatile pointer. Even if the object is never
> mutated, there can be a lot of unnecessary bus chatter waiting for
> MESI to tell us so.
>

I do agree there is an overhead. Can you provide some data how much
this overhead is? Python is not a very simple language and a lot of
things are complex and time consuming, so I wonder how it compares to
locking per object.

From sparks.m at gmail.com  Thu Jul 29 11:44:52 2010
From: sparks.m at gmail.com (Michael Sparks)
Date: Thu, 29 Jul 2010 10:44:52 +0100
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com>

Message-ID: 

Would comments from a project using this approach in real systems be
of interest/use/help? Whilst I didn't know about Morrison's FBP
(Balzer's work predates him btw - don't listen to hype) I had heard of
(and played with) Occam among other more influential things, and
Kamaelia is a real tool. Also there is already a pre-existing FBP tool
for Stackless, and then historically there's also MASCOT & friends. It
just looks to me that you're tieing yourself up in knots over things
that aren't problems, when there are some things which could be useful
(in practice) & interesting in this space.

Oh, incidentally, Mini Axon is a toy/teaching/testing system - as the
name suggests. The main Axon is more complete -- in the areas we've
needed - it's been driven by real system needs.

(for those who don't know me, Kamaelia is my project, I don't bite,
but I do sometimes talk or type fast)

Regards,

Michael Sparks
--
http://www.kamaelia.org/PragmaticConcurrency.html
http://yeoldeclue.com/blog

On 7/29/10, Kevin Ar18  wrote:
> As a followup to my earlier post:
> "pre-emptive micro-threads utilizing shared memory message passing?"
>
> I am actually finding that the biggest hurdle to accomplishing what I want
> is the lack of ANY type of shared memory -- even if it is limited.  I wonder
> if I might ask a question:
>
> Would the following be a possible way to offer a limited type of shared
> memory:
>
> Summary: create a system very, very similar to POSH, but with differences:
>
> In detail, here's what I mean:
> * unlike POSH, utilize OS threads and shared memory (not processes)
> * Create a special shared memory location where you can place Python objects
> * Each Python object you place into this location can only be accessed
> (modified) by 1 thread.
> * You must manually assign ownership of an object to a particular thread.
> * The thread that "owns" the object is the only one that can modify it.
> * You can transfer ownership to another thread (but, as always only the
> owner can modify it).
>
> * There is no GIL when a thread interacts with these special objects.  You
> can have true thread parallelism if your code uses a lot of these special
> objects.
> * The GIL remains in place for all other data access.
> * If your code has a mixture of access to the special objects and regular
> data, then once you hit a point where a thread starts to interact with data
> not in the special storage, then that thread must follow GIL rules.
>
> Granted, there might be some difficulty with the GIL part... but I thought I
> might ask anyways. :)
>
>> Date: Wed, 28 Jul 2010 22:54:39 +1000
>> Subject: Re: [pypy-dev] pre-emptive micro-threads utilizing shared memory
>> 	message passing?
>> From: william.leslie.ttg at gmail.com
>> To: kevinar18 at hotmail.com
>> CC: pypy-dev at codespeak.net
>>
>> On 28 July 2010 04:20, Kevin Ar18  wrote:
>> > I am attempting to experiment with FBP - Flow Based Programming
>> > (http://www.jpaulmorrison.com/fbp/ and book:
>> > http://www.jpaulmorrison.com/fbp/book.pdf)  There is something very
>> > similar in Python: http://www.kamaelia.org/MiniAxon.html  Also, there
>> > are some similarities to Erlang - the share nothing memory model... and
>> > on some very broad levels, there are similarities that can be found in
>> > functional languages.
>>
>> Does anyone know if there is a central resource for incompatible
>> python memory model proposals? I know of Jython, Python-Safethread,
>> and Mont-E.
>>
>> I do like the idea of MiniAxon, but let me mention a topic that has
>> slowly been bubbling to the front of my mind for the last few months.
>>
>> Concurrency in the face of shared mutable state is hard. It makes it
>> trivial to introduce bugs all over the place. Nondeterminacy related
>> bugs are far harder to test, diagnose, and fix than anything else that
>> I would almost mandate static verification (via optional typing,
>> probably) of task noninterference if I was moving to a concurrent
>> environment with shared mutable state. There might be a reasonable
>> middle ground where, if a task attempts to violate the required static
>> semantics, it fails dynamically. At least then, latent bugs make
>> plenty of noise. An example for MiniAxon (as I understand it, which is
>> not very well) would be verification that a "task" (including
>> functions that the task calls) never closes over and yields the same
>> mutable objects, and never mutates globally reachable objects.
>>
>> I wonder if you could close such tasks off with a clever subclass of
>> the proxy object space that detects and rejects such memory model
>> violations? With only semantics that make the program deterministic?
>>
>> The moral equivalent would be cooperating processes with a large
>> global (or not) shared memory store for immutable objects, queues for
>> communication, and the additional semantic that objects in a queue are
>> either immutable or the queue holds their only reference. The trouble
>> is that it is so hard to work out what immutable really means.
>> Non-optional annotations would be not very pythonian.
>>
>> --
>> William Leslie
>

From william.leslie.ttg at gmail.com  Thu Jul 29 15:15:32 2010
From: william.leslie.ttg at gmail.com (William Leslie)
Date: Thu, 29 Jul 2010 23:15:32 +1000
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com>

Message-ID: 

On 29 July 2010 18:55, Maciej Fijalkowski  wrote:
> On Thu, Jul 29, 2010 at 10:50 AM, William Leslie
>  wrote:
>> If task X expects that task Y will mutate some object it has, it needs
>> to go back to the source for every read. This means that if you do use
>> mutation of some shared object for communication, it needs to be
>> synchronised before every access. What this means for us is that every
>> read from a possibly mutable object requires an acquire, and every
>> write requires a release. It's as if every reference in the program is
>> implemented with a volatile pointer. Even if the object is never
>> mutated, there can be a lot of unnecessary bus chatter waiting for
>> MESI to tell us so.
>>
>
> I do agree there is an overhead. Can you provide some data how much
> this overhead is? Python is not a very simple language and a lot of
> things are complex and time consuming, so I wonder how it compares to
> locking per object.

It *is* locking per object, but you also spend time looking for the
data if someone else has invalidated your cache line.

Come to think of it, that isn't as bad as it first seemed to me. If
the sender never mutates the object, it will Just Work on any machine
with a fairly flat cache architecture.

Sorry. Carry on.

-- 
William Leslie

From kevinar18 at hotmail.com  Thu Jul 29 18:56:28 2010
From: kevinar18 at hotmail.com (Kevin Ar18)
Date: Thu, 29 Jul 2010 12:56:28 -0400
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: ,
	<20100727062702.GE12699@tunixman.com>,
	,
	,
	,
	,
	,
	,
	,

Message-ID: 

> I claim that there are two alternatives in the face of one thread
> mutating an object and the other observing:
Well, I did consider the possibility of one thread being able to change, the others observe, but I have no idea if that is too complicate like you are suggesting.
However, that is not even necessary.  An even more limited form, would work fine (at least for me):

Two possible modes:
Read/Write from 1 thread:
* ONLY one thread can change and observe(read) -- no other threads have access of any kind or even know of its existence until you transfer control to another thread (then only the thread you transferred control has acces).
(Optional) read only from all threads:
* Optionally, you could have objects that are in read only mode and all threads can observe it.

To make things easier, maybe special GIL-free threads could be added.  (They would still be OS-level threads, but with special properties in Python.) These threads would have the property that they could ONLY access data stored in the special object store to which they have read/write privilege.  They can't access other objects not in the special store.  As a result, these special threads would be free of the GIL and could run in parallel.

> Queues already are in a sense your per-object-lock,
> one-thread-mutating, but usually one thread has acquire semantics and
> one has release semantics, and that combination actually works. It's
> when you expect to have a full memory barrier that is the problem.

Now you brought up something interesting: queues
To be honest something like queues and pipes would good enough for my purposes -- if they used shared memory.  Currently, the implemenation of queues and pipes in the multiprocessing module seems rather costly as they use processes, and require copying data back and forth.
In particular, what would be useful:

* A queue that holds self-contained Python objects (with no pointers/references to other data not in the queue so as to prevent threading issues)
* The queue can be accessed by all special threads simultaneously (in parallel).  You would only need locks around queue operations, but that is pretty easy to do -- unless there is some hidden Interpreter problem that would make this easy task hard.
* Streaming buffers -- like a file buffer or something similar, so you can send data from one thread to another as it comes in (when you don't know when it will end or it may never end).  Only two threads have access: one to put data in, the other to extract it.

> 0. You can give up consistency and do fine-grained locking, which is
> reasonably fast but error prone, or
> 1. Expect python to handle all of this for you, effectively not making
> a change to the memory model. You could do this with implicit
> per-object locks which might be reasonably fast in the absence of
> contention, but not when several threads are trying to use the object.
> 
...
> 
> Come to think of it, you might be right Kevin: as long as only one
> thread mutates the object, the mutating thread never /needs/ to
> acquire, as it knows that it has the latest revision.
> 
> Have I missed something?
I'm afraid I don't know enough about Python's Interpreter to say much.  The only way would be for me to do some studying on interpreters/compilers and get digging into the codebase -- and I'm not sure how much time I have to do that right now. :)
Perhaps the part about one thread only having read & write changes the situation?

One possible implemenation might be similar to how POSH does it:
Now, I'm not suggesting this, because I know enough to say it is possible, but just to put something out there that might work.
Create a special virtual memory address or lookup table for each thread.  When you assign a read+write object to a thread, it gets added to the virtual address/memory table.
Optinally, it could be up to the programmer to make sure they don't try to access data from a thread that does not have ownership/control of that object.  If a programmer does try to access it, it would fail as the memory address would point to nowhere/bad data/etc....

Of course, there are probably other, better ways to do it that are not as fickle as this... but I don't know if the limitations of the Python Interpreter and GIL would allow better methods. 		 	   		  
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From andrewfr_ice at yahoo.com  Thu Jul 29 18:56:52 2010
From: andrewfr_ice at yahoo.com (Andrew Francis)
Date: Thu, 29 Jul 2010 09:56:52 -0700 (PDT)
Subject: [pypy-dev] pypy-dev Digest, Vol 360, Issue 13
In-Reply-To: 
Message-ID: <680164.96893.qm@web120007.mail.ne1.yahoo.com>

Hi Kevin:

Message: 1
Date: Tue, 27 Jul 2010 14:20:10 -0400
From: Kevin Ar18 
Subject: Re: [pypy-dev] pre-emptive micro-threads utilizing shared
    memory message passing?
To: 
Message-ID: 
Content-Type: text/plain; charset="iso-8859-1"

>I am attempting to experiment with FBP - Flow Based Programming >(http://www.jpaulmorrison.com/fbp/ and book: http://www.jpaulmorrison.com>/fbp/book.pdf)? There is something very similar in Python: >http://www.kamaelia.org/MiniAxon.html? Also, there are some similarities >to Erlang - the share nothing memory model... and on some very broad >levels, there are similarities that can be found in functional languages.

I just came back from EuroPython. A lot of discussion on concurrency....

Well functional languages (like Erlang), variables tend to be immutable. This is a bonus in a concurrent system - makes it easier to reason about the system - and helps to avoid various race conditions. As for the shared memory. I think there is a difference between whether things are shared at the application programmer level, or under the hood controlled by the system. Programmers tend to beare bad at the former. 

>http://www.kamaelia.org/MiniAxon.html

I took a quick look. Maybe I am biased but Stackless Python gives you most of that. Also tasklets and channels can do everything a generator can and more (a generator is more specialised than a coroutine). Also it is easy to mimic asynchrony with a CSP style messaging system where microthreads and channels are cheap. A line from the book "Actors: A Model of Concurrent Computation in Distributed Systems" by Gul A. Agha comes to mind: "synchrony is mere buffered asynchrony."

>The process of connecting the boxes together was actually designed to be >programmed visually, as you can see from the examples in the book (I have >no idea if it works well, as I am merely starting to experiment with it).

What bought me to Stackless Python and PyPy was work concerning WS-BPEL. Allegedly, WS-BPEL/XLang/WSFL (Web-Services Flow Language) are based on formalisms like pi calculus.

Since I don't own a multi-core machine and I am not doing CPU intense stuff, I never really cared. However I have been doing things where I needed to impose logical orderings upon processes (i.e., process C can only run after process A and B are finished). My initial native uses of Stackless (easy to do in anything system based on CSP), resulted in deadlocking the system. So I found understanding deadlock to be very important.

>Each box, being a self contained "program," the only data it has access >to is 3 parts:

>Implementation of FBP requires a custom scheduler for several reasons:
>(1) A box can only run if it has actual data on the "in port(s)"? Thus, >the scheduler would only schedule boxes to run when they can actually >process some data.

Stackless Python already works like this. No custom scheduler needed. I would recommend you read Rob Pike's paper "The Implementation of Newsqueak" or some of the Cardilli papers to understand how CSP constructs with channels work. And if you need to customize schedulers - you have two routes 1) Use pre-existing classes and API 2) Experiment with PyPy's stackless.py

>(2) In theory, it may be possible to end up with hundreds or thousands of >these light weight boxes.? Using heavy-weight OS threads or processes for every one is out of the question.

Stackless Python.

>In a perfect world, here's what I might do:
* Assume a quad core cpu
>(1) Spawn 1 process
>(2) Spawn 4 threads & assign each thread to only 1 core -- in other >words, don't let the OS handle moving threads around to different cores
>(3) Inside each thread, have a mini scheduler that switches back and >forth between the many micro-threads (or "boxes") -- note that the OS >should not handle any of the switching between micro-threads/boxes as it >does it all wrong (and to heavyweight) for this situation.
>(4) Using a shared memory queue, each of the 4 schedulers can get the >next box to run... or add more boxes to the schedule queue.

My advice: get stuff properly working under a single threaded model first so you understand the machinery. That said, I think Carlos Eduardo de Paula a few years ago played with adapting Stackless for multi-processing.

Second piece of advice: start looking at how Go does things. Stackless Python and Go share a common ancestor. However Go does much more on the multi-core front.

Cheers,
Andrew

From kevinar18 at hotmail.com  Thu Jul 29 19:35:14 2010
From: kevinar18 at hotmail.com (Kevin Ar18)
Date: Thu, 29 Jul 2010 13:35:14 -0400
Subject: [pypy-dev] pypy-dev Digest, Vol 360, Issue 13
In-Reply-To: <680164.96893.qm@web120007.mail.ne1.yahoo.com>
References: ,
	<680164.96893.qm@web120007.mail.ne1.yahoo.com>
Message-ID: 

> Well functional languages (like Erlang), variables tend to be immutable. This is a bonus in a concurrent system - makes it easier to reason about the system - and helps to avoid various race conditions. As for the shared memory. I think there is a difference between whether things are shared at the application programmer level, or under the hood controlled by the system. Programmers tend to beare bad at the former. 

Your right... and I am actually talking about non-shared memory from the perspective of the programmer, but under the hood, it MUST use shared memory for implementation.  The problem I am running into is that there is no way to implement it under the hood because there is no way to do shared memory in Python.

Thanks for bringing that up.  Maybe that will clarify what I was going on about. :)

> I took a quick look. Maybe I am biased but Stackless Python gives you most of that. Also tasklets and channels can do everything a generator can and more (a generator is more specialised than a coroutine). Also it is easy to mimic asynchrony with a CSP style messaging system where microthreads and channels are cheap. A line from the book "Actors: A Model of Concurrent Computation in Distributed Systems" by Gul A. Agha comes to mind: "synchrony is mere buffered asynchrony."

Agreed.  Stuff like the stackless module in PyPy, greenlets, twisted, and others do offer some useful options that are even better than generators...  I could definitely make use of them for some of the broader implemenation details.  However, the problem is always that there is no way to make them parallel within Python itself, because there is no shared memory that I can use for "under the hood" implemenation.

Now, if there is a true parallel implementation of stackless, greenlets, twisted, etc... maybe it could fit my purposes... but I'd have to check.  I did some basic searching on various Python threading implemenations in the past and didn't really find one that did... but, like you suggested, maybe there is one out there somewhere.

> >The process of connecting the boxes together was actually designed to be >programmed visually, as you can see from the examples in the book (I have >no idea if it works well, as I am merely starting to experiment with it).
> 
> What bought me to Stackless Python and PyPy was work concerning WS-BPEL. Allegedly, WS-BPEL/XLang/WSFL (Web-Services Flow Language) are based on formalisms like pi calculus.
> 
> Since I don't own a multi-core machine and I am not doing CPU intense stuff, I never really cared. However I have been doing things where I needed to impose logical orderings upon processes (i.e., process C can only run after process A and B are finished). My initial native uses of Stackless (easy to do in anything system based on CSP), resulted in deadlocking the system. So I found understanding deadlock to be very important.
> 
Thanks... and, uh, about all I can do is bookmark this for later.  Really, thanks for the links; I may very well want to research each and every one of these at some point and see what I can learn from each one.  If you have more stuff like that, feel free to let me know. :)

> My advice: get stuff properly working under a single threaded model first so you understand the machinery. That said, I think Carlos Eduardo de Paula a few years ago played with adapting Stackless for multi-processing.
Yeah, I've been considering that.  Maybe I'll just go ahead with a single threaded implementation... and if I feel like it, I could always try to edit PyPy or one of the other implemenations later (although I probably never will due to time constraints :) ).  Still, I figured I might as well ask around and see if it was possible to do a parallel implementation sooner.

Or... what I may end up doing is using the slow multiprocessing module and queues.  Granted, it will probably be slow since it doesn't use shared memory "under the hood", but it would be parallel.

> Second piece of advice: start looking at how Go does things. Stackless Python and Go share a common ancestor. However Go does much more on the multi-core front.
I have looked at Go Goroutines.... albeit briefly.  I noticed that they are co-operative like stackless and, based on your comments, I'm guessing they work on multiple cores?  I was really disappointed that they were not pre-emptive, however.  I haven't really looked much into it beyond that, but maybe I'll give it another look; but using it would mean not using Python. :( 		 	   		  
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From kevinar18 at hotmail.com  Thu Jul 29 19:44:39 2010
From: kevinar18 at hotmail.com (Kevin Ar18)
Date: Thu, 29 Jul 2010 13:44:39 -0400
Subject: [pypy-dev] FW: Would the following shared memory model be possible?
In-Reply-To: 
References: ,
	<20100727062702.GE12699@tunixman.com>,
	,
	,
	,
	,

Message-ID: 

> Would comments from a project using this approach in real systems be
> of interest/use/help? Whilst I didn't know about Morrison's FBP
> (Balzer's work predates him btw - don't listen to hype) I had heard of
> (and played with) Occam among other more influential things, and
> Kamaelia is a real tool. Also there is already a pre-existing FBP tool
> for Stackless, and then historically there's also MASCOT & friends. It

You brought up a lot of topics.  I went ahead and sent you a private email.  There's always lots of interesting things I can add to my list of things to learn about. :)

> just looks to me that you're tieing yourself up in knots over things
> that aren't problems, when there are some things which could be useful
> (in practice) & interesting in this space.
The particular issue in this situation is that there is no way to make Kamaelia, FBP, or other concurrency concepts run in parallel (unless you are willing to accept lots of overhead like with the multiprocessing queues).

Since you have worked with Kamaelia code a lot... you understand a lot more about implementation details.  Do you think the previous shared memory concept or something like it would let you make Kamaelia parallel?
If not, can you think of any method that would let you make Kamaelia parallel?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From kevinar18 at hotmail.com  Thu Jul 29 20:02:38 2010
From: kevinar18 at hotmail.com (Kevin Ar18)
Date: Thu, 29 Jul 2010 14:02:38 -0400
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
 message passing?
In-Reply-To: 
References: ,

Message-ID: 

> Hello Kevin,
> I don't know if it can be a solution to your problem but for my
> Master Thesis I'm working on making Stackless Python distributed. What
> I did is working but not complete and I'm right now in the process of
> writing the thesis (in french unfortunately). My code currently works
> with PyPy's "stackless" module onlyis and use some PyPy specific
> things. Here's what I added to Stackless:
>
> - Possibility to move tasklets easily (ref_tasklet.move(node_id)). A
> node is an instance of an interpreter.
> - Each tasklet has its global namespace (to avoid sharing of data). The
> state is also easier to move to another interpreter this way.
> - Distributed channels: All requests are known by all nodes using the
> channel.
> - Distributed objets: When a reference is sent to a remote node, the
> object is not copied, a reference is created using PyPy's proxy object
> space.
> - Automated dependency recovery when an object or a tasklet is loaded
> on another interpreter
>
> With a proper scheduler, many tasklets could be automatically spread in
> multiple interpreters to use multiple cores or on multiple computers. A
> bit like the N:M threading model where N lightweight threads/coroutines
> can be executed on M threads.

Was able to have a look at the API...
If others don't mind my asking this on the mailing list:

* .send() and .receive()
What type of data can you send and receive between the tasklets?  Can you pass entire Python objects?

* .send() and .receive() memory model
When you send data between tasklets (pass messages) or whateve you want to call it, how is this implemented under the hood?  Does it use shared memory under the hood or does it involve a more costly copying of the data?  I realize that if it is on another machine you have to copy the data, but what about between two threads?  You mentioned PyPy's proxy object.... guess I'll need to read up on that. 		 	   		  

From sparks.m at gmail.com  Thu Jul 29 19:21:25 2010
From: sparks.m at gmail.com (Michael Sparks)
Date: Thu, 29 Jul 2010 18:21:25 +0100
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: 

Message-ID: <201007291821.26318.sparks.m@gmail.com>

I make it a point these days to only reply on-list. It leads to endless 
repetition otherwise. If you repost this cc'ing the pypy-dev list I'll reply. 
If you think it's off topic there, then I see no point.

Michael.

On Thursday 29 July 2010 18:05:27 you wrote:
> Thanks for the reply.
> 
> > Would comments from a project using this approach in real systems be
> > of interest/use/help?
> 
> I contacted someone from Kamaelia a while back (probably you).
> Yes, use of the dataflow concept would be really useful (no
> MIT/BSD/Python/PD license).  However, licensing was an issues, so I went
> it on my own.  I find the concept rather interesting both to maybe learn
> from and to actually try and use in an actual application.
> 
> > Whilst I didn't know about Morrison's FBP
> > (Balzer's work predates him btw - don't listen to hype) I had heard of
> > (and played with) Occam among other more influential things, and
> > Kamaelia is a real tool.
> 
> What is this Balzer and Occam? :)  Do you have any links I can look at?
> 
> > Also there is already a pre-existing FBP tool
> > for Stackless
> 
> The problem is that Stackless is not parallel, which is what I would really
> like to do.
> 
> > , and then historically there's also MASCOT & friends.
> 
> Do you have a link about this?

-- 
>>>

From andrewfr_ice at yahoo.com  Thu Jul 29 22:39:16 2010
From: andrewfr_ice at yahoo.com (Andrew Francis)
Date: Thu, 29 Jul 2010 13:39:16 -0700 (PDT)
Subject: [pypy-dev] Would the following shared memory model be possible?
Message-ID: <557968.57023.qm@web120009.mail.ne1.yahoo.com>

Hi Michael:

--- On Thu, 7/29/10, Michael Sparks  wrote:

> It's a pity we didn't get a chance to chat at the
> conference. (I was the one videoing everything for upload after >transcoding :)

Yes I noticed. I gave the talk "Prototyping Go's Select with stackless.py for Stackless Python." Much of that talk dealt with rendezvous semantics courtesy via synchronous channels.

I will post the Original slides and the Revised version (mistakes corrected ) in a day or two. 

> > >http://www.kamaelia.org/MiniAxon.html
> > 

> I'm biassed towards Kamaelia (naturally :-), but I agree.
> MiniAxon is just a  toy/tutorial. Early in kamaelia's history we >considered using stackless, but rejected it simply because we wanted to >work mainly with mainline python, rather than a specialised version.

Fair enough. Currently Stackless Python is being integrated with Psyco and will be available as a module. 

> Other things in Stackless's favour (IIRC) - include the
> fact that you can  pickle generators, and send them across network >connection, unpickle them and let them continue running. I don't know 
>if you do the same with tasklets, but I wouldn't be surprised if you do :)

As long as you do not have a C Frame involved, you can pickle a tasklet.
That was the subject of my "Silly Stackless Python Trick" lighting talk.
I was going to demonstrate a version of the Sieve of Eratosthenes that could be pickled and resumed on another machine. However my HP Netbook had a non-standard VGA output connection and I needed to install Stackless
on a loaner ThinkPad that died as I hooked it up. However you saw all
that :-(

> That means you have potential for process migration.

Yep. Gabriel Lavoie does a lot of work in that area with PyPy (thanks Gabriel !)

> Doing that sensibly though IMO would require better understanding
> in the system of what the user is trying to achieve and what they're >sending. (It's easy to think of examples where this causes more pain than >it's worth after all)

You have to understand what can be pickled. Occasionally you are in for
a surprise (i.e. functools).

>You could argue in that case that the biggest _real_ difference is 
>that we try to use a unified API for different concurrency
>methods. 

Well I would argue that Stackless has a simple elegant model. The addition of select just adds more power.Stackless channels can also serve as generators (they are iterable). I recently took a stab at writing the Sleeping Barber's problem. I think in Stackless, the basic solution was about 30 lines. Very little clutter.

> One **highly subjective** other thing in our favour, is
> that generators are limited to a single level of control flow 
>(ie non-nestable without a trampoline). This doesn't sounds like
> an advantage, but it tends to lead to simpler components which are 
>in turn reusable. (and that I view as useful :)

Okay. I attended Ray's Hettinger's talk on Monocle. In the past
I have encountered situations where I bumped up with the nesting problem.
If I recall, the problem involved request handlers that had a RPC style AND made additional Twisted deferred calls:

class MyRequestHandler(...):

    @defer.inlineCallbacks
    def process(self):
        try:
            result = yield
client.getPage("http://www.google.com")
        except Exception, err:
            log.err(err, "process getPage call
failed")
        else:
            # do some processing with the result 
            return result

looks reasonable but Python will balk. Nested generators. Only way around it is that you had to hope that the Twisted protocol was properly written and chain deferreds.

> Have fun,

I do :-)

Cheers,
Andrew

From p.giarrusso at gmail.com  Thu Jul 29 22:53:22 2010
From: p.giarrusso at gmail.com (Paolo Giarrusso)
Date: Thu, 29 Jul 2010 22:53:22 +0200
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com> 

Message-ID: 

On Thu, Jul 29, 2010 at 15:15, William Leslie
 wrote:
> On 29 July 2010 18:55, Maciej Fijalkowski  wrote:
>> On Thu, Jul 29, 2010 at 10:50 AM, William Leslie
>>  wrote:
>>> If task X expects that task Y will mutate some object it has, it needs
>>> to go back to the source for every read. This means that if you do use
>>> mutation of some shared object for communication, it needs to be
>>> synchronised before every access. What this means for us is that every
>>> read from a possibly mutable object requires an acquire, and every
>>> write requires a release. It's as if every reference in the program is
>>> implemented with a volatile pointer. Even if the object is never
>>> mutated, there can be a lot of unnecessary bus chatter waiting for
>>> MESI to tell us so.
>>>
>>
>> I do agree there is an overhead. Can you provide some data how much
>> this overhead is? Python is not a very simple language and a lot of
>> things are complex and time consuming, so I wonder how it compares to
>> locking per object.

Below I try to prove that locking is still too expensive, even for an
interpreter.
Also, for many things the clever optimizations you do allow making
those costs small, at least for the average case / fast path. I have
been taught to consider clever optimizations as required. With JIT
compilation, specialization and shadow classes, are method calls much
more expensive than a guard and (if no inlining is done, as might
happen in PyPy in the worst case for big functions) an assembler
'call' opcode, and possibly stack shuffling? How many cycles is that?
How more expensive is that than optimized JavaScript (which is not far
from C, the only difference being the guard)? You can assume the case
of plain calls without keyword arguments and so on (and with inlining,
keyword arguments should pay no runtime cost).

Also, the free threading patches which tried removing the GIL gave an
unacceptable (IIRC 2x) slowdown to CPython in the old days of CPython
1.5. And I don't think they tried to lock every object, just what you
need to lock (which included refcounts).

> It *is* locking per object, but you also spend time looking for the
> data if someone else has invalidated your cache line.

That overhead is already there in locking per object, I think (locking
can be much more expensive than a cache miss, see below).
However, locking per object does not prevent race conditions unless
you make atomic regions as big as actually needed (locking per
statement does not work), it just prevents data races (a conflict
between a write and a memory operation which are not synchronized
between each other). And you can't extend atomic regions indefinitely,
as that implies starvation. Even software transactional memory
requires the programmer to allocate which regions have to be atomic.

Given the additional cost (discussed elsewhere in this mail), and
given that there is not much benefit, I think locking-per-object is
not worth it (but I'd still love to know more about why the effort on
python-safethread was halted).

> Come to think of it, that isn't as bad as it first seemed to me. If
> the sender never mutates the object, it will Just Work on any machine
> with a fairly flat cache architecture.

You first wrote: "The alternative, implicitly writing updates back to
memory as soon as possible and reading them out of memory every time,
can be hundreds or more times slower."
This is not "locking per object", it is just semantically close to it,
and becomes equivalent if only one thread has a reference at any time.

They are very different though performance-wise, and each of them is
better for some usages. In the Linux kernel (which I consider quite
authoritative here, on what you can do in C) both are used for valid
performance reasons, and a JIT compiler could choice between them.
Here, first I describe the two alternatives mentioned. Finally, I go
to the combination for the "unshared case".

- What you first described (memory barriers or uncached R/Ws) can be
faster for small updates, depending on the access pattern. An uncached
memory area does not disturb other memory traffic, unlike memory
barriers which are global, but I don't think an unprivileged process
is allowed to obtain one (by modifying MSRs or PATs, for x86).

Cost: each memory op goes to main memory and is thus as slow as a
cache miss (hundreds of clock cycles). When naively reading a Python
field, many such reads can be possible, but a JIT compiler can bring
it down to the equivalent of a C access with shadow classes and
specialization, and this would pay even more here (V8 does it for
JavaScript and I think PyPy already does most or all of it).

- Locking per object (monitors): slow upfront, but you can do each r/w
out of your cache, so if the object is kept locked for some time, this
is more efficient.
How slow? A system call to perform locking can cost tens of thousands
of cycles. But Java locks, and nowadays even Linux futexes (and
Windows locks), perform everything in userspace in as many cases as
possible (the slowpath is when there is actually contention on the
lock, but it's uncommon with locking-per-object). I won't sum up here
the literature on this.

- Since no contention is expected here, a simple couple of memory
barrier is needed on send/receive (a write barrier for send, a read
one for receive, IIRC). Allowing read-only access to another thread
already brings back to a mixture of the above two solutions. However,
in the 1st solution, using memory barriers, you'd need a write barrier
for every write, but you could save on read barriers.
-- 
Paolo Giarrusso - Ph.D. Student
http://www.informatik.uni-marburg.de/~pgiarrusso/

From exarkun at twistedmatrix.com  Thu Jul 29 23:24:58 2010
From: exarkun at twistedmatrix.com (exarkun at twistedmatrix.com)
Date: Thu, 29 Jul 2010 21:24:58 -0000
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: <557968.57023.qm@web120009.mail.ne1.yahoo.com>
References: <557968.57023.qm@web120009.mail.ne1.yahoo.com>
Message-ID: <20100729212458.2188.24074246.divmod.xquotient.34@localhost.localdomain>

On 08:39 pm, andrewfr_ice at yahoo.com wrote:
>
>Okay. I attended Ray's Hettinger's talk on Monocle. In the past
>I have encountered situations where I bumped up with the nesting 
>problem.
>If I recall, the problem involved request handlers that had a RPC style 
>AND made additional Twisted deferred calls:
>
>class MyRequestHandler(...):
>
>    @defer.inlineCallbacks
>    def process(self):
>        try:
>            result = yield
>client.getPage("http://www.google.com")
>        except Exception, err:
>            log.err(err, "process getPage call
>failed")
>        else:
>            # do some processing with the result
>            return result
>
>looks reasonable but Python will balk.

Aside from the "return result" (should be defer.returnValue(result), 
generators can't return with a value), this looks fine to me too.  Why 
do you say Python will balk?

Jean-Paul

From william.leslie.ttg at gmail.com  Fri Jul 30 09:35:29 2010
From: william.leslie.ttg at gmail.com (William Leslie)
Date: Fri, 30 Jul 2010 17:35:29 +1000
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com>

Message-ID: 

On 30 July 2010 06:53, Paolo Giarrusso  wrote:
>> Come to think of it, that isn't as bad as it first seemed to me. If
>> the sender never mutates the object, it will Just Work on any machine
>> with a fairly flat cache architecture.
>
> You first wrote: "The alternative, implicitly writing updates back to
> memory as soon as possible and reading them out of memory every time,
> can be hundreds or more times slower."
> This is not "locking per object", it is just semantically close to it,
> and becomes equivalent if only one thread has a reference at any time.

Yes, direct memory access was misdirection (sorry), as the cache
already handles consistency even in NUMA systems of the same size that
sit on most desktops today, and most significantly you still need to
lock objects in many cases, such as looking up an entry in a dict,
which can change size while probing. Not only are uncached accesses
needlessly slow in the typical case, but they are not sufficient to
ensure consistency of some resizable rpython data structures.

-- 
William Leslie

From evan at theunixman.com  Fri Jul 30 21:36:28 2010
From: evan at theunixman.com (Evan Cofsky)
Date: Fri, 30 Jul 2010 12:36:28 -0700
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
 message passing?
In-Reply-To: 
References: 

Message-ID: <20100730193627.GB2082@tunixman.com>

On 07/27 11:48, Maciej Fijalkowski wrote:
> Right now, no. But there are ways in which you can experiment. Truly
> concurrent threads (depends on implicit vs explicit shared memory)
> might require a truly concurrent GC to achieve performance. This is
> work (although not as big as removing refcounting from CPython for
> example).

Would starting to remove the GIL then be a useful project for someone
(like me, for example) to undertake? It might be a good start to
experimentation with other kinds of concurrency. I've been interested in
Software Transactional Memory
(http://en.wikipedia.org/wiki/Software_transactional_memory).

-- 
Evan Cofsky 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: Digital signature
URL: 

From fijall at gmail.com  Fri Jul 30 21:40:35 2010
From: fijall at gmail.com (Maciej Fijalkowski)
Date: Fri, 30 Jul 2010 21:40:35 +0200
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
	message passing?
In-Reply-To: <20100730193627.GB2082@tunixman.com>
References: 

	<20100730193627.GB2082@tunixman.com>
Message-ID: 

On Fri, Jul 30, 2010 at 9:36 PM, Evan Cofsky  wrote:
> On 07/27 11:48, Maciej Fijalkowski wrote:
>> Right now, no. But there are ways in which you can experiment. Truly
>> concurrent threads (depends on implicit vs explicit shared memory)
>> might require a truly concurrent GC to achieve performance. This is
>> work (although not as big as removing refcounting from CPython for
>> example).
>
> Would starting to remove the GIL then be a useful project for someone
> (like me, for example) to undertake? It might be a good start to
> experimentation with other kinds of concurrency. I've been interested in
> Software Transactional Memory
> (http://en.wikipedia.org/wiki/Software_transactional_memory).
>
> --
> Evan Cofsky 
>

I think removing GIL is not a good place to start. It's far too
complex without knowing codebase (it's fairly complex with knowing
codebase). There are many related projects, which are smaller in size
and eventually might lead to having some idea how to remove the GIL.
If you're interested, come to #pypy on IRC to discuss.

Cheers,
fijal

From evan at theunixman.com  Fri Jul 30 21:54:09 2010
From: evan at theunixman.com (Evan Cofsky)
Date: Fri, 30 Jul 2010 12:54:09 -0700
Subject: [pypy-dev] pre-emptive micro-threads utilizing shared memory
 message passing?
In-Reply-To: 
References: 

	<20100730193627.GB2082@tunixman.com>

Message-ID: <20100730195408.GC2082@tunixman.com>

On 07/30 21:40, Maciej Fijalkowski wrote:
> If you're interested, come to #pypy on IRC to discuss.

Sounds reasonable enough. I'll hang out on #pypy and see what happens.

Thanks

-- 
Evan Cofsky 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 230 bytes
Desc: Digital signature
URL: 

From sparks.m at gmail.com  Sat Jul 31 03:08:49 2010
From: sparks.m at gmail.com (Michael Sparks)
Date: Sat, 31 Jul 2010 02:08:49 +0100
Subject: [pypy-dev] FW: Would the following shared memory model be
	possible?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com>

Message-ID: 

On Thu, Jul 29, 2010 at 6:44 PM, Kevin Ar18  wrote:
> You brought up a lot of topics.  I went ahead and sent you a private email.
> There's always lots of interesting things I can add to my list of things to
> learn about. :)

Yes, there are lots of interesting things. I have a limited amount of
time however (I should be in bed, it's very late here, but I do /try/
to reply to on-list mails), so cannot spood feed you. Mailing me
directly rather than a (relevant) list precludes you getting answers
from someone other than me. Not being on lists also precludes you
getting answers to questions by chance. Changing emails and names in
email headers also makes keeping track of people hard...

(For example you asked off list last year about Kamaelia's license
from a different email address. Since it wasn't searchable I
completely forgot. You also asked all sorts of questions but didn't
want the answers public, so I didn't reply. If instead you'd
subscribed to the list, and asked there, you'd've found out that
Kamaelia's license changed - to the Apache Software License v2 ...)

If I mention something you find interesting, please Google first and
then ask publicly somewhere relevant. (the answer and question are
then googleable, and you're doing the community a service IMO if you
ask q's that way - if you're question is somewhere relevant and shows
you've already googled prior work as far as you can... People are
always willing to help people who show willing to help themselves in
my experience.)

>> just looks to me that you're tieing yourself up in knots over things
>> that aren't problems, when there are some things which could be useful
>> (in practice) & interesting in this space.
> The particular issue in this situation is that there is no way to make
> Kamaelia, FBP, or other concurrency concepts run in parallel (unless you are
> willing to accept lots of overhead like with the multiprocessing queues).
>
> Since you have worked with Kamaelia code a lot... you understand a lot more
> about implementation details.  Do you think the previous shared memory
> concept or something like it would let you make Kamaelia parallel?
> If not, can you think of any method that would let you make Kamaelia
> parallel?

Kamaelia already CAN run components in parallel in different processes
(has been able to do so for quite some time) or on different
processors. Indeed, all you do is use a ProcessPipeline or
ProcessGraphline rather than Pipeline or Graphline, and the components
in the top level are spread across processes. I still view the code as
experimental, but it does work, and when needed is very useful.

Kamaelia running on Iron Python can run on seperate processors sharing
data efficiently (due to lack of GIL there) happily too. Threaded
components there do that naturally - I don't use IronPython, but it
does run on Iron Python. On windows this is easiest, though Mono works
just as well.

I believe Jython also is GIL free, and Kamaelia's Axon runs there
cleanly too. As a result because Kamaelia is pure python, it runs
truly in parallel there too (based on hearing from people using
kamaelia on jython). Cpython is the exception (and a rather big one at
that). (Pypy has a choice IIUC)

Personally, I think if PyPy worked with generators better (which is
why I keep an eye on PyPy) and cpyext was improved, it'd provide a
really compelling platform for me. (I was rather gutted at Europython
to hear that PyPy's generator support was still ... problematic)

Regarding the *efficiency* and *enforcement* of the approach taken, I
feel you're chasing the wrong tree, but let's go there.

What approach does baseline (non-Iron Python running) kamaelia take
for multi-process work?

For historical reasons, it builds on top of pprocess rather than
multiprocessing module based. This means for interprocess
communications objects are pickled before being sent over operating
system pipes.

This provides an obvious communications overhead - and this isn't
really kamaelia specific at this point.

However, shifting data from one CPU to another is expensive, and only
worth doing in some circumstances. (Consider a machine with several
physical CPUs - each has a local CPU cache, and the data needs to be
transferred from one to another, which is why partly people worry
about thread/CPU affinity etc)

Basically, if you can manage it, you don't want to shift data between
CPUs, you want to partition the processing.

ie you may want to start caring about the size of messages and number
of messages going between processes. Sending small and few between
processes is going to be preferable to sending large and many for
throughput purposes.

In the case of small and few, the approach of pickling and sending
across OS pipes isn't such a bad idea. It works.

If you do want to share data between CPUs, and it sounds like you do,
then most OSs already provide a means of doing that - threads. The
conventions people use for using threads are where they become
unpicked, but as a mechanism, threads do generally work, and work
well.

As well as channels/boxes, you can use an STM approach, such as than
in Axon.STM ...
    * http://www.kamaelia.org/STM.html
    * http://code.google.com/p/kamaelia/source/browse/trunk/Code/Python/Bindings/STM/

...which is logically very similar to version control for variables. A
downside of STM (at least with this approach) however, is that for it
to work, you need either copy on write semantics for objects, or full
copying of objects or similar. Personally I use a biological metaphor
here, in that channels/boxes and components, and similar perform a
similar function to axons and neurons in the body, and that STM is
akin to the hormonal system for maintaining and controlling system
state. (I modelled biological tree growth many moons ago)

Anyhow, coming back to threads, that brings us back to python, and
implementations with a GIL, and those without.

For implementations with a GIL, you then have a choice: do I choose to
try and implement a memory model that _enforces_ data locality? that
is if a piece of data is in use inside a single "process" or "thread"
(from hereon I'll use "task" as a generic phrase) that trying to use
it inside another causes a problem for the task attempting to breach
the model.

In order to enforce this, I personally believe you'd need to use
multiple processes, and only share data through dedicated code
managing shared memory. You could of course do this outside user code.
To do this you'd need an abstraction that made sense, and something
like stackless' channels or kamaelia's (in/out) box model makes sense
there. (The CELL API uses a mailbox metaphor as well for reference)

In that case, you have a choice. You either copy the data into shared
memory, or you share the data in situ. The former gives you back
precisely the same overhead previously described, or the latter
fragments your memory (since you can no longer access it). You could
also have compaction.

However, personally, I think any possible benefits here are outweighed
by the costs and complexity.

The alternative is to _encourage_ data locality. That is encourage the
usage and sharing of data such that whilst you could share data
between tasks and cause corruption that the common way of using the
system discourages such actions. In essence that's what I try to do in
Kamaelia, and it seems to work. Specifically, the model says:

    * If I take a piece of data from an inbox, I own it and can do anything
      with it that I like. If you think of a physical piece of paper and
      I take it from an intray, then that really is the case.

    * If I put a piece of data in an outbox, I no longer own it and should
      not attempt to do anything more with it. Again, using a physical
      metaphor, and naming scheme helps here. In particular, if I put a
      piece of paper in the post, I can no longer modify it. How it gets
      to its recipient is not my concern either.

In practice this does actually work. If you add in immutable tuples,
and immutable strings then it becomes a lot clearer how this can work.

Is there a risk here of accidental modification? Yes. However, the
size and general simplicity of components tends to lead to such
problems being picked up early. It also enables component level
acceptance tests. (We tend to build small examples of usage, which in
turn effectively form acceptance tests)

[ An alternative is to make the "send" primitive make a copy on send.
That would be quite an overhead, and also limit the types of data you
can send. ]

In practical terms, it works. (Stackless proves this as well IMO,
since despite some differences, there's also lots of similarities)

The other question that arises, is "isn't the GIL a problem with
threads?". Well, the answer to that really depends on what you're
doing. David Beazely's talk on what happens on mixing different sorts
of threads shows that it isn't ideal, and if you're hitting that
behaviour, then actually switching to real processes makes sense.
However if you're doing CPU intensive work inside a C extension which
releases the GIL (eg numpy), then it's less of an issue in practice.
Custom extensions can do the same.

So, for example, picking something which I know colleagues [1] at work
do, you can use a DVS broadcast capture card to capture video frames,
pass those between threads which are doing processing on them, and
inside those threads use c extensions to process the data efficiently
(since image processing does take time...), and those release the GIL
boosting throughput.

   [1] On this project : http://www.bbc.co.uk/rd/projects/2009/10/i3dlive.shtml

So, that makes it all sound great - ie things can, after various
fashions, run in parallel on various versions of python, to practical
benefit. But obviously it could be improved.

Personally, I think the project most likely to make a difference here
is actually pypy. Now, talk is very cheap, and easy, and I'm not
likely to implement this, so I'll aim to be brief. Execution is hard.

In particular, what I think is most likely to be beneficial is
something _like_ this:

Assume pypy runs without a GIL. Then allow the creation of a green
process. A green process is implemented using threads, but with data
created on the heap such that it defaults to being marked private to
the thread (ie ala thread local storage, but perhaps implemented
slightly differently - via references from the thread local storage
into the heap) rather than shared. Sharing between green processes
(for channels or boxes) would "simply" be detagged as being owned by
one thread, and passed to another.

In particular this would mean that you need a mechanism for doing
this. Simply attempting to call another green process (or thread) from
another with mutable data types would be sufficient to raise the
equivalent of a segmentation fault.

Secondly, improve cpyext to the extent that each cpython extension
gets it's own version of the GIL. (ie each extension runs with its own
logical runtime, and thinks that it has its own GIL which it can lock
and release. In practice it's faked by the PyPy runtime. This is
essentially similar conceptually to creating green processes.

It's worth considering that the Linux kernel went through similar
changes, in that in the 2.0 days there was a large single big lock,
which was replaced by ever granular locks. I personally think that
since there are so many extensions that rely on the existence of the
GIL simply waving a wand to get rid of it isn't likely. However
logically providing a GIL per C-Extension may be plausible, and _may_
be sufficient.

However, I don't know - it might well not - I've not looked at the
code, and talk is cheap - execution is hard.

Hopefully the above (cheap :) comments are in some small way useful.

Regards,

Michael.

From cfbolz at gmx.de  Sat Jul 31 08:34:49 2010
From: cfbolz at gmx.de (Carl Friedrich Bolz)
Date: Sat, 31 Jul 2010 08:34:49 +0200
Subject: [pypy-dev] S3 2010 deadline extension
Message-ID: <4C53C409.1060101@gmx.de>

The S3 2010 Paper deadline was moved forward by two weeks, and is now 
August 13, 2010.

*** Workshop on Self-sustaining Systems (S3) 2010 ***

September 27-28, 2010
The University of Tokyo, Japan
http://www.hpi.uni-potsdam.de/swa/s3/s3-10/

In cooperation with ACM SIGPLAN

=== Call for papers ===

The Workshop on Self-sustaining Systems (S3) is a forum for discussion 
of topics relating to computer systems and languages that are able to 
bootstrap, implement, modify, and maintain themselves. One property of 
these systems is that their implementation is based on small but 
powerful abstractions; examples include (amongst others) 
Squeak/Smalltalk, COLA, Klein/Self, PyPy/Python, Rubinius/Ruby, and 
Lisp. Such systems are the engines of their own replacement, giving 
researchers and developers great power to experiment with, and explore 
future directions from within, their own small language kernels.

S3 will be take place September 27-28, 2010 at The University of Tokyo, 
Japan. It is an exciting opportunity for researchers and practitioners 
interested in self-sustaining systems to meet and share their knowledge, 
experience, and ideas for future research and development.

--- Submissions and proceedings ---

S3 invites submissions of high-quality papers reporting original 
research, or describing innovative contributions to, or experience with, 
self-sustaining systems, their implementation, and their application. 
Papers that depart significantly from established ideas and practices 
are particularly welcome.

Submissions must not have been published previously and must not be 
under review for any another refereed event or publication. The program 
committee will evaluate each contributed paper based on its relevance, 
significance, clarity, and originality. Revised papers will be published 
as post-proceedings in the ACM Digital Library.

Papers should be submitted electronically via EasyChair at 
http://www.easychair.org/conferences/?conf=s32010 in PDF format. 
Submissions must be written in English (the official language of the 
workshop) and must not exceed 10 pages. They should use the ACM SIGPLAN 
10 point format, templates for which are available at 
http://www.acm.org/sigs/sigplan/authorInformation.htm.

--- Venue ---

The University of Tokyo, Komaba Campus, Japan

--- Important dates ---

Submission of papers: *EXTENDED* August 13, 2010
Author notification: August 27, 2010
Early registration: September 3, 2010
Revised papers: September 10, 2010
S3 workshop: September 27-28, 2010
Final papers for ACM-DL post-proceedings: October 15, 2010

--- Invited talks ---

Yukihiro Matsumoto: "From Lisp to Ruby to Rubinius"
Takashi Ikegami: "Sustainable Autonomy and Designing Mind Time"

--- Chairs ---

Robert Hirschfeld (Hasso-Plattner-Institut Potsdam, Germany)
hirschfeld at hpi.uni-potsdam.de
Hidehiko Masuhara (The University of Tokyo, Japan)
masuhara at graco.c.u-tokyo.ac.jp
Kim Rose (Viewpoints Research Institute, USA)
kim.rose at vpri.org

--- Program committee ---

Carl Friedrich Bolz, University of Duesseldorf, Germany
Johan Brichau, Universite Catholique de Louvain, Belgium
Shigeru Chiba, Tokyo Institute of Technology, Japan
Brian Demsky, University of California, Irvine, USA
Marcus Denker, INRIA Lille, France
Richard P. Gabriel, IBM Research, USA
Michael Haupt, Hasso-Plattner-Institut, Germany
Robert Hirschfeld, Hasso-Plattner-Institut, Germany (co-chair)
Atsushi Igarashi, University of Kyoto, Japan
David Lorenz, The Open University, Israel
Hidehiko Masuhara, University of Tokyo, Japan (co-chair)
Eliot Miranda, Teleplace, USA
Ian Piumarta, Viewpoints Research Institute, USA
Martin Rinard, MIT, USA
Antero Taivalsaari, Nokia, Finland
David Ungar, IBM, USA

_______________________________________________
fonc mailing list
fonc at vpri.org
http://vpri.org/mailman/listinfo/fonc

From andrewfr_ice at yahoo.com  Sat Jul 31 12:00:49 2010
From: andrewfr_ice at yahoo.com (Andrew Francis)
Date: Sat, 31 Jul 2010 03:00:49 -0700 (PDT)
Subject: [pypy-dev] Would the following shared memory model be possible?
In-Reply-To: 
Message-ID: <597371.6380.qm@web120001.mail.ne1.yahoo.com>

Hi JP:

Message: 1
Date: Thu, 29 Jul 2010 21:24:58 -0000
From: exarkun at twistedmatrix.com
Subject: Re: [pypy-dev] Would the following shared memory model be
    possible?
To: pypy-dev at codespeak.net
Message-ID:
    <20100729212458.2188.24074246.divmod.xquotient.34 at localhost.localdomain>

Content-Type: text/plain; charset="utf-8"; format="flowed"

On 08:39 pm, andrewfr_ice at yahoo.com wrote:
>
>Okay. I attended Ray's Hettinger's talk on Monocle. In the past
>I have encountered situations where I bumped up with the nesting
>problem.
>If I recall, the problem involved request handlers that had a RPC style
>AND made additional Twisted deferred calls:
>
>class MyRequestHandler(...):
>
>    @defer.inlineCallbacks
>    def process(self):
>        try:
>            result = yield
>client.getPage("http://www.google.com")
>        except Exception, err:
>            log.err(err, "process getPage call
>failed")
>        else:
>            # do some processing with the result
>            return result
>
>looks reasonable but Python will balk.

JP>Aside from the "return result" (should be defer.returnValue(result),
JP>generators can't return with a value), this looks fine to me too.  Why
JP>do you say Python will balk?

Well the return with a value was the deal breaker. I used this example because this is where I came face-to-face with nested generators - and generated a mistrust for them in regard to exotic uses. There was something else about the real example (I am having a hard time finding the posts - somewhere in 2007) - I think it was a very early version of PyAMF and it really wanted a return (HTTP is okay). I believe under the hood, if the protocol returns a deferred or None, the reactor will expect further output in the future.

Cheers,
Andrew

Cheers,
Andrew

From sparks.m at gmail.com  Sat Jul 31 19:43:32 2010
From: sparks.m at gmail.com (Michael Sparks)
Date: Sat, 31 Jul 2010 18:43:32 +0100
Subject: [pypy-dev] FW: Would the following shared memory model be
	possible?
In-Reply-To: 
References: 
	<20100727062702.GE12699@tunixman.com>

Message-ID: 

[ cc'ing the list in case anyone else took my words the same way as Kevin :-( ]

On Sat, Jul 31, 2010 at 5:26 PM, Kevin Ar18  wrote:
> I have no idea what I did you warrant you hateful replies towards me, but
> they really are not appropriate (in public or private email).

I had absolutely no intention of offending you, and am deeply sorry
for any offense that I may have caused you.

In my reply I merely wanted to flag that I don't have time to go into
everything (like most people), that asking questions in a public realm
is better because you may then get answers from multiple people, and
that people who appear to do some research first tend to get better
answers. I also tried to give an example, but that doesn't appear to
have been helpful. (I'm fallible like everyone else)

My intention there was to be helpful and to explain why I have that
view of only replying on list, and it appears to have offended you
instead, and I apologise. (one person's direct and helpful speech in
one place can be a mortal insult somewhere else)

After those couple of paragraphs, I tried to add to your discussion by
replying to your specific points which you asked about parallel
execution, noting places and examples where it is possible today. (to
varying degrees of satisfaction) I then also tried to answer your
point of "if something extra could be done, what would probably be
generally useful". To that I noted that *my* talk there was cheap, and
that execution was hard.

Somehow along the way, my intent to try to be helpful to you has
resulted in offending and upsetting you, and for that I am truly sorry
- life is simply too short for people to upset each other, and in no
way was my post intended as "hateful", and once again, my apologies.
In future please assume good intentions - I assumed good intentions on
your part.

I'll bow out at this point.

Best Regards,

Michael.

>
>> Date: Sat, 31 Jul 2010 02:08:49 +0100
>> Subject: Re: [pypy-dev] FW: Would the following shared memory model be
>> possible?
>> From: sparks.m at gmail.com
>> To: kevinar18 at hotmail.com
>> CC: pypy-dev at codespeak.net
>>
>> On Thu, Jul 29, 2010 at 6:44 PM, Kevin Ar18  wrote:
>> > You brought up a lot of topics. I went ahead and sent you a private
>> > email.
>> > There's always lots of interesting things I can add to my list of things
>> > to
>> > learn about. :)
>>
>> Yes, there are lots of interesting things. I have a limited amount of
>> time however (I should be in bed, it's very late here, but I do /try/
>> to reply to on-list mails), so cannot spood feed you. Mailing me
>> directly rather than a (relevant) list precludes you getting answers
>> from someone other than me. Not being on lists also precludes you
>> getting answers to questions by chance. Changing emails and names in
>> email headers also makes keeping track of people hard...
>>
>> (For example you asked off list last year about Kamaelia's license
>> from a different email address. Since it wasn't searchable I
>> completely forgot. You also asked all sorts of questions but didn't
>> want the answers public, so I didn't reply. If instead you'd
>> subscribed to the list, and asked there, you'd've found out that
>> Kamaelia's license changed - to the Apache Software License v2 ...)
>>
>> If I mention something you find interesting, please Google first and
>> then ask publicly somewhere relevant. (the answer and question are
>> then googleable, and you're doing the community a service IMO if you
>> ask q's that way - if you're question is somewhere relevant and shows
>> you've already googled prior work as far as you can... People are
>> time however (I should be in bed, it's very late here, but I do /try/
>> to reply to on-list mails), so cannot spood feed you. Mailing me
>> directly rather than a (relevant) list precludes you getting answers
>> from someone other than me. Not being on lists also precludes you
>> getting answers to questions by chance. Changing emails and names in
>> email headers also makes keeping track of people hard...
>>
>> (For example you asked off list last year about Kamaelia's license
>> from a different email address. Since it wasn't searchable I
>> completely forgot. You also asked all sorts of questions but didn't
>> want the answers public, so I didn't reply. If instead you'd
>> subscribed to the list, and asked there, you'd've found out that
>> Kamaelia's license changed - to the Apache Software License v2 ...)
>>
>> always willing to help people who show willing to help themselves in
>> my experience.)
>>
>> >> just looks to me that you're tieing yourself up in knots over things
>> >> that aren't problems, when there are some things which could be useful
>> >> (in practice) & interesting in this space.
>> > The particular issue in this situation is that there is no way to make
>> > Kamaelia, FBP, or other concurrency concepts run in parallel (unless you
>> > are
>> > willing to accept lots of overhead like with the multiprocessing
>> > queues).
>> >
>> > Since you have worked with Kamaelia code a lot... you understand a lot
>> > more
>> > about implementation details. Do you think the previous shared memory
>> > concept or something like it would let you make Kamaelia parallel?
>> > If not, can you think of any method that would let you make Kamaelia
>> > parallel?
>>
>> Kamaelia already CAN run components in parallel in different processes
>> (has been able to do so for quite some time) or on different
>> processors. Indeed, all you do is use a ProcessPipeline or
>> ProcessGraphline rather than Pipeline or Graphline, and the components
>> in the top level are spread across processes. I still view the code as
>> experimental, but it does work, and when needed is very useful.
>>
>> Kamaelia running on Iron Python can run on seperate processors sharing
>> data efficiently (due to lack of GIL there) happily too. Threaded
>> components there do that naturally - I don't use IronPython, but it
>> does run on Iron Python. On windows this is easiest, though Mono works
>> just as well.
>>
>> I believe Jython also is GIL free, and Kamaelia's Axon runs there
>> cleanly too. As a result because Kamaelia is pure python, it runs
>> truly in parallel there too (based on hearing from people using
>> kamaelia on jython). Cpython is the exception (and a rather big one at
>> that). (Pypy has a choice IIUC)
>>
>> Personally, I think if PyPy worked with generators better (which is
>> why I keep an eye on PyPy) and cpyext was improved, it'd provide a
>> really compelling platform for me. (I was rather gutted at Europython
>> to hear that PyPy's generator support was still ... problematic)
>>
>> Regarding the *efficiency* and *enforcement* of the approach taken, I
>> feel you're chasing the wrong tree, but let's go there.
>>
>> What approach does baseline (non-Iron Python running) kamaelia take
>> for multi-process work?
>>
>> For historical reasons, it builds on top of pprocess rather than
>> multiprocessing module based. This means for interprocess
>> communications objects are pickled before being sent over operating
>> system pipes.
>>
>> This provides an obvious communications overhead - and this isn't
>> really kamaelia specific at this point.
>>
>> However, shifting data from one CPU to another is expensive, and only
>> worth doing in some circumstances. (Consider a machine with several
>> physical CPUs - each has a local CPU cache, and the data needs to be
>> transferred from one to another, which is why partly people worry
>> about thread/CPU affinity etc)
>>
>> Basically, if you can manage it, you don't want to shift data between
>> CPUs, you want to partition the processing.
>>
>> ie you may want to start caring about the size of messages and number
>> of messages going between processes. Sending small and few between
>> processes is going to be preferable to sending large and many for
>> throughput purposes.
>>
>> In the case of small and few, the approach of pickling and sending
>> across OS pipes isn't such a bad idea. It works.
>>
>> If you do want to share data between CPUs, and it sounds like you do,
>> then most OSs already provide a means of doing that - threads. The
>> conventions people use for using threads are where they become
>> unpicked, but as a mechanism, threads do generally work, and work
>> well.
>>
>> As well as channels/boxes, you can use an STM approach, such as than
>> in Axon.STM ...
>> * http://www.kamaelia.org/STM.html
>> *
>> http://code.google.com/p/kamaelia/source/browse/trunk/Code/Python/Bindings/STM/
>>
>> ...which is logically very similar to version control for variables. A
>> downside of STM (at least with this approach) however, is that for it
>> to work, you need either copy on write semantics for objects, or full
>> copying of objects or similar. Personally I use a biological metaphor
>> here, in that channels/boxes and components, and similar perform a
>> similar function to axons and neurons in the body, and that STM is
>> akin to the hormonal system for maintaining and controlling system
>> state. (I modelled biological tree growth many moons ago)
>>
>> Anyhow, coming back to threads, that brings us back to python, and
>> implementations with a GIL, and those without.
>>
>> For implementations with a GIL, you then have a choice: do I choose to
>> try and implement a memory model that _enforces_ data locality? that
>> is if a piece of data is in use inside a single "process" or "thread"
>> (from hereon I'll use "task" as a generic phrase) that trying to use
>> it inside another causes a problem for the task attempting to breach
>> the model.
>>
>> In order to enforce this, I personally believe you'd need to use
>> multiple processes, and only share data through dedicated code
>> managing shared memory. You could of course do this outside user code.
>> To do this you'd need an abstraction that made sense, and something
>> like stackless' channels or kamaelia's (in/out) box model makes sense
>> there. (The CELL API uses a mailbox metaphor as well for reference)
>>
>> In that case, you have a choice. You either copy the data into shared
>> memory, or you share the data in situ. The former gives you back
>> precisely the same overhead previously described, or the latter
>> fragments your memory (since you can no longer access it). You could
>> also have compaction.
>>
>> However, personally, I think any possible benefits here are outweighed
>> by the costs and complexity.
>>
>> The alternative is to _encourage_ data locality. That is encourage the
>> usage and sharing of data such that whilst you could share data
>> between tasks and cause corruption that the common way of using the
>> system discourages such actions. In essence that's what I try to do in
>> Kamaelia, and it seems to work. Specifically, the model says:
>>
>> * If I take a piece of data from an inbox, I own it and can do anything
>> with it that I like. If you think of a physical piece of paper and
>> I take it from an intray, then that really is the case.
>>
>> * If I put a piece of data in an outbox, I no longer own it and should
>> not attempt to do anything more with it. Again, using a physical
>> metaphor, and naming scheme helps here. In particular, if I put a
>> piece of paper in the post, I can no longer modify it. How it gets
>> to its recipient is not my concern either.
>>
>> In practice this does actually work. If you add in immutable tuples,
>> and immutable strings then it becomes a lot clearer how this can work.
>>
>> Is there a risk here of accidental modification? Yes. However, the
>> size and general simplicity of components tends to lead to such
>> problems being picked up early. It also enables component level
>> acceptance tests. (We tend to build small examples of usage, which in
>> turn effectively form acceptance tests)
>>
>> [ An alternative is to make the "send" primitive make a copy on send.
>> That would be quite an overhead, and also limit the types of data you
>> can send. ]
>>
>> In practical terms, it works. (Stackless proves this as well IMO,
>> since despite some differences, there's also lots of similarities)
>>
>> The other question that arises, is "isn't the GIL a problem with
>> threads?". Well, the answer to that really depends on what you're
>> doing. David Beazely's talk on what happens on mixing different sorts
>> of threads shows that it isn't ideal, and if you're hitting that
>> behaviour, then actually switching to real processes makes sense.
>> However if you're doing CPU intensive work inside a C extension which
>> releases the GIL (eg numpy), then it's less of an issue in practice.
>> Custom extensions can do the same.
>>
>> So, for example, picking something which I know colleagues [1] at work
>> do, you can use a DVS broadcast capture card to capture video frames,
>> pass those between threads which are doing processing on them, and
>> inside those threads use c extensions to process the data efficiently
>> (since image processing does take time...), and those release the GIL
>> boosting throughput.
>>
>> [1] On this project :
>> http://www.bbc.co.uk/rd/projects/2009/10/i3dlive.shtml
>>
>> So, that makes it all sound great - ie things can, after various
>> fashions, run in parallel on various versions of python, to practical
>> benefit. But obviously it could be improved.
>>
>> Personally, I think the project most likely to make a difference here
>> is actually pypy. Now, talk is very cheap, and easy, and I'm not
>> likely to implement this, so I'll aim to be brief. Execution is hard.
>>
>> In particular, what I think is most likely to be beneficial is
>> something _like_ this:
>>
>> Assume pypy runs without a GIL. Then allow the creation of a green
>> process. A green process is implemented using threads, but with data
>> created on the heap such that it defaults to being marked private to
>> the thread (ie ala thread local storage, but perhaps implemented
>> slightly differently - via references from the thread local storage
>> into the heap) rather than shared. Sharing between green processes
>> (for channels or boxes) would "simply" be detagged as being owned by
>> one thread, and passed to another.
>>
>> In particular this would mean that you need a mechanism for doing
>> this. Simply attempting to call another green process (or thread) from
>> another with mutable data types would be sufficient to raise the
>> equivalent of a segmentation fault.
>>
>> Secondly, improve cpyext to the extent that each cpython extension
>> gets it's own version of the GIL. (ie each extension runs with its own
>> logical runtime, and thinks that it has its own GIL which it can lock
>> and release. In practice it's faked by the PyPy runtime. This is
>> essentially similar conceptually to creating green processes.
>>
>> It's worth considering that the Linux kernel went through similar
>> changes, in that in the 2.0 days there was a large single big lock,
>> which was replaced by ever granular locks. I personally think that
>> since there are so many extensions that rely on the existence of the
>> GIL simply waving a wand to get rid of it isn't likely. However
>> logically providing a GIL per C-Extension may be plausible, and _may_
>> be sufficient.
>>
>> However, I don't know - it might well not - I've not looked at the
>> code, and talk is cheap - execution is hard.
>>
>> Hopefully the above (cheap :) comments are in some small way useful.
>>
>> Regards,
>>
>>
>> Michael.
>