From muzgash.lists at gmail.com Fri Jul 2 09:19:25 2010
From: muzgash.lists at gmail.com (Gerardo Gutierrez)
Date: Fri, 2 Jul 2010 08:19:25 -0500
Subject: [IPython-dev] IPythonQt/ZMQ raw_input support issue.
Message-ID:

Hi everyone.

I've been trying to implement, from the examples in pyzmq, support for raw_input calls, to be used in the IPythonZMQ and IPythonQt projects. Nothing good has come from these attempts but a better understanding of what I don't understand. Here's a guide through my last thoughts so you can help me.

First I wanted to add a new type of request to this line (154) in the kernel:

    for msg_type in ['execute_request', 'complete_request', 'raw_input_request']:
        self.handlers[msg_type] = getattr(self, msg_type)

I also need to add a function:

    def raw_input_request(self, ident, parent):
        print >>sys.__stdout__, "entered"

just to check whether the message thread gets there (which it doesn't). The class RawInput is as it always was, and obviously raw_input is overwritten in main():

    rawinput = RawInput(session, pub_socket)
    __builtin__.raw_input = rawinput

Now for the frontend part, I also need a msg_type:

    for msg_type in ['pyin', 'pyout', 'pyerr', 'stream', 'raw_input']:
        self.handlers[msg_type] = getattr(self, 'handle_%s' % msg_type)

and a handler for raw_input messages:

    def handle_raw_input(self, omsg):
        stdin_msg = sys.stdin.readline()
        src = stdin_msg
        self.session.send(self.request_socket, 'raw_input_request', dict(code=src))

As you can see, this just sends the raw_input request with the line written by the user. The error is this:

    : Operation not supported
    Traceback (most recent call last):
      File "./kernel.py", line 194, in execute_request
        exec comp_code in self.user_ns, self.user_ns
      File "", line 1, in
      File "./kernel.py", line 126, in __call__
        reply = self.socket.recv_json()
      File "_zmq.pyx", line 906, in zmq._zmq.Socket.recv_json (zmq/_zmq.c:6862)
      File "_zmq.pyx", line 751, in zmq._zmq.Socket.recv (zmq/_zmq.c:5316)
      File "_zmq.pyx", line 781, in zmq._zmq.Socket._recv_copy (zmq/_zmq.c:5690)
    ZMQError: Operation not supported

It says that the error is in "exec comp_code in self.user_ns, self.user_ns", which is in the execute_request function:

    {u'content': {u'code': u'raw_input()'}, u'header': {u'username': u'muzgash', u'msg_id': 0, u'session': u'264d21d4-7e00-4b7e-b051-3c0ba7b221f6'}, u'msg_type': u'execute_request', u'parent_header': {}}
    {'content': {u'status': u'error', u'etype': u"", u'evalue': u'Operation n..........

so I can say that the error is in the first message sent by the frontend to the kernel. But the raw_input function is called (RawInput.__call__()), and a message is also sent to the frontend through:

    msg = self.session.msg(u'raw_input')
    self.socket.send_json(msg)

and the function handle_raw_input is called, which sends a new message to the kernel:

    {u'content': {u'code': u'input-->\n'}, u'header': {u'username': u'muzgash', u'msg_id': 1, u'session': u'264d21d4-7e00-4b7e-b051-3c0ba7b221f6'}, u'msg_type': u'raw_input_request', u'parent_header': {}}

so this line in the class RawInput should receive it:

    while True:
        try:
            reply = self.socket.recv_json(zmq.NOBLOCK)

But it doesn't. That's one thing. Another is that for this to work well in the Qt frontend I need to fix pyout (keeping, of course, multiline input), and I have no clue how to do that. I could write a pretty crude fix with Qt and without requesting the kernel twice, but no _NN call will work and it will have to be rewritten when these problems are solved.
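A note on the traceback above: 0MQ reports "Operation not supported" (ENOTSUP) when a receive operation is attempted on a socket type that cannot receive, and a PUB socket is exactly that. Since RawInput here is constructed with the kernel's pub_socket, its recv_json() call can never succeed, no matter what the frontend sends back. Below is a minimal, self-contained sketch of the usual alternative, a dedicated request/reply pair for stdin; the port and the message fields ('raw_input_request', 'raw_input_reply', 'line') are invented for illustration and are not the project's actual wire format:

    import zmq

    ctx = zmq.Context()

    # Frontend side: a REP socket that answers stdin requests.
    frontend_stdin = ctx.socket(zmq.REP)
    frontend_stdin.bind('tcp://127.0.0.1:5557')

    # Kernel side: a REQ socket; unlike PUB, it is allowed to recv after a send.
    kernel_stdin = ctx.socket(zmq.REQ)
    kernel_stdin.connect('tcp://127.0.0.1:5557')

    # The kernel asks for a line and will block until the frontend replies.
    kernel_stdin.send_json({'msg_type': 'raw_input_request', 'prompt': ''})

    request = frontend_stdin.recv_json()     # frontend sees the request...
    frontend_stdin.send_json({'msg_type': 'raw_input_reply',
                              'line': 'what the user typed'})

    print kernel_stdin.recv_json()['line']   # ...and the kernel gets the line back

In a real kernel the two sockets would live in different processes and the reply would come from the frontend's own stdin, but the blocking round trip is the same.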
So I think for now I'll move on to the next point in the schedule 'till some ideas pop up. Thanks in advance.

Best regards.
--
Gerardo Gutiérrez Gutiérrez
Physics student, Universidad de Antioquia
Computational physics and astrophysics group (FACom)
Computational science and development branch (FACom-dev)
Usuario Linux #492295
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From ben.v.root at gmail.com Fri Jul 2 11:41:14 2010
From: ben.v.root at gmail.com (Benjamin Root)
Date: Fri, 2 Jul 2010 10:41:14 -0500
Subject: [IPython-dev] cProfile and iPython
Message-ID:

Hello,

I have found an odd bug when I used cProfile in an iPython shell. It seems to not load the same environment as the shell. The following is a very simple example:

    import cProfile
    import math

    x = 25
    cProfile.run("y = math.sqrt(x)")

This throws an exception "NameError: name 'math' is not defined". Similar problems occur if I define a function "foo" and call that in run. I should also note that using "run -p" works just fine for a cProfile-less version of the above script.

I am using a stock install of python 2.6.5 and ipython 0.10 on Ubuntu 10.04.

Ben Root
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From fperez.net at gmail.com Fri Jul 2 12:09:25 2010
From: fperez.net at gmail.com (Fernando Perez)
Date: Fri, 2 Jul 2010 11:09:25 -0500
Subject: [IPython-dev] cProfile and iPython
In-Reply-To:
References:
Message-ID:

Hi Benjamin,

On Fri, Jul 2, 2010 at 10:41 AM, Benjamin Root wrote:
> I have found an odd bug when I used cProfile in an iPython shell. It seems
> to not load the same environment as the shell. The following is a very
> simple example

Thanks for the report! I just made a ticket for it:

http://github.com/ipython/ipython/issues/issue/131

Your example completely reproduces the problem, many thanks. I hope we can get a fix for it soon, though if you find any solutions by all means send them our way.

Cheers, f

From fperez.net at gmail.com Sun Jul 4 01:17:04 2010
From: fperez.net at gmail.com (Fernando Perez)
Date: Sat, 3 Jul 2010 22:17:04 -0700
Subject: [IPython-dev] IPythonQt/ZMQ raw_input support issue.
In-Reply-To:
References:
Message-ID:

Hi all,

On Fri, Jul 2, 2010 at 6:19 AM, Gerardo Gutierrez wrote:
>
> I've been trying to implement, from the examples in pyzmq, support for
> raw_input calls, to be used in the IPythonZMQ and IPythonQt projects.
> Nothing good has come from these attempts but a better understanding of what I
> don't understand. Here's a guide through my last thoughts so you can help
> me.

Just to let you know that I phoned Gerardo today and we went over the details of this, so the (otherwise fairly urgent) question is answered for now. I'll be offline for 3 days, and will then have some more time to get back on track with the list (the last few days were ipython-intensive, but at the scipy sprints; I'm trying to write up a summary report of our activities there before I crash tonight).

Cheers, f

From fperez.net at gmail.com Sun Jul 4 01:36:47 2010
From: fperez.net at gmail.com (Fernando Perez)
Date: Sat, 3 Jul 2010 22:36:47 -0700
Subject: [IPython-dev] ipython + pydev
In-Reply-To:
References:
Message-ID:

Hey Satra,

On Tue, Jun 29, 2010 at 10:18 PM, Satrajit Ghosh wrote:
> hi,
>
> just wanted to check if anybody knows a way of getting an ipython console to
> work with pydev.

No clue, sorry. I don't use eclipse so I have no idea...

Cheers, f
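Returning to Benjamin's cProfile report above: cProfile.run() execs its statement in the __main__ module's namespace, which in IPython 0.10 is not the same dictionary as the interactive user namespace, so names defined at the prompt (math, x) are invisible to it, as the NameError shows. Until the ticket is fixed, one workaround (a sketch, assuming the namespace mismatch really is the whole story) is to pass the namespaces explicitly with cProfile.runctx():

    import cProfile
    import math

    x = 25
    # cProfile.run() would exec this in __main__.__dict__ and raise NameError;
    # runctx() lets us hand it the interactive globals()/locals() instead.
    cProfile.runctx("y = math.sqrt(x)", globals(), locals())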
From fperez.net at gmail.com Sun Jul 4 13:14:32 2010
From: fperez.net at gmail.com (Fernando Perez)
Date: Sun, 4 Jul 2010 10:14:32 -0700
Subject: [IPython-dev] IPython sprint summary (not)
Message-ID:

Hi all,

our sprinting work at scipy turned out to be a lot bigger and more productive than I had imagined. I wanted to write up a good summary of the design changes and contributions but ran out of time last night, and I'm headed out offline for 3 days now. If anyone who was present could write something up, it would be fantastic and would help those following trunk and the recent commit activity understand what's going on. Otherwise I'll do it around Thursday when I'm back.

Cheers, f

From dwf at cs.toronto.edu Thu Jul 8 19:52:24 2010
From: dwf at cs.toronto.edu (David Warde-Farley)
Date: Thu, 8 Jul 2010 19:52:24 -0400
Subject: [IPython-dev] debugger.py refactoring
Message-ID:

Hey folks,

I was just wondering (I didn't see a roadmap anywhere, but then again I didn't look very hard) if a refactoring was planned for IPython/core/debugger.py, in particular to make it more extensible to third party tools. I just hacked in support for Andreas Kloeckner's pudb (http://pypi.python.org/pypi/pudb) but it wasn't pretty in the least.

I guess some sort of 'debugger registry' would make sense, one that a user could call into from their ipy_user_conf.py in order to hook up their favourite debugger's post-mortem mode? This is all just fanciful thinking aloud, but if no one's planning on doing anything to debugger.py in the near future I might give it a try when I get back into town next week.

David

From dwf at cs.toronto.edu Thu Jul 8 20:27:04 2010
From: dwf at cs.toronto.edu (David Warde-Farley)
Date: Thu, 8 Jul 2010 20:27:04 -0400
Subject: [IPython-dev] debugger.py refactoring
In-Reply-To:
References:
Message-ID: <65810004-4086-409F-9782-8B8C071F0882@cs.toronto.edu>

On 2010-07-08, at 7:52 PM, David Warde-Farley wrote:

> Hey folks,
>
> I was just wondering (I didn't see a roadmap anywhere but then again I didn't look very hard) if a refactoring was planned for IPython/core/debugger.py, in particular to make it more extensible to third party tools.

Oops, debugger.py certainly isn't the place for this, nor is it where I put it; it went in iplib.py. For the interested, here's my monkey-patch job:

    # use pydb if available
    if Debugger.has_pydb:
        from pydb import pm
    else:
        # try and use pudb
        try:
            import pudb
            pudb.post_mortem((sys.last_type, sys.last_value,
                              sys.last_traceback))
            return
        except ImportError:
            pass
        # fallback to our internal debugger
        pm = lambda : self.InteractiveTB.debugger(force=True)
    self.history_saving_wrapper(pm)()

David

From benjaminrk at gmail.com Fri Jul 9 18:35:27 2010
From: benjaminrk at gmail.com (MinRK)
Date: Fri, 9 Jul 2010 15:35:27 -0700
Subject: [IPython-dev] Heartbeat Device
Message-ID:

Brian,

Have you worked on the Heartbeat Device? Does that need to go in 0MQ itself, or can it be part of pyzmq? I'm trying to work out how to really tell that an engine is down. Is the heartbeat to be in a separate process? Are we guaranteed that a zmq thread is responsive no matter what an engine process is doing? If that's the case, is a moderate timeout on recv adequate to determine engine failure? If zmq threads are guaranteed to be responsive, it seems like a simple pair socket might be good enough, rather than needing a new device. Or even through the registration XREP socket.

Can we formalize exactly what the heartbeat needs to be?

-MinRK
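A sketch of the kind of minimal "is_alive" heartbeat being asked about here, close in spirit to the FORWARDER-device example that comes up later in this thread: the engine side is one of 0MQ's built-in devices, so once started it keeps echoing pings inside libzmq even while the engine's Python thread is busy with user code. The socket types, ports, and identity below are illustrative assumptions, not the actual pyzmq examples/heartbeat code, and they assume a pyzmq build that exposes the built-in devices through zmq.device():

    import threading
    import time
    import zmq

    def engine_heart(engine_id, ping_url, pong_url):
        # Runs on the engine. The FORWARDER device blocks inside libzmq,
        # so it keeps answering even when the interpreter is busy.
        ctx = zmq.Context()
        insock = ctx.socket(zmq.SUB)
        insock.setsockopt(zmq.SUBSCRIBE, '')          # accept every ping
        insock.connect(ping_url)
        outsock = ctx.socket(zmq.XREQ)
        outsock.setsockopt(zmq.IDENTITY, engine_id)   # tells the controller who answered
        outsock.connect(pong_url)
        zmq.device(zmq.FORWARDER, insock, outsock)

    # Controller side: broadcast pings on a PUB socket, collect answers on an XREP.
    ctx = zmq.Context()
    ping = ctx.socket(zmq.PUB)
    ping.bind('tcp://127.0.0.1:5555')
    pong = ctx.socket(zmq.XREP)
    pong.bind('tcp://127.0.0.1:5556')

    # A daemon thread stands in for the engine process in this demo.
    heart = threading.Thread(target=engine_heart,
                             args=('engine-0', 'tcp://127.0.0.1:5555',
                                   'tcp://127.0.0.1:5556'))
    heart.setDaemon(True)
    heart.start()
    time.sleep(1)                 # let the SUB socket connect and subscribe

    ping.send('beat-1')
    ident, beat = pong.recv_multipart()   # one reply per live engine
    print 'heard %s from %s' % (beat, ident)

An engine that misses some number of consecutive beats would then be declared dead; that policy lives entirely on the controller side.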
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From songofacandy at gmail.com Sat Jul 10 05:18:16 2010
From: songofacandy at gmail.com (INADA Naoki)
Date: Sat, 10 Jul 2010 18:18:16 +0900
Subject: [IPython-dev] Porting to Python3
Message-ID:

Hi, all.

Today a Python hack-a-thon is being held in Japan, and I've ported IPython to Python 3 there. Some features work now.

http://github.com/methane/ipython

--
INADA Naoki

From ellisonbg at gmail.com Mon Jul 12 12:15:01 2010
From: ellisonbg at gmail.com (Brian Granger)
Date: Mon, 12 Jul 2010 09:15:01 -0700
Subject: [IPython-dev] Heartbeat Device
In-Reply-To:
References:
Message-ID:

On Fri, Jul 9, 2010 at 3:35 PM, MinRK wrote:
> Brian,
> Have you worked on the Heartbeat Device? Does that need to go in 0MQ itself,

I have not. Ideally it could go into 0MQ itself. But, in principle, we could do it in pyzmq. We just have to write a nogil pure C function that uses the low-level C API to do the heartbeat. Then we can just run that function in a thread with a "with nogil" block.
> Shouldn't be too bad, given how simple the heartbeat logic is. The > main thing we will have to think about is how to start/stop the > heartbeat in a clean way. > > > or can it be part of pyzmq? > > I'm trying to work out how to really tell that an engine is down. > > Is the heartbeat to be in a separate process? > > No, just a separate C/C++ thread that doesn't hold the GIL. > > > Are we guaranteed that a zmq thread is responsive no matter what an > engine > > process is doing? If that's the case, is a moderate timeout on recv > adequate > > to determine engine failure? > > Yes, I think we can assume this. The only thing that would take the > 0mq thread down is something semi-fatal like a signal that doesn't get > handled. But as long as the 0MQ thread doesn't have any bugs, it > should simply keep running no matter what the other thread does (OK, > other than segfaulting) > > > If zmq threads are guaranteed to be responsive, it seems like a simple > pair > > socket might be good enough, rather than needing a new device. Or even > > through the registration XREP socket. > > That (registration XREP socket) won't work unless we want to write all > that logic in C. > I don't know about a PAIR socket because of the need for multiple clients? > I wasn't thinking of a single PAIR socket, but rather a pair for each engine. We already have a pair for each engine for the queue, but I am not quite seeing the need for a special device beyond a PAIR socket in the heartbeat. > > > Can we formalize exactly what the heartbeat needs to be? > > OK, let's think. The engine needs to connect, the controller bind. > It would be nice if the controller didn't need a separate heartbeat > socket for each engine, but I guess we need the ability to track which > specific engine is heartbeating. Also, there is the question of to > do want to do a reqest/reply or pub/sub style heartbeat. What do you > think? > The way we talked about it, the heartbeat needs to issue commands both ways. While it is used for checking whether an engine remains alive, it is also the avenue for aborting jobs. If we do have a strict heartbeat, then I think PUB/SUB is a good choice. However, if heartbeat is all it does, then we need a _third_ connection to each engine for control commands. Since messages cannot jump the queue, the engine queue PAIR socket cannot be used for commands, and a PUB/SUB model for heartbeat can _either_ receive commands _or_ have results. control commands: beat (check alive) abort (remove a task from the queue) signal (SIGINT, etc.) exit (engine.kill) reset (clear queue, namespace) more? It's possible that we could implement these with a PUB on the controller and a SUB on each engine, only interpreting results received via the queue's PAIR socket. But then every command would be sent to every engine, even though many would only be meant for one (too inefficient/costly?). It would however make the actual heartbeat command very simple as a single send. It does not allow for the engine to initiate queries of the controller, for instance a work stealing implementation. Again, it is possible that this could be implemented via the job queue PAIR socket, but that would only allow for stealing when completely starved for work, since the job queue and communication queue would be the same. There's also the issue of task dependency. 
If we are to implement dependency checking as we discussed (depend on taskIDs, and only execute once the task has been completed), the engine needs to be able to query the controller about the tasks depended upon. This makes the controller being the PUB side unworkable. This says to me that we need two-way connections between the engines and the controller. That can either be implemented as multiple connections (PUB/SUB + PAIR or REQ/REP), or simply a PAIR socket for each engine could provide the whole heartbeat/command channel. -MinRK > > Brian > > > > -MinRK > > > > -- > Brian E. Granger, Ph.D. > Assistant Professor of Physics > Cal Poly State University, San Luis Obispo > bgranger at calpoly.edu > ellisonbg at gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjaminrk at gmail.com Mon Jul 12 19:10:55 2010 From: benjaminrk at gmail.com (MinRK) Date: Mon, 12 Jul 2010 16:10:55 -0700 Subject: [IPython-dev] Heartbeat Device In-Reply-To: References: Message-ID: I've been thinking about this, and it seems like we can't have a responsive rich control connection unless it is in another process, like the old IPython daemon. Pure heartbeat is easy with a C device, and we may not even need a new one. For instance, I added support for the builtin devices of zeromq to pyzmq with a few lines, and you can have simple is_alive style heartbeat with a FORWARDER device. I pushed a basic example of this (examples/heartbeat) to my pyzmq fork. Running a ~3 second numpy.dot action, the heartbeat pings remain responsive at <1ms. -MinRK On Mon, Jul 12, 2010 at 12:51, MinRK wrote: > > > On Mon, Jul 12, 2010 at 09:15, Brian Granger wrote: > >> On Fri, Jul 9, 2010 at 3:35 PM, MinRK wrote: >> > Brian, >> > Have you worked on the Heartbeat Device? Does that need to go in 0MQ >> itself, >> >> I have not. Ideally it could go into 0MQ itself. But, in principle, >> we could do it in pyzmq. We just have to write a nogil pure C >> function that uses the low-level C API to do the heartbeat. Then we >> can just run that function in a thread with a "with nogil" block. >> Shouldn't be too bad, given how simple the heartbeat logic is. The >> main thing we will have to think about is how to start/stop the >> heartbeat in a clean way. >> >> > or can it be part of pyzmq? >> > I'm trying to work out how to really tell that an engine is down. >> > Is the heartbeat to be in a separate process? >> >> No, just a separate C/C++ thread that doesn't hold the GIL. >> >> > Are we guaranteed that a zmq thread is responsive no matter what an >> engine >> > process is doing? If that's the case, is a moderate timeout on recv >> adequate >> > to determine engine failure? >> >> Yes, I think we can assume this. The only thing that would take the >> 0mq thread down is something semi-fatal like a signal that doesn't get >> handled. But as long as the 0MQ thread doesn't have any bugs, it >> should simply keep running no matter what the other thread does (OK, >> other than segfaulting) >> >> > If zmq threads are guaranteed to be responsive, it seems like a simple >> pair >> > socket might be good enough, rather than needing a new device. Or even >> > through the registration XREP socket. >> >> That (registration XREP socket) won't work unless we want to write all >> that logic in C. >> I don't know about a PAIR socket because of the need for multiple clients? >> > I wasn't thinking of a single PAIR socket, but rather a pair for each > engine. 
We already have a pair for each engine for the queue, but I am not > quite seeing the need for a special device beyond a PAIR socket in the > heartbeat. > > >> >> > Can we formalize exactly what the heartbeat needs to be? >> >> OK, let's think. The engine needs to connect, the controller bind. >> It would be nice if the controller didn't need a separate heartbeat >> socket for each engine, but I guess we need the ability to track which >> specific engine is heartbeating. Also, there is the question of to >> do want to do a reqest/reply or pub/sub style heartbeat. What do you >> think? >> > The way we talked about it, the heartbeat needs to issue commands both > ways. While it is used for checking whether an engine remains alive, it is > also the avenue for aborting jobs. If we do have a strict heartbeat, then I > think PUB/SUB is a good choice. > > However, if heartbeat is all it does, then we need a _third_ connection to > each engine for control commands. Since messages cannot jump the queue, the > engine queue PAIR socket cannot be used for commands, and a PUB/SUB model > for heartbeat can _either_ receive commands _or_ have results. > > control commands: > beat (check alive) > abort (remove a task from the queue) > signal (SIGINT, etc.) > exit (engine.kill) > reset (clear queue, namespace) > > more? > > It's possible that we could implement these with a PUB on the controller > and a SUB on each engine, only interpreting results received via the queue's > PAIR socket. But then every command would be sent to every engine, even > though many would only be meant for one (too inefficient/costly?). It would > however make the actual heartbeat command very simple as a single send. > > It does not allow for the engine to initiate queries of the controller, for > instance a work stealing implementation. Again, it is possible that this > could be implemented via the job queue PAIR socket, but that would only > allow for stealing when completely starved for work, since the job queue and > communication queue would be the same. > > There's also the issue of task dependency. > > If we are to implement dependency checking as we discussed (depend on > taskIDs, and only execute once the task has been completed), the engine > needs to be able to query the controller about the tasks depended upon. This > makes the controller being the PUB side unworkable. > > This says to me that we need two-way connections between the engines and > the controller. That can either be implemented as multiple connections > (PUB/SUB + PAIR or REQ/REP), or simply a PAIR socket for each engine could > provide the whole heartbeat/command channel. > > -MinRK > > >> >> Brian >> >> >> > -MinRK >> >> >> >> -- >> Brian E. Granger, Ph.D. >> Assistant Professor of Physics >> Cal Poly State University, San Luis Obispo >> bgranger at calpoly.edu >> ellisonbg at gmail.com >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vano at mail.mipt.ru Mon Jul 12 21:26:12 2010 From: vano at mail.mipt.ru (vano) Date: Tue, 13 Jul 2010 05:26:12 +0400 Subject: [IPython-dev] %run -d is broken in Python 2.7 Message-ID: <1213230248.20100713052612@mail.mipt.ru> Subj. 
On attempting to run a script under the (colored :-) ) debugger, the following appears (excerpt below, full message is attached):

    In [1]: %run -e -d setup.py build
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)

    C:\Ivan\Cython-0.12.1\ in ()

    D:\py\lib\site-packages\ipython-0.10-py2.7.egg\IPython\iplib.pyc in ipmagic(self, arg_s)
    -> 1182         return fn(magic_args)

    D:\py\lib\site-packages\ipython-0.10-py2.7.egg\IPython\Magic.pyc in magic_run(self, parameter_s, runner, file_finder)
    -> 1633         checkline = deb.checkline(filename,bp)

    D:\py\lib\pdb.py in checkline(self, filename, lineno)
    --> 470         line = linecache.getline(filename, lineno, self.curframe.f_globals)

    AttributeError: Pdb instance has no attribute 'curframe'
    > d:\py\lib\pdb.py(470)checkline()
    --> 470         line = linecache.getline(filename, lineno, self.curframe.f_globals)
    ---------------------------------------------------------------------------

After thorough investigation, it turned out to be a pdb issue (details are at the link), so I filed a bug there (http://bugs.python.org/issue9230) as well as a bugfix. If any of you have write access to the Python source, you can help me get it fixed quickly.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: backtrace.txt
URL:

From ellisonbg at gmail.com Mon Jul 12 23:26:04 2010
From: ellisonbg at gmail.com (Brian Granger)
Date: Mon, 12 Jul 2010 20:26:04 -0700
Subject: [IPython-dev] Fwd: [zeromq-dev] Authentication on "topic"
In-Reply-To:
References: <3127C7C2-4A7D-4DF7-8A62-42BFE6F12E0C@quant-edge.com>
Message-ID:

Just saw this on the 0MQ list about authentication and 0MQ.

Cheers,

Brian

---------- Forwarded message ----------
From: Pieter Hintjens
Date: Mon, Jul 12, 2010 at 11:37 AM
Subject: Re: [zeromq-dev] Authentication on "topic"
To: 0MQ development list

Hi Viet,

There is no plan to add authentication to ZeroMQ core. However, we are developing a data plant layer above ZeroMQ, which will do secure distribution over multicast as well as TCP. It will use request-response to do key distribution, and then clients will use those keys to unlock streams of data.

The data plant layer will provide a stream-based pubsub fabric with tools such as fork, clone, arbitrate, failover, delay, log, etc. It will eventually connect to feed handlers to provide a ticket plant.

This new product will be open source but we're developing it off-line initially, i.e. with a closed community of participants. If you are interested in getting access to it early, drop me a line.

Regards - Pieter Hintjens
iMatix

On Mon, Jul 12, 2010 at 7:41 PM, Viet Hoang, Quant Edge wrote:
> Hi,
> We are evaluating ZeroMQ to replace our existing client/server architecture.
> Our requirements are:
> 1. Clients log in to the server farm
> 2. Each client will have its own topic
> 3. Many traders/risk managers will subscribe to client topics to monitor
> trading activities
> 4. Client sends an order to the Order gateway; responses & status will be
> published back to the clients and the trader/risk manager screen
> The initial feedback is excellent: with its load balancing and
> publish/subscribe features, ZeroMQ simply fits our requirements. We need
> some sort of authentication mode for the publish/subscribe feature, so that
> unauthorized people cannot siphon on "topics", but I could not find it
> anywhere in the code. Do you guys have any plan to add the feature soon?
> Cheers, > Viet > > > _______________________________________________ > zeromq-dev mailing list > zeromq-dev at lists.zeromq.org > http://lists.zeromq.org/mailman/listinfo/zeromq-dev > > _______________________________________________ zeromq-dev mailing list zeromq-dev at lists.zeromq.org http://lists.zeromq.org/mailman/listinfo/zeromq-dev -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From ellisonbg at gmail.com Mon Jul 12 23:43:29 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Mon, 12 Jul 2010 20:43:29 -0700 Subject: [IPython-dev] Heartbeat Device In-Reply-To: References: Message-ID: Min, On Mon, Jul 12, 2010 at 4:10 PM, MinRK wrote: > I've been thinking about this, and it seems like we can't have a responsive > rich control connection unless it is in another process, like the old > IPython daemon. I am not quite sure I follow what you mean by this. Can you elaborate? > Pure heartbeat is easy with a C device, and we may not even > need a new one. For instance, I added support for the builtin devices of > zeromq to pyzmq with a few lines, and you can have simple is_alive style > heartbeat with a FORWARDER device. I looked at this and it looks very nice. I think for basic is_alive type heartbeats this will work fine. The only thing to be careful of is that 0MQ sockets are not thread safe. Thus, it would be best to actually create the socket in the thread as well. But we do want the flexibility to be able to pass in sockets to the device. We will have to think about that issue. > I pushed a basic example of this (examples/heartbeat) to my pyzmq fork. > Running a ~3 second numpy.dot action, the heartbeat pings remain responsive > at <1ms. This is great! Cheers, Brian > -MinRK > > On Mon, Jul 12, 2010 at 12:51, MinRK wrote: >> >> >> On Mon, Jul 12, 2010 at 09:15, Brian Granger wrote: >>> >>> On Fri, Jul 9, 2010 at 3:35 PM, MinRK wrote: >>> > Brian, >>> > Have you worked on the Heartbeat Device? Does that need to go in 0MQ >>> > itself, >>> >>> I have not. ?Ideally it could go into 0MQ itself. ?But, in principle, >>> we could do it in pyzmq. ?We just have to write a nogil pure C >>> function that uses the low-level C API to do the heartbeat. ?Then we >>> can just run that function in a thread with a "with nogil" block. >>> Shouldn't be too bad, given how simple the heartbeat logic is. ?The >>> main thing we will have to think about is how to start/stop the >>> heartbeat in a clean way. >>> >>> > or can it be part of pyzmq? >>> > I'm trying to work out how to really tell that an engine is down. >>> > Is the heartbeat to be in a separate process? >>> >>> No, just a separate C/C++ thread that doesn't hold the GIL. >>> >>> > Are we guaranteed that a zmq thread is responsive no matter what an >>> > engine >>> > process is doing? If that's the case, is a moderate timeout on recv >>> > adequate >>> > to determine engine failure? >>> >>> Yes, I think we can assume this. ?The only thing that would take the >>> 0mq thread down is something semi-fatal like a signal that doesn't get >>> handled. ?But as long as the 0MQ thread doesn't have any bugs, it >>> should simply keep running no matter what the other thread does (OK, >>> other than segfaulting) >>> >>> > If zmq threads are guaranteed to be responsive, it seems like a simple >>> > pair >>> > socket might be good enough, rather than needing a new device. Or even >>> > through the registration XREP socket. 
>>> >>> That (registration XREP socket) won't work unless we want to write all >>> that logic in C. >>> I don't know about a PAIR socket because of the need for multiple >>> clients? >> >> I wasn't thinking of a single PAIR socket, but rather a pair for each >> engine. We already have a pair for each engine for the queue, but I am not >> quite seeing the need for a special device beyond a PAIR socket in the >> heartbeat. >> >>> >>> > Can we formalize exactly what the heartbeat needs to be? >>> >>> OK, let's think. ?The engine needs to connect, the controller bind. >>> It would be nice if the controller didn't need a separate heartbeat >>> socket for each engine, but I guess we need the ability to track which >>> specific engine is heartbeating. ? Also, there is the question of to >>> do want to do a reqest/reply or pub/sub style heartbeat. ?What do you >>> think? >> >> The way we talked about it, the heartbeat needs to issue commands both >> ways. While it is used for checking whether an engine remains alive, it is >> also the avenue for aborting jobs. ?If we do have a strict heartbeat, then I >> think PUB/SUB is a good choice. >> However, if heartbeat is all it does, then we need a _third_ connection to >> each engine for control commands. Since messages cannot jump the queue, the >> engine queue PAIR socket cannot be used for commands, and a PUB/SUB model >> for heartbeat can _either_ receive commands _or_ have results. >> control commands: >> beat (check alive) >> abort (remove a task from the queue) >> signal (SIGINT, etc.) >> exit (engine.kill) >> reset (clear queue, namespace) >> more? >> It's possible that we could implement these with a PUB on the controller >> and a SUB on each engine, only interpreting results received via the queue's >> PAIR socket. But then every command would be sent to every engine, even >> though many would only be meant for one (too inefficient/costly?). It would >> however make the actual heartbeat command very simple as a single send. >> It does not allow for the engine to initiate queries of the controller, >> for instance a work stealing implementation. Again, it is possible that this >> could be implemented via the job queue PAIR socket, but that would only >> allow for stealing when completely starved for work, since the job queue and >> communication queue would be the same. >> There's also the issue of task dependency. >> If we are to implement dependency checking as we discussed (depend on >> taskIDs, and only execute once the task has been completed), the engine >> needs to be able to query the controller about the tasks depended upon. This >> makes the controller being the PUB side unworkable. >> This says to me that we need two-way connections between the engines and >> the controller. That can either be implemented as multiple connections >> (PUB/SUB + PAIR or REQ/REP), or simply a PAIR socket for each engine could >> provide the whole heartbeat/command channel. >> -MinRK >> >>> >>> Brian >>> >>> >>> > -MinRK >>> >>> >>> >>> -- >>> Brian E. Granger, Ph.D. >>> Assistant Professor of Physics >>> Cal Poly State University, San Luis Obispo >>> bgranger at calpoly.edu >>> ellisonbg at gmail.com >> > > -- Brian E. Granger, Ph.D. 
Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From benjaminrk at gmail.com Tue Jul 13 00:49:01 2010 From: benjaminrk at gmail.com (MinRK) Date: Mon, 12 Jul 2010 21:49:01 -0700 Subject: [IPython-dev] Heartbeat Device In-Reply-To: References: Message-ID: On Mon, Jul 12, 2010 at 20:43, Brian Granger wrote: > Min, > > On Mon, Jul 12, 2010 at 4:10 PM, MinRK wrote: > > I've been thinking about this, and it seems like we can't have a > responsive > > rich control connection unless it is in another process, like the old > > IPython daemon. > > I am not quite sure I follow what you mean by this. Can you elaborate? > The main advantage that we were to gain from the out-of-process ipdaemon was the ability to abort/kill (signal) blocking jobs. With 0MQ threads, the only logic we can have in a control/heartbeat thread must be implemented in GIL-free C/C++. That limits what we can do in terms of interacting with the main work thread, as I understand it. > > > Pure heartbeat is easy with a C device, and we may not even > > need a new one. For instance, I added support for the builtin devices of > > zeromq to pyzmq with a few lines, and you can have simple is_alive style > > heartbeat with a FORWARDER device. > > I looked at this and it looks very nice. I think for basic is_alive > type heartbeats this will work fine. The only thing to be careful of > is that 0MQ sockets are not thread safe. Thus, it would be best to > actually create the socket in the thread as well. But we do want the > flexibility to be able to pass in sockets to the device. We will have > to think about that issue. > I wrote/pushed a basic ThreadsafeDevice, which creates/binds/connects inside the thread's run method. It adds bind_in/out, connect_in/out, and setsockopt_in/out methods which just queue up arguments to be called at the head of the run method. I added a tspong.py in the heartbeat example using it. > > > I pushed a basic example of this (examples/heartbeat) to my pyzmq fork. > > Running a ~3 second numpy.dot action, the heartbeat pings remain > responsive > > at <1ms. > > This is great! > > Cheers, > > Brian > > -MinRK > > > > On Mon, Jul 12, 2010 at 12:51, MinRK wrote: > >> > >> > >> On Mon, Jul 12, 2010 at 09:15, Brian Granger > wrote: > >>> > >>> On Fri, Jul 9, 2010 at 3:35 PM, MinRK wrote: > >>> > Brian, > >>> > Have you worked on the Heartbeat Device? Does that need to go in 0MQ > >>> > itself, > >>> > >>> I have not. Ideally it could go into 0MQ itself. But, in principle, > >>> we could do it in pyzmq. We just have to write a nogil pure C > >>> function that uses the low-level C API to do the heartbeat. Then we > >>> can just run that function in a thread with a "with nogil" block. > >>> Shouldn't be too bad, given how simple the heartbeat logic is. The > >>> main thing we will have to think about is how to start/stop the > >>> heartbeat in a clean way. > >>> > >>> > or can it be part of pyzmq? > >>> > I'm trying to work out how to really tell that an engine is down. > >>> > Is the heartbeat to be in a separate process? > >>> > >>> No, just a separate C/C++ thread that doesn't hold the GIL. > >>> > >>> > Are we guaranteed that a zmq thread is responsive no matter what an > >>> > engine > >>> > process is doing? If that's the case, is a moderate timeout on recv > >>> > adequate > >>> > to determine engine failure? > >>> > >>> Yes, I think we can assume this. 
The only thing that would take the > >>> 0mq thread down is something semi-fatal like a signal that doesn't get > >>> handled. But as long as the 0MQ thread doesn't have any bugs, it > >>> should simply keep running no matter what the other thread does (OK, > >>> other than segfaulting) > >>> > >>> > If zmq threads are guaranteed to be responsive, it seems like a > simple > >>> > pair > >>> > socket might be good enough, rather than needing a new device. Or > even > >>> > through the registration XREP socket. > >>> > >>> That (registration XREP socket) won't work unless we want to write all > >>> that logic in C. > >>> I don't know about a PAIR socket because of the need for multiple > >>> clients? > >> > >> I wasn't thinking of a single PAIR socket, but rather a pair for each > >> engine. We already have a pair for each engine for the queue, but I am > not > >> quite seeing the need for a special device beyond a PAIR socket in the > >> heartbeat. > >> > >>> > >>> > Can we formalize exactly what the heartbeat needs to be? > >>> > >>> OK, let's think. The engine needs to connect, the controller bind. > >>> It would be nice if the controller didn't need a separate heartbeat > >>> socket for each engine, but I guess we need the ability to track which > >>> specific engine is heartbeating. Also, there is the question of to > >>> do want to do a reqest/reply or pub/sub style heartbeat. What do you > >>> think? > >> > >> The way we talked about it, the heartbeat needs to issue commands both > >> ways. While it is used for checking whether an engine remains alive, it > is > >> also the avenue for aborting jobs. If we do have a strict heartbeat, > then I > >> think PUB/SUB is a good choice. > >> However, if heartbeat is all it does, then we need a _third_ connection > to > >> each engine for control commands. Since messages cannot jump the queue, > the > >> engine queue PAIR socket cannot be used for commands, and a PUB/SUB > model > >> for heartbeat can _either_ receive commands _or_ have results. > >> control commands: > >> beat (check alive) > >> abort (remove a task from the queue) > >> signal (SIGINT, etc.) > >> exit (engine.kill) > >> reset (clear queue, namespace) > >> more? > >> It's possible that we could implement these with a PUB on the controller > >> and a SUB on each engine, only interpreting results received via the > queue's > >> PAIR socket. But then every command would be sent to every engine, even > >> though many would only be meant for one (too inefficient/costly?). It > would > >> however make the actual heartbeat command very simple as a single send. > >> It does not allow for the engine to initiate queries of the controller, > >> for instance a work stealing implementation. Again, it is possible that > this > >> could be implemented via the job queue PAIR socket, but that would only > >> allow for stealing when completely starved for work, since the job queue > and > >> communication queue would be the same. > >> There's also the issue of task dependency. > >> If we are to implement dependency checking as we discussed (depend on > >> taskIDs, and only execute once the task has been completed), the engine > >> needs to be able to query the controller about the tasks depended upon. > This > >> makes the controller being the PUB side unworkable. > >> This says to me that we need two-way connections between the engines and > >> the controller. 
That can either be implemented as multiple connections > >> (PUB/SUB + PAIR or REQ/REP), or simply a PAIR socket for each engine > could > >> provide the whole heartbeat/command channel. > >> -MinRK > >> > >>> > >>> Brian > >>> > >>> > >>> > -MinRK > >>> > >>> > >>> > >>> -- > >>> Brian E. Granger, Ph.D. > >>> Assistant Professor of Physics > >>> Cal Poly State University, San Luis Obispo > >>> bgranger at calpoly.edu > >>> ellisonbg at gmail.com > >> > > > > > > > > -- > Brian E. Granger, Ph.D. > Assistant Professor of Physics > Cal Poly State University, San Luis Obispo > bgranger at calpoly.edu > ellisonbg at gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Tue Jul 13 01:04:24 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Mon, 12 Jul 2010 22:04:24 -0700 Subject: [IPython-dev] Heartbeat Device In-Reply-To: References: Message-ID: On Mon, Jul 12, 2010 at 9:49 PM, MinRK wrote: > > > On Mon, Jul 12, 2010 at 20:43, Brian Granger wrote: >> >> Min, >> >> On Mon, Jul 12, 2010 at 4:10 PM, MinRK wrote: >> > I've been thinking about this, and it seems like we can't have a >> > responsive >> > rich control connection unless it is in another process, like the old >> > IPython daemon. >> >> I am not quite sure I follow what you mean by this. ?Can you elaborate? > > The main advantage that we were to gain from the out-of-process ipdaemon was > the ability to abort/kill (signal) blocking jobs. With 0MQ threads, the only > logic we can have in a control/heartbeat thread must be implemented in > GIL-free C/C++. That limits what we can do in terms of interacting with the > main work thread, as I understand it. Yes, but I think it might be possible to spawn an external process to send a signal back to the process. But I am not sure about this. >> >> > Pure heartbeat is easy with a C device, and we may not even >> > need a new one. For instance, I added support for the builtin devices of >> > zeromq to pyzmq with a few lines, and you can have simple is_alive style >> > heartbeat with a FORWARDER device. >> >> I looked at this and it looks very nice. ?I think for basic is_alive >> type heartbeats this will work fine. ?The only thing to be careful of >> is that 0MQ sockets are not thread safe. ?Thus, it would be best to >> actually create the socket in the thread as well. ?But we do want the >> flexibility to be able to pass in sockets to the device. ?We will have >> to think about that issue. > > > I wrote/pushed a basic ThreadsafeDevice, which creates/binds/connects inside > the thread's run method. > It adds bind_in/out, connect_in/out, and setsockopt_in/out methods which > just queue up arguments to be called at the head of the run method. I added > a tspong.py in the heartbeat example using it. Cool, I will review this and merge it into master. Cheers, Brian >> >> > I pushed a basic example of this (examples/heartbeat) to my pyzmq fork. >> > Running a ~3 second numpy.dot action, the heartbeat pings remain >> > responsive >> > at <1ms. >> >> This is great! >> >> Cheers, >> >> Brian >> > -MinRK >> > >> > On Mon, Jul 12, 2010 at 12:51, MinRK wrote: >> >> >> >> >> >> On Mon, Jul 12, 2010 at 09:15, Brian Granger >> >> wrote: >> >>> >> >>> On Fri, Jul 9, 2010 at 3:35 PM, MinRK wrote: >> >>> > Brian, >> >>> > Have you worked on the Heartbeat Device? Does that need to go in 0MQ >> >>> > itself, >> >>> >> >>> I have not. ?Ideally it could go into 0MQ itself. ?But, in principle, >> >>> we could do it in pyzmq. 
?We just have to write a nogil pure C >> >>> function that uses the low-level C API to do the heartbeat. ?Then we >> >>> can just run that function in a thread with a "with nogil" block. >> >>> Shouldn't be too bad, given how simple the heartbeat logic is. ?The >> >>> main thing we will have to think about is how to start/stop the >> >>> heartbeat in a clean way. >> >>> >> >>> > or can it be part of pyzmq? >> >>> > I'm trying to work out how to really tell that an engine is down. >> >>> > Is the heartbeat to be in a separate process? >> >>> >> >>> No, just a separate C/C++ thread that doesn't hold the GIL. >> >>> >> >>> > Are we guaranteed that a zmq thread is responsive no matter what an >> >>> > engine >> >>> > process is doing? If that's the case, is a moderate timeout on recv >> >>> > adequate >> >>> > to determine engine failure? >> >>> >> >>> Yes, I think we can assume this. ?The only thing that would take the >> >>> 0mq thread down is something semi-fatal like a signal that doesn't get >> >>> handled. ?But as long as the 0MQ thread doesn't have any bugs, it >> >>> should simply keep running no matter what the other thread does (OK, >> >>> other than segfaulting) >> >>> >> >>> > If zmq threads are guaranteed to be responsive, it seems like a >> >>> > simple >> >>> > pair >> >>> > socket might be good enough, rather than needing a new device. Or >> >>> > even >> >>> > through the registration XREP socket. >> >>> >> >>> That (registration XREP socket) won't work unless we want to write all >> >>> that logic in C. >> >>> I don't know about a PAIR socket because of the need for multiple >> >>> clients? >> >> >> >> I wasn't thinking of a single PAIR socket, but rather a pair for each >> >> engine. We already have a pair for each engine for the queue, but I am >> >> not >> >> quite seeing the need for a special device beyond a PAIR socket in the >> >> heartbeat. >> >> >> >>> >> >>> > Can we formalize exactly what the heartbeat needs to be? >> >>> >> >>> OK, let's think. ?The engine needs to connect, the controller bind. >> >>> It would be nice if the controller didn't need a separate heartbeat >> >>> socket for each engine, but I guess we need the ability to track which >> >>> specific engine is heartbeating. ? Also, there is the question of to >> >>> do want to do a reqest/reply or pub/sub style heartbeat. ?What do you >> >>> think? >> >> >> >> The way we talked about it, the heartbeat needs to issue commands both >> >> ways. While it is used for checking whether an engine remains alive, it >> >> is >> >> also the avenue for aborting jobs. ?If we do have a strict heartbeat, >> >> then I >> >> think PUB/SUB is a good choice. >> >> However, if heartbeat is all it does, then we need a _third_ connection >> >> to >> >> each engine for control commands. Since messages cannot jump the queue, >> >> the >> >> engine queue PAIR socket cannot be used for commands, and a PUB/SUB >> >> model >> >> for heartbeat can _either_ receive commands _or_ have results. >> >> control commands: >> >> beat (check alive) >> >> abort (remove a task from the queue) >> >> signal (SIGINT, etc.) >> >> exit (engine.kill) >> >> reset (clear queue, namespace) >> >> more? >> >> It's possible that we could implement these with a PUB on the >> >> controller >> >> and a SUB on each engine, only interpreting results received via the >> >> queue's >> >> PAIR socket. But then every command would be sent to every engine, even >> >> though many would only be meant for one (too inefficient/costly?). 
It >> >> would >> >> however make the actual heartbeat command very simple as a single send. >> >> It does not allow for the engine to initiate queries of the controller, >> >> for instance a work stealing implementation. Again, it is possible that >> >> this >> >> could be implemented via the job queue PAIR socket, but that would only >> >> allow for stealing when completely starved for work, since the job >> >> queue and >> >> communication queue would be the same. >> >> There's also the issue of task dependency. >> >> If we are to implement dependency checking as we discussed (depend on >> >> taskIDs, and only execute once the task has been completed), the engine >> >> needs to be able to query the controller about the tasks depended upon. >> >> This >> >> makes the controller being the PUB side unworkable. >> >> This says to me that we need two-way connections between the engines >> >> and >> >> the controller. That can either be implemented as multiple connections >> >> (PUB/SUB + PAIR or REQ/REP), or simply a PAIR socket for each engine >> >> could >> >> provide the whole heartbeat/command channel. >> >> -MinRK >> >> >> >>> >> >>> Brian >> >>> >> >>> >> >>> > -MinRK >> >>> >> >>> >> >>> >> >>> -- >> >>> Brian E. Granger, Ph.D. >> >>> Assistant Professor of Physics >> >>> Cal Poly State University, San Luis Obispo >> >>> bgranger at calpoly.edu >> >>> ellisonbg at gmail.com >> >> >> > >> > >> >> >> >> -- >> Brian E. Granger, Ph.D. >> Assistant Professor of Physics >> Cal Poly State University, San Luis Obispo >> bgranger at calpoly.edu >> ellisonbg at gmail.com > > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From benjaminrk at gmail.com Tue Jul 13 01:10:01 2010 From: benjaminrk at gmail.com (MinRK) Date: Mon, 12 Jul 2010 22:10:01 -0700 Subject: [IPython-dev] Heartbeat Device In-Reply-To: References: Message-ID: On Mon, Jul 12, 2010 at 22:04, Brian Granger wrote: > On Mon, Jul 12, 2010 at 9:49 PM, MinRK wrote: > > > > > > On Mon, Jul 12, 2010 at 20:43, Brian Granger > wrote: > >> > >> Min, > >> > >> On Mon, Jul 12, 2010 at 4:10 PM, MinRK wrote: > >> > I've been thinking about this, and it seems like we can't have a > >> > responsive > >> > rich control connection unless it is in another process, like the old > >> > IPython daemon. > >> > >> I am not quite sure I follow what you mean by this. Can you elaborate? > > > > The main advantage that we were to gain from the out-of-process ipdaemon > was > > the ability to abort/kill (signal) blocking jobs. With 0MQ threads, the > only > > logic we can have in a control/heartbeat thread must be implemented in > > GIL-free C/C++. That limits what we can do in terms of interacting with > the > > main work thread, as I understand it. > > Yes, but I think it might be possible to spawn an external process to > send a signal back to the process. But I am not sure about this. > > >> > >> > Pure heartbeat is easy with a C device, and we may not even > >> > need a new one. For instance, I added support for the builtin devices > of > >> > zeromq to pyzmq with a few lines, and you can have simple is_alive > style > >> > heartbeat with a FORWARDER device. > >> > >> I looked at this and it looks very nice. I think for basic is_alive > >> type heartbeats this will work fine. The only thing to be careful of > >> is that 0MQ sockets are not thread safe. Thus, it would be best to > >> actually create the socket in the thread as well. 
But we do want the > >> flexibility to be able to pass in sockets to the device. We will have > >> to think about that issue. > > > > > > I wrote/pushed a basic ThreadsafeDevice, which creates/binds/connects > inside > > the thread's run method. > > It adds bind_in/out, connect_in/out, and setsockopt_in/out methods which > > just queue up arguments to be called at the head of the run method. I > added > > a tspong.py in the heartbeat example using it. > > Cool, I will review this and merge it into master. > > I'd say it's not ready for master in one particular respect: The Device thread doesn't respond to signals, so I have to kill it to stop it. I haven't yet figured out why this is happening; it might be quite simple. I'll push up some unit tests tomorrow > Cheers, > > Brian > > >> > >> > I pushed a basic example of this (examples/heartbeat) to my pyzmq > fork. > >> > Running a ~3 second numpy.dot action, the heartbeat pings remain > >> > responsive > >> > at <1ms. > >> > >> This is great! > >> > >> Cheers, > >> > >> Brian > >> > -MinRK > >> > > >> > On Mon, Jul 12, 2010 at 12:51, MinRK wrote: > >> >> > >> >> > >> >> On Mon, Jul 12, 2010 at 09:15, Brian Granger > >> >> wrote: > >> >>> > >> >>> On Fri, Jul 9, 2010 at 3:35 PM, MinRK wrote: > >> >>> > Brian, > >> >>> > Have you worked on the Heartbeat Device? Does that need to go in > 0MQ > >> >>> > itself, > >> >>> > >> >>> I have not. Ideally it could go into 0MQ itself. But, in > principle, > >> >>> we could do it in pyzmq. We just have to write a nogil pure C > >> >>> function that uses the low-level C API to do the heartbeat. Then we > >> >>> can just run that function in a thread with a "with nogil" block. > >> >>> Shouldn't be too bad, given how simple the heartbeat logic is. The > >> >>> main thing we will have to think about is how to start/stop the > >> >>> heartbeat in a clean way. > >> >>> > >> >>> > or can it be part of pyzmq? > >> >>> > I'm trying to work out how to really tell that an engine is down. > >> >>> > Is the heartbeat to be in a separate process? > >> >>> > >> >>> No, just a separate C/C++ thread that doesn't hold the GIL. > >> >>> > >> >>> > Are we guaranteed that a zmq thread is responsive no matter what > an > >> >>> > engine > >> >>> > process is doing? If that's the case, is a moderate timeout on > recv > >> >>> > adequate > >> >>> > to determine engine failure? > >> >>> > >> >>> Yes, I think we can assume this. The only thing that would take the > >> >>> 0mq thread down is something semi-fatal like a signal that doesn't > get > >> >>> handled. But as long as the 0MQ thread doesn't have any bugs, it > >> >>> should simply keep running no matter what the other thread does (OK, > >> >>> other than segfaulting) > >> >>> > >> >>> > If zmq threads are guaranteed to be responsive, it seems like a > >> >>> > simple > >> >>> > pair > >> >>> > socket might be good enough, rather than needing a new device. Or > >> >>> > even > >> >>> > through the registration XREP socket. > >> >>> > >> >>> That (registration XREP socket) won't work unless we want to write > all > >> >>> that logic in C. > >> >>> I don't know about a PAIR socket because of the need for multiple > >> >>> clients? > >> >> > >> >> I wasn't thinking of a single PAIR socket, but rather a pair for each > >> >> engine. We already have a pair for each engine for the queue, but I > am > >> >> not > >> >> quite seeing the need for a special device beyond a PAIR socket in > the > >> >> heartbeat. 
> >> >> > >> >>> > >> >>> > Can we formalize exactly what the heartbeat needs to be? > >> >>> > >> >>> OK, let's think. The engine needs to connect, the controller bind. > >> >>> It would be nice if the controller didn't need a separate heartbeat > >> >>> socket for each engine, but I guess we need the ability to track > which > >> >>> specific engine is heartbeating. Also, there is the question of to > >> >>> do want to do a reqest/reply or pub/sub style heartbeat. What do > you > >> >>> think? > >> >> > >> >> The way we talked about it, the heartbeat needs to issue commands > both > >> >> ways. While it is used for checking whether an engine remains alive, > it > >> >> is > >> >> also the avenue for aborting jobs. If we do have a strict heartbeat, > >> >> then I > >> >> think PUB/SUB is a good choice. > >> >> However, if heartbeat is all it does, then we need a _third_ > connection > >> >> to > >> >> each engine for control commands. Since messages cannot jump the > queue, > >> >> the > >> >> engine queue PAIR socket cannot be used for commands, and a PUB/SUB > >> >> model > >> >> for heartbeat can _either_ receive commands _or_ have results. > >> >> control commands: > >> >> beat (check alive) > >> >> abort (remove a task from the queue) > >> >> signal (SIGINT, etc.) > >> >> exit (engine.kill) > >> >> reset (clear queue, namespace) > >> >> more? > >> >> It's possible that we could implement these with a PUB on the > >> >> controller > >> >> and a SUB on each engine, only interpreting results received via the > >> >> queue's > >> >> PAIR socket. But then every command would be sent to every engine, > even > >> >> though many would only be meant for one (too inefficient/costly?). It > >> >> would > >> >> however make the actual heartbeat command very simple as a single > send. > >> >> It does not allow for the engine to initiate queries of the > controller, > >> >> for instance a work stealing implementation. Again, it is possible > that > >> >> this > >> >> could be implemented via the job queue PAIR socket, but that would > only > >> >> allow for stealing when completely starved for work, since the job > >> >> queue and > >> >> communication queue would be the same. > >> >> There's also the issue of task dependency. > >> >> If we are to implement dependency checking as we discussed (depend on > >> >> taskIDs, and only execute once the task has been completed), the > engine > >> >> needs to be able to query the controller about the tasks depended > upon. > >> >> This > >> >> makes the controller being the PUB side unworkable. > >> >> This says to me that we need two-way connections between the engines > >> >> and > >> >> the controller. That can either be implemented as multiple > connections > >> >> (PUB/SUB + PAIR or REQ/REP), or simply a PAIR socket for each engine > >> >> could > >> >> provide the whole heartbeat/command channel. > >> >> -MinRK > >> >> > >> >>> > >> >>> Brian > >> >>> > >> >>> > >> >>> > -MinRK > >> >>> > >> >>> > >> >>> > >> >>> -- > >> >>> Brian E. Granger, Ph.D. > >> >>> Assistant Professor of Physics > >> >>> Cal Poly State University, San Luis Obispo > >> >>> bgranger at calpoly.edu > >> >>> ellisonbg at gmail.com > >> >> > >> > > >> > > >> > >> > >> > >> -- > >> Brian E. Granger, Ph.D. > >> Assistant Professor of Physics > >> Cal Poly State University, San Luis Obispo > >> bgranger at calpoly.edu > >> ellisonbg at gmail.com > > > > > > > > -- > Brian E. Granger, Ph.D. 
> Assistant Professor of Physics > Cal Poly State University, San Luis Obispo > bgranger at calpoly.edu > ellisonbg at gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From benjaminrk at gmail.com Tue Jul 13 15:51:39 2010 From: benjaminrk at gmail.com (MinRK) Date: Tue, 13 Jul 2010 12:51:39 -0700 Subject: [IPython-dev] Heartbeat Device In-Reply-To: References: Message-ID: Re: not exiting without killing: I just needed to add thread.setDaemon(True), so the device threads do exit properly now. committed/pushed to git. -MinRK On Mon, Jul 12, 2010 at 22:10, MinRK wrote: > > > On Mon, Jul 12, 2010 at 22:04, Brian Granger wrote: > >> On Mon, Jul 12, 2010 at 9:49 PM, MinRK wrote: >> > >> > >> > On Mon, Jul 12, 2010 at 20:43, Brian Granger >> wrote: >> >> >> >> Min, >> >> >> >> On Mon, Jul 12, 2010 at 4:10 PM, MinRK wrote: >> >> > I've been thinking about this, and it seems like we can't have a >> >> > responsive >> >> > rich control connection unless it is in another process, like the old >> >> > IPython daemon. >> >> >> >> I am not quite sure I follow what you mean by this. Can you elaborate? >> > >> > The main advantage that we were to gain from the out-of-process ipdaemon >> was >> > the ability to abort/kill (signal) blocking jobs. With 0MQ threads, the >> only >> > logic we can have in a control/heartbeat thread must be implemented in >> > GIL-free C/C++. That limits what we can do in terms of interacting with >> the >> > main work thread, as I understand it. >> >> Yes, but I think it might be possible to spawn an external process to >> send a signal back to the process. But I am not sure about this. >> >> >> >> >> > Pure heartbeat is easy with a C device, and we may not even >> >> > need a new one. For instance, I added support for the builtin devices >> of >> >> > zeromq to pyzmq with a few lines, and you can have simple is_alive >> style >> >> > heartbeat with a FORWARDER device. >> >> >> >> I looked at this and it looks very nice. I think for basic is_alive >> >> type heartbeats this will work fine. The only thing to be careful of >> >> is that 0MQ sockets are not thread safe. Thus, it would be best to >> >> actually create the socket in the thread as well. But we do want the >> >> flexibility to be able to pass in sockets to the device. We will have >> >> to think about that issue. >> > >> > >> > I wrote/pushed a basic ThreadsafeDevice, which creates/binds/connects >> inside >> > the thread's run method. >> > It adds bind_in/out, connect_in/out, and setsockopt_in/out methods which >> > just queue up arguments to be called at the head of the run method. I >> added >> > a tspong.py in the heartbeat example using it. >> >> Cool, I will review this and merge it into master. >> >> > I'd say it's not ready for master in one particular respect: The Device > thread doesn't respond to signals, so I have to kill it to stop it. I > haven't yet figured out why this is happening; it might be quite simple. > > I'll push up some unit tests tomorrow > > > >> Cheers, >> >> Brian >> >> >> >> >> > I pushed a basic example of this (examples/heartbeat) to my pyzmq >> fork. >> >> > Running a ~3 second numpy.dot action, the heartbeat pings remain >> >> > responsive >> >> > at <1ms. >> >> >> >> This is great! 
>> >> >> >> Cheers, >> >> >> >> Brian >> >> > -MinRK >> >> > >> >> > On Mon, Jul 12, 2010 at 12:51, MinRK wrote: >> >> >> >> >> >> >> >> >> On Mon, Jul 12, 2010 at 09:15, Brian Granger >> >> >> wrote: >> >> >>> >> >> >>> On Fri, Jul 9, 2010 at 3:35 PM, MinRK >> wrote: >> >> >>> > Brian, >> >> >>> > Have you worked on the Heartbeat Device? Does that need to go in >> 0MQ >> >> >>> > itself, >> >> >>> >> >> >>> I have not. Ideally it could go into 0MQ itself. But, in >> principle, >> >> >>> we could do it in pyzmq. We just have to write a nogil pure C >> >> >>> function that uses the low-level C API to do the heartbeat. Then >> we >> >> >>> can just run that function in a thread with a "with nogil" block. >> >> >>> Shouldn't be too bad, given how simple the heartbeat logic is. The >> >> >>> main thing we will have to think about is how to start/stop the >> >> >>> heartbeat in a clean way. >> >> >>> >> >> >>> > or can it be part of pyzmq? >> >> >>> > I'm trying to work out how to really tell that an engine is down. >> >> >>> > Is the heartbeat to be in a separate process? >> >> >>> >> >> >>> No, just a separate C/C++ thread that doesn't hold the GIL. >> >> >>> >> >> >>> > Are we guaranteed that a zmq thread is responsive no matter what >> an >> >> >>> > engine >> >> >>> > process is doing? If that's the case, is a moderate timeout on >> recv >> >> >>> > adequate >> >> >>> > to determine engine failure? >> >> >>> >> >> >>> Yes, I think we can assume this. The only thing that would take >> the >> >> >>> 0mq thread down is something semi-fatal like a signal that doesn't >> get >> >> >>> handled. But as long as the 0MQ thread doesn't have any bugs, it >> >> >>> should simply keep running no matter what the other thread does >> (OK, >> >> >>> other than segfaulting) >> >> >>> >> >> >>> > If zmq threads are guaranteed to be responsive, it seems like a >> >> >>> > simple >> >> >>> > pair >> >> >>> > socket might be good enough, rather than needing a new device. Or >> >> >>> > even >> >> >>> > through the registration XREP socket. >> >> >>> >> >> >>> That (registration XREP socket) won't work unless we want to write >> all >> >> >>> that logic in C. >> >> >>> I don't know about a PAIR socket because of the need for multiple >> >> >>> clients? >> >> >> >> >> >> I wasn't thinking of a single PAIR socket, but rather a pair for >> each >> >> >> engine. We already have a pair for each engine for the queue, but I >> am >> >> >> not >> >> >> quite seeing the need for a special device beyond a PAIR socket in >> the >> >> >> heartbeat. >> >> >> >> >> >>> >> >> >>> > Can we formalize exactly what the heartbeat needs to be? >> >> >>> >> >> >>> OK, let's think. The engine needs to connect, the controller bind. >> >> >>> It would be nice if the controller didn't need a separate heartbeat >> >> >>> socket for each engine, but I guess we need the ability to track >> which >> >> >>> specific engine is heartbeating. Also, there is the question of >> to >> >> >>> do want to do a reqest/reply or pub/sub style heartbeat. What do >> you >> >> >>> think? >> >> >> >> >> >> The way we talked about it, the heartbeat needs to issue commands >> both >> >> >> ways. While it is used for checking whether an engine remains alive, >> it >> >> >> is >> >> >> also the avenue for aborting jobs. If we do have a strict >> heartbeat, >> >> >> then I >> >> >> think PUB/SUB is a good choice. >> >> >> However, if heartbeat is all it does, then we need a _third_ >> connection >> >> >> to >> >> >> each engine for control commands. 
Since messages cannot jump the >> queue, >> >> >> the >> >> >> engine queue PAIR socket cannot be used for commands, and a PUB/SUB >> >> >> model >> >> >> for heartbeat can _either_ receive commands _or_ have results. >> >> >> control commands: >> >> >> beat (check alive) >> >> >> abort (remove a task from the queue) >> >> >> signal (SIGINT, etc.) >> >> >> exit (engine.kill) >> >> >> reset (clear queue, namespace) >> >> >> more? >> >> >> It's possible that we could implement these with a PUB on the >> >> >> controller >> >> >> and a SUB on each engine, only interpreting results received via the >> >> >> queue's >> >> >> PAIR socket. But then every command would be sent to every engine, >> even >> >> >> though many would only be meant for one (too inefficient/costly?). >> It >> >> >> would >> >> >> however make the actual heartbeat command very simple as a single >> send. >> >> >> It does not allow for the engine to initiate queries of the >> controller, >> >> >> for instance a work stealing implementation. Again, it is possible >> that >> >> >> this >> >> >> could be implemented via the job queue PAIR socket, but that would >> only >> >> >> allow for stealing when completely starved for work, since the job >> >> >> queue and >> >> >> communication queue would be the same. >> >> >> There's also the issue of task dependency. >> >> >> If we are to implement dependency checking as we discussed (depend >> on >> >> >> taskIDs, and only execute once the task has been completed), the >> engine >> >> >> needs to be able to query the controller about the tasks depended >> upon. >> >> >> This >> >> >> makes the controller being the PUB side unworkable. >> >> >> This says to me that we need two-way connections between the engines >> >> >> and >> >> >> the controller. That can either be implemented as multiple >> connections >> >> >> (PUB/SUB + PAIR or REQ/REP), or simply a PAIR socket for each engine >> >> >> could >> >> >> provide the whole heartbeat/command channel. >> >> >> -MinRK >> >> >> >> >> >>> >> >> >>> Brian >> >> >>> >> >> >>> >> >> >>> > -MinRK >> >> >>> >> >> >>> >> >> >>> >> >> >>> -- >> >> >>> Brian E. Granger, Ph.D. >> >> >>> Assistant Professor of Physics >> >> >>> Cal Poly State University, San Luis Obispo >> >> >>> bgranger at calpoly.edu >> >> >>> ellisonbg at gmail.com >> >> >> >> >> > >> >> > >> >> >> >> >> >> >> >> -- >> >> Brian E. Granger, Ph.D. >> >> Assistant Professor of Physics >> >> Cal Poly State University, San Luis Obispo >> >> bgranger at calpoly.edu >> >> ellisonbg at gmail.com >> > >> > >> >> >> >> -- >> Brian E. Granger, Ph.D. >> Assistant Professor of Physics >> Cal Poly State University, San Luis Obispo >> bgranger at calpoly.edu >> ellisonbg at gmail.com >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Tue Jul 13 23:08:03 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 13 Jul 2010 20:08:03 -0700 Subject: [IPython-dev] Fwd: [zeromq-dev] Authentication on "topic" In-Reply-To: References: <3127C7C2-4A7D-4DF7-8A62-42BFE6F12E0C@quant-edge.com> Message-ID: On Mon, Jul 12, 2010 at 8:26 PM, Brian Granger wrote: > Just saw this on the 0MQ list about authentication and 0MQ. Interesting... > > Cheers, > > Brian > > > ---------- Forwarded message ---------- > From: Pieter Hintjens > Date: Mon, Jul 12, 2010 at 11:37 AM > Subject: Re: [zeromq-dev] Authentication on "topic" > To: 0MQ development list > > > Hi Viet, > > There is no plan to add authentication to ZeroMQ core. 
?However we are > developing a data plant layer above ZeroMQ, which will do secure > distribution over multicast as well as TCP. ?It will use > request-response to do key distribution, and then clients will use > those keys to unlock streams of data. > I wonder if their model for authentication would be enough for us. Not sure quite yet... But at least it's good to see 0mq moving in this direction, even if it's just to see patterns of security to take ideas from. Cheers, f From fperez.net at gmail.com Tue Jul 13 23:11:32 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 13 Jul 2010 20:11:32 -0700 Subject: [IPython-dev] Porting to Python3 In-Reply-To: References: Message-ID: Hi Naoki, On Sat, Jul 10, 2010 at 2:18 AM, INADA Naoki wrote: > Today, Python hack-a-thon is held in Japan. > I've ported IPython to Python3 in there. > Some feature works now. > > http://github.com/methane/ipython Fantastic! This is great. Can you run the test suite? It should naturally skip the twisted parts but at least the other pieces should run. Can your fork run with 2.x as well? It would be good to keep compatibility with both, if possible, for a while... I know that's not what python originally recommended, but in practice I think it's a more realistic goal. Cheers, f From ellisonbg at gmail.com Tue Jul 13 23:57:31 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Tue, 13 Jul 2010 20:57:31 -0700 Subject: [IPython-dev] Heartbeat Device In-Reply-To: References: Message-ID: Nice to know that works, but I don't think that will work for devices that use blocking recv calls. But it may. Brian On Tue, Jul 13, 2010 at 12:51 PM, MinRK wrote: > Re: not exiting without killing: > I just needed to add thread.setDaemon(True), so the device threads do exit > properly now. > committed/pushed to git. > > -MinRK > On Mon, Jul 12, 2010 at 22:10, MinRK wrote: >> >> >> On Mon, Jul 12, 2010 at 22:04, Brian Granger wrote: >>> >>> On Mon, Jul 12, 2010 at 9:49 PM, MinRK wrote: >>> > >>> > >>> > On Mon, Jul 12, 2010 at 20:43, Brian Granger >>> > wrote: >>> >> >>> >> Min, >>> >> >>> >> On Mon, Jul 12, 2010 at 4:10 PM, MinRK wrote: >>> >> > I've been thinking about this, and it seems like we can't have a >>> >> > responsive >>> >> > rich control connection unless it is in another process, like the >>> >> > old >>> >> > IPython daemon. >>> >> >>> >> I am not quite sure I follow what you mean by this. ?Can you >>> >> elaborate? >>> > >>> > The main advantage that we were to gain from the out-of-process >>> > ipdaemon was >>> > the ability to abort/kill (signal) blocking jobs. With 0MQ threads, the >>> > only >>> > logic we can have in a control/heartbeat thread must be implemented in >>> > GIL-free C/C++. That limits what we can do in terms of interacting with >>> > the >>> > main work thread, as I understand it. >>> >>> Yes, but I think it might be possible to spawn an external process to >>> send a signal back to the process. ?But I am not sure about this. >>> >>> >> >>> >> > Pure heartbeat is easy with a C device, and we may not even >>> >> > need a new one. For instance, I added support for the builtin >>> >> > devices of >>> >> > zeromq to pyzmq with a few lines, and you can have simple is_alive >>> >> > style >>> >> > heartbeat with a FORWARDER device. >>> >> >>> >> I looked at this and it looks very nice. ?I think for basic is_alive >>> >> type heartbeats this will work fine. ?The only thing to be careful of >>> >> is that 0MQ sockets are not thread safe. 
?Thus, it would be best to >>> >> actually create the socket in the thread as well. ?But we do want the >>> >> flexibility to be able to pass in sockets to the device. ?We will have >>> >> to think about that issue. >>> > >>> > >>> > I wrote/pushed a basic ThreadsafeDevice, which creates/binds/connects >>> > inside >>> > the thread's run method. >>> > It adds bind_in/out, connect_in/out, and setsockopt_in/out methods >>> > which >>> > just queue up arguments to be called at the head of the run method. I >>> > added >>> > a tspong.py in the heartbeat example using it. >>> >>> Cool, I will review this and merge it into master. >>> >> >> I'd say it's not ready for master in one particular respect: The Device >> thread doesn't respond to signals, so I have to kill it to stop it. I >> haven't yet figured out why this is happening; it might be quite simple. >> I'll push up some unit tests tomorrow >> >>> >>> Cheers, >>> >>> Brian >>> >>> >> >>> >> > I pushed a basic example of this (examples/heartbeat) to my pyzmq >>> >> > fork. >>> >> > Running a ~3 second numpy.dot action, the heartbeat pings remain >>> >> > responsive >>> >> > at <1ms. >>> >> >>> >> This is great! >>> >> >>> >> Cheers, >>> >> >>> >> Brian >>> >> > -MinRK >>> >> > >>> >> > On Mon, Jul 12, 2010 at 12:51, MinRK wrote: >>> >> >> >>> >> >> >>> >> >> On Mon, Jul 12, 2010 at 09:15, Brian Granger >>> >> >> wrote: >>> >> >>> >>> >> >>> On Fri, Jul 9, 2010 at 3:35 PM, MinRK >>> >> >>> wrote: >>> >> >>> > Brian, >>> >> >>> > Have you worked on the Heartbeat Device? Does that need to go in >>> >> >>> > 0MQ >>> >> >>> > itself, >>> >> >>> >>> >> >>> I have not. ?Ideally it could go into 0MQ itself. ?But, in >>> >> >>> principle, >>> >> >>> we could do it in pyzmq. ?We just have to write a nogil pure C >>> >> >>> function that uses the low-level C API to do the heartbeat. ?Then >>> >> >>> we >>> >> >>> can just run that function in a thread with a "with nogil" block. >>> >> >>> Shouldn't be too bad, given how simple the heartbeat logic is. >>> >> >>> ?The >>> >> >>> main thing we will have to think about is how to start/stop the >>> >> >>> heartbeat in a clean way. >>> >> >>> >>> >> >>> > or can it be part of pyzmq? >>> >> >>> > I'm trying to work out how to really tell that an engine is >>> >> >>> > down. >>> >> >>> > Is the heartbeat to be in a separate process? >>> >> >>> >>> >> >>> No, just a separate C/C++ thread that doesn't hold the GIL. >>> >> >>> >>> >> >>> > Are we guaranteed that a zmq thread is responsive no matter what >>> >> >>> > an >>> >> >>> > engine >>> >> >>> > process is doing? If that's the case, is a moderate timeout on >>> >> >>> > recv >>> >> >>> > adequate >>> >> >>> > to determine engine failure? >>> >> >>> >>> >> >>> Yes, I think we can assume this. ?The only thing that would take >>> >> >>> the >>> >> >>> 0mq thread down is something semi-fatal like a signal that doesn't >>> >> >>> get >>> >> >>> handled. ?But as long as the 0MQ thread doesn't have any bugs, it >>> >> >>> should simply keep running no matter what the other thread does >>> >> >>> (OK, >>> >> >>> other than segfaulting) >>> >> >>> >>> >> >>> > If zmq threads are guaranteed to be responsive, it seems like a >>> >> >>> > simple >>> >> >>> > pair >>> >> >>> > socket might be good enough, rather than needing a new device. >>> >> >>> > Or >>> >> >>> > even >>> >> >>> > through the registration XREP socket. >>> >> >>> >>> >> >>> That (registration XREP socket) won't work unless we want to write >>> >> >>> all >>> >> >>> that logic in C. 
>>> >> >>> I don't know about a PAIR socket because of the need for multiple >>> >> >>> clients? >>> >> >> >>> >> >> I wasn't thinking of a single PAIR socket, but rather a pair for >>> >> >> each >>> >> >> engine. We already have a pair for each engine for the queue, but I >>> >> >> am >>> >> >> not >>> >> >> quite seeing the need for a special device beyond a PAIR socket in >>> >> >> the >>> >> >> heartbeat. >>> >> >> >>> >> >>> >>> >> >>> > Can we formalize exactly what the heartbeat needs to be? >>> >> >>> >>> >> >>> OK, let's think. ?The engine needs to connect, the controller >>> >> >>> bind. >>> >> >>> It would be nice if the controller didn't need a separate >>> >> >>> heartbeat >>> >> >>> socket for each engine, but I guess we need the ability to track >>> >> >>> which >>> >> >>> specific engine is heartbeating. ? Also, there is the question of >>> >> >>> to >>> >> >>> do want to do a reqest/reply or pub/sub style heartbeat. ?What do >>> >> >>> you >>> >> >>> think? >>> >> >> >>> >> >> The way we talked about it, the heartbeat needs to issue commands >>> >> >> both >>> >> >> ways. While it is used for checking whether an engine remains >>> >> >> alive, it >>> >> >> is >>> >> >> also the avenue for aborting jobs. ?If we do have a strict >>> >> >> heartbeat, >>> >> >> then I >>> >> >> think PUB/SUB is a good choice. >>> >> >> However, if heartbeat is all it does, then we need a _third_ >>> >> >> connection >>> >> >> to >>> >> >> each engine for control commands. Since messages cannot jump the >>> >> >> queue, >>> >> >> the >>> >> >> engine queue PAIR socket cannot be used for commands, and a PUB/SUB >>> >> >> model >>> >> >> for heartbeat can _either_ receive commands _or_ have results. >>> >> >> control commands: >>> >> >> beat (check alive) >>> >> >> abort (remove a task from the queue) >>> >> >> signal (SIGINT, etc.) >>> >> >> exit (engine.kill) >>> >> >> reset (clear queue, namespace) >>> >> >> more? >>> >> >> It's possible that we could implement these with a PUB on the >>> >> >> controller >>> >> >> and a SUB on each engine, only interpreting results received via >>> >> >> the >>> >> >> queue's >>> >> >> PAIR socket. But then every command would be sent to every engine, >>> >> >> even >>> >> >> though many would only be meant for one (too inefficient/costly?). >>> >> >> It >>> >> >> would >>> >> >> however make the actual heartbeat command very simple as a single >>> >> >> send. >>> >> >> It does not allow for the engine to initiate queries of the >>> >> >> controller, >>> >> >> for instance a work stealing implementation. Again, it is possible >>> >> >> that >>> >> >> this >>> >> >> could be implemented via the job queue PAIR socket, but that would >>> >> >> only >>> >> >> allow for stealing when completely starved for work, since the job >>> >> >> queue and >>> >> >> communication queue would be the same. >>> >> >> There's also the issue of task dependency. >>> >> >> If we are to implement dependency checking as we discussed (depend >>> >> >> on >>> >> >> taskIDs, and only execute once the task has been completed), the >>> >> >> engine >>> >> >> needs to be able to query the controller about the tasks depended >>> >> >> upon. >>> >> >> This >>> >> >> makes the controller being the PUB side unworkable. >>> >> >> This says to me that we need two-way connections between the >>> >> >> engines >>> >> >> and >>> >> >> the controller. 
That can either be implemented as multiple >>> >> >> connections >>> >> >> (PUB/SUB + PAIR or REQ/REP), or simply a PAIR socket for each >>> >> >> engine >>> >> >> could >>> >> >> provide the whole heartbeat/command channel. >>> >> >> -MinRK >>> >> >> >>> >> >>> >>> >> >>> Brian >>> >> >>> >>> >> >>> >>> >> >>> > -MinRK >>> >> >>> >>> >> >>> >>> >> >>> >>> >> >>> -- >>> >> >>> Brian E. Granger, Ph.D. >>> >> >>> Assistant Professor of Physics >>> >> >>> Cal Poly State University, San Luis Obispo >>> >> >>> bgranger at calpoly.edu >>> >> >>> ellisonbg at gmail.com >>> >> >> >>> >> > >>> >> > >>> >> >>> >> >>> >> >>> >> -- >>> >> Brian E. Granger, Ph.D. >>> >> Assistant Professor of Physics >>> >> Cal Poly State University, San Luis Obispo >>> >> bgranger at calpoly.edu >>> >> ellisonbg at gmail.com >>> > >>> > >>> >>> >>> >>> -- >>> Brian E. Granger, Ph.D. >>> Assistant Professor of Physics >>> Cal Poly State University, San Luis Obispo >>> bgranger at calpoly.edu >>> ellisonbg at gmail.com >> > > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com
From ellisonbg at gmail.com Wed Jul 14 01:05:57 2010
From: ellisonbg at gmail.com (Brian Granger)
Date: Tue, 13 Jul 2010 22:05:57 -0700
Subject: [IPython-dev] SciPy Sprint summary
Message-ID:

Hello all,

We wanted to update everyone on the IPython sprint at SciPy 2010. We had lots of people sprinting on IPython (something like 7-10), which was fantastic. Here are some highlights:

* A number of us worked on PyZMQ itself, to prepare for IPython's relying on it more and more.
* Justin Riley added a nice example to PyZMQ that uses 0MQ sockets to talk to a MongoDB based key-value store.
* Min Ragan-Kelley and Brian Granger added a Tornado compatible event loop to PyZMQ. This event loop will help us refactor the Twisted-using parts of IPython to use PyZMQ instead.
* Min created a nice PyZMQ based log handler for the logging module. This makes it easy to build distributed logging systems using the publish/subscribe sockets of 0MQ. We will be using this throughout IPython.
* We spent a considerable amount of time discussing how to port the IPython parallel computing platform from Twisted to PyZMQ. Min started coding a prototype task controller using PyZMQ.
* Justin Riley created a very nice diagram illustrating the design of the new PyZMQ based kernel/frontend architecture for IPython.
* Everyone helped code a new interface that will allow various IPython frontends to interact with the IPython kernel using PyZMQ.
* Fernando Perez and Jonathan March worked on the git workflow and on getting Jonathan set up for patch management.
* Fernando Perez and Robert Kern worked on the message specification for the JSON message format we are starting to use.

Much of this work will be hitting master over the summer. Thanks to everyone for helping out and I apologize if I forgot anyone or anything.

Cheers,

Brian

-- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com
From ellisonbg at gmail.com Wed Jul 14 01:08:35 2010
From: ellisonbg at gmail.com (Brian Granger)
Date: Tue, 13 Jul 2010 22:08:35 -0700
Subject: [IPython-dev] SciPy Sprint summary
In-Reply-To: References: Message-ID:

Here is a link to the nice diagram of the kernel/frontend design that Justin did:
http://github.com/ipython/ipython/commit/e21b32e89a634cb1393fd54c1a5657f63f40b1ff

Thanks Justin!
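The PyZMQ log handler mentioned in the summary above can be illustrated with a short sketch. This assumes the handler that ships in pyzmq as zmq.log.handlers.PUBHandler; the port and logger name below are made up for the example:

    import logging
    import zmq
    from zmq.log.handlers import PUBHandler

    ctx = zmq.Context()
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://127.0.0.1:5558")     # hypothetical logging port

    handler = PUBHandler(pub)
    log = logging.getLogger("engine.0")  # made-up logger name
    log.setLevel(logging.INFO)
    log.addHandler(handler)

    log.info("engine registered with the controller")

    # Any process can now collect these records with a SUB socket:
    #   sub = ctx.socket(zmq.SUB)
    #   sub.connect("tcp://127.0.0.1:5558")
    #   sub.setsockopt(zmq.SUBSCRIBE, b"")   # or filter by topic/level
    #   print(sub.recv_multipart())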
Cheers, Brian On Tue, Jul 13, 2010 at 10:05 PM, Brian Granger wrote: > Hello all, > > We wanted to updated everyone on the IPython sprint at SciPy 2010. ?We > had lots of people sprinting on IPython (something like 7-10), which > was fantastic. ?Here are some highlights: > > * A number of us worked on PyZMQ itself, to prepare for IPython's > relying on it more and more. > * Justin Riley added a nice example to PyZMQ that uses 0MQ sockets to > talk to a MongoDB based key-value store. > * Min Ragan-Kelley and Brian Granger added a Tornado compatible event > loop to PyZMQ. ?This event loop will help us refactor the Twisted > using parts of IPython to use PyZMQ instead. > * Min created a nice PyZMQ based log handler for the logging module. > This makes it easy to build distributed logging systems using the > publish/subscript sockets of 0MQ. ?We will be using this throughout > IPython. > * We spent a considerable amount of time discussing how to port the > IPython parallel computing platform from Twisted to PyZMQ. ?Min > started coding a prototype task controller using PyZMQ. > * Justin Riley created a very nice diagram illustrating the design of > the new PyZMQ based kernel/frontend architecture for IPython. > * Everyone helped code a new interface that will allow various IPython > frontends to interact with the IPython kernel using PyZMQ. > * Fernando Perez and Jonathan March worked on the git workflow and on > getting Jonathan set up for patch management. > * Fernando Perez and Robert Kern worked on the message specification > for the JSON message format will are starting to use. > > Much of this work will be hitting master over the summer. ?Thanks to > everyone for helping out and I apologize if I forgot anyone or > anything. > > Cheers, > > Brian > > -- > Brian E. Granger, Ph.D. > Assistant Professor of Physics > Cal Poly State University, San Luis Obispo > bgranger at calpoly.edu > ellisonbg at gmail.com > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From fperez.net at gmail.com Wed Jul 14 03:38:07 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 14 Jul 2010 00:38:07 -0700 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: Message-ID: On Tue, Jul 13, 2010 at 10:05 PM, Brian Granger wrote: > Hello all, > > We wanted to updated everyone on the IPython sprint at SciPy 2010. ?We > had lots of people sprinting on IPython (something like 7-10), which > was fantastic. ?Here are some highlights: [...] Sorry, we forgot to include: * Omar Zapata, one of the two IPython Google Summer of Code students who was present at the sprints, made progress on the terminal-based zmq interactive frontend, and implemented the extra socket design to support calls to raw_input() in the kernel. Cheers, f From fperez.net at gmail.com Wed Jul 14 03:43:14 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 14 Jul 2010 00:43:14 -0700 Subject: [IPython-dev] %run -d is broken in Python 2.7 In-Reply-To: <1213230248.20100713052612@mail.mipt.ru> References: <1213230248.20100713052612@mail.mipt.ru> Message-ID: 2010/7/12 vano : > After thorough investigation, it turned out a pdb issue (details are > on the link), so i filed a bug there (http://bugs.python.org/issue9230) as > well as a bugfix. > > If any of you have write access to python source, you can help me to get > it fixed quickly. Ouch, thanks for finding this and providing the pdb patch. 
Unfortunately I don't have write access to Python itself (I have 2-year old patches lingering in the python tracker, I'm afraid). If you can make a (most likely ugly) monkeypatch at runtime to fix this from the IPython side, we'll include that. There's a good chance this will take forever to fix in Python itself, so carrying our own version-checked ugly fix is better than having broken functionality for 2.7 users. I imagine that grabbing the pdb instance and injecting a frame object into it will do the trick, from looking at your traceback. If you make such a fix, just post a pull request for us or a patch, as you prefer: http://ipython.scipy.org/doc/nightly/html/development/gitwash/index.html and we'll be happy to include it. Cheers, f From fperez.net at gmail.com Wed Jul 14 03:50:21 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 14 Jul 2010 00:50:21 -0700 Subject: [IPython-dev] debugger.py refactoring In-Reply-To: References: Message-ID: Hi David, On Thu, Jul 8, 2010 at 4:52 PM, David Warde-Farley wrote: > > I was just wondering (I didn't see a roadmap anywhere but then again I didn't look very hard) if a refactoring was planned for IPython/core/debugger.py, in particular to make it more extensible to third party tools. I just hacked in support for Andreas Kloeckner's pudb ( http://pypi.python.org/pypi/pudb ) but it wasn't pretty in the least. I guess some sort of 'debugger registry' would make sense, that a user could call into from their ipy_user_conf.py in order to hook up their favourite debugger's post-mortem mode? > > This is all just fanciful thinking aloud, but if no one's planning on doing anything to debugger.py in the near future I might give it a try when I get back into town next week. certainly this kind of improvement for integration with other tools is always welcome. Could you post your fixes as either an attached patch or a github pull request, whatever you find most convenient? Some directions: http://ipython.scipy.org/doc/nightly/html/development/gitwash/index.html Cheers, f From P.Schellart at astro.ru.nl Wed Jul 14 04:16:51 2010 From: P.Schellart at astro.ru.nl (Pim Schellart) Date: Wed, 14 Jul 2010 10:16:51 +0200 Subject: [IPython-dev] Error when running ipcluster Message-ID: Dear IPython developers, I would like to use IPython to do some basic parallelization. However when I execute ipcluster to setup a controller and some engines I get the following error: ~ $ ipcluster Traceback (most recent call last): File "/Library/Frameworks/Python.framework/Versions/2.6/bin/ipcluster", line 16, in from IPython.kernel.ipclusterapp import launch_new_instance ImportError: No module named ipclusterapp I have installed all dependencies listed as required for the parallel computing tasks. When building IPython (0.10) I they are found as follows: BUILDING IPYTHON python: 2.6.5 (r265:79063, Jun 6 2010, 11:37:41) [GCC 4.2.1 (Apple Inc. build 5659)] platform: darwin OPTIONAL DEPENDENCIES Zope.Interface: yes Twisted: 10.1.0 Foolscap: 0.5.1 OpenSSL: 0.6 sphinx: 1.0b2 pygments: 1.3.1 nose: Not found (required for running the test suite) pexpect: no (required for running standalone doctests) Any idea what is going wrong here? 
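A quick way to narrow this kind of error down is to check which IPython package the interpreter actually imports, independently of the script found on $PATH (a small sketch, run with the same python named on the first line of the ipcluster script):

    import IPython
    print(IPython.__version__)   # version of the package on sys.path
    print(IPython.__file__)      # where that package is installed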
Kind regards, Pim Schellart From fperez.net at gmail.com Wed Jul 14 14:24:27 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 14 Jul 2010 11:24:27 -0700 Subject: [IPython-dev] Error when running ipcluster In-Reply-To: References: Message-ID: Hi Pim, On Wed, Jul 14, 2010 at 1:16 AM, Pim Schellart wrote: > > I would like to use IPython to do some basic parallelization. > However when I execute ipcluster to setup a controller and some > engines I get the following error: > > ~ $ ipcluster > Traceback (most recent call last): > ?File "/Library/Frameworks/Python.framework/Versions/2.6/bin/ipcluster", > line 16, in > ? ?from IPython.kernel.ipclusterapp import launch_new_instance > ImportError: No module named ipclusterapp [...] > Any idea what is going wrong here? That script is a 0.11 series startup script, while you mention you are running 0.10. It seems you've somehow mixed up the installation of the 0.10 and the 0.11 versions of IPython... It's possible you have in your $path a 0.11 startup ipcluster script, but the version of the IPython package in $pythonpath is the 0.10 version... I'd suggest cleaning up the combination and reinstalling, somehow you have a weird hybrid... Cheers, f From Fernando.Perez at berkeley.edu Wed Jul 14 15:17:43 2010 From: Fernando.Perez at berkeley.edu (Fernando Perez) Date: Wed, 14 Jul 2010 12:17:43 -0700 Subject: [IPython-dev] Coordinating the XREQ and SUB channels In-Reply-To: References: Message-ID: Hi Evan, [ quick note, I'm cc-ing the ipython-dev list so these technical discussions on the new code happen there, so other developers benefit as well ] On Wed, Jul 14, 2010 at 12:10, Brian Granger wrote: > On Wed, Jul 14, 2010 at 10:56 AM, Evan Patterson wrote: >> Hi guys, >> >> I've been making decent progress at connecting my FrontendWidget to a >> KernelManager. I have, however, encountered one fairly serious problem: >> since the XREQ and SUB channels of the KernelManager are in separate >> threads, there is no guarantee about the order in which signals are emitted. >> I'm finding that 'execute_reply' signals are frequently emitted *before* all >> the output signals have been emitted. > > Yes, that is definitely possible and we really don't have control over > it. ?Part of the difficulty is that the SUB/SUB channel does buffering > of stdout/stderr (just like sys.stdout/sys.stderr). ?While it will > make your application logic more difficult, I think this is something > fundamental we have to live with. ?Also, I wouldn't be surprised if > the same were true of the regular python shell because of the > buffering of stdout. > >> It seems to me that we should be enforcing, to the extent that we can (i.e. >> ignoring threads in the kernel for now), the assumption that when >> 'execute_reply' is signaled, all output has been signaled. Is this >> reasonable? > > I don't think so. ?I would write the frontend to allow for arbitrary > timing of the execute_reply and the SUB messages. ?You will have to > use the parent_id information to order things properly in the GUI. > Does this make sense. ?I think if we try to impose the timing of the > signals, we will end up breaking the model or introducing extra > latencies. ?Let us know if you have questions. ?I know this will > probably be one of the more subtle parts of the frontend logic. > > Cheers, > > Brian > > > > >> Evan >> > > > > -- > Brian E. Granger, Ph.D. 
> Assistant Professor of Physics > Cal Poly State University, San Luis Obispo > bgranger at calpoly.edu > ellisonbg at gmail.com > I was in the middle of writing my reply when Brian's arrived pretty much along the same lines :) The parent_id info is the key: clients should have enough information to reconstruct the chain of messages with this, because every message effectively has a 'pointer' to its parent. It's possible that we may need to extend the message spec with a bit more data to make this easier, if you spot anything along those lines we can look into it. We cobbled together that message spec very quickly, so it should be considered by no means final. But the key idea is always: a client makes a request, and all outputs that are products of honoring this request (on stdout/err, pyout, etc) should have enough info in the messages to trace them back to that original cause. With this, the client should be able to put the output in the right places as it arrives, since it can reconstruct what output goes with what input. The simplest example of that is what we showed you with two terminal clients talking to the same kernel, where each client would show the other's inputs and outputs with [OUT from ...] messages. The client was receiving *all* outputs on its SUB socket, and disentangling what came from its own inputs vs what was input/output from other clients running simultaneously. Let us know if this is clear... Cheers, f From epatters at enthought.com Wed Jul 14 17:21:18 2010 From: epatters at enthought.com (Evan Patterson) Date: Wed, 14 Jul 2010 16:21:18 -0500 Subject: [IPython-dev] Coordinating the XREQ and SUB channels In-Reply-To: References: Message-ID: On Wed, Jul 14, 2010 at 2:10 PM, Brian Granger wrote: > On Wed, Jul 14, 2010 at 10:56 AM, Evan Patterson > wrote: > > Hi guys, > > > > I've been making decent progress at connecting my FrontendWidget to a > > KernelManager. I have, however, encountered one fairly serious problem: > > since the XREQ and SUB channels of the KernelManager are in separate > > threads, there is no guarantee about the order in which signals are > emitted. > > I'm finding that 'execute_reply' signals are frequently emitted *before* > all > > the output signals have been emitted. > > Yes, that is definitely possible and we really don't have control over > it. Part of the difficulty is that the SUB/SUB channel does buffering > of stdout/stderr (just like sys.stdout/sys.stderr). While it will > make your application logic more difficult, I think this is something > fundamental we have to live with. Also, I wouldn't be surprised if > the same were true of the regular python shell because of the > buffering of stdout. > I'm not sure it's fair to call this problem fundamental (if we ignore the corner case of the threads in the kernel). After all, output and execution completion happen in a very predictable order in the kernel; it's only our use of multiple frontend-side channel threads that has complicated the issue. In a regular same-process shell, this wouldn't be a problem because you would simply flush stdout before writing the new prompt. It makes sense to be able to request a flush here, I think. A 'flush' in this case would just consist of the making the SubChannel thread active, so that its event loop would pick up whatever it needs to. I believe calling time.sleep(0) once in the XReqChannel before sending an execute reply will be sufficient. The latency introduced should be negligible. I'll experiment with this. 
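As a rough illustration of what such a flush could look like on the receiving side (this is only a sketch, not the KernelManager change Evan is describing; the function and argument names are invented), the idea is to give the SUB channel a bounded window to drain whatever is already in flight before the next prompt is drawn:

    import time
    import zmq

    def drain_sub(sub_socket, window=0.05):
        """Poll the SUB socket for up to `window` seconds and return any
        messages already in flight, so pending output can be rendered
        before the next prompt."""
        msgs = []
        deadline = time.time() + window
        poller = zmq.Poller()
        poller.register(sub_socket, zmq.POLLIN)
        while time.time() < deadline:
            # poll() takes milliseconds; a short timeout keeps latency low
            if poller.poll(timeout=10):
                msgs.append(sub_socket.recv_json())
            else:
                break   # nothing pending: stop early instead of waiting out the window
        return msgs

The deadline also bounds the call if output keeps streaming in, which is the concern Brian raises later in this thread about a flush that never returns.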
> > It seems to me that we should be enforcing, to the extent that we can > (i.e. > > ignoring threads in the kernel for now), the assumption that when > > 'execute_reply' is signaled, all output has been signaled. Is this > > reasonable? > > I don't think so. I would write the frontend to allow for arbitrary > timing of the execute_reply and the SUB messages. You will have to > use the parent_id information to order things properly in the GUI. > Does this make sense. I think if we try to impose the timing of the > signals, we will end up breaking the model or introducing extra > latencies. Let us know if you have questions. I know this will > probably be one of the more subtle parts of the frontend logic. > Yes, this is something that will be quite difficult to get right. For frontend implementors who are interested only in console-style interaction, it doesn't make sense for them to have worry about this. Evan > > Cheers, > > Brian > > > > > > Evan > > > > > > -- > Brian E. Granger, Ph.D. > Assistant Professor of Physics > Cal Poly State University, San Luis Obispo > bgranger at calpoly.edu > ellisonbg at gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From justin.t.riley at gmail.com Thu Jul 15 10:49:05 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Thu, 15 Jul 2010 10:49:05 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: Message-ID: <4C3F1FE1.4040000@gmail.com> On 07/14/2010 01:05 AM, Brian Granger wrote: > Hello all, > > We wanted to updated everyone on the IPython sprint at SciPy 2010. We > had lots of people sprinting on IPython (something like 7-10), which > was fantastic. Here are some highlights: > Also just wanted to mention, for those using the EC2 cloud, that during SciPy I also added a new plugin to the StarCluster project (http://web.mit.edu/starcluster) that will automatically configure and launch ipcluster on EC2: http://github.com/jtriley/StarCluster/blob/master/starcluster/plugins/ipcluster.py The ipcluster plugin will be released in the next version coming out soon. For those unfamiliar, StarCluster creates/configures scientific computing clusters on EC2. The clusters launched have MPI and Sun Grid Engine as well as NumPy/SciPy installations compiled against an ATLAS/LAPACK that has been optimized for the 8-core instance types. Thanks, ~Justin From epatters at enthought.com Thu Jul 15 12:23:57 2010 From: epatters at enthought.com (Evan Patterson) Date: Thu, 15 Jul 2010 11:23:57 -0500 Subject: [IPython-dev] Coordinating the XREQ and SUB channels In-Reply-To: References: Message-ID: I've added a 'flush' method to the KernelManager here: http://github.com/epatters/ipython/commit/2ecde29e8f2a5e7012236f61819b2f7833248553 It works, although there may be a more intelligent way to do it. That being said, I tried a number of different things, and none of the others worked. Brian: since the 'flush' method must be called explicitly by clients, this won't break our model or extra induce latencies for clients that want to take a more sophisticated approach to SUB channel monitoring. Evan On Wed, Jul 14, 2010 at 4:21 PM, Evan Patterson wrote: > On Wed, Jul 14, 2010 at 2:10 PM, Brian Granger wrote: > >> On Wed, Jul 14, 2010 at 10:56 AM, Evan Patterson >> wrote: >> > Hi guys, >> > >> > I've been making decent progress at connecting my FrontendWidget to a >> > KernelManager. 
I have, however, encountered one fairly serious problem: >> > since the XREQ and SUB channels of the KernelManager are in separate >> > threads, there is no guarantee about the order in which signals are >> emitted. >> > I'm finding that 'execute_reply' signals are frequently emitted *before* >> all >> > the output signals have been emitted. >> >> Yes, that is definitely possible and we really don't have control over >> it. Part of the difficulty is that the SUB/SUB channel does buffering >> of stdout/stderr (just like sys.stdout/sys.stderr). While it will >> make your application logic more difficult, I think this is something >> fundamental we have to live with. Also, I wouldn't be surprised if >> the same were true of the regular python shell because of the >> buffering of stdout. >> > > I'm not sure it's fair to call this problem fundamental (if we ignore the > corner case of the threads in the kernel). After all, output and execution > completion happen in a very predictable order in the kernel; it's only our > use of multiple frontend-side channel threads that has complicated the > issue. > > In a regular same-process shell, this wouldn't be a problem because you > would simply flush stdout before writing the new prompt. It makes sense to > be able to request a flush here, I think. A 'flush' in this case would just > consist of the making the SubChannel thread active, so that its event loop > would pick up whatever it needs to. I believe calling time.sleep(0) once in > the XReqChannel before sending an execute reply will be sufficient. The > latency introduced should be negligible. I'll experiment with this. > > >> > It seems to me that we should be enforcing, to the extent that we can >> (i.e. >> > ignoring threads in the kernel for now), the assumption that when >> > 'execute_reply' is signaled, all output has been signaled. Is this >> > reasonable? >> >> I don't think so. I would write the frontend to allow for arbitrary >> timing of the execute_reply and the SUB messages. You will have to >> use the parent_id information to order things properly in the GUI. >> Does this make sense. I think if we try to impose the timing of the >> signals, we will end up breaking the model or introducing extra >> latencies. Let us know if you have questions. I know this will >> probably be one of the more subtle parts of the frontend logic. >> > > Yes, this is something that will be quite difficult to get right. For > frontend implementors who are interested only in console-style interaction, > it doesn't make sense for them to have worry about this. > > Evan > > >> >> Cheers, >> >> Brian >> >> >> >> >> > Evan >> > >> >> >> >> -- >> Brian E. Granger, Ph.D. >> Assistant Professor of Physics >> Cal Poly State University, San Luis Obispo >> bgranger at calpoly.edu >> ellisonbg at gmail.com >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Thu Jul 15 13:34:36 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Thu, 15 Jul 2010 10:34:36 -0700 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C3F1FE1.4040000@gmail.com> References: <4C3F1FE1.4040000@gmail.com> Message-ID: Justin, Thanks for the post. You should also know that it looks like someone is going to add native SGE support to ipcluster for 0.10.1. This should allow the starting of the engines on the compute nodes using SGE. I was quite excited with Amazon's announcement that they were adding a new HPC instance type. Sounds killer. 
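For readers following along: once ipcluster has a controller and engines running (today over ssh, and with SGE launching the engines once that support lands), a 0.10-style client session looks roughly like this (a sketch based on the 0.10 API; the scattered data is made up):

    from IPython.kernel import client

    mec = client.MultiEngineClient()   # connects using the default furl files
    print(mec.get_ids())               # ids of the engines that registered
    mec.execute("import numpy")        # run a statement on every engine
    mec.scatter("x", range(16))        # partition a sequence across the engines
    mec.execute("s = sum(x)")
    print(mec.pull("s"))               # one partial sum per engine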
Cheers, Brian On Thu, Jul 15, 2010 at 7:49 AM, Justin Riley wrote: > On 07/14/2010 01:05 AM, Brian Granger wrote: >> Hello all, >> >> We wanted to updated everyone on the IPython sprint at SciPy 2010. ?We >> had lots of people sprinting on IPython (something like 7-10), which >> was fantastic. ?Here are some highlights: >> > > Also just wanted to mention, for those using the EC2 cloud, that during > SciPy I also added a new plugin to the StarCluster project > (http://web.mit.edu/starcluster) that will automatically configure and > launch ipcluster on EC2: > > http://github.com/jtriley/StarCluster/blob/master/starcluster/plugins/ipcluster.py > > The ipcluster plugin will be released in the next version coming out soon. > > > For those unfamiliar, StarCluster creates/configures scientific > computing clusters on EC2. The clusters launched have MPI and Sun Grid > Engine as well as NumPy/SciPy installations compiled against an > ATLAS/LAPACK that has been optimized for the 8-core instance types. > > Thanks, > > ~Justin > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From ellisonbg at gmail.com Thu Jul 15 13:44:10 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Thu, 15 Jul 2010 10:44:10 -0700 Subject: [IPython-dev] Coordinating the XREQ and SUB channels In-Reply-To: References: Message-ID: On Thu, Jul 15, 2010 at 9:23 AM, Evan Patterson wrote: > I've added a 'flush' method to the KernelManager here: > > http://github.com/epatters/ipython/commit/2ecde29e8f2a5e7012236f61819b2f7833248553 > > It works, although there may be a more intelligent way to do it. That being > said, I tried a number of different things, and none of the others worked. The only issue that I see with this is that if the SUB channel keeps getting incoming message, flush will not return immediately. > Brian: since the 'flush' method must be called explicitly by clients, this > won't break our model or extra induce latencies for clients that want to > take a more sophisticated approach to SUB channel monitoring. That is true, so I think it this helps you to get going, it is worth using for now. But, I still don't see why we reorder the messages in the frontend based on the parent_ids. Just so you know, Fernando and I have set aside time starting this Sunday to work extensively on this. At that time we can talk more about this issue. Cheers, Brian > Evan > > On Wed, Jul 14, 2010 at 4:21 PM, Evan Patterson > wrote: >> >> On Wed, Jul 14, 2010 at 2:10 PM, Brian Granger >> wrote: >>> >>> On Wed, Jul 14, 2010 at 10:56 AM, Evan Patterson >>> wrote: >>> > Hi guys, >>> > >>> > I've been making decent progress at connecting my FrontendWidget to a >>> > KernelManager. I have, however, encountered one fairly serious problem: >>> > since the XREQ and SUB channels of the KernelManager are in separate >>> > threads, there is no guarantee about the order in which signals are >>> > emitted. >>> > I'm finding that 'execute_reply' signals are frequently emitted >>> > *before* all >>> > the output signals have been emitted. >>> >>> Yes, that is definitely possible and we really don't have control over >>> it. ?Part of the difficulty is that the SUB/SUB channel does buffering >>> of stdout/stderr (just like sys.stdout/sys.stderr). ?While it will >>> make your application logic more difficult, I think this is something >>> fundamental we have to live with. 
?Also, I wouldn't be surprised if >>> the same were true of the regular python shell because of the >>> buffering of stdout. >> >> I'm not sure it's fair to call this problem fundamental (if we ignore the >> corner case of the threads in the kernel). After all, output and execution >> completion happen in a very predictable order in the kernel; it's only our >> use of multiple frontend-side channel threads that has complicated the >> issue. >> >> In a regular same-process shell, this wouldn't be a problem because you >> would simply flush stdout before writing the new prompt. It makes sense to >> be able to request a flush here, I think. A 'flush' in this case would just >> consist of the making the SubChannel thread active, so that its event loop >> would pick up whatever it needs to. I believe calling time.sleep(0) once in >> the XReqChannel before sending an execute reply will be sufficient. The >> latency introduced should be negligible. I'll experiment with this. >> >>> >>> > It seems to me that we should be enforcing, to the extent that we can >>> > (i.e. >>> > ignoring threads in the kernel for now), the assumption that when >>> > 'execute_reply' is signaled, all output has been signaled. Is this >>> > reasonable? >>> >>> I don't think so. ?I would write the frontend to allow for arbitrary >>> timing of the execute_reply and the SUB messages. ?You will have to >>> use the parent_id information to order things properly in the GUI. >>> Does this make sense. ?I think if we try to impose the timing of the >>> signals, we will end up breaking the model or introducing extra >>> latencies. ?Let us know if you have questions. ?I know this will >>> probably be one of the more subtle parts of the frontend logic. >> >> Yes, this is something that will be quite difficult to get right. For >> frontend implementors who are interested only in console-style interaction, >> it doesn't make sense for them to have worry about this. >> >> Evan >> >>> >>> Cheers, >>> >>> Brian >>> >>> >>> >>> >>> > Evan >>> > >>> >>> >>> >>> -- >>> Brian E. Granger, Ph.D. >>> Assistant Professor of Physics >>> Cal Poly State University, San Luis Obispo >>> bgranger at calpoly.edu >>> ellisonbg at gmail.com >> > > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From epatters at enthought.com Thu Jul 15 14:00:12 2010 From: epatters at enthought.com (Evan Patterson) Date: Thu, 15 Jul 2010 13:00:12 -0500 Subject: [IPython-dev] Coordinating the XREQ and SUB channels In-Reply-To: References: Message-ID: On Thu, Jul 15, 2010 at 12:44 PM, Brian Granger wrote: > On Thu, Jul 15, 2010 at 9:23 AM, Evan Patterson > wrote: > > I've added a 'flush' method to the KernelManager here: > > > > > http://github.com/epatters/ipython/commit/2ecde29e8f2a5e7012236f61819b2f7833248553 > > > > It works, although there may be a more intelligent way to do it. That > being > > said, I tried a number of different things, and none of the others > worked. > > The only issue that I see with this is that if the SUB channel keeps > getting incoming message, flush will not return immediately. > > > Brian: since the 'flush' method must be called explicitly by clients, > this > > won't break our model or extra induce latencies for clients that want to > > take a more sophisticated approach to SUB channel monitoring. > > That is true, so I think it this helps you to get going, it is worth > using for now. 
But, I still don't see why we reorder the messages in > the frontend based on the parent_ids. Just so you know, Fernando and > I have set aside time starting this Sunday to work extensively on > this. At that time we can talk more about this issue. > Just to clarify: the issue isn't so much that the message themselves have to be reordered, but what this implies for the text widget update. Currently, I more or less blindly append text the end of text widget buffer as I go. To support arbitrary order insertion, I would have to have a mechanism whereby blocks of texts are tagged according to the message that they correspond to. Then, whenever output messages come in, I would have to find the correct spot to insert them. Since this is considerably more complex than just calling 'flush', doing this the "right" way is not a priority until more important things get done. Evan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Thu Jul 15 14:07:08 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Thu, 15 Jul 2010 11:07:08 -0700 Subject: [IPython-dev] Coordinating the XREQ and SUB channels In-Reply-To: References: Message-ID: Evan, On Wed, Jul 14, 2010 at 2:21 PM, Evan Patterson wrote: > On Wed, Jul 14, 2010 at 2:10 PM, Brian Granger wrote: >> >> On Wed, Jul 14, 2010 at 10:56 AM, Evan Patterson >> wrote: >> > Hi guys, >> > >> > I've been making decent progress at connecting my FrontendWidget to a >> > KernelManager. I have, however, encountered one fairly serious problem: >> > since the XREQ and SUB channels of the KernelManager are in separate >> > threads, there is no guarantee about the order in which signals are >> > emitted. >> > I'm finding that 'execute_reply' signals are frequently emitted *before* >> > all >> > the output signals have been emitted. >> >> Yes, that is definitely possible and we really don't have control over >> it. ?Part of the difficulty is that the SUB/SUB channel does buffering >> of stdout/stderr (just like sys.stdout/sys.stderr). ?While it will >> make your application logic more difficult, I think this is something >> fundamental we have to live with. ?Also, I wouldn't be surprised if >> the same were true of the regular python shell because of the >> buffering of stdout. > > I'm not sure it's fair to call this problem fundamental (if we ignore the > corner case of the threads in the kernel). After all, output and execution > completion happen in a very predictable order in the kernel; it's only our > use of multiple frontend-side channel threads that has complicated the > issue. But this leaves out one of the main causes of unpredictability in a distributed system: network and IO latency. I our architecture, this occurs when we ask 0MQ to send a message. At that point, it is up to 0MQ, the OS Kernel, the network stack (including routers, etc.) to get deliver the messages in the best way they can. In a multi-channel model like this, there is simple no promise that the order in which we send messages is the order in which tey will arrive. This is a fundamental issue that I consider a feature of our current architecture, because, in my experience, if you artificially try to impose determinacy on network traffic you end up with extremely awkward error handing. > In a regular same-process shell, this wouldn't be a problem because you > would simply flush stdout before writing the new prompt. It makes sense to > be able to request a flush here, I think. 
A 'flush' in this case would just > consist of the making the SubChannel thread active, so that its event loop > would pick up whatever it needs to. I believe calling time.sleep(0) once in > the XReqChannel before sending an execute reply will be sufficient. The > latency introduced should be negligible. I'll experiment with this. OK, I think this is worth a shot. >> >> > It seems to me that we should be enforcing, to the extent that we can >> > (i.e. >> > ignoring threads in the kernel for now), the assumption that when >> > 'execute_reply' is signaled, all output has been signaled. Is this >> > reasonable? Because of the networking issues, I don't think so. >> I don't think so. ?I would write the frontend to allow for arbitrary >> timing of the execute_reply and the SUB messages. ?You will have to >> use the parent_id information to order things properly in the GUI. >> Does this make sense. ?I think if we try to impose the timing of the >> signals, we will end up breaking the model or introducing extra >> latencies. ?Let us know if you have questions. ?I know this will >> probably be one of the more subtle parts of the frontend logic. > > Yes, this is something that will be quite difficult to get right. For > frontend implementors who are interested only in console-style interaction, > it doesn't make sense for them to have worry about this. Definitely hard to get right and terminal based frontends will definitely need something like flush. Let's see how it goes with this approach. Brian > Evan > >> >> Cheers, >> >> Brian >> >> >> >> >> > Evan >> > >> >> >> >> -- >> Brian E. Granger, Ph.D. >> Assistant Professor of Physics >> Cal Poly State University, San Luis Obispo >> bgranger at calpoly.edu >> ellisonbg at gmail.com > > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From fperez.net at gmail.com Thu Jul 15 15:34:37 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 15 Jul 2010 12:34:37 -0700 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> Message-ID: On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger wrote: > Thanks for the post. ?You should also know that it looks like someone > is going to add native SGE support to ipcluster for 0.10.1. Yes, Satra and I went over this last night in detail (thanks to Brian for the pointers), and he said he might actually already have some code for it. I suspect we'll get this in soon. Cheers, f From fperez.net at gmail.com Thu Jul 15 16:22:40 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 15 Jul 2010 13:22:40 -0700 Subject: [IPython-dev] Coordinating the XREQ and SUB channels In-Reply-To: References: Message-ID: On Thu, Jul 15, 2010 at 11:07 AM, Brian Granger wrote: > > Definitely hard to get right and terminal based frontends will > definitely need something like flush. ?Let's see how it goes with this > approach. Though absent a real event loop with a callback model, there it will need to be implemented with a real sleep(epsilon) and a total timeout. Terminal frontends will always simply be bound to flushing what they can and then moving on if nothing has come in a given window they wait for. Such is life when your 'event loop' is the human hitting the RETURN key... Evan, quick question: when I open your frontend_widget, I see 100% cpu utilization all the time. Do you see this on your end? 
Cheers, f From justin.t.riley at gmail.com Thu Jul 15 16:33:32 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Thu, 15 Jul 2010 16:33:32 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> Message-ID: <4C3F709C.5080505@gmail.com> This is great news. Right now StarCluster just takes advantage of password-less ssh already being installed and runs: $ ipcluster ssh --clusterfile /path/to/cluster_file.py This works fine for now, however, having SGE support would allow ipcluster's load to be accounted for by the queue. Is Satra on the list? I have experience with SGE and could help with the code if needed. I can also help test this functionality. ~Justin On 07/15/2010 03:34 PM, Fernando Perez wrote: > On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger wrote: >> Thanks for the post. You should also know that it looks like someone >> is going to add native SGE support to ipcluster for 0.10.1. > > Yes, Satra and I went over this last night in detail (thanks to Brian > for the pointers), and he said he might actually already have some > code for it. I suspect we'll get this in soon. > > Cheers, > > f From justin.t.riley at gmail.com Thu Jul 15 16:40:02 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Thu, 15 Jul 2010 16:40:02 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> Message-ID: <4C3F7222.6020902@gmail.com> Brian, > I was quite excited with Amazon's announcement that they were > adding a new HPC instance type. Sounds killer. Same here, this is very exciting. The new HPC instance type packs a serious punch especially with regards to network latency between machines which has really been the main problem for folks running MPI on EC2. I'll be working on getting support for the new HPC instance type in StarCluster soon. ~Justin On 07/15/2010 01:34 PM, Brian Granger wrote: > Justin, > > Thanks for the post. You should also know that it looks like someone > is going to add native SGE support to ipcluster for 0.10.1. This > should allow the starting of the engines on the compute nodes using > SGE. I was quite excited with Amazon's announcement that they were > adding a new HPC instance type. Sounds killer. > > Cheers, > > Brian > > On Thu, Jul 15, 2010 at 7:49 AM, Justin Riley wrote: >> On 07/14/2010 01:05 AM, Brian Granger wrote: >>> Hello all, >>> >>> We wanted to updated everyone on the IPython sprint at SciPy 2010. We >>> had lots of people sprinting on IPython (something like 7-10), which >>> was fantastic. Here are some highlights: >>> >> >> Also just wanted to mention, for those using the EC2 cloud, that during >> SciPy I also added a new plugin to the StarCluster project >> (http://web.mit.edu/starcluster) that will automatically configure and >> launch ipcluster on EC2: >> >> http://github.com/jtriley/StarCluster/blob/master/starcluster/plugins/ipcluster.py >> >> The ipcluster plugin will be released in the next version coming out soon. >> >> >> For those unfamiliar, StarCluster creates/configures scientific >> computing clusters on EC2. The clusters launched have MPI and Sun Grid >> Engine as well as NumPy/SciPy installations compiled against an >> ATLAS/LAPACK that has been optimized for the 8-core instance types. 
>> >> Thanks, >> >> ~Justin >> > > > From epatters at enthought.com Thu Jul 15 17:24:12 2010 From: epatters at enthought.com (Evan Patterson) Date: Thu, 15 Jul 2010 16:24:12 -0500 Subject: [IPython-dev] Coordinating the XREQ and SUB channels In-Reply-To: References: Message-ID: On Thu, Jul 15, 2010 at 3:22 PM, Fernando Perez wrote: > On Thu, Jul 15, 2010 at 11:07 AM, Brian Granger > wrote: > > > > Definitely hard to get right and terminal based frontends will > > definitely need something like flush. Let's see how it goes with this > > approach. > > Though absent a real event loop with a callback model, there it will > need to be implemented with a real sleep(epsilon) and a total timeout. > Terminal frontends will always simply be bound to flushing what they > can and then moving on if nothing has come in a given window they wait > for. Such is life when your 'event loop' is the human hitting the > RETURN key... > This may have been lost in the stream of messages, but you can see my current implementation of flush here: http://github.com/epatters/ipython/commit/2ecde29e8f2a5e7012236f61819b2f7833248553 I'm not sure if my approach is better or worse than a using an epsilon for sleep. > > Evan, quick question: when I open your frontend_widget, I see 100% cpu > utilization all the time. Do you see this on your end? > I hadn't noticed this before (probably because I never pay attention to what my CPU utilization is), but I am seeing this on my end. Thanks for pointing it out; I'll look into it. Evan -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Thu Jul 15 17:31:05 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 15 Jul 2010 14:31:05 -0700 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C3F7222.6020902@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F7222.6020902@gmail.com> Message-ID: On Thu, Jul 15, 2010 at 1:40 PM, Justin Riley wrote: > > Same here, this is very exciting. The new HPC instance type packs a > serious punch especially with regards to network latency between > machines Have you tested it yet? I saw they listed 10GB interconnects, but I don't recall if they specified the kind of backplane and any actual latency data... Cheers, f From fperez.net at gmail.com Thu Jul 15 17:34:42 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 15 Jul 2010 14:34:42 -0700 Subject: [IPython-dev] Coordinating the XREQ and SUB channels In-Reply-To: References: Message-ID: Hi Evan, On Thu, Jul 15, 2010 at 2:24 PM, Evan Patterson wrote: > This may have been lost in the stream of messages, but you can see my > current implementation of flush here: > > http://github.com/epatters/ipython/commit/2ecde29e8f2a5e7012236f61819b2f7833248553 > > I'm not sure if my approach is better or worse than a using an epsilon for > sleep. Even a 0.01 can help avoid hogging the cpu unnecessarily and is completely below human thresholds. It's also probably a good idea to have a safety fallback, so the loop can't stay there forever. Or do we trust the ioloop to be bulletproof in terms of calling the flush callback appropriately? That part isn't clear to me yet. >> Evan, quick question: when I open your frontend_widget, I see 100% cpu >> utilization all the time. ?Do you see this on your end? > > I hadn't noticed this before (probably because I never pay attention to what > my CPU utilization is), but I am seeing this on my end. Thanks for pointing > it out; I'll look into it. 
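Side note, to make the epsilon idea concrete: what I have in mind is only a handful of lines, roughly the untested sketch below. The process_events() call is a made-up stand-in for whatever lets the SUB channel run its pending callbacks once; it is not the actual KernelManager API.

    import time

    def flush_outputs(channel, pause=0.01, timeout=1.0):
        # Give in-flight output a bounded window to arrive before the
        # next prompt is drawn. process_events() is hypothetical: run
        # pending callbacks once, return True if anything was handled.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if channel.process_events():
                continue              # something arrived, keep draining
            time.sleep(pause)         # idle: yield the CPU instead of spinning
            if not channel.process_events():
                break                 # still quiet after the pause, call it flushed

The exact numbers don't matter much; the point is just that the idle path sleeps and the whole wait is bounded, so a silent kernel can never hang the frontend. Anyway, back to the CPU question: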
I noticed it because my fans started making loud noises after a few seconds of having your shell open. A loud fan is a very good cpu alert :) Cheers, f From fperez.net at gmail.com Thu Jul 15 17:42:01 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 15 Jul 2010 14:42:01 -0700 Subject: [IPython-dev] Paul Ivanov: Did you get any feedback from GH when I merged? Message-ID: Hi Paul, I just applied your pull request into trunk, thanks a lot for the bug fix. I used the GH interface to do it, and I'm curious whether it generated any feedback to you when that happened or not. Cheers, f From satra at mit.edu Thu Jul 15 20:55:48 2010 From: satra at mit.edu (Satrajit Ghosh) Date: Thu, 15 Jul 2010 20:55:48 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C3F709C.5080505@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> Message-ID: hi justin, i hope to test it out tonight. from what fernando and i discussed, this should be relatively straightforward. once i'm done i'll push it to my fork of ipython and announce it here for others to test. cheers, satra On Thu, Jul 15, 2010 at 4:33 PM, Justin Riley wrote: > This is great news. Right now StarCluster just takes advantage of > password-less ssh already being installed and runs: > > $ ipcluster ssh --clusterfile /path/to/cluster_file.py > > This works fine for now, however, having SGE support would allow > ipcluster's load to be accounted for by the queue. > > Is Satra on the list? I have experience with SGE and could help with the > code if needed. I can also help test this functionality. > > ~Justin > > On 07/15/2010 03:34 PM, Fernando Perez wrote: > > On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger > wrote: > >> Thanks for the post. You should also know that it looks like someone > >> is going to add native SGE support to ipcluster for 0.10.1. > > > > Yes, Satra and I went over this last night in detail (thanks to Brian > > for the pointers), and he said he might actually already have some > > code for it. I suspect we'll get this in soon. > > > > Cheers, > > > > f > > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Fri Jul 16 01:25:14 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Thu, 15 Jul 2010 22:25:14 -0700 Subject: [IPython-dev] Coordinating the XREQ and SUB channels In-Reply-To: References: Message-ID: On Thu, Jul 15, 2010 at 1:22 PM, Fernando Perez wrote: > On Thu, Jul 15, 2010 at 11:07 AM, Brian Granger wrote: >> >> Definitely hard to get right and terminal based frontends will >> definitely need something like flush. ?Let's see how it goes with this >> approach. > > Though absent a real event loop with a callback model, there it will > need to be implemented with a real sleep(epsilon) and a total timeout. > ?Terminal frontends will always simply be bound to flushing what they > can and then moving on if nothing has come in a given window they wait > for. ?Such is life when your 'event loop' is the human hitting the > RETURN key... > > Evan, quick question: when I open your frontend_widget, I see 100% cpu > utilization all the time. ?Do you see this on your end? We should make sure we understand this. Min and I found that our new Tornado event loop in pyzmq was using 100% CPU because of a bug in the poll timeout (units problems). 
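(For anyone following along, the trap is the usual seconds-versus-milliseconds mixup. In toy form, with a made-up poller object rather than the real ioloop code, and assuming its poll() wants milliseconds:

    import time

    def run_one_iteration(poller, handlers, deadline):
        # The loop tracks its deadline in seconds, while poll() here is
        # assumed to take milliseconds, so the conversion has to happen
        # exactly once. Drop the * 1000 and poll() returns almost
        # immediately every time, so the loop spins and pegs a core even
        # when idle; apply it twice and timed callbacks fire far too late.
        remaining = max(0.0, deadline - time.time())
        events = poller.poll(remaining * 1000)  # seconds -> milliseconds
        for sock, flag in events:
            handlers[sock](flag)

Again, that is just an illustration of the failure mode, not the code in the commit.)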
We have fixed this (so we think!), so I am hopeful the current issue is coming from the flush logic. Brian > Cheers, > > f > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From epatters at enthought.com Fri Jul 16 10:40:55 2010 From: epatters at enthought.com (Evan Patterson) Date: Fri, 16 Jul 2010 09:40:55 -0500 Subject: [IPython-dev] Coordinating the XREQ and SUB channels In-Reply-To: References: Message-ID: On Fri, Jul 16, 2010 at 12:25 AM, Brian Granger wrote: > On Thu, Jul 15, 2010 at 1:22 PM, Fernando Perez > wrote: > > On Thu, Jul 15, 2010 at 11:07 AM, Brian Granger > wrote: > >> > >> Definitely hard to get right and terminal based frontends will > >> definitely need something like flush. Let's see how it goes with this > >> approach. > > > > Though absent a real event loop with a callback model, there it will > > need to be implemented with a real sleep(epsilon) and a total timeout. > > Terminal frontends will always simply be bound to flushing what they > > can and then moving on if nothing has come in a given window they wait > > for. Such is life when your 'event loop' is the human hitting the > > RETURN key... > > > > Evan, quick question: when I open your frontend_widget, I see 100% cpu > > utilization all the time. Do you see this on your end? > > We should make sure we understand this. Min and I found that our new > Tornado event loop in pyzmq was using 100% CPU because of a bug in the > poll timeout (units problems). We have fixed this (so we think!), so > I am hopeful the current issue is coming from the flush logic. > Unfortunately, this does not seem to be the case. I have confirmed that the problem is indeed with the IOLoops. They have the the CPU pegged at 100% even when the console is idle, i.e. when no flushing or communication of any sort occurring. Did you commit your fix to the main branch of PyZMQ? Maybe I am not using the right stuff. Evan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Fri Jul 16 12:00:25 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Fri, 16 Jul 2010 09:00:25 -0700 Subject: [IPython-dev] Coordinating the XREQ and SUB channels In-Reply-To: References: Message-ID: Here is the commit: http://github.com/ellisonbg/pyzmq/commit/18f5d061558a176f5496aa8e049182c1a7da64f6 You will need to recompile pyzmq for this to go into affect. Let me know if this doesn't fix the problem. Cheers, Brian On Fri, Jul 16, 2010 at 7:40 AM, Evan Patterson wrote: > On Fri, Jul 16, 2010 at 12:25 AM, Brian Granger wrote: >> >> On Thu, Jul 15, 2010 at 1:22 PM, Fernando Perez >> wrote: >> > On Thu, Jul 15, 2010 at 11:07 AM, Brian Granger >> > wrote: >> >> >> >> Definitely hard to get right and terminal based frontends will >> >> definitely need something like flush. ?Let's see how it goes with this >> >> approach. >> > >> > Though absent a real event loop with a callback model, there it will >> > need to be implemented with a real sleep(epsilon) and a total timeout. >> > ?Terminal frontends will always simply be bound to flushing what they >> > can and then moving on if nothing has come in a given window they wait >> > for. ?Such is life when your 'event loop' is the human hitting the >> > RETURN key... >> > >> > Evan, quick question: when I open your frontend_widget, I see 100% cpu >> > utilization all the time. ?Do you see this on your end? >> >> We should make sure we understand this. 
Min and I found that our new >> Tornado event loop in pyzmq was using 100% CPU because of a bug in the >> ?poll timeout (units problems). ?We have fixed this (so we think!), so >> I am hopeful the current issue is coming from the flush logic. > > Unfortunately, this does not seem to be the case. I have confirmed that the > problem is indeed with the IOLoops. They have the the CPU pegged at 100% > even when the console is idle, i.e. when no flushing or communication of any > sort occurring. > > Did you commit your fix to the main branch of PyZMQ? Maybe I am not using > the right stuff. > > Evan > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From epatters at enthought.com Fri Jul 16 12:21:41 2010 From: epatters at enthought.com (Evan Patterson) Date: Fri, 16 Jul 2010 11:21:41 -0500 Subject: [IPython-dev] Coordinating the XREQ and SUB channels In-Reply-To: References: Message-ID: I verified that I have that commit and that I recompiled PyZMQ. Unfortunately, the problem persists. Fernando: as a sanity check, can you confirm that you have this problem with the latest version of PyZMQ? Evan On Fri, Jul 16, 2010 at 11:00 AM, Brian Granger wrote: > Here is the commit: > > > http://github.com/ellisonbg/pyzmq/commit/18f5d061558a176f5496aa8e049182c1a7da64f6 > > You will need to recompile pyzmq for this to go into affect. Let me > know if this doesn't fix the problem. > > Cheers, > > Brian > > On Fri, Jul 16, 2010 at 7:40 AM, Evan Patterson > wrote: > > On Fri, Jul 16, 2010 at 12:25 AM, Brian Granger > wrote: > >> > >> On Thu, Jul 15, 2010 at 1:22 PM, Fernando Perez > >> wrote: > >> > On Thu, Jul 15, 2010 at 11:07 AM, Brian Granger > >> > wrote: > >> >> > >> >> Definitely hard to get right and terminal based frontends will > >> >> definitely need something like flush. Let's see how it goes with > this > >> >> approach. > >> > > >> > Though absent a real event loop with a callback model, there it will > >> > need to be implemented with a real sleep(epsilon) and a total timeout. > >> > Terminal frontends will always simply be bound to flushing what they > >> > can and then moving on if nothing has come in a given window they wait > >> > for. Such is life when your 'event loop' is the human hitting the > >> > RETURN key... > >> > > >> > Evan, quick question: when I open your frontend_widget, I see 100% cpu > >> > utilization all the time. Do you see this on your end? > >> > >> We should make sure we understand this. Min and I found that our new > >> Tornado event loop in pyzmq was using 100% CPU because of a bug in the > >> poll timeout (units problems). We have fixed this (so we think!), so > >> I am hopeful the current issue is coming from the flush logic. > > > > Unfortunately, this does not seem to be the case. I have confirmed that > the > > problem is indeed with the IOLoops. They have the the CPU pegged at 100% > > even when the console is idle, i.e. when no flushing or communication of > any > > sort occurring. > > > > Did you commit your fix to the main branch of PyZMQ? Maybe I am not using > > the right stuff. > > > > Evan > > > > > > -- > Brian E. Granger, Ph.D. > Assistant Professor of Physics > Cal Poly State University, San Luis Obispo > bgranger at calpoly.edu > ellisonbg at gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From fperez.net at gmail.com Fri Jul 16 14:24:32 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Fri, 16 Jul 2010 11:24:32 -0700 Subject: [IPython-dev] Coordinating the XREQ and SUB channels In-Reply-To: References: Message-ID: On Fri, Jul 16, 2010 at 9:21 AM, Evan Patterson wrote: > I verified that I have that commit and that I recompiled PyZMQ. > Unfortunately, the problem persists. > > Fernando: as a sanity check, can you confirm that you have this problem with > the latest version of PyZMQ? Same here. I had rebuilt zmq/pyzmq to confirm whether the other problem we'd seen was gone (kernel dying when clients disconnect), and that one is indeed now fixed. But the CPU 100% use is still there, even when Evan's qt frontend is idle. As a data point, Gerardo's, which does not yet use the ioloop, doesn't show the problem. Cheers, f From ellisonbg at gmail.com Fri Jul 16 14:53:16 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Fri, 16 Jul 2010 11:53:16 -0700 Subject: [IPython-dev] Coordinating the XREQ and SUB channels In-Reply-To: References: Message-ID: The issue is the units of the timeout passed to poll in the ioloop. Here is the line of code where you can see my comment about this: http://github.com/ellisonbg/pyzmq/commit/18f5d061558a176f5496aa8e049182c1a7da64f6#L2R189 Can you try increasing/decreasing it by a factor of 1000. As long as you are using an inplace build, you shouldn't have to recompile. I need to install qt+pyqt and then I can give this a try as well. Brian On Fri, Jul 16, 2010 at 11:24 AM, Fernando Perez wrote: > On Fri, Jul 16, 2010 at 9:21 AM, Evan Patterson wrote: >> I verified that I have that commit and that I recompiled PyZMQ. >> Unfortunately, the problem persists. >> >> Fernando: as a sanity check, can you confirm that you have this problem with >> the latest version of PyZMQ? > > Same here. ?I had rebuilt zmq/pyzmq to confirm whether the other > problem we'd seen was gone (kernel dying when clients disconnect), and > that one is indeed now fixed. > > But the CPU 100% use is still there, even when Evan's qt frontend is > idle. ?As a data point, Gerardo's, which does not yet use the ioloop, > doesn't show the problem. > > Cheers, > f > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From satra at mit.edu Sat Jul 17 09:23:50 2010 From: satra at mit.edu (Satrajit Ghosh) Date: Sat, 17 Jul 2010 09:23:50 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> Message-ID: hi , i've pushed my changes to: http://github.com/satra/ipython/tree/0.10.1-sge notes: 1. it starts cleanly. i can connect and execute things. when i kill using ctrl-c, the messages appear to indicate that everything shut down well. however, the sge ipengine jobs are still running. 2. the pbs option appears to require mpi to be present. i don't think one can launch multiple engines using pbs without mpi or without the workaround i've applied to the sge engine. basically it submits an sge job for each engine that i want to run. i would love to know if a single job can launch multiple engines on a sge/pbs cluster without mpi. cheers, satra On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh wrote: > hi justin, > > i hope to test it out tonight. from what fernando and i discussed, this > should be relatively straightforward. once i'm done i'll push it to my fork > of ipython and announce it here for others to test. 
> > cheers, > > satra > > > > On Thu, Jul 15, 2010 at 4:33 PM, Justin Riley wrote: > >> This is great news. Right now StarCluster just takes advantage of >> password-less ssh already being installed and runs: >> >> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >> >> This works fine for now, however, having SGE support would allow >> ipcluster's load to be accounted for by the queue. >> >> Is Satra on the list? I have experience with SGE and could help with the >> code if needed. I can also help test this functionality. >> >> ~Justin >> >> On 07/15/2010 03:34 PM, Fernando Perez wrote: >> > On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger >> wrote: >> >> Thanks for the post. You should also know that it looks like someone >> >> is going to add native SGE support to ipcluster for 0.10.1. >> > >> > Yes, Satra and I went over this last night in detail (thanks to Brian >> > for the pointers), and he said he might actually already have some >> > code for it. I suspect we'll get this in soon. >> > >> > Cheers, >> > >> > f >> >> _______________________________________________ >> IPython-dev mailing list >> IPython-dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/ipython-dev >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Sun Jul 18 00:00:19 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sat, 17 Jul 2010 21:00:19 -0700 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> Message-ID: On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh wrote: > hi , > > i've pushed my changes to: > > http://github.com/satra/ipython/tree/0.10.1-sge > > notes: > > 1. it starts cleanly. i can connect and execute things. when i kill using > ctrl-c, the messages appear to indicate that everything shut down well. > however, the sge ipengine jobs are still running. What version of Python and Twisted are you running? > 2. the pbs option appears to require mpi to be present. i don't think one > can launch multiple engines using pbs without mpi or without the workaround > i've applied to the sge engine. basically it submits an sge job for each > engine that i want to run. i would love to know if a single job can launch > multiple engines on a sge/pbs cluster without mpi. I think you are right that pbs needs to use mpirun/mpiexec to start multiple engines using a single PBS job. I am not that familiar with SGE, can you start mulitple processes without mpi and with just a single SGE job? If so, let's try to get that working. Cheers, Brian > cheers, > > satra > > On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh wrote: >> >> hi justin, >> >> i hope to test it out tonight. from what fernando and i discussed, this >> should be relatively straightforward. once i'm done i'll push it to my fork >> of ipython and announce it here for others to test. >> >> cheers, >> >> satra >> >> >> On Thu, Jul 15, 2010 at 4:33 PM, Justin Riley >> wrote: >>> >>> This is great news. Right now StarCluster just takes advantage of >>> password-less ssh already being installed and runs: >>> >>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>> >>> This works fine for now, however, having SGE support would allow >>> ipcluster's load to be accounted for by the queue. >>> >>> Is Satra on the list? I have experience with SGE and could help with the >>> code if needed. I can also help test this functionality. 
>>> >>> ~Justin >>> >>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>> > On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger >>> > wrote: >>> >> Thanks for the post. ?You should also know that it looks like someone >>> >> is going to add native SGE support to ipcluster for 0.10.1. >>> > >>> > Yes, Satra and I went over this last night in detail (thanks to Brian >>> > for the pointers), and he said he might actually already have some >>> > code for it. ?I suspect we'll get this in soon. >>> > >>> > Cheers, >>> > >>> > f >>> >>> _______________________________________________ >>> IPython-dev mailing list >>> IPython-dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/ipython-dev >> > > > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From ellisonbg at gmail.com Sun Jul 18 00:05:32 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sat, 17 Jul 2010 21:05:32 -0700 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> Message-ID: Is the array jobs feature what you want? http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs Brian On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger wrote: > On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh wrote: >> hi , >> >> i've pushed my changes to: >> >> http://github.com/satra/ipython/tree/0.10.1-sge >> >> notes: >> >> 1. it starts cleanly. i can connect and execute things. when i kill using >> ctrl-c, the messages appear to indicate that everything shut down well. >> however, the sge ipengine jobs are still running. > > What version of Python and Twisted are you running? > >> 2. the pbs option appears to require mpi to be present. i don't think one >> can launch multiple engines using pbs without mpi or without the workaround >> i've applied to the sge engine. basically it submits an sge job for each >> engine that i want to run. i would love to know if a single job can launch >> multiple engines on a sge/pbs cluster without mpi. > > I think you are right that pbs needs to use mpirun/mpiexec to start > multiple engines using a single PBS job. ?I am not that familiar with > SGE, can you start mulitple processes without mpi and with just a > single SGE job? ?If so, let's try to get that working. > > Cheers, > > Brian > >> cheers, >> >> satra >> >> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh wrote: >>> >>> hi justin, >>> >>> i hope to test it out tonight. from what fernando and i discussed, this >>> should be relatively straightforward. once i'm done i'll push it to my fork >>> of ipython and announce it here for others to test. >>> >>> cheers, >>> >>> satra >>> >>> >>> On Thu, Jul 15, 2010 at 4:33 PM, Justin Riley >>> wrote: >>>> >>>> This is great news. Right now StarCluster just takes advantage of >>>> password-less ssh already being installed and runs: >>>> >>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>>> >>>> This works fine for now, however, having SGE support would allow >>>> ipcluster's load to be accounted for by the queue. >>>> >>>> Is Satra on the list? I have experience with SGE and could help with the >>>> code if needed. I can also help test this functionality. 
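To give a feel for what the array route looks like, the whole thing boils down to a single qsub that asks for N tasks and runs one ipengine per task. A rough sketch of the idea (hand-written here, not anything in ipcluster; the -cwd/-V flags are just common boilerplate):

    import os
    import subprocess
    import tempfile

    def submit_engine_array(n_engines):
        # One SGE submission for n_engines tasks; SGE numbers the tasks
        # via $SGE_TASK_ID, and each task just starts a plain ipengine.
        script = "\n".join([
            "#!/bin/sh",
            "#$ -cwd",                   # run from the submit directory
            "#$ -V",                     # export the submitter's environment
            "#$ -t 1-%d" % n_engines,    # the job array request
            "ipengine",                  # one engine per array task
            "",
        ])
        path = os.path.join(tempfile.mkdtemp(), "ipengine_array.sh")
        f = open(path, "w")
        f.write(script)
        f.close()
        subprocess.check_call(["qsub", path])

That shows up as one job (with N tasks) in qstat, which is presumably what the queue-accounting use case wants, and it assumes the engines can find the controller's furl (a shared home directory or similar).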
>>>> >>>> ~Justin >>>> >>>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>>> > On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger >>>> > wrote: >>>> >> Thanks for the post. ?You should also know that it looks like someone >>>> >> is going to add native SGE support to ipcluster for 0.10.1. >>>> > >>>> > Yes, Satra and I went over this last night in detail (thanks to Brian >>>> > for the pointers), and he said he might actually already have some >>>> > code for it. ?I suspect we'll get this in soon. >>>> > >>>> > Cheers, >>>> > >>>> > f >>>> >>>> _______________________________________________ >>>> IPython-dev mailing list >>>> IPython-dev at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>> >> >> >> _______________________________________________ >> IPython-dev mailing list >> IPython-dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/ipython-dev >> >> > > > > -- > Brian E. Granger, Ph.D. > Assistant Professor of Physics > Cal Poly State University, San Luis Obispo > bgranger at calpoly.edu > ellisonbg at gmail.com > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From justin.t.riley at gmail.com Sun Jul 18 00:40:13 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Sun, 18 Jul 2010 00:40:13 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> Message-ID: <4C4285AD.7080702@gmail.com> Hi Satra/Brian, I was just going to suggest array jobs and decided to give it a try before posting. I've hacked it in starting from Satra's 0.10.1-sge branch. I'll commit it to my fork after some testing. ~Justin On 07/18/2010 12:05 AM, Brian Granger wrote: > Is the array jobs feature what you want? > > http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs > > Brian > > On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger wrote: >> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh wrote: >>> hi , >>> >>> i've pushed my changes to: >>> >>> http://github.com/satra/ipython/tree/0.10.1-sge >>> >>> notes: >>> >>> 1. it starts cleanly. i can connect and execute things. when i kill using >>> ctrl-c, the messages appear to indicate that everything shut down well. >>> however, the sge ipengine jobs are still running. >> >> What version of Python and Twisted are you running? >> >>> 2. the pbs option appears to require mpi to be present. i don't think one >>> can launch multiple engines using pbs without mpi or without the workaround >>> i've applied to the sge engine. basically it submits an sge job for each >>> engine that i want to run. i would love to know if a single job can launch >>> multiple engines on a sge/pbs cluster without mpi. >> >> I think you are right that pbs needs to use mpirun/mpiexec to start >> multiple engines using a single PBS job. I am not that familiar with >> SGE, can you start mulitple processes without mpi and with just a >> single SGE job? If so, let's try to get that working. >> >> Cheers, >> >> Brian >> >>> cheers, >>> >>> satra >>> >>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh wrote: >>>> >>>> hi justin, >>>> >>>> i hope to test it out tonight. from what fernando and i discussed, this >>>> should be relatively straightforward. once i'm done i'll push it to my fork >>>> of ipython and announce it here for others to test. >>>> >>>> cheers, >>>> >>>> satra >>>> >>>> >>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin Riley >>>> wrote: >>>>> >>>>> This is great news. 
Right now StarCluster just takes advantage of >>>>> password-less ssh already being installed and runs: >>>>> >>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>>>> >>>>> This works fine for now, however, having SGE support would allow >>>>> ipcluster's load to be accounted for by the queue. >>>>> >>>>> Is Satra on the list? I have experience with SGE and could help with the >>>>> code if needed. I can also help test this functionality. >>>>> >>>>> ~Justin >>>>> >>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger >>>>>> wrote: >>>>>>> Thanks for the post. You should also know that it looks like someone >>>>>>> is going to add native SGE support to ipcluster for 0.10.1. >>>>>> >>>>>> Yes, Satra and I went over this last night in detail (thanks to Brian >>>>>> for the pointers), and he said he might actually already have some >>>>>> code for it. I suspect we'll get this in soon. >>>>>> >>>>>> Cheers, >>>>>> >>>>>> f >>>>> >>>>> _______________________________________________ >>>>> IPython-dev mailing list >>>>> IPython-dev at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>> >>> >>> >>> _______________________________________________ >>> IPython-dev mailing list >>> IPython-dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>> >>> >> >> >> >> -- >> Brian E. Granger, Ph.D. >> Assistant Professor of Physics >> Cal Poly State University, San Luis Obispo >> bgranger at calpoly.edu >> ellisonbg at gmail.com >> > > > From justin.t.riley at gmail.com Sun Jul 18 03:43:27 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Sun, 18 Jul 2010 03:43:27 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> Message-ID: <4C42B09F.50106@gmail.com> Hi Satra/Brian, I modified your code to use the job array feature of SGE. I've also made it so that users don't need to specify --sge-script if they don't need a custom SGE launch script. My guess is that most users will choose not to specify --sge-script first and resort to using --sge-script when the generated launch script no longer meets their needs. More details in the git log here: http://github.com/jtriley/ipython/tree/0.10.1-sge Also, I need to test this, but I believe this code will fail if the folder containing the furl file is not NFS-mounted on the SGE cluster. Another option besides requiring NFS is to scp the furl file to each host as is done in the ssh mode of ipcluster, however, this would require password-less ssh to be configured properly (maybe not so bad). Another option is to dump the generated furl file into the job script itself. This has the advantage of only needing SGE installed but certainly doesn't seem like the safest practice. Any thoughts on how to approach this? Let me know what you think. ~Justin On 07/18/2010 12:05 AM, Brian Granger wrote: > Is the array jobs feature what you want? > > http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs > > Brian > > On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger wrote: >> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh wrote: >>> hi , >>> >>> i've pushed my changes to: >>> >>> http://github.com/satra/ipython/tree/0.10.1-sge >>> >>> notes: >>> >>> 1. it starts cleanly. i can connect and execute things. when i kill using >>> ctrl-c, the messages appear to indicate that everything shut down well. >>> however, the sge ipengine jobs are still running. 
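(Coming back to the furl question above: the scp fallback I have in mind is nothing fancy, roughly the sketch below. Host names and paths are placeholders, and it leans on password-less ssh exactly like the ssh mode does.)

    import os
    import subprocess

    FURL = "~/.ipython/security/ipcontroller-engine.furl"  # placeholder path

    def push_furl(hosts, furl=FURL):
        # Copy the engine furl to every compute node before the engines start.
        # Assumes password-less ssh/scp from the submit host to each node.
        src = os.path.expanduser(furl)
        for host in hosts:
            subprocess.check_call(["ssh", host, "mkdir", "-p", ".ipython/security"])
            subprocess.check_call(["scp", src, "%s:%s" % (host, furl)])

Embedding the furl contents in the generated job script would avoid the copying step entirely, but then the credential gets written into every job file, which is the safety concern mentioned above.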
>> >> What version of Python and Twisted are you running? >> >>> 2. the pbs option appears to require mpi to be present. i don't think one >>> can launch multiple engines using pbs without mpi or without the workaround >>> i've applied to the sge engine. basically it submits an sge job for each >>> engine that i want to run. i would love to know if a single job can launch >>> multiple engines on a sge/pbs cluster without mpi. >> >> I think you are right that pbs needs to use mpirun/mpiexec to start >> multiple engines using a single PBS job. I am not that familiar with >> SGE, can you start mulitple processes without mpi and with just a >> single SGE job? If so, let's try to get that working. >> >> Cheers, >> >> Brian >> >>> cheers, >>> >>> satra >>> >>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh wrote: >>>> >>>> hi justin, >>>> >>>> i hope to test it out tonight. from what fernando and i discussed, this >>>> should be relatively straightforward. once i'm done i'll push it to my fork >>>> of ipython and announce it here for others to test. >>>> >>>> cheers, >>>> >>>> satra >>>> >>>> >>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin Riley >>>> wrote: >>>>> >>>>> This is great news. Right now StarCluster just takes advantage of >>>>> password-less ssh already being installed and runs: >>>>> >>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>>>> >>>>> This works fine for now, however, having SGE support would allow >>>>> ipcluster's load to be accounted for by the queue. >>>>> >>>>> Is Satra on the list? I have experience with SGE and could help with the >>>>> code if needed. I can also help test this functionality. >>>>> >>>>> ~Justin >>>>> >>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger >>>>>> wrote: >>>>>>> Thanks for the post. You should also know that it looks like someone >>>>>>> is going to add native SGE support to ipcluster for 0.10.1. >>>>>> >>>>>> Yes, Satra and I went over this last night in detail (thanks to Brian >>>>>> for the pointers), and he said he might actually already have some >>>>>> code for it. I suspect we'll get this in soon. >>>>>> >>>>>> Cheers, >>>>>> >>>>>> f >>>>> >>>>> _______________________________________________ >>>>> IPython-dev mailing list >>>>> IPython-dev at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>> >>> >>> >>> _______________________________________________ >>> IPython-dev mailing list >>> IPython-dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>> >>> >> >> >> >> -- >> Brian E. Granger, Ph.D. >> Assistant Professor of Physics >> Cal Poly State University, San Luis Obispo >> bgranger at calpoly.edu >> ellisonbg at gmail.com >> > > > From tomspur at fedoraproject.org Sun Jul 18 11:14:12 2010 From: tomspur at fedoraproject.org (Thomas Spura) Date: Sun, 18 Jul 2010 17:14:12 +0200 Subject: [IPython-dev] correct test-suite Message-ID: <20100718171412.42f4e970@earth> Hi list, I'm trying to fix the test suite at the moment and run into a problem, I can't resolve... There is now a Makefile, so it's nicer to run repetive tasks in the repository, but currently there is only 'make test-suite', which should run the test suite. 
(now = in branch my_fix_test_suite at github: http://github.com/tomspur/ipython/commits/my_fix_test_suite) One failing test pointed out, that there is a programming error in IPython/Shell.py and is now corrected in this commit: http://github.com/tomspur/ipython/commit/7e7988ee9e7c35b2e5302725ebdf6c22135f334e But now, there is a problem with test: "Test that object's __del__ methods are called on exit." in IPython/core/tests/test_run.py:146. Before that commit, this test was simply failing. Now it seems it's in a infinite loop and there is no progress anymore... Does someone know, what's going on there? (To run into this issue, run this: 'PYTHONPATH=. IPython/scripts/iptest -v IPython.core') Thomas P.S. The same is happening with: 'PYTHONPATH=. IPython/scripts/iptest -v IPython.extensions' the the test "IPython.extensions.tests.test_pretty.TestPrettyInteractively.test_printers") From justin.t.riley at gmail.com Sun Jul 18 12:58:45 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Sun, 18 Jul 2010 12:58:45 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C42B09F.50106@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> Message-ID: <4C4332C5.5050006@gmail.com> Turns out that torque/pbs also support job arrays. I've updated my 0.10.1-sge branch with PBS job array support. Works well with torque 2.4.6. Also tested SGE support against 6.2u3. Since the code is extremely similar between PBS/SGE I decided to update the BatchEngineSet base class to handle the core job array logic. Given that PBS/SGE are the only subclasses I figured this was OK. If not, should be easy to break it out again. ~Justin On 07/18/2010 03:43 AM, Justin Riley wrote: > Hi Satra/Brian, > > I modified your code to use the job array feature of SGE. I've also made > it so that users don't need to specify --sge-script if they don't need a > custom SGE launch script. My guess is that most users will choose not to > specify --sge-script first and resort to using --sge-script when the > generated launch script no longer meets their needs. More details in the > git log here: > > http://github.com/jtriley/ipython/tree/0.10.1-sge > > Also, I need to test this, but I believe this code will fail if the > folder containing the furl file is not NFS-mounted on the SGE cluster. > Another option besides requiring NFS is to scp the furl file to each > host as is done in the ssh mode of ipcluster, however, this would > require password-less ssh to be configured properly (maybe not so bad). > Another option is to dump the generated furl file into the job script > itself. This has the advantage of only needing SGE installed but > certainly doesn't seem like the safest practice. Any thoughts on how to > approach this? > > Let me know what you think. > > ~Justin > > On 07/18/2010 12:05 AM, Brian Granger wrote: >> Is the array jobs feature what you want? >> >> http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs >> >> Brian >> >> On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger >> wrote: >>> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh wrote: >>>> hi , >>>> >>>> i've pushed my changes to: >>>> >>>> http://github.com/satra/ipython/tree/0.10.1-sge >>>> >>>> notes: >>>> >>>> 1. it starts cleanly. i can connect and execute things. when i kill >>>> using >>>> ctrl-c, the messages appear to indicate that everything shut down well. >>>> however, the sge ipengine jobs are still running. >>> >>> What version of Python and Twisted are you running? >>> >>>> 2. 
the pbs option appears to require mpi to be present. i don't >>>> think one >>>> can launch multiple engines using pbs without mpi or without the >>>> workaround >>>> i've applied to the sge engine. basically it submits an sge job for >>>> each >>>> engine that i want to run. i would love to know if a single job can >>>> launch >>>> multiple engines on a sge/pbs cluster without mpi. >>> >>> I think you are right that pbs needs to use mpirun/mpiexec to start >>> multiple engines using a single PBS job. I am not that familiar with >>> SGE, can you start mulitple processes without mpi and with just a >>> single SGE job? If so, let's try to get that working. >>> >>> Cheers, >>> >>> Brian >>> >>>> cheers, >>>> >>>> satra >>>> >>>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh wrote: >>>>> >>>>> hi justin, >>>>> >>>>> i hope to test it out tonight. from what fernando and i discussed, >>>>> this >>>>> should be relatively straightforward. once i'm done i'll push it to >>>>> my fork >>>>> of ipython and announce it here for others to test. >>>>> >>>>> cheers, >>>>> >>>>> satra >>>>> >>>>> >>>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin >>>>> Riley >>>>> wrote: >>>>>> >>>>>> This is great news. Right now StarCluster just takes advantage of >>>>>> password-less ssh already being installed and runs: >>>>>> >>>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>>>>> >>>>>> This works fine for now, however, having SGE support would allow >>>>>> ipcluster's load to be accounted for by the queue. >>>>>> >>>>>> Is Satra on the list? I have experience with SGE and could help >>>>>> with the >>>>>> code if needed. I can also help test this functionality. >>>>>> >>>>>> ~Justin >>>>>> >>>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger >>>>>>> wrote: >>>>>>>> Thanks for the post. You should also know that it looks like >>>>>>>> someone >>>>>>>> is going to add native SGE support to ipcluster for 0.10.1. >>>>>>> >>>>>>> Yes, Satra and I went over this last night in detail (thanks to >>>>>>> Brian >>>>>>> for the pointers), and he said he might actually already have some >>>>>>> code for it. I suspect we'll get this in soon. >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> f >>>>>> >>>>>> _______________________________________________ >>>>>> IPython-dev mailing list >>>>>> IPython-dev at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>> >>>> >>>> >>>> _______________________________________________ >>>> IPython-dev mailing list >>>> IPython-dev at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>> >>>> >>> >>> >>> >>> -- >>> Brian E. Granger, Ph.D. >>> Assistant Professor of Physics >>> Cal Poly State University, San Luis Obispo >>> bgranger at calpoly.edu >>> ellisonbg at gmail.com >>> >> >> >> > From justin.t.riley at gmail.com Sun Jul 18 13:02:47 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Sun, 18 Jul 2010 13:02:47 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C4332C5.5050006@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C4332C5.5050006@gmail.com> Message-ID: <4C4333B7.5080807@gmail.com> Forgot to mention, in my fork PBS now automatically generates a launch script as well if one is not specified. 
So, assuming you have either SGE or Torque/PBS working it *should* be as simple as: $ ipcluster sge -n 4 or $ ipcluster pbs -n 4 You can of course still pass the --sge-script/--pbs-script options but the user is no longer required to create a launch script themselves. ~Justin On 07/18/2010 12:58 PM, Justin Riley wrote: > Turns out that torque/pbs also support job arrays. I've updated my > 0.10.1-sge branch with PBS job array support. Works well with torque > 2.4.6. Also tested SGE support against 6.2u3. > > Since the code is extremely similar between PBS/SGE I decided to update > the BatchEngineSet base class to handle the core job array logic. Given > that PBS/SGE are the only subclasses I figured this was OK. If not, > should be easy to break it out again. > > ~Justin > > On 07/18/2010 03:43 AM, Justin Riley wrote: >> Hi Satra/Brian, >> >> I modified your code to use the job array feature of SGE. I've also made >> it so that users don't need to specify --sge-script if they don't need a >> custom SGE launch script. My guess is that most users will choose not to >> specify --sge-script first and resort to using --sge-script when the >> generated launch script no longer meets their needs. More details in the >> git log here: >> >> http://github.com/jtriley/ipython/tree/0.10.1-sge >> >> Also, I need to test this, but I believe this code will fail if the >> folder containing the furl file is not NFS-mounted on the SGE cluster. >> Another option besides requiring NFS is to scp the furl file to each >> host as is done in the ssh mode of ipcluster, however, this would >> require password-less ssh to be configured properly (maybe not so bad). >> Another option is to dump the generated furl file into the job script >> itself. This has the advantage of only needing SGE installed but >> certainly doesn't seem like the safest practice. Any thoughts on how to >> approach this? >> >> Let me know what you think. >> >> ~Justin >> >> On 07/18/2010 12:05 AM, Brian Granger wrote: >>> Is the array jobs feature what you want? >>> >>> http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs >>> >>> Brian >>> >>> On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger >>> wrote: >>>> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh wrote: >>>>> hi , >>>>> >>>>> i've pushed my changes to: >>>>> >>>>> http://github.com/satra/ipython/tree/0.10.1-sge >>>>> >>>>> notes: >>>>> >>>>> 1. it starts cleanly. i can connect and execute things. when i kill >>>>> using >>>>> ctrl-c, the messages appear to indicate that everything shut down >>>>> well. >>>>> however, the sge ipengine jobs are still running. >>>> >>>> What version of Python and Twisted are you running? >>>> >>>>> 2. the pbs option appears to require mpi to be present. i don't >>>>> think one >>>>> can launch multiple engines using pbs without mpi or without the >>>>> workaround >>>>> i've applied to the sge engine. basically it submits an sge job for >>>>> each >>>>> engine that i want to run. i would love to know if a single job can >>>>> launch >>>>> multiple engines on a sge/pbs cluster without mpi. >>>> >>>> I think you are right that pbs needs to use mpirun/mpiexec to start >>>> multiple engines using a single PBS job. I am not that familiar with >>>> SGE, can you start mulitple processes without mpi and with just a >>>> single SGE job? If so, let's try to get that working. 
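(And to answer the single-job question for the PBS side: with the array feature it is the same trick as SGE, just spelled differently. A rough illustration of the kind of script involved, not the literal template the branch generates:

    def pbs_array_script(n_engines):
        # Roughly the shape of a Torque/PBS array launch script; flags and
        # names are illustrative. $PBS_ARRAYID plays the role that
        # $SGE_TASK_ID plays on SGE.
        return "\n".join([
            "#!/bin/sh",
            "#PBS -N ipengine",
            "#PBS -t 1-%d" % n_engines,   # the array request, as in qsub -t
            "cd $PBS_O_WORKDIR",          # PBS starts jobs in $HOME by default
            "ipengine",
            "",
        ])

qsub the result once and the queue fans it out into one task per engine.)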
>>>> >>>> Cheers, >>>> >>>> Brian >>>> >>>>> cheers, >>>>> >>>>> satra >>>>> >>>>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh wrote: >>>>>> >>>>>> hi justin, >>>>>> >>>>>> i hope to test it out tonight. from what fernando and i discussed, >>>>>> this >>>>>> should be relatively straightforward. once i'm done i'll push it to >>>>>> my fork >>>>>> of ipython and announce it here for others to test. >>>>>> >>>>>> cheers, >>>>>> >>>>>> satra >>>>>> >>>>>> >>>>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin >>>>>> Riley >>>>>> wrote: >>>>>>> >>>>>>> This is great news. Right now StarCluster just takes advantage of >>>>>>> password-less ssh already being installed and runs: >>>>>>> >>>>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>>>>>> >>>>>>> This works fine for now, however, having SGE support would allow >>>>>>> ipcluster's load to be accounted for by the queue. >>>>>>> >>>>>>> Is Satra on the list? I have experience with SGE and could help >>>>>>> with the >>>>>>> code if needed. I can also help test this functionality. >>>>>>> >>>>>>> ~Justin >>>>>>> >>>>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>>>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian >>>>>>>> Granger >>>>>>>> wrote: >>>>>>>>> Thanks for the post. You should also know that it looks like >>>>>>>>> someone >>>>>>>>> is going to add native SGE support to ipcluster for 0.10.1. >>>>>>>> >>>>>>>> Yes, Satra and I went over this last night in detail (thanks to >>>>>>>> Brian >>>>>>>> for the pointers), and he said he might actually already have some >>>>>>>> code for it. I suspect we'll get this in soon. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> f >>>>>>> >>>>>>> _______________________________________________ >>>>>>> IPython-dev mailing list >>>>>>> IPython-dev at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> IPython-dev mailing list >>>>> IPython-dev at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> Brian E. Granger, Ph.D. >>>> Assistant Professor of Physics >>>> Cal Poly State University, San Luis Obispo >>>> bgranger at calpoly.edu >>>> ellisonbg at gmail.com >>>> >>> >>> >>> >> > From matthieu.brucher at gmail.com Sun Jul 18 13:13:46 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sun, 18 Jul 2010 19:13:46 +0200 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C42B09F.50106@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> Message-ID: Hi, Does IPython support now sending engines to nodes that do not have the same $HOME as the main instance? This is what kept me from testing correctly IPython with LSF some months ago :| Matthieu 2010/7/18 Justin Riley : > Hi Satra/Brian, > > I modified your code to use the job array feature of SGE. I've also made > it so that users don't need to specify --sge-script if they don't need a > custom SGE launch script. My guess is that most users will choose not to > specify --sge-script first and resort to using --sge-script when the > generated launch script no longer meets their needs. More details in the > git log here: > > http://github.com/jtriley/ipython/tree/0.10.1-sge > > Also, I need to test this, but I believe this code will fail if the > folder containing the furl file is not NFS-mounted on the SGE cluster. 
> Another option besides requiring NFS is to scp the furl file to each > host as is done in the ssh mode of ipcluster, however, this would > require password-less ssh to be configured properly (maybe not so bad). > Another option is to dump the generated furl file into the job script > itself. This has the advantage of only needing SGE installed but > certainly doesn't seem like the safest practice. Any thoughts on how to > approach this? > > Let me know what you think. > > ~Justin > > On 07/18/2010 12:05 AM, Brian Granger wrote: >> Is the array jobs feature what you want? >> >> http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs >> >> Brian >> >> On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger ?wrote: >>> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh ?wrote: >>>> hi , >>>> >>>> i've pushed my changes to: >>>> >>>> http://github.com/satra/ipython/tree/0.10.1-sge >>>> >>>> notes: >>>> >>>> 1. it starts cleanly. i can connect and execute things. when i kill using >>>> ctrl-c, the messages appear to indicate that everything shut down well. >>>> however, the sge ipengine jobs are still running. >>> >>> What version of Python and Twisted are you running? >>> >>>> 2. the pbs option appears to require mpi to be present. i don't think one >>>> can launch multiple engines using pbs without mpi or without the workaround >>>> i've applied to the sge engine. basically it submits an sge job for each >>>> engine that i want to run. i would love to know if a single job can launch >>>> multiple engines on a sge/pbs cluster without mpi. >>> >>> I think you are right that pbs needs to use mpirun/mpiexec to start >>> multiple engines using a single PBS job. ?I am not that familiar with >>> SGE, can you start mulitple processes without mpi and with just a >>> single SGE job? ?If so, let's try to get that working. >>> >>> Cheers, >>> >>> Brian >>> >>>> cheers, >>>> >>>> satra >>>> >>>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh ?wrote: >>>>> >>>>> hi justin, >>>>> >>>>> i hope to test it out tonight. from what fernando and i discussed, this >>>>> should be relatively straightforward. once i'm done i'll push it to my fork >>>>> of ipython and announce it here for others to test. >>>>> >>>>> cheers, >>>>> >>>>> satra >>>>> >>>>> >>>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin Riley >>>>> wrote: >>>>>> >>>>>> This is great news. Right now StarCluster just takes advantage of >>>>>> password-less ssh already being installed and runs: >>>>>> >>>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>>>>> >>>>>> This works fine for now, however, having SGE support would allow >>>>>> ipcluster's load to be accounted for by the queue. >>>>>> >>>>>> Is Satra on the list? I have experience with SGE and could help with the >>>>>> code if needed. I can also help test this functionality. >>>>>> >>>>>> ~Justin >>>>>> >>>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger >>>>>>> wrote: >>>>>>>> Thanks for the post. ?You should also know that it looks like someone >>>>>>>> is going to add native SGE support to ipcluster for 0.10.1. >>>>>>> >>>>>>> Yes, Satra and I went over this last night in detail (thanks to Brian >>>>>>> for the pointers), and he said he might actually already have some >>>>>>> code for it. ?I suspect we'll get this in soon. 
>>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> f >>>>>> >>>>>> _______________________________________________ >>>>>> IPython-dev mailing list >>>>>> IPython-dev at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>> >>>> >>>> >>>> _______________________________________________ >>>> IPython-dev mailing list >>>> IPython-dev at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>> >>>> >>> >>> >>> >>> -- >>> Brian E. Granger, Ph.D. >>> Assistant Professor of Physics >>> Cal Poly State University, San Luis Obispo >>> bgranger at calpoly.edu >>> ellisonbg at gmail.com >>> >> >> >> > > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From satra at mit.edu Sun Jul 18 14:01:03 2010 From: satra at mit.edu (Satrajit Ghosh) Date: Sun, 18 Jul 2010 14:01:03 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C4333B7.5080807@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C4332C5.5050006@gmail.com> <4C4333B7.5080807@gmail.com> Message-ID: hi justin, this is fantastic. i think it will be good to at least put the queue to be used as a user option. most of the time installations of torque or sge have multiple queues. cheers, satra On Sun, Jul 18, 2010 at 1:02 PM, Justin Riley wrote: > Forgot to mention, in my fork PBS now automatically generates a launch > script as well if one is not specified. So, assuming you have either SGE or > Torque/PBS working it *should* be as simple as: > > $ ipcluster sge -n 4 > > or > > $ ipcluster pbs -n 4 > > You can of course still pass the --sge-script/--pbs-script options but the > user is no longer required to create a launch script themselves. > > ~Justin > > > On 07/18/2010 12:58 PM, Justin Riley wrote: > >> Turns out that torque/pbs also support job arrays. I've updated my >> 0.10.1-sge branch with PBS job array support. Works well with torque >> 2.4.6. Also tested SGE support against 6.2u3. >> >> Since the code is extremely similar between PBS/SGE I decided to update >> the BatchEngineSet base class to handle the core job array logic. Given >> that PBS/SGE are the only subclasses I figured this was OK. If not, >> should be easy to break it out again. >> >> ~Justin >> >> On 07/18/2010 03:43 AM, Justin Riley wrote: >> >>> Hi Satra/Brian, >>> >>> I modified your code to use the job array feature of SGE. I've also made >>> it so that users don't need to specify --sge-script if they don't need a >>> custom SGE launch script. My guess is that most users will choose not to >>> specify --sge-script first and resort to using --sge-script when the >>> generated launch script no longer meets their needs. More details in the >>> git log here: >>> >>> http://github.com/jtriley/ipython/tree/0.10.1-sge >>> >>> Also, I need to test this, but I believe this code will fail if the >>> folder containing the furl file is not NFS-mounted on the SGE cluster. >>> Another option besides requiring NFS is to scp the furl file to each >>> host as is done in the ssh mode of ipcluster, however, this would >>> require password-less ssh to be configured properly (maybe not so bad). >>> Another option is to dump the generated furl file into the job script >>> itself. 
This has the advantage of only needing SGE installed but >>> certainly doesn't seem like the safest practice. Any thoughts on how to >>> approach this? >>> >>> Let me know what you think. >>> >>> ~Justin >>> >>> On 07/18/2010 12:05 AM, Brian Granger wrote: >>> >>>> Is the array jobs feature what you want? >>>> >>>> http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs >>>> >>>> Brian >>>> >>>> On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger >>>> wrote: >>>> >>>>> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh wrote: >>>>> >>>>>> hi , >>>>>> >>>>>> i've pushed my changes to: >>>>>> >>>>>> http://github.com/satra/ipython/tree/0.10.1-sge >>>>>> >>>>>> notes: >>>>>> >>>>>> 1. it starts cleanly. i can connect and execute things. when i kill >>>>>> using >>>>>> ctrl-c, the messages appear to indicate that everything shut down >>>>>> well. >>>>>> however, the sge ipengine jobs are still running. >>>>>> >>>>> >>>>> What version of Python and Twisted are you running? >>>>> >>>>> 2. the pbs option appears to require mpi to be present. i don't >>>>>> think one >>>>>> can launch multiple engines using pbs without mpi or without the >>>>>> workaround >>>>>> i've applied to the sge engine. basically it submits an sge job for >>>>>> each >>>>>> engine that i want to run. i would love to know if a single job can >>>>>> launch >>>>>> multiple engines on a sge/pbs cluster without mpi. >>>>>> >>>>> >>>>> I think you are right that pbs needs to use mpirun/mpiexec to start >>>>> multiple engines using a single PBS job. I am not that familiar with >>>>> SGE, can you start mulitple processes without mpi and with just a >>>>> single SGE job? If so, let's try to get that working. >>>>> >>>>> Cheers, >>>>> >>>>> Brian >>>>> >>>>> cheers, >>>>>> >>>>>> satra >>>>>> >>>>>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh wrote: >>>>>> >>>>>>> >>>>>>> hi justin, >>>>>>> >>>>>>> i hope to test it out tonight. from what fernando and i discussed, >>>>>>> this >>>>>>> should be relatively straightforward. once i'm done i'll push it to >>>>>>> my fork >>>>>>> of ipython and announce it here for others to test. >>>>>>> >>>>>>> cheers, >>>>>>> >>>>>>> satra >>>>>>> >>>>>>> >>>>>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin >>>>>>> Riley >>>>>>> wrote: >>>>>>> >>>>>>>> >>>>>>>> This is great news. Right now StarCluster just takes advantage of >>>>>>>> password-less ssh already being installed and runs: >>>>>>>> >>>>>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>>>>>>> >>>>>>>> This works fine for now, however, having SGE support would allow >>>>>>>> ipcluster's load to be accounted for by the queue. >>>>>>>> >>>>>>>> Is Satra on the list? I have experience with SGE and could help >>>>>>>> with the >>>>>>>> code if needed. I can also help test this functionality. >>>>>>>> >>>>>>>> ~Justin >>>>>>>> >>>>>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>>>>>>> >>>>>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian >>>>>>>>> Granger >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Thanks for the post. You should also know that it looks like >>>>>>>>>> someone >>>>>>>>>> is going to add native SGE support to ipcluster for 0.10.1. >>>>>>>>>> >>>>>>>>> >>>>>>>>> Yes, Satra and I went over this last night in detail (thanks to >>>>>>>>> Brian >>>>>>>>> for the pointers), and he said he might actually already have some >>>>>>>>> code for it. I suspect we'll get this in soon. 
>>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> f >>>>>>>>> >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> IPython-dev mailing list >>>>>>>> IPython-dev at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> IPython-dev mailing list >>>>>> IPython-dev at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> -- >>>>> Brian E. Granger, Ph.D. >>>>> Assistant Professor of Physics >>>>> Cal Poly State University, San Luis Obispo >>>>> bgranger at calpoly.edu >>>>> ellisonbg at gmail.com >>>>> >>>>> >>>> >>>> >>>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From satra at mit.edu Sun Jul 18 14:03:18 2010 From: satra at mit.edu (Satrajit Ghosh) Date: Sun, 18 Jul 2010 14:03:18 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> Message-ID: if i'm not mistake this is related to the furl files. if we can implement furl passing, we won't need the the engines to have the same $HOME as the controller. btw, since you have LSF, does it have the same options as SGE/Torque? Assuming you have the same home, can you run ipython on an lsf cluster. cheers, satra On Sun, Jul 18, 2010 at 1:13 PM, Matthieu Brucher < matthieu.brucher at gmail.com> wrote: > Hi, > > Does IPython support now sending engines to nodes that do not have the > same $HOME as the main instance? This is what kept me from testing > correctly IPython with LSF some months ago :| > > Matthieu > > 2010/7/18 Justin Riley : > > Hi Satra/Brian, > > > > I modified your code to use the job array feature of SGE. I've also made > > it so that users don't need to specify --sge-script if they don't need a > > custom SGE launch script. My guess is that most users will choose not to > > specify --sge-script first and resort to using --sge-script when the > > generated launch script no longer meets their needs. More details in the > > git log here: > > > > http://github.com/jtriley/ipython/tree/0.10.1-sge > > > > Also, I need to test this, but I believe this code will fail if the > > folder containing the furl file is not NFS-mounted on the SGE cluster. > > Another option besides requiring NFS is to scp the furl file to each > > host as is done in the ssh mode of ipcluster, however, this would > > require password-less ssh to be configured properly (maybe not so bad). > > Another option is to dump the generated furl file into the job script > > itself. This has the advantage of only needing SGE installed but > > certainly doesn't seem like the safest practice. Any thoughts on how to > > approach this? > > > > Let me know what you think. > > > > ~Justin > > > > On 07/18/2010 12:05 AM, Brian Granger wrote: > >> Is the array jobs feature what you want? > >> > >> http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs > >> > >> Brian > >> > >> On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger > wrote: > >>> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh wrote: > >>>> hi , > >>>> > >>>> i've pushed my changes to: > >>>> > >>>> http://github.com/satra/ipython/tree/0.10.1-sge > >>>> > >>>> notes: > >>>> > >>>> 1. it starts cleanly. i can connect and execute things. when i kill > using > >>>> ctrl-c, the messages appear to indicate that everything shut down > well. > >>>> however, the sge ipengine jobs are still running. 
> >>> > >>> What version of Python and Twisted are you running? > >>> > >>>> 2. the pbs option appears to require mpi to be present. i don't think > one > >>>> can launch multiple engines using pbs without mpi or without the > workaround > >>>> i've applied to the sge engine. basically it submits an sge job for > each > >>>> engine that i want to run. i would love to know if a single job can > launch > >>>> multiple engines on a sge/pbs cluster without mpi. > >>> > >>> I think you are right that pbs needs to use mpirun/mpiexec to start > >>> multiple engines using a single PBS job. I am not that familiar with > >>> SGE, can you start mulitple processes without mpi and with just a > >>> single SGE job? If so, let's try to get that working. > >>> > >>> Cheers, > >>> > >>> Brian > >>> > >>>> cheers, > >>>> > >>>> satra > >>>> > >>>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh > wrote: > >>>>> > >>>>> hi justin, > >>>>> > >>>>> i hope to test it out tonight. from what fernando and i discussed, > this > >>>>> should be relatively straightforward. once i'm done i'll push it to > my fork > >>>>> of ipython and announce it here for others to test. > >>>>> > >>>>> cheers, > >>>>> > >>>>> satra > >>>>> > >>>>> > >>>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin Riley< > justin.t.riley at gmail.com> > >>>>> wrote: > >>>>>> > >>>>>> This is great news. Right now StarCluster just takes advantage of > >>>>>> password-less ssh already being installed and runs: > >>>>>> > >>>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py > >>>>>> > >>>>>> This works fine for now, however, having SGE support would allow > >>>>>> ipcluster's load to be accounted for by the queue. > >>>>>> > >>>>>> Is Satra on the list? I have experience with SGE and could help with > the > >>>>>> code if needed. I can also help test this functionality. > >>>>>> > >>>>>> ~Justin > >>>>>> > >>>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote: > >>>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger< > ellisonbg at gmail.com> > >>>>>>> wrote: > >>>>>>>> Thanks for the post. You should also know that it looks like > someone > >>>>>>>> is going to add native SGE support to ipcluster for 0.10.1. > >>>>>>> > >>>>>>> Yes, Satra and I went over this last night in detail (thanks to > Brian > >>>>>>> for the pointers), and he said he might actually already have some > >>>>>>> code for it. I suspect we'll get this in soon. > >>>>>>> > >>>>>>> Cheers, > >>>>>>> > >>>>>>> f > >>>>>> > >>>>>> _______________________________________________ > >>>>>> IPython-dev mailing list > >>>>>> IPython-dev at scipy.org > >>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev > >>>>> > >>>> > >>>> > >>>> _______________________________________________ > >>>> IPython-dev mailing list > >>>> IPython-dev at scipy.org > >>>> http://mail.scipy.org/mailman/listinfo/ipython-dev > >>>> > >>>> > >>> > >>> > >>> > >>> -- > >>> Brian E. Granger, Ph.D. > >>> Assistant Professor of Physics > >>> Cal Poly State University, San Luis Obispo > >>> bgranger at calpoly.edu > >>> ellisonbg at gmail.com > >>> > >> > >> > >> > > > > _______________________________________________ > > IPython-dev mailing list > > IPython-dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/ipython-dev > > > > > > -- > Information System Engineer, Ph.D. 
> Blog: http://matt.eifelle.com > LinkedIn: http://www.linkedin.com/in/matthieubrucher > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Sun Jul 18 14:06:43 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sun, 18 Jul 2010 20:06:43 +0200 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> Message-ID: Hi, I don't know, as I only have access to clusters where $HOME is different. I should be able to get around by directly logging on nodes, but I have to ask the administrators. LSF has more options than SGE/Torque. At least when I checked, I could implement the same environment. Job arrays are supported for a long time, so this is not an issue. The more worrying part may be the MPI backend. They are not launched in the same way :| Matthieu 2010/7/18 Satrajit Ghosh : > if i'm not mistake this is related to the furl files. if we can implement > furl passing, we won't need the the engines to have the same $HOME as the > controller. btw, since you have LSF, does it have the same options as > SGE/Torque? Assuming you have the same home, can you run ipython on an lsf > cluster. > > cheers, > > satra -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From satra at mit.edu Sun Jul 18 14:12:09 2010 From: satra at mit.edu (Satrajit Ghosh) Date: Sun, 18 Jul 2010 14:12:09 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> Message-ID: hi matthieu, the new code that justin implemented doesn't require mpi. it just requires the native torque/sge and hopefully lfs install. cheers, satra On Sun, Jul 18, 2010 at 2:06 PM, Matthieu Brucher < matthieu.brucher at gmail.com> wrote: > Hi, > > I don't know, as I only have access to clusters where $HOME is > different. I should be able to get around by directly logging on > nodes, but I have to ask the administrators. LSF has more options than > SGE/Torque. At least when I checked, I could implement the same > environment. Job arrays are supported for a long time, so this is not > an issue. > The more worrying part may be the MPI backend. They are not launched > in the same way :| > > Matthieu > > 2010/7/18 Satrajit Ghosh : > > if i'm not mistake this is related to the furl files. if we can implement > > furl passing, we won't need the the engines to have the same $HOME as the > > controller. btw, since you have LSF, does it have the same options as > > SGE/Torque? Assuming you have the same home, can you run ipython on an > lsf > > cluster. > > > > cheers, > > > > satra > > > -- > Information System Engineer, Ph.D. > Blog: http://matt.eifelle.com > LinkedIn: http://www.linkedin.com/in/matthieubrucher > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Sun Jul 18 14:14:34 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sun, 18 Jul 2010 20:14:34 +0200 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> Message-ID: When you say furl passing, it's giving a path to the furl file? 
The general issue may not be solved this way. On the clusters I have access to, $HOME is the same path, but $HOME on the submission node is available as /nf/$HOME on the compute nodes, $HOME is another folder... Cheers, Matthieu 2010/7/18 Satrajit Ghosh : > if i'm not mistake this is related to the furl files. if we can implement > furl passing, we won't need the the engines to have the same $HOME as the > controller. btw, since you have LSF, does it have the same options as > SGE/Torque? Assuming you have the same home, can you run ipython on an lsf > cluster. > > cheers, > > satra -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From matthieu.brucher at gmail.com Sun Jul 18 14:15:20 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sun, 18 Jul 2010 20:15:20 +0200 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> Message-ID: Of course, I forgot that we don't require MPI for IPython ;) 2010/7/18 Satrajit Ghosh : > hi matthieu, > > the new code that justin implemented doesn't require mpi. it just requires > the native torque/sge and hopefully lfs install. > > cheers, > > satra > > > On Sun, Jul 18, 2010 at 2:06 PM, Matthieu Brucher > wrote: >> >> Hi, >> >> I don't know, as I only have access to clusters where $HOME is >> different. I should be able to get around by directly logging on >> nodes, but I have to ask the administrators. LSF has more options than >> SGE/Torque. At least when I checked, I could implement the same >> environment. Job arrays are supported for a long time, so this is not >> an issue. >> The more worrying part may be the MPI backend. They are not launched >> in the same way :| >> >> Matthieu >> >> 2010/7/18 Satrajit Ghosh : >> > if i'm not mistake this is related to the furl files. if we can >> > implement >> > furl passing, we won't need the the engines to have the same $HOME as >> > the >> > controller. btw, since you have LSF, does it have the same options as >> > SGE/Torque? Assuming you have the same home, can you run ipython on an >> > lsf >> > cluster. >> > >> > cheers, >> > >> > satra >> >> >> -- >> Information System Engineer, Ph.D. >> Blog: http://matt.eifelle.com >> LinkedIn: http://www.linkedin.com/in/matthieubrucher > > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From justin.t.riley at gmail.com Sun Jul 18 14:18:07 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Sun, 18 Jul 2010 14:18:07 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> Message-ID: <4C43455F.1050508@gmail.com> Hi Matthieu, At least for the modifications I made, no not yet. This is exactly what I'm asking about in the second paragraph of my response. The new SGE/PBS support will work with multiple hosts assuming the ~/.ipython/security folder is NFS-shared on the cluster. If that's not the case, then AFAIK we have two options: 1. scp the furl file from ~/.ipython/security to each host's ~/.ipython/security folder. 2. put the contents of the furl file directly inside the job script used to start the engines The first option relies on the user having password-less configured properly to each node on the cluster. ipcluster would first need to scp the furl and then launch the engines using PBS/SGE. 
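As a rough sketch of what that first option could look like (this is not the actual ipcluster code; the hostnames below are made up, and it assumes password-less ssh plus the default ~/.ipython/security layout from 0.10):

import os
import subprocess

# default engine furl location in 0.10
furl = os.path.expanduser('~/.ipython/security/ipcontroller-engine.furl')
nodes = ['node001', 'node002', 'node003']   # hypothetical compute hosts

for node in nodes:
    # make sure the remote security dir exists, then copy the furl over
    subprocess.check_call(['ssh', node, 'mkdir', '-p', '.ipython/security'])
    subprocess.check_call(['scp', furl, '%s:.ipython/security/' % node])

# with the furl in place on every node, the SGE/PBS engine job can be submitted as usual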
The second option is the easiest approach given that it only requires SGE to be installed, however, it's probably not the best idea to put the furl file in the job script itself for security reasons. I'm curious to get opinions on this. This would require slight code modifications. ~Justin On 07/18/2010 01:13 PM, Matthieu Brucher wrote: > Hi, > > Does IPython support now sending engines to nodes that do not have the > same $HOME as the main instance? This is what kept me from testing > correctly IPython with LSF some months ago :| > > Matthieu > > 2010/7/18 Justin Riley: >> Hi Satra/Brian, >> >> I modified your code to use the job array feature of SGE. I've also made >> it so that users don't need to specify --sge-script if they don't need a >> custom SGE launch script. My guess is that most users will choose not to >> specify --sge-script first and resort to using --sge-script when the >> generated launch script no longer meets their needs. More details in the >> git log here: >> >> http://github.com/jtriley/ipython/tree/0.10.1-sge >> >> Also, I need to test this, but I believe this code will fail if the >> folder containing the furl file is not NFS-mounted on the SGE cluster. >> Another option besides requiring NFS is to scp the furl file to each >> host as is done in the ssh mode of ipcluster, however, this would >> require password-less ssh to be configured properly (maybe not so bad). >> Another option is to dump the generated furl file into the job script >> itself. This has the advantage of only needing SGE installed but >> certainly doesn't seem like the safest practice. Any thoughts on how to >> approach this? >> >> Let me know what you think. >> >> ~Justin >> >> On 07/18/2010 12:05 AM, Brian Granger wrote: >>> Is the array jobs feature what you want? >>> >>> http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs >>> >>> Brian >>> >>> On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger wrote: >>>> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh wrote: >>>>> hi , >>>>> >>>>> i've pushed my changes to: >>>>> >>>>> http://github.com/satra/ipython/tree/0.10.1-sge >>>>> >>>>> notes: >>>>> >>>>> 1. it starts cleanly. i can connect and execute things. when i kill using >>>>> ctrl-c, the messages appear to indicate that everything shut down well. >>>>> however, the sge ipengine jobs are still running. >>>> >>>> What version of Python and Twisted are you running? >>>> >>>>> 2. the pbs option appears to require mpi to be present. i don't think one >>>>> can launch multiple engines using pbs without mpi or without the workaround >>>>> i've applied to the sge engine. basically it submits an sge job for each >>>>> engine that i want to run. i would love to know if a single job can launch >>>>> multiple engines on a sge/pbs cluster without mpi. >>>> >>>> I think you are right that pbs needs to use mpirun/mpiexec to start >>>> multiple engines using a single PBS job. I am not that familiar with >>>> SGE, can you start mulitple processes without mpi and with just a >>>> single SGE job? If so, let's try to get that working. >>>> >>>> Cheers, >>>> >>>> Brian >>>> >>>>> cheers, >>>>> >>>>> satra >>>>> >>>>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh wrote: >>>>>> >>>>>> hi justin, >>>>>> >>>>>> i hope to test it out tonight. from what fernando and i discussed, this >>>>>> should be relatively straightforward. once i'm done i'll push it to my fork >>>>>> of ipython and announce it here for others to test. 
>>>>>> >>>>>> cheers, >>>>>> >>>>>> satra >>>>>> >>>>>> >>>>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin Riley >>>>>> wrote: >>>>>>> >>>>>>> This is great news. Right now StarCluster just takes advantage of >>>>>>> password-less ssh already being installed and runs: >>>>>>> >>>>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>>>>>> >>>>>>> This works fine for now, however, having SGE support would allow >>>>>>> ipcluster's load to be accounted for by the queue. >>>>>>> >>>>>>> Is Satra on the list? I have experience with SGE and could help with the >>>>>>> code if needed. I can also help test this functionality. >>>>>>> >>>>>>> ~Justin >>>>>>> >>>>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>>>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger >>>>>>>> wrote: >>>>>>>>> Thanks for the post. You should also know that it looks like someone >>>>>>>>> is going to add native SGE support to ipcluster for 0.10.1. >>>>>>>> >>>>>>>> Yes, Satra and I went over this last night in detail (thanks to Brian >>>>>>>> for the pointers), and he said he might actually already have some >>>>>>>> code for it. I suspect we'll get this in soon. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> f >>>>>>> >>>>>>> _______________________________________________ >>>>>>> IPython-dev mailing list >>>>>>> IPython-dev at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> IPython-dev mailing list >>>>> IPython-dev at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> Brian E. Granger, Ph.D. >>>> Assistant Professor of Physics >>>> Cal Poly State University, San Luis Obispo >>>> bgranger at calpoly.edu >>>> ellisonbg at gmail.com >>>> >>> >>> >>> >> >> _______________________________________________ >> IPython-dev mailing list >> IPython-dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/ipython-dev >> > > > From justin.t.riley at gmail.com Sun Jul 18 14:20:28 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Sun, 18 Jul 2010 14:20:28 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C4332C5.5050006@gmail.com> <4C4333B7.5080807@gmail.com> Message-ID: <4C4345EC.2090601@gmail.com> Hi Satra, > i think it will be good to at least put the queue to > be used as a user option. most of the time installations of torque or > sge have multiple queues. Agreed, this would be useful. I'll try to hack it in later tonight unless you get to it first ;) ~Justin On 07/18/2010 02:01 PM, Satrajit Ghosh wrote: > hi justin, > > this is fantastic. i think it will be good to at least put the queue to > be used as a user option. most of the time installations of torque or > sge have multiple queues. > > cheers, > > satra > > > On Sun, Jul 18, 2010 at 1:02 PM, Justin Riley > wrote: > > Forgot to mention, in my fork PBS now automatically generates a > launch script as well if one is not specified. So, assuming you have > either SGE or Torque/PBS working it *should* be as simple as: > > $ ipcluster sge -n 4 > > or > > $ ipcluster pbs -n 4 > > You can of course still pass the --sge-script/--pbs-script options > but the user is no longer required to create a launch script themselves. > > ~Justin > > > On 07/18/2010 12:58 PM, Justin Riley wrote: > > Turns out that torque/pbs also support job arrays. I've updated my > 0.10.1-sge branch with PBS job array support. 
Works well with torque > 2.4.6. Also tested SGE support against 6.2u3. > > Since the code is extremely similar between PBS/SGE I decided to > update > the BatchEngineSet base class to handle the core job array > logic. Given > that PBS/SGE are the only subclasses I figured this was OK. If not, > should be easy to break it out again. > > ~Justin > > On 07/18/2010 03:43 AM, Justin Riley wrote: > > Hi Satra/Brian, > > I modified your code to use the job array feature of SGE. > I've also made > it so that users don't need to specify --sge-script if they > don't need a > custom SGE launch script. My guess is that most users will > choose not to > specify --sge-script first and resort to using --sge-script > when the > generated launch script no longer meets their needs. More > details in the > git log here: > > http://github.com/jtriley/ipython/tree/0.10.1-sge > > Also, I need to test this, but I believe this code will fail > if the > folder containing the furl file is not NFS-mounted on the > SGE cluster. > Another option besides requiring NFS is to scp the furl file > to each > host as is done in the ssh mode of ipcluster, however, this > would > require password-less ssh to be configured properly (maybe > not so bad). > Another option is to dump the generated furl file into the > job script > itself. This has the advantage of only needing SGE installed but > certainly doesn't seem like the safest practice. Any > thoughts on how to > approach this? > > Let me know what you think. > > ~Justin > > On 07/18/2010 12:05 AM, Brian Granger wrote: > > Is the array jobs feature what you want? > > http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs > > Brian > > On Sat, Jul 17, 2010 at 9:00 PM, Brian > Granger> > wrote: > > On Sat, Jul 17, 2010 at 6:23 AM, Satrajit > Ghosh> wrote: > > hi , > > i've pushed my changes to: > > http://github.com/satra/ipython/tree/0.10.1-sge > > notes: > > 1. it starts cleanly. i can connect and execute > things. when i kill > using > ctrl-c, the messages appear to indicate that > everything shut down > well. > however, the sge ipengine jobs are still running. > > > What version of Python and Twisted are you running? > > 2. the pbs option appears to require mpi to be > present. i don't > think one > can launch multiple engines using pbs without > mpi or without the > workaround > i've applied to the sge engine. basically it > submits an sge job for > each > engine that i want to run. i would love to know > if a single job can > launch > multiple engines on a sge/pbs cluster without mpi. > > > I think you are right that pbs needs to use > mpirun/mpiexec to start > multiple engines using a single PBS job. I am not > that familiar with > SGE, can you start mulitple processes without mpi > and with just a > single SGE job? If so, let's try to get that working. > > Cheers, > > Brian > > cheers, > > satra > > On Thu, Jul 15, 2010 at 8:55 PM, Satrajit > Ghosh> wrote: > > > hi justin, > > i hope to test it out tonight. from what > fernando and i discussed, > this > should be relatively straightforward. once > i'm done i'll push it to > my fork > of ipython and announce it here for others > to test. > > cheers, > > satra > > > On Thu, Jul 15, 2010 at 4:33 PM, Justin > Riley > > wrote: > > > This is great news. 
Right now > StarCluster just takes advantage of > password-less ssh already being > installed and runs: > > $ ipcluster ssh --clusterfile > /path/to/cluster_file.py > > This works fine for now, however, having > SGE support would allow > ipcluster's load to be accounted for by > the queue. > > Is Satra on the list? I have experience > with SGE and could help > with the > code if needed. I can also help test > this functionality. > > ~Justin > > On 07/15/2010 03:34 PM, Fernando Perez > wrote: > > On Thu, Jul 15, 2010 at 10:34 AM, Brian > Granger > > wrote: > > Thanks for the post. You should > also know that it looks like > someone > is going to add native SGE > support to ipcluster for 0.10.1. > > > Yes, Satra and I went over this last > night in detail (thanks to > Brian > for the pointers), and he said he > might actually already have some > code for it. I suspect we'll get > this in soon. > > Cheers, > > f > > > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/ipython-dev > > > > > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > > > > > > -- > Brian E. Granger, Ph.D. > Assistant Professor of Physics > Cal Poly State University, San Luis Obispo > bgranger at calpoly.edu > ellisonbg at gmail.com > > > > > > > > From matthieu.brucher at gmail.com Sun Jul 18 14:24:41 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sun, 18 Jul 2010 20:24:41 +0200 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C43455F.1050508@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> Message-ID: Hi, I also prefer the first option, as it is the configuration I'm most confortable with. Besides, people may have this already configured. Matthieu 2010/7/18 Justin Riley : > Hi Matthieu, > > At least for the modifications I made, no not yet. This is exactly what I'm > asking about in the second paragraph of my response. The new SGE/PBS support > will work with multiple hosts assuming the ~/.ipython/security folder is > NFS-shared on the cluster. > > If that's not the case, then AFAIK we have two options: > > 1. scp the furl file from ~/.ipython/security to each host's > ~/.ipython/security folder. > > 2. put the contents of the furl file directly inside the job script > used to start the engines > > The first option relies on the user having password-less configured properly > to each node on the cluster. ipcluster would first need to scp the furl and > then launch the engines using PBS/SGE. > > The second option is the easiest approach given that it only requires SGE to > be installed, however, it's probably not the best idea to put the furl file > in the job script itself for security reasons. I'm curious to get opinions > on this. This would require slight code modifications. > > ~Justin > > On 07/18/2010 01:13 PM, Matthieu Brucher wrote: >> >> Hi, >> >> Does IPython support now sending engines to nodes that do not have the >> same $HOME as the main instance? This is what kept me from testing >> correctly IPython with LSF some months ago :| >> >> Matthieu >> >> 2010/7/18 Justin Riley: >>> >>> Hi Satra/Brian, >>> >>> I modified your code to use the job array feature of SGE. I've also made >>> it so that users don't need to specify --sge-script if they don't need a >>> custom SGE launch script. 
My guess is that most users will choose not to >>> specify --sge-script first and resort to using --sge-script when the >>> generated launch script no longer meets their needs. More details in the >>> git log here: >>> >>> http://github.com/jtriley/ipython/tree/0.10.1-sge >>> >>> Also, I need to test this, but I believe this code will fail if the >>> folder containing the furl file is not NFS-mounted on the SGE cluster. >>> Another option besides requiring NFS is to scp the furl file to each >>> host as is done in the ssh mode of ipcluster, however, this would >>> require password-less ssh to be configured properly (maybe not so bad). >>> Another option is to dump the generated furl file into the job script >>> itself. This has the advantage of only needing SGE installed but >>> certainly doesn't seem like the safest practice. Any thoughts on how to >>> approach this? >>> >>> Let me know what you think. >>> >>> ~Justin >>> >>> On 07/18/2010 12:05 AM, Brian Granger wrote: >>>> >>>> Is the array jobs feature what you want? >>>> >>>> http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs >>>> >>>> Brian >>>> >>>> On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger >>>> ?wrote: >>>>> >>>>> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh >>>>> ?wrote: >>>>>> >>>>>> hi , >>>>>> >>>>>> i've pushed my changes to: >>>>>> >>>>>> http://github.com/satra/ipython/tree/0.10.1-sge >>>>>> >>>>>> notes: >>>>>> >>>>>> 1. it starts cleanly. i can connect and execute things. when i kill >>>>>> using >>>>>> ctrl-c, the messages appear to indicate that everything shut down >>>>>> well. >>>>>> however, the sge ipengine jobs are still running. >>>>> >>>>> What version of Python and Twisted are you running? >>>>> >>>>>> 2. the pbs option appears to require mpi to be present. i don't think >>>>>> one >>>>>> can launch multiple engines using pbs without mpi or without the >>>>>> workaround >>>>>> i've applied to the sge engine. basically it submits an sge job for >>>>>> each >>>>>> engine that i want to run. i would love to know if a single job can >>>>>> launch >>>>>> multiple engines on a sge/pbs cluster without mpi. >>>>> >>>>> I think you are right that pbs needs to use mpirun/mpiexec to start >>>>> multiple engines using a single PBS job. ?I am not that familiar with >>>>> SGE, can you start mulitple processes without mpi and with just a >>>>> single SGE job? ?If so, let's try to get that working. >>>>> >>>>> Cheers, >>>>> >>>>> Brian >>>>> >>>>>> cheers, >>>>>> >>>>>> satra >>>>>> >>>>>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh >>>>>> ?wrote: >>>>>>> >>>>>>> hi justin, >>>>>>> >>>>>>> i hope to test it out tonight. from what fernando and i discussed, >>>>>>> this >>>>>>> should be relatively straightforward. once i'm done i'll push it to >>>>>>> my fork >>>>>>> of ipython and announce it here for others to test. >>>>>>> >>>>>>> cheers, >>>>>>> >>>>>>> satra >>>>>>> >>>>>>> >>>>>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin >>>>>>> Riley >>>>>>> wrote: >>>>>>>> >>>>>>>> This is great news. Right now StarCluster just takes advantage of >>>>>>>> password-less ssh already being installed and runs: >>>>>>>> >>>>>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>>>>>>> >>>>>>>> This works fine for now, however, having SGE support would allow >>>>>>>> ipcluster's load to be accounted for by the queue. >>>>>>>> >>>>>>>> Is Satra on the list? I have experience with SGE and could help with >>>>>>>> the >>>>>>>> code if needed. I can also help test this functionality. 
>>>>>>>> >>>>>>>> ~Justin >>>>>>>> >>>>>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>>>>>>>> >>>>>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian >>>>>>>>> Granger >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Thanks for the post. ?You should also know that it looks like >>>>>>>>>> someone >>>>>>>>>> is going to add native SGE support to ipcluster for 0.10.1. >>>>>>>>> >>>>>>>>> Yes, Satra and I went over this last night in detail (thanks to >>>>>>>>> Brian >>>>>>>>> for the pointers), and he said he might actually already have some >>>>>>>>> code for it. ?I suspect we'll get this in soon. >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> f >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> IPython-dev mailing list >>>>>>>> IPython-dev at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> IPython-dev mailing list >>>>>> IPython-dev at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Brian E. Granger, Ph.D. >>>>> Assistant Professor of Physics >>>>> Cal Poly State University, San Luis Obispo >>>>> bgranger at calpoly.edu >>>>> ellisonbg at gmail.com >>>>> >>>> >>>> >>>> >>> >>> _______________________________________________ >>> IPython-dev mailing list >>> IPython-dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>> >> >> >> > > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From justin.t.riley at gmail.com Sun Jul 18 15:05:16 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Sun, 18 Jul 2010 15:05:16 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> Message-ID: <4C43506C.8070907@gmail.com> Matthieu, I agree that password-less ssh is a common configuration on HPC clusters and it would be useful to have the option of using SSH to copy the furl file to each host before launching engines with SGE/PBS/LSF. I'll see about hacking this in when I get some more time. BTW, I just added experimental support for LSF to my fork. I can't test the code given that I don't have access to a LSF system but in theory it should work (again using job arrays) provided the ~/.ipython/security folder is shared. ~Justin On 07/18/2010 02:24 PM, Matthieu Brucher wrote: > Hi, > > I also prefer the first option, as it is the configuration I'm most > confortable with. Besides, people may have this already configured. > > Matthieu > > 2010/7/18 Justin Riley: >> Hi Matthieu, >> >> At least for the modifications I made, no not yet. This is exactly what I'm >> asking about in the second paragraph of my response. The new SGE/PBS support >> will work with multiple hosts assuming the ~/.ipython/security folder is >> NFS-shared on the cluster. >> >> If that's not the case, then AFAIK we have two options: >> >> 1. scp the furl file from ~/.ipython/security to each host's >> ~/.ipython/security folder. >> >> 2. put the contents of the furl file directly inside the job script >> used to start the engines >> >> The first option relies on the user having password-less configured properly >> to each node on the cluster. ipcluster would first need to scp the furl and >> then launch the engines using PBS/SGE. 
>> >> The second option is the easiest approach given that it only requires SGE to >> be installed, however, it's probably not the best idea to put the furl file >> in the job script itself for security reasons. I'm curious to get opinions >> on this. This would require slight code modifications. >> >> ~Justin >> >> On 07/18/2010 01:13 PM, Matthieu Brucher wrote: >>> >>> Hi, >>> >>> Does IPython support now sending engines to nodes that do not have the >>> same $HOME as the main instance? This is what kept me from testing >>> correctly IPython with LSF some months ago :| >>> >>> Matthieu >>> >>> 2010/7/18 Justin Riley: >>>> >>>> Hi Satra/Brian, >>>> >>>> I modified your code to use the job array feature of SGE. I've also made >>>> it so that users don't need to specify --sge-script if they don't need a >>>> custom SGE launch script. My guess is that most users will choose not to >>>> specify --sge-script first and resort to using --sge-script when the >>>> generated launch script no longer meets their needs. More details in the >>>> git log here: >>>> >>>> http://github.com/jtriley/ipython/tree/0.10.1-sge >>>> >>>> Also, I need to test this, but I believe this code will fail if the >>>> folder containing the furl file is not NFS-mounted on the SGE cluster. >>>> Another option besides requiring NFS is to scp the furl file to each >>>> host as is done in the ssh mode of ipcluster, however, this would >>>> require password-less ssh to be configured properly (maybe not so bad). >>>> Another option is to dump the generated furl file into the job script >>>> itself. This has the advantage of only needing SGE installed but >>>> certainly doesn't seem like the safest practice. Any thoughts on how to >>>> approach this? >>>> >>>> Let me know what you think. >>>> >>>> ~Justin >>>> >>>> On 07/18/2010 12:05 AM, Brian Granger wrote: >>>>> >>>>> Is the array jobs feature what you want? >>>>> >>>>> http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs >>>>> >>>>> Brian >>>>> >>>>> On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger >>>>> wrote: >>>>>> >>>>>> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh >>>>>> wrote: >>>>>>> >>>>>>> hi , >>>>>>> >>>>>>> i've pushed my changes to: >>>>>>> >>>>>>> http://github.com/satra/ipython/tree/0.10.1-sge >>>>>>> >>>>>>> notes: >>>>>>> >>>>>>> 1. it starts cleanly. i can connect and execute things. when i kill >>>>>>> using >>>>>>> ctrl-c, the messages appear to indicate that everything shut down >>>>>>> well. >>>>>>> however, the sge ipengine jobs are still running. >>>>>> >>>>>> What version of Python and Twisted are you running? >>>>>> >>>>>>> 2. the pbs option appears to require mpi to be present. i don't think >>>>>>> one >>>>>>> can launch multiple engines using pbs without mpi or without the >>>>>>> workaround >>>>>>> i've applied to the sge engine. basically it submits an sge job for >>>>>>> each >>>>>>> engine that i want to run. i would love to know if a single job can >>>>>>> launch >>>>>>> multiple engines on a sge/pbs cluster without mpi. >>>>>> >>>>>> I think you are right that pbs needs to use mpirun/mpiexec to start >>>>>> multiple engines using a single PBS job. I am not that familiar with >>>>>> SGE, can you start mulitple processes without mpi and with just a >>>>>> single SGE job? If so, let's try to get that working. 
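It can: an SGE job array starts N copies of one script in a single submission, which is what the 0.10.1-sge branch leans on. A minimal hand-rolled sketch -- not the script ipcluster itself generates; the file name and engine count here are arbitrary -- that builds such a submission and hands it to qsub might look like:

import subprocess

n_engines = 4
# '#$ -t 1-N' asks SGE to run N tasks of this one job, i.e. N engines;
# -V and -cwd carry over the submitting user's environment and directory.
job_script = '''#!/bin/sh
#$ -N ipengine
#$ -t 1-%d
#$ -V
#$ -cwd
ipengine
''' % n_engines

open('sge_engines.sh', 'w').write(job_script)
subprocess.check_call(['qsub', 'sge_engines.sh'])

Each array task simply runs ipengine, so no MPI is involved.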
>>>>>> >>>>>> Cheers, >>>>>> >>>>>> Brian >>>>>> >>>>>>> cheers, >>>>>>> >>>>>>> satra >>>>>>> >>>>>>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh >>>>>>> wrote: >>>>>>>> >>>>>>>> hi justin, >>>>>>>> >>>>>>>> i hope to test it out tonight. from what fernando and i discussed, >>>>>>>> this >>>>>>>> should be relatively straightforward. once i'm done i'll push it to >>>>>>>> my fork >>>>>>>> of ipython and announce it here for others to test. >>>>>>>> >>>>>>>> cheers, >>>>>>>> >>>>>>>> satra >>>>>>>> >>>>>>>> >>>>>>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin >>>>>>>> Riley >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> This is great news. Right now StarCluster just takes advantage of >>>>>>>>> password-less ssh already being installed and runs: >>>>>>>>> >>>>>>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>>>>>>>> >>>>>>>>> This works fine for now, however, having SGE support would allow >>>>>>>>> ipcluster's load to be accounted for by the queue. >>>>>>>>> >>>>>>>>> Is Satra on the list? I have experience with SGE and could help with >>>>>>>>> the >>>>>>>>> code if needed. I can also help test this functionality. >>>>>>>>> >>>>>>>>> ~Justin >>>>>>>>> >>>>>>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>>>>>>>>> >>>>>>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian >>>>>>>>>> Granger >>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Thanks for the post. You should also know that it looks like >>>>>>>>>>> someone >>>>>>>>>>> is going to add native SGE support to ipcluster for 0.10.1. >>>>>>>>>> >>>>>>>>>> Yes, Satra and I went over this last night in detail (thanks to >>>>>>>>>> Brian >>>>>>>>>> for the pointers), and he said he might actually already have some >>>>>>>>>> code for it. I suspect we'll get this in soon. >>>>>>>>>> >>>>>>>>>> Cheers, >>>>>>>>>> >>>>>>>>>> f >>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> IPython-dev mailing list >>>>>>>>> IPython-dev at scipy.org >>>>>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>>>>> >>>>>>> >>>>>>> >>>>>>> _______________________________________________ >>>>>>> IPython-dev mailing list >>>>>>> IPython-dev at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> Brian E. Granger, Ph.D. >>>>>> Assistant Professor of Physics >>>>>> Cal Poly State University, San Luis Obispo >>>>>> bgranger at calpoly.edu >>>>>> ellisonbg at gmail.com >>>>>> >>>>> >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> IPython-dev mailing list >>>> IPython-dev at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>> >>> >>> >>> >> >> > > > From benjaminrk at gmail.com Sun Jul 18 15:06:37 2010 From: benjaminrk at gmail.com (MinRK) Date: Sun, 18 Jul 2010 12:06:37 -0700 Subject: [IPython-dev] Engine Queue sockets Message-ID: I'm working on the controller, making pretty decent progress, but keep running into design questions. The current one: I have been thinking of the Controller-Engine connection as PAIR sockets, but it seems like it could also be just one XREP on the Controller and XREQs on the engines. Then the controller, rather than switching on actual sockets, switches on engine IDs, since XREP uses the first entry in send_multipart to determine the destination. For a crude task model, we could reverse those connections and allow XREQ on the controller to load balance. Of course, then we would lose any information about which engine is doing what until jobs complete. 
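To make the XREP idea concrete, here is a minimal pyzmq sketch of the addressing mechanics only (the port and the engine identity are invented, and this is not the controller code itself):

import zmq

ctx = zmq.Context()

# one XREP socket on the controller; every engine is an XREQ with a fixed identity
controller = ctx.socket(zmq.XREP)
controller.bind('tcp://127.0.0.1:5555')

engine = ctx.socket(zmq.XREQ)
engine.setsockopt(zmq.IDENTITY, 'engine-1')
engine.connect('tcp://127.0.0.1:5555')

# engine -> controller: XREP prepends the sender's identity frame
engine.send('ready')
ident, msg = controller.recv_multipart()

# controller -> one specific engine: the first frame picks the destination
controller.send_multipart([ident, 'execute: a=5'])
print engine.recv()

With this, the controller keeps a dict of engine identities instead of a list of PAIR sockets.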
Could there be an issue in terms of scaling for the controller to be creating thousands of PAIR sockets versus managing thousands of connections to one XREP socket? -MinRK Also: I'm in the #ipython IRC channel now, and am generally trying to be online there while I work on IPython stuff. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Mon Jul 19 00:21:01 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 18 Jul 2010 21:21:01 -0700 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> Message-ID: Statra, On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh wrote: > hi , > > i've pushed my changes to: > > http://github.com/satra/ipython/tree/0.10.1-sge This looks like a great start. I looked through it quickly just now and will follow up by looking at Justin's revisions. > notes: > > 1. it starts cleanly. i can connect and execute things. when i kill using > ctrl-c, the messages appear to indicate that everything shut down well. > however, the sge ipengine jobs are still running. I think this is a bug with Twisted on Python 2.6. I see this in other contexts as well. > 2. the pbs option appears to require mpi to be present. i don't think one > can launch multiple engines using pbs without mpi or without the workaround > i've applied to the sge engine. basically it submits an sge job for each > engine that i want to run. i would love to know if a single job can launch > multiple engines on a sge/pbs cluster without mpi. > > cheers, > > satra > > On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh wrote: >> >> hi justin, >> >> i hope to test it out tonight. from what fernando and i discussed, this >> should be relatively straightforward. once i'm done i'll push it to my fork >> of ipython and announce it here for others to test. >> >> cheers, >> >> satra >> >> >> On Thu, Jul 15, 2010 at 4:33 PM, Justin Riley >> wrote: >>> >>> This is great news. Right now StarCluster just takes advantage of >>> password-less ssh already being installed and runs: >>> >>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>> >>> This works fine for now, however, having SGE support would allow >>> ipcluster's load to be accounted for by the queue. >>> >>> Is Satra on the list? I have experience with SGE and could help with the >>> code if needed. I can also help test this functionality. >>> >>> ~Justin >>> >>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>> > On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger >>> > wrote: >>> >> Thanks for the post. ?You should also know that it looks like someone >>> >> is going to add native SGE support to ipcluster for 0.10.1. >>> > >>> > Yes, Satra and I went over this last night in detail (thanks to Brian >>> > for the pointers), and he said he might actually already have some >>> > code for it. ?I suspect we'll get this in soon. >>> > >>> > Cheers, >>> > >>> > f >>> >>> _______________________________________________ >>> IPython-dev mailing list >>> IPython-dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/ipython-dev >> > > > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > > -- Brian E. Granger, Ph.D. 
Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From ellisonbg at gmail.com Mon Jul 19 00:25:01 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 18 Jul 2010 21:25:01 -0700 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C42B09F.50106@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> Message-ID: Justin, On Sun, Jul 18, 2010 at 12:43 AM, Justin Riley wrote: > Hi Satra/Brian, > > I modified your code to use the job array feature of SGE. I've also made it > so that users don't need to specify --sge-script if they don't need a custom > SGE launch script. My guess is that most users will choose not to specify > --sge-script first and resort to using --sge-script when the generated > launch script no longer meets their needs. More details in the git log here: Very nice. I will do a code review in a few minutes. > http://github.com/jtriley/ipython/tree/0.10.1-sge > > Also, I need to test this, but I believe this code will fail if the folder > containing the furl file is not NFS-mounted on the SGE cluster. Another > option besides requiring NFS is to scp the furl file to each host as is done > in the ssh mode of ipcluster, however, this would require password-less ssh > to be configured properly (maybe not so bad). Another option is to dump the > generated furl file into the job script itself. This has the advantage of > only needing SGE installed but certainly doesn't seem like the safest > practice. Any thoughts on how to approach this? Currently we do assume that the user has a shared $HOME directory that is used to propagate the furl files. There are obviously many ways of setting up a cluster, but this is a common approach. I think that should be the default. The idea of using scp to copy the furl files is obviously another good option and if we can make it work with the existing approach that would be great. Just a warning though. The version of ipcluster in 0.10.1 has been completely changed in 0.11 to support multiple cluster profiles and the new configuration system. For now the new ipcluster is based on twisted, but before we release, I think we will get rid of Twisted. All that to say that I don't think it is worth putting in too much time into the ipcluster for 0.10.1. Just enough to get it working OK with PBS and SGE is a good target. Brian > Let me know what you think. > > ~Justin > > On 07/18/2010 12:05 AM, Brian Granger wrote: >> >> Is the array jobs feature what you want? >> >> http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs >> >> Brian >> >> On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger >> ?wrote: >>> >>> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh ?wrote: >>>> >>>> hi , >>>> >>>> i've pushed my changes to: >>>> >>>> http://github.com/satra/ipython/tree/0.10.1-sge >>>> >>>> notes: >>>> >>>> 1. it starts cleanly. i can connect and execute things. when i kill >>>> using >>>> ctrl-c, the messages appear to indicate that everything shut down well. >>>> however, the sge ipengine jobs are still running. >>> >>> What version of Python and Twisted are you running? >>> >>>> 2. the pbs option appears to require mpi to be present. i don't think >>>> one >>>> can launch multiple engines using pbs without mpi or without the >>>> workaround >>>> i've applied to the sge engine. basically it submits an sge job for each >>>> engine that i want to run. 
i would love to know if a single job can >>>> launch >>>> multiple engines on a sge/pbs cluster without mpi. >>> >>> I think you are right that pbs needs to use mpirun/mpiexec to start >>> multiple engines using a single PBS job. ?I am not that familiar with >>> SGE, can you start mulitple processes without mpi and with just a >>> single SGE job? ?If so, let's try to get that working. >>> >>> Cheers, >>> >>> Brian >>> >>>> cheers, >>>> >>>> satra >>>> >>>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh ?wrote: >>>>> >>>>> hi justin, >>>>> >>>>> i hope to test it out tonight. from what fernando and i discussed, this >>>>> should be relatively straightforward. once i'm done i'll push it to my >>>>> fork >>>>> of ipython and announce it here for others to test. >>>>> >>>>> cheers, >>>>> >>>>> satra >>>>> >>>>> >>>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin Riley >>>>> wrote: >>>>>> >>>>>> This is great news. Right now StarCluster just takes advantage of >>>>>> password-less ssh already being installed and runs: >>>>>> >>>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>>>>> >>>>>> This works fine for now, however, having SGE support would allow >>>>>> ipcluster's load to be accounted for by the queue. >>>>>> >>>>>> Is Satra on the list? I have experience with SGE and could help with >>>>>> the >>>>>> code if needed. I can also help test this functionality. >>>>>> >>>>>> ~Justin >>>>>> >>>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>>>>>> >>>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger >>>>>>> wrote: >>>>>>>> >>>>>>>> Thanks for the post. ?You should also know that it looks like >>>>>>>> someone >>>>>>>> is going to add native SGE support to ipcluster for 0.10.1. >>>>>>> >>>>>>> Yes, Satra and I went over this last night in detail (thanks to Brian >>>>>>> for the pointers), and he said he might actually already have some >>>>>>> code for it. ?I suspect we'll get this in soon. >>>>>>> >>>>>>> Cheers, >>>>>>> >>>>>>> f >>>>>> >>>>>> _______________________________________________ >>>>>> IPython-dev mailing list >>>>>> IPython-dev at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>> >>>> >>>> >>>> _______________________________________________ >>>> IPython-dev mailing list >>>> IPython-dev at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>> >>>> >>> >>> >>> >>> -- >>> Brian E. Granger, Ph.D. >>> Assistant Professor of Physics >>> Cal Poly State University, San Luis Obispo >>> bgranger at calpoly.edu >>> ellisonbg at gmail.com >>> >> >> >> > > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From ellisonbg at gmail.com Mon Jul 19 00:26:13 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 18 Jul 2010 21:26:13 -0700 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C4332C5.5050006@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C4332C5.5050006@gmail.com> Message-ID: On Sun, Jul 18, 2010 at 9:58 AM, Justin Riley wrote: > Turns out that torque/pbs also support job arrays. I've updated my > 0.10.1-sge branch with PBS job array support. Works well with torque 2.4.6. > Also tested SGE support against 6.2u3. Very nice! > Since the code is extremely similar between PBS/SGE I decided to update the > BatchEngineSet base class to handle the core job array logic. Given that > PBS/SGE are the only subclasses I figured this was OK. 
If not, should be > easy to break it out again. Yes, these definitely makes sense. Cheers, Brian > ~Justin > > On 07/18/2010 03:43 AM, Justin Riley wrote: >> >> Hi Satra/Brian, >> >> I modified your code to use the job array feature of SGE. I've also made >> it so that users don't need to specify --sge-script if they don't need a >> custom SGE launch script. My guess is that most users will choose not to >> specify --sge-script first and resort to using --sge-script when the >> generated launch script no longer meets their needs. More details in the >> git log here: >> >> http://github.com/jtriley/ipython/tree/0.10.1-sge >> >> Also, I need to test this, but I believe this code will fail if the >> folder containing the furl file is not NFS-mounted on the SGE cluster. >> Another option besides requiring NFS is to scp the furl file to each >> host as is done in the ssh mode of ipcluster, however, this would >> require password-less ssh to be configured properly (maybe not so bad). >> Another option is to dump the generated furl file into the job script >> itself. This has the advantage of only needing SGE installed but >> certainly doesn't seem like the safest practice. Any thoughts on how to >> approach this? >> >> Let me know what you think. >> >> ~Justin >> >> On 07/18/2010 12:05 AM, Brian Granger wrote: >>> >>> Is the array jobs feature what you want? >>> >>> http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs >>> >>> Brian >>> >>> On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger >>> wrote: >>>> >>>> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh wrote: >>>>> >>>>> hi , >>>>> >>>>> i've pushed my changes to: >>>>> >>>>> http://github.com/satra/ipython/tree/0.10.1-sge >>>>> >>>>> notes: >>>>> >>>>> 1. it starts cleanly. i can connect and execute things. when i kill >>>>> using >>>>> ctrl-c, the messages appear to indicate that everything shut down well. >>>>> however, the sge ipengine jobs are still running. >>>> >>>> What version of Python and Twisted are you running? >>>> >>>>> 2. the pbs option appears to require mpi to be present. i don't >>>>> think one >>>>> can launch multiple engines using pbs without mpi or without the >>>>> workaround >>>>> i've applied to the sge engine. basically it submits an sge job for >>>>> each >>>>> engine that i want to run. i would love to know if a single job can >>>>> launch >>>>> multiple engines on a sge/pbs cluster without mpi. >>>> >>>> I think you are right that pbs needs to use mpirun/mpiexec to start >>>> multiple engines using a single PBS job. I am not that familiar with >>>> SGE, can you start mulitple processes without mpi and with just a >>>> single SGE job? If so, let's try to get that working. >>>> >>>> Cheers, >>>> >>>> Brian >>>> >>>>> cheers, >>>>> >>>>> satra >>>>> >>>>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh wrote: >>>>>> >>>>>> hi justin, >>>>>> >>>>>> i hope to test it out tonight. from what fernando and i discussed, >>>>>> this >>>>>> should be relatively straightforward. once i'm done i'll push it to >>>>>> my fork >>>>>> of ipython and announce it here for others to test. >>>>>> >>>>>> cheers, >>>>>> >>>>>> satra >>>>>> >>>>>> >>>>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin >>>>>> Riley >>>>>> wrote: >>>>>>> >>>>>>> This is great news. 
Right now StarCluster just takes advantage of >>>>>>> password-less ssh already being installed and runs: >>>>>>> >>>>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>>>>>> >>>>>>> This works fine for now, however, having SGE support would allow >>>>>>> ipcluster's load to be accounted for by the queue. >>>>>>> >>>>>>> Is Satra on the list? I have experience with SGE and could help >>>>>>> with the >>>>>>> code if needed. I can also help test this functionality. >>>>>>> >>>>>>> ~Justin >>>>>>> >>>>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>>>>>>> >>>>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> Thanks for the post. You should also know that it looks like >>>>>>>>> someone >>>>>>>>> is going to add native SGE support to ipcluster for 0.10.1. >>>>>>>> >>>>>>>> Yes, Satra and I went over this last night in detail (thanks to >>>>>>>> Brian >>>>>>>> for the pointers), and he said he might actually already have some >>>>>>>> code for it. I suspect we'll get this in soon. >>>>>>>> >>>>>>>> Cheers, >>>>>>>> >>>>>>>> f >>>>>>> >>>>>>> _______________________________________________ >>>>>>> IPython-dev mailing list >>>>>>> IPython-dev at scipy.org >>>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>>> >>>>> >>>>> >>>>> _______________________________________________ >>>>> IPython-dev mailing list >>>>> IPython-dev at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>> >>>>> >>>> >>>> >>>> >>>> -- >>>> Brian E. Granger, Ph.D. >>>> Assistant Professor of Physics >>>> Cal Poly State University, San Luis Obispo >>>> bgranger at calpoly.edu >>>> ellisonbg at gmail.com >>>> >>> >>> >>> >> > > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From ellisonbg at gmail.com Mon Jul 19 00:28:11 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 18 Jul 2010 21:28:11 -0700 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C4333B7.5080807@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C4332C5.5050006@gmail.com> <4C4333B7.5080807@gmail.com> Message-ID: Justin, On Sun, Jul 18, 2010 at 10:02 AM, Justin Riley wrote: > Forgot to mention, in my fork PBS now automatically generates a launch > script as well if one is not specified. So, assuming you have either SGE or > Torque/PBS working it *should* be as simple as: > > $ ipcluster sge -n 4 > > or > > $ ipcluster pbs -n 4 Great, this is definitely how it should work. As Satra mentions though, let's make it so the user can specify the queue name to use at the command line. That is a pretty common option that many users will want to modify. > You can of course still pass the --sge-script/--pbs-script options but the > user is no longer required to create a launch script themselves. Great. Brian > ~Justin > > On 07/18/2010 12:58 PM, Justin Riley wrote: >> >> Turns out that torque/pbs also support job arrays. I've updated my >> 0.10.1-sge branch with PBS job array support. Works well with torque >> 2.4.6. Also tested SGE support against 6.2u3. >> >> Since the code is extremely similar between PBS/SGE I decided to update >> the BatchEngineSet base class to handle the core job array logic. Given >> that PBS/SGE are the only subclasses I figured this was OK. If not, >> should be easy to break it out again. 
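For readers who have not used job arrays: a single submission script runs once per array task, which is how one qsub can stand in for N separate ipengine launches without any MPI involvement. A rough sketch of the idea in Python follows; the template text, file name, queue name, and bare ipengine invocation are illustrative assumptions, not the actual code in the 0.10.1-sge branch:

import subprocess

n_engines = 4
queue = "all.q"   # placeholder; cf. the --queue option discussed in this thread

# One script, one qsub: SGE runs it n_engines times as tasks 1..n_engines.
# Torque/PBS is analogous, using "#PBS -t 1-%d" directives and its own qsub.
script = """#!/bin/sh
#$ -V -cwd
#$ -q %(queue)s
#$ -t 1-%(n)d
ipengine
""" % dict(queue=queue, n=n_engines)

open("sge_engine_launcher.sh", "w").write(script)
subprocess.check_call(["qsub", "sge_engine_launcher.sh"])

Each task can tell itself apart through $SGE_TASK_ID (PBS_ARRAYID under Torque), e.g. for per-engine log files; a real launcher would also pass logging and furl options to ipengine.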
>> >> ~Justin >> >> On 07/18/2010 03:43 AM, Justin Riley wrote: >>> >>> Hi Satra/Brian, >>> >>> I modified your code to use the job array feature of SGE. I've also made >>> it so that users don't need to specify --sge-script if they don't need a >>> custom SGE launch script. My guess is that most users will choose not to >>> specify --sge-script first and resort to using --sge-script when the >>> generated launch script no longer meets their needs. More details in the >>> git log here: >>> >>> http://github.com/jtriley/ipython/tree/0.10.1-sge >>> >>> Also, I need to test this, but I believe this code will fail if the >>> folder containing the furl file is not NFS-mounted on the SGE cluster. >>> Another option besides requiring NFS is to scp the furl file to each >>> host as is done in the ssh mode of ipcluster, however, this would >>> require password-less ssh to be configured properly (maybe not so bad). >>> Another option is to dump the generated furl file into the job script >>> itself. This has the advantage of only needing SGE installed but >>> certainly doesn't seem like the safest practice. Any thoughts on how to >>> approach this? >>> >>> Let me know what you think. >>> >>> ~Justin >>> >>> On 07/18/2010 12:05 AM, Brian Granger wrote: >>>> >>>> Is the array jobs feature what you want? >>>> >>>> http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs >>>> >>>> Brian >>>> >>>> On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger >>>> wrote: >>>>> >>>>> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh wrote: >>>>>> >>>>>> hi , >>>>>> >>>>>> i've pushed my changes to: >>>>>> >>>>>> http://github.com/satra/ipython/tree/0.10.1-sge >>>>>> >>>>>> notes: >>>>>> >>>>>> 1. it starts cleanly. i can connect and execute things. when i kill >>>>>> using >>>>>> ctrl-c, the messages appear to indicate that everything shut down >>>>>> well. >>>>>> however, the sge ipengine jobs are still running. >>>>> >>>>> What version of Python and Twisted are you running? >>>>> >>>>>> 2. the pbs option appears to require mpi to be present. i don't >>>>>> think one >>>>>> can launch multiple engines using pbs without mpi or without the >>>>>> workaround >>>>>> i've applied to the sge engine. basically it submits an sge job for >>>>>> each >>>>>> engine that i want to run. i would love to know if a single job can >>>>>> launch >>>>>> multiple engines on a sge/pbs cluster without mpi. >>>>> >>>>> I think you are right that pbs needs to use mpirun/mpiexec to start >>>>> multiple engines using a single PBS job. I am not that familiar with >>>>> SGE, can you start mulitple processes without mpi and with just a >>>>> single SGE job? If so, let's try to get that working. >>>>> >>>>> Cheers, >>>>> >>>>> Brian >>>>> >>>>>> cheers, >>>>>> >>>>>> satra >>>>>> >>>>>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh wrote: >>>>>>> >>>>>>> hi justin, >>>>>>> >>>>>>> i hope to test it out tonight. from what fernando and i discussed, >>>>>>> this >>>>>>> should be relatively straightforward. once i'm done i'll push it to >>>>>>> my fork >>>>>>> of ipython and announce it here for others to test. >>>>>>> >>>>>>> cheers, >>>>>>> >>>>>>> satra >>>>>>> >>>>>>> >>>>>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin >>>>>>> Riley >>>>>>> wrote: >>>>>>>> >>>>>>>> This is great news. 
Right now StarCluster just takes advantage of >>>>>>>> password-less ssh already being installed and runs: >>>>>>>> >>>>>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>>>>>>> >>>>>>>> This works fine for now, however, having SGE support would allow >>>>>>>> ipcluster's load to be accounted for by the queue. >>>>>>>> >>>>>>>> Is Satra on the list? I have experience with SGE and could help >>>>>>>> with the >>>>>>>> code if needed. I can also help test this functionality. >>>>>>>> >>>>>>>> ~Justin >>>>>>>> >>>>>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>>>>>>>> >>>>>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian >>>>>>>>> Granger >>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Thanks for the post. You should also know that it looks like >>>>>>>>>> someone >>>>>>>>>> is going to add native SGE support to ipcluster for 0.10.1. >>>>>>>>> >>>>>>>>> Yes, Satra and I went over this last night in detail (thanks to >>>>>>>>> Brian >>>>>>>>> for the pointers), and he said he might actually already have some >>>>>>>>> code for it. I suspect we'll get this in soon. >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> f >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> IPython-dev mailing list >>>>>>>> IPython-dev at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> IPython-dev mailing list >>>>>> IPython-dev at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Brian E. Granger, Ph.D. >>>>> Assistant Professor of Physics >>>>> Cal Poly State University, San Luis Obispo >>>>> bgranger at calpoly.edu >>>>> ellisonbg at gmail.com >>>>> >>>> >>>> >>>> >>> >> > > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From ellisonbg at gmail.com Mon Jul 19 00:32:21 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 18 Jul 2010 21:32:21 -0700 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C43455F.1050508@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> Message-ID: On Sun, Jul 18, 2010 at 11:18 AM, Justin Riley wrote: > Hi Matthieu, > > At least for the modifications I made, no not yet. This is exactly what > I'm asking about in the second paragraph of my response. The new SGE/PBS > support will work with multiple hosts assuming the ~/.ipython/security > folder is NFS-shared on the cluster. Without mpi being required as I understand it. > If that's not the case, then AFAIK we have two options: > > 1. scp the furl file from ~/.ipython/security to each host's > ~/.ipython/security folder. > > 2. put the contents of the furl file directly inside the job script > used to start the engines This is not that bad of an idea. Remember that the furl file the engine uses is only between the engines and controller and this connection is not that vulnerable. My only question is who can see the script? I don't know PBS/SGE well enough to know where the script ends up and with what permissions. > The first option relies on the user having password-less configured > properly to each node on the cluster. ipcluster would first need to scp > the furl and then launch the engines using PBS/SGE. 
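As a rough illustration of the first option (copy the furl out, then submit), where the hostnames, file paths, and the use of plain scp/qsub are assumptions for the sketch rather than anything in the branch:

import os
import subprocess

furl = os.path.expanduser("~/.ipython/security/ipcontroller-engine.furl")
nodes = ["node001", "node002", "node003"]   # hypothetical engine hosts

# Copy the engine furl into each host's ~/.ipython/security first
# (relies on password-less ssh being configured, as noted above)...
for node in nodes:
    subprocess.check_call(["scp", furl, "%s:.ipython/security/" % node])

# ...then hand the engine launch script to the batch system as usual.
subprocess.check_call(["qsub", "sge_engine_launcher.sh"])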
> > The second option is the easiest approach given that it only requires > SGE to be installed, however, it's probably not the best idea to put the > furl file in the job script itself for security reasons. I'm curious to > get opinions on this. This would require slight code modifications. Do you know anything about what SGE/PBS does with the script? I honestly think this might not be a bad idea. But, again, maybe for 0.10.1 this is not worth the effort because things will change so incredibly much with 0.11. Brian > ~Justin > > On 07/18/2010 01:13 PM, Matthieu Brucher wrote: >> Hi, >> >> Does IPython support now sending engines to nodes that do not have the >> same $HOME as the main instance? This is what kept me from testing >> correctly IPython with LSF some months ago :| >> >> Matthieu >> >> 2010/7/18 Justin Riley: >>> Hi Satra/Brian, >>> >>> I modified your code to use the job array feature of SGE. I've also made >>> it so that users don't need to specify --sge-script if they don't need a >>> custom SGE launch script. My guess is that most users will choose not to >>> specify --sge-script first and resort to using --sge-script when the >>> generated launch script no longer meets their needs. More details in the >>> git log here: >>> >>> http://github.com/jtriley/ipython/tree/0.10.1-sge >>> >>> Also, I need to test this, but I believe this code will fail if the >>> folder containing the furl file is not NFS-mounted on the SGE cluster. >>> Another option besides requiring NFS is to scp the furl file to each >>> host as is done in the ssh mode of ipcluster, however, this would >>> require password-less ssh to be configured properly (maybe not so bad). >>> Another option is to dump the generated furl file into the job script >>> itself. This has the advantage of only needing SGE installed but >>> certainly doesn't seem like the safest practice. Any thoughts on how to >>> approach this? >>> >>> Let me know what you think. >>> >>> ~Justin >>> >>> On 07/18/2010 12:05 AM, Brian Granger wrote: >>>> Is the array jobs feature what you want? >>>> >>>> http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs >>>> >>>> Brian >>>> >>>> On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger ? ?wrote: >>>>> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh ? ?wrote: >>>>>> hi , >>>>>> >>>>>> i've pushed my changes to: >>>>>> >>>>>> http://github.com/satra/ipython/tree/0.10.1-sge >>>>>> >>>>>> notes: >>>>>> >>>>>> 1. it starts cleanly. i can connect and execute things. when i kill using >>>>>> ctrl-c, the messages appear to indicate that everything shut down well. >>>>>> however, the sge ipengine jobs are still running. >>>>> >>>>> What version of Python and Twisted are you running? >>>>> >>>>>> 2. the pbs option appears to require mpi to be present. i don't think one >>>>>> can launch multiple engines using pbs without mpi or without the workaround >>>>>> i've applied to the sge engine. basically it submits an sge job for each >>>>>> engine that i want to run. i would love to know if a single job can launch >>>>>> multiple engines on a sge/pbs cluster without mpi. >>>>> >>>>> I think you are right that pbs needs to use mpirun/mpiexec to start >>>>> multiple engines using a single PBS job. ?I am not that familiar with >>>>> SGE, can you start mulitple processes without mpi and with just a >>>>> single SGE job? ?If so, let's try to get that working. >>>>> >>>>> Cheers, >>>>> >>>>> Brian >>>>> >>>>>> cheers, >>>>>> >>>>>> satra >>>>>> >>>>>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh ? 
?wrote: >>>>>>> >>>>>>> hi justin, >>>>>>> >>>>>>> i hope to test it out tonight. from what fernando and i discussed, this >>>>>>> should be relatively straightforward. once i'm done i'll push it to my fork >>>>>>> of ipython and announce it here for others to test. >>>>>>> >>>>>>> cheers, >>>>>>> >>>>>>> satra >>>>>>> >>>>>>> >>>>>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin Riley >>>>>>> wrote: >>>>>>>> >>>>>>>> This is great news. Right now StarCluster just takes advantage of >>>>>>>> password-less ssh already being installed and runs: >>>>>>>> >>>>>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>>>>>>> >>>>>>>> This works fine for now, however, having SGE support would allow >>>>>>>> ipcluster's load to be accounted for by the queue. >>>>>>>> >>>>>>>> Is Satra on the list? I have experience with SGE and could help with the >>>>>>>> code if needed. I can also help test this functionality. >>>>>>>> >>>>>>>> ~Justin >>>>>>>> >>>>>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>>>>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger >>>>>>>>> wrote: >>>>>>>>>> Thanks for the post. ?You should also know that it looks like someone >>>>>>>>>> is going to add native SGE support to ipcluster for 0.10.1. >>>>>>>>> >>>>>>>>> Yes, Satra and I went over this last night in detail (thanks to Brian >>>>>>>>> for the pointers), and he said he might actually already have some >>>>>>>>> code for it. ?I suspect we'll get this in soon. >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> f >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> IPython-dev mailing list >>>>>>>> IPython-dev at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> IPython-dev mailing list >>>>>> IPython-dev at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Brian E. Granger, Ph.D. >>>>> Assistant Professor of Physics >>>>> Cal Poly State University, San Luis Obispo >>>>> bgranger at calpoly.edu >>>>> ellisonbg at gmail.com >>>>> >>>> >>>> >>>> >>> >>> _______________________________________________ >>> IPython-dev mailing list >>> IPython-dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>> >> >> >> > > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From ellisonbg at gmail.com Mon Jul 19 01:06:31 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 18 Jul 2010 22:06:31 -0700 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C43455F.1050508@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> Message-ID: Justin, Here is a quick code review: * I like the design of the BatchEngineSet. This will be easy to port to 0.11. * I think if we are going to have default submission templates, we need to expose the queue name to the command line. This shouldn't be too tough. * Have you tested this with Python 2.6. I saw that you mentioned that the engines were shutting down cleanly now. What did you do to fix that? I am even running into that in 0.11 so any info you can provide would be helpful. 
* For now, let's stick with the assumption of a shared $HOME for the furl files. * The biggest thing is if people can test this thoroughly. I don't have SGE/PBS/LSF access right now, so it is a bit difficult for me to help. I have a cluster coming later in the summer, but it is not here yet. Once people have tested it well and are satisfied with it, let's merge it. * If we can update the documentation about how the PBS/SGE support works that would be great. The file is here: http://github.com/jtriley/ipython/blob/8fef6d80ee4f69898351653b773029b36e118a64/docs/source/parallel/parallel_process.txt Once these small changes have been made and everyone has tested, me can merge it for the 0.10.1 release. Thanks for doing this work Justin and Satra! It is fantastic! Just so you all know where this is going in 0.11: * We are going to get rid of using Twisted in ipcluster. This means we have to re-write the process management stuff to use things like popen. * We have a new configuration system in 0.11. This allows users to maintain cluster profiles that are a set of configuration files for a particular cluster setup. This makes it easy for a user to have multiple clusters configured, which they can then start by name. The logging, security, etc. is also different for each cluster profile. * It will be quite a bit of work to get everything working in 0.11, so I am glad we are getting good PBS/SGE support in 0.10.1. Cheers, Brian On Sun, Jul 18, 2010 at 11:18 AM, Justin Riley wrote: > Hi Matthieu, > > At least for the modifications I made, no not yet. This is exactly what > I'm asking about in the second paragraph of my response. The new SGE/PBS > support will work with multiple hosts assuming the ~/.ipython/security > folder is NFS-shared on the cluster. > > If that's not the case, then AFAIK we have two options: > > 1. scp the furl file from ~/.ipython/security to each host's > ~/.ipython/security folder. > > 2. put the contents of the furl file directly inside the job script > used to start the engines > > The first option relies on the user having password-less configured > properly to each node on the cluster. ipcluster would first need to scp > the furl and then launch the engines using PBS/SGE. > > The second option is the easiest approach given that it only requires > SGE to be installed, however, it's probably not the best idea to put the > furl file in the job script itself for security reasons. I'm curious to > get opinions on this. This would require slight code modifications. > > ~Justin > > On 07/18/2010 01:13 PM, Matthieu Brucher wrote: >> Hi, >> >> Does IPython support now sending engines to nodes that do not have the >> same $HOME as the main instance? This is what kept me from testing >> correctly IPython with LSF some months ago :| >> >> Matthieu >> >> 2010/7/18 Justin Riley: >>> Hi Satra/Brian, >>> >>> I modified your code to use the job array feature of SGE. I've also made >>> it so that users don't need to specify --sge-script if they don't need a >>> custom SGE launch script. My guess is that most users will choose not to >>> specify --sge-script first and resort to using --sge-script when the >>> generated launch script no longer meets their needs. More details in the >>> git log here: >>> >>> http://github.com/jtriley/ipython/tree/0.10.1-sge >>> >>> Also, I need to test this, but I believe this code will fail if the >>> folder containing the furl file is not NFS-mounted on the SGE cluster. 
>>> Another option besides requiring NFS is to scp the furl file to each >>> host as is done in the ssh mode of ipcluster, however, this would >>> require password-less ssh to be configured properly (maybe not so bad). >>> Another option is to dump the generated furl file into the job script >>> itself. This has the advantage of only needing SGE installed but >>> certainly doesn't seem like the safest practice. Any thoughts on how to >>> approach this? >>> >>> Let me know what you think. >>> >>> ~Justin >>> >>> On 07/18/2010 12:05 AM, Brian Granger wrote: >>>> Is the array jobs feature what you want? >>>> >>>> http://wikis.sun.com/display/gridengine62u6/Submitting+Jobs >>>> >>>> Brian >>>> >>>> On Sat, Jul 17, 2010 at 9:00 PM, Brian Granger ? ?wrote: >>>>> On Sat, Jul 17, 2010 at 6:23 AM, Satrajit Ghosh ? ?wrote: >>>>>> hi , >>>>>> >>>>>> i've pushed my changes to: >>>>>> >>>>>> http://github.com/satra/ipython/tree/0.10.1-sge >>>>>> >>>>>> notes: >>>>>> >>>>>> 1. it starts cleanly. i can connect and execute things. when i kill using >>>>>> ctrl-c, the messages appear to indicate that everything shut down well. >>>>>> however, the sge ipengine jobs are still running. >>>>> >>>>> What version of Python and Twisted are you running? >>>>> >>>>>> 2. the pbs option appears to require mpi to be present. i don't think one >>>>>> can launch multiple engines using pbs without mpi or without the workaround >>>>>> i've applied to the sge engine. basically it submits an sge job for each >>>>>> engine that i want to run. i would love to know if a single job can launch >>>>>> multiple engines on a sge/pbs cluster without mpi. >>>>> >>>>> I think you are right that pbs needs to use mpirun/mpiexec to start >>>>> multiple engines using a single PBS job. ?I am not that familiar with >>>>> SGE, can you start mulitple processes without mpi and with just a >>>>> single SGE job? ?If so, let's try to get that working. >>>>> >>>>> Cheers, >>>>> >>>>> Brian >>>>> >>>>>> cheers, >>>>>> >>>>>> satra >>>>>> >>>>>> On Thu, Jul 15, 2010 at 8:55 PM, Satrajit Ghosh ? ?wrote: >>>>>>> >>>>>>> hi justin, >>>>>>> >>>>>>> i hope to test it out tonight. from what fernando and i discussed, this >>>>>>> should be relatively straightforward. once i'm done i'll push it to my fork >>>>>>> of ipython and announce it here for others to test. >>>>>>> >>>>>>> cheers, >>>>>>> >>>>>>> satra >>>>>>> >>>>>>> >>>>>>> On Thu, Jul 15, 2010 at 4:33 PM, Justin Riley >>>>>>> wrote: >>>>>>>> >>>>>>>> This is great news. Right now StarCluster just takes advantage of >>>>>>>> password-less ssh already being installed and runs: >>>>>>>> >>>>>>>> $ ipcluster ssh --clusterfile /path/to/cluster_file.py >>>>>>>> >>>>>>>> This works fine for now, however, having SGE support would allow >>>>>>>> ipcluster's load to be accounted for by the queue. >>>>>>>> >>>>>>>> Is Satra on the list? I have experience with SGE and could help with the >>>>>>>> code if needed. I can also help test this functionality. >>>>>>>> >>>>>>>> ~Justin >>>>>>>> >>>>>>>> On 07/15/2010 03:34 PM, Fernando Perez wrote: >>>>>>>>> On Thu, Jul 15, 2010 at 10:34 AM, Brian Granger >>>>>>>>> wrote: >>>>>>>>>> Thanks for the post. ?You should also know that it looks like someone >>>>>>>>>> is going to add native SGE support to ipcluster for 0.10.1. >>>>>>>>> >>>>>>>>> Yes, Satra and I went over this last night in detail (thanks to Brian >>>>>>>>> for the pointers), and he said he might actually already have some >>>>>>>>> code for it. ?I suspect we'll get this in soon. 
>>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> >>>>>>>>> f >>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> IPython-dev mailing list >>>>>>>> IPython-dev at scipy.org >>>>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>>>> >>>>>> >>>>>> >>>>>> _______________________________________________ >>>>>> IPython-dev mailing list >>>>>> IPython-dev at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>>>>> >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> Brian E. Granger, Ph.D. >>>>> Assistant Professor of Physics >>>>> Cal Poly State University, San Luis Obispo >>>>> bgranger at calpoly.edu >>>>> ellisonbg at gmail.com >>>>> >>>> >>>> >>>> >>> >>> _______________________________________________ >>> IPython-dev mailing list >>> IPython-dev at scipy.org >>> http://mail.scipy.org/mailman/listinfo/ipython-dev >>> >> >> >> > > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From ellisonbg at gmail.com Mon Jul 19 01:16:57 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 18 Jul 2010 22:16:57 -0700 Subject: [IPython-dev] subprocess and Python 2.6 Message-ID: Hi, In IPython 0.11, we will be moving away from Twisted in many cases. One of the biggest areas we use Twisted in 0.10 is for cross platform process management. In Python 2.5 and below, subprocess.Popen objects did not have a kill or terminate method and os.kill didn't work on Windows. This is one of the big reasons we were using Twisted for process management. You could still kill a process on Windows, but it took some hacks. With Python 2.6, Popen objects have a kill and terminate method that will work with Windows. This will make it much easier to transition away from using Twisted for process management. BUT, this would mean that Python 2.5 users are left in the dark. If we want to keep 2.5 support in 0.11, we will need to spend some time thinking about this issue. Thoughts? Cheers, Brian -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From gael.varoquaux at normalesup.org Mon Jul 19 01:20:26 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Mon, 19 Jul 2010 07:20:26 +0200 Subject: [IPython-dev] subprocess and Python 2.6 In-Reply-To: References: Message-ID: <20100719052026.GA29336@phare.normalesup.org> On Sun, Jul 18, 2010 at 10:16:57PM -0700, Brian Granger wrote: > With Python 2.6, Popen objects have a kill and terminate method that > will work with Windows. This will make it much easier to transition > away from using Twisted for process management. BUT, this would mean > that Python 2.5 users are left in the dark. If we want to keep 2.5 > support in 0.11, we will need to spend some time thinking about this > issue. Thoughts? Maybe this could come in handy: http://github.com/ipython/ipython/tree/master/IPython/frontend/process/ in particular 'killable_process.py', and the file 'winprocess.py' it depends on. 
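For reference, the 2.6-and-later API under discussion looks like this; a minimal stand-alone sketch in which the sleeping child is just a stand-in for an engine or controller process:

import subprocess
import sys
import time

# Start a child process that would otherwise run for a minute.
proc = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
time.sleep(1)

proc.terminate()           # new in Python 2.6; calls TerminateProcess on Windows
time.sleep(0.5)            # give it a moment to exit
if proc.poll() is None:    # still running? escalate
    proc.kill()            # also new in 2.6; SIGKILL on POSIX
proc.wait()

On 2.5 the equivalent needs os.kill on POSIX and win32 calls on Windows, which is exactly the gap killable_process.py and winprocess.py fill.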
Ga?l From ellisonbg at gmail.com Mon Jul 19 01:24:37 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 18 Jul 2010 22:24:37 -0700 Subject: [IPython-dev] subprocess and Python 2.6 In-Reply-To: <20100719052026.GA29336@phare.normalesup.org> References: <20100719052026.GA29336@phare.normalesup.org> Message-ID: Gael, On Sun, Jul 18, 2010 at 10:20 PM, Gael Varoquaux wrote: > On Sun, Jul 18, 2010 at 10:16:57PM -0700, Brian Granger wrote: >> With Python 2.6, Popen objects have a kill and terminate method that >> will work with Windows. ?This will make it much easier to transition >> away from using Twisted for process management. ?BUT, this would mean >> that Python 2.5 users are left in the dark. ?If we want to keep 2.5 >> support in 0.11, we will need to spend some time thinking about this >> issue. ?Thoughts? > > Maybe this could come in handy: > http://github.com/ipython/ipython/tree/master/IPython/frontend/process/ > in particular 'killable_process.py', and the file 'winprocess.py' it > depends on. I am definitely aware of this and it is this type of thing and if we want to keep 2.5 support in 0.11, we will definitely use it to kill processes. It would just be nice to be able to use Popen and be done with it. Cheers, Brian > Ga?l > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From benjaminrk at gmail.com Mon Jul 19 20:33:03 2010 From: benjaminrk at gmail.com (MinRK) Date: Mon, 19 Jul 2010 17:33:03 -0700 Subject: [IPython-dev] Load Balanced PyZMQ multikernel example Message-ID: I thought this might be of some interest to the zmq related IPython folks: pyzmq has a basic multiple client-one kernel remote process example called 'kernel'. This morning, to explore zmq devices, I wrote a derived example that is multiple client - multiple kernel, and load balanced across kernels, and called it 'multikernel'. It took about an hour. The code is trivial, and uses the zmq XREQ socket's round robin load balancing. o The main addition is a relay process containing two zmq devices: a queue device for the XREQ/XREP connection, and a forwarder for PUB/SUB. o kernel.py had to change a little, since two socket IDs are contained in each message instead of just one, and its sockets connect instead of bind. o frontend.py and other code didn't have to change a letter. o Exactly zero work is done in Python in the relay process after the creation of the ?MQ devices. It does have some weird behavior, since even the tab-completion requests are load balanced, so if you have two kernels, and you do: >>>a=5 >>>a='asdf' >>>a. ... >>>a. ... each press of the tab key will produce different results - which is fun to watch, if not especially useful. I even did a quick and dirty screencast to show 30 seconds of using it with 2 clients and 2 kernels. http://ptsg.berkeley.edu/~minrk/multikernel.m4v The example is pushed to my pyzmq fork on github, and depends on that fork for its implementation of ?MQ devices, not yet merged into Brian's trunk. http://github.com/minrk/pyzmq ?MQ really is spiffy. -MinRK -------------- next part -------------- An HTML attachment was scrubbed... 
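The relay process described above can be pictured as two stock 0MQ devices. A rough sketch using the device API as it exists in pyzmq today; the port numbers, socket layout, and use of threads are assumptions for illustration, not taken from the multikernel example itself:

import threading
import zmq

ctx = zmq.Context()

def relay_requests():
    # XREP faces the frontends, XREQ faces the kernels; the XREQ side
    # round-robins each request across whichever kernels are connected.
    frontend = ctx.socket(zmq.XREP)
    frontend.bind("tcp://127.0.0.1:5555")
    backend = ctx.socket(zmq.XREQ)
    backend.bind("tcp://127.0.0.1:5556")
    zmq.device(zmq.QUEUE, frontend, backend)

def relay_pub():
    # SUB collects every kernel's PUB stream, PUB fans it back out to clients.
    sub = ctx.socket(zmq.SUB)
    sub.setsockopt(zmq.SUBSCRIBE, b"")
    sub.bind("tcp://127.0.0.1:5557")
    pub = ctx.socket(zmq.PUB)
    pub.bind("tcp://127.0.0.1:5558")
    zmq.device(zmq.FORWARDER, sub, pub)

threading.Thread(target=relay_pub).start()
relay_requests()   # both devices block; no further Python work happens in the relay

Kernels and frontends would then connect (rather than bind) to the matching ports, which is the small change to kernel.py mentioned above.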
URL: From fperez.net at gmail.com Mon Jul 19 21:08:49 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 19 Jul 2010 18:08:49 -0700 Subject: [IPython-dev] Load Balanced PyZMQ multikernel example In-Reply-To: References: Message-ID: On Mon, Jul 19, 2010 at 5:33 PM, MinRK wrote: > > The example is pushed to my pyzmq fork on github, and depends on that fork > for its implementation of??MQ?devices, not yet merged into Brian's trunk. > http://github.com/minrk/pyzmq > ?MQ really is spiffy. I just saw this in person, and it was very neat :) f From ellisonbg at gmail.com Mon Jul 19 23:18:04 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Mon, 19 Jul 2010 20:18:04 -0700 Subject: [IPython-dev] Load Balanced PyZMQ multikernel example In-Reply-To: References: Message-ID: Min, This is a very nice demonstration of what you can do by hooking up some 0mq sockets and devices. I find that the possibilities are so many that that it takes a while to really let it sink in. Cheers, Brian On Mon, Jul 19, 2010 at 5:33 PM, MinRK wrote: > I thought this might be of some interest to the zmq related IPython folks: > pyzmq has a basic multiple client-one kernel remote process example called > 'kernel'. ?This morning, to explore zmq devices, I wrote a derived example > that is multiple client - multiple kernel, and load balanced across kernels, > and called it 'multikernel'. It took about an hour. > The code is trivial, and uses the zmq XREQ socket's round robin load > balancing. > o The main addition is a relay process containing two zmq devices: a queue > device for the XREQ/XREP connection, and a forwarder for PUB/SUB. > o kernel.py had to change a little, since two socket IDs are contained in > each message instead of just one, and its sockets connect instead of bind. > o frontend.py and other code didn't have to change a letter. > o?Exactly zero work is done in Python in the relay process after the > creation of the??MQ?devices. > It does have some weird behavior, since even the tab-completion requests are > load balanced, so if you have two kernels, and you do: >>>>a=5 >>>>a='asdf' >>>>a. > ... >>>>a. > ... > each press of the tab key will produce different results - which is fun to > watch, if not especially useful. > I even did a quick and dirty screencast to show 30 seconds of using it with > 2 clients and 2 kernels. > http://ptsg.berkeley.edu/~minrk/multikernel.m4v > The example is pushed to my pyzmq fork on github, and depends on that fork > for its implementation of??MQ?devices, not yet merged into Brian's trunk. > http://github.com/minrk/pyzmq > ?MQ really is spiffy. > -MinRK -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From justin.t.riley at gmail.com Tue Jul 20 10:48:15 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Tue, 20 Jul 2010 10:48:15 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> Message-ID: <4C45B72F.5020000@gmail.com> On 07/19/2010 01:06 AM, Brian Granger wrote: > * I like the design of the BatchEngineSet. This will be easy to port to > 0.11. Excellent :D > * I think if we are going to have default submission templates, we need to > expose the queue name to the command line. This shouldn't be too tough. Added --queue option to my 0.10.1-sge branch and tested this with SGE 62u3 and Torque 2.4.6. 
I don't have LSF to test but I added in the code that *should* work with LSF. > * Have you tested this with Python 2.6. I saw that you mentioned that > the engines were shutting down cleanly now. What did you do to fix that? > I am even running into that in 0.11 so any info you can provide would > be helpful. I've been testing the code with Python 2.6. I didn't do anything special other than switch the BatchEngineSet to using job arrays (ie a single qsub command instead of N qsubs). Now when I run "ipcluster sge -n 4" the controller starts and the engines are launched and at that point the ipcluster session is running indefinitely. If I then ctrl-c the ipcluster session it catches the signal and calls kill() which terminates the engines by canceling the job. Is this the same situation you're trying to get working? > * For now, let's stick with the assumption of a shared $HOME for the furl files. > * The biggest thing is if people can test this thoroughly. I don't have > SGE/PBS/LSF access right now, so it is a bit difficult for me to help. I > have a cluster coming later in the summer, but it is not here yet. Once > people have tested it well and are satisfied with it, let's merge it. > * If we can update the documentation about how the PBS/SGE support works > that would be great. The file is here: That sounds fine to me. I'm testing this stuff on my workstation's local sge/torque queues and it works fine. I'll also test this with StarCluster and make sure it works on a real cluster. If someone else can test using LSF on a real cluster (with shared $HOME) that'd be great. I'll try to update the docs some time this week. > > Once these small changes have been made and everyone has tested, me > can merge it for the 0.10.1 release. Excellent :D > Thanks for doing this work Justin and Satra! It is fantastic! Just > so you all know where this is going in 0.11: > > * We are going to get rid of using Twisted in ipcluster. This means we have > to re-write the process management stuff to use things like popen. > * We have a new configuration system in 0.11. This allows users to maintain > cluster profiles that are a set of configuration files for a particular > cluster setup. This makes it easy for a user to have multiple clusters > configured, which they can then start by name. The logging, security, etc. > is also different for each cluster profile. > * It will be quite a bit of work to get everything working in 0.11, so I am > glad we are getting good PBS/SGE support in 0.10.1. I'm willing to help out with the PBS/SGE/LSF portion of ipcluster in 0.11, I guess just let me know when is appropriate to start hacking. Thanks! ~Justin From justin.t.riley at gmail.com Tue Jul 20 10:53:30 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Tue, 20 Jul 2010 10:53:30 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> Message-ID: <4C45B86A.9030801@gmail.com> On 07/19/2010 12:32 AM, Brian Granger wrote: > Without mpi being required as I understand it. Yes, no MPI involved with SGE/PBS/LSF support > This is not that bad of an idea. Remember that the furl file the > engine uses is only between the engines and controller and this > connection is not that vulnerable. My only question is who can see > the script? I don't know PBS/SGE well enough to know where the script > ends up and with what permissions. 
So I decided to test this and found that SGE spools all job scripts into a location in the $SGE_ROOT that is readable by everyone (at least for my installation). Given that this is the case, it's probably best not to store the contents of the furl file directly in the job script. > Do you know anything about what SGE/PBS does with the script? I > honestly think this might not be a bad idea. But, again, maybe for > 0.10.1 this is not worth the effort because things will change so > incredibly much with 0.11. It's certainly a clever way to get around needing to transfer furl files between hosts but I'd say not worth the effort given that it's not completely secure. ~Justin From ellisonbg at gmail.com Tue Jul 20 13:02:58 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Tue, 20 Jul 2010 10:02:58 -0700 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C45B72F.5020000@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C45B72F.5020000@gmail.com> Message-ID: On Tue, Jul 20, 2010 at 7:48 AM, Justin Riley wrote: > On 07/19/2010 01:06 AM, Brian Granger wrote: >> * I like the design of the BatchEngineSet. ?This will be easy to port to >> ? 0.11. > Excellent :D > >> * I think if we are going to have default submission templates, we need to >> ? expose the queue name to the command line. ?This shouldn't be too tough. > > Added --queue option to my 0.10.1-sge branch and tested this with SGE > 62u3 and Torque 2.4.6. I don't have LSF to test but I added in the code > that *should* work with LSF. Awesome! >> * Have you tested this with Python 2.6. ?I saw that you mentioned that >> ? the engines were shutting down cleanly now. ?What did you do to fix that? >> ? I am even running into that in 0.11 so any info you can provide would >> ? be helpful. > > I've been testing the code with Python 2.6. I didn't do anything special > other than switch the BatchEngineSet to using job arrays (ie a single > qsub command instead of N qsubs). Now when I run "ipcluster sge -n 4" > the controller starts and the engines are launched and at that point the > ipcluster session is running indefinitely. If I then ctrl-c the > ipcluster session it catches the signal and calls kill() which > terminates the engines by canceling the job. Is this the same situation > you're trying to get working? Basically yes, but sometimes the signal is not kllling the batch job. I need to just debug this further. >> * For now, let's stick with the assumption of a shared $HOME for the furl files. >> * The biggest thing is if people can test this thoroughly. ?I don't have >> ? SGE/PBS/LSF access right now, so it is a bit difficult for me to help. I >> ? have a cluster coming later in the summer, but it is not here yet. ?Once >> ? people have tested it well and are satisfied with it, let's merge it. >> * If we can update the documentation about how the PBS/SGE support works >> ? that would be great. ?The file is here: > > That sounds fine to me. I'm testing this stuff on my workstation's local > sge/torque queues and it works fine. I'll also test this with > StarCluster and make sure it works on a real cluster. If someone else > can test using LSF on a real cluster (with shared $HOME) that'd be > great. I'll try to update the docs some time this week. That would be great. Also when this is working I would like to test it myself. >> >> Once these small changes have been made and everyone has tested, me >> can merge it for the 0.10.1 release. 
> Excellent :D > >> Thanks for doing this work Justin and Satra! ?It is fantastic! ?Just >> so you all know where this is going in 0.11: >> >> * We are going to get rid of using Twisted in ipcluster. ?This means we have >> ? to re-write the process management stuff to use things like popen. >> * We have a new configuration system in 0.11. ?This allows users to maintain >> ? cluster profiles that are a set of configuration files for a particular >> ? cluster setup. ?This makes it easy for a user to have multiple clusters >> ? configured, which they can then start by name. ?The logging, security, etc. >> ? is also different for each cluster profile. >> * It will be quite a bit of work to get everything working in 0.11, so I am >> ? glad we are getting good PBS/SGE support in 0.10.1. > > I'm willing to help out with the PBS/SGE/LSF portion of ipcluster in > 0.11, I guess just let me know when is appropriate to start hacking. That is great, we will keep you posted. Cheers, Brian > Thanks! > > ~Justin > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From ellisonbg at gmail.com Tue Jul 20 13:10:05 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Tue, 20 Jul 2010 10:10:05 -0700 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C45B86A.9030801@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C45B86A.9030801@gmail.com> Message-ID: On Tue, Jul 20, 2010 at 7:53 AM, Justin Riley wrote: > On 07/19/2010 12:32 AM, Brian Granger wrote: >> Without mpi being required as I understand it. > > Yes, no MPI involved with SGE/PBS/LSF support > >> This is not that bad of an idea. ?Remember that the furl file the >> engine uses is only between the engines and controller and this >> connection is not that vulnerable. ?My only question is who can see >> the script? ?I don't know PBS/SGE well enough to know where the script >> ends up and with what permissions. > > So I decided to test this and found that SGE spools all job scripts into > a location in the $SGE_ROOT that is readable by everyone (at least for > my installation). Given that this is the case, it's probably best not to > store the contents of the furl file directly in the job script. Thanks for investigating that. >> Do you know anything about what SGE/PBS does with the script? ?I >> honestly think this might not be a bad idea. ?But, again, maybe for >> 0.10.1 this is not worth the effort because things will change so >> incredibly much with 0.11. > > It's certainly a clever way to get around needing to transfer furl files > between hosts but I'd say not worth the effort given that it's not > completely secure. I think your conclusion is right. Cheers, Brian > ~Justin > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From ellisonbg at gmail.com Tue Jul 20 15:19:44 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Tue, 20 Jul 2010 12:19:44 -0700 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C45B72F.5020000@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C45B72F.5020000@gmail.com> Message-ID: Satra, If you could test this as well, that would be great. Thanks. Justin, let us know when you think it is ready to go with the documentation and testing. 
Cheers, Brian On Tue, Jul 20, 2010 at 7:48 AM, Justin Riley wrote: > On 07/19/2010 01:06 AM, Brian Granger wrote: >> * I like the design of the BatchEngineSet. ?This will be easy to port to >> ? 0.11. > Excellent :D > >> * I think if we are going to have default submission templates, we need to >> ? expose the queue name to the command line. ?This shouldn't be too tough. > > Added --queue option to my 0.10.1-sge branch and tested this with SGE > 62u3 and Torque 2.4.6. I don't have LSF to test but I added in the code > that *should* work with LSF. > >> * Have you tested this with Python 2.6. ?I saw that you mentioned that >> ? the engines were shutting down cleanly now. ?What did you do to fix that? >> ? I am even running into that in 0.11 so any info you can provide would >> ? be helpful. > > I've been testing the code with Python 2.6. I didn't do anything special > other than switch the BatchEngineSet to using job arrays (ie a single > qsub command instead of N qsubs). Now when I run "ipcluster sge -n 4" > the controller starts and the engines are launched and at that point the > ipcluster session is running indefinitely. If I then ctrl-c the > ipcluster session it catches the signal and calls kill() which > terminates the engines by canceling the job. Is this the same situation > you're trying to get working? > >> * For now, let's stick with the assumption of a shared $HOME for the furl files. >> * The biggest thing is if people can test this thoroughly. ?I don't have >> ? SGE/PBS/LSF access right now, so it is a bit difficult for me to help. I >> ? have a cluster coming later in the summer, but it is not here yet. ?Once >> ? people have tested it well and are satisfied with it, let's merge it. >> * If we can update the documentation about how the PBS/SGE support works >> ? that would be great. ?The file is here: > > That sounds fine to me. I'm testing this stuff on my workstation's local > sge/torque queues and it works fine. I'll also test this with > StarCluster and make sure it works on a real cluster. If someone else > can test using LSF on a real cluster (with shared $HOME) that'd be > great. I'll try to update the docs some time this week. > >> >> Once these small changes have been made and everyone has tested, me >> can merge it for the 0.10.1 release. > Excellent :D > >> Thanks for doing this work Justin and Satra! ?It is fantastic! ?Just >> so you all know where this is going in 0.11: >> >> * We are going to get rid of using Twisted in ipcluster. ?This means we have >> ? to re-write the process management stuff to use things like popen. >> * We have a new configuration system in 0.11. ?This allows users to maintain >> ? cluster profiles that are a set of configuration files for a particular >> ? cluster setup. ?This makes it easy for a user to have multiple clusters >> ? configured, which they can then start by name. ?The logging, security, etc. >> ? is also different for each cluster profile. >> * It will be quite a bit of work to get everything working in 0.11, so I am >> ? glad we are getting good PBS/SGE support in 0.10.1. > > I'm willing to help out with the PBS/SGE/LSF portion of ipcluster in > 0.11, I guess just let me know when is appropriate to start hacking. > > Thanks! > > ~Justin > -- Brian E. Granger, Ph.D. 
Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From satra at mit.edu Tue Jul 20 16:01:24 2010 From: satra at mit.edu (Satrajit Ghosh) Date: Tue, 20 Jul 2010 16:01:24 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C45B72F.5020000@gmail.com> Message-ID: hi brian, i ran into a problem (my engines were not starting) and justin and i are going to try and figure out what's causing it. cheers, satra On Tue, Jul 20, 2010 at 3:19 PM, Brian Granger wrote: > Satra, > > If you could test this as well, that would be great. Thanks. Justin, > let us know when you think it is ready to go with the documentation > and testing. > > Cheers, > > Brian > > On Tue, Jul 20, 2010 at 7:48 AM, Justin Riley > wrote: > > On 07/19/2010 01:06 AM, Brian Granger wrote: > >> * I like the design of the BatchEngineSet. This will be easy to port to > >> 0.11. > > Excellent :D > > > >> * I think if we are going to have default submission templates, we need > to > >> expose the queue name to the command line. This shouldn't be too > tough. > > > > Added --queue option to my 0.10.1-sge branch and tested this with SGE > > 62u3 and Torque 2.4.6. I don't have LSF to test but I added in the code > > that *should* work with LSF. > > > >> * Have you tested this with Python 2.6. I saw that you mentioned that > >> the engines were shutting down cleanly now. What did you do to fix > that? > >> I am even running into that in 0.11 so any info you can provide would > >> be helpful. > > > > I've been testing the code with Python 2.6. I didn't do anything special > > other than switch the BatchEngineSet to using job arrays (ie a single > > qsub command instead of N qsubs). Now when I run "ipcluster sge -n 4" > > the controller starts and the engines are launched and at that point the > > ipcluster session is running indefinitely. If I then ctrl-c the > > ipcluster session it catches the signal and calls kill() which > > terminates the engines by canceling the job. Is this the same situation > > you're trying to get working? > > > >> * For now, let's stick with the assumption of a shared $HOME for the > furl files. > >> * The biggest thing is if people can test this thoroughly. I don't have > >> SGE/PBS/LSF access right now, so it is a bit difficult for me to help. > I > >> have a cluster coming later in the summer, but it is not here yet. > Once > >> people have tested it well and are satisfied with it, let's merge it. > >> * If we can update the documentation about how the PBS/SGE support works > >> that would be great. The file is here: > > > > That sounds fine to me. I'm testing this stuff on my workstation's local > > sge/torque queues and it works fine. I'll also test this with > > StarCluster and make sure it works on a real cluster. If someone else > > can test using LSF on a real cluster (with shared $HOME) that'd be > > great. I'll try to update the docs some time this week. > > > >> > >> Once these small changes have been made and everyone has tested, me > >> can merge it for the 0.10.1 release. > > Excellent :D > > > >> Thanks for doing this work Justin and Satra! It is fantastic! Just > >> so you all know where this is going in 0.11: > >> > >> * We are going to get rid of using Twisted in ipcluster. This means we > have > >> to re-write the process management stuff to use things like popen. 
> >> * We have a new configuration system in 0.11. This allows users to > maintain > >> cluster profiles that are a set of configuration files for a > particular > >> cluster setup. This makes it easy for a user to have multiple > clusters > >> configured, which they can then start by name. The logging, security, > etc. > >> is also different for each cluster profile. > >> * It will be quite a bit of work to get everything working in 0.11, so I > am > >> glad we are getting good PBS/SGE support in 0.10.1. > > > > I'm willing to help out with the PBS/SGE/LSF portion of ipcluster in > > 0.11, I guess just let me know when is appropriate to start hacking. > > > > Thanks! > > > > ~Justin > > > > > > -- > Brian E. Granger, Ph.D. > Assistant Professor of Physics > Cal Poly State University, San Luis Obispo > bgranger at calpoly.edu > ellisonbg at gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Tue Jul 20 16:04:04 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Tue, 20 Jul 2010 13:04:04 -0700 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C45B72F.5020000@gmail.com> Message-ID: Great! I mean great that you and Justin are testing and debugging this. Brian On Tue, Jul 20, 2010 at 1:01 PM, Satrajit Ghosh wrote: > hi brian, > > i ran into a problem (my engines were not starting) and justin and i are > going to try and figure out what's causing it. > > cheers, > > satra > > > On Tue, Jul 20, 2010 at 3:19 PM, Brian Granger wrote: >> >> Satra, >> >> If you could test this as well, that would be great. ?Thanks. ?Justin, >> let us know when you think it is ready to go with the documentation >> and testing. >> >> Cheers, >> >> Brian >> >> On Tue, Jul 20, 2010 at 7:48 AM, Justin Riley >> wrote: >> > On 07/19/2010 01:06 AM, Brian Granger wrote: >> >> * I like the design of the BatchEngineSet. ?This will be easy to port >> >> to >> >> ? 0.11. >> > Excellent :D >> > >> >> * I think if we are going to have default submission templates, we need >> >> to >> >> ? expose the queue name to the command line. ?This shouldn't be too >> >> tough. >> > >> > Added --queue option to my 0.10.1-sge branch and tested this with SGE >> > 62u3 and Torque 2.4.6. I don't have LSF to test but I added in the code >> > that *should* work with LSF. >> > >> >> * Have you tested this with Python 2.6. ?I saw that you mentioned that >> >> ? the engines were shutting down cleanly now. ?What did you do to fix >> >> that? >> >> ? I am even running into that in 0.11 so any info you can provide would >> >> ? be helpful. >> > >> > I've been testing the code with Python 2.6. I didn't do anything special >> > other than switch the BatchEngineSet to using job arrays (ie a single >> > qsub command instead of N qsubs). Now when I run "ipcluster sge -n 4" >> > the controller starts and the engines are launched and at that point the >> > ipcluster session is running indefinitely. If I then ctrl-c the >> > ipcluster session it catches the signal and calls kill() which >> > terminates the engines by canceling the job. Is this the same situation >> > you're trying to get working? >> > >> >> * For now, let's stick with the assumption of a shared $HOME for the >> >> furl files. >> >> * The biggest thing is if people can test this thoroughly. ?I don't >> >> have >> >> ? 
SGE/PBS/LSF access right now, so it is a bit difficult for me to >> >> help. I >> >> ? have a cluster coming later in the summer, but it is not here yet. >> >> ?Once >> >> ? people have tested it well and are satisfied with it, let's merge it. >> >> * If we can update the documentation about how the PBS/SGE support >> >> works >> >> ? that would be great. ?The file is here: >> > >> > That sounds fine to me. I'm testing this stuff on my workstation's local >> > sge/torque queues and it works fine. I'll also test this with >> > StarCluster and make sure it works on a real cluster. If someone else >> > can test using LSF on a real cluster (with shared $HOME) that'd be >> > great. I'll try to update the docs some time this week. >> > >> >> >> >> Once these small changes have been made and everyone has tested, me >> >> can merge it for the 0.10.1 release. >> > Excellent :D >> > >> >> Thanks for doing this work Justin and Satra! ?It is fantastic! ?Just >> >> so you all know where this is going in 0.11: >> >> >> >> * We are going to get rid of using Twisted in ipcluster. ?This means we >> >> have >> >> ? to re-write the process management stuff to use things like popen. >> >> * We have a new configuration system in 0.11. ?This allows users to >> >> maintain >> >> ? cluster profiles that are a set of configuration files for a >> >> particular >> >> ? cluster setup. ?This makes it easy for a user to have multiple >> >> clusters >> >> ? configured, which they can then start by name. ?The logging, >> >> security, etc. >> >> ? is also different for each cluster profile. >> >> * It will be quite a bit of work to get everything working in 0.11, so >> >> I am >> >> ? glad we are getting good PBS/SGE support in 0.10.1. >> > >> > I'm willing to help out with the PBS/SGE/LSF portion of ipcluster in >> > 0.11, I guess just let me know when is appropriate to start hacking. >> > >> > Thanks! >> > >> > ~Justin >> > >> >> >> >> -- >> Brian E. Granger, Ph.D. >> Assistant Professor of Physics >> Cal Poly State University, San Luis Obispo >> bgranger at calpoly.edu >> ellisonbg at gmail.com > > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From pivanov314 at gmail.com Tue Jul 20 20:40:56 2010 From: pivanov314 at gmail.com (Paul Ivanov) Date: Tue, 20 Jul 2010 17:40:56 -0700 Subject: [IPython-dev] Paul Ivanov: Did you get any feedback from GH when I merged? In-Reply-To: References: Message-ID: <4C464218.8000701@gmail.com> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi Fernando, yes, I did get an email from GH, and left another comment there, though I'm not sure you got notified of that, see it here: One unfortunate thing about commenting on the commits, is that the comments don't seem to carry over across forks. The commits you merged into trunk (ipython/ipython) don't have any reference to the comments we made about them in my fork (ivanov/ipython). best, Paul Fernando Perez, on 2010-07-15 14:42, wrote: > Hi Paul, > > I just applied your pull request into trunk, thanks a lot for the bug > fix. I used the GH interface to do it, and I'm curious whether it > generated any feedback to you when that happened or not. 
> > Cheers, > > f > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.6 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iD8DBQFMRkIYe+cmRQ8+KPcRAoXEAJ0ZPW9bfgUUBY6t1NFIJu6vToaXswCgkt26 k+jv2Eh8dsHxazr8TSKnXMc= =odif -----END PGP SIGNATURE----- From benjaminrk at gmail.com Wed Jul 21 05:35:30 2010 From: benjaminrk at gmail.com (MinRK) Date: Wed, 21 Jul 2010 02:35:30 -0700 Subject: [IPython-dev] Named Engines Message-ID: I now have my MonitoredQueue object on git, which is the three socket Queue device that can be the core of the lightweight ME and Task models (depending on whether it is XREP on both sides for ME, or XREP/XREQ for load balanced tasks). The biggest difference in terms of design between Python in the Controller picking the destination and this new device is that the client code actually needs to know the XREQ identity of each engine, and all the switching logic lives in the client code (if not the user exposed code) instead of the controller - if the client says 'do x in [1,2,3]' they actually issue 3 sends, unlike before, when they issued 1 and the controller issued 3. This will increase traffic between the client and the controller, but dramatically reduce work done in the controller. Since the engines' XREP IDs are known at the client level, and these are roughly any string, it brings up the question: should we have strictly integer ID engines, or should we allow engines to have names, like 'franklin1', corresponding directly to their XREP identity? I think people might like using names, but I imagine it could get confusing. It would be unambiguous in code, since we use integer IDs and XREP identities must be strings, so if someone keys on a string it must be the XREP id, and if they key on a number it must be by engine ID. -MinRK -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Wed Jul 21 05:45:11 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 21 Jul 2010 02:45:11 -0700 Subject: [IPython-dev] Named Engines In-Reply-To: References: Message-ID: On Wed, Jul 21, 2010 at 2:35 AM, MinRK wrote: > I now have my MonitoredQueue object on git, which is the three socket Queue > device that can be the core of the lightweight ME and Task models (depending > on whether it is XREP on both sides for ME, or XREP/XREQ for load balanced > tasks). Great! > The biggest difference in terms of design between Python in the Controller > picking the destination and this new device is that the client code actually > needs to know the XREQ identity of each engine, and all the switching logic > lives in the client code (if not the user exposed code) instead of the > controller - if the client says 'do x in [1,2,3]' they actually issue 3 > sends, unlike before, when they issued 1 and the controller issued 3. This > will increase traffic between the client and the controller, but > dramatically reduce work done in the controller. As best I can see, that's actually a net win, as long as we hide it from user-visible APIs: the simpler the controller code, the less our chances of it bottlenecking (since the controller has also other things to do). I really like the idea of having most of the logic effectively embedded in the 0mq device. 
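To make the addressing concrete, here is a minimal pyzmq sketch of a client-side fan-out keyed on XREQ identities. It is not the actual IPython controller or client code; the port number, the engine names and the message payload are made up for illustration.

import json
import zmq

ctx = zmq.Context()

# Engine-facing side of the queue: an XREP socket routes each outgoing
# multipart message by its first part (the peer's IDENTITY) and strips
# that part off before delivery.
queue = ctx.socket(zmq.XREP)
queue.bind('tcp://127.0.0.1:10101')

def multiplexed_send(sock, idents, msg):
    # One send per target engine; each engine is assumed to be an XREQ
    # socket whose IDENTITY is one of these strings.
    body = json.dumps(msg).encode('utf-8')
    for ident in idents:
        sock.send_multipart([ident, body])

# 'do x in [1,2,3]' becomes three sends issued on the client side:
multiplexed_send(queue, [b'engine-1', b'engine-2', b'engine-3'],
                 {'msg_type': 'execute_request', 'code': 'x'})

A send to an identity with no connected peer does not raise; it is simply dropped, which is exactly the failure mode discussed further down in this thread. Whether users then address engines by integer ID or by name strings is purely a client-side lookup.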
> Since the engines' XREP IDs are known at the client level, and these are > roughly any string, it brings up the question: should we have strictly > integer ID engines, or should we allow engines to have names, like > 'franklin1', corresponding directly to their XREP identity? > I think people might like using names, but I imagine it could get confusing. > ?It would be unambiguous in code, since we use integer IDs and XREP > identities must be strings, so if someone keys on a string it must be the > XREP id, and if they key on a number it must be by engine ID. I suspect having named IDs could be useful, as an optional feature. People may have naming conventions for their hosts and we could expose a way to auto-collect the hostname as default ID, and then assign -0...-N suffixes to each engine in a multicore host (host-0, host-1, ...). As long as internally these don't cause problems, I don't see why not have them. Cheers, f From fperez.net at gmail.com Wed Jul 21 06:32:47 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 21 Jul 2010 03:32:47 -0700 Subject: [IPython-dev] Interactive input block handling Message-ID: Hi folks, here: http://github.com/fperez/ipython/commit/37182fcaaa893488c4655cd37049bb71b1f9152a is the code that Evan can start using now (and so can Omar as we refactor the terminal code) for properly handling incremental interactive input. I ran out of time to add the block-splitting capabilities for Gerardo, but that should be easy tomorrow. It would be a good habit to get into for all new code, to attempt as best as possible 100% test coverage: (blockbreaker)amirbar[core]> nosetests -vvs --with-coverage --cover-package=IPython.core.blockbreaker blockbreaker.py test_dedent (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok test_indent (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok test_indent2 (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok test_interactive_block_ready (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok test_interactive_block_ready2 (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok test_interactive_block_ready3 (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok test_interactive_block_ready4 (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok test_push (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok test_push2 (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok Test input with leading whitespace ... ok test_reset (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok test_source (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok IPython.core.blockbreaker.test_spaces ... ok IPython.core.blockbreaker.test_remove_comments ... ok IPython.core.blockbreaker.test_get_input_encoding ... ok Name Stmts Exec Cover Missing --------------------------------------------------------- IPython.core.blockbreaker 171 171 100% ---------------------------------------------------------------------- Ran 15 tests in 0.022s OK ### In this case it actually helped me a lot, because in going from ~85% to 100% I actually found that the untested codepaths were indeed buggy. As the saying goes, 'untested code is broken code'... 
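For anyone who has not opened the branch, the central question such a class answers is whether the buffered input forms a complete block yet. The sketch below is not the BlockBreaker API itself, just the standard-library primitive (codeop) that the same decision can be built on, with made-up names:

import codeop

def input_complete(lines):
    # codeop.compile_command returns a code object for complete input,
    # None when more lines are needed, and raises SyntaxError (or
    # ValueError/OverflowError) for input that can never compile.
    source = '\n'.join(lines)
    try:
        return codeop.compile_command(source, '<input>', 'single') is not None
    except (SyntaxError, ValueError, OverflowError):
        # Invalid input counts as 'ready': hand it to the interpreter
        # so the error is reported to the user.
        return True

buf = []
for line in ['if x:', '    y = 1', '']:
    buf.append(line)
    print(input_complete(buf))   # False, False, True

The real class adds indentation tracking on top of this kind of check, and per the note above will grow the block-splitting step next.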
Cheers, f From matthieu.brucher at gmail.com Wed Jul 21 07:08:03 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 21 Jul 2010 13:08:03 +0200 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C43506C.8070907@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C43506C.8070907@gmail.com> Message-ID: 2010/7/18 Justin Riley : > Matthieu, > > I agree that password-less ssh is a common configuration on HPC clusters and > it would be useful to have the option of using SSH to copy the furl file to > each host before launching engines with SGE/PBS/LSF. I'll see about hacking > this in when I get some more time. > > BTW, I just added experimental support for LSF to my fork. I can't test the > code given that I don't have access to a LSF system but in theory it should > work (again using job arrays) provided the ~/.ipython/security folder is > shared. I've tried just a few minutes ago, but I got this: /JOB_SPOOL_DIR/1279710223.17444: line 8: /tmp/tmphM4RKl: Permission denied It seems that you may have to add some authorizations before excuting the file. -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From matthieu.brucher at gmail.com Wed Jul 21 07:23:15 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 21 Jul 2010 13:23:15 +0200 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C43506C.8070907@gmail.com> Message-ID: 2010/7/21 Matthieu Brucher : > 2010/7/18 Justin Riley : >> Matthieu, >> >> I agree that password-less ssh is a common configuration on HPC clusters and >> it would be useful to have the option of using SSH to copy the furl file to >> each host before launching engines with SGE/PBS/LSF. I'll see about hacking >> this in when I get some more time. >> >> BTW, I just added experimental support for LSF to my fork. I can't test the >> code given that I don't have access to a LSF system but in theory it should >> work (again using job arrays) provided the ~/.ipython/security folder is >> shared. > > I've tried just a few minutes ago, but I got this: > > /JOB_SPOOL_DIR/1279710223.17444: line 8: /tmp/tmphM4RKl: Permission denied > > It seems that you may have to add some authorizations before excuting the file. I've added a os.chmod right after the file was created, but I still have this error: line 8: /tmp/tmpDEQR0U: Text file busy Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From matthieu.brucher at gmail.com Wed Jul 21 09:32:27 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 21 Jul 2010 15:32:27 +0200 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C43506C.8070907@gmail.com> Message-ID: > > I've added a os.chmod right after the file was created, but I still > have this error: > > line 8: /tmp/tmpDEQR0U: Text file busy > > Matthieu OK, I've managed to get LSF working. 
I had to modify this in ipcluster.py at line 335 as well as import stat: self._temp_file.file.flush() + os.chmod(self._temp_file.name, stat.S_IRUSR | stat.S_IWUSR | stat.S_IXUSR) + self._temp_file.file.close() d = getProcessOutput(self.submit_command, [self.template_file], env=os.environ) Unfortunately, I only get one engine instead of four: 2010-07-21 15:27:20+0200 [-] Log opened. 2010-07-21 15:27:20+0200 [-] Process ['ipcontroller', '--logfile=/users/brucher/.ipython/log/ipcontroller', '-x', '-y'] has started with pid=204117 2010-07-21 15:27:20+0200 [-] Waiting for controller to finish starting... 2010-07-21 15:27:22+0200 [-] Controller started 2010-07-21 15:27:22+0200 [-] starting 4 engines 2010-07-21 15:27:22+0200 [-] using default ipengine LSF script: #BSUB -J ipengine[1-4] eid=$(($LSB_JOBINDEX - 1)) ipengine --logfile=ipengine${eid}.log 2010-07-21 15:27:22+0200 [-] Job started with job id: '17448' And then in IPython: In [1]: from IPython.kernel import client In [9]: mec = client.MultiEngineClient() In [10]: mec.get_ids() Out[10]: [0] In [13]: mec.activate() In [14]: %px print "test" Parallel execution on engines: all Out[14]: [0] In [1]: print "test" [0] Out[1]: test In [15]: mec.kill() Out[15]: [None] Two issues: - only one engine visible instead of four - when I kill the mec, the job is finished, but ipcluster still runs. Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From epatters at enthought.com Wed Jul 21 10:55:57 2010 From: epatters at enthought.com (Evan Patterson) Date: Wed, 21 Jul 2010 09:55:57 -0500 Subject: [IPython-dev] Interactive input block handling In-Reply-To: References: Message-ID: Great! I've merged your 'blockbreaker' branch into my 'qtfrontend' branch and will integrate BlockBreaker today. Evan On Wed, Jul 21, 2010 at 5:32 AM, Fernando Perez wrote: > Hi folks, > > here: > > > http://github.com/fperez/ipython/commit/37182fcaaa893488c4655cd37049bb71b1f9152a > > is the code that Evan can start using now (and so can Omar as we > refactor the terminal code) for properly handling incremental > interactive input. I ran out of time to add the block-splitting > capabilities for Gerardo, but that should be easy tomorrow. > > It would be a good habit to get into for all new code, to attempt as > best as possible 100% test coverage: > > (blockbreaker)amirbar[core]> nosetests -vvs --with-coverage > --cover-package=IPython.core.blockbreaker blockbreaker.py > test_dedent (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_indent (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_indent2 (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_interactive_block_ready > (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_interactive_block_ready2 > (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_interactive_block_ready3 > (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_interactive_block_ready4 > (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_push (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_push2 (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > Test input with leading whitespace ... ok > test_reset (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_source (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > IPython.core.blockbreaker.test_spaces ... ok > IPython.core.blockbreaker.test_remove_comments ... ok > IPython.core.blockbreaker.test_get_input_encoding ... 
ok > > Name Stmts Exec Cover Missing > --------------------------------------------------------- > IPython.core.blockbreaker 171 171 100% > ---------------------------------------------------------------------- > Ran 15 tests in 0.022s > > OK > > > ### > > In this case it actually helped me a lot, because in going from ~85% > to 100% I actually found that the untested codepaths were indeed > buggy. As the saying goes, 'untested code is broken code'... > > Cheers, > > f > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: From epatters at enthought.com Wed Jul 21 11:10:31 2010 From: epatters at enthought.com (Evan Patterson) Date: Wed, 21 Jul 2010 10:10:31 -0500 Subject: [IPython-dev] Interactive input block handling In-Reply-To: References: Message-ID: Done. That was very easy and it works well. Thanks, Fernando. Evan On Wed, Jul 21, 2010 at 9:55 AM, Evan Patterson wrote: > Great! I've merged your 'blockbreaker' branch into my 'qtfrontend' branch > and will integrate BlockBreaker today. > > Evan > > > On Wed, Jul 21, 2010 at 5:32 AM, Fernando Perez wrote: > >> Hi folks, >> >> here: >> >> >> http://github.com/fperez/ipython/commit/37182fcaaa893488c4655cd37049bb71b1f9152a >> >> is the code that Evan can start using now (and so can Omar as we >> refactor the terminal code) for properly handling incremental >> interactive input. I ran out of time to add the block-splitting >> capabilities for Gerardo, but that should be easy tomorrow. >> >> It would be a good habit to get into for all new code, to attempt as >> best as possible 100% test coverage: >> >> (blockbreaker)amirbar[core]> nosetests -vvs --with-coverage >> --cover-package=IPython.core.blockbreaker blockbreaker.py >> test_dedent (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok >> test_indent (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok >> test_indent2 (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok >> test_interactive_block_ready >> (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok >> test_interactive_block_ready2 >> (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok >> test_interactive_block_ready3 >> (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok >> test_interactive_block_ready4 >> (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok >> test_push (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok >> test_push2 (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok >> Test input with leading whitespace ... ok >> test_reset (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok >> test_source (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok >> IPython.core.blockbreaker.test_spaces ... ok >> IPython.core.blockbreaker.test_remove_comments ... ok >> IPython.core.blockbreaker.test_get_input_encoding ... ok >> >> Name Stmts Exec Cover Missing >> --------------------------------------------------------- >> IPython.core.blockbreaker 171 171 100% >> ---------------------------------------------------------------------- >> Ran 15 tests in 0.022s >> >> OK >> >> >> ### >> >> In this case it actually helped me a lot, because in going from ~85% >> to 100% I actually found that the untested codepaths were indeed >> buggy. As the saying goes, 'untested code is broken code'... 
>> >> Cheers, >> >> f >> _______________________________________________ >> IPython-dev mailing list >> IPython-dev at scipy.org >> http://mail.scipy.org/mailman/listinfo/ipython-dev >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From justin.t.riley at gmail.com Wed Jul 21 11:38:33 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Wed, 21 Jul 2010 11:38:33 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C43506C.8070907@gmail.com> Message-ID: <4C471479.70403@gmail.com> On 07/21/2010 09:32 AM, Matthieu Brucher wrote: > Two issues: > - only one engine visible instead of four > - when I kill the mec, the job is finished, but ipcluster still runs Thanks for testing this with LSF 1. Is your ~/.ipython/security folder shared on the cluster? Currently the code assumes that this is the case. 2. By killing the mec do you mean ctrl-c'ing the ipcluster process? If not, could you try that? Also, with the changes you made you'll need to pass delete=False to NamedTemporaryFile, otherwise I believe the file is deleted when it's closed. I'll try to merge your ipcluster changes later today. ~Justin From matthieu.brucher at gmail.com Wed Jul 21 11:49:47 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 21 Jul 2010 17:49:47 +0200 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C471479.70403@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C43506C.8070907@gmail.com> <4C471479.70403@gmail.com> Message-ID: 2010/7/21 Justin Riley : > On 07/21/2010 09:32 AM, Matthieu Brucher wrote: >> Two issues: >> - only one engine visible instead of four >> - when I kill the mec, the job is finished, but ipcluster still runs > > Thanks for testing this with LSF > > 1. Is your ~/.ipython/security folder shared on the cluster? Currently > the code assumes that this is the case. Yes, we have a test machine with the same $HOME. > 2. By killing the mec do you mean ctrl-c'ing the ipcluster process? If > not, could you try that? By killing, I tried mec.kill(). If I kill it then by Ctr+C, I get: 2010-07-21 15:34:49+0200 [-] Stopping LSF cluster 2010-07-21 15:34:49+0200 [-] 2010-07-21 15:34:49+0200 [-] Process ['ipcontroller', '--logfile=/users/brucher/.ipython/log/ipcontroller', '-x', '-y'] has stopped with 0 2010-07-21 15:34:51+0200 [-] Main loop terminated. 2010-07-21 15:34:51+0200 [-] Unhandled error in Deferred: 2010-07-21 15:34:51+0200 [-] Unhandled Error Traceback (most recent call last): Failure: twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended with exit code 255. If I kill it directly with Ctrl+C, it doesn't display an error. > Also, with the changes you made you'll need to pass delete=False to > NamedTemporaryFile, otherwise I believe the file is deleted when it's > closed. I'll try to merge your ipcluster changes later today. I can create a repository on github with the 3 changes. I don't think the file is deleted, I'm calling close() on the inner file object, it shouldn't propagate the closing to its parent, should it? Matthieu -- Information System Engineer, Ph.D. 
Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From ellisonbg at gmail.com Wed Jul 21 12:46:46 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Wed, 21 Jul 2010 09:46:46 -0700 Subject: [IPython-dev] pyzmq has moved to the zeromq organization on github Message-ID: Hi, In order to enable more community involvement in the development of PyZMQ (the Python bindings to 0MQ), we have moved the main pyzmq from ellisonbg/pyzmq to zeromq/pyzmq. Here is the new repo: http://github.com/zeromq/pyzmq Please use this for all pyzmq development in the future. Cheers, Brian From ellisonbg at gmail.com Wed Jul 21 13:00:57 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Wed, 21 Jul 2010 10:00:57 -0700 Subject: [IPython-dev] Interactive input block handling In-Reply-To: References: Message-ID: Fernando, Fantastic! Great work. Ping me when you wake up and we can strategize the next steps and do a code review of this. Cheers, Brian On Wed, Jul 21, 2010 at 3:32 AM, Fernando Perez wrote: > Hi folks, > > here: > > http://github.com/fperez/ipython/commit/37182fcaaa893488c4655cd37049bb71b1f9152a > > is the code that Evan can start using now (and so can Omar as we > refactor the terminal code) for properly handling incremental > interactive input. ?I ran out of time to add the block-splitting > capabilities for Gerardo, but that should be easy tomorrow. > > It would be a good habit to get into for all new code, to attempt as > best as possible 100% test coverage: > > (blockbreaker)amirbar[core]> nosetests -vvs --with-coverage > --cover-package=IPython.core.blockbreaker blockbreaker.py > test_dedent (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_indent (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_indent2 (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_interactive_block_ready > (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_interactive_block_ready2 > (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_interactive_block_ready3 > (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_interactive_block_ready4 > (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_push (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_push2 (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > Test input with leading whitespace ... ok > test_reset (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > test_source (IPython.core.blockbreaker.BlockBreakerTestCase) ... ok > IPython.core.blockbreaker.test_spaces ... ok > IPython.core.blockbreaker.test_remove_comments ... ok > IPython.core.blockbreaker.test_get_input_encoding ... ok > > Name ? ? ? ? ? ? ? ? ? ? ? ?Stmts ? Exec ?Cover ? Missing > --------------------------------------------------------- > IPython.core.blockbreaker ? ? 171 ? ?171 ? 100% > ---------------------------------------------------------------------- > Ran 15 tests in 0.022s > > OK > > > ### > > In this case it actually helped me a lot, because in going from ~85% > to 100% I actually found that the untested codepaths were indeed > buggy. ?As the saying goes, 'untested code is broken code'... > > Cheers, > > f > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > -- Brian E. Granger, Ph.D. 
Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From ellisonbg at gmail.com Wed Jul 21 13:07:23 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Wed, 21 Jul 2010 10:07:23 -0700 Subject: [IPython-dev] Named Engines In-Reply-To: References: Message-ID: On Wed, Jul 21, 2010 at 2:35 AM, MinRK wrote: > I now have my MonitoredQueue object on git, which is the three socket Queue > device that can be the core of the lightweight ME and Task models (depending > on whether it is XREP on both sides for ME, or XREP/XREQ for load balanced > tasks). This sounds very cool. What repos is this in? > The biggest difference in terms of design between Python in the Controller > picking the destination and this new device is that the client code actually > needs to know the XREQ identity of each engine, and all the switching logic > lives in the client code (if not the user exposed code) instead of the > controller - if the client says 'do x in [1,2,3]' they actually issue 3 > sends, unlike before, when they issued 1 and the controller issued 3. This > will increase traffic between the client and the controller, but > dramatically reduce work done in the controller. But because 0MQ has such low latency it might be a win. Each request the controller gets will be smaller and easier to handle. The idea of allowing clients to specify the names is something I have thought about before. One question though: what does 0MQ do when you try to send on an XREP socket to an identity that doesn't exist? Will the client be able to know that the client wasn't there? That seems like an important failure case. > Since the engines' XREP IDs are known at the client level, and these are > roughly any string, it brings up the question: should we have strictly > integer ID engines, or should we allow engines to have names, like > 'franklin1', corresponding directly to their XREP identity? The idea of having names is pretty cool. Maybe default to numbers, but allow named prefixes as well as raw names? > I think people might like using names, but I imagine it could get confusing. > ?It would be unambiguous in code, since we use integer IDs and XREP > identities must be strings, so if someone keys on a string it must be the > XREP id, and if they key on a number it must be by engine ID. Right. I will have a look at the code. Cheers, Brian > -MinRK > > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From matthieu.brucher at gmail.com Wed Jul 21 13:12:28 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 21 Jul 2010 19:12:28 +0200 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C43506C.8070907@gmail.com> <4C471479.70403@gmail.com> Message-ID: > I can create a repository on github with the 3 changes. I don't think > the file is deleted, I'm calling close() on the inner file object, it > shouldn't propagate the closing to its parent, should it? Available on http://github.com/mbrucher/ipython (Finally, I had to fight with git to understand how it works with remote branches) Matthieu -- Information System Engineer, Ph.D. 
Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From benjaminrk at gmail.com Wed Jul 21 13:51:11 2010 From: benjaminrk at gmail.com (MinRK) Date: Wed, 21 Jul 2010 10:51:11 -0700 Subject: [IPython-dev] Named Engines In-Reply-To: References: Message-ID: On Wed, Jul 21, 2010 at 10:07, Brian Granger wrote: > On Wed, Jul 21, 2010 at 2:35 AM, MinRK wrote: > > I now have my MonitoredQueue object on git, which is the three socket > Queue > > device that can be the core of the lightweight ME and Task models > (depending > > on whether it is XREP on both sides for ME, or XREP/XREQ for load > balanced > > tasks). > > This sounds very cool. What repos is this in? > all on my pyzmq master: github.com/minrk/pyzmq The Devices are specified in the growing _zmq.pyx. Should I move them? I don't have enough Cython experience (this is my first nontrivial Cython work) to know how to correctly move it to a new file still with all the right zmq imports. > > The biggest difference in terms of design between Python in the > Controller > > picking the destination and this new device is that the client code > actually > > needs to know the XREQ identity of each engine, and all the switching > logic > > lives in the client code (if not the user exposed code) instead of the > > controller - if the client says 'do x in [1,2,3]' they actually issue 3 > > sends, unlike before, when they issued 1 and the controller issued 3. > This > > will increase traffic between the client and the controller, but > > dramatically reduce work done in the controller. > > But because 0MQ has such low latency it might be a win. Each request > the controller gets will be smaller and easier to handle. The idea of > allowing clients to specify the names is something I have thought > about before. One question though: what does 0MQ do when you try to > send on an XREP socket to an identity that doesn't exist? Will the > client be able to know that the client wasn't there? That seems like > an important failure case. > As far as I can tell, the XREP socket sends messages out to XREQ ids, and trusts that such an XREQ exists. If no such id is connected, the message is silently lost to the aether. However, with the controller monitoring the queue, it knows when you have sent a message to an engine that is not _registered_, and can tell you about it. This should be sufficient, since presumably all the connected XREQ sockets should be registered engines. To test: a = ctx.socket(zmq.XREP) a.bind('tcp://127.0.0.1:1234') b = ctx.socket(zmq.XREQ) b.setsockopt(zmq.IDENTITY, 'hello') a.send_multipart(['hello', 'mr. b']) time.sleep(.2) b.connect('tcp://127.0.0.1:1234') a.send_multipart(['hello', 'again']) b.recv() # 'again' > > > Since the engines' XREP IDs are known at the client level, and these are > > roughly any string, it brings up the question: should we have strictly > > integer ID engines, or should we allow engines to have names, like > > 'franklin1', corresponding directly to their XREP identity? > > The idea of having names is pretty cool. Maybe default to numbers, > but allow named prefixes as well as raw names? > This part is purely up to our user-facing side of the client code. It certainly doesn't affect how anything works inside. It's just a question of what a valid `targets' argument (or key for a dictionary interface) would be in the multiengine. > > > I think people might like using names, but I imagine it could get > confusing. 
> > It would be unambiguous in code, since we use integer IDs and XREP > > identities must be strings, so if someone keys on a string it must be the > > XREP id, and if they key on a number it must be by engine ID. > > Right. I will have a look at the code. > > Cheers, > > Brian > > > -MinRK > > > > > > > > -- > Brian E. Granger, Ph.D. > Assistant Professor of Physics > Cal Poly State University, San Luis Obispo > bgranger at calpoly.edu > ellisonbg at gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Wed Jul 21 15:17:33 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Wed, 21 Jul 2010 12:17:33 -0700 Subject: [IPython-dev] Named Engines In-Reply-To: References: Message-ID: On Wed, Jul 21, 2010 at 10:51 AM, MinRK wrote: > > > On Wed, Jul 21, 2010 at 10:07, Brian Granger wrote: >> >> On Wed, Jul 21, 2010 at 2:35 AM, MinRK wrote: >> > I now have my MonitoredQueue object on git, which is the three socket >> > Queue >> > device that can be the core of the lightweight ME and Task models >> > (depending >> > on whether it is XREP on both sides for ME, or XREP/XREQ for load >> > balanced >> > tasks). >> >> This sounds very cool. ?What repos is this in? > > all on my pyzmq master: github.com/minrk/pyzmq > The Devices are specified in the growing _zmq.pyx. Should I move them? ?I > don't have enough Cython experience (this is my first nontrivial Cython > work) to know how to correctly move it to a new file still with all the > right zmq imports. Yes, I think we do want to move them. We should look at how mpi4py splits things up. My guess is that we want to have the declaration of the 0MQ C API in a single file that other files can use. Then have files for the individual things like Socket, Message, Poller, Device, etc. That will make the code base much easier to work with. But splitting things like this in Cython is a bit suble. I have done it before, but I will ask Lisandro Dalcin the best way to approach it. For now, I would keep going with the single file approach (unless you want to learn about how to split things using pxi and pxd files). >> >> > The biggest difference in terms of design between Python in the >> > Controller >> > picking the destination and this new device is that the client code >> > actually >> > needs to know the XREQ identity of each engine, and all the switching >> > logic >> > lives in the client code (if not the user exposed code) instead of the >> > controller - if the client says 'do x in [1,2,3]' they actually issue 3 >> > sends, unlike before, when they issued 1 and the controller issued 3. >> > This >> > will increase traffic between the client and the controller, but >> > dramatically reduce work done in the controller. >> >> But because 0MQ has such low latency it might be a win. ?Each request >> the controller gets will be smaller and easier to handle. ?The idea of >> allowing clients to specify the names is something I have thought >> about before. ?One question though: ?what does 0MQ do when you try to >> send on an XREP socket to an identity that doesn't exist? ?Will the >> client be able to know that the client wasn't there? ?That seems like >> an important failure case. > > As far as I can tell, the XREP socket sends messages out to XREQ ids, and > trusts that such an XREQ exists. If no such id is connected, the message is > silently lost to the aether. 
?However, with the controller monitoring the > queue, it knows when you have sent a message to an engine that is not > _registered_, and can tell you about it. This should be sufficient, since > presumably all the connected XREQ sockets should be registered engines. I guess I don't quite see how the monitoring is used yet, but it does worry me that the message is silently lost. So you think 0MQ should raise on that? I have a feeling that the identies were designed to be a private API thing in 0MQ and we are challenging that. > To test: > a = ctx.socket(zmq.XREP) > a.bind('tcp://127.0.0.1:1234') > b = ctx.socket(zmq.XREQ) > b.setsockopt(zmq.IDENTITY, 'hello') > a.send_multipart(['hello', 'mr. b']) > time.sleep(.2) > b.connect('tcp://127.0.0.1:1234') > a.send_multipart(['hello', 'again']) > b.recv() > # 'again' > >> >> > Since the engines' XREP IDs are known at the client level, and these are >> > roughly any string, it brings up the question: should we have strictly >> > integer ID engines, or should we allow engines to have names, like >> > 'franklin1', corresponding directly to their XREP identity? >> >> The idea of having names is pretty cool. ?Maybe default to numbers, >> but allow named prefixes as well as raw names? > > > This part is purely up to our user-facing side of the client code. It > certainly doesn't affect how anything works inside. It's just a question of > what a valid `targets' argument (or key for a dictionary interface) would be > in the multiengine. Any string or list of strings? >> >> > I think people might like using names, but I imagine it could get >> > confusing. >> > ?It would be unambiguous in code, since we use integer IDs and XREP >> > identities must be strings, so if someone keys on a string it must be >> > the >> > XREP id, and if they key on a number it must be by engine ID. >> >> Right. ?I will have a look at the code. >> >> Cheers, >> >> Brian >> >> > -MinRK >> > >> > >> >> >> >> -- >> Brian E. Granger, Ph.D. >> Assistant Professor of Physics >> Cal Poly State University, San Luis Obispo >> bgranger at calpoly.edu >> ellisonbg at gmail.com > > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com From fperez.net at gmail.com Wed Jul 21 15:28:26 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 21 Jul 2010 12:28:26 -0700 Subject: [IPython-dev] Interactive input block handling In-Reply-To: References: Message-ID: Hey Evan, On Wed, Jul 21, 2010 at 8:10 AM, Evan Patterson wrote: > Done. That was very easy and it works well. Thanks, Fernando. > glad to hear it went so smoothly on your side, good job! f From erik.tollerud at gmail.com Wed Jul 21 15:29:46 2010 From: erik.tollerud at gmail.com (Erik Tollerud) Date: Wed, 21 Jul 2010 12:29:46 -0700 Subject: [IPython-dev] Practices for .10 or .11 profile formats Message-ID: Hello all, I've been meaning to add some ipython profiles to a project I'm working on ("recommended interactive environments" as it were), but I'm a little unclear as to what is the best way is to do this. I personally much prefer the .11 style profiles, but of course that's still in development, so I can't put it as a profile for general use until there's been some kind of release. So is .11 in some form likely to be out soon? Or a 10.1 that might include support for .11-style profiles? Or is it best to include both .11 and .10 profiles? 
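If it does come down to shipping both, one low-maintenance layout, essentially the approach suggested in the reply further down, is to keep the real logic in a single shared module and make each profile a thin wrapper. Here is a rough sketch using the 0.10 ipy_profile_<name>.py convention; the file and function names are invented, and it only runs inside an IPython 0.10 session:

# myproject_startup.py -- the shared, version-agnostic piece
def setup(user_ns):
    import numpy
    user_ns['np'] = numpy
    # ... whatever else the recommended environment needs ...

# ipy_profile_myproject.py -- 0.10-style profile that just delegates
import IPython.ipapi
ip = IPython.ipapi.get()

import myproject_startup
myproject_startup.setup(ip.user_ns)

A 0.11-style profile can then call the same setup() from its own configuration file, so the duplication stays limited to the two thin wrappers.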
-- Erik Tollerud From benjaminrk at gmail.com Wed Jul 21 16:58:06 2010 From: benjaminrk at gmail.com (MinRK) Date: Wed, 21 Jul 2010 13:58:06 -0700 Subject: [IPython-dev] Named Engines In-Reply-To: References: Message-ID: On Wed, Jul 21, 2010 at 12:17, Brian Granger wrote: > On Wed, Jul 21, 2010 at 10:51 AM, MinRK wrote: > > > > > > On Wed, Jul 21, 2010 at 10:07, Brian Granger > wrote: > >> > >> On Wed, Jul 21, 2010 at 2:35 AM, MinRK wrote: > >> > I now have my MonitoredQueue object on git, which is the three socket > >> > Queue > >> > device that can be the core of the lightweight ME and Task models > >> > (depending > >> > on whether it is XREP on both sides for ME, or XREP/XREQ for load > >> > balanced > >> > tasks). > >> > >> This sounds very cool. What repos is this in? > > > > all on my pyzmq master: github.com/minrk/pyzmq > > The Devices are specified in the growing _zmq.pyx. Should I move them? I > > don't have enough Cython experience (this is my first nontrivial Cython > > work) to know how to correctly move it to a new file still with all the > > right zmq imports. > > Yes, I think we do want to move them. We should look at how mpi4py > splits things up. My guess is that we want to have the declaration of > the 0MQ C API in a single file that other files can use. Then have > files for the individual things like Socket, Message, Poller, Device, > etc. That will make the code base much easier to work with. But > splitting things like this in Cython is a bit suble. I have done it > before, but I will ask Lisandro Dalcin the best way to approach it. > For now, I would keep going with the single file approach (unless you > want to learn about how to split things using pxi and pxd files). > I'd be happy to help split it up if you find out the best way to go about it. > > >> > >> > The biggest difference in terms of design between Python in the > >> > Controller > >> > picking the destination and this new device is that the client code > >> > actually > >> > needs to know the XREQ identity of each engine, and all the switching > >> > logic > >> > lives in the client code (if not the user exposed code) instead of the > >> > controller - if the client says 'do x in [1,2,3]' they actually issue > 3 > >> > sends, unlike before, when they issued 1 and the controller issued 3. > >> > This > >> > will increase traffic between the client and the controller, but > >> > dramatically reduce work done in the controller. > >> > >> But because 0MQ has such low latency it might be a win. Each request > >> the controller gets will be smaller and easier to handle. The idea of > >> allowing clients to specify the names is something I have thought > >> about before. One question though: what does 0MQ do when you try to > >> send on an XREP socket to an identity that doesn't exist? Will the > >> client be able to know that the client wasn't there? That seems like > >> an important failure case. > > > > As far as I can tell, the XREP socket sends messages out to XREQ ids, and > > trusts that such an XREQ exists. If no such id is connected, the message > is > > silently lost to the aether. However, with the controller monitoring the > > queue, it knows when you have sent a message to an engine that is not > > _registered_, and can tell you about it. This should be sufficient, since > > presumably all the connected XREQ sockets should be registered engines. > > I guess I don't quite see how the monitoring is used yet, but it does > worry me that the message is silently lost. 
So you think 0MQ should > raise on that? I have a feeling that the identies were designed to be > a private API thing in 0MQ and we are challenging that. > I don't know what 0MQ should do, but I imagine the silent loss is based on thinking of XREP messages as always being replies. That way, a reply sent to a nonexistent key is interpreted as being a reply to a message whose requester is gone, and 0MQ presumes that nobody else would be interested in the result, and drops it. As far as 0MQ is concerned, you wouldn't want the following to happen: A makes a request of B A dies B replies to A B crashes because A didn't receive the reply nothing went wrong in B, so it shouldn't crash. For us, the XREP messages are not replies on the engine side (they are replies on the client side). We are using the identities to treat the engine-facing XREP as a keyed multiplexer. The result is that if you send a message to nobody, nobody receives it. It's not that nobody knows about it - the controller can tell, because it sees every message as it goes by, and knows what the valid keys are, but the send itself will not fail. In the client code, you can easily check if a key is valid with the controller, so I don't see a problem here. The only source of a problem I can think of comes from the fact that the client has a copy of the registration table, and presumably doesn't want to update it every time. That way, an engine could go away between the client's updates of the registration, and some requests would vanish. Note that the controller still does receive them, and the client can check with the controller on the status of requests that are taking too long. The controller can use a PUB socket to notify of engines coming/going, which would mean the window for the client to not be up to date would be very small, and it wouldn't even be a big problem if it happend, since the client would be notified that its request won't be received. > > > To test: > > a = ctx.socket(zmq.XREP) > > a.bind('tcp://127.0.0.1:1234') > > b = ctx.socket(zmq.XREQ) > > b.setsockopt(zmq.IDENTITY, 'hello') > > a.send_multipart(['hello', 'mr. b']) > > time.sleep(.2) > > b.connect('tcp://127.0.0.1:1234') > > a.send_multipart(['hello', 'again']) > > b.recv() > > # 'again' > > > >> > >> > Since the engines' XREP IDs are known at the client level, and these > are > >> > roughly any string, it brings up the question: should we have strictly > >> > integer ID engines, or should we allow engines to have names, like > >> > 'franklin1', corresponding directly to their XREP identity? > >> > >> The idea of having names is pretty cool. Maybe default to numbers, > >> but allow named prefixes as well as raw names? > > > > > > This part is purely up to our user-facing side of the client code. It > > certainly doesn't affect how anything works inside. It's just a question > of > > what a valid `targets' argument (or key for a dictionary interface) would > be > > in the multiengine. > > Any string or list of strings? > Well, for now targets is any int or list of ints. I don't see any reason that you couldn't use a string anywhere an int would be used. It's perfectly unambiguous, since the two key sets are of a different type. 
you could do: execute('a=5', targets=[0,1,'odin', 'franklin474']) and the _build_targets method does: target_idents = [] for t in targets: if isinstance(t, int): ident = identities[t] if isinstance(t, str) and t in identities.itervalues(): ident = t else: raise KeyError("bad target: %s"%t) target_idents.append(t) return target_idents > >> > >> > I think people might like using names, but I imagine it could get > >> > confusing. > >> > It would be unambiguous in code, since we use integer IDs and XREP > >> > identities must be strings, so if someone keys on a string it must be > >> > the > >> > XREP id, and if they key on a number it must be by engine ID. > >> > >> Right. I will have a look at the code. > >> > >> Cheers, > >> > >> Brian > >> > >> > -MinRK > >> > > >> > > >> > >> > >> > >> -- > >> Brian E. Granger, Ph.D. > >> Assistant Professor of Physics > >> Cal Poly State University, San Luis Obispo > >> bgranger at calpoly.edu > >> ellisonbg at gmail.com > > > > > > > > -- > Brian E. Granger, Ph.D. > Assistant Professor of Physics > Cal Poly State University, San Luis Obispo > bgranger at calpoly.edu > ellisonbg at gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Wed Jul 21 20:11:20 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 21 Jul 2010 17:11:20 -0700 Subject: [IPython-dev] Gerardo: merge questions? Message-ID: Hi Gerardo, sorry I missed your question on IRC and when I saw it you were gone. What problems have you had regarding the integration with Evan's code? I hope we can help out here... Cheers, f From fperez.net at gmail.com Wed Jul 21 20:33:29 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 21 Jul 2010 17:33:29 -0700 Subject: [IPython-dev] Interactive input block handling In-Reply-To: References: Message-ID: On Wed, Jul 21, 2010 at 3:32 AM, Fernando Perez wrote: > Hi folks, > > here: > > http://github.com/fperez/ipython/commit/37182fcaaa893488c4655cd37049bb71b1f9152a > > is the code that Evan can start using now (and so can Omar as we > refactor the terminal code) for properly handling incremental > interactive input. ?I ran out of time to add the block-splitting > capabilities for Gerardo, but that should be easy tomorrow. I've updated this code now with a bunch of fixes from this morning's code review with Brian and other discussions: http://github.com/fperez/ipython/tree/blockbreaker This should let you guys use it more cleanly; I'm starting the last missing step, the full block breaking but I need to leave soon. I'll ping if I can finish it before heading out. Cheers, f From satra at mit.edu Wed Jul 21 21:05:25 2010 From: satra at mit.edu (Satrajit Ghosh) Date: Wed, 21 Jul 2010 21:05:25 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C45B72F.5020000@gmail.com> Message-ID: hi justin. i really don't know what the difference is, but i clean installed everything and it works beautifully on SGE. cheers, satra On Tue, Jul 20, 2010 at 4:04 PM, Brian Granger wrote: > Great! I mean great that you and Justin are testing and debugging this. > > Brian > > On Tue, Jul 20, 2010 at 1:01 PM, Satrajit Ghosh wrote: > > hi brian, > > > > i ran into a problem (my engines were not starting) and justin and i are > > going to try and figure out what's causing it. 
> > > > cheers, > > > > satra > > > > > > On Tue, Jul 20, 2010 at 3:19 PM, Brian Granger > wrote: > >> > >> Satra, > >> > >> If you could test this as well, that would be great. Thanks. Justin, > >> let us know when you think it is ready to go with the documentation > >> and testing. > >> > >> Cheers, > >> > >> Brian > >> > >> On Tue, Jul 20, 2010 at 7:48 AM, Justin Riley > > >> wrote: > >> > On 07/19/2010 01:06 AM, Brian Granger wrote: > >> >> * I like the design of the BatchEngineSet. This will be easy to port > >> >> to > >> >> 0.11. > >> > Excellent :D > >> > > >> >> * I think if we are going to have default submission templates, we > need > >> >> to > >> >> expose the queue name to the command line. This shouldn't be too > >> >> tough. > >> > > >> > Added --queue option to my 0.10.1-sge branch and tested this with SGE > >> > 62u3 and Torque 2.4.6. I don't have LSF to test but I added in the > code > >> > that *should* work with LSF. > >> > > >> >> * Have you tested this with Python 2.6. I saw that you mentioned > that > >> >> the engines were shutting down cleanly now. What did you do to fix > >> >> that? > >> >> I am even running into that in 0.11 so any info you can provide > would > >> >> be helpful. > >> > > >> > I've been testing the code with Python 2.6. I didn't do anything > special > >> > other than switch the BatchEngineSet to using job arrays (ie a single > >> > qsub command instead of N qsubs). Now when I run "ipcluster sge -n 4" > >> > the controller starts and the engines are launched and at that point > the > >> > ipcluster session is running indefinitely. If I then ctrl-c the > >> > ipcluster session it catches the signal and calls kill() which > >> > terminates the engines by canceling the job. Is this the same > situation > >> > you're trying to get working? > >> > > >> >> * For now, let's stick with the assumption of a shared $HOME for the > >> >> furl files. > >> >> * The biggest thing is if people can test this thoroughly. I don't > >> >> have > >> >> SGE/PBS/LSF access right now, so it is a bit difficult for me to > >> >> help. I > >> >> have a cluster coming later in the summer, but it is not here yet. > >> >> Once > >> >> people have tested it well and are satisfied with it, let's merge > it. > >> >> * If we can update the documentation about how the PBS/SGE support > >> >> works > >> >> that would be great. The file is here: > >> > > >> > That sounds fine to me. I'm testing this stuff on my workstation's > local > >> > sge/torque queues and it works fine. I'll also test this with > >> > StarCluster and make sure it works on a real cluster. If someone else > >> > can test using LSF on a real cluster (with shared $HOME) that'd be > >> > great. I'll try to update the docs some time this week. > >> > > >> >> > >> >> Once these small changes have been made and everyone has tested, me > >> >> can merge it for the 0.10.1 release. > >> > Excellent :D > >> > > >> >> Thanks for doing this work Justin and Satra! It is fantastic! Just > >> >> so you all know where this is going in 0.11: > >> >> > >> >> * We are going to get rid of using Twisted in ipcluster. This means > we > >> >> have > >> >> to re-write the process management stuff to use things like popen. > >> >> * We have a new configuration system in 0.11. This allows users to > >> >> maintain > >> >> cluster profiles that are a set of configuration files for a > >> >> particular > >> >> cluster setup. 
This makes it easy for a user to have multiple > >> >> clusters > >> >> configured, which they can then start by name. The logging, > >> >> security, etc. > >> >> is also different for each cluster profile. > >> >> * It will be quite a bit of work to get everything working in 0.11, > so > >> >> I am > >> >> glad we are getting good PBS/SGE support in 0.10.1. > >> > > >> > I'm willing to help out with the PBS/SGE/LSF portion of ipcluster in > >> > 0.11, I guess just let me know when is appropriate to start hacking. > >> > > >> > Thanks! > >> > > >> > ~Justin > >> > > >> > >> > >> > >> -- > >> Brian E. Granger, Ph.D. > >> Assistant Professor of Physics > >> Cal Poly State University, San Luis Obispo > >> bgranger at calpoly.edu > >> ellisonbg at gmail.com > > > > > > > > -- > Brian E. Granger, Ph.D. > Assistant Professor of Physics > Cal Poly State University, San Luis Obispo > bgranger at calpoly.edu > ellisonbg at gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From justin.t.riley at gmail.com Thu Jul 22 10:40:59 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Thu, 22 Jul 2010 10:40:59 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C43506C.8070907@gmail.com> Message-ID: <4C48587B.1000306@gmail.com> Hi Matthieu, I forgot to cc the list on my last email. Details below. Thanks for testing this. ~Justin On 07/22/2010 10:20 AM, Matthieu Brucher wrote: > 2010/7/22 Justin Riley : >> On Wed, Jul 21, 2010 at 7:08 AM, Matthieu Brucher >> wrote: >>> I've tried just a few minutes ago, but I got this: >>> >>> /JOB_SPOOL_DIR/1279710223.17444: line 8: /tmp/tmphM4RKl: Permission denied >>> >>> It seems that you may have to add some authorizations before excuting the file. >> >> You're right about needing to set permissions for LSF and I merged >> your fork code. I found the following detail about using job scripts >> with LSF (retrieved from >> http://www.cisl.ucar.edu/docs/bluefire/lsf.html): >> >> -------- >> LSF command bsub allows submission of an executable, e.g. >> >> $ bsub -i infile -o outfile -e errfile a.out >> >> LSF can also be used to submit a job script. However, then the LSF >> command bsub requires redirection of its command file, specifically >> >> $ bsub < myscript >> -------- >> >> I can't test this but I suspect LSF is ignoring our job script >> variables given that we're submitting the job as an executable. If >> this is the case then we'll either need to use redirection of the >> script or just pass the -J option to the bsub command we're using now. >> >> With that said, would you mind running the following test? >> >> 1. download this job script: http://gist.github.com/485618 >> 2. run ipcontroller manually in a separate shell >> 3. chmod +x the script and then bsub it the "executable" way (bsub -i >> infile -o outfile -e errfile testscript.sh) >> 4. check the output/error files for errors. you can also use the job's >> id to tail it's output (bpeek -J jobid -f) and look at its history >> info (bhist jobid) >> 5. run ipython, obtain a mec and see how many ids you get (gist should >> request 4) >> 6. re-run 1-5 and use the "redirection" way (bsub < testscript.sh) >> >> If the redirection approach works and it's possible to do this with >> twisted we could avoid the chmod'ing (although it doesn't hurt) and it >> would "fit the mold" so far. 
Otherwise we'll need to chmod +x AND pass >> in the -J option to the bsub command. This isn't too bad but would >> require remolding. >> >> Thanks for testing, >> >> ~Justin >> > > Hi Justin, > > I will try this, but I think you're right. I've just checked my own > launch framework, and I submit the script by doing bsub < > complex_temporaray_file. It should be done this way because we can set > a lot of things in the script as arguments for LSF that we can't if we > use bsub -i ... > > I'll keep you posted. > > Matthieu From matthieu.brucher at gmail.com Thu Jul 22 11:26:27 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 22 Jul 2010 17:26:27 +0200 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: <4C48587B.1000306@gmail.com> References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C43506C.8070907@gmail.com> <4C48587B.1000306@gmail.com> Message-ID: >>> With that said, would you mind running the following test? >>> >>> 1. download this job script: http://gist.github.com/485618 >>> 2. run ipcontroller manually in a separate shell >>> 3. chmod +x the script and then bsub it the "executable" way (bsub -i >>> infile -o outfile -e errfile testscript.sh) >>> 4. check the output/error files for errors. you can also use the job's >>> id to tail it's output (bpeek -J jobid -f) and look at its history >>> info (bhist jobid) >>> 5. run ipython, obtain a mec and see how many ids you get (gist should >>> request 4) >>> 6. re-run 1-5 and use the "redirection" way (bsub < testscript.sh) >>> >>> If the redirection approach works and it's possible to do this with >>> twisted we could avoid the chmod'ing (although it doesn't hurt) and it >>> would "fit the mold" so far. Otherwise we'll need to chmod +x AND pass >>> in the -J option to the bsub command. This isn't too bad but would >>> require remolding. bsub testscript.sh didn't manage to launch a single engine. It more or less launched a bash and waited. bsub < testscript.sh worked very well, as I suspected it would ;) Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From wackywendell at gmail.com Thu Jul 22 13:04:50 2010 From: wackywendell at gmail.com (Wendell Smith) Date: Thu, 22 Jul 2010 13:04:50 -0400 Subject: [IPython-dev] [IPython-User] How to build ipython documentation In-Reply-To: References: Message-ID: <4C487A32.4000200@gmail.com> An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Thu Jul 22 16:28:39 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 22 Jul 2010 13:28:39 -0700 Subject: [IPython-dev] [IPython-User] How to build ipython documentation In-Reply-To: <4C487A32.4000200@gmail.com> References: <4C487A32.4000200@gmail.com> Message-ID: Hey Wendell, On Thu, Jul 22, 2010 at 10:04 AM, Wendell Smith wrote: > This looks like an error involved with the old IPython setup; I did > previously have 0.10 on here (I think through easy_install), but I have no > idea why sphinx is searching for IPython.ColorANSI. Any ideas? make sure you run 'make clean' first, it may be finding old auto-generated files from when you had 0.10 around... 
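If make clean does not catch them, the leftovers are usually the auto-generated API stubs under the docs source tree (the api/generated/... paths visible in the build output). A blunt way to clear them by hand, assuming the stock layout with the Makefile in docs/ and run from the top of the source tree, is:

import os
import shutil

# Path taken from the sphinx output in this thread (source dir is
# docs/source); adjust if your checkout is laid out differently.
generated = os.path.join('docs', 'source', 'api', 'generated')
if os.path.isdir(generated):
    shutil.rmtree(generated)
# The stubs should be rebuilt by the docs' autogen step on the next
# run, so stale 0.10 modules stop being picked up.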
Cheers, f From fperez.net at gmail.com Thu Jul 22 16:31:53 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 22 Jul 2010 13:31:53 -0700 Subject: [IPython-dev] First Performance Result In-Reply-To: References: Message-ID: Hey Min, On Thu, Jul 22, 2010 at 2:22 AM, MinRK wrote: > > It would appear that json is contributing 50% to the overall run time. Any chance you could re-test using pickle instead of cPickle? I want to see if the difference vs json is just from the faster C implementation of cPickle. If that's the case, we could later consider implementing a cython-based version of the json dump/load code. Cheers, f From fperez.net at gmail.com Thu Jul 22 16:51:14 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 22 Jul 2010 13:51:14 -0700 Subject: [IPython-dev] Practices for .10 or .11 profile formats In-Reply-To: References: Message-ID: Hi Erik, On Wed, Jul 21, 2010 at 12:29 PM, Erik Tollerud wrote: > Hello all, > > I've been meaning to add some ipython profiles to a project I'm > working on ("recommended interactive environments" as it were), but > I'm a little unclear as to what the best way is to do this. I > personally much prefer the .11 style profiles, but of course that's > still in development, so I can't put it as a profile for general use > until there's been some kind of release. So is .11 in some form > likely to be out soon? Or a 10.1 that might include support for > .11-style profiles? Or is it best to include both .11 and .10 > profiles? I'm afraid right now, shipping both is the only viable solution. I'd recommend you simply put all non-trivial code in a common file, and refer to that file from both profiles. That way the duplication is minimal and trivial to maintain. Because we're doing so much deep work on 0.11, I think it will be a couple of months before it's ready for release. While 0.10.1 is almost ready, we're just waiting for: - the work Justin, Matthew, Satra et al are doing on batch systems - for one of us to take a couple of hours to merge in Tom's git pull requests with a lot of nice cleanup. - Jonathan March's bugfix - A fix for a small wx bug I think I introduced. - anything I'm missing? Basically I think 0.10.1 is very close to getting out, while 0.11 is certainly a few months away (hopefully no more than 3). Cheers, f From tomspur at fedoraproject.org Thu Jul 22 16:56:22 2010 From: tomspur at fedoraproject.org (Thomas Spura) Date: Thu, 22 Jul 2010 22:56:22 +0200 Subject: [IPython-dev] [IPython-User] How to build ipython documentation In-Reply-To: References: <4C487A32.4000200@gmail.com> Message-ID: <20100722225622.307707ff@earth> Am Thu, 22 Jul 2010 13:28:39 -0700 schrieb Fernando Perez : > Hey Wendell, > > On Thu, Jul 22, 2010 at 10:04 AM, Wendell Smith > wrote: > > This looks like an error involved with the old IPython setup; I did > > previously have 0.10 on here (I think through easy_install), but I > > have no idea why sphinx is searching for IPython.ColorANSI. Any > > ideas? > > make sure you run 'make clean' first, it may be finding old > auto-generated files from when you had 0.10 around... Here it's still failing with: [snip] sphinx-build -b html -d build/doctrees source build/html Running Sphinx v0.6.6 loading pickled environment... not found building [html]: targets for 219 source files that are out of date updating environment: 219 added, 0 changed, 0 removed reading sources... [ 5%] api/generated/IPython.Extensions.InterpreterPasteInpu*** Pasting of code with ">>>" or "..." has been enabled.
*** Simplified input for physical quantities enabled.sions.PhysicalQInput reading sources... [ 5%] api/generated/IPython.Extensions.PhysicalQInteractive Exception occurred: File "/home/tom/programming/repositories/github/ipython.git/docs/sphinxext/inheritance_diagram.py", line 107, in _import_class_or_module "Could not import class or module '%s' specified for inheritance diagram" % name) ValueError: Could not import class or module 'IPython.Extensions.PhysicalQInteractive' specified for inheritance diagram The full traceback has been saved in /tmp/sphinx-err-N3wivW.log, if you want to report the issue to the developers. Please also report this if it was a user error, so that a better error message can be provided next time. Either send bugs to the mailing list at , or report them in the tracker at . Thanks! make: *** [html] Fehler 1 Thomas From benjaminrk at gmail.com Thu Jul 22 16:59:42 2010 From: benjaminrk at gmail.com (MinRK) Date: Thu, 22 Jul 2010 13:59:42 -0700 Subject: [IPython-dev] First Performance Result In-Reply-To: References: Message-ID: It would appear to be just the C implementation. Regular pickle takes approximately the same amount of time as JSON. The integer key issue is a serious one. Any JSON serialized dict with integer keys will get reconstructed incorrectly with string keys. Approximately 100% of controller messages include such a thing (keyed by engine IDs). I can get around it easily enough in the controller/client code (in ugly ways), but it's user code that I'm worried about. It means that we cannot, in general, allow user dicts to be sent if we use JSON, unless on every send we walk all iterables and convert every dict we find to a custom dict subclass, which is certainly unacceptable. This is not a problem with Python's JSON, it's a problem with JSON itself. On Thu, Jul 22, 2010 at 13:31, Fernando Perez wrote: > Hey Min, > > On Thu, Jul 22, 2010 at 2:22 AM, MinRK wrote: > > > > It would appear that json is contributing 50% to the overall run time. > > any chance you could re-test using pickle instead of cPickle? I want > to see if the difference vs json is just from the faster C > implementationo of cPickle. If that's the case, we could later > consider implementing a cython-based version of the json dump/load > code. > > Cheers, > > f > -------------- next part -------------- An HTML attachment was scrubbed... URL: From JDM at MarchRay.net Thu Jul 22 17:48:21 2010 From: JDM at MarchRay.net (Jonathan March) Date: Thu, 22 Jul 2010 16:48:21 -0500 Subject: [IPython-dev] Practices for .10 or .11 profile formats In-Reply-To: References: Message-ID: On Thu, Jul 22, 2010 at 3:51 PM, Fernando Perez wrote: > > While 0.10.1 is?almost ready, we're just waiting for: > > - the work Justin, Matthew, Satra et al are doing on batch systems > - for one of us to take a couple of hours to merge in Tom's git pull > requests with a lot of nice ?clenaup. > - Jonathan March's bugfix > - A fix for a small wx bug I think I introduced. > - anything I'm missing? Just wanted to let ya'll know (I told Fernando and Brian last week) that I'm in the middle of changing jobs and cities on very short notice, so my proffered help with processing small ipython patches has to be on hold for now. I hope to be back and helping as soon as the move settles down. Adelante... 
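The integer-key problem MinRK describes above is easy to reproduce in a couple of lines. A minimal demonstration (illustrative only, not code from the IPython tree), showing that a dict keyed by engine IDs comes back from a JSON round trip keyed by strings:

    import json

    results_by_engine = {0: 'ok', 1: 'error'}       # e.g. results keyed by engine ID
    roundtripped = json.loads(json.dumps(results_by_engine))

    print roundtripped        # {u'0': u'ok', u'1': u'error'}: the keys are now strings
    print 0 in roundtripped   # False: the integer key is gone after the round trip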
Jonathan March From erik.tollerud at gmail.com Fri Jul 23 13:31:27 2010 From: erik.tollerud at gmail.com (Erik Tollerud) Date: Fri, 23 Jul 2010 10:31:27 -0700 Subject: [IPython-dev] Practices for .10 or .11 profile formats In-Reply-To: References: Message-ID: Ok, great, thanks for the info! And is the plan to leave the profile API pretty much as-is for post-0.11 releases? I understand that the .10 and .11 changes weren't necessarily expected, but it be good to know if I can plan around the .11 style syntax as the structure for future releases. Along those lines, there is one thing I've noticed in the custom profiles I've built that I'd like to eliminate if possible. If I inject startup lines using c.Global.exec_lines, they seem to increment the line count. So I never start out with the line count actually at 1. Is there a way to override this behavior, either by just setting the line count manually, or a version of exec_lines that doesn't get stored in the history? On Thu, Jul 22, 2010 at 1:51 PM, Fernando Perez wrote: > Hi Erik, > > On Wed, Jul 21, 2010 at 12:29 PM, Erik Tollerud wrote: >> Hello all, >> >> I've been meaning to add some ipython profiles to a project I'm >> working on ("recommended interactive environments" as it were), but >> I'm a little unclear as to what is the best way is to do this. ?I >> personally much prefer the .11 style profiles, but of course that's >> still in development, so I can't put it as a profile for general use >> until there's been some kind of release. ?So is .11 in some form >> likely to be out soon? ?Or a 10.1 that might include support for >> .11-style profiles? ?Or is it best to include both .11 and .10 >> profiles? > > I'm afraid right now, ?shipping both is the only viable solution. ?I'd > recommend you simply put all non-trivial code in a common file, ?and > refer to that file from both profiles. ?That way the duplication is > minimal and trivial to maintain. > > Because we're making so much deep work on 0.11, I think it will ?be a > couple of months before it's ready for release. ?While 0.10.1 is > almost ready, we're just waiting for: > > - the work Justin, Matthew, Satra et al are doing on batch systems > - for one of us to take a couple of hours to merge in Tom's git pull > requests with a lot of nice ?clenaup. > - Jonathan March's bugfix > - A fix for a small wx bug I think I introduced. > - anything I'm missing? > > Basically I think 0.10.1 is very close to getting out, while 0.11 is > certainly a few months away (hopefully no more than 3). > > Cheers, > > f > -- Erik Tollerud From fperez.net at gmail.com Fri Jul 23 15:00:30 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Fri, 23 Jul 2010 12:00:30 -0700 Subject: [IPython-dev] Practices for .10 or .11 profile formats In-Reply-To: References: Message-ID: Hi Erik, On Fri, Jul 23, 2010 at 10:31 AM, Erik Tollerud wrote: > > And is the plan to leave the profile API pretty much as-is for > post-0.11 releases? ?I understand that the .10 and .11 changes weren't > necessarily expected, but it be good to know if I can plan around the > .11 style syntax as the structure for future releases. Barring any unforeseen problems, we expect the 0.11 system for profiles to remain compatible from now on. We have a plan to make it easier for new projects to provide IPython profiles in *their own tree*, but the syntax would be backwards-compatible. 
Whereas now you say ipython -p profname we'd like to allow also (optionally, of course): ipython -p project:profname which would search in the installed tree of project, a subdirectory called ipython/profiles/ for that profile name. This would let projects offer and update their own profiles without users having to install anything in their own ~/.ipython directories. But that's not implemented yet :) > Along those lines, there is one thing I've noticed in the custom > profiles I've built that I'd like to eliminate if possible. ?If I > inject startup lines using ?c.Global.exec_lines, they seem to > increment the line count. ?So I never start out with the line count > actually at 1. ?Is there a way to override this behavior, either by > just setting the line count manually, ? or a version of exec_lines > that doesn't get stored in the history? That's a bug, plain and simple, sorry :) For actual code, instead of exec_lines, I use this: c.Global.exec_files = ['extras.py'] that way I can put everything I want in a file, and I can also load that file from my old-style profiles. My new-style profile consists of very few lines these days, just setting a few startup options, and I leave the heavy lifting to my little 'extras' file. Does this help? Cheers, f From satra at mit.edu Fri Jul 23 15:19:02 2010 From: satra at mit.edu (Satrajit Ghosh) Date: Fri, 23 Jul 2010 15:19:02 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C45B72F.5020000@gmail.com> Message-ID: if i add the following line to sge script to match my shell, it works fine. perhaps we should allow adding shell as an option like queue and by default set it to the user's shell? #$ -S /bin/bash cheers, satra On Wed, Jul 21, 2010 at 11:58 PM, Satrajit Ghosh wrote: > hi justin, > > 1. By cleanly installed, do you mean SGE in addition to ipython/ipcluster? >> > > no just the python environment. > > >> 2. From the job output you sent me previously (when it wasn't working) it >> seems that there might have been a mismatch in the shell that was used given >> that the output was complaining about "Illegal variable name". I've noticed >> that SGE likes to assign csh by default on my system if I don't specify a >> shell at install time. What is the output of "qconf -sq all.q | grep -i >> shell" for you? >> > > (nipype0.3)satra at sub:/tmp$ qconf -sq all.q | grep -i shell > shell /bin/sh > shell_start_mode unix_behavior > > (nipype0.3)satra at sub:/tmp$ qconf -sq sub | grep -i shell > shell /bin/csh > shell_start_mode posix_compliant > > (nipype0.3)satra at sub:/tmp$ qconf -sq twocore | grep -i shell > shell /bin/bash > shell_start_mode posix_compliant > > only twocore worked. all.q and sub didn't. choosing the latter two puts the > job in qw state. > > my default shell is bash. > > cheers, > > satra > > >> Thanks! >> >> ~Justin >> >> On Wed, Jul 21, 2010 at 9:05 PM, Satrajit Ghosh wrote: >> >>> hi justin. >>> >>> i really don't know what the difference is, but i clean installed >>> everything and it works beautifully on SGE. >>> >>> cheers, >>> >>> satra >>> >>> >>> >>> On Tue, Jul 20, 2010 at 4:04 PM, Brian Granger wrote: >>> >>>> Great! I mean great that you and Justin are testing and debugging this. 
>>>> >>>> Brian >>>> >>>> On Tue, Jul 20, 2010 at 1:01 PM, Satrajit Ghosh wrote: >>>> > hi brian, >>>> > >>>> > i ran into a problem (my engines were not starting) and justin and i >>>> are >>>> > going to try and figure out what's causing it. >>>> > >>>> > cheers, >>>> > >>>> > satra >>>> > >>>> > >>>> > On Tue, Jul 20, 2010 at 3:19 PM, Brian Granger >>>> wrote: >>>> >> >>>> >> Satra, >>>> >> >>>> >> If you could test this as well, that would be great. Thanks. >>>> Justin, >>>> >> let us know when you think it is ready to go with the documentation >>>> >> and testing. >>>> >> >>>> >> Cheers, >>>> >> >>>> >> Brian >>>> >> >>>> >> On Tue, Jul 20, 2010 at 7:48 AM, Justin Riley < >>>> justin.t.riley at gmail.com> >>>> >> wrote: >>>> >> > On 07/19/2010 01:06 AM, Brian Granger wrote: >>>> >> >> * I like the design of the BatchEngineSet. This will be easy to >>>> port >>>> >> >> to >>>> >> >> 0.11. >>>> >> > Excellent :D >>>> >> > >>>> >> >> * I think if we are going to have default submission templates, we >>>> need >>>> >> >> to >>>> >> >> expose the queue name to the command line. This shouldn't be >>>> too >>>> >> >> tough. >>>> >> > >>>> >> > Added --queue option to my 0.10.1-sge branch and tested this with >>>> SGE >>>> >> > 62u3 and Torque 2.4.6. I don't have LSF to test but I added in the >>>> code >>>> >> > that *should* work with LSF. >>>> >> > >>>> >> >> * Have you tested this with Python 2.6. I saw that you mentioned >>>> that >>>> >> >> the engines were shutting down cleanly now. What did you do to >>>> fix >>>> >> >> that? >>>> >> >> I am even running into that in 0.11 so any info you can provide >>>> would >>>> >> >> be helpful. >>>> >> > >>>> >> > I've been testing the code with Python 2.6. I didn't do anything >>>> special >>>> >> > other than switch the BatchEngineSet to using job arrays (ie a >>>> single >>>> >> > qsub command instead of N qsubs). Now when I run "ipcluster sge -n >>>> 4" >>>> >> > the controller starts and the engines are launched and at that >>>> point the >>>> >> > ipcluster session is running indefinitely. If I then ctrl-c the >>>> >> > ipcluster session it catches the signal and calls kill() which >>>> >> > terminates the engines by canceling the job. Is this the same >>>> situation >>>> >> > you're trying to get working? >>>> >> > >>>> >> >> * For now, let's stick with the assumption of a shared $HOME for >>>> the >>>> >> >> furl files. >>>> >> >> * The biggest thing is if people can test this thoroughly. I >>>> don't >>>> >> >> have >>>> >> >> SGE/PBS/LSF access right now, so it is a bit difficult for me to >>>> >> >> help. I >>>> >> >> have a cluster coming later in the summer, but it is not here >>>> yet. >>>> >> >> Once >>>> >> >> people have tested it well and are satisfied with it, let's >>>> merge it. >>>> >> >> * If we can update the documentation about how the PBS/SGE support >>>> >> >> works >>>> >> >> that would be great. The file is here: >>>> >> > >>>> >> > That sounds fine to me. I'm testing this stuff on my workstation's >>>> local >>>> >> > sge/torque queues and it works fine. I'll also test this with >>>> >> > StarCluster and make sure it works on a real cluster. If someone >>>> else >>>> >> > can test using LSF on a real cluster (with shared $HOME) that'd be >>>> >> > great. I'll try to update the docs some time this week. >>>> >> > >>>> >> >> >>>> >> >> Once these small changes have been made and everyone has tested, >>>> me >>>> >> >> can merge it for the 0.10.1 release. 
>>>> >> > Excellent :D >>>> >> > >>>> >> >> Thanks for doing this work Justin and Satra! It is fantastic! >>>> Just >>>> >> >> so you all know where this is going in 0.11: >>>> >> >> >>>> >> >> * We are going to get rid of using Twisted in ipcluster. This >>>> means we >>>> >> >> have >>>> >> >> to re-write the process management stuff to use things like >>>> popen. >>>> >> >> * We have a new configuration system in 0.11. This allows users >>>> to >>>> >> >> maintain >>>> >> >> cluster profiles that are a set of configuration files for a >>>> >> >> particular >>>> >> >> cluster setup. This makes it easy for a user to have multiple >>>> >> >> clusters >>>> >> >> configured, which they can then start by name. The logging, >>>> >> >> security, etc. >>>> >> >> is also different for each cluster profile. >>>> >> >> * It will be quite a bit of work to get everything working in >>>> 0.11, so >>>> >> >> I am >>>> >> >> glad we are getting good PBS/SGE support in 0.10.1. >>>> >> > >>>> >> > I'm willing to help out with the PBS/SGE/LSF portion of ipcluster >>>> in >>>> >> > 0.11, I guess just let me know when is appropriate to start >>>> hacking. >>>> >> > >>>> >> > Thanks! >>>> >> > >>>> >> > ~Justin >>>> >> > >>>> >> >>>> >> >>>> >> >>>> >> -- >>>> >> Brian E. Granger, Ph.D. >>>> >> Assistant Professor of Physics >>>> >> Cal Poly State University, San Luis Obispo >>>> >> bgranger at calpoly.edu >>>> >> ellisonbg at gmail.com >>>> > >>>> > >>>> >>>> >>>> >>>> -- >>>> Brian E. Granger, Ph.D. >>>> Assistant Professor of Physics >>>> Cal Poly State University, San Luis Obispo >>>> bgranger at calpoly.edu >>>> ellisonbg at gmail.com >>>> >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wackywendell at gmail.com Fri Jul 23 16:21:48 2010 From: wackywendell at gmail.com (Wendell Smith) Date: Fri, 23 Jul 2010 16:21:48 -0400 Subject: [IPython-dev] [IPython-User] How to build ipython documentation In-Reply-To: References: <4C487A32.4000200@gmail.com> Message-ID: <4C49F9DC.3000909@gmail.com> Hi, You're absolutely right, and I should have known better - I did need to 'make clean'. However, in order to get it to continue all the way through, I had to install twisted, foolscap, and wxpython - none of which are necessary for basic ipython. Is it supposed to be that way? -Wendell On 07/22/2010 04:28 PM, Fernando Perez wrote: > Hey Wendell, > > On Thu, Jul 22, 2010 at 10:04 AM, Wendell Smith wrote: > >> This looks like an error involved with the old IPython setup; I did >> previously have 0.10 on here (I think through easy_install), but I have no >> idea why sphinx is searching for IPython.ColorANSI. Any ideas? >> > make sure you run 'make clean' first, it may be finding old > auto-generated files from when you had 0.10 around... > > Cheers, > > f > From justin.t.riley at gmail.com Fri Jul 23 17:54:50 2010 From: justin.t.riley at gmail.com (Justin Riley) Date: Fri, 23 Jul 2010 17:54:50 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C45B72F.5020000@gmail.com> Message-ID: Hi Satrajit/Matthieu, Satrajit, so for now I set /bin/sh to be the shell for all generated scripts (PBS/SGE/LSF) given that it's probably the most commonly included shell on *NIXs. Should we still add a --shell option? 
If the user passes their own script they can of course customize the shell, but otherwise I would imagine /bin/sh with the generated code should work for most folks. If it still makes sense to have a --shell option I'll add it in. Matthieu, I updated my 0.10.1-sge branch to address the LSF shell redirection issue. Basically I create a bsub wrapper that does the shell redirection and then pass the wrapper to getProcessOutput. I don't believe Twisted's getProcessOutput will handle stdin redirection so this is my solution for now. Would you mind testing this new code with LSF? I've also updated the parallel_process.txt docs for ipcluster. Let me know what you guys think. ~Justin On Fri, Jul 23, 2010 at 3:19 PM, Satrajit Ghosh wrote: > if i add the following line to sge script to match my shell, it works fine. > perhaps we should allow adding shell as an option like queue and by default > set it to the user's shell? > > #$ -S /bin/bash > > cheers, > > satra > > > > On Wed, Jul 21, 2010 at 11:58 PM, Satrajit Ghosh wrote: >> >> hi justin, >> >>> 1. By cleanly installed, do you mean SGE in addition to >>> ipython/ipcluster? >> >> no just the python environment. >> >>> >>> 2.?From the job output you sent me previously (when it wasn't working) it >>> seems that there might have been a mismatch in the shell that was used given >>> that the output was complaining about "Illegal variable name". I've noticed >>> that SGE likes to assign csh by default on my system if I don't specify a >>> shell at install time. ?What is the output of "qconf -sq all.q | grep -i >>> shell" for you? >> >> (nipype0.3)satra at sub:/tmp$ qconf -sq all.q | grep -i shell >> shell???????????????? /bin/sh >> shell_start_mode????? unix_behavior >> >> ?(nipype0.3)satra at sub:/tmp$ qconf -sq sub | grep -i shell >> shell???????????????? /bin/csh >> shell_start_mode????? posix_compliant >> >> (nipype0.3)satra at sub:/tmp$ qconf -sq twocore | grep -i shell >> shell???????????????? /bin/bash >> shell_start_mode????? posix_compliant >> >> only twocore worked. all.q and sub didn't. choosing the latter two puts >> the job in qw state. >> >> my default shell is bash. >> >> cheers, >> >> satra >> >>> >>> Thanks! >>> ~Justin >>> On Wed, Jul 21, 2010 at 9:05 PM, Satrajit Ghosh wrote: >>>> >>>> hi justin. >>>> >>>> i really don't know what the difference is, but i clean installed >>>> everything and it works beautifully on SGE. >>>> >>>> cheers, >>>> >>>> satra >>>> >>>> >>>> On Tue, Jul 20, 2010 at 4:04 PM, Brian Granger >>>> wrote: >>>>> >>>>> Great! ?I mean great that you and Justin are testing and debugging >>>>> this. >>>>> >>>>> Brian >>>>> >>>>> On Tue, Jul 20, 2010 at 1:01 PM, Satrajit Ghosh wrote: >>>>> > hi brian, >>>>> > >>>>> > i ran into a problem (my engines were not starting) and justin and i >>>>> > are >>>>> > going to try and figure out what's causing it. >>>>> > >>>>> > cheers, >>>>> > >>>>> > satra >>>>> > >>>>> > >>>>> > On Tue, Jul 20, 2010 at 3:19 PM, Brian Granger >>>>> > wrote: >>>>> >> >>>>> >> Satra, >>>>> >> >>>>> >> If you could test this as well, that would be great. ?Thanks. >>>>> >> ?Justin, >>>>> >> let us know when you think it is ready to go with the documentation >>>>> >> and testing. >>>>> >> >>>>> >> Cheers, >>>>> >> >>>>> >> Brian >>>>> >> >>>>> >> On Tue, Jul 20, 2010 at 7:48 AM, Justin Riley >>>>> >> >>>>> >> wrote: >>>>> >> > On 07/19/2010 01:06 AM, Brian Granger wrote: >>>>> >> >> * I like the design of the BatchEngineSet. ?This will be easy to >>>>> >> >> port >>>>> >> >> to >>>>> >> >> ? 0.11. 
>>>>> >> > Excellent :D >>>>> >> > >>>>> >> >> * I think if we are going to have default submission templates, >>>>> >> >> we need >>>>> >> >> to >>>>> >> >> ? expose the queue name to the command line. ?This shouldn't be >>>>> >> >> too >>>>> >> >> tough. >>>>> >> > >>>>> >> > Added --queue option to my 0.10.1-sge branch and tested this with >>>>> >> > SGE >>>>> >> > 62u3 and Torque 2.4.6. I don't have LSF to test but I added in the >>>>> >> > code >>>>> >> > that *should* work with LSF. >>>>> >> > >>>>> >> >> * Have you tested this with Python 2.6. ?I saw that you mentioned >>>>> >> >> that >>>>> >> >> ? the engines were shutting down cleanly now. ?What did you do to >>>>> >> >> fix >>>>> >> >> that? >>>>> >> >> ? I am even running into that in 0.11 so any info you can provide >>>>> >> >> would >>>>> >> >> ? be helpful. >>>>> >> > >>>>> >> > I've been testing the code with Python 2.6. I didn't do anything >>>>> >> > special >>>>> >> > other than switch the BatchEngineSet to using job arrays (ie a >>>>> >> > single >>>>> >> > qsub command instead of N qsubs). Now when I run "ipcluster sge -n >>>>> >> > 4" >>>>> >> > the controller starts and the engines are launched and at that >>>>> >> > point the >>>>> >> > ipcluster session is running indefinitely. If I then ctrl-c the >>>>> >> > ipcluster session it catches the signal and calls kill() which >>>>> >> > terminates the engines by canceling the job. Is this the same >>>>> >> > situation >>>>> >> > you're trying to get working? >>>>> >> > >>>>> >> >> * For now, let's stick with the assumption of a shared $HOME for >>>>> >> >> the >>>>> >> >> furl files. >>>>> >> >> * The biggest thing is if people can test this thoroughly. ?I >>>>> >> >> don't >>>>> >> >> have >>>>> >> >> ? SGE/PBS/LSF access right now, so it is a bit difficult for me >>>>> >> >> to >>>>> >> >> help. I >>>>> >> >> ? have a cluster coming later in the summer, but it is not here >>>>> >> >> yet. >>>>> >> >> ?Once >>>>> >> >> ? people have tested it well and are satisfied with it, let's >>>>> >> >> merge it. >>>>> >> >> * If we can update the documentation about how the PBS/SGE >>>>> >> >> support >>>>> >> >> works >>>>> >> >> ? that would be great. ?The file is here: >>>>> >> > >>>>> >> > That sounds fine to me. I'm testing this stuff on my workstation's >>>>> >> > local >>>>> >> > sge/torque queues and it works fine. I'll also test this with >>>>> >> > StarCluster and make sure it works on a real cluster. If someone >>>>> >> > else >>>>> >> > can test using LSF on a real cluster (with shared $HOME) that'd be >>>>> >> > great. I'll try to update the docs some time this week. >>>>> >> > >>>>> >> >> >>>>> >> >> Once these small changes have been made and everyone has tested, >>>>> >> >> me >>>>> >> >> can merge it for the 0.10.1 release. >>>>> >> > Excellent :D >>>>> >> > >>>>> >> >> Thanks for doing this work Justin and Satra! ?It is fantastic! >>>>> >> >> ?Just >>>>> >> >> so you all know where this is going in 0.11: >>>>> >> >> >>>>> >> >> * We are going to get rid of using Twisted in ipcluster. ?This >>>>> >> >> means we >>>>> >> >> have >>>>> >> >> ? to re-write the process management stuff to use things like >>>>> >> >> popen. >>>>> >> >> * We have a new configuration system in 0.11. ?This allows users >>>>> >> >> to >>>>> >> >> maintain >>>>> >> >> ? cluster profiles that are a set of configuration files for a >>>>> >> >> particular >>>>> >> >> ? cluster setup. ?This makes it easy for a user to have multiple >>>>> >> >> clusters >>>>> >> >> ? 
configured, which they can then start by name. ?The logging, >>>>> >> >> security, etc. >>>>> >> >> ? is also different for each cluster profile. >>>>> >> >> * It will be quite a bit of work to get everything working in >>>>> >> >> 0.11, so >>>>> >> >> I am >>>>> >> >> ? glad we are getting good PBS/SGE support in 0.10.1. >>>>> >> > >>>>> >> > I'm willing to help out with the PBS/SGE/LSF portion of ipcluster >>>>> >> > in >>>>> >> > 0.11, I guess just let me know when is appropriate to start >>>>> >> > hacking. >>>>> >> > >>>>> >> > Thanks! >>>>> >> > >>>>> >> > ~Justin >>>>> >> > >>>>> >> >>>>> >> >>>>> >> >>>>> >> -- >>>>> >> Brian E. Granger, Ph.D. >>>>> >> Assistant Professor of Physics >>>>> >> Cal Poly State University, San Luis Obispo >>>>> >> bgranger at calpoly.edu >>>>> >> ellisonbg at gmail.com >>>>> > >>>>> > >>>>> >>>>> >>>>> >>>>> -- >>>>> Brian E. Granger, Ph.D. >>>>> Assistant Professor of Physics >>>>> Cal Poly State University, San Luis Obispo >>>>> bgranger at calpoly.edu >>>>> ellisonbg at gmail.com >>>> >>> >> > > From ben.root at ou.edu Fri Jul 23 19:31:07 2010 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 23 Jul 2010 18:31:07 -0500 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C45B72F.5020000@gmail.com> Message-ID: On Fri, Jul 23, 2010 at 4:54 PM, Justin Riley wrote: > Hi Satrajit/Matthieu, > > Satrajit, so for now I set /bin/sh to be the shell for all generated > scripts (PBS/SGE/LSF) given that it's probably the most commonly > included shell on *NIXs. Should we still add a --shell option? If the > user passes their own script they can of course customize the shell, > but otherwise I would imagine /bin/sh with the generated code should > work for most folks. If it still makes sense to have a --shell option > I'll add it in. > > If I might interject for a moment, this was a major issue a few years ago with Ubuntu: https://wiki.ubuntu.com/DashAsBinSh Essentially, people were using Bash-isms without realizing it and starting their shell scripts with /bin/sh. In Debian, it is policy for all shell scripts that specify /bin/sh should only use POSIX features. So, for Ubuntu changed the /bin/sh aliase to /bin/dash from /bin/bash. Dash is a lot like bash, but not quite. This caused some... interesting... issues. The link I provided above mentions some of the usual gotchas. Whether they apply to the issue at hand or not, I wouldn't know, but it has been a handy reference for me before. I'll go back to my hole... Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Fri Jul 23 20:17:55 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Fri, 23 Jul 2010 17:17:55 -0700 Subject: [IPython-dev] Paul Ivanov: Did you get any feedback from GH when I merged? In-Reply-To: <4C464218.8000701@gmail.com> References: <4C464218.8000701@gmail.com> Message-ID: On Tue, Jul 20, 2010 at 5:40 PM, Paul Ivanov wrote: > > One unfortunate thing about commenting on the commits, is that the > comments don't seem to carry over across forks. The commits you merged > into trunk (ipython/ipython) don't have any reference to the comments we > made about them in my fork (ivanov/ipython). Yes, there's some metadata that's tied to each repo in github (issues are similar) and they have no mechanism for moving/copying that metadata around. 
Cheers f From fperez.net at gmail.com Fri Jul 23 20:51:40 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Fri, 23 Jul 2010 17:51:40 -0700 Subject: [IPython-dev] Practices for .10 or .11 profile formats In-Reply-To: References: Message-ID: Hi Jonathan, On Thu, Jul 22, 2010 at 2:48 PM, Jonathan March wrote: >> - Jonathan March's bugfix >> - A fix for a small wx bug I think I introduced. >> - anything I'm missing? > > Just wanted to let ya'll know (I told Fernando and Brian last week) > that I'm in the middle of changing jobs and cities on very short > notice, so my proffered help with processing small ipython patches has > to be on hold for now. I hope to be back and helping as soon as the > move settles down. No worries at all, I didn't mean to make it appear as if I was calling you out :) In this case I'll just apply your local change (since the actual fix is a one-liner) and we'll be more than happy to have you plug back in as your move/logistics allow. Best of luck with the move. Regards, f From satra at mit.edu Fri Jul 23 22:07:43 2010 From: satra at mit.edu (Satrajit Ghosh) Date: Fri, 23 Jul 2010 22:07:43 -0400 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C45B72F.5020000@gmail.com> Message-ID: hi justin, Satrajit, so for now I set /bin/sh to be the shell for all generated >> scripts (PBS/SGE/LSF) given that it's probably the most commonly >> included shell on *NIXs. Should we still add a --shell option? If the >> user passes their own script they can of course customize the shell, >> but otherwise I would imagine /bin/sh with the generated code should >> work for most folks. If it still makes sense to have a --shell option >> I'll add it in. >> > i think it still makes sense to add it in. it should be identical to the --queue option in that it's a switch. unfortunately, i do know of a lot of places where tcsh is the default shell! cheers, satra -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Sat Jul 24 02:41:55 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sat, 24 Jul 2010 08:41:55 +0200 Subject: [IPython-dev] SciPy Sprint summary In-Reply-To: References: <4C3F1FE1.4040000@gmail.com> <4C3F709C.5080505@gmail.com> <4C42B09F.50106@gmail.com> <4C43455F.1050508@gmail.com> <4C45B72F.5020000@gmail.com> Message-ID: 2010/7/24 Satrajit Ghosh : > hi justin, > >>> Satrajit, so for now I set /bin/sh to be the shell for all generated >>> scripts (PBS/SGE/LSF) given that it's probably the most commonly >>> included shell on *NIXs. Should we still add a --shell option? If the >>> user passes their own script they can of course customize the shell, >>> but otherwise I would imagine /bin/sh with the generated code should >>> work for most folks. If it still makes sense to have a --shell option >>> I'll add it in. > > i think it still makes sense to add it in. it should be identical to the > --queue option in that it's a switch. unfortunately, i do know of a lot of > places where tcsh is the default shell! Wel, for instance, we have csh as our default sh... Matthieu -- Information System Engineer, Ph.D. 
Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From JDM at MarchRay.net Sat Jul 24 07:45:49 2010 From: JDM at MarchRay.net (Jonathan March) Date: Sat, 24 Jul 2010 06:45:49 -0500 Subject: [IPython-dev] Practices for .10 or .11 profile formats In-Reply-To: References: Message-ID: On Fri, Jul 23, 2010 at 7:51 PM, Fernando Perez wrote: > On Thu, Jul 22, 2010 at 2:48 PM, Jonathan March wrote: >> I hope to be back and helping as soon as the move settles down. > > No worries at all, I didn't mean to make it appear as if I was calling you out :) No worries either. That possibility never crossed my mind - just thought I owed the list too an explanation for vanishing from my little station :) From andresete.chaos at gmail.com Sat Jul 24 18:03:47 2010 From: andresete.chaos at gmail.com (Omar Andrés Zapata Mesa) Date: Sat, 24 Jul 2010 17:03:47 -0500 Subject: [IPython-dev] about ipython-zmq Message-ID: Hi everyone. I'm working on a very important issue on ipython-zmq. Let's suppose the following code in the prompt: In [1]: for i in range(100000): ...: print i ...: This will take a lot of time to run, and if the user wants to stop the process he will normally do it with Ctrl+C. By capturing KeyboardInterrupt I was experimenting with a message sent to the kernel to stop such a process, but the kernel hangs until the "for" loop is over. The solution I see is to run the kernel processes on a thread. What do you think? And another question: What magic commands do you think ipython-zmq should have? Omar Andres Zapata Mesa -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Sat Jul 24 18:12:51 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Sat, 24 Jul 2010 15:12:51 -0700 Subject: [IPython-dev] about ipython-zmq In-Reply-To: References: Message-ID: Hi Omar, 2010/7/24 Omar Andrés Zapata Mesa : > . > Let's suppose the following code in the prompt: > In [1]: for i in range(100000): > ...: print i > ...: > This will take a lot of time to run, and if the user wants to stop the > process he will normally do it with Ctrl+C. > By capturing KeyboardInterrupt I was experimenting with a message sent to > the kernel to stop such a process, but the kernel hangs until the "for" > loop is over. > The solution I see is to run the kernel processes on a thread. What do you > think? No, the kernel will be in a separate process, and what needs to be done is: 1. Capture Ctrl-C on the frontend side with the usual try/except. 2. Send the Ctrl-C as a signal to the kernel process. In order to do this, you'll need to know the PID of the kernel process, but Evan has already been making progress in this direction so you can benefit from his work. This code: http://github.com/epatters/ipython/blob/qtfrontend/IPython/zmq/kernel.py#L316 already has a kernel launcher prototype with the necessary PID information. To send the signal, you can use os.kill for now. This has problems on Windows, but let's get signal handling working on *nix first and once things are in place nicely, we'll look into more general options. > And another question: > What magic commands do you think ipython-zmq should have? For now don't worry about magics, as they should all happen kernel-wise for you. I'll send an email regarding some ideas about magics separately shortly.
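A minimal sketch of what steps 1 and 2 could look like on the frontend side, assuming a hypothetical kernel_pid obtained from the kernel launcher and a send_request callable for shipping code to the kernel; this only illustrates the try/except-plus-os.kill idea, it is not Evan's or Omar's actual code:

    import os
    import signal

    def frontend_loop(kernel_pid, send_request):
        """Read code at the prompt, forward it to the kernel process, and
        translate a local Ctrl-C into a SIGINT sent to the kernel."""
        while True:
            try:
                code = raw_input('In : ')      # Python 2, as used here
                send_request(code)             # e.g. wraps session.send(...)
            except KeyboardInterrupt:
                # The kernel lives in its own process, so interrupt it explicitly
                # (POSIX only; Windows needs another mechanism, as noted above).
                os.kill(kernel_pid, signal.SIGINT)
            except EOFError:
                break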
Cheers, f From fperez.net at gmail.com Sun Jul 25 05:03:21 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 25 Jul 2010 02:03:21 -0700 Subject: [IPython-dev] Full block handling finished for interactive frontends Message-ID: Hi folks, [especially Gerardo] with this commit: http://github.com/fperez/ipython/commit/df85a15e64ca20ac6cb9f32721bd59343397d276 we now have a fully working block splitter that handles a reasonable amount of test cases. I haven't yet added static support for ipython special syntax (magics, !, etc), but for pure python syntax it's fully functional. A second pair of eyes on this code would be much appreciated, as it's the core of our interactive input handling and getting it right (at least to this point) took me a surprising amount of effort. I'll try to complete the special syntax tomorrow, but even now that can be sent to the kernel just fine. Gerardo, let me know if you have any problems using this method. As things stand now, Evan and Omar should be OK using the line-based workflow, and you should be able to get your blocks with this code. Over the next few days we'll work on landing all of this, and I think our architecture is starting to shape up very nicely. Cheers, f From gael.varoquaux at normalesup.org Sun Jul 25 14:10:42 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 25 Jul 2010 20:10:42 +0200 Subject: [IPython-dev] Detecting GUI mainloop running in IPython Message-ID: <20100725181042.GB16987@phare.normalesup.org> With the 0.11 series of IPython, I no longer understand how the interaction with the GUI mainloop occurs: ---------------------------------------------------------------------- $ ipython -wthread In [1]: import wx In [2]: wx.App.IsMainLoopRunning() Out[2]: False ---------------------------------------------------------------------- ---------------------------------------------------------------------- $ ipython -q4thread In [1]: from PyQt4 import QtGui In [2]: type(QtGui.QApplication.instance()) Out[2]: ---------------------------------------------------------------------- Is there a mainloop running or not? If not, I really don't understand how I get interactivity with GUI windows and I'd love an explaination or a pointer. The problem with this behavior is that there is a lot of code that checks if a mainloop is running, and if not starts one. This code thus blocks IPython and more or less defeats the purpose of the GUI options. Cheers, Ga?l From ellisonbg at gmail.com Sun Jul 25 15:10:12 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 25 Jul 2010 12:10:12 -0700 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: <20100725181042.GB16987@phare.normalesup.org> References: <20100725181042.GB16987@phare.normalesup.org> Message-ID: Gael, Great questions. The short answer is that the traditional methods of discovering if the event loop is running won't work. This issue will become even more complicated with we implement GUI integration in the new 2 process frontend/kernel. We still need to decide how we are going to handle this. Here was the last email we sent out a long time ago that didn't really get any response: Current situation ============= Both matplotlib and ets have code that tries to: * See what GUI toolkit is being used * Get the global App object if it already exists, if not create it. * See if the main loop is running, if not possibly start it. All of this logic makes many assumptions about how IPython affects the answers to these questions. 
Because IPython's GUI support has changed in significant ways, current matplotlib and ets make incorrect decisions about these issues (such as trying to start the event loop a second time, creating a second main App ojbect, etc.) under IPython 0.11. This leads to crashes... Description of GUI support in 0.11 ========================== IPython allows GUI event loops to be run in an interactive IPython session. This is done using Python's PyOS_InputHook hook which Python calls when the :func:`raw_input` function is called and is waiting for user input. IPython has versions of this hook for wx, pyqt4 and pygtk. When the inputhook is called, it iterates the GUI event loop until a user starts to type again. When the user stops typing, the event loop iterates again. This is how tk works. When a GUI program is used interactively within IPython, the event loop of the GUI should *not* be started. This is because, the PyOS_Inputhook itself is responsible for iterating the GUI event loop. IPython has facilities for installing the needed input hook for each GUI toolkit and for creating the needed main GUI application object. Usually, these main application objects should be created only once and for some GUI toolkits, special options have to be passed to the application object to enable it to function properly in IPython. What we need to decide =================== We need to answer the following questions: * Who is responsible for creating the main GUI application object, IPython or third parties (matplotlib, enthought.traits, etc.)? * What is the proper way for third party code to detect if a GUI application object has already been created? If one has been created, how should the existing instance be retrieved? * In a GUI application object has been created, how should third party code detect if the GUI event loop is running. It is not sufficient to call the relevant function methods in the GUI toolkits (like ``IsMainLoopRunning``) because those don't know if the GUI event loop is running through the input hook. * We might need a way for third party code to determine if it is running in IPython or not. Currently, the only way of running GUI code in IPython is by using the input hook, but eventually, GUI based versions of IPython will allow the GUI event loop in the more traditional manner. We will need a way for third party code to distinguish between these two cases. While we are focused on other things right now (the kernel/frontend) we would love to hear your thoughts on these issues. Implementing a solution shouldn't be too difficult. Cheers, Brian On Sun, Jul 25, 2010 at 11:10 AM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > With the 0.11 series of IPython, I no longer understand how the > interaction with the GUI mainloop occurs: > > ---------------------------------------------------------------------- > $ ipython -wthread > > In [1]: import wx > > In [2]: wx.App.IsMainLoopRunning() > Out[2]: False > ---------------------------------------------------------------------- > > ---------------------------------------------------------------------- > $ ipython -q4thread > In [1]: from PyQt4 import QtGui > > In [2]: type(QtGui.QApplication.instance()) > Out[2]: > ---------------------------------------------------------------------- > > Is there a mainloop running or not? If not, I really don't understand how > I get interactivity with GUI windows and I'd love an explaination or a > pointer. 
> > The problem with this behavior is that there is a lot of code that checks > if a mainloop is running, and if not starts one. This code thus blocks > IPython and more or less defeats the purpose of the GUI options. > > Cheers, > > Ga?l > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Sun Jul 25 16:35:02 2010 From: efiring at hawaii.edu (Eric Firing) Date: Sun, 25 Jul 2010 10:35:02 -1000 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: References: <20100725181042.GB16987@phare.normalesup.org> Message-ID: <4C4C9FF6.8090503@hawaii.edu> On 07/25/2010 09:10 AM, Brian Granger wrote: > Gael, > > Great questions. The short answer is that the traditional methods of > discovering if the event loop is running won't work. This issue will > become even more complicated with we implement GUI integration in the > new 2 process frontend/kernel. We still need to decide how we are going > to handle this. Here was the last email we sent out a long time ago > that didn't really get any response: Brian, I've been looking at that old message for a couple of weeks, trying to figure out how to respond from the mpl perspective. I'm still quite uncertain, and I would be pleased to see people with a better understanding of gui toolkits, event loops, and ipython step in. Preliminary thoughts: Although ipython has provided invaluable service to mpl by enabling interactive plotting for all gui backends, I am not at all sure that this functionality should be left to ipython in the long run. The problem is that mpl is used in a variety of ways and environments. Gui functionality is central to mpl; it seems odd, and unnecessarily complicated, to have to delegate part of that to an environment, or shell, like ipython. At present, for most backends, interactive mpl plotting is possible in ipython without any of ipython's gui logic. That is, running vanilla ipython one can: In [1]: from pylab import * In [2]: ion() In [3]: plot([1,2,3]) Out[3]: [] and the plot appears with full interaction, courtesy of the PyOS_InputHook mechanism used by default in tk, gtk, and qt4. If mpl simply adopted the new ipython code to add this capability to wx, then wx* backends would be included. The advantage over leaving this in ipython is that it would give mpl more uniform behavior regardless of whether it is run in ipython or elsewhere. Sometimes one wants mpl's show() to have blocking behavior. At present it blocks when mpl is not in interactive mode. The blocking is implemented by starting the gui event loop. One very useful service ipython provides is enabling mpl scripts with show() to be run in non-blocking mode. I think this would be even better if one could easily choose whether to respect the interactive setting. Then, one could either run a script in ipython exactly as it would be run from the command line--that is, blocking at each show() if not in interactive mode--or one could run it as at present in pylab mode. I think this could be done with simple modifications of the pylab mode code. I have no idea how all this will be affected by the proposed two-process model for ipython. 
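The PyOS_InputHook behaviour Eric describes can be seen without matplotlib at all; for example, at a plain interactive prompt the following keeps a Tk window responsive between prompts even though mainloop() is never called, because Tkinter installs the input hook (a small illustration, not mpl or IPython code):

    import Tkinter                     # Python 2 module name

    root = Tkinter.Tk()
    Tkinter.Label(root, text='live without mainloop()').pack()
    # Interactively, the window stays responsive while Python waits for
    # input: the input hook iterates the Tk event loop.  In a script you
    # would still have to call root.mainloop() to block and process events.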
> > Current situation > ============= > > Both matplotlib and ets have code that tries to: > > * See what GUI toolkit is being used > * Get the global App object if it already exists, if not create it. > * See if the main loop is running, if not possibly start it. > > All of this logic makes many assumptions about how IPython affects the > answers to these questions. Because IPython's GUI support has changed > in significant > ways, current matplotlib and ets make incorrect decisions about these > issues (such as trying to > start the event loop a second time, creating a second main App ojbect, > etc.) under IPython > 0.11. This leads to crashes... This complexity is the reason why I would like to delegate all gui control back to mpl. > > Description of GUI support in 0.11 > ========================== > > IPython allows GUI event loops to be run in an interactive IPython session. > This is done using Python's PyOS_InputHook hook which Python calls > when the :func:`raw_input` function is called and is waiting for user input. > IPython has versions of this hook for wx, pyqt4 and pygtk. When the > inputhook > is called, it iterates the GUI event loop until a user starts to type > again. When the user stops typing, the event loop iterates again. This > is how tk works. > > When a GUI program is used interactively within IPython, the event loop of > the GUI should *not* be started. This is because, the PyOS_Inputhook itself > is responsible for iterating the GUI event loop. > > IPython has facilities for installing the needed input hook for each GUI > toolkit and for creating the needed main GUI application object. Usually, > these main application objects should be created only once and for some > GUI toolkits, special options have to be passed to the application object > to enable it to function properly in IPython. I don't know anything about these options. I think that presently, mpl is always making the app object--but it is hard to keep all this straight in my head. > > What we need to decide > =================== > > We need to answer the following questions: > > * Who is responsible for creating the main GUI application object, IPython > or third parties (matplotlib, enthought.traits, etc.)? > At least for mpl, mpl always needs to be *able* to make it, since it can't depend on being run in ipython. Therefore it seems simpler if mpl always *does* make it. > * What is the proper way for third party code to detect if a GUI application > object has already been created? If one has been created, how should > the existing instance be retrieved? > It would be simpler if third party code (mpl) did not *have* to do all this--if it could simply assume that it was responsible for creating and destroying the app object. But maybe this is naive. > * In a GUI application object has been created, how should third party code > detect if the GUI event loop is running. It is not sufficient to call the > relevant function methods in the GUI toolkits (like ``IsMainLoopRunning``) > because those don't know if the GUI event loop is running through the > input hook. > Again, it seems so much simpler if the third party code can be left in control of all this, so the question does not even arise. > * We might need a way for third party code to determine if it is running > in IPython or not. Currently, the only way of running GUI code in IPython > is by using the input hook, but eventually, GUI based versions of IPython > will allow the GUI event loop in the more traditional manner. 
We will need > a way for third party code to distinguish between these two cases. > What are the non-hook methods you have in mind? Maybe this option makes my proposed, or hoped-for, simplification impossible. > While we are focused on other things right now (the kernel/frontend) we > would love to hear your thoughts on these issues. Implementing a > solution shouldn't be too difficult. Another vague thought: If we really need a more flexible environment, then maybe the way to achieve it is with a separate package or module that provides the API for collaboration between, e.g., ipython and mpl. Perhaps all the toolkit-specific event loop code could be factored out and wrapped in a toolkit-neutral API. Then, an mpl interactive backend would use this API regardless of whether mpl is running in a script, or inside ipython. In the latter case, ipython would be using the same API, providing centralized knowledge of, and control over, the app object and the loop. I think that such a refactoring, largely combining existing functionality in ipython and mpl, might not be terribly difficult, and might make future improvements in functionality much easier. It would also make it easier for other libraries to plug into ipython, collaborate with mpl, etc. Even if the idea above is sound--and it may be completely impractical--the devil is undoubtedly in the details. Eric > > Cheers, > > Brian > > > > On Sun, Jul 25, 2010 at 11:10 AM, Gael Varoquaux > > > wrote: > > With the 0.11 series of IPython, I no longer understand how the > interaction with the GUI mainloop occurs: > > ---------------------------------------------------------------------- > $ ipython -wthread > > In [1]: import wx > > In [2]: wx.App.IsMainLoopRunning() > Out[2]: False > ---------------------------------------------------------------------- > > ---------------------------------------------------------------------- > $ ipython -q4thread > In [1]: from PyQt4 import QtGui > > In [2]: type(QtGui.QApplication.instance()) > Out[2]: > ---------------------------------------------------------------------- > > Is there a mainloop running or not? If not, I really don't > understand how > I get interactivity with GUI windows and I'd love an explaination or a > pointer. > > The problem with this behavior is that there is a lot of code that > checks > if a mainloop is running, and if not starts one. This code thus blocks > IPython and more or less defeats the purpose of the GUI options. > > Cheers, > > Ga?l > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > > > > > -- > Brian E. Granger, Ph.D. 
> Assistant Professor of Physics > Cal Poly State University, San Luis Obispo > bgranger at calpoly.edu > ellisonbg at gmail.com > > > > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev From gael.varoquaux at normalesup.org Sun Jul 25 16:35:00 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 25 Jul 2010 22:35:00 +0200 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: References: <20100725181042.GB16987@phare.normalesup.org> Message-ID: <20100725203500.GA2587@phare.normalesup.org> On Sun, Jul 25, 2010 at 12:10:12PM -0700, Brian Granger wrote: > * It is not sufficient to call the ?relevant function methods in the > GUI toolkits (like ``IsMainLoopRunning``) ?because those don't know > if the GUI event loop is running through the ?input hook. OK, so this is the key part that I had missed. I could even call this a bug of the various toolkits. Is there any way to find out if the GUI event loop is running through the input hook at all? > Both matplotlib and ets have code that tries to [snip] The problem is a bit larger: it's not only about matplotlib, ets and IPython, it's a fairly general practice for plugin-like code to check if the eventloop is running before starting it. So, we have a problem and no solution (yet). The good news (I guess) is that IPython 0.11 is not in production yet. I am just worried about the bug reports landing in the various packages. Thanks for your explanations. If you have any suggestions, I am open to try things out in Mayavi (which apart for this problem works just fine with 0.11). Ga?l From gael.varoquaux at normalesup.org Sun Jul 25 16:56:07 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 25 Jul 2010 22:56:07 +0200 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: <4C4C9FF6.8090503@hawaii.edu> References: <20100725181042.GB16987@phare.normalesup.org> <4C4C9FF6.8090503@hawaii.edu> Message-ID: <20100725205607.GB2587@phare.normalesup.org> On Sun, Jul 25, 2010 at 10:35:02AM -1000, Eric Firing wrote: > Although ipython has provided invaluable service to mpl by enabling > interactive plotting for all gui backends, I am not at all sure that > this functionality should be left to ipython in the long run. The > problem is that mpl is used in a variety of ways and environments. Gui > functionality is central to mpl; it seems odd, and unnecessarily > complicated, to have to delegate part of that to an environment, or > shell, like ipython. Wow, I just did a little experiment, and I really don't understand the outcome. Please bear with me: $ ipython In [1]: !cat /home/varoquau/.matplotlib/matplotlibrc backend : GtkAgg In [2]: from pylab import * In [3]: ion() In [4]: plot([1,2,3]) Out[4]: [] In [5]: from enthought.mayavi import mlab In [6]: mlab.test_surf() Out[6]: Two things I do not understand: 1) I can interact alright with the Mayavi plot, nice and fine eventhough there is not wx event-loop running, and I did not register a InputHook 2) I did not get a segfault, while I am running at the same time GTK and Wx. This used to be a big no no. I believe that 1 is due to matplotlib registering an InputHook, but I cannot find where it is done. Also, does this seems to mean that under Linux GTK input hooks work for Wx (and they are nicer since they don't poll). Anyhow, this is good news, eventhough I don't understand it at all. 
Ga?l From ellisonbg at gmail.com Sun Jul 25 17:05:07 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 25 Jul 2010 14:05:07 -0700 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: <20100725203500.GA2587@phare.normalesup.org> References: <20100725181042.GB16987@phare.normalesup.org> <20100725203500.GA2587@phare.normalesup.org> Message-ID: On Sun, Jul 25, 2010 at 1:35 PM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Sun, Jul 25, 2010 at 12:10:12PM -0700, Brian Granger wrote: > > * It is not sufficient to call the relevant function methods in the > > GUI toolkits (like ``IsMainLoopRunning``) because those don't know > > if the GUI event loop is running through the input hook. > > OK, so this is the key part that I had missed. I could even call this a > bug of the various toolkits. > > Is there any way to find out if the GUI event loop is running through > the input hook at all? > Yes: from IPython.lib import inputhook inputhook.current_gui() The possible values are: GUI_WX = 'wx' GUI_QT = 'qt' GUI_QT4 = 'qt4' GUI_GTK = 'gtk' GUI_TK = 'tk' > Both matplotlib and ets have code that tries to [snip] > > The problem is a bit larger: it's not only about matplotlib, ets and > IPython, it's a fairly general practice for plugin-like code to check if > the eventloop is running before starting it. > > Yes, and if such projects want to run in IPython, they are going to have to add additional logic. We spent a very long time to see if this could be avoided and it cannot. Our hope is that the additional logic can be absolutely minimal. > So, we have a problem and no solution (yet). The good news (I guess) is > that IPython 0.11 is not in production yet. I am just worried about the > bug reports landing in the various packages. > Yes, this stuff is definitely not release ready. Thanks for your explanations. If you have any suggestions, I am open to > try things out in Mayavi (which apart for this problem works just fine > with 0.11). > > Great. I think we will need to wait until we have done the GUI integration for the kernel/frontend before finalizing things, because there, the GUI integration will be quite different. Cheers, Brian > Ga?l > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Sun Jul 25 17:05:53 2010 From: efiring at hawaii.edu (Eric Firing) Date: Sun, 25 Jul 2010 11:05:53 -1000 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: <20100725205607.GB2587@phare.normalesup.org> References: <20100725181042.GB16987@phare.normalesup.org> <4C4C9FF6.8090503@hawaii.edu> <20100725205607.GB2587@phare.normalesup.org> Message-ID: <4C4CA731.6030802@hawaii.edu> On 07/25/2010 10:56 AM, Gael Varoquaux wrote: > On Sun, Jul 25, 2010 at 10:35:02AM -1000, Eric Firing wrote: >> Although ipython has provided invaluable service to mpl by enabling >> interactive plotting for all gui backends, I am not at all sure that >> this functionality should be left to ipython in the long run. The >> problem is that mpl is used in a variety of ways and environments. Gui >> functionality is central to mpl; it seems odd, and unnecessarily >> complicated, to have to delegate part of that to an environment, or >> shell, like ipython. > > Wow, I just did a little experiment, and I really don't understand the > outcome. 
Please bear with me: > > $ ipython > > In [1]: !cat /home/varoquau/.matplotlib/matplotlibrc > backend : GtkAgg > > In [2]: from pylab import * > > In [3]: ion() > > In [4]: plot([1,2,3]) > Out[4]: [] > > In [5]: from enthought.mayavi import mlab > > In [6]: mlab.test_surf() > Out[6]: > > Two things I do not understand: > > 1) I can interact alright with the Mayavi plot, nice and fine > eventhough there is not wx event-loop running, and I did not > register a InputHook > > 2) I did not get a segfault, while I am running at the same time GTK > and Wx. This used to be a big no no. > > I believe that 1 is due to matplotlib registering an InputHook, but I > cannot find where it is done. Also, does this seems to mean that under > Linux GTK input hooks work for Wx (and they are nicer since they don't > poll). No, mpl is not registering an InputHook, but pygtk is. Maybe this is having a side effect because wx on linux is a wrapper around gtk. To get a hook registered explicitly for wx, you need to use "ipython --gui wx" Eric > > Anyhow, this is good news, eventhough I don't understand it at all. > > Ga?l From gael.varoquaux at normalesup.org Sun Jul 25 17:12:16 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 25 Jul 2010 23:12:16 +0200 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: References: <20100725181042.GB16987@phare.normalesup.org> <20100725203500.GA2587@phare.normalesup.org> Message-ID: <20100725211216.GA8338@phare.normalesup.org> On Sun, Jul 25, 2010 at 02:05:07PM -0700, Brian Granger wrote: > Is there any way to find out if the GUI event loop is running through > the input hook at all? > Yes: > from IPython.lib import inputhook > inputhook.current_gui() OK, but that's IPython-specific. It's an option for Mayavi, though. > Great. ?I think we will need to wait until we have done the GUI > integration for the kernel/frontend before finalizing things, because > there, the GUI integration will be quite different. Indeed. I guess that in the meantime I'll just do nothing, and if people want to work with IPython 0.11, they should avoid calling mlab.show(). Thanks for your advice. Ga?l From gael.varoquaux at normalesup.org Sun Jul 25 17:14:16 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 25 Jul 2010 23:14:16 +0200 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: <4C4CA731.6030802@hawaii.edu> References: <20100725181042.GB16987@phare.normalesup.org> <4C4C9FF6.8090503@hawaii.edu> <20100725205607.GB2587@phare.normalesup.org> <4C4CA731.6030802@hawaii.edu> Message-ID: <20100725211416.GB8338@phare.normalesup.org> On Sun, Jul 25, 2010 at 11:05:53AM -1000, Eric Firing wrote: >> I believe that 1 is due to matplotlib registering an InputHook, but I >> cannot find where it is done. Also, does this seems to mean that under >> Linux GTK input hooks work for Wx (and they are nicer since they don't >> poll). > > No, mpl is not registering an InputHook, but pygtk is. Maybe this is > having a side effect because wx on linux is a wrapper around gtk. Interesting. It's actually very nice. I wonder if IPython could use this to avoid the current polling loop in wx which is fairly annoying. 
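A minimal sketch of the combined check this suggests, assuming only the IPython.lib.inputhook API quoted above; the function name and structure are illustrative, not an API from any of the projects involved. Ask IPython first, if it is importable, whether the input hook is driving a toolkit, and fall back to the toolkit's own test otherwise.

    def gui_loop_active(toolkit='wx'):
        # IPython >= 0.11 knows whether the input hook is pumping a GUI
        # even when the toolkit itself reports that no loop is running.
        try:
            from IPython.lib import inputhook
            if inputhook.current_gui() == toolkit:
                return True
        except ImportError:
            pass  # not running under an IPython that exposes this API
        # Outside IPython the toolkit's own check is good enough.
        if toolkit == 'wx':
            import wx
            return wx.App.IsMainLoopRunning()
        return False

The try/except keeps the dependency optional, which is what makes an IPython-specific call tolerable for a library like Mayavi.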
Ga?l From efiring at hawaii.edu Sun Jul 25 17:16:48 2010 From: efiring at hawaii.edu (Eric Firing) Date: Sun, 25 Jul 2010 11:16:48 -1000 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: <20100725211216.GA8338@phare.normalesup.org> References: <20100725181042.GB16987@phare.normalesup.org> <20100725203500.GA2587@phare.normalesup.org> <20100725211216.GA8338@phare.normalesup.org> Message-ID: <4C4CA9C0.50805@hawaii.edu> On 07/25/2010 11:12 AM, Gael Varoquaux wrote: > On Sun, Jul 25, 2010 at 02:05:07PM -0700, Brian Granger wrote: >> Is there any way to find out if the GUI event loop is running through >> the input hook at all? > >> Yes: >> from IPython.lib import inputhook >> inputhook.current_gui() > > OK, but that's IPython-specific. It's an option for Mayavi, though. > >> Great. ?I think we will need to wait until we have done the GUI >> integration for the kernel/frontend before finalizing things, because >> there, the GUI integration will be quite different. > > Indeed. I guess that in the meantime I'll just do nothing, and if people > want to work with IPython 0.11, they should avoid calling mlab.show(). Gael, I haven't looked at mlab.show(), but if it is derived from earlier matplotlib show(), then you might want to take a look at how show() is now implemented in mpl. It works well with ipython 0.10 and 0.11. Eric > > Thanks for your advice. > > Ga?l > > > > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev From ellisonbg at gmail.com Sun Jul 25 17:22:42 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 25 Jul 2010 14:22:42 -0700 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: <4C4C9FF6.8090503@hawaii.edu> References: <20100725181042.GB16987@phare.normalesup.org> <4C4C9FF6.8090503@hawaii.edu> Message-ID: On Sun, Jul 25, 2010 at 1:35 PM, Eric Firing wrote: > On 07/25/2010 09:10 AM, Brian Granger wrote: > > Gael, > > > > Great questions. The short answer is that the traditional methods of > > discovering if the event loop is running won't work. This issue will > > become even more complicated with we implement GUI integration in the > > new 2 process frontend/kernel. We still need to decide how we are going > > to handle this. Here was the last email we sent out a long time ago > > that didn't really get any response: > > Brian, > > I've been looking at that old message for a couple of weeks, trying to > figure out how to respond from the mpl perspective. I'm still quite > uncertain, and I would be pleased to see people with a better > understanding of gui toolkits, event loops, and ipython step in. > Part of the challenge is that Fernando and I (who have done the GUI work in IPython) don't know GUI toolkits very well. > Preliminary thoughts: > > Although ipython has provided invaluable service to mpl by enabling > interactive plotting for all gui backends, I am not at all sure that > this functionality should be left to ipython in the long run. The > problem is that mpl is used in a variety of ways and environments. Gui > functionality is central to mpl; it seems odd, and unnecessarily > complicated, to have to delegate part of that to an environment, or > shell, like ipython. > The challenge is that other projects (traits, mayavi, chaco, etc.) need these capabilities as well. They either need to be in IPython or a separate project. 
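For what it's worth, the "who creates the application object" part of this is the same get-or-create idiom in every toolkit; here it is for Qt, since QApplication.instance() already appears in the transcript above. This is only a sketch, not code from matplotlib, traits or chaco.

    from PyQt4 import QtGui

    def get_qt_app():
        # Reuse the QApplication if IPython or another library has
        # already created it; Qt does not allow a second instance.
        app = QtGui.QApplication.instance()
        if app is None:
            app = QtGui.QApplication([])
        return app

The hard part, as discussed below, is not creating the object but knowing whether its event loop is already being iterated.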
At present, for most backends, interactive mpl plotting is possible in > ipython without any of ipython's gui logic. That is, running vanilla > ipython one can: > > In [1]: from pylab import * > > In [2]: ion() > > In [3]: plot([1,2,3]) > Out[3]: [] > > and the plot appears with full interaction, courtesy of the > PyOS_InputHook mechanism used by default in tk, gtk, and qt4. If mpl > simply adopted the new ipython code to add this capability to wx, then > wx* backends would be included. The advantage over leaving this in > ipython is that it would give mpl more uniform behavior regardless of > whether it is run in ipython or elsewhere. > Yes, tk, gtk and qt4 already use the input hook mechanism and it doesn't require IPython in any way. At some level all the 0.11 IPython GUI support does is implement a PyOS_InputHook for wx and then provide a single interface for managing any GUI toolkit. Maybe this code will eventually make its way into wx, but even then, it makes sense to have a single nice interface for this. > Sometimes one wants mpl's show() to have blocking behavior. At present > it blocks when mpl is not in interactive mode. The blocking is > implemented by starting the gui event loop. > > One very useful service ipython provides is enabling mpl scripts with > show() to be run in non-blocking mode. I think this would be even > better if one could easily choose whether to respect the interactive > setting. Then, one could either run a script in ipython exactly as it > would be run from the command line--that is, blocking at each show() if > not in interactive mode--or one could run it as at present in pylab > mode. I think this could be done with simple modifications of the pylab > mode code. > > Yes. > I have no idea how all this will be affected by the proposed two-process > model for ipython. > > The two process model will not use the inputhook stuff at all. It will simple start a full GUI eventloop in the kernel process. Because it is a very different approach than the inputhook approach we will need to think carefully about what interface we provide to projects like mpl. > > > > Current situation > > ============= > > > > Both matplotlib and ets have code that tries to: > > > > * See what GUI toolkit is being used > > * Get the global App object if it already exists, if not create it. > > * See if the main loop is running, if not possibly start it. > > > > All of this logic makes many assumptions about how IPython affects the > > answers to these questions. Because IPython's GUI support has changed > > in significant > > ways, current matplotlib and ets make incorrect decisions about these > > issues (such as trying to > > start the event loop a second time, creating a second main App ojbect, > > etc.) under IPython > > 0.11. This leads to crashes... > > This complexity is the reason why I would like to delegate all gui > control back to mpl. > > We can't really do that. The issue is that people want to use both mpl and traits and chaco in the same code. If mpl is completely responsible for the GUI stuff, how will traits and chaco configure their GUI stuff. The usual approach of seeing if there is a global app, and using it won't always work. In the event loop is running using PyOS_InputHook, how will mpl/chaco/traits tell if the event loop is running? > > > > Description of GUI support in 0.11 > > ========================== > > > > IPython allows GUI event loops to be run in an interactive IPython > session. 
> > This is done using Python's PyOS_InputHook hook which Python calls > > when the :func:`raw_input` function is called and is waiting for user > input. > > IPython has versions of this hook for wx, pyqt4 and pygtk. When the > > inputhook > > is called, it iterates the GUI event loop until a user starts to type > > again. When the user stops typing, the event loop iterates again. This > > is how tk works. > > > > When a GUI program is used interactively within IPython, the event loop > of > > the GUI should *not* be started. This is because, the PyOS_Inputhook > itself > > is responsible for iterating the GUI event loop. > > > > IPython has facilities for installing the needed input hook for each GUI > > toolkit and for creating the needed main GUI application object. Usually, > > these main application objects should be created only once and for some > > GUI toolkits, special options have to be passed to the application object > > to enable it to function properly in IPython. > > I don't know anything about these options. I think that presently, mpl > is always making the app object--but it is hard to keep all this > straight in my head. > > Not quite. When mpl is run in pylab mode in IPython, IPython always creates the App object. It the monkey patches the App creation methods to return the existing version. Thus, while it looks like mpl is creating the App objects, it isn't. This type of monkey patching doesn't play well with the inputhook stuff. > > > > What we need to decide > > =================== > > > > We need to answer the following questions: > > > > * Who is responsible for creating the main GUI application object, > IPython > > or third parties (matplotlib, enthought.traits, etc.)? > > > > At least for mpl, mpl always needs to be *able* to make it, since it > can't depend on being run in ipython. Therefore it seems simpler if mpl > always *does* make it. > > This logic has to be conditional. mpl will have to first look somewhere??? to see if someone else (IPython, traits, chaco) has created it and if the event loop is running. This is the trick. > > * What is the proper way for third party code to detect if a GUI > application > > object has already been created? If one has been created, how should > > the existing instance be retrieved? > > > > It would be simpler if third party code (mpl) did not *have* to do all > this--if it could simply assume that it was responsible for creating and > destroying the app object. But maybe this is naive. > > Because multiple libraries all want to simultaneously to GUI stuff, there has to be a way for all of them to coordinate. > > > * In a GUI application object has been created, how should third party > code > > detect if the GUI event loop is running. It is not sufficient to call > the > > relevant function methods in the GUI toolkits (like > ``IsMainLoopRunning``) > > because those don't know if the GUI event loop is running through the > > input hook. > > > > Again, it seems so much simpler if the third party code can be left in > control of all this, so the question does not even arise. > > > * We might need a way for third party code to determine if it is running > > in IPython or not. Currently, the only way of running GUI code in > IPython > > is by using the input hook, but eventually, GUI based versions of > IPython > > will allow the GUI event loop in the more traditional manner. We will > need > > a way for third party code to distinguish between these two cases. > > > > What are the non-hook methods you have in mind? 
Maybe this option makes > my proposed, or hoped-for, simplification impossible. > > The two process kernel/frontend will simply start the event loop in the kernel in the traditional way (non inputhook). It has to do this because the entire kernel will be based on that event loop. We have thought about if we could reuse the inputhook stuff there and it won't work. > > While we are focused on other things right now (the kernel/frontend) we > > would love to hear your thoughts on these issues. Implementing a > > solution shouldn't be too difficult. > > Another vague thought: If we really need a more flexible environment, > then maybe the way to achieve it is with a separate package or module > that provides the API for collaboration between, e.g., ipython and mpl. > Perhaps all the toolkit-specific event loop code could be factored out > and wrapped in a toolkit-neutral API. Then, an mpl interactive backend > would use this API regardless of whether mpl is running in a script, or > inside ipython. In the latter case, ipython would be using the same > API, providing centralized knowledge of, and control over, the app > object and the loop. I think that such a refactoring, largely combining > existing functionality in ipython and mpl, might not be terribly > difficult, and might make future improvements in functionality much > easier. It would also make it easier for other libraries to plug into > ipython, collaborate with mpl, etc. > > This might make sense and as we move forward we should see if this makes sense. My first thought though is that I don't want to track yet another project though. > Even if the idea above is sound--and it may be completely > impractical--the devil is undoubtedly in the details. > > And there are many ones in this case. Thanks for participating in the discussion. Brian > Eric > > > > > Cheers, > > > > Brian > > > > > > > > On Sun, Jul 25, 2010 at 11:10 AM, Gael Varoquaux > > > > > wrote: > > > > With the 0.11 series of IPython, I no longer understand how the > > interaction with the GUI mainloop occurs: > > > > > ---------------------------------------------------------------------- > > $ ipython -wthread > > > > In [1]: import wx > > > > In [2]: wx.App.IsMainLoopRunning() > > Out[2]: False > > > ---------------------------------------------------------------------- > > > > > ---------------------------------------------------------------------- > > $ ipython -q4thread > > In [1]: from PyQt4 import QtGui > > > > In [2]: type(QtGui.QApplication.instance()) > > Out[2]: > > > ---------------------------------------------------------------------- > > > > Is there a mainloop running or not? If not, I really don't > > understand how > > I get interactivity with GUI windows and I'd love an explaination or > a > > pointer. > > > > The problem with this behavior is that there is a lot of code that > > checks > > if a mainloop is running, and if not starts one. This code thus > blocks > > IPython and more or less defeats the purpose of the GUI options. > > > > Cheers, > > > > Ga?l > > _______________________________________________ > > IPython-dev mailing list > > IPython-dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/ipython-dev > > > > > > > > > > -- > > Brian E. Granger, Ph.D. 
> > Assistant Professor of Physics > > Cal Poly State University, San Luis Obispo > > bgranger at calpoly.edu > > ellisonbg at gmail.com > > > > > > > > _______________________________________________ > > IPython-dev mailing list > > IPython-dev at scipy.org > > http://mail.scipy.org/mailman/listinfo/ipython-dev > > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Sun Jul 25 17:25:00 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 25 Jul 2010 14:25:00 -0700 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: <4C4CA731.6030802@hawaii.edu> References: <20100725181042.GB16987@phare.normalesup.org> <4C4C9FF6.8090503@hawaii.edu> <20100725205607.GB2587@phare.normalesup.org> <4C4CA731.6030802@hawaii.edu> Message-ID: On Sun, Jul 25, 2010 at 2:05 PM, Eric Firing wrote: > On 07/25/2010 10:56 AM, Gael Varoquaux wrote: > > On Sun, Jul 25, 2010 at 10:35:02AM -1000, Eric Firing wrote: > >> Although ipython has provided invaluable service to mpl by enabling > >> interactive plotting for all gui backends, I am not at all sure that > >> this functionality should be left to ipython in the long run. The > >> problem is that mpl is used in a variety of ways and environments. Gui > >> functionality is central to mpl; it seems odd, and unnecessarily > >> complicated, to have to delegate part of that to an environment, or > >> shell, like ipython. > > > > Wow, I just did a little experiment, and I really don't understand the > > outcome. Please bear with me: > > > > $ ipython > > > > In [1]: !cat /home/varoquau/.matplotlib/matplotlibrc > > backend : GtkAgg > > > > In [2]: from pylab import * > > > > In [3]: ion() > > > > In [4]: plot([1,2,3]) > > Out[4]: [] > > > > In [5]: from enthought.mayavi import mlab > > > > In [6]: mlab.test_surf() > > Out[6]: > > > > Two things I do not understand: > > > > 1) I can interact alright with the Mayavi plot, nice and fine > > eventhough there is not wx event-loop running, and I did not > > register a InputHook > > > > 2) I did not get a segfault, while I am running at the same time GTK > > and Wx. This used to be a big no no. > > > > I believe that 1 is due to matplotlib registering an InputHook, but I > > cannot find where it is done. Also, does this seems to mean that under > > Linux GTK input hooks work for Wx (and they are nicer since they don't > > poll). > > No, mpl is not registering an InputHook, but pygtk is. Maybe this is > having a side effect because wx on linux is a wrapper around gtk. > > To get a hook registered explicitly for wx, you need to use "ipython > --gui wx" > > I should clarify. All IPython is doing in 0.11 for qt4, gtk and tk is to tell each GUI toolkit to install its inputhook. Here is the gtk version: http://github.com/ipython/ipython/blob/master/IPython/lib/inputhook.py#L457 Part of the difficulty is that each GUI toolkits has a different API for doing this. We make the API uniform and add a wx inputhook using ctypes. Cheers, Brian > Eric > > > > > Anyhow, this is good news, eventhough I don't understand it at all. 
> > > > Ga?l > > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Sun Jul 25 17:28:44 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 25 Jul 2010 14:28:44 -0700 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: <20100725205607.GB2587@phare.normalesup.org> References: <20100725181042.GB16987@phare.normalesup.org> <4C4C9FF6.8090503@hawaii.edu> <20100725205607.GB2587@phare.normalesup.org> Message-ID: On Sun, Jul 25, 2010 at 1:56 PM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Sun, Jul 25, 2010 at 10:35:02AM -1000, Eric Firing wrote: > > Although ipython has provided invaluable service to mpl by enabling > > interactive plotting for all gui backends, I am not at all sure that > > this functionality should be left to ipython in the long run. The > > problem is that mpl is used in a variety of ways and environments. Gui > > functionality is central to mpl; it seems odd, and unnecessarily > > complicated, to have to delegate part of that to an environment, or > > shell, like ipython. > > Wow, I just did a little experiment, and I really don't understand the > outcome. Please bear with me: > > $ ipython > > In [1]: !cat /home/varoquau/.matplotlib/matplotlibrc > backend : GtkAgg > > In [2]: from pylab import * > > In [3]: ion() > > In [4]: plot([1,2,3]) > Out[4]: [] > > In [5]: from enthought.mayavi import mlab > > In [6]: mlab.test_surf() > Out[6]: > > Two things I do not understand: > > 1) I can interact alright with the Mayavi plot, nice and fine > eventhough there is not wx event-loop running, and I did not > register a InputHook > > This is because pygtk automagically and by default registers a PyOS_InputHook. > 2) I did not get a segfault, while I am running at the same time GTK > and Wx. This used to be a big no no. > > The reason this works is that on Linux both GTK and Wx use the same underlying eventloop. The same is true of qt4+wx on Mac. As long as the underlying eventloop is iterated, events from both GUI toolkits can get handled. But this is a very OS dependent trick. > I believe that 1 is due to matplotlib registering an InputHook, but I > cannot find where it is done. Also, does this seems to mean that under > Linux GTK input hooks work for Wx (and they are nicer since they don't > poll). > > Yes, you are right, the gtk inputhook does work for wx on Linux. But that requires gtk to be installed to use wx. But don't get used to this as this type of things won't work in the two process kernel. Cheers, Brian > Anyhow, this is good news, eventhough I don't understand it at all. > > Ga?l > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ellisonbg at gmail.com Sun Jul 25 17:30:11 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 25 Jul 2010 14:30:11 -0700 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: <20100725211416.GB8338@phare.normalesup.org> References: <20100725181042.GB16987@phare.normalesup.org> <4C4C9FF6.8090503@hawaii.edu> <20100725205607.GB2587@phare.normalesup.org> <4C4CA731.6030802@hawaii.edu> <20100725211416.GB8338@phare.normalesup.org> Message-ID: On Sun, Jul 25, 2010 at 2:14 PM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Sun, Jul 25, 2010 at 11:05:53AM -1000, Eric Firing wrote: > >> I believe that 1 is due to matplotlib registering an InputHook, but I > >> cannot find where it is done. Also, does this seems to mean that under > >> Linux GTK input hooks work for Wx (and they are nicer since they don't > >> poll). > > > > No, mpl is not registering an InputHook, but pygtk is. Maybe this is > > having a side effect because wx on linux is a wrapper around gtk. > > Interesting. It's actually very nice. I wonder if IPython could use this > to avoid the current polling loop in wx which is fairly annoying. > > As you noted, on Linux, the gtk inputhook will work for wx (OK, there could be wierd side cases that fail). But, the reason the wx inputhook has to poll is that wx does not support triggering events on file descriptor reads/writes. It is a limitation of wx. Cheers, Brian > Ga?l > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Sun Jul 25 17:32:29 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sun, 25 Jul 2010 23:32:29 +0200 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: <4C4CA9C0.50805@hawaii.edu> References: <20100725181042.GB16987@phare.normalesup.org> <20100725203500.GA2587@phare.normalesup.org> <20100725211216.GA8338@phare.normalesup.org> <4C4CA9C0.50805@hawaii.edu> Message-ID: <20100725213229.GC8338@phare.normalesup.org> On Sun, Jul 25, 2010 at 11:16:48AM -1000, Eric Firing wrote: > I haven't looked at mlab.show(), but if it is derived from earlier > matplotlib show(), then you might want to take a look at how show() is > now implemented in mpl. It works well with ipython 0.10 and 0.11. Thanks Eric. mlab's show was not derived from any matplotlib code. None the less, I had a quite look at SVN matplotlib to figure out how it was done. It seems to me that it is done in 'backend_bases.py', line 81, by checking if IPython added a special attribute to the ShowBase instance. Thus, it seems to rely on a collaboration between IPython and matplotlib. Can anyone confirm or infirm? 
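Roughly, the behaviour being described reduces to the sketch below: draw the figures, then block only when not in interactive mode. This is an illustration, not matplotlib's actual backend_bases.py code; the two callables are hypothetical stand-ins, and the real ShowBase additionally looks for the attribute that IPython may have set on the instance.

    from matplotlib import is_interactive

    def show_sketch(draw_figures, start_mainloop):
        draw_figures()           # realize/raise the open figure windows
        if is_interactive():
            # Interactive mode: the input hook (or IPython) keeps the
            # windows responsive, so return without blocking.
            return
        # Non-interactive mode: start the GUI main loop and block until
        # the figures are closed, as a plain script expects.
        start_mainloop()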
Cheers, Ga?l From ellisonbg at gmail.com Sun Jul 25 17:34:04 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 25 Jul 2010 14:34:04 -0700 Subject: [IPython-dev] Full block handling finished for interactive frontends In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 2:03 AM, Fernando Perez wrote: > Hi folks, > > [especially Gerardo] with this commit: > > > http://github.com/fperez/ipython/commit/df85a15e64ca20ac6cb9f32721bd59343397d276 > > we now have a fully working block splitter that handles a reasonable > amount of test cases. I haven't yet added static support for ipython > special syntax (magics, !, etc), but for pure python syntax it's fully > functional. A second pair of eyes on this code would be much > appreciated, as it's the core of our interactive input handling and > getting it right (at least to this point) took me a surprising amount > of effort. > > Awesome, this is great and will really help the entire code base. We can do a good code review of this when you are ready. > I'll try to complete the special syntax tomorrow, but even now that > can be sent to the kernel just fine. > > Cool. > Gerardo, let me know if you have any problems using this method. As > things stand now, Evan and Omar should be OK using the line-based > workflow, and you should be able to get your blocks with this code. > Over the next few days we'll work on landing all of this, and I think > our architecture is starting to shape up very nicely. > > Brian > Cheers, > > f > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Sun Jul 25 17:35:18 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 25 Jul 2010 14:35:18 -0700 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: <20100725213229.GC8338@phare.normalesup.org> References: <20100725181042.GB16987@phare.normalesup.org> <20100725203500.GA2587@phare.normalesup.org> <20100725211216.GA8338@phare.normalesup.org> <4C4CA9C0.50805@hawaii.edu> <20100725213229.GC8338@phare.normalesup.org> Message-ID: On Sun, Jul 25, 2010 at 2:32 PM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Sun, Jul 25, 2010 at 11:16:48AM -1000, Eric Firing wrote: > > I haven't looked at mlab.show(), but if it is derived from earlier > > matplotlib show(), then you might want to take a look at how show() is > > now implemented in mpl. It works well with ipython 0.10 and 0.11. > > Thanks Eric. mlab's show was not derived from any matplotlib code. None > the less, I had a quite look at SVN matplotlib to figure out how it was > done. > > It seems to me that it is done in 'backend_bases.py', line 81, by > checking if IPython added a special attribute to the ShowBase instance. > Thus, it seems to rely on a collaboration between IPython and matplotlib. > > I am not sure, but I wouldn't be surprised. > Can anyone confirm or infirm? > > Not I. Brian > Cheers, > > Ga?l > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > -- Brian E. Granger, Ph.D. 
Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Sun Jul 25 17:38:57 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 25 Jul 2010 14:38:57 -0700 Subject: [IPython-dev] about ipython-zmq In-Reply-To: References: Message-ID: 2010/7/24 Fernando Perez > Hi Omar, > > 2010/7/24 Omar Andr?s Zapata Mesa : > > . > > Let's suppose the following code in the prompt: > > In [1]: for i in range(100000): > > ...: print i > > ...: > > This will take a lot of time to run, and if the user wants to stop the > > process he will normally do it with ctrl+c. > > by capturing KeyboardInterrupt i was experimenting with a message sent to > > the kernel to stop such process, but the kernel hangs until the "for" > > process is over. > > The solution I see is to run the kernel processes on a thread. what do > you > > think? > > No, the kernel will be in a separate process, and what needs to be done is: > > 1. capture Ctrl-C in the frontend side with the usual try/except. > > 2. Send the Ctrl-C as a signal to the kernel process. > > I think it is a little dangerous to forward Ctrl-C. When there are two processes like this I think it is very ambiguous as to what it means. I would rather go with a frontend magic: :kernel 0 kill > In order to do this, you'll need to know the PID of the kernel > process, but Evan has already been making progress in this direction > so you can benefit from his work. This code: > > > http://github.com/epatters/ipython/blob/qtfrontend/IPython/zmq/kernel.py#L316 > > already has a kernel launcher prototype with the necessary PID information. > > Let's start to use the Popen interface of Python 2.6. It has a terminate and kill method that gets around the PID stuf in a cross platform manner. > To send the signal, you can use os.kill for now. This has problems > on Windows, but let's get signal handling working on *nix first and > once things are in place nicely, we'll look into more general options. > > > And another question: > > What magi commands do you think ipython-zmq should have? > > For now don't worry about magics, as they should all happen > kernel-wise for you. I'll send an email regarding some ideas about > magics separately shortly. > > Cheers, > > f > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Sun Jul 25 17:49:03 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 25 Jul 2010 14:49:03 -0700 Subject: [IPython-dev] First Performance Result In-Reply-To: References: Message-ID: Min, Thanks for this! Sorry I have been so quiet, I have been sick for the last few days. On Thu, Jul 22, 2010 at 2:22 AM, MinRK wrote: > I have the basic queue built into the controller, and a kernel embedded > into the Engine, enough to make a simple performance test. > > I submitted 32k simple execute requests in a row (round robin to engines, > explicit multiplexing), then timed the receipt of the results (tic each 1k). > I did it once with 2 engines, once with 32. (still on a 2-core machine, all > over tcp on loopback). 
> > Messages went out at an average of 5400 msgs/s, and the results came back > at ~900 msgs/s. > So that's 32k jobs submitted in 5.85s, and the last job completed and > returned its result 43.24s after the submission of the first one (37.30s > for 32 engines). On average, a message is sent and received every 1.25 ms. > When sending very small number of requests (1-10) in this way to just one > engine, it gets closer to 1.75 ms round trip. > > This is great! For reference, what is your ping time on localhost? > In all, it seems to be a good order of magnitude quicker than the Twisted > implementation for these small messages. > > That is what I would expect. > Identifying the cost of json for small messages: > > Outgoing messages go at 9500/s if I use cPickle for serialization instead > of json. Round trip to 1 engine for 32k messages: 35s. Round trip to 1 > engine for 32k messages with json: 53s. > > It would appear that json is contributing 50% to the overall run time. > > Seems like we know what to do about json now, right? > With %timeit x.loads(x.dumps(msg)) > on a basic message, I find that json is ~15x slower than cPickle. > And by these crude estimates, with json, we spend about 35% of our time > serializing, as opposed to just 2.5% with pickle. > > I attached a bar plot of the average replies per second over each 1000 msg > block, overlaying numbers for 2 engines and for 32. I did the same comparing > pickle and json for 1 and 2 engines. > > The messages are small, but a tiny amount of work is done in the kernel. > The jobs were submitted like this: > for i in xrange(32e3/len(engines)): > for eid,key in engines.iteritems(): > thesession.send(queue, "execute_request", > dict(code='id=%i'%(int(eid)+i)),ident=str(key)) > > > One thing that is *really* significant is that the requests per/second goes up with 2 engines connected! Not sure why this is the case by my guess is that 0MQ does the queuing/networking in a separate thread and it is able to overlap logic and communication. This is wonderful and bodes well for us. Cheers, Brian -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Sun Jul 25 17:50:31 2010 From: efiring at hawaii.edu (Eric Firing) Date: Sun, 25 Jul 2010 11:50:31 -1000 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: References: <20100725181042.GB16987@phare.normalesup.org> <4C4C9FF6.8090503@hawaii.edu> Message-ID: <4C4CB1A7.9050009@hawaii.edu> On 07/25/2010 11:22 AM, Brian Granger wrote: [...] > > What are the non-hook methods you have in mind? Maybe this option makes > my proposed, or hoped-for, simplification impossible. > > > The two process kernel/frontend will simply start the event loop in the > kernel in the traditional way (non inputhook). It has to do this > because the entire kernel will be based on that event loop. We have > thought about if we could reuse the inputhook stuff there and it won't work. I suspect this will require major changes in mpl's gui event code. What is your time scale for switching to the two-process version? Is there a document outlining how it will work? Or a prototype? > > > While we are focused on other things right now (the > kernel/frontend) we > > would love to hear your thoughts on these issues. Implementing a > > solution shouldn't be too difficult. 
> > Another vague thought: If we really need a more flexible environment, > then maybe the way to achieve it is with a separate package or module > that provides the API for collaboration between, e.g., ipython and mpl. > Perhaps all the toolkit-specific event loop code could be factored out > and wrapped in a toolkit-neutral API. Then, an mpl interactive backend > would use this API regardless of whether mpl is running in a script, or > inside ipython. In the latter case, ipython would be using the same > API, providing centralized knowledge of, and control over, the app > object and the loop. I think that such a refactoring, largely combining > existing functionality in ipython and mpl, might not be terribly > difficult, and might make future improvements in functionality much > easier. It would also make it easier for other libraries to plug into > ipython, collaborate with mpl, etc. > > > This might make sense and as we move forward we should see if this makes > sense. My first thought though is that I don't want to track yet > another project though. I certainly sympathize with that. It could live in ipython as a single module or subpackage. Maybe ipython would end up being an mpl dependency. > > Even if the idea above is sound--and it may be completely > impractical--the devil is undoubtedly in the details. > > > And there are many ones in this case. Thanks for participating in the > discussion. Everything you said in your response to my post points in the direction of really needing a clean central API to coordinate the gui activities of all the potential players. Eric > > Brian From ellisonbg at gmail.com Sun Jul 25 17:51:51 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 25 Jul 2010 14:51:51 -0700 Subject: [IPython-dev] Named Engines In-Reply-To: References: Message-ID: On Wed, Jul 21, 2010 at 1:58 PM, MinRK wrote: > > > On Wed, Jul 21, 2010 at 12:17, Brian Granger wrote: > >> On Wed, Jul 21, 2010 at 10:51 AM, MinRK wrote: >> > >> > >> > On Wed, Jul 21, 2010 at 10:07, Brian Granger >> wrote: >> >> >> >> On Wed, Jul 21, 2010 at 2:35 AM, MinRK wrote: >> >> > I now have my MonitoredQueue object on git, which is the three socket >> >> > Queue >> >> > device that can be the core of the lightweight ME and Task models >> >> > (depending >> >> > on whether it is XREP on both sides for ME, or XREP/XREQ for load >> >> > balanced >> >> > tasks). >> >> >> >> This sounds very cool. What repos is this in? >> > >> > all on my pyzmq master: github.com/minrk/pyzmq >> > The Devices are specified in the growing _zmq.pyx. Should I move them? >> I >> > don't have enough Cython experience (this is my first nontrivial Cython >> > work) to know how to correctly move it to a new file still with all the >> > right zmq imports. >> >> Yes, I think we do want to move them. We should look at how mpi4py >> splits things up. My guess is that we want to have the declaration of >> the 0MQ C API in a single file that other files can use. Then have >> files for the individual things like Socket, Message, Poller, Device, >> etc. That will make the code base much easier to work with. But >> splitting things like this in Cython is a bit suble. I have done it >> before, but I will ask Lisandro Dalcin the best way to approach it. >> For now, I would keep going with the single file approach (unless you >> want to learn about how to split things using pxi and pxd files). >> > > I'd be happy to help split it up if you find out the best way to go about > it. 
> > OK, I a a bit behind on things from being sick, but I may look into this when I review+merge you branch. > >> >> >> >> > The biggest difference in terms of design between Python in the >> >> > Controller >> >> > picking the destination and this new device is that the client code >> >> > actually >> >> > needs to know the XREQ identity of each engine, and all the switching >> >> > logic >> >> > lives in the client code (if not the user exposed code) instead of >> the >> >> > controller - if the client says 'do x in [1,2,3]' they actually issue >> 3 >> >> > sends, unlike before, when they issued 1 and the controller issued 3. >> >> > This >> >> > will increase traffic between the client and the controller, but >> >> > dramatically reduce work done in the controller. >> >> >> >> But because 0MQ has such low latency it might be a win. Each request >> >> the controller gets will be smaller and easier to handle. The idea of >> >> allowing clients to specify the names is something I have thought >> >> about before. One question though: what does 0MQ do when you try to >> >> send on an XREP socket to an identity that doesn't exist? Will the >> >> client be able to know that the client wasn't there? That seems like >> >> an important failure case. >> > >> > As far as I can tell, the XREP socket sends messages out to XREQ ids, >> and >> > trusts that such an XREQ exists. If no such id is connected, the message >> is >> > silently lost to the aether. However, with the controller monitoring >> the >> > queue, it knows when you have sent a message to an engine that is not >> > _registered_, and can tell you about it. This should be sufficient, >> since >> > presumably all the connected XREQ sockets should be registered engines. >> >> I guess I don't quite see how the monitoring is used yet, but it does >> worry me that the message is silently lost. So you think 0MQ should >> raise on that? I have a feeling that the identies were designed to be >> a private API thing in 0MQ and we are challenging that. >> > > I don't know what 0MQ should do, but I imagine the silent loss is based on > thinking of XREP messages as always being replies. That way, a reply sent to > a nonexistent key is interpreted as being a reply to a message whose > requester is gone, and 0MQ presumes that nobody else would be interested in > the result, and drops it. As far as 0MQ is concerned, you wouldn't want the > following to happen: > A makes a request of B > A dies > B replies to A > B crashes because A didn't receive the reply > > nothing went wrong in B, so it shouldn't crash. > > For us, the XREP messages are not replies on the engine side (they are > replies on the client side). We are using the identities to treat the > engine-facing XREP as a keyed multiplexer. The result is that if you send a > message to nobody, nobody receives it. It's not that nobody knows about it - > the controller can tell, because it sees every message as it goes by, and > knows what the valid keys are, but the send itself will not fail. In the > client code, you can easily check if a key is valid with the controller, so > I don't see a problem here. > > OK > The only source of a problem I can think of comes from the fact that the > client has a copy of the registration table, and presumably doesn't want to > update it every time. That way, an engine could go away between the > client's updates of the registration, and some requests would vanish. 
Note > that the controller still does receive them, and the client can check with > the controller on the status of requests that are taking too long. The > controller can use a PUB socket to notify of engines coming/going, which > would mean the window for the client to not be up to date would be very > small, and it wouldn't even be a big problem if it happend, since the client > would be notified that its request won't be received. > I think this approach makes sense. At some level the same issue exists today for us in the twisted version. If you do mec.get_ids(), that information could become stale at any moment in time. I think this is a intrinsic limitation of the multiengine approach (MPI included). Cheers, Brian > > >> >> > To test: >> > a = ctx.socket(zmq.XREP) >> > a.bind('tcp://127.0.0.1:1234') >> > b = ctx.socket(zmq.XREQ) >> > b.setsockopt(zmq.IDENTITY, 'hello') >> > a.send_multipart(['hello', 'mr. b']) >> > time.sleep(.2) >> > b.connect('tcp://127.0.0.1:1234') >> > a.send_multipart(['hello', 'again']) >> > b.recv() >> > # 'again' >> > >> >> >> >> > Since the engines' XREP IDs are known at the client level, and these >> are >> >> > roughly any string, it brings up the question: should we have >> strictly >> >> > integer ID engines, or should we allow engines to have names, like >> >> > 'franklin1', corresponding directly to their XREP identity? >> >> >> >> The idea of having names is pretty cool. Maybe default to numbers, >> >> but allow named prefixes as well as raw names? >> > >> > >> > This part is purely up to our user-facing side of the client code. It >> > certainly doesn't affect how anything works inside. It's just a question >> of >> > what a valid `targets' argument (or key for a dictionary interface) >> would be >> > in the multiengine. >> >> Any string or list of strings? >> > > Well, for now targets is any int or list of ints. I don't see any reason > that you couldn't use a string anywhere an int would be used. It's perfectly > unambiguous, since the two key sets are of a different type. > > you could do: > execute('a=5', targets=[0,1,'odin', 'franklin474']) > and the _build_targets method does: > > target_idents = [] > for t in targets: > if isinstance(t, int): > ident = identities[t] > if isinstance(t, str) and t in identities.itervalues(): > ident = t > else: > raise KeyError("bad target: %s"%t) > target_idents.append(t) > return target_idents > > > >> >> >> >> > I think people might like using names, but I imagine it could get >> >> > confusing. >> >> > It would be unambiguous in code, since we use integer IDs and XREP >> >> > identities must be strings, so if someone keys on a string it must be >> >> > the >> >> > XREP id, and if they key on a number it must be by engine ID. >> >> >> >> Right. I will have a look at the code. >> >> >> >> Cheers, >> >> >> >> Brian >> >> >> >> > -MinRK >> >> > >> >> > >> >> >> >> >> >> >> >> -- >> >> Brian E. Granger, Ph.D. >> >> Assistant Professor of Physics >> >> Cal Poly State University, San Luis Obispo >> >> bgranger at calpoly.edu >> >> ellisonbg at gmail.com >> > >> > >> >> >> >> -- >> Brian E. Granger, Ph.D. >> Assistant Professor of Physics >> Cal Poly State University, San Luis Obispo >> bgranger at calpoly.edu >> ellisonbg at gmail.com >> > > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ellisonbg at gmail.com Sun Jul 25 17:55:37 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 25 Jul 2010 14:55:37 -0700 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: <4C4CB1A7.9050009@hawaii.edu> References: <20100725181042.GB16987@phare.normalesup.org> <4C4C9FF6.8090503@hawaii.edu> <4C4CB1A7.9050009@hawaii.edu> Message-ID: On Sun, Jul 25, 2010 at 2:50 PM, Eric Firing wrote: > On 07/25/2010 11:22 AM, Brian Granger wrote: > [...] > > >> What are the non-hook methods you have in mind? Maybe this option >> makes >> my proposed, or hoped-for, simplification impossible. >> >> >> The two process kernel/frontend will simply start the event loop in the >> kernel in the traditional way (non inputhook). It has to do this >> because the entire kernel will be based on that event loop. We have >> thought about if we could reuse the inputhook stuff there and it won't >> work. >> > > I suspect this will require major changes in mpl's gui event code. > What is your time scale for switching to the two-process version? Is there > a document outlining how it will work? Or a prototype? > > Here is a sketch of the design: http://github.com/ipython/ipython/commit/e21b32e89a634cb1393fd54c1a5657f63f40b1ff This development is happening right now as part of two GSoC projects and some Enthought funded work. There are 5 of us working off of ipython/ipython master right now in our own branches. Should be ready for testing in the next month. The actual 0.11 release is probably a bit further out than that though. > > >> > While we are focused on other things right now (the >> kernel/frontend) we >> > would love to hear your thoughts on these issues. Implementing a >> > solution shouldn't be too difficult. >> >> Another vague thought: If we really need a more flexible environment, >> then maybe the way to achieve it is with a separate package or module >> that provides the API for collaboration between, e.g., ipython and mpl. >> Perhaps all the toolkit-specific event loop code could be factored >> out >> and wrapped in a toolkit-neutral API. Then, an mpl interactive backend >> would use this API regardless of whether mpl is running in a script, or >> inside ipython. In the latter case, ipython would be using the same >> API, providing centralized knowledge of, and control over, the app >> object and the loop. I think that such a refactoring, largely >> combining >> existing functionality in ipython and mpl, might not be terribly >> difficult, and might make future improvements in functionality much >> easier. It would also make it easier for other libraries to plug into >> ipython, collaborate with mpl, etc. >> >> >> This might make sense and as we move forward we should see if this makes >> sense. My first thought though is that I don't want to track yet >> another project though. >> > > I certainly sympathize with that. It could live in ipython as a single > module or subpackage. Maybe ipython would end up being an mpl dependency. > > IPython is already almost an mpl dep. But I guess some people run mpl in servers where IPython is not present. > > >> Even if the idea above is sound--and it may be completely >> impractical--the devil is undoubtedly in the details. >> >> >> And there are many ones in this case. Thanks for participating in the >> discussion. >> > > Everything you said in your response to my post points in the direction of > really needing a clean central API to coordinate the gui activities of all > the potential players. > > Yes, definitely. 
We will keep you in the look. Cheers, Brian > Eric > >> >> Brian >> > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Sun Jul 25 18:04:25 2010 From: efiring at hawaii.edu (Eric Firing) Date: Sun, 25 Jul 2010 12:04:25 -1000 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: <20100725213229.GC8338@phare.normalesup.org> References: <20100725181042.GB16987@phare.normalesup.org> <20100725203500.GA2587@phare.normalesup.org> <20100725211216.GA8338@phare.normalesup.org> <4C4CA9C0.50805@hawaii.edu> <20100725213229.GC8338@phare.normalesup.org> Message-ID: <4C4CB4E9.2020104@hawaii.edu> On 07/25/2010 11:32 AM, Gael Varoquaux wrote: > On Sun, Jul 25, 2010 at 11:16:48AM -1000, Eric Firing wrote: >> I haven't looked at mlab.show(), but if it is derived from earlier >> matplotlib show(), then you might want to take a look at how show() is >> now implemented in mpl. It works well with ipython 0.10 and 0.11. > > Thanks Eric. mlab's show was not derived from any matplotlib code. None > the less, I had a quite look at SVN matplotlib to figure out how it was > done. > > It seems to me that it is done in 'backend_bases.py', line 81, by > checking if IPython added a special attribute to the ShowBase instance. Also, we don't start the mainloop in show() if mpl is in interactive mode. Given that the input hook can take care of events, the remaining function of starting the mainloop is to block until all figures are closed. So, show in non-interactive mode blocks; in interactive mode, it does not. > Thus, it seems to rely on a collaboration between IPython and matplotlib. Yes, in the sense that it takes advantage of that attribute if it is there; but no, in the sense that it works fine without IPython, or with plain IPython when the inputhook is in use. It is not a clean solution to a general problem; it is an ad hoc solution to the specific problem of making mpl work under a reasonable range of current circumstances, including IPython 0.10 and 0.11 (assuming it doesn't change too much). Eric > > Can anyone confirm or infirm? > > Cheers, > > Ga?l From fperez.net at gmail.com Sun Jul 25 18:55:16 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 25 Jul 2010 15:55:16 -0700 Subject: [IPython-dev] about ipython-zmq In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 2:38 PM, Brian Granger wrote: > > I think it is a little dangerous to forward Ctrl-C. ?When there are two > processes like this I think it is very ambiguous as to what it means. ?I > would rather go with a frontend magic: > :kernel 0 kill I really think we do need Ctrl-C. It would be pretty awful to have an interactive environment (especially for the line-based, blocking ones like the plain terminal Omar is working on and Evan's) where Ctrl-C doesn't just stop the kernel, at least on platforms where we can send processes signals. Given that the frontend does no real computation, what other semantics should Ctrl-C have? >> In order to do this, you'll ?need to know the PID of the kernel >> process, but Evan has already been making progress in this direction >> so you can benefit from his work. ?This code: >> >> >> http://github.com/epatters/ipython/blob/qtfrontend/IPython/zmq/kernel.py#L316 >> >> already has a kernel launcher prototype with the necessary PID >> information. 
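In code, the frontend half of that is only a few lines; a sketch in which the frontend object and its execute() call are placeholders, and the kernel's PID is assumed to be already known (how the frontend learns the PID is exactly the launcher question that follows):

    import os
    import signal

    def interact(frontend, kernel_pid):
        while True:
            try:
                code = raw_input('In : ')
                frontend.execute(code)      # placeholder frontend API
            except KeyboardInterrupt:
                # Forward Ctrl-C to the kernel process; the frontend
                # itself keeps running and simply prompts again.
                os.kill(kernel_pid, signal.SIGINT)
            except EOFError:
                break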
>> > > Let's start to use the Popen interface of Python 2.6. ?It has a terminate > and kill method that gets around the PID stuf in a cross platform manner. subprocess kill only sends SIGKILL, while os.kill allows the sending of any signal, so I'm not sure it completely replaces os.kill for us. But for subprocess cleanup then yes, I'm all for using it (especially if it works reliably in Windows). Cheers, f From fperez.net at gmail.com Sun Jul 25 19:00:56 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 25 Jul 2010 16:00:56 -0700 Subject: [IPython-dev] Detecting GUI mainloop running in IPython In-Reply-To: <4C4CB1A7.9050009@hawaii.edu> References: <20100725181042.GB16987@phare.normalesup.org> <4C4C9FF6.8090503@hawaii.edu> <4C4CB1A7.9050009@hawaii.edu> Message-ID: Hi Eric, On Sun, Jul 25, 2010 at 2:50 PM, Eric Firing wrote: > > > I suspect this will require major changes in mpl's gui event code. > What is your time scale for switching to the two-process version? ?Is > there a document outlining how it will work? ?Or a prototype? > it's worth noting that we'll continue to ship a single-process version of IPython, much like today's, because that works with nothing but the stdlib. There are many places that use IPython and rely on being able to install it without any dependencies, so we want to keep that constituency happy. The two-process design, of which these in-progress branches already have prototypes (one running in a terminal, two in Qt): http://github.com/omazapa/ipython http://github.com/epatters/ipython http://github.com/muzgash/ipython will allow the user session to continue operating even if the user's code (or some other library) segfaults the python interpreter; it will also give us reconnect/disconnect abilities, simultaneous collaboration on a single kernel, frontends with different feature sets, etc. But since the kernels won't be listening for input on stdin, the InputHook tricks won't quite work, so we'll need to find a solution for that... Cheers, f From fperez.net at gmail.com Sun Jul 25 19:08:59 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 25 Jul 2010 16:08:59 -0700 Subject: [IPython-dev] [IPython-User] How to build ipython documentation In-Reply-To: <4C49F9DC.3000909@gmail.com> References: <4C487A32.4000200@gmail.com> <4C49F9DC.3000909@gmail.com> Message-ID: Hi Wendell, On Fri, Jul 23, 2010 at 1:21 PM, Wendell Smith wrote: > > > However, in order to get it to continue all the way through, I had to > install twisted, foolscap, and wxpython - none of which are necessary for > basic ipython. Is it supposed to be that way? no, it shouldn't. The problem is that sphinx, in order to build the docs, needs to import the modules, as it does not parse its inputs. So if you want to build a *complete* set of IPython docs where every docstring is included, you'd need to have every dependency installed. And since some are mutually incompatible (say cocoa and win32 stuff, which by definition run on different platforms), it will never be possible to have 100% coverage with this approach. In practice we can make the docs build with only the stdlib by fixing the scripts to avoid documenting certain subpackages when their dependencies aren't met. But packagers/distributors would still need to have the full dependencies installed if they want to generate complete docs, and this approach still strikes me as somewhat ugly. I don't have a solid solution to this though, given how the need to import modules is a constraint coming from sphinx. 
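The "ugly" stopgap would look something like the following in whatever script drives the API-doc generation (purely a sketch, not the actual build code; the module list is made up):

# Sketch: only document subpackages whose dependencies are importable,
# so a stdlib-only box can still build a (reduced) set of the docs.
candidates = ['IPython.kernel', 'IPython.frontend', 'IPython.gui.wx']
excluded = []
for name in candidates:
    try:
        __import__(name)
    except ImportError:
        # optional dependency (twisted, wx, ...) missing: leave it out
        excluded.append(name)
# 'excluded' would then be handed to the API-doc generator so sphinx
# never tries to import those modules.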
Anyone with a good idea on how to proceed here, I'm all ears... Cheers, f From andresete.chaos at gmail.com Sun Jul 25 19:21:54 2010 From: andresete.chaos at gmail.com (Omar Andrés Zapata Mesa) Date: Sun, 25 Jul 2010 18:21:54 -0500 Subject: [IPython-dev] about ipython-zmq In-Reply-To: References: Message-ID: Hi all. Ctrl-C is done! You can see the code in http://github.com/omazapa/ipython/tree/master/IPython/zmq/ I wrote a new message type in this frontend method:

def get_kernel_pid(self):
    omsg = self.session.send(self.request_socket, 'pid_request')
    while True:
        # print "waiting to receive"
        rep = self.session.recv(self.request_socket)
        if rep is not None:
            self.kernel_pid = rep['content']['pid']
            break
        time.sleep(0.05)
    return self.kernel_pid

and this one in the kernel's class:

def pid_request(self, ident, parent):
    pid_msg = {u'pid': self.kernel_pid, 'status': 'ok'}
    self.session.send(self.reply_socket, 'pid_reply', pid_msg, parent, ident)

So we now have a new set of request types: 'execute_request', 'complete_request', 'pid_request'. The frontend's class has the attribute kernel_pid and I call get_kernel_pid() in the constructor. Then, when I have captured KeyboardInterrupt, I send the SIGINT signal with kill, and it is working. O 2010/7/25 Fernando Perez > On Sun, Jul 25, 2010 at 2:38 PM, Brian Granger > wrote: > > > > I think it is a little dangerous to forward Ctrl-C. When there are two > > processes like this I think it is very ambiguous as to what it means. I > > would rather go with a frontend magic: > > :kernel 0 kill > > I really think we do need Ctrl-C.
It would be pretty awful to have an > interactive environment (especially for the line-based, blocking ones > like the plain terminal Omar is working on and Evan's) where Ctrl-C > doesn't just stop the kernel, at least on platforms where we can send > processes signals. Given that the frontend does no real computation, > what other semantics should Ctrl-C have? > > The case that I am worried about is if a frontend hangs. Almost *Everyone* will try Ctrl-C to try to kill the frontend, but if the frontend is enough alive to trap Ctrl-C and send it to the kernel, the kernel will get it instead. If the kernel is running code, it is likely that someone will be unhappy. This is especially true because of the possibility of multiple frontends running the same kernel. Like most GUI applications (and Mathematica for example), I think Ctrl-C should be disabled and the frontend should provide a different interface (possibly using a kernel magic) to signal the kernel. But let's talk more about this. > >> In order to do this, you'll need to know the PID of the kernel > >> process, but Evan has already been making progress in this direction > >> so you can benefit from his work. This code: > >> > >> > >> > http://github.com/epatters/ipython/blob/qtfrontend/IPython/zmq/kernel.py#L316 > >> > >> already has a kernel launcher prototype with the necessary PID > >> information. > >> > > > > Let's start to use the Popen interface of Python 2.6. It has a terminate > > and kill method that gets around the PID stuf in a cross platform manner. > > subprocess kill only sends SIGKILL, while os.kill allows the sending > of any signal, so I'm not sure it completely replaces os.kill for us. > But for subprocess cleanup then yes, I'm all for using it (especially > if it works reliably in Windows). > > True, and we do probably want to allow Linux and Mac to send the other signals. Brian > Cheers, > > f > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Sun Jul 25 21:29:05 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 25 Jul 2010 18:29:05 -0700 Subject: [IPython-dev] about ipython-zmq In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 5:57 PM, Brian Granger wrote: > The case that I am worried about is if a frontend hangs. ?Almost?*Everyone* > will try Ctrl-C to try to kill the frontend, but if the frontend is enough > alive to trap Ctrl-C and send it to the kernel, the kernel will get it > instead. ?If the kernel is running code, it is likely that someone will be > unhappy. ?This is especially true because of the possibility of multiple > frontends running the same kernel. > Like most GUI applications (and Mathematica for example), I think Ctrl-C > should be disabled and the frontend should provide a different interface > (possibly using a kernel magic) to signal the kernel. ?But let's talk more > about this. A terminal is a good example of a gui application that forwards Ctrl-C to the underlying process it exposes. When you type Ctrl-C in a terminal, it's not the terminal itself (say xterm or gnome-terminal) that gets it, but instead it's sent to whatever you were running at the time. 
It makes perfect sense to me for IPython frontends to forward that signal to the kernel, since frontends are thin 'handles' on the kernel itself, and interrupting a long-running computation is one of the most common things in everyday practice. I know it would drive me positively insane if I had to type a full command to send a simple interrupt to a running kernel. In full GUI frontends we can certainly expose a little 'interrupt kernel' button somewhere, but I suspect I wouldn't be the only one driven mad by Ctrl-C not doing the most intuitive thing... The case of a hung frontend should be handled like other apps: a close button, a 'force quit' from the OS, etc. Killing a hung gui in general is done like that, and it should be indeed a special 'kill' operation because in general, the front ends should not be hung under normal conditions: they run very little code, so there's no reason for them to block other than when they are waiting for a kernel to return. Now, for *asynchronous* frontends, then we certainly want an 'interrupt kernel' command/button, so Gerardo probably should implement something like that. But a blocking, line-based frontend that 'feels like a terminal' should 'act like a terminal', I think... Cheers, f From ellisonbg at gmail.com Sun Jul 25 21:59:25 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Sun, 25 Jul 2010 18:59:25 -0700 Subject: [IPython-dev] about ipython-zmq In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 6:29 PM, Fernando Perez wrote: > On Sun, Jul 25, 2010 at 5:57 PM, Brian Granger > wrote: > > The case that I am worried about is if a frontend hangs. > Almost *Everyone* > > will try Ctrl-C to try to kill the frontend, but if the frontend is > enough > > alive to trap Ctrl-C and send it to the kernel, the kernel will get it > > instead. If the kernel is running code, it is likely that someone will > be > > unhappy. This is especially true because of the possibility of multiple > > frontends running the same kernel. > > Like most GUI applications (and Mathematica for example), I think Ctrl-C > > should be disabled and the frontend should provide a different interface > > (possibly using a kernel magic) to signal the kernel. But let's talk > more > > about this. > > A terminal is a good example of a gui application that forwards Ctrl-C > to the underlying process it exposes. When you type Ctrl-C in a > terminal, it's not the terminal itself (say xterm or gnome-terminal) > that gets it, but instead it's sent to whatever you were running at > the time. > > Yes, definitely. On Mac OS X, as far as I know the terminal is the only application that maps Ctrl-C to SIGINT. > It makes perfect sense to me for IPython frontends to forward that > signal to the kernel, since frontends are thin 'handles' on the kernel > itself, and interrupting a long-running computation is one of the most > common things in everyday practice. > > I know it would drive me positively insane if I had to type a full > command to send a simple interrupt to a running kernel. In full GUI > frontends we can certainly expose a little 'interrupt kernel' button > somewhere, but I suspect I wouldn't be the only one driven mad by > Ctrl-C not doing the most intuitive thing... > > Good points. I do agree that if a frontend looks like a terminal and acts like a terminal, then it should *really* act like a terminal and use Ctrl-C in the same way as a terminal. 
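In concrete terms, the forwarding Omar described earlier in the thread amounts to something like this on the frontend side (only a sketch; kernel_pid comes from his 'pid_request' message, and the other helper names are illustrative):

import os
import signal

def interact(self):
    """Read-eval loop of a terminal-style frontend (sketch)."""
    while True:
        try:
            code = raw_input(self.prompt)
            self.send_execute_request(code)  # hand the code to the kernel
            self.wait_for_reply()
        except KeyboardInterrupt:
            # Ctrl-C typed in the frontend: interrupt the *kernel*, not ourselves.
            # Unix-only; Windows would need a different mechanism.
            os.kill(self.kernel_pid, signal.SIGINT)
        except EOFError:
            break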
For frontends that are less terminal like, I am less convinced, but this is partly because I haven't really interacted with Python in this way. I think this will become more clear as we move forward. My only hesitation about Ctrl-C in a GUI app is the ambiguity of what Ctrl-C does in a distributed application. But, I do think we want to err in the direction of making it easy to interrupt things, so Ctrl-C makes the most sense for the default. There is nothing worse than starting up an app and having Ctrl-C disabled when it seems like it should be active. But, I do think it would be nice to have this configurable. > The case of a hung frontend should be handled like other apps: a close > button, a 'force quit' from the OS, etc. Killing a hung gui in > general is done like that, and it should be indeed a special 'kill' > operation because in general, the front ends should not be hung under > normal conditions: they run very little code, so there's no reason for > them to block other than when they are waiting for a kernel to return. > > True, a hung frontend should be exceptional whereas interrupting code in the kernel is common. And you are right that a hung application should be handled like other hung applications. > Now, for *asynchronous* frontends, then we certainly want an > 'interrupt kernel' command/button, so Gerardo probably should > implement something like that. But a blocking, line-based frontend > that 'feels like a terminal' should 'act like a terminal', I think... > > Yes I agree with that, definitely. > Cheers, > > f > Cheers, Brian -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Sun Jul 25 22:21:27 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 25 Jul 2010 19:21:27 -0700 Subject: [IPython-dev] about ipython-zmq In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 6:59 PM, Brian Granger wrote: > ?For frontends that are less terminal like, I am less convinced, but this is > partly because I haven't really interacted with Python in this way. ?I think > this will become more clear as we move forward. ?My only hesitation about > Ctrl-C in a GUI app is the ambiguity of what Ctrl-C does in a distributed > application. Completely agreed, less 'terminal-y' frontends will probably want to expose this differently, especially if they are dealing possibly with multiple kernels. At that point some kind of gui widget with 'stop' little icons is probably a more sensible interface than a blind Ctrl-C forwarding. Thanks for the feedback and good thinking on this though! Cheers, f From fperez.net at gmail.com Sun Jul 25 22:49:18 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 25 Jul 2010 19:49:18 -0700 Subject: [IPython-dev] correct test-suite In-Reply-To: <20100718171412.42f4e970@earth> References: <20100718171412.42f4e970@earth> Message-ID: Hi Tom, On Sun, Jul 18, 2010 at 8:14 AM, Thomas Spura wrote: > There is now a Makefile, so it's nicer to run repetive tasks in the > repository, but currently there is only 'make test-suite', which should > run the test suite. 
> (now = in branch my_fix_test_suite at github: > http://github.com/tomspur/ipython/commits/my_fix_test_suite) Unfortunately this approach does not work if IPython isn't installed system-wide, because it breaks the twisted tests: =============================================================================== [ERROR]: IPython.kernel Traceback (most recent call last): File "/usr/lib/python2.6/dist-packages/twisted/trial/runner.py", line 651, in loadByNames things.append(self.findByName(name)) File "/usr/lib/python2.6/dist-packages/twisted/trial/runner.py", line 461, in findByName return reflect.namedAny(name) File "/usr/lib/python2.6/dist-packages/twisted/python/reflect.py", line 471, in namedAny raise ObjectNotFound('%r does not name an object' % (name,)) twisted.python.reflect.ObjectNotFound: 'IPython.kernel' does not name an object ------------------------------------------------------------------------------- as well as some others. And the manual sys.path manipulations done here: http://github.com/tomspur/ipython/commit/7abd52e0933aa57a082db4623c791630bc0671ea#L0R51 should be avoided in production code, which iptest.py is. I'm not opposed to having a top-level makefile, but the one in your commit can't be merged for this reason. Additionally, it makes targets for things like tar/zip generation that should be done as distutils commands instead (and they are done by some of the scripts in tools/). In practice, I find that simply having the current working IPython/ directory symlinked in my PYTHONPATH lets me run the iptest script at any time without further PYTHONPATH manipulations. I'm happy to find a solution to running the test suite without installing, but this one doesn't seem to work robustly (and I'd already backed off a while ago some more hackish things I'd tried, precisely for being too brittle). Until we have a really clean solution, we'll have a test suite that can only be run from an installed IPython (or equivalently, a setup that finds the source tree transparently, which is what I use). > One failing test pointed out, that there is a programming error in > IPython/Shell.py and is now corrected in this commit: > http://github.com/tomspur/ipython/commit/7e7988ee9e7c35b2e5302725ebdf6c22135f334e This one I did just cherry-pick and push, as it was really a clean bug, thanks for the fix: http://github.com/ipython/ipython/commit/a469f3d77cf794b33ac20cf9d3f2246387423808 > But now, there is a problem with test: "Test that object's __del__ > methods are called on exit." in IPython/core/tests/test_run.py:146. I think that's all caused by the problems you are seeing from your method of running the tests. On my system, all tests do pass cleanly right now: ********************************************************************** Test suite completed for system with the following information: IPython version: 0.11.alpha1.git BZR revision : 0 Platform info : os.name -> posix, sys.platform -> linux2 : Linux-2.6.32-24-generic-i686-with-Ubuntu-10.04-lucid Python info : 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) [GCC 4.4.3] Tools and libraries available at test time: curses foolscap gobject gtk pexpect twisted wx wx.aui zope.interface Tools and libraries NOT available at test time: objc Ran 10 test groups in 46.446s Status: OK #### I also commented on your bundled_libs branch: it can't be merged because it also breaks most of the Twisted tests. 
Until the test suite passes 100% we can't merge those changes, though I do very much like the idea of better organizing externals, so I hope you can sort out the issue. Do let us know as soon as you can fix those and we can try again. So I think right now we have merged everything that is mergeable from you, right? Please go ahead and file pull requests again if you do update these (since those trigger an email and that makes it easier to keep track of what's been done). Thanks a lot for your interest and help! Cheers, f From fperez.net at gmail.com Sun Jul 25 23:27:23 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 25 Jul 2010 20:27:23 -0700 Subject: [IPython-dev] Trunk in 100% test compliance Message-ID: Hi folks, we had a few lingering test errors here and there, and with all the renewed activity in the project, that seemed like a fairly unsafe way to proceed. We really want everyone to be able to *always* run the *full* test suite and only make pull requests when the suite passes completely. Having failing tests in the way makes it much more likely that new code will be added with more failures, so hopefully this is a useful checkpoint to start from. I've only ran the tests on linux for now, but the only major component that misses is cocoa/objc: ********************************************************************** Test suite completed for system with the following information: IPython version: 0.11.alpha1.git BZR revision : 0 Platform info : os.name -> posix, sys.platform -> linux2 : Linux-2.6.32-24-generic-i686-with-Ubuntu-10.04-lucid Python info : 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) [GCC 4.4.3] Tools and libraries available at test time: curses foolscap gobject gtk pexpect twisted wx wx.aui zope.interface Tools and libraries NOT available at test time: objc Ran 10 test groups in 46.446s Status: OK ### If anyone sees a different result on their system, please do let us know and we'll hopefully be able to fix it. Cheers, f From benjaminrk at gmail.com Sun Jul 25 23:49:36 2010 From: benjaminrk at gmail.com (MinRK) Date: Sun, 25 Jul 2010 20:49:36 -0700 Subject: [IPython-dev] First Performance Result In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 14:49, Brian Granger wrote: > Min, > > Thanks for this! Sorry I have been so quiet, I have been sick for the last > few days. > > On Thu, Jul 22, 2010 at 2:22 AM, MinRK wrote: > >> I have the basic queue built into the controller, and a kernel embedded >> into the Engine, enough to make a simple performance test. >> >> I submitted 32k simple execute requests in a row (round robin to engines, >> explicit multiplexing), then timed the receipt of the results (tic each 1k). >> I did it once with 2 engines, once with 32. (still on a 2-core machine, all >> over tcp on loopback). >> >> Messages went out at an average of 5400 msgs/s, and the results came back >> at ~900 msgs/s. >> So that's 32k jobs submitted in 5.85s, and the last job completed and >> returned its result 43.24s after the submission of the first one (37.30s >> for 32 engines). On average, a message is sent and received every 1.25 ms. >> When sending very small number of requests (1-10) in this way to just one >> engine, it gets closer to 1.75 ms round trip. >> >> > This is great! For reference, what is your ping time on localhost? > ping on localhost is 50-100 us > > >> In all, it seems to be a good order of magnitude quicker than the Twisted >> implementation for these small messages. >> >> > That is what I would expect. 
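For reference, the relative cost of the serializers is easy to check with a few lines of this sort (illustrative only; the message dict below is made up, and the exact numbers will vary with payload and machine):

import timeit
import json
import cPickle

msg = {'header': {'msg_id': 0, 'session': 'abc', 'username': 'user'},
       'msg_type': 'execute_request',
       'content': {'code': 'id=0'}}

for name, mod in [('json', json), ('cPickle', cPickle)]:
    t = timeit.timeit(lambda: mod.loads(mod.dumps(msg)), number=10000)
    print name, '%.1f us per dumps/loads round trip' % (t / 10000 * 1e6)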
> > >> Identifying the cost of json for small messages: >> >> Outgoing messages go at 9500/s if I use cPickle for serialization instead >> of json. Round trip to 1 engine for 32k messages: 35s. Round trip to 1 >> engine for 32k messages with json: 53s. >> >> It would appear that json is contributing 50% to the overall run time. >> >> > Seems like we know what to do about json now, right? > I believe we do: 1. cjson, 2. cPickle, 3. json/simplejson, 4. pickle. Also: never use integer keys in message internals, and never use json for user data. > > >> With %timeit x.loads(x.dumps(msg)) >> on a basic message, I find that json is ~15x slower than cPickle. >> And by these crude estimates, with json, we spend about 35% of our time >> serializing, as opposed to just 2.5% with pickle. >> >> I attached a bar plot of the average replies per second over each 1000 msg >> block, overlaying numbers for 2 engines and for 32. I did the same comparing >> pickle and json for 1 and 2 engines. >> >> The messages are small, but a tiny amount of work is done in the kernel. >> The jobs were submitted like this: >> for i in xrange(32e3/len(engines)): >> for eid,key in engines.iteritems(): >> thesession.send(queue, "execute_request", >> dict(code='id=%i'%(int(eid)+i)),ident=str(key)) >> >> >> > > One thing that is *really* significant is that the requests per/second goes > up with 2 engines connected! Not sure why this is the case by my guess is > that 0MQ does the queuing/networking in a separate thread and it is able to > overlap logic and communication. This is wonderful and bodes well for us. > Yes, I only ran it for 1,2,32, but it's still a little faster at 32 than 2, even on a 2 core machine. > Cheers, > > Brian > > > > > -- > Brian E. Granger, Ph.D. > Assistant Professor of Physics > Cal Poly State University, San Luis Obispo > bgranger at calpoly.edu > ellisonbg at gmail.com > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Sun Jul 25 23:56:18 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Sun, 25 Jul 2010 20:56:18 -0700 Subject: [IPython-dev] First Performance Result In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 8:49 PM, MinRK wrote: > > I believe we do: 1. cjson, 2. cPickle, 3. json/simplejson, 4. pickle. > Also: never use integer keys in message internals, and never use json for > user data. That sounds good to me. Min, it would be great if you drop some of these nuggets in the docs as you go, so we have a record of these design decisions in the documentation (in addition to the mailing list archives). Thanks for pushing on this! Cheers, f From tomspur at fedoraproject.org Mon Jul 26 03:18:50 2010 From: tomspur at fedoraproject.org (Thomas Spura) Date: Mon, 26 Jul 2010 09:18:50 +0200 Subject: [IPython-dev] correct test-suite In-Reply-To: References: <20100718171412.42f4e970@earth> Message-ID: <20100726091850.218c47ad@earth> Am Sun, 25 Jul 2010 19:49:18 -0700 schrieb Fernando Perez : > Hi Tom, > > On Sun, Jul 18, 2010 at 8:14 AM, Thomas Spura > wrote: > > There is now a Makefile, so it's nicer to run repetive tasks in the > > repository, but currently there is only 'make test-suite', which > > should run the test suite. 
> > (now = in branch my_fix_test_suite at github: > > http://github.com/tomspur/ipython/commits/my_fix_test_suite) > > Unfortunately this approach does not work if IPython isn't installed > system-wide, because it breaks the twisted tests: > > =============================================================================== > [ERROR]: IPython.kernel > > Traceback (most recent call last): > File "/usr/lib/python2.6/dist-packages/twisted/trial/runner.py", > line 651, in loadByNames > things.append(self.findByName(name)) > File "/usr/lib/python2.6/dist-packages/twisted/trial/runner.py", > line 461, in findByName > return reflect.namedAny(name) > File "/usr/lib/python2.6/dist-packages/twisted/python/reflect.py", > line 471, in namedAny > raise ObjectNotFound('%r does not name an object' % (name,)) > twisted.python.reflect.ObjectNotFound: 'IPython.kernel' does not name > an object > ------------------------------------------------------------------------------- > > as well as some others. And the manual sys.path manipulations done > here: > > http://github.com/tomspur/ipython/commit/7abd52e0933aa57a082db4623c791630bc0671ea#L0R51 > > should be avoided in production code, which iptest.py is. > > I'm not opposed to having a top-level makefile, but the one in your > commit can't be merged for this reason. Additionally, it makes > targets for things like tar/zip generation that should be done as > distutils commands instead (and they are done by some of the scripts > in tools/). > > In practice, I find that simply having the current working IPython/ > directory symlinked in my PYTHONPATH lets me run the iptest script at > any time without further PYTHONPATH manipulations. I did now something similar and linked $(pythondir)/site-packages/IPython -> git repository. Seems to work pretty well. > > I'm happy to find a solution to running the test suite without > installing, but this one doesn't seem to work robustly (and I'd > already backed off a while ago some more hackish things I'd tried, > precisely for being too brittle). > > Until we have a really clean solution, we'll have a test suite that > can only be run from an installed IPython (or equivalently, a setup > that finds the source tree transparently, which is what I use). > > > One failing test pointed out, that there is a programming error in > > IPython/Shell.py and is now corrected in this commit: > > http://github.com/tomspur/ipython/commit/7e7988ee9e7c35b2e5302725ebdf6c22135f334e > > This one I did just cherry-pick and push, as it was really a clean > bug, thanks for the fix: > > http://github.com/ipython/ipython/commit/a469f3d77cf794b33ac20cf9d3f2246387423808 > > > But now, there is a problem with test: "Test that object's __del__ > > methods are called on exit." in IPython/core/tests/test_run.py:146. > > I think that's all caused by the problems you are seeing from your > method of running the tests. 
On my system, all tests do pass cleanly > right now: > > ********************************************************************** > Test suite completed for system with the following information: > IPython version: 0.11.alpha1.git > BZR revision : 0 > Platform info : os.name -> posix, sys.platform -> linux2 > : Linux-2.6.32-24-generic-i686-with-Ubuntu-10.04-lucid > Python info : 2.6.5 (r265:79063, Apr 16 2010, 13:09:56) > [GCC 4.4.3] > > Tools and libraries available at test time: > curses foolscap gobject gtk pexpect twisted wx wx.aui > zope.interface > > Tools and libraries NOT available at test time: > objc > > Ran 10 test groups in 46.446s > > Status: > OK > #### Here, they don't... That's why, I didn't look too closely to the failing tests in my branches. I'll try to fix the failures in current master on my side first, because it seems some other dependencies are doing something wrong I guess... ********************************************************************** Test suite completed for system with the following information: IPython version: 0.11.alpha1.git BZR revision : 0 Platform info : os.name -> posix, sys.platform -> linux2 : Linux-2.6.33.6-147.fc13.x86_64-x86_64-with-fedora-13-Goddard Python info : 2.6.4 (r264:75706, Jun 4 2010, 18:20:31) [GCC 4.4.4 20100503 (Red Hat 4.4.4-2)] Tools and libraries available at test time: curses foolscap gobject gtk pexpect twisted wx wx.aui zope.interface Tools and libraries NOT available at test time: objc Ran 10 test groups in 68.690s Status: ERROR - 2 out of 10 test groups failed. ---------------------------------------- Runner failed: IPython.core You may wish to rerun this one individually, with: /usr/bin/python /usr/lib/python2.6/site-packages/IPython/testing/iptest.py IPython.core ---------------------------------------- Runner failed: IPython.extensions You may wish to rerun this one individually, with: /usr/bin/python /usr/lib/python2.6/site-packages/IPython/testing/iptest.py IPython.extensions > > I also commented on your bundled_libs branch: it can't be merged > because it also breaks most of the Twisted tests. Until the test > suite passes 100% we can't merge those changes, though I do very much > like the idea of better organizing externals, so I hope you can sort > out the issue. Do let us know as soon as you can fix those and we can > try again. Will do. :) > So I think right now we have merged everything that is mergeable from > you, right? Please go ahead and file pull requests again if you do > update these (since those trigger an email and that makes it easier to > keep track of what's been done). I think, it's quite handy to see on which commit you run ipython. e.g. when receiving bug reports. So this commit could be merged: http://github.com/tomspur/ipython/commit/936858ba3e6648a8bc0031cb76a94643dcdb080a (This also works, when the IPython directory is symlinked somewhere else.) I'll rework all other commits and will do a pull request again. Thanks. Tom From fperez.net at gmail.com Mon Jul 26 19:03:05 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 26 Jul 2010 16:03:05 -0700 Subject: [IPython-dev] Unicode howto Message-ID: Hi all, in trying to reply to a query from Min about unicode in zmq, I found this document: http://docs.python.org/howto/unicode.html Somehow I'd managed to miss it before, but it's a very nice and concise introduction to unicode in Python 2.x. Since we need to start seriously thinking about unicode if we're going to push ipython to 3.x, I thought others might find it useful. 
Cheers, f ps - Python now ships a nice collection of howto's that I'd never noticed. In case someone else is as distracted as I am: http://docs.python.org/howto/index.html From fperez.net at gmail.com Mon Jul 26 21:12:35 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 26 Jul 2010 18:12:35 -0700 Subject: [IPython-dev] Buffers In-Reply-To: References: Message-ID: [ I'm cc'ing the list on this, which may be of general interest ] On Mon, Jul 26, 2010 at 2:14 PM, MinRK wrote: > Basically, the question revolves around what should we do with non-ascii > unicode messages in this situation: > msg=u'?' > a.send(msg) > s = b.recv() Shouldn't send/receive *always* work with bytes and never with unicode? Unicode requires knowing the encoding, and that is a dangerous proposition on two sides of the wire. If a message is unicode, it should be encoded first (to utf-8) and decoded on the other side back to unicode. There is then the question of the receiving side: should it always decode? If not, should a flag about bytes/unicode be sent along? Not sure... Cheers, f From benjaminrk at gmail.com Mon Jul 26 21:43:45 2010 From: benjaminrk at gmail.com (Min RK) Date: Mon, 26 Jul 2010 18:43:45 -0700 Subject: [IPython-dev] Buffers In-Reply-To: References: Message-ID: After chatting with Brian a little bit, I think what should happen is the actual buffer gets sent, since zmq itself should not be aware of encoding. The main reason I started looking at handling unicode is that json returns unicode objects instead of strings, so I was encountering errors having never created unicode strings myself, and things like: send(u'a message') would fail, as would sock.connect(u'tcp://127.0.0.1:123'), and I think that should definitely not happen. I solved these problems easily enough by changing all the isinstance(s,str) calls to isinstance(s,(str,unicode)). This works because the PyString_As... methods all these tests were screening actually accept unicode as well as str, as long as the unicode object is ascii (or default encoding?). In implementing buffer support, I can also send (without copying) any object that provides the buffer interface, including arbitrary unicode strings. I think it was a mistake to attempt to conflate these two things and attempt to reconstruct unicode objects on both sides within zmq. Here is where my code currently stands: Either a unicode object a) contains an ascii string, and is sent as a string, or b) it is not a basic string, and its buffer is sent and reconstruction is left up to the user, just like all other buffered objects. -MinRK On Jul 26, 2010, at 18:12, Fernando Perez wrote: > [ I'm cc'ing the list on this, which may be of general interest ] > > On Mon, Jul 26, 2010 at 2:14 PM, MinRK wrote: >> Basically, the question revolves around what should we do with non-ascii >> unicode messages in this situation: >> msg=u'?' >> a.send(msg) >> s = b.recv() > > Shouldn't send/receive *always* work with bytes and never with > unicode? Unicode requires knowing the encoding, and that is a > dangerous proposition on two sides of the wire. > > If a message is unicode, it should be encoded first (to utf-8) and > decoded on the other side back to unicode. > > There is then the question of the receiving side: should it always > decode? If not, should a flag about bytes/unicode be sent along? > > Not sure... 
> > Cheers, > > f From ellisonbg at gmail.com Mon Jul 26 22:25:37 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Mon, 26 Jul 2010 19:25:37 -0700 Subject: [IPython-dev] Buffers In-Reply-To: References: Message-ID: On Mon, Jul 26, 2010 at 6:12 PM, Fernando Perez wrote: > [ I'm cc'ing the list on this, which may be of general interest ] > > On Mon, Jul 26, 2010 at 2:14 PM, MinRK wrote: > > Basically, the question revolves around what should we do with non-ascii > > unicode messages in this situation: > > msg=u'?' > > a.send(msg) > > s = b.recv() > > Shouldn't send/receive *always* work with bytes and never with > unicode? Unicode requires knowing the encoding, and that is a > dangerous proposition on two sides of the wire. > > Yes, 0MQ and pyzmq should always deal with bytes. > If a message is unicode, it should be encoded first (to utf-8) and > decoded on the other side back to unicode. > > Yep > There is then the question of the receiving side: should it always > decode? If not, should a flag about bytes/unicode be sent along? > > That is really for an application to handle on a per message basis. The most reasonable options are: 1. Put encoding/decoding info in the message content. 2. Always encode and decode in the application. Brian > Not sure... > > Cheers, > > f > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Mon Jul 26 22:33:03 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 26 Jul 2010 19:33:03 -0700 Subject: [IPython-dev] Buffers In-Reply-To: References: Message-ID: Glad you worked it out, but I'm worried about one thing: I don't believe you can send unicode strings as a buffer over the wire. The reason is that the two interpreters at both ends of the connection could have been compiled with different internal unicode encodings. Python can be compiled to store unicode internally either as UCS-2 or UCS-4, you can check sys.maxunicode to find out how your particular build was made: http://www.python.org/dev/peps/pep-0100/ http://www.python.org/dev/peps/pep-0261/ If you send a unicode string as a buffer from a ucs2 python to a ucs4 one, you'll get a mess at the other end, I think. Minor note: On Mon, Jul 26, 2010 at 6:43 PM, Min RK wrote: > isinstance(s,(str,unicode)) this is equiv. to: isinstance(s, basestring) Cheers, f From fperez.net at gmail.com Mon Jul 26 22:38:21 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 26 Jul 2010 19:38:21 -0700 Subject: [IPython-dev] correct test-suite In-Reply-To: <20100726091850.218c47ad@earth> References: <20100718171412.42f4e970@earth> <20100726091850.218c47ad@earth> Message-ID: On Mon, Jul 26, 2010 at 12:18 AM, Thomas Spura wrote: > Here, they don't... That's why, I didn't look too closely to the > failing tests in my branches. I'll try to fix the failures in current > master on my side first, because it seems some other dependencies are > doing something wrong I guess... > If you can't find it, show me the tracebacks and I may be able to help out. We want the test suite to degrade gracefully by skipping if optional dependencies aren't met, not to fail. 
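The pattern for that kind of graceful degradation is roughly the following (a generic sketch; in IPython this check would presumably live in the testing decorators rather than being repeated per test):

# Sketch: skip cleanly when an optional dependency is missing, instead of
# letting the ImportError surface as a test failure.
from nose.plugins.skip import SkipTest

try:
    import wx  # optional dependency
except ImportError:
    wx = None

def test_wx_inputhook():
    if wx is None:
        raise SkipTest("wx not available")
    # ... the real test body would go here ...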
Cheers, f From benjaminrk at gmail.com Tue Jul 27 00:13:15 2010 From: benjaminrk at gmail.com (Min RK) Date: Mon, 26 Jul 2010 21:13:15 -0700 Subject: [IPython-dev] Buffers In-Reply-To: References: Message-ID: On Jul 26, 2010, at 19:33, Fernando Perez wrote: > Glad you worked it out, but I'm worried about one thing: I don't > believe you can send unicode strings as a buffer over the wire. The > reason is that the two interpreters at both ends of the connection > could have been compiled with different internal unicode encodings. > Python can be compiled to store unicode internally either as UCS-2 or > UCS-4, you can check sys.maxunicode to find out how your particular > build was made: > > http://www.python.org/dev/peps/pep-0100/ > http://www.python.org/dev/peps/pep-0261/ > > If you send a unicode string as a buffer from a ucs2 python to a ucs4 > one, you'll get a mess at the other end, I think. I'm not sure that is our concern. If buffers are being sent, then it's not zmq whose job it is to interpret that buffer, it's the user's receiving code. zmq only sends bytes, and for most objects, unicode included, that's what the buffer interface is. But sometimes a unicode object is really just a simple str in a unicode package, and when that's the case we interpret it as a string. Otherwise it's treated like all other objects - a black box that provides a buffer interface. It's up to the user sending the data to send it in a form that they can understand on the other side. > Minor note: > On Mon, Jul 26, 2010 at 6:43 PM, Min RK wrote: >> isinstance(s,(str,unicode)) > > this is equiv. to: isinstance(s, basestring) > ah, thanks, I hadn't seen that one. I'll use it. > > > Cheers, > > f your points have further clarified that I was mistaken to attempt to support unicode strings. We support basic strings and raw buffers. When faced with a unicode object, we effectively (but not literally) do: try: send(str(u)) except: send(buffer(u)) From tomspur at fedoraproject.org Tue Jul 27 02:25:12 2010 From: tomspur at fedoraproject.org (Thomas Spura) Date: Tue, 27 Jul 2010 08:25:12 +0200 Subject: [IPython-dev] correct test-suite In-Reply-To: References: <20100718171412.42f4e970@earth> <20100726091850.218c47ad@earth> Message-ID: <20100727082512.62f54006@earth> Am Mon, 26 Jul 2010 19:38:21 -0700 schrieb Fernando Perez : > On Mon, Jul 26, 2010 at 12:18 AM, Thomas Spura > wrote: > > Here, they don't... That's why, I didn't look too closely to the > > failing tests in my branches. I'll try to fix the failures in > > current master on my side first, because it seems some other > > dependencies are doing something wrong I guess... > > > > If you can't find it, show me the tracebacks and I may be able to help > out. We want the test suite to degrade gracefully by skipping if > optional dependencies aren't met, not to fail. I can't find it right now... Some are failing, because of a deprecation warning in argparse (it seems ipython imports my local installed one, and not the bundled one). Updating the bundled one first and then fix the deprecation warning would help quite a lot. Here is the output: $ /usr/bin/python /usr/lib/python2.6/site-packages/IPython/testing/iptest.py IPython.core IPython.extensions EEEEE.EEEEEEEE.................EE........................E.E.........E...SE....EEEE...FEFE.E. 
====================================================================== ERROR: Failure: ImportError (cannot import name release) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 84, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/application.py", line 35, in from IPython.core import release, crashhandler ImportError: cannot import name release ====================================================================== ERROR: Failure: AttributeError ('module' object has no attribute 'utils') ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 84, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/completer.py", line 85, in import IPython.utils.rlineimpl as readline AttributeError: 'module' object has no attribute 'utils' ====================================================================== ERROR: Failure: ImportError (cannot import name ultratb) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 84, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/crashhandler.py", line 26, in from IPython.core import ultratb ImportError: cannot import name ultratb ====================================================================== ERROR: Failure: ImportError (cannot import name ipapi) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 84, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/debugger.py", line 33, in from IPython.core import ipapi ImportError: cannot import name ipapi ====================================================================== ERROR: Failure: ImportError (cannot import name ultratb) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File 
"/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 84, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/embed.py", line 32, in from IPython.core import ultratb ImportError: cannot import name ultratb ====================================================================== ERROR: Failure: ImportError (cannot import name ipapi) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 84, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/history.py", line 10, in from IPython.core import ipapi ImportError: cannot import name ipapi ====================================================================== ERROR: Failure: ImportError (cannot import name release) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 84, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/ipapp.py", line 31, in from IPython.core import release ImportError: cannot import name release ====================================================================== ERROR: Failure: ImportError (cannot import name oinspect) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 84, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/iplib.py", line 34, in from IPython.core import debugger, oinspect ImportError: cannot import name oinspect ====================================================================== ERROR: Failure: ImportError (cannot import name oinspect) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 84, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/magic.py", line 50, in from IPython.core import debugger, oinspect ImportError: 
cannot import name oinspect ====================================================================== ERROR: Failure: ImportError (cannot import name release) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 84, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/prompts.py", line 22, in from IPython.core import release ImportError: cannot import name release ====================================================================== ERROR: Failure: AttributeError ('NoneType' object has no attribute 'user_ns') ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/tests/refbug.py", line 28, in if not '_refbug_cache' in ip.user_ns: AttributeError: 'NoneType' object has no attribute 'user_ns' ====================================================================== ERROR: Failure: IndexError (list index out of range) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/plugins/manager.py", line 148, in generate for r in result: File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 668, in loadTestsFromModule extraglobs=self.extraglobs) File "/usr/lib64/python2.6/doctest.py", line 852, in find self._find(tests, obj, name, module, source_lines, globs, {}) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 128, in _find source_lines, globs, seen) File "/usr/lib64/python2.6/doctest.py", line 906, in _find globs, seen) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 128, in _find source_lines, globs, seen) File "/usr/lib64/python2.6/doctest.py", line 894, in _find test = self._get_test(obj, name, module, globs, source_lines) File "/usr/lib64/python2.6/doctest.py", line 978, in _get_test filename, lineno) File "/usr/lib64/python2.6/doctest.py", line 597, in get_doctest return DocTest(self.get_examples(string, name), globs, File "/usr/lib64/python2.6/doctest.py", line 611, in get_examples return [x for x in self.parse(string, name) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 450, in parse self._parse_example(m, name, lineno,ip2py) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 510, in _parse_example source = self.ip2py(source) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 382, in ip2py newline(_ip.prefilter(line,lnum>0)) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 439, in prefilter_lines out = self.prefilter_line(llines[0], continue_prompt) File 
"/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 380, in prefilter_line self.shell._last_input_line = line File "/usr/lib/python2.6/site-packages/IPython/utils/autoattr.py", line 129, in __get__ val = self.getter(obj) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 224, in shell klass='IPython.core.iplib.InteractiveShell')[0] IndexError: list index out of range ====================================================================== ERROR: Failure: AttributeError ('module' object has no attribute 'utils') ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in loadTestsFromName addr.filename, addr.module) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath return self.importFromDir(dir_path, fqname) File "/usr/lib/python2.6/site-packages/nose/importer.py", line 86, in importFromDir mod = load_module(part_fqname, fh, filename, desc) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/tests/test_completer.py", line 14, in from IPython.core import completer File "/home/tom/programming/repositories/github/ipython.git/IPython/core/completer.py", line 85, in import IPython.utils.rlineimpl as readline AttributeError: 'module' object has no attribute 'utils' ====================================================================== ERROR: IPython.core.tests.test_handlers.test_handlers ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/case.py", line 186, in runTest self.test(*self.arg) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/tests/test_handlers.py", line 74, in test_handlers ("top", 'get_ipython().system("d:/cygwin/top ")'), File "/home/tom/programming/repositories/github/ipython.git/IPython/core/tests/test_handlers.py", line 46, in run ip.runlines(pre) File "/usr/lib/python2.6/site-packages/IPython/core/iplib.py", line 2114, in runlines self.push_line('\n') File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__ self.gen.throw(type, value, traceback) File "/usr/lib64/python2.6/contextlib.py", line 113, in nested yield vars File "/usr/lib/python2.6/site-packages/IPython/core/iplib.py", line 2102, in runlines prefiltered = self.prefilter_manager.prefilter_lines(line,more) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 439, in prefilter_lines out = self.prefilter_line(llines[0], continue_prompt) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 380, in prefilter_line self.shell._last_input_line = line File "/usr/lib/python2.6/site-packages/IPython/utils/autoattr.py", line 129, in __get__ val = self.getter(obj) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 224, in shell klass='IPython.core.iplib.InteractiveShell')[0] IndexError: list index out of range ====================================================================== ERROR: IPython.core.tests.test_imports.test_import_completer ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/case.py", line 186, in runTest self.test(*self.arg) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/tests/test_imports.py", line 5, in test_import_completer from IPython.core import completer File 
"/home/tom/programming/repositories/github/ipython.git/IPython/core/completer.py", line 85, in import IPython.utils.rlineimpl as readline AttributeError: 'module' object has no attribute 'utils' ====================================================================== ERROR: IPython.core.tests.test_iplib.test_runlines ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/case.py", line 186, in runTest self.test(*self.arg) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/tests/test_iplib.py", line 250, in test_runlines ip.runlines(['a = 10', 'a+=1']) File "/usr/lib/python2.6/site-packages/IPython/core/iplib.py", line 2114, in runlines self.push_line('\n') File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__ self.gen.throw(type, value, traceback) File "/usr/lib64/python2.6/contextlib.py", line 113, in nested yield vars File "/usr/lib/python2.6/site-packages/IPython/core/iplib.py", line 2102, in runlines prefiltered = self.prefilter_manager.prefilter_lines(line,more) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 439, in prefilter_lines out = self.prefilter_line(llines[0], continue_prompt) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 380, in prefilter_line self.shell._last_input_line = line File "/usr/lib/python2.6/site-packages/IPython/utils/autoattr.py", line 129, in __get__ val = self.getter(obj) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 224, in shell klass='IPython.core.iplib.InteractiveShell')[0] IndexError: list index out of range ====================================================================== ERROR: Failure: IndexError (list index out of range) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/plugins/manager.py", line 148, in generate for r in result: File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 668, in loadTestsFromModule extraglobs=self.extraglobs) File "/usr/lib64/python2.6/doctest.py", line 852, in find self._find(tests, obj, name, module, source_lines, globs, {}) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 128, in _find source_lines, globs, seen) File "/usr/lib64/python2.6/doctest.py", line 906, in _find globs, seen) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 128, in _find source_lines, globs, seen) File "/usr/lib64/python2.6/doctest.py", line 894, in _find test = self._get_test(obj, name, module, globs, source_lines) File "/usr/lib64/python2.6/doctest.py", line 978, in _get_test filename, lineno) File "/usr/lib64/python2.6/doctest.py", line 597, in get_doctest return DocTest(self.get_examples(string, name), globs, File "/usr/lib64/python2.6/doctest.py", line 611, in get_examples return [x for x in self.parse(string, name) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 450, in parse self._parse_example(m, name, lineno,ip2py) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 510, in _parse_example source = self.ip2py(source) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 382, in ip2py newline(_ip.prefilter(line,lnum>0)) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 439, in prefilter_lines out = self.prefilter_line(llines[0], 
continue_prompt) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 380, in prefilter_line self.shell._last_input_line = line File "/usr/lib/python2.6/site-packages/IPython/utils/autoattr.py", line 129, in __get__ val = self.getter(obj) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 224, in shell klass='IPython.core.iplib.InteractiveShell')[0] IndexError: list index out of range ====================================================================== ERROR: Failure: InvalidAliasError (The name sum can't be aliased because it is a keyword or builtin.) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/loader.py", line 224, in generate for test in g(): File "/home/tom/programming/repositories/github/ipython.git/IPython/core/tests/test_magic.py", line 32, in test_rehashx _ip.magic('rehashx') File "/usr/lib/python2.6/site-packages/IPython/core/iplib.py", line 1688, in magic result = fn(magic_args) File "/usr/lib/python2.6/site-packages/IPython/core/magic.py", line 2773, in magic_rehashx ff.replace('.',''), ff) File "/usr/lib/python2.6/site-packages/IPython/core/alias.py", line 161, in define_alias nargs = self.validate_alias(name, cmd) File "/usr/lib/python2.6/site-packages/IPython/core/alias.py", line 172, in validate_alias "because it is a keyword or builtin." % name) InvalidAliasError: The name sum can't be aliased because it is a keyword or builtin. ====================================================================== ERROR: IPython.core.tests.test_magic.test_time ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/case.py", line 186, in runTest self.test(*self.arg) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/tests/test_magic.py", line 260, in test_time _ip.magic('time None') File "/usr/lib/python2.6/site-packages/IPython/core/iplib.py", line 1689, in magic return result File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__ self.gen.throw(type, value, traceback) File "/usr/lib64/python2.6/contextlib.py", line 113, in nested yield vars File "/usr/lib/python2.6/site-packages/IPython/core/iplib.py", line 1688, in magic result = fn(magic_args) File "/usr/lib/python2.6/site-packages/IPython/core/magic.py", line 2026, in magic_time expr = self.shell.prefilter(parameter_s,False) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 439, in prefilter_lines out = self.prefilter_line(llines[0], continue_prompt) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 380, in prefilter_line self.shell._last_input_line = line File "/usr/lib/python2.6/site-packages/IPython/utils/autoattr.py", line 129, in __get__ val = self.getter(obj) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 224, in shell klass='IPython.core.iplib.InteractiveShell')[0] IndexError: list index out of range ====================================================================== ERROR: Failure: IndexError (list index out of range) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/plugins/manager.py", line 148, in generate for r in result: File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 668, in loadTestsFromModule extraglobs=self.extraglobs) File 
"/usr/lib64/python2.6/doctest.py", line 852, in find self._find(tests, obj, name, module, source_lines, globs, {}) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 128, in _find source_lines, globs, seen) File "/usr/lib64/python2.6/doctest.py", line 906, in _find globs, seen) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 128, in _find source_lines, globs, seen) File "/usr/lib64/python2.6/doctest.py", line 894, in _find test = self._get_test(obj, name, module, globs, source_lines) File "/usr/lib64/python2.6/doctest.py", line 978, in _get_test filename, lineno) File "/usr/lib64/python2.6/doctest.py", line 597, in get_doctest return DocTest(self.get_examples(string, name), globs, File "/usr/lib64/python2.6/doctest.py", line 611, in get_examples return [x for x in self.parse(string, name) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 450, in parse self._parse_example(m, name, lineno,ip2py) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 510, in _parse_example source = self.ip2py(source) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 382, in ip2py newline(_ip.prefilter(line,lnum>0)) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 439, in prefilter_lines out = self.prefilter_line(llines[0], continue_prompt) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 380, in prefilter_line self.shell._last_input_line = line File "/usr/lib/python2.6/site-packages/IPython/utils/autoattr.py", line 129, in __get__ val = self.getter(obj) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 224, in shell klass='IPython.core.iplib.InteractiveShell')[0] IndexError: list index out of range ====================================================================== ERROR: See https://bugs.launchpad.net/ipython/+bug/315706 ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/IPython/testing/_paramtestpy2.py", line 53, in run_parametric testgen.next() File "/home/tom/programming/repositories/github/ipython.git/IPython/core/tests/test_prefilter.py", line 45, in test_autocall_binops yield nt.assert_equals(ip.prefilter('f 1'),'f(1)') File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 439, in prefilter_lines out = self.prefilter_line(llines[0], continue_prompt) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 380, in prefilter_line self.shell._last_input_line = line File "/usr/lib/python2.6/site-packages/IPython/utils/autoattr.py", line 129, in __get__ val = self.getter(obj) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 224, in shell klass='IPython.core.iplib.InteractiveShell')[0] IndexError: list index out of range -------------------- >> begin captured stdout << --------------------- Automatic calling is: Full Automatic calling is: OFF --------------------- >> end captured stdout << ---------------------- ====================================================================== ERROR: Check that multiline string literals don't expand as magic ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/IPython/testing/_paramtestpy2.py", line 53, in run_parametric testgen.next() File 
"/home/tom/programming/repositories/github/ipython.git/IPython/core/tests/test_prefilter.py", line 66, in test_issue114 yield nt.assert_equals(ip.prefilter(raw), raw) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 437, in prefilter_lines for lnum, line in enumerate(llines) ]) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 380, in prefilter_line self.shell._last_input_line = line File "/usr/lib/python2.6/site-packages/IPython/utils/autoattr.py", line 129, in __get__ val = self.getter(obj) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 224, in shell klass='IPython.core.iplib.InteractiveShell')[0] IndexError: list index out of range ====================================================================== ERROR: Test user input conversions ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/IPython/testing/_paramtestpy2.py", line 53, in run_parametric testgen.next() File "/home/tom/programming/repositories/github/ipython.git/IPython/core/tests/test_prefilter.py", line 36, in test_prefilter yield nt.assert_equals(ip.prefilter(raw), correct) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 439, in prefilter_lines out = self.prefilter_line(llines[0], continue_prompt) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 380, in prefilter_line self.shell._last_input_line = line File "/usr/lib/python2.6/site-packages/IPython/utils/autoattr.py", line 129, in __get__ val = self.getter(obj) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 224, in shell klass='IPython.core.iplib.InteractiveShell')[0] IndexError: list index out of range ====================================================================== ERROR: Test that simple class definitions work. 
---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/case.py", line 186, in runTest self.test(*self.arg) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/tests/test_run.py", line 137, in test_simpledef _ip.runlines('t = isinstance(f(), foo)') File "/usr/lib/python2.6/site-packages/IPython/core/iplib.py", line 2114, in runlines self.push_line('\n') File "/usr/lib64/python2.6/contextlib.py", line 34, in __exit__ self.gen.throw(type, value, traceback) File "/usr/lib64/python2.6/contextlib.py", line 113, in nested yield vars File "/usr/lib/python2.6/site-packages/IPython/core/iplib.py", line 2102, in runlines prefiltered = self.prefilter_manager.prefilter_lines(line,more) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 439, in prefilter_lines out = self.prefilter_line(llines[0], continue_prompt) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 380, in prefilter_line self.shell._last_input_line = line File "/usr/lib/python2.6/site-packages/IPython/utils/autoattr.py", line 129, in __get__ val = self.getter(obj) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 224, in shell klass='IPython.core.iplib.InteractiveShell')[0] IndexError: list index out of range ====================================================================== ERROR: Failure: IndexError (list index out of range) ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/plugins/manager.py", line 148, in generate for r in result: File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 668, in loadTestsFromModule extraglobs=self.extraglobs) File "/usr/lib64/python2.6/doctest.py", line 852, in find self._find(tests, obj, name, module, source_lines, globs, {}) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 128, in _find source_lines, globs, seen) File "/usr/lib64/python2.6/doctest.py", line 906, in _find globs, seen) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 128, in _find source_lines, globs, seen) File "/usr/lib64/python2.6/doctest.py", line 894, in _find test = self._get_test(obj, name, module, globs, source_lines) File "/usr/lib64/python2.6/doctest.py", line 978, in _get_test filename, lineno) File "/usr/lib64/python2.6/doctest.py", line 597, in get_doctest return DocTest(self.get_examples(string, name), globs, File "/usr/lib64/python2.6/doctest.py", line 611, in get_examples return [x for x in self.parse(string, name) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 450, in parse self._parse_example(m, name, lineno,ip2py) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 510, in _parse_example source = self.ip2py(source) File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 382, in ip2py newline(_ip.prefilter(line,lnum>0)) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 439, in prefilter_lines out = self.prefilter_line(llines[0], continue_prompt) File "/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 380, in prefilter_line self.shell._last_input_line = line File "/usr/lib/python2.6/site-packages/IPython/utils/autoattr.py", line 129, in __get__ val = self.getter(obj) File 
"/usr/lib/python2.6/site-packages/IPython/core/prefilter.py", line 224, in shell klass='IPython.core.iplib.InteractiveShell')[0] IndexError: list index out of range ====================================================================== ERROR: IPython.extensions.tests.test_pretty.TestPrettyInteractively.test_printers ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/case.py", line 186, in runTest self.test(*self.arg) File "/usr/lib/python2.6/site-packages/IPython/testing/decorators.py", line 225, in skipper_func return f(*args, **kwargs) File "/home/tom/programming/repositories/github/ipython.git/IPython/extensions/tests/test_pretty.py", line 101, in test_printers tt.ipexec_validate(self.fname, ipy_out) File "/usr/lib/python2.6/site-packages/IPython/testing/tools.py", line 250, in ipexec_validate (fname, err)) ValueError: Running file '/tmp/tmpv7BI33.ipy' produced error: "---------------------------------------------------------------------------\nAttributeError Traceback (most recent call last)\n\n/home/tom/programming/repositories/github/ipython.git/ in ()\n\nAttributeError: 'NoneType' object has no console> attribute 'for_type'" ====================================================================== FAIL: Test that object's __del__ methods are called on exit. ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/case.py", line 186, in runTest self.test(*self.arg) File "/usr/lib/python2.6/site-packages/IPython/testing/decorators.py", line 225, in skipper_func return f(*args, **kwargs) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/tests/test_run.py", line 155, in test_obj_del tt.ipexec_validate(self.fname, 'object A deleted') File "/usr/lib/python2.6/site-packages/IPython/testing/tools.py", line 252, in ipexec_validate nt.assert_equals(out.strip(), expected_out.strip()) AssertionError: '\x1b[?1034hobject A deleted' != 'object A deleted' >> raise self.failureException, \ (None or '%r != %r' % ('\x1b[?1034hobject A deleted', 'object A deleted')) ====================================================================== FAIL: IPython.core.tests.test_run.TestMagicRunSimple.test_tclass ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/case.py", line 186, in runTest self.test(*self.arg) File "/usr/lib/python2.6/site-packages/IPython/testing/decorators.py", line 225, in skipper_func return f(*args, **kwargs) File "/home/tom/programming/repositories/github/ipython.git/IPython/core/tests/test_run.py", line 169, in test_tclass tt.ipexec_validate(self.fname, out) File "/usr/lib/python2.6/site-packages/IPython/testing/tools.py", line 252, in ipexec_validate nt.assert_equals(out.strip(), expected_out.strip()) AssertionError: "\x1b[?1034hARGV 1-: ['C-first']\nARGV 1-: ['C-second']\ntclass.py: deleting object: C-first" != "ARGV 1-: ['C-first']\nARGV 1-: ['C-second']\ntclass.py: deleting object: C-first" >> raise self.failureException, \ (None or '%r != %r' % ("\x1b[?1034hARGV 1-: ['C-first']\nARGV 1-: ['C-second']\ntclass.py: deleting object: C-first", "ARGV 1-: ['C-first']\nARGV 1-: ['C-second']\ntclass.py: deleting object: C-first")) ---------------------------------------------------------------------- Ran 98 tests in 1.636s FAILED (SKIP=1, errors=26, failures=2) Do you see some quick 
fixes? Thomas From fperez.net at gmail.com Tue Jul 27 02:40:16 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Mon, 26 Jul 2010 23:40:16 -0700 Subject: [IPython-dev] correct test-suite In-Reply-To: <20100727082512.62f54006@earth> References: <20100718171412.42f4e970@earth> <20100726091850.218c47ad@earth> <20100727082512.62f54006@earth> Message-ID: Hi, On Mon, Jul 26, 2010 at 11:25 PM, Thomas Spura wrote: > Some are failing, because of a deprecation warning in argparse (it > seems ipython imports my local installed one, and not the bundled one). > Updating the bundled one first and then fix the deprecation warning > would help quite a lot. Yes, I'll try to update argparse to the version in 2.7 official, as it will move us in the direction of less bundled utilities. > Here is the output: > > $ /usr/bin/python /usr/lib/python2.6/site-packages/IPython/testing/iptest.py > IPython.core IPython.extensions > EEEEE.EEEEEEEE.................EE........................E.E.........E...SE....EEEE...FEFE.E. > ====================================================================== > ERROR: Failure: ImportError (cannot import name release) > ---------------------------------------------------------------------- > Traceback (most recent call last): File > "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in > loadTestsFromName addr.filename, addr.module) File > "/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in > importFromPath return self.importFromDir(dir_path, fqname) File > "/usr/lib/python2.6/site-packages/nose/importer.py", line 84, in > importFromDir mod = load_module(part_fqname, fh, filename, desc) File > "/home/tom/programming/repositories/github/ipython.git/IPython/core/application.py", > line 35, in from IPython.core import release, crashhandler > ImportError: cannot import name release > > ====================================================================== > ERROR: Failure: AttributeError ('module' object has no attribute > 'utils') > ---------------------------------------------------------------------- > Traceback (most recent call last): File > "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in > loadTestsFromName addr.filename, addr.module) File > "/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in > importFromPath return self.importFromDir(dir_path, fqname) File > "/usr/lib/python2.6/site-packages/nose/importer.py", line 84, in > importFromDir mod = load_module(part_fqname, fh, filename, desc) File > "/home/tom/programming/repositories/github/ipython.git/IPython/core/completer.py", > line 85, in import IPython.utils.rlineimpl as readline > AttributeError: 'module' object has no attribute 'utils' > > ====================================================================== > ERROR: Failure: ImportError (cannot import name ultratb) > ---------------------------------------------------------------------- > Traceback (most recent call last): > ?File "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in > loadTestsFromName addr.filename, addr.module) > ?File "/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in > importFromPath return self.importFromDir(dir_path, fqname) > ?File "/usr/lib/python2.6/site-packages/nose/importer.py", line 84, in > importFromDir mod = load_module(part_fqname, fh, filename, desc) > ?File > "/home/tom/programming/repositories/github/ipython.git/IPython/core/crashhandler.py", > line 26, in from IPython.core import ultratb ImportError: > cannot import name ultratb > ... 
All of this seems to indicate that you are somehow mixing an old, 0.10.x tree with the tests for the current code. In 0.11.x we reorganized the big code dump from the original ipython into a more rational structure, but your tests can't seem to find *any* imports. That seems to indicate that you are finding on your PYTHONPATH the old code first. Try this and let us know what you get (from a plain python shell): >>> import IPython >>> print IPython.__version__ 0.11.alpha1.git >>> print IPython.__file__ /home/fperez/usr/lib/python2.6/site-packages/IPython/__init__.pyc Also, looking at which one of these imports works will be a good tell-tale sign (from a normal python shell). If you have the real 0.11 tree, you should get: >>> import IPython.core.ultratb >>> import IPython.ultraTB Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: No module named ultraTB whereas a 0.10.x tree gives the opposite: >>> import IPython.ultraTB >>> import IPython.core.ultratb Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: No module named core.ultratb Cheers, f From tomspur at fedoraproject.org Tue Jul 27 02:49:40 2010 From: tomspur at fedoraproject.org (Thomas Spura) Date: Tue, 27 Jul 2010 08:49:40 +0200 Subject: [IPython-dev] correct test-suite In-Reply-To: References: <20100718171412.42f4e970@earth> <20100726091850.218c47ad@earth> <20100727082512.62f54006@earth> Message-ID: <20100727084940.32d8849d@earth> On Mon, 26 Jul 2010 23:40:16 -0700, Fernando Perez wrote: > ... > > All of this seems to indicate that you are somehow mixing an old, > 0.10.x tree with the tests for the current code. In 0.11.x we > reorganized the big code dump from the original ipython into a more > rational structure, but your tests can't seem to find *any* imports. > That seems to indicate that you are finding on your PYTHONPATH the old > code first. Sorry... Nope... > > Try this and let us know what you get (from a plain python shell): > > >>> import IPython > >>> print IPython.__version__ > 0.11.alpha1.git > >>> print IPython.__file__ > /home/fperez/usr/lib/python2.6/site-packages/IPython/__init__.pyc My output (when checking out my_random_stuff branch with the SHA1 commit): $ ipython Python 2.6.4 (r264:75706, Jun 4 2010, 18:20:31) Type "copyright", "credits" or "license" for more information.
>>> import IPython.core.ultratb >>> import IPython.ultraTB Traceback (most recent call last): File "<stdin>", line 1, in <module> ImportError: No module named ultraTB -> 0.11 tree Thomas From erik.tollerud at gmail.com Tue Jul 27 05:31:11 2010 From: erik.tollerud at gmail.com (Erik Tollerud) Date: Tue, 27 Jul 2010 02:31:11 -0700 Subject: [IPython-dev] Practices for .10 or .11 profile formats In-Reply-To: References: Message-ID: Hi Fernando, > Barring any unforeseen problems, we expect the 0.11 system for > profiles to remain compatible from now on. Good to know - the .11 system is a lot nicer than the previous ones anyway. > We have a plan to make it > easier for new projects to provide IPython profiles in *their own > tree*, but the syntax would be backwards-compatible. Whereas now you > say > > ipython -p profname > > we'd like to allow also (optionally, of course): > > ipython -p project:profname Ooh, that's a neat idea - for now my plan was to include a script in my project that would just bootstrap ipython (similar to how sympy does it) depending on which (if any) version of IPython is found. But the scheme you have in mind would be much more elegant. > That's a bug, plain and simple, sorry :) For actual code, instead of > exec_lines, I use this: > > c.Global.exec_files = ['extras.py'] I didn't realize that didn't increment the In[#] counter. Definitely good to know that option is available, but I decided that if it was a bug I should go hunting... Trouble is, despite spending quite a bit of time rooting around in IPython.core, I can't seem to figure out where the input and output caches get populated and their counters incremented... It would be possible, presumably, to run it like exec_files does for regular py files and not use the ipython filtering and such, but that really limits the usefulness of the profile... So is there some option somewhere that can temporarily turn off the in/out caching (and presumably that will also prevent the counter from incrementing)? And if not, is there some obvious spot I missed where they get incremented that I could try to figure out how it could be patched to prevent this behavior? -- Erik Tollerud From dsdale24 at gmail.com Tue Jul 27 08:05:06 2010 From: dsdale24 at gmail.com (Darren Dale) Date: Tue, 27 Jul 2010 08:05:06 -0400 Subject: [IPython-dev] Trunk in 100% test compliance In-Reply-To: References: Message-ID: On Sun, Jul 25, 2010 at 11:27 PM, Fernando Perez wrote: > Hi folks, > > we had a few lingering test errors here and there, and with all the > renewed activity in the project, that seemed like a fairly unsafe way > to proceed. We really want everyone to be able to *always* run the > *full* test suite and only make pull requests when the suite passes > completely. Having failing tests in the way makes it much more likely > that new code will be added with more failures, so hopefully this is a > useful checkpoint to start from. [...] > If anyone sees a different result on their system, please do let us > know and we'll hopefully be able to fix it.
I just fetched the master branch, and when I try to run "python setup.py install" I get: error: package directory 'IPython/frontend/tests' does not exist IPython/frontend contains only an empty __init__.py, but setupbase.py is still doing: add_package(packages, 'frontend', tests=True) # Don't include the cocoa frontend for now as it is not stable if sys.platform == 'darwin' and False: add_package(packages, 'frontend.cocoa', tests=True, others=['plugin']) add_package(packages, 'frontend.cocoa.examples') add_package(packages, 'frontend.cocoa.examples.IPython1Sandbox') add_package(packages, 'frontend.cocoa.examples.IPython1Sandbox.English.$ add_package(packages, 'frontend.process') add_package(packages, 'frontend.wx') Cheers, Darren From ellisonbg at gmail.com Tue Jul 27 11:07:33 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Tue, 27 Jul 2010 08:07:33 -0700 Subject: [IPython-dev] Trunk in 100% test compliance In-Reply-To: References: Message-ID: Oops, When we fix this we will need to remove the IPython/gui and IPython/frontend references in setup.py and MANIFEST as well. Brian On Tue, Jul 27, 2010 at 5:05 AM, Darren Dale wrote: > On Sun, Jul 25, 2010 at 11:27 PM, Fernando Perez > wrote: > > Hi folks, > > > > we had a few lingering test errors here and there, and with all the > > renewed activity in the project, that seemed like a fairly unsafe way > > to proceed. We really want everyone to be able to *always* run the > > *full* test suite and only make pull requests when the suite passes > > completely. Having failing tests in the way makes it much more likely > > that new code will be added with more failures, so hopefully this is a > > useful checkpoint to start from. > [...] > > If anyone sees a different result on their system, please do let us > > know and we'll hopefully be able to fix it. > > I just fetched the master branch, and when I try to run "python > setup.py install" I get: > > error: package directory 'IPython/frontend/tests' does not exist > > IPython/frontend contains only an empty __init__.py, but setupbase.py > is still doing: > > add_package(packages, 'frontend', tests=True) > # Don't include the cocoa frontend for now as it is not stable > if sys.platform == 'darwin' and False: > add_package(packages, 'frontend.cocoa', tests=True, > others=['plugin']) > add_package(packages, 'frontend.cocoa.examples') > add_package(packages, 'frontend.cocoa.examples.IPython1Sandbox') > add_package(packages, > 'frontend.cocoa.examples.IPython1Sandbox.English.$ > add_package(packages, 'frontend.process') > add_package(packages, 'frontend.wx') > > Cheers, > Darren > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Tue Jul 27 14:14:55 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Tue, 27 Jul 2010 11:14:55 -0700 Subject: [IPython-dev] Buffers In-Reply-To: References: Message-ID: On Mon, Jul 26, 2010 at 9:13 PM, Min RK wrote: > > > On Jul 26, 2010, at 19:33, Fernando Perez wrote: > > > Glad you worked it out, but I'm worried about one thing: I don't > > believe you can send unicode strings as a buffer over the wire. 
The > > reason is that the two interpreters at both ends of the connection > > could have been compiled with different internal unicode encodings. > > Python can be compiled to store unicode internally either as UCS-2 or > > UCS-4, you can check sys.maxunicode to find out how your particular > > build was made: > > > > http://www.python.org/dev/peps/pep-0100/ > > http://www.python.org/dev/peps/pep-0261/ > > > > If you send a unicode string as a buffer from a ucs2 python to a ucs4 > > one, you'll get a mess at the other end, I think. > > I'm not sure that is our concern. If buffers are being sent, then it's not > zmq whose job it is to interpret that buffer, it's the user's receiving > code. > > zmq only sends bytes, and for most objects, unicode included, that's what > the buffer interface is. But sometimes a unicode object is really just a > simple str in a unicode package, and when that's the case we interpret it as > a string. Otherwise it's treated like all other objects - a black box that > provides a buffer interface. It's up to the user sending the data to send it > in a form that they can understand on the other side. > > Yes, I hadn't thought about the fact that unicode objects are buffers as well. But, we could raise a TypeError when a user tries to send a unicode object (str in python 3). IOW, don't treat unicode as buffers and force them to encode/decode. Does this make sense or should we allow unicode to be sent as buffers? Brian > > Minor note: > > On Mon, Jul 26, 2010 at 6:43 PM, Min RK wrote: >> isinstance(s,(str,unicode)) > > > > this is equiv. to: isinstance(s, basestring) > > > > ah, thanks, I hadn't seen that one. I'll use it. > > > > > > > Cheers, > > > > f > > your points have further clarified that I was mistaken to attempt to > support unicode strings. We support basic strings and raw buffers. When > faced with a unicode object, we effectively (but not literally) do: > try: send(str(u)) > except: send(buffer(u)) -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Tue Jul 27 14:25:36 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 27 Jul 2010 11:25:36 -0700 Subject: [IPython-dev] Trunk in 100% test compliance In-Reply-To: References: Message-ID: Hey Brian, On Tue, Jul 27, 2010 at 8:07 AM, Brian Granger wrote: > > When we fix this we will need to remove the IPython/gui and IPython/frontend > references in setup.py and MANIFEST as well. when you did the cleanup of dead code in trunk, where did you plan to have the new frontends live? I figured we might have kept the top-level frontend/ directory, or did you have a new location for that? There was some unmaintained code there, but the new work by Evan, Gerardo and Omar was also going there, so we should find a good plan for that, so they can merge from trunk if they need to... Cheers, f From fperez.net at gmail.com Tue Jul 27 14:34:17 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 27 Jul 2010 11:34:17 -0700 Subject: [IPython-dev] Buffers In-Reply-To: References: Message-ID: On Tue, Jul 27, 2010 at 11:14 AM, Brian Granger wrote: > > Yes, I hadn't thought about the fact that unicode objects are buffers as > well. But, we could raise a TypeError when a user tries to send a unicode > object (str in python 3). IOW, don't treat unicode as buffers and force > them to encode/decode.
Does this make sense or should we allow unicode to > be sent as buffers? Well, the problem I explained about a possible mismatch in internal unicode storage format rears its ugly head if we allow unicode-as-buffer. I was precisely worried about sending 3.x strings as buffers, since the two ends may not agree on what the buffer means. I may be worrying about a non-problem, but at some point it might be worth verifying this. The test is a bit cumbersome to set up, because you have to build two versions of Python, one with ucs-2 and one with ucs-4, and see what happens if they try to send each other stuff. But I think it's a test worth making, so we know for sure whether this is a problem or not, as it will dictate design decisions for 3.x on all string handling. If it is a problem, then there are some options: - disallow communication between ucs 2/4 pythons. - detect a mismatch and encode/decode all unicode strings to utf-8 on send/receive, but allow raw buffer sending if there's no mismatch. - *always* encode/decode. The middle option seems appealing because it avoids the overhead of encoding/decoding on all sends, but I'm worried it may be too brittle. Cheers, f From fperez.net at gmail.com Tue Jul 27 14:37:33 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 27 Jul 2010 11:37:33 -0700 Subject: [IPython-dev] Practices for .10 or .11 profile formats In-Reply-To: References: Message-ID: Hi Erik, On Tue, Jul 27, 2010 at 2:31 AM, Erik Tollerud wrote: > >> That's a bug, plain and simple, sorry :) For actual code, instead of >> exec_lines, I use this: >> >> c.Global.exec_files = ['extras.py'] > > I didn't realize that didn't increment the In[#] counter. Definitely > good to know that option is available, but I decided that if it was a > bug I should go hunting... > > Trouble is, despite spending quite a bit of time rooting around in > IPython.core, I can't seem to figure out where the input and output > caches get populated and their counters incremented... It would be > possible, presumably, to run it like exec_files does for regular py > files and not use the ipython filtering and such, but that really > limits the usefulness of the profile... So is there some option somewhere > that can temporarily turn off the in/out caching (and presumably > that will also prevent the counter from incrementing)? And if not, is > there some obvious spot I missed where they get incremented that I > could try to figure out how it could be patched to prevent this > behavior? I wouldn't bother if I were you: that code is a horrible mess, and the re-work that we're doing right now will clean a lot of that up. The old code has coupling all over the map for prompt handling, and we're trying to clean that as well. If you're really curious, the code is in core/prompts.py, and the object in the main ipython that handles it is get_ipython().outputcache. So grepping around for that guy may help, but as I said, I'd let it go for now and live with using exec_files, until we finish up the housecleaning :) Cheers, f From ellisonbg at gmail.com Tue Jul 27 15:21:16 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Tue, 27 Jul 2010 12:21:16 -0700 Subject: [IPython-dev] Trunk in 100% test compliance In-Reply-To: References: Message-ID: On Tue, Jul 27, 2010 at 11:25 AM, Fernando Perez wrote: > Hey Brian, > > On Tue, Jul 27, 2010 at 8:07 AM, Brian Granger > wrote: > > > > When we fix this we will need to remove the IPython/gui and > IPython/frontend > > references in setup.py and MANIFEST as well.
> > when you did the cleanup of dead code in trunk, where did you plan to > have the new frontends live? I figured we might have kept the > top-level frontend/ directory, or did you have a new location for > that? There was some unmaintained code there, but the new work by > Evan, Gerardo and Omar was also going there, so we should find a good > plan for that, so they can merge from trunk if they need to... > > I did this yesterday and it is in trunk now: http://github.com/ipython/ipython/commit/595fc3b996f891ecc1a1996c598d15e47e6aac67 But I did leave the top-level frontend directory with the qt subdirectory in place. Basically, it is organized like you expect. In my previous email, when I said IPython/frontend, I meant "the appropriate things that used to be in IPython/frontend". But yes, all the new stuff should still go into frontend as expected. Cheers, Brian > Cheers, > > f -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From ellisonbg at gmail.com Tue Jul 27 15:23:37 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Tue, 27 Jul 2010 12:23:37 -0700 Subject: [IPython-dev] Buffers In-Reply-To: References: Message-ID: On Tue, Jul 27, 2010 at 11:34 AM, Fernando Perez wrote: > On Tue, Jul 27, 2010 at 11:14 AM, Brian Granger > wrote: > > > > Yes, I hadn't thought about the fact that unicode objects are buffers as > > well. But, we could raise a TypeError when a user tries to send a > unicode > > object (str in python 3). IOW, don't treat unicode as buffers and force > > them to encode/decode. Does this make sense or should we allow unicode > to > > be sent as buffers? > > Well, the problem I explained about a possible mismatch in internal > unicode storage format rears its ugly head if we allow > unicode-as-buffer. I was precisely worried about sending 3.x strings > as buffers, since the two ends may not agree on what the buffer means. > I may be worrying about a non-problem, but at some point it might be > worth verifying this. The test is a bit cumbersome to set up, because > you have to build two versions of Python, one with ucs-2 and one with > ucs-4, and see what happens if they try to send each other stuff. But > I think it's a test worth making, so we know for sure whether this is > a problem or not, as it will dictate design decisions for 3.x on all > string handling. > > This is definitely an issue. Also, someone could set their own custom unicode encoding by hand and that would mess this up as well. > If it is a problem, then there are some options: > - disallow communication between ucs 2/4 pythons. > But this doesn't account for other encoding/decoding setups. > - detect a mismatch and encode/decode all unicode strings to utf-8 on > send/receive, but allow raw buffer sending if there's no mismatch. > This will be tough though if users set their own encoding. > - *always* encode/decode. > > I think this is the option that I prefer (having users do this in their application code). > The middle option seems appealing because it avoids the overhead of > encoding/decoding on all sends, but I'm worried it may be too brittle. > > Brian > Cheers, > > f -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed...
URL: From tomspur at fedoraproject.org Tue Jul 27 15:27:26 2010 From: tomspur at fedoraproject.org (Thomas Spura) Date: Tue, 27 Jul 2010 21:27:26 +0200 Subject: [IPython-dev] correct test-suite In-Reply-To: References: <20100718171412.42f4e970@earth> <20100726091850.218c47ad@earth> Message-ID: <20100727212726.6a988639@earth> Am Mon, 26 Jul 2010 19:38:21 -0700 schrieb Fernando Perez : > On Mon, Jul 26, 2010 at 12:18 AM, Thomas Spura > wrote: > > Here, they don't... That's why, I didn't look too closely to the > > failing tests in my branches. I'll try to fix the failures in > > current master on my side first, because it seems some other > > dependencies are doing something wrong I guess... > > > > If you can't find it, show me the tracebacks and I may be able to help > out. We want the test suite to degrade gracefully by skipping if > optional dependencies aren't met, not to fail. Now there are some less failures than before: $ /usr/bin/python /usr/lib/python2.6/site-packages/IPython/testing/iptest.py IPython.core IPython.extensions ......F.................................................................S......>f(1) ...F........F.F...E. ====================================================================== ERROR: IPython.extensions.tests.test_pretty.TestPrettyInteractively.test_printers ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/case.py", line 186, in runTest self.test(*self.arg) File "/usr/lib/python2.6/site-packages/IPython/testing/decorators.py", line 225, in skipper_func return f(*args, **kwargs) File "/usr/lib/python2.6/site-packages/IPython/extensions/tests/test_pretty.py", line 101, in test_printers tt.ipexec_validate(self.fname, ipy_out) File "/usr/lib/python2.6/site-packages/IPython/testing/tools.py", line 250, in ipexec_validate (fname, err)) ValueError: Running file '/tmp/tmpEtePU6.ipy' produced error: "---------------------------------------------------------------------------\nAttributeError Traceback (most recent call last)\n\n/home/tom/bin/ in ()\n\nAttributeError: 'NoneType' object has no attribute 'for_type'" ====================================================================== FAIL: Doctest: IPython.core.magic.Magic.magic_reset_selective ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/IPython/testing/plugin/ipdoctest.py", line 265, in runTest raise self.failureException(self.format_failure(new.getvalue())) AssertionError: Failed doctest test for IPython.core.magic.Magic.magic_reset_selective File "/usr/lib/python2.6/site-packages/IPython/core/magic.py", line 1115, in magic_reset_selective ---------------------------------------------------------------------- File "/usr/lib/python2.6/site-packages/IPython/core/magic.py", line 1132, in IPython.core.magic.Magic.magic_reset_selective Failed example: get_ipython().magic("who_ls ") Expected: ['a', 'b', 'b1', 'b1m', 'b2m', 'b2s', 'b3m', 'b4m', 'c'] Got: ['Bunch', 'ESC_MAGIC', 'FakeModule', 'GetoptError', 'IPython', 'LSString', 'Macro', 'Magic', 'SList', 'StringIO', 'StringTypes', 'Struct', 'Term', 'TryNext', 'UsageError', 'a', 'abbrev_cwd', 'arg_split', 'b', 'b1m', 'b2m', 'b2s', 'b3m', 'b4m', 'bdb', 'c', 'clock', 'clock2', 'compress_dhist', 'debugger', 'enable_gui', 'error', 'file_read', 'get_py_filename', 'getopt', 'inspect', 'itpl', 'mpl_runner', 'nlprint', 'oinspect', 'on_off', 'os', 'page', 'pformat', 'printpl', 'profile', 
'pstats', 're', 'set_term_title', 'shutil', 'sys', 'testdec', 'textwrap', 'time', 'types', 'warn'] ---------------------------------------------------------------------- File "/usr/lib/python2.6/site-packages/IPython/core/magic.py", line 1137, in IPython.core.magic.Magic.magic_reset_selective Failed example: get_ipython().magic("who_ls ") Expected: ['a', 'b', 'b1', 'b1m', 'b2s', 'c'] Got: ['Bunch', 'ESC_MAGIC', 'FakeModule', 'GetoptError', 'IPython', 'LSString', 'Macro', 'Magic', 'SList', 'StringIO', 'StringTypes', 'Struct', 'Term', 'TryNext', 'UsageError', 'a', 'abbrev_cwd', 'arg_split', 'b', 'b1m', 'b2s', 'b4m', 'bdb', 'c', 'clock', 'clock2', 'compress_dhist', 'debugger', 'enable_gui', 'error', 'file_read', 'get_py_filename', 'getopt', 'inspect', 'itpl', 'mpl_runner', 'nlprint', 'oinspect', 'on_off', 'os', 'page', 'pformat', 'printpl', 'profile', 'pstats', 're', 'set_term_title', 'shutil', 'sys', 'testdec', 'textwrap', 'time', 'types', 'warn'] ---------------------------------------------------------------------- File "/usr/lib/python2.6/site-packages/IPython/core/magic.py", line 1142, in IPython.core.magic.Magic.magic_reset_selective Failed example: get_ipython().magic("who_ls ") Expected: ['a', 'b', 'b1', 'b1m', 'b2s', 'c'] Got: ['Bunch', 'ESC_MAGIC', 'GetoptError', 'IPython', 'LSString', 'Macro', 'Magic', 'SList', 'StringIO', 'StringTypes', 'Struct', 'Term', 'TryNext', 'UsageError', 'a', 'arg_split', 'b', 'b1m', 'b2s', 'b4m', 'c', 'clock', 'clock2', 'enable_gui', 'error', 'get_py_filename', 'getopt', 'inspect', 'itpl', 'mpl_runner', 'nlprint', 'oinspect', 'on_off', 'os', 'page', 'pformat', 'printpl', 'profile', 'pstats', 're', 'set_term_title', 'shutil', 'sys', 'textwrap', 'time', 'types', 'warn'] ---------------------------------------------------------------------- File "/usr/lib/python2.6/site-packages/IPython/core/magic.py", line 1147, in IPython.core.magic.Magic.magic_reset_selective Failed example: get_ipython().magic("who_ls ") Expected: Out[8]:['a', 'b', 'b1', 'b1m', 'b2s'] Got: ['ESC_MAGIC', 'GetoptError', 'IPython', 'LSString', 'SList', 'StringIO', 'StringTypes', 'Term', 'TryNext', 'UsageError', 'a', 'arg_split', 'b', 'b1m', 'b2s', 'b4m', 'enable_gui', 'error', 'get_py_filename', 'getopt', 'itpl', 'mpl_runner', 'nlprint', 'on_off', 'os', 'page', 'pformat', 'printpl', 'profile', 'pstats', 're', 'set_term_title', 'shutil', 'sys', 'textwrap', 'time', 'types', 'warn'] ---------------------------------------------------------------------- File "/usr/lib/python2.6/site-packages/IPython/core/magic.py", line 1152, in IPython.core.magic.Magic.magic_reset_selective Failed example: get_ipython().magic("who_ls ") Expected: ['a'] Got: ['ESC_MAGIC', 'GetoptError', 'IPython', 'LSString', 'SList', 'StringIO', 'StringTypes', 'Term', 'TryNext', 'UsageError', 'a', 'arg_split', 'error', 'get_py_filename', 'getopt', 'itpl', 'mpl_runner', 'nlprint', 'on_off', 'os', 'page', 'pformat', 'printpl', 'profile', 'pstats', 're', 'set_term_title', 'shutil', 'sys', 'textwrap', 'time', 'types', 'warn'] >> raise self.failureException(self.format_failure(> instance at 0x48941b8>.getvalue())) ====================================================================== FAIL: Check that multiline string literals don't expand as magic ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/IPython/testing/_paramtestpy2.py", line 53, in run_parametric testgen.next() File 
"/usr/lib/python2.6/site-packages/IPython/core/tests/test_prefilter.py", line 59, in test_issue114 yield nt.assert_equals(ip.prefilter(raw), raw) AssertionError: '"""\nget_ipython().magic("Exit ")\n"""' != '"""\nExit\n"""' >> raise self.failureException, \ (None or '%r != %r' % ('"""\nget_ipython().magic("Exit ")\n"""', '"""\nExit\n"""')) ====================================================================== FAIL: Test that object's __del__ methods are called on exit. ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/case.py", line 186, in runTest self.test(*self.arg) File "/usr/lib/python2.6/site-packages/IPython/testing/decorators.py", line 225, in skipper_func return f(*args, **kwargs) File "/usr/lib/python2.6/site-packages/IPython/core/tests/test_run.py", line 155, in test_obj_del tt.ipexec_validate(self.fname, 'object A deleted') File "/usr/lib/python2.6/site-packages/IPython/testing/tools.py", line 252, in ipexec_validate nt.assert_equals(out.strip(), expected_out.strip()) AssertionError: '\x1b[?1034hobject A deleted' != 'object A deleted' >> raise self.failureException, \ (None or '%r != %r' % ('\x1b[?1034hobject A deleted', 'object A deleted')) ====================================================================== FAIL: IPython.core.tests.test_run.TestMagicRunSimple.test_tclass ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/case.py", line 186, in runTest self.test(*self.arg) File "/usr/lib/python2.6/site-packages/IPython/testing/decorators.py", line 225, in skipper_func return f(*args, **kwargs) File "/usr/lib/python2.6/site-packages/IPython/core/tests/test_run.py", line 169, in test_tclass tt.ipexec_validate(self.fname, out) File "/usr/lib/python2.6/site-packages/IPython/testing/tools.py", line 252, in ipexec_validate nt.assert_equals(out.strip(), expected_out.strip()) AssertionError: "\x1b[?1034hARGV 1-: ['C-first']\nARGV 1-: ['C-second']\ntclass.py: deleting object: C-first" != "ARGV 1-: ['C-first']\nARGV 1-: ['C-second']\ntclass.py: deleting object: C-first" >> raise self.failureException, \ (None or '%r != %r' % ("\x1b[?1034hARGV 1-: ['C-first']\nARGV 1-: ['C-second']\ntclass.py: deleting object: C-first", "ARGV 1-: ['C-first']\nARGV 1-: ['C-second']\ntclass.py: deleting object: C-first")) ---------------------------------------------------------------------- Ran 104 tests in 1.916s FAILED (SKIP=1, errors=1, failures=4) But don't know, what is really expected here right now... Thomas From fperez.net at gmail.com Tue Jul 27 16:13:36 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 27 Jul 2010 13:13:36 -0700 Subject: [IPython-dev] Buffers In-Reply-To: References: Message-ID: On Tue, Jul 27, 2010 at 12:23 PM, Brian Granger wrote: > This is definitely an issue. ?Also, someone could set their own custom > unicode encoding by hand and that would mess this up as well. > >> >> If it is a problem, then there are some options: >> >> - disallow communication between ucs 2/4 pythons. > > But this doesn't account for other encoding/decoding setups. Note that when I mention ucs2/4, that refers to the *internal* python storage of all unicode objects. That is: ucs2/4 is how the buffer, under the hood for a unicode string, is written in memory. There are no other encoding/decoding setups for Python, this is strictly a compile-time flag and can only be either ucs2 or ucs4. 
You can see the value by typing: In [1]: sys.maxunicode Out[1]: 1114111 That's ucs-4, and that number covers the whole of the current unicode standard. If you get 65535 (2**16 - 1) instead, it means you have a ucs2 build, and python can only encode strings in the BMP (basic multilingual plane, where all living languages are stored but not math symbols, musical symbols and some extended Asian characters). Does that make sense? Note that additionally, it's exceedingly rare for anyone to set up a custom encoding for unicode. It's hard to do right, requires plumbing in the codecs module, and I think Python supports out of the box enough encodings that I can't imagine why anyone would write a new encoding. But regardless, if a string has been encoded then it's OK: now it's bytes, and there's no problem. >> - detect a mismatch and encode/decode all unicode strings to utf-8 on >> send/receive, but allow raw buffer sending if there's no mismatch. > > This will be tough though if users set their own encoding. No, the issue with users having something other than utf-8 is orthogonal to this. The idea would be: - if both ends of the transmission have conflicting ucs internals, then all unicode strings are sent as utf-8. If a user sends an encoded string, then that's just a bunch of bytes and it doesn't matter how they encoded it, since they will be responsible for decoding it on the other end. But I still don't like this approach because the ucs2/4 mismatch is a pair-wise problem, and for a multi-node setup managing this pair-wise switching of protocols can be a nightmare. And let's not even get started on what pub/sub sockets would do with this... >> - *always* encode/decode. >> > > I think this is the option that I prefer (having users do this in their > application code). Yes, now that I think of pub/sub sockets, I don't think we have a choice. It's a bit unfortunate that Python recently decided *not* to standardize on a storage scheme: http://mail.python.org/pipermail/python-dev/2008-July/080886.html because it means forever paying the price of encoding/decoding in this context. Cheers, f ps - as you can tell, I've been finally doing my homework on unicode, in preparation for an eventual 3.x transition :) From fperez.net at gmail.com Tue Jul 27 16:21:15 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 27 Jul 2010 13:21:15 -0700 Subject: [IPython-dev] Trunk in 100% test compliance In-Reply-To: References: Message-ID: On Tue, Jul 27, 2010 at 12:21 PM, Brian Granger wrote: > I did this yesterday and it is in trunk now: > http://github.com/ipython/ipython/commit/595fc3b996f891ecc1a1996c598d15e47e6aac67 > But I did leave the top-level frontend directory with the qt subdirectory in > place. Basically, it is organized like you expect. In my previous email, > when I said IPython/frontend, I meant "the appropriate things that used > to be in IPython/frontend". But yes, all the new stuff should still go > into frontend as expected. Ah, in fact the qt subdir is not there, because git won't keep empty directories. So if you do a full clean when you switch to trunk git clean -dfx You'll see qt/ gone from frontend. Since I saw qt/ gone, I got confused. But that's OK, when Evan/Gerardo add qt back it will be fine. Sorry for the misunderstanding.
Cheers, f From ellisonbg at gmail.com Tue Jul 27 16:49:12 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Tue, 27 Jul 2010 13:49:12 -0700 Subject: [IPython-dev] Trunk in 100% test compliance In-Reply-To: References: Message-ID: On Tue, Jul 27, 2010 at 1:21 PM, Fernando Perez wrote: > On Tue, Jul 27, 2010 at 12:21 PM, Brian Granger > wrote: > > > I did this yesterday and it is in trunk now: > > > http://github.com/ipython/ipython/commit/595fc3b996f891ecc1a1996c598d15e47e6aac67 > > But I did leave the top-level frontend directory with the qt subdirectory > in > > place. Basically, it is organized like you expect. In my previous > email, > > when I said IPython/frontend, I more mean "the appropriate things that > used > > to be in in IPython/frontend". But yes, all the new stuff should still > go > > into frontend as expected. > > Ah, in fact the qt subdir is not there, because git won't keep empty > directories. So if you do a full clean when you switch to trunk > > OK, that makes more sense. > git clean -dfx > > You'll see qt/ gone from frontend. Since I saw qt/ gone, I got > confused. But that's OK, when Evan/Gerardo add qt back it will be > fine. > > Great. > Sorry for the misunderstanding. > > No problem. > Cheers, > > f > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Tue Jul 27 16:55:56 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 27 Jul 2010 13:55:56 -0700 Subject: [IPython-dev] Trunk in 100% test compliance In-Reply-To: References: Message-ID: Hey Darren, On Tue, Jul 27, 2010 at 5:05 AM, Darren Dale wrote: > > I just fetched the master branch, and when I try to run "python > setup.py install" I get: > > error: package directory 'IPython/frontend/tests' does not exist > I've just fixed those problems, sorry about that. There was a big cleanup of dead code (kept in deathrow so it's easy to get anything from there back at any point, without needing to fish in git history) and some things were accidentally still referred to from the setup/tests. It should be good now, at least it works on my box from a real install, please let us know if you still see a problem. Cheers, f From fperez.net at gmail.com Tue Jul 27 16:56:30 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 27 Jul 2010 13:56:30 -0700 Subject: [IPython-dev] correct test-suite In-Reply-To: <20100727212726.6a988639@earth> References: <20100718171412.42f4e970@earth> <20100726091850.218c47ad@earth> <20100727212726.6a988639@earth> Message-ID: On Tue, Jul 27, 2010 at 12:27 PM, Thomas Spura wrote: >> On Mon, Jul 26, 2010 at 12:18 AM, Thomas Spura >> wrote: >> > Here, they don't... That's why, I didn't look too closely to the >> > failing tests in my branches. I'll try to fix the failures in >> > current master on my side first, because it seems some other >> > dependencies are doing something wrong I guess... >> > >> >> If you can't find it, show me the tracebacks and I may be able to help >> out. ?We want the test suite to degrade gracefully by skipping if >> optional dependencies aren't met, not to fail. 
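Concretely, "degrading gracefully" just means guarding such tests with
something along the lines of the sketch below (the decorator name is made up
for illustration; the real helpers live in IPython.testing.decorators):

    from nose.plugins.skip import SkipTest

    def skip_without(modname):
        # sketch of a guard for an optional dependency; not the actual API
        def decorator(f):
            def wrapper(*args, **kwargs):
                try:
                    __import__(modname)
                except ImportError:
                    raise SkipTest("%s not available" % modname)
                return f(*args, **kwargs)
            wrapper.__name__ = f.__name__
            return wrapper
        return decorator

    @skip_without('zmq')
    def test_needs_zmq():
        import zmq
        assert zmq.zmq_version()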
> > Now there are some less failures than before: Just in case anyone is seeing similar problems, we're tracking this on IRC, we'll report back with any solution f From benjaminrk at gmail.com Tue Jul 27 18:16:04 2010 From: benjaminrk at gmail.com (MinRK) Date: Tue, 27 Jul 2010 15:16:04 -0700 Subject: [IPython-dev] Buffers In-Reply-To: References: Message-ID: Okay, so it sounds like we should never interpret unicode objects as simple strings, if I am understanding the arguments correctly. I certainly don't think that sending anything providing the buffer interface should raise an exception, though. It should be up to the user to know whether the buffer will be legible on the other side. The situation I'm concerned about is that json gives you unicode strings, whether that was the input or not. s1 = 'word' j = json.dumps(s1) s2 = json.loads(j) # u'word' Now, if you have that logic internally, and you are sending messages based on messages you received, unless you wrap _every single thing_ you pass to send in str(), then you are calling things like send(u'word'). I really don't think that should not raise an error, and trunk surely does. The other options are either to always interpret unicode objects like everything else, always sending by its buffer, trusting that the receiving end will call decode (which may require that the message be copied at least one extra time). This would also mean that if A sends something packed by json to B, B unpacks it, and it included a str to be sent to C, then B has a unicode wrapped version of it (not a str). If B then sends it on to C, C will get a string that will _not_ be the same as the one A packed and sent to B. I think this is terrible, since it seems like such an obvious (already done) fix in zmq. I think that the vast majority of the time you are faced with unicode strings, they are in fact simple str instances that got wrapped, and we should expect that and deal with it. I decided to run some tests, since I currently have a UCS2 (OSX 10.6.4) and UCS4 (ubuntu 10.04) machine They are running my `patches' zmq branch right now, and I'm having no problems. case 1: sys.defaultencoding = utf8 on mac, ascii on ubuntu. a.send(u'who') # valid ascii, valid utf-8, ascii string sent b.recv() # 'who' u=u'who?' # u'who\xcf\x80' a.send(u'who?') # valid ascii, valid utf-8, utf string sent b.recv().decode('utf-8') # u'who\xcf\x80' case 2: sys.defaultencoding = ascii,ascii a.send(u'who') # valid ascii, string sent b.recv() # 'who' u=u'who?' u # u'who\xcf\x80' a.send(u'who?') # invalid ascii, buffer sent s = b.recv() # 'w\x00h\x00o\x00\xcf\x00\x80\x00' s.decode('utf-8') # UnicodeError (invalid utf-8) s.decode('utf16') # u'who\xcf\x80' It seems that the _buffer_ of a unicode object is always utf16 I also did it with utf-8 on both sides, and threw in some latin-1, and there was no difference between those and case 1. I can't find the problem here. As far as I can tell, a unicode object is: a) a valid string for the sender, and the string is sent in the sender's default encoding on the receiver: sock.recv().decode(sender.defaultcodec) gets the object back b) not a valid string for the sender, and the utf16 buffer is sent on the receiver: sock.recv().decode('utf16') always seems to work I even tried various instances of specifying the encoding as latin, etc. and sending math symbols (?,?) in various directions, and invariably the only thing I needed to know on the receiver was the default encoding on the sender. 
Everything was reconstructed properly with either s.decode(sender.defaultcodec) or s.decode(utf16), depending solely on whether str(u) would raise on the sender. Are there specific symbols and/or directions where I should see a problem? Based on reading, I figured that math symbols would if anything, but they certainly don't in either direction. -MinRK On Tue, Jul 27, 2010 at 13:13, Fernando Perez wrote: > On Tue, Jul 27, 2010 at 12:23 PM, Brian Granger > wrote: > > This is definitely an issue. Also, someone could set their own custom > > unicode encoding by hand and that would mess this up as well. > > > >> > >> If it is a problem, then there are some options: > >> > >> - disallow communication between ucs 2/4 pythons. > > > > But this doesn't account for other encoding/decoding setups. > > Note that when I mention ucs2/4, that refers to the *internal* python > storage of all unicode objects. That is: ucs2/4 is how the buffer, > under the hood for a unicode string, is written in memory. There are > no other encoding/decoding setups for Python, this is strictly a > compile-time flag and can only be either ucs2 or ucs4. > > You can see the value by typing: > > In [1]: sys.maxunicode > Out[1]: 1114111 > > That's ucs-4, and that number is the whole of the current unicode > standard. If you get instead 2^16, it means you have a ucs2 build, > and python can only encode strings in the BMP (basic multilingual > plane, where all living languages are stored but not math symbols, > musical symbols and some extended Asian characters). > > Does that make sense? > > Note that additionally, it's exceedingly rare for anyone to set up a > custom encoding for unicode. It's hard to do right, requires plumbing > in the codecs module, and I think Python supports out of the box > enough encodings that I can't imagine why anyone would write a new > encoding. But regardless, if a string has been encoded then it's OK: > now it's bytes, and there's no problem. > > >> - detect a mismatch and encode/decode all unicode strings to utf-8 on > >> send/receive, but allow raw buffer sending if there's no mismatch. > > > > This will be tough though if users set their own encoding. > > No, the issue with users having something other than utf-8 is > orthogonal to this. The idea would be: - if both ends of the > transmission have conflicting ucs internals, then all unicode strings > are sent as utf-8. If a user sends an encoded string, then that's > just a bunch of bytes and it doesn't matter how they encoded it, since > they will be responsible for decoding it on the other end. > > But I still don't like this approach because the ucs2/4 mismatch is a > pair-wise problem, and for a multi-node setup managing this pair-wise > switching of protocols can be a nightmare. And let's not even get > started on what pub/sub sockets would do with this... > > >> - *always* encode/decode. > >> > > > > I think this is the option that I prefer (having users to this in their > > application code). > > Yes, now that I think of pub/sub sockets, I don't think we have a > choice. It's a bit unfortunate that Python recently decided *not* to > standardize on a storage scheme: > > http://mail.python.org/pipermail/python-dev/2008-July/080886.html > > because it means forever paying the price of encoding/decoding in this > context. > > Cheers, > > f > > ps - as you can tell, I've been finally doing my homework on unicode, > in preparation for an eventual 3.x transition :) > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ellisonbg at gmail.com Tue Jul 27 18:22:44 2010 From: ellisonbg at gmail.com (Brian Granger) Date: Tue, 27 Jul 2010 15:22:44 -0700 Subject: [IPython-dev] Buffers In-Reply-To: References: Message-ID: Do you guys want to chat about this on IRC? On Tue, Jul 27, 2010 at 3:16 PM, MinRK wrote: > Okay, so it sounds like we should never interpret unicode objects as simple > strings, if I am understanding the arguments correctly. > > I certainly don't think that sending anything providing the buffer > interface should raise an exception, though. It should be up to the user to > know whether the buffer will be legible on the other side. > > The situation I'm concerned about is that json gives you unicode strings, > whether that was the input or not. > s1 = 'word' > j = json.dumps(s1) > s2 = json.loads(j) > # u'word' > > Now, if you have that logic internally, and you are sending messages based > on messages you received, unless you wrap _every single thing_ you pass to > send in str(), then you are calling things like send(u'word'). I really > don't think that should not raise an error, and trunk surely does. > > The other options are either to always interpret unicode objects like > everything else, always sending by its buffer, trusting that the receiving > end will call decode (which may require that the message be copied at least > one extra time). This would also mean that if A sends something packed by > json to B, B unpacks it, and it included a str to be sent to C, then B has a > unicode wrapped version of it (not a str). If B then sends it on to C, C > will get a string that will _not_ be the same as the one A packed and sent > to B. I think this is terrible, since it seems like such an obvious (already > done) fix in zmq. > > I think that the vast majority of the time you are faced with unicode > strings, they are in fact simple str instances that got wrapped, and we > should expect that and deal with it. > > I decided to run some tests, since I currently have a UCS2 (OSX 10.6.4) and > UCS4 (ubuntu 10.04) machine > They are running my `patches' zmq branch right now, and I'm having no > problems. > > case 1: sys.defaultencoding = utf8 on mac, ascii on ubuntu. > a.send(u'who') # valid ascii, valid utf-8, ascii string sent > b.recv() > # 'who' > > u=u'who?' > # u'who\xcf\x80' > > a.send(u'who?') # valid ascii, valid utf-8, utf string sent > b.recv().decode('utf-8') > # u'who\xcf\x80' > > case 2: sys.defaultencoding = ascii,ascii > a.send(u'who') # valid ascii, string sent > b.recv() > # 'who' > > u=u'who?' > u > # u'who\xcf\x80' > > a.send(u'who?') # invalid ascii, buffer sent > s = b.recv() > # 'w\x00h\x00o\x00\xcf\x00\x80\x00' > s.decode('utf-8') > # UnicodeError (invalid utf-8) > s.decode('utf16') > # u'who\xcf\x80' > > > It seems that the _buffer_ of a unicode object is always utf16 > > I also did it with utf-8 on both sides, and threw in some latin-1, and > there was no difference between those and case 1. > > I can't find the problem here. > > As far as I can tell, a unicode object is: > a) a valid string for the sender, and the string is sent in the sender's > default encoding > on the receiver: > sock.recv().decode(sender.defaultcodec) > gets the object back > b) not a valid string for the sender, and the utf16 buffer is sent > on the receiver: > sock.recv().decode('utf16') > always seems to work > > I even tried various instances of specifying the encoding as latin, etc. > and sending math symbols (?,?) 
in various directions, and invariably the > only thing I needed to know on the receiver was the default encoding on the > sender. Everything was reconstructed properly with either > s.decode(sender.defaultcodec) or s.decode(utf16), depending solely on > whether str(u) would raise on the sender. > > Are there specific symbols and/or directions where I should see a problem? > Based on reading, I figured that math symbols would if anything, but they > certainly don't in either direction. > > -MinRK > > > On Tue, Jul 27, 2010 at 13:13, Fernando Perez wrote: > >> On Tue, Jul 27, 2010 at 12:23 PM, Brian Granger >> wrote: >> > This is definitely an issue. Also, someone could set their own custom >> > unicode encoding by hand and that would mess this up as well. >> > >> >> >> >> If it is a problem, then there are some options: >> >> >> >> - disallow communication between ucs 2/4 pythons. >> > >> > But this doesn't account for other encoding/decoding setups. >> >> Note that when I mention ucs2/4, that refers to the *internal* python >> storage of all unicode objects. That is: ucs2/4 is how the buffer, >> under the hood for a unicode string, is written in memory. There are >> no other encoding/decoding setups for Python, this is strictly a >> compile-time flag and can only be either ucs2 or ucs4. >> >> You can see the value by typing: >> >> In [1]: sys.maxunicode >> Out[1]: 1114111 >> >> That's ucs-4, and that number is the whole of the current unicode >> standard. If you get instead 2^16, it means you have a ucs2 build, >> and python can only encode strings in the BMP (basic multilingual >> plane, where all living languages are stored but not math symbols, >> musical symbols and some extended Asian characters). >> >> Does that make sense? >> >> Note that additionally, it's exceedingly rare for anyone to set up a >> custom encoding for unicode. It's hard to do right, requires plumbing >> in the codecs module, and I think Python supports out of the box >> enough encodings that I can't imagine why anyone would write a new >> encoding. But regardless, if a string has been encoded then it's OK: >> now it's bytes, and there's no problem. >> >> >> - detect a mismatch and encode/decode all unicode strings to utf-8 on >> >> send/receive, but allow raw buffer sending if there's no mismatch. >> > >> > This will be tough though if users set their own encoding. >> >> No, the issue with users having something other than utf-8 is >> orthogonal to this. The idea would be: - if both ends of the >> transmission have conflicting ucs internals, then all unicode strings >> are sent as utf-8. If a user sends an encoded string, then that's >> just a bunch of bytes and it doesn't matter how they encoded it, since >> they will be responsible for decoding it on the other end. >> >> But I still don't like this approach because the ucs2/4 mismatch is a >> pair-wise problem, and for a multi-node setup managing this pair-wise >> switching of protocols can be a nightmare. And let's not even get >> started on what pub/sub sockets would do with this... >> >> >> - *always* encode/decode. >> >> >> > >> > I think this is the option that I prefer (having users to this in their >> > application code). >> >> Yes, now that I think of pub/sub sockets, I don't think we have a >> choice. It's a bit unfortunate that Python recently decided *not* to >> standardize on a storage scheme: >> >> http://mail.python.org/pipermail/python-dev/2008-July/080886.html >> >> because it means forever paying the price of encoding/decoding in this >> context. 
>> >> Cheers, >> >> f >> >> ps - as you can tell, I've been finally doing my homework on unicode, >> in preparation for an eventual 3.x transition :) >> > > -- Brian E. Granger, Ph.D. Assistant Professor of Physics Cal Poly State University, San Luis Obispo bgranger at calpoly.edu ellisonbg at gmail.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From andresete.chaos at gmail.com Wed Jul 28 00:03:35 2010 From: andresete.chaos at gmail.com (=?UTF-8?Q?Omar_Andr=C3=A9s_Zapata_Mesa?=) Date: Tue, 27 Jul 2010 23:03:35 -0500 Subject: [IPython-dev] kernel proxy Message-ID: Hi guys!! I want to know where is the newest code writed of kernelproxy? tnk! -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Wed Jul 28 00:11:38 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 27 Jul 2010 21:11:38 -0700 Subject: [IPython-dev] kernel proxy In-Reply-To: References: Message-ID: Hi Omar, 2010/7/27 Omar Andr?s Zapata Mesa : > I want to know > where is the newest code writed of kernelproxy? > tnk! Brian is on IRC, you can ask him there what the status of his kernel work is so you know where best to start from. Cheers, f From erik.tollerud at gmail.com Wed Jul 28 01:58:03 2010 From: erik.tollerud at gmail.com (Erik Tollerud) Date: Tue, 27 Jul 2010 22:58:03 -0700 Subject: [IPython-dev] Practices for .10 or .11 profile formats In-Reply-To: References: Message-ID: Ah, I didn't realize the current overhaul was digging into the core as well. Having tried (and failed, due to confusion) to hack on that code a couple other times, I'm happy to hear that. I'll just be satisfying with voting for the issue on the tracker. Thanks for the detailed responses! On Tue, Jul 27, 2010 at 11:37 AM, Fernando Perez wrote: > Hi Erik, > > On Tue, Jul 27, 2010 at 2:31 AM, Erik Tollerud wrote: >> >>> That's a bug, plain and simple, sorry :) ?For actual code, instead of >>> exec_lines, I use this: >>> >>> c.Global.exec_files = ['extras.py'] >> >> I didn't realize that didn't increment the In[#] counter. Definitely >> good to know that option is available, but I decided that if it was a >> bug I should go hunting... >> >> Trouble is, despite spending quite a bit of time rooting around in the >> IPython.core, I can't seem to figure out where the input and output >> cache's get populated and their counters incremented... It would be >> possible, presumably, to run it like exec_files does for regular py >> files and not use the ipython filtering and such, but that really >> limits the usefulness of the profile... So is there some option some >> where that can temporarily turn off the in/out caching (and presumably >> that will also prevent the counter from incrementing)? And if not, is >> there some obvious spot I missed where they get incremented that I >> could try to figure out how it could be patched to prevent this >> behavior? > > I wouldn't bother if I were you: that code is a horrible mess, and the > re-work that we're doing right now will clean a lot of that up. ?The > old code has coupling all over the map for prompt handling, and we're > trying to clean that as well. ?If you're really curious, the code is > in core/prompts.py, and the object in the main ipython that handles it > is get_ipython().outputcache. 
So grepping around for that guy may > help, but as I said, I'd let it go for now and live with using > exec_files, until we finish up the housecleaning :) > > Cheers, > > f > -- Erik Tollerud
From andresete.chaos at gmail.com Wed Jul 28 13:48:18 2010 From: andresete.chaos at gmail.com (=?UTF-8?Q?Omar_Andr=C3=A9s_Zapata_Mesa?=) Date: Wed, 28 Jul 2010 12:48:18 -0500 Subject: [IPython-dev] pyzmq and kernelmanager. Message-ID: Hi all. I am trying to work with the kernelmanager, but it needs "from zmq.eventloop import ioloop", so I pulled pyzmq's code and tried to compile it again, but it shows me this message:
zmq/_zmq.c: In the function 'init_zmq':
zmq/_zmq.c:10242: error: 'EMTHREAD'
O.
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From benjaminrk at gmail.com Wed Jul 28 14:35:21 2010 From: benjaminrk at gmail.com (MinRK) Date: Wed, 28 Jul 2010 11:35:21 -0700 Subject: [IPython-dev] pyzmq and kernelmanager. In-Reply-To: References: Message-ID: You probably have zeromq trunk, which I think doesn't work yet. You need 2.0.7 from here: http://www.zeromq.org/area:download (you are far from the first to have this problem) -MinRK
2010/7/28 Omar Andrés Zapata Mesa > Hi all. > I am trying to work with the kernelmanager, but it needs "from zmq.eventloop import ioloop", so I pulled pyzmq's code and tried to compile it again, but it shows me this message: > > zmq/_zmq.c: In the function 'init_zmq': > zmq/_zmq.c:10242: error: 'EMTHREAD' > > > O. > > _______________________________________________ > IPython-dev mailing list > IPython-dev at scipy.org > http://mail.scipy.org/mailman/listinfo/ipython-dev > >
-------------- next part -------------- An HTML attachment was scrubbed... URL:
From tomspur at fedoraproject.org Thu Jul 29 04:54:46 2010 From: tomspur at fedoraproject.org (Thomas Spura) Date: Thu, 29 Jul 2010 10:54:46 +0200 Subject: [IPython-dev] correct test-suite In-Reply-To: References: <20100718171412.42f4e970@earth> <20100726091850.218c47ad@earth> Message-ID: <20100729105446.522921bc@earth> Am Mon, 26 Jul 2010 19:38:21 -0700 schrieb Fernando Perez : > On Mon, Jul 26, 2010 at 12:18 AM, Thomas Spura > wrote: > > Here, they don't... That's why, I didn't look too closely to the > > failing tests in my branches. I'll try to fix the failures in > > current master on my side first, because it seems some other > > dependencies are doing something wrong I guess... > > > > If you can't find it, show me the tracebacks and I may be able to help > out. We want the test suite to degrade gracefully by skipping if > optional dependencies aren't met, not to fail. I wrote a little script that creates an ipython*.xz, so I can build a random snapshot of ipython as an rpm package and install it properly to run iptest on it.
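In essence the snapshot step is just a couple of subprocess calls, roughly
like this (a sketch only, not the actual script; the file name and prefix are
invented):

    import subprocess

    # pack the current git HEAD into an xz tarball (names are invented)
    with open('ipython-snapshot.tar.xz', 'wb') as out:
        git = subprocess.Popen(['git', 'archive', '--format=tar',
                                '--prefix=ipython/', 'HEAD'],
                               stdout=subprocess.PIPE)
        xz = subprocess.Popen(['xz', '-c'], stdin=git.stdout, stdout=out)
        git.stdout.close()
        xz.wait()
        git.wait()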
Now the failures are down to 2, hopefully, you see, why they are failing: $ /usr/bin/python /usr/lib/python2.6/site-packages/IPython/testing/iptest.py IPython.core >f2("a b c") >f1("a", "b", "c") >f1(1,2,3) >f2(4) ..............................Out[85]: 'get_ipython().system("true ")\n' Out[87]: 'get_ipython().system("d:/cygwin/top ")\n' Out[88]: 'no change' Out[89]: '"no change"\n' Out[91]: 'get_ipython().system("true")\n' Out[92]: Out[93]: 'get_ipython().magic("sx true")\n' Out[94]: Out[95]: 'get_ipython().magic("sx true")\n' Out[97]: 'get_ipython().magic("lsmagic ")\n' Out[99]: 'get_ipython().magic("lsmagic ")\n' Out[101]: 'get_ipython().system(" true")\n' Out[103]: 'x=1 # what?\n' File "", line 2 !true ^ SyntaxError: invalid syntax Out[105]: 'if 1:\n !true\n' Out[107]: 'if 1:\n lsmagic\n' Out[109]: 'if 1:\n an_alias\n' Out[111]: 'if 1:\n get_ipython().system("true")\n' Out[113]: 'if 2:\n get_ipython().magic("lsmagic ")\n' Out[115]: 'if 1:\n get_ipython().system("true ")\n' Out[116]: Out[117]: 'if 1:\n get_ipython().magic("sx true")\n' File "", line 2 /fun 1 2 ^ SyntaxError: invalid syntax Out[119]: 'if 1:\n /fun 1 2\n' File "", line 2 ;fun 1 2 ^ SyntaxError: invalid syntax Out[121]: 'if 1:\n ;fun 1 2\n' File "", line 2 ,fun 1 2 ^ SyntaxError: invalid syntax Out[123]: 'if 1:\n ,fun 1 2\n' File "", line 2 ?fun 1 2 ^ SyntaxError: invalid syntax Out[125]: 'if 1:\n ?fun 1 2\n' File "", line 1 len "abc" ^ SyntaxError: invalid syntax Out[127]: 'len "abc"\n' >autocallable() Out[128]: 'called' Out[129]: 'autocallable()\n' >list("1", "2", "3") Out[131]: 'list("1", "2", "3")\n' >list("1 2 3") Out[132]: ['1', ' ', '2', ' ', '3'] Out[133]: 'list("1 2 3")\n' >len(range(1,4)) Out[134]: 3 Out[135]: 'len(range(1,4))\n' >list("1", "2", "3") Out[137]: 'list("1", "2", "3")\n' >list("1 2 3") Out[138]: ['1', ' ', '2', ' ', '3'] Out[139]: 'list("1 2 3")\n' >len(range(1,4)) Out[140]: 3 Out[141]: 'len(range(1,4))\n' >len("abc") Out[142]: 3 Out[143]: 'len("abc")\n' >len("abc"); Out[145]: 'len("abc");\n' >len([1,2]) Out[146]: 2 Out[147]: 'len([1,2])\n' Out[148]: True Out[149]: 'call_idx [1]\n' >call_idx(1) Out[150]: True Out[151]: 'call_idx(1)\n' Out[152]: Out[153]: 'len \n' >list("1", "2", "3") Out[155]: 'list("1", "2", "3")\n' >list("1 2 3") Out[156]: ['1', ' ', '2', ' ', '3'] Out[157]: 'list("1 2 3")\n' >len(range(1,4)) Out[158]: 3 Out[159]: 'len(range(1,4))\n' >len("abc") Out[160]: 3 Out[161]: 'len("abc")\n' >len("abc"); Out[163]: 'len("abc");\n' >len([1,2]) Out[164]: 2 Out[165]: 'len([1,2])\n' Out[166]: True Out[167]: 'call_idx [1]\n' >call_idx(1) Out[168]: True Out[169]: 'call_idx(1)\n' >len() Out[171]: 'len()\n' ..............................................S.......# print "bar" # ..>f(1) ....................................................................................F.F.. ====================================================================== FAIL: Test that object's __del__ methods are called on exit. 
---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/case.py", line 186, in runTest self.test(*self.arg) File "/usr/lib/python2.6/site-packages/IPython/testing/decorators.py", line 225, in skipper_func return f(*args, **kwargs) File "/usr/lib/python2.6/site-packages/IPython/core/tests/test_run.py", line 155, in test_obj_del tt.ipexec_validate(self.fname, 'object A deleted') File "/usr/lib/python2.6/site-packages/IPython/testing/tools.py", line 252, in ipexec_validate nt.assert_equals(out.strip(), expected_out.strip()) AssertionError: '\x1b[?1034hobject A deleted' != 'object A deleted' >> raise self.failureException, \ (None or '%r != %r' % ('\x1b[?1034hobject A deleted', 'object A deleted')) ====================================================================== FAIL: IPython.core.tests.test_run.TestMagicRunSimple.test_tclass ---------------------------------------------------------------------- Traceback (most recent call last): File "/usr/lib/python2.6/site-packages/nose/case.py", line 186, in runTest self.test(*self.arg) File "/usr/lib/python2.6/site-packages/IPython/testing/decorators.py", line 225, in skipper_func return f(*args, **kwargs) File "/usr/lib/python2.6/site-packages/IPython/core/tests/test_run.py", line 169, in test_tclass tt.ipexec_validate(self.fname, out) File "/usr/lib/python2.6/site-packages/IPython/testing/tools.py", line 252, in ipexec_validate nt.assert_equals(out.strip(), expected_out.strip()) AssertionError: "\x1b[?1034hARGV 1-: ['C-first']\nARGV 1-: ['C-second']\ntclass.py: deleting object: C-first" != "ARGV 1-: ['C-first']\nARGV 1-: ['C-second']\ntclass.py: deleting object: C-first" >> raise self.failureException, \ (None or '%r != %r' % ("\x1b[?1034hARGV 1-: ['C-first']\nARGV 1-: ['C-second']\ntclass.py: deleting object: C-first", "ARGV 1-: ['C-first']\nARGV 1-: ['C-second']\ntclass.py: deleting object: C-first")) ---------------------------------------------------------------------- Ran 180 tests in 1.735s FAILED (SKIP=1, failures=2) Thomas From vano at mail.mipt.ru Fri Jul 30 12:48:50 2010 From: vano at mail.mipt.ru (Ivan Pozdeev) Date: Fri, 30 Jul 2010 20:48:50 +0400 Subject: [IPython-dev] %run -d is broken in Python 2.7 In-Reply-To: References: <1213230248.20100713052612@mail.mipt.ru> Message-ID: <149512827.20100730204850@mail.mipt.ru> Good news: the bug in pdb is fixed! http://bugs.python.org/issue9230 > 2010/7/12 vano : >> After thorough investigation, it turned out a pdb issue (details are >> on the link), so i filed a bug there (http://bugs.python.org/issue9230) as >> well as a bugfix. >> >> If any of you have write access to python source, you can help me to get >> it fixed quickly. > Ouch, thanks for finding this and providing the pdb patch. > Unfortunately I don't have write access to Python itself (I have > 2-year old patches lingering in the python tracker, I'm afraid). > If you can make a (most likely ugly) monkeypatch at runtime to fix > this from the IPython side, we'll include that. There's a good chance > this will take forever to fix in Python itself, so carrying our own > version-checked ugly fix is better than having broken functionality > for 2.7 users. > I imagine that grabbing the pdb instance and injecting a frame object > into it will do the trick, from looking at your traceback. 
> If you make such a fix, just post a pull request for us or a patch, > as you prefer: > http://ipython.scipy.org/doc/nightly/html/development/gitwash/index.html > and we'll be happy to include it. > Cheers, > f -- Cheers, Ivan mailto:vano at mail.mipt.ru From fperez.net at gmail.com Sat Jul 31 19:40:28 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Sat, 31 Jul 2010 16:40:28 -0700 Subject: [IPython-dev] correct test-suite In-Reply-To: <20100729105446.522921bc@earth> References: <20100718171412.42f4e970@earth> <20100726091850.218c47ad@earth> <20100729105446.522921bc@earth> Message-ID: Great, this is much better On Thu, Jul 29, 2010 at 1:54 AM, Thomas Spura wrote: > I wrote a little script, that creates a ipython*.xz, so I can build a > random snapshot from ipython as a rpm package and install it properly > to run iptest on it. > > Now the failures are down to 2, hopefully, you see, why they are > failing: These are 'trivial' failures, for some reason on your system the output is garbled, but in fact the test is running OK. Could you please open a ticket with your OS details and this failure? Though it's not serious, I'd like to have it fixed nonetheless so we don't get these false positives. But in practice you are OK, as all tests are really passing and these are failures of the test detection, not of the underlying condition being tested. Cheers, f From benjaminrk at gmail.com Thu Jul 22 05:22:46 2010 From: benjaminrk at gmail.com (MinRK) Date: Thu, 22 Jul 2010 02:22:46 -0700 Subject: [IPython-dev] First Performance Result Message-ID: I have the basic queue built into the controller, and a kernel embedded into the Engine, enough to make a simple performance test. I submitted 32k simple execute requests in a row (round robin to engines, explicit multiplexing), then timed the receipt of the results (tic each 1k). I did it once with 2 engines, once with 32. (still on a 2-core machine, all over tcp on loopback). Messages went out at an average of 5400 msgs/s, and the results came back at ~900 msgs/s. So that's 32k jobs submitted in 5.85s, and the last job completed and returned its result 43.24s after the submission of the first one (37.30s for 32 engines). On average, a message is sent and received every 1.25 ms. When sending very small number of requests (1-10) in this way to just one engine, it gets closer to 1.75 ms round trip. In all, it seems to be a good order of magnitude quicker than the Twisted implementation for these small messages. Identifying the cost of json for small messages: Outgoing messages go at 9500/s if I use cPickle for serialization instead of json. Round trip to 1 engine for 32k messages: 35s. Round trip to 1 engine for 32k messages with json: 53s. It would appear that json is contributing 50% to the overall run time. With %timeit x.loads(x.dumps(msg)) on a basic message, I find that json is ~15x slower than cPickle. And by these crude estimates, with json, we spend about 35% of our time serializing, as opposed to just 2.5% with pickle. I attached a bar plot of the average replies per second over each 1000 msg block, overlaying numbers for 2 engines and for 32. I did the same comparing pickle and json for 1 and 2 engines. The messages are small, but a tiny amount of work is done in the kernel. 
The jobs were submitted like this:

    for i in xrange(32e3/len(engines)):
        for eid, key in engines.iteritems():
            thesession.send(queue, "execute_request",
                            dict(code='id=%i' % (int(eid)+i)),
                            ident=str(key))

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bar.png Type: image/png Size: 29718 bytes Desc: not available
URL: 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: pickle.png Type: image/png Size: 51352 bytes Desc: not available
URL:
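For reference, the json-vs-cPickle comparison quoted above boils down to a
check along these lines (a rough sketch; msg is only a stand-in for one of the
small test messages):

    import json, cPickle, timeit

    msg = {'header': {'msg_id': 0, 'session': 'abc', 'username': 'bench'},
           'msg_type': 'execute_request',
           'parent_header': {},
           'content': {'code': 'id=0'}}

    for name, mod in [('json', json), ('cPickle', cPickle)]:
        t = timeit.Timer(lambda: mod.loads(mod.dumps(msg)))
        print name, min(t.repeat(3, 10000))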