[IPython-dev] client connection in shared filesystem for a farm of machines

MinRK benjaminrk at gmail.com
Wed Jun 22 16:40:03 EDT 2011


Sorry, a recent pull request broke the Controller. It should be fixed now.

On Wed, Jun 22, 2011 at 13:22, Johann Cohen-Tanugi
<johann.cohen-tanugi at lupm.univ-montp2.fr> wrote:
> Hmm, I just did a "git pull", but now when starting the controller I get:
> [IPClusterStart] [IPEngineApp] Using existing profile dir: '/u/ec/cohen/.config/ipython/profile_default'
> [IPClusterStart] ERROR:root:Error in periodic callback
> [IPClusterStart] Traceback (most recent call last):
> [IPClusterStart]   File "/afs/slac/g/glast/users/cohen/IPYDEV/local/lib/python2.6/site-packages/zmq/eventloop/ioloop.py", line 432, in _run
> [IPClusterStart]     self.callback()
> [IPClusterStart]   File "/a/wain006/g.glast.u54/cohen/IPYDEV/ipython/IPython/parallel/controller/heartmonitor.py", line 112, in beat
> [IPClusterStart]     map(self.handle_new_heart, newhearts)
> [IPClusterStart]   File "/a/wain006/g.glast.u54/cohen/IPYDEV/ipython/IPython/parallel/controller/heartmonitor.py", line 123, in handle_new_heart
> [IPClusterStart]     handler(heart)
> [IPClusterStart]   File "/a/wain006/g.glast.u54/cohen/IPYDEV/ipython/IPython/parallel/controller/hub.py", line 530, in handle_new_heart
> [IPClusterStart]     self.finish_registration(heart)
> [IPClusterStart]   File "/a/wain006/g.glast.u54/cohen/IPYDEV/ipython/IPython/parallel/controller/hub.py", line 977, in finish_registration
> [IPClusterStart]     control=control, heartbeat=heart)
> [IPClusterStart]   File "/a/wain006/g.glast.u54/cohen/IPYDEV/ipython/IPython/utils/traitlets.py", line 420, in __init__
> [IPClusterStart]     setattr(self, key, value)
> [IPClusterStart]   File "/a/wain006/g.glast.u54/cohen/IPYDEV/ipython/IPython/utils/traitlets.py", line 302, in __set__
> [IPClusterStart]     new_value = self._validate(obj, value)
> [IPClusterStart]   File "/a/wain006/g.glast.u54/cohen/IPYDEV/ipython/IPython/utils/traitlets.py", line 310, in _validate
> [IPClusterStart]     return self.validate(obj, value)
> [IPClusterStart]   File "/a/wain006/g.glast.u54/cohen/IPYDEV/ipython/IPython/utils/traitlets.py", line 967, in validate
> [IPClusterStart]     self.error(obj, value)
> [IPClusterStart]   File "/a/wain006/g.glast.u54/cohen/IPYDEV/ipython/IPython/utils/traitlets.py", line 333, in error
> [IPClusterStart]     raise TraitError(e)
> [IPClusterStart] TraitError: The 'control' trait of an EngineConnector instance must be a string, but a value of u'218df526-7c37-4bb1-bb57-652ce02cd122' <type 'unicode'> was specified.
> [IPClusterStart] INFO:IPControllerApp:heartbeat::ignoring new heart: '218df526-7c37-4bb1-bb57-652ce02cd122'
>
>
> On 06/22/2011 08:41 PM, MinRK wrote:
>>
>> On Wed, Jun 22, 2011 at 08:39, Johann Cohen-Tanugi
>> <johann.cohentanugi at gmail.com>  wrote:
>>>
>>> Hello,
>>> I am using a farm of several machines, and I log in to it using ssh and
>>> a generic farm name, ending up on one specific machine depending on
>>> load, etc. All the machines of the farm see the AFS filesystem, on
>>> which a JSON file was created when I started ipcluster after a first
>>> login to the farm.
>>> Then I start another terminal and log in again to the farm, ending up on
>>> a different machine from the one ipcluster is running on.
>>> If I then do:
>>> from IPython.parallel import Client
>>> c = Client()
>>>
>>> it hangs. If I do:
>>> c = Client('/u/ec/cohen/.config/ipython/profile_default/security/ipcontroller-client.json')
>>> it returns:
>>>
>>> ---------------------------------------------------------------------------
>>> TypeError                                 Traceback (most recent call last)
>>>
>>> /a/wain006/g.glast.u54/cohen/IPYDEV/test_directory/<ipython-input-3-9325a79c5ee0> in <module>()
>>> ----> 1 c = Client('/u/ec/cohen/.config/ipython/profile_default/security/ipcontroller-client.json')
>>>
>>> TypeError: __new__() takes exactly 1 argument (2 given)
>>>
>>> So I guess the doc at
>>> http://ipython.org/ipython-doc/dev/parallel/parallel_intro.html#getting-started
>>> needs a patch.
>>
>> The docs are right - I just introduced a bug when I made the Client
>> inherit from HasTraits.  I pushed a simple fix to master, so it
>> accepts positional arguments again.
>> I also included a fix for the 'hang', which is actually a units
>> problem in pyzmq's select - trying to connect to a nonexistent
>> controller will now time out after 10 seconds (timeout is an arg in the
>> Client constructor, so you can make it shorter if you like).
>>
>>> Finally, if I do:
>>> c = Client(url_or_file='/afs/slac/u/ec/cohen/.config/ipython/profile_default/security/ipcontroller-client.json')
>>> it also hangs, while in the same ipython session I can immediately
>>> check:
>>> In [8]: ls -ltr /afs/slac/u/ec/cohen/.config/ipython/profile_default/security/ipcontroller-client.json
>>> -rw------- 1 cohen ec 130 Jun 22 08:06 /afs/slac/u/ec/cohen/.config/ipython/profile_default/security/ipcontroller-client.json
>>>
>>> that the JSON file is indeed accessible via the AFS file sharing.
>>>
>>>
>>> I checked that if I force a connection to the same machine instead of
>>> using the generic farm name,
>>> c = Client() returns immediately with the engines attached and I can
>>> proceed normally.
>>
>> The Controller only listens on loopback by default for security
>> reasons. If you want to connect to a different machine, you must
>> instruct the Controller to listen on a public interface (e.g.
>> ip=0.0.0.0), which you should only do if your cluster is safely behind
>> a firewall. Otherwise, you must use ssh tunnels to connect to the
>> Controller, via the Client's 'ssh' arg.
>>
>> -MinRK
>>
>>> Thanks in advance for the help,
>>> johann
>>> _______________________________________________
>>> IPython-dev mailing list
>>> IPython-dev at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/ipython-dev
>>>
>
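
For anyone hitting the same hang: a small diagnostic sketch, not part of IPython, that illustrates the two points above (a loopback-only address cannot be reached from another machine, and a connection attempt should give up after a timeout). The helper names is_loopback_url and can_connect are made up for illustration, the example URL format 'tcp://127.0.0.1:5555' is an assumption about what the controller advertises, and the code targets a current Python 3 rather than the Python 2.6 used in this thread.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_loopback_url(url):
    """Return True if the host in a URL like 'tcp://127.0.0.1:5555' is loopback-only.

    Hypothetical helper: the URL format is an assumption, not IPython's API.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a literal IP address (a hostname, '*', ...): cannot tell from here.
        return False


def can_connect(host, port, timeout=10.0):
    """Attempt a plain TCP connection, giving up after `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts, and timeouts.
        return False
```

If is_loopback_url() is true for the address the controller advertises, a client on a different machine of the farm will not reach it without an ssh tunnel or a controller listening on a public interface, however visible the JSON file is over AFS.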


