[Web-SIG] Daemon server management

Fri Jun 10 17:55:47 CEST 2005

I'm guessing you also meant to copy web-sig...

Jacob Smullyan wrote:
> On Thu, Jun 09, 2005 at 01:52:52PM -0500, Ian Bicking wrote:
> 
>>Jacob Smullyan wrote:
>>
>>>On Thu, Jun 09, 2005 at 01:26:17PM -0500, Ian Bicking wrote:
>>>
>>>
>>>>Does anyone have opinions on how to start and stop daemon servers?  I've
>>>>added a --daemon option to paster serve, but I'd like to implement stop,
>>>>restart, and reload as well.  Whenever I encounter servers that clobber
>>>>pid files, or where the only way you can tell you've started a server
>>>>twice is that you get an error message about not being able to bind to
>>>>the port, it annoys me.  But I'm not sure how to best implement a better
>>>>system.  Especially cross-platform -- though an entirely separate
>>>>process for Windows might make sense (as a Windows service or something).
>>>>
>>>>Opinions?  Or examples of other servers (preferably Python-based) that
>>>>do this well?
>>>
>>>
>>>Clobbering pid files is a no-no; but getting an error about a port
>>>being already in use doesn't seem terrible to me. 
>>
>>Yes, but how to avoid clobbering pid files?  It's probably a beginner 
>>question, and I've found workable things in the os module, but I don't 
>>actually know the right way to do this.
> 
> 
> The os module has the best way to open a file, making sure that it
> doesn't exist:
> 
>   import errno, os
>
>   try:
>       # O_EXCL makes the open fail if the file already exists
>       fd = os.open(fname, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
>   except OSError as e:
>       if e.errno == errno.EEXIST:
>           logger.exception("File exists: %s", fname)
>           # actually, you should bomb out here
>       else:
>           logger.exception("IO error opening pid file!")
>           # same
>   else:
>       fp = os.fdopen(fd, 'w')
>       fp.write(str(os.getpid()))
>       fp.flush()
>       fp.close()
> 
> but it isn't foolproof -- there can be a race condition on NFS, as
> documented in "man open" (on Linux, at least).  But anyone who stores
> a pid file on an NFS filesystem is probably asking for it....
> 
> I recently drafted a new version of the skunkweb daemon, which tries
> to be pretty traditional:
> 
>    http://svn.berlios.de/viewcvs/skunkweb/sandbox/smulloni/skunk4/src/skunk/net/server/
> 
> In light of your musing, I see several flaws.  There are some things
> it doesn't do that most textbook examples do -- it doesn't dup stdout
> and stderr, for instance -- that I was aware of.  I notice that I
> didn't open the pid file so carefully -- I guess I'll change that.  In
> practice, starting a daemon twice would probably cause a port conflict
> before the pid file is written, since two instances sharing the same
> pid file are likely to have the same configuration, too.

In Paste I can't really do that, since the pid file gets written before 
the server starts up, because it's server-agnostic, and none of the 
servers currently supported have any of this infrastructure themselves.

>>I'd agree it's wrong to be clever and notice that the process is already 
>>running, then exiting without error.  But it's right to notice the other 
>>process is running, and exit with a helpful error; helpful errors are 
>>always right.  Should I even try to connect to a port if the process in 
>>the pid file is still alive, or should I bail immediately?
> 
> 
> I think that if the pid file exists in any form, you are right to
> refuse to start, with an error message about the pid file already
> existing.  But if this is a separate test, you could still clobber one
> a moment later when you write one yourself; so a careful open is
> probably the most important thing.

I don't like this way of working -- a stale pid file should be 
overwritten automatically.  Otherwise the admin has to figure out 
whether the pid file wasn't cleaned up properly, or the server really is 
alive.  The server can figure that out just as well as the admin can 
manually (probably better).  Though some cases are ambiguous, e.g., you 
can't be sure the live process is the same process that created the pid 
file.
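
A minimal sketch of that stale-pid check, assuming a Unix system where
sending signal 0 with os.kill() tests for a process without disturbing
it.  The function names (pid_is_alive, read_pid, remove_if_stale) are
made up for illustration; none of this is in Paste.  It does not resolve
the ambiguous case: a live pid only tells you that *some* process has
that number, not that it is the server.

```python
import errno
import os


def pid_is_alive(pid):
    """Return True if a process with this pid exists (Unix only)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is sent
    except OSError as e:
        if e.errno == errno.ESRCH:
            return False  # no such process
        if e.errno == errno.EPERM:
            return True   # exists, but owned by another user
        raise
    return True


def read_pid(pid_file):
    """Return the pid stored in pid_file, or None if unreadable or garbage."""
    try:
        with open(pid_file) as fp:
            return int(fp.read().strip())
    except (IOError, OSError, ValueError):
        return None


def remove_if_stale(pid_file):
    """Remove pid_file unless a live process matches it.

    Returns True if the path is free afterwards, False if the recorded
    process still appears to be running.
    """
    if not os.path.exists(pid_file):
        return True
    pid = read_pid(pid_file)
    if pid is not None and pid_is_alive(pid):
        return False
    os.remove(pid_file)
    return True
```

If remove_if_stale() returns False, the server would exit with a helpful
error.  If it returns True, the pid file should still be created with an
exclusive open (O_EXCL), since another process could create it in the
gap between the check and the write.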

>>>I'd advocate the standard UNIX behavior for UNIX machines; pid file,
>>>conventional signal handling (in particular, HUP reloads).  For
>>>Windows, the standard Windows behavior, whatever that might be; a
>>>cross-platform solution would be neither fish nor fowl.  This is not
>>>just a matter of taste; conforming to the platform's expectations in
>>>this area is the gracious thing to do, since packagers and system
>>>administrators do not relish constantly having to write special
>>>wrappers for non-standard daemons.
>>
>>I'm happy to copy conventions.  Does anyone recommend a particular 
>>document on those conventions?  For things like, do I open log files 
>>before or after I change user id (assuming the server is started as 
>>root)?  And I'm a complete blank slate when it comes to the Windows 
>>side.  Or even Macs, though I'm okay treating them like Unix to start.
> 
> 
> Well, the books I like are the usual suspects: Stevens' UNIX Network
> Programming, Vol. I, Johnson & Troan's Linux Application Development,
> and I also rather like Lincoln Stein's treatment of the same territory
> for Perl -- Network Programming with Perl.  Copying a good model, like Apache,
> isn't a bad thing either.
> 
> As for log files, I *think* that they end up belonging to root even if
> the child processes setuid to a nobody-style person.  That is what
> I've done.  That seems to be what Apache does.

Yes, I think that is the case.  But I think the group ownership might 
change?

I generally like how Apache works now, since they've combined httpd and 
apachectl, but I'm not sure how easy it would be for me to discover the 
particulars.

-- 
Ian Bicking  /  ianb at colorstudy.com  /  http://blog.ianbicking.org