Unix - Frequently Asked Questions (3/7) [Frequent posting]

From: casper@fwi.uva.nl (Casper Dik) Date: Thu, 09 Sep 93 16:39:58 +0200
3.13) How do I get rid of zombie processes that persevere?
  Unfortunately, it's impossible to generalize how the death of
  child processes should be handled, because the exact mechanism
  varies across the various flavors of Unix.

  First of all, by default, you have to do a wait() for child
  processes under ALL flavors of Unix.  That is, there is no flavor
  of Unix that I know of that will automatically clean up child
  processes that exit unless you do something to tell it to do so.

  Second, under some SysV-derived systems, if you do
  "signal(SIGCHLD, SIG_IGN)" (well, actually, it may be SIGCLD
  instead of SIGCHLD, but most of the newer SysV systems have
  "#define SIGCHLD SIGCLD" in the header files), then child
  processes will be cleaned up automatically, with no further
  effort on your part.  The best way to find out if it works at
  your site is to try it, although if you are trying to write
  portable code, it's a bad idea to rely on this in any case.
  Unfortunately, POSIX doesn't allow you to do this; the behavior
  of setting SIGCHLD to SIG_IGN under POSIX is undefined, so you
  can't do it if your program is supposed to be POSIX-compliant.
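
  The "try it" test suggested above could look roughly like the
  following sketch.  The function name sigchld_ignore_reaps() is made
  up for this example, and the probe assumes the calling process has
  no other children at the time.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if children are cleaned up
 * automatically once SIGCHLD is ignored, 0 if a zombie was left for
 * us to collect, and -1 if the probe itself failed.
 * Assumes the caller has no other child processes. */
int sigchld_ignore_reaps(void)
{
    void (*old)(int);
    pid_t pid;
    int result;

    old = signal(SIGCHLD, SIG_IGN);   /* SIGCLD on some older SysV systems */
    if (old == SIG_ERR)
        return -1;

    pid = fork();
    if (pid < 0) {
        signal(SIGCHLD, old);
        return -1;
    }
    if (pid == 0)
        _exit(0);                     /* child: die at once */

    /* With automatic clean-up, wait() blocks until the child is gone
     * and then fails with ECHILD; otherwise it returns the child's pid. */
    if (wait(NULL) == pid)
        result = 0;
    else
        result = (errno == ECHILD) ? 1 : -1;

    signal(SIGCHLD, old);             /* restore the previous disposition */
    return result;
}
```

  A result of 1 only tells you what this particular system does; as
  noted above, portable code should not depend on it.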

  So, what's the POSIX way? As mentioned earlier, you must wait
  for your children; to do so, install a signal handler for SIGCHLD
  and wait inside it. Under POSIX, signal handlers are installed
  with sigaction(). Since you are not interested in
  ``stopped'' children, only in terminated children, add SA_NOCLDSTOP
  to sa_flags.  Waiting without blocking is done with waitpid().
  The first argument to waitpid() should be -1 (wait for any pid),
  the third should be WNOHANG. This is the most portable way
  and is likely to become more portable in future.

  If your system doesn't support POSIX, there are a number of ways.
  The easiest way is signal(SIGCHLD, SIG_IGN), if it works.
  If SIG_IGN cannot be used to force automatic clean-up, then you've
  got to write a signal handler to do it.  It isn't easy at all to
  write a signal handler that does things right on all flavors of
  Unix, because of the following inconsistencies:

  On some flavors of Unix, the SIGCHLD signal handler is called if
  one *or more* children have died.  This means that if your signal
  handler only does one wait() call, then it won't clean up all of
  the children.  Fortunately, I believe that all Unix flavors for
  which this is the case have available to the programmer the
  wait3() or waitpid() call, which allows the WNOHANG option to
  check whether or not there are any children waiting to be cleaned
  up.  Therefore, on any system that has wait3()/waitpid(), your
  signal handler should call wait3()/waitpid() over and over again
  with the WNOHANG option until there are no children left to clean
  up.  The preferred interface is waitpid(), as it is in POSIX.

  On SysV-derived systems, SIGCHLD signals are regenerated if there
  are child processes still waiting to be cleaned up after you exit
  the SIGCHLD signal handler.  Therefore, it's safe on most SysV
  systems to assume when the signal handler gets called that you
  only have to clean up one child, and assume that the handler
  will get called again if there are more to clean up after it
  exits.

  On older systems, there is no way to prevent signal handlers
  from being automatically reset to SIG_DFL when the signal
  handler gets called.  On such systems, you have to put
  "signal(SIGCHLD, catcher_func)" (where "catcher_func" is the
  name of the handler function) as the last thing in the signal
  handler, so that it gets reset.

  Fortunately, newer implementations allow signal handlers to be
  installed without being reset to SIG_DFL when the handler
  function is called.  On systems that lack this, and that do not
  have wait3()/waitpid() but do have SIGCLD, you need to
  reset the signal handler with a call to signal() after doing at
  least one wait() within the handler, each time it is called.
  (Do the wait() first: on such systems, re-installing a SIGCLD
  handler while a dead child is still uncollected raises the
  signal again immediately.)  For backward compatibility reasons,
  System V will keep the old semantics (reset handler on call) of
  signal().  Signal handlers that stick can be installed with
  sigaction() or sigset().
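
  For systems with only wait() and the old signal() semantics, a
  handler along these lines is one way to follow the advice above.
  This is a sketch, not tested on a real old SysV system: the counter
  "collected" and the function run_one_child() are invented for the
  example, and "catcher_func" is the placeholder name used earlier.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SIGCHLD
#define SIGCHLD SIGCLD                /* older SysV spelling */
#endif

static volatile sig_atomic_t collected = 0;   /* hypothetical counter */

static void catcher_func(int signo)
{
    int saved_errno = errno;
    int status;

    /* Exactly one wait() per invocation.  It should not block, since
     * the signal means at least one child has terminated. */
    if (wait(&status) > 0)
        collected++;

    /* Re-install as the last step: old signal() resets the handler to
     * SIG_DFL on delivery, and re-installing before the wait() could
     * raise the signal again at once on SysV. */
    signal(signo, catcher_func);
    errno = saved_errno;
}

/* Usage sketch: start one short-lived child and wait for the handler
 * to collect it.  Returns the number collected, or -1 on failure. */
int run_one_child(void)
{
    pid_t pid;

    collected = 0;
    if (signal(SIGCHLD, catcher_func) == SIG_ERR)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(0);                     /* child: exit immediately */

    while (collected < 1)
        sleep(1);                     /* returns early when a signal arrives */
    return collected;
}
```

  On a system where signal() already installs handlers that stick,
  the extra signal() call inside the handler is harmless.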

  The summary of all this is that on systems that have waitpid()
  (POSIX) or wait3(), you should use that and your signal handler
  should loop, and on systems that don't, you should have one call
  to wait() per invocation of the signal handler.

  One more thing -- if you don't want to go through all of this
  trouble, there is a portable way to avoid this problem, although
  it is somewhat less efficient.  Your parent process should fork,
  and then wait right there and then for the child process to
  terminate.  The child process then forks again, giving you a
  child and a grandchild.  The child exits immediately (and hence
  the parent waiting for it notices its death and continues to
  work), and the grandchild does whatever the child was originally
  supposed to.  Since its parent died, it is inherited by init,
  which will do whatever waiting is needed.  This method is
  inefficient because it requires an extra fork, but is pretty much
  completely portable.
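
  The double-fork trick can be sketched as follows.  The names
  spawn_detached() and example_job() are made up for this example.
  The sketch uses waitpid() so that it waits for the right child;
  on a system without it, a plain wait() would serve as long as the
  parent has no other children.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run job() in a grandchild that the caller
 * never has to wait for.  Returns 0 on success, -1 on failure. */
int spawn_detached(void (*job)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {                   /* first child */
        pid_t grandchild = fork();

        if (grandchild < 0)
            _exit(1);                 /* report failure to the parent */
        if (grandchild == 0) {        /* grandchild: do the real work */
            job();
            _exit(0);
        }
        _exit(0);                     /* child exits, orphaning the grandchild */
    }

    /* Parent: wait right here for the short-lived child. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}

/* Hypothetical job used only to illustrate the call. */
void example_job(void)
{
    sleep(1);                         /* stand-in for the real work */
}
```

  After spawn_detached(example_job) returns, the caller has no child
  left to wait for; the grandchild finishes on its own and is
  collected by whichever process inherited it.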