The Stale pidfile Syndrome - What's wrong with pidfiles?
Many Unix daemons record their process IDs in pidfiles. Startup and shutdown scripts use these pidfiles to determine which process to signal. Examples include Apache, Postgres and OpenSSH.
This method is unreliable.
Is the PID valid?
Some daemons create pidfiles, but do not remove them when the process exits. This could be because the daemon was never programmed to remove the file, or it could be because the process crashed before it could remove the file.
Sometimes the whole system crashes, and processes get no chance to remove their pidfiles even if they are written to do so.
In all of these cases the pidfile remains but its contents are invalid: the daemon is no longer running.
Is the process running?
Some scripts attempt to compensate for the above problems by sending signal 0 to the process named in the pidfile. Signal 0 delivers nothing; the kernel only checks whether the signal could be sent. The assumption is that if no error is returned, the process is still running.
This test is inadequate. The daemon may have died, and a different process may now have that process ID. In this case, a script may mistakenly conclude that the daemon is still running and send a signal to the wrong process. This is most likely and most dangerous when the script runs as root, because root is permitted to signal any process.
This test can also fail at boot time. If a stale pidfile remains when the system starts, another process may now have the PID named in it. When the startup script runs, it may use the signal 0 trick, decide the daemon is already running, and not start it.
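The signal 0 test can be sketched in Python as follows. This is a minimal illustration, not taken from any particular startup script; the names `pid_seems_alive` and `pidfile_claims_running` are hypothetical. A True result only means that some process currently holds that PID, which is exactly the weakness described in this section.

```python
import os


def pid_seems_alive(pid: int) -> bool:
    """Return True if *some* process currently has this PID (hypothetical helper)."""
    if pid <= 0:
        # 0 and negative values address process groups, not a single process.
        return False
    try:
        os.kill(pid, 0)  # signal 0: error checking only, nothing is delivered
    except ProcessLookupError:
        return False  # ESRCH: no process has this PID
    except PermissionError:
        return True  # EPERM: a process exists but we may not signal it
    except OverflowError:
        return False  # value does not fit in a pid_t
    return True


def pidfile_claims_running(path: str) -> bool:
    """Read a PID from path and apply the signal 0 test to it."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False  # missing, unreadable, or malformed pidfile
    # Caveat: this cannot tell the daemon apart from an unrelated
    # process that was later assigned the same PID.
    return pid_seems_alive(pid)
```

Nothing in `pidfile_claims_running` ties the PID to the daemon itself, so a stale pidfile whose PID has been reused passes the check.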
Suggestions
There may be no foolproof way to determine whether a given daemon is still running, but here are some suggestions that may help:
- Daemon processes should be written to remove their own pidfiles if at all possible.
- Boot rc scripts should remove pidfiles before startup scripts are executed. This would at least solve the problem of daemons not starting. Some systems already put pidfiles in a directory that is cleared at boot.
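The first suggestion, a daemon that cleans up after itself, might look like the sketch below. The function names are hypothetical, and the choice to remove the file only when it still contains our own PID is an assumption made for this example. Cleanup of this kind cannot run after SIGKILL, a hard crash of the process, or a system crash, so it reduces stale pidfiles without eliminating them.

```python
import atexit
import os
import signal
import sys


def write_pidfile(path: str) -> None:
    """Record our PID in path, replacing any stale file left behind."""
    tmp = f"{path}.{os.getpid()}.tmp"
    with open(tmp, "w") as f:
        f.write(f"{os.getpid()}\n")
    os.replace(tmp, path)  # atomic rename, so readers never see a partial file


def remove_pidfile(path: str) -> None:
    """Remove path, but only if it still names this process."""
    try:
        with open(path) as f:
            owner = int(f.read().strip())
        if owner == os.getpid():
            os.unlink(path)
    except (OSError, ValueError):
        pass  # already gone, unreadable, or malformed: nothing to clean up


def install_pidfile_cleanup(path: str) -> None:
    """Arrange for the pidfile to be removed on normal exit and on SIGTERM."""
    atexit.register(remove_pidfile, path)
    # SIGTERM's default action terminates the process without running atexit
    # handlers, so turn it into an ordinary interpreter exit instead.
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
```

A daemon would call `write_pidfile` followed by `install_pidfile_cleanup` once at startup, after it has finished detaching from the terminal.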
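A more recent approach is to write the application as a normal foreground process and run it under a monitor such as daemontools. The monitor is the daemon's parent, so it knows the PID directly and is notified when the process exits; no pidfile is needed.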
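As a rough illustration of the monitor approach, the toy supervisor below starts a foreground command, waits for it, and restarts it. It is not how daemontools is implemented; the function name `supervise`, its parameters, and the bounded restart count are assumptions made so the example terminates. A real monitor would loop indefinitely.

```python
import subprocess
import sys
import time


def supervise(argv, runs=3, delay=0.1):
    """Run argv in the foreground `runs` times, restarting it after each exit.

    Returns the list of exit codes observed (hypothetical toy monitor).
    """
    exit_codes = []
    for _ in range(runs):
        # We are the parent, so child.pid is authoritative: no pidfile,
        # and no risk of confusing the child with an unrelated process.
        child = subprocess.Popen(argv)
        exit_codes.append(child.wait())  # blocks until this exact child exits
        time.sleep(delay)  # brief pause so a crashing service does not spin
    return exit_codes


if __name__ == "__main__":
    # Hypothetical service: a command that exits at once with status 3.
    print(supervise([sys.executable, "-c", "raise SystemExit(3)"], runs=2))
```

Because the parent learns of the exit through `wait`, the question "is the daemon still running?" never has to be answered by guessing from a file.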