Screen is not a process control system
The point of this post may seem blatantly obvious, but in the last month alone, I’ve heard of numerous people using screen and/or cron to keep daemons alive. Worse, there are semi-official guides out there that still recommend it, even in this day and age.
Now, don’t get me wrong, screen is absolutely awesome (though I prefer the newer tmux) for what it’s meant to do, such as multiplexing terminals and providing window management of those terminals. But it is not designed to control and watch over your daemon processes. For example, it doesn’t manage your logfiles, it won’t respawn a crashed program and it’s not going to come up by itself after a reboot, either.
Thankfully, there are plenty of modern tools designed specifically for this purpose.
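To make it concrete what such a tool does at its very core, here is a rough sketch in Python: start a child process, send its output to a logfile, and start it again when it exits. The function name `supervise`, its parameters and the bounded restart count are my own inventions for illustration; this is not how any of the tools below are actually implemented.

```python
import subprocess
import time


def supervise(command, logfile, max_starts=3, delay=0.0):
    """Run `command`, append its output to `logfile`, restart it when it exits.

    Returns the exit codes seen. `max_starts` bounds the loop so that this
    sketch terminates; a real supervisor keeps going and applies backoff.
    """
    exit_codes = []
    for attempt in range(max_starts):
        if attempt:
            time.sleep(delay)  # pause between restarts
        with open(logfile, "ab") as log:
            # Send both stdout and stderr of the child to the logfile
            proc = subprocess.Popen(command, stdout=log, stderr=subprocess.STDOUT)
            exit_codes.append(proc.wait())
    return exit_codes
```

Something like `supervise(["/usr/bin/mydaemon"], "/tmp/mydaemon.log")` (a made-up command and path) would start the program up to three times and collect its output in one file. Even this toy does two things screen never will, yet it still doesn’t come back after a reboot, handle signals or rotate logs, which is why you want a real tool instead.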
Supervisord
Supervisord is a “client/server system that allows its users to monitor and control a number of processes on UNIX-like operating systems.” It’s a daemon that init starts like any other on your system, and it in turn manages other processes for you through simple configuration files, which look like this:
# /etc/supervisor/conf.d/err.conf
[program:err]
directory=/home/err/repository
command=/home/err/virtualenv/bin/python /home/err/repository/scripts/err.py --config "/home/err" --xmpp
autostart=true
autorestart=true
startsecs=10
stopwaitsecs=60
redirect_stderr=true
stdout_logfile=/var/log/supervisor/err.log
stderr_logfile=None
stdout_logfile_maxbytes=150MB
stdout_logfile_backups=0
user=err
environment=HOME=/home/err,USER=err
It provides a command-line program to manage the programs under its control, which you can see below, as well as many other features, including an API in case you want to integrate it with other systems.
$ supervisorctl status
devpi-server RUNNING pid 1517, uptime 19 days, 1:43:17
err RUNNING pid 1503, uptime 19 days, 1:43:17
munin-fcgi-graph RUNNING pid 1504, uptime 19 days, 1:43:17
munin-fcgi-html RUNNING pid 1495, uptime 19 days, 1:43:17
$ supervisorctl restart devpi-server
devpi-server: stopped
devpi-server: started
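The API mentioned above is XML-RPC. As a rough sketch, assuming you have enabled supervisord’s HTTP server with an `[inet_http_server]` section listening on 127.0.0.1:9001 without authentication (that section is not part of the config shown earlier), you could fetch the same information as `supervisorctl status` from Python. The helper names `format_status` and `fetch_status` are made up for this example.

```python
from xmlrpc.client import ServerProxy


def format_status(processes):
    """Render process info dicts roughly the way `supervisorctl status` does."""
    return [
        f"{p['name']:<20} {p['statename']:<10} {p['description']}"
        for p in processes
    ]


def fetch_status(url="http://127.0.0.1:9001/RPC2"):
    # Assumption: supervisord's HTTP server is enabled and reachable at `url`
    server = ServerProxy(url)
    # getAllProcessInfo() returns one dict per managed process
    return format_status(server.supervisor.getAllProcessInfo())
```

Printing each line returned by `fetch_status()` should give output comparable to the status listing above. The same interface also exposes calls such as `supervisor.stopProcess` and `supervisor.startProcess`, so a deploy script can restart a program without shelling out.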
Upstart
Upstart is “an event-based replacement for the /sbin/init daemon which handles starting of tasks and services during boot, stopping them during shutdown and supervising them while the system is running. It was originally developed for the Ubuntu distribution, but is intended to be suitable for deployment in all Linux distributions as a replacement for the venerable System-V init.”
It’s one of many init replacements, systemd and OpenRC being some other examples. I’m an Ubuntu user, so for me, using Upstart makes a lot of sense in my infrastructure. I use Upstart rather than Supervisord when it comes to system-level daemons, especially those where it’s nice to be able to control where in the boot process they get started. Upstart scripts are just as simple to write as Supervisord entries, though.
# /etc/init/uwsgi-emperor.conf
description "uWSGI Emperor"
start on runlevel [2345]
stop on runlevel [06]
respawn
respawn limit 10 5
pre-start script
    [ -e /var/run/uwsgi-emperor ] || mkdir /var/run/uwsgi-emperor
    chmod 1777 /var/run/uwsgi-emperor
end script
exec uwsgi --logto /var/log/uwsgi-emperor.log --log-date --thunder-lock --die-on-term --emperor /etc/uwsgi/apps-enabled
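The `respawn limit 10 5` line deserves a word: it tells Upstart to give up on a job that is respawned more than 10 times within 5 seconds, so a program that crashes immediately on every start doesn’t spin forever. The following is only a sketch of that idea as a sliding time window, not Upstart’s actual implementation; `RespawnLimiter` and `allow` are hypothetical names.

```python
from collections import deque


class RespawnLimiter:
    """Allow at most `count` respawns within any `interval`-second window."""

    def __init__(self, count=10, interval=5.0):
        self.count = count
        self.interval = interval
        self._times = deque()  # timestamps of recent respawns

    def allow(self, now):
        # Forget respawns that have fallen out of the window
        while self._times and now - self._times[0] >= self.interval:
            self._times.popleft()
        if len(self._times) >= self.count:
            return False  # too many respawns in the window: give up
        self._times.append(now)
        return True
```

A supervisor loop would call `allow(time.monotonic())` before every restart and stop the job for good as soon as it returns `False`.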
Others
The two I highlighted are merely the tip of the iceberg. There’s also monit, daemontools, circus and runit, just to name a few. What matters is that you use one of the many tools designed specifically for this purpose, rather than hacking together fragile solutions with tmux, screen or cron.