This past week a tsuru user was having a problem with one of his apps. Some units of his app would simply get stuck, timing out every request and apparently doing nothing. As a workaround he wanted to kill the problematic unit and force it to restart. On tsuru you can restart all units of an app, but you cannot choose a particular unit to restart.
We then “sshed” into the problematic container by using tsuru app shell -a <app-name> and tried sending a SIGKILL to the application process (pid 1). Surprisingly, it did not work:
```sh
$ sudo kill -9 1 # our first try, from inside the container
```
We also tried SIGTERM and SIGQUIT, and nothing happened. We then ssh’ed into the host, found the process's pid there (using docker top), issued the SIGKILL and, boom, the container restarted.
Reading the man page for kill(2) helped us understand this behavior:
The only signals that can be sent to process ID 1, the init process, are those for which init has explicitly installed signal handlers. This is done to assure the system is not brought down accidentally.
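Inside a container, the application process is pid 1 of its own PID namespace, so it gets the same protection as init. The exception, documented in pid_namespaces(7), is that SIGKILL and SIGSTOP are always delivered when they come from an ancestor namespace, which is why the SIGKILL sent from the host worked. Therefore, to be able to kill the container from the inside, you need to register a handler for the particular signal. It turns out that you cannot register a handler for SIGKILL (you cannot ignore this signal either). So, one must handle a different signal, e.g., SIGTERM, and use it to shut down the application (for example, by simply exiting).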
The following code can be used to check this behavior.
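```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Terminate as soon as the registered signal arrives. _exit() is used
 * instead of exit() because it is async-signal-safe. */
void handler(int sig)
{
    _exit(sig);
}

int main(int argc, char *argv[])
{
    int duration;

    /* With an argument: register no handler, just sleep and exit. */
    if (argc > 1)
    {
        duration = atoi(argv[1]);
        printf("Sleeping for %ds\n", duration);
        fflush(stdout); /* show the message even when stdout is a pipe */
        sleep(duration);
        exit(EXIT_SUCCESS);
    }

    /* Without arguments: register a SIGQUIT handler and wait for signals. */
    if (signal(SIGQUIT, handler) == SIG_ERR)
        exit(EXIT_FAILURE);
    for (;;)
        pause();
}
```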
If the code is run as ./killable 30, the application will sleep for 30 seconds and then exit. If that is the init process of a container, signals sent to it from inside the container have no effect, since no handler was registered. If no argument is provided, a handler for SIGQUIT is registered, so SIGQUIT is delivered to it. In this latter case, we are able to kill the container successfully, for example with kill -QUIT 1 from inside the container.
In the end, our advice to the user was to set up a signal handler for SIGTERM and to shut down the application when receiving this signal.