0

I want the exact opposite of this question. I want to know how I can create a process that keeps restarting if it's killed. Could someone give me an implementation example?

For instance, let's assume that I have a simple process that continuously logs a timestamp and an incrementing counter to a log file, like this:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define FILE_PATH "/home/lincoln/date.log"

int main() {
    time_t current_time;
    struct tm *time_info;
    int counter = 0;

    while (1) {
        FILE *file = fopen(FILE_PATH, "a");
        if (file == NULL) {
            perror("Failed to open the file");
            return 1;
        }

        // Get the current timestamp
        time(&current_time);
        time_info = localtime(&current_time);

        // Format the timestamp
        char timestamp[25];
        strftime(timestamp, sizeof(timestamp), "%Y-%m-%d %H:%M:%S", time_info);

        // Write timestamp and counter to the file
        fprintf(file, "%s | Counter: %d\n", timestamp, counter++);
        fflush(file);

        fclose(file);
        sleep(5); // Sleep for 5 seconds
    }

    return 0;
}

To generate the binary (ELF file), I used the following command:

gcc logger_file.c -o date_test

(Please note that this is just an example. It could be a "hello world" program that keeps printing. I chose logging because I want to observe the effects of it continuously running.)

From the quoted question (correct me if I'm wrong), it's achieved by a parent process that monitors the child process. If the child process is killed, the parent process restarts it. I understand this concept, but I don't know how it's implemented in practice.

Could someone demonstrate how it can be implemented using the provided binary example? (Links to references for further study are welcome too.)

Please let me know if more information is needed.

EDIT: I'm planning to do this on an embedded device. My working environment uses sysVinit as the init system and BusyBox v1.23.2 as the toolset, on a 3.18.44 kernel running on an ARMv7 processor (I'm using the gcc-12-arm-linux-gnueabi-base toolchain).

  • I'm confused now. Are you interested in how to write a supervisor, or in how to use the supervisor from your init system? – Marcus Müller Jul 11 '23 at 22:27
  • @MarcusMüller From the beginning, I didn't realize that it was possible to use the supervisor of my init system. In your question, you mentioned systemd, and for me, this wasn't useful. However, with your instructions, I achieved success in writing my supervisor (with just some changes in your code). Thanks. – locnnil Jul 12 '23 at 00:02
  • @MarcusMüller I've provided more information because I thought that, with as much information as possible, it would be easier for others to help me find a solution to my problem. The problem is that I have a binary app, and I want it to run continuously ("forever") without anything stopping it. (That's the main point of embedded applications, right? Haha.) – locnnil Jul 12 '23 at 00:09

2 Answers

7

A process can't revive itself once it's dead. It's dead; it literally cannot do anything anymore (that's the point).

You can have a supervisor that checks the state of a process and spawns a new process when the old one has died.

Almost all Linux distros (apart from a few niche or slim-container-oriented ones) make that trivial: systemd can do it. All you need to do is write a systemd service unit file and set the Restart= property to always, on-abnormal, or on-abort. That's it!

Then, systemd acts as the supervisor (no need to write your own supervisor that fiddles with procfs or similar), and things work out of the box, and you can still explicitly stop the service from restarting through systemctl.

Now, you asked how this would be done in practice. I alluded to procfs; that's one of the ways: you just watch the /proc/{PID} directory or e.g. the memory maps in there.
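As a rough sketch of the procfs approach, assuming a Linux system with /proc mounted: the helper below checks whether /proc/{PID} still exists and polls until it disappears. The names proc_entry_exists and wait_until_gone are hypothetical, not part of any library. Two caveats apply: a zombie that has not been reaped still has a /proc entry, and a PID can be reused by an unrelated process, so this is only an approximation.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the directory /proc/<pid> exists,
 * 0 otherwise. Assumes procfs is mounted at /proc. */
int proc_entry_exists(pid_t pid) {
  char path[32];
  struct stat sb;

  assert(pid > 0);
  snprintf(path, sizeof(path), "/proc/%ld", (long)pid);
  return stat(path, &sb) == 0 && S_ISDIR(sb.st_mode);
}

/* Hypothetical helper: poll every interval_s seconds until /proc/<pid>
 * is gone. Returns the number of polls that still saw the process. */
int wait_until_gone(pid_t pid, unsigned interval_s) {
  int polls = 0;

  while (proc_entry_exists(pid)) {
    sleep(interval_s);
    polls++;
  }
  return polls;
}
```

A supervisor built this way would call wait_until_gone on the PID it is watching and start a new process once the call returns.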

However, the way supervisors usually do it (and systemd's service.c does it like that) is to register a handler for SIGCHLD: when a child process exits, its parent gets that signal. Because the supervisor starts the supervised process as its own child through fork, that works.

So, you need roughly the following, loosely lifted from an ancient Oracle guide and the Linux man-pages entry for waitpid (man waitpid):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

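/* SIGCHLD handler: reap every child that has exited so far. */
void proc_exit(int sig) {
  int wstat;
  (void)sig; /* unused */

  while (1) {
    /* -1 means "any child"; WNOHANG keeps the call from blocking */
    pid_t retval = waitpid(-1, &wstat, WNOHANG);
    if (retval == 0)
      return; /* children exist, but none has exited yet */
    else if (retval == -1)
      return; /* no children left, or an error */
    else
      printf("Return code: %d\n", WEXITSTATUS(wstat));
  }
}

int main() {
  signal(SIGCHLD, proc_exit);
  switch (fork()) {
  case -1:
    perror("main: fork");
    exit(1);
  case 0: /* child */
    printf("I'm alive (temporarily)\n");
    exit(42);
  default: /* parent: sleep until a signal arrives */
    pause();
  }
}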

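The child process sees fork return 0, so it's the process that prints "I'm alive…". This is where you would call execve if you wanted to spawn a different executable (i.e., your supervised process). To keep the child running, the parent would have to fork again each time it learns that the child has exited.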
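A minimal sketch of how fork, exec and waitpid could be combined into a restart loop, assuming a simple blocking design instead of the SIGCHLD handler: the parent forks, the child replaces itself with the supervised binary, and the parent waits for it and starts it again. It uses execv (which passes the current environment along) rather than execve. The names run_once and supervise, and the max_starts and delay_s parameters, are hypothetical and not from the answer above.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start `path` as a child process and block until it
 * ends. Returns the raw wait status, or -1 if fork/waitpid failed. */
int run_once(const char *path) {
  assert(path != NULL);

  pid_t pid = fork();
  if (pid == -1) {
    perror("fork");
    return -1;
  }
  if (pid == 0) {
    /* Child: replace this process image with the supervised program. */
    char *const argv[] = {(char *)path, NULL};
    execv(path, argv);
    perror("execv"); /* only reached if execv failed */
    _exit(127);
  }

  /* Parent: wait for this specific child to terminate. */
  int wstat;
  if (waitpid(pid, &wstat, 0) == -1) {
    perror("waitpid");
    return -1;
  }
  return wstat;
}

/* Hypothetical helper: keep starting `path` again whenever it ends.
 * max_starts < 0 means "forever"; delay_s is a crude rate limit so that a
 * program which dies immediately does not make the supervisor spin.
 * Returns how many times the program was started. */
int supervise(const char *path, int max_starts, unsigned delay_s) {
  int starts = 0;

  while (max_starts < 0 || starts < max_starts) {
    if (run_once(path) == -1)
      break; /* could not even create the child; give up */
    starts++;
    sleep(delay_s);
  }
  return starts;
}
```

Calling supervise("/path/to/date_test", -1, 1) from main would keep the logger running until the supervisor itself is killed; the path is a placeholder for wherever the binary actually lives.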

3

If you're lucky enough to be on a system which has a regular init daemon, you can run that process out of the /etc/inittab, where you can specify that the process is to be restarted if it dies.

In /etc/inittab you add an entry like this:

dtst:345:respawn:/path/to/date_test

The dtst is just a label; 345 means this is active in run levels 3, 4 and 5, and respawn means that the process will be restarted if it dies (what we want).

Then use the telinit command, or kill -HUP 1 to tell init to re-read the file.

There is probably some complicated way to do something similar on systems that use systemd instead of init.

Note that BusyBox init ignores the runlevels field in the inittab entry, and it gives the id field a special meaning: if the field isn't blank, it sets the controlling TTY.

Another way is to have a "nanny" shell script which does this:

#!/bin/sh

while true; do
  /path/to/date_test
done

We could have it so that the loop will terminate if the program terminates normally with a successful status:

while ! /path/to/date_test; do : ; done

While the date_test program terminates unsuccessfully or abnormally, run the do-nothing null command : and repeat.

The shell which runs this loop is itself not protected from termination, though. Even so, the approach is useful if the program is unreliable and crashes, requiring restarts.
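For a supervisor written in C, the same decision as in the second shell loop (stop after a successful exit, restart otherwise) could be sketched as below. The name should_restart is hypothetical. The sketch assumes the status comes from waitpid called without WUNTRACED or WCONTINUED, so it only ever describes a process that has terminated.

```c
#include <assert.h>
#include <sys/wait.h>

/* Hypothetical helper: given a status filled in by waitpid(), return 1 if
 * the program should be started again and 0 if it ended successfully. */
int should_restart(int wstat) {
  /* Assumption: the caller only passes statuses of terminated processes. */
  assert(WIFEXITED(wstat) || WIFSIGNALED(wstat));

  if (WIFEXITED(wstat))
    return WEXITSTATUS(wstat) != 0; /* normal exit: restart only on failure */
  return 1; /* killed by a signal: treat as abnormal and restart */
}
```

A restart loop would call this on each status it collects and leave the loop as soon as it returns 0.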

Kaz
  • as explained in my answer, the systemd init process can do the same, but you do get actual control. In all honesty, "old" init systems are not "luck" to have when you need to supervise daemons. Systemd allows for much finer control over how, when to and under which circumstances not to restart a service. (fully agreeing that what you write is appropriate according to OP's plans) – Marcus Müller Jul 11 '23 at 22:24
  • Thank you so much! This solution solves my problem in an incredibly elegant way. I just have one question about the label in inittab. Are there any naming restrictions? Does it need to be short? Because I tried naming the label as "testing" and it didn't work, but it worked with "test"! Could you please provide me with some reference links? Thank you in advance. – locnnil Jul 12 '23 at 21:38