
We have a shell script that -- for various reasons -- wraps a vendor's application. We have system administrators and application owners who have mixed levels of familiarity with systemd. As a result, in situations where the application has failed (systemctl indicates as much), some end users (including “root” system administrators) might start an application “directly” with the wrapper script instead of using systemctl restart. This can cause issues during reboots, because systemd does not call the proper shutdown script -- because as far as it's concerned, the application was already stopped.

To help guide the transition to systemd, I want to update the wrapper script to determine whether it is being called by systemd or by an end-user; if it's being called outside systemd, I want to print a message to the caller, telling them to use systemctl.

How can I determine, within a shell script, whether it is being called by systemd or not?

You may assume:

  • a bash shell for the wrapper script
  • the wrapper script successfully starts and stops the application
  • the systemd service works as expected

An example of the systemd service could be:

[Unit]
Description=Vendor's Application
After=network-online.target

[Service]
ExecStart=/path/to/wrapper start
ExecStop=/path/to/wrapper stop
Type=forking

[Install]
WantedBy=multi-user.target

I am not interested in detecting the init system, since I already know it's systemd.

Jeff Schaller
    General answer for any reasonable init system: If you have a controlling terminal (try to open /dev/tty), then the init system did not start you. Or you acquired a controlling terminal by opening a pty or something, but that's usually a Bad Idea. Or else the init system is not "reasonable" - I've never heard of an init system that gives daemons controlling terminals, but I suppose it could exist? – Kevin Dec 05 '20 at 20:00
    @Kevin, ... There are enough ways to not have a controlling tty even when not started by the init system that I'd be worried about that. ssh somehost somecommand, for example, won't have a TTY with ssh using a default configuration. – Charles Duffy Dec 05 '20 at 23:28
  • @AndreasGrapentin, you can have stdout going to somewhere a user can read without a TTY being associated. The example I gave above, of ssh somehost somecommand without -tt or ForceTTY true, demonstrates the point. – Charles Duffy Dec 06 '20 at 20:10
  • If you're in control, a simple option is to have two entry points for the same functionality: put the non-systemd one in the normal path (don't assume systemd), and the systemd-specific one somewhere else (I sometimes see this put in /usr/lib/myapp/ or /opt/). – ThorSummoner Dec 08 '20 at 18:21
    @ThorSummoner yes; that's the essence of wyrm's answer – Jeff Schaller Dec 08 '20 at 18:59

5 Answers


From Lucas Werkmeister's informative answer on Server Fault:

  • With systemd versions 231 and later, there's a JOURNAL_STREAM variable that is set for services whose stdout or stderr is connected to the journal.
  • With systemd versions 232 and later, there's an INVOCATION_ID variable that is set.
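A minimal sketch of a check based on those two variables follows. The function name started_by_systemd is hypothetical. Both variables are inherited by child processes and can be set by hand, so this only catches accidental direct runs, not deliberate bypasses.

```shell
#!/usr/bin/env bash
# Sketch: guess whether systemd launched us from the variables it sets.
#   INVOCATION_ID  - set in the environment of service processes (systemd >= 232)
#   JOURNAL_STREAM - set when stdout/stderr go to the journal (systemd >= 231)
# Both are inherited by children and can be faked, so treat this as a hint.
started_by_systemd() {
    [[ -n ${INVOCATION_ID:-} || -n ${JOURNAL_STREAM:-} ]]
}

# Usage near the top of the wrapper:
#   started_by_systemd || { echo "Use systemctl instead" >&2; exit 1; }
```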

If you don't want to rely on those variables, or for systemd versions before 231, you can check if the parent PID is equal to 1:

if [[ $PPID -ne 1 ]]
then
  echo "Don't call me directly; instead, call 'systemctl start/stop service-name'"
  exit 1
fi >&2
Kusalananda
Jeff Schaller
    Apparently this fails for user units (those residing in ~/.config/systemd/user). But yes, most services are system services, so this should cover most use cases. – iBug Dec 05 '20 at 07:23
    As suggested in Lucas' answer, it makes the most sense just to add an environment variable to the unit file, something like Environment=LAUNCHED_BY_SYSTEMD=1, since that's entirely in the OP's control, and not reliant on unrelated behaviors of systemd. – jpaugh Dec 07 '20 at 16:27
  • @jpaugh yes; it's a simple workaround, which is nice in its simplicity, but also has a simple bypass, as I commented on ilkkachu's answer here. – Jeff Schaller Dec 07 '20 at 16:31

The Short Answer

if ! grep -qEe '[.]service$' /proc/self/cgroup; then
    echo "This script should be started with systemctl" >&2
    exit 1
fi

...or, if you know the specific service name you're expected to run as, and want to be robust against misconfigurations that prevent a user session from being created:

if ! grep -qEe '/myservice[.]service$' /proc/self/cgroup; then
    echo "This service should be started with systemctl start myservice" >&2
    exit 1
fi

Why It Works

One way to determine which service -- if any -- started the current process is checking /proc/self/cgroup. For a systemd-triggered service, this will contain the service name; for example:

12:pids:/system.slice/dhcpcd.service
11:rdma:/
10:memory:/system.slice/dhcpcd.service
9:blkio:/system.slice/dhcpcd.service
8:devices:/system.slice/dhcpcd.service
7:hugetlb:/
6:cpuset:/
5:freezer:/
4:cpu,cpuacct:/system.slice/dhcpcd.service
3:net_cls,net_prio:/
2:perf_event:/
1:name=systemd:/system.slice/dhcpcd.service
0::/system.slice/dhcpcd.service

...whereas for a process associated with a user's session, the cgroup will be something more like /user.slice/user-1000.slice/session-337.scope (here, session 337, belonging to the user with UID 1000).
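When the expected service name comes from a variable, interpolating it into a grep regex means escaping any dots in it. A minimal sketch of an alternative compares each cgroup path as a fixed string against exactly "/<name>.service", in the spirit of the exact-match suggestion in the comments below. The function name running_as_service and its optional file argument are hypothetical; the file argument exists only so the logic can be checked against a saved copy of output like the sample above.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if some cgroup path ends in exactly "/<unit>.service".
# Uses a glob with the unit name quoted, so dots or "@" need no escaping.
# FILE defaults to /proc/self/cgroup; passing one is just for testing.
running_as_service() {
    local unit=$1 file=${2:-/proc/self/cgroup}
    local line path
    while IFS= read -r line; do
        path=${line#*:}   # drop the hierarchy ID ("12:")
        path=${path#*:}   # drop the controller list ("pids:" or "")
        [[ $path == */"$unit".service ]] && return 0
    done < "$file"
    return 1
}

# Usage (the service name "myservice" is hypothetical):
#   running_as_service myservice || { echo "Use systemctl start myservice" >&2; exit 1; }
```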


A Fancier Implementation

If one wants to detect the specific service being run as, this too can be extracted from /proc/self/cgroup. Consider, for example:

cgroup_full=$(awk -F: '$1 == 0 { print $3 }' /proc/self/cgroup)
cgroup_short=${cgroup_full##*/}
case $cgroup_full in
  /system.slice/*.service) echo "Run from system service ${cgroup_short%.*}";;
  /user.slice/*.service)   echo "Run from user service ${cgroup_short%.*}";;
  *.service)               echo "Service ${cgroup_short%.*} type unknown";;
  *)                       echo "Not run from a systemd service; in $cgroup_full";;
esac
Charles Duffy
  • I've seen systems where interactive shell sessions over ssh have a few lines ending in /system.slice/ssh.service in /proc/self/cgroup. – Stéphane Chazelas Dec 07 '20 at 11:48
    True, that'll happen if PAM isn't set up correctly on a systemd host. ("Correctly" in this context meaning "invoking the modules systemd provides and requires for correct operation"). I'd call such a misconfiguration always a bug on the part of the distributor or other responsible party; it'll break other systemd features as well. – Charles Duffy Dec 07 '20 at 12:44
  • While that may be true (one can also argue that starting a systemd user session for each ssh host cmd is overkill), that shows that your method is not foolproof. Consider the case where the command is run from at or cron for instance (whose pam profile includes common-session-noninteractive (which non-interactive ssh sessions should probably do as well, though I don't think it's currently possible) which doesn't include pam_systemd). – Stéphane Chazelas Dec 07 '20 at 15:21
  • @StéphaneChazelas, in such a case, I'd expect someone to want to check whether $cgroup_short is an exact match for a precise service, rather than whether it just matches *.service. Amended to suggest this explicitly. – Charles Duffy Dec 07 '20 at 18:45

Another obvious solution that comes to mind is to add something like

Environment=FROM_SYSTEMD=1

to the service file, and test on that envvar.
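A minimal sketch of the wrapper side of this approach, assuming the unit file carries Environment=FROM_SYSTEMD=1 as above. The function name check_from_systemd and the unit name vendorservice in the message are hypothetical. Anyone can set the variable by hand, so this guides users rather than enforcing anything.

```shell
#!/usr/bin/env bash
# Sketch: assumes the unit file sets "Environment=FROM_SYSTEMD=1".
# Prints a hint and fails if the variable is missing; it cannot stop
# someone who exports FROM_SYSTEMD=1 themselves.
check_from_systemd() {
    if [[ ${FROM_SYSTEMD:-} != 1 ]]; then
        echo "Please use 'systemctl start/stop vendorservice' instead of running $0 directly" >&2
        return 1
    fi
}

# Usage near the top of the wrapper:
#   check_from_systemd || exit 1
```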

ilkkachu
  • This is good, because it doesn't rely on a particular systemd version (for INVOCATION_ID), but my only concern with envvars is their ability to be faked. Regardless, setting a variable like this would trigger the early exit and give a chance to inform the caller to use systemctl. – Jeff Schaller Dec 05 '20 at 12:45
    @JeffSchaller, if you get an error from the wrapper script telling you to run systemctl, it's much easier to do that instead of going through the script to find out what envvar to set. On the other hand, if you have co-admins with root access who actively work against instructions... well, not much you can do then. – ilkkachu Dec 05 '20 at 13:36

I like Jeff Schaller's answer, which is probably The Right Thing™. Another approach could be to use two scripts. Move the actual wrapper from /path/to/wrapper to some other filename, and use that name in the systemd unit file. And then create another script, with the original name, that does nothing but display a helpful error message.

wyrm
    As an aside, either one of these approaches is likely to result in a few of your peers making their own private copies of the original wrapper. So stand by for that noise. – wyrm Dec 04 '20 at 22:38
  • Yes; the problem of starting the app outside of systemd is too broad to solve one way. Anyone that finds the moved script could execute it. By having one (at least, initially) script, pointed to by the systemd service, it minimizes the problem a bit. Thank you for providing an alternative! – Jeff Schaller Dec 04 '20 at 23:50

Another approach may be to query systemd explicitly, to obtain more tightly-coupled checks.

For instance, in analogous use cases I have been doing this:

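if [ "$(systemctl show -p ControlPID vendorservice)" != "ControlPID=$$" ]; then
    # Not the process systemd is currently running for this unit's start/stop
    echo "Don't call me directly; use 'systemctl start/stop vendorservice'" >&2
    exit 1
fi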

The snippet above works only for Type=forking services, because it relies on the particular states that kind of job goes through during its lifetime.

Specifically, it queries the ControlPID value set by systemd. For Type=forking jobs, this identifies the process spawned directly by systemd while systemd waits for the PIDFile to appear (or for GuessMainPID to find the main process), after which ControlPID is reset to 0. This behavior should also protect the check against PID wraparound.

There are many properties that can be retrieved through systemctl show, some of which vary with the specific Type= used. For service types other than Type=forking, and depending on the use case, it may be a matter of querying the most appropriate property (or properties) to perform an equivalent "tethering" check.
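A minimal sketch that generalizes the snippet above by taking the property name as a parameter. The function name pid_matches_unit_property is hypothetical. ControlPID for Type=forking comes from this answer; using MainPID for a Type=simple unit whose ExecStart runs the script without forking is an untested assumption to verify on your own systemd version.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if `systemctl show` reports this shell's PID ($$)
# as the value of PROPERTY for UNIT. ControlPID matches the Type=forking
# case above; MainPID for Type=simple is an unverified assumption.
pid_matches_unit_property() {
    local unit=$1 prop=${2:-ControlPID}
    [[ $(systemctl show -p "$prop" "$unit" 2>/dev/null) == "$prop=$$" ]]
}

# Usage (the unit name "vendorservice" is hypothetical):
#   pid_matches_unit_property vendorservice ControlPID ||
#       { echo "Use systemctl start/stop vendorservice" >&2; exit 1; }
```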

So this is not a generic solution, but it is possibly more robust and backward compatible[1], as well as future-proof[2].

Note also that you must already know the correct service name to query; for instantiated services in particular, you need the exact instance name (such as vendorservice@1).


[1] The ControlPID property and the systemctl show command with its -p option have existed since the first release of systemd.

[2] systemctl show output is part of systemd's interface stability promise.

LL3