3

When I look at journalctl output, it shows me the PID and the program name (or service name?) for each log entry.

Then I wondered: logs are created by other processes, so how does systemd-journald know the PID of those processes when they may only write raw strings to the Unix domain socket that systemd-journald is listening on? Also, does systemd-journald always use the same technique to detect the PID of a piece of log data, even when processes produce logs using functions like sd_journal_sendv()?

Is there any documentation I should read about this?

I read JdeBP's answer and know that systemd-journald listens on a Unix domain socket, but even if it can know the peer socket address that sent the log message, how does it know the PID? What if that sending socket is opened by many processes that are not in a parent-child relationship?

2 Answers

5

It receives the pid via the SCM_CREDENTIALS ancillary data on the unix socket with recvmsg(), see unix(7). The credentials don't have to be sent explicitly.

Example:

$ cc -Wall scm_cred.c -o scm_cred
$ ./scm_cred
scm_cred: received from 10114: pid=10114 uid=2000 gid=2000

Processes with the CAP_SYS_ADMIN capability can send whatever pid they want via SCM_CREDENTIALS; in the case of systemd-journald, this means they can fake entries as if logged by another process:

# cc -Wall fake.c -o fake
# setcap CAP_SYS_ADMIN+ep fake

$ ./fake `pgrep -f /usr/sbin/sshd`

# journalctl --no-pager -n 1
...
Dec 29 11:04:57 debin sshd[419]: fake log message from 14202
# rm fake
# lsb_release -d
Description:    Debian GNU/Linux 9.6 (stretch)
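
The kernel-side check can also be observed without touching the journal. The following is a minimal sketch, assuming Linux and an ordinary socketpair() rather than /dev/log; send_claiming_pid() is a hypothetical helper name, not part of any library. It attaches an explicit SCM_CREDENTIALS message claiming a given pid and returns the errno from sendmsg(). Per unix(7), a sender without CAP_SYS_ADMIN should get EPERM when the claimed pid is not its own.

```c
#define _GNU_SOURCE 1
#include <sys/socket.h>
#include <sys/types.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: send one datagram with an explicit SCM_CREDENTIALS
 * message claiming `claimed_pid` (uid/gid are the caller's effective ids).
 * Returns 0 on success, or the errno reported by the failing call. */
int send_claiming_pid(pid_t claimed_pid)
{
        union {
                struct cmsghdr h;
                char data[CMSG_SPACE(sizeof(struct ucred))];
        } cm;
        struct ucred uc;
        struct msghdr m = {0};
        char byte = 'x';
        struct iovec iov = { &byte, 1 };
        int fd[2], rc = 0;

        if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fd) == -1)
                return errno;

        memset(&cm, 0, sizeof cm);
        uc.pid = claimed_pid;
        uc.uid = geteuid();
        uc.gid = getegid();

        m.msg_iov = &iov;
        m.msg_iovlen = 1;
        m.msg_control = &cm;
        m.msg_controllen = sizeof cm;
        cm.h.cmsg_len = CMSG_LEN(sizeof uc);
        cm.h.cmsg_level = SOL_SOCKET;
        cm.h.cmsg_type = SCM_CREDENTIALS;
        memcpy(CMSG_DATA(&cm.h), &uc, sizeof uc);

        /* the kernel validates the claimed credentials here */
        if (sendmsg(fd[1], &m, 0) == -1)
                rc = errno;
        close(fd[0]);
        close(fd[1]);
        return rc;
}
```

Calling send_claiming_pid(getpid()) should return 0. Calling it with some other pid from an unprivileged process is where EPERM would be expected, while a process holding CAP_SYS_ADMIN would be let through if that pid exists.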

The way systemd-journald handles datagrams and the credentials sent via ancillary data can be seen in the server_process_datagram() function from journald-server.c. Both the syslog(3) standard function from libc and sd_journal_sendv() from libsystemd will send their data via a SOCK_DGRAM socket by default, and getsockopt(SO_PEERCRED) does not work on datagram (connectionless) sockets. Neither systemd-journald nor rsyslogd accept SOCK_STREAM connections on /dev/log.
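
The SO_PEERCRED limitation can be checked with a short sketch, assuming Linux; the helper names below are hypothetical. It queries SO_PEERCRED on one end of a stream socketpair() and on a plain datagram socket of the kind a log server would recvmsg() on. The stream end should report the calling process, while the datagram socket has no peer and should not yield a usable pid (as far as I know it reports 0, but treat the exact value as an assumption).

```c
#define _GNU_SOURCE 1
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Return the pid that SO_PEERCRED reports for `fd`, or -1 if getsockopt() fails. */
pid_t peercred_pid(int fd)
{
        struct ucred uc;
        socklen_t len = sizeof uc;

        if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &uc, &len) == -1)
                return -1;
        return uc.pid;
}

/* SO_PEERCRED on one end of a connected stream pair. */
pid_t peercred_pid_stream_pair(void)
{
        int fd[2];
        pid_t pid;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd) == -1)
                return -1;
        pid = peercred_pid(fd[0]);
        close(fd[0]);
        close(fd[1]);
        return pid;
}

/* SO_PEERCRED on an unconnected datagram socket: there is no peer to describe. */
pid_t peercred_pid_unconnected_dgram(void)
{
        int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
        pid_t pid;

        if (fd == -1)
                return -1;
        pid = peercred_pid(fd);
        close(fd);
        return pid;
}
```

Comparing both results against getpid() shows why a datagram receiver has to rely on per-message ancillary data instead.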

scm_cred.c

#define _GNU_SOURCE     1
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <err.h>

int main(void){
        int fd[2], on = 1; pid_t pid;
        if(socketpair(AF_LOCAL, SOCK_DGRAM, 0, fd)) err(1, "socketpair");
        /* enable SO_PASSCRED on the receiving end before forking, so the
           child cannot send its datagram before the option is set */
        if(setsockopt(fd[0], SOL_SOCKET, SO_PASSCRED, &on, sizeof on))
                err(1, "setsockopt");
        if((pid = fork()) == -1) err(1, "fork");
        if(pid){ /* parent: receive the datagram and its credentials */
                union {
                        struct cmsghdr h;
                        char data[CMSG_SPACE(sizeof(struct ucred))];
                } buf;
                struct msghdr m = {0};
                struct ucred *uc = (struct ucred*)CMSG_DATA(&buf.h);
                m.msg_control = &buf;
                m.msg_controllen = sizeof buf;
                if(recvmsg(fd[0], &m, 0) == -1) err(1, "recvmsg");
                warnx("received from %d: pid=%d uid=%d gid=%d", pid,
                        uc->pid, uc->uid, uc->gid);
        }else   /* child: send an empty datagram, no explicit credentials */
                write(fd[1], 0, 0);
        return 0;
}
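
scm_cred.c reads the credentials straight out of the control buffer, which is fine for a demo where nothing else can arrive. A slightly more defensive receive routine might look like the sketch below; recv_with_creds() is a hypothetical name, and the 256-byte payload buffer is an arbitrary assumption. It checks for truncated control data and walks the control messages with CMSG_FIRSTHDR()/CMSG_NXTHDR() instead of assuming the first one is SCM_CREDENTIALS.

```c
#define _GNU_SOURCE 1
#include <sys/socket.h>
#include <sys/types.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: receive one message on `fd` (SO_PASSCRED must already
 * be set on it) and copy the sender's credentials into *out.
 * Returns 1 if credentials were found, 0 if not, -1 on error or truncation. */
int recv_with_creds(int fd, struct ucred *out)
{
        union {
                struct cmsghdr h;
                char data[CMSG_SPACE(sizeof(struct ucred))];
        } ctl;
        char payload[256];
        struct iovec iov = { payload, sizeof payload };
        struct msghdr m = {0};
        struct cmsghdr *c;

        m.msg_iov = &iov;
        m.msg_iovlen = 1;
        m.msg_control = &ctl;
        m.msg_controllen = sizeof ctl;

        if (recvmsg(fd, &m, 0) == -1)
                return -1;
        if (m.msg_flags & MSG_CTRUNC)   /* control data did not fit */
                return -1;

        /* look for a well-formed SCM_CREDENTIALS control message */
        for (c = CMSG_FIRSTHDR(&m); c != NULL; c = CMSG_NXTHDR(&m, c)) {
                if (c->cmsg_level == SOL_SOCKET &&
                    c->cmsg_type == SCM_CREDENTIALS &&
                    c->cmsg_len == CMSG_LEN(sizeof *out)) {
                        memcpy(out, CMSG_DATA(c), sizeof *out);
                        return 1;
                }
        }
        return 0;
}
```

The caller is expected to have enabled SO_PASSCRED on fd before the sender writes, for example right after creating the socket.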

fake.c

#define _GNU_SOURCE     1
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <err.h>

int main(int ac, char **av){
        union {
                struct cmsghdr h;
                char data[CMSG_SPACE(sizeof(struct ucred))];
        } cm;
        int fd; char buf[256];
        struct ucred *uc = (struct ucred*)CMSG_DATA(&cm.h);
        struct msghdr m = {0};
        struct sockaddr_un ua = {AF_UNIX, "/dev/log"};
        struct iovec iov = {buf};
        if((fd = socket(AF_LOCAL, SOCK_DGRAM, 0)) == -1) err(1, "socket");
        if(connect(fd, (struct sockaddr*)&ua, SUN_LEN(&ua))) err(1, "connect");
        m.msg_control = &cm;
        m.msg_controllen = cm.h.cmsg_len = CMSG_LEN(sizeof(struct ucred));
        cm.h.cmsg_level = SOL_SOCKET;
        cm.h.cmsg_type = SCM_CREDENTIALS;
        uc->pid = ac > 1 ? atoi(av[1]) : getpid();
        uc->uid = ac > 2 ? atoi(av[2]) : geteuid();
        uc->gid = ac > 3 ? atoi(av[3]) : getegid();
        iov.iov_len = snprintf(buf, sizeof buf, "<13>%s from %d",
                ac > 4 ? av[4] : "fake log message", getpid());
        if(iov.iov_len >= sizeof buf) errx(1, "message too long");
        m.msg_iov = &iov;
        m.msg_iovlen = 1;
        if(sendmsg(fd, &m, 0) == -1) err(1, "sendmsg");
        return 0;
}
  • I see. So getting the sender's info doesn't require the sender process to send it explicitly. But what if the sender process has CAP_SYS_ADMIN and passes sendmsg() a PID different from its own? Will systemd-journald get tricked by this behaviour? – 炸鱼薯条德里克 Dec 29 '18 at 07:36
  • no, the kernel checks the credentials. that's mentioned in the unix(7) manpage under SCM_CREDENTIALS. –  Dec 29 '18 at 07:38
  • Yeah, but it mentioned The sender must specify its own process ID (unless it has the capability CAP_SYS_ADMIN). That's why I mention the CAP_SYS_ADMIN, am I misunderstanding anything? – 炸鱼薯条德里克 Dec 29 '18 at 07:40
  • Yes, a process with CAP_SYS_ADMIN can send a pid different from its own. (Haven't tested it, though) –  Dec 29 '18 at 07:43
  • This has apparently changed over recent years. If a process forks children and shares stdout/stderr with them, on systemd 219, the _PID is always the parent pid on the journal regardless of which process wrote to stdout. On the other hand, as of systemd 247 the _PID in the journal correctly matches the pid of the originating child. – istepaniuk Jan 15 '21 at 19:59
  • @istepaniuk No, this answer predates that change, and is about a different thing. This is about old-style daemons which are using syslog(3) (or systemd's sd_journal_sendv()) to log messages in a "stateless" manner (by just sending them to a datagram unix-domain socket), not about processes managed (in a "stateful" manner) by systemd, which are "logging" by just writing to their stdout and stderr (redirected to a stream socket by systemd). It's great that they finally fixed that bug, nonetheless ;-) –  Jan 22 '21 at 07:46
  • 1
    @istepaniuk also read my comments to JdeBP's answer, where I tried (in vain!) to explain the difference between the SO_PEERCRED and SO_PASSCRED mechanisms. There is a lot of confusion around them, apparently shared by the systemd people, too. SO_PEERCRED is especially broken, but neither of them can be reliably used to determine that you're getting the data from the right user or process. –  Jan 22 '21 at 07:58
  • @mosvy Thanks for clarifying. I was puzzled about this so I created this other question: https://unix.stackexchange.com/questions/630145/how-can-i-have-the-pids-in-the-systemd-journal-for-proecesses-that-share-the-sta/630154#630154, specifically about what changed between systemd versions in this other aspect (identifying the PID of the stdout stream) – istepaniuk Jan 22 '21 at 15:53
2

The kernel tells it.

The EUID, EGID, and PID of the original client process that connected the AF_LOCAL stream socket at /run/systemd/journal/stdout are available from the kernel via the SO_PEERCRED socket option, which systemd-journald uses. UCSPI-UNIX tools obtain this same information via the same system call.

Child service processes of course inherit their standard I/O file descriptors already opened (unless the parent service process changes this, of course), and so to systemd-journald all log output has the credentials of the original parent process.
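
A rough way to see this effect of inherited descriptors is the sketch below. It assumes Linux and uses a stream socketpair() in place of the real /run/systemd/journal/stdout connection; peer_pid_after_child_write() is a hypothetical helper name. A forked child writes through the inherited descriptor, and the reading end then asks SO_PEERCRED who its peer is. The answer stays fixed at the process that set up the socket, whichever process actually writes.

```c
#define _GNU_SOURCE 1
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a stream pair, let a forked child write to the
 * inherited end, and return the pid that SO_PEERCRED reports on the reading
 * end (-1 on error). */
pid_t peer_pid_after_child_write(void)
{
        int fd[2];
        char c;
        struct ucred uc;
        socklen_t len = sizeof uc;
        pid_t child, result = -1;

        if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd) == -1)
                return -1;
        if ((child = fork()) == -1) {
                close(fd[0]);
                close(fd[1]);
                return -1;
        }
        if (child == 0) {               /* child: write via the inherited fd */
                _exit(write(fd[1], "x", 1) == 1 ? 0 : 1);
        }
        close(fd[1]);                   /* parent keeps only the reading end */

        /* read the child's byte, then ask the kernel who the peer is */
        if (read(fd[0], &c, 1) == 1 &&
            getsockopt(fd[0], SOL_SOCKET, SO_PEERCRED, &uc, &len) == 0)
                result = uc.pid;
        waitpid(child, NULL, 0);
        close(fd[0]);
        return result;
}
```

Here the returned pid should equal the parent's getpid(), even though the byte was written by the child.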

Log output generated via the AF_LOCAL socket at /run/systemd/journal/socket that speaks the idiosyncratic systemd-journald protocol is coming over a datagram socket, rather than a stream one. This socket is flagged using the SO_PASSCRED socket option so that the kernel records the same information in each datagram sent, which is pulled out of each datagram by systemd-journald.
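
Which end of a datagram socket needs SO_PASSCRED can be tried out with the sketch below, assuming Linux and a socketpair(); creds_delivered() and the passcred_side enum are hypothetical names introduced for illustration. It sends one byte and reports whether the receiver saw an SCM_CREDENTIALS control message, depending on which end had the option enabled.

```c
#define _GNU_SOURCE 1
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

enum passcred_side { PASSCRED_NONE, PASSCRED_RECEIVER, PASSCRED_SENDER };

/* Hypothetical helper: returns 1 if the receiver got an SCM_CREDENTIALS
 * control message, 0 if it did not, -1 on error. */
int creds_delivered(enum passcred_side side)
{
        union {
                struct cmsghdr h;
                char data[CMSG_SPACE(sizeof(struct ucred))];
        } ctl;
        char byte;
        struct iovec iov = { &byte, 1 };
        struct msghdr m = {0};
        struct cmsghdr *c;
        int fd[2], on = 1, found = -1;

        if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fd) == -1)
                return -1;
        /* fd[0] is the receiving end, fd[1] the sending end */
        if (side == PASSCRED_RECEIVER &&
            setsockopt(fd[0], SOL_SOCKET, SO_PASSCRED, &on, sizeof on) == -1)
                goto out;
        if (side == PASSCRED_SENDER &&
            setsockopt(fd[1], SOL_SOCKET, SO_PASSCRED, &on, sizeof on) == -1)
                goto out;
        if (write(fd[1], "x", 1) != 1)
                goto out;

        m.msg_iov = &iov;
        m.msg_iovlen = 1;
        m.msg_control = &ctl;
        m.msg_controllen = sizeof ctl;
        if (recvmsg(fd[0], &m, 0) == -1)
                goto out;

        /* scan whatever control messages the kernel delivered */
        found = 0;
        for (c = CMSG_FIRSTHDR(&m); c != NULL; c = CMSG_NXTHDR(&m, c))
                if (c->cmsg_level == SOL_SOCKET &&
                    c->cmsg_type == SCM_CREDENTIALS)
                        found = 1;
out:
        close(fd[0]);
        close(fd[1]);
        return found;
}
```

On the kernels I am aware of, only the PASSCRED_RECEIVER case delivers credentials, which matches the option being set on the server's own socket rather than on the clients' sockets.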

JdeBP
  • no, it doesn't get it via SO_PEERCRED, but via ancillary data with recvmsg. I've strace'd systemd-journald. –  Dec 29 '18 at 08:10
  • … and you haven't read what you are commenting on, or my previous answer referred to in the question, or indeed all of what the question asks. – JdeBP Dec 29 '18 at 17:05
  • Because the rude dress-down may give the wrong impressions wrt the accuracy of this answer, I want to make it clear: this answer is wrong. I'll try to explain why. 1. Portable apps which are using syslog() do not connect to the stream socket from /run/systemd/journal/stdout; they simply send their data from an unconnected datagram socket to /dev/log. Since SO_PEERCRED is getting the creds of the process that connected to a socket, and does not work with connectionless sockets, it cannot be and is not used to get the pid of the process that called syslog(). –  Dec 30 '18 at 08:01
  • 2. Unless overridden by a privileged process, the creds that systemd gets via recvmsg as described in my answer will be those of the process that called send() by way of syslog(), not of the parent process that created the socket or called connect() on it. The second paragraph is particularly misleading, because even SO_PEERCRED on a connection-based socket will not return the creds of the process that created the socket file descriptor, but of the process that connect()ed it. –  Dec 30 '18 at 08:02
  • 3. The SO_PASSCRED option should be set on the socket on which the creds are to be received, not on the socket on which they're sent, and it does not cause the kernel to stick the same info in each datagram sent in the way it's described in the 3rd paragraph. –  Dec 30 '18 at 08:03
  • What you are actually making clear is that you don't read. You didn't read the question talking about logs going to journald from child processes, or my answer explaining how standard output and error go to journald through the very mechanism that you've erroneously claimed is not used at the client end. You didn't read this answer which clearly draws a distinction between that, where SO_PEERCRED most definitely is used despite your erroneous claims to the contrary, and others. You didn't even read where this answer showed you exactly where the systemd code is doing what I state. – JdeBP Jan 27 '19 at 11:54
  • I completely stand by the accuracy of the description from my answer and comments, that I've checked and re-checked. systemd-journald is only using SO_PEERCRED for a stream socket opened by sd_journal_stream_fd() (/run/systemd/journal/stdout) which is absolutely not used by the standard syslog(3) or the sd_journal_send* and sd_journal_print* functions, which are all using datagram sockets, on which SO_PEERCRED does not work. –  Jan 27 '19 at 12:28
  • I'm not aware of any program that's using that sd_journal_stream_fd stream log facility and I don't think that a syslog program that keeps states of clients is a good idea in the 1st place, but that's a completely different matter, not related to this question. As to 'not reading', that simply amounts to bullying; of course I've read everything, it's simply that I prefer to base my answers on facts, rather than do exegesis of other people's answers and second guess what has misled them into believing things that are not true. –  Jan 27 '19 at 12:42