Summary

Take mirage as an example, a Python program that begins with a shebang:

#!/usr/bin/python
...

Looking at /proc/<pid>/comm or using pgrep, it appears that ...

... the process name is "mirage" when I call it via the shebang:

/usr/bin/mirage &

... but is "python" when I call python explicitly:

/usr/bin/python /usr/bin/mirage &
  • I understand that a process can change its own name, but why isn't the name the same in both cases?
  • Is there a generic way to know how a process was originally launched (using only /proc or pgrep information)?

Details - shebang

/home/martin> /usr/bin/mirage &
[1] 22638

/proc/<pid>/comm says it is "mirage"

/home/martin> cat /proc/22638/comm 
mirage

pgrep finds it as "mirage" but not as "python"

/home/martin> pgrep -al mirage && echo "found" || echo "not found"
22638 /usr/bin/python /usr/bin/mirage
found

/home/martin> pgrep -al python && echo "found" || echo "not found"
not found

Details - python

/home/martin> /usr/bin/python /usr/bin/mirage &
[1] 21348

/proc/<pid>/comm says it is "python"

/home/martin> cat /proc/21348/comm 
python

pgrep finds it as "python" but not as "mirage"

/home/martin> pgrep -al mirage && echo "found" || echo "not found"
not found

/home/martin> pgrep -al python && echo "found" || echo "not found"
21348 /usr/bin/python /usr/bin/mirage
found
  • You can probably use pgrep -f '^(/usr/bin/python )?/usr/bin/mirage' to have the same result for both cases (now there must be some clever regex to allow with or without path etc.) – A.B Aug 25 '18 at 18:43

2 Answers

The process name is derived from argv[0] in the execv() call.

This explains the behavior that you observe.

The fact that in the shebang case the kernel rearranges how things are called does not affect the argv vector.

schily

I understand that a process can change its own name, but why isn't the name the same in both cases?

Because the programs involved are not changing their own names. They are given the names that the default behaviour of Linux assigns. (Other operating systems behave differently, but this is a Linux question because it talks about /proc/*/comm files.)

On Linux, the program name in comm is taken from the program image file name that was passed to execve(). Note that this is neither the argument vector nor the environment vector, and the statement that this name is taken from the argument vector on Linux is erroneous.

The program image file names that the shell is actually passing are /usr/bin/python and /usr/bin/mirage, so python and mirage are what /proc/self/comm is initialized to. (comm is the basename of the program image filename, truncated to at most 15 bytes.)
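As a rough illustration of that rule, here is a minimal Python sketch that predicts the initial comm value from a program image path. The function name expected_comm is made up for this example. The 15-byte limit comes from the kernel's 16-byte comm buffer, which reserves one byte for the terminating NUL. This only models the initial value; a program can still rename itself later.

```python
import posixpath

# The kernel's comm buffer is 16 bytes (TASK_COMM_LEN) including the
# terminating NUL, so at most 15 bytes of the name are kept.
COMM_MAX_BYTES = 15


def expected_comm(image_path):
    """Predict the initial comm for a program image path given to execve().

    Hypothetical helper for illustration: takes the basename of the path
    and truncates it to 15 bytes.
    """
    name = posixpath.basename(image_path)
    truncated = name.encode("utf-8")[:COMM_MAX_BYTES]
    # Truncation could in principle cut a multi-byte character in half.
    return truncated.decode("utf-8", errors="replace")
```

With the paths from the question, expected_comm("/usr/bin/python") gives "python" and expected_comm("/usr/bin/mirage") gives "mirage", which matches what the two comm files showed.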

The argument vector initializes what a process has in its /proc/self/cmdline and the environment vector initializes what it has in /proc/self/environ. So with these and /proc/self/comm, and presuming that a program does not alter them when it runs, you know all three of the pieces of information that were passed to execve() and thus exactly "how a process was originally launched" (although, strictly speaking, this is how the program was launched, as the process was launched with fork()).

Here is an example of those three pieces in action, using clearenv, setenv, and exec from the nosh toolset to set up a small environment and force argv[0] for the purposes of exposition:

% clearenv setenv 1THIS 'is the environment string' setenv 2COMPRISING 'all of the environment.' \
> exec -a 'This is argv[0].' /bin/sleep 100 &
[1] 15564
% for i in /proc/$\!/{comm,cmdline,environ} ; do printf "%s:" $i ; cat -v $i ; printf "\n" ; done
/proc/15564/comm:3

/proc/15564/cmdline:This is argv[0].^@100^@
/proc/15564/environ:1THIS=is the environment string^@2COMPRISING=all of the environment.^@
%

And indeed those are exactly what the nosh toolset's exec command passed to execve() in order to run /bin/sleep. (The 3 is because exec uses the C library's fexecve() function, which internally uses program image filenames of the form /proc/self/fd/3.)
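The same three files can also be read from a program rather than with cat. The following is a minimal Python sketch, not a definitive tool: the names split_nul and launch_info are invented for this example, and the proc_root parameter is only there so that the function can be pointed at a copy of the files. It assumes that cmdline and environ are NUL-terminated lists, as the cat -v output above shows, and that the process has not overwritten them since it started.

```python
from pathlib import Path


def split_nul(data):
    """Split the NUL-terminated list format used by cmdline and environ."""
    if data.endswith(b"\0"):
        data = data[:-1]  # drop the terminator of the last entry
    if not data:
        return []
    return [part.decode("utf-8", errors="replace") for part in data.split(b"\0")]


def launch_info(pid="self", proc_root="/proc"):
    """Collect comm, the argument vector, and the environment of a process.

    Hypothetical helper: reading environ of another user's process
    normally fails with a permission error.
    """
    base = Path(proc_root) / str(pid)
    comm = (base / "comm").read_bytes().decode("utf-8", errors="replace")
    return {
        "comm": comm.rstrip("\n"),
        "argv": split_nul((base / "cmdline").read_bytes()),
        "environ": split_nul((base / "environ").read_bytes()),
    }
```

Calling launch_info(15564) for the sleep process above would be expected to return "3" as comm, "This is argv[0]." and "100" as argv, and the two environment strings.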

When the program image file happens to be a script with the #! magic number and a named interpreter, the kernel substitutes that interpreter for the program image file and shifts the argument vector. This does not change what goes into comm, which remains whatever program image filename was originally given to execve(). Yes, Linux is inconsistent here, because in such cases it does change what goes into cmdline to be the effective final argument vector.
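To make the difference between the two cases concrete, here is a simplified Python model of what ends up in comm and cmdline. It is a sketch under stated assumptions, not the kernel's actual logic: model_exec is a made-up name, the script's first line is passed in as a string rather than read from disk, and everything after the interpreter path is treated as a single optional argument.

```python
import posixpath


def model_exec(path, argv, first_line=None):
    """Return (comm, cmdline) for execve(path, argv, ...) in a toy model.

    first_line is the first line of the file when it is a script, or None
    for a binary. Hypothetical helper for illustration only.
    """
    # comm always comes from the path originally given to execve().
    comm = posixpath.basename(path).encode("utf-8")[:15].decode("utf-8", errors="replace")

    if first_line is None or not first_line.startswith("#!"):
        return comm, list(argv)

    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        raise ValueError("shebang line names no interpreter")
    interpreter = parts[0]
    optional_arg = parts[1:]  # zero or one element

    # The interpreter takes over; the script path replaces the old argv[0].
    cmdline = [interpreter] + optional_arg + [path] + list(argv[1:])
    return comm, cmdline
```

In this model, model_exec("/usr/bin/mirage", ["/usr/bin/mirage"], "#!/usr/bin/python") yields comm "mirage", while model_exec("/usr/bin/python", ["/usr/bin/python", "/usr/bin/mirage"]) yields comm "python". Both produce the same cmdline, which is why pgrep -al prints identical command lines for the two processes in the question.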

JdeBP