what are the downsides of splitting /proc/pid/stat by whitespace?

Question

What are the downsides of splitting /proc/pid/stat on Linux by whitespace? For example, using bash, one can access the third field via

$ cat /proc/$$/stat
14198 (bash) S 14195 14198 14198 34816 ...
$ x=($(< /proc/$$/stat)); echo ${x[2]}
S
$

and all seems well?

score 5 · Answer 1 · answered Dec 07 '17 at 14:58

The chief problem is that the space character (0x20) is used as the delimiter between fields but may also appear within a field. Should a local user be able to set the process name

$ perl -e '$0="like this"; sleep 999' &
[1] 14343
$

then parsing by splitting on whitespace will pick out the wrong field

$ x=($(< /proc/14343/stat)); echo ${x[2]}
this)
$

as the command name contains a space.

$ cat /proc/14343/stat
14343 (like this) S 14198 14343 ...
$

How bad could this be? According to proc(5), the "controlling terminal of the process" field is an interesting one

          tty_nr %d   (7) The controlling terminal of the  process.   (The
                      minor  device number is contained in the combination
                      of bits 31 to 20 and 7 to 0; the major device number
                      is in bits 15 to 8.)

so if a program acts on controlling-terminal information that it parsed incorrectly from /proc/pid/stat because a hostile process name shifted the fields, well, you may get a security vulnerability.
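To make the stakes concrete, here is a minimal Python sketch that decodes tty_nr using the bit layout quoted from proc(5) above; the function name decode_tty_nr is hypothetical. The 34816 (field 7) in the bash example in the question decodes to major 136, minor 0, i.e. a pseudo-terminal. With the "(like this)" name, a naive split puts a different field at that index, and the decoded "terminal" is whatever that number happens to be.

```python
from __future__ import annotations


def decode_tty_nr(tty_nr: int) -> tuple[int, int]:
    """Hypothetical helper: split the tty_nr field into (major, minor).

    Layout per proc(5): major is in bits 15..8; minor is formed from
    bits 31..20 (high part) combined with bits 7..0 (low part).
    """
    major = (tty_nr >> 8) & 0xFF
    minor = (((tty_nr >> 20) & 0xFFF) << 8) | (tty_nr & 0xFF)
    return major, minor


print(decode_tty_nr(34816))  # (136, 0)
```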

Parsing is further complicated by the fact that a ) can appear in the process name, though the name is limited to 15 characters

$ perl -e '$0="lisp) a b c d e f g h i"; sleep 999' &
[4] 14493
$ cat /proc/14493/stat
14493 (lisp) a b c d e) S 14198 14493 14198 34816 ...
$
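A short Python illustration of why stopping at the first ) is not enough either, using the stat line above (truncated after tty_nr, as in the transcript). Since every field after the name is a state letter or a number, the last ) on the line is the one that closes the name.

```python
# The hostile line from the example above, cut off after tty_nr.
line = "14493 (lisp) a b c d e) S 14198 14493 14198 34816"
open_paren = line.index("(")

# Wrong: treat the FIRST ')' as the end of the process name.
first = line.index(")")
wrong_name = line[open_paren + 1:first]
wrong_state = line[first + 1:].split()[0]

# Better: the fields after the name never contain ')', so use the LAST one.
last = line.rindex(")")
name = line[open_paren + 1:last]
state = line[last + 1:].split()[0]

print(wrong_name, wrong_state)  # lisp a
print(name, state)              # lisp) a b c d e S
```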

Ideas to Parse this Wart of an Interface

Since the process name can be anything from the empty string to 15 bytes of almost arbitrary content

1234 () S ...
4321 (xxxxxxxxxxxxxxx) S ...

one idea would be to split on the first space to obtain the pid, then search backwards from the end of the string for the last ); everything between the opening ( and that last ) is the process name, and everything to the right of it is the regular fields. Unit tests for the code would be highly advisable...
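A minimal Python sketch of that idea, assuming the caller passes the entire contents of /proc/pid/stat; the name parse_stat is hypothetical. It splits off the pid at the first space, takes the last ) as the end of the name, and only whitespace-splits what comes after it.

```python
from __future__ import annotations


def parse_stat(text: str) -> tuple[int, str, list[str]]:
    """Hypothetical helper: split /proc/pid/stat into (pid, comm, rest).

    The pid ends at the first space. The process name may contain spaces,
    ')' or even a newline, but none of the later fields (a state letter
    and numbers) contain ')', so the LAST ')' closes the name.
    """
    pid_str, _, rest = text.partition(" ")
    if not rest.startswith("("):
        raise ValueError("expected '(' after the pid")
    close = rest.rfind(")")
    if close == -1:
        raise ValueError("no closing ')' for the process name")
    comm = rest[1:close]
    # Only the part after the name is safe to split on whitespace.
    fields = rest[close + 1:].split()
    return int(pid_str), comm, fields
```

On a live system one would pass it the whole file, e.g. `open(f"/proc/{os.getpid()}/stat").read()`, rather than a single `readline()`, because the name may contain a newline. Field n as numbered in proc(5) is then `fields[n - 3]`, so the state is `fields[0]` and tty_nr (field 7) is `fields[4]`.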

It’s often better to parse one of the other files in /proc/pid, if all the information needed is in a single file, or race conditions aren’t an issue. (So if anyone thought of reading /proc/pid/comm to help with parsing /proc/pid/stat, no, it isn’t a good idea.) — Stephen Kitt, Dec 07 '17 at 15:14

score 3 · Accepted Answer · answered Dec 07 '17 at 16:31


If you even need to think about it, why not just read /proc/$pid/status instead? It gives much of the same information on clearly labeled lines, and escapes newlines and backslashes that appear in the process name:

$ perl -e '$0="foo\nbar\n"; system "head -3 /proc/$$/status";'
Name:   foo\nbar\n
Umask:  0022
State:  S (sleeping)
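A minimal Python sketch of reading /proc/pid/status this way; the names parse_status and unescape_name are hypothetical. It assumes each line is "Label:" followed by a tab and the value, and, as stated above, that only newlines and backslashes in the name are escaped (as \n and \\). It splits on "\n" rather than using splitlines(), since the latter also breaks on characters such as form feed that could appear unescaped in a name.

```python
from __future__ import annotations


def parse_status(text: str) -> dict[str, str]:
    """Hypothetical helper: map each 'Label:' line to its raw value."""
    result: dict[str, str] = {}
    for line in text.split("\n"):
        key, sep, value = line.partition(":")
        if not sep:
            continue  # e.g. the empty string after the final newline
        # Drop only the single tab after the colon so that leading
        # spaces in a process name are preserved.
        if value.startswith("\t"):
            value = value[1:]
        result[key] = value
    return result


def unescape_name(value: str) -> str:
    """Undo the escaping of the Name: value: \\n -> newline, \\\\ -> backslash."""
    out = []
    i = 0
    while i < len(value):
        ch = value[i]
        if ch == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            out.append("\n" if nxt == "n" else nxt)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For the transcript above, `unescape_name(parse_status(text)["Name"])` would give back "foo\nbar\n" with real newlines.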


ilkkachu


That's probably easier for Perl and the like, but likely more difficult for C to deal with (https://github.com/Microsoft/ProcDump-for-Linux/issues/8) – thrig Dec 08 '17 at 22:14
@thrig, eh, that code there reads /proc/$pid/stat, not status. So no wonder it has trouble. Reading a single-datum-per-line file in C is just a loop over fgets() and strcmp (for the headers). Though I do think I'll go to sleep now instead of coding the un-escaping. – ilkkachu Dec 08 '17 at 22:34
