
In Linux procfs, the second field of /proc/<pid>/stat is the name of the process, in parentheses. As far as I can tell (by experimentation), this name is not escaped. For example, I was able to produce the following:

$ gcc test.c -o 'hello) (world'
...
$ cat /proc/9115/stat
9115 (hello) (world) S 8282 9115 ...

(Similarly, gcc test.c -o 'name) S 42 23' lets a process, accidentally or deliberately, create fake fields that will probably mislead naive parsers.)
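Here is a minimal Python sketch of the problem. The stat line below is a shortened, made-up sample, not real output. Splitting on whitespace, as a naive parser would, picks up the forged values from the process name instead of the real ones.

```python
# A naive parser that splits /proc/<pid>/stat on whitespace.
# Made-up, shortened sample line: the process name (comm) is "name) S 42 23"
# and the real ppid is 8282.
line = "9115 (name) S 42 23) S 8282 9115 8282 0 -1"

fields = line.split()
naive_state = fields[2]       # meant to be field 3 (state) in proc(5) numbering
naive_ppid = int(fields[3])   # meant to be field 4 (ppid) in proc(5) numbering

print(naive_state, naive_ppid)  # prints the forged values, not the real ppid
```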

I need to get at one of the later fields, so I need a correct way of skipping this one. I've searched for quite a while for a reliable way of parsing this line, but haven't found a canonical question or example.

However, from what I can tell, ) is not valid in any field to the right of this one, so scanning from right to left for the rightmost ) should correctly delimit the second field. Is this correct? It seems a little flaky to me (what if some new field allows ) at a later date?). Is there a better way to parse this file that I've overlooked?
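The following is a minimal Python sketch of that right-to-left approach. It assumes the file is read whole and in binary mode, because the name need not be valid UTF-8. parse_stat and read_stat are hypothetical helper names and are not part of any library.

```python
def parse_stat(data: bytes):
    """Split a /proc/<pid>/stat line into (pid, comm, rest).

    Everything between the first '(' and the *last* ')' is taken as the
    command name, whatever it contains.
    """
    start = data.index(b"(")        # comm starts after the first '('
    end = data.rindex(b")")         # comm ends at the last ')'
    pid = int(data[:start].strip())
    comm = data[start + 1:end]
    rest = data[end + 1:].split()   # field 3 onwards, whitespace-separated
    return pid, comm, rest


def read_stat(pid):
    # Binary mode: the command name is arbitrary bytes.
    with open(f"/proc/{pid}/stat", "rb") as f:
        return parse_stat(f.read())
```

With this layout, field n in proc(5) numbering (for n >= 3) is rest[n - 3]. For example, rest[0] is the state and rest[1] is the ppid.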

Dannie
  • I think you have already figured out the best solution. – larsks Dec 20 '19 at 12:41
  • @mosvy I think I'm OK to code it once I'm sure of the right approach. I haven't been able to track down the procfs source that formats this line (which would help me confirm my experiments), because I keep getting lost in the code for the unrelated /proc/stat. – Dannie Dec 20 '19 at 12:57

3 Answers


The format of /proc/<pid>/stat is documented in the proc(5) manpage.

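There cannot be another (...) field, nor could one be added in the future, because that would make the format ambiguous. If two fields could both contain arbitrary text, including ) and spaces, there would be no way to tell where one ends and the next begins.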

The kernel code which formats the /proc/<pid>/stat file is in fs/proc/array.c.

The OP won't tell which language they're using. In perl, something like this could be used:

sub readfile { local (@ARGV, $/) = $_[0]; <> }
my @s = readfile("/proc/$pid/stat") =~ /(?<=\().*(?=\))|[^\s()]+/gs;

Notice the /s modifier. The "command" field can also contain newlines, and /s lets . match them.
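Since the OP uses Python, here is a rough translation of the Perl version. It is a sketch: stat_fields and read_stat_fields are hypothetical names, and reading the file in binary mode stands in for readfile.

```python
import re

# Rough Python equivalent of the Perl regex. Each match is either the text
# between the first "(" and the last ")" (the command name), or a run of
# characters that are neither whitespace nor parentheses (any other field).
# re.DOTALL does the job of Perl's /s, letting "." match newlines inside
# the command name.
STAT_FIELD = re.compile(rb"(?<=\().*(?=\))|[^\s()]+", re.DOTALL)


def stat_fields(data: bytes) -> list:
    return STAT_FIELD.findall(data)


def read_stat_fields(pid) -> list:
    # Slurp the whole file, like readfile in the Perl version.
    with open(f"/proc/{pid}/stat", "rb") as f:
        return stat_fields(f.read())
```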

  • Per your deleted comments, I'm happy to tell you which language I'm using: it's Python. I could code this in Python, or in Perl, C, Rust or bash; honestly, I can. What I want to know is whether it's the right approach. Searching for the last bracket in a string is easy, but is it the right thing to do? Your additional comments have confirmed that it probably is. Personally, I'd prefer to avoid over-golfed Perl, but I suppose that depends on the context. – Dannie Dec 20 '19 at 13:59
  • (+1 though, because it's the right answer) – Dannie Dec 20 '19 at 14:01
  • In this example, it's not perl that's "golfing", it's regex. It's very common for people not familiar with regex to blame perl for it, but regex is supported by other languages too. The question is do you want to use regex or parse it more laboriously, not necessarily which actual language you use. –  Dec 26 '19 at 01:45
  • @mosvy: what is this readfile() function? Is it a perl6 thing because I don't see it in perl 5. –  Dec 26 '19 at 01:46
  • readfile is sub readfile { local $/; my $h; open $h, '<', $_[0] and <$h> } or sub readfile { local (@ARGV, $/) = $_[0]; <> }. Basically anything that slurps the whole file and returns it as a string will do -- you can put the whole thing in a single do { .. } block and dispense with that sub, but that will make it look even more "golfed" ;-) –  Dec 26 '19 at 02:46

Since all the fields after the name are plain numbers (apart from the one-letter state), why not work backwards?

For example:

$ cat /proc/2086/stat
2086 (hello) (world) S 1893 2086 1893 34816 2175 1077952512 119 0 0 0 0 0 0 0 20 0 1 0 5098 7458816 179 18446744073709551615 94130946203648 94130946231776 140722152072096 0 0 0 0 0 0 1 0 0 17 0 0 0 0 0 0 94130948332368 94130948333696 94130971459584 140722152080859 140722152080880 140722152080880 140722152083432 0
$ awk '{ print $(NF-48) } ' /proc/2086/stat
1893
$
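Here is the same count-from-the-right idea as a minimal Python sketch. field_from_right is a hypothetical helper. It assumes the 52-field layout that the awk command relies on, so it shares that command's sensitivity to the kernel version.

```python
def field_from_right(line: str, n: int, total: int = 52) -> str:
    """Return stat field n (1-based, as numbered in proc(5)) by counting
    from the end of the line instead of the start.

    Assumes the kernel prints exactly `total` fields, as the awk command
    does when it uses $(NF-48) for field 4 (the ppid). Only fields 3 and up
    can be found this way: fields 1 and 2 sit left of the command name,
    which may have been split into several words.
    """
    if not 3 <= n <= total:
        raise ValueError("only fields 3..total can be found from the right")
    words = line.split()
    # Field n is (total - n) words before the last word.
    return words[len(words) - 1 - (total - n)]
```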

steve
  • Some older versions of Linux have fewer than 52 fields, but you can easily skip over the (...) with awk -v RS= '{sub(/.*\)/,"");print $1,$2}' /proc/3419/stat –  Dec 26 '19 at 05:57
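In Python, the same skip-to-the-last-parenthesis trick can be sketched with a greedy regex. fields_after_comm is a hypothetical name, and the input is assumed to be the whole file as one string.

```python
import re


def fields_after_comm(data: str) -> list:
    """Drop everything up to and including the last ')' and split the rest.

    Python counterpart of the awk sub() trick: the greedy '.*' runs to the
    last ')', and re.DOTALL lets it cross any newlines inside the command
    name (the awk version relies on RS= to read the file as one record).
    """
    rest = re.sub(r".*\)", "", data, count=1, flags=re.DOTALL)
    # rest[0] is the state (field 3), rest[1] the ppid (field 4), and so on.
    return rest.split()
```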

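This is how I parse the stat file (in C):

            #include <stdio.h>

            #define RIGHTBRACKET ')'

            /* Read the state and ppid fields from a /proc/<pid>/stat file.
               Returns 0 on success, -1 on failure. */
            int parse_stat(const char *proc_stat_path, char *state, int *ppid)
            {
                    int c;          /* int, not char, so that EOF can be detected */
                    long pos = 0;
                    FILE *fh = fopen(proc_stat_path, "r");
                    if (fh == NULL)
                            return -1;

                    // Find the last ")" char in stat file and parse fields thereafter.
                    while ((c = fgetc(fh)) != EOF)
                    {
                            if (c == RIGHTBRACKET)
                                    pos = ftell(fh);
                    }
                    if (pos == 0 || fseek(fh, pos, SEEK_SET) != 0)
                    {
                            fclose(fh);
                            return -1;
                    }

                    /* Extend the format string for the later fields as needed. */
                    int n = fscanf(fh, " %c %d", state, ppid);
                    fclose(fh);
                    return n == 2 ? 0 : -1;
            }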
bandie