
I'm trying to pipe the output of one command to two different awk commands. Following this post, I am using tee and process substitution. However, I can't see the output of the substituted process.

nvidia-smi | tee >(awk '/ C / {print $6}') | awk '/ C / {print $3}' | xargs -r ps -o user

This is supposed to show the users and memory usage for all gpu processes. The memory usage and PID are extracted from nvidia-smi, respectively, by awk '/ C / {print $6}' and awk '/ C / {print $3}' with the latter then being piped to ps -o user. The output contains only the users though.

What I would like is

<memory-of-process1> <name-of-user-running-process1>
<memory-of-process2> <name-of-user-running-process2>
<memory-of-process3> <name-of-user-running-process3> 
etc

and what I am getting is

<name-of-user-running-process1>
<name-of-user-running-process2>
<name-of-user-running-process3>
etc

I have tried adding fflush() or stdbuf -o0 to the first awk command, as suggested here.

ludog
  • There are a few issues with process substitution. It often produces output after the main pipeline terminates and a new shell prompt appears, which indicates it runs in background. I suspect its stdout is therefore disconnected from the terminal. Try redirecting that stream to a file to verify it runs. You might consider tailing that file, although it is not a great solution. – Paul_Pedant May 01 '20 at 12:00
  • Yes I can redirect to a file. nvidia-smi | tee >(awk '/ C / {print $6}' > test) | awk '/ C / {print $3}' | xargs -r ps -o user, and then cat test prints the memory usages. – ludog May 01 '20 at 12:08
  • To achieve the synchronisation at a line level: have awk open a pipe from a ps command, send the pids, and read back the user names internally. You probably want to store all the nvidia data and do this in an END action, rather than run a pipe for each line of input. – Paul_Pedant May 01 '20 at 12:09
  • I'm sorry, I wouldn't know how to turn what you suggest into a command. Does synchronisation at a line level mean that the name and memory for each proc are printed on the same line? I.e., is it a different issue from the current absence of memory being printed at all? – ludog May 01 '20 at 12:17
  • By the time you get through two separate awks, and xargs, and ps, and pipe buffering, there is no way the awk outputs will arrive in pairs. As it is, each of $6 and $3 will have a newline anyway. I can fake up a tested command in a couple of hours' time, if nobody beats me to it. – Paul_Pedant May 01 '20 at 12:26
  • Hmm ok, thanks. I'm also seeing that if I replace the first awk with cat (no arguments), then the user names are printed twice. Like cat takes stdout as its input after the ps command has run. – ludog May 01 '20 at 12:37
  • @ludog Actually easier than that: what happens is that the output of the first awk is piped to the second awk, because the first one inherits the redirection of standard output set by the pipeline. The first awk prints one field per line, which gets filtered out by the second awk (that single field won't match " C "). To make it work you'll need something along the lines of echo 0 1 | { tee >(awk '{print $1}' 1>&3) | awk '{print $2}' | xargs -- ps -h -o user -p; } 3>&1 (look at redirections) but, as Paul_Pedant has pointed out, it won't give you the expected output anyway. – fra-san May 01 '20 at 14:39
  • @fra-san ah yes I see, that makes sense – ludog May 01 '20 at 15:03
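
A minimal sketch of the behaviour fra-san describes, assuming bash (process substitution is not available in plain POSIX sh). The sample row and the function names `broken` and `fixed` are made up for illustration. In `broken`, the inner awk writes into the same pipe that feeds the outer awk, so its line is filtered out. In `fixed`, file descriptor 3 is a copy of the original standard output and the inner awk writes there instead.

```shell
#!/usr/bin/env bash
# Hypothetical one-line stand-in for an nvidia-smi process row.
sample() { printf 'x C 42 y z 10MiB\n'; }

# The inner awk inherits tee's stdout, i.e. the pipe to the outer awk.
# Its "mem=..." line does not contain " C ", so the outer awk drops it.
broken() {
    sample |
        tee >(awk '/ C / {print "mem=" $6}') |
        awk '/ C / {print "pid=" $3}'
}

# fd 3 duplicates the stdout of the whole group; the inner awk writes to it
# and so bypasses the outer awk. The order of the two lines is not guaranteed.
fixed() {
    { sample |
        tee >(awk '/ C / {print "mem=" $6}' >&3) |
        awk '/ C / {print "pid=" $3}'
    } 3>&1
}
```

Calling `broken` prints only the pid line, while `fixed` prints both lines in whichever order the two awk processes happen to finish, which is why this approach still cannot pair memory with user on one line.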

1 Answer


More compact solution using a shell script:

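# Keep only the compute-process rows; field 3 is the pid, field 6 onward the memory.
nvidia-smi | grep ' C ' | while read -r _ _ pid _ _ mem; do
    # ps prints a header line first; tail -n +2 drops it.
    user="$( ps -o user "${pid}" | tail -n +2 )"
    printf '%8s  %6d  %s\n' "${mem}" "${pid}" "${user}"
done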

Note, this runs three extra processes per pid.

Original solution using awk:

#! /bin/bash

function nvidia-smi { cat <<'[][]'
A C 937 D E 1.7MB
A C 1232 D E 0.25MB
A E 6112 D E 13MB
A C 2008 D E 437KB
A C 2024 D E 314157
[][]
}

AWK='
/ C / { Pid[NR] = $3; xPid[$3] = NR; Mem[NR] = $6; }
END {
    for (j in Pid) pp = pp "," Pid[j];
    cmd = "ps 2>&1 -o pid,user -p " substr (pp, 2);
    while (cmd | getline) 
        if ($1 in xPid) User[xPid[$1]] = $2;
    close (cmd);
    fmt = "%8s  %6d  %s\n";
    for (j = 1; j <= NR; ++j) 
        if (j in Mem)
            printf (fmt, Mem[j], Pid[j], User[j]);
}
'
    nvidia-smi | awk "${AWK}"

The function nvidia-smi just presents some test data -- discard that. You need the AWK variable (12 lines in the multi-line constant between single-quotes) and the brief pipeline below it.
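
For readers who do not follow the END block, here is a smaller sketch of the same `cmd | getline` pattern. It is an illustration, not the answer's code: the function name `demo_getline` is invented, and a `printf` command stands in for `ps` so the example runs anywhere. It also compares the result of getline with 0, which stops the loop if the command cannot be run.

```shell
demo_getline() {
    printf '%s\n' 'A C 937 D E 1.7MB' 'A E 6112 D E 13MB' 'A C 1232 D E 0.25MB' |
    awk '
        # Collect the rows of interest while reading the input.
        / C / { mem[$3] = $6; pids = pids " " $3 }
        END {
            # Stand-in for ps: prints "<pid> demo" for each collected pid.
            cmd = "printf \"%s demo\\n\"" pids
            # A plain getline from a command refills $0 and the fields.
            while ((cmd | getline) > 0)
                print mem[$1], $1, $2
            close(cmd)
        }'
}
```

The command is started once, after all input has been read, so only one extra process is needed no matter how many pids were collected.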

A test. I included the pid that was using the memory:

paul $ ./nVid
   1.7MB     937  syslog
  0.25MB    1232  root
   437KB    2008  postfix
  314157    2024  paul
Paul_Pedant
  • Thanks. I can't say I fully follow the awk code, but it seems this just does everything inside awk. I can also confirm it works with the 'real' nvidia-smi, except sometimes the user name is not printed.

    282MiB 4611
    282MiB 4613
    282MiB 12090
    282MiB 12091
    282MiB 12093
    282MiB 26005
    282MiB 26009
    1355MiB 7646 bob
    1355MiB 7648 bob
    347MiB 24436
    1247MiB 4611 bob
    1247MiB 4613 bob
    1287MiB 12090 bob
    1355MiB 24436 bob
    1287MiB 12091 bob
    1287MiB 12093 bob
    1091MiB 26005 bob
    1073MiB 26009 bob

    – ludog May 01 '20 at 14:52
  • sorry, there should be newlines in that but I can't format it in a comment – ludog May 01 '20 at 14:55
  • There is a race condition in there. If a process has just ended, it is possible it was present when nvidia-smi was run but has disappeared by the time ps gets run. I really don't know how transient GPU processes can be: for example, are they only connected while they are actually refreshing something? One fix might be to fetch the whole ps output before and after nvidia-smi runs, and check both lists for the pid. – Paul_Pedant May 01 '20 at 15:28
  • Is that data list from one run, or two? Every pid that shows up without a username also shows up again with a user name. 7646 and 7648 only show up once. Possibly the pids are permanent threads, and only get allocated to a username while they are connected. – Paul_Pedant May 01 '20 at 15:40
  • That is from one run. Some processes are spread over multiple GPUs. That looks like the cause: the entry for the second GPU is then printed without a name. Normally, the GPU processes in my case are not transient, i.e. they last several minutes at least. – ludog May 01 '20 at 16:09
  • Ah yes, the shell script works too. It might be worth moving this more compact solution to the top of your answer? – ludog May 03 '20 at 13:09
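
One possible way to handle both points raised in the comments (a pid listed on several GPUs, and a process that vanishes between the two commands) is to key the user names by pid and to take the `ps` data from saved snapshots. This is a sketch under assumptions: the function name `join_users` and the file names are hypothetical, and the field positions ($3 for the pid, $6 for the memory) are taken from the awk commands in the question.

```shell
# Usage: join_users NVIDIA_FILE PS_FILE
#   NVIDIA_FILE: saved nvidia-smi output
#   PS_FILE:     lines of "pid user", e.g. from: ps -e -o pid= -o user=
join_users() {
    nvidia_file=$1
    ps_file=$2
    awk '
        # First file (ps snapshot): remember the user for every pid.
        FILENAME == ARGV[1] { user[$1] = $2; next }
        # Second file (nvidia-smi rows): look each pid up, however often it repeats.
        / C / {
            printf "%8s  %6d  %s\n", $6, $3, (($3 in user) ? user[$3] : "?")
        }
    ' "$ps_file" "$nvidia_file"
}

# Possible use, with snapshots taken before and after nvidia-smi runs:
#   ps -e -o pid= -o user= > before.txt
#   nvidia-smi > gpu.txt
#   ps -e -o pid= -o user= > after.txt
#   cat before.txt after.txt > ps.txt
#   join_users gpu.txt ps.txt
```

Because the lookup is by pid rather than by input line number, every row for a given pid gets the same user, and a pid missing from both snapshots is shown as `?` instead of an empty field.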