Using the -F option for lsof, I can specify which fields are printed:

lsof -w -F pcfn

However, the output is split across multiple lines, i.e. one line per field:

p23022
csleep
fcwd
n/home/testuser
frtd
n/
ftxt
n/usr/bin/sleep
fmem
n/usr/lib/locale/locale-archive
fmem
n/usr/lib/x86_64-linux-gnu/libc-2.28.so
fmem
n/usr/lib/x86_64-linux-gnu/ld-2.28.so
f0
n/dev/pts/20
f1
n/dev/pts/20
f2
n/dev/pts/20

How can I get custom fields printed on one line?

Martin Vegter

4 Answers

2

The lsof -F output is meant to be post-processable.

AFAICT, lsof renders backslashes and control characters, including TAB and newline¹, in a backslash notation whenever they occur in one of the fields (\\, \t and \n for backslash, TAB and newline respectively here)². It should therefore be possible to reformat that output as one line of TAB-separated values per open file and still have the result be post-processable:

LC_ALL=C lsof -w -F pcfn | LC_ALL=C awk -v OFS='\t' '
  {t = substr($0, 1, 1); f[t] = substr($0, 2)}
  t == "n" {print f["p"], f["c"], f["f"], f["n"]}'

On your sample, that gives:

23022   sleep   cwd /home/testuser
23022   sleep   rtd /
23022   sleep   txt /usr/bin/sleep
23022   sleep   mem /usr/lib/locale/locale-archive
23022   sleep   mem /usr/lib/x86_64-linux-gnu/libc-2.28.so
23022   sleep   mem /usr/lib/x86_64-linux-gnu/ld-2.28.so
23022   sleep   0   /dev/pts/20
23022   sleep   1   /dev/pts/20
23022   sleep   2   /dev/pts/20

And on lsof -w -F pcfn -a -d3 -p "$!" after:

perl -e '$0 = "a\nb\t"; sleep 999' 3> $'x\ny z\tw' &

That gives:

7951    a\nb\t  3   /home/stephane/x\ny z\tw

To get the actual file names from that output you'd still need to decode those \x sequences.
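
A minimal sketch of such a decoder, assuming only the \\, \t, \n and \xHH forms mentioned above need handling (decode_lsof_escapes is a hypothetical helper name, not part of lsof; any other backslash sequence is passed through unchanged):

```shell
# Hypothetical helper: decode \\, \t, \n and \xHH in each input line.
# Decoding \n or \t breaks the one-record-per-line TSV layout, so apply
# this to a single extracted field, not to a whole row.
decode_lsof_escapes() {
  LC_ALL=C awk '
    BEGIN { hex = "0123456789abcdef" }
    {
      out = ""; s = $0
      while ((i = index(s, "\\")) > 0) {
        out = out substr(s, 1, i - 1)      # text before the backslash
        e = substr(s, i + 1, 1)            # character after the backslash
        if (e == "n")       { out = out "\n"; s = substr(s, i + 2) }
        else if (e == "t")  { out = out "\t"; s = substr(s, i + 2) }
        else if (e == "\\") { out = out "\\"; s = substr(s, i + 2) }
        else if (e == "x") {
          h = tolower(substr(s, i + 2, 2)) # two hex digits
          v = (index(hex, substr(h, 1, 1)) - 1) * 16 + index(hex, substr(h, 2, 1)) - 1
          out = out sprintf("%c", v); s = substr(s, i + 4)
        } else              { out = out "\\"; s = substr(s, i + 1) }
      }
      print out s
    }'
}
```

For example, printf '%s\n' 'x\ny z\tw' | decode_lsof_escapes prints the original two-line file name again.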

Note that with that lsof command, you get records for every thread of every process, but the thread ID is not in your list of fields, so you won't know which thread of the process has the file open. That may not be a problem, as it's rare for threads of the same process to have different open files, but it does mean you'll get duplicate lines, which you can remove by piping to LC_ALL=C sort -u. With lsof 4.90 or newer, you can also disable thread reporting with -Ki.
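
As a sketch, the awk reformatting and the deduplication step can be wrapped in one function that reads lsof -F pcfn output on standard input (lsof_rows_unique is a hypothetical name chosen for illustration):

```shell
# Hypothetical wrapper: one TAB-separated row per open file, duplicates removed.
# Expects the output of `lsof -w -F pcfn` on stdin.
lsof_rows_unique() {
  LC_ALL=C awk -v OFS='\t' '
    # remember the latest value seen for each field letter
    {t = substr($0, 1, 1); f[t] = substr($0, 2)}
    # the name field closes a file record: print the row
    t == "n" {print f["p"], f["c"], f["f"], f["n"]}' |
  LC_ALL=C sort -u   # drop rows repeated once per thread
}

# Usage: lsof -w -F pcfn | lsof_rows_unique
```

Note that sort -u also reorders the rows, so the output no longer follows lsof's own ordering.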

You may also want to include the TYPE field to know how to interpret the NAME field. Beware that lsof appends " (deleted)" to the name when the open file has been deleted and, AFAICT, there's no foolproof way to tell that apart from a file whose name ends in " (deleted)".
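
A sketch of the same reformatting with the type included, assuming lsof is run with -F pcftn and emits the t field before the n field of each file (lsof_rows_typed is a hypothetical name). The per-file fields are reset on every f line so that a file without a type does not inherit the previous file's value:

```shell
# Hypothetical variant: PID, command, fd, type and name on one TAB-separated row.
# Expects the output of `lsof -w -F pcftn` on stdin.
lsof_rows_typed() {
  LC_ALL=C awk -v OFS='\t' '
    {k = substr($0, 1, 1); v[k] = substr($0, 2)}
    # a new file record starts: forget the previous file type and name
    k == "f" {v["t"] = "-"; v["n"] = "-"}
    # assumption: n is the last field lsof prints for a file
    k == "n" {print v["p"], v["c"], v["f"], v["t"], v["n"]}'
}

# Usage: lsof -w -F pcftn | lsof_rows_typed
```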


¹ That doesn't necessarily mean that lsof can safely cope with file names that contain newline characters. For instance, on Linux, it still uses the old /proc/net/unix API instead of the netlink one to retrieve information about Unix/Abstract domain sockets, and that one falls apart completely if socket file paths contain newline characters. One can easily trick lsof into thinking a process has some socket open instead of another by binding to sockets with forged file paths.

² It leaves non-control characters as-is though, and in some locales the encoding of some characters (such as α, encoded as 0xa3 0x5c in BIG5) includes the 0x5c byte, which is also the encoding of backslash. So here, we force the locale to C to make sure all bytes above 0x7f are rendered as \xHH, to avoid surprises when post-processing.

1

Awk is my favourite hammer for this.

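  • Variables with names matching the fields are used, and initialised to "-", since values aren't always provided. They are reset to "-" after each printed line, so the PID and command name appear only on the first line of each process.
  • This depends on "n" being the last field of each file. Seeing it triggers printing, on the assumption that by then all the other fields have been seen. Of course the print order can be anything.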
lsof -w -F pcfn|awk '
BEGIN {
        p=c=f=n="-"
}
# extract field & value for every line
{field=substr($0,1,1); value=substr($0,2)}
# assign value to matching variable name
/^p/{p=value}
/^c/{c=value}
/^f/{f=value}
/^n/{n=value
        print p,c,f,n
        p=c=f=n="-"
}
'

leading to output like:

1 systemd cwd /
- - rtd /
- - txt /usr/lib/systemd/systemd
- - mem /lib64/libm-2.26.so
and so on...
  • Thank you, but parsing malformed output and trying to piece it back together with awk might technically work, but it is quite an ugly hack. It's like printing the output and then scanning it back in with OCR. Can't the problem be fixed at the source (lsof)? – Martin Vegter Sep 01 '22 at 15:27
  • Not sure "malformed" is applicable: when you specify -F, per the manpage you are getting "OUTPUT FOR OTHER PROGRAMS", and this is literally made to be sent to awk or the like. You can get closer with the -F0 option, which separates fields with NUL that you could e.g. convert to spaces, but you still don't get consistent records with everything on every line. – Andre Beaud Sep 03 '22 at 05:24
1

If all you want is the output for each PID (field p) or each file descriptor (field f) on one line, you can try what the manual suggests:

As an example, ``-F pcfn'' will select the process ID (`p'), command name (`c'), file descriptor (`f') and file name (`n') fields with an NL field terminator character; ``-F pcfn0'' selects the same output with a NUL (000) field terminator character.

lsof -w -F pcfn0

which does print one line (containing NULs) for each p or f group. You can take a look at the output with less. That doesn't mean that all the fields are going to be present, as the manual also states:

Lsof doesn't produce all fields for every process or file set, only those that are available.
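
As a rough sketch of post-processing that NUL-terminated form, assuming each p or f group arrives as one line of NUL-terminated fields as described above (nul_groups_to_rows is a hypothetical name), the NULs can be turned into TABs and the process-level fields carried over to each file line:

```shell
# Hypothetical helper: expects the output of `lsof -w -F pcfn0` on stdin and
# prints one TAB-separated row (PID, command, fd, name) per file group.
nul_groups_to_rows() {
  tr '\0' '\t' | LC_ALL=C awk -F '\t' -v OFS='\t' '
    # a file group starts: the name may be absent, so reset it first
    /^f/ { v["n"] = "-" }
    # store every non-empty field under its one-letter identifier
    { for (i = 1; i <= NF; i++) if ($i != "") v[substr($i, 1, 1)] = substr($i, 2) }
    # print a row for each file group, reusing the last seen p and c
    /^f/ { print v["p"], v["c"], v["f"], v["n"] }'
}

# Usage: lsof -w -F pcfn0 | nul_groups_to_rows
```

Fields that lsof does not report for a given file show up as "-" or keep the value from the enclosing process group, which is one way to cope with the missing fields the manual warns about.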

But clearly, the -F option is used to transmit data to Other Programs. As the manual states:

Instead of a formatted display, lsof will produce output that can be parsed by other programs. See the -F, option description, and the OUTPUT FOR OTHER PROGRAMS section for more information.

So there is no alternative: the output of lsof must be processed by another program. Both C programs and awk scripts have been used for this in the past. An awk example of how to correctly process the output of lsof is given in the scripts directory:

 /usr/share/doc/lsof/examples/list_fields.awk

Or at https://github.com/Distrotech/lsof/blob/master/scripts/list_fields.awk, for example.

There is also an lsof_fields.h header file in the lsof distribution, for building tables from the output of lsof.

And that seems like what you need to do. That implies reading the first character of each field (line), which identifies the field, and joining the fields into a single table that can be printed.

This answer already shows a way to parse the lsof output.

0

awk is great, but I wanted to offer an alternative command that is less well-known: pr. I combine the text-splitting abilities of that command with the column command to display output in a nice, customizable format.

lsof -w -F pcfn / | pr --column 4 --across | column

This has the added benefit of being easy to adapt to the output you are interested in: just change the number 4 to match the number of fields you select.

This is an example of the output:

p1682             cPM2 v5.2.0: God  fcwd              n/
frtd              n/                ftxt              n/home/aaron/.nvm
fmem              n/usr/lib/x86_64- fmem              n/usr/lib/x86_64-
fmem              n/usr/lib/x86_64- fmem              n/usr/lib/x86_64-
fmem              n/usr/lib/x86_64- fmem              n/usr/lib/x86_64-

You can also specify a custom separator with pr if you like, giving output such as the following, which you can customize as you see fit. Example:

lsof -w -F pcfn / | head -20 | pr -ts' ' --column 4 -a

Output:

p1682 cPM2 v5.2.0: God fcwd n/
frtd n/ ftxt n/home/aaron/.nvm/versions/node/v16.14.2/bin/node
fmem n/usr/lib/x86_64-linux-gnu/libnss_dns-2.31.so fmem n/usr/lib/x86_64-linux-gnu/libresolv-2.31.so
fmem n/usr/lib/x86_64-linux-gnu/libc-2.31.so fmem n/usr/lib/x86_64-linux-gnu/libpthread-2.31.so
fmem n/usr/lib/x86_64-linux-gnu/libgcc_s.so.1 fmem n/usr/lib/x86_64-linux-gnu/libm-2.31.so
ajmeese7
  • Thank you, but parsing malformed output and trying to piece it back together with awk might technically work, but it is quite an ugly hack. It's like printing the output and then scanning it back in with OCR. Can't the problem be fixed at the source (lsof)? – Martin Vegter Sep 01 '22 at 15:27
  • @400theCat I didn't use awk in my answer... – ajmeese7 Sep 01 '22 at 16:46