The lsof -F output is meant to be post-processable. AFAICT, lsof renders backslashes and control characters, including TAB and newline¹, with some \x notation (\\, \t, \n for backslash, TAB and newline respectively here)², at least when they're found in one of the fields. So it should be possible to format that output as TAB-separated values, one line per open file, and for that to still be post-processable:
LC_ALL=C lsof -w -F pcfn | LC_ALL=C awk -v OFS='\t' '
{t = substr($0, 1, 1); f[t] = substr($0, 2)}
t == "n" {print f["p"], f["c"], f["f"], f["n"]}'
On your sample, that gives:
23022 sleep cwd /home/testuser
23022 sleep rtd /
23022 sleep txt /usr/bin/sleep
23022 sleep mem /usr/lib/locale/locale-archive
23022 sleep mem /usr/lib/x86_64-linux-gnu/libc-2.28.so
23022 sleep mem /usr/lib/x86_64-linux-gnu/ld-2.28.so
23022 sleep 0 /dev/pts/20
23022 sleep 1 /dev/pts/20
23022 sleep 2 /dev/pts/20
And on the output of lsof -w -F pcfn -a -d3 -p "$!", run after:
perl -e '$0 = "a\nb\t"; sleep 999' 3> $'x\ny z\tw' &
That gives:
7951 a\nb\t 3 /home/stephane/x\ny z\tw
To get the actual file names from that output, you'd still need to decode those \x sequences.
Note that with that lsof command, you get records for every thread of every process, but the thread id is not in your list of fields, so you won't know which thread of the process has the file open. That's probably not a problem, as it's rare for threads of the same process to have different open files, but it still means you'll get some duplication in there, which you could get rid of by piping to LC_ALL=C sort -u. You can also disable thread reporting with -Ki, with lsof 4.90 or newer.
You may also want to include the TYPE field to know how to interpret the NAME field. Beware that lsof appends (deleted) when the opened file has been deleted and, AFAICT, there's no foolproof way to disambiguate that from a file whose name ends in (deleted).
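A possible variant of the awk command above with the TYPE column added, as a sketch only: it assumes t is the -F identifier for the file type, that every file record starts with its f field, and that n comes last in each record, as in the sample. The function name lsof_tsv is made up for the example:

```sh
# Hypothetical helper: turn "lsof -F pcftn" output (on stdin) into
# TAB-separated lines: PID, COMMAND, FD, TYPE, NAME.
lsof_tsv() {
  LC_ALL=C awk -v OFS='\t' '
    {t = substr($0, 1, 1); f[t] = substr($0, 2)}
    # a new file record starts with its f field: forget the previous type
    t == "f" {f["t"] = ""}
    # n is assumed to be the last field of each file record
    t == "n" {print f["p"], f["c"], f["f"], f["t"], f["n"]}'
}

# Possible usage, with duplicates from threads removed:
#   LC_ALL=C lsof -w -F pcftn | lsof_tsv | LC_ALL=C sort -u
```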
¹ That doesn't necessarily mean that lsof can safely cope with filenames that contain newline characters. For instance, on Linux, it still uses the old /proc/net/unix API instead of the netlink one to retrieve information about Unix/abstract domain sockets, and that one falls apart completely if socket file paths contain newline characters. One can easily trick lsof into thinking a process has some socket open instead of another by binding to sockets with forged file paths.
² It leaves non-control characters as-is though, and in some locales the encoding of some characters (such as α, encoded as 0xa3 0x5c in BIG5) does include the 0x5c byte, which is also the encoding of backslash. So here, we're forcing the locale to C to make sure all bytes above 0x7f are rendered as \xHH, to avoid surprises when post-processing.