$ sudo lsof -u t  |   grep -i "\.pdf" 

evince  1788    t   37r      REG                8,4    176328     134478 /home/t/some/path1/white space/string1 + string2 string3.pdf
evince  3737    t   36r      REG                8,4   1252636    6692680 /home/t/some/path2/white space/string5 string3.pdf

How can I extract only the second column (pids of processes)?

How can I extract only the ninth column (pathnames of files)? (pathnames can contain any character allowed by Linux and ext4 file systems)

My real command is

$ sudo lsof -u t  | grep -v "wineserv" | grep REG  |   grep "\.pdf" | grep  "string"

Here I want the records whose first column (COMMAND) is not wineserv, whose fifth column (TYPE) is REG, and whose ninth column (NAME) contains both .pdf and string.

I prefer bash, awk, or Python solutions (Perl is fine too, but I don't know Perl, so I won't be able to verify that it is correct or modify it later).

Thanks.

Tim
  • lsof has a -F flag according to the manual, so you could do lsof -F p to get just the PID itself. Let me know if you want that as an answer, but of course I can do Python and awk parsing as well – Sergiy Kolodyazhnyy Feb 16 '19 at 01:38
  • @SergiyKolodyazhnyy Thanks, and yes. See my update. – Tim Feb 16 '19 at 01:51
  • Related: https://unix.stackexchange.com/q/299040/117549 – Jeff Schaller Feb 16 '19 at 02:59
  • no need for lsof: find /proc/*/fd -ilname '*.pdf' 2>/dev/null | awk -F/ '{print$3}' (btw, this will also work if the filenames contain newline, spaces, etc). –  Feb 16 '19 at 12:56
  • @mosvy Thanks. How does parsing the output of find on the /proc file system compare with parsing lsof output? – Tim Feb 16 '19 at 15:14
  • @mosvy Besides only needing pid, I also want only pathname of pdf file. Can you modify find /proc/*/fd -ilname '*.pdf' 2>/dev/null | awk -F/ '{print$3}' accordingly? – Tim Feb 16 '19 at 23:02
  • find /proc/*/fd -ilname '*.pdf' -printf '%l\n' , find /proc/*/fd -ilname '*.pdf' -printf '%p\t%l\n'. You can also get that info with whatever language you want (C, perl, python, etc). The value added by a tool like lsof should be the ease of use and the human-friendly way it presents that info -- and lsof fails at both spectacularly. –  Feb 17 '19 at 09:02
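
The -F suggestion in the comments above can be sketched in Python. This is a minimal sketch, assuming output shaped like that of lsof -F pn, where every line starts with a one-letter field identifier (p for a PID, n for a file name) and a p line opens a new process set. The function name parse_lsof_fields and the sample text are illustrative and not taken from a real run.

```python
def parse_lsof_fields(text):
    """Return (pid, name) pairs from `lsof -F pn` style output."""
    pairs = []
    pid = None
    for line in text.split("\n"):
        if not line:
            continue
        tag, value = line[0], line[1:]
        if tag == "p":                      # a "p" line starts a new process set
            pid = int(value)
        elif tag == "n" and pid is not None:
            pairs.append((pid, value))      # the name is kept whole, spaces included
        # any other field identifier (for example "f") is ignored
    return pairs


# Illustrative sample shaped like `lsof -F pn` output (not a real capture).
sample = "p1788\nf37\nn/home/t/some/path1/white space/string1 + string2 string3.pdf\n"
pdfs = [(pid, name) for pid, name in parse_lsof_fields(sample)
        if name.lower().endswith(".pdf")]
print(pdfs)
```

Because each field sits on its own line, whitespace inside a pathname needs no special handling here.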

2 Answers


Using regular expressions:

$ ... | perl -nlE '/.*? (\d+).*?(\/.*)/ and print("$1 ; $2")' 

1788 ; /home/t/some/path1/white space/string1 + string2 string3.pdf
3737 ; /home/t/some/path2/white space/string5 string3.pdf
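
For readers who prefer Python to Perl, the same pattern can be tried with the re module. This is only a sketch: the helper name pid_and_path is made up for illustration, and it carries the same assumption as the one-liner, namely that the first number preceded by a space is the PID and that the pathname starts at the first slash.

```python
import re

# Same pattern as the Perl one-liner: the first number that follows a space,
# then everything from the first "/" to the end of the line.
PATTERN = re.compile(r".*? (\d+).*?(/.*)")


def pid_and_path(line):
    """Return (pid, path) as strings, or None when the line does not match."""
    m = PATTERN.match(line)
    return (m.group(1), m.group(2)) if m else None


line = ("evince  1788    t   37r      REG                8,4    176328"
        "     134478 /home/t/some/path1/white space/string1 + string2 string3.pdf")
print(" ; ".join(pid_and_path(line)))
```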
JJoao
  • Thanks. By (\/.*), do you assume that lsof always output resolved absolute pathnames not relative pathnames? see https://unix.stackexchange.com/questions/501002/does-lsof-always-show-the-resolved-absolute-pathnames-of-opened-files – Tim Feb 17 '19 at 02:27
  • @Tim, yes (I thought this was the default behavior of lsof). I believe some other situations can also be covered easily (some limitations are predictable) – JJoao Feb 17 '19 at 13:13

If I understand your requirements this should work:

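awk '{ for (i=9; i<=NF; i++) {
    # Regex literals (/.../) keep \. as a literal dot; inside a "string", gawk turns \. into a plain .
    if ($i ~ /string/ && $1 != "wineserv" && $5 == "REG" && $NF ~ /\.pdf$/) {
        $1=$2=$3=$4=$5=$6=$7=$8=""   # blank the first 8 fields, then print what is left
        print
    }
}}'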
  • Loop through all the fields from 9 to the end, if one contains string:

    • Check that field 1 does not equal wineserv
    • field 5 does equal REG
    • The last field ends in .pdf (I think it's safe to assume that even if the filename contains whitespace, the extension is in the last part)
  • If all conditions are met erase the first 8 fields and print what's left
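
Since the question also asks for Python, the same tests can be sketched there. This is a minimal sketch, not a drop-in replacement: it assumes every row has exactly eight non-empty columns before NAME, as in the sample rows of the question, and the function name select_paths is hypothetical. Splitting at most eight times keeps the pathname in one piece, so whitespace inside it is preserved.

```python
def select_paths(lines, needle="string"):
    """Yield the NAME column of rows that pass the same tests as the awk filter."""
    for line in lines:
        # At most 9 fields; the 9th is the rest of the line, inner spaces intact.
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 9:
            continue
        command, ftype, path = fields[0], fields[4], fields[8]
        if (command != "wineserv" and ftype == "REG"
                and path.endswith(".pdf") and needle in path):
            yield path


rows = [
    "evince  1788    t   37r      REG                8,4    176328"
    "     134478 /home/t/some/path1/white space/string1 + string2 string3.pdf",
]
for path in select_paths(rows):
    print(path)
```

In a pipeline, lines could be sys.stdin instead of the in-memory list used here.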

jesse_b
  • Thanks. In $NF ~ ".pdf", the . doesn't work as a literal dot. – Tim Feb 16 '19 at 02:14
  • @Tim: Thanks didn't realize that. I'll update with \ – jesse_b Feb 16 '19 at 02:15
  • Sorry, forgot to say $NF ~ "\.pdf" doesn't work either. pathnames containing /.../pdf.../... will still match. I don't know why they match. – Tim Feb 16 '19 at 02:23
  • @Tim: How about "\.pdf$" – jesse_b Feb 16 '19 at 02:32
  • That works. But still why /.../pdf.../... matches \.pdf? – Tim Feb 16 '19 at 02:33
  • Why does PDFXCview 4333 t 255r REG 8,4 880 27793700 /home/t/program_files/document/formats/pdf/TrackerSoftware/pdfxcview/pdfxchange_portable/PDFXCview match \.pdf? – Tim Feb 16 '19 at 02:35
  • @Tim: It doesn't for me using mawk 1.2 – jesse_b Feb 16 '19 at 02:41
  • I am using GNU Awk 4.1.4. (1) Does gawk cause the problem? (2) What do you think of gawk vs mawk? – Tim Feb 16 '19 at 03:33
  • Thanks. Suppose I want to match against an arbitrary number of strings, and put the awk command into a bash script that accepts those strings as command-line arguments. How would you pass the strings (provided as positional parameters of the shell script) to the awk command inside the script? The strings are combined with AND, i.e. all of them must appear in the pathname, in any order. – Tim Feb 16 '19 at 06:38
  • Thanks. The output has a pathname in each line. In each line, there are several blank spaces(?) in front of a pathname. Can the blank spaces be suppressed in the output? – Tim Feb 16 '19 at 06:53
  • Perhaps: $NF ~ /\.pdf/ – JJoao Feb 16 '19 at 10:17
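
The AND requirement raised in the comments (every given string must appear in the pathname, in any order) can be sketched in Python. This is a sketch under stated assumptions: the helper name contains_all is made up, the pathnames are assumed to have been extracted already, and the strings are matched as plain substrings rather than regular expressions.

```python
def contains_all(path, needles):
    """True when every needle occurs somewhere in path, in any order."""
    return all(needle in path for needle in needles)


paths = [
    "/home/t/some/path1/white space/string1 + string2 string3.pdf",
    "/home/t/some/path2/white space/string5 string3.pdf",
]
needles = ["string3", "string1"]   # in a script these could come from sys.argv[1:]
for p in paths:
    if contains_all(p, needles):
        print(p)
```

Printing the pathname as a single value also avoids the leading blanks mentioned above, since no fields are blanked out and rejoined.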