Get the last word on each line

Question

I have a large text file generated by strace which, in brief, contains:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 42.93    3.095527         247     12512           unshare
 19.64    1.416000        2975       476           access
 13.65    0.984000        3046       323           lstat
 12.09    0.871552         389      2239       330 futex
 11.47    0.827229          77     10680           epoll_wait
  0.08    0.005779          66        88           fadvise64
  0.06    0.004253           4      1043       193 read
  0.06    0.004000           3      1529         3 lstat
  0.00    0.000344           0      2254      1761 stat
[...]
  0.00    0.000000           0         1           fallocate
  0.00    0.000000           0        24           access
  0.00    0.000000           0         1           open

Excluding the first header line, I would like to get from each line the last field, corresponding to the syscall column. Those would include:

unshare
access
lstat
futex
epoll_wait
.
..
...

This is what I tried: tail -n -13 seccomp | awk '{print $5}'. It manages to skip the header, but the lines that have a value in the errors column do not come out right, because my command is not refined enough.

How do I implement this?

The reason using $5 in awk doesn't work is that some of the lines have more fields than the others, namely the error column is empty in most lines, but not all of them. That sort of output is annoying to parse. — ilkkachu, May 27 '22 at 17:57
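A small sketch of the point made in the comment above: printing the field count next to $5 and $NF shows why a fixed field number fails. The sample function and its two rows are made up for illustration; they only mimic the column layout from the question.

```shell
# sample: two hypothetical rows in the layout from the question, one with an
# empty errors column and one with a value in it
sample() {
  printf '%6s %11s %11s %9s %9s %s\n' 42.93 3.095527 247 12512 '' unshare
  printf '%6s %11s %11s %9s %9s %s\n' 12.09 0.871552 389 2239 330 futex
}

# Print the number of fields, field 5, and the last field of each row
sample | awk '{print NF, $5, $NF}'
```

The first row has five fields, so $5 happens to be the syscall name; the second has six, so $5 is the error count and only $NF still points at the name.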

score 13 · Answer 1 · edited May 27 '22 at 08:13

Or like so:

awk 'NR>2 {print $NF}' seccomp
unshare
access
.
.
.

which, for lines beyond the second, prints the last field of the line. NF holds the number of fields, $NF "expands" to the last field's contents¹.

^{¹ or the whole record if it doesn't contain any field (is made of blanks only with the default value of FS, the field separator).}
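To see NR and NF at work, here is a minimal sketch on a four-line input. The report function is hypothetical and just reproduces the two header lines plus two data rows in the question's layout.

```shell
# report: hypothetical input with the header, the dashes line, and two rows
report() {
  printf '%s\n' '% time     seconds  usecs/call     calls    errors syscall'
  printf '%s\n' '------ ----------- ----------- --------- --------- ----------------'
  printf '%6s %11s %11s %9s %9s %s\n' 42.93 3.095527 247 12512 '' unshare
  printf '%6s %11s %11s %9s %9s %s\n' 12.09 0.871552 389 2239 330 futex
}

# NR is the current line number; the action only runs for lines after the second
report | awk 'NR > 2 { print NR ": NF=" NF ", last=" $NF }'
```

Lines 1 and 2 never reach the action, and on the remaining lines NF differs (5 and 6) while $NF is the syscall name either way.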

edited May 27 '22 at 08:13

Stéphane Chazelas

544,893

answered May 27 '22 at 07:45

RudiC

8,969

an explanation of the command is much appreciated.. – geek May 27 '22 at 07:52
@geek I'm guessing that the variable $NF contains the number of fields, so if there are 6 fields, it prints field 6, and if there are 8 fields, it prints field 8, thus always the last field. And NR>2 probably means to only apply this if the line number is larger than two. – gerrit May 30 '22 at 12:48
@gerrit, no, $NF is the $ unary operator applied to the contents of the NF variable, like $1 (or $ 1 or $ (2/2)) is the $ operator applied to the 1 number. – Stéphane Chazelas May 30 '22 at 13:57
@gerrit no need to guess since awk has a man page (e.g. https://man7.org/linux/man-pages/man1/awk.1p.html), plus a POSIX spec (https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html), plus a manual (https://www.gnu.org/software/gawk/manual/gawk.html) and various books plus millions of examples online. – Ed Morton Jun 03 '22 at 11:52
@geek documenting basic features of any well-documented tool/language in every answer posted in a forum is a waste of time and just adds clutter. – Ed Morton Jun 03 '22 at 11:52
@EdMorton Not when someone is asking a specific question it isn't. It's ridiculous to answer a question and then not explain it. That doesn't help. That doesn't help them learn. It's incomplete. Might as well just tell them to RTFM. That's basically what you're saying. – Pryftan Nov 19 '22 at 14:20
@Pryftan it'd be ridiculous to explain every basic construct in every tool/language in every answer. At some point you have to do some amount of learning on your own if you want to use a tool/language. This answer contains NR, NF, and print and the most basic single awk condition { action } statement possible - I don't think it's outrageous to expect someone to look up what those mean. – Ed Morton Nov 19 '22 at 14:43

score 10 · Answer 2 · edited May 27 '22 at 18:06

You can easily use grep with option -o (short form of --only-matching).

grep -o "\w*$" filename

\w matches any word character (alphanumeric and underscore)
\w* matches multiple (including zero) word characters
\w*$ matches multiple word characters at the end of the line

To skip the header, use tail -n +3 as suggested by others:

tail -n +3 filename | grep -o "\w*$"

The output is like this:

unshare
access
lstat
futex
epoll_wait
fadvise64
read
lstat
stat
fallocate
access
open

edited May 27 '22 at 18:06

ilkkachu

138,973

answered May 27 '22 at 17:49

Simeon Borko

101

To have this actually match everything after the last space (not just what \w matches), use grep -o '[^ ]*$' instead. – Jivan Pal May 27 '22 at 22:05
Using grep -o '[^ ]*$' matches any character & space including [...] the first field on line 10 – geek May 28 '22 at 08:34
grep -o "\w*$" filename - works like a charm, sometimes grep seems to be less overcomplicated than the others awk and sed – geek May 28 '22 at 08:37
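The difference between the two patterns discussed in these comments only shows up when the last field contains something other than word characters. A quick sketch, assuming a grep that supports -o and \w (GNU grep does); the input line and the name foo-bar are invented for the demonstration and are not real strace output.

```shell
# A hypothetical line whose last field contains a non-word character (-)
line='  0.00    0.000000           0         1           foo-bar'

# \w stops at the hyphen, so only the part after it is printed
printf '%s\n' "$line" | grep -o '\w*$'

# [^ ]*$ takes everything after the last space
printf '%s\n' "$line" | grep -o '[^ ]*$'
```

For syscall names, which consist of letters, digits and underscores, both patterns give the same result.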

Philippos · Answer 3 · 2022-05-27T07:33:59.603

9

With sed it would be simply

sed '1,2d;s/.* //'

1,2d deletes the first two lines, replacing the tail command
the substitute command removes everything up to the last space, so you don't need to count columns
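A short sketch of both steps on made-up input (the report function is hypothetical and only mimics the layout from the question):

```shell
# report: hypothetical input with the header, the dashes line, and two rows
report() {
  printf '%s\n' '% time     seconds  usecs/call     calls    errors syscall'
  printf '%s\n' '------ ----------- ----------- --------- --------- ----------------'
  printf '%6s %11s %11s %9s %9s %s\n' 42.93 3.095527 247 12512 '' unshare
  printf '%6s %11s %11s %9s %9s %s\n' 12.09 0.871552 389 2239 330 futex
}

# .* is greedy, so ".* " matches through the LAST space on the line
report | sed '1,2d;s/.* //'

# Caveat: with a trailing space the last space is at the end, leaving nothing
printf '%s\n' 'a b ' | sed 's/.* //'
```

The second command prints an empty line, which is why this approach relies on the lines having no trailing spaces.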

To my knowledge, syscalls can't contain any whitespace, so this should work. Otherwise you could rely on the name starting at the 61st character, removing the first 60:

sed '1,2d;s/.\{60\}//'

edited May 27 '22 at 07:33

answered May 27 '22 at 07:28

Philippos

13,453

Or < strace.txt tail -n +3 | cut -c 52- and cut would make it easier to set an ending position, too. (I counted 51 chars before the syscall name, not sure why we got a different number.) If some of lines could be shorter, you might want to use s/.\{1,60\}// in sed to clear the short lines too. – ilkkachu May 27 '22 at 18:00
I always appreciate answers using sed instead of more complicated tools such as awk, perl, etc. @ilkkachu using cut would only work if that exact number of characters was always in the output, while the original sed command above would work regardless of the number of characters before the "last word." – Christopher Schultz May 27 '22 at 20:57
@ChristopherSchultz, yeah, when looking for the last space-separated field, yes. But if we instead wanted one of the middle columns from data formatted like that, e.g. the "errors" column here... Then that would probably be best done by just counting characters. (Or by trying to find if the tool creating the data had an alternate output format...) I meant that as an alternative to the other sed one at the end, the one with the s/.\{60\}// – ilkkachu May 27 '22 at 21:03
The syscalls correspond to C function names, so you're right, they can't contain any whitespace. – Nonny Moose May 28 '22 at 18:11
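For the character-counting approach from these comments, a sketch with cut. It assumes the column widths seen in the question's excerpt (6, 11, 11, 9 and 9 characters, each followed by one space), which puts the errors column at characters 42-50 and the syscall name at character 52 onward; the rows function is hypothetical and other strace versions may use different widths.

```shell
# rows: two hypothetical data rows built with the widths assumed above
rows() {
  printf '%6s %11s %11s %9s %9s %s\n' 42.93 3.095527 247 12512 '' unshare
  printf '%6s %11s %11s %9s %9s %s\n' 12.09 0.871552 389 2239 330 futex
}

# Characters 52 to the end of the line: the syscall name
rows | cut -c 52-

# Characters 42-50: the errors column, blank when there were no errors
rows | cut -c 42-50
```

Unlike field splitting, the fixed positions keep the errors column aligned even on rows where it is empty.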

score 6 · Answer 4 · answered May 27 '22 at 21:39

The standard idiom for printing the last field on a line is

awk '{print $NF}'

The NF variable is automatically set to the Number of Fields on the line, and then $ extracts that field.

I'd say the easiest and safest way to get rid of the unwanted header lines is with egrep.

Putting this all together we have:

scs$ awk '{print $NF}' seccomp | egrep -v '^(--*|syscall)$'

(This would wrongly exclude an actual syscall named "syscall". Presumably that shouldn't be a problem.)

answered May 27 '22 at 21:39

Steve Summit

547

You never need grep when you're using awk. awk '{print $NF}' seccomp | egrep -v '^(--*|syscall)$' = awk '$NF !~ /^(--*|syscall)$/{print $NF}'. By the way, egrep is deprecated in favor of grep -E. – Ed Morton Jun 03 '22 at 12:04
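The two variants from this comment side by side, as a sketch on made-up input (the report function is hypothetical and only mimics the layout from the question):

```shell
# report: hypothetical input with the header, the dashes line, and two rows
report() {
  printf '%s\n' '% time     seconds  usecs/call     calls    errors syscall'
  printf '%s\n' '------ ----------- ----------- --------- --------- ----------------'
  printf '%6s %11s %11s %9s %9s %s\n' 42.93 3.095527 247 12512 '' unshare
  printf '%6s %11s %11s %9s %9s %s\n' 12.09 0.871552 389 2239 330 futex
}

# Two processes: awk extracts the last field, grep -E drops the header words
report | awk '{print $NF}' | grep -Ev '^(--*|syscall)$'

# One process: awk skips lines whose last field is dashes or "syscall"
report | awk '$NF !~ /^(--*|syscall)$/ { print $NF }'
```

Both print the same two names; the second form just moves the filter into the awk pattern.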

score 3 · Answer 5 · answered May 27 '22 at 10:23

Using GNU grep and tail with perl-style regex

grep -Po '.* \K.*' file | tail -12
unshare
access
lstat
futex
epoll_wait
fadvise64
read
lstat
stat
fallocate
access
open

Or without -P:

grep -o '[^] ]*$' file | tail -12
unshare
access
lstat
futex
epoll_wait
fadvise64
read
lstat
stat
fallocate
access
open

answered May 27 '22 at 10:23

sseLtaH

2,786

tail -n+3 would likely be more appropriate here. – Stéphane Chazelas May 27 '22 at 12:38
@StéphaneChazelas Indeed, or tail +3 – sseLtaH May 27 '22 at 12:51
Yes, though tail -n +3 is the standard version. tail +3 is the historical one but deprecated. The GNU implementation of tail takes it as getting the last 10 lines of the file called +3 if 200112 <= $_POSIX2_VERSION < 200809 as those POSIX versions required. – Stéphane Chazelas May 27 '22 at 13:03
Using tail -12, if I am right, counts from the last line rather than the first, and the number -12 increments down to the top. It has a weird behaviour, as tail -12 seccomp takes the tenth line and excludes the first; I guess it's because it was a max of twelve lines – geek May 28 '22 at 08:48
@geek That is correct hence why Stéphane Chazelas rightly suggested to work from the top in the comments. The solution with tail -12 is also not flexible i.e if your needed output is more than 12 lines after removing the header lines. – sseLtaH May 28 '22 at 09:43
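The difference between counting from the top and counting from the bottom, sketched on a hypothetical five-line input (two header lines followed by three data lines):

```shell
# five_lines: hypothetical input, two header lines and three data lines
five_lines() {
  printf '%s\n' header dashes one two three
}

# tail -n +3 starts at line 3, however many lines follow
five_lines | tail -n +3

# tail -n 2 keeps the last two lines, so the count has to match the data
five_lines | tail -n 2
```

The first command prints one, two and three; the second drops one because the count was too small, which is the inflexibility mentioned above.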

score 2 · Answer 6 · answered May 29 '22 at 20:42

If all the important lines begin with a digit, then

awk '$1~/^[0-9]/{print $NF}'
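One thing to watch with the digit test: if the full file ends with a totals row, that row also starts with a digit. The excerpt in the question does not show the end of the file, so the totals row below is an assumption, and its numbers (like the report function itself) are made up. A sketch that filters it out as well:

```shell
# report: hypothetical input with header, dashes, one data row, and an
# assumed totals row at the end
report() {
  printf '%s\n' '% time     seconds  usecs/call     calls    errors syscall'
  printf '%s\n' '------ ----------- ----------- --------- --------- ----------------'
  printf '%6s %11s %11s %9s %9s %s\n' 42.93 3.095527 247 12512 '' unshare
  printf '%6s %11s %11s %9s %9s %s\n' 100.00 7.204684 '' 31170 2287 total
}

# Keep lines whose first field starts with a digit, except the totals row
report | awk '$1 ~ /^[0-9]/ && $NF != "total" { print $NF }'
```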

answered May 29 '22 at 20:42

jp314

131

jubilatious1 · Answer 7 · 2022-05-29T05:18:32.703

Using Raku (formerly known as Perl_6)

raku -ne '.words[*-1].put;'

OR

raku -ne '.words.tail.put;'

Command line flags -ne are used to run the code linewise (non-autoprinting) over the file. The .words call breaks on whitespace: it's short for $_.words wherein $_ denotes the 'topic' variable. Indexing is accomplished via [*-1] 'whatever-star'-minus-one to get the last word, or more simply with .tail. Printing is accomplished using .put ('print-using-terminator', aka \n).

Sample Input:

 42.93    3.095527         247     12512           unshare
 19.64    1.416000        2975       476           access
 13.65    0.984000        3046       323           lstat
 12.09    0.871552         389      2239       330 futex
 11.47    0.827229          77     10680           epoll_wait
  0.08    0.005779          66        88           fadvise64
  0.06    0.004253           4      1043       193 read
  0.06    0.004000           3      1529         3 lstat
  0.00    0.000344           0      2254      1761 stat
  0.00    0.000000           0         1           fallocate
  0.00    0.000000           0        24           access
  0.00    0.000000           0         1           open

Sample Output:

unshare
access
lstat
futex
epoll_wait
fadvise64
read
lstat
stat
fallocate
access
open

Note: if having the header line show up in your output is problematic, you can skip outputting the initial 2 lines with the following code: below ++$ acts as an anonymous state variable that only initializes once, and then increments to count the lines as they're processed:

raku -ne '.words.tail.put if ++$ > 2;'

https://docs.raku.org/syntax/state
https://raku.org

score 0 · Answer 8 · answered May 29 '22 at 04:14

GNU dc, an RPN desk calculator, can pop space-separated arguments from its stack. For string data, which the last field is, we first enclose it in square brackets using the sed stream editor.

< file \
sed -En '1,2!s/\S+$/[&]/p' |
dc -e "[q]sq [?z0=qpcz0=?]s?
0_0=?"

POSIXly, sed can take care of popping the last field. We replace all spaces with newlines and then keep deleting the leading element. This process terminates when no newline (that is, no former space) remains, which means we are at the last field. This assumes no trailing spaces.

sed '1,2d
  y/ /\n/
  /\n/D' file
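To watch the D loop on a single line, here is a sketch with a made-up three-word input; the words alpha, beta and gamma are placeholders only.

```shell
# y turns every space into a newline: "alpha\n\nbeta\ngamma"
# D then deletes up to the first newline and restarts the script on what is
# left, without reading new input, until no newline remains
printf '%s\n' 'alpha  beta gamma' | sed 'y/ /\n/
/\n/D'
```

Each pass strips one leading element (including the empty one produced by the double space), and the final pass finds no newline, so sed prints the remaining gamma.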

perl can do it using the ternary operator ?:

perl -pe '($_) = $.>2 ? /\S+\n/g : ()' file
