7

I want to know the equivalent of

grep -oE '[^ ]+$' pathname

to awk or sed. If someone can answer and explain that would be great. Thank you.

terdon
  • 4
    Please [edit] your question and explain what you think that command does and what you are trying to do with sed or awk. That way we can be sure that what we give you is what you actually want. – terdon Mar 30 '17 at 14:28
  • 2
    Why do you want to do this with sed or awk instead of grep? What is your actual goal? – Daniel H Mar 30 '17 at 17:09

4 Answers

6

awk is a superset of grep, so you can use that grep pattern in awk as well:

awk '{match($0, "[^ ]+$"); if (RSTART > 0) print substr($0, RSTART, RLENGTH)}'
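A quick check of this command, as a sketch only: the sample input is the edge case suggested in the comments on this page (a tab inside the last word, and a line ending in a space), and the function name last_word_awk is made up for this example.

```shell
# Hypothetical wrapper around the awk command above (the name is illustrative).
last_word_awk() {
    awk '{ match($0, "[^ ]+$"); if (RSTART > 0) print substr($0, RSTART, RLENGTH) }' "$@"
}

# Line 1 has a tab inside its last "word"; line 2 ends in a space,
# so match() leaves RSTART at 0 and nothing is printed for it.
printf 'foo bar\tbaz\na b \n' | last_word_awk
# bar<TAB>baz
```

Only match, RSTART, RLENGTH and substr are used, so this should not depend on GNU awk extensions.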
cuonglm
  • 2
    awk is superset of grep? They're two separate utilities, with two separate purposes. Can you explain what you mean by that? – Sergiy Kolodyazhnyy Mar 30 '17 at 17:07
  • 2
    @Serg: awk can do anything grep does, in standard version – cuonglm Mar 30 '17 at 17:10
  • 1
    it sure can do almost everything that grep does, but asserting that one utility is superset of another unrelated utility is just wrong. grep is line matching tool, awk is text processing language with C-like syntax - different animals. Perl for instance can do anything awk can do and even more, but it's not a superset - it's a completely separate programming language. Don't get me wrong, Your answer is perfectly fine as far as the awk code part, but just that phrase is just . . . wrong – Sergiy Kolodyazhnyy Mar 30 '17 at 17:17
  • @cuonglm, although not perfect, your erased solutions were very interesting. You should not delete them! – JJoao Mar 30 '17 at 17:23
  • @Serg Yes, so that's why I said it's a superset: line matching is one of the tasks a text-processing language can do. See also this answer for a similar view http://unix.stackexchange.com/a/303049/38906 – cuonglm Mar 30 '17 at 17:24
  • 1
    @JJoao: I need to spend more time to make it perfect; you and others can still view it in the revision history. It needs more documentation if there's no perfect solution, and that takes time I don't have now. – cuonglm Mar 30 '17 at 17:25
4

in awk:

awk 'NF>0 {print $NF}'     ## fails with end-of-line spaces; \thanks{cuonglm}
awk '/[^ ]$/ {print $NF}'  ## fails with tabs in the last field; \thanks{Terdon}
awk -F ' +'  '$NF {print $NF}'            ## too tricky
awk 'match($0,/[^ ]+$/,a){print a[0]}'    ## follows def. of grep -o (3-argument match needs GNU awk)

In sed:

sed -nr 's/(|.* )([^ ]+)$/\2/p'

\thanks{cuonglm}

and (why not?) in Perl

perl -nlE '/([^ ]+)$/ and say $1'
JJoao
  • @terdon, you are an evil creature! ☺ ; awk -F ' +' '$NF {print $NF}' ? – JJoao Mar 30 '17 at 17:35
  • 1
    Ooh, clever! Can't think of a way to break that one no :) – terdon Mar 30 '17 at 17:54
  • Using special examples is not a good way to compare different languages. Could I then say Perl is much better at dealing with data than awk? – eexpress Mar 31 '17 at 05:48
  • @utopiceexpress, Perl is a wonderful language (perl -nle '/([^ ]+)$/ and print $1') but that was not the question... – JJoao Mar 31 '17 at 07:57
4

Let's go over what your grep command does first:

  • the -o tells grep to output only the matching part instead of the whole line
  • the -E flag allows use of extended regular expressions
  • '[^ ]+$' will match any non-space character repeated one or more times at the end of the line - basically a word at the end of the line.

Test run:

$ cat input.txt
to be or not to be
that is the question
$ grep -oE '[^ ]+$' input.txt                                      
be
question
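Lines that do not end in a non-space character produce no output at all, which matters when comparing against awk or sed later. A small sketch with a made-up sample (the sample function and its contents are assumptions for illustration):

```shell
# Hypothetical sample: a normal line, an empty line, a line ending in a
# space, and a line that is a single word.
sample() { printf 'to be or not to be\n\ntrailing space \nword\n'; }

# Only the first and last lines end in a non-space character, so only
# they produce output.
sample | grep -oE '[^ ]+$'
# be
# word
```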

Now, how can we do the same in awk? That's easy enough, considering that awk by default operates on whitespace-separated entries of each line (we call them words; awk calls them fields). Thus we can print $NF: NF holds the number of fields, and $NF refers to the field with that number, i.e. the last one. But notice that the grep command only matches non-blank lines, i.e. lines with at least one word. Thus we need to add a condition for awk: operate only on lines where NF, the number of fields, is above zero.

awk 'NF{print $NF}' input.txt
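One caveat, sketched below with a made-up sample: awk's default field splitting treats tabs as separators and ignores trailing blanks, so $NF is not always the text that '[^ ]+$' matches. The sample function is an assumption for illustration.

```shell
# Hypothetical sample: a tab inside the last word, then a line ending in a space.
sample() { printf 'foo bar\tbaz\na b \n'; }

# grep only excludes literal spaces, so the tab stays inside the match,
# and the line ending in a space does not match at all.
sample | grep -oE '[^ ]+$'
# bar<TAB>baz

# awk splits on tabs too and drops the trailing space.
sample | awk 'NF{print $NF}'
# baz
# b
```

For input whose words are separated by single spaces and whose lines have no trailing blanks, the two commands agree.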

It should be noted that GNU awk at least supports extended regex (I'm not familiar with others as extensively, so won't make claims about others). Thus we could also write a variation on cuonglm's answer:

$ awk '{if(match($0,/[^ ]+$/)) print $NF}' input.txt               
be
question

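With GNU sed one can use extended regular expressions as well - that requires the -r flag, but you can't simply use the same regular expression. sed has no equivalent of grep's -o, so you need to capture the word in a group and substitute it for the whole line with the backreference \1.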

$ cat input.txt                                                                                                          
to be or not to be
that is the question
blah 

$ sed -r -e 's/^.* ([^ ]+)$/\1/' input.txt                                                                             
be
question
blah

It is possible to obtain the desired result with a basic regular expression like so:

$ cat input.txt                                                    
to be or not to be
that is the question
blah 
$ sed 's/^.* \([^ ].*\)$/\1/' input.txt                           
be
question
blah 
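Both sed commands above print non-matching lines unchanged (that is why the blah line with its trailing space still appears), whereas grep -o prints nothing for them. A minimal sketch of a stricter variant, assuming a sed that accepts the \{0,1\} interval on a group in basic regular expressions; the function name last_word_sed is made up for this example:

```shell
# Hypothetical helper: print only the trailing run of non-space characters,
# and only for lines that actually end in one.
last_word_sed() {
    # \(.* \)\{0,1\}   optional prefix that must end in a space
    # \([^ ][^ ]*\)$   one or more non-space characters up to end of line
    # -n plus the p flag: print only lines where the substitution happened
    sed -n 's/^\(.* \)\{0,1\}\([^ ][^ ]*\)$/\2/p' "$@"
}

printf 'to be or not to be\nblah \nfoo bar\tbaz\n' | last_word_sed
# be
# bar<TAB>baz
```

The optional prefix lets a line consisting of a single word match as well, which the `^.* ` prefix in the commands above does not allow.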

For more info, please refer to these posts:

  • very good and clear answer +1. (although there are minor behavior differences) – JJoao Mar 31 '17 at 08:01
  • Yes, great explanation, but your examples are not 100% equivalent to the grep. I've been consistently and pedantically pointing out edge cases under all answers, so here's yours: printf 'foo bar\tbaz\na b \n'. Try that one and compare your sed or awk and the OP's grep. – terdon Mar 31 '17 at 08:06
0

This is the sed-equivalent of what you are trying to do with grep:

# there is a space and a tab in the [...]
sed -ne '/[^  ]$/s/.*[    ]//p'

# this makes the code intent visually unambiguous
TAB=`echo 'x' | tr 'x' '\011'`; # tab
SPC=`echo 'x' | tr 'x' '\040'`; # space
h="[$SPC$TAB]";                 # whitespace
S="[^$TAB$SPC]";                # nonwhitespace

sed -ne "/$S\$/s/.*$h//p"

sed -ne "s/.*$h\($S$S*\)\$/\1/p"
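One way to check candidates like these against the original grep, as a rough sketch: feed both the same awkward sample and compare the output. The helper name same_as_grep and the sample lines are made up for illustration.

```shell
# Hypothetical sample covering the awkward cases: a tab in the last word,
# a trailing space, an empty line, and a single word.
sample() { printf 'foo bar\tbaz\na b \n\nblah\n'; }

# Hypothetical helper: run a candidate command (passed as arguments) on the
# sample and report whether its output equals that of the original grep.
same_as_grep() {
    expected=$(sample | grep -oE '[^ ]+$')
    actual=$(sample | "$@")
    if [ "$actual" = "$expected" ]; then echo same; else echo different; fi
}

TAB=$(printf '\t')

# Treats tabs as whitespace, which the grep pattern does not.
same_as_grep sed -ne "/[^ $TAB]\$/s/.*[ $TAB]//p"
# different

# Same regex as the grep, extracted with match/substr.
same_as_grep awk '{ if (match($0, /[^ ]+$/)) print substr($0, RSTART, RLENGTH) }'
# same
```

A "same" result only says the candidate agrees on this sample, not on every possible input.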
  • No it isn't! That will print all lines of the file, not only those that have at least one non-space character before the end. All you're doing here is removing everything until the last space of a line. You then print everything. – terdon Mar 30 '17 at 17:06
  • Try printf 'foo bar\tbaz\na b \n'. Your first one will print an empty line and your second will print b and a space. This is harder than it looks, I know :) – terdon Mar 30 '17 at 17:58
  • I know, that's why I asked the OP to clarify and didn't answer. If you do answer, however, you need to make sure whatever command you propose gives the exact same output as grep -oE '[^ ]+$'. Your, admittedly very clever, solutions don't do that. Try printf 'foo bar\tbaz\na\tb\n' and pass through the grep and your seds. I don't think you can do it in any other way than what JJoao used with the (|.* ). – terdon Mar 30 '17 at 18:27
  • sed -ne 's/[^ ][^ ]*$/\n&/;s/.*\n//p' Assuming [^ ] => no space only as is the grep. This is for POSIX sed. There's no way to show an escaped newline on the RHS of s/// hence this. –  Mar 30 '17 at 19:00