How to merge multiple piped awk commands into a single awk command

Question

I am writing a script to filter a file that has contents like

a:10
b:20
c:60
# comment
{{# random mustache templating}}
d=4
e=6

to get the output which would look like

a
b
c
d
e

Here is my command

cat filename.txt | awk '{$1=$1;print}' | awk -F'{{' '{print $1}' | awk -F'=' '{print $1}' | awk -F':' '{print $1}' | awk -F'#' '{print $1}' | awk /./

Purpose:

Remove anything in a line from the occurrence of characters '=' or ':'.
Remove the line that starts with '{{' to remove templating.
Trim whitespaces at the beginning and end of each line.
Remove all blank lines.

As I am new to bash, how can I make this command shorter?

Is there a reason to have so many rules sequentially? Perhaps you could write a regex? — Harrys Kavan, Nov 18 '20 at 11:53

Stephen Kitt · Accepted Answer · 2020-11-18T10:54:24.027

2

The field separator can be a full regex, so

awk -F'[:#=]' '!/^{{/ && length($1) > 0 { split($1, a, " "); print a[1] }' filename.txt

is sufficient: any one of ‘:’, ‘#’, ‘=’ will act as a separator. We exclude lines starting with “{{”, match lines where $1 is non-empty, split $1 on whitespace, and print the first resulting field.
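For readers who find the one-liner dense, here is a minimal sketch of the same logic spread over several commented rules. The wrapper name first_key is hypothetical, and the braces are escaped on the assumption that some awk implementations treat an unescaped { as the start of an interval expression.

```shell
#!/bin/sh
# first_key is a hypothetical wrapper name, not part of the answer.
first_key() {
    awk -F'[:#=]' '
        # Skip mustache templating lines; braces are escaped for portability.
        /^\{\{/ { next }
        # Skip records whose first field is empty (empty lines, "#" comments).
        length($1) == 0 { next }
        # Split the first field on whitespace and print its first word.
        { split($1, a, " "); print a[1] }
    ' "$@"
}

# Usage with the sample input from the question; prints a, b, c, d, e
# on separate lines.
first_key <<'EOF'
a:10
b:20
c:60
# comment
{{# random mustache templating}}
d=4
e=6
EOF
```

With no file arguments the function reads standard input, so it can also sit at the end of a pipeline.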

edited Nov 18 '20 at 10:54

answered Nov 18 '20 at 10:20

Stephen Kitt

434,908

4

For what it's worth, I agree with @AdminBee that inline code markup is cleaner and more legible than pretty-quotes. – terdon Nov 18 '20 at 11:13
I suppose it can be blinding to see print $1 four times in one command, but the OP never says that they want only the first “word” from the input line. Arguably, an input of    foo    bar should produce an output of foo bar, not just foo. P.S. I see now that the first version of your answer got this “right”; it’s not clear to me why you changed it. – G-Man Says 'Reinstate Monica' Nov 19 '22 at 20:21
@G-Man I don’t remember now, it’s possible there were clarifying comments that have been deleted since. The original commands in the question only kept the first word, and since the question asks for a simplification of those commands, it seems reasonable to keep that behaviour even though it’s not identified explicitly. – Stephen Kitt Nov 20 '22 at 16:21
“The original commands in the question only kept the first word…” Um, where do you see that? Therein lies the point of my comment; the pipeline in the question says print $1 four times, and each one is with a field separator other than whitespace. So it keeps the “first word” in the sense that foo bar is the first word of ⁠   foo    bar:42. … … … … … … … … … … … … … … … No worries; I have plenty of answers where I cannot reconstruct what I was thinking when I wrote them. – G-Man Says 'Reinstate Monica' Nov 24 '22 at 03:01
@G-Man ah yes, I am indeed blind… The answer timeline shows it being accepted after the relevant edit, so presumably there was a reason for it. – Stephen Kitt Nov 24 '22 at 05:32
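The behaviour debated in these comments can be made concrete with a rough sketch that mirrors the question's pipeline step by step in a single awk call: it keeps interior words (so "   foo    bar:42" yields "foo bar") and cuts at the first {{, =, : or #. The function name pipeline_equiv is hypothetical, and this is only one reading of what the original pipeline does, not the accepted answer's method.

```shell
#!/bin/sh
# pipeline_equiv is a hypothetical name for this sketch.
pipeline_equiv() {
    awk '
        {
            $1 = $1                     # rebuild the record: trim and squeeze whitespace
            sub(/(\{\{|[=:#]).*/, "")   # cut at the first "{{", "=", ":" or "#"
        }
        /./                             # print only records that are still non-empty
    ' "$@"
}

# Usage: prints "foo bar", then "a", then "d".
printf '   foo    bar:42\na:10\n# comment\n{{# templating}}\n\nd=4\n' | pipeline_equiv
```

Cutting at the earliest of the four markers gives the same result as cutting at each marker in turn, which is why one sub() call can stand in for the four print $1 stages.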

Ed Morton · Answer 2 · 2020-11-19T00:21:43.027

1

Keep it simple:

$ awk 'NF && ($1 !~ /^(#|\{+)/) { sub(/[:=].*/,""); print $1 }' file
a
b
c
d
e

edited Nov 19 '20 at 00:21

answered Nov 19 '20 at 00:15

Ed Morton

31,617

(1) I suppose it can be blinding to see print $1 four times in one command, but the OP never says that they want only the first “word” from the input line. Arguably, an input of ⁠   foo    bar should produce an output of foo bar, not just foo. (2) The question is a bit non-specific, but it does explicitly say “Remove all blank lines”, and your code outputs a blank line for an input line that begins with : or  =. … (Cont’d) – G-Man Says 'Reinstate Monica' Nov 19 '22 at 20:21
(Cont’d) … (3) The OP doesn’t say what they want done with #, but their working code removes the first # and everything beyond it, so foo#bar becomes foo. Your code passes foo#bar through unchanged. – G-Man Says 'Reinstate Monica' Nov 19 '22 at 20:21

Bumbling Badger · Answer 3 · 2020-11-19T03:43:58.647

To achieve the result above, I just used regex for the field separator, regex to select the lines and {print $1} to print the first column.

I see no leading whitespace or blank lines in your example, but if you need to deal with these, see my variations to this command below.

awk -F'[:=]' '!/^[#{]/{print $1}' filename.txt

Result:

a
b
c
d
e

If you have leading or trailing whitespace, the following may work, though I will admit that without seeing an example it is tricky for me to visualise.

awk -F'[:=]' '{gsub(/^\s+|\s+$/,"",$1)} !/^[#{]/{print $1}' filename.txt

To cover every possible case, based on your comments, I have adapted the example. Now, we have leading and trailing whitespace and empty lines.

a:10
b :20
  c:60
# comment
{{# random mustache templating}}
d=4
e =6

This is the slightly altered command to deal with this:

awk -F'[:=]' '{gsub(/^\s+|\s+$/,"",$1)} !/^[#{]/ && !/^$/{print $1}' filename.txt

The field separator regex separates the first field $1 from everything that comes after : or =.
gsub removes all leading and trailing whitespace from $1.
The patterns before {print $1} skip lines starting with # or { (comments and templating) as well as empty lines.

This produces the following result from the adapted example:

a
b
c
d
e
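One portability note: \s is a GNU awk extension rather than POSIX syntax. The sketch below is a variant that should behave the same on the adapted example using a plain bracket expression (spaces and tabs only). The function name trim_keys is hypothetical, and testing $1 instead of the whole record is my own assumption about the desired behaviour: it avoids printing an empty line for input that begins with : or =.

```shell
#!/bin/sh
# trim_keys is a hypothetical name; [ \t] stands in for the GNU-only \s.
trim_keys() {
    awk -F'[:=]' '
        # Trim leading and trailing spaces and tabs from the first field.
        { gsub(/^[ \t]+|[ \t]+$/, "", $1) }
        # Print the key unless it is empty or starts with "#" or "{".
        $1 != "" && $1 !~ /^[#{]/ { print $1 }
    ' "$@"
}

# Usage with the adapted example; prints a, b, c, d, e on separate lines.
trim_keys <<'EOF'
a:10
b :20
  c:60
# comment
{{# random mustache templating}}
d=4
e =6
EOF
```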

@RishabhBohra The current accepted answer doesn't deal with leading space before the {{# random mustache templating}} in my adapted example. Instead, it leaves behind {{. However, it really depends on what you are looking for? — Bumbling Badger, Nov 19 '20 at 03:57
(1) As you know, the question says “Remove all blank lines”, but your code outputs a blank line for an input line that begins with : or  =. (2) The OP doesn’t say what they want done with #, but their working code removes the first # and everything beyond it, so foo#bar becomes foo. Your code passes foo#bar (and even foo #bar) through unchanged. — G-Man Says 'Reinstate Monica', Nov 19 '22 at 20:23

αғsнιη · Answer 4 · Nov 19 '20 at 05:52

0

Using sed:

sed -E '{ s/\s*([^:=]*).*/\1/ }; /^(\{\{|#|$)/d' infile

Swap the order of the commands above, i.e. sed -E '/.../d; { ... }', if you also want to keep lines in which whitespace precedes the {{ or # characters.
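To see what the ordering changes, here is a small sketch that runs both variants on the same input. The function names are hypothetical, and GNU sed is assumed because of \s.

```shell
#!/bin/sh
# Hypothetical helper names; GNU sed is assumed because of \s.

# Original order: strip first, then delete, so indented comments are removed too.
strip_then_delete() {
    sed -E 's/\s*([^:=]*).*/\1/; /^(\{\{|#|$)/d' "$@"
}

# Swapped order: delete first, so only lines that begin directly with
# "{{" or "#" (or are empty) are removed; indented ones survive.
delete_then_strip() {
    sed -E '/^(\{\{|#|$)/d; s/\s*([^:=]*).*/\1/' "$@"
}

# Prints "a" and "d".
printf 'a:10\n  # indented comment\n# comment\nd=4\n' | strip_then_delete

# Prints "a", "# indented comment" and "d".
printf 'a:10\n  # indented comment\n# comment\nd=4\n' | delete_then_strip
```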

answered Nov 19 '20 at 05:52

αғsнιη

41,407

codeholic24 · Answer 5 · 2020-11-19T05:24:25.280

-1

Maybe this will help you achieve the expected result:

#!/bin/bash
dynamic_array=()
while read -r line
do
    # Keep only the first character of the line.
    var=$(echo "$line" | cut -c 1)
    # Skip comment and templating lines.
    if ! { [ "$var" = '#' ] || [ "$var" = '{' ] || [ "$var" = '}' ]; }
    then
        dynamic_array+=("$var")
    fi
done < A.txt
# Print one entry per line, dropping duplicates.
str_array_value="${dynamic_array[*]}" ; echo "$str_array_value" | tr ' ' '\n' | awk '!seen[$0]++'

Output :

a   
b   
c    
d
e

edited Nov 19 '20 at 05:24

answered Nov 18 '20 at 14:18

codeholic24

307

2

Please note that while this works, using shell loops to process text files is very inefficient, and in most cases should be avoided in favor of one of the dedicated tools like awk, sed, perl or grep. – AdminBee Nov 18 '20 at 14:50
1

Copy/paste that into http://shellcheck.net and it'll tell you about some of the issues and read why-is-using-a-shell-loop-to-process-text-considered-bad-practice for why this is the wrong approach anyway. – Ed Morton Nov 19 '20 at 00:11
