1

I am writing a script to filter a file that has contents like

a:10
b:20
c:60
# comment
{{# random mustache templating}}
d=4
e=6

to get the output which would look like

a
b
c
d
e

Here is my command

cat filename.txt | awk '{$1=$1;print}' | awk -F'{{' '{print $1}' | awk -F'=' '{print $1}' | awk -F':' '{print $1}' | awk -F'#' '{print $1}' | awk /./

Purpose:

  • Remove everything in a line from the first occurrence of '=' or ':' onward.
  • Remove lines that start with '{{' (templating).
  • Trim whitespace at the beginning and end of each line.
  • Remove all blank lines.

As I am new to bash, how can I make this command shorter?

αғsнιη
borz

5 Answers

2

The field separator can be a full regex, so

awk -F'[:#=]' '!/^{{/ && length($1) > 0 { split($1, a, " "); print a[1] }' filename.txt

is sufficient: any one of ‘:’, ‘#’, ‘=’ will act as a separator. We exclude lines starting with “{{”, match lines where $1 is non-empty, split $1 on whitespace, and print the first resulting field.
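As a rough check of the behaviour described above, the sketch below runs a lightly adapted form of the command on the sample from the question. Two things are assumptions rather than part of the answer: the extra `   foo    bar:42` line, added to show the "first word only" behaviour, and writing `{{` as `[{][{]` so that no awk implementation tries to read the braces as an interval expression.

```shell
# Sample input from the question, plus one hypothetical line with
# leading whitespace and a two-word key.
input=$(printf '%s\n' 'a:10' 'b:20' 'c:60' '# comment' \
  '{{# random mustache templating}}' 'd=4' 'e=6' '   foo    bar:42')

# Split on ':', '#' or '='; skip lines starting with "{{" and lines
# whose first field is empty; print the first word of the first field.
result=$(printf '%s\n' "$input" |
  awk -F'[:#=]' '!/^[{][{]/ && length($1) > 0 { split($1, a, " "); print a[1] }')

printf '%s\n' "$result"
```

This prints a, b, c, d, e and foo, one per line: the comment and templating lines are dropped, and the last line is reduced to its first word.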

Stephen Kitt
  • For what it's worth, I agree with @AdminBee that inline code markup is cleaner and more legible than pretty-quotes. – terdon Nov 18 '20 at 11:13
  • I suppose it can be blinding to see print $1 four times in one command, but the OP never says that they want only the first “word” from the input line. Arguably, an input of ⁠   foo    bar should produce an output of foo bar, not just foo.  P.S. I see now that the first version of your answer got this “right”; it’s not clear to me why you changed it. – G-Man Says 'Reinstate Monica' Nov 19 '22 at 20:21
  • @G-Man I don’t remember now, it’s possible there were clarifying comments that have been deleted since. The original commands in the question only kept the first word, and since the question asks for a simplification of those commands, it seems reasonable to keep that behaviour even though it’s not identified explicitly. – Stephen Kitt Nov 20 '22 at 16:21
  • “The original commands in the question only kept the first word…”  Um, where do you see that?  Therein lies the point of my comment; the pipeline in the question says print $1 four times, and each one is with a field separator other than whitespace.  So it keeps the “first word” in the sense that foo bar is the first word of ⁠   foo    bar:42. … … … … … … … … … … … … … … … No worries; I have plenty of answers where I cannot reconstruct what I was thinking when I wrote them. – G-Man Says 'Reinstate Monica' Nov 24 '22 at 03:01
  • @G-Man ah yes, I am indeed blind… The answer timeline shows it being accepted after the relevant edit, so presumably there was a reason for it. – Stephen Kitt Nov 24 '22 at 05:32
1

Keep it simple:

$ awk 'NF && ($1 !~ /^(#|\{+)/) { sub(/[:=].*/,""); print $1 }' file
a
b
c
d
e
Ed Morton
  • (1) I suppose it can be blinding to see print $1 four times in one command, but the OP never says that they want only the first “word” from the input line.  Arguably, an input of ⁠   foo    bar should produce an output of foo bar, not just foo.  (2) The question is a bit non-specific, but it does explicitly say “Remove all blank lines”, and your code outputs a blank line for an input line that begins with : or  =. … (Cont’d) – G-Man Says 'Reinstate Monica' Nov 19 '22 at 20:21
  • (Cont’d) …  (3) The OP doesn’t say what they want done with #, but their working code removes the first # and everything beyond it, so foo#bar becomes foo.  Your code passes foo#bar through unchanged. – G-Man Says 'Reinstate Monica' Nov 19 '22 at 20:21
0

To achieve the result above, I used a regex for the field separator, a regex to select the lines, and {print $1} to print the first column.

I see no leading whitespace or blank lines in your example, but if you need to deal with these, see my variations to this command below.

awk -F'[:=]' '!/^[#{]/{print $1}' filename.txt

Result:

a
b
c
d
e

If you have leading or trailing whitespace, the following may work, although without seeing an example it is tricky for me to visualise.

awk -F'[:=]' '{gsub(/^\s+|\s+$/,"",$1)} !/^[#{]/{print $1}' filename.txt

To cover every possible case, based on your comments, I have adapted the example. Now, we have leading and trailing whitespace and empty lines.

a:10
b :20
  c:60
# comment

{{# random mustache templating}}
d=4
e =6

This is the slightly altered command to deal with this:

awk -F'[:=]' '{gsub(/^\s+|\s+$/,"",$1)} !/^[#{]/ && !/^$/{print $1}' filename.txt
  1. The field separator regex separates the first field $1 from everything which comes after : or =
  2. gsub removes all leading and trailing whitespace from $1 (\s is a GNU awk extension; other awks may need [ \t] or [[:space:]] instead)
  3. The patterns before {print $1} skip lines starting with # or { (comments and 'templating') and, via !/^$/, blank lines.

This produces the following result from the adapted example:

a
b
c
d
e
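For anyone who wants to try this, here is a minimal sketch that feeds the adapted example through the command. It makes two assumptions: `[ \t]` stands in for `\s`, so only spaces and tabs are trimmed, and the mustache line is indented, as mentioned in the comments.

```shell
# Adapted sample with leading/trailing whitespace and a blank line.
# The indentation before "{{" is an assumption taken from the discussion.
input=$(printf '%s\n' 'a:10' 'b :20' '  c:60' '# comment' '' \
  '  {{# random mustache templating}}' 'd=4' 'e =6')

# Trim spaces and tabs around the first field, then skip comment,
# templating, and empty lines before printing the key.
result=$(printf '%s\n' "$input" |
  awk -F'[:=]' '{ gsub(/^[ \t]+|[ \t]+$/, "", $1) }
                !/^[#{]/ && !/^$/ { print $1 }')

printf '%s\n' "$result"
```

Trimming $1 makes awk rebuild the record, which is why the indented templating line still matches the `^[#{]` test and is skipped.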
  • @RishabhBohra The current accepted answer doesn't deal with leading space before the {{# random mustache templating}} in my adapted example. Instead, it leaves behind {{. However, it really depends on what you are looking for? – Bumbling Badger Nov 19 '20 at 03:57
  • (1) As you know, the question says “Remove all blank lines”, but your code outputs a blank line for an input line that begins with : or  =. (2) The OP doesn’t say what they want done with #, but their working code removes the first # and everything beyond it, so foo#bar becomes foo.  Your code passes foo#bar (and even foo #bar) through unchanged. – G-Man Says 'Reinstate Monica' Nov 19 '22 at 20:23
0

Using sed:

sed -E '{ s/\s*([^:=]*).*/\1/ }; /^(\{\{|#|$)/d' infile

Swap the order of the commands above to sed -E '/.../d; { ... }' if you also want to keep lines where {{ or # is preceded by whitespace rather than starting the line.
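The effect of the command order can be seen in a small sketch. The input lines are hypothetical, and `[[:space:]]` is used in place of `\s` on the assumption that it is accepted by more sed implementations.

```shell
# Hypothetical input: a key, an indented comment, a blank line,
# a templating line, and another key.
input=$(printf '%s\n' 'a:10' '  # indented comment' '' '{{# templating}}' 'd=4')

# Substitute first, then delete: the indented comment is trimmed to
# "# indented comment", so it then matches ^# and is deleted.
subst_then_delete=$(printf '%s\n' "$input" |
  sed -E -e 's/[[:space:]]*([^:=]*).*/\1/' -e '/^(\{\{|#|$)/d')

# Delete first, then substitute: the indented comment does not start
# with '#', so it survives the delete and is only trimmed.
delete_then_subst=$(printf '%s\n' "$input" |
  sed -E -e '/^(\{\{|#|$)/d' -e 's/[[:space:]]*([^:=]*).*/\1/')

printf 'substitute, then delete:\n%s\n' "$subst_then_delete"
printf 'delete, then substitute:\n%s\n' "$delete_then_subst"
```

With the first order only a and d remain; with the second, the trimmed comment line is kept between them.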

αғsнιη
-1

Maybe this will help you achieve the expected result:

#!/bin/bash

dynamic_array=()

# Collect the first character of every line that is not a comment or templating line
while read -r line; do
  var=$(echo "$line" | cut -c 1)
  if ! { [ "$var" = '#' ] || [ "$var" = '{' ] || [ "$var" = '}' ]; }; then
    dynamic_array+=("$var")
  fi
done < A.txt

str_array_value="${dynamic_array[*]}" ; echo "$str_array_value" | tr ' ' '\n' | awk '!seen[$0]++'

Output :

a
b
c
d
e