
I am reading a file line-by-line. Each line looks like this:

xxyu: JHYU_IOPI

Each line is passed to awk as shown below. I want to print the line before the one that matches the pattern; I can achieve this with grep, but I want to know where I went wrong with awk.

#!/bin/bash
while read i
do
 awk '/$i/{print a}{a=$0}' ver_in.txt
done<in.txt

I also tried this:

#!/bin/bash
while read i
do
 awk -v var="$i" '/var/{print a}{a=$0}' jil.txt
done<in.txt
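
For comparison, the grep approach that does print the previous line looks something like this (just a sketch; note that grep -B 1 prints the matching line itself as well as the line before it):

#!/bin/bash
while read i
do
 grep -B 1 -- "$i" ver_in.txt
done<in.txt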

Edit: I am now using awk only (no shell read loop) after getting the suggestion not to process text in sh. My input and desired output are shown below:

EDIT 1: edited the input for @Ed Morton's awk script as shown below.

Input file: cat file

/* ----------------- AIX_RUN_WATCH ----------------- */

insert_job: AIX_RUN_WATCH job_type: BOX
owner: root
permission:
date_conditions: 1
days_of_week: su
start_times: "22:00"
alarm_if_fail: 1
alarm_if_terminated: 1
group: app
send_notification: 0
notification_emailaddress:

/* ----------------- AIX_stop ----------------- */

insert_job: AIXstop job_type: CMD
box_name: AIX_RUN_WATCH
command: ls
machine: cfg.mc
owner: root
permission:
date_conditions: 0
box_terminator: 1
std_out_file: ">> /tmp/${AUTOSERV}.${AUTO_JOB_NAME}.$(date +%Y%m%d).stdout"
std_err_file: ">> /tmp/${AUTOSERV}.${AUTO_JOB_NAME}.$(date +%Y%m%d).stderr"
alarm_if_fail: 1
alarm_if_terminated: 1
group: app
send_notification: 1

/* ----------------- AIX_start ----------------- */

insert_job: AIX_start job_type: CMD
box_name: AIX_RUN_WATCH
command: ls
machine: cfg.mc
owner: root
permission:
date_conditions: 0
box_terminator: 1
std_out_file: ">> /tmp/${AUTOSERV}.${AUTO_JOB_NAME}.$(date +%Y%m%d).stdout"
std_err_file: ">> /tmp/${AUTOSERV}.${AUTO_JOB_NAME}.$(date +%Y%m%d).stderr"
alarm_if_fail: 1
alarm_if_terminated: 1
group: app

cat targets
box_name: AIX_RUN_WATCH

Expected output -

 box_name: AIX_RUN_WATCH
 insert_job: AIX_stop
 insert_job: AIX_start
Renga

3 Answers


For the first attempt, you need to use double quotes so that the shell expands the variable, and escape awk's $ so that the shell does not expand it. Be aware that this approach will break awk if the variable $i contains characters that are special to awk or the shell, such as \ or /. (I'm skipping over some other issues with your command for now.)

while read i
do
 awk "/$i/{print a}{a=\$0}" ver_in.txt
done<in.txt
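
To see why special characters break it, suppose $i contained a / (a hypothetical value, not from your data). After the shell expands the double quotes, awk receives a program it cannot parse:

i='foo/bar'
# awk now sees:  /foo/bar/{print a}{a=$0}
# the regex ends at the second /, so the rest of the program is garbled
awk "/$i/{print a}{a=\$0}" ver_in.txt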

For the second attempt, you need to match the variable against the current line explicitly, using either a regex match or a string match. Using a regex match (partial regex match):

while read i
do
 awk -v var="$i" '$0 ~ var{print a}{a=$0}' jil.txt
done<in.txt

Or using a string match (full string match):

while read i
do
 awk -v var="$i" '$0==var{print a}{a=$0}' jil.txt
done<in.txt
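
The difference matters for data like yours: the regex/partial match finds var anywhere in the line, while $0==var only matches when the whole line is exactly var. A quick check with a hypothetical value of var:

$ echo 'xxyu: JHYU_IOPI' | awk -v var='JHYU' '$0 ~ var{print "matched"}'
matched
$ echo 'xxyu: JHYU_IOPI' | awk -v var='JHYU' '$0==var{print "matched"}'
$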

Now, since what you are really trying to do is print the previous line of each matching pattern, you can do it all with awk alone and drop the shell loop entirely. Here we do a full string match:

awk 'NR==FNR { str[$0]; next }
($0 in str) && prev!="" { print prev } { prev=$0 }' in.txt ver_in.txt

or doing partial regex match:

awk 'NR==FNR { patt[$0]; next }
{ for(ptrn in patt) if($0 ~ ptrn && prev!="") print prev; prev=$0 }' in.txt ver_in.txt

or doing partial string match:

awk 'NR==FNR { strings[$0]; next }
{ for(str in strings) if(index($0, str) && prev!="") print prev; prev=$0 }' in.txt ver_in.txt
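
(index() returns the 1-based position of the substring, or 0 if it is absent, so it can be used directly as a true/false condition; a quick check:)

$ awk 'BEGIN{ print index("xxyu: JHYU_IOPI", "JHYU"), index("xxyu: JHYU_IOPI", "zzz") }'
7 0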

or doing full regex match:

awk 'NR==FNR { patt[$0]; next }
{ for(ptrn in patt) if($0 ~ "^"ptrn"$" && prev!="") print prev; prev=$0 }' in.txt ver_in.txt
αғsнιη
  • I edited the question with input for the awk script; I am just ignoring the sh approach. Apologies for not giving the input earlier. – Renga Oct 25 '21 at 08:36

You don't need a while read loop for this, and doing text processing in sh is a bad idea (see Why is using a shell loop to process text considered bad practice?).

Instead get your awk script to process both files.

awk 'NR==FNR { re = $0 "|" re ; next}; # append input line and | to re
     FNR == 1 { sub(/\|$/,"",re) };    # remove trailing | on 1st line of 2nd file
 $0 ~ re { print a }; # if the current line matches re, print a
 {a = $0}' in.txt ver_in.txt

While reading the first file (in.txt), it builds up a regular expression in a variable called re by appending each input line and the regex "alternation" (i.e. OR) operator.

When it has finished reading the first file, the first thing it needs to do is remove the trailing | from re. This is necessary because re will always end with a | character due to the way it is constructed. If we don't remove it, that trailing | will cause the regex to match every line of ver_in.txt.

After that, print variable a if the current input line matches the regex in variable re (this will print an empty line if the first line of ver_in.txt matches re, because a is still empty; if you don't want that to happen, change that line from $0 ~ re {print a} to $0 ~ re && a != "" {print a}).

Then, whether it matches or not, set a=$0.
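
Written out in full, that variant (only printing a when it is non-empty) would be:

awk 'NR==FNR { re = $0 "|" re ; next};
     FNR == 1 { sub(/\|$/,"",re) };
 $0 ~ re && a != "" { print a }; # skip the empty a before the 1st line
 {a = $0}' in.txt ver_in.txt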

NOTE: the NR==FNR {... ; next} is a very common awk idiom for handling the first input file in a different manner than the second and subsequent input files. NR is the global line counter for all files being read, and FNR is the line counter for the current file....so if NR==FNR, that means we're reading the first file. The next statement skips to the next input line, preventing the remainder of the awk script from being executed while in the first file.
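
A quick way to see how NR and FNR relate (throwaway files, purely for illustration):

$ printf 'a\nb\n' > f1; printf 'c\nd\n' > f2
$ awk '{ print FILENAME, NR, FNR }' f1 f2
f1 1 1
f1 2 2
f2 3 1
f2 4 2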

You didn't provide a complete data sample, so I made my own to test with:

$ cat in.txt 
xxyu: JHYU_IOPI
foo
bar

This in.txt file will cause re to equal bar|foo|xxyu: JHYU_IOPI

BTW, because the awk script is doing a regex match against re, the lines in in.txt are treated as regular expressions, not as fixed text. That means that if you want any regex special characters (like ., |, [ or ] and many others) in in.txt to be treated as literal characters, you'll need to escape them with a backslash....you would have had to do this with your original sh+awk loop too.
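
For example (a made-up pattern, not from your data), an unescaped . matches any character, while \. matches only a literal dot:

$ printf 'xxyu: JHYU_IOPI\n' | awk '/xx.u: JHYU_IOPI/{ print "matched" }'
matched
$ printf 'xxyu: JHYU_IOPI\n' | awk '/xx\.u: JHYU_IOPI/{ print "matched" }'
$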

$ cat ver_in.txt 
a line 1
xxyu: JHYU_IOPI
b line 3
d line 4
bar
e line 6
f line 7
foo

Output from the awk script above:

a line 1
d line 4
f line 7
cas
  • BTW, this awk script is run once and processes both files in one pass; this is about as fast as it gets. Your while-read-awk loop runs awk once for every line in in.txt, which is about as slow as it gets - partly because of the overhead of starting up awk for each line and partly because reading text in a shell while loop is slow. – cas Oct 24 '21 at 06:59
  • I gave up on NR==FNR after one of my clients passed an empty first file, and my awk tried to build a 40GB array from the second file. I moved to awk 'myAwk' fSeq=1 A.file fSeq=2 B.file. This also works for multiple reference files. – Paul_Pedant Oct 24 '21 at 08:15
  • @Paul_Pedant GIGO. BTW, technically an empty file isn't a text file - it doesn't have a line ending in \n – cas Oct 24 '21 at 10:23
  • @cas thanks, so technically it will consume less CPU, right? I compared both side by side and awk consumes high CPU. – Renga Oct 24 '21 at 14:47
  • @PaulPedant just like its NR>1 friend (in order to skip the first line), FNR==NR is so lame I don't even think it was invented by a human – UncleBilly Oct 24 '21 at 22:26
  • @cas An empty file is not necessarily garbage -- it quite possibly means something like "nobody made a pull request today". I like to protect both my end-user clients and my own code from unexpected situations wherever possible. – Paul_Pedant Oct 24 '21 at 23:49
  • @Renga High CPU %age is quite a good thing (provided it is doing something useful: inefficient algorithms are never desirable). Low CPU for a process can mean inefficient I/O usage, or that many processes are competing for CPU or other resources. I would take a poor view of a system that had a process that could run, but was not being scheduled because it was "too eager". – Paul_Pedant Oct 24 '21 at 23:55
  • @UncleBilly that statement is so unjustifiably smug and arrogant that I don't even think it's possible for a human. How about using your god-like genius to correct the literally hundreds of answers here on this site that use FNR==NR? What's lame is that awk doesn't deal with it sanely - it should treat an empty file as if it had one line, even if it doesn't have a newline. Either that, or it should have a file-counter variable, perhaps FC, that can be used instead. – cas Oct 25 '21 at 00:28
  • @renga if it's consuming high cpu, that's because it's actually doing work, rather than being idle waiting for input. – cas Oct 25 '21 at 00:30
  • @Paul_Pedant I didn't say an empty file was garbage. I meant that data which isn't what was expected is effectively garbage. – cas Oct 25 '21 at 00:31
  • @cas I completely agree. "Expect the unexpected" to the greatest extent possible. In Gnu/awk you can probably do something about empty files with ARGV, but I began on SunOs 1.5 (without even -v). I embed awk functions in shell, and will usually examine files in shell to catch such issues early. – Paul_Pedant Oct 25 '21 at 07:46
  • NR==FNR is almost always fine but to be able to handle empty first files, instead of NR==FNR in GNU awk use ARGIND==1 and in any awk use FILENAME==ARGV[1] (which is weaker than using ARGIND only if you're processing the same file multiple times on the command line). – Ed Morton Oct 25 '21 at 13:13

Don't use a shell loop to manipulate text; see Why is using a shell loop to process text considered bad practice?. The people who invented shell also invented awk for shell to call when it needs to manipulate text.

Using any awk in any shell on every Unix box:

$ cat tst.awk
NR==FNR {                   # first file (targets): remember each target line
    tgts[$0]
    next
}
$0 in tgts {                # second file: this line is one of the targets
    if ( $0 != prevTgt ) {
        print $0            # print the target itself, once
        prevTgt = $0
    }
    print prevLine          # then the start of the line seen just before it
}
{ prevLine = $1 FS $2 }     # remember only the first 2 fields of every line

$ awk -f tst.awk targets file
box_name: AIX_RUN_WATCH
insert_job: AIXstop
insert_job: AIX_start

Original answer:

awk '
    BEGIN { RS=""; FS="\n" }
    $2 != prev {
        print $2
        prev = $2
    }
    { print $1 }
' file
ght: ertyjk
xxx: rtyuiol
xxx: ertyuikl_fghjk
xxx: qwertyujkl
xxx: rtyuiol_123
ght: YUIOPO
xxx: rtyuiol
xxx: rtyuiopfghj
xxx: dfghjkvbnm
xxx: qzdfghnbvfgh
xxx: qsxcvghuiokmnhgf

See https://www.gnu.org/software/gawk/manual/gawk.html#Multiple-Line for how setting RS to null lets us work with multi-line records, and then setting FS to a newline means each field in such a record is a whole line so we're treating your data as blank-line separated records, each of which contains 2 lines of data.
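
A quick illustration of that record/field handling (throwaway input, not your data):

$ printf 'a1\na2\n\nb1\nb2\n' | awk 'BEGIN{ RS=""; FS="\n" } { print NR, $1, $2 }'
1 a1 a2
2 b1 b2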

You mentioned having some other file of ght lines that indicates which should be printed, implying there are other blocks that should not be printed. If you have such a file and it looks like this:

$ cat targets
ght: ertyjk
ght: YUIOPO

and your other input file contains some ght: lines that do not match the above, e.g. see the ght: whatever blocks in the modified input file below:

$ cat file
xxx: rtyuiol
ght: ertyjk

xxx: ertyuikl_fghjk
ght: ertyjk

xxx: qwertyujkl
ght: ertyjk

xxx: rtyuiol_123
ght: ertyjk

xxx: foo
ght: whatever

xxx: bar
ght: whatever

xxx: rtyuiol
ght: YUIOPO

xxx: rtyuiopfghj
ght: YUIOPO

xxx: dfghjkvbnm
ght: YUIOPO

xxx: qzdfghnbvfgh
ght: YUIOPO

xxx: qsxcvghuiokmnhgf
ght: YUIOPO

then the above code would be updated to:

awk '
    BEGIN { FS="\n" }
    NR==FNR {
        tgts[$0]
        next
    }
    $2 != prev {
        if ( inTgts = ($2 in tgts) ) {
            print $2
        }
        prev = $2
    }
    inTgts { print $1 }
' targets RS='' file
ght: ertyjk
xxx: rtyuiol
xxx: ertyuikl_fghjk
xxx: qwertyujkl
xxx: rtyuiol_123
ght: YUIOPO
xxx: rtyuiol
xxx: rtyuiopfghj
xxx: dfghjkvbnm
xxx: qzdfghnbvfgh
xxx: qsxcvghuiokmnhgf
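
Note the RS='' between the two file names in that command: awk applies command-line assignments at the point where they appear in the argument list, so targets is still read line by line while file is read in paragraph mode. A small illustration of that behavior (throwaway files):

$ printf 'x\ny\n' > f1; printf 'p\nq\n\nr\n' > f2
$ awk '{ print FILENAME, NR }' f1 RS='' f2
f1 1
f1 2
f2 3
f2 4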
Ed Morton