
I have a simple command like this:

grep 'X' results.dat | awk '{print $NF}' > Y.dat

I want to loop this command, taking the Xs from column 1 and the corresponding Ys from column 2 of a second file, e.g. NAMES.

The NAMES file has the format:

C11-C12     p01
C13-C14-C17 p02
etc.

So the first two steps in the loop should be like this:

grep 'C11-C12' results.dat | awk '{print $NF}' > p01.dat
grep 'C13-C14-C17' results.dat | awk '{print $NF}' > p02.dat

2 Answers


A solution that doesn’t require looping in the shell:

awk 'pass==1 {  Xpatt[NR] = $1; Yfile[NR] = $2 ".dat"; printf "" > Yfile[NR] }
     pass==2 {
                for (i in Xpatt) {
                        if ($0 ~ Xpatt[i]) print $NF > Yfile[i]
                }
             }' pass=1 NAMES pass=2 results.dat
  • First of all, awk allows you to specify variable assignments as command-line arguments, after the program, mixed in with the filenames, without using -v.  They are executed at the point in the processing sequence that their position on the command line suggests. So, in the above command,

    1. pass gets set to 1,
    2. the NAMES file is processed,
    3. pass gets set to 2, and then
    4. the results.dat file is processed.

    I guess I could have set pass=1 with a -v or in a BEGIN block.

    I use the pass variable to tell which file I’m reading.  This is commonly done by comparing NR to FNR, but that can lead to false indications if a file is empty.

    (Strictly speaking, I suppose that this script should check whether either of the files is empty, because, in that case, there’s no work to be done.)

  • While pass==1 (we’re reading the NAMES file), save the X and Y values (pattern and filename) from columns 1 and 2 ($1 and $2) of that file.  Create the output file (Yfile[NR]) because, if we don’t do it here, we will not get (empty) output files for patterns that are not present in the results.dat file.  (If that’s OK with you, leave out the printf statement.)
  • While pass==2 (we’re reading the results.dat file), loop through the patterns in the NAMES file and print the last word from every line that matches the pattern into the corresponding file — i.e., the equivalent of the OP’s grep X … | awk '{print $NF}' > Y.dat command.
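
To make the two-pass mechanism concrete, here is a hypothetical sample run; the contents of results.dat below are invented purely for illustration:

$ cat NAMES
C11-C12     p01
C13-C14-C17 p02
$ cat results.dat
C11-C12 bond 1.54
C13-C14-C17 angle 109.5
C11-C12 bond 1.55
$ awk 'pass==1 { Xpatt[NR] = $1; Yfile[NR] = $2 ".dat"; printf "" > Yfile[NR] }
>      pass==2 { for (i in Xpatt) if ($0 ~ Xpatt[i]) print $NF > Yfile[i] }' pass=1 NAMES pass=2 results.dat
$ cat p01.dat
1.54
1.55
$ cat p02.dat
109.5

Note that pass is 1 while NAMES is being read and 2 while results.dat is being read, purely because of where the assignments sit on the command line.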

Bash solution:

while read X Y remainder || [[ -n ${Y} ]]; do
    awk -v X="$X" '$0 ~ $X {print $NF}' results.dat > "$Y".dat
done < NAMES
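
Assuming the same invented results.dat as in the sample run under the first answer, the loop produces the same output files:

$ while read X Y remainder || [[ -n ${Y} ]]; do
>     awk -v X="$X" '$0 ~ X {print $NF}' results.dat > "$Y".dat
> done < NAMES
$ cat p01.dat
1.54
1.55
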
  • Generally, while IFS="q" read X Y remainder; do ...; done < NAMES will iterate over the lines of NAMES, separating the values in each line based on the value of IFS (the internal field separator). In this contrived example, IFS is set to the letter q; IFS defaults to whitespace (space characters, tabs, or newlines). The first field is assigned to the variable X, the second to Y, and the rest of the line to remainder. (The splitting is demonstrated in the sketch after this list.)

    See also: Read columns from file into separate variables (Unix.SE).

    In the solution above, IFS is not specified because I assume your fields are already whitespace-separated.

    Note: if the fields in your NAMES file contain backslashes, then you need to use read -r to prevent read from interpreting backslashes as escape sequences.

  • The ... remainder || [[ -n ${Y} ]] part handles two things: any extra fields are collected in remainder, and the || [[ -n ${Y} ]] test handles the case where the last line of your input file doesn't end with a newline \n (read returns a non-zero exit status at EOF, even when it has filled the variables). This, too, is demonstrated in the sketch after this list.

    See also: Read a file line by line assigning the value to a variable (SO).

  • Eliminate grep altogether: awk -v X="$X" '$0 ~ X {print $NF}' results.dat > "$Y".dat. The -v option to awk defines a variable that can be used in the awk script (see the sketch below).
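
Here is a short, self-contained sketch of the three points above: read's field splitting, the || [[ -n ${Y} ]] guard for a final line that lacks a trailing newline, and awk's -v. The file demo_names and all of the data are invented:

$ printf 'C11-C12 p01\nC13-C14-C17 p02' > demo_names    # no newline after the last line
$ while read -r X Y remainder || [[ -n ${Y} ]]; do
>     printf 'X=%s Y=%s\n' "$X" "$Y"
> done < demo_names
X=C11-C12 Y=p01
X=C13-C14-C17 Y=p02
$ printf 'C11-C12 bond 1.54\n' | awk -v X='C11-C12' '$0 ~ X {print $NF}'
1.54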

scottbb
  • @Wildcard generally agreed. However, in this case the OP is asking how to use lines and fields from one file to process another. In this case, an awk one-liner is probably less readable. Especially when you consider that OP's original approach was grep pattern | awk '{print $NF}', which indicates to me a low level of facility with awk to begin with. – scottbb Dec 02 '16 at 01:19
  • Agreed on the low facility with Awk—but even then, the answer is to explain yourself well, not to dumb down the answer and give a worse approach to solving the problem than you are capable of. – Wildcard Dec 02 '16 at 01:41
  • @Wildcard In truth and fairness, I didn't dumb down the answer, because the approach I gave is my best tradeoff between readability and my own facility with the tools. – scottbb Dec 02 '16 at 02:02
  • Ah, gotcha. In that case, I highly recommend reading the linked answer (re shell loops) in full and then revisiting this answer to see how you can improve it. :) – Wildcard Dec 02 '16 at 02:33
  • (1) You should always quote your shell variable references (e.g., "$X", "$Y" and "$remainder") unless you have a good reason not to, and you’re sure you know what you’re doing.  Putting them into curly braces isn’t nearly as useful as some people seem to think; see this.  … (Cont’d) – G-Man Says 'Reinstate Monica' Dec 02 '16 at 06:32
  • (Cont’d) …  (2) For completeness, you might want to mention that, if the X and Y values might contain backslashes, the user should use read -r.  (3) I understand the bit about the last line of a file not ending with newline, but why are you saying -n ${remainder}?  It seems like -n "$X" would be more appropriate.  (4) You forgot to include results.dat on your awk command. – G-Man Says 'Reinstate Monica' Dec 02 '16 at 06:32
  • @G-Man (1) Indeed, thank you. (2) Yeah, I originally left it out because I made the assumption OP's data didn't have backslashes, but you're right, there's no reason to omit that. (3) Again, oops. But the correct check is if Y is not null/empty. As long as we parsed a pair of fields, the line is valid. If the check was -n ${X}, it would possibly parse and accept a non-newline-terminated line at the end with only one field. (4) Again, oops, and thank you. Big derp there. =) – scottbb Dec 02 '16 at 06:54
  • @scottbb: Oh, another thing that I didn't notice until I wrote my own answer: $0 ~ $X should be $0 ~ X. – G-Man Says 'Reinstate Monica' Dec 02 '16 at 07:48
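
To see why that correction matters: in awk, $X does not mean "the variable X"; it means "the field whose number is the value of X". A non-numeric string such as C11-C12 converts to the number 0, so $X evaluates to $0, and $0 ~ $0 is effectively always true, which makes every line match. A quick demonstration with invented data:

$ printf 'a 1\nb 2\n' | awk -v X='b' '$0 ~ $X {print $NF}'    # buggy: matches every line
1
2
$ printf 'a 1\nb 2\n' | awk -v X='b' '$0 ~ X {print $NF}'     # correct
2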