I'm extracting rows from a set of text files with awk. The files look like this:

1000    1    75
1000    2    76
1001    1    76
1001    2    80

I'm searching several directories of these with this command:

awk -F"\t" '$3 == "76" { print $1"\t"$2}' ../benchmark/*/labels.txt

awk is giving me the correct output:

1000    2
1001    1

Now for each found row I must execute a script passing these two numbers as parameters, like this:

./build.oct 1000    2

What's the correct way to do that? I don't really care about script console output (it produces files).

6 Answers

You can also use xargs (-l makes it run a separate command for each line):

timp@helez:~/tmp$ awk -F"\t" '$3 == "76" { print $1"\t"$2}' test.txt | xargs -l ./build.oct 
$1 is  1000  and $2 is  2
$1 is  1001  and $2 is  1

timp@helez:~/tmp$ cat test.txt
1000    1   75
1000    2   76
1001    1   76
1001    2   80
timp@helez:~/tmp$ cat build.oct
echo '$1 is ' $1 ' and $2 is ' $2

As suggested in the comments, you can also simplify the awk command, since awk and xargs both split on tabs as well as spaces:

timp@helez:~/tmp$ awk '$3 == "76" {print $1,$2}' test.txt | xargs -l ./build.oct
$1 is  1000  and $2 is  2
$1 is  1001  and $2 is  1
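GNU xargs documents -l as a deprecated synonym for -L 1, which is the form POSIX specifies. Below is a self-contained sketch of the same pipeline using -L 1 on the question's sample data. The real build.oct is not shown in the question, so the build.oct here is a hypothetical stub that only echoes its two arguments.

```shell
#!/usr/bin/env bash
# Sketch: run one command per awk output line with POSIX xargs -L 1.
# build.oct here is a hypothetical stand-in that only echoes its arguments.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Sample data from the question, tab-separated.
printf '1000\t1\t75\n1000\t2\t76\n1001\t1\t76\n1001\t2\t80\n' > test.txt

# Stub for the real script: print the two parameters it receives.
cat > build.oct <<'EOF'
#!/bin/sh
echo "build $1 $2"
EOF
chmod +x build.oct

# awk's default field splitting handles the tabs; xargs -L 1 runs
# ./build.oct once per input line, with that line's fields as arguments.
out=$(awk '$3 == "76" {print $1, $2}' test.txt | xargs -L 1 ./build.oct)
printf '%s\n' "$out"
```

Expected output is one line per matching row: build 1000 2, then build 1001 1.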
TimP
  • You can simplify it to print $1,$2 since xargs will split on space as well as on tab characters. – Stéphane Chazelas Jun 18 '14 at 08:57
  • @stéphane-chazelas, correct; I was following the example of the original asker, but it can be simplified. I'll add a note to my answer about the shorter version of the awk command. – TimP Jun 18 '14 at 21:23

This worked for me:

awk -F"\t" '$3 == "76" { printf "./build.oct %d %d\n", $1, $2}' \
../benchmark/*/labels.txt | bash
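Because bash executes whatever awk prints, the %d conversions matter: awk converts each field to a number, so shell syntax smuggled into a field after the digits is dropped. This sketch only captures the generated command lines (nothing is piped to bash), and the hostile input row is invented for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare the shell code awk generates with %s versus %d.
# The input line is invented for illustration; nothing is executed.
set -eu
input=$(printf '1000\t2;touch pwned\t76\n')

# %s passes the field through verbatim, so ';touch pwned' would reach bash.
unsafe=$(printf '%s\n' "$input" |
  awk -F'\t' '$3 == "76" { printf "./build.oct %s %s\n", $1, $2 }')

# %d converts the field to a number; "2;touch pwned" becomes 2.
safe=$(printf '%s\n' "$input" |
  awk -F'\t' '$3 == "76" { printf "./build.oct %d %d\n", $1, $2 }')

printf 'unsafe: %s\nsafe:   %s\n' "$unsafe" "$safe"
```

One side effect of this design: a field with no leading digits converts to 0, so bad data still yields a (harmless) command line such as ./build.oct 0 0 rather than an error.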
chaos
  • Because the awk output is interpreted as shell code, you may want to sanitise your input. Replacing the %s with %d would make it safer in the event someone has managed to sneak a ;rm -rf / in the input files. – Stéphane Chazelas Jun 18 '14 at 08:55
  • @StéphaneChazelas Good point, I edited my answer, thanks – chaos Jun 18 '14 at 09:02

Consider this:

cat ../benchmark/*/labels.txt |
while IFS=$'\t' read -r P1 P2 P3 ; do
  [[ $P3 == 76 ]] && echo "$P1 $P2"
done |
sort -u |
parallel --colsep ' ' ./build.oct {1} {2}
  • you avoid an awk subprocess by parsing with bash's built-in read (but see the comments below on efficiency)
  • you avoid dupes with sort -u
  • you run jobs concurrently with GNU parallel (or one at a time with xargs -l1)
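One subtlety in the read loop is the IFS value. In bash, IFS='\t' sets the separators to the two characters backslash and t, while IFS=$'\t' (ANSI-C quoting) gives a real tab. A quick sketch on one line of the sample data shows the difference.

```shell
#!/usr/bin/env bash
# Sketch: how the IFS value changes what `read` splits on.
line=$'1000\t2\t76'

# IFS='\t' means backslash and "t", not a tab, so the
# tab-separated line is not split and P1 receives all of it.
IFS='\t' read -r P1 P2 P3 <<< "$line"
wrong="$P1|$P2|$P3"

# IFS=$'\t' uses ANSI-C quoting to get a real tab character.
IFS=$'\t' read -r P1 P2 P3 <<< "$line"
right="$P1|$P2|$P3"

printf 'wrong: %s\nright: %s\n' "$wrong" "$right"
```

With the wrong IFS, P2 and P3 end up empty and the whole line lands in P1; with the real tab, the three fields are 1000, 2 and 76.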

Another approach of interest, driven by awk:

awk -F'\t' '$3==76 && !seen[$1,$2]++ {
  print $1 FS $2 | "parallel --colsep \"\t\" ./build.oct {1} {2}"
}' ../benchmark/*/labels.txt
  • reuses the input field separator FS instead of a literal tab
  • discards dupes using an array of counters
  • shows how awk can pipe its output into a command with print $1 FS $2 | "cmd"
  • For any significant volume of data awk parsing is usually more efficient than shell, especially here where shell needs an added cat. IFS='\t' doesn't work in bash; you need IFS=$'\t' or quotes with an actual tab char (usually input with control-V,control-I but may vary). OP didn't express any need to remove dupes but if needed awk can do it without sorting by &&!seen[$1,$2]++. – dave_thompson_085 Jul 02 '22 at 03:48
  • I was wondering which was more efficient for massive input and my intuition is matching your advice. – Thibault LE PAUL Jul 02 '22 at 06:32
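The awk-driven variant can be tried without GNU parallel installed. This sketch swaps in xargs -L 1 (serial, not parallel) and a hypothetical build.oct stub, and uses two small made-up files that share a row to show the !seen[$1,$2]++ test dropping the duplicate.

```shell
#!/usr/bin/env bash
# Sketch: awk filters and de-duplicates rows, then pipes them to a command.
# xargs -L 1 replaces GNU parallel here; build.oct is a hypothetical stub.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Two files that share the row "1000 2 76", as several labels.txt might.
printf '1000\t1\t75\n1000\t2\t76\n' > a.txt
printf '1000\t2\t76\n1001\t1\t76\n' > b.txt

cat > build.oct <<'EOF'
#!/bin/sh
echo "build $1 $2"
EOF
chmod +x build.oct

# seen[$1,$2]++ yields 0 (false) the first time a pair appears, so
# !seen[$1,$2]++ is true only once per pair. print | "cmd" keeps a single
# pipe open to the command for all matching lines.
out=$(awk -F'\t' '$3 == 76 && !seen[$1,$2]++ {
  print $1, $2 | "xargs -L 1 ./build.oct"
}' a.txt b.txt)
printf '%s\n' "$out"
```

The duplicated 1000/2 row from b.txt is skipped, so build.oct runs twice rather than three times.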

Assuming that columns 1 and 2 won't have whitespace in their entries, you can also do:

awk -F"\t" '$3 == "76" { print $1"\t"$2}' ../benchmark/*/labels.txt |
    while read -r a b; do ./build.oct "$a" "$b"; done
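The loop can be tried end to end on the question's sample data. In this sketch, read -r keeps backslashes literal and quoting "$a" "$b" stops the shell from glob-expanding the values; build.oct is a hypothetical stub that echoes its arguments.

```shell
#!/usr/bin/env bash
# Sketch: feed awk's output to a while-read loop, one build.oct call per row.
# build.oct is a hypothetical stub that echoes its arguments.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

printf '1000\t1\t75\n1000\t2\t76\n1001\t1\t76\n1001\t2\t80\n' > labels.txt

cat > build.oct <<'EOF'
#!/bin/sh
echo "build $1 $2"
EOF
chmod +x build.oct

# read -r keeps backslashes literal; the default IFS splits on the tab.
out=$(awk -F'\t' '$3 == "76" { print $1 "\t" $2 }' labels.txt |
  while read -r a b; do ./build.oct "$a" "$b"; done)
printf '%s\n' "$out"
```

If the real build.oct reads standard input, it will consume the lines meant for read; appending < /dev/null to the ./build.oct call inside the loop avoids that.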
pepoluan

Using while is the best answer if you have multiple outputs:

I was able to delete many WireGuard zones I had created with UCI using the command below:

uci show firewall | grep wireguard | awk -F . '{printf "%s\n", $2}' | while read a ; do uci delete -q firewall."$a" ; done

This loops over the matching entries and deletes each one.

In your case all you have to do is:

awk -F"\t" '$3 == "76" { print $1"\t"$2}' ../benchmark/*/labels.txt | while read a b ; do ./build.oct "$a" "$b" ; done 
  • (1) What does “if you have multiple outputs across the command line or the file” mean?   How does it relate to this question?   (2) Please don't add "thank you" as an answer.   Once you have sufficient reputation, you will be able to vote up questions and answers that you found helpful.   (3) Copying somebody else’s answer and then inflating it with particulars from your situation, which are unique to you, isn’t really helpful — especially when you don’t show what data you are working with. … (Cont’d) – G-Man Says 'Reinstate Monica' Jun 30 '22 at 18:11
  • (Cont’d) …  (4) And, if you are going to copy somebody else’s answer, you should say clearly that you are doing so, linking to the source and stating the name of the original author.  (5) Awk is a very powerful program; you almost never need to combine it with grep or sed.  In particular, grep wireguard | awk '{command…}' can be simplified to awk '/wireguard/ {command…}'.  (6) Why do you say printf "%s\n", $2 instead of print $2? … (Cont’d) – G-Man Says 'Reinstate Monica' Jun 30 '22 at 18:19
  • (1) simplified/edited as "multiple outputs" (2) (3) (4) (5) I don't know what you're talking about, but I clearly mentioned that it was my command line, and using some variants won't hurt the system. (6) I wanted to separate my outputs with a newline for another purpose, thus the use of '\n'.
    (7) Again, I don't know what you're trying to explain but thanks for the advice.
    – Oussama Boumaad Jul 07 '22 at 12:06
  • Thank you for responding politely.  (1) Well, I still don’t understand what you mean by “Using while is the best answer *if you have multiple outputs*”.  … (Cont’d) – G-Man Says 'Reinstate Monica' Jul 07 '22 at 22:06
  • (Cont’d) …  (2), (3) & (4) Your first answer is essentially equivalent to pepoluan’s answer but adapted to your problem.  Your second answer is *identical* to their answer except for spacing.  (2) Posting somebody else’s answer as a new answer looks like you’re saying “Hey! This answer works! Thanks!”  (4) Copying other people’s work without giving credit to the original author is forbidden here.  … (Cont’d) – G-Man Says 'Reinstate Monica' Jul 07 '22 at 22:06
  • (Cont’d) …  (5) What do you not understand?  Look at the difference between your first answer (grep wireguard | awk '{…}') and your second answer (awk '$3 == "76" {…}').  (6) My point was that, while printf prints only the characters you give it, print automatically prints them on a separate line.  Not only is it shorter, but there is a technical reason why print x is preferable to printf "%s\n" x.  (7) See my edit to your answer. – G-Man Says 'Reinstate Monica' Jul 07 '22 at 22:06

awk has a system() function (it is part of POSIX awk, not a GNU extension). You could run something along the lines of

awk '$3 == "76" { system("./build.oct " $1 " " $2) }' ....
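One way to make this pattern safer, assuming the first two columns should always be integers: build the command with sprintf("%d", $1) and sprintf("%d", $2) so each field is forced to a number before it reaches the shell. In this sketch echo stands in for ./build.oct so it is harmless to run, and the hostile row is invented for illustration.

```shell
#!/usr/bin/env bash
# Sketch: awk system() with numeric coercion of the fields.
# echo stands in for ./build.oct; the hostile row is invented for illustration.
set -eu
out=$(printf '1000\t2\t76\n1001\t1;echo pwned\t76\n1001\t2\t80\n' |
  awk -F'\t' '$3 == "76" {
    # sprintf with %d keeps only the numeric value of each field
    cmd = sprintf("echo build %d %d", $1, $2)
    system(cmd)   # runs cmd via /bin/sh, one shell per matching row
  }')
printf '%s\n' "$out"
```

The second row's ;echo pwned never reaches the shell, because its second field becomes plain 1. Note that system() still starts one shell per matching row, which the pipe-based answers avoid.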
JJoao
  • (1) As Stéphane Chazelas hinted in his comment, this can be dangerous if an attacker can write to your input file and say, for example, “;shutdown now 76” or “;rm  *  76”. (2) This almost certainly runs a new bash process for each line of input.  (I don’t see any other answer that does that.) – G-Man Says 'Reinstate Monica' Jul 07 '22 at 19:54
  • @G-ManSays'ReinstateMonica', Thank you! (1) true. That is why I wrote "along the lines of". I never know how to deal with it: the correct solution tends to hide the basic idea. (2) true. It can be a problem if you have millions of lines but in many cases this is what we really need, and what is going to be the reasonable solution. – JJoao Jul 08 '22 at 09:14