2

Hello, I have two files. The first file contains a few values, for example:

powershell
vectormaps
JuniperSA

and the second file contains values and an ID:

appid uid
SplunkforSnort 340
powershell 610
vectormaps 729
JuniperSA 826
postfix 933
SplunkforJuniperSRX 929
TA-barracuda_webfilter 952
TA-dragon-ips 954
dhcpd 392

So I'm trying to run a while loop with awk to get the values and their corresponding IDs, but the output file seems to contain something else. This is how I'm trying to run the while loop:

while read $line;
do
awk '/'$line'/ {print $0}' file2.csv > new
done < file1

My expected output should be

powershell 610
vectormaps 729
JuniperSA 826

but my output is

appid uid
SplunkforSnort 340
powershell 610
vectormaps 729
JuniperSA 826
postfix 933
SplunkforJuniperSRX 929
TA-barracuda_webfilter 952
TA-dragon-ips 954
dhcpd 392

It seems as if nothing is happening. What am I missing here?

  • 4
    Is awk mandatory? What's wrong with grep -f pattern list? – Archemar Jul 27 '22 at 11:24
  • Are the patterns in the pattern file clean? No trailing spaces or tabs, not a Windows-edited file? – Archemar Jul 27 '22 at 11:29
  • Actually grep also did the job, but once the file gets more complicated I was wondering if I could use advanced features of awk. That's why. – ranjit abraham Jul 27 '22 at 11:30
  • @Archemar you can't do this task robustly with just grep. grep -f pattern list, for example, would falsely match substrings and strings containing regexp metachars and strings in the wrong column and there aren't options to grep to let you modify the call to grep to say "only match a literal string in the first column". – Ed Morton Jul 27 '22 at 12:05
  • awk 'NR==FNR{a[$1]=$2;next}; $1 in a' file1 file2 should work. The command is taken from this and this. – Prabhjot Singh Jul 27 '22 at 16:55
  • I would use join here. That’s what it is built for. – D. Ben Knoble Jul 27 '22 at 21:51
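The join suggestion in the last comment can be sketched as follows. This is only an illustration under some assumptions: the scratch files in a temporary directory stand in for file1 and file2.csv, the variable name join_out is made up for the example, and both inputs are sorted first because join expects them sorted on the join field.

```shell
# Assumption: scratch copies of the question's data in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' powershell vectormaps JuniperSA > "$tmp/file1"
printf '%s\n' 'appid uid' 'SplunkforSnort 340' 'powershell 610' \
    'vectormaps 729' 'JuniperSA 826' 'postfix 933' 'dhcpd 392' > "$tmp/file2.csv"

# join needs both inputs sorted on the join field (field 1 here),
# using the same collation order for sort and join.
LC_ALL=C sort -k 1,1 "$tmp/file1" > "$tmp/file1.sorted"
LC_ALL=C sort -k 1,1 "$tmp/file2.csv" > "$tmp/file2.sorted"

# Print only the lines whose first field appears in both files.
join_out=$(LC_ALL=C join "$tmp/file1.sorted" "$tmp/file2.sorted")
printf '%s\n' "$join_out"

rm -rf "$tmp"
```

Note that the matches come out in sorted order (JuniperSA first under the C locale), not in the order they appear in file1.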

3 Answers

7

Using awk

$ awk 'FNR==NR {a[$1]=$2; next} {$(NF+1)=a[$1]}1' file2 file1
powershell 610
vectormaps 729
JuniperSA 826
sseLtaH
  • 2,786
  • 1
    Worked like a charm. – ranjit abraham Jul 27 '22 at 11:35
  • 1
    {print $0, a[$1]} would be a bit more efficient than {$(NF+1)=a[$1]}1 since the former just prints the 2 strings while the latter additionally modifies a field which causes awk to have to rebuild $0 from the fields, replacing every FS with OFS. It MAY also cause awk to have to resplit the record into fields again too as adding a field modifies the record, I'm not sure about that one. – Ed Morton Jul 27 '22 at 12:12
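To make the comment above concrete, here is a small sketch that runs both forms on the question's sample data and keeps the results for comparison. The temporary directory and the variable names rebuild_out and print_out are assumptions made for the example, not part of the answer.

```shell
# Assumption: scratch copies of the question's data in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' powershell vectormaps JuniperSA > "$tmp/file1"
printf '%s\n' 'appid uid' 'SplunkforSnort 340' 'powershell 610' \
    'vectormaps 729' 'JuniperSA 826' 'postfix 933' > "$tmp/file2"

# Form from the answer: append a field, which makes awk rebuild $0.
rebuild_out=$(awk 'FNR==NR {a[$1]=$2; next} {$(NF+1)=a[$1]}1' "$tmp/file2" "$tmp/file1")

# Form from the comment: print the record and the looked-up ID directly.
print_out=$(awk 'FNR==NR {a[$1]=$2; next} {print $0, a[$1]}' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$print_out"
rm -rf "$tmp"
```

On this input both commands produce the same three lines; the difference the comment describes is in how much work awk does per record, not in the output.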
1

As mentioned, there is no reason to use a while loop or awk regardless of how complicated the file might become. You are simply looking to print lines in the second file that contain strings from the first file. It's best to use the KISS approach rather than complicating something where it isn't necessary.

The following will do what you want:

grep -f file1 file2.csv
Nasir Riley
  • 11,422
  • 1
    That would falsely match substrings and strings containing regexp metachars and strings in the wrong column. You can't robustly do this task with just grep. – Ed Morton Jul 27 '22 at 12:02
  • 1
    That is the example that was given and it's been confirmed that it does what is wanted. If there are other circumstances, then what I gave can be modified. I'm not going to attempt to provide something that accounts for every possible scenario because nothing does. Requiring such a thing would render a substantial majority of the answers here useless. – Nasir Riley Jul 27 '22 at 13:21
  • 1
    I understand it's what was given, often as in this case the OP doesn't think of problem cases and just provides some sunny day data. Given different input it'd fail. @HatLess' awk answer will work robustly for any input containing substrings, metachars, matches in the wrong column, etc., while the grep command cannot be modified to do so (without adding additional commands like sed to pre-process the data). Using awk for this is just all round the better approach. – Ed Morton Jul 27 '22 at 13:46
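The disagreement in these comments can be checked with a small experiment. The input lines below are hypothetical, chosen only to trigger the cases the comments mention (a substring, a regexp metacharacter, and a match in the wrong column); the awk command is a variant of the exact first-column lookup suggested in the comments on the question.

```shell
# Assumption: made-up data in a temporary directory, not the OP's real files.
tmp=$(mktemp -d)
printf '%s\n' vectormaps a.c > "$tmp/file1"
printf '%s\n' 'vectormaps 729' 'vectormapsi 111' \
    'other vectormaps' 'abc 5' > "$tmp/file2.csv"

# grep -f treats each line of file1 as a regexp matched anywhere in the line,
# so the substring, wrong-column and "a.c" vs "abc" cases all match.
grep_count=$(grep -c -f "$tmp/file1" "$tmp/file2.csv")

# awk compares the first field as a literal string against the stored keys.
awk_out=$(awk 'NR==FNR {want[$1]; next} $1 in want' "$tmp/file1" "$tmp/file2.csv")

printf 'grep matched %s lines\n' "$grep_count"
printf '%s\n' "$awk_out"
rm -rf "$tmp"
```

With this data grep reports all four lines of file2.csv, while awk prints only the line whose first field is exactly vectormaps.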
0

When setting a variable in the shell, you must not use the $ sign. Additionally, awk can't use shell variables directly; you have to pass them to awk with -v variable=value

while read line;
do 
awk -v line="$line" -e '{if  ($1 ~ line) print $0} ' file2.csv
done < file1 >new
Harry
  • 304
  • This is a very good approach. Thanks. I didn't know about that -v. – ranjit abraham Jul 27 '22 at 12:11
  • 1
    @ranjitabraham, no, this is a well known anti-pattern (see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice) and has additional issues including exposing the shell variable to globbing, word splitting, filename expansion, and will do a partial regexp match across a whole line while you need a full word literal string match against 1 column. It'll also produce just 1 line of output in new instead of many. – Ed Morton Jul 27 '22 at 12:18
  • It's doing the same as grep -f file1 file2.csv but orders of magnitude slower and with multiple additional issues. If you copy/paste it into http://shellcheck.net I expect it'll tell you about some I haven't mentioned yet as well as some I have. – Ed Morton Jul 27 '22 at 12:24
  • Right, the redirection to file new was at the wrong place. Corrected it. I wanted to give an answer as close to the original question as possible. grep is the right tool for this, but maybe it can't be used in this case? – Harry Jul 27 '22 at 12:40
  • Right, you can't do this with just grep; of the mandatory POSIX tools, awk is the right one for this. – Ed Morton Jul 27 '22 at 13:41
  • The expression ($1 == line) would be better because this would not match vectormapsi. – Prabhjot Singh Jul 27 '22 at 17:03
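For readers wondering why the loop in the question printed every line: because `read $line` never assigns to `line`, the variable is empty when awk runs, and the pattern collapses to `//`, which matches everything. The sketch below reproduces that with an explicitly empty variable, then shows the loop with the `$` removed, `-v`, and the `==` comparison from the last comment. The temporary directory and the names all_out and fixed_out are assumptions for the example, and the loop is kept only to stay close to the question, since the comments above explain why a single awk call is preferable.

```shell
# Assumption: scratch copies of the question's data in a temporary directory.
tmp=$(mktemp -d)
printf '%s\n' powershell vectormaps JuniperSA > "$tmp/file1"
printf '%s\n' 'appid uid' 'SplunkforSnort 340' 'powershell 610' \
    'vectormaps 729' 'JuniperSA 826' 'postfix 933' > "$tmp/file2.csv"

# What the question's awk call effectively ran: with an empty $line the
# program text becomes '// {print $0}', and the empty regexp matches every line.
line=
all_out=$(awk '/'$line'/ {print $0}' "$tmp/file2.csv")

# Loop with the variable assigned correctly, passed in with -v, compared as a
# literal string against column 1, and redirected once for the whole loop.
fixed_out=$(while IFS= read -r line; do
    awk -v line="$line" '$1 == line' "$tmp/file2.csv"
done < "$tmp/file1")

printf '%s\n' "$fixed_out"
rm -rf "$tmp"
```

The first command returns the whole of file2.csv, matching the output shown in the question, while the corrected loop returns only the three expected lines.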