4

I want to create a file that contains columns from two input files. File1 is like:

aa 32
bb 15
cc 78

File2 is:

fa 19
bc 23
cc 50
de 28
aa 45
bb 31

The task is, read through File1, if the 1st field of a row exists among the 1st field of File2, then print that line of File2, with both columns and add the 2nd column entry of File1 containing the 1st field.

The output should be like:

aa 45 32
bb 31 15
cc 50 78

awk is preferred for the script.

Kusalananda
  • 333,661
hldn
  • 41

3 Answers3

11
$ awk 'FNR==NR{a[$1]=$2;next} ($1 in a) {print $1,a[$1],$2}' file2 file1
aa 45 32
bb 31 15
cc 50 78

Explanation:

awk implicitly loops through each file, one line at a time. Since we gave it file2 as the first argument, it is read first. file1 is read second.

  • FNR==NR{a[$1]=$2;next}

    NR is the number of lines that awk has read so far and FNR is the number of lines that awk has read so far from the current file. Thus, if FNR==NR, we are still reading the first named file: file2. For every line in file2, we assign a[$1]=$2.

    Here, a is an associative array and a[$1]=$2 means saving file2's second column, denoted $2, as a value in array a using file2's first column, $1, as the key.

    next tells awk to skip the rest of the commands and start over with the next line.

  • ($1 in a) {print $1,a[$1],$2}

    If we get here, that means that we are reading the second file: file1. If we saw the first field of the line in file2, as determined by the contents of array a, then we print out a line with the values of field 2 from both files.

John1024
  • 74,655
3

I like the awk solution but I think this might be easier:

sort file1 > sortedFile1
sort file2 > sortedFile2
join -o 1.1 2.2 1.2 sortedFile1 sortedFile2

The option -o will override the format which you will select to be first field of first file, second field of second file and second field of first file.

don_crissti
  • 82,805
Isaac
  • 392
0

Join 2nd file, File2 to the 1st file, File1.

join <(sort -k1 file2) <(sort -k1 file1)
Abishek J
  • 170