$ join -a 1 -e 0 -o 0,2.2 <(sort File1) <(sort File2)
banana 2
berry 0
cherry 1
orange 1
strawberry 0
This uses join to perform a relational JOIN operation between the files. join requires both inputs to be sorted on the join field, which is why each file is sorted in its own process substitution (you could, of course, pre-sort the data instead). The command lists all lines from the first input file (-a 1) and replaces missing fields with 0 (-e 0). The output consists of the join field (the first field of each file by default, written as 0 in the argument to the -o option) and the second field of the second file (2.2).
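As a minimal sketch of the pre-sorted variant mentioned above, the following runs the same join on temporary files instead of process substitutions, so it also works in shells without <(...). The sample contents of File1 and File2 are hypothetical, reconstructed only to match the output shown; the real files may differ.

```shell
# Hypothetical sample data, reconstructed to match the output shown above.
tmp=$(mktemp -d)
printf '%s\n' orange banana berry cherry strawberry > "$tmp/File1"
printf '%s\n' 'banana 2' 'cherry 1' 'orange 1' > "$tmp/File2"

# Pre-sort both inputs; join expects them ordered on the join field.
sort "$tmp/File1" > "$tmp/File1.sorted"
sort "$tmp/File2" > "$tmp/File2.sorted"

# -a 1      : also print lines of the first file that have no match
# -e 0      : fill missing output fields with 0
# -o 0,2.2  : print the join field, then field 2 of the second file
joined=$(join -a 1 -e 0 -o 0,2.2 "$tmp/File1.sorted" "$tmp/File2.sorted")
printf '%s\n' "$joined"

rm -rf "$tmp"
```

Note that sort and join should run under the same locale, since both rely on the same collation order.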
Pro: Fast (especially if the data is already sorted) and memory-efficient.
Con: Re-orders the data.
To preserve the order of the original File1, you may use awk instead:
$ awk 'NR == FNR { key[$1] = $2; next } { $2 = ($1 in key) ? key[$1] : 0 }; 1' File2 File1
orange 1
banana 2
berry 0
cherry 1
strawberry 0
This reads the 1st column of File2 as keys in the key associative array, and the 2nd column as their associated values.
When File1 is being read (NR is no longer equal to FNR), we set the 2nd column to either the value from the key array, if there is a key corresponding to the 1st column, or to 0 if there is no such key. The trailing 1 is an always-true pattern that prints each (possibly modified) line.
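To see why NR == FNR singles out the first file, it may help to print both counters for every line. This is only an illustrative sketch on hypothetical, shortened sample data: NR counts lines across all inputs, while FNR restarts at 1 for each file.

```shell
# Hypothetical sample data, only used to show the two counters.
tmp=$(mktemp -d)
printf '%s\n' 'banana 2' 'cherry 1' 'orange 1' > "$tmp/File2"
printf '%s\n' orange banana berry > "$tmp/File1"

# Print the file name, the global line number (NR), and the per-file
# line number (FNR); the two are equal only while reading the first file.
trace=$(cd "$tmp" && awk '{ print FILENAME, NR, FNR }' File2 File1)
printf '%s\n' "$trace"

rm -rf "$tmp"
```

One consequence of this idiom is that an empty first file makes NR == FNR true for the second file as well, so the test is only reliable when the first file has at least one line.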
You may shorten the code somewhat by abusing the fact that an uninitialized value is zero in arithmetic contexts:
$ awk 'NR == FNR { key[$1] = $2; next } { $2 = 0+key[$1] }; 1' File2 File1
orange 1
banana 2
berry 0
cherry 1
strawberry 0
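The two awk variants agree only as long as the values in File2 are numeric, because 0+key[$1] forces a numeric conversion. The sketch below compares them on hypothetical data where one value is a word; the file contents and the variable names are assumptions made for illustration.

```shell
# Hypothetical data: the value for "banana" is deliberately non-numeric.
tmp=$(mktemp -d)
printf '%s\n' 'banana ripe' 'cherry 1' > "$tmp/File2"
printf '%s\n' banana berry cherry > "$tmp/File1"

# Explicit membership test: stored values are copied unchanged.
with_test=$(awk 'NR == FNR { key[$1] = $2; next } { $2 = ($1 in key) ? key[$1] : 0 }; 1' "$tmp/File2" "$tmp/File1")

# Arithmetic shortcut: every value is converted to a number first.
with_sum=$(awk 'NR == FNR { key[$1] = $2; next } { $2 = 0+key[$1] }; 1' "$tmp/File2" "$tmp/File1")

printf 'membership test:\n%s\n' "$with_test"
printf 'arithmetic shortcut:\n%s\n' "$with_sum"

rm -rf "$tmp"
```

With this input, the membership test keeps "banana ripe", while the shortcut turns it into "banana 0". Referencing key[$1] in the shortcut also creates an empty entry for every unmatched key, which is harmless here but adds to the array's size.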
Pro: Output is ordered according to File1.
Con: Data from File2 is stored in memory (only really matters when reading a huge number of lines).
File1 or that of File2 (if they can be different)? – AdminBee Dec 03 '21 at 15:18