I am new to awk and shell scripting. How can I merge two files on the lines that share the same record using shell/awk? The names in file1 and file2 may appear in a different order. I only want to merge the lines whose name appears in both files. Please help.

file1.txt
Mary 68 
Tom 50 
Jason 45
Lu 66

file2.txt
Jason 37
Tom 26
Mary 74
Tina 80

mergefile.txt
Mary 68 74
Tom 50 26
Jason 45 37

I tried using awk, but the script takes some time to run. Is there a faster and simpler implementation?

cat file1.txt | while read line
do
    score1=$( echo $line | awk '{print $2}');
    name1=$( echo $line | awk '{print $1}');

    cat file2.txt | while read l
    do
        score2=$( echo $l | awk '{print $2}');
        name2=$( echo $l | awk '{print $1}');
        if [[ $name1 == $name2 ]]
        then
            echo "$name1 $score1 $score2" >> mergefile
            break
        fi
    done
done
Tiger
  • 347
  • You have some beginner mistakes in your script. Copy/paste it into https://www.shellcheck.net/ to see a list of several of them, and then also read https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice, https://mywiki.wooledge.org/Quotes, http://porkmail.org/era/unix/award.html, and https://unix.stackexchange.com/q/65803/133219 for more on those and other issues. Finally, never name a variable l, as it looks far too much like the number 1 and so obfuscates your code. – Ed Morton Mar 07 '20 at 18:07

2 Answers

If you want to use awk:

$ awk 'NR==FNR {a[$1] = $2; next} $1 in a {print $1, $2, a[$1]}' file2.txt file1.txt 
Mary 68 74
Tom 50 26
Jason 45 37

No sorting is required and the output will be in the order of the second file given.

Explanation:

  • NR==FNR is the canonical way to select records from the first named file (here file2.txt)
  • {a[$1] = $2; next} populates an array with keys from the first field and values from the second, then skips to the next record
  • $1 in a tests whether the first field was already seen in the first file; if so,
  • {print $1, $2, a[$1]} prints the key and value from the second file and the value from the first
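
For readers who find the one-liner dense, the same logic can be written as a commented, multi-line script. This is only a sketch of the command above: the array name score2, the variable merged and the temporary directory workdir are names chosen for illustration, and the sample data is copied from the question.

```shell
# Work in a scratch directory so no existing files are overwritten
# (workdir is a hypothetical name used only for this sketch).
workdir=$(mktemp -d)

# Recreate the sample data from the question.
printf '%s\n' 'Mary 68' 'Tom 50' 'Jason 45' 'Lu 66' > "$workdir/file1.txt"
printf '%s\n' 'Jason 37' 'Tom 26' 'Mary 74' 'Tina 80' > "$workdir/file2.txt"

merged=$(awk '
    # First file named on the command line (file2.txt):
    # remember each score, indexed by name, then move to the next line.
    NR == FNR { score2[$1] = $2; next }

    # Second file (file1.txt): print only names also seen in file2.txt.
    $1 in score2 { print $1, $2, score2[$1] }
' "$workdir/file2.txt" "$workdir/file1.txt")

printf '%s\n' "$merged"

rm -r "$workdir"
```

Each file is read once, so the work grows with the total number of lines rather than with the product of the two file sizes, as it does in the nested while loops.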
steeldriver
  • 81,074

This sounds like a job for join, the relational database operator:

join <(sort file1.txt) <(sort file2.txt)

Tests

$ cat file1.txt
Mary 68
Tom 50
Jason 45
Lu 66

$ cat file2.txt
Jason 37
Tom 26
Mary 74
Tina 80

$ join <(sort file1.txt) <(sort file2.txt)
Jason 45 37
Mary 68 74
Tom 50 26

join is a standard tool specified in POSIX.

The join man page states:

The files file1 and file2 shall be ordered in the collating sequence of sort -b on the 
fields on which they shall be joined, by default the first in each line. All selected 
output shall be written in the same collating sequence.
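
Because join expects its input in the same collating sequence that sort used, one cautious way to run it is to pin the locale for both commands. The following is a sketch under assumptions rather than part of the original answer: it forces LC_ALL=C, sorts only on the join field with -k 1b,1, and writes the sorted copies to temporary files instead of using the <( ) process substitution, which is a bash feature rather than plain POSIX sh. The names workdir and joined are illustrative.

```shell
# Scratch directory for the sample data and sorted copies
# (workdir is a hypothetical name used only for this sketch).
workdir=$(mktemp -d)

# Recreate the sample data from the question.
printf '%s\n' 'Mary 68' 'Tom 50' 'Jason 45' 'Lu 66' > "$workdir/file1.txt"
printf '%s\n' 'Jason 37' 'Tom 26' 'Mary 74' 'Tina 80' > "$workdir/file2.txt"

# Sort both inputs on the first field under one fixed locale,
# so that sort and join agree on the ordering.
LC_ALL=C sort -k 1b,1 "$workdir/file1.txt" > "$workdir/file1.sorted"
LC_ALL=C sort -k 1b,1 "$workdir/file2.txt" > "$workdir/file2.sorted"

# Join on the first field (the default); unmatched names are dropped.
joined=$(LC_ALL=C join "$workdir/file1.sorted" "$workdir/file2.sorted")

printf '%s\n' "$joined"

rm -r "$workdir"
```

As in the test above, the output follows the sorted order of the names, not the order of either input file.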
Paulo Tomé
  • 3,782
  • If I want to combine the 2 files completely, can "join" do the work? That is, can it also include the names and scores that appear in only one file? – Tiger Mar 09 '20 at 19:06
  • @Tiger What do you mean by "combine 2 files together"? See the join link in my answer to understand its purpose. – Paulo Tomé Mar 10 '20 at 12:10