4

I am trying to compare two files, Extensions.txt and Temp.txt. If a line from Extensions.txt has no partial match in Temp.txt, I would like to append that missing line to Temp.txt.

Extensions.txt (Very basic, one column):

111
1234
4321

Temp.txt:

1234/sip:1234@192.168.1.10:5060  9421b96c5e   Avail   1.480
4321/sip:4321@192.168.1.11:5060  e9b6b979a4   Avail   1.855

Basically, I want to match on everything before the / in the first column of Temp.txt. If an extension has no match, I would like to print it at the bottom of the file, so that it ends up like this:

1234/sip:1234@192.168.1.10:5060  9421b96c5e   Avail   1.480
4321/sip:4321@192.168.1.11:5060  e9b6b979a4   Avail   1.855
111

So far I have tried grep -v, which doesn't produce the results I want. I also tried awk, which seems like the way to go, but I don't understand awk well enough to produce the right results.

  • If there is a row in 'Temp.txt' like 1235/... (no match in the first file) will it be in the output or not? – thanasisp Nov 15 '20 at 20:50
  • If a value in the extensions.txt file is not found in temp.txt it will output the missing value from extensions to temp. If that makes sense, if not let me know. – MSTek MTL Nov 15 '20 at 20:52
  • So, you mean that, in the opposite case, if value in temp.txt is missing from extensions.txt, you don't want it into the output at all. – thanasisp Nov 15 '20 at 20:54
  • If a partial match of the value in Extensions.txt is found in Temp.txt, I don't want to print/output anything. I want to output the non-found "Extension" value if not found, which in the case of the example above would be 111. – MSTek MTL Nov 15 '20 at 20:58

4 Answers

5

You can parse the files with awk:

awk -F '/' '
    FNR == NR {seen[$1] = $0; next}
    {if ($1 in seen) print seen[$1]; else missing[$1]}
    END {for (x in missing) print x}
' Temp.txt Extensions.txt

Output:

1234/sip:1234@192.168.1.10:5060  9421b96c5e   Avail   1.480
4321/sip:4321@192.168.1.11:5060  e9b6b979a4   Avail   1.855
111
  • Set field separator to slash, -F '/'
  • The action after FNR == NR is executed for the lines of the first input file (Temp.txt). We store each line in the associative array seen, keyed by its first field, and go to the next line.
  • The second action is executed for the second file (Extensions.txt), when FNR != NR. If the first field is a key in seen, we print the stored line; otherwise we save the field as a key in another array, missing.
  • In the END block, we print the keys of missing. Note that for (x in missing) visits them in an unspecified order.
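If the goal is only to append the unmatched extensions to Temp.txt, as the question describes, the same idea can be sketched as below. This is a minimal sketch, not the command above: it uses the sample data from the question, assumes an exact match on the text before the first /, and tests FILENAME == ARGV[1] instead of FNR == NR so that an empty Temp.txt is still handled.

```shell
#!/bin/sh
# Work in a scratch directory with the sample files from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 111 1234 4321 > Extensions.txt
printf '%s\n' \
    '1234/sip:1234@192.168.1.10:5060  9421b96c5e   Avail   1.480' \
    '4321/sip:4321@192.168.1.11:5060  e9b6b979a4   Avail   1.855' > Temp.txt

# Collect the extensions that have no exact match in field 1 of Temp.txt.
missing=$(awk -F/ '
    FILENAME == ARGV[1] { seen[$1]; next }   # first file: remember each prefix
    !($1 in seen)                            # second file: print unseen extensions
' Temp.txt Extensions.txt)

# Append only after awk has finished reading Temp.txt.
if [ -n "$missing" ]; then
    printf '%s\n' "$missing" >> Temp.txt
fi

cat Temp.txt
```

With the sample data, Temp.txt keeps its two original lines and gains 111 as a third line. Capturing the result in a variable first avoids reading and writing Temp.txt at the same time.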
thanasisp
  • I selected this one as best answer compared to the previous. The reason is that this only outputs the missing value and not all plus the missing value. Thanks Again! – MSTek MTL Nov 15 '20 at 21:29
  • Please keep the "accepted" to the initial answer, as both work. – thanasisp Nov 15 '20 at 21:29
  • While both work and the other answer is great as well, I do find that you were the fastest to answer and ultimately the most helpful, so I believe you deserve the "accepted". – MSTek MTL Nov 15 '20 at 21:33
  • Just realized something and was just wondering what your thoughts are. When running awk with the two input files, it seems that even if 1234 is found in both Extensions.txt and Temp.txt, it still prints the value 1234. Any ideas? – MSTek MTL Nov 15 '20 at 22:05
  • If you have a specific example case, I could look into it further. For now, this script parses Temp.txt first, so if 1234/... is found there, it will print the matching line and not the number alone, for any occurrence of 1234 in Extensions.txt. – thanasisp Nov 15 '20 at 22:08
  • sorry, could you have 12345/ and still want to match 1234? a partial match in the first field? – thanasisp Nov 15 '20 at 22:17
3

You could read the contents of Extensions.txt into an array, delete the partial matches, then print whatever remains:

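$ awk -F/ '
    NR==FNR {a[$1]; next}                   # Extensions.txt: remember each extension
    {for(i in a) if($1 ~ i) delete a[i]}    # Temp.txt: drop extensions that regex-match field 1
    1                                       # print every Temp.txt line unchanged
    END {for(i in a) print i}               # finally print the extensions left unmatched
  ' Extensions.txt Temp.txt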
1234/sip:1234@192.168.1.10:5060  9421b96c5e   Avail   1.480
4321/sip:4321@192.168.1.11:5060  e9b6b979a4   Avail   1.855
111
Quasímodo
steeldriver
  • Nice one. I think {for(i in a) if($1 ~ i) delete a[i]} is the same as $1 in a {delete a[$1]}, right? – Quasímodo Nov 15 '20 at 21:06
  • This worked amazing! Thank you so much! – MSTek MTL Nov 15 '20 at 21:07
  • This will also print lines from Temp.txt that have no match in Extensions.txt (see discussion in comments). I am just guessing that Temp.txt (its first fields) is a subset of Extensions.txt, although this was not stated. – thanasisp Nov 15 '20 at 21:10
  • But now I see "append the missing line to Temp.txt", which is the opposite, so you do want to print them (if they exist). As I said, I guess it is always a subset, which is why both approaches seem good to the OP. – thanasisp Nov 15 '20 at 21:29
  • @Quasímodo $1 in a would work fine for complete matches; for(i in a) if($1 ~ i) will catch partial (regex) matches, which is what the OP asks for - although it's not clear if it's actually what they want. – steeldriver Nov 15 '20 at 21:49
  • @MSTekMTL if you do want a partial match, such as 1234 matching the line 12345/... from Temp.txt, please use (and accept) this answer. Mine uses a full match of the first field, and I would have to modify it with a loop like the one here to get a partial match. – thanasisp Nov 15 '20 at 22:22
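The difference between a full match and a partial match discussed in these comments can be sketched as follows. The Temp.txt line here is made-up sample data (it is not from the question), chosen so that extension 1234 only appears as the start of 12345. The partial variant uses index() for a literal prefix test, which is a swapped-in alternative to the $1 ~ i regex match in the answer: it only matches at the start of the field and does not interpret regex metacharacters.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample data: extension 1234 only appears as a prefix of 12345.
printf '%s\n' 111 1234 > Extensions.txt
printf '%s\n' '12345/sip:12345@192.168.1.12:5060  0000000000   Avail   1.000' > Temp.txt

# Full match on the first field: 1234 is not equal to 12345, so it stays missing.
exact=$(awk -F/ '
    NR == FNR { a[$1]; next }
    $1 in a   { delete a[$1] }
    END       { for (i in a) print i }
' Extensions.txt Temp.txt | sort)

# Prefix match: index() returns 1 when the extension starts the first field.
partial=$(awk -F/ '
    NR == FNR { a[$1]; next }
    { for (i in a) if (index($1, i) == 1) delete a[i] }
    END { for (i in a) print i }
' Extensions.txt Temp.txt | sort)

echo "missing with full match:"
echo "$exact"
echo "missing with prefix match:"
echo "$partial"
```

With this data the full match reports both 111 and 1234 as missing, while the prefix match reports only 111. Which behaviour is wanted depends on whether an extension like 1234 should count as present when only 12345 is registered.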
2

Using grep+cut:

grep -xvFf <(cut -d'/' -f1 tmp) ext >> tmp

Here tmp and ext stand for Temp.txt and Extensions.txt. It is safe for grep to read tmp inside the process substitution as the source of patterns while the result is appended back to the same tmp file; see the explanation in the link below:

Using same filename for the input in sub-shell and also as output in parent shell will conflict?
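For readers unfamiliar with the flags, the same pipeline can be sketched in two explicit steps. This is only an illustration under some assumptions: it uses the sample data from the question, writes the patterns to a scratch file (patterns.tmp, a name chosen here) instead of using process substitution so that it also runs in a plain POSIX sh, and matches whole lines literally.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 111 1234 4321 > Extensions.txt
printf '%s\n' \
    '1234/sip:1234@192.168.1.10:5060  9421b96c5e   Avail   1.480' \
    '4321/sip:4321@192.168.1.11:5060  e9b6b979a4   Avail   1.855' > Temp.txt

# Step 1: the text before the first "/" of every Temp.txt line becomes a pattern.
cut -d/ -f1 Temp.txt > patterns.tmp

# Step 2: -F treats patterns as fixed strings, -x requires a whole-line match,
# -v keeps only the lines that match no pattern, -f reads patterns from a file.
grep -xvFf patterns.tmp Extensions.txt >> Temp.txt

rm -f patterns.tmp
cat Temp.txt
```

With the sample data, only 111 survives the filter and is appended as the last line of Temp.txt. Because of -x, this is a full match on the extension, not a partial one.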

αғsнιη
0

You can also use join. Note that join expects both inputs to be sorted on the join field:

join -t"/" -a1 -e " " -o '1.1,2.1,2.2' <(awk '{ print $1"/"}' Extensions.txt ) <( awk '{ print $0}' Temp.txt)
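If only the unmatched extensions are needed, join's -v 1 option (print the unpairable lines from the first file) may be a closer fit. The following is a rough sketch under stated assumptions, not the command above: it uses the sample data from the question, sorts both inputs in the C locale, and stores the Temp.txt prefixes in a scratch file named keys.sorted (a name chosen here).

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf '%s\n' 111 1234 4321 > Extensions.txt
printf '%s\n' \
    '1234/sip:1234@192.168.1.10:5060  9421b96c5e   Avail   1.480' \
    '4321/sip:4321@192.168.1.11:5060  e9b6b979a4   Avail   1.855' > Temp.txt

# join needs both inputs sorted the same way; fix the collation order.
LC_ALL=C
export LC_ALL

# Keys present in Temp.txt: the text before the first "/" on each line.
cut -d/ -f1 Temp.txt | sort -u > keys.sorted

# -v 1 prints the lines of the first input (stdin) that have no partner in keys.sorted.
missing=$(sort Extensions.txt | join -v 1 - keys.sorted)

printf '%s\n' "$missing"
```

With the sample data this prints 111, which could then be appended to Temp.txt with >>.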
αғsнιη