
I've got two text files, e.g. File1.txt:

A
B
C
E

and File2.txt:

C
D
E

where the letters stand for lines.

I'd like to find all lines in File1.txt that are not in File2.txt. The contents of both files vary.

How could this be done? In this case, it should output A and B.

X3nion

4 Answers


If they are sorted, try:

comm -23 File1.txt File2.txt
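To see what -23 does, here is a small sketch that rebuilds the example files in a scratch directory (the directory is only there to keep the sketch self-contained). Without options, comm prints three tab-indented columns: lines only in the first file, lines only in the second file, and lines in both. -2 and -3 suppress the second and third columns.

```shell
#!/bin/sh
# Scratch directory so the sample files do not clobber real ones.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
trap 'rm -rf "$tmp"' EXIT

# The example files from the question, already in sorted order.
printf '%s\n' A B C E > File1.txt
printf '%s\n' C D E > File2.txt

# Column 1 (no indent): only in File1.txt
# Column 2 (one tab):   only in File2.txt
# Column 3 (two tabs):  in both files
comm File1.txt File2.txt

# Suppressing columns 2 and 3 leaves the lines unique to File1.txt.
only_in_1=$(comm -23 File1.txt File2.txt)
printf '%s\n' "$only_in_1"    # A, then B
```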

If they aren't sorted, but it is OK to sort them, try, in bash:

comm -23 <(sort File1.txt) <(sort File2.txt)

Unless you de-duplicate File1.txt first (with uniq after sorting, or with sort -u), a line that occurs more times in File1.txt than in File2.txt will be output once for each extra occurrence. This may or may not be appropriate for your use case.
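A minimal sketch of the duplicate behaviour, using made-up sample lines: comm pairs lines one-to-one, so an extra copy in File1.txt is reported, while de-duplicating with sort -u first reports only lines that never occur in File2.txt. It uses a pipeline rather than <(...) so it also runs in a plain POSIX sh.

```shell
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1
trap 'rm -rf "$tmp"' EXIT

# File1.txt repeats A; File2.txt contains A once. Both are sorted.
printf '%s\n' A A B > File1.txt
printf '%s\n' A C > File2.txt

# The first A pairs with File2.txt's A; the second copy has no partner.
with_dupes=$(comm -23 File1.txt File2.txt)
printf '%s\n' "$with_dupes"      # A, then B

# sort -u removes the repeat before comm sees it.
# File2.txt is already sorted, so comm reads it directly.
without_dupes=$(sort -u File1.txt | comm -23 - File2.txt)
printf '%s\n' "$without_dupes"   # B
```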

If only File2.txt is already sorted, you can use a simple pipeline in most shells, where - tells comm to read the sorted File1.txt from standard input:

sort File1.txt | comm -23 - File2.txt
David G.
  • What do you mean by "sorted"? According to which criteria are the files sorted? And are the files manipulated after the sorting, or is this just temporary? – X3nion Sep 03 '20 at 11:11
  • @X3nion Sorted by the default comparison of the sort command. With no options, sort orders lines according to the current locale's collation rules, which is a strict byte-by-byte comparison only in the C locale (for example with LC_ALL=C); comm expects that same order, so run sort and comm under the same locale. The sort command normally reads standard input or one or more files and writes the sorted lines to standard output, so it does not touch the input files. My comment about "OK to sort them" is more about "if you don't need the unique lines in the order they were in the original file", and that command will not affect the input files in any way. – David G. Sep 03 '20 at 13:15
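Because the order sort produces depends on the locale, one way to keep sort and comm in agreement is to pin both to byte order with LC_ALL=C. A sketch under that assumption, with mixed-case sample lines chosen because byte order puts "B" before "a" while many other locales typically put "a" first; it also shows that sort leaves its input file unchanged.

```shell
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' a B c > File1.txt
printf '%s\n' B > File2.txt

# Both sort and comm now compare lines byte by byte.
LC_ALL=C
export LC_ALL

# sort only reads its input and writes the sorted lines to stdout.
sort File1.txt > File1.sorted
sort File2.txt > File2.sorted

result=$(comm -23 File1.sorted File2.sorted)
printf '%s\n' "$result"    # a, then c

cat File1.txt              # still a, B, c in the original order
```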

Simplified, thanks to @Jeff Schaller

Try:

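grep -Fvx -f File2.txt File1.txt

(fgrep -vx -f File2.txt File1.txt is equivalent; fgrep is the older spelling of grep -F and is deprecated in GNU grep.)

That is: print every line of File1.txt that does not exactly match some line of File2.txt.

The -F option treats each pattern as a fixed string rather than a regular expression.

The -x option, which I didn't know about before, requires the pattern to match the complete line.

The -v option says to show the lines that don't match.

The -f option reads the patterns, one per line, from the file named right after it.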
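A short sketch of why -x and -F both matter, using made-up lines: without -x a pattern can match part of a longer line, and without -F a line such as "a.c" is read as a regular expression in which "." matches any character.

```shell
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' A AB a.c abc > File1.txt
printf '%s\n' A a.c > File2.txt

# Literal, whole-line, inverted match: only exact copies are removed.
exact=$(grep -Fvx -f File2.txt File1.txt)
printf '%s\n' "$exact"     # AB, then abc

# Without -x, "A" also matches inside "AB", so AB is dropped too.
no_x=$(grep -Fv -f File2.txt File1.txt)
printf '%s\n' "$no_x"      # abc

# Without -F, "a.c" is a regex that also matches "abc".
no_F=$(grep -vx -f File2.txt File1.txt)
printf '%s\n' "$no_F"      # AB
```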

David G.

A quick tcsh script:

#  arg2linesNOTINarg1.csh:
#  tcsh
#  LINES FROM ARG2 THAT ARE NOT IN ARG1
#
if ( $#argv < 2 ) then
   echo 'usage: tcsh arg2linesNOTINarg1.csh fileWITHavoidedLINES fileTOsearch'
   exit 1
endif
set fileWITHavoidedLINES = $1
set fileTOsearch = $2
set genSRCHstr =  'awk '"'"'BEGIN { started=0; } \
                          { if (started==0) printf("^%s$",$0);  \
                           else printf("|^%s$",$0) ; started=1 } \
     END { printf("\n") } '"'"' '"${fileWITHavoidedLINES}"' '
grep -E -v `eval ${genSRCHstr}` "$fileTOsearch"

That can be run with:

tcsh arg2linesNOTINarg1.csh File2.txt  File1.txt

It may run into problems generating the search string if the lines are too long, or if they contain spaces, tabs, or regular-expression metacharacters such as . * [ or |, since each line becomes part of an extended regular expression. It could probably be modified to avoid these problems, but this suggestion is just a start.
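One way around the quoting and metacharacter problems is to keep the script's argument order but swap the generated regular expression for grep's fixed-string, whole-line mode from the grep answer. A sketch in POSIX sh; the function name lines_not_in is hypothetical, and the sample lines are made up to include a space and a "*".

```shell
#!/bin/sh
# Hypothetical helper, same argument order as the tcsh script:
#   $1 = file with the lines to avoid, $2 = file to search.
lines_not_in() {
    if [ "$#" -lt 2 ]; then
        echo "usage: lines_not_in fileWITHavoidedLINES fileTOsearch" >&2
        return 2
    fi
    # -F: literal text, so spaces, tabs and regex metacharacters are safe.
    # -x: whole-line match; -v: print the lines that do NOT match.
    grep -Fvx -f "$1" "$2"
}

tmp=$(mktemp -d) && cd "$tmp" || exit 1
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' 'A' 'B b' 'x*' 'E' > File1.txt
printf '%s\n' 'x*' 'E' > File2.txt

lines_not_in File2.txt File1.txt    # A, then "B b"
```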

jmf7

Awk

awk 'NR==FNR{a[$0];next} !($0 in a)' File2.txt File1.txt

($0 is the whole line; comparing $1 would only look at the first whitespace-separated field.)

Output:

A
B
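One caveat with the NR==FNR idiom: if File2.txt is empty, NR equals FNR for every line of File1.txt as well, so those lines are stored and skipped and nothing is printed. A sketch that tests FILENAME against ARGV[1] instead, comparing whole lines; like the original, it needs no sorting and keeps File1.txt's order.

```shell
#!/bin/sh
tmp=$(mktemp -d) && cd "$tmp" || exit 1
trap 'rm -rf "$tmp"' EXIT

printf '%s\n' A B C E > File1.txt
printf '%s\n' C D E > File2.txt
: > empty.txt

# First file (ARGV[1]): remember each whole line as an array key.
# Second file: print lines never seen in the first (default action).
prog='FILENAME == ARGV[1] { seen[$0]; next } !($0 in seen)'

kept=$(awk "$prog" File2.txt File1.txt)
printf '%s\n' "$kept"    # A, then B

# With an empty exclusion file every line of File1.txt is kept.
all=$(awk "$prog" empty.txt File1.txt)
printf '%s\n' "$all"     # A, B, C, E
```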