Identifying the difference in two files in unix

Question

I have two files, ret1.txt and ret2.txt.

[gpadmin@subh ~]$ cat ret1.txt
emcas_fin_bi=324 
emcas_fin_drr=3294   
emcas_fin_exp=887 
emcas_fin_optics=0
emcas_gbo_gs=3077

and

[gpadmin@subh ~]$ cat ret2.txt 
emcas_fin_bi=333 
emcas_fin_drr=5528 
emcas_fin_exp=1134 
emcas_fin_optics=0 
emcas_fin_revpro=0 
emcas_gbo_gs=3897

I compare them like this:

 [gpadmin@subh ~]$ diff -y ret1.txt ret2.txt  
emcas_fin_bi=324 | emcas_fin_bi=333  
emcas_fin_drr=3294 | emcas_fin_drr=5528
emcas_fin_exp=887 | emcas_fin_exp=1134
emcas_fin_optics=0 emcas_fin_optics=0
emcas_gbo_gs=3077 | emcas_fin_revpro=0 
                        >  emcas_gbo_gs=3897

I think this output is wrong: emcas_gbo_gs is present in both files, but it is shown as a new line:

emcas_gbo_gs=3077 | emcas_fin_revpro=0
               > emcas_gbo_gs=3897

Desired output:

emcas_gbo_gs=3077 | emcas_gbo_gs=3897   
                      > emcas_fin_revpro=0

That looks fine to me. diff does a line-by-line comparison and -y puts the output in two columns which is exactly what you have. emcas_gbo_gs=3897 appears on the sixth line in the second file whereas your first file doesn't have a sixth line. — Nasir Riley, Aug 26 '18 at 23:20
Kindly check the desired output.. my current output is wrong. :( — Subhashis Dey, Aug 26 '18 at 23:22
No, it is not wrong. The command is working exactly as it should. It is not going to give you the desired output because it's not supposed to work that way. — Nasir Riley, Aug 26 '18 at 23:24
Then you are going to need to use a different command which won't be for comparing the two files. diff doesn't work in the way that would be required for giving you that output. — Nasir Riley, Aug 26 '18 at 23:27
You need to be more clear in your question. Are you trying to compare the two files or are you just trying to get a certain output based on their contents? — Nasir Riley, Aug 26 '18 at 23:34
Not possible. diff compares the files line-by-line so it will never give you that output with those two files. In fact, it would be wrong if it did do that because it would be comparing two different lines. I don't see any reason why you can't work with the output it's giving you as it's telling you exactly where the two files are different. Your desired output wouldn't make any sense. — Nasir Riley, Aug 26 '18 at 23:46
If you want something like "show nothing for identical key names with identical values, then show side-by-side diffs for keys with identical names and different values, then keys that occur only in file1, then keys that occur only in file2", that might be doable, but you should put that in your question. — Mark Plotnick, Aug 27 '18 at 03:00
@MarkPlotnick Can you help me? That's OK for the output if revpro comes last. — Subhashis Dey, Aug 27 '18 at 05:21
Your desired output is strange. Why does it contain emcas_gbo_gs and not emcas_fin_bi? Both options have changed. Don't you want to see all options that differ between files? — Kamil Maciorowski, Aug 27 '18 at 05:35
@Kamil, I will test this and let you know... Btw, Thanks a lot for guiding me. — Subhashis Dey, Aug 27 '18 at 07:58

Kamil Maciorowski · Accepted Answer · 2018-08-27T07:48:53.333

I assume you're interested in comparing lines that follow the pattern

key=value

and the order of keys within a given file doesn't really matter.

Since your files contain trailing spaces, I think it's good to sanitize the input first. The following helper function does a bit more than that: it discards lines without =, strips leading and trailing whitespace from each line, and removes whitespace around the first = on the line.

sanitize() { grep '=' "$1" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//; s/[[:space:]]*=[[:space:]]*/=/'; }
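To see what sanitize does in isolation, here is a small sketch. The sample lines and the tmp and result variables are made up for illustration and are not from the question; mktemp is assumed to be available.

```shell
# Same helper as above, repeated so this snippet runs on its own.
sanitize() { grep '=' "$1" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//; s/[[:space:]]*=[[:space:]]*/=/'; }

# Hypothetical sample file with messy whitespace and one line without '='.
tmp=$(mktemp)
printf '%s\n' '  key_a = 1  ' 'a line without the separator' 'key_b=2 ' > "$tmp"

# Expect the line without '=' to be dropped and the whitespace to be gone.
result=$(sanitize "$tmp")
printf '%s\n' "$result"
rm -f "$tmp"
```

This should print key_a=1 and key_b=2, one per line.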

Another function (using process substitution, not POSIX-friendly)

prepare() { diff <(sanitize "$1") <(sanitize "$2") | grep '^[<|>]' | sort -k 2 | uniq -u -f 1; }

will yield differences. Use it like this:

prepare ret1.txt ret2.txt

and the output will be:

< emcas_fin_bi=324
> emcas_fin_bi=333
< emcas_fin_drr=3294
> emcas_fin_drr=5528
> emcas_fin_exp=1134
< emcas_fin_exp=887
> emcas_fin_revpro=0
< emcas_gbo_gs=3077
> emcas_gbo_gs=3897
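The sort and uniq stages are what make key order irrelevant. If a line merely moved, diff can report it once with < and once with >; sort -k 2 puts the two copies next to each other and uniq -u -f 1 (ignore the first field, print only lines that have no duplicate) drops both. A minimal sketch with made-up diff-style lines:

```shell
# Hypothetical diff-style input: key_a only moved, key_b changed its value.
remaining=$(printf '%s\n' '< key_a=1' '< key_b=2' '> key_b=3' '> key_a=1' |
    sort -k 2 | uniq -u -f 1)

# Only the key whose value really differs should survive.
printf '%s\n' "$remaining"
```

Here only the two key_b lines should remain.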

It's not the output you want but it's quite parsable. This means you can process it further in almost any way you want. E.g. you can use awk and column to get the desired format (or at least something close to it):

prepare ret1.txt ret2.txt | awk -F '[ =]' '
    { $1 == "<" ? L[$2]=$3 : R[$2]=$3 }
    END {
         for (key in L) if (key in R) print key"="L[key]"/|/"key"="R[key]
         for (key in L) if (! (key in R)) print key"="L[key]"/<"
         for (key in R) if (! (key in L)) print " />/"key"="R[key]
        }
    ' | column -s / -t

The result is:

emcas_fin_exp=887   |  emcas_fin_exp=1134
emcas_fin_bi=324    |  emcas_fin_bi=333
emcas_fin_drr=3294  |  emcas_fin_drr=5528
emcas_gbo_gs=3077   |  emcas_gbo_gs=3897
                    >  emcas_fin_revpro=0
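The rows above come out in no particular order because awk does not define the iteration order of for (key in array). If a stable order matters, one option is to emit plain "key left right" rows and sort them afterwards. This is a sketch of a different output format, not the one shown above; the sample input and the "-" placeholder for a missing side are my own choices.

```shell
# Hypothetical input in the same format that prepare emits.
rows=$(printf '%s\n' '< key_a=1' '> key_a=2' '> key_b=3' '< key_c=4' |
awk -F '[ =]' '
    $1 == "<" { L[$2] = $3 }    # value from the first file
    $1 == ">" { R[$2] = $3 }    # value from the second file
    END {
        # One row per key: key, left value, right value ("-" when missing).
        for (key in L) print key, L[key], ((key in R) ? R[key] : "-")
        for (key in R) if (!(key in L)) print key, "-", R[key]
    }
' | sort)

printf '%s\n' "$rows"
```

With this input the rows should be "key_a 1 2", "key_b - 3" and "key_c 4 -".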

Note: tested on Debian GNU/Linux 9.
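If process substitution is not available in your shell, the same pipeline can be sketched with temporary files instead. prepare_tmp is a name I made up for this variant, and mktemp, while common, is not specified by POSIX, so treat this as an assumption about your system rather than a guaranteed portable solution.

```shell
# Same sanitize helper as in the answer, repeated so the snippet is self-contained.
sanitize() { grep '=' "$1" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//; s/[[:space:]]*=[[:space:]]*/=/'; }

# Hypothetical variant of prepare: temporary files stand in for <( ... ).
prepare_tmp() {
    left=$(mktemp) || return 1
    right=$(mktemp) || { rm -f "$left"; return 1; }
    sanitize "$1" > "$left"
    sanitize "$2" > "$right"
    diff "$left" "$right" | grep '^[<>]' | sort -k 2 | uniq -u -f 1
    rm -f "$left" "$right"
}

# Recreate the two files from the question in a scratch directory.
dir=$(mktemp -d)
printf '%s\n' 'emcas_fin_bi=324 ' 'emcas_fin_drr=3294   ' 'emcas_fin_exp=887 ' \
    'emcas_fin_optics=0' 'emcas_gbo_gs=3077' > "$dir/ret1.txt"
printf '%s\n' 'emcas_fin_bi=333 ' 'emcas_fin_drr=5528 ' 'emcas_fin_exp=1134 ' \
    'emcas_fin_optics=0 ' 'emcas_fin_revpro=0 ' 'emcas_gbo_gs=3897' > "$dir/ret2.txt"

out=$(prepare_tmp "$dir/ret1.txt" "$dir/ret2.txt")
printf '%s\n' "$out"
rm -rf "$dir"
```

The output should contain the same nine < and > lines that prepare ret1.txt ret2.txt gives for the files in the question.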
