I want to be able to go through a file and find lines that are different in the fields I expect to be stable, ignoring differences in the fields I know will be different.
I want to ignore columns 2, 5 and 6; changes in columns 1, 3 and 4 are the ones worth reporting.
To give an example:
I would not want the first two lines reported as containing a change, but I would want the second pair of lines to be reported.
The file was already sorted with
sort -k1,1 -k3,3n -k4,4n
Any suggestions? (Apologies for the question formatting, I'm new)
The data in non-image form, with additional lines before and after:
NZ_CP020102 B4U62_RS00130 26852 28543 DNA polymerase III subunit gamma/tau NCIB3610a
NZ_CP020102 TESTGENOMECL_26 26852 28543 DNA polymerase III subunit gamma/tau TESTGENOME
NZ_CP020102 B4U62_RS00135 28567 28890 YbaB/EbfC family nucleoid-associated protein NCIB3610a
NZ_CP020102 TESTGENOMECL_27 28567 28890 YbaB/EbfC family nucleoid-associated protein TESTGENOME
NZ_CP020102 B4U62_RS00140 28905 29501 recombination protein RecR NCIB3610a
NZ_CP020102 TESTGENOMECL_28 28905 29501 recombination protein RecR TESTGENOME
NZ_CP020102 B4U62_RS00145 29519 29743 DUF2508 domain-containing protein NCIB3610a
NZ_CP020102 TESTGENOMECL_29 29519 29743 DUF2508 domain-containing protein TESTGENOME
NZ_CP020102 B4U62_RS00150 29810 30073 sigma-K factor-processing regulatory protein BofA NCIB3610a
NZ_CP020102 TESTGENOMECL_30 29810 30073 sigma-K factor-processing regulatory protein BofA TESTGENOME
NZ_CP020102 B4U62_RS00155 30317 31869 16S ribosomal RNA NCIB3610a
NZ_CP020102 TESTGENOMECL_31 30317 31870 16S ribosomal RNA TESTGENOME
NZ_CP020102 B4U62_RS00160 31969 32045 tRNA-Ile NCIB3610a
NZ_CP020102 TESTGENOMECL_32 31969 32045 tRNA-Ile TESTGENOME
The only two lines that should be returned as relevantly different are the two 16S lines, due to the difference in column 4.
For the most part the lines are paired, but there can be omissions:
NZ_CP020102 B4U62_RS00085 20006 20596 pyridoxal 5'-phosphate synthase glutaminase subunit PdxT NCIB3610a
NZ_CP020102 TESTGENOMECL_17 20006 20596 pyridoxal 5'-phosphate synthase glutaminase subunit PdxT TESTGENOME
NZ_CP020102 TESTGENOMECL_4554 20704 20925 hypothetical protein TESTGENOME
NZ_CP020102 B4U62_RS00090 20918 22195 serine--tRNA ligase NCIB3610a
NZ_CP020102 TESTGENOMECL_18 20918 22195 serine--tRNA ligase TESTGENOME
In this case, the unpaired line (TESTGENOMECL_4554) is the one I would want reported.
Essentially, I'm looking to suppress columns, run a diff, but then have the diff output include the suppressed columns.
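To make the desired behaviour concrete, here is a minimal sketch of one way it could work without an actual diff: build a key from columns 1, 3 and 4, count how often each key occurs, and print (in full, including the ignored columns) every line whose key occurs only once. It assumes the file is tab-separated (column 5 contains spaces, so whitespace splitting would not work) and that each genome contributes at most one line per key. The function name `report_unmatched` and the file name `genes.tsv` are hypothetical.

```shell
# Hypothetical helper: print every line whose (col1, col3, col4) key is unique.
# Assumes tab-separated input; the file is read twice, so it must be a regular
# file rather than a pipe.
report_unmatched() {
    awk -F'\t' '
        # First pass: count how many lines share each key.
        NR == FNR { count[$1 FS $3 FS $4]++; next }
        # Second pass: print the whole line when its key was seen exactly once.
        count[$1 FS $3 FS $4] == 1
    ' "$1" "$1"
}

# Usage (genes.tsv is a hypothetical file name):
#   report_unmatched genes.tsv
```

With the sample data above, this should report both 16S lines (their column 4 values differ, so neither key has a partner) and the unpaired hypothetical protein line, while the matched pairs are suppressed. Because it counts keys across the whole file, it does not depend on the sort order.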