How to delete lines containing string in column n using awk?

Question

I have a tab-separated file that looks like this:

$ cat in_file
NC_013132.1     7260299 7261429 WP_012793281.1
NC_013132.1     7270674 7270862 NC_013132.1     7270674 7270862 ID=cds5678
NC_013132.1     7573559 7574311 WP_012793549.1
NZ_CP022095.2   2809552 2809629 NZ_CP022095.2   2809552 2809629 ID=cds2731
NZ_CP022095.2   2884046 2885668 WP_003877393.1
NZ_CP022095.2   3106358 3106435 NZ_CP022095.2   3106358 3106435 ID=cds2976

I want to delete lines that start with NC or NZ in column 4. I tried doing it with awk -F '\t' '$4 != "^NC | ^NZ"' in_file but it didn't work.

The output should look as follows:

$ cat out_file
NC_013132.1     7260299 7261429 WP_012793281.1
NC_013132.1     7573559 7574311 WP_012793549.1
NZ_CP022095.2   2884046 2885668 WP_003877393.1

Inian · Accepted Answer · 2019-05-22T10:52:47.303

You can do this as shown below. When you compare with == or != in awk, you are performing a literal string comparison, so regular-expression constructs such as ^ or $ have no special meaning. For pattern matching, use the ~ operator, and negate the match with !~. For multiple patterns, use the alternation syntax (pat1|pat2) supported in ERE.

awk 'BEGIN { OFS=FS="\t" } $4 !~ /^(NZ|NC)/' file

To write the output to a new file, add a redirection at the end of the command: > newfile. To modify the file in place, follow the steps in this answer: How to permanently change a file using awk? (“in-place” edits, as with “sed -i”)
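To make the difference concrete, here is a minimal sketch that rebuilds part of the question's data and runs both the original attempt and the regex version. The working directory and the file names literal_out, in_file_copy and tmp_file are assumptions chosen for illustration; the temp-file step at the end is one common way to get an "in-place" effect and is not necessarily what the linked answer recommends.

```shell
# Work in a scratch directory so no existing files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Rebuild four lines of the question's tab-separated sample.
printf 'NC_013132.1\t7260299\t7261429\tWP_012793281.1\n' > in_file
printf 'NC_013132.1\t7270674\t7270862\tNC_013132.1\t7270674\t7270862\tID=cds5678\n' >> in_file
printf 'NZ_CP022095.2\t2884046\t2885668\tWP_003877393.1\n' >> in_file
printf 'NZ_CP022095.2\t3106358\t3106435\tNZ_CP022095.2\t3106358\t3106435\tID=cds2976\n' >> in_file

# != compares $4 with the literal text "^NC | ^NZ".
# No field equals that text, so every line passes through.
awk -F '\t' '$4 != "^NC | ^NZ"' in_file > literal_out

# !~ treats the right-hand side as a regex, so lines whose
# column 4 starts with NC or NZ are dropped.
awk 'BEGIN { OFS=FS="\t" } $4 !~ /^(NZ|NC)/' in_file > out_file

# "In-place" sketch: filter into a temp file, then replace the
# original only if awk exited successfully.
cp in_file in_file_copy
awk 'BEGIN { OFS=FS="\t" } $4 !~ /^(NZ|NC)/' in_file_copy > tmp_file &&
    mv tmp_file in_file_copy

cat out_file
```

The literal comparison keeps all four lines, while the regex version keeps only the two lines whose fourth column starts with WP.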

αғsнιη · Answer 2 · 2019-05-22T10:46:17.853

2

You need the pattern-matching operator ~ (or !~ for negation), which treats the right-hand operand as an (extended) regular expression and the left-hand one as a string, so

awk -F'\t' '$4 !~ "^(NC|NZ)"' infile

Or shorter:

awk -F'\t' '$4 !~ "^N[CZ]"' infile

and even shorter if no column contains spaces (since awk's default FS splits on runs of spaces and tabs):

awk '$4 !~ "^N[CZ]"' infile
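The caveat about spaces matters in practice. The sketch below uses a made-up two-line file (not the question's data) in which the first column of line 1 contains a space, and counts how many lines each variant keeps.

```shell
# Hypothetical input: column 1 of the first line contains a space.
spaced_file=$(mktemp)
printf 'a b\tc\tNZ_1\tWP_1\n' > "$spaced_file"
printf 'a\tb\tc\tWP_2\n' >> "$spaced_file"

# Tab as separator: column 4 is WP_1 and WP_2, so both lines are kept.
tab_kept=$(awk -F'\t' '$4 !~ "^N[CZ]" { n++ } END { print n + 0 }' "$spaced_file")

# Default separator: "a b" becomes two fields, so $4 on line 1
# is NZ_1 and that line is dropped by mistake.
default_kept=$(awk '$4 !~ "^N[CZ]" { n++ } END { print n + 0 }' "$spaced_file")

echo "kept with -F'\\t': $tab_kept, kept with default FS: $default_kept"
rm -f "$spaced_file"
```

With the tab separator both lines survive; with the default separator only one does, so -F'\t' is the safer choice whenever a field might contain spaces.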

edited May 22 '19 at 10:46

answered May 22 '19 at 10:38


score 0 · Answer 3 · answered May 22 '19 at 16:53

Tried with the method below

command

awk '$4 !~ /^NC|^NZ/{print $0}' filename

output

awk '$4 !~ /^NC|^NZ/{print $0}' o.txt
NC_013132.1     7260299 7261429 WP_012793281.1
NC_013132.1     7573559 7574311 WP_012793549.1
NZ_CP022095.2   2884046 2885668 WP_003877393.1
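Since the title asks about column n in general, the same filter can be parameterised. This is only a sketch: the variable names col and pat are hypothetical, passed in with awk's -v option, and the three-line input is made up for illustration.

```shell
# Hypothetical tab-separated input modelled on the question's layout.
sample=$(mktemp)
printf 'NC_1\t10\t20\tWP_1\n' > "$sample"
printf 'NC_1\t30\t40\tNC_1\t30\t40\tID=cds1\n' >> "$sample"
printf 'NZ_2\t50\t60\tWP_2\n' >> "$sample"

# col selects the column and pat holds the regex. $col is the
# field whose number is stored in col; pat is used as a dynamic regex.
filtered=$(awk -F'\t' -v col=4 -v pat='^N[CZ]' '$col !~ pat' "$sample")

printf '%s\n' "$filtered"
rm -f "$sample"
```

Changing col=4 to another number applies the same test to a different column without editing the awk program itself.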
