3

I have a tab-separated file that looks like this:

$ cat in_file
NC_013132.1     7260299 7261429 WP_012793281.1
NC_013132.1     7270674 7270862 NC_013132.1     7270674 7270862 ID=cds5678
NC_013132.1     7573559 7574311 WP_012793549.1
NZ_CP022095.2   2809552 2809629 NZ_CP022095.2   2809552 2809629 ID=cds2731
NZ_CP022095.2   2884046 2885668 WP_003877393.1
NZ_CP022095.2   3106358 3106435 NZ_CP022095.2   3106358 3106435 ID=cds2976

I want to delete the lines in which column 4 starts with NC or NZ. I tried awk -F '\t' '$4 != "^NC | ^NZ"' in_file, but it didn't work.

The output should look as follows:

$ cat out_file
NC_013132.1     7260299 7261429 WP_012793281.1
NC_013132.1     7573559 7574311 WP_012793549.1
NZ_CP022095.2   2884046 2885668 WP_003877393.1
Inian
  • 12,807

3 Answers

4

You can do it as shown below. When you use awk with == or !=, you are performing a literal string comparison, in which regular expression syntax such as ^ or $ has no special meaning. Use the match operator ~ for pattern matching, and negate the match with !~. For multiple patterns, use the alternation style (pat1|pat2) supported in ERE:

awk 'BEGIN { OFS=FS="\t" } $4 !~ /^(NZ|NC)/' file
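To see the difference between the literal comparison and the regex match, here is a minimal sketch. The two-line sample and the variable names (sample, literal, regex) are made up for illustration and are not the asker's data.

```shell
# Hypothetical sample: two tab-separated lines, one of each kind.
sample=$(printf 'NC_1\t10\t20\tWP_1\nNC_1\t30\t40\tNC_1\t30\t40\tID=cds1\n')

# Literal comparison: no field equals the string "^NC | ^NZ",
# so the condition is true for every line and nothing is removed.
literal=$(printf '%s\n' "$sample" | awk -F '\t' '$4 != "^NC | ^NZ"')

# Regex match: a line is kept only if column 4 does not start with NZ or NC.
regex=$(printf '%s\n' "$sample" | awk -F '\t' '$4 !~ /^(NZ|NC)/')

echo "literal comparison keeps:"
echo "$literal"
echo "regex match keeps:"
echo "$regex"
```

The literal version prints both lines back unchanged, which is why the original attempt appeared to do nothing.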

To write the output to a new file, add a redirection at the end of the command: > newfile. To modify the file in place, follow the steps in this answer: How to permanently change a file using awk? (“in-place” edits, as with “sed -i”)
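A rough sketch of both options, assuming a throwaway directory from mktemp -d and made-up sample data. The temporary-file-then-mv approach is one common portable way to get the effect of an in-place edit; it is not necessarily what the linked answer recommends.

```shell
# Hypothetical working directory and sample input; adjust names to your setup.
workdir=$(mktemp -d)
printf 'NC_1\t10\t20\tWP_1\nNC_1\t30\t40\tNC_1\t30\t40\tID=cds1\n' > "$workdir/in_file"

# Option 1: write the filtered lines to a new file.
awk 'BEGIN { OFS=FS="\t" } $4 !~ /^(NZ|NC)/' "$workdir/in_file" > "$workdir/out_file"

# Option 2: replace the original. Filter into a temporary file first and
# move it over the input only if awk succeeded. Redirecting straight to
# the input file would truncate it before awk reads it.
awk 'BEGIN { OFS=FS="\t" } $4 !~ /^(NZ|NC)/' "$workdir/in_file" > "$workdir/in_file.tmp" &&
    mv "$workdir/in_file.tmp" "$workdir/in_file"

cat "$workdir/in_file"
```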

Inian
  • 12,807
2

You need the pattern matching operator ~ (or !~ for negation), which treats the right-hand operand as an (extended) regular expression and matches it against the left-hand operand as a string, so:

awk -F'\t' '$4 !~ "^(NC|NZ)"' infile

Or shorter:

awk -F'\t' '$4 !~ "^N[CZ]"' infile

and even shorter if no column contains a space (since awk's default FS splits on runs of spaces and tabs):

awk '$4 !~ "^N[CZ]"' infile
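The caveat about spaces inside a column can be illustrated with a small sketch. The input line is hypothetical: its second column contains a space, which shifts the field numbering under the default FS.

```shell
# Hypothetical line: column 2 ("foo bar") contains a space, column 4 is NC_9.
line=$(printf 'NC_1\tfoo bar\t20\tNC_9')

# With -F'\t', $4 is "NC_9", so the line is filtered out (empty result).
with_tab=$(printf '%s\n' "$line" | awk -F'\t' '$4 !~ "^N[CZ]"')

# With the default FS, the fields are NC_1, foo, bar, 20, NC_9,
# so $4 is "20" and the line is kept by mistake.
with_default=$(printf '%s\n' "$line" | awk '$4 !~ "^N[CZ]"')

echo "tab FS kept:     [$with_tab]"
echo "default FS kept: [$with_default]"
```

If the data might contain spaces or empty columns, keeping -F'\t' is the safer choice.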
αғsнιη
  • 41,407
0

I tried it with the method below.

command

awk '$4 !~ /^NC|^NZ/{print $0}' filename

output

awk '$4 !~ /^NC|^NZ/{print $0}' o.txt
NC_013132.1     7260299 7261429 WP_012793281.1
NC_013132.1     7573559 7574311 WP_012793549.1
NZ_CP022095.2   2884046 2885668 WP_003877393.1