-1

I have many files in a folder:

$ ls -hlS | head
total 75M
-rw-r--r-- 1 ubuntu ubuntu 511 Aug  3 16:27 NW_009517088.1.lst
-rw-r--r-- 1 ubuntu ubuntu 478 Aug  3 16:27 NW_009539008.1.lst
-rw-r--r-- 1 ubuntu ubuntu 471 Aug  3 16:27 NW_009386266.1.lst
-rw-r--r-- 1 ubuntu ubuntu 471 Aug  3 16:27 NW_009411177.1.lst
-rw-r--r-- 1 ubuntu ubuntu 451 Aug  3 16:27 NW_009514912.1.lst

The content of each *.lst file looks as follows:

$ cat NW_009514912.1.lst
rna-NisyCt036+
cds-YP_358756.1-
rna-NisyCt037+
cds-YP_358757.1+
cds-YP_358758.1+
cds-YP_358758.1+
id-NisyCp117-1+
id-NisyCp117-2+
id-LOC104209938-1-
rna-XM_009770987.1-
rna-XM_009780247.1+
rna-XM_009783083.1+
rna-XM_009784022.1-
rna-TRNAN-GUU+

How can I delete, from each *.lst file, all lines which do not start with rna-XM_?

AdminBee
  • 2
    A clear case for find ... | xargs sed -i .... The find may need to be prevented from descending subdirectories, and the invariant part of the filename needs to be defined (like, is the .1.lst fixed). Can you read up those man pages, and show where you get stuck? – Paul_Pedant Aug 03 '20 at 08:02
  • Thank you, but find query/ -name "*.lst" | xargs sed -i '/^rna-XM_/d' doesn't work. What did I miss? – user977828 Aug 03 '20 at 10:23
  • 2
    You missed a bang sign: '/^rna-XM_/!d' – Rakesh Sharma Aug 03 '20 at 11:10
  • 1
    It is not clear if you want to delete "a line not starting with…", "first line not starting with…", or "all lines not starting with". Please edit question to make it clear. – ctrl-alt-delor Aug 03 '20 at 11:51
  • 1
    What is query/? – ctrl-alt-delor Aug 03 '20 at 11:55
  • Also, doesn't work is not an error message. All it conveys is that you did not get what you want, whatever it is. – Quasímodo Aug 03 '20 at 12:04

2 Answers

3

Assuming you want to remove all lines that do not start with rna-XM_ (= keep only those that do start with rna-XM_), you can try the following:

for file in *.lst; do awk '/^rna-XM_/' "$file" > "${file}.new"; done

This loops over all files whose names end in .lst and prints only the lines starting with rna-XM_. The output is written to a file filename.lst.new, which you then have to rename to filename.lst if you want to replace the original file content.
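
The rename step can be folded into the loop. The following is a minimal sketch, assuming a scratch directory and a made-up sample file (NW_sample.1.lst is hypothetical, created only so the example runs on its own). The original is replaced only if awk exits successfully:

```shell
# Scratch directory and sample file are assumptions for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'rna-NisyCt036+' 'rna-XM_009770987.1-' 'cds-YP_358756.1-' \
    'rna-XM_009780247.1+' > NW_sample.1.lst

for file in *.lst; do
    # Write the filtered lines to a temporary copy, then move it over the
    # original; "&&" skips the mv if awk failed.
    awk '/^rna-XM_/' "$file" > "${file}.new" && mv -- "${file}.new" "$file"
done

# Show the result: only the lines starting with rna-XM_ remain.
cat NW_sample.1.lst
```

Run it inside the real folder (without the scratch setup) only after checking the output on a copy of the data.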

The same also works with sed (see comment by @Rakesh Sharma):

for file in *.lst; do sed '/^rna-XM_/!d' "$file" > "${file}.new"; done
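
Before touching any real files, it may be worth checking that both filters select the same lines. A small sketch, using a few lines taken from the sample file in the question and a temporary file (the variable names are made up for this example):

```shell
# Temporary sample file holding a subset of the lines shown in the question.
sample=$(mktemp)
printf '%s\n' 'rna-NisyCt036+' 'cds-YP_358756.1-' 'id-LOC104209938-1-' \
    'rna-XM_009770987.1-' 'rna-XM_009780247.1+' 'rna-TRNAN-GUU+' > "$sample"

# awk prints matching lines; sed deletes the non-matching ones.
awk_out=$(awk '/^rna-XM_/' "$sample")
sed_out=$(sed '/^rna-XM_/!d' "$sample")

if [ "$awk_out" = "$sed_out" ]; then
    echo "awk and sed agree"
else
    echo "outputs differ"
fi

rm -f "$sample"
```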

If you are confident that the command is correct, you can use the in-place editing option (-i) of sed:

for file in *.lst; do sed -i '/^rna-XM_/!d' "$file"; done

This modifies the files in place, so you don't have to rename filename.lst.new to filename.lst.
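
Note that -i is not part of POSIX sed: GNU sed accepts a bare -i, while BSD/macOS sed expects a backup suffix after it. Attaching a suffix directly to the option (-i.bak) is accepted by both and keeps a copy of each original file. A minimal sketch, again with a hypothetical scratch directory and sample file:

```shell
# Scratch directory and sample file are assumptions for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'rna-NisyCt036+' 'rna-XM_009770987.1-' 'cds-YP_358756.1-' \
    'rna-XM_009780247.1+' > NW_sample.1.lst

for file in *.lst; do
    # Edit in place and keep the untouched original as "$file.bak".
    sed -i.bak '/^rna-XM_/!d' "$file"
done

ls
```

Once the result has been checked, the *.bak files can be removed.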

Note that the shell for-loop approach is more robust than piping the output of find to xargs (as proposed in some comments) if your filenames can contain special characters such as spaces or newlines. The GNU implementations of find and xargs provide the -print0 and -0 options to deal with such names, but these options are not portable.
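
If find is preferred anyway, the standard -exec ... {} + form passes the filenames directly as arguments, so no output has to be parsed. The sketch below is one possible way to write it, not the only one. The file names and the scratch directory are made up for the example; "! -name . -prune" is used as a portable stand-in for GNU's -maxdepth 1, and sed writes to a temporary copy because -i is not standard:

```shell
# Scratch directory and sample files are assumptions for illustration only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'rna-XM_009770987.1-' 'cds-YP_358756.1-' > 'odd name.lst'
mkdir sub
printf '%s\n' 'cds-YP_358757.1+' > sub/untouched.lst

# "! -name . -prune" keeps find from descending into subdirectories.
# The inline sh script receives the matching files as its arguments.
find . ! -name . -prune -type f -name '*.lst' -exec sh -c '
    for f do
        sed "/^rna-XM_/!d" "$f" > "$f.new" && mv -- "$f.new" "$f"
    done' sh {} +

cat 'odd name.lst'
```

The file in sub/ is left unchanged, and the name containing a space is handled without any special options.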

AdminBee
  • Glad someone else came in. I could see it was deleting the converse set, but I never previously saw the ! operator: it is in neither the man sed or the gnu.org manual. My best shot would have been sed -n '/re/p' . – Paul_Pedant Aug 03 '20 at 13:04
  • @Paul_Pedant You can find it in Section 4.1. of the GNU sed manual: "Appending the ! character to the end of an address specification (before the command letter) negates the sense of the match. That is, if the ! character follows an address or an address range, then only lines which do not match the addresses will be selected." – AdminBee Aug 03 '20 at 13:08
  • 1
    Thanks. I found the I ignore-case and M multiline modifiers in 4.3, but missed ! in 4.1. None of these are in the index, and the multi-part html version is not easy to search in a browser. Further reasons why I favour awk -- I can read the code, and the man page is better. – Paul_Pedant Aug 03 '20 at 13:16
  • @Paul_Pedant yes, it would have been nicer if they had made it a little more prominent ... – AdminBee Aug 03 '20 at 13:18
0

You can use the find command below to look in the current directory for files with the extension .lst and delete all lines in those files which do not start with "rna-XM_":

find . -maxdepth 1 -type f -name "*.lst" -exec sed -i '/^rna-XM_/!d' {} \;