I am working on a sed script that should delete every line of my OTU table that contains one of the species listed in a text file.

The script I have right now is below, but I can't make it work. If you could please help me ASAP, it would be much appreciated.

(Read each line of the species file, search for it in my table, and delete the matching line in place.)

for i in $(cat /my/path/species.txt); do 
    sed -i '/"$i"/d' /my/path/ITS.OTU.table.tsv
done

Thanks so much.

don_crissti
  • You'd need something like '/'"$i"'/d', but it is error-prone in so many ways and inefficient as well. I couldn't find an exact duplicate of this question, but this one is close: https://unix.stackexchange.com/questions/398142/common-lines-between-two-files It needs tweaking to delete instead, for example: grep -vFf file1 file2 – Sundeep Oct 27 '17 at 14:23
  • Please edit your question and show us i) an example species.txt file; ii) an example ITS.OTU.table.tsv; and iii) the output you are expecting from those two example input files. We can't help you parse data that you don't show. For example, should we expect d. melanogaster, Drosophila melanogaster, dmel, fly? Will they always be one word or two? Will they appear in exactly the same way in both files? Can there be more than one per line? That said, the answer will almost certainly be what Sundeep suggested above. – terdon Oct 27 '17 at 14:25
  • The OTU table looks like this: Sample 1 Sample 2 taxonomy 0 4 k__Fungi; p__Basidiomycota; c__Microbotryomycetes; o__Leucosporidiales; f__Leucosporidiaceae; g__Mastigobasidium; s__Mastigobasidium intermedium

    The text file is like this: Postia leucomallella Candida boidinii Diederichomyces cladoniicola

    Mostly two words, sometimes more. They will appear exactly the same way in both files, and never more than once per line or in the table file as a whole.

    – Émilie Tremblay Oct 27 '17 at 14:31
  • Please *edit* your question and show us those files. Comments are hard to read, impossible to format clearly and easy to miss. We need to see a specific example and the output you expect. And it doesn't look like they're identical. You show s__Mastigobasidium intermedium in the OTU file but suggest that you would have Mastigobasidium intermedium in the species file. That isn't identical. That's why we need to see an actual example. – terdon Oct 27 '17 at 15:08
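
To make the quoting point from the first comment concrete, here is a minimal sketch of why the script in the question never matches anything. The variable names (single, mixed, double, n) are made up for illustration. Inside single quotes the shell does not expand $i, so sed receives the literal text "$i" as its pattern. Separately, for i in $(cat file) splits on whitespace, so a two-word species name arrives as two separate words.

```shell
#!/bin/sh
# Hypothetical species name, taken from the example in the comments.
i='Candida boidinii'

single='/"$i"/d'     # single quotes: $i is NOT expanded, sed sees "$i" literally
mixed='/'"$i"'/d'    # the form suggested in the comment: $i is expanded
double="/$i/d"       # double quotes around the whole expression also expand $i

printf '%s\n' "$single" "$mixed" "$double"

# Unquoted command substitution splits on whitespace: two species names
# of two words each yield four loop iterations, not two.
n=0
for w in $(printf '%s\n' 'Postia leucomallella' 'Candida boidinii'); do
    n=$((n + 1))
done
echo "$n"
```

This is why the answers below either avoid the loop entirely or read the file line by line with while read -r.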

3 Answers

6
grep -v -xF -f /my/path/species.txt /my/path/ITS.OTU.table.tsv >/my/path/ITS.OTU.table.tsv.new

This writes /my/path/ITS.OTU.table.tsv.new, which will contain all lines from the original file that do not exactly match a line in /my/path/species.txt.

The -x and -F options together force a string-identical, full-line match, and -f filename reads the strings to match from the given file. The -v option inverts the sense of the match so that only non-matching lines are returned.

If you need to relax the match so that a string read from /my/path/species.txt may match anywhere on each line of /my/path/ITS.OTU.table.tsv, then remove the -x option.
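
As a rough illustration of the relaxed form (without -x), here is a sketch on made-up sample data. The scratch directory, file names, and table rows are assumptions modelled on the format described in the comments, not the asker's real files. One caveat worth knowing: an empty line in the pattern file is an empty pattern, which matches every line, so with -v it would remove everything. The sketch therefore keeps the species list free of empty lines.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Postia leucomallella' 'Candida boidinii' >"$tmpdir/species.txt"
# printf reuses the format for every group of three arguments (one row each).
printf '%s\t%s\t%s\n' \
    'Sample 1' 'Sample 2' 'taxonomy' \
    '0' '4' 'k__Fungi; g__Mastigobasidium; s__Mastigobasidium intermedium' \
    '3' '0' 'k__Fungi; g__Candida; s__Candida boidinii' \
    >"$tmpdir/table.tsv"

# -F: fixed strings, -f: read patterns from a file, -v: keep non-matching lines.
# Without -x, a species name may match anywhere on a line.
grep -v -F -f "$tmpdir/species.txt" "$tmpdir/table.tsv" >"$tmpdir/table.tsv.new"

cat "$tmpdir/table.tsv.new"
# Remove the scratch directory afterwards with: rm -r "$tmpdir"
```

With this sample, the header row and the Mastigobasidium row are kept and the Candida row is dropped.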

If you know where on each line a match should be tested for, you could use awk to compare only those bits of data, but we don't currently know what your data looks like.
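
For what it is worth, an awk comparison of that kind might look like the sketch below. It rests entirely on assumptions taken from the comments under the question: the table is tab-separated, the taxonomy is the last column, and the species name is whatever follows the final s__ in that column. The sample files are made up.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Postia leucomallella' 'Candida boidinii' >"$tmpdir/species.txt"
printf '%s\t%s\t%s\n' \
    'Sample 1' 'Sample 2' 'taxonomy' \
    '0' '4' 'k__Fungi; g__Mastigobasidium; s__Mastigobasidium intermedium' \
    '3' '0' 'k__Fungi; g__Candida; s__Candida boidinii' \
    >"$tmpdir/table.tsv"

awk -F '\t' '
    # First file: remember each non-empty species name.
    NR == FNR { if ($0 != "") drop[$0] = 1; next }
    # Second file: extract the text after the last "s__" in the last column
    # and print the row only if that name is not in the list.
    {
        name = $NF
        sub(/^.*s__/, "", name)
        if (!(name in drop)) print
    }
' "$tmpdir/species.txt" "$tmpdir/table.tsv" >"$tmpdir/table.tsv.new"

cat "$tmpdir/table.tsv.new"
# Remove the scratch directory afterwards with: rm -r "$tmpdir"
```

Unlike the grep approach, this only removes a row when the whole species name matches, so a listed name cannot accidentally match text elsewhere on the line. Note that the NR == FNR idiom assumes the species file is not empty.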

Kusalananda
0

This is your script:

while read -r line 
do
    echo "$line"
    sed -i "/$line/d" /my/path/ITS.OTU.table.tsv
done < /my/path/species.txt

The echo "$line" is there for debugging, so remove it eventually.

  • Hello, when I run this script, I get this message after I see all the species being listed on the terminal: sed: -e expression #1, char 0: no previous regular expression but, it still works! – Émilie Tremblay Oct 27 '17 at 14:42
  • @ÉmilieTremblay It is because you have an empty line in your species.txt file. – Kusalananda Nov 24 '20 at 10:12
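
Following up on the empty-line explanation in the comment above: an empty $line turns the sed expression into //d, which asks sed to reuse a previous regular expression that does not exist. A minimal sketch of the loop that skips empty lines is shown below, on made-up sample data. It writes to a temporary file and renames it rather than relying on sed -i, which is a choice made for this sketch and not part of the original answer. Species names are still treated as regular expressions here, so a name containing characters such as . or / would need extra care.

```shell
#!/bin/sh
# Hypothetical sample data; note the empty second line in species.txt.
tmpdir=$(mktemp -d)
printf '%s\n' 'Postia leucomallella' '' 'Candida boidinii' >"$tmpdir/species.txt"
printf '%s\t%s\t%s\n' \
    'Sample 1' 'Sample 2' 'taxonomy' \
    '0' '4' 'k__Fungi; g__Mastigobasidium; s__Mastigobasidium intermedium' \
    '3' '0' 'k__Fungi; g__Candida; s__Candida boidinii' \
    '1' '2' 'k__Fungi; g__Postia; s__Postia leucomallella' \
    >"$tmpdir/table.tsv"

while IFS= read -r line; do
    # An empty line would produce the expression //d, so skip it.
    [ -n "$line" ] || continue
    # Delete matching rows, writing to a temporary file and then renaming it.
    sed "/$line/d" "$tmpdir/table.tsv" >"$tmpdir/table.tsv.tmp" &&
        mv "$tmpdir/table.tsv.tmp" "$tmpdir/table.tsv"
done <"$tmpdir/species.txt"

cat "$tmpdir/table.tsv"
# Remove the scratch directory afterwards with: rm -r "$tmpdir"
```

With this sample, only the header row and the Mastigobasidium row remain.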
0

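On macOS and the BSDs, sed's -i option requires a backup-suffix argument, so you need to pass it an empty string to edit the file in place without keeping a backup. (This is separate from the error Émilie encountered, which comes from an empty line in species.txt.) So the answer would be: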

while read -r line 
do
    echo "$line"
    sed -i "" "/$line/d" /my/path/ITS.OTU.table.tsv
done < /my/path/species.txt