0

I'm stuck on this script. I set a variable from the contents of a temp file with a single cat, and the script then performs the following operations.

PROJECT=$(cat temp/project_name_final) ;

#Bifidobacterium contigs selection

grep "Bifidobacterium" ${PROJECT}_genera.txt | gawk '{print $2}' > temp/${PROJECT}_Bif_genera ;
gawk '{print $3}' ${PROJECT}_species.txt > temp/${PROJECT}_Bif_species ;
grep -v -f temp/${PROJECT}_Bif_species temp/${PROJECT}_Bif_genera > temp/${PROJECT}_selected_Bif ;

The first grep works fine, and so does the awk command. The variable is expanded correctly in those filenames. The last grep seems to misbehave: the generated file appears to be named "_selected_Bif", so the variable doesn't seem to be expanded as before. I'm trying to understand why this happens.

TL;DR: None of the files used are empty. The expected file, named "${PROJECT}_selected_Bif", should not be empty either if grep works as expected.

Shred

3 Answers

1

I really can't see why the PROJECT variable is not getting expanded properly on your last line of code (unless you're looking in the wrong place for the generated file), but I do see that you don't double quote your variable expansions. Leaving them unquoted would cause issues as soon as $PROJECT contained spaces, newlines, or any other character that is special to the shell.

You are also jumping through a lot of hoops for something that could be done with a single awk program:

proj=$(<temp/project_name_final)

awk 'NR==FNR { species[$3]; next } /Bifidobacterium/ && !($2 in species) { print $2 }' \
    "${proj}_species.txt" "${proj}_genera.txt" >"temp/${proj}_selected_Bif"

This awk program reads the two files ${proj}_species.txt and ${proj}_genera.txt. While reading the first file, its third column is used to create a key in the associative array or hash species. When we then start reading the second file, we are only interested in lines that contain the string Bifidobacterium and whose second column is not a key in the species hash. For those lines, we output the second column.

All output goes to temp/${proj}_selected_Bif.

Note the double quoting of all expansions of the proj variable. I used a lower-case variable name since upper-case names are, by convention, reserved for system and shell environment variables.
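As a rough illustration of the awk approach above, here is a minimal sketch run against made-up data. The project name demo, the contig names, and the exact column layout of the two files are assumptions for the example, not taken from the question.

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory (names are assumptions).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir temp
proj='demo'

# Genera file: column 2 holds the contig name.
cat >"${proj}_genera.txt" <<'EOF'
Bifidobacterium contig_1
Bifidobacterium contig_2
Lactobacillus contig_3
Bifidobacterium contig_4
EOF

# Species file: column 3 holds the contig name.
cat >"${proj}_species.txt" <<'EOF'
Bifidobacterium longum contig_2
Lactobacillus casei contig_3
EOF

# First file: remember column 3. Second file: print column 2 of the
# Bifidobacterium lines whose contig was not seen in the first file.
awk 'NR==FNR { species[$3]; next }
     /Bifidobacterium/ && !($2 in species) { print $2 }' \
    "${proj}_species.txt" "${proj}_genera.txt" >"temp/${proj}_selected_Bif"

result=$(cat "temp/${proj}_selected_Bif")
printf '%s\n' "$result"
```

With this sample input, contig_2 is dropped because it appears in the species file, and contig_3 is dropped because its line does not mention Bifidobacterium.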


Kusalananda
  • Thanks for the awk solution, I'm still a beginner with bash. What I actually don't understand is why, after replacing only the last grep command and without any other trick like double quotes, the filename is printed correctly. What I've seen so far is that redirecting stdout into a file always works, even when stdout is empty. – Shred Jun 13 '18 at 10:16
  • @Shred Yes, any redirection with > creates the output file (or empties it if it exists) before the command is even started (if the output file is writable at all). – Kusalananda Jul 09 '18 at 07:45
1

Your initial file contains a line that ends with CR/LF. The CR is carried through as part of the $PROJECT variable, and as CR is a valid character in a filename, all the intermediate steps work correctly. (But the filenames are "wrong".)

The last output is also correct, but the CR in the filename is interpreted to force the cursor back to the beginning of the line, so all you see is _selected_Bif.

You can prove this by stripping the CR as you read the file contents.
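One way to check for the stray CR and strip it might look like the sketch below. The file content myproject and the scratch directory are assumptions for the demonstration; in the real script, the same tr filter would be applied to temp/project_name_final.

```shell
#!/bin/sh
# Hypothetical project-name file with a DOS-style CR/LF line ending.
workdir=$(mktemp -d)
printf 'myproject\r\n' >"$workdir/project_name_final"

# Command substitution strips the trailing newline but keeps the CR.
raw=$(cat "$workdir/project_name_final")

# Deleting every CR while reading gives the intended value.
clean=$(tr -d '\r' <"$workdir/project_name_final")

# od -c makes the otherwise invisible \r visible.
printf '%s' "$raw" | od -c
printf '%s' "$clean" | od -c

echo "raw length:   ${#raw}"
echo "clean length: ${#clean}"
```

If the raw value is one character longer than the clean one and od -c shows a \r, the variable carries a carriage return into every filename built from it.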

Chris Davies
  • ...and on this basis the question is a duplicate. But I can't find it. – Chris Davies Jun 12 '18 at 23:04
  • False. After changing the last grep instruction, the filename is printed correctly. – Shred Jun 13 '18 at 10:13
  • And I don't understand why only the last redirection of stdout would print the "wrong" filename if there's a CR/LF line ending. (False, I've double-checked now.) – Shred Jun 13 '18 at 10:19
0

I've solved it. The trouble here is that grep, without other options, isn't able to handle the lines without sorting them. So the output file is empty and, I don't know why, grep creates this empty file with the wrong filename. (Does anyone know why?)

So instead of

$ grep -v -f 

I've used

$ grep -F -x -v -f 
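For context on what the extra options change, here is a small sketch with made-up contig names (an assumption, not the data from the question). Without -F and -x, each line of the pattern file is a regular expression that may match anywhere in a line. With -F -x, each pattern is a fixed string that must match the whole line. This may explain a difference in output, but it is only a plausible reading of the behavior described above.

```shell
#!/bin/sh
# Hypothetical pattern and candidate files.
workdir=$(mktemp -d)
printf '%s\n' 'contig_1' >"$workdir/patterns"
printf '%s\n' 'contig_1' 'contig_10' 'contig_2' >"$workdir/candidates"

# Regex patterns, matched anywhere in the line:
# contig_10 contains contig_1, so it is excluded as well.
regex_result=$(grep -v -f "$workdir/patterns" "$workdir/candidates")

# Fixed strings that must match the entire line:
# only the exact line contig_1 is excluded.
exact_result=$(grep -F -x -v -f "$workdir/patterns" "$workdir/candidates")

printf 'regex, substring match:\n%s\n' "$regex_result"
printf 'fixed string, whole line:\n%s\n' "$exact_result"
```

Neither form depends on the input being sorted, and neither has any influence on the name of the output file.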
Shred
  • 1
    grep doesn't care if the lines are sorted; if you can show evidence of different behavior, that'd be good for a separate question (new questions don't go in Answers) – Jeff Schaller Jun 12 '18 at 11:23
  • 1
    grep does not create a file. When you use redirection, the shell creates the file. So the file name cannot possibly have anything to do with the options you pass to grep. If editing the file fixed it, maybe there was a non-ASCII character or a control character somewhere and you removed it without noticing. But without seeing your exact source, that's just speculation. – Gilles 'SO- stop being evil' Jun 12 '18 at 19:16
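The point made in the comments, that the shell sets up the redirection and creates the file regardless of what grep prints, can be checked with a minimal sketch like this one. The scratch directory and file name are assumptions for the demonstration.

```shell
#!/bin/sh
# Hypothetical output path in a scratch directory.
workdir=$(mktemp -d)
out="$workdir/empty_output"

# grep finds nothing in /dev/null, prints nothing, and exits with status 1.
# The shell has already created (or truncated) the target of the redirection.
status=0
grep 'no such text' /dev/null >"$out" || status=$?

echo "grep exit status: $status"
if [ -e "$out" ] && [ ! -s "$out" ]; then
    echo "file exists and is empty"
fi
```

The file name is fixed by the shell when it expands the redirection target, so the options passed to grep cannot change it.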