I have a series of directories, each containing a list.txt in the same format, and I want to collect results from all of them into a single file. I am trying to write a script that visits each directory, extracts one column from list.txt (without the surrounding text) using the grep/awk pipeline below, and writes the output from every file to the same output file.

    grep 'bar[0-9]' file.txt | awk '{print $1}'

I have attempted the following but I am not sure exactly where my loops in the script are going wrong.

#!/bin/bash
##Extract ligands from toplist and concatenate to file
for i in /home/ubuntu/Project/working/library_*/Results/list.txt
do
    grep 'bar[0-9]' i | awk '{print $1}' | cat ../output.txt i
done

The directory tree is as follows:

.
├── library_1-200
│   ├── Results
│   │   ├── complex
│   │   ├── sorted.txt
│   │   └── list.txt
│   ├── files
│   │   ├── output
│   │   └── txt
│   └── summary.txt
├── library_201-400
│   ├── Results
│   │   ├── complex
│   │   ├── sorted.txt
│   │   └── list.txt
│   ├── files
│   │   ├── output
│   │   └── txt
│   └── summary.txt
├── library_401-600
│   ├── Results
│   │   ├── complex
│   │   ├── sorted.txt
│   │   └── list.txt
│   ├── files
│   │   ├── output
│   │   └── txt
│   └── summary.txt
└── library_601-800
    ├── Results
    │   ├── complex
    │   ├── sorted.txt
    │   └── list.txt
    ├── files
    │   ├── output
    │   └── txt
    └── summary.txt

Sample of list.txt; I want only the Name values written to output.txt:

Name    Score
bar65    -7.8 
bar74    -7.5 
bar14    -7.5 
bar43    -7.4 
bar94    -7.4 
bar16    -7.4 
bar12    -7.3 
bar25    -7.3 
bar65    -7.3 
bar76    -7.3 
bar24    -7.3 
bar13    -7.3 
bar58    -7.2 
bar68    -7.2 
bar28    -7.2 

The solution was to use "$i" where I previously had just i, and to change the end of the pipeline to | cat >> ../output.txt

  • Assuming you mean to do something like grep 'bar[0-9]' "$i" | awk '{print $1}' | cat > "$i", see https://unix.stackexchange.com/a/425801/70524 – muru May 27 '19 at 05:07
  • Thanks @muru. That did the job of writing to an output file, however it printed the entirety of the list.txt files to the single output, seemingly ignoring the grep and awk commands. – proteinmodels May 27 '19 at 05:17
  • You probably meant ... | cat > ../output.txt, or without the unnecessary cat, just > ../output.txt. – RalfFriedl May 27 '19 at 05:27
  • Yeah now tried removing the "$i" at the end, and it worked. Thanks! – proteinmodels May 27 '19 at 05:28

2 Answers

You are passing the literal string i to grep. Use "$i" in the grep command instead.

Since you want all of the results in a single file, the last command in the pipeline should be:

cat >> /home/ubuntu/Project/working/output.txt

Or just:

>> /home/ubuntu/Project/working/output.txt
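
Putting both corrections together, the loop from the question might look like the sketch below. The function name extract_names and its two parameters are hypothetical, introduced only so that the paths are not hard-coded. Truncating the output file before the loop is also an assumption, made so that repeated runs do not accumulate duplicate entries.

```shell
#!/bin/bash
# extract_names BASE OUT (hypothetical helper, not from the original script):
# collect the first column of every "bar<digit>" line found in
# BASE/library_*/Results/list.txt and write the result to OUT.
extract_names() {
    local base=$1 out=$2
    : > "$out"                      # assumption: start from an empty output file
    for i in "$base"/library_*/Results/list.txt
    do
        [ -e "$i" ] || continue     # skip the unexpanded pattern if nothing matched
        grep 'bar[0-9]' "$i" | awk '{print $1}' >> "$out"
    done
}
```

It could then be called as extract_names /home/ubuntu/Project/working /home/ubuntu/Project/working/output.txt. Note that >> creates the file if it does not already exist.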
Prvt_Yadav
  • Thanks for that tip. It now runs and produces output, but it tells me that output.txt does not exist. Is there a command I should use instead of cat to create the output file and add to it? It is also printing the entirety of each list.txt rather than the specific column I am after. Does that have to do with the way I have written the for i in /*/list.txt? – proteinmodels May 27 '19 at 05:10
  • Please edit the question and provide a sample of your files. Also, create output.txt manually first; then it will work. – Prvt_Yadav May 27 '19 at 05:13
  • Added a sample of list.txt, and the comment by @muru taught me about redirection. – proteinmodels May 27 '19 at 05:22

Apart from correcting some small typos in your original code (using "$i" in place of i and redirecting the output to the output file rather than trying to output its contents), if you don't have many thousands of these list.txt files:

awk '/^bar[0-9]/ { print $1 }' /home/ubuntu/Project/working/library_*/Results/list.txt >output.txt

This uses awk to extract the first column of all lines that start with the string bar followed by a digit. It does this for all files matching the pattern /home/ubuntu/Project/working/library_*/Results/list.txt. The extracted data is redirected to output.txt.

The loop becomes necessary when the filename globbing pattern /home/ubuntu/Project/working/library_*/Results/list.txt expands to too many names:

for pathname in /home/ubuntu/Project/working/library_*/Results/list.txt; do
    awk '/^bar[0-9]/ { print $1 }' "$pathname"
done >output.txt

Note that it's more efficient to redirect the output of the loop than of each individual awk call. Also note that awk easily does the job of grep to detect the wanted lines and that cat is not needed.

If you need the first column from all lines except the first (as in your example data), you can change the condition in the awk code from /^bar[0-9]/ to FNR > 1.
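
As a minimal sketch of that variant: FNR is awk's per-file record counter, so FNR > 1 skips the header line of every input file rather than only the first one. The function name first_column_skip_header is hypothetical and exists only to make the example reusable.

```shell
#!/bin/bash
# first_column_skip_header FILE... (hypothetical helper):
# print the first column of every line except the first line of each file.
first_column_skip_header() {
    # FNR restarts at 1 for each input file, unlike NR
    awk 'FNR > 1 { print $1 }' "$@"
}
```

For example, first_column_skip_header library_*/Results/list.txt >output.txt, run from within /home/ubuntu/Project/working. Be aware that, unlike the /^bar[0-9]/ condition, this prints an empty line for any blank line in the input.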

Kusalananda
  • Thanks so much for the detailed reply. Super useful to learn and will definitely be using it in the future. I'm still finding my way around bash. How many times for the expansion would you allow before putting it into a loop? – proteinmodels May 27 '19 at 06:46
  • @proteinmodels You would get an error saying Argument list too long if the expansion became too long. "Too long" means literally that the expanded string consisting of all matching pathnames becomes too long, so it depends on the length of the path that you expand in combination with the number of library_* directories that you have. I would expect it to cope with a few thousand of your directories though, more if you use just library_*/Results/list.txt from within the /home/ubuntu/Project/working directory (because the pathnames would be shorter). – Kusalananda May 27 '19 at 06:49
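
The limit discussed in the comment above can be inspected with getconf ARG_MAX, which reports the maximum combined size in bytes of the arguments and environment passed to a new process. As an alternative to the loop (this is not from the answers above, only a possible sketch), find can batch the pathnames itself with -exec ... {} +, so that no single awk invocation exceeds the limit. The function name extract_with_find is hypothetical, and find does not guarantee any particular output order.

```shell
#!/bin/bash
# Show the system limit on argument list size (in bytes)
getconf ARG_MAX

# extract_with_find BASE (hypothetical helper):
# let find pass the matching list.txt files to awk in batches
# that stay below the argument-length limit.
extract_with_find() {
    local base=$1
    find "$base" -path '*/library_*/Results/list.txt' -type f \
        -exec awk '/^bar[0-9]/ { print $1 }' {} +
}
```

It could be called as extract_with_find /home/ubuntu/Project/working >output.txt.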