
I have been using a for loop to run a pipeline on multiple files, but unfortunately the terminal froze halfway through. I would like to run the pipeline again, but to save time I want to skip the directories that already have their output files. Basically, I want to nest an if statement in the loop: if the output file exists, skip that directory; if not, run the pipeline. Is this possible?

for f in /Volumes/My\ Passport/Documents/Projects/untitled\ folder\ 2/untitled\ folder\ 3/untitled\ folder\ 2/untitled\ folder/*/*_1.fastq; do
subdir="${f%/*}"
pushd "$subdir" &>/dev/null
file1="${f##*/}"
file2="${file1%_1.fastq}_2.fastq"
# Inside double quotes a backslash before a space is kept literally, so the paths are written without it
adapter="/Volumes/My Passport/Documents/adapters.fa"
reference="/Volumes/My Passport/Documents/ucsc_hg19/ucsc.hg19.fasta"
dbSNP="/Volumes/My Passport/Documents/ucsc_hg19/dbsnp_138.hg19"
COSMIC="/Volumes/My Passport/Documents/ucsc_hg19/CosmicCodingMuts.vcf"
interval="/Volumes/My Passport/Documents/plist.bed"
sjdb="/Volumes/My Passport/Documents/ucsc_hg19/ucsc.hg19.gtf"
file3="${file1%_1.fastq}_1_trimmed.fastq"
file4="${file2%_2.fastq}_2_trimmed.fastq"

#preQC (cutadapt -O subtracted, prinseq -min_qual_score 4 -ns_max_p 2 subtracted)
~/Desktop/UTSW/Applications/bbmap/bbduk.sh -Xmx120g in1="${file1}" in2="${file2}" out1="${file1%_1.fastq}_1_trimmed.fastq" out2="${file2%_2.fastq}_2_trimmed.fastq" ref="${adapter}" trimq=10

paste - - - - < "${file3}" | sort -k1,1 -t " " | tr "\t" "\n" > "${file3%_1_trimmed.fastq}_trimmed_sorted_1.fastq"
paste - - - - < "${file4}" | sort -k1,1 -t " " | tr "\t" "\n" > "${file4%_2_trimmed.fastq}_trimmed_sorted_2.fastq"

parallel -j "$PARALLEL_TASKS" perl ~/UTSW/Applications/prinseq-lite-0.20.4/prinseq-lite.pl -fastq "${file3%_1_trimmed.fastq}_trimmed_sorted_1.fastq" -fastq2 "${file4%_2_trimmed.fastq}_trimmed_sorted_2.fastq" -no_qual_header -trim_right 1 -custom_params "A 75%;T 75%;G 75%;C 75%" -min_qual_mean 25 -min_len 40 -out_format 3 -out_good "${f%.*}_QC" -out_bad null -log

popd &>/dev/null
done
ozarka

2 Answers

4

I'm not sure where in the loop to put the test, but both the [ shell command and the [[ bash keyword offer file tests that can be used this way:

for f in ...
do
    if [[ ! -e "$f" ]]
    then
        # do work here because file $f does not exist
        :   # placeholder: bash rejects a then-block that contains only a comment
    fi
done

That's just an example. I'm not sure what work your loop body does, so the test may belong elsewhere in the loop body, and it should probably check an output file rather than $f itself.
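Applied to the loop in the question, the test would most naturally look at the last file the pipeline writes for a sample, since $f comes from the glob and always exists. A minimal sketch under stated assumptions: the final output name `<input-without-extension>_QC_1.fastq` is a guess derived from the `-out_good "${f%.*}_QC"` option, and `needs_run`, `run_pipeline` and `process_all` are hypothetical helpers, not part of the original script.

```shell
# Succeed (exit 0) when the pipeline still has to run for input file $1.
# ASSUMPTION: the last file written per sample is "<input-without-ext>_QC_1.fastq";
# change the suffix to whatever your run really produces.
needs_run() {
    # -s is true only for a file that exists AND is non-empty,
    # so a zero-byte file left behind by a crash is processed again.
    [ ! -s "${1%.*}_QC_1.fastq" ]
}

# Hypothetical stand-in for the bbduk / sort / prinseq steps.
run_pipeline() {
    echo "would process: $1"
}

# Loop over all inputs, skipping the ones whose output is already there.
process_all() {
    for f in "$@"; do
        if needs_run "$f"; then
            run_pipeline "$f"
        else
            echo "skipping (output exists): $f"
        fi
    done
}
```

It could be called as `process_all /path/to/samples/*/*_1.fastq`. Note that a non-empty but truncated output from the run that froze would still be skipped, so it may be worth deleting the outputs of the sample that was in progress at the time.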

2

In general, the best way to test whether you can open a file - whether it be for input or output - is simply to try to open it.

More specifically for output: if you want to open a file for writing only when doing so creates a new file, POSIX shells offer the noclobber option, enabled with set -C (or set -o noclobber). With it set, a > redirection to an existing regular file fails, so the redirection itself tests whether the output already exists before you proceed.
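As a small illustration of that behaviour, here is a sketch of a helper that tries to create an output file under noclobber and reports whether it succeeded. The name `claim_output` is hypothetical; the function body runs in a subshell so that `set -C` does not leak into the calling script.

```shell
# Try to create "$1" for writing; succeed only if it did not already exist.
# The ( ... ) body is a subshell, so noclobber stays local to this function.
claim_output() (
    set -C
    # With noclobber set, ">" refuses to overwrite an existing regular file.
    # The outer 2>/dev/null hides the shell's "cannot overwrite" message.
    { : > "$1"; } 2>/dev/null
)

# Possible use inside a loop (out is a hypothetical variable):
#   if claim_output "$out"; then
#       ...run the step that writes "$out"...
#   else
#       echo "skipping, $out already exists"
#   fi
```

One design consequence is that a successful call leaves an empty file behind, so an interrupted run still looks "done" to the next run. Noclobber also only protects regular files; redirecting to something like /dev/null keeps working.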

Trying to make sense of your current script is a little difficult, but maybe consider:

set -C -- '/Volumes/My Passport/Documents/Projects/untitled folder 2/untitled folder 3/untitled folder 2/untitled folder/'*/*_1.fastq
[ -e "$1" ] &&
for f
do    if    cd -- "${f%/*}" &&
            f=${f##*/} f=${f%1*}
      then  if    command exec \
                      3> "$f"1_trimmed.fastq \
                      4> "$f"2_trimmed.fastq \
                      5> "$f"trimmed_sorted_1.fastq \
                      6> "$f"trimmed_sorted_2.fastq
            then  ~/Desktop/UTSW/Applications/bbmap/bbduk.sh -Xmx120g in1="$f"1.fastq in2="$f"2.fastq out1=/dev/fd/3 out2=/dev/fd/4 ref="${adapter}" trimq=10 &&
                  paste - - - - < "$f"1_trimmed.fastq | sort ... | tr >&5 ... &&
                  paste - - - - < "$f"2_trimmed.fastq | sort ... | tr >&6 ... 
            fi
       fi
 done
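The script above leans on three parameter expansions to split each path into a directory, a file name, and a stem ending in an underscore. A short sketch with a made-up path (the `/data/run 1/sampleA_1.fastq` name is purely illustrative) shows what each one yields:

```shell
f='/data/run 1/sampleA_1.fastq'   # hypothetical input path

dir=${f%/*}       # remove shortest suffix matching "/*" -> directory part
base=${f##*/}     # remove longest prefix matching "*/"  -> file name only
stem=${base%1*}   # remove shortest suffix matching "1*" -> name up to the last "1"

printf '%s\n' "$dir" "$base" "$stem" "${stem}2.fastq"
# prints:
#   /data/run 1
#   sampleA_1.fastq
#   sampleA_
#   sampleA_2.fastq
```

Because the stem keeps its trailing underscore, the mate file is simply `"$stem"2.fastq`. The `%1*` pattern cuts at the last "1" in the name, so it relies on the extension itself not containing that digit.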