I have a folder containing the following files:

ls bams-lab/*.name-sorted.fixmate.sorted.dedup.sam
bams-lab/OZBenth2_.fastp.fq.gz.name-sorted.fixmate.sorted.dedup.sam  
...
bams-lab/OZBenth7_.fastp.fq.gz.name-sorted.fixmate.sorted.dedup.sam

I tried to create a list of these files with the bash script below:

#!/bin/bash
# usage: sh merge_sam_pbs.sh /path/to/*.name-sorted.fixmate.sorted.dedup.sam 
output=$(dirname $1)
samlist=$(for sam in $1; do echo "I=$sam "; done)
cat << EOF  |cat #qsub
#!/bin/bash -l
#PBS -N merge
#PBS -l walltime=150:00:00
#PBS -j oe
#PBS -l mem=70G
#PBS -l ncpus=2
#PBS -M m.lorenc@qut.edu.au

cd \$PBS_O_WORKDIR

conda activate picard
echo $samlist

picard -Xmx10g  MergeSamFiles \
      $samlist \
      O=${output}/merged.sorted.dedup.bam

EOF

but it only picks up one file:

> sh merge_sam_pbs.sh bams-lab/*.name-sorted.fixmate.sorted.dedup.sam 
#!/bin/bash -l
#PBS -N merge
#PBS -l walltime=150:00:00
#PBS -j oe
#PBS -l mem=70G
#PBS -l ncpus=2
#PBS -M m.lorenc@qut.edu.au

cd $PBS_O_WORKDIR

conda activate picard
echo I=bams-lab/OZBenth2_.fastp.fq.gz.name-sorted.fixmate.sorted.dedup.sam 

picard -Xmx10g  MergeSamFiles       I=bams-lab/OZBenth2_.fastp.fq.gz.name-sorted.fixmate.sorted.dedup.sam        O=bams-lab/merged.sorted.dedup.bam

What did I miss?

Rui F Ribeiro

2 Answers

It picks up one file only, because $1 is just one file.

The * is expanded by the calling shell before your script starts, so your call

sh merge_sam_pbs.sh bams-lab/*.name-sorted.fixmate.sorted.dedup.sam 

is actually run as something like

sh merge_sam_pbs.sh "bams-lab/1.name-sorted.fixmate.sorted.dedup.sam" "bams-lab/2.name-sorted.fixmate.sorted.dedup.sam" "bams-lab/3.name-sorted.fixmate.sorted.dedup.sam"

with $1 then being "bams-lab/1.name-sorted.fixmate.sorted.dedup.sam".
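
To see the expansion in action, here is a minimal sketch. The temporary directory, the file names and the count_args function are made up for illustration; count_args simply stands in for the script and reports what it received.

```shell
#!/bin/bash
# Hypothetical demo: directory and file names are invented for illustration.
tmp=$(mktemp -d)
touch "$tmp/a.sam" "$tmp/b.sam" "$tmp/c.sam"

# count_args stands in for the script: it reports the number of
# arguments and the base name of the first one.
count_args() {
    echo "count=$# first=${1##*/}"
}

# The calling shell expands the glob, so three arguments arrive.
unquoted=$(count_args "$tmp"/*.sam)

# Quoting the pattern suppresses expansion, so one literal argument arrives.
quoted=$(count_args "$tmp/*.sam")

echo "$unquoted"
echo "$quoted"
rm -rf "$tmp"
```

The first call reports three arguments with a.sam as the first, which is exactly the situation your script is in: $1 only ever holds the first expanded name.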


You want to use "$@", which expands to all of the arguments, in the for loop:

samlist=$(for sam in "$@"; do echo "I=$sam "; done)

or, better, replace the for loop with printf:

samlist=$(printf 'I=%s\n' "$@")

or, even better for your use case, add quotes and separate the entries with a space instead of a newline:

samlist=$(printf 'I="%s" ' "$@")
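
As a rough sketch of why the quoted variant suits a generated script: printf repeats its format for every argument, and an unquoted here-document pastes $samlist in as plain text, so the embedded double quotes end up in the job script. The file names below are hypothetical (one contains a space on purpose) and the command line is only built as text, not executed.

```shell
#!/bin/bash
# Hypothetical file names, one containing a space, for illustration only.
set -- "bams-lab/a.sam" "bams-lab/b c.sam"

# printf reuses the format string until all arguments are consumed.
samlist=$(printf 'I="%s" ' "$@")

# An unquoted here-document expands $samlist as plain text, so the
# double quotes become part of the generated command line.
generated=$(cat << EOF
picard MergeSamFiles $samlist O=merged.bam
EOF
)

echo "$generated"
```

Note that this only protects against whitespace. A file name containing a double quote, a dollar sign or a backtick would still break the generated line; in bash, printf '%q ' is one way to cover those cases too.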
pLumo

You've declared the script as being a bash shell script so I'm going to assume that's what you're intending to use. (Don't run it with sh script, though; use bash script instead. They can be different shells.)

You can replace the samlist string with an array that holds one element per file. This:

#!/bin/bash
# usage: sh merge_sam_pbs.sh /path/to/*.name-sorted.fixmate.sorted.dedup.sam 
output=$(dirname $1)
samlist=$(for sam in $1; do echo "I=$sam "; done)

Becomes

#!/bin/bash
# usage: bash merge_sam_pbs.sh /path/to/*.name-sorted.fixmate.sorted.dedup.sam

# Derive the output directory from the first filename passed to the script
output="${1%/*}"

# For all the filenames passed to the script, prefix with 'I=', and add to array
samlist=()
for sam in "$@"
do
    samlist+=("I=$sam")
done

And now you can use the array you've created. So instead of this

picard -Xmx10g  MergeSamFiles \
      $samlist \
      O=${output}/merged.sorted.dedup.bam

You can use this

picard -Xmx10g  MergeSamFiles "${samlist[@]}" O="$output/merged.sorted.dedup.bam"

Notice that I've quoted all of the variables where I've used them. This stops the shell from splitting the values on whitespace and expanding any wildcard characters in them. Furthermore, if "${samlist[@]}" contains no elements, it simply disappears rather than becoming an empty argument. Take a look at "Why does my shell script choke on whitespace or other special characters?" for more details.
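
One caveat if the picard line stays inside the unquoted here-document from the question: there, double quotes are literal characters and no word splitting takes place, so the array is joined into a single quoted string. A minimal sketch of the difference, using hypothetical file names and assuming bash (printf '%q' is a bash extension):

```shell
#!/bin/bash
# Hypothetical file names, one containing a space, for illustration only.
set -- "bams-lab/a.sam" "bams-lab/b c.sam"

samlist=()
for sam in "$@"; do
    samlist+=("I=$sam")
done

# Inside a here-document the quotes are literal and the array elements
# are joined with spaces, giving one quoted string for all files.
naive=$(cat << EOF
"${samlist[@]}"
EOF
)

# Shell-quote each element separately before it goes into the here-document,
# so the generated script sees one argument per file.
quoted_list=$(printf '%q ' "${samlist[@]}")
escaped=$(cat << EOF
$quoted_list
EOF
)

echo "$naive"
echo "$escaped"
```

When the generated text is parsed again by a shell, the first form yields a single argument and the second yields one argument per file. If picard is run directly by the script rather than through a here-document, "${samlist[@]}" works as shown above.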

Chris Davies