
I want to write a bash script (split.sh) that iterates over multiple directories whose names share the same prefix, and then runs a command on specific files within them. I am almost there:

#!/bin/bash
path="/mypath/MAP-9-[0-9][0-9][0-9]"

for filename in $path/*bam; do
    [ -e "$filename" ] || continue
    echo $filename
    for chrom in $(seq 1 22) X Y
    do
        samtools view -bh $filename $chrom > $path/$chrom.bam
        samtools index > $path/$chrom.bam;
    done
done

However, I get many messages of this kind: "split.sh: line 12: /mypath/MAP-9-[0-9][0-9][0-9]/6.bam: No such file or directory"

The problem is that the script is not recognizing the "[0-9][0-9][0-9]" regex part of the pathname. I also tried adding escape characters to the square brackets without success. It must be a very simple solution, but I am not able to solve it.
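The error can be reproduced without any BAM data. The sketch below assumes a hypothetical scratch layout in a temporary directory that mimics the tree shown next. It shows that the same pattern stored in $path does expand in the `for` word list, where matching files exist, but not in the redirection target, where nothing named 6.bam exists yet.

```shell
#!/usr/bin/env bash
# Hypothetical scratch layout in a temporary directory, mimicking the tree below.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/MAP-9-001" "$root/MAP-9-003"
touch "$root/MAP-9-001/MAP-9-001.bam" "$root/MAP-9-003/MAP-9-003.bam"

path="$root/MAP-9-[0-9][0-9][0-9]"

# In a word list the unquoted pattern is matched against existing files,
# so it yields one entry per matching .bam file.
inputs=( $path/*bam )
printf 'input: %s\n' "${inputs[@]}"

# In the redirection there is no existing file named 6.bam, so the pattern
# cannot match anything and the literal ".../MAP-9-[0-9][0-9][0-9]/6.bam"
# is used. No directory is literally called "MAP-9-[0-9][0-9][0-9]", so it fails.
redirect_failed=no
if ! { echo test > $path/6.bam; } 2>/dev/null; then
    redirect_failed=yes
fi
echo "redirect failed: $redirect_failed"
```

In other words, the pattern is not "remembered" per loop iteration; each place it appears is expanded independently against whatever files exist at that moment.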

This is an extract of the tree command's output:

|-- [[
|-- MAP-9-001
|   |-- MAP-9-001.bam
|   `-- MAP-9-001.bam.bai
|-- MAP-9-003
|   |-- MAP-9-003.bam
|   `-- MAP-9-003.bam.bai
|-- MAP-9-005
|   |-- MAP-9-095.bam
|   `-- MAP-9-095.bam.bai
|-- split.sh

2 Answers


Don't confuse globs with regular expressions: what you are using here is a glob.

Globs are shell patterns that can be used for matching strings or expanding pathnames:

[[ $name = Bob* ]]
rm *.txt

See http://mywiki.wooledge.org/glob
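A minimal sketch of the difference, using a directory name from the question. In a glob, [0-9] matches exactly one digit; regular-expression syntax such as {3} only works with bash's =~ operator and is plain literal text inside a glob.

```shell
#!/usr/bin/env bash
name="MAP-9-001"

# Glob: each [0-9] matches exactly one digit.
if [[ $name = MAP-9-[0-9][0-9][0-9] ]]; then glob_match=yes; else glob_match=no; fi

# Extended regular expression via =~ (left unquoted so it is treated as a regex):
# {3} means "the previous item three times".
if [[ $name =~ ^MAP-9-[0-9]{3}$ ]]; then regex_match=yes; else regex_match=no; fi

# The same {3} inside a glob is literal text, so this does not match.
if [[ $name = MAP-9-[0-9]{3} ]]; then glob_with_regex_syntax=yes; else glob_with_regex_syntax=no; fi

echo "glob: $glob_match, regex: $regex_match, regex syntax in glob: $glob_with_regex_syntax"
```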

A corrected version of your script:

#!/bin/bash

for filename in /mypath/MAP-9-[0-9][0-9][0-9]/*bam; do
    [[ -e $filename ]] || continue
    echo "$filename"
    for chrom in {1..22} X Y; do
        samtools view -bh "$filename" "$chrom" > "$(dirname "$filename")/$chrom.bam"
        samtools index "$(dirname "$filename")/$chrom.bam"
    done
done
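To check which output paths the corrected loop would write, it can be dry-run. This sketch assumes a hypothetical scratch layout and replaces samtools with a hypothetical shell function that only records its arguments, so no BAM data or samtools installation is needed.

```shell
#!/usr/bin/env bash
# Hypothetical scratch layout standing in for /mypath.
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/MAP-9-001" "$root/MAP-9-003"
touch "$root/MAP-9-001/MAP-9-001.bam" "$root/MAP-9-003/MAP-9-003.bam"

# ASSUMPTION: stub that records each call instead of running the real samtools.
calls=()
samtools() { calls+=("samtools $*"); }

for filename in "$root"/MAP-9-[0-9][0-9][0-9]/*bam; do
    [[ -e $filename ]] || continue
    for chrom in {1..22} X Y; do
        # Each output goes into the directory of the current input file.
        samtools view -bh "$filename" "$chrom" > "$(dirname "$filename")/$chrom.bam"
        samtools index "$(dirname "$filename")/$chrom.bam"
    done
done

printf '%s\n' "${calls[@]:0:2}"
echo "total samtools calls: ${#calls[@]}"
```

One thing the dry run makes visible: the per-chromosome outputs (1.bam, X.bam, ...) land in the same directories and also match *bam, so running the script a second time would treat them as inputs too.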

Learn how to quote properly in the shell; it's very important:

"Double quote" every literal that contains spaces/metacharacters and every expansion: "$var", "$(command "$var")", "${array[@]}", "a & b". Use 'single quotes' for code or literal $'s: 'Costs $5 US', ssh host 'echo "$HOSTNAME"'. See
http://mywiki.wooledge.org/Quotes
http://mywiki.wooledge.org/Arguments
http://wiki.bash-hackers.org/syntax/words
When is double-quoting necessary?
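A small sketch of why quoting expansions matters, assuming a hypothetical file name that contains a space. Unquoted, the expansion is split into separate words; quoted, it stays a single argument.

```shell
#!/usr/bin/env bash
file="my sample.bam"          # hypothetical name containing a space

# Prints how many arguments it received.
count_args() { echo "$#"; }

unquoted=$(count_args $file)   # word splitting turns it into 2 arguments
quoted=$(count_args "$file")   # stays 1 argument, as intended
echo "unquoted: $unquoted, quoted: $quoted"
```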


[[ is a bash keyword similar to (but more powerful than) the [ command. See http://mywiki.wooledge.org/BashFAQ/031 and http://mywiki.wooledge.org/BashGuide/TestsAndConditionals. Unless you're writing for POSIX sh, I recommend [[.
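One concrete difference, shown as a minimal sketch: [ is an ordinary command, so an unquoted empty variable simply vanishes from its argument list, while [[ is parsed by the shell and performs no word splitting on its operands.

```shell
#!/usr/bin/env bash
empty=""

# [ receives only "-n" and "]": a one-argument test of whether the string
# "-n" is non-empty, which is true.
if [ -n $empty ]; then single=true; else single=false; fi

# [[ sees the (empty) value of $empty as one operand, so -n is false.
if [[ -n $empty ]]; then double=true; else double=false; fi

echo "[ : $single, [[ : $double"
```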


/mypath/MAP-9-[0-9][0-9][0-9]/*.bam is a shell glob, or filename expansion expression. It expands to a list of matching files - you can use that to iterate over your input files, but you can't expect it to work as a "per iteration" wildcard to generate the corresponding output files. I think what you probably want instead is to generate each output file from the corresponding loop variable $filename, as follows:

#!/bin/bash

shopt -s nullglob

for filename in /mypath/MAP-9-[0-9][0-9][0-9]/*.bam; do
    [ -e "$filename" ] || continue
    echo "$filename"
    for chrom in {1..22} X Y; do
        samtools view -bh "$filename" "$chrom" > "${filename%/*}/${chrom}.bam"
        samtools index "${filename%/*}/${chrom}.bam"
    done
done
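The script enables shopt -s nullglob. A minimal sketch of what that option changes, assuming an empty scratch directory so that the pattern matches nothing: by default the unmatched pattern is kept as one literal word, while with nullglob it expands to no words at all, so the loop body never runs and the [ -e ] guard becomes mostly redundant.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)               # empty scratch directory: nothing can match
trap 'rm -rf "$dir"' EXIT

# Default behaviour: an unmatched glob is left as the literal pattern.
shopt -u nullglob
default=( "$dir"/MAP-9-[0-9][0-9][0-9]/*.bam )

# With nullglob, an unmatched glob expands to nothing.
shopt -s nullglob
with_nullglob=( "$dir"/MAP-9-[0-9][0-9][0-9]/*.bam )
shopt -u nullglob

echo "default: ${#default[@]} word(s): ${default[0]}"
echo "nullglob: ${#with_nullglob[@]} word(s)"
```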

The shell parameter expansion ${filename%/*} expands to the value of $filename with the shortest trailing substring matching the pattern /* removed; it thus gives you the directory name of each input file, to which you can then append /$chrom.bam to form each output file in turn.
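A short sketch of this expansion and its relatives, using an example path in the question's layout. % removes the shortest matching suffix, %% the longest, and ## the longest matching prefix.

```shell
#!/usr/bin/env bash
filename="/mypath/MAP-9-001/MAP-9-001.bam"
chrom=6

dir=${filename%/*}        # shortest suffix matching /* removed: the directory
longest=${filename%%/*}   # longest match starts at the first /, so this is empty
base=${filename##*/}      # longest prefix matching */ removed: the base name
out="$dir/$chrom.bam"     # output file next to the input

printf '%s\n' "$dir" "$base" "$out" "$(dirname "$filename")"
```

Unlike dirname, ${filename%/*} returns the string unchanged when it contains no slash (dirname prints "." instead), which is fine here because the glob always produces full paths.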
