
I am using EOF to generate bash scripts that run Rscript. In the Rscript call I use basename to specify the output file name.

When I use EOF to generate a list of bash scripts, I cannot get basename to work. The error message is shown below. The bash scripts are still generated, but ${AF} turns into a blank in both places where it appears. Very strange!

I tested the bash script on its own and it works, so I know the problem is somewhere between EOF and basename.

How can I use basename with EOF? Or are there any alternative methods? Thank you.

for letter in {A..Z}
      do cat <<- EOF > batch_${letter}.sh
    #!/bin/bash
    module load R/3.5.1
    R_func="/home/dir/R_func"
    TREAT="/home/dir/POP"
    BASE="/home/dir/base"
    OUTPUT="/home/dir/tmp"

    for AF in ${BASE}/${letter}*.txt_step3; do
    Rscript ${R_func}P_tools.R \
    --ptool ${R_func}/P_tools_linux \
    --group ${AF} \
    --treat ${TREAT}/pop_exclude24dup \
    --out ${OUTPUT}/OUT_$(basename ${AF%%_txt_step3})_noregress \
    --binary-target F; done

    EOF
       done

This is the error message

basename: missing operand
Try 'basename --help' for more information.

Molly_K
  • 161

2 Answers

3

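Variables and command substitutions like your $(basename ...) are expanded inside a here-document when the delimiter is not quoted. AF is not set in the shell that generates the scripts, so ${AF%%_txt_step3} expands to nothing and basename runs without an argument, which is where the "missing operand" error comes from. You should escape the $ of $(basename ...) and any $ inside it, as well as the $ of every other variable that should appear literally in the generated script.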

Corrected version of your script:

for letter in {A..Z}
do
	cat <<- EOF > batch_${letter}.sh
	#!/bin/bash
	module load R/3.5.1
	R_func="/home/dir/R_func"
	TREAT="/home/dir/POP"
	BASE="/home/dir/base"
	OUTPUT="/home/dir/tmp"

	for AF in \${BASE}/${letter}*.txt_step3; do
	Rscript \${R_func}P_tools.R \
	--ptool \${R_func}/P_tools_linux \
	--group \${AF} \
	--treat \${TREAT}/pop_exclude24dup \
	--out \${OUTPUT}/OUT_\$(basename \${AF%%_txt_step3})_noregress \
	--binary-target F; done
	EOF
done

The here-document body and the closing EOF must be indented with tabs, not spaces: in any POSIX shell, <<- strips only leading tabs from the here-document lines and from the line holding the delimiter. Web pages tend to turn tabs into spaces, so if you copy the script from here, check the indentation before running it.

  • I actually like your previous version better! haha! Thank you @unclebilly, it works very well and your explanation is great. – Molly_K Mar 26 '19 at 21:50
  • It's still wrong, the $ from R_func should be quoted, too. Fixed now. –  Mar 26 '19 at 21:51
  • Yes, it was not fully corrected at the time, but enough info was given, so I learned from you that I needed to add the backslash to variables. Thank you again. Very helpful! – Molly_K Mar 26 '19 at 21:58
3

The << EOF...EOF construct is called a here-document, and you can put whatever string you like as the delimiter, but EOF is common.

The issue you're facing is that the here-document acts like a double-quoted string, so the variables and the command substitution in it are expanded when the here-document is read; they're not stored as-is in the resulting file. This is probably not what you want, since you set, e.g., R_func in the batch_x.sh you're writing, but ${R_func} would expand to whatever value R_func has in the generating script.

You can prevent this by quoting the here-doc delimiter, i.e. use cat << 'EOF' instead. However, this prevents expansion of all variables, so you can't have one expanded and the others not without creating the file in parts or using an unquoted delimiter and escaping all but one variable, as in Uncle Billy's answer.
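
To make the difference concrete, here is a minimal sketch. The variable name and the path passed to basename are made up for the demo; only the quoting behaviour matters.

```shell
#!/bin/bash
# Hypothetical demo: 'name' exists only in the generating shell.
name="generator"

# Unquoted delimiter: $name is expanded right away, \$name stays literal.
unquoted=$(cat << EOF
expanded: $name
escaped: \$name
EOF
)

# Quoted delimiter: nothing is expanded, the text is stored as-is.
quoted=$(cat << 'EOF'
literal: $name $(basename /tmp/x.txt)
EOF
)

printf '%s\n' "$unquoted" "$quoted"
```

With the unquoted delimiter, the first line ends up with the generator's value and only the escaped variable survives as text; with 'EOF' everything survives, including the command substitution.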


Now, if I understood your idea correctly, you want to create 26 scripts with each letter hard-coded in one of them. The first script you create (batch_A.sh) then looks something like this:

module load R/3.5.1
R_func="/home/dir/R_func"
....
for AF in ${BASE}/A*.txt_step3; do
   Rscript ${R_func}P_tools.R \
   ...
done

Instead, you could create just one script, as below, and then pass the letter to it as a command line argument. The first command line argument is available as "$1":

#!/bin/bash
module load R/3.5.1
R_func="/home/dir/R_func"
TREAT="/home/dir/POP"
BASE="/home/dir/base"
OUTPUT="/home/dir/tmp"
letter=$1

for AF in ${BASE}/${letter}*.txt_step3; do
    Rscript "${R_func}P_tools.R" \
    --ptool "${R_func}/P_tools_linux" \
    --group "${AF}" \
    --treat "${TREAT}/pop_exclude24dup" \
    --out "${OUTPUT}/OUT_$(basename "${AF%%_txt_step3}")_noregress" \
    --binary-target F;
done

The variable letter would now be taken from the command line, so you could run batch.sh A to process the A files, etc.
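
As a rough sketch of how that single script could be driven, the snippet below uses a stand-in batch.sh that only prints the letter it received (an assumption for the demo, so that it runs without R or the real data). It calls the script once per letter in a loop, and then does the same through xargs -P to run two at a time; the letters A, B and C and the job count of 2 are arbitrary.

```shell
#!/bin/bash
# Stand-in for batch.sh (hypothetical): reports its argument instead of calling Rscript.
workdir=$(mktemp -d)
cat << 'EOF' > "$workdir/batch.sh"
#!/bin/bash
letter=$1
echo "processing files starting with ${letter}"
EOF

# Sequential: one invocation per letter.
results=$(for letter in A B C; do bash "$workdir/batch.sh" "$letter"; done)

# Parallel: xargs starts up to 2 invocations at a time, one letter each.
# Output order is not guaranteed, hence the sort.
parallel=$(printf '%s\n' A B C | xargs -P 2 -n 1 bash "$workdir/batch.sh" | sort)

printf '%s\n' "$results"
rm -rf "$workdir"
```

For the real thing you would use {A..Z} and the actual batch.sh; the generated batch_A.sh ... batch_Z.sh files are then not needed at all.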

ilkkachu
  • 138,973
  • Thank you @ilkkachu, great answer and works well!! – Molly_K Mar 26 '19 at 21:50
  • Hi @ilkkachu, I wonder if you could let me know whether there's a way around the "stored as-is" behaviour when using cat << 'EOF'. More specifically, I want ${letter} expanded both in the bash script filename and within the script. – Molly_K Mar 27 '19 at 15:14
  • You mentioned creating "one" script and passing the letter to it as a command line argument. Could you elaborate on that? My original thought was to split the files by first letter and run the groups in parallel. However, if there's a way to process the files individually and run the scripts in parallel, that would be even better, and I should work on that. – Molly_K Mar 27 '19 at 15:15
  • @Molly_K, I tried to adapt your script to picking $letter from the command line. For parallel processing, you could write the script so that it processes just one file, and then run xargs -P or use GNU parallel to run a bunch of them at a time. – ilkkachu Mar 27 '19 at 19:14
  • There's probably a dozen or so Q&A's about running scripts in parallel here on the site, see e.g. https://unix.stackexchange.com/q/169326/170373 and https://unix.stackexchange.com/q/211976/170373 and the search https://unix.stackexchange.com/search?tab=votes&q=run%20script%20parallel – ilkkachu Mar 27 '19 at 19:16
  • @Molly_K a better way to replace some variable in a here-document is to do it the autoconf way: quote the delimiter so you don't have to escape any $, and then pipe the heredoc to sed, replacing @@var@@ with the value of $var: sed <<-'EOT' >out "s|@@var@@|$var|g"; ... lines containing @@var@@ ... EOT (replace the | with a char that doesn't appear in the value of $var). But, as ilkkachu has said, it may be that you don't really need to autogenerate any scripts. –  Mar 28 '19 at 05:35
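
A possible sketch of that sed approach, applied to the per-letter scripts: the delimiter is quoted, so no $ needs escaping, and only the @@letter@@ placeholder is filled in. The temporary output directory, the two-letter loop and the echo standing in for the Rscript call are assumptions made for the demo.

```shell
#!/bin/bash
outdir=$(mktemp -d)   # hypothetical output directory for the demo

for letter in A B; do
    # sed reads the quoted here-document and replaces only the placeholder.
    sed "s|@@letter@@|$letter|g" << 'EOT' > "$outdir/batch_${letter}.sh"
#!/bin/bash
BASE="/home/dir/base"
for AF in "${BASE}"/@@letter@@*.txt_step3; do
    # placeholder for the Rscript call from the question
    echo "OUT_$(basename "${AF%%_txt_step3}")_noregress"
done
EOT
done

cat "$outdir/batch_A.sh"
```

This gives ${letter} in both the file name (expanded by the loop) and the script body (substituted by sed), while ${BASE}, ${AF} and $(basename ...) are written out untouched.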