Unable to understand the following shell script

Question

I am trying to run a demo example from a tutorial on running code on a cluster. The example is below, but I am unable to understand most of its statements:

#BSUB -L /bin/bash
#BSUB -J "MNIST_DDL"
#BSUB -o "MNIST_DDL.%J"
#BSUB -n 12
#BSUB -R "span[ptile=4]"
#BSUB -gpu "num=2"
#BSUB -q "normal"
#BSUB -W 00:10
ml wml_anaconda3
conda activate <your environment>
# Workaround for GPU selection issue
cat > launch.sh << EoF_l
#! /bin/sh
export CUDA_VISIBLE_DEVICES=0,1
exec \$*
EoF_l
chmod +x launch.sh
# Run the program
export PAMI_IBV_ADAPTER_AFFINITY=0
ddlrun ./launch.sh python /path/to/your_program.py
# Clean up
/bin/rm -f launch.sh

I can understand the initial #BSUB lines: they describe the resource allocation and the metadata of the job. But I really cannot follow these lines:

# Workaround for GPU selection issue
cat > launch.sh << EoF_l
#! /bin/sh
export CUDA_VISIBLE_DEVICES=0,1
exec \$*
EoF_l
chmod +x launch.sh

Thank You.

The code cat > launch.sh << EoF_l... creates a file launch.sh containing the 3 lines of code before EoF_l and makes it executable. (I don't know how it is related to a "GPU selection issue".) — Bodo, Feb 22 '21 at 16:48
That's an awfully roundabout way of setting CUDA_VISIBLE_DEVICES=0,1 in the environment for the Python code. You could probably just do ddlrun env CUDA_VISIBLE_DEVICES=0,1 python /path/to/your_program.py and delete that here-document, or export it just before, as is done for PAMI_IBV_ADAPTER_AFFINITY (unless ddlrun cleans the environment). — Kusalananda, Feb 22 '21 at 17:06
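The two alternatives mentioned in this comment can be tried without a cluster. The sketch below is only an illustration: it uses a plain sh -c child as a stand-in for ddlrun and the Python program, and the variable names via_env, in_parent and via_export are made up for the demo. Whether ddlrun passes the environment on to the processes it starts is not established in this thread, so treat that as an open assumption.

```shell
#!/bin/sh
# Stand-in demo: a child "sh -c" plays the role of the real program.
# ddlrun is NOT used here; its handling of the environment is an assumption.
unset CUDA_VISIBLE_DEVICES

# Option 1: env sets the variable only for the one command it launches.
via_env=$(env CUDA_VISIBLE_DEVICES=0,1 sh -c 'echo "$CUDA_VISIBLE_DEVICES"')

# The calling shell itself is unchanged by option 1.
in_parent=${CUDA_VISIBLE_DEVICES:-unset}

# Option 2: export makes the variable visible to every later child process,
# the same way the script already handles PAMI_IBV_ADAPTER_AFFINITY.
export CUDA_VISIBLE_DEVICES=0,1
via_export=$(sh -c 'echo "$CUDA_VISIBLE_DEVICES"')

echo "env: $via_env, parent before export: $in_parent, export: $via_export"
```

If the launcher does forward the environment, either form would make launch.sh unnecessary; if it does not, the wrapper script is one way to set the variable inside each started process.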
Your script is using a heredoc to create a text file called launch.sh containing 3 lines of commands and then making it an executable shell script with chmod +x launch.sh — fpmurphy, Feb 22 '21 at 17:07
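What the heredoc writes can be checked directly. The following is a minimal sketch that runs in a throwaway directory; printenv stands in for the real program, and wrong.sh is a hypothetical file added only to show what happens without the backslash. Because the delimiter EoF_l is unquoted, the shell expands $* while writing the file, so \$* is what puts a literal $* into launch.sh.

```shell
#!/bin/sh
start_dir=$PWD
workdir=$(mktemp -d)          # throwaway directory for the demo
cd "$workdir" || exit 1

# Same heredoc as in the question: \$* is written to the file as a literal $*.
cat > launch.sh << EoF_l
#! /bin/sh
export CUDA_VISIBLE_DEVICES=0,1
exec \$*
EoF_l
chmod +x launch.sh

last_line=$(tail -n 1 launch.sh)
echo "last line of launch.sh: $last_line"

# Run the wrapper with a harmless stand-in command instead of python.
seen=$(./launch.sh printenv CUDA_VISIBLE_DEVICES)
echo "child saw: $seen"

# Without the backslash, $* is expanded when the file is created,
# using the positional parameters of the shell that writes it.
set -- creation time
cat > wrong.sh << EoF_l
exec $*
EoF_l
wrong_line=$(cat wrong.sh)
echo "wrong.sh contains: $wrong_line"

cd "$start_dir" || exit 1
rm -rf "$workdir"
```

So launch.sh is a small wrapper: it sets CUDA_VISIBLE_DEVICES and then replaces itself, via exec, with whatever command line it was given.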
Thanks everyone for your input. I have another question: where will this launch.sh be created? In the same directory as the above file? If I remove the rm -f launch.sh line, it is created in the parent directory, where the above file is stored. — Beginner, Feb 22 '21 at 17:42
@Kusalananda Could you please elaborate on why it is not a good way? Also, I didn't understand how my CUDA devices were being set before. — Beginner, Feb 22 '21 at 17:43
The launch.sh file will be created (or at least the script will attempt to create it) in the current directory, i.e. the directory you run this script from. Kusalananda is saying that creating launch.sh, running it, then deleting it, is unneeded complexity. Why not just perform the task of the launch.sh script directly in the parent script? — spuck, Feb 22 '21 at 17:48
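The point about the current directory can be verified with a small experiment. In this sketch, job.sh, script_dir and run_dir are hypothetical names: the script is stored in one temporary directory and started from another, and the relative path launch.sh ends up in the directory it was started from. Which directory a batch job starts in depends on how the scheduler is configured, so that part is not shown here.

```shell
#!/bin/sh
start_dir=$PWD
script_dir=$(mktemp -d)   # hypothetical: where the job script is stored
run_dir=$(mktemp -d)      # hypothetical: where the script is started from

# A tiny script that creates launch.sh through a relative path.
# The outer delimiter is quoted, so the body is written out unchanged.
cat > "$script_dir/job.sh" << 'EOF'
#!/bin/sh
cat > launch.sh << EoF_l
#! /bin/sh
exec \$*
EoF_l
EOF

# Start the script from a different directory than the one it lives in.
cd "$run_dir" || exit 1
sh "$script_dir/job.sh"

if [ -f "$run_dir/launch.sh" ]; then in_run_dir=yes; else in_run_dir=no; fi
if [ -f "$script_dir/launch.sh" ]; then in_script_dir=yes; else in_script_dir=no; fi
echo "created in run_dir: $in_run_dir, created in script_dir: $in_script_dir"

cd "$start_dir" || exit 1
rm -rf "$script_dir" "$run_dir"
```

Writing the file to an explicit path, or creating it inside a directory from mktemp -d, would avoid depending on the working directory at all.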
Does this answer your question? How does cat > file << "END" work? — AdminBee, Feb 23 '21 at 13:14
@AdminBee Thank you for this. I have a doubt, and I am quoting from the above answer: "stop interpreting the stream data as commands and pass them on to the stdin of the command you are going to execute". In my case, we turn the heredoc content into an executable script by doing chmod +x launch.sh. How is that different from not using the heredoc at all and simply writing export CUDA_VISIBLE_DEVICES=0,1 in the above script? According to my understanding, the above script will be treated as a command, and commands are executable. (Let me know if my thinking is correct.) — Beginner, Feb 25 '21 at 18:07
