I am trying to run a demo example from a tutorial on running code on a cluster computer. The example is below, but I am unable to understand most of the statements:
#BSUB -L /bin/bash
#BSUB -J "MNIST_DDL"
#BSUB -o "MNIST_DDL.%J"
#BSUB -n 12
#BSUB -R "span[ptile=4]"
#BSUB -gpu "num=2"
#BSUB -q "normal"
#BSUB -W 00:10
ml wml_anaconda3
conda activate <your environment>
# Workaround for GPU selection issue
cat > launch.sh << EoF_l
#! /bin/sh
export CUDA_VISIBLE_DEVICES=0,1
exec \$*
EoF_l
chmod +x launch.sh
# Run the program
export PAMI_IBV_ADAPTER_AFFINITY=0
ddlrun ./launch.sh python /path/to/your_program.py
# Clean up
/bin/rm -f launch.sh
I can understand the initial #BSUB-tagged lines; they describe the allocation of resources and the metadata about the code.
But I really cannot follow these lines:
# Workaround for GPU selection issue
cat > launch.sh << EoF_l
#! /bin/sh
export CUDA_VISIBLE_DEVICES=0,1
exec \$*
EoF_l
chmod +x launch.sh
Thank You.
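
The lines in question can be tried in isolation, away from the cluster. What follows is only a minimal sketch, assuming a POSIX shell and a scratch directory made with mktemp; printenv stands in for the Python program purely for illustration. It writes the wrapper with the same here-document, captures what ended up in the file, and then runs a command through the wrapper.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing is left behind.
# (mktemp -d is widely available but not strictly POSIX.)
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Same here-document as in the job script. The delimiter EoF_l is unquoted,
# so the shell expands variables while writing the file. The backslash in
# \$* keeps $* literal, so it is expanded later, when launch.sh itself runs.
cat > launch.sh << EoF_l
#! /bin/sh
export CUDA_VISIBLE_DEVICES=0,1
exec \$*
EoF_l
chmod +x launch.sh

# Show the file that was created: three lines, the last one a plain "exec $*".
file_contents=$(cat launch.sh)
printf '%s\n' "$file_contents"

# Run a command through the wrapper. printenv is a stand-in for the Python
# program: it prints the value the wrapped command sees.
wrapper_output=$(./launch.sh printenv CUDA_VISIBLE_DEVICES)
printf 'wrapped command sees: %s\n' "$wrapper_output"

# Clean up.
cd / && rm -rf "$workdir"
```

In this sketch the wrapper does nothing more than set CUDA_VISIBLE_DEVICES and then replace itself (exec) with whatever command line it was given, so the command starts with that variable already in its environment.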
cat > launch.sh << EoF_l ... creates a file launch.sh with the 3 lines of code before EoF_l and makes it executable. (I don't know how it is related to a "GPU selection issue".) – Bodo Feb 22 '21 at 16:48
[...] CUDA_VISIBLE_DEVICES=0,1 in the environment for the Python code. You could probably just do ddlrun env CUDA_VISIBLE_DEVICES=0,1 python /path/to/your_program.py and delete that here-document, or export it just before, as is done for PAMI_IBV_ADAPTER_AFFINITY (unless ddlrun cleans the environment). – Kusalananda Feb 22 '21 at 17:06
[...] heredoc to create a text file called launch.sh containing 3 lines of commands and then making it an executable shell script with chmod +x launch.sh – fpmurphy Feb 22 '21 at 17:07
[...] launch.sh be created, in the same directory as the above file? Because if I remove the rm -f launch.sh then it is being created in the parent directory, where the above file is stored – Beginner Feb 22 '21 at 17:42
[...] chmod +x launch.sh. How is it different from the case where we haven't used the heredoc and have simply written export CUDA_VISIBLE_DEVICES=0,1 in the above script? According to my understanding, the above script will be treated as a command, and commands are executable. (Let me know if my thinking is correct.) – Beginner Feb 25 '21 at 18:07
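
Two points raised in the comments can be checked without a cluster. The sketch below assumes a POSIX shell; sh -c stands in for ddlrun and the Python program, so it says nothing about whether ddlrun itself passes the environment through, which stays an open question here. It compares a plain export in the script with the env prefix, and it shows that a file written under a relative name such as launch.sh lands in the shell's current working directory, which need not be the directory that holds the script.

```shell
#!/bin/sh
# A stand-in for the real program: it only reports what it sees.
show='echo "devices=${CUDA_VISIBLE_DEVICES:-unset}"'

# 1. No setting at all (cleared first in case the caller has it set).
unset CUDA_VISIBLE_DEVICES
plain=$(sh -c "$show")

# 2. Set for a single command with env; the current shell stays unchanged.
with_env=$(env CUDA_VISIBLE_DEVICES=0,1 sh -c "$show")
after_env=$(sh -c "$show")

# 3. Exported in the script: every child started afterwards inherits it.
export CUDA_VISIBLE_DEVICES=0,1
with_export=$(sh -c "$show")

printf '%s\n' "plain:       $plain" \
              "with env:    $with_env" \
              "after env:   $after_env" \
              "with export: $with_export"

# 4. A relative file name is resolved against the current working directory.
#    (mktemp -d is widely available but not strictly POSIX.)
workdir=$(mktemp -d)
cd "$workdir" || exit 1
: > launch.sh
if [ -f "$workdir/launch.sh" ]; then
    created_in_cwd=yes
else
    created_in_cwd=no
fi
printf 'launch.sh created in current directory: %s\n' "$created_in_cwd"

# Clean up.
cd / && rm -rf "$workdir"
```

For a command started directly by the script, export and the wrapper have the same effect. The wrapper may still matter if the launcher starts processes in a fresh or remote environment that does not inherit exported variables; whether ddlrun does so is not established in this thread.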