The command

DASqv -v -H$H -c$cov $db $i | grep Recommend - | sed "s|Recommend ||g" - | sed "s|'||g" -

by itself produces DAStrim -g20 -b25.

My goal is to append $db and $i to that result with awk '{print $1 " " $2 " "$3 " $db $i"}' and redirect the output of the whole pipeline to a file with > $(basename $i .las).DAStrim.

Unfortunately, the result is only bananaDB ./bananaDB.100.las rather than DAStrim -g20 -b25 bananaDB ./bananaDB.100.las when I use the following code:

#!/bin/bash

db=bananaDB
H=6973
cov=38

for i in $(find . -type f -name "*.*.las");
do
  #cat <<EOF
  qsub <<EOF

#!/bin/bash -l

#PBS -N DASqv
#PBS -l walltime=48:00:00
#PBS -j oe
#PBS -l mem=1G
#PBS -l ncpus=1
#PBS -M m.lorenc@qut.edu.au
##PBS -m bea

cd \$PBS_O_WORKDIR

source activate thegenemyers


DASqv -v -H$H -c$cov $db $i | grep Recommend - | sed "s|Recommend ||g" - | sed "s|'||g" - | awk '{print $1 " " $2 " "$3 " $db $i"}' > $(basename $i .las).DAStrim

EOF

done

UPDATE

DASqv -v -H$H -c$cov $db $i

produced:

DASqv -c38 bananaDB ./bananaDB.100.las

Input:   16,450reads,   210,758,575 bases (another 9,934 were < H-length)

Histogram of q-values (average 10 best)

                 Input                 QV

    50:    1494189    0.2%       380302   18.0%

    49:     364713    0.0%          484    0.0%
    48:     545846    0.1%          423    0.1%
    47:     650479    0.2%          466    0.1%
    46:     835282    0.3%          548    0.1%
    45:    1054589    0.4%          648    0.1%
    44:    1299423    0.5%          775    0.2%
    43:    1644281    0.7%          895    0.2%
    42:    2036915    0.9%         1193    0.3%
    41:    2571126    1.2%         1334    0.4%
    40:    3518594    1.5%         1647    0.5%
    39:    3641660    1.9%         2046    0.6%
    38:    5026473    2.4%         2291    0.7%
    37:    6243982    3.1%         2708    0.9%
    36:    7600704    3.9%         3301    1.1%
    35:    9313754    4.9%         4002    1.3%
    34:   11257936    6.0%         4676    1.6%
    33:   13508338    7.5%         5544    1.9%
    32:   15981847    9.1%         6552    2.3%
    31:   18648809   11.1%         7771    2.7%
    30:   22290239   13.4%         9124    3.3%
    29:   25083448   16.0%        10624    3.9%
    28:   29566164   19.1%        12874    4.6%
    27:   33339712   22.6%        15482    5.5%
    26:   37891335   26.6%        18869    6.6%
    25:   44146531   31.2%        23307    7.9%
    24:   44948068   35.9%        28142    9.5%
    23:   50951224   41.3%        33590   11.5%
    22:   55009718   47.1%        42157   13.9%
    21:   57456151   53.1%        52181   16.9%
    20:   60635065   59.4%        63207   20.6%
    19:   58423422   65.6%        76426   25.0%
    18:   58472922   71.7%        91565   30.2%
    17:   55127848   77.5%       107289   36.4%
    16:   50395382   82.7%       123758   43.6%
    15:   43893354   87.3%       136465   51.4%
    14:   36509552   91.2%       145632   59.8%
    13:   28654550   94.2%       145540   68.2%
    12:   21245809   96.4%       138232   76.2%
    11:   14560980   97.9%       121403   83.2%
    10:    9345155   98.9%        98071   88.8%
     9:    5395169   99.5%        73996   93.1%
     8:    2894210   99.8%        52246   96.1%
     7:    1335673   99.9%        33845   98.0%
     6:     581470  100.0%        19476   99.2%
     5:     201756  100.0%         9367   99.7%
     4:      76322  100.0%         3760   99.9%
     3:      18979  100.0%         1082  100.0%
     2:       4751  100.0%          264  100.0%
     1:        456  100.0%           41  100.0%
     0:       2686  100.0%           38  100.0%

  Recommend 'DAStrim -g20 -b25'

What did I miss?

Thank you in advance.

  • I suspect it might have to do with the $db and $i variables in your awk command. What about: awk -v db="$db" -v i="$i" '{print $1,$2,$3,$db,$i}'? – jesse_b Feb 01 '18 at 23:14
  • Also I don't think $(basename $i .las) is correct, not that it would be your issue but it should just be $(basename "$i") – jesse_b Feb 01 '18 at 23:20
  • awk: cmd. line:1: {print ,,,bananaDB,./bananaDB.100.las} awk: cmd. line:1: ^ syntax error awk: cmd. line:1: {print ,,,bananaDB,./bananaDB.100.las} awk: cmd. line:1: ^ syntax error awk: cmd. line:1: {print ,,,bananaDB,./bananaDB.100.las} awk: cmd. line:1: ^ syntax error awk: cmd. line:1: {print ,,,bananaDB,./bananaDB.100.las} awk: cmd. line:1: ^ syntax error awk: cmd. line:1: {print ,,,bananaDB,./bananaDB.100.las} awk: cmd. line:1: ^ unterminated regexp awk: cmd. line:1: {print ,,,bananaDB,./bananaDB.100.las} – user977828 Feb 01 '18 at 23:23
  • sorry I'm not sure. – jesse_b Feb 01 '18 at 23:32
  • You are making this far more complicated than it needs to be. What is the output of DASqv -v -H$H -c$cov $db $i before it is piped into grep|sed|sed|awk? Do not reply in a comment, edit your question and add the requested information there. – cas Feb 01 '18 at 23:42
  • added the output – user977828 Feb 01 '18 at 23:54
  • do you want the submitted job to just output DAStrim -g20 -b25 bananadb ./bananaDB.100.las to ./bananaDB.100.DAStrim or to actually run that command and redirect the output to ./bananaDB.100.DAStrim? The script in my answer assumes the former but can easily be changed to do the latter. – cas Feb 02 '18 at 00:44
  • only to create the command and not running it. – user977828 Feb 02 '18 at 00:46
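The symptom in the question can be reproduced without qsub or DASqv. The following is a minimal sketch, assuming cat as a stand-in for qsub and made-up variable names (unescaped, escaped), of how an unquoted heredoc delimiter lets the submitting shell expand $1, $2 and $3 before awk ever sees them:

```shell
#!/bin/sh
# Demonstration only: cat stands in for qsub so the generated text can be inspected.
db=bananaDB
i=./bananaDB.100.las

# Clear the positional parameters so that $1, $2 and $3 are empty,
# as they are in a script started without arguments.
set --

# Unquoted delimiter (EOF): the submitting shell expands $1, $2, $3
# as well as $db and $i, so awk's field references vanish.
unescaped=$(cat <<EOF
awk '{print $1 " " $2 " " $3 " $db $i"}'
EOF
)

# Backslash-escaping keeps \$1, \$2 and \$3 literal for awk, while
# $db and $i are still filled in by the submitting shell.
escaped=$(cat <<EOF
awk '{print \$1 " " \$2 " " \$3 " $db $i"}'
EOF
)

printf '%s\n' "$unescaped" "$escaped"
```

The first line printed has no field references left, which matches the output that contains only bananaDB ./bananaDB.100.las. This is the same mechanism that makes \$PBS_O_WORKDIR in the question's script survive until the job runs.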

1 Answer


You're making things more difficult than they need to be, and running into white-space and quoting issues. Try something like the following:

Step 1: create a standalone script that does what you want with one or more of your data file(s), given the appropriate args and filename(s) on the command line.

#!/bin/sh

# use the first 3 arguments for the values to pass to DASqv
db="$1"
H="$2"
cov="$3"

# use shift to get rid of them once we have them in variables, ...
shift 3

# ... so we can loop over the remaining filenames (1 or more) on the command line
for filename in "$@" ; do
  outfile="$(basename "$filename" .las).DAStrim"
  qsub <<EOF
#!/bin/bash -l

#PBS -N DASqv
#PBS -l walltime=48:00:00
#PBS -j oe
#PBS -l mem=30G
#PBS -l ncpus=1
#PBS -M m.lorenc@qut.edu.au
##PBS -m bea

cd "\$PBS_O_WORKDIR"

source activate thegenemyers

DASqv -v -H"$H" -c"$cov" "$db" "$filename" | 
  sed -n -e '/Recommend/ {
               s/Recommend //;
               s/\x27//g;
               s:$: "$db" "$filename":;
               p
             }' > "$outfile"

EOF

done

(The sed script in the middle of that could be all on one line, but the extra line-feeds and indentation make it more readable without changing what it does or how it runs. Also, note the use of \x27 to remove all single-quote characters: 0x27 is the hexadecimal code for the ASCII single-quote character.)
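To see what the sed part does on its own, here is a small sketch that feeds it the Recommend line from the DASqv output in the question. The variable names sample and result are made up for the demo, the \x27 escape assumes GNU sed, and the script is double-quoted here so that this shell fills in $db and $filename (in the job script the heredoc does that instead). The literal double quotes around the two values are left out to keep the quoting simple.

```shell
#!/bin/sh
# Hypothetical sample values; the input line is copied from the DASqv output.
db=bananaDB
filename=./bananaDB.100.las
sample="  Recommend 'DAStrim -g20 -b25'"

# Same edits as in the job script: drop "Recommend ", strip single quotes
# (\x27, assuming GNU sed), then append db and filename at end of line.
# The last substitution uses : as its delimiter so that the / characters
# in the filename do not terminate the s command early.
result=$(printf '%s\n' "$sample" |
  sed -n -e "/Recommend/ {
               s/Recommend //
               s/\x27//g
               s:\$: $db $filename:
               p
             }")

printf '%s\n' "$result"
```

This prints DAStrim -g20 -b25 bananaDB ./bananaDB.100.las, preceded by the two spaces of indentation that DASqv puts in front of Recommend, since none of the substitutions touch them.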

Save it as e.g. submit-jobs.sh and make it executable with chmod +x submit-jobs.sh.

Step 2: Test it

Test that the script does what you want by manually using it to submit jobs. e.g. run:

/path/to/submit-jobs.sh bananaDB 6973 38 /path/to/somefile.las

Modify the script if necessary until it does exactly what you want.

Step 3: Now use find to submit multiple jobs using the script:

find . -type f -name '*.las' -exec /path/to/submit-jobs.sh bananaDB 6973 38 {} +

Step 4: (optional) turn step 3 into a script that you can run with different arguments to save you from having to type the find ... command every time you want another run with slightly different values. e.g.

#!/bin/sh
find . -type f -name '*.las' -exec /path/to/submit-jobs.sh "$1" "$2" "$3" {} +

If you saved this as find-and-submit.sh and made it executable with chmod +x, you would run it as:

find-and-submit.sh bananaDB 6973 38

This step 4 script could even have a for loop for the variables so that, for example, it submitted jobs for $cov values from 35 to 45 instead of requiring $cov to be one of the arguments.
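One possible shape for such a loop is sketched below. The function name submit_for_cov_range and its argument order are invented for illustration; the only thing taken from the steps above is the find ... -exec ... {} + call.

```shell
#!/bin/sh
# Hypothetical helper: submit one batch of jobs per coverage value.
# Arguments: db, H, first cov, last cov, path to submit-jobs.sh
submit_for_cov_range() {
  db="$1"; H="$2"; cov="$3"; last="$4"; submit="$5"
  while [ "$cov" -le "$last" ]; do
    # Same find call as in step 3, with cov supplied by the loop.
    find . -type f -name '*.las' -exec "$submit" "$db" "$H" "$cov" {} +
    cov=$((cov + 1))
  done
}

# Example call (commented out so nothing is submitted by accident):
# submit_for_cov_range bananaDB 6973 35 45 /path/to/submit-jobs.sh
```

Note that submit-jobs.sh as written derives the output name from the .las filename only, so jobs for different $cov values would write to the same .DAStrim file unless the value of $cov is also worked into outfile.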

cas
  • I got sed: -e expression #1, char 112: unknown option to `s' – user977828 Feb 02 '18 at 00:55
  • BTW, one other common problem when submitting jobs to PBS or Torque or slurm etc is that the environment when the job is executed may be quite different to the command line environment when you submitted the job. e.g. if the activate file sourced into the submitted script isn't in $PBS_O_WORKDIR then the job will fail. – cas Feb 02 '18 at 00:56
  • found what the problem is. my tests were on files in the current dir, so there were no / characters in any filename. When there are / characters (e.g. when used with find), that breaks the sed command which adds the filename to the DAStrim command. I'll fix it by using : as the delimiter on that line. – cas Feb 02 '18 at 01:20
  • What does shift 3 mean? – user977828 Feb 02 '18 at 01:54
  • shift dumps the first positional parameter and then renumbers the parameters remaining so that what was $2 is now $1, $3 is now $2, etc. shift 3 dumps the first three parameters. try help shift in bash, or man bash and search for shift [n] for more details. – cas Feb 02 '18 at 01:57
  • BTW, i've slightly improved the script. Works the same, but should be easier to read/understand. The sed part of the script now uses \x27, the hexadecimal ASCII code for a single-quote. This allows the entire sed script to be single-quoted, eliminating the need to backslash-escape the double-quotes. – cas Feb 02 '18 at 02:11
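
The effect of shift 3 described in the comments can be checked with a few lines. This is only a sketch: set -- is used to simulate the command line of submit-jobs.sh, and a.las, b.las, remaining and first are placeholder names.

```shell
#!/bin/sh
# Simulate the command line: submit-jobs.sh bananaDB 6973 38 a.las b.las
set -- bananaDB 6973 38 a.las b.las

# Save the first three arguments before discarding them.
db="$1"; H="$2"; cov="$3"
echo "before shift: $# arguments, first is $1"

# Drop the first three parameters; what was $4 becomes $1, and so on.
shift 3
remaining=$#
first=$1
echo "after shift 3: $remaining arguments, first is $first"

# Only the filenames are left in "$@".
for filename in "$@"; do
  echo "would process $filename with db=$db H=$H cov=$cov"
done
```

Before the shift there are five arguments and the first is bananaDB; afterwards there are two and the first is a.las, which is why the for loop in submit-jobs.sh only ever sees filenames.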