xargs version:
To do this, you need to split each line of the input file on white-space (which is xargs
' default behaviour), and then get xargs
to run the sh
script once for each word, using -n 1
. Alternatively, you could get the shell script to loop over "$@".
You can't use -I {}
here because that causes xargs
to read the input file one line at a time. Even if you set the delimiter to a space with -d ' '
, you'll get errors at the end of every input line, when it begins reading the next input line.
Fortunately, you don't need to use -I {}
at all - you're already just appending echo word of the input to the end of the sh
command line as a separate argument, which is xargs
' default behaviour without -I.
In the shell script, you refer to the argument by its positional parameter (i.e. as $1
), as you would in any shell script. You can use $1
as often as you like within the shell script.
You also need to provide argument 0 (an arbitrary name for the sh process - the string "sh" is convenient - but you can use any name if you want it to be easy to find in ps
) for the sh -c '...script...'
command. It has to be the first argument after the '...script...'
argument, and will be $0
inside the shell script.
So, you need to do something like this (without a for loop):
xargs -n 1 sh -c \
'./hisat2-2.1.0/hisat2 -p 8 --dta -x ./indexes/chrX_tran \
-1 "./samples/$1_chrX_1.fastq.gz" -2 "./samples/$1_chrX_2.fastq.gz" \
-S "./map/$1_chrX.sam"' sh < ./samples.txt
or (with a for loop):
xargs sh -c '
for f in "$@"; do \
./hisat2-2.1.0/hisat2 -p 8 --dta -x ./indexes/chrX_tran \
-1 "./samples/${f}_chrX_1.fastq.gz" -2 "./samples/${f}_chrX_2.fastq.gz" \
-S "./map/${f}_chrX.sam"
done' sh < ./samples.txt
The for loop version will be faster because it doesn't need to execute sh
multiple times (once for each "word").
It runs sh
as few times as possible, with the limit being the shell's maximum command-line length (about 2MB on modern systems). Unless samples.txt
is extremely large (over 200,000 entries) that means it will run sh
only once.
shell while-read loop:
xargs isn't needed for a job like this. The following will work in bash, and probably other Bourne-like shells that support arrays and the -a
option to read
.
while read -a words; do
for f in "${words[@]}"; do
./hisat2-2.1.0/hisat2 -p 8 --dta -x ./indexes/chrX_tran \
-1 "./samples/${f}_chrX_1.fastq.gz" \
-2 "./samples/${f}_chrX_2.fastq.gz" \
-S "./map/${f}_chrX.sam"
done
done < samples.txt
This reads each word of each input line into a bash array, then (using a for
loop iterating over each word in the array), runs the hisat2 program with the appropriate args.
However, see: Why is using a shell loop to process text considered bad practice?
awk version:
awk '{
for (i=1;i<=NF;i++) {
printf "./hisat2-2.1.0/hisat2 -p 8 --dta -x ./indexes/chrX_tran -1 \"./samples/%s_chrX_1.fastq.gz\" -2 \"./samples/%s_chrX_2.fastq.gz\" -S \"./map/%s_chrX.sam\"\n", $i, $i,$i;
}
}' ./samples.txt | sh
Note that this pipes the output of awk in to sh to be excecuted. Without that pipe into sh, the output looks like:
./hisat2-2.1.0/hisat2 -p 8 --dta -x ./indexes/chrX_tran -1 "./samples/ERR199044_chrX_1.fastq.gz" -2 "./samples/ERR199044_chrX_2.fastq.gz" -S "./map/ERR199044_chrX.sam"
./hisat2-2.1.0/hisat2 -p 8 --dta -x ./indexes/chrX_tran -1 "./samples/ERR188104_chrX_1.fastq.gz" -2 "./samples/ERR188104_chrX_2.fastq.gz" -S "./map/ERR188104_chrX.sam"
./hisat2-2.1.0/hisat2 -p 8 --dta -x ./indexes/chrX_tran -1 "./samples/ERR188234_chrX_1.fastq.gz" -2 "./samples/ERR188234_chrX_2.fastq.gz" -S "./map/ERR188234_chrX.sam"
./hisat2-2.1.0/hisat2 -p 8 --dta -x ./indexes/chrX_tran -1 "./samples/ERR188245_chrX_1.fastq.gz" -2 "./samples/ERR188245_chrX_2.fastq.gz" -S "./map/ERR188245_chrX.sam"
This will also run quickly, as the sh script executes each line as the awk script prints it.
Alternatively, you could use sprintf
to put the command line into a variable, rather than printf
to stdout. Then you could use awk
's system()
function to execute it directly, similar to the perl example below:
perl version:
perl -lane '
foreach $f (@F) {
system(qw(./hisat2-2.1.0/hisat2 -p 8 --dta -x ./indexes/chrX_tran),
-1, "./samples/${f}_chrX_1.fastq.gz",
-2, "./samples/${f}_chrX_2.fastq.gz",
"-S", "./map/${f}_chrX.sam");
};' ./samples.txt
This uses perl
's system()
function so it executes the commands directly, without needing to be piped into sh
.
For a test-run, add the word echo
and a space immediately after the qw(
quote-operator.
BTW, it would be easier to add code to check whether the files existed, and/or test whether each run of ./hisat2-2.1.0/hisat2
was successful or post-process the output in this perl version than in the shell or awk versions - especially if it was written as a stand-alone script rather than a one-liner. For example:
#!/usr/bin/perl -w
use strict;
while(<>) {
foreach my $f (split) {
my $f1 = "./samples/${f}_chrX_1.fastq.gz";
my $f2 = "./samples/${f}_chrX_2.fastq.gz";
my $sam = "./map/${f}_chrX.sam";
if (!(-r $f1 && -r $f2 && -r $sam)) {
warn "Missing or unreadable file for $f\n";
next
};
my $rc = system(
qw(echo ./hisat2-2.1.0/hisat2 -p 8 --dta -x ./indexes/chrX_tran),
-1, $f1, -2, $f2, '-S', $sam
);
if ($rc) {
warn "hisat2 returned non-zero exit code for $f: $rc\n";
};
}
}
xargs
, so we can see what you are trying to achieve. Possibly the biggest issue is that-I {}
implies-L 1
which is likely not what you want if your input is space-separated – steeldriver Sep 05 '19 at 00:48