I got this script to work against a large file (>500 MB) made up of many lines that follow this scheme:
odd lines: >BLA_BLA lenght_XX cov.XX
even lines: AGCAGCAGACTCAGACTACAGAT # on even lines there's a DNA sequence
Its job is to recalculate the value after "cov." using parameters passed as arguments and replace the old value with the new one, and to calculate the percentage of "G" and "C" in the DNA sequence on the even lines.
So, output looks like:
> BLA_BLA lenght_XX
> nucleotide_cov: XX
> DNA seq (the same as on the even lines)
> GC_CONT: XX
Here's the code (only the loop):
K=$(($READLENGHT - $KMER + 1))
Y=$(echo "scale=4; $K / $READLENGHT" | bc)
while read odd; do
    echo -n "${odd##}" | cut -d "_" -f 1,2,3,4 && printf "nucleotide_cov: "
    echo "scale=4;${odd##*_} / $Y" | bc
    read even
    echo "${even##}" &&
    ACOUNT=$(echo "${even##}" | sed -e "s/./&\n /g" | grep -c "A")
    GCOUNT=$(echo "${even##}" | sed -e "s/./&\n /g" | grep -c "G")
    CCOUNT=$(echo "${even##}" | sed -e "s/./&\n /g" | grep -c "C")
    TCOUNT=$(echo "${even##}" | sed -e "s/./&\n /g" | grep -c "T")
    TOTALBASES=$(($ACOUNT+$GCOUNT+$CCOUNT+$TCOUNT))
    GCCONT=$(($GCOUNT+$CCOUNT))
    printf "GC_CONT: "
    echo "scale=2;$GCCONT / $TOTALBASES *100" | bc
done < "$1"
It is incredibly slow when run against a huge text file (bigger than 500 MB) on a 16-core server. Any ideas on how to speed this script up?
EDIT
As requested, the desired I/O is provided via pastebin: https://pastebin.com/FY0Z7kUW
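For reference, the per-record arithmetic could be sketched in a single awk pass, roughly as below. This is only a sketch and has not been benchmarked. It assumes headers laid out like `>NODE_1_length_500_cov_7` (inferred from the `cut -d "_"` and `${odd##*_}` calls in the loop), and the `READLENGHT` and `KMER` values are placeholders. The function name `recalc_record` is made up for illustration. awk rounds where `bc` truncates, so the last digits may differ from the loop's output.

```shell
#!/usr/bin/env bash
# Placeholder values; in the real script these come from the arguments.
READLENGHT=100
KMER=51

# Hypothetical helper: reads header/sequence line pairs on stdin and
# prints the four output lines for each pair.
recalc_record() {
    awk -v rl="$READLENGHT" -v k="$KMER" '
        NR % 2 == 1 {
            # Odd line: header. Assumed layout: >NODE_1_length_500_cov_7
            y = (rl - k + 1) / rl                  # same ratio as Y in the loop
            n = split($0, f, "_")
            printf "%s_%s_%s_%s\n", f[1], f[2], f[3], f[4]
            printf "nucleotide_cov: %.4f\n", f[n] / y
            next
        }
        {
            # Even line: DNA sequence, printed unchanged.
            print
            seq = $0
            gc = gsub(/[GC]/, "", seq)             # gsub returns the number of matches
            at = gsub(/[AT]/, "", seq)
            if (gc + at > 0)
                printf "GC_CONT: %.2f\n", 100 * gc / (gc + at)
            else
                print "GC_CONT: NA"                # no A/C/G/T on this line
        }
    '
}
```

It would be called as `recalc_record < "$1"`, in place of the `while` loop.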