I came across this one-liner script fu for getting rid of newline characters in a fixed width text file. The idea is to change a file full of entries like:
>IGHV1-18*01
CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAG
GTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTATGGTATCAGC
TGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTAC
AATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACA
GACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCC
GTGTATTACTGTGCGAGAGA
to
>IGHV1-18*01
CAGGTTCAGCTGGTGCAGTCTGGAGCTGAGGTGAAGAAGCCTGGGGCCTCAGTGAAGGTCTCCTGCAAGGCTTCTGGTTACACCTTTACCAGCTATGGTATCAGCTGGGTGCGACAGGCCCCTGGACAAGGGCTTGAGTGGATGGGATGGATCAGCGCTTACAATGGTAACACAAACTATGCACAGAAGCTCCAGGGCAGAGTCACCATGACCACAGACACATCCACGAGCACAGCCTACATGGAGCTGAGGAGCCTGAGATCTGACGACACGGCCGTGTATTACTGTGCGAGAGA
I am not very experienced with AWK, so I figured it would be a good learning experience to try and decipher it. However, I am having difficulties. Specifically, I am confused by the multiple blocks coming one after another: is the first block an implicit for-loop?
awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}' < file.fa
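For reading purposes, here is a sketch of the same one-liner laid out over several lines with comments. The sample file and its sequences are made up for illustration; the awk logic is unchanged. The loop is implicit in awk itself: it reads the input one line at a time and tries each `pattern { action }` pair on that line, so no single block is the loop.

```shell
# Hypothetical sample input; names and sequences are invented for this sketch.
tmp=$(mktemp)
printf '>seq1\nACGT\nTTGA\n>seq2\nGGCC\nAA\n' > "$tmp"

# Same logic as the one-liner, spread out and commented.
unwrapped=$(awk '
  # awk reads the input line by line and tries every pattern { action } pair on each line.
  /^>/ {                    # pattern: the line starts with ">" (a header)
      printf("\n%s\n", $0)  # end the previous sequence line, then print the header
      next                  # skip the remaining rules and move on to the next line
  }
  {                         # no pattern: runs for every line that gets this far
      printf("%s", $0)      # print the sequence chunk without a trailing newline
  }
  END { printf("\n") }      # after the last line, terminate the final sequence
' "$tmp")

printf '%s\n' "$unwrapped"
rm -f "$tmp"
```

Because the header rule prints a newline before every header, the output starts with an empty line whenever the file begins with a header.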
`{}` block which seemingly corresponds to the `else`. Thanks for the explanation! As a side question, one shortcoming of this script is that it adds an extra newline in the beginning of the file, which breaks some tools. Would it be possible to avoid that somehow? – posdef Sep 12 '16 at 14:22
`next` in the 1st block ensures that the second is only run if the 1st fails. So yes, it acts like an `else` but isn't really implicit (there's an explicit `next`). And yes, that newline was there in the biostars answer. The simplest way to get rid of it would be to pass the output through `| tail -n +2`. You might also be interested in the scripts in my answer here, by the way. I find the tbl format much more useful than the fake fasta with the whole seq on one line. – terdon Sep 12 '16 at 14:26
`tail -n +2` i suppose? – posdef Sep 13 '16 at 11:12
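As a sketch of the two ways to drop the leading newline: the first pipes the original output through `tail -n +2`, as suggested in the comments; the second is a variant (not from the original thread) that only prints the separating newline when the header is not the first line of input, using awk's built-in `NR` line counter. The sample file is hypothetical, and both approaches assume the file starts with a `>` header line.

```shell
# Hypothetical sample input; names and sequences are invented for this sketch.
tmp=$(mktemp)
printf '>seq1\nACGT\nTTGA\n>seq2\nGGCC\nAA\n' > "$tmp"

# Option 1: keep the one-liner as is and drop the first (empty) output line.
via_tail=$(awk '/^>/ {printf("\n%s\n",$0);next; } { printf("%s",$0);} END {printf("\n");}' "$tmp" | tail -n +2)

# Option 2: only emit the separating newline when this header is not line 1.
via_nr=$(awk '
  /^>/ {
      if (NR > 1) printf("\n")  # close the previous sequence, except before the first header
      print                     # print the header line followed by a newline
      next
  }
  { printf("%s", $0) }          # append sequence chunks without a newline
  END { printf("\n") }          # terminate the final sequence
' "$tmp")

printf '%s\n' "$via_nr"
rm -f "$tmp"
```

Option 2 avoids the extra process and does not silently eat a real first line if the input happens not to begin with a header.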