I have a FASTA format file:
>Ipunensis_00386 Yfr1
GCGGAGACGAAAGTTTCCGTTCACTCCTCACACCACACTCCGCCCAAATCATTGATTTGG
GCGGTT
>Ipunensis_00401 tRNA-Gly(gcc)
GCGGGTATAGCTCAGTGGTAGAGCGTCACCTTGCCAAGGTGAATGTCGCGCGTTCGAATC
GCGTTACCCGCT
>Ipunensis_00001 transposase IS4 family protein
ATGCAGAAGTTTCAGGGCATCCACTGGGTCAACCTAGACGGGCAGCACCAGGTTAGCAAT
CTCAGTGATGAGCGACGCTTAATCATCCACCTCTTGGGGCCACCTGTTGAGCGCTACTAC
CATGCCCCTGGTTAA
>Ipunensis_00002 Photosystem I assembly protein Ycf3
ATGCGTCACCCCGCCAAGTTACTCGGGTTAGTCACTCTCACCAGTATGCTTACGCTGGCT
>Ipunensis_00003 Cell wall-associated protease
ATGAAACGTTTTCTGACCAGTCTTTTGCTGACGGGCCTGCTTTGGCATAGTGGGGGCAGC
GTTGGGGTTGGGAGAGGTGCGATCGCACAAACCCAGTCCACCCCAGACCTCTACTACACC
>Ipunensis_00004 Photosystem I assembly protein Ycf3
TTGACCTGCGGCCCGCAGCCCTACCTGCCCAACCTGACTCCAGAAATTCCCATGATCTAC
CGCCTCTCGTCTCCCGGATTTTTGCTGGCGCTGCTGCTGCTATCTGCCGTCGATCCGGCA
>Ipunensis_00226 tRNA-Leu(gag)
TGCGGATGTGGTGGAACTGGTAGACACGCACGTTTGAGGGGCGTGTGGCTTACGCCTTGC
GAGTTCGAGTCTCGCCATCCGCAT
>Ipunensis_00045 tRNA-Ala(cgc)
GGGGAATTAGCTCAGCTGGTAGAGCGCTGCGATCGCACCGCAGAGGTCAGGAGTTCGAAT
CTCCTATTCTCCA
>Ipunensis_00357 glnA
ATCGTTCATCTCTTCAAACTGTCAAAGCTACTTACAAAAGCTACAGACGCACCAAGAGAC
GGAAGTAGGGGTCTGATCCCCCCGAAGGAACGCGCC
>Ipunensis_00403 tRNA-Gly(gcc)
GCGGGTATAGCTCAGTGGTAGAGCGTCACCTTGCCAAGGTGAATGTCGCGCGTTCGAATC
How can I sort the above FASTA file by its alphanumeric IDs, starting with >Ipunensis_00001 and continuing in ascending order?
Desired output:
>Ipunensis_00001 transposase IS4 family protein
ATGCAGAAGTTTCAGGGCATCCACTGGGTCAACCTAGACGGGCAGCACCAGGTTAGCAAT
CTCAGTGATGAGCGACGCTTAATCATCCACCTCTTGGGGCCACCTGTTGAGCGCTACTAC
CATGCCCCTGGTTAA
>Ipunensis_00002 Photosystem I assembly protein Ycf3
ATGCGTCACCCCGCCAAGTTACTCGGGTTAGTCACTCTCACCAGTATGCTTACGCTGGCT
>Ipunensis_00003 Cell wall-associated protease
ATGAAACGTTTTCTGACCAGTCTTTTGCTGACGGGCCTGCTTTGGCATAGTGGGGGCAGC
GTTGGGGTTGGGAGAGGTGCGATCGCACAAACCCAGTCCACCCCAGACCTCTACTACACC
>Ipunensis_00004 Photosystem I assembly protein Ycf3
TTGACCTGCGGCCCGCAGCCCTACCTGCCCAACCTGACTCCAGAAATTCCCATGATCTAC
CGCCTCTCGTCTCCCGGATTTTTGCTGGCGCTGCTGCTGCTATCTGCCGTCGATCCGGCA
>Ipunensis_00045 tRNA-Ala(cgc)
GGGGAATTAGCTCAGCTGGTAGAGCGCTGCGATCGCACCGCAGAGGTCAGGAGTTCGAAT
CTCCTATTCTCCA
>Ipunensis_00226 tRNA-Leu(gag)
TGCGGATGTGGTGGAACTGGTAGACACGCACGTTTGAGGGGCGTGTGGCTTACGCCTTGC
GAGTTCGAGTCTCGCCATCCGCAT
>Ipunensis_00357 glnA
ATCGTTCATCTCTTCAAACTGTCAAAGCTACTTACAAAAGCTACAGACGCACCAAGAGAC
GGAAGTAGGGGTCTGATCCCCCCGAAGGAACGCGCC
>Ipunensis_00386 Yfr1
GCGGAGACGAAAGTTTCCGTTCACTCCTCACACCACACTCCGCCCAAATCATTGATTTGG
GCGGTT
>Ipunensis_00401 tRNA-Gly(gcc)
GCGGGTATAGCTCAGTGGTAGAGCGTCACCTTGCCAAGGTGAATGTCGCGCGTTCGAATC
GCGTTACCCGCT
>Ipunensis_00403 tRNA-Gly(gcc)
GCGGGTATAGCTCAGTGGTAGAGCGTCACCTTGCCAAGGTGAATGTCGCGCGTTCGAATC
[...] id field that you want to sort by? Are the > characters really in the file?) What I can understand from your question is "How can I sort the above specially formatted file based on some field" – Chris Davies Aug 01 '20 at 20:39
[...] > character. Are there really multiple lines per section, or is that an artefact of the way you posted your question? – Chris Davies Aug 01 '20 at 20:48
>Ipunensis_00045 and >Ipunensis_00403 are missing from your expected output. If that's a mistake, please fix it, otherwise please explain it. – Ed Morton Aug 02 '20 at 00:52
> starts a multi-line block. So not lines shall be sorted but blocks. – Hauke Laging Aug 02 '20 at 14:37
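As the last comment points out, the unit to sort is the whole record (a > header line plus the sequence lines under it), not the individual lines. One possible way to do that is sketched below in Python. It is only a sketch under a few assumptions: the ID is the first whitespace-delimited word after >, IDs end in a run of digits that should be compared numerically, and the file is small enough to hold in memory. The function names parse_fasta, id_key and sort_fasta are made up for this illustration.

```python
import re


def parse_fasta(text):
    """Split FASTA text into (header, [sequence lines]) records."""
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            # A header line opens a new record.
            records.append((line, []))
        elif line.strip() and records:
            # Any other non-blank line belongs to the most recent header.
            records[-1][1].append(line)
    return records


def id_key(record):
    """Sort key: ID prefix as text, trailing digits as an integer.

    Assumption: the ID is the first word after '>' (e.g. Ipunensis_00386).
    IDs without trailing digits sort before numbered IDs with the same text.
    """
    header = record[0]
    seq_id = (header[1:].split() or [""])[0]
    match = re.match(r"(.*?)(\d+)$", seq_id)
    if match:
        return (match.group(1), int(match.group(2)))
    return (seq_id, -1)


def sort_fasta(text):
    """Return the FASTA text with whole records ordered by ID."""
    records = sorted(parse_fasta(text), key=id_key)
    return "".join(
        "\n".join([header, *seq_lines]) + "\n" for header, seq_lines in records
    )
```

To use it, read the file into a string, pass it to sort_fasta, and write the returned string to a new file. Because the IDs in the question are zero-padded to the same width, a plain text comparison of the IDs would give the same order here; the numeric comparison only matters if the padding is inconsistent.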