Input file: 131751_pphA.fasta
>ID:NDNDCOEC_02118 |[Genus species]|strain|PANS_1_2_annot.gbk|pphA|855|NODE_3_length_422941_cov_112.146787422941(422941):170566-171420:1 ^^ Genus species strain strain.|neighbours:ID:NDNDCOEC_02117(1),ID:NDNDCOEC_02119(1)|neighbour_genes:hypothetical protein,ntaA| | aligned:1-284 (284)
MIKKLIAEKGTLIFIEAHNPLSALIASKAEQTNSEGRIVKFDGIWSSSLTDSASRGIPDNETLALSSRLENIADIRNVTDMPIIMDADTGGKPEHFSYYVKRMINNGVNGVIIEDKTGLKKNSLFGTEVEQTLADINDFSEKIKRGKSAVYIDDFMIIARLESLIAGFDVEHALERADAYVEAGADGIMIHSCKKTPDEVFLFSTKFRKKYPSVPLICVPTTYSATSNRELSEAGFNVIIYANHMLRAAYKAMENVSKEILRYGRTAEIEKSCMSVKEIISLIP
>ID:KJDCINFB_03194 |[Genus species]|strain|PNA_1_5_annot.gbk|pphA|855|NODE_5_length_527105_cov_93.286545527105(527105):274765-275619:1 ^^ Genus species strain strain.|neighbours:ID:KJDCINFB_03193(1),ID:KJDCINFB_03195(1)|neighbour_genes:hypothetical protein,ntaA| | aligned:1-284 (284)
MIKKLIAEKGTLIFIEAHNPLSALIASKAEQTNSEGRIVKFDGIWSSSLTDSASRGIPDNETLALSSRLENIADIRNVTDMPIIMDADTGGKPEHFSYYVKRMINNGVNGVIIEDKTGLKKNSLFGTEVEQTLADINDFSEKIKRGKSAVYIDDFMIIARLESLIAGFDVEHALERADAYVEAGADGIMIHSCKKTPDEVFLFSTKFRKKYPSVPLICVPTTYSATSNRELSEAGFNVIIYANHMLRAAYKAMENVSKEILRYGRTAEIEKSCMSVKEIISLIP
>ID:LBFHNJKP_02554 |[Genus species]|strain|PANS_1_6_annot.gbk|pphA|855|NODE_4_length_527158_cov_95.108790527158(527158):251540-252394:-1 ^^ Genus species strain strain.|neighbours:ID:LBFHNJKP_02553(-1),ID:LBFHNJKP_02555(-1)|neighbour_genes:ntaA,hypothetical protein| | aligned:1-284 (284)
MIKKLIAEKGTLIFIEAHNPLSALIASKAEQTNSEGRIVKFDGIWSSSLTDSASRGIPDNETLALSSRLENIADIRNVTDMPIIMDADTGGKPEHFSYYVKRMINNGVNGVIIEDKTGLKKNSLFGTEVEQTLADINDFSEKIKRGKSAVYIDDFMIIARLESLIAGFDVEHALERADAYVEAGADGIMIHSCKKTPDEVFLFSTKFRKKYPSVPLICVPTTYSATSNRELSEAGFNVIIYANHMLRAAYKAMENVSKEILRYGRTAEIEKSCMSVKEIISLIP
>ID:GPMHBDBL_03046 |[Genus species]|strain|PNA_200_2_annot.gbk|pphA_2|855|NODE_4_length_530984_cov_86.347264530984(530984):275036-275890:1 ^^ Genus species strain strain.|neighbours:ID:GPMHBDBL_03045(1),ID:GPMHBDBL_03047(1)|neighbour_genes:hypothetical protein,ntaA| | aligned:1-284 (284)
MIKKLIAEKGTLIFIEAHNPLSALIASKAEQTNSEGRIVKFDGIWSSSLTDSASRGIPDNETLALSSRLENIADIRNVTDMPIIMDADTGGKPEHFSYYVKRMINNGVNGVIIEDKTGLKKNSLFGTEVEQTLADINDFSEKIKRGKSAVYIDDFMIIARLESLIAGFDVEHALERADAYVEAGADGIMIHSCKKTPDEVFLFSTKFRKKYPSVPLICVPTTYSATSNRELSEAGFNVIIYANHMLRAAYKAMENVSKEILRYGRTAEIEKSCMSVKEIISLIP
Desired output: four separate output files:
PANS_1_2_pphA.fasta
>PANS_1_2_pphA
MIKKLIAEKGTLIFIEAHNPLSALIASKAEQTNSEGRIVKFDGIWSSSLTDSASRGIPDNETLALSSRLENIADIRNVTDMPIIMDADTGGKPEHFSYYVKRMINNGVNGVIIEDKTGLKKNSLFGTEVEQTLADINDFSEKIKRGKSAVYIDDFMIIARLESLIAGFDVEHALERADAYVEAGADGIMIHSCKKTPDEVFLFSTKFRKKYPSVPLICVPTTYSATSNRELSEAGFNVIIYANHMLRAAYKAMENVSKEILRYGRTAEIEKSCMSVKEIISLIP
PNA_1_5_pphA.fasta
>PNA_1_5_pphA
MIKKLIAEKGTLIFIEAHNPLSALIASKAEQTNSEGRIVKFDGIWSSSLTDSASRGIPDNETLALSSRLENIADIRNVTDMPIIMDADTGGKPEHFSYYVKRMINNGVNGVIIEDKTGLKKNSLFGTEVEQTLADINDFSEKIKRGKSAVYIDDFMIIARLESLIAGFDVEHALERADAYVEAGADGIMIHSCKKTPDEVFLFSTKFRKKYPSVPLICVPTTYSATSNRELSEAGFNVIIYANHMLRAAYKAMENVSKEILRYGRTAEIEKSCMSVKEIISLIP
PANS_1_6_pphA.fasta
>PANS_1_6_pphA
MIKKLIAEKGTLIFIEAHNPLSALIASKAEQTNSEGRIVKFDGIWSSSLTDSASRGIPDNETLALSSRLENIADIRNVTDMPIIMDADTGGKPEHFSYYVKRMINNGVNGVIIEDKTGLKKNSLFGTEVEQTLADINDFSEKIKRGKSAVYIDDFMIIARLESLIAGFDVEHALERADAYVEAGADGIMIHSCKKTPDEVFLFSTKFRKKYPSVPLICVPTTYSATSNRELSEAGFNVIIYANHMLRAAYKAMENVSKEILRYGRTAEIEKSCMSVKEIISLIP
PNA_200_2_pphA_2.fasta
>PNA_200_2_pphA_2
MIKKLIAEKGTLIFIEAHNPLSALIASKAEQTNSEGRIVKFDGIWSSSLTDSASRGIPDNETLALSSRLENIADIRNVTDMPIIMDADTGGKPEHFSYYVKRMINNGVNGVIIEDKTGLKKNSLFGTEVEQTLADINDFSEKIKRGKSAVYIDDFMIIARLESLIAGFDVEHALERADAYVEAGADGIMIHSCKKTPDEVFLFSTKFRKKYPSVPLICVPTTYSATSNRELSEAGFNVIIYANHMLRAAYKAMENVSKEILRYGRTAEIEKSCMSVKEIISLIP
The multi-FASTA input file (131751_pphA.fasta) contains four FASTA sequences with headers. I want four output files, each holding a single sequence, with both the file name and the header derived from the strain information shown above. For example, one of the headers in the input file contains the strain information as |strain|PANS_1_2_annot.gbk|pphA|. The corresponding output file should be named PANS_1_2_pphA.fasta and its header should be >PANS_1_2_pphA.
Similarly, the other output files should be:
PNA_1_5_pphA.fasta with header >PNA_1_5_pphA
PANS_1_6_pphA.fasta with header >PANS_1_6_pphA
PNA_200_2_pphA_2.fasta with header >PNA_200_2_pphA_2
I tried the following code:
awk -F "|" '/^>/ {close(F); ID=$1; gsub("^>", "", ID); F=ID".fasta"} {print >> F}' 123764_pphA.fasta
This produced FASTA output files named after the ID field rather than the strain:
ID:BKKCPFME_02840 .fasta ID:EKPOMJAO_03222 .fasta ID:HEIIBHGJ_01315 .fasta ID:KBMOKBJB_03162 .fasta ID:LECGKDGM_03166 .fasta
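The attempt above builds the file name from $1, which with "|" as the separator is the ">ID:... " field (trailing space included), so the strain never enters the name. A minimal sketch of one possible fix follows. It assumes the strain file name is always the fourth "|"-separated field and ends in _annot.gbk, that the gene name is the fifth field, and that each strain/gene combination occurs only once (a repeated name would overwrite the earlier file). The demo_pphA.fasta file and its shortened sequences and coordinates are hypothetical, included only so the example runs on its own.

```shell
# Hypothetical two-record sample with the same '|'-separated header layout
# as in the question (sequences and trailing fields shortened for illustration).
cat > demo_pphA.fasta <<'EOF'
>ID:NDNDCOEC_02118 |[Genus species]|strain|PANS_1_2_annot.gbk|pphA|855|NODE_3:170566-171420:1
MIKKLIAEKGTLIFIEAHNPLSAL
>ID:GPMHBDBL_03046 |[Genus species]|strain|PNA_200_2_annot.gbk|pphA_2|855|NODE_4:275036-275890:1
MIKKLIAEKGTLIFIEAHNPLSAL
EOF

awk -F'|' '
/^>/ {
    if (out != "") close(out)        # release the previous output file
    strain = $4                      # e.g. PANS_1_2_annot.gbk
    sub(/_annot\.gbk$/, "", strain)  # drop the annotation-file suffix
    name = strain "_" $5             # $5 is the gene field, e.g. pphA or pphA_2
    out = name ".fasta"
    print ">" name > out             # write the simplified header
    next
}
out != "" { print > out }            # sequence lines go to the current file
' demo_pphA.fasta
```

To apply the same idea to the real data, the awk program would be run on 131751_pphA.fasta instead of the demo file. If the header layout differs between files, the field numbers $4 and $5 would need adjusting.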