I have a really big file that looks like this:
>name1
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
>name2
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
>name
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
>name4
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
ACGTACGTACGT
It is a FASTA file. It has about 3183 header lines, that is, lines that start with > (3183 names), each followed by a varying number of lines of ACGTs. I want to split it into smaller files, each holding 250 headers together with their ACGT lines. If the last file has fewer than 250 headers, that is fine; I would still like to keep it. So far I have tried split, which I don't think is appropriate here, since it puts one > record in each small file. I also tried awk:
awk -F'>' 'NR==1 {f=0; c=1}
NR>1 {
    c++
    if ($((c%250)) == 0) {
        fn = "file" c ".fasta"
        print > fn
    }
}' kmer_subtraction/kmercollection.fasta
I am not sure whether this works, because I cannot see the resulting files. Could you please help me with this? Thank you!
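
For reference, the splitting described above could be sketched in awk roughly as follows. This is only a minimal sketch, not a tested solution for the real file: the function name split_fasta, the chunk_ output prefix, and the numbering of the output files are assumptions made for illustration. It counts header lines (those starting with >) and opens a new output file before every 250th record, so the last file simply receives whatever is left over.

```shell
# Hypothetical helper: split a FASTA file into files of at most PER_FILE records.
# Usage: split_fasta INPUT [PER_FILE] [PREFIX]
split_fasta() {
    awk -v per_file="${2:-250}" -v prefix="${3:-chunk_}" '
        /^>/ {
            # n = headers seen so far; start a new file before
            # records 1, per_file+1, 2*per_file+1, ...
            if (n % per_file == 0) {
                if (out != "") close(out)   # avoid keeping many files open
                out = sprintf("%s%d.fasta", prefix, n / per_file + 1)
            }
            n++
        }
        # Write every line (header or sequence) to the current output file.
        # Lines that appear before the first header are skipped.
        out != "" { print > out }
    ' "$1"
}
```

Calling it as split_fasta kmer_subtraction/kmercollection.fasta 250 chunk_ would write chunk_1.fasta, chunk_2.fasta and so on into the current directory. With 3183 headers and 250 per file that comes to 13 files, the last one holding 183 records.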