I have a file like the following on a Linux machine, and the gene_id is wrong. I would like to replace the string "PB" in the gene_id with the first two dot-separated parts of the transcript_id from the same line (for example, PB.1 from PB.1.66).
$ cat try.gff
transcript 30351 32332 . + . gene_id "PB"; transcript_id "PB.1.66";
exon 30351 31677 . + . gene_id "PB"; transcript_id "PB.1.59";
exon 31758 31871 . + . gene_id "PB"; transcript_id "PB.1.40";
exon 31968 32178 . + . gene_id "PB"; transcript_id "PB.1.30";
exon 32257 32332 . + . gene_id "PB"; transcript_id "PB.1.20";
transcript 30351 32332 . + . gene_id "PB"; transcript_id "PB.28.309";
exon 30351 31677 . + . gene_id "PB"; transcript_id "PB.58.900";
exon 31758 31871 . + . gene_id "PB"; transcript_id "PB.10000.1001";
exon 31968 32178 . + . gene_id "PB"; transcript_id "PB.19897.1087541";
exon 32257 32332 . + . gene_id "PB"; transcript_id "PB.1.11";
Expected result:
transcript 30351 32332 . + . gene_id "PB.1"; transcript_id "PB.1.66";
exon 30351 31677 . + . gene_id "PB.1"; transcript_id "PB.1.59";
exon 31758 31871 . + . gene_id "PB.1"; transcript_id "PB.1.40";
exon 31968 32178 . + . gene_id "PB.1"; transcript_id "PB.1.30";
exon 32257 32332 . + . gene_id "PB.1"; transcript_id "PB.1.20";
transcript 30351 32332 . + . gene_id "PB.28"; transcript_id "PB.28.309";
exon 30351 31677 . + . gene_id "PB.58"; transcript_id "PB.58.900";
exon 31758 31871 . + . gene_id "PB.10000"; transcript_id "PB.10000.1001";
exon 31968 32178 . + . gene_id "PB.19897"; transcript_id "PB.19897.1087541";
exon 32257 32332 . + . gene_id "PB.1"; transcript_id "PB.1.11";
I managed to get the right text for the replacement with the following command:
$ awk -F";" '{gsub(" transcript_id ","");print $2}' try.gff | sed 's/"//g' | cut -d '.' -f 1,2
PB.1
PB.1
PB.1
PB.1
PB.1
PB.28
PB.58
PB.10000
PB.19897
PB.1
How can I replace the "PB" in the gene_id part?
gene_id "AB"; transcript_id "CD.1.59";
? If so, what would you want to happen? – G-Man Says 'Reinstate Monica' Mar 16 '22 at 16:18
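One possible approach, offered only as a sketch: a single sed substitution can capture the part of the transcript_id before its last dot-separated field and copy it into the gene_id on the same line. It assumes that every line contains gene_id "PB"; followed by exactly one space and then transcript_id, as in the sample, and that the transcript_id always has three dot-separated parts. The file names sample.gff and fixed.gff are hypothetical and used only for illustration.

```shell
# Small sample in the same layout as try.gff (hypothetical file name).
cat > sample.gff <<'EOF'
transcript 30351 32332 . + . gene_id "PB"; transcript_id "PB.1.66";
exon 31758 31871 . + . gene_id "PB"; transcript_id "PB.10000.1001";
EOF

# Capture the first two dot-separated parts of transcript_id (group 1)
# and write them into gene_id; the rest of the line is left untouched.
# Assumes a single space between the two attributes, as in the sample.
sed -E 's/gene_id "PB"; transcript_id "([^".]+\.[^".]+)\./gene_id "\1"; transcript_id "\1./' sample.gff > fixed.gff

cat fixed.gff
```

Lines that do not match the pattern (for instance a gene_id other than "PB") pass through unchanged, so it may be worth comparing the result against the input with diff before overwriting the original file.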