I recently linearized a FASTA file using awk. The output is perfect, except that there is a caret (^) in my sequences. I want to remove this caret. Below are the sequences and my attempt; any assistance is highly appreciated.
>P1
MPPRRSIVEVKVLDVQKRRVPNKHYVYIIRVTWSSGATEAIYRRYSKFFDLQMQMLDKFP^MMEGGQKDPKQRIIPFLPGKILFRRSHIRDVAVKRLIPIDEYCKALIQLPPYISQCDEVLQ^MFFETRPEDLNPPKEEHIGKKKSGNDPTSVDPM
>P2
MAEVRKFTKRLSKPGTAAELRQSVSEAVRGSVVLEKAKLVEPLDYENVITQRKTQIYSDP^MLRDLLMFPMEDISISVIGRQRRTVQSTVPEDAEKRAQSLFVKECIKTYSTDWHVVNYKYE^MDFSGDFRMLPCKSLRPEKIPNHVFEIDEDCEK
>P3
GDDSEWLKLPVDQKCEHKLWKARLSGYEEALKIFQKIKDEKSPEWSKYLGLIKKFVTDS^MNAVVQLKGLEAALVYVENAHVAGKTTGEVVSGVVSKAKELGIEICLMYVEIE^MKGESVQEELLKGLDNKNPKIIVACIETLRKALS
I tried using:
$ sed '/s: ^// seq2.fa>seq3.fa
The code above gives me this error: sed: -e expression #1, char 7: unknown command: '/'
Any assistance is appreciated, thanks.
[…] ^M, carriage return characters. If you just remove the ^, you will get the wrong sequence. – terdon Dec 31 '22 at 15:02
[…] ^M […] – thole Dec 31 '22 at 15:25
[…] rstrip() which will remove both \r and \n. But seriously, I cannot stress this enough: do NOT try to use both Windows and non-Windows systems on the same file unless you always remember to convert between the line endings. Even better, if you're doing bioinformatics, just don't use Windows at all. – terdon Dec 31 '22 at 15:44
[…] ^M but instead focus on ^, you will have borked your sequences with extra methionine residues so you really want to fix that too. – terdon Dec 31 '22 at 15:45
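Following the comments above, here is a minimal sketch of stripping the carriage returns rather than the caret. It assumes the ^M shown in the question really are carriage return bytes (\r) left over from Windows line endings. The file names seq2.fa and seq3.fa come from the question; the short records written by printf are made-up stand-ins for the real sequences.

```shell
# Demo input: two linearized records with stray carriage returns (\r)
# in the middle of each sequence. These shortened records are hypothetical.
printf '>P1\nMPPRR\rMEGGQ\rFFETR\n>P2\nMAEVR\rLRDLL\n' > seq2.fa

# cat -v displays each carriage return as ^M, which is what looks like a caret
cat -v seq2.fa

# Delete every carriage return byte, wherever it occurs in a line
tr -d '\r' < seq2.fa > seq3.fa

# Equivalent with GNU sed (\r in the pattern is a GNU extension):
# sed 's/\r//g' seq2.fa > seq3.fa

# No ^M should be left in the cleaned file
cat -v seq3.fa
```

Deleting the whole \r byte avoids the problem terdon describes: removing only a literal ^ would leave the M behind as an extra methionine residue. Note that the sed substitute command has the form s/pattern/replacement/flags, which is why the attempt starting with / was rejected as an unknown command.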