0

I have a txt data like this (df1.txt)

>tr|A0A1B1L9R9|A0A1B1L9R9_BACTU 
MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVG
MESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIA
ITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLNKELFLKINIVGGLIFLVVSAYSFF
FSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWMKNLSLFTLFRPKEIAEGAYNIW
PVSIGLIAGALCIFIVAIVVFKKRDLPL
>sp|O15304|SIVA_HUMAN 
MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDE
GCAVVHLPESPKPGPTGAPRAARGQMLIGPDGRLIRSLGQASEADPSGVASIACSSCVRA
VDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET

and I have a txt data like this (df2.txt)

tr|A0A1B1L9R9|A0A1B1L9R9_BACTU ABC transporter permease OS=Bacillus thuringiensis OX=1428 GN=berB PE=4 SV=1
sp|O15304|SIVA_HUMAN Apoptosis regulatory protein Siva OS=Homo sapiens OX=9606 GN=SIVA1 PE=1 SV=2

I want to merge them together based on the similar info so I want to have a output like this

>tr|A0A1B1L9R9|A0A1B1L9R9_BACTU ABC transporter permease OS=Bacillus thuringiensis OX=1428 GN=berB PE=4 SV=1
MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVG
MESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIA
ITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLNKELFLKINIVGGLIFLVVSAYSFF
FSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWMKNLSLFTLFRPKEIAEGAYNIW
PVSIGLIAGALCIFIVAIVVFKKRDLPL
>sp|O15304|SIVA_HUMAN Apoptosis regulatory protein Siva OS=Homo sapiens OX=9606 GN=SIVA1 PE=1 SV=2
MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDE
GCAVVHLPESPKPGPTGAPRAARGQMLIGPDGRLIRSLGQASEADPSGVASIACSSCVRA
VDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET

I am trying this with no success, any thought ?

cat df1.txt | seqkit replace -k df2.txt -p '(.+)' -r '$1 {kv}'
Learner
  • 165
  • 1
  • @Jeff Schaller yes – Learner Dec 22 '18 at 02:37
  • Is it different from them? I don't think you should write the same question three times. – Jeff Schaller Dec 22 '18 at 02:42
  • @Jeff Schaller this is different if it was the same I would not ask. actually if I ask they say why you ask three times, if I write everything in one question they say it is too many question and confusing . which way is best ? – Learner Dec 22 '18 at 02:45
  • 1
    well, that's two of us that you've confused with very similar questions, so it might help prevent accidental closure of these questions by giving an indication as to how they're different. All 3 sound to me like you're trying to join text from two different files, both of which have had the same filename and very similar data. – Jeff Schaller Dec 22 '18 at 02:47
  • @Jeff Schaller yes my problem is manipulating from one file to another. differences is that the two others I was mainly working to combine and lower case then this one I only work on a part of text and replace it . that is all about this question . look at the output – Learner Dec 22 '18 at 02:53

1 Answers1

0

You could do something like this in Awk:

awk '
  NR==FNR {a[">"$1] = ">"$0; next} $1 in a {$1 = a[$1]} 1
' df2.txt df1.txt

Ex.

$ awk 'NR==FNR {a[">"$1] = ">"$0; next} $1 in a {$1 = a[$1]} 1' df2.txt df1.txt 
>tr|A0A1B1L9R9|A0A1B1L9R9_BACTU ABC transporter permease OS=Bacillus thuringiensis OX=1428 GN=berB PE=4 SV=1
MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVG
MESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIA
ITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLNKELFLKINIVGGLIFLVVSAYSFF
FSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWMKNLSLFTLFRPKEIAEGAYNIW
PVSIGLIAGALCIFIVAIVVFKKRDLPL
>sp|O15304|SIVA_HUMAN Apoptosis regulatory protein Siva OS=Homo sapiens OX=9606 GN=SIVA1 PE=1 SV=2
MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDE
GCAVVHLPESPKPGPTGAPRAARGQMLIGPDGRLIRSLGQASEADPSGVASIACSSCVRA
VDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET
steeldriver
  • 81,074