how to add part of similar string from one file to another file

Question

I have a txt data like this (df1.txt)

>tr|A0A1B1L9R9|A0A1B1L9R9_BACTU 
MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVG
MESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIA
ITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLNKELFLKINIVGGLIFLVVSAYSFF
FSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWMKNLSLFTLFRPKEIAEGAYNIW
PVSIGLIAGALCIFIVAIVVFKKRDLPL
>sp|O15304|SIVA_HUMAN 
MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDE
GCAVVHLPESPKPGPTGAPRAARGQMLIGPDGRLIRSLGQASEADPSGVASIACSSCVRA
VDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET

and I have a txt data like this (df2.txt)

tr|A0A1B1L9R9|A0A1B1L9R9_BACTU ABC transporter permease OS=Bacillus thuringiensis OX=1428 GN=berB PE=4 SV=1
sp|O15304|SIVA_HUMAN Apoptosis regulatory protein Siva OS=Homo sapiens OX=9606 GN=SIVA1 PE=1 SV=2

I want to merge them together based on the similar info so I want to have a output like this

>tr|A0A1B1L9R9|A0A1B1L9R9_BACTU ABC transporter permease OS=Bacillus thuringiensis OX=1428 GN=berB PE=4 SV=1
MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVG
MESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIA
ITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLNKELFLKINIVGGLIFLVVSAYSFF
FSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWMKNLSLFTLFRPKEIAEGAYNIW
PVSIGLIAGALCIFIVAIVVFKKRDLPL
>sp|O15304|SIVA_HUMAN Apoptosis regulatory protein Siva OS=Homo sapiens OX=9606 GN=SIVA1 PE=1 SV=2
MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDE
GCAVVHLPESPKPGPTGAPRAARGQMLIGPDGRLIRSLGQASEADPSGVASIACSSCVRA
VDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET

I am trying this with no success, any thought ?

cat df1.txt | seqkit replace -k df2.txt -p '(.+)' -r '$1 {kv}'

Is it different from them? I don't think you should write the same question three times. — Jeff Schaller, Dec 22 '18 at 02:42
@Jeff Schaller this is different if it was the same I would not ask. actually if I ask they say why you ask three times, if I write everything in one question they say it is too many question and confusing . which way is best ? — Learner, Dec 22 '18 at 02:45
well, that's two of us that you've confused with very similar questions, so it might help prevent accidental closure of these questions by giving an indication as to how they're different. All 3 sound to me like you're trying to join text from two different files, both of which have had the same filename and very similar data. — Jeff Schaller, Dec 22 '18 at 02:47
@Jeff Schaller yes my problem is manipulating from one file to another. differences is that the two others I was mainly working to combine and lower case then this one I only work on a part of text and replace it . that is all about this question . look at the output — Learner, Dec 22 '18 at 02:53

score 0 · Answer 1 · answered Dec 22 '18 at 03:27

You could do something like this in Awk:

awk '
  NR==FNR {a[">"$1] = ">"$0; next} $1 in a {$1 = a[$1]} 1
' df2.txt df1.txt

Ex.

$ awk 'NR==FNR {a[">"$1] = ">"$0; next} $1 in a {$1 = a[$1]} 1' df2.txt df1.txt 
>tr|A0A1B1L9R9|A0A1B1L9R9_BACTU ABC transporter permease OS=Bacillus thuringiensis OX=1428 GN=berB PE=4 SV=1
MNKQLFLASLKETQKSILSYACGAALYLWLLIWIFPSMVSAKGLNELIAAMPDSVKKIVG
MESPIQNVMDFLAGEYYSLLFIIILTIFCVTVATHLIARHVDKGAMAYLLATPVSRVQIA
ITQATVLILGLLIIVSVTYVAGLVGAEWFLQDNNLNKELFLKINIVGGLIFLVVSAYSFF
FSCICNDERKALSYSASLTILFFVLDMVGKLSDKLEWMKNLSLFTLFRPKEIAEGAYNIW
PVSIGLIAGALCIFIVAIVVFKKRDLPL
>sp|O15304|SIVA_HUMAN Apoptosis regulatory protein Siva OS=Homo sapiens OX=9606 GN=SIVA1 PE=1 SV=2
MPKRSCPFADVAPLQLKVRVSQRELSRGVCAERYSQEVFEKTKRLLFLGAQAYLDHVWDE
GCAVVHLPESPKPGPTGAPRAARGQMLIGPDGRLIRSLGQASEADPSGVASIACSSCVRA
VDGKAVCGQCERALCGQCVRTCWGCGSVACTLCGLVDCSDMYEKVLCTSCAMFET

how to add part of similar string from one file to another file

1 Answers1