
I have a text file containing a list of strings, for example a.txt:

one
two
three

I also have another text file containing a list of strings, for example b.txt:

threetwo
onetwothree
zero
twozero

What I want to do is compare the two files and find whether any of the lines in b.txt contain any of the strings from a.txt, and if so, which ones.

For this example, the output would be:

threetwo > two, three
onetwothree > one, two, three
twozero > two

In case my explanation isn't clear enough, here is C# code that produces the result I expect for a single string from b.txt:

using System;
using System.Collections.Generic;
using System.Linq;

List<string> allElements = new List<string> { "one", "two", "three" };
string str = "onetwothree";
// Keep only the elements that occur as substrings of str
var containingElements = allElements.Where(element => str.Contains(element));
foreach(string element in containingElements)
{
    Console.WriteLine(element);
}

You can run the above code on dotnetfiddle.net.

I would prefer to do this with awk. Any help would be appreciated :)

Kusalananda
    Hello and welcome! Would you mind editing your post and showing us what you've attempted so far? – Daniel Walker Jan 22 '21 at 03:49
  • @DanielWalker to be honest I haven't attempted anything; I'm not sure where to begin. I was just looking for some direction – user452306 Jan 22 '21 at 04:03
  • Why awk? If I understand the requirement correctly, grep -f a.txt b.txt finds whether any of the fields inside b.txt contain fields from a.txt. – berndbausch Jan 22 '21 at 07:06
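berndbausch's grep suggestion can be sketched as follows, assuming the sample files from the question (recreated here with printf). Using -F makes grep treat each line of a.txt as a fixed string rather than a regular expression. Note that this only reports which lines of b.txt match; it does not say which strings from a.txt each line contains, which is the extra information the question asks for.

```shell
# Recreate the sample files from the question
printf '%s\n' one two three > a.txt
printf '%s\n' threetwo onetwothree zero twozero > b.txt

# -F: treat every pattern as a fixed string, not a regex
# -f a.txt: read the patterns from a.txt, one per line
grep -F -f a.txt b.txt
```

With the sample data this prints threetwo, onetwothree and twozero; zero is dropped because it contains none of the strings.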

You could use the return value of awk's index function to determine whether a line in b.txt contains a substring from a.txt.

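index(in, find)
Search the string in for the first occurrence of the string find, and return the position in characters where that occurrence begins in the string in.

If find does not occur in in, index returns 0, so any result greater than 0 means the line contains the string.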
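A quick way to see what index returns, using a string from b.txt as a hard-coded example (the string "four" is just an illustrative value that does not occur):

```shell
# index(in, find) gives the 1-based start position of find in in, or 0 if absent
awk 'BEGIN {
  print index("onetwothree", "two")    # "two" starts at character 4
  print index("onetwothree", "four")   # not present, so 0
}'
```

This prints 4 and then 0.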

For example:

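awk '
  # First file (a.txt): store each string as an array key
  NR==FNR{strings[$1]; next}
  # Second file (b.txt): collect every stored string found in the line
  {
    m = ""
    for(s in strings){
      if(index($0,s) > 0) m = (m=="") ? s : m ", " s
    }
  }
  # Print only lines that contained at least one string
  m != "" {print $0, ">", m}
' a.txt b.txt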
threetwo > three, two
onetwothree > three, two, one
twozero > two

Note that array traversal order (in this case, the array of substrings constructed from a.txt) is not guaranteed in awk, which is why the matches above are not listed in the same order as in a.txt.
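If the matches should appear in the same order as in a.txt (as in the question's expected output), one option is to store the strings in an array indexed 1..n and loop over it numerically instead of with for-in. This is a sketch assuming one string per line in a.txt; the array name order is just an illustrative choice, and blank lines in a.txt are skipped.

```shell
# Recreate the sample files from the question
printf '%s\n' one two three > a.txt
printf '%s\n' threetwo onetwothree zero twozero > b.txt

awk '
  # a.txt: remember each non-blank string by its position (skip blank lines)
  NR==FNR { if (NF) order[++n] = $1; next }
  # b.txt: walk the strings by index 1..n so matches follow a.txt order
  {
    m = ""
    for (i = 1; i <= n; i++)
      if (index($0, order[i]) > 0) m = (m == "") ? order[i] : m ", " order[i]
    if (m != "") print $0, ">", m
  }
' a.txt b.txt
```

With the sample files this prints threetwo > two, three, then onetwothree > one, two, three, then twozero > two, matching the expected output in the question. A numeric for loop over 1..n visits elements in a fixed order, whereas for (s in strings) leaves the order up to the awk implementation.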

steeldriver