I have two text files: file1 contains a single column with a series of ID values; file2 contains many columns. I need to check whether the values of file1 are contained in column 2 of file2. A value from file1 only has to appear somewhere within a value in column 2 of file2, e.g. 347588 (file1) -> 1000347588 (file2) would be a match ;-)

Thanks a lot!

sebw
  • And I presume 034 would be matched to 1000347588 as well? Could you show some data, and also mention what you would need to see as output? At the moment, a "yes" or "no" seems to be what you are looking for. – Kusalananda Oct 25 '19 at 10:16
  • Please post testable samples. – RomanPerekhrest Oct 25 '19 at 10:28
  • As output I'd need a third file containing the lines of file2 that matched the values of file1. Yes, 034 would also match. I'm completely new to this, so I don't even know how to show some data here in the comments, but as mentioned: file1 has only one column with ~47000 lines and file2 has several columns, with col5 being the one I need to check. Thanks a lot for the help, and sorry... I'm hopeless here :-D – sebw Oct 25 '19 at 10:40
  • @sebw https://unix.stackexchange.com/questions/548698/test-if-values-in-file1-are-contained-in-column5-of-another-file#comment1018401_548698 – RomanPerekhrest Oct 25 '19 at 10:52
  • @sebw Please [edit] your question and write all clarification there. Please show some example input files and the expected output in your question. Format the data as code. In the editor field you can do this by selecting the data using the mouse or by pressing SHIFT and moving with the cursor keys, then using the {} tool. – Bodo Oct 25 '19 at 11:03
  • What is the delimiter? Space or maybe a CSV? – pLumo Oct 25 '19 at 11:32

1 Answer

If your file is space-delimited, use awk:

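awk '
    # First file: store each line (an id) as a key of array s.
    NR==FNR { s[$0]=1; next }
    # Second file: print the line if column 2 contains any stored id.
    { for (v in s) if ($2 ~ v) { print; next } }
' file1 file2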
  • Save each line ($0) of file1 as a key in array s.
  • For the second file, check whether $2 matches any key of s; if so, print the line.
  • Skip to the next line after a match so that the same line is not printed more than once.
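
The comments ask for a third file holding the matching lines of file2, and mention column 5 rather than column 2. A minimal sketch of that variant follows, under a few assumptions: the sample data and the temporary directory are made up for illustration (no real data was posted), the column number is passed in through a hypothetical awk variable named col, and index() is swapped in for the regex match so that each id is treated as a literal substring.

```shell
#!/bin/sh
# Hypothetical sample data; the real files were never posted.
tmp=$(mktemp -d)
printf '%s\n' 347588 999999 > "$tmp/file1"
printf '%s\n' \
    'a 1000347588 x' \
    'b 2000000000 y' \
    'c 5553475881 z' > "$tmp/file2"

# col is the field of file2 to search (2 here; use col=5 for column 5).
awk -v col=2 '
    # file1: collect every non-empty line as an id.
    NR == FNR { if ($0 != "") ids[$0]; next }
    # file2: print the line once if the chosen field contains any id.
    {
        for (id in ids)
            if (index($col, id)) { print; next }
    }
' "$tmp/file1" "$tmp/file2" > "$tmp/file3"

cat "$tmp/file3"
```

Skipping empty lines in file1 matters here because an empty id would match every line of file2. Note that the loop compares each line of file2 against every id, so with ~47000 ids the run time grows with the product of the two file sizes.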
pLumo