
I have a large file, file1, containing lots of information, e.g.

rs969931    C   A   1.993   1.189   1.003 ..............
rs2745406   C   T   1.993   1.166   1.003 ..............
rs6939431   A   G   0.003   0.207   0.005 ..............
rs1233427   A   G   1.990   1.150   1.001 ..............

and a second file, file2, containing the quality of that information, where the value on row N of file2 is the quality of row N of file1:

0.19893
0.94752
0.93768
0.47781

What I'd like to do is select the rows of file1 whose corresponding value in file2 is greater than 0.5. The closest I've been able to find is an ID-matching question (Select lines from text file which have ids listed in another file), whereas here I need to apply a numeric comparison to the values in file2.

The operation may need to be performed many times on large files, so I'm looking to avoid clunky solutions like appending file2 to file1 and then removing it again after filtering.

Kusalananda
E. Rei

2 Answers

paste qual.txt data.txt | awk '$1 > 0.5'

Here qual.txt is the quality file (file2) and data.txt is the data file (file1). paste first joins the two files line by line, so that each quality value becomes the first column, followed by the columns of the corresponding data line. The awk code then prints only the lines whose first column (the quality) is greater than 0.5.
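
To see what awk actually receives, the sketch below rebuilds the four sample rows from the question in a scratch directory and prints the intermediate paste output before filtering it. It is only an illustration: the data rows are cut down to their first four columns (tab-separated, with the trailing "......" columns left out), and the mktemp scratch-directory setup is an addition of mine; the qual.txt and data.txt names follow the answer.

```shell
#!/bin/sh
# Work in a scratch directory so existing files are not overwritten.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Sample data from the question, reduced to its first four columns.
printf '%s\n' 0.19893 0.94752 0.93768 0.47781 > qual.txt
printf 'rs969931\tC\tA\t1.993\nrs2745406\tC\tT\t1.993\nrs6939431\tA\tG\t0.003\nrs1233427\tA\tG\t1.990\n' > data.txt

# Stage 1: paste joins line N of qual.txt and line N of data.txt with a tab,
# so the quality value becomes field 1 of each joined line.
printf '%s\n' '--- paste output ---'
paste qual.txt data.txt

# Stage 2: awk prints only the joined lines whose field 1 is greater than 0.5.
printf '%s\n' '--- filtered ---'
paste qual.txt data.txt | awk '$1 > 0.5'
```

The filtered stage keeps the rs2745406 and rs6939431 rows (qualities 0.94752 and 0.93768), still prefixed by their quality values.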

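If you don't want the quality column in the output, remove it with cut (paste separates the joined columns with a tab, which is also cut's default delimiter):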

paste qual.txt data.txt | awk '$1 > 0.5' | cut -f 2-

For the given example, this will generate

rs2745406   C   T   1.993   1.166   1.003 ..............
rs6939431   A   G   0.003   0.207   0.005 ..............
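
Since the question mentions running this many times on large files, it may be convenient to wrap the pipeline in a shell function. The sketch below is one way to do that, assuming the quality file holds exactly one value per line. The function name filter_by_quality and its optional threshold argument are hypothetical, and awk's substr() is swapped in for cut to drop the quality column. Both paste and awk still process the input line by line, so nothing is held in memory beyond the current line.

```shell
#!/bin/sh
# Hypothetical helper: filter_by_quality QUALFILE DATAFILE [MIN]
# Prints the lines of DATAFILE whose matching line in QUALFILE is > MIN
# (default 0.5), without the quality column.
filter_by_quality() {
    paste "$1" "$2" | awk -F'\t' -v min="${3:-0.5}" '
        # Field 1 is the quality value that paste put in front of the line.
        $1 > min + 0 {
            # Skip field 1 and the tab that paste inserted after it.
            print substr($0, length($1) + 2)
        }'
}

# Usage example on the sample rows from the question (first four columns only).
dir=$(mktemp -d) || exit 1
printf '%s\n' 0.19893 0.94752 0.93768 0.47781 > "$dir/qual.txt"
printf 'rs969931\tC\tA\t1.993\nrs2745406\tC\tT\t1.993\nrs6939431\tA\tG\t0.003\nrs1233427\tA\tG\t1.990\n' > "$dir/data.txt"

filter_by_quality "$dir/qual.txt" "$dir/data.txt"        # default threshold 0.5
filter_by_quality "$dir/qual.txt" "$dir/data.txt" 0.94   # stricter threshold
```

With the default threshold this prints the rs2745406 and rs6939431 rows; with 0.94 only rs2745406 remains. Using substr() keeps the filtering and the column removal in a single awk process, but the answer's cut -f 2- works just as well.
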
Kusalananda

Using awk and getline (see "all about getline" for caveats):

$ # can also use: awk '{getline num < "file2"} num>0.5' file1
$ awk -v cmp_f='file2' '{getline num < cmp_f} num>0.5' file1
rs2745406   C   T   1.993   1.166   1.003 ..............
rs6939431   A   G   0.003   0.207   0.005 ..............
  • getline num < cmp_f reads the next line of file2 (the file named by cmp_f) into the variable num
  • num>0.5 is a pattern without an action, so the current line of file1 is printed whenever it is true


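I think the version below is more robust, because it only tests num when getline actually read a line from file2 (getline returns 1 on success, 0 at end of file and -1 on error):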

awk '(getline num < "file2")>0 && num>0.5' file1
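
A small demonstration of why the return-value check matters, using hypothetical toy files in which file2 has fewer lines than file1. When getline reaches the end of file2 it returns 0 and leaves num unchanged, so the unchecked one-liner keeps comparing every remaining line of file1 against the last quality value it read.

```shell
#!/bin/sh
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1

# Hypothetical toy input: file1 has three lines, file2 has only one quality value.
printf '%s\n' row1 row2 row3 > file1
printf '%s\n' 0.9 > file2

# Unchecked version: after file2 is exhausted, num keeps its last value (0.9),
# so row2 and row3 are printed as well.
printf '%s\n' '--- without checking getline ---'
awk '{getline num < "file2"} num>0.5' file1

# Checked version: getline returns 0 at end of file, so only row1 is printed.
printf '%s\n' '--- checking getline > 0 ---'
awk '(getline num < "file2")>0 && num>0.5' file1
```

The first command prints row1, row2 and row3; the second prints only row1.
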
Sundeep