I have a .txt file with contents similar to this:

  • 100 150 180 200 300 400
  • 100 200 250 350 380 400
  • 100 160 170 400 450 500
  • 100 120 140 160 180 200
  • 100 120 140 160 180 300

I want to grab all the lines that contain both '100' and '200' in any position, starting from a specific column (such as 2, 3, or any other), and write them to a separate .txt file. How can I do that? In the example above, the correct output would be:

  • 100 150 180 200 300 400
  • 100 200 250 350 380 400
  • 100 120 140 160 180 200

I have tried using Sublime Text's "Find All" feature and then pressing the right arrow key to extend the selection to the end of each line, but some lines are much longer than others, so it doesn't work.

Hector
  • 21

2 Answers

$ grep 100 <file | grep 200 >newfile
$ cat newfile
100 150 180 200 300 400
100 200 250 350 380 400
100 120 140 160 180 200

The first grep extracts all lines from the original file that contain the string 100. The second grep extracts all lines from that result that contain the string 200.

Note that this would also extract lines that contain strings like 1100 and 1200, since these contain the wanted strings as substrings. To avoid that, use grep with its -w option (if available), which matches whole words only.
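
As a rough illustration of the difference, the sketch below runs both variants on a small made-up input. The file names (sample.txt, substring-matches.txt, word-matches.txt) and the data are assumptions for the demonstration, not part of the question, and it presumes a grep that supports -w.

```shell
# Hypothetical sample data: the second line contains 1100 and 1200,
# which include 100 and 200 only as substrings.
printf '%s\n' \
    '100 150 180 200 300 400' \
    '1100 1200 250 350 380 400' \
    '100 160 170 400 450 500' >sample.txt

# Plain grep matches substrings, so the second line is kept as well.
grep 100 <sample.txt | grep 200 >substring-matches.txt

# With -w, only whole-word matches count, so the second line is dropped.
grep -w 100 <sample.txt | grep -w 200 >word-matches.txt

cat word-matches.txt
```

Here substring-matches.txt ends up with the first two lines, while word-matches.txt holds only the first one.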


To test only from a specific column onwards, use a short awk program:

$ awk -v col=1 '{ delete c; for (i=col; i<=NF; ++i) ++c[$i]; if (c[100] > 0 && c[200] > 0) print }' <file >newfile
$ cat newfile
100 150 180 200 300 400
100 200 250 350 380 400
100 120 140 160 180 200

This awk program takes the value of the col variable from the command line (here the value is 1). It then goes through each input line from column col onwards, counting how many times each value occurs. If the values 100 and 200 both occur more than zero times, the line is printed.
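
To see the effect of a different starting column, here is a small sketch that runs the same one-liner with col=3. The input data and the file names (columns.txt, from-col3.txt) are made up for the demonstration; in the question's own sample, 100 always sits in column 1, so nothing would match from column 2 onwards.

```shell
# Hypothetical sample data, chosen so that the column offset matters.
printf '%s\n' \
    '100 200 300 400 500' \
    '500 400 100 300 200' \
    '100 300 200 400 500' >columns.txt

# Same program as above, but only columns 3 and later are examined.
awk -v col=3 '{ delete c; for (i=col; i<=NF; ++i) ++c[$i]; if (c[100] > 0 && c[200] > 0) print }' <columns.txt >from-col3.txt

cat from-col3.txt
```

Only the second line has both 100 and 200 at or after column 3, so it is the only line written to from-col3.txt.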

The program, with nicer layout:

{
    delete c

    for (i=col; i<=NF; ++i)
        ++c[$i]

    if (c[100] > 0 && c[200] > 0)
        print
}

This program also lends itself to extracting lines with a specific number of matches of certain items.
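
For instance, a variation along these lines might select lines in which 100 occurs exactly once and 200 occurs at least twice. The thresholds, the sample data, and the file names (counts.txt, counted.txt) are assumptions chosen only for illustration.

```shell
# Hypothetical sample data with repeated values.
printf '%s\n' \
    '100 200 200 300' \
    '100 200 300 400' \
    '200 100 200 200' >counts.txt

# Count each value from column col onwards, then test the counts:
# 100 must occur exactly once and 200 at least twice.
awk -v col=1 '{
    delete c

    for (i=col; i<=NF; ++i)
        ++c[$i]

    if (c[100] == 1 && c[200] >= 2)
        print
}' <counts.txt >counted.txt

cat counted.txt
```

The first and third lines satisfy both conditions; the second has only one 200 and is skipped.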

Kusalananda
  • 333,661

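If you're sure that there won't be false positives (substrings such as 1100 or 1200 also match here), you could also delete the leading columns from the line and test what remains. The following removes the first two columns, so the test starts at column 3, and prints the saved original line on a match: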

awk '{TMP = $0; sub ($1 FS $2, "")} /100/ && /200/ {print TMP} ' file

RudiC
  • 8,969