awk start printing based on a condition

Question

I have test data in a file text.txt

a
b
test
test2
1,2
3,3

I want to output the file starting from the line whose number is the line number of test plus 2. I need this to be a one-liner usable in gnuplot. I have come up with the following:

awk -v linestart=$(awk '$0~"test" {a=NR}END{print a+2}' $filename) 'BEGIN{FS=",";OFS="\t";lines}NR>=linestart{print $1, $2}' $filename

But I somehow need to supply the file contents to two awk invocations, which I do not know how to do. So I came up with the solution using $filename, but that raises the problem of how to get the file name into $filename.

I was thinking along these lines:

echo "test.txt" | read filename | awk -v linestart=$(awk '$0~"test" {a=NR}END{print a+2}' $filename) 'BEGIN{FS=",";OFS="\t";lines}NR>=linestart{print $1, $2}' $filename

but that does not work.
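The pipeline fails because each stage of a pipeline generally runs in its own subshell, so the variable set by read is not visible to the later awk stage (or to the $(...) substitution inside it), and read writes nothing to its standard output anyway. A minimal sketch of the two-pass idea that avoids this wraps both awk calls in a shell function and takes the file name as a positional parameter. The function name print_after_test is hypothetical, and the sketch assumes the marker line is exactly test.

```shell
# Sketch: two passes over the same file, with the file name passed as $1.
# print_after_test is a hypothetical helper name.
print_after_test() {
    file=$1
    # Pass 1: line number two lines after the first line that is exactly "test".
    start=$(awk '$0 == "test" { print NR + 2; exit }' "$file")
    # No match: bail out rather than run pass 2 with an empty start line.
    [ -n "$start" ] || return 1
    # Pass 2: print from that line on, turning "a,b" into "a<TAB>b".
    awk -v linestart="$start" 'BEGIN { FS = ","; OFS = "\t" }
        NR >= linestart { print $1, $2 }' "$file"
}
```

Usage would be `print_after_test test.txt`. Because it reads the file twice, it needs a real file rather than a pipe.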

How else can I make the above work? The obvious problem is that I need to know the number of the line where I want to start printing before I run awk. I was also thinking of something along these lines:

awk 'BEGIN{FS=",";OFS="\t";lines=100000}{if ($0~"test"){lines=NR+2}; if(NR>=lines){print $1, $2}}'

But I did not even try it, since it is very ugly and not general: I always have to make the variable lines sufficiently big. So is there an elegant solution that would work with a normal text file pipe, or alternatively with some way of passing the file name in?
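The single-pass idea does not need a large sentinel: an uninitialized awk variable is false in a condition, so the print condition can simply require lines to be set as well. A minimal sketch, assuming the marker line is exactly test and the data arrives on standard input (so it also works at the end of a pipe); the function name after_test is hypothetical.

```shell
# Sketch: one pass over standard input, no sentinel value needed.
# after_test is a hypothetical helper name.
after_test() {
    awk 'BEGIN { FS = ","; OFS = "\t" }
         # Remember the line number where output should begin.
         $0 == "test" { lines = NR + 2 }
         # lines is uninitialized (false) until the marker has been seen.
         lines && NR >= lines { print $1, $2 }'
}
```

Usage would be `after_test < test.txt` or `some_command | after_test`.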

Kusalananda · Accepted Answer · 2020-02-04T23:30:41.943

Using ed:

$ printf '%s\n' '/^test/+2,$p' | ed -s file
1,2
3,3

In the ed editor, the command /^test/+2,$p would print (p) the lines from two lines beyond the line matching ^test, to the end ($).
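Because the offset is just part of the ed address, it is easy to make it a parameter. A sketch, assuming ed is installed (it is missing from some minimal systems); print_from_offset is a hypothetical name, and the backslash in \$p stops the shell from expanding $p inside the double quotes.

```shell
# Sketch: print from N lines after the first line matching ^test to the end.
# Usage: print_from_offset N FILE   (print_from_offset is a hypothetical helper)
print_from_offset() {
    # "\$" passes a literal $ (the last line) to ed instead of expanding $p.
    printf '%s\n' "/^test/+$1,\$p" | ed -s "$2"
}
```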

Using awk:

$ awk '/^test/ { flag = 1; count = 1 }; (flag == 1 && count <= 0); { count-- }' file
1,2
3,3

Here, a line is printed if flag is 1 and count is less than or equal to zero. The flag is set to 1 when the pattern ^test matches, and count is reset to 1 at the same time. Because count is decremented on every line, including the matching line itself, output starts on the line after the most recent match of ^test. With the sample data that is two lines after test, because test2 also matches ^test and resets the counter.
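A variation on the same flag-and-counter idea, sketched under two assumptions: the marker line is exactly test (so test2 no longer counts as a match), and the offset is passed in with -v. Setting count to the offset n works because the decrement on the matching line itself means output begins exactly n lines after it. start_after is a hypothetical name.

```shell
# Sketch: start printing N lines after the line that is exactly "test" (N >= 1).
# Usage: start_after N < file   (start_after is a hypothetical helper)
start_after() {
    awk -v n="$1" '
        $0 == "test" { flag = 1; count = n }   # arm the counter on the marker
        flag && count <= 0                      # no action, so the line is printed
        { count-- }                             # every line, marker included
    '
}
```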

A slightly different approach with awk:

$ awk '/^test/ { getline; while (getline > 0) print }' file
1,2
3,3

Here, we match our pattern and then immediately read and discard the next line of input. Then we use a while loop to read the rest of the file, printing each line read.
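The same getline approach can skip any number of lines with a loop instead of repeated getline calls. A sketch under assumptions: the offset is passed with -v, the marker line is exactly test, and every getline result is checked for being positive (getline returns 1 on success, 0 at end of input and -1 on error), as Ed Morton recommends in the comments further down. from_offset is a hypothetical name.

```shell
# Sketch: print everything from N lines after the line that is exactly "test" (N >= 1).
# Usage: from_offset N < file   (from_offset is a hypothetical helper)
from_offset() {
    awk -v n="$1" '
        $0 == "test" {
            # Read and discard the N-1 lines between the marker and the output.
            for (i = 1; i < n; i++)
                if ((getline) <= 0)
                    exit
            # Print the rest; stop at end of input (0) or on error (-1).
            while ((getline) > 0)
                print
            exit
        }
    '
}
```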

The exact same approach, but with sed:

$ sed -n -e '/^test/ { n' -e ':again' -e 'n; p; b again' -e '}' file
1,2
3,3

Match the pattern, then read and discard the next line (n), then enter a loop that reads and prints each remaining line (n; p). The loop is made up of the label again and the branch back to that label (b again).
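The same sed script may be easier to follow laid out one command per line with comments. This is a sketch of an equivalent layout, not a different technique; it assumes a sed that accepts indented commands and # comment lines inside a block, which POSIX sed does. print_after_sed is a hypothetical name, and input is read from standard input.

```shell
# Sketch: the sed loop from the answer above, one command per line.
# print_after_sed is a hypothetical helper name; input is read from stdin.
print_after_sed() {
    sed -n '
/^test/ {
    # Throw away the line right after the match.
    n
    :again
    # Fetch the next line and print it, then loop.
    n
    p
    b again
}
'
}
```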

Would you mind explaining the first awk method? Specifically, where is the printing done? It seems to follow from (flag == 1 && count <= 0), but I don't understand the syntax here. – atapaka, Feb 04 '20 at 23:42
@leosenko in awk, print $0 is the default action – every record that passes the test flag == 1 && count <= 0 gets printed — , Feb 05 '20 at 02:15

score 2 · Answer 2 · answered Feb 05 '20 at 05:59

If you know your data starts 2 lines after test, and there are no more lines with test on them, you can get away with something like this:

awk '/^test$/ { f=1 } f && f++ > 2' filename

Also, to send this data to Gnuplot, you might consider doing it through a pipe like this:

(
echo "set datafile separator ','"
echo "plot '-' using 1:2 with lines"
awk '/^test$/ { f=1 } f && f++ > 2' filename
echo "e"
) | gnuplot -persist

Plot of the data

score 0 · Answer 3 · answered Feb 05 '20 at 10:17


You can do this with a range pattern (start, end) whose end condition is always false and whose start condition reads past the lines to skip:

awk '/test/ && getline && getline, 0'
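The range idea also works without getline, which turns the offset into a parameter rather than a number of typed getline calls. In this sketch (an alternative start condition, not the answer's own), a counter c is armed when the marker line is seen, and the start condition becomes true once c has been decremented to zero; the always-false end condition 0 keeps the range open to the end of input. The range rule comes first so that the marker line itself does not decrement the counter. The name range_from and the exact-match marker test are assumptions.

```shell
# Sketch: range pattern "start, 0" with a counter instead of getline (N >= 1).
# Usage: range_from N < file   (range_from is a hypothetical helper)
range_from() {
    awk -v n="$1" '
        c && !--c, 0                # start when c counts down to 0; 0 never ends it
        $0 == "test" { c = n }      # arm the counter on the marker line
    '
}
```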


Don't do that, as it can silently fail and produce incorrect output, or spin off into an infinite loop; see http://awk.freeshell.org/AllAboutGetline. It's also not easily extensible to start, say, 50 lines after the target line (it'd require you to type getline 50 times!). – Ed Morton Feb 05 '20 at 14:56
Can you give me an example input and awk implementation where this will silently fail or spin off into an infinite loop? I'm especially interested if there's any awk implementation where getline can return transient errors. Thanks. – Feb 05 '20 at 15:13
Sure - any awk, when the input file is or becomes unreadable. It won't actually go into an infinite loop in this case, since you don't have an explicit loop. – Ed Morton Feb 05 '20 at 15:14
Not in my testing. They all seem to bail out on the first EIO (or other) error. I may be missing something -- my question was not rhetorical; I'm genuinely interested in an example. – Feb 05 '20 at 15:18
And I'm genuinely giving you one, but the input file would need to become unreadable after awk first opens it and before the first or second getline executes; that's going to be tough to test. – Ed Morton Feb 05 '20 at 15:19
I had used a special FUSE filesystem for that (which times out or returns errors based on the file names). But I may have messed something up, hence my question. – Feb 05 '20 at 15:24
If you just always test for a positive return code from getline (which is what I'm recommending in that article I referenced) rather than a non-zero one then you don't have to think about the ways in which it might fail for any given application. Using getline for something like this makes it harder to enhance the code to do other things anyway though so I just wouldn't use it in this case as it has no benefits. – Ed Morton Feb 05 '20 at 15:26
I do, in general. I didn't do it here because I'm not convinced that the first getline could fail and the second succeed. – Feb 05 '20 at 15:30
To me not testing for a positive return from getline when you think you don't have to is like not quoting your shell variables when you think you don't have to, etc. I'd much rather just not have to think about whether or not it's safe to not program defensively in a given context and always quote my shell variables, test getline return, add IFS= and -r to read, etc., etc. People often ask me "how can getline fail here" and I usually show them a concrete example but I simply don't know all the ways getline can fail because I'll never use it in a way where I have to care. – Ed Morton Feb 05 '20 at 15:38

score 0 · Answer 4 · answered Feb 05 '20 at 14:55


$ awk '/test/{n=NR} n && NR>n' file
1,2
3,3

$ awk '/test/{n=NR+1} n && NR>n' file
3,3

See also https://stackoverflow.com/a/17914105/1745001


Ed Morton
