I have a bash command pipeline that produces a lot of log output. Most lines repeat the previous one except for the timestamp and some minor flags; the main output data changes only once every few hours. I need to store this output as a text file for later processing and research. What should I pipe it to in order to print only the first line out of every X?
You might be better off not printing subsequent lines that only differ in "timestamp and some minor flags" rather than figuring out a specific number of lines to skip. If you edit your question to include concise, testable sample input and expected output we can help you. – Ed Morton May 22 '21 at 11:36
5 Answers
Print the first line and skip the next N-1 lines out of every N lines:
awk -v N=100 'NR%N==1' infile
Test with:
$ seq 1000 |awk -v N=100 'NR%N==1'
1
101
201
301
401
....
To keep more than one line per block, pass the number of lines to skip as a second parameter. Here the first Num-Skip lines out of every Num lines are printed:
$ seq 1000 |awk -v Num=100 -v Skip=98 '(NR-1)%Num<Num-Skip'
1
2
101
102
201
202
301
302
401
402
501
502
601
602
701
702
801
802
901
902
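Since the question is about filtering a live pipeline into a file, here is a minimal sketch of how the modulo filter could be wrapped for that purpose. The function name every_nth, the placeholder my_pipeline and the file name thinned.log are assumptions made for illustration, not part of the answer above. It uses (NR - 1) % N == 0 rather than NR % N == 1 so that N=1 passes every line through, and calls fflush() so kept lines reach the file promptly.

```shell
# every_nth N: keep the 1st line of every N input lines (hypothetical helper).
every_nth() {
  # (NR - 1) % N == 0 also behaves for N=1, where NR % N == 1 never matches.
  # fflush() writes each kept line out immediately; this assumes the awk in
  # use supports it (gawk, mawk and BSD awk do).
  awk -v N="$1" '(NR - 1) % N == 0 { print; fflush() }'
}

# Intended use (placeholder names): my_pipeline | every_nth 100 >> thinned.log
seq 10 | every_nth 3
# prints 1, 4, 7 and 10, one per line
```

Flushing after every kept line costs little here because only one line in N is written at all.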

@αғsнιη already showed you how to do what you asked for (skip a specific number of lines) but it sounds like you might be better off just not printing subsequent lines that only differ in "timestamp and some minor flags" rather than figuring out a specific number of lines to skip. If so here's how you'd do that if those "timestamp and some minor flags" were stored in fields 3, 6, 8, and 17:
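awk '
{
    origRec = $0                # keep the unmodified line for printing
    $3 = $6 = $8 = $17 = ""     # blank the fields that are allowed to differ
    currKey = $0                # what remains identifies the real content
}
currKey != prevKey {            # content changed since the previous line
    print origRec
    prevKey = currKey
}
' file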
You can easily tweak the above to print not just the first line of every similar group, but also the last line too so you can see the first and last timestamps if that's useful, and/or you can add a print of a count of how many similar lines were skipped.
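One possible shape for that tweak is sketched below. It is an illustration under stated assumptions, not the answer author's code: the function name dedup_runs, the marker text in square brackets, and the choice of fields 1 and 3 as the "timestamp and flag" fields are all made up for the example. It prints the first line of each run, then a count of the lines omitted, then the last line of the run.

```shell
# dedup_runs: collapse runs of lines that differ only in "noise" fields.
# Hypothetical helper; fields 1 and 3 are assumed to hold timestamp and flag.
dedup_runs() {
  awk '
    function end_group() {
      # count is the number of lines in the run that just ended
      if (count > 2) print "[" (count - 2) " similar line(s) skipped]"
      if (count > 1) print lastRec
    }
    {
      origRec = $0
      $1 = $3 = ""              # blank the fields that may differ
      currKey = $0
    }
    NR == 1 || currKey != prevKey {
      if (NR > 1) end_group()   # close the previous run first
      print origRec             # first line of the new run
      prevKey = currKey
      count = 0
    }
    { count++; lastRec = origRec }
    END { if (NR > 0) end_group() }
  ' "$@"
}

printf '%s\n' '10:00 A f1 data1' '10:01 A f2 data1' '10:02 A f3 data1' \
              '10:03 B f4 data2' | dedup_runs
```

With the sample input, the three "A ... data1" lines collapse to the first line, a marker reporting one skipped line, and the last line, followed by the single "B" line.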

Using GNU split:
$ split -n r/1/100 input
We can test this with seq or jot:
$ jot 500 | split -n r/1/100
$ seq 500 | split -n r/1/100
1
101
201
301
401
From coreutils:
r/k/n likewise but only output kth of n to stdout
-n r/1/100 prints only the first line of every hundred lines. Likewise, -n r/2/100 prints the second line of every hundred lines.
All I have done is slightly change the command in the answer.
With perl:
$ perl -ne 'print if $. % 100 == 1' input
Here $. is the current input line number (not $_, which holds the line's text). This perl command is similar to the command described in this answer.

If GNU sed is an option, you can use the first~step addressing:
seq 1000 | sed '1~100!d'
or, more readable:
seq 1000 | sed -n '1~100p'
Another way, in awk:
seq 1000 | awk -v l=100 'NR == 1 || c++ == l {c=1; print}'
This prints the first line and then every l-th line after it, so l-1 lines are skipped between printed lines; l is set with -v.
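To make the counter's behaviour concrete, here is a small sketch comparing the command above with a variant that resets the counter to 0 instead of 1. The function names every_l and skip_l are hypothetical, and the variant is an illustration rather than part of the original answer. With the reset to 1, one line out of every l is printed; with the reset to 0, exactly l lines are skipped between printed lines.

```shell
# Hypothetical helpers contrasting the two counter resets.
# every_l: the command from above, one printed line per l input lines.
every_l() { awk -v l="$1" 'NR == 1 || c++ == l { c = 1; print }'; }
# skip_l: variant that skips exactly l lines between printed lines.
skip_l()  { awk -v l="$1" 'NR == 1 || c++ == l { c = 0; print }'; }

seq 10 | every_l 3   # prints 1, 4, 7, 10 (2 lines skipped between prints)
seq 10 | skip_l 3    # prints 1, 5, 9    (3 lines skipped between prints)
```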

seq 1000| awk -v x=1 'NR==x{print ; x=NR+100}'
Output:
1
101
201
301
401
501
601
701
801
901
