1

I have a bash pipeline that produces a huge amount of logging output. Most lines just repeat the previous line except for the timestamp and some minor flags; the main output data changes only once every few hours. I need to store this output as a text file for later processing and research. What should I pipe it to in order to print only the first line out of every X?

αғsнιη
  • 41,407
  • 2
    You might be better off not printing subsequent lines that only differ in "timestamp and some minor flags" rather than figuring out a specific number of lines to skip. If you [edit] your question to include concise, testable sample input and expected output we can help you. – Ed Morton May 22 '21 at 11:36

5 Answers

10

Print 1st line and skip next N-1 lines out of every N lines.

awk -v N=100 'NR%N==1' infile

test with:

$ seq 1000 |awk -v N=100 'NR%N==1'
1
101
201
301
401
....

To print more than one line per block, pass the number of lines to skip as a second parameter. Here the first Num-Skip lines (2) out of every Num lines (100) are printed:

$ seq 1000 |awk -v Num=100 -v Skip=98 '(NR-1)%Num<Num-Skip'
1
2
101
102
201
202
301
302
401
402
501
502
601
602
701
702
801
802
901
902
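
For the use case in the question (a long-running pipeline whose thinned output is stored in a file), a minimal sketch might look like this. The helper name every_nth and the file name sampled.log are made up for illustration, and seq 1000 stands in for the real logging command. The fflush() call is an assumption about your awk: it is there because awk usually buffers output that goes to a file or pipe, and it works in gawk, mawk and most current awks.

```shell
# every_nth N: print lines 1, N+1, 2N+1, ... of stdin (hypothetical helper name)
every_nth() {
    # N == 1 is handled separately because NR % 1 is always 0
    awk -v N="$1" 'N == 1 || NR % N == 1 { print; fflush() }'
}

# seq 1000 stands in for the real logging pipeline; sampled.log is an example file name
seq 1000 | every_nth 100 > sampled.log
```

Without the flush, lines kept by awk may only reach the file once a buffer fills up, which can take a long time when only one line in a hundred is kept.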
αғsнιη
  • 41,407
6

@αғsнιη already showed you how to do what you asked for (skip a specific number of lines), but it sounds like you might be better off just not printing subsequent lines that only differ in "timestamp and some minor flags" rather than figuring out a specific number of lines to skip. If so, here's how you'd do that if those "timestamp and some minor flags" were stored in fields 3, 6, 8, and 17:

awk '
{
    origRec = $0
    $3=$6=$8=$17=""
    currKey = $0
}
currKey != prevKey {
    print origRec
    prevKey = currKey
}
' file

You can easily tweak the above to print not just the first line of every similar group but also the last line, so you can see the first and last timestamps if that's useful, and/or you can add a count of how many similar lines were skipped.
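
One possible version of that tweak is sketched below. It is not the answerer's code: the function name collapse_runs is made up, and it assumes, purely for illustration, that the only volatile field is field 1 (the timestamp). It prints the first line of each run, and, when a run has more than one line, its last line together with the run length.

```shell
# Sketch: collapse runs of lines that are identical apart from field 1
# (assumed here to be the timestamp; adjust the field list to your log).
collapse_runs() {
    awk '
    {
        origRec = $0
        $1 = ""                 # blank the volatile field(s), as in the answer above
        currKey = $0
    }
    NR > 1 && currKey != prevKey {
        # the previous run just ended: report its last line if it had more than one
        if (count > 1) print lastRec " (last of " count " similar lines)"
    }
    NR == 1 || currKey != prevKey {
        print origRec           # first line of a new run
        prevKey = currKey
        count = 0
    }
    { lastRec = origRec; count++ }
    END {
        # flush the final run
        if (count > 1) print lastRec " (last of " count " similar lines)"
    }'
}
```

Used as `some_command | collapse_runs > file`, a run of three lines that differ only in the timestamp comes out as two lines: the first one unchanged, and the third one with " (last of 3 similar lines)" appended.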

Ed Morton
  • 31,617
3

Using GNU split:

$ split -n r/1/100 input

We can test this with seq or jot:

$ jot 500 | split -n r/1/100
$ seq 500 | split -n r/1/100
1
101
201
301
401

From coreutils:

r/k/n likewise but only output kth of n to stdout

-n r/1/100 prints only the first line of every hundred lines. Likewise,
-n r/2/100 prints the second line of every hundred lines.
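
To make the k and n parts concrete, here is a small sketch that wraps the same option in a function. The name kth_of_every_n is hypothetical, and the code assumes GNU split from coreutils; other split implementations may not support -n.

```shell
# kth_of_every_n K N: print line K of every N lines of stdin (hypothetical helper)
# Assumes GNU split; r/K/N distributes lines round-robin and writes chunk K to stdout.
kth_of_every_n() {
    split -n "r/$1/$2"
}

# Prints lines 2, 102, 202, 302 and 402
seq 500 | kth_of_every_n 2 100
```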

I have only slightly changed the command from the answer.

With perl:

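$ perl -ne 'print if $. % 100 == 1' input

Here $. is the current input line number; testing $_ instead would look at the line's content, which only works when each line happens to contain its own number, as with seq. This perl command is similar to the command described in this answer.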

2

If GNU sed is an option, you can use first~step addressing:

seq 1000 | sed '1~100!d'

or, more readably:

seq 1000 | sed -n '1~100p'

Another way in awk:

seq 1000 | awk -v l=100 'NR == 1 || c++ == l {c=1; print}'

This prints the first line and then every l-th line after it, that is, it skips l-1 lines between printed lines (99 for l=100).

2
seq 1000| awk -v x=1 'NR==x{print ; x=NR+100}'

Output:

1
101
201
301
401
501
601
701
801
901