Pipe skip 99 lines out of every 100

Question

I have a bash command pipeline that produces a ton of logging output. Most of it repeats the previous line except for the timestamp and some minor flags; the main output data changes only once every few hours. I need to store this output as a text file for future handling/research. What should I pipe it to in order to print only the 1st line out of every X?

You might be better off not printing subsequent lines that differ only in "timestamp and some minor flags" rather than figuring out a specific number of lines to skip. If you edit your question to include concise, testable sample input and expected output, we can help you. — Ed Morton, May 22 '21 at 11:36

αғsнιη · Accepted Answer · 2021-05-24T16:32:00.980

10

Print the 1st line and skip the next N-1 lines out of every N lines.

awk -v N=100 'NR%N==1' infile

Test with:

$ seq 1000 | awk -v N=100 'NR%N==1'
1
101
201
301
401
....
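A minimal sketch wrapping the same filter in a reusable shell function. The name every_nth is hypothetical, and it tests (NR-1)%N==0 instead of NR%N==1: both print lines 1, N+1, 2N+1, ... for N>1, but NR%1 is always 0, so NR%N==1 prints nothing when N=1, while the shifted form keeps every line.

```shell
# every_nth N (hypothetical helper name): print lines 1, N+1, 2N+1, ... of stdin.
# (NR-1) % N == 0 is equivalent to NR % N == 1 for N > 1, and also works for N = 1.
every_nth() {
  awk -v N="$1" '(NR - 1) % N == 0'
}

seq 1000 | every_nth 100
```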

To keep the first Num-Skip lines of every block of Num lines, pass both numbers as parameters:

$ seq 1000 | awk -v Num=100 -v Skip=98 '(NR-1)%Num<Num-Skip'
1
2
101
102
201
202
301
302
401
402
501
502
601
602
701
702
801
802
901
902
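The condition works because (NR-1)%Num is the 0-based position of the current line inside its block of Num lines, so the filter keeps positions 0 .. Num-Skip-1. A small sketch on a shorter input makes the pattern easy to check by eye; keep_per_block is a hypothetical helper name.

```shell
# keep_per_block NUM SKIP (hypothetical helper name): from each block of NUM
# consecutive lines keep the first NUM-SKIP and drop the remaining SKIP.
# (NR - 1) % Num is the 0-based position of the line inside its block.
keep_per_block() {
  awk -v Num="$1" -v Skip="$2" '(NR - 1) % Num < Num - Skip'
}

# Keep 2 lines out of every 10: prints 1, 2, 11, 12, 21 and 22, one per line
seq 30 | keep_per_block 10 8
```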

edited May 24 '21 at 16:32

answered May 22 '21 at 10:31


Ed Morton · Answer 2 · 2021-05-22T11:53:42.687

@αғsнιη already showed you how to do what you asked for (skip a specific number of lines), but it sounds like you might be better off not printing subsequent lines that differ only in "timestamp and some minor flags" rather than figuring out a specific number of lines to skip. If so, here's how you'd do that if those "timestamp and some minor flags" were stored in fields 3, 6, 8, and 17:

awk '
{
    origRec = $0
    $3=$6=$8=$17=""
    currKey = $0
}
currKey != prevKey {
    print origRec
    prevKey = currKey
}
' file

You can easily tweak the above to print not just the first line of every similar group, but also the last line too so you can see the first and last timestamps if that's useful, and/or you can add a print of a count of how many similar lines were skipped.
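One possible version of that tweak, sketched under the same assumption that fields 3, 6, 8 and 17 are the ones to ignore. The helper name collapse_runs, the flush_run function and the wording of the count line are all made up for illustration. For each run of similar lines it prints the first line, a count of the skipped middle lines, and the last line.

```shell
# collapse_runs (hypothetical helper name): assumes fields 3, 6, 8 and 17 hold
# the timestamp and minor flags. For each run of lines that match apart from
# those fields, print the first line, a count of skipped lines, and the last line.
collapse_runs() {
  awk '
  function flush_run() {
      if (n > 1) {
          if (n > 2) printf "(%d similar line(s) skipped)\n", n - 2
          print lastRec        # last line of the finished run
      }
  }
  {
      origRec = $0
      $3 = $6 = $8 = $17 = ""  # blank the fields that are allowed to differ
      currKey = $0
  }
  NR == 1 || currKey != prevKey {
      flush_run()              # finish the previous run, if any
      print origRec            # first line of the new run
      prevKey = currKey
      n = 0
  }
  {
      n++                      # number of lines in the current run so far
      lastRec = origRec
  }
  END { flush_run() }
  ' "$@"
}

# Usage: collapse_runs logfile    or    some_pipeline | collapse_runs
```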

Prabhjot Singh · Answer 3 · 2021-05-29T18:17:20.707

3

Using GNU split:

$ split -n r/1/100 input

We can test this with seq or jot:

$ jot 500 | split -n r/1/100 
$ seq 500 | split -n r/1/100
1
101
201
301
401

From the coreutils documentation:

r/k/n likewise but only output kth of n to stdout

-n r/1/100 prints only the first line of every hundred lines. Likewise,
-n r/2/100 prints the second line of every hundred lines.
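Round-robin distribution deals line i to chunk ((i-1) % n) + 1, so chunk k holds exactly the lines where (NR-1) % n == k-1. The sketch below reproduces r/k/n with awk, which can be handy where GNU split is not installed; kth_of_n_awk is a hypothetical helper name, not part of coreutils.

```shell
# kth_of_n_awk K N (hypothetical helper name): awk equivalent of
# GNU "split -n r/K/N", which deals lines out round-robin so that
# line i lands in chunk ((i - 1) % N) + 1.
kth_of_n_awk() {
  awk -v K="$1" -v N="$2" '(NR - 1) % N == K - 1'
}

# Both commands should print 2, 102, 202, ..., 902 (the split one needs GNU coreutils)
seq 1000 | split -n r/2/100
seq 1000 | kth_of_n_awk 2 100
```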

I have only slightly changed the command from the original answer.

With perl:

$ perl -ne 'print if $. % 100 == 1' input

This is a perl command similar to the awk command in the accepted answer: $. is perl's current input line number, the counterpart of awk's NR.

edited May 29 '21 at 18:17

answered May 24 '21 at 13:37


This is more elegant! Thank you! – scythargon Jun 09 '21 at 15:55

score 2 · Answer 4 · answered May 22 '21 at 19:27

If GNU sed is an option, you can use its first~step addressing:

seq 1000 | sed '1~100!d'

Or, more readable:

seq 1000 | sed -n '1~100p'
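In GNU sed, first~step matches lines first, first+step, first+2*step, and so on, and the GNU sed manual allows first to be 0, in which case it behaves as if first were equal to step. A short sketch of both forms (GNU sed only, since first~step is a GNU extension):

```shell
# GNU sed only: first~step selects lines first, first+step, first+2*step, ...
seq 1000 | sed -n '1~100p'    # 1, 101, 201, ..., 901 (first line of each block of 100)
seq 1000 | sed -n '0~100p'    # 100, 200, ..., 1000 (last line of each block of 100)
```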

Another way in awk:

seq 1000 | awk -v l=100 'NR == 1 || c++ == l {c=1; print}'

This prints the first line and then every l-th line after it, i.e. it skips l-1 lines between printed lines (99 for l=100).
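Running the same counter logic with a small l makes the skip count easy to see: c counts lines since the last printed one, and c++ == l becomes true on the l-th line after it, so with l=3 two lines are dropped between printed lines.

```shell
# Same counter approach with l=3: prints lines 1, 4, 7 and 10 of seq 10
seq 10 | awk -v l=3 'NR == 1 || c++ == l {c=1; print}'
```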

score 2 · Answer 5 · answered May 22 '21 at 20:08


seq 1000 | awk -v x=1 'NR==x{print; x=NR+100}'

Output:

1
101
201
301
401
...

answered May 22 '21 at 20:08

Praveen Kumar BS
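In this approach x holds the number of the next line to print; after printing line x, the target moves to x + 100. A sketch with the step pulled out into a variable (step is a hypothetical name) also shows that changing the starting value of x selects a different line from each block, e.g. x=2 gives 2, 102, 202, ...

```shell
# x = next line number to print, step = distance between printed lines.
# Starting at x=2 prints 2, 102, 202, ..., 902.
seq 1000 | awk -v x=2 -v step=100 'NR == x { print; x = NR + step }'
```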