66

I am trying to print every Nth line out of a file with more than 300,000 records into a new file. This has to happen every Nth record until it reaches the end of the file.

Satō Katsura
Terisa
  • see also: https://unix.stackexchange.com/q/214445/117549 – Jeff Schaller Jun 04 '17 at 19:54
  • Looking at your comments, we can't understand what you need. Provide sample input and sample output. Do you need a range? From the Nth line up to EOF? – George Vasiliou Jun 04 '17 at 20:25
  • thanks, I have 355,000 records which is sorted but I need to get a sample of the data (1/3 which is about 100,000) so I thought if I retrieve the 300th of the sorted file from 1 to EOF, I should be able to get a fair sample. – Terisa Jun 04 '17 at 20:50
  • What does the word "records" mean to you? Do you mean the number of lines in a file, or a number of files? It is better to describe your problem in terms of files and lines and to avoid the word "record". Tell us how many lines your file has or how many files you need to parse. – George Vasiliou Jun 04 '17 at 21:00
  • 3
    Please explain your requirements more clearly. Against my answer you wrote: "For example for an input file with 300000 I should get 100000 records in the output." That sentence doesn't make sense unless you meant n=3 and wanted the 3rd, 6th, 9th lines, or perhaps the 1st, 4th, 7th lines. There are multiple different solutions because the question as asked is not clear. – Stephen Quan Jun 05 '17 at 02:27

4 Answers

108
awk 'NR % 5 == 0' input > output

This prints every fifth line.

To take the interval from a shell variable:

NUM=5
awk -v NUM="$NUM" 'NR % NUM == 0' input > output
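The same modulus test can also start at a line other than the Nth. The following is a minimal sketch, assuming a POSIX shell and awk; the function name every_nth and its argument order are made up for illustration and are not part of the original answer.

```shell
# every_nth STEP OFFSET [FILE...]
# Hypothetical helper: prints each line whose number leaves the
# remainder OFFSET when divided by STEP (OFFSET 0 means STEP, 2*STEP, ...).
# Assumes STEP is a positive integer.
every_nth() {
    step=$1
    offset=$2
    shift 2
    awk -v step="$step" -v offset="$offset" 'NR % step == offset % step' "$@"
}

seq 20 | every_nth 5 0   # lines 5, 10, 15, 20
seq 20 | every_nth 5 1   # lines 1, 6, 11, 16
```

With a file instead of a pipe, the call would look like every_nth 3 0 input > output.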
Deathgrip
  • I ran this command and got only 1166 in the output. I expected 100,000. – Terisa Jun 04 '17 at 21:09
  • awk 'NR % 300 ==0' 350000.records > 100000-records – Terisa Jun 04 '17 at 21:10
  • 1
    If you want 1/3 the file, you wanted every 3rd line, not 300th. – Deathgrip Jun 04 '17 at 21:14
  • awk 'NR % 3 == 0' 350000.records > 100000-records. That will give you $((350000/3)) lines, or 116666. – Deathgrip Jun 04 '17 at 21:17
  • That is correct, an idiotic mistake. Thanks very much – Terisa Jun 04 '17 at 21:18
  • 4
    As commented on your "answer" below, please accept this answer as the solution. Thank you. – Deathgrip Jun 04 '17 at 21:30
  • 2
    or every 5th line starting at the 1st using NR % 5 == 1 or every 5th line starting at the 4th using NR % 5 == 4 – northern-bradley Oct 24 '18 at 20:23
  • 1
    ffmpeg and other programs expect the data of a file or files piped in. Your solution helped me list all the JPGs in a dir and feed every 5th filename to cat to read the data to pipe to ffmpeg. Couldn't have done it without you! (I looked all over and tried dozens of possible solutions).

    @northern-bradley your ==1 helps too if there's only 1 file in the directory.

    cat $(ls *.jpg | awk 'NR % 5 == 1' -) | ffmpeg -r 15 -f image2pipe -vcodec mjpeg -i - -r 30 test.mp4

    – Able Mac Jun 15 '19 at 04:15
52

To print every Nth line, use

sed -n '0~Np'
For example, to copy every 5th line of oldfile to newfile, do
sed -n '0~5p' oldfile > newfile

This uses sed’s first~step address form, which means “match every step’th line starting with line first.” In theory, this would print lines 0, 5, 10, 15, 20, 25, …, up to the end of the file. Of course there is no line 0, so it just prints lines 5, 10, 15, 20, 25, …; 0~5 is just a convenient alternative way of saying 5~5 (which prints every 5th line starting with line 5; i.e., lines 5, 10, 15, 20, 25, …).

For another example of this sed capability (which does not answer the question),

sed -n '2~5p' oldfile

would print lines 2, 7, 12, 17, 22, 27, …, up to the end of the file.

Note: This approach requires GNU sed, as the first~step address form is a non-portable extension.  (Some old versions of GNU sed may require the 5~5 form as opposed to the 0~5 form.)
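Where GNU sed is not available, the first~step behaviour can be approximated with awk. This is only a sketch under the assumption that step is a positive integer (GNU sed also accepts a step of 0, which this does not handle); the function name first_step is hypothetical.

```shell
# first_step FIRST STEP [FILE...]
# Hypothetical awk stand-in for GNU sed's  sed -n 'FIRST~STEPp'.
# Prints lines FIRST, FIRST+STEP, FIRST+2*STEP, ... (line 0 does not exist,
# so FIRST=0 starts at line STEP). Assumes STEP > 0.
first_step() {
    first=$1
    step=$2
    shift 2
    awk -v first="$first" -v step="$step" \
        'NR >= first && (NR - first) % step == 0' "$@"
}

seq 27 | first_step 0 5   # lines 5, 10, 15, 20, 25
seq 27 | first_step 2 5   # lines 2, 7, 12, 17, 22, 27
```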

3

Here is the perl version:

perl -ne 'print if $. % 5 == 0;' infile > outfile
harmic
1

As with sed, awk can do this too:

$ seq 1000000000 |awk 'NR==500000{print;exit}'
500000

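NR is awk's current line number, so NR==500000 matches only line 500000; the exit stops awk right after printing, so it does not wait for the rest of the input. In your case:

awk 'NR==Nth{print;exit}' inputfile >outputfile

where Nth stands for the line number you need to print; replace it with the actual number.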
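To avoid editing the number into the program text, it can be passed in with -v. A minimal sketch, assuming a POSIX shell and awk; the function name nth_line is made up for illustration. Note that this prints one line only, not every Nth line.

```shell
# nth_line N [FILE...]
# Hypothetical helper: prints only line number N, then stops reading.
nth_line() {
    n=$1
    shift
    awk -v n="$n" 'NR == n { print; exit }' "$@"
}

seq 100 | nth_line 42   # prints 42
```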