What's a good way of extracting, say, lines 20-45 out of a huge text file? Non-interactively, of course!
5 Answers
Even simpler:
sed -n '20,45p;45q' < textfile
The -n flag disables the default output. The address "20,45" matches lines 20 to 45, inclusive. The p command prints the current line, and 45q quits after printing line 45, so sed doesn't read the rest of the file.
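A quick way to sanity-check the command on a throwaway file (sample.txt is just an example name, assuming seq is available):
seq 100 > sample.txt
sed -n '20,45p;45q' sample.txt   # prints the numbers 20 through 45, then stops reading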

you could try:
cat textfile | head -n 45 | tail -n 26
or
cat textfile | awk "20 <= NR && NR <= 45"
Update:
As Mahomedalid pointed out, cat is not necessary and a bit redundant, but it does make for a clean, readable command. If cat does bother you, a better solution would be:
<textfile awk "20 <= NR && NR <= 45"
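To generalize either form to an arbitrary range, note that tail needs the number of lines in the range, not the end line. A minimal sketch (first and last are made-up variable names, not part of the answer above):
first=20 last=45
head -n "$last" textfile | tail -n "$(( last - first + 1 ))"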

I like the use of stdin more; it has some global consistency with the rest of *nix – Stefan Sep 16 '10 at 08:52
Reading from command-line arguments has consistency with other UNIX utilities too, and my main point was to demonstrate awk's range operator. – ephemient Sep 17 '10 at 18:10
I think @ephemient's answer is the best one here. Otherwise, the commands are rather cryptic. – Léo Léopold Hertz 준영 Sep 11 '15 at 12:52
This is not an answer, but I can't post it as a comment.
Another (very fast) way to do it was suggested by mikeserv here:
{ head -n 19 >/dev/null; head -n 26; } <infile
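This works because both head invocations read from the same open file descriptor: the first consumes and discards lines 1-19, and the second picks up at line 20. A quick check on a throwaway file (sample.txt is just an example name; this relies on head leaving the file offset just past the last line it consumed, which GNU head does on seekable files):
seq 100 > sample.txt
{ head -n 19 >/dev/null; head -n 26; } < sample.txt   # prints 20 through 45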
Using the same test file as here and the same procedure, here are some benchmarks (extracting lines 1000020-1000045):
mikeserv:
{ head -n 1000019 >/dev/null; head -n 26; } <iplist
real 0m0.059s
Stefan:
head iplist -n 1000045 | tail -n 26
real 0m0.054s
These are by far the fastest solutions, and the differences between them are negligible for a single pass (I tried different ranges: a couple of lines, millions of lines, etc.).
Doing it without the pipe might offer a significant advantage, however, to an application which needed to seek over multiple ranges of lines in similar fashion, like:
for pass in 0 1 2 3 4 5 6 7 8 9
do printf "pass#$pass:\t"
head -n99 >&3; head -n1   # send 99 lines to fd 3 (discarded), print the next one
done <<1000LINES 3>/dev/null
$(seq 1000)
1000LINES
...which prints...
pass#0: 100
pass#1: 200
pass#2: 300
pass#3: 400
pass#4: 500
pass#5: 600
pass#6: 700
pass#7: 800
pass#8: 900
pass#9: 1000
...and only reads the file through the one time.
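The same trick can be wrapped in a small function for arbitrary ranges; a minimal sketch (extract_range is a made-up name, not from any answer here):
# Print lines $1 through $2 of file $3, reading no further than line $2.
extract_range() {
    { head -n "$(( $1 - 1 ))" >/dev/null; head -n "$(( $2 - $1 + 1 ))"; } < "$3"
}
extract_range 20 45 textfile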
The other sed/awk/perl solutions read the whole file, and since this is about huge files, they're not very efficient. I threw in some alternatives that exit or quit after the last line in the specified range:
Stefan:
awk "1000020 <= NR && NR <= 1000045" iplist
real 0m2.448s
vs.
awk "NR >= 1000020;NR==1000045{exit}" iplist
real 0m0.243s
dkagedal (sed):
sed -n 1000020,1000045p iplist
real 0m0.947s
vs.
sed '1,1000019d;1000045q' iplist
real 0m0.143s
Steven D:
perl -ne 'print if 1000020..1000045' iplist
real 0m2.041s
vs.
perl -ne 'print if $. >= 1000020; exit if $. >= 1000045;' iplist
real 0m0.369s

+1 I think this is the best answer here! It would be nice to see how long awk NR==1000020,NR==1000045 textfile takes on your system. – Léo Léopold Hertz 준영 Sep 11 '15 at 13:02
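For anyone who wants to reproduce that measurement, it can be timed just like the benchmarks above; a sketch (note that the plain range form still reads to end-of-file, so the second variant with an early exit is the fairer comparison to the fast solutions):
time awk 'NR==1000020,NR==1000045' iplist
time awk 'NR==1000020,NR==1000045; NR==1000045{exit}' iplist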
ruby -ne 'print if 20 .. 45' file
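A quick check of the Ruby flip-flop (assuming seq; inside -n, a bare integer range is compared against the current line number $.):
seq 100 | ruby -ne 'print if 20 .. 45'   # prints 20 through 45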

While we're at it, why not python -c 'import fileinput, sys; [sys.stdout.write(line) for nr, line in enumerate(fileinput.input()) if 19 <= nr <= 44]' too? :-P This is something that Ruby, modeled after Perl, inspired by awk/sed, can do easily. – ephemient Sep 17 '10 at 18:21
Since sed and awk were already taken, here is a perl solution:
perl -nle "print if ($. > 19 && $. < 46)" < textfile
Or, as pointed out in the comments:
perl -ne 'print if 20..45' textfile
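A quick check that the two forms agree (assuming seq):
seq 100 | perl -nle 'print if $. > 19 && $. < 46'
seq 100 | perl -ne 'print if 20..45'   # both print 20 through 45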

What's with all those extra characters? No need to strip and re-add newlines, the flip-flop assumes comparison to the line number, and the diamond operator runs through arguments if provided: perl -ne'print if 20..45' textfile – ephemient Sep 15 '10 at 21:09
Nice. -nle is a bit of a reflex I suppose; as for the rest, I have no excuse save ignorance. – Steven D Sep 15 '10 at 21:17
The q command (everything starting from the ;) improved performance for me when extracting single line 26995107 from a 27169334-line file. – Ruslan Apr 16 '19 at 11:32