Neither; use tail or head instead:
$ time tail -n +4000000 foo | head -n 11
real 0m0.039s
user 0m0.032s
sys 0m0.004s
$ time head -n 4000010 foo | tail -n 11
real 0m0.055s
user 0m0.064s
sys 0m0.036s
tail is in fact consistently faster. I ran both commands 100 times and calculated their averages:
tail:
real 0.03962
user 0.02956
sys 0.01456
head:
real 0.06284
user 0.07356
sys 0.07244
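The 100-run averaging can be scripted along these lines. This is a sketch, not the exact script used above: the file name /tmp/demo.txt, the run count N=5, and the line offsets are placeholders, and it relies on GNU date's %N for nanosecond timestamps.

```shell
#!/bin/sh
# Sketch: average the wall-clock time of a pipeline over N runs.
# Placeholders: /tmp/demo.txt stands in for foo, N=5 for the 100 runs.
seq 1 100000 > /tmp/demo.txt
N=5
total=0
for i in $(seq "$N"); do
    start=$(date +%s%N)                        # GNU date: seconds + nanoseconds
    tail -n +90000 /tmp/demo.txt | head -n 11 > /dev/null
    end=$(date +%s%N)
    total=$(( total + end - start ))
done
echo "average: $(( total / N )) ns"
```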
I imagine tail is faster because, though it has to seek all the way to line 4e6, it does not actually print anything until it gets there, while head will print everything up to line 4e6 + 10.
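As a concrete sketch of the pattern (on a small generated file, since foo itself is not available here): tail -n +START skips straight to line START without printing, and head -n COUNT then cuts the pipeline off after COUNT lines.

```shell
# tail -n +START file | head -n COUNT  ->  prints lines START .. START+COUNT-1
# Demo on a 20-line file; the same pattern scales to foo's 4000000.
seq 1 20 > /tmp/demo.txt
tail -n +7 /tmp/demo.txt | head -n 3
# prints:
# 7
# 8
# 9
```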
Compare to some other methods, sorted by time:
sed:
$ time sed -n '4000000,4000011p' foo
real 0m0.312s
user 0m0.236s
sys 0m0.072s
Perl:
$ time perl -ne 'next if $.<4000000; print; exit if $.>=4000010' foo
real 0m1.000s
user 0m0.936s
sys 0m0.064s
awk:
$ time awk '(NR>=4000000 && NR<=4000010){print} (NR==4000010){exit}' foo
real 0m0.955s
user 0m0.868s
sys 0m0.080s
Basically, the rule is: the less you parse, the faster you are. Treating the input as a stream of data which only needs to be printed to the screen (as tail does) will always be the fastest way.
head -n 4000010 foo.txt | tail -n 10? It seems the more intuitive way to do it to me, since it doesn't require you to know the exact length of the file. It should even be faster, since AFAIK tail has to read the whole file (in order to know how many lines there are), whereas head stops reading after the appropriate number of lines. – evilsoup Oct 09 '13 at 20:47

head will be slower because it has to print unnecessary lines. – terdon Oct 10 '13 at 00:01

Adding NR>4000010{exit} to the end of the awk, or a 4000011q to the end of the sed expression, cuts the time down by almost half, largely because that allows the script to skip reading nearly half of the file. But head | tail is still faster. – dannysauer Oct 10 '13 at 01:36
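The early-exit variants from that last comment look roughly like this, scaled down to a generated 1000-line file so they run as-is (the 500–510 range stands in for 4000000–4000010):

```shell
seq 1 1000 > /tmp/demo.txt

# awk: stop reading as soon as we are past the range (the NR>...{exit} idea)
awk 'NR>=500 && NR<=510 {print} NR>510 {exit}' /tmp/demo.txt

# sed: print the same range; 510q makes sed quit after line 510
# instead of scanning the rest of the file
sed -n '500,510p;510q' /tmp/demo.txt
```

Both print lines 500 through 510; on a large file the saving comes purely from never touching the bytes after the range.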