10

What is the simplest way to extract a line from a file, given its line number? E.g., I want the 666th line of somefile. How would you do this in your terminal, or in a shell script?

I can see solutions like head -n 666 somefile | tail -n 1, or even the half-incorrect cat -n somefile | grep -F 666, but there must be something nicer, faster, and more robust. Maybe using a more obscure Unix command or utility?

Anthon
  • 79,293
phs
  • 443
  • Related question: http://stackoverflow.com/q/12182910/1331399 – Thor Sep 09 '15 at 12:59
  • 1
    It really doesn't get much faster or more robust than the head/tail approach. My Perl solution is as fast or slightly faster in some cases (but the inverse will probably be true in others). The only "nicer" one will be awk 'NR==666', but that, while shorter, is significantly slower. – terdon Sep 09 '15 at 14:37
  • 1
    @phs Please post your 'PS' as comments to the individual answerers. It has nothing to do with the question content and should not be part of one. Apart from that it triggered a spurious reopen review cycle. – Anthon Sep 10 '15 at 11:25

5 Answers

23

sed (stream editor) is the right tool for this kind of job:

sed -n '666p' somefile

Edit: @tachomi's solution sed '666q;d' somefile is better when operating on a huge text file, because it makes sed exit after printing the requested line without reading the rest of the file. On all other files, the difference is irrelevant.
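
To make the difference between the two forms concrete, here is a minimal sketch. The function names print_line_full and print_line_quit are hypothetical, and it assumes seq and mktemp are available:

```shell
# Hypothetical helpers comparing the two sed forms discussed above.

# Prints line $1 of file $2; sed still reads the file to the end.
print_line_full() {
    sed -n "${1}p" "$2"
}

# Prints line $1 of file $2; 'q' makes sed stop at that line (printing it),
# 'd' discards every earlier line before it can be printed.
print_line_quit() {
    sed "${1}q;d" "$2"
}

# Demo on a throwaway file holding the numbers 1..1000, one per line.
tmp=$(mktemp)
seq 1 1000 > "$tmp"
print_line_full 666 "$tmp"
print_line_quit 666 "$tmp"
rm -f "$tmp"
```

Both calls print the same line; only the amount of input sed reads afterwards differs.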

dr_
  • 29,602
  • @dr01: Thanks! I only use sed for replacements, e.g. sed -e "s/../../g", sometimes with regular expressions, and always felt that sed's manual and its full list of commands was too painful for me. So here -n is don't echo, 666 is an address, and p is print? – phs Sep 09 '15 at 12:52
  • 1
    Yes. sed is a powerful tool, it is worth learning all its options. – dr_ Sep 09 '15 at 12:55
  • sed -n '666{p;q}' somefile, unless your sed dialect won't accept that and requires sed -n -e '666{p' -e 'q}' somefile. This allows you to quit early without the conceptual dissonance of "deleting" the lines that you don't want printed. It's merely a stylistic alternative. – Dennis Williamson Sep 09 '15 at 22:40
18

You can use sed

sed -n '666p' somefile

Or

sed '666!d' somefile

Or in large files

sed '666q;d' somefile 

In bash script

#!/bin/bash
line=666
# q quits at line $line (printing it); d discards every earlier line
sed "${line}q;d" somefile
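
In a script the line number often arrives as an argument, so it may be worth checking it before handing it to sed. A minimal sketch, assuming a POSIX shell; the function name nth_line and its error message are hypothetical:

```shell
#!/bin/sh
# nth_line N FILE: print line N of FILE (hypothetical helper name).
nth_line() {
    case $1 in
        # Reject empty input, anything containing a non-digit,
        # and numbers starting with 0 (which also rules out 0 itself).
        ''|*[!0-9]*|0*)
            echo "nth_line: line number must be a positive integer" >&2
            return 2 ;;
    esac
    sed "${1}q;d" "$2"
}
```

Called as nth_line 666 somefile, it behaves like the sed command above. If the file has fewer lines than requested, sed prints nothing.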
tachomi
  • 7,592
7

POSIXly (and maybe the fastest with huge files):

tail -n +666 somefile | head -n 1
cuonglm
  • 153,898
  • Why the downvote? – cuonglm Sep 09 '15 at 12:48
  • This solution has already been suggested by the OP and correctly discarded as it uses two processes. – dr_ Sep 09 '15 at 12:48
  • 1
    @dr01: No, the OP uses head then tail, which is very different from tail then head. – cuonglm Sep 09 '15 at 12:49
  • Didn't downvote, but isn't the sed command POSIX compliant too? – Eric Renouf Sep 09 '15 at 12:50
  • It doesn't matter; there's no need to use two commands when you can use one. – dr_ Sep 09 '15 at 12:50
  • 3
    @dr01: Run your sed and mine on a huge file, and you can see the difference. Yours is even worse than tachomi's, since it reads the rest of the file instead of quitting after hitting the line. Read this for more details. – cuonglm Sep 09 '15 at 12:51
  • @EricRenouf: The sed is compliant. – cuonglm Sep 09 '15 at 12:53
  • 1
    @cuonglm Good point. +1 for speed. – dr_ Sep 09 '15 at 13:01
  • I tested this on a ~6M and a ~848M file and my Perl approach was faster on both. Also, the OP's approach was exactly as fast as yours, why do you say that tail/head will be better than head/tail? – terdon Sep 09 '15 at 14:35
  • 1
    @terdon: You can see the same issue and benchmark here. – cuonglm Sep 09 '15 at 15:51
  • Hmm. I don't see the same result here. My guess is that it will depend on the size of the file and how close to the end the desired line is. Sometimes head/tail will be faster and other times tail/head will. – terdon Sep 09 '15 at 16:02
6

Try

awk 'NR == 666 { print; exit }' somefile

or

awk -v line="$LINE" 'NR == line { print; exit }' somefile
awk 'NR == '"$LINE"' { print; exit }' somefile

if you want to provide the line number via a shell variable ($LINE).

e[dx]it: as per terdon's suggestion.
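
Of the two ways to pass the number, the -v form keeps the shell value out of the awk program text, which tends to be easier to quote correctly. A small sketch of wrapping it up; the function name awk_line and the awk variable n are hypothetical:

```shell
# awk_line N FILE: print line N of FILE using awk (hypothetical helper name).
awk_line() {
    # -v assigns the shell value to the awk variable "n" before the program
    # runs, so the number is never pasted into the awk source itself.
    awk -v n="$1" 'NR == n { print; exit }' "$2"
}
```

With LINE=666 set in the shell, awk_line "$LINE" somefile prints that line and stops reading.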

Archemar
  • 31,554
  • 1
    You don't need that. Remember that awk treats any statement evaluating to true as a call to print. That's why 1; will print the line. All you need is awk 'NR==666'. – terdon Sep 09 '15 at 14:20
  • That wasn't my suggestion, it was yours! My suggestion was awk 'NR==666', which is as slow as your original but shorter. The exit; makes all the difference! – terdon Sep 09 '15 at 14:49
  • The exit in Perl made me think of an exit in awk. – Archemar Sep 09 '15 at 14:51
2

A Perl way:

perl -ne 'print && exit if $.==666' file

I tested by creating a file with the numbers from 1 to 999999. On this file, the Perl solution above and awk with exit are the fastest of those mentioned so far:

$ perl -le 'print for 1..999999' > file

$ time perl -ne 'print && exit if $.==666' file
666

real    0m0.004s
user    0m0.000s
sys     0m0.000s

$ time awk 'NR==666 { print ; exit ; } ' file
666

real    0m0.004s
user    0m0.000s
sys     0m0.000s

$ time tail -n +666 file | head -n1
666

real    0m0.021s
user    0m0.004s
sys     0m0.000s

$ time sed -n '666p' file
666

real    0m0.125s
user    0m0.112s
sys     0m0.012s

$ time awk 'NR==666' file
666

real    0m0.161s
user    0m0.156s
sys     0m0.000s

That said, your original solution of head -n666 file | tail -n1 is also blindingly fast, very robust and completely portable. Why do you think it's not?
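
To rerun the comparison on your own machine, a sketch along these lines checks that every variant prints the same line. The function name all_variants is hypothetical, and the Perl variant is skipped if perl is not installed:

```shell
# all_variants N FILE: run each approach from this thread on FILE
# (hypothetical helper); every command should print the same line N.
all_variants() {
    n=$1 file=$2
    # Perl is optional here: skip it quietly if it is not installed.
    if command -v perl > /dev/null; then
        perl -ne "print && exit if \$. == $n" "$file"
    fi
    awk -v n="$n" 'NR == n { print; exit }' "$file"
    sed "${n}q;d" "$file"
    head -n "$n" "$file" | tail -n 1
    tail -n "+$n" "$file" | head -n 1
}

# Demo on a file like the one above: the numbers 1..999999, one per line.
demo=$(mktemp)
awk 'BEGIN { for (i = 1; i <= 999999; i++) print i }' > "$demo"
all_variants 666 "$demo"
rm -f "$demo"
```

To compare speed, prefix the individual commands with time as in the transcript above. Expect the ranking to shift with the file size and with how far into the file the requested line is.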

terdon
  • 242,166
  • can you time awk 'NR==666 { print ; exit ; } '? I guess it would be as fast as perl. – Archemar Sep 09 '15 at 14:46
  • @Archemar of course! You're quite right, that one is as fast as Perl. Add it to your answer. – terdon Sep 09 '15 at 14:47
  • @terdon: Yes head -n666 is fast on your huge file because head stops reading after 666 lines and because 666 is much smaller than 999999. Still, that solution will have head output a lot of garbage for tail to read and dismiss. – phs Sep 09 '15 at 15:22
  • @phs good point about 666 being smaller. My Perl approach is significantly slower when told to print line 999999. However, the head/tail or tail/head is still fast as anything, portable and efficient. I understand the aesthetic considerations and it would be nicer not to print needless lines but this all happens in the background and it is still blindingly fast. I'd stick with it. – terdon Sep 09 '15 at 15:27
  • 1
    Run { head -n 665 >/dev/null; head -n 1; } <infile if you want to avoid reading and dismissing lots of garbage. – don_crissti Sep 09 '15 at 18:03
  • @don_crissti: That's great!! So many neat tricks popping up... – phs Sep 09 '15 at 19:30
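
The grouped-head trick from the comment above can be sketched as follows. It assumes the input is a regular file (not a pipe) and that head leaves the file offset just past the last line it consumed, as GNU head does; the function name skip_to_line is hypothetical:

```shell
# skip_to_line N FILE: print line N of FILE using two head calls that
# share one open file (hypothetical helper; assumes N >= 2 and a
# seekable FILE, since the second head continues where the first stopped).
skip_to_line() {
    {
        head -n "$(( $1 - 1 ))" > /dev/null   # consume and discard lines 1..N-1
        head -n 1                             # print the next line, i.e. line N
    } < "$2"
}
```

If your head reads ahead without seeking back, the second call starts at the wrong place, so check the result against one of the other approaches before relying on it.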