What's a good way of extracting, say, lines 20-45 out of a huge text file? Non-interactively, of course!
5 Answers
Even simpler:
sed -n '20,45p;45q' < textfile
The -n flag disables the default output. The address "20,45" matches lines 20 to 45, inclusive. The p command prints the current line, and 45q quits after printing line 45, so sed doesn't read the rest of the file.
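A quick way to sanity-check the command on a throwaway file (sample.txt is just an example name, assuming seq is available):
seq 100 > sample.txt
sed -n '20,45p;45q' sample.txt   # prints the numbers 20 through 45, then stops reading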

you could try:
cat textfile | head -n 45 | tail -n 26
or
cat textfile | awk "20 <= NR && NR <= 45"
Update:
As Mahomedalid pointed out, cat is not necessary and a bit redundant, but it does make for a clean, readable command. If cat does bother you, a better solution would be:
<textfile awk "20 <= NR && NR <= 45"
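To generalize either form to an arbitrary range, note that tail needs the number of lines in the range, not the end line. A minimal sketch (first and last are made-up variable names, not part of the answer above):
first=20 last=45
head -n "$last" textfile | tail -n "$(( last - first + 1 ))"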

I like the use of stdin more; it has some global consistency with the rest of *nix – Stefan Sep 16 '10 at 08:52
Reading from command-line arguments has consistency with other UNIX utilities too, and my main point was to demonstrate awk's range operator. – ephemient Sep 17 '10 at 18:10
I think @ephemient's answer is the best one here. Otherwise, the commands are rather cryptic. – Léo Léopold Hertz 준영 Sep 11 '15 at 12:52
This is not an answer, but I can't post it as a comment.
Another (very fast) way to do it was suggested by mikeserv here:
{ head -n 19 >/dev/null; head -n 26; } <infile
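This works because both head invocations read from the same open file descriptor: the first consumes and discards lines 1-19, and the second picks up at line 20. A quick check on a throwaway file (sample.txt is just an example name; this relies on head leaving the file offset just past the last line it consumed, which GNU head does on seekable files):
seq 100 > sample.txt
{ head -n 19 >/dev/null; head -n 26; } < sample.txt   # prints 20 through 45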
Using the same test file as here and the same procedure, here are some benchmarks (extracting lines 1000020-1000045):
mikeserv:
{ head -n 1000019 >/dev/null; head -n 26; } <iplist
real 0m0.059s
Stefan:
head iplist -n 1000045 | tail -n 26
real 0m0.054s
These are by far the fastest solutions, and the differences between them are negligible for a single pass (I tried different ranges: a couple of lines, millions of lines, etc.).
Doing it without the pipe might offer a significant advantage, however, to an application which needed to seek over multiple ranges of lines in similar fashion, like:
for pass in 0 1 2 3 4 5 6 7 8 9
do printf "pass#$pass:\t"
head -n99 >&3; head -n1   # send 99 lines to fd 3 (discarded), print the next one
done <<1000LINES 3>/dev/null
$(seq 1000)
1000LINES
...which prints...
pass#0: 100
pass#1: 200
pass#2: 300
pass#3: 400
pass#4: 500
pass#5: 600
pass#6: 700
pass#7: 800
pass#8: 900
pass#9: 1000
...and only reads the file through the one time.
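The same trick can be wrapped in a small function for arbitrary ranges; a minimal sketch (extract_range is a made-up name, not from any answer here):
# Print lines $1 through $2 of file $3, reading no further than line $2.
extract_range() {
    { head -n "$(( $1 - 1 ))" >/dev/null; head -n "$(( $2 - $1 + 1 ))"; } < "$3"
}
extract_range 20 45 textfile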
The other sed/awk/perl solutions read the whole file, and since this is about huge files, they're not very efficient. I threw in some alternatives that exit or quit after the last line in the specified range:
Stefan:
awk "1000020 <= NR && NR <= 1000045" iplist
real 0m2.448s
vs.
awk "NR >= 1000020;NR==1000045{exit}" iplist
real 0m0.243s
dkagedal (sed):
sed -n 1000020,1000045p iplist
real 0m0.947s
vs.
sed '1,1000019d;1000045q' iplist
real 0m0.143s
Steven D:
perl -ne 'print if 1000020..1000045' iplist
real 0m2.041s
vs.
perl -ne 'print if $. >= 1000020; exit if $. >= 1000045;' iplist
real 0m0.369s

+1 I think this is the best answer here! It would be nice to see how long awk NR==1000020,NR==1000045 textfile takes on your system. – Léo Léopold Hertz 준영 Sep 11 '15 at 13:02
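For anyone who wants to reproduce that measurement, it can be timed just like the benchmarks above; a sketch (note that the plain range form still reads to end-of-file, so the second variant with an early exit is the fairer comparison to the fast solutions):
time awk 'NR==1000020,NR==1000045' iplist
time awk 'NR==1000020,NR==1000045; NR==1000045{exit}' iplist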
ruby -ne 'print if 20 .. 45' file
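A quick check of the Ruby flip-flop (assuming seq; inside -n, a bare integer range is compared against the current line number $.):
seq 100 | ruby -ne 'print if 20 .. 45'   # prints 20 through 45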

While we're at it, why not python -c 'import fileinput, sys; [sys.stdout.write(line) for nr, line in enumerate(fileinput.input()) if 19 <= nr <= 44]' too? :-P This is something that Ruby, modeled after Perl, inspired by awk/sed, can do easily. – ephemient Sep 17 '10 at 18:21
Since sed and awk were already taken, here is a perl solution:
perl -nle "print if ($. > 19 && $. < 46)" < textfile
Or, as pointed out in the comments:
perl -ne 'print if 20..45' textfile
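A quick check that the two forms agree (assuming seq):
seq 100 | perl -nle 'print if $. > 19 && $. < 46'
seq 100 | perl -ne 'print if 20..45'   # both print 20 through 45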

What's with all those extra characters? No need to strip and re-add newlines, the flip-flop assumes comparison to the line number, and the diamond operator runs through arguments if provided: perl -ne'print if 20..45' textfile – ephemient Sep 15 '10 at 21:09
Nice. -nle is a bit of a reflex I suppose; as for the rest, I have no excuse save ignorance. – Steven D Sep 15 '10 at 21:17
The q command (everything starting from the ;) improved performance for me when extracting single line 26995107 from a 27169334-line file. – Ruslan Apr 16 '19 at 11:32