I want to extract a specific line from a very large file. For example, line 8000 would be extracted like this:
command -line 8000 > output_line_8000.txt
There's already an answer using perl and awk. Here's a sed answer:
sed -n '8000{p;q}' file
The advantage of the q command is that sed will quit as soon as the 8000th line is read (unlike the other perl and awk methods).
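If the line number lives in a shell variable, the same sed approach works with double quotes (a sketch; LINE is just an illustrative name):
LINE=8000
sed -n "${LINE}{p;q}" file > "output_line_${LINE}.txt"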
A pure Bash possibility (bash≥4):
mapfile -s 7999 -n 1 ary < file
printf '%s' "${ary[0]}"
This will slurp the content of file into the array ary (one line per field), but skip the first 7999 lines (-s 7999) and read only one line (-n 1).
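The same two options extend naturally to a range of lines; for example, this sketch (not part of the original answer) grabs lines 8000 through 8004:
mapfile -s 7999 -n 5 ary < file
printf '%s' "${ary[@]}"
Since mapfile keeps each line's trailing newline by default, printf '%s' reproduces the lines exactly as they appeared in the file.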
The mapfile variant didn't make its way here, surprisingly :)
– devnull
May 17 '14 at 09:09
It's Saturday and I had nothing better to do so I tested some of these for speed. It turns out that the sed, gawk and perl approaches are basically equivalent. The head & tail one is the slowest but, surprisingly, the fastest by an order of magnitude is the pure bash one:
Here are my tests:
$ for i in {1..5000000}; do echo "This is line $i" >>file; done
The above creates a file with 5 million lines which occupies about 100 MB.
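As a side note, if you want to regenerate the test file faster, seq can produce it in a single pass instead of appending one echo at a time (an equivalent alternative, not part of the original test):
seq -f 'This is line %.0f' 5000000 > file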
$ for cmd in "sed -n '8000{p;q}' file" \
"perl -ne 'print && exit if $. == 8000' file" \
"awk 'FNR==8000 {print;exit}' file"
"head -n 8000 file | tail -n 1" \
"mapfile -s 7999 -n 1 ary < file; printf '%s' \"${ary[0]}\"" \
"tail -n 8001 file | head -n 1"; do
echo "$cmd"; for i in {1..100}; do
(time eval "$cmd") 2>&1 | grep -oP 'real.*?m\K[\d\.]+'; done |
awk '{k+=$1}END{print k/100}';
done
sed -n '8000{p;q}' file
0.04502
perl -ne 'print && exit if $. == 8000' file
0.04698
awk 'FNR==8000 {print;exit}' file
0.04647
head -n 8000 file | tail -n 1
0.06842
mapfile -s 7999 -n 1 ary < file; printf '%s' "This is line 8000
"
0.00137
tail -n 8001 file | head -n 1
0.0033
tail | head was the best method in your benchmark the last time this came up (bash's mapfile didn't come up that time).
– Gilles 'SO- stop being evil'
May 18 '14 at 23:12
mapfile is still the fastest, but just barely.
– terdon
May 19 '14 at 00:04
You can do it in many ways.
Using perl:
perl -nle 'print && exit if $. == 8000' file
Using awk:
awk 'FNR==8000 {print;exit}' file
Or you can use tail and head so that reading stops at the 8000th line instead of going through the entire file:
tail -n +8000 file | head -n 1
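The same pair also extracts a range; this sketch (not in the original answer) prints lines 8000 through 8009 and stops reading right after:
tail -n +8000 file | head -n 10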
You could use sed:
sed -n '8000p' filename
If the file is large, then it'd be better to quit:
sed -n '8000p;8001q' filename
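The quit trick extends to ranges as well; this sketch prints lines 8000 through 8010 and then stops reading the file:
sed -n '8000,8010p;8011q' filename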
You could similarly quit reading the entire file using awk or perl too:
awk 'NR==8000{print;exit}' filename
perl -ne 'if ($. == 8000) { print; last }' filename
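For repeated use, any of these one-liners can be wrapped in a small shell function; a minimal sketch (the name nth_line is hypothetical):
nth_line() {
    # $1 = line number, $2 = file; sed quits as soon as the line is printed
    sed -n "$1{p;q}" "$2"
}
nth_line 8000 filename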