Let's say I have a really big text file (about 10,000,000 lines). I need to grep
it from the end and save the result to a file. What's the most efficient way to accomplish this task?
45

chaos
4 Answers
47
tac/grep Solution:
tac file | grep whatever
Or a bit more efficient:
grep whatever < <(tac file)
Time with a 500MB file:
real 0m1.225s
user 0m1.164s
sys 0m0.516s
sed/grep Solution:
sed '1!G;h;$!d' | grep whatever
Time with a 500MB file: Aborted after 10+ minutes.
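For readers wondering how the sed one-liner works and why it is so slow, here is a commented sketch. It assumes the input is passed as a file argument (the command above reads standard input), and the function name reverse_lines_sed is only illustrative.

```shell
#!/bin/sh
# reverse_lines_sed FILE: print FILE with its lines in reverse order.
# (Hypothetical helper name, for illustration only.)
reverse_lines_sed() {
    # 1!G  on every line except the first, append the hold space
    #      (all earlier lines, already reversed) to the current line
    # h    copy the result back into the hold space
    # $!d  on every line except the last, delete the pattern space,
    #      so nothing is printed until the end of input
    sed '1!G;h;$!d' "$1"
}
```

Because the hold space grows with every line and is copied on each step, the amount of copying grows roughly quadratically with the size of the input, which is consistent with the run being aborted on a 500MB file.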
awk/grep Solution:
awk '{x[NR]=$0}END{while (NR) print x[NR--]}' file | grep whatever
Time with a 500MB file:
real 0m5.626s
user 0m4.964s
sys 0m1.420s
perl/grep Solution:
perl -e 'print reverse <>' file | grep whatever
Time with a 500MB file:
real 0m3.551s
user 0m3.104s
sys 0m1.036s
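Since the question also asks to save the result to a file, here is a minimal sketch of the tac/grep variant wrapped in a function. The name rgrep_to_file and its argument order are assumptions made for illustration, and it presumes that tac is installed.

```shell
#!/bin/sh
# rgrep_to_file PATTERN INPUT OUTPUT
# Search INPUT from the last line to the first and write the
# matching lines to OUTPUT. (Hypothetical helper; requires tac.)
rgrep_to_file() {
    # tac emits the lines of INPUT in reverse order; grep filters
    # them and the redirection saves the matches.
    tac "$2" | grep -e "$1" > "$3"
}
```

It could be called as rgrep_to_file whatever file result.txt, and prefixed with time to compare against the figures above.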

chaos
- @chaos, I think grep "somepattern" < <(tac filename) will be faster. – Valentin Bajrami Jul 23 '14 at 12:43
- @val0x00ff The < <(tac filename) should be as fast as a pipe: in both cases, the commands run in parallel. – vinc17 Jul 23 '14 at 12:46
- If you're going for efficiency, it would be better to put the tac after the grep. If you've got a 10,000,000 line file with only 2 matches, tac will only have to reverse 2 lines, not 10m. grep is still going to have to go through the whole thing either way. – phemmer Jul 23 '14 at 14:10
- If you put tac after the grep, it will be reading from a pipe and so can't seek. That will make it less efficient (or fail completely) if the number of found lines is large. – jjanes Jul 23 '14 at 19:45
- @jjanes Can you expand a bit on that? I don't get your point, what is tac trying to seek? – Bernhard Jul 24 '14 at 07:12
- @Bernhard If you tac a real file, it lseeks backwards through the file to read it backwards in chunks, and then reverses the lines in each chunk, remembering the line broken across chunks to put them back together. If reading from a pipe, it can't do that. It either needs to read the whole thing into memory, or write it to a temp file, or fail. – jjanes Jul 24 '14 at 15:52
- But if you put tac after grep, it only has to reverse the matched lines, not the whole file. So unless you're matching lots of lines in the file, it should be reasonably efficient. – Barmar Jul 24 '14 at 17:04
- @Patrick et al. So the “obvious” compromise is to do grep (pattern) (input_file) > (temp_file); tac (temp_file) > (output_file); rm (temp_file), right? Note that, if the user wants to know line numbers of matches (by specifying the -n option), this will report correct line numbers in the original input file, whereas tac (input_file) | grep -n (pattern) will report, for example, the third-to-last line in the file as line 3. (Of course, that might be what the OP wants.) – Scott - Слава Україні Aug 16 '14 at 20:26
- @chaos Would you please explain the sed/grep solution in some more detail? Where is the file in that command? – Geek Sep 30 '15 at 13:40
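The trade-off discussed in these comments can be sketched as follows: filtering first means only the matches get reversed, and grep -n still reports line numbers from the original file. The helper name grep_then_reverse is hypothetical; the temporary file is there so that tac works on a seekable file rather than a pipe, as suggested in the comments.

```shell
#!/bin/sh
# grep_then_reverse PATTERN INPUT
# Print matching lines, last match first, each prefixed with its
# line number in INPUT. (Hypothetical helper; requires tac, mktemp.)
grep_then_reverse() {
    tmp=$(mktemp) || return 1
    # -n prefixes each match with its line number in INPUT
    grep -n -e "$1" "$2" > "$tmp"
    # Reverse only the matches; tac can seek in the temp file
    tac "$tmp"
    rm -f "$tmp"
}
```

Redirecting the function's output, for example grep_then_reverse whatever file > result.txt, saves the reversed matches. Whether this beats tac file | grep whatever probably depends on how many lines match.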
17
This solution might help:
tac file_name | grep -e expression

derobert

Anshul Patel
- tac is the GNU command. On most other systems, the equivalent is tail -r. – Stéphane Chazelas Jul 23 '14 at 14:55
- @Stéphane: On at least some Unix systems, tail -r is limited to a small number of lines, this might be an issue. – RedGrittyBrick Jul 23 '14 at 16:20
- @RedGrittyBrick, do you have any reference for that, or could you please tell which systems have that limitation? – Stéphane Chazelas Jul 23 '14 at 16:50
- @StéphaneChazelas, tail -r /etc/passwd fails with tail: invalid option -- 'r'. I'm using coreutils-8.21-21.fc20.x86_64. – Cristian Ciupitu Jul 23 '14 at 20:14
- @CristianCiupitu, as I said, GNU has tac (and only GNU has tac); many other Unices have tail -r. GNU tail doesn't support -r. – Stéphane Chazelas Jul 23 '14 at 22:41
- So a more portable solution would be (if command -v tac >/dev/null 2>&1; then tac file_name; else tail -r file_name; fi) | grep expression (this should be a fair assumption since GNU Coreutils supplies both tac and tail, so a system without tac should have non-GNU tail and therefore support for tail -r). – Adam Katz Jan 15 '15 at 17:35
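The portability idea from the last comment can be written as a small function, which may be easier to read than the one-liner. This is only a sketch: the name reverse_lines is made up here, and it relies on the same assumption as the comment, namely that a system without tac has a tail that accepts -r.

```shell
#!/bin/sh
# reverse_lines FILE: print FILE with its lines in reverse order,
# using tac when it is available and falling back to tail -r.
# (Hypothetical helper name, for illustration only.)
reverse_lines() {
    if command -v tac >/dev/null 2>&1; then
        tac "$1"
    else
        # Assumed to be a non-GNU tail that supports -r
        tail -r "$1"
    fi
}
```

Usage would then be reverse_lines file_name | grep expression.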
10
This one exits as soon as it finds the first match:
tac hugeproduction.log | grep -m1 WhatImLookingFor
The following gives the 5 lines before and after the first two matches:
tac hugeproduction.log | grep -m2 -A 5 -B 5 WhatImLookingFor
Remember not to use -i (case insensitive) unless you have to, as that will slow down the grep.
If you know the exact string you are looking for, then consider fgrep (fixed string), i.e. grep -F:
tac hugeproduction.log | grep -F -m2 -A 5 -B 5 'ABC1234XYZ'
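To illustrate the early-exit behaviour, here is a small sketch that prints only the last line of a file containing a fixed string. The function name last_match is an assumption for illustration; -m is supported by GNU and BSD grep but is not part of POSIX, so it may be missing elsewhere.

```shell
#!/bin/sh
# last_match STRING FILE
# Print the last line of FILE that contains the fixed string STRING.
# (Hypothetical helper; requires tac and a grep that supports -m.)
last_match() {
    # -F treats STRING literally; -m1 makes grep stop after the
    # first match it sees, which is the last one in the original file.
    tac "$2" | grep -F -m1 -e "$1"
}
```

For example, last_match 'ABC1234XYZ' hugeproduction.log > result.txt would save just the most recent occurrence.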

zzapper
9
If the file is really big and cannot fit in memory, I would use Perl with the File::ReadBackwards module from CPAN:
$ cat reverse-grep.pl
#!/usr/bin/perl
use strict;
use warnings;
use File::ReadBackwards;
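my $pattern = shift;    # first argument: regex to match
my $rev = File::ReadBackwards->new(shift)    # second argument: file to read
    or die "$!";
# readline returns lines starting from the end of the file
while (defined($_ = $rev->readline)) {
    print if /$pattern/;
}
$rev->close;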
Then:
$ ./reverse-grep.pl pattern file

cuonglm
- The advantage of this approach is that you can tweak the Perl to do anything you want. – zzapper Jul 24 '14 at 15:52
- @zzapper: It's memory efficient, too, since it reads the file line by line instead of slurping the file into memory like tac. – cuonglm Jul 24 '14 at 15:54
- Can anyone add -m support for this? I'd like to test it on real files. See: https://gist.githubusercontent.com/ychaouche/cdbacdc114e7c401b16ac1643071b83a/raw/a40dbb6bc696c7e96237a3350d86e7a7eb217e54/gistfile1.txt – ychaouche Nov 05 '18 at 14:29
- grep has a --max-count (number) switch that aborts after a certain number of matches, which might be interesting to you. – Ulrich Schwarz Jul 23 '14 at 13:28