I have two files: A, with 6 million lines, and B, with 5 million lines. I'm trying to get the lines that are in A but missing from B with grep -v -f B A, but it's very slow. Is there any way to speed it up?
If the two files are sorted (in the same locale as the current one), use this command.
comm -23 A.txt B.txt
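To make the column selection concrete, here is a small sketch on throwaway sample data (the file contents and the temporary directory are made up for illustration). comm normally prints three columns: lines only in the first file, lines only in the second, and lines common to both. -2 and -3 suppress the last two, leaving the lines unique to A.

```shell
# Hypothetical, already-sorted sample inputs in a temporary directory.
dir=$(mktemp -d)
printf '%s\n' apple banana cherry date > "$dir/A.txt"
printf '%s\n' banana date fig > "$dir/B.txt"

# Column 1 = only in A, column 2 = only in B, column 3 = in both.
# -2 and -3 drop columns 2 and 3, so only lines unique to A remain.
only_in_a=$(comm -23 "$dir/A.txt" "$dir/B.txt")
rm -r "$dir"
printf '%s\n' "$only_in_a"
```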
If they're not sorted and your shell supports ksh-style process substitution:
(export LC_ALL=C; comm -23 <(sort A.txt) <(sort B.txt))
(LC_ALL=C is set to get a deterministic, and fast, sorting order.)
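If your shell has no process substitution (for example a plain POSIX sh such as dash), the same idea can be sketched with temporary files instead. The sample data and file names are hypothetical. Note that the result comes out in sorted order rather than A's original order, and that a line repeated in A is kept (each copy in A is paired with at most one copy in B).

```shell
# Hypothetical unsorted inputs; "apple" appears twice in A and never in B.
dir=$(mktemp -d)
printf '%s\n' pear apple kiwi apple > "$dir/A.txt"
printf '%s\n' kiwi banana > "$dir/B.txt"

# Sort both files with the same collation (C locale) that comm then uses.
LC_ALL=C sort "$dir/A.txt" > "$dir/A.sorted"
LC_ALL=C sort "$dir/B.txt" > "$dir/B.sorted"

# Keep only column 1: lines present in A but not in B.
result=$(LC_ALL=C comm -23 "$dir/A.sorted" "$dir/B.sorted")
rm -r "$dir"
printf '%s\n' "$result"
```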
See also the combine utility from moreutils, which doesn't require the files to be sorted:
combine A.txt not B.txt
Beware that it loads the whole files into memory, though.
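Not part of the answer above, but a common alternative that also avoids sorting is a hash-based awk one-liner: it holds only B in memory (as array keys) and streams A, so A's original line order is preserved. A sketch with made-up sample data:

```shell
# Hypothetical sample inputs.
dir=$(mktemp -d)
printf '%s\n' pear apple kiwi apple > "$dir/A.txt"
printf '%s\n' kiwi banana > "$dir/B.txt"

# NR == FNR is true only while reading the first file (B.txt):
# store each of its lines as a key of the array "seen" and skip to the next line.
# For the second file (A.txt), print lines that are not keys of "seen".
result=$(awk 'NR == FNR { seen[$0]; next } !($0 in seen)' "$dir/B.txt" "$dir/A.txt")
rm -r "$dir"
printf '%s\n' "$result"
```

One caveat of the NR == FNR idiom: if B.txt is empty, awk treats A.txt as the "first" file and prints nothing, so guard against an empty B if that can happen.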

Stéphane Chazelas

Eranda Peiris
If, like me, you need to grep for lines where file1 and file2 don't contain identical lines, but file1 contains strings to search for within file2's lines, you may be able to sort both files and then use join.
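A minimal sketch of that idea, assuming (hypothetically) that each line of the searched file carries the string to look for as its first whitespace-separated field; the file names keys.txt and records.txt are made up. join -v 2 prints the lines of the second file that have no partner in the first, which plays the role of grep -v here:

```shell
# Hypothetical data: keys.txt holds the strings, records.txt has the key in field 1.
dir=$(mktemp -d)
printf '%s\n' id4 id2 > "$dir/keys.txt"
printf '%s\n' 'id3 gamma' 'id1 alpha' 'id4 delta' 'id2 beta' > "$dir/records.txt"

# join requires both inputs sorted on the join field (field 1), in the same locale.
LC_ALL=C sort -k1,1 "$dir/keys.txt" > "$dir/keys.sorted"
LC_ALL=C sort -k1,1 "$dir/records.txt" > "$dir/records.sorted"

# -v 2: output only the lines of file 2 whose field 1 matches nothing in file 1.
result=$(LC_ALL=C join -v 2 "$dir/keys.sorted" "$dir/records.sorted")
rm -r "$dir"
printf '%s\n' "$result"
```

join rebuilds each output line from its fields separated by single spaces, so runs of whitespace in the original lines are not preserved.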

Max Bileschi
Use the -F and -x options if you are matching whole lines literally (no regex). – Sundeep Apr 13 '18 at 09:11
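A sketch of the grep invocation that comment describes, on made-up data chosen so the pitfalls are visible: without -F the pattern a.c would also match abc as a regex, and without -x the pattern ab would match inside abc.

```shell
# Hypothetical sample inputs.
dir=$(mktemp -d)
printf '%s\n' 'a.c' 'abc' 'x' > "$dir/A.txt"
printf '%s\n' 'a.c' 'ab' > "$dir/B.txt"

# -f: read patterns from B.txt, one per line
# -F: treat them as fixed strings, so "." is not a wildcard
# -x: a pattern must match the entire line, not a substring
# -v: print the lines of A.txt that match none of the patterns
result=$(LC_ALL=C grep -vxFf "$dir/B.txt" "$dir/A.txt")
rm -r "$dir"
printf '%s\n' "$result"
```

Whether this is fast enough for millions of patterns depends on the grep implementation and available memory, so it is worth timing against the comm approach.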