Is there a way to speed up grep -v -f?

Question

I have two files: A, with 6 million lines, and B, with 5 million lines. I'm trying to get the lines that are in A but missing from B with grep -v -f B A, but it's very slow. Is there any way to speed it up?

Is the input data ASCII? You could add the -F and -x options if you are matching whole lines literally (no regex) – Sundeep, Apr 13 '18 at 09:11
https://unix.stackexchange.com/questions/418429/find-intersection-of-lines-in-two-files might help – Sundeep, Apr 13 '18 at 09:13
Related: Linux tools to treat files as sets and perform set operations on them – Stéphane Chazelas, Feb 28 '23 at 14:38
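
To illustrate the -F and -x suggestion from the comments, here is a small sketch. The sample file contents are made up for illustration. Without -F, the pattern a.c would be read as a regex and also match abc; without -x, the pattern foo would also match foobar. With both options, only exact whole-line matches are excluded.

```shell
#!/bin/sh
# Hypothetical sample data; the contents are illustrative only.
tmp=$(mktemp -d)
printf '%s\n' 'a.c' 'abc' 'foo' 'foobar' > "$tmp/A"
printf '%s\n' 'a.c' 'foo' > "$tmp/B"

# -F: treat each line of B as a fixed string, not a regex
# -x: a pattern must match the whole line
# -v: print the lines of A that match no pattern
# -f: read the patterns from B
missing=$(grep -F -x -v -f "$tmp/B" "$tmp/A")
printf '%s\n' "$missing"

rm -rf "$tmp"
```

This prints abc and foobar, the two lines of A that do not appear verbatim in B.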

score 2 · Accepted Answer · edited Feb 28 '23 at 14:36

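If both files are already sorted (in the same locale as the current one), use comm. The -2 option suppresses lines found only in B.txt and -3 suppresses lines common to both, leaving the lines that are only in A.txt: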

comm -23 A.txt B.txt

If they're not sorted and your shell supports ksh-style process substitution:

(export LC_ALL=C; comm -23 <(sort A.txt) <(sort B.txt))

Setting LC_ALL=C gives a deterministic (and fast) byte-wise sorting order, and exporting it ensures that sort and comm agree on that order.
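
For shells without process substitution, the same idea can be sketched with temporary files. This is a minimal example on made-up data (the file contents and the .sorted names are assumptions for illustration), not a benchmark.

```shell
#!/bin/sh
# Hypothetical sample data; file names and contents are illustrative only.
tmp=$(mktemp -d)
printf '%s\n' pear apple fig cherry > "$tmp/A.txt"
printf '%s\n' fig banana apple > "$tmp/B.txt"

# Sort both files in the C locale so comm sees the ordering it expects.
LC_ALL=C sort "$tmp/A.txt" > "$tmp/A.sorted"
LC_ALL=C sort "$tmp/B.txt" > "$tmp/B.sorted"

# -2 drops lines unique to B, -3 drops lines common to both,
# so only the lines that are in A but not in B remain.
only_in_a=$(LC_ALL=C comm -23 "$tmp/A.sorted" "$tmp/B.sorted")
printf '%s\n' "$only_in_a"

rm -rf "$tmp"
```

On this input the result is cherry and pear, in sorted order.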

See also the combine utility from moreutils, which doesn't require the files to be sorted:

combine A.txt not B.txt

Beware that it loads both files entirely into memory, though.
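
If moreutils is not installed, a comparable unsorted comparison can be sketched with awk. This is a different tool swapped in for illustration, not how combine works internally, and the sample data is made up. It keeps only B.txt in memory and preserves the original order of A.txt.

```shell
#!/bin/sh
# Hypothetical sample data; file names and contents are illustrative only.
tmp=$(mktemp -d)
printf '%s\n' pear apple fig cherry > "$tmp/A.txt"
printf '%s\n' fig banana apple > "$tmp/B.txt"

# While reading the first file (NR == FNR), store each line of B.txt as an array key.
# While reading the second file, print the lines of A.txt that were never stored.
not_in_b=$(awk 'NR == FNR { seen[$0]; next } !($0 in seen)' "$tmp/B.txt" "$tmp/A.txt")
printf '%s\n' "$not_in_b"

rm -rf "$tmp"
```

Note that the NR == FNR idiom assumes B.txt is not empty; with an empty first file the condition would also hold for A.txt and nothing would be printed.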

answered Apr 13 '18 at 09:11 by Eranda Peiris · edited Feb 28 '23 at 14:36 by Stéphane Chazelas

That assumes the input is sorted – Sundeep, Apr 13 '18 at 09:14
Thanks mate, sorted the files beforehand and it worked instantly – Fluffy, Apr 13 '18 at 09:15

score 0 · Answer 2 · answered Feb 28 '23 at 14:10

If, like me, you need to find lines in a file where file1 and file2 don't share identical lines, but file1 contains the strings to search for, you may be able to sort both files and then use join.
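
The answer gives no command, so the following is only a sketch under one assumption: the string to look for is the first whitespace-separated field of each line in file2. The sample contents and the .sorted file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample data: file1 holds keys, file2 holds "key value" lines.
tmp=$(mktemp -d)
printf '%s\n' k3 k1 > "$tmp/file1"
printf '%s\n' 'k2 beta' 'k1 alpha' 'k3 gamma' 'k4 delta' > "$tmp/file2"

# join needs both inputs sorted on the join field, in the same locale.
LC_ALL=C sort "$tmp/file1" > "$tmp/keys.sorted"
LC_ALL=C sort -k1,1 "$tmp/file2" > "$tmp/data.sorted"

# Lines of file2 whose first field appears in file1.
matched=$(LC_ALL=C join "$tmp/keys.sorted" "$tmp/data.sorted")

# -v 2 prints only the unpairable lines of the second file,
# i.e. lines of file2 whose first field is not in file1.
unmatched=$(LC_ALL=C join -v 2 "$tmp/keys.sorted" "$tmp/data.sorted")

printf '%s\n' "$matched" "$unmatched"

rm -rf "$tmp"
```

If the string can appear anywhere in the line rather than in a fixed field, join does not apply and a fixed-string grep is the closer fit.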

answered Feb 28 '23 at 14:10 by Max Bileschi
