1

I have two files: A, with 6 million lines, and B, with 5 million lines. I'm trying to get the lines that are in A but missing from B with grep -v -f B A, but it's very slow. Is there any way to speed it up?

Fluffy

2 Answers

2

If both files are already sorted (in the same locale as the current one), use comm:

comm -23 A.txt B.txt

If they're not sorted and your shell supports ksh-style process substitution:

(export LC_ALL=C; comm -23 <(sort A.txt) <(sort B.txt))

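(LC_ALL=C gives a deterministic, and fast, byte-wise sort order. Note that comm compares whole lines, whereas grep -f B treats each line of B as a regular expression that can match anywhere in a line of A, so the two can give different results.)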
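For a shell without process substitution, the same idea can be sketched with sorted temporary copies. The sample contents and the temporary file names below are made up for illustration; only sort and comm -23 come from the answer.

```shell
# Hypothetical sample data; the real A.txt and B.txt would be the large files.
tmp=$(mktemp -d)
printf '%s\n' cherry apple banana date > "$tmp/A.txt"
printf '%s\n' banana elderberry apple > "$tmp/B.txt"

# Sort both files in the C locale so that sort and comm agree on the order.
LC_ALL=C sort "$tmp/A.txt" > "$tmp/A.sorted"
LC_ALL=C sort "$tmp/B.txt" > "$tmp/B.sorted"

# -2 drops lines found only in B, -3 drops lines common to both,
# which leaves the lines found only in A.
result=$(LC_ALL=C comm -23 "$tmp/A.sorted" "$tmp/B.sorted")
printf '%s\n' "$result"

rm -rf "$tmp"
```

With this sample data the output is cherry and date, in sorted order rather than in A's original order.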

See also the combine utility from moreutils that doesn't require files to be sorted:

combine A.txt not B.txt

Beware that it loads both files entirely into memory, though.
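If moreutils isn't installed, a commonly used alternative (a different technique, not what combine does internally) is an awk lookup table. This is only a sketch with made-up sample data; it keeps all of B in memory but streams A, and it preserves A's original line order.

```shell
# Hypothetical sample data standing in for the real files.
tmp=$(mktemp -d)
printf '%s\n' cherry apple banana date > "$tmp/A.txt"
printf '%s\n' banana elderberry apple > "$tmp/B.txt"

# While reading the first file (B), NR == FNR holds: remember each line.
# While reading the second file (A), print only lines not remembered.
# Caveat: if B is empty, NR == FNR also holds for A and nothing is printed.
only_in_a=$(awk 'NR == FNR { seen[$0]; next } !($0 in seen)' "$tmp/B.txt" "$tmp/A.txt")
printf '%s\n' "$only_in_a"

rm -rf "$tmp"
```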

0

If, like me, your two files don't contain identical lines, but file1 contains the strings to search for in file2, you may be able to sort both files and then use join.
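A rough sketch of that approach follows. It assumes, which the answer does not state, that each string in file1 appears as the first whitespace-separated field of the lines in file2; the sample data and temporary file names are hypothetical.

```shell
# Hypothetical sample data: file1 holds keys, file2 holds "key value" lines.
tmp=$(mktemp -d)
printf '%s\n' k3 k1 > "$tmp/file1"
printf '%s\n' 'k1 alpha' 'k2 beta' 'k3 gamma' 'k4 delta' > "$tmp/file2"

# join needs both inputs sorted on the join field, in the same locale.
LC_ALL=C sort "$tmp/file1" > "$tmp/keys.sorted"
LC_ALL=C sort -k1,1 "$tmp/file2" > "$tmp/data.sorted"

# Lines of file2 whose first field appears in file1.
matched=$(LC_ALL=C join "$tmp/keys.sorted" "$tmp/data.sorted")

# Lines of file2 whose first field does NOT appear in file1
# (-v 2 prints only the unpairable lines from the second input).
unmatched=$(LC_ALL=C join -v 2 "$tmp/keys.sorted" "$tmp/data.sorted")

printf 'matched:\n%s\nunmatched:\n%s\n' "$matched" "$unmatched"
rm -rf "$tmp"
```

Unlike grep -f, this only matches on a whole field, so it won't find strings buried in the middle of a line.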