How to remove identical lines in one file from another, using sed?

Question

I have two files, one being a superset of the other. I want to remove the lines in the larger file that are identical to lines in the smaller file.

One possible complication is that the lines contain backslashes.

How do I do this?

You (or someone) may be able to leverage some similar code I wrote, which uses sed to strip out all copies of the first line from a file (but leaves the first line in place). — Wildcard, Mar 16 '16 at 11:40

Rajish · Answer 1 · 2011-08-24T09:44:58.553

8

Here is my snippet:

remove_lines()
{
    # remove lines from a file
    #
    # $1 - source file with patterns of lines to be removed
    # $2 - destination file
    tmpfile=$(mktemp "$(dirname -- "$2")"/XXXXXXXX) &&
    grep -F -f "$1" -v -- "$2" >>"$tmpfile" &&
    mv -- "$tmpfile" "$2"
}

EDIT: I've just realized that there is no sed in it, but that wasn't critical, was it?
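A usage sketch of the grep-based approach, restated here so that it runs on its own. The file names patterns.txt and data.txt are made up for illustration, and the exit-status handling is an addition of this sketch: grep exits with 1 when it selects no lines, which here only means that every line was removed.

```shell
remove_lines() {
    # $1 - file listing the lines to remove
    # $2 - file to update in place
    tmpfile=$(mktemp "$(dirname -- "$2")"/XXXXXXXX) || return
    grep -F -f "$1" -v -- "$2" > "$tmpfile"
    status=$?
    # Status 0 or 1 means grep ran fine; anything higher is a real error.
    if [ "$status" -le 1 ]; then
        mv -- "$tmpfile" "$2"
    else
        rm -f -- "$tmpfile"
        return "$status"
    fi
}

# Hypothetical sample data in a throwaway directory.
workdir=$(mktemp -d) || exit 1
printf '%s\n' 'beta' > "$workdir/patterns.txt"
printf '%s\n' 'alpha' 'beta' 'gamma' > "$workdir/data.txt"

remove_lines "$workdir/patterns.txt" "$workdir/data.txt"
result=$(cat "$workdir/data.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```

Like the snippet it is based on, this matches fixed strings anywhere in a line rather than whole lines only.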

edited Aug 24 '11 at 09:44

answered Jun 20 '11 at 09:18

Rajish


You might want to use mktemp for generating the temporary file name, instead of tmp-$(uuidgen) or similar hacks. – Riccardo Murri Jun 20 '11 at 13:06
OP presupposed that sed was necessary, but didn't specify why another solution would be unacceptable. +1. – Kevin M Jun 20 '11 at 13:40

I used the line in your script like so:

`grep -F -f "uniq_failing_specs.txt" -v -- "all_specs.txt" >>"passing_specs.txt"`

– thekingoftruth Oct 16 '12 at 11:36

@ThomasDickey's point in his answer is correct; by not using -x you open this up to unexpected behavior due to flawed logic. – Wildcard Mar 16 '16 at 11:43

score 2 · Answer 2 · edited Apr 05 '19 at 22:28

2

Try the following script:

## $1 - Small File
## $2 - Large File

sed 's/^/\//; s/$/\/d/; s/\\/\\\\/g' "$1" > "$HOME/sed_scpt.txt"
sed 's/\\/\\\\/g' "$2" | sed -f "$HOME/sed_scpt.txt" > "$HOME/desired_output.txt"

## Alternatively, you could replace the 2nd line with the following:
sed -f "$HOME/sed_scpt.txt" "$2" > "$HOME/desired_output.txt"

NOTE: I've used GNU sed 4.2.1.

edited Apr 05 '19 at 22:28

Rui F Ribeiro


answered Jul 03 '11 at 08:25

nvarun


score 2 · Answer 3 · answered Feb 15 '16 at 17:10

The answer by @rajish using grep was close, but overlooked something: the question asked about removing identical lines. By default, grep will match strings (parts of lines).

POSIX grep has a suitable option:

-x
Consider only input lines that use all characters in the line excluding the terminating <newline> to match an entire fixed string or regular expression to be matching lines.

Given that, one could use grep to do this:

cp -f -p input.txt input.txt~
grep -v -x -F -f input.pat input.txt~ >input.txt

where input.pat contains the lines to be removed, and input.txt is the file to be updated.
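To make the effect of -x concrete, here is a small sketch that runs the same grep with and without it. The three-line sample is made up for illustration and lives in a throwaway directory.

```shell
workdir=$(mktemp -d) || exit 1
printf '%s\n' 'first' > "$workdir/input.pat"
printf '%s\n' 'first' 'this is not first' 'second' > "$workdir/input.txt"

# Without -x, "first" also matches as a substring of "this is not first",
# so that line is removed as well.
without_x=$(grep -v -F -f "$workdir/input.pat" "$workdir/input.txt")

# With -x, only the line that is exactly "first" is removed.
with_x=$(grep -v -x -F -f "$workdir/input.pat" "$workdir/input.txt")

printf 'without -x:\n%s\n\nwith -x:\n%s\n' "$without_x" "$with_x"
rm -rf "$workdir"
```

Without -x only "second" survives; with -x both "this is not first" and "second" are kept.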

The solution by @nvarun using sed had a similar problem, in addition to not escaping / characters in the pattern file. This example works for me, and limits the syntax to POSIX sed:

cp -f -p input.txt input.txt~
sed -e 's/\([\/]\)/\\\1/g' -e 's/^/\/^/' -e 's/$/$\/d/' input.pat > input.sed
sed -f input.sed input.txt~ >input.txt

Just to be tidy, both examples save the original file before updating it (POSIX cp).

input.pat

first
this is second
second/third
second\third

input.txt

first
only first should match
this is not first
this is second
the previous line said this is second
first/second/third
second/third
first\second\third
second\third

Result:

only first should match
this is not first
the previous line said this is second
first/second/third
first\second\third
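The sample can be reproduced with the grep variant using a sketch like the following. It recreates input.pat and input.txt in a temporary directory (the directory is an assumption of this sketch, chosen so that no real files are touched). printf '%s\n' is used so the backslashes are written literally.

```shell
workdir=$(mktemp -d) || exit 1

# Lines to remove.
printf '%s\n' 'first' 'this is second' 'second/third' 'second\third' \
    > "$workdir/input.pat"

# File to update.
printf '%s\n' 'first' 'only first should match' 'this is not first' \
    'this is second' 'the previous line said this is second' \
    'first/second/third' 'second/third' 'first\second\third' 'second\third' \
    > "$workdir/input.txt"

# Keep a backup, then write back every line that is not an exact match.
cp -f -p "$workdir/input.txt" "$workdir/input.txt~"
grep -v -x -F -f "$workdir/input.pat" "$workdir/input.txt~" > "$workdir/input.txt"

result=$(cat "$workdir/input.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```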
