I have two files, one being a superset of the other. I want to remove from the larger file every line that also appears in the smaller file.
One possible complication is that the lines contain backslashes.
How do I do this?
Here is my snippet:
remove_lines()
{
    # remove lines from a file
    #
    # $1 - source file with patterns of lines to be removed
    # $2 - destination file
    tmpfile=$(mktemp "$(dirname -- "$2")"/XXXXXXXX) &&
    grep -F -f "$1" -v -- "$2" >>"$tmpfile" &&
    mv -- "$tmpfile" "$2"
}
EDIT: I've just realized that there is no sed in it, but that wasn't critical, was it?
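As a hedged variant of the snippet above (a sketch, not the answerer's code): the function name remove_lines_exact is hypothetical. It adds -x so that only whole-line matches are removed, and it treats grep's exit status 1, which POSIX grep returns when no lines are selected, as success, so a destination file whose lines are all removed is still updated.

```shell
# Hypothetical variant of remove_lines; the name is an assumption for illustration.
remove_lines_exact()
{
    # $1 - file listing the exact lines to be removed
    # $2 - destination file, updated in place
    tmpfile=$(mktemp "$(dirname -- "$2")"/XXXXXXXX) || return

    # -x: match whole lines only; -F: fixed strings, so backslashes are literal
    grep -v -x -F -f "$1" -- "$2" >"$tmpfile"
    status=$?

    # grep exits 1 when it selects no lines (every line was removed);
    # only a status above 1 indicates a real error.
    if [ "$status" -gt 1 ]; then
        rm -f -- "$tmpfile"
        return "$status"
    fi

    mv -- "$tmpfile" "$2"
}
```

It would be called the same way as the original, for example remove_lines_exact small.txt large.txt, where both file names are placeholders.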
mktemp for generating the temporary file name, instead of tmp-$(uuidgen) or similar hacks. – Riccardo Murri Jun 20 '11 at 13:06
`grep -F -f "uniq_failing_specs.txt" -v -- "all_specs.txt" >>"passing_specs.txt"` – thekingoftruth Oct 16 '12 at 11:36
Without -x you open this up to unexpected behavior due to flawed logic. – Wildcard Mar 16 '16 at 11:43
Try the following script:
## $1 - Small File
## $2 - Large File
sed 's/^/\//; s/$/\/d/; s/\\/\\\\/g' "$1" > "$HOME/sed_scpt.txt"
sed 's/\\/\\\\/g' "$2" | sed -f "$HOME/sed_scpt.txt" > "$HOME/desired_output.txt"
## Alternatively, you could replace the 2nd line with the following:
sed -f "$HOME/sed_scpt.txt" "$2" > "$HOME/desired_output.txt"
NOTE: I've used GNU sed 4.2.1.
The answer by @rajish using grep was close, but overlooked something: the question asked about removing identical lines. By default, grep will match strings (parts of lines).
POSIX grep has a suitable option:
-x
Consider only input lines that use all characters in the line excluding the terminating newline to match an entire fixed string or regular expression to be matching lines.
Given that, one could use grep to do this:
cp -f -p input.txt input.txt~
grep -v -x -F -f input.pat input.txt~ >input.txt
where input.pat contains the lines to be removed, and input.txt is the file to be updated.
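To make the effect of -x concrete, here is a minimal sketch on two lines of made-up data in a scratch directory (the directory and the variable names without_x and with_x are assumptions for illustration only):

```shell
# Work in a throwaway directory so no real files are touched.
dir=$(mktemp -d) && cd "$dir" || exit 1

printf '%s\n' 'first' >input.pat
printf '%s\n' 'first' 'this is not first' >input.txt

# Without -x, "first" also matches as a substring of the second line,
# so both lines are removed.
without_x=$(grep -v -F -f input.pat input.txt)

# With -x, only the line that is exactly "first" is removed.
with_x=$(grep -v -x -F -f input.pat input.txt)

printf 'without -x: [%s]\n' "$without_x"   # prints: without -x: []
printf 'with -x:    [%s]\n' "$with_x"      # prints: with -x:    [this is not first]
```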
The solution by @nvarun using sed had a similar problem, in addition to not escaping / characters in the pattern file. This example works for me, and limits the syntax to POSIX sed:
cp -f -p input.txt input.txt~
sed -e 's/\([\/]\)/\\\1/g' -e 's/^/\/^/' -e 's/$/$\/d/' input.pat > input.sed
sed -f input.sed input.txt~ >input.txt
Just to be tidy, both save the original file before updating it (POSIX cp).
input.pat:
first
this is second
second/third
second\third
input.txt:
first
only first should match
this is not first
this is second
the previous line said this is second
first/second/third
second/third
first\second\third
second\third
Result:
only first should match
this is not first
the previous line said this is second
first/second/third
first\second\third
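For anyone who wants to reproduce this, the following sketch recreates the two sample files in a scratch directory (the directory is an assumption; the file contents are the ones listed above) and runs the grep variant. printf '%s\n' is used so that the backslashes are written literally.

```shell
# Recreate the sample files in a throwaway directory.
dir=$(mktemp -d) && cd "$dir" || exit 1

printf '%s\n' \
    'first' \
    'this is second' \
    'second/third' \
    'second\third' >input.pat

printf '%s\n' \
    'first' \
    'only first should match' \
    'this is not first' \
    'this is second' \
    'the previous line said this is second' \
    'first/second/third' \
    'second/third' \
    'first\second\third' \
    'second\third' >input.txt

# Save the original, then rewrite input.txt without the exact-match lines.
cp -f -p input.txt input.txt~
grep -v -x -F -f input.pat input.txt~ >input.txt

cat input.txt
```

The printed lines should be the five shown under Result.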
sed to strip out all copies of the first line from a file (but leaves the first line in place). – Wildcard Mar 16 '16 at 11:40