How to pipe output of find as input for diff?

Question

This is a basic question regarding bash but I could not find a solution.

I have many subdirectories with identically named files and I want to compare all of them for identity.

I can list these files with

find . -name "protein.mol2"

I know that a single file can be used as the reference for diff

diff -q --from-file dir1/file dir2/file dir3/file; echo $?

How do I pipe the output of find to diff?

Possible duplicate of Storing output of command in shell variable — Dmitry Grigoryev, Oct 16 '15 at 12:26
@roaima Command substitution can be used in many scenarios, and I believe that diff $(find) is really the same as e.g. mkdir $(date), or var=$(pwd). Do you think those should be three separate questions? — Dmitry Grigoryev, Oct 16 '15 at 13:21
@Dmitry definitely not! The potential complexity here is the correct handling of filenames containing spaces, etc., which doesn't lend itself cleanly to the diff $(...) style — Chris Davies, Oct 16 '15 at 14:52
@roaima That would be "why is my script failing on some files?" and I see no evidence of that in the question. — Dmitry Grigoryev, Oct 16 '15 at 15:07

score 8 · Accepted Answer · answered Oct 16 '15 at 09:47

The --from-file option lets you compare one file against many files (unlike something like tar --files-from, which reads a list of files to operate on from a file). It has an analogous --to-file; which of the two you use depends on the relative "direction" of the change. Since you're using -q, which only reports whether there is a difference, this should not matter to you here.
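To make the "direction" point concrete, here is a minimal sketch assuming GNU diff. The file names ref and other are made up for the illustration and do not come from the question.

```shell
# Hypothetical scratch files, created only for this illustration.
demo=$(mktemp -d)
printf 'old\n' > "$demo/ref"
printf 'new\n' > "$demo/other"

# --from-file: ref is the left-hand side, i.e. "diff ref other"
from_out=$(diff --from-file "$demo/ref" "$demo/other")

# --to-file: ref is the right-hand side, i.e. "diff other ref"
to_out=$(diff --to-file "$demo/ref" "$demo/other")

printf '%s\n---- versus ----\n%s\n' "$from_out" "$to_out"
rm -rf "$demo"
```

The two outputs list the same change with the "<" and ">" sides swapped, which is why the choice is irrelevant once -q is in use.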

I assume you have a reference file, and you wish to compare it to a set of identically named files, so either of these should work:

diff -q --from-file dir1/protein.mol2 $(find . -name protein.mol2)
find . -name protein.mol2 | xargs diff -q --from-file dir1/protein.mol2

In the first case diff will run only once, and its exit code will reflect whether or not any differences were found in the set.

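In the second case diff may run more than once, and xargs returns a non-zero status (123 with GNU xargs) if any of those runs found a difference. This second form can be used in case you have a large number of files (or very long file/directory names) and hit the command argument limit (historically 128kB on Linux; recent kernels usually allow about 2MB, see getconf ARG_MAX).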
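Both forms above split the output of find on whitespace, so they can misbehave on paths that contain spaces, as raised in the comments. One possible variant, sketched here under the assumption of GNU diff and a find that supports -exec ... {} +, hands each path to diff as a single argument. The directory tree is invented for the demonstration.

```shell
# Hypothetical tree: one copy matches the reference, one differs,
# and one of the directory names contains a space.
demo=$(mktemp -d)
mkdir -p "$demo/dir1" "$demo/dir 2" "$demo/dir3"
printf 'ATOM 1\n' > "$demo/dir1/protein.mol2"
printf 'ATOM 1\n' > "$demo/dir 2/protein.mol2"
printf 'ATOM 2\n' > "$demo/dir3/protein.mol2"

# find passes each path as its own argument, so no word splitting occurs.
# With "+", find may still start several diff runs if the list is very long.
result=$(find "$demo" -name protein.mol2 \
    -exec diff -q --from-file "$demo/dir1/protein.mol2" {} +)
status=$?    # non-zero if any diff run reported a difference

printf '%s\n' "$result"
echo "find exit status: $status"
rm -rf "$demo"
```

With GNU tools, find . -name protein.mol2 -print0 | xargs -0 diff -q --from-file dir1/protein.mol2 should behave similarly.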

score 2 · Answer 2 · answered Oct 16 '15 at 09:05

Try

diff -q --from-file $(find . -name "protein.mol2" -print) ; echo $?

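The $( ) construct substitutes the list of files printed by find into the diff command line; the first file in that list becomes the --from-file reference.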
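As a rough illustration of what echo $? reports here, the sketch below (assuming GNU diff, and throwaway paths without spaces since the unquoted $( ) is word-split) runs the same command before and after one copy is changed.

```shell
# Hypothetical two-directory tree, created only for this illustration.
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
printf 'same\n' > "$demo/a/protein.mol2"
printf 'same\n' > "$demo/b/protein.mol2"

# Unquoted on purpose: each path found must become a separate argument.
diff -q --from-file $(find "$demo" -name protein.mol2)
all_same=$?       # 0: every file matches the first one listed

printf 'changed\n' > "$demo/b/protein.mol2"
diff -q --from-file $(find "$demo" -name protein.mol2)
one_differs=$?    # 1: at least one file differs from the first

echo "identical: $all_same, after change: $one_differs"
rm -rf "$demo"
```

An exit status of 2 would indicate trouble, such as an unreadable file, rather than a content difference.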

Archemar

Thanks, this works! But the second answer is more detailed. – DrDom Oct 16 '15 at 10:04

Chris Davies · Answer 3 · 2015-10-16T15:27:31.947

If you simply want to compare them for identity then you could consider using something like a checksum to tag the file based on its content:

find . -name 'protein.mol2' -exec cksum {} + | sort

You can save the output to a file. Lines with the first pair of numbers the same represent files that are (almost certainly) identical. This extension to the command will group files by identity:

find . -name 'protein.mol2' -exec cksum {} + |
sort |
while read c1 c2 file
do
    test "$c1-$c2" != "$o1-$o2" && echo
    echo "$file"
    o1="$c1" o2="$c2"
done

As a one-liner it's find . -name 'protein.mol2' -exec cksum {} + | sort | while read c1 c2 file; do test "$c1-$c2" != "$o1-$o2" && echo; echo "$file"; o1="$c1" o2="$c2"; done but it would probably be better off being put into a script file for reuse.
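If only a yes/no answer is needed, the same checksum idea can be reduced to counting distinct checksum/size pairs. This is a sketch rather than part of the original answer, and the sample tree is hypothetical.

```shell
# Hypothetical tree: two identical copies and one that differs.
demo=$(mktemp -d)
mkdir -p "$demo/dir1" "$demo/dir2" "$demo/dir3"
printf 'ATOM 1\n' > "$demo/dir1/protein.mol2"
printf 'ATOM 1\n' > "$demo/dir2/protein.mol2"
printf 'ATOM 2\n' > "$demo/dir3/protein.mol2"

# cksum prints "checksum size name"; keep the first two fields and
# count how many distinct pairs remain.
distinct=$(find "$demo" -name 'protein.mol2' -exec cksum {} + |
    awk '{ print $1, $2 }' | sort -u | wc -l)
distinct=$((distinct))    # drop the padding some wc implementations add

if [ "$distinct" -eq 1 ]; then
    echo "all files identical"
else
    echo "$distinct distinct versions found"
fi
rm -rf "$demo"
```

As with the grouping loop, a matching checksum and size means the files are almost certainly, not provably, identical.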
