
This is a basic bash question, but I could not find a solution.

I have many subdirectories containing identically named files, and I want to check whether all of those files are identical.

I can list these files with

find . -name "protein.mol2"

I know that one file can be used as the reference for diff with --from-file:

diff -q --from-file dir1/file dir2/file dir3/file; echo $?

How do I pipe the output of find to diff?

Erathiel
DrDom

3 Answers


The --from-file option compares one file against many files; it does not read a list of file names from a file, as something like tar --files-from does. There is an analogous --to-file option, and which of the two you use depends on the "direction" of the change you want reported. Since you are using -q, which only reports whether files differ, the direction should not matter here.

I assume you have a reference file, and you wish to compare it to a set of identically named files, so either of these should work:

diff -q --from-file dir1/protein.mol2 $(find . -name protein.mol2)
find . -name protein.mol2 | xargs diff -q --from-file dir1/protein.mol2

In the first case diff will run only once, and its exit code will reflect whether or not any differences were found in the set.

In the second case diff may run more than once. Use this form if you have a large number of files (or very long file or directory names) and hit the command-line argument limit. That limit was historically 128 kB on Linux and is typically about 2 MB on current kernels; getconf ARG_MAX reports the value on your system.
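
As a rough sketch of the xargs form made safe for paths containing spaces: this assumes GNU diff plus a find and xargs that support -print0 and -0 (common extensions, not POSIX). The demo tree and its file contents are invented for illustration. With GNU xargs, a nonzero status here means at least one diff invocation reported a difference or failed.

```shell
#!/bin/sh
# Hypothetical demo tree: three copies of protein.mol2, one of which differs.
demo=$(mktemp -d)
mkdir -p "$demo/dir1" "$demo/dir 2" "$demo/dir3"
printf 'ATOM 1\n' > "$demo/dir1/protein.mol2"
printf 'ATOM 1\n' > "$demo/dir 2/protein.mol2"
printf 'ATOM 2\n' > "$demo/dir3/protein.mol2"

# NUL-delimited names survive spaces and newlines in paths.
# The subshell keeps the cd local; $? is the status of xargs.
out=$(
    cd "$demo" &&
    find . -name protein.mol2 -print0 |
        xargs -0 diff -q --from-file dir1/protein.mol2
)
status=$?

# diff -q prints one line per file that differs from the reference.
printf '%s\n' "$out"
echo "exit status: $status"

rm -rf "$demo"
```

Because xargs may split the list into several diff runs, test the status for zero versus nonzero rather than for a specific value.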

mr.spuratic

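Try:

diff -q --from-file $(find . -name "protein.mol2" -print) ; echo $?
  • The $( ) construct inserts the list of files printed by find into the command line. The first file in the list becomes the argument of --from-file, and the remaining files are compared against it.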
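
One caveat is that the shell word-splits the result of $( ), so paths containing whitespace get broken apart. A possible workaround, sketched here under the assumption of GNU diff, is to let find hand the paths to a small inline sh script as separate arguments. The demo tree is invented for illustration, and the first path find passes in serves as the reference.

```shell
#!/bin/sh
# Hypothetical demo tree: two identical copies, one under a directory with a space.
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b c"
printf 'ATOM 1\n' > "$demo/a/protein.mol2"
printf 'ATOM 1\n' > "$demo/b c/protein.mol2"

# find passes each path as its own argument, so no word splitting occurs.
# Inside the inline script, $1 is the reference file and the remaining
# arguments are the files compared against it.
out=$(find "$demo" -name protein.mol2 -exec sh -c '
    ref=$1
    shift
    diff -q --from-file "$ref" "$@"
' sh {} +)
status=$?

# Empty output and status 0 mean no file differed from the reference.
printf 'output: [%s]\n' "$out"
echo "exit status: $status"

rm -rf "$demo"
```

With a very large number of matches, find may run the inline script more than once, each time with a different reference file, so this sketch suits moderately sized trees.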
Archemar

If you simply want to compare them for identity then you could consider using something like a checksum to tag the file based on its content:

find . -name 'protein.mol2' -exec cksum {} + | sort

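You can save the output to a file. On each line cksum prints a checksum, a byte count, and the file name; lines whose first two numbers match represent files that are (almost certainly) identical. This extension to the command groups the files by identity, separating the groups with blank lines: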

find . -name 'protein.mol2' -exec cksum {} + |
sort |
while read -r c1 c2 file
do
    test "$c1-$c2" != "$o1-$o2" && echo
    echo "$file"
    o1="$c1" o2="$c2"
done

As a one-liner it is:

find . -name 'protein.mol2' -exec cksum {} + | sort | while read -r c1 c2 file; do test "$c1-$c2" != "$o1-$o2" && echo; echo "$file"; o1="$c1" o2="$c2"; done

It would probably be better to put it into a script file for reuse.
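
If only a yes/no answer is needed, a variation on the same idea is to count the distinct checksum and size pairs. This is a minimal sketch rather than part of the original answer; the demo tree and the variable name distinct are made up for illustration.

```shell
#!/bin/sh
# Hypothetical demo tree: three copies of protein.mol2, one of which differs.
demo=$(mktemp -d)
mkdir -p "$demo/dir1" "$demo/dir2" "$demo/dir3"
printf 'ATOM 1\n' > "$demo/dir1/protein.mol2"
printf 'ATOM 1\n' > "$demo/dir2/protein.mol2"
printf 'ATOM 2\n' > "$demo/dir3/protein.mol2"

# cksum prints "<checksum> <byte count> <path>". Keep only the first two
# fields, then count how many distinct (checksum, size) pairs remain.
distinct=$(find "$demo" -name protein.mol2 -exec cksum {} + |
    awk '{ print $1, $2 }' | sort -u | wc -l)
distinct=$((distinct))   # strip padding that some wc implementations add

if [ "$distinct" -le 1 ]; then
    echo "all copies are (almost certainly) identical"
else
    echo "found $distinct distinct versions"
fi

rm -rf "$demo"
```

A matching checksum and size is strong evidence rather than proof of identity; cmp or diff on the candidate files settles it definitively.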
Chris Davies