
In a Linux shell, how can I compare two directories recursively and, for each pair of files (including symlinks and directories) that have the same location (including the name) in the two directories, the same contents, and different modification times, report which file is older? If the two files of a pair have the same age, there should be NO output for this pair. If the two files of a pair have different contents (and the same location), there should be no output for this pair either.

Examples of usage:

# First, ensure that /tmp/d1, /tmp/d2, ~/d3, and ~/f don't exist. Then:
$ cd /tmp
$ mkdir d1 d2 ~/d3
$ touch d1/f && sleep 1 && touch d2/f ~/d3/f ~/f
$ echo "g1" > d1/g
$ echo "g2" > d2/g
$ echo "g3" > ~/d3/g
$ compare_times.sh d1 d2
d1/f is older than d2/f
$ cd d1
$ compare_times.sh . ../d2
f is older than ../d2/f
$ cd ../d2
$ compare_times.sh . ../d1
../d1/f is older than f
$ cd ..
$ compare_times.sh /tmp/d1 d2
/tmp/d1/f is older than d2/f
$ compare_times.sh d1 /tmp/d2
d1/f is older than /tmp/d2/f
$ compare_times.sh d1 ~/d3
d1/f is older than ~/d3/f
$ cd d1
$ compare_times.sh ~ .
f is older than ~/f

Our comparison script compare_times.sh (naturally, you may opt for compare_times.zsh instead if you happen to program in zsh rather than dash or bash) should accept two arguments. In general, these can be absolute or relative paths of directories (including simply .), possibly ending with /.

The unquoted string ~ should be interpreted as the home directory, as usual, in either of the two arguments. Printing the full home directory in the output would probably be overkill; the concise ~ should do.
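One point worth making explicit is that an unquoted ~ never reaches the script. The calling shell replaces it with $HOME before the script starts, so the script can only abbreviate a leading $HOME component back to ~ when printing. Below is a minimal sketch under that assumption. The function name abbrev_home is hypothetical and not taken from any answer here.

```shell
#!/bin/bash
# abbrev_home (hypothetical helper): the calling shell has already expanded
# an unquoted ~ to $HOME, so we can only turn a leading $HOME component
# back into ~ for display.
abbrev_home() {
    local p=$1
    # Skip the rewrite when HOME is unset, empty, or just "/"
    if [[ -n $HOME && $HOME != / ]] &&
       [[ $p == "$HOME" || $p == "$HOME"/* ]]; then
        # "~" is literal inside double quotes, so no re-expansion happens here
        p="~${p#"$HOME"}"
    fi
    printf '%s\n' "$p"
}
```

For example, with HOME=/home/u, abbrev_home /home/u/d3/f prints ~/d3/f, and d1/f is printed unchanged. Stéphane Chazelas's comment below points out a drawback: the output becomes ambiguous when a directory is literally named ~.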

The output should be as concise as possible, as usual. For example:
- a path X//Y should be shortened to X/Y (for all nonempty X and all Y);
- a path X/./Y should be shortened to X/Y (for all nonempty X and all Y);
- a path X/ should be shortened to X (for all nonempty X);
- a path ./X should be shortened to X (for all nonempty X not starting with /);
- a leading run of 3 or more slashes should be shortened to a single /.
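The shortening rules above can be applied mechanically to each printed path. Below is a minimal bash sketch that uses only parameter expansion. The name tidy_path is hypothetical. The sketch assumes that a leading run of exactly two slashes should be kept, since POSIX leaves the meaning of a leading // implementation-defined and the rules only mention runs of 3 or more. It also drops a "./" that directly follows a leading slash, because /./X and /X name the same file.

```shell
#!/bin/bash
# tidy_path (hypothetical helper): shorten one path using the rules above.
tidy_path() {
    local p=$1 lead= two='//' one='/' dot='/./'
    # Leading slashes: keep exactly "//" when followed by a non-slash;
    # any other leading run (including 3 or more slashes) becomes "/".
    if [[ $p == //[!/]* ]]; then
        lead=//
    elif [[ $p == /* ]]; then
        lead=/
    fi
    while [[ $p == /* ]]; do p=${p#/}; done
    # X//Y -> X/Y: collapse interior runs of slashes
    while [[ $p == *//* ]]; do p=${p//"$two"/"$one"}; done
    # X/./Y -> X/Y: drop "." components between slashes
    while [[ $p == */./* ]]; do p=${p//"$dot"/"$one"}; done
    # ./X -> X: drop leading "." components while something follows them
    while [[ $p == ./?* ]]; do p=${p#./}; done
    # X/ -> X: at most one trailing slash remains after the collapsing above
    p=${p%/}
    printf '%s\n' "$lead$p"
}
```

For example, tidy_path /tmp//d1/ prints /tmp/d1, and tidy_path ./ prints . (the X/ rule with X being .).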

(Rationale: We prefer concise output because it will later be inspected manually to find the reasons for the different timestamps. After inspection, the user plans to equalize the timestamps of the otherwise equal files manually, depending on the result. We found that superfluous characters produce longer output lines and make the user spend time on information unrelated to his/her primary question. Longer output lines also wrap at the window border more often, and selecting a path that spans two lines with a mouse can take slightly longer than selecting a single-line path.)

A similar question with answers was asked earlier on one of the SE sites, but I can no longer find it. Its answers included one solution with a bash script taking four arguments and another with a zsh script. If anyone finds it and it still exists, please feel free to mark this question as a duplicate of it.

AlMa1r
  • Many shells expand an unquoted ~ to the value of $HOME, but ~ is also otherwise a valid name for a file or directory. To me, it would be as wrong for your script to treat ~ as meaning $HOME as it would be for, say, rm. For instance, what should your script output after mkdir '~~' '~'; echo a > '~~/file'; echo b > '~/file'; your-script '~' '~~'? – Stéphane Chazelas Jan 27 '24 at 14:56
  • Comments have been moved to chat; please do not continue the discussion here. Before posting a comment below this one, please review the purposes of comments. Comments that do not request clarification or suggest improvements usually belong as an answer, on Meta, or in chat. Comments continuing discussion may be removed. – Jeff Schaller Jan 27 '24 at 17:25

3 Answers


This script compares two directory trees as per your requirements. File names may contain any characters. (If you don't have GNU comm, find and sort, or equivalent tools that can handle NUL-terminated records, you'll have to forgo the ability to handle newlines in file names.)

#!/bin/bash
#
d1=$1 d2=$2

# Drop trailing slashes so that "d1/" does not produce "d1//f" (a lone "/" is kept)
while [[ $d1 == ?*/ ]]; do d1=${d1%/}; done
while [[ $d2 == ?*/ ]]; do d2=${d2%/}; done

# Identify the set of matching file names
LC_ALL=C comm -z -12 \
    <( cd -P -- "$d1" && find . -type f -print0 | LC_ALL=C sort -z ) \
    <( cd -P -- "$d2" && find . -type f -print0 | LC_ALL=C sort -z ) |
while IFS= read -rd '' fn
do
    # Tidy the filenames
    fn=${fn#./}
    f1="$d1/$fn"
    f2="$d2/$fn"

    # Compare content
    if cmp -s -- "$f1" "$f2"
    then
        # Report older/newer file pairs (not those the same)
        [[ "$f1" -ot "$f2" ]] && printf '%s is older than %s\n' "${f1#./}" "${f2#./}"
        [[ "$f2" -ot "$f1" ]] && printf '%s is older than %s\n' "${f2#./}" "${f1#./}"
    fi
done
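The sketch below runs the comm -z -12 stage on its own to show what it contributes. It feeds comm two in-memory lists. The functions list1, list2 and common_names are hypothetical stand-ins for the two find commands in the script. Both lists must be sorted with the same collation, which is why LC_ALL=C is set on both sort and comm. Only the names present in both lists come out. Because records are NUL-terminated, a name containing spaces (or even a newline) passes through intact.

```shell
#!/bin/bash
# list1, list2 and common_names are hypothetical; in the script the lists
# come from `find . -type f -print0` run inside each tree.
list1() { printf '%s\0' ./f ./g ./only-in-1 './name with spaces'; }
list2() { printf '%s\0' ./g ./f ./only-in-2 './name with spaces'; }

# -1 and -2 suppress the names unique to the first and second input,
# leaving only the common ones; -z reads and writes NUL-terminated records.
common_names() {
    LC_ALL=C comm -z -12 \
        <(list1 | LC_ALL=C sort -z) \
        <(list2 | LC_ALL=C sort -z)
}

common_names | while IFS= read -rd '' fn
do
    printf '[%s]\n' "$fn"
done
```

This prints [./f], [./g] and [./name with spaces], one per line. The names ./only-in-1 and ./only-in-2 are dropped.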

Chris Davies
  • The -print0 aspect is brilliant, thank you for that. It never ceases to amaze me why people put SPACE characters in directory names. – J_H Jan 27 '24 at 23:01
  • First, thank you a lot, and thank you once again. Second, tidying up doesn't work: mkdir d1 d2 && touch d1/f && sleep 1 && touch d2/f && cd d1 && compare_times.sh . ../d2 yields ./f is older than ../d2/f, whereas we expect f is older than ../d2/f. Third, the second comparison can be skipped if the first one succeeded: if [[ "$d1/$fn" -ot "$d2/$fn" ]]; then printf '%s is older than %s\n' "$d1/$fn" "$d2/$fn"; else if [[ "$d2/$fn" -ot "$d1/$fn" ]]; then printf '%s is older than %s\n' "$d2/$fn" "$d1/$fn"; fi; fi. – AlMa1r Jan 28 '24 at 01:00
  • Finally, why do we say printf '%s is older than %s' "$d1/$fn" "$d2/$fn" and not simpler echo "$d1/$fn" is older than "$d2/$fn"? – AlMa1r Jan 28 '24 at 01:03
  • You specified the directory . to search in, so the file that matched is ./f. There's nothing stopping you from taking this code and adapting it further in your own environment. – Chris Davies Jan 28 '24 at 07:58
  • I felt mine was easier to read. – Chris Davies Jan 28 '24 at 07:58
  • Consider a directory called -n. Some versions of echo would parse that as an option. Using printf guarantees dependability. – Chris Davies Jan 28 '24 at 08:00
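The sketch below illustrates the point about echo versus printf. Suppose the comparison runs from inside d1 and there is a top-level file literally named -n. Once the leading ./ has been stripped, its tidied name is just -n, so that is echo's first argument. Bash's builtin echo takes it as an option (suppress the trailing newline) and does not print it. Some echo implementations, such as dash's, also interpret backslash sequences in their arguments, which printf '%s' never does. The function names report_echo and report_printf are hypothetical and used only for this comparison.

```shell
#!/bin/bash
# report_echo and report_printf are hypothetical names for this comparison.
report_echo()   { echo "$1" is older than "$2"; }
report_printf() { printf '%s is older than %s\n' "$1" "$2"; }

# A top-level file named "-n", compared from inside d1
report_echo   -n ../d2/-n    # bash treats "-n" as an option: the name and the newline are lost
report_printf -n ../d2/-n    # prints the name as data
```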