Consider the following script compare_times.sh (thanks to http://superuser.com/a/1780500):

#!/bin/bash
# Syntax: compare_times.sh directory_1 directory_2
# Semantics: for pairs of files with the same path at any depth in both directory_1 and directory_2, compare the file modification times and, if they differ, say which of the two files is older.
case "$2" in
    /*)
        # $2 is an absolute path
        cd $1
        find . -type f -exec bash -c "if [[ -f \"{}\" && -f \"$2/{}\" ]]; then if (cmp -s \"{}\" \"$2/{}\") then if [[ \"{}\" -ot \"$2/{}\" ]]; then echo \"$1/{} is older than $2/{}\"; else if [[ \"$2/{}\" -ot \"{}\" ]]; then echo \"$2/{} is older than $1/{}\"; fi; fi; fi; fi" \;;;
    *)
        # $2 is a relative path
        WORKING_DIR=$PWD
        cd $1
        find . -type f -exec bash -c "if [[ -f \"{}\" && -f \"$WORKING_DIR/$2/{}\" ]]; then if (cmp -s \"{}\" \"$WORKING_DIR/$2/{}\") then if [[ \"{}\" -ot \"$WORKING_DIR/$2/{}\" ]]; then echo \"$1/{} is older than $2/{}\"; else if [[ \"$WORKING_DIR/$2/{}\" -ot \"{}\" ]]; then echo \"$2/{} is older than $1/{}\"; fi; fi; fi; fi" \;
esac

Computation time and potential vulnerabilities in path names put aside, the script seems to work fine. However, the output has superfluous ./:

$ pwd
/tmp
$ ls --full-time ad bd | cut -d ' ' -f 6-
ad:

2023-05-14 15:38:02.707216583 +0200 f

bd:

2023-05-14 15:38:06.835165122 +0200 f
$ compare_times.sh ad bd
ad/./f is older than bd/./f
$ compare_times.sh /tmp/ad bd
/tmp/ad/./f is older than bd/./f
$ compare_times.sh ad /tmp/bd
ad/./f is older than /tmp/bd/./f
$ cd ad
$ compare_times.sh . ../bd
././f is older than ../bd/./f
$ compare_times.sh . /tmp/bd
././f is older than /tmp/bd/./f
$ cd ../bd
$ compare_times.sh ../ad .
../ad/./f is older than ././f
$ compare_times.sh /tmp/ad .
/tmp/ad/./f is older than ././f

How can I get rid of these ./ to clean up the output and make it more readable? For the commands issued above, the expected output is:

$ compare_times.sh ad bd
ad/f is older than bd/f
$ compare_times.sh /tmp/ad bd
/tmp/ad/f is older than bd/f
$ compare_times.sh ad /tmp/bd
ad/f is older than /tmp/bd/f
$ cd ad
$ compare_times.sh . ../bd      
f is older than ../bd/f
$ compare_times.sh . /tmp/bd
f is older than /tmp/bd/f
$ cd ../bd
$ compare_times.sh ../ad .
../ad/f is older than f
$ compare_times.sh /tmp/ad .
/tmp/ad/f is older than f

4 Answers

0

That is really terrible code.

The first step to make this MUCH easier to read and understand would be to split it into two scripts. But it can be done with just one script, too:

#! /bin/bash -

if [[ "$#" -eq 2 ]]; then

[[ -d "$1" && -d "$2" ]] || exit 2

scriptpath="$(realpath -- "$0")"
d1_path="$(realpath -- "$1")"
d2_path="$(realpath -- "$2")"
PWD_ORI="$(realpath -- "$PWD")"
cd -- "$d1_path" || exit 1
find . -type f -exec "$scriptpath" "$d1_path" "$d2_path" {} "$PWD_ORI" \;

elif [[ "$#" -eq 4 ]]; then

[[ -d "$1" && -d "$2" && -f "$3" ]] || exit 2

d1_path="$1"
d2_path="$2"
file_relpath="$3"
file_relpath="${file_relpath#./}"
f1_path="${d1_path}/${file_relpath}"
f2_path="${d2_path}/${file_relpath}"
PWD_ORI="$4"

if [[ -f "$f1_path" && -f "$f2_path" ]]; then
    if cmp -s -- "$f1_path" "$f2_path"; then
        # Strip "$PWD_ORI/" (slash included) so paths below the original directory print as relative paths.
        if   [[ "$f1_path" -ot "$f2_path" ]]; then
            printf '%s\n' "'${f1_path#"$PWD_ORI"/}' is older than '${f2_path#"$PWD_ORI"/}'"
        elif [[ "$f2_path" -ot "$f1_path" ]]; then
            printf '%s\n' "'${f2_path#"$PWD_ORI"/}' is older than '${f1_path#"$PWD_ORI"/}'"
        fi
    fi
fi

fi
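
The prefix stripping used in the script above (${file_relpath#./}) is also enough to produce the exact output format asked for in the question. The following is only a minimal sketch: join_path is a made-up helper name, not part of the script above, and it assumes the directory argument is passed as the user typed it (for example ., ../bd or /tmp/ad).

```shell
#!/bin/bash
# Hypothetical helper: join a directory and a path printed by "find ." for display.
# $1: directory as typed by the user; $2: relative path, possibly starting with "./"
join_path() {
    local dir=$1 rel=${2#./}                 # drop one leading "./" from the find output
    case $dir in
        .) printf '%s\n' "$rel" ;;           # current directory: print the bare relative path
        /) printf '/%s\n' "$rel" ;;          # root directory: avoid printing "//"
        *) printf '%s\n' "${dir%/}/$rel" ;;  # anything else: drop one trailing slash, then join
    esac
}

join_path ad ./f
join_path . ./f
join_path ../bd ./f
```

Only a single leading ./ and a single trailing slash are removed; directory arguments such as ./ad are printed unchanged.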

Hauke Laging
  • Thx! I'm testing this now. What does the dash - in #! /bin/bash - mean? –  May 17 '23 at 17:25
  • @AlMa0 "A -- signals the end of options and disables further option processing. Any arguments after the -- are treated as filenames and arguments. An argument of - is equivalent to --." I guess this is a protection against the rather extreme case that a script's file name starts with - and that script is in the $PATH. – Hauke Laging May 17 '23 at 20:30
  • I found “Any arguments after the -- are treated as filenames and arguments” in a man page for bash. The quotation is strange. To see this, take an invocation line bash -- with a single-word argument without spaces. This invocation exhibits the called program bash, its first argument --, and its second argument . By quotation, is treated as a filename and an argument. But we already knew that is an argument. So why not say “Any arguments after the -- are treated as filenames”? The addendum “and arguments” carries no additional information. –  May 17 '23 at 22:55
  • If you object and say that “and” in “filenames and arguments” should be interpreted as a logical “or” per argument, then the quotation becomes completely tautological. Any argument of any program is, logically speaking, 〈anything〉 or argument. For example, any argument is a crocodile or an argument. This is because whichever argument you give me, it will still be an argument, even if it is nothing else in addition, such as a crocodile or a filename. So if we dare to interpret the “and” as an “or” per argument, the quotation would be a tautology and could be dropped altogether. –  May 17 '23 at 23:04
  • In summary, I don't understand the quotation. –  May 17 '23 at 23:09
  • @AlMa0 I really don't understand why you feel the need to have such a discussion about the wording of the man page. You obviously misunderstand "argument to bash" vs. "argument to the script". If there are two elements then they are both arguments to bash but only the second one is an argument to the script. That's not wording but a hard technical difference. – Hauke Laging May 17 '23 at 23:18
  • No because in the context of the quotation in the man page, there are no scripts. That is, they don't talk about #! /bin/bash - or #! /bin/bash --. I'm trying to understand what they mean originally in the man page. It seems to me that what they could mean is simply A -- signals the end of options and disables further option processing. Any arguments after -- are not interpreted as invocation options of bash but as a command string or a file name. Is this what they really mean? That would make some sense, but they don't say it this way. –  May 17 '23 at 23:34
  • @AlMa0 Of course, there are. “Any arguments after the -- are treated as filenames and arguments” The file is the script. Maybe you don't know how the shebang line works. If you run a script ./script then the kernel detects it is not a binary and takes the elements after #! of the first line in the file and creates a command line of them and the filename. Usually the only element is /bin/bash so the command line the kernel runs is /bin/bash ./script. Or, in this case, /bin/bash - ./script – Hauke Laging May 18 '23 at 13:17
  • Oh. That's what they want to say. So the generic “filenames and arguments” means, specifically in that position, “the name of the script file to run and its arguments”. Thanks! So, e.g., in our case, #!/bin/bash -- would yield /bin/bash -- ./script, wouldn't it? –  May 18 '23 at 13:32
  • @AlMa0 Yes, that is correct (and was already part of my last comment). – Hauke Laging May 18 '23 at 13:57
  • Notice the double dash. I wanted to make sure we understand each other. –  May 18 '23 at 14:11
  • @Stéphane Chazelas Why scriptpath="$(realpath -- "$0")"? What could go wrong with a simpler scriptpath=$0? What could go wrong with a simpler find . -type f -exec $0 … or, say, find . -type f -exec "$0" …? –  Jun 28 '23 at 01:21
  • @AlMa0, you need an absolute path because after cd, a relative $0 would no longer refer to the same file. – Stéphane Chazelas Jun 28 '23 at 03:54
  • @StéphaneChazelas Thx; I see. So we need at least $(realpath -- $0). You introduce double quotation marks: "$(realpath -- "$0")". What could go wrong without them, i.e., with scriptpath=$(realpath -- $0)? –  Jun 28 '23 at 14:24
  • @AlMa0 about missing quotes, see for instance Security implications of forgetting to quote a variable in bash/POSIX shells. Note that this is not my answer, my own answer is there and uses zsh which doesn't have most of bash's issues. – Stéphane Chazelas Jun 28 '23 at 16:09
  • @StéphaneChazelas I see I've got quite a lot to learn. As for my “smth. is wrong” comment, it's directed more to the original author than to you. As for zsh, it's postponed: I wish to try to understand this answer first before tackling other answers (and, e.g., switching from bash to zsh, which is completely new to me). –  Jun 28 '23 at 23:56
  • @HaukeLaging Something is wrong. Open a fresh xterm with bash and do http://pastebin.com/bKiBstav , where cmp_times.sh is the name of the script. This yields '/home/username/f' is older than '/f'. The last '/f' is wrong; it should be f or at least /tmp/d/f. Might I kindly ask you to look into your answer again? –  Jun 30 '23 at 22:52
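
The point about - and -- discussed in the comments above can be checked by hand. The sketch below is only an illustration: the function name shebang_dash_demo, the script name -x and the temporary directory are all made up for the demo. It creates a script whose name looks like an option and runs it through bash with each of the two end-of-options markers.

```shell
#!/bin/bash
# Sketch only: "-x" is a made-up script name chosen because it looks like a bash option.
shebang_dash_demo() (
    tmp=$(mktemp -d) || exit 1
    trap 'rm -rf -- "$tmp"' EXIT             # clean up when this subshell exits
    cd -- "$tmp" || exit 1
    # The one-line script reports its own name and its argument count.
    printf '%s\n' 'echo "ran $0 with $# argument(s)"' > ./-x
    bash -- -x one two   # "--" ends option processing, so "-x" is taken as the script file
    bash -  -x one two   # a lone "-" is documented as equivalent to "--"
)

shebang_dash_demo
```

Without the marker, bash -x one two would treat -x as the xtrace option and try to run a script named one instead.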
0

Sorry, I am not fond of approaches built on -exec. I will show a batch-style approach instead, with records in a sorted stream: first you collect (name, size, age), then you apply the logic. The output of the following code is ready to pretty-print, or to source into an interpreter with ad hoc function names (e.g. only would cp "$2/$1" "$3/$1", differ would overwrite, older would touch, and same would do nothing).

The range of characters allowed in file names can trip up any parser. This is a serious concern wherever data has to be parsed at an interface, typically the invocation of cmp, whatever the style of solution. With high standards regarding injection and unusual characters, I am impressed by the way zsh handles strings, as shown in another answer.

#!/bin/bash
# <$1: mandatory directory
# <$2: mandatory directory
# <$FS: optional separator (default ";")
# <$RS: optional separator (default "\n")
# >stdout: differ|older|same|only relative_path directory directory
[ -d "$1" ] && [ -d "$2" ] && [[ "$1" != "$2" ]] || exit 8
FS="${FS:-;}"
RS="${RS:-$'\n'}"
(
  find "$1" -type f -printf "%T@$FS%s$FS$1$FS%P$RS";
  find "$2" -type f -printf "%T@$FS%s$FS$2$FS%P$RS";
) | # mod time ; size ; directory ; relative path
LC_COLLATE=C sort -t "$FS" -k4 -k1,1n |
awk 'NF!=4{ print "ERROR: lost parsing "FNR":"$0 >"/dev/stderr"
    exit 8
  }
  function alter(dir) { return (dir == d1 ? d2 : d1) }
  function quote(s){ return SQ gensub(SQ,BQ,"g",s) SQ }
  # differ 'path' 'dir1' 'dir2' : older    in dir1 than in dir2 and have different content
  # older  'path' 'dir1' 'dir2' : older    in dir1 than in dir2 and have same content
  # same   'path' 'dir1' 'dir2' : same age in dir1 than in dir2 and have same content
  pname && pname == $4 {
    cmp = "cmp -s "quote(pdir"/"pname)" "quote(alter(pdir)"/"pname)
    print( (psize == $2 && !system(cmp)) ? (ptime == $1 ? "same" : "older") : "differ",
      quote(pname), quote(pdir), quote(alter(pdir)))
    pname = ""; next
  }
  # only  'path' 'dir1' 'dir2' : path exist in dir1 not    in dir2
  pname { print("only", quote(pname), quote(pdir), quote(alter(pdir)))
    pname = ""
  }
  END { if (pname) print("only", quote(pname), quote(pdir), quote(alter(pdir)))
  }
  { pname = $4; pdir = $3; psize = $2; ptime = $1;
  }
' d1="$1" d2="$2" FS="$FS" RS="$RS" OFS=" " SQ="'" BQ="'\\\\\\\\''"
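
As mentioned above, the output lines are meant to be usable as commands once matching functions exist. The sketch below shows what such handlers could look like; the function bodies are assumptions of mine (they only print a report instead of copying or touching anything), and classify.sh is a made-up file name for the script above.

```shell
#!/bin/bash
# Hypothetical handlers, one per verb printed by the script above.
# Each one receives: relative_path directory other_directory
only()   { printf 'only in %s: %s\n' "$2" "$1"; }
differ() { printf 'content differs: %s (older in %s)\n' "$1" "$2"; }
older()  { printf '%s/%s is older than %s/%s\n' "$2" "$1" "$3" "$1"; }
same()   { :; }   # same age and same content: nothing to report

# Assumption: the script above is saved as ./classify.sh and its quoting is trusted.
# eval "$(./classify.sh ad bd)"
```

Evaluating generated text is only as safe as the quoting that produced it, so this is best kept to directories whose file names you control.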

With the same approach but less opportunity for injection (though without detecting files that exist on one side only or that differ), fdupes collects much of the information for you, including age and similarity, which you then sort in the pipeline as above. If you want to add functionality, you can merge other collected file system information into the same convenient format.

#!/bin/bash
# <$1: mandatory directory
# <$2: mandatory directory
# >stdout: "older" relative_path directory directory
[ -d "$1" ] && [ -d "$2" ] && [[ "$1" != "$2" ]] || exit 8
fdupes -q -t -r "$1" "$2" |
awk '
  !NF{same++; next} # similarity id
  NF > 1 && $1" "$2 ~ "....-..-.. ..:.." {
    sub(" "d1"/"," {D1} "); sub(" "d2"/"," {D2} ")
    printf(ORS"%06d %s", same,$0) 
    next}
  {printf("\\n%s",$0)} # newline in name    
' d1="$1" d2="$2" | LC_COLLATE=C sort -k5 -k2,3 |
# same yyyy-mm-dd HH:MM {D} name 
awk '
  function direc(dir) { return (dir == "{D1}" ? d1 : d2) }
  function alter(dir) { return (dir == "{D1}" ? d2 : d1) }
  pname && pname == $5 {
    if ( psame == $1 && ptime != $2" "$3 )
      print("older", pname, direc(pdir), alter(pdir))
    pname = ""; next
  }
  { pname = $5; pdir = $4; psame = $1; ptime = $2" "$3 } 
' d1="$1" d2="$2"
  • The -a binary [ operator is deprecated as it makes for unreliable expressions. In bash, [ -d "$1" -a -d "$2" ], for instance, fails with an error if $1 is =. – Stéphane Chazelas May 26 '23 at 20:51
  • Note that find "$1" ... fails or won't do what you want if $1 starts with - or is one of !, (, )... – Stéphane Chazelas May 26 '23 at 20:52
  • %C@ gives the change-status time, while the OP was using -ot which compares the last-modification time (the age of the contents of the files). – Stéphane Chazelas May 26 '23 at 20:53
  • [[ $1 != $2 ]] in bash, like in ksh (unlike zsh), checks whether $1 matches the pattern in $2; you need [[ $1 != "$2" ]] for byte-to-byte comparison (or [ "$1" != "$2" ]). – Stéphane Chazelas May 26 '23 at 20:54
  • It looks like that code assumes directory names don't contain ; characters. – Stéphane Chazelas May 26 '23 at 20:55
  • It seems the OP only wants to compare the files with same relative path and same exact contents (cmp -s). It doesn't look like your approach compares file contents. – Stéphane Chazelas May 26 '23 at 20:58
  • Also assumes file paths don't contain newline characters. – Stéphane Chazelas May 26 '23 at 20:58
  • Also beware that outside the C locale, on several systems, sort order is not guaranteed to be deterministic if file paths can't be decoded as text in the current locale or contain characters with undefined order, so you could end up with lines for identical file paths not next to each other. – Stéphane Chazelas May 26 '23 at 21:05
  • Thanks @StéphaneChazelas - I improved the script accordingly, except for the acceptance of all characters in file names. – Thibault LE PAUL May 26 '23 at 21:59
  • That system("cmp -s " quote(pdir "/" pname) ...) introduces a command injection vulnerability; you'd have to do proper single quote quoting for sh (also handling file names with single quotes in them) to avoid it. – Stéphane Chazelas May 26 '23 at 22:03
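
The single-quote quoting for sh mentioned in the last comment can be sketched in bash as follows. shquote is a made-up helper name, and this is only one common way to do it: wrap the string in single quotes and rewrite every embedded single quote as '\'' (close the quote, add an escaped quote, reopen the quote).

```shell
#!/bin/bash
# Hypothetical helper: quote one string so that sh reads it back as a single literal word.
shquote() {
    local s=$1 q="'" esc="'\\''"              # esc holds the four characters '\''
    printf '%s\n' "$q${s//$q/$esc}$q"
}

shquote "it's"
```

A round trip such as eval "copy=$(shquote "$name")" should leave copy equal to name, even when name contains spaces, single quotes or $(...).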
0

If switching to zsh is an option, then using the same approach, it becomes much easier and a lot more reliable:

#! /bin/zsh -
f1=( ${1?}/**/*(ND.) )  f2=( ${2?}/**/*(ND.) )
f1=( ${f1#$1/}       )  f2=( ${f2#$2/}       )

for f in ${f1:*f2}; do
  f1=$1/$f f2=$2/$f
  if cmp -s -- $f1 $f2; then
    if [[ $f1 -ot $f2 ]]; then
      print -r -- ${(q+)f1} is older than ${(q+)f2}
    elif [[ $f2 -ot $f1 ]]; then
      print -r -- ${(q+)f2} is older than ${(q+)f1}
    fi
  fi
done
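
For readers who stay with bash, the set of relative paths present in both trees (what ${f1:*f2} computes above) can be approximated as sketched below. common_files is a made-up function name; the sketch assumes a find that supports -print0, and it prints one path per line, so it is only suitable for display when file names contain no newlines.

```shell
#!/bin/bash
# Hypothetical helper: list the relative paths of regular files found in both $1 and $2.
common_files() {
    local d1=$1 d2=$2 f
    # find runs inside $1 in a subshell; the loop itself stays in the caller's directory,
    # so a relative $2 is still resolved from where the function was called.
    ( cd -- "$d1" && find . -type f -print0 ) |
    while IFS= read -r -d '' f; do
        f=${f#./}                             # drop the leading "./" added by find
        if [[ -f $d2/$f ]]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Called as common_files ad bd, it prints one relative path per line, in whatever order find returns them.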

-2

Use sed, the stream editor, to replace /./ with / everywhere in the output. The expression for this is s/\/\.\//\//g:

compare_times.sh ad bd | sed -e  "s/\/\.\//\//g"

To apply sed inside the script itself, put the existing work into a bash function in the script and pipe that function's output through sed:

function my_function(){
... previous script goes here ...
}
my_function "$1" "$2" | sed -e  "s/\/\.\//\//g"
Hauke Laging
  • You need to escape the dots for sed, as a dot matches any single character in regexes. – ilkkachu May 14 '23 at 15:47
  • You are correct; without escaping the period, this would fail and replace /x/ with / where x is any single valid character in a path. When I tested this, my test cases did not include such a path. Very nice! thanks. I'm updating the post. – Patrick Callahan May 14 '23 at 16:36
  • You may also want to point out and correct the obvious issues with the user's use of find (several code injection vulnerabilities and at least one syntax error). – Kusalananda May 14 '23 at 16:47
  • Thx! First, we also need to strip off the front ./, if present. Second, is it possible to redirect the output of commands as echo \"$1/{} is older than $2/{}\" to sed (with arguments -e \"s/\/\.\//\//g\" or whatever) inside my script? –  May 14 '23 at 23:02
  • @Kusalananda Typo corrected. As for code injection, I'm unsure how to fight it. Though, I plan to use the code only on my private stuff I know (say, a directory and its almost-copy). So it's not that necessary to fight code injection if readability suffers even further. –  May 14 '23 at 23:08
  • I would use an alternate separator, e.g. s:/\./:/:g – Archemar May 15 '23 at 08:19
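
Putting the comments above together (an alternate separator, plus stripping a leading ./), a filter for the script's output could look like the sketch below. strip_dot_segments is a made-up name, and the third expression assumes the exact wording " is older than " used by compare_times.sh; file names that themselves contain /./ or that wording would be altered too.

```shell
#!/bin/bash
# Hypothetical filter for the output of compare_times.sh.
strip_dot_segments() {
    sed -e 's:/\./:/:g' \
        -e 's:^\./::' \
        -e 's:\( is older than \)\./:\1:'
}

printf '%s\n' '././f is older than ../bd/./f' | strip_dot_segments
```

The first expression collapses /./ inside a path, the second removes a ./ at the start of the line, and the third removes a ./ at the start of the second path.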