1

I have many identical files on two (local) devices, and I have renamed some of them on only one device (A). I found the following way to rename the identical files on the other device (B) to match the new names on device A, so that in the end both the files and the file names are identical, without deleting or copying anything, only renaming.

I found an interesting answer here from Hai Vu (the second answer).

The script works, but there are two problems:

  1. spaces in filenames: the names are truncated at the spaces, and
  2. apostrophes (') in filenames.

The script (rename-identical-files.awk) is

/^total/ {next} # Skip the first line (which contains the total, of ls -l)

{
    if (name[$5] == "") {
        name[$5] = $NF
        print "# File of size", $5, "should be named", $NF
    } else {
        printf "mv '%s' '%s'\n", $NF, name[$5]
    }
}

and it is called from command line (in the destination folder):

awk -f ~/rename-identical-files.awk <(ls -l /model-folder-path) <(ls -l) | sh 

The problem seems to be the ls command, which has limitations (at least) with spaces and other characters in filenames.
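
Both failures can be reproduced without touching any files. The following is a minimal sketch using a made-up ls -l line and made-up file names (none of them come from the real folders): awk splits on whitespace, so $NF is only the last word of the name, and the printf template wraps names in single quotes without escaping an apostrophe inside them.

```shell
# Hypothetical "ls -l" line for a file called "my file.txt" (illustration only).
line='-rw-r--r-- 1 user group 42 Jan  3 16:20 my file.txt'

# awk splits on whitespace, so the last field is only the last word of the name.
last_field=$(printf '%s\n' "$line" | awk '{ print $NF }')
printf '%s\n' "$last_field"    # prints: file.txt

# The same printf template as in the awk script, applied to a name with an apostrophe.
cmd=$(printf "mv '%s' '%s'\n" "it's.txt" "new.txt")
printf '%s\n' "$cmd"           # prints: mv 'it's.txt' 'new.txt'  (unbalanced quotes for sh)
```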

What code should I write to avoid the problems with spaces (and apostrophes)?

paco
  • Unfortunately no – paco Jan 03 '21 at 16:20
  • 3
    So, you're relying on the file sizes to identify which files are the same? Using a checksum would be more accurate. – glenn jackman Jan 03 '21 at 16:52
  • Assuming that 2 files of the same size are identical is a really fragile approach that will almost certainly lead to frequent failures. Use a checksum as @glennjackman suggested or just diff or comm the files. – Ed Morton Jan 03 '21 at 17:53

3 Answers

1

Yes, there's no easy way to parse the output of ls reliably.

Here, you could use zsh instead:

#! /bin/zsh -
zmodload zsh/stat || exit
typeset -A size_to_name
model_folder=${1?}

for f in $model_folder/*(ND.); do
  stat -LA size +size -- $f &&
    size_to_name[$size]=$f:t &&
    print -r "# File of size $size should be named ${(q)f:t}"
done

for f in *(ND.); do
  stat -LA size +size -- $f &&
    (($+size_to_name[$size])) &&
    [[ $f != $size_to_name[$size] ]] &&
    print -r mv -i -- ${(qq)f} ${(qq)size_to_name[$size]}
done

(to be run as that-script /model-folder-path)

Which should work correctly regardless of what characters or non-characters the file names may contain.

Pipe to sh once you've verified it was correct. Note that we don't check for the case where two files have the same size. In that case, the last in lexical order will be picked (if a and z both have size 42 in the model_folder, then any file of size 42 will be renamed to z in the current folder (though -i will give you a chance to avoid overwriting the first with the second)).
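
The same-size caveat can be checked before piping anything to sh. The following is a minimal POSIX sh sketch, not part of the script above: the helper name report_size_collisions is hypothetical, it measures sizes with wc -c, and unlike the zsh globs with the D qualifier it skips hidden files.

```shell
# report_size_collisions DIR
# Prints every file size (in bytes) shared by more than one regular file
# directly inside DIR, one size per line, in ascending order.
# Assumptions: hidden files are ignored; symlinks to regular files are counted.
report_size_collisions() (
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue              # regular files only
        size=$(wc -c < "$f") || continue     # skip unreadable files
        printf '%s\n' "$size"
    done |
        awk '{ count[$1]++ }
             END { for (s in count) if (count[s] > 1) print s + 0 }' |
        sort -n
)
```

Running report_size_collisions /model-folder-path and getting no output would mean that every size maps to exactly one name in the model folder.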

  • After installing zsh it works! Thanks!
    There is just one improvement I would like: as you wrote, I have to edit the script every time to change the model folder. Is there no way to set the "model folder" from the command line?
    – paco Jan 03 '21 at 16:26
  • @paco, see edit. – Stéphane Chazelas Jan 03 '21 at 16:31
  • perfect!! Many thanks! – paco Jan 04 '21 at 20:04
  • And, to improve this script, would it be possible to make it recursive? Provided, of course, that the subfolders are identical on both devices to sync. – paco May 31 '22 at 15:45
  • @paco, sure. See **/ in zsh globs to match any level of subdirectories. – Stéphane Chazelas May 31 '22 at 15:48
  • Thank you! But should I modify a) the script above or b) the command that calls it? And how should I do it? – paco Jun 02 '22 at 02:26
  • I made some attempts. With this code (in the script)
    for f in **/*(ND.); do

    I get a correct renaming of files in subtrees, but it moves files in the root to the first subfolder...

    With this code I get a correct preview, but no result (with |sh)

    CDPATH= cd -P -- "$model_folder" || exit
    for f in **/*(ND.); do

    cd $OLDPWD || exit
    for f in **/(ND.); do

    – paco Jun 02 '22 at 09:23
  • This code works, but only if I have only two subfolders: for f in ***/***(ND.); do – paco Jun 02 '22 at 13:25
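
On the recursion question raised in the comments above, the following is a minimal sketch in bash 4+ (swapped in for zsh here, and not part of the answer). It assumes the subfolder layout is identical on both sides, as stated in the comment, and that find supports -print0. The function name plan_renames is hypothetical. Files are matched by relative directory plus size, so a file in the root is never matched against a name from a subfolder, and each rename stays inside the file's own directory. As in the answer, it only prints mv commands for review, and same-size collisions within one directory are not detected.

```shell
#!/usr/bin/env bash
# plan_renames MODEL_DIR [TARGET_DIR]
# Prints (does not run) one "mv" command per file under TARGET_DIR whose
# size matches a differently named file in the same relative directory
# under MODEL_DIR. Pass the directories without a trailing slash.
plan_renames() {
    local model=$1 target=${2:-.} path rel dir size key
    local -A name_for    # "relative-dir/size" -> base name in the model tree

    # Pass 1: index the model tree.
    while IFS= read -r -d '' path; do
        rel=${path#"$model"/}
        dir=.
        [[ $rel == */* ]] && dir=${rel%/*}
        size=$(wc -c < "$path") || continue
        name_for["$dir/$((size))"]=${rel##*/}
    done < <(find "$model" -type f -print0)

    # Pass 2: look up each target file by the same key.
    while IFS= read -r -d '' path; do
        rel=${path#"$target"/}
        dir=.
        [[ $rel == */* ]] && dir=${rel%/*}
        size=$(wc -c < "$path") || continue
        key="$dir/$((size))"
        if [[ -n ${name_for[$key]+set} && ${rel##*/} != "${name_for[$key]}" ]]; then
            # %q quotes both paths so the output can be reviewed, then fed to bash.
            printf 'mv -i -- %q %q\n' "$path" "${path%/*}/${name_for[$key]}"
        fi
    done < <(find "$target" -type f -print0)
}
```

A possible invocation would be plan_renames /model-folder-path . run from the destination folder. Since printf %q may emit bash-specific quoting, the reviewed output is better piped to bash than to sh.
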
1

It'll be more accurate to use checksums to identify the identical files between the 2 directories: with bash 4.3+ you can do

getFiles() {
    local -n _files=$1
    local dir=${2:-.}
    cd "$dir"
    for file in *; do
        [[ -d $file ]] && continue
        read sum name < <(md5sum "$file")
        _files[$sum]="$file"
    done
    cd -
}

declare -A pwdFiles
getFiles pwdFiles

declare -A modelFiles
getFiles modelFiles /model-folder-path

for sum in "${!pwdFiles[@]}"; do
    if [[ -v modelFiles[$sum] ]]; then
        mv -v "${pwdFiles[$sum]}" "${modelFiles[$sum]}"
    fi
done

glenn jackman
  • 1
    Note that you need read permission to the files to get the md5sum. You're not checking for failure of md5sum or cd here. And you have a few missing --s. Instead of excluding dirs, it would be better to only include regular files as getting a checksum on other types of files (like fifos, sockets...) is unlikely to do anything useful. – Stéphane Chazelas Jan 03 '21 at 17:59
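
The points in the comment above could be addressed along the following lines. This is a minimal sketch, assuming bash 4.3+ and md5sum as in the answer. The function names index_by_checksum and plan_checksum_renames are hypothetical. It only considers regular files, skips files whose checksum cannot be computed, avoids cd altogether, and prints the mv commands instead of running them.

```shell
#!/usr/bin/env bash
# index_by_checksum ARRAY_NAME [DIR]
# Fills the associative array ARRAY_NAME with "checksum -> base name" for the
# readable regular files directly inside DIR (hidden files are ignored).
index_by_checksum() {
    local -n _index=$1
    local dir=${2:-.} file sum
    for file in "$dir"/*; do
        # Regular files only: no directories, symlinks, fifos, sockets...
        [[ -f $file && ! -L $file ]] || continue
        # Reading from stdin avoids option parsing of odd names;
        # an unreadable file makes the substitution fail and is skipped.
        sum=$(md5sum < "$file") || continue
        _index[${sum%% *}]=${file##*/}
    done
}

# plan_checksum_renames MODEL_DIR [TARGET_DIR]
# Prints one "mv" command per file in TARGET_DIR whose content matches a
# differently named file in MODEL_DIR.
plan_checksum_renames() {
    local model=$1 target=${2:-.} sum
    local -A model_names target_names
    index_by_checksum model_names "$model"
    index_by_checksum target_names "$target"
    for sum in "${!target_names[@]}"; do
        [[ -n ${model_names[$sum]+set} ]] || continue
        [[ ${target_names[$sum]} != "${model_names[$sum]}" ]] || continue
        printf 'mv -i -- %q %q\n' \
            "$target/${target_names[$sum]}" "$target/${model_names[$sum]}"
    done
}
```

If several files in one folder have the same content, only the last one indexed is kept, so duplicates would still need to be handled by hand.
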
0

Assuming this will be a one-off operation to fix a previous mistake in renaming, what's wrong with a simple diff, something like (untested):

for newfile in *; do
    for oldpath in /model-folder-path/*; do
        # diff -q exits 0 when the two files are identical
        if diff -q "$oldpath" "$newfile" >/dev/null; then
            oldfile=${oldpath##*/}
            if [[ "$oldfile" != "$newfile" ]]; then
                if [[ -f "$oldfile" ]]; then
                    echo "clash on $oldfile" >&2
                    exit 1
                fi
                mv -- "$newfile" "$oldfile"
            fi
        fi
    done
done
Ed Morton