Trying to get script in Linux console to show duplicate file names instead of an error

Question

I am trying to write a script that is going to run in the Linux environment.

It will look for .jpeg files within a tree of directories and then copy them to a directory called jpegs.

I want it to list the duplicate filenames, with each duplicate name shown only once in the list, instead of just showing me the error.

This is the script I have so far:

#!/bin/sh
if  ! mkdir jpegs 2> /dev/null 
  then
     echo " Cannot create directory \"jpegs\" perhaps it already exists."
     echo "     delete the directory and try again."
     exit
fi
for srcpath in $(find fs282/mirror -iname "*.jpg")
   do
      cp --backup $srcpath  jpegs/ 
   done
echo "List of Duplicate Files Follows"

Kusalananda · Answer 1 · 2018-11-04T22:33:33.917

Don't loop over the output of find. It is inelegant (the loop won't start its first iteration until all pathnames are found) and dangerous (the found pathnames will be split on spaces, tabs and newlines, and the shell will also try to expand them as filename globbing patterns).
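To make the danger concrete, here is a minimal sketch of the splitting problem. The scratch directory and the filename "holiday photo.jpeg" are made up for illustration, and it assumes mktemp -d is available (it is on typical Linux systems, though it is not required by POSIX).

```shell
#!/bin/sh
# Hypothetical scratch directory; nothing outside it is touched.
start_dir=$PWD
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# One single file whose name contains a space.
touch 'holiday photo.jpeg'

# The unquoted command substitution is split on whitespace,
# so the loop body runs once per word, not once per file.
count=0
for srcpath in $(find . -type f -iname '*.jpeg'); do
    count=$((count + 1))
done

cd "$start_dir" || exit 1
rm -r "$demo_dir"

printf 'files: 1, loop iterations: %s\n' "$count"
```

With the default IFS this reports two loop iterations for the one file, because "./holiday" and "photo.jpeg" are handed to the loop as separate words. Neither of them names an existing file, so cp would fail on both.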

Instead (this is all assuming you don't want to copy the found files if there is a name clash):

find fs282/mirror -type f -iname '*.jpeg' -exec sh -c '
    for pathname do
        if [ -e "jpegs/${pathname##*/}" ]; then
            printf "%s\n" "${pathname##*/}"
        else
            cp "$pathname" jpegs/
        fi
    done' sh {} + | sort -u

This uses find as a pathname generator for an in-line shell script. find will pass found pathnames to the script and it will iterate over these with each pathname in $pathname. The script tests whether the filename component of the pathname exists under the jpegs directory, and if it does, it prints the filename at the end of the pathname to standard output. If the filename does not exist under jpegs, it copies the file.
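As a rough way to try this out safely, the sketch below runs the same command against a throwaway tree. The subdirectories a, b and c and the names beach.jpeg and cat.jpeg are invented for the test, and mktemp -d is assumed to be available.

```shell
#!/bin/sh
# Hypothetical scratch tree: three files share the name beach.jpeg.
start_dir=$PWD
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1
mkdir -p fs282/mirror/a fs282/mirror/b fs282/mirror/c jpegs
touch fs282/mirror/a/beach.jpeg fs282/mirror/b/beach.jpeg \
      fs282/mirror/c/beach.jpeg fs282/mirror/a/cat.jpeg

# Same command as above, with its output captured for inspection.
duplicates=$(find fs282/mirror -type f -iname '*.jpeg' -exec sh -c '
    for pathname do
        if [ -e "jpegs/${pathname##*/}" ]; then
            printf "%s\n" "${pathname##*/}"
        else
            cp "$pathname" jpegs/
        fi
    done' sh {} + | sort -u)

# What actually ended up in the target directory.
copied=$(ls jpegs)

cd "$start_dir" || exit 1
rm -r "$demo_dir"

printf 'duplicates:\n%s\n' "$duplicates"
printf 'copied:\n%s\n' "$copied"
```

This lists beach.jpeg once under "duplicates", and beach.jpeg and cat.jpeg under "copied". Which of the three beach.jpeg files gets copied depends on the order in which find happens to visit the directories.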

The parameter substitution ${pathname##*/} removes everything from the beginning of $pathname up to and including the last / character, leaving just the filename component at the end.
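A quick illustration of that substitution, using a made-up pathname, and contrasting it with the single-# form that only removes the shortest match:

```shell
#!/bin/sh
# Hypothetical example pathname.
pathname='fs282/mirror/2018/holiday/beach.jpeg'

# ## removes the longest prefix matching */ (up to the last slash).
filename=${pathname##*/}

# A single # removes only the shortest prefix matching */ (up to the first slash).
shortest=${pathname#*/}

printf '%s\n' "$filename"    # beach.jpeg
printf '%s\n' "$shortest"    # mirror/2018/holiday/beach.jpeg
```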

The sort -u at the end will take all filenames printed by the in-line script and sort them while removing duplicates.
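In isolation, with some invented names standing in for the in-line script's output, that step behaves like this:

```shell
#!/bin/sh
# Hypothetical duplicate names, as the in-line script might print them.
unique_names=$(printf '%s\n' beach.jpeg cat.jpeg beach.jpeg beach.jpeg | sort -u)

# Each name now appears exactly once, in sorted order.
printf '%s\n' "$unique_names"
```

This prints beach.jpeg and cat.jpeg, one per line, no matter how many times each name was reported.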

Another approach:

find fs282/mirror -type f -iname '*.jpeg' \
    ! -exec sh -c '[ -e "jpegs/${1##*/}" ] && printf "%s\n" "${1##*/}"' sh {} ';' \
    -exec cp {} jpegs ';' | sort -u

This is essentially the same thing but formulated completely differently.

It tests with a short in-line shell script whether the filename exists under jpegs, and if it does, the filename is printed (and later sorted) and find continues with the next file. If it doesn't exist, the file is copied.
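The mechanism behind this is that -exec with ';' acts as a test that is true when the command exits with status zero, and ! negates it. A minimal sketch of just that logic, with the utilities true and false standing in for the in-line script and -print standing in for the cp step (the scratch directory and sample.jpeg are made up, and mktemp -d is assumed to be available):

```shell
#!/bin/sh
# Hypothetical scratch directory with one file in it.
demo_dir=$(mktemp -d) || exit 1
touch "$demo_dir/sample.jpeg"

# "false" fails, so "! -exec false" is true and find goes on to -print.
# This corresponds to "name not yet in jpegs, so copy the file".
reached_when_test_fails=$(find "$demo_dir" -type f ! -exec false ';' -print)

# "true" succeeds, so "! -exec true" is false and -print is never reached.
# This corresponds to "name already in jpegs, so skip the copy".
reached_when_test_succeeds=$(find "$demo_dir" -type f ! -exec true ';' -print)

rm -r "$demo_dir"

printf 'test fails:    [%s]\n' "$reached_when_test_fails"
printf 'test succeeds: [%s]\n' "$reached_when_test_succeeds"
```

Only the first find prints the pathname of sample.jpeg; the second prints nothing between the brackets.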
