Need a way to search for missing files

Question

If I have a group of images named as follows in a directory:

filename.jpg
filename.jpg.webp
filename-150x150.jpg
filename-250x250.jpg
filename-250x250.jpg.webp
filename-600x600.jpg
filename-800x800.jpg
filename-800x800.jpg.webp

Is there a way to use the find command (or something else) to return each of the files that don't have a .webp version? So in this case it would return the 150x150 and 600x600 files.

Kusalananda · Accepted Answer · 2021-03-10T08:52:31.590

If you plan on using the filenames found by any of the variations below, make sure that you don't parse the filenames from the output. Instead, replace the printf with whatever operation you need to perform. This way, you avoid issues with filenames that contain whitespace, globbing characters, etc. It will also make your code more efficient and beautiful, and will eventually lead to a healthier lifestyle and nicer friends.

See Why is looping over find's output bad practice?

An assumption I'm making is that you'd like to find the JPEG images that do not have a corresponding .webp file because you'd like to create the .webp files. This is why I'm using an -e ("name exists") test below rather than a -f ("name exists and is a regular file or symbolic link to such file") or -r ("name exists and is readable") test.

If the .webp name exists, and if it's a directory or a character special file or some other non-regular file type (the -f test would be false), or if it's not readable (the -r test would be false), it would be problematic to create the .webp file as it would overwrite something existing, or be put into a directory, or cause a permission error, depending on how that .webp file was created.
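To make the difference concrete, here is a minimal sketch that sets up the awkward case described above, where the .webp name exists but is a directory. The directory and file names (photo.jpg and so on) are made up for the demonstration.

```shell
# Throwaway directory with an illustrative (hypothetical) file layout.
tmpdir=$(mktemp -d)
touch "$tmpdir/photo.jpg"
mkdir "$tmpdir/photo.jpg.webp"     # the .webp name exists, but as a directory

name=$tmpdir/photo.jpg

# -e is true for anything that exists under that name.
if [ -e "$name.webp" ]; then e_result=exists; else e_result=missing; fi
# -f is true only for regular files (or symbolic links to them).
if [ -f "$name.webp" ]; then f_result=regular; else f_result=not-regular; fi

printf '%s\n' "-e: $e_result" "-f: $f_result"
# prints "-e: exists" and "-f: not-regular"

rm -r "$tmpdir"
```

A -f test would report photo.jpg as lacking a .webp file here, and a later attempt to create photo.jpg.webp would then run into the directory. The -e test skips the name instead.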

Using a simple loop that iterates over the names matching *.jpg and prints those names out that don't correspond to a *.jpg.webp file:

for name in *.jpg; do
    [ -f "$name" ] && [ ! -e "$name.webp" ] && printf '%s\n' "$name"
done
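As a sketch of replacing the printf with the actual operation, the loop below calls a function on each name directly instead of printing it. The function make_webp is a hypothetical stand-in that only creates an empty file; the real conversion command would go there. The temporary directory and file names are likewise assumptions for the demonstration.

```shell
# Hypothetical stand-in for the real conversion command; it only creates
# an empty "$1.webp" so that the sketch can run anywhere.
make_webp() {
    : > "$1.webp"
}

tmpdir=$(mktemp -d)
touch "$tmpdir/a b.jpg" "$tmpdir/c.jpg" "$tmpdir/c.jpg.webp"

converted=0
for name in "$tmpdir"/*.jpg; do
    if [ -f "$name" ] && [ ! -e "$name.webp" ]; then
        make_webp "$name"               # act on the name directly
        converted=$((converted + 1))
    fi
done

# Record whether the name containing a space was handled, then clean up.
if [ -e "$tmpdir/a b.jpg.webp" ]; then spaced=handled; else spaced=skipped; fi
printf 'converted=%s spaced=%s\n' "$converted" "$spaced"
# prints "converted=1 spaced=handled"

rm -r "$tmpdir"
```

Because the name is passed along as a quoted variable and never read back from text output, the space in "a b.jpg" causes no trouble.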

To save the names in a list, we may use the list of positional parameters:

set --
for name in *.jpg; do
    [ -f "$name" ] && [ ! -e "$name.webp" ] && set -- "$@" "$name"
done
printf 'Needs webp fixed: %s\n' "$@"

In bash, you may want to save the names in a named array:

needs_webp=()
for name in *.jpg; do
    [ -f "$name" ] && [ ! -e "$name.webp" ] && needs_webp+=( "$name" )
done
printf 'Needs webp fixed: %s\n' "${needs_webp[@]}"

In the zsh shell, you could do the testing for the *.webp name in the globbing itself:

printf 'Needs webp fixed: %s\n' *.jpg(.e['[ ! -e $REPLY.webp ]'])

Here, the . in the globbing qualifier at the end of the pattern makes sure that only regular files (not directories etc.) are matched, and e['[ ! -e $REPLY.webp ]'] acts like a test to see whether the current name, $REPLY, should be included in the resulting list or not.

With find:

find . -name '*.jpg' -type f ! -exec test -e {}.webp \; -print

This looks for any regular file with a name matching the pattern *.jpg in the current directory or below, and then prints that name if there is no file corresponding to the same name but with .webp added to the end.

However, the POSIX standard does not guarantee that {} is replaced by the current filename if it's concatenated with some other string, like it is above. So if you want to be certain it'll work, you'd say

find . -name '*.jpg' -type f -exec sh -c '
    for name do
        [ ! -e "$name.webp" ] && printf "%s\n" "$name"
    done' sh {} +

As you see, now we're back at the first loop in this answer, more or less. The only difference now is that find is feeding it existing pathnames, recursively from the current directory or any of its subdirectories.
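To try the recursive variant safely, one could run it against a throwaway tree, as in the sketch below. The directory layout and file names are invented for the test. Note the extra sh argument after the inline script: it fills $0, so that the pathnames from find start at $1 and none of them is skipped by the loop. The sketch uses an if statement rather than && so that the inline script always exits with status zero.

```shell
# Throwaway tree with hypothetical names: one JPEG has a .webp, one does not.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/sub"
touch "$tmpdir/a.jpg" "$tmpdir/a.jpg.webp" "$tmpdir/sub/b 1.jpg"

missing=$(
    cd "$tmpdir" &&
    find . -name '*.jpg' -type f -exec sh -c '
        for name do
            if [ ! -e "$name.webp" ]; then
                printf "%s\n" "$name"
            fi
        done' sh {} +
)

printf '%s\n' "$missing"
# prints "./sub/b 1.jpg"

rm -r "$tmpdir"
```

The output is captured here only to check the result. For real work, the printf inside the inline script would be replaced by the operation itself.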

Why do you use -e instead of -f? Wouldn't that return directories as well? — schrodingerscatcuriosity, Mar 10 '21 at 08:21
@schrodigerscatcuriosity Think about the application of this. If the name somename.jpg.webp does not exist (at all) but somename.jpg does, then one may imagine that the .webp file should be created. If the .webp name exists, but is a directory or some other non-regular file, then it would not be appropriate to try to create a file with that name. — Kusalananda, Mar 10 '21 at 08:27
@schrodigerscatcuriosity What I will fix is to make sure that the .jpg names are regular files (or symbolic links to such files). — Kusalananda, Mar 10 '21 at 08:28
@schrodigerscatcuriosity Thinking about this a bit more, I should probably add what I wrote in my first comment as an assumption that I'm making, to my answer. Thanks for pointing this out! — Kusalananda, Mar 10 '21 at 08:46

score 1 · Answer 2 · answered Mar 10 '21 at 06:50

This works for me:

find . -not -name '*.webp' -exec test ! -r {}.webp \; -exec echo {} \;

It finds all names that do not end in .webp. For each name found, it tests whether a readable file with the same name plus a .webp extension exists. If not, it prints the name.

It is important to notice that the second -exec only executes if the first returns successfully, i.e., only if the corresponding file with .webp extension does not exist.
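The short-circuit behaviour can be checked in isolation with the standard true and false utilities standing in for the test, as in this small sketch (the file name x.jpg and the temporary directory are just for the demonstration):

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/x.jpg"

# First -exec succeeds, so the second one runs.
passed=$(find "$tmpdir" -name 'x.jpg' -exec true \; -exec echo reached \;)

# First -exec fails, so the second one is never evaluated.
blocked=$(find "$tmpdir" -name 'x.jpg' -exec false \; -exec echo reached \;)

printf 'passed=[%s] blocked=[%s]\n' "$passed" "$blocked"
# prints "passed=[reached] blocked=[]"

rm -r "$tmpdir"
```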

Running on your list of files, my output is

./filename-600x600.jpg
./filename-150x150.jpg

schrodingerscatcuriosity · Answer 3 · 2021-03-10T08:47:48.570

Try this with a loop:

# to expand to null string in empty directories¹
shopt -s nullglob
# loop over the jpg files
for i in *.jpg; do
  # test if there's a file that matches the name but with '.webp' extension
  # if there's a match don't do anything and continue with the loop
  [[ -f "$i.webp" ]] && continue
  # else perform an operation
  echo "$i"
done

Output:

filename-150x150.jpg
filename-600x600.jpg

¹ 4.3.2 The Shopt Builtin

nullglob

If set, Bash allows filename patterns which match no files to expand to a null string, rather than themselves.
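In a plain sh script, where shopt is not available, an unmatched pattern is left as it is and the loop runs once with the literal pattern as the name. A rough sketch of guarding against that with an existence test, using an empty temporary directory as the test case:

```shell
tmpdir=$(mktemp -d)     # empty directory: nothing matches *.jpg here

iterations=0
printed=0
for name in "$tmpdir"/*.jpg; do
    iterations=$((iterations + 1))
    # The unmatched pattern is passed through literally, so check that
    # the name really is an existing regular file before using it.
    [ -f "$name" ] || continue
    printed=$((printed + 1))
done

printf 'iterations=%s printed=%s\n' "$iterations" "$printed"
# prints "iterations=1 printed=0"

rmdir "$tmpdir"
```

The loop body is entered once, but the guard keeps the literal pattern from being treated as a filename.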
