I would like to do the same as "How can I use two bash commands in -exec of find command?" but with grep as the second command. The solutions posted for that question don't work when grep is the second command. Another question, "Combination of find and grep command with exec option", asks about using grep, but none of its answers use grep. I think I need grep.

For example,

sudo find -D exec . -maxdepth 1 -type f -iname "*" -exec file -N '{}' \; -exec echo 'asdf' \;

works fine, but

sudo find -D exec . -maxdepth 1 -type f -iname "*" -exec file -N '{}' \; -exec grep "JPEG" {} \;

shows no evidence of grep doing anything. How can I force the stdout of the first command to stdin for grep? If I instead redirect the file command's output to a file and run grep separately on that file, it works great:

dell@DELL-E6440:~$ rm junk.txt
dell@DELL-E6440:~$ sudo find -D exec . -maxdepth 1 -type f -iname "*" -exec file -N '{}' >> junk.txt \;
dell@DELL-E6440:~$ grep "JPEG" junk.txt
./150120-ssc-proxy~20190508-061623.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI), density 150x150, segment length 16, baseline, precision 8, 1018x1426, frames 3
./avoid-powered-overfight~20190508-061623.jpg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, comment: "Intel(R) JPEG Library, version [2.0.16.48]", baseline, precision 8, 1048x659, frames 3
dell@DELL-E6440:~$

but the point is I want to do it on one bash line, and >> appends to the file, so it has to be removed between runs anyhow.

Brian
  • You don't get any output from grep because none of the files you run grep on contains the string JPEG. But the command is definitely running. – Kusalananda Nov 08 '19 at 07:19
  • Kusalananda, The test files do have the string JPEG in them - I know because when I leave the grep command off the end, I see them. Why did you assert that they do not have JPEG in them? The last clip of command lines shows the output dumped to a file and then you can see the file does have the string "JPEG". – Brian Nov 09 '19 at 13:58
  • Your junk.txt file contains the string JPEG, because it contains the output of the file command. The image files do not contain the word JPEG themselves (except possibly by coincidence as part of the binary image data, but not in your case). Your find command runs file on the files, and then grep on the same files. The file command generates the output you redirect into junk.txt, while the grep command that find executes does not generate any output. – Kusalananda Nov 09 '19 at 14:02
  • @Kusal, ah.. yes. I misread your comment. The test files do not have the test strings; the output of the file command does. My idea was to convert filenames to best-guess file types, while also dragging the filename along in the text stream so both would be printed on the console by grep. – Brian Nov 09 '19 at 14:37

3 Answers

If you want to run file on every file and then grep the output from file for the string JPEG:

find . -maxdepth 1 -type f -exec file {} + | grep JPEG

This runs file on batches of regular files, producing a stream of result output. This stream is then filtered by grep for lines containing the particular string JPEG. Note that filenames containing newlines would be misrepresented in the output of this pipeline.

Your command in the question would run grep on the files themselves, not on the output of the file command. Your find command also uses -iname "*", which is a no-op since all filenames match that predicate.
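
If the pipe really has to live inside find, one option is to let -exec start a small child shell that runs the pipeline for each file. The following is only a sketch: the function name list_jpegs is made up for illustration, and it assumes a find that supports -maxdepth (as in the question) and that file is installed.

```shell
#!/bin/bash
# Hypothetical helper (not from the question): print the `file` description
# of every regular file directly under a directory, keeping only lines that
# mention JPEG.
list_jpegs() {
    # find substitutes the pathname for {}; it arrives in the child shell
    # as "$1". The extra "sh" argument only fills in $0.
    # The pipe lives inside the child shell, so grep reads file's output
    # instead of the image file itself.
    find "${1:-.}" -maxdepth 1 -type f \
        -exec sh -c 'file -N "$1" | grep "JPEG"' sh {} \;
}

list_jpegs .
```

This starts one shell, one file and one grep per pathname, so it is likely slower than piping the whole find output into a single grep.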

Alternatively, using bash to loop over the files in the current directory:

shopt -s dotglob nullglob

for name in ./*; do
    mimetype=$( file --brief --mime-type "$name" )
    if [[ $mimetype == */jpeg ]]; then
        printf '"%s" is a JPEG file\n' "$name"
    fi
done

This would use the MIME-type reported by file for each file in the current directory to pick out the names that correspond to JPEG files.

A slightly expanded example:

shopt -s dotglob nullglob

for name in ./*; do
    mimetype=$( file --brief --mime-type "$name" )
    case $mimetype in
        */jpeg)   printf '"%s" is a JPEG file\n'             "$name" ;;
        */png)    printf '"%s" is a PNG file\n'              "$name" ;;
        image/*)  printf '"%s" is some form of image file\n' "$name" ;;
        *)        printf '"%s" is not an image file\n'       "$name"
    esac
done

And finally, using the MIME-type directly in find:

find . -maxdepth 1 -exec bash -c '[[ $(file -b --mime-type "$1") == */jpeg ]]' bash {} \; -print

or, more efficiently,

find . -maxdepth 1 -exec bash -c '
    for pathname do
        [[ $(file -b --mime-type "$pathname") == */jpeg ]] && printf "%s\n" "$pathname"
    done' bash {} +

These last two commands will just print out the pathnames corresponding to JPEG images.

Kusalananda
  • Kusalananda, You're better than I am with concise batch loops. Your comment about grepping the files, rather than the file command output, is instructional. I now see both of my execs get their input from the find command; the first does not feed the second. Got it. I need to go learn how the "+" operator changes this. – Brian Nov 09 '19 at 14:04
  • @Brian The + at the end only means "give me as many files as possible at once". This reduces the number of times the file command is actually executed by running it with many files at once (as in file file1 file2 file3, etc.). The real difference is that grep then acts on the output of this. grep is not executed by find here, but is a separate step in a pipeline. – Kusalananda Nov 09 '19 at 14:05
  • I notice you introduced maxdepth 1. I wonder if the + has a practical limit either from the find or into the file. After all, the idea here is that I'm searching every file under possibly the entire root file system. – Brian Nov 09 '19 at 14:33
  • @Brian No, you introduced -maxdepth 1, so I used it as well to only search on the first level. You may want to read Understanding the -exec option of `find` – Kusalananda Nov 09 '19 at 16:11
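
The difference between \; and + discussed in these comments can be made visible by counting output lines from echo. This is a small sketch on a throwaway directory; the variable names are made up for illustration. As for the limit asked about above, find starts a further batch by itself when the argument list would exceed what the system allows, so + is not restricted to small trees.

```shell
#!/bin/bash
# Build a scratch directory holding three empty files.
demo=$(mktemp -d)
touch "$demo/a" "$demo/b" "$demo/c"

# With \; find starts echo once per file: one output line per file.
per_file=$(find "$demo" -type f -exec echo {} \; | wc -l)

# With + find passes all three pathnames to a single echo: one output line.
batched=$(find "$demo" -type f -exec echo {} + | wc -l)

printf 'echo ran %d times with \\; and %d time with +\n' "$per_file" "$batched"

rm -r "$demo"
```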

Wrapping two commands into the find command wasn't the best idea. Instead, I unwrapped the two operations from within find and instead sequenced them through pipes.

$ sudo find . -type f | file -Nz -f - | grep -f ~/badtypes.txt

This now gives me a way of recursing with the file command and finding any types of file that do not belong on the computer, regardless of whether they've been renamed or compressed. I wonder why the file command doesn't naturally have a recurse option.
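
One caveat with the pipeline above: file -f - reads one pathname per line, so a pathname containing a newline would be split in two. A possible variant, assuming a find and xargs that support -print0 and -0 (GNU and BSD versions do), passes the names NUL-delimited instead. The function name scan_types is hypothetical, and the output lines for such odd names can still look broken.

```shell
#!/bin/bash
# Hypothetical helper: scan_types DIR PATTERN_FILE
# Runs file on every regular file under DIR and keeps the result lines
# that match any pattern listed in PATTERN_FILE (one pattern per line).
scan_types() {
    # -print0 / -0 separate pathnames with NUL bytes, so spaces and
    # newlines in names reach file intact. xargs batches the names,
    # much like -exec ... {} + would.
    find "$1" -type f -print0 | xargs -0 file -Nz | grep -f "$2"
}
```

It could be called as scan_types . ~/badtypes.txt, with sudo in front of the individual commands if unreadable directories are involved.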

Brian
  • "I wonder why the file command doesn't naturally have a recurse option." The file command, just like cpio, relies on a list of filenames, and find is a very flexible way to define files. Where you put -type f -iname "*" you can easily add more tests. –  Nov 08 '19 at 09:08
  • rastafile, Thanks for the ideas about find options. Yes, it's a pretty capable utility. However, my wonder thought was about why file (not find) doesn't naturally have a recurse option. – Brian Nov 09 '19 at 13:54
  • file's natural recurse option is find. Ask the author! He didn't strip the -r off, he implemented -f -. In your case this detail does create some confusion, but there are three quite equal solutions, no need to call for a new "file -r". Unlike grep, file cannot take advantage (?) of find -exec + . –  Nov 09 '19 at 14:40

Kusa's ...-exec file {} + | grep ...

Your find | file -f - | grep

And also this:

]# cat fhs.sh 
#!/bin/bash
file -Nz * | grep "magic"

]# find . -exec ./fhs.sh {} +
uout: very short file (no magic)

All seem to work, using the same amount of time. Some other variations also work, but they take longer.


Normally, the find ... -exec grep xxx {} + construct is very efficient (the + instead of \;). But here, with file, you "interfere" with the filenames.

So here, your "flat" (pipeline) approach is OK.

  • @rastafile, you used the phrase "interfere with filenames". Actually, the filenames are in the output stream from file, so grep should find them. It's just that a second exec (the grep one in my case) gets its input directly from the find utility, which supplies only the filename, not the file utility output that contains the patterns grep looks for. – Brian Nov 09 '19 at 14:50