10

I'm trying to list only non-image files, searching only in the most recent 500 files. So I run

ls -t | head -500 | file | grep -v 'image'

which isn't right: it displays a help message. Changing it to

ls -t | head -500 | xargs file | grep -v 'image'

I now sometimes get the output I want, but if the filename has spaces in it—for example Plutonian\ Nights\ -\ Sun\ Ra.mp3—then xargs will run file Plutonian, file Nights, etc.


How do I either help xargs see the spaces, or otherwise accomplish what I'm trying to accomplish?

  • In popular xargs implementations, the delimiter can be changed, for example to '\n'. This is often helpful when the input is not generated by find. See -d (GNU) and -E (OSX) – MattBianco Feb 27 '17 at 09:39
  • @MattBianco xargs -d '\n' doesn't appear to recognize newlines properly. – Michael Apr 20 '23 at 22:50
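
To illustrate the delimiter option mentioned in the first comment, here is a minimal sketch. It assumes GNU xargs; -d is a GNU extension and may be missing from other implementations. printf stands in for file so that the argument boundaries are visible, and the two file names are made-up sample input.

```shell
# Feed two sample names, one per line, to xargs.
# -d '\n' tells GNU xargs to split its input on newlines only,
# so a name containing spaces is passed on as a single argument.
split=$(printf '%s\n' 'Plutonian Nights - Sun Ra.mp3' 'plain.txt' |
    xargs -d '\n' printf '[%s]\n')

# Each bracketed line is one argument as the command received it.
printf '%s\n' "$split"
```

With the same option, the pipeline from the question would become ls -t | head -500 | xargs -d '\n' file | grep -v 'image'.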

5 Answers

9

Using xargs, it can be done in this way:

find . -type f -print0 | xargs -0 file | grep -v 'image' 

But xargs is so yesterday. The cool kids use parallel today. Using parallel, it would be:

find . -type f | parallel file | grep -v 'image'

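See: no -print0 or -0 needed. parallel treats each input line as one argument, so spaces in file names are safe. File names that contain newlines would still need find -print0 together with parallel -0.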

UPDATE

For listing only the most recent 500 files, your command would be:

ls -1t | head -500 | parallel file {} | grep -v image

Important

In case your parallel is old and the above syntax doesn't work, install a newer version of parallel as explained here: http://www.gnu.org/software/parallel/parallel_tutorial.html

shivams
  • 4,565
3

Use find with the -print0 option and pipe the output to xargs with the -0 option.

Even though I know (and use) this technique, I see that user @Jens has answered a similar question, where you can find more details:

https://stackoverflow.com/questions/16758525/use-xargs-with-filenames-containing-whitespaces

Prem
  • 3,342
1

I have two crude suggestions that might help. Neither feels particularly satisfying though, so perhaps something better will come up.

First, use sed to wrap every name in double quotes, so you'd only run into trouble if a file name itself contains quotes:

ls -t | head -500 | sed -e 's/\(.*\)/"\1"/' | xargs file | grep -v 'image'

The other is to use ls to find the 501st most recent file, then use find to get everything newer, like

find . -newer "$(ls -t | head -501 | tail -1)" -type f -exec file {} \; | grep -v image
Eric Renouf
  • 18,431
  • 2
    As long as we're going to parse the output of ls, I believe your first snippet would be improved by replacing newlines with nulls (tr \\n \\0) and using xargs -0. – dhag May 01 '15 at 15:24
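
The suggestion in the comment above could look like the following sketch. The function name recent_non_images is made up for illustration. It assumes an xargs that supports -0 (GNU and BSD xargs both do) and that no file name contains a newline, since ls still prints one name per line.

```shell
# recent_non_images is a hypothetical helper name, not from the answer.
recent_non_images() {
    # tr turns each newline from ls into a NUL byte, and xargs -0 splits
    # on NUL only, so spaces and quotes in names reach file untouched.
    ls -t | head -n 500 | tr '\n' '\0' | xargs -0 file | grep -v 'image'
}
```

Calling recent_non_images in the directory of interest prints the file output lines for the entries that are not reported as images.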
1

For generic advice regarding processing of file names potentially containing spaces, see Why does my shell script choke on whitespace or other special characters?

The difficulty with what you're trying to do is that there's no nice way to list the N most recent files with standard tools.

The easiest way to do what you're doing here is to use zsh as your shell. It has glob qualifiers to sort files by date. To run file on the 500 most recent files:

file *(om[1,500])

With the Linux file utility, pass the -i or --mime-type option to get output that's easier to parse. Image files are identified by lines ending with image/something.

file --mime-type *(om[1,500]) | sed -n 's~: *image/[^ ]*$~~p'

If you need to cope with absolutely all file names, including those with a newline in their name, use the -0 option for null-delimited output. Recent versions of GNU sed can use null bytes as the record delimiter instead of newlines.

file --mime-type -- *(om[1,500]) | sed -zn 's~: *image/[^ ]*$~~p'

If you don't have zsh, you can use ls and cope with file names that contain spaces, but not newlines, quotes, or backslashes, by passing the -I option to xargs (-L1 on its own still splits each line on blanks). This invokes file on one file at a time, so it's slightly slower.

ls -t | head -n 500 | xargs -I {} file --mime-type -- {} | sed -n 's~: *image/[^ ]*$~~p'
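
A plain shell loop is another way to get one file invocation per name without depending on how xargs parses its input. The sketch below is not from the answer: the function name list_recent_non_images is hypothetical, and it assumes a file utility that accepts -b and --mime-type, as the common Linux one does. It prints the names of the non-image entries, which is what the question asks for, and it still cannot handle names containing newlines.

```shell
# list_recent_non_images is a hypothetical helper name.
list_recent_non_images() {
    # IFS= and -r keep leading blanks and backslashes in each name intact.
    ls -t | head -n 500 | while IFS= read -r name; do
        # -b prints only the MIME type, without the "name: " prefix.
        case $(file -b --mime-type -- "$name") in
            image/*) ;;                   # skip images
            *) printf '%s\n' "$name" ;;   # keep everything else
        esac
    done
}
```

Because the name is always passed inside double quotes, spaces and quote characters need no special treatment here.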
  • "no nice way to list the N most recent files with standard tools" ... "ls -t | head -n 500" ??? – Jonathan Hartley Feb 23 '23 at 02:58
  • @JonathanHartley I mention that in my answer. I also mention its limitations: it only works if your file names don't contain problematic characters. – Gilles 'SO- stop being evil' Feb 23 '23 at 07:26
  • Hey. I know you mention it in your answer, that's my point. The first sentence I quote contradicts the second, and in my opinion ought to be deleted. As other answers note, using ls -print0, or IFS, is the way to quote spaces or other characters in filenames when using standard tools. – Jonathan Hartley Feb 23 '23 at 14:19
-1

You might try

printf "%s\0" $(ls -t | head -500) | xargs -0 file | grep -v image

This forces xargs to null-delimit the file name arguments.

doneal24
  • 5,059