4

Disclaimer: I am a novice to Unix/Linux, but I am looking forward to learning! I have searched this Stack Exchange site and read the man page for find, but I can't seem to figure this out.

I want to use the find ... -exec {} + command to recursively find all files with a particular file extension and run a command on the list of files. There are approximately 100k files that I need to convert. The command that I am running accepts the filename (or a list of filenames, e.g. f1 f2 f3) as a parameter, but I also need to specify additional parameters to run the command.

What I tried so far:

This works:

find . -iname "*.extension" -exec <command> {} <additional parameters> \;

This doesn't seem to work:

find . -iname "*.extension" -exec <command> {} <additional parameters> +

I get the error message, find: missing argument to '-exec'. I am guessing that I cannot specify additional parameters after the {}?

Some notes:

The command in question takes the filename as the first parameter, and then I need to designate some additional parameters, such as the output directory -o <outputDir> and the variables to extract from the files -v <var1,var2,...>.

I am running this on the terminal in Ubuntu 12.04, if that makes any difference.

Anthon
ialm
  • What command is <command> ? You might want to fix its odd syntax first as it breaks the POSIX Utility syntax guidelines. (All options should precede operands on the command line.) – jlliagre Jun 11 '13 at 23:31
  • @jlliagre <command> is to be replaced by the actual command I'm using, such as ls or rm. In my case, it is a tool that converts from one file format to another, and it does not actually have < or > in the call. – ialm Jun 12 '13 at 16:13

4 Answers

4
find . -iname "*.extension" -exec sh -c '
  exec <command> "$@" <additional parameters>' sh {} +

See How does this find command using "find ... -exec sh -c '...' sh {} +" work? for details.
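As a rough illustration of what the construct does, here is a small sketch that can be run in a scratch directory. The stand-ins are assumptions for the demo only: printf plays the role of <command>, and -o out plays the role of the additional parameters. The second sh fills $0 of the inline script, so the file names collected by find arrive as "$@" and the fixed arguments can follow them.

```shell
# Hypothetical demo: printf stands in for <command>, "-o out" for the
# additional parameters. Everything happens in a throwaway temp directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.ext" "$demo_dir/b c.ext"

# The second "sh" becomes $0 of the inline script; the file names that
# find substitutes for {} become "$@", one argument per file.
result=$(find "$demo_dir" -name '*.ext' -exec sh -c '
  exec printf "[%s]\n" "$@" -o out' sh {} +)

# One bracketed line per argument the stand-in command received.
printf '%s\n' "$result"
rm -rf "$demo_dir"
```

Each file name shows up as a single bracketed argument, including the one with a space in it, and the trailing parameters come after the whole list.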

  • Thanks, this worked for me! May I ask for an explanation of what is happening here? – ialm Jun 11 '13 at 20:45
  • This worked for a small subset of files that I was testing with, but now that I am trying this on the set of 100000 files, I get "set: Too many arguments." errors. I read that using {} + was faster than {} \;, but I guess I can't use it! Thanks for your answer, though! – ialm Jun 11 '13 at 22:05
  • @ialm, that's a limitation of csh (its builtins have a limit (1000 on the one found on Ubuntu) on the number of arguments), the shell that you must be using in a script called by <command>. You could at least use tcsh (which should be backward compatible with csh), but best is to avoid csh at all for scripting. – Stéphane Chazelas Jun 12 '13 at 10:38
  • There is no evidence csh is involved. – jlliagre Jun 13 '13 at 08:13
  • @jlliagre, yes there's "set: Too many arguments." which is a csh message and no Bourne-like shell set builtin would have this kind of limitation. tcsh could output it as well, but with numbers of arguments you're unlikely to reach. You can reproduce it with csh -c 'set a=($argv)' {1..998} – Stéphane Chazelas Jun 13 '13 at 10:18
  • @StephaneChazelas - I posted a question asking for someone to explain the above code: http://unix.stackexchange.com/questions/93324/how-does-this-code-work – slm Oct 02 '13 at 18:26
  • FWIW, I don't think the extra exec in the shell command is needed. – Joshua Skrzypek Jul 12 '22 at 17:42
  • @JoshuaSkrzypek, in some sh implementations, that saves a process. Some other sh implementations do the exec implicitly as an optimisation. – Stéphane Chazelas Jul 12 '22 at 17:55
1

With the + it's going to list multiple filenames separated by spaces in place of {} (and it will be a long list, since you have 100000 files) rather than just a single filename. That being the case, the {} is required to come at the end of the command.

See the find(1) man page under -exec command {} +.
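If the tool happens to accept its options before the file names (an assumption; the converter in the question may not), the fixed arguments can simply go in front of {} so that {} stays immediately before the +. A minimal sketch, where echo converter -o outdir is a hypothetical stand-in that just prints the command line that would be run:

```shell
# Throwaway directory with two sample files.
demo_dir=$(mktemp -d)
touch "$demo_dir/f1.ext" "$demo_dir/f2.ext"

# Fixed arguments first, then {} directly before the +.
# "converter" and "-o outdir" are placeholders, not a real tool.
batched=$(find "$demo_dir" -name '*.ext' -exec echo converter -o outdir {} +)

# Both files appear on a single line, i.e. in a single invocation.
printf '%s\n' "$batched"
rm -rf "$demo_dir"
```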

bahamat
  • That is not a valid argument. There is no technical reason which forbids trailing arguments. The command line calculation would be nearly the same. The only reason is that this is a stupid limitation of both find and xargs. – Hauke Laging Jun 11 '13 at 20:00
  • @HaukeLaging: Take it up with the authors. I am simply stating what is. – bahamat Jun 11 '13 at 20:03
  • If there is not a language problem (and I misunderstand you) then you are not stating what is. This is: "Because of a design decision {} must come at the end." You say (in my understanding): "Because {} expands to many files {} must come at the end." And that is simply not true. – Hauke Laging Jun 11 '13 at 20:26
  • There's no spaces coming into the picture there. {} is replaced with a list of arguments passed to the command, that's all. spaces in shell command line are used to separate argument to commands, but here, find doesn't start any shell. – Stéphane Chazelas Jun 11 '13 at 20:36
  • My man page on -exec command {} + states "Only one instance of '{}' is allowed within the command", but nothing about where {} should be placed. But when testing the command, it does require that I place {} at the end. Weird. – Lii Apr 12 '14 at 18:33
1

Assuming all directories and files have regular names, i.e. not containing spaces, newlines or similar, this should work even with a huge number of files:

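find . -iname "*.extension" -exec sh -c '
command="<command>"
additionalParameters="<additional parameters>"
# Split the batch in two so each run gets about half of the file names.
h=$(($#/2))
cmd="$command "
for i in $(seq 1 $h);do
        # Braces are required to reach positional parameters above 9.
        cmd="$cmd $(eval echo \"\${$i}\") "
done
cmd="$cmd $additionalParameters"
# First half (skipped when the batch holds a single file).
# Relies on word splitting, hence the "regular names" assumption.
[ "$h" -gt 0 ] && $cmd
shift $h
# Second half.
$command "$@" $additionalParameters' sh {} +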

Rationale:

When using the + punctuation, find builds a command line as large as possible. There are two limitations involved: the maximum number of arguments allowed (should be 128k on GNU/Linux) and the maximum size of the argument list (should be 2 MB on GNU/Linux). The issue is that the command called requires extra arguments (the additional parameters). Adding them overflows the limit, leading to the "Too many arguments" error. The script I suggest splits the built parameter list into two parts and runs two commands instead of one per block, so adding the extra arguments does not trigger the issue.
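To see what the limits look like on a given machine, something along these lines may help. It is only a sketch: getconf ARG_MAX reports the system's byte limit for arguments plus environment, and the inline script prints how many file names find packed into each invocation. The 20 sample files and their names are made up for the demo.

```shell
# Upper bound, in bytes, on the combined size of the arguments and the
# environment that a single exec call may receive.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX: $arg_max bytes"

# Create 20 hypothetical sample files in a throwaway directory.
demo_dir=$(mktemp -d)
i=1
while [ "$i" -le 20 ]; do
  : > "$demo_dir/f$i.ext"
  i=$((i + 1))
done

# Print the number of file names handed to each invocation: one line per batch.
batch_sizes=$(find "$demo_dir" -name '*.ext' -exec sh -c 'echo "$#"' sh {} +)
printf '%s\n' "$batch_sizes"
rm -rf "$demo_dir"
```

With so few files there is a single batch; on a tree with 100k files the same command prints one count per invocation, which shows how large each batch actually gets.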

jlliagre
  • Thank you for the answer! I opted to be patient and use the slower \; option, and the job should be done in a couple of days. If I ever need to run the job again, I will try this! – ialm Jun 12 '13 at 16:16
  • Answer updated to explain why I guess it failed with Stephane's script. – jlliagre Jun 13 '13 at 08:12
0

You can use this script:

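#! /bin/bash

# The real command to run; echo is only a placeholder.
cmd=echo

# Usage: args_change_script.sh N trailing_1 ... trailing_N file...
# At least the count and one file name are required.
test $# -ge 2 || exit 2
num_trailing_args="$1"
# The count must be a non-negative integer without leading zeros.
[[ $num_trailing_args =~ ^(0|[1-9][0-9]*)$ ]] ||
  { echo "Illegal first argument ('${num_trailing_args}'); aborting" >&2; exit 2; }
test $# -lt $((num_trailing_args+2)) &&
  { echo "Too few arguments; aborting" >&2; exit 2; }
shift
# Collect the N arguments that must end up after the file names.
trailing_args=()
for((i=0;i<num_trailing_args;i++)); do
        trailing_args[i]="$1"
        shift
done

# What is left in "$@" are the file names; append the trailing arguments.
"$cmd" "$@" "${trailing_args[@]}"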

and then use

find ... -exec args_change_script.sh 3 t1 t2 t3 {} +

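The name of the command should not be longer than the name of the script (just to be sure): find sizes the argument list with the script's name in it, so a longer command name could push the final command line past the limit.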
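The same reordering can also be sketched in plain POSIX sh, without bash arrays, by moving the leading arguments to the end one at a time. This is only an illustration of the idea: rotate_args is a hypothetical helper name, it prints the reordered list instead of running a command, and it does not validate the count.

```shell
# rotate_args N a1 ... aN rest...
# Prints rest... followed by a1 ... aN, one argument per line.
rotate_args() {
  n=$1
  shift
  i=0
  while [ "$i" -lt "$n" ]; do
    first=$1
    shift
    # Re-append the argument that was just removed from the front.
    set -- "$@" "$first"
    i=$((i + 1))
  done
  printf '%s\n' "$@"
}

# Three trailing arguments (t1 t2 t3) followed by two file names.
rotate_args 3 t1 t2 t3 f1 f2
```

The call at the end prints f1, f2, t1, t2, t3 in that order, which is the argument order the real command would receive.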

Hauke Laging