7

Suppose I have a folder containing files with names like

file1.txt
file2.txt
file3.txt

etc. I would like to run a command on each of them, like so:

mycommand file1.txt -o file1-processed.txt
mycommand file2.txt -o file2-processed.txt
mycommand file3.txt -o file3-processed.txt

etc.

There are several similar questions on this site - the difference is that I want to insert the -processed text into the middle of the file name, before the extension.

It seems like find should be the tool for the job. If it weren't for the -o flag, I could do

find *.txt -exec mycommand "{}" ";"

However, the {} syntax gives the whole file name, e.g. file1.txt etc., so I can't add the "-processed" in between the filename and its extension. A similar problem exists with using a simple bash for loop.

Is there a simple way to accomplish this task, using find or otherwise?

N. Virgo
  • 173

6 Answers

25

If all the files to be processed are in the same folder, you don't need to use find, and can make do with native shell globbing.

for foo in *.txt ; do
  mycommand "${foo}" -o "${foo%.txt}-processed.txt"
done

The shell idiom ${foo%bar} removes the smallest suffix string matching the pattern bar from the value of foo, in this case the .txt extension, so we can replace it with the suffix you want.
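As a quick illustration of how the % expansion behaves, here is a minimal sketch. The filename and the variable names are made up for the example; they are not part of the question.

```shell
#!/bin/sh
# Hypothetical filename with two dots, to show shortest vs. longest suffix removal.
foo='archive.tar.txt'

stem=${foo%.txt}      # strips the literal suffix ".txt"   -> archive.tar
shortest=${foo%.*}    # strips the shortest match of ".*"  -> archive.tar
longest=${foo%%.*}    # strips the longest match of ".*"   -> archive

printf '%s\n' "$stem-processed.txt" "$shortest" "$longest"
```

Naming the extension explicitly, as in ${foo%.txt}, is the safer choice here because it leaves a name unchanged if it does not end in .txt.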

Kusalananda
  • 333,661
user1404316
  • 3,078
  • To be sure that mycommand is only invoked on regular files, one could insert [ -f "$foo" ] && or test -f "$foo" && in front of mycommand .... – Kusalananda Jan 30 '18 at 10:08
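A sketch of the loop with that guard in place might look like the following. The mycommand function is only a stand-in (here it copies the input to the -o target) so that the example can be run; the temporary directory and the file names are likewise assumptions.

```shell
#!/bin/sh
# Stand-in for the real tool (hypothetical): copies the input to the -o target.
mycommand() { cp -- "$1" "$3"; }

workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'a\n' > file1.txt
mkdir dir.txt                  # matches *.txt but is not a regular file

for foo in *.txt; do
  # Skip directories and, if nothing matched, the literal pattern "*.txt".
  [ -f "$foo" ] || continue
  mycommand "$foo" -o "${foo%.txt}-processed.txt"
done
```

Note that [ -f ... ] is also true for symlinks that point to regular files.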
6

If you write a script to be run on various systems and portability is a concern, then nothing beats for loops and ${var%sfx} from user1404316's answer.

However, if you are looking for a convenient way to do similar things on your own system, then I heartily recommend GNU Parallel, which is basically xargs on steroids. Here's a solution for your particular task:

parallel mycommand {} -o {.}-processed.txt ::: *.txt

It takes the list of strings after ::: (the shell expands *.txt to a list of matching filenames) and runs mycommand {} -o {.}-processed.txt for each string, replacing {} with the input string (just like xargs) and {.} with the filename without its extension.

You can also feed the list of input strings via stdin (the xargs way) to pair it with find or locate, but I rarely need anything more than zsh's extended globs.

Take a look at the GNU Parallel tutorial to get an idea of what it can do. I use it all the time for batch conversions, extracting archives to subdirectories, etc.

4

This adapts user1404316’s answer to work with find:

find . -type f -name '*.txt' \
   -exec sh -c 'for file do mycommand "$file" -o "${file%.txt}-processed.txt"; done' sh {} +

(You can type that all on one line; just leave out the \.  I broke it into two lines for readability.)


Another way to format it for readability, that makes the embedded shell script a little bit clearer:

find . -type f -name '*.txt' -exec sh -c '
  for file
  do
    mycommand "$file" -o "${file%.txt}-processed.txt";
  done
' sh {} +

Basically, it creates an unnamed shell script:

for file
do
    mycommand "$file" -o "${file%.txt}-processed.txt"
done

(that’s the string between the single quotes, '…', unrolled) and passes it to the shell as a command (sh -c) with the names of all your .txt files as parameters.  (You generally don’t need to quote {}, and you don’t need curly braces in "$file".)
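To see how the arguments reach the embedded script, here is a small sketch that calls sh -c directly, without find. The two file names are hypothetical, and echo stands in for running mycommand.

```shell
#!/bin/sh
# The word after the script ("sh") becomes $0; the remaining words become $1, $2, ...
# "for file" with no "in" list loops over those positional parameters.
out=$(sh -c 'for file do echo "$0: $file -> ${file%.txt}-processed.txt"; done' sh './a.txt' './b c.txt')
printf '%s\n' "$out"
```

This is why the extra sh appears before {} in the find command: without it, the first file name would be consumed as $0 and skipped by the loop.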

Wildcard
  • 36,499
  • Very nice solution; this is exactly what I was going to write if you hadn't already done so. :) I added an edit; see if you like it—but I didn't clean up the surrounding text in light of the edit so you may want to have a look. – Wildcard Jan 29 '18 at 20:08
  • @Wildcard: Your edit strikes me as a little verbose, but I suppose it might be useful to others. I don’t see anything in the surrounding text that really needs to be changed. – G-Man Says 'Reinstate Monica' Jan 29 '18 at 20:12
  • That's fine then. If I were to really go all-out to edit it, I would leave just the line-wrapped version I posted, since it makes the unnamed shell script (within the '...') easy to see without needing to present it again separately. But, I'm a bit of a perfectionist about such things. :) – Wildcard Jan 29 '18 at 20:46
4

With zsh:

for f (*.txt) mycommand -o $f:r-processed.txt -- $f
  • avoid using options after non-option arguments. That's supported by few commands, and for most of those that do (like the ones using the GNU getopt API), not when $POSIXLY_CORRECT is in the environment. That also means you can't work around problems with file names that start with - (like we do with -- here) other than by using ./-file-.txt.
  • for var (values) cmd is a handy short form of for loop reminiscent of perl's syntax.
  • zsh is one of those rare shells where $f doesn't need to be quoted (assuming it's not in a mode where it's emulating other shells)
  • $f:r expands to the root name of the file, that is with the extension removed like in csh. It's more or less equivalent to the Korn shell's ${f%.*} with the added benefit that it won't cross / boundaries (with f=./foo, $f:r is ./foo instead of the empty string), though it doesn't apply here.
  • that excludes hidden files (like .foo.txt). If you want them included, you can add the D glob qualifier (*.txt(D)), but you may want to exclude a file called .txt alone as then the output file would be -processed.txt which wouldn't be ideal (and also would not be hidden), by using ?*.txt(D).
  • if there's no *.txt file in the current directory, that will fail with an error (and thankfully not run mycommand at all contrary to with other shells). If you want the loop just not to do anything instead, you can add the N glob qualifier (*.txt(N)).
  • Note that that includes all *.txt files regardless of their type (regular, symlink, directory, fifo, socket, device...). If you want to only consider regular files, you can add the . glob qualifier (or -. to also include symlinks to regular files, that is, the files for which [ -f "$f" ] would return true). So, with all of the above applied:

    for f (?*.txt(ND.)) mycommand -o $f:r-processed.txt -- $f
    
  • to search for *.txt files in subdirectories as well, change it to **/*.txt (with the D flag it also descends in hidden directories).

3

A combination of find, sed and xargs might work, if you need recursion and more complex string substitution:

find . -iname '*.txt' -printf "%p\0-o\0%p\0" | sed -z '3~3s/\.txt$/-processed&/' | xargs -0 -n 3 echo mycommand

Example:

$ find
.
./bar baz.txt
./foo.txt
$ find . -iname '*.txt' -printf "%p\0-o\0%p\0" | sed -z '3~3s/\.txt$/-processed&/' | xargs -0 -n 3 echo mycommand
mycommand ./bar baz.txt -o ./bar baz-processed.txt
mycommand ./foo.txt -o ./foo-processed.txt

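The -printf "%p\0-o\0%p\0" prints the file path twice, with -o in between, each terminated by the ASCII NUL character. Because of -z, sed treats each NUL-terminated string as one record, and the 3~3 address makes it insert -processed before a trailing .txt on every third record only. Then xargs runs the command with three arguments at a time: the filename, -o, and the edited filename. The echo in front of mycommand only prints the commands; remove it to actually run them.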
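The sed step can be tried in isolation. The sketch below assumes GNU sed (for -z and the first~step address) and uses made-up file names; printf emits the same kind of NUL-terminated triples that the -printf format produces.

```shell
#!/bin/sh
# Assumes GNU sed. Two hypothetical triples: path, -o, path.
out=$(printf '%s\0' ./foo.txt -o ./foo.txt './bar baz.txt' -o './bar baz.txt' |
  sed -z '3~3s/\.txt$/-processed&/' |
  tr '\0' '\n')     # NULs become newlines only to make the records visible
printf '%s\n' "$out"
```

Only records 3 and 6 are rewritten, so the input paths in records 1 and 4 stay untouched.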

muru
  • 72,889
0

One could always make use of the basename command, which is part of GNU coreutils. From the directory with the files:

for filename in ./file*.txt
do
    mycommand "$filename" -o ./"$(basename -s '.txt' "$filename" )"-processed.txt
done
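If the -s option is not available, the same stem can be obtained with the POSIX form of basename, where the suffix is the second operand, or with parameter expansion alone. A minimal sketch, using a hypothetical path:

```shell
#!/bin/sh
# Hypothetical path, used only to show what each form returns.
filename='./some dir/file1.txt'

# POSIX form: the suffix to strip is the second operand (no -s needed).
base=$(basename "$filename" .txt)     # file1

# Same result with parameter expansion only, no external process.
tmp=${filename##*/}                   # file1.txt
base2=${tmp%.txt}                     # file1

printf '%s\n' "$base" "$base2"
```

Either way the directory part is dropped, which is why the loop above has to be run from the directory that contains the files.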