0

How can I do a fast text replace with recursive directories and filenames with spaces and single quotes? Preferably using standard UNIX tools, or alternatively a well-known package.

Using find is extremely slow for many files, because it spawns a new process for each file, so I'm looking for a way that integrates directory traversal and string replacement into one operation.

Slow search:

find . -name '*.txt'  -exec grep foo {} \;

Fast search:

grep -lr --include=*.txt foo

Slow replace:

find . -name '*.txt' -exec perl -i -pe 's/foo/bar/' {} \;

Fast replace:

# Your suggestion here

(This one is rather fast, but is two-pass and doesn't handle spaces.)

perl -p -i -e 's/foo/bar/g' `grep -lr --include=*.txt foo`
forthrin

3 Answers

5

You'd only want to use the:

 find . -name '*.txt'  -exec cmd {} \;

form for commands that can only take one file argument at a time. That's not the case for grep. With grep:

 find . -name '*.txt'  -exec grep foo /dev/null {} +

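(The /dev/null argument guarantees that grep always gets at least two file arguments and therefore prints the file name with each match; with GNU grep you can use -H instead.) More on that at Recursive grep vs find / -type f -exec grep {} \; Which is more efficient/faster?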

Replacement works the same way, since perl -pi can take more than one file argument:

 find . -name '*.txt' -type f -exec perl -pi -e s/foo/bar/g {} +

Now that would rewrite the files regardless of whether they contain foo or not. Instead, you may want (assuming GNU grep and xargs or compatible):

 find . -name '*.txt' -type f -exec grep -l --null foo {} + |
   xargs -r0 perl -pi -e s/foo/bar/g

Or:

 grep -lr --null --include='*.txt' foo . |
   xargs -r0 perl -pi -e s/foo/bar/g

That way, only the files that contain foo are rewritten.
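To check how the second pipeline copes with awkward names, here is a small sketch you can run in a scratch directory. The directory and file names are made up for the test, and it assumes GNU grep, an xargs that accepts -r and -0, and perl on the PATH:

```shell
# Scratch tree with a space and a single quote in the names (hypothetical files).
demo=$(mktemp -d)
mkdir -p "$demo/sub dir"
printf 'foo here\n' > "$demo/sub dir/it's a match.txt"
printf 'nothing here\n' > "$demo/no match.txt"
printf 'foo but wrong extension\n' > "$demo/notes.log"
cd "$demo" || exit 1

# Record the inode of a file that does not contain foo.
before=$(ls -i "no match.txt")

# grep prints NUL-terminated names of matching *.txt files; xargs -0 splits on NUL,
# so spaces and quotes in names are passed through untouched.
grep -lr --null --include='*.txt' foo . |
  xargs -r0 perl -pi -e 's/foo/bar/g'

after=$(ls -i "no match.txt")
cat "sub dir/it's a match.txt"
```

If before and after are identical, the file without foo was never handed to perl, and notes.log is left alone because of --include.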


BTW, --include=*.txt (--include being another GNU extension) is a shell glob, so should be quoted. For instance, if there was a file called --include=foo.txt in the current directory, the shell would expand --include=*.txt to that before calling grep. And if not, with many shells, you'd get an error about the glob failing to match any file.

So you'd want grep --include='*.txt'
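A quick way to see the difference for yourself, as a sketch in a throwaway directory (the file name is the hypothetical one from above; printf stands in for grep so that you can see the arguments grep would receive):

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1

# Create the troublesome file; "--" stops touch from parsing the name as an option.
touch -- '--include=foo.txt'

# Unquoted: the shell expands the glob against the current directory first.
unquoted=$(printf '%s\n' --include=*.txt)

# Quoted: the pattern reaches the command unchanged.
quoted=$(printf '%s\n' --include='*.txt')

printf 'unquoted: %s\nquoted:   %s\n' "$unquoted" "$quoted"
```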

  • Aha! So that's what the + is for. I've seen it around (e.g. Emacs grep-find), but never pondered its meaning. Way faster. Good point about not rewriting unchanged files (I assume perl has no shorthand way of avoiding this?) Good answer! One that will make my life easier! – forthrin Mar 30 '18 at 09:22
2

When your find expression is that simple, you can use your shell to do the globbing instead. The main limit you could run up against would be dealing with more files than fit on a command-line.

An example in bash:

$ shopt -s globstar

$ date > a.txt
$ date > b.txt
$ date > c.txt
$ cat *.txt
Thu Mar 29 14:57:57 EDT 2018
Thu Mar 29 14:58:00 EDT 2018
Thu Mar 29 14:58:02 EDT 2018
$ mkdir -p deep/sub/dir
$ mv *.txt deep/sub/dir

$ perl -pi -e 's/EDT/EST/' **/*.txt

$ cat deep/sub/dir/*.txt
Thu Mar 29 14:57:57 EST 2018
Thu Mar 29 14:58:00 EST 2018
Thu Mar 29 14:58:02 EST 2018
Jeff Schaller
  • Whoa! I didn't know about globstar. Alas, it's not available by default on macOS (GNU bash, version 3.2.57), but there's a Homebrew package (4.4.19 at the time of writing). https://apple.stackexchange.com/questions/291287/ Very nice shorthand way of getting to specific files in subdirectories! – forthrin Mar 30 '18 at 09:20
0

You can use find ... -exec with a "+" terminator instead of ";" to pass files to the command in large batches instead of one at a time (provided the command being executed accepts multiple files in one invocation):

find . -name '*.txt' -exec grep foo {} +
find . -name '*.txt' -exec perl -i -pe 's/foo/bar/' {} +
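
To see the batching in action, here is a small sketch that replaces the real command with an inline sh script that just reports how many file names it received. The scratch files are made up for the demonstration:

```shell
demo=$(mktemp -d)
cd "$demo" || exit 1
touch one.txt "two words.txt" "it's three.txt"

# With \; the command is started once per file, so each run sees 1 argument.
per_file=$(find . -name '*.txt' -exec sh -c 'echo "$#"' sh {} \;)

# With + the names are collected into as few runs as possible; three short
# names fit easily into a single invocation.
batched=$(find . -name '*.txt' -exec sh -c 'echo "$#"' sh {} +)

printf 'per-file runs:\n%s\nbatched runs:\n%s\n' "$per_file" "$batched"
```

The first form prints one line per file, while the second prints a single count for the whole batch.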