0

My question might sound a bit strange, but this morning I read this question:
Why is “echo” so much faster than “touch”?
I understand the concept of a built-in command perfectly well, but I wonder about an example like this one:

#!/bin/bash
# (or any other shell, if this is possible in another one)
for file in `find . -name "*.xml"`;
do 
   touch $file; 
done

Is there any way to preload touch so that the shell doesn't need to fork it on every call, making the loop much quicker?

Kiwy
  • Problem is the overhead of creating a process and context-switches (including kernelspace → userspace and vice versa), you cannot avoid this. But why would you, if there's another way (: >> $file)¹? Or is this example just a representative one? – Andreas Wiese Apr 09 '14 at 08:30
  • No, the example is only meant as an illustration, not as a real use case. – Kiwy Apr 09 '14 at 08:31
  • ¹ I'd really advise against using echo >> $file as a replacement for touch, since echo emits a newline that is always appended to your file. : (a.k.a. true) does not emit anything, so this is the better way. – Andreas Wiese Apr 09 '14 at 08:31
  • Then I think you can't do it. You could use memlockd or something to lock the programs you have to call often in memory, but that shouldn't help you much since at least after the first call it's in the buffer cache anyway. But you cannot avoid the system call overhead. – Andreas Wiese Apr 09 '14 at 08:32

2 Answers

1

In practice, the kernel will cache the executable and any file it needs (e.g. libraries) in RAM.

The shell itself can't do anything about this. If it's an external program, it needs to be executed. Unices (excluding unix emulation layers like Cygwin) tend to make loading a program pretty efficient, but it's never going to be as fast as executing a built-in command.

You can save time by grouping the calls to touch. First, you should never use command substitution on the output of find: this has absolutely no advantage, but several disadvantages:

  • It will break if you have file names containing whitespace or globbing characters.
  • It's slower, because first find needs to complete its traversal of the directory tree, and only then does the processing start.
  • It uses more memory, to store the output of find.
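To illustrate the first point, here is a minimal sketch of how the loop miscounts when a file name contains a space. The scratch directory and the file name are made up for the demonstration, and it assumes the default IFS and a temporary directory path without whitespace.

```shell
#!/bin/bash
# Sketch only: the scratch directory and file name are hypothetical.
tmpdir=$(mktemp -d)
touch "$tmpdir/my file.xml"      # one file whose name contains a space

# Unquoted command substitution is split on whitespace (default IFS),
# so the loop sees two words instead of one file name.
count=0
for word in $(find "$tmpdir" -name "*.xml"); do
    count=$((count + 1))
done

echo "files on disk: 1, loop iterations: $count"
rm -rf "$tmpdir"
```

Under those assumptions the loop runs twice for a single file, and a touch inside it would create two stray files instead of updating the real one.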

Instead of looping on the results, make find execute the command. find is designed for this kind of use.

find . -name "*.xml" -exec touch {} \;

Now, since touch can process many files in one go, simply replace \; by +, and voilà, touch will be called in batches containing as many files as possible.

find . -name "*.xml" -exec touch {} +
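If you want to see the batching for yourself, the following sketch counts how many times find starts the command in each form. The scratch directory, the file names and the small sh -c wrapper are only there for the demonstration; they are not part of the original commands.

```shell
#!/bin/bash
# Sketch only: the scratch directory and file names are hypothetical.
tmpdir=$(mktemp -d)
touch "$tmpdir/a.xml" "$tmpdir/b.xml" "$tmpdir/c d.xml"

# With \; the command is started once per file: one output line per file.
calls_single=$(find "$tmpdir" -name "*.xml" -exec sh -c 'echo run' sh {} \; | wc -l)

# With + the file names are passed in batches; three short names fit in one.
calls_batched=$(find "$tmpdir" -name "*.xml" -exec sh -c 'echo run' sh {} + | wc -l)

echo "invocations with \\;: $calls_single"
echo "invocations with +: $calls_batched"
rm -rf "$tmpdir"
```

With three files, the first form starts the command three times and the second only once, which is where the saving comes from.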

If you need performance in a shell script, builtins will always beat external commands unless you're manipulating large amounts of data. For example, using the shell's string processing constructs is a lot faster (and less error-prone due to quoting issues) than calling sed if you have a line of text. On the other hand, use sed or other specialized external tools if you have millions of lines to process.
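As a small sketch of that difference, here is the same suffix removal done once with sed and once with a parameter expansion. The file name is a made-up example.

```shell
#!/bin/bash
file="report.xml"   # hypothetical file name, for illustration only

# External command: the shell forks a subshell and executes sed.
base_sed=$(printf '%s\n' "$file" | sed 's/\.xml$//')

# Parameter expansion: done inside the shell, no new process is created.
base_builtin=${file%.xml}

echo "$base_sed"
echo "$base_builtin"
```

Both variables end up holding the same string; the second form just gets there without starting another program.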

If you're creating new files, you can replace the use of touch -- "$file" by : >"$file". There's no such shortcut to change the date of an existing file to the current time.
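A short sketch of the redirection trick, including the >> variant mentioned in the comments on the question. The scratch directory and file names are assumptions made for the example.

```shell
#!/bin/bash
# Sketch only: the scratch directory and file names are hypothetical.
tmpdir=$(mktemp -d)
new="$tmpdir/new.xml"
existing="$tmpdir/existing.xml"
printf 'data\n' > "$existing"

: > "$new"           # creates an empty file (would truncate an existing one)
: >> "$existing"     # creates the file if missing, leaves existing content alone

new_size=$(wc -c < "$new")
existing_content=$(cat "$existing")

echo "size of new file: $new_size"
echo "content of existing file: $existing_content"
rm -rf "$tmpdir"
```

Note that > empties a file that already exists, so >> is the safer choice when you are not sure the file is new.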

Shells have a few built-in commands that could be implemented as external commands, but are internal for performance. echo is a prime example. There are also shells with additional builtins for use in rescue situations where external commands may be broken, or when the process table is full, etc. For example, zsh comes with the zsh/files module, which contains commands such as mkdir and mv, but not touch. Sash has -touch built in.
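To check how your own shell resolves a given command, you can ask it. This sketch uses type -t, which is a bash feature (other shells spell this differently), and the result depends on your system and on any aliases or functions you have defined.

```shell
#!/bin/bash
# type -t is bash-specific; it prints "builtin", "file", "alias", etc.
echo_kind=$(type -t echo)
touch_kind=$(type -t touch)

echo "echo is a: $echo_kind"
echo "touch is a: $touch_kind"
```

On a typical bash setup echo is reported as a builtin and touch as a file, meaning an external program found in PATH.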

0
find . -name "*.xml" -print0 | xargs -0 touch

Learn find | xargs

Joshua