This other answer is somewhat flawed. The command is
find . -name '*.txt' | head -n 3
Then there's an explanation in one of the comments [emphasis mine]:
head starts up and waits for input from the lefthand side of the pipe. Then find starts up and searches for files that match the criteria specified, sending its output through the pipe. When head has received and printed the number of lines requested, it terminates, closing the pipe. find notices the closed pipe and it also terminates. Simple, elegant and efficient.
This is almost true.
The problem is that find notices the closed pipe only when it tries to write to it – in this case when the 4th match is found. But if there's no 4th match then find will continue. Your shell will wait! If this happens in a script, the script will wait, despite the fact that we already know the pipe output is final and nothing can be added to it. Not so efficient.
The effect is negligible if this particular find finishes fast by itself, but with a complex search in a large file tree the command may unnecessarily delay whatever you want to do next.
The not-so-perfect solution is to run
( find … & ) | head -n 3
This way, when head exits, the shell continues immediately. The background find process may then be ignored (it will exit sooner or later) or targeted with pkill or something.
To prove the concept you may search for /. We expect one match only, yet find looks for it everywhere, and it may take a lot of time.
find / -name / 2>/dev/null | head -n 1
Terminate it with Ctrl+C as soon as you see the issue. Now compare:
pidof find ; ( find / -name / 2>/dev/null & ) | head -n 1 ; pidof find
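The second pidof should still show a find process running after head has exited. If you don't want to leave it behind, a rough cleanup in the spirit of the pkill remark above (assuming no other find you care about is running) could be:
pkill -x find    # kills any process named exactly "find"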
A better solution may be:
yes | head -n 2 \
| find … -print -exec sh -c '
read dummy || kill -s PIPE "$PPID"
' find-sh \;
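To make this concrete, here is one way the elided … could be filled in, reusing the '*.txt' criterion from the command quoted at the top (my own substitution, not part of the original answer); it prints at most 3 matches and then stops find:
# at most 3 matches: head -n 2, not 3 – see the notes below
yes | head -n 2 \
| find . -name '*.txt' -print -exec sh -c '
read dummy || kill -s PIPE "$PPID"
' find-sh \;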
Notes:
Here we want 3 matched files, but we use head -n 2 (not head -n 3). After the third matching file, read finds no input on its stdin and then kill terminates find. If we used head -n 3, then kill would be triggered after the fourth file.
The signal is SIGPIPE. kill -s INT … should work as well. I deliberately chose SIGPIPE because it's the signal that terminates find in the simplest solution (find … | head -n 3).
The cost of running one sh per matching file will be negligible if you want 3 files. Remember our goal is to avoid the find from what I called the "not-so-perfect solution" running in the background in vain; for the overall performance of the OS, a few short-lived shells are certainly better than an "abandoned" find that keeps traversing the filesystem. But if you want (at most) 1000 files and chances are find may run out of files even earlier (so there may be no problem to avoid in the first place), then these shells are a burden.
The following code spawns a reduced number of sh processes, but I think it's flawed:
# flawed, DO NOT USE
yes | head -n 999 \
| find … -exec sh -c '
for pathname do
printf "%s\\n" "$pathname"
read dummy || { kill -s PIPE "$PPID"; exit 0; }
done
' find-sh {} +
I had to replace -print (outside of the shell code) with printf … (inside the shell code). The reason is that -print before -exec sh … {} + could (and probably would) print too many pathnames.
A potential problem arises: if each printf created a separate process, then it would make this "optimization" pointless. Fortunately in almost(?) every sh, printf is a builtin.
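If in doubt, you can check the sh you are going to use; a quick probe (my own addition, not part of the solution):
command -V printf    # in most shells this reports printf as a shell builtin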
But the real flaw is the fact that -exec sh … {} + waits for as many pathnames as possible before handing them over to sh. On one hand this is exactly what reduces the number of sh processes. On the other hand it's almost certain that when the 1000th match is enqueued, find will keep searching for the 1001st; and when the 1001st is found, probably for even more. Note that in this case the 1001st match is the one that would terminate find … | head -n 1000; so the flawed solution is even worse than the simplest solution. Do not use it.
The simplest solution (find … | head -n 3) will miscount if there's a newline character in one of the printed pathnames. If you want null-terminated strings then the simplest solution will become something like find … -print0 | head -z -n 3, i.e. you will need a head that supports the non-portable option -z. In our optimized solution you need neither head -z nor find -print0; printf "%s\\0" "$pathname" in the shell code will be enough.
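A sketch of that null-terminated variant, again with a hypothetical '*.txt' criterion in place of the elided … and with the printing moved from find's -print into the inner sh:
# at most 3 null-terminated matches; neither head -z nor find -print0 is needed
yes | head -n 2 \
| find . -name '*.txt' -exec sh -c '
printf "%s\\0" "$1"
read dummy || kill -s PIPE "$PPID"
' find-sh {} \;
The output can then be consumed with e.g. xargs -0 (common, though not required by POSIX).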
Counting is done inside sh by consuming lines from the stdin inherited from find. Usually you don't pipe anything to find, but in general you may want to do this for some purpose other than our counting. That other purpose and our counting method will then be incompatible.
yes is not portable. For our purpose, while :; do echo; done is a portable replacement.
find-sh is explained here: What is the second sh in sh -c 'some shell code' sh?
A fellow user asked for a shell function that implements the solution. Here it is:
findn () (
  n="$1"
  shift
  case "$n" in
    '' | *[!0123456789]*) echo >&2 not a valid number; exit 1;;
  esac
  [ "$n" -eq 0 ] && exit 0
  n="$((n-1))"
  while :; do echo; done | head -n "$n" \
  | find "$@" -exec sh -c '
  read dummy || kill -s PIPE "$PPID"
  ' find-sh \;
)
The first argument is the maximum number of matches you want; the rest will be given to find. Example usage:
findn 2 / -name bin -print 2>/dev/null
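Or, matching the command quoted at the top (my own example): at most 3 text files. Note that you need to supply -print (or similar) yourself, because the -exec appended by the function disables find's implicit -print:
findn 3 . -name '*.txt' -print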