I'm running a long-running pipeline from bash, in the background:
find / -size +500M -name '*.txt' -mtime +90 |
xargs -n1 gzip -v9 &
The 2nd stage of the pipeline takes a long time to complete (hours) since there are several big+old files.
In contrast, the 1st stage of the pipeline completes immediately: since the pipe isn't full, find writes all of its output without blocking and exits successfully.
The parent bash process seems to wait properly for child processes.
I can tell this because there's no find (pid 20851) running according to either:
ps alx | grep 20851
pgrep -l find
There's no zombie process, nor is there any process with process-id 20851 to be found anywhere on the system.
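To double-check the "not running and not a zombie" claim without relying on grep over ps output, something like the following could be used. This is only a sketch: pid_state is a hypothetical helper name (not from the post), and it assumes a Linux /proc filesystem.

```shell
#!/bin/bash
# pid_state is a hypothetical helper, not part of the original post.
# It prints the one-letter state from /proc/PID/status (e.g. S, R, Z),
# or "gone" when the kernel has no entry for that PID. Linux-only.
pid_state() {
    if [ -r "/proc/$1/status" ]; then
        awk '/^State:/ { print $2 }' "/proc/$1/status"
    else
        echo gone
    fi
}

pid_state "$$"      # state of the current shell
pid_state 20851     # the find PID from above; "gone" if no such process exists
```

A zombie would still have a /proc entry and report state Z, so "gone" rules out both a live process and a zombie.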
The bash builtin jobs correctly shows the job as a single line, without any process ids:
[1]+ Running find / -size +500M -name '*.txt' -mtime +90 | xargs -n1 gzip -v9 &
On the other hand, I stumbled by accident on a separate job control command (/bin/jobs) which shows:
[1]+ 20851 Running find / -size +500M -name '*.txt' -mtime +90
20852 Running | xargs -n1 gzip -v9 &
and which is (wrongly) showing the already exited find process (pid 20851) as "Running".
This is on CentOS Linux (edit: more accurately, Amazon Linux 2 AMI).
Turns out that /bin/jobs is a two-line /bin/sh script:
#!/bin/sh
builtin jobs "$@"
This is surprising to me. How can a separate process, started from another program (sh), know the details of a process which is managed by another (bash) after that process has already completed and exited and is NOT a zombie?
Further: how can it know details (including pid) about the already exited process, when other methods on the system (ps, pgrep) can't?
Edits:
(1) As Uncle Billy noted in the comments, on this system /bin/sh and /bin/bash are the same (/bin/sh is a symlink to /bin/bash), but /bin/jobs is a script with a shebang line, so it runs in a separate process.
(2) Also, thanks to Uncle Billy: an easier way to reproduce. /bin/jobs was a red herring. I mistakenly assumed it was the one producing the output. The surprising output apparently came from the bash builtin jobs when called with -l:
$ sleep 1 | sleep 3600 &
[1] 13616
$ jobs -l
[1]+ 13615 Running sleep 1
13616 Running | sleep 3600 &
$ ls /proc/13615
ls: cannot access /proc/13615: No such file or directory
So process 13615 doesn't exist, but is shown as "Running" by bash builtin job control, which looks like a bug in jobs -l.
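The same comparison can be scripted, so the job table and the kernel's view are captured side by side. This is a rough sketch only: the sleep durations are arbitrary, the variable names are made up for illustration, and the exact jobs -l text may differ between bash versions.

```shell
#!/bin/bash
# Sketch: compare bash's job table with the kernel's view of each PID.
# Durations are arbitrary; variable names are illustrative only.

sleep 1 | sleep 5 &        # first stage exits after ~1s, second keeps the job alive
last_pid=$!                # PID of the last stage (sleep 5)
first_pid=$(jobs -p)       # PID of the first process in the job (sleep 1)

sleep 2                    # give the first stage time to exit and be reaped

job_listing=$(jobs -l)     # what the shell reports for the job
printf '%s\n' "$job_listing"

for pid in "$first_pid" "$last_pid"; do
    if [ -d "/proc/$pid" ]; then
        echo "PID $pid: still present in /proc"
    else
        echo "PID $pid: no /proc entry"
    fi
done

kill "$last_pid" 2>/dev/null   # clean up the remaining stage
```

If the first PID appears in the jobs -l listing but has no /proc entry, that is the same mismatch as in the transcript above.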
The presence of /bin/jobs, which misled me into thinking it must be the culprit (it wasn't), seems confusing and questionable. I believe it should be removed from the system as it is useless (a sh script running in a separate process, which can't show jobs of the caller anyway).
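A quick way to see that a separately executed shell has its own, empty job table is sketched below. It assumes bash is on PATH and imitates what the /bin/jobs script does by running builtin jobs in a child process; the variable names are illustrative.

```shell
#!/bin/bash
# Sketch: a background job is visible to the shell that started it,
# but not to a separately executed shell process.

sleep 5 &
bg_pid=$!

parent_view=$(jobs)                    # evaluated against this shell's job table
child_view=$(bash -c 'builtin jobs')   # new process: it has started no jobs

echo "parent sees: ${parent_view:-<nothing>}"
echo "child sees:  ${child_view:-<nothing>}"

kill "$bg_pid" 2>/dev/null             # clean up
```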
jobs -l, even when a process has exited: sleep 1 | sleep 3600 & ... after 2 secs jobs -l will show both as running, though the first sleep has terminated. What version of bash is that? What does alias | grep jobs say? – Jan 30 '21 at 22:44
$ bash --version shows GNU bash, version 4.2.46(2)-release (x86_64-koji-linux-gnu) – arielf Jan 30 '21 at 23:07
type /bin/jobs say? Notice that on centos/rhel /bin/sh is still bash under another name. – Jan 30 '21 at 23:11
$ type /bin/jobs shows: /bin/jobs is /bin/jobs. I have no alias for jobs, and indeed /bin/sh is a symlink to /bin/bash on this system, so same program, different processes. Great questions! – arielf Jan 30 '21 at 23:13
execve(2), as running /bin/jobs should be -- unless there's some kind of trick. Another possibility would be that either jobs or builtin is an exported function ;-) – Jan 30 '21 at 23:28
/bin/jobs, its content is what you wrote; but it behaves like we all expect, not like in the question. So if things work for you as you described then it's not because of the Amazon Linux 2 itself (phew!). – Kamil Maciorowski Jan 31 '21 at 05:01
jobs is specified by POSIX and POSIX explicitly requires it as a standalone executable. In this matter Amazon Linux 2 is more POSIX-compliant than e.g. Debian. Note in AL2 there is /usr/bin/cd as well. – Kamil Maciorowski Feb 01 '21 at 19:15