Why does umount -R take 0.2s to unmount this subtree of bind mounts? It only took 0.02s to mount the subtree, and 0.00s to change its propagation flags.
(I was looking at cloning the current set of mounts to a subdirectory, changing them, and then switching to them using pivot_root. But the delay I'm observing would not really be acceptable for that purpose.)
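To make that concrete, the rough sequence I have in mind is something like the following. This is an untested sketch, and the pivot_root / put_old handling is the fiddly part I've glossed over:

# unshare -m
# mount --rbind / /mnt/a
# mount --make-rprivate /mnt/a
  ... modify the mounts under /mnt/a here ...
# cd /mnt/a && pivot_root . mnt/a
# umount -R /mnt/a

The last step is where the delay above would bite.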
This exercise assumes / and its sub-mounts are shared mounts. Linux doesn't make them so by default, but systemd does.
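You can check the propagation type with findmnt:

# findmnt -o TARGET,PROPAGATION /
TARGET PROPAGATION
/      shared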
# mkdir /mnt/a
# mount --bind /mnt/a /mnt/a --make-private
# time mount --rbind / /mnt/a
0.00user 0.00system 0:00.02elapsed 9%CPU (0avgtext+0avgdata 3020maxresident)k
0inputs+0outputs (0major+135minor)pagefaults 0swaps
# time mount --make-rprivate /mnt/a
0.00user 0.00system 0:00.00elapsed 100%CPU (0avgtext+0avgdata 3184maxresident)k
0inputs+0outputs (0major+136minor)pagefaults 0swaps
# time umount -R /mnt/a
0.00user 0.00system 0:00.19elapsed 9%CPU (0avgtext+0avgdata 3392maxresident)k
0inputs+0outputs (0major+194minor)pagefaults 0swaps
Further tests
Running under strace -cw shows:
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
90.44 0.180113 5297 34 umount2
...
So it's not super revealing, other than pointing out that this last operation requires 34 individual umount2() calls, whereas the other two operations each involve only one call to mount(), using the MS_REC (recursive) flag. As with the headline figures from time, this is wall-clock time; strace -c without -w shows system time instead (i.e. CPU time spent in the kernel), which totals only 0.009s.
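For reference, the summary above came from a command along the lines of strace -cw umount -R /mnt/a, and the 34 calls simply match the number of mounts in the subtree, which can be cross-checked with:

# findmnt -rn -R /mnt/a | wc -l
34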
Although it does point to something interesting: using umount -l /mnt/a instead reduces the total time to 0.02s. This uses a single umount2() call to detach the subtree /mnt/a (and does any cleanup in the background).
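A lazy unmount maps to the MNT_DETACH flag of umount2(), so the trace is roughly:

# strace -e trace=umount2 umount -l /mnt/a
umount2("/mnt/a", MNT_DETACH)           = 0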
Looking at the individual calls with strace -ttt -T -e trace=umount2 umount -R /mnt/a, the distribution of times for the individual calls is relatively even: they range from 0.002s to 0.012s, with no clear pattern, and the pattern is not consistent across repeated runs.
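If you want to reproduce the per-call numbers, the durations appended by -T can be summarized with a quick awk one-liner (assuming no other angle brackets appear on the umount2 lines):

# strace -ttt -T -e trace=umount2 -o /tmp/umount.trace umount -R /mnt/a
# awk -F'[<>]' '/umount2/ { n++; s += $2; if ($2 > max) max = $2; if (min == "" || $2 < min) min = $2 }
      END { printf "%d calls: min %.3fs max %.3fs avg %.3fs\n", n, min, max, s/n }' /tmp/umount.trace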
After running umount -R under perf record -a, perf report shows several hotspots in gsd-housekeeping, gvfs-udisks2-volume-monitor, and systemd. The umount process doesn't show up at all. This could explain why time showed umount spending negligible CPU time either in the kernel or in userspace.
(I'd be very interested if anyone has a more comprehensive way to aggregate CPU usage per process during a test :-).
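The crudest option I can think of is to aggregate the perf samples by command, although samples are only a proxy for CPU time:

# perf record -a -- umount -R /mnt/a
# perf report --stdio --sort comm | head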
The other processes are probably doing some processing in response to each mount event. For example, systemd seems to be responsible for using 0.13s of one of my four CPUs during a run that took 0.4s:
# systemctl set-property init.scope CPUAccounting=yes
# systemctl show --property CPUUsageNSec init.scope; time umount -R /mnt/a ; systemctl show --property CPUUsageNSec init.scope
CPUUsageNSec=2403124481
real 0m0.408s
user 0m0.015s
sys 0m0.020s
CPUUsageNSec=2534058385
# echo $(( 2534058385 - 2403124481 ))
130933904
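To repeat that measurement without copy-pasting numbers, a small helper along these lines should work, assuming a systemd new enough to support systemctl show --value:

cpu_delta() {
    # report the CPU time PID 1 (init.scope) consumed while "$@" ran
    local before after
    before=$(systemctl show --property CPUUsageNSec --value init.scope)
    "$@"
    after=$(systemctl show --property CPUUsageNSec --value init.scope)
    echo "init.scope: $(( after - before )) ns"
}

Then e.g. cpu_delta umount -R /mnt/a.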
But this does not seem to be the right explanation, because the same delay is happening when running in a private mount namespace. In this case, perf record -a does not show the other processes, only the umount process (and perf itself).
# unshare -m
# time mount --rbind / /mnt/a
real 0m0.005s
user 0m0.003s
sys 0m0.002s
# time mount --make-rprivate /mnt/a
real 0m0.005s
user 0m0.003s
sys 0m0.002s
# systemctl show --property CPUUsageNSec init.scope; time umount -R /mnt/a ; systemctl show --property CPUUsageNSec init.scope
CPUUsageNSec=3637792026
real 0m0.381s
user 0m0.026s
sys 0m0.018s
CPUUsageNSec=3645973005
# echo $((3645973005-3637792026))
8180979
The CPU doesn't seem to be the limiting factor in this case. I have 4 CPU cores capable of running at 2.3GHz, but perf stat -a shows less than 5% CPU usage overall. (Ignore "CPUs utilized"; I think that always shows the full value when -a is used.)
# time perf stat -a umount -R /mnt/a
Performance counter stats for 'system wide':
2079.333650 cpu-clock (msec) # 3.998 CPUs utilized
635 context-switches # 0.305 K/sec
23 cpu-migrations # 0.011 K/sec
333 page-faults # 0.160 K/sec
198,278,822 cycles # 0.095 GHz
138,734,277 instructions # 0.70 insn per cycle
31,401,067 branches # 15.102 M/sec
934,327 branch-misses # 2.98% of all branches
0.520083596 seconds time elapsed
real 0m0.543s
user 0m0.038s
sys 0m0.043s
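The <5% figure follows from the cycle count: four cores at 2.3GHz over 0.52s could have executed about 4.8 billion cycles, and only 198 million were used:

# echo $(( 198278822 * 100 / (4 * 2300000000 * 52 / 100) ))
4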
However, some process is still responding to these events: the umount triggers 78 lines of messages in the system log.
Feb 07 10:34:26 alan-laptop systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 6040 (umount)
Feb 07 10:34:26 alan-laptop systemd[1]: proc-sys-fs-binfmt_misc.automount: Automount point already active?
Feb 07 10:34:26 alan-laptop systemd[1]: proc-sys-fs-binfmt_misc.automount: Got automount request for /proc/sys/fs/binfmt_misc, triggered by 6040 (umount)
Feb 07 10:34:26 alan-laptop systemd[1]: proc-sys-fs-binfmt_misc.automount: Automount point already active?
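For reference, a count like the 78 lines above can be obtained by bracketing the test with a journal query, something like:

# t0=$(date '+%Y-%m-%d %H:%M:%S')
# umount -R /mnt/a
# journalctl --since "$t0" | wc -l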
Note findmnt shows I avoided creating any horrific recursive propagation, e.g. running it after the --make-rprivate:
# findmnt -o TARGET,PROPAGATION
TARGET PROPAGATION
/ shared
├─/sys shared
│ ├─/sys/kernel/security shared
│ ├─/sys/fs/cgroup shared
│ │ ├─/sys/fs/cgroup/unified shared
│ │ ├─/sys/fs/cgroup/systemd shared
│ │ ├─/sys/fs/cgroup/net_cls,net_prio shared
│ │ ├─/sys/fs/cgroup/cpu,cpuacct shared
│ │ ├─/sys/fs/cgroup/devices shared
│ │ ├─/sys/fs/cgroup/freezer shared
│ │ ├─/sys/fs/cgroup/perf_event shared
│ │ ├─/sys/fs/cgroup/hugetlb shared
│ │ ├─/sys/fs/cgroup/memory shared
│ │ ├─/sys/fs/cgroup/blkio shared
│ │ ├─/sys/fs/cgroup/cpuset shared
│ │ └─/sys/fs/cgroup/pids shared
│ ├─/sys/fs/pstore shared
│ ├─/sys/fs/selinux shared
│ ├─/sys/kernel/debug shared
│ └─/sys/kernel/config shared
├─/proc shared
│ └─/proc/sys/fs/binfmt_misc shared
├─/dev shared
│ ├─/dev/shm shared
│ ├─/dev/pts shared
│ ├─/dev/mqueue shared
│ └─/dev/hugepages shared
├─/run shared
│ ├─/run/user/1000 shared
│ └─/run/user/42 shared
├─/usr shared
├─/tmp shared
├─/boot shared
└─/mnt/a private
└─/mnt/a private
├─/mnt/a/usr private
├─/mnt/a/sys private
│ ├─/mnt/a/sys/kernel/security private
│ ├─/mnt/a/sys/fs/cgroup private
│ │ ├─/mnt/a/sys/fs/cgroup/unified private
│ │ ├─/mnt/a/sys/fs/cgroup/systemd private
│ │ ├─/mnt/a/sys/fs/cgroup/net_cls,net_prio private
│ │ ├─/mnt/a/sys/fs/cgroup/cpu,cpuacct private
│ │ ├─/mnt/a/sys/fs/cgroup/devices private
│ │ ├─/mnt/a/sys/fs/cgroup/freezer private
│ │ ├─/mnt/a/sys/fs/cgroup/perf_event private
│ │ ├─/mnt/a/sys/fs/cgroup/hugetlb private
│ │ ├─/mnt/a/sys/fs/cgroup/memory private
│ │ ├─/mnt/a/sys/fs/cgroup/blkio private
│ │ ├─/mnt/a/sys/fs/cgroup/cpuset private
│ │ └─/mnt/a/sys/fs/cgroup/pids private
│ ├─/mnt/a/sys/fs/pstore private
│ ├─/mnt/a/sys/kernel/config private
│ ├─/mnt/a/sys/fs/selinux private
│ └─/mnt/a/sys/kernel/debug private
├─/mnt/a/dev private
│ ├─/mnt/a/dev/shm private
│ ├─/mnt/a/dev/pts private
│ ├─/mnt/a/dev/mqueue private
│ └─/mnt/a/dev/hugepages private
├─/mnt/a/run private
│ ├─/mnt/a/run/user/1000 private
│ └─/mnt/a/run/user/42 private
├─/mnt/a/proc private
│ └─/mnt/a/proc/sys/fs/binfmt_misc private
├─/mnt/a/tmp private
├─/mnt/a/boot private
└─/mnt/a/mnt/a private
strace -f -ttt -T -o /path/to/output/file umount ... should at least show you which system call(s) are taking up the time. – Andrew Henle Jan 18 '18 at 21:46
It starts in u and ends in mount :-P – sourcejedi Jan 19 '18 at 09:00