As described in the kernel commit log linked to by jiliagre above, the nsfs filesystem is a virtual filesystem that makes Linux kernel namespaces available. It is separate from the /proc "proc" filesystem, where some process directory entries reference inodes in the nsfs filesystem in order to show which namespaces a certain process (or thread) is currently using.
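The process-side view can be inspected from a shell. Each link under /proc/self/ns reads as `type:[inode]`, and following the link with `stat -L` reports that same inode number on the nsfs side. Below is a minimal sketch, assuming Linux with GNU (or BusyBox) `stat`. `show_ns_links` is a hypothetical helper name, not an existing tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each namespace link of the calling shell
# next to the inode number of the nsfs file it resolves to.
show_ns_links() {
    for link in /proc/self/ns/*; do
        # readlink shows "type:[inode]", e.g. "net:[4026531840]"
        target=$(readlink "$link") || continue
        # stat -L follows the link into nsfs and reports that file's inode
        inode=$(stat -L -c %i "$link") || continue
        printf '%-20s %-25s inode=%s\n' "${link##*/}" "$target" "$inode"
    done
}

show_ns_links
```

Two processes are in the same namespace of a given type when the device and inode numbers behind their /proc/$PID/ns links match.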
The nsfs filesystem isn't listed in /proc/filesystems (while proc is), so it cannot be mounted explicitly: mount -t nsfs ./namespaces fails with "unknown filesystem type". This is because nsfs is tightly interwoven with the proc filesystem.
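This is easy to check on a running system: /proc/filesystems has a `nodev proc` line but no nsfs line. Below is a small sketch, assuming a Linux /proc. `fs_registered` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if filesystem type $1 is registered.
# Each line of /proc/filesystems is "[nodev]<TAB>name", so the name is the last field.
fs_registered() {
    awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' /proc/filesystems
}

for fs in proc nsfs; do
    if fs_registered "$fs"; then
        echo "$fs: listed in /proc/filesystems"
    else
        echo "$fs: not listed"
    fi
done
```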
The filesystem type nsfs only becomes visible via /proc/$PID/mountinfo when bind-mounting an existing(!) namespace filesystem link to another target. As Stephen Kitt rightly suggests above, this is used to keep namespaces alive even when no process is using them anymore.
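For scripts that cannot rely on findmnt, such nsfs entries can also be picked out of mountinfo directly. The following is a minimal sketch assuming the line layout described in proc(5): the optional fields end at a lone `-`, which is followed by the filesystem type and the mount source. It also assumes the fourth field (root) carries the `type:[inode]` name for nsfs mounts. The function name `nsfs_mounts` is hypothetical, and escaped characters such as `\040` in mount points are not decoded.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "mount-point root" for every nsfs entry in a
# mountinfo file (default: the mountinfo of the calling process).
nsfs_mounts() {
    awk '{
        # optional fields vary in number; the lone "-" separates them
        # from: fstype, mount source, super options
        for (i = 7; i <= NF; i++) {
            if ($i == "-") {
                if ($(i + 1) == "nsfs") print $5, $4
                break
            }
        }
    }' "${1:-/proc/self/mountinfo}"
}

nsfs_mounts
```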
For example, create a new network namespace, bind-mount it, then exit: the namespace still exists, but lsns won't find it, since it is no longer listed under any /proc/$PID/ns but only exists as a (bind) mount point.
# bind mount only needs an inode, not necessarily a directory ;)
touch mynetns
# create new network namespace, show its id and then bind-mount it, so it
# is kept existing after the unshare'd bash has terminated.
# output: net:[##########]
NS=$(sudo unshare -n bash -c "readlink /proc/self/ns/net && mount --bind /proc/self/ns/net mynetns") && echo "$NS"
# notice how lsns cannot see this namespace anymore: no match!
lsns -t net | grep "${NS:5:-1}" || echo "lsns: no match for net:[${NS:5:-1}]"
# however, findmnt does locate it on the nsfs...
findmnt -t nsfs | grep "${NS:5:-1}" || echo "findmnt: no match for net:[${NS:5:-1}]"
# output: /home/.../mynetns nsfs[net:[##########]] nsfs rw
# let the namespace go...
echo "unbinding + releasing network namespace"
sudo umount mynetns
findmnt -t nsfs | grep "${NS:5:-1}" || echo "findmnt: no match for net:[${NS:5:-1}]"
# clean up
rm mynetns
The output should look similar to this:
net:[4026532992]
lsns: no match for net:[4026532992]
/home/.../mynetns nsfs[net:[4026532992]] nsfs rw
unbinding + releasing network namespace
findmnt: no match for net:[4026532992]
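The `${NS:5:-1}` slices in the script assume a three-letter type prefix such as `net`, and a negative length requires bash 4.2 or newer. A variant that strips everything up to the first `[` and the trailing `]` works for any namespace type, including `user` and `cgroup`, and runs in plain POSIX sh. `ns_inode` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the inode number from a "type:[inode]" string,
# independent of the length of the type name.
ns_inode() {
    s=${1#*\[}            # drop everything up to and including the first "["
    printf '%s\n' "${s%\]}"  # drop the trailing "]"
}

ns_inode 'net:[4026532992]'    # prints 4026532992
ns_inode "$(readlink /proc/self/ns/user)"
```

In the script above, `${NS:5:-1}` could then be written as `$(ns_inode "$NS")`.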
Please note that it is not possible to create namespaces via the nsfs filesystem, only via the clone() (CLONE_NEW...) and unshare() syscalls. The nsfs only reflects the current kernel state with respect to namespaces; it cannot create or destroy them.
Namespaces are destroyed automatically as soon as no reference to them is left: no processes (so no /proc/$PID/ns/...), no open file descriptors on those files, AND no bind-mounts either, as we've explored in the above example.
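As a rough illustration of the process and bind-mount references named above, a sketch can look for a given namespace inode both among the /proc/$PID/ns links and among nsfs mounts. It is only an approximation, with several limitations. Without root, other users' /proc/$PID/ns links are unreadable and are silently skipped. Only the caller's mount namespace is searched, and open file descriptors are not checked at all. `ns_in_use` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: ns_in_use TYPE INODE
# Succeeds if some readable /proc/PID/ns/TYPE link points at INODE, or if an
# nsfs mount of that namespace appears in the caller's mountinfo.
ns_in_use() {
    type=$1 inode=$2
    for link in /proc/[0-9]*/ns/"$type"; do
        # unreadable links (other users' processes without root) are skipped
        if [ "$(readlink "$link" 2>/dev/null)" = "$type:[$inode]" ]; then
            echo "process: ${link%/ns/*}"
            return 0
        fi
    done
    # for nsfs bind mounts, the root field of mountinfo holds "type:[inode]"
    if grep -qF " $type:[$inode] " /proc/self/mountinfo; then
        echo "bind mount (nsfs)"
        return 0
    fi
    return 1
}

ns_in_use net "$(stat -L -c %i /proc/self/ns/net)"
```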
snap bind-mounts those files so that the corresponding namespace will be kept even when it has no running process. – Stephen Kitt Aug 30 '18 at 08:48
ip netns commands mount net namespaces to keep them up without a process. Example of use: to have a bridge staying in its own namespace without any iptables/ebtables/nftables interaction from other namespaces (it can still have its own ebtables rules, its own VLAN settings, etc.). This bridge is then linked to other namespaces with veth pairs. – A.B Sep 02 '18 at 22:46