What is the NSFS filesystem?

Question

The kernel contains a filesystem, nsfs. snapd creates a nsfs mount under /run/snapd/ns/<snapname>.mnt for each installed snap. ls shows it as a 0 byte file.

The kernel source code does not seem to contain any documentation or comments about it. The main implementation seems to be here and the header file here.

From that, it seems to be namespace related.

A search of the repo does not even find Kconfig entries to enable or disable it...

What is the purpose of this filesystem and what is it used for?

TheDiveO · Accepted Answer · 2020-01-06T16:19:33.980

As described in the kernel commit log linked to by jlliagre, nsfs is a virtual filesystem that makes Linux kernel namespaces available as files. It is separate from the "proc" filesystem mounted at /proc, although some entries in the per-process directories there (/proc/$PID/ns/...) reference inodes in the nsfs filesystem in order to show which namespaces a given process (or thread) is currently using.
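A minimal sketch of how those references can be inspected from a shell, assuming a Linux system with /proc mounted and GNU stat. The helper name ns_inode is hypothetical and used only for illustration. Each entry under /proc/self/ns is a symbolic link whose text has the form type:[inode], and the inode number in that text is the one stat reports for the link target.

```shell
# Hypothetical helper: print the inode number encoded in a namespace link,
# e.g. "net:[4026531992]" -> "4026531992".
# $1 = namespace type (net, mnt, pid, ...), $2 = PID (defaults to "self").
ns_inode() {
    local link
    link=$(readlink "/proc/${2:-self}/ns/$1") || return 1
    link=${link#*\[}             # strip everything up to and including "["
    printf '%s\n' "${link%\]}"   # strip the trailing "]"
}

# List every namespace of the current shell: link text next to the inode
# that stat reports when following the link.
for ns in /proc/self/ns/*; do
    printf '%-20s link=%s inode=%s\n' \
        "${ns##*/}" "$(readlink "$ns")" "$(stat -L -c %i "$ns")"
done
```

Two processes that print the same inode number for a given type share that namespace.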

nsfs is not listed in /proc/filesystems (while proc is), so it cannot be explicitly mounted: mount -t nsfs ./namespaces fails with "unknown filesystem type". This is because nsfs is tightly interwoven with the proc filesystem.
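The /proc/filesystems check can be sketched as follows. The function name fs_registered is hypothetical; the only assumption is that /proc is mounted. The filesystem name is the last field on each line of /proc/filesystems, optionally preceded by "nodev".

```shell
# Hypothetical helper: succeed if the given filesystem type appears in
# /proc/filesystems (lines look like "nodev<TAB>proc" or "<TAB>ext4").
fs_registered() {
    awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' /proc/filesystems
}

for fs in proc nsfs; do
    if fs_registered "$fs"; then
        echo "$fs: listed"
    else
        echo "$fs: not listed"
    fi
done
```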

The filesystem type nsfs only becomes visible via /proc/$PID/mountinfo when bind-mounting an existing(!) namespace filesystem link to another target. As Stephen Kitt rightly suggests in the comments, this is done to keep namespaces alive even if no process is using them anymore.
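As a rough illustration of where that information lives, the sketch below scans a mountinfo file for nsfs entries directly instead of using findmnt. The function name list_nsfs_mounts is hypothetical. It relies on the documented mountinfo layout: field 4 is the root within the filesystem (for nsfs, something like net:[...]), field 5 is the mount point, and the filesystem type follows the lone "-" separator.

```shell
# Hypothetical helper: print "<mount point> <namespace>" for each nsfs
# bind mount found in a mountinfo file (default: /proc/self/mountinfo).
list_nsfs_mounts() {
    awk '{
        # Optional fields start at field 7; a lone "-" ends them and is
        # followed by the filesystem type.
        for (i = 7; i <= NF; i++) {
            if ($i == "-") {
                if ($(i + 1) == "nsfs") print $5, $4
                break
            }
        }
    }' "${1:-/proc/self/mountinfo}"
}

list_nsfs_mounts
```

On a system with snaps installed, or with namespaces created by ip netns add, this would be expected to print one line per persisted namespace; on a system without any such bind mounts it prints nothing.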

For example, create a new network namespace, bind-mount it, then exit: the namespace still exists, but lsns won't find it, since it is no longer referenced by any /proc/$PID/ns entry and exists only as a (bind) mount point.

# bind mount only needs an inode, not necessarily a directory ;)
touch mynetns
# create new network namespace, show its id and then bind-mount it, so it
# is kept existing after the unshare'd bash has terminated.
# output: net:[##########]
NS=$(sudo unshare -n bash -c "readlink /proc/self/ns/net && mount --bind /proc/self/ns/net mynetns") && echo $NS
# notice how lsns cannot see this namespace anymore: no match!
lsns -t net | grep ${NS:5:-1} || echo "lsns: no match for net:[${NS:5:-1}]"
# however, findmnt does locate it on the nsfs...
findmnt -t nsfs | grep ${NS:5:-1} || echo "no match for net:[${NS:5:-1}]"
# output: /home/.../mynetns nsfs[net:[##########]] nsfs rw
# let the namespace go...
echo "unbinding + releasing network namespace"
sudo umount mynetns
findmnt -t nsfs | grep ${NS:5:-1} || echo "findmnt: no match for net:[${NS:5:-1}]"
# clean up
rm mynetns

Output should be similar to this one:

net:[4026532992]
lsns: no match for net:[4026532992]
/home/.../mynetns nsfs[net:[4026532992]] nsfs   rw
unbinding + releasing network namespace
findmnt: no match for net:[4026532992]

Please note that it is not possible to create namespaces via the nsfs filesystem; that can only be done via the syscalls clone() (CLONE_NEW...) and unshare(). nsfs only reflects the current kernel state w.r.t. namespaces; it cannot create or destroy them.

Namespaces automatically get destroyed as soon as no reference to them is left: no processes (so no /proc/$PID/ns/...) AND no bind mounts either, as explored in the example above.

jlliagre · Answer 2 · 2018-08-30T11:07:43.087

That's the "Name Space File System", used by the setns system call and, as its source code shows, by namespace-related ioctls (e.g. NS_GET_USERNS, NS_GET_OWNER_UID...).

NSFS pseudo-file entries were provided by the /proc file system until Linux 3.19. Here is the commit of this change.

See Stephen Kitt's comment for a possible explanation of these files' presence.

edited Aug 30 '18 at 11:07

answered Aug 30 '18 at 07:58


You’re welcome! Regarding the underlying question in the question, I suspect (but I’m not 100% sure) that the reason snap bind-mounts those files is so that the corresponding namespace will be kept even when it has no running process. – Stephen Kitt Aug 30 '18 at 08:48

Another example: the ip netns commands bind-mount network namespaces to keep them alive without any process. Example of use: having a bridge stay in its own namespace without any iptables/ebtables/nftables interaction from other namespaces (it can still have its own ebtables rules, its own VLAN settings, etc.). This bridge is then linked to other namespaces with veth pairs. – A.B Sep 02 '18 at 22:46

This series of articles on what kernel namespaces are also seems relevant: https://lwn.net/Articles/531114/#series_index – Gert van den Berg Sep 25 '18 at 07:01
just fyi: docker is using nsfs too on archlinux for a while now. – Jan-Stefan Janetzky Jul 26 '19 at 06:40
