per process private file system mount points

Question

I was checking the unshare command, and according to its man page,

   unshare - run program with some namespaces unshared from parent

I also see there is a type of namespace listed as,

 mount namespace
              mounting and unmounting filesystems will not affect rest of the system.

What exactly is the purpose of this mount namespace? I am trying to understand the concept with the help of an example.

See also kernel: Namespaces support and Michael Kerrisk's articles on namespaces on LWN. — Gilles 'SO- stop being evil', Sep 04 '14 at 21:59
@Gilles, thanks. I will check it out. In the mean time, please let me know if something else needs to be added in the answer. — Ramesh, Sep 04 '14 at 23:15

score 34 · Accepted Answer · edited May 13 '21 at 20:14

Running unshare -m gives the calling process a private copy of its mount namespace, and also unshares file system attributes so that it no longer shares its root directory, current directory, or umask attributes with any other process.

So what does the above paragraph say? Let us try to understand it using a simple example.

Terminal 1:

I run the commands below in the first terminal.

# Start a shell in a new mount namespace
unshare -m /bin/bash
# Create a temporary directory to use as a mount point
secret_dir=$(mktemp -d --tmpdir=/tmp)
# Mount a 1 MB tmpfs on that directory (-n: do not write to /etc/mtab)
mount -n -o size=1m -t tmpfs tmpfs "$secret_dir"
# Check which mounts exist under /tmp
grep /tmp /proc/mounts

The last command prints:

tmpfs /tmp/tmp.7KtrAsd9lx tmpfs rw,relatime,size=1024k 0 0

Next, I ran the following commands as well.

cd /tmp/tmp.7KtrAsd9lx
touch hello
touch helloagain
ls -lFa

The output of the ls command is,

ls -lFa
total 4
drwxrwxrwt   2 root root   80 Sep  3 22:23 ./
drwxrwxrwt. 16 root root 4096 Sep  3 22:22 ../
-rw-r--r--   1 root root    0 Sep  3 22:23 hello
-rw-r--r--   1 root root    0 Sep  3 22:23 helloagain

So what is the big deal in doing all this? Why should I do it?

Now I open another terminal (terminal 2) and run the commands below.

cd /tmp/tmp.7KtrAsd9lx
ls -lFa

The output is as below.

ls -lFa
total 8
drwx------   2 root root 4096 Sep  3 22:22 ./
drwxrwxrwt. 16 root root 4096 Sep  3 22:22 ../

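The files hello and helloagain are not visible, even though I logged in as root to check for them. So the advantage is that this feature lets us create a private temporary filesystem that processes in other mount namespaces, even root-owned ones, do not see at that path. Root can still reach it deliberately, for example by entering the namespace with nsenter or by looking through /proc/$pid/root.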
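
The two-terminal experiment can also be condensed into a single script. The following is a minimal sketch and not part of the original answer: the function name private_tmpfs_demo is made up for illustration, it assumes it runs as root (or with CAP_SYS_ADMIN), and it calls mount --make-rprivate / first so that the result should not depend on the util-linux version or on the propagation defaults of the distribution.

```shell
#!/bin/sh
# Hypothetical helper (the name is illustrative): mounts a tmpfs inside a new
# mount namespace and compares what is visible inside and outside of it.
private_tmpfs_demo() {
    demo_dir=$(mktemp -d) || return 1
    if unshare -m true 2>/dev/null; then
        # New namespace: stop mount events from propagating back, then
        # mount a tmpfs on the directory and create a file in it.
        unshare -m sh -c '
            mount --make-rprivate / &&
            mount -n -o size=1m -t tmpfs tmpfs "$1" &&
            touch "$1/hello" &&
            echo "inside:  $(ls "$1")"
        ' sh "$demo_dir"
        # Original namespace: the tmpfs was never mounted here.
        echo "outside: $(ls "$demo_dir")"
    else
        echo "skipped: unshare -m needs root (CAP_SYS_ADMIN)"
    fi
    rmdir "$demo_dir"
}

private_tmpfs_demo
```

If the namespace can be created, the "inside" line should list hello while the "outside" line should list nothing, which matches what the two terminals showed.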

From the man page of unshare,

mount namespace Mounting and unmounting filesystems will not affect the rest of the system (CLONE_NEWNS flag), except for filesystems which are explicitly marked as shared (with mount --make-shared; see /proc/self/mountinfo for the shared flags).

It's recommended to use mount --make-rprivate or mount --make-rslave after unshare --mount to make sure that mountpoints in the new namespace are really unshared from the parental namespace.
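
To check which mounts are still marked as shared, the optional fields of /proc/self/mountinfo can be read directly. The sketch below is only an illustration under stated assumptions: mount_propagation is a hypothetical helper name, and it relies on the mountinfo layout in which the optional fields (such as shared:N or master:N) start at column 7 and end at a lone "-" separator. A mount with no optional fields is reported as private.

```shell
#!/bin/sh
# Hypothetical helper: print "<mount point> <propagation>" for every line of
# a mountinfo-format file (default: this process's own mount table).
mount_propagation() {
    awk '{
        prop = ""
        # Optional fields start at column 7 and end at the "-" separator.
        for (i = 7; i <= NF && $i != "-"; i++)
            prop = prop (prop == "" ? "" : ",") $i
        if (prop == "") prop = "private"
        print $5, prop
    }' "${1:-/proc/self/mountinfo}"
}

# Show the first few mounts of the current namespace, if /proc is available.
if [ -r /proc/self/mountinfo ]; then
    mount_propagation | head -n 5
fi
```

Running it once before and once after mount --make-rprivate / inside the unshared shell is a quick way to confirm that the shared:N tags are gone.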

The namespace itself is maintained by the kernel's VFS layer. And, if we set it up right in the first place, we can create entire virtual environments in which we are the root user without having root permissions.
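
One way to read "root user without root permissions" is a user namespace combined with a mount namespace. That reading is an assumption, since the answer does not spell it out. The sketch below uses unshare's -r (--map-root-user) option together with -m; rootless_namespace_uid is a hypothetical name, and the call only works on systems where unprivileged user namespaces are enabled.

```shell
#!/bin/sh
# Hypothetical helper: report the UID seen inside a combined user + mount
# namespace created by an ordinary user.
rootless_namespace_uid() {
    # -r (--map-root-user) maps the caller to UID 0 inside a new user
    # namespace; -m adds a new mount namespace on top of it.
    unshare -r -m id -u 2>/dev/null ||
        echo "skipped: unprivileged user namespaces are not available here"
}

rootless_namespace_uid
```

Inside such a namespace the process appears as UID 0 and may perform mounts such as tmpfs that affect only its own mount namespace, while it keeps its ordinary privileges toward the rest of the system.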

References:

The example is framed using the details from this blog post. Also, the quotes of this answer are from this wonderful explanation from Mike. Another wonderful read regarding this can be found from the answer from here.

this feature makes it possible for us to create a private temporary filesystem that even other root-owned processes cannot see or browse through. And compared to chroot, with chroot files are visible to others. This is amazing, and that sentence probably should be like at the top of the answer. +1ed. — Sergiy Kolodyazhnyy, Sep 12 '18 at 04:06
Nothing escapes the root! Using nsenter you can enter the namespace and view the temporary files. Assuming only one unshare (the one owning the tempdir), then sudo nsenter -t $(pgrep -P $(ps aux | grep unshare | grep -v grep | awk '{print $2}')) -m -p will allow the contents to be viewed — earcam, Dec 02 '18 at 23:37
Actually, a very important point is completely missed, and it really matters. Suppose we have mount namespace A and mount point /01. Then we create mount namespace B from A. Now we have /01 in B as well. If we create a file in /01 in A, we will immediately see that file in /01 in B. — Alex, Sep 21 '20 at 18:29
Such filesystems are accessible in /proc/$pid/root. Mountpoints are also visible on /proc/$pid/mount{s,stats,info}. — quant2016, May 25 '23 at 20:10

Alex · Answer 2 · 2022-02-05T03:42:25.613

A very important point about mount namespaces is completely missed.

I am not going to give a big, detailed explanation, but I will give you some flavour.

Using two mount namespaces does NOT mean that we have two independent file systems. That assumption is completely wrong.

Example.

We have a mount point /01 in mount namespace A.

Next, we create mount namespace B from A.

Now we also have /01 in mount namespace B.

Next, we make /01 private in B:

(namespace B) # mount --make-private /01

Next, we create a file in A:

(namespace A) # touch /01/a.txt

We will see that file in /01 in B.

Next, we create b.txt in B:

(namespace B) # touch /01/b.txt

And we will see b.txt in /01 in A.

So there is no independence between the two mount namespaces here.

Plain files and directories are 100% transparent between two mount points when the mount point in one namespace is the source for the mount point in the other namespace. It does not matter which propagation options you assign to the mount points (shared, private, slave): they only control whether mount and unmount events propagate, so they will not help at all.

So if you think that you can create a new mount namespace, mark all mount points in it as private, and get an independent filesystem, that is completely wrong.

Real independence applies ONLY to NEW SUB-mountpoints.
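
The /01 example can be tried without a spare mount point by using any directory on an inherited filesystem. The following is a rough sketch under stated assumptions: shared_backend_demo is a hypothetical name, a temporary directory stands in for /01, and the script must run as root (or with CAP_SYS_ADMIN).

```shell
#!/bin/sh
# Hypothetical helper: write a file from inside a new mount namespace to a
# directory on an inherited filesystem, then look at it from the outside.
shared_backend_demo() {
    dir=$(mktemp -d) || return 1
    if unshare -m true 2>/dev/null; then
        # Namespace B: mark every mount private, then create a plain file.
        unshare -m sh -c 'mount --make-rprivate / && touch "$1/b.txt"' sh "$dir"
        # Namespace A: both namespaces mount the same filesystem, so the
        # file written in B is expected to show up here as well.
        echo "seen from A: $(ls "$dir")"
    else
        echo "skipped: unshare -m needs root (CAP_SYS_ADMIN)"
    fi
    rm -rf "$dir"
}

shared_backend_demo
```

On a system where the namespace can be created, the listing taken in namespace A should contain b.txt even though every mount in B was made private first.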

Also, creating a new sub-mount point in a new namespace does not in general mean that this sub-mount point is independent of the other mount namespace. The point is that every mount point has a backend (for instance, a real physical disk). So if you know the backend, you can mount it yourself and make changes.

(namespace A) # mount /dev/sdb1 /mnt
(namespace A) # mount --make-private /mnt
(namespace A) # unshare -m bash
(namespace B) #

Return to namespace A:

(namespace A) # mkdir /mnt/01
(namespace A) # mount /dev/sdc1 /mnt/01
(namespace A) # mount --make-private /mnt/01
(namespace A) # touch /mnt/01/a.txt

We will not see a.txt in namespace B:

(namespace B) # ls -1al /mnt/01

It will show nothing.

So all is fine at the moment.

But once we know that the backend for /mnt/01 is /dev/sdc1, we can mount this backend in namespace B and finally see a.txt:

(namespace B) # mkdir /mnt/02
(namespace B) # mount /dev/sdc1 /mnt/02
(namespace B) # ls -1al /mnt/02/a.txt

Victory.

Finally, as a conclusion: mount namespaces are tricky, and you must understand the details under the hood very well to build a truly independent file system, or to get whatever result you want from them.

VasyaNovikov · Answer 3 · 2018-12-02T21:58:20.230

score 2

If you have bubblewrap installed on your system, you can do it easily in one step:

bwrap --dev-bind / / --tmpfs /tmp bash

In the example above, the inner bash will have its own view of /tmp.
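
A quick way to see that the inner /tmp really is separate is to create a file in it and look for that file on the host. The sketch below is only an illustration: bwrap_tmp_check and the marker file name are hypothetical, and it uses nothing but the two bwrap options from the command above.

```shell
#!/bin/sh
# Hypothetical helper: create a marker file in the sandbox's /tmp and check
# whether it shows up in the host's /tmp.
bwrap_tmp_check() {
    if ! command -v bwrap >/dev/null 2>&1; then
        echo "skipped: bwrap is not installed"
        return 0
    fi
    marker="bwrap-demo-$$"
    # Same options as above, but run touch instead of an interactive bash.
    if ! bwrap --dev-bind / / --tmpfs /tmp touch "/tmp/$marker" 2>/dev/null; then
        echo "skipped: bwrap could not set up the sandbox"
        return 0
    fi
    if [ -e "/tmp/$marker" ]; then
        echo "leaked"
    else
        echo "private"
    fi
}

bwrap_tmp_check
```

The sandbox's tmpfs disappears together with the sandboxed process, so the marker file should never appear in the host's /tmp.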

Solution inspired by @Ramesh's answer. Thanks for it!

edited Dec 02 '18 at 21:58

answered Sep 24 '18 at 13:00
