
either if you fork into a new mount namespace, or enter an existing one.

It is possible to hold file descriptors from a foreign mount namespace. You can demonstrate this very easily, by finding a process in a foreign mount namespace such as [kdevtmpfs], and opening /proc/$PID/root. (If I change to this directory and run /bin/pwd, it seems to print the awesome error message /usr/bin/pwd: couldn't find directory entry in ‘..’ with matching i-node, and strace shows that getcwd() returned (unreachable)/).

Please define what happens to the existing references which a process holds to the current mount namespace - the current directory and current root (chroot) - when entering a new mount namespace.

If neither of these references was modified, there would not be much point in entering a new mount namespace. E.g. opening a file /path/to/file would open it from the old mount namespace, if the process's root still pointed into the old mount namespace.

Again, I would like to understand both the case of clone() with CLONE_NEWNS (like the unshare command), and the case of setns() (like the nsenter command).

sourcejedi

1 Answer


Both the current working directory, and the root, are reset to the root filesystem of the entered mount namespace.

For example, I have tested that I can escape chroot by running nsenter -m --target $$.

(Reminder: chroot is easy to escape when you are still root. man chroot documents the well-known way of doing this).
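The nsenter test above boils down to a single setns() call on the process's own mount namespace. Below is a minimal sketch of that call, not the nsenter implementation; the helper name reenter_own_mount_namespace is made up for illustration. It assumes /proc is mounted and that the caller is privileged (the kernel source quoted below checks CAP_SYS_ADMIN and CAP_SYS_CHROOT); without privilege the call is expected to fail, typically with EPERM.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

/*
 * Hypothetical helper: re-enter the mount namespace this process is
 * already in, roughly what `nsenter -m --target $$` asks the kernel to do.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int reenter_own_mount_namespace(void)
{
    /* Assumes /proc is mounted; the fd refers to our own mount namespace. */
    int fd = open("/proc/self/ns/mnt", O_RDONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    /* CLONE_NEWNS makes setns() insist that fd is a mount namespace. */
    int ret = setns(fd, CLONE_NEWNS);

    /* Preserve the errno from setns() across close(). */
    int saved_errno = errno;
    close(fd);
    errno = saved_errno;
    return ret;
}
```

If the call succeeds, getcwd() should afterwards report /, and a process that was previously chrooted should find its root back at the root of the namespace, which is the behaviour the nsenter test shows.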


Source

https://elixir.bootlin.com/linux/latest/source/fs/namespace.c?v=4.17#L3507

static int mntns_install(struct nsproxy *nsproxy, struct ns_common *ns)
{
    struct fs_struct *fs = current->fs;

Note: current means the current task - the current thread/process.

->fs is the filesystem data of that task, i.e. its root and working directory. It is shared between tasks that are threads within the same process. You will see below that changing the working directory is an operation on ->fs.

As a result, changing the working directory affects all threads of the same process. POSIX-compatible threads get this behaviour from the CLONE_FS flag of clone().
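The effect of CLONE_FS can be observed from user space without any privileges. The following is a rough sketch, assuming Linux with glibc's clone() wrapper; the helper names child_chdir_root and cwd_after_child_chdir are made up for illustration. A child task changes its working directory to /, and the parent then checks its own working directory. With CLONE_FS the two tasks share one ->fs, so the parent should see the change; without it (as with fork()) the child gets a copy and the parent should be unaffected.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STACK_SIZE (256 * 1024)

/* Child entry point: change the working directory, then exit. */
static int child_chdir_root(void *arg)
{
    (void)arg;
    return chdir("/") == 0 ? 0 : 1;
}

/*
 * Hypothetical helper: start a child with the given extra clone flags,
 * let it chdir("/"), wait for it, then store the PARENT's working
 * directory in buf. Returns 0 on success, -1 on any failure.
 */
static int cwd_after_child_chdir(int extra_flags, char *buf, size_t len)
{
    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downwards on common architectures, so pass its top.
     * SIGCHLD in the flags lets the parent use plain waitpid(). */
    pid_t pid = clone(child_chdir_root, stack + STACK_SIZE,
                      extra_flags | SIGCHLD, NULL);
    if (pid == -1) {
        free(stack);
        return -1;
    }

    int status;
    pid_t waited = waitpid(pid, &status, 0);
    free(stack);
    if (waited == -1 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    return getcwd(buf, len) != NULL ? 0 : -1;
}
```

Starting from some directory other than /, cwd_after_child_chdir(0, ...) should report that directory unchanged, while cwd_after_child_chdir(CLONE_FS, ...) should report /.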

    struct mnt_namespace *mnt_ns = to_mnt_ns(ns), *old_mnt_ns;
    struct path root;
    int err;

...

    /* Find the root */
    err = vfs_path_lookup(mnt_ns->root->mnt.mnt_root, &mnt_ns->root->mnt,
                "/", LOOKUP_DOWN, &root);

Here are the lines in question:

    /* Update the pwd and root */
    set_fs_pwd(fs, &root);
    set_fs_root(fs, &root);

...

}

...

const struct proc_ns_operations mntns_operations = {
    .name       = "mnt",
    .type       = CLONE_NEWNS,
    .get        = mntns_get,
    .put        = mntns_put,
    .install    = mntns_install,
    .owner      = mntns_owner,
};
sourcejedi
  • It seems that you're still able to access a file even when its filesystem is not mounted in the mount namespace of the process, e.g. by receiving an fd over a Unix domain socket, or by opening an fd before entering the new mount namespace – 炸鱼薯条德里克 May 20 '19 at 17:15
  • @炸鱼薯条德里克 absolutely, yes. So I was thinking, what does change when you switch namespaces? How could it change the process-local / and . to refer to the new namespace, if the same paths did not exist in the new namespace? The answer was that the kernel does not preserve the current locations of / and . at all: it resets both of them to point to the root directory in the new namespace. – sourcejedi May 20 '19 at 20:11