3

Say I have a process (pid 1200) with an isolated MOUNT namespace; the process called unshare() to isolate its namespace from the parent process. I then want to mount a device that is only accessible outside pid 1200's namespace inside pid 1200's namespace. Is this possible?

I want to mount a device or bind mount a directory on the host inside a running LXC container that has lxc.monitor.unshare = 1 without restarting the container.
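
For reference, whether a process really has its own mount namespace can be checked by comparing the namespace links under /proc. A minimal sketch, assuming a Linux /proc; same_mntns is a made-up helper name, and reading the link of another user's process generally needs root:

```shell
# Hypothetical helper: succeed if two pids share one mount namespace.
same_mntns() {
    # Each /proc/<pid>/ns/mnt link reads like "mnt:[<inode>]";
    # equal targets mean the same mount namespace.
    a=$(readlink "/proc/$1/ns/mnt") || return 2
    b=$(readlink "/proc/$2/ns/mnt") || return 2
    [ "$a" = "$b" ]
}
```

For the case above, something like `same_mntns $$ 1200 || echo "pid 1200 is isolated"` run from a host shell would report whether the two differ.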

user2059857

3 Answers

1

Not ideal, but you could always do a NFS mount or other network filesystem.


The part below does not work (at least not with a 4.2 kernel); I'm including it for reference so that others don't have to try it for themselves.

Though when you enter a mount namespace (nsenter -m or setns(CLONE_NEWNS)), your working directory is automatically changed to the root (/) of that namespace, it is still possible to open a directory on some file descriptor, enter the namespace and still have that directory open on that fd (and for instance do a fchdir() on it).

So you'd think this approach might work:

#define _GNU_SOURCE
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <sys/mount.h>
#include <sched.h>

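static void die(const char *msg) {perror(msg); exit(1);}
int main(int argc, char *argv[]) {
  int fd;
  if (argc != 3) {
    fprintf(stderr, "Usage: %s <source-in-current-namespace> <dest-in-namespace-on-stdin>\n", argv[0]);
    exit(1);
  }
  /* Open the source directory while still in the original namespace. */
  fd = open(argv[1], O_RDONLY|O_DIRECTORY);
  if (fd < 0) die("open");
  /* stdin (fd 0) must be the target's /proc/<pid>/ns/mnt. */
  if (setns(0, CLONE_NEWNS) < 0) die("setns");
  /* Go back to the source directory through the fd kept open. */
  if (fchdir(fd) < 0) die("fchdir");
  printf("cwd: %s\n", get_current_dir_name());
  /* Try to bind-mount it onto the destination inside the new namespace. */
  if (mount(".", argv[2], 0, MS_BIND, 0) < 0) die("mount");
  return 0;
}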

It does work up until the fchdir(), but the mount fails with EINVAL:

# ~/a.out /home /mnt < /proc/1200/ns/mnt
cwd: (unreachable)/home
mount: Invalid argument
  • seems like you could just mount something into the container's /proc/$pid/root if you grabbed a handle for its < /proc/$pid/ns/mnt, maybe. maybe you'd need to add --make-shared or something. i think nsenter can do it with -wd or something along those lines, but i really don't like setting up lxc. – mikeserv Jan 03 '16 at 20:05
0

I had some success with this - if not with an lxc container, I did manage to make it work for an otherwise private mount namespace. Because lxc is built on the underlying linux namespaces that I was also using I don't see any reason why it shouldn't work for you.

In the first place I set up the namespace like:

sudo unshare -m sh -c '
    mount -ttmpfs none /tmp
    echo x > /tmp/mytmp
    findmnt -o+PROPAGATION /tmp
    echo "$$"
    cd   /tmp
    exec "$0" -i'

TARGET SOURCE FSTYPE OPTIONS     PROPAGATION
/tmp   tmpfs  tmpfs  rw          private
/tmp   none   tmpfs  rw,relatime private
29384
$ 

...and I got an interactive shell. The next thing I did in a separate terminal session was...

sudo sh -c ' { cd /dev/fd/0 ; mkdir mnt
               ls -l;         cat mytmp
             } 3<$0/ns/mnt  <$0/cwd
' /proc/29384

drwxr-xr-x 2 root root 40 Jan  4 02:52 mnt
-rw-r--r-- 1 root root  2 Jan  4 02:38 mytmp
x

...which was very encouraging!
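
What makes that command work is that a directory held open on a file descriptor can be entered with cd /dev/fd/N. A minimal sketch of just that step, with no namespaces involved; read_via_fd is a made-up helper name, and it assumes /dev/fd is available as on a typical Linux system:

```shell
# Hypothetical helper: read a file from a directory reached only via fd 3.
# $1: directory to hold open, $2: file name inside it
read_via_fd() {
    (
        cd / &&          # start somewhere unrelated
        cd /dev/fd/3 &&  # enter the directory that fd 3 refers to
        cat "$2"
    ) 3< "$1"
}
```

Called as `read_via_fd /proc/29384/cwd mytmp` (as root), this should behave like the command above, with fd 3 standing in for stdin.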

But I couldn't get a mount in there - every time I tried to mount a parent ns directory over one in the child ns it failed - miserably. Some research suggests this is by design (in particular: see the caveats in man 7 user_namespaces regarding PROPAGATION flags). What did work, though, was (in a new namespace):

sudo unshare --propagation slave -m sh -c '
     mount -ttmpfs none /tmp; cd /tmp
     exec "$0" -i'

And then in the parent namespace session...

sudo mount --bind / /mnt
sudo mount --bind / /tmp
sudo mount --bind /tmp /mnt/tmp

Now, the above works in the first case (/mnt) but not in the second (/tmp). The child ns does not propagate fs changes upward, so the parent cannot alter mounts the child has made itself; because the child has its own mount on /tmp, anything the parent mounts there is irrelevant to it. However, where there is some common hierarchy and the child ns is configured to receive filesystem changes, it will see the changes the parent propagates downward.

In the child ns after running the above...

ls /tmp /mnt /mnt/tmp

/mnt:
bin   dev  etc   lib    mnt  proc  run   srv  tmp  var
boot  esp  home  lib64  opt  root  sbin  sys  usr

/mnt/tmp:
serverauth.FT3Z6IFyWW
systemd-private-...systemd-timesyncd.service-YUkVU6

/tmp:

And so I guess to answer the question - yes, I believe it is possible. But, I'm also fairly sure you'd need to arrange for it to be so ahead of time.
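
One way to check ahead of time whether a namespace is set up to receive mounts is to look at the optional fields in /proc/<pid>/mountinfo: shared:N marks a shared mount, master:N a slave, and neither means private. A rough sketch, assuming awk is available; propagation_of is a made-up helper name:

```shell
# Hypothetical helper: print "<mount point> <propagation>" for each mount.
# $1: a mountinfo file (defaults to the caller's own)
propagation_of() {
    awk '
    function add(p, s) { return (p == "" ? s : p "," s) }
    {
        prop = ""
        # Optional fields start at column 7 and end at the "-" separator.
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/)    prop = add(prop, "shared")
            if ($i ~ /^master:/)    prop = add(prop, "slave")
            if ($i == "unbindable") prop = add(prop, "unbindable")
        }
        if (prop == "") prop = "private"
        print $5, prop
    }' "${1:-/proc/self/mountinfo}"
}
```

Run as root against the child, e.g. `propagation_of /proc/29384/mountinfo`, it lists which of its mounts are slaves and so can receive what the parent propagates.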

mikeserv
0

This answer has some working examples that use unshare (not as root) and nsenter (as root inside the container) to create bind mounts, and invokes them in a synchronized way to do what you want. Real root access is not required.

Sadly, it requires util-linux 2.39.1, which probably means Ubuntu 23.04 or later.

Note that the unshare invocation has what were previously mutually exclusive options.

I'm still trying to get it to work for earlier Ubuntu releases, e.g. 22.04 (the most recent LTS). I'll stop trying once 24.04 is out.