I have the following mount options defined on my system. In the event of a storage outage, would these options (bg,hard,nointr) cause console access to lock up?

        storage:/vol/myvol on /test type nfs (rw,bg,hard,nointr,rsize=65536,wsize=65536,tcp,nfsvers=3,timeo=600)

What combination of NFS options is considered good practice?

Raza

1 Answer

All NFS mount options have good points and bad points.

  • bg means that, when you try to mount the filesystem (usually during system boot), if the server doesn't respond in time, the mount forks off a process which runs in the background and periodically retries the mount.

    If you don't use the bg option, mount will retry and will not exit (nor proceed to mount other filesystems if you used mount -a) until the mount either succeeds or fails.

    If you need to mount a filesystem from a server that is often down, and you don't want your system boot to be delayed by it, use the bg option (or an automounter).

    The downside of the bg option is that the system may boot without having mounted the remote filesystem, which may cause applications that wish to use that filesystem to fail (or worse, to fill up the local disk with stuff that was meant to be written to the remote filesystem).

    So using bg is a choice you get to make.

  • hard and soft apply after the filesystem is mounted.

    If the remote server crashes or is otherwise inaccessible, a hard mount will keep on retrying the i/o request, indefinitely.

    A soft mount will return an error to the application, and usually the application will treat this as a nonrecoverable error, as if a local disk drive had been powered down. If the application executable itself was on the remote filesystem that was soft mounted and became inaccessible, then when the local kernel needs to get a page from the remote filesystem, the application will be killed.

    So the choice is up to you: when a remote server (or your network) goes down, do you want programs to fail, or do you want them to retry the i/o indefinitely until the remote is reachable again?

  • with a hard mount, if the remote server goes down, any programs using the remote filesystem will not be interruptible by signals, in the same way that programs using a local disk aren't interruptible during the (usually tiny) amount of time it takes to do disk i/o. This can frustrate users, because their programs will hang and they can't kill them with control-C. If you want to be able to interrupt programs waiting for NFS i/o, include the intr mount option. It is usually safe to use the intr option; just keep in mind that it may cause programs to see i/o errors (the EINTR error) when interrupted. (An example fstab line combining these options follows this list.)
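
As a minimal sketch of how these options combine (assuming Linux /etc/fstab syntax; the server name and export are taken from the question, and the mount point under /n follows the layout I recommend below):

    # Retry i/o forever, but let signals interrupt processes stuck on the mount:
    storage:/vol/myvol  /n/storage/myvol  nfs  bg,hard,intr,tcp,nfsvers=3,rsize=65536,wsize=65536,timeo=600  0 0

    # Alternative: return an i/o error to the application once the
    # retransmissions are exhausted (timeo is in tenths of a second):
    storage:/vol/myvol  /n/storage/myvol  nfs  bg,soft,tcp,nfsvers=3,timeo=600,retrans=2  0 0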

One recommendation I have: when using hard-mounted NFS filesystems with remote servers that may go down, do not mount the filesystem on a directory in / (such as /test), or indeed on any directory at the same level as a directory that many people use, such as /home/username. This is because pwd or its C library equivalent walks up through the directory tree, doing stats on directories. If an application does a stat on a hard-mounted NFS mount point that is not responding, it will hang. This is the leading cause of user complaints about NFS: they can't log in because their shell does a pwd and some NFS filesystem that they don't even need to use is down. This is another good reason to use an automounter for home directories.
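
To make that concrete, here is a small illustration (assuming the mount from the question, with the server unreachable):

    # With storage:/vol/myvol hard-mounted on /test and the server down:
    ls -l /       # stat()s every entry in /, hangs when it reaches /test
    stat /test    # the same underlying stat() call; hangs until the server returns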

This is our best practice for NFS mounts:

  • use the automounter (a configuration sketch follows this list)
  • if you can't, then mount each remote filesystem on /n/remoteservername/filesystemname with the options hard,intr.
  • /n and /n/remoteservername are local directories that are never NFS mount points.
  • configure updatedb or anything else that looks through the entire directory tree to not look in /n.
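
For example, a minimal autofs sketch of this layout; the map file name, server name, and export path are placeholders, and the multi-mount map syntax may differ between autofs versions:

    # /etc/auto.master: let the automounter own /n
    /n  /etc/auto.n  --timeout=300

    # /etc/auto.n: one key per server; the offset yields /n/fileserver1/myvol
    fileserver1  /myvol  -fstype=nfs,hard,intr  fileserver1:/vol/myvol

    # /etc/updatedb.conf: keep updatedb from descending into /n
    PRUNEPATHS="/n /tmp /var/spool"
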
Mark Plotnick
  • Thank you. This is good info. Can you go into more detail on the stat the system does for the volume mounted on /test with the NFS options hard,nointr? What happens when a user runs ls on / and the storage behind /test is down, and what other processes could trigger that stat? – Raza Jul 29 '14 at 22:29
  • ls will use stat() if you give it the -l, -t, -u, -F, or -C options. So when you type ls -C /, ls will do a stat on every file and directory in the / directory, before printing out anything. When it does stat("/test"), if that NFS server is down, ls will hang until the server comes back up. And because it's mounted with nointr, Control-C cannot interrupt it. You need to go to another window and use kill -9 to kill it. – Mark Plotnick Jul 29 '14 at 22:56
  • Ok, thanks. Anything else like stat that you can think of that could cause NFS to take control? – Raza Jul 30 '14 at 00:44
  • The other NFS operations are used when you actually try to do something with an NFS-mounted file or directory - read, write, chmod, chown, list, create, remove, rename. The stat system call (used by ls, pwd, find, etc.) is the most common way that users inadvertently trigger an NFS operation. – Mark Plotnick Jul 30 '14 at 15:02
  • @MarkPlotnick this was a really great answer. Could you please elaborate on the 'use the automounter' bullet point? Which automounter is it? – sandre89 Sep 02 '20 at 03:13
  • @sandre89 I'll add some more to my answer. Using any automounter helps, whether it's autofs or amd or the automount functionality of systemd. When you use them, the share is mounted only when needed, rather than all the time, which reduces the chances that a crashed server will affect the client. And automounters generally make it easy to configure things such that mount points from distinct servers are in distinct directories, which makes things like pwd fail less often. – Mark Plotnick Sep 02 '20 at 19:20
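
A minimal sketch of the systemd automount variant mentioned in the last comment, using the x-systemd.automount fstab option; the server and paths are placeholders:

    # Hypothetical /etc/fstab entry: mounted on first access rather than at
    # boot, and unmounted again after 60 seconds of inactivity, which limits
    # the client's exposure to a crashed server.
    fileserver1:/vol/myvol  /n/fileserver1/myvol  nfs  hard,intr,noauto,x-systemd.automount,x-systemd.idle-timeout=60  0 0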