reinit NFS client without restart

Question

I have been working on my server, from which I export one directory using NFS. Over a week or so of server reboots, I forgot several times to unmount the exported filesystem on my workstation (where it is mounted from /etc/fstab at boot). In between, I was able to unmount after the fact and remount (I am not using autofs):

umount -fl /data0
mount /data0

But this no longer works.

I cannot mount the exported directory from the server on a different directory (mount hangs), but I can nfs mount that exported dir on a virtual machine running on my workstation.

What I tried is removing (rmmod) the nfs and nfsv3 modules, which did not work: Resource temporarily unavailable. lsof hangs. mount doesn't show anything mounted via nfs. This is all probably a result of using 'umount -l' multiple times, but the first two times this worked without a problem.

In the meantime I have restarted the server, after not being able to mount, but that made no difference. I also ran service nfs-kernel-server restart. I suspect everything would be back to normal if I restarted the client workstation.

Is there a way to recover from this and reinitialise the nfs client side on my workstation without a reboot?
If I cannot fix this without reboot, would this not reoccur if I start using autofs?

lsof -b hangs, with these as its last lines:

lsof: avoiding readlink(/run/user/1001/gvfs): -b was specified.
lsof: avoiding stat(/run/user/1001/gvfs): -b was specified.
lsof: WARNING: can't stat() fuse.gvfsd-fuse file system /run/user/1001/gvfs
      Output information may be incomplete.

In the lines preceding that, there is no /data0.

The entry in /etc/fstab:

192.168.0.2:/data0 /data0  nfs  defaults,auto,nolock,user 0 2

"but the first two times this worked without a problem" ... reminds me of Russian roulette. Does lsof -b hang? — muru, Jan 02 '15 at 12:26
@muru Yes it hangs, I updated the Q with the output. BTW, I never heard anyone complain about losing with Russian roulette, so it must be win-win game. I usually expect things to work never, once, or always, not some count X times, but maybe the circumstances were different. — Anthon, Jan 02 '15 at 13:26
Not sure how it works in Ubuntu with upstart and all. You probably want to restart all the services in the nfs-common package, looks like there are a few. Order likely matters as well, so try stopping then starting in order of dependency. You probably also want to do rpcbind as your last stop/first start. I have done this before on Debian, but it just has one nice nfs-common service. — Graeme, Jan 02 '15 at 15:03
@Anthon show us your mount command from your nfs client. A possible tweak in your mount statement could lessen this behavior in the future. — PaperMonkey, Jan 02 '15 at 15:44
@Graeme there is an nfs-common package on Mint, but there are no /etc/init.d entries from that to restart. rpcbind restart doesn't help. — Anthon, Jan 02 '15 at 16:08
@PaperMonkey The command was already in the Q (mount /data0), I have added the full /etc/fstab entry. — Anthon, Jan 02 '15 at 16:09
@Anthon, upstart configurations go in /etc/init, /etc/init.d is only for SystemV scripts - https://help.ubuntu.com/community/UbuntuBootupHowto — Graeme, Jan 02 '15 at 16:13
@Anthon add intr to your mount options, see the nfs man page man nfs for a description of intr. This should help prevent this in the future. You are most likely going to need to reboot to clear this condition, nfs sub system is just locked at this point. — PaperMonkey, Jan 02 '15 at 16:50
@PaperMonkey intr is deprecated and ignored on my 3.13 kernel — Anthon, Jan 02 '15 at 16:57
I suggest using nfs4 and autofs. It is simpler than nfs3 because it uses only two TCP ports and needs fewer daemons. — elbarna, May 01 '15 at 02:38

score 7 · Answer 1 · answered May 01 '15 at 22:23

As @PaperMonkey suggested in comments, you may be screwed because you used the default mount options, which include retrying forever.

intr used to be a way to make it easier to interrupt things that were stuck on I/O to a broken NFS mount, but now it's a no-op. SIGKILL can still interrupt processes stuck on NFS, at least so nfs(5) claims. See that man page for mount options.

Use soft instead of the default hard if you want NFS not to retry forever.
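Before changing options, it may help to confirm what each NFS mount currently uses. The following is a minimal sketch, assuming a Linux client where /proc/mounts lists the effective mount options; the function name nfs_retry_mode is made up for illustration. It only reads the kernel's mount table and does not touch the server, so it should not hang the way lsof does.

```shell
#!/bin/sh
# nfs_retry_mode (hypothetical helper): print "<mountpoint> <mode>" for
# every NFS mount listed in a mounts table.
# Reads /proc/mounts unless another file is given as $1.
nfs_retry_mode() {
    awk '$3 == "nfs" || $3 == "nfs4" {
        mode = "hard"                     # nfs(5): hard is the default
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "soft" || opts[i] == "softerr")
                mode = opts[i]
        print $2, mode
    }' "${1:-/proc/mounts}"
}

# Example call against the live mount table (Linux only).
if [ -r /proc/mounts ]; then
    nfs_retry_mode
fi
```

To switch a mount over, the options field of its /etc/fstab entry could carry something like soft,timeo=100,retrans=3 (timeo is in tenths of a second; the values here are only examples, not recommendations).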

I also recommend using the automounter. Make symlinks to /net/host/foo/bar somewhere, if you want.

Often it's easier to just reboot, but I think in theory you should be able to kill -9 (i.e. kill -KILL) any processes that are stuck on NFS. THEN umount -f might work. Just be careful not to let tab-completion get more processes stuck on the NFS mount.

Peter Cordes

In theory, but it's hard to find those processes when lsof hangs. – kmarsh Aug 12 '16 at 13:55
@kmarsh: any process in state D (Disk-sleep) in ps / top is probably stuck on NFS. – Peter Cordes Aug 12 '16 at 14:50
Please note that when using "soft" instead of "hard" there is a possibility of data loss each time the NFS server is temporarily unavailable. – Marki555 Dec 12 '19 at 13:17

score 5 · Answer 2 · answered Mar 01 '16 at 11:38

Below is a list of commands to run to fix this issue on an RPM-based distro.

service rpcbind stop
service nfslock stop
rm -rf /var/lib/nfs/statd/sm/*
rm -rf /var/lib/nfs/statd/sm.bak/*

After that:

umount -f /share

nmishin

Eric Sokolowsky · Answer 3 · 2021-06-14T16:13:44.947

I know that I'm really late to this party, but since I had the same question and parts of some of the answers and comments above were useful, I wanted to summarize. In my situation, I was using autofs, one of my NFS mounts was hung, umount -f wasn't working, and lsof was also hanging. My NFS mount uses the hard option among others.

You can use this command to show the status, pid, and command for all processes that are waiting for I/O:

ps -e -o s,pid,cmd | grep ^D

In this case, the ^D is a caret (Shift+6) followed by an upper-case D, not a Control-D sequence. Then you can inspect these processes and kill the ones that are likely to be related to the hung filesystem. After all of the relevant processes are killed, you can unmount the file system with:

umount -f fs

Where fs is the file system that is hung, after which it can be remounted. This procedure may help even in the case of not using autofs. I tested all this on Fedora 30.
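A variation on the ps command above adds the kernel wait channel, which can make it easier to guess which D-state processes are stuck on NFS. This is only a sketch, assuming the procps version of ps found on most Linux distributions; list_dstate is a made-up helper name, and the wait-channel names vary between kernel versions (on many kernels, names containing nfs or rpc point at NFS, but treat that as a hint rather than proof).

```shell
#!/bin/sh
# list_dstate (hypothetical helper): keep only rows whose first column is
# exactly "D" (uninterruptible sleep). The ps header row starts with "S",
# so it is dropped as well.
list_dstate() {
    awk '$1 == "D"'
}

# State, PID, kernel wait channel (padded to 32 columns), full command line.
ps -e -o s,pid,wchan:32,cmd | list_dstate
```

Each PID that looks related to the hung mount can then be passed to kill -9 before retrying umount -f.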

score 0 · Answer 4 · answered Jun 27 '15 at 00:54

Using autofs will help avoid this issue in the future. The biggest benefit of autofs is that it does not try to mount the directory until you try to use it. This means you avoid broken mount points, and it will not try to mount indefinitely. You can also set a timeout period for unmounting (which is normally short). I'm not sure whether automount retries at all during this pre-timeout period, but either way I normally set the automount timeout to only a few seconds.
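As a rough illustration, an autofs master map entry along these lines would mount exports on demand and unmount them after an idle period. The file location and the timeout value are assumptions; check the autofs documentation for your distribution.

```
# /etc/auto.master (path assumed; some distributions use /etc/auto.master.d/)
# Mount any host's exports under /net/<host>/ on first access and
# unmount them after 60 idle seconds (the value is only an example).
/net    -hosts    --timeout=60
```

With an entry like this, the export from the question would be reachable as /net/192.168.0.2/data0 the first time something accesses that path.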

To resolve the issue without restarting, you may be able to get by with umount -a (unmount everything that is currently mounted) followed by mount -a (mount everything in /etc/fstab), but unless the directory you've lost contains your home directory, you're best off saving your work elsewhere and just rebooting.

score 0 · Answer 5 · answered Nov 07 '15 at 17:01

Use the results of the lsof command to find the processes on the client holding references to the stale file system and kill those processes.

umount -f /data0

Make sure you can ping the server, then remount the file system and restart any processes you need.

Clusters

Note that if you run a clustered server setup, you will get a stale NFS file handle every time the server fails over. To avoid that, export your file systems using the fsid option. The fsid number should be the same for each respective file system on the two servers. It is up to you to ensure that the files are replicated. See the snippet from the man page below:

fsid=num|root|uuid NFS needs to be able to identify each filesystem that it exports. Normally it will use a UUID for the filesystem (if the filesystem has such a thing) or the device number of the device holding the filesystem (if the filesystem is stored on the device). As not all filesystems are stored on devices, and not all filesystems have UUIDs, it is sometimes necessary to explicitly tell NFS how to identify a filesystem. This is done with the fsid= option.

For NFSv4, there is a distinguished filesystem which is the root of all exported filesystem. This is specified with fsid=root or fsid=0 both of which mean exactly the same thing.

Other filesystems can be identified with a small integer, or a UUID which should contain 32 hex digits and arbitrary punctuation.

Linux kernels version 2.6.20 and earlier do not understand the UUID setting so a small integer must be used if an fsid option needs to be set for such kernels. Setting both a small number and a UUID is supported so the same configuration can be made to work on old and new kernels alike.
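For the cluster case described above, a matching export line on both nodes might look like the following. The path, client subnet, and fsid value are assumptions chosen for illustration; the only point is that the same file system gets the same fsid on each server.

```
# /etc/exports on BOTH cluster nodes (path, subnet, and fsid value are assumptions)
/data0    192.168.0.0/24(rw,sync,fsid=1)
```

After editing the file, exportfs -ra re-reads it on each server.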
