81

I have several directories mounted through sshfs. I sometimes get disconnects from the server (not configurable by me). I usually mount the directories like this:

sshfs user@server.example.com:/home/user /mnt/example

When the server disconnects, sshfs doesn't unmount or free the directory; instead, the mount point becomes inaccessible while still showing up in the output of mount. When I type

ls /mnt/example

the process hangs (Ctrl+C doesn't help either). I therefore do

sudo umount -l /mnt/example
# find pid of corresponding process:
ps aux | grep example.com
kill -9 <pid of locked sshfs process>
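
For reference, these cleanup steps can be bundled into a small helper. This is only a sketch; it assumes the mount point string is distinctive enough for pkill -f to match only the intended sshfs process.

#!/bin/sh
# cleanup-sshfs.sh <mountpoint>: lazily unmount a hung sshfs mount,
# then kill the sshfs process that still holds it.
mnt="$1"
sudo umount -l "$mnt"       # lazy unmount: detach from the tree immediately
pkill -9 -f "sshfs.*$mnt"   # SIGKILL the sshfs process matching the mount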

Is there a better way to deal with this? Obviously sshfs should unmount and clean up by itself... Ideally it would also reconnect automatically.

Sebastian

8 Answers

58

You can run sshfs with the reconnect option. We use sshfs with PAM/automount to share server files with each workstation in our network. We pass -o reconnect to sshfs, mostly because our users suspend their computers, and on wake sshfs would not reconnect (or respond, or do anything).

For example:

sshfs mvaldez@192.168.128.1:/home/mvaldez/REMOTE /home/mvaldez/RemoteDocs -o reconnect,idmap=user,password_stdin,dev,suid

Just a note: if the remote computer is really down, sshfs may become unresponsive for a long time.

MV.
  • The option password_stdin has to be left out, otherwise the command does not work. Could you explain if/why the other options are necessary? Using only the reconnect option didn't work for me: sshfs crashed again after ~24h – mcExchange Mar 27 '20 at 09:36
  • @mcExchange The password_stdin option is only needed if you use pam_mount. The idmap option translates the UID between the local and remote user (there must be a remote user named like your local user; otherwise this option should not be used). The dev/nodev and suid/nosuid options are generic mount options to allow devices and SUID permissions. – MV. Mar 27 '20 at 15:52
  • @mcExchange As for sshfs crashing, test it by running it in the foreground (not from PAM, but directly from the command line with the -f option) with the sshfs_debug option enabled, and wait to see if there is something in the debug messages (a sketch of such a run follows these comments). That may help you find the problem. You can also try a different version of sshfs, run it with a debugger like gdb, or post your questions on the libfuse/sshfs GitHub repository. – MV. Mar 27 '20 at 16:08
  • Even though the connection is re-established, in my testing any application currently writing to a file will likely hang or possibly see an I/O error, which may cause it to think the connection is dead and not clean up properly. This is a big problem when, for instance, trying to write video files. – Michael Apr 16 '20 at 04:54
  • Nice in theory, but NOPE, totally does not work. The same level of unresponsiveness prevails – IceFire Feb 04 '23 at 11:12
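
Following MV.'s debugging suggestion, a foreground debug run might look like this (a sketch: the host and paths are reused from the example above; -f keeps sshfs in the foreground and -o sshfs_debug prints debugging information):

sshfs -f -o sshfs_debug,reconnect mvaldez@192.168.128.1:/home/mvaldez/REMOTE /home/mvaldez/RemoteDocs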
15

This can be worked around by decreasing the timeout. Add the following to $HOME/.ssh/config or /etc/ssh/ssh_config:

ServerAliveInterval 15
ServerAliveCountMax 3

This results in a 45-second timeout (3 missed keepalives × 15 seconds).
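
Since sshfs forwards unrecognized -o options to the underlying ssh, the same settings can also be applied per mount instead of globally (a sketch, reusing the question's host and paths):

sshfs -o ServerAliveInterval=15,ServerAliveCountMax=3 user@server.example.com:/home/user /mnt/example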

Thor
  • This would only help if the problem is SSH's fault. There's a larger issue: sshfs doesn't deal gracefully with the death of the underlying ssh process. – bahamat Jun 18 '12 at 20:41
  • Indeed this is only a workaround and should be fixed inside sshfs. – Thor Jun 19 '12 at 10:23
  • 1
    But only a workaround that deals with one cause out of many. His problem may have nothing to do with keepalives. The nature of the question is less about the cause and more about cleaning up to a consistent state. – bahamat Jun 19 '12 at 17:14
11

I have a server that I use for storage and, for lack of space where I live, I keep it in another location. To bring its files into my network I use a Raspberry Pi that mounts them from the server using sshfs.

Recently I had to upgrade to Raspbian Jessie after a power failure and realised that sshfs had become seriously unstable. The folders would be mounted properly, but after some time I could no longer access them, and the Raspberry Pi would freeze if I tried to list the contents of the mounts.

What I tried was:

  1. using reconnect in the fstab
  2. setting ServerAliveInterval and ServerAliveCountMax in the .ssh/config file, to no avail
  3. other solutions I read on various forums.

but no dice! Until I modified the fstab file as follows:

sshfs#user@server:/remote/folder /local/mount/dir fuse IdentityFile=sshkeyfile,Port=XXX,uid=1000,gid=1000,allow_other,_netdev,ServerAliveInterval=45,ServerAliveCountMax=2,reconnect,noatime,auto 0 0

And it works! No more disconnects! It looks like sshfs does not read the ssh config file for some reason, so the keep-alive signals were never sent.
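
For what it's worth, newer systems also accept the fuse.sshfs filesystem type in fstab, so an equivalent entry can be written without the sshfs# prefix (a sketch; the options are copied from the entry above):

user@server:/remote/folder /local/mount/dir fuse.sshfs IdentityFile=sshkeyfile,Port=XXX,uid=1000,gid=1000,allow_other,_netdev,ServerAliveInterval=45,ServerAliveCountMax=2,reconnect,noatime,auto 0 0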

lucian
4

This sounds like a job for autofs. It's rather adept at handling network mounts of various kinds (NFS, Samba, sshfs, you name it) and noticing when they need re-mounting. It can also take care of unmounting them after periods of disuse and mounting them again when a filesystem request is made.
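
As an illustration, an on-demand sshfs mount under autofs might be configured like this (a sketch only: the map path, mount point, user, and host are hypothetical, and the escaped sshfs# syntax follows the article linked in the comments below):

# /etc/auto.master: mounts live under /mnt/auto, unmounted after 60s idle
/mnt/auto /etc/auto.sshfs --timeout=60 --ghost

# /etc/auto.sshfs: one line per mount
example -fstype=fuse,rw,allow_other,reconnect :sshfs\#user@server.example.com\:/home/user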

Caleb
    autofs will do the connecting on demand and can unmount when idle (which reduces the problematic time window), but it won't help if sshfs hangs because the server has disconnected. – Gilles 'SO- stop being evil' Jun 01 '11 at 11:17
  • Disarming. @Gilles'SO-stopbeingevil' is right, of course. However, the following tries to avoid the issue in a different way: mount for a short time while needed, then disconnect before trouble strikes. Good idea at least: https://www.tjansson.dk/2008/01/autofs-and-sshfs-the-perfect-couple/ – opinion_no9 Mar 31 '24 at 00:11
4

Well, I found peace in the other answers, but I want to sum up what's working for me in a simple way:

sshfs -o reconnect,ServerAliveInterval=15 remote.srv:/somedir /local/mymount

I'm not specifying ServerAliveCountMax: it defaults to 3.


So, the above command (with the reconnect option):

  • will check whether the connection to remote.srv is alive every 15 seconds.
  • If 3 consecutive checks fail, it will try to reconnect the sshfs mount (ssh-agent recommended).
  • If reconnecting fails, processes trying to access remote data through sshfs will receive an I/O error instead of hanging forever.
  • If the connection with remote.srv becomes available again, the reconnection should happen automatically.

Without the reconnect option:

  • sshfs will still send an error to processes waiting on I/O, but will then unmount and exit properly.
Totor
  • No, sorry, this is not in line with the man page: 15 means 45 sec. See the man page please: "For a more automatic solution, one can use the -o ServerAliveInterval=15 option mentioned above, which will drop the connection after not receiving a response for 3 * 15 = 45 seconds from the remote host. By also supplying -o reconnect, one can ensure that the connection is re-established as soon as possible afterwards. As before, this will naturally lead to loss of data that was in the process of being read or written at the time when the connection was interrupted." – opinion_no9 Mar 30 '24 at 23:51
3

For anyone still encountering this problem: I could not fix it, but I did find a working workaround.

The following Ruby script did the trick. It creates a folder called "keepalive" over and over. Just keep it running indefinitely.

# Touch the mount point every 5 seconds so the connection stays active.
loop do
  puts 'creating keepalive directory'
  # -p makes mkdir idempotent, so it does not error out once the folder exists
  system 'mkdir -p /{yourmountpoint}/keepalive'
  sleep 5
  puts 'done, the mount should still be alive'
end

I do not know why this works, but it seems to solve my problem where everything freezes after I am inactive for a minute. It simply keeps creating a folder at the mount point, and that somehow keeps the mount from disconnecting and freezing everything.

Vudew
  • Well, if that works for you, then you don't need a script and a Ruby interpreter. A single line would do just as well: while true; do mkdir -p /x/y; sleep 2; done – mivk Nov 17 '15 at 00:10
0

All of my sshfs mounts are at ~/mnt. I therefore put this in my crontab:

*/10 * * * * ls ~/mnt/* > /dev/null 2>&1

This runs ls on my mount directories every ten minutes, with the output redirected to /dev/null. This has worked well for me for years.

jbrock
0

While unplanned disconnects are an issue with SSHFS (and likewise with plain ssh connections), the man page addresses this directly (quoting):

SSHFS hangs after the connection was interrupted

By default, network operations in SSHFS run without timeouts, mirroring the default behavior of SSH itself. As a consequence, if the connection to the remote host is interrupted (e.g. because a network cable was removed), operations on files or directories under the mountpoint will block until the connection is either restored or closed altogether (e.g. manually). Applications that try to access such files or directories will generally appear to "freeze" when this happens.

If it is acceptable to discard data being read or written, a quick workaround is to kill the responsible sshfs process, which will make any blocking operations on the mounted filesystem error out and thereby "unfreeze" the relevant applications. Note that force unmounting with fusermount -zu, on the other hand, does not help in this case and will leave read/write operations in the blocking state.

For a more automatic solution, one can use the -o ServerAliveInterval=15 option mentioned above, which will drop the connection after not receiving a response for 3 * 15 = 45 seconds from the remote host. By also supplying -o reconnect, one can ensure that the connection is re-established as soon as possible afterwards. As before, this will naturally lead to loss of data that was in the process of being read or written at the time when the connection was interrupted.
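
Put together, the man page's recommendation applied to the question's mount would look something like this (a sketch; host and paths are the placeholders from the question):

sshfs -o ServerAliveInterval=15,reconnect user@server.example.com:/home/user /mnt/example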