
I have to run some tests on a server at the University. I have ssh access to the server from the desktop in my office. I want to launch a python script on the server that will run several tests during the weekend.

The desktop in the office will go on standby during the weekend and as such it is essential that the process continues to run on the server even when the SSH session gets terminated.

I know about nohup, screen, and tmux, which are described in several similar questions.

What I am doing right now is:

  • ssh username@server
  • tmux
  • python3 run_my_tests.py -> this script does a bunch of subprocess.check_output calls of another script, which itself launches some Java processes (see the sketch after this list).
  • Tests run fine.
  • I detach from the session with Ctrl+B, D.
  • When doing tmux attach I get the tmux session back, still running fine, no errors whatsoever. I kept checking this for minutes and the tests ran fine.
  • I close the SSH session.
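
For context, the chain of processes looks roughly like this. The names run_one_test.sh and my_tests.jar are hypothetical stand-ins, since the question does not name the actual files:

    #!/bin/bash
    # run_one_test.sh -- hypothetical stand-in for the intermediate script
    # that run_my_tests.py invokes via subprocess.check_output.
    # It launches the actual Java test process for one test case.
    java -jar my_tests.jar "$1"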

After this, if I log in to the server via SSH, I am able to reattach to the running tmux session; however, what I see is something like:

Traceback (most recent call last):
  File "run_my_examples.py", line 70, in <module>
  File "run_my_examples.py", line 62, in run_cmd_aggr
  File "run_my_examples.py", line 41, in run_cmd
  File "/usr/lib64/python3.4/subprocess.py", line 537, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/lib64/python3.4/subprocess.py", line 858, in __init__
    restore_signals, start_new_session)
  File "/usr/lib64/python3.4/subprocess.py", line 1456, in _execute_child
    raise child_exception_type(errno_num, err_msg)
PermissionError: [Errno 13] Permission denied

I.e., right after the SSH session ended, the process that was spawning my tests became completely unable to spawn new subprocesses. I have chmoded all the files involved and nothing changes.

I believe the servers use Kerberos for login/permissions; the server runs Scientific Linux 7.2.

Could it be that the permission to spawn new processes is revoked when I log off from the SSH session? Is there something I can do about it? I have to launch several tests, with no idea how much time or space they will take...


  • The version of systemd is 219
  • The file system is AFS, using fs listacl <name> I can confirm that I do have permissions over the directories/files that are used by the script.
  • Are you using systemd 230 or more recent? Possibly related to https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=825394 – user4556274 Aug 05 '16 at 14:07
  • @user4556274 Trying systemd-run --version tells me the server is using systemd 219. – Bakuriu Aug 05 '16 at 14:11
  • @user4556274 In any case I can confirm that the problem is not that the processes are killed; they do keep running, but they behave differently. As I said, I have a script that contains a simple while True: loop; that script runs fine, is not killed, and I'm able to resume it without problems. I have also tried adding some subprocess and multiprocessing calls and they work fine. I'm at a loss as to what is happening. – Bakuriu Aug 05 '16 at 14:14
  • Does the python script reside on an AFS or NFS filesystem? – Mark Plotnick Aug 05 '16 at 15:32
  • @MarkPlotnick According to df -T it's an AFS file system. Could it be that the various files are on different devices and I need a session to let a program access the data on other parts of the file system? – Bakuriu Aug 05 '16 at 15:36
  • @MarkPlotnick Checking with fs listacl <name> I am listed as having normal access to all relevant directories/files. – Bakuriu Aug 05 '16 at 15:41
  • 1
    An explanation of what's going on is http://stackoverflow.com/questions/23571012/how-to-provide-an-already-running-process-with-kerberos-and-afs-ticket . I don't have experience with AFS, but some pointers on how to extend access to the Kerberos tickets and AFS are given in that question's answers. – Mark Plotnick Aug 05 '16 at 15:44
  • @MarkPlotnick Seems like launching a new window with kinit && aklog is doing the trick. I'm now verifying this by detaching, waiting some time, rebooting, and seeing if the processes are still running or if they get the permission error. – Bakuriu Aug 05 '16 at 16:08

3 Answers


Thanks to Mark Plotnick I was able to identify and fix the issue.

The problem is the interaction between the AFS file system used by the server and Kerberos handling the authentication. The same issue was brought up in this question on SO.

Basically, what happens is that when I SSH into the server, Kerberos grants an authentication ticket to the session. This ticket is also used to access the AFS file system. When the SSH session is closed, the ticket is destroyed, and the running processes start getting permission-denied errors when they try to access files on AFS.
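
You can watch this happen from inside the tmux session. klist (MIT Kerberos) and tokens (OpenAFS) are the standard client tools; that they are installed on this particular server is an assumption:

    klist    # lists the Kerberos tickets in the current credential cache;
             # after the ssh logout the cache is destroyed or expired
    tokens   # lists the AFS tokens held by the current PAG;
             # an empty or expired list explains the permission errors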

The way to fix this is to start a new window inside screen/tmux and launch the command:

kinit && aklog

After that, you can detach from screen/tmux and close the SSH session safely.

The commands above create new Kerberos tickets and a fresh AFS token and associate them with the screen/tmux session. When the SSH connection is closed, the original tickets are destroyed, but since the subprocesses now use the ones you created, they no longer hit permission-denied errors.
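
Note that the new tickets themselves expire after the default ticket lifetime, which may be shorter than a whole weekend. If the kstart package is available on the server (an assumption about the setup, not something verified here), krenew can keep them fresh:

    # keep renewing the Kerberos ticket and refreshing the AFS token
    krenew -b -K 60 -t
    # -b: run in the background
    # -K 60: wake up every 60 minutes to renew the ticket
    # -t: run aklog after each renewal to refresh the AFS token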


To summarize (a scripted variant follows the list):

  • ssh username@server
  • tmux
  • Launch the process you need to keep running
  • Create a new window with Ctrl+B, C
  • kinit && aklog
  • Detach from the session with Ctrl+B, D
  • Close the SSH session
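
The same workflow can also be scripted; the session name tests is illustrative, and you still attach once to type the Kerberos password:

    tmux new-session -d -s tests 'python3 run_my_tests.py'  # start the tests detached
    tmux new-window -t tests                                # extra window for the new tickets
    tmux send-keys -t tests 'kinit && aklog' Enter          # type the command into that window
    tmux attach -t tests  # enter the password at the kinit prompt, then detach with Ctrl+B, D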

Such errors are probably related to the filesystem permissions. Can you take a look at the server-side syslog events?
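
For example, on a RHEL-family system such as this one (Scientific Linux 7 uses systemd), either of these standard commands shows the logs live while you reproduce the error:

    journalctl -f               # follow the systemd journal
    tail -f /var/log/messages   # classic syslog file on RHEL-family systems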

Maybe you need to stay logged in to your environment. Take a look here to learn more about Linux file permissions and issues; it may help.


Try screen ssh $USER@$HOSTNAME on the server.

The kinit && aklog solution didn't work for me, but I found this "sshception" workaround: within screen, I ssh into the same machine and run my programs in that nested ssh session. Even if screen loses its permissions, the ssh session inside stays open and authenticated (sketched below).
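
A minimal sketch of the workaround; run_my_tests.py stands in for whatever needs to keep running:

    screen                     # start the outer screen session
    ssh "$USER@$HOSTNAME"      # ssh back into the same machine; this nested
                               # login gets its own fresh Kerberos/AFS credentials
    python3 run_my_tests.py    # run the long job inside the nested login
    # detach with Ctrl+A, D and close the outer connection; the inner
    # ssh session keeps its credentials alive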