22

I have a bash script that starts up a python3 script (let's call it startup.sh), with the key line:

nohup python3 -u <script> &

When I ssh in directly and call this script, the python script continues to run in the background after I exit. However, when I run this:

ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> "./startup.sh"

The process ends as soon as ssh has finished running it and closes the session.

What is the difference between the two?

EDIT: The python script is running a web service via Bottle.

EDIT2: I also tried creating an init script that calls startup.sh and ran ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> "sudo service start <servicename>", but got the same behavior.

EDIT3: Maybe it's something else in the script. Here's the bulk of the script:

chmod 700 ${key_loc}

echo "INFO: Syncing files."
rsync -azP -e "ssh -i ${key_loc} -o StrictHostKeyChecking=no" ${source_client_loc} ${remote_user}@${remote_hostname}:${destination_client_loc}

echo "INFO: Running startup script."
ssh -i ${key_loc} -o StrictHostKeyChecking=no ${remote_user}@${remote_hostname} "cd ${destination_client_loc}; chmod u+x ${ctl_script}; ./${ctl_script} restart"

EDIT4: When I run the last line with a sleep at the end:

ssh -i ${key_loc} -o StrictHostKeyChecking=no ${remote_user}@${remote_hostname} "cd ${destination_client_loc}; chmod u+x ${ctl_script}; ./${ctl_script} restart; sleep 1"

echo "Finished"

It never reaches echo "Finished", and I see the Bottle server message, which I never saw before:

Bottle vx.x.x server starting up (using WSGIRefServer())...
Listening on <URL>
Hit Ctrl-C to quit.

I see "Finished" if I manually SSH in and kill the process myself.

EDIT5: Using EDIT4, if I make a request to any endpoint, I get a page back, but Bottle errors out:

Bottle vx.x.x server starting up (using WSGIRefServer())...
Listening on <URL>
Hit Ctrl-C to quit.


----------------------------------------
Exception happened during processing of request from ('<IP>', 55104)
  • Is there any way we can get more of a description of what the python script does? You'd probably still just get guesses without the full source code, but knowing more about what the python script does might help us make better educated guesses. – Bratchley Nov 14 '14 at 14:08
  • Yep - added to the question. – neverendingqs Nov 14 '14 at 14:40
  • The script might be doing something early on that somehow depends on the attached terminal or something like that and it could be a timing issue: if the session lasts past the first few seconds it works, otherwise it doesn't. Your best option might be to run it under strace if you are using Linux or truss if you are running Solaris and see how/why it terminates. Like for example ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> strace -fo /tmp/debug ./startup.sh. – Celada Nov 14 '14 at 15:05
  • Did you try using the & at the end of the start up script? Adding the & takes away the dependency of your ssh session from being the parent id (when parent ids die so do their children). Also I think this is a duplicate question based on this previous post. The post I submitted to you in the previous sentence is a duplicate of this post which might provide better detail. – Jacob Bryan Nov 14 '14 at 20:37
  • I have tried nohup ./startup.sh & before, but it had the same behaviour. startup.sh contains a fork already (nohup python3 -u <script> &), so I'm pretty sure I don't need to fork again. – neverendingqs Nov 14 '14 at 21:30
  • @Celada When I use strace (ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> "strace -fo /tmp/debug ./startup.sh") it seems to start up without forking. It hangs while waiting for a CTRL^C, but it continues running even after I send the signal (which is what I want, without the CTRL^C). – neverendingqs Nov 17 '14 at 14:31
  • @neverendingqs ahhh,... the strace stays in the foreground and prevents the problem from presenting itself, unfortunately. Looks like you'll have to insert the strace at a different point: somewhere after the job forks itself into the background. Details would depend on how exactly startup.sh is written. – Celada Nov 17 '14 at 15:10
  • I tried strace -fo /tmp/debug nohup python3 -u <script> &. Not sure what to look for inside /tmp/debug, but the last few lines are here: http://pastebin.com/4XwK47JF – neverendingqs Nov 17 '14 at 17:42
  • It seems somewhat implied by your use of nohup, but to confirm, is your python script intended to be executed on the remote host as a background process (not locally)? I just went ahead and answered for both cases. – iyrin Dec 26 '14 at 01:17
  • @RyanLoremIpsum it is intended to be executed as a background process on the remote host (not locally). – neverendingqs Dec 30 '14 at 15:25
  • The fundamental problem is the difference between how the remote shell is behaving. When logged in you are using it in interactive mode, which for bash enables job control by default.

    When you use SSH it's in non-interactive, which does not have job control enabled, so any child processes started will be in the same job group and all get terminated when you exit.

    – dbailey Apr 18 '16 at 10:00
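
A quick way to see the difference described in this comment is to inspect the shell's option flags in `$-`. This is only a sketch: the variable names `interactive` and `job_control` are made up for illustration, and the exact flags reported depend on the remote shell and how it was started. Running the snippet once inside a normal login session and once as the command argument of ssh should show different values.

```shell
# "$-" holds the current shell's option flags.
# "i" means the shell is interactive; "m" means job control (monitor mode) is on.
case $- in
    *i*) interactive=yes ;;
    *)   interactive=no ;;
esac

case $- in
    *m*) job_control=on ;;
    *)   job_control=off ;;
esac

echo "interactive=$interactive job_control=$job_control"
```

Run as a script or as a non-interactive ssh command, this would normally be expected to report no interactivity and no job control, while a login session typically reports both.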

7 Answers

13

I would disconnect the command from its standard input/output and error flows:

nohup python3 -u <script> </dev/null >/dev/null 2>&1 &  

ssh needs an indication that the command has no more output and does not require any more input. Having something else be the input and redirecting the output means ssh can safely exit, as input/output is not coming from or going to the terminal. This means the input has to come from somewhere else, and the output (both STDOUT and STDERR) should go somewhere else.

The </dev/null part specifies /dev/null as the input for <script>. Why that is useful here:

Redirecting /dev/null to stdin will give an immediate EOF to any read call from that process. This is typically useful to detach a process from a tty (such a process is called a daemon). For example, when starting a background process remotely over ssh, you must redirect stdin to prevent the process waiting for local input. https://stackoverflow.com/questions/19955260/what-is-dev-null-in-bash/19955475#19955475

Alternatively, redirecting from another input source should be relatively safe as long as the current ssh session doesn't need to be kept open.

With the >/dev/null part the shell redirects the standard output into /dev/null essentially discarding it. >/path/to/file will also work.

The last part 2>&1 is redirecting STDERR to STDOUT.

There are three standard sources of input and output for a program. Standard input usually comes from the keyboard if it’s an interactive program, or from another program if it’s processing the other program’s output. The program usually prints to standard output, and sometimes prints to standard error. These three file descriptors (you can think of them as “data pipes”) are often called STDIN, STDOUT, and STDERR.

Sometimes they’re not named, they’re numbered! The built-in numberings for them are 0, 1, and 2, in that order. By default, if you don’t name or number one explicitly, you’re talking about STDOUT.

Given that context, you can see the command above is redirecting standard output into /dev/null, which is a place you can dump anything you don’t want (often called the bit-bucket), then redirecting standard error into standard output (you have to put an & in front of the destination when you do this).

The short explanation, therefore, is “all output from this command should be shoved into a black hole.” That’s one good way to make a program be really quiet!
What does > /dev/null 2>&1 mean? | Xaprb
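
The three redirections discussed above can be tried locally without ssh. The following is a minimal sketch; the helper function `emit` and the variable names are invented for the demonstration and are not part of the original script. The outer `2>&1` on each command substitution is only there so that anything escaping on stderr can be captured and inspected.

```shell
# Hypothetical helper that writes one line to each output stream.
emit() {
    echo "to stdout"
    echo "to stderr" >&2
}

# stdin from /dev/null: read sees end-of-file immediately and returns non-zero.
if read -r line </dev/null; then
    stdin_state="got data"
else
    stdin_state="eof"
fi

# Only stdout is discarded, so the stderr line still escapes.
only_err=$( { emit >/dev/null; } 2>&1 )

# stdout goes to /dev/null first, then stderr is pointed at the same place.
nothing=$( { emit >/dev/null 2>&1; } 2>&1 )

# Reversed order: stderr is duplicated onto the old stdout before stdout moves.
reversed=$( { emit 2>&1 >/dev/null; } 2>&1 )

echo "stdin: $stdin_state"
echo "only stdout discarded: '$only_err'"
echo "both discarded: '$nothing'"
echo "reversed order: '$reversed'"
```

The reversed case shows why `2>&1` has to come after `>/dev/null`: redirections are applied left to right, so in the reversed form stderr keeps pointing at the original stdout.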

jlliagre
  • 61,204
  • nohup python3 -u <script> >/dev/null 2>&1 & and nohup python3 -u <script> > nohup.out 2>&1 & worked. I thought nohup automatically redirects all output though - what's the difference? – neverendingqs Dec 30 '14 at 15:11
  • @neverendingqs, what version of nohup do you have on your remote host? A POSIX nohup isn't required to redirect stdin, which I missed, but it should still redirect stdout and stderr. – Graeme Dec 30 '14 at 15:17
  • Looks like I'm working with nohup (GNU coreutils) 8.21. – neverendingqs Dec 30 '14 at 15:19
  • @neverendingqs, does nohup print any messages, like nohup: ignoring input and appending output to ‘nohup.out’? – Graeme Dec 30 '14 at 15:31
  • Yes - that is the exact message. – neverendingqs Dec 30 '14 at 15:34
  • nohup --help includes this: NOTE: your shell may have its own version of nohup, which usually supersedes the version described here. Please refer to your shell's documentation for details about the options it supports. Could I be using another version of nohup somewhere? – neverendingqs Dec 30 '14 at 15:35
  • whereis nohup points to the same version of nohup. – neverendingqs Dec 30 '14 at 15:43
  • (Should figuring out the difference between an explicit redirect and nohup's redirect be a separate question?) – neverendingqs Dec 30 '14 at 15:44
  • Don't use whereis or which to identify what command is executed for a given name, use the type command instead. – jlliagre Dec 30 '14 at 15:47
  • @neverendingqs, if it is bash then it is basically referring to disown, there is no nohup builtin. The message indicates that the nohup redirects succeeded, so I'm stumped as to why it only works with shell redirects. I think if you can recreate the behaviour more generally with sleep or cat or something and prove that it is not just a quirk of your python script then it would be a good subject for another Q. – Graeme Dec 30 '14 at 16:10
  • I added EDIT5. I'm not sure how to reproduce it in a general case, but I suspect a general case exists if we can observe a difference. – neverendingqs Dec 30 '14 at 17:02
  • I found a general case and asked the question here: http://unix.stackexchange.com/q/176674/52894 – neverendingqs Dec 30 '14 at 19:57
  • This is the best solution, so I'll mark it as the correct solution. The alternative is to use -t with ssh (as per http://unix.stackexchange.com/q/176674/52894) and add a sleep at the end of the command to prevent nohup from terminating prematurely (as per http://unix.stackexchange.com/a/176416/52894). However, this is a bit more finicky as it uses sleep. – neverendingqs Dec 30 '14 at 20:54
  • Added some explanation to this answer. Is that last redirection of STDERR to STDOUT so that errors from the script are streamed to the terminal? – iyrin Dec 31 '14 at 05:18
  • @RyanLoremIpsum redirects are handled by the shell and the script never sees them as arguments. – Anthon Dec 31 '14 at 07:02
  • Added to the answer to explain why the answer works. – neverendingqs Dec 31 '14 at 13:38
  • Why is the -u option necessary? – DaedalusUsedPerl Feb 12 '20 at 18:43
  • @DaedalusUsedPerl I don't think it matters a lot. The OP used that option so there was no reason to drop it in my answer. – jlliagre Feb 12 '20 at 19:49
  • @jlliagre Thanks for the explanation, it's just that I had a similar issue and using -u turned out to be necessary for me, simply redirecting input and output wasn't enough – DaedalusUsedPerl Feb 13 '20 at 20:20
  • @DaedalusUsedPerl Interesting. To be honest, I answered this question more than 5 years ago, so maybe I thought it was needed but forgot why. In any case, the "-u" option disables buffering, so it has an impact on timing. – jlliagre Feb 13 '20 at 23:16
4

Look at man ssh:

 ssh [-1246AaCfgKkMNnqsTtVvXxYy] [-b bind_address] [-c cipher_spec] [-D [bind_address:]port]
     [-e escape_char] [-F configfile] [-I pkcs11] [-i identity_file] [-L [bind_address:]port:host:hostport]
     [-l login_name] [-m mac_spec] [-O ctl_cmd] [-o option] [-p port]
     [-R [bind_address:]port:host:hostport] [-S ctl_path] [-W host:port] [-w local_tun[:remote_tun]]
     [user@]hostname [command]

When you run ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> "./startup.sh" you are running the shell script startup.sh as an ssh command.

From the description:

If command is specified, it is executed on the remote host instead of a login shell.

Based on this, it should be running the script remotely.

The difference between that and running nohup python3 -u <script> & in your local terminal is that this runs as a local background process while the ssh command attempts to run it as a remote background process.

If you intend to run the script locally then don't run startup.sh as part of the ssh command. You might try something like ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> && "./startup.sh"

If your intention is to run the script remotely and you want this process to continue after your ssh session is terminated, you would have to first start a screen session on the remote host. Then you have to run the python script within screen and it will continue to run after you end your ssh session.

See Screen User's Manual

While I think screen is your best option, if you must use nohup, consider setting shopt -s huponexit on the remote host before running the nohup command. Alternatively, you can use disown -h [jobID] to mark the process so SIGHUP will not be sent to it.1

How do I keep running job after I exit from a shell prompt in background?

The SIGHUP (Hangup) signal is used by your system on controlling terminal or death of controlling process. You can use SIGHUP to reload configuration files and open/close log files too. In other words if you logout from your terminal all running jobs will be terminated. To avoid this you can pass the -h option to disown command. This option mark each jobID so that SIGHUP is not sent to the job if the shell receives a SIGHUP.

Also, see this summary of how huponexit works when a shell is exited, killed or dropped. I'm guessing your current issue is related to how the shell session ends.2

  1. All child processes, backgrounded or not of a shell opened over an ssh connection are killed with SIGHUP when the ssh connection is closed only if the huponexit option is set: run shopt huponexit to see if this is true.

  2. If huponexit is true, then you can use nohup or disown to dissociate the process from the shell so it does not get killed when you exit. Or, run things with screen.

  3. If huponexit is false, which is the default on at least some linuxes these days, then backgrounded jobs will not be killed on normal logout.

  4. But even if huponexit is false, then if the ssh connection gets killed, or drops (different than normal logout), then backgrounded processes will still get killed. This can be avoided by disown or nohup as in (2).

Finally, here are some examples of how to use shopt huponexit.3

$ shopt -s huponexit; shopt | grep huponexit
huponexit       on
# Background jobs will be terminated with SIGHUP when shell exits

$ shopt -u huponexit; shopt | grep huponexit
huponexit       off
# Background jobs will NOT be terminated with SIGHUP when shell exits
iyrin
  • 1,895
  • According to the bash man page, huponexit should only affect interactive shells and not scripts - 'If the huponexit shell option has been set with shopt, bash sends a SIGHUP to all jobs when an interactive login shell exits.' – Graeme Dec 30 '14 at 14:59
2

Maybe worth trying the -n option when starting ssh? It will prevent the remote process from depending on the local stdin, which of course closes as soon as the ssh session ends. That closure will terminate the remote process whenever it tries to access its stdin.
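
As a sketch of what that invocation might look like, the snippet below builds the command from the question with `-n` added and only prints it by default. The key path, the remote address, the `run` helper, and the `DRY_RUN` switch are all hypothetical placeholders introduced for this example; set `DRY_RUN=0` to actually execute the command.

```shell
# Hypothetical values; replace them with your own key file and host.
key_loc="${key_loc:-$HOME/.ssh/id_example}"
remote="${remote:-user@host.example}"

# Print the command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# -n makes ssh take its stdin from /dev/null instead of the local terminal.
ssh_cmd=$(run ssh -n -i "$key_loc" -o StrictHostKeyChecking=no "$remote" ./startup.sh)
echo "$ssh_cmd"
```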

PersianGulf
  • 10,850
Georgiy
  • 21
2

I suspect you have a race condition. It would go something like this:

  • SSH connection starts
  • SSH starts startup.sh
  • startup.sh starts a background process (nohup)
  • startup.sh finishes
  • ssh finishes, and this kills the child processes (ie nohup)

If ssh hadn't cut things short, the following would have happened (not sure about the order of these two):

  • nohup starts your python script
  • nohup disconnects from the parent process and terminal.

So the final two critical steps don't happen, because startup.sh and ssh finish before nohup has time to do its thing.

I expect your problem will go away if you put a few seconds of sleep in the end of startup.sh. I'm not sure exactly how much time you need. If it's important to keep it to a minimum, then maybe you can look at something in proc to see when it's safe.
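
One way to replace a fixed sleep with a check is sketched below. It polls the name of the background process until nohup has replaced itself with the real program. This is an illustration under assumptions, not a tested fix for the original script: `sleep 30` stands in for the python command, and the helper `proc_name`, the variable names, and the limit of five attempts are invented for the example.

```shell
# Report the command name of a PID, via /proc on Linux or ps elsewhere.
proc_name() {
    if [ -r "/proc/$1/comm" ]; then
        cat "/proc/$1/comm"
    else
        ps -o comm= -p "$1" 2>/dev/null
    fi
}

# Stand-in for: nohup python3 -u <script> &
nohup sleep 30 </dev/null >/dev/null 2>&1 &
pid=$!
expected=sleep

tries=0
comm=""
while [ "$tries" -lt 5 ]; do
    comm=$(proc_name "$pid") || true
    comm=${comm##*/}                  # some ps versions print a full path
    if [ "$comm" = "$expected" ]; then
        break                         # nohup has exec'd the real program
    fi
    tries=$((tries + 1))
    sleep 1
done

echo "background process is now: ${comm:-unknown}"

# Clean up the stand-in process; a real startup script would leave it running.
kill "$pid" 2>/dev/null || true
```

Waiting for the expected name rather than for "anything other than nohup" avoids mistaking the brief moment before nohup itself has started for success.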

mc0e
  • 1,086
  • Good point, don't think the window for this will be very long though - probably only a few milliseconds. You could check /proc/$!/comm is not nohup or more portably use the output of ps -o comm= $!. – Graeme Dec 29 '14 at 17:33
  • That should work for normal logout, but what about when session is dropped or killed? Wouldn't you still need to disown the job so it's entirely ignored by sighup? – iyrin Dec 30 '14 at 09:29
  • @RyanLoremIpsum: The startup script only needs to wait long enough that the child process is fully detached. After that, it doesn't matter what happens to the ssh session. If something else kills your ssh session in the brief window while that happens, there's not much you can do about it. – mc0e Dec 30 '14 at 14:07
  • @Graeme yeah, I presume it's very quick, but I just don't know enough about exactly what nohup does to be sure. A pointer to an authoritative (or at least knowledgeable and detailed) source on this would be useful. – mc0e Dec 30 '14 at 14:09
  • How about this one - http://lingrok.org/xref/coreutils/src/nohup.c – Graeme Dec 30 '14 at 14:19
  • All it does is mess around with redirects for a bit and then the meat is just signal (SIGHUP, SIG_IGN); and an execvp (basically what I described in my answer). The code should be very quick to execute, although there are some calls it could block on so a delay is conceivable. – Graeme Dec 30 '14 at 14:25
  • Would ssh -i <keyfile> -o StrictHostKeyChecking=no <user>@<hostname> "./startup.sh; sleep 5" work? Or would it have to be inside ./startup.sh? – neverendingqs Dec 30 '14 at 14:25
  • Yes, that should also work, but given that you have a startup script, why not put it in there? You could test for the presence of the $SSH_CONNECTION environment variable if you don't want to slow down other uses. – mc0e Dec 30 '14 at 14:44
  • Just wanted to take the simpler approach in testing. Added EDIT4 based on the results. Looks like there is a race condition happening(?), but now it looks like the forking isn't working as intended (although it works when I close the terminal manually...) – neverendingqs Dec 30 '14 at 14:55
1

This sounds more like an issue with what the python script or python itself is doing. All that nohup really does (bar simplifying redirects) is just set the handler for the HUP signal to SIG_IGN (ignore) before running the program. There is nothing to stop the program setting it back to SIG_DFL or installing its own handler once it starts running.

One thing that you might want to try is enclosing your command in parentheses so that you get a double fork effect and your python script is no longer a child of the shell process. Eg:

( nohup python3 -u <script> & )

Another thing that may also be worth a try (if you are using bash and not another shell) is to use the disown builtin instead of nohup. If everything is working as documented this shouldn't actually make any difference, but in an interactive shell this would stop the HUP signal from propagating to your python script. You can add the disown on the next line or the same one as below (note adding a ; after a & is an error in bash):

python3 -u <script> </dev/null &>/dev/null & disown

If the above or some combination of it doesn't work then surely the only place to address the issue is in the python script itself.
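
The effect of nohup's SIG_IGN setting can be observed locally with a small experiment. The sketch below uses `sleep 30` as a stand-in for the python script and invented variable names. It sends SIGHUP to one plain background process and one started under nohup, then compares how each one ends. The one-second pause is an assumption that nohup needs well under a second to set the signal disposition and exec.

```shell
# A plain background process and one started under nohup.
sleep 30 &
plain=$!
nohup sleep 30 </dev/null >/dev/null 2>&1 &
shielded=$!

sleep 1                               # let nohup set SIG_IGN and exec
kill -HUP "$plain" "$shielded"

# The plain process is terminated by the hangup signal.
plain_status=0
wait "$plain" || plain_status=$?

# The nohup'ed process ignored SIGHUP, so it has to be stopped explicitly.
kill -TERM "$shielded"
shielded_status=0
wait "$shielded" || shielded_status=$?

# wait reports 128 + signal number: 129 for SIGHUP, 143 for SIGTERM.
echo "plain exited with $plain_status, nohup'ed exited with $shielded_status"
```

If a program reinstalls its own SIGHUP handler after starting, as the answer warns, the second process would no longer be protected and this comparison would come out differently.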

Graeme
  • 34,027
  • Would the double fork effect be enough (based on @RyanLoremIpsum's answer)? – neverendingqs Dec 30 '14 at 14:29
  • Both did not resolve the issue =[. If it's a Python issue, do you have an idea on where to start investigating (can't post too much of the Python script here)? – neverendingqs Dec 30 '14 at 15:00
  • @neverendingqs, if you mean the huponexit stuff, running in a subshell should have the same effect as disown as the process won't be added to the jobs list. – Graeme Dec 30 '14 at 15:04
  • @neverendingqs, updated my answer. Forgot that you should use redirects with disown. Don't expect that it will make much difference though. I think your best bet is to alter the python script so that it tells you why it is exiting. – Graeme Dec 30 '14 at 15:06
  • Redirecting the output worked (http://unix.stackexchange.com/a/176610/52894), but I'm not sure what the difference is between explicitly doing it and getting nohup to do it. – neverendingqs Dec 30 '14 at 15:15
0

I think it's because the job is tied to the session. Once that ends any user jobs are ended too.

user208145
  • 2,485
0

If nohup can open up its output file you may have a clue in nohup.out. It is possible python is not on the path when you run the script via ssh.

I would try creating a log file for the command. Try using:

nohup /usr/bin/python3 -u <script> &>logfile &
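
To check the PATH theory before hard-coding /usr/bin/python3, a short lookup can be placed at the top of the startup script. This is a sketch, and the variable name `py` is made up for the example. It relies on `command -v`, the POSIX lookup that reports what the shell would actually run for a given name.

```shell
# Resolve python3 the same way the shell would when running the script.
if py=$(command -v python3); then
    echo "python3 resolves to: $py"
else
    py=""
    echo "python3 is not on PATH; PATH is: $PATH" >&2
fi
```

Running this through the same ssh command as the startup script shows the PATH that a non-interactive session really gets, which can differ from the one in a login session.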
BillThor
  • 8,965