
I have SSH remote access to a machine I'd like to use for long-running jobs. What I currently do is simply

ssh user@remote command-to-run

This has several drawbacks:

  • I can't simply suspend my local machine - when I do that, SIGHUP will be sent to the remote process, effectively killing it. I could use nohup to prevent that, as in the sketch after this list.
  • The output may be long, so I'd rather have it redirected to files. Of course, I can do that manually, but it gets clumsy with a series of commands.
  • The process may run for a really long time. Ideally, the submitting program would only confirm that the command (script) has been successfully submitted, and then terminate.
  • I'd like to get a mail notification when the process terminates, with its exit code. Of course, I could use a shell script and a terminal command to send it manually, but that's one more hack.
  • I want to be able to schedule multiple scripts at once safely. In particular, I want to be able to push multiple scripts with the same name without renaming them manually. I don't want to worry about clobbering files that already exist on the remote file system.
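A minimal sketch of the per-command workaround hinted at above (assuming bash on both ends; the log-path handling here is just an illustration):

ssh user@remote 'log=$(mktemp); nohup command-to-run > "$log" 2>&1 < /dev/null & echo "started, logging to $log"'

mktemp produces a unique log file per submission, nohup shields the process from SIGHUP, and backgrounding it with & (plus redirecting all three streams) lets ssh return immediately. It still doesn't cover the mail notification, which is the remaining gap.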

This is very similar to what SLURM does, but I don't have any administrative rights on the remote side. Besides, when I have access to all the cores of the remote machine, it makes no sense to declare how many cores I need.

Is there anything I could use for this? What I described seems like a common use case.

marmistrz

1 Answer


If you can put scripts that run these long-running jobs for you on the remote machine, this becomes very easy:

#!/bin/bash
# This script runs a long-running job (if it's not already running)
# and emails you when it completes.
lockfile=/var/run/long-job-1.lock  # pick a path you can write to; /var/run usually requires root
logfile=$(mktemp)
errfile=$(mktemp)
if [[ -f "$lockfile" ]]; then
    echo "This job is already running." 1>&2
    exit 1
else
    echo $$ > "$lockfile"
    trap 'rm -f "$lockfile" "$logfile" "$errfile"' EXIT
fi

# Capture stdout and stderr separately so they can be mailed on completion.
/path/to/some/really/longrunning/job.sh > "$logfile" 2> "$errfile"
returncode=$?

if [[ "$returncode" -ne 0 ]]; then
    mailx -s "Job failed with exit code $returncode" -a "$logfile" yourself@example.com < "$errfile"
else
    mailx -s "Job succeeded" yourself@example.com < "$logfile"
fi

Put that script on the remote server in your home directory as longjob1.sh. Then, locally, you can:

ssh username@remotehost "screen -dmS LongJob1 ./longjob1.sh"

The script (and the job it invokes) will run in a screen session on the remote server, and it will email you when it is done. If the job fails, you will be emailed the error log as the message body, with the standard output log attached.
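If you want to check on the job later, screen's stock flags are enough (nothing here is specific to this setup; -t just forces ssh to allocate a pseudo-terminal so screen can attach):

ssh username@remotehost "screen -ls"
ssh -t username@remotehost "screen -r LongJob1"

The first lists the sessions currently running on the remote host; the second reattaches to the named session so you can watch the job live, and you can detach again with Ctrl-a d.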

DopeGhoti
  • I am curious as to why this was downvoted, as it provides a solution to the question posed. A comment explaining why this was perceived to be a bad answer (particularly in the absence of any others) would be gratefully received. – DopeGhoti Jan 20 '17 at 18:17
  • Sorry, it's because the question is not nice; the guy just doesn't want to use Google or something like that. But the answer is nice. I removed the -1 as you asked. – Luciano Andress Martini Jan 20 '17 at 18:59
  • Edited the OP. Two problems with this solution are mentioned there - I can't safely schedule multiple commands at once. What if LongJob1 already exists? – marmistrz Jan 20 '17 at 21:38
  • That's what the start of the script is for - it looks for a .lock file when it starts. If it's there, it aborts and barks; otherwise it writes the file and sets a trap to remove it when the script exits. You will very briefly have two screen sessions with the name LongJob1 in that instance. – DopeGhoti Jan 20 '17 at 21:40
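If even the brief session-name overlap is a concern, one sketch of a workaround (the LongJob-timestamp naming is invented here, not part of the answer above) is to make the session name unique per submission:

ssh username@remotehost 'screen -dmS "LongJob-$(date +%Y%m%d-%H%M%S)" ./longjob1.sh'

Because of the single quotes, $(date ...) expands on the remote host, so every submission gets its own session name; the lock file inside the script still keeps two copies of the same job from doing real work concurrently.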