
I have a shiny new server for running simulations on, with a pair of Tesla GPUs and 32 cores, running CentOS 7.2. I'd like multiple users to be able to submit jobs to the server that get queued up and run when the previous one finishes, preferably with some sort of prioritisation system and time limit, like PBS/TORQUE but for a single machine rather than a cluster. I know I can install and configure TORQUE for a single machine, but it seems like overkill: in theory, the scheduler should only have to run when a job finishes or exceeds its time limit. I could probably homebrew a set of scripts, but I was wondering whether a solution already exists?

Jeff Schaller
  • I would suggest just using an existing batch scheduler like PBS or TORQUE. Why hack up a sub-optimal solution of your own when there's already something available that is meant to solve exactly the problem you're trying to solve? Sure, you have a "cluster" of 1, but you're still looking for a job submission and scheduling system. – larsks Feb 05 '16 at 02:06
  • ksh93 comes with this ability built in, via the coshell builtin and the connect stream services of the library's cs utility. Its associated userspace 3d filesystem viewpaths really enhance the usability of such things, in my opinion. – mikeserv Feb 05 '16 at 02:24
  • @larsks - I'll see how I go, but I find it hard to believe that no-one else has this problem, so I figured I'd ask. Thanks! @mikeserv - I can't find any documentation for coshell, and when I try to run it in my install of ksh it can't find the command. Could you point me in the right direction? – Yoshanuikabundi Feb 05 '16 at 02:51
  • You need ksh93 - ksh88 will not do. But you might start here. – mikeserv Feb 05 '16 at 02:57
  • @mikeserv - I compiled the beta branch and could get everything to run, but submitting jobs in a test environment with coshell -r localhost /home/yoshanuikabundi/test.sh resulted in a segfault. Neither coshell --man nor coshell -h gives any useful info. – Yoshanuikabundi Feb 05 '16 at 05:40
  • @Yoshanuikabundi - there's an issue tracker on the GitHub page there, and at the link above there are several links to the relevant mailing lists. But... maybe try a more stable version...? – mikeserv Feb 05 '16 at 06:08

2 Answers


Consider TaskSpooler -- http://viric.name/soft/ts/.

It seems to work like 'at' but drops everything into the same sequential queue.
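
To make the "same sequential queue" idea concrete, here is a rough Python sketch of the behaviour the question asks for: jobs run one at a time, lower priority numbers go first, and each job gets a time limit. This is not how TaskSpooler is implemented and it does not use its interface; the class name `SequentialQueue` and its methods are made up for illustration, and a real multi-user setup would also need a daemon and some way for users to submit jobs to it.

```python
import heapq
import itertools
import subprocess
import sys


class SequentialQueue:
    """Hypothetical single-machine queue: runs one command at a time."""

    def __init__(self, time_limit=None):
        self.time_limit = time_limit        # seconds per job; None means no limit
        self._heap = []                     # entries are (priority, sequence, argv)
        self._counter = itertools.count()   # tie-breaker: FIFO within one priority

    def submit(self, argv, priority=0):
        """Queue a command (list of strings); lower priority numbers run first."""
        heapq.heappush(self._heap, (priority, next(self._counter), argv))

    def run_all(self):
        """Run queued jobs sequentially; return a list of (argv, status) pairs."""
        results = []
        while self._heap:
            _, _, argv = heapq.heappop(self._heap)
            try:
                proc = subprocess.run(argv, timeout=self.time_limit)
                results.append((argv, proc.returncode))
            except subprocess.TimeoutExpired:
                # subprocess.run kills the child process before raising.
                results.append((argv, "timeout"))
        return results


if __name__ == "__main__":
    queue = SequentialQueue(time_limit=2)
    queue.submit([sys.executable, "-c", "print('low priority job')"], priority=5)
    queue.submit([sys.executable, "-c", "print('high priority job')"], priority=1)
    queue.submit([sys.executable, "-c", "import time; time.sleep(30)"], priority=9)
    for argv, status in queue.run_all():
        print(status)
```

The heap plus a running counter keeps submission order stable among jobs of equal priority, which is roughly what you would expect from a batch queue.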


HTCondor is cluster software with excellent support for single-machine installations. The project even provides a minicondor Docker image aimed specifically at single-machine setups: https://htcondor.readthedocs.io/en/latest/getting-htcondor/for-docker.html. You can also install it without Docker.

From the official website:

HTCondor can be useful on a range of network sizes, from small to large. On a single machine, HTCondor can act as a monitoring tool that pauses the job when the user uses the machine for other purposes, and it restarts the job if the machine reboots.

HTCondor is partially developed by Red Hat, so it has good support for RPM-based distributions like CentOS.

tla