I want to run a nohup job and then have that already-running process run a specific new command (e.g. Kerberos authentication). In fact, the ideal solution would be to first run the reauth command and then the real job, all under the same nohup process ID, so that the nohup process never loses its Kerberos ticket. So I want to run reauth alongside another Python script so that neither loses its Kerberos ticket, without screen or tmux or anything that requires me to interact with the script. I basically want to implement a daemon running a Python job (or any job) that never loses its ticket until it's done running. How does one do that?

This is my current attempt, but I can't quite tell if it's working:

# - set up this main sh script
source ~/.bashrc
source ~/.bash_profile
source ~/.bashrc.user
echo HOME = $HOME

source cuda11.1

conda init bash
conda activate metalearning_gpu

# - get a job id for this tmux session
export SLURM_JOBID=$(python -c "import random;print(random.randint(0, 1_000_000))")
echo SLURM_JOBID = $SLURM_JOBID
export OUT_FILE=$PWD/main.sh.o$SLURM_JOBID
export ERR_FILE=$PWD/main.sh.e$SLURM_JOBID
export WANDB_DIR=$HOME/wandb_dir

export CUDA_VISIBLE_DEVICES=4
echo CUDA_VISIBLE_DEVICES = $CUDA_VISIBLE_DEVICES
python -c "import torch; print(torch.cuda.get_device_name(0));"

# - CAREFUL, if a job is already running it could do damage to it, rm reauth process, qian doesn't do it so skip it
# top -u brando9
# pkill -9 reauth -u brando9

# - expt python script then inside that python pid attach a reauth process
# should I run reauth within python with subprocess or package both the nohup command and the reauth together in bash somehow
#python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE &
#nohup (echo $SU_PASSWORD | /afs/cs/software/bin/reauth; python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE) &
nohup (echo $SU_PASSWORD | /afs/cs/software/bin/reauth; python -u ~/main.py > $OUT_FILE 2> $ERR_FILE) &
# other option is to run echo $SU_PASSWORD | /afs/cs/software/bin/reauth inside of python, right?
export JOB_PID=$!
echo JOB_PID = $JOB_PID

# - Done
echo "Done with the dispatching (daemon) sh script"

I can't tell whether it works, likely because it hasn't run for long enough. Not sure.

I think another alternative would be to run reauth inside the Python script itself. I haven't tested it, but it might work. The main goal is to keep my nohup process from dying because it lost its Kerberos ticket.
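If the launch itself were moved into Python, nohup-like detachment could be sketched as below. This is a minimal sketch under stated assumptions: `launch_detached` is a hypothetical helper name, the log paths are placeholders, and the sketch only detaches the job from the terminal; it does not renew any ticket.

```python
import subprocess


def launch_detached(argv, out_file, err_file):
    """Start argv in its own session with stdout/stderr redirected to files."""
    with open(out_file, "wb") as out, open(err_file, "wb") as err:
        proc = subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,   # the job never waits on a terminal
            stdout=out,
            stderr=err,
            start_new_session=True,     # child calls setsid() (POSIX only)
        )
    # The parent's copies of the log files are closed here; the child keeps its own.
    return proc
```

The returned object's `pid` attribute plays the role of `$JOB_PID` in the shell script, for example `launch_detached(["python", "-u", "main.py"], "main.sh.o1", "main.sh.e1").pid`.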

The contents of reauth is:

(metalearning_gpu)~ $ cat /afs/cs/software/bin/reauth
#!/usr/bin/perl
# $Id: reauth 2737 2011-06-20 18:14:05Z miles $
#
# Original version (C) Martin Schulz, 2'2002
# University Karlsruhe
#
# Modifications by Miles Davis <miles@cs.stanford.edu>
#  Super minimal -- call programs rather than functions to reduce dependence
#  on extra perl modules.
#
# Heimdal patches thanks to Georgios Asimenos <asimenos@cs.stanford.edu>
#

# General:
##########
#
# This little script aims at maintaining a valid AFS token from a
# users password for long running jobs.
#
# As everybody knows (or should know) Kerberos tickets and AFS tokens
# only have a limited lifetime. This is so by design and is usually
# reasonable. After 12 hours, it is no more obvious that it is really
# that user sitting in front of the computer that once typed the
# correct password in. Furthermore the damage caused by compromized
# AFS tokens is limited to the lifetime of that ticket.
#
# However, there are situations when users want to use long running
# jobs that will write to AFS filespace for several days. Renewable
# tickets are not so much of help here, since they can only be renewed
# if ....
#
# Therefore the secret has somehow deposited on the local computer
# that will run the long time job. This can be eiter done by storing a
# keytab on the local disk, maybe with a cron(*) principal with
# reduces priviledges. The approach taken here is to work with the
# original password and keep it in RAM only.
#
# When starting this program, the user is asked for his principal and
# the corresponding password. Then the TGT and AFS token is obtained
# and displayed, afterwards, a background process is forked and the
# main process will return to the system prompt. The workload program
# can now be started.
#
# The background process will periodically attempt to obtain krb
# tickets and AFS tokens. If this fails for some reason (Kerberos
# server not available or anything, the program aborts.
#
# aklog does not create a new pag if not told so. If you want your
# background process have a separate pag, create it beforehand.
#
# The reauth.pl program will work until eternity if is not stopped
# somehow. The canonical way is kill it by "kill $pid", where $pid is
# the process id printed before the return of the initial call to
# reauth.pl or found in the output of "ps".
#
# (*) Cron jobs are another issue. Our institute introduced
# user.cron-style principals to enable cron to obtain a token and then
# work on restricted parts of the users home directories.
#
# Security issues:
##################
#
# reauth.pl will run forever if you do not stop it, so don't forget that!
#
# The password is kept in RAM (of the child process). AFAIK, this can
# only be recovered by local root (who you need to trust anyway). It
# will not survive a reboot of the local machine.
#
# The password is not kept on any disk. Therefore any bootfloppy
# (reboot to single user mode..) or screwdriver (take disk away..)
# attacks are not promising.
#
# Be aware that your NSA-, FBI-, MI5-, KGB-, ElQaida-, or (*insert
# your favorite opponent or competitor here*)-sponsored cleaning
# personnel or coworkers might have even more elaborate means... :-)
#
# BUGS:
#######
#
# Only mildly tested only on Linux and Solaris.
#
# Uses kinit, aklog, klist and tokens programs for a KerberosV/ Ken
# Hornstein's migration kit centered AFS setup. Please adjust to your
# config.
#
###########################################################################

# Configs:

# kinit program, add path if necessary
if ( -e "/usr/kerberos/bin/kinit" ) {
    $kinit="/usr/kerberos/bin/kinit";
} elsif ( -e "/usr/lib/heimdal/bin/kinit" ) {
    $kinit = "/usr/lib/heimdal/bin/kinit";
} elsif ( -e "/usr/bin/kinit" ) {
    $kinit="/usr/bin/kinit";
} else {
    die("Couln't find kinit.\n");
}

# aklog program, add path if necessary
if ( -e "/usr/bin/aklog" ) {
    $aklog="/usr/bin/aklog";
} elsif ( -e "/usr/lib/heimdal/bin/afslog" ) {
    # or, afslog, for heimdal weirdos
    $aklog="/usr/lib/heimdal/bin/afslog";
} else {
    die("Couln't find aklog or afslog.\n");
}

# klist program, add path if necessary
$klist="/usr/kerberos/bin/klist";

# tokens program, add path if necessary
$tokens="/usr/bin/tokens";

#################################################################

# Program:

use Getopt::Long;
use POSIX qw(setuid);
use POSIX qw(setgid);
use POSIX qw(setsid);

# Defaults for command line options.
my $keytab = '';
my $command = '';
my $username = '';
my $debug = 0;
my $verbose = 0;
my $interval=15000; # time interval in seconds: 4+ hours:

my %opts = (
    # Keytab
    'k=s' => sub { $keytab = @_[1]; $kinit_opts .= "-k -t $keytab "; },
    # Run command
    'c=s' => sub { $command = @_[1]; },
    # Run command as user
    'u=s' => sub { $username = @_[1]; },
    # Time interval to sleep
    'i=i' => sub { $interval = @_[1]; },
    # Debug
    'd' => sub { $debug++; },
    # Be versbose
    'v' => sub { $verbose++; },
);

GetOptions(%opts) or die "Usage: reauth [ -k=keytab ] [ -u user ] [ -i <sleep_interval ] [ -v ] [ -c <command> ]\n";

if(@ARGV) {
    $princ = $ARGV[0];
    debug_print(2, "Principal name provided by argument = $princ");
} else {
    # Assume we want the login name as the principal name
    $princ = getpwuid($<);
    debug_print(2, "Principal name provided by argument = $princ");
}

if ($keytab) {
    # Don't ask for password, a keytab was provided.
    debug_print(1, "Keytab provided = $keytab");
} else {
    # read password, but turn off echo before:
    print "Password for $princ: ";
    system "stty -echo";
    $passwd = <STDIN>;
    system "stty echo";
    printf "\n";
    chomp $passwd;
    # Actually get the tickets/tokens
    if(obtain_tokens()!=0) {
        die "Can't obtain kerberos tickets\n";
    }
    if ($verbose) {
        show_tokens();
    }
}

# fork to go into background:
# a) the parent will exit
# b) the child will work on
$pid = fork();
if ($pid) {
    # I am the parent.
    printf "Background process pid is: $pid\n";
    if ($command) {
        debug_print(1,"Waiting for child to die.");
        wait;
        debug_print(1,"Child is dead.");
    }
    exit 0;
} else {
    # I am the child.
    debug_print(2,"I am process $$");
    print "Can't set session id\n" unless setsid();

    debug_print(2,"KRB5CCNAME: " . $ENV{KRB5CCNAME});
    #if ($ENV{KRB5CCNAME}) {
        #$ENV{KRB5CCNAME} =  $ENV{KRB5CCNAME} . "_reauth_$$";
    #} else {
        #$ENV{KRB5CCNAME} =  "/tmp/krb5cc_reauth_$$";
    #}

    #debug_print(2,"Creating " . $ENV{KRB5CCNAME});
    #system "touch $ENV{KRB5CCNAME}";

    if ($username) {
        debug_print(1, "Looking up UID for $username");
        ($name,$passwd,$UID,$GID, @junk) = getpwnam($username);
        debug_print(1, "Changing to UID $UID, GID $GID");
        print "Can't set group id\n" unless setgid($GID);
        print "Can't set user id\n" unless setuid($UID);
        if ($ENV{KRB5CCNAME}) {
            $ENV{KRB5CCNAME} =  $ENV{KRB5CCNAME} . "_reauth_$$";
        } else {
            $ENV{KRB5CCNAME} =  "/tmp/krb5cc_reauth_$$";
        }
    }

    debug_print(2, "Running as uid " . $<);
    # Actually get the tickets/tokens
    if(obtain_tokens()!=0) {
        die "Can't obtain kerberos tickets\n";
    }

    if ($verbose) {
        show_tokens();
    }

    # If I was told to run a command, do it.
    if ($command) {
        debug_print(1,"About to exec $command");
        exec($command) or die "Can't execute '$command'.\n";
        exit
    }

    debug_print(2,"Going into auth loop (interval is $interval).");

    #close(STDOUT);
    #close(STDERR);

    # Otherwise, work until killed:
    while (1) {
        debug_print(2,"Waking up to obtain new tokens.");
        obtain_tokens();
        if ($verbose) {
            show_tokens();
        }
        sleep $interval;
    };
}

#################################################################

sub obtain_tokens() {
    # ignore sigpipes' (according to perlopentut)
    $SIG{PIPE} = 'IGNORE';

    #debug_print(1,"Running: | $kinit -f $kinit_opts -p $princ 1>/dev/null 2>&1");
    # run kinit
    open(KINIT, "| $kinit -f $kinit_opts -p $princ 1>/dev/null 2>&1");

    # pass password to stdin, password does not show up on command line
    if (! $keytab) {
        print(KINIT "${passwd}\n");
    }

    # close pipe and get status
    close(KINIT);
    $status=$?;

    debug_print(1,"kinit exited with status $status\n");

    # act on status..
    if($status == 256) {
        if ($verbose) {
            print "WARNING: kinit is not able to obtain Kerberos ticket ($status).\n";
            print " Possible DNS or network problem. Continuing anyway...\n";
        }
        return 1;
    } elsif($status!=0) {
        print "kinit is not able to obtain Kerberos ticket: $status\n";
        return 2;
    };

    debug_print(1,"Running $aklog...\n");

    $status = system "$aklog >/dev/null" ;
    debug_print(1,"aklog exited with status $status\n");
    if($status!=0) {
        print "aklog is not able to obtain AFS token: $status\n";
        return 3;
    };

    return 0;
};

##################################################################

sub show_tokens() {
    system $klist ;
    system $tokens ;
};

##################################################################

sub debug_print($$) {
    my $level = shift;
    my $message = shift;

    if ($debug >= $level) {
        print "DEBUG$debug: $message\n";
    }
}

##################################################################
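The auth loop at the end of the child branch above is just "renew, sleep, repeat". If the renewal were to live inside the Python job itself, the same pattern could be sketched with a daemon thread. `start_renewal_loop` and the `renew` callable are hypothetical names; what `renew` actually does (for example invoking kinit and aklog) is left to the caller and is an assumption here.

```python
import threading


def start_renewal_loop(renew, interval_seconds):
    """Call renew() right away and then every interval_seconds until stopped.

    Returns (thread, stop_event); setting the event ends the loop.
    """
    stop_event = threading.Event()

    def loop():
        while True:
            renew()
            # wait() returns True as soon as stop_event is set, False on timeout
            if stop_event.wait(interval_seconds):
                break

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread, stop_event
```

Because the thread is a daemon thread, it ends together with the main program, so there is no leftover process to kill afterwards. The script's own default interval is 15000 seconds.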


python reauth attempt:

from typing import Any


def run_bash_command(cmd: str) -> Any:
    import subprocess

    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
    output, error = process.communicate()
    if error:
        raise Exception(error)
    else:
        return output


def stanford_reauth():
    # def stanford_rauth(password: Optional[str] = None):
    # password: str = os.environ['SU_PASSWORD'] if password is None else None
    # assert password is not None, f'Err: {password=}'
    reauth_cmd: str = f'echo $SU_PASSWORD | /afs/cs/software/bin/reauth'
    out = run_bash_command(reauth_cmd)
    print('Output of reauth (/afs/cs/software/bin/reauth with password)')
    print(f'{out=}')

I'm currently running both to see whether they finish without getting killed.
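One thing worth checking in `run_bash_command`: `cmd.split()` produces a plain argument list, and `Popen` given a list starts the first item directly, without a shell. The pipe character and `$SU_PASSWORD` are therefore handed to `echo` as literal arguments, and reauth is not started by that call. A small check of the tokenization, using only the command string from the attempt:

```python
cmd = "echo $SU_PASSWORD | /afs/cs/software/bin/reauth"
argv = cmd.split()

# argv[0] is the program that would be executed; the rest are its arguments.
program, args = argv[0], argv[1:]
print(program)  # echo
print(args)     # ['$SU_PASSWORD', '|', '/afs/cs/software/bin/reauth']
```

So the "output of reauth" printed by the attempt would be the echoed command text rather than anything reauth wrote.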


New error:

I tried this attempt:

nohup sh -c "echo $SU_PASSWORD | /afs/cs/software/bin/reauth; python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_diversity_with_task2vec.py --manual_loads_name diversity_ala_task2vec_hdb1_mio > $OUT_FILE 2> $ERR_FILE" > $PWD/main.sh.nohup.out$SLURM_JOBID &

but I'm not sure if it's working anymore. I get the following error:

Password for brando9: stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device

Can't obtain kerberos tickets


I think I understand what the issue is: the new command wants a terminal to read the password from, so it doesn't let echo send it. I've tried:

I do have ssh and need to send the password. Is there a way to send the password with ssh?
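Going by the reauth source above, the two stty lines are only complaints from `stty` about the missing terminal; the password itself is read from standard input, so a pipe can in principle supply it. A hedged sketch of doing that from Python without a shell follows. `feed_secret` is a hypothetical helper, and whether reauth then succeeds still depends on kinit accepting the password. Output is discarded rather than captured because the background child that reauth forks keeps the inherited stdout open, so waiting on a captured pipe could block.

```python
import subprocess


def feed_secret(argv, secret):
    """Run argv with `secret` plus a newline on its stdin; return the exit code."""
    completed = subprocess.run(
        argv,
        input=(secret + "\n").encode(),  # written to the child's stdin, then closed
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode
```

For reauth this could be called as `feed_secret(["/afs/cs/software/bin/reauth"], os.environ["SU_PASSWORD"])`, which raises a `KeyError` when the variable is not set instead of silently sending an empty password.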


related:

  • useful: https://unix.stackexchange.com/questions/724685/how-does-one-authenticate-with-a-command-that-requires-your-password-in-linux?noredirect=1&lq=1 klist discussed – Charlie Parker Nov 15 '22 at 18:49
  • related: https://unix.stackexchange.com/questions/82658/bash-script-error-stty-standard-input-inappropriate-ioctl-for-device/82712?noredirect=1#comment1376906_82712 – Charlie Parker Nov 25 '22 at 20:58
  • has a good set of solns but none work for me for now: https://serverfault.com/questions/241588/how-to-automate-ssh-login-with-password – Charlie Parker Nov 25 '22 at 20:59

1 Answer

idk why, but this seems to have stopped working. An alternative approach is asked about here: How does one send a password to a command when you are not tty/stty (terminal) and don't have expect, sshpass?


Seems all 3 attempts work:

  • to run reauth inside the quoted sh script
  • to run reauth inside python
  • to do both of the above

The sample script I used to run jobs:

# https://unix.stackexchange.com/questions/724902/how-does-one-send-new-commands-to-run-to-an-already-running-nohup-process-e-g-r
# sh ~/diversity-for-predictive-success-of-meta-learning/main_nohup_snap.sh
# - set up this main sh script
export RUN_PWD=$(pwd)
source ~/.bashrc
source ~/.bash_profile
source ~/.bashrc.user
echo HOME = $HOME
# since snap .bash.user cd's me into HOME at dfs
cd $RUN_PWD
echo RUN_PWD = $RUN_PWD
realpath .

source cuda11.1

conda init bash
conda activate metalearning_gpu

# - get a job id for this tmux session
export SLURM_JOBID=$(python -c "import random;print(random.randint(0, 1_000_000))")
echo SLURM_JOBID = $SLURM_JOBID
export OUT_FILE=$PWD/main.sh.o$SLURM_JOBID
export ERR_FILE=$PWD/main.sh.e$SLURM_JOBID
export WANDB_DIR=$HOME/wandb_dir
echo $OUT_FILE
echo $ERR_FILE

export CUDA_VISIBLE_DEVICES=5
echo CUDA_VISIBLE_DEVICES = $CUDA_VISIBLE_DEVICES
python -c "import torch; print(torch.cuda.get_device_name(0));"

# - CAREFUL, if a job is already running it could do damage to it, rm reauth process, qian doesn't do it so skip it
# top -u brando9
# pkill -9 reauth -u brando9

# - expt python script then inside that python pid attach a reauth process
# should I run reauth within python with subprocess or package both the nohup command and the reauth together in bash somehow
#python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE &
nohup sh -c 'echo $SU_PASSWORD | /afs/cs/software/bin/reauth; python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE' > $PWD/nohup.out$SLURM_JOBID &
#nohup python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE &
# other option is to run echo $SU_PASSWORD | /afs/cs/software/bin/reauth inside of python, right?
export JOB_PID=$!
echo JOB_PID = $JOB_PID
echo SLURM_JOBID = $SLURM_JOBID

# - Done
echo "Done with the dispatching (daemon) sh script"

main command:

nohup sh -c 'echo $SU_PASSWORD | /afs/cs/software/bin/reauth; python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE' > $PWD/nohup.out$SLURM_JOBID &

I think I will stick with option 3, i.e. running both, since if I run a distributed job I don't want those processes to be killed randomly without my permission. For that, the Python script is:

from typing import Any


def run_bash_command(cmd: str) -> Any:
    import subprocess

    process = subprocess.Popen(cmd.split(), stdout=subprocess.PIPE)
    output, error = process.communicate()
    if error:
        raise Exception(error)
    else:
        return output


def stanford_reauth():
    """
    re-authenticates the python process in the kerberos system so that the python process is not killed randomly.

    ref: https://unix.stackexchange.com/questions/724902/how-does-one-send-new-commands-to-run-to-an-already-running-nohup-process-or-run
    """
    reauth_cmd: str = f'echo $SU_PASSWORD | /afs/cs/software/bin/reauth'
    out = run_bash_command(reauth_cmd)
    print('Output of reauth (/afs/cs/software/bin/reauth with password): ')
    print(f'{out=}')

Don't forget to check on reauth every so often and kill leftover reauth jobs:

# - CAREFUL, if a job is already running it could do damage to it, rm reauth process
# pkill -9 reauth -u brando9
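Since reauth prints `Background process pid is: <pid>` before returning, a gentler alternative to `pkill -9` on every reauth process is to record that PID at launch and later signal only that one process, which is what the script's header suggests with `kill $pid`. A sketch, assuming reauth's stdout was saved somewhere; `parse_reauth_pid` and `stop_reauth` are hypothetical names:

```python
import os
import re
import signal


def parse_reauth_pid(output):
    """Return the PID from reauth's 'Background process pid is: N' line, or None."""
    match = re.search(r"Background process pid is: (\d+)", output)
    return int(match.group(1)) if match else None


def stop_reauth(pid):
    """Send SIGTERM (the default signal of `kill`) to one recorded process."""
    os.kill(pid, signal.SIGTERM)
```

This leaves reauth processes belonging to other running jobs untouched.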
  • idk why but this seems to stop working alternative approach asked here: https://unix.stackexchange.com/questions/726314/how-does-one-send-a-password-to-a-command-when-you-are-not-tty-stty-terminal-a – Charlie Parker Nov 25 '22 at 21:09