Kerberos Tickets

The ENG-Grid was updated in 2020, so you no longer need to manage Kerberos tickets for submitted jobs. This page may still be useful for workstations attached to the Grid that mount ENGNAS with NFSv4.


For long jobs:

  1. Make a directory for yourself called /mnt/nokrb/yourusername and run “chmod 700” on that directory to restrict its access to you alone.
  2. Put everything your program needs in that directory, and cd into it before you run qsub.
  3. Make sure to copy data back to your home directory or lab research directory and delete it from the nokrb directory after you’re done with a batch of jobs. Your quota in your home directory is 10 GB and your lab research directory under /ad/eng/research might be quite a bit larger, but your quota in the nokrb directory is only 2 GB.

Kerberos tickets are a method of network authentication that makes the grid and SCC secure. To run programs on the grid or SCC, you need active tickets to give you permission to connect between nodes.

How do I renew my tickets?

First, do you need to? See the main grid instructions page, under the “for long jobs” part: you may be able to use alternate scratch space in /mnt/nokrb instead of your home directory for your long-running tasks, which could remove the need to worry about ticket renewal. That way you log back in later to retrieve your results (logging in automatically gets fresh credentials) and can move files to the secured locations as usual. If for any reason this won’t work, read on.

For security, Kerberos tickets expire frequently: every 9 hours. When the ticket expires, you can no longer read or write to Kerberos-authenticated directories such as your home directory or research share. If this happens, just run “kinit”. It will prompt you for your password, and you’ll get a new ticket valid for the next 9 hours.

$ kinit
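
If you would rather not be prompted when your ticket is still good, a small wrapper can check first. The sketch below assumes MIT Kerberos, where “klist -s” prints nothing and exits with status 0 only when the cache holds an unexpired ticket. The function name and the KLIST_CMD and KINIT_CMD variables are hypothetical and exist only so the commands can be swapped out.

```shell
#!/bin/sh
# Sketch: ask for a password only when no valid ticket is present.
# Assumes MIT Kerberos "klist -s" (silent, exit status reports validity).
# ensure_ticket, KLIST_CMD and KINIT_CMD are illustrative names.
ensure_ticket() {
    if ${KLIST_CMD:-klist} -s; then
        echo "ticket still valid"
    else
        # No usable ticket: kinit prompts for your password.
        ${KINIT_CMD:-kinit}
    fi
}
```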

Kerberos tickets can be renewed for up to 7 days. For example, to request a ticket that can be renewed for up to 7 days:

$ kinit -r 7d

If you’re running a job that needs Kerberos tickets for more than 9 hours continuously and you don’t want to come back to the machine to retype your password, you will need to do something else. You can separately specify how long your ticket will last before expiring, and how long it could last if you renew it before that expiration, with “kinit -l lifetime -r renewable_life”, but note that the maximum is 9 hours for lifetime and 7 days for renewable life, and our defaults will already request these maximum values. To renew your tickets before the expiration occurs, you can run a script which automatically runs “kinit -R” once every 8 hours or so, to renew your tickets without having to type your password again.
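
A renewal loop of the kind described above might look like the sketch below. It is not the renewtickets script, just a minimal illustration. The function name and its two arguments are made up here, and the 8-hour default is the interval suggested above. The loop ends by itself once “kinit -R” fails, for example when the 7-day renewable life runs out.

```shell
#!/bin/sh
# Sketch: renew the current ticket at a fixed interval until renewal fails.
# renew_loop and its arguments are illustrative, not an existing tool.
#   $1  seconds to wait between renewals (default 28800, i.e. 8 hours)
#   $2  renewal command (default "kinit -R", which needs no password)
renew_loop() {
    interval="${1:-28800}"
    renew_cmd="${2:-kinit -R}"
    # Renew first, then sleep; stop as soon as a renewal fails.
    while $renew_cmd; do
        sleep "$interval"
    done
}
```

To keep such a loop alive after you log out, you would start it under nohup in the same way as any other long-running command.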

Renewtickets

A script called renewtickets is available which will attempt to automatically renew your Kerberos tickets for as long as possible, then exit. If you give it a queue name it will also run gridtickets for you on that queue, as described below. You can use nohup to leave the script running even after you log out, like this:

$ nohup renewtickets [queue.q] &
nohup: ignoring input and appending output to `nohup.out'

Here’s an example for the bme.q with the output written to bme.kticket.out rather than nohup.out.

$ nohup renewtickets bme.q &> bme.kticket.out &

Note that if you have a job running longer than 7 days, you will have to run kinit again at least once every 7 days, which requires you to retype your password to obtain a new ticket with another 7-day renewable lifetime.

Gridtickets and Batch Jobs

When you submit an MPI job to a queue, Grid Engine will start mpirun on a single host, and that mpirun process will then SSH to the other hosts directly to start up worker processes. mpirun can use a Kerberos ticket to run SSH on your behalf, but you must first copy your current ticket across the queue to make this work:

$ gridtickets bme.q

You can replace the example bme.q with the queue you need to use.

NOTE WELL: With the Kerberos tickets copied to nodes, you could in principle use Kerberized directories in your grid submissions. However, you should avoid doing this, and instead still use the -cwd switch and your /mnt/nokrb directory as usual, because if your job runs longer than the time it takes for your tickets to expire, it will lose access to those directories.

Tickets on the SCC

On the SCC you can use gridticket.sh if you assign your job to a specific queue, but the SCC also has krb5cc.sh (/usr/local/bin/krb5cc.sh), which allows a job to pull the Kerberos ticket during execution. To use it, first run it with the “init” option on a login node. After you successfully type in your password, it will background itself and automatically maintain a valid ticket in ~/.krb5cc for one week. You can invoke this script in your batch scripts with the “batch” option, which will maintain a valid ticket on the batch nodes used by the job until the ticket in ~/.krb5cc expires or the job exits, whichever occurs first. You can rerun it with the “init” option on any login node before the ticket in ~/.krb5cc expires.

For example, setup with:

$ krb5cc.sh init
$ qsub -pe omp 4 comsoljob.sh

Then, from within the shell script that qsub runs (e.g. comsoljob.sh), pull the Kerberos ticket onto the exechost the batch job is running on:

$ krb5cc.sh batch
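
Inside a batch script, you may want the job to stop early if the ticket cannot be pulled. The sketch below shows one way to do that. It assumes that “krb5cc.sh batch” returns promptly with a nonzero status on failure, which this page does not confirm. The function name and the TICKET_CMD variable are hypothetical.

```shell
#!/bin/sh
# Sketch of a body for a batch script such as comsoljob.sh.
# Assumption: "krb5cc.sh batch" returns nonzero when it cannot get a ticket.
# run_job and TICKET_CMD are illustrative names.
TICKET_CMD="${TICKET_CMD:-krb5cc.sh batch}"

# Pull the ticket onto the execution host, then run the given command.
run_job() {
    if ! $TICKET_CMD; then
        echo "could not obtain Kerberos ticket" >&2
        return 1
    fi
    "$@"
}
```

In a real batch script you would end with a call to run_job followed by your own program and its arguments.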

For more information on running jobs on the SCC, check out the SCC website.

Kerberos tickets with MPI

If you need to queue up a set of MPI jobs and your copied tickets might expire before they all run, you can use the renewtickets script in tandem with gridtickets to automatically renew and copy your credentials. Giving renewtickets a queue name makes it run gridtickets on that queue for you.