Grid FAQ

Contents

  1. Why doesn’t chmod/chown work?
  2. Why use NTFS permissions?
  3. Why won’t public key authentication work?
  4. Why did my job run out of memory?
  5. Where are Python’s scientific computing packages?
  6. How can I run a batch job that requires a graphical display?
  7. How can I set default arguments for my grid jobs?
  8. How can I specify combinations of queues when running a job?
  9. How do I access my original directory path in a job script?
  10. Why are my jobs on ece.q or lowpriority.q suspended or not starting?

Why doesn’t chmod/chown work?

Most NFS shares on the Engineering file server have Windows-style NTFS permissions rather than Unix-style permissions, and Unix commands such as chmod and chown can’t change NTFS permissions.

Just ask ENGIT to set any permissions you need (preferred), or ask for permission to set them yourself from Windows.

For the reasons behind this, see the next question, Why use NTFS permissions?
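If you aren’t sure whether a particular directory uses NTFS or Unix-style permissions, one quick test is to run chmod on a scratch file there and check whether the mode actually changed. The sketch below assumes GNU coreutils (for stat -c) and a directory you can write to; the function name chmod_works and the probe file name are made up for illustration.

```shell
#!/bin/bash
# Check whether chmod takes effect in a directory.
# Assumes GNU coreutils (stat -c), as on typical Linux hosts.

# Returns 0 if chmod worked, 1 if it had no effect, and 2 if no
# test file could be created in the directory.
chmod_works() {
    local dir="${1:-.}"
    local probe mode
    probe=$(mktemp "$dir/.chmod_probe.XXXXXX" 2>/dev/null) || return 2
    # mktemp creates files with mode 600, so request a different mode.
    chmod 640 "$probe" 2>/dev/null
    mode=$(stat -c %a "$probe")
    rm -f "$probe"
    [ "$mode" = "640" ]
}

dir="${1:-.}"
chmod_works "$dir"
case $? in
    0) echo "chmod works in $dir (Unix-style permissions)" ;;
    1) echo "chmod had no effect in $dir (possibly NTFS permissions)" ;;
    *) echo "could not create a test file in $dir" ;;
esac
```

Run it with the directory to test as its argument. On an NTFS-permission share, chmod may fail outright or appear to succeed without changing anything; either way the script should report that it had no effect.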

Why use NTFS permissions?

We use NTFS because most file volumes are shared among Windows, Mac, and Linux systems, and since Windows is the most prevalent, NTFS makes a convenient default. NTFS permissions, called Access Control Lists (ACLs), can also use AD groups (e.g. lab-members, class-list, department), which makes managing access by role much easier. Finally, ACLs provide more fine-grained control than Unix-style permissions.

This is only a default; if you need Unix-style permissions, we can provide them.

Why won’t public key authentication work?

FIXED! If you still have this issue, just email us and we’ll make a quick change to your account.

Where are Python’s scientific computing packages?

Instead of relying on the system-provided Python packages, we’ve installed the Anaconda Python distribution and added scientific computing packages to it. To switch your environment to that Python, run the command module load anaconda at the beginning of your job (or module load anaconda/3.4 for the Python 3.4 version). Alternatively, if you just want to launch your Python script directly with that installation, use the command python-anaconda in place of the usual python.
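A minimal job script using the module approach might look like the sketch below. The script name analysis.py is hypothetical, and the check for the module command is only there so the script can also be tried outside the grid; on the cluster, module load anaconda is the step described above.

```shell
#!/bin/bash
#$ -cwd
# Sketch of a job script that switches to the Anaconda Python.
# analysis.py is a hypothetical script name; substitute your own.

# 'module' is normally a shell function; skip it if unavailable so the
# script can also be tried outside the grid.
if command -v module >/dev/null 2>&1; then
    module load anaconda        # or: module load anaconda/3.4
fi

# Report which interpreter 'python' now resolves to.
python_bin=$(command -v python || command -v python3)
echo "Using Python at: ${python_bin:-<none found>}"

# Run the actual work (commented out because analysis.py is hypothetical).
# python analysis.py
```

Submit it with qsub as usual. If you don’t need the rest of the module environment, a single line such as python-anaconda analysis.py does the same job.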

How can I run a batch job that requires a graphical display?

Some programs insist on having a display to render graphics to, even if they don’t actually use it (for example, Lumerical FDTD when executing a .lsf script). As a workaround, prefix the call to your program in your job script with the command xvfb-run, which provides a virtual display. To the program it looks like a real X11 server, and when the program ends, the virtual server shuts down automatically. Note that by default Xvfb creates a low-color-depth screen that doesn’t support OpenGL rendering; if your job fails with a message about GLX, request a higher depth with an option like -s "-screen 0 1024x1024x24".
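If the same script sometimes runs with a real display and sometimes in batch, a small wrapper can choose between the two. This is a sketch under a few assumptions: run_with_display is a made-up helper name, it treats an empty DISPLAY as "no display", and it adds xvfb-run’s -a option, which picks a free server number so that several jobs on one node don’t collide.

```shell
#!/bin/bash
# Run a command with a display: use the existing one if DISPLAY is set,
# otherwise start a temporary virtual X server with xvfb-run.
run_with_display() {
    if [ -n "${DISPLAY:-}" ]; then
        "$@"
    elif command -v xvfb-run >/dev/null 2>&1; then
        # -a picks a free server number (useful when jobs share a node);
        # -s requests a 24-bit screen, which GLX-using programs may need.
        xvfb-run -a -s "-screen 0 1024x1024x24" "$@"
    else
        echo "run_with_display: no DISPLAY and xvfb-run not found" >&2
        return 127
    fi
}

# Example in a job script (hypothetical program and input names):
# run_with_display my_simulation input.lsf
```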

How can I set default arguments for my grid jobs?

If you find yourself passing the same arguments to qsub or qlogin over and over, you can put them in a file called .sge_request in your home directory, and they will be used as defaults. Write them exactly as you would type them on the command line, one per line. For more information, see man sge_request.
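As a sketch, the helper below writes a starter .sge_request. The function name is hypothetical, and the two options (-cwd and -j y, standard qsub flags that run the job in the submission directory and merge stderr into stdout) are chosen only for illustration; replace them with whatever you normally type, and see man qsub for what each option does. Lines starting with # are treated as comments.

```shell
#!/bin/bash
# Sketch: create a ~/.sge_request containing a few default options.
write_sge_request() {
    local target="${1:-$HOME/.sge_request}"
    # Refuse to overwrite an existing file.
    if [ -e "$target" ]; then
        echo "$target already exists; edit it by hand" >&2
        return 1
    fi
    cat > "$target" <<'EOF'
# Default options for qsub/qlogin, one per line.
# Run jobs in the directory they were submitted from.
-cwd
# Merge stderr into stdout.
-j y
EOF
}

# Usage: write_sge_request            (writes ~/.sge_request)
#        write_sge_request some/path  (writes elsewhere, e.g. to preview)
```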

How can I specify combinations of queues when running a job?

Many qsub arguments, including the queue (-q) argument, accept Grid Engine’s pattern-matching syntax: | means "or", ! means "not", and parentheses group terms. See the MATCHING TYPES section in man sge_types.

For example, to run a job on either bme.q or me.q:

qsub -q 'bme.q|me.q' job.sh

Or, to run a job anywhere except bme.q or me.q:

qsub -q '!(bme.q|me.q)' job.sh
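If you build these expressions from a list of queue names, for example in a submission script, two small helpers can add the punctuation for you. The function names are hypothetical; they only produce the same | and !( ) forms shown above.

```shell
#!/bin/bash
# Build queue expressions in the sge_types matching syntax from a list
# of queue names.

# any_of_queues a b c   prints  a|b|c
any_of_queues() {
    local IFS='|'
    echo "$*"
}

# none_of_queues a b c  prints  !(a|b|c)
none_of_queues() {
    local IFS='|'
    # Single-quoted format string avoids interactive history expansion of "!".
    printf '!(%s)\n' "$*"
}

any_of_queues bme.q me.q
none_of_queues bme.q me.q

# Usage: quote the expansion so the shell passes it to qsub intact.
# qsub -q "$(any_of_queues bme.q me.q)" job.sh
```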

How do I access my original directory path in a job script?

When you submit a job script, it is ordinarily copied to a spool location under /var, so it does not run from where you saved it. Separately from the script’s location, if you pass -cwd to qsub, the job’s working directory is set to the directory you were in when you submitted it. Several environment variables inside a running job help you find these paths. For example:

Variable      Description                     Example value
$PWD          Current working directory       /mnt/nokrb/username/somepath
$0            Location of running script      /var/spool/gridengine/bu-eng/spool/budge02/job_scripts/8132815
$JOB_SCRIPT   SGE’s equivalent of $0          /var/spool/gridengine/bu-eng/spool/budge02/job_scripts/8132815
$JOB_ID       ID number of the running job    8132815

You can use these variables to keep your scripts portable rather than hardcoding paths like /mnt/nokrb/username. Examine the output of env within a batch job to see everything in the environment.
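The sketch below prints the variables from the table and builds an output directory from the environment instead of a hardcoded path. It also uses SGE_O_WORKDIR, which Grid Engine normally sets to the directory qsub was run from; it isn’t listed in the table, so confirm with env that it is present on this cluster. The results/ layout is just an illustrative choice.

```shell
#!/bin/bash
#$ -cwd
# Sketch of a job script that reports where it is running and builds
# output paths from the environment instead of hardcoding them.

echo "Working directory:  $PWD"
echo "Script path (\$0):   $0"
echo "JOB_SCRIPT:         ${JOB_SCRIPT:-<not set>}"
echo "JOB_ID:             ${JOB_ID:-<not set>}"
# SGE_O_WORKDIR: directory qsub was run from (assumed set by Grid Engine).
echo "Submit directory:   ${SGE_O_WORKDIR:-<not set>}"

# Fall back to PWD so the script can also be tried outside the grid.
base_dir="${SGE_O_WORKDIR:-$PWD}"
# One output directory per job, named after JOB_ID ("local" outside the grid).
out_dir="$base_dir/results/${JOB_ID:-local}"
echo "Output directory:   $out_dir"
# mkdir -p "$out_dir"
```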

Why are my jobs on ece.q or lowpriority.q suspended or not starting?

The queues on the instruction lab workstations are generally suspended during daytime hours so as not to interfere with coursework. They will resume at night, so long-running jobs there will eventually complete, but those queues won’t be a good choice for interactive or short-term jobs. Instead use instruction.q or one of the other public queues.
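To see whether any of your jobs are currently suspended, look at the state column of qstat: according to the qstat documentation, s means the job is suspended, S means its queue is suspended, and T means it was suspended by a threshold. The filter below is a sketch that assumes qstat’s default output layout (state in the fifth column, two header lines) and job names without spaces; the function name is hypothetical. Jobs that are simply waiting to start show qw and are not listed.

```shell
#!/bin/bash
# List jobs whose qstat state includes a suspension flag:
#   s = job suspended, S = queue suspended, T = suspended by threshold.
# Reads default qstat output on stdin; the first two lines are the
# header and the dashed separator, so they are skipped.
suspended_jobs() {
    awk 'NR > 2 && $5 ~ /[sST]/ { print $1, $5 }'
}

# Usage on a submit host:
# qstat -u "$USER" | suspended_jobs
```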