The Shared Computing Cluster (SCC) implements several automatic “process reapers” to enforce policy. These detect and terminate processes or batch jobs that use resources beyond the job request or that make inefficient use of resources. Actions taken by the process reapers are reported to the owner of the impacted process via email. Research Computing is available to assist researchers in optimizing their workflows and batch job specifications.

The Login Node Process Reaper

The SCC Login Nodes are the primary connection point for researchers using the SCC. These nodes can be used for administrative tasks and light work; long-running or CPU-intensive tasks should be run as a batch job. This process reaper enforces a limit of 15 minutes of CPU time on each process running on a login node.
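
To see how close a script is to the 15-minute limit, a process can inspect its own CPU time. The following is a minimal Python sketch, not an official SCC tool. It assumes the limit counts user plus system CPU time, which this page does not specify, and the names LOGIN_CPU_LIMIT_SECONDS, cpu_seconds_used, and cpu_seconds_remaining are illustrative only.

```python
import resource

# Limit stated on this page: 15 minutes of CPU time per login-node process.
LOGIN_CPU_LIMIT_SECONDS = 15 * 60


def cpu_seconds_used():
    """Return user + system CPU seconds consumed so far by this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    # Assumption: the reaper counts user and system time together.
    return usage.ru_utime + usage.ru_stime


def cpu_seconds_remaining(used, limit=LOGIN_CPU_LIMIT_SECONDS):
    """Return CPU seconds left before `limit` is reached (never negative)."""
    return max(0.0, limit - used)


if __name__ == "__main__":
    used = cpu_seconds_used()
    left = cpu_seconds_remaining(used)
    print(f"CPU time used: {used:.1f} s, remaining: {left:.1f} s")
```

The measurement covers only the calling process, not its children. Work that is likely to approach the limit is better submitted as a batch job.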

The CPU Limit Process Reaper

Compute nodes should only run processes associated with jobs, and jobs should use only the resources requested in the job submission. You can learn about process/slot requests on our Submitting Batch Jobs page. This process reaper terminates processes that are not associated with a job (e.g., processes started by connecting directly to a compute node over SSH) and jobs that use more processors than requested.
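
One way to stay within the requested processor count is to size worker pools from the job's slot request rather than from the number of cores on the node. The sketch below assumes the batch system exports the slot count in an environment variable named NSLOTS, as Grid Engine style schedulers commonly do. This page does not confirm that name, so check the Submitting Batch Jobs page. The helper requested_slots is hypothetical.

```python
import os


def requested_slots(environ=None, default=1):
    """Return the slot count from NSLOTS, or `default` if unset or invalid."""
    environ = os.environ if environ is None else environ
    try:
        # Assumption: the scheduler sets NSLOTS to the number of slots granted.
        slots = int(environ.get("NSLOTS", default))
    except (TypeError, ValueError):
        return default
    return slots if slots > 0 else default


if __name__ == "__main__":
    print(f"Using {requested_slots()} worker process(es)")
```

The returned value could then be passed to, for example, multiprocessing.Pool(processes=requested_slots()), so the job never starts more workers than it requested.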

The Idle GPU Process Reaper

Interactive sessions and batch jobs should make effective use of specialized resources, like GPUs, when they are requested. You can learn about the use of GPUs on our GPU Computing page. On Shared resources and some Buy-in resources, this process reaper terminates a job if all of its requested GPUs remain idle for two hours.
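
A job can check whether its GPUs are doing any work by sampling utilization with nvidia-smi. The following Python sketch is illustrative only. The metric the reaper uses to decide that a GPU is idle is not stated on this page, so treating 0% reported utilization as idle is an assumption, and the helper names parse_utilization, query_utilization, and all_idle are hypothetical.

```python
import subprocess


def parse_utilization(csv_text):
    """Parse lines of 'index, utilization' into a {index: percent} dict."""
    result = {}
    for line in csv_text.strip().splitlines():
        index, percent = (field.strip() for field in line.split(","))
        result[int(index)] = int(percent)
    return result


def query_utilization():
    """Ask nvidia-smi for the current utilization of each visible GPU."""
    output = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,utilization.gpu",
         "--format=csv,noheader,nounits"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_utilization(output)


def all_idle(utilization):
    """True only if there is at least one GPU and every GPU reports 0%."""
    # Assumption: 0% utilization is taken to mean "idle".
    return bool(utilization) and all(p == 0 for p in utilization.values())
```

A single sample only shows the state at one instant. Deciding that a job has been idle for a long period would require sampling repeatedly over that period.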

The Unassigned GPU Process Reaper

GPUs are only accessible through batch jobs, and batch jobs should use only the GPUs they are assigned. You can learn about use of GPUs and the $CUDA_VISIBLE_DEVICES variable on our GPU Computing page. This process reaper enforces GPU assignment for processes within a batch job: jobs and processes that use a GPU not assigned to the job are terminated.
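
To confirm which GPUs a job has been given, a script can read $CUDA_VISIBLE_DEVICES before starting GPU work. This is a minimal Python sketch under the assumption that the variable holds a comma-separated list of device entries. The helper assigned_gpus is hypothetical and not part of any SCC tooling.

```python
import os


def assigned_gpus(environ=None):
    """Return the device entries listed in CUDA_VISIBLE_DEVICES, in order."""
    environ = os.environ if environ is None else environ
    raw = environ.get("CUDA_VISIBLE_DEVICES", "")
    # Entries are kept as strings because they may be indices or device UUIDs.
    return [entry.strip() for entry in raw.split(",") if entry.strip()]


if __name__ == "__main__":
    gpus = assigned_gpus()
    if gpus:
        print("Assigned GPU(s):", ", ".join(gpus))
    else:
        print("No GPUs assigned to this process")
```

The sketch only reads the variable. A job that overrides or unsets $CUDA_VISIBLE_DEVICES risks touching a GPU it was not assigned.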