This page demonstrates procedures for running MATLAB applications through the SCC's batch scheduler. For MATLAB work such as code development, GUI use, and other graphical rendering, an interactive MATLAB window is the natural and preferred mode of operation. Other applications, such as long-running production runs that do not require interaction, are best run in the background, commonly known as batch mode. On the SCC, batch jobs are managed by the Open Grid Scheduler (OGS). Users submit batch jobs with a job submission command (qsub), and the rest is handled by the batch scheduler and the operating system. MATLAB batch job submission and handling generally follow the guidelines detailed in the Shared Computing Cluster's Running Jobs page. Users running a large number of MATLAB jobs may need additional steps for efficient and robust batch operation; these are explained and demonstrated below where necessary.
Batch Basics
Batch System Usage & Policies
- Batch jobs are submitted to the batch scheduler via qsub:
scc1$ qsub [qsub options] user-script [arg1 ...]
Above, user-script is a user-supplied shell script that dictates the operations to perform, while qsub options let you specify supported options. Many qsub options can also be included in the user script itself (on lines that begin with #$). If a qsub option appears both on the qsub command line and in the user script, the command-line value overrides the one in the script.
- This page uses the words processor, core, thread, and slot interchangeably to denote what computer hardware vendors call a processor core.
- A user can submit as many jobs as needed.
- The system default wall clock limit is 12 hours. Specify a different limit with
scc1$ qsub -l h_rt=HH:MM:SS . . .
- Serial MATLAB batch jobs can run for up to 720 wall clock hours (30 days)
scc1$ qsub -l h_rt=720:00:00 . . .
- Multicore MATLAB batch jobs using the “omp” parallel environment can also run for up to 720 hours. For example, with 28 cores:
scc1$ qsub -pe omp 28 -l h_rt=720:00:00 . . .
- Note that -pe omp 28 only means that 28 CPU cores are assigned to the job. Users are responsible for making their MATLAB programs actually run on these cores, for example, by using maxNumCompThreads(28) for implicit parallelism or parpool(28) for explicit parallelism. For more details about the “omp” parallel environment, please refer to this page.
- The maximum number of threads is 32 in MATLAB 2014a or newer.
- Multi-node parallel computing in MATLAB is not supported on the SCC.
- The Technical Summary lists available SCC compute nodes with details on cores, memory, scratch disk, and more.
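The h_rt value is written as HH:MM:SS, and the hour field may exceed 24. As a small illustration, the sketch below converts days, hours, and minutes into that form and refuses anything above the 720-hour maximum quoted above. The helper name h_rt is hypothetical, and its limit check only mirrors this page's stated policy; the scheduler itself remains the authority.

```shell
#!/bin/bash
# Hypothetical helper: format a wall-clock limit for "qsub -l h_rt=HH:MM:SS".
# Arguments: days [hours [minutes]]; the hour field may exceed 24.
h_rt() {
  local days=${1:-0} hours=${2:-0} minutes=${3:-0}
  local total=$(( days * 24 + hours + minutes / 60 ))
  minutes=$(( minutes % 60 ))
  # 720 hours is the maximum stated on this page for MATLAB batch jobs
  if (( total > 720 || (total == 720 && minutes > 0) )); then
    echo "h_rt: ${total}h${minutes}m exceeds the 720-hour limit" >&2
    return 1
  fi
  printf '%02d:%02d:00\n' "$total" "$minutes"
}

echo "qsub -l h_rt=$(h_rt 30) ./mbatch"   # 30 days -> 720:00:00
```

For example, h_rt 0 12 gives the 12-hour default, 12:00:00.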
Essential Batch Commands
- Use qsub to submit batch jobs. For example:
scc1$ qsub ./mbatch
where mbatch is a basic batch script to run MATLAB using 1 processor:
#!/bin/bash -l
module load matlab
matlab -nodisplay -singleCompThread -r "n=4, rand(n), exit"
Whatever appears between the pair of double quotes above is, in effect, typed into a MATLAB command window, so it may contain any supported MATLAB commands: define a variable (n=4); run built-in MATLAB utilities (rand, exit); or run user m-files, say, myfct (omit the .m suffix). The MATLAB exit command ensures that the MATLAB session, and hence the batch job, ends properly. By default, this is a single-processor job. The -singleCompThread switch prevents MATLAB from automatically multithreading (i.e., using multiple cores), which would cause the job to be killed by the system for overusing system resources. The -nodisplay runtime switch suppresses rendering for batch jobs. Graphics may still be rendered and saved to an image file with the MATLAB print utility for later viewing in an interactive session.
The mbatch script is a very simple script for basic, single-processor MATLAB batch jobs. Different MATLAB applications may require appropriate changes to it; these variants are demonstrated below in Types of MATLAB Batch Jobs.
While this is not required of a batch script, making mbatch executable lets you also run it as a command, which is handy for error-checking before using it in batch processing:
scc1$ chmod +x mbatch
scc1$ ./mbatch
- Use qstat to query batch queue status. Add the -u option to list jobs for a specific user:
scc1$ qstat -u myID
job-ID  prior   name   user state submit/start at     queue                slots ja-task-ID
---------------------------------------------------------------------------------------------
6860722 0.96195 myJobs myID r     03/23/2015 10:20:10 b@scc-ba6.scc.bu.edu 1
6860723 0.00000 myJobs myID qw    03/23/2015 10:20:11                      1
6860724 0.00000 myJobs myID Eqw   03/23/2015 10:20:20                      1
In the above, qw indicates that the job is waiting, while r means the job is running. A state of Eqw indicates an error with the job. Use qstat -j 6860724 to see an explanation of the error for that particular job.
- To kill a job in the queue:
scc1$ qdel 6860723
- Output goes to myJobs.o6860722 (this includes the MATLAB splash screen and anything that goes to the command window). More details on qsub options, such as output control, are available on the Running Jobs page.
- The batch scheduler has built-in system default behaviors, like a 12-hour wall time limit. You can define your own qsub default behaviors so that you won't have to specify them each time on the command line or in your batch script. To do this, or to control the execution order of your batch jobs, see Advanced Batch System Usage.
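Because qstat prints a fixed-column table, its output is easy to post-process. The sketch below assumes the column layout shown in the qstat example above (two header lines, job state in the fifth column); it tallies jobs per state and lists the IDs of jobs in an error state so they can be passed to qstat -j. The function names are hypothetical, the sample text is copied from that example, and real qstat output may differ (for instance, it is empty when no jobs are queued).

```shell
#!/bin/bash
# Hypothetical helpers for post-processing "qstat -u USER" output.
# Assumes the layout shown above: two header lines, then
# job-ID prior name user state submit/start-date time [queue] slots ...

# Print "<state> <count>" for each job state, sorted by state name.
count_states() {
  awk 'NR > 2 && NF >= 5 { n[$5]++ } END { for (s in n) print s, n[s] }' |
    LC_ALL=C sort
}

# Print the IDs of jobs whose state contains "E" (error), e.g. Eqw.
error_jobs() {
  awk 'NR > 2 && $5 ~ /E/ { print $1 }'
}

# Sample input copied from the qstat example above
sample='job-ID  prior   name   user state submit/start at     queue                slots ja-task-ID
---------------------------------------------------------------------------------------------
6860722 0.96195 myJobs myID r     03/23/2015 10:20:10 b@scc-ba6.scc.bu.edu 1
6860723 0.00000 myJobs myID qw    03/23/2015 10:20:11                      1
6860724 0.00000 myJobs myID Eqw   03/23/2015 10:20:20                      1'

printf '%s\n' "$sample" | count_states
printf '%s\n' "$sample" | error_jobs
```

On the SCC the same functions would read live output, e.g. qstat -u myID | error_jobs, and each reported ID can then be inspected with qstat -j.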
Types of MATLAB Batch Jobs
Depending on the application, MATLAB batch job procedures generally fall into one of the following categories. Sample batch scripts and utilities discussed on this page are available for download or copy (for SCC users):
scc1$ cp -r /project/scv/examples/matlab/batch your-SCC-dir-path
Running single-core (serial) MATLAB Batch Jobs
With run_matlab_job, an enhanced version of mbatch, you can specify an arbitrary problem size at runtime:
scc1$ qsub ./run_matlab_job 5 # compute rand(5)
run_matlab_job:
#!/bin/bash -l
module load matlab
matlab -nodisplay -singleCompThread -r "rand($1), exit"
When the job runs, the system shell (bash) parses run_matlab_job like this:
- run_matlab_job expects one runtime input (the random array size n), referenced as $1 (with $2, $3, … for additional inputs if required).
- The shell substitutes 5 for $1 and runs matlab . . -r "rand(5), exit"
To adapt it for your own program, replace rand($1) with your m-file (omit .m).
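To see exactly what bash hands to MATLAB, it can help to print the command instead of running it. The sketch below is a hypothetical dry-run variant of run_matlab_job (the name run_matlab_job_dryrun is made up for illustration); it uses the same double-quoted -r argument, so $1 is expanded just as it would be before MATLAB sees it.

```shell
#!/bin/bash
# Hypothetical dry-run variant of run_matlab_job: prints the MATLAB command
# line after bash has substituted $1, instead of starting MATLAB.
run_matlab_job_dryrun() {
  if [ $# -lt 1 ]; then
    echo "usage: run_matlab_job_dryrun N" >&2
    return 1
  fi
  # $1 is expanded inside the double quotes, exactly as in run_matlab_job;
  # with single quotes MATLAB would receive the literal text rand($1).
  printf 'matlab -nodisplay -singleCompThread -r "%s"\n' "rand($1), exit"
}

run_matlab_job_dryrun 5
```

Running it with 5 prints matlab -nodisplay -singleCompThread -r "rand(5), exit", which is the command the batch job would execute.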
If you are not submitting a batch job from the directory where your program resides, you may need to use addpath, for example, to let MATLAB know where to find it.
See Running Multiple Batch Jobs for ways to run a group of similar jobs.
Running Multi-core MATLAB Parallel Computing Toolbox Batch Jobs
scc1$ qsub ./run_matlab_pct_job # 4 cores; n=100
scc1$ qsub -pe omp 8 ./run_matlab_pct_job # 8 cores; n=100
scc1$ qsub -pe omp 8 -v n=200 ./run_matlab_pct_job # 8 cores; n=200
run_matlab_pct_job:
#!/bin/bash -l
#$ -pe omp 4
# set default value for n; override with qsub -v at runtime
#$ -v n=100
# Load the newest version of matlab on SCC
module load matlab
# Additional qsub options here . . .
matlab -nodisplay -r "runBatchJob($n, $NSLOTS); exit"
In general, a PCT job is expected to run on multiple cores. The example batch script presets it to 4 cores (#$ -pe omp 4), which can be overridden at runtime. See Custom qsub settings with .sge_request for details on override rules.
The double-quoted command region in matlab ... -r " ... " (first explained in the serial batch section) is preprocessed by the system shell in run_matlab_pct_job, which replaces valid environment variables, like $n, with their respective values before they are passed to the matlab command.
- n (default 100) is the problem size used for the computation in the MATLAB program.
- $NSLOTS is set to 4 by the qsub option statement #$ -pe omp 4 (in the examples above, $NSLOTS=8 when the runtime override -pe omp 8 is used).
It is prudent to “inherit” a predefined qsub option value through $NSLOTS. Defining it explicitly, e.g., runBatchJob(3, 4), defeats the purpose of runtime override. It may also lead to inadvertent inconsistencies, which can get the job killed if it overuses system resources.
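One way to keep inheriting $n and $NSLOTS while still being able to run the script by hand (outside qsub, where neither variable is set) is bash's ${VAR:-default} expansion. The sketch below builds the MATLAB statement of run_matlab_pct_job that way; pct_command is a hypothetical helper, and its fallback values simply mirror the script's #$ -v n=100 and #$ -pe omp 4 lines, so they would have to be kept in sync with those lines by hand.

```shell
#!/bin/bash
# Hypothetical helper: build the -r statement of run_matlab_pct_job.
# Under qsub, n comes from "#$ -v n=100" (or qsub -v n=...) and NSLOTS from
# the -pe omp request; the fallbacks below only apply when they are unset.
pct_command() {
  local size=${n:-100}        # mirrors "#$ -v n=100"
  local slots=${NSLOTS:-4}    # mirrors "#$ -pe omp 4"
  printf 'runBatchJob(%s, %s); exit\n' "$size" "$slots"
}

# In the batch script this would become:  matlab -nodisplay -r "$(pct_command)"
pct_command                   # with n and NSLOTS unset: runBatchJob(100, 4); exit
n=200 NSLOTS=8 pct_command    # runBatchJob(200, 8); exit
```

The values are still inherited whenever the scheduler provides them, so runtime overrides such as qsub -pe omp 8 -v n=200 keep working.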
Each time the PCT is invoked, MATLAB creates a group of files and folders for internal administrative purposes in the user's home directory (~/.matlab). When an interactive MATLAB PCT job is running on one of the login nodes, data flow between the application and ~/.matlab is local and poses no communication issues. A batch job, on the other hand, is dispatched to a compute node at runtime, and the aforementioned communication is between the login node and the compute node. Generally, inter-node communication is much less efficient than local (i.e., intra-node) communication; frequent occurrences could create an I/O traffic bottleneck for the system as well as degrade the application's performance.
The example runBatchJob.m below keeps this PCT-related I/O traffic within the compute node to forestall undesirable inter-node data communication:
function runBatchJob(n, nslots)
% computes sum s=1+2+3+...+n=n(n+1)/2 with nslots cores
% redirects ~/.matlab PCT temp files to TMPDIR on the compute
% node to avoid inter-node (compute node <--> login node) I/O
myCluster = parcluster('local'); % cores on compute node to be "local"
if ~isempty(getenv('ENVIRONMENT')) % true if this is a batch job
    myCluster.JobStorageLocation = getenv('TMPDIR'); % points to TMPDIR
end
% REPLACE BELOW EXAMPLE WITH YOUR APP (either scripts or functions)
parpool(myCluster, nslots) % for MATLAB R2014a or newer
%matlabpool(myCluster, nslots) % for MATLAB R2013a or older
s = 0;
parfor i=1:n
    s = s + i; % compute s = 1 + 2 + ... + n = n(n+1)/2
end
fprintf(1,'Computed arithmetic sequence sum s = %d', s);
fprintf(1,' (correct answer: %d)\n\n\n', n*(n+1)/2);
%matlabpool close % MATLAB 2013a or older
delete(gcp) % MATLAB 2014a or newer
end
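The if-test in runBatchJob.m relies on the scheduler setting ENVIRONMENT for batch jobs (Grid Engine documents it as BATCH) and providing a per-job TMPDIR. As a hedged sketch, the same decision can be made in the shell at the top of a batch script, so that the job's output file records where PCT files will go. The helper name pct_storage_dir is hypothetical; it adds a check for an empty TMPDIR that the MATLAB example does not perform, and its interactive branch only reports ~/.matlab, under which MATLAB keeps its default job storage.

```shell
#!/bin/bash
# Hypothetical helper mirroring the test in runBatchJob.m:
# a non-empty ENVIRONMENT (BATCH under the scheduler) selects $TMPDIR.
pct_storage_dir() {
  if [ -n "${ENVIRONMENT:-}" ]; then
    if [ -z "${TMPDIR:-}" ]; then
      echo "pct_storage_dir: batch job, but TMPDIR is not set" >&2
      return 1
    fi
    printf '%s\n' "$TMPDIR"
  else
    # Interactive session: MATLAB keeps its default job storage under ~/.matlab
    printf '%s\n' "$HOME/.matlab"
  fi
}

# At the top of a PCT batch script, log the choice to the job's output file:
echo "NSLOTS=${NSLOTS:-unset}, PCT job storage: $(pct_storage_dir)"
```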
See Running Multiple Batch Jobs for a simple way to run a group of jobs.
Running Pre-compiled MATLAB Standalone Jobs
scc1$ qsub -t 100-300:200 ./run_standalone_job
See How to Create & Run MATLAB Standalone Executable for details.
Running Implicitly Parallel MATLAB Batch Jobs
To run implicitly parallel MATLAB batch jobs, remove the -singleCompThread flag from mbatch and submit the job to a multiprocessor queue. There are two situations to consider for implicit parallel computation in batch processing: using a whole node (recommended options are 28 or 16 cores) versus using a partial node (recommended options are 8 or 4 cores).
- Sample batch script for using a whole node (taking 28 cores for example), wnbatch:
#!/bin/bash -l
module load matlab
matlab -nodisplay -r "A=rand(1e4); I=A*A; exit"
scc1$ qsub -pe omp 28 ./wnbatch
In this example, a whole compute node with 28 cores is assigned to the job, and the MATLAB functions execute in parallel on all 28 cores.
- Sample batch script for using a partial node (taking 8 cores for example), pnbatch:
#!/bin/bash -l
module load matlab
matlab -nodisplay -r "maxNumCompThreads($NSLOTS); A=rand(1e4); I=A*A; exit"
scc1$ qsub -pe omp 8 ./pnbatch
The qsub command-line option -pe omp 8 passes the core count to the job through the $NSLOTS environment variable, which the batch scheduler sets at runtime. Note the absence of the MATLAB option -singleCompThread in pnbatch.
In this example, 8 cores are assigned to the job and the MATLAB functions execute in parallel on those 8 cores. The job may be placed on a compute node with more than 8 cores (such as 16 or 28), but it is allowed to use only 8 of them (i.e., part of the compute node). If the MATLAB program actually runs on more than 8 cores, the job will be killed automatically. It is therefore important to call maxNumCompThreads($NSLOTS) so that the number of threads MATLAB uses matches the number of cores requested.
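The serial mbatch and the partial-node pnbatch differ only in how they limit MATLAB's threads. As a sketch, assuming $NSLOTS is the only input that should decide this, a single script can choose between the two forms: -singleCompThread for one slot, maxNumCompThreads($NSLOTS) otherwise. build_matlab_cmd is a hypothetical helper; it stores the command in a bash array (cmd) so the quoted -r argument survives intact, and it treats an unset NSLOTS as 1, as when the script is run by hand.

```shell
#!/bin/bash
# Hypothetical helper: choose MATLAB's threading flags from $NSLOTS.
# Sets the global array "cmd"; run it afterwards with "${cmd[@]}".
build_matlab_cmd() {
  local slots=${NSLOTS:-1}                  # unset (e.g. run by hand) -> 1 core
  local work='A=rand(1e4); I=A*A; exit'     # the example computation from this page
  case $slots in
    *[!0-9]*|0)
      echo "build_matlab_cmd: invalid NSLOTS '$slots'" >&2
      return 1 ;;
  esac
  if [ "$slots" -eq 1 ]; then
    # One core: forbid automatic multithreading, as mbatch does
    cmd=(matlab -nodisplay -singleCompThread -r "$work")
  else
    # Several cores: cap MATLAB's thread count at the slots granted, as pnbatch does
    cmd=(matlab -nodisplay -r "maxNumCompThreads($slots); $work")
  fi
}

# In a batch script:  build_matlab_cmd && "${cmd[@]}"
```

Submitted with qsub -pe omp 8, the script would run the pnbatch form; submitted without -pe, NSLOTS is 1 under the scheduler and it falls back to the mbatch form.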
Running Multiple Batch Jobs With the qsub Array Job Option
The qsub -t #-#[:#] (aka Array Job) option is ideally suited for parametric studies. If the system load permits, the jobs of a parametric study run concurrently, effectively as Embarrassingly Parallel jobs, so they scale practically linearly. The qsub -t #-#[:#] option essentially has the effect of the PCT's parfor. The qsub manpage states that “The option argument to -t specifies the number of array job tasks and the index number which will be associated with the tasks. The index numbers will be exported to the job tasks via the environment variable SGE_TASK_ID.” The relationship between SGE_TASK_ID and the -t input argument is demonstrated below, along with examples:
% qsub -t SGE_TASK_FIRST-SGE_TASK_LAST:SGE_TASK_STEP . . .
% qsub generates SGE_TASK_ID for each job with the above indices
% Simulate SGE_TASK_ID with MATLAB colon syntax -- NOT REQUIRED
SGE_TASK_ID = SGE_TASK_FIRST:SGE_TASK_STEP:SGE_TASK_LAST;
SGE_TASK_ID = [100,300,500]; % qsub -t 100-500:200 (run 3 jobs)
SGE_TASK_ID = 100;           % qsub -t 100 (run 1 job)
SGE_TASK_ID = [3,4,5];       % qsub -t 3-5 (default step size 1)
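The colon-syntax pseudocode above can also be checked from the shell before anything is submitted. The sketch below expands a -t argument of the form FIRST[-LAST[:STEP]] into the task IDs that, per the manpage description quoted above, qsub exports as SGE_TASK_ID. task_ids is a hypothetical helper, and it only parses the simple range forms used on this page.

```shell
#!/bin/bash
# Hypothetical helper: list the SGE_TASK_ID values for "qsub -t FIRST[-LAST[:STEP]]".
task_ids() {
  local re='^([0-9]+)(-([0-9]+)(:([0-9]+))?)?$'
  if ! [[ $1 =~ $re ]]; then
    echo "task_ids: cannot parse '$1'" >&2
    return 1
  fi
  local first=${BASH_REMATCH[1]}
  local last=${BASH_REMATCH[3]:-$first}   # "-t 100" means a single task
  local step=${BASH_REMATCH[5]:-1}        # default step size is 1
  if (( step < 1 || first > last )); then
    echo "task_ids: empty or invalid range '$1'" >&2
    return 1
  fi
  local i ids=()
  for (( i = first; i <= last; i += step )); do
    ids+=("$i")
  done
  echo "${ids[*]}"
}

task_ids 100-500:200   # 100 300 500  (3 jobs)
task_ids 100           # 100          (1 job)
task_ids 3-5           # 3 4 5        (default step size 1)
```

Counting the output of task_ids 100-300 gives 201 IDs, which is exactly the runaway case warned about below.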
Caution: qsub -t 100-300 will spawn 201 batch jobs because the default step size is 1! As a precaution, name your array job so that all of its tasks can be identified together:
scc1$ qsub -N myJobs -t 100-300:200 . . .
In the event that an array job was submitted by mistake, simply delete all tasks with
scc1$ qdel -u yourUSERID "myJobs*"
Single-Core Parametric Studies
To perform a single-core parametric study, one could submit run_matlab_job multiple times, each with a different value of n (e.g., qsub ./run_matlab_job N). However, it is more convenient to run the jobs as an array, in which the environment variable $SGE_TASK_ID is used directly (or indirectly, e.g., n=fct($SGE_TASK_ID)) as the random matrix order n.
scc1$ qsub -t 3-7:2 ./run_matlab_aj
run_matlab_aj:
#!/bin/bash -l
#$ -v alpha=1
#$ -v beta=2
module load matlab
# use env var SGE_TASK_ID as random matrix size n
matlab -nodisplay -singleCompThread \
-r "myRand($SGE_TASK_ID, $alpha, $beta), exit"
As previously explained for run_matlab_job, the double-quoted segment in run_matlab_aj is first processed by the system shell, which replaces all specified environment variables with their respective values before passing control to MATLAB. The above example runs 3 jobs, with $SGE_TASK_ID set to 3, 5, and 7, respectively, serving directly as n, the random square matrix order. The companion MATLAB function, myRand.m:
function myRand(n, alpha, beta)
% This is a companion to the run_matlab_aj batch script to demonstrate
% the qsub -t option. It computes a random matrix, then saves the output
% of the task to a file whose name identifies the task
A = rand(n);                          % computes random matrix
filename = ['output_' num2str(n)];    % name of file to be saved
save(filename, 'A', 'alpha', 'beta'); % saves A, alpha, beta to output_<n>.mat
Above, n (= $SGE_TASK_ID) is also used to build a distinct output file name for each task, so that the tasks do not all write to the same output file.
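Before submitting an array job, one may want to rehearse what each task will do. The sketch below loops over the task IDs of qsub -t 3-7:2 and prints, for each one, the MATLAB statement run_matlab_aj would pass to -r and the file name myRand.m would produce (MATLAB's save appends .mat). rehearse_aj is a hypothetical helper that launches nothing; its alpha and beta fallbacks mirror the #$ -v lines of run_matlab_aj.

```shell
#!/bin/bash
# Hypothetical helper: rehearse "qsub -t FIRST-LAST:STEP ./run_matlab_aj"
# without MATLAB or the scheduler. Usage: rehearse_aj FIRST LAST [STEP]
rehearse_aj() {
  local first=$1 last=$2 step=${3:-1}
  local a=${alpha:-1} b=${beta:-2}   # mirror "#$ -v alpha=1" and "#$ -v beta=2"
  local SGE_TASK_ID
  for (( SGE_TASK_ID = first; SGE_TASK_ID <= last; SGE_TASK_ID += step )); do
    # Same substitution the shell performs inside run_matlab_aj's -r string
    printf 'task %d: -r "myRand(%d, %s, %s), exit" writes output_%d.mat\n' \
      "$SGE_TASK_ID" "$SGE_TASK_ID" "$a" "$b" "$SGE_TASK_ID"
  done
}

rehearse_aj 3 7 2
```

Each task writes a different file (output_3.mat, output_5.mat, output_7.mat), which is the point of tying the file name to $SGE_TASK_ID.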
Multi-Core (PCT) Parametric Studies
Here are example commands to submit batch jobs for multi-core parametric studies:
scc1$ qsub -t 128-176:16 -pe omp 16 ./run_matlab_pct_aj # 16 cores; n=128, 144, 160, or 176
scc1$ qsub -t 140-196:28 ./run_matlab_pct_aj # 28 cores; n=140, 168, or 196
run_matlab_pct_aj:
#!/bin/bash -l
#$ -pe omp 28
module load matlab
# Additional qsub options here . . .
matlab -nodisplay -r "runBatchJob($SGE_TASK_ID, $NSLOTS); exit"
Each job in the job array will run on a whole compute node with 16 or 28 cores, and all cores on the node are automatically used by the MATLAB functions. Using a whole node for one job has several advantages versus using part of a node:
- More cores may help to speed up computation.
- Being the sole user on the node guarantees that all of the node’s memory is available to your job.
- Having multiple PCT jobs running on the same node may cause a problem. This situation is avoided by using a whole node for one job.
Parametric Studies Requiring Multiple Parameters
Here are a few ways to run parametric studies with multiple variables.
- Hold N - 1 variables fixed (e.g., alpha, beta) and run multiple jobs on the remaining Nth variable via an Array Job:
scc1$ qsub -v alpha=1 -v beta=2 -t 10-15 ./run_matlab_aj
scc1$ qsub -v alpha=3 -v beta=4 -t 10-15 ./run_matlab_aj
- The above can be further automated, e.g., with this runjobs script:
#!/bin/bash -l
# runjobs shell script
for a in `seq 1.1 0.3 3.2`; do
  for b in `seq 1 1 3`; do
    qsub -v alpha=$a -v beta=$b -t 10-15 run_matlab_aj
  done
done
The above script uses the Unix sequence command seq to generate integer and floating-point sequences for the for-loop indices a and b. Input to seq has the format first step last (separated by spaces). Note that the symbol ` is a backquote (backtick), not a single quote: bash replaces a command enclosed in backquotes with that command's output. In this case:
a = [1.1, 1.4, 1.7, 2.0, 2.3, 2.6, 2.9, 3.2]; % seq 1.1 0.3 3.2
b = [1, 2, 3]; % seq 1 1 3
Don't forget to give runjobs execute permission:
scc1$ chmod +x ./runjobs
Then, executing runjobs yields:
scc1$ ./runjobs
Your job-array 6826755.10-15:1 ("runjobs") has been submitted
. . . . . . . . . .
- Alternatively, the MATLAB ind2sub utility may be used to map an Array Job's linear indexing to N-dimensional indexing. For example, if you want to generate a 3×4 array of 2-D indices, you could submit an array job:
scc1$ qsub -t 1-12 . . .
This launches 12 jobs with their respective $SGE_TASK_ID = 1, 2, 3, . . ., 12. Next, in myApp.m:
% with SGE_TASK_ID (say, 7) passed into your myApp.m as an input argument ...
[i, j] = ind2sub([3 4], SGE_TASK_ID) % returns [row, col]: i = 1, j = 3
alpha = Alpha(i); % Alpha is independent of j
beta = Beta(j); % Beta is independent of i
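Both multi-parameter approaches can be rehearsed in the shell. The sketch below, assuming bash and GNU coreutils seq, contains a dry-run version of runjobs that prints the qsub commands instead of submitting them, and a shell counterpart of the 2-D ind2sub mapping (column-major, 1-based) for cases where the batch script, rather than myApp.m, should pick the parameter indices. Both function names are hypothetical. $(...) is used in place of the backquotes and behaves the same here, and LC_ALL=C keeps seq from printing a locale-specific decimal comma.

```shell
#!/bin/bash
# Hypothetical dry run of runjobs: print each qsub command instead of running it.
runjobs_dryrun() {
  local a b
  for a in $(LC_ALL=C seq 1.1 0.3 3.2); do
    for b in $(LC_ALL=C seq 1 1 3); do
      echo "qsub -v alpha=$a -v beta=$b -t 10-15 run_matlab_aj"
    done
  done
}

# Hypothetical shell counterpart of MATLAB's [i, j] = ind2sub([nrows ncols], k)
# for 2-D, column-major, 1-based indexing. Prints "i j".
ind2sub_2d() {
  local nrows=$1 ncols=$2 k=$3
  if (( k < 1 || k > nrows * ncols )); then
    echo "ind2sub_2d: index $k outside 1..$(( nrows * ncols ))" >&2
    return 1
  fi
  echo "$(( (k - 1) % nrows + 1 )) $(( (k - 1) / nrows + 1 ))"
}

runjobs_dryrun
ind2sub_2d 3 4 7    # 1 3  (same as ind2sub([3 4], 7) in MATLAB)
```

runjobs_dryrun should list 24 commands (8 alpha values × 3 beta values); once the list looks right, the echo can be replaced by the real qsub call.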