SCC Cluster Questions

How do I get an account on the SCC?

People using the Shared Computing Cluster (SCC) must be members of a Research Computing Project. BU faculty may create a Research Computing Project, serve as the Lead Project Investigator, and add their collaborators (students, researchers outside of BU, etc.) to their project. This is usually done through the SCC management page.

How do I manage my account on the SCC – add more space/users?

Research Computing Services has some great help on how to use their account management website, which is where you will do all things related to your SCC account: https://www.bu.edu/tech/support/research/account-management/.

How do I get on the SCC?

There are several ways to access the SCC, including SSH, VNC, and our recommended way, OnDemand.

SCC OnDemand is the recommended way to access the BU Shared Computing Cluster (SCC) over the web using a graphical, menu-based environment that doesn’t require using an SSH client. It is particularly well suited to applications like MATLAB, RStudio, and Jupyter Notebook which have a graphical component. OnDemand is an effective alternative to VNC. Additionally, OnDemand allows you to upload and download files, launch applications, view disk quotas and do many other things on the SCC.

How do I access OnDemand to get on the SCC?

To launch OnDemand, go to: scc-ondemand.bu.edu
Generally you want to launch a session via the Interactive Applications tab. Usually the desktop choice is fine, but if you are doing very graphics-heavy work (spinning brains in FreeSurfer) you might want the VirtualGL Desktop. You can set your resources just as with any cluster job. A high core or time request might delay the creation of your desktop.

How do I get my dicoms from XNAT to the SCC?

We have a new script with more download options, including an option to go directly to BIDS format. It comes courtesy of Tim O’Keefe of the Harvard NRG team, via GitHub. For complete documentation see here.

To find what the latest version is:
module spider yaxil

Then load the most recent version. Unless you need to upgrade for a particular reason, you should use the same version for all participants in an experiment.

Make sure to replace the ? with the version number you want.
module load yaxil/?.?.?

This load will fail and tell you which version of Python you need to load. Here is an example from the latest version of yaxil as of Aug 2024.
--------------------------------------------------------------------------------
ERROR: yaxil/0.9.12 requires at least one additional module.
Run the following commands to load all of the dependencies:

module load python3/3.10.12
module load yaxil/0.9.12
--------------------------------------------------------------------------------

Run the two load commands it lists to correctly load the package. The main script this loads is ArcGet.py.
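
Because the same yaxil version should be used for every participant in an experiment, one option is to record the versions once in a small setup script and source it wherever you download data. This is only a minimal sketch: the version numbers are the ones from the Aug 2024 example above, and the file and function names are hypothetical. Away from the SCC, where the module command does not exist, it simply prints what it would run.

```shell
# setup_yaxil.sh (hypothetical name): pin the versions used for this experiment.
YAXIL_VERSION="0.9.12"      # version from the example above; change per experiment
PYTHON_VERSION="3.10.12"    # dependency reported by the failed yaxil load

load_yaxil() {
    # Load python3 first, since yaxil refuses to load without it.
    for mod in "python3/${PYTHON_VERSION}" "yaxil/${YAXIL_VERSION}"; do
        if command -v module >/dev/null 2>&1; then
            module load "$mod"
        else
            # Not on the cluster: show the command instead of running it.
            echo "module load $mod"
        fi
    done
}

load_yaxil
```

Sourcing this file at the top of each download script keeps every participant on the same version until you deliberately change the two variables.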

To see the help:
ArcGet.py --help

A basic example:
ArcGet.py -a xnat2 -s 20200312_QA -p qa -f native -o /projectnb/onr/mcmains/

Flag                   Description
-a, --alias            xnat (older data) or xnat2 (after Sept 2024)
-l, --label            XNAT session ID: what you called the session when you registered it at the scanner
-p, --project          the XNAT project that the data was archived to
--scans                if you don’t want to download the entire dataset, enter the scan numbers for those you want. Example: --scans 4 6 13
-c, --config           full path to a BIDS configuration file
-o, --output-dir       full path to where you want the data to be placed. With the flat format, the data is dumped directly in this directory; with the other formats, a directory named after the session ID is created inside it.
-f, --output-format    one of bids, flat, or native. flat puts all the DICOMs in one directory with somewhat cryptic names; native puts them in separate directories for each scan with more obvious names.
--password
--username
--host                 https://xnat2.bu.edu

In versions 0.6.5 and greater, you can enter information about yourself and the host server on the command line with --username, --password, and --host. This allows you to script ArcGet.py, as it won’t ask you to type your password. Do not use --alias together with these flags or it will still prompt you for your password. For older data, set the host to https://xnat.bu.edu

If you do not select the bids format, it will not convert your dicoms. If you do select bids, it will create the bids structure and convert your dicoms using dcm2niix.
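
The scripted, non-interactive use described above (--host, --username, and --password instead of --alias) might look like the loop below. It is a sketch under stated assumptions, not a tested pipeline: the session labels are hypothetical, and the environment variables XNAT_USER, XNAT_PASS, and DRY_RUN are names introduced here for illustration. It uses only flags from the table above, and by default it prints the commands rather than running them.

```shell
# Hypothetical values: replace with your own project, sessions, and output path.
XNAT_HOST="https://xnat2.bu.edu"
XNAT_PROJECT="qa"
OUTPUT_DIR="/projectnb/onr/mcmains/"
SESSIONS="20200312_QA 20200313_QA"

# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to download.
DRY_RUN="${DRY_RUN:-1}"

download_session() {
    # $1 is the XNAT session label given at the scanner.
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command with the password hidden.
        printf '%s\n' "ArcGet.py --host $XNAT_HOST --username ${XNAT_USER:-USERNAME} --password '***' -l $1 -p $XNAT_PROJECT -f native -o $OUTPUT_DIR"
    else
        # XNAT_USER and XNAT_PASS are assumed to be exported beforehand,
        # so the password never appears in the script itself.
        ArcGet.py --host "$XNAT_HOST" --username "$XNAT_USER" --password "$XNAT_PASS" \
                  -l "$1" -p "$XNAT_PROJECT" -f native -o "$OUTPUT_DIR"
    fi
}

for session in $SESSIONS; do
    download_session "$session"
done
```

Checking the printed commands first, then rerunning with DRY_RUN=0, is a cheap way to catch a mistyped session label before any data is downloaded.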

Unpacking into bids

To unpack into the bids file naming and directory structure, you need to make a configuration file. The format of the one used here is a little easier to read than the native .json format. There is both an example file, and an empty template to work with.

The example is in the examples folder inside the yaxil directory, at:
$SCC_YAXIL_EXAMPLES/bids_example_config.yaml

For a brief description, see the original documentation page.

Fieldmap tips: If you collect blip-up/blip-down field maps, you would have two entries labeled epi under the fmap field, one with direction ap and one with pa (or rl/lr). The mag/phase entries can also carry the intended for field, and you can have multiple of each, applied to different runs.

Diffusion data:
In versions 0.6.5 and greater, you can enter dwi at the same level as func, fmap, and anat.

Template configuration file:
To get an empty template to start from (you can add and delete fields as you need), you can copy it from the same directory as the example above.

cp $SCC_YAXIL_EXAMPLES/bids_template_config.yaml .

How do I select software on the cluster?

RCS has some great documentation on the module system that is used to select software. I will highlight a few basic commands below.

To find out what versions of a software exist:
module spider spm

To select a module:
module load spm/12

To make sure modules are loaded correctly in scripts, add a -l to the first line of your script.
#!/bin/bash -l

You can load multiple modules at once.
module load python3 spm/12

Any tips for using R, Python, Matlab, C, Fortran on the SCC?

SCC has documentation to help you get started on R, Python, Matlab, C, and Fortran.

How do I submit my job to the cluster: qsub?

RCS has a lot of good documentation generally about interacting with your batch jobs: https://www.bu.edu/tech/support/research/system-usage/running-jobs/, and how specifically to submit your SCC job: https://www.bu.edu/tech/support/research/system-usage/running-jobs/submitting-jobs/.

I will go over a few highlights here. Generally you want to use a submit script to set up your job. This gives you more options for what you can submit, and allows you to do things like load the required modules.

It might look something like this:

#!/bin/bash -l

#the first line specifies that this is a bash script; the -l (lowercase L) is needed for modules to work correctly.

#$ -P onrteach #project name, only required if in more than 1 project
#$ -l h_rt=48:00:00 #job time limit, default 12 hours
#$ -o /projectnb/onrteach/repo/output/myscript_$JOB_ID.txt #output file name will include the job ID; without -o, an output file is created automatically wherever the script was launched from
#$ -j y #merge output and error files, otherwise you need to save the error file too (flag is -e).

#load my required modules
module load afni/21.0.4-omp

#run my script, which takes a subject ID as an input.
/projectnb/onrteach/repo/afniworkshop/afniscript.sh 210415_mystudy_subj01

If your script takes advantage of multiple cores (aka parallel computing), you can request multiple cores. For more documentation, see here.
This is also a way to increase your memory, as described here.
To request multiple cores, include the additional few lines:

#!/bin/bash -l

#the first line specifies that this is a bash script; the -l (lowercase L) is needed for modules to work correctly.

#$ -P onrteach #project name, only required if in more than 1 project
#$ -l h_rt=48:00:00 #job time limit, default 12 hours
#$ -o /projectnb/onrteach/repo/output/myscript_$JOB_ID.txt #output file name will include the job ID; without -o, an output file is created automatically wherever the script was launched from
#$ -j y #merge output and error files, otherwise you need to save the error file too (flag is -e).
#$ -pe omp 8 #number of processors, default 1
#$ -l mem_per_core=4G #amount of memory available for each core, default 4G.
#$ -v OMP_NUM_THREADS=8 #this environment variable often needs to be set for the program to take advantage of multiple cores. Any needed environment variable can be set this way. The number should match the number of requested cores (-pe).

#load my required modules
module load afni/21.0.4-omp

#run my script, which takes a subject ID as an input.
/projectnb/onrteach/repo/afniworkshop/afniscript.sh 210415_mystudy_subj01

For more general flags for submitting your script: https://www.bu.edu/tech/support/research/system-usage/running-jobs/submitting-jobs/#job-options.

For more flags on choosing your resources: https://www.bu.edu/tech/support/research/system-usage/running-jobs/submitting-jobs/#job-resources.
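
To run the same submit script for several participants, a small loop can call qsub once per subject. This is a sketch under assumptions: submit_afni.sh is a hypothetical file holding a submit script like the ones above, changed so that its last line uses "$1" instead of a hard-coded subject ID, and the second subject ID is made up. The job name gets a prefix because Grid Engine job names may not start with a digit. Away from the cluster, where qsub is missing, the commands are only printed.

```shell
# Hypothetical submit script and subject IDs, for illustration only.
SUBMIT_SCRIPT="submit_afni.sh"
SUBJECTS="210415_mystudy_subj01 210415_mystudy_subj02"

submit_subject() {
    # -N names the job after the subject; arguments after the script name
    # are passed to the script, where the first is available as "$1".
    if command -v qsub >/dev/null 2>&1; then
        qsub -N "afni_$1" "$SUBMIT_SCRIPT" "$1"
    else
        # Not on the cluster: show the command instead of running it.
        echo "qsub -N afni_$1 $SUBMIT_SCRIPT $1"
    fi
}

for subject in $SUBJECTS; do
    submit_subject "$subject"
done
```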

How do I set my memory usage?

See the RCS documentation on memory here: https://www.bu.edu/tech/support/research/system-usage/running-jobs/resources-jobs/#memory.

By default, you get about 4G per core/processor you request. If you are going to need more than that, you should increase your request via the qsub options. For guidance on what to choose, see http://www.bu.edu/tech/support/research/system-usage/running-jobs/batch-script-examples/#MEMORY.

To figure out how much memory you need, you often want to aim high the first time and then check out how much memory it used and adjust accordingly.
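
Since memory scales with the number of cores requested, a rough starting point is the total memory you expect to need divided by the roughly 4G per core default mentioned above, rounded up. The helper below is a sketch of that arithmetic only; the function name is hypothetical, and the cluster may accept only certain core counts, so check the RCS guide linked above and round up to an allowed value.

```shell
# Default memory per core quoted above (about 4G).
MEM_PER_CORE_GB=4

cores_for_memory() {
    # $1 = total memory the job needs, in whole GB.
    # Integer ceiling division: round up so the request is never too small.
    echo $(( ($1 + MEM_PER_CORE_GB - 1) / MEM_PER_CORE_GB ))
}

echo "10G job -> request $(cores_for_memory 10) cores"   # prints: 10G job -> request 3 cores
```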

How much time and memory did my job take?

If you saved your output file with the JobID number in it, this should be very easy.
RCS has great documentation on this: https://www.bu.edu/tech/support/research/system-usage/running-jobs/allocating-memory-for-your-job/#QACCT.

In the output you will also see start_time and end_time which will allow you to calculate the run time.
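
The run time calculation from start_time and end_time can be scripted. The sketch below works on a made-up two-line sample rather than real accounting output, and it assumes the timestamps use the layout shown in the sample and that GNU date is available (as it is on Linux); if your qacct output prints timestamps differently, the parsing will need adjusting. The function names are hypothetical.

```shell
# Illustrative sample only (made-up values), trimmed to the two fields used here.
# On the cluster this text would come from: qacct -j <JOB_ID>
sample_qacct_output="start_time   Thu Apr 15 10:02:11 2021
end_time     Thu Apr 15 13:47:41 2021"

field_to_epoch() {
    # $1 = field name, $2 = qacct output.
    # Strip the field name, then let GNU date parse the remaining timestamp.
    stamp=$(printf '%s\n' "$2" | awk -v f="$1" '$1 == f { sub(/^[ \t]*[^ \t]+[ \t]+/, ""); print }')
    TZ=UTC date -d "$stamp" +%s
}

runtime_hms() {
    # $1 = qacct output; prints the elapsed time as HH:MM:SS.
    start=$(field_to_epoch start_time "$1")
    end=$(field_to_epoch end_time "$1")
    secs=$(( end - start ))
    printf '%02d:%02d:%02d\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

runtime_hms "$sample_qacct_output"   # prints 03:45:30 for the sample above
```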

In general, you should add about 10-25% to both the memory and the time, as the same script run on different participants with different sized brains can take different amounts of time/memory.
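
The 10-25% margin is simple arithmetic, sketched below with integer math that rounds up. The observed values are hypothetical and the function name is introduced here for illustration.

```shell
pad_request() {
    # $1 = observed value (e.g. memory in MB or run time in minutes)
    # $2 = percentage to add (10 to 25 per the guideline above)
    # Rounds up so the padded request never falls below the target.
    echo $(( ($1 * (100 + $2) + 99) / 100 ))
}

# Hypothetical observations for one participant.
observed_mem_mb=6200
observed_minutes=225

echo "memory: $(pad_request "$observed_mem_mb" 25) MB"        # memory: 7750 MB
echo "time:   $(pad_request "$observed_minutes" 25) minutes"  # time:   282 minutes
```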

How do I run a job that requires graphics or a user interface (aka an interactive job)?

If you are running something with minimal graphics, such as SPM, this can often be done directly in your OnDemand Desktop session. You might just want to request additional resources, such as more cores.

If it is a job with similar graphics but requires a lot of resources, you might want the greater qsub control you can get by launching an interactive session through the qsub system as described here.

Finally, if you need ‘serious’ graphics, like for FreeSurfer spinning brains or Workbench from HCP, you should use a VirtualGL session, which speeds up graphics. This can now be done via OnDemand by choosing VirtualGL as the app to launch. There is also a much more complicated way to do it via qsub as described here.