Overview

The Project Disk Space file system comprises twelve petabytes of usable, high-performance online storage for research computing projects. Project Disk Space is allocated to individual research projects for the exclusive use of their members, facilitating collaboration.

Each project is allocated a limited amount of Free Baseline quota. Projects requiring additional space may either purchase Project Disk through the Buy-in program or rent additional Project Disk through the Storage-as-a-Service program.

All Project Disk Space is protected by hardware RAID (protecting against disk failures) and daily Snapshots (protecting against accidental deletion of files).

Kinds of Project Disk Partitions

There are four Project Disk Space partitions on the SCC: /project, /projectnb, /restricted/project, and /restricted/projectnb. These four partitions have identical performance characteristics. The two /restricted partitions are dbGaP compliant, for data that requires it (primarily Genomics projects). The two /project partitions are backed up nightly to an independent off-site system for disaster recovery; the two /projectnb partitions are not backed up. Regardless, Snapshots are implemented on all four partitions, enabling users to easily retrieve accidentally deleted files.

Data Protection Requirements

A portion of the SCC Project Disk Space is set up for processing and storing Confidential data, such as a HIPAA Limited Data Set (DOB, DOD, dates of treatment, City, Zip Code) and dbGaP data. Restricted Use data, such as HIPAA or individually identifiable health information, may not be stored on any partition of the SCC. The allowed Confidential data may be stored only in the /restricted/projectnb and /restricted/project partitions. These partitions can be accessed from all SCC compute nodes, but only from the scc4.bu.edu login node and the scc-ondemand.bu.edu web interface.

Public and Internal data may be stored on /project and /projectnb and accessed from the other login nodes as well as all compute nodes.

Please see http://www.bu.edu/policies/data-classification-policy for definitions and more information.

For questions about how your data is classified, please send email to bumcinfosec@bu.edu. For questions about using SCC and Project Disk Space, send email to help@scc.bu.edu.

Allocations

Project Disk allocations can be in the form of any of three types: Free, Buy-in, or Storage-as-a-Service. Functionally, rented and purchased Project Disk augment and are largely indistinguishable from free storage.

Forms for requesting both Free and Storage-as-a-Service space can be found with the other project management web pages on TechWeb on your SCC Management Page. For Buy-in space, email buyin@rcs.bu.edu.

Free Baseline Quota

By default, new projects on the SCC are created with 50 GB on /project and 50 GB on /projectnb. The Lead Project Investigator (LPI) can specify whether or not this space should be dbGaP compliant. Additional Project Disk Space may be requested by a project’s LPI or IT/Administrative Contact. There is no charge for requests up to a total of 1000 GB, with a maximum of 200 GB of that backed up. For LPIs with multiple projects, there is an additional limit on Free Baseline quota across all of their projects: a maximum of 3000 GB, with a maximum of 600 GB of that backed up.
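
The Free Baseline limits above can be written as a small check. The following Python sketch is illustrative only: the function name and the dictionary layout are hypothetical, and it encodes just the numeric limits stated in this section. It assumes that "backed up" space is the space on /project and that everything else counts toward the total.

```python
# Free Baseline limits taken from the text above, in GB.
PER_PROJECT_TOTAL_GB = 1000
PER_PROJECT_BACKED_UP_GB = 200
PER_LPI_TOTAL_GB = 3000
PER_LPI_BACKED_UP_GB = 600


def within_free_baseline(projects):
    """Return True if every project, and the LPI overall, fits the free limits.

    `projects` maps a project name to a (backed_up_gb, not_backed_up_gb) pair.
    This layout is a hypothetical one chosen for the example.
    """
    lpi_total = 0
    lpi_backed_up = 0
    for backed_up_gb, not_backed_up_gb in projects.values():
        total = backed_up_gb + not_backed_up_gb
        # Per-project limits: 1000 GB total, at most 200 GB of it backed up.
        if total > PER_PROJECT_TOTAL_GB or backed_up_gb > PER_PROJECT_BACKED_UP_GB:
            return False
        lpi_total += total
        lpi_backed_up += backed_up_gb
    # Limits across all of one LPI's projects.
    return lpi_total <= PER_LPI_TOTAL_GB and lpi_backed_up <= PER_LPI_BACKED_UP_GB
```

For example, three projects that each hold 200 GB backed up and 800 GB not backed up sit exactly at the per-LPI limits, while a fourth such project would exceed them.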

Application form: SCC Management Page

Purchasing/Renting Storage through the Buy-in and Storage-as-a-Service programs

The highly successful Buy-in Program is a convenient way to acquire dedicated storage at highly subsidized rates for an extended period of time (5 years). Any interested researcher should contact buyin@rcs.bu.edu or review the Buy-in options web pages. The current cost is $76 per Terabyte for 5 years.

The Storage-as-a-Service program offers researchers an option to acquire additional disk quota for a flexible time duration at a subsidized rate of $20/Terabyte/year. Allocations are in whole Terabyte (1000 Gigabyte) units only. To purchase an allocation through this program, the LPI should fill in the request form for Storage-as-a-Service and include their Financial Contact information. The Financial Contact will receive details on how to send an Internal Service Request for transmitting payment. The application form is found on each LPI’s SCC Management Page.

All grant rules apply when using grant funds.

Buy-in vs Storage-as-a-Service Comparison

| | Buy-in | Storage-as-a-Service |
| Model | Purchase | Rental |
| Time Horizon | 5 years | 6 months+ |
| Annual Cost | $15.20/TB | $20/TB |
| Billing Schedule | Full five-year cost paid up front when large storage array purchase occurs. This is generally 0-4 months after a request comes in. | Billed (and pro-rated for periods less than one year) annually by fiscal year |
| Minimum Purchase | 10 TB | 1 TB |
| Storage Availability | Generally immediately via “Loaner” space until actual purchase, but not always | Immediately |
| Capital Expense? | Yes | No |
| Fully fungible/Transferable between projects | Yes | Yes |
| Recommended For | Large, long-term purchases | Small and/or short-term purchases |
| How to begin purchase? | Email buyin@rcs.bu.edu | Submit appropriate form on your SCC Management Page |
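
To make the cost comparison concrete, here is a minimal Python sketch using only the rates and minimums listed above. The function names are hypothetical, and the sketch ignores fiscal-year pro-rating and any other billing details not stated here.

```python
BUY_IN_PER_TB_5YR = 76    # dollars per TB for the full five-year term
BUY_IN_MIN_TB = 10        # minimum Buy-in purchase
STAAS_PER_TB_YEAR = 20    # dollars per TB per year
STAAS_MIN_TB = 1          # minimum Storage-as-a-Service rental


def buy_in_cost(tb):
    """Up-front cost of a five-year Buy-in purchase of `tb` terabytes."""
    if tb < BUY_IN_MIN_TB:
        raise ValueError("Buy-in purchases start at 10 TB")
    return tb * BUY_IN_PER_TB_5YR


def staas_cost(tb, years):
    """Total rental cost of `tb` whole terabytes for `years` years."""
    if tb < STAAS_MIN_TB or tb != int(tb):
        raise ValueError("Storage-as-a-Service is rented in whole terabytes")
    return tb * STAAS_PER_TB_YEAR * years


print(buy_in_cost(10))     # 760
print(staas_cost(10, 5))   # 1000
```

At these rates, renting costs as much per terabyte as a Buy-in purchase after 76 / 20 = 3.8 years, which is consistent with the recommendation of Buy-in for long-term needs and Storage-as-a-Service for shorter ones.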

Accessing Project Disk Space

When a project is created on the SCC, subdirectories will be created for the project under the appropriate /project, /projectnb, /restricted/project, and/or /restricted/projectnb directories. These subdirectories will have the same name as the project and will be writable by any member of the project. The structure of, and access to, the files and subdirectories created under the project’s directory are entirely at the discretion of the project members. The Unix “group” file permission mechanism can be used to control permissions for the project’s subdirectories (see the man page for “chmod” for more details).
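
As one illustration of the group permission mechanism, the following Python sketch sets directory modes with the standard os and stat modules. The function names and the example path are hypothetical; the same result can be had with chmod on the command line.

```python
import os
import stat


def share_with_group(path):
    """Give the owner and the group full access; others get none.

    Equivalent to `chmod 770 path`.
    """
    os.chmod(path, stat.S_IRWXU | stat.S_IRWXG)


def make_group_read_only(path):
    """Owner keeps full access; the group may read and enter; others get none.

    Equivalent to `chmod 750 path`.
    """
    os.chmod(path, stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP)


# Hypothetical usage, where "myproject" stands in for a real project name:
# share_with_group("/projectnb/myproject/shared")
```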

Limitations on Number of Files

In addition to the quota on the total size of your files, there is also a limit on the number of files you can have. The system does not operate well when people have many millions of very small files; it is much better to have a smaller number of somewhat larger files. This limit affects only a fairly small number of people. Three formulas are in play for this calculation. For allocations up to 1 TB, you are allowed 1 file for each 32 KB of quota, which works out to 33.5 million files at 1 TB. Between 1 TB and 16 TB, the limit stays at 33.5 million files. If your directory has a file size quota of 16 TB or more, you are allowed 2 files per MB of quota. This works out to 33.5 million files at 16 TB and 200 million files (the maximum allowable in any partition) at 100 TB or higher. The table below shows the file limits for selected quotas.

| Quota (GB) | Quota (Files) |
| 200 | 6.5 Million |
| 1,024 | 33.5 Million |
| 16,384 | 33.5 Million |
| 100,000 | 200 Million |
| 200,000 | 200 Million |
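
The three formulas can be combined into one function. This Python sketch is an approximation built from the description above, not the file system's actual implementation. It assumes binary units (1 GB = 1024 MB, 1 MB = 1024 KB), which is the reading that reproduces the table values; the function name is hypothetical.

```python
MAX_FILES = 200_000_000    # ceiling for any partition, per the text
FLAT_LIMIT = 33_554_432    # limit at 1 TB, held constant up to 16 TB


def file_limit(quota_gb):
    """Approximate file-count limit for a quota given in whole GB."""
    if quota_gb < 1024:
        # Below 1 TB: one file per 32 KB of quota.
        return quota_gb * 1024 * 1024 // 32
    if quota_gb < 16 * 1024:
        # From 1 TB up to 16 TB the limit does not grow.
        return FLAT_LIMIT
    # 16 TB and above: two files per MB, capped at the partition maximum.
    return min(quota_gb * 1024 * 2, MAX_FILES)
```

Under these assumptions, file_limit(200) returns 6,553,600, matching the 6.5 million figure in the table, and any quota of 100,000 GB or more returns the 200 million ceiling.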

Quota Enforcement

Project Disk Space quotas on the SCC are enforced by the file system. When a project is over its quota, daily email reminders are sent to the project’s Lead Project Investigator and all project members, including a breakdown of how much space each user is using. Projects have a soft limit equal to their granted quota and a hard limit 10% greater (with a maximum of 100 GB over the quota, regardless of its size). Projects can never exceed their hard limit and can only go over their soft limit for a maximum of 7 days. A project over its limit simply needs to delete enough files to get under the soft limit to have full write access restored immediately.
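
The relationship between the soft and hard limits can be sketched in a few lines of Python. The function name is hypothetical, and the code reflects only the rule as stated above: the hard limit is the quota plus 10%, with the extra amount capped at 100 GB.

```python
def hard_limit_gb(quota_gb):
    """Hard limit in GB: the quota plus 10%, with the overage capped at 100 GB."""
    overage = min(quota_gb / 10, 100)
    return quota_gb + overage
```

Because 10% of 1000 GB is 100 GB, the cap takes effect for any quota above 1000 GB. A 50 GB quota has a 55 GB hard limit, while a 5000 GB quota has a 5100 GB hard limit.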

To help manage the project members’ Project Disk usage, LPIs may specify a limit for each individual researcher. By default, each project member’s limit is set to the project’s full allocation. An LPI may reassign individual quotas at any time using the Project Disk Space update form found on the SCC Management Page. These individual quotas are enforced by the honor system, with email reminders sent daily to the Lead Project Investigator and to any user who is over their personal quota.

LPIs and users may review a daily record of their project’s and individual Project Disk usage via their SCC Management Page; more detailed information is available for LPIs. They may also use the command pquota -u projectname on the system to see a breakdown of Project Disk usage for a given project.

Please note that the quota -v command will display a user’s home directory usage, not Project Disk Space usage.

Two helpful Linux commands for determining disk usage are du and df. Researchers who keep all of their files in their own subdirectory can cd to that directory and type du -sk to display their usage in kilobytes. You can see your project’s overall usage, available space, and hard limit quota by running the command df -h . anywhere inside your group’s appropriate Project Disk Space directory.
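
For researchers who want the same information from within a script, the following Python sketch approximates du and df using only the standard library. The function names are hypothetical. Note that the first function adds up file lengths rather than allocated disk blocks, so its result can differ somewhat from du, and whether the second reports project quota figures exactly as df does on this file system is an assumption.

```python
import os
import shutil


def directory_usage_kb(path="."):
    """Sum the sizes of all regular files under `path`, in KB (rounded up)."""
    total_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symbolic links so their targets are not counted.
            if not os.path.islink(full):
                total_bytes += os.path.getsize(full)
    return -(-total_bytes // 1024)  # ceiling division


def filesystem_summary(path="."):
    """Return (total, used, free) in bytes for the file system holding `path`."""
    usage = shutil.disk_usage(path)
    return usage.total, usage.used, usage.free
```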

Backed up vs. Not-backed-up Project Disk

Most computational research projects will need a combination of /project (backed-up) and /projectnb (not-backed-up) disk space. Files on the /project partitions are backed up nightly while those on the /projectnb partitions are not backed up. The /projectnb partitions are appropriate for most files used in computational research on the SCC.

Backing up files requires additional resources and expense. We ask that you use the /project partitions only for files that need to be backed up. These will be restorable in the event of catastrophic failure.

Files that should be stored in /project and backed up are those that are being actively edited (e.g., source code), files that do not have a copy elsewhere, and files that cannot be regenerated.

Files that should be stored in /projectnb:

    • Data which exists elsewhere and is copied to the Project Disk Space for high performance access during computation.
    • Data which can be easily regenerated.
    • Data which is needed for only a short time.
    • Newly generated data which will be copied to another system for storage.

If you accidentally delete or corrupt files stored in any of the Project Disk partitions, you may locate them in the Snapshots and copy them back into your directory.