• Installation: Compute nodes will be installed in a chassis within a Shared Computing Cluster (SCC) rack located at the MGHPCC in Holyoke, MA. Please see the Buy-in Computing System Offerings for details on the systems that are available. Buy-in Storage Offerings will be integrated into the GPFS shared file system for the cluster.
  • Networking: By default, a 10 Gigabit Ethernet connection to your node will be provided. If you have purchased the InfiniBand (IB) option, that connection will be provided as well.
  • Infrastructure: Power and cooling will be provided without charge.
  • Hardware maintenance: Five years of maintenance is included with purchase. We will coordinate hardware maintenance and repair with the vendor. Five years is considered the lifetime of this equipment, after which it will be retired.
  • System administration: Your hardware will be integrated into the existing shared Linux cluster. We will be responsible for installing, configuring, and maintaining all system software. Your system will necessarily run the same operating system and software stack as the other nodes in the cluster. See our list of SCC software packages.
  • Monitoring: We will provide automated monitoring of system status, hardware, and software, with real-time alerts, as well as regular and active monitoring of system security. We will respond to critical problems within 4 business hours. (See Support below)
  • File systems: Buy-in storage will be integrated into the SCC shared file system and allocated to the owner’s project(s). The SCC shared file system is accessible from all nodes in the cluster, including Buy-in nodes.
  • Batch queues: Your nodes will participate in the batch queuing system for the cluster. (See Priority Access below)
  • Priority access: We will create a special batch queue for use by you and by anyone else you designate. This queue will give you priority access to your nodes: jobs that you submit to it will be guaranteed to run next. However, other jobs may run on your nodes when you have no running jobs and no pending requests. If you submit a request while such a job is running, the running job will be allowed to run to completion before your job starts. We will limit the run-time of any such job to 12 hours in order to bound how long you may have to wait for your own job to start; with the 12-hour limit, the maximum wait is 12 hours and the expected wait is no more than 6 hours. We will work with you to design a priority policy that fits your needs and can be implemented within the batch system. This policy can include any run-time limits you specify, including no restrictions on run-times for your own jobs on your own nodes. (An example job script appears after this list.)
  • Login option: If you purchase more than one node, you have the option of designating one node for interactive-only use. No batch jobs will run on this node.
  • Internet server restriction: Due to security concerns, we cannot allow Internet servers (web, FTP, etc.) to run on the cluster nodes. If you have such needs, please speak with RCS staff to see how we can best accommodate your requirements.
  • Database restriction: The configuration of the nodes prohibits them from running database servers.
  • Hours of operation: RCS staff are available to resolve problems and provide assistance during normal business hours.
  • Support: Support is available through our normal support channels, such as sending email to help@scc.bu.edu.
  • Uptime: While the storage systems are on backed-up (i.e., redundant) power to protect them against damage or data loss during unplanned power outages, the compute nodes are not. Nevertheless, we strive for a minimum of 97% uptime. The MGHPCC data center has an annual 24-hour maintenance outage, normally scheduled for the week after BU’s graduation date.
  • Change policy: Normal changes will be announced at least one week in advance. Emergency changes may occur with little or no advance notice. The official IS&T change windows are 2:00am – 6:30am on Tuesdays, Wednesdays, and Thursdays and 12:00am – 8:00am on Sundays. Any change requiring downtime on a login node will take place during one of those windows. For a batch-only node, we will first disable scheduling of jobs on that node and wait until all running jobs complete. The change will then be implemented during the next change window, or possibly earlier if doing so allows the node to be returned to service sooner.
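To make the priority-access mechanism concrete, the following is a minimal sketch of a buy-in job script. It assumes an SGE-style batch system (qsub/qstat); the scheduler is not named above, so treat the directives as illustrative rather than an exact recipe. The project name yourlab, the job name, and the program run_analysis are hypothetical placeholders; the actual queue and project names for your group will be assigned when your priority policy is set up.

    #!/bin/bash -l
    # Sketch of a buy-in priority job, assuming an SGE-style batch system.
    # The project, job name, and program below are placeholders.

    #$ -N buyin_example     # job name
    #$ -P yourlab           # buy-in project used to route the job to your priority queue (placeholder)
    #$ -l h_rt=12:00:00     # hard run-time limit; your own priority policy may allow a longer limit or none
    #$ -j y                 # merge stdout and stderr into a single log file

    # Replace with the computation you actually want to run on your nodes.
    ./run_analysis --input data.in --output results.out

Submitting the script with qsub myjob.sh (or qsub -q <your queue> myjob.sh if you are given an explicit queue name) places it in the batch system, and qstat -u $USER shows whether it is pending or running on your nodes.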