Resolved: Shared Computing Cluster (SCC) – System was Partially Down due to a power outage

Incident Discovery Time: 12:15 pm on 11/18/2023
Time of Resolution: 02:15 am on 11/18/2023
Services Impacted: Research Computing

Description of Impact

An unplanned power outage occurred at the MGHPCC at 12:15 pm. All SCC compute nodes lost power, and the Research Computing Team worked to restore access. Login nodes and the SCC filesystem were not affected.

Incident Description and Resolution

A data center power outage at 12:15 pm caused the Shared Computing Cluster (SCC) compute nodes to briefly lose power. Power has since been fully restored to the data center, and the compute nodes are available again. Batch jobs that were running at 12:15 pm were lost and will need to be resubmitted. Queued batch jobs, the SCC login nodes, and the SCC filesystem were not affected.

Additional Information

The cause of this incident was determined to be a power outage. If you continue to have issues, please contact the IT Help Center.

Previous Update