Incident started: 13:05
Incident resolved: 13:55
Customer impact:
VMs running on the affected compute node were unreachable (Error state) until they were moved to other compute nodes.
Timeline:
13:05 - A compute node crashed.
13:13 - Attempts to restart the server failed.
13:15 - Evacuation of affected VMs to other compute nodes started.
13:40 - Most of the VMs had been moved and restarted.
13:55 - All VMs were up and running on new compute nodes.
Root Cause:
The root cause of this incident was a hardware failure. The affected server has been taken out of production for further investigation and repair.