Incident started: 13:05
Incident resolved: 13:55
Customer impact:
VMs running on the affected compute node were unreachable (Error state) until they were moved to other compute nodes.
Timeline:
13:05 - A compute node crashed.
13:13 - Attempts to restart the server failed.
13:15 - Evacuation of affected VMs to other compute nodes started.
13:40 - Most of the VMs had been moved and restarted.
13:55 - All VMs were up and running on new compute nodes.
Root Cause:
The root cause of this incident was a hardware failure. The affected server has been taken out of production for further investigation and repair.