Snel.com - Ceph errors on our distributed storage platform. – Incident details

Ceph errors on our distributed storage platform.

Resolved
Operational
Started over 3 years ago
Lasted less than a minute

Affected

SmartDC, Rotterdam NL

Operational from 10:21 PM to 10:21 PM

Cloud VPS platform

Operational from 10:21 PM to 10:21 PM

Updates
  • Resolved

    Earlier tonight we experienced issues with Ceph, which serves as the distributed storage platform for our Cloud VPS platform. During our weekly maintenance window we were updating and patching our hypervisors to the latest stable version, in part to mitigate the security vulnerability announced by Proxmox (CVE-2021-20288) [1]. At approximately 00:21 CEST we started receiving notifications about servers that had become unresponsive. The issues were caused by a version mismatch between the hypervisors and were resolved around 01:00. We have been working through the night to check servers and reboot them where necessary to ensure all VMs are available.

    Your server may have been restarted to make it responsive again.

    All issues should be resolved by now. If you are still experiencing problems, feel free to contact us at support@snel.com.

    [1] https://forum.proxmox.com/threads/ceph-nautilus-and-octopus-security-update-for-insecure-global_id-reclaim-cve-2021-20288.88038/