LAX3 encountered what appeared to be multiple hardware failures; however, that was not the case. We replaced every component by moving to a new physical server, identical apart from its IP addresses. When the same symptoms reappeared, it was clear that something else was afoot.
A software incompatibility between the Linux OS and the firmware on the NVMe disks made the SSDs unresponsive in a way that the OS interpreted as disk failure. The OS therefore removed the disks from the RAID array in an attempt to keep ongoing operations running, which is the expected behavior of RAID redundancy. A small change to a kernel flag goes a long way. Although this flag is not part of our standard configuration, we will apply it to every physical server that uses these SSDs to ensure service continuity and prevent similar incidents in the future.
We’re working to fix the root cause of the incident. We are confident that we have identified it, and the fix will ship in an upcoming Prisma update. For any other physical systems using this brand of NVMe SSD, we will apply the fix during this week’s global maintenance for software updates. As 48 hours have passed since the last occurrence, we’ve marked the incident as resolved and do not expect it to recur.
Thank you for your patience in this matter and, as always, please contact us if you require any assistance. We’re always here to help!